Source: condor.depaul.edu/dmumaugh/readings/handouts/CSC343/...

Memory Management
by Glen Lancaster

edited by Dennis Mumaugh

1. Some Memory Management History

The process model of an operating system puts constraints and demands on the memory management that it uses. Consider these two types:

Monoprogramming

Multiprogramming

Early systems required executable programs to be in "one piece" or contiguous when loaded in memory.

Example 1.

   C-version        Assembly version     Machine code version
   Main.c           Main.s               loc.  Main.o
   -------------    -------------              -------------
   | ....           | ....                00   | ....
   | while(...){    | loop: ...           08   | ...
   |   ...          |   ...               16   | ...
   | }              | jmp loop            24   | jmp 08
   |                |

Here we assume each machine instruction is 8 bytes long. Each time an instruction is fetched, the PC is incremented by 8 to contain the address of the next instruction. This simple fact about the CPU hardware meant that the instructions of a program HAD to be loaded into consecutive memory locations. You couldn't put part of a program's code in one place in memory and part in another disconnected part of memory.

Example 2. Subroutine calls and linking. Suppose Main.c calls a subroutine f in the file Sub.c

   C-version        Assembly version     Machine code version
   Main.c           Main.s               loc.  Main.o
   -------------    -------------              -------------
   | ....           | ....                00   | ....
   | while(...){    | loop: ...           08   | ...
   |   ...          |   ...               16   | ...
   | }              | jmp loop            24   | jmp 08
   | f();           | call _f             32   | call 0000

   C-version        Assembly version     Machine code version
   Sub.c            Sub.s                loc.  Sub.o
   -------------    -------------              -------------
   | void f()       | _f: ...             00   | ...
   | {              |   ...               08   | ...
   |   while(){     | loop2: ...          16   | ...
   |     ...        |   ...               24   | ...
   |   }            | jmp loop2           32   | jmp 16
   | }              | ret                 40   | ret


The link step first concatenates these two object files:

Machine code version before addresses are "FIXED"

   loc.  Main.o
         -------------
    00   | ....
    08   | ...
    16   | ...
    24   | jmp 08
    32   | call 0000      ; This address must be RESOLVED

         ------------------
    40   | ...             ; Sub.o starts here
    48   | ...
    56   | ...
    64   | ...
    72   | jmp 16          ; This address must be RELOCATED
    80   | ret

Machine code version after addresses are "FIXED"

   loc.  Main.o
         -------------
    00   | ....
    08   | ...
    16   | ...
    24   | jmp 08
    32   | call 40         ; This address was RESOLVED

         ------------------
    40   | ...             ; Sub.o starts here
    48   | ...
    56   | ...
    64   | ...
    72   | jmp 56          ; This address was RELOCATED
    80   | ret

1a. Monoprogramming

Simple monoprogramming systems can load compiled/linked programs into the same location each time the program is to be executed. In the example above, the program would have to be loaded at location 0 each time it was executed. If it were instead loaded at location 100, we would get:

Machine code version after addresses are "FIXED"

   loc.  Main.o
         -------------
   100   | ....
   108   | ...
   116   | ...
   124   | jmp 08
   132   | call 40         ; This address was RESOLVED at link time

         ------------------
   140   | ...             ; Sub.o starts here
   148   | ...
   156   | ...
   164   | ...
   172   | jmp 56          ; This address was RELOCATED at link time
   180   | ret


The jmp instructions and the call instruction will now go to the wrong addresses. In monoprogramming this is not a big problem, since the system has only one program in memory to execute at a time; it can always be loaded at the same place. Memory protection is also not a big problem, at least as far as user programs are concerned, since there is by definition only ONE user program in memory at a time.

1b. Multiprogramming

In multiprogramming, two or more programs will be loaded in memory to execute at the same time. Clearly, we can't load both of them starting at location 00. Therefore, multiprogramming may require some extra effort on the part of the loader to "FIX" addresses.

Memory protection is also a problem in a multiprogramming system. What is to keep the code of one program from containing an instruction that has a jmp to an address that is in another program's code in memory?

Addresses can be logical (labels) or numerical (but not yet relocated or resolved), but at some point they are translated to physical numerical memory addresses. This process of associating a final physical memory address with a logical address in the program (C, assembler, or machine object file) is called "address binding".

1c. Address Binding

There are different times at which address binding can occur, ranging from very early to very late:

Compile/link time.

Load time. (When the whole program is loaded.)

Execution time. (Each time each instruction is executed.)

Compile/Link Time.

In a monoprogramming system, address binding could occur at compile/link time. All program addresses would be relocated or resolved by the linker. The loader would be a "simple" loader. It would always load any executable program into the same starting location (00 in our example) from disk.

Load Time.

In a multiprogramming system, the linker could resolve calls at link time, but a "relocating loader" would be needed. It would relocate jmp and call target addresses by adding to each one the starting address of where the program was loaded. This starting address will, in general, be different each time the program is executed, as it depends on what memory is not being used at the time.

Execution Time.

An alternative to binding at load time is to wait even longer to translate jmp and call target addresses to physical addresses. They could be translated just as the jmp or call instruction is about to be executed. The loader would just be a simple loader again. It would not resolve any jmp or call target addresses. So they would be "wrong" and still need to be fixed before the instruction is executed. How? Good question. This is what virtual memory systems (discussed below) do.


Position Independent Code [added by dlm]

Most modern computer architectures have an instruction set that supports Position Independent Code (PIC) and Position Independent Data (PID). This takes advantage of the design of the instructions, especially their addressing. The instruction address field is composed of an address and register combination. The instruction looks like:

opcode mode register1 address1 register2 address2

Where

   opcode    is the instruction
   mode      is how the address is calculated
   register  is a general-purpose register
   address   is the address of the operand

One of the modes can be:

   immediate    - then the address is used as the value
   PC relative  - then the address is used as a signed offset from the PC

If the compiler generates PIC then it is possible to link an executable that does not need to be executed at any specific location. Each chip designer tries to build this feature into the instruction set and most compiler developers provide for this capability.

Sample instructions:

   mov address, r0   =>   mov offset(pc), r0
   mov #123, r0      =>   movi 123, r0

2. Best-Fit, Worst-Fit, First-Fit, etc.: a problem for contiguous memory systems

Given a list of memory requests, the problem is to find free memory that is big enough for the request and that is in one piece (contiguous). If there are several choices, is one choice better than another? There are at least three possible algorithms: best-fit, first-fit, and worst-fit. We will look at an example in class. None of the algorithms will outperform the others in all situations. The main problem is that memory gets "fragmented" into small contiguous pieces that cannot be allocated. Several free blocks of memory cannot be combined to satisfy a memory request, since they do not form ONE contiguous block as is required for program code.

Question: How does the process model of MS-DOS affect the memory requests and the memory releases? How does that affect the way in which you would allocate memory for processes?

3. Virtual Memory

3.1 Addresses bound at execution time

3.2 MMU sits between MAR and other CPU components

The Memory Management Unit (or MMU) is a hardware component of the CPU (if present) that is responsible for translating logical (also called virtual) addresses to physical addresses.

3.3 Contiguity requirement dropped.

If the memory need not be contiguous, then it may consist of two or more contiguous pieces. There are then two major alternatives:

a) The pieces are allowed to be of varying sizes

b) The pieces are all the same size.

In the first case, the pieces are called segments. In the second, pages. We will confine the discussion to the paged virtual memory version for the rest of this lecture.


So physical memory is divided into EQUAL sized units called page frames. A typical size is 1K bytes. Executable program code files are then split into the same size units (1K bytes), which are called "virtual pages" or "logical pages" or simply "pages". So one page fits exactly into one page frame.

When using virtual memory the operating system must keep track of the free page frames instead of a list of free blocks maintained when using contiguous memory allocation. It need not keep track of the size, since all page frames are the same fixed size.

Picture for contiguous storage:

   (logical)                        (physical)               size of
   address   program (code)         address   physical memory   free
      0      +--------+                0      +--------+
             |        |                       |        |
             |        |                       |  free  |         300
             |        |                       |        |
             |        |               300     +--------+
             |        |                       |  used  |
             |        |               400     +--------+
             |        |                       |  free  |         100
    499      +--------+               500     +--------+
                                              |  used  |
                                      600     +--------+
                                              |        |
                                              |  free  |         300
                                              |        |
                                      899     +--------+

Picture for paged storage:

   (logical)                        (physical)
   address   program (code)         address   physical memory
      0      +--------+                0      +--------+
             |        |                       |  free  |
    100      +--------+              100      +--------+
             |        |                       |  free  |
    200      +--------+              200      +--------+
             |        |                       |  free  |
    300      +--------+              300      +--------+
             |        |                       |  used  |
    400      +--------+              400      +--------+
             |        |                       |  free  |
    499      +--------+              500      +--------+
                                              |  used  |
                                     600      +--------+
                                              |  free  |
                                     700      +--------+
                                              |  free  |
                                     800      +--------+
                                              |  free  |
                                     899      +--------+


Assume the page size is 100 bytes for this example. We divide the program into pages, and memory is already divided into page frames. Suppose that the free list of frames is (0,1,2,4,6,7,8). We can use the first five to hold our program:

   page              page frame   contents
        +--------+              +--------+
     0  |        |           0  | page 0 |
        +--------+              +--------+
     1  |        |           1  | page 1 |
        +--------+              +--------+
     2  |        |           2  | page 2 |
        +--------+              +--------+
     3  |        |           3  |        |
        +--------+              +--------+
     4  |        |           4  | page 3 |
        +--------+              +--------+
                             5  |        |
                                +--------+
                             6  | page 4 |
                                +--------+
                             7  |        |
                                +--------+
                             8  |        |
                                +--------+

3.4 Address translation

A virtual address like 425 is thought of as consisting of two parts: the page number and the offset from the beginning of that page.

page number = floor(address/page_size)

offset = address mod page_size

So for virtual address 425,

page number = floor(425/100) = 4

offset = 425 mod 100 = 25

For each page of the program, the MMU keeps track of which page frame has been allocated to it. For the example above, page 4 has been allocated page frame 6. To find out what physical address has been allocated to a virtual address like 425 we just have to add the offset to the address of the beginning of the page frame. But the beginning page frame address is just the page frame number times the page size. So

physical address = page_frame * page_size + offset

E.g., page 4 is loaded in page frame 6. So an instruction at virtual address 425 is on page 4 and offset 25. Its physical address is then

physical address = 6 * 100 + 25 = 625.
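The translation rule above can be sketched in a few lines of Python (an illustration added here, not part of the original handout; PAGE_SIZE, PT, and translate are made-up names):

```python
PAGE_SIZE = 100  # page size used in the example (bytes)

# Page table from the example: PT[n] = page frame holding virtual page n
PT = {0: 0, 1: 1, 2: 2, 3: 4, 4: 6}

def translate(va, page_table=PT, page_size=PAGE_SIZE):
    """Translate a virtual address to a physical address."""
    page = va // page_size       # page number = floor(va / page_size)
    offset = va % page_size      # offset = va mod page_size
    frame = page_table[page]     # frame allocated to this page
    return frame * page_size + offset

print(translate(425))  # page 4, offset 25, frame 6 -> 625
```

Running it on virtual address 425 reproduces the 625 computed above.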


3.5 MMU and page tables (abstract and concrete)

To do the translation above, an abstract table, PT - the page table - is used. Its contents are page frame numbers, and it is indexed by the virtual page number, so that PT[n] contains the page frame number that was allocated to virtual page n. The page table for the example above would be:

   virtual page   page frame
                 +--------+
             0   |   0    |      PT[0] = 0
                 +--------+
             1   |   1    |      PT[1] = 1
                 +--------+
             2   |   2    |      PT[2] = 2
                 +--------+
             3   |   4    |      PT[3] = 4
                 +--------+
             4   |   6    |      PT[4] = 6
                 +--------+

So the address translation of virtual address va = 425 is calculated as follows

1. Extract the page number and offset from the virtual address:

n = page number = floor(va/page_size) = floor(425/100) = 4

off = offset = va mod page_size = 425 mod 100 = 25

2. Then use the page table to look up the page_frame which holds this page.

page_frame = PT[n] = PT[4] = 6

3. Calculate the physical address

physical address = page_frame * page_size + offset

= 6*100 + 25

= 625

The abstract page table above might be concretely implemented as a special internal memory in the MMU, say 8 locations starting at MR0. Suppose also that each location can hold a 4-byte integer.

                 +-----------+
   MR0           |     0     |
                 +-----------+
   MR0 + 1*4     |     1     |
                 +-----------+
   MR0 + 2*4     |     2     |
                 +-----------+
   MR0 + 3*4     |     4     |
                 +-----------+
   MR0 + 4*4     |     6     |
                 +-----------+
   MR0 + 5*4     |     ?     |
                 +-----------+
   MR0 + 6*4     |     ?     |
                 +-----------+
   MR0 + 7*4     |     ?     |
                 +-----------+

So the page_frame lookup in step 2. above for virtual page 4 would be

page_frame = contents(MR0 + 4*4) = 6


and more generally, the page_frame that holds virtual page n would be

page_frame = contents(MR0 + n*4)

3.6 Benefits

Memory allocation is simple again, even in a multiprogramming environment. We no longer need complicated memory allocation algorithms like best-fit, first-fit, etc. If 20 pages are needed, then ANY free 20 page frames will do. They need not be consecutive page frames.

Fragmentation revisited

We still have a little wasted space: "internal fragmentation". On average, each program will not use half of its last page. So, just as with cluster sizes for files, the page size should not be too large, in order to avoid wasted disk and memory space.

3.7 Costs

3.7.1 MMU makes the CPU more complex

This may make it more expensive. It also may contribute to tradeoffs regarding other components in the CPU - fewer user registers, etc.

3.7.2 Problem if page table is too large to fit in MMU registers

In this case, the MMU is organized a little differently. It will contain a special associative memory or cache to hold recently used page table entries.

For example, suppose the actual page table can hold up to 50 entries but the MMU associative translation cache can only hold 10 items. Each cache entry consists of a pair: a key and a value. The value is a page_frame number. The key is part of the virtual page number that is stored in that page_frame. Since the cache has only ten entries, there are not enough for each possible virtual page, n. So n is split into two parts, floor(n/10) and n mod 10. For a number between 00 and 49, these are just the left and right digits. The right digit is used as an index into the cache, C, and then the key is compared with the left digit. If it matches, this is the entry for n. If the key does not match, the cache does not contain an entry for n.

For example to look for the entry for virtual page 04, the right digit is 4, so we look up C[4]. Now C[4].key = 0, which matches the left digit. So the

page_frame number for virtual page 04 = C[4].value = 6.

This is a cache "hit" since the entry for 04 was in the cache.

An example of a cache miss would occur if we look up the entry for virtual page 35. Since 35 has the right digit = 5, we first compare C[5].key (= 4) with the left digit (= 3). Since they are not the same, the cache does not contain the entry for virtual address 35. Instead, it does contain the entry for virtual address 45.


   index   key   value   valid
         +-----+-------+-------+
     0   |  4  |   6   |   1   |
         +-----+-------+-------+
     1   |  4  |   7   |   1   |
         +-----+-------+-------+
     2   |  4  |  12   |   1   |
         +-----+-------+-------+
     3   |  1  |   8   |   1   |
         +-----+-------+-------+
     4   |  0  |   6   |   1   |
         +-----+-------+-------+
     5   |  4  |   9   |   1   |
         +-----+-------+-------+
     6   |  3  |  11   |   0   |
         +-----+-------+-------+
     7   |  5  |  22   |   0   |
         +-----+-------+-------+
     8   |  5  |   3   |   0   |
         +-----+-------+-------+
     9   |  5  |  13   |   0   |
         +-----+-------+-------+

Note that for this cache organization, only ONE of the virtual pages whose numbers end in 5 (05, 15, 25, 35, 45) can have an entry in the cache at a time. If we need the entry for 35, the MMU would have to read the entry from the page table in memory. Suppose virtual page 35 is in page frame 15. Then after reading this entry, the MMU would change the entry in the cache by setting

C[5].key = 3 and C[5].value = 15
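The lookup and refill just described can be sketched as follows (illustrative Python, not part of the handout; the class and function names are made up):

```python
# Direct-mapped translation cache from the example: 10 slots, indexed by
# the low digit of the virtual page number.
SLOTS = 10

class Entry:
    def __init__(self, key, value, valid=True):
        self.key, self.value, self.valid = key, value, valid

def lookup(cache, page):
    """Return the page frame for `page`, or None on a cache miss."""
    e = cache[page % SLOTS]            # right digit indexes the cache
    if e.valid and e.key == page // SLOTS:
        return e.value                 # keys match: a hit
    return None                        # a miss

def refill(cache, page, frame):
    """After a miss, overwrite the slot with the entry read from memory."""
    cache[page % SLOTS] = Entry(page // SLOTS, frame)

# The example cache contents, slot by slot
cache = [Entry(4, 6), Entry(4, 7), Entry(4, 12), Entry(1, 8), Entry(0, 6),
         Entry(4, 9), Entry(3, 11, False), Entry(5, 22, False),
         Entry(5, 3, False), Entry(5, 13, False)]

print(lookup(cache, 4))    # hit: frame 6
print(lookup(cache, 35))   # miss: slot 5 currently holds page 45
refill(cache, 35, 15)
print(lookup(cache, 35))   # now frame 15
```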

3.7.3 Effective Memory Access Time using an Associative Translation Cache

When the MMU hardware using an associative translation cache is presented a virtual address to translate, it tries to get the page table entry from memory at the same time it searches the associative translation cache. If the entry is found in the associative translation cache, it cancels the memory read. If not found, it waits until it gets the page table entry from memory; the MMU then puts that entry in the appropriate translation cache slot and calculates the correct physical memory address as usual.

Suppose

a) the translation buffer "search" takes 20 ns
b) a memory read takes 100 ns
c) for 90% of translations the entry is found in the buffer.

What is the effective time to access data given the virtual (not the physical) address? How much more time does the MMU add to each memory access? (It's more than 20 ns.)

Answer:

a) For a "hit" (the entry is in the MMU translation buffer cache) the time is

    20 ns   search the cache for the entry for this virtual address and do the translation (in 0 time)
   100 ns   fetch the item from the physical memory address just calculated
   ------
   120 ns   total

b) For a "miss" (the entry is not in the MMU translation buffer cache) the time is

   100 ns   get the correct page table entry from memory
     0 ns   translate the address
   100 ns   get the item from the physical memory address just calculated
   ------
   200 ns   total

The effective access time is the weighted average of these two, weighted according to how often each occurs:


effective access time = hit_frequency * total_access_time_for_a_hit + miss_frequency * total_access_time_for_a_miss

= .90 * 120 ns + .10 * 200 ns = 108 ns + 20 ns = 128 ns

Without the MMU and virtual addresses, a memory access would not need to translate an address, and the access time would just be 100 ns. Using virtual memory addresses and the MMU changes this to 128 ns, an increase of 28 ns for each virtual address access.

Even if the hit "ratio" is 100%, the effective access time will be 120 ns. At the other extreme, if there is no cache for holding page table entries in the MMU, the hit ratio will be 0 and the effective access time using virtual addresses will be 200 ns - twice the 100 ns required for a physical memory access. We want the hit ratio to be high so that the effective access time is close to the 120 ns minimum.
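The weighted-average calculation can be expressed as a small function (a sketch added for illustration; the names are made up):

```python
def effective_access_time(tlb_ns, mem_ns, hit_ratio):
    """Weighted average of the hit and miss cases described above."""
    hit_cost = tlb_ns + mem_ns       # search the cache, then fetch the item
    miss_cost = mem_ns + mem_ns      # fetch the page table entry, then the item
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost

print(effective_access_time(20, 100, 0.90))  # ~128 ns
```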

3.8 Partial loading

3.8.1 Requirements to support partial loading

a) MMU page table must now indicate whether the page is loaded or not. Add a bit field to each entry, the valid bit. If the page is loaded in memory, the valid bit is 1, otherwise valid is 0.

b) Must have a fetch policy - how many and which pages to load when a page is referenced that has not been loaded.

c) Must have a replacement policy - if there is no room for the new page, the one that is not being used must be replaced. Which one?

d) Efficient handling of references to pages not loaded: page faults (like interrupts, but a little different)

The modified page table now looks like this:

   virtual    valid
   page       bit     page frame
            +-----+--------+
        0   |  1  |   0    |    PT[0].valid = 1, PT[0].frame = 0
            +-----+--------+
        1   |  1  |   1    |    PT[1].valid = 1, PT[1].frame = 1
            +-----+--------+
        2   |  0  |   ?    |    PT[2].valid = 0, page not loaded
            +-----+--------+
        3   |  1  |   4    |    PT[3].valid = 1, PT[3].frame = 4
            +-----+--------+
        4   |  1  |   6    |    PT[4].valid = 1, PT[4].frame = 6
            +-----+--------+
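With a valid bit in each entry, a reference to an unloaded page can be detected during translation. A minimal sketch (illustrative Python, not from the handout; PageFault and the table layout are made-up names):

```python
class PageFault(Exception):
    """Raised when the referenced page is not loaded in memory."""

PAGE_SIZE = 100

# (valid, frame) pairs matching the modified page table above
PT = {0: (1, 0), 1: (1, 1), 2: (0, None), 3: (1, 4), 4: (1, 6)}

def translate(va):
    page, offset = divmod(va, PAGE_SIZE)
    valid, frame = PT[page]
    if not valid:
        raise PageFault(page)          # OS must fetch the page, then retry
    return frame * PAGE_SIZE + offset

print(translate(425))   # frame 6, offset 25 -> 625
# translate(250) would raise PageFault(2): page 2 is not loaded
```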

3.8.2 page faults

A page fault occurs when a page is referenced which is not currently loaded in memory.

This can happen in the fetch step of the CPU cycle when the next instruction is to be loaded from a memory address and the page containing that part of the code is not in memory.

It can also occur when an instruction's operands are being fetched just prior to execution of the instruction.

Or it can happen when an instruction is being executed in the CPU cycle - for example, a store instruction that moves the value in a CPU register to a (virtual) memory location, but that virtual memory location is not currently in memory.


In these cases, the PC will have been incremented already when the page fault is detected even though the instruction or data was not fetched and the instruction was not executed. The page fault handler in general must therefore adjust the user's PC back to the beginning of the instruction that caused the page fault so that it can be executed after the page fault handler has loaded the missing page in memory. Contrast this with usual interrupt handling, where the interrupt is handled only AFTER an instruction has executed.

The sequence of actions associated with a page fault:

The page fault is detected by the MMU hardware.

The MMU hardware loads page fault handler PC/PSW from the interrupt vector (similar to interrupts)

After the page fault handler has fetched the page (possibly replacing some page) and recorded the changes in the page table, it must adjust the interrupted user's PC. It must be set back to the beginning of the instruction since the instruction did not execute.

Finally, the page fault handler can return control to the user program, which will again attempt to execute the instruction that caused the page fault.

We would clearly like to minimize the number of page faults and avoid the extra time overhead necessary to handle page faults.

Fetch policies

a) Demand paging: Fetch a page into memory ONLY if it is referenced ("demanded"). Consequently, only one page is loaded into memory when a page fault occurs using demand paging.

b) Prefetch policy: Load a cluster of nearby pages as well as the page causing the page fault.

Replacement policies and algorithms

Policies

a) Global replacement: Choose the "victim" page to be replaced from all current processes, not just the one that caused the page fault.

b) Local replacement: Choose the "victim" page to be replaced from the process that caused the page fault.

Most of the time local replacement is used in multiprogramming environments, but some (VAX/VMS) are hybrids that have an element of global replacement mixed in with the primary policy of local replacement.

In local replacement there is one other issue, namely how many page frames are allotted to each process. The OS could either allocate frames statically to a process (when it is first created) and not change the allocation while the process executes, or allocate frames dynamically, possibly changing the number of frames (up or down) while the process is executing.

How should the OS determine the frames if it allocates frames statically?

a) Total page frames / number of processes? That is, 1/N of the total.

b) Total page frames * process_size / (total of process sizes)? That is, the fraction process_size/(total of process sizes) of the total.

c) No limit as long as pages are available.

d) ???

Dynamic allocation requires more overhead, but can be much more efficient in the use of page frames:

Give each process an initial page frame quota. If the process reaches the quota and page faults, it must replace one of its pages. However, the OS can monitor the "page fault rate" (= page faults/time interval or page faults/page references). If the page fault rate goes higher than some threshold, the OS can try to allocate more page frames to the process.


Algorithms

For purposes of discussing the replacement algorithms, we will assume demand paging and static allocation of page frames.

a) FIFO

b) LRU

c) OPT

If we compare page faults with number of frames for ONE of these algorithms, say FIFO, for the page reference string

   1 2 3 1 2 4 1 2 5 4

we get

   page frames | page faults
   -------------------------
        1      |     10
        2      |     10
        3      |      8
        4      |      5
        5      |      5
        6      |      5

This example confirms what seems intuitive - the more frames allocated means the same or fewer page faults. However, Belady observed that this is not true for all page replacement algorithms. In particular, for FIFO the following example shows that it is sometimes possible for the number of page faults to actually increase if the number of allocated frames increases.

Belady: 1 2 3 4 1 2 5 1 2 3 4 5

   page frames | page faults
   -------------------------
        1      |     12
        2      |     12
        3      |      9
        4      |     10   <=== Faults increased here.
        5      |      5

The other comparison to make is among the algorithms. The following calculations compare the three algorithms above for 3 frames only, with the page reference string


1 2 3 1 2 4 1 2 5 4

FIFO

   Ref.      1 2 3 1 2 4 1 2 5 4
   ------------------------------
           - 1 1 1 1 1 2 3 4 1 2
           - - 2 2 2 2 3 4 1 2 5
           - - - 3 3 3 4 1 2 5 4
   Faults    F F F - - F F F F F

Total Faults = 8

LRU

   Ref.      1 2 3 1 2 4 1 2 5 4
   ------------------------------
           - 1 1 1 2 3 1 2 4 1 2
           - - 2 2 3 1 2 4 1 2 5
           - - - 3 1 2 4 1 2 5 4
   Faults    F F F - - F - - F F

Total Faults = 6

OPT

   Ref.      1 2 3 1 2 4 1 2 5 4
   ------------------------------
           - 1 1 1 1 1 1 1 1 2 2
           - - 2 2 2 2 2 2 2 4 4
           - - - 3 3 3 4 4 4 5 5
   Faults    F F F - - F - - F -

Total Faults = 5
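The fault counts above can be reproduced with short simulations (illustrative Python, not part of the handout; the function names are made up). The same FIFO simulation also reproduces Belady's observation that adding a frame can increase faults:

```python
def fifo(refs, nframes):
    frames, queue, faults = set(), [], 0
    for p in refs:
        if p not in frames:
            faults += 1
            if len(frames) == nframes:
                frames.remove(queue.pop(0))   # evict the oldest page
            frames.add(p)
            queue.append(p)
    return faults

def lru(refs, nframes):
    stack, faults = [], 0                     # most recently used at the end
    for p in refs:
        if p in stack:
            stack.remove(p)
        else:
            faults += 1
            if len(stack) == nframes:
                stack.pop(0)                  # evict the least recently used
        stack.append(p)
    return faults

def opt(refs, nframes):
    frames, faults = [], 0
    for i, p in enumerate(refs):
        if p in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            future = refs[i + 1:]
            # evict the page not needed for the longest time (or never again)
            victim = max(frames, key=lambda q: future.index(q)
                         if q in future else len(future) + 1)
            frames.remove(victim)
        frames.append(p)
    return faults

refs = [1, 2, 3, 1, 2, 4, 1, 2, 5, 4]
print(fifo(refs, 3), lru(refs, 3), opt(refs, 3))   # 8 6 5

belady = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo(belady, 3), fifo(belady, 4))            # 9 10  (Belady's anomaly)
```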


Example Problems

1. (Memory and CPU usage) A computer system has enough room to hold four programs in its main memory. These programs are idle waiting for I/O half the time. What fraction of the CPU time is wasted?

Ans. Using the approximation that the fraction of CPU idle time is p^n, where

   p is the fraction of the time each process would be idle waiting for I/O
   n is the number of processes

the CPU idle fraction = p^n = (0.5)^4 = 0.0625, or 6.25%.
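The same approximation as a one-liner (a sketch added for illustration; the function name is made up):

```python
def cpu_idle_fraction(p, n):
    """Probability that all n processes are waiting on I/O at once: p**n."""
    return p ** n

print(cpu_idle_fraction(0.5, 4))  # 0.0625, i.e., 6.25% of CPU time wasted
```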

2. Consider a swapping system in which memory consists of the following hole sizes in memory order:

10K, 4K, 20K, 18K, 7K, 9K, 12K, 15K

Which hole is taken for successive segment requests of

(a) 12K

(b) 10K

(c) 9K

for first fit, best fit, worst fit, and next fit?
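These four strategies can be sketched as short functions (illustrative Python, added here; the names and structure are not from the handout). Running the sketch on the hole list above reproduces the worked answer that follows:

```python
def first_fit(holes, req):
    """Index of the first hole big enough for the request."""
    for i, h in enumerate(holes):
        if h >= req:
            return i

def best_fit(holes, req):
    """Index of the smallest hole big enough for the request."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= req]
    return min(fits)[1]

def worst_fit(holes, req):
    """Index of the largest hole big enough for the request."""
    fits = [(h, i) for i, h in enumerate(holes) if h >= req]
    return max(fits)[1]

def make_next_fit():
    """Next fit: like first fit, but resume scanning where we left off."""
    pos = [0]
    def next_fit(holes, req):
        n = len(holes)
        for k in range(n):
            i = (pos[0] + k) % n
            if holes[i] >= req:
                pos[0] = i
                return i
    return next_fit

def allocate(holes, requests, strategy):
    """Record the size of the hole each request is taken from."""
    holes = list(holes)
    taken = []
    for req in requests:
        i = strategy(holes, req)
        taken.append(holes[i])   # size of the chosen hole before this request
        holes[i] -= req
    return taken

holes = [10, 4, 20, 18, 7, 9, 12, 15]    # sizes in K, in memory order
reqs = [12, 10, 9]
print(allocate(holes, reqs, first_fit))        # [20, 10, 18]
print(allocate(holes, reqs, best_fit))         # [12, 10, 9]
print(allocate(holes, reqs, worst_fit))        # [20, 18, 15]
print(allocate(holes, reqs, make_next_fit()))  # [20, 18, 9]
```

Note that next fit needs persistent state (the position of the last allocation), which is why it is built by a factory function here; variants of next fit differ on exactly where scanning resumes.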

Ans.

First Fit

   10K, 4K, 20K, 18K, 7K, 9K, 12K, 15K

   (a) 12K comes from the 20K hole (leaving 8K)
   (b) 10K comes from the 10K hole (leaving 0K)
   (c)  9K comes from the 18K hole (leaving 9K)

Best Fit

   10K, 4K, 20K, 18K, 7K, 9K, 12K, 15K

   (a) 12K comes from the 12K hole (leaving 0K)
   (b) 10K comes from the 10K hole (leaving 0K)
   (c)  9K comes from the 9K hole (leaving 0K)

Worst Fit

   10K, 4K, 20K, 18K, 7K, 9K, 12K, 15K

   (a) 12K comes from the 20K hole (leaving 8K)
   (b) 10K comes from the 18K hole (leaving 8K)
   (c)  9K comes from the 15K hole (leaving 6K)

Next Fit

   10K, 4K, 20K, 18K, 7K, 9K, 12K, 15K

   (a) 12K comes from the 20K hole (leaving 8K)
   (b) 10K comes from the 18K hole (leaving 8K)
   (c)  9K comes from the 9K hole (leaving 0K)

3. Suppose that a computer has a 2M memory, of which the operating system takes 512K and each user program also takes 512K. If all programs have 60 percent I/O wait, by what percentage will the throughput increase if another 1M is added?

Ans.

The degree of multiprogramming is initially 3. Adding 1M increases this to 5. So CPU utilization goes from

1 - (.6)^3 to 1 - (.6)^5

that is, from 78.4% to 92.2%, an increase of 17.6%.
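The utilization figures can be checked with the same p^n approximation (a sketch added for illustration; the names are made up):

```python
def utilization(io_wait, nprocs):
    """CPU utilization with nprocs processes, each idle io_wait of the time."""
    return 1 - io_wait ** nprocs

before = utilization(0.6, 3)   # 3 user programs fit in the original 1.5M
after = utilization(0.6, 5)    # 5 fit after adding 1M
print(before, after, (after - before) / before)  # ~0.784, ~0.922, ~0.176
```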

4. Virtual addresses a) 20, b) 4100, and c) 8300 are translated to physical addresses using 4K pages and a page table, pt[]:

   pt[0]: 2
   pt[1]: 1
   pt[2]: 6
   pt[3]: 0
   pt[4]: 4
   pt[5]: 3
   pt[6]: X
   pt[7]: X
   pt[8]: X
   pt[9]: 5

...

a) page = floor(20/4096) = 0, offset = 20 mod 4096 = 20

frame = pt[0] = 2
phys. addr. = 2*4096 + 20 = 8192 + 20 = 8212


b) page = floor(4100/4096) = 1, offset = 4100 mod 4096 = 4

frame = pt[1] = 1, phys. addr. = 1*4096 + 4 = 4100

c) page = floor(8300/4096) = 2, offset = 8300 mod 4096 = 8300 - 2*4096

= 8300 - 8192

= 108

frame = pt[2] = 6

phys. addr. = 6*4096 + 108 = 24576 + 108 = 24684
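The three translations above can be sketched in Python. The `pt` dictionary mirrors the table in the problem; the entries marked X (unmapped pages) are simply omitted, so referencing them would raise a KeyError, loosely modeling a page fault:

```python
PAGE_SIZE = 4096  # 4K pages

# Page table from the problem; unmapped (X) entries omitted.
pt = {0: 2, 1: 1, 2: 6, 3: 0, 4: 4, 5: 3, 9: 5}

def translate(vaddr):
    """Split a virtual address into page number and offset,
    then combine the frame number with the offset."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    return pt[page] * PAGE_SIZE + offset
```

For example, `translate(20)` gives 8212, `translate(4100)` gives 4100, and `translate(8300)` gives 24684, matching the worked answers.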

5. If an instruction takes 1 microsec and a page fault takes an additional n microsec, give a formula for the effective instruction time if page faults occur every k instructions.

The average time per instruction is

(time for k instructions)/k = (k + n)/k = 1 + n/k microsec

since each run of k instructions costs k microsec of execution plus n microsec for the one page fault.
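As a one-line sketch of the formula (function name is ours):

```python
def effective_instruction_time(n, k):
    """Average microsec per instruction when an instruction takes
    1 microsec and a page fault adds n microsec once every k
    instructions: (k + n)/k = 1 + n/k."""
    return 1 + n / k
```

For instance, with a 100-microsec fault every 50 instructions the effective instruction time is 3 microsec.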

6. A machine has a 32-bit address space and an 8K page. The page table is entirely in hardware, with one 32-bit word per entry. When a process starts, the page table is copied to the hardware from memory, at one word every 100nsec. If each process runs for 100msec (including the time to load the page table), what fraction of the CPU time is devoted to loading the page tables?

Since 8K = 2^13, 13 bits of the 32-bit address are needed to specify the offset in a page. The remaining 19 bits, therefore, specify the page number. That means that there are potentially 2^19 pages and so 2^19 page table entries. The time to copy 2^19 words from memory is

2^19 * 100 nsec = 524,288 * 100 nsec

= 52,428,800 nsec

= 52,428.8 microsec

= 52.4288 msec

So 52.4288 msec / 100 msec = 0.524288: about 52.4% of the CPU time is devoted to loading page tables.
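The calculation, restated as a few lines of Python:

```python
entries = 2 ** (32 - 13)   # 32-bit addresses, 8K = 2^13 page size
load_ns = entries * 100    # 100 nsec to copy each page table word
load_ms = load_ns / 1e6    # nanoseconds -> milliseconds
fraction = load_ms / 100   # each process runs for 100 msec total
```

This gives 524,288 entries, a 52.4288 msec load time, and a fraction of about 0.524.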

7. A computer whose processes have 1024 pages in their address spaces keeps its page tables in memory. The overhead required for reading a word from the page table is 500 nsec. To reduce this overhead, the computer has an associative memory, which holds 32 (virtual page, physical page frame) pairs and can do a lookup in 100 nsec. What hit rate is needed to reduce the mean overhead to 200 nsec?

x = hit rate

t = time for a physical memory access for the page table entry = 500

desired average effective access time for the page table entry

= 100x + (1-x)500

So setting this last quantity to 200:

100x + (1 - x)500 = 200
500 - 400x = 200
400x = 300
x = 0.75

(With a 75% hit rate, an average virtual memory access now takes 200 + 500 = 700 nsec for the translation overhead plus the data access, instead of 500 + 500 = 1000 nsec = 1 microsec with no page table cache.)
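Solving the equation for x in general (the function name is ours): from tlb*x + mem*(1 - x) = target, x = (mem - target)/(mem - tlb).

```python
def hit_rate_for_overhead(target, tlb=100, mem=500):
    """Hit rate x satisfying tlb*x + mem*(1 - x) = target,
    where tlb is the associative-memory lookup time and mem
    the page table memory access time, all in nsec."""
    return (mem - target) / (mem - tlb)
```

`hit_rate_for_overhead(200)` returns 0.75, the answer above.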

8. A computer has four page frames. The time of loading, time of last access, and the reference (R) and modified (M) bits for each page are as shown below:

Page  Loaded  LastRef  R  M
 0     126     279     0  0


 1     230     260     1  0
 2     120     272     1  1
 3     160     280     1  1

a) Which page will NRU replace? Answer: 0

b) Which page will FIFO replace? Answer: 2

c) Which page will LRU replace? Answer: 1

d) Which page will second chance replace? Answer: 0

NRU (not recently used): replace a page with R = 0, preferring one with M = 0 as well. Second chance: replace the oldest loaded page whose R bit is 0, setting the R bit to 0 for all pages older than this page.
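The four victim selections can be sketched as one-shot functions over the table. This is a hypothetical simplification: a running kernel maintains queues and clears R bits over time rather than choosing from a snapshot.

```python
# (page, loaded, last_ref, R, M) -- the table from the problem
frames = [(0, 126, 279, 0, 0),
          (1, 230, 260, 1, 0),
          (2, 120, 272, 1, 1),
          (3, 160, 280, 1, 1)]

def nru(frames):
    # Lowest class 2*R + M wins: (R=0, M=0) is replaced first.
    return min(frames, key=lambda f: 2 * f[3] + f[4])[0]

def fifo(frames):
    return min(frames, key=lambda f: f[1])[0]   # oldest load time

def lru(frames):
    return min(frames, key=lambda f: f[2])[0]   # oldest last reference

def second_chance(frames):
    # Scan in load (FIFO) order; the first page with R = 0 is replaced.
    # (In a running system, skipped pages would have R cleared.)
    order = sorted(frames, key=lambda f: f[1])
    for f in order:
        if f[3] == 0:
            return f[0]
    return order[0][0]  # all R = 1: after clearing, the oldest goes
```

These reproduce the answers: NRU and second chance pick page 0, FIFO picks page 2, LRU picks page 1.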

Working Sets and Thrashing

The working set is the set of pages that have been referenced during a time period. More precisely, the working set will depend on both the length of the time interval as well as the starting (or ending) time of the time interval. Letting t denote time and d, the length of a time interval, the working set function, WS, depends on both t and d.

WS(t,d) = the set of pages referenced from time t - d to time t.

The working set size is the number of pages in the working set. It also is a function of t and d.

The working set size is the ideal allocation of page frames for a process. If it has more page frames, the extra frames will go unused. If it has fewer page frames, the process is guaranteed to page fault within the interval d. In fact, it will continue to fault, since each replacement still leaves too few frames to hold its working set. This condition of excessive page faulting leads to 'thrashing'. A process is thrashing if it is spending more time paging than executing.
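The definition of WS(t, d) can be written directly as a set comprehension over a reference trace. The trace-of-pairs representation is a hypothetical convenience for illustration; real kernels cannot record every reference and approximate the working set with reference bits sampled at clock interrupts.

```python
def working_set(trace, t, d):
    """WS(t, d): the set of pages referenced in the interval (t - d, t].

    trace is a list of (time, page) reference pairs."""
    return {page for time, page in trace if t - d < time <= t}

# A tiny hypothetical reference trace.
trace = [(1, 'A'), (2, 'B'), (3, 'A'), (4, 'C'), (5, 'B'), (6, 'D')]
```

For example, `working_set(trace, 6, 3)` is {'B', 'C', 'D'}, so the working set size there is 3: three page frames would be the ideal allocation for that interval.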

Some of the example problems were intended to reinforce the idea that CPU utilization increases as the number of processes increases. However, memory puts a constraint on the number of processes that can exist at a time. The number of processes is referred to as the 'degree of multiprogramming'. Virtual memory allows more processes to exist since more processes can be partially loaded in memory than can be completely loaded. However, even with virtual memory there is a limit. Physical memory can be overcommitted.

When there are so many processes that physical memory is not sufficient for each one to be allocated its working set size, thrashing will occur. As a consequence, instead of increased CPU utilization, further increasing the number of processes will in fact decrease the CPU utilization since processes will now increase their page i/o wait time at the expense of CPU burst time.

The problem with the working set is how to calculate and keep track of it. As an approximation, the page fault frequency (PFF) of each process can be fairly easily maintained. If the PFF starts going up dramatically for a process, the memory management routines can increase the page frame allocation for that process. If the number of free page frames gets too low, memory is in danger of being overcommitted. One drastic measure is to reduce the number of processes and reclaim their page frames; allowing the degree of multiprogramming to increase at this point will just exacerbate the problem.
