Paging, Page Tables, and Such
Andrew Whitaker
CSE451
Today’s Topics
Page Replacement Strategies
Making Paging Fast
Reducing the Overhead of Page Tables
Review: Working Sets
[Figure: requests/second of throughput vs. number of page frames allocated to the process. Throughput rises steeply until the allocation reaches the working-set size, then flattens out; allocations below the working set cause thrashing, and frames beyond it are over-allocation.]
Page Replacement
What happens when we take a page fault and we’ve run out of memory?
Goal: keep each process's working set in memory
Giving more than the working set is not necessary
Key issue: how do we identify working sets?
Belady’s Algorithm
Evict the page that won't be used for the longest time in the future
This page is probably not in the working set
If it is in the working set, we're thrashing
This is optimal! Minimizes the number of page faults
Major problem: this requires a crystal ball
There is no good way to predict future memory accesses
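To make the policy concrete, here is a minimal sketch (not from the lecture) that simulates Belady's algorithm over a reference string that is known in advance; the class name, reference string, and frame count are all illustrative.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative simulation of Belady's algorithm on a known reference string.
public class Belady {
    static int simulate(int[] refs, int numFrames) {
        Set<Integer> frames = new HashSet<>();
        int faults = 0;
        for (int i = 0; i < refs.length; i++) {
            if (frames.contains(refs[i])) continue;   // hit: nothing to do
            faults++;
            if (frames.size() < numFrames) {          // free frame available
                frames.add(refs[i]);
                continue;
            }
            // Evict the resident page whose next use is farthest away
            // (or that is never used again).
            int victim = -1, farthest = -1;
            for (int page : frames) {
                int next = Integer.MAX_VALUE;         // "never used again"
                for (int j = i + 1; j < refs.length; j++) {
                    if (refs[j] == page) { next = j; break; }
                }
                if (next > farthest) { farthest = next; victim = page; }
            }
            frames.remove(victim);
            frames.add(refs[i]);
        }
        return faults;
    }

    public static void main(String[] args) {
        int[] refs = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
        System.out.println(simulate(refs, 3));        // prints 7
    }
}
```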
How Good Are These Page Replacement Algorithms?
LIFO: the newest page is kicked out
FIFO: the oldest page is kicked out
Random: a random page is kicked out
LRU: the least recently used page is kicked out
Temporal Locality
Assumption: recently accessed pages will be accessed again soon
Use the past to predict the future
LIFO is horrendous
Random is also pretty bad
LRU is pretty good
FIFO is mediocre
VAX VMS used a form of FIFO because of hardware limitations
Implementing LRU: Approach #1
One (bad) approach:
on each memory reference:
    long timeStamp = System.currentTimeMillis();
    sortedList.insert(pageFrameNumber, timeStamp);
Problem: this is too inefficient
Time stamp + data structure manipulation on each memory operation
Too complex for hardware
Making LRU Efficient
Use hardware support: a reference bit is set when a page is accessed; it can be cleared by the OS
Trade off accuracy for speed: it suffices to find a "pretty old" page
[PTE diagram: page frame number | prot | M | R | V]
Approach #2: LRU Approximation with Reference Bits
For each page, maintain a set of reference bits; let's call it a reference byte
Periodically, shift the HW reference bit into the highest-order bit of the reference byte
Suppose the reference byte was 10101010: if the HW bit was set, the new reference byte becomes 11010101
Frame with the lowest value is the LRU page
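A minimal sketch of this aging scheme, assuming one reference byte per frame and a periodic timer that calls age(); the class and field names are illustrative.

```java
// Illustrative sketch of the aging scheme: one reference byte per page
// frame, updated from the hardware reference bit by a periodic timer.
public class Aging {
    private final int[] refByte;        // 8-bit aging value per frame
    private final boolean[] hwRefBit;   // hardware-set reference bit

    Aging(int numFrames) {
        refByte = new int[numFrames];
        hwRefBit = new boolean[numFrames];
    }

    // Called periodically: shift the HW bit into the high-order bit.
    // e.g., refByte 10101010 with the HW bit set becomes 11010101.
    void age() {
        for (int f = 0; f < refByte.length; f++) {
            refByte[f] >>>= 1;
            if (hwRefBit[f]) refByte[f] |= 0x80;
            hwRefBit[f] = false;        // the OS clears the hardware bit
        }
    }

    // The frame with the smallest value is approximately the LRU page.
    int victim() {
        int best = 0;
        for (int f = 1; f < refByte.length; f++) {
            if (refByte[f] < refByte[best]) best = f;
        }
        return best;
    }
}
```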
Analyzing Reference Bits
Pro: does not impose overhead on every memory reference
The interval rate can be configured
Con: scanning all page frames can still be inefficient
e.g., 4 GB of memory with 4KB pages => 1 million page frames
Approach #3: LRU Clock
Use only a single bit per page frame
Basically, this is a degenerate form of reference bits
On page eviction, scan through the list of reference bits:
If the value is zero, replace this page
If the value is one, set the value to zero and advance to the next page
Why “Clock”?
[Figure: page frames arranged in a circle, each labeled with its reference bit (0 or 1); a clock hand sweeps around the circle clearing bits as it passes.]
Typically implemented with a circular queue
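A minimal sketch of the clock algorithm as just described, using a circular array of reference bits; the class and method names are illustrative.

```java
// Illustrative sketch of the clock algorithm: a circular sweep over
// one reference bit per page frame.
public class Clock {
    private final boolean[] refBit;   // one reference bit per frame
    private int hand = 0;             // the clock hand

    Clock(int numFrames) { refBit = new boolean[numFrames]; }

    // Set by hardware (here, by the caller) when a frame is touched.
    void touch(int frame) { refBit[frame] = true; }

    // Sweep until a frame with a clear bit is found, clearing set bits
    // (giving those pages a "second chance") along the way.
    int evict() {
        while (true) {
            if (!refBit[hand]) {                      // not recently used
                int victim = hand;
                hand = (hand + 1) % refBit.length;
                return victim;
            }
            refBit[hand] = false;                     // clear and move on
            hand = (hand + 1) % refBit.length;
        }
    }
}
```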
Analyzing Clock
Pro: very low overhead
Only runs when a page needs to be evicted
Takes the first page that hasn't been referenced
Con: isn't very accurate (one measly bit!)
Degenerates into FIFO if all reference bits are set
Pro: but, the algorithm is self-regulating
If there is a lot of memory pressure, the clock runs more often (and is more up-to-date)
When Does LRU Do Badly?
LRU performs poorly when there is little temporal locality:
Example reference string: 1 2 3 4 5 6 7 8 (a pure sequential scan)
Example: Many database workloads:
SELECT * FROM Employees WHERE Salary < 25000
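A small illustrative demo (not from the lecture) of why scans hurt LRU: repeatedly sweeping over eight pages with only four frames makes every single access a fault.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative demo: repeatedly scanning 8 pages with only 4 frames
// makes every access a fault under LRU.
public class LruScan {
    public static void main(String[] args) {
        final int numFrames = 4;
        // An access-ordered LinkedHashMap doubles as an LRU cache.
        LinkedHashMap<Integer, Boolean> lru =
            new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                    return size() > numFrames;
                }
            };
        int faults = 0, accesses = 0;
        for (int pass = 0; pass < 3; pass++) {
            for (int page = 1; page <= 8; page++) {   // sequential scan
                accesses++;
                if (lru.get(page) == null) {          // miss
                    faults++;
                    lru.put(page, Boolean.TRUE);
                }
            }
        }
        System.out.println(faults + "/" + accesses);  // prints 24/24
    }
}
```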
Today’s Topics
Page Replacement Strategies
Making Paging Fast
Reducing the Overhead of Page Tables
Review: Mechanics of address translation
[Figure: mechanics of address translation. The virtual address splits into a virtual page # and an offset; the page table maps the virtual page # to a page frame #, which is combined with the offset to form the physical address into physical memory (page frames 0 through Y).]
Problem: page tables live in memory
Making Paging Fast
We must avoid a page table lookup for every memory reference
This would double memory access time
Solution: the Translation Lookaside Buffer
Fancy name for a cache
The TLB stores a subset of PTEs (page table entries)
TLBs are small and fast (16-48 entries)
Can be accessed "for free"
TLB Details
In practice, most (> 99%) of memory translations are handled by the TLB
Each processor has its own TLB
The TLB is fully associative
Any TLB slot can hold any PTE
The full VPN is the cache "key"
All entries are searched in parallel
Who fills the TLB? Two options:
Hardware (x86) walks the page table on a TLB miss
Software (MIPS, Alpha): a routine fills the TLB on a miss
The TLB itself needs a replacement policy
Usually implemented in hardware (LRU)
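As a software analogy (illustrative, not the hardware design), a fully associative TLB behaves like a small map keyed by the full VPN with LRU replacement; real TLBs search all entries in parallel in hardware.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative software analogy of a small fully associative TLB:
// a map keyed by the full VPN, with LRU replacement.
public class Tlb {
    private static final int CAPACITY = 32;           // e.g., 16-48 entries

    private final LinkedHashMap<Long, Long> entries =
        new LinkedHashMap<Long, Long>(CAPACITY, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
                return size() > CAPACITY;             // LRU replacement
            }
        };

    // Returns the page frame number, or null on a TLB miss (hardware
    // or an OS routine would then walk the page table).
    Long lookup(long vpn) { return entries.get(vpn); }

    // Fill the TLB once a miss has been resolved.
    void insert(long vpn, long pfn) { entries.put(vpn, pfn); }

    // Without ASIDs, a context switch must flush everything.
    void flush() { entries.clear(); }
}
```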
What Happens on a Context Switch?
Each process has its own address space
So, each process has its own page table
So, page-table entries are only relevant for a particular process
Thus, the TLB must be flushed on a context switch
This is why context switches are so expensive
Ben’s Idea
We can avoid flushing the TLB if entries are associated with an address space
When would this work well? When would this not work well?
[TLB entry diagram: page frame number | prot | M | R | V, extended with an ASID field (e.g., ASID = 4)]
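A minimal sketch of Ben's idea: the TLB key becomes (ASID, VPN), so a context switch just changes the current ASID and nothing is flushed. The names are illustrative, and capacity limits and replacement are omitted for brevity.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: tagging TLB entries with an address space ID
// so entries from different processes can coexist.
public class TaggedTlb {
    record Key(int asid, long vpn) {}     // ASID distinguishes processes

    private final Map<Key, Long> entries = new HashMap<>();
    private int currentAsid;

    void contextSwitch(int asid) { currentAsid = asid; }   // no flush

    Long lookup(long vpn) { return entries.get(new Key(currentAsid, vpn)); }
    void insert(long vpn, long pfn) { entries.put(new Key(currentAsid, vpn), pfn); }
}
```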
TLB Management Pain
The TLB is a cache of page table entries
The OS must ensure that page tables and TLB entries stay in sync
Massive pain: TLB consistency across multiple processors
Q: How do we implement LRU if reference bits are stored in the TLB?
One answer: we don't
Windows uses FIFO for multiprocessor machines
Today’s Topics
Page Replacement Strategies
Making Paging Fast
Reducing the Overhead of Page Tables
Page Table Overhead
For large address spaces, page table sizes can become enormous
Example: the Alpha architecture
64-bit address space, 8KB pages
Num PTEs = 2^64 / 2^13 = 2^51
Assuming 8 bytes per PTE: Num Bytes = 2^54 = 16 petabytes
And, this is per-process!
Optimizing for Sparse Address Spaces
Observation: very little of the address space is in use at a given time
This is why virtual memory works
Basic idea: only allocate page tables where we need to
And, fill in new page tables on demand
[Figure: a sparsely used virtual address space, with small allocated regions separated by large unused gaps]
Implementing Sparse Address Spaces
We need a data structure to keep track of the page tables we have allocated
And, this structure must be small
Otherwise, we've defeated our original goal
Solution: multi-level page tables
Page tables of page tables
"Any problem in CS can be solved with a layer of indirection"
Two level page tables
[Figure: two-level translation. The virtual address splits into a master page #, a secondary page #, and an offset. The master page table maps the master page # to a secondary page table (some entries are empty); the secondary page table supplies the page frame number, which is combined with the offset to form the physical address.]
Key point: not all secondary page tables must be allocated
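A minimal sketch of two-level translation with secondary tables allocated on demand; it assumes a 32-bit virtual address split 10/10/12 into master index, secondary index, and page offset, which is an illustrative choice, not a specific architecture.

```java
import java.util.Arrays;

// Illustrative sketch of two-level translation with secondary page
// tables allocated only when a mapping in their region is created.
public class TwoLevelPageTable {
    private static final int LEVEL_BITS = 10, OFFSET_BITS = 12;
    private static final int MASK = (1 << LEVEL_BITS) - 1;

    // Master table: null slots mean "no secondary table allocated yet".
    private final long[][] master = new long[1 << LEVEL_BITS][];

    void map(long va, long pfn) {
        int masterIdx = (int) (va >>> (LEVEL_BITS + OFFSET_BITS));
        int secondaryIdx = (int) ((va >>> OFFSET_BITS) & MASK);
        if (master[masterIdx] == null) {             // allocate on demand
            master[masterIdx] = new long[1 << LEVEL_BITS];
            Arrays.fill(master[masterIdx], -1);      // -1 = unmapped
        }
        master[masterIdx][secondaryIdx] = pfn;
    }

    // Returns the physical address, or -1 for an unmapped page.
    long translate(long va) {
        long[] secondary = master[(int) (va >>> (LEVEL_BITS + OFFSET_BITS))];
        if (secondary == null) return -1;            // whole region unmapped
        long pfn = secondary[(int) ((va >>> OFFSET_BITS) & MASK)];
        if (pfn < 0) return -1;
        return (pfn << OFFSET_BITS) | (va & ((1 << OFFSET_BITS) - 1));
    }
}
```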
Generalizing
Early architectures used 1-level page tables
VAX and x86 used 2-level page tables
SPARC and Alpha use 3-level page tables
The 68030 uses 4-level page tables
Key thing is that the outer level must be wired down (pinned in physical memory) in order to break the recursion
Cool Paging Tricks
Basic Idea: exploit the layer of indirection between virtual and physical memory
Trick #1: Shared Memory
Allow different processes to share physical memory
[Figure: two virtual address spaces mapping some of their pages to the same frames in physical memory]
Trick #2: Copy-on-write
Recall that fork() copies the parent's address space to the child
This is inefficient, especially if the child calls exec()
Copy-on-write allows for a fast “copy” by using shared pages
If the child tries to write to a page, the OS intervenes and makes a copy of the target page
Implementation: pages are shared as "read-only"
The OS intercepts write faults
[PTE diagram: page frame number | prot | M | R | V]
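A minimal user-level simulation (illustrative, not kernel code) of the copy-on-write bookkeeping described above: frames carry a reference count, shared mappings are read-only, and the write-fault handler copies only while the frame is still shared.

```java
// Illustrative simulation of copy-on-write bookkeeping.
public class CopyOnWrite {
    static class Frame {
        byte[] data;
        int refCount = 1;        // the creating process is the sole owner
    }

    static class Mapping {
        Frame frame;
        boolean writable;
    }

    // fork(): share the frame read-only instead of copying it.
    static Mapping share(Mapping parent) {
        parent.frame.refCount++;
        parent.writable = false;              // downgrade the parent too
        Mapping child = new Mapping();
        child.frame = parent.frame;
        child.writable = false;
        return child;
    }

    // Invoked when a write hits a read-only COW page.
    static void handleWriteFault(Mapping m) {
        if (m.frame.refCount > 1) {           // still shared: copy now
            Frame copy = new Frame();
            copy.data = m.frame.data.clone();
            m.frame.refCount--;
            m.frame = copy;
        }
        m.writable = true;                    // sole owner may write freely
    }
}
```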
Trick #3: Memory-mapped Files
Normally, files are accessed with system calls: open, read, write, close
Memory mapping allows a program to access a file with load/store operations
[Figure: the file Foo.txt mapped into a region of the virtual address space]
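Java exposes memory-mapped files through the real java.nio API: FileChannel.map returns a MappedByteBuffer whose get/put calls are plain loads and stores against the mapped file. A minimal sketch, reusing the slide's Foo.txt as the example file:

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Map Foo.txt into memory and read its first byte with a load
// instead of a read() system call.
public class MmapExample {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = FileChannel.open(Path.of("Foo.txt"),
                                               StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte first = buf.get(0);   // a memory load, not a read() call
            System.out.println((char) first);
        }
    }
}
```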