Slide 1: COMP 206: Computer Architecture and Implementation
Montek Singh
Mon., Nov. 17, 2003
Topic: Virtual Memory
Slide 2: Outline
- Introduction
- Address Translation
- VM Organization
- Examples
Reading: HP3 Section 5.10
For background: refer to PH (Comp. Org.)
Slide 3: Characteristics

                                 Cache-MM                     MM-disk
Access time ratio ("speed gap")  1:5 - 1:15                   1:10,000 - 1:1,000,000
Hit time                         1-2 cycles                   40-100 cycles
Hit ratio                        0.90-0.99                    0.99999-0.9999999
Miss (page fault) ratio          0.01-0.10                    0.00000001-0.000001
Miss penalty                     10-100 cycles                1M-6M cycles
CPU during block transfer        blocking/non-blocking        task switching
Block (page) size                16-128 bytes                 4KB-64KB
Implemented in                   hardware                     hardware + software
Mapping                          direct or set-associative    page table ("fully associative")
Replacement algorithm            not crucial                  very important (LRU)
Write policy                     many choices                 write back
Direct access to slow memory     yes                          no
Slide 4: Addressing
Always a congruence mapping. Assume:
- 4GB VM composed of 2^20 4KB pages
- 64MB DRAM main memory composed of 16384 page frames (of the same size)
Only those pages (of the 2^20) that are not empty actually exist
- Each is either in main memory or on disk
- Can be located with two mappings (implemented with tables)
Virtual address = (virtual page number, page offset)
- VA = (VPN, offset); 32 bits = (20 bits + 12 bits)
Physical address = (real page number, page offset)
- PA = (RPN, offset); 26 bits = (14 bits + 12 bits)
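The address split above can be sketched in a few lines of Python (assumed sizes: 4KB pages give 12 offset bits, so a 32-bit VA has a 20-bit VPN and a 26-bit PA has a 14-bit RPN):

```python
OFFSET_BITS = 12
PAGE_SIZE = 1 << OFFSET_BITS          # 4096 bytes

def split_va(va):
    """Split a 32-bit virtual address into (VPN, offset)."""
    return va >> OFFSET_BITS, va & (PAGE_SIZE - 1)

def make_pa(rpn, offset):
    """Combine a 14-bit real page number with the page offset."""
    return (rpn << OFFSET_BITS) | offset

vpn, off = split_va(0x00401ABC)
print(hex(vpn), hex(off))   # 0x401 0xabc
```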
Slide 5: Address Translation
RPN = f_MM(VPN)
- In reality, VPN is mapped to a page table entry (PTE), which contains the RPN ...
- ... as well as miscellaneous control information (e.g., valid bit, dirty bit, replacement information, access control)
VA → PA: (VPN, offset within page) → (RPN, offset within page)
VA → disk address
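A hedged sketch of this mapping: a PTE holds the RPN plus control bits, and a translation either yields a physical address or falls back to the page's disk address. The field layout below is illustrative, not any real machine's PTE format:

```python
from dataclasses import dataclass

@dataclass
class PTE:
    valid: bool      # page is resident in main memory
    dirty: bool      # page has been modified
    rpn: int         # real page number (meaningful only when valid)
    disk_addr: int   # where the page lives on disk

def translate(page_table, vpn, offset):
    pte = page_table[vpn]
    if pte.valid:
        return ("MM", (pte.rpn << 12) | offset)
    return ("disk", pte.disk_addr)   # page fault: fetch from disk

table = {5: PTE(valid=True, dirty=False, rpn=3, disk_addr=900)}
print(translate(table, 5, 0x10))   # ('MM', 12304)
```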
Slide 6: Single-Level, Direct Page Table in MM
Fully associative mapping:
- When a VM page is brought in from disk to MM, it may go into any of the real page frames
Simplest addressing scheme: one-level, direct page table
- (page table base address + VPN) → PTE or page fault
- Assume that PTE size is 4 bytes
- Then the whole table requires 4 × 2^20 = 4MB of main memory
Disadvantage: 4MB of main memory must be reserved for page tables, even when the VM space is almost empty
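The table-size arithmetic above checks out directly (4-byte PTE assumed, one PTE per virtual page):

```python
PTE_SIZE = 4
NUM_VM_PAGES = 1 << 20               # 2^20 pages in a 4GB space of 4KB pages
table_bytes = PTE_SIZE * NUM_VM_PAGES
print(table_bytes // (1 << 20), "MB")   # 4 MB
```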
Slide 7: Single-Level Direct Page Table in VM
To avoid tying down 4MB of physical memory:
- Put the page tables in VM
- Bring into MM only those that are actually needed
- "Paging the page tables"
Needs only 1K PTEs in main memory, rather than 4MB
Slows down access to VM pages by possibly needing disk accesses for the PTEs
Slide 8: Multi-Level Direct Page Table in MM
Another solution to the storage problem: break the 20-bit VPN into two 10-bit parts
- VPN = (VPN1, VPN2)
This turns the original one-level page table into a tree structure:
- (1st-level base address + VPN1) → 2nd-level base address
- (2nd-level base address + VPN2) → PTE or page fault
Storage situation much improved:
- Always need the root node (1K 4-byte entries = 1 VM page)
- Need only a few of the second-level nodes
  - Allocated on demand
  - Can be anywhere in main memory
Access time to PTE has doubled
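The two-level walk can be sketched as follows; the dict-based "memory" and node layout are illustrative assumptions, not a real table format:

```python
def walk(root, vpn):
    """Two-level lookup: VPN = (VPN1, VPN2), 10 bits each."""
    vpn1, vpn2 = vpn >> 10, vpn & 0x3FF
    second = root.get(vpn1)           # 1st level: may be absent (page fault)
    if second is None:
        return None
    return second.get(vpn2)           # 2nd level: PTE or page fault

root = {1: {2: "PTE for VPN 0x402"}}
print(walk(root, (1 << 10) | 2))      # PTE for VPN 0x402
print(walk(root, 0))                  # None -> page fault
```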
Slide 9: Inverted Page Tables
Virtual address spaces may be vastly larger (and more sparsely populated) than real address spaces
- Less-than-full utilization of tree nodes in a multi-level direct page table becomes more significant
Ideal (i.e., smallest possible) page table would have one entry for every VM page actually in main memory
- Need 4 × 16K = 64KB of main memory to store this ideal page table
- Storage overhead = 0.1%
Inverted page table implementations are approximations to this ideal page table
- Associative inverted page table in special hardware (ATLAS)
- Hashed inverted page table in MM (IBM, HP PA-RISC)
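A hedged sketch of a hashed inverted page table: one entry per real page frame, searched by hashing the VPN. The linear-probing scheme below is an assumption for illustration; real designs (e.g., PA-RISC) use hash anchor tables with chaining:

```python
NUM_FRAMES = 16  # toy size; the slide's system would have 16384 frames

frames = [None] * NUM_FRAMES   # frames[rpn] = VPN resident in that frame

def lookup(vpn):
    """Probe linearly from hash(vpn); return RPN, or None on a page fault."""
    start = vpn % NUM_FRAMES
    for i in range(NUM_FRAMES):
        rpn = (start + i) % NUM_FRAMES
        if frames[rpn] == vpn:
            return rpn           # found: this frame holds the page
        if frames[rpn] is None:
            return None          # empty slot: page not resident
    return None

frames[5] = 0x12345
print(lookup(0x12345))   # 5
```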
Slide 10: Translation Lookaside Buffer (TLB)
To avoid two or more MM accesses for each VM access, use a small cache to store (VPN, PTE) pairs
- PTE contains the RPN, from which the RA can be constructed
This cache is the TLB, and it exploits locality
- DEC Alpha: 32 entries, fully associative
- Amdahl V/8: 512 entries, 2-way set-associative
Processor issues VA:
- TLB hit: send RA to main memory
- TLB miss: make two or more MM accesses to the page tables to retrieve the RA, then send RA to MM
  - (Any of these may cause a page fault)
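The hit/miss flow above can be sketched with a small fully associative, LRU-replaced TLB; the size and the OrderedDict-based LRU bookkeeping are illustrative assumptions:

```python
from collections import OrderedDict

class TLB:
    def __init__(self, entries=32):
        self.entries = entries
        self.map = OrderedDict()          # VPN -> RPN (stands in for the PTE)

    def lookup(self, vpn):
        if vpn in self.map:
            self.map.move_to_end(vpn)     # mark as most recently used
            return self.map[vpn]          # hit: RA can be built from this
        return None                       # miss: walk the page tables

    def insert(self, vpn, rpn):
        if len(self.map) >= self.entries:
            self.map.popitem(last=False)  # evict the LRU entry
        self.map[vpn] = rpn

tlb = TLB()
tlb.insert(0x401, 0x3A)
print(tlb.lookup(0x401))   # 58 -> hit
print(tlb.lookup(0x402))   # None -> miss
```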
Slide 11: TLB Misses
Causes for a TLB miss:
- VM page is not in main memory
- VM page is in main memory, but its entry has not yet been entered into the TLB
- VM page is in main memory, but its TLB entry has been removed for some reason (removed as LRU, invalidated because the page table was updated, etc.)
Miss rates are remarkably low (~0.1%)
- Miss rate depends on the size of the TLB and on the VM page size (coverage)
Miss penalty varies from a single cache access to several page faults
Slide 12: Dirty Bits and TLB: Two Solutions
Solution 1: TLB is a read-only cache
- Dirty bit is contained only in the page table in MM
- TLB contains only a write-access bit, initially set to zero (denying writing of the page)
- On the first attempt to write a VM page:
  - An exception is caused
  - The handler sets the dirty bit in the page table in MM
  - ... and sets the write-access bit to 1 in the TLB
Solution 2: TLB is a read-write cache
- Dirty bit is present in both the TLB and the page table in MM
- On the first write to a VM page, only the dirty bit in the TLB is set
- Dirty bit in the page table is brought up-to-date:
  - when the TLB entry is evicted
  - when the VM page and PTE are evicted
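Solution 1 can be sketched as below: the first write traps, the handler sets the dirty bit in the MM page table and flips the TLB's write-access bit so later writes proceed without faulting. The dict-based structures are illustrative assumptions:

```python
page_table = {7: {"dirty": False, "rpn": 2}}
tlb = {7: {"write_ok": False, "rpn": 2}}    # write-access bit starts at 0

def write(vpn):
    entry = tlb[vpn]
    if not entry["write_ok"]:               # first write: take an exception
        page_table[vpn]["dirty"] = True     # handler sets dirty bit in MM
        entry["write_ok"] = True            # handler enables further writes
    return entry["rpn"]                     # write proceeds to this frame

write(7)
print(page_table[7]["dirty"], tlb[7]["write_ok"])   # True True
```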
Slide 13: Virtual Memory Access Time
Assume the existence of a TLB, a physical cache, MM, and disk
Processor issues VA:
- TLB hit: send RA to cache
- TLB miss: exception; access page tables, update TLB, retry
A memory reference may involve accesses to:
- TLB
- Page table in MM
- Cache
- Page in MM
Each of these can be a hit or a miss: 16 possible combinations
Slide 14: Virtual Memory Access Time (2)
Constraints among these accesses:
- Hit in TLB ⇒ hit in page table in MM
- Hit in cache ⇒ hit in page in MM
- Hit in page in MM ⇒ hit in page table in MM
These constraints eliminate eleven combinations:

Case            TLB   MM PTE  Cache  MM data  Comment
Cache hit       Hit   (Hit)   Hit    (Hit)    MM not checked
Cache miss      Hit   (Hit)   Miss   Hit      Cache updated
TLB miss        Miss  Hit     Hit    Hit      TLB updated, TLB access repeated
TLB+cache miss  Miss  Hit     Miss   Hit      TLB and cache updated
Page fault      Miss  Miss    Miss   Miss    Cache miss follows servicing of page fault
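A quick enumeration checks the count. The three stated constraints alone leave seven combinations; reaching the five cases in the table needs one more constraint, implicit in the table and assumed here: a page-table hit means the PTE marks the page resident (PTE hit ⇒ page in MM):

```python
from itertools import product

def legal(tlb, pte, cache, page):
    return ((not tlb or pte) and       # TLB hit   => hit in page table
            (not cache or page) and    # cache hit => hit in page in MM
            (not page or pte) and      # page hit  => hit in page table
            (not pte or page))         # assumed: resident PTE => page in MM

cases = [c for c in product([True, False], repeat=4) if legal(*c)]
print(len(cases))   # 5
```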
Slide 15: Virtual Memory Access Time (3)
Number of MM accesses depends on the page table organization:
- MIPS R2000/R4000 accomplishes table walking with CPU instructions (eight instructions per page table level)
- Several CISC machines implement this in microcode, with the MC88200 having dedicated hardware for it
- RS/6000 implements this completely in hardware
TLB miss penalty is dominated by having to go to main memory:
- Page tables may not be in cache
- Further increase in miss penalty if the page table organization is complex
TLB misses can have a very damaging effect on physical caches
Slide 16: Page Size
Choices:
- Fixed at design time (most early VM systems)
- Statically configurable
  - At any moment, only pages of the same size exist in the system
  - MC68030 allowed page sizes between 256B and 32KB this way
- Dynamically configurable
  - Pages of different sizes coexist in the system
  - Alpha 21164, UltraSPARC: 8KB, 64KB, 512KB, 4MB
  - MIPS R10000, PA-8000: 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB
  - All pages are aligned
Dynamic configuration is a sophisticated way to decrease TLB misses:
- Increasing the number of TLB entries increases processor cycle time
- Increasing the VM page size increases internal memory fragmentation
- Needs fully associative TLBs
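The "coverage" trade-off above can be made concrete: TLB reach is the number of entries times the page size (assuming one page mapped per entry), so larger pages stretch a fixed-size TLB:

```python
def coverage_kb(entries, page_size):
    """Memory covered by the TLB, in KB: entries x page size."""
    return entries * page_size // 1024

print(coverage_kb(32, 4096))       # 128    (32 entries of 4KB pages)
print(coverage_kb(32, 4 << 20))    # 131072 (same TLB with 4MB pages = 128MB)
```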
Slide 17: Segmentation and Paging
Paged segments: segments are made up of pages
A paging system has a flat, linear address space:
- 32-bit VA = (10-bit VPN1, 10-bit VPN2, 12-bit offset)
- If, for a given VPN1, we reach the max value of VPN2 and add 1, we reach the next page, at address (VPN1+1, 0)
A segmented version has a two-dimensional address space:
- 32-bit VA = (10-bit segment #, 10-bit page number, 12-bit offset)
- If, for a given segment #, we reach the max page number and add 1, we get an undefined value
Segments are not contiguous
Segments do not need to have the same size:
- Size can even vary dynamically
- Implemented by storing an upper bound for each segment and checking every reference against it
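The bounds check described above can be sketched as follows: each segment stores an upper bound (here, its length in pages), and every reference is checked against it. The data layout is an illustrative assumption:

```python
seg_limits = {0: 4, 1: 1}   # segment # -> number of valid pages (upper bound)

def check(seg, page_num):
    """Raise on an out-of-bounds (undefined) segmented address."""
    limit = seg_limits.get(seg)
    if limit is None or page_num >= limit:
        raise IndexError("segmentation fault")   # undefined address
    return True

print(check(0, 3))   # True: page 3 is within segment 0's 4-page bound
# check(1, 1) would raise IndexError: past segment 1's upper bound
```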
Slide 18: Example 1: Alpha 21264 TLB
Figure 5.36
Slide 19: Example 2: Hypothetical Virtual Memory
Figure 5.37