Structure of Computer Systems
Course 9: Memory hierarchy
TRANSCRIPT
Memory hierarchies

Why memory hierarchies?
What we want:
• big capacity, high speed, at an affordable price
• no memory technology available today can satisfy all 3 requirements at the same time

What we have:
• high speed, low capacity – SRAM, ROM
• medium speed, big capacity – DRAM
• low speed, almost infinite capacity – HDD, DVD

How to achieve all 3 requirements?
• by combining technologies in a hierarchical way
Performance features of memories

              SRAM              DRAM                  HDD, DVD
Capacity      small (1-64 KB)   medium (256 MB-2 GB)  big (20-160 GB)
Access time   small (1-10 ns)   medium (15-70 ns)     big (1-10 ms)
Cost          big               medium                small
Memory hierarchies

[Diagram: Processor -> cache (SRAM) -> internal (operative) memory (DRAM) -> virtual memory (HD, CD, DVD)]
Principles in favor of memory hierarchies

Temporal locality – if a location is accessed at a given time, it has a high probability of being accessed in the near future
• examples: execution of loops (for, while, etc.), repeated processing of some variables

Spatial locality – if a location is accessed, then its neighbors have a high probability of being accessed in the near future
• examples: loops, processing of vectors and records

90/10 – 90% of the time the processor executes 10% of the program

The idea: bring the memory zones with a higher probability of future access closer to the processor
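The locality principles can be illustrated with a minimal sketch (the line size and the idealized "infinite cache" are assumptions chosen for illustration): with 4 words per line, sequential accesses from a loop miss only on the first word of each line.

```python
# A minimal sketch (hypothetical line size, idealized infinite cache)
# of how spatial locality turns sequential accesses into cache hits.
LINE_WORDS = 4  # assumed cache-line length, in words

def count_hits(trace):
    """Count hits assuming a line stays resident once fetched."""
    loaded, hits = set(), 0
    for addr in trace:
        line = addr // LINE_WORDS
        if line in loaded:
            hits += 1          # locality pays off: word is already cached
        else:
            loaded.add(line)   # compulsory miss: fetch the whole line
    return hits

sequential = list(range(16))   # a[0], a[1], ..., a[15], as in a 'for' loop
print(count_hits(sequential))  # 12 (hits), i.e. only 4 compulsory misses
```

Re-accessing the same address (temporal locality) always hits after the first fetch, which is what the 90/10 principle exploits.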
Cache memory

• high speed, low capacity memory
• the closest memory to the processor
• organization: cache lines
• keeps copies of zones (lines) from the main (internal) memory
• the cache memory is not visible to the programmer
• the transfer between the cache and the internal memory is made automatically, under the control of the Memory Management Unit (MMU)
Typical cache memory parameters

Parameter                   Value
Memory dimension            32 KB - 64 MB
Dimension of a cache line   16-256 bytes
Access time                 0.1-1 ns
Speed (bandwidth)           800-5000 MB/s
Circuit types               processor-internal RAM or external static RAM
Design of cache memory

Design problems:
1. Where should we place a new line?
2. How do we find a location in the cache memory?
3. Which line should be replaced if the memory is full and new data is requested?
4. How are the "write" operations solved?
5. What is the optimal length of a cache line? Cache efficiency?

Cache memory architectures:
• cache memory with direct mapping
• associative cache memory
• set associative cache memory (N-way cache)
• cache memory organized on sectors
Cache memory with direct mapping (1-way cache)

Principle: the address of the line in the cache memory is determined directly from the location's physical address – direct mapping
• a memory line can be placed in a unique place in the cache (1-way cache)
• the tag is used to distinguish lines that map to the same position in the cache memory
[Diagram: main memory mapped onto cache lines 0 to FFFF; a comparator checks the stored tag against the address tag and signals hit/miss]

Cache memory with direct mapping – example:
• 4 GB internal memory – 32 address lines
• 4 MB cache memory – 22 address lines
• 64 Klines – 16 line-index signals
• 64 locations/line – 6 location-index signals

Physical address (32 bits) = tag (10 bits) | line index (16 bits) | location index (6 bits)
Cache memory with direct mapping

Design issues:
1. Where to place a new line?
• in the place pointed to by the line index field
2. How do we find a location in the cache memory?
• based on the tag, line index and location index (compare the tag of the current address with the one stored in the indicated cache line – hit or miss)
3. Which line should be replaced when new data is requested?
• the one indicated by the line index (even if the present line is occupied and other lines are free)
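The direct-mapped lookup can be sketched as follows; the field widths (10-bit tag, 16-bit line index, 6-bit location index) follow the 4 GB memory / 4 MB cache example, while the tag store and the sample addresses are illustrative assumptions.

```python
# A sketch of a direct-mapped lookup with the example's field widths:
# 10-bit tag | 16-bit line index | 6-bit location index.
TAG_BITS, INDEX_BITS, OFFSET_BITS = 10, 16, 6

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tags = {}  # line index -> stored tag (the cache's tag memory)

def access(addr):
    """Return True on hit, False on miss (and fill the line)."""
    tag, index, _ = split_address(addr)
    if tags.get(index) == tag:
        return True           # hit: stored tag matches the address tag
    tags[index] = tag         # miss: replace whatever line is there
    return False

print(access(0x12345678))  # False: miss on a cold cache
print(access(0x12345679))  # True: same line, different location index
```

Note that each line index holds exactly one line, which is why two addresses with the same index but different tags keep evicting each other.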
Cache memory with direct mapping

Advantages:
• simple to implement
• easy to place, find and replace a cache line

Drawbacks:
• in some cases, repeated replacement of lines even if the cache memory is not full
• inefficient use of the cache memory space
Associative cache memory (fully associative)

Principle: a line can be placed in any place of the cache memory
[Diagram: main memory and a cache memory with lines 0 to 4095; each cache line holds a descriptor, the cache line content and a counter; a comparator matches the address descriptor against the stored descriptors and signals hit/miss]

Associative cache memory – example:
• 4 GB internal memory – 32 address lines
• 1 MB cache memory – 20 address lines
• 256 locations/line – 8 location-index signals
• 4096 cache lines

Physical address (32 bits) = line descriptor (24 bits, e.g. 112233h for address 11223344h) | location (8 bits)
Associative cache memory

Design issues:
1. Where to place a new line?
• in any free cache line, or in a line less used in the near past
2. How do we find a location in the cache memory?
• compare the line (descriptor) field of the address with the descriptor part of the cache lines
• compare in parallel – the number of comparators equals the number of cache lines – too many comparators
• compare sequentially – one comparator – too much time
3. Which line should be replaced if the memory is full and new data is requested?
• random choice
• least used in the near past – uses a counter for every line
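The "least used in the near past" policy can be sketched with a recency-ordered structure instead of per-line counters (the sizes below are small hypothetical values chosen so the evictions are easy to follow):

```python
# A sketch of a fully associative cache with least-recently-used
# replacement; OrderedDict order stands in for the per-line counters.
from collections import OrderedDict

class AssociativeCache:
    def __init__(self, num_lines, line_size):
        self.num_lines, self.line_size = num_lines, line_size
        self.lines = OrderedDict()   # descriptor -> data; order = recency

    def access(self, addr):
        descriptor = addr // self.line_size   # high-order address bits
        if descriptor in self.lines:
            self.lines.move_to_end(descriptor)  # mark most recently used
            return True                          # hit
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)       # evict least recently used
        self.lines[descriptor] = None            # fetch the line
        return False

cache = AssociativeCache(num_lines=2, line_size=256)
print(cache.access(0x000))  # False: miss
print(cache.access(0x100))  # False: miss
print(cache.access(0x000))  # True: hit, line 0 is still resident
print(cache.access(0x200))  # False: miss, evicts line 0x100 (least used)
print(cache.access(0x100))  # False: miss again
```

Any line can go anywhere, which is exactly what makes the parallel descriptor comparison expensive in hardware.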
Associative cache memory

Advantages:
• efficient use of the cache memory's capacity

Drawback:
• limited number of cache lines, so limited cache capacity – because of the comparison operation (hardware limitation or time limitation)
Set associative cache memory (2, 4, 8 ... way cache)

Principle: combination of the associative and direct mapping designs:
• lines organized in blocks
• block identification through direct mapping
• line identification (inside the block) through the associative method

[Diagram: main memory mapped onto a 2-way cache memory – 2 blocks, 2 lines in each block]
Set associative cache memory

Example: 16-way cache
• 4 GB internal memory
• 4 MB cache
• 256 locations/line
• 16 lines/block
• 1024 blocks

Physical address (32 bits) = descriptor (14 bits) | block no. (10 bits) | location (8 bits)

[Diagram: blocks 0 to 1023, each holding 16 (descriptor, content) pairs; a comparator matches the descriptors of the selected block and signals hit/miss]
Set associative cache memory

Advantages:
• combines the advantages of the two techniques:
  • many lines are allowed, no capacity limitation
  • efficient use of the whole cache capacity

Drawback:
• more complex implementation
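The combined lookup can be sketched as follows; the 2-way geometry and the addresses are small hypothetical values (a real 16-way cache works the same way, just with 16 lines per block):

```python
# A sketch of a set associative lookup: direct mapping selects the
# block, the line inside the block is found associatively (LRU eviction).
WAYS, NUM_BLOCKS, LINE_SIZE = 2, 4, 64

blocks = [[] for _ in range(NUM_BLOCKS)]  # per block: tags in LRU order

def access(addr):
    """Return True on hit; on miss, fill the line (LRU within the block)."""
    line_no = addr // LINE_SIZE
    block = line_no % NUM_BLOCKS          # direct mapping selects the block
    tag = line_no // NUM_BLOCKS           # descriptor compared associatively
    ways = blocks[block]
    if tag in ways:
        ways.remove(tag)
        ways.append(tag)                  # most recently used at the end
        return True
    if len(ways) == WAYS:
        ways.pop(0)                       # evict LRU line of this block only
    ways.append(tag)
    return False

print(access(0x0000))  # False: miss
print(access(0x0100))  # False: miss, same block but a free way is used
print(access(0x0000))  # True: both lines coexist in the 2-way block
```

The last access would have been a miss in a direct-mapped cache, since both addresses map to the same block; the extra ways absorb such conflicts.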
Cache memory organized on sectors

[Diagram: physical address = sector address | block address | location; the cache holds sectors (e.g. sector 1356, sector 5789, sector 2266, ..., sector 7891), each with (descriptor, content) pairs]

Principle: similar to the set associative cache, but the order is changed: the sector (block) is identified through the associative method and the line inside the sector through direct mapping

Advantages and drawbacks: similar to the previous method
Writing operations in the cache memory

The problem: writing in the cache memory generates inconsistency between the main memory and the copy in the cache

Two techniques:
• Write back – writes the data to the internal memory only when the line is downloaded (replaced) from the cache memory
  • Advantage: write operations are made at the speed of the cache memory – high efficiency
  • Drawback: temporary inconsistency between the two memories – this may be critical in multi-master (e.g. multi-processor) systems, because it may generate errors
• Write through – writes the data to the cache and to the main memory at the same time
  • Advantage: no inconsistency
  • Drawback: write operations are made at the speed of the internal memory (much lower speed)
  • but write operations are not so frequent (about 1 write out of 10 read-write operations)
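The two policies can be contrasted with a deliberately tiny sketch (a one-line cache and a dictionary standing in for main memory are illustrative assumptions):

```python
# A sketch contrasting the two write policies: write-through updates
# main memory on every store, write-back only when a dirty line leaves.
main_memory = {}

class OneLineCache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.addr = self.value = None
        self.dirty = False

    def write(self, addr, value):
        if self.addr is not None and self.addr != addr:
            self.evict()                      # replacement of the old line
        self.addr, self.value = addr, value
        if self.write_back:
            self.dirty = True                 # defer the memory update
        else:
            main_memory[addr] = value         # write-through: update now

    def evict(self):
        if self.write_back and self.dirty:
            main_memory[self.addr] = self.value   # one late update
        self.dirty = False

wt = OneLineCache(write_back=False)
wt.write(0x10, 1)
wt.write(0x10, 2)
print(main_memory[0x10])   # 2: every write reached main memory

wb = OneLineCache(write_back=True)
wb.write(0x20, 7)
print(0x20 in main_memory) # False: main memory is temporarily inconsistent
wb.write(0x30, 8)          # replacement forces the write back of 0x20
print(main_memory[0x20])   # 7
```

The `False` line is precisely the temporary inconsistency that makes write-back risky in multi-master systems.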
Efficiency of the cache memory

The hit/miss rate influences the access time:
• reduce the average memory access time t_a

  t_a = t_c + (1 - R_s) * t_i

where:
  t_a – average access time
  t_i – access time of the internal memory
  t_c – access time of the cache memory
  R_s – success (hit) rate
  (1 - R_s) – miss rate
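A quick worked example of the formula, with illustrative timings (the 1 ns cache, 50 ns internal memory and 95% hit rate are assumptions, not values from the slides):

```python
# A worked example of t_a = t_c + (1 - R_s) * t_i with assumed timings.
def average_access_time(t_c, t_i, hit_rate):
    return t_c + (1 - hit_rate) * t_i

t_a = average_access_time(t_c=1.0, t_i=50.0, hit_rate=0.95)
print(t_a)   # 3.5 (ns): close to cache speed despite the slow memory
```

Even a modest miss rate dominates the average, which is why increasing R_s is the main lever of cache design.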
Cache memory

What is the optimal length of a cache line?
• depends on the internal organization of the cache, on the bus and on the configuration of processors

[Plot: miss rate (0 to 0.4) versus length of a line (4 to 256 bytes), one curve per cache dimension: 1 Kbytes, 8 Kbytes, 16 Kbytes, 256 Kbytes]
Virtual memory

Objectives:
• extension of the internal memory over the external memory
• protection of memory zones from unauthorized accesses

Implementation techniques:
• paging
• segmentation
Segmentation

Why? (objective)
• divide and protect memory zones from unauthorized accesses

How? (principles)
• divide the memory into blocks (segments)
  • of fixed or variable length
  • with or without overlapping
• address a location with:

  Physical_address = Segment_address + Offset_address

• attach attributes to a segment in order to:
  • control the operations allowed in the segment and
  • describe its content
Segmentation

Advantages:
• the access of a program or task is limited to the locations contained in the segments allocated to it
• memory zones may be separated according to their content or destination: code, data, stack
• a location address inside a segment requires fewer address bits – it is only a relative/offset address
  • consequence: shorter instructions, less memory required
• segments may be placed in different memory zones
  • changing the location of a program does not require changing the relative addresses (e.g. label addresses, variable addresses)

Disadvantages:
• more complex access mechanisms
• longer access time
Segmentation for Intel processors

[Diagram: address computation in Real mode – the physical address in the 1 MB space = segment address x 16 + offset address, inside a 64 KB segment]

[Diagram: address computation in Protected mode – the 16-bit selector picks a segment descriptor (segment base, limit); the linear address in the 4 GB space = segment base + 32-bit offset address]
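The Real mode computation is simple enough to sketch directly (the sample segment:offset pair is an illustrative assumption):

```python
# A sketch of the Real mode address computation:
# physical address = segment address * 16 + offset address.
def real_mode_address(segment, offset):
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return (segment << 4) + offset    # shifting by 4 multiplies by 16

print(hex(real_mode_address(0x1234, 0x0010)))   # 0x12350
```

The 4-bit shift is why Real mode addresses span only 1 MB (20 bits) even though both fields are 16 bits wide.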
Segmentation for Intel processors

Details about segmentation in Protected mode:
• Selector – contains:
  • Index – the place of a segment descriptor in a descriptor table
  • TI – table identification bit: GDT or LDT
  • RPL – requested privilege level – the privilege level required for a task in order to access the segment
• Segment descriptor – controls the access to the segment through:
  • the address of the segment
  • the length of the segment
  • access rights (privileges)
  • flags
• Descriptor tables:
  • Global Descriptor Table (GDT) – for common segments
  • Local Descriptor Tables (LDT) – one for each task; contains descriptors for the segments allocated to one task
• Descriptor types:
  • descriptors for Code or Data segments
  • system descriptors
  • gate descriptors – controlled access ways to the operating system
Segmentation

Protection mechanisms (Intel processors):
• access to the memory (only) through descriptors preserved in the GDT and LDTs
  • the GDT keeps the descriptors of segments accessible to several tasks
  • an LDT keeps the descriptors of segments allocated to just one task
  => protected segments
• read and write operations are allowed in accordance with the type of the segment (Code or Data) and with some flags contained in the descriptor
  • for Code segments: instruction fetch and maybe data read
  • for Data segments: read and maybe write operations
• privilege levels:
  • 4 levels, 0 the most privileged, 3 the least privileged
  • levels 0, 1 and 2 are allocated to the operating system, the last one to the user programs
  • a less privileged task cannot access a more privileged segment (e.g. a segment belonging to the operating system)
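The privilege rule can be sketched as a single comparison; this is a simplification (the real processor combines the task's current level with the selector's RPL before comparing against the descriptor's level):

```python
# A simplified sketch of the privilege check: lower number = more
# privileged; a task may access a segment only if its level is
# numerically less than or equal to the segment's level.
def can_access(task_level, segment_level):
    return task_level <= segment_level   # 0 = most privileged, 3 = least

print(can_access(3, 3))  # True: user task, user segment
print(can_access(3, 0))  # False: user task cannot touch an OS segment
print(can_access(0, 3))  # True: the OS can access user segments
```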
Paging

Why? (objective)
• extend the internal memory over the external one (e.g. hard disk)

How? (principles)
• the internal and external memory are divided into blocks (pages) of fixed length
• bring into the internal memory only those pages that have a high probability of being used in the near future
  • justified by the temporal and spatial locality and the 90/10 principles

Implementation:
• similar to the cache memory – associative approach
Paging

Design issues:
• placement of a new page in the internal memory
• finding the page in the memory
• replacement policy – in case the internal memory is full
• implementation of "write" operations
• optimal dimension of a page
  • 4 KB for the x86 ISA
Paging – implementation through the associative technique

[Diagram: the virtual address (e.g. 12345678h) is split into a virtual page number (12345h) and an offset (678h); each entry of the page allocation table holds a presence bit and either the page address in the internal memory (e.g. 3ABh, giving the physical address 3AB678h) or the page address in the external memory]
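The translation in the diagram can be sketched as a table lookup (the 4 KB page size matches the slide on page dimensions; the single table entry is chosen to reproduce the slide's example):

```python
# A sketch of virtual-to-physical translation with 4 KB pages:
# the low 12 bits are the offset, the rest select a page-table entry.
PAGE_BITS = 12

# virtual page number -> (presence bit, page frame in internal memory)
page_table = {0x12345: (1, 0x3AB)}

def translate(virtual_addr):
    page = virtual_addr >> PAGE_BITS
    offset = virtual_addr & ((1 << PAGE_BITS) - 1)
    present, frame = page_table.get(page, (0, None))
    if not present:
        # the OS must fetch the page from the external memory
        raise LookupError("page fault")
    return (frame << PAGE_BITS) | offset

print(hex(translate(0x12345678)))   # 0x3ab678, as in the diagram
```

A cleared presence bit triggers a page fault, which is where the operating system steps in to load the page from the external memory.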
Paging – implementation

Implementation example:
• virtual memory – 1 TByte
• main memory – 4 GBytes
• one page – 4 KBytes
• number of pages = virtual memory / page size = 1 TB / 4 KB = 256 Mpages
• dimension of the page directory table = 256 Mpages * 4 bytes/page_entry = 1 GByte !!!! => 1/4 of the main memory allocated for the page directory table
• solution: two levels of page directory tables – Intel's approach
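The arithmetic above can be checked directly:

```python
# A worked check of the page-table size computation above.
TB, GB, KB = 2**40, 2**30, 2**10

virtual_memory = 1 * TB
page_size = 4 * KB
entry_size = 4                       # bytes per page-table entry

num_pages = virtual_memory // page_size
table_size = num_pages * entry_size

print(num_pages // 2**20)            # 256 (Mpages)
print(table_size // GB)              # 1 (GByte): 1/4 of a 4 GB main memory
```

With two levels, only the first-level directory must stay resident; second-level tables are allocated on demand, so the 1 GB cost is paid only for regions actually in use.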
Paging implemented in Intel processors

[Diagram: the linear address is split into three fields; CR3 points to the page directory (entries 0 to 1023), a directory entry points to a page table, and the table entry plus the offset select the location in the 4 GB physical memory]
Paging – write operation

Problem:
• inconsistency between the internal memory and the virtual one
• it is critical in case of multi-master (multi-processor) systems

Solution: write back
• solve the inconsistency when the page is downloaded into the virtual (external) memory
• the write-through technique is not feasible because of the very long access time of the virtual (external) memory
Virtual memory

Implementations:
• segmentation
• paging
• segmentation and paging

The operating system may decide which implementation solution to use:
• no virtual memory
• only one technique (segmentation or paging)
• both techniques

[Diagram: offset address -> Segmentation -> linear address -> Paging -> physical address]
Memory hierarchy

Cache memory:
• implemented in hardware
• the MMU (memory management unit) is responsible for the transfers between the cache and the main memory
• transparent for the programmer (no tools or instructions to influence its work)

Virtual memory:
• implemented in software, with some hardware support
• the operating system is responsible for allocating memory space and handling the transfers between the external memory and the main memory
• partially transparent for the programmer:
  • in protected mode – full access
  • in real or virtual mode – transparent for the programmer