Structure of Computer Systems
Course 9: Memory hierarchy
TRANSCRIPT
Memory hierarchies

Why memory hierarchies?
What we want:
• big capacity, high speed, at an affordable price
• no memory technology available today can satisfy all 3 requirements at the same time

What we have:
• high speed, low capacity – SRAM, ROM
• medium speed, big capacity – DRAM
• low speed, almost infinite capacity – HDD, DVD

How to achieve all 3 requirements?
• by combining technologies in a hierarchical way
Performance features of memories

              SRAM              DRAM                  HDD, DVD
Capacity      small (1-64 KB)   medium (256 MB-2 GB)  big (20-160 GB)
Access time   small (1-10 ns)   medium (15-70 ns)     big (1-10 ms)
Cost          big               medium                small
Memory hierarchies

[Diagram: Processor -> cache (SRAM) -> internal (operative) memory (DRAM) -> virtual memory (HD, CD, DVD)]
Principles in favor of memory hierarchies

Temporal locality – if a location is accessed at a given time, it has a high probability of being accessed in the near future
• examples: execution of loops (for, while, etc.), repeated processing of some variables

Spatial locality – if a location is accessed, then its neighbors have a high probability of being accessed in the near future
• examples: loops, processing of vectors and records

90/10 – 90% of the time the processor executes 10% of the program

The idea: bring the memory zones with a higher probability of future access closer to the processor
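The locality principles can be illustrated with a minimal sketch (the line size and the idealized "infinite cache" are assumptions chosen for illustration): with 4 words per line, sequential accesses from a loop miss only on the first word of each line.

```python
# A minimal sketch (hypothetical line size, idealized infinite cache)
# of how spatial locality turns sequential accesses into cache hits.
LINE_WORDS = 4  # assumed cache-line length, in words

def count_hits(trace):
    """Count hits assuming a line stays resident once fetched."""
    loaded, hits = set(), 0
    for addr in trace:
        line = addr // LINE_WORDS
        if line in loaded:
            hits += 1          # locality pays off: word is already cached
        else:
            loaded.add(line)   # compulsory miss: fetch the whole line
    return hits

sequential = list(range(16))   # a[0], a[1], ..., a[15], as in a 'for' loop
print(count_hits(sequential))  # 12 (hits), i.e. only 4 compulsory misses
```

Re-accessing the same address (temporal locality) always hits after the first fetch, which is what the 90/10 principle exploits.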
Cache memory

• high speed, low capacity memory
• the closest memory to the processor
• organization: cache lines
• keeps copies of zones (lines) from the main (internal) memory
• the cache memory is not visible to the programmer
• the transfer between the cache and the internal memory is made automatically, under the control of the Memory Management Unit (MMU)
Typical cache memory parameters

Parameter                   Value
Memory dimension            32 KB - 64 MB
Dimension of a cache line   16-256 bytes
Access time                 0.1-1 ns
Speed (bandwidth)           800-5000 MB/s
Circuit types               processor-internal RAM or external static RAM
Design of cache memory

Design problems:
1. Where should we place a new line?
2. How do we find a location in the cache memory?
3. Which line should be replaced if the memory is full and new data is requested?
4. How are the "write" operations solved?
5. What is the optimal length of a cache line? Cache efficiency?

Cache memory architectures:
• cache memory with direct mapping
• associative cache memory
• set associative cache memory (N-way cache)
• cache memory organized on sectors
Cache memory with direct mapping (1-way cache)

Principle: the address of the line in the cache memory is determined directly from the location's physical address – direct mapping
• a memory line can be placed in a unique place in the cache (1-way cache)
• the tag is used to distinguish lines that map to the same position in the cache memory
[Diagram: main memory mapped onto cache lines 0 to FFFF; a comparator checks the stored tag against the address tag and signals hit/miss]

Cache memory with direct mapping – example:
• 4 GB internal memory – 32 address lines
• 4 MB cache memory – 22 address lines
• 64 Klines – 16 line-index signals
• 64 locations/line – 6 location-index signals

Physical address (32 bits) = tag (10 bits) | line index (16 bits) | location index (6 bits)
Cache memory with direct mapping

Design issues:
1. Where to place a new line?
• in the place pointed to by the line index field
2. How do we find a location in the cache memory?
• based on the tag, line index and location index (compare the tag of the current address with the one stored in the indicated cache line – hit or miss)
3. Which line should be replaced when new data is requested?
• the one indicated by the line index (even if the present line is occupied and other lines are free)
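The direct-mapped lookup can be sketched as follows; the field widths (10-bit tag, 16-bit line index, 6-bit location index) follow the 4 GB memory / 4 MB cache example, while the tag store and the sample addresses are illustrative assumptions.

```python
# A sketch of a direct-mapped lookup with the example's field widths:
# 10-bit tag | 16-bit line index | 6-bit location index.
TAG_BITS, INDEX_BITS, OFFSET_BITS = 10, 16, 6

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tags = {}  # line index -> stored tag (the cache's tag memory)

def access(addr):
    """Return True on hit, False on miss (and fill the line)."""
    tag, index, _ = split_address(addr)
    if tags.get(index) == tag:
        return True           # hit: stored tag matches the address tag
    tags[index] = tag         # miss: replace whatever line is there
    return False

print(access(0x12345678))  # False: miss on a cold cache
print(access(0x12345679))  # True: same line, different location index
```

Note that each line index holds exactly one line, which is why two addresses with the same index but different tags keep evicting each other.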
Cache memory with direct mapping

Advantages:
• simple to implement
• easy to place, find and replace a cache line

Drawbacks:
• in some cases, repeated replacement of lines even if the cache memory is not full
• inefficient use of the cache memory space
Associative cache memory (fully associative)

Principle: a line can be placed in any place of the cache memory
[Diagram: main memory and a cache memory with lines 0 to 4095; each cache line holds a descriptor, the cache line content and a counter; a comparator matches the address descriptor against the stored descriptors and signals hit/miss]

Associative cache memory – example:
• 4 GB internal memory – 32 address lines
• 1 MB cache memory – 20 address lines
• 256 locations/line – 8 location-index signals
• 4096 cache lines

Physical address (32 bits) = line descriptor (24 bits, e.g. 112233h for address 11223344h) | location (8 bits)
Associative cache memory

Design issues:
1. Where to place a new line?
• in any free cache line, or in a line less used in the near past
2. How do we find a location in the cache memory?
• compare the line (descriptor) field of the address with the descriptor part of the cache lines
• compare in parallel – the number of comparators equals the number of cache lines – too many comparators
• compare sequentially – one comparator – too much time
3. Which line should be replaced if the memory is full and new data is requested?
• random choice
• least used in the near past – uses a counter for every line
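The "least used in the near past" policy can be sketched with a recency-ordered structure instead of per-line counters (the sizes below are small hypothetical values chosen so the evictions are easy to follow):

```python
# A sketch of a fully associative cache with least-recently-used
# replacement; OrderedDict order stands in for the per-line counters.
from collections import OrderedDict

class AssociativeCache:
    def __init__(self, num_lines, line_size):
        self.num_lines, self.line_size = num_lines, line_size
        self.lines = OrderedDict()   # descriptor -> data; order = recency

    def access(self, addr):
        descriptor = addr // self.line_size   # high-order address bits
        if descriptor in self.lines:
            self.lines.move_to_end(descriptor)  # mark most recently used
            return True                          # hit
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)       # evict least recently used
        self.lines[descriptor] = None            # fetch the line
        return False

cache = AssociativeCache(num_lines=2, line_size=256)
print(cache.access(0x000))  # False: miss
print(cache.access(0x100))  # False: miss
print(cache.access(0x000))  # True: hit, line 0 is still resident
print(cache.access(0x200))  # False: miss, evicts line 0x100 (least used)
print(cache.access(0x100))  # False: miss again
```

Any line can go anywhere, which is exactly what makes the parallel descriptor comparison expensive in hardware.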
Associative cache memory

Advantages:
• efficient use of the cache memory's capacity

Drawback:
• limited number of cache lines, so limited cache capacity – because of the comparison operation (hardware limitation or time limitation)
Set associative cache memory (2, 4, 8 ... way cache)

Principle: combination of the associative and direct mapping designs:
• lines organized in blocks
• block identification through direct mapping
• line identification (inside the block) through the associative method

[Diagram: main memory mapped onto a 2-way cache memory – 2 blocks, 2 lines in each block]
Set associative cache memory

Example: 16-way cache
• 4 GB internal memory
• 4 MB cache
• 256 locations/line
• 16 lines/block
• 1024 blocks

Physical address (32 bits) = descriptor (14 bits) | block no. (10 bits) | location (8 bits)

[Diagram: blocks 0 to 1023, each holding 16 (descriptor, content) pairs; a comparator matches the descriptors of the selected block and signals hit/miss]
Set associative cache memory

Advantages:
• combines the advantages of the two techniques:
  • many lines are allowed, no capacity limitation
  • efficient use of the whole cache capacity

Drawback:
• more complex implementation
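The combined lookup can be sketched as follows; the 2-way geometry and the addresses are small hypothetical values (a real 16-way cache works the same way, just with 16 lines per block):

```python
# A sketch of a set associative lookup: direct mapping selects the
# block, the line inside the block is found associatively (LRU eviction).
WAYS, NUM_BLOCKS, LINE_SIZE = 2, 4, 64

blocks = [[] for _ in range(NUM_BLOCKS)]  # per block: tags in LRU order

def access(addr):
    """Return True on hit; on miss, fill the line (LRU within the block)."""
    line_no = addr // LINE_SIZE
    block = line_no % NUM_BLOCKS          # direct mapping selects the block
    tag = line_no // NUM_BLOCKS           # descriptor compared associatively
    ways = blocks[block]
    if tag in ways:
        ways.remove(tag)
        ways.append(tag)                  # most recently used at the end
        return True
    if len(ways) == WAYS:
        ways.pop(0)                       # evict LRU line of this block only
    ways.append(tag)
    return False

print(access(0x0000))  # False: miss
print(access(0x0100))  # False: miss, same block but a free way is used
print(access(0x0000))  # True: both lines coexist in the 2-way block
```

The last access would have been a miss in a direct-mapped cache, since both addresses map to the same block; the extra ways absorb such conflicts.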
Cache memory organized on sectors

[Diagram: physical address = sector address | block address | location; the cache holds sectors (e.g. sector 1356, sector 5789, sector 2266, ..., sector 7891), each with (descriptor, content) pairs]

Principle: similar to the set associative cache, but the order is changed: the sector (block) is identified through the associative method and the line inside the sector through direct mapping

Advantages and drawbacks: similar to the previous method
Writing operations in the cache memory

The problem: writing in the cache memory generates inconsistency between the main memory and the copy in the cache

Two techniques:
• Write back – writes the data to the internal memory only when the line is downloaded (replaced) from the cache memory
  • Advantage: write operations are made at the speed of the cache memory – high efficiency
  • Drawback: temporary inconsistency between the two memories – this may be critical in multi-master (e.g. multi-processor) systems, because it may generate errors
• Write through – writes the data to the cache and to the main memory at the same time
  • Advantage: no inconsistency
  • Drawback: write operations are made at the speed of the internal memory (much lower speed)
  • but write operations are not so frequent (about 1 write out of 10 read-write operations)
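The two policies can be contrasted with a deliberately tiny sketch (a one-line cache and a dictionary standing in for main memory are illustrative assumptions):

```python
# A sketch contrasting the two write policies: write-through updates
# main memory on every store, write-back only when a dirty line leaves.
main_memory = {}

class OneLineCache:
    def __init__(self, write_back):
        self.write_back = write_back
        self.addr = self.value = None
        self.dirty = False

    def write(self, addr, value):
        if self.addr is not None and self.addr != addr:
            self.evict()                      # replacement of the old line
        self.addr, self.value = addr, value
        if self.write_back:
            self.dirty = True                 # defer the memory update
        else:
            main_memory[addr] = value         # write-through: update now

    def evict(self):
        if self.write_back and self.dirty:
            main_memory[self.addr] = self.value   # one late update
        self.dirty = False

wt = OneLineCache(write_back=False)
wt.write(0x10, 1)
wt.write(0x10, 2)
print(main_memory[0x10])   # 2: every write reached main memory

wb = OneLineCache(write_back=True)
wb.write(0x20, 7)
print(0x20 in main_memory) # False: main memory is temporarily inconsistent
wb.write(0x30, 8)          # replacement forces the write back of 0x20
print(main_memory[0x20])   # 7
```

The `False` line is precisely the temporary inconsistency that makes write-back risky in multi-master systems.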
Efficiency of the cache memory

The hit/miss rate influences the access time:
• reduce the average memory access time t_a

  t_a = t_c + (1 - R_s) * t_i

where:
  t_a – average access time
  t_i – access time of the internal memory
  t_c – access time of the cache memory
  R_s – success (hit) rate
  (1 - R_s) – miss rate
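A quick worked example of the formula, with illustrative timings (the 1 ns cache, 50 ns internal memory and 95% hit rate are assumptions, not values from the slides):

```python
# A worked example of t_a = t_c + (1 - R_s) * t_i with assumed timings.
def average_access_time(t_c, t_i, hit_rate):
    return t_c + (1 - hit_rate) * t_i

t_a = average_access_time(t_c=1.0, t_i=50.0, hit_rate=0.95)
print(t_a)   # 3.5 (ns): close to cache speed despite the slow memory
```

Even a modest miss rate dominates the average, which is why increasing R_s is the main lever of cache design.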
Cache memory

What is the optimal length of a cache line?
• depends on the internal organization of the cache, on the bus and on the configuration of processors

[Plot: miss rate (0 to 0.4) versus length of a line (4 to 256 bytes), one curve per cache dimension: 1 Kbytes, 8 Kbytes, 16 Kbytes, 256 Kbytes]
Virtual memory

Objectives:
• extension of the internal memory over the external memory
• protection of memory zones from unauthorized accesses

Implementation techniques:
• paging
• segmentation
Segmentation

Why? (objective)
• divide and protect memory zones from unauthorized accesses

How? (principles)
• divide the memory into blocks (segments)
  • of fixed or variable length
  • with or without overlapping
• address a location with:

  Physical_address = Segment_address + Offset_address

• attach attributes to a segment in order to:
  • control the operations allowed in the segment and
  • describe its content
Segmentation

Advantages:
• the access of a program or task is limited to the locations contained in the segments allocated to it
• memory zones may be separated according to their content or destination: code, data, stack
• a location address inside a segment requires fewer address bits – it is only a relative/offset address
  • consequence: shorter instructions, less memory required
• segments may be placed in different memory zones
  • changing the location of a program does not require changing the relative addresses (e.g. label addresses, variable addresses)

Disadvantages:
• more complex access mechanisms
• longer access time
Segmentation for Intel processors

[Diagram: address computation in Real mode – the physical address in the 1 MB space = segment address x 16 + offset address, inside a 64 KB segment]

[Diagram: address computation in Protected mode – the 16-bit selector picks a segment descriptor (segment base, limit); the linear address in the 4 GB space = segment base + 32-bit offset address]
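The Real mode computation is simple enough to sketch directly (the sample segment:offset pair is an illustrative assumption):

```python
# A sketch of the Real mode address computation:
# physical address = segment address * 16 + offset address.
def real_mode_address(segment, offset):
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return (segment << 4) + offset    # shifting by 4 multiplies by 16

print(hex(real_mode_address(0x1234, 0x0010)))   # 0x12350
```

The 4-bit shift is why Real mode addresses span only 1 MB (20 bits) even though both fields are 16 bits wide.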
Segmentation for Intel processors

Details about segmentation in Protected mode:
• Selector – contains:
  • Index – the place of a segment descriptor in a descriptor table
  • TI – table identification bit: GDT or LDT
  • RPL – requested privilege level – the privilege level required for a task in order to access the segment
• Segment descriptor – controls the access to the segment through:
  • the address of the segment
  • the length of the segment
  • access rights (privileges)
  • flags
• Descriptor tables:
  • Global Descriptor Table (GDT) – for common segments
  • Local Descriptor Tables (LDT) – one for each task; contains descriptors for the segments allocated to one task
• Descriptor types:
  • descriptors for Code or Data segments
  • system descriptors
  • gate descriptors – controlled access ways to the operating system
Segmentation

Protection mechanisms (Intel processors):
• access to the memory (only) through descriptors preserved in the GDT and LDTs
  • the GDT keeps the descriptors of segments accessible to several tasks
  • an LDT keeps the descriptors of segments allocated to just one task
  => protected segments
• read and write operations are allowed in accordance with the type of the segment (Code or Data) and with some flags contained in the descriptor
  • for Code segments: instruction fetch and maybe data read
  • for Data segments: read and maybe write operations
• privilege levels:
  • 4 levels, 0 the most privileged, 3 the least privileged
  • levels 0, 1 and 2 are allocated to the operating system, the last one to the user programs
  • a less privileged task cannot access a more privileged segment (e.g. a segment belonging to the operating system)
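The privilege rule can be sketched as a single comparison; this is a simplification (the real processor combines the task's current level with the selector's RPL before comparing against the descriptor's level):

```python
# A simplified sketch of the privilege check: lower number = more
# privileged; a task may access a segment only if its level is
# numerically less than or equal to the segment's level.
def can_access(task_level, segment_level):
    return task_level <= segment_level   # 0 = most privileged, 3 = least

print(can_access(3, 3))  # True: user task, user segment
print(can_access(3, 0))  # False: user task cannot touch an OS segment
print(can_access(0, 3))  # True: the OS can access user segments
```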
Paging

Why? (objective)
• extend the internal memory over the external one (e.g. hard disk)

How? (principles)
• the internal and external memory are divided into blocks (pages) of fixed length
• bring into the internal memory only those pages that have a high probability of being used in the near future
  • justified by the temporal and spatial locality and the 90/10 principles

Implementation:
• similar to the cache memory – associative approach
Paging

Design issues:
• placement of a new page in the internal memory
• finding the page in the memory
• replacement policy – in case the internal memory is full
• implementation of "write" operations
• optimal dimension of a page
  • 4 KB for the x86 ISA
Paging – implementation through the associative technique

[Diagram: the virtual address (e.g. 12345678h) is split into a virtual page number (12345h) and an offset (678h); each entry of the page allocation table holds a presence bit and either the page address in the internal memory (e.g. 3ABh, giving the physical address 3AB678h) or the page address in the external memory]
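The translation in the diagram can be sketched as a table lookup (the 4 KB page size matches the slide on page dimensions; the single table entry is chosen to reproduce the slide's example):

```python
# A sketch of virtual-to-physical translation with 4 KB pages:
# the low 12 bits are the offset, the rest select a page-table entry.
PAGE_BITS = 12

# virtual page number -> (presence bit, page frame in internal memory)
page_table = {0x12345: (1, 0x3AB)}

def translate(virtual_addr):
    page = virtual_addr >> PAGE_BITS
    offset = virtual_addr & ((1 << PAGE_BITS) - 1)
    present, frame = page_table.get(page, (0, None))
    if not present:
        # the OS must fetch the page from the external memory
        raise LookupError("page fault")
    return (frame << PAGE_BITS) | offset

print(hex(translate(0x12345678)))   # 0x3ab678, as in the diagram
```

A cleared presence bit triggers a page fault, which is where the operating system steps in to load the page from the external memory.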
Paging – implementation

Implementation example:
• virtual memory – 1 TByte
• main memory – 4 GBytes
• one page – 4 KBytes
• number of pages = virtual memory / page size = 1 TB / 4 KB = 256 Mpages
• dimension of the page directory table = 256 Mpages * 4 bytes/page_entry = 1 GByte !!!! => 1/4 of the main memory allocated for the page directory table
• solution: two levels of page directory tables – Intel's approach
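The arithmetic above can be checked directly:

```python
# A worked check of the page-table size computation above.
TB, GB, KB = 2**40, 2**30, 2**10

virtual_memory = 1 * TB
page_size = 4 * KB
entry_size = 4                       # bytes per page-table entry

num_pages = virtual_memory // page_size
table_size = num_pages * entry_size

print(num_pages // 2**20)            # 256 (Mpages)
print(table_size // GB)              # 1 (GByte): 1/4 of a 4 GB main memory
```

With two levels, only the first-level directory must stay resident; second-level tables are allocated on demand, so the 1 GB cost is paid only for regions actually in use.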
Paging implemented in Intel processors

[Diagram: the linear address is split into three fields; CR3 points to the page directory (entries 0 to 1023), a directory entry points to a page table, and the table entry plus the offset select the location in the 4 GB physical memory]
Paging – write operation

Problem:
• inconsistency between the internal memory and the virtual one
• it is critical in case of multi-master (multi-processor) systems

Solution: write back
• solve the inconsistency when the page is downloaded into the virtual (external) memory
• the write-through technique is not feasible because of the very long access time of the virtual (external) memory
Virtual memory

Implementations:
• segmentation
• paging
• segmentation and paging

The operating system may decide which implementation solution to use:
• no virtual memory
• only one technique (segmentation or paging)
• both techniques

[Diagram: offset address -> Segmentation -> linear address -> Paging -> physical address]
Memory hierarchy

Cache memory:
• implemented in hardware
• the MMU (memory management unit) is responsible for the transfers between the cache and the main memory
• transparent for the programmer (no tools or instructions to influence its work)

Virtual memory:
• implemented in software, with some hardware support
• the operating system is responsible for allocating memory space and handling the transfers between the external memory and the main memory
• partially transparent for the programmer:
  • in protected mode – full access
  • in real or virtual mode – transparent for the programmer