Download - Caching Chapter 7. Memory Hierarchy CPU L1 L2 Cache DRAM Speed Fastest Slowest Size Smallest Largest Cost/bit Highest Lowest Tech SRAM (logic) SRAM (logic)

Caching

Chapter 7

Memory Hierarchy

CPU -> L1 -> L2 Cache -> DRAM

            L1              L2 Cache        DRAM
Speed       Fastest                         Slowest
Size        Smallest                        Largest
Cost/bit    Highest                         Lowest
Tech        SRAM (logic)    SRAM (logic)    DRAM (capacitors)

Two design decisions

• What shall we put in the cache?

• How shall we organize cache to
  – find things quickly
  – hold the most important data
  – freezer or backpack….

What to put in cache?
Try to apply a similar problem’s solution

• Can we predict what data we will use?
  – Instead of predicting branch direction, predict next memory address request
  – Like branch prediction, use previous behavior

• Keep a prediction for every load?
  – Fetch stage for load is *TOO LATE*

• Keep a prediction per-memory address?
  – Given address, guess next likely address
  – Too many choices – table too large or fits too few

Program Characteristics
Find out more about programs

• Temporal Locality
  – If you use one item, you are likely to use it again soon

• Spatial Locality
  – If you use one item, you are likely to use its neighbors soon

Locality

• Programs tend to exhibit spatial & temporal locality. Just a fact of life.

• How can we use this knowledge of program behavior to design a cache?

What does that mean?!?

• 1. Design cache that takes advantage of spatial & temporal locality

• 2. When you program, place data together that is used together to increase spatial & temporal locality
  – Java - difficult to do
  – C - more control over data placement

• Note: Caches exploit locality. Programs have varying degrees of locality. Caches do not have locality!
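To make the point about data placement concrete, here is a small sketch that compares two traversal orders of the same 2-D array. It assumes a row-major array starting at address 0, 4-byte elements, and 8-byte blocks; the array size and all function names are illustrative choices, not taken from the slides.

```python
# Hypothetical parameters, chosen only for illustration.
WORD_BYTES = 4      # one array element = one 4-byte word
BLOCK_BYTES = 8     # one block (line) = 2 words
ROWS, COLS = 4, 4   # small 2-D array stored row by row


def address(row, col):
    """Byte address of element [row][col] in a row-major array at address 0."""
    return (row * COLS + col) * WORD_BYTES


def same_block_fraction(order):
    """Fraction of consecutive accesses that land in the same block."""
    blocks = [address(r, c) // BLOCK_BYTES for r, c in order]
    same = sum(1 for a, b in zip(blocks, blocks[1:]) if a == b)
    return same / (len(blocks) - 1)


# Walk the array row by row (neighbors in memory are visited together) ...
row_order = [(r, c) for r in range(ROWS) for c in range(COLS)]
# ... and column by column (each access jumps a whole row ahead).
col_order = [(r, c) for c in range(COLS) for r in range(ROWS)]

row_fraction = same_block_fraction(row_order)
col_fraction = same_block_fraction(col_order)
print(row_fraction, col_fraction)
```

Under these assumptions the row-by-row walk reuses the block it just touched on about half of its steps, while the column-by-column walk never does. The data is identical; only the access order changes how much spatial locality the cache can exploit.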

Cache Design

• Temporal Locality
  – When we obtain the data, store it in the cache.

• Spatial Locality
  – Transfer large block of contiguous data to get item’s neighbors.
  – Block (Line): Amount of data transferred for a single miss (data plus neighbors)
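As a rough illustration of "data plus neighbors", the sketch below computes which bytes arrive together when one address misses. The function name `block_range` is hypothetical, and the block size is assumed to be a power of two.

```python
def block_range(byte_address, block_bytes):
    """Return (first, last) byte addresses of the block holding byte_address.

    block_bytes is assumed to be a power of two, so blocks are aligned.
    """
    first = byte_address - (byte_address % block_bytes)
    return first, first + block_bytes - 1


# With 8-byte blocks (2 words), a miss on byte 20 also brings in bytes 16..19.
first, last = block_range(20, 8)
print(f"M[{first}-{last}]")
```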

Where do we put data?

• Searching whole cache takes time & power

• Direct-mapped
  – Limit each piece of data to one possible position

• Search is quick and simple

What is our “key” for lookup?

• Tools are sorted by tool-type

• Books are sorted by subject (Dewey-Decimal)

• Old LISP machine sorted by data type

• Modern machines have no information – can only sort by address

Direct-Mapped

[Figure: a four-entry Cache (Index 00, 01, 10, 11) drawn beside Memory words at addresses 000000, 000100, 010000, 010100, 100000, 100100, 110000, 110100. Each box corresponds to one word (4 bytes). One block (line) is marked.]

Draw on the board!!! Show what addresses go where

Direct-Mapped cache
Block (Line) size = 2 words or 8 bytes

Byte Address: 0b100100100

• Where do we look in the cache?
  – BlockAddress mod #slots
  – BlockAddress & (#slots-1)

• How do we know if it is there?
  – We need a tag & valid bit

• Where is it within the block?
  – Block Address = Tag + Index; the remaining low bits locate the item within the block

Cache entry after the access:

Index  Valid  Tag   Data
00     1      1001  M[292-295] M[288-291]
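The two lookup formulas above give the same slot whenever the number of slots is a power of two. A minimal sketch, assuming the slide's 8-byte blocks and a four-slot cache (the function names are made up for illustration):

```python
BLOCK_BYTES = 8   # 2 words of 4 bytes, as on the slide
NUM_SLOTS = 4     # indexes 00, 01, 10, 11 (assumed from the figure)


def slot_by_mod(byte_address):
    """BlockAddress mod #slots."""
    block_address = byte_address // BLOCK_BYTES
    return block_address % NUM_SLOTS


def slot_by_mask(byte_address):
    """BlockAddress & (#slots-1); only valid when #slots is a power of two."""
    block_address = byte_address // BLOCK_BYTES
    return block_address & (NUM_SLOTS - 1)


def tag_of(byte_address):
    """Everything above the index bits of the block address."""
    return (byte_address // BLOCK_BYTES) // NUM_SLOTS


addr = 0b100100100  # byte address 292 from the slide
print(slot_by_mod(addr), slot_by_mask(addr), bin(tag_of(addr)))
```

For the slide's address this selects slot 00 with tag 1001, matching the cache entry shown. The mask form is what hardware effectively does, since it amounts to picking out a few address wires.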

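Splitting the Address

[Figure: Direct-Mapped Cache with Valid, Tag and Data columns; entries at Index 00, 01, 10, 11, all with Valid = 0. The address 0b1010001 is divided, from the upper bits to the lower bits, into Tag | Index | Block Offset | Byte Offset.]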

Definitions

• Byte Offset: Which byte within word

• Block Offset: Which word within block

• Set: Group of blocks checked each access

• Index: Which set within cache?

• Tag: Is this the right one?

(All of the upper bits)
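The definitions above can be sketched as a few shifts and masks. The default field widths below (2-bit byte offset, 1-bit block offset, 2-bit index) are an assumption matching a four-entry cache with 2-word blocks and 4-byte words; `Fields` and `split_address` are illustrative names.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    tag: int
    index: int
    block_offset: int
    byte_offset: int


def split_address(addr, index_bits=2, block_offset_bits=1, byte_offset_bits=2):
    """Peel fields off the address, starting from the lowest bits."""
    byte_offset = addr & ((1 << byte_offset_bits) - 1)    # which byte within word
    addr >>= byte_offset_bits
    block_offset = addr & ((1 << block_offset_bits) - 1)  # which word within block
    addr >>= block_offset_bits
    index = addr & ((1 << index_bits) - 1)                # which set within cache
    tag = addr >> index_bits                              # all of the upper bits
    return Fields(tag, index, block_offset, byte_offset)


print(split_address(0b1010001))
```

With these widths, 0b1010001 splits into tag 10, index 10, block offset 0, byte offset 01.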

Definitions

• Block (Line) - unit of data transfer – bytes/words

• Hit - data found in this cache

• Miss - data not found in this cache
  – Send request to lower level

• Hit time / Access time
  – Time to access this cache – look for item, return data

• Miss Penalty
  – Time to receive block from lower level
  – Not always constant


Example 1 – Direct-Mapped
Block size = 2 words

Direct-Mapped Cache with four entries (Index 00, 01, 10, 11), all initially Valid = 0.
Address fields: Tag | Index | Block Offset | Byte Offset

Reference Stream   Tag  Index  Hit/Miss  Block brought in
0b1001000          10   01     M         M[72-79]
0b0010100          00   10     M         M[16-23]
0b0111000          01   11     M         M[56-63]
0b0010000          00   10     H
0b0010100          00   10     H
0b0100100          01   00     M         M[32-39]

Final cache contents:

Index  Valid  Tag  Data
00     1      01   M[36-39] M[32-35]
01     1      10   M[76-79] M[72-75]
10     1      00   M[20-23] M[16-19]
11     1      01   M[60-63] M[56-59]

Miss Rate: 4 / 6 = 67%
Hit Rate: 2 / 6 = 33%
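Example 1 can be replayed in a few lines. This is a minimal sketch of a direct-mapped cache that tracks only valid bits and tags (no data), assuming the example's four entries and 8-byte blocks; `simulate` is an illustrative name.

```python
BLOCK_BYTES = 8   # 2 words of 4 bytes
NUM_SLOTS = 4     # indexes 00, 01, 10, 11


def simulate(addresses):
    """Return 'H' or 'M' for each byte address, starting from an empty cache."""
    valid = [False] * NUM_SLOTS
    tags = [None] * NUM_SLOTS
    results = []
    for addr in addresses:
        block_address = addr // BLOCK_BYTES
        index = block_address % NUM_SLOTS
        tag = block_address // NUM_SLOTS
        if valid[index] and tags[index] == tag:
            results.append("H")
        else:
            results.append("M")
            valid[index] = True   # bring the block in, replacing any old one
            tags[index] = tag
    return results


stream = [0b1001000, 0b0010100, 0b0111000, 0b0010000, 0b0010100, 0b0100100]
outcome = simulate(stream)
miss_rate = outcome.count("M") / len(outcome)
print(outcome, f"miss rate = {miss_rate:.0%}")
```

The two hits come from 0b0010000 and 0b0010100, which fall in the block fetched by the second reference: one hit from spatial locality and one from temporal locality.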

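Implementation

[Figure: the address is split into Tag, Index, Block offset and Byte Offset. The Index selects one cache entry (Valid, Tag, Data) among 00, 01, 10, 11. A comparator (=) checks the stored Tag against the address Tag to produce "Hit?". A MUX controlled by the Block offset selects the word that is returned as Data.]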

Example 2

• You are implementing a 64-Kbyte cache, 32-bit address

• The block size (line size) is 16 bytes.

• Each word is 4 bytes

• How many bits is the block offset?
  – 16 / 4 = 4 words -> 2 bits

• How many bits is the index?
  – 64*1024 / 16 = 4096 -> 12 bits

• How many bits is the tag?
  – 32 - (2 + 12 + 2) = 16 bits
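The arithmetic in Example 2 generalizes to any power-of-two sizes. A sketch, assuming a direct-mapped cache as in the earlier slides; `field_widths` is a hypothetical helper, not part of the course material.

```python
def field_widths(cache_bytes, block_bytes, word_bytes, address_bits):
    """Bit widths of each address field for a direct-mapped cache.

    All sizes are assumed to be powers of two, so log2 is exact.
    """
    byte_offset = word_bytes.bit_length() - 1                     # log2(bytes per word)
    block_offset = (block_bytes // word_bytes).bit_length() - 1   # log2(words per block)
    index = (cache_bytes // block_bytes).bit_length() - 1         # log2(number of blocks)
    tag = address_bits - (index + block_offset + byte_offset)     # whatever is left
    return {
        "byte_offset": byte_offset,
        "block_offset": block_offset,
        "index": index,
        "tag": tag,
    }


widths = field_widths(cache_bytes=64 * 1024, block_bytes=16,
                      word_bytes=4, address_bits=32)
print(widths)
```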

How caches work

• Classic abstraction

• Each level of hierarchy has no knowledge of the configuration of lower level

[Figure: L1 cache’s perspective: "Me" is L1; "Memory" is L2 Cache and DRAM together. L2 cache’s perspective: "Me" is L2 Cache; "Memory" is DRAM.]

Memory operation at any level

[Figure: "Me" sends an Address to the Cache; the Cache sits above Memory; Data flows back. Arrows are numbered 1 to 5.]

1. Cache receives request
2. Look for item in cache
   Hit - return data
   Miss –
3. request memory
4. receive data
5. update cache, return data
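The numbered steps apply identically at every level, which a short recursive sketch can show. It is deliberately simplified: each level has unlimited capacity (no replacement), and the class name `Level` and its fields are invented for illustration.

```python
class Level:
    """One level of the hierarchy; it only knows 'the memory below me'."""

    def __init__(self, name, lower=None, contents=None):
        self.name = name
        self.lower = lower                     # next level down, or None
        self.contents = dict(contents or {})   # block address -> data
        self.log = []                          # record of hits and misses

    def read(self, block_address):
        # 1. receive request, 2. look for item in this level
        if block_address in self.contents:
            self.log.append("hit")
            return self.contents[block_address]   # hit - return data
        self.log.append("miss")
        if self.lower is None:
            raise KeyError(block_address)         # nothing below to ask
        data = self.lower.read(block_address)     # 3. request memory, 4. receive data
        self.contents[block_address] = data       # 5. update cache
        return data                               # 5. return data


dram = Level("DRAM", contents={9: "block 9"})
l2 = Level("L2", lower=dram)
l1 = Level("L1", lower=l2)

first = l1.read(9)    # misses in L1 and L2, found in DRAM
second = l1.read(9)   # now a hit in L1; L2 and DRAM are not consulted
print(l1.log, l2.log, dram.log)
```

Note that `l1` calls `self.lower.read` without knowing whether the level below is another cache or DRAM, which is the abstraction the slides describe.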

Timing

[Figure: the same Me / Cache / Memory diagram, annotated with Access Time and Miss Penalty.]

1. Cache receives request
2. Look for item in cache (Access Time)
   Hit - return data
   Miss - request memory, receive block, update cache, return data (Miss Penalty)

Performance

• Hit: latency = access time

• Miss: latency = access time + miss penalty

• Goal: minimize misses!!!
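Weighting the two latencies by how often each case occurs gives an average latency per access. The sketch below uses the 4 / 6 miss rate from Example 1; the access time of 1 cycle and miss penalty of 100 cycles are made-up numbers for illustration only.

```python
def hit_latency(access_time):
    return access_time


def miss_latency(access_time, miss_penalty):
    return access_time + miss_penalty


def average_latency(access_time, miss_penalty, miss_rate):
    """Weighted average of the hit and miss latencies."""
    hit_rate = 1 - miss_rate
    return (hit_rate * hit_latency(access_time)
            + miss_rate * miss_latency(access_time, miss_penalty))


# Hypothetical timings: 1-cycle access, 100-cycle miss penalty.
avg = average_latency(access_time=1, miss_penalty=100, miss_rate=4 / 6)
print(f"{avg:.1f} cycles per access")
```

The expression simplifies to access time + miss rate × miss penalty, so with a large miss penalty the miss rate dominates, which is why the goal is to minimize misses.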

