An Introduction to Cache Design


Page 1: An Introduction to  Cache Design

An Introduction to Cache Design

Page 2: An Introduction to  Cache Design

Cache

"A safe place for hiding and storing things."

- Webster's Dictionary

Page 3: An Introduction to  Cache Design

"Even with the inclusion of cache, almost all CPUs are still mostly strictly limited by the cache access time: in most cases, if the cache access time were decreased, the machine would speed up accordingly."

- Alan Smith -

Even more so for multiprocessors (MPs)!

Page 4: An Introduction to  Cache Design

While one can imagine reference patterns that can defeat existing cache memory designs, it is the author's experience that cache memories improve performance for any program or workload which actually does useful computation.

Page 5: An Introduction to  Cache Design

Optimizing the design of a cache memory generally has four aspects:

1. Maximizing the probability of finding a memory reference's target in the cache (the hit ratio).

2. Minimizing the time to access information that is indeed in the cache (access time).

3. Minimizing the delay due to a miss (the miss penalty).

4. Minimizing the overheads of updating main memory, maintaining cache coherence, etc.

Page 6: An Introduction to  Cache Design

Key Factor in Design Decisions for VM and Cache

Access-time(Main Mem) / Access-time(Cache) = 4 ~ 20

Access-time(Secondary Mem) / Access-time(Main Mem) = 10^4 ~ 10^6

Cache control is usually implemented in hardware!!

Page 7: An Introduction to  Cache Design

Technology in the 1990s:

Memory Technology   Typical Access Time            $ per MByte
SRAM                10 - 20 ns                     200 - 400
DRAM                90 - 120 ns                    50 - 100
Magnetic disk       10,000,000 - 20,000,000 ns     2 - 5

Technology in the 2000s?

Page 8: An Introduction to  Cache Design

Technology in 2004:

Memory Technology   Typical Access Time            $ per GByte
SRAM                0.5 - 5 ns                     4,000 - 10,000
DRAM                50 - 70 ns                     100 - 200
Magnetic disk       10,000,000 - 20,000,000 ns     0.5 - 2

Technology in 2008?

See P&H Fig. pg. 469, 3rd Ed.

Page 9: An Introduction to  Cache Design

Technology in 2008:

Memory Technology   Typical Access Time            $ per GByte
SRAM                0.5 - 2.5 ns                   2,000 - 5,000
DRAM                50 - 70 ns                     20 - 75
Magnetic disk       5,000,000 - 20,000,000 ns      0.2 - 2

See P&H Fig. pg. 453, 4th Ed.

Page 10: An Introduction to  Cache Design

Cache in Memory Hierarchy

[Figure: Processor <-> Cache <-> Main Memory <-> Secondary Memory]

Page 11: An Introduction to  Cache Design

Emerging Memory Device Technologies

Source: "Emerging Nanoscale Memory and Logic Devices: A Critical Assessment", Hutchby et al., IEEE Computer, May 2008

Page 12: An Introduction to  Cache Design

Emerging Memory Device Technologies

Source: "Emerging Nanoscale Memory and Logic Devices: A Critical Assessment", Hutchby et al., IEEE Computer, May 2008

Page 13: An Introduction to  Cache Design


Page 14: An Introduction to  Cache Design

Source: Peter Kogge, ACS Productivity Workshop, 2008

Page 15: An Introduction to  Cache Design

Four Questions for Classifying Memory Hierarchies:

The fundamental principles that drive all memory hierarchies allow us to use terms that transcend the levels we are talking about. These same principles allow us to pose four questions about any level of the hierarchy:

Page 16: An Introduction to  Cache Design

Four Questions for Classifying Memory Hierarchies

Q1: Where can a block be placed in the upper level? (Block placement)

Q2: How is a block found if it is in the upper level? (Block identification)

Q3: Which block should be replaced on a miss? (Block replacement)

Q4: What happens on a write? (Write strategy)

Page 17: An Introduction to  Cache Design

These questions will help us gain an understanding of the different tradeoffs demanded by the relationships of memories at different levels of a hierarchy.

Page 18: An Introduction to  Cache Design

Concept of Cache Miss and Cache Hit

Memory line referenced:  ADDRESS 01173, DATA 30

Cache contents (word offsets 0 - 7 within each line):

TAGS     DATA
0117X    35, 72, 55, 30, 64, 23, 16, 14
7620X    11, 31, 26, 22, 55, …
3656X    71, 72, 44, 50, …
1741X    33, 35, 07, 65, ...

The reference to address 01173 matches tag 0117X, so it is a cache hit; word offset 3 of that line supplies the data, 30. An address whose tag matches no entry is a cache miss.
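A minimal Python sketch of the lookup this slide illustrates. Assumptions (not stated on the slide): addresses are decimal strings, the last digit is the word offset within a line, the remaining digits form the tag, and the words the slide elides with "…" are filled with placeholder zeros.

    # Cache contents keyed by tag; each line holds 8 words (offsets 0-7).
    cache = {
        "0117": [35, 72, 55, 30, 64, 23, 16, 14],
        "7620": [11, 31, 26, 22, 55, 0, 0, 0],   # trailing words unknown on the slide
        "3656": [71, 72, 44, 50, 0, 0, 0, 0],
        "1741": [33, 35, 7, 65, 0, 0, 0, 0],
    }

    def lookup(address: str):
        tag, offset = address[:-1], int(address[-1])   # split address into tag + word offset
        if tag in cache:                               # matching tag -> cache hit
            return "hit", cache[tag][offset]
        return "miss", None                            # no matching tag -> cache miss

    print(lookup("01173"))   # ('hit', 30), as in the figure: address 01173, data 30
    print(lookup("55510"))   # ('miss', None)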

Page 19: An Introduction to  Cache Design

Access Time

teff   : effective cache access time
tcache : cache access time
tmain  : main memory access time
h      : hit ratio

teff = h × tcache + (1 - h) × tmain

Page 20: An Introduction to  Cache Design

Example

Let tcache = 10 ns (1 - 4 clock cycles)
    tmain  = 50 ns (8 - 32 clock cycles)
    h      = 0.95

teff = ?

teff = 10 × 0.95 + 50 × 0.05 = 9.5 + 2.5 = 12 ns
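As a sanity check of the arithmetic, a small Python sketch of the slide's formula (the function name effective_access_time is illustrative only):

    def effective_access_time(t_cache_ns, t_main_ns, hit_ratio):
        # teff = h * tcache + (1 - h) * tmain, as defined on the previous slide
        return hit_ratio * t_cache_ns + (1.0 - hit_ratio) * t_main_ns

    # The slide's numbers: tcache = 10 ns, tmain = 50 ns, h = 0.95
    print(effective_access_time(10, 50, 0.95))   # 12.0 (ns)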

Page 21: An Introduction to  Cache Design

Hit Ratio

• Needs to be high enough (say > 90%) to obtain a desirable level of performance
• Changes in it have an amplifying effect on performance
• Never a constant, even for the same machine

Page 22: An Introduction to  Cache Design

Sensitivity of Performance w.r.t. h (hit ratio)

teff = h × tcache + (1 - h) × tmain
     = tcache × [ h + (1 - h) × tmain/tcache ]
     ≈ tcache × [ 1 + (1 - h) × tmain/tcache ]     (since h ≈ 1)

Since tmain/tcache ≈ 10, changes in h are magnified about 10 times.

Conclusion: very sensitive.

Page 23: An Introduction to  Cache Design

• Remember: h ≈ 1

• Example:

  Let h = 0.90, so teff ≈ tcache × (1 + 10 × 0.10) = tcache × (1 + 1.0).

  If Δh = 0.05 (h goes from 0.90 to 0.95), then (1 - h) = 0.05,
  and teff ≈ tcache × (1 + 10 × 0.05) = tcache × (1 + 0.5).
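A short Python sketch of this sensitivity, using the approximation derived on the previous slide with tmain/tcache ≈ 10 (the h = 0.99 point is added only for illustration):

    def relative_teff(hit_ratio, ratio_main_to_cache=10.0):
        # teff / tcache ~= 1 + (tmain / tcache) * (1 - h), valid when h is close to 1
        return 1.0 + ratio_main_to_cache * (1.0 - hit_ratio)

    for h in (0.90, 0.95, 0.99):
        print(f"h = {h:.2f}  ->  teff ~= {relative_teff(h):.1f} x tcache")
    # h = 0.90  ->  teff ~= 2.0 x tcache
    # h = 0.95  ->  teff ~= 1.5 x tcache
    # h = 0.99  ->  teff ~= 1.1 x tcache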

Page 24: An Introduction to  Cache Design

Basic Terminology

• Cache line (block): "A collection of contiguous data that are treated as a single entity of cache storage." Typically 1 ~ 16 words. (Analogy: the size of a room.)

• Cache directory: "The portion of a cache that holds the access keys that support associative access." (Analogy: the keys to the rooms.) A cache may use associativity to find the "right" directory entry by matching.

Page 25: An Introduction to  Cache Design

Cache Organization

• Fully associative: an element can be in any block.
• Direct mapping: an element can be in only one block.
• Set-associative: an element can be in a group of blocks.

Page 26: An Introduction to  Cache Design

An Example

Memory size = 256K words × 4 B/word = 1 MB
Cache size  = 2K words = 8 KB
Block size  = 16 words/block = 64 bytes/block

So:

Main memory has 256K / 16 = 16K blocks (16,384)
Cache has 2K / 16 = 128 block frames

Address = 18 bits (word address: 256K words = 2^8 × 2^10) + 2 bits (byte within word) = 20 bits (byte address, 2^20 bytes)
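The same sizing arithmetic written out as a small Python sketch (variable names are illustrative; a 4-byte word is assumed throughout, as on the slide):

    mem_words   = 256 * 1024      # 256K words = 1 MB of main memory
    cache_words = 2 * 1024        # 2K words = 8 KB of cache
    block_words = 16              # 16 words/block = 64 bytes/block

    mem_blocks   = mem_words // block_words     # 16,384 blocks in main memory
    cache_frames = cache_words // block_words   # 128 block frames in the cache

    word_addr_bits = (mem_words - 1).bit_length()   # 18 bits to address 256K words
    byte_addr_bits = word_addr_bits + 2             # + 2 bits for the byte within a word

    print(mem_blocks, cache_frames, word_addr_bits, byte_addr_bits)   # 16384 128 18 20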

Page 27: An Introduction to  Cache Design

Fully Associative

Features:
• Any block in memory can be in any block frame in the cache.
• All entries (block frames) are compared simultaneously (by associative search).

Page 28: An Introduction to  Cache Design

A Special Case

Simplest example: a block = a word, so the entire memory word address (18 bits, e.g. 027560) becomes the cache "tag". The tag of every cache entry is searched associatively; on a match, the corresponding data word is returned.

Adv:    no thrashing (quick reorganizing); very "flexible", so a word has a higher probability of residing in the cache.
Disadv: overhead of associative search (cost + time).

Page 29: An Introduction to  Cache Design


Fully associative cache organization

Page 30: An Introduction to  Cache Design

Direct Mapping

• No associative match.
• From the memory address, we "directly" index to the block frame in the cache where the block should be located. A comparison (of the tag) is then used to determine if it is a miss or a hit.

Page 31: An Introduction to  Cache Design

Direct Mapping (cont'd)

Advantage: simplest
• Fast (less logic)
• Low cost: only one comparator is needed, hence the cache can be built in the form of standard memory.

Disadvantage: "thrashing"

Page 32: An Introduction to  Cache Design

Example

Since the cache only has 128 block frames, the degree of multiplexing is

  main memory size / cache size = 16,384 blocks / 128 frames = 128 (2^7) blocks per frame,

i.e. 2^7 memory blocks "fall" into one block frame.

The low-order 7 bits of the block address are used for addressing the corresponding frame (a set of size 1); the high-order 7 bits are used as the tag.

Disadv: "thrashing"

Page 33: An Introduction to  Cache Design


Direct Mapping

Page 34: An Introduction to  Cache Design

Direct Mapping (cont'd)

Mapping (indexing): block address mod (# of blocks in cache), in this case mod 2^7 (128).

Adv: the low-order log2(cache size in blocks) bits can be used directly for indexing.
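A minimal Python sketch of this mapping for the running example (128 block frames, 14-bit block addresses); direct_map is an illustrative name, not anything from the slides:

    CACHE_FRAMES = 128   # 2^7 block frames

    def direct_map(block_addr):
        index = block_addr % CACHE_FRAMES    # low-order 7 bits select the frame
        tag   = block_addr // CACHE_FRAMES   # high-order 7 bits are stored as the tag
        return index, tag

    # Two blocks 128 apart map to the same frame: the "thrashing" case.
    print(direct_map(5))         # (5, 0)
    print(direct_map(5 + 128))   # (5, 1) -- same frame, different tag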

Page 35: An Introduction to  Cache Design

Set-Associative

• A compromise between direct mapping and full associativity.
• The cache is divided into S sets, S = 2, 4, 8, …
• If the cache has M blocks then, altogether, there are E = M/S blocks/set
  (the # of "buildings" available for indexing).

In our example, S = 128/2 = 64 sets.

Page 36: An Introduction to  Cache Design

2-way set associative

The 6-bit set index selects the right set; the 8-bit tag is then used for an associative match.

Page 37: An Introduction to  Cache Design


Associativity with 8-block cache

Page 38: An Introduction to  Cache Design

For a 2-way set-associative organization:

2^14 (16K) memory blocks / 2^6 sets = 2^8 blocks per set, i.e. 2^8 memory blocks map to each set of 2 block frames.

The 20-bit address is thus split as:

  Tag   Set   Word   Byte
   8     6     4      2

The 6 set bits index into the right set; the higher-order 8 bits are used as the tag, hence an associative match of the 8-bit tag against the tags of the 2 blocks in the set is required.
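The corresponding sketch for the 2-way set-associative split (64 sets, so 6 index bits and 8 tag bits of the 14-bit block address); set_assoc_map is an illustrative name:

    SETS = 64   # 128 frames / 2 ways

    def set_assoc_map(block_addr):
        set_index = block_addr % SETS    # low-order 6 bits select the set
        tag       = block_addr // SETS   # high-order 8 bits are compared associatively
        return set_index, tag

    print(set_assoc_map(5))        # (5, 0)
    print(set_assoc_map(5 + 64))   # (5, 1) -- same set, resolved by the 8-bit tag match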

Page 39: An Introduction to  Cache Design

Sector Mapping Cache

• Sector (IBM 360/85): 16 sectors × 16 blocks/sector
  - 1 sector = multiple consecutive blocks
  - Cache miss: sector replacement
  - Valid bit: one block is moved on demand

• Example address fields:

  Sector (tag)   Block       Word
  bits 0 - 6     bits 7 - 13 bits 14 - 17
  (7 bits)       (7 bits)    (4 bits)

A sector in memory can be in any sector in the cache.

Page 40: An Introduction to  Cache Design


Sector Mapping Cache

Page 41: An Introduction to  Cache Design

Sector Mapping Cache (cont'd)

Cache has 128 blocks / 16 blocks/sector = 8 sectors.
Main memory has 16K blocks / 16 blocks/sector = 1K sectors.

Page 42: An Introduction to  Cache Design


Example

See P&H Fig. 7.7 3rd Ed or 5.7 4th Ed

Page 43: An Introduction to  Cache Design

Total # of Bits in a Cache

Total # of bits = cache size in blocks × (# of bits of a tag + # of data bits of a block + # of bits in the valid field)

For the example: a direct-mapped cache with 4 KB of data, 1-word blocks and 32-bit addresses.

4 KB = 1K words = 2^10 words = 2^10 blocks
(2^10 blocks, 2^0 words/block, 2^2 bytes/word)

# of tag bits = 32 - (10 + 0 + 2) = 20

Total # of bits = 2^10 × (20 + 32×1 + 1) = 53 × 2^10 = 53 Kbits = 6.625 KB
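A small Python sketch of the same formula, assuming the usual direct-mapped address split (tag bits = 32 - index bits - word-in-block bits - 2 byte-offset bits) and one valid bit per block; cache_total_bits is an illustrative name:

    def cache_total_bits(num_blocks, block_words, addr_bits=32):
        offset_bits = (block_words - 1).bit_length()   # word-in-block bits (0 for 1-word blocks)
        index_bits  = (num_blocks - 1).bit_length()    # bits that index the block
        tag_bits    = addr_bits - index_bits - offset_bits - 2   # 2 bits select the byte in a word
        data_bits   = 32 * block_words                 # 32-bit words
        return num_blocks * (tag_bits + data_bits + 1) # + 1 valid bit per block

    # The slide's example: 4 KB of data, 1-word blocks, 32-bit addresses
    print(cache_total_bits(num_blocks=1024, block_words=1))   # 54272 bits = 53 Kbits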

Page 44: An Introduction to  Cache Design

Another example: FastMATH

A fast embedded microprocessor that uses the MIPS architecture and a simple cache implementation.

16 KB of data, 16-word blocks and 32-bit addresses.

2^14 bytes × 1 word/4 bytes × 1 block/16 words = 2^14 / (2^2 × 2^4) = 2^8 blocks
(2^8 blocks, 2^4 words/block, 2^2 bytes/word)

# of tag bits = 32 - (8 + 4 + 2) = 18

Total # of bits = 2^8 × (18 + 32×16 + 1) = 531 × 2^8 = 135,936 bits = 132.75 Kbits
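And the FastMATH numbers as a short, self-contained sketch of the same arithmetic (variable names are illustrative):

    num_blocks     = (16 * 1024) // (16 * 4)   # 16 KB of data / 64 bytes per block = 256 = 2^8
    tag_bits       = 32 - (8 + 4 + 2)          # 8 index + 4 word-in-block + 2 byte-in-word bits
    bits_per_block = tag_bits + 32 * 16 + 1    # tag + data + valid = 18 + 512 + 1 = 531
    total_bits     = num_blocks * bits_per_block

    print(num_blocks, tag_bits, total_bits)    # 256 18 135936  (= 132.75 Kbits)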

Page 45: An Introduction to  Cache Design


Example FastMATH

See P&H Fig. 7.9 3rd Ed or 5.9 4th Ed