16memhier1 - brown universitycs.brown.edu/courses/cs033/lecture/16memhier1x.pdf · cs33 intro to...

49
CS33 Intro to Computer Systems XVI–1 Copyright © 2018 Thomas W. Doeppner. All rights reserved. CS 33 Architecture and Optimization (3)

Upload: hoanghanh

Post on 04-Jan-2019

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–1 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

CS 33Architecture and Optimization (3)

Page 2: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–2 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Hyper Threading

Execution

FunctionalUnits

Integer/Branch

FPAdd

FPMult/Div Load Store

DataCache

DataData

Addr. Addr.

GeneralInteger

Operation Results

Instruction Control

InstructionCache

FetchControl

InstructionDecode

Address

Instructions

RetirementUnit

RegisterFile

Instruction Control

InstructionCache

FetchControl

InstructionDecode

Address

Instructions

RetirementUnit

RegisterFile

Page 3: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–3 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Chip

Multiple Cores

Execution

FunctionalUnits

Instruction Control

Integer/Branch

FPAdd

FPMult/Div Load Store

InstructionCache

DataCache

FetchControl

InstructionDecode

Address

Instructions

Operations

DataData

Addr. Addr.

GeneralInteger

Operation Results

RetirementUnit

RegisterFile

Execution

FunctionalUnits

Instruction Control

Integer/Branch

FPAdd

FPMult/Div Load Store

InstructionCache

DataCache

FetchControl

InstructionDecode

Address

Instructions

Operations

DataData

Addr. Addr.

GeneralInteger

Operation Results

RetirementUnit

RegisterFile

MoreCacheOther Stuff Other Stuff

Page 4: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–4 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

CS 33Memory Hierarchy I

Page 5: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer SystemsXVI–5

Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Random-Access Memory (RAM)

• Key features

– RAM is traditionally packaged as a chip

– basic storage unit is normally a cell (one bit per cell)

– multiple RAM chips form a memory

• Static RAM (SRAM)

– each cell stores a bit with a four- or six-transistor circuit

– retains value indefinitely, as long as it is kept powered

– relatively insensitive to electrical noise (EMI), radiation, etc.

– faster and more expensive than DRAM

• Dynamic RAM (DRAM)

– each cell stores bit with a capacitor; transistor is used for access

– value must be refreshed every 10-100 ms

– more sensitive to disturbances (EMI, radiation,…) than SRAM

– slower and cheaper than SRAM

Page 6: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–6 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

SRAM vs DRAM Summary

Trans. Access Needs Needsper bit time refresh? EDC? Cost Applications

SRAM 4 or 6 1X No Maybe 100x Cache memories

DRAM 1 10X Yes Yes 1X Main memories,frame buffers

• EDC = error detection and correction• to cope with noise, etc.

Page 7: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–7 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Conventional DRAM Organization• d x w DRAM:

– dw total bits organized as d supercells of size wbits

cols

rows

0 1 2 3

0

1

2

3

Internal row buffer

16 x 8 DRAM chip

addr

data

supercell(2,1)

2 bits/

8 bits/

Memorycontroller

(to/from CPU)

Page 8: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–8 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Reading DRAM Supercell (2,1)Step 1(a): row access strobe (RAS) selects row 2Step 1(b): row 2 copied from DRAM array to row buffer

Cols

Rows

RAS = 20 1 2 3

0

1

2

Internal row buffer

16 x 8 DRAM chip

3

addr

data

2/

8/

Memorycontroller

Page 9: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–9 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Reading DRAM Supercell (2,1)Step 2(a): column access strobe (CAS) selects column 1Step 2(b): supercell (2,1) copied from buffer to data lines, and

eventually back to the CPU

Cols

Rows

0 1 2 3

0

1

2

3

Internal row buffer

16 x 8 DRAM chip

CAS = 1

addr

data

2/

8/

Memorycontroller

supercell(2,1)

supercell(2,1)

To CPU

Page 10: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–10 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Memory Modules

: supercell (i,j)

64 MB memory moduleconsisting ofeight 8Mx8 DRAMs

addr (row = i, col = j)

Memorycontroller

DRAM 7

DRAM 0

031 78151623243263 394047485556

64-bit doubleword at main memory address A

bits0-7

bits8-15

bits16-23

bits24-31

bits32-39

bits40-47

bits48-55

bits56-63

64-bit doubleword

031 78151623243263 394047485556

Page 11: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–11 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Enhanced DRAMs

• Basic DRAM cell has not changed since its invention in 1966

– commercialized by Intel in 1970

• DRAMs with better interface logic and faster I/O:– synchronous DRAM (SDRAM)

» uses a conventional clock signal instead of asynchronous control» allows reuse of the row addresses (e.g., RAS, CAS, CAS, CAS)

– double data-rate synchronous DRAM (DDR SDRAM)» DDR1

• twice as fast» DDR2

• four times as fast» DDR3

• eight times as fast

Page 12: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–12 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Enhanced DRAMs

DRAM Cell

Array

fSDR: n B/sec

fDRAM

Cell Array

I/O Buffer

DDR1: 2n B/sec

DRAM Cell

Array

I/O Buffer

2fDDR2: 4n B/sec

DRAM Cell

Array

I/O Buffer

4fDDR3: 8n B/sec

Page 13: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–13 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Summary

• Memory transfer speed increased by a factor of 8

– no increase in DRAM Cell Array speed– 8 times more data transferred at once

» 64 adjacent bytes fetched from DRAM

Page 14: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–14 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Quiz 1

A program is loading randomly selected bytes

from memory. These bytes will be delivered to

the processor on a DDR3 system at a speed

thatʼs n times that of an SDR system, where n

is:

a) 1

b) 2

c) 4

d) 8

Page 15: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–15 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Traditional Bus Structure Connecting CPU and Memory

• A bus is a collection of parallel wires that carry address, data, and control signals

• Buses are typically shared by multiple devices

Mainmemory

I/O bridgeBus interface

ALU

Register file

CPU chip

System bus Memory bus

Page 16: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–16 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Memory Read Transaction (1)

• CPU places address A on the memory bus

ALU

Register file

Bus interfaceA 0

Ax

Main memoryI/O bridge

%rax

Load operation: movq A, %rax

Page 17: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–17 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Memory Read Transaction (2)

• Main memory reads A from the memory bus, retrieves word x, and places it on the bus

ALU

Register file

Bus interface

x 0

Ax

Main memory

%rax

I/O bridge

Load operation: movq A, %rax

Page 18: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–18 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Memory Read Transaction (3)

• CPU reads word x from the bus and copies it into register %rax

xALU

Register file

Bus interface x

Main memory0

A

%rax

I/O bridge

Load operation: movq A, %rax

Page 19: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–19 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Memory Write Transaction (1)

• CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive

yALU

Register file

Bus interfaceA

Main memory0

A

%rax

I/O bridge

Store operation: movq %rax, A

Page 20: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–20 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Memory Write Transaction (2)

• CPU places data word y on the bus

yALU

Register file

Bus interfacey

Main memory0

A

%rax

I/O bridge

Store operation: movq %rax, A

Page 21: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–21 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Memory Write Transaction (3)

• Main memory reads data word y from the bus and stores it at address A

yALU

register file

Bus interface y

main memory0

A

%rax

I/O bridge

Store operation: movq %rax, A

Page 22: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–22 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

A Mismatch

• A processor clock cycle is ~0.3 nsecs– SunLab machines (Intel Core i5-4690) run at 3.5 GHz

• Basic operations take 1 – 10 clock cycles– .3 – 3 nsecs

• Accessing memory takes 70-100 nsecs• How is this made to work?

Page 23: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–23 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Caching to the Rescue

CPU

Cache

Page 24: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–24 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Cache Memories

• Cache memories are small, fast SRAM-based memories managed automatically in hardware

– hold frequently accessed blocks of main memory• CPU looks first for data in caches (e.g., L1, L2, and L3),

then in main memory• Typical system structure:

Mainmemory

I/ObridgeBus interface

ALU

Register fileCPU chip

System bus Memory bus

Cache memories

Page 25: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–25 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

General Cache Concepts

0 1 2 3

4 5 6 7

8 9 10 11

12 13 14 15

8 9 14 3Cache

MemoryLarger, slower, cheaper memoryviewed as partitioned into “blocks”

Data is copied in block-sized transfer units

Smaller, faster, more expensivememory caches a subset ofthe blocks

4

4

4

10

10

10

Page 26: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–26 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

General Cache Concepts: Hit

0 1 2 34 5 6 78 9 10 11

12 13 14 15

8 9 14 3Cache

Memory

Data in block b is neededRequest: 14

14Block b is in cache:Hit!

Page 27: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–27 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

General Cache Concepts: Miss

0 1 2 34 5 6 78 9 10 11

12 13 14 15

8 9 14 3Cache

Memory

Data in block b is neededRequest: 12

Block b is not in cache:Miss!

Block b is fetched frommemoryRequest: 12

12

12

12

Block b is stored in cache• placement policy:

determines where b goes• replacement policy:

determines which blockgets evicted (victim)

Page 28: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–28 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

General Caching Concepts:

Types of Cache Misses

• Cold (compulsory) miss

– cold misses occur because the cache is empty

• Conflict miss

– most caches limit blocks to a small subset (sometimes a

singleton) of the block positions in RAM

» e.g., block i in RAM must be placed in block (i mod 4) in the cache

– conflict misses occur when the cache is large enough, but

multiple data objects all map to the same cache block

» e.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time

• Capacity miss

– occurs when the set of active cache blocks (working set) is

larger than the cache

Page 29: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–29 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

General Cache Organization (S, E, B)E = 2e lines per set

S = 2s sets

setline

0 1 2 B-1tagv

B = 2b bytes per cache block (the data)

Cache size:C = S x E x B data bytes

valid bit

Page 30: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–30 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Cache ReadE = 2e lines per set

S = 2s sets

0 1 2 B-1tagv

valid bitB = 2b bytes per cache block (the data)

t bits s bits b bitsAddress of word:

tag setindex

blockoffset

data begins at this offset

• Locate set• Check if any line in set

has matching tag• Yes + line valid: hit• Locate data starting

at offset

Page 31: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–31 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Example: Direct Mapped Cache (E = 1)

S = 2s sets

Direct mapped: one line per setAssume: cache block size 8 bytes

t bits 0…01 100Address of int:

0 1 2 7tagv 3 654

0 1 2 7tagv 3 654

0 1 2 7tagv 3 654

0 1 2 7tagv 3 654

find set

Page 32: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–32 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Example: Direct Mapped Cache (E = 1)Direct mapped: one line per setAssume: cache block size 8 bytes

t bits 0…01 100Address of int:

0 1 2 7tagv 3 654

match: assume yes = hitvalid? +

block offset

tag

Page 33: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–33 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Example: Direct Mapped Cache (E = 1)Direct mapped: one line per setAssume: cache block size 8 bytes

t bits 0…01 100Address of int:

0 1 2 7tagv 3 654

match: assume yes = hitvalid? +

int (4 Bytes) is here

block offset

No match: old line is evicted and replaced

Page 34: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–34 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Direct-Mapped Cache SimulationM=16 byte addresses, B=2 bytes/block,

S=4 sets, E=1 Blocks/set

Address trace (reads, one byte per read):

0 [00002], 1 [00012], 7 [01112], 8 [10002], 0 [00002]

x

t=1 s=2 b=1

xx x

0 ? ?

v Tag Block

miss

1 0 M[0-1]

hitmiss

1 0 M[6-7]

miss

1 1 M[8-9]

miss

1 0 M[0-1]Set 0Set 1Set 2Set 3

Page 35: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–35 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

A Higher-Level Exampleint sum_array_rows(double a[16][16]){

int i, j;double sum = 0;

for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)

sum += a[i][j];return sum;

}

32 B = 4 doubles

assume: cold (empty) cache,a[0][0] goes here

int sum_array_cols(double a[16][16]){

int i, j;double sum = 0;

for (j = 0; i < 16; i++)for (i = 0; j < 16; j++)

sum += a[i][j];return sum;

}

Ignore the variables sum, i, j

Page 36: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–36 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

A Higher-Level Exampleint sum_array_rows(double a[16][16]){

int i, j;double sum = 0;

for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)

sum += a[i][j];return sum;

}

32 B = 4 doubles

int sum_array_cols(double a[16][16]){

int i, j;double sum = 0;

for (j = 0; i < 16; i++)for (i = 0; j < 16; j++)

sum += a[i][j];return sum;

}

a0,0 a0,1 a0,2 a0,3

a0,4 a0,5 a0,6 a0,7

a0,8 a0,9 a0,10 a0,11

a0,12 a0,13 a0,14 a0,15

a1,0 a1,1 a1,2 a1,3

a1,4 a1,5 a1,6 a1,7

a1,8 a1,9 a1,10 a1,11

a1,12 a1,13 a1,14 a1,15

Page 37: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–37 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

A Higher-Level Exampleint sum_array_rows(double a[16][16]){

int i, j;double sum = 0;

for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)

sum += a[i][j];return sum;

}

32 B = 4 doubles

int sum_array_cols(double a[16][16]){

int i, j;double sum = 0;

for (j = 0; j < 16; i++)for (i = 0; i < 16; j++)

sum += a[i][j];return sum;

}

a0,0 a0,1 a0,2 a0,3

a1,0 a1,1 a1,2 a1,3

a2,0 a2,1 a2,2 a2,3

a3,0 a3,1 a3,2 a3,3

Page 38: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–38 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Conflict Misses: Aligned

double dotprod(double x[8], double y[8]) {double sum = 0.0;int i;

for (i=0; i<8; i++)sum += x[i] * y[i];

return sum;}

32 B = 4 doubles

x0 x1 x2 x3y0 y1 y2 y3x0 x1 x2 x3y0 y1 y2 y3x0 x1 x2 x3y1 y2 y3y0x0 x1 x2 x3y0 y1 y2 y3

Page 39: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–39 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Different Alignments

double dotprod(double x[8], double y[8]) {double sum = 0.0;int i;

for (i=0; i<8; i++)sum += x[i] * y[i];

return sum;}

32 B = 4 doubles

x0 x1 x2 x3

y0 y1 y2 y3x4 x5 x6 x7

y4 y5 y6 y7

Page 40: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–40 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

E-way Set-Associative Cache (Here: E = 2)E = 2: two lines per setAssume: cache block size 8 bytes

t bits 0…01 100Address of short int:

0 1 2 7tagv 3 654 0 1 2 7tagv 3 654

compare both

valid? + match: yes = hit

block offset

tag

Page 41: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–41 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

E-way Set-Associative Cache (Here: E = 2)E = 2: two lines per set

Assume: cache block size 8 bytes

t bits 0…01 100

Address of short int:

0 1 2 7tagv 3 654 0 1 2 7tagv 3 654

compare both

valid? + match: yes = hit

block offset

short int (2 Bytes) is here

No match:

• One line in set is selected for eviction and replacement

• Replacement policies: random, least recently used (LRU), …

Page 42: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–42 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Quiz 2 100 01 100Address of int:

8 8 8 9tag=2v 8 999

4 4 4 5tag=0v 4 555

0 0 0 1tag=0v 0 111

c c c dtag=4v c ddd

0

1

2

3

tag=3v

tag=4v

tag=2v

tag=av

2 2 2 32 333

6 6 6 76 777

a a a ba bbb

e e e fe fff

Given the address above and the cache contents as shown, what is the value of the int at the given address?

a) 1111b) 3333c) 4444d) 7777

Page 43: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–43 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

2-Way Set-Associative Cache Simulation

M=16 byte addresses, B=2 bytes/block, S=2 sets, E=2 blocks/set

Address trace (reads, one byte per read):0 [00002], 1 [00012], 7 [01112], 8 [10002], 0 [00002]

xxt=2 s=1 b=1

x x

0 ? ?v Tag Block

0

00

miss

1 00 M[0-1]

hitmiss

1 01 M[6-7]

miss

1 10 M[8-9]

hit

Set 0

Set 1

Page 44: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–44 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

A Higher-Level Exampleint sum_array_rows(double a[16][16]){

int i, j;double sum = 0;

for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)

sum += a[i][j];return sum;

}

32 B = 4 doubles

assume: cold (empty) cache,a[0][0] goes here

int sum_array_rows(double a[16][16]){

int i, j;double sum = 0;

for (j = 0; j < 16; i++)for (i = 0; i < 16; j++)

sum += a[i][j];return sum;

}

Ignore the variables sum, i, j

Page 45: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–45 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

A Higher-Level Example

32 B = 4 doubles

Ignore the variables sum, i, j

a0,0 a0,1 a0,2 a0,3

a0,4 a0,5 a0,6 a0,7

a0,8 a0,9 a0,10 a0,11

a0,12 a0,13 a0,14 a0,15

a1,0 a1,1 a1,2 a1,3

a1,4 a1,5 a1,6 a1,7

a1,8 a1,9 a1,10 a1,11

a1,12 a1,13 a1,14 a1,15

int sum_array_rows(double a[16][16]){

int i, j;double sum = 0;

for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)

sum += a[i][j];return sum;

}

int sum_array_cols(double a[16][16]){

int i, j;double sum = 0;

for (j = 0; i < 16; i++)for (i = 0; j < 16; j++)

sum += a[i][j];return sum;

}

Page 46: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–46 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

A Higher-Level Example

32 B = 4 doubles

Ignore the variables sum, i, j

a0,0 a0,1 a0,2 a0,3 a1,0 a1,1 a1,2 a1,3a2,0 a2,1 a2,2 a2,3 a3,0 a3,1 a3,2 a3,3

int sum_array_rows(double a[16][16]){

int i, j;double sum = 0;

for (i = 0; i < 16; i++)for (j = 0; j < 16; j++)

sum += a[i][j];return sum;

}

int sum_array_cols(double a[16][16]){

int i, j;double sum = 0;

for (j = 0; i < 16; i++)for (i = 0; j < 16; j++)

sum += a[i][j];return sum;

}

Page 47: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–47 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Conflict Misses

double dotprod(double x[8], double y[8]) {double sum = 0.0;int i;

for (i=0; i<8; i++)sum += x[i] * y[i];

return sum;}

32 B = 4 doubles

x0 x1 x2 x3 y0 y1 y2 y3

x4 x5 x6 x7 y4 y5 y6 y7

Page 48: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–48 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

Intel Core i5 and i7 Cache Hierarchy

Regs

L1 d-cache

L1 i-cache

L2 unified cache

Core 0

Regs

L1 d-cache

L1 i-cache

L2 unified cache

Core 3

L3 unified cache(shared by all cores)

Main memory

Processor packageL1 i-cache and d-cache:

32 KB, 8-way, Access: 4 cycles

L2 unified cache:256 KB, 8-way,

Access: 11 cycles

L3 unified cache:8 MB, 16-way,Access: 30-40 cycles

Block size: 64 bytes for all caches

Page 49: 16memhier1 - Brown Universitycs.brown.edu/courses/cs033/lecture/16memhier1X.pdf · CS33 Intro to Computer Systems XVI–5 Copyright © 2018 Thomas W. Doeppner. All rights reserved

CS33 Intro to Computer Systems XVI–49 Copyright © 2018 Thomas W. Doeppner. All rights reserved.

What About Writes?

• Multiple copies of data exist:– L1, L2, main memory, disk

• What to do on a write-hit?– write-through (write immediately to memory)

– write-back (defer write to memory until replacement of line)» need a dirty bit (line different from memory or not)

• What to do on a write-miss?– write-allocate (load into cache, update line in cache)

» good if more writes to the location follow

– no-write-allocate (writes immediately to memory)

• Typical– write-through + no-write-allocate

– write-back + write-allocate