SYNAR Systems Networking and Architecture Group

CMPT 886: Computer Architecture Primer

Dr. Alexandra Fedorova
School of Computing Science
SFU


Outline

• Caches
• Branch prediction
• Out-of-order execution
• Instruction-Level Parallelism

Caches

• Level 1 / Level 2 / Level 3
• Instruction/Data or unified

Direct-Mapped Cache

Line size = 32 bytes

Cache eviction
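
The mapping behind a direct-mapped cache can be sketched in a few lines. This is a minimal illustration, not taken from the slides: the 32-byte line size comes from the slide, while `NUM_LINES`, `split_address`, and `DirectMappedCache` are hypothetical names and sizes chosen for the example. Each address maps to exactly one slot, so two addresses with the same index evict each other.

```python
LINE_SIZE = 32   # bytes per cache line, as stated on the slide
NUM_LINES = 8    # assumed number of lines; the slide does not give a cache size


def split_address(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr % LINE_SIZE            # byte position inside the line
    line_number = addr // LINE_SIZE      # which memory line the byte belongs to
    index = line_number % NUM_LINES      # the single slot this line may occupy
    tag = line_number // NUM_LINES       # distinguishes lines sharing a slot
    return tag, index, offset


class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only (no data)."""

    def __init__(self):
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Return 'hit', 'miss' (slot was empty) or 'miss+evict'."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            return "hit"
        result = "miss" if self.tags[index] is None else "miss+evict"
        self.tags[index] = tag           # the new line replaces the old one
        return result
```

With these assumed sizes, addresses 0 and 256 share index 0, so alternating between them evicts a line on every access even though the rest of the cache is empty.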

Set-Associative Cache

• 4-way set-associative cache
• The data can go into any of the four locations
• When the entire set is full, which line should we replace?
• LRU – least recently used (LRU stack)
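
One set of a 4-way cache with LRU replacement can be sketched as follows. This is an illustrative model only: `LRUSet` is a hypothetical name, and an `OrderedDict` stands in for the LRU stack (least recently used entry first). Real hardware usually approximates LRU rather than tracking it exactly.

```python
from collections import OrderedDict

WAYS = 4  # 4-way set associative, as on the slide


class LRUSet:
    """One cache set: holds up to `ways` tags, replaces the least recently used."""

    def __init__(self, ways=WAYS):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> None, ordered from LRU to MRU

    def access(self, tag):
        """Return (hit, evicted_tag); evicted_tag is None if nothing was evicted."""
        if tag in self.lines:
            self.lines.move_to_end(tag)      # hit: becomes most recently used
            return True, None
        evicted = None
        if len(self.lines) == self.ways:     # set is full: drop the LRU line
            evicted, _ = self.lines.popitem(last=False)
        self.lines[tag] = None               # insert as most recently used
        return False, evicted
```

After touching tags A, B, C, D and then A again, inserting E evicts B, because B is now the least recently used line in the set.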

Cache Hit/Miss

• Cache hit – the data is found in the cache
• Cache miss – the data is not in the cache
• Miss rate:
  – misses per instruction
  – misses per cycle
  – misses per access (also miss ratio)
• Hit rate:
  – the complement (per access: hit ratio = 1 − miss ratio)
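
The three miss-rate definitions differ only in the denominator. A small sketch, assuming the four raw counts are available from hardware counters or a simulator (the function name `miss_metrics` and its parameters are hypothetical):

```python
def miss_metrics(misses, accesses, instructions, cycles):
    """Compute the miss-rate variants listed above from raw event counts."""
    miss_ratio = misses / accesses
    return {
        "misses_per_instruction": misses / instructions,
        "misses_per_cycle": misses / cycles,
        "miss_ratio": miss_ratio,           # misses per access
        "hit_ratio": 1.0 - miss_ratio,      # hits per access
    }
```

Because the denominators differ, the same run can look very different under each metric, so it helps to state which one is being reported.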

Cache Miss Latency

• How long you have to wait if you miss in the cache
• Miss in L1 → L2 latency (~20 cycles)
• Miss in L2 → memory latency (~300 cycles), if there is no L3
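
These latencies can be combined into an average cost per memory access. The sketch below assumes a two-level hierarchy with no L3 and uses the approximate 20-cycle and 300-cycle figures from the slide; the L1 hit latency of 3 cycles and the function name `average_access_cycles` are assumptions made for the example.

```python
L1_HIT_CYCLES = 3      # assumed; the slide does not give an L1 hit latency
L2_LATENCY = 20        # approximate figure from the slide
MEMORY_LATENCY = 300   # approximate figure from the slide


def average_access_cycles(l1_miss_ratio, l2_miss_ratio):
    """Average cycles per access for an L1 + L2 + memory hierarchy.

    l2_miss_ratio is the fraction of L1 misses that also miss in L2.
    """
    l1_miss_penalty = L2_LATENCY + l2_miss_ratio * MEMORY_LATENCY
    return L1_HIT_CYCLES + l1_miss_ratio * l1_miss_penalty
```

Even a small L2 miss ratio dominates the result, because each trip to memory costs roughly fifteen times as much as an L2 hit.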

Writing in Cache

• Write through – write directly to memory
• Write back – write to memory later, when the line is evicted
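
The difference between the two policies shows up in how many writes reach memory. The following toy model counts them for a sequence of stores; it assumes a tiny direct-mapped cache addressed by line number, and the name `memory_writes` and its parameters are hypothetical.

```python
def memory_writes(stores, policy, num_lines=1):
    """Count writes that reach memory for a sequence of stored line numbers.

    policy is "write-through" or "write-back". Dirty lines still in the
    cache at the end are not counted, since they have not been evicted yet.
    """
    if policy not in ("write-through", "write-back"):
        raise ValueError("unknown policy: " + policy)
    cached = {}      # slot index -> (line number, dirty flag)
    writes = 0
    for line in stores:
        index = line % num_lines
        resident = cached.get(index)
        if resident is not None and resident[0] != line and resident[1]:
            writes += 1                      # write-back: flush dirty victim
        if policy == "write-through":
            writes += 1                      # every store goes to memory
            cached[index] = (line, False)
        else:
            cached[index] = (line, True)     # only mark the line dirty
    return writes
```

For three stores to line 0 followed by one store to line 1, write-through sends four writes to memory, while write-back sends one (the flush of line 0 when line 1 replaces it).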

Caches on Multiprocessor Systems

[Figure: three caches and memory connected by a shared bus. © Herlihy-Shavit 2007]

Processor Issues Load Request

[Figure: three caches and memory on a shared bus; the data appears in memory and in one cache. © Herlihy-Shavit 2007]

Another Processor Issues Load Request

[Figure: a second processor broadcasts "I want data" on the bus; a cache holding the data answers "I got data". © Herlihy-Shavit 2007]

Processor Modifies Data

[Figure: one processor modifies the data in its own cache. © Herlihy-Shavit 2007]

• Now other copies are invalid

Send Invalidation Message to Others

[Figure: the modifying processor broadcasts "Invalidate!" on the bus. © Herlihy-Shavit 2007]

• Other caches lose read permission
• No need to change now: other caches can provide valid data

Processor Asks for Data

[Figure: another processor broadcasts "I want data" on the bus. © Herlihy-Shavit 2007]
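
The sequence in the preceding slides (load, second load, modify, invalidate, load again) can be replayed with a toy model of a single memory location. This is a simplified MSI-style sketch under stated assumptions, not necessarily the protocol the slides depict: the class name `SnoopingBus` and the state letters M (modified), S (shared), I (invalid) are choices made for the example.

```python
class SnoopingBus:
    """Toy invalidation-based coherence for one memory location."""

    def __init__(self, num_caches, memory_value=0):
        self.memory = memory_value
        self.state = ["I"] * num_caches    # per-cache state: "M", "S" or "I"
        self.value = [None] * num_caches   # per-cache copy of the data

    def load(self, cpu):
        """Read the location from `cpu`, fetching over the bus on a miss."""
        if self.state[cpu] == "I":
            owner = next(
                (c for c, s in enumerate(self.state) if s == "M"), None
            )
            if owner is not None:
                # The cache with the modified copy supplies the data;
                # in this sketch it also updates memory and keeps a shared copy.
                self.memory = self.value[owner]
                self.state[owner] = "S"
            self.value[cpu] = self.memory
            self.state[cpu] = "S"
        return self.value[cpu]

    def store(self, cpu, new_value):
        """Write the location from `cpu`, invalidating all other copies."""
        for other in range(len(self.state)):
            if other != cpu:
                self.state[other] = "I"    # other caches lose read permission
                self.value[other] = None
        self.state[cpu] = "M"              # memory is not updated yet
        self.value[cpu] = new_value
```

In this model a store leaves memory stale until some other processor asks for the data, at which point the modified copy is handed over.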

Shared Caches

• Filled on demand
• No control over cache shares
• An aggressive thread can grab a large cache share, hurt others

[Figure: cache lines labeled by owning thread; Thread 1 comes to occupy most of the lines, leaving Thread 2 with few.]
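
The effect can be reproduced with a small simulation of a shared, fully associative LRU cache. Everything here is an assumption made for illustration: the function name `simulate`, the 8-line capacity, a streaming thread T1 that touches three new lines per step, and a thread T2 that loops over four lines.

```python
from collections import OrderedDict


def simulate(steps, capacity=8, aggressive_rate=3, victim_lines=4,
             with_aggressor=True):
    """Return (T2 hits, lines held by T1 at the end) for a shared LRU cache."""
    cache = OrderedDict()      # (thread, line) -> None, ordered LRU to MRU
    victim_hits = 0
    next_stream_line = 0

    def touch(key):
        hit = key in cache
        if hit:
            cache.move_to_end(key)
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)    # evict the least recently used
            cache[key] = None                # filled on demand
        return hit

    for step in range(steps):
        if with_aggressor:
            for _ in range(aggressive_rate):
                touch(("T1", next_stream_line))   # T1 streams, never reuses
                next_stream_line += 1
        if touch(("T2", step % victim_lines)):    # T2 reuses a small set
            victim_hits += 1

    t1_share = sum(1 for thread, _ in cache if thread == "T1")
    return victim_hits, t1_share
```

Run alone, T2 fits in the cache and hits on every access after its first four. With T1 running, T1 ends up holding most of the lines and T2's lines are evicted before they can be reused.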

Branching and CPU Pipeline

[Figure: CPU pipeline]

Branching Hurts Pipelining

Branch Prediction
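
The slide's figure did not survive extraction, so the following is only an illustration of one commonly taught scheme, a 2-bit saturating counter, and not necessarily what the slide showed. The names `TwoBitPredictor` and `mispredictions` are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 1   # assumed starting state: weakly not taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        """Move one step toward the observed outcome, saturating at 0 and 3."""
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def mispredictions(outcomes):
    """Count wrong predictions over a sequence of branch outcomes (booleans)."""
    predictor = TwoBitPredictor()
    wrong = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            wrong += 1
        predictor.update(taken)
    return wrong
```

A loop branch that is taken nine times and then falls through is mispredicted only at the start and at the exit, whereas a strictly alternating branch defeats this particular predictor on every outcome.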

Out-of-order Execution

• Modern CPUs are superscalar
• They can issue more than one instruction per clock cycle
• If consecutive instructions depend on each other, instruction-level parallelism is limited
• To keep the processor going at full speed, issue instructions out of order
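
A toy issue model makes the benefit concrete. It is a sketch under simplifying assumptions (a 2-wide machine, fixed latencies, no limits other than issue width and data dependences), and the function name `cycles_to_finish` and the instruction tuples are hypothetical.

```python
def cycles_to_finish(program, width=2, in_order=True):
    """Return the cycle at which the last result becomes available.

    program is a list of (name, latency, deps) in program order, where deps
    names earlier instructions whose results are needed and latency >= 1.
    """
    seen = set()
    for name, _, deps in program:
        if any(d not in seen for d in deps):
            raise ValueError("dependence on an unknown or later instruction")
        seen.add(name)

    ready_at = {}              # name -> cycle when its result is available
    pending = list(program)
    cycle = 0
    while pending:
        issued = 0
        blocked = False        # in-order: nothing may pass a stalled instruction
        remaining = []
        for instr in pending:
            name, latency, deps = instr
            operands_ready = all(
                d in ready_at and ready_at[d] <= cycle for d in deps
            )
            if not blocked and issued < width and operands_ready:
                ready_at[name] = cycle + latency
                issued += 1
            else:
                remaining.append(instr)
                if in_order:
                    blocked = True
        pending = remaining
        cycle += 1
    return max(ready_at.values(), default=0)


# Two independent load/add pairs; each add depends on the load before it.
PROGRAM = [
    ("load_a", 3, []),
    ("add_b", 1, ["load_a"]),
    ("load_c", 3, []),
    ("add_d", 1, ["load_c"]),
]
```

In this model the in-order machine stalls behind `add_b` while `load_a` is outstanding, whereas the out-of-order machine issues `load_c` in the meantime and overlaps the two loads.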

Speculative Execution

• Out-of-order execution is limited to basic blocks
• To go beyond basic blocks, use speculative execution

Instruction-Level Parallelism

• Many programs fail to keep the processor busy
  – Code with lots of loads
  – Code with frequent and unpredictable branches
• CPU cycles are wasted: power is consumed, no useful work is done
• Running multiple threads on the chip helps with this
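
A back-of-the-envelope model shows why extra threads help. It assumes, purely for illustration, that each hardware thread is stalled for the same fraction of the time and that stalls are independent across threads. Real workloads violate both assumptions, and the function name `core_utilization` is hypothetical.

```python
def core_utilization(stall_fraction, threads):
    """Probability that at least one thread has work in a given cycle.

    Idealized model: each thread is stalled with probability stall_fraction,
    independently of the others.
    """
    return 1.0 - stall_fraction ** threads
```

Under this model a thread that stalls half the time keeps the core 50% busy on its own, 75% busy with two threads, and about 94% busy with four.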