
Post on 12-Jan-2016


Computer Architecture 2015 – Cache Coherency & Consistency

Computer Architecture
Memory Coherency & Consistency

By Yoav Etsion and Dan Tsafrir
Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, and Adi Yoaz


Coherence - intro

• When there’s only one core
  • Caching doesn’t affect correctness
• But what happens when ≥ 2 cores work simultaneously on the same memory location?
  • If both are reading, not a problem
  • Otherwise, one might use a stale, out-of-date copy of the data
  • The inconsistencies might lead to incorrect execution
• Terminology
  • Memory coherence <=> Cache coherence

[Figure: Processor 1 and Processor 2, each with a private L1 cache, above a shared L2 cache and memory]


The cache coherence problem for a single memory location

Time  Event                  Cache contents  Cache contents  Memory contents
                             for CPU-1       for CPU-2       for location X
0                                                            1
1     CPU-1 reads X          1                               1
2     CPU-2 reads X          1               1               1
3     CPU-1 stores 0 into X  0               1               0

At time 3, CPU-2’s cache holds a stale value, different from the corresponding memory location and from CPU-1’s cache.

(The next read by CPU-2 might yield “1”.)
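The table can be replayed with a minimal sketch, assuming two private write-through caches and no coherence protocol at all. The names `read`, `store`, `caches` and `memory` are made up for this illustration and do not come from the slides.

```python
# Hypothetical model: private write-through caches with NO coherence protocol.
memory = {"X": 1}
caches = {"CPU-1": {}, "CPU-2": {}}

def read(cpu, addr):
    """Return the cached value, filling the cache from memory on a miss."""
    cache = caches[cpu]
    if addr not in cache:
        cache[addr] = memory[addr]
    return cache[addr]

def store(cpu, addr, value):
    """Write-through: update the local cache and memory, but no other cache."""
    caches[cpu][addr] = value
    memory[addr] = value

read("CPU-1", "X")          # time 1
read("CPU-2", "X")          # time 2
store("CPU-1", "X", 0)      # time 3
stale = read("CPU-2", "X")  # CPU-2 hits its own cache and still sees the old value
print(stale, memory["X"])   # prints "1 0"
```

Nothing ever tells CPU-2 that its copy is out of date, which is exactly the gap a coherence protocol has to close.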


A memory system is coherent if…

• Informally, we could say (or we would like to say) that a memory system is coherent if…
  • Any read of a data item returns the most recently written value of that data item
  • (This definition is intuitive, but too optimistic and overly simplistic)
• More formally…


A memory system is coherent if…

1. - Processor P writes to location X, and later
   - P reads from X, and
   - No other processor writes to X between the above write & read
   => The read must return the value previously written by P

2. - P1 writes to X
   - Some time – T – elapses
   - P2 reads from X
   => For big enough T, P2 will read the value written by P1

3. Two writes to the same location by any two processors are serialized
   => They are seen in the same order by all processors (if “1” and then “2” are written, no processor would read “2” and then “1”)


A memory system is coherent if… (cont.)

Notes on the three conditions:

1. Simply preserves program order (needed even on a uniprocessor).

2. Defines the notion of what it means to have a coherent view of memory; if the value of X seen by P2 is never updated, regardless of the duration of T, then the memory is not coherent.

3. If P1 writes to X and then P2 writes to X, serialization of writes ensures that every processor will eventually see P2’s write; otherwise P1’s value might be maintained indefinitely.
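Condition 3 can be phrased as a small check. This is only a sketch: the helper `respects_write_order` is a made-up name, and it assumes the values written to X are distinct so that a read value identifies the write it came from.

```python
def respects_write_order(write_order, observed):
    """True if `observed` never goes backwards relative to `write_order`.

    write_order: the single global order of values written to X (assumed distinct).
    observed:    the values one processor read from X, in program order.
    """
    position = {value: i for i, value in enumerate(write_order)}
    indices = [position[v] for v in observed]
    return all(a <= b for a, b in zip(indices, indices[1:]))

write_order = [1, 2]                                  # "1" and then "2" are written to X
ok = respects_write_order(write_order, [1, 2])        # sees 1, later sees 2
also_ok = respects_write_order(write_order, [2, 2])   # never saw 1, which is allowed
bad = respects_write_order(write_order, [2, 1])       # reads "2" and then "1"
print(ok, also_ok, bad)                               # prints "True True False"
```

A processor may skip an intermediate value, but it may never observe the writes in the opposite order.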


MESI Protocol

• Each cache line can be in one of 4 states
  • Invalid – Line data is not valid (as in simple cache)
  • Shared – Line is valid & not dirty, copies may exist in other caches
  • Exclusive – Line is valid & not dirty, other processors do not have the line in their local caches
  • Modified – Line is valid & dirty, other processors do not have the line in their local caches
  • (MESI = Modified, Exclusive, Shared, Invalid)
• Coherency through store atomicity
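The four state definitions translate into a few predicates. The sketch below is illustrative only; the helper names are invented here, and the "write without a bus transaction" rule is the usual MESI behavior rather than something stated on the slide.

```python
from enum import Enum

class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

def is_valid(state):
    """Every state except Invalid holds usable data."""
    return state is not State.INVALID

def is_dirty(state):
    """Only a Modified line differs from the copy below it."""
    return state is State.MODIFIED

def can_write_silently(state):
    """M and E guarantee no other cache holds the line, so a write hit
    needs no invalidation traffic (E simply becomes M)."""
    return state in (State.MODIFIED, State.EXCLUSIVE)

summary = {s.value: (is_valid(s), is_dirty(s), can_write_silently(s)) for s in State}
print(summary["S"])   # prints "(True, False, False)"
```

The Exclusive state is what distinguishes MESI from a three-state protocol: a line that is read and then written by the same core never has to announce the write.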


Two classes of protocols to track sharing

• Directory based
  • Status of each memory block kept in just 1 location (= directory)
  • Directory-based coherence has bigger overhead
  • But can scale to bigger core counts
• Snooping
  • Every cache holding a copy of the data has a copy of the state
  • No centralized state
  • All caches are accessible via broadcast (bus or switch)
  • All cache controllers monitor (or “snoop”) the broadcasts
    • To determine if they have a copy of what’s requested/used
    • And to invalidate their copy when needed
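One way to see why directories scale better is to count who has to hear about a write. The following is a rough sketch under simplifying assumptions: the `Directory` class and `snoop_write` function are hypothetical, and real protocols track more state than a set of sharers.

```python
class Directory:
    """Hypothetical directory: one entry per block listing the cores that share it."""
    def __init__(self):
        self.sharers = {}                     # block address -> set of core ids

    def read(self, core, addr):
        self.sharers.setdefault(addr, set()).add(core)

    def write(self, core, addr):
        """Return the cores that must be sent an invalidation."""
        targets = self.sharers.get(addr, set()) - {core}
        self.sharers[addr] = {core}           # the writer becomes the only holder
        return targets

def snoop_write(core, num_cores):
    """With snooping, the write is broadcast and every other cache checks it."""
    return set(range(num_cores)) - {core}

directory = Directory()
directory.read(0, 1000)
directory.read(3, 1000)
dir_targets = directory.write(0, 1000)        # only core 3 holds a copy
snoop_targets = snoop_write(0, num_cores=8)   # all 7 other caches see the broadcast
print(len(dir_targets), len(snoop_targets))   # prints "1 7"
```

The directory pays for its lookup table (the "bigger overhead"), but the number of messages depends on the number of sharers rather than on the number of cores.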


Multi-processor System: Example

• P1 reads 1000
• P1 writes 1000

[Figure: Processor 1 and Processor 2, each with a private L1 cache, above a shared L2 cache and memory. Memory holds [1000]: 5. The read misses in P1’s L1 and in L2, and P1’s L1 gets [1000]: 5 in state E; after the write it holds [1000]: 6 in state M.]


Multi-processor System: Example

• P1 reads 1000
• P1 writes 1000
• P2 reads 1000
• L2 requests 1000 from L1
• P1 writes back 1000
• P2 gets 1000

[Figure: same system. P2’s read misses in its L1; P1 writes back [1000]: 6 and its line goes from M to S; L2 holds [1000]: 6 and P2’s L1 gets [1000]: 6 in state S. Memory is still labeled [1000]: 5.]


Multi-processor System: Example

• P1 reads 1000
• P1 writes 1000
• P2 reads 1000
• L2 snoops 1000
• P1 writes back 1000
• P2 gets 1000
• P2 requests ownership with write intent

[Figure: same system. After P2’s request for ownership, P1’s L1 line [1000] is in state I and P2’s L1 line [1000] is in state E.]
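The sequence in this example can be traced with a small state machine. This is a sketch under stated assumptions, not the slides' exact hardware: the shared L2 and memory are collapsed into one `backing` dictionary, only the line at address 1000 is tracked, and all class and function names are invented for illustration.

```python
class Cache:
    """One private L1 cache tracking a single line (address 1000)."""
    def __init__(self):
        self.state = "I"
        self.value = None

def read(cache, others, backing):
    if cache.state == "I":                       # read miss
        shared = False
        for other in others:                     # the other L1s are snooped
            if other.state == "M":
                backing[1000] = other.value      # dirty line is written back
            if other.state != "I":
                other.state = "S"
                shared = True
        cache.value = backing[1000]
        cache.state = "S" if shared else "E"
    return cache.value

def request_ownership(cache, others, backing):
    for other in others:
        if other.state == "M":
            backing[1000] = other.value
        other.state = "I"                        # every other copy is invalidated
    if cache.state == "I":
        cache.value = backing[1000]
    cache.state = "E"

def write(cache, others, backing, value):
    if cache.state in ("I", "S"):
        request_ownership(cache, others, backing)
    cache.value = value
    cache.state = "M"                            # E -> M needs no bus traffic

backing = {1000: 5}                              # stands in for L2 + memory
p1, p2 = Cache(), Cache()
trace = []
read(p1, [p2], backing)
trace.append((p1.state, p2.state))               # P1 reads 1000
write(p1, [p2], backing, 6)
trace.append((p1.state, p2.state))               # P1 writes 1000
read(p2, [p1], backing)
trace.append((p1.state, p2.state))               # P2 reads 1000
request_ownership(p2, [p1], backing)
trace.append((p1.state, p2.state))               # P2 requests ownership
print(trace)   # prints "[('E', 'I'), ('M', 'I'), ('S', 'S'), ('I', 'E')]"
```

The (P1, P2) state pairs follow the figures: E then M for P1 alone, S/S once P2 has read the written-back value, and I/E after P2's request for ownership.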


The alternative: non-coherent memory

• As core counts grow, many argue that maintaining coherence
  • Will slow down the machines
  • Will waste a lot of energy
  • Will not scale
• Intel SCC
  • Single-chip cloud computer – for research purposes
  • 48 cores
  • Shared, non-coherent memory
  • Software is responsible for correctness
• The Barrelfish operating system
  • By Microsoft & ETH (Zurich)
  • Assumes no coherency as the baseline


Intel SCC

[Figure: Intel SCC diagram, labeled “Shared (non-coherent) memory”]


Memory Consistency

• Coherence definition is not enough
  • So as to be able to write correct programs
  • It must be supplemented by a consistency model
  • Critical for program correctness
• Coherency & consistency are 2 different, complementary aspects of memory systems
  • Coherency
    • Assures that values written by one processor to a specific memory location are seen by other processors
    • Deals with what values can be returned (not when)
  • Consistency
    • Behavior of reads & writes to different memory locations
    • Ensures that writes to different locations will be seen in an order that makes sense


Memory Consistency (cont.)

• “How consistent is the memory system?”
  • A nontrivial question
• Assume: locations A & B are originally cached by P1 & P2
  • With initial value = 0
• If writes are immediately seen by other processors
  • Impossible for both “if” conditions to be true
  • Reaching “if” means either A or B must hold 1
• But suppose:
  • (1) “Write invalidate” can be delayed, and
  • (2) Processor allowed to compute during this delay
  • => It’s possible P1 & P2 haven’t seen the invalidations of B & A until after the reads, thus, both “if” conditions are true
• Should this be allowed?
  • Determined by consistency model

    Processor P1          Processor P2
    A = 0;                B = 0;
    …                     …
    A = 1;                B = 1;
    if ( B == 0 ) …       if ( A == 0 ) …
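Both claims can be checked by brute force. The sketch below enumerates every interleaving of the two programs on a single shared memory, first with each write visible immediately, then with each write modeled as becoming visible only after the same processor's read (a crude stand-in for a delayed invalidation). The function names and the tuple encoding of operations are assumptions made for this illustration.

```python
from itertools import combinations

def interleavings(p1, p2):
    """Yield every merge of two programs that keeps each program's own order."""
    n = len(p1) + len(p2)
    for slots in combinations(range(n), len(p1)):
        it1, it2 = iter(p1), iter(p2)
        yield [(1, next(it1)) if i in slots else (2, next(it2)) for i in range(n)]

def run(schedule):
    """Execute one interleaving; return (B as read by P1, A as read by P2)."""
    mem = {"A": 0, "B": 0}
    seen = {}
    for proc, (kind, var) in schedule:
        if kind == "write":
            mem[var] = 1
        else:
            seen[proc] = mem[var]
    return seen[1], seen[2]

# Writes seen immediately: program order is write, then read.
P1 = [("write", "A"), ("read", "B")]
P2 = [("write", "B"), ("read", "A")]
immediate = {run(s) for s in interleavings(P1, P2)}

# Delayed invalidation, modeled as the write taking effect after the read.
P1_DELAYED = [("read", "B"), ("write", "A")]
P2_DELAYED = [("read", "A"), ("write", "B")]
delayed = {run(s) for s in interleavings(P1_DELAYED, P2_DELAYED)}

print((0, 0) in immediate, (0, 0) in delayed)   # prints "False True"
```

The outcome (0, 0) means both “if” conditions are true. It never occurs when writes are seen immediately and becomes reachable as soon as the writes may be delayed.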


Memory Consistency (cont.)

What value will processor P2 print?

    Processor P1          Processor P2
    A = 0;                while ( A == 0 );
    …                     /* do nothing */
    B = 1;
    …                     print(B);
    A = 1;
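The answer depends on the order in which P1's two writes become visible to P2. A minimal sketch, assuming B is initially 0 and that P2 reads B the moment it sees A == 1 (the worst case); `p2_prints` is a hypothetical helper, not part of the slides.

```python
def p2_prints(visibility_order):
    """Replay P1's writes in the order P2 observes them.

    P2 spins until it sees A == 1 and then immediately reads B.
    """
    mem = {"A": 0, "B": 0}            # B's initial value is assumed to be 0
    for var in visibility_order:
        mem[var] = 1
        if mem["A"] == 1:             # P2 leaves the loop
            return mem["B"]           # value passed to print(B)

in_order = p2_prints(["B", "A"])      # writes seen in program order
reordered = p2_prints(["A", "B"])     # A = 1 overtakes B = 1
print(in_order, reordered)            # prints "1 0"
```

Coherence alone does not rule out the second case, because A and B are different locations; whether it is allowed is a question for the consistency model.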


Consistency models

• From most strict to most relaxed
  • Strict consistency
  • Sequential consistency
  • Weak consistency
  • Release consistency
  • […more…]
• Stricter models are
  • Easier to understand
  • Harder to implement
  • Slower
  • Involve more communication
  • Waste more energy


Strict consistency (“linearizability”)

• All memory operations are ordered in time
  • Since all ops are globally ordered, the earlier if ( B == 0 ) / if ( A == 0 ) example behaves in a “sane”, expected manner
• Any read to location X returns the value of the most recent write op to X
• This is the intuitive notion of memory consistency
• But too restrictive and thus unused

[Figure: two timing diagrams along a time axis, each showing P1: W(x)1; P2: R(x)2, R(x)2; P3: R(x)2, R(x)2; P4: W(x)2]
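With global timestamps, the strict rule is easy to check mechanically. The sketch below is illustrative: the timestamps are invented (the figure's horizontal positions are not reproduced here), the initial value of x is assumed to be 0, and `is_strict` is a made-up helper name.

```python
def is_strict(history, initial=0):
    """history: (time, processor, kind, value) tuples for one location;
    kind is "W" or "R". Every read must return the latest write in real time."""
    latest = initial
    for _time, _proc, kind, value in sorted(history):
        if kind == "W":
            latest = value
        elif value != latest:
            return False
    return True

# Hypothetical timings: W(x)1, then W(x)2, then every read returns 2.
strict = [(1, "P1", "W", 1), (2, "P4", "W", 2),
          (3, "P2", "R", 2), (4, "P3", "R", 2)]

# Same operations, but W(x)1 is the most recent write when P2 reads 2.
not_strict = [(1, "P4", "W", 2), (2, "P1", "W", 1), (3, "P2", "R", 2)]

print(is_strict(strict), is_strict(not_strict))   # prints "True False"
```

The check needs a single clock shared by all processors, which hints at why the model is too restrictive to build.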


Sequential consistency

• Relaxation of strict (defined by Lamport)
• Requires the result of any execution be the same as if memory accesses were interleaved in some arbitrary order that keeps each processor’s accesses in program order
  • Can be a different order upon each run
• Left is sequentially consistent (can be ordered as in the right)
• Q. What if we flip the order of P2’s reads (on left)?

[Figure: two timing diagrams along a time axis, both showing P1: W(x)1; P2: R(x)1, R(x)2; P3: R(x)1, R(x)2; P4: W(x)2; the right one shows the same operations placed in a single consistent order]
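For histories this small, sequential consistency can be tested by searching for a legal interleaving. The following is a brute-force sketch, assuming a single location x with initial value 0; the function name and the tuple encoding are invented for this illustration.

```python
def sequentially_consistent(programs, initial=0):
    """programs: one list per processor of ("W", value) / ("R", value) ops on x.
    Search for an interleaving that keeps program order and in which every
    read returns the most recently written value."""
    def search(positions, current):
        if all(pos == len(prog) for pos, prog in zip(positions, programs)):
            return True
        for i, prog in enumerate(programs):
            if positions[i] == len(prog):
                continue
            kind, value = prog[positions[i]]
            if kind == "R" and value != current:
                continue                       # this read cannot go next
            nxt = positions[:i] + (positions[i] + 1,) + positions[i + 1:]
            if search(nxt, value if kind == "W" else current):
                return True
        return False
    return search((0,) * len(programs), initial)

left = [[("W", 1)],                  # P1
        [("R", 1), ("R", 2)],        # P2
        [("R", 1), ("R", 2)],        # P3
        [("W", 2)]]                  # P4
flipped = [[("W", 1)],
           [("R", 2), ("R", 1)],     # P2's reads swapped
           [("R", 1), ("R", 2)],
           [("W", 2)]]

print(sequentially_consistent(left), sequentially_consistent(flipped))
# prints "True False"
```

This suggests an answer to the question above: with P2's reads flipped, P2 needs W(x)2 before W(x)1 while P3 needs the opposite, so no single interleaving satisfies both.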


Weak consistency

1. Accesses to “synchronization variables” are sequentially consistent
2. No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere
3. No data access (read or write) is allowed to be performed until all previous accesses to synchronization variables have been performed

• In other words, the processor doesn’t need to broadcast values at all, until a synchronization access happens
  • But then it broadcasts all values to all cores
• Can think of S as generating a “memory barrier”

[Figure: timing diagram. P1: W(x)1, W(x)2, S; P2: R(x)0, R(x)2, S, R(x)2; P3: R(x)1, S, R(x)2]
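The "nothing is broadcast until S" reading can be modeled directly. This is a sketch of the most extreme case the rules permit, not of real hardware (which may propagate values earlier, as the figure's pre-S reads show); the `Core` class and its methods are assumptions made for illustration.

```python
class Core:
    """Hypothetical core whose writes stay private until it synchronizes."""
    def __init__(self, shared):
        self.shared = shared           # globally visible memory
        self.local = dict(shared)      # this core's possibly stale view
        self.pending = {}              # writes not yet broadcast

    def write(self, var, value):
        self.local[var] = value
        self.pending[var] = value

    def read(self, var):
        return self.local[var]

    def sync(self):
        """S: publish all pending writes, then refresh the local view."""
        self.shared.update(self.pending)
        self.pending.clear()
        self.local = dict(self.shared)

shared = {"x": 0}
p1, p2 = Core(shared), Core(shared)
p1.write("x", 1)
p1.write("x", 2)
before = p2.read("x")    # P1 has not synchronized yet
p1.sync()
between = p2.read("x")   # P2 itself has not synchronized yet
p2.sync()
after = p2.read("x")     # guaranteed to be the latest value
print(before, between, after)   # prints "0 0 2"
```

Only the read after both cores have passed S is guaranteed to return 2; before that, any stale value is permitted.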


Release consistency

• Before accessing a shared variable
  • Acquire op must be completed
• Before a release is allowed
  • All accesses must be completed
• Acquire/release calls are sequentially consistent
• Serves as “lock”
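Release consistency splits the single synchronization access into two halves. A minimal single-threaded sketch, assuming that acquire pulls in published values and release publishes local ones; the `Lock` and `Core` classes are hypothetical and ignore real concurrency.

```python
class Lock:
    def __init__(self):
        self.owner = None

class Core:
    """Hypothetical core that only touches shared data between acquire and release."""
    def __init__(self, name, shared, lock):
        self.name, self.shared, self.lock = name, shared, lock
        self.local = {}

    def acquire(self):
        if self.lock.owner is not None:
            raise RuntimeError("lock is held by " + self.lock.owner)
        self.lock.owner = self.name
        self.local = dict(self.shared)      # acquire: see everything released so far

    def write(self, var, value):
        if self.lock.owner != self.name:
            raise RuntimeError("acquire must complete before the access")
        self.local[var] = value

    def read(self, var):
        if self.lock.owner != self.name:
            raise RuntimeError("acquire must complete before the access")
        return self.local[var]

    def release(self):
        self.shared.update(self.local)      # all accesses complete before release
        self.lock.owner = None

shared, lock = {"x": 0}, Lock()
p1, p2 = Core("P1", shared, lock), Core("P2", shared, lock)
p1.acquire()
p1.write("x", 1)
unpublished = shared["x"]   # P1's write is not visible before the release
p1.release()
p2.acquire()
seen = p2.read("x")
p2.release()
print(unpublished, seen)    # prints "0 1"
```

Compared with weak consistency, an acquire does not have to push out pending writes and a release does not have to pull in new values, so each half is cheaper than a full S.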