
Caching in multiprocessor systems

Tiina Niklander

In AMICT 2009, Petrozavodsk

19.5.2009

Background

- More transistors on one chip
  - Multiple cores
  - Larger cache
  - Multiple on-chip caches
  - More functionality (more functional units, dedicated multimedia / deciphering cell, integrated GPU)
- Multiple cores introduce
  - Cache organization
  - Private vs shared caches
  - Cache coherence

Cache organization

- Common organization:
  - L1 is private
  - Last-level cache is shared
- With three levels:
  - L1: private
  - L2: private or shared?
  - L3: shared

Private vs Shared cache

Fully private, fully shared, partially shared

F. Sibai: On the performance benefits of sharing and privatizing second and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp. 405-412

Shared L2 (all can access all L2)

Private L2 (pair of processors share)

Shared cache

- Simple coherence issue (just one copy)
- Different latencies (CPU - cache location)
- Cache access competition (wait for other core)

M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In SC2008. IEEE, 2008, pp. 1-12

Private cache

- No access competition, smaller latencies
- But coherence becomes an issue!
  - Same data in multiple caches -> invalidate on write

- Cache partitioning
  - Design time: fixed partitioning
  - Run time:
    - Fixed partitioning (configuration issue)
    - Dynamic (based on current need)
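The dynamic ("based on current need") variant can be illustrated with a small sketch. It assumes that need is approximated by each core's recent miss count and that the shared cache is divided by ways; the function name `partition_ways` and the `min_ways` guarantee are hypothetical choices for illustration, not taken from the cited work.

```python
def partition_ways(total_ways, misses_per_core, min_ways=1):
    """Split the ways of a shared cache among cores in proportion to recent misses.

    Assumption: miss counts stand in for "current need".
    """
    cores = len(misses_per_core)
    if total_ways < cores * min_ways:
        raise ValueError("not enough ways for the minimum allocation")

    # Every core first receives its guaranteed minimum.
    alloc = [min_ways] * cores
    spare = total_ways - cores * min_ways
    total_misses = sum(misses_per_core)

    if total_misses == 0:
        # No demand information: hand out the spare ways round-robin.
        for i in range(spare):
            alloc[i % cores] += 1
        return alloc

    # Proportional share of the spare ways, rounded down.
    shares = [spare * m // total_misses for m in misses_per_core]
    for i, share in enumerate(shares):
        alloc[i] += share

    # Ways lost to rounding go to the cores with the most misses.
    leftover = spare - sum(shares)
    by_need = sorted(range(cores), key=lambda i: misses_per_core[i], reverse=True)
    for i in by_need[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Core 0 misses most and therefore receives most of the 16 ways.
    print(partition_ways(16, [300, 100, 0, 0]))
```

A fixed run-time partitioning corresponds to calling the function once at configuration time; the dynamic variant would call it again whenever the miss counters are sampled.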

Cache coherence

- Protocols: MESI, MSI, MOSI, MOESI
- Invalidation message: RFO (Read for ownership)
- Each cache snoops the bus to monitor memory ops
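A minimal sketch of snooping with invalidate-on-write follows. It is a simplification under stated assumptions: the caches are write-through, only valid lines are stored, and the class and method names (`Bus`, `SnoopingCache`, `read_for_ownership`, `snoop_rfo`) are invented for this illustration.

```python
class Bus:
    """Shared bus: every attached cache sees every RFO message."""

    def __init__(self):
        self.memory = {}   # address -> value
        self.caches = []   # all caches attached to the bus

    def read_for_ownership(self, requester, addr):
        # Broadcast the RFO; every other cache snoops it and invalidates its copy.
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_rfo(addr)


class SnoopingCache:
    def __init__(self, bus):
        self.lines = {}    # address -> value; invalid lines are simply absent
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                      # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.read_for_ownership(self, addr)         # invalidate other copies first
        self.lines[addr] = value
        self.bus.memory[addr] = value                   # assumption: write-through

    def snoop_rfo(self, addr):
        self.lines.pop(addr, None)                      # drop our copy if we have one


if __name__ == "__main__":
    bus = Bus()
    a, b = SnoopingCache(bus), SnoopingCache(bus)
    bus.memory[0x40] = 1
    a.read(0x40)
    b.read(0x40)          # both caches now hold the line
    a.write(0x40, 2)      # RFO removes the copy in b
    print(0x40 in b.lines, b.read(0x40))
```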

Permitted state pairs for the same cache line held in two caches:

      M   E   S   I
  M   N   N   N   Y
  E   N   N   N   Y
  S   N   N   Y   Y
  I   Y   Y   Y   Y

M – Modified (O – Owned)
E – Exclusive
S – Shared
I – Invalid

N – not allowed state
Y – allowed state
(Source: Wikipedia)
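The table above can be encoded directly and used to check whether a set of per-cache states for one line is legal. This is only a sketch: the names `ALLOWED_PAIRS` and `is_coherent` are made up for the example, and it covers MESI only, not the Owned state.

```python
from itertools import combinations

# Pairs marked Y in the table: anything together with Invalid, plus Shared/Shared.
ALLOWED_PAIRS = (
    {("I", s) for s in "MESI"}
    | {(s, "I") for s in "MESI"}
    | {("S", "S")}
)


def is_coherent(states):
    """Return True if every pair of caches holds the line in compatible states."""
    return all((a, b) in ALLOWED_PAIRS for a, b in combinations(states, 2))


if __name__ == "__main__":
    print(is_coherent(["S", "S", "I"]))  # several readers: allowed
    print(is_coherent(["M", "S"]))       # a modified copy next to a shared one: not allowed
```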

(Distributed) cooperative caches

- Add a directory structure
  - Knows the data locations in local caches
- Cache-to-cache copying
  - When in another cache (directory locates)
  - On eviction (store temporarily on another cache)

E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In PACT’08. ACM 2008, pp. 134-142
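The two uses of cache-to-cache copying listed above can be sketched as follows. This is a toy model, not the design from the cited paper: it is read-only (no writes, so no coherence actions), evicts the oldest inserted line, and the class name `CooperativeCaches` and its fields are assumptions made for the example.

```python
class CooperativeCaches:
    def __init__(self, n_caches, capacity, memory):
        self.caches = [dict() for _ in range(n_caches)]  # per-core cache: addr -> value
        self.capacity = capacity                         # lines per private cache
        self.memory = memory                             # backing store: addr -> value
        self.directory = {}                              # addr -> set of caches holding it
        self.stats = {"local": 0, "remote": 0, "memory": 0}

    def read(self, core, addr):
        cache = self.caches[core]
        if addr in cache:
            self.stats["local"] += 1
            return cache[addr]
        holders = self.directory.get(addr, set())
        if holders:
            # The directory locates a copy in another cache: cache-to-cache copy.
            value = self.caches[next(iter(holders))][addr]
            self.stats["remote"] += 1
        else:
            value = self.memory[addr]
            self.stats["memory"] += 1
        self._insert(core, addr, value)
        return value

    def _insert(self, core, addr, value):
        if len(self.caches[core]) >= self.capacity:
            self._evict(core)
        self.caches[core][addr] = value
        self.directory.setdefault(addr, set()).add(core)

    def _evict(self, core):
        cache = self.caches[core]
        victim = next(iter(cache))       # oldest inserted line (dicts keep insertion order)
        value = cache.pop(victim)
        holders = self.directory[victim]
        holders.discard(core)
        if holders:
            return                       # another on-chip copy still exists
        # Last on-chip copy: store it temporarily in another cache with free space.
        for other, other_cache in enumerate(self.caches):
            if other != core and len(other_cache) < self.capacity:
                other_cache[victim] = value
                holders.add(other)
                return
        del self.directory[victim]       # no room anywhere: the line leaves the chip


if __name__ == "__main__":
    cc = CooperativeCaches(n_caches=2, capacity=1, memory={1: 10, 2: 20})
    cc.read(0, 1)
    cc.read(0, 2)        # evicts line 1 from cache 0, spilling it into cache 1
    cc.read(0, 1)        # served from cache 1 instead of memory
    print(cc.stats)
```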

New improvement ideas for cache performance 1/2

- Split the cache for different tasks
  - Dynamically allocate cache areas
- Software controlled eviction
  - GOAL: thread moves unneeded, but strongly-shared data to shared cache to improve performance of other threads
  - New instruction evict tells the processor to move some data from private L1 or L2 to shared L3
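The effect of such an evict instruction can be shown with a toy model. It is a sketch under strong assumptions: each core has one private level, a miss that is not in the shared L3 goes to memory (no snooping of other private caches), and the `Core` class with its `load` and `evict` methods is hypothetical.

```python
class Core:
    def __init__(self, shared_l3, memory):
        self.private = {}      # private L1/L2, modelled as one level: addr -> value
        self.l3 = shared_l3    # dict shared by all cores
        self.memory = memory   # backing store: addr -> value

    def load(self, addr):
        """Return (value, level that served the access)."""
        if addr in self.private:
            return self.private[addr], "private"
        if addr in self.l3:
            value, where = self.l3[addr], "shared L3"
        else:
            value, where = self.memory[addr], "memory"
        self.private[addr] = value
        return value, where

    def evict(self, addr):
        """Model of the proposed instruction: move a line from private cache to shared L3."""
        if addr in self.private:
            self.l3[addr] = self.private.pop(addr)


if __name__ == "__main__":
    l3, memory = {}, {0x100: 42}
    producer, consumer = Core(l3, memory), Core(l3, memory)
    producer.load(0x100)          # line ends up in the producer's private cache
    producer.evict(0x100)         # producer no longer needs it, pushes it to L3
    print(consumer.load(0x100))   # consumer finds it in the shared L3
```

Without the `evict` call, the consumer's first load in this model would be served from memory instead.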

New improvement ideas for cache performance 2/2

- Helper threads
  - GOAL: additional thread executes parts of the code ahead of the actual thread to ‘prefetch’ data to cache
- Generate memory traces for the programmer
  - Tuning the software performance
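The helper-thread idea can be illustrated with a lockstep simulation rather than real threads, which keeps the result deterministic. The sketch assumes an unbounded cache and a helper that stays exactly `distance` accesses ahead of the main thread; the function name `run_with_helper` is invented for the example.

```python
def run_with_helper(addresses, distance):
    """Count main-thread hits and misses when a helper runs `distance` accesses ahead.

    distance == 0 means no helper thread at all.
    """
    cache = set()
    hits = misses = 0

    # Head start: the helper touches the first `distance` addresses.
    for addr in addresses[:distance]:
        cache.add(addr)

    for i, addr in enumerate(addresses):
        # Helper step: touch the address the main thread needs `distance` accesses later.
        if distance > 0 and i + distance < len(addresses):
            cache.add(addresses[i + distance])

        # Main-thread step: the actual access.
        if addr in cache:
            hits += 1
        else:
            misses += 1
            cache.add(addr)
    return hits, misses


if __name__ == "__main__":
    trace = list(range(0, 640, 64))     # ten distinct cache lines
    print(run_with_helper(trace, 0))    # no helper: every access misses
    print(run_with_helper(trace, 4))    # helper ahead: every access hits
```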

Conclusion

- Focus on fine-tuning the cache performance
- Cache coherence itself was solved earlier
  - Not always used (if non-coherent usage is allowed)
- L2 and L3 caches
  - Shared or private
  - Cache partitioning
- Support for software-based improvements
  - Eviction hints
  - Traces
  - Prefetching (like helper thread)

References

S. Fide, S. Jenks: Proactive use of shared L3 caches to enhance cache communications in multi-core processors. IEEE Computer Architecture Letters, vol. 7 (2008), pp. 57-60

E. Herrero, J. González, R. Canal: Distributed Cooperative Caching. In Conf. on Parallel Architectures and Compilation Techniques, PACT’08. ACM 2008, pp. 134-142

M. Kandemir, F. Li, M.J. Irwin, S.W. Son: A Novel Migration-Based NUCA Design for Chip Multiprocessors. In Proc. of the 2008 ACM/IEEE Conf. on Supercomputing. IEEE, 2008, pp. 1-12

L. Peng, et al.: Memory hierarchy performance measurement of commercial dual-core desktop processors. Journal of Systems Architecture 54 (2008), pp. 816-828

F. Sibai: On the performance benefits of sharing and privatizing second and third-level cache memories in homogeneous multi-core architectures. Microprocessors and Microsystems 32 (2008), pp. 405-412

J. Zhang, X. Fan, S.H. Liu: A Pollution Alleviate L2 Cache Replacement Policy for Chip Multiprocessor Architecture. In Int. Conf. on Networking, Architecture and Storage. IEEE, 2008, pp. 310-316
