single-chip multiprocessors: the rebirth of parallel architecture guri sohi university of wisconsin
Post on 20-Dec-2015
219 views
TRANSCRIPT
![Page 1: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/1.jpg)
Single-Chip Multiprocessors: the Single-Chip Multiprocessors: the Rebirth of Parallel ArchitectureRebirth of Parallel Architecture
Guri Sohi
University of Wisconsin
![Page 2: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/2.jpg)
2
OutlineOutline
Waves of innovation in architecture Innovation in uniprocessors Lessons from uniprocessors Future chip multiprocessor architectures Software and such things
![Page 3: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/3.jpg)
3
Waves of Research and InnovationWaves of Research and Innovation
A new direction is proposed or new opportunity becomes available
The center of gravity of the research community shifts to that direction SIMD architectures in the 1960s HLL computer architectures in the 1970s RISC architectures in the early 1980s Shared-memory MPs in the late 1980s OOO speculative execution processors in the 1990s
![Page 4: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/4.jpg)
4
WavesWaves
Wave is especially strong when coupled with a “step function” change in technology Integration of a processor on a single chip Integration of a multiprocessor on a chip
![Page 5: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/5.jpg)
5
Uniprocessor Innovation Wave: Part 1Uniprocessor Innovation Wave: Part 1
Many years of multi-chip implementations Generally microprogrammed control Major research topics: microprogramming,
pipelining, quantitative measures Significant research in multiprocessors
![Page 6: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/6.jpg)
6
Uniprocessor Innovation Wave: Part 2Uniprocessor Innovation Wave: Part 2
Integration of processor on a single chip The inflexion point Argued for different architecture (RISC)
More transistors allow for different models Speculative execution
Then the rebirth of uniprocessors Continue the journey of innovation Totally rethink uniprocessor microarchitecture Jim Keller: “Golden Age of Microarchitecture”
![Page 7: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/7.jpg)
7
Uniprocessor Innovation Wave: ResultsUniprocessor Innovation Wave: Results
Current uniprocessor very different from 1980’s uniprocessor
Uniprocessor research dominates conferences
MICRO comes back from the dead Top 1% (NEC Citeseer)
Impact on compilers Source: Rajwar and Hill, 2001
![Page 8: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/8.jpg)
8
Why Uniprocessor Innovation Wave?Why Uniprocessor Innovation Wave?
Innovation needed to happen Alternatives (multiprocessors) not practical
option for using additional transistors Innovation could happen: things could be
done differently Identify barriers (e.g., to performance) Use transistors to overcome barriers (e.g., via
speculation) Simulation tools facilitate innovation
![Page 9: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/9.jpg)
9
Lessons from UniprocessorsLessons from Uniprocessors
Don’t underestimate what can be done in hardware Doing things in software was considered easy;
in hardware considered hard Now perhaps the opposite
Barriers or limits become opportunities for innovation Via novel forms of speculation E.g., barriers in Wall’s study on limits of ILP
![Page 10: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/10.jpg)
10
Multiprocessor ArchitectureMultiprocessor Architecture
A.k.a. “multiarchitecture” of a multiprocessor Take state-of-the-art uniprocessor Connect several together with a suitable
network Have to live with defined interfaces
Expend hardware to provide cache coherence and streamline inter-node communication Have to live with defined interfaces
![Page 11: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/11.jpg)
11
Software ResponsibilitiesSoftware Responsibilities
Have software figure out how to use MP Reason about parallelism Reason about execution times and overheads Orchestrate parallel execution Very difficult for software to parallelize
transparently
![Page 12: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/12.jpg)
12
Explicit Parallel ProgrammingExplicit Parallel Programming
Have programmer express parallelism Reasoning about parallelism is hard Use synchronization to ease reasoning
Parallel trends towards serial with the use of synchronization
![Page 13: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/13.jpg)
13
Net ResultNet Result
Difficult to get parallelism speedup Computation is serial Inter-node communication latencies
exacerbate problem Multiprocessors rarely used for parallel
execution Used to run threaded programs
Lower-overhead sync would help Used to improve throughput
![Page 14: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/14.jpg)
14
The Inflexion Point for MultiprocessorsThe Inflexion Point for Multiprocessors
• Can put a basic small-scale MP on a chip
• Can think of alternative ways of building multiarchitecture• Don’t have to work with defined interfaces!
• What opportunities does this open up?• Allows for parallelism to get performance.
• Allows for use of novel techniques to overcome (software and hardware) barriers
• Other opportunities (e.g., reliability)
![Page 15: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/15.jpg)
15
Parallel SoftwareParallel Software
Needs to be compelling reason to have a parallel application
Won’t happen if difficult to create Written by programmer or automatically
parallelized by compiler Won’t happen if insufficient performance gain
![Page 16: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/16.jpg)
16
Changes in MP MultiarchitectureChanges in MP Multiarchitecture
Inventing new functionality to overcome barriers Consider barriers as opportunities
Developing new models for using CMPs Revisiting traditional use of MPs
![Page 17: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/17.jpg)
17
Speculative MultithreadingSpeculative Multithreading
Speculatively parallelize an application Use speculation to overcome ambiguous
dependences Use hardware support to recover from mis-
speculation E.g., multiscalar Use hardware to overcome limitations
![Page 18: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/18.jpg)
18
Overcoming Barriers: Memory ModelsOvercoming Barriers: Memory Models
Weak models proposed to overcome performance limitations of SC
Speculation used to overcome “maybe” dependences
Series of papers showing SC can achieve performance of weak models
![Page 19: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/19.jpg)
19
ImplicationsImplications
Strong memory models not necessarily low performance
Programmer does not have to reason about weak models
More likely to have parallel programs written
![Page 20: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/20.jpg)
20
Overcoming Barriers: SynchronizationOvercoming Barriers: Synchronization
Synchronization to avoid “maybe” dependences Causes serialization
Speculate to overcome serialization Recent work on techniques to dynamically
elide synchronization constructs
![Page 21: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/21.jpg)
21
ImplicationsImplications
Programmer can make liberal use of synchronization to ease programming
Little performance impact of synchronization More likely to have parallel programs written
![Page 22: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/22.jpg)
22
Overcoming Barriers: CoherenceOvercoming Barriers: Coherence
Caches used to reduce data access latency; need to be kept coherent
Latencies of getting value from remote location impact performance
Getting remote value is two-part operation Get value Get permissions to use value Can separating these help?
![Page 23: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/23.jpg)
23
Coherence DecouplingCoherence Decoupling
Sequential execution
Speculativeexecution
Time
Time
Coherencemiss detected
Valuepredicted
Permissiongranted
Value arrivesand verified
Retried (onmisspeculation)
Coherencemiss detected
Permissionand value arrive
Miss latency
Worst case latency
Best case latency
![Page 24: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/24.jpg)
24
Zeros/Ones in Coherence MissesZeros/Ones in Coherence Misses
Preliminary Data, Simics (Ultrasparc/Solaris, 16P), Cache (4MB 4-way SA L2, 64B lines, MOSI)
Load Value 1 0 -1 TotalOLTP 5.1% 32.3% 9.9% 47.3%Apache 0.7% 27.8% 0.5% 29.0%JBB 18.6% 12.6% 0.5% 31.7%Barnes 1.7% 34.1% 1.7% 37.5%Ocean 1.3% 25.9% 1.7% 28.9%Store Value 1 0 -1 TotalOLTP 7.1% 29.1% 13.1% 49.4%Apache 3.4% 17.0% 7.4% 27.7%JBB 0.7% 21.3% 1.4% 23.4%Barnes 1.8% 29.8% 17.2% 48.7%Ocean 13.2% 4.9% 3.5% 21.5%
![Page 25: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/25.jpg)
25
Other Performance OptimizationsOther Performance Optimizations
Clever techniques for inter-processor communication Remember: no artificial constraints on chip
Further reduction of artificial serialization
![Page 26: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/26.jpg)
26
New Uses for CMPsNew Uses for CMPs
Helper threads, redundant execution, etc.
• will need extensive research in the context of CMPs
How about trying to parallelize application, i.e., “traditional” use of MPs?
![Page 27: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/27.jpg)
27
Revisiting Traditional Use of MPsRevisiting Traditional Use of MPs
Compilers and software for MPs Digression: Master/Slave Speculative
Parallelization (MSSP) Expectations for future software Implications
![Page 28: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/28.jpg)
28
Parallelizing Apps: A Moving TargetParallelizing Apps: A Moving Target
Learned to reason about certain languages, data structures, programming constructs and applications
Newer languages, data structures, programming constructs and applications appear
Always playing catch up Can we get a jump ahead?
![Page 29: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/29.jpg)
29
Master/Slave Speculative Parallelization (MSSP)Master/Slave Speculative Parallelization (MSSP)
Take a program and create program with two sub-programs: Master and Slave
Master program is approximate (or distilled) version of original program
Slave (original) program “checks” work done by master
Portions of the slave program execute in parallel
![Page 30: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/30.jpg)
30
MSSP - OverviewMSSP - Overview
A
B
C
D
E
MasterSlave
Slave
Slave
Original Code
AD
BD
CD
DD
ED
Distilled Code
Distilled Code on Master
Original Code concurrently on Slaves verifies Distilled Code
Use checkpoints to communicate changes
![Page 31: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/31.jpg)
31
MSSP - DistillerMSSP - Distiller
Program with many paths
![Page 32: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/32.jpg)
32
MSSP - DistillerMSSP - Distiller
Program with many paths
Dominant paths
![Page 33: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/33.jpg)
33
MSSP - DistillerMSSP - Distiller
Program with many paths
Dominant paths
![Page 34: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/34.jpg)
34
MSSP - DistillerMSSP - Distiller
Program with many paths
Dominant paths
![Page 35: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/35.jpg)
35
MSSP - DistillerMSSP - Distiller
Approximate important pathsCompiler Optimizations
Distilled Code
![Page 36: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/36.jpg)
36
MSSP - ExecutionMSSP - Execution
BBD
DDD
A AD
C CD
EDE
Slave Slave Slave Master
Verify checkpoint of AD -- CorrectVerify checkpoint of BD -- CorrectVerify checkpoint of CD -- CorrectVerify checkpoint of DD -- Wrong
EDE
Restart E and ED
![Page 37: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/37.jpg)
37
MSSP SummaryMSSP Summary
Distill away code that is unlikely to impact state used by later portions of program
Performance tracks distillation ratio Better distillation = better performance
Verification of distilled program done in parallel on slaves
![Page 38: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/38.jpg)
38
Future Applications and SoftwareFuture Applications and Software
What will future applications look like? Don’t know
What language will they be written in? Don’t know; don’t care
Code for future applications will have “overheads” Overheads for checking for correctness Overheads for improving reliability Overheads for checking security
![Page 39: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/39.jpg)
39
Overheads as an OpportunityOverheads as an Opportunity
Performance costs of overhead have limited their use Overheads not a limit; rather an opportunity Run overhead code in parallel with non-overhead
code Develop programming models Develop parallel execution models (a la MSSP) Recent work in this direction
Success at reducing overhead cost will encourage even more use of “overhead” techniques
![Page 40: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/40.jpg)
40
SummarySummary
New opportunities for innovation in MPs Expect little resemblance between MPs today
and CMPs in 15 years We need to invent and define differences Not because uniprocessors are running out of
steam But because innovation in CMP
multiarchitecture possible
![Page 41: Single-Chip Multiprocessors: the Rebirth of Parallel Architecture Guri Sohi University of Wisconsin](https://reader030.vdocuments.net/reader030/viewer/2022032800/56649d4b5503460f94a29693/html5/thumbnails/41.jpg)
41
SummarySummary
Novel techniques for attacking performance limitations
New models for expressing work (computation and overhead)
New parallel processing models Simulation tools