8/17/2019 Chapter 06 Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) 5th Edition
http://slidepdf.com/reader/full/chapter-06-computer-organization-and-design-fifth-edition-the-hardwaresoftware 1/57
COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
5th Edition
Chapter 6
Parallel Processors from Client to Cloud
Introduction
Goal: connecting multiple computers to get higher performance
    Multiprocessors
    Scalability, availability, power efficiency
Task-level (process-level) parallelism
    High throughput for independent jobs
Parallel processing program
    Single program run on multiple processors
Multicore microprocessors
    Chips with multiple processors (cores)
§6.1 Introduction
Chapter 6 — Parallel Processors from Client to Cloud — 2
Hardware and Software
Hardware
    Serial: e.g., Pentium 4
    Parallel: e.g., quad-core Xeon e5345
Software
    Sequential: e.g., matrix multiplication
    Concurrent: e.g., operating system
Sequential/concurrent software can run on serial/parallel hardware
    Challenge: making effective use of parallel hardware
What We've Already Covered
§2.11: Parallelism and Instructions
    Synchronization
§3.6: Parallelism and Computer Arithmetic
    Subword Parallelism
§4.10: Parallelism and Advanced Instruction-Level Parallelism
§5.10: Parallelism and Memory Hierarchies
    Cache Coherence
Parallel Programming
Parallel software is the problem
Need to get significant performance improvement
    Otherwise, just use a faster uniprocessor, since it's easier
Difficulties
    Partitioning
    Coordination
    Communications overhead
§6.2 The Difficulty of Creating Parallel Processing Programs
Amdahl's Law
Sequential part can limit speedup
Example: 100 processors, 90× speedup?
    $T_{\text{new}} = T_{\text{parallelizable}}/100 + T_{\text{sequential}}$
    $\text{Speedup} = \dfrac{1}{(1 - F_{\text{parallelizable}}) + F_{\text{parallelizable}}/100} = 90$
    Solving: $F_{\text{parallelizable}} = 0.999$
Need sequential part to be 0.1% of original time
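The Amdahl's Law arithmetic can be checked with a small C sketch. The function names `amdahl_speedup` and `required_fraction` are illustrative and do not come from the slides; the second function simply solves the speedup formula for the parallelizable fraction.

```c
/* Speedup predicted by Amdahl's Law for a program whose parallelizable
   fraction is f (between 0 and 1) when it runs on p processors. */
double amdahl_speedup(double f, int p)
{
    return 1.0 / ((1.0 - f) + f / p);
}

/* Parallelizable fraction needed to reach a target speedup on p processors.
   Derived from 1/target = 1 - f * (1 - 1/p). */
double required_fraction(double target, int p)
{
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / p);
}
```

Calling `required_fraction(90.0, 100)` gives roughly 0.9989, which the slide rounds to 0.999, so only about 0.1% of the original run time may remain sequential.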
Scaling Example
Workload: sum of 10 scalars, and 10 × 10 matrix sum
    Speed up from 10 to 100 processors
Single processor: Time = (10 + 100) × t_add
10 processors
    Time = 10 × t_add + 100/10 × t_add = 20 × t_add
    Speedup = 110/20 = 5.5 (55% of potential)
100 processors
    Time = 10 × t_add + 100/100 × t_add = 11 × t_add
    Speedup = 110/11 = 10 (10% of potential)
Assumes load can be balanced across processors
Scaling Example (cont)
What if matrix size is 100 × 100?
Single processor: Time = (10 + 10000) × t_add
10 processors
    Time = 10 × t_add + 10000/10 × t_add = 1010 × t_add
    Speedup = 10010/1010 = 9.9 (99% of potential)
100 processors
    Time = 10 × t_add + 10000/100 × t_add = 110 × t_add
    Speedup = 10010/110 = 91 (91% of potential)
Assuming load balanced
Strong vs Weak Scaling
Strong scaling: problem size fixed
    As in example
Weak scaling: problem size proportional to number of processors
    10 processors, 10 × 10 matrix
        Time = 20 × t_add
    100 processors, 32 × 32 matrix
        Time = 10 × t_add + 1000/100 × t_add = 20 × t_add
    Constant performance in this example
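The scaling arithmetic on these slides follows one simple timing model, sketched below in C. The names `sum_time` and `speedup` are illustrative, and the model assumes, as the slides do, that the scalar sum stays sequential and the matrix sum is perfectly load balanced.

```c
/* Time, in units of t_add, to add `scalars` values sequentially and then
   sum a dim x dim matrix spread evenly over p processors. */
double sum_time(int scalars, int dim, int p)
{
    return scalars + (double)dim * dim / p;
}

/* Speedup over a single processor for the same (fixed) problem size. */
double speedup(int scalars, int dim, int p)
{
    return sum_time(scalars, dim, 1) / sum_time(scalars, dim, p);
}
```

For strong scaling, `speedup(10, 10, 10)` is 5.5 and `speedup(10, 100, 100)` is 91, matching the slides. For weak scaling, `sum_time(10, 10, 10)` is 20 and `sum_time(10, 32, 100)` is 20.24, which the slide rounds to 20 by treating 32 × 32 as about 1000 elements.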
Instruction and Data Streams
An alternate classification

                                  Data Streams
                                  Single                     Multiple
    Instruction Streams Single    SISD: Intel Pentium 4      SIMD: SSE instructions of x86
                        Multiple  MISD: No examples today    MIMD: Intel Xeon e5345

SPMD: Single Program Multiple Data
    A parallel program on a MIMD computer
    Conditional code for different processors
§6.3 SISD, MIMD, SIMD, SPMD, and Vector
Example: DAXPY (Y = a × X + Y)
Conventional MIPS code
          l.d   $f0,a($sp)      ;load scalar a
          addiu r4,$s0,#512     ;upper bound of what to load
    loop: l.d   $f2,0($s0)      ;load x(i)
          mul.d $f2,$f2,$f0     ;a × x(i)
          l.d   $f4,0($s1)      ;load y(i)
          add.d $f4,$f4,$f2     ;a × x(i) + y(i)
          s.d   $f4,0($s1)      ;store into y(i)
          addiu $s0,$s0,#8      ;increment index to x
          addiu $s1,$s1,#8      ;increment index to y
          subu  $t0,r4,$s0      ;compute bound
          bne   $t0,$zero,loop  ;check if done
Vector MIPS code
          l.d     $f0,a($sp)    ;load scalar a
          lv      $v1,0($s0)    ;load vector x
          mulvs.d $v2,$v1,$f0   ;vector-scalar multiply
          lv      $v3,0($s1)    ;load vector y
          addv.d  $v4,$v2,$v3   ;add y to product
          sv      $v4,0($s1)    ;store the result
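For reference, the loop that both MIPS listings implement can be written in C roughly as follows. This is a minimal sketch; the function name and signature are illustrative rather than taken from the slides. The scalar MIPS loop corresponds to n = 64, since it walks 512 bytes of 8-byte doubles.

```c
#include <stddef.h>

/* DAXPY: Y = a * X + Y over n double-precision elements. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Each iteration is independent of the others, which is what lets the vector version replace the whole loop with six instructions.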
Vector Processors
Highly pipelined function units
Stream data from/to vector registers to units
    Data collected from memory into registers
    Results stored from registers to memory
Example: Vector extension to MIPS
    32 × 64-element registers (64-bit elements)
    Vector instructions
        lv, sv: load/store vector
        addv.d: add vectors of double
        addvs.d: add scalar to each element of vector of double
Significantly reduces instruction-fetch bandwidth
Vector vs. Scalar
Vector architectures and compilers
    Simplify data-parallel programming
    Explicit statement of absence of loop-carried dependences
        Reduced checking in hardware
    Regular access patterns benefit from interleaved and burst memory
    Avoid control hazards by avoiding loops
More general than ad-hoc media extensions (such as MMX, SSE)
    Better match with compiler technology
SIMD
Operate elementwise on vectors of data
    E.g., MMX and SSE instructions in x86
        Multiple data elements in 128-bit wide registers
All processors execute the same instruction at the same time
    Each with different data address, etc.
Simplifies synchronization
Reduced instruction control hardware
Works best for highly data-parallel applications
Vector vs. Multimedia Extensions
Vector instructions have a variable vector width, multimedia extensions have a fixed width
Vector instructions support strided access, multimedia extensions do not
Vector units can be a combination of pipelined and arrayed functional units
Multithreading
Performing multiple threads of execution in parallel
    Replicate registers, PC, etc.
    Fast switching between threads
Fine-grain multithreading
    Switch threads after each cycle
    Interleave instruction execution
    If one thread stalls, others are executed
Coarse-grain multithreading
    Only switch on long stall (e.g., L2-cache miss)
    Simplifies hardware, but doesn't hide short stalls (e.g., data hazards)
§6.4 Hardware Multithreading
Simultaneous Multithreading
In multiple-issue dynamically scheduled processor
    Schedule instructions from multiple threads
    Instructions from independent threads execute when function units are available
    Within threads, dependencies handled by scheduling and register renaming
Example: Intel Pentium-4 HT
    Two threads: duplicated registers, shared function units and caches
Multithreading Example
Future of Multithreading
Will it survive? In what form?
Power considerations ⇒ simplified microarchitectures
    Simpler forms of multithreading
Tolerating cache-miss latency
    Thread switch may be most effective
Multiple simple cores might share resources more effectively
Shared Memory
SMP: shared memory multiprocessor
    Hardware provides single physical address space for all processors
    Synchronize shared variables using locks
    Memory access time
        UMA (uniform) vs. NUMA (nonuniform)
§6.5 Multicore and Other Shared Memory Multiprocessors
Example: Sum Reduction
Sum 100,000 numbers on 100 processor UMA
    Each processor has ID: 0 ≤ Pn ≤ 99
    Partition 1000 numbers per processor
    Initial summation on each processor
        sum[Pn] = 0;
        for (i = 1000*Pn; i < 1000*(Pn+1); i = i + 1)
          sum[Pn] = sum[Pn] + A[i];
Now need to add these partial sums
    Reduction: divide and conquer
    Half the processors add pairs, then quarter, ...
    Need to synchronize between reduction steps
Example: Sum Reduction
    half = 100;
    repeat
      synch();
      if (half%2 != 0 && Pn == 0)
        sum[0] = sum[0] + sum[half-1];
        /* Conditional sum needed when half is odd;
           Processor0 gets missing element */
      half = half/2; /* dividing line on who sums */
      if (Pn < half) sum[Pn] = sum[Pn] + sum[Pn+half];
    until (half == 1);
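To see that the halving scheme really folds every partial sum into sum[0], the reduction can be simulated sequentially in C. This is only a sketch: `tree_reduce` is an illustrative name, the inner loop stands in for the processors that would run concurrently, and the `synch()` barrier is implicit because each simulated step finishes before the next begins. A `while (half > 1)` loop replaces repeat/until so that a single processor is handled too.

```c
/* Sequential simulation of the shared-memory tree reduction.
   On entry sum[] holds one partial sum per processor; the total ends up
   in sum[0], which is also returned. */
double tree_reduce(double sum[], int nproc)
{
    int half = nproc;
    while (half > 1) {
        /* When half is odd, processor 0 picks up the unpaired element. */
        if (half % 2 != 0)
            sum[0] += sum[half - 1];
        half = half / 2;  /* dividing line on who sums */
        /* Each iteration below is the work of one processor Pn < half. */
        for (int pn = 0; pn < half; pn++)
            sum[pn] += sum[pn + half];
    }
    return sum[0];
}
```

With 100 processors the value of half goes 100, 50, 25, 12, 6, 3, 1, so the odd-count fix-up fires at 25 and at 3.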
History of GPUs
Early video cards
    Frame buffer memory with address generation for video output
3D graphics processing
    Originally high-end computers (e.g., SGI)
    Moore's Law ⇒ lower cost, higher density
    3D graphics cards for PCs and game consoles
Graphics Processing Units
    Processors oriented to 3D graphics tasks
    Vertex/pixel processing, shading, texture mapping, rasterization
§6.6 Introduction to Graphics Processing Units
Graphics in the System
GPU Architectures
Processing is highly data-parallel
    GPUs are highly multithreaded
    Use thread switching to hide memory latency
        Less reliance on multi-level caches
    Graphics memory is wide and high-bandwidth
Trend toward general purpose GPUs
    Heterogeneous CPU/GPU systems
    CPU for sequential code, GPU for parallel code
Programming languages/APIs
    DirectX, OpenGL
    C for Graphics (Cg), High Level Shader Language (HLSL)
    Compute Unified Device Architecture (CUDA)
Example: NVIDIA Tesla
Streaming multiprocessor
    8 × Streaming processors
Example: NVIDIA Tesla
Streaming Processors
    Single-precision FP and integer units
    Each SP is fine-grained multithreaded
Warp: group of 32 threads
    Executed in parallel, SIMD style
        8 SPs × 4 clock cycles
    Hardware contexts for 24 warps
        Registers, PCs, ...
Classifying GPUs
Don't fit nicely into SIMD/MIMD model
    Conditional execution in a thread allows an illusion of MIMD
        But with performance degradation
        Need to write general purpose code with care

                                   Static: Discovered    Dynamic: Discovered
                                   at Compile Time       at Runtime
    Instruction-Level Parallelism  VLIW                  Superscalar
    Data-Level Parallelism         SIMD or Vector        Tesla Multiprocessor
GPU Memory Structures
Putting GPUs into Perspective

    Feature                                              Multicore with SIMD   GPU
    SIMD processors                                      4 to 8                8 to 16
    SIMD lanes/processor                                 2 to 4                8 to 16
    Multithreading hardware support for SIMD threads     2 to 4                16 to 32
    Typical ratio of single-precision to
      double-precision performance                       2:1                   2:1
    Largest cache size                                   8 MB                  0.75 MB
    Size of memory address                               64-bit                64-bit
    Size of main memory                                  8 GB to 256 GB        4 GB to 6 GB
    Memory protection at level of page                   Yes                   Yes
    Demand paging                                        Yes                   No
    Integrated scalar processor/SIMD processor           Yes                   No
    Cache coherent                                       Yes                   No
Guide to GPU Terms
Message Passing
Each processor has private physical address space
Hardware sends/receives messages between processors
§6.7 Clusters, WSC, and Other Message-Passing MPs
Loosely Coupled Clusters
Network of independent computers
    Each has private memory and OS
    Connected using I/O system
        E.g., Ethernet/switch, Internet
Suitable for applications with independent tasks
    Web servers, databases, simulations, ...
High availability, scalable, affordable
Problems
    Administration cost (prefer virtual machines)
    Low interconnect bandwidth
        c.f. processor/memory bandwidth on an SMP
Sum Reduction (Again)
Sum 100,000 on 100 processors
First distribute 1000 numbers to each
    Then do partial sums
        sum = 0;
        for (i = 0; i < 1000; i = i + 1)
          sum = sum + AN[i];
Reduction
    Half the processors send, other half receive and add
    Then quarter send, quarter receive and add, ...
Sum Reduction (Again)
Given send() and receive() operations
    limit = 100; half = 100; /* 100 processors */
    repeat
      half = (half+1)/2; /* send vs. receive dividing line */
      if (Pn >= half && Pn < limit) send(Pn - half, sum);
      if (Pn < (limit/2)) sum = sum + receive();
      limit = half; /* upper limit of senders */
    until (half == 1); /* exit with final sum */
Send/receive also provide synchronization
Assumes send/receive take similar time to addition
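The send/receive pattern can be traced with a sequential C sketch in which a "message" is modeled as a direct addition into the receiver's private value. The name `msg_reduce` is illustrative, and real `send()`/`receive()` calls would of course run on separate nodes; this only checks which node talks to which at each step.

```c
/* Sequential simulation of the message-passing reduction.
   local[] holds each node's private partial sum; node 0 ends up with
   the total, which is returned. */
double msg_reduce(double local[], int nproc)
{
    int limit = nproc;  /* upper limit of senders */
    int half = nproc;
    while (half > 1) {
        half = (half + 1) / 2;  /* send vs. receive dividing line */
        /* Node pn sends its sum; node pn - half receives and adds it. */
        for (int pn = half; pn < limit; pn++)
            local[pn - half] += local[pn];
        limit = half;
    }
    return local[0];
}
```

Rounding half up means that when the number of active nodes is odd, the middle node neither sends nor receives in that step and simply carries its value forward.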
Grid Computing
Separate computers interconnected by long-haul networks
    E.g., Internet connections
    Work units farmed out, results sent back
Can make use of idle time on PCs
    E.g., SETI@home, World Community Grid
Interconnection Networks
Network topologies
    Arrangements of processors, switches, and links
    Topologies shown: Bus, Ring, 2D Mesh, N-cube (N = 3), Fully connected
§6.8 Introduction to Multiprocessor Network Topologies
Multistage Networks
Network Characteristics
Performance
    Latency per message (unloaded network)
    Throughput
        Link bandwidth
        Total network bandwidth
        Bisection bandwidth
    Congestion delays (depending on traffic)
Cost
Power
Routability in silicon
Parallel Benchmarks
Linpack: matrix linear algebra
SPECrate: parallel run of SPEC CPU programs
    Job-level parallelism
SPLASH: Stanford Parallel Applications for Shared Memory
    Mix of kernels and applications, strong scaling
NAS (NASA Advanced Supercomputing) suite
    Computational fluid dynamics kernels
PARSEC (Princeton Application Repository for Shared Memory Computers) suite
    Multithreaded applications using Pthreads and OpenMP
§6.10 Multiprocessor Benchmarks and Performance Models
Code or Applications?
Traditional benchmarks
    Fixed code and data sets
Parallel programming is evolving
    Should algorithms, programming languages, and tools be part of the system?
    Compare systems, provided they implement a given application
        E.g., Linpack, Berkeley Design Patterns
Would foster innovation in approaches to parallelism
Modeling Performance
Assume performance metric of interest is achievable GFLOPs/sec
    Measured using computational kernels from Berkeley Design Patterns
Arithmetic intensity of a kernel
    FLOPs per byte of memory accessed
For a given computer, determine
    Peak GFLOPS (from data sheet)
    Peak memory bytes/sec (using Stream benchmark)
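These three quantities combine into the usual roofline bound: attainable GFLOPs/sec is the smaller of peak memory bandwidth times arithmetic intensity and peak floating-point performance. A minimal C sketch of that bound follows; the function and parameter names are illustrative and not taken from the slides.

```c
/* Roofline bound: attainable GFLOP/s is capped either by memory
   bandwidth times arithmetic intensity or by peak FP throughput.
   peak_gflops          peak floating-point rate (GFLOP/s)
   peak_gbytes_per_sec  peak memory bandwidth (GB/s)
   flops_per_byte       arithmetic intensity of the kernel */
double attainable_gflops(double peak_gflops, double peak_gbytes_per_sec,
                         double flops_per_byte)
{
    double memory_bound = peak_gbytes_per_sec * flops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

With made-up peaks of 16 GFLOP/s and 8 GB/s (illustrative values, not measurements), a kernel at 0.5 FLOPs/byte is memory bound at 4 GFLOP/s, while a kernel at 4 FLOPs/byte reaches the 16 GFLOP/s compute ceiling.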
Comparing Systems
Example: Opteron X2 vs. Opteron X4
    2-core vs. 4-core, 2× FP performance/core, 2.2 GHz vs. 2.3 GHz
    Same memory system
To get higher performance on X4 than X2
    Need high arithmetic intensity
    Or working set must fit in X4's 2 MB L-3 cache
Optimizing Performance
Optimize FP performance
    Balance adds & multiplies
    Improve superscalar ILP and use of SIMD instructions
Optimize memory usage
    Software prefetch
        Avoid load stalls
    Memory affinity
        Avoid non-local data accesses
Optimizing Performance
Choice of optimization depends on arithmetic intensity of code
Arithmetic intensity is not always fixed
    May scale with problem size
    Caching reduces memory accesses
        Increases arithmetic intensity
i7-960 vs. NVIDIA Tesla 280/480
§6.11 Real Stuff: Benchmarking and Rooflines i7 vs. Tesla
Benchmarks
Performance Summary
GPU (480) has 4.4× the memory bandwidth
    Benefits memory bound kernels
GPU has 13.1× the single precision throughput, 2.5× the double precision throughput
    Benefits FP compute bound kernels
CPU cache prevents some kernels from becoming memory bound when they otherwise would on GPU
GPUs offer scatter-gather, which assists with kernels with strided data
Lack of synchronization and memory consistency support on GPU limits performance for some kernels
Multithreading DGEMM
Use OpenMP:
    void dgemm (int n, double* A, double* B, double* C)
    {
    #pragma omp parallel for
      for ( int sj = 0; sj < n; sj += BLOCKSIZE )
        for ( int si = 0; si < n; si += BLOCKSIZE )
          for ( int sk = 0; sk < n; sk += BLOCKSIZE )
            do_block(n, si, sj, sk, A, B, C);
    }
§6.12 Going Faster: Multiple Processors and Matrix Multiply
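The slide does not show `do_block`, so the following C sketch pairs the OpenMP loop with a plain scalar helper to make the example complete. The helper body, the value of `BLOCKSIZE`, the column-major layout, and the requirement that n be a multiple of `BLOCKSIZE` are all assumptions made for illustration; an optimized version would typically use SIMD instructions instead.

```c
#define BLOCKSIZE 32  /* assumed block edge; n must be a multiple of it */

/* Multiply one BLOCKSIZE x BLOCKSIZE sub-block: C += A * B, with all three
   n x n matrices stored in column-major order (an assumption). */
static void do_block(int n, int si, int sj, int sk,
                     const double *A, const double *B, double *C)
{
    for (int i = si; i < si + BLOCKSIZE; ++i)
        for (int j = sj; j < sj + BLOCKSIZE; ++j) {
            double cij = C[i + j * n];
            for (int k = sk; k < sk + BLOCKSIZE; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

/* Blocked DGEMM. The pragma splits the outer sj loop across threads, so
   each thread updates its own set of column blocks of C. */
void dgemm(int n, const double *A, const double *B, double *C)
{
#pragma omp parallel for
    for (int sj = 0; sj < n; sj += BLOCKSIZE)
        for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
                do_block(n, si, sj, sk, A, B, C);
}
```

Because only the outermost loop is parallelized, no two threads write the same element of C, so no locking is needed. Compiled without OpenMP support, the pragma is ignored and the code runs sequentially with the same result.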
Multithreaded DGEMM
Fallacies
Amdahl's Law doesn't apply to parallel computers
    Since we can achieve linear speedup
    But only on applications with weak scaling
Peak performance tracks observed performance
    Marketers like this approach!
    But compare Xeon with others in example
    Need to be aware of bottlenecks
§6.13 Fallacies and Pitfalls
Pitfalls
Not developing the software to take account of a multiprocessor architecture
    Example: using a single lock for a shared composite resource
        Serializes accesses, even if they could be done in parallel
        Use finer-granularity locking
Concluding Remarks
Goal: higher performance by using multiple processors
Difficulties
    Developing parallel software
    Devising appropriate architectures
SaaS importance is growing and clusters are a good match
Performance per dollar and performance per Joule drive both mobile and WSC
§6.14 Concluding Remarks
Concluding Remarks (con't)
SIMD and vector operations match multimedia applications and are easy to program