chapter 06 computer organization and design, fifth edition: the hardware/software interface (the...

COMPUTER ORGANIZATION AND DESIGN: The Hardware/Software Interface, 5th Edition
Chapter 6: Parallel Processors from Client to Cloud

Upload: priyanka-meena

Post on 06-Jul-2018

8/17/2019 Chapter 06 Computer Organization and Design, Fifth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design) 5th Edition

http://slidepdf.com/reader/full/chapter-06-computer-organization-and-design-fifth-edition-the-hardwaresoftware 1/57

§6.1 Introduction

Goal: connecting multiple computers to get higher performance
  Multiprocessors
  Scalability, availability, power efficiency
Task-level (process-level) parallelism
  High throughput for independent jobs
Parallel processing program
  Single program run on multiple processors
Multicore microprocessors
  Chips with multiple processors (cores)

Chapter 6 — Parallel Processors from Client to Cloud — 2

Hardware and Software

Hardware
  Serial: e.g., Pentium 4
  Parallel: e.g., quad-core Xeon e5345
Software
  Sequential: e.g., matrix multiplication
  Concurrent: e.g., operating system
Sequential/concurrent software can run on serial/parallel hardware
  Challenge: making effective use of parallel hardware

What We've Already Covered

§2.11: Parallelism and Instructions
  Synchronization
§3.6: Parallelism and Computer Arithmetic
  Subword Parallelism
§4.10: Parallelism and Advanced Instruction-Level Parallelism
§5.10: Parallelism and Memory Hierarchies
  Cache Coherence

§6.2 The Difficulty of Creating Parallel Processing Programs

Parallel Programming

Parallel software is the problem
Need to get significant performance improvement
  Otherwise, just use a faster uniprocessor, since it's easier
Difficulties
  Partitioning
  Coordination
  Communications overhead

Amdahl's Law

Sequential part can limit speedup
Example: 100 processors, 90× speedup?
  \[ T_{\text{new}} = T_{\text{parallelizable}}/100 + T_{\text{sequential}} \]
  \[ \text{Speedup} = \frac{1}{(1 - F_{\text{parallelizable}}) + F_{\text{parallelizable}}/100} = 90 \]
  Solving: \( F_{\text{parallelizable}} = 0.999 \)
Need sequential part to be 0.1% of original time
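
The Amdahl's Law arithmetic above can be checked with a small C sketch. The function names `amdahl_speedup` and `required_fraction` are illustrative and not from the slides; `required_fraction` simply solves the speedup formula for the parallelizable fraction.

```c
/* Speedup predicted by Amdahl's Law when a fraction f of the original
   execution time is parallelizable and runs on p processors. */
double amdahl_speedup(double f, int p)
{
    return 1.0 / ((1.0 - f) + f / p);
}

/* Parallelizable fraction needed to reach a target speedup on p
   processors, obtained by solving the formula above for f:
   1/target = 1 - f * (1 - 1/p). */
double required_fraction(double target, int p)
{
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / p);
}
```

Calling `required_fraction(90.0, 100)` gives a value just under 0.999, which matches the slide's conclusion that the sequential part may be only about 0.1% of the original time.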

Scaling Example

Workload: sum of 10 scalars, and 10 × 10 matrix sum
  Speed up from 10 to 100 processors
Single processor: Time = (10 + 100) × t_add
10 processors
  Time = 10 × t_add + 100/10 × t_add = 20 × t_add
  Speedup = 110/20 = 5.5 (55% of potential)
100 processors
  Time = 10 × t_add + 100/100 × t_add = 11 × t_add
  Speedup = 110/11 = 10 (10% of potential)
Assumes load can be balanced across processors

Scaling Example (cont.)

What if matrix size is 100 × 100?
Single processor: Time = (10 + 10000) × t_add
10 processors
  Time = 10 × t_add + 10000/10 × t_add = 1010 × t_add
  Speedup = 10010/1010 = 9.9 (99% of potential)
100 processors
  Time = 10 × t_add + 10000/100 × t_add = 110 × t_add
  Speedup = 10010/110 = 91 (91% of potential)
Assuming load balanced
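
The scaling figures from both matrix sizes can be reproduced with a short C sketch. It assumes, as the slides do, that the scalar additions stay sequential and that the matrix additions divide evenly across processors; the function names are illustrative.

```c
/* Time, in units of t_add, to sum `scalars` values sequentially and then
   add an n x n matrix whose n*n additions are split evenly over p
   processors (perfect load balance is assumed). */
double time_in_tadd(int scalars, int n, int p)
{
    return scalars + (double)(n * n) / p;
}

/* Speedup over a single processor for the same workload. */
double scaling_speedup(int scalars, int n, int p)
{
    return time_in_tadd(scalars, n, 1) / time_in_tadd(scalars, n, p);
}
```

With the small matrix, `scaling_speedup(10, 10, 100)` is only 10, while with the larger one `scaling_speedup(10, 100, 100)` is 91, because the fixed sequential part becomes a smaller share of the work.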

Strong vs Weak Scaling

Strong scaling: problem size fixed
  As in example
Weak scaling: problem size proportional to number of processors
  10 processors, 10 × 10 matrix
    Time = 20 × t_add
  100 processors, 32 × 32 matrix
    Time = 10 × t_add + 1000/100 × t_add = 20 × t_add
  Constant performance in this example
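
The weak-scaling arithmetic can be written out as a worked equation. This is a sketch that assumes the same 10 sequential scalar additions as the earlier scaling example; the slide rounds the 32 × 32 = 1024 matrix elements to 1000.

```latex
\[
T(p, n) = 10\,t_{\text{add}} + \frac{n^2}{p}\,t_{\text{add}}
\]
\[
T(10, 10) = 10\,t_{\text{add}} + \frac{100}{10}\,t_{\text{add}} = 20\,t_{\text{add}},
\qquad
T(100, 32) = 10\,t_{\text{add}} + \frac{1024}{100}\,t_{\text{add}} \approx 20\,t_{\text{add}}
\]
```

Because the number of matrix elements grows roughly in proportion to the processor count (100 elements on 10 processors, about 1000 on 100), the time per run stays constant.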

§6.3 SISD, MIMD, SIMD, SPMD, and Vector

Instruction and Data Streams

An alternate classification

                         Data Streams
  Instruction Streams    Single                    Multiple
  Single                 SISD: Intel Pentium 4     SIMD: SSE instructions of x86
  Multiple               MISD: No examples today   MIMD: Intel Xeon e5345

SPMD: Single Program Multiple Data
  A parallel program on a MIMD computer
  Conditional code for different processors

Example: DAXPY (Y = a × X + Y)

Conventional MIPS code

        l.d   $f0,a($sp)      ;load scalar a
        addiu r4,$s0,#512     ;upper bound of what to load
  loop: l.d   $f2,0($s0)      ;load x(i)
        mul.d $f2,$f2,$f0     ;a × x(i)
        l.d   $f4,0($s1)      ;load y(i)
        add.d $f4,$f4,$f2     ;a × x(i) + y(i)
        s.d   $f4,0($s1)      ;store into y(i)
        addiu $s0,$s0,#8      ;increment index to x
        addiu $s1,$s1,#8      ;increment index to y
        subu  $t0,r4,$s0      ;compute bound
        bne   $t0,$zero,loop  ;check if done

Vector MIPS code

        l.d     $f0,a($sp)    ;load scalar a
        lv      $v1,0($s0)    ;load vector x
        mulvs.d $v2,$v1,$f0   ;vector-scalar multiply
        lv      $v3,0($s1)    ;load vector y
        addv.d  $v4,$v2,$v3   ;add y to product
        sv      $v4,0($s1)    ;store the result
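
For readers less familiar with MIPS, the computation both listings perform can be sketched in C. This is an illustrative equivalent, not code from the slides; in the scalar listing the loop covers 512 bytes of 8-byte doubles, so `n` would be 64 there.

```c
#include <stddef.h>

/* DAXPY: y[i] = a * x[i] + y[i] for every i in [0, n).
   Each iteration is independent of the others, which is what lets a
   vector machine replace the whole loop with a handful of instructions. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```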

Vector Processors

Highly pipelined function units
Stream data from/to vector registers to units
  Data collected from memory into registers
  Results stored from registers to memory
Example: Vector extension to MIPS
  32 × 64-element registers (64-bit elements)
  Vector instructions
    lv, sv: load/store vector
    addv.d: add vectors of double
    addvs.d: add scalar to each element of vector of double
Significantly reduces instruction-fetch bandwidth

Vector vs. Scalar

Vector architectures and compilers
  Simplify data-parallel programming
  Explicit statement of absence of loop-carried dependences
    Reduced checking in hardware
  Regular access patterns benefit from interleaved and burst memory
  Avoid control hazards by avoiding loops
More general than ad-hoc media extensions (such as MMX, SSE)
  Better match with compiler technology

SIMD

Operate elementwise on vectors of data
  E.g., MMX and SSE instructions in x86
    Multiple data elements in 128-bit wide registers
All processors execute the same instruction at the same time
  Each with different data address, etc.
Simplifies synchronization
Reduced instruction control hardware
Works best for highly data-parallel applications

Vector vs. Multimedia Extensions

Vector instructions have a variable vector width, multimedia extensions have a fixed width
Vector instructions support strided access, multimedia extensions do not
Vector units can be a combination of pipelined and arrayed functional units
  [Figure]

§6.4 Hardware Multithreading

Multithreading

Performing multiple threads of execution in parallel
  Replicate registers, PC, etc.
  Fast switching between threads
Fine-grain multithreading
  Switch threads after each cycle
  Interleave instruction execution
  If one thread stalls, others are executed
Coarse-grain multithreading
  Only switch on long stall (e.g., L2-cache miss)
  Simplifies hardware, but doesn't hide short stalls (e.g., data hazards)

Simultaneous Multithreading

In multiple-issue dynamically scheduled processor
  Schedule instructions from multiple threads
  Instructions from independent threads execute when function units are available
  Within threads, dependencies handled by scheduling and register renaming
Example: Intel Pentium-4 HT
  Two threads: duplicated registers, shared function units and caches

Multithreading Example

[Figure]

Future of Multithreading

Will it survive? In what form?
Power considerations ⇒ simplified microarchitectures
  Simpler forms of multithreading
Tolerating cache-miss latency
  Thread switch may be most effective
Multiple simple cores might share resources more effectively

§6.5 Multicore and Other Shared Memory Multiprocessors

Shared Memory

SMP: shared memory multiprocessor
  Hardware provides single physical address space for all processors
  Synchronize shared variables using locks
  Memory access time
    UMA (uniform) vs. NUMA (nonuniform)

Example: Sum Reduction

Sum 100,000 numbers on 100 processor UMA
  Each processor has ID: 0 ≤ Pn ≤ 99
  Partition 1000 numbers per processor
  Initial summation on each processor

    sum[Pn] = 0;
    for (i = 1000*Pn; i < 1000*(Pn+1); i = i + 1)
      sum[Pn] = sum[Pn] + A[i];

Now need to add these partial sums
  Reduction: divide and conquer
  Half the processors add pairs, then quarter, ...
  Need to synchronize between reduction steps

Example: Sum Reduction (cont.)

    half = 100;
    repeat
      synch();
      if (half%2 != 0 && Pn == 0)
        sum[0] = sum[0] + sum[half-1];
        /* Conditional sum needed when half is odd;
           Processor0 gets missing element */
      half = half/2; /* dividing line on who sums */
      if (Pn < half) sum[Pn] = sum[Pn] + sum[Pn+half];
    until (half == 1);
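
The shared-memory reduction can be traced on a single core with the following C sketch. It is a sequential simulation, not a parallel program: each inner loop iteration stands in for the work one processor `Pn` would do between two `synch()` barriers, and the names `NPROC`, `PER_PROC`, and `sum_reduction` are assumptions made for illustration.

```c
#define NPROC 100        /* number of simulated processors */
#define PER_PROC 1000    /* numbers summed by each processor */

/* Sequential simulation of the shared-memory sum reduction over an
   array A of NPROC * PER_PROC elements. */
double sum_reduction(const double *A)
{
    double sum[NPROC];

    /* Phase 1: every processor sums its own slice of A. */
    for (int Pn = 0; Pn < NPROC; Pn++) {
        sum[Pn] = 0;
        for (int i = PER_PROC * Pn; i < PER_PROC * (Pn + 1); i++)
            sum[Pn] += A[i];
    }

    /* Phase 2: tree reduction of the partial sums. */
    int half = NPROC;
    do {
        /* synch() would go here in a real parallel program */
        if (half % 2 != 0)                 /* odd count: processor 0 */
            sum[0] += sum[half - 1];       /* picks up the extra element */
        half = half / 2;                   /* dividing line on who sums */
        for (int Pn = 0; Pn < half; Pn++)  /* processors below the line */
            sum[Pn] += sum[Pn + half];
    } while (half != 1);

    return sum[0];
}
```

The simulation is faithful because, within one step, every write goes to an index below `half` and every read comes from an index at or above it, so the order in which the simulated processors run does not change the result.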

§6.6 Introduction to Graphics Processing Units

History of GPUs

Early video cards
  Frame buffer memory with address generation for video output
3D graphics processing
  Originally high-end computers (e.g., SGI)
  Moore's Law ⇒ lower cost, higher density
  3D graphics cards for PCs and game consoles
Graphics Processing Units
  Processors oriented to 3D graphics tasks
  Vertex/pixel processing, shading, texture mapping, rasterization

Graphics in the System

[Figure]

GPU Architectures

Processing is highly data-parallel
  GPUs are highly multithreaded
  Use thread switching to hide memory latency
    Less reliance on multi-level caches
  Graphics memory is wide and high-bandwidth
Trend toward general purpose GPUs
  Heterogeneous CPU/GPU systems
  CPU for sequential code, GPU for parallel code
Programming languages/APIs
  DirectX, OpenGL
  C for Graphics (Cg), High Level Shader Language (HLSL)
  Compute Unified Device Architecture (CUDA)

Example: NVIDIA Tesla

[Figure: a streaming multiprocessor containing 8 × streaming processors]

Example: NVIDIA Tesla (cont.)

Streaming Processors
  Single-precision FP and integer units
  Each SP is fine-grained multithreaded
Warp: group of 32 threads
  Executed in parallel, SIMD style
    8 SPs × 4 clock cycles
  Hardware contexts for 24 warps
    Registers, PCs, ...

Classifying GPUs

Don't fit nicely into SIMD/MIMD model
  Conditional execution in a thread allows an illusion of MIMD
    But with performance degradation
    Need to write general purpose code with care

                                 Static: Discovered    Dynamic: Discovered
                                 at Compile Time       at Runtime
  Instruction-Level Parallelism  VLIW                  Superscalar
  Data-Level Parallelism         SIMD or Vector        Tesla Multiprocessor

GPU Memory Structures

[Figure]

Putting GPUs into Perspective

  Feature                                      Multicore with SIMD   GPU
  SIMD processors                              4 to 8                8 to 16
  SIMD lanes/processor                         2 to 4                8 to 16
  Multithreading hardware support for
    SIMD threads                               2 to 4                16 to 32
  Typical ratio of single-precision to
    double-precision performance               2:1                   2:1
  Largest cache size                           8 MB                  0.75 MB
  Size of memory address                       64-bit                64-bit
  Size of main memory                          8 GB to 256 GB        4 GB to 6 GB
  Memory protection at level of page           Yes                   Yes
  Demand paging                                Yes                   No
  Integrated scalar processor/SIMD processor   Yes                   No
  Cache coherent                               Yes                   No

Guide to GPU Terms

[Figure]

§6.7 Clusters, WSC, and Other Message-Passing MPs

Message Passing

Each processor has private physical address space
Hardware sends/receives messages between processors

Loosely Coupled Clusters

Network of independent computers
  Each has private memory and OS
  Connected using I/O system
    E.g., Ethernet/switch, Internet
Suitable for applications with independent tasks
  Web servers, databases, simulations, ...
High availability, scalable, affordable
Problems
  Administration cost (prefer virtual machines)
  Low interconnect bandwidth
    c.f. processor/memory bandwidth on an SMP

Sum Reduction (Again)

Sum 100,000 on 100 processors
First distribute 1000 numbers to each
  Then do partial sums

    sum = 0;
    for (i = 0; i<1000; i = i + 1)
      sum = sum + AN[i];

Reduction
  Half the processors send, other half receive and add
  Then quarter send, quarter receive and add, ...

Sum Reduction (Again) (cont.)

Given send() and receive() operations

    limit = 100; half = 100; /* 100 processors */
    repeat
      half = (half+1)/2; /* send vs. receive dividing line */
      if (Pn >= half && Pn < limit)
        send(Pn - half, sum);
      if (Pn < (limit/2))
        sum = sum + receive();
      limit = half; /* upper limit of senders */
    until (half == 1); /* exit with final sum */

Send/receive also provide synchronization
Assumes send/receive take similar time to addition
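
The send/receive pattern can be followed step by step with a single-process C sketch. This is a simulation under stated assumptions, not real message passing: `sum[Pn]` stands for processor `Pn`'s private variable, `mailbox[Pn]` holds the one message in flight to `Pn`, and the names `NPROC` and `message_passing_reduction` are illustrative.

```c
#define NPROC 100   /* number of simulated processors */

/* Simulates the message-passing reduction. On entry sum[Pn] holds the
   partial sum of processor Pn; the array is updated in place and the
   final total ends up in sum[0], which is also returned. */
double message_passing_reduction(double sum[NPROC])
{
    double mailbox[NPROC];
    int limit = NPROC, half = NPROC;

    do {
        half = (half + 1) / 2;               /* send vs. receive dividing line */
        for (int Pn = half; Pn < limit; Pn++)
            mailbox[Pn - half] = sum[Pn];    /* send(Pn - half, sum) */
        for (int Pn = 0; Pn < limit / 2; Pn++)
            sum[Pn] += mailbox[Pn];          /* sum = sum + receive() */
        limit = half;                        /* upper limit of senders */
    } while (half != 1);

    return sum[0];
}
```

When `limit` is odd, the processor just below the dividing line neither sends nor receives in that round; it keeps its value and becomes a sender in the next round, which is why `half` is rounded up.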

Grid Computing

Separate computers interconnected by long-haul networks
  E.g., Internet connections
  Work units farmed out, results sent back
Can make use of idle time on PCs
  E.g., SETI@home, World Community Grid

§6.8 Introduction to Multiprocessor Network Topologies

Interconnection Networks

Network topologies
  Arrangements of processors, switches, and links
  [Figure: Bus, Ring, 2D Mesh, N-cube (N = 3), Fully connected]

Multistage Networks

[Figure]

Network Characteristics

Performance
  Latency per message (unloaded network)
  Throughput
    Link bandwidth
    Total network bandwidth
    Bisection bandwidth
  Congestion delays (depending on traffic)
Cost
Power
Routability in silicon
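
To make total network bandwidth and bisection bandwidth concrete, here is a small C sketch for two topologies, a ring and a fully connected network. It assumes bidirectional links of equal speed and reports results in units of one link's bandwidth; the function names are illustrative.

```c
/* All figures are in units of one link's bandwidth and assume
   bidirectional links of equal speed between P processors. */

/* Ring: P links in total; cutting the ring in half severs 2 links. */
int ring_total_bandwidth(int P)     { return P; }
int ring_bisection_bandwidth(int P) { (void)P; return 2; }

/* Fully connected: one link per pair of processors; a bisection cuts
   every link that joins the two halves (P is assumed to be even). */
int full_total_bandwidth(int P)     { return P * (P - 1) / 2; }
int full_bisection_bandwidth(int P) { return (P / 2) * (P / 2); }
```

For 64 processors this gives 64 and 2 for the ring against 2016 and 1024 for the fully connected network, which illustrates the cost/performance trade-off between the two extremes.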

§6.10 Multiprocessor Benchmarks and Performance Models

Parallel Benchmarks

Linpack: matrix linear algebra
SPECrate: parallel run of SPEC CPU programs
  Job-level parallelism
SPLASH: Stanford Parallel Applications for Shared Memory
  Mix of kernels and applications, strong scaling
NAS (NASA Advanced Supercomputing) suite
  Computational fluid dynamics kernels
PARSEC (Princeton Application Repository for Shared Memory Computers) suite
  Multithreaded applications using Pthreads and OpenMP

Code or Applications?

Traditional benchmarks
  Fixed code and data sets
Parallel programming is evolving
  Should algorithms, programming languages, and tools be part of the system?
  Compare systems, provided they implement a given application
    E.g., Linpack, Berkeley Design Patterns
Would foster innovation in approaches to parallelism

Modeling Performance

Assume performance metric of interest is achievable GFLOPs/sec
  Measured using computational kernels from Berkeley Design Patterns
Arithmetic intensity of a kernel
  FLOPs per byte of memory accessed
For a given computer, determine
  Peak GFLOPS (from data sheet)
  Peak memory bytes/sec (using Stream benchmark)
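
The three quantities above combine in the roofline model, where attainable performance is the lower of the compute ceiling and the memory-bandwidth ceiling. The C sketch below shows that relationship; the function name and the numbers in the usage note are hypothetical, not measurements from the slides.

```c
/* Attainable GFLOPs/sec under the roofline model: the lower of
   peak floating-point performance and
   peak memory bandwidth x arithmetic intensity. */
double attainable_gflops(double peak_gflops,
                         double peak_gbytes_per_sec,
                         double flops_per_byte)
{
    double memory_bound = peak_gbytes_per_sec * flops_per_byte;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

For a hypothetical machine with a 16 GFLOPs/sec peak and 10 GB/sec of memory bandwidth, a kernel with 0.5 FLOPs/byte is memory bound at 5 GFLOPs/sec, while a kernel with 4 FLOPs/byte reaches the 16 GFLOPs/sec compute ceiling.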

Comparing Systems

Example: Opteron X2 vs. Opteron X4
  2-core vs. 4-core, 2× FP performance/core, 2.2GHz vs. 2.3GHz
  Same memory system
To get higher performance on X4 than X2
  Need high arithmetic intensity
  Or working set must fit in X4's 2MB L-3 cache

Optimizing Performance

Optimize FP performance
  Balance adds & multiplies
  Improve superscalar ILP and use of SIMD instructions
Optimize memory usage
  Software prefetch
    Avoid load stalls
  Memory affinity
    Avoid non-local data accesses

Optimizing Performance (cont.)

Choice of optimization depends on arithmetic intensity of code
Arithmetic intensity is not always fixed
  May scale with problem size
  Caching reduces memory accesses
    Increases arithmetic intensity

§6.11 Real Stuff: Benchmarking and Rooflines i7 vs. Tesla

i7-960 vs. NVIDIA Tesla 280/480

[Figure]

Benchmarks

[Figure]

Performance Summary

GPU (480) has 4.4 X the memory bandwidth
  Benefits memory bound kernels
GPU has 13.1 X the single precision throughput, 2.5 X the double precision throughput
  Benefits FP compute bound kernels
CPU cache prevents some kernels from becoming memory bound when they otherwise would on GPU
GPUs offer scatter-gather, which assists with kernels with strided data
Lack of synchronization and memory consistency support on GPU limits performance for some kernels

§6.12 Going Faster: Multiple Processors and Matrix Multiply

Multithreading DGEMM

Use OpenMP:

    void dgemm (int n, double* A, double* B, double* C)
    {
    #pragma omp parallel for
      for ( int sj = 0; sj < n; sj += BLOCKSIZE )
        for ( int si = 0; si < n; si += BLOCKSIZE )
          for ( int sk = 0; sk < n; sk += BLOCKSIZE )
            do_block(n, si, sj, sk, A, B, C);
    }
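
The listing calls `do_block` without showing it. The sketch below supplies a plain scalar version so the routine can be compiled and tried; it is an assumption for illustration (column-major storage, `BLOCKSIZE` of 32, `n` a multiple of `BLOCKSIZE`) and not necessarily the optimized block routine the chapter uses.

```c
#define BLOCKSIZE 32   /* assumed block edge; n must be a multiple of it */

/* Multiply one BLOCKSIZE x BLOCKSIZE block, accumulating into C:
   rows si.., columns sj.., inner-dimension slice sk...
   Matrices are n x n and stored in column-major order. */
static void do_block(int n, int si, int sj, int sk,
                     const double *A, const double *B, double *C)
{
    for (int i = si; i < si + BLOCKSIZE; ++i)
        for (int j = sj; j < sj + BLOCKSIZE; ++j) {
            double cij = C[i + j * n];
            for (int k = sk; k < sk + BLOCKSIZE; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

/* C = C + A * B. The outer sj loop is split across threads when the
   compiler has OpenMP enabled; otherwise the pragma is ignored and the
   code runs serially with the same result. */
void dgemm(int n, const double *A, const double *B, double *C)
{
#pragma omp parallel for
    for (int sj = 0; sj < n; sj += BLOCKSIZE)
        for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
                do_block(n, si, sj, sk, A, B, C);
}
```

Parallelizing over `sj` is safe here because each thread then writes a disjoint set of columns of `C`, so no locking is needed.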

Multithreaded DGEMM

[Two figures]

§6.13 Fallacies and Pitfalls

Fallacies

Amdahl's Law doesn't apply to parallel computers
  Since we can achieve linear speedup
  But only on applications with weak scaling
Peak performance tracks observed performance
  Marketers like this approach!
  But compare Xeon with others in example
  Need to be aware of bottlenecks

Pitfalls

Not developing the software to take account of a multiprocessor architecture
  Example: using a single lock for a shared composite resource
    Serializes accesses, even if they could be done in parallel
    Use finer-granularity locking
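
One way to picture finer-granularity locking is a table of counters in which each bucket has its own POSIX mutex. The sketch below is a minimal illustration under that assumption; `struct counter_table`, `NBUCKETS`, and the function names are hypothetical and do not come from the slides.

```c
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 16   /* hypothetical number of independently locked parts */

/* A shared composite resource: a table of counters. Giving each bucket
   its own lock lets threads that touch different buckets proceed in
   parallel instead of serializing on one table-wide lock. */
struct counter_table {
    pthread_mutex_t lock[NBUCKETS];
    long count[NBUCKETS];
};

/* Initialize every bucket's lock and zero its counter. */
void table_init(struct counter_table *t)
{
    for (int i = 0; i < NBUCKETS; i++) {
        pthread_mutex_init(&t->lock[i], NULL);
        t->count[i] = 0;
    }
}

/* Add delta to the counter selected by key, locking only that bucket. */
void table_add(struct counter_table *t, unsigned key, long delta)
{
    unsigned b = key % NBUCKETS;
    pthread_mutex_lock(&t->lock[b]);
    t->count[b] += delta;
    pthread_mutex_unlock(&t->lock[b]);
}
```

With a single lock for the whole table, two threads updating buckets 3 and 7 would wait on each other; with per-bucket locks they contend only when their keys map to the same bucket.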

§6.14 Concluding Remarks

Goal: higher performance by using multiple processors
Difficulties
  Developing parallel software
  Devising appropriate architectures
SaaS importance is growing and clusters are a good match
Performance per dollar and performance per Joule drive both mobile and WSC

Concluding Remarks (cont.)

SIMD and vector operations match multimedia applications and are easy to program