
Page 1:

Parallel Computing

CS 147: Computer Architecture
Instructor: Professor Sin-Min Lee
Spring 2011
By: Alice Cotti

Page 2:

Background

Amdahl's law and Gustafson's law
Dependencies
Race conditions, mutual exclusion, synchronization, and parallel slowdown
Fine-grained, coarse-grained, and embarrassing parallelism

Page 3:

Amdahl's Law

The speed-up of a program from parallelization is limited by the fraction of the program that cannot be parallelized: the sequential portion sets a hard ceiling on the overall speed-up.

[Figure: Amdahl's Law]
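Stated as a formula, if P is the fraction of the program that can be parallelized and N is the number of processors, the overall speed-up is

\[ S(N) = \frac{1}{(1 - P) + \frac{P}{N}} \]

As N grows, S(N) approaches 1/(1 - P): if 90% of a program can be parallelized (P = 0.9), the speed-up can never exceed 10x, no matter how many processors are added.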

Page 4:

Dependencies

Consider the following function, which demonstrates a flow dependency:

1: function Dep(a, b)
2:    c := a·b
3:    d := 2·c
4: end function

Operation 3 in Dep(a, b) cannot be executed before (or even in parallel with) operation 2, because operation 3 reads the value c that operation 2 writes. This read-after-write relationship introduces a flow dependency.

Page 5:

Dependencies

Now consider the following function:

1: function NoDep(a, b)
2:    c := a·b
3:    d := 2·b
4:    e := a+b
5: end function

In this example, there are no dependencies between the instructions, so they can all be run in parallel.
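As a minimal sketch of how such independent statements could actually run concurrently (illustrative C with OpenMP, not from the slides; compile with a flag such as gcc's -fopenmp):

#include <stdio.h>

/* The three statements of NoDep(a, b) have no dependencies,
 * so each can be computed in its own parallel section. */
void no_dep(double a, double b) {
    double c, d, e;
    #pragma omp parallel sections
    {
        #pragma omp section
        c = a * b;       /* independent of d and e */
        #pragma omp section
        d = 2 * b;       /* independent of c and e */
        #pragma omp section
        e = a + b;       /* independent of c and d */
    }
    printf("c=%f d=%f e=%f\n", c, d, e);
}

int main(void) {
    no_dep(3.0, 4.0);
    return 0;
}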

Page 6:

Race condition

A flaw whereby the output or result of the process is unexpectedly and critically dependent on the sequence or timing of other events.

Race conditions can occur in electronic systems, logic circuits, and multithreaded software.

[Figure: Race condition in a logic circuit. Here, Δt1 and Δt2 represent the propagation delays of the logic elements. When the input value (A) changes, the circuit outputs a short spike of duration Δt1.]
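To make the multithreaded-software case concrete, here is an illustrative C/pthreads sketch (not from the slides): two threads increment a shared counter, and without the mutex the final value depends on thread timing.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the shared counter one million times.
 * "counter++" is a read-modify-write; without mutual exclusion,
 * two threads can read the same old value and lose an update. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);   /* remove this pair to observe the race */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}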

Page 7:

Fine-grained, coarse-grained, and embarrassing parallelism

Applications are often classified according to how often their subtasks need to synchronize or communicate with each other.

Fine-grained parallelism: subtasks must communicate many times per second

Coarse-grained parallelism: they do not communicate many times per second

Embarrassingly parallel: subtasks rarely or never have to communicate. Embarrassingly parallel applications are the easiest to parallelize, as the sketch below illustrates.
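A minimal sketch in C with OpenMP (the array contents and scale factor are invented for the example): scaling each element of an array is embarrassingly parallel, since no iteration depends on any other.

#include <stdio.h>

#define N 8

int main(void) {
    double data[N] = {1, 2, 3, 4, 5, 6, 7, 8};

    /* Every iteration touches a different element, so the loop
     * iterations never need to communicate or synchronize. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        data[i] = data[i] * 2.0;   /* independent of every other i */
    }

    for (int i = 0; i < N; i++)
        printf("%.1f ", data[i]);
    printf("\n");
    return 0;
}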

Page 8:

Types of parallelism

Data parallelism
Task parallelism
Bit-level parallelism
Instruction-level parallelism

[Figure: A five-stage pipelined superscalar processor, capable of issuing two instructions per cycle. It can have two instructions in each stage of the pipeline, for a total of up to 10 instructions (shown in green) being executed simultaneously.]

Page 9:

Hardware

Memory and communication
Classes of parallel computers
Multicore computing
Symmetric multiprocessing
Distributed computing

Page 10:

Multicore Computing

PROS

More cores than a dual-core design, so more work can run in parallel

Cores that do not have to share the same bus and bandwidth can be even faster

CONS

Heat dissipation problems

More expensive

Page 11:

Software

Parallel programming languages
Automatic parallelization
Application checkpointing

Page 12:

Parallel programming languages

Concurrent programming languages, libraries, APIs, and parallel programming models (such as Algorithmic Skeletons) have been created for programming parallel computers.

Shared memory
Distributed memory
Distributed shared memory

Page 13:

Automatic parallelization

Automatic parallelization of a sequential program by a compiler is the holy grail of parallel computing. Despite decades of work by compiler researchers, automatic parallelization has had only limited success.

Mainstream parallel programming languages remain either explicitly parallel or (at best) partially implicit, in which a programmer gives the compiler directives for parallelization.
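OpenMP is a concrete example of this directive-based style: the loop below is ordinary sequential C, and the single pragma is a directive asking the compiler to parallelize it (a minimal sketch, not from the slides).

#include <stdio.h>

int main(void) {
    double sum = 0.0;

    /* Without the directive this is plain sequential C.
     * The pragma instructs the compiler to split the iterations
     * across threads and safely combine the partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000; i++) {
        sum += 1.0 / i;   /* partial harmonic sum */
    }

    printf("sum = %f\n", sum);
    return 0;
}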

A few fully implicit parallel programming languages exist—SISAL, Parallel Haskell, and (for FPGAs) Mitrion-C.

Page 14:

Application checkpointing

The larger and more complex a computer is, the more that can go wrong and the shorter the mean time between failures.

Application checkpointing is a technique whereby the computer system takes a "snapshot" of the application, a record of its current state. If the computer fails, this information can be used to restore the program from its last checkpoint rather than restarting it from the beginning.
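A minimal sketch of the idea in C (the file name, state layout, and checkpoint interval are invented for illustration): the program periodically saves its loop counter to disk, and on restart it resumes from the last saved value instead of from the beginning.

#include <stdio.h>

#define CHECKPOINT_FILE "state.ckpt"   /* hypothetical file name */

/* Try to load a previously saved iteration count; start at 0 if none. */
static long load_checkpoint(void) {
    long i = 0;
    FILE *f = fopen(CHECKPOINT_FILE, "rb");
    if (f) {
        if (fread(&i, sizeof i, 1, f) != 1)
            i = 0;
        fclose(f);
    }
    return i;
}

/* Save the current iteration count: the "snapshot" of this
 * (deliberately tiny) application's state. */
static void save_checkpoint(long i) {
    FILE *f = fopen(CHECKPOINT_FILE, "wb");
    if (f) {
        fwrite(&i, sizeof i, 1, f);
        fclose(f);
    }
}

int main(void) {
    for (long i = load_checkpoint(); i < 1000000; i++) {
        /* ... do one unit of real work here ... */
        if (i % 10000 == 0)
            save_checkpoint(i);   /* periodic snapshot */
    }
    save_checkpoint(1000000);
    printf("done\n");
    return 0;
}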

Page 15:

Algorithmic methods

Parallel computing is used in a wide range of fields, from bioinformatics to economics. Common types of problems found in parallel computing applications are:

Dense linear algebra
Sparse linear algebra
Dynamic programming
Finite-state machine simulation

Page 16:

Programming

The parallel architectures of supercomputers often dictate the use of special programming techniques to exploit their speed.

The base language of supercomputer code is, in general, Fortran or C, using special libraries to share data between nodes.
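MPI is one such library. As an illustrative sketch (the ranks, tag, and payload are invented for the example), here node 0 sends an integer to node 1:

#include <mpi.h>
#include <stdio.h>

/* Compile with mpicc and run with, e.g., mpirun -np 2 ./a.out */
int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                                   /* illustrative payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("node 1 received %d from node 0\n", value);
    }

    MPI_Finalize();
    return 0;
}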

The new massively parallel GPGPUs have hundreds of processor cores and are programmed using programming models such as CUDA and OpenCL.

Page 17:

Classes of parallel computers

Parallel computers can be roughly classified according to the level at which the hardware supports parallelism.

Multicore computing
Symmetric multiprocessing
Distributed computing
Specialized parallel computers

Page 18:

Multicore computing

Includes multiple execution units ("cores") on the same chip.

Can issue multiple instructions per cycle from multiple instruction streams. Each core in a multicore processor can potentially be superscalar.

A processor capable of simultaneous multithreading has only one execution unit ("core"), but when that unit is idling (such as during a cache miss), it can process a second thread. IBM's Cell microprocessor, designed for use in the Sony PlayStation 3, is a prominent multicore processor.

Page 19:

Symmetric multiprocessing

A computer system with multiple identical processors that share memory and connect via a bus.

Bus contention prevents bus architectures from scaling. As a result, SMPs generally do not comprise more than 32 processors.

Because of the small size of the processors and the significant reduction in bus bandwidth requirements achieved by large caches, such symmetric multiprocessors are extremely cost-effective.

Page 20:

Distributed computing

A distributed memory computer system in which the processing elements are connected by a network.

Highly scalable.

[Figure: (a), (b) A distributed system. (c) A parallel system.]

Page 21:

Specialized parallel computers

Within parallel computing, there are specialized parallel devices that tend to be applicable to only a few classes of parallel problems.

Reconfigurable computing
General-purpose computing on graphics processing units (GPGPU)
Application-specific integrated circuits
Vector processors

Page 22:

Questions?
