
Page 1:

Data Structures and Algorithms in Parallel Computing

Lecture 1

Page 2:

Parallel computing

• Form of computation in which many calculations are done simultaneously
• Divide and conquer
  – Split the problem and solve each sub-problem in parallel
  – Pay the communication cost

Page 3:

A bit of history

• 1958
  – S. Gill discusses parallel programming
  – J. Cocke and D. Slotnick discuss parallel numerical computing
• 1967
  – Amdahl’s law is introduced
    • Defines the speed-up due to parallelism
• 1969
  – Honeywell introduces the first symmetric multiprocessor
    • It allowed for up to 8 parallel processors
• July 2015
  – China’s Tianhe-2 is the fastest computer in the world
    • 33.86 petaflops

Page 4:

Classification

• Bit level
  – Increase word size to reduce the number of instructions
    • 2 instructions to add a 16-bit number on an 8-bit processor
    • 1 instruction to add a 16-bit number on a 16-bit processor
• Instruction level
  – Hardware level
  – Software level
  – Example (sketched in code below):
    1. e = a + b
    2. f = c + d
    3. m = e * f
    • Statement 3 depends on 1 and 2, both of which can be executed in parallel
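As a rough illustration of the software-level case, here is a minimal Python sketch (Python and the thread pool are illustrative choices, not part of the lecture) that runs statements 1 and 2 in parallel and only then computes statement 3:

```python
# Minimal sketch: statements 1 and 2 are independent, statement 3
# depends on both. The thread pool is only a stand-in; real
# instruction-level parallelism happens inside the processor, but the
# dependency structure is identical.
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

a, b, c, d = 1, 2, 3, 4

with ThreadPoolExecutor(max_workers=2) as pool:
    e_future = pool.submit(add, a, b)           # 1. e = a + b
    f_future = pool.submit(add, c, d)           # 2. f = c + d
    m = e_future.result() * f_future.result()   # 3. m = e * f (waits for 1 and 2)

print(m)  # (1 + 2) * (3 + 4) = 21
```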

Page 5:

Classification (2)

• Data parallelism
  – Big Data
    • Volume, Velocity, Variety, Veracity
    • Does not fit in memory
  – Split data among different processors
    • Each processor executes the same code on a different piece of data
  – MapReduce (see the sketch below)
• Task parallelism
  – Distribute tasks on processors and execute them in parallel
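The data-parallel pattern can be made concrete with a short sketch. This is a minimal MapReduce-style example assuming Python's multiprocessing module; the 4-way split and the sum-of-squares workload are illustrative choices, not from the slides:

```python
# Data parallelism: split the data among workers, run the SAME code on
# each piece ("map"), then combine the partial results ("reduce").
from multiprocessing import Pool

def process_chunk(chunk):
    # Every worker executes the same function on a different data piece.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1000))
    chunks = [data[i::4] for i in range(4)]        # split among 4 processors
    with Pool(processes=4) as pool:
        partial = pool.map(process_chunk, chunks)  # map phase
    print(sum(partial))                            # reduce phase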

Page 6:

Architecture classification

• Flynn’s taxonomy (1966)
  – Single Instruction Single Data stream (SISD)
    • No parallelism
    • Uniprocessor PCs
  – Single Instruction Multiple Data streams (SIMD)
    • Data parallelism
    • GPUs
  – Multiple Instructions Single Data stream (MISD)
    • Fault tolerant systems
  – Multiple Instructions Multiple Data streams (MIMD)
    • Different tasks handle different data streams
    • Distributed computing

Page 7:

Architecture classification (2)

• MIMD can be further divided:
  – Single Program Multiple Data (SPMD)
    • Autonomous processors execute the same program asynchronously
  – Multiple Program Multiple Data (MPMD)
    • Autonomous processors execute different programs
    • Manager/worker strategy

Page 8:

Memory models

• Shared memory
  – Multiple programs access the same memory
  – Example: Cray machines
• Distributed shared memory
  – Memory is physically distributed
  – Programs access the same address space
• Distributed memory
  – Each processor has its own private memory
  – Example: grid computing

Page 9:

Need for speed

• Amdahl’s law
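The transcript omits the figure from this slide; for reference, the standard statement of the law (a well-known formula, not reconstructed from the slide itself) gives the speed-up S on N processors when a fraction P of the program can be parallelized:

```latex
% Amdahl's law: the serial fraction (1 - P) bounds the achievable speed-up.
S(N) = \frac{1}{(1 - P) + \frac{P}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - P}
```

For example, with P = 0.95 the speed-up can never exceed 1 / 0.05 = 20, no matter how many processors are added.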

Page 10:

Algorithm design

• How to transform a sequential algorithm into a parallel one?
• Example: compute the sum of n numbers (sketched in code below)
  – Numbers are stored in an array A
  1. Pair A[i] with A[i+1]
  2. Add each pair on its own machine
     • We need n/2 machines
     • We obtain a new sequence of n/2 numbers
  3. Repeat from step 1
  4. After log2 n iterations we get a sequence of 1 number: the sum
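A minimal sketch of this pairwise scheme, using Python's multiprocessing as a stand-in for the machines (the padding of odd-length sequences is an implementation detail, not from the slide):

```python
# Pairwise parallel sum: every round pairs up neighbours and adds each
# pair on a worker, halving the sequence until one number remains.
from multiprocessing import Pool

def add_pair(pair):
    return pair[0] + pair[1]

def parallel_sum(a):
    with Pool() as pool:
        while len(a) > 1:
            if len(a) % 2:                 # pad odd-length sequences
                a = a + [0]
            pairs = [(a[i], a[i + 1]) for i in range(0, len(a), 2)]
            a = pool.map(add_pair, pairs)  # one round: n numbers -> n/2
    return a[0]

if __name__ == "__main__":
    print(parallel_sum(list(range(1, 17))))  # 136, after log2(16) = 4 rounds
```

Each round halves the sequence, which is exactly where the log2 n iteration count comes from.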

Page 11:

Modeling parallel computations

• No consensus on the right model
• Parallel Random-Access Machine (PRAM)
  – Ignores many of the computer architecture details
  – Captures enough detail for reasonable accuracy
  – Each CPU operation, including arithmetic and logical operations and memory accesses, requires 1 time step

Page 12:

Multiprocessor model

• Local memory
  – Each processor has its own local memory
  – Processors are attached to a local network
• Modular memory
  – M memory modules
• Parallel RAM (PRAM)
  – Shared memory
  – No real machine lives up to its ideal of unit-time access to a shared memory

Page 13:

Network limitations

• Communication bottlenecks
• Bus topology
  – Processors take turns to access the bus
• 2-dimensional mesh
  – Remote accesses are done by routing messages
  – Appears in local memory machines
• Multistage network
  – Used to connect one set of input switches to another set of output switches
  – Designed for telephone networks
  – Appears in modular memory machines
  – Processors are attached to input switches and memory to output switches

Page 14:

Network limitations (2)

• Algorithms designed for one topology may not work for another

• Algorithms considering network topology are more complicated than the ones designed for simpler models such as PRAM

Page 15:

Model routing capabilities

• Alternative to topology modeling
• Consider:
  – Bandwidth
    • Rate at which a processor can inject data into the network
  – Latency
    • Time to traverse the network

Page 16:

Model routing capabilities (2)

• Existing models:
  – Postal model
    • Models only latency
  – Bulk Synchronous Parallel (BSP)
    • Adds g, i.e., the minimum ratio of computation steps to communication steps
  – LogP
    • Adds o, i.e., the overhead of a processor upon sending/receiving a message
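As a concrete reading of these parameters (the formula is the standard one from the LogP literature, not spelled out on the slide), the cost of a single point-to-point message combines the two overheads with the network latency L:

```latex
% LogP: the sender pays overhead o, the network adds latency L,
% and the receiver pays overhead o again.
T_{\mathrm{message}} = o + L + o = L + 2o
```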

Page 17:

Primitive operations

• Basic operations that processors and the network can perform
  – All processors can perform the same local instructions as the single processor in the RAM model
  – Processors can also issue non-local memory requests
    • For message passing
    • For global operations

Page 18:

Restrictions on operations

• Restrictions on operations can exist
  – E.g., two processors may not write to the same memory location at the same time
• Exclusive vs. concurrent access
  – Exclusive read exclusive write (EREW)
  – Concurrent read concurrent write (CRCW)
  – Concurrent read exclusive write (CREW)
• Resolving concurrent writes (sketched below)
  – Random picking
  – Priority picking
  – Queued access: queued read queued write
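A minimal sketch of priority picking, assuming a simulated single memory cell and the common convention that the lowest-numbered processor wins (both the convention and the helper function are illustrative, not from the slide):

```python
# CRCW write resolution by priority picking: among all processors
# writing the same cell in one step, the lowest-numbered one wins.
def crcw_write(requests):
    """requests: list of (processor_id, value) pairs targeting one cell."""
    winner = min(requests, key=lambda r: r[0])
    return winner[1]

# Processors 3, 1 and 7 try to write the same location in the same step.
print(crcw_write([(3, "c"), (1, "a"), (7, "g")]))  # "a": processor 1 wins

# Random picking would instead use random.choice(requests); queued
# access would serialize the writes over several steps.
```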

Page 19:

Examples of operations

• Read/write to non-local memory or other processors
• Synchronization
• Broadcast messages to processors
• Gather messages from processors

Page 20:

Work-depth model

• Focus on the algorithm instead of the multiprocessor model
• Cost of an algorithm is determined by the number of operations and their dependencies (a worked instance follows below):

  P = W / D

  – where W is the total number of operations (work)
  – and D is the longest chain of dependencies among them (depth)
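Applying this to the summation example from the algorithm design slide (a worked instance added here, not on the original slide): summing n numbers takes n − 1 additions in total, and the longest dependency chain runs through the log2 n pairwise rounds, so

```latex
W = n - 1, \qquad D = \log_2 n, \qquad P = \frac{W}{D} = \frac{n - 1}{\log_2 n}
```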

Page 21:

Types of work-depth models

• Vector model
  – Sequence of steps operating on a vector
• Circuit model
  – Nodes (operations) and directed arcs (communication)
  – Input arcs
    • Provide input to the whole circuit
  – Output arcs
    • Return the final output values of the circuit
  – No directed cycles allowed
• Language model

Page 22:

Circuit model example
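The transcript drops the figure from this slide. As a stand-in (an assumed example, not the actual slide content), here is a small Python sketch that builds the summation circuit from the earlier sum example as a DAG of addition nodes and measures its work and depth:

```python
# Circuit model as a DAG: operation nodes connected by directed arcs,
# no cycles. Work = number of operation nodes; depth = longest
# input-to-output path.

def sum_circuit(n):
    """Build a balanced binary addition tree over n input arcs.
    Returns a dict mapping each '+' node to its two input arcs/nodes."""
    nodes = {}
    frontier = [f"x{i}" for i in range(n)]   # input arcs of the circuit
    level = 0
    while len(frontier) > 1:
        level += 1
        nxt = []
        for i in range(0, len(frontier) - 1, 2):
            name = f"add{level}_{i // 2}"
            nodes[name] = [frontier[i], frontier[i + 1]]
            nxt.append(name)
        if len(frontier) % 2:                # odd element passes through
            nxt.append(frontier[-1])
        frontier = nxt
    return nodes

def depth(nodes, name):
    """Longest path from an input arc to this node (inputs have depth 0)."""
    if name not in nodes:
        return 0
    return 1 + max(depth(nodes, d) for d in nodes[name])

circuit = sum_circuit(8)
output = max(circuit, key=lambda n: depth(circuit, n))  # the output node
print("work  =", len(circuit))            # 7 '+' nodes for n = 8
print("depth =", depth(circuit, output))  # log2(8) = 3 dependent levels
```

By construction the graph has no directed cycles, matching the requirement on the previous slide.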

Page 23:

Importance of cost

• Cost can be applied to multiprocessor models too
  – The work is equal to the number of processors times the time required for the algorithm to finish
  – The depth is equal to the total time required to execute the algorithm
    • E.g., weather forecasting, real-time planning

A parallel algorithm is work-efficient if, asymptotically, it requires at most a constant factor more work than the best sequential algorithm known.

Page 24:

What’s next?

• Parallel algorithmic techniques
  – Divide and conquer
  – Randomization
  – Parallel pointer techniques
• Graphs
  – Breadth first search
  – Connected components
  – PageRank
  – Single source shortest path
  – Vertex centric vs. subgraph centric models
• Sorting
  – Quicksort
  – Radix sort
• Computational geometry
  – Closest pair
  – Planar convex hull
• Numerical algorithms
  – Matrix operations
  – Fourier transform

Page 25:

Evaluation

• 100% of the grade comes from projects & assignments
  – 7 assignments, each requiring the implementation of a parallel algorithm
  – Passing grade if at least 2 are completed