Basic of Parallel Algorithm Design

Upload: cleopatra2121

Post on 04-Apr-2018

  • 7/29/2019 Basic of Parallel Algorithm Design

    1/16

Uses some of the slides for Chapters 3 and 5 accompanying

    Introduction to Parallel Computing,

    Addison Wesley, 2003.

    http://www-users.cs.umn.edu/~karypis/parbook


Preliminaries: Decomposition, Tasks, and Dependency Graphs

    A parallel algorithm should decompose the problem into tasks that can be executed concurrently.

    A decomposition can be illustrated in the form of a directed graph (task dependency graph): nodes = tasks and edges = dependencies; the result of one task is required for processing the next.

    The degree of concurrency determines the number of tasks that can indeed run in parallel.

    The critical path is the longest path in the dependency graph; it gives a lower bound on program runtime.
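The critical-path idea above can be computed directly from a task dependency graph. Below is a minimal sketch: the graph, the task names, and the task costs are hypothetical illustrations, not taken from the slides.

```python
# Sketch: critical path length of a small task dependency graph,
# computed with a topological sweep (Kahn's algorithm). Tasks a and b
# are independent; c needs a and b; d needs b and c. Costs are made up.
from collections import defaultdict

edges = [("a", "c"), ("b", "c"), ("c", "d"), ("b", "d")]
cost = {"a": 1, "b": 1, "c": 2, "d": 1}

succ = defaultdict(list)
indeg = defaultdict(int)
for u, v in edges:
    succ[u].append(v)
    indeg[v] += 1

# longest[t] = cost of the most expensive path ending at task t.
ready = [t for t in cost if indeg[t] == 0]
longest = {t: cost[t] for t in ready}
while ready:
    u = ready.pop()
    for v in succ[u]:
        longest[v] = max(longest.get(v, 0), longest[u] + cost[v])
        indeg[v] -= 1
        if indeg[v] == 0:
            ready.append(v)

critical_path = max(longest.values())  # lower bound on parallel runtime
print(critical_path)  # 4 (path b -> c -> d)
```

Even with unlimited processors, no schedule can finish faster than this critical path length.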


Example: Multiplying a Dense Matrix with a Vector

    Computation of each element of the output vector y is independent of the other elements. Based on this, a dense matrix-vector product can be decomposed into n independent tasks.

    Observations: While tasks share data (namely, the vector b), they do not have any control dependencies, i.e., no task needs to wait for the (partial) completion of any other. All tasks are of the same size in terms of number of operations. Is this the maximum number of tasks we could decompose this problem into?
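The row-wise decomposition described above can be sketched with a thread pool: each of the n output elements is one independent task. The matrix and vector values here are illustrative.

```python
# Sketch: dense matrix-vector product y = A*b decomposed into n
# independent row tasks, one per output element.
from concurrent.futures import ThreadPoolExecutor

A = [[1, 2], [3, 4]]
b = [5, 6]

def row_task(i):
    # Task i reads the shared vector b but writes only y[i]:
    # the tasks share data yet have no control dependencies.
    return sum(A[i][j] * b[j] for j in range(len(b)))

with ThreadPoolExecutor() as pool:
    y = list(pool.map(row_task, range(len(A))))
print(y)  # [17, 39]
```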


Multiplying a dense matrix with a vector

    [Figure: decomposition with 2n CPUs available; Task 1 and Task 2 shown.]

    On what kind of platform will we have 2n processors?


Multiplying a dense matrix with a vector

    [Figure: finer decomposition with 2n CPUs available. Tasks 1 through 2n compute partial products; tasks 2n+1 through 3n combine them.]
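The finer decomposition in the figure can be sketched as follows, assuming each row's dot product is split into two half-row tasks (tasks 1..2n) followed by one combining task per row (tasks 2n+1..3n). The combining tasks depend on the partial results, so only the first 2n tasks are fully independent; the data below is made up.

```python
# Sketch: 3n-task decomposition of y = A*b. Tasks 1..2n each compute
# half of a row's dot product; tasks 2n+1..3n add the two halves.
A = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [1, 1, 1, 1]
n, m = len(A), len(b)

def half_task(i, half):
    lo, hi = (0, m // 2) if half == 0 else (m // 2, m)
    return sum(A[i][j] * b[j] for j in range(lo, hi))

# Tasks 1..2n: independent partial dot products (could run concurrently).
partials = {(i, h): half_task(i, h) for i in range(n) for h in (0, 1)}

# Tasks 2n+1..3n: each depends on two partials (edges in the task graph).
y = [partials[i, 0] + partials[i, 1] for i in range(n)]
print(y)  # [10, 26]
```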


Granularity of Task Decompositions

    The size of the tasks into which a problem is decomposed is called its granularity.


    Parallelization efficiency


Guidelines for good parallel algorithm design

    • Maximize concurrency.

    • Spread tasks evenly between processors to avoid idling and achieve load balancing.

    • Execute tasks on the critical path as soon as their dependencies are satisfied.

    • Minimize (overlap) communication between processors.


Evaluating a parallel algorithm

    Speedup: S = Tserial / Tparallel

    Efficiency: E = S / #processors

    Cost: C = #processors * Tparallel

    Scalability: speedup (efficiency) as a function of the number of processors, with fixed task size.
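The metrics above can be evaluated for a concrete run. The timings below are hypothetical: a serial time of 100 s and 8 processors finishing in 20 s.

```python
# Sketch: speedup, efficiency, and cost for a hypothetical run.
t_serial, t_parallel, p = 100.0, 20.0, 8

speedup = t_serial / t_parallel   # S = Tserial / Tparallel
efficiency = speedup / p          # E = S / #processors
cost = p * t_parallel             # C = #processors * Tparallel

print(speedup, efficiency, cost)  # 5.0 0.625 160.0
```

Here E < 1, i.e. the 8 processors are not fully utilized: the cost (160 s of processor time) exceeds the 100 s of serial work.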


    Simple example: summing n numbers on p CPUs


Summing n numbers on p CPUs

    Parallel time = Θ((n/p) log p)

    Serial time = Θ(n)

    Speedup = Θ(p / log p)

    Efficiency = Θ(1 / log p)
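A minimal sketch of the scheme being analyzed, assuming the usual layout: each of the p CPUs sums its n/p slice, and the p partial sums are then combined pairwise in log2(p) rounds. Sizes and data are illustrative.

```python
# Sketch: summing n numbers on p CPUs as partial sums plus a
# pairwise (tree) combine of the p partial results.
from concurrent.futures import ThreadPoolExecutor

n, p = 16, 4
data = list(range(n))  # 0..15, total 120

def local_sum(rank):
    chunk = n // p
    return sum(data[rank * chunk:(rank + 1) * chunk])

with ThreadPoolExecutor(max_workers=p) as pool:
    partial = list(pool.map(local_sum, range(p)))

# Tree combine: log2(p) rounds of pairwise additions.
while len(partial) > 1:
    partial = [partial[i] + partial[i + 1] for i in range(0, len(partial), 2)]
print(partial[0])  # 120
```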


Parallel multiplication of 3 matrices

    A, B, C are N x N matrices. We want to compute in parallel: A x B x C.

    How do we achieve the maximum performance?


[Figure: two-step computation: TEMP = A x B, then Result = TEMP x C.]

    Try 1: 2-CPU parallelization:


[Figure: C' = A x B, then Result = C' x C.]

    Try 2: 3-CPU parallelization:

    Is this a good way?


[Figure: task graph for Result(1,1): A(1,1)*B(1,1) + A(1,2)*B(2,1) -> T(1,1); A(1,1)*B(1,2) + A(1,2)*B(2,2) -> T(1,2); then T(1,1)*C(1,1) + T(1,2)*C(2,1) -> Result(1,1).]

Dynamic task creation - Simple Master-Worker

    • Identify independent computations.

    • Create a queue of tasks.

    • Each process picks a task from the queue and may insert a new task into the queue.
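The master-worker scheme above can be sketched with a shared queue. The "split" task type and the worker count are hypothetical; the point is that workers both consume tasks and may push newly created tasks back into the queue.

```python
# Sketch: master seeds a shared task queue; workers pull tasks and may
# insert new tasks. A made-up "split" task recursively divides its value.
import queue
import threading

tasks = queue.Queue()
results = []
lock = threading.Lock()

def worker():
    while True:
        item = tasks.get()
        if item is None:           # sentinel: shut this worker down
            tasks.task_done()
            return
        kind, value = item
        if kind == "split" and value > 1:
            # A task may insert new tasks into the queue.
            tasks.put(("split", value // 2))
            tasks.put(("split", value - value // 2))
        else:
            with lock:
                results.append(value)
        tasks.task_done()

# Master: seed the queue and start the workers.
tasks.put(("split", 8))
workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()
tasks.join()                       # wait until every task is processed
for _ in workers:
    tasks.put(None)
for w in workers:
    w.join()
print(sum(results))  # 8
```

`queue.Queue` is thread-safe, so no extra locking is needed around `get`/`put`; the lock only protects the shared results list.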


Hypothetical implementation by multiple threads

    Let's assign a separate thread to each multiply/sum operation.

    How do we coordinate the threads?

    • Initialize threads to know their dependencies.

    • Build producer-consumer-like logic for every thread.
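A minimal sketch of this producer-consumer coordination, assuming one thread per operation and a `threading.Event` per result so consumers block until their producers finish. The operation names and scalar inputs are made up; the structure mirrors computing T(1,1) = A(1,1)*B(1,1) + A(1,2)*B(2,1).

```python
# Sketch: one thread per multiply/sum operation. Each sum thread is
# initialized with its dependencies and waits on the producers' events.
import threading

values = {}
done = {name: threading.Event() for name in ("m1", "m2", "s1")}

def multiply(name, a, b):
    values[name] = a * b
    done[name].set()               # signal consumers the result is ready

def add(name, dep_a, dep_b):
    done[dep_a].wait()             # block until both producers finish
    done[dep_b].wait()
    values[name] = values[dep_a] + values[dep_b]
    done[name].set()

threads = [
    threading.Thread(target=add, args=("s1", "m1", "m2")),  # consumer
    threading.Thread(target=multiply, args=("m1", 2, 3)),   # producers
    threading.Thread(target=multiply, args=("m2", 4, 5)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(values["s1"])  # 26
```

Starting the consumer thread first is deliberate: it simply blocks on the events until its producers have run, so the schedule order does not matter.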