scalable algorithmic techniquespeople.cs.aau.dk/~adavid/teaching/mvp-11/06-chap05-lect06-07.pdf ·...

74
Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05 [email protected]

Upload: others

Post on 25-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

Scalable AlgorithmicTechniques

Decompositions & Mapping

Alexandre David1.2.05

[email protected]

Page 2: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 2

IntroductionFocus on data parallelism, scale with size.

Task parallelism limited.

Notion of scalability is fuzzy in the book.More precision later.Idea: You lose efficiency with # of processors, gain efficiency with the size of the problem. Scalability measures the ratio.

You can experiment with assignment 2 to see that.

Page 3: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 3

In Practice…Typical tasks:

Identify concurrent works.Map them to processors.Distribute inputs, outputs, and other data.Manage shared resources.Synchronize the processors.

Page 4: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 4

Basic PrinciplesLarge blocks of independent computations.

Rare, [email protected] when computations >> size of data.

Matrix multiplication too.

Good performance recipe:minimize interaction (= communication)maximize locality (= blocks of computation)

Page 5: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 5

Minimizing Interaction Overheads

Maximize data locality.Minimize volume of data-exchange.Minimize frequency of interactions.

Minimize contention and hot spots.Share a link, same memory block, etc…Re-design original algorithm to change the interaction pattern.Use task interaction graph to help.

Page 6: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 6

Minimizing Interaction Overheads

Overlapping computations with interactions – to reduce idling.

Initiate interactions in advance.Non-blocking communications.Multi-threading.

Replicating data or computation.Group communication instead of point to point.Overlapping interactions.

Page 7: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 7

Decomposing ProblemsDecomposition into concurrent tasks.

No unique solution.Different sizes.Decomposition illustrated as a directed graph:

Nodes = tasks.Edges = dependency.

Task dependency graph!

Page 8: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 8

Example: Matrix * Vector

N tasks, 1 task/row:

Matrix

Vect

or

Task dependency graph?

Page 9: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 9

Example: database query processing

MODEL = ``CIVIC'' AND YEAR = 2001 AND(COLOR = ``GREEN'' OR COLOR = ``WHITE)

Page 10: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 10

Measure of concurrency?Nb. of processors?Optimal?

A solution

Page 11: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 11

Another Solution

Better/worse?

Page 12: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 12

GranularityNumber and size of tasks.

Fine-grained: many small tasks.Coarse-grained: few large tasks.

Related: degree of concurrency.(Nb. of tasks executable in parallel).

Maximal degree of concurrency.Average degree of concurrency.

Page 13: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 13

Coarser Matrix * Vector

N tasks, 3 task/row:

Matrix

Vect

or

Page 14: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 14

MeasuresAverage degree of concurrency if we take into account varying amount of work?Critical path = longest directed path between any start & finish nodes.Critical path length = sum of the weights of nodes along this path.Average degree of concurrency = total amount of work / critical path length.

Page 15: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 15

Database example

Critical path (3). Critical path (4).Critical path length = 27. Critical path length = 34.

Av. deg. of concurrency = 63/27. Av. deg. of conc. = 64/34.

2.33 1.88

Page 16: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 16

Number of tasks: 15.

• Maximum degree of concurrency: 8.• Critical path length: 4.• Maximum possible speedup: 15/4.• Minimum number of processes to reachthis speedup: 8.• Maximum speedup if we limit the processesto 2,4, and 8: 15/8, 3, and 15/4.

Example

Page 17: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 17

Number of tasks: 15.

• Maximum degree of concurrency: 8.• Critical path length: 4.• Maximum possible speedup: 15/4.• Minimum number of processes to reachthis speedup: 8.• Maximum speedup if we limit the processesto 2,4, and 8: 15/8, 3, and 15/4.

Example

Page 18: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 18

Number of tasks: 14.

• Maximum degree of concurrency: 8.• Critical path length: 7.• Maximum possible speedup: 14/7.• Minimum number of processes to reachthis speedup: 3.• Maximum speedup if we limit the processesto 2,4, and 8: 14/8, 14/7, and 14/7.

Example

Page 19: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 19

Number of tasks: 15.

• Maximum degree of concurrency: 2.• Critical path length: 8.• Maximum possible speedup: 15/8.• Minimum number of processes to reachthis speedup: 2.• Maximum speedup if we limit the processesto 2,4, and 8: 15/8.

Example

Page 20: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 20

Interaction Between TasksTasks often share data.Task interaction graph:

Nodes = tasks.Edges = interaction.Optional weights.

Task dependency graph is a sub-graph of the task interaction graph.

Page 21: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 21

Characteristics of Task Interactions

One-way interactions.Only one task initiates and completes the communication without interrupting the other one.

Two-way interactions.Producer – consumer model.

Page 22: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 22

Processes and MappingTasks run on processors.Process: processing agent executing the tasks. Not exactly like in your OS course.Processes ~ threads here.Mapping = assignment of tasks to processes.API exposes processes and binding to processors not always controlled.

Scheduling of threads is not controlled.What makes a good mapping?

Page 23: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 23

Mapping example

Page 24: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 24

Processes vs. processorsProcesses = logical computing agent.Processor = hardware computational unit.In general 1-1 correspondence but this model gives better abstraction.Useful for hardware supporting multiple programming paradigms.

How do you decompose?

Page 25: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 25

Decomposition TechniquesRecursive decomposition.

Divide-and-conquer.

Data decomposition.Large data structure.

Exploratory decomposition.Search algorithms.

Speculative decomposition.Dependent choices in computations.

Page 26: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 26

Recursive decompositionProblem solvable by divide-and-conquer:

Decompose into sub-problems.Do it recursively.

Combine the sub-solutions.Do it recursively.

Concurrency: The sub-problems are solved in parallel.

Page 27: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 27

Schwartz’s Algorithm+ reduce by block

Reduce with maximal concurrency:one thread per pair (n/2)combine results in a tree structure

Schwartz:one thread per n/p block of numberslocal sumscombine results in a tree structurefollows recipe

Recursive Decomposition

Page 28: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 28

Combining Results

Page 29: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 29

Peril-Lint nodeval’[P];…forall(index in (0..P-1)){

int tally;stride=1;…while(stride<P){

if (index%(2*stride)==0){

tally += nodeval’[index+stride];stride *= 2;

}else{

nodeval’[index]=tally;break;

}}

}

full/emptyvariableintermediatevalue – constantamount of extraspace

Page 30: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 30

Reduce & Scan AbstractionsReduce: combine values to a single one.

Almost always needed.

Scan: prefix computation.Logic that performs sequential operations and carries along intermediate results.

Lesson: Try to use them as much as possible.Abstract them as functions.

high-level, contain informationmay customize implementation (e.g. BlueGene).

Page 31: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 31

Small typo p130A={0,2,4} ⇒ A={0,2,6}

Page 32: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 32

Basic StructureIdea:

Assume block allocation,use Schwartz’s like algorithm,local variable – tally – stores intermediate results.

Primitives:init() – init tallyaccum() – local accumulationcombine() – combines tally resultsx-gen() – final answer

Page 33: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 33

Example: + reduceinit(): tally=0accum(tally,val): tally+=valuecombine(left,right): left+right sent to parentreduce-gen(root): return

Page 34: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 34

Reduce Basic Structure

Page 35: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 35

Gen

eral

Red

uce

in P

eril-

L

Page 36: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 36

2nd

Min

in P

eril-

L

Page 37: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 37

2nd

Min

in P

eril-

L

Page 38: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 38

General ScanDifference with reduce:

need to pass intermediate results too.Propagate tally down the tree:value from a parent = tally from the left sub-tree of the parent.root has no parent – fix that

Idea:up-sweep with reducedown-sweep to propagate tallys

Page 39: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 39

Prefix Sum - sum

Page 40: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 40

Prefix Sum - prefix

Page 41: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 41

Down-sweep

Page 42: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 42

Scan in Peril-L

Page 43: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 43

Page 44: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 44

LessonStructure the algorithm with reduce & scan.Use efficient implementations of reduce & scan.

Page 45: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 45

Data Decomposition2 steps:

Partition the data.Induce partition into tasks.

How to partition data?Partition output data:

Independent “sub-outputs”.

Partition input data:Local computations, followed by combination.

1-D, 2-D, 3-D block decomposition.

Page 46: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 46

Static Allocation of Work to Processes

# of threads fixed but unknown.Allocate data to threads.Owner compute rule.

Block allocation – maximize locality1-D or 2-D depending on the communication pattern – minimize communicationsurface area to volume in favour of 2-D

Page 47: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 47

Owner-Compute Rule

Process assigned to some datais responsible for all computations associated with it.

Input data decomposition:All computations done on the (partitioned) input data are done by the process.

Output data decomposition:All computations for the (partitioned) output data are done by the process.

Page 48: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 48

1-D & 2-D Block Allocations

Page 49: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 49

Overlap RegionsObtain data from neighbors.Compute locally.Avoid false sharing.Use local matrix

no special edge casesuniform indicesbatch communication cheaper

Page 50: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 50

Overlap Regions

Page 51: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 51

Cyclic & Block CyclicCyclic = round-Robin. Idea:

Partition an array into many more blocks than available processes.Assign partitions (tasks) to processes in a round-robin manner.→ each process gets several non adjacent blocks.

Useful when computations are not proportional to the data.

ex: assignment 2otherwise poor load balance

Good: load balance.Bad: more communication, break large blocks.

Page 52: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 52

Example: LU factorizationNon singular square matrix A (invertible).A = L*U.Useful for solving linear equations.

L

UA

Page 53: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 53

LU factorization

In practice we work on A.

N steps

Page 54: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 54

Load Imbalance: LU-Decomposition

Matrix inversion – similar.

Page 55: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 55

8x8 Array on 5 Processes

Page 56: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 56

Block Cyclic Distribution

Page 57: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 57

Julia SetsAssignment: Mandelbrot

Page 58: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 58

Irregular Allocations

Page 59: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 59

Randomized Distributions

Irregular distribution with regular mapping!Not good.

Page 60: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 60

1-D Randomized Distribution

Permutation

Page 61: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 61

2-D Randomized Distribution

2-D block random distribution.

Block mapping.

Page 62: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 62

Irregular AllocationsSame idea as overlap regions:

get data local – inspectorlocal computations – executor

Page 63: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 63

Dynamic AllocationsKeys

dynamic load balancingdynamic interactionschoose right granularity of tasks

Work queuese.g. producer/consumercentralized schemes with master/slavedifferent queue orderingsmultiple queues – issues with load balancing

Page 64: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 64

Graph PartitioningFor sparse data structures and data dependent interaction patterns.

Numerical simulations. Discretize the problem and represent it as a mesh.

Sparse matrix: assign equal number of nodes to processes & minimize interaction.Example: simulation of dispersion of a water contaminant in Lake Superior.

Page 65: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 65

Discretization

Page 66: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 66

Partitioning Lake Superior

Random partitioning. Partitioning with minimumedge cut.

Finding an exact optimal partitioningis an NP-complete problem.

Page 67: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 67

Exploratory Decomposition - Trees

Exploration ofstates.

Page 68: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 68

TreesUseful data-structures.Usually constructed with pointers.Challenges for

communicationload balance on irregular trees

If little communication among sub-trees:Allocate sub-trees to processes, copy the “cap”.All processes know the structure.

Page 69: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 69

Cap Copy & Subtree Allocation

Page 70: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 70

Dynamic AllocationsDynamic & unpredictable trees.

Search with different algorithms.Work queues useful.Pruning involves communication.Termination may be an issue!

Page 71: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 71

Application to Game Search

Page 72: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 72

Performance Anomalies

Work depends on the order of the search!

Page 73: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 73

Search Orderings Issues

Breadth-first-search(BFS)

1

2 3 4

5 6 7 8 9

10 11 12 13 14

Depth-first-search(DFS)

1

2

3

4 5

6

7

8

9

10

11

12

13

14

Gives shortestpath but maybe more expensivethan heuristics orrandom search.

Page 74: Scalable Algorithmic Techniquespeople.cs.aau.dk/~adavid/teaching/MVP-11/06-chap05-lect06-07.pdf · Scalable Algorithmic Techniques Decompositions & Mapping Alexandre David 1.2.05

11+14-03-2011 MVP'11 - Aalborg University 74

Speculative DecompositionDependencies between tasks are not known a-priori.

How to identify independent tasks?Conservative approach: identify tasks that are guaranteed to be independent.Optimistic approach: schedule tasks even if we are not sure – may roll-back later.