SIMD and Associative Computing Computational Models and Algorithms

Upload: amy-miles

Post on 16-Dec-2015

SIMD and Associative Computing

Computational Models and Algorithms

Associative Computing Topics

• Introduction
  – References
  – SIMD computing & Architecture
  – Motivation for the MASC model
  – The MASC and ASC Models
  – A Language Designed for the ASC Model
  – List of Algorithms and Programs designed for ASC
• An ASC Algorithm Example
  – ASC version of Prim’s MST Algorithm

Comment on Slides Included

• Some of these slides will be covered only lightly or else left for students to read.
  – The emphasis here is to provide an introduction to the material covered, not a deep understanding.
  – Including these slides provides a better survey of this material.
  – This material is useful background for the Air Traffic Control example and projects we expect to use in this course.

Associative Computing References

Note: The KSU papers below are available on the website http://www.cs.kent.edu/~parallel/ (click on the link to “papers”).

1. Maher Atwah, Johnnie Baker, and Selim Akl, An Associative Implementation of Classical Convex Hull Algorithms, Proc. of the IASTED International Conference on Parallel and Distributed Computing and Systems, 1996, 435-438.

2. Mingxian Jin, Johnnie Baker, and Kenneth Batcher, Timings for Associative Operations on the MASC Model, Proc. of the 15th International Parallel and Distributed Processing Symposium (Workshop on Massively Parallel Processing), San Francisco, April 2001.


3. Jerry Potter, Johnnie Baker, Stephen Scott, Arvind Bansal, Chokchai Leangsuksun, and Chandra Asthagiri, An Associative Computing Paradigm, Special Issue on Associative Processing, IEEE Computer, 27(11):19-25, Nov. 1994. (Note: MASC is called ‘ASC’ in this article.)

4. Jerry Potter, Associative Computing - A Programming Paradigm for Massively Parallel Computers, Plenum Publishing Company, 1992.


SIMD slides from Chapter 2

Alternate Names for SIMDs

• Recall that all active processors of a true SIMD computer must simultaneously access the same memory location (the same address in each processor’s local memory).

• The value in the i-th processor can be viewed as the i-th component of a vector.

• SIMD machines are sometimes called vector computers [Jordan, et al.] or processor arrays [Quinn 94, 04] because they can execute vector and matrix operations efficiently.

SIMD Architecture

• Has only one control unit.

• Scientific applications have data parallelism.

Data/Instruction Storage

• Front end computer
  – Also called the control unit
  – Holds and runs the program
  – Data manipulated sequentially

• Processor array
  – Data manipulated in parallel

Processor Array Performance

• Performance: work done per time unit

• Performance of a processor array depends on
  – Speed of processing elements
  – Utilization of processing elements

Performance Example 1

• 1024 processors

• Each adds a pair of integers in 1 μsec (1 microsecond, i.e., one millionth of a second or 10^-6 second).

• What is the performance when adding two 1024-element vectors (one per processor)?

$$\text{Performance} = \frac{1024 \text{ operations}}{1\ \mu\text{sec}} = 1.024 \times 10^{9} \text{ ops/sec}$$

Performance Example 2

• 512 processors

• Each adds two integers in 1 μsec

• What is the performance when adding two vectors of length 600?

• Since 600 > 512, 88 processors must add two pairs of integers.

• The other 424 processors add only a single pair of integers.

• The whole computation therefore takes two steps, or 2 μsec:

$$\text{Performance} = \frac{600 \text{ operations}}{2\ \mu\text{sec}} = 3 \times 10^{8} \text{ ops/sec}$$
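Both examples follow one rule. The elapsed time is the number of lockstep rounds, ⌈n/p⌉ for n additions on p PEs, times the time for one addition. Below is a minimal Python sketch of that rule. The function name is illustrative, and the sketch assumes every addition takes exactly the same time.

```python
import math


def processor_array_performance(n_elements: int, n_pes: int, op_time_sec: float) -> float:
    """Operations per second when n_elements additions are spread over n_pes PEs.

    Each PE performs at most ceil(n_elements / n_pes) additions, one per
    lockstep round; PEs with less work simply sit idle in the last round.
    """
    rounds = math.ceil(n_elements / n_pes)   # lockstep rounds needed
    elapsed = rounds * op_time_sec           # total time for the whole vector add
    return n_elements / elapsed              # useful operations per second


USEC = 1e-6
print(processor_array_performance(1024, 1024, USEC))  # Example 1: about 1.024e9
print(processor_array_performance(600, 512, USEC))    # Example 2: about 3e8
```

The second case shows the utilization effect. Only 88 of the 512 PEs do useful work in the second round, so performance falls well below the peak of 5.12 × 10^8 ops/sec.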

Example of a 2-D Processor Interconnection Network in a Processor Array

Each VLSI chip has 16 processing elements. Each PE can simultaneously send a value to a neighbor.

PE = processor element

SIMD Execution Style

• The traditional (SIMD, vector, processor array) execution style ([Quinn 94, pg 62], [Quinn 2004, pgs 37-43]):
  – The sequential processor that broadcasts the commands to the rest of the processors is called the front end or control unit (or sometimes host).
  – The front end is a general purpose CPU that stores the program and the data that is not manipulated in parallel.
  – The front end normally executes the sequential portions of the program.
  – Each processing element has a local memory that cannot be directly accessed by the control unit or other processing elements.

SIMD Execution Style (cont.)

  – Collectively, the individual memories of the processing elements (PEs) store the (vector) data that is processed in parallel.
    • Called the parallel memory
  – When the front end encounters an instruction whose operand is a vector, it issues a command to the PEs to perform the instruction in parallel.
  – Although the PEs execute in parallel, individual PEs can be masked so that they skip a particular instruction.

Masking in Processor Arrays

• All the processors work in lockstep except those that are masked out (by setting their mask register).

• The conditional if-then-else works differently on processor arrays than in the sequential version:
  – Every active processor tests whether its data meets the negation of the boolean condition.
  – If it does, it sets its mask bit so that it will not participate in the operation initially.
  – Next, the unmasked processors execute the THEN part.
  – Afterwards, the mask bits (for the original set of active processors) are flipped and the unmasked processors perform the ELSE part.
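The three-step masking scheme can be simulated sequentially, with one list entry standing in for each PE. This is only a sketch. The function and argument names are hypothetical, and on a real processor array each loop below would be a single broadcast instruction that all unmasked PEs execute at once.

```python
from typing import Callable, List


def simd_if_then_else(data: List[int],
                      active: List[bool],
                      cond: Callable[[int], bool],
                      then_op: Callable[[int], int],
                      else_op: Callable[[int], int]) -> List[int]:
    """Simulate a processor-array if-then-else using per-PE mask bits."""
    result = list(data)
    # Step 1: each active PE tests NOT cond; if it holds, the PE sets its
    # mask bit so that it sits out the THEN part.
    mask = [a and not cond(x) for a, x in zip(active, data)]
    # Step 2: active, unmasked PEs execute THEN (one broadcast step).
    for i in range(len(data)):
        if active[i] and not mask[i]:
            result[i] = then_op(data[i])
    # Step 3: flip the mask bits of the originally active PEs; the PEs that
    # are now unmasked (COND was false) execute ELSE.
    mask = [a and not m for a, m in zip(active, mask)]
    for i in range(len(data)):
        if active[i] and not mask[i]:
            result[i] = else_op(data[i])
    return result


# Five PEs, the last one inactive; COND is x > 0.
print(simd_if_then_else([3, -1, 4, -5, 2], [True, True, True, True, False],
                        lambda x: x > 0, lambda x: x * 10, lambda x: 0))
# [30, 0, 40, 0, 2]
```

The inactive PE keeps its original value. Every active PE sits idle during one of the two steps.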

if (COND) then A else B

SIMD Machines

• An early SIMD computer designed for vector and matrix processing was the Illiac IV computer.
  – Initial development at the University of Illinois, 1965-70
  – Moved to NASA Ames, completed in 1972 but not fully functional until 1976
  – See Jordan et al., pg 7 and Wikipedia

• The MPP, the DAP, the Connection Machines CM-1 and CM-2, and the MasPar MP-1 and MP-2 are examples of SIMD computers.
  – See Akl pg 8-12 and [Quinn 94]

• The CRAY-1 and the Cyber-205 use pipelined arithmetic units to support vector operations and are sometimes called pipelined SIMDs.
  – See [Jordan, et al., p 7], [Quinn 94, pg 61-2], and [Quinn 2004, pg 37].

SIMD Machines

• Quinn [1994, pg 63-67] discusses the CM-2 Connection Machine (with 64K PEs) and a smaller & updated CM-200.

• Our Professor Batcher was the chief architect for the STARAN and the MPP (Massively Parallel Processor) and an advisor for the ASPRO.
  – ASPRO is a small second-generation STARAN used by the Navy in surveillance planes.

• Professor Batcher is best known architecturally for the MPP, which is at the Smithsonian Institution & currently displayed at a D.C. airport.

Today’s SIMDs

• Many SIMDs are being embedded in sequential machines.

• Others are being built as part of hybrid architectures.

• Others are being built as special purpose machines, although some of them could be classified as general purpose.

• Much of the recent work with SIMD architectures is proprietary.
  – Often the company building a parallel computer does not mention that it is a SIMD.

ClearSpeed’s Inexpensive SIMD

• ClearSpeed is producing a COTS (commodity off-the-shelf) SIMD board.

• Not a traditional SIMD, as the hardware doesn’t synchronize every step.
  – PEs are full CPUs
  – Hardware design supports efficient synchronization

• This machine is programmed like a SIMD.

• The U.S. Navy has observed that their machines process radar an order of magnitude faster than others.

• There is quite a bit of information about this at www.clearspeed.com and www.wscape.com

Special Purpose SIMDs in the Bioinformatics Arena

• Paracel
  – Acquired by Celera Genomics in 2000
  – Products include the sequence supercomputer GeneMatcher, which has a high-throughput sequence analysis capability
    • Supports over a million processors
  – GeneMatcher was used by Celera in its race with the U.S. government to complete the sequencing of the human genome
• TimeLogic, Inc.
  – Has DeCypher, a reconfigurable SIMD

Advantages of SIMDs

• Reference: [Roosta, pg 10]

• Less hardware than MIMDs, as they have only one control unit.
  – Control units are complex.

• Less memory needed than MIMD
  – Only one copy of the instructions needs to be stored
  – Allows more data to be stored in memory.

• Less startup time in communicating between PEs.

Advantages of SIMDs (cont)

• Single instruction stream and synchronization of PEs make SIMD applications easier to program, understand, & debug.
  – Similar to sequential programming

• Control flow operations and scalar operations can be executed on the control unit while PEs are executing other instructions.

• MIMD architectures require explicit synchronization primitives, which create a substantial amount of additional overhead.

Advantages of SIMDs (cont)

• During a communication operation between PEs,
  – PEs send data to a neighboring PE in parallel and in lock step
  – No need to create a header with routing information, as “routing” is determined by program steps.
  – The entire communication operation is executed synchronously
  – SIMDs are deterministic & have much more predictable running time.

• Can normally compute a tight (worst case) upper bound for the time for communications operations.

• Less complex hardware in SIMD since no message decoder is needed in the PEs
  – MIMDs need a message decoder in each PE.

SIMD Shortcomings (with some rebuttals)

• Claims are from our textbook [i.e., Quinn 2004].
  – Similar statements are found in [Grama, et al.].

• Claim 1: Not all problems are data-parallel
  – While true, most problems seem to have a data parallel solution.
  – In their study of large parallel applications at national labs, [Fox, et al.] observed that most were data parallel by nature but often had points where significant branching occurred.

SIMD Shortcomings (with some rebuttals)

• Claim 2: Speed drops for conditionally executed branches
  – MIMD processors can execute multiple branches concurrently.
  – For an if-then-else statement with execution times for the “then” and “else” parts being roughly equal, about ½ of the SIMD processors are idle during its execution.
    • With additional branching, the average number of inactive processors can become even higher.
    • With SIMDs, only one of these branches can be executed at a time.
    • This reason justifies the study of multiple SIMDs (or MSIMDs).
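The ½ figure follows from a simple model. The two branches run back to back, and each PE is busy only during the branch it takes. The Python sketch below illustrates that model. The function name is hypothetical, and the model ignores the cost of setting and flipping mask bits.

```python
def branch_utilization(frac_then: float, t_then: float, t_else: float) -> float:
    """Average fraction of PEs doing useful work during a SIMD if-then-else.

    The THEN and ELSE parts run back to back, so the statement takes
    t_then + t_else. Only the frac_then of PEs that took THEN are busy during
    the THEN part, and only the remaining 1 - frac_then during the ELSE part.
    """
    busy = frac_then * t_then + (1.0 - frac_then) * t_else   # PE-time doing useful work
    return busy / (t_then + t_else)                          # divided by total PE-time


print(branch_utilization(0.5, 1.0, 1.0))   # 0.5
print(branch_utilization(0.25, 3.0, 1.0))  # 0.375
```

When the branch times are equal, utilization is 0.5 however the PEs split between the branches, which matches the slide's estimate. Unequal times can push it lower, as the second call shows.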

SIMD Shortcomings (with some rebuttals)

• Claim 2 (cont): Speed drops for conditionally executed code
  – In [Fox, et al.], the observation was made that for the real applications surveyed, the MAXIMUM number of active branches at any point in time was about 8.
  – The cost of the extremely simple processors used in a SIMD is extremely low.
    • Programmers used to worry about ‘full utilization of memory’ but stopped doing so after memory cost became insignificant overall.

SIMD Shortcomings (with some rebuttals)

• Claim 3: Don’t adapt to multiple users well.
  – This is true to some degree for all parallel computers.
  – If usage of a parallel processor is dedicated to an important problem, it is probably best not to risk compromising its performance by ‘sharing’.
  – This reason also justifies the study of multiple SIMDs (or MSIMDs).
  – SIMD architecture has not received the attention that MIMD has received and can greatly benefit from further research.

SIMD Shortcomings (with some rebuttals)

• Claim 4: Do not scale down well to “starter” systems that are affordable.
  – This point is arguable and its ‘truth’ is likely to vary rapidly over time.
  – ClearSpeed currently sells a very economical SIMD board that plugs into a PC.

SIMD Shortcomings (with some rebuttals)

Claim 5: Requires customized VLSI for processors, and the expense of control units in PCs has dropped.

• Reliance on COTS (commodity, off-the-shelf) parts has dropped the price of MIMDs.

• The expense of PCs (with control units) has dropped significantly.

• However, reliance on COTS has fueled the success of ‘low level parallelism’ provided by clusters and restricted new innovative parallel architecture research for well over a decade.

SIMD Shortcomings (with some rebuttals)

Claim 5 (cont.)

• There is strong evidence that the period of continual dramatic increases in speed of PCs and clusters is ending.

• Continued rapid increases in parallel performance in the future will be necessary in order to solve important problems that are beyond our current capabilities.

• Additionally, with the appearance of the very economical COTS SIMDs, this claim no longer appears to be relevant.


Slides from Associative Computing – Part 1


Associative Computers

Associative Computer: A SIMD computer with a few additional features supported in hardware.

• These additional features can be supported (less efficiently) in traditional SIMDs in software.

• The name “associative” is due to its ability to locate items in the memory of PEs by content rather than location.

Associative Models

The ASC model (for ASsociative Computing) gives a list of the properties assumed for an associative computer.

The MASC (for Multiple ASC) Model
• Supports multiple SIMD (or MSIMD) computation.
• Allows the model to have more than one Instruction Stream (IS)
  – The IS corresponds to the control unit of a SIMD.
• ASC is the MASC model with only one IS.
  – The one-IS version of the MASC model is sufficiently important to have its own name.

ASC & MASC are KSU Models

• Several professors and their graduate students at Kent State University have worked on these models.
• The STARAN and the ASPRO fully support the ASC model in hardware. The MPP supports ASC, partly in hardware and partly in software.
  – Prof. Batcher was chief architect or consultant
  – He received both the Eckert-Mauchly Award and the Seymour Cray Computer Engineering Award
• Dr. Potter developed a language for ASC
• Dr. Baker works on algorithms for the models and on architectures to support the models
• Dr. Walker is working on a hardware design to support the ASC and MASC models.
• Dr. Batcher and Dr. Potter are currently not actively working on ASC/MASC models but still provide advice.

Motivation

• The STARAN Computer (Goodyear Aerospace, early 1970s) and later the ASPRO provided an architectural model for associative computing embodied in the ASC model.
  – STARAN was built to support Air Traffic Control.
  – ASPRO was built to support Air Defense Systems.
• ASC extends the data parallel programming style to a complete computational model.
• ASC provides a practical model that supports massive parallelism.
• MASC provides a hybrid data-parallel, control-parallel model that supports associative programming.
• Descriptions of these models allow them to be compared to other parallel models.

The ASC Model

[Figure: an IS connected to an array of cells, each a PE with its own memory, linked by a cell network.]

Basic Properties of ASC

• Instruction Stream
  – The IS has a copy of the program and can broadcast instructions to cells in unit time
• Cell Properties
  – Each cell consists of a PE and its local memory
  – All cells listen to the IS
  – A cell can be active, inactive, or idle
    • Inactive cells listen but do not execute IS commands until reactivated
    • Idle cells contain no essential data and are available for reassignment
    • Active cells execute IS commands synchronously

Basic Properties of ASC

• Responder Processing
  – The IS can detect in constant time whether a data test is satisfied by any of its cells (the responders) (i.e., any-responders property).
  – The IS can select an arbitrary responder in constant time (i.e., pick-one property).
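Below is a sequential Python sketch of the two responder operations. It also shows the common loop that visits every responder one at a time. ASC hardware is assumed to perform each operation in constant time, whereas this simulation takes time proportional to the number of cells. The function names are hypothetical. Choosing the lowest-indexed responder in pick_one is just one convenient rule, since the model only requires an arbitrary choice.

```python
from typing import List, Optional


def any_responders(responders: List[bool]) -> bool:
    """Any-responders: does at least one cell satisfy the data test?"""
    return any(responders)


def pick_one(responders: List[bool]) -> Optional[int]:
    """Pick-one: select one responder (here, the lowest-indexed one)."""
    for i, r in enumerate(responders):
        if r:
            return i
    return None


values = [3, 7, 1, 9]                     # one value per cell
responders = [v > 5 for v in values]      # associative search: every cell tests its own value
processed = []
while any_responders(responders):         # repeat until no responders remain
    i = pick_one(responders)
    processed.append(values[i])           # the IS reads the selected cell's value
    responders[i] = False                 # retire that responder
print(processed)  # [7, 9]
```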

Basic Properties of ASC

• Constant Time Global Operations (across PEs)
  – Logical OR and AND of binary values
  – Maximum and minimum of numbers
  – Associative searches
• Communications
  – There are at least two real or virtual networks
    • PE communications (or cell) network
    • IS broadcast/reduction network (which could be implemented as two separate networks)
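The global operations amount to reductions over the values held by the active cells. The sketch below computes them sequentially, and the function name reduce_active is hypothetical. These slides cite the Jin, Baker, & Batcher paper to justify performing the same reductions in constant time in hardware.

```python
from typing import List


def reduce_active(values: List[int], active: List[bool], op: str) -> int:
    """Combine the values of the active cells, as the IS reduction network would."""
    picked = [v for v, a in zip(values, active) if a]
    if not picked:
        raise ValueError("no active cells")
    if op == "OR":            # logical OR of binary (0/1) values
        return int(any(picked))
    if op == "AND":           # logical AND of binary (0/1) values
        return int(all(picked))
    if op == "MAX":
        return max(picked)
    if op == "MIN":
        return min(picked)
    raise ValueError(f"unknown reduction {op!r}")


values = [4, 9, 2, 7]
active = [True, False, True, True]        # the cell holding 9 is inactive
print(reduce_active(values, active, "MAX"), reduce_active(values, active, "MIN"))  # 7 2
```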

Basic Properties of ASC

  – The PE communications network is normally supported by an interconnection network
    • E.g., a 2D mesh
  – The broadcast/reduction network(s) are normally supported by a broadcast and a reduction network (sometimes combined).
    • See the posted paper by Jin, Baker, & Batcher (listed in associative references)
• Control Features
  – The PEs, the IS, and the networks all operate synchronously, using the same clock

Non-SIMD Properties of ASC

• Observation: The ASC properties that are unusual for SIMDs are the constant time operations:
  – Constant time responder processing
    • Any-responders?
    • Pick-one
  – Constant time global operations
    • Logical OR and AND of binary values
    • Maximum and minimum value of numbers
    • Associative Searches
• These timings are justified by implementations using a resolver in the paper by Jin, Baker, & Batcher (listed in associative references and posted).

Typical Data Structure for ASC Model

[Figure: a table spread across PE1 through PE7, one record per PE, with fields Make, Color, Year, Model, Price, On lot, and a Busy-idle bit; the IS broadcasts to all PEs. Entries shown include Dodge, Ford, Ford, and Subaru; red, blue, white, and red; 1994, 1996, 1998, and 1997.]

Make, Color, etc. are fields the programmer establishes. Various data types are supported. Some examples will show string data, but string data is not supported in the ASC simulator.

The Associative Search

[Figure: the PE1 through PE7 car table, with the IS broadcasting a search.]

The IS asks for all cars that are red and on the lot. PE1 and PE7 respond by setting a mask bit in their PE.
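The associative search can be mimicked in Python with one record per PE. The field values are hypothetical, chosen to agree with the slide's outcome that PE1 and PE7 respond. Model and Price are omitted. The sketch uses strings for convenience, even though the ASC simulator does not support string data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    busy: bool          # Busy-idle bit: False means the cell holds no essential data
    make: str = ""
    color: str = ""
    year: int = 0
    on_lot: bool = False


# Hypothetical contents loosely following the figure.
cells: List[Cell] = [
    Cell(True, "Dodge", "red", 1994, True),     # PE1
    Cell(False),                                # PE2 (idle)
    Cell(True, "Ford", "blue", 1996, False),    # PE3
    Cell(True, "Ford", "white", 1998, False),   # PE4
    Cell(False),                                # PE5 (idle)
    Cell(False),                                # PE6 (idle)
    Cell(True, "Subaru", "red", 1997, True),    # PE7
]

# The IS broadcasts the test; each busy cell evaluates it on its own record
# and sets its mask bit if it responds.
mask = [c.busy and c.color == "red" and c.on_lot for c in cells]
responders = [f"PE{i + 1}" for i, m in enumerate(mask) if m]
print(responders)  # ['PE1', 'PE7']
```

No record is located by address or pointer. Each cell is found by its content, which is the sense in which the machine is "associative".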

MASC Model

• Basic Components
  – An array of cells, each consisting of a PE and its local memory
  – A PE interconnection network between the cells
  – One or more Instruction Streams (ISs)
  – An IS network
• MASC is an MSIMD model that supports
  – both data and control parallelism
  – associative programming

[Figure: eight cells (PE plus memory) joined by a PE interconnection network, with three instruction streams (ISs) connected to the cells through an IS network.]

MASC Basic Properties

• Each cell can listen to only one IS
• Cells can switch ISs in unit time, based on the results of a data test.
• Each IS and the cells listening to it follow rules of the ASC model.
• Control Features:
  – The PEs, ISs, and networks all operate synchronously, using the same clock
  – Restricted job control parallelism is used to coordinate the interaction of the multiple ISs.
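With more than one IS, the THEN and ELSE parts of a conditional no longer have to be serialized. A data test can split the cells between two ISs, and both groups can be driven at once. The Python sketch below simulates one such step. masc_step and its arguments are hypothetical names. The ISs, which would run concurrently on MASC, are visited one after another here.

```python
from typing import Callable, Dict, List


def masc_step(values: List[int],
              choose_is: Callable[[int], int],
              programs: Dict[int, Callable[[int], int]]) -> List[int]:
    """One MASC-style step: each cell picks an IS by a data test, then every
    IS applies its own instruction to the cells listening to it."""
    # Each cell listens to exactly one IS, chosen from the result of a data test.
    listening_to = [choose_is(v) for v in values]
    result = list(values)
    # Conceptually simultaneous: each IS drives only its own cells, following
    # the ASC rules for that group.
    for is_id, op in programs.items():
        for i, v in enumerate(values):
            if listening_to[i] == is_id:
                result[i] = op(v)
    return result


out = masc_step([3, -1, 4, -5],
                choose_is=lambda v: 0 if v > 0 else 1,      # IS 0 takes THEN, IS 1 takes ELSE
                programs={0: lambda v: v * 10, 1: lambda v: 0})
print(out)  # [30, 0, 40, 0]
```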

Characteristics of Associative Programming

• Consistent use of a style of programming called data parallel programming
• Consistent use of global associative searching and responder processing
• Usually, frequent use of the constant time global reduction operations: AND, OR, MAX, MIN
• Broadcasting data over the IS bus allows the use of the PE network to be restricted to parallel data movement.

Characteristics of Associative Programming

• Tabular representation of data (think 2D arrays)
• Use of searching instead of sorting
• Use of searching instead of pointers
• Use of searching instead of the ordering provided by linked lists, stacks, queues
• Promotes a highly intuitive programming style that promotes high productivity
• Uses structure codes (i.e., numeric representation) to represent data structures such as trees, graphs, embedded lists, and matrices.
• Examples of the above are given in
  – Ref: Nov. 1994 IEEE Computer article in references
  – Also, see “Associative Computing” book by Potter.

Languages Designed for the ASC

• Professor Potter has created several languages for the ASC model.

• The most important of these is called ASC, a C-like language designed for the ASC model.

• ACE is a higher-level language than ASC that uses natural language syntax; e.g., plurals, pronouns.

• Language References:
  – ASC Primer: copy available on the parallel lab website www.cs.kent.edu/~parallel/
  – “Associative Computing” book by Potter [11]: some features in this book were never fully implemented in the ASC compiler

Algorithms and Programs Implemented in ASC

• A wide range of algorithms have been implemented in ASC without the use of the PE network:
  – Graph Algorithms
    • minimal spanning tree
    • shortest path
    • connected components
  – Computational Geometry Algorithms
    • convex hull algorithms (Jarvis March, Quickhull, Graham Scan, etc.)
    • Dynamic hull algorithms

ASC Algorithms and Programs (not requiring PE network)

  – String Matching Algorithms
    • all exact substring matches
    • all exact matches with “don’t care” (i.e., wild card) characters.
  – Algorithms for NP-complete problems
    • traveling salesperson
    • 2-D knapsack.
  – Data Base Management Software
    • associative data base
    • relational data base

ASC Algorithms and Programs (not requiring a PE network)

  – A Two Pass Compiler for ASC (not the one we will be using). This compiler uses ASC parallelism.
    • first pass
    • optimization phase
  – Two Rule-Based Inference Engines for AI
    • An Expert System OPS-5 interpreter
    • PPL (Parallel Production Language) interpreter
  – A Context Sensitive Language Interpreter
    • (OPS-5 variables force context sensitivity)
  – An associative PROLOG interpreter

Associative Algorithms & Programs (using a network)

• There are numerous associative programs that use a PE network:
  – 2-D Knapsack ASC Algorithm using a 1-D mesh
  – Image processing algorithms using 1-D mesh
  – FFT (Fast Fourier Transform) using 1-D nearest neighbor & Flip networks
  – Matrix Multiplication using 1-D mesh
  – An Air Traffic Control Program (using Flip network connecting PEs to memory)
    • Demonstrated using live data at Knoxville in the mid-1970s.
• All but the first were developed in assembler at Goodyear Aerospace

Example 1 – An ASC algorithm for MST

• A graph has nodes labeled by some identifying letter or number and arcs (edges) that have weights associated with them. For the MST problem, the arcs are treated as undirected.

• Such a graph could represent a map where the nodes are cities and the arc weights give the mileage between two cities.

[Figure: a small weighted graph on nodes A, B, C, D, and E.]

The MST Problem

• The MST problem assumes the weights are positive and the graph is connected, and seeks to find the minimal spanning tree,
  – i.e., a subgraph that is a tree¹ and includes all nodes (i.e., it spans),
  – and where the sum of the weights on the arcs of the subgraph is the smallest possible (i.e., it is minimal).

• Note: The solution may not be unique.

¹ A tree is a set of points called vertices and pairs of distinct vertices called edges, such that (1) there is a sequence of edges called a path from any vertex to any other, and (2) there are no circuits, that is, no paths starting from a vertex and returning to the same vertex.

Recalling Prim’s MST Sequential Algorithm

• The next 12 slides are included to recall Prim’s MST sequential algorithm.

• These slides are reference slides for students and will not be covered in class.

An Example (Prim’s MST Sequential Algorithm)

[Figure: an undirected weighted graph on nodes A through I.]

As we will see, the algorithm is simple. The ASC program is quite easy to write. A SISD solution is a bit messy because of the data structures needed to hold the data for the problem.

An Example – Step 0

We will maintain three sets of nodes whose membership will change during the run. The first, V1, will be the nodes selected to be in the tree. The second, V2, will be the candidates at the current step to be added to V1. The third, V3, will be the nodes not considered yet.

An Example – Step 0

V1 nodes will be in red, with their selected edges also in red. V2 nodes will be in light blue, with their candidate edges also in light blue. V3 nodes and edges will remain white.

An Example – Step 1

Select an arbitrary node to place in V1, say A. Put into V2 all nodes adjacent to A (i.e., joined to A by an edge).

An Example – Step 2

Choose the edge with the smallest weight and put its node, B, into V1. Mark that edge with red also. Retain the other edge-node combinations in the “to be considered” list.

An Example – Step 3

Add all the nodes adjacent to B to the “to be considered” list. However, note that AG has weight 3 and BG has weight 6, so there is no sense in including BG in the list.

An Example – Step 4

Select the light-blue candidate node whose edge has the smallest weight and add it to V1. Note that the nodes and edges in red form a subgraph which is a tree.

An Example – Step 5

Update the candidate nodes and edges by including all that are adjacent to the red nodes in V1.

An Example – Step 6

Select I, as its edge is minimal. Mark the node and edge as red.

An Example – Step 7

Add the new candidate edges. Note that IF has weight 5 while AF has weight 7. Thus, we drop AF from consideration at this time.

An Example – after several more passes, C is added & we have …

Note that when CH is added, GH is dropped, as CH has less weight. Candidate edge BC is also dropped, since it would form a back edge between two nodes already in the MST. When there are no more nodes to be considered, i.e., V2 and V3 are both empty, we obtain the final solution.

An Example – the final solution

The subgraph is clearly a tree: it has no cycles and is connected. The tree spans, i.e., all nodes are included. While not obvious, it can be shown that this algorithm always produces a minimal spanning tree. The algorithm is known as Prim’s Algorithm for MST.
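The walkthrough translates fairly directly into code that keeps V1, V2, and V3 explicitly. In this Python sketch, V2 records only the lightest edge from each candidate node into the tree. That is how heavier edges such as BG and AF in the example are dropped, and edges back into the tree, such as BC, are ignored. The example figure did not survive, so the test graph below is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def prim_mst(nodes: List[str], weights: Dict[Edge, int], start: str) -> List[Tuple[str, str, int]]:
    """Prim's MST, keeping the V1/V2/V3 node sets from the walkthrough."""

    def w(u: str, v: str) -> float:
        # Undirected graph: an edge may be stored in either orientation.
        return weights.get((u, v), weights.get((v, u), math.inf))

    v1 = {start}                               # V1: nodes selected for the tree
    v3 = set(nodes) - v1                       # V3: nodes not considered yet
    v2: Dict[str, Tuple[float, str]] = {}      # V2: candidate -> (lightest edge weight, tree end)
    tree: List[Tuple[str, str, int]] = []

    newest = start
    while len(v1) < len(nodes):
        # Update candidates with the edges leaving the node just added to V1.
        for x in nodes:
            if x in v1 or w(newest, x) == math.inf:
                continue                       # skip tree nodes (back edges) and non-neighbors
            if x in v3:                        # first edge seen to x: x becomes a candidate
                v3.remove(x)
                v2[x] = (w(newest, x), newest)
            elif w(newest, x) < v2[x][0]:      # lighter edge found: drop the heavier one
                v2[x] = (w(newest, x), newest)
        if not v2:
            raise ValueError("graph is not connected")
        # Move the candidate with the lightest edge from V2 into V1.
        newest = min(v2, key=lambda x: v2[x][0])
        wt, parent = v2.pop(newest)
        v1.add(newest)
        tree.append((parent, newest, int(wt)))
    return tree


# Hypothetical test graph.
edges = {("A", "B"): 2, ("A", "C"): 3, ("B", "C"): 1, ("B", "D"): 4,
         ("C", "D"): 5, ("C", "E"): 6, ("D", "E"): 2}
mst = prim_mst(["A", "B", "C", "D", "E"], edges, "A")
print(mst)                           # [('A', 'B', 2), ('B', 'C', 1), ('B', 'D', 4), ('D', 'E', 2)]
print(sum(wt for _, _, wt in mst))   # 9
```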

Page 72: SIMD and Associative Computing Computational Models and Algorithms

72

An ASC MST Algorithm vs the Sequential Prim’s MST Algorithm

• First, think about how you would write the program in C or C++.

• The usual solution uses some way of maintaining the sets as lists using pointers or references. – See solutions to MST in Algorithms texts by Baase, et. al. listed

in the posted references.• In ASC, pointers and references are not even supported

as they are not needed and their use is likely to result in inefficient SIMD algorithms

• The ASC algorithm given here basically follows the preceding outline provided for Prim’s MST, using pseudo-code based on the ASC language.

• A pointer to the ASC manual will be posted on the course web site.
– The ASC pseudo-code used for algorithms will require only a few ASC language commands.


ASC-MST Algorithm Preliminaries

• Next, a “data structure” level presentation of Prim’s algorithm for the MST is given.

• The data structure used is illustrated in the upcoming slides.
– This example is from the paper "ASC: An Associative-Computing Paradigm", listed in the references and on the class website under the online references.

• There are two types of variables in the ASC model, namely
– the parallel variables (i.e., the ones held by the PEs)
– the scalar variables (i.e., the ones used by the IS).
– Scalar variables are essentially global variables.

• Each scalar variable can be replaced by a parallel variable: a vector holding the scalar's value, with one copy stored in each PE.


ASC-MST Algorithm Preliminaries (cont.)

• In order to distinguish between them here, parallel variable names end with a "$" symbol.
– This convention is optional and not part of the ASC language.

• Each step in this algorithm takes constant time.
• One MST edge is selected during each pass through the loop in this algorithm.
• Since a spanning tree has n-1 edges, the running time of this algorithm is O(n) and, with n PEs (one per node), its cost is O(n^2).
– Recall that cost = (running time) × (number of processors).

• Since the sequential running time of Prim's MST algorithm is O(n^2), which is time optimal for a graph given as an adjacency matrix (all n^2 entries may need to be examined), this parallel implementation is cost optimal.
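The cost argument written out, assuming (as above) n PEs, one per node, and constant time per step:

```latex
\begin{aligned}
T_{\mathrm{ASC}}(n) &= \underbrace{(n-1)}_{\text{passes}} \cdot \underbrace{O(1)}_{\text{time per pass}} = O(n),\\
\text{Cost} &= T_{\mathrm{ASC}}(n) \cdot \underbrace{n}_{\text{PEs}} = O(n^2) = O\big(T_{\text{Prim}}(n)\big).
\end{aligned}
```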


Graph used for Data Structure

Figure 6 in [Potter, Baker, et al.]

[Figure: weighted undirected graph on nodes a through f]


MST Algorithm Data Structure for Figure 6 (Data Structure Before Execution)

IS: next-node (not yet set); root = a

node$  a$  b$  c$  d$  e$  f$  candidate$  parent$  current_best$  mask$
a      ∞   2   8   ∞   ∞   ∞
b      2   ∞   7   4   3   ∞
c      8   7   ∞   ∞   6   9
d      ∞   4   ∞   ∞   3   ∞
e      ∞   3   6   3   ∞   ∞
f      ∞   ∞   9   ∞   ∞   ∞


Shorter Version of Algorithm: ASC-MST-PRIM(root)

1. Initialize candidates to “waiting”

2. If there are any finite values in root’s field,

3. set candidate$ to “yes”

4. set parent$ to root

5. set current_best$ to the values in root’s field

6. set root’s candidate field to “no”

7. Loop while some candidate$ contain “yes”

8. for them

9. restrict mask$ to mindex(current_best$)

10. set next_node to a node identified in the preceding step

11. set its candidate to “no”

12. if the value in their next_node's field is less than their current_best$, then

13. set current_best$ to value in next_node’s field

14. set parent$ to next_node

15. if candidate$ is “waiting” and the value in its next_node’s field is finite

16. set candidate$ to “yes”

17. set parent$ to next_node

18. set current_best$ to the value in next_node's field
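To make the pronoun-style pseudo-code concrete, the following Python sketch simulates ASC-MST-PRIM sequentially. It is an illustration under stated assumptions, not Potter's implementation. Each list index stands for one PE. The lists candidate, parent and current_best play the parallel variables candidate$, parent$ and current_best$, and next_node is the IS scalar. Each loop over all PEs would be a single parallel step on an ASC machine. mindex is modeled by choosing the lowest-indexed responder with minimal current_best$; this tie-breaking rule is an assumption. Steps 12-14 are applied only to PEs whose candidate$ is "yes" (the "them" of step 8). The weight matrix is our reading of the Figure 6 data-structure table.

```python
import math

INF = math.inf


def asc_mst_prim(nodes, weight, root):
    """Sequential simulation of ASC-MST-PRIM; index i plays PE i.

    weight[i][j] is the value in PE i's field for node j (INF if no edge).
    Returns the parallel variables parent$ and current_best$.
    """
    n = len(nodes)
    r = nodes.index(root)
    # Parallel variables: one entry per PE.
    candidate = ["waiting"] * n              # step 1
    parent = [None] * n
    current_best = [INF] * n
    # Steps 2-5: PEs with a finite value in root's field become candidates.
    for i in range(n):
        if weight[i][r] < INF:
            candidate[i] = "yes"
            parent[i] = root
            current_best[i] = weight[i][r]
    candidate[r] = "no"                      # step 6
    # Step 7: loop while some candidate$ contains "yes".
    while "yes" in candidate:
        # Steps 8-10: among the "yes" PEs, mindex picks one with minimal
        # current_best$ (here: the lowest-indexed such PE).
        responders = [i for i in range(n) if candidate[i] == "yes"]
        y = min(responders, key=lambda i: current_best[i])
        next_node = nodes[y]                 # scalar (IS) variable
        candidate[y] = "no"                  # step 11
        # Steps 12-18: on an ASC machine every PE does this simultaneously.
        for i in range(n):
            z = weight[i][y]
            if candidate[i] == "yes" and z < current_best[i]:
                current_best[i] = z          # steps 12-14
                parent[i] = next_node
            elif candidate[i] == "waiting" and z < INF:
                candidate[i] = "yes"         # steps 15-18
                parent[i] = next_node
                current_best[i] = z
    return parent, current_best


nodes = ["a", "b", "c", "d", "e", "f"]
W = [
    [INF, 2, 8, INF, INF, INF],    # a
    [2, INF, 7, 4, 3, INF],        # b
    [8, 7, INF, INF, 6, 9],        # c
    [INF, 4, INF, INF, 3, INF],    # d
    [INF, 3, 6, 3, INF, INF],      # e
    [INF, INF, 9, INF, INF, INF],  # f
]
parent, best = asc_mst_prim(nodes, W, "a")
print(parent)  # [None, 'a', 'e', 'e', 'b', 'c']
```

After its first pass, the simulation is in the state shown in the Figure 6 trace: next-node is b, and c, d, e are candidates with parent b and current_best$ 7, 4, 3. The finished tree has total weight 2 + 6 + 3 + 3 + 9 = 23.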


Comments on ASC-MST Algorithm

• The three preceding slides are Figure 6 in [Potter, Baker, et al., IEEE Computer, Nov 1994].
• The preceding slide gives a compact, data-structure-level pseudo-code description of this algorithm.
– The pseudo-code illustrates Potter's use of pronouns (e.g., them, its) and possessive nouns.
– The mindex function returns the index of a processor holding the minimal value.
– This MST pseudo-code is much shorter and simpler than data-structure-level sequential MST pseudo-codes.
• e.g., see one of Baase's textbooks in the website references.
• The algorithm given in Baase's books is identical to this parallel algorithm, except that it is for a sequential computer.

• A more detailed explanation of the algorithm in the preceding slide is given next.


Algorithm: ASC-MST-PRIM (A more detailed presentation)

• Initially assign any node to root.
• All processors set
– candidate$ to "wait"
– current_best$ to ∞
– the candidate field for the root node to "no"

• All processors whose distance d from their node to the root node is finite do
– Set their candidate$ field to "yes"
– Set their parent$ field to root.
– Set current_best$ = d.


Algorithm: ASC-MST-PRIM (cont. 2/3)

• While the candidate field of some processor is "yes",
– Restrict the active processors to those whose candidate field is "yes" and (for these processors) do
• Compute the minimum value x of current_best$.
• Restrict the active processors to those with current_best$ = x and do
– pick an active processor, say node y.
» Set the candidate$ value of node y to "no".
– Set the scalar variable next-node to y.


Algorithm: ASC-MST-PRIM (cont. 3/3)

– If the value z in the next_node column of an active processor (candidate$ = "yes") is less than its current_best$ value, then
» Set current_best$ to z.
» Set parent$ to next_node.
– For all processors, if candidate$ is "waiting" and the distance of its node from next_node y is finite, then
• Set candidate$ to "yes".
• Set current_best$ to the distance of its node from y.
• Set parent$ to y.


Trace of 1st Pass of MST Algorithm for Figure 6

IS: next-node = b; root = a

node$  a$  b$  c$  d$  e$  f$  candidate$  parent$  current_best$  mask$
a      ∞   2   8   ∞   ∞   ∞   no
b      2   ∞   7   4   3   ∞   no          a        2
c      8   7   ∞   ∞   6   9   yes         b        7
d      ∞   4   ∞   ∞   3   ∞   yes         b        4
e      ∞   3   6   3   ∞   ∞   yes         b        3
f      ∞   ∞   9   ∞   ∞   ∞   wait                 ∞


ASC Quickhull Algorithm

A Second ASC Algorithm Example


Quickhull Algorithm for ASC

• Reference:
– [Atwah, Baker, Akl, "An Associative Implementation of Classical Convex Hull Algorithms"]

• Review of Sequential Quickhull Algorithm
– It suffices to find the upper convex hull of the points that are on or above the line we (through w and e).
• Select point h so that the area of triangle weh is maximal.
• Proceed recursively with the sets of points on or above the lines wh and he.
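A minimal recursive sketch of the sequential upper-hull Quickhull reviewed above. It assumes that points are (x, y) tuples, that the leftmost and rightmost points are unique, and that no three points are collinear. The helper names are illustrative. cross(o, a, b) is twice the signed area of triangle (o, a, b), so maximizing it over the points above line ab selects the maximal-area triangle.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b is left of o->a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points):
    """Upper convex hull by sequential Quickhull, listed left to right."""
    w, e = min(points), max(points)      # leftmost and rightmost points
    above = [p for p in points if cross(w, e, p) > 0]
    return [w] + _hull_above(w, e, above) + [e]


def _hull_above(a, b, pts):
    """Hull points strictly above the directed line a->b, left to right."""
    if not pts:
        return []
    # Triangle (a, b, h) has maximal area: h is farthest above line ab.
    h = max(pts, key=lambda p: cross(a, b, p))
    left = [p for p in pts if cross(a, h, p) > 0]
    right = [p for p in pts if cross(h, b, p) > 0]
    return _hull_above(a, h, left) + [h] + _hull_above(h, b, right)


pts = [(0, 0), (1, 3), (2, 1), (3, 4), (4, 1), (5, 0), (2, 5)]
print(upper_hull(pts))  # [(0, 0), (1, 3), (2, 5), (3, 4), (5, 0)]
```

Points inside triangle weh fall into neither recursive set, which is how each level discards work.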


Previous Illustration

[Figure: points w, e, and h from the quickhull step above]


Example for Data Structure

[Figure: seven points p1 through p7, with p1 = w, p3 = e, and p6 = h]


Data Structure for Preceding Example

IS: w = p1, e = p3, h = p6, ctr

point$  x-coord$  y-coord$  left-pt$  right-pt$  area$  hull$  job$  mask$
p1      1         3         p1        p3                1      1
p2      7         1         p1        p3                       0
p3      12        2         p1        p3                1      1
p4      8         4         p1        p3                       1
p5      11        7         p1        p3                       1
p6      8         9         p1        p3                1      1
p7      2         6         p1        p3                       1


Algorithms & Assumption

• Basic algorithms exist for the following problems in plane Euclidean geometry:
– Determine whether a third point lies on, above, or below the line determined by two other points.
– Compute the area of a triangle determined by three points.
• Standard Assumption
– No three of the points lie on the same line.

Reference: Introduction to Algorithms by Cormen, Leiserson, Rivest (& Stein), McGraw Hill, chapter on Computational Geometry.
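Both primitives reduce to one cross-product determinant. The following is a minimal sketch. It assumes that points are (x, y) tuples and that the line's first point lies to the left of its second, so "left of the directed line" means "above". The function names are illustrative only.

```python
def orientation(a, b, p):
    """Position of p relative to the line a->b (a left of b):
    'above', 'below', or 'on'."""
    d = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return "above" if d > 0 else "below" if d < 0 else "on"


def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c (half the |determinant|)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2


print(orientation((0, 0), (4, 0), (1, 2)))    # above
print(triangle_area((0, 0), (4, 0), (1, 2)))  # 4.0
```

Under the standard assumption, orientation never returns "on" for three distinct input points.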


ASC Quickhull Algorithm (Upper Convex Hull)

ASC-Quickhull( planar-point-set )

1. Initialize: ctr = 1, area$ = 0, hull$ = 0
2. Find the PE with the minimal x-coord$ and let w be its point$
   a) Set its hull$ value to 1
3. Find the PE with the maximal x-coord$ and let e be its point$
   a) Set its hull$ value to 1
4. All PEs set their left-pt$ to w and right-pt$ to e.
5. If the point$ for a PE lies above the line we,
   a) then set its job$ value to 1
   b) else set its job$ value to 0


ASC Quickhull Algorithm (cont)

6. Loop while parallel job$ contains a nonzero value
   a) The IS makes its active cells those with a maximal job$ value.
   b) Each (active) PE computes and stores the area of triangle (left-pt$, right-pt$, point$) in area$.
   c) Find the PE with the maximal area$ and let h be its point$.
      • Set its hull$ value to 1
   d) Each PE whose point$ is above the line (left-pt$, h)
      • sets its job$ value to ++ctr
      • sets its right-pt$ to h
   e) Each PE whose point$ is above the line (h, right-pt$)
      • sets its job$ value to ++ctr
      • sets its left-pt$ to h
   f) Each PE with job$ < ctr - 2 sets its job$ value to 0
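The following is a sequential Python simulation of ASC-Quickhull. It is offered as a sketch under stated assumptions, not as the paper's implementation. Each list index is one PE, and left_pt and right_pt hold PE indices. The two "++ctr" increments become ctr + 1 and ctr + 2. We read steps b through f as applying only to the PEs made active in step a, because the other PEs hold pending work with lower job$ values. We also interpret step f as resetting job$ to 0 for the active PEs that steps d and e did not reassign, i.e., the points inside the triangle. Step 5 is taken as "strictly above", so w and e start with job$ 0.

```python
def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 if b is above o->a
    when o lies to the left of a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def asc_quickhull(points):
    """Sequential simulation of ASC-Quickhull (upper hull); index i is PE i."""
    n = len(points)
    ctr = 1                                               # step 1
    area, hull = [0] * n, [0] * n
    w = min(range(n), key=lambda i: points[i][0])         # step 2
    e = max(range(n), key=lambda i: points[i][0])         # step 3
    hull[w] = hull[e] = 1
    left_pt, right_pt = [w] * n, [e] * n                  # step 4
    job = [1 if cross(points[w], points[e], p) > 0 else 0 for p in points]  # step 5
    while any(job):                                       # step 6
        top = max(job)
        active = [i for i in range(n) if job[i] == top]   # 6a
        for i in active:                                  # 6b
            area[i] = abs(cross(points[left_pt[i]], points[right_pt[i]], points[i])) / 2
        h = max(active, key=lambda i: area[i])            # 6c
        hull[h] = 1
        job_d, job_e = ctr + 1, ctr + 2                   # the two ++ctr values
        ctr += 2
        for i in active:
            lp, rp = points[left_pt[i]], points[right_pt[i]]
            if cross(lp, points[h], points[i]) > 0:       # 6d: above (left-pt$, h)
                job[i], right_pt[i] = job_d, h
            elif cross(points[h], rp, points[i]) > 0:     # 6e: above (h, right-pt$)
                job[i], left_pt[i] = job_e, h
            else:                                         # 6f, as we interpret it
                job[i] = 0
    return sorted(points[i] for i in range(n) if hull[i])


pts = [(0, 0), (1, 3), (2, 1), (3, 4), (4, 1), (5, 0), (2, 5)]
print(asc_quickhull(pts))  # [(0, 0), (1, 3), (2, 5), (3, 4), (5, 0)]
```

Because the highest job$ value is always processed next, the job$ numbers act as a stack. The recursion of sequential Quickhull is therefore replayed depth first, with the region above (h, right-pt$) (the larger job$ value) handled first.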


Highest Job Order Assigned to Points Above Lines

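[Figure: the example point set, with the lines labeled 1 through 7 by the highest job$ value assigned to the points above them]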


Order that Triangles are Computed

[Figure: the example's triangles, numbered 1 through 7 in the order in which they are computed]


Performance of ASC-Quickhull

Average Case:

• Assume either of the following:
– For some integer k > 1, on average 1/k of the points above each line being processed are eliminated each round.
• For example, consider k = 3, since one of the three regions (the interior of the triangle) is eliminated each round.
– O(lg n) points are on the convex hull.
• For randomly generated points (e.g., uniformly distributed in a square), the expected number of convex hull points grows only logarithmically in n.


Performance of ASC-Quickhull (cont)

• Either of the above assumptions implies that the average running time is O(lg n).
– For example, each pass through the algorithm's loop produces one convex hull point.
• The average cost is O(n lg n).

Worst Case:
• Running time is O(n).
• Cost is O(n^2).

Recall: Cost = (running time) × (number of processors).
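The same bookkeeping written out for ASC-Quickhull. It assumes one PE per point (n PEs) and a constant-time pass, i.e., a constant number of associative operations per pass. Here m denotes the number of upper-hull points, one found per pass:

```latex
\begin{aligned}
T(n) &= O(m) \quad \text{(one hull point per pass, } O(1) \text{ time per pass)},\\
\text{average: } m = O(\lg n) &\;\Rightarrow\; T(n) = O(\lg n),\quad \text{Cost} = n \cdot O(\lg n) = O(n \lg n),\\
\text{worst: } m = n &\;\Rightarrow\; T(n) = O(n),\quad \text{Cost} = n \cdot O(n) = O(n^2).
\end{aligned}
```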