Page 1: SIMD, Associative, and  Multi-Associative Computing

SIMD, Associative, and Multi-Associative Computing

Computational Models and Algorithms

Page 2: SIMD, Associative, and  Multi-Associative Computing

SIMD Review Remarks

• Recall that all active processors of a SIMD computer must simultaneously access the same memory location.

• These locations can be viewed as components of a vector.

• SIMD machines are sometimes called vector computers [Jordan et al.] or processor arrays [Quinn 94, 04] based on their ability to execute vector and matrix operations efficiently.

Page 3: SIMD, Associative, and  Multi-Associative Computing

SIMD Review Remarks (cont)

• SIMD computers that focus on vector operations usually
– support some vector and possibly matrix operations in hardware, and
– limit or provide less support for non-vector type operations.

• The inner loops of some sequential algorithms consist only of performing the same operation on a set of independent data items.
– These are easy to parallelize using a SIMD by assigning each data item to a different processor and having each operation performed simultaneously (see the sketch below).
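To make this concrete, here is a small C++ sketch (an illustration only, not an actual SIMD programming interface): the sequential loop below applies the same operation to independent data items, so on a SIMD each iteration could be assigned to a PE and the whole loop would collapse into a single broadcast step.

    #include <cstddef>
    #include <vector>

    // Sequential inner loop performing the same operation on independent
    // data items. On a SIMD, element i would sit in PE i's local memory and
    // the control unit would broadcast a single add instruction that all
    // active PEs execute simultaneously (inactive PEs simply skip it).
    void vector_add(const std::vector<double>& a, const std::vector<double>& b,
                    std::vector<double>& c) {
        for (std::size_t i = 0; i < a.size(); ++i)
            c[i] = a[i] + b[i];   // one parallel step on a SIMD
    }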

Page 4: SIMD, Associative, and  Multi-Associative Computing


SIMD Execution Style

The traditional SIMD (or vector computer, processor array) execution style:

• References: [Quinn 94, pg 62] & [Quinn 2004, pgs 37-43]:

• The sequential processor that broadcasts the commands to the rest of the processors is called the front end or control unit.

• The front end is a general purpose CPU that stores the program and the “scalar data”, i.e., the data that is not manipulated in parallel.

• The front end normally executes the sequential portions of the program.

Page 5: SIMD, Associative, and  Multi-Associative Computing

SIMD Execution Style (cont)

• Each processing element has a local memory that cannot be directly accessed by the host or other processing elements.

• Collectively, the individual memories of the processing elements (PEs) store the vector data that is processed in parallel.
– Collective PE memory is called the array memory.

• When the front end encounters an instruction whose operand is a vector, it issues a command to the PEs to perform the instruction in parallel.

• Although the PEs execute instructions in parallel, some units can be allowed to skip any particular instruction.

Page 6: SIMD, Associative, and  Multi-Associative Computing


Possible Architecture for a Generic SIMD

Page 7: SIMD, Associative, and  Multi-Associative Computing

Real SIMD Architectures

• An early SIMD computer designed for vector and matrix processing was the Illiac IV computer built at the University of Illinois [Jordan et al., pg 7].

• The CRAY-1 and the Cyber-205 use pipelined arithmetic units to support vector operations and are viewed as pipelined SIMDs ([Jordan et al., pg 7], [Quinn 94, pgs 61-62], [Quinn 2004, pg 37]).

Page 8: SIMD, Associative, and  Multi-Associative Computing

Real SIMD Architectures

• Goodyear Aerospace’s STARAN, MPP, and ASPRO; Thinking Machines’ CM-1, CM-2, and CM-200; AMT’s (or Cambridge Parallel Processing’s) DAP; and MasPar’s MP-1 and MP-2 are examples of SIMD computers.
– CM is an acronym for “Connection Machine” and DAP is an acronym for “Distributed Array Processor”.
– Information on these can be found in parallel architecture books and also on the web.

• Quinn [1994, pgs 63-67] discusses the CM-200 (a smaller & updated CM-2) as well as several of the above.

• Professor Batcher at Kent State was the chief architect for the STARAN and the MPP (Massively Parallel Processor) and an advisor for the ASPRO (a very small, second generation STARAN).

Page 9: SIMD, Associative, and  Multi-Associative Computing

Today’s SIMDs

• Many SIMDs are being embedded in SISD machines.

• Others are being built as part of hybrid architectures.

• Others are being built as special purpose machines, although some of them could be classified as general purpose.

• Much of the recent work with SIMD architectures is proprietary.

Page 10: SIMD, Associative, and  Multi-Associative Computing


A Company that Builds an Inexpensive SIMD

• WorldScape is building a COTS SIMD.

• The architecture is changing rapidly as they are in development.

• See http://www.wscapeinc.com/

• There is quite a bit of information about their work on the above site.

Page 11: SIMD, Associative, and  Multi-Associative Computing

An Example of a Hybrid SIMD

• Embedded Massively Parallel Accelerators
– Fuzion 150: 1536 processors on a single chip
– Systola 1024: PC add-on board with 1024 processors
– Other accelerators: Decypher, Biocellerator, GeneMatcher2, Kestrel, SAMBA, P-NAC, Splash-2, BioScan

(This and the next two slides are due to Prabhakar R. Gudla (U of Maryland) at a CMSC 838T Presentation, 4/23/2003.)

Page 12: SIMD, Associative, and  Multi-Associative Computing

Hybrid Architecture

[Figure: a hybrid computer consisting of 16 Systola 1024 boards connected by a high speed Myrinet switch]

– combines the SIMD and MIMD paradigms within a parallel architecture

Page 13: SIMD, Associative, and  Multi-Associative Computing

SIMDs Embedded in SISDs

• Intel’s Pentium 4 includes what they call MMX technology to gain a significant performance boost.

• IBM and Motorola incorporated the technology into their G4 PowerPC chip in what they call their Velocity Engine.

• Both MMX technology and the Velocity Engine are the chip manufacturers’ names for their proprietary SIMD processors and parallel extensions to their operating code.

• This same approach is used by NVIDIA and Evans & Sutherland to dramatically accelerate graphics rendering.

Page 14: SIMD, Associative, and  Multi-Associative Computing

Special Purpose SIMDs in the Bioinformatics Market

• Paracel, Inc. (acquired by Celera Genomics for $283 million in March of 2000)
– Paracel’s systems are based on a proprietary SIMD processor packaged as an integrated system with proprietary software algorithms.
– One of their machines is called GeneMatcher.

• TimeLogic, Inc.
– Has DeCypher, a reconfigurable SIMD.

Page 15: SIMD, Associative, and  Multi-Associative Computing

Associative Computing Topics

• Introduction
– References for Associative Computing
– Motivation for the MASC model
– The MASC and ASC Models
– A Language Designed for the ASC Model
– Two ASC Algorithms and Programs

• ASC and MASC Algorithm Examples
– ASC version of Prim’s MST Algorithm
– ASC version of QUICKHULL
– MASC version of QUICKHULL

Page 16: SIMD, Associative, and  Multi-Associative Computing

Associative Computing References

Note: The KSU papers below are available on the website http://www.cs.kent.edu/~parallel/ (click on the link to “papers”).

• Maher Atwah, Johnnie Baker, and Selim Akl, An Associative Implementation of Classical Convex Hull Algorithms, Proc. of the IASTED International Conference on Parallel and Distributed Computing and Systems, 1996, 435-438.

• Johnnie Baker and Mingxian Jin, Simulation of Enhanced Meshes with MASC, a MSIMD Model, Proc. of the Eleventh IASTED International Conference on Parallel and Distributed Computing and Systems, Nov. 1999, 511-516.

Page 17: SIMD, Associative, and  Multi-Associative Computing

Associative Computing References

• Mingxian Jin, Johnnie Baker, and Kenneth Batcher, Timings for Associative Operations on the MASC Model, Proc. of the 15th International Parallel and Distributed Processing Symposium (Workshop on Massively Parallel Processing), San Francisco, April 2001.

• Jerry Potter, Johnnie Baker, Stephen Scott, Arvind Bansal, Chokchai Leangsuksun, and Chandra Asthagiri, An Associative Computing Paradigm, Special Issue on Associative Processing, IEEE Computer, 27(11):19-25, Nov. 1994. (Note: MASC is called ‘ASC’ in this article.)
– First reading assignment

• Jerry Potter, Associative Computing - A Programming Paradigm for Massively Parallel Computers, Plenum Publishing Company, 1992.

Page 18: SIMD, Associative, and  Multi-Associative Computing


Associative Computers

Associative Computer: A SIMD computer with a few additional features supported in hardware.

• These additional features can be supported (less efficiently) in traditional SIMDs in software.

• The name “associative” is due to its ability to locate items in the memory of PEs by content rather than location.

Page 19: SIMD, Associative, and  Multi-Associative Computing

Associative Models

The ASC model (for ASsociative Computing) gives a list of the properties assumed for an associative computer.

The MASC (for Multiple ASC) Model
• Supports multiple SIMD (or MSIMD) computation.
• Allows the model to have more than one Instruction Stream (IS).
– An IS corresponds to the control unit of a SIMD.
• ASC is the MASC model with only one IS.
– The one-IS version of the MASC model is sufficiently important to have its own name.

Page 20: SIMD, Associative, and  Multi-Associative Computing

ASC & MASC are KSU Models

• Several professors and their graduate students at Kent State University have worked on these models.

• The STARAN and the ASPRO fully support the ASC model in hardware. The MPP supports it partly in hardware and partly in software.
– Prof. Batcher was chief architect or consultant.

• Dr. Potter developed a language for ASC.
• Dr. Baker works on algorithms for the models and architectures to support the models.
• Dr. Walker is working with the hardware design of the machine.
• Dr. Batcher and Dr. Potter are currently advisors.

Page 21: SIMD, Associative, and  Multi-Associative Computing

Motivation

• The STARAN Computer (Goodyear Aerospace, early 1970’s) and later the ASPRO provided an architectural model for associative computing embodied in the ASC model.

• ASC extends the data parallel programming style to a complete computational model.

• ASC provides a practical model that supports massive parallelism.

• MASC provides a hybrid data-parallel, control parallel model that supports associative programming.

• Descriptions of these models allow them to be compared to other parallel models.

Page 22: SIMD, Associative, and  Multi-Associative Computing

The ASC Model

[Figure: the ASC model: an IS broadcasting to a column of cells, each cell consisting of a PE and its memory, with the cells connected by a network]

Page 23: SIMD, Associative, and  Multi-Associative Computing

Basic Properties of ASC

• Instruction Stream
– The IS has a copy of the program and can broadcast instructions to cells in unit time.

• Cell Properties
– Each cell consists of a PE and its local memory.
– All cells listen to the IS.
– A cell can be active, inactive, or idle.
• Inactive cells listen but do not execute IS commands until reactivated.
• Idle cells contain no essential data and are available for reassignment.
• Active cells execute IS commands synchronously.

Page 24: SIMD, Associative, and  Multi-Associative Computing

Basic Properties of ASC

• Responder Processing
– The IS can detect if a data test is satisfied by any of its responder cells in constant time (i.e., any-responders?).
– The IS can select an arbitrary responder in constant time (i.e., pick-one).

Page 25: SIMD, Associative, and  Multi-Associative Computing

Basic Properties of ASC

• Constant Time Global Operations (across PEs)
– Logical OR and AND of binary values
– Maximum and minimum of numbers
– Associative searches

• Communications
– There are at least two real or virtual networks
• PE communications (or cell) network
• IS broadcast/reduction network (which could be implemented as two separate networks)

Page 26: SIMD, Associative, and  Multi-Associative Computing

Basic Properties of ASC

– The PE communications network is normally supported by an interconnection network
• E.g., a 2D mesh
– The broadcast/reduction network(s) are normally supported by a broadcast and a reduction network (sometimes combined).
• See the posted paper by Jin, Baker, & Batcher (listed in the associative references).

• Control Features
– The PEs, the IS, and the networks all operate synchronously, using the same clock.

Page 27: SIMD, Associative, and  Multi-Associative Computing

Non-SIMD Properties of ASC

• Observation: The ASC properties that are unusual for SIMDs are the constant time operations:
– Constant time responder processing
• Any-responders?
• Pick-one
– Constant time global operations
• Logical OR and AND of binary values
• Maximum and minimum value of numbers
• Associative searches

• These timings are justified by implementations using a resolver in the paper by Jin, Baker, & Batcher (listed in the associative references and posted).
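As a concrete illustration, here is a sequential C++ simulation of these operations (the struct and names are invented for this sketch). On an associative machine each call is answered in constant time by the resolver; here each is an explicit loop whose only purpose is to pin down the semantics.

    #include <climits>
    #include <cstddef>
    #include <optional>
    #include <vector>

    // Sequential simulation of ASC's constant-time operations.
    // On real associative hardware a resolver answers these in O(1);
    // here each is an O(n) loop, shown only to define the semantics.
    struct AscSim {
        std::vector<int>  value;   // one entry per PE
        std::vector<bool> mask;    // responders from the last search

        // any-responders?: did some active cell satisfy the data test?
        bool any_responders() const {
            for (bool m : mask) if (m) return true;
            return false;
        }

        // pick-one: select an arbitrary responder (lowest index here).
        std::optional<std::size_t> pick_one() const {
            for (std::size_t i = 0; i < mask.size(); ++i)
                if (mask[i]) return i;
            return std::nullopt;
        }

        // global MIN reduction over the responders.
        int min_value() const {
            int best = INT_MAX;
            for (std::size_t i = 0; i < mask.size(); ++i)
                if (mask[i] && value[i] < best) best = value[i];
            return best;
        }
    };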

Page 28: SIMD, Associative, and  Multi-Associative Computing

Typical Data Structure for ASC Model

    PE   Busy-idle  Make    Color  Year  Model  Price  On lot
    PE1  1          Dodge   red    1994                1
    PE2  0
    PE3  1          Ford    blue   1996                0
    PE4  1          Ford    white  1998                0
    PE5  0
    PE6  0
    PE7  1          Subaru  red    1997                1

The IS broadcasts to all PEs. (Fields of idle PEs hold no essential data; the Model and Price fields are left blank on the slide.)

Make, Color, etc. are fields the programmer establishes.

Various data types are supported. Some examples will show string data, but strings are not supported in the ASC simulator.

Page 29: SIMD, Associative, and  Multi-Associative Computing

The Associative Search

[The same data structure as on the previous slide.]

The IS asks for all cars that are red and on the lot.

PE1 and PE7 respond by setting a mask bit in their PE.
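A sequential C++ sketch of this search, reusing the hypothetical array-per-field layout from the earlier simulation (this is an illustration of the semantics, not the ASC language): each field is a parallel array with one slot per PE, and the search writes the responder mask that later mask-dependent instructions use. On an associative SIMD, all PEs would evaluate the predicate simultaneously.

    #include <cstddef>
    #include <string>
    #include <vector>

    // One parallel field per column; index i is PE i's local memory.
    struct CarTable {
        std::vector<bool>        busy;    // busy-idle bit
        std::vector<std::string> color;
        std::vector<bool>        on_lot;
    };

    // Associative search: every busy PE tests its own record against the
    // broadcast predicate "color == red AND on lot"; responders set a mask bit.
    std::vector<bool> search_red_on_lot(const CarTable& t) {
        std::vector<bool> mask(t.busy.size(), false);
        for (std::size_t pe = 0; pe < t.busy.size(); ++pe)  // all PEs at once on a SIMD
            mask[pe] = t.busy[pe] && t.color[pe] == "red" && t.on_lot[pe];
        return mask;   // in the example above, PE1 and PE7 respond
    }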

Page 30: SIMD, Associative, and  Multi-Associative Computing

MASC Model

• Basic Components
– An array of cells, each consisting of a PE and its local memory
– A PE interconnection network between the cells
– One or more Instruction Streams (ISs)
– An IS network

• MASC is an MSIMD model that supports
– both data and control parallelism
– associative programming

[Figure: cells (PE-memory pairs) linked by a PE interconnection network, with three Instruction Streams (ISs) connected to the cells through an IS network]

Page 31: SIMD, Associative, and  Multi-Associative Computing

MASC Basic Properties

• Each cell can listen to only one IS.

• Cells can switch ISs in unit time, based on the results of a data test.

• Each IS and the cells listening to it follow the rules of the ASC model.

• Control Features:
– The PEs, ISs, and networks all operate synchronously, using the same clock.
– Restricted job control parallelism is used to coordinate the interaction of the multiple ISs.

Page 32: SIMD, Associative, and  Multi-Associative Computing

Characteristics of Associative Programming

• Consistent use of the style of programming called data parallel programming

• Consistent use of global associative searching and responder processing

• Usually, frequent use of the constant time global reduction operations: AND, OR, MAX, MIN

• Broadcast of data using the IS bus allows the use of the PE network to be restricted to parallel data movement.

Page 33: SIMD, Associative, and  Multi-Associative Computing

Characteristics of Associative Programming

• Tabular representation of data – think 2D arrays
• Use of searching instead of sorting
• Use of searching instead of pointers
• Use of searching instead of the ordering provided by linked lists, stacks, queues
• Promotes a highly intuitive programming style that promotes high productivity
• Uses structure codes (i.e., numeric representation) to represent data structures such as trees, graphs, embedded lists, and matrices.

• We’ll see examples of the above.
– Ref: Nov. 1994 IEEE Computer article.
– Also, see the “Associative Computing” book by Potter.

Page 34: SIMD, Associative, and  Multi-Associative Computing

Languages Designed for the ASC

• Professor Potter has created several languages for the ASC model.

• ASC is a C-like language designed for the ASC model.
• ACE is a higher level language that uses natural language syntax; e.g., plurals, pronouns.
• Anglish is an ACE variant that uses an English-like grammar (e.g., “their”, “its”).
• An OOP version of ASC for the MASC was discussed (by Potter and his students), but never designed.

• Language References:
– ASC Primer – copy available on the parallel lab website www.cs.kent.edu/~parallel/
– “Associative Computing” book by Potter [11] – some features in this book were never fully implemented in the ASC compiler.

Page 35: SIMD, Associative, and  Multi-Associative Computing

Algorithms and Programs Implemented in ASC

• A wide range of algorithms implemented in ASC without the use of the PE network:
– Graph Algorithms
• minimal spanning tree
• shortest path
• connected components
– Computational Geometry Algorithms
• convex hull algorithms (Jarvis March, Quickhull, Graham Scan, etc.)
• dynamic hull algorithms

Page 36: SIMD, Associative, and  Multi-Associative Computing

ASC Algorithms and Programs (not requiring PE network)

– String Matching Algorithms
• all exact substring matches
• all exact matches with “don’t care” (i.e., wild card) characters
– Algorithms for NP-complete problems
• traveling salesperson
• 2-D knapsack
– Data Base Management Software
• associative data base
• relational data base

Page 37: SIMD, Associative, and  Multi-Associative Computing

ASC Algorithms and Programs (not requiring a PE network)

– A Two Pass Compiler for ASC – not the one we will be using. This compiler uses ASC parallelism.
• first pass
• optimization phase
– Two Rule-Based Inference Engines for AI
• An Expert System OPS-5 interpreter
• PPL (Parallel Production Language) interpreter
– A Context Sensitive Language Interpreter
• (OPS-5 variables force context sensitivity)
– An associative PROLOG interpreter

Page 38: SIMD, Associative, and  Multi-Associative Computing

Associative Algorithms & Programs (using a network)

• There are numerous associative programs that use a PE network:
– 2-D Knapsack ASC algorithm using a 1-D mesh
– Image processing algorithms using a 1-D mesh
– FFT (Fast Fourier Transform) using 1-D nearest neighbor & Flip networks
– Matrix multiplication using a 1-D mesh
– An Air Traffic Control Program (using a Flip network connecting PEs to memory)
• Demonstrated using live data at Knoxville in the mid 70’s.

• All but the first were developed in assembler at Goodyear Aerospace.

Page 39: SIMD, Associative, and  Multi-Associative Computing

Example 1 - MST

• A graph has nodes labeled by some identifying letter or number and arcs which are directional and have weights associated with them.

• Such a graph could represent a map where the nodes are cities and the arc weights give the mileage between two cities.

[Figure: a small weighted graph on nodes A, B, C, D, E]

Page 40: SIMD, Associative, and  Multi-Associative Computing

The MST Problem

• The MST problem assumes the weights are positive, the graph is connected, and seeks to find the minimal spanning tree,
– i.e. a subgraph that is a tree*, that includes all nodes (i.e. it spans), and
– where the sum of the weights on the arcs of the subgraph is the smallest possible weight (i.e. it is minimal).

• Why would an algorithm solving this problem be useful?

• Note: The solution may not be unique.

* A tree is a set of points called vertices, and pairs of distinct vertices called edges, such that (1) there is a sequence of edges called a path from any vertex to any other, and (2) there are no circuits, that is, no paths starting from a vertex and returning to the same vertex.

Page 41: SIMD, Associative, and  Multi-Associative Computing

An Example

[Figure: a weighted graph on nodes A through I; the same graph is used in Steps 0-7 that follow]

As we will see, the algorithm is simple.

The ASC program is quite easy to write.

A SISD solution is a bit messy because of the data structures needed to hold the data for the problem.

Page 42: SIMD, Associative, and  Multi-Associative Computing

An Example – Step 0

[Figure: the same weighted graph on nodes A through I]

We will maintain three sets of nodes whose membership will change during the run.

The first, V1, will be nodes selected to be in the tree.

The second, V2, will be candidates at the current step to be added to V1.

The third, V3, will be nodes not considered yet.

Page 43: SIMD, Associative, and  Multi-Associative Computing

An Example – Step 0

[Figure: the same weighted graph]

V1 nodes will be in red with their selected edges being in red also.

V2 nodes will be in light blue with their candidate edges in light blue also.

V3 nodes and edges will remain white.

Page 44: SIMD, Associative, and  Multi-Associative Computing

An Example – Step 1

[Figure: the same weighted graph; node A is selected]

Select an arbitrary node to place in V1, say A.

Put into V2 all nodes incident with A.

Page 45: SIMD, Associative, and  Multi-Associative Computing

An Example – Step 2

[Figure: the same weighted graph; edge AB is chosen]

Choose the edge with the smallest weight and put its node, B, into V1. Mark that edge with red also.

Retain the other edge-node combinations in the “to be considered” list.

Page 46: SIMD, Associative, and  Multi-Associative Computing

An Example – Step 3

[Figure: the same weighted graph]

Add all the nodes incident to B to the “to be considered” list.

However, note that AG has weight 3 and BG has weight 6. So, there is no sense in including BG in the list.

Page 47: SIMD, Associative, and  Multi-Associative Computing

An Example – Step 4

[Figure: the same weighted graph]

Take the candidate node (light blue) whose edge has the smallest weight and add it to V1.

Note the nodes and edges in red are forming a subgraph which is a tree.

Page 48: SIMD, Associative, and  Multi-Associative Computing

An Example – Step 5

[Figure: the same weighted graph]

Update the candidate nodes and edges by including all that are incident to those that are in V1 and colored red.

Page 49: SIMD, Associative, and  Multi-Associative Computing

An Example – Step 6

[Figure: the same weighted graph; node I joins the tree]

Select I as its edge is minimal. Mark the node and edge as red.

Page 50: SIMD, Associative, and  Multi-Associative Computing

An Example – Step 7

[Figure: the same weighted graph]

Add the new candidate edges.

Note that IF has weight 5 while AF has weight 7. Thus, we drop AF from consideration at this time.

Page 51: SIMD, Associative, and  Multi-Associative Computing

An Example – after several more passes we have …

[Figure: the same weighted graph, with most nodes and their selected edges now red]

Note that when CH is added, GH is dropped as CH has less weight.

Also, BC is dropped for the same type of reasoning (i.e., it would form a back edge between two nodes already in the MST).

When there are no more nodes to be considered, i.e. no more in V3, we obtain the final solution.

Page 52: SIMD, Associative, and  Multi-Associative Computing

An Example – the final solution

[Figure: the final minimal spanning tree shown in red]

The subgraph is clearly a tree – no cycles and connected.

The tree spans – i.e. all nodes are included.

While not obvious, it can be shown that this algorithm always produces a minimal spanning tree.

The algorithm is known as Prim’s Algorithm for MST.

Page 53: SIMD, Associative, and  Multi-Associative Computing

The ASC Program vs a SISD Solution in, say, C, C++, or Java

• First, think about how you would write the program in C or C++ (see the sketch below).

• The usual solution uses some way of maintaining the sets as lists using pointers or references.
– See solutions to MST in the Algorithms texts by Baase listed in the posted references.

• In ASC, pointers and references are not even supported, as they are not needed and their use is likely to result in inefficient SIMD algorithms.

• The implementation of MST in ASC basically follows the outline I provided for the problem, but first, we need to learn something about the language ASC.

• The ASC manual (or a pointer to it) will be posted on the course web site.
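For contrast, here is a minimal sequential Prim sketch in C++ (an illustration, not the code from the Baase texts). The per-node bookkeeping that ASC keeps in parallel fields shows up here as explicit arrays, and the O(n²) running time matches the analysis on the next slides.

    #include <vector>
    const int INF = 1'000'000'000;   // stands in for "no edge"

    // Sequential Prim over an adjacency matrix w (w[i][j] == INF if no edge).
    // Returns parent[], describing the MST edges; parent[root] == root.
    std::vector<int> prim(const std::vector<std::vector<int>>& w, int root) {
        int n = (int)w.size();
        std::vector<int>  best(n, INF), parent(n, root);
        std::vector<bool> in_tree(n, false);
        best[root] = 0;
        for (int pass = 0; pass < n; ++pass) {
            int u = -1;                          // scan for the cheapest fringe node
            for (int v = 0; v < n; ++v)
                if (!in_tree[v] && (u == -1 || best[v] < best[u])) u = v;
            in_tree[u] = true;
            for (int v = 0; v < n; ++v)          // relax the edges out of u
                if (!in_tree[v] && w[u][v] < best[v]) {
                    best[v]   = w[u][v];
                    parent[v] = u;
                }
        }
        return parent;
    }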

Page 54: SIMD, Associative, and  Multi-Associative Computing

ASC-MST Algorithm Preliminaries

• Next, a “data structure” level presentation of Prim’s algorithm for the MST is given.

• The data structure used is illustrated in the next two slides.
– This example is from the Nov. 1994 IEEE Computer paper cited in the references.

• There are two types of variables for the ASC model, namely
– the parallel variables (i.e., ones for the PEs)
– the scalar variables (i.e., the ones used by the IS).
– Scalar variables are essentially global variables.
• One can replace each scalar variable with a parallel variable that stores the scalar value in each entry.

Page 55: SIMD, Associative, and  Multi-Associative Computing

ASC-MST Algorithm Preliminaries (cont.)

• In order to distinguish between them here, the parallel variable names end with a “$” symbol.

• Each step in this algorithm takes constant time.

• One MST edge is selected during each pass through the loop in this algorithm.

• Since a spanning tree has n-1 edges, the running time of this algorithm is O(n) and its cost is O(n²) (spelled out below).
– The definition of cost is (running time) × (number of processors).

• Since the sequential running time of the Prim MST algorithm is O(n²) and is time optimal, this parallel implementation is cost optimal.
– Cost & optimality will be covered in the parallel algorithm performance evaluation chapter (see Ch 7 of Quinn).
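Written out with the cost definition above (a one-line check, in LaTeX):

    \mathrm{cost} = T(n) \cdot P(n) = O(n) \cdot n = O(n^2)

Since Prim’s sequential algorithm already needs O(n²) time, the parallel version wastes no asymptotic work, i.e., it is cost optimal.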

Page 56: SIMD, Associative, and  Multi-Associative Computing

Graph used for Data Structure

[Figure 6a in [Potter, Baker, et al.]: a weighted graph on the nodes a, b, c, d, e, f]

Page 57: SIMD, Associative, and  Multi-Associative Computing

Data Structure for MST Algorithm

    PEs  node$  a$  b$  c$  d$  e$  f$  parent$  current_best$  candidate$  mask$
         a      ∞   2   8   ∞   ∞   ∞                           no
         b      2   ∞   7   4   3   ∞   a        2              no
         c      8   7   ∞   ∞   6   9   b        7              yes
         d      ∞   4   ∞   ∞   3   ∞   b        4              yes
         e      ∞   3   6   3   ∞   ∞   b        3              yes
         f      ∞   ∞   9   ∞   ∞   ∞                           waiting

IS scalar variables: root = a, next-node = b.

Page 58: SIMD, Associative, and  Multi-Associative Computing

Algorithm: ASC-MST-PRIM(root)

1. Initialize candidates to “waiting”
2. If there are any finite values in root’s field,
3.   set candidate$ to “yes”
4.   set parent$ to root
5.   set current_best$ to the values in root’s field
6.   set root’s candidate field to “no”
7. Loop while some candidate$ contains “yes”
8.   for them
9.     restrict mask$ to mindex(current_best$)
10.    set next_node to a node identified in the preceding step
11.    set its candidate to “no”
12.    if the value in their next_node’s field is less than current_best$, then
13.      set current_best$ to the value in next_node’s field
14.      set parent$ to next_node
15.  if candidate$ is “waiting” and the value in its next_node’s field is finite
16.    set candidate$ to “yes”
17.    set parent$ to next_node
18.    set current_best$ to the values in next_node’s field
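Below is a hypothetical sequential C++ rendering of this pseudo-code, with each parallel $-variable as an array indexed by PE; each "for (pe ...)" loop stands in for a single broadcast step, and mindex/pick-one are simulated by a scan. It is a sketch of the algorithm’s structure, not ASC itself.

    #include <vector>
    const int INF = 1'000'000'000;
    enum class Cand { Waiting, Yes, No };

    // w[i][j]: edge weight (INF if absent); one PE per node.
    std::vector<int> asc_mst_prim(const std::vector<std::vector<int>>& w, int root) {
        int n = (int)w.size();
        std::vector<Cand> cand(n, Cand::Waiting);             // candidate$
        std::vector<int>  best(n, INF), parent(n, -1);        // current_best$, parent$
        for (int pe = 0; pe < n; ++pe)                        // steps 2-5
            if (w[root][pe] < INF) {
                cand[pe] = Cand::Yes; parent[pe] = root; best[pe] = w[root][pe];
            }
        cand[root] = Cand::No;                                // step 6
        while (true) {                                        // step 7
            int next_node = -1, x = INF;
            for (int pe = 0; pe < n; ++pe)                    // steps 8-10: MIN + pick-one
                if (cand[pe] == Cand::Yes && best[pe] < x) { x = best[pe]; next_node = pe; }
            if (next_node == -1) break;                       // no responders left
            cand[next_node] = Cand::No;                       // step 11
            for (int pe = 0; pe < n; ++pe) {
                int d = w[next_node][pe];
                if (cand[pe] == Cand::Yes && d < best[pe]) {  // steps 12-14
                    best[pe] = d; parent[pe] = next_node;
                } else if (cand[pe] == Cand::Waiting && d < INF) {  // steps 15-18
                    cand[pe] = Cand::Yes; parent[pe] = next_node; best[pe] = d;
                }
            }
        }
        return parent;   // MST edges: (pe, parent[pe]) for pe != root
    }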

Page 59: SIMD, Associative, and  Multi-Associative Computing

Comments on ASC-MST Algorithm

• The three preceding slides are Figure 6 in [Potter, Baker, et al., IEEE Computer, Nov 1994].

• Figure 6c gives a compact, data-structures level pseudo-code description for this algorithm.
– The pseudo-code illustrates Potter’s use of pronouns (e.g., them) and possessive nouns.
– The mindex function returns the index of a processor holding the minimal value.
– This MST pseudo-code is much shorter and simpler than data-structure level sequential MST pseudo-codes.
• e.g., see one of Baase’s textbooks cited in the references.
• The algorithm given in Baase’s books is essentially the same as this parallel algorithm.

• Next, a more detailed explanation of the algorithm in the preceding slide will be given.

Page 60: SIMD, Associative, and  Multi-Associative Computing

Algorithm: ASC-MST-PRIM

• Initially assign any node to root.

• All processors set
– candidate$ to “waiting”
– current_best$ to ∞
– the candidate field for the root node to “no”

• All processors whose distance d from their node to the root node is finite do
– Set their candidate$ field to “yes”
– Set their parent$ field to root.
– Set current_best$ = d.

Page 61: SIMD, Associative, and  Multi-Associative Computing

Algorithm: ASC-MST-PRIM (cont. 2/3)

• While the candidate field of some processor is “yes”,
– Restrict the active processors to those whose candidate field is “yes” and (for these processors) do
• Compute the minimum value x of current_best$.
• Restrict the active processors to those with current_best$ = x and do
– pick an active processor, say one that contains node y.
» Set the candidate$ value of node y to “no”
– Set the scalar variable next_node to y.

Page 62: SIMD, Associative, and  Multi-Associative Computing

Algorithm: ASC-MST-PRIM (cont. 3/3)

– If the value z in the next_node column of a processor is less than its current_best$ value, then
» Set current_best$ to z.
» Set parent$ to next_node.

• For all processors, if candidate$ is “waiting” and the distance of its node from next_node is not ∞, then
– Set candidate$ to “yes”
– Set current_best$ to the distance of its node from next_node.
– Set parent$ to next_node.

Page 63: SIMD, Associative, and  Multi-Associative Computing

Quickhull Algorithm for ASC

• Reference:
– [Maher, Baker, Akl, “An Associative Implementation of Classical Convex Hull Algorithms”]

• Review of Sequential Quickhull Algorithm
– It suffices to find the upper convex hull of the points that are on or above the line we.

• Select the point h so that the area of triangle weh is maximal.

• Proceed recursively with the sets of points on or above the lines wh and he.

[Figure: points w, e, and h]
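The geometric primitive behind both the “above the line” test and the choice of h is the signed triangle area, a standard cross-product formula that the slides leave implicit (shown in LaTeX):

    \mathrm{area}(w, e, p) = \tfrac{1}{2}\bigl((e_x - w_x)(p_y - w_y) - (e_y - w_y)(p_x - w_x)\bigr)

This is positive exactly when p lies above the directed line from w to e, so each PE can evaluate one expression to answer both questions.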

Page 64: SIMD, Associative, and  Multi-Associative Computing

Previous Illustration

[Figure: a set of planar points with w and e the extreme points and h the point maximizing the area of triangle weh]

Page 65: SIMD, Associative, and  Multi-Associative Computing

Example for Data Structure

[Figure: seven points p1 (= w), p2, p3 (= e), p4, p5, p6 (= h), p7]

Page 66: SIMD, Associative, and  Multi-Associative Computing

Data Structure for Preceding Example

[Flattened slide table, partially reconstructed. One row per PE with the fields: point$, x-coord$, y-coord$, left-pt$, right-pt$, area$, hull$, job$. In the snapshot shown, every PE has left-pt$ = p1 (= w) and right-pt$ = p3 (= e); hull$ is 1 for p1, p3, and p6 (= h); job$ is 1 for the points above the line from w to e and 0 otherwise. The IS holds the scalar ctr.]

Page 67: SIMD, Associative, and  Multi-Associative Computing

ASC Quickhull Algorithm (Upper Convex Hull)

ASC-Quickhull( planar-point-set )

1. Initialize: ctr = 1, area$ = 0, hull$ = 0
2. Find the PE with the minimal x-coord$ and let w be its point$
a) Set its hull$ value to 1
3. Find the PE with the maximal x-coord$ and let e be its point$
a) Set its hull$ value to 1
4. All PEs set their left-pt$ to w and right-pt$ to e.
5. If the point$ for a PE lies above the line we
a) Then set its job$ value to 1
b) Else set its job$ value to 0

Page 68: SIMD, Associative, and  Multi-Associative Computing

ASC Quickhull Algorithm (cont)

6. Loop while parallel job$ contains a nonzero value
a) The IS makes its active cells those with a maximal job$ value.
b) Each (active) PE computes and stores the area of triangle (left-pt$, right-pt$, point$) in area$
c) Find the PE with the maximal area$ and let h be its point$.
• Set its hull$ value to 1
d) Each PE whose point$ is above the line (left-pt$, h) sets its job$ value to ++ctr
e) Each PE whose point$ is above the line (h, right-pt$) sets its job$ value to ++ctr
f) Each PE whose job$ value is still ctr - 2 (i.e., was not reassigned in steps d or e) sets its job$ value to 0
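The following hypothetical C++ rendering of the loop treats each $-field as an array with one slot per PE and makes the left-pt$/right-pt$ updates of steps 6d-e explicit (the slide leaves them implicit); it mirrors the job-numbering scheme above but is a sketch of the structure, not the ASC program.

    #include <cstddef>
    #include <vector>

    // Twice the signed area of triangle (a, b, c); positive iff c lies above
    // the directed line from a to b.
    static double area2(double ax, double ay, double bx, double by,
                        double cx, double cy) {
        return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
    }

    // Upper convex hull, ASC-Quickhull style: one "PE" per point; the parallel
    // fields job$/left-pt$/right-pt$/hull$ are arrays. Returns the hull flags.
    std::vector<bool> asc_quickhull_upper(const std::vector<double>& x,
                                          const std::vector<double>& y) {
        std::size_t n = x.size();
        std::size_t w = 0, e = 0;
        for (std::size_t i = 0; i < n; ++i) {          // steps 2-3: min/max x-coord$
            if (x[i] < x[w]) w = i;
            if (x[i] > x[e]) e = i;
        }
        std::vector<bool> hull(n, false);
        hull[w] = hull[e] = true;
        std::vector<std::size_t> left(n, w), right(n, e);  // step 4
        std::vector<int> job(n, 0);
        int ctr = 1;
        for (std::size_t i = 0; i < n; ++i)            // step 5: above line we?
            job[i] = area2(x[w], y[w], x[e], y[e], x[i], y[i]) > 0 ? 1 : 0;
        while (true) {                                 // step 6
            int maxjob = 0;
            for (int j : job) if (j > maxjob) maxjob = j;
            if (maxjob == 0) break;
            std::size_t h = 0; double besta = -1;      // steps 6b-c: maximal area$
            for (std::size_t i = 0; i < n; ++i)
                if (job[i] == maxjob) {
                    double a = area2(x[left[i]], y[left[i]],
                                     x[right[i]], y[right[i]], x[i], y[i]);
                    if (a > besta) { besta = a; h = i; }
                }
            hull[h] = true;
            int jl = ++ctr, jr = ++ctr;                // steps 6d-e: two new jobs
            for (std::size_t i = 0; i < n; ++i)
                if (job[i] == maxjob && i != h) {
                    if (area2(x[left[i]], y[left[i]], x[h], y[h],
                              x[i], y[i]) > 0)      { job[i] = jl; right[i] = h; }
                    else if (area2(x[h], y[h], x[right[i]], y[right[i]],
                                   x[i], y[i]) > 0) { job[i] = jr; left[i]  = h; }
                    else job[i] = 0;                   // step 6f: interior, eliminated
                }
            job[h] = 0;                                // h is done; it is on the hull
        }
        return hull;
    }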

Page 69: SIMD, Associative, and  Multi-Associative Computing

Performance of ASC-Quickhull

[Figure: the example point set with the job$ values 0 through 6 assigned as the algorithm proceeds]

Page 70: SIMD, Associative, and  Multi-Associative Computing

Performance of ASC-Quickhull (cont)

Average Case:
• Assume
– Roughly 1/3 of the points above each line being processed are eliminated.
– O(lg n) points are on the convex hull.
• Shown to be true for randomly generated points.
• Then the average running time is O(lg n).
• The average cost is O(n lg n).

Worst Case:
• Running time is O(n).
• Cost is O(n²).
– The definition of cost is (running time) × (number of processors).
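The O(lg n) average time can be read off directly (a sketch under the stated assumptions): each pass of the loop is constant time on ASC and adds exactly one hull point in step 6c, so the number of passes is the number of hull points. In LaTeX:

    T_{\mathrm{avg}}(n) = O(1) \cdot O(\lg n) = O(\lg n), \qquad \mathrm{cost} = n \cdot O(\lg n) = O(n \lg n)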

Page 71: SIMD, Associative, and  Multi-Associative Computing

MASC Quickhull Algorithm

Algorithm:
• Use IS1 to execute the first loop of ASC-Quickhull.

• When an IS completes computing the loop in ASC-Quickhull, idle ISs request problems from busy ISs that have inactive jobs on their job$ list.

• Control of the PEs for an inactive job is transferred to the idle IS. The control of these PEs is returned to the original IS after the job is finished.

Page 72: SIMD, Associative, and  Multi-Associative Computing

ASC Quickhull Algorithm (cont)

[Figure: the example point set with the job$ values 0, 1, and 2 assigned to the subproblems]

Page 73: SIMD, Associative, and  Multi-Associative Computing

Analysis for MASC Quickhull

Average Case:
• Assumptions:
– Roughly 1/3 of the points above each line being processed are eliminated.
– O(lg n) Instruction Streams are available.
– There are O(lg n) convex hull points.
• The average running time is O(lg lg n).
• Essentially constant time for real world problems.

Worst Case:
• O(n)

Page 74: SIMD, Associative, and  Multi-Associative Computing

MASC Quickhull for a Limited Number of ISs

• A manager IS is used to control the interactions of the ISs and the task workpool (sketched below).

• The manager assigns IS1 to execute the first loop of ASC-Quickhull.

• When an IS completes the execution of a loop,
– If two jobs are created, it gives one to the manager IS to place in the workpool and then executes the remaining job.
– If only one job is created, it executes this job next.
– If no new job is created, this IS requests a new job from the manager IS.
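A hypothetical C++ sketch of this workpool protocol (the names, the Job type, and the LIFO policy are all invented for illustration; the real scheme coordinates hardware ISs, not objects):

    #include <deque>
    #include <optional>
    #include <vector>

    struct Job { int id; };   // stand-in for a (left-pt, right-pt, point set) task

    // Manager IS: a simple workpool of inactive jobs.
    class ManagerIS {
        std::deque<Job> pool;
    public:
        void deposit(Job j) { pool.push_back(j); }   // a busy IS hands off a job
        std::optional<Job> request() {               // an idle IS asks for work
            if (pool.empty()) return std::nullopt;
            Job j = pool.back(); pool.pop_back();
            return j;
        }
    };

    // Worker IS policy after one loop pass that created `made` jobs:
    // 2 jobs -> deposit one, run the other; 1 job -> run it; 0 -> request one.
    std::optional<Job> next_job(ManagerIS& mgr, std::vector<Job> made) {
        if (made.size() == 2) { mgr.deposit(made[1]); return made[0]; }
        if (made.size() == 1) return made[0];
        return mgr.request();   // may be empty: this IS goes idle
    }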

Page 75: SIMD, Associative, and  Multi-Associative Computing

Additional Comments on MASC Quickhull

• For one million points this algorithm would require only lg n = 20 ISs.
– Note that increasing the number of ISs by only 5 (to 25) would allow 33.5 million (i.e., 2^25) points to be processed.

• Even if lg n ISs are available for this algorithm, the actual number of ISs used would likely still be less than lg n.
– It would be inefficient to assume that every time a new task is created, an idle IS would be available to execute it.

• However, this algorithm should also provide a speedup, even if only a small number k of ISs is available.
– The complexity of the running time will still be O(lg n).
– The actual running time could be up to k times faster than for one IS.
• There will be some loss of efficiency due to IS interactions.
– This is probably a more practical approach.

Page 76: SIMD, Associative, and  Multi-Associative Computing

Additional Comments on ASC and MASC Algorithms

• The full “convex hull” algorithm requires that an ordered (e.g., clockwise) list of convex hull points be returned.
– The preceding algorithms for ASC and MASC can be extended to handle this.

• This detail is omitted here to keep the algorithms simpler.
– More information can be found in the paper “An Associative Implementation of Classical Convex Hull Algorithms” by Atwah, Baker, and Akl and in Maher Atwah’s master’s thesis at KSU.


Top Related