SIMD and Associative Computational Models Part II: Associative Models

Post on 22-Dec-2015


Page 1: SIMD and Associative Computational Models Part II: Associative Models

SIMD and Associative Computational Models

Part II: Associative Models

Page 2: SIMD and Associative Computational Models Part II: Associative Models

Associative Models Topics

• Introduction
  – References for Associative Computing Models
  – Motivation for the MASC model
  – The MASC and ASC Models
  – Programming characteristics for ASC Model
  – Overview of existing ASC algorithms and programs
• ASC Algorithms
  – Minimal Spanning Tree
  – Perhaps Graham Scan
  – String-Matching (Shannon Steinfadt)
• MASC simulations with other models (esp. MMB)
• Timing Justifications for Associative Operations
• A proposed IS control system for MASC (Wittaya Chantamas)

Page 3: SIMD and Associative Computational Models Part II: Associative Models

Slides that Overlap with PDC

• Some slides in the Introduction will overlap with PDC

• Some of these slides are essential for this course.

• Other slides are included to give you a broader understanding of related information

• On overlapping slides, those that are not central to the PDA course will be covered fairly quickly.

• Students who have not had the PDC course should plan to spend more time studying these slides.

Page 4: SIMD and Associative Computational Models Part II: Associative Models

Associative Computing References (listed in order of expected use)

• Jerry Potter, Johnnie Baker, et al., Associative Computing - A Programming Paradigm for Massively Parallel Computers, Plenum Publishing Company, 1992.
  – Initial Assignment – Read material related to slides

• Maher M. Atwah, Johnnie W. Baker, and Selim Akl, An Associative Implementation of Graham's Convex Hull Algorithm, IASTED International Conference on Parallel and Distributed Computing and Systems, October 1995.

• Mary Esenwein and Johnnie Baker, VLCD String Matching for Associative Computing and Multiple Broadcast Mesh, IASTED International Conference on Parallel and Distributed Computing and Systems, 1997.

• Johnnie Baker and Mingxian Jin, Simulation of Enhanced Meshes with MASC, a MSIMD Model, Proc. of the Eleventh IASTED International Conference on Parallel and Distributed Computing and Systems, Nov. 1999, 511-516.

Page 5: SIMD and Associative Computational Models Part II: Associative Models

Associative Computing References (cont.)

• Mingxian Jin & Johnnie Baker, Relating the Power of the MASC Model with that of Reconfigurable Bus-Based Models, International Parallel and Distributed Processing Symposium (APDCM Workshop), April 2007.

• Mingxian Jin, Wittaya Chantamas, Johnnie Baker, forthcoming paper on Comparing the Power of the MASC model to Reconfigurable Bus-Based Models. 2007-8.

• Mingxian Jin, Johnnie Baker, and Kenneth Batcher, Timings for Associative Operations on the MASC Model, International Parallel and Distributed Processing Symposium, IEEE Workshop on Massively Parallel Processing, San Francisco, April 2001.

• Wittaya Chantamas, Johnnie Baker, and Michael Scherger, Compiler Extension of the ASC Language to Support Multiple Instruction Streams in the MASC Model using the Manager-Worker Paradigm, International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’06), June 2006.

• Wittaya Chantamas and Johnnie Baker, A Multiple Associative Model to Support Branches in Data Parallel Applications using the Manager-Worker Paradigm, International Parallel and Distributed Processing Symposium (WMPP Workshop), April 2005.

Page 6: SIMD and Associative Computational Models Part II: Associative Models

WEBSITE FOR PAPERS

http://www.cs.kent.edu/~parallel

Follow the pointer to “papers”

Note: The preceding papers and many others on ASC and MASC are available at the above website.

Page 7: SIMD and Associative Computational Models Part II: Associative Models

Introduction to Associative Models ASC and MASC

Page 8: SIMD and Associative Computational Models Part II: Associative Models

ASC & MASC are KSU Models

• Several professors and their graduate students at Kent State University have worked on models
• The STARAN and the ASPRO fully support the ASC model in hardware. The MPP supports it partly in hardware and partly in software.
  – Prof. Batcher was chief architect or consultant
• Dr. Potter developed a language for ASC
• Dr. Baker works on algorithms for models and architectures to support models
• Dr. Walker is working with the hardware design of the machine.
• Dr. Batcher and Dr. Potter are currently advisors

Page 9: SIMD and Associative Computational Models Part II: Associative Models

Motivation

• The STARAN Computer (Goodyear Aerospace, early 1970’s) and later the ASPRO provided an architectural model for associative computing embodied in the ASC model.

• ASC extends the data parallel programming style to a complete computational model.

• ASC provides a practical model that easily supports massive parallelism.

• MASC provides a hybrid data-parallel, control parallel model that supports associative programming.

• Descriptions of these models allow them to be compared to other parallel models

Page 10: SIMD and Associative Computational Models Part II: Associative Models

Parallel and Associative Computing Lab

[Diagram: the Parallel and Associative Research Group and its areas: Associative Models of Computation; Parallel Runtime Environments; Parallel and Associative System Software; Parallel and Associative Applications; Associative and Parallel Algorithms]

Page 11: SIMD and Associative Computational Models Part II: Associative Models

[Diagram: two research groups.
Parallel and Associative Research Group: Associative Models of Computation; Parallel Runtime Environments; Parallel and Associative System Software; Parallel and Associative Applications; Associative and Parallel Algorithms.
ASC Processor Research Group: FPGA-Based ASC Processor; MASC Processor; Structure Codes, ASC-centric Implementations; Pipelined ASC w/ Reconfigurable Network; Multithreaded ASC Processor.]

Page 12: SIMD and Associative Computational Models Part II: Associative Models

The ASC (Associative Computing) Model

Computers supporting the ASC model in hardware are Goodyear Aerospace’s STARAN and the USN ASPRO.

[Diagram: an Instruction Stream (IS) attached to an array of cells; each cell is a PE with its own memory, and the cells are joined by a cell network]

Page 13: SIMD and Associative Computational Models Part II: Associative Models

The ASC Model of Computation

• The ASC name is derived from ASsociative Computing.

• ASC Model: A SIMD model with certain additional constant time features.
  – The constant time features are identified on the next slide
  – These constant time features can be supported (less efficiently) in software by a traditional SIMD
  – The name “associative” is due to its ability to locate items in the memory of PEs by content rather than location.
    • Uses associative features to simulate an associative memory
    • Does not have an associative memory

Page 14: SIMD and Associative Computational Models Part II: Associative Models

The ASC Constant Time Properties

• Broadcast data in constant time
• Constant time global reduction of
  – Boolean values using AND/OR
  – Integer values using MAX/MIN
• Constant time associative search (see next slide)
• Responder processing
  – An IS can detect if a data test is satisfied by any of its cells in constant time (i.e., any-responders)
  – An IS can select one arbitrary responder in constant time (i.e., pick-one)
• Above properties can be supported in hardware with broadcast and reduction networks (see [2] below).
• References:
  1. Potter, Baker, Scott, Bansal, Leangsuksun, Asthagiri, ASC: An Associative Computing Paradigm. IEEE Computer, Nov. 1994, 19-25.
  2. M. Jin, J. Baker, and K. Batcher, Timings of Associative Operations on the MASC model, Workshop of Massively Parallel Processing, IPDPS ’01.
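The operations listed on this slide can be mimicked in ordinary sequential Python to make their meaning concrete. This is only a sketch: the function names are illustrative assumptions, each "cell" is one list position, and the loops here take linear time, whereas the ASC model assumes hardware support that makes each operation constant time.

```python
from typing import Optional, Sequence

def broadcast(value, n_cells: int) -> list:
    """IS broadcast: every cell receives the same scalar value."""
    return [value] * n_cells

def any_responders(mask: Sequence[bool]) -> bool:
    """Global OR over the responder mask."""
    return any(mask)

def pick_one(mask: Sequence[bool]) -> Optional[int]:
    """Select one responder (here simply the first); None if there is none."""
    for cell, bit in enumerate(mask):
        if bit:
            return cell
    return None

def global_min(values: Sequence[int], mask: Sequence[bool]) -> Optional[int]:
    """Global MIN reduction over the active cells only."""
    active = [v for v, m in zip(values, mask) if m]
    return min(active) if active else None

def global_max(values: Sequence[int], mask: Sequence[bool]) -> Optional[int]:
    """Global MAX reduction over the active cells only."""
    active = [v for v, m in zip(values, mask) if m]
    return max(active) if active else None
```

The mask argument plays the role of the responder bits: reductions only see cells whose bit is set, which is how a search result restricts later operations.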

Page 15: SIMD and Associative Computational Models Part II: Associative Models

The Associative Search

[Figure: the IS broadcasts to PE1 through PE7; each PE holds one row of the table]

       Busy-idle  Make    Color  Year  Model  Price  On lot
PE1    1          Dodge   red    1994                1
PE2    0                                             1
PE3    1          Ford    blue   1996                0
PE4    1          Ford    white  1998                0
PE5    0                                             0
PE6    0                                             0
PE7    1          Subaru  red    1997                1

IS asks for all cars that are red and on the lot.

PE1 and PE7 respond by setting a mask bit in their PE.
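The search on this slide can be sketched in Python as follows. The assignment of the four car records to PE1, PE3, PE4, and PE7 is an assumption read off the busy-idle column of the slide; the variable and function names are illustrative and are not ASC syntax. On an ASC machine every PE evaluates the test at the same time; here a list comprehension stands in for that step.

```python
# One list per field; position i holds the value stored in PE(i+1).
# None marks a PE that holds no car record (assumed from the busy-idle column).
color = ["red", None, "blue", "white", None, None, "red"]
on_lot = [1, 1, 0, 0, 0, 0, 1]
busy = [1, 0, 1, 1, 0, 0, 1]

def associative_search(wanted_color: str) -> list:
    """Return the responder mask: busy PEs whose car matches and is on the lot."""
    return [
        bool(b) and c == wanted_color and bool(o)
        for b, c, o in zip(busy, color, on_lot)
    ]

mask = associative_search("red")
responders = [f"PE{i + 1}" for i, bit in enumerate(mask) if bit]
print(responders)  # ['PE1', 'PE7']
```

No PE is addressed by location: the IS states the content it wants and the cells identify themselves.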

Page 16: SIMD and Associative Computational Models Part II: Associative Models

Detailed List of ASC Properties

• Instruction Stream
  – The IS has a copy of the program and can broadcast instructions to cells in unit time
• Cell Properties
  – Each cell consists of a PE and its local memory
  – All cells listen to the IS
  – A cell can be active, inactive, or idle
    • Inactive cells listen but do not execute IS commands until reactivated
    • Idle cells contain no essential data and are available for reassignment
    • Active cells execute IS commands synchronously

Page 17: SIMD and Associative Computational Models Part II: Associative Models

Detailed List of ASC Properties (cont.)

• Responder Processing
  – The IS can detect if a data test is satisfied by any of its responder cells in constant time (i.e., any-responders?).
  – The IS can select an arbitrary responder in constant time (i.e., pick-one).

Page 18: SIMD and Associative Computational Models Part II: Associative Models

Detailed List of ASC Properties (cont.)

• Constant Time Global Operations (across PEs)
  – Logical OR and AND of binary values
  – Maximum and minimum of numbers
  – Associative searches
• Communications
  – There are at least two real or virtual networks
    • PE communications (or cell) network
    • IS broadcast/reduction network (which could be implemented as two separate networks)

Page 19: SIMD and Associative Computational Models Part II: Associative Models

Detailed List of ASC Properties (cont.)

  – The PE communications network is normally supported by an interconnection network
    • E.g., a 2D mesh
  – The broadcast/reduction network(s) are normally supported by a broadcast and a reduction network (sometimes combined).
    • See posted paper by Jin, Baker, & Batcher (listed in associative references)
• Control Features
  – PEs and the IS and the networks all operate synchronously, using the same clock

Page 20: SIMD and Associative Computational Models Part II: Associative Models

The Associative Computing (ASC) Model (An Alternate Diagram)

[Diagram: an Instruction Stream (Control Unit) connected through a Broadcast / Reduction Network to a row of cells; each cell is a PE with its Memory, and the cells are joined by a Cell Network]

Page 21: SIMD and Associative Computational Models Part II: Associative Models

The MASC Model

[Diagram: several Instruction Streams joined by an Instruction Stream Network and connected through a Broadcast / Reduction Network to a row of cells; each cell is a PE with its Memory, and the cells are joined by a Cell Network]

Page 22: SIMD and Associative Computational Models Part II: Associative Models

The MASC Model

MASC (i.e., Multiple ASC) extends the ASC model with multiple instruction streams.

• A Multiple SIMD (or MSIMD) model with more than one Instruction Stream (IS)
• Each IS can execute a separate data-parallel task
  – These threads execute to completion without interaction or interruption
• Dynamically reconfigurable
  – Each cell listens to only one IS
  – Cells can switch ISs, based on a data test.
  – Cells can switch between being active, inactive, or idle
• Each IS with its cells is an ASC model
• Job/functional parallelism is used to control the ISs using the IS network.

Page 23: SIMD and Associative Computational Models Part II: Associative Models

The MASC Model

• Basic Components
  – An array of cells, each consisting of a simple PE (or ALU) and its local memory
  – An interconnection network between the cells
  – One or more instruction streams (ISs)
  – A control unit for the ISs
• MASC is an MSIMD model that
  – supports data parallel threads that execute to completion
  – uses job parallelism to control these threads
  – uses associative programming techniques

Page 24: SIMD and Associative Computational Models Part II: Associative Models

MASC Basic Properties

• Each cell can listen to only one IS
• Cells can switch ISs in unit time, based on the results of a data test.
• Each IS and the cells listening to it follow rules of the ASC model.
• Control Features:
  – The PEs, ISs, and networks all operate synchronously, using the same clock
  – Restricted job control parallelism is used to coordinate the interaction of the multiple ISs.
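A small Python sketch can illustrate the first two properties: each cell listens to exactly one IS, and the choice of IS comes from a local data test. Everything here is an illustrative assumption (two instruction streams, a threshold test, one operation per stream); on MASC the streams would run concurrently, while this sketch visits them one after the other.

```python
from typing import Callable, List

def assign_streams(values: List[int], threshold: int) -> List[int]:
    """Data test: each cell picks IS 0 if its value is below the threshold, else IS 1."""
    return [0 if v < threshold else 1 for v in values]

def run_streams(values: List[int],
                listening_to: List[int],
                programs: List[Callable[[int], int]]) -> List[int]:
    """Each IS applies its own operation to exactly the cells listening to it."""
    result = list(values)
    for is_id, operation in enumerate(programs):
        for cell, value in enumerate(values):
            if listening_to[cell] == is_id:
                result[cell] = operation(value)
    return result

values = [1, 5, 2, 8]
listening_to = assign_streams(values, threshold=4)
updated = run_streams(values, listening_to, [lambda v: v * 10, lambda v: v - 1])
print(listening_to, updated)  # [0, 1, 0, 1] [10, 4, 20, 7]
```

This is how MASC avoids serializing both branches of a data-dependent conditional: each branch is handed to its own IS.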

Page 25: SIMD and Associative Computational Models Part II: Associative Models

Characteristics of Associative Programming

• Consistent use of a style of programming called data parallel programming
• Consistent use of global associative searching and responder processing
• Usually, frequent use of the constant time global reduction operations: AND, OR, MAX, MIN
• Broadcast of data using IS bus allows the use of the PE network to be restricted to parallel data movement.

Page 26: SIMD and Associative Computational Models Part II: Associative Models

Characteristics of Associative Programming

• Tabular representation of data – think 2D arrays
• Use of searching instead of sorting
• Use of searching instead of pointers
• Use of searching instead of the ordering provided by linked lists, stacks, queues
• Promotes a highly intuitive programming style that supports high productivity
• Uses structure codes (i.e., numeric representation) to represent data structures such as trees, graphs, embedded lists, and matrices.
• We’ll see examples of the above.
  – Ref: Nov. 1994 IEEE Computer article.
  – Also, see “Associative Computing” book by Potter.
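"Searching instead of sorting" can be illustrated with a short Python sketch: records are visited in ascending key order by repeating a global MIN, an associative search for that value, and a pick-one, without ever rearranging the data. The function name is an assumption, and the sequential loops stand in for operations the ASC model performs in constant time.

```python
from typing import List

def visit_in_order(keys: List[int]) -> List[int]:
    """Return cell indices in ascending key order without sorting the data."""
    mask = [True] * len(keys)            # cells not yet visited
    order = []
    while any(mask):                     # any-responders
        smallest = min(k for k, m in zip(keys, mask) if m)       # global MIN
        responders = [i for i, (k, m) in enumerate(zip(keys, mask))
                      if m and k == smallest]                     # associative search
        chosen = responders[0]           # pick-one
        order.append(chosen)
        mask[chosen] = False             # remove the visited cell from the search
    return order

print(visit_in_order([1997, 1994, 1998, 1996]))  # [1, 3, 0, 2]
```

Each pass costs a constant number of associative operations on ASC, so visiting n records in order takes O(n) steps and the table itself stays in place.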

Page 27: SIMD and Associative Computational Models Part II: Associative Models

Languages Designed for the ASC

• Professor Potter has created several languages for the ASC model.

• ASC is a C-like language designed for the ASC model
• ACE is a higher level language that uses natural language syntax; e.g., plurals, pronouns.
• Anglish is an ACE variant that uses an English-like grammar (e.g., “their”, “its”)
• An OOPs version of ASC for the MASC was discussed (by Potter and his students), but never designed.
• Language References:
  – ASC Primer – Copy available on parallel lab website www.cs.kent.edu/~parallel/
  – “Associative Computing” book by Potter [11] – some features in this book were never fully implemented in the ASC Compiler

Page 28: SIMD and Associative Computational Models Part II: Associative Models

Typical Data Structure for ASC Model

[Figure: the IS broadcasts to PE1 through PE7; each PE holds one row of the table]

       Busy-idle  Make    Color  Year  Model  Price  On lot
PE1    1          Dodge   red    1994                1
PE2    0                                             1
PE3    1          Ford    blue   1996                0
PE4    1          Ford    white  1998                0
PE5    0                                             0
PE6    0                                             0
PE7    1          Subaru  red    1997                1

Make, Color, etc. are fields the programmer establishes.

Various data types are supported. Some examples will show string data, but they are not supported in the ASC simulator.

Page 29: SIMD and Associative Computational Models Part II: Associative Models

ASC Algorithms and Programs

• A wide range of algorithms implemented in ASC without the use of the PE network:
  – Graph Algorithms
    • minimal spanning tree
    • shortest path
    • connected components
  – Computational Geometry Algorithms
    • convex hull algorithms (Jarvis March, Quickhull, Graham Scan, etc.)
    • dynamic hull algorithms

Page 30: SIMD and Associative Computational Models Part II: Associative Models

ASC Algorithms and Programs (not requiring PE network)

  – String Matching Algorithms
    • all exact substring matches
    • all exact matches with “don’t care” (i.e., wild card) characters
  – Algorithms for NP-complete problems
    • traveling salesperson
    • 2-D knapsack
  – Data Base Management Software
    • associative data base
    • relational data base

Page 31: SIMD and Associative Computational Models Part II: Associative Models

ASC Algorithms and Programs (not requiring a PE network)

  – A Two Pass Compiler for ASC – not the one we will be using. This compiler uses ASC parallelism.
    • first pass
    • optimization phase
  – Two Rule-Based Inference Engines for AI
    • An Expert System OPS-5 interpreter
    • PPL (Parallel Production Language) interpreter
  – A Context Sensitive Language Interpreter
    • (OPS-5 variables force context sensitivity)
  – An associative PROLOG interpreter

Page 32: SIMD and Associative Computational Models Part II: Associative Models

Associative Algorithms & Programs (using a network)

• There are numerous associative programs that use a PE network:
  – 2-D Knapsack ASC Algorithm using a 1-D mesh
  – Image processing algorithms using 1-D mesh
  – FFT (Fast Fourier Transform) using 1-D nearest neighbor & Flip networks
  – Matrix Multiplication using 1-D mesh
  – An Air Traffic Control Program (using Flip network connecting PEs to memory)
    • Demonstrated using live data at Knoxville in mid 70’s.
• All but first were developed in assembler at Goodyear Aerospace

Page 33: SIMD and Associative Computational Models Part II: Associative Models

ASC Algorithms

Minimal Spanning Tree

Graham Scan (postponed)

String Matching

Page 34: SIMD and Associative Computational Models Part II: Associative Models

Example 1 - MST

• A graph has nodes labeled by some identifying letter or number and arcs which are directional and have weights associated with them.

• Such a graph could represent a map where the nodes are cities and the arc weights give the mileage between two cities.

[Figure: example weighted graph on nodes A, B, C, D, E]

Page 35: SIMD and Associative Computational Models Part II: Associative Models

The MST Problem

• The MST problem assumes the weights are positive and the graph is connected. Its goal is to find the minimal spanning tree,
  – i.e. a subgraph that is a tree¹ that includes all nodes (i.e. it spans), and
  – where the sum of the weights on the arcs of the subgraph is the smallest possible weight (i.e. it is minimal).

• Why would an algorithm solving this problem be useful?

• Note: The solution may not be unique.

¹ A tree is a set of points called vertices, together with pairs of distinct vertices called edges, such that (1) there is a sequence of edges called a path from any vertex to any other, and (2) there are no circuits, that is, no paths starting from a vertex and returning to the same vertex.

Page 36: SIMD and Associative Computational Models Part II: Associative Models

An Example

[Figure: weighted graph on nodes A, B, C, D, E, F, G, H, I]

As we will see, the algorithm is simple.

The ASC program is quite easy to write.

A SISD solution is a bit messy because of the data structures needed to hold the data for the problem

Page 37: SIMD and Associative Computational Models Part II: Associative Models

An Example – Step 0

[Figure: the example graph on nodes A through I]

We will maintain three sets of nodes whose membership will change during the run.

The first, V1, will be nodes selected to be in the tree.

The second, V2, will be candidates at the current step to be added to V1.

The third, V3, will be nodes not considered yet.

Page 38: SIMD and Associative Computational Models Part II: Associative Models

An Example – Step 0

[Figure: the example graph on nodes A through I]

V1 nodes will be in red with their selected edges being in red also.

V2 nodes will be in blue with their candidate edges in blue also.

V3 nodes and edges will remain black

Page 39: SIMD and Associative Computational Models Part II: Associative Models

An Example – Step 1

[Figure: the example graph on nodes A through I]

Select an arbitrary node to place in V1, say A.

Put all nodes adjacent to A into V2.

Page 40: SIMD and Associative Computational Models Part II: Associative Models

An Example – Step 2

[Figure: the example graph on nodes A through I]

Choose the edge with the smallest weight and put its node, B, into V1. Mark that edge with red also.

Retain the other edge-node combinations in the “to be considered” list.

Page 41: SIMD and Associative Computational Models Part II: Associative Models

An Example – Step 3

[Figure: the example graph on nodes A through I]

Add all the nodes adjacent to B to the “to be considered” list.

However, note that AG has weight 3 and BG has weight 6. So, there is no sense in including BG in the list.

Page 42: SIMD and Associative Computational Models Part II: Associative Models

An Example – Step 4

[Figure: the example graph on nodes A through I]

Select the blue node whose candidate edge has the smallest weight and add it to V1.

Note the nodes and edges in red are forming a subgraph which is a tree.

Page 43: SIMD and Associative Computational Models Part II: Associative Models

An Example – Step 5

[Figure: the example graph on nodes A through I]

Update the candidate nodes and edges by including all that are incident to those that are in V1 and colored red.

Page 44: SIMD and Associative Computational Models Part II: Associative Models

An Example – Step 6

[Figure: the example graph on nodes A through I]

Select I as its edge is minimal. Mark node and edge as red.

Page 45: SIMD and Associative Computational Models Part II: Associative Models

An Example – Step 7

[Figure: the example graph on nodes A through I]

Add the new candidate edges.

Note that IF has weight 5 while AF has weight 7. Thus, we drop AF from consideration at this time.

Page 46: SIMD and Associative Computational Models Part II: Associative Models

An Example – after several more passes we have …

[Figure: the example graph on nodes A through I]

Note that when CH is added, GH is dropped as CH has less weight.

Also, BC is dropped for the same type of reasoning (i.e., it would form a back edge between two nodes already in the MST).

When there are no more nodes to be considered, i.e. no more in V3, we obtain the final solution.

Page 47: SIMD and Associative Computational Models Part II: Associative Models

An Example – the final solution

[Figure: the example graph on nodes A through I, with the final tree marked]

The subgraph is clearly a tree – no cycles and connected.

The tree spans – i.e. all nodes are included.

While not obvious, it can be shown that this algorithm always produces a minimal spanning tree.

The algorithm is known as Prim’s Algorithm for MST.
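The walk-through above can be sketched as a sequential Python version of Prim's algorithm that tracks the same three sets. The graph below is a small hypothetical example, not the nine-node graph from the slides (whose edge list is not recoverable from this transcript), and the function and variable names are assumptions.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]

def prim(graph: Graph, root: str) -> List[Tuple[str, str, int]]:
    """Return the tree edges (parent, node, weight) chosen by Prim's algorithm."""
    v1 = {root}                                   # V1: nodes already in the tree
    # V2: candidate node -> (best edge weight so far, tree node it connects to)
    v2 = {nbr: (w, root) for nbr, w in graph[root].items()}
    # V3 is implicit: every node that is in neither v1 nor v2.
    tree = []
    while v2:
        node = min(v2, key=lambda v: v2[v][0])    # cheapest candidate edge
        weight, parent = v2.pop(node)
        v1.add(node)
        tree.append((parent, node, weight))
        for nbr, w in graph[node].items():
            if nbr in v1:
                continue                          # would close a cycle
            if nbr not in v2 or w < v2[nbr][0]:
                v2[nbr] = (w, node)               # new or cheaper candidate edge
    return tree

# Hypothetical undirected graph, stored symmetrically.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 6},
    "C": {"A": 4, "B": 2, "D": 3},
    "D": {"B": 6, "C": 3},
}
print(prim(graph, "A"))  # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 3)]
```

Replacing a candidate's edge when a cheaper one appears is the same step as dropping BG in favor of AG in the walk-through.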

Page 48: SIMD and Associative Computational Models Part II: Associative Models

ASC-MST Algorithm Preliminaries

• Next, a “data structure” level presentation of Prim’s algorithm for the MST is given.

• The data structure used is illustrated in the next two slides.
  – This example is from the Nov. 1994 IEEE Computer paper cited in the references.

• There are two types of variables for the ASC model, namely
  – the parallel variables (i.e., ones for the PEs)
  – the scalar variables (i.e., the ones used by the IS).
  – Scalar variables are essentially global variables.
    • Each scalar variable can be replaced by a parallel variable with the scalar value stored in each entry.

Page 49: SIMD and Associative Computational Models Part II: Associative Models

ASC-MST Algorithm Preliminaries (cont.)

• In order to distinguish between them here, the names of parallel variables end with a “$” symbol.
• Each step in this algorithm takes constant time.
• One MST edge is selected during each pass through the loop in this algorithm.
• Since a spanning tree has n-1 edges, the running time of this algorithm is O(n) and its cost is O(n^2).
  – Definition of cost is (running time) × (number of processors)
• Since the sequential running time of the Prim MST algorithm is O(n^2) and is time optimal, this parallel implementation is cost optimal.
  – Cost & optimality will be covered in the parallel algorithm performance evaluation chapter (See Ch 7 of Quinn)
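The cost argument can be written out as a short derivation. It assumes one PE per graph node, so the processor count is p = n; the slide implies this by pairing an O(n) running time with an O(n^2) cost. The symbols T_p and T_s for the parallel and sequential running times are notation introduced here.

```latex
\begin{align*}
T_p(n) &= (n-1)\cdot O(1) = O(n)
  && \text{one MST edge per constant-time loop pass}\\
\text{cost}(n) &= T_p(n)\times p = O(n)\times n = O(n^2)
  && \text{with } p = n \text{ processors}\\
T_s(n) &= O(n^2)
  && \text{sequential Prim on an } n\text{-node graph}\\
\text{cost}(n) &= O\bigl(T_s(n)\bigr)
  && \Rightarrow \text{the parallel algorithm is cost optimal}
\end{align*}
```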

Page 50: SIMD and Associative Computational Models Part II: Associative Models

Graph used for Data Structure

Figure 6 in [Potter, Baker, et al.]

[Figure: weighted graph on nodes a, b, c, d, e, f]

Page 51: SIMD and Associative Computational Models Part II: Associative Models

Data Structure for MST Algorithm

PEs (parallel variables):

mask$  node$  a$  b$  c$  d$  e$  f$  candidate$  parent$  current_best$
       a      ∞   2   8   ∞   ∞   ∞   no
       b      2   ∞   7   4   3   ∞   no          a        2
       c      8   7   ∞   ∞   6   9   yes         b        7
       d      ∞   4   ∞   ∞   3   ∞   yes         b        4
       e      ∞   3   6   3   ∞   ∞   yes         b        3
       f      ∞   ∞   9   ∞   ∞   ∞   wait

IS (scalar variables): root = a, next-node = b

Page 52: SIMD and Associative Computational Models Part II: Associative Models

Algorithm: ASC-MST-PRIM(root)

1.  Initialize candidates to “waiting”
2.  If there are any finite values in root’s field,
3.     set candidate$ to “yes”
4.     set parent$ to root
5.     set current_best$ to the values in root’s field
6.     set root’s candidate field to “no”
7.  Loop while some candidate$ contain “yes”
8.     for them
9.        restrict mask$ to mindex(current_best$)
10.       set next_node to a node identified in the preceding step
11.       set its candidate to “no”
12.       if the values in their next_node’s field are less than current_best$, then
13.          set current_best$ to value in next_node’s field
14.          set parent$ to next_node
15.    if candidate$ is “waiting” and the value in its next_node’s field is finite
16.       set candidate$ to “yes”
17.       set parent$ to next_node
18.       set current_best$ to the values in next_node’s field

Page 53: SIMD and Associative Computational Models Part II: Associative Models

Comments on ASC-MST Algorithm

• The three preceding slides are Figure 6 in [Potter, Baker, et al., IEEE Computer, Nov. 1994].
• Figure 6c gives a compact, data-structures level pseudo-code description for this algorithm
  – Pseudo-code illustrates Potter’s use of pronouns (e.g., them) and possessive nouns.
  – The mindex function returns the index of a processor holding the minimal value.
  – This MST pseudo-code is much shorter and simpler than data-structure level sequential MST pseudo-codes
    • e.g., see one of Baase’s textbooks cited in references
    • The algorithm given in Baase’s books is essentially the same as this parallel algorithm
• Next, a more detailed explanation of the algorithm in the preceding slide will be given.

Page 54: SIMD and Associative Computational Models Part II: Associative Models

Algorithm: ASC-MST-PRIM

• Initially assign any node to root.
• All processors set
  – candidate$ to “waiting”
  – current_best$ to ∞
  – the candidate field for the root node to “no”
• All processors whose distance d from their node to root node is finite do
  – Set their candidate$ field to “yes”
  – Set their parent$ field to root.
  – Set current_best$ = d.

Page 55: SIMD and Associative Computational Models Part II: Associative Models

Algorithm: ASC-MST-PRIM (cont. 2/3)

• While the candidate field of some processor is “yes”,
  – Restrict the active processors to those whose candidate field is “yes” and (for these processors) do
    • Compute the minimum value x of current_best$.
    • Restrict the active processors to those with current_best$ = x and do
      – Pick an active processor, say one that contains node y.
        » Set the candidate$ value of node y to “no”
      – Set the scalar variable next_node to y.

Page 56: SIMD and Associative Computational Models Part II: Associative Models

Algorithm: ASC-MST-PRIM (cont. 3/3)

• If the value z in the next_node column of a processor is less than its current_best$ value, then
  – Set current_best$ to z.
  – Set parent$ to next_node
– For all processors, if candidate$ is “waiting” and the distance of its node from next_node is not ∞, then
  • Set candidate$ to “yes”
  • Set current_best$ to the distance of its node from next_node.
  • Set parent$ to next_node
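The three slides of ASC-MST-PRIM can be traced with a sequential Python simulation in which each parallel variable is a list with one entry per PE. The weight matrix is read off the slide's data-structure table as interpreted here, and the function name is an assumption. The inner `for pe` loops stand for single data-parallel steps on ASC. The update of current_best$ is restricted to cells whose candidate$ is "yes", which is how the pronoun "their" in the pseudo-code is read in this sketch.

```python
import math

INF = math.inf
nodes = ["a", "b", "c", "d", "e", "f"]
# weight[i][j]: weight of the edge between nodes[i] and nodes[j] (INF if none)
weight = [
    [INF, 2, 8, INF, INF, INF],
    [2, INF, 7, 4, 3, INF],
    [8, 7, INF, INF, 6, 9],
    [INF, 4, INF, INF, 3, INF],
    [INF, 3, 6, 3, INF, INF],
    [INF, INF, 9, INF, INF, INF],
]

def asc_mst_prim(weight, root):
    n = len(weight)
    candidate = ["waiting"] * n          # candidate$
    parent = [None] * n                  # parent$
    current_best = [INF] * n             # current_best$
    candidate[root] = "no"
    for pe in range(n):                  # one data-parallel step
        if weight[pe][root] < INF:
            candidate[pe] = "yes"
            parent[pe] = root
            current_best[pe] = weight[pe][root]
    while any(c == "yes" for c in candidate):            # any-responders
        x = min(current_best[pe] for pe in range(n)
                if candidate[pe] == "yes")               # global MIN
        next_node = next(pe for pe in range(n)
                         if candidate[pe] == "yes" and current_best[pe] == x)  # pick-one
        candidate[next_node] = "no"
        for pe in range(n):              # one data-parallel step
            z = weight[pe][next_node]
            if candidate[pe] == "yes" and z < current_best[pe]:
                current_best[pe] = z
                parent[pe] = next_node
            elif candidate[pe] == "waiting" and z < INF:
                candidate[pe] = "yes"
                current_best[pe] = z
                parent[pe] = next_node
    return parent, current_best

parent, best = asc_mst_prim(weight, root=0)
edges = [(nodes[p], nodes[i], best[i]) for i, p in enumerate(parent) if p is not None]
print(edges)
```

With root a, the state after the first pass of the loop (next_node = b) matches the data-structure slide: c, d, and e are candidates with parent b and best values 7, 4, and 3, while f is still waiting.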

Page 57: SIMD and Associative Computational Models Part II: Associative Models

Associative Graham Scan Material

Postponed this year. See “Graham Scan” Reference for

further information

Page 58: SIMD and Associative Computational Models Part II: Associative Models

Associative String Matching

Shannon Steinfadt

Page 59: SIMD and Associative Computational Models Part II: Associative Models

Shannon’s Slides

• Shannon’s slides belong here.

• Currently, they will be posted separately on our webpage.

Page 60: SIMD and Associative Computational Models Part II: Associative Models

For Additional ASC and MASC Algorithms

• See Slides for PDC’06 course or other ASC/MASC papers at

www.cs.kent.edu/~parallel/

• Also, MMB Simulations provide many algorithms for MASC

Page 61: SIMD and Associative Computational Models Part II: Associative Models

Simulations between MASC and other models

(Primarily MMB Model)

Page 62: SIMD and Associative Computational Models Part II: Associative Models

Simulation References

• Johnnie Baker and Mingxian Jin, Simulation of Enhanced Meshes with MASC, a MSIMD Model, Proc. of the Eleventh IASTED International Conference on Parallel and Distributed Computing and Systems, Nov. 1999, 511-516.
  – Reading Assignment & Primary Reference: Read material that is related to slides.

• Mingxian Jin, Wittaya Chantamas, Johnnie Baker, forthcoming paper on comparing the power of the MASC model to reconfigurable bus-based models. 2007-8. Use following reference until this becomes available.
  – Mingxian Jin & Johnnie Baker, Relating the Power of the MASC Model with that of Reconfigurable Bus-Based Models, International Parallel and Distributed Processing Symposium (APDCM Workshop), April 2007.

Page 63: SIMD and Associative Computational Models Part II: Associative Models

Previous MASC Simulation

• MASC Simulation of PRAM
  – MASC(n,j) can simulate priority CRCW PRAM(n,m) in O(min{n/j, m/j}) time with high probability.
  – MASC(n,1) [or ASC] can simulate priority CRCW with a constant number of global memory locations in constant time (average case)
    • This result is stronger than it first appears
    • Many CRCW algorithms only require a constant number of global memory locations
  – Recent related results of Mingxian Jin mentioned later

Page 64: SIMD and Associative Computational Models Part II: Associative Models

Previous MASC Simulation (cont)

• Self-simulation of MASC
  – Provides an efficient algorithm for MASC to simulate a larger MASC with more PEs and/or ISs.
  – Establishes that MASC is highly scalable
  – MASC(n,j) can simulate MASC(N,J) in O(N/n + J) extra time and O(N/n + J) extra memory.
• Best reference on above is Darrell Ulm’s Dissertation

Page 65: SIMD and Associative Computational Models Part II: Associative Models

The Enhanced Mesh, MMB

• References:
  – [Baker & Jin], Reference listed on Slide 19
  – Mingxian Jin, Evaluating the power of the parallel MASC model using simulations and Real-Time Applications, KSU Dissertation Aug. 2004, 145 pages.
• Enhanced meshes are basic mesh models augmented with fixed or reconfigurable buses
  – At most one PE on a bus can broadcast to remaining PEs during one step.

Page 66: SIMD and Associative Computational Models Part II: Associative Models

The Enhanced Mesh, MMB

• Best-known fixed bus example:
  – Mesh with multiple broadcasting (MMB)
  – Standard 2-D mesh
  – Row and column bus enhancements
  – Broadcasts can occur along only row or column buses (but not both) in one step

Page 67: SIMD and Associative Computational Models Part II: Associative Models

The Reconfigurable Enhanced Mesh RM

• For all reconfigurable bus models, buses are created dynamically during execution

• Best known example:
  – General Reconfigurable Mesh (RM)
  – Each PE has four ports called N, S, E, W (often called “NEWS”)
  – In one step, each PE can set the connections of its ports, based on local data
  – At most two disjoint pairs of ports can be connected at any time
  – One such connection is the adjacent pairs, {{N,E}, {W,S}}.

Page 68: SIMD and Associative Computational Models Part II: Associative Models

Reconfigurable Mesh Architecture

[Figure: a reconfigurable mesh PE with its four ports N, W, E, S]

Page 69: SIMD and Associative Computational Models Part II: Associative Models

Simulation Preliminaries

• Reasons to simulate other models using MASC
  – Allows a better understanding of the power of MASC
  – Provides a simulation algorithm that can be used to convert algorithms designed for the other model to MASC
  – Provides alternate methods to support MASC.

• Basic Assumption Used in the Simulations
  – MASC(n, √n) has a √n × √n mesh PE network with row-major ordering
  – The enhanced meshes have a 2D mesh with the same size and ordering

Page 70: SIMD and Associative Computational Models Part II: Associative Models

Simulation Preliminaries (cont.)

• Basic Assumptions Used in the Simulations (cont.)
– Each PE in MASC has the same computational power as an enhanced mesh PE
– The MASC buses and the buses of the enhanced mesh have the same characteristics
– The word lengths of both models are the same and at least lg(n).
– Each PE in MASC knows its position in the 2D mesh.

• Words can store the positions of various PEs

Page 71: SIMD and Associative Computational Models Part II: Associative Models

Simulating MMB using MASC

• The mapping is between MASC(n, √n) and enhanced meshes of size √n × √n

• The mapping assigns a PE in one model to the PE that is in the same position in the 2D mesh in the other model

• The i-th IS in MASC simulates both the i-th row and the i-th column buses

Page 72: SIMD and Associative Computational Models Part II: Associative Models

Simulating MMB using MASC

[Figure: MASC architecture with instruction streams IS1, IS2, ..., ISj joined by an IS network and connected to the cells, which are joined by a cell network]

Page 73: SIMD and Associative Computational Models Part II: Associative Models

Simulating MMB with MASC

• Since both models have identical 2D meshes, these do not need to be simulated
• Since the power of the PEs in the two models is identical, their local computations do not need to be simulated
• To simulate a MMB row broadcast on the MASC,
– All PEs switch to their assigned row IS
– The IS for each row checks to see if there is a PE that wishes to broadcast
– If so, the IS broadcasts this value to all of its PEs (i.e., the ones on its assigned row).
• Simulation of a MMB column broadcast is similar
• The running time is O(1)
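The row-broadcast steps above can be sketched as follows. This is only an illustration: the grid representation, the use of `None` for "no value to send", and the name `simulate_row_broadcast` are assumptions. The Python loop visits the rows one after another, whereas in the model every row IS acts in the same step.

```python
from typing import Optional


def simulate_row_broadcast(
    wants_to_send: list[list[Optional[int]]],
) -> list[list[Optional[int]]]:
    """Simulate one MMB row-broadcast step with one IS per mesh row.

    wants_to_send[r][c] holds the value PE (r, c) wishes to broadcast on its
    row bus, or None. Returns the value each PE receives (None if its row
    had no sender).
    """
    received = []
    for row in wants_to_send:  # each row is handled by its own IS
        senders = [value for value in row if value is not None]
        if len(senders) > 1:
            # The MMB allows at most one broadcaster per bus per step.
            raise ValueError("more than one PE tried to use the same row bus")
        value = senders[0] if senders else None
        received.append([value] * len(row))  # IS broadcasts to its PEs
    return received
```

A column broadcast would follow the same pattern with the roles of rows and columns exchanged.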

Page 74: SIMD and Associative Computational Models Part II: Associative Models

MASC More Powerful Than MMB

• The MASC model is strictly more powerful than the MMB model when both models have the “same size”.
– Since MASC can simulate MMB in O(1) time, MASC is at least as powerful as MMB.

– There is a problem which can be solved in constant time using MASC(n, ) with a mesh but which requires (n log n) time for a MMB to solve.

• See “Simulation of Enhanced Meshes with MASC, a MSIMD Model” by Baker & Jin (listed in references & posted on web)

n nn nn

Page 75: SIMD and Associative Computational Models Part II: Associative Models

Simulation of MMB with MASC

Theorem 1.
• MASC(n, j) with a 2-D mesh is strictly more powerful than a √n × √n MMB for j = Ω(√n).

• An algorithm for a √n × √n MMB can be executed on MASC(n, j) with j = Ω(√n) and a 2-D mesh with a running time at least as fast as the MMB time.

Page 76: SIMD and Associative Computational Models Part II: Associative Models

Simulation of MASC by MMB

• PE(1,1) stores a copy of the program and simulates the ISs sequentially.

• Each instruction stream command or datum is first sent by PE(1,1) to the PEs in the first column.

• Next, the PEs in the first column broadcast this command or datum along the rows to all PEs.

• Each MMB processor uses two registers, channel and status, to decide whether or not to execute the current instruction.
– channel records which IS the processor is assigned to
– status records whether the PE is active, inactive, or idle

n
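A minimal sketch of how one simulated IS broadcast might proceed on the MMB is given below. The two-step structure (first column, then rows) and the `channel` and `status` registers follow the bullets above; the function name, the string encoding of the status values, and the rule that only PEs marked "active" execute are assumptions made for the example.

```python
def mmb_simulate_is_broadcast(is_id, command, channel, status):
    """Simulate the broadcast of one IS command on a square MMB.

    channel[r][c] is the IS the PE at (r, c) is assigned to.
    status[r][c] is "active", "inactive", or "idle".
    Returns a grid holding the command where the PE executes it, else None.
    """
    side = len(channel)
    # Step 1: PE(1,1) sends the (IS id, command) pair down the first column bus.
    first_column = [(is_id, command)] * side
    # Step 2: each first-column PE rebroadcasts the pair along its row bus;
    # every PE then consults its two registers.
    executed = []
    for r in range(side):
        sender_id, cmd = first_column[r]
        executed.append([
            cmd if channel[r][c] == sender_id and status[r][c] == "active" else None
            for c in range(side)
        ])
    return executed
```

Calling this once per instruction stream, one stream after another, models PE(1,1) simulating the ISs sequentially.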

Page 77: SIMD and Associative Computational Models Part II: Associative Models

Simulation of MASC by MMB

• The simulation of simultaneous broadcasts of the ISs takes O(√n) time.

• A local computation, a memory access, or a data movement along local links is identical in the two models and requires O(1) time.

• The execution of a global reduction operator (OR, AND, MAX, MIN) takes O(n^(1/6)) time using an optimal MMB algorithm (MMB reduction published by Olariu et al.).

• Since the global reduction operators may be required for O(√n) ISs, an upper bound is O(√n · n^(1/6)), or O(n^(2/3)).
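The exponent arithmetic behind this upper bound can be checked directly. The step below assumes, as the bullets above suggest, that O(√n) instruction streams may each need one global reduction and that each reduction costs O(n^(1/6)) on the MMB:

```latex
\[
O\!\left(\sqrt{n}\right)\cdot O\!\left(n^{1/6}\right)
  = O\!\left(n^{\frac{1}{2}+\frac{1}{6}}\right)
  = O\!\left(n^{\frac{3}{6}+\frac{1}{6}}\right)
  = O\!\left(n^{2/3}\right)
\]
```

The reductions dominate the O(√n) cost of the simulated broadcasts, so they determine the overall bound.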

Page 78: SIMD and Associative Computational Models Part II: Associative Models

Simulation of MASC by MMB

Theorem 2.
• MASC(n, √n) with a 2-D mesh can be simulated by a √n × √n MMB in O(n^(2/3)) time with O(√n) extra memory

Page 79: SIMD and Associative Computational Models Part II: Associative Models

Power Comparisons between MASC, RM, and PRAM

• Here, we allow the size of the simulating model to be a polynomial times larger than the simulated model.
– E.g., MASC(P(n),Q(m)) can be used to simulate PRAM(n,m), where P(n) and Q(m) are polynomials in n and m, respectively.

• Even if the running time of a simulation is small, its “cost” can be huge due to polynomial increase in size of the simulating model.

• MASC and PRAM can simulate each other in O(1) time, so they have the same power.

• RM can simulate both MASC and PRAM in O(1) time, but neither MASC nor PRAM can simulate RM in O(1) time.

• RM is more powerful than either MASC or PRAM.

Page 80: SIMD and Associative Computational Models Part II: Associative Models

Simulation Conclusions

• MASC is strictly more powerful than an MMB of the same size.
• Any algorithm for an MMB can be executed on a MASC of the same size with the same running time. In particular,
– Optimal algorithms for MMB are also optimal when executed on MASC
• By using a polynomial times larger simulating model, O(1) simulations can be established between MASC and PRAM.

• A polynomial times larger RM can simulate MASC and PRAM in O(1) time

• A polynomial times larger MASC or PRAM cannot simulate RM in O(1) time.

Page 81: SIMD and Associative Computational Models Part II: Associative Models

Timings for Associative Operations

Mingxian Jin Johnnie Baker

Kenneth Batcher

Kent State University

Page 82: SIMD and Associative Computational Models Part II: Associative Models

OUTLINE

• The ASC/MASC Constant Time Operations
• Broadcast/reduction network
• Discussion on the basic operations
• Comparison of timings with other models
• Summary
• Reference
– Mingxian Jin, Johnnie Baker, and Kenneth Batcher, Timings for Associative Operations on the MASC Model, International Parallel and Distributed Processing Symposium, IEEE Workshop on Massively Parallel Processing, San Francisco, April 2001

Page 83: SIMD and Associative Computational Models Part II: Associative Models

Recall ASC & MASC Models

[Figure: ASC/MASC architecture: cells (each a PE with its memory) joined by a cell network and connected through a broadcast/reduction network to the ISs, which are linked by an IS network]

Page 84: SIMD and Associative Computational Models Part II: Associative Models

Basic Constant-Time Operations

• Broadcasting

• Global reduction of logic OR/AND

• Global reduction of Maximum/Minimum

• Associative search
– Responders
– Non-responders

• The AnyResponder operation

• The PickOne operation

Page 85: SIMD and Associative Computational Models Part II: Associative Models

Motivation

• The basic operations are essential to support the associative style of computing
• Accuracy and fairness of the timings
– determine the accuracy of the comparison between MASC and other models
– determine the ability of MASC to efficiently support algorithm design and complexity analysis
• Simulation of MMB has raised some questions about the assigned timings
• Evidence is needed to justify the correctness of the timings
• Correctness depends on
– possible hardware implementations
– comparative fairness with respect to other models

Page 86: SIMD and Associative Computational Models Part II: Associative Models

Broadcast/reduction network

• Constructed using a group of resolver circuits
– Ri : the responder bit of PEi
– Vi = R0 OR R1 OR ... OR Ri-1
– A resolver tells whether any earlier PE has its responder bit equal to 1
• STARAN -- architectural motivation of MASC
– An associative SIMD computer built in the 1970s by Goodyear Aerospace
– Possible hardware implementation of the MASC operations
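What the resolver computes can be sketched in a few lines. This models only the input/output behaviour, reading Vi as the OR of the responder bits of all earlier PEs; the sequential scan stands in for the hardware tree, and the function names are hypothetical.

```python
def resolver_outputs(responder_bits):
    """Return V where V[i] is True if any earlier PE (index < i) has R = 1.

    V[0] is always False because PE 0 has no earlier PE.
    """
    outputs = []
    seen_responder = False
    for bit in responder_bits:
        outputs.append(seen_responder)
        seen_responder = seen_responder or bool(bit)
    return outputs


def first_responder(responder_bits):
    """Index of the first PE with R = 1 and V = 0, or None if there is none."""
    outputs = resolver_outputs(responder_bits)
    for i, bit in enumerate(responder_bits):
        if bit and not outputs[i]:
            return i
    return None
```

A PE whose own responder bit is 1 while its V value is 0 is the first responder, which is one way such a circuit could support picking a single responder.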

Page 87: SIMD and Associative Computational Models Part II: Associative Models

A 4-PE resolver

[Figure: four PEs, PEi through PEi+3, with responder bits Ri through Ri+3 as inputs and resolver outputs Vi through Vi+3, where Vi = R0 OR R1 OR ... OR Ri-1]

Figure 2. A 4-PE resolver with at most 1-gate delay from any input to any output

Page 88: SIMD and Associative Computational Models Part II: Associative Models

A 16-PE Resolver

[Figure: a 16-PE resolver spanning PEi through PEi+15, built from five 4-PE resolvers, with inputs Ri ... Ri+15 and output Vi]

Figure 3. A 16-PE resolver with at most a 3-gate delay from any input to any output

Page 89: SIMD and Associative Computational Models Part II: Associative Models

Gate Delay for Broadcast/Reduction

• The gate delay on the MASC broadcast/reduction network is at most 2 log_4(N) - 1

• Assume a gate delay takes about 1-5 nanoseconds.

– A machine with 2^(2^10) processors would have a gate delay of at most (2 log_4(2^(2^10)) - 1) × 5 nanoseconds ≈ 5.1 microseconds

– A machine with 100 million processors would have the gate delay less than 50 nanoseconds, which is comparable to the time for a memory access in today’s systems.

• A machine with 2^(2^10) processors is impractical
– This is greater than the number of atoms in the observable universe
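The delay bound can be evaluated numerically. The sketch below reads the slide's bound as 2·log_4(N) - 1, which equals log_2(N) - 1, and reads the large processor count as 2^(2^10) = 2^1024; under that reading it reproduces the 5.1 microsecond figure. The function names and the per-gate delay parameter are assumptions for the example.

```python
import math


def max_gate_delays(num_pes: int) -> float:
    """Upper bound on gate delays: 2*log4(N) - 1, which equals log2(N) - 1."""
    return math.log2(num_pes) - 1


def worst_case_delay_ns(num_pes: int, ns_per_gate: float) -> float:
    """Worst-case network delay in nanoseconds for a given per-gate delay."""
    return max_gate_delays(num_pes) * ns_per_gate


if __name__ == "__main__":
    # 2**1024 PEs at 5 ns per gate: 1023 gate delays, about 5.1 microseconds.
    print(worst_case_delay_ns(2 ** 1024, 5.0) / 1000.0, "microseconds")
```

Since the bound grows with log N, even an absurdly large machine stays in the microsecond range, which is the point the slide makes.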

Page 90: SIMD and Associative Computational Models Part II: Associative Models

The gate delay of (2 log_4(N) - 1) using a regular scale

[Figure: plot of the gate delay with a linear vertical axis]

Page 91: SIMD and Associative Computational Models Part II: Associative Models

The gate delay of (2 log_4(N) - 1) with the vertical axis using a logarithmic scale

[Figure: plot of the gate delay with a logarithmic vertical axis]

Page 92: SIMD and Associative Computational Models Part II: Associative Models

Recall Constant Time Associative Operations

• Broadcasting

• Global OR and AND operations

• Associative search

• The AnyResponder operation

• The PickOne operation

• Global maximum and minimum operations

Page 93: SIMD and Associative Computational Models Part II: Associative Models

Broadcasting

• 1 bit -- O(1) and w bits -- O(w)
• Let w be the length of an instruction or a data item
• For bus-based architectures, buses normally have a bandwidth of w = log N, where N is the number of PEs
– Allows a processor ID to be stored in a word
– Words can be transmitted in one step
• A separate broadcast network can be built for MASC with bus bandwidth w
• For MASC, both the word and instruction broadcasts require constant time

Page 94: SIMD and Associative Computational Models Part II: Associative Models

Global OR and AND Timing

• Is computed through the resolver network, like broadcasting

• However, the tree traversal is in the reverse direction

• Reasonable to assume this requires constant time

Page 95: SIMD and Associative Computational Models Part II: Associative Models

Associative Search Operation Timing

• An IS determines if any of its active PEs contains a (word-length) search pattern

• The search pattern is broadcast to the PEs by the IS

• Next, each active PE makes a sequential comparison. Those PEs with matching values set their responder bit and remain active

• Same tree traversal through the network as above
• Operation requires constant time
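The search steps above can be sketched as follows. Lists stand in for the per-PE memories and activity flags, and the names `associative_search` and `any_responder` are hypothetical. In the model all PEs compare at once; the list comprehension only imitates that.

```python
def associative_search(pe_values, active, pattern):
    """Return the responder bits after the IS broadcasts a search pattern.

    A PE responds only if it is active and its stored value matches.
    """
    return [
        is_active and value == pattern
        for value, is_active in zip(pe_values, active)
    ]


def any_responder(responders):
    """Global OR over the responder bits."""
    return any(responders)
```

For example, searching for 4 among the values [4, 9, 4, 2] with the third PE inactive leaves only the first PE as a responder.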

Page 96: SIMD and Associative Computational Models Part II: Associative Models

AnyResponder Operation Timing

• Usually follows an associative search

• Returns true if any responder bit is set and false otherwise

• Essentially a global OR operation

Page 97: SIMD and Associative Computational Models Part II: Associative Models

PickOne Operation Timing

• Usually follows an associative search

• The IS picks one of the responders and can instruct this PE to broadcast a value

• After a PE is processed, the responder bit is cleared and the IS can pick another active PE
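The pick, process, and clear cycle can be sketched as a loop. Choosing the lowest-indexed responder is an assumption made for the example, as are the function names; the slides do not say which responder is picked.

```python
def pick_one(responders):
    """Return the index of one responder (here, the first), or None."""
    for index, bit in enumerate(responders):
        if bit:
            return index
    return None


def process_responders(responders, values):
    """Pick responders one at a time, collect each value, clear each bit."""
    remaining = list(responders)  # do not modify the caller's list
    broadcast_order = []
    while (chosen := pick_one(remaining)) is not None:
        broadcast_order.append(values[chosen])  # the picked PE broadcasts
        remaining[chosen] = False               # clear its responder bit
    return broadcast_order
```

The loop ends when no responder bit remains set, which corresponds to an AnyResponder test returning false.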

Page 98: SIMD and Associative Computational Models Part II: Associative Models

Global Maximum and Minimum Timings

• Implemented bit-serially with a global AND, proceeding from the leftmost bit to the rightmost bit.
• Each global AND keeps active those PEs whose bit value is 1 (or 0 for Minimum) until
– one responder is left or
– all bits have been processed
• If all active processors have a 0 value for one bit, then all remain active for the next round.
• With word length w, this operation takes O(w)
• An addition of two word-length operands is normally assumed to be an O(1) operation, i.e., w is a constant
• Similarly, a global Maximum or Minimum can be justified to be a constant time operation
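The bit-serial maximum can be sketched as below. It assumes non-negative integers that fit in the given word length, and it uses Python's `any` as a stand-in for the global reduction step in each round, which is a simplification of the hardware operation the slide describes. The function name is hypothetical.

```python
def bit_serial_max(values, word_length):
    """Return (maximum, active flags), scanning bits from most significant.

    In each round, PEs holding a 1 in the current bit stay active, unless no
    active PE has a 1, in which case all active PEs stay active.
    """
    if not values:
        raise ValueError("at least one value is required")
    active = [True] * len(values)
    for bit in range(word_length - 1, -1, -1):
        has_one = [
            is_active and bool((value >> bit) & 1)
            for value, is_active in zip(values, active)
        ]
        if any(has_one):          # some active PE has a 1 in this bit
            active = has_one      # the others drop out
        if sum(active) == 1:      # one responder left: stop early
            break
    winner = active.index(True)
    return values[winner], active
```

The loop runs at most once per bit, so the number of rounds is bounded by the word length regardless of the number of PEs. A minimum would follow the same pattern with the roles of 0 and 1 exchanged.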

Page 99: SIMD and Associative Computational Models Part II: Associative Models

RAM & PRAM Memory Access Time

• A lower bound established using electronic circuit access
– Called a Memory Access Unit (MAU)

• MAU implemented as a binary tree of switches rooted at each processor

• When the memory size is M, the memory access time is (log M) for RAM and (log N) for PRAMs with N processors (assuming N=(M))
• Both models assume that a memory access takes constant time

Page 100: SIMD and Associative Computational Models Part II: Associative Models

Mesh with Multiple Broadcast (MMB)

• A MMB is a basic mesh enhanced with row and column buses

• In one time unit, only one processor is allowed to broadcast

• All other processors read the value being broadcast in constant time

• More practical compared to PRAMs
• On a √N × √N MMB with each PE holding a data item, a global reduction can be computed in O(N^(1/6)) time.
• MMBs perform these reductions using specially designed algorithms rather than circuits
• Substantially different methods are used to execute these operations on MASC

Page 101: SIMD and Associative Computational Models Part II: Associative Models

Summary of Associative Operations Timings

Operation               1-bit broadcast bus    w-bit broadcast bus
Broadcast               O(w)                   O(1)
Addition/Subtraction    O(1)
Logic OR                O(1)
Logic AND               O(1)
Associative Search      O(w)                   O(1)
AnyResponder            O(1)
PickOne                 O(1)
Maximum/Minimum         O(w)                   O(1)

(w is the word length; rows with a single entry list only one timing.)

Page 102: SIMD and Associative Computational Models Part II: Associative Models

A Proposed IS Control System for MASC

Wittaya Chantamas

These slides will be posted separately on webpage.

Page 103: SIMD and Associative Computational Models Part II: Associative Models

Wittaya’s slides go here

• This is the location where Wittaya’s slides belong.

• However, they will be posted separately on the webpage.

Page 104: SIMD and Associative Computational Models Part II: Associative Models

Slides for Possible Future Use

Page 105: SIMD and Associative Computational Models Part II: Associative Models

MASC Model

• Basic Components
– An array of cells, each consisting of a PE and its local memory
– A PE interconnection network between the cells
– One or more Instruction Streams (ISs)
– An IS network
• MASC is a MSIMD model that supports
– both data and control parallelism
– associative programming

[Figure: MASC architecture: eight cells (each a PE with its memory) joined by a PE interconnection network and connected through an IS network to three instruction streams (ISs)]