Post on 19-Jan-2016

Internet-Based TSP Computation with Javelin++

Michael Neary & Peter Cappello
Computer Science, UCSB

Introduction: Goals

• Service parallel applications that are:
  – Large: too big for a cluster
  – Coarse-grain: to hide communication latency
• Simplicity of use
  – Design focus: decomposition [composition] of computation.
• Scalable high performance
  – despite large communication latency
• Fault tolerance
  – 1000s of hosts, each dynamically [dis]associates.

Introduction: Some Related Work

Introduction: Some Applications

• Search for extra-terrestrial life
• Computer-generated animation
• Computer modeling of drugs for:
  – Influenza
  – Cancer
  – Reducing chemotherapy’s side-effects
• Financial modeling
• Storing nuclear waste

Outline

• Architecture

• Model of Computation

• API

• Scalable Computation

• Experimental Results

• Conclusions & Future Work

Architecture: Basic Components

• Brokers
• Clients
• Hosts

Architecture: Broker Discovery

[Animated figure over several slides: a host (H), brokers (B), and the Broker Naming System; the host sends PING(BID?) to brokers.]

Architecture: Network of Broker-Managed Host Trees

• Each broker manages a tree of hosts
• Brokers form a network
• Client contacts broker
• Client gets host trees
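The slides do not say how a broker shapes its tree of hosts. As a purely illustrative assumption, the sketch below keeps the hosts in a complete k-ary tree stored in an array, so a host's parent and children follow from index arithmetic, as in a binary heap. `HostTree`, `addHost`, and the arity parameter are hypothetical names, not Javelin++ API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a broker keeps its hosts in a complete k-ary tree
 * stored in an array (heap layout), so tree links are index arithmetic.
 * The layout is an assumption; the slides only say "a tree of hosts".
 */
final class HostTree {
    private final int arity;                        // k: children per host (assumed)
    private final List<String> hosts = new ArrayList<>();

    HostTree(int arity) {
        if (arity < 1) throw new IllegalArgumentException("arity must be >= 1");
        this.arity = arity;
    }

    /** Places a new host in the next free slot and returns its index. */
    int addHost(String hostId) {
        hosts.add(hostId);
        return hosts.size() - 1;
    }

    /** Index of host i's parent, or -1 for the root. */
    int parent(int i) {
        return i == 0 ? -1 : (i - 1) / arity;
    }

    /** Indices of host i's children that currently exist. */
    List<Integer> children(int i) {
        List<Integer> result = new ArrayList<>();
        for (int c = arity * i + 1; c <= arity * i + arity && c < hosts.size(); c++) {
            result.add(c);
        }
        return result;
    }

    String hostAt(int i) {
        return hosts.get(i);
    }
}
```

With arity 2 and seven hosts, host 5's parent is host 2, host 1's children are hosts 3 and 4, and host 3 is a leaf.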

Scalable Computation: Deterministic Work-Stealing Scheduler

[Figure: each HOST holds a task container with the operations addTask( task ), getTask( ), and stealTask( ).]
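A minimal sketch of the per-host task container, assuming it is a double-ended queue in which the owning host adds and takes tasks at one end while other hosts steal from the opposite end. The class name `TaskContainer` and the choice of ends are assumptions; only the three operation names come from the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of a per-host task container. Assumption: the owner adds and takes
 * tasks at the tail (LIFO), while thieves steal from the head (FIFO).
 * Methods are synchronized for simplicity rather than lock-free.
 */
final class TaskContainer<T> {
    private final Deque<T> deque = new ArrayDeque<>();

    /** Owner adds a newly created task at the tail. */
    synchronized void addTask(T task) {
        deque.addLast(task);
    }

    /** Owner takes its most recently added task, or null if empty. */
    synchronized T getTask() {
        return deque.pollLast();
    }

    /** Another host steals the oldest task, or null if empty. */
    synchronized T stealTask() {
        return deque.pollFirst();
    }

    synchronized boolean isEmpty() {
        return deque.isEmpty();
    }
}
```

Letting thieves take the oldest task is the usual work-stealing choice: in a depth-first search the oldest tasks tend to sit nearer the root, so a single steal tends to move a larger piece of work.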

Scalable Computation: Deterministic Work-Stealing Scheduler

Task getWork( )
{
    if ( my deque has a task )
        return task;
    else if ( any child has a task )
        return child’s task;
    else
        return parent.getWork( );
}

[Figure: a CLIENT and a tree of HOSTS]
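The getWork( ) rule above can be made concrete in a single-threaded Java sketch. Each `HostNode` (a hypothetical class) owns a deque and knows its parent and children; a request first uses the host's own deque, then steals the oldest task from a child, and finally defers to the parent, so unmet requests climb toward the root (for example, the client). Synchronization and remote calls are deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Single-threaded sketch of getWork(): own deque, then a child's deque,
 * then the parent. Class and field names are hypothetical.
 */
final class HostNode {
    final String name;
    final HostNode parent;                          // null at the root (e.g. the client)
    final List<HostNode> children = new ArrayList<>();
    final Deque<String> deque = new ArrayDeque<>();

    HostNode(String name, HostNode parent) {
        this.name = name;
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }

    /** Returns a task, or null if no task is reachable from here. */
    String getWork() {
        String task = deque.pollLast();             // 1. my deque has a task
        if (task != null) return task;
        for (HostNode child : children) {           // 2. any child has a task: steal its oldest
            String stolen = child.deque.pollFirst();
            if (stolen != null) return stolen;
        }
        return parent == null ? null : parent.getWork(); // 3. ask my parent
    }
}
```

Because a parent only looks one level down before asking its own parent, an idle leaf can end up with work held by the root or by a sibling of one of its ancestors.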

Models of Computation

• Master-slave

– AFAIK, all proposed commercial applications use this model

• Branch-&-bound optimization

– A generalization of master-slave.

Models of Computation: Branch & Bound

[Animated figure over several slides: a branch-and-bound search tree with node bounds 0 (root); 2, 7; 3, 6, 10, 8; and leaves 3, 4, 8, 7, 12, 10, 9, 10. As the search proceeds, LOWER takes the values 0, 2, 3, 4, 3, 6, 7, while UPPER is first unset, then 4, then 3; once UPPER = 3, the nodes with bounds 6 and 7 are reached but not expanded.]
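To make the animation concrete, here is a minimal recursive depth-first branch and bound. The tree in the usage example uses our reading of the figure's garbled labels, and children are explored left to right, so the visiting order may differ from the slides. A node is killed when its lower bound is not below the current upper bound, mirroring the `setLowerBound() < UpperBound` test in the API code. `BBNode` and `BranchAndBound` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** A search-tree node with a lower bound; a leaf's bound is its exact cost. */
final class BBNode {
    final int bound;
    final List<BBNode> children;

    BBNode(int bound, BBNode... children) {
        this.bound = bound;
        this.children = List.of(children);
    }

    boolean isLeaf() {
        return children.isEmpty();
    }
}

/** Sequential depth-first branch and bound (illustrative, not Javelin++ code). */
final class BranchAndBound {
    int upper = Integer.MAX_VALUE;                      // UPPER, unset = +infinity
    final List<Integer> upperTrace = new ArrayList<>(); // successive UPPER values
    final List<Integer> pruned = new ArrayList<>();     // bounds of killed nodes

    void search(BBNode node) {
        if (node.bound >= upper) {                  // cannot beat the incumbent: kill it
            pruned.add(node.bound);
            return;
        }
        if (node.isLeaf()) {                        // complete solution that improves UPPER
            upper = node.bound;
            upperTrace.add(upper);
            return;
        }
        for (BBNode child : node.children) {
            search(child);
        }
    }
}
```

On that tree the sketch sets UPPER once, to 3, and kills the nodes with bounds 4, 6, and 7.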

Models of Computation: Branch & Bound

• Tasks created dynamically
• Upper bound is shared
• To detect termination, the scheduler detects tasks that have been:
  – Completed
  – Killed (“bounded”)
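One way a scheduler could detect termination from completed and killed tasks, sketched under our own assumptions: it records the task tree as tasks branch, treats a task as resolved once it is completed, killed, or all of its children are resolved, and declares the computation finished when the root is resolved. `TerminationTracker` and its methods are hypothetical; the slides do not describe Javelin++'s actual bookkeeping.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical termination detector for a dynamically growing task tree.
 * A task is resolved when completed, killed ("bounded"), or when all of
 * its children are resolved; the computation ends when the root is resolved.
 */
final class TerminationTracker {
    private final Map<Integer, Integer> parentOf = new HashMap<>();
    private final Map<Integer, Integer> unresolvedChildren = new HashMap<>();
    private final int root;
    private boolean finished = false;

    TerminationTracker(int rootId) {
        this.root = rootId;
        parentOf.put(rootId, null);
    }

    /** Records that `parent` branched into the given child tasks. */
    void branched(int parent, int... childIds) {
        unresolvedChildren.put(parent, childIds.length);
        for (int c : childIds) parentOf.put(c, parent);
        if (childIds.length == 0) resolve(parent);
    }

    /** A task searched to completion (e.g. an atomic subtree). */
    void completed(int task) { resolve(task); }

    /** A task whose lower bound was not below the upper bound. */
    void killed(int task) { resolve(task); }

    boolean isFinished() { return finished; }

    private void resolve(int task) {
        int current = task;
        while (true) {
            if (current == root) { finished = true; return; }
            Integer parent = parentOf.get(current);
            if (parent == null) throw new IllegalArgumentException("unknown task " + current);
            int left = unresolvedChildren.merge(parent, -1, Integer::sum);
            if (left > 0) return;                   // siblings still outstanding
            current = parent;                       // all children resolved: parent is too
        }
    }
}
```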

API

public class Host implements Runnable
{
    . . .
    public void run()
    {
        while ( (node = jDM.getWork()) != null )
        {
            if ( isAtomic() )
                compute(); // search space; return result
            else
            {
                child = node.branch(); // put children in child array
                for (int i = 0; i < node.numChildren; i++)
                    if ( child[i].setLowerBound() < UpperBound )
                        jDM.addWork( child[i] );
                    // else child is killed implicitly
            }
        }
    }

API

private void compute()
{
    . . .
    boolean newBest = false;

    while ( (node = stack.pop()) != null )
    {
        if ( node.isComplete() )
        {   // braces bind the else below to isComplete(), not to the cost test
            if ( node.getCost() < UpperBound )
            {
                newBest = true;
                UpperBound = node.getCost();
                jDM.propagateValue( UpperBound );
                best = new Node( node ); // copy of the new best complete node
            }
        }
        else
        {
            child = node.branch();
            for (int i = 0; i < node.numChildren; i++)
                if ( child[i].setLowerBound() < UpperBound )
                    stack.push( child[i] );
                // else child is killed implicitly
        }
    }
    if ( newBest )
        jDM.returnResult( best );
}
}
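The compute( ) loop can be exercised end to end on a tiny TSP instance. This self-contained sketch fills the elided parts with assumptions: a node is a partial tour starting at city 0, its lower bound is just the cost of the path so far (valid for non-negative distances, though weak), and a local `ArrayDeque` plays the role of `stack`. There is no `jDM` here, so bound propagation and result reporting are omitted; `TspBranchAndBound` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Sequential stack-based branch and bound for a small TSP (illustrative only). */
final class TspBranchAndBound {
    /** A partial tour from city 0: cities visited in order and cost so far. */
    static final class Node {
        final int[] path;
        final int cost;

        Node(int[] path, int cost) {
            this.path = path;
            this.cost = cost;
        }
    }

    /** Returns the best closed tour (path and total cost), or null if dist is empty. */
    static Node solve(int[][] dist) {
        int n = dist.length;
        if (n == 0) return null;
        int upperBound = Integer.MAX_VALUE;
        Node best = null;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(new Node(new int[] {0}, 0));
        Node node;
        while ((node = stack.poll()) != null) {     // poll() yields null once the stack is empty
            if (node.cost >= upperBound) continue;  // bound improved since this node was pushed
            int last = node.path[node.path.length - 1];
            if (node.path.length == n) {            // complete: close the tour back to city 0
                int total = node.cost + dist[last][0];
                if (total < upperBound) {           // new best solution
                    upperBound = total;
                    best = new Node(node.path.clone(), total);
                }
            } else {                                // branch on every unvisited city
                for (int next = 1; next < n; next++) {
                    if (visited(node.path, next)) continue;
                    int lowerBound = node.cost + dist[last][next];
                    if (lowerBound < upperBound) {  // else the child is killed implicitly
                        int[] path = Arrays.copyOf(node.path, node.path.length + 1);
                        path[node.path.length] = next;
                        stack.push(new Node(path, lowerBound));
                    }
                }
            }
        }
        return best;
    }

    private static boolean visited(int[] path, int city) {
        for (int c : path) {
            if (c == city) return true;
        }
        return false;
    }
}
```

For the symmetric 4-city matrix {{0, 10, 15, 20}, {10, 0, 35, 25}, {15, 35, 0, 30}, {20, 25, 30, 0}}, `solve` returns a tour of cost 80.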

Scalable Computation: Weak Shared Memory Model

• Slow propagation of the bound affects performance, not correctness.

[Animated figure over several slides, labeled “Propagate bound”]
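The claim that slow bound propagation costs performance but not correctness can be illustrated with a small sketch. Each worker prunes against a local copy of the upper bound that it refreshes only every `refreshEvery` candidates, so it may act on a stale, too-large bound; improvements are published with `AtomicInteger.accumulateAndGet(value, Math::min)`, so the shared bound only decreases. Treating each candidate as a bare integer cost, the refresh policy, and the name `WeakBound` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Weakly consistent shared upper bound (hypothetical sketch). */
final class WeakBound {
    private final AtomicInteger global = new AtomicInteger(Integer.MAX_VALUE);

    /** Publishes a candidate bound; the shared value only ever decreases. */
    void propagate(int value) {
        global.accumulateAndGet(value, Math::min);
    }

    int read() {
        return global.get();
    }

    /** Scans candidate costs, pruning against a possibly stale local bound. */
    int work(int[] costs, int refreshEvery) {
        if (refreshEvery < 1) throw new IllegalArgumentException("refreshEvery must be >= 1");
        int local = read();
        int evaluated = 0;
        for (int i = 0; i < costs.length; i++) {
            if (i % refreshEvery == 0) local = Math.min(local, read()); // occasional refresh
            if (costs[i] >= local) continue;        // pruned against the (maybe stale) bound
            evaluated++;                            // stands in for an expensive evaluation
            local = costs[i];                       // improve the local copy at once
            propagate(costs[i]);                    // publish to the other workers
        }
        return evaluated;
    }

    /** Runs one worker per partition in parallel and returns the final bound. */
    static int solveInParallel(int[][] partitions, int refreshEvery) {
        WeakBound bound = new WeakBound();
        ExecutorService pool = Executors.newFixedThreadPool(partitions.length);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int[] part : partitions) {
                futures.add(pool.submit(() -> bound.work(part, refreshEvery)));
            }
            for (Future<Integer> f : futures) f.get(); // wait for every worker
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return bound.read();
    }
}
```

With partitions {40, 35, 50, 12}, {30, 7, 25}, and {60, 9, 8, 70}, `solveInParallel(parts, 2)` returns 7 however the threads interleave; only the number of candidates each worker evaluates can vary.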

Scalable Computation: Fault Tolerance via Eager Scheduling

When:
• All tasks have been assigned
• Some results have not been reported
• A host wants a new task

Re-assign a task!

• Eager scheduling tolerates faults & balances the load.
  – Computation completes if at least 1 host communicates with the client.
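A single-threaded sketch of the eager-scheduling rule: tasks are first handed out once each; after that, a host asking for work receives a task whose result is still missing, and only the first result for each task is kept. Picking the re-assigned task round-robin is our assumption, and `EagerScheduler` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Eager scheduling sketch: after every task has been assigned once, idle
 * hosts receive tasks whose results are still missing. Round-robin choice
 * of the re-assigned task is an assumption, not taken from the slides.
 */
final class EagerScheduler {
    private final int numTasks;
    private int nextUnassigned = 0;
    private final Set<Integer> unfinished = new LinkedHashSet<>();
    private final Integer[] results;
    private int cursor = 0;                         // round-robin position for re-assignment

    EagerScheduler(int numTasks) {
        this.numTasks = numTasks;
        this.results = new Integer[numTasks];
        for (int t = 0; t < numTasks; t++) unfinished.add(t);
    }

    /** Returns a task id for a requesting host, or -1 when all results are in. */
    synchronized int nextTask() {
        if (nextUnassigned < numTasks) return nextUnassigned++; // first assignment
        if (unfinished.isEmpty()) return -1;                    // computation complete
        List<Integer> pending = new ArrayList<>(unfinished);    // re-assign a task!
        int task = pending.get(cursor % pending.size());
        cursor++;
        return task;
    }

    /** Records a result; later duplicates for the same task are ignored. */
    synchronized void report(int task, int result) {
        if (results[task] == null) {
            results[task] = result;
            unfinished.remove(task);
        }
    }

    synchronized boolean done() {
        return unfinished.isEmpty();
    }

    synchronized Integer result(int task) {
        return results[task];
    }
}
```

If one host takes task 0 and never reports, a second host that keeps calling `nextTask()` eventually receives task 0 again, so the run completes as long as that host keeps communicating with the scheduler.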

Scalable Computation: Fault Tolerance via Eager Scheduling

• Scheduler must know which:
  – Tasks have completed
  – Nodes have been killed
• Performance balance
  – Centralized schedule info
  – Decentralized computation

Experimental Results

[Figure: speedup vs. number of processors (both axes 0 to 100) for graph22 and graph24, with the ideal speedup line]

Experimental Results

[Figure: Example of a “bad” graph]

Conclusions

• Javelin 2 relieves the designer/programmer of managing a set of [inter-]networked processors that is:
  – Dynamic
  – Faulty
• A wide set of applications is covered by:
  – Master-slave model
  – Branch & bound model
• Weak shared memory performs well.
• Use multicast (?) for:
  – Code distribution
  – Propagating values

Future Work

• Improve support for long-lived computation:
  – Do not require that the client run continuously.
• A DAG model of computation
  – with limited weak shared memory.

Future Work: Jini/JavaSpaces Technology

[Figure: a TaskManager (aka Broker) with several hosts (H)]

“Continuously” disperse Tasks among brokers via a physics model

Future Work: Jini/JavaSpaces Technology

• TaskManager uses a persistent JavaSpace
  – Host management: trivial
  – Eager scheduling: simple
• No single point of failure
  – Fat tree topology

Future Work: Advanced Issues

• Privacy of data & algorithm
• Algorithms
  – New computational complexity model: “minimize” communication between machines
  – N-body problem, …
• Accounting: associate specific work with specific host
  – Correctness
  – Compensation (how to quantify?)
• Create international open source organization
  – System infrastructure
  – Application codes

Models of Computation: Branch & Bound

[Figure: two copies of the branch-and-bound search tree; UPPER = 3, LOWER = 0]

Architecture: Broker Name Service (BNS)

[Figure: BROKER, HOST, and BNS]

1. Register with BNS
2. Get broker list
3. Ping brokers on list
4. Connect to selected broker
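The four BNS steps above can be sketched with the network replaced by in-memory stand-ins. Assumptions, all ours: brokers perform step 1, the host performs steps 2 to 4, and the "selected broker" is the one with the smallest round-trip time among those that answer the ping. `BrokerNameService`, `HostBootstrap`, and the ping function are hypothetical, not the Javelin++ API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.ToLongFunction;

/** In-memory stand-in for the BNS (hypothetical sketch). */
final class BrokerNameService {
    private final List<String> brokers = new ArrayList<>();

    /** Step 1: a broker registers its address (assumed to be the broker's job). */
    synchronized void register(String brokerAddress) {
        if (!brokers.contains(brokerAddress)) brokers.add(brokerAddress);
    }

    /** Step 2: a host fetches a snapshot of the broker list. */
    synchronized List<String> brokerList() {
        return new ArrayList<>(brokers);
    }
}

final class HostBootstrap {
    /**
     * Steps 3 and 4: ping each listed broker and select the fastest one that
     * answers. `ping` returns a round-trip time in ms, or a negative value if
     * the broker did not answer. Lowest-RTT selection is an assumption.
     */
    static Optional<String> selectBroker(BrokerNameService bns, ToLongFunction<String> ping) {
        String best = null;
        long bestRtt = Long.MAX_VALUE;
        for (String broker : bns.brokerList()) {
            long rtt = ping.applyAsLong(broker);
            if (rtt >= 0 && rtt < bestRtt) {
                best = broker;
                bestRtt = rtt;
            }
        }
        return Optional.ofNullable(best);           // step 4 would connect to this broker
    }
}
```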
