a survey on parallel computing in heterogeneous grid environments

A Survey on Parallel Computing in Heterogeneous Grid Environments. Takeshi Sekiya, Chikayama-Taura Laboratory, M1. Nov 24, 2006

Page 1: A Survey on Parallel Computing in Heterogeneous Grid Environments

A Survey on Parallel Computing in Heterogeneous Grid Environments

Takeshi Sekiya

Chikayama-Taura Laboratory M1

Nov 24, 2006

Page 2: A Survey on Parallel Computing in Heterogeneous Grid Environments

Parallel Computing in Grid Environments

• Opportunities to use multi-cluster environments are increasing
  – But schemes designed for stand-alone clusters cause problems in grid-like usage
• New mechanisms are needed
  – Handling heterogeneity
  – Firewall/NAT traversal
  – Adaptation to dynamic environments
  – Monitoring

[Figure: difficulties in grid environments: heterogeneous hardware and software, failure, firewall/NAT, maintenance, dynamic change of CPU/network load, complex configuration, difficult to know what's happening]

Page 3: A Survey on Parallel Computing in Heterogeneous Grid Environments

Heterogeneous Environments

• Heterogeneous machines
  – Binaries are different
  – Complex configuration is required when hardware/software is different
• Heterogeneous networks
  – Synchronization overhead in parallel applications when latency/bandwidth differ
  – Firewalls/NATs

Page 4: A Survey on Parallel Computing in Heterogeneous Grid Environments

Firewall/NAT

• Firewalls/NATs hinder bi-directional connectivity

• Bi-directional TCP/IP connectivity needs to be provided to support a wide spectrum of applications

Firewall or NAT

Page 5: A Survey on Parallel Computing in Heterogeneous Grid Environments

Solutions to the Internet Asymmetric-Connectivity Problem

• MPI Environment on Grid with Virtual Machines [Tachibana et al. 2006]
  – Xen for VMs and VPN for the virtual network
  – Low-cost VM migration
• ViNe [Tsugawa et al. 2006]
  – A host called a Virtual Router
  – Overlay-network based
• WOW [Ganguly et al. 2006]

Page 6: A Survey on Parallel Computing in Heterogeneous Grid Environments

Outline

• Introduction
• WOW
  – IPOP: IP over P2P
  – Routing IP on the P2P Overlay
  – Connection Setup
  – Joining an Existing Network
  – NAT Traversal
  – Experiments

• Summary

Page 7: A Survey on Parallel Computing in Heterogeneous Grid Environments

Objective and Approach

• The system is architected to …
  – Adapt to heterogeneous environments
    • Present a cluster-like environment to end users
  – Scale to a large number of nodes
  – Facilitate the addition of nodes through self-organization of the virtual network
    • Less manual configuration
• Approach with virtualization
  – Virtual machines
    • Homogeneous software
  – Self-organizing overlay network
    • All-to-all connectivity

Page 8: A Survey on Parallel Computing in Heterogeneous Grid Environments

Virtual Machine

• A homogeneous software environment

• Offering opportunities for load balancing and fault tolerance

• Users can use pre-configured systems
  – Linux distribution
  – Libraries and software

Page 9: A Survey on Parallel Computing in Heterogeneous Grid Environments

Virtual Network

[Figure: virtual network layers; labels: Physical Infrastructure (with NAT and firewall), P2P Network (P2P overlay network), IPOP (IP over P2P), Virtual Grid Cluster]

Page 10: A Survey on Parallel Computing in Heterogeneous Grid Environments

IPOP [Ganguly et al. 2006]

• Characteristics
  – A virtual IP address space
  – Self-organizing
• Architecture
  – IP tunneling over P2P
  – A virtualized network interface (tap) captures virtual IP packets
  – Brunet P2P overlay network

Page 11: A Survey on Parallel Computing in Heterogeneous Grid Environments

Capturing Virtual IP Packets

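• The tap appears to applications as a network interface
• IPOP translates virtual IP addresses to Brunet P2P network addresses

[Figure: tunneling. On the sending host, the application's Ethernet frame is captured at the tap and IPOP wraps the IP packet in a Brunet message; on the receiving host, IPOP unwraps the IP packet and delivers it as an Ethernet frame through the tap to the application.]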
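The tunneling path on this slide can be sketched in a few lines. This is a minimal illustration, not IPOP's implementation: `OverlayMessage` is a hypothetical stand-in for a Brunet message, and deriving the overlay address as the SHA-1 hash of the destination virtual IP is an assumption made here for the example. The frame is assumed to be untagged Ethernet carrying IPv4.

```python
import hashlib
from dataclasses import dataclass

ETHERNET_HEADER_LEN = 14  # untagged Ethernet II header
IPV4_DST_OFFSET = 16      # offset of the destination address in an IPv4 header


@dataclass
class OverlayMessage:
    """Hypothetical stand-in for a Brunet message."""
    destination: bytes  # 160-bit overlay address
    payload: bytes      # the tunneled IP packet


def overlay_address(virtual_ip: bytes) -> bytes:
    """Assumed mapping: SHA-1 of the 4-byte virtual IPv4 address."""
    return hashlib.sha1(virtual_ip).digest()


def encapsulate(ethernet_frame: bytes) -> OverlayMessage:
    """Strip the Ethernet header and wrap the IP packet for the overlay."""
    ip_packet = ethernet_frame[ETHERNET_HEADER_LEN:]
    dst_ip = ip_packet[IPV4_DST_OFFSET:IPV4_DST_OFFSET + 4]
    return OverlayMessage(overlay_address(dst_ip), ip_packet)


def decapsulate(message: OverlayMessage, ethernet_header: bytes) -> bytes:
    """Rebuild a frame on the receiving side for delivery through its tap."""
    return ethernet_header + message.payload


# Example: a dummy frame whose IPv4 header is addressed to 10.0.0.2.
header = bytes(ETHERNET_HEADER_LEN)
ip_packet = bytes(16) + bytes([10, 0, 0, 2]) + b"payload"
message = encapsulate(header + ip_packet)
```

The application never sees the overlay: it sends and receives ordinary frames on the tap, and only the IP packet travels inside the overlay message.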

Page 12: A Survey on Parallel Computing in Heterogeneous Grid Environments

Brunet P2P

• Ring-structured overlay
• Organized connections
  – Near: with neighbors
  – Far: across the ring
• 160-bit SHA-1 hash addresses
• Greedy routing
• Each node has a constant number of connections
  – O(log^2(n)) overlay hops

[Figure: ring of nodes n1 to n12 showing a multi-hop path from n1 to n7]
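Greedy routing on a ring can be sketched as follows: at each hop, the message is forwarded to whichever connection is closest to the destination address. The topology below is hypothetical and chosen only to keep the example small: 12 ring positions instead of 160-bit addresses, near links to both neighbors, and a single one-directional far link four positions ahead. Index 0 plays the role of n1 and index 6 the role of n7.

```python
def ring_distance(a: int, b: int, ring_size: int) -> int:
    """Shortest distance between two addresses, going either way round the ring."""
    d = abs(a - b) % ring_size
    return min(d, ring_size - d)


def greedy_route(src: int, dst: int, connections: dict, ring_size: int) -> list:
    """Forward hop by hop to the connected peer closest to the destination."""
    path = [src]
    current = src
    while current != dst:
        best = min(connections[current],
                   key=lambda peer: ring_distance(peer, dst, ring_size))
        if ring_distance(best, dst, ring_size) >= ring_distance(current, dst, ring_size):
            raise RuntimeError("no connection is closer to the destination")
        path.append(best)
        current = best
    return path


RING_SIZE = 12  # toy ring; the slide's overlay uses 160-bit addresses
connections = {
    n: [(n - 1) % RING_SIZE, (n + 1) % RING_SIZE, (n + 4) % RING_SIZE]
    for n in range(RING_SIZE)
}
path = greedy_route(0, 6, connections, RING_SIZE)
print(path)  # [0, 4, 5, 6]
```

The far link covers most of the distance in one hop and the near links finish the route, which is why a constant number of connections per node is enough to keep hop counts low.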

Page 13: A Survey on Parallel Computing in Heterogeneous Grid Environments

Connection Setup: Connection Protocol

• Node A wishes to connect to node B
1. A sends a CTM (Connect To Me) request to B over the P2P network
  • The CTM request contains A’s URI
2. When B receives the CTM request, B sends a CTM reply to A
  • The CTM reply contains B’s URI

[Figure: A sends a CTM request to B; B returns a CTM reply]

URI (Uniform Resource Identifier), e.g. brunet.tcp:192.0.0.1:1024

Page 14: A Survey on Parallel Computing in Heterogeneous Grid Environments

Connection Setup: Linking Protocol

3. B sends a link request message to A over the physical network
4. When A receives the link request, A simply responds with a link reply message
5. Finally, a new connection is established between A and B

[Figure: B sends a link request to A; A returns a link reply; a direct connection between A and B results]
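Steps 1 to 5 of connection setup can be traced with a small simulation. This is only a sketch of the message order: the `Node` class, the `connect` function and the trace format are hypothetical, and B's URI is made up for the example (A's URI is the one shown on the previous slide).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    uri: str
    connections: set = field(default_factory=set)


def connect(a: Node, b: Node) -> list:
    """Return the ordered messages as (sender, receiver, network, message)."""
    trace = []
    # Connection protocol: CTM messages are routed over the P2P overlay.
    ctm_request = {"type": "CTM request", "uri": a.uri}
    trace.append((a.name, b.name, "overlay", ctm_request["type"]))
    ctm_reply = {"type": "CTM reply", "uri": b.uri}
    trace.append((b.name, a.name, "overlay", ctm_reply["type"]))
    # Linking protocol: B now knows A's URI and contacts it directly.
    trace.append((b.name, a.name, "physical", "link request"))
    trace.append((a.name, b.name, "physical", "link reply"))
    # Step 5: the direct connection exists on both sides.
    a.connections.add(b.name)
    b.connections.add(a.name)
    return trace


node_a = Node("A", "brunet.tcp:192.0.0.1:1024")
node_b = Node("B", "brunet.tcp:192.0.0.2:1024")  # hypothetical URI
trace = connect(node_a, node_b)
```

The point of the two phases is that the overlay is used only to exchange URIs; the link itself is made over the physical network.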

Page 15: A Survey on Parallel Computing in Heterogeneous Grid Environments

Linking Race Condition (1)

• A race condition may occur because the linking protocol can be initiated by both peers

[Figure: both peers send a link request and both receive a link reply, so both attempts succeed]

Page 16: A Survey on Parallel Computing in Heterogeneous Grid Environments

Linking Race Condition (2)

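• When a node receives a link request, it checks that there is no existing connection or active connection attempt to the sender
• When nodes receive a link error, they restart the protocol after a random back-off

[Figure: both peers send link requests while their own linking is active ("Active linking on?") and both receive link errors; after a random back-off, one peer's link request is answered with a link reply]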
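The race resolution can be sketched as a small simulation. The class and function names are hypothetical, and the back-off range (1 to 8 time units) is an arbitrary choice for the example. A node rejects a link request when it already has a connection or an active attempt toward the sender; after a collision both sides wait a random delay, and the one whose timer fires first retries while the other is idle.

```python
import random


class Node:
    def __init__(self, name: str):
        self.name = name
        self.connections = set()
        self.active_attempts = set()

    def on_link_request(self, sender: "Node") -> str:
        # Reject if already connected to, or already linking with, the sender.
        if sender.name in self.connections or sender.name in self.active_attempts:
            return "link error"
        self.connections.add(sender.name)
        return "link reply"


def link(initiator: Node, responder: Node) -> str:
    """One node initiates while the other is idle."""
    initiator.active_attempts.add(responder.name)
    reply = responder.on_link_request(initiator)
    initiator.active_attempts.discard(responder.name)
    if reply == "link reply":
        initiator.connections.add(responder.name)
    return reply


def simultaneous_link(a: Node, b: Node) -> tuple:
    """Both nodes initiate at once, so each request meets an active attempt."""
    a.active_attempts.add(b.name)
    b.active_attempts.add(a.name)
    replies = (b.on_link_request(a), a.on_link_request(b))
    a.active_attempts.discard(b.name)
    b.active_attempts.discard(a.name)
    return replies


def connect_with_backoff(a: Node, b: Node, rng: random.Random, max_rounds: int = 16) -> str:
    collision = simultaneous_link(a, b)
    assert collision == ("link error", "link error")
    for _ in range(max_rounds):
        delay_a, delay_b = rng.randint(1, 8), rng.randint(1, 8)
        if delay_a == delay_b:
            simultaneous_link(a, b)  # same delay: the requests collide again
            continue
        first, second = (a, b) if delay_a < delay_b else (b, a)
        return link(first, second)
    return "link error"


node_a, node_b = Node("A"), Node("B")
result = connect_with_backoff(node_a, node_b, random.Random(0))
```

Only one connection is created: by the time the slower node's timer fires, the connection already exists, so no second attempt is needed.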

Page 17: A Survey on Parallel Computing in Heterogeneous Grid Environments

Joining an Existing Network: Leaf Connection

• A new node N creates a leaf connection to an initial node I by directly using the linking protocol
• I acts as a forwarding agent for N

[Figure: new node N attached to initial node I by a leaf connection; the correct position of the new node on the ring is marked]

Page 18: A Survey on Parallel Computing in Heterogeneous Grid Environments

Joining an Existing Network: Send CTM Request

• N sends a CTM request addressed to itself over the P2P network
  – The CTM request contains N’s URI
• The CTM request is received by N’s right and left neighbors, since N is not yet in the ring

[Figure: the CTM request travels from new node N through initial node I to right neighbor R and left neighbor L]

Page 19: A Survey on Parallel Computing in Heterogeneous Grid Environments

Joining an Existing Network: Send CTM Reply

• L and R send CTM replies containing their URIs to I
• I forwards the CTM replies to N

[Figure: CTM replies travel from right neighbor R and left neighbor L to initial node I, which forwards them to new node N]

Page 20: A Survey on Parallel Computing in Heterogeneous Grid Environments

Joining an Existing Network: Linking Protocol

• Start the linking protocol
• L and R send link request messages to N over the physical network

[Figure: right neighbor R and left neighbor L each send a link request directly to new node N; initial node I is not involved]

Page 21: A Survey on Parallel Computing in Heterogeneous Grid Environments

Joining an Existing Network: Complete Joining

• N forms connections with its neighbors and is now in the ring
• N then acquires “far” connections

[Figure: new node N in the ring between left neighbor L and right neighbor R; initial node I elsewhere on the ring]
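The reason a CTM request addressed to N's own address reaches L and R can be shown with a short sketch: when the target address is not in the ring, the nodes nearest to it on either side are exactly its future neighbors. The function name and the small integer addresses are hypothetical; the slides use 160-bit addresses.

```python
import bisect


def ring_neighbors(ring_addresses: list, new_address: int) -> tuple:
    """Return (left, right) neighbors of an address not yet in the ring."""
    ordered = sorted(ring_addresses)
    index = bisect.bisect_left(ordered, new_address)
    left = ordered[index - 1]              # index 0 wraps to the largest address
    right = ordered[index % len(ordered)]  # past the end wraps to the smallest
    return left, right


ring = [10, 40, 70, 100]  # toy addresses of nodes already in the ring
print(ring_neighbors(ring, 55))   # (40, 70)
print(ring_neighbors(ring, 5))    # (100, 10)
print(ring_neighbors(ring, 120))  # (100, 10)
```

Once N has linked to both neighbors it occupies its correct position, and the leaf connection to I is no longer needed for forwarding.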

Page 22: A Survey on Parallel Computing in Heterogeneous Grid Environments

Adaptive Shortcut Creation

• High latencies were observed in experiments due to multi-hop overlay routing

• Shortcut creation
  – Count IPOP packets sent to other nodes
  – When the number of packets within an interval exceeds a threshold, initiate connection setup
  – Because maintaining connections incurs overhead, drop connections that are no longer in use
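The shortcut policy above can be sketched as a per-interval counter. The class name, the threshold value and the rule "no packets in an interval means no longer in use" are assumptions for illustration; the slides do not give the actual parameters.

```python
from collections import Counter


class ShortcutManager:
    """Hypothetical sketch of threshold-based shortcut creation and removal."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts = Counter()  # packets per destination in the current interval
        self.shortcuts = set()   # destinations with a direct connection

    def record_packet(self, destination: str) -> None:
        self.counts[destination] += 1

    def end_interval(self) -> None:
        # Create shortcuts for destinations whose traffic exceeded the threshold.
        for destination, count in self.counts.items():
            if count > self.threshold:
                self.shortcuts.add(destination)
        # Drop shortcuts that carried no traffic in this interval.
        idle = {d for d in self.shortcuts if self.counts[d] == 0}
        self.shortcuts -= idle
        self.counts.clear()


manager = ShortcutManager(threshold=3)
for _ in range(5):
    manager.record_packet("A")
manager.record_packet("C")
manager.end_interval()
after_busy_interval = set(manager.shortcuts)   # {"A"}

manager.record_packet("A")
manager.end_interval()
after_light_interval = set(manager.shortcuts)  # {"A"}: still in use

manager.end_interval()
after_idle_interval = set(manager.shortcuts)   # set(): dropped
```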

Page 23: A Survey on Parallel Computing in Heterogeneous Grid Environments

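NAT

[Figure: Host a (IP: 192.168.0.2, private network), NAT (IP: 133.11.23.100), Host b (IP: 157.82.13.244, global network)]

NAT table: 192.168.0.2:5000 ⇔ 133.11.23.100:6000

• Packet from Host a to Host b
  – Private network: Src: 192.168.0.2:5000, Dst: 157.82.13.244:80
  – Global network: Src: 133.11.23.100:6000, Dst: 157.82.13.244:80
• Packet from Host b to Host a
  – Global network: Src: 157.82.13.244:80, Dst: 133.11.23.100:6000
  – Private network: Src: 157.82.13.244:80, Dst: 192.168.0.2:5000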

Page 24: A Survey on Parallel Computing in Heterogeneous Grid Environments

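NAT Traversal: UDP Hole Punching

[Figure: Host A (IP: A) behind a NAT (IP: N); Host B (IP: B) behind a NAT (IP: M)]

NAT table at N: A:a ⇔ N:n
NAT table at M: M:m ⇔ B:b

• Packet from A toward B
  – Sent by A: Src: A:a, Dst: M:m
  – After translation at N: Src: N:n, Dst: M:m
• Packet from B toward A
  – Sent by B: Src: B:b, Dst: N:n
  – After translation at M: Src: M:m, Dst: N:n
  – Delivered to A after translation at N: Src: M:m, Dst: A:a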
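UDP hole punching can be sketched with a toy NAT model. Everything here is an assumption for illustration: the `Nat` class models a NAT that reuses one public port per private endpoint and only admits inbound packets from remote endpoints the host has already sent to, and the rendezvous endpoint `R` stands in for whatever channel lets each host learn the other's public endpoint.

```python
class Nat:
    """Toy NAT: one public port per private endpoint, inbound filtered by sender."""

    def __init__(self, public_ip: str, first_port: int = 6000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.to_public = {}      # private endpoint -> public endpoint
        self.to_private = {}     # public endpoint -> private endpoint
        self.open_holes = set()  # (public endpoint, remote endpoint) pairs

    def outbound(self, src, dst):
        """Translate an outgoing packet and open a hole for replies from dst."""
        if src not in self.to_public:
            public = (self.public_ip, self.next_port)
            self.next_port += 1
            self.to_public[src] = public
            self.to_private[public] = src
        public = self.to_public[src]
        self.open_holes.add((public, dst))
        return public, dst

    def inbound(self, src, dst):
        """Deliver an incoming packet only through an open hole; else drop it."""
        if (dst, src) not in self.open_holes:
            return None
        return src, self.to_private[dst]


host_a, host_b = ("A", 5000), ("B", 5000)
nat_n, nat_m = Nat("N"), Nat("M")
rendezvous = ("R", 9000)  # hypothetical third party used to learn public endpoints

public_a, _ = nat_n.outbound(host_a, rendezvous)
public_b, _ = nat_m.outbound(host_b, rendezvous)

# A's first packet opens a hole in N but is dropped at M.
first_try = nat_m.inbound(*nat_n.outbound(host_a, public_b))
# B's packet opens a hole in M and passes through the hole A just opened in N.
b_to_a = nat_n.inbound(*nat_m.outbound(host_b, public_a))
# A's next packet now passes through M as well.
a_to_b = nat_m.inbound(*nat_n.outbound(host_a, public_b))
```

After this exchange both NATs hold a mapping and an open hole for the peer, so traffic flows directly in both directions.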

Page 25: A Survey on Parallel Computing in Heterogeneous Grid Environments

Experimental Setup

• Hosts: 2.4 GHz Xeon, Linux 2.4.20, VMware GSX
• Host: 1.3 GHz P-III, Linux 2.4.21, VMPlayer
• Host: 1.7 GHz P4, Win XP SP2, VMPlayer
• Hosts: 2.0 GHz Xeon, Linux 2.4.20, VMware GSX
• 34 compute nodes, 118 P2P router nodes on PlanetLab

Page 26: A Survey on Parallel Computing in Heterogeneous Grid Environments

Experiment 1: Joining and Shortcut Connections

• Node A: IPOP node
• Node B: newly joining node
  – A and B are in different network domains with NAT
  – B sends ICMP packets to A at 1-second intervals
• Within period 1 (about 3 seconds), B establishes a route to other nodes
• Within period 2 (about 28 seconds), B establishes a shortcut connection to A

Page 27: A Survey on Parallel Computing in Heterogeneous Grid Environments

Experiment 2: PVM Parallel Application FastDNAml (1)

• Parallelization with a PVM-based master-workers model
• FastDNAml has a high computation-to-communication ratio
• Dynamic task assignment tolerates performance heterogeneity among computing nodes

[Figure: a master hands out tasks from a task pool to workers]
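Why dynamic task assignment tolerates heterogeneous nodes can be seen in a small deterministic simulation. This is not FastDNAml or PVM code; the function, the task costs and the worker speeds are hypothetical. Each worker takes the next task from the pool as soon as it becomes free, so faster workers naturally end up doing more tasks.

```python
import heapq


def simulate_dynamic_assignment(task_costs: list, worker_speeds: list) -> tuple:
    """Return (tasks completed per worker, total elapsed time)."""
    # Heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(len(worker_speeds))]
    heapq.heapify(free_at)
    completed = [0] * len(worker_speeds)
    makespan = 0.0
    for cost in task_costs:
        now, worker = heapq.heappop(free_at)  # earliest free worker takes the task
        finish = now + cost / worker_speeds[worker]
        completed[worker] += 1
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, worker))
    return completed, makespan


# Twelve equal tasks; worker 1 is twice as fast as worker 0.
completed, makespan = simulate_dynamic_assignment([1.0] * 12, [1.0, 2.0])
print(completed, makespan)  # [4, 8] 4.0
```

With a static split of six tasks each, the slow worker would need 6.0 time units; pulling from a shared pool finishes in 4.0.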

Page 28: A Survey on Parallel Computing in Heterogeneous Grid Environments

Experiment 2: PVM Parallel Application FastDNAml (2)

• Execution with shortcuts enabled is 24% faster than with shortcuts disabled
• The parallel speedup is 13.6x
  – 23x is reported in previous work on a homogeneous cluster

                        Sequential execution   Parallel execution (30 nodes)
                        (Node #2)              Shortcuts disabled   Shortcuts enabled
Execution time (sec)    22272                  2033                 1642
Parallel speedup        n/a                    11.0                 13.6
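The figures in the table are consistent with each other, as a quick check shows. Speedup is taken here as sequential time divided by parallel time, and "24% faster" as the ratio of the two parallel execution times; both readings are assumptions about how the slide's numbers were derived.

```python
sequential = 22272          # seconds, sequential execution
shortcuts_disabled = 2033   # seconds, 30 nodes
shortcuts_enabled = 1642    # seconds, 30 nodes

speedup_disabled = sequential / shortcuts_disabled
speedup_enabled = sequential / shortcuts_enabled
gain_percent = (shortcuts_disabled / shortcuts_enabled - 1) * 100

print(round(speedup_disabled, 1))  # 11.0
print(round(speedup_enabled, 1))   # 13.6
print(round(gain_percent))         # 24
```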

Page 29: A Survey on Parallel Computing in Heterogeneous Grid Environments

Summary

• Introduced WOW
  – Scalable, fault-resilient, low-management infrastructure
• Future work
  – Research on easy-to-use middleware for heterogeneous, adaptive Grid environments