large scale experimental grids

27
Grid'5000 and Grid eXplorer 1 Grid’5000 GdX GdX Large Scale Experimental Grids Grid’5000 Grid eXplorer Grid eXplorer & Franck Cappello INRIA [email protected]

Upload: caine

Post on 15-Jan-2016

44 views

Category:

Documents


0 download

DESCRIPTION

Grid’5000. Large Scale Experimental Grids. &. Grid eXplorer. Franck Cappello INRIA [email protected]. Grid experimental platforms rational. Grid raises a lot of research issues: - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 1

Grid’5000 GdXGdX

Large ScaleExperimental Grids

Grid’5000 Grid eXplorerGrid eXplorer&

Franck CappelloINRIA

[email protected]

Page 2: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 2

Grid’5000 GdXGdX

Grid raises a lot of research issues:Security, Performance, Fault tolerance, Scalability, Load Balancing, Coordination, Message passing, Data storage, Programming, Communication protocols and architecture, Deployment, etc.

Theoretical models and simulators cannot capture reallife conditionsProduction platforms have strong difficulties to reproduce experimental conditions

How to test and compare?• Fault tolerance protocols• Security mechanisms• Deployment tools• etc.

Grid experimental platformsrational

Page 3: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 3

Grid’5000 GdXGdX

log(cost)

log(realism)

math simulation emulation live systems

Models:Sys, apps,Platforms,conditions

Real systemsReal applicationsReal platformsReal conditions

Tools for Distributed System Studies To investigate Distributed System issues, we need:1) Tools (model, simulators, emulators, experi. Platforms)2) Strong interaction between these research tools

Tools for Large Scale Distributed Systems

Real systemsReal applications“In-lab” platformsSynthetic conditions

Key system mecas.Algo, app. kernelsVirtual platformsSynthetic conditions

Page 4: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 4

Grid’5000 GdXGdX

log(cost)

log(realism)

math simulation emulation live systems

SimGridMicroGridBricksNS, etc.Model

Protocol proof

Grid eXplorerWANinLabEmulab

Grid’5000TERAGridPlanetLabNaregi Testbed

We need a Grid experimental platformAccording to the current knowledge:There is no large scale testbed dedicated to Grid experiments

Grid’5000 as a live system Grid eXplorer as a large scale emulator

Page 5: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 5

Grid’5000 GdXGdX

Fundamental components of Grids• Systems

– nodes, OS, – distributed systems mechanisms (resource discovery, storage, scheduling,

etc.), – middleware, runtimes,

– Fault (crash, transient)– Workload (multiple users/multiple applications)– Heterogeneity (resource diversity, performance)– Malicious users/behaviors

• Networks – routers, links, topology,– protocols, – Theoretical features: synchronous, pseudo synchronous or asynchronous

– Disconnection– Packet loss– Congestion

Static

Static

Dyn.

Dyn.

Page 6: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 6

Grid’5000 GdXGdX

1) Remotely controllable Grid nodes installed in geographically distributed laboratories

2) A « Controllable » and « Monitorable » Network between theGrid nodes

3) A middleware infrastructure connecting the nodes (security)

4) A playground to prepare experiments

5) A toolkit to deploy, run, monitor, control experiments and collect results

What do we need for Grid experiments ?

We need these components for A nation wide Experimental Platforms and Emulators

Page 7: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 7

Grid’5000 GdXGdX

-Thierry Priol (ACI Grid Director)-Brigitte Plateau (President of ACI Grid SC) -Dani Vandrome (Director of Renater)-Frédéric Desprez (Lyon)-Michel Daydé (Toulouse)-Yvon Jégou (Rennes)-Stéphane Lantéri (Sophia)-Raymond Namyst (Bordeaux)-Pascale Primet (Lyon)-Olivier Richard (Grenoble)

Steering Committee:(organizer: Franck Cappello, Orsay)

Technical Committee:-David Gueldrech (Sophia)-Jean Claude Barbet (Orsay)-Franck Bonnassieux (UREC)-Julien le duc (Grenoble)-Fred Desprez (Lyon)-Yvon Jégou (Rennes)-Olivier Coulaud (Bordeaux)-Frédéric Barbaresco (Toulouse)

Forums:Deployment/exploitation: Franck Cappello (AS1, RTP8)Programming models: Raymond Namyst (AS2, RTP8)

Grid experiments under real life conditionsGrid’5000

Page 8: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 8

Grid’5000 GdXGdX

1) Building a nation wide experimental platform for Grid researches (like a particle accelerator for the computerscientists)• 10/11 geographically distributed sites• every site hosts a cluster (from 256 CPUs to 1K CPUs)• All sites are connected by RENATER (French Res. and Edu. Net.)• RENATER hosts probes to trace network load conditions• Design and develop a system/middleware environment

for safely test and repeat experiments

2) Use the platform for Grid experiments in real life conditions• Address critical issues of Grid system/middleware:

• Programming, Scalability, Fault Tolerance, Scheduling• Address critical issues of Grid Networking

• High performance transport protocols, Qos• Port and test applications• Investigate original mechanisms

• P2P resources discovery, Desktop Grids

The Grid’5000 Project

Page 9: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 9

Grid’5000 GdXGdX

Lab’sNetwork

LAB/FirewallRouter

Test Cluster

ControlMaster

Site 1

Site 2

Site 3

Users(ssh loggin

+ password)

Firewall/nat

ControlSlave

Test Cluster

Front end

ControlSlave

Control site

Grid’5000 Big Picture

Gateway+VPN (192. For all nodes)

One machineCan be seen as a Virtual Grid Gateway

Page 10: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 10

Grid’5000 GdXGdXGrid’5000 Schedule

Grid’5000 Hardware

Call for proposals

Sept03

Selection of 7 sites

Nov03

ACI GRIDFunding

Jan04

Call for ExpressionOf Interest

March04

Vendorselection

Jun/July 04

Instal.First tests

Spt 04

Final review

Oct 04

FisrtDemo(SC04)

Nov 04

Grid’5000 System/middleware Forum

Security Prototypes (not full size)

Control Prototypes (not full size)

Grid’5000 Programming Forum

Grid’5000 Builder Community

Grid’5000 Experiments

Page 11: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 11

Grid’5000 GdXGdXGrid’5000 in November’2004

Grid 5000 nodes

(soon 4)3

Grid eXplorerPau

Page 12: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 12

Grid’5000 GdXGdX

Applications

Algorithms

Runtime

Middleware

Operating System

Network protocols

Grid’5000 Experiments

Page 13: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 13

Grid’5000 GdXGdXSummary of Grid5000 experiments

of Grid’5000 members• Networking

– End Host Communication layer– High performance long distance protocols– High Speed Network Emulation

• Middleware / OS– Grid’5000 control/access– Grid’5000 experiment automation– Scheduling / data distribution in Grid– Fault tolerance in Grid– Resource management– Grid SSI OS and Grid I/O– Desktop Grid/P2P systems

• Programming– Component programming for the Grid (Java, Corba)– GRID-RPC– GRID-MPI– Code Coupling

• Applications– Multi-parametric applications (Climate modeling/Functional Genomic)– Large scale experimentation of distributed applications (Electromagnetism, multi-material

fluid mechanics, parallel optimization algorithms, CFD, astrophysics – Medical images, Collaborating tools in virtual 3D environment

Page 14: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 14

Grid’5000 GdXGdXMiddleware1(XP)Grid5000

• Grid’5000 control- Computing Environment deployment (Ka-tools)- Experiment automation (security and control)- VGrid « mapping a virtual Grid on a real testbed »- Monitoring, benchmarking, performance characterization and analysis

• Scheduling / distribution- Scheduling : Data transfers, global communications, work stealing,...- Data re-distribution in Grid- Task distribution and load balancing in heterogeneous Grid- Mixed Parallelism (task and data parallelism) - Mixing data management and task scheduling- Hierarchical and Distributed Scheduling

• Fault tolerance- Fault tolerant Grid-RPC (RPC-V)- Hierarchical Fault tolerant MPI (MPICH-V)- Fault tolerant in data-flow approach (Athapascan)

XP: eXPeriments on

Page 15: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 15

Grid’5000 GdXGdXMiddleware2(XP)Grid5000• Management

- AROMA tool : resources management over a Grid of clusters with different classes of services

- Mobile agents for open Grid management- Management of Grids and hosted services (security, QoS, monitoring & control,

dynamic configuration, …)- Optimization for wide area distributed query processing- Virtualization of data storage on Grids- Automatic Deployment of GridRPC middle tier.- Multiclusters and lightweights Grid resource management (OAR/CIGRI)

• Global Computing/P2P Middleware- Executing Web Services on Desktop Grid Workers (XtremWeb)- Distributing the Coordination in Desktop Grids (XtremWeb)- Harnessing Clusters as parallel Workers - Probabilistic certification in peer-to-peer systems- Large Scale Data Sharing Service based on JXTA (JuxMem)- Experimenting management services for textual document in P2P systems

• Grid SSI OS and Grid I/O- Grid file system (NFSG)- Grid-aware OS (Kerrighed)- Coupling Computational Grid with Reality Center

Page 16: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 16

Grid’5000 GdXGdX

• End Host Communication layer

- Intelligent Usage of NICs for local and wide area communications

- Direct file access over Myrinet : ORFA/NFS and ORFA/LUSTRE

• High performance long distance protocols

- Alternative Transport for very high speed networks (backpressure)

- Differentiated transport with delay control on WAN

- Reliable active and non active Multicast

- Network Bandwidth optimization in Grid (VTHD++, Paco++).

• High Speed Network Emulation

- Automatic Deployment of emulated high speed domains

- Experiment design for grid flow interactions studies

• Grid Networking Layer

- Network Resource and QoS on demand

- Grid Overlay and Programmable Routers

- Measurement Services for network aware middleware

Network(XP)Grid5000

Page 17: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 17

Grid’5000 GdXGdX

• Component programming on the grid - ProActive : a JAVA library (parallel, distributed, concurrent

computing with security and mobility)- Assessment of scalability, deployment, security and fault tolerance issues- Hierarchical components architecture- PadicoTM/Paco++ combining parallel and distributed computing

• RPC Environment- Large scale experimentation of the DIET platform (Distributed Interactive Engineering Toolbox)- Client/Agent/Server model following the GridRPC standard with distributed scheduling agents

• MPI Environment- Time sharing Grid resources- Migration over Clusters with heterogeneous high speed networks

• Code Coupling- Application coupling with Athapascan - Communication / method invocation rescheduling into ORB (HOMA)- Fluid transfer simulation and geological code with PadicoTM/Paco++

Programming(XP)Grid5000

Page 18: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 18

Grid’5000 GdXGdX

Applications(XP)Grid5000• Multi-parametric applications

- ACI GRID-TLSE Project : expertise site for sparse linear algebra- Climate modeling and Global Change- DataGène Project : Functional genomic

• Large scale experimentation of distributed message passing applications – JECS: a JAVA Environment for Computational Steering

• Distributed computing and interactive visualization of 3D numerical simulations (Caiman and Oasis project-teams)

• Collaborative environment • Computational Electromagnetism application (JEM3D)

– MECAGRID (ACI GRID project, Smash project-team)• Massively parallel computations in multi-material fluid mechanics• Study of numerical algorithms for heterogeneous computing platforms

– Grid computing for medical applications (Epidaure project-team)• Interoperable medical image registration grid service

– Optimal design of complex systems (Coprin project-team)• Evaluation of parallel optimization algorithms based on interval analysis

techniques• Study of load balancing strategies on heterogeneous resources

+ CFD, astrophysics,… applications+ Collaborating tools in virtual 3D environment.

Page 19: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 19

Grid’5000 GdXGdX

Other experiments on Grid’5000

• Grid’5000 will be opened to other French Grid researchers (certainly through a selection procedure)

• Grid’5000 will be connected to the EU CoreGrid testbed and may be used as an experimental platform for CoreGrid researchers (still through a kind of selection procedure)

Page 20: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 20

Grid’5000 GdXGdX

log(cost)

log(realism)

math simulation emulation live systems

SimGridMicroGridBricksNS, etc.Model

Protocol proof

Grid eXplorer

Grid’5000

Grid’5000 + Grid eXplorerCombining two Grid research instruments

Relax RealLife Conditions

Relax ConditionsReproducibility

Page 21: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 21

Grid’5000 GdXGdXGrid experiments undersynthetic reproducible conditions

1) Build the instrument:- 1K CPU cluster (may be only 500 depending on the budget) - configurable network (Ethernet, Myrinet, others?) - configurable OS (kernel, distribution, etc.)- A set of emulation/simulation tools (existing + new ones)- Multi-users- Located/managed by IDRIS

2) Study impact of Scale in Grid/P2P systems 1) Address critical issues of Grid system/middleware:

• Programming, Scalability, Fault Tolerance, Scheduling• Address critical issues of Grid Networking

• High performance transport protocols, Qos• Port and test applications• Investigate original mechanisms

• P2P resources discovery, Desktop Grids

Grid'5000 and Grid eXplorer

GridGrideXplorereXplorer

Page 22: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 22

Grid’5000 GdXGdXGrid eXplorerBig picture

An experimentalconditions data base

Emulator CoreHardware + Soft:

Emulation &Simulation

A set of toolsfor analysis

A set of sensorsin Grid’5000

Validation onGrid’5000

Emulab cluster

Page 23: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 23

Grid’5000 GdXGdX

IMAG, ID (UMR 5132), Laboratoire d’Informatique et Distribution, Université de Grenoble

LaRIA (UPRES EA 2083), Laboratoire de Recherche en Informatique d’Amiens, Université de Picardie Jules Verne

LRI (UMR 8623), Laboratoire de Recherche en Informatique, Université de Paris-sud

LAAS-CNRS (UPR 8001), Laboratoire d'Analyse et d'Architecture des Systèmes

LORIA (UMR 7503), Laboratoire lorrain de recherche en informatique et ses applications

LIP-ENS Lyon (URM 5668), Laboratoire de l'Informatique du Parallélisme

LIFL (ESA 8022), Laboratoire d’Informatique Fondamentale de Lille

INRIA Sophia Antipolis, UNSA, I3S-CNRS

LIP6 (UMR 7606), Laboratoire d'Informatique de Paris 6

LABRI (UMR 5800), Laboratoire Bordelais de Recherche en Informatique

IBCP (UMR5086), Institut de Biologie et Chimie des Protéines

CEA, Direction des Technologies de l'Information (Saclay)

IRISA, Institut de Recherche en Informatique et Systèmes Aléatoires

Laboratories involved in GdX 13 Labs

Page 24: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 24

Grid’5000 GdXGdXExperiences Infrastructure Emulation Network ApplicationI.1 Platform X X X XI.2 Virtual Grid X X

I.3 Virt. Techniques X X

I.4 Emul driven Simul X

I.5 Network emul. X X X

I.6 Heterogeneity emul X

I.7 Communication X

I.8 Internet Emul. X X X

II.1 Engineering tech. X X X

II.2 Mobile objects X X

II.3 Fault tolerance X X

II.4 DHT X

II.5 Data base X X

II.6 Scheduling X X

II.7 Comm. Optimizat. X

II.8 Data sharing X X

II.9 Uni and multicast X X

II.10 Cellul. automaton X X

II.11 Bioinformatique X

II.12 P2P storage X X

II.13 Reliability X X X

II.14 Security X X X

II.15 NG. Internet X X X

II.16 Grid coupled sys. X

Page 25: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 25

Grid’5000 GdXGdXGrid eXplorer and Grid’5000interactions

DesignTest/Check

Validation underReal Life Conditions

Grid’5000

DesignTest/Check Validation of

Scalability and underSynthetic Conditions

Grid eXplorer

Integration to standard middleware, Deployment,

Performance

Scalability, Fault tolerance

Page 26: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 26

Grid’5000 GdXGdX

Summary

• Researches in Grid and P2P need large scale platforms– To study protocols, systems, middleware, programming models

and applications in real life OR reproducible experimental conditions

• Grid’5000 and Grid eXplorer- Will be experimental platforms for Grid researchers

(like particle accelerator for physicists)- A nation wide platform and a large scale emulator- Strong relations between this two projects (researchers are the same

persons for the two projects!)- Hardware should be installed by November 2004- Prototypes (security and control) should work for November 2004- Will be opened for experiments in early 2005

Page 27: Large Scale Experimental Grids

Grid'5000 and Grid eXplorer 27

Grid’5000 GdXGdX

Q&A