TRANSCRIPT
GRID@INRIA II – 17-18/07/02
ACI GRID ASP Client-Server Approach
for Simulation over the GRID
Frédéric Desprez – LIP ENS Lyon
ReMaP Project
Outline
• Grid RPC and ASP concepts
• ACI Grid ASP
• Target applications
• DIET
  – History
  – An ASP platform
INTRODUCTION
• Future of parallel computing: distributed and heterogeneous
• Metacomputing/Grid computing = using distributed sets of heterogeneous platforms
• Network computing today!
  – SMP clusters with very fast processors, high-performance (and low-cost) networks, (almost) mature software
• (Too) many projects
• Target: many applications in many different fields (not only number crunching or embarrassingly parallel ones)
• Some important problems:
  – algorithmic (data distribution, load balancing, latency-tolerant algorithms, ...)
  – system (administration, fault tolerance, security, resource localization, ...)
  – software (interoperability, code reuse, ...)
• Global Grid Forum
INTRODUCTION, cont.
One long-term idea for Grid computing: renting computational power and memory capacity over the net. Very high potential!
• Need for PSEs (Problem Solving Environments) and ASPs (Application Service Providers)
• Applications will always need more computational power and memory capacity
• Some libraries or codes need to stay where they were developed
• Some confidential data must not travel over the net
• Use of computational servers reachable through a simple interface
But:
• Still difficult to use for non-specialists
• Almost no transparency
• Security and fault-tolerance problems are generally not addressed well enough
• Often application-dependent PSEs
• No standards (CORBA, Java/Jini, sockets, ...) for building the computational servers
Outline
• Grid RPC and ASP concepts
• ACI Grid ASP
• Target applications
• DIET
  – History
  – An ASP platform
RPC and Grid Computing: GridRPC
• One simple idea
  – Implement the (old!) RPC programming model over the Grid
  – Use computational resources available over the net
  – For applications that have huge computational and/or data-storage needs
  – Task-parallel programming model (synchronous and asynchronous calls) plus data parallelism on the servers themselves: mixed parallelism (see the client sketch below)
• Features needed
  – Load balancing (resource localization and performance evaluation, scheduling)
  – Simple interface
  – Data distribution and migration
  – Security
  – Fault tolerance
  – Interoperability with other systems, ...
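To make the programming model concrete, here is a minimal client sketch in C in the spirit of the GridRPC client API proposed at the Global Grid Forum. The function names follow that proposal, but the header name, the configuration file, the problem name "matmul", and the argument list are illustrative assumptions, not a specific implementation:

```c
#include "grpc.h"   /* assumed header name for the proposed GridRPC client API */

#define N 128

int main(void)
{
    grpc_function_handle_t h;
    grpc_sessionid_t id;
    static double A[N * N], B[N * N], C[N * N];   /* initialization omitted */

    grpc_initialize("client.conf");               /* illustrative config file */
    grpc_function_handle_default(&h, "matmul");   /* "matmul" is a made-up problem name */

    /* Synchronous call: blocks until the result C is back on the client. */
    grpc_call(&h, N, A, B, C);

    /* Asynchronous call: returns immediately, so the client can overlap
     * its own work with the remote computation. */
    grpc_call_async(&h, &id, N, A, B, C);
    /* ... other client work here ... */
    grpc_wait(id);                                /* block until this session completes */

    grpc_function_handle_destruct(&h);
    grpc_finalize();
    return 0;
}
```

Note that the agent, not the client, decides which server actually executes the request; the handle only names the problem.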
RPC and Grid Computing: GridRPC, cont.
Five fundamental components:
• Client
  Provides several user interfaces and submits requests to servers
• Server
  Receives client requests and executes the software modules on their behalf
• Database
  Stores the static and dynamic data about the software and hardware resources
• Scheduler
  Catches the client requests and decides how to map the tasks onto the servers, based on the data stored in the database
• Monitor
  Dynamically monitors the status of the computational resources and stores the obtained information in the database
ASP Scheme
[Diagram: the client sends the request Op(C, A, B) to the agent(s); the agent answers "S2!", the server it has chosen among S1...S4; the client then ships A and B to S2 and receives the answer C.]
RPC and Grid Computing: GridRPC, cont.
• Middleware between portals and Grid components
• Basic tools for the deployment of large-scale environments (Web portals, Problem Solving Environments, Grid toolkits, ...)
• Big success with several applications
• Discussed in the Advanced Programming Models (APM) working group of the Global Grid Forum
• A GridRPC client API has been proposed
[Illustration: SCIRun torso defibrillator application – Chris Johnson, U. of Utah]
RPC and Grid Computing: GridRPC, related problems
• Security
  – Authentication and authorization
  – Data transfers
• Fault tolerance
  – Servers or agents
• Interoperability
  – Problem description
  – API
• Data management
  – Data persistence
  – Data (re)distribution
  – Garbage collection
• Checkpointing
  – Fast parallel I/O
• Scalability
  – Hierarchy of servers/agents
• User assistance/PSE
  – Automatic choice of solutions
• Resource localization
  – Hardware and software
• Scheduling
  – On-line or off-line scheduling
• Sharing servers between users
  – Security problems
  – Lock/unlock, data consistency, race conditions
• Performance evaluation
  – Heterogeneity
  – Batch systems
• Data visualization
  – Scalability problems
• Dynamic platform
  – Resource localization
  – Agents/servers mapping
RPC and Grid Computing: GridRPC, cont.
Some available tools:
• NetSolve (University of Tennessee, USA)
• Ninf and OmniRPC (Japan)
• DIET (ReMaP, ARES, LIFC, Résédas)
  – Based on CORBA
• NEOS, Meta-NEOS (Argonne National Lab., USA)
  – Combinatorial optimization problems
• RCS (ETH Zürich)
  – ScaLAPACK servers
• NIMROD, NIMROD-G (Monash University, Australia)
Outline
• Grid RPC and ASP concepts
• ACI Grid ASP
• Target applications
• DIET
  – History
  – An ASP platform
Project Overview
• Multi-disciplinary project
• Rent computational power and memory capacity over the net
• Four applications with different needs and different behaviors
• Develop a toolbox for the deployment of application servers
• Study the impact of these applications on our environment and adapt it to these new needs
• A highly hierarchical and heterogeneous network (VTHD plus the networks of the labs involved in the project)
• A software architecture developed in an RNTL project (GASP)
Experimentation Platform: VTHD
• High-speed network (2.5 Gb/s) between INRIA research centers and several other research institutes
• Connects several PC clusters, SGI O2Ks, and virtual-reality caves
• An ideal test platform for our developments
• RNRT project
• Several Grid computing projects:
  – Parallel CORBA objects
  – Grid computing environments and multi-protocol communication layers
  – Computational servers
  – Code coupling
  – Virtual reality, ...
ASP Partners
• ReMaP – LIP ENS Lyon: F. Desprez, E. Caron, P. Combes, M. Quinson, F. Suter, Ing. X, Y
• ARES – INSA Lyon: E. Fleury
• Résédas – LORIA: Y. Caniou, E. Jeannot
• SDRP – LIFC: J.-M. Nicod, L. Philippe, S. Contassot, F. Lombard
• Physique Lyon 1, Physique ENS Lyon, MAPLI: J.-L. Barrat, V. Volpert
• LST – ENS Lyon: G. Vidal
• SRSMC Nancy: G. Monard
• IRCOM: R. Quéré, R. Sommet
Outline
• Grid RPC and ASP concepts
• ACI Grid ASP
• Target applications
• DIET
  – History
  – An ASP platform
Target Applications
• Researchers from four different fields (chemistry, physics, electronics, geology)
• Four applications with different needs and different behaviors:
  – Digital Elevation Models
  – Molecular Dynamics
  – HSEP
  – Microwave circuit simulation
Applications in ASP Mode
• Study the target applications
• Validate the parallel versions on the servers
• Develop the client- and server-side "glue" and adapt DIET
• Validate the prototype with non-specialist users
• Adapt DIET if necessary
Digital Elevation Models (MNT)
• Stereoscopic processing:
  – Maximal matching between the spots of both pictures
  – Elevation computation
• Inputs: view-angle information and coordinates of initial corresponding points, geometrical constraints, optical disparities
• Output: MNT binary files
LST
Digital Elevation Models (MNT), cont.
[Diagram: the geologist's client sends requests to the DIET agent(s), which dispatch them to a maps server (S1) and an MNT server (S2).]
LST
Digital Elevation Models (MNT), cont.
• Specific needs:
  – Large amounts of memory
  – Large amounts of data
  – Visualization
• ASP approach: computational power for
  – Processing high-definition pictures (e.g., pictures from the SPOT satellite, resolution below 5 m)
  – Reducing processing time (e.g., earthquake studies)
LST
Molecular Dynamics
Physique Lyon 1, Physique ENS Lyon, MAPLI
• Simulation of atomic trajectories from molecular interactions
  – Hydrodynamics (velocity fields, temperature, etc.)
  – Mechanical properties of solids at the micro scale
• Short-range interactions:
  – Partitioning, hence good parallelism
• Logs dumped to disk and exploited post-mortem
• Private and public codes
• Differential equation solving (see the integration sketch below):
d²x_k/dt² = F_k({x_j}),   k = 1 ... 10^6
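As an illustration of the differential-equation-solving step, here is a velocity Verlet step in C, a classic explicit integrator for equations of this form. The force routine, masses, and units are placeholders; this is a sketch of the technique, not the project's code:

```c
#include <stddef.h>

/* One velocity Verlet step for n particles in 1D:
 * integrates d2x_k/dt2 = F_k({x_j}) / m with time step dt. */
void verlet_step(size_t n, double *x, double *v, double *f,
                 double m, double dt,
                 void (*force)(size_t n, const double *x, double *f))
{
    for (size_t k = 0; k < n; k++) {
        v[k] += 0.5 * dt * f[k] / m;   /* half kick with the old forces */
        x[k] += dt * v[k];             /* drift */
    }
    force(n, x, f);                    /* recompute forces at the new positions */
    for (size_t k = 0; k < n; k++)
        v[k] += 0.5 * dt * f[k] / m;   /* half kick with the new forces */
}
```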
Molecular Dynamics, cont.
Physique Lyon 1, Physique ENS Lyon, MAPLI
[Diagram: the physicist's client goes through the DIET agent(s) to reach two application servers (S1, S2).]
Molecular Dynamics, cont.
Physique Lyon 1, Physique ENS Lyon, MAPLI
• Specific needs:
  – High accuracy
  – Large systems
  – Disk logs
• ASP approach:
  – Computational power
  – Checkpointing mechanisms on the grid
Potential Energy HyperSurface (HSEP)
• Distributed computation of various points on a surface (quantum chemistry)
• Existing software: Gaussian (PSMN), QC++ (free code)
[Figure: computed points (marked X) on the surface as a function of the molecular configuration.]
SRSMC
HSEP, cont.
[Diagram: the chemist's client and the database of computed points interact with the DIET agent(s), which dispatch requests to a QC++ server (S1) and a Gaussian server (S2).]
SRSMC
HSEP, cont.
• Specific needs:
  – A relational DB (MySQL) storing all computations done and to be done
  – A Web interface (HTTP + PHP) linking the client to the RDB and to DIET
  – Result filtering through Python scripts
  – Complexity: O(N^4)
• ASP approach:
  – DB as a DIET client
  – Security
  – Coarse-grain parallelism (see the sketch below)
SRSMC
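Each point of the hypersurface is an independent computation, so the coarse-grain parallelism maps naturally onto asynchronous calls. A hedged sketch, reusing the GridRPC-style API from above (the problem name "hsep_point" and the data layout are invented):

```c
#include "grpc.h"   /* assumed GridRPC-style client API, as above */

#define NPOINTS 64

/* Farm out NPOINTS independent energy evaluations, one async call each.
 * Assumes grpc_initialize() has already been called by the client. */
void compute_surface(double conf[NPOINTS][3], double energy[NPOINTS])
{
    grpc_function_handle_t h;
    grpc_sessionid_t ids[NPOINTS];

    grpc_function_handle_default(&h, "hsep_point");  /* invented problem name */
    for (int i = 0; i < NPOINTS; i++)
        grpc_call_async(&h, &ids[i], conf[i], &energy[i]);
    for (int i = 0; i < NPOINTS; i++)
        grpc_wait(ids[i]);              /* results may complete in any order */
    grpc_function_handle_destruct(&h);
}
```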
Microwave Circuits Simulation
• Direct coupling between the transport equations of heterojunction bipolar transistors (HBT) and a circuit simulator, for coupled microwave circuit/component design
• Coupling between:
  – A physical simulator of the HBT
  – A circuit simulator
  – A reduced thermal model derived from a 3D finite-element simulation
  – Together, an integrated simulator
• An analysis tool, predictive and "process" oriented (co-design of the circuit and the transistor devices for a given application: amplifier, mixer, ...)
IRCOM
Microwave Circuits Simulation, cont.
[Diagram: the client contacts the DIET agent(s), which route requests to a simulation server (S1) and a sparse-solver server (S2).]
IRCOM
Microwave Circuits Simulation, cont.
• Large systems to solve
  – Clients need fast and efficient sparse solvers
• Simulator source code may be confidential
  – Dedicated servers for physical simulation, reachable through DIET, provide their part of the Jacobian matrix needed to build the large system to solve (a sketch of this coupling follows)
IRCOM
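A hedged sketch of what the coupling loop could look like on the client side: each Newton iteration obtains the device's Jacobian contribution and residual from the confidential physical simulator (left on its own server) and ships the assembled system to a sparse-solver server. Both remote operations are placeholders, not an actual DIET API:

```c
/* Placeholders for remote DIET requests (invented names): */
void device_jacobian(int n, const double *x, double *J, double *F);    /* physical-simulation server */
void sparse_solve(int n, const double *J, const double *F, double *dx); /* sparse-solver server */
double residual_norm(int n, const double *F);

/* Newton iteration coupling the circuit unknowns x with remote servers. */
void newton_coupling(int n, double *x, double *J, double *F, double *dx,
                     double tol, int max_it)
{
    for (int it = 0; it < max_it; it++) {
        device_jacobian(n, x, J, F);   /* remote: Jacobian part + residual F(x) */
        if (residual_norm(n, F) < tol)
            break;                     /* converged */
        sparse_solve(n, J, F, dx);     /* remote: solve J * dx = -F */
        for (int i = 0; i < n; i++)
            x[i] += dx[i];             /* Newton update */
    }
}
```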
Metacompil (CRI, École des Mines, Fontainebleau)
[Diagram: the client sends a problem and its source code to the DIET agent; a compilation server (S1) returns parallelized source code, which an application server (S2) then executes.]
Outline
• Grid RPC and ASP concepts
• ACI Grid ASP
• Target applications
• DIET
  – History
  – An ASP platform
Where do we start from?
• 1998-2000: ARC INRIA OURAGAN, tools for the resolution of large numerical problems
  – Parallelization of Scilab (PVM, MPI, PBLAS, BLACS, ScaLAPACK, PaStiX, NetSolve)
  – Use of Scilab as a front-end to computational servers (parallel or sequential)
  – NetSolve optimization (data persistence, development of an environment for the evaluation of communication and computational performance)
  – Partners: ReMaP, Métalau, Résédas, LIFC, LaBRI
Our first view of computational servers
• Ideas
  – Scilab as a first target application
  – Simplify the use of new libraries (sparse-system libraries)
  – Benefit from the development of software components around Grid computing
  – Develop a toolkit for the deployment of computational servers
• First prototype developed from existing software modules
  – NetSolve (University of Tennessee, Knoxville)
  – NWS (UCSD and UTK) for the dynamic evaluation of performance
  – Our own library developments (data-redistribution routines, sparse solvers, out-of-core routines)
  – An LDAP software database and CORBA for server management
Our first goals
• Add some features to NetSolve for our applications
  – Data persistence on servers (illustrated below)
  – Data redistribution and parallelism between servers
  – Better evaluation of [routine, machine] pairs for fine-grain computation
  – A portable database of available libraries (LDAP-based)
• Get an experimentation platform for our developments
  – Mixed parallelism (data- and task-parallelism)
  – Scheduling heuristics for data-parallel tasks
  – Parallel algorithms for heterogeneous platforms
  – Performance evaluation
  – Server management using CORBA
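To illustrate why data persistence matters: in a chained computation such as C = A x B followed by D = C + E, a plain RPC model sends the intermediate C back to the client and out again. A handle-based sketch of the idea (every name below is invented for illustration):

```c
/* Hypothetical handle type naming a piece of data left on a server. */
typedef int data_handle_t;

/* Invented calls: "keep" leaves the result remote, "fetch" brings it back. */
data_handle_t server_call_keep(const char *problem, const double *x, const double *y);
void server_call_fetch(const char *problem, data_handle_t h,
                       const double *y, double *out);

void chained_calls(const double *A, const double *B, const double *E, double *D)
{
    data_handle_t hC = server_call_keep("matmul", A, B); /* C = A*B stays remote */
    server_call_fetch("matadd", hC, E, D);               /* D = C+E; only D travels */
}
```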
NetSolve over VTHD
[Diagram: NetSolve clients, the agent, and the servers deployed across the VTHD network.]
NetSolve Behavior
Intensive use:
• VTHD network
• Clients: Rennes cluster (paraski)
• Scheduler: NetSolve agent (Rocquencourt)
• Server: paraski26 (paraski)
DIET (Distributed Interactive Engineering Toolbox)
[Diagram: clients written in C, Fortran, or Java contact a hierarchy of agents, each embedding a scheduler and backed by distributed software and performance databases; the agents route requests to servers (S1 with a local scheduler, S2, and S3 behind a batch system) and to a visualization server; the client then uses a direct connection to the chosen server.]
DIET Goals
http://www.ens-lyon.fr/~desprez/DIET/
• Our goals:
  – Develop a toolbox for the deployment of ASP environments with different applications
  – Use standard (and public-domain) software as much as possible
  – Obtain a high-performance and scalable environment
  – Implement our more theoretical results in this environment (scheduling, data (re)distribution, performance evaluation, algorithms for heterogeneous platforms)
  – Use CORBA, NWS, LDAP, and our own software components (SLiM and FAST)
• Different applications (simulation, compilation, ...)
• ReMaP, ARES, Résédas, LIFC, Sun Labs (RNTL GASP)
Hierarchical Architecture
[Diagram: a tree of Master Agents (MA) and Local Agents (LA) with computational-server front-ends as leaves; the client keeps a direct connection to the chosen server.]
• Hierarchical architecture for scalability (see the scheduling sketch below)
• Information distributed over the entire tree
• Plug-in schedulers
• Data persistence
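A hedged sketch of how a request can be scheduled down such a tree: each agent asks its children for their best candidate and keeps the server with the smallest predicted time, so the scheduling information stays distributed. Types and names are invented for illustration, not DIET's internals:

```c
#include <float.h>
#include <stddef.h>

/* Illustrative node: either an agent with children, or a server leaf. */
typedef struct node {
    int is_server;
    double (*predict)(const struct node *n);  /* predicted completion time */
    struct node **children;
    int nchildren;
} node_t;

/* Return the server with the smallest predicted time in this subtree. */
const node_t *best_server(const node_t *n, double *best_time)
{
    if (n->is_server) {
        *best_time = n->predict(n);
        return n;
    }
    const node_t *best = NULL;
    *best_time = DBL_MAX;
    for (int i = 0; i < n->nchildren; i++) {
        double t;
        const node_t *s = best_server(n->children[i], &t);
        if (s && t < *best_time) { *best_time = t; best = s; }
    }
    return best;
}
```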
Evaluation of DIET’s Server Invocation
DIET Agent(s)
• Distributed set of agents for improved scalability
• Study of several connection schemes between agents (hierarchical, distributed, duplicated agents, ...) and of agent mapping
• Tree-based scheduling algorithms, with information distributed in each node of the hierarchy
• Connection to FAST to gather information about resources, and to SLiM to find the available applications
• Different generic and application-dependent schedulers
• CORBA, JXTA
[Diagram: alternative topologies connecting clients (C), agents (A), and servers (S): a single hierarchy, a distributed graph of cooperating agents, and duplicated agents.]
Performance Evaluation
• Performance evaluation of the GridRPC platform
• Finding one (or several) efficient server(s): computational cost of the requested function, server load, communication costs between the client and the server, memory capacity, ...
  – Hence a performance database for the scheduler
• Hard to accurately model (and understand) networks like the Internet or VTHD
• Need for a small response time
• Need to model applications (difficult when the execution time depends on the input data)
• Accounting
FAST: Fast Agent's System Timer
• NWS-based (Network Weather Service from UCSB)
• Computational performance:
  – Load, memory capacity, and performance of batch queues (dynamic)
  – Benchmarks and models of the available libraries (static)
• Communication performance (an estimate of this kind is sketched below):
  – Predict the data-redistribution cost between two servers (or between client and server) as a function of the network architecture and of dynamic information
  – Bandwidth and latency (hierarchical)
• Hierarchical set of agents
  – Scalability problems
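A hedged sketch of the kind of estimate such a tool produces, combining a static cost model of a routine with dynamic load and network measurements; the formula and parameter names are illustrative, not FAST's actual internals:

```c
/* Estimate the total time of a remote call: send inputs, compute,
 * receive outputs. Purely illustrative. */
double estimate_time(double flops,          /* static cost model of the routine */
                     double peak_flops,     /* benchmarked machine speed (flop/s) */
                     double cpu_avail,      /* dynamic CPU availability in (0,1] */
                     double bytes_in, double bytes_out,
                     double latency, double bandwidth)
{
    double t_comp = flops / (peak_flops * cpu_avail);
    double t_comm = 2.0 * latency + (bytes_in + bytes_out) / bandwidth;
    return t_comp + t_comm;
}
```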
Availability of the System: NWS
Network Weather Service (Wolski, UCSB)
• Measures the availability of resources
  – CPU load, bandwidth, etc.
• Forecasts the variations with statistics (see the sketch below)
• Extensible and open
• Used by many projects (Globus, NetSolve, Ninf, etc.)
[Diagram: sensors run tests and store the measurements; a name server locates the components; client requests are answered by a forecaster that turns the stored data into predictions.]
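NWS maintains a family of simple statistical predictors over each measurement series and keeps whichever has recently been most accurate. As a flavor of the idea only, here is a sliding-window mean predictor (a deliberately simplified stand-in, not NWS code):

```c
/* Predict the next measurement as the mean of the last w samples. */
double sliding_mean_forecast(const double *history, int n, int w)
{
    if (w > n) w = n;            /* shrink the window if history is short */
    double sum = 0.0;
    for (int i = n - w; i < n; i++)
        sum += history[i];
    return sum / w;
}
```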
Overall Architecture
[Diagram: at installation time, a benchmarker models the needs of each routine (structural approach); at run time, a library combines these models with the system availabilities reported by NWS, stored in LDAP, to answer queries from the client application.]
Time Modeling of DGEMM
[Plot: comparison of estimated and measured times as a function of matrix size (128 to 1152) on the machines Pixies and Kwad; the model curves closely follow the measurements. Mean error: 1%.]
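For a routine like DGEMM, the static part of such a model is typically a low-degree polynomial in the problem size, for instance t(n) ≈ α·n^3 + β·n^2 + γ·n + δ, the cubic term reflecting the roughly 2n^3 floating-point operations of an n×n matrix product; the coefficients are fitted from installation-time benchmarks on each machine. This form illustrates the approach, not the exact model fitted here.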
Performance Forecasting: Complex Matrix Multiplication
[Plot: comparison of expected and measured times as a function of matrix size (128 to 1024); worst-case deviation 23%. Mean error: 15%.]
NWS Optimization: Response Time
[Bar chart: a direct NWS interrogation takes 0.099569 s; an interrogation through FAST with a cache miss takes 0.100685 s; with a cache hit, only 24 µs.]
NWS Optimization: Collaboration with the Scheduler
[Plot: processor availability (%) over time, comparing the values without collaboration, with collaboration, and the theoretical value over an idle period, the execution of a task, and a second idle period.]
Scheduling
• Shortest Execution Time first
• Other algorithms possible (economic models, deadline scheduling, classical on-line scheduling problems)
• Request sequencing
• A single (or distributed) agent for Ninf and NetSolve
• A hierarchy of agents for DIET (local scheduling)
• Model the cost of the scheduling itself
Security
• Mandatory!
  – Securing the data transfers
  – Authentication of clients to servers
  – Authentication of servers to clients
  – Sharing of servers between several clients (application coupling)
  – Delegation of authority
• NetSolve
  – Use of Kerberos V5
  – Generation of access lists
• Ninf, DIET
  – Authentication based on SSL (GSI-like)
  – NAA (NES Authentication Authorization module)
Short-Term Work
• Resource localization
  – Hardware and software
• Scheduling
  – On-line or off-line scheduling
• Sharing servers between users
  – Security problems
  – Lock/unlock, data consistency, race conditions
• Performance evaluation
  – Heterogeneity
  – Batch systems
• Data visualization
  – Scalability problems
• Dynamic platform
  – Resource localization
  – Agents/servers mapping
• Security
  – Authentication and authorization
  – Data transfers
• Fault tolerance
  – Servers or agents
• Interoperability
  – Problem description
  – API
• Data management
  – Data persistence
  – Data (re)distribution
  – Garbage collection
• Checkpointing
  – Fast parallel I/O
• Scalability
  – Hierarchy of servers/agents
• User assistance/PSE
  – Automatic choice of solutions
Conclusion and Future Work
• Development of a portable (and open-source) set of tools to build ASP environments
• Multi-application, multi-platform, and multi-interface
• Use of developments made in other projects (NWS, NetSolve, Ninf, Globus, Paris, CGP2P, ACI TLSE)
• Concentration of several problems such as resource localization, scheduling, agent deployment, algorithms for heterogeneous platforms, and performance analysis
• Find new applications ... not only number-crunching ones (e.g., Metacompil)
• Support the GridRPC standard proposed by the NetSolve and Ninf teams
• Follow the Global Grid Forum
http://www.ens-lyon.fr/~desprez/DIET/