1
Component-based Grid Environment for Programming Scientific Applications
Maciej Malawski
2
Outline
• Problem: programming applications on Grid
• Programming models and virtualization
• CCA + H2O
• Extensions to the environment
• Applications and tests
• Summary and future work
3
Experience (CrossGrid): Grid is complex
• Testbed: 17 sites, 9 countries, over 200 CPUs, 4 TB of storage
[Figure: CrossGrid architecture with Applications, Tools, and Services layers. Components shown: Medical Support, Particle Physics, Meteo/Pollution, Flood Simulation; Portal, Migrating Desktop, Roaming Access Server, Scheduler, Data Access, Post-processing, Visualization Kernel, Benchmarks, Performance Analysis, Performance Prediction, MPI Verification, MPI Library, Application Monitoring, Infrastructure Monitoring, Network Monitoring, OCM-G (OMIS), Globus Toolkit, DataGrid. Links between them are labelled API, API (JMX), SOAP, Plugin, or Protocol.]
4
Problem – how to program grid applications
• Scientific applications:
– Compute-intensive
– May be data-intensive
– Often custom-made
– Written in many programming languages (e.g. Fortran)
– Collaborative
• Current practice on Grid:
– “Write a JDL script which submits a shell script as a batch job, which uses SSH to launch a process on the head node of the cluster to serve as a proxy for communication...” (from CGW'06 presentation by ICM)
– “Submit a shell script which queries the LFC catalog, retrieves TAR archive from SE using GRIDFTP, unpacks the archive, runs another computing script, stores the output on SE and registers in LFC catalog.” (a biomedical application, CGW'06)
• Problems with scientific computing (IPDPS'05 panel discussion):
– Software Software Software... engineering
5
Two key challenges
• Programming model
– Suitable for the distributed environment
– Allowing management of complex applications
– Supported by standards
– Supporting scientific applications
– Facilitating programming
• Virtualization
– Hiding the complexity of the heterogeneous environment
– Allowing pools of resources to be created or acquired dynamically, on demand
6
Research objectives
• Concept of a programming environment for scientific applications on Grid
– Analysis of programming models for grid applications
– Identification of desired features of the programming environment
• Prototype implementation and feasibility study
• Verification of the model and prototype with typical applications
• Thesis (provisional):
– An extended component model may be used to create a grid environment for programming and running complex scientific applications.
7
Many programming models
• MPI, PVM
• Custom protocols
• Tuple spaces, HLA
• Distributed objects
• Active objects
• Components
• Skeletons
• Service Oriented Architectures, Web Services
8
Virtualization: state of the art (incomplete)
• Globus GRAM, Condor, VDT, gLite, Unicore
– Large-scale, batch-job-oriented submission systems
• Virtual Workspaces:
– Using Globus to submit VMware (or other) virtual machines to create a Condor pool of resources, which can in turn be accessed using Globus Toolkit
– Cannot call it a lightweight solution!
• SOA: everything accessible as a Web Service
– Efforts to support dynamic service deployment
• Component model: a container provides a virtualization layer for hosting components
– Dynamic deployment directly embedded into the programming model (component = unit of deployment)
9
What are components?
• A unit of software development/deployment/reuse
– i.e. has interesting functionality
– Ideally, functionality someone else might be able to (re)use
– Can be developed independently of other components
• Interacts with the outside world only through well-defined interfaces
• Can be composed with other components
– “Plug and play” model to build applications
– Composition based on interfaces
• Hosted in a framework/container responsible for other services (communication, security)
10
Benefits of Component-based Approach
• Enables composing applications from blocks which originally were not designed to be combined
• Addresses software complexity issues
• Many frameworks provide language interoperability
• Enforcement of separation of interface from implementation
• Facilitates managing third-party libraries
• Allows easy swapping of implementation
• Increases software productivity
• Mature and successful technology in business and desktop applications
11
Components vs. Web Services
• Component:
– Formal models for component programming (e.g. Fractal)
– May be created on demand, e.g. more components deployed when needed
– Explicitly declares required interfaces (uses ports), which can be connected directly, with no need to pass invocation data via a central workflow engine
– May have parallel connections
– Does not require SOAP as a protocol
[Figure: two connection patterns. Client1, Client2, ..., ClientN connected to one Server Component; one Client connected to Server1, Server2, ..., ServerN.]
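The uses/provides idea from this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Services` class, its method names, and the `connect` helper are hypothetical and merely resemble the CCA style; they are not the API of CCA, MOCCA, or any other framework named in the slides.

```python
class Services:
    """Hypothetical per-component handle to the framework (CCA-like in spirit)."""

    def __init__(self):
        self.provides = {}  # port name -> object implementing the port
        self.uses = {}      # port name -> connected provider, or None

    def add_provides_port(self, name, impl):
        self.provides[name] = impl

    def register_uses_port(self, name):
        self.uses[name] = None

    def get_port(self, name):
        port = self.uses.get(name)
        if port is None:
            raise RuntimeError(f"uses port {name!r} is not connected")
        return port


class Server:
    def set_services(self, services):
        # The component declares what it offers.
        services.add_provides_port("hello", self)

    def hello(self, text):
        return "Server says: " + text


class Client:
    def set_services(self, services):
        # The component declares what it requires.
        self.services = services
        services.register_uses_port("hello")

    def run(self, text):
        # Direct call on the connected provider, no central engine in between.
        return self.services.get_port("hello").hello(text)


def connect(user, uses_name, provider, provides_name):
    """Hypothetical builder step: wire a uses port to a provides port."""
    user.uses[uses_name] = provider.provides[provides_name]


server, client = Server(), Client()
server_services, client_services = Services(), Services()
server.set_services(server_services)
client.set_services(client_services)
connect(client_services, "hello", server_services, "hello")
result = client.run("ping")
```

Because the client only knows the port name, the provider behind it can be swapped without touching the client code.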
12
Proposed approach to building grid environment
• Use a component model
• Apply a virtualization layer
• Design a base component environment with a set of desired features
• Extend the environment features
13
Desired features of Grid components
• Scalable to different environments (from laptops to HPC clusters)
– lightweight platform
– dynamic, pluggable, reconfigurable at runtime
• Facilitated deployment on shared resources
– Virtualization (creating dynamic workspaces)
– Dynamic (hot) deployment
• Communication adjusted to various levels of coupling
– P2P, WANs, LANs, intercluster connections, direct binding in one process
– supporting parallelism
• Supporting multiple languages
– allowing easy adaptation of legacy code
– combining Java flexibility with optimized Fortran libraries
• Facilitating programming
– composable in space and in time
– taking advantage of semantic description and reasoning
• Adapted to unreliable Grid environment
– supporting dynamic and interactive reconfiguration of connections, locations, bindings
– providing support for migration and checkpointing
• Interoperability with grid standards
– Web Services: SOAP, WSDL, possibly WSRF
– Grid Component Model (ProActive/Fractal)
14
State of the art – examples of solutions (incomplete)
• Scalable to different environments (from laptops to HPC clusters)
– HPC: CCAFFEINE, GridCCM
– Lightweight: XCAT, ProActive, ICENI
• Facilitated deployment on shared resources
– ProActive, XCAT (using Globus)
• Communication adjusted to various levels of coupling
– CCAFFEINE: direct binding, MPI; XCAT: SOAP
– optimized communication: IBIS, GridCCM
– parallel, collective communication: GridCCM, IBIS, ProActive
• Supporting multiple languages
– legacy code: BABEL
– interoperability: CORBA, SOAP
• Facilitating programming
– composable in space and in time: XCAT, ICENI, GCM (hierarchical)
– skeleton approach: HOC, ASSIST
– taking advantage of semantic description and reasoning: ICENI, Semantic Web Services
• Adapted to unreliable Grid environment
– dynamic and interactive reconfiguration: ProActive, XCAT, Web Services model
– migration and checkpointing: ProActive, XCAT
• Interoperability with grid standards
– Web Services: XCAT, ProActive
– Grid Component Model: ProActive reference implementation
15
Base for the Solution: CCA and H2O
• Common Component Architecture (CCA)
– Component standard for HPC
– Uses and provides ports described in SIDL
– Support for scientific data types
– Existing tightly coupled (CCAFFEINE) and loosely coupled, distributed (XCAT) frameworks
• H2O
– Java-based distributed resource sharing platform
– Providers set up an H2O kernel (container)
– Allowed parties can deploy pluglets (components)
– Separation of roles: decoupling
  • Providers from deployers
  • Providers from each other
– RMIX: efficient multiprotocol RMI extension
[Figure: traditional model vs. proposed model. Traditional: the provider creates and deploys components A and B in a container on the provider host; the client looks them up and uses them. Proposed: the container runs on the provider host; a provider, client, or reseller deploys A and B; the client looks them up and uses them.]
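The separation of provider and deployer roles can be illustrated with a toy container. This is a minimal sketch, assuming a simple allow-list as the access policy; the `Kernel` class and its methods are hypothetical names chosen for this example and do not reproduce the H2O API.

```python
class Kernel:
    """Toy container: the provider runs it, allowed parties deploy into it."""

    def __init__(self, provider, allowed_deployers):
        self.provider = provider
        self.allowed = set(allowed_deployers)  # assumed access policy: an allow-list
        self.pluglets = {}

    def deploy(self, deployer, name, factory):
        # Anyone on the allow-list may deploy, not only the provider.
        if deployer not in self.allowed:
            raise PermissionError(f"{deployer!r} may not deploy on {self.provider!r}")
        self.pluglets[name] = factory()
        return self.pluglets[name]

    def lookup(self, name):
        return self.pluglets[name]


class Adder:
    def add(self, a, b):
        return a + b


kernel = Kernel(provider="provider-host", allowed_deployers={"client"})
kernel.deploy("client", "adder", Adder)   # client = deployer scenario
total = kernel.lookup("adder").add(2, 3)

try:
    kernel.deploy("stranger", "adder2", Adder)
    denied = False
except PermissionError:
    denied = True
```

The provider only decides who may deploy; what gets deployed is up to the deployer, which is the decoupling the slide describes.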
16
Example scenarios of H2O
1. Provider = deployer
e.g. resource = legacy application
2. Reseller = developer = deployer
e.g. computational service offered within a grid system
3. Client = deployer
e.g. client runs custom distributed application on shared resources
• Registration and discovery: e-mail, phone, ..., JNDI, UDDI, LDAP, DNS, GIS, ...
[Figure: the three scenarios, showing who deploys components A and B (provider wrapping a legacy application or native code, reseller or developer deploying from a repository, or client), and publish/find between provider and client.]
17
Features of the environment
• Scalable to different environments (from laptops to HPC clusters)
– lightweight platform: use H2O
– dynamic, pluggable, reconfigurable at runtime: dynamic CCA model + H2O kernel facilities
• Facilitated deployment on shared resources
– Static virtualization by using H2O kernel as a daemon
– Dynamic virtualization using a pool of transient H2O kernels created on demand
• Communication adjusted to various levels of coupling
– Offered by RMIX library of H2O
– Parallel extensions for CCA: multiple ports
• Facilitating programming
– Composition in time: low-level Python or Ruby scripting; high-level Virolab/GridSpace programming environment
– Semantic description: under development within Virolab
• Supporting multiple languages
– Integration of RMIX with Babel
– Integration of MOCCA with Babel: pending
• Interoperability with grid standards
– Web Services: future work (technically feasible with either RMIX or an embedded server such as XFire)
– Grid Component Model (ProActive/Fractal) interoperability: recent work
• Adapted to unreliable Grid environment
– supporting dynamic and interactive reconfiguration of connections, locations, bindings
– providing fault-tolerance support (migration and checkpointing): future work
18
MOCCA – a basic component framework
• Each component is a separate pluglet
– Dynamic remote deployment of components
– Components packaged as JAR files
– Security: Java sandboxing, detailed access policy
• Using RMIX for communication: efficiency, multiprotocol interoperability
• Flexibility and multiple scenarios, as in H2O
• MOCCA_Light: pure Java implementation
– Java API or Jython and Ruby scripting for application assembly
• http://www.icsr.agh.edu.pl/mambo/mocca
[Figure: H2O kernels hosting component pluglets, each wrapping a CCA component, together with builder pluglets. MoccaMainBuilder invokes and manages them through the BuilderService.]
19
Dynamic virtualization
• A pool of computing resources may be created by submitting a number of H2O kernels on many Grid sites
• Application components may be deployed on the kernels belonging to the pool
• Virtual resource pool may be used by a single user or shared for collaboration
• Interaction with cluster nodes in private network – JXTA transport (needs more testing)
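[Figure: H2O kernels on a standalone machine, a cluster, and a Grid node, started via SSH, PBS, and the LCG Resource Broker, form the user's virtual resource pool; kernels are registered in a name service (NS) with bind() and found with lookup().]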
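The bind()/lookup() pattern behind the virtual resource pool can be sketched as follows. This is an illustrative model only: the `NameService` and `Kernel` classes, the `pool/alice/` naming scheme, and the round-robin placement are assumptions made for the example, not the mechanism used by H2O or MOCCA.

```python
from itertools import cycle


class NameService:
    """Toy registry: transient kernels bind themselves, users look them up."""

    def __init__(self):
        self._entries = {}

    def bind(self, name, kernel):
        self._entries[name] = kernel

    def lookup(self, prefix=""):
        # Return kernels whose name starts with the prefix, in name order.
        return [k for n, k in sorted(self._entries.items()) if n.startswith(prefix)]


class Kernel:
    def __init__(self, site):
        self.site = site
        self.components = []

    def deploy(self, component):
        self.components.append(component)


ns = NameService()
# Kernels started on different resources register under the user's pool name.
for i, site in enumerate(["standalone", "cluster", "grid-node"]):
    ns.bind(f"pool/alice/{i}", Kernel(site))

pool = ns.lookup("pool/alice/")

# Assumed placement policy for the sketch: plain round-robin over the pool.
placement = {}
components = ["generator", "annealing-1", "annealing-2", "storeroom"]
for component, kernel in zip(components, cycle(pool)):
    kernel.deploy(component)
    placement[component] = kernel.site
```

Once the pool exists, the application sees only kernels; how each one was started (SSH, PBS, or a resource broker) no longer matters to the deployment step.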
20
Communication extension: RMIX over JXTA
• Fully operational RMI implementation running over JXTA P2P network
• Methods can be invoked on remote objects located behind firewalls or NATs
• Our implementation of JXTA socket factories manages all the JXTA connectivity transparently from user’s point of view
21
Parallelism: Extensions of CCA for Multiple Ports and Connections
• Multiple users of one provides port (easy part)
– Single provides port
– Naming convention for client components (client1, client2, ...)
• Single client of multiple providers:
– Need multiple uses ports on the client side
– Use ParameterPort of CCA to parametrize the number of uses ports
– Client component creates a required number of uses ports
– Naming convention for server components and uses port names
• Extension of CCA BuilderService: MultiBuilder
– Creation of multiple components
– Handling multiple connections
[Figure: Client1, Client2, ..., ClientN connected to one Server Component; one Client connected to Server1, Server2, ..., ServerN.]
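The "single client of multiple providers" case can be sketched like this. The port naming scheme (`server1`, `server2`, ...) and the `multi_connect` helper are assumptions for illustration; they stand in for the naming convention and the MultiBuilder mentioned on the slide without reproducing their actual interfaces.

```python
def uses_port_names(base, count):
    """Naming convention assumed for this sketch: base1, base2, ..."""
    return [f"{base}{i}" for i in range(1, count + 1)]


class Server:
    def __init__(self, name):
        self.name = name

    def work(self, item):
        return f"{self.name}:{item}"


class Client:
    def __init__(self, n_ports):
        # The number of uses ports is a parameter, set before connections are made.
        self.ports = {name: None for name in uses_port_names("server", n_ports)}

    def call_all(self, item):
        # One invocation per connected provider, in port-name order.
        return [self.ports[name].work(item) for name in self.ports]


def multi_connect(client, servers):
    """Hypothetical stand-in for a multi-connection builder call."""
    for name, server in zip(client.ports, servers):
        client.ports[name] = server


servers = [Server(f"Server{i}") for i in range(1, 4)]
client = Client(n_ports=len(servers))
multi_connect(client, servers)
results = client.call_all("job")
```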
22
Support for composition in space and in time
• Declarative vs. imperative programming
• Composition in space
– Graph of component connections
– ADL: Application Description Language
– Supported by MOCCAccino
• Composition in time
– Workflow model (script)
– Centralized execution
– Low-level scripting in Jython and JRuby currently supported
– High-level scripting developed within Virolab
[Figure: composition in space shows direct connections between Configuration Generator, several Simulated Annealing components, and Storeroom; composition in time shows a runtime system invoking init(), getMolecule(), simulate(), store(), ... on the components.]
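Composition in time means a central script drives the components step by step. The sketch below uses the operation names from the figure (init, getMolecule, simulate, store) but everything else is assumed: the classes are local stand-ins, and the "simulation" is a trivial sum of squares rather than real simulated annealing.

```python
import random


class ConfigurationGenerator:
    def init(self, seed):
        self.rng = random.Random(seed)

    def get_molecule(self, atoms=3):
        # Stand-in for a random initial configuration.
        return [self.rng.uniform(-1.0, 1.0) for _ in range(atoms)]


class SimulatedAnnealing:
    def simulate(self, molecule):
        # Placeholder "energy"; a real component would run the annealing here.
        return sum(x * x for x in molecule)


class Storeroom:
    def __init__(self):
        self.results = []

    def store(self, energy):
        self.results.append(energy)


# The script below plays the role of the workflow: it is the only place that
# knows the order of calls, and every invocation goes through it.
generator, annealer, storeroom = ConfigurationGenerator(), SimulatedAnnealing(), Storeroom()
generator.init(seed=42)
for _ in range(5):
    molecule = generator.get_molecule()
    storeroom.store(annealer.simulate(molecule))
best = min(storeroom.results)
```

In composition in space the same three components would instead be wired together once and exchange data directly, with no script in the data path.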
23
Composition in space - MOCCAccino
• ADLM (ADL for MOCCAccino): XML-based language for:
– Describing types and number of components and their connections
– Concept of hierarchical component groups
– Optional information to specify resources
– Hints for deployment of components (whether they are computation-intensive or communication-intensive)
• Application Manager, responsible for:
– Discovering the available kernel pool
– Planning optimal location of components
– Deploying components in specified kernels
– Connecting components
24
MOCCAccino usage
[Figure: example component graph. The GraphBuilder creates one component instance of Pings, each with a 2-element list (index 0, index 1) of Pongs, each with a map with “left” and “right” keys of Zonks.]
[Figure: MOCCAccino architecture. Components shown: ApplicationManager, Parser, GraphBuilder, DeploymentPlanner, ApplicationDeployer, MOCCABuilder, Kernel information Provider, HDNS Registry.]
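The Ping/Pong/Zonk example can be read as a declarative description that a graph builder expands into concrete instances and connections. The sketch below assumes a Python dictionary as the description format purely for illustration; ADLM itself is XML-based and its schema is not shown in the slides, and the `expand` function is hypothetical.

```python
# Hypothetical in-memory description: one Ping, with a 2-element list of Pongs,
# each with a map of Zonks under the keys "left" and "right".
description = {
    "type": "Ping",
    "children": {
        "kind": "list",
        "size": 2,
        "of": {
            "type": "Pong",
            "children": {"kind": "map", "keys": ["left", "right"], "of": {"type": "Zonk"}},
        },
    },
}


def expand(node, name, instances, connections):
    """Recursively turn the description into instance names and connections."""
    instances.append(name)
    group = node.get("children")
    if group is None:
        return
    # A list group is indexed by position, a map group by key.
    labels = list(range(group["size"])) if group["kind"] == "list" else group["keys"]
    for label in labels:
        child = f"{name}/{group['of']['type']}[{label}]"
        connections.append((name, label, child))
        expand(group["of"], child, instances, connections)


instances, connections = [], []
expand(description, description["type"], instances, connections)
zonks = [i for i in instances if i.split("/")[-1].startswith("Zonk")]
```

A deployment planner would then take `instances` and decide which kernel hosts each one before the connections are made.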
25
Motivation for multiprotocol and multilanguage interoperability
• Grids are heterogeneous
• Multiple programming languages in a single application
– Java for middleware
– C for system programming
– FORTRAN for computing
– Python for scripting
• Multiple protocols in a single application
– High-speed local networks (Myrinet)
– TCP/SSL/TLS in WAN
– SOAP for loosely coupled message exchange
– Overlay P2P networks for traversing private network boundaries (NATs)
• Context: MOCCA component framework
26
Multilanguage Solution - Babel
• SIDL: Scientific Interface Definition Language
– Standard for CCA components
– Supports arrays and complex types
– Focus on interfaces
• Babel:
– SIDL parser
– Code generator
– Runtime library
• Intermediate Object Representation (IOR)
– Core of Babel object
– Array of function pointers
– Generated code in C
[Figure: Babel in the centre, connecting C, C++, f77, f90, Python, and Java.]
SIDL interface:
    package example version 1.2 {
      class Hello {
        string hello(in string hello);
      }
    }
Java implementation:
    // user defined non-static methods:
    /**
     * Method: hello[]
     */
    public java.lang.String hello_Impl (
      /*in*/ java.lang.String hello )
    {
      // DO-NOT-DELETE splicer.begin(example.Hello.hello)
      // Insert-Code-Here {example.Hello.hello} (hello)
      return "Server says: " + hello;
      // DO-NOT-DELETE splicer.end(example.Hello.hello)
    }
C declaration:
    /**
     * Method: hello[]
     */
    char*
    example_Hello_hello(
      /*in*/ example_Hello self,
      /*in*/ const char* hello);
27
Currently: Babel for Local Applications
• All Babel objects in one process
• Implemented in CCAFFEINE framework
• Existing multilanguage CCA components – see CCA tutorial
[Figure: a Java application calling Fortran and C++ native libraries through SIDL interfaces and Babel IOR, all in one process.]
28
Our Solution
• Babel + RMIX
• Implementation of Babel RMI extensions
– Generic mechanism of method invocation (reflection)
– Dynamic loading of communication library
– No need for code generation and compilation
[Figure: a Java application and Fortran and C++ native libraries, each behind SIDL and Babel IOR, communicating across the network through RMIX libraries.]
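The "generic mechanism of method invocation (reflection)" can be illustrated in Python, where a method is looked up by name at runtime instead of through generated stubs. This is a sketch of the general technique only: JSON is used as a stand-in wire format, and `encode_call` and `dispatch` are hypothetical helpers, not part of Babel or RMIX.

```python
import json


class Hello:
    def hello(self, text):
        return "Server says: " + text


def encode_call(method, *args):
    """Client side: describe the call as data (JSON is an assumed stand-in format)."""
    return json.dumps({"method": method, "args": list(args)}).encode("utf-8")


def dispatch(target, message):
    """Server side: resolve the method by name at runtime and invoke it."""
    call = json.loads(message.decode("utf-8"))
    method = getattr(target, call["method"])  # reflection: no per-interface stub
    return method(*call["args"])


reply = dispatch(Hello(), encode_call("hello", "world"))
```

Because `dispatch` works for any object and method name, adding a new interface needs no extra generated or compiled code on the communication path, which is the benefit the slide lists.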
29
Interoperability with Grid Component Model (CoreGRID)
• Based on Fractal Model
• Deployment functionalities
• Asynchronous and extensible port semantics
• Collective interfaces
• Autonomicity and adaptivity thanks to “autonomic” and “dynamic” controllers
• Support for language neutrality and interoperability
[Figure: a component with its controllers: ComponentIdentity, BindingController, LifeCycleController, ContentController.]
30
Motivation for interoperability
• Framework interoperability is an important issue for GCM
• Existing component models and frameworks for Grids
– CCA, CCM
• Already existing “legacy” components
• ProActive/Fractal and H2O/MOCCA are alternative Java-based frameworks for distributed computing: can they interoperate?
31
Fractal vs. CCA
• Similarities: general for most component models
– Separation of interface from implementation
– Composition by connecting interfaces
• Differences
– Fractal components are reflective (introspection), whereas CCA components are given the initiative to add/remove ports at runtime
– BindingController in Fractal vs. BuilderService in CCA
– No ContentController in CCA (and no hierarchy)
– Factory interface in Fractal vs. BuilderService in CCA
– AttributeController in Fractal vs. ParameterPort in CCA
– No ADL in CCA
32
Approaches to integration
• Single component integration
– Wrapping a CCA component into a primitive GCM one
– Allows a CCA component to be used in a GCM framework
• Framework interoperability
– Ability for two component frameworks to interoperate
– Allows a CCA component assembly (running in a CCA framework) to be connected to a GCM component application
[Figure: left, a wrapper with C and BC controllers around a single CCA component and its cca.Services; right, a wrapper with C and BC controllers connected through glue to several CCA components and the BuilderService inside a CCA framework.]
33
Solutions to typing issues
1. Generate the type of a wrapped CCA component at runtime (at initialization)
– Pros: fully automated
– Cons: restricted to ports which are declared by the CCA component during initialization (at setServices() call)
2. Manual description of a CCA component in ADL format
– Pros: generic solution
– Cons: requires an additional task from the developer
3. (Semi)automatic generation of ADL
– May combine approaches 1 and 2
4. Reuse existing CCA type specifications (SIDL, CCAFFEINE scripting, others; not standardized)
34
Technical approach – CCA controller
• Creates glue components for all ports (client and server)
• Connects glue to CCA system (using CCA builder) and to membrane (using BC)
[Figure: a wrapper with a CCA controller and BC controllers; Server Glue A and Client Glue B connect its interfaces A and B to CCA components and the BuilderService of a CCA framework running in H2O kernels.]
35
Glue Components
• Server Glue:
– Deployed as Fractal component
– Uses MOCCA client code to delegate invocation to CCA interface
– Can also be deployed on H2O kernel
• Client Glue:
– Deployed as CCA component in H2O kernel
– Launches ProActive runtime in H2O kernel
– Creates Fractal component in this runtime
• Both:
– Can be generated from the interface type (TODO)
[Figure: Client Glue B next to a CCA component in an H2O kernel; Server Glue A in front of a CCA component in an H2O kernel.]
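The server-glue idea, combined with deriving the wrapper's interfaces from the ports a component declares at initialization, can be sketched as a delegating adapter. The names below (`ServerGlue`, `wrap`, the dictionary-based services object) are hypothetical and model the pattern only; they are not ProActive, Fractal, or MOCCA code.

```python
class CcaComponent:
    """Stand-in for a CCA component that declares one provides port, "A"."""

    def set_services(self, services):
        services["provides"]["A"] = self

    def compute(self, x):
        return x * 2


class ServerGlue:
    """Exposes a provides port as a server interface by delegating every call."""

    def __init__(self, cca_port):
        self._port = cca_port

    def __getattr__(self, name):
        # Only called for attributes not found on the glue itself,
        # so all port operations are forwarded to the wrapped component.
        return getattr(self._port, name)


def wrap(component):
    """Build the wrapper's interfaces from ports declared during initialization."""
    services = {"provides": {}, "uses": {}}
    component.set_services(services)
    return {name: ServerGlue(port) for name, port in services["provides"].items()}


interfaces = wrap(CcaComponent())
value = interfaces["A"].compute(21)
```

The sketch also shows the limitation noted for the runtime-typing approach: a port the component adds after `set_services` would not appear in `interfaces`.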
36
ProActive + MOCCA
• MOCCA invocations are synchronous
– Composite (membrane) should be synchronous to avoid deadlocks
– Or, we may consider generating glue with wrapped types (IntWrapper, etc.); this changes the types of interfaces
• Class loading issues
– The classes generated by ProActive runtime must be visible to the code running in H2O kernel
– The RMI class loading works fine if the codebase is set properly on ProActive side
37
Communication Intensive Application Benchmark
• Simplified scenario: 2 components
– Provides port: receive and send back an array of doubles (ping-pong)
• Tested on local Gigabit Ethernet and on transatlantic Internet between Atlanta and Krakow
• 2.4 GHz Linux machines
• Comparison with XCAT
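The shape of such a ping-pong measurement can be sketched as a small timing harness. This is an in-process illustration under stated assumptions: `echo_over_bytes` only serializes and deserializes the array locally and stands in for a remote provides port, so the numbers it yields say nothing about RMIX or XCAT.

```python
import time
from array import array


def echo_over_bytes(payload):
    """Stand-in for the remote port: serialize, 'send', deserialize, return."""
    wire = payload.tobytes()
    reply = array("d")
    reply.frombytes(wire)
    return reply


def ping_pong(send, size, rounds):
    """Return the mean round-trip time in seconds for arrays of `size` doubles."""
    payload = array("d", [0.0]) * size
    start = time.perf_counter()
    for _ in range(rounds):
        reply = send(payload)
        assert len(reply) == size  # the array must come back intact
    return (time.perf_counter() - start) / rounds


mean_small = ping_pong(echo_over_bytes, size=10, rounds=100)
mean_large = ping_pong(echo_over_bytes, size=100_000, rounds=10)
```

Running the same harness with small and large arrays separates per-call overhead from per-byte cost, which is why the results are reported separately for small and large data packets.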
38
Small Data Packets
Factors:
• SOAP header overhead in XCAT
• Connection pools in RMIX
39
Large Data Packets
• Encoding (binary vs. base64)
• CPU saturation on Gigabit LAN (serialization)
• Variance caused by Java garbage collection
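The encoding factor can be quantified independently of any framework: base64 turns every 3 bytes into 4 characters, so a text-encoded array of doubles is about a third larger than its binary form. The helper below is a minimal sketch using only the Python standard library; the function name is chosen for this example.

```python
import base64
import struct


def encoded_sizes(n_doubles):
    """Return (binary size, base64 size) in bytes for an array of n doubles."""
    raw = struct.pack(f"<{n_doubles}d", *([1.0] * n_doubles))  # 8 bytes per double
    return len(raw), len(base64.b64encode(raw))


binary_size, base64_size = encoded_sizes(1000)
overhead = base64_size / binary_size
```

For 1000 doubles this gives 8000 bytes in binary and 10668 bytes in base64, before any XML or SOAP envelope is added around the text form.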
40
Automatic Flow Composer Example
• Compose application graph from initial data (e.g. initial ports) or incomplete graph
• First implemented for XCAT framework
• Easy migration to MOCCA
• Modification of code required (xcat.Port)
• Similar performance for XCAT and MOCCA (exchange of text documents)
[Figure: FlowComposer and FlowOptimizer using ComponentRegistry (lookup), LinkEvaluator and SiteEvaluator (evaluate) to compose the application.]
41
Other applications
• Domain decomposition (some student toy apps)
• Data mining using Weka (as a Virolab example)
42
Gold Cluster Application
• Components
– Starter: a “driver” component for the application, provides a Go port
– Configuration generator: random initial configurations
– Simulated annealing: compute-intensive simulation component
– Storeroom: used for keeping results and statistics
– Gather: auxiliary component for passing molecules
• Ports
– Molecule: offers getMolecule() method
– Control ports: for steering the application
[Figure: Starter connected through Generator Control and Annealing Control ports; Configuration Generator, several Simulated Annealing components, Gather, and Storeroom connected through Molecule ports.]
43
Resources and Results
• Using heterogeneous infrastructure, available ad hoc
– Local machine
  • SSH access
– Cluster in CYFRONET
  • PBS
– CrossGrid testbed (LCG-based middleware)
  • Clusters in PSNC Poznan and IFCA Santander
• Java VMs already installed
• Cluster nodes allow remote point-to-point communication (MPICH-enabled: no firewalls!)
• Problem size grows with number of nodes (weak scaling)
[Plot: computing time [s] (axis 0 to 375) vs. number of nodes (1 to 10).]
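Under weak scaling the work per node stays constant, so ideal behaviour is a flat computing time and the usual figure of merit is the ratio of the single-node time to the N-node time. The sketch below shows that calculation; the timings in it are invented for illustration and are not read from the plot.

```python
def weak_scaling_efficiency(times):
    """Map node count -> T(1) / T(N); 1.0 means perfectly flat computing time."""
    base = times[1]
    return {nodes: base / t for nodes, t in times.items()}


# Hypothetical timings in seconds, NOT the measured results from the slides.
example_times = {1: 100.0, 2: 110.0, 4: 125.0}
efficiency = weak_scaling_efficiency(example_times)
```

With these made-up numbers the 4-node run would have an efficiency of 0.8, meaning the time grew by a quarter although the work per node was unchanged.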
44
Future work
• Optimization algorithms (scheduling) for ADL and scripting models
• Monitoring support (Gemini)
• Formal model (adapted from GCM)
• Further integration with Babel
• More applications
45
Summary
• Analysis of programming models for Grid, selection of component model
• Design and implementation of CCA framework based on H2O platform
• Extending applicability of H2O for dynamically created pools of resources (user-centric or ad hoc created VOs)
• Extensions for parallel-distributed CCA components
• Support for time and space composition modes by high-level scripting and ADL-based application
• Towards multilanguage interoperability
• Supporting interoperability between component models
46
Key papers
• Maciej Malawski, Dawid Kurzyniec, and Vaidy Sunderam. MOCCA – towards a distributed CCA framework for metacomputing. In Proceedings of the 10th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS2005), 2005. IEEE Computer Society
• Maciej Malawski, Marian Bubak, Michał Placek, Dawid Kurzyniec, and Vaidy Sunderam. Experiments with distributed component computing across Grid boundaries. In Proceedings of the HPC-GECO/CompFrame workshop in conjunction with HPDC 2006, 2006.
• P. Jurczyk, M. Golenia, M. Malawski, D. Kurzyniec, M. Bubak, V. S. Sunderam, Enabling Remote Method Invocations in Peer-to-Peer Environments: RMIX over JXTA, in: Roman Wyrzykowski, Jack Dongarra, Norbert Meyer, Jerzy Wasniewski (Eds.), Parallel Processing and Applied Mathematics: 6th International Conference, PPAM 2005, Poznan, Poland, September 11-14, 2005, Revised Selected Papers, Lecture Notes in Computer Science, 3911, Springer, 2006, pp. 667-674
• M. Malawski, D. Harezlak, M. Bubak, Towards Multiprotocol and Multilanguage Interoperability: Experiments with Babel and RMIX, in: M. Bubak, M. Turała, K. Wiatr (Eds.), Proceedings of Cracow Grid Workshop - CGW'05, November 20-23 2005, ACC-Cyfronet UST, 2006, Kraków, pp. 266-278.
• M. Bubak, M. Malawski, M. Placek, Using MOCCA Component Environment for Simulation of Gold Clusters, in: M. Bubak, M. Turała, K. Wiatr (Eds.), Proceedings of Cracow Grid Workshop - CGW'05, November 20-23 2005, ACC-Cyfronet UST, 2006, Kraków, pp. 295-299.
47
Acknowledgements
• Vaidy Sunderam, Dawid Kurzyniec – Emory University, Atlanta
• Daniel Harężlak, Michał Placek
• Tomek Bartyński, Eryk Ciepiela, Joanna Kocot, Przemysław Pelczar, Iwona Ryszka
• Paweł Jurczyk, Maciej Golenia
• Tomasz Gubała, Marek Kasztelnik, Piotr Nowakowski
• Ludovic Henrio, Matthieu Morel, Francoise Baude, Denis Caromel – Sophia-Antipolis, France
• Marian Bubak