A Hybrid Decomposition Scheme for Building Scientific Workflows
Wei Lu
Indiana University
Application Decomposition
• Large scientific applications require
  – Decomposing the problem into manageable units
  – Units need to be
    • Self-described
    • Self-encapsulated
    • Independently developed and deployed
    • Composable
• Two decomposition dimensions
  – Functional decomposition (a.k.a. spatial decomposition)
    • C/C++, Java
    • Component
  – Temporal decomposition
    • Unix pipe
    • Workflow
  – However,
    • most PSEs provide only one approach to the exclusion of the other
Our work
Common Component Architecture (CCA)
• Scientific computing imposes special requirements
  – Support for legacy software
  – Performance is crucial
  – Languages, data types
    • Fortran, C/C++, Python, Java, etc.
    • Complex numbers and arrays (as first-class objects)
  – Support for the various parallel run-time platforms
• CCA
  – Component framework specification
  – Designed for scientific high-performance computing
  – Aims at improving scientific software reuse
CCA Component
• Each component describes
  – What functionality it fulfills
    • Provides port
  – What functionality it needs to fulfill its task
    • Uses port
• Use-Provide pattern
  – Plug-and-play
• The port is described in SIDL
  – Scientific Interface Definition Language
  – Partially derived from CORBA IDL
  – With constructs to describe complex numbers, arrays, etc.
  – Babel: language interoperability tool
Example of the CCA Composition
[Figure: CCA composition diagram. Components: MidpointIntegrator, NonlinearFunction, LinearFunction. Ports: IntegratorPort, FunctionPort. Implementation languages shown: C, Fortran, Python.]

interface IntegratorPort extends gov.cca.Port {
  double integrate(in double lowBound, in double upBound, in int count);
}
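The use-provide wiring in this example can be sketched in a few lines of Python. This is only an illustration of the pattern, not the CCA or Babel API: the class layout, the `connect` method, and the two function bodies are assumptions made for the sketch; only the port and component names and the `integrate` signature come from the slide.

```python
class FunctionPort:
    """Provides-port interface: evaluates a scalar function (hypothetical layout)."""

    def evaluate(self, x: float) -> float:
        raise NotImplementedError


class LinearFunction(FunctionPort):
    def evaluate(self, x: float) -> float:
        return 2.0 * x + 1.0  # arbitrary illustrative function


class NonlinearFunction(FunctionPort):
    def evaluate(self, x: float) -> float:
        return x * x  # arbitrary illustrative function


class MidpointIntegrator:
    """Provides IntegratorPort; uses a FunctionPort supplied at wiring time."""

    def __init__(self) -> None:
        self.function = None  # uses port, unconnected until wired

    def connect(self, provider: FunctionPort) -> None:
        # Stand-in for the framework connecting a uses port to a provides port.
        self.function = provider

    def integrate(self, low_bound: float, up_bound: float, count: int) -> float:
        if self.function is None:
            raise RuntimeError("FunctionPort is not connected")
        h = (up_bound - low_bound) / count
        # Midpoint rule: sample the function at the center of each sub-interval.
        return h * sum(
            self.function.evaluate(low_bound + (i + 0.5) * h) for i in range(count)
        )


integrator = MidpointIntegrator()
integrator.connect(LinearFunction())
linear_area = integrator.integrate(0.0, 1.0, 4)

# Plug-and-play: swap the provider without touching the integrator.
integrator.connect(NonlinearFunction())
nonlinear_area = integrator.integrate(0.0, 1.0, 100)
```

The integrator never names a concrete function class, which is what makes the providers interchangeable.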
Ccaffeine
• Parallel implementation of the CCA framework
• SCMD (Single Component Multiple Data)
  – Inter-component communication
    • virtual function call in the same address space
  – Intra-component communication
    • could be MPI, PVM, etc.
Kepler
• Scientific workflow environment
  – Data-flow oriented
• Basic unit: Actor
  – Input, Output
  – Typed dataflow structure
  – Lots of domain-specific actors supporting
    • biology, ecology, astronomy
  – General facility actors
    • Grid service actor
    • Web service actor
• Wire the actors by piping
[Figure: Kepler workflow example. Labels: GridFtp, Classifier, localFilePath, URL, Credential.]
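Wiring actors by piping can be sketched as below. This is not Kepler's Java actor API: the `Actor` class, its `pipe_to` and `receive` methods, and the two lambda bodies are hypothetical stand-ins that only mimic the producer-to-consumer dataflow described above.

```python
from typing import Any, Callable, Dict, List, Tuple


class Actor:
    """A dataflow unit that fires once every input port holds a token."""

    def __init__(self, name: str, inputs: List[str], fire: Callable[..., Any]) -> None:
        self.name = name
        self.inputs = inputs
        self.fire = fire
        self.tokens: Dict[str, Any] = {}
        self.consumers: List[Tuple["Actor", str]] = []
        self.output: Any = None

    def pipe_to(self, consumer: "Actor", port: str) -> None:
        """Wire this actor's output to one input port of a consumer."""
        if port not in consumer.inputs:
            raise ValueError(f"{consumer.name} has no input port {port!r}")
        self.consumers.append((consumer, port))

    def receive(self, port: str, token: Any) -> None:
        self.tokens[port] = token
        if all(p in self.tokens for p in self.inputs):
            self.output = self.fire(**self.tokens)
            # Push the result downstream, producer to consumer.
            for consumer, consumer_port in self.consumers:
                consumer.receive(consumer_port, self.output)


# Placeholder bodies: no real file transfer or classification happens here.
fetch = Actor("fetch", ["url", "credential"], lambda url, credential: f"data from {url}")
classify = Actor("classify", ["data"], lambda data: "class-A" if "data" in data else "unknown")
fetch.pipe_to(classify, "data")

fetch.receive("url", "gsiftp://example.org/input.dat")
fetch.receive("credential", "proxy-cert")
```

Once both tokens arrive, `fetch` fires and its output flows into `classify` without either actor calling the other directly.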
Compare Side by Side
Workflow (Kepler)
• Actor
  – Stands for one function
• Port
  – Input/Output
  – A data-structure definition
• Connection
  – Producer to consumer
• Composition defines “How”
• Advantages
  – Loosely coupled
  – Supports distributed resource sharing

Component (CCA)
• Component
  – Stands for one class
• Port
  – Provide/Use
  – An interface signature
• Connection
  – Caller to callee
• Composition defines “What”
• Advantages
  – Good performance
  – Supports the parallel programming model
A Hybrid solution
• Typical scientific applications
  – Involve multiple distributed data-processing phases
  – Among those phases there are a number of computationally intensive cores
    • Often classical numerical algorithms
    • Need a high-performance execution environment
• The hybrid scheme
  – Use the workflow scheme to decompose based on the distribution of the resources
  – Then use the component scheme to further decompose those computationally intensive sub-problems to form the parallel solution
• Benefit from both schemes
Service over Components
• Building web services over the CCA
  – Web service = good interoperability
  – Kepler supports web services as actors
  – More resources and protocols (e.g., WS-BPEL)
• Façade pattern
  – External view through the coarse-grained web service
  – Internal functionality provided by the fine-grained components
• Factory pattern
  – The workflow needs
    • a task-specific service rather than a meta-level service
  – The task-specific service
    • Should be created dynamically and on demand
  – But a service is not instantiable!
[Figure: a service creates a task-specific service.]
Architecture
• Job
  – A specific task performed by a group of wired components
• Two-phase execution
  – Compose the job
  – Run the job
• Two explicitly separated web services (CCA-Services)
  – Factory Service
  – Job Proxy
[Figure: architecture diagram. Labels: Composer, job description, Factory Service, IPC, Ccaffeine framework, Job Proxy, User, invocation.]
Job Factory Service
• A façade for the Ccaffeine framework
  – Connects to the Ccaffeine muxer via a socket
  – Maintains the job table and the job lifecycle
• Create
  – Parameters
    • Gateway port
      – the task-specific interface
    • Composition description
      – how the components are wired to support the gateway port
  – Convert the SIDL to WSDL
    • Gateway port definition to the equivalent WSDL
  – Forward the composition commands to the Ccaffeine muxer
    • Will be executed in parallel
  – Maintain job records internally
  – Create the Job Proxy service
    • Return its WSDL URL
• Modify
  – Change the composition without impacting the service interface
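The create and modify operations can be sketched as follows. This is an assumption-laden illustration, not the actual service: the `JobFactory` class, the URL layout, and the `send` callback (standing in for the socket to the Ccaffeine muxer) are all hypothetical, and the command strings are treated as opaque text that only imitates the flavor of a composition script.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Job:
    job_id: str
    gateway_port: str        # the task-specific interface
    composition: List[str]   # how the components are wired


class JobFactory:
    def __init__(self, base_url: str, send: Callable[[str], None]) -> None:
        self.base_url = base_url
        self.send = send                  # stand-in for the socket to the framework
        self.jobs: Dict[str, Job] = {}    # the job table
        self._ids = itertools.count(1)

    def wsdl_url(self, job_id: str) -> str:
        return f"{self.base_url}/{job_id}?wsdl"  # hypothetical URL scheme

    def create(self, gateway_port: str, composition: List[str]) -> str:
        job_id = f"job{next(self._ids)}"
        for command in composition:
            self.send(command)            # forward the composition commands
        self.jobs[job_id] = Job(job_id, gateway_port, list(composition))
        return self.wsdl_url(job_id)

    def modify(self, job_id: str, composition: List[str]) -> str:
        job = self.jobs[job_id]
        for command in composition:
            self.send(command)
        job.composition = list(composition)
        # The gateway port is untouched, so the service interface stays the same.
        return self.wsdl_url(job_id)


sent: List[str] = []
factory = JobFactory("http://localhost:8080/jobs", sent.append)
commands = [
    "instantiate MidpointIntegrator integrator",
    "instantiate LinearFunction function",
    "connect integrator FunctionPort function FunctionPort",
]
url = factory.create("IntegratorPort", commands)
same_url = factory.modify("job1", ["instantiate NonlinearFunction function"])
```

Because `modify` returns the same WSDL URL, a workflow that already holds the address of the job's service does not need to change when the composition behind it does.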
Job Proxy Service
• Façade for the wired components
• With a task-specific WSDL interface
• On receiving a SOAP message
  – Extract the arguments from the message
  – Pass the arguments to Ccaffeine
  – Invoke Ccaffeine
  – Get the result from the Driver and send the SOAP response
[Figure: the User sends a SOAP request to the Job Proxy, which passes the arguments to the Driver.]
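The request-handling steps can be sketched with the standard library's XML parser. The function names, the response element names, and the `stand_in_driver` (which replaces the wired components with a midpoint rule on f(x) = x) are assumptions for illustration; the real proxy would hand the arguments to the Ccaffeine driver.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def handle_soap_request(request_xml: str,
                        invoke: Callable[[str, Dict[str, str]], float]) -> str:
    envelope = ET.fromstring(request_xml)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("SOAP body with one operation element expected")
    operation = body[0]
    name = local_name(operation.tag)
    # Extract the arguments from the message.
    arguments = {local_name(part.tag): (part.text or "") for part in operation}
    # Pass them on and invoke the job.
    result = invoke(name, arguments)
    # Wrap the result in a SOAP response.
    response = ET.Element(f"{{{SOAP_NS}}}Envelope")
    response_body = ET.SubElement(response, f"{{{SOAP_NS}}}Body")
    reply = ET.SubElement(response_body, f"{name}Response")
    ET.SubElement(reply, "return").text = repr(result)
    return ET.tostring(response, encoding="unicode")


def stand_in_driver(operation: str, args: Dict[str, str]) -> float:
    """Placeholder for the wired components: midpoint rule on f(x) = x."""
    if operation != "integrate":
        raise ValueError(f"unknown operation: {operation}")
    low, up, count = float(args["lowBound"]), float(args["upBound"]), int(args["count"])
    h = (up - low) / count
    return h * sum(low + (i + 0.5) * h for i in range(count))


REQUEST = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <integrate>
      <lowBound>0.0</lowBound>
      <upBound>1.0</upBound>
      <count>4</count>
    </integrate>
  </soap:Body>
</soap:Envelope>"""

response_xml = handle_soap_request(REQUEST, stand_in_driver)
```

All arguments arrive as text, so the type conversion happens on the receiving side; this matches the restriction to primitive argument types noted in the SIDL-to-WSDL discussion.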
Example
[Figure: walkthrough of the architecture. Labels: Composer, gateway port, composition, Factory Service, socket, job table, Job WSDL, Job Proxy, Go, User, SOAP.]
Convert SIDL to WSDL
• SIDL
  • Port interface (methods)
  • Object oriented
  – Port interface
    • A virtual interface
    • Inheritance, polymorphism
    • Can be referred to as a function parameter type
  – No data structures so far
• WSDL
  • PortType (operations)
  • Wire-format description
  – PortType
    • A group of message exchanges
    • No inheritance, no polymorphism
    • Cannot be referred to as a method parameter type
  – Any type is essentially a data structure (by XML Schema)
Challenge: there is no way to figure out the structural information from a SIDL port interface!
Current workaround: only allow methods with primitive argument types
Introducing structures in SIDL will alleviate the problem reasonably
Example
interface IntegratorPort extends gov.cca.Port {
  double integrate(in double lowBound, in double upBound, in int count);
}

<wsdl:message name="integrateInput">
  <wsdl:part name="lowBound" type="xsd:double"/>
  <wsdl:part name="upBound" type="xsd:double"/>
  <wsdl:part name="count" type="xsd:integer"/>
</wsdl:message>
<wsdl:message name="integrateOutput">
  <wsdl:part name="return" type="xsd:double"/>
</wsdl:message>
<wsdl:portType name="integrator.IntegratorPort_PortType">
  <wsdl:operation name="integrate">
    <wsdl:input message="integrateInput"/>
    <wsdl:output message="integrateOutput"/>
  </wsdl:operation>
</wsdl:portType>
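A conversion restricted to primitive argument types, as in the current workaround, might look like the sketch below. It is not the converter used in this work: the regular-expression parsing, the function names, and the `package` parameter (used to form the portType name) are assumptions, and the type table only covers the two types that appear in the example, mapped the same way the example maps them.

```python
import re

# Mapping taken from the example: double -> xsd:double, int -> xsd:integer.
XSD_TYPES = {"double": "xsd:double", "int": "xsd:integer"}
METHOD = re.compile(r"(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;")


def xsd_type(sidl_type: str) -> str:
    if sidl_type not in XSD_TYPES:
        # The workaround: anything that is not a known primitive is rejected.
        raise ValueError(f"non-primitive SIDL type not supported: {sidl_type}")
    return XSD_TYPES[sidl_type]


def sidl_to_wsdl(sidl: str, package: str) -> str:
    match = re.search(r"interface\s+(\w+)", sidl)
    if match is None:
        raise ValueError("no interface declaration found")
    interface = match.group(1)
    messages, operations = [], []
    for return_type, method, params in METHOD.findall(sidl):
        messages.append(f'<wsdl:message name="{method}Input">')
        for param in (p.strip() for p in params.split(",") if p.strip()):
            mode, sidl_type, name = param.split()
            if mode != "in":
                raise ValueError(f"only 'in' parameters are handled: {param}")
            messages.append(f'  <wsdl:part name="{name}" type="{xsd_type(sidl_type)}"/>')
        messages.append("</wsdl:message>")
        messages.append(f'<wsdl:message name="{method}Output">')
        messages.append(f'  <wsdl:part name="return" type="{xsd_type(return_type)}"/>')
        messages.append("</wsdl:message>")
        operations.append(f'  <wsdl:operation name="{method}">')
        operations.append(f'    <wsdl:input message="{method}Input"/>')
        operations.append(f'    <wsdl:output message="{method}Output"/>')
        operations.append("  </wsdl:operation>")
    port_type = [f'<wsdl:portType name="{package}.{interface}_PortType">']
    port_type += operations + ["</wsdl:portType>"]
    return "\n".join(messages + port_type)


SIDL = """interface IntegratorPort extends gov.cca.Port {
  double integrate(in double lowBound, in double upBound, in int count);
}"""
wsdl = sidl_to_wsdl(SIDL, "integrator")
```

A parameter whose type is itself an interface or an array has no entry in the table and is rejected, which is exactly where structural information would be needed.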
Kepler Web Service Actor
• Kepler provides a general web service actor
• For a method defined in the WSDL
  – The actor dynamically adjusts its input/output settings
Kepler CCA-Service Actor
• For CCA-Service
  – Recall that we have two explicit steps
  – The JobProxy service is dynamically created
  – We need to hide the procedure of creating the JobProxy service from the user
• CCA-Service Actor
  – Extended from the web service actor
  – First calls the JobFactory service to create the JobProxy service
  – With the WSDL of the JobProxy, it does the same thing as a general web service actor
Change the GUI from socket-stream based to SOAP-message based.
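The way the CCA-Service actor hides the creation step can be sketched as a subclass that calls the factory before configuring itself. This is not Kepler's Java actor API: both classes, the `create_job` and `read_operations` callbacks, and the stub return values are hypothetical.

```python
from typing import Callable, Dict, List


class WebServiceActor:
    """Configures its ports from the operations found in a WSDL."""

    def __init__(self, wsdl_url: str,
                 read_operations: Callable[[str], Dict[str, List[str]]]) -> None:
        self.wsdl_url = wsdl_url
        self.operations = read_operations(wsdl_url)

    def input_ports(self, operation: str) -> List[str]:
        return self.operations[operation]


class CCAServiceActor(WebServiceActor):
    """First asks the factory for a JobProxy, then acts as a plain web service actor."""

    def __init__(self, create_job: Callable[[str, List[str]], str],
                 gateway_port: str, composition: List[str],
                 read_operations: Callable[[str], Dict[str, List[str]]]) -> None:
        wsdl_url = create_job(gateway_port, composition)  # step 1, hidden from the user
        super().__init__(wsdl_url, read_operations)       # step 2, unchanged behavior


# Stubs standing in for the JobFactory service and for WSDL parsing.
factory_calls: List[str] = []


def fake_create_job(gateway_port: str, composition: List[str]) -> str:
    factory_calls.append(gateway_port)
    return "http://localhost:8080/jobs/job1?wsdl"


def fake_read_operations(wsdl_url: str) -> Dict[str, List[str]]:
    return {"integrate": ["lowBound", "upBound", "count"]}


actor = CCAServiceActor(fake_create_job, "IntegratorPort",
                        ["connect integrator FunctionPort function FunctionPort"],
                        fake_read_operations)
```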
Conclusion
• A hybrid decomposition scheme for scientific applications
• The workflow scheme is used first, based on the resource distribution
• The component scheme is used to further decompose the core parts
• The web service interface is the key to the integration
• CCA integrates into Kepler as a special actor, with a GUI supporting a unified visual environment
• Converting SIDL to WSDL is inherently challenging; structures are useful for distributed systems, so we need to introduce them into SIDL
Thanks
• Thanks to the reviewers for their valuable comments