your name here
Beyond Workflows - DOE Cloud Computing Paradigm and the
SDM Role and Future
Mladen A. Vouk, Nagiza Samatova, Paul Breimyer, Pierre Mouallem, Mei Nagappan,
and the whole SPA team (list available separately)
Scientific Data Management Center – Scientific Process Automation Group
NC State University, Raleigh, NC 27695
Overview
• Scientific workflow technology: a success story from
the past 7 years in the SDM center (a technology used
in production or otherwise by application scientists).
Developed components: workflows, provenance,
“dashboard”, and others
• DOE SDM “cloud”: a vision for the future of the SDM
center. Integration of components: intelligent
analytics and social networks, a component-based
“cloud”, integrated services (service-oriented
architecture)
• Sustainable science: a long-term approach for the
survival of SDM center technology (beyond SciDAC and
longer). Integration of research, engineering,
transfer of technology, partnerships, and results (ROI,
TOC)
Scientific Process Automation
• A key differentiating element of a successful
information technology (IT) is its ability to become
a true, valuable, and economical contributor to
cyberinfrastructure.
• An IT-assisted workflow represents a series of
structured activities and computations that arise in
information-assisted problem solving.
• Scientific process automation principles, together with
production-level pilots, have been the SDM center's key
contribution over the last 7 years (Smoky Mountains retreat).
• From NC State: numerous publications, 3 graduated
PhD and 4 MS (thesis) students, several more in
progress, and several generations of software.
Environment
[Figure: the workflow environment. Components: Computations; Orchestration (Kepler); Data, Databases; Provenance; Storage; Analytics; Control Panels (Dashboard) & Display; Networking; Local/Remote “Cloud” Services.]
Workflow Framework
[Figure: workflow framework orchestrated by Kepler. Provenance, Tracking & Meta-Data (DBs and Portals); Control Plane (light data flows); Execution Plane (“heavy lifting” computations and flows); synchronous or asynchronous operation.]
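To make the control-plane/execution-plane split concrete, here is a minimal Python sketch under our own assumptions; it is not Kepler's API. Each hypothetical `Actor` does its heavy work in the execution plane, while the hypothetical `run_workflow` sends only small digests and timings to a `ProvenanceStore`, mirroring the light data flows of the control plane.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Actor:
    """A workflow step (hypothetical stand-in for a Kepler actor)."""
    name: str
    fn: Callable[[Any], Any]  # the "heavy lifting" done in the execution plane


@dataclass
class ProvenanceStore:
    """Collects light-weight metadata about each step (control-plane data)."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, **entry: Any) -> None:
        self.records.append(entry)


def digest(value: Any) -> str:
    """Short fingerprint of a value, so provenance stays small."""
    blob = json.dumps(value, sort_keys=True, default=str).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


def run_workflow(actors: List[Actor], data: Any, prov: ProvenanceStore) -> Any:
    """Run actors in sequence; only digests and timings go to provenance."""
    for actor in actors:
        start = time.time()
        result = actor.fn(data)
        prov.record(actor=actor.name, input=digest(data),
                    output=digest(result), seconds=time.time() - start)
        data = result
    return data


if __name__ == "__main__":
    prov = ProvenanceStore()
    pipeline = [Actor("simulate", lambda n: list(range(n))),
                Actor("reduce", sum)]
    print(run_workflow(pipeline, 5, prov))  # 10
    for rec in prov.records:
        print(rec["actor"], rec["input"], "->", rec["output"])
```

Keeping only digests in the provenance record is a design choice for the sketch: the bulk data stays in the execution plane, while the control plane can still tell which inputs produced which outputs.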
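Actor/Process in a Broader Sense
[Figure: an actor takes input (In), does its work over the network/“cloud”, and returns output (Out).]
Example: a batch job submitted over the network to LSF:

    bsub < code_run

where code_run is a script:

    #!/bin/csh
    # keep all #BSUB directives ahead of the first command
    #BSUB -W 5
    #BSUB -n 100
    #BSUB -o /share/vouk/WFLOW/code.out.%J
    #BSUB -e /share/vouk/WFLOW/code.err.%J
    #BSUB -J codevouk
    # -W: run-time limit (minutes); -n: number of tasks;
    # -o/-e: stdout/stderr files (%J = job ID); -J: job name
    source /usr/local/lsf/conf/cshrc.lsf
    mpiexec ./code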
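The batch job can itself be treated as one actor in a larger workflow. The Python sketch below is hypothetical: `lsf_script` and `submit` are illustrative names, their parameters mirror the code_run example, and `submit` pipes the script to `bsub` on standard input (as `bsub < code_run` does) only when `dry_run` is turned off.

```python
import subprocess
from typing import Optional


def lsf_script(command: str, cores: int, minutes: int, job_name: str,
               out_dir: str) -> str:
    """Build a csh job script; all #BSUB directives precede the commands."""
    return "\n".join([
        "#!/bin/csh",
        f"#BSUB -W {minutes}",
        f"#BSUB -n {cores}",
        f"#BSUB -o {out_dir}/code.out.%J",
        f"#BSUB -e {out_dir}/code.err.%J",
        f"#BSUB -J {job_name}",
        "source /usr/local/lsf/conf/cshrc.lsf",
        f"mpiexec {command}",
        "",
    ])


def submit(script: str, dry_run: bool = True) -> Optional[str]:
    """Send the script to bsub on stdin; in dry-run mode just print it."""
    if dry_run:
        print(script)
        return None
    done = subprocess.run(["bsub"], input=script, text=True,
                          capture_output=True, check=True)
    return done.stdout  # bsub's confirmation message, including the job ID


if __name__ == "__main__":
    submit(lsf_script("./code", 100, 5, "codevouk", "/share/vouk/WFLOW"))
```

The dry-run default lets the actor be tested on machines without LSF; a workflow engine would call `submit(..., dry_run=False)` on the cluster front end.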
Modular Framework
[Figure: modular framework. Supercomputers + Analytics Nodes; Kepler (Orchestration); Dash; Storage; Meta-Data about processes, data, workflows, system, apps & environment; Auth; DataStore; RecAPI; DispAPI; Management API; Access; Trust.]
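The RecAPI/DispAPI boxes suggest separate recording and display paths over a shared meta-data store. Below is a minimal in-memory sketch, assuming that RecAPI writes are gated by an authorization token and that DispAPI is read-only; the class, method names, and category labels are all hypothetical.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Set

# Hypothetical category labels, loosely following the slide's list
CATEGORIES = {"process", "data", "workflow", "system", "environment"}


class MetaDataStore:
    """Hypothetical store behind the RecAPI/DispAPI boxes."""

    def __init__(self, writers: Set[str]) -> None:
        self._writers = set(writers)  # "Auth": tokens allowed to record
        self._items: DefaultDict[str, List[Dict[str, Any]]] = defaultdict(list)

    # RecAPI: recording side, used by workflows and run-time services
    def record(self, token: str, category: str, **attrs: Any) -> None:
        if token not in self._writers:
            raise PermissionError("token not authorized to record")
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self._items[category].append(dict(attrs))

    # DispAPI: read-only side, used by the dashboard; returns copies
    def display(self, category: str, **match: Any) -> List[Dict[str, Any]]:
        items = self._items.get(category, [])
        return [dict(item) for item in items
                if all(item.get(k) == v for k, v in match.items())]


if __name__ == "__main__":
    store = MetaDataStore(writers={"kepler-token"})
    store.record("kepler-token", "workflow", name="fusion-pipeline",
                 status="running")
    print(store.display("workflow", status="running"))
```

Returning copies from the display path keeps a dashboard from mutating recorded meta-data, which is one simple way to honor the read-only role of DispAPI.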
Read More …
• M. P. Singh and M. A. Vouk, "Network Computing," in J. G. Webster (ed.),
Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, New York, Vol.
14, pp. 114-132, 1999.
• S. Klasky, M. Beck, V. Bhat, E. Feibush, B. Ludäscher, M. Parashar, A. Shoshani, D. Silver and
M. Vouk, "Data management on the fusion computational pipeline," SciDAC 2005, Journal of
Physics: Conference Series 16 (2005), pp. 510-520, doi:10.1088/1742-6596/16/1/070.
• I. Altintas, O. Barney, Z. Cheng, T. Critchlow, B. Ludäscher,
S. Parker, A. Shoshani and M. Vouk, "Accelerating the scientific exploration
process with scientific workflows," SciDAC 2006, Journal of Physics: Conference Series 46
(2006), pp. 468-478, doi:10.1088/1742-6596/46/1/065.
• M. A. Vouk, I. Altintas, R. Barreto, J. Blondin, Z. Cheng, T. Critchlow, A. Khan, S. Klasky,
J. Ligon, B. Ludäscher, P. A. Mouallem, S. Parker, N. Podhorszki, A. Shoshani, C. Silva,
"Automation of Network-Based Scientific Workflows," Proc. of the IFIP WoCo 9 on Grid-based
Problem Solving Environments: Implications for Development and Deployment of
Numerical Software, IFIP WG 2.5 on Numerical Software, Prescott, AZ, 2006; printed in
IFIP Vol. 239, "Grid-Based Problem Solving Environments," eds. P. W. Gaffney and J. C. T. Pool
(Boston: Springer), pp. 35-61, 2007.
• S. Klasky, R. Barreto, A. Kahn, M. Parashar, N. Podhorszki, S. Parker, D. Silver and
M. A. Vouk, "Collaborative visualization spaces for petascale simulations," Proc. of the
CTS 2008 International Symposium on Collaborative Technologies and Systems, pp. 203-211,
10-23 May 2008, doi:10.1109/CTS.2008.4543933.
• More… http://sdm.ncsu.edu
DOE Cloud
• “Cloud” computing builds on decades of research in
virtualization, distributed computing, utility computing, grids, and, more
recently, networking, web and software services.
• It implies a seamless, service-oriented, component-based
architecture: on-demand delivery of an integrated and orchestrated
suite of functions to an end user through composition of
both loosely and tightly coupled functions, or services, that are often
network-based. Its hallmarks include reduced information technology
overhead for the end user, service orchestration, virtualization of
resources, great flexibility, reduced total cost of
ownership, and different “flavors”.
• Intelligent analytics and knowledge-creating social
networks, component-based “clouds”,
seamless/integrated services
• Necessary in the context of peta- and exascale science, data,
etc.
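Composition of on-demand services can be illustrated with a small registry. This is a sketch under our own assumptions (the `ServiceRegistry` class and the service names are hypothetical): services are looked up by name when the composed pipeline runs, so a re-registered implementation is picked up without rebuilding the pipeline, a simple form of loose coupling.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Service = Callable[[Any], Any]


class ServiceRegistry:
    """Hypothetical catalog of on-demand services, keyed by name."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, fn: Service) -> None:
        """Add or replace a service implementation."""
        self._services[name] = fn

    def compose(self, names: List[str]) -> Service:
        """Chain services left to right; lookup happens at call time."""
        steps = list(names)  # copy so later edits to `names` do not leak in
        missing = [n for n in steps if n not in self._services]
        if missing:
            raise KeyError(f"unknown services: {missing}")

        def pipeline(x: Any) -> Any:
            return reduce(lambda acc, n: self._services[n](acc), steps, x)

        return pipeline


if __name__ == "__main__":
    reg = ServiceRegistry()
    reg.register("clean", lambda xs: [x for x in xs if x is not None])
    reg.register("summary", lambda xs: sum(xs) / len(xs))
    analyze = reg.compose(["clean", "summary"])
    print(analyze([1, None, 3]))  # 2.0
    reg.register("summary", max)  # swap in another implementation
    print(analyze([1, None, 3]))  # 3
```

A tightly coupled variant would capture the function objects in `compose` instead of their names; the late lookup used here trades a little speed for the ability to replace services at run time.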
“Analytics Cloud"
Knowledge creation& Integration,
Social Networking,Provenance,Tracking &Meta-Data
(DBs and Portals)
ExecutionPlane - “Heavy duty” in-cloudComputations, Flows Services
W/FEngine
Concept-driven Analytics
W/F GenerationWizard
Run-time Manager and Scheduler
Synchronous & Asynchronous Services
Workflow control plane
Analytics Enabled ResourcesSupercomputers ClustersSupercomputers Active
StorageOther “cloud” devices
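One way a run-time manager might mix synchronous and asynchronous services is sketched below with Python's `asyncio`. The function names are hypothetical: `run_sync` chains stages that each wait for the previous result, `run_async` fans out to independent analytics services that run concurrently, and `asyncio.sleep` stands in for a long-running computation.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

AsyncService = Callable[[Any], Awaitable[Any]]


async def run_sync(stages: List[AsyncService], data: Any) -> Any:
    """Synchronous chain: each stage waits for the previous result."""
    for stage in stages:
        data = await stage(data)
    return data


async def run_async(services: Dict[str, AsyncService],
                    data: Any) -> Dict[str, Any]:
    """Asynchronous fan-out: independent services run concurrently."""
    names = list(services)
    results = await asyncio.gather(*(services[n](data) for n in names))
    return dict(zip(names, results))


async def demo() -> Dict[str, Any]:
    async def simulate(n: int) -> List[int]:
        await asyncio.sleep(0.01)  # stands in for a long-running job
        return list(range(n))

    async def total(xs: List[int]) -> int:
        return sum(xs)

    async def peak(xs: List[int]) -> int:
        return max(xs)

    output = await run_sync([simulate], 5)
    return await run_async({"total": total, "peak": peak}, output)


if __name__ == "__main__":
    print(asyncio.run(demo()))  # {'total': 10, 'peak': 4}
```

`asyncio.gather` returns results in the order the awaitables were passed, which is why zipping them back with `names` is safe.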
Components
• Reusability (elements can be reused in other workflows),
• Substitutability (alternative implementations are easy to insert, very
precisely specified interfaces are available, run-time component
replacement mechanisms exist, substitutions can be verified and
validated, etc.),
• Extensibility and scalability (ability to readily extend the system
component pool and to scale it, increase the capabilities of
individual components, have an extensible and scalable architecture
that can automatically discover new functionalities and resources, etc.),
• Customizability (ability to customize generic features to the needs of a
particular scientific domain and problem),
• Composability (easy construction of more complex functional solutions
from basic components, reasoning about such compositions, etc.).
Other very important characteristics include:
• Reliability and availability of the components and services,
• Cost (the cost of the services, total cost of ownership, economy of
scale),
• Security and privacy, and so on.
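Substitutability and composability both rest on precisely specified interfaces. The sketch below uses a Python `typing.Protocol` as that interface; the `Reducer`, `Mean`, `Median`, `summarize`, and `validate` names are hypothetical. Any component with a matching `reduce` method can be plugged into `summarize`, and `validate` checks a substitute against known answers before it replaces another one.

```python
from typing import List, Protocol, Sequence


class Reducer(Protocol):
    """Interface: any component with this method is substitutable."""
    def reduce(self, values: Sequence[float]) -> float: ...


class Mean:
    def reduce(self, values: Sequence[float]) -> float:
        return sum(values) / len(values)


class Median:
    def reduce(self, values: Sequence[float]) -> float:
        ordered = sorted(values)
        mid = len(ordered) // 2
        if len(ordered) % 2:
            return ordered[mid]
        return (ordered[mid - 1] + ordered[mid]) / 2


def summarize(batches: List[Sequence[float]], reducer: Reducer) -> List[float]:
    """Composed component: works with whichever Reducer is plugged in."""
    return [reducer.reduce(batch) for batch in batches]


def validate(candidate: Reducer, cases: List[Sequence[float]],
             expected: List[float], tol: float = 1e-9) -> bool:
    """Check a substitute against known answers before swapping it in."""
    if len(cases) != len(expected):
        raise ValueError("cases and expected answers must pair up")
    return all(abs(candidate.reduce(c) - e) <= tol
               for c, e in zip(cases, expected))


if __name__ == "__main__":
    batches = [[1.0, 2.0, 9.0], [4.0, 4.0, 4.0]]
    print(summarize(batches, Mean()))    # [4.0, 4.0]
    print(summarize(batches, Median()))  # [2.0, 4.0]
    print(validate(Median(), [[1.0, 3.0]], [2.0]))  # True
```

Because `Protocol` uses structural typing, `Mean` and `Median` need no common base class; a static type checker still flags a component whose `reduce` signature does not match.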
Example: Meta-Data Framework
[Figure: meta-data framework. Supercomputers + Analytics; Kepler?; Dash; Storage; Orchestration; Auth; DB; RecAPI; DispAPI; CustomWeb; Other...]
Fault-Tolerance – Clouds of Clouds
[Figure: Master DB (replicated).]
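A replicated master DB can be sketched as majority-quorum writes and reads over several copies. This is an illustrative Python sketch, not the design shown on the slide: `Replica` and `ReplicatedDB` are hypothetical, a single writer assigns version numbers, and because any write majority and any read majority overlap, a read returns the newest value even when one replica is down.

```python
from typing import Dict, List, Optional, Tuple

Versioned = Tuple[int, str]


class Replica:
    """In-memory stand-in for one copy of the master DB (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.data: Dict[str, Versioned] = {}

    def _check(self) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")

    def put(self, key: str, version: int, value: str) -> None:
        self._check()
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key: str) -> Optional[Versioned]:
        self._check()
        return self.data.get(key)


class ReplicatedDB:
    """Majority-quorum writes and reads; a single writer assigns versions."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.quorum = len(replicas) // 2 + 1
        self.version = 0

    def put(self, key: str, value: str) -> None:
        self.version += 1
        acks = 0
        for replica in self.replicas:
            try:
                replica.put(key, self.version, value)
                acks += 1
            except ConnectionError:
                continue  # a real system would schedule repair of this copy
        if acks < self.quorum:
            # partial writes are not rolled back in this sketch
            raise RuntimeError(f"write reached only {acks} replicas")

    def get(self, key: str) -> Optional[str]:
        answers: List[Optional[Versioned]] = []
        for replica in self.replicas:
            try:
                answers.append(replica.get(key))
            except ConnectionError:
                continue
        if len(answers) < self.quorum:
            raise RuntimeError("read quorum not reached")
        found = [a for a in answers if a is not None]
        return max(found)[1] if found else None  # highest version wins


if __name__ == "__main__":
    copies = [Replica("a"), Replica("b"), Replica("c")]
    db = ReplicatedDB(copies)
    db.put("run42", "queued")
    copies[0].up = False
    db.put("run42", "done")  # replica "a" misses this write
    copies[0].up, copies[1].up = True, False
    print(db.get("run42"))  # done
```

With three replicas the quorum is two, so the system tolerates one failed copy for both reads and writes; the stale value on replica "a" is outvoted by the higher version on "c".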
User Categories
• Developers (10)
• Service Authors (100 to 1,000)
• Service Integrators (100 to 10,000)
• End-users (1,000 to ?)
Read More …
• S. Averitt, M. Bugaev, A. Peeler, H. Shaffer, E. Sills, S. Stein, J. Thompson and M. Vouk,
"Virtual Computing Laboratory (VCL)," Proc. of the International Conference
on Virtual Computing Initiative, May 7-8, 2007, IBM Corp., Research
Triangle Park, NC, pp. 1-16.
• M. Vouk, S. Averitt, M. Bugaev, A. Kurth, A. Peeler, A. Rindos, H. Shaffer, E. Sills,
S. Stein and J. Thompson, "'Powered by VCL' - Using Virtual Computing Laboratory (VCL)
Technology to Power Cloud Computing," Prelim. Proceedings of the 2nd International
Conference on Virtual Computing Initiative, 15-16 May 2008, RTP, NC, pp. 1-10; final
version to be available through the ACM Digital Library.
• M. A. Vouk, "Cloud Computing – Issues, Research and
Implementations," ITI 2008, to appear in IEEE Digital Library.
• Google for “cloud computing” …
• Other …
Sustainable Science
• A long-term approach for the survival of SDM
center technology (beyond SciDAC and longer)
• Research
• Engineering
• Transfer of technology
• Partnerships with scientists
• Operational open-source tools
• Visible results (agreed-upon ROI and an
accounting of TOC)