TRANSCRIPT
Knowledge-based Information/Data-Driven Science and the Cyberinfrastructure: Opportunities for Collaboration at Rutgers
Manish Parashar
The Applied Software Systems Laboratory
ECE/CAIP, Rutgers University
http://www.caip.rutgers.edu/TASSL
Outline of this presentation
• Cyberinfrastructure - Unprecedented Opportunities
– seamless access, aggregation, interactions …
• Cyberinfrastructure - Unprecedented Complexity
– scale, heterogeneity, dynamism, uncertainty
• Knowledge-based, data-driven science and the Cyberinfrastructure
– research challenges and opportunities
• Enabling opportunities at Rutgers University
– seeding, supporting multidisciplinary computational science
• Concluding remarks
Cyberinfrastructure - Unprecedented Opportunities
• “Cyberinfrastructure integrates hardware for computing, data and networks, digitally-enabled sensors, observatories and experimental facilities, and an interoperable suite of software and middleware services and tools…”
- NSF’s Cyberinfrastructure Vision for 21st Century Discovery
• A global phenomenon
– “Strategic Roadmap for the National Collaborative Research Infrastructure Strategy,” Australian Ministry for Education, Science and Training
– “Grids and Basic Research Programs,” Organization for Economic Co-operation and Development (OECD), Paris, France
• A new paradigm for planetary-scale investigations?
– seamless access: resources, services, data, information, expertise, …
– seamless aggregation
– seamless (opportunistic) interactions/couplings
Emerging Computational Science & Engineering
“If you want to understand life, don’t think about vibrant, throbbing gels and oozes, think about information technology”
- Richard Dawkins, The Blind Watchmaker, 1986
“Science and engineering research and education are foundational drivers of cyberinfrastructure.”
- NSF’s Cyberinfrastructure Vision for 21st Century Discovery
• Information/systems technology and high performance applications have become critical research modalities in science and engineering
– conceptual and numerical models
– computational and computer science
– information and infrastructure
• Knowledge-based, data-driven scientific/engineering investigation can provide dramatic insights
– symbiotically and opportunistically combine computations, experiments, observations, and real-time data to model, manage, control, adapt, optimize, …
Knowledge-based, Information/Data-driven Scientific Investigation
[Figure: the knowledge-based, data-driven investigation cycle. Experts interact and collaborate through ubiquitous, pervasive portals (laptop, PDA) to query and configure resources. Models are dynamically composed; “web services” are discovered and invoked; resources (computers, storage, instruments, …) are discovered, negotiated, and co-allocated on the fly, and the model/simulation is deployed. Real-time data is injected from sensors, laboratory investigations, instruments, data archives, and non-traditional data sources; automated mining matches real-time data with history, and models write back into the archive. Experts monitor, interact with, interrogate, and steer the models (“what if” scenarios, …), and the application notifies experts of interesting phenomena.]
Examples of Application Areas
• Hazard prevention, mitigation and response
– earthquakes, hurricanes, tornados, wild fires, floods, landslides, tsunamis, terrorist attacks
• Critical infrastructure systems
– condition monitoring and prediction of future capability
• Transportation of humans and goods
– safe, speedy, and cost effective transportation networks and vehicles (air, ground, space)
• Energy and environment
– safe and efficient power grids; safe and efficient operation of regional collections of buildings
• Health
– reliable and cost effective health care systems with improved outcomes
• Enterprise-wide decision making
– coordination of dynamic distributed decisions for supply chains under uncertainty
• Next generation communication systems
– reliable wireless networks for homes and businesses
• Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org
Source: M. Rotea, NSF
The Cyberinfrastructure – Unprecedented Challenges
• Unprecedented complexity and challenges
– very large scales
– ad hoc (amorphous) structures/behaviors of virtual organizations
• p2p/hierarchical architectures
– dynamic
• entities join, leave, move, change behavior
– heterogeneous
• capability, connectivity, reliability, guarantees, QoS
– unreliable, lack of guarantees
• components, communication
– lack of common/complete knowledge
• number, type, location, availability, connectivity, protocols, semantics, etc.
“Grid Computing: An Evolving Vision,” M. Parashar and C. Lee, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 93, No. 3, March 2005.
Computational Modeling of Natural Phenomena
• Realistic, physically accurate computational modeling
– enormous computation requirements
• turbulent flow simulation using active flow control for biomedical engineering requires 5000 × 1000 × 500 = 2.5·10^9 grid points and approximately 10^7 time steps; with 1 GFlop/s processors this means a runtime of ~7·10^6 CPU-hours, or about one month on 10,000 CPUs (assuming perfect speedup). At 700 B/point, the memory requirement is ~1.75 TB of runtime memory and ~800 TB of storage.
• simulation of the core collapse of supernovae in 3D with reasonable resolution (500^3) would require ~20 teraflops for 1.5 months and about 200 terabytes of storage
– dynamic and complex couplings
• multi-physics, multi-model, multi-resolution, …
– dynamic and complex (ad hoc, opportunistic) interactions
• application ↔ application, application ↔ resource, application ↔ data, application ↔ user, …
– software/systems engineering issues
• volume and complexity of code, community of developers, …
• scores of models, hundreds of components, millions of lines of code, …
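The turbulent-flow cost figures above follow from simple arithmetic, sketched below. The ~1,000 flops per grid point per time step is an assumption reverse-engineered from the quoted 7·10^6 CPU-hour total; the other inputs come from the slide.

```python
# Back-of-envelope cost model for the turbulent-flow example above.
# ASSUMPTION: ~1,000 flops per grid point per time step, inferred
# from the quoted totals; grid size, step count, per-point memory,
# and processor speed are the slide's numbers.
def simulation_cost(nx, ny, nz, steps, flops_per_point=1000,
                    gflops_per_cpu=1.0, bytes_per_point=700):
    points = nx * ny * nz
    total_flops = points * steps * flops_per_point
    cpu_hours = total_flops / (gflops_per_cpu * 1e9) / 3600.0
    memory_tb = points * bytes_per_point / 1e12
    return cpu_hours, memory_tb

cpu_hours, memory_tb = simulation_cost(5000, 1000, 500, 10**7)
print(f"~{cpu_hours:.1e} CPU-hours, {memory_tb:.2f} TB resident memory")
print(f"~{cpu_hours / 10_000 / 24:.0f} days on 10,000 CPUs")
```

With these inputs the model recovers the slide's ~7·10^6 CPU-hours, 1.75 TB of memory, and roughly a month on 10,000 CPUs.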
Enabling Computational Science on CI: A Convergence of Biology and Information Technology
• Computing has evolved and matured to provide specialized solutions that satisfy relatively narrow and well defined requirements in isolation
– performance, security, dependability, reliability, availability, throughput, pervasive/amorphous behavior, automation, reasoning, etc.
– current programming paradigms, middleware/systems software, and management tools are inadequate to handle the scale, complexity, dynamism and heterogeneity of emerging systems
• In emerging applications/environments, requirements, objectives, and execution contexts are dynamic and not known a priori
– requirements, objectives and the choice of specific solutions (algorithms, behaviors, interactions, etc.) depend on runtime state, context, and content
– applications should be aware of changing requirements and execution contexts and respond to these changes at runtime
• Convergence of Biology and Information Technology – Autonomic Computing
– nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability, and lack of guarantees
• self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work!
• The goal of autonomic computing is to build systems/applications that manage themselves using high-level guidance from humans
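The self-managing behavior described above is commonly structured as a monitor-analyze-plan-execute loop: observe system state, compare it against a high-level goal supplied by a human, and adapt. The sketch below is a generic illustration with hypothetical state fields and thresholds, not part of AutoMate.

```python
# Minimal sketch of an autonomic (self-optimizing) control loop over a
# single load metric. All names and thresholds here are hypothetical;
# the human supplies only the high-level goal (target utilization).
def autonomic_step(state, goal_load=0.75):
    """One monitor-analyze-plan-execute cycle."""
    load = state["load"]                             # monitor
    if load > goal_load:                             # analyze: overloaded
        plan = {"workers": state["workers"] + 1}     # plan: scale out
    elif load < goal_load / 2 and state["workers"] > 1:
        plan = {"workers": state["workers"] - 1}     # plan: scale in
    else:
        plan = {"workers": state["workers"]}         # steady state
    state.update(plan)                               # execute
    return state

state = {"load": 0.9, "workers": 4}
state = autonomic_step(state)
print(state["workers"])  # scaled out to 5
```

A real autonomic manager would monitor far richer state and consult explicit policies, but the loop shape is the same.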
Project AutoMate: Enabling Autonomic Applications (http://automate.rutgers.edu)
• Conceptual models and implementation architectures for Autonomic Computing
– programming systems based on popular programming models
• object, component and service based prototypes
– content-based coordination and messaging middleware
– amorphous and emergent overlays
[Architecture figure: the AutoMate stack, top to bottom:
• Autonomic Grid Applications
• Programming System – autonomic components, dynamic composition, opportunistic interactions, collaborative monitoring/control
• Decentralized Coordination Engine – agent framework, decentralized reactive tuple space
• Semantic Middleware Services – content-based discovery, associative messaging
• Content Overlay – content-based routing engine, self-organizing overlay
Cross-cutting elements: Accord programming framework; Rudder coordination middleware; Meteor/Squid content-based middleware; Sesame/DAIS protection service; ontology and taxonomy.]
“AutoMate: Enabling Autonomic Grid Applications,” M. Parashar et al, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Special Issue on Autonomic Computing, Kluwer Academic Publishers. Vol. 9, No. 2, pp. 161 – 174, 2006.
Project AutoMate: Components
• Accord – a programming system for autonomic Grid applications
• Rudder/Comet – decentralized coordination middleware
• ACE – autonomic composition engine
• Meteor – content-based middleware
• Squid – decentralized information discovery and content-based routing
• SESAME – context-aware access management
• DAIS – cooperative protection against network attacks
• More information/papers: http://automate.rutgers.edu
The Instrumented Oil Field of the Future (UT-CSM, UT-IG, RU, OSU, UMD, ANL) (NSF ITR 01, 04)
• Detect and track changes in data during production; invert data for reservoir properties; detect and track reservoir changes
• Assimilate data and reservoir properties into the evolving reservoir model
• Use simulation and optimization to guide future production and future data acquisition strategy
• Production of oil and gas can take advantage of permanently installed sensors that monitor the reservoir's state as fluids are extracted
• Knowledge of the reservoir's state during production can result in better engineering decisions
– economical evaluation; physical characteristics (bypassed oil, high pressure zones); production techniques for safe operating conditions in complex and difficult areas
“Application of Grid-Enabled Technologies for Solving Optimization Problems in Data-Driven Reservoir Studies,” M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz and M Wheeler, FGCS. The International Journal of Grid Computing: Theory, Methods and Applications (FGCS), Elsevier Science Publishers, Vol. 21, Issue 1, pp 19-26, 2005.
Effective Oil Reservoir Management: Well Placement/Configuration
• Why is it important
– better utilization of existing reservoirs
– discovering new reservoirs
– minimizing adverse effects to the environment
[Figure: a better well placement leaves less bypassed oil; a bad placement leaves much bypassed oil.]
Effective Oil Reservoir Management: Well Placement/Configuration
• What needs to be done
– exploration of possible well placements and configurations for optimized production strategies
– understanding field properties and interactions between and across subdomains
– tracking and understanding long term changes in field characteristics
• Challenges
– geologic uncertainty: key engineering properties unattainable
– large search space: infinitely many production strategies possible
– complex physical properties and interactions; complex numerical models
Autonomic Well Placement/Configuration
[Workflow figure: DISCOVER clients generate guesses and send them to the Optimization Service (SPSA, VFSA, or exhaustive search). If a guess is already in the MySQL database, the response is sent back to the clients and a new guess is obtained from the optimizer; if not, the IPARS Factory instantiates IPARS with the guess as a parameter, parallel IPARS instances start and connect to DISCOVER, DISCOVER notifies the clients, and the clients interact with IPARS. Everything runs on the AutoMate programming system/Grid middleware, fed by history/archived data and sensor data.]
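The guess-evaluation workflow above amounts to a memoized optimization loop: only guesses not already in the database trigger a new simulator run. The sketch below is a hypothetical stand-in, where `propose_guess` plays the role of the SPSA/VFSA/exhaustive-search optimizer, `run_simulation` an IPARS run, and a plain dict the MySQL archive.

```python
import random

# Hypothetical sketch of the cached guess-evaluation loop: guesses
# already archived are answered from the cache; cache misses start a
# (stand-in) simulator run whose result is written back to the archive.
def run_optimization(propose_guess, run_simulation, n_iters=100):
    cache = {}                                # stands in for the MySQL DB
    best_guess, best_value = None, float("inf")
    for _ in range(n_iters):
        guess = propose_guess(best_guess)     # optimizer proposes
        if guess in cache:                    # guess already in DB
            value = cache[guess]
        else:                                 # instantiate simulator
            value = run_simulation(guess)     # e.g. an IPARS instance
            cache[guess] = value              # model writes to archive
        if value < best_value:
            best_guess, best_value = guess, value
    return best_guess, best_value

# Toy usage: 1-D "placement" minimizing (x - 3)^2 over random proposals.
random.seed(0)
propose = lambda best: random.randint(0, 10)
simulate = lambda x: (x - 3) ** 2
best, val = run_optimization(propose, simulate, n_iters=100)
print(best, val)
```

The cache is what lets many clients share one archive of completed runs, as in the figure: a repeated guess costs a lookup instead of a full simulation.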
Autonomic Oil Well Placement/Configuration
[Figure: permeability field and pressure contours for a 3-well, 2D profile; contours of NEval(y, z, 500). Exhaustive search requires NY × NZ (450) evaluations; the minimum appears in the contour plot. The VFSA solution “walk” found it after 20 (81) evaluations.]
Autonomic Oil Well Placement/Configuration (VFSA)
“A Reservoir Framework for the Stochastic Optimization of Well Placement,” V. Matossian, M. Parashar, W. Bangerth, H. Klie, M.F. Wheeler, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Kluwer Academic Publishers (to appear).
“Autonomic Oil Reservoir Optimization on the Grid,” V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa and M.F. Wheeler, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Vol. 17, Issue 1, pp. 1-26, 2005.
[Figures: solutions for 7 different initial guesses; convergence history.]
Autonomic Oil Well Placement/Configuration (SPSA)
Optimal well placement – comparison of optimization approaches (optimal solution: F* = -1.098e8):

Metric                      Nelder-Mead        GA      VFSA      FDSA      SPSA
Best solution                  -1.018e8  -1.073e8  -1.083e8  -1.062e8  -1.075e8
Avg. function evaluations         99.95    104.02      75.5      57.0      37.8
Lessons learned:
• Robust stochastic algorithms increase the chances of finding (near) optimal solutions (VFSA)
• Several trials of a fast algorithm pay off against more sophisticated algorithms (SPSA)
• Need to develop hybrid strategies
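SPSA's low evaluation count in the table comes from estimating the gradient with only two function evaluations per iteration, regardless of problem dimension. The following is a minimal, illustrative sketch of SPSA on a toy quadratic objective, not the IPARS-coupled implementation.

```python
import random

# Minimal SPSA (Simultaneous Perturbation Stochastic Approximation)
# sketch -- illustrative only. Each iteration needs just two
# evaluations of f, however many parameters there are.
def spsa_minimize(f, x0, n_iters=500, a=0.1, c=0.1, seed=1):
    rng = random.Random(seed)
    x = list(x0)
    for k in range(1, n_iters + 1):
        ak = a / k ** 0.602                     # standard gain sequences
        ck = c / k ** 0.101
        delta = [rng.choice((-1.0, 1.0)) for _ in x]   # Bernoulli +/-1
        f_plus = f([xi + ck * di for xi, di in zip(x, delta)])
        f_minus = f([xi - ck * di for xi, di in zip(x, delta)])
        ghat = (f_plus - f_minus) / (2.0 * ck)  # directional slope
        x = [xi - ak * ghat / di for xi, di in zip(x, delta)]
    return x

# Toy quadratic bowl with minimum at (1, -2).
objective = lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2
x_opt = spsa_minimize(objective, [5.0, 5.0])
print([round(v, 2) for v in x_opt])
```

The gain-sequence exponents (0.602 and 0.101) are the standard SPSA defaults; a production run would tune a and c to the problem scale.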
Knowledge-based Data-driven Management of Subsurface Geosystems: The Instrumented Oil Field (NSF KDI, ITR, with UT-CSM, UT-IG, OSU, UMD, ANL)
• Detect and track changes in data during production; invert data for reservoir properties; detect and track reservoir changes
• Assimilate data and reservoir properties into the evolving reservoir model
• Use simulation and optimization to guide future production
[Figure: the cycle couples data-driven and model-driven steps.]
Management of the Ruby Gulch Waste Repository (DoE SciDAC, with UT-CSM, INL, OU, RU)
[Figure: inverse modeling maps system responses to parameters and boundary/initial conditions; forward modeling produces predictions, which are compared with observations; the comparison (good/bad) drives network design and the application.]
• Near-real-time monitoring, characterization and prediction of flow through fractured rocks
Adaptive Fusion of Stochastic Information for Imaging Fractured Vadose Zones (NSF SEIII, with U of AZ, OSU, U of Iowa)
Data-Driven Forest Fire Simulation (DoE SCIDAC, with U of AZ)
• Predict the behavior and spread of wildfires (intensity, propagation speed and direction, modes of spread)
– based on both dynamic and static environmental and vegetation conditions
– factors include fuel characteristics and configurations, chemical reactions, balances between different modes of heat transfer, topography, and fire/atmosphere interactions
“Self-Optimizing of Large Scale Wild Fire Simulations,” J. Yang, H. Chen, S. Hariri and M. Parashar, Proceedings of the 5th International Conference on Computational Science (ICCS 2005), Atlanta, GA, USA, Springer-Verlag, May 2005.
System for Laser Treatment of Cancer – UT, Austin
Source: L. Demkowicz, UT Austin
Synthetic Environment for Continuous Experimentation – Purdue University
Source: A. Chaturvedi, Purdue Univ.
Integrated Wireless Phone Based Emergency Response System – Notre Dame
• Detect abnormal patterns in mobile call activity and locations
• Initiate dynamic data driven simulations to predict the evolution of the abnormality
• Initiate higher resolution data collection in localities of interest
• Interface with emergency response Decision Support Systems
Source: G. Madey, ND
Adaptive Cyberinfrastructure for Threat Management in Urban Water Distribution Systems – NC State Univ.
[Architecture figure: a portal connects experts to three layers. Sensors & data: mobile and static RF AMR sensors, a static water quality sensor network, and an adaptive wireless data receptor and controller exchanging decisions and data. Algorithms & models: a Bayesian Monte-Carlo engine, an optimization engine, and simulation models under adaptive simulation and optimization controllers, exchanging model parameters and model outputs. Middleware & resources: a Grid resource broker and scheduler matching resource needs to resource availability on Grid computing resources, driving an adaptive workflow.]
Source: K. Mahinthakumar, NCSU
Structural Health Monitoring and Critical Event Prediction – Stanford University
[Figure: on-board sensing and processing with on-line prognosis and processing feed damage detection and a pilot display of crisis; material characterization and data for model update feed high-performance off-line prognosis by modeling and simulation, which produces on-demand maintenance and a detailed report.]
Source: C. Farhat, SU
Auto-Steered Information-Decision Processes for Electrical Systems Asset Management – Iowa State Univ.
• Develop a hardware-software prototype for auto-steering the information-decision cycles inherent to managing operations, maintenance, & planning of high-voltage electric power transmission systems.
[Figure: a remote control headquarters reaches field equipment through a wide area network with secure access over satellite, Ethernet radio, optical fiber, power line, or Internet links; a 2.4 or 5 GHz ISM-band wireless sensor network transceiver connects to a local communication box containing a network switch, base station, port server, WAN connection, and data processor.]
Source: J. McCalley, Iowa State
Scientific Investigation and CI: Challenges and Opportunities
• Applications
• Algorithms
• Measurement systems
• Systems software
• Community testbeds
• Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org
Challenges and Opportunities: Applications (I)
• Dynamic and continuous validation of models, algorithms, systems, and (emergent) system of systems
• Uncertainty quantification
– novel methods are needed to effectively quantify end-to-end measures of uncertainty as models, data, and algorithms are cycled in and out
• Dynamic data driven, on demand scaling and resolution
– application driven, on demand scaling and configuration of measurement systems for dynamic multi-resolution data inputs
• Observability, identifiability and tractability
– establishing the pedigree of the data and applications that contributed to a result is critical and challenging
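Plain Monte Carlo sampling is the simplest way to prototype the end-to-end uncertainty quantification described above: push samples of the uncertain inputs through the model and summarize the spread of the output. The model and input distribution below are illustrative placeholders, not from the slides.

```python
import random

# Illustrative Monte Carlo uncertainty propagation: sample an uncertain
# input, run it through a (placeholder) forward model, and report the
# mean and standard deviation of the output.
def propagate_uncertainty(model, sample_input, n_samples=100_000, seed=42):
    rng = random.Random(seed)
    outputs = [model(sample_input(rng)) for _ in range(n_samples)]
    mean = sum(outputs) / n_samples
    std = (sum((y - mean) ** 2 for y in outputs) / (n_samples - 1)) ** 0.5
    return mean, std

model = lambda p: p ** 2                    # placeholder forward model
sample = lambda rng: rng.gauss(2.0, 0.1)    # assumed input: p ~ N(2, 0.1^2)
mean, std = propagate_uncertainty(model, sample)
print(round(mean, 2), round(std, 2))
```

For y = p² with p ~ N(2, 0.1²), theory gives E[y] = 4.01 and a standard deviation of about 0.40, which the sampler reproduces; real end-to-end UQ must also track uncertainty introduced by the model and algorithms themselves.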
Challenges and Opportunities: Applications (II)
• Self-organization of measurement systems, models, applications, workflows, and organizations
• Human-in-the-loop decision-making and distribution of tasks among human and non-human actors
– new theories and paradigms are needed to distribute intelligence and tasks among human and non-human actors
• Community frameworks/tools for rapid experimentation
• Accommodation of a plurality of system demands
– e.g., social/economic impact vs. rapid response; viewpoints from outside or inside the system
Challenges and Opportunities: Algorithms (I)
• Methods related to measurement/data feedback on the computational model
– mechanisms by which computational models respond to information coming from the instrumentation
• examples include new ways of data assimilation (ensemble Kalman-like or hybrid stochastic estimation for messy, incomplete, and possibly out-of-time-order data), uncertainty estimation, adaptive parameter updating, dynamic model selection, and fast nonlinear update methods combined with multiscale interpolation of sensor data to form self-correction methods
• Methods related to computational model feedback on the measurement/data
– mechanisms by which data collection and instrumentation respond to results from the computational models
• examples include instrument placement and control, data relevance assessment, noise quantification and qualification, robust dynamic optimization, sensor steering, and targeted observations/data refinement
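The simplest instance of the assimilation feedback listed above is a scalar Kalman update, which blends a model forecast with a noisy sensor reading according to their respective uncertainties. The numbers below are illustrative.

```python
# Scalar Kalman update: correct a model forecast (variance P) toward a
# sensor observation (variance R). The gain weights whichever source
# is more certain; the analysis variance always shrinks.
def kalman_update(forecast, P, observation, R):
    K = P / (P + R)                        # Kalman gain in [0, 1]
    analysis = forecast + K * (observation - forecast)
    P_analysis = (1 - K) * P               # reduced uncertainty
    return analysis, P_analysis

# A confident sensor (R=1) pulls an uncertain forecast (P=4) most of
# the way toward the observation.
state, var = kalman_update(forecast=10.0, P=4.0, observation=12.0, R=1.0)
print(round(state, 2), round(var, 2))   # 11.6 0.8
```

Ensemble Kalman methods generalize this update to high-dimensional states by estimating P from an ensemble of model runs rather than carrying it explicitly.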
Challenges and Opportunities: Algorithms (II)
• Advances in computational simulation tools for modeling
– new advances in existing computational technologies
• uncertainty estimation and propagation, adaptive discretization schemes, fast linear and nonlinear solvers that enable high-fidelity real-time simulations, fast error estimation, dynamic validation and verification, appropriate couplings for multiscale, multiphysics simulations, and system identification in the context of dynamic data assimilation are examples of simulation-level tools that need to be developed
• Systems and integration of related algorithms
– real time algorithms for managing large data stores, managing processing, parallel prototyping, guaranteed and consistent resources (including networks), dynamic scheduling of processors, networks, Grids, and changes in allocations, process migration, check-pointing, fast visualization techniques, algorithms appropriate for sensor embedding, and secure transmission
Challenges and Opportunities: Measurement Systems
• Components
– instruments, sensors, databases, human inputs and other devices for taking measurements; data quality assessment, data formatting, and feature extraction; data measurement steering
• Issues
– on-demand data collection and management; data streaming into the simulation; data representation, data models and congruence of data; real-time constraints; data processing/preprocessing; data collection rates, consumption rates, available bandwidths and other resources; how to discover and obtain data, establish its authenticity and correctness, and maintain an audit trail
Challenges and Opportunities: Systems Software (I)
• Programming models and systems
– support the formulation of application components and applications that can correctly and consistently adapt their behaviors, interactions and compositions in real time in response to dynamic data and application/system state, while satisfying real time, space, functional, performance, reliability, security, and quality of service constraints
• Data management (acquisition, assimilation, transport, manipulation, exploration) mechanisms and services
– enable seamless acquisition of high-rate, high-volume data from varied, distributed and possibly unreliable data sources, while addressing stringent real time, space and data quality constraints, and enabling the seamless, dynamic integration of this data with computation and resources
Challenges and Opportunities: Systems Software (II)
• Runtime execution and middleware services
– support dynamic, data and knowledge driven, time constrained executions, adaptations, interactions and compositions for application elements, while guaranteeing reliable and resilient execution and predictable, controllable response times
• Computation infrastructures
– support immediate on-demand and anticipatory resource co-allocation and co-scheduling; dynamic, secure and seamless aggregation of, and interactions between, distributed resources and varied data sources; dynamic configuration and instantiation and reliable, traceable execution of application workflows; real time and soft real time QoS guarantees; pervasive remote monitoring and access; and human-in-the-loop interactions
• Community testbeds
Building Collaborations at Rutgers (IMHO)
• Applications are naturally large and complex
• Computational science is critical
“… applied computer (computational …) science is now playing the role which mathematics did from the seventeenth through the twentieth centuries: providing an orderly, formal framework and exploratory apparatus for other sciences …”
- G. Djorgovski, Director, CACR, Caltech, 2005
• Need to seed, catalyze and nurture multi-disciplinary collaborations
– productive, lasting collaborations are built bottom-up, must be win-win, and need time and effort
• Resources and infrastructure are crucial
– compute (capability, capacity), communication (NLR), collaboration, data, visualization, etc.
• Home for Multidisciplinary Computational Science at Rutgers !
Conclusion
• Next generation scientific investigation: knowledge-based, data and information driven, dynamically adaptive applications on the Cyberinfrastructure
– unprecedented opportunities for global scientific investigation
• can enable accurate solutions to complex applications and provide dramatic insights into complex phenomena
– unprecedented research challenges
• scale, complexity, heterogeneity, dynamism, reliability, uncertainty, …
• applications, algorithms, measurements, data/information, software
– Project AutoMate: autonomic computational science on the Grid
• Accord, Rudder/Comet, Meteor, Squid, Topos, …
• More Information, publications, software– www.caip.rutgers.edu/~parashar/– [email protected]
The Team
• TASSL, CAIP/ECE, Rutgers University
– Viraj Bhat, Sumir Chandra, Andres Q. Hernandez, Nanyan Jiang, Zhen Li (Jenny), Vincent Matossian, Cristina Schmidt, Mingliang Wang, Li Zhang
• Key CE/CS collaborators
– Rutgers Univ.: D. Silver, D. Raychaudhuri, P. Meer, M. Bushnell, etc.
– Univ. of Arizona: S. Hariri
– Ohio State Univ.: T. Kurc, J. Saltz
– GA Tech: K. Schwan, M. Wolf
– Univ. of Maryland: A. Sussman, C. Hansen
• Key applications collaborators
– Rutgers Univ.: R. Levy, S. Garofilini
– UMDNJ: D. Foran, M. Reisse
– CSM/IG, Univ. of Texas at Austin: H. Klie, M. Wheeler, M. Sen, P. Stoffa
– ORNL, NYU: S. Klasky, C.S. Chang
– CRL, Sandia National Lab., Livermore: J. Ray, J. Steensland
– Univ. of Arizona/Univ. of Iowa, OSU: T.-C. J. Yeh, J. Daniels, A. Kruger
– Idaho National Laboratory: R. Versteeg
– PPPL: R. Sataney
– ASCI/CACR, Caltech: J. Cummings