knowledge-based information/data- driven science …parashar/papers/parashar-ru...2006/04/04  ·...

Knowledge-based Information/Data- Driven Science and The Cyberinfrastructure Opportunities for Collaboration at Rutgers Manish Parashar The Applied Software Systems Laboratory ECE/CAIP, Rutgers University http://www.caip.rutgers.edu/TASSL


Page 1

Knowledge-based Information/Data-Driven Science and The Cyberinfrastructure
Opportunities for Collaboration at Rutgers

Manish Parashar
The Applied Software Systems Laboratory
ECE/CAIP, Rutgers University
http://www.caip.rutgers.edu/TASSL

Page 2

Outline of this presentation

• Cyberinfrastructure - Unprecedented Opportunities
  – seamless access, aggregation, interactions …
• Cyberinfrastructure - Unprecedented Complexity
  – scale, heterogeneity, dynamism, uncertainty
• Knowledge-based, data-driven science and the Cyberinfrastructure
  – research challenges and opportunities
• Enabling opportunities at Rutgers University
  – seeding, supporting multidisciplinary computational science

• Concluding remarks

Page 3

Cyberinfrastructure - Unprecedented Opportunities

• “Cyberinfrastructure integrates hardware for computing, data and networks, digitally-enabled sensors, observatories and experimental facilities, and an interoperable suite of software and middleware services and tools…”
  - NSF’s Cyberinfrastructure Vision for 21st Century Discovery
• A global phenomenon
  – “Strategic Roadmap for the National Collaborative Research Infrastructure Strategy”
    • Australian Ministry for Education, Science and Training
  – “Grids and Basic Research Programs”
    • Organization for Economic Co-operation and Development (OECD), Paris, France
• A new paradigm for planetary-scale investigations?
  – seamless access
    • resources, services, data, information, expertise, …
  – seamless aggregation
  – seamless (opportunistic) interactions/couplings

Page 4

Emerging Computational Science & Engineering

“If you want to understand life, don’t think about vibrant, throbbing gels and oozes, think about information technology”
- Richard Dawkins, The Blind Watchmaker, 1986

“Science and engineering research and education are foundational drivers of cyberinfrastructure.”
- NSF’s Cyberinfrastructure Vision for 21st Century Discovery

• Information/systems technology and high performance applications have become critical research modalities in science and engineering
  – Conceptual and numerical models
  – Computational and computer science
  – Information and infrastructure
• Knowledge-based, data-driven scientific/engineering investigation can provide dramatic insights
  – symbiotically and opportunistically combine computations, experiments, observations, and real-time data to model, manage, control, adapt, optimize, …

Page 5

Knowledge-based, Information/Data-driven Scientific Investigation

[Figure labels: Scientist; Computer Scientist; Laptop; PDA; Applications & Services (Model A, Model B); Resources (Computers, Storage, Instruments, ...); Data Archive & Sensors (Data Archives; Sensors, Non-Traditional Data Sources). Annotations:]
• Models dynamically composed. “WebServices” discovered & invoked.
• Resources discovered, negotiated, co-allocated on-the-fly. Model/Simulation deployed.
• Experts query, configure resources.
• Experts interact and collaborate using ubiquitous and pervasive portals.
• Experts mine archive, match real-time data with history.
• Real-time data injection (sensors, laboratory investigations, instruments, data archives).
• Automated mining & matching.
• Models write into the archive.
• Experts monitor/interact with/interrogate/steer models (“what if” scenarios, …). Application notifies experts of interesting phenomena.

Page 6

Examples of Application Areas

• Hazard prevention, mitigation and response
  – Earthquakes, hurricanes, tornados, wild fires, floods, landslides, tsunamis, terrorist attacks
• Critical infrastructure systems
  – Condition monitoring and prediction of future capability
• Transportation of humans and goods
  – Safe, speedy, and cost effective transportation networks and vehicles (air, ground, space)
• Energy and environment
  – Safe and efficient power grids, safe and efficient operation of regional collections of buildings
• Health
  – Reliable and cost effective health care systems with improved outcomes
• Enterprise-wide decision making
  – Coordination of dynamic distributed decisions for supply chains under uncertainty
• Next generation communication systems
  – Reliable wireless networks for homes and businesses

• Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org

Source: M. Rotea, NSF

Page 7

The Cyber Infrastructure – Unprecedented Challenges

• Unprecedented complexity, challenges
  – Very large scales
  – Ad hoc (amorphous) structures/behaviors of virtual organizations
    • p2p/hierarchical architecture
  – Dynamic
    • entities join, leave, move, change behavior
  – Heterogeneous
    • capability, connectivity, reliability, guarantees, QoS
  – Unreliable, lack of guarantees
    • components, communication
  – Lack of common/complete knowledge
    • number, type, location, availability, connectivity, protocols, semantics, etc.

“Grid Computing: An Evolving Vision,” M. Parashar and C. Lee, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 19, No. 3, March 2005.

Page 8

Computational Modeling of Natural Phenomena

• Realistic, physically accurate computational modeling
  – Enormous computation requirements
    • turbulent flow simulations using active flow control for biomedical engineering require 5000x1000x500 = 2.5·10^9 grid points and approximately 10^7 time steps; with 1 GFlop processors this means a runtime of ~7·10^6 CPU hours, or about one month on 10,000 CPUs (with perfect speedup). At 700 B/pt, the requirement is ~1.75 TB of run-time memory and ~800 TB of storage.
    • simulation of the core-collapse of supernovae in 3D with reasonable resolution (500^3) would require ~20 teraflops for 1.5 months and about 200 terabytes of storage
  – Dynamic and complex couplings
    • multi-physics, multi-model, multi-resolution, …
  – Dynamic and complex (ad hoc, opportunistic) interactions
    • application ↔ application, application ↔ resource, application ↔ data, application ↔ user, …
  – Software/systems engineering issue
    • volume and complexity of code, community of developers, …
      – scores of models, hundreds of components, millions of lines of code, …
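
The turbulent-flow figures quoted on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given on the slide; the per-point operation count is derived from them and is not stated in the source, and all variable names are illustrative.

```python
# Figures quoted on the slide for the turbulent-flow example.
grid_points = 5000 * 1000 * 500      # 2.5e9 grid points
time_steps = 10**7                   # approximate number of time steps
cpu_hours = 7e6                      # quoted total runtime on 1 GFlop processors
cpus = 10_000
bytes_per_point = 700

# Wall-clock time with perfect speedup across all CPUs.
wall_days = cpu_hours / cpus / 24

# Run-time memory needed to hold the whole grid.
memory_tb = grid_points * bytes_per_point / 1e12

# Derived, not stated on the slide: operations per grid point per time step
# implied by the quoted runtime at 1e9 floating-point operations per second.
flops_per_point_step = cpu_hours * 3600 * 1e9 / (grid_points * time_steps)

print(f"{wall_days:.1f} days on {cpus} CPUs")
print(f"{memory_tb:.2f} TB of run-time memory")
print(f"about {flops_per_point_step:.0f} operations per point per step")
```

The wall-clock result of roughly 29 days matches the slide's "about one month", and the memory figure reproduces the quoted ~1.75 TB. The ~800 TB storage figure depends on how many time steps are saved, which the slide does not specify.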

Page 9

Enabling Computational Science on CI: A Convergence of Biology and Information Technology

• Computing has evolved and matured to provide specialized solutions to satisfy relatively narrow and well defined requirements in isolation
  – performance, security, dependability, reliability, availability, throughput, pervasive/amorphous, automation, reasoning, etc.
  – current programming paradigms, middleware/systems software, and management tools are inadequate to handle the scale, complexity, dynamism and heterogeneity of emerging systems
• For emerging applications/environments, the requirements, objectives and execution contexts are dynamic and not known a priori
  – requirements, objectives and choice of specific solutions (algorithms, behaviors, interactions, etc.) depend on runtime state, context, and content
  – applications should be aware of changing requirements and execution contexts and respond to these changes at runtime
• Convergence of Biology and Information Technology – Autonomic Computing
  – nature has evolved to cope with scale, complexity, heterogeneity, dynamism and unpredictability, lack of guarantees
    • self configuring, self adapting, self optimizing, self healing, self protecting, highly decentralized, heterogeneous architectures that work !!!
• The goal of autonomic computing is to build systems/applications that self-manage using high-level guidance from humans

Page 10

Project AutoMate: Enabling Autonomic Applications (http://automate.rutgers.edu)

• Conceptual models and implementation architectures for Autonomic Computing
  – programming systems based on popular programming models
    • object, component and service based prototypes
  – content-based coordination and messaging middleware
  – amorphous and emergent overlays

[Figure: AutoMate architecture. Layers as listed on the slide:]
• Autonomic Grid Applications
• Programming System: Autonomic Components, Dynamic Composition, Opportunistic Interactions, Collaborative Monitoring/Control
• Decentralized Coordination Engine: Agent Framework, Decentralized Reactive Tuple Space
• Semantic Middleware Services: Content-based Discovery, Associative Messaging
• Content Overlay: Content-based Routing Engine, Self-Organizing Overlay
[Side labels: Accord Programming Framework; Rudder Coordination Middleware; Meteor/Squid Content-based Middleware; Sesame/DAIS Protection Service; Ontology, Taxonomy]

“AutoMate: Enabling Autonomic Grid Applications,” M. Parashar et al, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Special Issue on Autonomic Computing, Kluwer Academic Publishers. Vol. 9, No. 2, pp. 161 – 174, 2006.

Page 11

Project AutoMate: Components

• Accord – A Programming System for Autonomic Grid Applications
• Rudder/Comet – Decentralized Coordination Middleware
• ACE – Autonomic Composition Engine
• Meteor – Content-based Middleware
• Squid – Decentralized Information Discovery and Content-based Routing
• SESAME – Context-Aware Access Management
• DAIS – Cooperative Protection against Network Attacks

• More information/Papers – http://automate.rutgers.edu
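
The "Decentralized Reactive Tuple Space" and "Associative Messaging" items in the AutoMate architecture build on content-based matching: data is selected by what it contains rather than by who sent it. The following is a minimal, single-process, Linda-style sketch of that idea only. It is not the Comet or Rudder API, it is neither decentralized nor reactive, and the names `TupleSpace`, `ANY`, `out`, `rd` and `in_` are chosen here for illustration.

```python
from typing import List, Optional

ANY = object()  # wildcard marker used in templates (hypothetical name)


class TupleSpace:
    """Minimal in-memory tuple space with associative (content-based) lookup."""

    def __init__(self) -> None:
        self._tuples: List[tuple] = []

    def out(self, tup: tuple) -> None:
        """Insert a tuple into the space."""
        self._tuples.append(tup)

    @staticmethod
    def _matches(template: tuple, tup: tuple) -> bool:
        # A template matches when lengths agree and every non-wildcard field is equal.
        return len(template) == len(tup) and all(
            t is ANY or t == v for t, v in zip(template, tup)
        )

    def rd(self, template: tuple) -> Optional[tuple]:
        """Return the first matching tuple without removing it, or None."""
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def in_(self, template: tuple) -> Optional[tuple]:
        """Remove and return the first matching tuple, or None."""
        found = self.rd(template)
        if found is not None:
            self._tuples.remove(found)
        return found


space = TupleSpace()
space.out(("sensor", "well-3", "pressure", 2150.0))
space.out(("sensor", "well-7", "pressure", 1980.0))

# Associative lookup: select by content, not by sender or address.
reading = space.in_(("sensor", "well-7", ANY, ANY))
```

A producer and a consumer never name each other here; they only agree on the shape of the tuple, which is what makes this style of coordination attractive when entities join and leave dynamically.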

Page 12

The Instrumented Oil Field of the Future (UT-CSM, UT-IG, RU, OSU, UMD, ANL) (NSF ITR 01, 04)

Detect and track changes in data during production
Invert data for reservoir properties
Detect and track reservoir changes
Assimilate data & reservoir properties into the evolving reservoir model
Use simulation and optimization to guide future production, future data acquisition strategy

• Production of oil and gas can take advantage of permanently installed sensors that will monitor the reservoir’s state as fluids are extracted
• Knowledge of the reservoir’s state during production can result in better engineering decisions
  – economic evaluation; physical characteristics (bypassed oil, high pressure zones); production techniques for safe operating conditions in complex and difficult areas

“Application of Grid-Enabled Technologies for Solving Optimization Problems in Data-Driven Reservoir Studies,” M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz and M Wheeler, FGCS. The International Journal of Grid Computing: Theory, Methods and Applications (FGCS), Elsevier Science Publishers, Vol. 21, Issue 1, pp 19-26, 2005.

Page 13

Effective Oil Reservoir Management: Well Placement/Configuration

• Why is it important
  – Better utilization of existing reservoirs
  – Discovering new reservoirs
  – Minimizing adverse effects to the environment

[Figure: two well placements compared. Better: less bypassed oil. Bad: much bypassed oil.]

Page 14

Effective Oil Reservoir Management: Well Placement/Configuration

• What needs to be done
  – Exploration of possible well placements and configurations for optimized production strategies
  – Understanding field properties and interactions between and across subdomains
  – Tracking and understanding long term changes in field characteristics
• Challenges
  – Geologic uncertainty: Key engineering properties unattainable
  – Large search space: Infinitely many production strategies possible
  – Complex physical properties and interactions. Complex numerical models

Page 15

Autonomic Well Placement/Configuration

[Figure: workflow on the AutoMate Programming System/Grid Middleware. Components: Optimization Service (SPSA, VFSA, Exhaustive Search); IPARS Factory; MySQL Database; DISCOVER; clients; History/Archived Data; Sensor Data. Annotations:]
• Generate guesses
• Send guesses
• If guess not in DB: instantiate IPARS with guess as parameter
• If guess in DB: send response to Clients and get new guess from Optimizer
• Start parallel IPARS instances
• Instance connects to DISCOVER
• DISCOVER notifies Clients
• Clients interact with IPARS
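
The "if guess in DB / if guess not in DB" branch in this workflow is a cache lookup placed in front of an expensive simulation. The sketch below illustrates only that control flow, under stated assumptions: a plain dictionary stands in for the MySQL database, `toy_simulation` stands in for an IPARS run, and the `(y, z)` encoding of a guess is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Guess = Tuple[int, int]  # hypothetical encoding of a well location (y, z)


def evaluate_guesses(
    guesses: Iterable[Guess],
    run_simulation: Callable[[Guess], float],
    cache: Dict[Guess, float],
) -> Dict[Guess, float]:
    """Evaluate each guess, running the simulator only on cache misses."""
    results: Dict[Guess, float] = {}
    for guess in guesses:
        if guess not in cache:
            # Guess not in DB: start a simulator run with the guess as parameter.
            cache[guess] = run_simulation(guess)
        # Guess in DB (or just stored): answer from the stored result.
        results[guess] = cache[guess]
    return results


calls: List[Guess] = []


def toy_simulation(guess: Guess) -> float:
    """Stand-in objective (an assumption, not IPARS): squared distance from (3, 4)."""
    calls.append(guess)
    y, z = guess
    return float((y - 3) ** 2 + (z - 4) ** 2)


db: Dict[Guess, float] = {}
first = evaluate_guesses([(0, 0), (3, 4)], toy_simulation, db)
second = evaluate_guesses([(3, 4), (1, 1)], toy_simulation, db)
print(len(calls), "simulator runs for 4 requested evaluations")
```

Because stochastic optimizers often revisit points, and several optimizers may share one database, avoiding repeated simulator runs can matter more than the optimizer's own overhead.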

Page 16

Autonomic Oil Well Placement/Configuration

[Figure panels: permeability; pressure contours (3 wells, 2D profile); contours of NEval(y,z,500)(10). Annotations:]
• Requires NYxNZ (450) evaluations. Minimum appears here.
• VFSA solution: “walk”: found after 20 (81) evaluations

Page 17

Autonomic Oil Well Placement/Configuration (VFSA)

“An Autonomic Reservoir Framework for the Stochastic Optimization of Well Placement,” V. Matossian, M. Parashar, W. Bangerth, H. Klie, M.F. Wheeler, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Kluwer Academic Publishers (to appear).

“Autonomic Oil Reservoir Optimization on the Grid,” V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa and M. F. Wheeler, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Volume 17, Issue 1, pp 1 – 26, 2005.

Page 18

Autonomic Oil Well Placement/Configuration (SPSA)

[Figure panels: solution for 7 different initial guesses; convergence history]

Page 19

Optimal Well Placement

Comparison of optimization approaches

Metric                                  Nelder-Mead  GA        VFSA      FDSA      SPSA
Best solution                           -1.018e8     -1.073e8  -1.083e8  -1.062e8  -1.075e8
Average number of function evaluations  99.95        104.02    75.5      57.0      37.8

Optimal solution: F* = -1.098E8

Lessons learned:
• Robust stochastic algorithms increase the chances of finding (near) optimal solutions (VFSA)
• Several trials of a fast algorithm pay off against sophisticated algorithms (SPSA)
• Need to develop hybrid strategies
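
The "several trials of a fast algorithm" lesson can be illustrated with a small SPSA (simultaneous perturbation stochastic approximation) routine wrapped in a multi-start loop. This is a generic textbook-style sketch, not the implementation used in the study: the quadratic `toy_objective` is an assumption standing in for the reservoir simulator, and the gain constants `a`, `c` and `stability` are illustrative. The exponents 0.602 and 0.101 are the values commonly recommended in the SPSA literature.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def spsa_minimize(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    iterations: int = 200,
    a: float = 0.2,
    c: float = 0.1,
    stability: float = 10.0,
    alpha: float = 0.602,
    gamma: float = 0.101,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Minimize f with SPSA: two evaluations of f per iteration, whatever the dimension."""
    rng = rng or random.Random(0)
    x = list(x0)
    for k in range(iterations):
        a_k = a / (k + 1 + stability) ** alpha   # decaying step size
        c_k = c / (k + 1) ** gamma               # decaying perturbation size
        delta = [rng.choice((-1.0, 1.0)) for _ in x]  # random +/-1 in every coordinate
        x_plus = [xi + c_k * di for xi, di in zip(x, delta)]
        x_minus = [xi - c_k * di for xi, di in zip(x, delta)]
        diff = f(x_plus) - f(x_minus)
        # Gradient estimate for coordinate i is diff / (2 * c_k * delta_i).
        x = [xi - a_k * diff / (2.0 * c_k * di) for xi, di in zip(x, delta)]
    return x, f(x)


def multistart(
    f: Callable[[Sequence[float]], float],
    starts: Sequence[Sequence[float]],
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Run SPSA from several initial guesses and keep the best result."""
    runs = [
        spsa_minimize(f, x0, rng=random.Random(seed + i))
        for i, x0 in enumerate(starts)
    ]
    return min(runs, key=lambda run: run[1])


def toy_objective(x: Sequence[float]) -> float:
    """Assumed stand-in objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best_x, best_f = multistart(toy_objective, [(0.0, 0.0), (5.0, 5.0), (-4.0, 3.0)])
print(best_x, best_f)
```

The cost per iteration is what makes SPSA "fast" in the sense of the table: a finite-difference scheme such as FDSA needs two evaluations per coordinate, while SPSA needs two in total, so the saved evaluations can be spent on additional starting points.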

Page 20

Knowledge-based Data-driven Management of Subsurface Geosystems: The Instrumented Oil Field (NSF KDI, ITR, with UT-CSM, UT-IG, OSU, UMD, ANL)

Detect and track changes in data during production.
Invert data for reservoir properties.
Detect and track reservoir changes.
Assimilate data & reservoir properties into the evolving reservoir model.
Use simulation and optimization to guide future production.

[Figure labels: Data Driven; Model Driven]

Page 21

Management of the Ruby Gulch Waste Repository (DoE SciDAC, with UT-CSM, INL, OU, RU)

Page 22

Adaptive Fusion of Stochastic Information for Imaging Fractured Vadose Zones (NSF SEIII, with U of AZ, OSU, U of IW)

• Near-Real Time Monitoring, Characterization and Prediction of Flow Through Fractured Rocks

[Figure: modeling loop. Labels: Inverse Modeling; System responses; Parameters, Boundary & Initial Conditions; Forward Modeling; Prediction; Comparison with observations (good/bad); Network design; Application]
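
The forward-modeling / comparison-with-observations loop named in this figure can be sketched in its simplest form: pick the parameter whose predicted response best matches the data. Everything specific below is an assumption for illustration, including the exponential-decay forward model, the synthetic noise-free observations, and the brute-force scan over candidates. Real vadose-zone inversion is stochastic and far higher dimensional.

```python
import math
from typing import List, Sequence


def forward_model(k: float, times: Sequence[float]) -> List[float]:
    """Toy forward model (an assumption): exponential decay with rate k."""
    return [math.exp(-k * t) for t in times]


def misfit(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Sum of squared differences between prediction and observation."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def invert(
    observed: Sequence[float],
    times: Sequence[float],
    candidates: Sequence[float],
) -> float:
    """Inverse modeling by scanning: return the candidate with the smallest misfit."""
    return min(candidates, key=lambda k: misfit(forward_model(k, times), observed))


times = [0.0, 1.0, 2.0, 3.0]
true_k = 0.5
observed = forward_model(true_k, times)          # synthetic "observations"
candidates = [i / 100 for i in range(1, 201)]    # 0.01, 0.02, ..., 2.00
estimate = invert(observed, times, candidates)
print(estimate)
```

In the loop on the slide, a "bad" comparison would feed back into the parameters, or into the design of the measurement network, rather than ending the search.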

Page 23

Data-Driven Forest Fire Simulation (DoE SciDAC, with U of AZ)

• Predict the behavior and spread of wildfires (intensity, propagation speed and direction, modes of spread)
  – based on both dynamic and static environmental and vegetation conditions
  – factors include fuel characteristics and configurations, chemical reactions, balances between different modes of heat transfer, topography, and fire/atmosphere interactions

“Self-Optimizing of Large Scale Wild Fire Simulations,” J. Yang*, H. Chen*, S. Hariri and M. Parashar, Proceedings of the 5th International Conference on Computational Science (ICCS 2005), Atlanta, GA, USA, Springer-Verlag, May 2005.

Page 24

System for Laser Treatment of Cancer – UT, Austin

Source: L. Demkowicz, UT Austin

Page 25

Synthetic Environment for Continuous Experimentation – Purdue University

Source: A. Chaturvedi, Purdue Univ.

Page 26

Integrated Wireless Phone Based Emergency Response System – Notre Dame

• Detect abnormal patterns in mobile call activity and locations
• Initiate dynamic data driven simulations to predict the evolution of the abnormality
• Initiate higher resolution data collection in localities of interest
• Interface with emergency response Decision Support Systems

Source: G. Madey, ND
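
The first step listed above, detecting abnormal patterns in call activity, can be illustrated with a simple baseline-deviation test. This is a generic sketch and not the method used by the Notre Dame system: the z-score rule, the threshold of three standard deviations, and the sample counts are all assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence


def abnormal_intervals(
    counts: Sequence[int], baseline: int, threshold: float = 3.0
) -> List[int]:
    """Flag intervals after the baseline window that deviate by more than
    `threshold` standard deviations from the baseline mean."""
    base = counts[:baseline]
    mu, sigma = mean(base), stdev(base)
    return [
        i
        for i in range(baseline, len(counts))
        if abs(counts[i] - mu) > threshold * sigma
    ]


# Hypothetical calls-per-minute series for one cell; the spike is at index 9.
calls_per_minute = [100, 104, 98, 101, 97, 103, 99, 102, 100, 180, 101]
flagged = abnormal_intervals(calls_per_minute, baseline=8)
print(flagged)
```

In a dynamic data-driven setting, a flagged interval would be the trigger for the next two steps on the slide: launching a simulation and requesting higher-resolution data for the affected locality.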

Page 27

Adaptive Cyberinfrastructure for Threat Management in Urban Water Distribution Systems – NC State Univ.

[Figure: system architecture. Group labels: Sensors & Data; Algorithms & Models; Middleware & Resources. Components: Mobile RF AMR Sensors; Static RF AMR Sensor Network; Static Water Quality Sensor Network; Adaptive Wireless Data Receptor and Controller; Bayesian Monte-Carlo Engine; Optimization Engine; Simulation Models; Adaptive Simulation Controller; Adaptive Optimization Controller; Grid Resource Broker and Scheduler; Grid Computing Resources; Adaptive Workflow; Portal. Flow labels: Data; Decisions; Model Parameters; Model Outputs; Resource Needs; Resource Availability]

Source: K. Mahinthakumar, NCSU

Page 28

Structural Health Monitoring and Critical Event Prediction – Stanford University

[Figure labels: Pilot Display of Crisis; On-board Sensing & Processing; On-line Prognosis & Processing; Damage Detection; Material Characterization; High-Performance Off-Line Prognosis by Modeling and Simulation; On-Demand Maintenance; Detailed Report; Data for Model Update]

Source: C. Farhat, SU

Page 29

Auto-Steered Information-Decision Processes for Electrical Systems Asset Management – Iowa State Univ.

• Develop a hardware-software prototype for auto-steering the information-decision cycles inherent to managing operations, maintenance, & planning of high-voltage electric power transmission systems.

[Figure labels: Remote Control Headquarters; Wide Area Network; Secure Access; Satellite, Ethernet, Radio, Optical Fiber, Power Line, Internet; 2.4 or 5 GHz ISM Band; Wireless Sensor Network Transceiver; Local Communication Box; Network Switch; Base Station; Port Server; WAN Connection; Data Processor]

Source: J. McCalley, Iowa State

Page 30

Scientific Investigation and CI: Challenges and Opportunities

• Applications

• Algorithms

• Measurement systems

• Systems software

• Community testbeds

• Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org

Page 31

Challenges and Opportunities: Applications (I)

• Dynamic and continuous validation of models, algorithms, systems, and (emergent) system of systems
• Uncertainty quantification
  – novel methods are needed to effectively quantify the end-to-end measures of uncertainty, as models, data, and algorithms are cycled in and out
• Dynamic data driven, on demand scaling and resolution
  – application driven on demand scaling and configuration of measurement systems for dynamic multi-resolution data inputs
• Observability, identifiability and tractability
  – establishing the pedigree of data and applications that contributed to the result is critical and challenging

Page 32

Challenges and Opportunities: Applications (II)

• Self-organization of measurement systems, models, applications, workflows, and organizations
• Human in the loop decision-making and distribution of tasks among human and non-human actors
  – new theories and paradigms are needed to distribute intelligence and tasks among human and non-human actors
• Community frameworks/tools for rapid experimentation
• Accommodation of plurality of system demands
  – (thoughts: social, economic impact vs. rapid response, and view points: outside the system or inside the system)

Page 33

Challenges and Opportunities: Algorithms (I)

• Methods related to the measurement/data feedback on the computational model
  – mechanisms in which the computational models respond to the information coming from the instrumentation
    • examples include new ways of data assimilation (ensemble Kalman-like or hybrid stochastic estimation for messy, incomplete, and possibly out of time order data) and uncertainty estimation, adaptive parameter updating, dynamic model selection, and fast nonlinear update methods combined with multiscale interpolation of sensor data to form self correction methods, etc.
• Methods related to the computational model feedback on the measurement/data
  – mechanisms in which the data collection and instrumentation responds to the results from the computational models
    • examples include instrument placement and control, data relevance assessment, noise quantification and qualification, and robust dynamic optimization, sensor steering, and targeted observations/data refinement.
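
The "data feedback on the computational model" direction can be made concrete with the simplest member of the Kalman family: a scalar measurement update in which the sensor observes the model state directly. This is a sketch under those assumptions, not an ensemble Kalman filter and not any specific method from the cited workshop report; the numbers are illustrative.

```python
from typing import Tuple


def kalman_update(
    x_prior: float, p_prior: float, z: float, r: float
) -> Tuple[float, float]:
    """Blend a model estimate (mean x_prior, variance p_prior) with a
    measurement z of variance r. Returns the updated mean and variance."""
    gain = p_prior / (p_prior + r)          # weight given to the measurement
    x_post = x_prior + gain * (z - x_prior)
    p_post = (1.0 - gain) * p_prior         # uncertainty shrinks after each update
    return x_post, p_post


# Model forecast of some state with variance 4.0; sensor noise variance 4.0.
x, p = 10.0, 4.0
for z in (12.0, 11.0, 13.0):
    x, p = kalman_update(x, p, z, r=4.0)
print(x, p)
```

With equal prior and measurement variances the result reduces to the plain average of the forecast and the three readings, which is a useful sanity check. The opposite direction of feedback, from model to measurement, would use the remaining variance `p` to decide where or when to sample next.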

Page 34

Challenges and Opportunities: Algorithms (II)

• Advances in computational simulation tools for modeling
  – new advances in existing computational technologies
    • uncertainty estimation and propagation, adaptive discretization schemes, fast linear and nonlinear solvers that enable high-fidelity real-time simulations, fast error estimation, dynamic validation and verification, appropriate couplings for multiscale, multiphysics simulations, and system identification in the context of dynamic data assimilation, are examples of simulation-level tools that need to be developed
• System and integration of related algorithms
    • real time algorithms for managing large dataware, managing processing, parallel prototyping, guaranteed and consistent resources (including networks), dynamic scheduling of processors, networks, Grids, and changes in allocations, process migration, check-pointing, fast visualization techniques, algorithms appropriate for sensor embedding, and secure transmission

Page 35

Challenges and Opportunities: Measurement Systems

• Components:
  – instruments, sensors, databases, human inputs and other devices for taking measurements; data quality assessment, data formatting, and feature extraction; and data measurement steering
• Issues:
  – on-demand data collection and management, data streaming into the simulation, data representation, data models and congruence of data, real-time constraints, data processing/preprocessing, data collection rates, consumption rates, available bandwidths and other resources and how to discover, obtain, establish the authenticity and correctness and how to maintain an audit trail

Page 36

Challenges and Opportunities: Systems Software (I)

• Programming models and systems:
  – support the formulation of application components and applications that are capable of correctly and consistently adapting their behaviors, interactions and compositions in real time in response to dynamic data and application/system state, while satisfying real time, space, functional, performance, reliability, security, and quality of service constraints
• Data management (acquisition, assimilation, transport, manipulation, exploration) mechanisms and services:
  – enable seamless acquisition of high data rate, high volume data from varied, distributed and possibly unreliable data sources, while addressing stringent real time, space and data quality constraints, and enabling the seamless and dynamic integration of this data with computation and resources

Page 37

Challenges and Opportunities: Systems Software (II)

• Runtime execution and middleware services
  – support dynamic, data and knowledge driven and time constrained executions, adaptations, interactions, compositions for application elements, while guaranteeing reliable and resilient execution and predictable and controllable response times
• Computation infrastructures
  – support immediate on-demand and anticipatory resource co-allocations and co-scheduling, dynamic, secure and seamless aggregation of and interactions between distributed resources and varied data sources, dynamic configuration and instantiation and reliable and traceable execution of application workflows, real time and soft real time QoS guarantees, pervasive remote monitoring and access and human-in-the-loop interactions

• Community testbeds

Page 38

Building Collaborations at Rutgers (IMHO)

• Applications are naturally large and complex
• Computational science is critical
  “… applied computer (computational ...) science is now playing the role which mathematics did from the seventeenth through the twentieth centuries: providing an orderly, formal framework and exploratory apparatus for other sciences…”
  - G. Djorgovski, Director, CACR, Caltech, 2005.
• Need to seed, catalyze and nurture multi-disciplinary collaborations
  – productive, lasting collaborations:
    • are built bottom-up
    • must be win-win
    • need time and effort
• Resources and infrastructure are crucial …
  – compute (capability, capacity), communication (NLR), collaboration, data, visualization, etc.
• Home for Multidisciplinary Computational Science at Rutgers !

Page 39

Conclusion

• Next Generation Scientific Investigation: Knowledge-based, data and information driven, dynamically adaptive applications on the Cyberinfrastructure
  – Unprecedented opportunities for global scientific investigation
    • can enable accurate solutions to complex applications; provide dramatic insights into complex phenomena
  – Unprecedented research challenges
    • scale, complexity, heterogeneity, dynamism, reliability, uncertainty, …
    • applications, algorithms, measurements, data/information, software
  – Project AutoMate: Autonomic Computational Science on the Grid
    • Accord, Rudder/Comet, Meteor, Squid, Topos, …
• More Information, publications, software
  – www.caip.rutgers.edu/~parashar/
  – [email protected]

Page 40

The Team

• TASSL, CAIP/ECE Rutgers University
  – Viraj Bhat
  – Sumir Chandra
  – Andres Q. Hernandez
  – Nanyan Jiang
  – Zhen Li (Jenny)
  – Vincent Matossian
  – Cristina Schmidt
  – Mingliang Wang
  – Li Zhang
• Key CE/CS Collaborators
  – Rutgers Univ.
    • D. Silver, D. Raychaudhuri, P. Meer, M. Bushnell, etc.
  – Univ. of Arizona
    • S. Hariri
  – Ohio State Univ.
    • T. Kurc, J. Saltz
  – GA Tech
    • K. Schwan, M. Wolf
  – University of Maryland
    • A. Sussman, C. Hansen
• Key Applications Collaborators
  – Rutgers Univ.
    • R. Levy, S. Garofilini
  – UMDNJ
    • D. Foran, M. Reisse
  – CSM/IG, Univ. of Texas at Austin
    • H. Klie, M. Wheeler, M. Sen, P. Stoffa
  – ORNL, NYU
    • S. Klasky, C.S. Chang
  – CRL, Sandia National Lab., Livermore
    • J. Ray, J. Steensland
  – Univ. of Arizona/Univ. of Iowa, OSU
    • T.-C. J. Yeh, J. Daniels, A. Kruger
  – Idaho National Laboratory
    • R. Versteeg
  – PPPL
    • R. Sataney
  – ASCI/CACR, Caltech
    • J. Cummings