CyberInfrastructure to Support Scientific Exploration and Collaboration Dennis Gannon (based on work with many collaborators, most notably Beth Plale ) School of Informatics Indiana University

Post on 20-Jan-2016


Page 1

CyberInfrastructure to Support Scientific Exploration and Collaboration

Dennis Gannon (based on work with many collaborators, most notably Beth Plale)
School of Informatics
Indiana University

Page 2

Overview

• CyberInfrastructure for Virtual Organizations in Computational Science
  – Science Portals and the Gateway Concept
• Automating the Trail from Data to Scientific Discovery
  – An Example in Depth: Mesoscale Storm Prediction
• The Challenge for the Individual Researcher
  – Connecting tools to on-line services
• The Promise of the Future
  – Multicore, personal petabyte and gigabit bandwidth

Page 3

The Realities of Science in the U.S.

• “Big Science” dominates the funding hierarchy.
  – Why? It’s important and easy to sell to Congress.
• The NSF is investing in a vast network of supercomputers to support big science.
  – The results are empowering a broad range of scientific communities.
• Where is the single investigator?
  – The Web has enabled democratization of information access.
  – Is there a similar path for access to advanced computational resources?

Page 4

Democratizing Access to Science

• What is needed for the individual or small team to do large-scale science?
  – Access to data and the tools to analyze and transform it.
  – A means to publish not just the results of a study but also the path to discovery.
• Where are the resources?
  – What we have now: TeraGrid
  – What is emerging?

Page 5

The TeraGrid

• The US National Supercomputer Grid
  – CyberInfrastructure composed of a set of resources (compute and data) that provide common services for
    • Wide-area data management, single sign-on user authentication
    • Distributed job scheduling and management (in the works)
• Collectively
  – 1 Petaflop
  – 20 Petabytes
• Soon to triple.
• Will add a petaflop each year.
  – But at a slower rate than Google, eBay and Amazon add resources.

Page 6

TeraGrid Wide: Science Gateways

• Science Portals
  – A Portal = a web-based home + personal workspace + personal tools.
  – Web Portal Technology + Grid Middleware
• Enables a community of researchers to:
  – Access shared resources (both data and computation)
  – Collaborate in a forum on shared problem solving
• TeraGrid Science Gateways
  – Allow the TeraGrid to be the back-end resource.

Page 7

NEESGrid

Real-time access to earthquake shake-table experiments at remote sites.

Page 8

BIRN – Biomedical Information

Page 9

Geological Information Grid Portal

Page 10

Renci Bio Portal

Providing access to biotechnology tools running on a back-end Grid.

– Leverage state-wide investment in bioinformatics
– Undergraduate & graduate education, faculty research
– Another portal soon: National Evolutionary Synthesis Center

Page 11

Nanohub - nanotechnology

Page 12

X-Ray Crystallography

Page 13

ServoGrid Portal

Page 14

The LEAD Project

Page 15

Predicting Storms

• Hurricanes and tornadoes cause massive loss of life and damage to property.
• Underlying physical systems involve highly non-linear dynamics, so they are computationally intense.
• Data comes from multiple sources
  – “Real time”: derived from streams of data from sensors
  – Archived in databases of past storms
• Infrastructure challenges:
  – Data mine instrument radar data for storms
  – Allocate supercomputer resources automatically to run forecast simulations
  – Monitor results and retarget instruments.
  – Log provenance and metadata about experiments for auditing.

Page 16

Traditional Methodology

[Figure: a serial pipeline. STATIC OBSERVATIONS (radar data, mobile mesonets, surface observations, upper-air balloons, commercial aircraft, geostationary and polar orbiting satellites, wind profilers, GPS satellites) → Analysis/Assimilation (quality control, retrieval of unobserved quantities, creation of gridded fields) → Prediction/Detection (PCs to teraflop systems) → Product Generation, Display, Dissemination → End Users (NWS, private companies, students).]

The process is entirely serial and static (pre-scheduled): no response to the weather!

Page 17

The LEAD Vision: Enabling a new paradigm of scientific exploration.

[Figure: the same pipeline with DYNAMIC OBSERVATIONS and models and algorithms driving sensors. Analysis/Assimilation (quality control, retrieval of unobserved quantities, creation of gridded fields) → Prediction/Detection (PCs to teraflop systems) → Product Generation, Display, Dissemination → End Users (NWS, private companies, students).]

The CS challenge:
• Build cyberinfrastructure services that provide adaptability, scalability, availability, usability.
• Create a new paradigm of meteorology research.

Page 18

Building Experiments that Respond to the Future

Can we pose a scientific search and discovery query that the cyberinfrastructure executes as our agent?

• In the LEAD case it is Data Driven, Persistent and Agile
  – Weather data streams define the nature of the computation
  – Mine the data streams and detect “interesting” features; an event triggers a workflow scenario that has been waiting for months.
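
The idea of a persistent query acting as the scientist's agent can be sketched in a few lines. This is a minimal illustration only, not the LEAD implementation: the `Observation` fields, the `StandingQuery` class, the 50 dBZ threshold and the radar names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Observation:
    source: str              # hypothetical instrument name
    reflectivity_dbz: float  # hypothetical radar-derived feature


@dataclass
class StandingQuery:
    """A persistent query: a predicate plus the workflow it launches."""
    is_interesting: Callable[[Observation], bool]
    launch_workflow: Callable[[Observation], str]
    launched: List[str] = field(default_factory=list)

    def consume(self, stream: Iterable[Observation]) -> List[str]:
        # The query waits on the stream; only a matching event starts work.
        for obs in stream:
            if self.is_interesting(obs):
                self.launched.append(self.launch_workflow(obs))
        return self.launched


def demo() -> List[str]:
    query = StandingQuery(
        is_interesting=lambda o: o.reflectivity_dbz >= 50.0,  # arbitrary threshold
        launch_workflow=lambda o: f"forecast-for-{o.source}",
    )
    stream = [
        Observation("radar-1", 22.0),
        Observation("radar-2", 57.5),
        Observation("radar-3", 31.0),
    ]
    return query.consume(stream)
```

In a real deployment the stream would be unbounded and the launch step would submit a job to back-end resources; here both are stand-ins so the control flow is visible.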

Page 19

The LEAD Gateway Portal

• To support three classes of users
  – Meteorology research scientists & grad students.
  – Undergrads in meteorology classes
  – People who want easy access to weather data.

Go to: http://www.leadproject.org

Page 20

Gateway Components

• A Framework for Discovery
  – Four basic components
    • Data discovery
      – Catalogs and index services
    • The experiment
      – Computational workflow managing on-demand resources
    • Data analysis and visualization
    • Data product preservation
      – Automatic metadata generation and experimental data provenance

Page 21

Data Search

• Select a region and a time range and desired attributes
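
A search by region, time range and attributes amounts to a conjunction of three filters over catalog entries. The sketch below assumes a flat in-memory catalog; `CatalogEntry`, `search` and the sample entries are hypothetical and do not describe the portal's actual catalog service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    lat: float
    lon: float
    time: datetime
    attributes: FrozenSet[str]


def search(
    catalog: List[CatalogEntry],
    bbox: Tuple[float, float, float, float],  # (south, west, north, east)
    time_range: Tuple[datetime, datetime],
    wanted: Set[str],
) -> List[CatalogEntry]:
    south, west, north, east = bbox
    start, end = time_range
    return [
        e for e in catalog
        if south <= e.lat <= north
        and west <= e.lon <= east
        and start <= e.time <= end
        and wanted <= e.attributes  # entry must carry every requested attribute
    ]


CATALOG = [
    CatalogEntry("obs-a", 35.0, -97.0, datetime(2007, 5, 4, 18), frozenset({"radar", "level2"})),
    CatalogEntry("obs-b", 35.5, -97.5, datetime(2007, 5, 5, 6), frozenset({"surface"})),
    CatalogEntry("obs-c", 45.0, -90.0, datetime(2007, 5, 4, 18), frozenset({"radar", "level2"})),
]

hits = search(
    CATALOG,
    bbox=(30.0, -100.0, 40.0, -95.0),
    time_range=(datetime(2007, 5, 4), datetime(2007, 5, 5)),
    wanted={"radar"},
)
```

A production catalog would push these predicates into a spatial and temporal index rather than scanning a list.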

Page 22

Building Experiments

• As the user interacts with the portal they are creating “experiments”
• An experiment is
  – A collection of data (or desired data)
  – A set of analysis, transformational or predictive tasks
    • Defined by a workflow or a high-level query
  – A provenance document that encodes a repeatable history of the experiment.
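
The three parts of an experiment (data, tasks, provenance) map naturally onto a small record type. The following is a sketch under that reading only; the `Experiment` class and its field names are hypothetical, and the tasks are toy functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Experiment:
    inputs: Dict[str, float]                          # the collection of data
    tasks: List[Tuple[str, Callable[[float], float]]]  # ordered analysis steps
    provenance: List[dict] = field(default_factory=list)

    def run(self, key: str) -> float:
        value = self.inputs[key]
        for name, task in self.tasks:
            result = task(value)
            # Record enough to replay the step: name, input and output.
            self.provenance.append({"task": name, "input": value, "output": result})
            value = result
        return value


exp = Experiment(
    inputs={"temp_c": 20.0},
    tasks=[("to_kelvin", lambda c: c + 273.15), ("round", round)],
)
final = exp.run("temp_c")
```

After the run, `exp.provenance` holds one entry per task in execution order, which is the raw material for a repeatable history.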

Page 23

Portal: Experimental Data & Metadata Space

• CyberInfrastructure extends the user’s desktop to incorporate a vast data analysis space.

• As users go about doing scientific experiments, the CI manages back-end storage and compute resources.

– The portal provides ways to explore, search and discover this data.

• Metadata about experiments is largely automatically generated, and highly searchable.

– It describes the data object (the file) in application-rich terms, and provides a URI to a data service that can resolve an abstract unique identifier to a real, on-line data “file”.
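
Resolving an abstract identifier to a concrete location, with searchable metadata attached, can be illustrated with a toy in-memory service. Everything here is an assumption made for illustration: the `DataService` class, the `urn:uuid:` identifier scheme, the file path and the metadata keys are not taken from the LEAD software.

```python
import uuid
from typing import Dict, List


class DataService:
    """Toy resolver: abstract ID -> physical location, plus metadata search."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}
        self._metadata: Dict[str, Dict[str, str]] = {}

    def register(self, location: str, metadata: Dict[str, str]) -> str:
        uid = f"urn:uuid:{uuid.uuid4()}"  # abstract, location-independent ID
        self._locations[uid] = location
        self._metadata[uid] = dict(metadata)
        return uid

    def resolve(self, uid: str) -> str:
        return self._locations[uid]

    def search(self, **terms: str) -> List[str]:
        # Return IDs whose metadata matches every requested key/value pair.
        return [
            uid for uid, meta in self._metadata.items()
            if all(meta.get(k) == v for k, v in terms.items())
        ]


svc = DataService()
uid = svc.register(
    "file:///data/run1/forecast.nc",  # hypothetical path
    {"model": "WRF", "variable": "temperature"},
)
```

Because users hold only the abstract ID, the service is free to move the underlying file and update its own table without breaking references.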

Page 24

Typical weather forecast runs as workflow

[Figure: workflow graph in four stages (Pre-Processing, Assimilation, Forecast, Visualization).
Inputs: terrain data files; surface data files; ETA, RUC, GFS data; radar data (level II); radar data (level III); satellite data; surface, upper air mesonet & wind profiler data.
Tasks: arpssfc, arpstrn, Ext2arps-ibc, Ext2arps-lbc, 88d2arps, nids2arps, mci2arps, ADAS assimilation, arps2wrf, WRF, wrf2arps, arpsplot, IDV viz.]

~400 data products consumed & produced (transformed) during the workflow lifecycle.
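
A forecast workflow of this kind is a directed acyclic graph, and a valid execution order is any topological ordering of it. The sketch below uses task names that appear on this slide, but the dependency edges are guesses made for illustration; the slide's actual wiring is not recoverable from the transcript.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Maps each task to the tasks it depends on. Edges are ASSUMED, not from the slide.
DEPENDS_ON: Dict[str, Set[str]] = {
    "ADAS": {"arpstrn", "arpssfc", "Ext2arps-ibc", "88d2arps"},  # assimilation
    "arps2wrf": {"ADAS"},
    "WRF": {"arps2wrf"},       # forecast
    "wrf2arps": {"WRF"},
    "arpsplot": {"wrf2arps"},  # visualization
}


def execution_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every task runs after its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
```

The four pre-processing tasks have no mutual dependencies in this sketch, so a workflow engine could run them in parallel before assimilation starts.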

Page 25

The Experiment Builder

• A Portal “wizard” that leads the user through the set-up of a workflow
• Asks the user:
  – “Which workflow do you want to run?”
• Once this is known, it can prompt the user for the required input data sources
• Then it “launches” the workflow.

Page 26

Parameter Selection

Page 27

Selecting the forecast region

Page 28

Page 29

Experience so far

• First release to support “WxChallenge: the new collegiate weather forecast challenge”
  – The goal: “forecast the maximum and minimum temperatures, precipitation, and maximum sustained wind speeds for select U.S. cities.
  – to provide students with an opportunity to compete against their peers and faculty meteorologists at 64 institutions for honors as the top weather forecaster in the nation.”
  – 79 “users” ran 1,232 forecast workflows generating 2.6 TBytes of data.
    • Over 160 processors were reserved on Tungsten from 10am to 8pm EDT (EST), five days each week
• National Spring Forecast
  – First use of user-initiated 2 km forecasts as part of that program. Generated serious interest from National Severe Storm Center.
• Integration with CASA project scheduled for final year of LEAD ITR.
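
For a sense of scale, the WxChallenge totals reported on this slide (79 users, 1,232 workflows, 2.6 TBytes) can be turned into per-user and per-workflow averages. The calculation assumes decimal units (1 TB = 1000 GB), which the slide does not specify, and says nothing about how the load was actually distributed across users.

```python
USERS = 79
WORKFLOWS = 1232
DATA_TB = 2.6

# Average number of forecast workflows launched per user.
workflows_per_user = WORKFLOWS / USERS

# Average data volume per workflow, assuming 1 TB = 1000 GB.
gb_per_workflow = DATA_TB * 1000 / WORKFLOWS

print(f"{workflows_per_user:.1f} workflows per user")
print(f"{gb_per_workflow:.2f} GB per workflow")
```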

Page 30

Is TeraGrid the Only Enabler?

• The web has evolved a set of information and service “super nodes”
  – Directories & indexes (Google, MS, Yahoo)
  – Transactional mosh pits (eBay, Facebook, Wikipedia)
  – Raw data and compute services (Amazon …)
• Can we build the tools for scientific discovery on this “private sector” grid?
  – Yes.
  – One CS student + one bioinformatician + Amazon Storage Service + Amazon Compute Cloud = ..

Page 31

A Virtual Lab for Evolutionary Genomics

• Data and databases live on S3
• Computational tools run (on-demand) as services on EC2.
• User composes workflows.
• Result data and metadata visible to user through desktop client.

Page 32

Validating Scientific Discovery

• The Gateway is becoming part of the process of science by being an active repository of data provenance
• The system records each computational experiment that a user initiates
  – A complete audit trail of the experiment or computation
  – Published results can include a link to provenance information for repeatability and transparency.
• The Scientific Method is all about repeatability of experiments
  – Are we there yet?

Page 33

Almost

• The provenance contains the workflow, and if we publish it, it can be re-run
  – Are the same resources still available?
    • Not a necessary condition for validation
  – Has the data changed?
• Another user can modify it.
  – Replace an analysis step with another
  – Test it on different data.
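
Re-running from published provenance, and asking whether the data has changed, can be sketched with content hashes. This is an illustrative design only: the record layout, the function names and the use of SHA-256 fingerprints are assumptions, not a description of how the gateway stores provenance.

```python
import hashlib
from typing import Callable, Dict, List

Registry = Dict[str, Callable[[bytes], bytes]]


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_with_provenance(data: bytes, steps: List[str], registry: Registry) -> dict:
    """Apply named steps in order and return a publishable record."""
    result = data
    for name in steps:
        result = registry[name](result)
    return {
        "steps": list(steps),
        "input_sha256": fingerprint(data),
        "output_sha256": fingerprint(result),
    }


def rerun(record: dict, data: bytes, registry: Registry) -> dict:
    """Replay a record; report whether the data changed and the output reproduces."""
    replay = run_with_provenance(data, record["steps"], registry)
    return {
        "data_unchanged": replay["input_sha256"] == record["input_sha256"],
        "reproduced": replay["output_sha256"] == record["output_sha256"],
    }
```

Swapping an entry in `record["steps"]` models replacing an analysis step, and passing different `data` models testing the workflow on a new data set; in both cases the comparison flags show how the replay differs from the published run.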

Page 34

The Future Experimental Testbed

• In five years: multicore, personal petabytes and ubiquitous gigabit bandwidth
  – Much richer experimental capability on the desktop. More of the computational work can be downloaded
• Do we no longer need the massive remote data/compute center?
  – Demand scales with capability.
• But there is more.

Page 35

Last Thought

• Vastly improved capability for interactive experimentation
  – Data exploration and visualization. Interacting with hundreds of incoming data streams.
  – Tracking our path and exploring 100 possible experimental scenarios concurrently.
  – Deep search agents
    • Discovering new data and new tools
    • Grab data: automatically fetch and analyze the provenance and set up the workflow to be re-run.

Page 36

Questions

Page 37

The Realization in Software

[Figure: architecture diagram. Components shown: User Portal; Portal server; Data Catalog service; MyLEAD User Metadata catalog; MyLEAD Agent service; Data Management Service; Workflow Engine (with workflow graph); Provenance Collection service; Event Notification Bus; Fault Tolerance & scheduler; Data Storage; Application services; Compute Engine.]