Page 1: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Scientific Workflows and the Dissemination of Computations and Data

Goals

• Generalize the generic functionality into reusable frameworks

• Create a simple language to share, discuss and re-execute flows of computations

• Enable dissemination of computations and results both on the desktop and on the web

Another view of Sp2Learn/GeoLearn

• Sp2Learn/GeoLearn is a sequence of steps
• Each step leads to next step in sequence
• Sequence of steps can be re-executed
• Sequence of steps is scientific workflow

Load Raster(s) → Combine Rasters → Extract Area → Select Inputs and Outputs → Create Prediction Model → Compute Accuracy
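
The idea that a workflow is a re-executable sequence of steps can be sketched in a few lines of Python. This is a minimal illustration, not the Sp2Learn/GeoLearn implementation: the step functions are placeholders, rasters are modeled as flat lists of numbers, and `run_workflow` is a hypothetical helper name.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]

def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Feed each step the output of the previous one and log the step names."""
    log = []
    for name, func in steps:
        data = func(data)
        log.append(name)
    return data, log

# Placeholder steps (assumptions): a raster is just a flat list of cell values.
steps: List[Step] = [
    ("CombineRasters", lambda rasters: [sum(cells) for cells in zip(*rasters)]),
    ("ExtractArea", lambda raster: raster[1:3]),
    ("ComputeMean", lambda area: sum(area) / len(area)),
]

rasters = [[1, 2, 3, 4], [10, 20, 30, 40]]
result, log = run_workflow(steps, rasters)
```

Because the steps are plain data, the same list can be shared, inspected, and run again on the same or different inputs.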

Scientific Workflow

Wikipedia defines a scientific workflow as:

“A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions of a scientific problem. Scientific workflow systems often provide graphical user interfaces to combine different technologies along with efficient methods for using them, and thus increase the efficiency of the scientists.”

Requirements (1/3)

• Allow for reuse of existing tools
  • Don’t force the use of our favorite programming language
  • Don’t make people re-implement their tools
• Allow for sharing of knowledge
  • Share the data/tools/workflows
• Standards
  • Use of standard technologies where possible
• Provenance
  • Who did what, when, and how?
• Security
  • Limit access to data
  • Limit access to compute resources

Requirements (2/3)

• Create an editor for scientific workflows
  • Don’t hide the workflow
• Easy to extend
  • Support Java, compiled code, MATLAB, …
• Easy to use
  • Playing is more fun (try things out, don’t solve everything first)
• Remote execution
  • Long running jobs
  • Compute intensive jobs
  • Limited resources

Requirements (3/3)

• Allow for web access
  • Allows for easy sharing
  • Allows for easy re-execution
  • Allows for visualizations of results
  • No required downloads

Provenance

Wikipedia defines provenance as:

“Provenance, from the French provenir, "to come from", means the origin, or the source, of something, or the history of the ownership or location of an object. The term was originally mostly used of works of art, but is now used in similar senses in a wide range of fields, including science and computing.”

Cyberintegrator & DSE

• 2 proposed frameworks we are currently building
• Playgrounds for us to
  • try new features in
  • solve problems presented to us by different communities
• Cyberintegrator is currently in beta
• DSE going into alpha release
• All code developed is open source

Birthday Weather Demo (Input Form)

Birthday Weather Demo (Result)

Birthday Weather Demo (Execution)

Birthday Weather Demo

• Data for the 48 contiguous United States
• Basic meteorological variables
• Over 1000 observing stations
• Data from 1871 to 2005 available
• 1,250,055 data points computed per execution

High Level System Architecture

Information Sharing

• Virtual Communities
  • Multiple people want access to same data/tools etc.
  • Sharing of knowledge
• Multiple applications want shared access
  • Cyberintegrator
  • DSE
  • Cyberintegrator execution service
  • …

• Data is stored as blobs and metadata about data as RDF

Resource Description Framework (RDF)

• RDF data model is based upon the idea of making statements about resources, in the form of subject-predicate-object expressions

• For example, "The sky has the color blue" in RDF:
  • a subject denoting “the sky”
  • a predicate denoting “has the color”
  • an object denoting “blue”

• A collection of RDF statements intrinsically represents a labeled, directed multi-graph.
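
A minimal Python sketch of the triple model, assuming plain strings stand in for the URIs a real RDF store would use. The predicate names (`hasColor`, `contains`) and the helpers `build_graph` and `objects` are made up for illustration. Note that two edges with the same label can leave the same subject, which is what makes the structure a multi-graph.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("sky", "hasColor", "blue"),
    ("sky", "contains", "cloud"),
    ("cloud", "hasColor", "white"),
    ("sky", "hasColor", "grey"),
]

def build_graph(statements):
    """Index triples by subject: each entry is a labeled, directed edge."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph

def objects(graph, subject, predicate):
    """Return every object reachable from subject over edges with this label."""
    return [o for p, o in graph.get(subject, []) if p == predicate]

graph = build_graph(triples)
sky_colors = objects(graph, "sky", "hasColor")
```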

Shared Content Repository

Content Repository

• MySQL
• Derby
• Sesame
• File System
• WebDAV

Cyberintegrator

Cyberintegrator Editor

Cyberintegrator

• Focus attention on exploration
  • Support discovery in workflow creation via ‘Macro-recording’ style interface
  • Separate science from ‘logistics’
• Workflows as a communication mechanism
  • Make workflows (templates and provenance of runs) documented and sharable
• Enable integration of independent tools
  • Keep models, algorithms, data in open formats accessible from outside the scientific workflow system

Workflow View

• Show tools executed and information
  • Parameters used for execution
  • Who executed the tool
  • When was it executed (and when did it finish)
• Data sets used for input
• Data sets generated as output
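
The items listed for the workflow view amount to one provenance record per tool execution. A hedged Python sketch of such a record follows; the class name, field names, and sample values are illustrative assumptions, not Cyberintegrator's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

@dataclass
class ExecutionRecord:
    """Who ran which tool, with what parameters, when, on which data sets."""
    tool: str
    executed_by: str
    parameters: Dict[str, str]
    started: datetime
    finished: Optional[datetime] = None
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

    def duration_seconds(self) -> Optional[float]:
        """Elapsed time, or None while the execution is still running."""
        if self.finished is None:
            return None
        return (self.finished - self.started).total_seconds()

# Hypothetical example values.
record = ExecutionRecord(
    tool="ExtractArea",
    executed_by="alice",
    parameters={"region": "example"},
    started=datetime(2008, 1, 1, 12, 0, 0),
    finished=datetime(2008, 1, 1, 12, 0, 30),
    inputs=["combined-raster"],
    outputs=["extracted-area"],
)
running = ExecutionRecord("ExtractArea", "alice", {}, datetime(2008, 1, 1, 12, 5, 0))
```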

Cyberintegrator Architecture

• RCP application
• Plugin based
  • Engine plugin
    • Threaded Engine
    • Remote Engine
  • Executor plugin
    • Java
    • Matlab
    • Command Line

[Diagram: Cyberintegrator, Engine, and three Executors]

Tool Creation

• Use wizards (defined by executor)
• All resources are stored with tool definition
  • All files are zipped and uploaded to repository
  • Can export tool and look at included files
• When executing tool all files downloaded
  • Files are stored in temp folder
  • Inputs are stored in temp folder as well
  • No need to have tool installed on local machine!
• Can edit tool
  • We all make mistakes
  • Trail of edits is stored (and documented)
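
The package-then-stage cycle described above can be sketched with Python's standard `zipfile` and `tempfile` modules. This is an illustration under assumptions: a local file stands in for the repository, and the helper names `package_tool` and `stage_tool` and the file names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path

def package_tool(files: dict, archive: Path) -> Path:
    """Zip the tool's script and resources into a single archive."""
    with zipfile.ZipFile(archive, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return archive

def stage_tool(archive: Path, workdir: Path) -> list:
    """Unpack the archive into a working folder and return the file names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(workdir)
        return sorted(zf.namelist())

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # "Upload": here the repository is just a file in the temp directory.
    archive = package_tool(
        {"run.m": "disp('hello')", "data/lookup.csv": "a,b\n1,2\n"},
        tmp_path / "tool.zip",
    )
    # "Execute": stage everything in a scratch folder before running.
    workdir = tmp_path / "work"
    workdir.mkdir()
    staged = stage_tool(archive, workdir)
    script_text = (workdir / "run.m").read_text()
```

Since everything the tool needs travels in the archive, nothing has to be installed on the executing machine beforehand.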

Matlab Tool Creation

Script to execute

Resources required by script

Digital System Explorer

Digital System Explorer (DSE)

• A simple and accessible web interface for the end user to browse and share scientific workflows and results on the web

• A set of libraries to build rich internet applications
  • Intuitive interfaces for
    • Scientific data
    • Aggregate scenarios around workflows and executions of workflows
    • Provenance trail
• Requirements
  • Easy access: web browser
  • Open system: ease of integration with existing applications. RESTful web services, RSS feeds, web gadgets, portlets

Computation: Input Page

Text Input Widget

Map Input Widget

Computation: Executing

Execution Status

Workflow Steps

Computation: Results Page

Datasets produced by execution

Simple Visualization

New Computation

• Publishing a workflow by
  • Providing high level descriptions
  • Selecting the workflow to publish
  • Selecting what parameters to make available and how
  • Selecting what datasets to publish as results of an execution and how to visualize them

New Computation: General Information

New Computation: Select Workflow

New Computation: Select Parameters

Configure Input Widgets

Configure Output Visualizations

Publishing Workflows

• Publishing workflows to the web
  • Is wizard driven
  • Requires no writing of code
• As with other web 2.0 sites, we are counting on the wisdom of the crowds to self-organize to solve scientific problems
• One possible caveat
  • Privacy and being willing to share scientific results

Possible Future Work

• Migrate SP2Learn and GeoLearn functionality to Cyberintegrator framework

• Add more input widget types and visualizations types to the DSE framework

• Add collaborative features such as social tagging and discussion thread to the DSE (this is already available in Cyberintegrator)

Scenarios

• Two ongoing efforts to highlight some of the features discussed so far:
  • Plant Growth 4-H
  • Virtual Sensors

Plant Growth 4-H Scenario

Goals

• Incorporates a state-of-the-art generic crop growth simulation model and historic weather data for the purposes of designing educational activities for young learners.

• Enable informal learners to operate a high fidelity plant growth model through a web interface.

Team

• Luigi Marini, Andrew Wadsworth, Terry McLaren, Jim Myers, Raouf Berrabah, Joe Mansour, CET, NCSA;
• Anand Padmanabhan, NCSA & Dept. of Geography;
• Lisa Bouillion-Diaz, Extension Specialist, 4-H Youth;
• Xinguang Zhu, NCSA & Dept. of Plant Biology;
• Wen Wu Tang, Dept. of Geography;
• Dennis Bowman, Extension Educator, Crop Sciences;
• Bill Million, Extension Specialist, 4-H Youth, UIUC

The model

• “WIMOVAC (Windows Intuitive Model of Vegetation response to Atmosphere and Climate Change) is designed to facilitate the modelling of various aspects of plant photosynthesis with particular emphasis on the effects of global climate change.”

• S.W. Humphries and S.P. Long. WIMOVAC: a software package for modelling the dynamics of plant leaf and canopy photosynthesis. Comput. Appl. Biosci. 11: 361-371.

Exposed Parameters

• Wrapped the model as a Cyberintegrator tool

• Exposed a few relevant parameters as tool parameters

• This allowed us to easily create several workflows to be used in the different activities

Activities

• Activity 1 - Compare historical crop yield results in different regions.

• Activity 2 - Learn about the effects of CO2 levels in the atmosphere on plant growth.

• Activity 3 - Determine amount of seed to plant for optimal crop yield.

Plant Growth 4-H: Activity 1

• Compare historical crop yield results in different regions.
  • User input: latitude, longitude
  • Output: county, state, soil type and yield (bushels/acre) of Corn or Soybeans visualized as table and plant graph

Behind the Scenes

• Input map widget (similar to the one in the weather scenario) passes on the latitude and longitude of the selected point.

• Simple web application retrieves information about that particular point using PostGIS* queries.

• PostGIS supports spatial comparison functions such as ST_Contains (whether one geometry completely contains another geometry)

*Adds support for geographic objects to the PostgreSQL object-relational database
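
To illustrate the kind of containment test a point-in-region lookup relies on, here is a pure-Python ray-casting sketch. It is not how PostGIS implements ST_Contains, it does not treat points exactly on the boundary consistently, and the rectangular "county" outline is a made-up example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

def polygon_contains(polygon: List[Point], point: Point) -> bool:
    """Ray casting: count how many edges a ray going east from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular county outline, for illustration only.
county = [(-88.5, 40.0), (-88.0, 40.0), (-88.0, 40.4), (-88.5, 40.4)]
clicked_inside = polygon_contains(county, (-88.2, 40.1))
clicked_outside = polygon_contains(county, (-87.0, 40.1))
```

In the scenario described above, the database performs this test against stored county and soil polygons, so the web application only has to send the clicked coordinates.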

Behind the Scenes

• Data was collected by Dennis Bowman and Wenwu Tang from:
  • USDA - Natural Resources Conservation Service
  • USDA - National Agricultural Statistics Service

Plant Growth 4-H: Activity 2

• Effects of CO2 levels in the atmosphere on plant growth.
  • User input: 5 CO2 levels (one per model execution) and latitude of location
  • Output: yield (bushels/acre) of Corn or Soybeans visualized in table, plant graph and growth line graph

Plant Growth 4-H: Activity 2

[Chart; axis label: Yield]

Behind the Scenes

• Each text box maps to the Carbon Dioxide Concentration parameter of model execution

• The location on the map defines the latitude of all five executions

• When the user clicks “Run All” a new instance of the workflow is instantiated

• The second page polls the server waiting for all executions to complete
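
The fan-out and polling described above can be sketched with Python's standard thread pool. This is an illustration only: `run_model` is a stand-in whose formula is a placeholder (not WIMOVAC), and in the real system the polling happens between a web page and a server rather than inside one process.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_model(co2_ppm: float, latitude: float) -> dict:
    """Stand-in for one model execution; the yield formula is a placeholder."""
    time.sleep(0.01)
    return {"co2_ppm": co2_ppm, "latitude": latitude, "yield": round(0.1 * co2_ppm, 1)}

def run_all(co2_levels, latitude, poll_interval=0.01):
    """Start one execution per CO2 level, all at the same latitude, then poll."""
    with ThreadPoolExecutor(max_workers=len(co2_levels)) as pool:
        futures = [pool.submit(run_model, level, latitude) for level in co2_levels]
        # Poll until every execution reports completion.
        while not all(f.done() for f in futures):
            time.sleep(poll_interval)
        return [f.result() for f in futures]

# Five text boxes, one shared map location (values are made up).
results = run_all([300, 350, 400, 450, 500], latitude=40.1)
```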

Behind the Scenes

• Once the executions are done running the results page parses the output files to visualize a specific subset of results

• The model itself outputs the results in a CSV format with more than 100 variables

• In this particular case we are only interested in the total yield
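
Picking one variable out of a wide CSV file is straightforward with Python's `csv` module. In this sketch the column name `total_yield`, the sample rows, and the assumption that the last row holds the final value are all hypothetical; the actual model output layout is not given here.

```python
import csv
import io

def total_yield(csv_text: str, column: str = "total_yield") -> float:
    """Return one column's value from the last row of the model output."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return float(rows[-1][column])

# Hypothetical output showing only three of the many columns.
sample = (
    "day,leaf_area_index,total_yield\n"
    "1,0.1,0.0\n"
    "2,0.3,1.5\n"
    "3,0.6,4.2\n"
)
final_yield = total_yield(sample)
```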

Plant Growth 4-H: Activity 3

• Determine amount of seed to plant for optimal crop yield.
  • User input: 5 seed/m2 values (one per model execution) and latitude of location
  • Output: yield (bushels/acre) of Corn or Soybeans visualized in table, plant graph and growth line graph

Plant Growth 4-H: Activity 3

[Chart; axis label: Yield]

Future Work

• Reuse workflows and tools to create a more advanced scenario for researchers and/or scientists
  • Ability to tweak more parameters
  • Visualize more of the output data

• Add collaborative features such as the ability to discuss results in threaded discussions

Virtual Sensor Scenario

Page 63: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Goals

• Being developed to support a Chicago watershed research project to provide a real-time decision support system for optimal control of the Combined Sewage Overflow system

• Virtual rain gages can be defined on the Google map by clicking the "Add VS" button and clicking on a location in the Google Map

Page 64: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

The Concept of Virtual Sensors

• Our definition of a virtual sensor is
  • the product of thematic, spatial, and/or temporal transformation and aggregation of one or multiple raw sensor measurement(s)
• An example of a virtual sensor
  • From WATERS Network SEDS Draft (Chapter 5, p108)
    • Signals from arrays of individual sensors and clusters of such arrays would be combined to provide higher-level information. For example, an array of soil moisture and temperature sensors might be coupled to a microclimate array to provide a virtual soil moisture flux sensor.

Slide courtesy of Yong Liu

Page 65: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Team

• Yong Liu, David Hill, Alejandro Rodriguez, Luigi Marini, Rob Kooper, James Myers, Terry McLaren, Nick Michal, Xiaowen Wu, NCSA;

• Barbara Minsker, NCSA and Civil and Environmental Engineering at UIUC;

Page 66: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

User-Created Virtual Sensors in the Upper Illinois Watershed

[Figure: map showing user-created virtual sensor locations and USGS gages (green bubbles), with a virtual sensor time-series plot derived from KLOT NEXRAD]

Slide courtesy of Yong Liu

Page 67: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

NEXRAD

• The Next Generation Radar (NEXRAD) system is a network of approximately 160 high-resolution Doppler weather radars operated by the National Weather Service.

• The NEXRAD system measures reflectivity, radial velocity and spectrum width of the radar echoes returned from volumes within the atmosphere at a frequency of 5, 6, or 10 minutes (but never exact) depending on the operation mode of the radar.

• The reflectivity is correlated with the precipitation rate.

Page 68: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Virtual Sensor based on NEXRAD

• Workflows
  • Provides NEXRAD Level II-based virtual sensor data stream in near-real-time

0. NEXRAD data stream
1. Spatial transformation to points
2. Thematic transformation to rainfall rates
3. Publish one data stream per point of interest
4. Temporal aggregation to produce n-minute rainfall accumulation at one point
5. Publish one virtual sensor data stream with n-minute rainfall accumulation

• Workflow 1: steps 0, 1, 2, 3
  • Run periodically at the arrival rate of the NEXRAD Level II data stream
• Workflow 2: steps 4, 5
  • Run at the user-specified rain accumulation interval
  • E.g.: every 20 minutes

Slide courtesy of Yong Liu
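
Steps 2 and 4 can be sketched in Python as follows. The slides do not say which reflectivity-to-rain-rate relation the workflow uses; this sketch assumes the classic Marshall-Palmer form Z = 200 R^1.6 purely for illustration, and the scan times and dBZ values are made up. Because the scans do not arrive at exact intervals, the accumulation integrates over the actual timestamps.

```python
from typing import List, Tuple

def rain_rate_mm_per_hr(dbz: float, a: float = 200.0, b: float = 1.6) -> float:
    """Thematic transformation: reflectivity (dBZ) to rain rate via Z = a * R**b."""
    z = 10.0 ** (dbz / 10.0)
    return (z / a) ** (1.0 / b)

def accumulate_mm(samples: List[Tuple[float, float]]) -> float:
    """Temporal aggregation: trapezoid rule over (minute, rate in mm/hr) samples."""
    total = 0.0
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        total += 0.5 * (r0 + r1) * (t1 - t0) / 60.0
    return total

# Irregularly spaced scans at one point: (minutes since start, dBZ).
scans = [(0.0, 30.0), (5.0, 35.0), (11.0, 35.0), (20.0, 25.0)]
rates = [(t, rain_rate_mm_per_hr(dbz)) for t, dbz in scans]
accumulation_20min = accumulate_mm(rates)
```

Splitting the two functions mirrors the split into two workflows: the rate conversion runs whenever new radar data arrives, while the accumulation runs at the interval the user asked for.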

Page 69: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Workflows

• One master workflow handling the spatial and thematic transformation
  • Triggered when new data is available
• One workflow per Virtual Sensor handling the temporal transformation
• Tools in workflows wrap C++ model
• Streams are the links between the pieces

Page 70: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

A Virtual Sensor Data Model

[Diagram: Virtual Sensor data model]

• Classes: Virtual Sensor, SpatialThing (Point isA SpatialThing, Polygon isA SpatialThing), DataStream, ThematicInterest (e.g. rainfall rate, rainfall accumulation), TemporalFrequency, GIS Layer
• Relations: hasLocation, hasDataStream, derivedFrom, hasThematicInterest, hasTemporalInterval, belongsToLayer

A Virtual Sensor is more than just a new time-series data stream.

Slide courtesy of Yong Liu

Page 71: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Metadata and Standards

• Using existing namespaces and standards:
  • http://www.opengis.net/sensorML/1.0.1/uom
  • http://www.w3.org/2003/01/geo/wgs84_pos#Point
  • http://www.opengis.net/gml/location
• Describing metadata in a portable and expressive framework:
  • Resource Description Framework (RDF)
    • (Think XML squared)

Page 72: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Publish Virtual Sensor as OGC SWE-compliant SensorML format

<sml:SensorML version="1.0.1" xsi:schemaLocation="http://www.opengis.net/sensorML/1.0.1 http://schemas.opengis.net/sensorML/1.0.1/sensorML.xsd">
  <sml:identifier name="URI">
    <sml:Term definition="urn:ogc:def:identifierType:OGC:uniqueID">
      <sml:value>tag:cet.ncsa.uiuc.edu,2008:/VirtualSensor/Sears/rainfall-rate</sml:value>
    </sml:Term>
  </sml:identifier>
  ......
  <sml:identifier name="derivedFrom">
    <sml:value>NEXRAD Level II data from WSR-88D KLOT Doppler radar</sml:value>
  ............
  <sml:method name="SpatialandThematicTransformation" xlink:href="http://sensorweb-dev.ncsa.uiuc.edu:8190/cyberintegrator/cron/jobs/4327600e-a8a5-4cec-9d00-f099081b764e"/>

What source data is used to derive the virtual sensor?

What workflow is used?

• Provenance information is available for verification

• Interoperability is maintained through SWE-compliant publishing

Slide courtesy of Yong Liu
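
A consumer could answer the two provenance questions above by reading the published document. The Python sketch below uses the standard `xml.etree.ElementTree` module on a simplified fragment modeled on the excerpt; the fragment is an assumption (not schema-valid SensorML), the `example.org` workflow URL is a placeholder, and `provenance` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "sml": "http://www.opengis.net/sensorML/1.0.1",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Simplified fragment for illustration; real documents carry many more elements.
DOC = """
<sml:SensorML version="1.0.1"
    xmlns:sml="http://www.opengis.net/sensorML/1.0.1"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <sml:identifier name="derivedFrom">
    <sml:value>NEXRAD Level II data from WSR-88D KLOT Doppler radar</sml:value>
  </sml:identifier>
  <sml:method name="SpatialandThematicTransformation"
      xlink:href="http://example.org/workflows/1234"/>
</sml:SensorML>
"""

def provenance(xml_text: str) -> dict:
    """Extract the source data description and the workflow link, if present."""
    root = ET.fromstring(xml_text.strip())
    source = root.find("sml:identifier[@name='derivedFrom']/sml:value", NS)
    method = root.find("sml:method", NS)
    href_key = "{%s}href" % NS["xlink"]
    return {
        "derived_from": source.text if source is not None else None,
        "workflow": method.get(href_key) if method is not None else None,
    }

info = provenance(DOC)
```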

Page 73: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Summary and Conclusions

Page 74: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Summary

• Common Requirements
• Reusable Solutions
• Conclusions

Page 75: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Common Requirements

• Data:
  • Access large amounts of hydrologic, geographic, meteorological, water quality, soil type, land-use and many other types of data
  • Ingest and integrate heterogeneous large size data and streaming data
• Computational Resources:
  • Perform complex CPU and memory intensive data-driven analyses
  • Utilize a spectrum of distributed computational resources
• Data-driven Analyses (Software):
  • Design data-driven (data mining, machine learning, pattern recognition, statistical) analyses
  • Visualize and interpret data-driven models
  • Integrate data-driven models with physics/chemistry/bio based models
• Data and Software Integration:
  • Exercise seamlessly functionality present in heterogeneous software packages using available computational resources
  • Provide an environment where heterogeneous visualization and mining tools could be integrated into workflows, and the analysis workflows could be re-used and modified.

Reusable Solutions

• Multiple layers of abstraction

[Diagram: layers of abstraction: Data, Sensors, Algorithms, Computational Resources, Frameworks, End User Applications, Users]

Page 77: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Conclusions

• Geospatial data increasing exponentially
  • 160 NEXRAD stations, physical sensors, satellites, historical maps, planmaps, virtual sensors, …
• More earth observatories being planned
  • Resulting in even more data

Page 78: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Conclusions

• Research is done in larger groups
  • Problems are getting bigger
  • More people are interested in results
• Need to create tools to share knowledge
  • Both the final results as well as how the results are obtained
  • Need to allow for re-use of workflows
• Need to create infrastructures for re-use
  • We should not create custom applications for each problem
• Users are what matters
  • Whether they are researchers, students, stakeholders, scientists, developers, etc.

Page 79: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Acknowledgement

• This research was partially supported by National Aeronautics and Space Administration (NASA), the Faculty Fellow Program at National Center for Supercomputing Applications (NCSA), the Illinois State Water Survey (ISWS), NCSA Industrial partners, ONR Technology Research Education and Commercialization Center (TRECC), State of Illinois, Costa Rica CENAT, NARA, UIUC Provost Office, Google Summer of Code

• Contributions by the members of the Image Spatial Data Analysis (ISDA) Group, CET Division at NCSA and our collaborators from ISWS, CEE UIUC, University of Illinois Extension 4-H and International Institutions

Imaginations unbound

Page 80: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Disclaimer

• The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsors.

Page 81: Scientific Workflows and the Dissemination of Computations ...isda.ncsa.uiuc.edu/ILGISA/geoinformatics-3.pdf · • Enable dissemination of computations and results both on the desktop

Thank you! Questions?

• Slides will be posted by the end of the week at
  • http://isda.ncsa.uiuc.edu/ILGISA/
• More information on the projects is available at
  • http://isda.ncsa.uiuc.edu
  • http://cet.ncsa.uiuc.edu
• Contact Information
  • Peter Bajcsy - [email protected]
  • Michal Ondrejcek - [email protected]
  • Rob Kooper - [email protected]
  • Luigi Marini - [email protected]