Instant Karma
Collecting Provenance for AMSR-E

Beth Plale
Director, Data to Insight Center
Indiana University

Helen Conover
Information Technology and Systems Center,
University of Alabama in Huntsville

Joint AMSR-E Science Team Meeting
June 2-3, 2010
Huntsville, AL




Instant Karma: Applying a Proven Provenance Tool to NASA’s AMSR-E Data Production Stream

PI: Michael Goodman, NASA MSFC

Objective

• Improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community
• Customize and integrate Karma, a proven provenance tool, into NASA data production
• Collect and disseminate provenance of AMSR-E (Advanced Microwave Scanning Radiometer – Earth Observing System) standard data products, initially focusing on Sea Ice

Approach

• Engage the Sea Ice science team and user community
• Adhere to the Open Provenance Model (OPM)
• Apply Karma to Sea Ice data production workflows
• Customize Karma’s provenance dissemination user interface
• Evaluate usefulness of provenance collected
- Measure traffic to Karma Provenance Browser
- Collect user feedback
- Expand use of Karma to other AMSR-E data production streams

Co-Is/Partners

Thorsten Markus, NASA GSFC; Beth Plale, Indiana University; Rahul Ramachandran, Helen Conover, UAHuntsville

TRLin = 7, TRLcurrent = 7

Key Milestones

11/09
• Evaluate current AMSR-E SIPS product generation
06/10 • Extend Karma provenance collection tools for SIPS
09/10 • Enhance Karma Provenance Browser interface
10/10 • Instrument AMSR-E Sea Ice production in Testbed
12/10 • Evaluate with Sea Ice science team
03/11 • Introduce Provenance Browser to NSIDC DAAC
06/11 • Instrument AMSR-E Sea Ice production in Ops
09/11 • Evaluate with AMSR-E Sea Ice user community
02/12 • Instrument other AMSR-E data streams
02/12


Types of Provenance Information

• Lots of information already available, but scattered across multiple locations
– Processing system configuration
– Dataset and file level metadata
– Processing history information
– Quality assurance information
– Software documentation (e.g., algorithm theoretical basis documents, release notes)
– Data documentation (e.g., guide documents, README files)
• Instant Karma project aims to collate and organize information from multiple sources


Sea Ice Processing Flow and Dependencies

[Figure: processing flow diagram. Labeled elements: One day’s worth of Level-2A Tbs; Delivered Algorithm Package; Daily Processing Script; Sea Ice Algorithm; Default Multi-year Ice Mask; Ice Mask; Snow Melt Mask; Sea Ice 6.25 km; Sea Ice 12.5 km; Sea Ice 25 km.]

Mask files are running 5 day averages that are updated and replaced daily. Masks generated yesterday are used for today’s products.
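The day-to-day dependency described on this slide can be sketched as follows. This is a minimal illustration only: it assumes each mask is a plain mean over the five most recent daily fields and that the mask applied to a product for day D is the one produced on day D-1. The names `mask_date_for` and `RunningMask`, the window definition, and the toy two-element fields are hypothetical and do not describe the actual SIPS code.

```python
from collections import deque
from datetime import date, timedelta

WINDOW_DAYS = 5  # assumption: window length read from "5 day averages"


def mask_date_for(product_date: date) -> date:
    """Return the date of the mask applied to a product (the previous day's mask)."""
    return product_date - timedelta(days=1)


class RunningMask:
    """Hypothetical running average over the most recent daily fields."""

    def __init__(self, window: int = WINDOW_DAYS) -> None:
        # deque(maxlen=...) silently drops the oldest field once the window is full
        self._fields = deque(maxlen=window)

    def update(self, daily_field: list) -> list:
        """Add today's field and return the refreshed element-wise average."""
        self._fields.append(daily_field)
        count = len(self._fields)
        return [sum(values) / count for values in zip(*self._fields)]
```

Because each mask depends on the preceding days, a gap in the input stream changes the mask, and that change carries into the next day's products. This is the kind of dependency that provenance records are meant to expose.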


[Figure: system overview in two parts, "Karma provenance collection and representation" and "Karma analysis tool suite and portal". Legend: a marker identifies components optionally installed in future.]


AMSR-E daily processing workflow

• Workflow executes once per day of input files received

• Uses configuration files, data files, mask files

• Invokes processes, programs, algorithms

• Generates data files, images


Karma 3.0 architecture

[Figure: Karma 3.0 architecture diagram. Labeled components: Instrumented apps; Client Toolkit; Prov Track lib; Axis 2; other; WS messenger Bus (future), WSM; Subscriber Interface (provenance listener); Notification Ingester Interface; Ingester Implementer Interface; Synchronous ingest Web service; RESTful Service; Relational store; Database Setup script; Query Service; Query client; Graph Viz client; Preserv client; Preservation object; Knowledge discovery: Inferencing, quality, completeness; Xregistry (optional); XMC Cat metadata catalog (optional). Formats shown: OPM 1.0 XML events; OPM 1.0 RDF XML.]


Karma Architecture

• Service Core
– Bridge pattern for independent Ingester and IngesterImplementer implementation
– Core components for ingesting notifications
– Asynchronously shredding raw notifications to populate tables
• Axis2 Web Service Layer
– API layer to ingest notifications from clients’ push
– Also allows another layer to ingest notifications by pulling from message bus
• Axis2 Handlers
– Gather information by intercepting SOAP message from host services
– Minimal intrusiveness and lightweight instrumentation
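The bridge pattern named in the Service Core bullet can be illustrated with a short sketch. Only the class names `Ingester` and `IngesterImplementer` come from the slide. The method names, the in-memory backend, and the `BatchIngester` subclass are assumptions made for illustration, and Karma itself is a Java/Axis2 system, so this Python version shows the shape of the pattern rather than Karma's code.

```python
from abc import ABC, abstractmethod


class IngesterImplementer(ABC):
    """Implementation side of the bridge: how a raw notification is persisted."""

    @abstractmethod
    def store_raw(self, notification: str) -> None:
        """Persist one raw notification."""


class InMemoryImplementer(IngesterImplementer):
    """Toy backend that keeps notifications in a list (stand-in for a database)."""

    def __init__(self) -> None:
        self.raw = []

    def store_raw(self, notification: str) -> None:
        self.raw.append(notification)


class Ingester:
    """Abstraction side of the bridge: the interface that clients call."""

    def __init__(self, implementer: IngesterImplementer) -> None:
        self._implementer = implementer

    def ingest(self, notification: str) -> None:
        if not notification.strip():
            raise ValueError("empty notification")
        self._implementer.store_raw(notification)


class BatchIngester(Ingester):
    """Refined abstraction: adds client-facing behavior without touching backends."""

    def ingest_many(self, notifications) -> int:
        count = 0
        for notification in notifications:
            self.ingest(notification)
            count += 1
        return count
```

The point of the split is that a new storage backend (another `IngesterImplementer`) and a new ingest interface (another `Ingester` subclass) can each be added without changing the other.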


Scavenging: for Stand-alone Provenance Collection

• Collects provenance using scavenging
– Use existing collection mechanisms
• e.g., logging tool, auditing tool
– Low burden on both users and programmers

                    | User Annotation | Scavenging | Full Instrumentation
Application Burden  | Low | Low | High
Human Burden        | High | Low | Low
Information Quality | Error rates and omissions lead to incomplete information | Could have incompleteness | Complete
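As a rough sketch of what scavenging from an existing logging mechanism might look like, the example below turns log lines into provenance events. The log line format, the field names, and the paths in the usage note are hypothetical; a real processing log would need its own pattern.

```python
import re

# Hypothetical log line format: "<timestamp> <program> READ|WRITE <path>"
LOG_PATTERN = re.compile(
    r"^(?P<time>\S+) (?P<program>\S+) (?P<action>READ|WRITE) (?P<path>\S+)$"
)


def scavenge(lines):
    """Yield one provenance event per log line that matches the pattern."""
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            # Lines the pattern does not recognize are skipped, which is why
            # scavenged provenance "could have incompleteness".
            continue
        event = match.groupdict()
        action = event.pop("action")
        event["relation"] = "used" if action == "READ" else "generated"
        yield event
```

For example, `list(scavenge(["2010-02-05T00:10:00 seaice_daily READ /data/tb_file"]))` yields a single event whose `relation` is `"used"`. The application itself is not modified, which matches the low application burden in the comparison above.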


Open Provenance Model (OPM)

• Karma is generic and stand-alone
– Not coupled to any particular system
• Karma 3.0 utilizes OPM v1.01 to represent provenance graph
– OPM is a standard: http://eprints.ecs.soton.ac.uk/16148/1/opm-v1.01.pdf
– Enables provenance information exchange with other OPM-compliant tools
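To make the OPM representation concrete, here is a minimal in-memory sketch of an OPM-style graph. The three node kinds (artifact, process, agent) and the five causal dependencies are those defined by OPM, with edges pointing from effect to cause. The `OPMGraph` class and its methods are hypothetical and are not Karma's API or its storage format.

```python
from dataclasses import dataclass, field

NODE_KINDS = {"artifact", "process", "agent"}

# Relation name -> (kind of effect node, kind of cause node)
EDGE_RULES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass
class OPMGraph:
    nodes: dict = field(default_factory=dict)  # node id -> node kind
    edges: list = field(default_factory=list)  # (effect, relation, cause, role)

    def add_node(self, node_id: str, kind: str) -> None:
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, effect: str, relation: str, cause: str, role=None) -> None:
        """Add a causal dependency after checking that the endpoint kinds fit."""
        expected = EDGE_RULES[relation]
        actual = (self.nodes[effect], self.nodes[cause])
        if actual != expected:
            raise ValueError(f"{relation} expects {expected}, got {actual}")
        self.edges.append((effect, relation, cause, role))
```

Checking endpoint kinds when an edge is added keeps the graph well-formed, which matters if it is later exchanged with other OPM-compliant tools.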


Types of Provenance Information


Types of Provenance Info (2)

• [1] launches
– Whom: user ID or name
– What: service e.g., service URI
– When: launch context, time
• [2] consumes and [3] produces
– File (e.g., file URL, owner)
– Service: program, algorithm
• version
• [4] invokes
– Invoking service
– Invoked service
– Parameters
– Results/faults
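The four relationship types listed above map naturally onto small record types. The sketch below is one possible shape, assuming one record per event; the class and field names are hypothetical and are not Karma's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Launch:
    """[1] launches: who started which service, and when."""
    whom: str        # user ID or name
    what: str        # service, e.g. a service URI
    when: datetime   # launch time


@dataclass(frozen=True)
class DataEvent:
    """[2] consumes / [3] produces: a service reading or writing a file."""
    kind: str        # "consumes" or "produces"
    file_url: str
    owner: str
    service: str     # program or algorithm
    version: str

    def __post_init__(self) -> None:
        if self.kind not in ("consumes", "produces"):
            raise ValueError(f"unexpected kind: {self.kind}")


@dataclass(frozen=True)
class Invocation:
    """[4] invokes: one service calling another."""
    invoking_service: str
    invoked_service: str
    parameters: dict = field(default_factory=dict)
    result: Optional[str] = None
    fault: Optional[str] = None

    @property
    def succeeded(self) -> bool:
        return self.fault is None
```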


Additional Types of Provenance Information Captured by Karma

• Execution Status
– Terminated or Failed
• Transfer of Data
– Sending of results
– Receiving of results
• Workflow and Program Lifecycles
• Unknown Notifications
– Stored as raw notifications
• Forthcoming: Spatial and temporal information, simple and complex data values, quality information
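The handling of unknown notifications can be pictured as a fallback path at ingest time. In this sketch the notification type names and the `NotificationStore` class are hypothetical; it only illustrates keeping unrecognized notifications in raw form rather than discarding them.

```python
# Hypothetical type names for the categories listed above.
KNOWN_TYPES = {"execution_status", "data_transfer", "lifecycle"}


class NotificationStore:
    """Routes recognized notifications to parsed storage, everything else to raw."""

    def __init__(self) -> None:
        self.parsed = []  # (type, notification) pairs that were understood
        self.raw = []     # notifications kept verbatim for later interpretation

    def ingest(self, notification: dict) -> str:
        kind = notification.get("type")
        if kind in KNOWN_TYPES:
            self.parsed.append((kind, notification))
            return "parsed"
        self.raw.append(notification)
        return "raw"
```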


Simple provenance graph for Sea Ice

[Figure: provenance graph. Nodes: SeaIce files; Brightness Files (several); Mask Files. Edges: Uses (role = input); Uses (role = applyMask). Annotations: hasPGEVersion; hasQualityFlag.]
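A graph like this one can be queried for the inputs behind a product. The sketch below encodes the figure as a plain edge list and filters by role. The node identifiers and the explicit `seaice_process` node are assumptions added for illustration; the roles `input` and `applyMask` are taken from the figure.

```python
# Each edge is (effect, relation, cause, role), pointing from effect to cause.
EDGES = [
    ("seaice_process", "used", "brightness_file_1", "input"),
    ("seaice_process", "used", "brightness_file_2", "input"),
    ("seaice_process", "used", "mask_file", "applyMask"),
    ("seaice_file", "wasGeneratedBy", "seaice_process", None),
]


def inputs_of(product: str, edges, role=None) -> list:
    """Return the files used by the process(es) that generated `product`."""
    processes = {
        cause
        for effect, relation, cause, _ in edges
        if effect == product and relation == "wasGeneratedBy"
    }
    return sorted(
        cause
        for effect, relation, cause, edge_role in edges
        if effect in processes
        and relation == "used"
        and (role is None or edge_role == role)
    )
```

Calling `inputs_of("seaice_file", EDGES, role="applyMask")` returns `["mask_file"]`, the kind of question a scientist might ask when a product looks wrong.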


Provenance graph for sea ice product

[Figure: provenance graph visualization with labeled nodes: Input files; Mask file; Sea ice file.]


Provenance can be used to explain the difference between the images for 1/28/2010 and 2/05/2010 as a change in the sea mask due to missing data.


Example

• The provenance visualization is obtained using a simulated Karma provenance database. In this use case its aim is to help the scientist identify the mask file being used and the provenance information about that mask file.
• The provenance graph gives the user annotated lineage about a sea ice data product: the inputs required for its creation and the files created as a result of processing the file.
• Provenance visualization in this form allows for deeper examination.
– e.g., for a recurring error, the scientist can view all related provenance information to get to the source of the error.
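One simple way to track down the source of a difference between two products is to compare their provenance records field by field. The sketch below assumes each product's provenance has been flattened into a dictionary. The function name, field names, and values in the usage note are hypothetical and are not real AMSR-E records.

```python
def diff_provenance(first: dict, second: dict) -> dict:
    """Return the fields whose values differ, as {field: (first, second)}."""
    keys = sorted(set(first) | set(second))
    return {
        key: (first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    }
```

For two hypothetical records that share an algorithm version but list different mask files, `diff_provenance` returns only the mask entry, pointing the scientist at the mask rather than the algorithm.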


Ongoing work

• Better graph layout with detail for each data product and process used in generating a sea ice product.
• Give nodes different shapes and colors depending on whether they are input nodes, generated output nodes, etc.
• The user will be able to add annotations to edges by simply right-clicking on them, thus capturing semantic annotations to the existing causal dependencies.
• Forthcoming: Spatial and temporal information, simple and complex data values, quality information
• Provenance bundle archived with data or embedded in HDF file, in addition to Karma database


AMSR-E Provenance Use Cases

• Browse provenance graphs: convey rich information about final data product details
– Spatial location, time of observation, algorithms employed, quality propagation
• Answer “Something isn’t right” question
– Example illustrated earlier: did not receive data for several days so mask can be inaccurate.
• Provenance “bundle” includes relevant science papers
• New communication satellites interfere with NASA satellites for certain channels
– Identify channels affected by RFI and channels used to generate each product
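The RFI use case amounts to intersecting two sets: the channels known to be affected and the channels recorded in each product's provenance. A minimal sketch, assuming that provenance exposes a per-product channel list; the function name and the product and channel labels in the usage note are hypothetical placeholders, not actual AMSR-E assignments.

```python
def affected_products(product_channels: dict, rfi_channels: set) -> dict:
    """Map each product to the RFI-affected channels it was generated from."""
    affected = {}
    for product, channels in product_channels.items():
        overlap = set(channels) & set(rfi_channels)
        if overlap:
            # Sort so the result is stable regardless of set ordering
            affected[product] = sorted(overlap)
    return affected
```

With `{"product_x": ["ch1", "ch2"], "product_y": ["ch3"]}` and RFI on `{"ch2"}`, the function returns `{"product_x": ["ch2"]}`.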