Mark Servilla & Duane Costa LTER Network Office LTER 2012 All Scientist Meeting LTER Network Office A Shareholders Introduction to the LTER Data Co-op


Post on 31-Mar-2015


TRANSCRIPT

Page 1: Mark Servilla & Duane Costa LTER Network Office LTER 2012 All Scientist Meeting LTER Network Office

Mark Servilla & Duane Costa
LTER Network Office

LTER 2012 All Scientist Meeting

A Shareholders Introduction to the LTER Data Co-op

Page 2: Working Group Roadmap

Why LTER Data Co-op?

A Diamond in the Rough

Demonstrations
◦ How can I contribute data?
◦ How do I find data?
◦ How can I see who is using my data?
◦ How is Network synthesis enabled?
◦ How is provenance captured?

Where do we go from here?

Panel Discussion

Page 3: Why LTER Data Co-op?

It’s about community
◦ “A cooperative … is an autonomous association of persons who voluntarily cooperate for their mutual social, economic, and cultural benefit.” - Wikipedia

Producers – LTER sites

Middleware – PASTA

Consumers – Science Community

Page 4: A Diamond in the Rough…

Page 5: PASTA - Basic Tools

Data producers can evaluate their data package prior to harvesting into PASTA

Data packages are discovered via browsing and/or search tools

Derived data may be generated when a data package insert or update event occurs

Provenance metadata can be generated for derived data packages

Data package “use” information is viewed by a contributor

Page 6: LTER Network Data Portal

portal.lternet.edu

Page 7: PASTA Web Service API

Page 8: Quality Engine

Subcomponent of the Data Package Manager component in PASTA

Generates a quality report for each data package

A quality report contains a set of quality checks

Stored as XML but usually rendered in HTML for human readability

27 quality checks implemented in the NIS prototype (of 52 proposed by EML Metrics Working Group)

Available to the greater ecoinformatics community via the Data Manager Library (ecoinformatics.org)

Page 9: What's a Quality Check?

An individual metric or a best practice

May involve looking at:
◦ metadata (independent of data), or
◦ data (independent of metadata), or
◦ congruency between metadata and data

Can result in one of four statuses:
◦ valid
◦ info
◦ warn
◦ error
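One way to picture a quality check result is as a small record carrying one of the four statuses. The sketch below is illustrative only: the class names, the field names, and the severity ordering (valid < info < warn < error) are assumptions, not the Quality Engine's actual data model.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    """The four statuses named on the slide; the severity ordering is an assumption."""
    VALID = 0
    INFO = 1
    WARN = 2
    ERROR = 3


@dataclass(frozen=True)
class CheckResult:
    name: str        # hypothetical identifier for the check
    subject: str     # "metadata", "data", or "congruency"
    status: Status
    message: str = ""


def worst_status(results):
    """Return the most severe status among the results (VALID if there are none)."""
    return max((r.status for r in results), default=Status.VALID)


# Made-up results for illustration
results = [
    CheckResult("titleLength", "metadata", Status.WARN, "title has 3 words"),
    CheckResult("fieldCount", "congruency", Status.VALID),
    CheckResult("displayFirstRow", "data", Status.INFO, "1982-01-01,4.2"),
]
print(worst_status(results).name)  # WARN
```

Ordering the statuses makes it easy to summarize a whole report with a single value, though a real report would list every check individually.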

Page 10: How is the Quality Engine used in the NIS?

Users can evaluate data packages before inserting them into PASTA

An error status reported by any quality check blocks insertion of the data package into PASTA

Every data package stored in PASTA has a quality report that can be accessed along with its metadata and data

Page 11: Data Package Quality Report

Page 12: Evaluate versus Harvest

Evaluate
◦ Runs quality checks on the data package but doesn’t insert it into PASTA
◦ May reveal more diagnostic information (as compared to harvest) because it doesn’t necessarily halt after encountering the first error

Harvest
◦ Runs quality checks on the data package; if no errors are discovered, inserts (or updates) the data package into PASTA
◦ May reveal less diagnostic information (as compared to evaluate) because it may halt as soon as an error is encountered

Bottom line: Always evaluate before harvesting!
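The difference between the two modes can be sketched with an in-memory stand-in. Everything here is hypothetical: the function names, the dictionary used as a repository, the two sample checks, and the package identifiers. The slide only says harvest "may" halt early; this sketch simply assumes it always does.

```python
def run_checks(package, checks, halt_on_error):
    """Run checks in order; optionally stop at the first 'error' status."""
    report = []
    for check in checks:
        status, message = check(package)
        report.append((check.__name__, status, message))
        if halt_on_error and status == "error":
            break
    return report


def evaluate(package, checks):
    """Run every check and never insert the package."""
    return run_checks(package, checks, halt_on_error=False)


def harvest(package, checks, repository):
    """Run checks, halting at the first error; insert only if no error occurred."""
    report = run_checks(package, checks, halt_on_error=True)
    if all(status != "error" for _, status, _ in report):
        repository[package["packageId"]] = package
    return report


# Two hypothetical checks, each returning (status, message)
def has_title(package):
    ok = bool(package.get("title"))
    return ("valid" if ok else "error", "" if ok else "missing title")


def has_entities(package):
    ok = bool(package.get("entities"))
    return ("valid" if ok else "error", "" if ok else "no data entities")


repo = {}
bad = {"packageId": "knb-lter-nin.1.1"}  # made-up identifier
print(len(evaluate(bad, [has_title, has_entities])))       # 2
print(len(harvest(bad, [has_title, has_entities], repo)))  # 1
print(repo)                                                # {}
```

Evaluate reports both problems at once, while harvest stops after the first and inserts nothing, which is why evaluating first saves round trips.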

Page 13: valid or error

• EML is version 2.1.0 or beyond

• Document is schema-valid EML

• Document is EML parser-valid

• All entity-level data URLs are live

• The packageId pattern matches scope.identifier.revision

• There are no duplicate entity names

• An entity-level URL which is not set to “information” returns data

• Data table does not have more fields than metadata attributes

• Data table does not have fewer fields than metadata attributes

• Database table can be created from EML metadata

• Field delimiter in metadata is a single character

• Document is schema-valid after dereferencing

• enumeratedDomain codes are unique (not yet implemented)
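Two of the checks above lend themselves to a short sketch: the packageId pattern and the field-count comparison between data and metadata. The regular expression is an assumed reading of scope.identifier.revision (a dot-free scope followed by two integers), and the function names and sample values are hypothetical; the rules the Quality Engine actually applies may differ.

```python
import csv
import io
import re

# Assumed interpretation: scope has no dots or whitespace; identifier and
# revision are non-negative integers.
PACKAGE_ID = re.compile(r"^(?P<scope>[^.\s]+)\.(?P<identifier>\d+)\.(?P<revision>\d+)$")


def check_package_id(package_id):
    """Return 'valid' if the id looks like scope.identifier.revision, else 'error'."""
    return "valid" if PACKAGE_ID.match(package_id) else "error"


def check_field_count(data_text, attribute_names, delimiter=","):
    """Compare each row's field count with the number of metadata attributes."""
    if len(delimiter) != 1:
        return "error"  # the field delimiter must be a single character
    expected = len(attribute_names)
    for row in csv.reader(io.StringIO(data_text), delimiter=delimiter):
        if row and len(row) != expected:  # skip blank lines
            return "error"
    return "valid"


print(check_package_id("knb-lter-nin.1.1"))  # valid
print(check_package_id("knb-lter-nin.1"))    # error
print(check_field_count("1982-01-01,4.2\n1982-01-02,5.0\n", ["date", "air_temp"]))  # valid
print(check_field_count("1982-01-01,4.2,extra\n", ["date", "air_temp"]))            # error
```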

Page 14: valid or warn

• Data can be loaded into the database

• Length of entityName is not excessive

• A methods element is present

• Record delimiter is present in metadata

• Data examined and possible record delimiters returned

• Number of records in metadata matches number of rows loaded

• At least one keyword element is present

• Dataset title length is at least 5 words

• Dataset abstract element is a minimum of 20 words

• ...others not yet implemented
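Several of these checks reduce to simple counts. The sketch below shows how the title, abstract, keyword, and record-count checks might be expressed; the function names are hypothetical, and splitting on whitespace is an assumed definition of a "word".

```python
def word_count(text):
    """Count whitespace-separated words (an assumed definition of 'word')."""
    return len(text.split())


def check_title(title, minimum=5):
    return "valid" if word_count(title) >= minimum else "warn"


def check_abstract(abstract, minimum=20):
    return "valid" if word_count(abstract) >= minimum else "warn"


def check_keywords(keywords):
    return "valid" if len(keywords) >= 1 else "warn"


def check_record_count(declared_in_metadata, rows_loaded):
    return "valid" if declared_in_metadata == rows_loaded else "warn"


print(check_title("Air temperature at North Inlet"))  # valid
print(check_title("Air temperature"))                 # warn
print(check_keywords([]))                             # warn
print(check_record_count(8760, 8759))                 # warn
```

Because these checks only warn, a package that fails them can still be inserted; the warning is recorded in its quality report.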

Page 15: info

• Display downloaded data

• Display first insert row

• coverage element is present

• temporalCoverage element is present

• geographicCoverage element is present

• taxonomicCoverage element is present

• ...others not yet implemented
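The element-presence checks can be sketched with Python's standard XML parser. The sample document below is a simplified fragment for illustration, not schema-valid EML, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

COVERAGE_ELEMENTS = (
    "coverage",
    "temporalCoverage",
    "geographicCoverage",
    "taxonomicCoverage",
)


def coverage_info(xml_text):
    """Report which coverage elements occur anywhere in the document."""
    root = ET.fromstring(xml_text)
    # Strip any "{namespace}" prefix so namespaced documents also match.
    present = {element.tag.rsplit("}", 1)[-1] for element in root.iter()}
    return {name: name in present for name in COVERAGE_ELEMENTS}


# Simplified, made-up fragment (not schema-valid EML)
SAMPLE = (
    "<dataset>"
    "<coverage><temporalCoverage/><geographicCoverage/></coverage>"
    "</dataset>"
)

info = coverage_info(SAMPLE)
print(info["temporalCoverage"])   # True
print(info["taxonomicCoverage"])  # False
```

An info check never blocks insertion; it only records what was found.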

Page 16: < Insert Demo Here />

Page 17: PASTA - Support for Synthesis

Data producers can evaluate their data package prior to harvesting into PASTA

Data packages are discovered via browsing and/or search tools

Derived data may be generated when a data package insert or update event occurs

Provenance metadata can be generated for derived data packages

Data package “use” information is viewed by a contributor

Page 18: Time Series Resampling

North Inlet Meteorological – Air Temperature

Yearly aggregation of data

Down-sample Hourly to Daily and Monthly

[Figure: yearly aggregation shown as steps 1, 2, 3 … 11 over the years 1982, 1983, 1984 … 1992]
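Down-sampling an hourly series to daily and monthly values can be sketched with the standard library alone. The slide does not say which aggregate the workflow computes, so the arithmetic mean is an assumption here, and the readings are made up rather than taken from the North Inlet data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def downsample(hourly, key):
    """Average (timestamp, value) readings within each bucket given by key(timestamp)."""
    buckets = defaultdict(list)
    for timestamp, value in hourly:
        buckets[key(timestamp)].append(value)
    return {bucket: mean(values) for bucket, values in sorted(buckets.items())}


def to_daily(hourly):
    return downsample(hourly, lambda t: t.date())


def to_monthly(hourly):
    return downsample(hourly, lambda t: (t.year, t.month))


# Made-up hourly air temperature readings
hourly = [
    (datetime(1982, 1, 1, 0), 4.0),
    (datetime(1982, 1, 1, 1), 6.0),
    (datetime(1982, 1, 2, 0), 8.0),
    (datetime(1982, 2, 1, 0), 10.0),
]

print(to_monthly(hourly))  # {(1982, 1): 6.0, (1982, 2): 10.0}
```

The same bucketing function serves both resolutions; only the key that maps a timestamp to its bucket changes.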

Page 19: Workflow Integration into PASTA

[Diagram labels: PASTA, NIN, Workflow, Source Data]

Page 20: Workflow Integration into PASTA

[Diagram labels: PASTA, NIN, Workflow, Notify]

Page 21: Workflow Integration into PASTA

[Diagram labels: PASTA, NIN, Workflow, Request Data]

Page 22: Workflow Integration into PASTA

[Diagram labels: PASTA, NIN, Workflow, Source Data]

Page 23: Workflow Integration into PASTA

[Diagram labels: PASTA]

Page 24: Workflow Integration into PASTA

[Diagram labels: PASTA, NIN, Workflow, Derived Data]

Page 25: Subscribe to a Data Package event

Page 26: Subscribe to a Data Package event
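The notify, request, and derive sequence from the workflow slides can be sketched with an in-memory stand-in. This is not the PASTA Web Service API: the `Repository` class, its methods, the package keys, and the averaging workflow are all hypothetical, chosen only to show how an insert event might trigger a subscribed workflow.

```python
class Repository:
    """In-memory stand-in for a data repository with event subscriptions."""

    def __init__(self):
        self.packages = {}     # (package_key, revision) -> data
        self.subscribers = {}  # package_key -> list of callbacks

    def subscribe(self, package_key, callback):
        self.subscribers.setdefault(package_key, []).append(callback)

    def insert(self, package_key, revision, data):
        self.packages[(package_key, revision)] = data
        # Notify: each subscriber learns which package revision changed.
        for callback in self.subscribers.get(package_key, []):
            callback(self, package_key, revision)

    def read(self, package_key, revision):
        return self.packages[(package_key, revision)]


def mean_workflow(repo, package_key, revision):
    """Request the source data, derive a product, and insert it as a new package."""
    source = repo.read(package_key, revision)
    derived = sum(source) / len(source)
    repo.insert(package_key + "-derived", revision, derived)


repo = Repository()
repo.subscribe("source", mean_workflow)
repo.insert("source", 1, [4.0, 6.0, 8.0])  # triggers the workflow
print(repo.read("source-derived", 1))      # 6.0
```

The site that produced the source data never calls the workflow directly; the repository's event is the only coupling between them.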

Page 27: Provenance Metadata

[Diagram labels: Source Data Package, Derived Data Package, Workflow Description]

Page 28: Provenance Metadata

Page 30: Provenance “chaining”
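Provenance chaining can be pictured as following "derived from" links back to the original sources. The sketch below assumes each package records its source packages and a workflow description; the `DataPackage` class, its fields, and the package identifiers are hypothetical and do not reflect how PASTA encodes provenance in metadata.

```python
from dataclasses import dataclass, field


@dataclass
class DataPackage:
    package_id: str
    sources: list = field(default_factory=list)  # packages this one was derived from
    workflow: str = ""                           # free-text workflow description


def lineage(package):
    """Return package ids from the given package back to its original sources."""
    seen, order, stack = set(), [], [package]
    while stack:
        current = stack.pop()
        if current.package_id in seen:
            continue  # a shared ancestor is listed only once
        seen.add(current.package_id)
        order.append(current.package_id)
        stack.extend(reversed(current.sources))
    return order


# Made-up chain: hourly -> daily -> monthly
hourly = DataPackage("hourly.1.1")
daily = DataPackage("daily.1.1", [hourly], "down-sample hourly to daily")
monthly = DataPackage("monthly.1.1", [daily], "down-sample daily to monthly")

print(lineage(monthly))  # ['monthly.1.1', 'daily.1.1', 'hourly.1.1']
```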

Page 31: < Insert Demo Here />


Page 35: Where do we go from here?

December 2012
◦ Support DOI assignment to metadata and data objects
◦ Refine NIS Data Portal
    • Complete metadata rendering
    • Improve catalog browsing
◦ Hang out shingle

Summer 2013
◦ Stand up DataONE member node

Page 36: Thank you!