mark servilla & duane costa lter network office lter 2012 all scientist meeting lter network...

Post on 31-Mar-2015

221 Views

Category:

Documents

3 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Mark Servilla & Duane CostaLTER Network Office

LTER 2012 All Scientist Meeting

LTER Network Office

A Shareholders Introduction to the LTER

Data Co-op

Why LTER Data Co-op? A Diamond in the Rough Demonstrations

◦How can I contribute data?◦How do I find data?◦How can I see who is using my data?◦How is Network synthesis enabled?◦How is provenance captured?

Where do we go from here? Panel Discussion

Working Group Roadmap

LTER Network Office

Why LTER Data Co-op? It’s about community

◦ “A cooperative … is an autonomous association of persons who voluntarily cooperate for their mutual social, economic, and cultural benefit.” - Wikipedia

Producers – LTER sites Middleware - PASTA Consumers – Science Community

LTER Network Office

A Diamond in the Rough…

LTER Network Office

Data producers can evaluate their data package prior to harvesting into PASTA

Data packages are discovered via browsing and/or search tools

Derived data may be generated when a data package insert or update event occurs

Provenance metadata can be generated for derived data packages

Data package “use” information is viewed by a contributor

PASTA - Basic Tools

LTER Network Data Portalportal.lternet.edu

PASTA Web Service API

LTER Network Office

Subcomponent of the Data Package Manager component in PASTA

Generates a quality report for each data package A quality report contains a set of quality checks Stored as XML but usually rendered in HTML for

human readability 27 quality checks implemented in the NIS

prototype (of 52 proposed by EML Metrics Working Group)

Available to the greater ecoinformatics community via the Data Manager Library (ecoinformatics.org)

Quality Engine

An individual metric or a best practice May involve looking at:

◦ metadata (independent of data), or◦ data (independent of metadata), or◦ congruency between metadata and data

Can result in one of four statuses◦ valid◦ info◦ warn◦ error

LTER Network Office

What's a Quality Check?

LTER Network Office

Users can evaluate data packages before inserting them into PASTA

An error status reported by any quality check blocks insertion of the data package into PASTA

Every data package stored in PASTA has a quality report that can be accessed along with its metadata and data

How is the Quality Engine used in the NIS?

Data Package Quality Report

Evaluate◦ Runs quality checks on the data package but doesn’t

insert it into PASTA◦ May reveal more diagnostic information (as compared to

harvest) because it doesn’t necessarily halt after encountering the first error

Harvest◦ Runs quality checks on the data package; if no errors are

discovered, inserts (or updates) the data package into PASTA

◦ May reveal less diagnostic information (as compared to evaluate) because it may halt as soon as an error is encountered

Bottom line: Always evaluate before harvesting!

LTER Network Office

Evaluate versus Harvest

• EML is version 2.1.0 or beyond

• Document is schema-valid EML

• Document is EML parser-valid

• All entity-level data URLs are live

• The packageId pattern matches scope.identifier.revision

• There are no duplicate entity names

• An entity-level URL which is not set to “information” returns data

• Data table does not have more fields than metadata attributes

• Data table does not have fewer fields than metadata attributes

• Database table can be created from EML metadata

• Field delimiter in metadata is a single character

• Document is schema-valid after dereferencing

• enumeratedDomain codes are unique (not yet implemented)

LTER Network Office

valid or error

• Data can be loaded into the database

• Length of entityName is not excessive

• A methods element is present

• Record delimiter is present in metadata

• Data examined and possible record delimiters returned

• Number of records in metadata matches number of rows loaded

• At least one keyword element is present

• Dataset title length is at 5 least words

• Dataset abstract element is a minimum of 20 words

• ...others not yet implemented

LTER Network Office

valid or warn

• Display downloaded data

• Display first insert row

• coverage element is present

• temporalCoverage element is present

• geographicCoverage element is present

• taxonomicCoverage element is present

• ...others not yet implemented

LTER Network Office

info

LTER Network Office

< Insert Demo Here />

LTER Network Office

Data producers can evaluate their data package prior to harvesting into PASTA

Data packages are discovered via browsing and/or search tools

Derived data may be generated when a data package insert or update event occurs

Provenance metadata can be generated for derived data packages

Data package “use” information is viewed by a contributor

PASTA - Support for Synthesis

North Inlet Meteorological – Air Temperature Yearly aggregation of data Down-sample Hourly to Daily and Monthly

LTER Network Office

Time Series Resampling

1982

1982

1982

1982

1983

1983 1984

1983 1984 1992……

1.

2.

3.

11.

LTER Network Office

Workflow Integration into PASTA

PASTANIN

Workflow

SourceData

LTER Network Office

Workflow Integration into PASTA

PASTANIN

Workflow

Notify

LTER Network Office

Workflow Integration into PASTA

PASTANIN

Workflow

Request Data

LTER Network Office

Workflow Integration into PASTA

PASTANIN

WorkflowSourceData

LTER Network Office

Workflow Integration into PASTA

PASTA

LTER Network Office

Workflow Integration into PASTA

PASTANIN

WorkflowDerived

Data

Subscribe to a Data Package event

LTER Network Office

Subscribe to a Data Package event

LTER Network Office

Provenance Metadata

Source DataPackage

Derived DataPackage

WorkflowDescription

Provenance Metadata

LTER Network Office

Provenance “chaining”

LTER Network Office

< Insert Demo Here />

LTER Network Office

LTER Network Office

December 2012◦ Support DOI assignment to metadata and data

objects◦ Refine NIS Data Portal

Complete metadata rendering Improve catalog browsing

◦ Hang out shingle Summer 2013

◦ Standup DataONE member node

Where do we go from here?

Thank you!

top related