Mark Servilla & Duane Costa, LTER Network Office
LTER 2012 All Scientist Meeting
LTER Network Office
A Shareholder's Introduction to the LTER Data Co-op
Why LTER Data Co-op?
A Diamond in the Rough
Demonstrations
◦ How can I contribute data?
◦ How do I find data?
◦ How can I see who is using my data?
◦ How is Network synthesis enabled?
◦ How is provenance captured?
Where do we go from here?
Panel Discussion
Working Group Roadmap
Why LTER Data Co-op?
It’s about community
◦ “A cooperative … is an autonomous association of persons who voluntarily cooperate for their mutual social, economic, and cultural benefit.” - Wikipedia
Producers – LTER sites
Middleware – PASTA
Consumers – Science Community
A Diamond in the Rough…
Data producers can evaluate their data package prior to harvesting into PASTA
Data packages are discovered via browsing and/or search tools
Derived data may be generated when a data package insert or update event occurs
Provenance metadata can be generated for derived data packages
Data package “use” information is viewed by a contributor
PASTA - Basic Tools
LTER Network Data Portal: portal.lternet.edu
PASTA Web Service API
Subcomponent of the Data Package Manager component in PASTA
Generates a quality report for each data package
A quality report contains a set of quality checks
Stored as XML but usually rendered in HTML for human readability
27 quality checks implemented in the NIS prototype (of 52 proposed by EML Metrics Working Group)
Available to the greater ecoinformatics community via the Data Manager Library (ecoinformatics.org)
Quality Engine
An individual metric or a best practice
May involve looking at:
◦ metadata (independent of data), or
◦ data (independent of metadata), or
◦ congruency between metadata and data
Can result in one of four statuses
◦ valid
◦ info
◦ warn
◦ error
What's a Quality Check?
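A quality check as described on this slide could be modeled roughly as follows. This is only a sketch: the class names, the `scope` field, and the check names are assumptions made for illustration, not the Quality Engine's actual data model.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # The four statuses named on the slide.
    VALID = "valid"
    INFO = "info"
    WARN = "warn"
    ERROR = "error"


@dataclass(frozen=True)
class QualityCheck:
    name: str      # hypothetical identifier for the check
    scope: str     # "metadata", "data", or "congruency"
    status: Status


def summarize(checks):
    """Count how many checks ended in each of the four statuses."""
    counts = Counter(check.status for check in checks)
    return {status.value: counts.get(status, 0) for status in Status}


# A made-up quality report with three checks.
report = [
    QualityCheck("schemaValid", "metadata", Status.VALID),
    QualityCheck("displayDownloadedData", "data", Status.INFO),
    QualityCheck("recordCountMatches", "congruency", Status.WARN),
]

print(summarize(report))
```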
Users can evaluate data packages before inserting them into PASTA
An error status reported by any quality check blocks insertion of the data package into PASTA
Every data package stored in PASTA has a quality report that can be accessed along with its metadata and data
How is the Quality Engine used in the NIS?
Data Package Quality Report
Evaluate
◦ Runs quality checks on the data package but doesn’t insert it into PASTA
◦ May reveal more diagnostic information (as compared to harvest) because it doesn’t necessarily halt after encountering the first error
Harvest
◦ Runs quality checks on the data package; if no errors are discovered, inserts (or updates) the data package into PASTA
◦ May reveal less diagnostic information (as compared to evaluate) because it may halt as soon as an error is encountered
Bottom line: Always evaluate before harvesting!
Evaluate versus Harvest
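The difference between the two modes can be sketched in a few lines. Everything here is an assumption for illustration: the package is a plain dictionary, the three checks and their names are simplified stand-ins, and this harvest always halts at the first error, whereas the slide only says it may.

```python
def check_eml_version(pkg):
    # Error unless the EML version is 2.1.0 or beyond.
    version = tuple(int(part) for part in pkg["emlVersion"].split("."))
    return ("emlVersion", "valid" if version >= (2, 1, 0) else "error")


def check_field_count(pkg):
    # Error unless the data table has as many fields as metadata attributes.
    same = pkg["dataFieldCount"] == pkg["attributeCount"]
    return ("fieldCount", "valid" if same else "error")


def check_methods(pkg):
    # Warn (not error) when no methods element is present.
    return ("methodsPresent", "valid" if pkg.get("methods") else "warn")


CHECKS = [check_eml_version, check_field_count, check_methods]


def evaluate(pkg):
    """Run every check and return all results; nothing is inserted."""
    return [check(pkg) for check in CHECKS]


def harvest(pkg, repository):
    """Run checks, halt at the first error, insert only if none occurred."""
    results = []
    for check in CHECKS:
        result = check(pkg)
        results.append(result)
        if result[1] == "error":
            return results, False
    repository[pkg["packageId"]] = pkg
    return results, True


# A made-up package with two errors and one warning.
bad = {"packageId": "example.1.1", "emlVersion": "2.0.1",
       "dataFieldCount": 5, "attributeCount": 4}
repo = {}
print(evaluate(bad))       # reports all three results
print(harvest(bad, repo))  # stops after the first error; repo stays empty
```

With this package, evaluate surfaces both errors in one pass, while harvest reports only the first, which is the reason to evaluate before harvesting.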
• EML is version 2.1.0 or beyond
• Document is schema-valid EML
• Document is EML parser-valid
• All entity-level data URLs are live
• The packageId pattern matches scope.identifier.revision
• There are no duplicate entity names
• An entity-level URL which is not set to “information” returns data
• Data table does not have more fields than metadata attributes
• Data table does not have fewer fields than metadata attributes
• Database table can be created from EML metadata
• Field delimiter in metadata is a single character
• Document is schema-valid after dereferencing
• enumeratedDomain codes are unique (not yet implemented)
valid or error
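Three of the checks listed above are simple enough to sketch directly. The regular expression is an assumption: it takes the scope to be letters, digits, and hyphens, and the identifier and revision to be integers. The function names are hypothetical and the sample values are made up.

```python
import re

# Assumed shape of scope.identifier.revision, e.g. "knb-lter-nin.1.1".
PACKAGE_ID = re.compile(r"^[A-Za-z0-9-]+\.\d+\.\d+$")


def package_id_status(package_id):
    """valid if the packageId matches scope.identifier.revision, else error."""
    return "valid" if PACKAGE_ID.match(package_id) else "error"


def duplicate_entity_names(names):
    """Return the entity names that occur more than once, sorted."""
    return sorted({name for name in names if names.count(name) > 1})


def delimiter_status(delimiter):
    """valid if the field delimiter in the metadata is a single character."""
    return "valid" if len(delimiter) == 1 else "error"


print(package_id_status("knb-lter-nin.1.1"))
print(package_id_status("knb-lter-nin.1"))
print(duplicate_entity_names(["airTemp", "wind", "airTemp"]))
print(delimiter_status(","), delimiter_status("||"))
```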
• Data can be loaded into the database
• Length of entityName is not excessive
• A methods element is present
• Record delimiter is present in metadata
• Data examined and possible record delimiters returned
• Number of records in metadata matches number of rows loaded
• At least one keyword element is present
• Dataset title length is at least 5 words
• Dataset abstract element is a minimum of 20 words
• ...others not yet implemented
valid or warn
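The best-practice checks in this group downgrade to a warning rather than blocking insertion. A minimal sketch of three of them, assuming that words are whitespace-separated and using hypothetical check names:

```python
def word_count(text):
    # Assumption: a "word" is any whitespace-separated token.
    return len(text.split())


def warn_checks(title, abstract, keywords):
    """Return valid or warn for three of the best-practice checks."""
    return {
        "titleLength": "valid" if word_count(title) >= 5 else "warn",
        "abstractLength": "valid" if word_count(abstract) >= 20 else "warn",
        "keywordPresent": "valid" if keywords else "warn",
    }


# Made-up metadata: a four-word title, a 20-word abstract, one keyword.
result = warn_checks(
    title="North Inlet air temperature",
    abstract=" ".join(["word"] * 20),
    keywords=["temperature"],
)
print(result)
```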
• Display downloaded data
• Display first insert row
• coverage element is present
• temporalCoverage element is present
• geographicCoverage element is present
• taxonomicCoverage element is present
• ...others not yet implemented
info
< Insert Demo Here />
Data producers can evaluate their data package prior to harvesting into PASTA
Data packages are discovered via browsing and/or search tools
Derived data may be generated when a data package insert or update event occurs
Provenance metadata can be generated for derived data packages
Data package “use” information is viewed by a contributor
PASTA - Support for Synthesis
North Inlet Meteorological – Air Temperature
Yearly aggregation of data
Down-sample Hourly to Daily and Monthly
Time Series Resampling
[Figure: yearly aggregation in steps 1 through 11, accumulating data from 1982, then 1982 to 1983, then 1982 to 1984, and so on through 1992]
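Down-sampling hourly readings to daily and monthly values can be sketched with the standard library alone. The slides do not say which aggregate the workflow computes, so taking the mean is an assumption, and the readings below are synthetic rather than North Inlet data.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def downsample(readings, key):
    """Group (timestamp, value) pairs by key(timestamp) and average each group."""
    groups = defaultdict(list)
    for timestamp, value in readings:
        groups[key(timestamp)].append(value)
    return {k: mean(values) for k, values in sorted(groups.items())}


def daily(readings):
    return downsample(readings, lambda ts: ts.date())


def monthly(readings):
    return downsample(readings, lambda ts: (ts.year, ts.month))


# Synthetic data: 48 hourly values, 10.0 on the first day, 20.0 on the second.
start = datetime(1982, 1, 1)
hourly = [(start + timedelta(hours=h), 10.0 if h < 24 else 20.0)
          for h in range(48)]

print(daily(hourly))    # one mean per calendar day
print(monthly(hourly))  # one mean per (year, month)
```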
Workflow Integration into PASTA
[Diagram, shown in successive builds: NIN, PASTA, and Workflow, with the labels Source Data, Notify, Request Data, Source Data, and Derived Data appearing in that order]
Subscribe to a Data Package event
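The notify step can be pictured as a small publish/subscribe mechanism. The sketch below runs in a single process and is not the PASTA web service API: the `EventManager` class, its methods, the choice to subscribe by scope and identifier, and the package identifiers are all assumptions made for illustration.

```python
class EventManager:
    """Toy in-process stand-in for data package event subscriptions."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, scope_and_identifier, callback):
        # Register a callback for every revision of one data package.
        self._subscribers.setdefault(scope_and_identifier, []).append(callback)

    def notify(self, package_id):
        # Called when a data package insert or update event occurs.
        scope, identifier, _revision = package_id.rsplit(".", 2)
        for callback in self._subscribers.get(f"{scope}.{identifier}", []):
            callback(package_id)


# A workflow would request the source data here; this one only records the trigger.
triggered = []
manager = EventManager()
manager.subscribe("knb-lter-nin.1", triggered.append)

manager.notify("knb-lter-nin.1.2")  # subscribed package: the callback runs
manager.notify("knb-lter-nin.7.1")  # no subscriber: nothing happens
print(triggered)
```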
Provenance Metadata
Source Data Package
Derived Data Package
Workflow Description
Provenance Metadata
Provenance “chaining”
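One way to picture chaining: if each derived data package records its source packages, the full lineage can be recovered by following those records upstream. The mapping and the package identifiers below are made up for illustration and do not reflect how PASTA stores provenance metadata.

```python
# Hypothetical records: derived packageId -> list of source packageIds.
PROVENANCE = {
    "example.2.1": ["example.1.1"],  # derived directly from the original data
    "example.3.1": ["example.2.1"],  # derived from a derived package
}


def lineage(package_id, provenance):
    """Return every upstream package, nearest first (breadth-first walk)."""
    seen = []
    queue = list(provenance.get(package_id, []))
    while queue:
        source = queue.pop(0)
        if source not in seen:
            seen.append(source)
            queue.extend(provenance.get(source, []))
    return seen


print(lineage("example.3.1", PROVENANCE))
print(lineage("example.1.1", PROVENANCE))
```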
LTER Network Office
< Insert Demo Here />
December 2012
◦ Support DOI assignment to metadata and data objects
◦ Refine NIS Data Portal
    Complete metadata rendering
    Improve catalog browsing
◦ Hang out shingle
Summer 2013
◦ Stand up DataONE member node
Where do we go from here?
Thank you!