Page 1: Long Term Ecological Research Network Office Trends Project Spaghetti & Linguine (aka Trends Data Store) Mark Servilla servilla@lternet.edu 14 September

Long Term Ecological Research Network Office

Trends Project
Spaghetti & Linguine

(aka Trends Data Store)

Mark Servilla
servilla@lternet.edu

14 September 2006

Page 2: Table of Contents

LNO NIS

• Background
• System Architecture
• System Workflow and Architecture Details
• Demonstration Screen Examples

Page 3: Message from IMExec - Feb 2006

• “IMExec suggests that this activity be used to scope and determine the feasibility of using EML in the development of NIS modules for solving general synthesis problems.”

• “The premise of this project is that EML will adequately describe the data set (e.g., entities, attributes, physical characteristics) to allow the capture of distributed data sets into a central SQL database.”

• “Determining the nature of this model for dynamic data delivery – whether it is more site-loaded or more (network) service-loaded – is critical.”

• “IMExec suggests that the near-term Trends NIS module activity be focused on development of a prototype for demonstration at the ASM in September.”

Page 4: Prerequisites

• Site data is documented with “rich” and “complete” EML

• Time-series data must be captured as “snapshots” for EML temporal coverage – i.e., no “continuous end date”

• Site data is open and accessible through a standard protocol such as HTTP

• Site EML documents are harvested on a regular basis into the LTER Metacat

Page 5: What is EML?

Ecological Metadata Language is…
• An ecological metadata standard
• Very extensible; it can be used to describe many different types of data
• Comprehensive and supports a rich set of constructs to fully describe data including
  – how to access distributed data
  – its logical and physical structure
• Defined by an XML Schema
• For further information:
  – http://knb.ecoinformatics.org/software/eml/
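To make "logical and physical structure" concrete, here is a minimal sketch of reading an EML-like document with Python's standard library. The XML is a simplified, hypothetical fragment: it omits the namespaces and many required elements of real EML, and the title, URL, and the describe helper are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical EML-like fragment (NOT schema-valid EML).
SAMPLE_EML = """<eml packageId="knb-lter-sev.354.2">
  <dataset>
    <title>Example wind observations</title>
    <dataTable>
      <entityName>wind</entityName>
      <physical>
        <distribution><online><url>http://example.org/data/wind.csv</url></online></distribution>
      </physical>
      <attributeList>
        <attribute>
          <attributeName>date_time</attributeName>
          <attributeDefinition>Timestamp of observation</attributeDefinition>
        </attribute>
        <attribute>
          <attributeName>wspd</attributeName>
          <attributeDefinition>Wind speed</attributeDefinition>
        </attribute>
      </attributeList>
    </dataTable>
  </dataset>
</eml>"""


def describe(eml_text):
    """Return the package ID, entity name, data URL, and attribute names."""
    root = ET.fromstring(eml_text)
    table = root.find("./dataset/dataTable")
    return {
        "package_id": root.get("packageId"),
        "entity": table.findtext("entityName"),
        # Physical structure: where the distributed data can be fetched.
        "url": table.findtext("physical/distribution/online/url"),
        # Logical structure: the columns of the data table.
        "attributes": [
            a.findtext("attributeName")
            for a in table.findall("attributeList/attribute")
        ],
    }
```

Calling describe(SAMPLE_EML) yields a small dictionary holding the table name, the location of the data, and its columns, which is the kind of information a loader would need.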

Page 6: What is Metacat?

Metacat is…
• A storage system for metadata and data (optimized for use with EML)
• Built on top of a relational database system using Java servlets
• Requires metadata to be in XML format
• Provides a customizable web interface
• Supports point-to-point replication
• For further information:
  – http://knb.ecoinformatics.org/software/metacat/

Page 7: Trends Data Store Architecture

[Architecture diagram. Components: Sources A, B, and C (EML); Metacat/Harvester; Dataset Registry; EML Parser/Loader; Trends Data Warehouse, consisting of the Primary Database (1°, source data), Data Integration/Transformation f(x), and the Secondary Database (2°, derived data); EML Factory (Derived Metadata, Source Provenance, Integration Methods, Trends Contact) with EML.xml (Trends Metadata); Store Front (HTML, SOAP).]

Page 8: Generalized Workflow

1. Sites collect and document time-series data (e.g., climate, social-economics, …)
2. Sites update EML with a new revision
3. EML is harvested into Metacat
4. EML Loader/Parser loads new/updated dataset into primary database
5. Data integration/transformation converts “raw” data into “derived” data
6. Derived data is stored in secondary database
7. EML is generated for derived data and is stored in Metacat
8. Derived data is made available to store front
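As a rough illustration of how steps 4 through 7 chain together, the sketch below runs them over plain dictionaries. Everything here is an assumption made for illustration: the run_workflow function, the use of dictionaries in place of Metacat and the two databases, the registry that maps a source package ID to a derived package ID, and the "knb-lter-xyz" identifiers.

```python
def run_workflow(harvested, registry, primary, secondary, metacat, transform):
    """Run steps 4-7 for each registered revision that has not been loaded yet.

    harvested: source package ID -> list of raw values (stand-in for step 3)
    registry:  source package ID -> derived package ID (hypothetical registry)
    primary, secondary, metacat: dictionaries standing in for the real stores
    transform: function that turns raw values into a derived value
    """
    processed = []
    for package_id, rows in harvested.items():
        # Skip data the registry does not know about, or revisions already loaded.
        if package_id not in registry or package_id in primary:
            continue
        primary[package_id] = rows                           # step 4: load raw data
        derived_id = registry[package_id]
        secondary[derived_id] = transform(rows)              # steps 5-6: derive, store
        metacat[derived_id] = {"derived_from": package_id}   # step 7: derived metadata
        processed.append(derived_id)
    return processed
```

Because already-loaded revisions are skipped, running the function a second time on the same input does nothing; only a new revision would start another pass.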

Page 9: Decomposed Workflow

Page 10: LTER Site Data Collection

• Time-series data
  – Physical environment (e.g., climate, …)
  – Human population and economy
  – Biogeochemistry
  – Biotic structure
• Data/metadata
  – Relational Database
  – Spreadsheet
  – Text file
  – HTML/XML

Page 12: EML, Metacat, and the Harvester

• EML Package ID
  knb-lter-site.XX.YY
  knb-lter-sev.354.1
  knb-lter-sev.354.2
  knb-lter-sev.354.3

• Metacat stores the XML of EML; new revisions take precedence – old revisions are deprecated, but not deleted

• The Harvester is a time-based update process that “pulls” site EML and inserts it into Metacat

[Diagram: Sources A, B, and C provide EML to the Metacat/Harvester; this stage is labeled “independent of the Trends Project”.]
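The package ID convention and the "new revisions take precedence" rule can be sketched as follows. This is an illustrative reading of the scope.identifier.revision pattern shown on the slide, not Metacat's own code; the PackageId type and both function names are hypothetical.

```python
from typing import NamedTuple


class PackageId(NamedTuple):
    scope: str        # e.g. "knb-lter-sev"
    identifier: int   # e.g. 354
    revision: int     # e.g. 2


def parse_package_id(text):
    """Split 'scope.identifier.revision' on its last two dots."""
    scope, identifier, revision = text.rsplit(".", 2)
    return PackageId(scope, int(identifier), int(revision))


def current_revisions(package_ids):
    """Return the highest revision per (scope, identifier).

    Older revisions are simply not selected; nothing is deleted from the input.
    """
    latest = {}
    for pid in map(parse_package_id, package_ids):
        key = (pid.scope, pid.identifier)
        if key not in latest or pid.revision > latest[key].revision:
            latest[key] = pid
    return latest
```

Given the three revisions of knb-lter-sev.354 listed on the slide, current_revisions selects revision 3 while revisions 1 and 2 remain available in the input list.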

Page 14: EML Loader/Parser

• Dataset registry identifies Trends data in Metacat

• New revisions assert a “new” data load. The EML parser/loader
  – Translates the site EML into the RDBMS DDL
  – Creates a new DB table in the primary database based on the revision
  – Loads the new data into the primary database
  – Triggers the workflow to continue

[Diagram: Sources A, B, and C (EML), Metacat/Harvester, Dataset Registry, EML Parser/Loader, and the primary database (1°).]
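A minimal sketch of the "EML to DDL, create table, load data" sequence, using SQLite from the Python standard library as a stand-in for the primary database. The slides do not say which RDBMS or type mapping the project used, so the SQL_TYPES table, the function names, and the example table name are all assumptions.

```python
import sqlite3

# Hypothetical mapping from a simplified attribute type to an SQL column type.
SQL_TYPES = {"datetime": "TEXT", "float": "REAL", "integer": "INTEGER", "string": "TEXT"}


def build_ddl(table, attributes):
    """Build a CREATE TABLE statement from (name, type) pairs taken from metadata."""
    columns = ", ".join(f'"{name}" {SQL_TYPES[kind]}' for name, kind in attributes)
    return f'CREATE TABLE "{table}" ({columns})'


def load(conn, table, attributes, rows):
    """Create a table for this revision and insert its rows."""
    conn.execute(build_ddl(table, attributes))
    placeholders = ", ".join("?" for _ in attributes)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
```

For example, load(conn, "site_wind_r1", [("date_time", "datetime"), ("wspd", "float")], rows) creates one table per revision, so an updated dataset never overwrites an earlier load.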

Page 16: Data Transformation

• Primary DB (1°) stores site data in native schema
• Transformation module reads native schema, performs transformation/integration, and writes to global schema
• Secondary DB (2°) stores derived data in consistent global schema

[Diagram: 1° → f(x) → 2°, “triggered by data load”]

Source table (1°): MCM Canada Glacier Wind

date_time   Timestamp of observation               15 min interval
wdir        Wind direction (azimuth)
wdirstd     Standard deviation of wind direction
wspd        Wind speed                             meters/second
wspdmax     Maximum wind speed                     meters/second
wpsdmin     Minimum wind speed                     meters/second

Derived tables (2°):

Wind direction (knb-eco-trends.1.1)            Timestamp (daily), value
Wind direction std dev (knb-eco-trends.2.1)    Timestamp (daily), value
Wind speed max (knb-eco-trends.5.1)            Timestamp (daily), value
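One possible shape for the transformation f(x) is sketched below: 15-minute observations in a native-schema table are reduced to one row per day in a two-column (timestamp, value) table. The slides do not state which aggregate is applied, so taking the daily maximum is an assumption, as are the function name and the use of SQLite.

```python
import sqlite3


def derive_daily_max(conn, source_table, value_column, derived_table):
    """Aggregate a native-schema table into a daily (timestamp, value) table."""
    conn.execute(f'CREATE TABLE "{derived_table}" ("timestamp" TEXT, "value" REAL)')
    # SQLite's date() truncates an ISO-8601 timestamp to YYYY-MM-DD.
    conn.execute(
        f'INSERT INTO "{derived_table}" ("timestamp", "value") '
        f'SELECT date(date_time), MAX("{value_column}") '
        f'FROM "{source_table}" GROUP BY date(date_time)'
    )
    conn.commit()
```

A call such as derive_daily_max(conn, "wind", "wspdmax", "knb_eco_trends_5_1") would populate a table corresponding to the "Wind speed max" product; the other derived products would differ only in the source column and the aggregate.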

Page 17: Global Schema

[Diagram: table name knb_eco_trends_1_1, annotated with scope, identifier, and revision.]
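Read together with the package ID knb-eco-trends.1.1, the table name knb_eco_trends_1_1 suggests that the scope, identifier, and revision are joined with underscores. The helper below is a sketch of that reading; the function name and the replacement rule are assumptions and are not confirmed by the slides.

```python
def derived_table_name(scope, identifier, revision):
    """Build a table name from package ID parts, using underscores throughout."""
    return f"{scope.replace('-', '_')}_{identifier}_{revision}"
```

Under this assumption, derived_table_name("knb-eco-trends", 1, 1) gives the name shown on the slide, and a new revision of a derived product would get a new table.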

Page 19: EML for the “derived”

• EML Factory generates EML metadata for the derived data and inserts it into Metacat

• Derived data is now accessible through the Metacat user interface

[Diagram: secondary database (2°), EML Factory (Derived Metadata, Source Provenance, Integration Methods, Trends Contact), EML.xml (Trends Metadata), EML, Metacat/Harvester.]
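The four inputs listed for the EML Factory (derived metadata, source provenance, integration methods, Trends contact) could be assembled into a document along these lines. The element names form a simplified, hypothetical layout rather than schema-valid EML, and the function and its parameters are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_derived_eml(package_id, title, source_package_ids, method, contact):
    """Assemble a simplified EML-like document for a derived data product."""
    root = ET.Element("eml", {"packageId": package_id})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title        # derived metadata
    ET.SubElement(dataset, "contact").text = contact    # Trends contact
    methods = ET.SubElement(dataset, "methods")
    ET.SubElement(methods, "description").text = method  # integration methods
    for source in source_package_ids:                    # source provenance
        ET.SubElement(methods, "source").text = source
    return ET.tostring(root, encoding="unicode")
```

The returned string is what would then be inserted into Metacat, so a reader of the derived product can trace it back to the site datasets it came from.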

Page 21: Store Front

• Store Front provides API to derived data products in secondary DB
• HTML – today
• Web service – tomorrow
• Issues:
  – Authentication
  – Authorization
  – Provenance
  – Quality
  – Interactive Plots

[Diagram: secondary database (2°), Store Front (HTML, SOAP).]

http://fire.lternet.edu/Trends (beta site location)
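For the "HTML – today" path, a store front ultimately turns rows from the secondary database into a page. The sketch below shows only that rendering step; the function name and page layout are hypothetical, and it leaves out the listed issues such as authentication and authorization.

```python
from html import escape


def render_table(title, rows):
    """Render (timestamp, value) rows of a derived product as an HTML fragment."""
    body = "\n".join(
        f"<tr><td>{escape(str(ts))}</td><td>{escape(str(value))}</td></tr>"
        for ts, value in rows
    )
    return (
        f"<h1>{escape(title)}</h1>\n"
        "<table>\n<tr><th>Timestamp</th><th>Value</th></tr>\n"
        f"{body}\n</table>"
    )
```

The same rows could later be returned by a web service instead, which is why it helps to keep the query and the rendering separate.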

Page 22: HTML Store Front (evolution in progress)

Page 23: Animated Workflow

[Animated version of the architecture diagram (Sources A, B, and C; EML; Metacat/Harvester; Dataset Registry; EML Parser/Loader; 1°, f(x), 2°; EML Factory; EML.xml Trends Metadata; Store Front with HTML and SOAP), labeled Step 1 through Step 6.]

Page 24: Thank You – The End