Post on 22-Dec-2015

Data Management for CENS

Stasa Milojevic

Information Studies

UCLA

CENS Data

• CENS will generate massive amounts of heterogeneous scientific and technical data from the sensors.

• The data need to be useful for CENS researchers

  • Real time

  • Archived

• The data also need to be useful for other researchers in those problem domains (larger community).

Data Management: Goals

Data → Metadata → Share with community

<dataset>
  <alternateIdentifier>PLT-GCEM-0311b.1.0</alternateIdentifier>
  <title>Fall 2003 plant monitoring survey -- biomass calculated from shoot height and flowering status of plants in permanent plots at GCE sampling sites 1-10</title>
  <creator>
    <organizationName>Georgia Coastal Ecosystems LTER Project</organizationName>
    <address>
      <deliveryPoint>Dept. of Marine Sciences</deliveryPoint>
      <deliveryPoint>University of Georgia</deliveryPoint>
      <city>Athens</city>
      <administrativeArea>Georgia</administrativeArea>
      <postalCode>30602-3636</postalCode>
      <country>USA</country>
    </address>
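A record like the one above becomes shareable because its fields can be read by any XML-aware tool. The following is a minimal sketch, not CENS code, that pulls a few descriptive fields out of the fragment with Python's standard library. The closing `</creator>` and `</dataset>` tags and the shortened title are assumptions added so the excerpt parses; the function name `summarize_dataset` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Slide excerpt, with closing tags added and the title shortened
# (both are assumptions made only so the fragment is well-formed).
EML_FRAGMENT = """
<dataset>
  <alternateIdentifier>PLT-GCEM-0311b.1.0</alternateIdentifier>
  <title>Fall 2003 plant monitoring survey</title>
  <creator>
    <organizationName>Georgia Coastal Ecosystems LTER Project</organizationName>
    <address>
      <deliveryPoint>Dept. of Marine Sciences</deliveryPoint>
      <deliveryPoint>University of Georgia</deliveryPoint>
      <city>Athens</city>
      <administrativeArea>Georgia</administrativeArea>
      <postalCode>30602-3636</postalCode>
      <country>USA</country>
    </address>
  </creator>
</dataset>
"""


def summarize_dataset(xml_text):
    """Return a small dict of descriptive fields from a <dataset> element."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.findtext("alternateIdentifier"),
        "title": root.findtext("title"),
        "organization": root.findtext("creator/organizationName"),
        "address_lines": [
            e.text for e in root.findall("creator/address/deliveryPoint")
        ],
        "city": root.findtext("creator/address/city"),
    }


summary = summarize_dataset(EML_FRAGMENT)
```

A catalogue or search tool could index dictionaries like `summary` without knowing anything about how the underlying measurements were stored.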

How to make data useful and usable?

• One data model for all of CENS

  • Not likely: that presumes all science problems are the same

• One data model for each CENS research area

  • More promising approach

  • Various scientific communities have already agreed on common models

Seismology

• Seismic data have been collected via digital instruments for over 30 years.

• There are robust and stable standards for describing seismic data across systems and data formats (SEED – Standard for the Exchange of Earthquake Data).

• Consortia centralize and disseminate seismic datasets:

  • IRIS (Incorporated Research Institutions for Seismology)

  • NEES (Network for Earthquake Engineering Simulation)

Habitat Monitoring

Habitat monitoring research:

• Draws upon multiple disciplines and technologies

• Integrates data across a wide range of ecological scales (chemistry, physiology, ecology, and environment)

• Available testbeds include: embedded microclimate sensor network and embedded phenology network (including wildlife and plant monitoring)

Habitat monitoring data:

• Temperature, moisture, and barometric pressure

• Video data

James Reserve and habitat monitoring community

Why did we start with this community?

• One of the initial CENS sensor deployments

• The project is at an early stage of defining data and metadata requirements

• Data from this project are being used as the basis for our initial inquiry learning research in CENS

Ecological Metadata Language (EML)

• XML-based standard, developed by and for the ecological community

• Divided into modules such as eml-access, eml-attribute, eml-project

• Describes data, literature, software, products

• Not well optimized for sensor data

• Optimized for describing data and not the derivation of data

• Uses the Morpho client as a cross-platform tool for creating and organizing data and metadata, either locally or on a shared network server

Ecological Metadata Language (EML)

<coverage>
  <geographicCoverage>
    <geographicDescription>GCE Study Site GCE1 -- Eulonia, Georgia, USA.
    Transitional salt marsh/upland forest site at the upper reach of the Sapelo River near Eulonia, Georgia. The main marsh area is to the north of the channel where the upland is controlled by DNR. Several small creeks lie within the study area. Residential development is increasing on the upland areas south of the channel. A hydrographic sonde is deployed within this site attached to a private dock to the south of the main channel near the HW-17 bridge.</geographicDescription>
    <boundingCoordinates>
      <westBoundingCoordinate>-81.427321</westBoundingCoordinate>
      <eastBoundingCoordinate>-81.410390</eastBoundingCoordinate>
      <northBoundingCoordinate>31.546173</northBoundingCoordinate>
      <southBoundingCoordinate>31.535095</southBoundingCoordinate>
    </boundingCoordinates>
  </geographicCoverage>
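Structured coverage metadata of this kind supports spatial queries such as "which datasets cover this point?". The sketch below is an illustration under stated assumptions, not part of EML or Morpho: it parses only the `boundingCoordinates` element shown above and tests whether a longitude/latitude pair falls inside the box. The helper names `parse_bounds` and `contains` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The bounding box from the slide, isolated as its own well-formed element.
BOUNDS_XML = """
<boundingCoordinates>
  <westBoundingCoordinate>-81.427321</westBoundingCoordinate>
  <eastBoundingCoordinate>-81.410390</eastBoundingCoordinate>
  <northBoundingCoordinate>31.546173</northBoundingCoordinate>
  <southBoundingCoordinate>31.535095</southBoundingCoordinate>
</boundingCoordinates>
"""


def parse_bounds(xml_text):
    """Read the four bounding coordinates as floats (decimal degrees)."""
    root = ET.fromstring(xml_text.strip())
    return {
        side: float(root.findtext(side + "BoundingCoordinate"))
        for side in ("west", "east", "north", "south")
    }


def contains(bounds, lon, lat):
    """True if the point lies inside the box (edges included)."""
    return (bounds["west"] <= lon <= bounds["east"]
            and bounds["south"] <= lat <= bounds["north"])


bounds = parse_bounds(BOUNDS_XML)
inside = contains(bounds, -81.42, 31.54)    # a point within the study site box
outside = contains(bounds, -81.40, 31.54)   # a point east of the box
```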

Describing Instruments

Sensor Model Language (SensorML)

• Emerging OpenGIS standard for describing sensors and sensor data

• Developed to support data discovery, data processing and geolocation

• Can be used for in-situ or remote sensors, dynamic or static platforms

• Optimized for large sensors and large platforms

• Describes resources for sensor management and discovery, but not sensor-derived data

Sensor Model Language (SensorML)

[Diagram: a SensorML Sensor element and its properties: identifiedAs, documentConstrainedBy, measures, operatedBy, attachedTo, locatedUsing, describedBy, documentedBy, hasCRS]

Science and Education

• We need to make the science data useful for teaching science in grades 6-12.

• This is a problem because the scientific models describe the data, whereas the education models describe lessons (grade level, instruments required for the lesson, time required to perform the lesson, educational standards, etc.)

METADATA FOR SENSOR DATA FOR HABITAT MONITORING

| CENS Schema | SensorML | EML 2.0 |
|-------------|----------|---------|
| CENS_Node.Node_Name (Name of Node) | Sml:IdentifiedAs (2.2.2) | |
| CENS_Node.Node_Desc (Description of Node) | AssetDescription: sml:description (2.2.12) | |
| CENS_Location.Location_ID (Unique location ID) | CrsID (2.2.5) | Eml-Coverage (2.4.4) |
| CENS_Location.X_Pos (Position on X axis) | HasCRS (2.2.5); ObjectState (3.3.6) | Eml-Coverage-GeographicCoverage (2.4.4) |
| CENS_Location.Time_Recorded (Time location was captured) | | Eml-Coverage-TemporalCoverage (2.4.4) |
| CENS_Location.Time_Type_ID (Refers to type of time of Time_Type ID table) | | Eml-Coverage (2.4.4) |

METADATA FOR EDUCATION MODULES FOR HABITAT MONITORING

| LOM | GEM | ADN |
|-----|-----|-----|
| Educational-Typical Age Range (5.7) | Audience-Age | Audience |
| Life Cycle-Contribute (2.3) | Creator | Resource Creator |
| General-Coverage (1.6) | Coverage-Spatial, Temporal | Coverage (spatial and temporal) |
| Life Cycle-Date (2.3.3); DateTime (8) | Date | Creation date; Accession date |
| General-Description (1.4) | Description | Description |
| Educational (5) | Pedagogy | Educational |
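The sensor-data half of this mapping can be held as a simple crosswalk structure, which makes the gaps between standards easy to query. The sketch below is illustrative only: the dictionary entries copy the CENS, SensorML, and EML cells from the slide, while the names `CROSSWALK`, `lookup`, and `unmapped` are hypothetical.

```python
# Crosswalk from CENS schema fields to the scientific metadata standards,
# transcribed from the slide. A missing key means the slide shows no mapping.
CROSSWALK = {
    "CENS_Node.Node_Name": {
        "SensorML": "Sml:IdentifiedAs (2.2.2)",
    },
    "CENS_Node.Node_Desc": {
        "SensorML": "AssetDescription: sml:description (2.2.12)",
    },
    "CENS_Location.Location_ID": {
        "SensorML": "CrsID (2.2.5)",
        "EML": "Eml-Coverage (2.4.4)",
    },
    "CENS_Location.X_Pos": {
        "SensorML": "HasCRS (2.2.5); ObjectState (3.3.6)",
        "EML": "Eml-Coverage-GeographicCoverage (2.4.4)",
    },
    "CENS_Location.Time_Recorded": {
        "EML": "Eml-Coverage-TemporalCoverage (2.4.4)",
    },
    "CENS_Location.Time_Type_ID": {
        "EML": "Eml-Coverage (2.4.4)",
    },
}


def lookup(field, standard):
    """Return the mapped element for a CENS field, or None if unmapped."""
    return CROSSWALK.get(field, {}).get(standard)


def unmapped(standard):
    """List the CENS fields that have no counterpart in the given standard."""
    return sorted(f for f, m in CROSSWALK.items() if standard not in m)


no_sensorml = unmapped("SensorML")
no_lom = unmapped("LOM")  # every field: the slide maps none of them to LOM
```

Querying `unmapped("LOM")` returns all six fields, which restates the slide's point: the scientific and educational schemas describe different things and do not overlap directly.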

Science and Education Data Models

Science and Education Data Models: Possible Solution

• Manage scientific data with models appropriate to the scientific community

• Construct filters and tools to make scientific data useful to K-12 students and teachers:

  • Reduce granularity of data (e.g. temperature at hourly, rather than minute intervals)

  • Develop tools to display these data (e.g. simple charts and graphs)

• Describe filters and tools using models appropriate to educational community (e.g. LOM, SCORM, GEM)
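As an illustration of the "reduce granularity" filter, the sketch below averages per-minute temperature readings into hourly values using only the Python standard library. It is one possible approach, not the CENS implementation; the function name `hourly_means`, the sample date, and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_means(readings):
    """Average (timestamp, value) readings into one value per clock hour.

    Returns a list of (hour_start, mean_value) tuples sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        # Truncate each timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return [(hour, sum(vals) / len(vals))
            for hour, vals in sorted(buckets.items())]


# Hypothetical data: one reading per minute for two hours,
# 10.0 degrees during the first hour and 11.0 during the second.
start = datetime(2004, 1, 1, 8, 0)
minute_readings = [
    (start + timedelta(minutes=i), 10.0 + (i // 60)) for i in range(120)
]

hourly = hourly_means(minute_readings)
```

Here 120 minute-level readings collapse to two hourly values, a size that is practical to plot in a simple classroom chart.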

Science and Education Data Models: Possible Solution

Sets of collected data run through filters and tools to produce understandable tables, charts, and graphs

Current accomplishments and next steps

James Reserve:

• Map current data structures to EML and SensorML to determine the fit

• Analyze scientific papers and documents to determine required data elements

• Create use scenarios

• Interview scientists

Current accomplishments and next steps

Education:

• Work with inquiry module team to identify data requirements

• Interview teachers

Discussion and Conclusions

Ensuring accessibility and integrity of CENS data to multiple communities requires:

• Understanding of the practices of each community

• Understanding of relationships between those practices

• Means to bridge the gaps

Acknowledgements

Christine Borgman

Andrew Wu

Bill Sandoval

Noel Enyedy

Joe Wise

Mike Wimbrow