USGS Bioinformatics Activities Ecoinformatics January 2010 Gladys Cotter Mike Frame


Page 1:

USGS Bioinformatics Activities

Ecoinformatics

January 2010

Gladys Cotter

Mike Frame

Page 2:

Topics for Discussion

1. USGS Bioinformatics Activities

2. Potential areas of collaboration

3. Questions

Page 3:

Bioinformatics

USGS NBII – addressing bioinformatics challenges through collaboration, content development, technology, and creating long-term infrastructure

Collecting
• Tools
• Protocols
• Standards

Linking
• Cross-referencing
• Relationship of data

Storage
• DBMS
• Central & Distributed
• Security
• Backups
• Archival
• Standards

Organization
• Structure
• Governance
• Standards
• Policies

Integration
• Multi-levels
• Difficult
• Mashups
• Standards

Analysis Synthesis
• Tools
• Standards
• Usability
• Training
• Non-biased

Delivery
• Tools
• Governance
• Infrastructure
• User analysis

Applications
• Tools
• Protocols
• Standards

for

• Fusion
• Blending
• Related Integration
• Analysis
• Models

• Research
• Decision Making
• Policies
• Education
• Outreach

Sustainable Reliable Outreach Training

Page 4:

Biological Spatial Infrastructure NBII

• Over 72,000 records
• Based on FGDC BDP
• Training Program
• QA/QC Program
• Standards cross-walks: EML, Dublin Core
• Establishing administrative tools
• Expanding internationally
• Embedding in-line visualization
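
The standards cross-walks mentioned on this slide (for example between EML and Dublin Core) can be pictured as a field-to-field mapping. The following is only a minimal sketch under stated assumptions: the slash-separated field paths, the `EML_TO_DC` table, the `crosswalk` function, and the sample record are hypothetical illustrations, not the NBII cross-walk itself. Only the Dublin Core element names (title, creator, description, date, subject) are taken from the published element set.

```python
# Hypothetical crosswalk table: simplified EML-style field paths -> Dublin Core elements.
# The paths are illustrative; a real crosswalk would work on parsed XML documents.
EML_TO_DC = {
    "dataset/title": "title",
    "dataset/creator": "creator",
    "dataset/abstract": "description",
    "dataset/pubDate": "date",
    "dataset/keywordSet/keyword": "subject",
}


def crosswalk(record, mapping=EML_TO_DC):
    """Rename the fields of `record` according to `mapping`.

    Returns (converted, unmapped): `converted` maps each target element to a
    list of values, and `unmapped` lists source fields with no target element.
    """
    converted = {}
    unmapped = []
    for key, value in record.items():
        target = mapping.get(key)
        if target is None:
            unmapped.append(key)  # no equivalent element in the target standard
        else:
            converted.setdefault(target, []).append(value)
    return converted, sorted(unmapped)


# Made-up sample record, for illustration only.
example = {
    "dataset/title": "Example stream survey",
    "dataset/creator": "A. Researcher",
    "dataset/coverage/taxonomicCoverage": "Salmo trutta",
}
dc_record, leftover = crosswalk(example)
```

Reporting the unmapped fields separately reflects a common property of cross-walks: a richer standard usually carries detail (here, taxonomic coverage) that a simpler one cannot hold.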

Page 5:

World Data Center for Biodiversity & Ecology

• World Data System created through the International Council of Scientific Unions (ICSU) in 1957

• Currently 50 World Data Centers (WDC) in place internationally

• USGS National Biological Information Infrastructure (NBII) network designated as the WDC for Biodiversity & Ecology in 2002

Page 6:

WDC Current Activities

• Renewable Energy Project Prequalification demonstration project
– Goal: support rapid prequalification of sites across the nation that are potentially suitable for renewable energy (with an initial focus on federal lands).
– Data sets include, but are not limited to:
   • Land Cover (GAP)
   • Protected areas/Stewardship (GAP)
   • Species Distributions/Habitat Affinities (GAP)
   • Species Occurrences (US-GBIF Mirror Site and NBII)
   • Integrated Taxonomic Information System (ITIS)
   • Topography (USGS)
   • Landforms (USGS/GAM)
   • Soil Moisture (USGS/GAM)
   • Ecosystems (USGS/GAM)
   • Renewable Energy Potential (i.e., wind, solar, geothermal, and biofuels; NREL)
   • Infrastructure (i.e., power grid, projected smart grid, and roads; NREL and USGS)

• Protected areas – working with WDPA, USGS GAP

• Sponsoring WDC for Biodiversity & Human Health
– South Africa is hosting
– Providing workshops, training, demonstration projects
– Evaluating how to leverage ILTER activities

Page 7:

Multilingual IABIN Catalog

Ability to search by:
• IABIN TNMap interface
• Resource Type
• Language
• Taxonomy
• Multi-lingual thesaurus

Thesaurus web-services
• English
• Spanish
• Portuguese

Page 8:

NBII Search

Unique Facets

Dynamic biological clusters

Refine Results

Biological images

Map Display

Page 9:

Additional Unique Facets

Thesaurus integration

Publisher refinement

Diverse Sources
• DBMS
• Websites
• Federation
• Documents

Weighting of sources

Page 10:

Integrated Taxonomic Information System

• Multi-agency partnership

• Primarily North American taxa

• Used Globally

• Web-services released Summer 2009

• Taxonomic Workbench 2010

Page 11:

NBII Species Mashups

• Designed for
– One-stop-shop for species information in SE
– Integrate diverse sources

• Content Type

• UI Presentation

Page 12:

USGS Data Integration

3 Major Goals:

1. Making corporate data available via ESRI services

2. Improving access to modeling data, including water quality, stream, etc.

3. Providing easy-to-use “data upload”, “registry”, and “discovery” tools

Page 13:

North American EOL

• Multi-agency partnership designed to develop a prototype for “species information” within the Great Lakes and Chesapeake Bay regions

Page 14:

NSF DataNet Grant Background

• NSF solicitation to establish
– Long-term archives for science data
– Develop sustainable business model to support these activities
– Involve multi-disciplinary domains
– Develop various R&D needed to support effort
– Provide ongoing “operational” support

Funded 2:

DataONE

The Data Conservancy

Page 15:

DataONE Areas of emphasis

• Data loss: preserving all the work that has been done, by preserving at-risk (orphaned) biological, ecological, and environmental data from individual scientists

• Data dispersion: finding the needle in the haystack; by facilitating discovery and access of data through a single easy-to-use portal

• Data deluge: navigating the flood of increasingly heterogeneous data; by providing a toolbox that empowers scientists and organizations to more easily and effectively manage, analyze, and synthesize data

• Data Practice: using the best tools to do the job; by creating an informatics-literate workforce through innovative outreach and training efforts (e.g., best-practice videos, podcasts, on-line certificate programs, downloadable best practice guides and exemplars of data management plans)

Page 16:

DataONE Technology Directions

• DataONE will enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it by:

– making the scientist an active member of the data preservation process,

– creating cyberinfrastructure that supports the full data life cycle,

– promulgating cultural changes that value data stewardship and data sharing,

– broadly promoting best practices,

– engaging citizens in science,

– domain-agnostic solutions

Page 17:

Partnering organizations

• Libraries & digital libraries
• Academic institutions
• Research networks
• NSF- and government-funded synthesis & supercomputer centers/networks
• Governmental organizations
• International organizations
• Data and metadata archives
• Professional societies
• NGOs
• Commercial sector

Page 18:

Why is this relevant to Ecoinformatics?

• Share similar cyberinfrastructure needs
– Architecture
– Portals
– Distributed approaches
– Replication
– Secure, controlled access
– Authentication methods
– Tools deployed and supported
– Data discovery & interoperability methods
– Standards developed, deployed

• Life Cycle Data Management tools (e.g., Investigator toolkit, CI)
• R&D activities in the areas of CS, IS, SS, GIS, Env., etc.
• Opportunity for broad Governmental & International Participation (e.g., working groups, tool evaluations, etc.)
• Complementary to several of our group's goals, projects, activities
• Potential Microsoft-related projects (e.g., MS Excel)

Page 19:

Potential areas of collaboration

• NBII Metadata Expansion
• Incorporation of additional species data into NA EOL, NBII Species Mashups, etc.
• USGS Data Integration activities
• NSF DataONE Grant
• Potential Microsoft tools

Page 20:

Questions & Comments

Mike Frame, [email protected], 576-3605

Gladys Cotter, [email protected], 648-4182

Page 21:

Technical Architecture & Discussions

DataONE: Enabling Data-Intensive Biological and Environmental Research

Page 22:

Existing biological data archives

ESA’s Ecological Archive

Long Term Ecological Research Network

Fire Research & Management Exchange System

National Biological Information Infrastructure

Distributed Active Archive Center

Knowledge Network for Biocomplexity

Page 23:

Example data holdings

Metadata Interoperability Across Data Holdings

Data Archive | Types of Data Managed | Metadata Standard(s)

(image) | Biodiversity, taxonomic, ecological | BDP, DwC, DC, OGIS

(image) | Biogeochemical dynamics, terrestrial ecological, Earth observation imagery | DIF, BDP, ECHO

(image) | Ecological, biodiversity, biophysical, social, genomics, and taxonomic | EML

(image) | Avian populations and molecular biology | DwC

(image) | Biological and taxonomic | DC subset

(image) | Biophysical, biodiversity, disturbance, and Earth observation imagery | EML

(image) | Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic | EML

EML = Ecological Metadata Language

BDP = Biological Data Profile

DwC = Darwin Core

DC = Dublin Core

DC subset = Dublin Core subset

ECHO = EOS ClearingHOuse

OGIS = OpenGIS

DIF = Directory Interchange Format

Page 24:

Distributed framework

Member Nodes

• diverse institutions

• serve local community

• provide resources for managing their data

Coordinating Nodes

• retain complete metadata catalog

• subset of all data

• perform basic indexing

• provide network-wide services

• ensure data availability (preservation)

• provide replication services

Flexible, scalable, sustainable network
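
The division of labor between Member Nodes and Coordinating Nodes can be sketched in a few lines of Python. This is an illustrative model only, not the DataONE software: the class names, method names, and the identifier `doi:example/1` are hypothetical, and the sketch simplifies by keeping only metadata at the coordinating node and replicating data between member nodes. The node names UCSB and UNM are borrowed from the data lifecycle slide.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNode:
    """Hypothetical member node: holds the data and metadata of its local community."""
    name: str
    objects: dict = field(default_factory=dict)  # identifier -> (metadata dict, data bytes)

    def add(self, identifier, metadata, data):
        self.objects[identifier] = (metadata, data)


@dataclass
class CoordinatingNode:
    """Hypothetical coordinating node: complete metadata catalog plus replica tracking."""
    catalog: dict = field(default_factory=dict)    # identifier -> metadata
    locations: dict = field(default_factory=dict)  # identifier -> names of nodes holding the data

    def harvest(self, node):
        """Copy every metadata record from a member node into the central catalog."""
        for identifier, (metadata, _data) in node.objects.items():
            self.catalog[identifier] = metadata
            self.locations.setdefault(identifier, set()).add(node.name)

    def replicate(self, identifier, source, target):
        """Copy one object to a second member node and record the new location."""
        metadata, data = source.objects[identifier]
        target.add(identifier, metadata, data)
        self.locations.setdefault(identifier, set()).add(target.name)

    def search(self, keyword):
        """Basic indexing stand-in: case-insensitive match on the title field."""
        kw = keyword.lower()
        return sorted(i for i, m in self.catalog.items() if kw in m.get("title", "").lower())


ucsb = MemberNode("UCSB")
unm = MemberNode("UNM")
ucsb.add("doi:example/1", {"title": "Kelp forest survey"}, b"placeholder bytes")

cn = CoordinatingNode()
cn.harvest(ucsb)
cn.replicate("doi:example/1", ucsb, unm)
```

In this sketch, discovery goes through the coordinating node's catalog while the data itself stays on member nodes, so losing a single member node does not lose a replicated object.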

Page 25:

Supporting the data lifecycle

UCSB Node

UNM Node

ORC Node

The data lifecycle

1. Deposition/acquisition/ingest
2. Curation and metadata management
3. Protection, including privacy
4. Discovery, access, use, and dissemination
5. Interoperability, standards, and integration
6. Evaluation, analysis, and visualization

Page 26:

Use Cases, Architecture Planning

http://mule1.dataone.org/ArchitectureDocs/index.html

Page 27:

Changing science culture

1. Education and training

2. Engaging citizens in science

3. Building global communities of practice

Page 28:

Education and training

Career Long Learning:
• best practice guides
• exemplary data management plans
• podcasts, web-casts
• workshops and seminars
• downloadable curricula

Best Practice Guide: How to Cite Your Data (6 in a series)

Best Practice Guide: Using Metadata for e-research (5 in a series)

Gold Star Data Management Plan: Here’s How

Page 29:

www.CitizenScience.org

Engaging citizens in science

Page 30:

Building global long-lived communities of practice:

• Broad, active community engagement
– Involvement of library and science educators engaging new generations of students in best practices
– Existing outreach and education programs

• Transparent, participatory governance

• Adoption/creation of innovative and sustainable business and organizational models

Page 31:

Engagement Working Groups
• Sociocultural barriers to data sharing and preservation
• Long-term sustainability and governance
• Community engagement and education
• Citizen science and public outreach
• Usability and assessment

Infrastructure and Research Working Groups
• Data integration and semantics
• Data preservation, metadata, and interoperability
• Distributed storage
• Federated security
• Scientific workflows
• Exploration, Visualization, Analysis
• Usability and assessment

NSF

DataNet Partners

External Advisory Committee

DIUG

Leadership Team

Principal Investigator

Executive Director

DataONE Office

Director Development & Operations

Director Community Engagement & Outreach

Core CI Team

Education and Outreach Team

R&D

Operations

Coordinating Nodes

Member Nodes

Page 32:

Why is this relevant to Ecoinformatics?

• Share similar cyberinfrastructure needs
– Architecture
– Portals
– Distributed approaches
– Replication
– Secure, controlled access
– Authentication methods
– Tools deployed and supported
– Data discovery & interoperability methods
– Standards developed, deployed

• Life Cycle Data Management tools (e.g., Investigator toolkit, CI)
• R&D activities in the areas of CS, IS, SS, GIS, Env., etc.
• Opportunity for broad Governmental & International Participation (e.g., working groups, tool evaluations, etc.)
• Complementary to several of our group's goals, projects, activities
• Potential Microsoft-related projects (e.g., MS Excel)

Page 33:

Thanks!

Leadership Team:
Bill Michener – UNM, PI
Suzie Allard – UT
John Cobb – ORNL
Bob Cook – ORNL
Patricia Cruse – CDL
Mike Frame – USGS
Stephanie Hampton – UCSB
Viv Hutchison – USGS
Matt Jones – UCSB
Steve Kelling – Cornell
Kathleen Smith – Duke
Carol Tenopir – UT
Dave Vieglais – KU, DataONE
Bruce Wilson – Joint ORNL – UT