datanet federation consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-dfc... · doe/nnsa na-122...

24
University of North Carolina at Chapel Hill University of California, San Diego University of South Carolina Arizona State University University of Arizona Drexel University Duke University RENCI Years 1-2: 4 Universities 3 Consortia 2 Companies Years 1-2: 6 FTEs 8 Staff 5 Faculty 3 Students 1 DataNet Federation Consortium National Science Foundation Cooperative Agreement: OCI-0940841 Please credit the DataNet Federation Consortium when referencing this information.

Upload: lyngoc

Post on 19-Jul-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

University of North Carolina at Chapel Hill

University of California, San Diego

University of South Carolina

Arizona State University

University of Arizona

Drexel University

Duke University

RENCI Years 1-2:

4 Universities

3 Consortia

2 Companies

Years 1-2:

6 FTEs

8 Staff

5 Faculty

3 Students

1

DataNet Federation Consortium

National Science Foundation Cooperative Agreement: OCI-0940841

Please credit the DataNet Federation Consortium when referencing this

information.

Page 2: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

DFC Vision - Data Driven Science

• Enable reproducible science through collaborative research on shared workflows and data collections

– Researcher management of workflows and data

– Policy-based management of entire scientific data life cycle from data analysis pipelines to long-term sustainability of reference collections

• Implement NSF national scale data cyberinfrastructure

– Federation of exemplar data management technologies from national research initiatives

– Provision of interoperability mechanisms

– Proven technology implemented in extant data grids

• Integrate “live” research data collections into education initiatives

– Student digital libraries accessing national data sets

Project

Shared Collection

Processing Pipeline

Digital Library

Reference Collection

Federation

Community-based Collection Life Cycle

4/5/2012 2

Page 3: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

DFC Vision

• Collaboration environment enables research:

– Registration of remote data sets into local workspace

– Registration of analysis workflows

– Tracking of provenance of workflows

Input data sets and input parameters

Correlation of workflows with research data products

• Reproducible science

– Sharing of workflows and data

– Re-execution of workflows by collaborators

4/5/2012 3

Page 4: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

The Challenge:

To Enable Data-Driven Science Deliver the capability to manage, mine, and analyze data in near real time and build sustainable collections.

Experiments Archives

Sensors

Literature

Simulation

4/5/2012 4

The Future: an Explosion of Data

Page 5: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Use Case: National Water Model

Terrain in the Neuse River Basin, NC constructed from 390 million LiDAR measurements

Flooding in the Mississippi River Basin, August 1993 observed from satellite imagery Hydrologic scientist have expressed a “grand

research challenge” of building a National Water Model for flood and drought applications.

Achieving this goal will require a system like DFC to handle the massive data requirements.

Source: nasa.gov

Source: terrain.cs.duke.edu

4/5/2012 5

Page 6: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Data and Model Integration Needed to Support Hydrologic Science

Observations

Hydrologic Models

Weather and Climate Models

Physical Data

Socioeconomic Data

CUAHSI HIS

DFC

6 4/5/2012

Page 7: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Adaptive Behavior

Cooperative Behavior

Acoustic sensing Uncertainty

Uncertain Communication

Self-navigating Network

Uncertain, Unknown Environment

No maps

Net-centric, Distributed Autonomous Sensing Systems

Platform-centric Sensing Systems

AOSN

Ocean Sensing Systems Paradigm Shift

Federation of real-time sensor data

7 4/5/2012

Page 8: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

4/5/2012

• Use data grid technology within OOI – Manage real-time sensor data stored as files

• Collaborate on installation of data grids at NOAA – Ingest and access to NOAA data records

• Federate with NOAA data archives – Automate deposition into NOAA archive

– Generate climate data records

8

Use Case: Ocean Observatories

Page 9: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

State of Engineering Design Education

• Improving Engineering Design: Designing for Competitive Advantage (1991)

• The Engineer of 2020: Visions of Engineering in the New Century (2004)

• Educating the Engineer of 2020: Adapting Engineering Education to the New Century (2005)

• Rising Above the Gathering Storm: Energizing and Employing America for a Brighter Economic Future (2007)

• Is America Falling Off the Flat Earth? (2007)

9 4/5/2012

Page 10: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

DFC Supports NAE “Grand Challenges” Efforts

4/5/2012 10

• CUASHI • Drexel “Engineering

Cities” Initiative • TDLC • Drexel’s ACIN

Initiative • iRODS and DFC tools

are the tools for data-driven scientific discovery!

Page 11: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Community-based Collection Life Cycle

Project Collection

Private

Local

Policies

Data Grid

Shared

Distribution

Access Policies

Digital Library

Published

Description

Format Metadata

Policies

Data

Processing Pipeline

Analyzed

Analysis Service Policies

Reference Collection

Preserved

Representation

Policy

Federation

Sustained

Re-purposing Policy

• The stages correspond to addition of new policies for a broader community. • We virtualize the stages of the collection life cycle through policy evolution.

Each collection life cycle stage re-purposes the original collection

11 4/5/2012

Page 12: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Scientific Goals Challenges

• Data-driven science (capturing metadata)

• National data cyberinfrastructure deployment (massive collections)

• Enable collaborative research (controlled sharing)

• Improve research productivity (workflow registration)

New Opportunities and Capabilities

• Time-dependent data support (new data modalities)

• Federation of repositories (integration across systems)

• Reference collections for educational use (active collections)

• Virtualization of stages of data life cycle (sustainability)

Opportunity and Capabilities

4/5/2012 12

Goals and Challenges

Page 13: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Highly extensible, scalable, modular policy-based framework

• Policy-based data management

• Interoperability across clients and storage resources

• Automation of administrative tasks

• Validation of assessment criteria

• Workflow management

• Federation of repositories

Sustainability of infrastructure, collections, policies

• Repurposing of collections

• Federation of repurposed collections

• International development collaborations (EUDAT)

• Enterprise-iRODS version (RENCI)

4/5/2012 13

Addressing the Challenges

Page 14: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Communities of Practice Policies for automating administrative tasks

Standards Groups

International Projects

Sustainability And

Institutions

Facilities And

Operations

Technology And

Research

Education And

Outreach

Policies for federation

and re-purposing

Policies for IPR and access

Policies for domains

Science & Engineering

Domains

Policies and

Standards

External Advisory

Board 1

2

3

4

5

6

Policies for international collaboration

Policies for metadata,

formats, APIs

NSF data management

policy examples

4/5/2012 14

Page 15: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

GeoScience - Hydrology Jon Goodall – USC

Ocean Observatories Initiative (OOI) John Orcutt – UCSD

Cyberinfrastructure-Based Engineering Repositories for Undergraduates (CIBER-U)

William Regli – Drexel

Temporal Dynamics of Learning Center (TDLC)

Andrea Chiba – UCSD

iPlant Collaborative Sudha Ram – University of Arizona

Odum Institute for Research in Social Science Tom Carsey, Jon Crabtree – UNC-CH

Years 1-2

Years 3-5

4/5/2012 15

Science Domain Partners

Page 16: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Extended Collaboration Communities

4/5/2012 16

Open Source iRODS

DDN

iPLANT

DFC Technology

COP

IN2P3

SDCI EUDAT

Distributed Bio

DFC Science Collaborators OOI, CIBER-U,

Hydrology

Other Open Source

Contributors

RENCI

Product Realization Integrated Digital Enterprise (PRIDE),

DoE/NNSA NA-122

EarthCube Testbed

Page 17: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Vice Chancellor of Research, UNC-CH Barbara Entwisle

DFC Organizational Structure

External Advisory Board

Institutions and Sustainability

Richard Marciano

Outreach & Education Marilyn Lombardi Julian Lombardi

Project Manager Mary Whitton

Management Team

Science and Engineering William Regli

OOI ---------------- John Orcutt CIBER-U -------- William Regli Hydrology ------ Ken Galluppi

PI, Reagan Moore, and

Executive Committee

Technology and Research

Arcot Rajasekar

Policies and Standards

Helen Tibbo Cal Lee

Facilities & Operations

Stan Ahalt

Funded Years 1-2

Funded Years 3-5

4/5/2012 17

Page 18: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

4/5/2012 18

Executive Committee Members are: The PI The Project Manager Community of Practice and

Science Domain representatives (rotating membership)

• Manages overall vision • Manages external relationships • Meets quarterly

Management Team Members are: The PI The Project Manager Community of Practice

and Science Domain representatives

• Manages internal activities and implementation • Meets weekly

Project Manager Manages o finances o personnel o risk tracking

and mitigation o project

information systems

Liaises with university & sponsoring agencies

Performs other duties as assigned

Principal Investigator Reagan Moore

Strategic (vision) Tactical (operations)

Project Manager Mary Whitton

Transition to Production Infrastructure

5/3/2012

Page 19: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

QoLT 3-plane chart

Barriers/Risks

Enabling Technologies Technology Base

Barriers/Risks

Research Policy-based Data Management, Sustainability

Systems/Applications Technology Integration

Requirements OOI

Hydrology

CYBER-U

iPC

TDLC

Odum

Barriers/Risks

Technology Elements

Fundamental Insights

Technology Elements

Products & Outcomes

Strategic Framework for Organizing Work Plan

Syst

em R

equ

irem

ents

Repository Federation

Operational Evaluation

National Scale Cyber-

Infrastructure

Digital Library /

Workflow Systems

(Fedora, Taverna)

Distributed Trust Mechanisms

Semantically Self-Describing Data & Sensors

Rule Generation Consistency of Rule Bases

Data Life Cycle Policy Sets Verification of Rule Correctness

Expressive Policy Language Evaluating Collection Worth

Reference Collections

Real-time Data

Policy-based

Data Management

(iRODS)

User Interfaces &

Interoperability

(Cobalt)

Automation

Validation Mechanisms

Monitoring & Control

Policy Consensus

Autonomic Applications

Test & Measurement

Heterogeneous Systems

Complex Data Types

Federation across Multiple Protocols

Community Specific User Interfaces

Scale & Diversity &

Heterogeneity

Access to pub/private data & sensors

Embedding new Tech w/ Operators

How to measure live systems?

4/5/2012 19

Page 20: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Evolution of Domain Policies

Policies and Standards Science and Engineering

Facilities and Operations

Data Cyber-infrastructure Technology and Research

Analyze

Prioritize requests

Iterate with rapid prototypes

Science and Engineering

Identify / Create New Policies

Implement and test

Install and administer

Key Management Elements

4/5/2012 20

Page 21: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Policy-Based Roadmap

• Purpose - reason a collection is assembled

• Properties - attributes needed to ensure the purpose

• Policies - enforce and maintain properties

• Procedures - functions that implement the policies

• State information - results of applying procedures

• Assessment criteria - validate that state information conforms to the desired purpose

• Federation - controlled sharing of logical name spaces to enable repurposing of community collections for long-term sustainability

4/5/2012 21

Page 22: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Year 2 Year 1 Year 3 Year 4 Year 5

DFC Deliverables

Build Consensus

for Standards

Validate Social

Networking Models

Develop

Sustainability Models

Research:

Policy Creation

Automate Validation of Assessment

Criteria

Implement Rules to Validate

Assertions about

Collections

Research: Collection

Value

Develop Services for Structured

Data

Develop Social

Network Services for Data Use in Education

Research:

Policy Validation

Develop Reference

Implementa-tions

Develop Interopera-

bility Mechanisms

Research: Education

Policies

Implement a National Data Management Infrastructure

Requirements Analysis and Middleware Installation

Within first 3 months:

Develop Program

Execution Plan

4/5/2012 22

Page 23: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Social consensus for the sharing of data, policies, methods, practice

• Each community controls their own collection policies

– Policies can be dynamically changed, versioned, and audited

• Explicit computer actionable rules control the type of federation interactions

– Peer-to-peer, central archive, master-slave data distribution, chained data grids, deep archives

Interoperability mechanisms to support technology integration

• Community specific clients, policies and procedures

• Cross registration of data

• Structured information resource drivers

4/5/2012 23

Our Approach - Federation

Page 24: DataNet Federation Consortiumdatafed.org/dev/wp-content/uploads/2012/04/1-DFC... · DoE/NNSA NA-122 EarthCube Testbed . Vice Chancellor of ... QoLT 3-plane chart Barriers/Risks

Questions?

DataNet Federation Consortium