DataNet Federation Consortium · datafed.org/dev/wp-content/uploads/2012/04/3-dfc... · 4/3/2012


Engagement and Prototype

User Requirements, User Groups, Prototype

Helen Tibbo, Reagan Moore, Arcot Rajasekar UNC-CH

DataNet Federation Consortium

National Science Foundation Cooperative Agreement: OCI-0940841

Please credit the DataNet Federation Consortium when referencing this information.


Topics

• User requirements
  – User surveys
  – Assessment: use cases
  – Policies and standards

• User Groups
  – DFC funded community interactions
  – Extended community

• Prototype
  – Architecture
  – Deliverables
  – Federation


User Requirements

Lesson learned: there are three levels of requirements:

1. Infrastructure interoperability
   – Survey of science and engineering technology
   – Track technology evolution

2. Domain management
   – Governance policies (within a project)
   – Federation policies (with other projects)

3. Researcher features
   – Specific capabilities to improve productivity
   – User interfaces


Methods

• Survey (Rajasekar)

– Identify technology interoperability requirements

• Interviews (Tibbo)

– Identify consortium governance, workflow, and provenance requirements

– Identify researcher needs


Survey: Hydrology Use Case – Automated Analyses in Hydrology

• Integrated Water Model – Automation of VIC Workflows
  – Capture of Provenance & Process Information
    o Identify Lineage & Acknowledgements
    o Provision for Unique Data Identification Signature
    o Impose Restrictions & Apply Transformations
    o Capture & Propagate Caveats & Error Corrections
    o Provisions for Failure Recovery, Debugging & Explanations
    o Re-execute Models for Reproducible Science
  – Extension to RHESSys Workflow; Integration with GRASS Methods
  – Identify and Apply SLAs

• Heterogeneous Data Access
  – CUAHSI HIS Data
  – Climate Data (NCDC)
  – Satellite Data (NASA)
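The provenance and reproducibility requirements above can be illustrated with a short sketch. It assumes that a SHA-256 digest of the content serves as the unique data identification signature and that each workflow step is logged as a provenance record; `run_step`, `reproduce`, the toy steps, and the record fields are hypothetical names, not part of the VIC workflow or the DFC software.

```python
import hashlib
from typing import Callable, Dict, List

Step = Callable[[bytes], bytes]


def signature(data: bytes) -> str:
    # Assumption: a SHA-256 content digest acts as the unique data identification signature.
    return hashlib.sha256(data).hexdigest()


def run_step(name: str, func: Step, data: bytes, params: dict, log: List[dict]) -> bytes:
    """Run one workflow step and append a provenance record (hypothetical schema)."""
    output = func(data)
    log.append({
        "step": name,
        "params": params,                 # settings kept for lineage and explanation
        "input_sig": signature(data),
        "output_sig": signature(output),
    })
    return output


def reproduce(log: List[dict], steps: Dict[str, Step], data: bytes) -> bool:
    """Re-execute the logged steps from the original input and compare signatures."""
    for record in log:
        if signature(data) != record["input_sig"]:
            return False
        data = steps[record["step"]](data)
        if signature(data) != record["output_sig"]:
            return False
    return True


# Toy stand-ins for model steps; real steps would invoke VIC or RHESSys.
steps: Dict[str, Step] = {
    "normalize": lambda b: b.strip().lower(),
    "annotate": lambda b: b + b";source=ncdc",
}

raw = b"  PRECIP,12.5  "
log: List[dict] = []
data = raw
for name in ("normalize", "annotate"):
    data = run_step(name, steps[name], data, {"version": 1}, log)

print(data, reproduce(log, steps, raw))
```

Re-running from the original input reproduces every logged signature, while any change to the input or to a step makes `reproduce` return `False`.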


Survey: Engineering Use Case – Information-based Engineering

• Integrated Infrastructure for Engineering Models – From Silos (Projects/Teams) to Clouds
  – Format Registry for Design Models
  – Format Verification Services
  – Model Conversion Services
  – Model Metadata Extraction & Discovery Services
  – Distributed Model Data Access: OOI Sensor Data, CIBER-U CAD Data, iConnect Civil Infrastructure (Bridges) System

• Take Design Models from NSF Projects to Education – Integrate DFC Platform with CIBER-U
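A format registry with verification services can be sketched as a table of file signatures checked against a file's declared extension. The registry layout, keys, and function names here are hypothetical; the magic bytes are the published signatures for STEP Part 21, HDF5, and PDF files, and this sketch only checks them at offset 0.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FormatEntry:
    name: str
    extensions: Tuple[str, ...]
    magic: bytes  # leading bytes that identify the format


# Hypothetical registry contents. HDF5 files with a user block place the
# signature at a later offset; this sketch only checks offset 0.
REGISTRY: Dict[str, FormatEntry] = {
    "step": FormatEntry("STEP (ISO 10303-21)", (".stp", ".step"), b"ISO-10303-21;"),
    "hdf5": FormatEntry("HDF5", (".h5", ".hdf5"), b"\x89HDF\r\n\x1a\n"),
    "pdf": FormatEntry("PDF", (".pdf",), b"%PDF-"),
}


def identify(data: bytes) -> Optional[str]:
    """Return the registry key whose signature matches the content, if any."""
    for key, entry in REGISTRY.items():
        if data.startswith(entry.magic):
            return key
    return None


def verify(filename: str, data: bytes) -> bool:
    """Format verification: the file extension must agree with the detected content format."""
    detected = identify(data)
    if detected is None:
        return False
    return filename.lower().endswith(REGISTRY[detected].extensions)


print(verify("bracket.STEP", b"ISO-10303-21;\nHEADER;"))
print(verify("bracket.pdf", b"ISO-10303-21;\nHEADER;"))
```

Keeping signatures in a registry rather than in code lets new design-model formats be added as data, which is the same separation a registry service would provide.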


Survey: Marine Use Case

• Long-term Access to Marine Data Streams
  – Replicate & Archive Data Streams at NCDC
  – Capture & Propagate Data Provenance, Errors & Corrections
  – Capture Metadata & Enable Long-term Discovery
  – Provision for Replay of Archived Data Streams
  – Services for Runtime Stream Format Conversions
  – Impose Restrictions & Apply Transformations

• Federate with Hydrology and Climate Data
  – Applications of Hydrology Models with Marine Data
  – Enable Integration of Ocean and Hydrology Modeling
  – Enable Integration of Ocean and Climate Modeling

• Expose Marine Engineering Design to Educational Reuse
  – Provision Access to Sensor/Platform Design Data
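Archiving, replay, and runtime format conversion of a data stream can be sketched in a few functions. This assumes CSV as the archival snapshot format and JSON Lines as the on-demand output format; the `Sample` type, the sensor name, and the readings are hypothetical and do not describe the OOI or NCDC formats.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Sample:
    t: float        # seconds since stream start (assumed time base)
    sensor: str
    value: float


def snapshot(samples: Iterable[Sample]) -> str:
    """Freeze a window of the stream as a time-ordered CSV snapshot (assumed archive format)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["t", "sensor", "value"])
    for s in sorted(samples, key=lambda s: s.t):
        writer.writerow([s.t, s.sensor, s.value])
    return buf.getvalue()


def replay(archived: str, start: float, end: float) -> Iterator[Sample]:
    """Replay archived samples, in their archived (time) order, within [start, end)."""
    for row in csv.DictReader(io.StringIO(archived)):
        t = float(row["t"])
        if start <= t < end:
            yield Sample(t, row["sensor"], float(row["value"]))


def to_json_lines(samples: Iterable[Sample]) -> str:
    """Runtime format conversion: emit replayed samples as JSON Lines."""
    return "\n".join(
        json.dumps({"t": s.t, "sensor": s.sensor, "value": s.value}) for s in samples
    )


stream = [Sample(2.0, "ctd-temp", 11.4), Sample(0.0, "ctd-temp", 11.2),
          Sample(1.0, "ctd-temp", 11.3)]   # hypothetical sensor readings
archived = snapshot(stream)
window = list(replay(archived, 0.0, 2.0))
print(to_json_lines(window))
```

Because the archive stays in one format and conversion happens at replay time, new output formats can be added without rewriting the archived snapshots.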


GeoScience Domain Requirements: NSF EarthCube Interoperability Testbed

Architecture layer: technologies

• Collaboration environment: UNC-CH integrated Rule-Oriented Data System (iRODS)
• Models: CU Boulder Community Surface Dynamics Modeling System, UNC RHESSys
• Data grids: GMU geospatial data grid, iRODS, DataONE OneDrive
• Workflows: iRODS, NCSA Cyberintegrator, UCSD Kepler, GMU BPELPower
• Policies: iRODS, NCSA Cyberintegrator
• Web services: OGC Sensor Web Enablement standard (SWE), WHOI observation assessment (SWE), NCSA Semantic Geostreaming toolkit (SWE, W3C), GMU Geospatial client, Colorado State University NextGen Network Enabled Weather
• Web analysis services: GMU GeOnAS, DataONE Oner
• Web visualization services: GRASS, NOAA Environmental Information Service
• Network and security protocols: iRODS (Grid Security Infrastructure, Kerberos, Shibboleth, Reliable Blast UDP, parallel TCP/IP)
• Repositories: NOAA CLASS, NASA Echo, GEOSS ClearingHouse
• Catalog: GMU GI-Cat, CUAHSI
• Federation: iRODS, CLASS


Policies and Standards: Central to the DFC

Figure: Policies and Standards at the center of the DFC, connected to Science & Engineering Domains; Sustainability and Institutions; Facilities and Operations; Technology and Research; and Education and Outreach, with input from Standards Groups, International Projects, and the Advisory Committee.

Policy areas shown:
• Policies for automating data management
• Policies for publication & federation
• Policies for IPR & citations
• Policies for provenance & sustainability
• Policies for collaboration and reuse
• Policies for technology migration
• Policies for metadata extraction
• Policies for analysis and workflow
• Policies for change management
• Domain-centric policies
• Policies for authentication & authorization
• Policies for archiving, staging & caching
• Policies for replication & synchronization
• Policies for retention & disposition
• Policies for deletion & redaction
• Policies for trust
• Policies for curation & preservation


Policies and Standards COP Methodology

• Requirements inputs: Standards Community; Domain Scientists & Engineers; Advisory Committees; Peer Initiatives
• Process: Ingest, Management & Storage; Educational Reuse; Requirements Transformed to Policies for Testing, Evaluation & Iteration
• Expertise and interactions: Graduate Digital Curation Program; Professional Institutes; Int’l. DigCCurr Conferences
• System resources & experience: ISO WG for Repository Audit & Certification; Partnerships w/ NARA, JISC, DCC, Glasgow; SAA Leadership US, EU; Grant Reviews
• Outcomes and deliverables: Reuse by Scientists & Others; Educators; Novel, Integrated Analysis; Sustainable Repositories


Multiple Methodologies to Elicit Work Practices and Curation Needs

• Review of the literature

• Collaborate with other DataNet groups and international projects

• Interviews, surveys, and content analysis of documentation to produce Curation Profiles, extending the work of Purdue & UIUC (collaborators on CDCG) and the DCC SCARP project

• Use cases and data requirements observed, solicited, and transformed into curation requirements across the lifecycle.

• Start with targeted communities and limited functions; iterate out to other communities and across lifecycle functions over the life of the project.

• Test policy efficacy in communities; iterate.

5/3/2012


User Groups


DFC Funded Collaborations

• Ocean Observatories Initiative (OOI): John Orcutt – UCSD

• Hydrology-GeoScience: Jon Goodall – South Carolina

• Cyber-Infrastructure-Based Engineering Repositories for Undergraduates (CIBER-U): William Regli – Drexel


Broader Impact: Federal Agencies

• NASA Center for Climate Simulation – Developed a Virtual Climate Data Server based on the iRODS data grid

• National Climatic Data Center – Installed two iRODS data grids to manage access to climate data records

• National Nuclear Security Administration – Assisting the Product Realization Integrated Digital Enterprise (PRIDE) Program (NA 122) in representation, ingest and curation of engineering records on ‘at risk’ media and digital CAD artifacts; contributing to requirements modernization initiatives in model-based enterprise


Broader Impact: Collaborative International Development

• EUDAT – Memorandum of Understanding on interoperable systems

• ARCS (Australian Research Collaboration Service) – WebDAV user interface

• French National Institute of Nuclear and Particle Physics (IN2P3) – monitoring system

• Academia Sinica – Multi-lingual support, Storage Resource Manager

• Sustainable Heritage Access through Multivalent Archiving – Cheshire3 text processing

• Wellcome Trust Sanger Institute – Genomics data grid

• CoopeUs – Ocean Sciences data sharing including EU programs and OOI


Broader Impact: Vendor Relationships

• DataDirect Networks – SFA10KE storage controller integrates iRODS policy-based data management within the storage controller; demonstrated at SC’11
  – Enables policy-controlled storage-based processing

• Distributed Bio – Security and high-performance extensions

• RENCI – Enterprise version of iRODS (E-iRODS) for the DFC production system


Prototype

Architecture, Deliverables, Federation


Zen of DFC Architecture

Architecture design: a highly extensible, scalable, modular virtualization environment
• Based on three basic goals:
  o Organize distributed data into a shareable collection
  o Virtualize the collection instead of the storage systems
  o Make it easy to customize at all levels

Our model:

• Take a peer-to-peer client-server architecture
  – Enables distributed cloud management

• Add a virtualization framework to manage and abstract namespaces
  – Provides logical independence from physical attributes
  – Enables abstraction for Authentication, Authorization and Identification (AAI)

• Integrate metadata support
  – Eases publishing and discovery of data and services

• Interleave with policies
  – Empowers service-level customizability

• Expose a published protocol
  – At both the front end and the back end
  – Eases multi-language client interfacing and adding new services
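The idea of virtualizing the collection rather than the storage can be sketched as a catalog that maps logical paths to physical replicas and authorizes access against the logical name. This is a toy model, not the iRODS iCAT schema; the zone name, resource names, and paths are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Replica:
    resource: str        # logical storage resource name
    physical_path: str   # path on the underlying storage system


@dataclass
class DataObject:
    replicas: List[Replica]
    acl: Dict[str, str] = field(default_factory=dict)  # user -> access level


class LogicalNamespace:
    """Toy catalog that maps logical paths to physical replicas (not the iRODS iCAT schema)."""

    def __init__(self) -> None:
        self.objects: Dict[str, DataObject] = {}
        self.offline: Set[str] = set()   # resources that are currently unreachable

    def register(self, logical_path: str, owner: str, replicas: List[Replica]) -> None:
        self.objects[logical_path] = DataObject(list(replicas), {owner: "own"})

    def grant(self, logical_path: str, user: str, level: str = "read") -> None:
        self.objects[logical_path].acl[user] = level

    def resolve(self, logical_path: str, user: str) -> Replica:
        """Authorize against the logical path, then pick the first reachable replica."""
        obj = self.objects[logical_path]
        if user not in obj.acl:
            raise PermissionError(f"{user} has no access to {logical_path}")
        for replica in obj.replicas:
            if replica.resource not in self.offline:
                return replica
        raise LookupError(f"no reachable replica for {logical_path}")


PATH = "/dfcZone/home/alice/runoff.nc"   # hypothetical zone and file names
ns = LogicalNamespace()
ns.register(PATH, "alice", [Replica("ucsdResc", "/vault1/a/runoff.nc"),
                            Replica("renciResc", "/archive/b/runoff.nc")])
ns.grant(PATH, "bob")
ns.offline.add("ucsdResc")   # storage outage: the logical name and its permissions are unaffected
print(ns.resolve(PATH, "bob").physical_path)
```

Users and permissions attach to the logical path, so replacing or losing a storage system changes only the replica list, not how the collection is named or shared.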


DFC Platform Architecture

• Build the DFC Platform on proven technology: iRODS
  – Stable releases, in production in multiple projects
  – Scalable, extensible and modular
  – Community-oriented and open source
  – Integrates data and metadata from different resources
  – Built-in federation and server-side computation facilities
  – Established software practices: SVN, Bugzilla, Gforge, irod-chat, Wiki, Doxygen, continuous testing, installation scripts, RPM, etc.

• DFC user requirements are translated into:
  – Integration of new data resources
  – Wrapping of new functions and procedures as micro-services
  – Creation of new rules/workflows to perform transformations or analysis
  – Implementation/integration of new client interfaces
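The translation of requirements into micro-services and rules can be sketched as named functions chained by a rule bound to a policy enforcement point. The event name `acPostProcForPut` mirrors an iRODS policy enforcement point, but the Python registry, the `rule` and `fire` helpers, and the context dictionary are hypothetical; this is not the iRODS rule language or API.

```python
import hashlib
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

MicroService = Callable[[dict], None]

MICROSERVICES: Dict[str, MicroService] = {}
RULES: DefaultDict[str, List[str]] = defaultdict(list)   # enforcement point -> micro-service chain


def microservice(func: MicroService) -> MicroService:
    """Register a function under its own name as a micro-service (hypothetical registry)."""
    MICROSERVICES[func.__name__] = func
    return func


@microservice
def compute_checksum(ctx: dict) -> None:
    ctx["checksum"] = hashlib.sha256(ctx["data"]).hexdigest()


@microservice
def extract_metadata(ctx: dict) -> None:
    ctx["metadata"] = {"size": len(ctx["data"])}


@microservice
def replicate(ctx: dict) -> None:
    # Records the replication decision only; no data is copied in this sketch.
    ctx.setdefault("replicas", []).append(ctx["policy"]["replica_resource"])


def rule(pep: str, *chain: str) -> None:
    """Bind an ordered chain of micro-services to a policy enforcement point."""
    RULES[pep].extend(chain)


def fire(pep: str, ctx: dict) -> dict:
    """Run every micro-service bound to the enforcement point, in order."""
    for name in RULES[pep]:
        MICROSERVICES[name](ctx)
    return ctx


# A domain requirement ("checksum, describe and replicate every new file") expressed as a rule.
rule("acPostProcForPut", "compute_checksum", "extract_metadata", "replicate")
ctx = fire("acPostProcForPut",
           {"data": b"model output", "policy": {"replica_resource": "renciResc"}})
print(ctx["checksum"][:12], ctx["metadata"], ctx["replicas"])
```

Keeping functions (micro-services) separate from their ordering (rules) means a new community requirement can often be met by writing a new rule over existing micro-services rather than new code.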


iRODS Architecture

Figure: iRODS architecture. Clients (Windows browser; iCommands command line; WebDAV on iPod; iRODS Rich Web Client; HDF Viewer for iRODS, for visualization of HDF5 files) communicate through the iRODS protocol with the servers (iCAT, iRES, iXMS, iSEC, with a metadata database, a schedule & compute queue, and a message queue), which manage storage & compute resources (file systems, archives, databases, sensor systems, clusters, …).


Foundational Ideas

• Policy-based data management – Essential for supporting scalable data-driven science

• Sustainability – Underpins the stages of the data life cycle through repurposing of collections

• Extensibility – Essential for incorporating new technologies and new research domains

• Federation – Mechanism for building collaboration environments and implementing long-term sustainability

• Enabling Reproducible Science – Supports researchers by managing data, workflows, and the collaboration environment, and by enabling the sharing of data and workflows


DFC Prototype Realization

• Build on iRODS Data Grid Software
  – Support for heterogeneous resource access, multiple data movement protocols, integrated handling of system and descriptive metadata, provenance management, seamless federation, full-fledged data management, rule-based policy execution, extensibility through micro-services, and orchestration of internal and external workflows

• Extend with Domain-specific Software & Resource Access
  – Hydrology: support for programs, functions, services, and access to multiple data collections for hydrology workflows
  – Marine: support for sensor data preservation (snapshots) and replay, access to marine memoizing functions, federated access to national repositories
  – Engineering: support for a format registry, format verification, model conversion, and integration with CIBER-U repositories and tools


DFC Federation Technology

• Data grid – sustainable & extensible policy-oriented data management
  – Build shared name spaces
  – Provide distributed data management functions
  – Enforce administration and usage through policies

• Federated data grids
  – Cross-register users and resources across data grids

• Soft links
  – Register data from an external data management system, accessed through its own protocol

• Workflow integration
  – Register workflows into the data grid for storage-side procedures
  – Integrate data management workflows with external workflows
  – Gather provenance as the workflow is executed
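Cross-registration of users and soft links can be sketched as two zones that share zone-qualified path names. Remote identities are written as `user#zone`, loosely mirroring the notation iRODS uses for users from another zone; the `Zone` and `Federation` classes, zone names, URL, and the `fake_fetch` stand-in for an external protocol are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Union


@dataclass
class Zone:
    name: str
    users: Set[str] = field(default_factory=set)                     # identities as "user#zone"
    objects: Dict[str, Union[bytes, str]] = field(default_factory=dict)
    soft_links: Set[str] = field(default_factory=set)                # paths whose value is an external URL


class Federation:
    """Toy federation: zone-qualified paths, cross-registered users, soft links."""

    def __init__(self, *zones: Zone) -> None:
        self.zones = {z.name: z for z in zones}

    def zone_of(self, path: str) -> Zone:
        return self.zones[path.split("/")[1]]   # "/zone/..." -> zone name

    def cross_register(self, user: str, home: str, remote: str) -> None:
        """Register user#home as a known identity in the remote zone."""
        self.zones[remote].users.add(f"{user}#{home}")

    def register_soft_link(self, path: str, url: str) -> None:
        """Catalog an external object by reference; the bytes stay in the external system."""
        zone = self.zone_of(path)
        zone.objects[path] = url
        zone.soft_links.add(path)

    def read(self, path: str, user: str, home: str,
             fetch_external: Callable[[str], bytes]) -> bytes:
        zone = self.zone_of(path)
        if f"{user}#{home}" not in zone.users:
            raise PermissionError(f"{user}#{home} is not registered in {zone.name}")
        value = zone.objects[path]
        if path in zone.soft_links:
            return fetch_external(str(value))   # access through the external system's protocol
        return value


hydro = Zone("dfcHydro", users={"alice#dfcHydro"})
hub = Zone("dfcHub")
fed = Federation(hydro, hub)
hydro.objects["/dfcHydro/home/alice/basin.csv"] = b"basin,area\n"
fed.register_soft_link("/dfcHydro/ext/precip.csv", "https://example.org/precip.csv")
fed.cross_register("bob", "dfcHub", "dfcHydro")


def fake_fetch(url: str) -> bytes:
    # Stand-in for a real protocol client (HTTP, OPeNDAP, ...).
    return f"fetched {url}".encode()


print(fed.read("/dfcHydro/ext/precip.csv", "bob", "dfcHub", fake_fetch))
```

The soft link puts the external object in the shared name space without copying it, so federation-level permissions still apply while the data stays under its original management system.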


Sequence of Technology Activities

1. Support Applications in Collaborator Communities

– Automate analyses

2. Facilitate Cross-Domain Applications

– Support workflow execution across domains

3. Establish end-to-end data life cycle management

– Support preservation of reference collections


DFC Federation

Figure: DFC federation topologies – Federation Hub; DFC snowflake federation; ad hoc inter-domain federation; federation to external grids.


DFC Federation Grid Status (Phase-1)

Grids: DFC-HUB, DFC-ENGG, DFC-HYDRO, DFC-MARINE

• DFC-Marine: Administration: UCSD; Metadata: UCSD; Data Resc: UCSD; Replica Resc: RENCI; Ingestion Resc: Oregon, Rutgers; Workflow Resc: ALL; Rule Engine: UCSD; Message Hub: UCSD

• DFC-Hydrology: Administration: RENCI; Metadata: RENCI; Data Resc: USC, NCDC; Replica Resc: RENCI; Workflow Resc: ALL; Rule Engine: RENCI; Message Hub: RENCI

• DFC-Engineering: Administration: Drexel; Metadata: Drexel; Data Resc: Drexel; Replica Resc: RENCI; Workflow Resc: ALL; Rule Engine: RENCI; Message Hub: Drexel

• DFC-Federation Hub: Administration: RENCI; Metadata: RENCI; Data Resc: RENCI; Replica Resc: ITS-UNC; Workflow Resc: ALL; Rule Engine: RENCI; Message Hub: RENCI
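The Phase-1 grid layout can be written down as a configuration structure and checked mechanically, for example to confirm that each grid's replica resource sits at a different site from its primary data resources, or to see which roles a given site hosts. The dictionary layout and field names are an assumed encoding, not the DFC's actual configuration format, and the off-site check is only an example property, not a stated DFC requirement.

```python
from typing import Dict, List

# Phase-1 grid layout from the table; field names are an assumed encoding.
# Workflow resources are listed as "ALL" for every grid, so they are omitted.
GRIDS: Dict[str, Dict[str, List[str]]] = {
    "DFC-Marine": {
        "administration": ["UCSD"], "metadata": ["UCSD"], "data": ["UCSD"],
        "replica": ["RENCI"], "ingestion": ["Oregon", "Rutgers"],
        "rule_engine": ["UCSD"], "message_hub": ["UCSD"],
    },
    "DFC-Hydrology": {
        "administration": ["RENCI"], "metadata": ["RENCI"], "data": ["USC", "NCDC"],
        "replica": ["RENCI"], "ingestion": [],
        "rule_engine": ["RENCI"], "message_hub": ["RENCI"],
    },
    "DFC-Engineering": {
        "administration": ["Drexel"], "metadata": ["Drexel"], "data": ["Drexel"],
        "replica": ["RENCI"], "ingestion": [],
        "rule_engine": ["RENCI"], "message_hub": ["Drexel"],
    },
    "DFC-Federation Hub": {
        "administration": ["RENCI"], "metadata": ["RENCI"], "data": ["RENCI"],
        "replica": ["ITS-UNC"], "ingestion": [],
        "rule_engine": ["RENCI"], "message_hub": ["RENCI"],
    },
}


def replicas_off_site(grid: Dict[str, List[str]]) -> bool:
    """True when no replica resource shares a site with a primary data resource."""
    return not set(grid["replica"]) & set(grid["data"])


def roles_by_site(grids: Dict[str, Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Invert the table: which grid roles does each site host?"""
    result: Dict[str, List[str]] = {}
    for grid_name, roles in grids.items():
        for role, sites in roles.items():
            for site in sites:
                result.setdefault(site, []).append(f"{grid_name}:{role}")
    return result


for name, grid in GRIDS.items():
    print(name, "replicas off-site:", replicas_off_site(grid))
print(len(roles_by_site(GRIDS)["RENCI"]), "roles hosted at RENCI")
```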


Federation of Federations

Figure: Federation of Federations, linking the DFC Federation (DFC-HUB, DFC-ENGG, DFC-HYDRO, DFC-MARINE) with the RENCI Federation and other grids. Grids shown: RENCI-VO, NARA-RENCI, TDLC, ASGC, TIP-DUKE, TACC, TeraGrid, CoopeUs, DFC-Learning, DFC-Sociology, DFC-Biology, NCDC, EU-DAT.


Iterative Software Development

• Identify Requirement
  – Working closely with S&E partner
  – Small, semantically well-defined functionality

• Design with iRODS Framework
  – With feedback from S&E partner
  – High-level design of resource drivers, micro-services, rules or client integration

• Construct and Perform Unit & Integration Testing
  – Technology team with some liaison with S&E partner
  – Using iRODS coding, testing & documentation practices

• Apply & Tune
  – Working closely with S&E partner

• Transition into Production Release


Technology Success Metrics (Apr 2013)

• Support Applications in S&E Partner Communities
  – Hydrology: Federate access to data from CUAHSI, NCDC, NASA and other resources; automate the VIC workflow and show reproducibility of results
  – Engineering: Integrate format registry, model conversion, format verification & metadata extraction services; CIBER-U access to DFC data
  – Marine: Facilitate sensor data preservation (snapshots) into DFC (possibly at NCDC); wrap OOI memoizing functions to provide real-time access to marine sensor data and services


DFC Sustainability Metrics

• Create community resources for science & engineering
  – Facility for reference collections
  – Facility for preserving “at risk” and “one of a kind” data
  – Facility for “provenance-supported” data
  – Facility to discover, access & apply cross-domain data

• Create a research environment for collaborations
  – Enable reuse & repurposing of data collections; add more micro-services & workflows
  – Provision value-added services for industry-related capabilities: compartmentalized privacy & security policies
  – Extensible, modular, scalable, technology-agnostic & policy-oriented full data-life-cycle services for academia & research


Questions?


DFC: A Service-Oriented Architecture


RENCI-DFC Network Connectivity
