System Development & Operations: NSF DataNet Site Visit to MIT, February 8, 2010 (DataSpace)


TRANSCRIPT

Page 1:

System Development & Operations

NSF DataNet site visit to MIT, February 8, 2010

DataSpace

Page 2:

DataSpace High-Level Architecture

[Diagram labels]

• Nodes: MIT Node, Other USA Nodes, International Nodes
• Networks: Global Network (Web), Local Network
• Repositories
– Metadata Repository for Scientific Data
– Multiple Scientific Data Repositories (DataSpace Native Architecture)
– Interface to Legacy Scientific Data Repositories
• DataSpace Services
– Distributed Data Management Services: Security, Replication, Administration, Policy Management, Workflow Services
– Data Curation Services: Process, Catalog, Annotate, Preserve
– Basic Data User Services: Discovery, Quality, Conversion, Integration
– Additional Data User Services: Data Analytics, Data Visualization
• 3rd Party Specialized Data Services

Basic Workflow

• Scientist: provides data, preliminary metadata
• Curator: processes and ingests data, complete metadata, and policies (e.g. retention)
• User: searches (meta)data, accesses/integrates data, analyzes/visualizes data (via DataSpace data services or 3rd party data services)
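The basic workflow on this slide can be pictured as a small hand-off between the three roles. The sketch below is illustrative only: the `Dataset` fields and the function names `scientist_submit`, `curator_ingest`, and `user_search` are assumptions made for this example, not part of any DataSpace interface.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Illustrative record; field names are assumptions, not a DataSpace schema."""
    dataset_id: str
    data: bytes
    metadata: dict
    policies: dict = field(default_factory=dict)
    ingested: bool = False


def scientist_submit(dataset_id, data, preliminary_metadata):
    # Scientist: provides data and preliminary metadata.
    return Dataset(dataset_id, data, dict(preliminary_metadata))


def curator_ingest(dataset, extra_metadata, policies):
    # Curator: completes the metadata, attaches policies, and ingests.
    dataset.metadata.update(extra_metadata)
    dataset.policies.update(policies)
    dataset.ingested = True
    return dataset


def user_search(catalog, term):
    # User: searches metadata; only ingested datasets are discoverable.
    term = term.lower()
    return [
        d.dataset_id
        for d in catalog
        if d.ingested
        and any(term in str(value).lower() for value in d.metadata.values())
    ]


ds = scientist_submit("d1", b"raw bytes", {"title": "Plankton counts"})
curator_ingest(ds, {"subject": "biological oceanography"}, {"retention_years": 10})
print(user_search([ds], "oceanography"))
```

In this sketch a dataset is invisible to search until the curator step has run, which mirrors the order of the three roles on the slide.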

Page 3:

PLATFORM ARCHITECTURE

Page 4:

Platform Architecture

[Diagrams: Version 0.1 and Version 1.0]


Page 6:

Federated Architecture

Page 7:

Multiple Implementations

Page 8:

Federated Model

• Data can be widely distributed; Web-based Services can be centralized or federated
– e.g. centralized, domain-specific search service that harvests metadata from relevant archives (“google for biological oceanography”)
– e.g. real-time data integration across small sets of archives identified via subject search
• DataSpace will develop some services, but more importantly will create an ecosystem that others can contribute to (e.g. technology & scientific companies, universities, researchers, labs)
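The first example on this slide, a centralized, domain-specific search service that harvests metadata from distributed archives, can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions: the `Archive`, `Record`, and `CentralSearch` classes and their methods are hypothetical names invented here, and a real node would expose its metadata over the web rather than through a Python call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One metadata record exposed by an archive (fields are illustrative)."""
    archive: str
    dataset_id: str
    title: str
    subjects: tuple


class Archive:
    """Hypothetical stand-in for a node that exposes its metadata."""

    def __init__(self, name, records):
        self.name = name
        self._records = list(records)

    def list_records(self):
        # A real node would serve this over the web; here it is in memory.
        return list(self._records)


class CentralSearch:
    """Centralized, domain-specific index built from harvested metadata."""

    def __init__(self, domain_subject):
        self.domain_subject = domain_subject
        self._index = []

    def harvest(self, archives):
        # Copy only metadata for the domain of interest; data stays put.
        for archive in archives:
            for record in archive.list_records():
                if self.domain_subject in record.subjects:
                    self._index.append(record)

    def search(self, term):
        term = term.lower()
        return [r for r in self._index if term in r.title.lower()]


mit = Archive("mit", [
    Record("mit", "d1", "Plankton counts, Gulf of Maine",
           ("biological oceanography",)),
    Record("mit", "d2", "Wind tunnel runs", ("aeronautics",)),
])
other = Archive("other-node", [
    Record("other-node", "d7", "Plankton imaging survey",
           ("biological oceanography",)),
])

search = CentralSearch("biological oceanography")
search.harvest([mit, other])
hits = search.search("plankton")
print([(r.archive, r.dataset_id) for r in hits])
# prints [('mit', 'd1'), ('other-node', 'd7')]
```

Only metadata is copied into the central index; each hit still points back to the archive that holds the data, which is the sense in which the data remains distributed while the service is centralized.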

Page 9:

Development Methodology

• Behavior-Driven Development model
• Continuous Integration Process
– iterative research prototyping and production implementation phases
• Small centralized development team to start
• Institutional partners add developers in years 1-2
• Transparent, open source process
• Close collaboration with Data Conservancy
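As an illustration of the behavior-driven style named on this slide, a specification can be written as Given/When/Then steps that run as ordinary tests in a continuous integration build. The sketch below uses Python's standard `unittest` module; the `Repository` class and its `deposit` rule (a title is required) are assumptions invented for the example, not DataSpace behavior.

```python
import unittest


class Repository:
    """Hypothetical in-memory repository used only for this illustration."""

    def __init__(self):
        self.items = {}

    def deposit(self, dataset_id, metadata):
        # Assumed rule for the example: a title is required before ingest.
        if not metadata.get("title"):
            raise ValueError("a title is required before ingest")
        self.items[dataset_id] = metadata


class DepositBehavior(unittest.TestCase):
    def test_curator_can_ingest_dataset_with_title(self):
        # Given an empty repository
        repo = Repository()
        # When a curator deposits a dataset that has a title
        repo.deposit("d1", {"title": "Plankton counts"})
        # Then the dataset is held by the repository
        self.assertIn("d1", repo.items)

    def test_deposit_without_title_is_rejected(self):
        # Given an empty repository
        repo = Repository()
        # When a deposit arrives with no title, then it is refused
        with self.assertRaises(ValueError):
            repo.deposit("d2", {})
        # And nothing is stored
        self.assertEqual(repo.items, {})


def run_specs():
    """Run the behavior specs and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DepositBehavior)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_specs()
```

Each test name states the expected behavior in plain words, so the test report doubles as a readable list of what the system is meant to do.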

Page 10:

OPERATIONS

Page 11:

Local Operations – MIT Example

• Scientists
– data production, early-stage curation
– lots of domain expertise, little or no curation expertise
• Libraries
– outreach and recruitment (e.g. HMI study)
– later-stage data curation, ingest
– some domain expertise, lots of curation expertise
• IS&T
– identifying, operating hardware & system
– enterprise systems management expertise
– lots of IT expertise, some curation expertise

Page 12:

Project-Wide Operations

• Platform governance
– distributed open source software model
– transparent decision-making process
• Service model(s) for each institutional partner
– including all data curation activities
– including CI templates (e.g. hardware, cloud)
– associated cost model for each service model

Page 13:

Project-Wide Operations

• Ongoing usability studies with researchers, students, public audiences
• Develop certification strategy for TDRs using DataSpace (.arc domain)

Page 14:

Data Curation Lifecycle Highlights

• Deposit workflows for researchers based on locally-produced data (interactive and batch)
• Data Curators
– outreach, marketing, data recruitment
– metadata creation and data ontology application
– curatorial policies developed, applied
– tailored preservation strategies (local, consortial, outsourced)

Direct access to data creators and boots-on-the-ground support services

Page 15:

Data Curation Lifecycle Highlights

• Novel distributed, standards-based policy management strategy based on emerging Semantic Web standards and TRAC
• Semantic Web standards (e.g. RDF) to support improved data integration and interoperability
• Separation of access layer (discovery, use) from curation layer, in support of broad federation, distributed tool development
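To show why a triple-based model such as RDF helps with integration and policy management, the sketch below represents statements as plain (subject, predicate, object) tuples. It is a stand-in written in plain Python, not an RDF library, and every `ex:` identifier, including `ex:retentionYears`, is a made-up name for illustration rather than a published vocabulary.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# The "ex:" names are made-up identifiers, not a published vocabulary.
archive_a = {
    ("ex:dataset1", "ex:title", "Plankton counts"),
    ("ex:dataset1", "ex:retentionYears", 10),
}
archive_b = {
    ("ex:dataset1", "ex:subject", "biological oceanography"),
    ("ex:dataset2", "ex:retentionYears", 5),
}

# Integration is a set union: statements about the same subject line up
# without either archive having to adopt the other's table layout.
merged = archive_a | archive_b


def describe(graph, subject):
    """Collect every predicate/object pair known about one subject."""
    return {p: o for s, p, o in graph if s == subject}


def retained_at_least(graph, years):
    """Return subjects whose retention policy is at least `years`."""
    return sorted(
        s for s, p, o in graph
        if p == "ex:retentionYears" and o >= years
    )


print(describe(merged, "ex:dataset1"))
print(retained_at_least(merged, 10))
```

Because policies are stored as statements alongside descriptive metadata, the same query mechanism serves both discovery and policy checks, which is one plausible reading of the standards-based policy management described on the slide.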