Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009


Ocean Observatories Initiative

Data Management (DM) Subsystem Overview

Michael Meisinger

September 29, 2009

OOI CI Kick-Off Meeting, Sept 9-11, 2009

Outline

• Subsystem Architecture Overview
• Scope of Release 1
• Selected Components
– Data Distribution based on the Exchange
– Data Store as a Service


Data Distribution w/ Exchange

• Context of DM within CI

• Exchange handles Data distribution

[Diagram: DM in the context of the CI subsystems. Labels: Common Operating Infrastructure; Data Management (Science); Sensing & Acquisition; Data Management (Information Distribution); Analysis & Synthesis; Identity Management; State Management; Governance Framework; Resource Management; Planning & Prosecution; Exchange; Service Framework; Presentation Framework; Common Execution Infrastructure]


Data Processing and Availability

• Multiple aspects of data management

• Data processing and analysis at various levels of abstraction

• Data distribution critical to global scientific research


Requirements

• Focus on high-risk requirements

The CI shall implement an OOI-standard metadata model for resources

The OOI-standard metadata model shall support a description of physical resource behavior

The OOI-standard metadata model shall support a description of physical resource content

The OOI-standard metadata model shall support a syntactic description for the content of an information resource

The OOI-standard metadata model shall support a semantic description for the content of an information resource

The OOI-standard metadata model shall support tracking of resource provenance

The OOI-standard metadata model shall support tracking of quality

The OOI-standard metadata model shall support tracking of context

The OOI-standard metadata model shall support tracking of correspondence

The OOI-standard metadata model shall support tracking of citation

The OOI-standard metadata model shall support tracking of lineage

The OOI-standard metadata model shall be extensible

The CI shall provide semantic services to support ontological representations and relationships

The semantic services shall utilize domain-specific vocabularies

A user interface to define vocabulary terms shall be provided

The vocabularies shall be extensible

The semantic services shall recommend new terms to enter into the vocabulary

The semantic services shall implement an ontological language

The semantic services shall implement an ontological engine

The CI shall provide persistent archive services

The persistent archive services shall be data format agnostic

The persistent archive services shall be subject to policy

The persistent archive services shall preserve all associations between data and metadata

The persistent archive services shall ingest data independent of delivery order

The persistent archive services shall guarantee the integrity of archived data

The persistent archive services shall support distributed data repositories

The persistent archive services shall support federation

The persistent archive services shall support data versioning

The persistent archive services shall acknowledge requests for data and provide an estimate for response time

[Side labels next to the requirement groups: "Common data and meta-data models"; "Archival, syntax, semantics, UI…"]


Scope of Release 1

• Common data and metadata model
– Resource metadata, behavior, lifecycle, content, provenance, lineage, citation, quality, context, correspondence
– Extensible vocabularies and ontologies
– Data formats (syntax and semantics)
• Dynamic data distribution services
– Pub/sub, topics, processing chaining, sequestration
• Data catalog and repository
– Discovery, metadata management
• Persistent archive services
– Repository management, common repository framework, ingestion services, long-term archival


DM Functional Components

[Diagram: Data Management Services Network. Labels: Science Data Services; Information Distribution Services; "Standardized" Data products; Ingestion; Transformation; Exchange; Sensing & Acquisition SN; Observed data; Data Products; Metadata; Presentation; Analysis & Synthesis SN; Preservation; Inventory; DX Prototype]

• Data Exchange (DX) prototype barely touches the Ingestion/Transformation/Exchange/Preservation components in the context of a data distribution model

• DX strongly informs further refinements of the DM architecture and technology choices


Information Container Model

• Encapsulates all kinds of information resources, such as: scientific data, user identities, process definitions, virtual machine images, etc.

• Multiple levels of meta-data

• Separation of concerns between Information services

[Diagram: Information Container model. Labels indicate that an Information Container has Meta-data (L1) describing an Information Block; the Information Block has Meta-data (L2) describing the Information Content; the Information Content has an optional Header (Meta-data L3) describing the Body. Example contents: Process Spec, Science Data. Information Services (Ingestion, Transformation) operate on the Information Model.]
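The nesting of the Information Container can be sketched in a few lines of Python. This is only an illustration of the three meta-data levels named on the slide; the class names, field names and the `metadata_levels` helper are assumptions made for this sketch and are not part of any OOI code base.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class InformationContent:
    """Innermost element: a body plus an optional header (meta-data L3)."""
    body: Any
    header: Optional[Dict[str, Any]] = None


@dataclass
class InformationBlock:
    """Wraps the content and carries the meta-data (L2) that describes it."""
    content: InformationContent
    metadata_l2: Dict[str, Any] = field(default_factory=dict)


@dataclass
class InformationContainer:
    """Outermost element: carries the meta-data (L1) that describes the block."""
    block: InformationBlock
    metadata_l1: Dict[str, Any] = field(default_factory=dict)


def metadata_levels(container: InformationContainer) -> Dict[str, Dict[str, Any]]:
    """Collect the three meta-data levels without touching the body."""
    return {
        "L1": container.metadata_l1,
        "L2": container.block.metadata_l2,
        "L3": container.block.content.header or {},
    }
```

A service that only needs ownership or version information can read L1 and never parse the body, which is one way to read the "separation of concerns" bullet.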


Ingestion

• Provides basic mechanisms for identifying the data streams and formats, parsing the content and identifying the associated meta-data, adding version information, and registering the streams with an ISN Repository

[Diagram: Ingestion components connected to the Exchange. Labels: Data Format Detector; Data Parser; Metadata Extractor; Versioning; Registrar]
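The ingestion steps listed above (detect format, parse, extract meta-data, add version information, register) can be sketched as a small pipeline. Everything here is hypothetical: the two toy formats (JSON and CSV), the function names and the in-memory `Registrar` stand in for the components named on the slide and do not reflect the actual OOI design.

```python
import csv
import io
import json
from typing import Any, Dict, List


def detect_format(raw: str) -> str:
    """Data Format Detector: a naive sniff that only knows JSON and CSV."""
    return "json" if raw.lstrip().startswith(("{", "[")) else "csv"


def parse(raw: str, fmt: str) -> List[Dict[str, Any]]:
    """Data Parser: turn the raw text into a list of records."""
    if fmt == "json":
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    return list(csv.DictReader(io.StringIO(raw)))


def extract_metadata(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Metadata Extractor: derive simple descriptive meta-data."""
    fields = sorted({key for record in records for key in record})
    return {"fields": fields, "record_count": len(records)}


class Registrar:
    """Registrar plus Versioning: each registration gets the next version."""

    def __init__(self) -> None:
        self._streams: Dict[str, List[Dict[str, Any]]] = {}

    def register(self, stream_id: str, entry: Dict[str, Any]) -> int:
        versions = self._streams.setdefault(stream_id, [])
        versioned = dict(entry, version=len(versions) + 1)
        versions.append(versioned)
        return versioned["version"]

    def latest(self, stream_id: str) -> Dict[str, Any]:
        return self._streams[stream_id][-1]


def ingest(stream_id: str, raw: str, registrar: Registrar) -> int:
    """Run the full chain and return the version assigned to this ingest."""
    fmt = detect_format(raw)
    records = parse(raw, fmt)
    metadata = extract_metadata(records)
    metadata["format"] = fmt
    return registrar.register(stream_id, {"metadata": metadata, "records": records})
```

Keeping each step as a separate function mirrors the component boxes in the diagram and lets a different detector or parser be swapped in without touching the registrar.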


Ingestion Service Data Model

• Relationship between the constituents of the Ingestion Service and the Information Container Model

[Diagram: Ingestion Services (Versioning, Data Format Detector, Registrar, Meta-Data Extractor, Data Parser) each operate on the Information Model: an Information Container whose Meta-data (L1) describes its Information Block. Meta-data (L1) labels: Registration Information; Version; Ownership; Authorship; Policies; Annotations]


Transformation Service Data Model

• Relationship between the constituents of the Transformation Service and the Information Container Model

[Diagram: Transformation Services (Format Conversion, Data Parser, Mediation, Meta-Data Extraction, V&V) each operate on the Information Model: an Information Block whose Meta-data (L2) describes the Information Content, which has a Header (Meta-data L3) describing the Body. Further labels: Syntax (relies on Standard); Semantics (relies on Ontology)]


Preservation Service Data Model

• Relationship between the constituents of the Preservation Service and the Information Repository Model

[Diagram: Preservation Services (Replication, History, Backup, Archive) operate on the Information Repository model. Labels: Information Services; Information Repository; Information Representations; Syntax (relies on Standard); Semantics (relies on Ontology); Information Entities; Information Container; Meta-data (L0); Data Product; Process Definition; Model Process Repository; Instrument Process Repository; Process Definition Repository; Data Product Repository; Resource Repository; Distribution Strategy; Resource Reference; Resource; Organizational Architecture; Deployment Architecture. Relations shown: represent; retains; abstracts; locates; describe; operates on]


Scientific Data Transport

• As DAP evolves, Unidata’s CDM may be its successor*
– OpenDAP
– netCDF
– HDF5

[Diagram: DAP data model. Labels: Dataset; Data Representation; Meta-data Representation; Type Domain; Protocol; DDS (Dataset Descriptor Structure), syntactic metadata; DAS (Dataset Attribute Structure), semantic meta-data; DataDDS (Data Dataset Descriptor Structure), provides data; Variables (VarName, Var Type, Var Value); Attributes (AttrName, Attr Type, Attr Value); Variable Attributes; Global Attributes; DAP Data Type; DAP Atomic Type; String; Structure; Array (unidimensional). Relations shown: describe; characterize; encode; expose]

* Comparison available at: http://wiki.opendap.org/twiki/bin/view/Developers/ModelSummary

• Currently DAP as canonical form
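The split between syntactic metadata (DDS) and semantic metadata (DAS) can be illustrated with a small sketch. The output is only loosely modeled on the DAP text forms and is not guaranteed to be accepted by a real OPeNDAP server; the `Variable` class, the `render_dds` and `render_das` helpers, and the `<name>_dim` dimension naming are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Variable:
    name: str
    dap_type: str                      # e.g. "Float32"
    values: List[float]
    # attribute name -> (attribute type, attribute value)
    attributes: Dict[str, Tuple[str, str]] = field(default_factory=dict)


def render_dds(dataset_name: str, variables: List[Variable]) -> str:
    """Syntactic description: names, types and array lengths only."""
    lines = ["Dataset {"]
    for var in variables:
        lines.append(f"    {var.dap_type} {var.name}[{var.name}_dim = {len(var.values)}];")
    lines.append(f"}} {dataset_name};")
    return "\n".join(lines)


def render_das(variables: List[Variable]) -> str:
    """Semantic description: per-variable attributes such as units."""
    lines = ["Attributes {"]
    for var in variables:
        lines.append(f"    {var.name} {{")
        for attr_name, (attr_type, attr_value) in var.attributes.items():
            lines.append(f'        {attr_type} {attr_name} "{attr_value}";')
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)
```

The point of the sketch is that a client can learn the shape of a dataset from the first rendering and its meaning from the second, without transferring any values.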


Data Store as Service

• Exchange makes data transport possible and physical location of data becomes transparent to application

• Storage mechanisms abstracted to improve flexibility

• Ability to choose the best technology for the available platform that fits the intended purpose

• Multiple different storage “back-ends” possible

• Attribute Store prototype as the predecessor to a storage architecture
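The idea of abstracting the storage mechanism behind a service interface can be sketched as follows. The `DataStore` interface, the two back-ends and the `record_position` helper are hypothetical names chosen for this illustration; SQLite is used only because it ships with Python, not because the slides select it.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DataStore(ABC):
    """Interface seen by applications; the back-end stays hidden."""

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore(DataStore):
    """Back-end for small or short-lived data."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


class SqliteStore(DataStore):
    """Back-end that persists pairs in a SQLite table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key: str, value: str) -> None:
        with self._conn:  # commits on success
            self._conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

    def get(self, key: str) -> Optional[str]:
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


def record_position(store: DataStore, instrument: str, position: str) -> Optional[str]:
    """Application code depends only on the DataStore interface."""
    store.put(instrument, position)
    return store.get(instrument)
```

Because `record_position` never names a concrete class, the deployment can pick whichever back-end suits the platform.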


Attribute Store

[Diagram: Attribute Store structure. Labels: Attribute Store; Command Processor; Repository; Command Set; Commands (READ, WRITE, DELETE, QUERY, SEARCH), which operate on Entities (Key, Value); Specification kinds: Lookup (matches), Representational (describes), Composite; Atom; Map; StringList; Wildcard; Regexp. Relations shown: implements; operate on; matches; describes; understands]

• Generic repository of information organized around key + value pairs
• Intended to provide fast, reliable data storage and retrieval for lightweight data elements (not a full-blown SQL engine)
• Decomposition:
– Command Processor – interfaces with other OOI entities and abstracts from Repository technology
– Repository – stores the actual content using the best technology available for the selected platform
– Specification – describes Repository and how to store/retrieve/match elements to/from Repository


Attribute Store - Design

Fundamental Interaction Pattern

[Sequence diagram: the Application sends COMMAND(arguments) to the Attribute Store and receives a RESPONSE.]

Internal Interaction Pattern for the WRITE Cmd.

[Sequence diagram between Application, Command Processor and Repository. Messages shown: WRITE(key, newvalue); query(key), answered by Key or KEY_NOT_FOUND; get(key), answered by OldValue; Assign(OldValue, NewValue); set(key, newvalue), answered by OK or ERROR; final response OldValue or FAILURE. Two ALT fragments mark the alternatives.]

Command Set

Command | Arguments (Input) | Response (Output) | Semantics
WRITE | Key, NewValue | OldValue, FAILURE | Locate pair (key, *). If it exists, assign its current value to OldValue; otherwise assign NewValue to OldValue. Set or create pair (key, NewValue) and return OldValue, or FAILURE when creation failed.
READ | Key | Value, INVALID KEY, FAILURE | Locate pair (key, *) and return the value associated with the key (if found). Return INVALID KEY when there is no pair with that key, or FAILURE when the read could not be performed.
DELETE | Key | SUCCESS, INVALID KEY, FAILURE | Locate pair (key, *) and delete it. Return INVALID KEY when the pair could not be found, FAILURE when the pair could not be deleted.
QUERY | Regexp | [Keylist] | Search for keys matching the regexp pattern and return a list of them. The list is empty when there are no matching keys.
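The command semantics above can be exercised with a minimal sketch. The class names follow the decomposition on the previous slide, but the method names, the string constants and the dictionary-backed repository are assumptions made for illustration; QUERY uses `re.search`, which is one possible reading of "keys matching the regexp pattern".

```python
import re
from typing import Dict, List

FAILURE = "FAILURE"
INVALID_KEY = "INVALID KEY"
SUCCESS = "SUCCESS"


class DictRepository:
    """Repository: holds the actual key/value pairs (here, a plain dict)."""

    def __init__(self) -> None:
        self._pairs: Dict[str, str] = {}

    def has(self, key: str) -> bool:
        return key in self._pairs

    def get(self, key: str) -> str:
        return self._pairs[key]

    def set(self, key: str, value: str) -> None:
        self._pairs[key] = value

    def remove(self, key: str) -> None:
        del self._pairs[key]

    def keys(self) -> List[str]:
        return list(self._pairs)


class CommandProcessor:
    """Command Processor: implements the command set on top of a repository."""

    def __init__(self, repository: DictRepository) -> None:
        self._repo = repository

    def write(self, key: str, new_value: str) -> str:
        try:
            # OldValue is the current value, or NewValue if the key is new.
            old_value = self._repo.get(key) if self._repo.has(key) else new_value
            self._repo.set(key, new_value)
            return old_value
        except Exception:
            return FAILURE

    def read(self, key: str) -> str:
        try:
            return self._repo.get(key) if self._repo.has(key) else INVALID_KEY
        except Exception:
            return FAILURE

    def delete(self, key: str) -> str:
        try:
            if not self._repo.has(key):
                return INVALID_KEY
            self._repo.remove(key)
            return SUCCESS
        except Exception:
            return FAILURE

    def query(self, pattern: str) -> List[str]:
        regexp = re.compile(pattern)
        return sorted(key for key in self._repo.keys() if regexp.search(key))
```

Returning status strings in the same channel as values copies the table literally; a production design would probably separate the two so that a stored value of "FAILURE" cannot be confused with an error.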


Data Representation

[Diagram: comparison of four data representation formats; labels as they appear on the slide.
SDXF: SDXF Element; Composite Chunk; Atomic Chunk; Header; Data; Chunk Identifier (16 bit); Flags (8 bit): Data Type (3 bit), Compressed (1 bit), Encrypted (1 bit), Short Chunk (1 bit), Array (1 bit), Reserved (1 bit); Length; data types: Pending structure, Structure, Bit string, Numeric (16 bit int), String, Float, UTF-8, Reserved.
XDR: XDR Data Object; Element; Unique Identifier; Structure; Composite; data types: Integer (32 bit), Unsigned Int (32 bit), Enum (integers), Hyper int (64 bit), Float (32 bit), Double (64 bit), Quadruple (128 bit), Boolean, String, Opaque (fixed sized, variable sized), Array, Union, void, Container, Typedef, Optional Data, Constant.
Extprot: Extprot Data; Element; Unique Identifier; Composite; data types: Boolean, Signed Int (16 bit), Double Float (64 bit), Byte (8 bit), Long (64 bit), String; List; Tuple; Array; Disjoint Sum (tagged union); Polymorphic Messages.
Facebook Thrift: Thrift Object; Element; Unique Identifier; Structure; Composite; Container; Struct; Exception; data types: Bool, Byte, Signed Int 16, Signed Int 32, Signed Int 64, Double, String; List; Set; Map; void; Service; Methods; Interfaces; Transport; TSocket; TFileTransport; Protocol; Implementation; Bidirectional Sequenced Messaging; Encoding of Types; STOP; Version; carries.]

• Data Representation/Encoding Standards
– Processing
– Transport
– Storage

• Many choices… with overlapping capabilities
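To make one of these encodings concrete, the chunk header fields shown for SDXF (16-bit chunk identifier, 8-bit flags, length) can be packed and unpacked as below. The bit positions inside the flags byte and the 24-bit big-endian length are assumptions made for this sketch; consult the SDXF specification before relying on them.

```python
import struct
from typing import Dict


def pack_header(chunk_id: int, data_type: int, length: int,
                compressed: bool = False, encrypted: bool = False,
                short_chunk: bool = False, array: bool = False) -> bytes:
    """Pack a 6-byte header: id (16 bit), flags (8 bit), length (assumed 24 bit)."""
    if not 0 <= chunk_id < 1 << 16:
        raise ValueError("chunk_id must fit in 16 bits")
    if not 0 <= data_type < 1 << 3:
        raise ValueError("data_type must fit in 3 bits")
    if not 0 <= length < 1 << 24:
        raise ValueError("length must fit in 24 bits")
    # Assumed layout: type in the top 3 bits, then one bit per flag,
    # with the lowest bit reserved (always 0 here).
    flags = (data_type << 5) | (int(compressed) << 4) | (int(encrypted) << 3) \
        | (int(short_chunk) << 2) | (int(array) << 1)
    return struct.pack(">HB", chunk_id, flags) + length.to_bytes(3, "big")


def unpack_header(raw: bytes) -> Dict[str, object]:
    """Inverse of pack_header for the same assumed layout."""
    chunk_id, flags = struct.unpack(">HB", raw[:3])
    return {
        "chunk_id": chunk_id,
        "data_type": flags >> 5,
        "compressed": bool(flags & 0x10),
        "encrypted": bool(flags & 0x08),
        "short_chunk": bool(flags & 0x04),
        "array": bool(flags & 0x02),
        "length": int.from_bytes(raw[3:6], "big"),
    }
```

A fixed, self-describing header like this is what lets a receiver skip chunks it does not understand, which is relevant when several encodings coexist in one system.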


Technology Mapping

Functional Component | Technology | TRL
Dataset Catalog | THREDDS | 7
Semantic Framework | VSTO Semantic Framework | 8
Semantic Query | ESG Facetted Search | 8
Data Integration with Applications | NetCDF lib | 8
Data Integration with Applications | Matlab lib for OpenDAP access | 8
Dataset Management & Distribution | OpenDAP Hyrax Server | 7
Dataset Preservation | iRODS | 7
General Purpose Database | MySQL cluster | 9
Data Grid File Transfer | GridFTP | 9
Dataset Access Protocol | DAP | 9
Dataset File Format | NetCDF | 9
Metadata Conventions | CF Metadata (Climate & Forecast) | 9
Dataset Aggregation Language | NcML | 9
Query language for RDF | SPARQL | 8
Knowledge Discovery Model | URIQA | 9
Oceanographic Vocabularies & Mappings | MMI | 7
External Data Presentation | OGC Services | 9 & 7


Thanks!


DM Components

• Base is DM FDR presentation
• Data Distribution based on the Exchange
– Data Exchange architecture after services OV2 slide as example for a data distribution (vs storage model, the older model); real architecture has not been chosen; DX strongly informs. Covers Ingestion, Transformation, Preservation in the context of a Data distribution model
– DAP as canonical form for transport of data. For given streams there are canonical forms (e.g. DAP), but not for the system in general (i.e. a database). That’s why we chose the new model. Be aware that the underlying data model of DAP is in evolution. Unidata CDM. Insert a few references to these models.
– Reference to encoding formats, FIPA header
– Query against the past (e.g. archive query) or the future (e.g. subscriptions). Pointer to SQLstream prototype
• Data Store as a Service
– Attribute store as the predecessor to a storage architecture
– Model, commands
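The note about querying "against the past" versus "the future" can be illustrated with a sketch in which the same predicate serves both an archive query and a subscription. The `StreamStore` class and its methods are hypothetical and unrelated to the SQLstream prototype mentioned above.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Predicate = Callable[[Record], bool]
Callback = Callable[[Record], None]


class StreamStore:
    """Keeps an archive of published records and a list of subscriptions."""

    def __init__(self) -> None:
        self._archive: List[Record] = []
        self._subscriptions: List[Tuple[Predicate, Callback]] = []

    def query_past(self, predicate: Predicate) -> List[Record]:
        """Archive query: evaluate the predicate over records already stored."""
        return [record for record in self._archive if predicate(record)]

    def subscribe(self, predicate: Predicate, callback: Callback) -> None:
        """Subscription: evaluate the predicate over records yet to arrive."""
        self._subscriptions.append((predicate, callback))

    def publish(self, record: Record) -> None:
        """Store the record, then notify every matching subscription."""
        self._archive.append(record)
        for predicate, callback in self._subscriptions:
            if predicate(record):
                callback(record)
```

For example, a predicate such as `lambda r: r["temp"] > 5.0` passed to `query_past` returns the warm readings already archived, while passing it to `subscribe` delivers only those published afterwards.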


• FIPA provides valuable models for
– Communication patterns
– Message structure

[Diagram: message structure. Labels: Message; Header; Message ID; Version; Message Type (predefined or user defined); Message Parameters (predefined or user defined); End of Collection; End of Message; Syntax; Semantics. FIPA ACL Message Parameters: Performative (communicative act); Conversation Participants: Sender (identity), Receiver (identity set), Reply-To (identity); Content; Content Description: Language, Encoding, Ontology; Control: Protocol, ConversationID, Reply-with, In-Reply-To, Reply-by]
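The FIPA ACL message parameters named in the diagram map naturally onto a record type. The sketch below is an assumption-laden illustration: the parameter names come from the diagram, while the `AclMessage` class, the `make_reply` helper and the convention of replying from the first receiver are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AclMessage:
    # Type of communicative act
    performative: str
    # Conversation participants
    sender: str
    receivers: List[str]
    # Content
    content: str
    reply_to: Optional[str] = None
    # Content description
    language: Optional[str] = None
    encoding: Optional[str] = None
    ontology: Optional[str] = None
    # Conversation control
    protocol: Optional[str] = None
    conversation_id: Optional[str] = None
    reply_with: Optional[str] = None
    in_reply_to: Optional[str] = None
    reply_by: Optional[str] = None


def make_reply(message: AclMessage, performative: str, content: str) -> AclMessage:
    """Build a reply that stays in the same conversation as the request."""
    return AclMessage(
        performative=performative,
        sender=message.receivers[0],                      # assumed: first receiver answers
        receivers=[message.reply_to or message.sender],   # honor Reply-To if present
        content=content,
        language=message.language,
        ontology=message.ontology,
        protocol=message.protocol,
        conversation_id=message.conversation_id,
        in_reply_to=message.reply_with,                   # echo the requester's token
    )
```

Carrying `conversation_id` and `in_reply_to` through the reply is what lets a receiver correlate responses with outstanding requests, which is the communication-pattern aspect the slide points at.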


Subsystem

• Data and Information Access
– Search & Navigation
– External observatory access (IOOS, Neptune Canada, …)
• Transformation and Mediation
– Attribution & Association
– Aggregation
– Syntactical Transformation
– Ontology-based mediation between vocabularies
• Dynamic Data/Information Distribution
– Persistent Archive
– Information Catalog & Repository

[Diagram: subsystem overview. Labels: Sensing & Acquisition; Data Management; Planning & Prosecution; Analysis & Synthesis; Common Execution Infrastructure; Common Operating Infrastructure; Capability Container]