
Posted on 28-Dec-2015


Data Fabric IG Use Case Analysis

Data Fabric Analysis

how do we arrive at the essential components & services?

Analyze Data Practices

Data Practices I (120 interviews etc.)

Data Practices II – EUDAT federation

Community Centers / Common Data Centers

projects to push limits and raise awareness

Data Practices II – split of functions

• physical-layer operations are trivial: we know how to do them
• "logical layer" operations are complex due to relations, etc.
• all logical-layer (LL) information needs to be aggregated, and we need a secure access layer around it
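To make the split of functions concrete, here is a minimal Python sketch under a hypothetical in-memory model: the physical layer only stores and returns bytes by location, the logical layer aggregates each object's PID, metadata and relations, and every logical-layer call passes through a simple access check. All class names, fields and the per-user ACL model are illustrative assumptions, not part of any EUDAT or RDA specification.

```python
from dataclasses import dataclass, field


class PhysicalStore:
    """Physical layer (hypothetical): plain bit storage keyed by location."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, location: str, data: bytes) -> None:
        self._blobs[location] = data

    def get(self, location: str) -> bytes:
        return self._blobs[location]


@dataclass
class LogicalRecord:
    """Logical-layer information aggregated per object (illustrative fields)."""
    pid: str
    location: str
    metadata: dict[str, str]
    relations: dict[str, list[str]] = field(default_factory=dict)


class SecureLogicalLayer:
    """Logical layer wrapped in a simple access check (assumed ACL model)."""

    def __init__(self, store: PhysicalStore, acl: dict[str, set[str]]) -> None:
        self._store = store
        self._acl = acl  # user -> set of PIDs that user may read
        self._records: dict[str, LogicalRecord] = {}

    def register(self, record: LogicalRecord) -> None:
        self._records[record.pid] = record

    def _check(self, user: str, pid: str) -> None:
        if pid not in self._acl.get(user, set()):
            raise PermissionError(f"{user} may not access {pid}")

    def read(self, user: str, pid: str) -> bytes:
        # Resolve PID -> location in the logical layer, then fetch the bits physically.
        self._check(user, pid)
        return self._store.get(self._records[pid].location)

    def related(self, user: str, pid: str, relation: str) -> list[str]:
        # Relations live only in the logical layer; the physical store knows nothing of them.
        self._check(user, pid)
        return self._records[pid].relations.get(relation, [])
```

The design point the sketch tries to capture is that the physical store stays trivial, while all relational knowledge and access control sit in one aggregated layer.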

Data Fabric Analysis

how do we arrive at the essential components & services?

Analyze Use Cases

10 (+5) Use Cases so far (2 in development, others mature)

environmental science; natural science; life science; humanities, soc. sciences; IT, various

All indicated nodes are centers of national, regional and even worldwide federations.


No. | Name | Institute | State
1 | Language Archive | Max Planck Institute, NL | in operation
2 | Geodata Sharing Platform | Academy of China | in operation
3 | Datanet Federation Consortium | RENCI, US | in operation
4 | ADCIRC Storm Forecasting | RENCI, US | in operation
5 | EPOS Plate Observation | INGV/CINECA, Italy | in operation
6 | ENVRI Environment Observation | U Helsinki, Finland | in design
7 | Nanoscopy Repository (cell structures) | KIT, Germany | in design
8 | Human Brain Neuroinformatics | EPFL, Switzerland | in testing
9 | ENES Climate Modeling | DKRZ, Germany | in operation
10 | LIGO Gravitation Physics | NCSA, US | in operation
11 | ECRIN Medical Trial Interoperation | U Düsseldorf, Germany | in testing
12 | VPH Physiology Simulation | U London, UK | in operation
13 | Species Archive | Nature Museum, Germany | in operation
14 | International Neuroinformatics Facility | INCF, Sweden | in operation
15 | Molecular Genetics | MPI, Germany | in operation


A few side remarks:

• these are all federated approaches
• some have several use cases (one selected)
• no. 3 is more of an IT framework applied by many
• the state column is only a very vague indication
• the 5 entries marked in red need another round of interaction

Issues of Relevance

• sensors, simulations, crowd, etc.
• PID, metadata, rights, syntax, types, semantics, relations
• FS, cloud, DB, repository system
• virtual collection builder
• management, analytics, conversion; provenance – reproducibility; workflows, policies, deployment
• new collection, new metadata, temp store
• highly distributed, in federations
• AAI/FIM
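One way to read the "virtual collection builder" box: select objects by PID across federated repositories, then register a new collection that carries its own new metadata plus a provenance trail supporting reproducibility. The following is a minimal Python sketch under that reading only; the repository model, the matching rule and the provenance fields are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CollectionRecord:
    """A new (virtual) collection: member PIDs, new metadata, provenance (assumed fields)."""
    pid: str
    members: tuple[str, ...]
    metadata: dict
    provenance: dict


def build_virtual_collection(
    repositories: dict[str, dict[str, dict]],
    query: dict[str, str],
    new_pid: str,
    title: str,
) -> CollectionRecord:
    """Select matching objects across federated repositories (hypothetical model).

    `repositories` maps a repository name to {pid: metadata}; an object matches
    when every key/value pair in `query` appears in its metadata.
    """
    members = []
    sources = set()
    for repo_name, objects in repositories.items():
        for pid, md in objects.items():
            if all(md.get(k) == v for k, v in query.items()):
                members.append(pid)
                sources.add(repo_name)
    # Record how the collection was built so the selection can be reproduced later.
    provenance = {
        "query": dict(query),
        "sources": sorted(sources),
        "created": datetime.now(timezone.utc).isoformat(),
    }
    return CollectionRecord(new_pid, tuple(sorted(members)), {"title": title}, provenance)
```

Only PIDs are collected, not copies of the data, which keeps the collection "virtual" while the objects stay in their home repositories.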

How do WGs/IGs fit?

CIT, DD, PROV, BROK, CERT, BDA, REP, REPRO, DMP, DOM, FIM, PP

Components I

• domain of registered digital objects (DOs), incl. basic organization principles (data, code, knowledge) -> worldwide PID system (Handles/DOI)
• domain of registered actors -> worldwide ID system (ORCID)
• domain of trusted repositories for DOs -> worldwide repository registry; proper DFT/DSA/WDS-compliant repository systems
• accepted policy commons (proper organization support, self-documenting, tested/certified, etc.) -> policy component registry
• policies/services -> service registry
• authentication system -> various systems in place (ORCID is just a number)
• authorization system -> authorization registry
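How these registries might chain together can be sketched in Python: resolve a PID to its record, then accept the object only if its hosting repository appears in the repository registry with a trusted certification. Handle/DOI, ORCID, DSA and WDS are real, but everything below is a hypothetical in-memory stand-in, not their actual APIs; the handle prefix, domain and record fields are assumptions, and the ORCID iD is ORCID's published example identifier.

```python
from dataclasses import dataclass


@dataclass
class PidRecord:
    """What a PID system (e.g. Handle/DOI) might resolve to; fields are assumed."""
    location: str
    repository: str
    creator_orcid: str


# Hypothetical registries standing in for the worldwide systems on the slide.
PID_SYSTEM: dict[str, PidRecord] = {
    "21.T000/abc": PidRecord(
        "https://repo.example.org/abc", "repo.example.org", "0000-0002-1825-0097"
    ),
}
REPOSITORY_REGISTRY: dict[str, dict[str, set[str]]] = {
    "repo.example.org": {"certifications": {"DSA"}},
}
TRUSTED_CERTS = {"DSA", "WDS"}


def resolve_trusted(pid: str) -> PidRecord:
    """Resolve a PID and accept it only if its repository holds a trusted certification."""
    record = PID_SYSTEM.get(pid)
    if record is None:
        raise KeyError(f"unregistered PID: {pid}")
    repo = REPOSITORY_REGISTRY.get(record.repository)
    # Set intersection: at least one of the repository's certifications must be trusted.
    if repo is None or not (repo["certifications"] & TRUSTED_CERTS):
        raise ValueError(f"{record.repository} is not a registered trusted repository")
    return record
```

The point of the sketch is that a machine can decide trust by consulting registries rather than by manual inspection; real systems would query remote resolvers instead of module-level dictionaries.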

Components II

• MD components/schemas -> metadata schema registry
• data types/schemas/formats -> data type registry
• semantic categories -> category registry
• vocabularies -> vocabulary registry
• what about complex ontologies (thesauri, ontologies, etc.)? what about mapping relations?
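A data type registry lets a machine look up what a type identifier means before processing an object, and a vocabulary registry constrains the allowed values. Here is a minimal Python sketch, assuming a hypothetical registry that maps a type ID to required fields and their Python types, plus a vocabulary of permitted units; none of the identifiers are real registry entries.

```python
# Hypothetical data type registry: type ID -> {field name: expected Python type}.
TYPE_REGISTRY: dict[str, dict[str, type]] = {
    "type:temperature-series": {"unit": str, "values": list},
}
# Hypothetical vocabulary registry: field name -> allowed values.
VOCABULARY_REGISTRY: dict[str, set[str]] = {
    "unit": {"K", "degC"},
}


def validate(type_id: str, obj: dict) -> list[str]:
    """Return a list of problems; an empty list means obj conforms to type_id."""
    schema = TYPE_REGISTRY.get(type_id)
    if schema is None:
        return [f"unknown type: {type_id}"]
    problems = []
    for field_name, expected in schema.items():
        if field_name not in obj:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(obj[field_name], expected):
            problems.append(f"{field_name}: expected {expected.__name__}")
        elif field_name in VOCABULARY_REGISTRY and obj[field_name] not in VOCABULARY_REGISTRY[field_name]:
            problems.append(f"{field_name}: {obj[field_name]!r} not in vocabulary")
    return problems
```

Returning a list of problems rather than raising on the first one lets a workflow report every mismatch at once, which is closer to what an automated ingest step would need.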


Much is already out there, but ...
... why does it take months
• to federate and integrate data?
• to make data interoperable?
... we need to harmonize, raise trust & value
... and make it ready for machines

What to do today

• 4 use cases (max. 10 min each), with the following goals:
  – understand whether we get what we want to get (common components/services)
  – discuss whether we need to adapt the template
• Zhu, Dieter, Sean, Giuseppe, Ed
• discuss how to move on with use cases & analysis:
  – discuss my first look at C/S (?)
  – update of existing use cases and their appearance on the wiki (deadline)
  – deadline for the first round (when, whom to motivate, ?)
  – virtual meeting to discuss the analysis (when?)
• at P6 (September): a first document with the analysis


Did we forget something?

Data Practices I – Survey

~120 interviews/interactions, 2 workshops with leading scientists (EU, US)

• too much is done manually or via ad hoc scripts
• too much data is in legacy formats (no PID & MD)
• there are lighthouse projects etc., but ...
• DM and DP are not efficient and too expensive (a biologist spends 75% of his time as a data manager)
• federating data, incl. logical information, is much too expensive
• hardly any use of automated workflows, and a lack of reproducibility


• is DI research only available to power institutes?
• pressure towards DI research is high, but only some departments are fit for the challenges
• senior researchers: “we can’t continue like this!”
• the need to move towards proper data organization and automated workflows is evident
• but changes now are risky: lack of trained experts, guidelines and support
