Data Fabric IG Use Case Analysis

Upload: oswin-kennedy

Post on 28-Dec-2015


TRANSCRIPT

Page 1: Data Fabric IG Use Case Analysis. 2 Data Fabric Analysis how to come to essential components & services? Analyze Data Practices

Data Fabric IG Use Case Analysis

Page 2: Data Fabric Analysis

how to come to essential components & services?

Analyze Data Practices

Page 3: Data Practices I (120 interviews etc.)

Page 4: Data Practices II – EUDAT federation

Community Centers, Common Data Centers

projects to push limits and raise awareness

Page 5: Data Practices II – split of functions

• physical layer operations are trivial – know how to do it
• “logical layer” (LL) operations are complex due to relations, etc.
• all LL information needs to be aggregated and we need to have a secure access layer around it
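
The split described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (`PhysicalLocation`, `LogicalRecord`, `AccessLayer`), the fields, and the simple user-list rights model are all hypothetical and do not come from the slides or from any EUDAT software.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalLocation:
    """Physical layer: where the bits live (hypothetical fields)."""
    storage_url: str
    checksum: str


@dataclass
class LogicalRecord:
    """Logical layer: identity, description, rights and relations."""
    pid: str
    metadata: dict
    allowed_users: set = field(default_factory=set)
    relations: dict = field(default_factory=dict)  # relation name -> list of PIDs


class AccessLayer:
    """Aggregates all logical-layer records behind a single access check."""

    def __init__(self):
        self._records = {}
        self._locations = {}

    def register(self, record, location):
        # Keep logical and physical information separate, keyed by PID.
        self._records[record.pid] = record
        self._locations[record.pid] = location

    def resolve(self, pid, user):
        """Return the physical location only if the user may access the object."""
        record = self._records[pid]
        if user not in record.allowed_users:
            raise PermissionError(f"{user} may not access {pid}")
        return self._locations[pid]


# Hypothetical identifiers and paths, for illustration only.
layer = AccessLayer()
layer.register(
    LogicalRecord(pid="hdl:0000/example-1",
                  metadata={"title": "sample"},
                  allowed_users={"alice"}),
    PhysicalLocation("file:///data/sample.bin", "sha256:placeholder"),
)
print(layer.resolve("hdl:0000/example-1", "alice").storage_url)
```

The point of the sketch is that the physical part stays a trivial lookup, while rights and relations live in the logical record and are only reachable through the access layer.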

Page 6: Data Fabric Analysis

how to come to essential components & services?

Analyze Use Cases

Page 7: 10 (+5) Use Cases so far (2 in development, others mature)

Legend: environmental science; natural science; life science; humanities, soc. sciences; IT, various

all indicated nodes are centers of national, regional and even worldwide federations

Page 8: 10 (+5) Use Cases so far (2 in development, others mature)

all indicated nodes are centers of national, regional and even worldwide federations

No.  Name                                  Institute                  State
1    Language Archive                      Max Planck Institute, NL   In operation
2    Geodata Sharing Platform              Academy of China           In operation
3    Datanet Federation Consortium         RENCI, US                  In operation
4    ADCIRC Storm Forecasting              RENCI, US                  In operation
5    EPOS Plate Observation                INGV/CINECA, Italy         In operation
6    ENVRI Environment Observation         U Helsinki, Finland        In design
7    Nanoscopy Repository Cell structures  KIT, Germany               In design
8    Human Brain Neuroinformatics          EPFL, Switzerland          In testing
9    ENES Climate Modeling                 DKRZ, Germany              In operation
10   LIGO Gravitation Physics              NCSA, US                   In operation
11   ECRIN Medical Trial Interoperation    U Düsseldorf, Germany      In testing
12   VPH Physiology Simulation             U London, UK               In operation
13   Species Archive                       Nature Museum, Germany     In operation
14   International NeuroI Facility         INCF, Sweden               In operation
15   Molecular Genetics                    MPI, Germany               In operation

a few side remarks

• these are all federated approaches

• some have various use cases (one selected)

• 3 is more of an IT framework applied by many

• the description of state is only a very vague indication

• 5 marked red need another round of interaction

Page 10: Issues of Relevance

Diagram labels:

• sensors, simulations, crowd, etc.
• PID, Metadata, Rights, Syntax, Types, Semantics, Relations
• FS, Cloud, DB, Repository System
• virtual collection builder
• management, analytics, conversion; provenance – reproducibility; workflows, policies, deployment
• new collection, new metadata, temp store
• highly distributed in federations
• AAI/FIM

Page 11: How do WGs/IGs fit?

Diagram labels: CIT, DD, PROV, BROK, CERT (shown twice), BDA, REP, REPRO, DMP, DOM, FIM, PP

Page 12: Components I

• domain of registered digital objects (DO) incl. basic organization principles (data, code, knowledge) -> worldwide PID system (Handles/DOI)
• domain of registered actors -> worldwide ID system (ORCID)
• domain of trusted repositories for DOs -> worldwide Rep Registry; proper DFT/DSA/WDS compliant repository systems
• accepted policy commons (proper organization support, self-documenting, tested/certified, etc.) -> policy component registry
• policy/services -> service registry
• authentication system -> various in place (ORCID just number)
• authorization system -> authorization registry
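
How these registries could reference each other is sketched below. It is a minimal in-memory illustration, not an implementation of any Handle, DOI or ORCID service: the `Registry` class, the `describe` helper, every identifier, and the boolean `certified` flag standing in for DFT/DSA/WDS compliance are assumptions made for the example.

```python
class Registry:
    """A tiny in-memory registry: identifier -> record (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self._entries = {}

    def register(self, identifier, record):
        if identifier in self._entries:
            raise ValueError(f"{identifier} already registered in {self.name}")
        self._entries[identifier] = record

    def lookup(self, identifier):
        try:
            return self._entries[identifier]
        except KeyError:
            raise LookupError(
                f"{identifier} not found in {self.name} registry") from None


# One registry per component named on the slide (hypothetical content).
pid_registry = Registry("PID")
actor_registry = Registry("actor")
repository_registry = Registry("repository")

actor_registry.register("orcid:0000-0000-0000-0000",
                        {"name": "Example Researcher"})
repository_registry.register("repo:example",
                             {"name": "Example Repository", "certified": True})
pid_registry.register("hdl:0000/do-1",
                      {"creator": "orcid:0000-0000-0000-0000",
                       "repository": "repo:example"})


def describe(pid):
    """Follow the references from a digital object to its actor and repository."""
    digital_object = pid_registry.lookup(pid)
    actor = actor_registry.lookup(digital_object["creator"])
    repository = repository_registry.lookup(digital_object["repository"])
    return {
        "pid": pid,
        "creator": actor["name"],
        "repository": repository["name"],
        "certified": repository["certified"],
    }


print(describe("hdl:0000/do-1"))
```

Each registry answers one question only; a client has to combine several lookups to learn who created an object and whether its repository is trusted.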

Page 13: Components II

• MD components/schemas -> metadata schema registry
• data types/schemas/formats -> data type registry
• semantic categories -> category registry
• vocabularies -> vocabulary registry
• what about complex ontologies (thesauri, ontologies, etc.)?
• what about mapping relations?

much already out there but ...

... why does it cost months

• to federate and integrate data

• to make data interoperable

... need to harmonize, raise trust & value

... make it ready for machines
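
As one reading of "ready for machines", the sketch below shows how a data type registry lets software pick a parser without human intervention. It is a hedged illustration: the type identifiers, the `TYPE_REGISTRY` mapping and the `load` function are hypothetical, and only Python standard library parsers are used.

```python
import csv
import io
import json


def parse_csv_table(text):
    """Parse CSV text with a header row into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Hypothetical data type registry: type identifier -> parser function.
TYPE_REGISTRY = {
    "type:example/json-record": json.loads,
    "type:example/csv-table": parse_csv_table,
}


def load(payload, type_id):
    """Choose the parser from the registered type instead of guessing."""
    try:
        parser = TYPE_REGISTRY[type_id]
    except KeyError:
        raise LookupError(f"unregistered data type: {type_id}") from None
    return parser(payload)


print(load('{"station": "A", "value": 1}', "type:example/json-record"))
print(load("station,value\nA,1\n", "type:example/csv-table"))
```

With unregistered or legacy formats the lookup fails, and someone has to write an ad hoc script, which is the kind of manual effort the slide asks about.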

Page 15: What to do today

4 use cases (max 10 min) with the following goals

• understand whether we get what we want to get (common components/services)
• discuss whether we need to adapt the template

Zhu, Dieter, Sean, Giuseppe, Ed

discuss how to move on with use cases & analysis

• discuss my first look on C/S (?)
• update of existing and appearance on wiki (deadline)
• deadline for first round (when, whom to motivate, ?)
• virtual meeting for a discussion on analysis (when?)

at P6 (September) a first document with analysis

Page 16: Did we forget something?

Page 17: Data Practices I – Survey

~120 interviews/interactions, 2 workshops with leading scientists (EU, US)

• too much is done manually or via ad hoc scripts
• too much is in legacy formats (no PID & MD)
• there are lighthouse projects etc., but ...
• DM and DP are not efficient and too expensive (a biologist is a data manager for 75% of his time)
• federating data incl. logical information is much too expensive
• hardly any use of automated workflows and a lack of reproducibility

• is DI research only available for Power-Institutes?
• pressure towards DI research is high, but only some departments are fit for the challenges
• Senior Researchers: can’t continue like this!
• need to move towards proper data organization and automated workflows is evident
• but changes now are risky: lack of trained experts, guidelines and support