Data Fabric IG Use Case Analysis
Data Fabric Analysis
How do we arrive at the essential components & services?
Analyze Data Practices
Data Practices I (120 interviews etc.)
Data Practices II – EUDAT federation
Community Centers / Common Data Centers
projects to push limits and raise awareness
Data Practices II – split of functions
• physical layer operations are trivial: we know how to do them
• “logical layer” operations are complex due to relations, etc.
• all logical layer (LL) information needs to be aggregated, and we need a secure access layer around it
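The split of functions can be illustrated with a minimal Python sketch. It assumes a purely hypothetical in-memory design: the class names PhysicalStore, LogicalCatalog and AccessLayer and the PID "21.T999/obj42" are invented for illustration and are not part of any EUDAT service. The physical layer only stores and returns bytes; all logical-layer information (PID, metadata, relations) is aggregated in one catalog; every request passes through an access check first.

```python
from dataclasses import dataclass, field


class PhysicalStore:
    """Physical layer (hypothetical): stores and returns bytes by location."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, location: str, data: bytes) -> None:
        self._blobs[location] = data

    def get(self, location: str) -> bytes:
        return self._blobs[location]


@dataclass
class Record:
    """Logical-layer information about one digital object (illustrative fields)."""
    pid: str
    location: str
    metadata: dict[str, str]
    relations: dict[str, list[str]] = field(default_factory=dict)


class LogicalCatalog:
    """Aggregates all logical-layer records in one place, keyed by PID."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def register(self, record: Record) -> None:
        self._records[record.pid] = record

    def lookup(self, pid: str) -> Record:
        return self._records[pid]


class AccessLayer:
    """Secure access layer: checks the caller before touching either layer."""

    def __init__(self, store: PhysicalStore, catalog: LogicalCatalog, allowed: set[str]) -> None:
        self._store = store
        self._catalog = catalog
        self._allowed = allowed

    def _check(self, user: str) -> None:
        if user not in self._allowed:
            raise PermissionError(f"{user} is not authorized")

    def fetch(self, user: str, pid: str) -> bytes:
        # Resolve PID -> location in the logical layer, then read the bytes.
        self._check(user)
        return self._store.get(self._catalog.lookup(pid).location)

    def related(self, user: str, pid: str, relation: str) -> list[str]:
        # Relations live only in the logical layer, never in the byte store.
        self._check(user)
        return self._catalog.lookup(pid).relations.get(relation, [])


store = PhysicalStore()
catalog = LogicalCatalog()
store.put("/disk1/obj42", b"raw data")
catalog.register(Record("21.T999/obj42", "/disk1/obj42", {"type": "timeseries"},
                        {"derived_from": ["21.T999/obj7"]}))
access = AccessLayer(store, catalog, allowed={"alice"})
print(access.fetch("alice", "21.T999/obj42"))  # b'raw data'
print(access.related("alice", "21.T999/obj42", "derived_from"))  # ['21.T999/obj7']
```

The physical store stays trivially replaceable (file system, cloud, DB), while everything that makes federation hard sits in the catalog behind one checkpoint.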
Data Fabric Analysis
How do we arrive at the essential components & services?
Analyze Use Cases
10 (+5) Use Cases so far (2 in development, others mature)
Domains: environmental science, natural science, life science, humanities and social sciences, IT, various
All indicated nodes are centers of national, regional and even worldwide federations.
# | Name | Institute | State
1 | Language Archive | Max Planck Institute, NL | in operation
2 | Geodata Sharing Platform | Academy of China | in operation
3 | Datanet Federation Consortium | RENCI, US | in operation
4 | ADCIRC Storm Forecasting | RENCI, US | in operation
5 | EPOS Plate Observation | INGV/CINECA, Italy | in operation
6 | ENVRI Environment Observation | U Helsinki, Finland | in design
7 | Nanoscopy Repository (cell structures) | KIT, Germany | in design
8 | Human Brain Neuroinformatics | EPFL, Switzerland | in testing
9 | ENES Climate Modeling | DKRZ, Germany | in operation
10 | LIGO Gravitation Physics | NCSA, US | in operation
11 | ECRIN Medical Trial Interoperation | U Düsseldorf, Germany | in testing
12 | VPH Physiology Simulation | U London, UK | in operation
13 | Species Archive | Nature Museum, Germany | in operation
14 | International Neuroinformatics Coordinating Facility | INCF, Sweden | in operation
15 | Molecular Genetics | MPI, Germany | in operation
A few side remarks:
• these are all federated approaches
• some have various use cases (one selected)
• use case 3 is more of an IT framework applied by many
• the "state" column is only a very vague indication
• the 5 use cases marked red need another round of interaction
Issues of Relevance
• sensors, simulations, crowd, etc.
• PID, metadata, rights, syntax, types, semantics, relations
• repository system: FS, cloud, DB
• virtual collection builder
• management, analytics, conversion
• provenance and reproducibility
• workflows, policies, deployment
• new collection, new metadata, temp store
• highly distributed, in federations
• AAI/FIM
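The "virtual collection builder" step can be sketched in Python under loudly stated assumptions: the federation is modeled as a hypothetical dict of repositories, objects carry flat key/value metadata, and the PIDs ("pid/a1", "pid/coll-1") and function name are placeholders, not any existing service's API. The builder selects matching objects across repositories by reference only, gives the result new metadata, and records provenance about where the members came from.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DigitalObject:
    """A registered object held by one repository of the federation."""
    pid: str
    repository: str
    metadata: dict[str, str]


@dataclass
class Collection:
    """A virtual collection: member references plus its own metadata and provenance."""
    pid: str
    members: list[str]
    metadata: dict[str, str]
    provenance: dict[str, object]


def build_virtual_collection(
    federation: dict[str, list[DigitalObject]],
    predicate: Callable[[DigitalObject], bool],
    new_pid: str,
    description: str,
) -> Collection:
    """Select matching objects across all repositories without copying any data."""
    selected = [obj for objs in federation.values() for obj in objs if predicate(obj)]
    return Collection(
        pid=new_pid,
        members=[obj.pid for obj in selected],  # references (PIDs), not copies
        metadata={"description": description, "size": str(len(selected))},
        # Record which repositories contributed and how members were selected.
        provenance={"sources": sorted({obj.repository for obj in selected}),
                    "selection": description},
    )


federation = {
    "repo-A": [DigitalObject("pid/a1", "repo-A", {"variable": "temperature"}),
               DigitalObject("pid/a2", "repo-A", {"variable": "salinity"})],
    "repo-B": [DigitalObject("pid/b1", "repo-B", {"variable": "temperature"})],
}
coll = build_virtual_collection(
    federation,
    lambda obj: obj.metadata.get("variable") == "temperature",
    new_pid="pid/coll-1",
    description="all temperature series",
)
print(coll.members)  # ['pid/a1', 'pid/b1']
```

Keeping members as PIDs is what makes the collection "virtual": the data stay in their home repositories, and only the new metadata and provenance need a (temporary) store.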
How do WGs/IGs fit?
CIT, DD, PROV, BROK, CERT, BDA, REP, REPRO, DMP, DOM, FIM, PP
Components I
domain of registered digital objects (DOs), incl. basic organization principles (data, code, knowledge) -> worldwide PID system (Handles/DOI)
domain of registered actors -> worldwide ID system (ORCID)
domain of trusted repositories for DOs -> worldwide repository registry; proper DFT/DSA/WDS-compliant repository systems
accepted policy commons (proper organization support, self-documenting, tested/certified, etc.) -> policy component registry
policy/services -> service registry
authentication system -> various systems in place (ORCID is just a number)
authorization system -> authorization registry
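The remark that ORCID is "just a number" can be made concrete: an ORCID iD carries no credentials, only 16 characters whose last one is an ISO 7064 MOD 11-2 check character. The Python sketch below follows the check-digit algorithm that ORCID documents (the function names are ours); it shows everything a machine can verify from the iD alone, which is why authentication has to come from a separate system.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X layout, then the check character."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A valid checksum only says the iD is well formed; it says nothing about who is presenting it.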
Components II
MD components/schemas -> metadata schema registry
data types/schemas/formats -> data type registry
semantic categories -> category registry
vocabularies -> vocabulary registry
What about more complex resources (thesauri, ontologies, etc.)? What about mapping relations?
Much is already out there, but ...
... why does it still take months
• to federate and integrate data
• to make data interoperable?
... we need to harmonize, raise trust & value
... make it ready for machines
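What "ready for machines" means for the registries listed under Components II can be sketched in Python: software resolves a type identifier and acts on the definition without a human reading documentation. This is a minimal sketch assuming a hypothetical in-memory registry; the identifier "example/temperature-celsius" and the fields base, unit and minimum are invented for illustration and do not reflect the schema of any actual data type registry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataType:
    """A registered data type: a machine-readable definition behind an identifier."""
    identifier: str
    base: type
    unit: Optional[str] = None
    minimum: Optional[float] = None


class DataTypeRegistry:
    """Hypothetical registry resolving type identifiers to definitions."""

    def __init__(self) -> None:
        self._types: dict[str, DataType] = {}

    def register(self, dtype: DataType) -> None:
        self._types[dtype.identifier] = dtype

    def resolve(self, identifier: str) -> DataType:
        # Raises KeyError for identifiers that were never registered.
        return self._types[identifier]

    def validate(self, identifier: str, value: object) -> bool:
        """Check a value against its registered definition, no human in the loop."""
        dtype = self.resolve(identifier)
        if not isinstance(value, dtype.base):
            return False
        if dtype.minimum is not None and value < dtype.minimum:
            return False
        return True


registry = DataTypeRegistry()
registry.register(DataType("example/temperature-celsius", float,
                           unit="Celsius", minimum=-273.15))
print(registry.validate("example/temperature-celsius", 21.5))    # True
print(registry.validate("example/temperature-celsius", -300.0))  # False
print(registry.validate("example/temperature-celsius", "warm"))  # False
```

The same resolve-then-act pattern would apply to the metadata schema, category and vocabulary registries; mapping relations between registered entries are exactly what this sketch leaves open.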
What to do today
• 4 use cases (max. 10 min) with the following goals:
  - understand whether we get what we want to get (common components/services)
  - discuss whether we need to adapt the template
  Zhu, Dieter, Sean, Giuseppe, Ed
• discuss how to move on with use cases & analysis
• discuss my first look at components/services (C/S) (?)
• update of existing use cases and their appearance on the wiki (deadline)
• deadline for first round (when, whom to motivate, ?)
• virtual meeting for a discussion on the analysis (when?)
• at P6 (September): a first document with analysis
Did we forget something?
Data Practices I – Survey
~120 interviews/interactions, 2 workshops with leading scientists (EU, US)
• too much done manually or via ad hoc scripts
• too much in legacy formats (no PID & MD)
• there are lighthouse projects etc., but ...
• DM and DP are not efficient and too expensive (a biologist spends 75% of his time as a data manager)
• federating data incl. logical information is much too expensive
• hardly any use of automated workflows, and a lack of reproducibility
• is DI research only available to power institutes?
• pressure towards DI research is high, but only some departments are fit for the challenges
• senior researchers: "we can't continue like this!"
• the need to move towards proper data organization and automated workflows is evident
• but changes now are risky: lack of trained experts, guidelines and support