emerging domain agnostic functionalities on the handle-centered networks

Emerging domain agnostic functionalities on the handle-centered networks. Kei Kurakawa (National Institute of Informatics), Takayuki Sekiya (The University of Tokyo), Yasumasa Baba (The Institute of Statistical Mathematics). International Workshop on Sharing, Citation and Publication of Scientific Data across Disciplines, Joint Support-Center for Data science Research (DS), ROIS, NIPR / NINJAL, Tachikawa, Tokyo, Japan, 5-7 December 2017.

Upload: national-institute-of-informatics

Post on 28-Jan-2018


TRANSCRIPT

Page 1: Emerging domain agnostic functionalities on the handle-centered networks

Emerging domain agnostic functionalities

on the handle-centered networks

Kei Kurakawa

National Institute of Informatics

Takayuki Sekiya

The University of Tokyo

Yasumasa Baba

The Institute of Statistical Mathematics

International Workshop on Sharing, Citation and Publication of Scientific Data across Disciplines

Joint Support-Center for Data science Research (DS), ROIS

NIPR / NINJAL, Tachikawa, Tokyo, Japan, 5-7 December 2017.

Page 2: Emerging domain agnostic functionalities on the handle-centered networks

Overview

• Research data sharing

• Domain-independent automatic data processing environment on the PID centric information model for very large collections of distributed scientific data

• Kernel Information

• Handle-centered networks on Kernel Information metadata layer

• Future directions

• Summary

Page 3: Emerging domain agnostic functionalities on the handle-centered networks

The research data sharing mindset arising from Open Access, Open Data, and Open Science

• Disciplinary historical events

– Meteorology and geoscience

    • The first International Polar Year (IPY) (1882)

    • The first International Geophysical Year (IGY) (1957)

– Biology

    • “Bermuda Rules” (1996)

• Interdisciplinary events

– Budapest Open Access Initiative (2002)

– Berlin Declaration (2003)

– G8 Open Data Charter (2013)

• The movement has led to the RDA (Research Data Alliance) slogan “research data sharing without barriers”, which spans all disciplines and aims to develop the social and technical specifications for scientific data infrastructures.

Page 4: Emerging domain agnostic functionalities on the handle-centered networks

Current procedure to aggregate and process the scientific data

• The procedure, which may be peculiar to each discipline, is a matter of craftsmanship and a very time-consuming task.

• The data consumer needs to understand the semantics of the data structure in domain-dependent schemes and usually has to choose a community-standard tool in a specific computational environment to process the data.

• It is difficult for people outside the field to do the same.

[Figure: the current procedure. 1. The data consumer fetches and crawls data on the Web. 2. The data consumer manually processes the data, manually checking for data format, data structure, data version, data provenance, and data quality.]

Page 5: Emerging domain agnostic functionalities on the handle-centered networks

A community objective

• Data on the web

– Very large collections of scientific data, which is distributed on the web

– PID centric information model

• Two major processes in the scientific data use

– Data discovery

– Automatic data processing

• Objective: to establish a domain-independent automatic data processing environment on the PID centric information model for very large collections of distributed scientific data

Page 6: Emerging domain agnostic functionalities on the handle-centered networks

PID centric information model and services

• Information elements

– Handle : PID

– Metadata

– Data

– Data type

• PID resolve service

– Handle server

• Metadata service

– Metadata repository

• Data services

– Data repository

– Data type registry

Working group outputs of the Data Fabric, Data Type Registries, PID Information Types, and PID Kernel Information
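The information elements and services listed on this page can be pictured as a few record types and lookup tables. The following is only a minimal sketch under stated assumptions: the class names, field names, in-memory dictionaries, and the handle prefix 21.T0000 are all hypothetical and are not taken from the working group outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataType:
    handle: str    # PID of the type definition (hypothetical)
    name: str


@dataclass(frozen=True)
class DataObject:
    handle: str    # PID of the data
    location: str  # where the data can be fetched


@dataclass(frozen=True)
class Metadata:
    handle: str       # PID of the metadata record itself
    data_handle: str  # PID of the described data
    type_handle: str  # PID of the data type of that data


# Hypothetical in-memory stand-ins for the three kinds of services.
data_type_registry = {
    "21.T0000/type-csv": DataType("21.T0000/type-csv", "CSV table"),
}
data_repository = {
    "21.T0000/data-1": DataObject("21.T0000/data-1", "https://example.org/data-1.csv"),
}
metadata_repository = {
    "21.T0000/md-1": Metadata("21.T0000/md-1", "21.T0000/data-1", "21.T0000/type-csv"),
}


def describe(metadata_handle: str) -> str:
    """Follow one metadata record to its data and its data type."""
    md = metadata_repository[metadata_handle]
    data = data_repository[md.data_handle]
    dtype = data_type_registry[md.type_handle]
    return f"{data.handle} is a {dtype.name} at {data.location}"
```

Every element is addressed by a handle, so a client only needs handles to move between metadata, data, and data type.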

Page 7: Emerging domain agnostic functionalities on the handle-centered networks

Kernel Information : Metadata

[Figure: a web space in which the Data, the Data type for the “Data”, and each KI:Metadata record have their own Handle:PID. Link labels: digitalObjectLocation, digitalObjectType.]

Kernel Information represents a connection between Data and Data type.

Page 8: Emerging domain agnostic functionalities on the handle-centered networks

Kernel Information : Metadata

[Figure: a web space in which the Data, the Data type, each KI:Metadata record, and the Data type for the “KI:Metadata” have their own Handle:PID. Link labels: digitalObjectLocation, digitalObjectType, RDAKIProfileType.]

KI:Metadata itself should also be data-typed.

Page 9: Emerging domain agnostic functionalities on the handle-centered networks

Kernel Information : Metadata

[Figure: a web space in which the Data, the Data type, and each KI:Metadata record have their own Handle:PID. Link labels: digitalObjectLocation, digitalObjectType, RDAKIProfileType, wasDerivedFrom.]

KI represents structural relationships.

Page 10: Emerging domain agnostic functionalities on the handle-centered networks

Kernel Information structural data relationships defined

• wasDerivedFrom

• specializationOf

• revisionOf

• primarySourceOf

• quotationOf

• alternateOf

• hadMember

• externalW3CPROVDoc
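The relationship names above can be read as optional fields of a KI record. The sketch below assumes a flat key-value record layout and uses made-up handles (prefix 21.T0000); both are assumptions for illustration, not the WG's profile definition.

```python
# Relationship names as listed on this page.
KI_RELATIONS = (
    "wasDerivedFrom", "specializationOf", "revisionOf", "primarySourceOf",
    "quotationOf", "alternateOf", "hadMember", "externalW3CPROVDoc",
)

# Hypothetical KI record: a flat mapping from field name to value.
record = {
    "digitalObjectLocation": "https://example.org/data-2.csv",
    "digitalObjectType": "21.T0000/type-csv",
    "wasDerivedFrom": "21.T0000/data-1",
    "revisionOf": "21.T0000/data-1",
}


def relations_of(ki_record: dict) -> list:
    """Return (relation, target) pairs for the relationships present in a record."""
    return [(name, ki_record[name]) for name in KI_RELATIONS if name in ki_record]
```

Because the relation names are fixed and domain independent, a client can extract them without knowing anything about the discipline the data comes from.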

Page 11: Emerging domain agnostic functionalities on the handle-centered networks

PID profile metadata 17.03.06 from the WG

Page 12: Emerging domain agnostic functionalities on the handle-centered networks

PID centric information sequence

Participants: Client, Handle server, Data type registry, Data repository, Metadata repository or Landing page

Information elements: Handle:PID, KI:Metadata, Metadata, Data type, Data

1. Query with Handle (Client to Handle server)

2. Handle information (e.g., PID to Profile, URL to target ROR, Data field for PID Kernel Information)

3. Query with Handle for DTR profile (Client to Data type registry)

4. DTR profile definition

5. HTTP GET Resources (data, metadata, landing page)

6. Resources
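The six steps of the sequence can be traced in a few lines of client code. This is a sketch only: the three dictionaries stand in for a Handle server, a data type registry (DTR), and a web repository, and their keys, field names, and handles are hypothetical. A real client would issue network requests instead of dictionary lookups.

```python
# Hypothetical in-memory stand-ins for the remote services.
HANDLE_SERVER = {
    "21.T0000/data-1": {
        "profile": "21.T0000/ki-profile",         # PID of the DTR profile
        "url": "https://example.org/data-1.csv",  # location of the resource
    },
}
DATA_TYPE_REGISTRY = {
    "21.T0000/ki-profile": {"name": "example KI profile", "fields": ["profile", "url"]},
}
WEB = {"https://example.org/data-1.csv": b"a,b\n1,2\n"}


def resolve(handle: str) -> dict:
    """Walk the sequence for one handle and return everything collected."""
    info = HANDLE_SERVER[handle]                   # steps 1-2: query, Handle information
    profile = DATA_TYPE_REGISTRY[info["profile"]]  # steps 3-4: DTR profile definition
    resource = WEB[info["url"]]                    # steps 5-6: stands in for HTTP GET
    return {"info": info, "profile": profile, "resource": resource}
```

The client starts from nothing but a handle; every later lookup is driven by what the previous response contained.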

Page 13: Emerging domain agnostic functionalities on the handle-centered networks

Data processing paradigm shift

Current manual method (data on the Web, data consumer)

1. Fetch and crawl data

2. Manually process the data, manually checking for: data format, data structure, data version, data provenance, data quality

Future automatic method (data on PID centric information architecture, client program)

1. Fetch the list of PIDs

2. Query/response for PID KI profile (Handle service)

3. Query/response for data type profile (Data type registry)

4. Fetch the data

5. Automatically process the data
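The five steps of the future automatic method amount to a loop in which the data type, not the human, selects the processing routine. The sketch below assumes hypothetical in-memory stand-ins for the Handle service, the data type registry, and the data storage; the handles, the two example types, and the parser functions are invented for illustration.

```python
import csv
import io
import json

# Hypothetical stand-ins for the Handle service and the data storage.
KI_PROFILES = {
    "21.T0000/d1": {"digitalObjectType": "21.T0000/type-csv", "digitalObjectLocation": "loc-1"},
    "21.T0000/d2": {"digitalObjectType": "21.T0000/type-json", "digitalObjectLocation": "loc-2"},
}
STORAGE = {"loc-1": "x,y\n1,2\n3,4\n", "loc-2": '[{"x": 5, "y": 6}]'}


def parse_csv(text):
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text):
    return json.loads(text)


# Stand-in for the data type registry: a type PID selects a processor.
PROCESSORS = {"21.T0000/type-csv": parse_csv, "21.T0000/type-json": parse_json}


def process_all(pids):
    results = {}
    for pid in pids:                                     # 1. list of PIDs
        ki = KI_PROFILES[pid]                            # 2. PID KI profile
        processor = PROCESSORS[ki["digitalObjectType"]]  # 3. data type profile
        raw = STORAGE[ki["digitalObjectLocation"]]       # 4. fetch the data
        results[pid] = processor(raw)                    # 5. process automatically
    return results
```

Adding support for a new data type means registering one more processor; the loop itself does not change.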

Page 14: Emerging domain agnostic functionalities on the handle-centered networks

Metadata splitting

Metadata for data is split by generality level (separation of concerns):

• Domain independent: Kernel Information, used for automatic data processing

• Domain dependent: metadata for data discovery, used for data discovery

Page 15: Emerging domain agnostic functionalities on the handle-centered networks

Handle-centered networks on Kernel Information metadata layer

[Figure: an attribute augmented graph drawn across three layers: the data layer, the data type layer, and the Kernel Information metadata layer.]
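One possible reading of "attribute augmented graph" is a graph whose nodes are handles carrying KI attributes and whose edges are KI relationships. The sketch below follows that reading; the record layout, the handles, and the choice of three edge-forming fields are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical KI records keyed by handle.
KI_RECORDS = {
    "21.T0000/raw": {"digitalObjectType": "21.T0000/type-csv"},
    "21.T0000/clean": {"digitalObjectType": "21.T0000/type-csv",
                       "wasDerivedFrom": "21.T0000/raw"},
    "21.T0000/plot": {"digitalObjectType": "21.T0000/type-png",
                      "wasDerivedFrom": "21.T0000/clean"},
}
# Subset of KI relationships treated as edges in this sketch.
EDGE_FIELDS = ("wasDerivedFrom", "revisionOf", "hadMember")


def build_graph(records):
    """Split each record into node attributes and labeled outgoing edges."""
    nodes = {}
    edges = defaultdict(list)
    for handle, rec in records.items():
        nodes[handle] = {k: v for k, v in rec.items() if k not in EDGE_FIELDS}
        for field in EDGE_FIELDS:
            if field in rec:
                edges[handle].append((field, rec[field]))
    return nodes, dict(edges)


def group_by_type(nodes):
    """Group handles by the data type attribute on each node."""
    groups = defaultdict(list)
    for handle, attrs in nodes.items():
        groups[attrs["digitalObjectType"]].append(handle)
    return dict(groups)
```

Grouping by digitalObjectType connects the Kernel Information metadata layer to the data type layer, while the edges capture relationships between the described data.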

Page 16: Emerging domain agnostic functionalities on the handle-centered networks

Future directions

• Domain agnostic functionalities

• Data science approach

– Analysis of data

– Classification of data

– Recommendations of data

– Prediction of data

• On Kernel Information metadata layer,

– Trustworthiness and traceability analysis before downloading the data
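As an illustration of what a traceability analysis before download might look like, the sketch below follows wasDerivedFrom links through KI records and reports whether the whole chain can be resolved. The records, the handles, and the completeness criterion are hypothetical; the slides do not specify an algorithm.

```python
# Hypothetical KI records keyed by handle.
RECORDS = {
    "21.T0000/raw": {},
    "21.T0000/clean": {"wasDerivedFrom": "21.T0000/raw"},
    "21.T0000/orphan": {"wasDerivedFrom": "21.T0000/missing"},
}


def trace(handle, records):
    """Follow wasDerivedFrom links; return (chain, complete).

    complete is False if a link points to an unknown handle or forms a cycle.
    """
    chain, seen = [], set()
    current = handle
    while current is not None:
        if current in seen or current not in records:
            return chain, False
        seen.add(current)
        chain.append(current)
        current = records[current].get("wasDerivedFrom")
    return chain, True
```

Only the KI records are read here, so the check can run before any data is fetched.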

Page 17: Emerging domain agnostic functionalities on the handle-centered networks

Summary

• The objective is

– A domain-independent automatic data processing environment on the PID centric information model for very large collections of distributed scientific data

• We introduced

– Kernel Information as an RDA working group output

    • Data

    • Data type

    • Structural relation between data

• We viewed

– Handle-centered networks on the Kernel Information metadata layer

• Domain agnostic functionalities are emerging from

– Graph based reasoning on the framework.

Page 18: Emerging domain agnostic functionalities on the handle-centered networks

Acknowledgement

• This work is supported by the open collaborative research at National Institute of Informatics (NII) Japan (FY2017).

• The authors are thankful to all RDA Kernel Information WG members for their great discussions in remote and in-person meetings.