A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud

Ali Hasnain et al.
Insight Center for Data Analytics
National University of Ireland, Galway

Upload: syed-muhammad-ali-hasnain

Posted on 16-Apr-2017


Agenda

• Motivation

• Linked Life Sciences Roadmap

• Cataloguing and Linking

• Extending Catalogue – Metadata & Provenance

• Query Engine

• Results

Motivation

• Biomedical data is heterogeneous and spread across multiple sources (SPARQL endpoints).

• Navigation is a challenge.

• The sources contain trillions of triples and are represented with insufficient vocabulary reuse.

• Biologists sometimes want more information about the data, including its source, creator and publisher, as well as statistics on its size (Metadata & Provenance).

How to deal with heterogeneous data?

DrugBank

DailyMed

ChEBI, KEGG

Reactome

SIDER

BioPAX

Medicare

We want to query the content, not the source

Proteins

Molecules

Genes

Diseases

A Linked Life Sciences Roadmap

Concepts: Proteins (:Protein), Molecules (:Molecule), Genes (:Gene), Diseases (:Disease)

Datasets shown in the figure: UniProt, PDB, Pfam, PROSITE, ProDom, UniRef, UniParc, DailyMed, DrugBank, ChEMBL, PubChem, KEGG, Gene Ontology, GeneID, Affymetrix, HomoloGene, MGI, Diseasome, SIDER

2- Possible Solutions

• To assemble queries over multiple graphs at multiple endpoints, either:

• vocabularies and ontologies are reused (“a priori integration”), or

• translation maps between different terminologies are created (“a posteriori integration”).
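
A translation map of the kind mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class IRIs are placeholders under example.org, and the helper names `sources_for` and `union_pattern` are hypothetical, not part of the roadmap or its catalogue.

```python
# Hypothetical translation map: source-specific class IRIs -> shared concept.
# Every IRI here is an illustrative placeholder, not a term from a real dataset.
TRANSLATION_MAP = {
    "http://example.org/drugbank/Drug": ":Molecule",
    "http://example.org/kegg/Compound": ":Molecule",
    "http://example.org/uniprot/Protein": ":Protein",
}


def sources_for(concept, mapping=TRANSLATION_MAP):
    """Return every source-specific class IRI that is mapped to `concept`."""
    return sorted(iri for iri, target in mapping.items() if target == concept)


def union_pattern(concept, var="?x", mapping=TRANSLATION_MAP):
    """Build a SPARQL UNION pattern that matches instances of all mapped classes."""
    branches = [f"{{ {var} a <{iri}> }}" for iri in sources_for(concept, mapping)]
    return " UNION ".join(branches)
```

Calling `union_pattern(":Molecule")` yields one graph pattern per mapped class, so a query written against the shared concept can be expanded into the terminology of each source.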

A priori vs. a posteriori Integration

Cataloguing and Linking

Describing Datasets: an Extract from the Catalogue

Extending Catalogue – Metadata & Provenance
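
The metadata and provenance named in the motivation (source, creator, publisher, size) could be attached to a catalogue entry roughly as follows. This is a minimal sketch, not the catalogue's actual schema: the `DatasetRecord` class, its field names and all values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One hypothetical catalogue entry with metadata and provenance."""
    name: str
    endpoint: str   # SPARQL endpoint URL (placeholder below)
    source: str     # where the data came from
    creator: str
    publisher: str
    triples: int    # size statistic


def describe(record: DatasetRecord) -> str:
    """Render the provenance of one entry as a single readable line."""
    return (
        f"{record.name}: {record.triples} triples, created by {record.creator}, "
        f"published by {record.publisher}, source {record.source}"
    )


# Placeholder values only; none of these are real catalogue data.
example = DatasetRecord(
    name="ExampleDataset",
    endpoint="http://example.org/sparql",
    source="http://example.org/dump",
    creator="Example Creator",
    publisher="Example Publisher",
    triples=1000,
)
```

`describe(example)` then gives a biologist the provenance of a dataset at a glance, next to the endpoint that serves it.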

Query Engine

http://srvgal86.deri.ie:8000/graph/Granatum
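
One common way to query several SPARQL endpoints at once is the SERVICE keyword of SPARQL 1.1 Federated Query. The sketch below only builds such a query as a string; it is not a description of how the query engine on this slide works, and the function name `federated_query` and the example.org endpoints are hypothetical.

```python
def federated_query(patterns_by_endpoint, select="*"):
    """Combine per-endpoint graph patterns into one SPARQL 1.1 query.

    Each endpoint gets its own SERVICE block; the blocks are joined by UNION,
    so results from any endpoint are returned.
    """
    blocks = [
        f"  {{ SERVICE <{endpoint}> {{ {pattern} }} }}"
        for endpoint, pattern in patterns_by_endpoint.items()
    ]
    body = "\n  UNION\n".join(blocks)
    return f"SELECT {select} WHERE {{\n{body}\n}}"


# Placeholder endpoints and patterns, for illustration only.
query = federated_query({
    "http://example.org/a/sparql": "?x a ?type .",
    "http://example.org/b/sparql": "?x a ?type .",
})
```

The resulting string can be sent to any endpoint that supports federated queries; which endpoints to include would come from the catalogue.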

Visual & Graphical View

SPARQL Endpoints returning results per query

Runtimes taken by different queries (Max, Min, Average, Median)
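
The four statistics named above can be computed from a list of measured runtimes with Python's standard library. The sketch assumes runtimes are plain numbers in seconds; the function name `summarize` is hypothetical and the slides' actual measurements are not reproduced here.

```python
import statistics


def summarize(runtimes):
    """Return max, min, average and median for one query's runtimes."""
    return {
        "max": max(runtimes),
        "min": min(runtimes),
        "average": statistics.mean(runtimes),
        "median": statistics.median(runtimes),
    }
```

Reporting the median next to the average helps when a few slow runs, for example against a busy endpoint, would otherwise dominate the picture.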
