2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Semantic Approaches for Biochemical Knowledge Discovery. Michel Dumontier, Ph.D., Associate Professor of Medicine (Biomedical Informatics), Stanford University. @micheldumontier::ACS:15-03-2016

Upload: michel-dumontier

Post on 12-Apr-2017


TRANSCRIPT

Page 1: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Semantic Approaches for Biochemical Knowledge Discovery

Michel Dumontier, Ph.D.

Associate Professor of Medicine (Biomedical Informatics), Stanford University

@micheldumontier::ACS:15-03-2016

Page 2: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Science!

Page 3: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

“Most published research findings are false.” - John Ioannidis, Stanford University

Page 4: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Science is hard.

Page 5: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Scientific knowledge is growing at an unprecedented rate

Page 6: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Reusing raw and curated data in thousands of databases is challenging: identifiers, formats, access methods, links

Page 7: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Various software tools are needed to analyze data (problems: OS, versioning, input/output formats)

Page 8: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Ultimately, scientists develop fairly sophisticated programs/workflows to test hypotheses

Page 9: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

The absence of intelligent systems

requires vast amounts of experience and technical expertise

Page 10: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

How can we automatically find the evidence that supports or disputes a scientific hypothesis using the latest data, tools and scientific knowledge?

Page 11: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

So what do we need to achieve this?

1. Data Science Tools and Methods
– To identify, represent, interlink, integrate, and query data and services
– To identify and uncover support for known or novel associations

2. Community Standards to share and interrogate a massive, decentralized network of interconnected data and software

Page 12: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

First, we need FAIR data

Findable
– Globally unique identifiers for datasets and the data they contain
– Rich set of descriptors to search and filter with
– Indexed and searchable

Accessible
– Metadata is eternally available.
– Identifiers are used to retrieve representations using standard protocols (e.g. HTTP)

Interoperable
– Data represented with formal knowledge representations
– Include links to other datasets/vocabularies

Reusable
– Licensing, Provenance, Community standards
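As a rough illustration of the four FAIR properties listed above, the following Python sketch checks a metadata record for the presence of each ingredient. The `DatasetRecord` class, its field names, and the example.org URLs in the usage note are all hypothetical and introduced only for this example; the result is a coarse presence check, not a FAIR quality measure.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: str = ""      # globally unique identifier, e.g. an HTTP URI
    descriptors: dict = field(default_factory=dict)  # title, keywords, ...
    access_url: str = ""      # where a representation can be retrieved
    data_format: str = ""     # formal representation, e.g. "RDF"
    links: list = field(default_factory=list)  # links to other datasets/vocabularies
    license: str = ""
    provenance: str = ""


def fair_report(record: DatasetRecord) -> dict:
    """Return a True/False flag per FAIR property based on which fields are filled in."""
    return {
        # Findable: an identifier plus descriptors to search and filter with
        "findable": bool(record.identifier) and bool(record.descriptors),
        # Accessible: retrievable over a standard protocol (HTTP or HTTPS here)
        "accessible": record.access_url.startswith(("http://", "https://")),
        # Interoperable: a stated formal representation and outgoing links
        "interoperable": bool(record.data_format) and bool(record.links),
        # Reusable: license and provenance are both stated
        "reusable": bool(record.license) and bool(record.provenance),
    }
```

For example, `fair_report(DatasetRecord(identifier="http://example.org/dataset/1"))` flags every property as unmet, because an identifier alone satisfies none of the four checks.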

“Numbers have no way of speaking for themselves. We need to imbue them with meaning.” - Nate Silver, The signal and the noise

Page 13: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

FAIR: Findable, Accessible, Interoperable, Re-usable

See paper for motivation and examples

We are now starting to think about quality measures.

Page 14: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

The Semantic Web is the new global web of knowledge

standards for publishing, sharing and querying facts, expert knowledge and services

scalable approach for the discovery of independently formulated and distributed knowledge

Page 15: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Linked Data is FAIR data

Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/

Page 16: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Linked Data for the Life Sciences

Bio2RDF is an open source project to unify the representation and interlinking of biological data using RDF.

chemicals/drugs/formulations, genomes/genes/proteins, domains
Interactions, complexes & pathways
animal models and phenotypes
Disease, genetic markers, treatments
Terminologies & publications

• 11B+ interlinked statements from 35 biomedical datasets
• dataset description, provenance & statistics
• A growing interoperable ecosystem with the EBI, NCBI, DBCLS, NCBO, OpenPHACTS, and commercial tool providers
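To make the idea of interlinked RDF statements concrete, here is a minimal Python sketch that builds identifiers in the `http://bio2rdf.org/namespace:identifier` style and serializes statements as N-Triples lines. The helper names `bio2rdf_uri` and `to_ntriple` are invented for this example, and the sketch is not how Bio2RDF itself generates its data.

```python
BIO2RDF_BASE = "http://bio2rdf.org/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Build a URI of the form http://bio2rdf.org/<namespace>:<identifier>."""
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


def to_ntriple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Serialize one statement as an N-Triples line.

    `obj` is treated as a URI unless `literal` is True, in which case it is
    written as a plain string literal with backslashes and quotes escaped.
    """
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."
```

A label statement for a drug record could then be written as `to_ntriple(bio2rdf_uri("drugbank", "DB00001"), RDFS_LABEL, "example label", literal=True)`, where the label text is a placeholder.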

Page 17: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Page 18: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Bio2RDF shows how datasets are connected together

Page 19: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

graph methods for data quality to find mismatches and discover new links

W Hu, H Qiu, M Dumontier. Link Analysis of Life Science Linked Data. International Semantic Web Conference (2) 2015: 446-462.
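One simple way to picture graph-based link checking is sketched below in Python. It is not the method of the cited paper. It assumes cross-references are given as `(source, target)` pairs of `namespace:identifier` strings, counts how many links connect each pair of datasets, and reports links that lack a reciprocal statement as candidates for review. Whether a missing reciprocal link is a mismatch or a new link to add depends on the kind of relation, which this sketch does not model.

```python
from collections import Counter


def namespace(curie: str) -> str:
    """Return the dataset prefix of a 'namespace:identifier' string."""
    return curie.split(":", 1)[0]


def dataset_link_counts(links) -> Counter:
    """Count links per (source dataset, target dataset) pair."""
    return Counter((namespace(s), namespace(t)) for s, t in links)


def missing_reciprocal_links(links) -> list:
    """List (target, source) pairs that have no reciprocal link in the input."""
    link_set = set(links)
    return sorted((t, s) for s, t in link_set if (t, s) not in link_set)
```

With made-up identifiers, the input `[("drugbank:D1", "pharmgkb:P1"), ("pharmgkb:P1", "drugbank:D1"), ("drugbank:D2", "pharmgkb:P2")]` yields two drugbank-to-pharmgkb links, one in the opposite direction, and a single candidate, `("pharmgkb:P2", "drugbank:D2")`.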

Page 20: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Federated Queries over public SPARQL endpoints

Get all protein catabolic processes (and more specific GO terms) in biomodels

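PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?go ?label (COUNT(DISTINCT ?x) AS ?count)
WHERE {
  # GO terms that descend from a term whose label starts with "protein catabolic process"
  SERVICE <http://bioportal.bio2rdf.org/sparql> {
    ?go rdfs:label ?label .
    ?go rdfs:subClassOf+ ?tgo .
    ?tgo rdfs:label ?tlabel .
    FILTER regex(?tlabel, "^protein catabolic process")
  }
  # BioModels biochemical reactions annotated with those GO terms
  SERVICE <http://biomodels.bio2rdf.org/sparql> {
    ?x <http://bio2rdf.org/biopax_vocabulary:identical-to> ?go .
    ?x a <http://www.biopax.org/release/biopax-level3.owl#BiochemicalReaction> .
  }
}
GROUP BY ?go ?label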
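A query like the one above is typically sent to an endpoint over HTTP using the SPARQL protocol. The Python sketch below, using only the standard library, shows one way to prepare such a request and to flatten a response in the SPARQL JSON results format. The function names are invented for this example, and nothing here contacts an endpoint.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the SPARQL JSON results format
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(results_json: str) -> list:
    """Turn a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        # Variables that are unbound in a given solution are simply omitted
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]
```

The prepared request can be passed to `urllib.request.urlopen`, and the decoded response body to `rows_from_results`; whether a given public endpoint is reachable and supports federated `SERVICE` clauses has to be checked separately.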

Page 21: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

EbolaKB: Using Linked Data and Software

Kamdar, Dumontier. An Ebola virus-centered knowledge base. Database. 2015 Jun 8;2015. doi: 10.1093/database/bav049.

Page 22: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Network analysis and discovery

Jim McCusker & Deb McGuinness

David Wild, Ying Ding

Page 23: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

HyQue

Page 24: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

tactical formalization

Take what you need and represent it in a way that directly serves your objective

STANDARDS for broader reuse

APPLICATIONS for optimized experience

Page 25: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

High Quality Metadata are Essential

for Large-Scale Reuse and Biomedical Discovery

Page 26: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Making it Easier, Possibly Even Pleasant, to Author Interoperable Experimental Metadata

Page 27: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

metadatacenter.org

NIH COMMONS

Page 28: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

smartAPI

The goal is to reduce the barrier for the discovery and reuse of web APIs through richer semantic metadata.

i) a coordinated facility for the intelligent annotation of smart APIs

ii) a web application to discover smart APIs and how they connect to each other.

iii) The augmentation of existing APIs to provide FAIR data
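The idea of discovering how annotated APIs connect can be sketched in a few lines of Python. The annotation layout used here (a dict with `name`, `inputs` and `outputs` lists of semantic type labels) is a hypothetical stand-in, not the smartAPI metadata format; the sketch only shows that once inputs and outputs carry shared type labels, candidate connections fall out of a simple match.

```python
def find_connections(apis: list) -> list:
    """Return (producer, consumer, shared_type) triples.

    Two APIs are considered connectable when a type that one of them
    outputs is a type that the other accepts as input.
    """
    connections = []
    for producer in apis:
        for consumer in apis:
            if producer is consumer:
                continue
            shared = set(producer["outputs"]) & set(consumer["inputs"])
            for semantic_type in sorted(shared):
                connections.append((producer["name"], consumer["name"], semantic_type))
    return connections


# Hypothetical annotations, made up for illustration
EXAMPLE_APIS = [
    {"name": "gene-api", "inputs": ["gene symbol"], "outputs": ["gene id"]},
    {"name": "variant-api", "inputs": ["gene id"], "outputs": ["variant id"]},
]
```

Calling `find_connections(EXAMPLE_APIS)` links the output of the first API to the input of the second through the shared `gene id` type.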

Page 29: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

smartAPI

Gene

myGene.info
myVariant.info

Linking API Data

Web Services

Linked Data Cloud

Page 30: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

Evan’s Questions

• What should we be doing now?
– Encouraging researchers to publish FAIR data and services
• How should we be doing it?
– As Linked Data
– In institutional repositories, and available in Wikidata and other aggregators
• Where are things going in the future?
– Reproducible analyses over indexed, archived, and massively connected knowledge graphs

Page 31: 2016 ACS Semantic Approaches for Biochemical Knowledge Discovery

[email protected]

Website: http://dumontierlab.com
Presentations: http://slideshare.com/micheldumontier
