the scientific method on the semantic web

SADI, SHARE and the Scientific Method: The Quest for the Holy Grail

Upload: mark-wilkinson

Post on 16-Dec-2014

DESCRIPTION

Presentation to the iCAPTURE Center, Heart + Lung Institute at St. Paul's Hospital

TRANSCRIPT

Page 1: The Scientific Method on the Semantic Web

SADI, SHARE and the Scientific Method

The Quest for the Holy Grail

Page 2: The Scientific Method on the Semantic Web

The Problem

Page 3: The Scientific Method on the Semantic Web

The Problem

Page 4: The Scientific Method on the Semantic Web

The Holy Grail: (this slide created circa 2002)

Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.

Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

Page 5: The Scientific Method on the Semantic Web

Two novel technologies

developed in our lab

are getting us very close to the Holy Grail!

Page 6: The Scientific Method on the Semantic Web

Holy Grail Demo #1

Page 7: The Scientific Method on the Semantic Web

Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis

Page 8: The Scientific Method on the Semantic Web

How do we query that database?

Page 9: The Scientific Method on the Semantic Web

A Brief Digression…

Page 10: The Scientific Method on the Semantic Web

“Database”

Page 11: The Scientific Method on the Semantic Web
Page 12: The Scientific Method on the Semantic Web
Page 13: The Scientific Method on the Semantic Web
Page 14: The Scientific Method on the Semantic Web

?

Page 15: The Scientific Method on the Semantic Web

Boxes became ovals…

Straight lines became curvy lines…

Page 16: The Scientific Method on the Semantic Web

Boxes became ovals…

Straight lines became curvy lines…

…and you want us to give you a grant for THAT??

Page 17: The Scientific Method on the Semantic Web

Relational Database

“Graph”

Page 18: The Scientific Method on the Semantic Web

Protein Table
-----------------------
Protein Index | Protein Name | Regulates ID

Gene Table
-----------------------
Gene ID | Tissue ID | Type ID

http://pdb.org/114487 --isRepressorOf--> http://ncbi.nlm/NR/NR_14487

Page 19: The Scientific Method on the Semantic Web

Protein Table
-----------------------
Protein Index | Protein Name | Regulates ID

Gene Table
-----------------------
Gene ID | Tissue ID | Type ID

“Foreign keys” are used to link tables in a database

http://pdb.org/114487 --isRepressorOf--> http://ncbi.nlm/NR/NR_14487

Page 20: The Scientific Method on the Semantic Web

Links in Graphs consist of statements called

“TRIPLES”

Protein Table
-----------------------
Protein Index | Protein Name | Regulates ID

Gene Table
-----------------------
Gene ID | Tissue ID | Type ID

http://pdb.org/114487 --isRepressorOf--> http://ncbi.nlm/NR/NR_14487
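As a rough illustration of what a triple is, the statement on this slide can be written as a (subject, predicate, object) tuple. This is only a sketch: the `Triple` type, the `describe` helper, and the second `hasLabel` statement are made up for the example, and a real graph would use a full URI for the predicate rather than the bare label `isRepressorOf`.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in a graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# The single statement drawn on the slide. The predicate is written as a bare
# label here; a real graph would use a full predicate URI (assumption).
statement = Triple(
    subject="http://pdb.org/114487",
    predicate="isRepressorOf",
    obj="http://ncbi.nlm/NR/NR_14487",
)

# A graph is just a set of statements, so graphs from different sources
# can be merged without agreeing on a shared table schema.
protein_graph = {statement}
# Hypothetical second statement, invented purely for illustration.
gene_graph = {Triple("http://ncbi.nlm/NR/NR_14487", "hasLabel", "example gene")}
merged = protein_graph | gene_graph


def describe(graph, node):
    """Return every (predicate, object) pair stated about `node`."""
    return sorted((t.predicate, t.obj) for t in graph if t.subject == node)
```

Because the link itself carries a name, anything that reads the graph can see what the connection means, which is what a foreign key alone does not tell you.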

Page 21: The Scientific Method on the Semantic Web

Protein Table
-----------------------
Protein Index | Protein Name | Regulates ID

Gene Table
-----------------------
Gene ID | Tissue ID | Type ID

Both Data Sources are on the Same Machine

http://pdb.org/114487 --isRepressorOf--> http://ncbi.nlm/NR/NR_14487

Page 22: The Scientific Method on the Semantic Web

Graph Data Sources (may be) on Independent Machines on the Web

Protein Table
-----------------------
Protein Index | Protein Name | Regulates ID

Gene Table
-----------------------
Gene ID | Tissue ID | Type ID

http://pdb.org/114487 --isRepressorOf--> http://ncbi.nlm/NR/NR_14487

Page 23: The Scientific Method on the Semantic Web

Protein Table
-----------------------
Protein Index | Protein Name | Regulates ID

Gene Table
-----------------------
Gene ID | Tissue ID | Type ID

“Meaning” of the connection between data-points is understood only by the database administrator

http://pdb.org/114487 --isRepressorOf--> http://ncbi.nlm/NR/NR_14487

Protein regulates Gene

Page 24: The Scientific Method on the Semantic Web

“Meaning” of the connection in a Graph is explicitly labeled (and machine-readable!)

Protein Table
-----------------------
Protein Index | Protein Name | Regulates ID

Gene Table
-----------------------
Gene ID | Tissue ID | Type ID

http://pdb.org/114487 --isRepressorOf--> http://ncbi.nlm/NR/NR_14487

Page 25: The Scientific Method on the Semantic Web

Connect all of the graphs in the world to one another

And what do you get?

Page 26: The Scientific Method on the Semantic Web

Mark Butler (2003) Is the semantic web hype? Hewlett Packard laboratories presentation at MMU, 2003-03-12

Page 27: The Scientific Method on the Semantic Web

The lavender portion represents biology – currently ~40,000,000,000 Triples (we and our collaborators will be doubling that number in the next 12 months)

Page 28: The Scientific Method on the Semantic Web

How do you find information on this

“Semantic Web”

??

Page 29: The Scientific Method on the Semantic Web

SPARQL

The query language used to discover and extract information represented in Graphs
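To give a feel for what a SPARQL engine does with a WHERE clause, here is a toy matcher for triple patterns over an in-memory list of triples. It is a sketch only and is not how any real SPARQL engine is built. The prefixed predicate names are borrowed from the query shown later in this talk, and `example:geneA` and `example:pathway1` are placeholder values, not real data.

```python
def is_var(term):
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`.

    Returns the extended binding, or None if the triple does not fit.
    """
    new = dict(binding)
    for p, value in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it agrees with an earlier binding.
            if new.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return new


def query(graph, patterns):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, b)) is not None
        ]
    return bindings


# Placeholder data: the gene and pathway identifiers are invented.
graph = [
    ("uniprot:P47989", "pred:isEncodedBy", "example:geneA"),
    ("example:geneA", "ont:isParticipantIn", "example:pathway1"),
]

results = query(graph, [
    ("uniprot:P47989", "pred:isEncodedBy", "?gene"),
    ("?gene", "ont:isParticipantIn", "?pathway"),
])
```

Note that this matcher only works because every triple is already sitting in one local list; the next slides are about what happens when that is not the case.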

Page 30: The Scientific Method on the Semantic Web

SPARQL

Unfortunately, YOU have to know which Web resources contain which Triples

(HARD!)

Even if you do know this, SPARQL has significant limitations when attempting to query over disparate Graphs (SLOW AND CUMBERSOME)

Page 31: The Scientific Method on the Semantic Web

SPARQL

If the data doesn’t exist in any Graph at all…

Page 32: The Scientific Method on the Semantic Web
Page 33: The Scientific Method on the Semantic Web

Basically…

A novel way of making Triples available on the Semantic Web, using a technology called Web Services

“Services” for short

Page 34: The Scientific Method on the Semantic Web

Basically…

We invented SADI to overcome some/all of these problems

…but I won’t bore you with the technical details…

Page 35: The Scientific Method on the Semantic Web

Detour Ends. Please resume speed.

Page 36: The Scientific Method on the Semantic Web

Imagine there is a “virtual database” containing all of the data from all of the databases, together with the output of every conceivable analysis

Holy Grail Demo #1

How do we query that database?

Page 37: The Scientific Method on the Semantic Web

SHARE: Semantic Health And Research Environment

SPARQL enhanced by SADI

Page 38: The Scientific Method on the Semantic Web

A Novel SPARQL Query Engine

Overcomes some of the limitations of traditional SPARQL query-handlers

Page 39: The Scientific Method on the Semantic Web

A Novel SPARQL Query Engine

Overcomes some of the limitations of traditional SPARQL query-handlers

…and more…

Page 40: The Scientific Method on the Semantic Web

A Novel SPARQL Query Engine

Overcomes some of the limitations of traditional SPARQL query-handlers

…and more…

MUCH more!!

Page 41: The Scientific Method on the Semantic Web

What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}

Page 42: The Scientific Method on the Semantic Web

What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}

Page 43: The Scientific Method on the Semantic Web

What pathways does UniProt protein P47989 belong to?

PREFIX pred: <http://sadiframework.org/ontologies/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX uniprot: <http://lsrn.org/UniProt:>
SELECT ?gene ?pathway
WHERE {
    uniprot:P47989 pred:isEncodedBy ?gene .
    ?gene ont:isParticipantIn ?pathway .
}

Note that there is no “FROM” clause: I have not told the system where to look for the answer; I am simply asking my question

Page 44: The Scientific Method on the Semantic Web

Now stick that query into SHARE

Page 45: The Scientific Method on the Semantic Web
Page 46: The Scientific Method on the Semantic Web
Page 47: The Scientific Method on the Semantic Web

Recap: what we just saw

A standard SPARQL query was entered into SHARE, a SADI-aware query engine

Page 48: The Scientific Method on the Semantic Web

Recap: what we just saw

The query was interpreted to extract the individual data/relationships being requested

(and any component/sub-properties, as we shall see later!)

Page 49: The Scientific Method on the Semantic Web

Recap: what we just saw

The “triple-patterns” required to answer the query are passed to SADI for Web Service discovery

Page 50: The Scientific Method on the Semantic Web

Recap: what we just saw

Services capable of generating those triple-patterns are automatically executed, the triples are stored, and the query is resolved.

Page 51: The Scientific Method on the Semantic Web

Recap: what we just saw

We posed and answered a fairly complex database query

WITHOUT A DATABASE

(in fact, the data didn’t even have to exist...)
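The recap steps can be imitated in a few lines. This is a toy sketch under strong assumptions, not the SHARE or SADI implementation: the "registry" is a plain dictionary from predicate to a local function, the two "services" return invented placeholder values, and every pattern is assumed to have a constant or already-bound subject and a variable object.

```python
def encoded_by_service(subject):
    """Hypothetical service: generates isEncodedBy triples on request."""
    return [(subject, "pred:isEncodedBy", "example:geneA")]


def pathway_service(subject):
    """Hypothetical service: generates isParticipantIn triples on request."""
    return [(subject, "ont:isParticipantIn", "example:pathway1")]


# Stand-in for service discovery: look a service up by the predicate it generates.
REGISTRY = {
    "pred:isEncodedBy": encoded_by_service,
    "ont:isParticipantIn": pathway_service,
}


def resolve(patterns, registry):
    """Resolve a chain of triple patterns by calling services on demand."""
    store = []       # triples generated so far
    bindings = [{}]  # variable assignments consistent with the store
    for subj, pred, obj in patterns:
        service = registry[pred]         # "discovery" by predicate
        next_bindings = []
        for b in bindings:
            subject = b.get(subj, subj)  # substitute a bound variable, if any
            for triple in service(subject):
                store.append(triple)     # the triples are stored...
                next_bindings.append({**b, obj: triple[2]})
        bindings = next_bindings         # ...and the query is resolved
    return bindings, store


answers, generated = resolve(
    [
        ("uniprot:P47989", "pred:isEncodedBy", "?gene"),
        ("?gene", "ont:isParticipantIn", "?pathway"),
    ],
    REGISTRY,
)
```

The store starts empty: every triple used to answer the query is produced only because the query asked for it, which is the sense in which the data "didn't even have to exist".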

Page 52: The Scientific Method on the Semantic Web

Holy Grail Demo #1

Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.

Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

Page 53: The Scientific Method on the Semantic Web

Holy Grail Demo #2

Page 54: The Scientific Method on the Semantic Web

Show me the latest Blood Urea Nitrogen and Creatinine levels of patients who appear to be rejecting their transplants

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
    ?patient rdf:type patient:LikelyRejecter .
    ?patient l:latestBUN ?bun .
    ?patient l:latestCreatinine ?creat .
}

Page 55: The Scientific Method on the Semantic Web

Likely Rejecter:

A patient who has creatinine levels that are increasing over time

- - Wilkinson MD

Page 56: The Scientific Method on the Semantic Web

Likely Rejecter:

…but there is no “likely rejecter” column or table in our database…

only blood chemistry measurements at various time-points

Page 57: The Scientific Method on the Semantic Web

?

Page 58: The Scientific Method on the Semantic Web

The definition of a LikelyRejecter is encoded in a machine-readable document written in the OWL language (“Ontology”)

“the regression line over creatinine measurements should have an increasing slope”

Page 59: The Scientific Method on the Semantic Web

The machine continues to burrow down through the definition and discovers that regression lines have things like slopes and intercepts, etc…

Page 60: The Scientific Method on the Semantic Web

Then…

Two magical events occur…

Page 61: The Scientific Method on the Semantic Web

The machine figures out

by itself

the need to do a Linear Regression analysis

in order to answer your question

Page 62: The Scientific Method on the Semantic Web

The machine figures out

by itself

how and where that analysis can be done

and does it automatically!

Page 63: The Scientific Method on the Semantic Web

http://www.impactlab.net/2009/03/22/improve-your-brain-power/

Page 64: The Scientific Method on the Semantic Web

The SHARE system utilizes SADI to discover analytical services on the Web that do linear regression analysis
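For readers who want to see the analysis itself, here is a minimal sketch of the test implied by the Likely Rejecter definition: fit a least-squares line to creatinine measurements over time and check whether the slope is positive. The function names and the sample values are invented for illustration; they are not patient data and not the service SHARE actually discovers.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs.

    Needs at least two points with different x values.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def is_likely_rejecter(creatinine_series):
    """Apply the slide's definition: the regression line has an increasing slope."""
    return slope(creatinine_series) > 0


# Made-up (time point, creatinine) pairs, for illustration only.
rising = [(0, 1.0), (1, 1.2), (2, 1.4)]
falling = [(0, 1.4), (1, 1.2), (2, 1.0)]
```

With these made-up series, `is_likely_rejecter(rising)` is true and `is_likely_rejecter(falling)` is false. The point of the demo is that nobody wrote this call into the query; the need for it was worked out from the ontology.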

Page 65: The Scientific Method on the Semantic Web

VOILA!

Page 66: The Scientific Method on the Semantic Web

How do we do that?!?

We let the data describe itself!

This is different from most of the bioinformatics world, where the person giving you the data also tells you how to interpret it

Page 67: The Scientific Method on the Semantic Web

Data exhibits “late binding”

Page 68: The Scientific Method on the Semantic Web
Page 69: The Scientific Method on the Semantic Web

Late binding:

“purpose and meaning” of the data is not determined until the moment it is required

Page 70: The Scientific Method on the Semantic Web

Benefit of late binding

Data is amenable to constant re-interpretation

Page 71: The Scientific Method on the Semantic Web

Example?

Blood Creatinine measurements

were not dictated to be (only)

Blood Creatinine measurements!

Page 72: The Scientific Method on the Semantic Web

Example?

The data had the ‘qualities/properties’ that

allowed the machine to infer

that they were Blood Creatinine measurements

Page 73: The Scientific Method on the Semantic Web

Example?

But the data also had the ‘qualities/properties’ that

allowed them to be interpreted as

X/Y coordinate data by another Service
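A loose analogy in code: the same records can be read as creatinine readings by one consumer and as plain X/Y points by another, with each interpretation decided only when it is needed. This is a sketch of the idea, not of the OWL reasoning SHARE performs; the field names and values are invented.

```python
# Invented records: nothing here declares "I am an X/Y point".
measurements = [
    {"patient": "example:p1", "analyte": "creatinine", "day": 0, "value": 1.0},
    {"patient": "example:p1", "analyte": "creatinine", "day": 7, "value": 1.3},
]


def as_creatinine_readings(records):
    """One consumer: keep the records that have the properties of a creatinine reading."""
    return [r for r in records if r.get("analyte") == "creatinine"]


def as_xy_points(records):
    """Another consumer: anything with a day and a value can be an (x, y) point."""
    return [(r["day"], r["value"]) for r in records if "day" in r and "value" in r]


readings = as_creatinine_readings(measurements)
points = as_xy_points(measurements)
```

Neither interpretation was fixed when the data was created, so a new consumer with a new question can add a third reading of the same records later.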

Page 74: The Scientific Method on the Semantic Web

http://www.flickr.com/people/faernworks/

Page 75: The Scientific Method on the Semantic Web

Holy Grail Demo #2

Align the promoters of all serine threonine kinases involved exclusively in the regulation of cell sorting during wound healing in blood vessels.

Retrieve and align 2000nt 5' from every serine/threonine kinase in Mus musculus expressed exclusively in the tunica [I | M |A] whose expression increases 5X or more within 5 hours of wounding but is not activated during the normal development of blood vessels, and is <40% homologous in the active site to kinases known to be involved in cell-cycle regulation in any other species.

Page 76: The Scientific Method on the Semantic Web

The Holy Grail may not yet be in-hand, but we can at least see it from here!

So… now what?

Page 77: The Scientific Method on the Semantic Web

Mark’s Manifesto

What is my next “Holy Grail”?

Page 78: The Scientific Method on the Semantic Web

Science

Support for the in silico Scientific Method

Page 79: The Scientific Method on the Semantic Web

Reproducibility

Clarity (hypothesis)

Discourse

Disagreement

Clarity (experiment)

Page 80: The Scientific Method on the Semantic Web

The Scientific Method

Discourse: What do you believe? What do I believe?

Disagreement: You’re wrong! And I’m gonna prove it!

Clarity: This is the experiment I am going to do

Reproducibility: This is how I did it (“provenance”)

Clarity: This is my new hypothesis

Page 81: The Scientific Method on the Semantic Web

The Scientific Method

Discourse: What do you believe? What do I believe?

Disagreement: You’re wrong! And I’m gonna prove it!

Clarity: This is the experiment I am going to do

Reproducibility: This is how I did it (“provenance”)

Clarity: This is my new hypothesis

Workflows (e.g. myExperiment)

Page 82: The Scientific Method on the Semantic Web

Another Brief Digression…

Page 83: The Scientific Method on the Semantic Web

“Facebook” for Scientists

http://myexperiment.org

Page 84: The Scientific Method on the Semantic Web

An exciting evolution in the way Researchers express and share

their in silico “Materials and Methods”

Through things called ‘Workflows’

Page 85: The Scientific Method on the Semantic Web
Page 86: The Scientific Method on the Semantic Web

Workflows are explicit representations of the method by which an analysis was done

and which resources are used to do it
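In the simplest possible terms, a workflow is an ordered list of named steps, and running it can leave behind a record of what each step produced. The sketch below is only meant to make that idea concrete; it is not Taverna's or myExperiment's format, and the step names and input are invented.

```python
def run_workflow(steps, data):
    """Run named steps in order, recording each step's output as provenance."""
    provenance = []
    for name, func in steps:
        data = func(data)
        provenance.append((name, data))
    return data, provenance


# A toy two-step workflow over a toy input string.
steps = [
    ("clean", lambda s: s.strip().upper()),
    ("length", len),
]

result, provenance = run_workflow(steps, " acgt ")
```

Because the steps are data rather than something buried in a lab notebook, the same list can be shared, re-run, and inspected by someone else.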

Page 87: The Scientific Method on the Semantic Web

Workflows can be very simple…

“Blast this sequence”

Page 88: The Scientific Method on the Semantic Web

Or not...

This workflow takes in a CEL file and a normalisation method, then returns a series of images/graphs which represent the same output obtained using the MADAT software package (MicroArray Data Analysis Tool).

Also returned by this workflow is a list of the top differentially expressed genes (size dependent on the number specified as input, geneNumber), which are then used to find the candidate pathways which may be influencing the observed changes in the microarray data.

Page 89: The Scientific Method on the Semantic Web

Why bother?

Page 90: The Scientific Method on the Semantic Web

A workbench for designing and executing Scientific Workflows

Taverna

Page 91: The Scientific Method on the Semantic Web
Page 92: The Scientific Method on the Semantic Web

Load-up your data and press “play”!

…Then go home for the weekend! You are just one click away from your M.Sc.!!

Page 93: The Scientific Method on the Semantic Web

By the by…

The SHARE application automatically creates a Workflow and then automatically runs it.

This is where the data comes from to answer the queries…

Workflows are a Good Thing™

Page 94: The Scientific Method on the Semantic Web

Detour Ends. Please resume speed.

Page 95: The Scientific Method on the Semantic Web

WORKFLOWS: Reproducibility

Clarity (hypothesis)

Discourse

Disagreement

Clarity (experiment)

Page 96: The Scientific Method on the Semantic Web

Reproducibility

Clarity (hypothesis)

Discourse

Disagreement

Clarity (experiment)

Page 97: The Scientific Method on the Semantic Web

At the moment the Semantic Web in Healthcare and Life Sciences addresses these issues by attempting to create “consensus”

Page 98: The Scientific Method on the Semantic Web

Large, centralized ontologies (e.g. the Gene Ontology) that claim to represent community agreement about “biological reality”

Page 99: The Scientific Method on the Semantic Web

…is that Science?

Page 100: The Scientific Method on the Semantic Web

Reproducibility

Clarity (hypothesis)

Discourse

Disagreement

Clarity (experiment)

Page 101: The Scientific Method on the Semantic Web

Reproducibility

Clarity (hypothesis)

Ontology Consortia

Disagreement

Clarity (experiment)

Page 102: The Scientific Method on the Semantic Web

Reproducibility

Clarity (hypothesis)

Ontology Consortia

Consensus

Clarity (experiment)

Page 103: The Scientific Method on the Semantic Web

Reproducibility

????

Ontology Consortia

Consensus

Clarity (experiment)

Page 104: The Scientific Method on the Semantic Web

To restore the “traditions of Science”

to in silico science

The Semantic Web needs to encourage/facilitate

personal opinion and debate

Page 105: The Scientific Method on the Semantic Web

What has this got to do with SADI and SHARE?

Page 106: The Scientific Method on the Semantic Web

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX patient: <http://sadiframework.org/ontologies/patients.owl#>
PREFIX l: <http://sadiframework.org/ontologies/predicates.owl#>
SELECT ?patient ?bun ?creat
FROM <http://sadiframework.org/ontologies/patients.rdf>
WHERE {
    ?patient rdf:type patient:LikelyRejecter .
    ?patient l:latestBUN ?bun .
    ?patient l:latestCreatinine ?creat .
}

Page 107: The Scientific Method on the Semantic Web

Likely Rejecter

Page 108: The Scientific Method on the Semantic Web

I created a small ontology describing my definition of

a Likely Rejecter

Page 109: The Scientific Method on the Semantic Web

… it was MY ontology!

Page 110: The Scientific Method on the Semantic Web

I can re-use it

Page 111: The Scientific Method on the Semantic Web

I can modify it as I change my world-view

Page 112: The Scientific Method on the Semantic Web

Reproducibility

Clarity (hypothesis)

Discourse

Disagreement

Clarity (experiment)

I can publish it for others to use

Page 113: The Scientific Method on the Semantic Web

Reproducibility

Clarity (hypothesis)

Discourse

Disagreement

Clarity (experiment)

Others can modify it and/or compare it to THEIR world-view

Page 114: The Scientific Method on the Semantic Web

Reproducibility

Clarity (hypothesis)

Discourse

Disagreement

Clarity (experiment)

Sharing my ontology gives opportunities for “micro-attribution”

“Credit” to me is automatic when someone uses my ontology in their ontology/query

Page 115: The Scientific Method on the Semantic Web

Using SADI and SHARE, my personal world-view is explicitly expressed and can be dynamically evaluated against global data and knowledge

Page 116: The Scientific Method on the Semantic Web

http://www.dailymail.co.uk/femail/article-488234/Friends-dignity-self-respect---weight-wasnt-I-lost-slimming-club.html

Page 117: The Scientific Method on the Semantic Web

…but there’s more…

Page 118: The Scientific Method on the Semantic Web

“Likely Rejecter”

Page 119: The Scientific Method on the Semantic Web

I made that up! It came out of my head!

Page 120: The Scientific Method on the Semantic Web

What’s another word for a world-view that you make-up?

Hypothesis

Page 121: The Scientific Method on the Semantic Web

Reproducibility

Hypotheses

Discourse

Disagreement

Clarity (experiment)

The “Likely Rejecter” OWL Class is an explicitly-expressed hypothesis;

Members of that class may or may not exist!

Page 122: The Scientific Method on the Semantic Web

Reproducibility

Hypotheses

Discourse

Disagreement

Experiment

Page 123: The Scientific Method on the Semantic Web
Page 124: The Scientific Method on the Semantic Web

Ontologically-expressed Hypotheses drive the discovery, assembly, and analysis of data capable of evaluating their validity

Blood Pressure

Hypertension

Ischemia

Hypothesis

Database 1 Database 2

SADI+

SHARE

Analytical Algorithm

Page 125: The Scientific Method on the Semantic Web

Join us!

SADI and CardioSHARE are Open-Source projects

Come join us – we’re having a lot of fun!!

http://sadiframework.org

Page 126: The Scientific Method on the Semantic Web

Credits

Benjamin VanderValk (SHARE & SADI)

Luke McCarthy (SADI, SHARE, Taverna, CardioSHARE)

Soroush Samadian (CardioSHARE)

David Withers (Taverna)

Edward Kawas (SADI Service auto-generator)

Page 127: The Scientific Method on the Semantic Web

U of New Brunswick

Dr. Chris Baker
Alexandre Riazanov

Carleton University

Dr. Michel Dumontier
Marc-Alexandre Nolin
Leonid Chepelev
Steve Etlinger
Nichaella Kieth
Jose Cruz

Page 128: The Scientific Method on the Semantic Web

Microsoft Research

Page 129: The Scientific Method on the Semantic Web

Fin

This presentation is available on SlideShare: keywords ‘wilkinson’ ‘iCAPTURE’ ‘HLI’