The Vision: Scientist as knowledge worker


Post on 12-Jan-2016


Page 1: The Vision:  Scientist as knowledge worker

Information Management for the Life Sciences

M. Scott Marshall, Marco Roos

Adaptive Information Disclosure, University of Amsterdam

Page 2: The Vision:  Scientist as knowledge worker

The Vision: Scientist as knowledge worker

• For Knowledge Workers:
  – Knowledge is the data (i.e. rules, relations, properties, hypotheses, etc.)
• For Today's Biologist:
  – Numbers, sequences, organisms(!), and images are the data
• Manipulate knowledge instead of data
  – Find support for relations between concepts instead of discovering table and column names and numbers.
• In the virtual laboratory, everything is a resource that can be described and manipulated with semantics

Page 3: The Vision:  Scientist as knowledge worker

Vision: Concept-based interfaces

• The scientist should be able to work in terms of commonly used concepts.

• The scientist should be able to work in terms of personal concepts and hypotheses.

  – The scientist should not be forced to map concepts to the terms that the application builder has chosen for a given application.

Page 4: The Vision:  Scientist as knowledge worker

Interface Sketch: Finding a basis for relation

[Sketch labels: Epigenetic Mechanisms, Transcription, Chromatin, Transcription Factors, Histone Modification, Transcription Factor Binding Sites, position; Common Domain, Classes, Instances; Hypothesis: “There is a relation”]

Page 5: The Vision:  Scientist as knowledge worker

KSinBIT’06

Biological cartoon as interface

Source: Marco Roos

Page 6: The Vision:  Scientist as knowledge worker

Biology in a nutshell: Bigger isn’t better

• DNA Dogma
  – DNA -> mRNA -> Protein (transcription, then translation)

• Molecular pathways allow biologists to ‘connect’ one process to another.

• Huntington’s mutation mapped in 1993 yet there is still no understanding of the mechanism that causes the neurodegeneration.

• Semantic models are necessary to create a ‘systems view’ of biology.

Page 7: The Vision:  Scientist as knowledge worker

Show Bigger isn’t Better

• Scaling up should be done in small increments but once you’ve reached a certain threshold..

Page 8: The Vision:  Scientist as knowledge worker

What is metadata (in this course)?

• Metadata: data about data
• Metadata can be syntactic, such as a data type, e.g. Integer.
• Metadata can be semantic, such as chromosome number.
• Note: not always ontology, but metadata can be stored in OWL
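
The distinction between syntactic and semantic metadata can be sketched in a few lines of Python. This is only an illustration: the `AnnotatedValue` class and the second meaning ("patient age in years") are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedValue:
    """A data value carrying both kinds of metadata (illustrative only)."""
    value: int
    datatype: str  # syntactic metadata: how the value is encoded
    meaning: str   # semantic metadata: what the value denotes


# Same number, same data type, different meaning.
a = AnnotatedValue(21, "Integer", "chromosome number")
b = AnnotatedValue(21, "Integer", "patient age in years")  # hypothetical

same_syntax = a.datatype == b.datatype
same_meaning = a.meaning == b.meaning
```

The two values are indistinguishable by data type alone; only the semantic annotation tells a program that they should not be compared or merged.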

Page 9: The Vision:  Scientist as knowledge worker

Common approaches to metadata

• Code it into the GUI or application (in datastructures, object types, etc.)

• Create special tables or fields for it in a relational database

• Map it into substrings of filenames
• Mix it in with data in proprietary file formats
• Let the user figure it out
• Conclusion: There is a need for semantic disclosure.

Page 10: The Vision:  Scientist as knowledge worker

The Semantic Gap

[Diagram labels: User, Resources, Middleware, Application]

Page 11: The Vision:  Scientist as knowledge worker

The Model in the middle

[Diagram labels: User, Resources, Middleware, Application; My Model, Model, Model]

Page 12: The Vision:  Scientist as knowledge worker

What is knowledge (in this course)?

“data”, “information”, “facts”, “knowledge”

Knowledge is a statement that can be tested for truth (by a machine).

Otherwise, computing can’t add much.
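
A minimal sketch of what "tested for truth by a machine" could look like, assuming statements are stored as subject-predicate-object triples. The identifiers below are illustrative placeholders loosely based on the concepts in the interface sketch; they are not a real vocabulary.

```python
# Toy knowledge base: each statement is a (subject, predicate, object) triple.
# All identifiers are illustrative placeholders, not a real vocabulary.
KNOWLEDGE = {
    ("histone_modification", "is_a", "epigenetic_mechanism"),
    ("transcription_factor", "binds_to", "transcription_factor_binding_site"),
}


def holds(subject: str, predicate: str, obj: str) -> bool:
    """Return True if the statement is asserted in the knowledge base."""
    return (subject, predicate, obj) in KNOWLEDGE


statement_known = holds("histone_modification", "is_a", "epigenetic_mechanism")
statement_unknown = holds("histone_modification", "is_a", "transcription_factor")
```

Note that `holds` only reports whether a statement has been asserted. Treating an absent statement as false is a closed-world simplification made for this sketch.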

Page 13: The Vision:  Scientist as knowledge worker

Resources are shared on the grid

• Shared:
  – CPU time
  – network bandwidth
  – memory
  – storage space
• But also:
  – Data
  – Knowledge: ontologies, rules, vocabularies
  – Services

Page 14: The Vision:  Scientist as knowledge worker

Abundance of resources in Grid: A Challenge

• Knowledge Sharing
  – How will we find the relevant resources (data, services)?
  – How can we automatically integrate them into an application?
  – How will we leverage existing knowledge in our analysis?
  – How will we integrate our results as usable data for a new (computational) experiment?
  – And link to the evidence (data) for the new knowledge?

Page 15: The Vision:  Scientist as knowledge worker

Knowledge Capture

• How will we acquire the knowledge?
  – Literature
  – Other forms of discourse
  – Data analysis
• How will we represent and store it?
  – In Semantic Web formats such as RDF, OWL, RIF

Page 16: The Vision:  Scientist as knowledge worker

Knowledge capture from a computational experiment

[Diagram labels: Database (three shown, followed by “...”), Computational experiment in workflow environment]

Page 17: The Vision:  Scientist as knowledge worker

What will we do with knowledge?

• How will we use it?
  – Query it
  – Reason across it
  – Integrate it with other data

• Link it up

Page 18: The Vision:  Scientist as knowledge worker

Linked Data Principles

1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover related things.

• Tim Berners-Lee 2007
• http://www.w3.org/DesignIssues/LinkedData.html
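
The four principles can be illustrated with a small simulation. This is a sketch under stated assumptions: the `WEB` dictionary stands in for real HTTP lookups, and every `example.org` URI and statement in it is hypothetical. No network access or RDF library is involved.

```python
from urllib.parse import urlparse

# In-memory stand-in for the Web: URI -> list of (predicate, object) statements.
# The example.org URIs and the data are hypothetical.
WEB = {
    "http://example.org/gene/g1": [
        ("http://example.org/vocab/encodes", "http://example.org/protein/p1"),
    ],
    "http://example.org/protein/p1": [
        ("http://example.org/vocab/label", "Protein P1"),
    ],
}


def is_http_uri(name: str) -> bool:
    """Principles 1 and 2: a name should be an HTTP URI so it can be looked up."""
    parsed = urlparse(name)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def look_up(uri: str):
    """Principle 3: looking up a URI yields statements about it (simulated)."""
    return WEB.get(uri, [])


def discover(uri: str):
    """Principle 4: follow links to other URIs to find related things."""
    return [obj for _, obj in look_up(uri) if is_http_uri(obj)]


related = discover("http://example.org/gene/g1")
```

Starting from the gene URI, `discover` returns the protein URI because the statement's object is itself an HTTP URI; the plain-text label on the protein is data, not a link, so it is not followed.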

Page 19: The Vision:  Scientist as knowledge worker

Background of the HCLS IG

• Originally chartered in 2005
  – Chairs: Eric Neumann and Tonya Hongsermeier
• Re-chartered in 2008
  – Chairs: Scott Marshall and Susie Stephens
  – Team contact: Eric Prud’hommeaux
• Broad industry participation
  – Over 100 members
  – Mailing list of over 600
• Background Information
  – http://www.w3.org/2001/sw/hcls/
  – http://esw.w3.org/topic/HCLSIG

Page 20: The Vision:  Scientist as knowledge worker

Mission of HCLS IG

• The mission of HCLS is to develop, advocate for, and support the use of Semantic Web technologies for
  – Biological science
  – Translational medicine
  – Health care
• These domains stand to gain tremendous benefit by adoption of Semantic Web technologies, as they depend on the interoperability of information from many domains and processes for efficient decision support

Page 21: The Vision:  Scientist as knowledge worker

Translating across domains

• Translational medicine – use cases that cross domains
• Link across domains and research:
  – What are the links?
    • gene – transcription factor – protein
    • pathway – molecular interaction – chemical compound
    • drug – drug side effect – chemical compound

Page 22: The Vision:  Scientist as knowledge worker

Group Activities

• Document use cases to aid individuals in understanding the business and technical benefits of using Semantic Web technologies
• Document guidelines to accelerate the adoption of the technology
• Implement a selection of the use cases as proof-of-concept demonstrations
• Develop high-level vocabularies
• Disseminate information about the group’s work at government, industry, and academic events

Page 23: The Vision:  Scientist as knowledge worker

Current Task Forces

• BioRDF – integrated neuroscience knowledge base
  – Kei Cheung (Yale University)
• Clinical Observations Interoperability – patient recruitment in trials
  – Vipul Kashyap (Cigna Healthcare)
• Linking Open Drug Data – aggregation of Web-based drug data
  – Chris Bizer (Free University Berlin)
• Pharma Ontology – high level patient-centric ontology
  – Christi Denney (Eli Lilly)
• Scientific Discourse – building communities through networking
  – Tim Clark (Harvard University)
• Terminology – Semantic Web representation of existing resources
  – John Madden (Duke University)

Page 24: The Vision:  Scientist as knowledge worker

BioRDF Task Force

• Task Lead: Kei Cheung
• Participants: M. Scott Marshall, Eric Prud’hommeaux, Susie Stephens, Andrew Su, Steven Larson, Huajun Chen, TN Bhat, Matthias Samwald, Erick Antezana, Rob Frost, Ward Blonde, Holger Stenzhorn, Don Doherty

Page 25: The Vision:  Scientist as knowledge worker

BioRDF: Answering Questions

• Goals: Get answers to questions posed to a body of collective knowledge in an effective way
• Knowledge used: Publicly available databases, and text mining
• Strategy: Integrate knowledge using careful modeling, exploiting Semantic Web standards and technologies

Page 26: The Vision:  Scientist as knowledge worker

BioRDF: Looking for Targets for Alzheimer’s

• Signal transduction pathways are considered to be rich in “druggable” targets
• CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease
• Casting a wide net, can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons?

Source: Alan Ruttenberg
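
The question on this slide amounts to finding genes that satisfy two conditions at once. A rough sketch of that join over a toy triple list follows; the gene names (`geneA` and so on) and predicate names are placeholders invented for illustration, not results or vocabulary from the HCLS knowledge base.

```python
# Toy triples; gene and predicate names are hypothetical placeholders.
TRIPLES = [
    ("geneA", "involved_in", "signal_transduction"),
    ("geneB", "involved_in", "signal_transduction"),
    ("geneB", "active_in", "pyramidal_neuron"),
    ("geneC", "active_in", "pyramidal_neuron"),
]


def subjects(predicate: str, obj: str) -> set:
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


# A gene is a candidate only if it matches both patterns (a join on the gene).
candidates = sorted(
    subjects("involved_in", "signal_transduction")
    & subjects("active_in", "pyramidal_neuron")
)
```

In this toy data only `geneB` appears in both sets. A query engine over integrated sources performs the same kind of join, with the two patterns typically answered by different underlying databases.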

Page 27: The Vision:  Scientist as knowledge worker

BioRDF: Integrating Heterogeneous Data

[Diagram of data sources: NeuronDB, BAMS, Literature, Homologene, SWAN, Entrez Gene, Gene Ontology, Mammalian Phenotype, PDSPki, BrainPharm, AlzGene, Antibodies, PubChem, MESH, Reactome, Allen Brain Atlas]

Source: Susie Stephens

Page 28: The Vision:  Scientist as knowledge worker

BioRDF: SPARQL Query

Source: Alan Ruttenberg

Page 29: The Vision:  Scientist as knowledge worker

BioRDF: Results: Genes, Processes

• DRD1, 1812 adenylate cyclase activation
• ADRB2, 154 adenylate cyclase activation
• ADRB2, 154 arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
• DRD1IP, 50632 dopamine receptor signaling pathway
• DRD1, 1812 dopamine receptor, adenylate cyclase activating pathway
• DRD2, 1813 dopamine receptor, adenylate cyclase inhibiting pathway
• GRM7, 2917 G-protein coupled receptor protein signaling pathway
• GNG3, 2785 G-protein coupled receptor protein signaling pathway
• GNG12, 55970 G-protein coupled receptor protein signaling pathway
• DRD2, 1813 G-protein coupled receptor protein signaling pathway
• ADRB2, 154 G-protein coupled receptor protein signaling pathway
• CALM3, 808 G-protein coupled receptor protein signaling pathway
• HTR2A, 3356 G-protein coupled receptor protein signaling pathway
• DRD1, 1812 G-protein signaling, coupled to cyclic nucleotide second messenger
• SSTR5, 6755 G-protein signaling, coupled to cyclic nucleotide second messenger
• MTNR1A, 4543 G-protein signaling, coupled to cyclic nucleotide second messenger
• CNR2, 1269 G-protein signaling, coupled to cyclic nucleotide second messenger
• HTR6, 3362 G-protein signaling, coupled to cyclic nucleotide second messenger
• GRIK2, 2898 glutamate signaling pathway
• GRIN1, 2902 glutamate signaling pathway
• GRIN2A, 2903 glutamate signaling pathway
• GRIN2B, 2904 glutamate signaling pathway
• ADAM10, 102 integrin-mediated signaling pathway
• GRM7, 2917 negative regulation of adenylate cyclase activity
• LRP1, 4035 negative regulation of Wnt receptor signaling pathway
• ADAM10, 102 Notch receptor processing
• ASCL1, 429 Notch signaling pathway
• HTR2A, 3356 serotonin receptor signaling pathway
• ADRB2, 154 transmembrane receptor protein tyrosine kinase activation (dimerization)
• PTPRG, 5793 transmembrane receptor protein tyrosine kinase signaling pathway
• EPHA4, 2043 transmembrane receptor protein tyrosine kinase signaling pathway
• NRTN, 4902 transmembrane receptor protein tyrosine kinase signaling pathway
• CTNND1, 1500 Wnt receptor signaling pathway

Many of the genes are related to AD through gamma secretase (presenilin) activity

Source: Alan Ruttenberg

Page 30: The Vision:  Scientist as knowledge worker

Linking Open Drug Data

• HCLSIG task started October 1st, 2008

• Primary Objectives

• Survey publicly available data sets about drugs

• Explore interesting questions from pharma, physicians and patients that could be answered with Linked Data

• Publish and interlink these data sets on the Web

• Participants: Bosse Andersson, Chris Bizer, Kei Cheung, Don Doherty, Oktie Hassanzadeh, Anja Jentzsch, Scott Marshall, Eric Prud’hommeaux, Matthias Samwald, Susie Stephens, Jun Zhao

Page 31: The Vision:  Scientist as knowledge worker

The Classic Web

• Single information space
• Built on URIs
  – globally unique IDs
  – retrieval mechanism
• Built on Hyperlinks
  – are the glue that holds everything together

[Diagram labels: Web Browsers, Search Engines; HTML documents A, B, C; hyper-links]

Source: Chris Bizer

Page 32: The Vision:  Scientist as knowledge worker

Linked Data

[Diagram labels: Search Engines, Linked Data Mashups, Linked Data Browsers; data sources A, B, C, D, E; Things connected by typed links]

Use Semantic Web technologies to publish structured data on the Web and set links between data from one data source and data from other data sources

Source: Chris Bizer

Page 33: The Vision:  Scientist as knowledge worker

Data Objects Identified with HTTP URIs

pd:cygri foaf:name "Richard Cyganiak"
pd:cygri rdf:type foaf:Person
pd:cygri foaf:based_near dbpedia:Berlin (forms an RDF link between two data sources)

pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri

dbpedia:Berlin = http://dbpedia.org/resource/Berlin
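
The prefixed names on this slide are shorthand for full HTTP URIs. A small sketch of that expansion follows; the `pd` and `dbpedia` prefixes are taken from the slide, the `foaf` namespace is the standard FOAF one, and the `expand` helper itself is hypothetical.

```python
# Prefix table: pd and dbpedia are from the slide, foaf is the standard
# FOAF namespace. The expand() helper is an illustrative assumption.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name: str) -> str:
    """Expand 'prefix:local' into a full URI using the prefix table."""
    prefix, _, local = prefixed_name.partition(":")
    return PREFIXES[prefix] + local


# The RDF link between the two data sources, written with full URIs.
link = (expand("pd:cygri"), expand("foaf:based_near"), expand("dbpedia:Berlin"))
```

The subject URI lives in one data source (a personal FOAF file) and the object URI in another (DBpedia), which is what makes this triple a link between data sources.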

Source: Chris Bizer

Page 34: The Vision:  Scientist as knowledge worker

Dereferencing URIs over the Web

pd:cygri foaf:name "Richard Cyganiak"
pd:cygri rdf:type foaf:Person
pd:cygri foaf:based_near dbpedia:Berlin
dbpedia:Berlin dp:population 3.405.259
dbpedia:Berlin skos:subject dp:Cities_in_Germany

Source: Chris Bizer

Page 35: The Vision:  Scientist as knowledge worker

Dereferencing URIs over the Web

pd:cygri foaf:name "Richard Cyganiak"
pd:cygri rdf:type foaf:Person
pd:cygri foaf:based_near dbpedia:Berlin
dbpedia:Berlin dp:population 3.405.259
dbpedia:Berlin skos:subject dp:Cities_in_Germany
dbpedia:Hamburg skos:subject dp:Cities_in_Germany
dbpedia:Meunchen skos:subject dp:Cities_in_Germany

Source: Chris Bizer

Page 36: The Vision:  Scientist as knowledge worker

LODD Data Sets

Source: Anja Jentzsch

Page 37: The Vision:  Scientist as knowledge worker

LODD in Marbles

Source: Anja Jentzsch

Page 38: The Vision:  Scientist as knowledge worker

The Linked Data Cloud

Source: Chris Bizer

Page 39: The Vision:  Scientist as knowledge worker

Accomplishments

• Technical
  – HCLS KB hosted at 2 institutes
  – Linked Open Data contributions
  – Demonstrator of querying across heterogeneous EHR systems
  – Integration of SWAN and SIOC ontologies for Scientific Discourse
• Outreach
  – Conference Presentations and Workshops:
    • Bio-IT World, WWW, ISMB, AMIA, C-SHALS, etc.
  – Publications:
    • Proceedings of LOD Workshop at WWW 2009: Enabling Tailored Therapeutics with Linked Data
    • Proceedings of the ICBO: Pharma Ontology: Creating a Patient-Centric Ontology for Translational Medicine
    • AMIA Spring Symposium: Clinical Observations Interoperability: A Semantic Web Approach
    • BMC Bioinformatics. A Journey to Semantic Web Query Federation in Life Sciences
    • Briefings in Bioinformatics. Life sciences on the Semantic Web: The Neurocommons and Beyond

Page 40: The Vision:  Scientist as knowledge worker

New Technologies

• SPARQL-DL
• Semantic Wiki (integration with KBs)
• Cloud Computing (e.g. Amazon)
• Query rewriting: SPARQL -> SQL
  – Legacy integration
  – Improve interfaces

• FeDeRate: Federated query
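
To give a feel for SPARQL -> SQL rewriting, here is a deliberately small sketch that translates one triple pattern into SQL. It assumes the data sits in a single `triples(s, p, o)` table, which is a simplification: real legacy integration maps patterns onto an existing relational schema. The `pattern_to_sql` function and the `ex:` data are hypothetical, and this is not how FeDeRate or any particular tool works.

```python
import sqlite3


def pattern_to_sql(pattern):
    """Translate one (s, p, o) triple pattern into SQL over triples(s, p, o).

    Terms starting with '?' are variables; all other terms are constants.
    Sketch only: variable names are copied into the SQL unsanitized.
    """
    selected, conditions, params = [], [], []
    for column, term in zip(("s", "p", "o"), pattern):
        if term.startswith("?"):
            selected.append(f"{column} AS {term[1:]}")
        else:
            conditions.append(f"{column} = ?")
            params.append(term)
    sql = "SELECT " + (", ".join(selected) or "1") + " FROM triples"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# Hypothetical data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:drug1", "ex:hasSideEffect", "ex:nausea"),
        ("ex:drug2", "ex:hasSideEffect", "ex:headache"),
        ("ex:drug1", "ex:label", "Drug One"),
    ],
)

sql, params = pattern_to_sql(("?drug", "ex:hasSideEffect", "?effect"))
rows = sorted(conn.execute(sql, params).fetchall())
```

Variables become selected columns and constants become `WHERE` conditions, so the pattern above yields `SELECT s AS drug, o AS effect FROM triples WHERE p = ?`.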

Page 41: The Vision:  Scientist as knowledge worker

We’ve come a long way

• Triplestores have gone from millions to billions
• Linked Open Data cloud
• http://lod.openlinksw.com/
• On demand Knowledge Bases: Amazon’s EC2
• Terminologies: SNOMED-CT, MeSH, UMLS, ..
• Neurocommons, Flyweb, Biogateway, Bio2RDF, Linked Life Data, ..

Page 42: The Vision:  Scientist as knowledge worker

Penetrance of ontology in biology

• OBO Foundry - http://www.obofoundry.org
• BioPortal - http://bioportal.bioontology.org
• National Centers for Biomedical Computing - http://www.ncbcs.org/
• Shared Names
• Concept Web Alliance
• Semantic Web Interest Group PRISM Forum
• Work packages in ELIXIR

Page 43: The Vision:  Scientist as knowledge worker

Recipe for a Semantic Web

• Follow Linked Open Data principles

• Attempt to use Shared Names (same URIs)

• Query rewriting to map from:
  – SPARQL -> (query language)
  – SPARQL (term1) -> SPARQL (term2)

• Add federated query support to SPARQL engine implementations
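
The SPARQL (term1) -> SPARQL (term2) step can be sketched as a term substitution over triple patterns. The sketch below assumes a query is already parsed into a list of pattern tuples; the `ex1:` and `ex2:` vocabularies and the mapping between them are hypothetical.

```python
# Hypothetical mapping from one vocabulary's terms to another's.
TERM_MAP = {
    "ex1:Gene": "ex2:GeneRecord",
    "ex1:symbol": "ex2:label",
}


def rewrite(patterns, term_map):
    """Replace mapped terms in each triple pattern; leave all others as they are."""
    return [tuple(term_map.get(term, term) for term in pattern) for pattern in patterns]


query = [
    ("?g", "rdf:type", "ex1:Gene"),
    ("?g", "ex1:symbol", "?s"),
]
rewritten = rewrite(query, TERM_MAP)
```

Variables and unmapped terms such as `rdf:type` pass through unchanged, so the rewritten query keeps its shape and can be sent to a source that uses the second vocabulary. Where Shared Names are adopted, the mapping table shrinks because both sides already use the same URIs.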

Page 44: The Vision:  Scientist as knowledge worker

The End

“Science is built up of facts, as a house is built of stones; but an accumulation of facts is no more a science than a heap of stones is a house.”

– Henri Poincaré, Science and Hypothesis, 1905