ReVeaLD: A User-driven Domain-specific Interactive Search Platform for Biomedical Research


Upload: maulik-kamdar

Post on 14-Jul-2015


Enabling Networked Knowledge

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX granatum: <http://chem.deri.ie/granatum/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT * WHERE {
  ?x0_Assay a granatum:Assay ;
    granatum:hasInput ?x1_Target ;
    granatum:identify ?x2_ChemopreventiveAgent ;
    granatum:outcome_method ?x3_outcome_method .
  ?x1_Target granatum:title ?x4_title .
  ?x2_ChemopreventiveAgent
    granatum:molecularWeight ?x10_molecularWeight ;
    granatum:SMILESnotation ?x9_SMILESnotation ;
    granatum:hasFormula ?x7_hasFormula ;
    granatum:HBD ?x5_Hydrogen_Bond_Donors ;
    granatum:HBA ?x6_Hydrogen_Bond_Acceptors ;
    granatum:TPSA ?x8_Topological_Polar_Surface_Area .
  FILTER regex(xsd:string(?x4_title), "estrogen receptor", "is")
  FILTER ( xsd:double(?x10_molecularWeight) < 300 )
} LIMIT 100
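A query of this shape could be assembled from a structured selection (a root concept, the properties to retrieve, and filters). The following is a minimal Python sketch under stated assumptions, not ReVeaLD's implementation: the function `build_query` and the shape of its inputs are hypothetical, and only the `granatum:` namespace and property names are taken from the query above.

```python
GRANATUM = "http://chem.deri.ie/granatum/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def build_query(root_class, properties, filters, limit=100):
    """Assemble a SPARQL SELECT from a root concept, its properties, and filters.

    properties: list of (property_local_name, variable_name) pairs.
    filters: list of ready-made SPARQL filter expressions (strings).
    All parameter names are illustrative assumptions.
    """
    lines = [
        f"PREFIX granatum: <{GRANATUM}>",
        f"PREFIX xsd: <{XSD}>",
        "SELECT DISTINCT * WHERE {",
        f"  ?root a granatum:{root_class} .",
    ]
    # One triple pattern per selected property.
    for prop, var in properties:
        lines.append(f"  ?root granatum:{prop} ?{var} .")
    # One FILTER clause per constraint.
    for expr in filters:
        lines.append(f"  FILTER ( {expr} )")
    lines.append(f"}} LIMIT {limit}")
    return "\n".join(lines)


query = build_query(
    "ChemopreventiveAgent",
    [("molecularWeight", "mw"), ("hasFormula", "formula")],
    ["xsd:double(?mw) < 300"],
)
print(query)
```

The sketch handles a single root concept only; joining several concepts, as the Assay query above does, would need one variable per selected concept.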

Increasing adoption and usability by the non-technical biomedical researcher.

Awareness of which datasets contain the required data and their data model.

Heterogeneous biomedical data sources, too dynamic for data centralization.

High cognitive entry barrier towards the assembly of SPARQL queries.

Human-readable, domain-specific representation of query results is required.

Trade-off between expressivity (SPARQL) and usability (NL-Queries).

Making the User Experience engaging, while providing quality results.

ReVeaLD: A User-driven Domain specific Interactive Search Platform for Biomedical Research

Maulik R. Kamdar*, Dimitris Zeginis, Ali Hasnain, Stefan Decker, Helena F. Deus

*[email protected]

Acknowledgements: This work was funded by the EU FP7 GRANATUM project, ref. FP7-ICT-2009-6-270139, and Science Foundation Ireland Lion 2.

SELECT * WHERE {<ResourceURI> ?p ?o}

Results are subjected to a set of Graphic Rules, which follow the Event-Condition-Action (ECA) paradigm and provide visual representations using the Fresnel Display Vocabulary.

Event: drugbank:targets_844 drugbank:pdbIdPage <Structure_File> (a single triple; can be multiple)

Condition: pdbIdPage (Predicate) + http (Object)

Action: HTTP GET and invoke Resource Renderer

Resource Renderer: GLMol Molecular Viewer
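The Event-Condition-Action example above can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's code: the classes `Triple` and `GraphicRule`, the function `apply_rules`, the placeholder URL, and the stand-in `fake_fetch` (used instead of a real HTTP GET) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


@dataclass
class GraphicRule:
    """One Event-Condition-Action rule; field names are illustrative."""
    predicate_suffix: str  # condition on the predicate
    object_prefix: str     # condition on the object
    renderer: str          # resource renderer to invoke

    def matches(self, triple: Triple) -> bool:
        return (triple.predicate.endswith(self.predicate_suffix)
                and triple.obj.startswith(self.object_prefix))


def apply_rules(triple: Triple, rules: List[GraphicRule],
                fetch: Callable[[str], str]) -> Optional[Tuple[str, str]]:
    """Event: a result triple arrives. Returns (renderer, payload) or None."""
    for rule in rules:
        if rule.matches(triple):
            # Action: fetch the object URL and hand the payload to the renderer.
            return rule.renderer, fetch(triple.obj)
    return None


rules = [GraphicRule("pdbIdPage", "http", "GLMol Molecular Viewer")]
event = Triple("drugbank:targets_844", "drugbank:pdbIdPage",
               "http://example.org/structure.pdb")  # placeholder URL


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP GET, so the sketch runs offline."""
    return f"<contents of {url}>"


print(apply_rules(event, rules, fake_fetch))
```

Passing the fetch function in as a parameter keeps the rule matching testable without network access.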

Methods

Results

Challenges

Concept Map Representation of the DSL

Visual Query Builder Interface (Single-Entity & Advanced Search)

Data Browser Interface (Faceted & Lens-based Data Navigation)

Domain specific Visualizations (resource-dependent)

Evaluation

SPARQL Query

Motivation

[Figure: stages Literature, Virtual Screening, Query databases, Hypothesis Generation, (Linked) Data; compound counts ~300 000 compounds, ~300 interesting compounds, ~10 interesting compounds, ~5 compounds]

“Are there Drugs with molecular weight under 400 tested against ‘Colon Cancer’?”

“Do any Publications refer to assays using ‘Aspirin’ as the primary Drug in treatment of ‘Prostate Cancer’?”

Integrative Bioinformatics

Domain Specific Language (DSL)

Federated Query Engine

Graphic Rules

CanCO: a concise semantic model consisting of only those concepts and properties which are relevant to the cancer chemoprevention domain.

[Figure: Federated Query Engine. Labels: SPARQL Query; Transformed Query (four instances); Chebi, DrugBank, UniProt, Others; Life Sciences Linked Open Data (LSLOD)]

chebi:Compound void-ext:subClassOf granatum:Molecule

drugbank:Drug void-ext:subClassOf granatum:Molecule
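One way such subclass links could drive query transformation is sketched below in Python. It is a simplified illustration, not the Federated Query Engine itself: the dictionary layout and the functions `source_classes` and `transform` are hypothetical, and only the two mappings are taken from the triples above.

```python
# Subclass links from the two triples above; the dict layout is an assumption.
SUBCLASS_OF = {
    "chebi:Compound": "granatum:Molecule",
    "drugbank:Drug": "granatum:Molecule",
}


def source_classes(dsl_class):
    """Return the source-specific classes linked to a DSL class, sorted."""
    return sorted(src for src, parent in SUBCLASS_OF.items() if parent == dsl_class)


def transform(pattern, dsl_class):
    """Rewrite one triple pattern into one pattern per linked source class."""
    return {src: pattern.replace(dsl_class, src) for src in source_classes(dsl_class)}


transformed = transform("?m a granatum:Molecule .", "granatum:Molecule")
for source, pattern in transformed.items():
    print(source, "->", pattern)
```

A DSL class with no links yields an empty result, so a query over it would not be sent to any source in this sketch.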

[Figure labels: RRule Templates; LSLOD Catalogue; Cataloguing & Links Creation; Experimental Datasets]

5 query formulation tasks: single or multiple concept selection from the DSL or the LSLOD Catalogue.

Structured on the Tracking Real-time User Experience (TRUE) methodology, which is commonly used to evaluate user experience in computer games.

ReVeaLD: Real-time Visual Explorer and Aggregator of Linked Data (http://reveald.info)

Usability Hypotheses Evaluated:

Does the users' familiarity with the DSL affect the time needed to formulate the query (intuitiveness)?

Does a constrained (smaller) DSL lead to less time needed for query formulation?

[Figure labels: Pubchem, Chebi, Uniprot]

Assays which identify potential Chemopreventive Agents with a Molecular Weight less than 300, and which Target Estrogen Receptors