Open PHACTS: The Data Today

Posted on 21-Jan-2017


The Data Today
Alasdair Gray
Heriot-Watt University, Edinburgh, UK
A.J.G.Gray@hw.ac.uk
@gray_alasdair

Big Data Integration

Dataset                    Downloaded   Version     Licence               Triples
Bio Assay Ontology         -            -           CC-By                  10,360
CALOHA                     8 Apr 2015   2014-01-22  CC-By-ND               14,552
ChEBI                      4 Mar 2015   125         CC-By-SA            1,012,056
ChEMBL                     18 Feb 2015  20.0        CC-By-SA          445,732,880
ConceptWiki                12 Dec 2013  -           CC-By-SA            4,331,760
DisGeNET                   31 Mar 2015  2.1.0       ODbL               15,011,136
Disease Ontology           -            2015-05-21  CC-By                 188,062
DrugBank                   19 Feb 2015  4.1         Non-commercial      4,028,767
ENZYME                     -            2015_11     CC-By-ND               61,467
FDA Adverse Events         9 Jul 2012   -           CC0                13,557,070
Gene Ontology              4 Mar 2015   -           CC-By               1,366,494
Gene Ontology Annotations  17 Feb 2015  -           CC-By             879,448,347
NCATS OPDDR                Nov 2015     Oct 2015    -                       2,643
neXTProt (NP)              1 Feb 2014   1.0         CC-By-ND          215,006,108
OPS Chemical Registry      4 Nov 2014   -           CC-By-SA          241,986,722
HMDB                       -            3.6         HMDB                        -
MeSH                       -            2015        MeSH                        -
PDB Ligands                -            2           PDB                         -
OPS Metadata               -            -           CC-By-SA                2,053
UniProt                    -            2015_11     CC-By-ND        1,131,186,434
WikiPathways               -            20151118    CC-By              11,781,627

Total: ~3 Billion triples
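
As a quick sanity check on the headline figure, the per-dataset triple counts listed above can be summed. The sketch below copies only the counts that appear in the listing; datasets shown without a count (HMDB, MeSH, PDB Ligands) are left out, and the function names are illustrative, not part of any Open PHACTS tooling.

```python
# Triple counts copied from the dataset listing; rows without a count are omitted.
TRIPLES = {
    "Bio Assay Ontology": 10_360,
    "CALOHA": 14_552,
    "ChEBI": 1_012_056,
    "ChEMBL": 445_732_880,
    "ConceptWiki": 4_331_760,
    "DisGeNET": 15_011_136,
    "Disease Ontology": 188_062,
    "DrugBank": 4_028_767,
    "ENZYME": 61_467,
    "FDA Adverse Events": 13_557_070,
    "Gene Ontology": 1_366_494,
    "Gene Ontology Annotations": 879_448_347,
    "NCATS OPDDR": 2_643,
    "neXTProt (NP)": 215_006_108,
    "OPS Chemical Registry": 241_986_722,
    "OPS Metadata": 2_053,
    "UniProt": 1_131_186_434,
    "WikiPathways": 11_781_627,
}


def total_triples(counts):
    """Sum the per-dataset triple counts."""
    return sum(counts.values())


def in_billions(n):
    """Express a raw count in billions."""
    return n / 1_000_000_000


if __name__ == "__main__":
    total = total_triples(TRIPLES)
    print(f"{total:,} triples, about {in_billions(total):.1f} billion")
```

The listed counts add up to 2,964,728,538, which is consistent with the "~3 Billion triples" total; UniProt and Gene Ontology Annotations together account for roughly two thirds of it.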

John Wilbanks consulted for us

A framework built around STANDARD well-understood Creative Commons licences – and how they interoperate

Deal with the problems by:
• Interoperable licences
• Appropriate terms
• Declare expectations to users and data publishers

One size won't fit all requirements

Data Licensing (Or Lack Of!)

[Figure labels: Disease, Tissue, Target, Compound, Pathway]

STANDARD_TYPE    UNIT_COUNT
---------------- ----------
AC50                      7
Activity                421
EC50                     39
IC50                     46
ID50                     42
Ki                       23
Log IC50                  4
Log Ki                    7
Potency                  11
log IC50                  0

STANDARD_TYPE      STANDARD_UNITS     COUNT(*)
------------------ ------------------ --------
IC50               nM                   829448
IC50               ug.mL-1               41000
IC50                                     38521
IC50               ug/ml                  2038
IC50               ug ml-1                 509
IC50               mg kg-1                 295
IC50               molar ratio             178
IC50               ug                      117
IC50               %                       113
IC50               uM well-1                52

~100 units
>5000 types
Implemented using the Quantities, Units, Dimensions and Types Ontology (http://www.qudt.org/)
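
The IC50 counts above show the same unit written several ways (ug.mL-1, ug/ml, ug ml-1). A minimal sketch of the kind of normalisation this calls for is given below. It is not the Open PHACTS or QUDT implementation: the alias table, the conversion factors for uM and mM, and the function names are assumptions made for illustration, and converting a mass concentration to nM additionally assumes the compound's molecular weight is known.

```python
# Hypothetical alias table: spellings from the IC50 listing mapped to one label.
UNIT_ALIASES = {
    "nM": "nM",
    "uM": "uM",          # assumed extra molar unit, not in the listing
    "mM": "mM",          # assumed extra molar unit, not in the listing
    "ug.mL-1": "ug/mL",
    "ug/ml": "ug/mL",
    "ug ml-1": "ug/mL",
}

# Multiplicative factors from a molar unit to nanomolar.
TO_NANOMOLAR = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}


def canonical_unit(raw_unit):
    """Return the canonical label for a unit spelling, or None if unknown."""
    return UNIT_ALIASES.get(raw_unit.strip())


def to_nanomolar(value, raw_unit, mol_weight=None):
    """Convert a value to nM where possible, else return None.

    mol_weight is in g/mol and is only needed for mass concentrations:
    1 ug/mL = 1e-3 g/L, so nM = value * 1e6 / mol_weight.
    """
    unit = canonical_unit(raw_unit)
    if unit in TO_NANOMOLAR:
        return value * TO_NANOMOLAR[unit]
    if unit == "ug/mL" and mol_weight:
        return value * 1e6 / mol_weight
    return None  # e.g. "%", "molar ratio": not convertible to a concentration


if __name__ == "__main__":
    print(to_nanomolar(2.5, "uM"))
    print(to_nanomolar(1.0, "ug.mL-1", mol_weight=500.0))
    print(to_nanomolar(5.0, "%"))
```

Returning None for units such as "%" or "molar ratio" keeps values that cannot be expressed as a concentration out of any downstream comparison instead of silently mis-converting them.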

Quantitative Data Challenges

Quality Assurance

ops:OPS437281

ops:OPS380297 ops:OPS380292

is_stereoisomer_of [ci:CHEMINF_000461]

has_stereoundefined_parent [ci:CHEMINF_000456]

Other relationships:
• has part
• is tautomer of
• uncharged counterpart
• isotope…

Chemical Registration Service Data

Mappings: Raw

Mappings (Raw): 25,087,328

Mappings: Computed

Mappings (Comp): 200,000,000+

[Figure labels: P12047, X31045, GB:29384, RS_2353]

Andy Law's Third Law
“The number of unique identifiers assigned to an individual is never less than the number of Institutions involved in the study”

http://bioinformatics.roslin.ac.uk/lawslaws/

DrugBank, ChemSpider, PubChem

Mesylate; Imatinib Mesylate; YLMAHDNUQAMNNX-UHFFFAOYSA-N

Are these records the same? It depends upon your task!

Lens spectrum: Strict (Analysing) to Relaxed (Browsing)

Strict: skos:exactMatch (InChI)
"I need to perform an analysis, give me details of the active compound in Gleevec."

Relaxed: skos:exactMatch (InChI) plus skos:closeMatch (Drug Name)
"Which targets are known to interact with Gleevec?"

A lens defines a conceptual view over the data.
It specifies operational equivalence conditions.

Consists of:
• Identifier (URI)
• Title (dct:title)
• Description (dct:description)
• Documentation link (dcat:landingPage)
• Creator (pav:createdBy)
• Timestamp (pav:createdOn)
• Equivalence rules (bdb:linksetJustification)
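
To make the lens idea concrete, here is a minimal sketch of a lens as a record with the fields listed above, plus a function that keeps only the mappings whose justification the lens accepts. This is an illustration under stated assumptions, not the Open PHACTS implementation: the class and function names, the record identifiers (db:A, cs:A, pc:B), the example.org URIs, the creator and date values, and the justification labels "InChI" and "Drug Name" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Lens:
    identifier: str            # URI of the lens
    title: str                 # dct:title
    description: str           # dct:description
    landing_page: str          # dcat:landingPage
    created_by: str            # pav:createdBy
    created_on: date           # pav:createdOn
    justifications: frozenset  # accepted bdb:linksetJustification values


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    predicate: str             # e.g. skos:exactMatch or skos:closeMatch
    justification: str         # why the two records were linked


def equivalents(record, mappings, lens):
    """Records linked to `record` by one mapping the lens accepts (either direction)."""
    found = set()
    for m in mappings:
        if m.justification not in lens.justifications:
            continue
        if m.source == record:
            found.add(m.target)
        elif m.target == record:
            found.add(m.source)
    return found


# Hypothetical data: three records for one drug held in three sources.
MAPPINGS = [
    Mapping("db:A", "cs:A", "skos:exactMatch", "InChI"),
    Mapping("cs:A", "pc:B", "skos:closeMatch", "Drug Name"),
    Mapping("db:A", "pc:B", "skos:closeMatch", "Drug Name"),
]

STRICT = Lens("http://example.org/lens/strict", "Strict", "Structure only",
              "http://example.org/docs/strict", "example-author",
              date(2015, 1, 1), frozenset({"InChI"}))
RELAXED = Lens("http://example.org/lens/relaxed", "Relaxed", "Structure or name",
               "http://example.org/docs/relaxed", "example-author",
               date(2015, 1, 1), frozenset({"InChI", "Drug Name"}))

if __name__ == "__main__":
    print(sorted(equivalents("db:A", MAPPINGS, STRICT)))
    print(sorted(equivalents("db:A", MAPPINGS, RELAXED)))
```

With the strict lens only the structure-based link is followed, so db:A is equivalent to cs:A alone; the relaxed lens also follows the name-based links and brings in pc:B. The same stored mappings serve both tasks, and only the lens changes.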

Scientific Lens

Lenses: 34 in total
• 7 Public
• 25 Chemistry
• 2 Gene

Data Governance

Contribution must not be underestimated!!!

Alasdair J G Gray
A.J.G.Gray@hw.ac.uk
www.macs.hw.ac.uk/~ajg33/
@gray_alasdair

Open PHACTS
contact@openphacts.org
openphacts.org
@open_phacts
