chunlei wu heart_bd2k_201602_ebi

24
Chunlei Wu, Ph.D. [email protected] @chunleiwu Associate Professor of Molecular Medicine Dept. of Molecular Experimental Medicine The Scripps Research Institute La Jolla, CA, USA 02/23/2016 HeartBD2K PI meeting at EBI Integrated Annotation Services for "BioThings" From MyGene.info and MyVariant.info towards BioThings API

Upload: chunlei-wu

Post on 12-Feb-2017

311 views

Category:

Science


0 download

TRANSCRIPT

Page 1: Chunlei wu heart_bd2k_201602_ebi

Chunlei Wu, [email protected]

@chunleiwu

Associate Professor of Molecular MedicineDept. of Molecular Experimental Medicine

The Scripps Research InstituteLa Jolla, CA, USA

02/23/2016 HeartBD2K PI meeting at EBI

Integrated Annotation Services for "BioThings"

From MyGene.info and MyVariant.info towards BioThings API

Page 2: Chunlei wu heart_bd2k_201602_ebi

Annotations centered around bio-entities

Gene

GVariant

V

Pathway

P

D

Metabolite

M

Disease

Page 3: Chunlei wu heart_bd2k_201602_ebi

So many variant annotation resources

dbNSFP

The Exome Aggregation

Consortium (ExAC)

Page 4: Chunlei wu heart_bd2k_201602_ebi

BioGPS.org – a gene portal for biologists

Page 5: Chunlei wu heart_bd2k_201602_ebi

Schematic view of MyVariant.info architecture

Each data source is updated individually. Colors indicate their different updating schedules.

Page 6: Chunlei wu heart_bd2k_201602_ebi

High-performance web service APIs

Schematic view of MyVariant.info architecture

Page 7: Chunlei wu heart_bd2k_201602_ebi

MyVariant.info for the end users:

http://MyVariant.info(currently v1 API, two endpoints)

http://MyVariant.info/v1/query?q=<query>

any query term(s)

matching variant hits

http://MyVariant.info/v1/variant/<variantid>

hgvs id(s)

matching variant object(s)

Both supports batch-mode via POST

Simple API. No sign-up. No API key.

Try our live API , and documentations

Page 8: Chunlei wu heart_bd2k_201602_ebi

MyGene.info for the end users:

http://MyGene.info(currently v2 API, two endpoints)

http://MyGene.info/v2/query?q=<query>

any query term(s)

matching gene hits

http://MyGene.info/v2/gene/<geneid>

gene id(s)

matching gene object(s)

Both supports batch-mode via POST

Simple API. No sign-up. No API key.

Try our live API , and documentations

Page 9: Chunlei wu heart_bd2k_201602_ebi

MyGene.info usage updates

lastyear

thisyear

2M

3MMonthly hits in Millions

Page 10: Chunlei wu heart_bd2k_201602_ebi

30%9%

35%26%

Increased clients adoption

Requests by MyGene.info clients

Highlights:• mygene Python client usage now surpasses BioGPS usage• mygene R client usage now increased to 9% from <1%

10/07/2015-01/05/2016

Page 11: Chunlei wu heart_bd2k_201602_ebi

30%9%

35%26%

Increased clients adoptionmygene Python client hosted in PyPI

mygene R client hosted in Bioconductor

Page 12: Chunlei wu heart_bd2k_201602_ebi

MyVariant.info updates

Total over 334 Millions of annotated variants

The Exome Aggregation Consortium (ExAC)

New additions:

dbNSFPUpdated:

Page 13: Chunlei wu heart_bd2k_201602_ebi

MyVariant.info updates

30%

68%2%

10/07/2015-01/05/2016

1 Million requests in 3 months

Page 14: Chunlei wu heart_bd2k_201602_ebi

MyVariant.info official Python/R Clients

myvariant Python client hosted in PyPI (initial release in Aug 2015)

myvariant R client hosted in Bioconductor(initial release in Oct 2015)

Page 15: Chunlei wu heart_bd2k_201602_ebi

From our users' love

Page 16: Chunlei wu heart_bd2k_201602_ebi

From our users' love

Page 17: Chunlei wu heart_bd2k_201602_ebi

Next?

MyVariant.info

MyGene.info

Page 18: Chunlei wu heart_bd2k_201602_ebi

Make our APIs serve Linked Data

via

Page 19: Chunlei wu heart_bd2k_201602_ebi

JSON + context = JSON-LD{ "@context": { "clinvar": "http://schema.myvariant.info/datasource/clinvar", "rcv": "http://schema.myvariant.info/datanode/rcv", "gene": "http://schema.myvariant.info/datanode/gene", "_id": "@id" }, "_id": "chr6:g.26093141G>A", "clinvar": { "@context": { "uniprot": "http://identifiers.org/uniprot/", "omim": "http://identifiers.org/omim/" }, "chrom": "6", "alt": "A", "ref": "G", "allele_id": 15048, "rsid": "rs1800562", "rcv": { "@context": { "accession": "http://identifers.org/clinvar" }, "accession": "RCV000000020", "origin": "germline", "clinical_significance": "risk factor" }, "gene": { "@context": { "symbol": "http://identifiers.org/hgnc.symbol/" }, "id": "3077", "symbol": "HFE" }, "omim": "613609.0001", "variant_id": 9 }}

Page 20: Chunlei wu heart_bd2k_201602_ebi

Processed JSON-LD

<chr6:g.26093141G>A> <http://schema.myvariant.info/datasource/clinvar> _:b0 ._:b0 <http://identifiers.org/omim/> "613609.0001" ._:b0 <http://schema.myvariant.info/datanode/gene> _:b1 ._:b0 <http://schema.myvariant.info/datanode/rcv> _:b2 ._:b1 <http://identifiers.org/hgnc.symbol/> "HFE" ._:b2 <http://identifers.org/clinvar> "RCV000000020" .

JSON-LD N-Quads output:

{ "@id": "chr6:g.26093141G>A", "http://schema.myvariant.info/datasource/clinvar": { "http://identifiers.org/omim/": "613609.0001", "http://schema.myvariant.info/datanode/gene": { "http://identifiers.org/hgnc.symbol/": "HFE" }, "http://schema.myvariant.info/datanode/rcv": { "http://identifers.org/clinvar": "RCV000000020" } }}

JSON-LD compacted output:

Page 21: Chunlei wu heart_bd2k_201602_ebi

Linked Data for data aggregation

MyVariant.info

Another Variant API

{ "_id": "chr1:g.196659237C>T", “cosmic": { "tumor_site": "breast", "mut_freq": 0.49, }, "clinvar": {…}, "dbsnp": {…}, …}

{ "pop": "GWD", "nobs": 226, "freq": 0.371681415929, …}

{ "_id": "chr1:g.196659237C>T", “cosmic": { "tumor_site": "breast", "mut_freq": 0.49, }, "clinvar": {…}, "dbsnp": {…}, "new_src": { "pop": "GWD", "nobs": 226, "freq": 0.371681415929 }, …}

Page 22: Chunlei wu heart_bd2k_201602_ebi

BioThings API

MyVariant.info MyGene.info

JSON data aggregation mechanism

High-performance query engine

Well-designed REST API pattern

JSON-LD enabled Linked Data

Data-updating schedulerPython/R clients…

Page 23: Chunlei wu heart_bd2k_201602_ebi

BioThings API

API specs API SDK API registry

BD2K API working group

Page 24: Chunlei wu heart_bd2k_201602_ebi

Acknowledgement

Funding and SupportU54GM114833U01HG008473

Washington U:Ben AinscoughObi Griffith

TSRI:

Andrew SuJiwen XinCyrus AfrasiabiGinger TsuengAdam Mark

Greg StuppTim Putman

STSI:

Eric TopolAli TorkamaniGalina Erikson

U. Washington:

Sean MooneyMoritz JuchlerNikhil Gopal

OICR:Robin Haw

UC Berkeley:Chris Mungall

UCSD:Trish Whetzel

MyVariant.info

MyGene.info