Citizen Science and Rare Disease Research
Andrew Su, Ph.D.
@andrewsu
[email protected]
http://sulab.org
September 22, 2016
Personalized Health in the Digital Age Symposium
Slides: slideshare.net/andrewsu
NGLY1 (11 PubMed articles)
Congenital disorders of glycosylation (822)
PNGase (686)
ERAD (1,330)
Glycosylation (48,862)
Alacrima (164)
Genetic interactors (3,016)
Symptoms (109,928)
25 million articles in PubMed
The biomedical literature is massive…
[Figure: number of new PubMed-indexed articles per year, 1983 to 2013; vertical axis from 0 to 1,200,000]
… but it is very hard to query and compute
Imatinib, Crizotinib, Erlotinib, Gefitinib, Sorafenib, Lapatinib, Dasatinib, …
AND
Acute myeloid leukemia, Acute lymphoblastic leukemia, Chronic myelogenous leukemia, Chronic lymphocytic leukemia, Hodgkin lymphoma, Non-Hodgkin lymphoma, Myeloma, …
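To make the "hard to query" point concrete, here is a minimal Python sketch that assembles the drug list and the disease list into one boolean search string. The slide shows only the two lists joined by AND; reading each list as an OR group, and the helper names `or_group` and `build_query`, are assumptions for illustration.

```python
def or_group(terms):
    """Join terms into one parenthesised OR group, quoting each term."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(drugs, diseases):
    """Any drug AND any disease (assumed reading of the slide)."""
    return or_group(drugs) + " AND " + or_group(diseases)


# Terms copied from the slide; both lists are open-ended there ("…").
drugs = ["Imatinib", "Crizotinib", "Erlotinib", "Gefitinib",
         "Sorafenib", "Lapatinib", "Dasatinib"]
diseases = ["Acute myeloid leukemia", "Acute lymphoblastic leukemia",
            "Chronic myelogenous leukemia", "Chronic lymphocytic leukemia",
            "Hodgkin lymphoma", "Non-Hodgkin lymphoma", "Myeloma"]

query = build_query(drugs, diseases)
print(query)
```

Even this small case needs 14 hand-curated terms, and both lists keep growing, which is the maintenance problem the slide points at.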
Personalized medicine relies on effective KNOWLEDGE MANAGEMENT
Pietro Bellini, https://flic.kr/p/k5jmja
Information extraction from biomedical text
1. Identify biomedical concepts in text
… We report a case of familial systemic mastocytosis with the rare KIT K509I germ line mutation. In vitro treatment with imatinib, dasatinib and PKC412 reduced cell viability of primary mast cells harboring KIT K509I mutation. Both patients with familial systemic mastocytosis had remarkable hematological and skin improvement after three months of imatinib treatment.
Leuk Res. 2014 Oct;38(10):1245-51. doi: 10.1016/j.leukres.
GENES
DISEASES
DRUGS
VARIANTS
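Step 1 can be sketched as a dictionary lookup over the example abstract. This is only an illustration: the `LEXICON` below is a hypothetical hand-made list covering this one sentence, not the method or vocabulary used in the talk, and real concept recognition has to handle synonyms, abbreviations and ambiguity.

```python
import re

# Hypothetical mini-lexicon, one entry per concept type shown on the slide.
LEXICON = {
    "GENE": ["KIT"],
    "DISEASE": ["familial systemic mastocytosis"],
    "DRUG": ["imatinib", "dasatinib", "PKC412"],
    "VARIANT": ["K509I"],
}


def tag_concepts(text):
    """Return (start, end, type, matched_text) for each lexicon hit, in text order."""
    hits = []
    for concept_type, terms in LEXICON.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((m.start(), m.end(), concept_type, m.group(0)))
    return sorted(hits)


sentence = ("In vitro treatment with imatinib, dasatinib and PKC412 reduced cell "
            "viability of primary mast cells harboring KIT K509I mutation.")
mentions = tag_concepts(sentence)
for start, end, concept_type, matched in mentions:
    print(f"{concept_type:8s} {matched} [{start}:{end}]")
```

Case-insensitive matching is a simplification; for a gene symbol such as KIT it would also match the ordinary word "kit" in other texts.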
Information extraction from biomedical text
imatinib
dasatinib
PKC412
Familial systemic mastocytosis
KIT
K509I
1. Identify biomedical concepts in text
2. Identify relationships between concepts
Mutation of
Mutation causes
causes
treats
inhibits
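Step 2 turns the concepts into subject, predicate, object statements. A minimal sketch follows. The extracted slide text does not preserve which edge joins which pair of nodes, so the pairings below are illustrative assumptions, and `Triple`, `build_index` and `objects_of` are hypothetical names. Each statement carries its source so that it stays traceable.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # citation the statement was extracted from


SOURCE = "Leuk Res. 2014 Oct;38(10):1245-51"

# Illustrative pairings only (assumed, not read off the slide).
triples = [
    Triple("K509I", "mutation of", "KIT", SOURCE),
    Triple("K509I", "causes", "familial systemic mastocytosis", SOURCE),
    Triple("imatinib", "treats", "familial systemic mastocytosis", SOURCE),
    Triple("imatinib", "inhibits", "KIT", SOURCE),
]


def build_index(triples):
    """Group statements by subject so neighbours can be looked up quickly."""
    index = defaultdict(list)
    for t in triples:
        index[t.subject].append(t)
    return index


def objects_of(index, subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [t.obj for t in index.get(subject, []) if t.predicate == predicate]


index = build_index(triples)
print(objects_of(index, "imatinib", "treats"))
```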
Goal: Assemble a network of biomedical knowledge that is comprehensive, current, computable and traceable.
[Wikidata] is to data
[Wikipedia] is to text
Provide a database of the world’s [biomedical] knowledge that anyone can edit
- Denny Vrandečić
Wikidata item Q13561329 (http://www.wikidata.org/wiki/Q13561329)
Subclass of (Property:P279): Protein (Q8054)
Regulates (Property:P128): Neural development (Q1345738)
Physically interacts with (Property:P129): VLDL receptor (Q1979313), Amyloid beta A4 (Q423510)
Decreased expression in (Property:P1910): Schizophrenia (Q41112), Bipolar disorder (Q131755)
The same statements as identifiers only, retrievable as JSON:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q13561329&format=json
Q13561329, Property:P279, Q8054
Q13561329, Property:P128, Q1345738
Q13561329, Property:P129, Q1979313
Q13561329, Property:P129, Q423510
Q13561329, Property:P1910, Q41112
Q13561329, Property:P1910, Q131755
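The API URL on the slide can be built and its response read with a few lines of Python. This is a sketch under assumptions: `entity_url` and `item_values` are hypothetical helper names, and `sample` is a hand-written, heavily trimmed stand-in shaped like a `wbgetentities` response, not real API output. It uses only the item and property IDs shown on the slide.

```python
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def entity_url(qid):
    """Build the wbgetentities URL for one item."""
    return API + "?" + urlencode(
        {"action": "wbgetentities", "ids": qid, "format": "json"})


def item_values(response, qid, prop):
    """Return the target item IDs of all `prop` statements on `qid`."""
    claims = response["entities"][qid].get("claims", {})
    values = []
    for statement in claims.get(prop, []):
        snak = statement["mainsnak"]
        if (snak.get("snaktype") == "value"
                and snak["datavalue"]["type"] == "wikibase-entityid"):
            values.append(snak["datavalue"]["value"]["id"])
    return values


def statement(prop, target):
    """Build one trimmed stand-in statement (illustration only)."""
    return {"mainsnak": {"snaktype": "value", "property": prop,
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"entity-type": "item",
                                                 "id": target}}}}


# Hand-written stand-in for a parsed JSON response.
sample = {"entities": {"Q13561329": {"claims": {
    "P129": [statement("P129", "Q1979313"), statement("P129", "Q423510")],
}}}}

print(entity_url("Q13561329"))
print(item_values(sample, "Q13561329", "P129"))
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the parsed JSON to `item_values` would replace the stand-in; the live data may differ from what the 2016 slide shows.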
We are seeding it with biomedical data
• All human, mouse genes and proteins
• All Gene Ontology terms
• All FDA approved drugs
• 9,000+ human diseases
• 120 reference microbial genomes
Burgstaller et al (2016) Database (preprint in BioRxiv)
Mitraka et al (2015) Semantic Web Applications for the Life Sciences (best paper) (preprint in BioRxiv)
Putman et al (2016) Database (preprint in BioRxiv)
Inter-item links form a giant knowledge graph
Everything is connected
Reelin, Heart disease, Barack Obama, everything...
https://query.wikidata.org
SPARQL endpoint for Wikidata
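A query against the SPARQL endpoint might look like the sketch below, which asks for everything item Q13561329 is linked to through property P129 ("physically interacts with" on the earlier slide). The helper names are hypothetical, and the property usage reflects the 2016 slide, so current Wikidata content may differ. The code only builds the request URL; it does not contact the service.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"


def interactors_query(qid, prop="P129"):
    """SPARQL text: all items that `qid` points to via `prop`, with English labels."""
    return (
        "SELECT ?partner ?partnerLabel WHERE {\n"
        f"  wd:{qid} wdt:{prop} ?partner .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def sparql_url(query):
    """URL-encode the query for a GET request that returns JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = interactors_query("Q13561329")
url = sparql_url(query)
print(query)
print(url)
```

Changing `prop` to another property ID from the slide (for example P1910) reuses the same pattern for a different relationship.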
Question: Can a group of non-scientists collectively perform concept recognition in biomedical texts?
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of “disease concepts”
F = 0.87
F = 0.78
$$$
Experts versus crowd for concept identification
593 PubMed abstracts
6,900 mentions of “disease concepts”
F = 0.87
F = 0.87
$$$
• 9 days
• 145 workers
• Total: $630.96
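For readers unfamiliar with the F values above, here is a minimal sketch of how such a score can be computed by comparing one set of annotated mentions against a reference set. It assumes F means the balanced F-measure (harmonic mean of precision and recall) with exact-match spans; the slides do not state the matching rule, and the mention tuples below are made up for illustration.

```python
def f_score(gold, predicted):
    """Balanced F-measure between two sets of annotated mentions."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)  # share of predictions that are right
    recall = true_pos / len(gold)          # share of gold mentions that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical mentions as (document id, start offset, end offset).
gold = {("doc1", 0, 12), ("doc1", 40, 52), ("doc2", 5, 19), ("doc2", 60, 71)}
crowd = {("doc1", 0, 12), ("doc1", 40, 52), ("doc2", 5, 19), ("doc2", 80, 90)}

score = f_score(gold, crowd)
print(f"F = {score:.2f}")
```

Three of four crowd mentions match three of four gold mentions here, so precision and recall are both 0.75.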
Paid crowdsourcing ($$$)
• F = 0.87
• 9 days
• 145 workers
• Total: $630.96

Citizen Science (“Help science, please”)
• F = 0.84
• 28 days
• 212 workers
• Total cost: $0
A preliminary view of the NGLY1-focused biological network
1,200 contributors
3,200 documents
787,400 annotations
Personalized medicine relies on effective KNOWLEDGE MANAGEMENT
Pietro Bellini, https://flic.kr/p/k5jmja
If I have seen further than others, it is by standing on the shoulders of giants.
- Sir Isaac Newton
Gene Wiki
Ben Good, Sebastian Burgstaller, Tim Putman, Núria Queralt Rosinach, Julia Turner, Andra Waagmeester

BioThings API
Chunlei Wu, Julee Adesara, Cyrus Afrasiabi, Sebastien Lelong, Mike Mayers, Kevin Xin

Mark2Cure
Jennifer Fouquier, Max Nanis, Ginger Tsueng

Other group members
Jake Bruggeman, Karthik G, Ramya Gamini, Louis Gioia, Toby Li, Greg Stupp

Matt and Cristina Might
NGLY1 community
AMT volunteers and Mark2Curators!

Funding and Support
BioGPS: GM83924
Gene Wiki: GM089820
BD2K COE: GM114833

Contact
http://sulab.org
[email protected]
@andrewsu

Slides: slideshare.net/andrewsu
Icon credits (Noun Project, Wikimedia Commons): Zach VanDeHey, hunotika, Viktorvoigt, Alberto Rojas, Lloyd Humphreys
Why do I Mark2Cure?
I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.
My 4 year old daughter Phoebe is living with and battling rare disease.
I have Ehlers Danlos Syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.
Take part in something that helps humanity.
I Mark2Cure in memory of my son Mike who had type 1 diabetes.
Studied biology in college and I really miss it!
In memory of my daughter who had Cystic Fibrosis
Give back