Building a massive biomedical knowledge graph with citizen science
Benjamin Good
The Scripps Research Institute @bgood
Not paying attention? Be a citizen scientist at http://mark2cure.org
High-level goal: improve access to published knowledge
[Figure: articles added to PubMed per year]
One new article every 30 seconds: more than a million a year
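The two figures on this slide are consistent. The quick check below assumes a perfectly steady rate of one article every 30 seconds and a 365-day year. Real indexing arrives in batches, so this only confirms the order of magnitude.

```python
# Rough check of the slide's rate: one new PubMed article every 30 seconds.
SECONDS_PER_ARTICLE = 30
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years

articles_per_year = SECONDS_PER_YEAR // SECONDS_PER_ARTICLE
print(f"{articles_per_year:,} articles per year")
```

At that rate the total comes to roughly 1.05 million articles a year, which matches "more than a million".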
[Figure: knowledge graph with node types Chemicals & drugs, Genes, Organisms, Area of study, Biological Process]
[Figure: knowledge graph built automatically ("Auto!") from ~10,000 articles, connecting the Ngly1 gene through "?" to a "New drug candidate?"]
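The figure's idea, following links from a gene to a possible drug, can be sketched as a search over a typed graph. This is a minimal illustration only. Every node except NGLY1 is a hypothetical placeholder, and the edges are invented for the example rather than taken from the ~10,000-article graph. Breadth-first search returns one shortest path to each reachable node of the requested type.

```python
from collections import deque

# Hypothetical toy graph: all nodes except NGLY1 are placeholders, not findings.
# Each node carries one of the slide's concept types.
NODE_TYPES = {
    "NGLY1": "Gene",
    "ProcessP": "Biological Process",
    "GeneG": "Gene",
    "CompoundC": "Chemicals & drugs",
}

# Undirected edges, standing in for relations extracted from articles.
EDGES = [
    ("NGLY1", "ProcessP"),
    ("ProcessP", "GeneG"),
    ("GeneG", "CompoundC"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def paths_to_type(adj, types, start, target_type, max_hops=3):
    """BFS: one shortest path from start to each reachable node of target_type."""
    found = []
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node != start and types.get(node) == target_type:
            found.append(path)
            continue  # stop extending once a target is reached
        if len(path) - 1 >= max_hops:
            continue  # do not search beyond max_hops edges
        for nxt in sorted(adj.get(node, ())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return found


adj = build_adjacency(EDGES)
for path in paths_to_type(adj, NODE_TYPES, "NGLY1", "Chemicals & drugs"):
    print(" -> ".join(path))
```

Capping the number of hops keeps the candidate list short. In a graph of real size, a long chain of weak links is a poor lead.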
Knowledge graph problems
• Assigning meaning to relations
• Incorrect relations
• Missing relations
• …
Facts of life in computer processing of human language
• False positives and false negatives are always present
• Human annotators remain the gold standard
• There are not nearly enough professional human annotators to process every published document
Observations
• There are about 2.92 billion Internet users
• Many of them can read English
Source: http://www.statista.com/statistics/273018/number-of-internet-users-worldwide/
Hypothesis
• We can generate the equivalent of a massive number of professional annotators by aggregating the labor of many non-professional CITIZEN SCIENTISTS!
Building a Knowledge Graph
1. Find mentions of concepts in text
2. Identify relationships between concepts
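Here is a deliberately crude sketch of the two steps, for contrast with human annotation. It assumes a small hypothetical dictionary (LEXICON) for step 1 and treats "mentioned in the same sentence" as a candidate relation for step 2. Neither is the Mark2Cure method. Dictionary matching and co-occurrence are the simple baselines that produce the false positives and false negatives listed earlier.

```python
import re
from itertools import combinations

# Hypothetical mini-dictionary mapping surface strings to concept types.
LEXICON = {
    "CDK9": "Gene",
    "leukemia": "Disease",
    "acute myeloid leukemia": "Disease",
}


def find_mentions(sentence, lexicon):
    """Step 1: (start, end, text, type) for dictionary hits, preferring longer terms."""
    mentions = []
    taken = set()  # character positions already covered by a mention
    for term in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            span = set(range(m.start(), m.end()))
            if span & taken:
                continue  # skip hits nested inside a longer mention already found
            taken |= span
            mentions.append((m.start(), m.end(), m.group(0), lexicon[term]))
    return sorted(mentions)


def candidate_relations(sentence, lexicon):
    """Step 2 (crude): propose a relation for each pair of distinct concepts in a sentence."""
    concepts = sorted({(text, kind) for _, _, text, kind in find_mentions(sentence, lexicon)})
    return list(combinations(concepts, 2))


# A made-up example sentence, used only to exercise the code.
s = "Inhibition of CDK9 was tested in acute myeloid leukemia samples."
print(find_mentions(s, LEXICON))
print(candidate_relations(s, LEXICON))
```

Co-occurrence says nothing about what the relation means. That gap is the "assigning meaning to relations" problem, and it is why the relation step also benefits from human judgment.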
Before we try with citizen scientists…
• Can non-scientists collectively identify concepts in biomedical texts with high quality?
• We used the Amazon Mechanical Turk (AMT) crowdsourcing platform to answer this question
Highlight the “disease”.
The answer was yes
• By combining the responses of multiple non-professional members of ‘the crowd’, we achieved quality equivalent to that of professional annotators
Good et al. “Microtask crowdsourcing for disease mention annotation in PubMed abstracts.” Pacific Symposium on Biocomputing 2015
http://psb.stanford.edu/psb-online/proceedings/psb15/good.pdf
Mark2Cure.org
Same task, different context
Experiment 1 in progress: evaluating the quality and quantity of volunteer annotators
The goal is to complete about 600 abstracts, with 15 volunteers annotating each abstract
Almost there!
[Figure: Mark2Cure experiment 1, tasks (/10) and new users over time, with events marked: launch, tweet, blog post, San Diego Union-Tribune article]
As of 11:00 am, Feb. 9: 5,423 tasks complete
230 signups; 130 have completed at least one task
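The progress figures can be put in proportion with some quick arithmetic. This assumes that one "task" is one volunteer annotating one abstract. The slides do not define a task, so the target of 9,000 is our assumption.

```python
# Experiment 1 targets, from the slides.
ABSTRACTS = 600
VOLUNTEERS_PER_ABSTRACT = 15

# Progress snapshot reported at 11:00 am, Feb. 9.
tasks_complete = 5423
signups = 230
active_users = 130  # signed-up users who completed at least one task

# Assumption: one task = one volunteer annotating one abstract.
target_tasks = ABSTRACTS * VOLUNTEERS_PER_ABSTRACT
print(f"target: {target_tasks:,} tasks")
print(f"progress: {tasks_complete / target_tasks:.1%}")
print(f"signups who completed a task: {active_users / signups:.1%}")
print(f"mean tasks per active user: {tasks_complete / active_users:.1f}")
```

Under that assumption, the snapshot is about 60% of the target, and each active volunteer has done about 42 tasks on average.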
Next steps
• Implement and test a relation extraction workflow
• Start disease-focused knowledge capture missions
• First disease: NGLY1 deficiency
• http://ngly1.org
Thanks to the mark2cure team!
Max Nanis
Andrew Su
@bgood
Ginger Tsueng
Chunlei Wu
Thank you to the citizen scientists making this possible!
Why do I Mark2Cure?
In memory of my daughter, who had cystic fibrosis.
Studied biology in college and I really miss it!
My 4-year-old daughter Phoebe is living with and battling a rare disease.
I have Ehlers-Danlos syndrome. I hope to help people learn about this painful and debilitating disorder, so that others like me can receive more effective medical care.
I am retired, have a doctorate in medical humanities, and have two children with Gaucher disease. I am just looking for some way to put my education to use.
To give back
I Mark2Cure in memory of my son Mike who had type 1 diabetes.
Take part in something that helps humanity.
Increase precision with voting
[Figure: crowd highlights on the abstract below at vote thresholds K=1 (1 or more votes), K=2, K=3 and K=4]
This molecule inhibits the growth of a broad panel of cancer cell lines, and is particularly efficacious in leukemia cells, including orthotopic leukemia preclinical models as well as in ex vivo acute myeloid leukemia (AML) and chronic lymphocytic leukemia (CLL) patient tumor samples. Thus, inhibition of CDK9 may represent an interesting approach as a cancer therapeutic target especially in hematologic malignancies.
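Voting with a threshold K can be sketched as follows. This assumes votes are counted per token (each annotator contributes the set of token positions they highlighted). The slides do not say whether voting was done per token or per span. The four annotators below are hypothetical. Raising K keeps only text that more people agreed on, so precision rises while some true mentions drop out.

```python
from collections import Counter


def aggregate(annotations, k):
    """Keep every token highlighted by at least k annotators.

    annotations: one set of highlighted token indices per annotator.
    """
    votes = Counter(i for highlighted in annotations for i in highlighted)
    return {i for i, n in votes.items() if n >= k}


# Hypothetical highlights from four annotators on one sentence.
workers = [{2, 3, 4}, {3, 4}, {3, 4, 7}, {4}]
for k in range(1, 5):
    print(f"K={k}:", sorted(aggregate(workers, k)))
```

Tokens 2 and 7, each chosen by a single annotator, disappear at K=2. At K=4 only the token that everyone marked survives.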
Aggregation function
AMT results: 589 abstracts compared to the gold standard
F = 0.87 at K = 6
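Here is one way to score the aggregation function against a gold standard and choose K. It uses micro-averaged F1 over token sets, pooling true positives, false positives and false negatives across all abstracts. The slides do not specify micro vs. macro averaging or token vs. span matching, so both are assumptions. The two toy documents are hypothetical and are not the 589-abstract AMT data.

```python
from collections import Counter


def aggregate(annotations, k):
    """Tokens highlighted by at least k annotators (simple voting)."""
    votes = Counter(i for highlighted in annotations for i in highlighted)
    return {i for i, n in votes.items() if n >= k}


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when nothing is correct."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sweep_k(docs, max_k):
    """Micro-averaged F1 for each threshold k, pooling counts over all documents.

    docs: list of (annotations, gold) pairs, where gold is the set of token
    indices in the professional (gold-standard) annotation.
    """
    scores = {}
    for k in range(1, max_k + 1):
        tp = fp = fn = 0
        for annotations, gold in docs:
            predicted = aggregate(annotations, k)
            tp += len(predicted & gold)
            fp += len(predicted - gold)
            fn += len(gold - predicted)
        scores[k] = f1(tp, fp, fn)
    return scores


# Hypothetical toy data: two documents, four annotators each.
docs = [
    ([{2, 3, 4}, {3, 4}, {3, 4, 7}, {4}], {3, 4}),
    ([{0, 1}, {1}, {1, 5}, {1, 5}], {1, 5}),
]
scores = sweep_k(docs, max_k=4)
best = max(scores, key=scores.get)
for k, f in scores.items():
    print(f"K={k}: F={f:.3f}")
print("best K:", best)
```

Low K lets in stray highlights (false positives) and high K drops real mentions (false negatives). The best threshold balances the two, and with more annotators per abstract it can sit higher, as in the reported K = 6.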