NCBO Webinar: Translating unstructured, crowdsourced content into structured data
DESCRIPTION
Crowdsourcing in biology is gaining popularity as a way to tackle challenges of massive scale. However, to maximize participation and lower the barriers to entry, contributions to crowdsourcing efforts are typically not well structured, which makes computing on these data difficult. The presentation will discuss strategies for translating this unstructured content into structured data. Three vignettes (in varying degrees of completion) will be described, one each from our Gene Wiki [1], BioGPS [2], and serious gaming [3] initiatives.
[1]: http://en.wikipedia.org/wiki/Portal:Gene_Wiki
[2]: http://biogps.org
[3]: http://genegames.org
TRANSCRIPT
![Page 1: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/1.jpg)
Translating unstructured, crowdsourced content into structured data
Andrew Su, Ph.D.
The Scripps Research Institute
NCBO Webinar
February 20, 2013
![Page 2: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/2.jpg)
Human genetics underlies human health
~3 billion bases
~20,000 genes
Molecular diagnostics & therapeutics
Molecular understanding of:
• Biological function
• Genetic variation
• Mutation
• Deletion
• Amplification
• …
Gene annotations
Structured gene annotations
![Page 3: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/3.jpg)
Structured gene annotations enable computation
Structured gene annotations
![Page 4: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/4.jpg)
Few genes are well annotated
[Figure: GO annotation counts (y-axis) for genes sorted by decreasing counts (x-axis); 20,473 protein-coding genes. Percentage labels: 41%, 65%. Labeled genes: CTNNB1, VEGFA, SIRT1, FGFR2, TGFB1, TP53, MEF2C, BMP4, LEF1, WNT5A, TNF.]
Data: NCBI, February 2013
![Page 5: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/5.jpg)
Few genes are well annotated
[Figure: GO annotation counts (y-axis) for genes sorted by decreasing counts (x-axis), + Electronic annotation (IEA)]
Data: NCBI, February 2013
![Page 6: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/6.jpg)
Few genes are well annotated
[Figure: GO annotation counts (y-axis) for genes sorted by decreasing counts (x-axis), + Electronic annotation (IEA), Biological Process only]
Data: NCBI, February 2013
![Page 7: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/7.jpg)
311,696 articles (1.5% of PubMed) have been cited by GO annotations
![Page 8: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/8.jpg)
Sooner or later, the research community will need to be involved in the annotation effort to scale up to the rate of data generation.
![Page 9: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/9.jpg)
Crowdsourcing empowers the entire scientific community to directly participate in the gene annotation process.
![Page 10: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/10.jpg)
From crowdsourcing to structured data
The Gene Wiki
GeneGames.org
![Page 11: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/11.jpg)
10,000 gene “stubs” within Wikipedia
Protein structure
Symbols and identifiers
Tissue expression pattern
Gene Ontology annotations
Links to structured databases
Gene summary
Protein interactions
Linked references
Huss, PLoS Biol, 2008
![Page 12: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/12.jpg)
Gene Wiki has a critical mass of readers
Total: 4.0 million views / month
Huss, PLoS Biol, 2008; Good, NAR, 2011
![Page 13: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/13.jpg)
Gene Wiki has a critical mass of editors
Increase of ~10,000 words / month from >1,000 edits
Currently 1.42 million words
Approximately equal to 230 full-length articles
[Figure: editor count (Editors) and edit count (Edits)]
Good, NAR, 2011
![Page 14: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/14.jpg)
A review article for every gene is powerful
Hyperlinks to related concepts
References to the literature
Reelin: 68 editors, 543 edits since July 2002
Heparin: 175 editors, 320 edits since June 2003
AMPK: 44 editors, 84 edits since March 2004
RNAi: 232 editors, 708 edits since October 2002
![Page 15: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/15.jpg)
Filtering, extracting, and summarizing PubMed
Documents
Concepts
![Page 16: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/16.jpg)
Document- and concept-centric text mining
Subject Object
Predicate
![Page 17: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/17.jpg)
Simple text mining for gene annotations
Wikilink
GO exact match
Gene Wiki mapping
NCBI Entrez Gene: 334
Candidate assertion
GO:0006897
6319 novel Gene Ontology annotations
2147 novel Disease Ontology annotations
Good, BMC Genomics, 2011.
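The pipeline on this slide (wikilink, exact match to a GO term, mapping of the Gene Wiki page to an Entrez Gene ID, candidate assertion) can be sketched roughly as below. This is a minimal illustration and not the published method: the `GO_TERMS` lookup, the `candidate_assertions` function, and the sample wikitext are assumptions made for the example. Only Entrez Gene 334 and GO:0006897 come from the slide.

```python
import re

# Hypothetical lookup from lower-cased GO term names to GO identifiers.
GO_TERMS = {"endocytosis": "GO:0006897"}

# Captures the link target of [[target]], [[target|label]] or [[target#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def candidate_assertions(entrez_gene_id, wikitext, go_terms=GO_TERMS):
    """Return (gene, GO id) pairs for wikilinks whose target exactly matches a GO term name."""
    found = []
    for target in WIKILINK.findall(wikitext):
        go_id = go_terms.get(target.strip().lower())
        if go_id and (entrez_gene_id, go_id) not in found:
            found.append((entrez_gene_id, go_id))
    return found

# Invented wikitext for a page assumed to be mapped to Entrez Gene 334.
sample = "The protein takes part in [[endocytosis]] and binds [[heparin|heparin sulfate]]."
print(candidate_assertions("334", sample))
```

Exact matching on link targets keeps the sketch simple; it ignores synonyms and redirects, which a real pipeline would need to handle.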
![Page 18: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/18.jpg)
Gene Wiki content improves enrichment analysis
[Figure: p-value (PubMed only) plotted against p-value (PubMed + GW); regions labeled "More significant PubMed + GW" and "More significant PubMed only"; highlighted term: Muscle contraction]
Good, BMC Genomics, 2011.
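Enrichment p-values of the kind compared on this slide are commonly computed with a one-sided hypergeometric test. The sketch below assumes that test; the slide does not state which statistic was used, and every count in the example is invented for illustration.

```python
from math import comb

def hypergeom_pvalue(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn from `population`,
    of which `annotated` carry the term of interest."""
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Invented counts: adding annotations changes both the number of annotated
# genes in the background and the overlap with the gene list being tested.
p_pubmed_only = hypergeom_pvalue(20000, 100, 50, 3)
p_pubmed_plus_gw = hypergeom_pvalue(20000, 120, 50, 6)
print(p_pubmed_only, p_pubmed_plus_gw)
```

A smaller p-value after adding annotations corresponds to a point in the "more significant with PubMed + GW" region of the plot.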
![Page 19: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/19.jpg)
Gene Wiki+ for integrative queries
http://genewikiplus.org
mwsync
Good, J Biomed Semantics, 2012.
![Page 20: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/20.jpg)
Dynamic queries across genes, diseases, SNPs
Good, J Biomed Semantics, 2012.
![Page 21: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/21.jpg)
Gene Wiki+ for integrative queries
http://genewikiplus.org
mwsync
{{#ask: [[Category:Human_proteins]] [[is_associated_with:: <q>[[Category:Breast_cancer]]</q>]] [[HasSNP:: <q>[[is_associated_with:: <q>[[Category:Breast_cancer]]</q>]] </q>]]}}
…
OMIM, PharmGKB
Good, J Biomed Semantics, 2012.
![Page 22: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/22.jpg)
Gene Wiki+ for integrative queries
OMIM, PharmGKB
http://genewikiplus.org
mwsync
Good, J Biomed Semantics, 2012.
![Page 23: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/23.jpg)
Wikidata
Provide a database of the world’s knowledge that anyone can edit
- Denny Vrandečić
![Page 24: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/24.jpg)
Wikidata
[Figure: statements about Reelin (Q414043) shown as a graph.
Properties: is a (Property:P31), regulates (Property:P128), interacts with (Property:P129).
Items: Protein (Q8054), Glycoprotein (Q187126), Neural development (Q1345738), VLDL receptor (Q1979313), Amyloid precursor protein (Q423510).]
http://www.wikidata.org/wiki/Q414043
![Page 25: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/25.jpg)
Wikidata
[Figure: the graph shown by identifier only. Properties: Property:P31, Property:P128, Property:P129. Items: Q8054, Q187126, Q1345738, Q1979313, Q423510, Q414043.]
http://wikidata.org/w/api.php?action=wbgetentities&ids=Q414043&languages=en
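A request like the one above can be built and its result unpacked along these lines. This is a sketch under stated assumptions: the `format=json` parameter and the `https://www.wikidata.org` host are additions not shown on the slide, the response shape (`entities`, `claims`, `mainsnak`, `datavalue`, `numeric-id`) is assumed from the public API, and `SAMPLE` is a hand-written fragment rather than a fetched response. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"  # assumed endpoint

def entity_url(item_id, language="en"):
    """Build a wbgetentities request URL for one item."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "languages": language,
        "format": "json",  # assumption: ask for a JSON response
    }
    return API + "?" + urlencode(params)

def item_statements(payload, item_id):
    """Map each property id to the item ids it points at (item-valued claims only)."""
    claims = payload["entities"][item_id].get("claims", {})
    result = {}
    for prop, statements in claims.items():
        for statement in statements:
            value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
            if isinstance(value, dict) and "numeric-id" in value:
                result.setdefault(prop, []).append("Q%d" % value["numeric-id"])
    return result

# Hand-written fragment mirroring one statement from the slide (P31 -> Q8054).
SAMPLE = json.loads(
    '{"entities": {"Q414043": {"claims": {"P31": [{"mainsnak": '
    '{"datavalue": {"value": {"entity-type": "item", "numeric-id": 8054}}}}]}}}}'
)
print(entity_url("Q414043"))
print(item_statements(SAMPLE, "Q414043"))
```

Keeping URL construction separate from parsing means the parsing step can be exercised without network access.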
![Page 26: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/26.jpg)
Wikidata
http://www.wikidata.org/wiki/Wikidata:Molecular_Biology_task_force
![Page 27: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/27.jpg)
Wikidata
http://www.wikidata.org/wiki/Wikidata:Molecular_Biology_task_force
![Page 28: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/28.jpg)
From crowdsourcing to structured data
The Gene Wiki
GeneGames.org
![Page 29: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/29.jpg)
Not just the biomedical literature…
![Page 30: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/30.jpg)
BioGPS aggregates gene-centric information
http://biogps.org
Wu, NAR, 2013; Wu, Genome Biology, 2009.
![Page 31: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/31.jpg)
The plugin interface is simple and universal
KEGG: http://www.genome.jp/dbget-bin/www_bget?hsa:{{EntrezGene}}
STRING: http://string-db.org/newstring_cgi?...&identifier={{EnsemblGene}}
Pubmed: http://www.ncbi.nlm.nih.gov/sites/entrez?...&Term={{Symbol}}
[Figure labels: URL template, Gene entity, Rendered URL]
Wu, NAR, 2013; Wu, Genome Biology, 2009.
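The idea of combining a URL template with a gene entity to get a rendered URL can be sketched as follows. The function name `render_plugin_url` and the example gene record are assumptions for illustration; this is not BioGPS's actual implementation. The KEGG template is the one shown on the slide.

```python
import re

# Matches placeholders of the form {{Name}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render_plugin_url(template, gene):
    """Replace each {{Key}} in the template with the matching identifier of the gene."""
    def substitute(match):
        key = match.group(1)
        if key not in gene:
            raise KeyError("gene entity has no identifier named %r" % key)
        return str(gene[key])
    return PLACEHOLDER.sub(substitute, template)

# Illustrative gene entity; the values are chosen for the example only.
gene = {"EntrezGene": 1017, "Symbol": "CDK2"}
kegg = "http://www.genome.jp/dbget-bin/www_bget?hsa:{{EntrezGene}}"
print(render_plugin_url(kegg, gene))
```

Raising on a missing identifier, rather than leaving the placeholder in the URL, makes it obvious when a plugin cannot be rendered for a given gene. The sketch does not URL-encode substituted values.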
![Page 32: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/32.jpg)
The plugin interface is simple and universal
![Page 33: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/33.jpg)
The plugin interface is simple and universal
![Page 34: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/34.jpg)
The plugin interface is simple and universal
![Page 35: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/35.jpg)
The plugin interface is simple and universal
![Page 36: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/36.jpg)
The plugin interface is simple and universal
Total of 389 gene-centric online databases registered as BioGPS plugins
![Page 37: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/37.jpg)
BioGPS has a critical mass of users
• > 4100 registered users
• 4000 unique visitors per week
• 40,000 page views per week
Top 10 organizations
1. Harvard
2. NIH
3. UCSD
4. Scripps
5. MIT
6. Cambridge
7. U Penn
8. Stanford
9. Wash U
10. UNC
[Figure: Daily pageviews]
Wu, NAR, 2013; Wu, Genome Biology, 2009.
![Page 38: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/38.jpg)
All resources should provide RDF…
![Page 39: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/39.jpg)
Mining structured content from HTML
![Page 40: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/40.jpg)
Defining a data extraction template
…
TP53 TNF APOE IL6 VEGF …EGFR TGFB1
![Page 41: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/41.jpg)
The BioGPS Semantic Annotator
http://54.244.135.254:8000/
![Page 42: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/42.jpg)
From crowdsourcing to structured data
The Gene Wiki
GeneGames.org
![Page 43: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/43.jpg)
http://www.flickr.com/photos/archana3k1/4124330493/
Seven million human hours
![Page 44: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/44.jpg)
Twenty million human hours
http://www.flickr.com/photos/ableman/2171326385/
![Page 45: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/45.jpg)
150 billion human hours per year
http://www.flickr.com/photos/rvp-cw/6243289302/
![Page 46: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/46.jpg)
Using games to fold proteins
Fold.it players have successfully:
• Outperformed state-of-the-art protein folding algorithms (Cooper, Nature, 2010)
• Solved a previously intractable crystal structure (Khatib, Nat Struct Mol Biol, 2011)
• Designed an improved protein folding algorithm (Khatib, PNAS, 2011)
• Improved enzyme activity of a de novo designed enzyme (Eiben, Nat Biotechnol, 2011)
![Page 48: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/48.jpg)
Using games to align sequences
http://phylo.cs.mcgill.ca
Kawrykow, PLOS ONE, 2012.
![Page 50: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/50.jpg)
No good gene-disease annotation database
• Alzheimer's disease (AD)
• Lipoprotein glomerulopathy
• Sea-blue histiocyte disease
Query: Apolipoprotein E
![Page 51: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/51.jpg)
No good gene-disease annotation database
• Alzheimer's disease (AD)
• Lipoprotein glomerulopathy
• Sea-blue histiocyte disease
• Hyperlipoproteinemia, type III
• Macular degeneration, age-related
• Myocardial infarction susceptibility
Query: Apolipoprotein E
![Page 52: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/52.jpg)
No good gene-disease annotation database
• Alzheimer's disease (AD)
• Lipoprotein glomerulopathy
• Sea-blue histiocyte disease
• Hyperlipoproteinemia, type III
• Macular degeneration, age-related
• Myocardial infarction susceptibility
• HIV
• Psoriasis
• Vascular Diseases
Query: Apolipoprotein E
[Slide annotation: five "?" markers]
![Page 53: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/53.jpg)
No good gene-disease annotation database
Query: Apolipoprotein E
• Alzheimer's disease (AD)
• Neuropsychological Tests
• Cognition Disorders
• Dementia
• Cognition
• Disease Progression
• Cardiovascular Diseases
• Coronary Disease
• Diabetes Mellitus, Type 2
• Memory Disorders
• Memory
• Coronary Artery Disease
• Hypertension
• Mental Status Schedule
• Psychiatric Status Rating Scales
• Hyperlipidemias
• Atrophy
• Dementia, Vascular
• Parkinson Disease
• Brain Injuries
• Myocardial Infarction
• …
477 diseases!
![Page 54: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/54.jpg)
Play Dizeez to annotate gene-disease links
1. Read the clue (gene)
2. Click the related disease (only one is “right”)
3. If it’s ‘right’, you get points
4. Then on to the next question…
5. Hurry!
6. Play to win!
![Page 55: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/55.jpg)
Dizeez players seem pretty smart…
In total (since Dec 2011):
• 230 unique gamers
• 1045 games played
• 8525 guesses

| # Occurrences | Gene | Disease |
| --- | --- | --- |
| 11 | NBPF3 | neuroblastoma |
| 11 | SOX8 | mental retardation |
| 9 | ABL1 | leukemia |
| 9 | SSX1 | synovial sarcoma |
| 8 | APC | colorectal cancer |
| 8 | FES | sarcoma |
| 8 | RBP3 | retinoblastoma |
| 8 | GAST | gastrinoma |
| 8 | DCC | colorectal cancer |
| 8 | MAP3K5 | cancer |

Source columns on the slide: Gene Wiki, OMIM, PharmGKB, PubMed
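An occurrence count like the one tabulated on this slide can be produced by tallying how often players pick the same gene-disease pair. The sketch below is one assumed way to do that; the function names and the sample guesses are invented for illustration and are not Dizeez's actual data or code.

```python
from collections import Counter

def tally_guesses(guesses):
    """Count how often each (gene, disease) pair was chosen, ignoring disease capitalisation."""
    return Counter((gene, disease.lower()) for gene, disease in guesses)

def frequent_pairs(guesses, min_count):
    """Return pairs chosen at least `min_count` times, most frequent first."""
    return [(pair, n) for pair, n in tally_guesses(guesses).most_common() if n >= min_count]

# Invented guesses, one tuple per answered question.
guesses = [
    ("ABL1", "leukemia"),
    ("ABL1", "Leukemia"),
    ("APC", "colorectal cancer"),
]
print(frequent_pairs(guesses, min_count=2))
```

Requiring a minimum number of independent votes is a simple way to filter out one-off guesses before comparing the pairs against existing databases.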
![Page 56: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/56.jpg)
Using games to predict phenotype from genotype?
http://genegames.org
![Page 57: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/57.jpg)
Classification problems in genome biology
[Figure: data matrix of 100s of samples by 100,000s of features, with samples labeled cancer or normal. Find patterns, then classify new samples as cancer or normal.]
Methods: SVM, Neural networks, Naïve Bayes, KNN, …
![Page 58: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/58.jpg)
Random forests
Sample subset of cases and features
Train decision tree
[Figure: data matrix of 100s of samples by 100,000s of features; samples labeled cancer or normal]
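The two steps on this slide (sample a subset of cases and features, then train a decision tree) can be sketched in plain Python as below. To keep the example short, each "tree" is a one-split decision stump, which is a simplification and not a full random forest; the function names, parameters, and toy data are all assumptions made for illustration.

```python
import random
from collections import Counter

def train_stump(rows, labels, features):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]

def train_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Train each stump on a bootstrap sample of cases and a random subset of features."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        cases = [rng.randrange(n) for _ in range(n)]          # sample cases with replacement
        feats = rng.sample(range(len(rows[0])), n_features)   # sample a subset of features
        forest.append(train_stump([rows[i] for i in cases], [labels[i] for i in cases], feats))
    return forest

def predict(forest, row):
    """Classify a new sample by majority vote over all stumps."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

# Toy data: 8 samples x 3 features (real data would have far more features).
rows = [[8, 9, 7], [9, 8, 8], [7, 9, 9], [8, 8, 8],
        [1, 2, 1], [2, 1, 2], [1, 1, 1], [2, 2, 2]]
labels = ["cancer"] * 4 + ["normal"] * 4
forest = train_forest(rows, labels)
print(predict(forest, [10, 10, 10]), predict(forest, [0, 0, 0]))
```

Because every stump sees a different random slice of the data, no single feature or sample dominates the ensemble, which is the property the later slides build on when they change how features are sampled.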
![Page 59: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/59.jpg)
Random forests
[Figure: data matrix of 100s of samples by 100,000s of features; samples labeled cancer or normal]
![Page 60: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/60.jpg)
Random forests
Classify new samples: cancer or normal
[Figure: data matrix of 100s of samples by 100,000s of features; samples labeled cancer or normal]
How to interject biological knowledge?
![Page 61: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/61.jpg)
Network-guided forests
Dutkowski & Ideker (2011). PLoS Computational Biology
![Page 62: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/62.jpg)
Network-guided forests
Sample features by PPI network
Train decision tree
[Figure: data matrix of 100s of samples by 100,000s of features; samples labeled cancer or normal]
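One way to read "sample features by PPI network" is to pick features that are connected in the interaction network instead of sampling them independently. The sketch below does this with a randomized breadth-first walk from a random seed gene. It is a simplified stand-in and not the published algorithm of Dutkowski & Ideker; the function name and the toy network (with placeholder gene names) are assumptions.

```python
import random
from collections import deque

def sample_connected_features(ppi, k, rng):
    """Pick a random seed gene, then add network neighbours until k genes are collected."""
    seed = rng.choice(sorted(ppi))
    chosen = [seed]
    queue = deque([seed])
    while queue and len(chosen) < k:
        gene = queue.popleft()
        neighbours = [g for g in sorted(ppi.get(gene, ())) if g not in chosen]
        rng.shuffle(neighbours)
        for g in neighbours:
            if len(chosen) < k:
                chosen.append(g)
                queue.append(g)
    return chosen

# Toy undirected network with placeholder gene names; edges are invented.
ppi = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B"},
    "E": {"F"},
    "F": {"E"},
}
features = sample_connected_features(ppi, k=3, rng=random.Random(0))
print(features)
```

The returned genes would replace the random feature subset when training each tree. If the seed falls in a small network component, fewer than `k` genes are returned.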
![Page 63: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/63.jpg)
Human-guided forests
Sample features by human intelligence
Train decision tree
[Figure: data matrix of 100s of samples by 100,000s of features; samples labeled cancer or normal]
![Page 64: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/64.jpg)
![Page 65: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/65.jpg)
The Cure: Genomic predictors for disease
![Page 66: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/66.jpg)
The Cure: Genomic predictors for disease
![Page 67: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/67.jpg)
The Cure: Genomic predictors for disease
![Page 68: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/68.jpg)
The Cure: Genomic predictors for disease
![Page 69: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/69.jpg)
The Cure: Genomic predictors for disease
![Page 70: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/70.jpg)
The Cure: Genomic predictors for disease
![Page 71: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/71.jpg)
Human-guided forests
Classify new samples
cancer
normal
![Page 72: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/72.jpg)
“Critical Assessment”-style challenge
![Page 73: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/73.jpg)
Results
• 214 registered players
– 50% declared knowledge of cancer biology
– 40% self-identified as having Ph.D.
• Prediction results
– 70% correct on survival concordance index
– Best scoring model was 76%
– Player registrations still increasing!
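The prediction results above are reported on a survival concordance index. The slide does not show how the challenge computed it, so the sketch below assumes the common definition (Harrell's C): among comparable pairs of patients, the fraction in which the patient who died earlier was given the higher predicted risk, with ties in risk counted as half. The function name and the example inputs are illustrative.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style concordance index.

    times: follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: predicted risk (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i is observed to die before
            # patient j's last recorded follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Illustrative inputs: risks perfectly ordered against survival time.
print(concordance_index([1, 2, 3], [1, 1, 1], [3, 2, 1]))
```

Under this definition 0.5 corresponds to random ordering and 1.0 to a perfect ranking, which gives a scale for reading the 70% and 76% figures.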
![Page 74: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/74.jpg)
Crowdsourcing empowers the entire scientific community to directly participate in the gene annotation process.
![Page 75: NCBO Webinar: Translating unstructured, crowdsourced content into structured data](https://reader036.vdocuments.net/reader036/viewer/2022062307/554e806eb4c905f66a8b5485/html5/thumbnails/75.jpg)
Collaborators
Doug Howe, ZFIN
John Hogenesch, U Penn
Luca de Alfaro, UCSC
Angel Pizzaro, U Penn
Faramarz Valafar, SDSU
Pierre Lindenbaum, Fondation Jean Dausset
Michael Martone, Rush
Konrad Koehler, Karo Bio
Warren Kibbe, Simon Lim, Northwestern
Many Wikipedia editors
WP:MCB Project
Group members
Katie Fisch, Ben Good, Salvatore Loguercio, Max Nanis, Chunlei Wu
Key group alumni
Erik Clarke, Jon Huss, Marc Leglise, Maximilian Ludvigsson, Ian MacLeod, Camilo Orozco
Funding and Support
(BioGPS: GM83924, Gene Wiki: GM089820)
Contact
http://sulab.org
[email protected]
@andrewsu
+Andrew Su