HVP: Critical Assessment of Genome Interpretation
DESCRIPTION
Note: CAGI occurred in Dec 2010, after I left Berkeley. Susanna Repo made the event happen and it would not have occurred without her.
TRANSCRIPT
CAGI (\ˈkā-jē\)
Critical Assessment of Genome Interpretation
A community experiment to evaluate phenotype prediction
Reece Hart (with Steven Brenner and John Moult)
QB3 / Center for Computational Biology
UC [email protected]
Human Variome Project Meeting
Paris 2010-05-12
ca·gey \ˈkā-jē\ adjective
1: hesitant about committing oneself;
2a: wary of being trapped or deceived;
2b: marked by cleverness
The Significance of “Variants of Uncertain Significance”
“VUS – Variant of uncertain significance. A variation in a genetic sequence whose association with disease risk is unknown. Also called variant of uncertain significance, variant of unknown significance, and unclassified variant.”
http://www.cancer.gov/cancertopics/genetics-terms-alphalist
The long tail of rare diseases.
“A rare disease typically affects a patient population estimated at fewer than 200,000 in the U.S. There are more than 6,000 rare diseases known today and they affect an estimated 25 million persons in the U.S.”
NIH Office of Rare Diseases Research
http://rarediseases.info.nih.gov/
Interpretation of Unclassified Variants
a sampling of responses from genetic counselors
➢ Routinely used
● dbSNP
● OMIM
● GeneReviews
● PolyPhen
● SIFT
● PubMed
● Mailing lists
➢ Selectively used
● PharmGKB
● LSDBs
● Domain prediction
● Structure impact analysis
● Homology
Genome Variant Impact Prediction Tools
an incomplete list
Program              URL
Align-GVGD           http://agvgd.iarc.fr/
AutoMute             http://proteins.gmu.edu/automute/
CUPSAT               http://cupsat.tu-bs.de/
Dmutant              http://sparks.informatics.iupui.edu/hzhou/mutation.html
nsSNPAnalyzer        http://snpanalyzer.uthsc.edu/
PantherPSEC          http://www.pantherdb.org/tools/csnpScoreForm.jsp
PhD-SNP              http://gpcr.biocomp.unibo.it/~emidio/PhD-SNP/PhD-SNP.htm
Pmut                 http://mmb2.pcb.ub.es:8080/PMut/
PolyPhen             http://coot.embl.de/PolyPhen/
SIFT                 http://sift.jcvi.org/
SNAP                 http://cubic.bioc.columbia.edu/services/snap/
SNP Function Pred.   http://www.ensembl.org/ [N.B. login required]
SNPinfo / FuncPred   http://snpinfo.niehs.nih.gov/snpfunc.htm
SNPs3D               http://snps3d.org/
UMD-predictor        http://www.umd.be/
Current methods are the tip of the iceberg.
[Figure: iceberg diagram. Labels: ~1%, ~99%; protein transcripts, non-protein transcripts, repeats, indels, epigenetics.]
Objectively Assessing Computational Predictions
Data Acquisition
Publication
The Prediction Window
~1-12 months when unpublished high-quality data are available
➢ CASP – Structure prediction
➢ CAPRI – Protein-protein docking
➢ EGASP – ENCODE gene annotation
➢ RGASP – RNA-Seq mapping
➢ DREAM – Network model assessment
➢ Follow the successful critical assessment framework:
● Solicit pre-publication genotype-phenotype associations
● Provide genomic data to predictors and collect their predictions
● Assess predictions against revealed annotations, mechanisms, and phenotypes
CAGI – Critical Assessment of Genome Interpretation
A community assessment of the state-of-the-art in phenotype prediction.
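The collect-then-assess loop described above can be sketched in a few lines. This is an illustration only, not the CAGI assessment procedure: the `Prediction` record, the `score_predictors` helper, the group and variant labels, and the use of simple accuracy on a binary "deleterious" call are all assumptions made for the example. Real assessments would likely use challenge-specific metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    predictor: str     # hypothetical group identifier
    variant: str       # hypothetical variant label
    deleterious: bool  # the group's blind call for this variant


def score_predictors(predictions, revealed):
    """Return {predictor: fraction of calls matching the revealed label}.

    Variants whose phenotype was never revealed are skipped.
    """
    correct, total = {}, {}
    for p in predictions:
        if p.variant not in revealed:
            continue
        total[p.predictor] = total.get(p.predictor, 0) + 1
        if p.deleterious == revealed[p.variant]:
            correct[p.predictor] = correct.get(p.predictor, 0) + 1
    return {name: correct.get(name, 0) / n for name, n in total.items()}


# Predictions are collected while the data are still unpublished ...
predictions = [
    Prediction("group_a", "var1", True),
    Prediction("group_a", "var2", False),
    Prediction("group_b", "var1", False),
    Prediction("group_b", "var2", False),
    Prediction("group_b", "var3", True),  # phenotype never revealed
]
# ... and scored only once the contributor reveals the phenotypes.
revealed = {"var1": True, "var2": False}

scores = score_predictors(predictions, revealed)
print(scores)  # {'group_a': 1.0, 'group_b': 0.5}
```

Keeping the revealed labels out of the predictors' hands until scoring is the point of the framework; the metric itself is interchangeable.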
Please contact us if you have pre-publication genotype-phenotype association data.
Sample Prediction Categories
Categories: Molecular, Organismal, Cellular
● MTHFR mutants – Yeast growth rates with various MTHFR mutations and [folate]. (Jasper Rine)
● Breast Cancer – Segregation of rare variants among 2500 cases and controls. (Sean Tavtigian)
● PGP100 – Unpublished phenotypes from PGP100 project. (George Church)
Census of Molecular Mechanisms
possible mechanisms of variant impact for WTCCC SNVs
Wellcome Trust Case Control Consortium. Nature. 2007;447(7145):661-78.
Contributors, Predictors, Assessors
an incomplete list of participants
Gad Getz
Pauline Ng
Sean Tavtigian
George Church
Marc Greenblatt
Jasper Rine
Rachel Karchin
Mauno Vihinen
Sample CAGI Timeline
[Timeline figure: weekly ticks from 05-03 through 01-31]
Data Gathering
Prediction Season
Assessment
Key Dates
▲ finalize data sources
▲ workshop
▲ release prospectus / rules
▲ open participant registration
Dates are for illustration – exact dates have not been set.
CAGI Summary
➢ CAGI will:
● objectively assess phenotype prediction methods
● inform future research directions
● introduce researchers in diverse fields
➢ CAGI is being planned for the end of 2010 or early 2011.
➢ Now seeking data contributors, assessors, and predictors.
➢ Feedback is sought! [email protected]
➢ See http://genomecommons.org/cagi for more information.
The Genome Commons:
A Flagship Project Within QB3
Reece Hart: Chief Scientist; UC Berkeley
Steven Brenner: Plant & Mol. Biology; UC Berkeley
Sandrine Dudoit: Biostatistics; UC Berkeley
Robert Nussbaum: Chief, Medical Genetics; UCSF
Jasper Rine: Genetics, Genomics & Dev; Chair, Computational Biology; UC Berkeley
Lior Pachter: Mathematics; Mol., Cell, Biol; UC Berkeley
Bernie Lo: Director, Medical Ethics; Department of Medicine; UCSF
Rasmus Nielsen, Michael I. Jordan, Ian Holmes, Kimmen Sjölander, Yun Song, Monty Slatkin, Terry Speed, Mark van der Laan, Richard Karp, Bernd Sturmfels, Steven Evans, Elizabeth Purdom, Haiyan Huang, Peter Bickel, Susan Marqusee, Michael Eisen, Lisa Barcellos, Rachel Brem, Tom Alber
Program in Translational Genomics