![Page 1: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/1.jpg)
SNP Discovery and Genotyping Workshop
• SNP discovery strategies (Debbie Nickerson)
• Identifying SNPs by association for genotype-phenotype analysis of candidate genes (Chris Carlson)
• Identifying haplotypes for genotype-phenotype analysis of candidate genes (Dana Crawford)
• SNP genotyping strategies (Debbie Nickerson)
![Page 2: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/2.jpg)
SNP Discovery and Genotyping Strategies
Debbie Nickerson
• Overview of Variation in the Human Genome
• SNP Discovery Strategies and Status
• SNP Data in the PGAs
• Genotyping SNPs
![Page 3: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/3.jpg)
Total sequence variation in humans
Population size: 6 × 10^9 (diploid)
Mutation rate: 2 × 10^-8 per bp per generation
Expected “hits”: 240 for each bp
Every variant compatible with life exists in the population
BUT: Most are vanishingly rare
Compare 2 haploid genomes: 1 SNP per 1331 bp*
*The International SNP Map Working Group, Nature 409:928 - 933 (2001)
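The "240 hits" figure follows directly from the numbers on this slide: people × 2 chromosome copies (diploid) × mutation rate per bp. A minimal sketch of that arithmetic, using only the slide's values (the function and variable names are illustrative):

```python
# Values taken from the slide; names are illustrative only.
population_size = 6e9        # people
copies_per_person = 2        # diploid genome
mutation_rate = 2e-8         # per bp per generation


def expected_hits_per_bp(n_people, ploidy, mu):
    """Expected number of new mutations at one base pair per generation."""
    return n_people * ploidy * mu


hits = expected_hits_per_bp(population_size, copies_per_person, mutation_rate)
print(round(hits))  # 240
```

With that many expected hits per site in every generation, essentially every viable single-base change is present somewhere in the population, which is the point of the slide.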
![Page 4: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/4.jpg)
Strategies to Find SNPs
• Mine them from Existing Genome Resources
• Targeted SNP Discovery in Candidate Genes
CardioGenomics - http://www.cardiogenomics.org
InnateImmunity - http://innateimmunity.net
Berkeley PGA - http://pga.lbl.gov
Southwestern - http://pga.swmed.edu
SeattleSNPs - http://pga.mbt.washington.edu
![Page 5: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/5.jpg)
Sequence Overlap SNP Discovery
Sequence-based SNP mining compares overlapping sequences from existing libraries:
• Genomic DNA: BAC library (BAC overlap); RRS library or sampling (shotgun overlap)
• mRNA: cDNA library (EST overlap)
Example of overlapping reads that differ at a single position (A/G):
GTTTAAATAATACTGATCA
GTTTAAATAATACTGATCA
GTTTAAATAGTACTGATCA
GTTTAAATAGTACTGATCA
~4.1 million SNPs available: http://www.ncbi.nlm.nih.gov/SNP/
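The overlap idea can be illustrated with the four reads shown on this slide: wherever already-aligned reads disagree in a column, that column is a candidate SNP. This is only a sketch under strong assumptions (reads are pre-aligned, equal length, and error-free); real mining pipelines also weigh base quality, which is not modelled here. The function name is hypothetical.

```python
from collections import Counter


def candidate_snps(aligned_reads):
    """Return (position, allele_counts) for every column with more than one base.

    Assumes the reads are already aligned and of equal length.
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    sites = []
    for pos in range(length):
        counts = Counter(read[pos] for read in aligned_reads)
        if len(counts) > 1:          # more than one allele seen in this column
            sites.append((pos, dict(counts)))
    return sites


reads = [
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAATACTGATCA",
    "GTTTAAATAGTACTGATCA",
    "GTTTAAATAGTACTGATCA",
]
print(candidate_snps(reads))  # [(9, {'A': 2, 'G': 2})]
```

Positions are 0-based, so the A/G difference is reported at index 9.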
![Page 6: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/6.jpg)
Mining Finds Only a Small Fraction of the SNPs
[Figure: fraction of SNPs discovered (0.0 to 1.0) versus minor allele frequency (0.0 to 0.5); curves labeled 2, 8, 16, 24, 48, and 96; example SNP alleles A and G]
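One way to read this figure, assuming the curve labels (2 to 96) are the number of chromosomes sampled (the transcript does not say so explicitly): a SNP is discovered only if both alleles appear in the sample. Under random sampling, that probability is 1 - p^n - (1 - p)^n for minor allele frequency p and n chromosomes. The sketch below is a textbook calculation, not necessarily how the original curves were produced.

```python
def detection_probability(maf, n_chromosomes):
    """Chance that both alleles of a SNP show up among n randomly sampled chromosomes."""
    only_minor = maf ** n_chromosomes          # every sampled copy is the minor allele
    only_major = (1.0 - maf) ** n_chromosomes  # every sampled copy is the major allele
    return 1.0 - only_minor - only_major


for n in (2, 8, 16, 24, 48, 96):
    row = ", ".join(
        f"MAF {maf:.2f}: {detection_probability(maf, n):.2f}"
        for maf in (0.01, 0.05, 0.10, 0.50)
    )
    print(f"n = {n:2d} -> {row}")
```

Comparing two chromosomes (n = 2), as in overlap-based mining, finds a SNP with 10% minor allele frequency less than one time in five, which is why mining recovers only a small fraction of the variation.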
![Page 7: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/7.jpg)
Total Estimated SNPs and Fraction in dbSNP

| Minimum allele frequency | Expected SNPs (millions) | Expected SNP frequency (1 per N bp) | Expected % in database |
|---|---|---|---|
| 1% | 11.0 | 290 | 11-12 |
| 5% | 7.1 | 450 | 15-17 |
| 10% | 5.3 | 600 | 18-20 |
| 20% | 3.3 | 960 | 21-25 |
| 30% | 2.0 | 1570 | 23-27 |
| 40% | 0.97 | 3280 | 24-28 |

L. Kruglyak and D. Nickerson, Nat Genet 27:234-236, 2001
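The "expected SNPs" and "expected SNP frequency" columns are two views of the same estimate: dividing the genome length by the spacing gives the SNP count. The check below assumes a genome of about 3.2 × 10^9 bp; that size is my assumption, not a value stated on the slide, but it reproduces the table to within rounding.

```python
GENOME_BP = 3.2e9  # assumed genome length; not given on the slide

# minimum allele frequency -> (expected SNPs in millions, 1 SNP per N bp), from the table
table = {
    0.01: (11.0, 290),
    0.05: (7.1, 450),
    0.10: (5.3, 600),
    0.20: (3.3, 960),
    0.30: (2.0, 1570),
    0.40: (0.97, 3280),
}

implied_millions = {
    maf: GENOME_BP / spacing_bp / 1e6 for maf, (_, spacing_bp) in table.items()
}

for maf, (reported, _) in table.items():
    print(f"MAF >= {maf:.0%}: reported {reported}, implied {implied_millions[maf]:.2f}")
```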
![Page 8: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/8.jpg)
Surfactant B - Locus Link
dbSNP (http://www.ncbi.nlm.nih.gov/SNP/)
![Page 9: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/9.jpg)
Surfactant B - dbSNP
![Page 10: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/10.jpg)
Confirmation of SNP Resource in New Sample: Potential Pitfalls
[Figure: bar chart (0% to 100%) by discovery method: BAC, RRS, EST, PCR, Other, Any Multiple Report, BRE Multiple Report. Legend: Confirmed Multiple Method Report in dbSNP; Confirmed Unique Method Report in dbSNP]
![Page 11: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/11.jpg)
Strategies to Find SNPs
• Mine them from Existing Resources
• Targeted SNP Discovery in Candidate Genes
CardioGenomics - http://www.cardiogenomics.org
InnateImmunity - http://innateimmunity.net
Berkeley PGA - http://pga.lbl.gov
Southwestern - http://pga.swmed.edu
SeattleSNPs - http://pga.mbt.washington.edu
![Page 12: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/12.jpg)
Sequence-based SNP Identification
• Amplify DNA (5’ to 3’)
• Sequence each end of the fragment
• Phred: base-calling, quality determination
• Phrap: contig assembly, final quality determination
• PolyPhred: polymorphism detection
• Consed: sequence viewing, polymorphism tagging
• Analysis: polymorphism reporting, individual genotyping, phylogenetic analysis
Example traces: homozygotes ATAGACG and ATACACG; heterozygote ATAGACG/ATACACG
![Page 13: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/13.jpg)
Sequence-Based Detection and Genotyping of SNPs
Jim Sloan, Tushar Bhangle (PolyPhred)
Matthew Stephens, Paul Scheet (Quality Scores for SNPs)
Phil Green, Brent Ewing, David Gordon (Phred, Phrap, Consed)
![Page 14: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/14.jpg)
![Page 15: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/15.jpg)
PGA SNPs
• The PGAs provide a validated SNP resource (Allele Frequency Data)
• Novel Views of the Variation Data Emerging
– Pathway Interfaces
– Color Fasta Formats
– Gene Structure Views
– Visual Genotypes
– Linkage Disequilibrium Views
– TagSNPs
– Haplotypes
• Many New Formats Under Development
![Page 16: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/16.jpg)
Toward comprehensive association studies
• 5-7 million common variants exist in genome
• Testing all for association is impractical today
• Can the list be reduced w/o loss of power?
– SNPs in Coding (Amino Acid Changes)
– Linkage disequilibrium (SNPs in other functional regions, i.e. regulatory elements)
![Page 17: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/17.jpg)
cSNPs - Both Deep and Average Coverage Available from the PGAs
CD36 - Southwestern PGA - Deep cSNP Discovery Strategy - Healthy, High Cholesterol, High Triglycerides, Congenital Cardiac Abnormalities, Left Ventricular Hypertrophy …
CD36 - SeattleSNPs PGA - Average cSNP Discovery Strategy - Healthy only
![Page 18: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/18.jpg)
SIFT (Sorting Intolerant From Tolerant) Coding Changes
CYP4F2
Trp (W) → Gly (G): predicted to be tolerated
Val (V) → Gly (G): predicted not to be tolerated
Ng and Henikoff, Genome Res. 2002
![Page 19: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/19.jpg)
SNP-Based Association Studies
[Figure: gene diagram (5’ to 3’) with example coding changes Arg-Cys and Val-Val]
Collins, Guyer, Chakravarti, Science 278:1580-81, 1997
Indirect: use a dense map of SNPs and test for linkage disequilibrium (use association to find functional sites anywhere in the sequence, including non-coding regions)
![Page 20: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/20.jpg)
SNP Discovery and Genotyping Workshop
• SNP discovery strategies (Debbie Nickerson)
• Identifying SNPs by association for genotype-phenotype analysis of candidate genes (Chris Carlson)
• Identifying haplotypes for genotype-phenotype analysis of candidate genes (Dana Crawford)
• SNP genotyping strategies (Debbie Nickerson)
![Page 21: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/21.jpg)
Selecting SNPs for Genotype-Phenotype Analysis
Using Allelic Association (Linkage Disequilibrium)
Christopher Carlson
![Page 22: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/22.jpg)
Candidate Gene Association Analysis
• Describe existing genetic variation
– Rare SNPs (deep exonic resequencing)
– Common SNPs (complete resequencing)
• Select a subset of SNPs for genotyping
– cSNPs (amino acid changes)
– htSNPs (resolve haplotypes)
– tagSNPs (patterns of genotype)
• Test for genotype/phenotype correlations
![Page 23: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/23.jpg)
SeattleSNPs Resequencing Strategy I
• Resequence the complete genomic region of each gene
– 2000 bp upstream of first exon
– 1500 bp downstream of poly-A signal
– All exons and introns for genes below 35 kbp
Image courtesy of GeneSNPs
![Page 24: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/24.jpg)
VG2
• Visual Genotype 2
– Web interface
– Visualize genotypes
– View SNPs by frequency
– Sort on similarity between sites
– Sort on similarity between samples
– Visualize LD
![Page 25: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/25.jpg)
SeattleSNPs Resequencing Strategy II
• Resequence candidate genes from inflammation and coagulation pathways
• Resequence 47 individuals
– 24 African American
– 23 European American
[Figure legend: homozygote common, heterozygote, homozygote rare, missing data]
![Page 32: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/32.jpg)
Preliminary Analyses
• Hardy Weinberg Equilibrium
• Population specificity
• Nucleotide diversity
• Pop genetics statistics (e.g. Tajima’s D)
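As an illustration of the first check in this list, a Hardy-Weinberg test compares observed genotype counts at one SNP with the counts expected from its allele frequencies. The sketch below is a generic textbook chi-square version, not the specific procedure used by the PGAs; the genotype counts are made up for the example.

```python
def hardy_weinberg_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for departure from Hardy-Weinberg.

    n_aa, n_ab, n_bb are observed counts of the two homozygotes and the heterozygote.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: one site in perfect equilibrium, one with no heterozygotes.
balanced = hardy_weinberg_chi_square(25, 50, 25)
no_hets = hardy_weinberg_chi_square(50, 0, 50)
print(balanced, no_hets)  # 0.0 100.0
```

A statistic above about 3.84 corresponds to p < 0.05 with one degree of freedom; a strong excess or deficit of heterozygotes often points to a genotyping problem rather than biology.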
![Page 33: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/33.jpg)
SNP Selection: cSNPs
• Genotype SNPs which change amino acids
• Genotype other “good story” SNPs
– SNPs in known regulatory elements
– SNPs in Conserved Noncoding Sequences
Image courtesy of GeneSNPs
![Page 34: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/34.jpg)
SNP Selection: htSNPs
• Genotype “haplotype tagging” SNPs which resolve existing common haplotypes
![Page 36: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/36.jpg)
SNP Selection: tagSNPs
• Resequence a modest number of samples
– Describe patterns of genotype at all common SNPs
– Genotype tagSNPs which efficiently capture existing patterns of genotype
![Page 37: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/37.jpg)
Linkage Disequilibrium
[Figure: two SNPs, A and B, on a chromosome]
Haplotype is the pattern of alleles on a single chromosome
– 4 possible haplotypes
Linkage Disequilibrium (LD) describes the allelic association between two SNPs
Two popular LD statistics: D′ and r²
![Page 38: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/38.jpg)
Complete LD
Unequal allele frequency
Allelic association is as strong as possible
– 3 haplotypes observed
– No detected recombination between SNPs
– Genotype is not perfectly correlated
D′ = 1, r² < 1
![Page 39: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/39.jpg)
Perfect LD
Equal allele frequency
Allelic association is as strong as possible
– 2 haplotypes observed
– No detected recombination between SNPs
– Genotype is perfectly correlated
D′ = 1, r² = 1
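The difference between complete and perfect LD is easy to see numerically. The sketch below computes D′ and r² from haplotype counts using the standard definitions (D = p_AB - p_A·p_B, D′ = |D| / D_max, r² = D² / (p_A·p_a·p_B·p_b)); the haplotype counts are invented for illustration and the function name is hypothetical.

```python
def ld_stats(hap_counts):
    """Return (D', r^2) for two biallelic SNPs.

    hap_counts maps the haplotypes 'AB', 'Ab', 'aB', 'ab' to observed counts.
    """
    n_AB = hap_counts.get("AB", 0)
    n_Ab = hap_counts.get("Ab", 0)
    n_aB = hap_counts.get("aB", 0)
    n_ab = hap_counts.get("ab", 0)
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = n_AB / n - p_A * p_B
    # D_max is the largest |D| the allele frequencies allow, given the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Complete LD: three haplotypes, unequal allele frequencies (made-up counts).
complete = ld_stats({"AB": 50, "aB": 20, "ab": 30})
# Perfect LD: two haplotypes, equal allele frequencies (made-up counts).
perfect = ld_stats({"AB": 60, "ab": 40})
print(complete, perfect)
```

In the first case D′ is 1 but r² is only about 0.43, so genotyping one SNP predicts the other imperfectly; in the second case both statistics are 1.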
![Page 40: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/40.jpg)
Select SNPs to genotype on the basis of LD
Rational SNP Selection
• Some SNPs are in LD with many other SNPs
• SNPs between a pair of associated SNPs are not necessarily associated with the flanking SNPs
• Some SNPs are in LD with no other SNPs
![Page 41: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/41.jpg)
LD SNP Selection Example
CSF3 in European Americans
• 5200 bp
• 17 SNPs
• 10 common SNPs (above 10% minor allele frequency)
![Page 43: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/43.jpg)
LD Site Selection Algorithm
• Find minimal set of SNPs for assay, such that each SNP is either assayed directly or above r² threshold with an assayed SNP
• Calculate all pairwise r² values
• Set r² threshold based on power estimates for study
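The selection step can be sketched as a greedy binning procedure: repeatedly pick the SNP that exceeds the r² threshold with the most still-uncovered SNPs, and put those SNPs in its bin. This is an assumption about how one might implement the idea, not a transcription of the authors' program; a greedy pass is a heuristic and is not guaranteed to find the true minimal set. The r² matrix below is hypothetical.

```python
def captured_by(candidate, remaining, r2, threshold):
    """SNPs in `remaining` covered if `candidate` is assayed (itself included)."""
    return {j for j in remaining if j == candidate or r2[candidate][j] > threshold}


def select_tag_snps(r2, threshold):
    """Greedy LD binning. r2 is a square matrix of pairwise r^2 values.

    Returns a list of (tag_snp, bin_members) tuples using 0-based SNP indices.
    """
    remaining = set(range(len(r2)))
    bins = []
    while remaining:
        # Ties are broken in favour of the lowest index.
        best = max(
            sorted(remaining),
            key=lambda i: len(captured_by(i, remaining, r2, threshold)),
        )
        members = captured_by(best, remaining, r2, threshold)
        bins.append((best, sorted(members)))
        remaining -= members
    return bins


# Hypothetical pairwise r^2 values for five SNPs.
r2_matrix = [
    [1.00, 0.90, 0.80, 0.10, 0.00],
    [0.90, 1.00, 0.70, 0.20, 0.10],
    [0.80, 0.70, 1.00, 0.10, 0.10],
    [0.10, 0.20, 0.10, 1.00, 0.95],
    [0.00, 0.10, 0.10, 0.95, 1.00],
]
bins = select_tag_snps(r2_matrix, threshold=0.64)
print(bins)  # [(0, [0, 1, 2]), (3, [3, 4])]
```

Here the tag is simply the SNP that defined the bin; in practice any SNP that exceeds the threshold with every other member of the bin could be genotyped instead.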
![Page 45: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/45.jpg)
CSF3 Site Selection
• Threshold LD: r² > 0.64
– Bin 1: 4 sites
– Bin 2: 4 sites
– Bin 3: 2 sites
• Genotype 1 SNP from each bin, chosen for biological intuition or ease of assay design
![Page 46: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/46.jpg)
Power and LD
• Given
– All common SNPs described
– Patterns of LD between common SNPs are known
• Select SNPs such that every SNP is either
– Directly assayed
– Associated with an assayed SNP
• Test for disease associations with assayed SNPs
• Power to detect disease associations at unassayed SNPs depends on r² between assayed and unassayed SNPs
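A widely used rule of thumb makes the last point concrete: to detect an association through a proxy SNP with the same power as testing the causal SNP directly, the sample size needs to grow by roughly a factor of 1/r². The sketch below applies that approximation; the starting sample size of 1000 is a made-up example, and the rule itself is an approximation rather than something stated on the slide.

```python
def required_sample_size(n_direct, r2):
    """Approximate sample size needed when testing a proxy SNP instead of the causal one.

    n_direct: sample size that gives the desired power when the causal SNP is assayed.
    r2: r^2 between the assayed proxy and the unassayed causal SNP.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in the interval (0, 1]")
    return n_direct / r2


n_needed = required_sample_size(1000, 0.64)
print(n_needed)  # 1562.5
```

Seen this way, a threshold such as r² > 0.64 bounds the worst-case loss: no unassayed common SNP costs more than about 56% extra samples.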
![Page 47: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/47.jpg)
LD Selection and Haplotype
• LD selected SNPs provide the highest possible haplotype diversity for a given number of SNPs assayed
• LD selection is robust to recombination and hotspot structure
• LD selection is sensitive to population stratification
![Page 48: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/48.jpg)
SNP Selection Summary
• It is possible to test all common variants in a candidate gene directly for risk association (main effects), so that negative (null) results are meaningful
• Caveat: Higher order risks unaddressed
– Haplotype (G × G effects within a locus)
– Epistasis (G × G effects between loci)
– Environment (G × E effects)
![Page 49: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/49.jpg)
SNP Discovery and Genotyping Workshop
• SNP discovery strategies (Debbie Nickerson)
• Identifying SNPs by association for genotype-phenotype analysis of candidate genes (Chris Carlson)
• Identifying haplotypes for genotype-phenotype analysis of candidate genes (Dana Crawford)
• SNP genotyping strategies (Debbie Nickerson)
![Page 51: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/51.jpg)
Outline of discussion
• Constructing or inferring haplotypes
• Haplotype tools available in PGA
• Description of haplotypes in SeattleSNPs genes
• Use of VH1 tool to visually inspect
– Haplotype blocks
– Haplotype diversity
– Hotspots of recombination
• Summary of SeattleSNPs haplotype data
![Page 52: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/52.jpg)
What is a Diplotype ?
• Humans are diploid
• At each SNP there are two alleles, which are observed as a genotype
• At each gene there are two haplotypes, which are observed as a multi-site genotype, or diplotype
![Page 53: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/53.jpg)
What is a Haplotype?
A: “…a unique combination of genetic markers present in a chromosome.” pg 57 in Hartl & Clark, 1997
VH1 – haplotype visualization tool
![Page 54: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/54.jpg)
How Do You Construct Haplotypes?
1. Collect extended family members
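[Figure: pedigree of an extended family with two-SNP genotypes (C/T, A/G; C/C, A/G; T/T, G/G; C/T, A/A; C/C, A/G) and the phased haplotype pairs resolved from them (C-A / T-G; T-G / T-G; C-A / C-G)]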
![Page 55: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/55.jpg)
How Do You Construct Haplotypes?
2. Go from diploid to haploid via somatic cell hybrids
e.g. Patil et al. 2001
![Page 56: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/56.jpg)
How Do You Construct Haplotypes?
3. Allele-specific PCR
SNP 1 SNP 2
C/T A/G
![Page 57: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/57.jpg)
How Do You Construct Haplotypes?
4. Statistical inference
• Clark Algorithm
• EM (Arlequin)
• Partition Ligation (HAPLOTYPER)
• PHASE
![Page 58: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/58.jpg)
Clark Algorithm
• Find unambiguous haplotypes
– Homozygotes
– Single heterozygotes
![Page 59: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/59.jpg)
Clark Algorithm
• Resolve ambiguous diplotypes that can be formed from two already-identified (unambiguous) haplotypes
![Page 60: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/60.jpg)
Clark Algorithm
• Resolve ambiguous diplotypes that can be formed from one already-identified haplotype and one new haplotype
![Page 61: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/61.jpg)
Clark Algorithm
• Iterate until either all haplotypes resolve, or the remaining ambiguous diplotypes are inconsistent with every inferred haplotype
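The steps on these slides can be sketched in a few lines. This is a simplified illustration of Clark-style inference, not the published program: genotypes are lists of unordered allele pairs (one pair per SNP), the order in which individuals and haplotypes are tried is arbitrary (the real method is known to be order-dependent), and the "two known haplotypes" and "one known plus one new" cases are handled by the same loop. All names are hypothetical.

```python
def is_unambiguous(genotype):
    """Homozygotes and single heterozygotes have only one possible haplotype pair."""
    return sum(a != b for a, b in genotype) <= 1


def complement(genotype, hap):
    """Haplotype that pairs with `hap` to produce `genotype`, or None if impossible."""
    other = []
    for (a, b), h in zip(genotype, hap):
        if h == a:
            other.append(b)
        elif h == b:
            other.append(a)
        else:
            return None
    return tuple(other)


def clark_phase(genotypes):
    """Return ({individual: (hap1, hap2)}, [unresolved individuals])."""
    known, resolved = [], {}
    # Step 1: seed the haplotype list from unambiguous individuals.
    for i, g in enumerate(genotypes):
        if is_unambiguous(g):
            pair = (tuple(a for a, _ in g), tuple(b for _, b in g))
            resolved[i] = pair
            known.extend(h for h in pair if h not in known)
    # Steps 2-4: explain ambiguous individuals with a known haplotype, repeat.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for h in known:
                other = complement(g, h)
                if other is not None:
                    resolved[i] = (h, other)
                    if other not in known:
                        known.append(other)   # newly inferred haplotype
                    progress = True
                    break
    unresolved = [i for i in range(len(genotypes)) if i not in resolved]
    return resolved, unresolved


genotypes = [
    [("C", "C"), ("A", "A")],   # homozygote: haplotype C-A twice
    [("C", "T"), ("A", "G")],   # double heterozygote: ambiguous
]
resolved, unresolved = clark_phase(genotypes)
print(resolved[1])  # (('C', 'A'), ('T', 'G'))
```

The double heterozygote is explained as the known haplotype C-A plus a new haplotype T-G, which then becomes available for resolving further individuals.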
![Page 62: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/62.jpg)
Haplotype Algorithm Comparison
• Clark
– Intuitive
– Fast
• EM
– Complete solution
– Slightly more accurate than Clark
– Robust to ambiguity
• PHASE
– Complete solution
– Slightly more accurate than EM
– Slow (version 2 is faster)
• Haplotyper (Ligation)
– Fast
– Better than Clark
– Less accurate than EM or PHASE
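To show why EM gives a "complete solution" even when some genotypes are ambiguous, here is a minimal two-SNP sketch. Only double heterozygotes are ambiguous (either AB/ab or Ab/aB), so the E-step splits them in proportion to the current haplotype frequency estimates and the M-step recounts. This is a generic illustration, not the Arlequin implementation; the counts and the function name are made up.

```python
HAPLOTYPES = ("AB", "Ab", "aB", "ab")


def em_two_snp(unambiguous_counts, n_double_het, iterations=50):
    """Estimate haplotype frequencies for two biallelic SNPs by EM.

    unambiguous_counts: haplotype counts from individuals whose phase is certain.
    n_double_het: number of individuals heterozygous at both SNPs.
    """
    base = {h: unambiguous_counts.get(h, 0) for h in HAPLOTYPES}
    total = sum(base.values()) + 2 * n_double_het
    freqs = {h: 0.25 for h in HAPLOTYPES}          # uninformative starting point
    for _ in range(iterations):
        # E-step: probability that a double heterozygote carries AB/ab rather than Ab/aB.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add the expected haplotype counts and renormalise.
        counts = dict(base)
        counts["AB"] += w * n_double_het
        counts["ab"] += w * n_double_het
        counts["Ab"] += (1 - w) * n_double_het
        counts["aB"] += (1 - w) * n_double_het
        freqs = {h: counts[h] / total for h in HAPLOTYPES}
    return freqs


# Hypothetical data: 60 AB and 40 ab haplotypes with certain phase, 10 double heterozygotes.
freqs = em_two_snp({"AB": 60, "ab": 40}, n_double_het=10)
print({h: round(f, 3) for h, f in freqs.items()})
```

Because no Ab or aB haplotype is ever seen unambiguously, the estimate converges toward assigning every double heterozygote to AB/ab.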
![Page 63: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/63.jpg)
Haplotype Tools in the PGA
InnateImmunity
• 25 genes re-sequenced in innate immunity pathway
• 4 populations: European- and African-Americans, Hispanics, Asthmatics
• PHASE and Haplotyper results posted on website
http://innateimmunity.net
![Page 64: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/64.jpg)
Haplotype Tools in the PGA
SeattleSNPs
• 120 genes re-sequenced in inflammation response
• 2 populations: European- and African-Americans
• PHASE results posted on website
• Interactive tool (VH1) to visualize and sort haplotypes
http://pga.gs.washington.edu
![Page 65: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/65.jpg)
Distribution of Haplotypes in 100 SeattleSNPs Genes
[Figure: number of haplotypes (0 to 50) plotted against number of genes (0 to 100), with separate series for AD and ED]
![Page 66: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/66.jpg)
Common Haplotypes in 100 SeattleSNPs Genes (Frequency >5%)

| Population | Average (>5% MAF) | Range (>5% MAF) |
|---|---|---|
| ED | 4.54 | 1 - 8 |
| AD | 4.99 | 0 - 11 |
![Page 67: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/67.jpg)
Haplotype Sharing Between Populations in 100 SeattleSNPs Genes
[Figure: bar chart on a 0 to 1 scale for ED and AD; categories: Non-shared, Shared]
![Page 68: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/68.jpg)
Number of Haplotypes From Two Different Discovery Strategies
[Figure: average number of haplotypes per gene for AD, ED, and Combined; series: All SNPs >5%, Coding SNPs >5%]
![Page 69: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/69.jpg)
Haplotype Structures Are Similar Across Discovery Strategies…
FGB – African-Americans
[Figure: haplotype panels labeled "29 SNPs >5%" and "13 SNPs >5%"; label: Coding SNPs]
![Page 70: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/70.jpg)
…But, Not For All Genes
F10 – African-Americans
[Figure: haplotype panels labeled "48 SNPs >5%" and "13 SNPs >5%"; label: Coding SNPs]
![Page 71: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/71.jpg)
Are Blocks Preserved Using Different Discovery Strategies?
• Fewer “blocks” with fewer SNPs/kb
• Yes*, for some: 10% of genes in AD, 25% of genes in ED
*>75% of the blocks are preserved
Four-gamete test (HaploBlockFinder; Zhang and Jin 2003)
[Diagram: two-site gamete sets, one with all four gametes (A B, a b, A b, a B) and one with only A B and a b]
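The four-gamete test named on this slide can be sketched in a few lines: for a pair of biallelic sites, observing all four two-site gametes (AB, Ab, aB, ab) is taken as evidence of historical recombination between them, assuming no recurrent mutation. The code below is a minimal illustration with hypothetical function names and data; it is not HaploBlockFinder's implementation.

```python
from itertools import combinations

def four_gamete_violation(haplotypes, i, j):
    """True if all four two-site gametes are observed at sites i and j.

    Assumes each site is biallelic, so four distinct pairs means
    every allele combination is present.
    """
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4

def incompatible_pairs(haplotypes):
    """List site pairs that fail the four-gamete test."""
    n_sites = len(haplotypes[0])
    return [(i, j) for i, j in combinations(range(n_sites), 2)
            if four_gamete_violation(haplotypes, i, j)]

# Hypothetical phased haplotypes, alleles coded 0/1 at three sites.
haps = ["000", "011", "110", "111"]
pairs = incompatible_pairs(haps)
```

A block-finding procedure could then extend a block only while no pair of sites inside it appears in `pairs`.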
![Page 72: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/72.jpg)
Using Visualization Tools (VH1) To Identify Haplotype Blocks
IL10:
• Rare sites removed
• Sorted by related sites
• “Block” structure evident
![Page 73: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/73.jpg)
Using VH1 to Identify Highly Divergent Haplotypes
• Some haplotypes are highly divergent
• More likely to have functional consequences?
• Mixed Blessing:
– Easier to detect
– Harder to dissect
![Page 74: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/74.jpg)
Using Haplotypes To Identify Hotspots Of Recombination
CD36 haplotypes, sorted by sample
![Page 75: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/75.jpg)
Linkage Disequilibrium and Hotspots
Associated Sites
Hotspot in between: sites need to be typed from both ends
CD36
![Page 76: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/76.jpg)
Detection of Recombination Hotspots In Candidate Genes
HOTSPOTTER
• Developed by Na Li and Matthew Stephens
• Multilocus model for LD:
– Does not rely on “block-like” patterns
– Relates LD to underlying recombination process
– Incorporated into new version of PHASE (v2.0)
students.washington.edu/lina/software/
![Page 77: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/77.jpg)
CD36 – combined population
![Page 78: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/78.jpg)
CD36 – AD and ED populations
![Page 79: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/79.jpg)
HOTSPOTTER Preliminary Results
15 out of 100 genes have evidence of a hotspot:
AGTR1, APOB, CD36, IL1B, IL21R, IL4, NOS3, PLAUR, PON1, SERPIN45, SELP, SFPA2, SFTPB, VCAM1, VEGF
![Page 80: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/80.jpg)
SeattleSNPs Haplotype Summary
• More haplotypes per gene than previously described
• Block structure is preserved across discovery strategies for only a fraction of the genes
• <50% of African-American chromosomes are represented by common shared haplotypes
• Evidence for hotspots of recombination in human genes
![Page 81: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/81.jpg)
SNP Discovery and Genotyping Workshop
• SNP discovery strategies Debbie Nickerson
• Identifying SNPs by association for genotype-phenotype analysis of candidate genes
Chris Carlson
• Identifying haplotypes for genotype-phenotype analysis of candidate genes
Dana Crawford
• SNP genotyping strategies Debbie Nickerson
![Page 82: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/82.jpg)
Ideals for SNP Genotyping
• High Sensitivity - PCR but moving towards direct genomic DNA detection
• High Specificity - Accurate
• Simple process - Easy to automate - High Throughput
• Multiplexing - Perform many assays at once - decrease costs
• Cheap
![Page 83: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/83.jpg)
SNP Genotyping
Probe and target: C allele (matched, target G) vs. T allele (mis-matched, target A)
• Allele-Specific Hybridization: Hybridize / Fail to hybridize
• Polymerase Extension (+ddCTP): C incorporated / C fails to incorporate
• Oligonucleotide Ligation: Ligate / Fail to ligate
• Invader: Cleave / Fail to cleave
• Taqman: Degrade / Fail to degrade
• Allele-Specific PCR: Amplify / Fail to amplify
![Page 84: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/84.jpg)
SNP Typing Formats
• Microtiter Plates - Fluorescence
e.g. Taqman - Good for a few markers - lots of samples - PCR
• Size Analysis by Mass or Electrophoresis
e.g. Sequenom or SNaPshot - Moderate multiplexing, reducing costs
• Arrays - Custom or Universal
e.g. Affymetrix, Illumina or ParAllele - Highly multiplexed - High throughput - Genotype directly on genomic DNA
![Page 85: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/85.jpg)
Taqman
Genotyping with fluorescence-based homogeneous assays (single-tube assay)
[Diagram: probes for the A and G alleles, each labeled with a Reporter and a Quencher]
![Page 86: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/86.jpg)
Genotype Calling - Cluster Analysis
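To make the idea of cluster-based genotype calling concrete, here is a minimal sketch that assumes each sample yields two fluorescence signals (one per allele, here A and G). Samples are clustered by the angle of their signal pair using a simple 1-D k-means with three clusters, and samples with too little total signal are left uncalled. The function names, thresholds, and data are hypothetical; production calling software uses more robust statistical models.

```python
import math

def nearest(angle, centers):
    """Index of the cluster center closest to the given angle."""
    return min(range(len(centers)), key=lambda c: abs(angle - centers[c]))

def call_genotypes(signals, labels=("AA", "AG", "GG"), min_intensity=0.2, n_iter=20):
    """Call genotypes from (allele-A signal, allele-G signal) pairs."""
    # Only samples with enough total signal take part in clustering.
    strong = [(a, g) for a, g in signals if math.hypot(a, g) >= min_intensity]
    angles = [math.atan2(g, a) for a, g in strong]
    # Angle near 0 means mostly allele A; near pi/2 means mostly allele G.
    centers = [0.0, math.pi / 4, math.pi / 2]
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for t in angles:
            groups[nearest(t, centers)].append(t)
        # Keep the old center when a cluster is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    calls = []
    for a, g in signals:
        if math.hypot(a, g) < min_intensity:
            calls.append("NC")  # no call: signal too weak
        else:
            calls.append(labels[nearest(math.atan2(g, a), centers)])
    return calls

# Hypothetical normalized signals for seven samples.
signals = [(1.0, 0.05), (0.9, 0.1), (0.6, 0.55), (0.5, 0.6),
           (0.08, 1.1), (0.1, 0.95), (0.05, 0.04)]
calls = call_genotypes(signals)
```

Clustering on the angle rather than on raw intensities makes the calls less sensitive to differences in overall signal strength between wells.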
![Page 87: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/87.jpg)
Genotyping by Mass Spectrometry
Multiplex ~ 5 SNPs
![Page 88: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/88.jpg)
Comparative Genotyping in Populations
Population 1: Pooled DNA → PCR pooled DNA → Quantitative assay → Estimate allele frequency (polymorphism 60/40)
Population 2: Pooled DNA → PCR pooled DNA → Quantitative assay → Estimate allele frequency (polymorphism 85/15)
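The frequency-estimation step for pooled DNA can be sketched as a ratio of the two allele signals from the quantitative assay. The optional factor `k` is an assumed correction for unequal signal from the two alleles (for example, the A/B signal ratio measured in a known heterozygote); the function name and signal values are hypothetical and chosen only to reproduce the 60/40 and 85/15 example on the slide.

```python
def pooled_allele_frequency(signal_a, signal_b, k=1.0):
    """Estimate the allele-A frequency in a DNA pool from two allele signals.

    k corrects for unequal allele signal; k=1.0 applies no correction.
    """
    return signal_a / (signal_a + k * signal_b)

# Hypothetical assay signals for the two pools.
freq_pop1 = pooled_allele_frequency(600.0, 400.0)
freq_pop2 = pooled_allele_frequency(850.0, 150.0)
difference = freq_pop2 - freq_pop1
```

A large `difference` between pools flags a polymorphism for follow-up by individual genotyping.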
![Page 89: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/89.jpg)
Pooled Genotyping
Advantages:
Speed, Cost
Major Disadvantages:
• Loss of haplotype information
• Loss of stratification by phenotype or environmental factors
![Page 90: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/90.jpg)
SNP Genotyping
Custom SNP Genotyping Chips:
![Page 91: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/91.jpg)
Multiplexed Genotyping - Universal Tag Readouts
[Diagram: Locus 1 and Locus 2 specific sequences, each paired with a tag sequence (Tag1, Tag2) and its complement (cTag1, cTag2) on a substrate (bead or chip); Tags 1 to 4 shown on a chip array and a bead array; bases C, T, A, G read out]
• Multiplex ~1,000 SNPs
• Not dependent on primary PCR
• Illumina, ParAllele
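The universal tag idea can be sketched as a lookup: each locus-specific assay carries its own tag sequence, and the array or bead address holding the complementary tag identifies which locus a signal belongs to. The tag sequences, locus names, and function names below are hypothetical placeholders, not real Illumina or ParAllele tags.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Hypothetical tag attached to each locus-specific assay.
locus_tags = {"Locus 1": "GATTACAGGC", "Locus 2": "CCTGAAGTCA"}

# Each substrate address carries the complementary tag (cTag).
address_to_locus = {reverse_complement(tag): locus
                    for locus, tag in locus_tags.items()}

def read_out(products):
    """Map (tag sequence, detected base) pairs to the bases seen per locus."""
    seen = {}
    for tag, base in products:
        locus = address_to_locus[reverse_complement(tag)]
        seen.setdefault(locus, set()).add(base)
    return {locus: "".join(sorted(bases)) for locus, bases in seen.items()}

# Hypothetical assay products from one sample.
genotypes = read_out([("GATTACAGGC", "C"), ("GATTACAGGC", "T"),
                      ("CCTGAAGTCA", "G")])
```

Because the tags are independent of the loci, the same array or bead set can be reused for a different panel of SNPs by reassigning `locus_tags`.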
![Page 92: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/92.jpg)
Illumina Genotyping - Gap Ligation
![Page 93: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/93.jpg)
1,000 SNPs Assayed on 96 Samples
![Page 94: SNP Discovery and Genotyping Workshop SNP discovery strategies Debbie Nickerson Identifying SNPs by association for genotype- phenotype analysis of candidate](https://reader035.vdocuments.net/reader035/viewer/2022062305/5697bfc71a28abf838ca7b89/html5/thumbnails/94.jpg)
SNP Genotyping
Lots of systems - Still costly but dropping
Offering Moderate to High throughputs
Systems vary in price $$ - $$$$
Laboratory Information Management Systems are key. Track:
• Samples
• Assays
• Completion rate
• Reproducibility/Error Analysis
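Two of the tracked quantities, completion rate and reproducibility, can be computed from stored genotype calls as sketched below. The sketch assumes a failed genotype is stored as `None` and that reproducibility is measured as agreement between duplicate runs of the same samples; the function names and data are hypothetical.

```python
def completion_rate(calls):
    """Fraction of attempted genotypes that produced a call (None = failed)."""
    return sum(c is not None for c in calls) / len(calls)

def concordance(run1, run2):
    """Fraction of duplicate genotypes that agree, among pairs called in both runs."""
    pairs = [(a, b) for a, b in zip(run1, run2)
             if a is not None and b is not None]
    return sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical duplicate runs of one assay on five samples.
run1 = ["CC", "CT", None, "TT", "CT"]
run2 = ["CC", "CT", "CT", "TT", "CC"]
rate = completion_rate(run1)
agreement = concordance(run1, run2)
```

Tracking both values per assay makes it easier to spot assays that fail often or give inconsistent calls.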