SNP Analysis
Vipin Kumar
Csci 8980 - DMBIO
October 13, 2008
Vipin Kumar SNP Analysis
Messages
Single SNP Methods do not capture multi-locus interactions
Multi SNP Methods can capture them, but they can’t handle high dimensionality
Our work on Myeloma data
Single Nucleotide Polymorphism
A SNP is a single nucleotide variation that occurs at an appreciable frequency (1% to 5%)
12 million SNPs on the human genome
Spread uniformly across the genome
Contribute to 90% of the genetic variation in the human genome
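Figure: aligned sequences from four individuals, with the SNP at the position where individual 3 differs (A vs. T)
individual 1: AGCGTGCATCAGTC
individual 2: AGCGTGCATCAGTC
individual 3: AGCGTGCATCTGTC
individual 4: AGCGTGCATCAGTC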
Linkage Disequilibrium (LD)
Probability that no recombination occurs between two alleles
Close regions tend to stay together during recombination
Regions that are far apart have low LD
Regions that are close together have high LD
Figure: Crossover between chromosomes
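The slide describes LD in terms of recombination. In practice LD is usually estimated from observed allele frequencies, and one common measure (not named on the slide) is r². The sketch below assumes phased haplotypes at two biallelic loci, with alleles coded 0/1. The function name `ld_r_squared` and the toy inputs are illustrative only.

```python
def ld_r_squared(haplotypes):
    """Estimate r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B) pairs,
    each allele coded 0 or 1 (assumed coding, phased data).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Alleles always travel together: high LD
linked = [(0, 0)] * 5 + [(1, 1)] * 5
# Every allele combination equally frequent: no LD
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(ld_r_squared(linked), ld_r_squared(unlinked))
```

Values near 1 correspond to the "close together, high LD" case and values near 0 to the "far apart, low LD" case.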
Example LD Plot for a sample set of SNPs
Single SNP approaches
Each SNP is tested for its association with the phenotype
Most prevalent methods for testing SNP associations are:
Chi-squared statistic test
Fisher's exact test
Cochran-Armitage test, etc.
These tests give the probability of association by chance
Figure: data matrix (labels: Subjects, SNPs, Phenotype)
Chi-square test
Observed Matrix:
              MM     Mm     mm     Row Sum
Affected       8     27     65     100
Unaffected    70     20     10     100
Column Sum    78     47     75     200
Expected Matrix:
              MM     Mm     mm     Row Sum
Affected      39     23.5   37.5   100
Unaffected    39     23.5   37.5   100
Column Sum    78     47     75     200
Chi-square test
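Expected value: $E = \frac{\text{column sum} \times \text{row sum}}{\text{total samples}}$
$\chi^2 = \sum \frac{(O - E)^2}{E}$, summed over all cells (O = observed count, E = expected count)
Degrees of freedom $= (m - 1) \times (n - 1)$ for a matrix with m rows and n columns
Using $\chi^2$ and the degrees of freedom, look up the probability of finding the observed matrix by chance (the p-value)
-log(p) is often used for convenience
Each SNP whose -log(p) exceeds the chosen significance threshold is considered associated with the phenotype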
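As a worked check of the steps above, the following sketch recomputes the expected matrix, the chi-square statistic and the degrees of freedom for the observed matrix from the previous slide. The function names are illustrative. The p-value helper uses the closed form exp(-χ²/2), which holds only for 2 degrees of freedom, and base-10 logs are an assumption since the slide does not state the base.

```python
import math


def chi_square_test(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    # E = (column sum * row sum) / total samples, for every cell
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(col_sums) - 1)
    return stat, dof, expected


def p_value_two_dof(stat):
    """Chi-square tail probability; this closed form is valid only for dof == 2."""
    return math.exp(-stat / 2)


# Rows: Affected, Unaffected. Columns: MM, Mm, mm.
observed = [[8, 27, 65], [70, 20, 10]]
stat, dof, expected = chi_square_test(observed)
p = p_value_two_dof(stat)
print(f"chi2 = {stat:.2f}, dof = {dof}, -log10(p) = {-math.log10(p):.2f}")
```

For this table the statistic comes out at roughly 90.66 with 2 degrees of freedom, so the SNP would pass any reasonable threshold. For tables with other degrees of freedom a statistics library would be needed for the p-value.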
GWAS of Coronary Artery Disease - Samani et al. 2007
1,926 cases (subjects with artery disease before age 66)
2,938 controls
377,857 SNPs
Found strong association with SNPs on chromosome 9
GWAS for lung cancer - Amos et al.
1,154 ever-smoking lung cancer cases
1,137 ever-smoking controls
317,498 SNPs
Multi SNP approaches
Here multiple SNPs are tested jointly for association with the phenotype
Most suitable for complex diseases
The following combinatorial methods are used:
Multifactor Dimensionality Reduction
Combinatorial Partitioning Method, etc.
Figure: data matrix (labels: Subjects, SNPs, Phenotype)
Multifactor Dimensionality Reduction
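The figure for this slide did not survive extraction. As a rough sketch of the core MDR idea: for a chosen set of SNPs, each multi-locus genotype cell is labeled high-risk or low-risk by its case/control ratio, which collapses the SNP set into a single binary attribute. Every SNP combination of a given order is then scored. The code below is a simplified illustration, not the published procedure: it scores by training accuracy and omits the cross-validation that MDR normally uses. The genotype coding (0/1/2), the function names, the `threshold` parameter and the toy data are all assumptions.

```python
from collections import Counter
from itertools import combinations


def mdr_score(genotypes, labels, snp_idx, threshold=1.0):
    """Score one SNP combination; returns (training accuracy, high-risk cells)."""
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in snp_idx)     # multi-locus genotype
        (cases if y == 1 else controls)[cell] += 1
    # A cell is high-risk when its case count exceeds threshold * control count
    high_risk = {
        cell for cell in set(cases) | set(controls)
        if cases[cell] > threshold * controls[cell]
    }
    correct = sum(
        (tuple(row[i] for i in snp_idx) in high_risk) == (y == 1)
        for row, y in zip(genotypes, labels)
    )
    return correct / len(labels), high_risk


def best_combination(genotypes, labels, order=2):
    """Exhaustively search all SNP combinations of the given order."""
    n_snps = len(genotypes[0])
    return max(
        combinations(range(n_snps), order),
        key=lambda idx: mdr_score(genotypes, labels, idx)[0],
    )


# Toy data: disease status depends on SNP 0 and SNP 1 jointly; SNP 2 is noise.
genotypes = [
    [0, 0, 0], [0, 2, 1], [2, 0, 0], [2, 2, 1],
    [0, 0, 2], [0, 2, 0], [2, 0, 1], [2, 2, 2],
]
labels = [0, 1, 1, 0, 0, 1, 1, 0]   # 1 = case, 0 = control
print(best_combination(genotypes, labels))
```

Neither SNP 0 nor SNP 1 separates cases from controls on its own in the toy data, which is the kind of interaction a single SNP test would miss. The exhaustive search over combinations is also why the approach struggles with high dimensionality.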
Application
MDR reveals higher order interaction in sporadic breast cancer, Ritchie et al. Am. J. Hum. Genet. 2001
200 women with sporadic breast cancer
Age-matched controls (patients with other illness)
9 SNPs were considered
Found a four-locus genotype with the highest cross-validation consistency.
Combinatorial Partitioning Method (for QTL)
SNP Data Set for finding Associations
Each pixel is either MM (green), Mm (red) or mm (blue).
Statistical Significance
Classification Based Approaches
Figure: cases and controls are split into a train set and a test set; a classifier model is built on the train set and its accuracy is measured on the test set
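The train/test workflow in the figure can be sketched as follows. The slides do not say which classifier was used, so this example substitutes a 1-nearest-neighbor rule with Hamming distance over genotype vectors purely for illustration. The function names, the 0/1/2 genotype coding, the split fraction and the toy data are assumptions.

```python
import random


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split sample indices into train/test, keeping cases and controls in both."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, int(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test


def hamming(a, b):
    """Number of SNP positions at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train_x, train_y, sample):
    """Label of the training sample closest to `sample`."""
    nearest = min(range(len(train_x)), key=lambda i: hamming(train_x[i], sample))
    return train_y[nearest]


def holdout_accuracy(samples, labels, test_fraction=0.25, seed=0):
    """Train on the train set, report accuracy on the held-out test set."""
    train, test = stratified_split(labels, test_fraction, seed)
    train_x = [samples[i] for i in train]
    train_y = [labels[i] for i in train]
    hits = sum(predict_1nn(train_x, train_y, samples[i]) == labels[i] for i in test)
    return hits / len(test)


# Toy data: the first three SNPs separate cases from controls, the last is noise.
cases = [[2, 2, 2, g] for g in (0, 1, 2, 0)]
controls = [[0, 0, 0, g] for g in (0, 1, 2, 1)]
samples = cases + controls
labels = [1] * len(cases) + [0] * len(controls)
accuracy = holdout_accuracy(samples, labels)
print(f"held-out accuracy: {accuracy:.2f}")
```

Measuring accuracy only on subjects the model never saw is what distinguishes this approach from the single SNP tests, where significance is assessed on the full data set.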
Using Location Information
Non-synonymous X X X X X
Introns X X X X
Synonymous X X X
Admixture X X X X
UTR X X X X
Other X X X
Accuracy 66.43 58.74 51.74 72.72 71.33 54.54 69.99
Nonsyn + Promolign (Syn + Introns): 75.75 %
Statistical Significance
Messages
Single SNP Methods do not capture multi-locus interactions
Multi SNP Methods can capture them, but they can’t handle high dimensionality
Our work on Myeloma data
Questions?