SNP Analysis

Vipin Kumar

Csci 8980 - DMBIO

October 13, 2008

Vipin Kumar SNP Analysis


Messages

Single SNP Methods do not capture multi-locus interactions

Multi SNP Methods can do that,

But they can’t handle high dimensionality

Our work on Myeloma data


Single Nucleotide Polymorphism

A SNP is a single nucleotide variation that occurs at an appreciable frequency (1% to 5%)

12 million SNPs on the human genome

Spread uniformly across the genome

Contributes to 90% of the genetic variation in the human genome

individual 1: AGCGTGCATCAGTC
individual 2: AGCGTGCATCAGTC
individual 3: AGCGTGCATCTGTC
individual 4: AGCGTGCATCAGTC

SNP: the position where individual 3 carries T and the others carry A
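The four aligned sequences above can be scanned column by column to locate the variant site. The following is a minimal sketch; the function name `find_variant_sites` is a hypothetical helper, not something from the slides, and positions are 0-based.

```python
def find_variant_sites(sequences):
    """Return {position: sorted alleles} for every column with more than one allele."""
    sites = {}
    # zip(*sequences) walks the alignment one column (position) at a time.
    for pos, column in enumerate(zip(*sequences)):
        alleles = set(column)
        if len(alleles) > 1:
            sites[pos] = sorted(alleles)
    return sites


# The four individuals from the slide.
individuals = [
    "AGCGTGCATCAGTC",
    "AGCGTGCATCAGTC",
    "AGCGTGCATCTGTC",
    "AGCGTGCATCAGTC",
]
variant_sites = find_variant_sites(individuals)
print(variant_sites)
```

On real data a site would only count as a SNP if the minor allele also reached the frequency threshold, which four sequences are too few to judge.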


Linkage Disequilibrium (LD)

Probability that no recombination occurs between two alleles

Close regions tend to stay together during recombination

Regions that are far apart have low LD

Regions that are close together have high LD

Figure: Crossover between chromosomes
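The slide does not say how LD is quantified. One commonly used measure is r², computed from haplotype counts at two biallelic loci. The sketch below assumes that measure; the function name and the count arguments are hypothetical.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic loci, from counts of the four haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at locus 2
    d = n_AB / total - p_A * p_B  # deviation from independence (D)
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus has only one allele")
    return d * d / denom


# Alleles always inherited together: complete LD.
high_ld = ld_r_squared(50, 0, 0, 50)
# All four haplotypes equally common: no LD.
no_ld = ld_r_squared(25, 25, 25, 25)
print(high_ld, no_ld)
```

An LD plot shows a measure such as this for every pair of SNPs in a region.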


Example LD Plot for a sample set of SNPs


Single SNP approaches

Each SNP is tested for its association with the phenotype

Most prevalent methods for testing SNP associations are:

Chi-squared test

Fisher's exact test

Cochran-Armitage test, etc.

These tests give the probability of seeing an association this strong by chance

Figure: data matrix with subjects as rows and SNPs plus the phenotype as columns


Chi-square test

Observed Matrix:

              MM     Mm     mm     Row Sum
Affected       8     27     65     100
Unaffected    70     20     10     100
Column Sum    78     47     75     200

Expected Matrix:

              MM     Mm     mm     Row Sum
Affected      39     23.5   37.5   100
Unaffected    39     23.5   37.5   100
Column Sum    78     47     75     200


Chi-square test

Expected value: E = (Column sum × Row sum) / Total samples

χ² = Σ (O − E)² / E, summed over all cells of the table

Degrees of freedom = (m − 1) × (n − 1), for a table with m rows and n columns

Using χ² and the degrees of freedom, look up the probability p of finding the observed matrix by chance

-log(p) is often used for convenience

Each SNP whose -log(p) exceeds the significance threshold is considered associated with the phenotype
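The chi-square computation can be worked through on the observed matrix from the example (Affected 8/27/65, Unaffected 70/20/10). This is a minimal sketch using only the standard library. The function names are hypothetical, and the p-value helper uses the closed form of the chi-square tail that holds only for an even number of degrees of freedom, which covers the 2 degrees of freedom of a 2 × 3 genotype table.

```python
import math


def chi_square_test(observed):
    """Return (chi2, degrees of freedom, expected matrix) for a contingency table."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    # E = (column sum * row sum) / total samples, for every cell.
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(observed) - 1) * (len(col_sums) - 1)
    return chi2, dof, expected


def chi2_p_value_even_dof(chi2, dof):
    """Upper-tail probability of chi-square; closed form valid for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("this sketch only handles positive, even dof")
    half = chi2 / 2
    term, series = 1.0, 0.0
    for i in range(dof // 2):
        series += term
        term *= half / (i + 1)
    return math.exp(-half) * series


observed = [[8, 27, 65], [70, 20, 10]]  # rows: Affected, Unaffected
chi2, dof, expected = chi_square_test(observed)
p = chi2_p_value_even_dof(chi2, dof)
neg_log_p = -math.log10(p)
print(chi2, dof, p, neg_log_p)
```

The expected matrix returned here matches the one in the example, and the very small p indicates that the genotype distribution differs strongly between affected and unaffected subjects. A library routine that handles any number of degrees of freedom would normally replace the hand-written p-value helper.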


GWAS of Coronary Artery Disease - Samani et al., 2007

1,926 cases (subjects with coronary artery disease before age 66)

2,938 controls

377,857 SNPs

Found strong association with SNPs on chromosome 9


GWAS for lung cancer - Amos et al.

1,154 ever-smoking lung cancer cases

1,137 ever-smoking controls

317,498 SNPs


Multi SNP approaches

Here multiple SNPs are tested for association with the phenotype

Most suitable for complex diseases

The following combinatorial methods are used:

Multifactor Dimensionality Reduction

Combinatorial Partitioning Method, etc.

Figure: data matrix with subjects as rows and SNPs plus the phenotype as columns


Multifactor Dimensionality Reduction
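The figure for this slide did not survive, so the following is a sketch of the core MDR step as it is commonly described, not a reconstruction of the slide. Each multi-locus genotype cell is labelled high risk or low risk by its case-to-control ratio, and the resulting one-dimensional attribute is scored by classification accuracy. Cross-validation, which full MDR uses to pick the final model, is omitted. Function names, the 0/1/2 genotype coding and the toy data are all assumptions.

```python
from collections import Counter
from itertools import combinations, product


def mdr_accuracy(genotypes, labels, snp_subset):
    """Accuracy of the high/low-risk rule built on the chosen SNP columns."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    cases, controls = Counter(), Counter()
    for row, label in zip(genotypes, labels):
        cell = tuple(row[i] for i in snp_subset)  # multi-locus genotype
        (cases if label == 1 else controls)[cell] += 1
    correct = 0
    for cell in set(cases) | set(controls):
        # High risk if the cell's case:control ratio is at least the overall
        # ratio (cross-multiplied so empty control cells do not divide by zero).
        high_risk = cases[cell] * n_controls >= controls[cell] * n_cases
        correct += cases[cell] if high_risk else controls[cell]
    return correct / len(labels)


def best_mdr_model(genotypes, labels, order=2):
    """Exhaustively score every SNP combination of the given order."""
    n_snps = len(genotypes[0])
    return max(
        combinations(range(n_snps), order),
        key=lambda subset: mdr_accuracy(genotypes, labels, subset),
    )


# Toy data: SNP 0 and SNP 1 interact (case when their sum is odd), SNP 2 is noise.
genotypes = list(product((0, 1, 2), repeat=3))
labels = [(g0 + g1) % 2 for g0, g1, _ in genotypes]
best = best_mdr_model(genotypes, labels, order=2)
print(best, mdr_accuracy(genotypes, labels, best))
```

The exhaustive loop over combinations is the reason such methods struggle with high dimensionality: the number of SNP subsets grows combinatorially with the number of SNPs and the interaction order.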


Application

MDR reveals higher order interaction in sporadic breast cancer, Ritchie et al., Am. J. Hum. Genet., 2001

200 women with sporadic breast cancer

Age-matched controls (patients with other illnesses)

9 SNPs were considered

Found a four-locus genotype combination with the highest cross-validation consistency.


Combinatorial Partitioning Method (for QTL)


SNP Data Set for finding Associations

Each pixel is either MM (green), Mm (red) or mm (blue).


Statistical Significance


Classification Based Approaches

Figure: cases and controls are split into a train set and a test set; a classifier is trained on the train set to build a model, and the model is applied to the test set to measure accuracy
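The train/test workflow can be sketched in a few lines. The slides do not name the classifier, so this example assumes a 1-nearest-neighbour rule with Hamming distance over genotype vectors purely for illustration. The function names, the 0/1/2 genotype coding and the toy data are hypothetical.

```python
def hamming(a, b):
    """Number of SNP positions at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_1nn(train_x, train_y, sample):
    """Label of the training subject whose genotypes are closest to the sample."""
    nearest = min(range(len(train_x)), key=lambda i: hamming(train_x[i], sample))
    return train_y[nearest]


def holdout_accuracy(train_x, train_y, test_x, test_y):
    """Fraction of held-out subjects whose case/control label is predicted correctly."""
    hits = sum(
        predict_1nn(train_x, train_y, sample) == label
        for sample, label in zip(test_x, test_y)
    )
    return hits / len(test_y)


# Toy data: each tuple holds one subject's genotypes; 1 = case, 0 = control.
train_x = [(2, 2, 0), (2, 1, 0), (0, 0, 1), (0, 1, 2)]
train_y = [1, 1, 0, 0]
test_x = [(2, 2, 1), (0, 0, 0)]
test_y = [1, 0]
accuracy = holdout_accuracy(train_x, train_y, test_x, test_y)
print(accuracy)
```

The key point of the diagram is that accuracy is measured only on subjects the model never saw during training.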


Using Location Information

Non-synonymous X X X X X

Introns X X X X

Synonymous X X X

Admixture X X X X

UTR X X X X

Other X X X

Accuracy 66.43 58.74 51.74 72.72 71.33 54.54 69.99

Nonsyn + Promolign (Syn + Introns): 75.75 %


Statistical Significance


Messages

Single SNP Methods do not capture multi-locus interactions

Multi SNP Methods can do that,

But they can’t handle high dimensionality

Our work on Myeloma data


Questions?
