Statistical Methods for Identifying Differentially Expressed Genes in
Replicated cDNA Microarray Experiments
Presented by Nan Lin
13 October 2002
Introduction to cDNA Microarray Experiment
Single-slide Design
– Two mRNA samples (red/green) on the same slide
Multiple-slide Design
– Two or more types of mRNA on different slides
– Exclude: time-course experiment
Examples of Multiple-slide Design
Apo AI
– Treatment group: 8 mice with the apo AI gene knocked out
– Control group: 8 C57Bl/6 mice
– Cy5: cDNA from each of the 16 mice
– Cy3: pooled cDNA from the 8 control mice
SR-BI
– Treatment group: 8 SR-BI transgenic mice
– Control group: 8 “normal” FVB mice
Microarray Setup
– 6384 spots: 4x4 grids with 19x21 spots in each
Single-slide Methods
Two types
– Based solely on intensity ratio R/G
– Take into account overall transcript abundance measured by R*G
Historical Review
– Fold increase/decrease cut-offs (1995-1996)
– Probabilistic modeling based on distributional assumptions (1997-2000)
– Consider R*G (2000-2001), e.g. Gamma-Gamma-Bernoulli
Summary of Single-slide Methods
Producing a model dependent rule: drawing two curves in the (R,G) plane
– Power (1-Type II error rate)
– False positive rate (Type I error rate)
Multiple testing
Replication is needed because gene expression data are too noisy
Image Analysis
“Raw” data: 16-bit TIFF files
Addressing
– Within a batch, important characteristics are similar
Segmentation
– Seeded region growing algorithm
Background adjustment
– Morphological opening (a nonlinear filter)
Software package: Spot in R environment
Single-slide Data Display
Plot log2R vs. log2G
– variation less dependent on absolute magnitude
– normalization is additive for logged intensities
– evens out highly skewed distributions
– a more realistic sense of variation
Plot M=log2(R/G) vs. A=[log2(RG)]/2
– More revealing in terms of identifying spot artifacts and for normalization purposes
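A minimal sketch of the M and A transformation defined above, using only the Python standard library. The function name `ma_values` and the example intensities are illustrative assumptions, not part of the talk.

```python
import math

def ma_values(red, green):
    """Return (M, A) per spot, with M = log2(R/G) and A = log2(R*G) / 2.

    `red` and `green` are assumed to be positive, background-adjusted
    intensities of equal length (hypothetical inputs for illustration).
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

# Two made-up spots: the first is 4-fold higher in red, the second is balanced.
m, a = ma_values([1024.0, 256.0], [256.0, 256.0])
print(m, a)
```

M is the log-ratio that carries the differential-expression signal, and A is the average log-intensity of the spot, so an M-A plot is the log2R vs. log2G plot rotated by 45 degrees and rescaled.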
Normalization
Identify and remove sources of systematic variation other than differential expression
– Different labeling efficiencies and scanning properties for Cy3 and Cy5
– Different scanning parameters
– Print-tip, spatial or plate effects
Red intensity is often lower than green intensity
The imbalance between R and G varies
– across spots and between arrays
– with overall spot intensity A
– with location on the array, plate origin, etc.
An Example: Self-Self Experiment
Normalization (Cont.)
Global normalization
– subtract mean or median from all intensity log-ratios
More complex normalization
– Robust locally weighted regression
– M = spot intensity A + location + plate origin
– Use print-tip group to represent the spot locations
– log2(R/G) → log2(R/G) − l(A,j)
– l(A,j): lowess fit of M on A within print-tip group j, computed with lowess in R (0.2<f<0.4)
Control sequences
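The two normalization steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the talk's implementation: `local_linear_fit` is a simplified, non-robust stand-in for R's `lowess` (a tricube-weighted local line with no robustness iterations), and all function names and the default span `f=0.3` are hypothetical choices.

```python
import statistics

def global_median_normalize(m):
    """Global normalization: subtract the median log-ratio from every spot."""
    center = statistics.median(m)
    return [value - center for value in m]

def _tricube(u):
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear_fit(a, m, f=0.3):
    """Fitted M at each A from a tricube-weighted local line.

    Simplified stand-in for lowess: no robustness iterations.
    """
    n = len(a)
    k = min(n, max(2, round(f * n)))  # number of neighbours per window
    fitted = []
    for x0 in a:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(abs(x - x0) for x in a)[k - 1] or 1e-12
        w = [_tricube(abs(x - x0) / h) for x in a]
        sw = sum(w)
        x_bar = sum(wi * x for wi, x in zip(w, a)) / sw
        y_bar = sum(wi * y for wi, y in zip(w, m)) / sw
        sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, a))
        sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, a, m))
        slope = sxy / sxx if sxx > 1e-12 else 0.0
        fitted.append(y_bar + slope * (x0 - x_bar))
    return fitted

def print_tip_normalize(m, a, tip, f=0.3):
    """Apply M -> M - l(A, j) separately within each print-tip group j."""
    normalized = list(m)
    for j in set(tip):
        idx = [i for i, t in enumerate(tip) if t == j]
        fit = local_linear_fit([a[i] for i in idx], [m[i] for i in idx], f)
        for i, fit_i in zip(idx, fit):
            normalized[i] = m[i] - fit_i
    return normalized
```

Fitting within each print-tip group lets the correction depend on both intensity A and position on the slide, which a single global shift cannot do. For real data, a robust smoother such as R's `lowess` would normally be preferred so that genuinely differentially expressed genes do not pull the curve.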
Apo AI: Normalization
Graphical Display for Test Statistics (I)
Test statistics
– Hj: no association between treatment and the expression level of gene j, j=1,…,m.
– Two-sided alternative
– Two-sample Welch t-statistics
– Replication is essential to assess the variability in treatment and control group
– The joint distribution is estimated by a permutation procedure because the actual distribution is not a t-distribution
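For a single gene, the Welch t-statistic and a two-sided permutation p-value might be computed as in the sketch below. It enumerates every relabeling of the pooled samples, which is feasible for small groups (8 vs. 8 gives 12870 splits). The function names and the example values are hypothetical, and the code handles one gene only, not the joint distribution across genes.

```python
import itertools
import math
import statistics

def welch_t(x, y):
    """Two-sample Welch t-statistic (unequal variances)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

def permutation_p_value(x, y):
    """Two-sided p-value from all relabelings of the pooled samples.

    Assumes no relabeling produces two zero-variance groups
    (that would make the statistic undefined).
    """
    pooled = list(x) + list(y)
    n, n_x = len(pooled), len(x)
    t_obs = abs(welch_t(x, y))
    hits = total = 0
    for idx in itertools.combinations(range(n), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n) if i not in chosen]
        # Small tolerance so ties with the observed statistic are counted.
        if abs(welch_t(group_x, group_y)) >= t_obs - 1e-12:
            hits += 1
        total += 1
    return hits / total

# Made-up log-ratios for one gene in two groups of four slides.
treatment = [2.1, 2.5, 1.9, 2.3]
control = [0.1, -0.2, 0.3, 0.0]
print(welch_t(treatment, control), permutation_p_value(treatment, control))
```

With only a handful of replicates the permutation p-value is coarse: here there are 70 possible splits, so the smallest attainable two-sided p-value is 2/70.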
Graphical Display for Test Statistics (II)
Quantile-Quantile plots
Graphical Display for Test Statistics (III)
Plots vs. absolute expression levels
Multiple Hypothesis Testing: Adjusted p-values (I)
P-value: Pj = Pr(|Tj| >= |tj| | Hj), j=1,…,m.
Family-wise Type I Error Rate (FWER)
– The probability of at least one Type I error in the family
Strong Control of the FWER
– Control the FWER for any combination of true and false hypotheses
Weak Control of the FWER
– Control the FWER only under the complete null hypothesis that all hypotheses in the family are true
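A small worked example of why the FWER matters at microarray scale. It assumes the m tests are independent and all null hypotheses are true, which is a simplifying assumption made here for illustration (gene expression levels are generally correlated); the function name is hypothetical.

```python
def fwer_independent(alpha, m):
    """FWER when m independent true nulls are each tested at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

# One test, ten tests, and one test per spot on a 6384-spot array.
for m in (1, 10, 6384):
    print(m, fwer_independent(0.05, m))
```

Under this assumption, testing every spot at level 0.05 makes at least one false positive practically certain, which is the motivation for adjusting the p-values.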
Multiple Hypothesis Testing: Adjusted p-values (II)
Adjusted p-value for Hj
– Pj=inf{a: Hj is rejected at FWER=a}
– Hj is rejected at FWER a if Pj<=a
P-value adjustment approaches
– Bonferroni
– Sidak single-step
– Holm step-down
– Westfall and Young step-down minP
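The first three adjustments listed above have closed forms and can be sketched directly from raw p-values; the function names and example p-values are hypothetical. The Westfall and Young step-down minP procedure is not shown because it needs the permutation distribution of all m statistics jointly.

```python
def bonferroni(p):
    """Single-step Bonferroni: min(m * p_j, 1)."""
    m = len(p)
    return [min(m * pj, 1.0) for pj in p]

def sidak_single_step(p):
    """Single-step Sidak: 1 - (1 - p_j)^m."""
    m = len(p)
    return [1.0 - (1.0 - pj) ** m for pj in p]

def holm_step_down(p):
    """Holm step-down: running maximum of (m - k + 1) * p_(k) over ranks k."""
    m = len(p)
    order = sorted(range(m), key=lambda j: p[j])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, j in enumerate(order):
        candidate = min((m - rank) * p[j], 1.0)
        running_max = max(running_max, candidate)  # keeps adjusted values monotone
        adjusted[j] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03]  # made-up raw p-values for three genes
print(bonferroni(raw))
print(sidak_single_step(raw))
print(holm_step_down(raw))
```

Holm's adjusted p-values are never larger than Bonferroni's, so the step-down version rejects at least as many hypotheses at the same FWER.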
Multiple Hypothesis Testing: Estimation of adjusted p-values (I)
Multiple Hypothesis Testing: Estimation of adjusted p-values (II)
Apo AI: Adjusted p-values (I)
Apo AI: Adjusted p-values (II)
Apo AI: Comparison with Single-slide Methods
Discussion
M-A plots
Normalization
– Robust local regression, e.g. lowess
Q-Q plots & Plots vs. absolute expression level
False discovery rate (FDR)
Replication is necessary
Design issues
Factorial experiments
Joint behavior of genes
R package SMA