(2) Ratio statistics of gene expression levels and applications to microarray data analysis
Bioinformatics, Vol. 18, no. 9, 2002
Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer, and Jeffrey M. Trent
Outline
Introduction
Ratio Statistics
Quality Metric for Ratio Statistics
Conclusion
Introduction
Motivation: Expression-based analysis for large families of genes has recently become possible owing to the development of cDNA microarrays, which allow simultaneous measurement of transcript levels for thousands of genes. For each spot on a microarray, signals in two channels must be extracted from their backgrounds. This requires algorithms to extract signals arising from tagged mRNA hybridized to arrayed cDNA locations and algorithms to determine the significance of signal ratios.
Introduction
Results
1. Estimation of signal ratios from the two channels, and the significance of those ratios.
2. A refined hypothesis test is considered in which the measured intensities forming the ratio are assumed to be combinations of signal and background. The new method involves a signal-to-noise ratio, and for a high signal-to-noise ratio the new test reduces (with close approximation) to the original test. The effect of low signal-to-noise ratio on the ratio statistics constitutes the main theme of the paper.
3. A quality metric is formulated for spots.
Ratio Statistics
Ratio Statistics assuming a constant coefficient of variation
Consider a microarray having $n$ genes, with red and green fluorescent expression values labeled by $R_1, R_2, \ldots, R_n$ and $G_1, G_2, \ldots, G_n$, respectively.
Hypothesis test:
$$H_0: \mu_{R_k} = \mu_{G_k} \qquad H_1: \mu_{R_k} \neq \mu_{G_k}$$
Assumption:
$$\frac{\sigma_{R_k}}{\mu_{R_k}} = \frac{\sigma_{G_k}}{\mu_{G_k}} = c \quad \text{under } H_0$$
Ratio Statistics assuming a constant coefficient of variation (cont.)
Ratio test statistic: $T_k = R_k/G_k$
Assuming $R_k$ and $G_k$ to be normally and identically distributed, $T_k$ has the density function
$$f(t; c) = \frac{(1+t)\sqrt{1+t^2}}{c\sqrt{2\pi}\,(1+t^2)^2}\exp\left[-\frac{(t-1)^2}{2c^2(1+t^2)}\right]$$
The parameter $c$ is estimated from the observed ratios $t_1, \ldots, t_n$ by
$$\hat{c} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\frac{(t_i-1)^2}{t_i^2+1}}$$
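A minimal sketch of the density $f(t; c)$ and the estimator $\hat{c}$, using only the Python standard library. All function names are hypothetical. For positive ratios, $f(t; c)$ is the derivative of $\Phi\big((t-1)/(c\sqrt{1+t^2})\big)$, so `ratio_cdf` uses that closed form; setting the derivative of the log-likelihood of $f(t; c)$ to zero gives exactly $\hat{c}$, i.e. $\hat{c}$ is the maximum-likelihood estimate under this model. The simulated expression levels are arbitrary illustration values, not data from the paper.

```python
import math
import random
from typing import List, Sequence


def ratio_density(t: float, c: float) -> float:
    """f(t; c) for T = R/G under H0 with common coefficient of variation c."""
    q = 1.0 + t * t
    scale = (1.0 + t) * math.sqrt(q) / (c * math.sqrt(2.0 * math.pi) * q * q)
    return scale * math.exp(-((t - 1.0) ** 2) / (2.0 * c * c * q))


def ratio_cdf(t: float, c: float) -> float:
    """Phi((t - 1) / (c * sqrt(1 + t^2))); its derivative with respect to t is ratio_density."""
    z = (t - 1.0) / (c * math.sqrt(1.0 + t * t))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def estimate_c(ratios: Sequence[float]) -> float:
    """c_hat = sqrt(mean((t_i - 1)^2 / (t_i^2 + 1))), the ML estimate under f(t; c)."""
    total = sum((t - 1.0) ** 2 / (t * t + 1.0) for t in ratios)
    return math.sqrt(total / len(ratios))


def simulate_null_ratios(c: float, n: int, seed: int = 0) -> List[float]:
    """Hypothetical null experiment: R_k, G_k ~ N(mu_k, (c * mu_k)^2) with the same mu_k."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        mu = rng.uniform(500.0, 5000.0)  # arbitrary expression levels for illustration
        ratios.append(rng.gauss(mu, c * mu) / rng.gauss(mu, c * mu))
    return ratios


if __name__ == "__main__":
    ratios = simulate_null_ratios(c=0.1, n=20000)
    print(f"estimated c = {estimate_c(ratios):.4f} (simulated with c = 0.1)")
```

Because $(t-1)^2/(1+t^2) = c^2 z^2$ with $z$ standard normal under this model, $\hat{c}^2$ averages $c^2 z^2$, which is why the simulated estimate lands close to the true $c$.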
Ratio Statistics assuming a constant coefficient of variation (cont.)
Duplicate self-self experiment
With ratios $t_k = R_k/G_k$ and $t'_k = R'_k/G'_k$ from two hybridizations, let $T_k = t_k/t'_k$, so that
$$\log T_k = \log t_k - \log t'_k = (\log R_k - \log R'_k) - (\log G_k - \log G'_k).$$
For small $c$, each log intensity has standard deviation $\sigma_{\log R_k} \approx \sigma_{R_k}/\mu_{R_k} = c$, so $c$ can be estimated from the spread of the log intensities.
Ratio Statistics assuming a constant coefficient of variation (cont.)
Confidence interval
1. Obtained by integrating the ratio density function.
2. The C.I. is determined by the parameter $c$; one can use $c$ derived either from pre-selected housekeeping genes or from a set of duplicate genes.
Therefore,
$$\sigma_{\log T_k}^2 = \sigma_{\log R_k}^2 + \sigma_{\log R'_k}^2 + \sigma_{\log G_k}^2 + \sigma_{\log G'_k}^2 \approx 4c^2$$
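A sketch of the two steps on this slide, under stated assumptions. For duplicate experiments $\sigma^2_{\log T_k} \approx 4c^2$, so `c_from_duplicates` takes $c \approx \mathrm{sd}(\log T_k)/2$ (natural logarithms assumed). Instead of integrating the density numerically, `ratio_confidence_interval` inverts its closed-form CDF $\Phi\big((t-1)/(c\sqrt{1+t^2})\big)$: setting $(t-1)/(c\sqrt{1+t^2}) = \pm z_{1-\alpha/2}$ and $k = 1 - z^2c^2$ gives $k t^2 - 2t + k = 0$, whose roots $t = (1 \pm \sqrt{1-k^2})/k$ multiply to 1. Both function names and the sample log-ratio values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist
from typing import Sequence, Tuple


def c_from_duplicates(log_t_differences: Sequence[float]) -> float:
    """c ~= sd(log T_k) / 2, from sigma^2_{log T} ~= 4 c^2 (natural logs, small c)."""
    return statistics.pstdev(log_t_differences) / 2.0


def ratio_confidence_interval(c: float, level: float = 0.99) -> Tuple[float, float]:
    """Interval [t_lo, t_hi] holding probability `level` under f(t; c).

    Solves (t - 1) / (c * sqrt(1 + t^2)) = +/- z, i.e. k t^2 - 2 t + k = 0 with
    k = 1 - (z c)^2; the two roots multiply to 1, so t_lo = 1 / t_hi.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    k = 1.0 - (z * c) ** 2
    if k <= 0.0:
        raise ValueError("z * c >= 1: the upper limit is unbounded at this level")
    root = math.sqrt(1.0 - k * k)
    return (1.0 - root) / k, (1.0 + root) / k


if __name__ == "__main__":
    # Hypothetical values of ln(t_k) - ln(t'_k) from a duplicate experiment.
    diffs = [0.31, -0.42, 0.05, 0.18, -0.27, 0.36, -0.11, -0.09]
    c_hat = c_from_duplicates(diffs)
    lo, hi = ratio_confidence_interval(c_hat, level=0.99)
    print(f"c = {c_hat:.3f}, 99% interval for T: [{lo:.3f}, {hi:.3f}]")
```

The reciprocal symmetry $t_{lo} = 1/t_{hi}$ means a ratio and its inverse are judged equally significant, which matches the intuition that swapping the two channels should not change the decision.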
Ratio Statistics for low signal-to-noise ratio
The actual expression intensity measurement is of the form
$$R_k = SR_k + (BR_k - \mu_{BR_k}),$$
where $SR_k$ is the expression intensity of gene $k$, $BR_k$ is the fluorescent background level, and $\mu_{BR_k}$ is the mean background level.
Ratio Statistics for low signal-to-noise ratio (cont.)
Null hypothesis of interest:
$$H_0: \mu_{SR_k} = \mu_{SG_k} \quad\Longleftrightarrow\quad H_0: \mu_{R_k} = \mu_{G_k},$$
because
$$E[R_k] = E[SR_k + (BR_k - \mu_{BR_k})] = \mu_{SR_k}$$
and likewise for $G_k$.
Test statistic: $T_k = R_k/G_k$
Ratio Statistics for low signal-to-noise ratio (cont.)
Major differences:
1. The assumption of a constant cv applies to $SR_k$ and $SG_k$, not to $R_k$ and $G_k$.
2. The density of $T_k$ is not applicable.
SNR (signal-to-noise ratio)
SNR (signal-to-noise ratio)
Assuming that $SR_k$ and $BR_k$ are independent,
$$\sigma_{R_k}^2 = \sigma_{SR_k}^2 + \sigma_{BR_k}^2 = (c\,\mu_{SR_k})^2 + \sigma_{BR_k}^2,$$
with
$$SNR_{R_k} = \frac{E[SR_k]}{E[BR_k]} = \frac{\mu_{SR_k}}{\mu_{BR_k}}, \qquad c_{BR_k} = \frac{\sigma_{BR_k}}{\mu_{BR_k}}.$$
Since $\mu_{R_k} = \mu_{SR_k}$,
$$c_{R_k}^2 = \frac{\sigma_{R_k}^2}{\mu_{R_k}^2} = \frac{\sigma_{SR_k}^2 + \sigma_{BR_k}^2}{\mu_{SR_k}^2} = c^2 + c_{BR_k}^2\left(\frac{1}{SNR_{R_k}}\right)^2$$
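A small sketch of the relation $c_{R_k}^2 = c^2 + c_{BR_k}^2 / SNR_{R_k}^2$, with a Monte Carlo check that draws independent normal $SR_k$ and $BR_k$. The function names and the values $c = 0.2$, $c_{BR} = 0.5$ are illustrative assumptions. The printout shows the measured cv approaching the signal cv $c$ as SNR grows, which is why the refined test reduces to the constant-cv test at high SNR.

```python
import math
import random
import statistics


def total_cv(c: float, c_background: float, snr: float) -> float:
    """c_R = sqrt(c^2 + c_BR^2 / SNR^2): coefficient of variation of R = SR + (BR - mu_BR)."""
    return math.sqrt(c * c + (c_background / snr) ** 2)


def simulated_cv(c: float, c_background: float, mu_signal: float, mu_background: float,
                 n: int = 50000, seed: int = 0) -> float:
    """Monte Carlo check: draw independent normal SR and BR and measure the cv of R."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        sr = rng.gauss(mu_signal, c * mu_signal)
        br = rng.gauss(mu_background, c_background * mu_background)
        values.append(sr + (br - mu_background))
    return statistics.pstdev(values) / statistics.fmean(values)


if __name__ == "__main__":
    # Illustrative values (assumptions): signal cv c = 0.2, background cv c_BR = 0.5.
    for snr in (1.0, 3.0, 10.0, 100.0):
        print(f"SNR = {snr:6.1f}  c_R = {total_cv(0.2, 0.5, snr):.4f}")
    print(f"simulated c_R at SNR = 1: {simulated_cv(0.2, 0.5, 1000.0, 1000.0):.4f}")
```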
The Expression intensity scatter plot
Confidence interval for the test statistics
Assumption: $SR_k$, $SG_k$, $BR_k$, $BG_k$ are normally distributed and independent.
$$T_k = \frac{R_k}{G_k} = \frac{SR_k + (BR_k - \mu_{BR_k})}{SG_k + (BG_k - \mu_{BG_k})}$$
Under $H_0$, $p = \mu_{SR_k} = \mu_{SG_k}$, so
$$T = \frac{N(p, \sigma_p) + N(\mu_{BR}, \sigma_{BR}) - \mu_{BR}}{N(p, \sigma_p) + N(\mu_{BG}, \sigma_{BG}) - \mu_{BG}}$$
Confidence interval for the test statistics (cont.)
Under the assumption of constant cv for the signal (without the background), $\sigma_p = cp$. Define
$\sigma_B = \max\{\sigma_{BR}, \sigma_{BG}\}$ (variance parameter),
$s = p/\sigma_B$ (signal-to-noise ratio),
$\sigma_{BR}/\sigma_{BG}$ (background standard-deviation ratio).
Dividing numerator and denominator by $\sigma_B$,
$$T = \frac{N(s, cs) + N(0, \sigma_{BR}/\sigma_B)}{N(s, cs) + N(0, \sigma_{BG}/\sigma_B)}$$
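A Monte Carlo sketch of the normalized model $T = [N(s, cs) + N(0, \sigma_{BR}/\sigma_B)] / [N(s, cs) + N(0, \sigma_{BG}/\sigma_B)]$, showing how the central interval of $T$ widens as the signal-to-noise ratio $s$ falls. Assumptions: $N(m, \sigma)$ is read as mean $m$ and standard deviation $\sigma$; sampling plus empirical quantiles stands in for whatever numerical method the paper used; `simulate_ratio_interval` is a hypothetical name.

```python
import random
from typing import Tuple


def simulate_ratio_interval(s: float, c: float, background_sd_ratio: float,
                            level: float = 0.99, n: int = 20000,
                            seed: int = 0) -> Tuple[float, float]:
    """Empirical central interval of T under H0 for the normalized low-SNR model.

    background_sd_ratio is sigma_BR / sigma_BG; both background sds are divided by
    sigma_B = max(sigma_BR, sigma_BG), so the larger one becomes 1.
    """
    rng = random.Random(seed)
    if background_sd_ratio >= 1.0:
        sd_r, sd_g = 1.0, 1.0 / background_sd_ratio
    else:
        sd_r, sd_g = background_sd_ratio, 1.0
    # Independent signal draws for the two channels share the mean s under H0.
    samples = sorted(
        (rng.gauss(s, c * s) + rng.gauss(0.0, sd_r))
        / (rng.gauss(s, c * s) + rng.gauss(0.0, sd_g))
        for _ in range(n)
    )
    tail = (1.0 - level) / 2.0
    lo_index = int(tail * n)
    hi_index = min(n - 1, int((1.0 - tail) * n))
    return samples[lo_index], samples[hi_index]


if __name__ == "__main__":
    for s in (2.0, 5.0, 10.0, 100.0):
        lo, hi = simulate_ratio_interval(s, c=0.2, background_sd_ratio=1.0)
        print(f"s = {s:6.1f}: 99% interval [{lo:.3f}, {hi:.3f}]")
```

At large $s$ the interval approaches the constant-cv interval for $c = 0.2$; at small $s$ the denominator can approach zero, so the tails of $T$ become very heavy.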
The 99% confidence interval for the ratio statistic
Figure: 99% confidence interval for the ratio statistic with $c = 0.2$: (a) $\sigma_{BR}/\sigma_{BG} = 100$ (or $1/100$); (b) $\sigma_{BR}/\sigma_{BG} = 1$.
Correction of background estimation
Owing to interaction between the fluorescent signal and background, local-background estimation is often biased.
To estimate the bias difference, we find the relationship between the red and green intensities under the null hypothesis by assuming a linear relation, G = aR+b.
Correction of background estimation (cont.)
Simulation
1. Generate 10,000 data points from an exponential distribution with mean 2,000 to simulate 10,000 gene expression levels.
2. The intensity measurement for each channel is further simulated by using a normal distribution with mean equal to the intensity from the exponential distribution and a constant cv of 0.2.
3. Simulate the background level by a normal distribution:
(1) no bias: background level ~ N(0, 100)
(2) some bias: background level ~ N(b, 100)
$$T = \frac{N(p, \sigma_p) + N(0, \sigma_{BR})}{N(p, \sigma_p) + N(0, \sigma_{BG})}$$
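A sketch of the three simulation steps, with a numerical view of the bend the next slide calls the dog-leg effect. Assumptions not stated in the slides: the second parameter of N(·, 100) is a standard deviation; the bias $b$ is added to the green channel only; the median of $G/R$ in a low- and a high-intensity band is used as a summary of the scatter plot. Function names are hypothetical.

```python
import random
import statistics
from typing import List, Tuple


def simulate_expression(n: int = 10000, mean_level: float = 2000.0, cv: float = 0.2,
                        background_sd: float = 100.0, bias: float = 0.0,
                        seed: int = 0) -> List[Tuple[float, float]]:
    """Return (R, G) pairs following the three simulation steps."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        level = rng.expovariate(1.0 / mean_level)   # step 1: exponential, mean 2,000
        r = rng.gauss(level, cv * level)             # step 2: constant cv per channel
        g = rng.gauss(level, cv * level)
        r += rng.gauss(0.0, background_sd)           # step 3: residual background, no bias
        g += rng.gauss(bias, background_sd)          # step 3: bias b on green only (assumption)
        pairs.append((r, g))
    return pairs


def median_ratio(pairs: List[Tuple[float, float]], low: float, high: float) -> float:
    """Median of G/R over spots with positive R, G and (R + G)/2 in [low, high)."""
    ratios = [g / r for r, g in pairs
              if low <= (r + g) / 2.0 < high and r > 0.0 and g > 0.0]
    return statistics.median(ratios)


if __name__ == "__main__":
    for b in (0.0, 500.0):
        data = simulate_expression(bias=b)
        print(f"b = {b:5.0f}: median G/R at low intensity {median_ratio(data, 500, 1500):.3f}, "
              f"at high intensity {median_ratio(data, 5000, 20000):.3f}")
```

With no bias both medians stay near 1; with $b = 500$ the low-intensity ratios are pulled well above 1 while high-intensity ratios stay close to 1, which is the bend visible in the biased scatter plot.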
Scatter plot of simulated expression data
Figure: (a) 10,000 data points with no bias from background estimation; (b) 10,000 data points with a background-estimation bias of 500, showing the dog-leg effect.
Correction of background estimation (cont.)
$G = aR + b$
We employ a chi-square fitting method that minimizes
$$\chi^2(a, b) = \sum_{k=1}^{N}\frac{(G_k - aR_k - b)^2}{\sigma_{G_k}^2 + a^2\sigma_{R_k}^2}$$
With $a = 1$ under the null hypothesis, minimizing over $b$ gives
$$b = \frac{\displaystyle\sum_{k=1}^{N}\frac{G_k - R_k}{c^2(R_k^2 + G_k^2) + \hat{\sigma}_{BR}^2 + \hat{\sigma}_{BG}^2}}{\displaystyle\sum_{k=1}^{N}\frac{1}{c^2(R_k^2 + G_k^2) + \hat{\sigma}_{BR}^2 + \hat{\sigma}_{BG}^2}}$$
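A sketch of the bias estimate $b$ for $G = R + b$ (taking $a = 1$ under the null hypothesis), where each gene is weighted by $1/\big(c^2(R_k^2 + G_k^2) + \hat{\sigma}_{BR}^2 + \hat{\sigma}_{BG}^2\big)$, an approximation to the variance of $G_k - R_k$. The function name, and the simulated data in the usage block (exponential levels with mean 2,000, cv 0.2, background sd 100, bias 500 on green only), are assumptions for illustration.

```python
import random
from typing import Sequence


def estimate_background_bias(red: Sequence[float], green: Sequence[float], c: float,
                             sd_bg_red: float, sd_bg_green: float) -> float:
    """Weighted estimate of b in G = R + b (a = 1 under the null hypothesis)."""
    numerator = 0.0
    denominator = 0.0
    for r, g in zip(red, green):
        # Approximate variance of G_k - R_k: constant-cv signal part plus both backgrounds.
        weight = 1.0 / (c * c * (r * r + g * g) + sd_bg_red ** 2 + sd_bg_green ** 2)
        numerator += weight * (g - r)
        denominator += weight
    return numerator / denominator


if __name__ == "__main__":
    rng = random.Random(0)
    red, green = [], []
    for _ in range(10000):
        level = rng.expovariate(1.0 / 2000.0)
        red.append(rng.gauss(level, 0.2 * level) + rng.gauss(0.0, 100.0))
        green.append(rng.gauss(level, 0.2 * level) + rng.gauss(500.0, 100.0))
    b_hat = estimate_background_bias(red, green, c=0.2, sd_bg_red=100.0, sd_bg_green=100.0)
    corrected_green = [g - b_hat for g in green]
    mean_diff = sum(g - r for r, g in zip(red, corrected_green)) / len(red)
    print(f"estimated bias b = {b_hat:.1f}, mean G - R after correction = {mean_diff:.1f}")
```

The estimate is an inverse-variance weighted mean of $G_k - R_k$, so low-intensity spots, where the background term dominates the variance, carry most of the weight.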
Quality Metric for Ratio Statistics
For a given cDNA target, the following factors affect ratio measurement quality:
(1) Weak fluorescent intensities
(2) A smaller than normal detected target area
(3) A very high local background level
(4) A high standard deviation of target intensity
(1) Fluorescent intensity measurement quality
Under the null hypothesis, the signal means are equal, so that
$$\min\{SNR_R, SNR_G\} = \frac{\mu_R}{\max\{\sigma_{BR}, \sigma_{BG}\}} = \frac{\mu_R}{\sigma_B}.$$
We replace $\mu_R$ and $\sigma_B$ by their null-hypothesis estimators, $(R+G)/2$ and $\hat{\sigma}_B$, to obtain
$$w_I = \begin{cases} 0, & \dfrac{R+G}{2\hat{\sigma}_B} < 3 \\[4pt] \dfrac{R+G}{6\hat{\sigma}_B} - 1, & 3 \le \dfrac{R+G}{2\hat{\sigma}_B} < 6 \\[4pt] 1, & \text{otherwise} \end{cases}$$
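A sketch of the intensity quality $w_I$: with $\widehat{SNR} = (R+G)/(2\hat{\sigma}_B)$, the weight is 0 below 3, rises linearly as $(R+G)/(6\hat{\sigma}_B) - 1$ between 3 and 6, and is 1 above. The slides do not say how $\hat{\sigma}_B$ is computed; `background_sigma` assumes it is the larger of the two channels' local background pixel standard deviations. Both names and the pixel values are hypothetical.

```python
import statistics
from typing import Sequence


def background_sigma(bg_red_pixels: Sequence[float], bg_green_pixels: Sequence[float]) -> float:
    """sigma_B = max(sigma_BR, sigma_BG), here estimated from local background pixels (assumption)."""
    return max(statistics.pstdev(bg_red_pixels), statistics.pstdev(bg_green_pixels))


def intensity_quality(red: float, green: float, sigma_b: float) -> float:
    """w_I: 0 below SNR 3, linear between 3 and 6, 1 above, with SNR = (R + G) / (2 sigma_B)."""
    snr = (red + green) / (2.0 * sigma_b)
    if snr < 3.0:
        return 0.0
    if snr < 6.0:
        return snr / 3.0 - 1.0  # equals (R + G) / (6 sigma_B) - 1
    return 1.0


if __name__ == "__main__":
    sigma_b = background_sigma([0.0, 200.0, 100.0, 300.0, 50.0, 250.0], [80.0, 120.0, 100.0])
    for r, g in [(200.0, 260.0), (450.0, 500.0), (1500.0, 1400.0)]:
        print(f"R = {r:6.0f}, G = {g:6.0f}: w_I = {intensity_quality(r, g, sigma_b):.2f}")
```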
(2) Target area measurement quality
Let $A_M$ be the area of the mask of the cDNA target for a particular print-tip, and let $A_{T_k}$ be the area of the two largest connected components of target $k$. The proportional area of each target is
$$a_k = A_{T_k}/A_M.$$
We define the area measurement quality by
$$w_{A_k} = \begin{cases} 0, & a_k < s_{\min} \\[4pt] \dfrac{a_k - s_{\min}}{0.20 - s_{\min}}, & s_{\min} \le a_k < 0.20 \\[4pt] 1, & \text{otherwise} \end{cases} \qquad s_{\min} = \max\{10/A_M,\ 0.05\}$$
(3) Background flatness quality
Define the background flatness
$$w_b = \min\{w_{BR}, w_{BG}\},$$
where $w_{BR}$ is a piecewise function of the local red background of target $k$: it equals 1 below a threshold of 4, decreases between 4 and 6, and equals 0 at 6 and above. $w_{BG}$ is defined similarly.
(4) Signal intensity consistency quality
Figure: typical target shapes, labeled with their intensity coefficients of variation: cv = 0.48, 0.45, 0.31 (first row) and cv = 0.81, 0.98, 0.59 (second row).
(4) Signal intensity consistency quality (cont.)
Letting $cv_{\min,k}$ denote the minimum of the intensity coefficients of variation for the red and green channels,
$$w_{s_k} = \begin{cases} 1, & cv_{\min,k} < 0.9 \\[4pt] \dfrac{1.1 - cv_{\min,k}}{0.2}, & 0.9 \le cv_{\min,k} \le 1.1 \\[4pt] 0, & cv_{\min,k} > 1.1 \end{cases}$$
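A sketch of the consistency quality $w_s$: take the smaller of the red and green intensity coefficients of variation, then map it to 1 below 0.9, linearly down to 0 on $[0.9, 1.1]$, and 0 above 1.1. Assumptions: the cv is the population standard deviation divided by the mean of the pixel intensities inside the target mask; the function names are hypothetical. The usage loop reuses cv values shown in the target-shape figure.

```python
import statistics
from typing import Sequence


def intensity_cv(pixels: Sequence[float]) -> float:
    """Coefficient of variation of the pixel intensities inside one target (assumed definition)."""
    return statistics.pstdev(pixels) / statistics.fmean(pixels)


def consistency_quality_from_cv(cv_min: float) -> float:
    """w_s: 1 below 0.9, linear from 1 to 0 on [0.9, 1.1], 0 above 1.1."""
    if cv_min < 0.9:
        return 1.0
    if cv_min <= 1.1:
        # Clamp to [0, 1] to guard against floating-point overshoot at the endpoints.
        return max(0.0, min(1.0, (1.1 - cv_min) / 0.2))
    return 0.0


def consistency_quality(red_pixels: Sequence[float], green_pixels: Sequence[float]) -> float:
    """Apply w_s to the smaller of the red and green intensity cvs."""
    cv_min = min(intensity_cv(red_pixels), intensity_cv(green_pixels))
    return consistency_quality_from_cv(cv_min)


if __name__ == "__main__":
    for cv in (0.31, 0.59, 0.98, 1.2):
        print(f"cv_min = {cv:.2f}: w_s = {consistency_quality_from_cv(cv):.2f}")
```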