Felix Naef & Marcelo Magnasco, GL meeting, Nov. 19, 2001 [email protected]
Excursions into GeneChip data analysis
Outline
• Background subtraction
• Probeset statistics
Background estimation
• estimate both mean B and fluctuations σ
• needed in low-intensity regime
• includes light reflection from substrate, photodetector dark current, some cross-hybridization (i.e. small residues)
• by the CLT, background is expected to be a Gaussian variable
• idea: B is insensitive to MM and visible at low intensity
• select probes such that |PM-MM| < ε (locally?)
• use ε = 50 (new) or 100 (old settings)
• P(PM) or P(MM) is convolution of Gaussian and step function
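The selection rule above can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function name `estimate_background`, the parameter name `eps` for the threshold, the pooling of PM and MM values, and the synthetic data are all assumptions made here. It also assumes that the selected pairs are dominated by background, so a plain mean and standard deviation are reasonable estimates of B and of the fluctuations.

```python
import numpy as np

def estimate_background(pm, mm, eps=50.0):
    """Return (B, sigma) from probe pairs with |PM - MM| < eps.

    Assumption: the selected pairs are dominated by background, so their
    pooled PM and MM values sample the background distribution directly.
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    keep = np.abs(pm - mm) < eps                 # pairs with no visible signal
    pooled = np.concatenate([pm[keep], mm[keep]])
    return pooled.mean(), pooled.std(ddof=1)

# Synthetic chip (made-up numbers): Gaussian background on every probe,
# plus a specific signal added to the PM probe of half of the pairs.
rng = np.random.default_rng(0)
n = 5000
pm = rng.normal(100.0, 10.0, n)
mm = rng.normal(100.0, 10.0, n)
pm[: n // 2] += rng.uniform(200.0, 2000.0, n // 2)

B, sigma = estimate_background(pm, mm, eps=50.0)
print(f"B = {B:.1f}, sigma = {sigma:.1f}")
```

Because the cut on |PM-MM| truncates the tails of the difference, the standard deviation obtained this way tends to be slightly too small; a larger threshold reduces that bias at the cost of letting more signal in.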
[Figure: Gaussian “+” step function (0 below B) = real P(PM); example and parameter dependence]
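The claim that the observed distribution is a Gaussian convolved with a step function can be checked numerically. The sketch below is an illustration under assumed values (B = 100, σ = 10, grid spacing 0.1, all chosen here): it convolves a unit step at B with a normalized Gaussian and compares the result with the closed form, an error-function edge centered at B with width σ. The helper name `smoothed_step` is hypothetical.

```python
import math
import numpy as np

def smoothed_step(x, B, sigma):
    """Closed form of a unit step at B convolved with a unit-area Gaussian."""
    return 0.5 * (1.0 + math.erf((x - B) / (sigma * math.sqrt(2.0))))

B, sigma, dx = 100.0, 10.0, 0.1                # assumed illustrative values
x = np.arange(0, 2001) * dx                    # intensity grid from 0 to 200
step = (x >= B).astype(float)                  # idealized distribution edge at B

h = int(round(5 * sigma / dx))                 # kernel half-width of 5 sigma
offsets = np.arange(-h, h + 1) * dx
kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
kernel /= kernel.sum()                         # normalize to unit area

numeric = np.convolve(step, kernel, mode="same")
analytic = np.array([smoothed_step(xi, B, sigma) for xi in x])

# Compare away from the grid boundaries, where "same" mode is truncated.
interior = (x > 60.0) & (x < 140.0)
max_err = np.max(np.abs(numeric - analytic)[interior])
print(f"max deviation from the erf edge: {max_err:.4f}")
```

The position of the edge gives B and its width gives σ, which is why the low-intensity rise of the distribution carries the background information.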
trick for dealing with negative values
PM vs. MM distribution
[Figure: PM vs. MM distribution; the MM>PM region is marked “make a histogram in this region”; zoom]
PM vs. MM histogram
MM>PM across different chips
MM>PM not concentrated at low intensities: 27% of probe pairs with MM>PM are in the top quartile
Chip                         Dros    HG85A   Mu11K   U74A    YG_S98
# pairs                      14      16      20      16      16
# samples                    36      86      24      12      4
% MM>PM                      0.35    0.31    0.34    0.34    0.17
% probesets with 1 MM>PM     0.951   0.91    0.95    0.92    0.73
% probesets with 5 MM>PM     0.58    0.56    0.71    0.64    0.21
% probesets with 10 MM>PM    0.04    0.07    0.26    0.1     0.02
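Statistics of this kind can be computed along the following lines. This is a sketch with several assumptions that the slides do not confirm: the data are taken to be arrays of shape (probe sets, pairs), the per-probe-set rows are read as "at least k pairs with MM>PM", and the top quartile is defined by PM intensity. The function name `mm_gt_pm_stats` and the toy numbers are made up for illustration.

```python
import numpy as np

def mm_gt_pm_stats(pm, mm, ks=(1, 5, 10)):
    """Summaries of MM>PM for arrays of shape (n_probesets, n_pairs)."""
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    gt = mm > pm                                    # pairs where MM exceeds PM
    frac_pairs = gt.mean()                          # fraction of all pairs
    counts = gt.sum(axis=1)                         # MM>PM pairs per probe set
    # Assumption: "with k MM>PM" is read as "at least k".
    frac_sets = {k: float((counts >= k).mean()) for k in ks}
    # Assumption: the top quartile is defined by PM intensity.
    top = pm >= np.quantile(pm, 0.75)
    frac_top = float(top[gt].mean()) if gt.any() else float("nan")
    return frac_pairs, frac_sets, frac_top

# Toy data: 2 probe sets with 4 pairs each (made-up numbers).
pm = np.array([[10, 20, 30, 40], [50, 60, 70, 80]])
mm = np.array([[15, 5, 35, 10], [10, 10, 90, 10]])
frac_pairs, frac_sets, frac_top = mm_gt_pm_stats(pm, mm, ks=(1, 2))
print(frac_pairs, frac_sets, frac_top)
```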
probe pairs trajectories (~80 chips)
• take all (PM, MM) for a given probe set
• center of mass (x, y)
• ellipsoid of inertia
> and
• histogram the centers of mass
• color code according to s = … / min(x, y)
~ noise detrending
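The center of mass and the ellipsoid of inertia of a cloud of (PM, MM) points can be obtained from the mean and the covariance matrix. The sketch below assumes raw (not log-transformed) intensities and defines the elongation of the ellipse as the ratio of its principal axis lengths; both choices, and the name `trajectory_shape`, are assumptions made for illustration and may differ from the definitions used for the plots.

```python
import numpy as np

def trajectory_shape(pm, mm):
    """Center of mass and inertia ellipse of a set of (PM, MM) points."""
    points = np.column_stack([pm, mm]).astype(float)
    center = points.mean(axis=0)                  # center of mass (x, y)
    cov = np.cov(points, rowvar=False)            # 2x2 covariance matrix
    evals = np.linalg.eigvalsh(cov)               # eigenvalues, ascending
    # Assumed elongation measure: ratio of the principal axis lengths.
    axis_ratio = np.sqrt(evals[1] / evals[0]) if evals[0] > 0 else np.inf
    return center, evals, axis_ratio

# Four made-up measurements of one probe pair across chips.
pm = [103.0, 97.0, 100.0, 100.0]
mm = [50.0, 50.0, 51.0, 49.0]
center, evals, axis_ratio = trajectory_shape(pm, mm)
print(center, axis_ratio)
```

A cloud whose points fall along a line has one eigenvalue much larger than the other, so a large axis ratio marks a well-defined trajectory.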
all probe sets
blue: large s, green: mid, red: small
probes with ‘well’-defined trajectories (eccentricity > 3)
~1/3 of probes
blue: large, green: mid, red: small
PM within a probe set
Is the brightness of the probes reasonably uniform? Or do different probes have very different hybridization efficiencies?
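One simple way to put a number on this question is to measure how many decades the PM intensities of a single probe set span. The sketch below is only an illustration: the function name `decades_spanned` and the intensities are made up, and no background is subtracted.

```python
import numpy as np

def decades_spanned(pm):
    """Number of decades between the dimmest and brightest PM probe."""
    pm = np.asarray(pm, dtype=float)
    return float(np.log10(pm.max() / pm.min()))

# Made-up PM intensities for the probes of one probe set on one chip.
pm_set = [120.0, 450.0, 3000.0, 12000.0]
spread = decades_spanned(pm_set)
print(f"PM intensities span {spread:.1f} decades")
```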
So what can possibly be happening?
• sequence-dependent hybridization efficiencies
- are kinetic effects important?
• cross-hybridization beyond what is detectable by MM probes
- this is hard to assess without sequence info
• sequence-dependent fabrication efficiencies?
- variable probe densities
Composite scores
What have we learned from previous slides?
• MM are not consistently behaving as expected
- What about not using them?
• The probe set intensities vary over decades
- difficult to estimate absolute intensities using ‘averages’ (alternative: Li and Wong)
- we focus on ratio scores
Outline of algorithm
1. estimate background (mean and std)
2. discard noisy and saturated probes; use either only PM or PM-MM as raw intensities
3. average the remaining log-ratios in an outlier-robust way (robust regression to intercept) and compute the standard error (SE)
4. normalize by centering the (possibly local) log-ratio distribution
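Steps 2 to 4 can be sketched as follows for a two-condition comparison, with the background B and its fluctuation σ from step 1 passed in as known values. Everything specific here is an assumption made for illustration rather than the authors' code: the Huber-weighted location estimate stands in for the robust regression to intercept, the noise cut is 2σ, the saturation level is 40000, log-ratios are base 2, and the centering is global (median subtraction). All function and parameter names are hypothetical.

```python
import numpy as np

def robust_intercept(values, c=1.345, n_iter=50):
    """Huber-weighted location estimate and a rough standard error."""
    v = np.asarray(values, dtype=float)
    mu = np.median(v)
    scale = 1.4826 * np.median(np.abs(v - mu))    # MAD-based scale
    if scale == 0.0:
        return float(mu), 0.0
    for _ in range(n_iter):
        r = np.abs(v - mu) / scale
        w = c / np.maximum(r, c)                  # weight 1 inside c, c/r outside
        mu = np.sum(w * v) / np.sum(w)
    se = np.sqrt(np.sum((w * (v - mu)) ** 2)) / np.sum(w)
    return float(mu), float(se)

def probeset_log_ratio(pm_a, mm_a, pm_b, mm_b, B, sigma,
                       use_mm=False, noise_k=2.0, saturation=40000.0):
    """Log2 ratio (condition a over b) for one probe set, with its SE."""
    pm_a, mm_a = np.asarray(pm_a, float), np.asarray(mm_a, float)
    pm_b, mm_b = np.asarray(pm_b, float), np.asarray(mm_b, float)
    # Raw intensities: PM-MM, or PM alone with the background removed.
    raw_a = pm_a - mm_a if use_mm else pm_a - B
    raw_b = pm_b - mm_b if use_mm else pm_b - B
    # Step 2: discard noisy and saturated probes.
    ok = ((raw_a > noise_k * sigma) & (raw_b > noise_k * sigma)
          & (pm_a < saturation) & (pm_b < saturation))
    if not ok.any():
        return float("nan"), float("nan")
    # Step 3: outlier-robust average of the remaining log-ratios.
    return robust_intercept(np.log2(raw_a[ok] / raw_b[ok]))

def center(log_ratios):
    """Step 4: global centering of the log-ratio distribution."""
    lr = np.asarray(log_ratios, dtype=float)
    return lr - np.median(lr)

# Made-up probe set: about 2-fold change, one outlier probe (16-fold),
# one saturated probe and one probe at noise level.
B, sigma = 100.0, 10.0
signal_b = np.array([200.0, 400.0, 800.0, 1600.0, 3200.0, 50000.0, 5.0])
fold = np.array([16.0, 2.0, 2.2, 1.8, 2.1, 2.0, 2.0])
pm_b = signal_b + B
pm_a = fold * signal_b + B
mm = np.full(signal_b.size, B)

lr, se = probeset_log_ratio(pm_a, mm, pm_b, mm, B, sigma)
print(f"log2 ratio = {lr:.2f} +/- {se:.2f}")
```

In this toy example the outlier probe pulls a plain mean of the log-ratios well above 1, while the down-weighted estimate stays close to the 2-fold change shared by the other probes.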