Felix Naef & Marcelo Magnasco, GL meeting, Nov. 19 2001 [email protected]

Outline

• Background subtraction

• Probeset statistics

Excursions into GeneChip data analysis

Background estimation

• estimate both mean B and fluctuations

• needed in low-intensity regime

• includes light reflection from substrate, photodetector dark current, some cross-hybridization (i.e. small residues)

• by the CLT, background is expected to be a Gaussian variable

• idea: B is insensitive to MM and visible at low intensity

• select probes such that |PM-MM| < threshold (locally?)

• use threshold = 50 (new) or 100 (old settings)

• P(PM) or P(MM) is convolution of Gaussian and step function
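The selection step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the simulated data and the choice of estimators (median for the mean B, spread of the lower half for the fluctuations) are not from the slides, which fit the convolved distribution instead.

```python
import math
import random
import statistics


def estimate_background(pm, mm, threshold=50.0):
    """Estimate background from probe pairs with |PM - MM| < threshold.

    Returns (mean_b, sigma_b, n_pairs_used).
    """
    values = []
    n_used = 0
    for p, m in zip(pm, mm):
        if abs(p - m) < threshold:
            values.extend((p, m))
            n_used += 1
    if n_used == 0:
        raise ValueError("no probe pair passes the |PM - MM| < threshold cut")
    # Assumption: the median stands in for the background mean B.
    b = statistics.median(values)
    # Assumption: only the lower half is used for the spread, because
    # residual signal contaminates the upper half.
    lower = [v - b for v in values if v <= b]
    sigma = math.sqrt(sum(d * d for d in lower) / len(lower))
    return b, sigma, n_used


def simulate_pairs(n=20000, b=100.0, sigma=10.0, seed=0):
    """Hypothetical test data: Gaussian background plus signal on 40% of pairs."""
    rng = random.Random(seed)
    pm, mm = [], []
    for _ in range(n):
        signal = rng.expovariate(1 / 500.0) if rng.random() < 0.4 else 0.0
        pm.append(b + rng.gauss(0, sigma) + signal)
        mm.append(b + rng.gauss(0, sigma) + 0.3 * signal)
    return pm, mm
```

Calling `estimate_background(*simulate_pairs())` should return values close to the simulated background (B = 100, fluctuations = 10), with a small upward bias from pairs that carry a weak signal.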

[Figure: step function “+” Gaussian centered at B = resulting distribution. Example: real P(PM).]
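The convolution picture can be written out explicitly. The form below is a sketch under assumptions not stated on the slide: the step function is taken as a uniform signal density on [0, w] (w is a hypothetical width), the Gaussian has mean B and standard deviation σ, and Φ is the standard normal cumulative distribution.

```latex
\begin{aligned}
P(x) &= \int_0^{w} \frac{ds}{w}\,
        \frac{1}{\sqrt{2\pi}\,\sigma}
        \exp\!\left[-\frac{(x - B - s)^2}{2\sigma^2}\right] \\
     &= \frac{1}{w}\left[
        \Phi\!\left(\frac{x - B}{\sigma}\right)
        - \Phi\!\left(\frac{x - B - w}{\sigma}\right)\right]
\end{aligned}
```

For w much larger than σ, the rising edge near x = B is just Φ((x - B)/σ), so its position gives B and its width gives σ.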

dependence on the threshold

trick for dealing with negative values

PM vs. MM distribution

MM>PM

make a histogram in this region

zoom

PM vs. MM histogram

MM>PM across different chips

MM>PM not concentrated at low intensities: 27% of probe pairs with MM>PM are in the top quartile

Chip                                    Dros    HG85A   Mu11K   U74A    YG_S98
# pairs                                 14      16      20      16      16
# samples                               36      86      24      12      4
Fraction of pairs with MM>PM            0.35    0.31    0.34    0.34    0.17
Fraction of probesets with ≥1 MM>PM     0.951   0.91    0.95    0.92    0.73
Fraction of probesets with ≥5 MM>PM     0.58    0.56    0.71    0.64    0.21
Fraction of probesets with ≥10 MM>PM    0.04    0.07    0.26    0.1     0.02
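The statistics in this table, and the top-quartile figure quoted before it, could be computed along the following lines. This is a hedged sketch: the function names and data layout are hypothetical, "probesets with k MM>PM" is read as "at least k pairs", and intensity is ranked by the mean of PM and MM, which the slides do not specify.

```python
def mm_gt_pm_summary(probesets, ks=(1, 5, 10)):
    """probesets: dict mapping probe set name -> list of (PM, MM) pairs."""
    # Number of pairs with MM > PM in each probe set.
    counts = [sum(1 for pm, mm in pairs if mm > pm)
              for pairs in probesets.values()]
    n_pairs = sum(len(pairs) for pairs in probesets.values())
    summary = {"frac_pairs": sum(counts) / n_pairs}
    for k in ks:
        # Fraction of probe sets with at least k pairs where MM > PM.
        summary["frac_sets_ge_%d" % k] = (
            sum(1 for c in counts if c >= k) / len(counts))
    return summary


def share_in_top_quartile(pairs):
    """Among pairs with MM > PM, fraction lying in the top intensity quartile."""
    # Assumption: rank pairs by the mean of PM and MM.
    intensity = [(pm + mm) / 2 for pm, mm in pairs]
    cutoff = sorted(intensity)[(3 * len(intensity)) // 4]
    flagged = [i for i, (pm, mm) in zip(intensity, pairs) if mm > pm]
    if not flagged:
        raise ValueError("no probe pair has MM > PM")
    return sum(1 for i in flagged if i >= cutoff) / len(flagged)
```

If MM>PM were purely a low-intensity noise effect, `share_in_top_quartile` would return a value near zero rather than something like the 27% reported.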

probe pairs trajectories (~80 chips)

• take all (PM, MM) for a given probe set

• center of mass (x, y)

• ellipsoid of inertia (larger and smaller principal axes)

• histogram the cm’s

• color code according to s = […] / (min(x, y) […])

~ noise detrending
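The center of mass and ellipsoid of inertia of a point cloud can be computed as below. This is a sketch with assumptions: the function name is hypothetical, the caller decides whether to pass raw or log intensities, and "eccentricity" is taken to be the ratio of the principal axis lengths, which is one reading consistent with values above 1 but is not defined in the surviving text.

```python
import math


def trajectory_shape(points):
    """points: list of (x, y) values, e.g. the (PM, MM) of one probe set.

    Returns (center_of_mass, (lam_big, lam_small), eccentricity).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second moments about the center of mass (2x2 inertia/covariance matrix).
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_big = half_trace + root
    lam_small = max(half_trace - root, 0.0)  # guard against rounding below zero
    # Assumption: eccentricity = ratio of principal axis lengths.
    ecc = math.sqrt(lam_big / lam_small) if lam_small > 0 else math.inf
    return (cx, cy), (lam_big, lam_small), ecc
```

For example, the four points (-3, 0), (3, 0), (0, -1), (0, 1) have their center of mass at the origin and an axis ratio of 3.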

all probe sets

blue: large s, green: mid, red: small

probes with ‘well-defined’ trajectories (eccentricity > 3)

~1/3 of probes

blue: large, green: mid, red: small

PM within a probe set

Is the brightness of the probes reasonably uniform? Or do different probes have very different hybridization efficiencies?

So what can possibly be happening?

• sequence dependent hybridization efficiencies

  - are kinetic effects important?

• cross-hybridization beyond what is detectable by MM probes

  - this is hard to assess without sequence info

• sequence dependent fabrication efficiencies?

  - variable probe densities

Composite scores

What have we learned from previous slides?

• MM are not consistently behaving as expected

- What about not using them?

• The probe set intensities vary over decades

- difficult to estimate absolute intensities using ‘averages’ (alternative: Li and Wong)

- we focus on ratio scores

Outline of algorithm

1. estimate background (mean and std)

2. discard noisy and saturated probes; use either only PM or PM-MM as raw intensities

3. average the remaining log-ratios in an outlier-robust way (robust regression to intercept) and estimate the standard error (SE)

4. normalize by centering the (possibly local) log-ratio distribution
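Steps 2 to 4 of the outline can be sketched as follows, using PM only. Several choices here are assumptions rather than the authors' settings: the Huber M-estimator stands in for "robust regression to intercept", the SE is a rough scale / sqrt(n) estimate, the cutoffs `n_sigma` and `saturation` are hypothetical, and centering is global (median) rather than local.

```python
import math
import statistics


def huber_location(values, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location by iteratively reweighted means.

    Returns (estimate, rough_standard_error).
    """
    mu = statistics.median(values)
    scale = 1.4826 * statistics.median(abs(v - mu) for v in values)
    if scale == 0:
        return mu, 0.0
    for _ in range(max_iter):
        # Full weight inside c * scale, down-weighted outside.
        weights = [1.0 if abs(v - mu) <= c * scale
                   else c * scale / abs(v - mu) for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    # Assumption: rough SE from the robust scale, not a formal estimate.
    return mu, scale / math.sqrt(len(values))


def probe_set_log_ratio(pm_a, pm_b, bg_mean, bg_std,
                        n_sigma=3.0, saturation=40000.0):
    """Log-ratio score of one probe set between chips a and b (PM only)."""
    ratios = []
    for a, b in zip(pm_a, pm_b):
        sa, sb = a - bg_mean, b - bg_mean
        noisy = sa < n_sigma * bg_std or sb < n_sigma * bg_std
        saturated = a > saturation or b > saturation
        if not noisy and not saturated:
            ratios.append(math.log(sa / sb))
    if not ratios:
        return None  # every probe was discarded
    return huber_location(ratios)


def center(scores):
    """Normalize by subtracting the median log-ratio over all probe sets."""
    shift = statistics.median(scores)
    return [s - shift for s in scores]
```

A single outlying probe barely moves the score: `huber_location([0.0, 0.1, -0.1, 0.05, -0.05, 5.0])` stays close to zero, whereas the plain mean would be about 0.83.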
