Felix Naef & Marcelo Magnasco, GL meeting, Nov. 19 2001, felix@funes.rockefeller

Excursions into GeneChip data analysis

Outline

• Background subtraction
• Probeset statistics

Background estimation

• estimate both the mean B and the fluctuations σ
• needed in the low-intensity regime
• includes light reflection from substrate, photodetector dark current, some cross-hybridization (i.e. small residues)
• by the CLT, background is expected to be a Gaussian variable

• idea: B is insensitive to MM and visible at low intensity
• select probes such that |PM-MM| < ε (locally?)
• use ε = 50 (new) or 100 (old settings)
• P(PM) or P(MM) is the convolution of a Gaussian and a step function

[Figure: step function “+” Gaussian = real P(PM); axis marks at 0 and B]
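The probe selection and the shape of the resulting distribution can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the `eps` and `sigma` parameter names and the toy numbers are assumptions; only the threshold values 50 and 100 come from the slide. Convolving a unit step at 0 with a Gaussian of mean B and width sigma gives a Gaussian cumulative distribution centred on B, which the sketch checks numerically.

```python
import math

def select_background_pairs(pairs, eps=50.0):
    """Keep the (PM, MM) pairs whose difference is below the threshold eps."""
    return [(pm, mm) for pm, mm in pairs if abs(pm - mm) < eps]

def gaussian_pdf(x, mean, sigma):
    """Density of a Gaussian with the given mean and width."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def smoothed_step(x, B, sigma):
    """Closed form of step(t) convolved with Gaussian(mean=B, width=sigma)."""
    return 0.5 * (1.0 + math.erf((x - B) / (sigma * math.sqrt(2.0))))

def numeric_convolution(x, B, sigma, half_width=8.0, n=4000):
    """Midpoint-rule estimate of the same convolution, as a sanity check."""
    lo = x - B - half_width * sigma
    dt = 2.0 * half_width * sigma / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * dt
        if t >= 0.0:  # the step function is 1 for t >= 0, else 0
            total += gaussian_pdf(x - t, B, sigma) * dt
    return total

# Toy (PM, MM) pairs, invented for illustration only.
pairs = [(120.0, 100.0), (900.0, 300.0), (95.0, 130.0), (400.0, 390.0)]
background = select_background_pairs(pairs, eps=50.0)  # keeps 3 of the 4 pairs
```

Under this model the rising edge of the low-intensity histogram is centred on B and has width sigma, so fitting `smoothed_step` to that edge would be one way to read off both quantities.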

example:

dependence on

trick for dealing with negative values

PM vs. MM distribution

[Figure: PM vs. MM scatter plot with the MM>PM region marked ("make a histogram in this region"), plus a zoomed view]

PM vs. MM histogram

MM>PM across different chips

MM>PM not concentrated at low intensities: 27% of probe pairs with MM>PM are in the top quartile

Chip                                    Dros    HG85A   Mu11K   U74A    YG_S98
# pairs                                 14      16      20      16      16
# samples                               36      86      24      12      4
fraction of pairs with MM>PM            0.35    0.31    0.34    0.34    0.17
fraction of probesets with ≥ 1 MM>PM    0.951   0.91    0.95    0.92    0.73
fraction of probesets with ≥ 5 MM>PM    0.58    0.56    0.71    0.64    0.21
fraction of probesets with ≥ 10 MM>PM   0.04    0.07    0.26    0.1     0.02
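Counts of this kind could be tallied as in the sketch below. It assumes the probe sets are held in a dict mapping a probe set name to its list of (PM, MM) pairs; that layout, the function name and the toy values are illustrative assumptions, not the format used by the authors.

```python
def mm_gt_pm_stats(probesets, ks=(1, 5, 10)):
    """Return the fraction of pairs with MM > PM and, for each k in ks,
    the fraction of probe sets containing at least k such pairs."""
    n_pairs = 0
    n_mm_gt_pm = 0
    per_set_counts = []
    for pairs in probesets.values():
        count = sum(1 for pm, mm in pairs if mm > pm)
        per_set_counts.append(count)
        n_pairs += len(pairs)
        n_mm_gt_pm += count
    frac_pairs = n_mm_gt_pm / n_pairs
    frac_sets = {
        k: sum(1 for c in per_set_counts if c >= k) / len(per_set_counts)
        for k in ks
    }
    return frac_pairs, frac_sets

# Two toy probe sets with four pairs each (invented numbers).
toy = {
    "set_a": [(100, 50), (80, 120), (200, 150), (60, 90)],
    "set_b": [(300, 100), (250, 120), (90, 40), (500, 200)],
}
frac_pairs, frac_sets = mm_gt_pm_stats(toy, ks=(1, 2, 3))
```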

probe pair trajectories (~80 chips)

• take all (PM, MM) for a given probe set
• center of mass (x, y)
• ellipsoid of inertia
• histogram the cm’s
• color code acc. to s = … / min(x, y)

~ noise detrending
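The center of mass and the ellipsoid of inertia of one probe pair's (PM, MM) cloud can be computed from the 2x2 covariance matrix, as sketched below. The slides do not define eccentricity, so taking it as the ratio of the major to the minor principal-axis length is an assumption here, as are the function names, the choice of coordinates and the toy points.

```python
import math

def center_of_mass(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def inertia_axes(points):
    """Principal-axis lengths (major, minor) of the point cloud, taken as the
    square roots of the eigenvalues of its 2x2 covariance matrix."""
    cx, cy = center_of_mass(points)
    n = len(points)
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    mean = 0.5 * (sxx + syy)
    diff = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    lam_major = mean + diff
    lam_minor = max(mean - diff, 0.0)  # guard against tiny negative round-off
    return math.sqrt(lam_major), math.sqrt(lam_minor)

def eccentricity(points):
    """Assumed definition: major axis length divided by minor axis length."""
    major, minor = inertia_axes(points)
    return math.inf if minor == 0.0 else major / minor

# Toy trajectory lying close to a straight line (invented values).
trajectory = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.1), (4.0, 3.9)]
cm = center_of_mass(trajectory)
well_defined = eccentricity(trajectory) > 3.0
```

An elongated cloud like the toy trajectory passes an eccentricity cut of 3, while a round cloud has eccentricity close to 1.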

all probe sets

blue: large s; green: mid; red: small

probes with ‘well’ defined trajectories (eccentricity > 3)

~1/3 of probes

blue: large; green: mid; red: small

PM within a probe set

Is the brightness of the probes reasonably uniform? Or do different probes have very different hybridization efficiencies?
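One simple way to put a number on this question is the spread of PM intensities within a probe set, measured in decades. The measure, the function name and the toy values below are illustrative assumptions rather than the statistic shown on the slide.

```python
import math

def pm_spread_in_decades(pm_values):
    """log10 of the ratio between the brightest and the dimmest PM probe.
    Non-positive intensities are ignored, since their log is undefined."""
    positive = [v for v in pm_values if v > 0]
    return math.log10(max(positive) / min(positive))

# Toy probe set whose PM intensities span a factor of 100 (invented values).
spread = pm_spread_in_decades([120.0, 450.0, 3000.0, 12000.0])
```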

So what can possibly be happening?

• sequence dependent hybridization efficiencies
- are kinetic effects important?
• cross-hybridization beyond what is detectable by MM probes
- this is hard to assess without sequence info
• sequence dependent fabrication efficiencies?
- variable probe densities

Composite scores

What have we learned from previous slides?

• MM are not consistently behaving as expected
- What about not using them?
• The probe set intensities vary over decades
- difficult to estimate absolute intensities using ‘averages’ (alternative: Li and Wong)
- we focus on ratio scores

Outline of algorithm

1. estimate background (mean and std)

2. discard noisy and saturated probes; use either only PM or PM-MM as raw intensities

3. average the remaining log-ratios in an outlier robust way (robust regression to intercept), with SE

4. normalize by centering the (possibly local) log-ratio distribution
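The four steps can be sketched as follows for the PM-only variant. This is an illustration under stated assumptions, not the authors' implementation: the Huber M-estimator stands in for the unspecified robust regression to intercept, the SE is the plain standard error of the retained log-ratios, centering is global via the median, and the cutoffs `n_sigma` and `saturation` are hypothetical parameters.

```python
import math
import statistics

def huber_location(values, c=1.345, n_iter=50):
    """Huber M-estimate of location (an intercept-only robust regression),
    computed by iteratively reweighted averaging."""
    mu = statistics.median(values)
    mad = statistics.median([abs(v - mu) for v in values])
    scale = 1.4826 * mad
    if scale == 0.0:
        return mu  # more than half of the values coincide with the median
    for _ in range(n_iter):
        weights = [1.0 if abs(v - mu) <= c * scale else c * scale / abs(v - mu)
                   for v in values]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < 1e-12:
            break
        mu = new_mu
    return mu

def probeset_log_ratio(pm_a, pm_b, bg_mean, bg_std,
                       n_sigma=2.0, saturation=40000.0):
    """Steps 1-3 for one probe set measured in two samples.
    bg_mean and bg_std are the background estimates from step 1."""
    log_ratios = []
    for a, b in zip(pm_a, pm_b):
        if max(a, b) >= saturation:
            continue  # step 2: drop saturated probes
        a_sub, b_sub = a - bg_mean, b - bg_mean
        if min(a_sub, b_sub) <= n_sigma * bg_std:
            continue  # step 2: drop probes lost in the background noise
        log_ratios.append(math.log2(a_sub / b_sub))
    if len(log_ratios) < 2:
        return None
    center = huber_location(log_ratios)  # step 3: robust average
    se = statistics.stdev(log_ratios) / math.sqrt(len(log_ratios))
    return center, se

def center_log_ratios(scores):
    """Step 4: subtract the median log-ratio taken over all probe sets."""
    med = statistics.median(scores.values())
    return {name: value - med for name, value in scores.items()}

# Toy probe set: the fifth probe is saturated, the sixth is near background.
result = probeset_log_ratio(
    [1100.0, 2100.0, 4100.0, 8100.0, 50000.0, 130.0],
    [600.0, 1100.0, 2100.0, 4100.0, 45000.0, 120.0],
    bg_mean=100.0, bg_std=20.0,
)
```

A local version of step 4 would subtract a median computed within intensity bins instead of a single global value.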