Page 1: Quality control: artifacts, visualization, QC as residual ...compdiag.molgen.mpg.de/ngfn/docs/2004/jun/QualityControl.pdf · Xprint-tip effects-0.8 -0.6 -0.4 -0.2 0.0 0.2 0.0 0.2

Quality control: artifacts, visualization, QC as residual analysis

Further topics on preprocessing: probe set summaries, physics

Wolfgang Huber

DKFZ Heidelberg

Normal QQ-plot: vsn-transformed data

Scatterplot, colored by PCR plate
Two RZPD Unigene II filters (cDNA nylon membranes)

PCR plates

PCR plates

PCR plates: boxplots

array batches

print-tip effects

[Figure: empirical CDF of the log-ratio log(fg.green/fg.red), one curve per spotting pin (1:1 to 4:4); slide 41 (a42-u07639vene.txt)]

spotting pin quality decline

after delivery of 3×10^5 spots

after delivery of 5×10^5 spots

H. Sueltmann DKFZ/MGA

spatial effects

R, Rb, R-Rb: color scale by rank

spotted cDNA arrays, Stanford-type

another array:

print-tip

color scale ~ log(G)

color scale ~ rank(G)

One RNA, four slides

Jörg Schneider, DKFZ

Spot DNA concentration: ratio compression

Yue et al. (Incyte Genomics), NAR (2001) 29 e41

Factors that affect measurements

Arrays
o PCR yield: plate bias, ratio compression
o Spotting / wear of pins: pin bias
o Batch effects: density and steric accessibility of probes
o Hybridization chamber asymmetries: spatial gradients

Samples
o Ascertainment: RNA degradation, contamination
o Amplification
o RNA purification
o Labeling
o Washing
o Scanner

Batches: array to array differences $d_{ij} = \operatorname{mad}_k(h_{ik} - h_{jk})$

arrays i = 1…63; roughly sorted by time

[Figure: image of the distance matrix d_ij, both axes 1:nrhyb (hybridization number); color scale from 0.6 to 1.8]
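The array-to-array distance can be sketched in base R as follows. This is only an illustration: the function name `pairwise_mad`, the probes-by-arrays layout of `h`, and the simulated data are assumptions, not taken from the slides. Note that R's `mad()` rescales by 1.4826 by default.

```r
# Pairwise array distances d_ij = mad_k(h_ik - h_jk).
# h: numeric matrix, rows = probes k, columns = arrays i (assumed layout).
pairwise_mad <- function(h) {
  n <- ncol(h)
  d <- matrix(0, n, n, dimnames = list(colnames(h), colnames(h)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # MAD over probes of the difference between array i and array j
      d[i, j] <- d[j, i] <- mad(h[, i] - h[, j], na.rm = TRUE)
    }
  }
  d
}

set.seed(1)
# Simulated example: 6 arrays, the last 3 from a noisier hypothetical batch.
h <- matrix(rnorm(1000 * 6), ncol = 6)
h[, 4:6] <- h[, 4:6] * 2
d <- pairwise_mad(h)
print(round(d, 2))
```

Displaying `d` with `image(d)` gives a picture of the kind shown on the slide; block structure along the diagonal would hint at batches.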

Scatterplots

Histogram

Density representation of the scatterplot (76,000 clones, RZPD Unigene-II filters)

Density representation of the scatterplot (76,000 clones, RZPD Unigene-II filters)

Quantities that can be used for QC

Control data:
o Positive controls (e.g. metallothioneins in kidney)
o Negative controls (e.g. nonhomologous probes)
o (Spike-in cDNA)

Hot data:
reproducibility / similarity:
o replicate probes per array
o replicate arrays per sample
o multiple probes per transcript
o multiple samples per biological condition
o absence of correlation with technical factors (enzyme batch, spatial location on array, …)
signal:
o amplitude / quantity of differences between samples known to be biologically different

Quantities that can be used for QC

Essential:
o Experimental design that minimizes role of technical effects
o biological groups are balanced/randomized

A model-based approach to QC

Make theoretically and/or empirically founded modelling assumptions on the data, then see whether a given set of data fits. If it does not, the data are bad.

Examples:
- additive-multiplicative error model with affine chip effects
- additive-multiplicative error model with affine chip and pin effects
- Li-Wong model with probe and sample effects
- affyPLM (later … first we need some background on Affymetrix)

Affymetrix expression measures

$PM_{ijg}$, $MM_{ijg}$ = intensity for perfect match and mismatch probe j for gene g in chip i.
i = 1,…, n: one to hundreds of chips
j = 1,…, J: usually 16 or 20 probe pairs
g = 1,…, G: 8…20,000 probe sets

Tasks:
o calibrate (normalize) the measurements from different chips (samples)
o summarize for each probe set the probe level data, i.e., 20 PM and MM pairs, into a single expression measure
o compare between chips (samples) for detecting differential expression

Expression measures: MAS 4.0

Affymetrix GeneChip MAS 4.0 software uses AvDiff, a trimmed mean:

o sort $d_j = PM_j - MM_j$
o exclude highest and lowest value
o J := those pairs within 3 standard deviations of the average

$$\mathrm{AvDiff} = \frac{1}{\#J} \sum_{j \in J} (PM_j - MM_j)$$
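The three steps can be sketched in base R for a single probe set on one chip. The function name `avdiff` and the example intensities are made up for illustration. Computing the average and standard deviation from the trimmed differences is one reading of the slide, stated here as an assumption; the sketch also assumes at least four probe pairs.

```r
# AvDiff for one probe set on one chip (sketch of the steps listed above).
# pm, mm: numeric vectors of equal length, one entry per probe pair.
avdiff <- function(pm, mm) {
  d <- sort(pm - mm)
  # Exclude the highest and lowest difference.
  trimmed <- d[-c(1, length(d))]
  # J: pairs within 3 standard deviations of the average
  # (average and SD taken from the trimmed differences; an assumption).
  keep <- abs(d - mean(trimmed)) <= 3 * sd(trimmed)
  mean(d[keep])
}

# Hypothetical probe set with one grossly outlying pair (900 vs 100).
pm <- c(120, 150, 130, 160, 140, 155, 135, 145, 125, 900)
mm <- rep(100, 10)
result <- avdiff(pm, mm)
print(result)
```

In this example the outlying pair falls outside the 3 SD window and does not enter the average, which is the purpose of the trimming.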

Expression measures MAS 5.0

Instead of MM, use "repaired" version CT:

$$CT = \begin{cases} MM & \text{if } MM < PM \\ PM / \text{"typical log-ratio"} & \text{if } MM \ge PM \end{cases}$$

"Signal" = Tukey.Biweight(log(PM - CT)) (… ≈ median)

Tukey Biweight: $B(x) = (1 - (x/c)^2)^2$ if $|x| < c$, 0 otherwise
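A one-step Tukey biweight average can be sketched in base R as below. The defaults `c = 5` and `epsilon = 1e-4`, and the scaling of distances by the unscaled median absolute deviation, are assumptions about the usual one-step form rather than something stated on the slide. The input vector stands in for hypothetical log(PM - CT) values.

```r
# One-step Tukey biweight: a robust average that downweights values far
# from the median using B(u) = (1 - u^2)^2 for |u| < 1, 0 otherwise.
tukey_biweight <- function(x, c = 5, epsilon = 1e-4) {
  m <- median(x)
  s <- median(abs(x - m))            # unscaled median absolute deviation
  u <- (x - m) / (c * s + epsilon)   # distance from median in units of c * s
  w <- ifelse(abs(u) < 1, (1 - u^2)^2, 0)
  sum(w * x) / sum(w)
}

# Hypothetical log-scale probe values with one outlier.
x <- c(7.1, 7.3, 6.9, 7.0, 7.2, 12.0)
tb <- tukey_biweight(x)
print(c(biweight = tb, median = median(x), mean = mean(x)))
```

The outlier receives weight zero, so the result stays close to the median while still averaging over the well-behaved probes.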

Expression measures: Li & Wong

dChip fits a model for each gene:

$$PM_{ij} - MM_{ij} = \theta_i \phi_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)$$

where
o $\theta_i$: expression index for the gene in chip i
o $\phi_j$: probe sensitivity

The maximum likelihood estimate of the model-based expression index (MBEI) $\theta_i$ is used as the expression measure of the gene in chip i.

Need at least 10 or 20 chips.

Current version works with PMs only.
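With Gaussian errors, the maximum likelihood fit of the multiplicative model is a least-squares fit, which can be sketched in base R by alternating between the chip and probe parameters. This is an illustration, not dChip's implementation: the function name `fit_li_wong`, the fixed iteration count, the identifiability constraint sum(phi^2) = J, and the simulated data are all assumptions.

```r
# Alternating least squares for y_ij = theta_i * phi_j + eps_ij.
# y: matrix of PM - MM, rows = chips i, columns = probes j (assumed layout).
fit_li_wong <- function(y, n_iter = 50) {
  J <- ncol(y)
  phi <- rep(1, J)
  for (it in seq_len(n_iter)) {
    # Given phi, each theta_i is a least-squares slope through the origin.
    theta <- as.vector(y %*% phi) / sum(phi^2)
    # Given theta, each phi_j is a least-squares slope through the origin.
    phi <- as.vector(crossprod(y, theta)) / sum(theta^2)
    # Identifiability constraint (assumed): sum(phi^2) = J.
    phi <- phi * sqrt(J / sum(phi^2))
  }
  theta <- as.vector(y %*% phi) / sum(phi^2)
  list(theta = theta, phi = phi, residuals = y - outer(theta, phi))
}

set.seed(2)
theta_true <- runif(12, 100, 1000)   # 12 hypothetical chips
phi_true <- runif(16, 0.5, 1.5)      # 16 hypothetical probes
y <- outer(theta_true, phi_true) + matrix(rnorm(12 * 16, sd = 20), 12, 16)
fit <- fit_li_wong(y)
print(round(cor(fit$theta, theta_true), 4))
```

The matrix `fit$residuals` is the natural input for QC: probes or chips with consistently large residuals do not follow the model.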

Expression measures RMA: Irizarry et al. (2002)

o Estimate one global background value b = mode(MM). No probe-specific background!
o Assume: $PM = s_{true} + b$. Estimate $s \ge 0$ from PM and b as a conditional expectation $E[s_{true} \mid PM, b]$.
o Use $\log_2(s)$.
o Nonparametric nonlinear calibration ('quantile normalization') across a set of chips.
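The quantile normalization step can be sketched in base R as follows. The function name `quantile_normalize` and the toy matrix are illustrative; ties are broken by order of appearance here, which is a simplifying assumption and may differ from what production implementations do.

```r
# Quantile normalization: give every chip the same empirical distribution,
# namely the mean of the sorted intensities across chips.
# x: numeric matrix, rows = probes, columns = chips (assumed layout).
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  sorted <- apply(x, 2, sort)
  target <- rowMeans(sorted)   # mean of the k-th smallest value over chips
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

# Toy example: 4 probes on 3 chips.
x <- cbind(c(5, 2, 3, 4), c(4, 1, 4, 2), c(3, 4, 6, 8))
xn <- quantile_normalize(x)
print(xn)
```

After the transformation each column contains the same set of values, so only the ranking of probes within a chip carries information.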

Robust expression measures RMA: Irizarry et al. (2002)

AvDiff-like:

$$\mathrm{RMA} = \frac{1}{|A|} \sum_{j \in A} \log_2(PM_j - BG_j)$$

with A a set of "suitable" pairs.

Li-Wong-like: additive model

$$\log_2(PM_{ij} - BG) = a_i + b_j + \varepsilon_{ij}$$

Estimate RMA = $a_i$ for chip i using the robust method median polish (successively remove row and column medians, accumulate terms, until convergence). Works with d >= 2.
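Median polish is available in base R as `stats::medpolish`. The sketch below applies it to a simulated chips-by-probes matrix standing in for background-corrected log2 intensities; the data, the planted outlier, and the name `rma_like` are assumptions for illustration.

```r
set.seed(3)
a_true <- c(8, 9, 10, 11)          # hypothetical chip effects (log2 scale)
b_true <- c(-1, -0.5, 0, 0.5, 1)   # hypothetical probe effects
# Rows = chips i, columns = probes j.
y <- outer(a_true, b_true, "+") + matrix(rnorm(20, sd = 0.05), 4, 5)
y[2, 3] <- y[2, 3] + 4             # plant one outlying probe measurement

# Median polish: alternately sweep out row and column medians.
fit <- medpolish(y, trace.iter = FALSE)

# Chip-level expression estimate: overall level plus row (chip) effect.
rma_like <- fit$overall + fit$row
print(round(rma_like, 2))
print(round(fit$residuals, 2))
```

The planted outlier barely moves the chip estimates and instead shows up as a large residual, which is what makes the residuals useful for QC.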

$I_{PM} = I_{MM} + I_{specific}$ ?

[Figure: log(PM/MM), with 0 marked]

From: R. Irizarry et al., Biostatistics 2002

Physico-chemical modeling of the probe intensities: the riddle of the bright mismatches

Naef et al., Phys Rev E 68 (2003)

Physico-chemical modeling of the probe intensities: the riddle of the bright mismatches

Felix Naef et al., Phys Rev E 68 (2003)

purines: 2 rings. MM: 2 large molecules -> steric hindrance

pyrimidines: 1 ring. MM: 2 small molecules -> no problem

This explains the existence of two populations, but not their location

Physico-chemical modeling of the probe intensities: the riddle of the bright mismatches

Naef et al., Phys Rev E 68 (2003)

Fit a statistical model for the deviation of a probe's intensity from its probe set's median intensity:

$$\log \frac{PM}{\operatorname{med}(PM)} \sim s_1 + s_2 + \dots + s_{25}$$

$s_i$: factor representing nucleotide (A, C, G, T) at i-th position

Physico-chemical modeling of the probe intensities: the riddle of the bright mismatches

[Figure: w_i plotted against position i]

Physico-chemical modeling of the probe intensities: the riddle of the bright mismatches

o Changing one A into C in the middle of the probe: $e^{0.4} \approx 1.5$
o Left/right asymmetry
o Asymmetry A vs T: A-T bonds are not equivalent to T-A bonds! (similar for G vs C).
o Labels are at U and C

G-C* (PM) dimmer than C-C* (MM)

affyPLM package

Fitting linear models to probe set intensities across multiple arrays

y_pi ~ p + a_i + …

y_pi: intensity of probe p (e.g. 1…11) on array i
p: probe ID (factor)
a_i: array effect
…: further biological factors!

affyPLM package

affyPLM::fitPLM example: robust linear model for the Dilution data with effects for liver dilution level and scanner

library(affyPLM); library(affydata); data(Dilution)  # fitPLM and the example data
Pset <- fitPLM(Dilution, model = PM ~ -1 + probes + liver + scanner, normalize = FALSE, background = FALSE)

Result:
o For each probe: weight
o For each measurement (probe*chip): residual

Ben Bolstad’s PLM Image Hall of Fame

residuals

Ben Bolstad’s PLM Image Hall of Fame

residuals

Ben Bolstad’s PLM Image Hall of Fame

from Affymetrix’ HGU95a latin square spike-in data set

Clickable plots via client side imagemaps

1. Plate plots

2. Domain combination graph

3. prada

References

Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. YH Yang, S Dudoit, P Luu, DM Lin, V Peng, J Ngai and TP Speed. Nucl. Acids Res. 30(4):e15, 2002.

Variance Stabilization Applied to Microarray Data Calibration and to the Quantification of Differential Expression. W. Huber, A. v. Heydebreck, H. Sültmann, A. Poustka, M. Vingron. Bioinformatics, Vol. 18, Supplement 1, S96-S104, 2002.

A Variance-Stabilizing Transformation for Gene Expression Microarray Data. BP Durbin, JS Hardin, DM Hawkins, DM Rocke. Bioinformatics, Vol. 18, Suppl. 1, S105-110.

Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. RA Irizarry, B Hobbs, F Collin, YD Beazer-Barclay, KJ Antonellis, U Scherf, TP Speed (2002). Accepted for publication in Biostatistics. http://biosun01.biostat.jhsph.edu/~ririzarr/papers/index.html

W. Huber, A. v. Heydebreck, M. Vingron. Error models for microarray intensities (PDF file on the course CD).