DNA Microarray Bioinformatics - #27612

Normalization: Getting the numbers comparable

Post on 19-Dec-2015


The DNA Array Analysis Pipeline

• Question, Experimental Design
• Array design, Probe design (or Buy Chip/Array)
• Sample Preparation, Hybridization
• Image analysis
• Normalization
• Expression Index Calculation
• Comparable Gene Expression Data
• Statistical Analysis, Fit to Model (time series)
• Advanced Data Analysis: Clustering, PCA, Classification, Promoter Analysis, Meta analysis, Survival analysis, Regulatory Network

Expression intensities are not just target concentrations

• Sample contamination
• RNA quality
• Sample preparation
• Dye effect (Cy3/Cy5)
• Probe affinity
• Hybridization
• Unspecific signal (background)
• Saturation
• Spotting
• Other issues related to array manufacturing
• Image segmentation
• Array spatial effects

Two kinds of variation in the signal

Global variation (systematic):
• RNA quality
• Sample preparation
• Dye
• Hybridization
• Photodetection

Gene-specific variation (stochastic):
• Spotting (size and shape)
• Cross-hybridization
• Dye
• Biological variation
  – Effect
  – Noise

Sources of variation

Global variation (systematic):
• Similar effect on many measurements
• Corrections can be estimated from data
• Handled by normalization

Gene-specific variation (stochastic):
• Too random to be explicitly accounted for
• “Noise”
• Handled by statistical testing


Calibration = Normalization = Scaling
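In its simplest, linear form, normalization multiplies every intensity on an array by one constant so that the arrays share a common level. The sketch below illustrates that idea under assumptions that are not on the slide: the function name `scale_to_target` is hypothetical, the median is used as the measure of array level, and the mean of the array medians is used as the target.

```python
from statistics import median

def scale_to_target(arrays, target=None):
    """Scale each array by one constant so that all medians match a target.

    arrays: list of lists, one list of raw intensities per array.
    target: common median to scale to; defaults to the mean of the
            array medians (an assumed choice, not taken from the slides).
    """
    medians = [median(values) for values in arrays]
    if target is None:
        target = sum(medians) / len(medians)
    return [[value * target / m for value in values]
            for values, m in zip(arrays, medians)]

# Two arrays that differ only by a global factor of two.
arrays = [[100.0, 200.0, 400.0], [50.0, 100.0, 200.0]]
scaled = scale_to_target(arrays)
print(scaled)
```

A single factor per array can only remove a global, intensity-independent effect, which is why the nonlinear methods on the following slides exist.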


Nonlinear normalization

Lowess Normalization

One of the most commonly used normalization techniques is the LOcally WEighted Scatterplot Smoothing (LOWESS) algorithm.

[Figure: scatter plot with axes M and A]
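A minimal sketch of LOWESS normalization for a two-channel array follows. It assumes the usual definitions M = log2(R/G) and A = ½·log2(R·G), which the slide does not spell out, and it fits a locally weighted straight line with tricube weights at each point. The robustness iterations of the full LOWESS algorithm are omitted, and the function names are hypothetical.

```python
import math

def ma_values(red, green):
    """M = log2 ratio, A = mean log2 intensity of the two channels."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def lowess_fit(x, y, frac=2 / 3):
    """Locally weighted linear fit of y on x, evaluated at every x."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points per local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = max(dists[k - 1], 1e-12)         # distance to k-th nearest point
        # Tricube weights: 1 at x0, falling to 0 at distance h.
        w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        denom = sw * sxx - sx * sx
        if abs(denom) < 1e-12:
            fitted.append(sy / sw)           # degenerate window: weighted mean
        else:
            slope = (sw * sxy - sx * sy) / denom
            fitted.append((sy - slope * sx) / sw + slope * x0)
    return fitted

def lowess_normalize(red, green, frac=2 / 3):
    """Subtract the intensity-dependent trend from the log ratios."""
    m, a = ma_values(red, green)
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

green = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
red = [2.0 * g for g in green]               # constant dye bias of one log2 unit
print(lowess_normalize(red, green))
```

The sketch recomputes the window for every point, so it is quadratic in the number of spots. That is fine for illustration but too slow for a full array.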

The Qspline method

From the empirical distribution, a number of quantiles are calculated for each of the channels to be normalized (one channel shown in red) and for the reference distribution (shown in black).

A QQ-plot is made and a normalization curve is constructed by fitting a cubic spline function.

As reference one can use an artificial “median array” for a set of arrays, or a log-normal distribution, which is a good approximation.

Once again… qspline

When many microarrays are to be normalized to each other, an average array can be used as the target.

Accumulating quantiles
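The quantile-to-target idea can be sketched as follows. Note the substitution: where qspline fits a cubic spline through the QQ-plot, this sketch uses plain piecewise-linear interpolation to stay short and dependency-free. The target is taken as the average of the per-array quantiles. All function names and the number of quantiles are assumptions made for illustration.

```python
from bisect import bisect_right

def quantiles(values, probs):
    """Empirical quantiles, interpolating between order statistics."""
    s = sorted(values)
    out = []
    for p in probs:
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out

def interpolate(x, xs, ys):
    """Piecewise-linear curve through (xs, ys); end segments extrapolate."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    if xs[i + 1] == xs[i]:
        return ys[i]
    return ys[i] + (x - xs[i]) * (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])

def normalize_to_average(arrays, n_quantiles=5):
    """Map every array onto the average of the arrays' quantiles."""
    probs = [(i + 0.5) / n_quantiles for i in range(n_quantiles)]
    per_array = [quantiles(values, probs) for values in arrays]
    # Target distribution: quantile-wise average over all arrays.
    target = [sum(q) / len(q) for q in zip(*per_array)]
    return [[interpolate(v, q, target) for v in values]
            for values, q in zip(arrays, per_array)]

first = [float(i) for i in range(1, 11)]
second = [2.0 * v for v in first]            # same signal, doubled intensity
normalized = normalize_to_average([first, second])
print(normalized[0])
print(normalized[1])
```

With real data a smooth curve such as the cubic spline is preferable, because a piecewise-linear map has kinks at the chosen quantiles.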

Invariant set normalization (Li and Wong)

An invariant set of probes is used:
- Probes that do not change intensity rank between arrays
- A piecewise linear median line is calculated
- This curve is used for normalization
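The three steps of invariant set normalization can be sketched as below. This is a simplified, single-pass illustration and not the published Li and Wong procedure. The rank-shift threshold `max_shift`, the bin count, and all function names are assumptions: probes are kept when their rank changes by at most a fixed fraction of the probe count, and the "median line" is built from per-bin medians of the kept probes.

```python
from bisect import bisect_right
from statistics import median

def ranks(values):
    """Rank of each value (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def invariant_set(baseline, array, max_shift=0.1):
    """Indices of probes whose rank barely changes between the two arrays."""
    n = len(baseline)
    rb, ra = ranks(baseline), ranks(array)
    return [i for i in range(n) if abs(rb[i] - ra[i]) <= max_shift * n]

def median_curve(baseline, array, keep, n_bins=4):
    """Knots of the piecewise-linear median line (about n_bins of them)."""
    pts = sorted((array[i], baseline[i]) for i in keep)
    size = max(1, len(pts) // n_bins)
    bins = [pts[s:s + size] for s in range(0, len(pts), size)]
    return [(median(p[0] for p in b), median(p[1] for p in b)) for b in bins]

def apply_curve(x, knots):
    """Evaluate the piecewise-linear curve; end segments extrapolate."""
    xs = [k[0] for k in knots]
    i = min(max(bisect_right(xs, x) - 1, 0), len(knots) - 2)
    (x0, y0), (x1, y1) = knots[i], knots[i + 1]
    if x1 == x0:
        return y0
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

baseline = [10.0 * i for i in range(1, 13)]
array = [2.0 * v for v in baseline]
array[0] = 500.0                      # one probe that really changed
keep = invariant_set(baseline, array)
knots = median_curve(baseline, array, keep)
normalized = [apply_curve(v, knots) for v in array]
print(keep)
print(normalized)
```

In the example the changed probe drops out of the invariant set, so it does not distort the curve, and it is still normalized by the curve fitted to the other probes.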

Spatial normalization

[Figure panels: raw data; after intensity normalization; spatial bias estimate; after spatial normalization]


Expression index value

Some microarrays have multiple probes addressing the expression of the same target

– Affymetrix GeneChips have 11-20 probe pairs per gene
   - Perfect Match (PM)
   - MisMatch (MM)

PM: CGATCAATTGCACTATGTCATTTCT
MM: CGATCAATTGCAGTATGTCATTTCT
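The two sequences differ only at the middle (13th) base, which is replaced by its complement in the mismatch probe. The relationship can be sketched as follows. The function name `mismatch_probe` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def mismatch_probe(pm):
    """Return the MM probe: the PM sequence with its middle base complemented."""
    mid = len(pm) // 2                # index 12, the 13th base of a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

pm = "CGATCAATTGCACTATGTCATTTCT"
mm = mismatch_probe(pm)
print(mm)  # CGATCAATTGCAGTATGTCATTTCT
```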

Expression index calculation

Simplest method? Median

But more sophisticated methods exist: dChip, RMA and MAS 5

dChip (Li & Wong)

Model: $PM_{ij} = \theta_i \phi_j + \varepsilon_{ij}$

where $\theta_i$ is the expression index on array $i$, $\phi_j$ is the affinity of probe $j$, and $\varepsilon_{ij}$ is the error term.

Outlier removal:
– Identify extreme residuals
– Remove
– Re-fit
– Iterate

Distribution of errors $\varepsilon_{ij}$ assumed independent of signal strength

(Li and Wong, 2001)
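A multiplicative model of this form, with an array-specific expression index θ_i and a probe-specific affinity φ_j, can be fitted by alternating least squares: estimate the φ_j with the θ_i held fixed, then the θ_i with the φ_j held fixed, and repeat. The sketch below shows only that fitting loop. The outlier-removal iterations listed on the slide are left out, the constraint that the squared affinities sum to the number of probes is an assumed way to make the model identifiable, and the function name is hypothetical.

```python
def fit_multiplicative_model(pm, n_iter=50):
    """Fit pm[i][j] ~ theta[i] * phi[j] by alternating least squares.

    pm[i][j] is the PM intensity of probe j on array i.
    Returns (theta, phi) with sum(phi_j ** 2) equal to the number of probes.
    """
    n_arrays, n_probes = len(pm), len(pm[0])
    theta = [sum(row) / n_probes for row in pm]      # start from row means
    phi = [1.0] * n_probes
    for _ in range(n_iter):
        # Least-squares probe affinities given the current theta.
        ss_theta = sum(t * t for t in theta)
        phi = [sum(pm[i][j] * theta[i] for i in range(n_arrays)) / ss_theta
               for j in range(n_probes)]
        # Identifiability constraint (assumed): sum of phi^2 = number of probes.
        scale = (sum(p * p for p in phi) / n_probes) ** 0.5
        phi = [p / scale for p in phi]
        # Least-squares expression indices given the rescaled phi.
        ss_phi = sum(p * p for p in phi)
        theta = [sum(row[j] * phi[j] for j in range(n_probes)) / ss_phi
                 for row in pm]
    return theta, phi

# Noise-free example: three arrays, three probes.
true_theta = [100.0, 200.0, 400.0]
true_phi = [0.5, 1.0, 1.5]
pm = [[t * p for p in true_phi] for t in true_theta]
theta, phi = fit_multiplicative_model(pm)
print(theta)
print(phi)
```

In a fuller implementation, probes or arrays with extreme residuals pm[i][j] − theta[i]·phi[j] would be dropped and the fit repeated, as the outlier-removal steps describe.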

RMA

Robust Multi-array Average (RMA) expression measure (Irizarry et al., Biostatistics, 2003)

For each probe set, re-write $PM_{ij} = \theta_i \phi_j$ as:

$\log(PM_{ij}) = \log(\theta_i) + \log(\phi_j)$

Fit this additive model by iteratively re-weighted least-squares or median polish
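Median polish fits an additive model, log PM = array effect + probe effect, by alternately subtracting row medians and column medians from a table until the residuals settle. The sketch below covers only this summarization step for one probe set. Background correction and normalization, which RMA performs beforehand, are not shown. The use of log base 2, the fixed iteration count, and the function names are assumptions.

```python
import math
from statistics import median

def median_polish(table, n_iter=10):
    """Fit table[i][j] ~ overall + row[i] + col[j] with Tukey's median polish."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [row[:] for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep the row medians out of the residuals.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep the column medians out of the residuals.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff

def summarize_probe_set(pm):
    """pm[i][j]: PM intensity of probe j on array i; one log2 value per array."""
    log_pm = [[math.log2(v) for v in row] for row in pm]
    overall, array_eff, _probe_eff = median_polish(log_pm)
    return [overall + a for a in array_eff]

pm = [[50.0, 100.0, 200.0],
      [100.0, 200.0, 400.0],
      [200.0, 400.0, 800.0]]
print(summarize_probe_set(pm))
```

Because medians are used instead of means, a single aberrant probe has little influence on the per-array value.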

MAS 5

MicroArray Suite version 5 uses

$\text{signal} = \text{TukeyBiweight}\{\log(PM_j - MM_j^{*})\}$

MM* is an adjusted MM that is never bigger than PM

Tukey biweight is a robust average procedure with weights and outlier rejection
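A one-step Tukey biweight can be sketched as follows: values are weighted by their distance from the median, measured in units of the median absolute deviation, and values beyond a cutoff get zero weight. The tuning constants `c=5` and `eps=1e-4`, the use of log base 2, and the function names are assumptions made for illustration. The adjusted mismatch values MM* are taken as given, since the slide does not describe how they are computed.

```python
import math
from statistics import median

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that rejects outliers."""
    m = median(values)
    mad = median(abs(v - m) for v in values)   # median absolute deviation
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)          # scaled distance from the median
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def signal(pm, mm_star):
    """Robust average of log2(PM - MM*) over the probe pairs of one probe set.

    Requires PM > MM* for every pair, as the adjusted MM is meant to ensure.
    """
    logs = [math.log2(p - m) for p, m in zip(pm, mm_star)]
    return tukey_biweight(logs)

print(tukey_biweight([10.0, 10.0, 10.0, 10.0, 1000.0]))   # outlier gets zero weight
print(signal([12.0, 12.0, 12.0], [4.0, 4.0, 4.0]))
```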

Methods compared on expression variance

[Figure: standard deviation of gene measures from 20 replicate arrays, plotted against expression level. RMA: blue and red; MAS5: green; dChip: black]

From Terry Speed

Robustness: MAS 5.0

(Irizarry et al., Biostatistics, 2003)

[Figure: MAS 5.0. Axes: log fold change estimate from 1.25 µg cRNA and log fold change estimate from 20 µg cRNA]

Robustness: dChip

(Irizarry et al., Biostatistics, 2003)

[Figure: dChip. Axes: log fold change estimate from 1.25 µg cRNA and log fold change estimate from 20 µg cRNA]

Robustness: RMA

(Irizarry et al., Biostatistics, 2003)

[Figure: RMA. Axes: log fold change estimate from 1.25 µg cRNA and log fold change estimate from 20 µg cRNA]

All of this is implemented in… R

In the Bioconductor package ‘affy’ (Gautier et al., 2003).

References

Li and Wong (2001). Model-based analysis of oligonucleotide arrays: Model validation, design issues and standard error application. Genome Biology 2:1–11.

Irizarry, Bolstad, Collin, Cope, Hobbs and Speed (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31(4):e15.

Affymetrix (2001). Affymetrix Microarray Suite User Guide, version 5 edition. Affymetrix, Santa Clara, CA.

Gautier, Cope, Bolstad and Irizarry (2003). affy - an R package for the analysis of Affymetrix GeneChip data at the probe level. Bioinformatics