
January 2008 1

MSCL Analyst’s Toolbox, Part 2

Instructors:

Jennifer Barb, Zoila G. Rangel, Peter Munson

March 2007

Mathematical and Statistical Computing Laboratory

Division of Computational Biosciences


Statistical topics

• Quality Control Charts

• False Discovery Rate

• Principal Components Analysis explained

• PCA Heatmap

• Data normalization, transformation

• Affymetrix probesets and “Probe-level” analysis

• MAS5, RMA, S10 compared


Gene Expression Microarrays

• Started in mid-1990s, exponential growth in popularity

• High-throughput -- measures 10,000s of genes at once

• Very noisy -- systematic and random errors

– Chip manufacturing, printing artifacts

– RNA sample quality issues

– Sample preparation, amplification, labeling reaction problems

– Hybridization reaction variability

– Linearity of response, saturation, background

• Affymetrix has controlled chip quality well.

• REPLICATION IS STILL REQUIRED!

• Statistical methods are critical in analysis!

• Quality Control is Essential!


Quality Control Plots for Parameters RawQ, ScaleFactor


Quality Control Plots for Parameters RawQ, ScaleFactor

[Figure annotations: New Scanner Installed; Scanner “burn-in”?]


Quality Control

[Figure 2, Figure 3]

Levey-Jennings (QC) plots for parameters RawQ, reflecting image background noise, and SF, or scale factor, for over 700 Affymetrix U133A and U133 Plus 2 arrays processed in the NIH/CC and NHLBI microarray laboratories. Average and Upper Control Limit values are set based on historical data extending over 5 years. These and other parameters are tracked regularly and used as the basis for accepting new array data.
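As an illustration of how such control limits might be derived, the sketch below computes an average and an Upper Control Limit from historical values and flags new arrays that exceed the limit. The mean-plus-3-SD rule, the function names, and the RawQ numbers are assumptions made for this example; the slides do not state how the laboratories set their limits.

```python
import numpy as np

def control_limits(historical, n_sd=3.0):
    """Return (average, upper control limit) from historical QC values.

    Assumption: UCL = mean + n_sd * sample standard deviation.
    """
    h = np.asarray(historical, dtype=float)
    average = h.mean()
    ucl = average + n_sd * h.std(ddof=1)
    return average, ucl

def out_of_control(new_values, ucl):
    """Indices of new arrays whose QC value exceeds the upper control limit."""
    return [i for i, v in enumerate(new_values) if v > ucl]

# Hypothetical RawQ values, for illustration only.
historical_rawq = [2.0, 2.2, 1.8, 2.1, 1.9]
average, ucl = control_limits(historical_rawq)
flagged = out_of_control([2.1, 2.6, 1.7], ucl)
print(average, ucl, flagged)
```

Here only the second new array lies above the limit, so it alone would be held for review.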


Experimental Designs for Gene Expression

• Cross-sectional clinical studies from 2 or more patient groups or tissues; identify markers, prognostic indicators.

• Animal model: samples compared between treatments, groups, or over time; identify genes involved in disease process.

• Intervention Trial: collect blood samples pre/post treatment or over time, identify (and rationalize) genes involved.

• Cell culture: Treat cells in culture, identify genes and patterns of response. Complex study designs possible.

• Genetic Knock-out: Perturb genotype, give treatment, investigate expression response, in animal or cells.


Gene Expression Analysis Strategies

• Clinical Studies:

– Exploratory analysis, Hierarchical Cluster, Heat maps

– Sample size often insufficient

– Two-sample tests, Discriminant Analysis, “machine learning” approaches to find prognostic factors

• Designed studies: Analysis plan should follow design

– T-tests, one-way ANOVA to select significantly changing genes

– Blocking to account for experimental batch

– Two-way ANOVA for complete two-factor experiments

– Regression (etc.) for time-course experiments

• Corrections for multiple-comparison (20,000 genes tested)

– False Discovery Rate

• Interpretation of gene lists (open-ended problem!)


P-values should be uniformly distributed (when no genes are differentially expressed)

• Note excess of small p-values in 45,000 probe sets

• Indicates presence of significant, differentially expressed genes

[Figure: histogram of p-values with a cut at p<.05; the region below the cut is labeled “False discoveries” and “True discoveries”]


False Discovery Rate calculation

(simplified version)

$$\mathrm{FDR}^{*} = \frac{\text{Expected Number of False Discoveries}}{\text{Number Discovered}} = \frac{(\text{Number of tests}) \times (\text{p-value cutoff})}{\text{Number Discovered at this p-value}}$$

Example: 48 genes detected at p<.001 in chip with 12,000 genes.

$$\mathrm{FDR} = \frac{12{,}000 \times .001}{48} = \frac{12}{48} = 25\%$$

*Benjamini, Y., Hochberg, Y. (1995) JRSS-B, 57, 289-300.
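The simplified calculation is a single division, and a minimal sketch of it follows. The function name `simple_fdr` is made up for this example; the numbers are the ones from the slide.

```python
def simple_fdr(number_of_tests, p_cutoff, number_discovered):
    """Simplified FDR: expected false discoveries / number discovered."""
    expected_false = number_of_tests * p_cutoff
    return expected_false / number_discovered

# Slide example: 48 genes detected at p < .001 on a chip with 12,000 genes.
fdr_example = simple_fdr(12_000, 0.001, 48)
print(f"{fdr_example:.0%}")
```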


False Discovery Rate calculation

(full version)

$$\mathrm{FDR}(p) = \min_{p^{*} \ge p} \frac{\text{NumberofTests} \times p^{*}}{\text{NumberDetected}(p^{*})}$$

Now we have a guarantee that

$$\mathrm{FDR}(p) \le 1$$
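One way to compute the full version for every gene at once is sketched below, assuming the minimum is taken over all cutoffs p* at or above p (the usual Benjamini-Hochberg step-up form). The function name `fdr_full` and the example p-values are illustrative, not taken from the slides.

```python
import numpy as np

def fdr_full(pvalues):
    """FDR(p) = min over p* >= p of NumberofTests * p* / NumberDetected(p*)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    sorted_p = p[order]
    # With sorted p-values, the number detected at cutoff p* is its rank.
    ratio = m * sorted_p / np.arange(1, m + 1)
    # Running minimum from the largest p-value downward implements the min.
    adjusted_sorted = np.minimum.accumulate(ratio[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

p_example = [0.01, 0.04, 0.03, 0.5]
fdr = fdr_full(p_example)
print(fdr)
```

Because the largest p-value contributes the ratio m * p_max / m = p_max, the running minimum can never exceed 1, which is the guarantee stated on the slide.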


Gene Expression Data Matrix, X

(transpose of “Final File” format)

[Diagram: Expression Matrix, X, with Samples (1 to n) as rows and Genes (1 to 12,625) as columns, flanked by “Information about each Sample” and “Annotations for each Gene”]


Analyzing the Data Matrix

•"pre-condition" the Expression Data Matrix

•Select "significant" Genes (False Discovery Rate)

•Select relevant Samples (Outlier rejection, QC)

•Re-order, partition the Genes ("clustering")

•Re-order the Samples

•Visualize the matrix ("heat-map", PCA scatterplot), encode Gene and Sample annotations

•Visualize by Sample (rows of X, scatterplots, line plots)

•Visualize by Gene (cols of X)

•Visualize the Annotations (how?)

•Browse the display for new hypotheses!


Principal Component Analysis

Each Principal Component is an orthogonal, linear combination of the expression levels. For the ith gene chip:

$$PC_1(i) = a_{1,1}x_1(i) + a_{1,2}x_2(i) + \dots + a_{1,12625}x_{12625}(i)$$

$$PC_2(i) = a_{2,1}x_1(i) + a_{2,2}x_2(i) + \dots + a_{2,12625}x_{12625}(i)$$

In matrix notation:

$$PC = X \cdot A$$

where PC is the Principal Components Matrix, X is the Expression Data Matrix, and A is the Patterns Matrix.


Data can be Reconstructed from PCs!

$$PC = X \cdot A$$

A was chosen so that $A A^{T}$ is the Identity matrix:

$$PC \cdot A^{T} = X \cdot (A \cdot A^{T}) = X$$

Or

$$X = PC \cdot A^{T}$$
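The reconstruction identity can be checked numerically. The sketch below is one possible way to do it, assuming the patterns matrix A is taken from a singular value decomposition of an uncentered, randomly generated matrix; the data and variable names are illustrative. With the reduced SVD used here, A has one column per chip, so A^T A is exactly the identity and A A^T acts as the identity on the rows of X.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 50))   # hypothetical data: 6 chips x 50 genes

# Reduced SVD: X = U @ diag(s) @ Vt, rows of Vt are orthonormal patterns.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
A = Vt.T                       # patterns matrix, genes x components

PC = X @ A                     # principal components, chips x components
X_rebuilt = PC @ A.T           # X = PC . A^T

print(np.allclose(X, X_rebuilt))
```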


Data Matrix (X) equals Principal Components (PC) times Expression Patterns (EP = A^T)

[Diagram: X (Experiments 1 to n by Genes 1 to 12,625) = PC (Experiments 1 to n by Components 1 to nComponents) * EP (Components 1 to n by Genes 1 to 12,625)]

Plot PC(i,1) vs PC(i,2) for each experiment

• EP row 1 contains most important “expression pattern”

• PC col 1 defines how that pattern is manifest in each experiment

• Similarly for EP row 2, PC col 2, etc.

• Only a few patterns needed to reconstruct data matrix X
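To illustrate why only a few patterns may be needed, the following sketch builds a synthetic matrix from two hidden patterns plus a little noise and rebuilds it from the first two components. The data, the noise level, and the choice of k = 2 are assumptions for this example, and the matrix is not centered.

```python
import numpy as np

rng = np.random.default_rng(1)
n_chips, n_genes = 8, 40

# Hypothetical data: two underlying expression patterns plus small noise.
patterns = rng.normal(size=(2, n_genes))
weights = rng.normal(size=(n_chips, 2))
X = weights @ patterns + 0.01 * rng.normal(size=(n_chips, n_genes))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
PC = U * s          # chips x components (same as X @ Vt.T)
EP = Vt             # components x genes

k = 2               # keep only the first k patterns
X_approx = PC[:, :k] @ EP[:k, :]

relative_error = np.linalg.norm(X - X_approx) / np.linalg.norm(X)
explained = s**2 / np.sum(s**2)   # share of total variation per component
print(relative_error, explained[:k].sum())
```

The columns PC[:, 0] and PC[:, 1] are the coordinates one would plot against each other, one point per experiment.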


Principal Components Analysis

[Figure: two PCA scatterplots with axes PC 1 (38%) and PC 2 (12%), labeled Pattern and Pattern 2; one point per chip, labeled 4C, 4Dex, 4IF, 4IFDex through 8C, 8Dex, 8IF, 8IFDex; the first panel is colored by Probe Array Lot]


GLOBAL DATABASE (HG U95A) PCA BI-PLOT

[Figure: scatterplot of Pattern 2 (vertical axis) against Pattern (horizontal axis)]

Each spot is one chip, N=469


GLOBAL DATABASE PCA BI-PLOT (PC2 vs PC3)

[Figure: scatterplot of Pattern 3 (vertical axis) against Pattern 2 (horizontal axis)]


PCA HEATMAP

Data (X) equals Components (PC) times Expression Patterns (EP)

[Diagram: X (Experiments 1 to n by Genes 1 to 12,625) = PC (Experiments 1 to n by Components 1 to nComponents) * EP (Components 1 to n by Genes 1 to 12,625)]

Visualize coefficients of a first few “Patterns”, Re-order Experiments


U95A Database PCA Heatmap colored by Sample Type (12)

Conclusion: Sample Type and Project determine clusters


PCA Heatmap of Entire Database

469 Chips, 468 Components

5,933,750 values!


Data Normalization and Transformation


Chip-to-chip normalization, Data transformation

• Signal intensity varies chip-to-chip for a variety of technical reasons.

– Scale adjustments can be made in a variety of ways.

– Median adjustment (divide by col median) is commonly used

– Other quantiles (e.g. 75th percentile) may work better

• Log-transform

– spreads data more evenly

– makes variance more uniform

• “Lmed” is median normalized, log transform
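A minimal sketch of the Lmed transform follows, assuming one column per chip, strictly positive signal values, and a base-10 logarithm. The function name `lmed` and the toy intensities are illustrative.

```python
import numpy as np

def lmed(signal):
    """Median-normalized log transform: log10(x / column median).

    signal: genes x chips array of positive intensities (assumed layout).
    """
    signal = np.asarray(signal, dtype=float)
    col_median = np.median(signal, axis=0)   # one median per chip
    return np.log10(signal / col_median)

# Toy example: the second chip is uniformly 10x brighter than the first.
toy_signal = [[1.0, 10.0],
              [10.0, 100.0],
              [100.0, 1000.0]]
toy_lmed = lmed(toy_signal)
print(toy_lmed)
```

After the transform both chips carry identical values, so the overall brightness difference between them has been removed.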


Chip-to-chip normalization, Data transformation (2)

• Quantile normalization (“ranking” the data): every percentile becomes identical across chips

• Quantile normalization may remove technical artifacts (e.g. curvature)

• Variance should be homogeneous across measurement scale

• Variance may be “homogenized” with appropriate transform (e.g. logarithm, square-root, arcsinh)

• “S10” transform -- optimal variance stabilizing, quantile normalizing transform, calibrated to match Log10 over central part of measurement scale
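Quantile normalization can be sketched in a few lines. The version below is one common variant, assuming one column per chip and using the mean of the sorted columns as the shared reference distribution; ties are broken arbitrarily, and the data are made up. It is not the S10 transform, whose details are not given in these slides.

```python
import numpy as np

def quantile_normalize(signal):
    """Give every chip (column) the same distribution of values.

    Assumption: the reference distribution is the mean of the sorted columns.
    """
    signal = np.asarray(signal, dtype=float)
    # Rank of each value within its own column (0 = smallest).
    ranks = np.argsort(np.argsort(signal, axis=0), axis=0)
    # Reference value for each rank: average across chips of the sorted data.
    reference = np.sort(signal, axis=0).mean(axis=1)
    return reference[ranks]

chips = np.array([[5.0, 4.0, 3.0],
                  [2.0, 1.0, 4.0],
                  [3.0, 6.0, 6.0],
                  [4.0, 2.0, 8.0]])
normalized = quantile_normalize(chips)
print(normalized)
```

Each column of the result contains exactly the same set of values, while the ordering of genes within each chip is preserved.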


Data Transformation and Normalization


Log(x/median x) transform (“Lmed”)


Comparison of two chips - MAS5 signal


Comparison of two chips - Log10(Signal)


Comparison of two chips - Lmed(SG)

Note deviation from line of identity


Comparison of two chips - 2 x limits

•Note deviation from line of identity

•Note nonuniform variance


Median-normalized Log-transform “Lmed”

• Adequate in most cases

BUT….

• Some nonlinearity may remain, requiring further normalization

• Variance is not truly constant, expands at low intensities

• Cannot treat zero or negative values

• Logarithm may not be best transformation

• Median normalization may not always be adequate


Variance Stabilizing Transform (3)

Symmetric Adaptive Transform (S10):

• We start with quantile normalization to convenient distribution

• We further transform to make variance constant with mean

• We adapt transform to empirical variance model (requires an experiment with at least 5 to 10 chips)

• We scale transform to match log10 units midrange

• We require symmetry around origin


Comparison of two chips - Lmed(Signal)

Model the nonlinear relationship

Red line is plot of quantile of chip 1 vs quantile of chip 2


Comparison of two chips - Quantile normalization

• Second chip is quantile-normalized to first chip

• Curvature is cured!

• Now, can we remove the variable spread?

• Nonuniform variance?


Comparison of two chips - Symmetric Adaptive Transform, base 10 “S10”

• Uses Quantile normalization

• Gives better fit to line of identity

• Adapts scale to give homogeneous variance

• Uniform scatter about line

• Calibrated to match Log10 in middle of scale

*Munson, P.J. A consistency test for determining the significance of gene expression changes on replicate samples and two convenient variance-stabilizing transformations. in GeneLogic Workshop of Low Level Analysis of Affymetrix GeneChip Data. 2001. Bethesda, MD.


Symmetric Adaptive Transform (“S10”)


Symmetric Adaptive Transform (“S10”)

Lmed

S10


PCA on Lmed transformed data

• 12 Chips

• 3 Groups

• Two apparent outliers

• Groups not well separated

• 1st PC explains 15.3% of variation


PCA on S10 transformed data

• Outliers no longer obvious

• Groups well-separated

• 1st PC explains 30.8% of variation


Fold Change due to Drug - Log10 scale

Log Fold Change-Drug vs. Control - Repl. 1

LFC - Repl. 2


Fold Change due to Drug - S10 scale

SFC-Drug vs. Control - Repl. 1

SFC - Repl. 2


Variance Stabilizing Transforms (1)


Variance Stabilizing Transforms (2)


Log of “Signal”, Variance Model

[Figure panels: Lmed Transform Value vs. Signal Value; Std Dev Lmed vs. Mean Lmed Value]


S10(“Signal”), Variance Model

[Figure panels: S10 Transform Value vs. Lmed Transform Value; Std Dev S10 vs. Mean S10 Value]


“Probe Level” analysis

Comparison of Signal, RMA, S10


Affymetrix Technology


Affymetrix uses multiple probes per gene


Data Summarizing Algorithms

To go from 11 probe pairs to a single number:

• Affymetrix MAS 4.0 (Average difference)

• Affymetrix MAS 5.0 (Signal)

• dChip (Li and Wong, 2001)

• RMA (Irizarry, 2003)

• PLIER (Hubbell, 2004, Affymetrix)

• Transformations of above statistics (Log, Glog, S10, etc.)


Which Algorithm is Best?

Latin Square Data Answers Question

• Spike-in (or Latin Square) study on Affy U133A chip

• 13 concentrations plus “control” spiked into complex HeLa background

• 42 oligos, 0, 0.125 - 512 pM

• Concentration doubles at each step

• Three chips run for each concentration

www.affymetrix.com “Latin Square Data for Expression Algorithm Assessment”

[Figure: Mean Intensity for Probeset vs. Concentration Number]


Detect 2x changes for each Spike-in Using Volcano Plot

Move selector box to detect more Red, fewer Blue points

RED - spike-in genes

BLUE - background


ROC curve for Lmed of Signal

TP=Red points inside detection box

FP=Blue points inside detection box

Number of False Positives

Number of True Positives

Lmed(Signal)
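Each point on such a curve comes from counting spike-in (red) and background (blue) genes inside one placement of the detection box. A minimal sketch of that count is shown below, assuming the box selects genes whose absolute log10 fold change is at least some threshold and whose p-value is at most some cutoff. The function name, the thresholds, and the five example genes are all made up for illustration.

```python
import numpy as np

def detection_counts(log_fold_change, p_value, is_spike_in, min_abs_lfc, max_p):
    """Count true and false positives inside a volcano-plot detection box.

    Assumed box: |log fold change| >= min_abs_lfc and p-value <= max_p.
    """
    lfc = np.asarray(log_fold_change, dtype=float)
    p = np.asarray(p_value, dtype=float)
    spike = np.asarray(is_spike_in, dtype=bool)
    inside = (np.abs(lfc) >= min_abs_lfc) & (p <= max_p)
    true_positives = int(np.sum(inside & spike))     # red points in the box
    false_positives = int(np.sum(inside & ~spike))   # blue points in the box
    return true_positives, false_positives

# Hypothetical genes: the first two are spike-ins, the rest background.
lfc = [0.40, 0.35, 0.05, 0.50, 0.01]
pvals = [0.001, 0.2, 0.0005, 0.002, 0.6]
spike_in = [True, True, False, False, False]

tp, fp = detection_counts(lfc, pvals, spike_in,
                          min_abs_lfc=np.log10(2), max_p=0.01)
print(tp, fp)
```

Repeating the count while moving the box (varying `min_abs_lfc` and `max_p`) traces out the number of true positives against the number of false positives.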


ROC for S10(Signal)

Number of False Positives

Number of True Positives

S10(Signal)

Lmed(Signal)

RMA


Lmed(Signal) Details


S10(Signal) Details


RMA Details


Comparison of Algorithms

• RMA

– gives overall best ROC curve

– requires probes on multiple chips be summarized together

– Implemented in Affy EC, R, Bioconductor or ArrayAssistLite

• Signal (MAS5)

– is convenient, available in Affy GCOS software

– summarizes each chip separately

– has expanded variance near baseline

– Lmed(MAS5) gives worst ROC curve

• S10 transform

– cures variance problem for Signal,

– improves detection efficiency (ROC curve),

– is simple to compute!