1-18-2005

Outline

• Randomization issues

• Two-sample t-test vs paired t-test

• I made a mistake in creating the dataset, so previous analyses will not be comparable with new ones

• Issues of the background subtraction

• Limma as the general tool for analyzing microarray data

limma

... is a package for the analysis of microarray data, especially the use of linear models for analyzing designed experiments and the assessment of differential expression.

• Specially constructed data objects to represent various aspects of microarray data

• Specially constructed "object methods" for importing, normalizing, displaying and analyzing microarray data

• All objects and methods are transparent

• All objects can be accessed and modified outside of limma

• Unique in the implementation of the empirical Bayes procedure for identifying differentially expressed genes by "borrowing" information from different genes (everything so far has been gene by gene)

Measurement Error Model With Additive Background

[Figure: scanning the “Green Channel” ($X_G$) and the “Red Channel” ($X_R$); the two channels are compared through $\log(X_R / X_G) = \log(X_R) - \log(X_G)$]

• There are other models that account for the background signal

• Simple subtraction of the background intensities often introduces additional variability into the observed signal

• The problem is that we use a single-observation estimate of B

• With this in mind, various strategies have been proposed that pool background information from more than one spot to estimate B

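Foreground (F), Background (B)

Old Model

$$F = e^{\mu + \epsilon}, \quad \text{where } \epsilon \sim N(0, \sigma^2)$$

$$x = \log(F) = \mu + \epsilon, \quad \text{where } \epsilon \sim N(0, \sigma^2), \quad x \sim N(\mu, \sigma^2)$$

$$x_W \sim N(\mu_W, \sigma_W^2) \text{ and } x_C \sim N(\mu_C, \sigma_C^2), \quad x_r \sim N(\mu_r = \mu_W - \mu_C, \sigma_r^2)$$

New Model

$$F = B + e^{\mu + \epsilon}, \quad \text{where } \epsilon \sim N(0, \sigma_X^2), \quad B \sim N(\mu_B, \sigma_B^2)$$

$$x = \log(F - B) = \mu + \epsilon, \quad \text{where } \epsilon \sim N(0, \sigma_X^2), \quad x \sim N(\mu, \sigma_X^2)$$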
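A minimal simulation sketch of why a single-observation estimate of B adds variability. All numbers are made up for illustration (they are not taken from the arrays), and the pooled background is simply the mean over all simulated spots, which is only one of many possible pooling strategies.

```r
# Simulated foreground under the additive-background model F = B + signal * exp(error).
# Every constant below is a hypothetical value chosen for illustration.
set.seed(1)
n <- 10000
true_signal <- 100                    # hypothetical true spot signal
true_bg <- 80                         # hypothetical true background level
F_obs <- true_bg + true_signal * exp(rnorm(n, 0, 0.1))

# One noisy background reading per spot versus a background pooled over all spots
B_single <- rnorm(n, true_bg, 10)
B_pooled <- rep(mean(B_single), n)

# Spread of the background-corrected log signal under each estimate of B
sd_single <- sd(log2(F_obs - B_single))
sd_pooled <- sd(log2(F_obs - B_pooled))
print(c(single = sd_single, pooled = sd_pooled))
```

In this toy setting the spot-by-spot subtraction gives the larger standard deviation, because the noise in each single background reading is carried into the corrected signal.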

limma

Data to import: http://eh3.uc.edu/data/51-C1-3-vs-W1-5.gpr

File descriptions: http://eh3.uc.edu/data/WTargets.txt

Spot descriptions: http://eh3.uc.edu/data/WSpotTypes.txt

Importing data: source("http://eh3.uc.edu/LimmaDataImport.R")

limma

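library(limma)

# Location of the GenePix (.gpr) files
data.directory<-"http://eh3.uc.edu/data/"

# Array descriptions (one row per array) and spot-type definitions
targets<-readTargets("http://eh3.uc.edu/data/WTargets.txt")
spottypes<-readSpotTypes("http://eh3.uc.edu/data/WSpotTypes.txt")

# Read median foreground and background intensities for both channels;
# wtflags(0) gives weight 0 to spots with a negative GenePix flag
LimmadataC<-read.maimages(files=targets$FileName,source="genepix",
    path=data.directory,columns=list(Gf="F532 Median",
    Gb="B532 Median",Rf="F635 Median",Rb="B635 Median"),
    annotation=c("Name","ID","Block","Row","Column"),wt.fun=wtflags(0))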

RGList class

> attributes(LimmadataC)

$names

[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes"

$class

[1] "RGList"

attr(,"package")

[1] "limma"

RGList class

> LimmadataC$genes[1,]
     Name         ID Block Row Column
1 no name Rn30000100     1   1      1

> LimmadataC$R[1:3,]
     51-C1-3-vs-W1-5 60-W2-3-vs-C2-5 72-C3-3-vs-W3-5 79-W4-3-vs-C4-5 82-C5-3-vs-W5-5 97-W6-3-vs-C6-5
[1,]              85              57              91              71              67             111
[2,]             358            1102            2394            1685             575             882
[3,]             168             376             620             670             206             293

> LimmadataC$Rb[1:3,]
     51-C1-3-vs-W1-5 60-W2-3-vs-C2-5 72-C3-3-vs-W3-5 79-W4-3-vs-C4-5 82-C5-3-vs-W5-5 97-W6-3-vs-C6-5
[1,]              81              55              65              51              52              72
[2,]              81              56              65              51              51              72
[3,]              82              57              64              50              48              69

RGList class

LimmadataC$genes$Status<-controlStatus(spottypes,LimmadataC)

LimmadataC$weights[LimmadataC$genes$ID=="Blank"]<-0

LimmadataC$printer<-getLayout(LimmadataC$genes)

> attributes(LimmadataC)

$names

[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes" "printer"

$class

[1] "RGList"

attr(,"package")

[1] "limma"

> LimmadataC$genes[1,]

Name ID Block Row Column Status

1 no name Rn30000100 1 1 1 cDNA

Plotting data in an RGList object

> plotMA(LimmadataC,array=1,xlim=c(-1,16),ylim=c(-3,8))

[Figure: MA plot (M vs. A) for array 51-C1-3-vs-W1-5; legend: cDNA, Blank, Control, Empty]

limma

• plotMA automatically subtracts the background intensities before plotting the data

• It does not plot data with weight 0

• To plot all the data, or the data without background subtraction, you need to do a little work

• source("http://eh3.uc.edu/BackgroundEffects.R")
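A small base-R sketch of what is being plotted, with and without background subtraction. The R and Rb values are the first three spots of array 51-C1-3-vs-W1-5 shown earlier; the G and Gb values are hypothetical, and `ma_values` is a helper name made up for this example, not a limma function.

```r
# Red channel: first three spots of array 51-C1-3-vs-W1-5 (from the RGList output above)
R  <- c(85, 358, 168)
Rb <- c(81, 81, 82)
# Green channel: hypothetical values, made up for illustration
G  <- c(90, 700, 150)
Gb <- c(78, 79, 80)

# M is the log-ratio and A is the average log-intensity of the two channels
ma_values <- function(red, green) {
  data.frame(M = log2(red) - log2(green),
             A = (log2(red) + log2(green)) / 2)
}

raw <- ma_values(R, G)            # no background subtraction
sub <- ma_values(R - Rb, G - Gb)  # simple background subtraction
print(raw)
print(sub)
```

Plotting `M` against `A` from either data frame with `plot()` gives the corresponding MA scatter.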

limma

> NBLimmadataC<-backgroundCorrect(LimmadataC,method="none")

> attributes(NBLimmadataC)

$names

[1] "R" "G" "weights" "targets" "genes" "printer"

$class

[1] "RGList"

attr(,"package")

[1] "limma"

• Note that background measurements are gone

Scatter with and without background subtraction

[Figure: two MA plots (M vs. A) for array 51-C1-3-vs-W1-5, with and without background subtraction; legend: cDNA, Blank, Control, Empty]

• Background-subtracted data are more spread out

• More data points are plotted without background subtraction

Plotting all data points

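• Want to plot data points with weight 0 as well

• Create datasets with and without subtracting background and set all weights to 1

# Dimensions of the intensity matrices: spots in rows, arrays in columns
SpotsPerArray<-dim(LimmadataC$R)[1]
Narrays<-dim(LimmadataC$R)[2]

# Copy with background measurements kept, all weights set to 1
Limmadata<-LimmadataC
Limmadata$weights[1:SpotsPerArray,1:Narrays]<-1

# Copy without background subtraction, all weights set to 1
NBLimmadata<-NBLimmadataC
NBLimmadata$weights[1:SpotsPerArray,1:Narrays]<-1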

Plotting all data points

[Figure: 2 x 2 grid of MA plots (M vs. A) for array 51-C1-3-vs-W1-5. Rows: background subtracted, raw. Columns: all data, zero-weight data removed. Legend: cDNA, Blank, Control, Empty]

Which one to use?

• Removing points with weight zero seems reasonable

• Subtracting background costs us some data points: a spot is lost even if one channel is above background, because only differences of log-transformed measurements are used and the log of a non-positive value is undefined

• Subtracting background seems to increase the variability, but it is unclear how this would affect the results

• For now, proceed without background subtraction, but compare results at the end

• Exploring other proposed background-adjustment methods also seems like a good idea
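A short sketch of how background subtraction loses spots. The first three R and Rb values come from the RGList output shown earlier; the fourth spot and all G and Gb values are hypothetical and chosen so that one channel falls below its background.

```r
# Four spots; the last one has red foreground below red background (hypothetical)
R  <- c(85, 358, 168, 60)
Rb <- c(81,  81,  82, 70)
G  <- c(90, 700, 150, 300)   # hypothetical green foreground
Gb <- c(78,  79,  80,  75)   # hypothetical green background

# Log-ratios without and with simple background subtraction;
# log2 of a negative number is NaN, so the warning is silenced on purpose
M_raw <- log2(R) - log2(G)
M_sub <- suppressWarnings(log2(R - Rb) - log2(G - Gb))

n_raw <- sum(is.finite(M_raw))
n_sub <- sum(is.finite(M_sub))
print(c(usable_raw = n_raw, usable_subtracted = n_sub))
```

The fourth spot is dropped after subtraction even though its green channel is well above background.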

Data Analysis

Loess normalization

source("http://eh3.uc.edu/LimmaLoess.R")

> NNBLimmadataC<-normalizeWithinArrays(NBLimmadataC, method="loess")

> attributes(NNBLimmadataC)

$names

[1] "weights" "targets" "genes" "printer" "M" "A"

$class

[1] "MAList"

attr(,"package")

[1] "limma"
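A rough base-R sketch of the idea behind within-array loess normalization: fit a smooth curve of M on A and subtract it. The data are simulated with a made-up intensity-dependent dye bias, and this uses the plain `loess` function from the stats package, so it is only an approximation of what `normalizeWithinArrays` does (which also uses spot weights and a robust fit).

```r
# Simulated MA data with a hypothetical intensity-dependent bias in M
set.seed(2)
A <- runif(500, 6, 16)
M <- 0.8 - 0.05 * A + rnorm(500, 0, 0.2)

# Fit the trend of M as a smooth function of A and remove it
fit <- loess(M ~ A, span = 0.3, degree = 1)
M_norm <- M - fitted(fit)

# After normalization M should be centered near 0 with no trend in A
print(c(mean_before = mean(M), mean_after = mean(M_norm)))
print(c(cor_before = cor(M, A), cor_after = cor(M_norm, A)))
```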

Loess-normalized data

[Figure: MA plots (M vs. A) of the loess-normalized data for the six arrays 51-C1-3-vs-W1-5, 60-W2-3-vs-C2-5, 72-C3-3-vs-W3-5, 79-W4-3-vs-C4-5, 82-C5-3-vs-W5-5 and 97-W6-3-vs-C6-5; legend: cDNA, Blank, Control, Empty]

Paired t-test using limma

source("http://eh3.uc.edu/LimmaTTest.R")

> design<-modelMatrix(targets, ref="C")
Found unique target names:
 C W
> design
                 W
51-C1-3-vs-W1-5  1
60-W2-3-vs-C2-5 -1
72-C3-3-vs-W3-5  1
79-W4-3-vs-C4-5 -1
82-C5-3-vs-W5-5  1
97-W6-3-vs-C6-5 -1

Paired t-test using limma

> LimmaPTT<-lmFit(NNBLimmadataC,design)
> attributes(LimmaPTT)
$names
 [1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"
 [5] "cov.coefficients" "pivot"            "method"           "design"
 [9] "genes"            "Amean"

$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"

Paired t-test using limma

> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])
[1] -0.03068036
> var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])^0.5
[1] 0.3176068
> 1/(6^0.5)
[1] 0.4082483
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])/((var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])^0.5)*(1/(6^0.5)))
[1] -0.2366172
>
> LimmaPTT$coefficients[2]
[1] -0.03068036
> LimmaPTT$stdev.unscaled[2]
[1] 0.4082483
> LimmaPTT$sigma[2]
[1] 0.3176068
> LimmaPTT$coefficients[2]/(LimmaPTT$sigma[2]*LimmaPTT$stdev.unscaled[2])
[1] -0.2366172
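The agreement above can be checked in base R without limma. This is a sketch with hypothetical log-ratios for a single gene: a linear model through the origin with the dye-swap design (+1, -1, ...) gives the same coefficient, sigma and t-statistic as a one-sample t-test on the sign-corrected log-ratios.

```r
# Dye-swap design as in the design matrix above; M values are hypothetical
design <- c(1, -1, 1, -1, 1, -1)
M <- c(0.35, -0.20, 0.10, -0.45, 0.25, -0.15)

# Paired t-test by hand on the sign-corrected log-ratios
d <- design * M
t_hand <- mean(d) / (sd(d) / sqrt(length(d)))

# Same test as a linear model without intercept
fit <- lm(M ~ 0 + design)
coef_lm  <- unname(coef(fit))
sigma_lm <- summary(fit)$sigma
t_lm     <- summary(fit)$coefficients[1, "t value"]

# Same test with the built-in one-sample t-test
t_test <- unname(t.test(d)$statistic)
print(c(by_hand = t_hand, lm = t_lm, t.test = t_test))
```

Here `sigma_lm` plays the role of `sigma`, and 1/sqrt(6) the role of `stdev.unscaled`, in the limma fit.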

Paired t-test using limma

> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])
[1] 0.1425021
> var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])^0.5
[1] 0.2395690
> 1/(6^0.5)
[1] 0.4082483
> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])/((var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])^0.5)*(1/(6^0.5)))
[1] 1.457023
>
> LimmaPTT$coefficients[1]
[1] 0.1875361
> LimmaPTT$stdev.unscaled[1]
[1] 0.5
> LimmaPTT$sigma[1]
[1] 0.2831248
> LimmaPTT$coefficients[1]/(LimmaPTT$sigma[1]*LimmaPTT$stdev.unscaled[1])
[1] 1.324760
> NNBLimmadataC$weights[1,]
51-C1-3-vs-W1-5 60-W2-3-vs-C2-5 72-C3-3-vs-W3-5 79-W4-3-vs-C4-5 82-C5-3-vs-W5-5 97-W6-3-vs-C6-5
              0               0               1               1               1               1

• The two calculations differ because lmFit uses the spot weights: this spot has weight 0 on the first two arrays, so only four arrays enter the fit and stdev.unscaled is 1/(4^0.5) = 0.5
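A base-R sketch of the effect of zero weights, using the same weight pattern as the spot above (0, 0, 1, 1, 1, 1) but hypothetical log-ratios. With `lm`, observations with weight 0 are left out of the fit, so the estimate is the mean of the remaining four sign-corrected values and the unscaled standard deviation is 1/sqrt(4).

```r
# Dye-swap design, hypothetical log-ratios, and the weights seen for spot 1
design <- c(1, -1, 1, -1, 1, -1)
M <- c(0.35, -0.20, 0.10, -0.45, 0.25, -0.15)
w <- c(0, 0, 1, 1, 1, 1)

# Weighted fit: arrays with weight 0 do not contribute
fit_w <- lm(M ~ 0 + design, weights = w)
coef_w  <- unname(coef(fit_w))
stdev_w <- sqrt(summary(fit_w)$cov.unscaled[1, 1])
df_w    <- fit_w$df.residual

# The same numbers computed by hand from the four arrays with weight 1
d_kept <- (design * M)[w > 0]
print(c(coef = coef_w, by_hand = mean(d_kept), stdev.unscaled = stdev_w, df = df_w))
```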

Paired t-test using limma

> # Spots with at least one residual degree of freedom
> dfp<-LimmaPTT$df.residual>0
> LimmaPTT$LimmaTStat<-LimmaPTT$coefficients/(LimmaPTT$sigma*LimmaPTT$stdev.unscaled)
> LimmaPTT$LimmaTPvalue<-rep(NA,SpotsPerArray)
> # Two-sided p-value: use abs() so that negative t-statistics are handled correctly
> LimmaPTT$LimmaTPvalue[dfp]<-2*pt(abs(LimmaPTT$LimmaTStat[dfp]),LimmaPTT$df.residual[dfp],lower.tail=FALSE)
> attributes(LimmaPTT)
$names
 [1] "coefficients"     "stdev.unscaled"   "sigma"            "df.residual"
 [5] "cov.coefficients" "pivot"            "method"           "design"
 [9] "genes"            "Amean"            "LimmaTStat"       "LimmaTPvalue"

$class
[1] "MArrayLM"
attr(,"package")
[1] "limma"
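A two-sided p-value has to be computed from the absolute value of the t-statistic. The sketch below uses the two statistics computed earlier (-0.2366172 and 1.324760); the degrees of freedom, 5 and 3, are an assumption based on six and four contributing arrays.

```r
# t-statistics for spots 2 and 1 from the previous slides;
# degrees of freedom assumed to be (number of contributing arrays - 1)
t_stat <- c(-0.2366172, 1.324760)
df <- c(5, 3)

# Without abs(), a negative statistic gives a "p-value" above 1
p_no_abs <- 2 * pt(t_stat, df, lower.tail = FALSE)

# With abs(), both tails are handled symmetrically
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
print(rbind(p_no_abs, p_two_sided))
```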

limma so far

• Facilitates easy data import and normalization

• Keeps track of "bad" spots

• Running the basic t-test takes a bit of additional work

• If we were to use the empirical Bayes statistics as implemented in limma, it would be even easier

• Empirical Bayes is generally BETTER than the simple t-test

• Will talk about this type of analysis next week

• Limma also allows fitting models with multiple factors, which we will also talk about next week

• Next time: multiple hypothesis testing and p-value adjustments