
1-18-2005 1

Outline

• Randomization issues

• Two-sample t-test vs paired t-test

• I made a mistake in creating the dataset, so previous analyses will not be comparable with new ones

• Issues of the background subtraction

• Limma as the general tool for analyzing microarray data


limma

... is a package for the analysis of microarray data, especially the use of linear models for analyzing designed experiments and the assessment of differential expression.

• Specially constructed data objects to represent various aspects of microarray data

• Specially constructed "object methods" for importing, normalizing, displaying and analyzing microarray data

• All objects and methods are transparent

• All objects can be accessed and modified outside of limma

• Unique in the implementation of the empirical Bayes procedure for identifying differentially expressed genes by "borrowing" information from different genes (everything so far has been gene by gene)


Measurement Error Model With Additive Background

Scanning the "Green Channel" ($X_G$)

Scanning the "Red Channel" ($X_R$)

$$\log\left(\frac{X_R}{X_G}\right) = \log(X_R) - \log(X_G)$$

• There are other models for accounting for the background signal

• Simple subtraction of the background intensities often introduces additional variability in the observed signal

• The problem is that we use a single-observation estimate of B for each spot

• With this in mind, various strategies have been proposed to pool background information from more than one spot to estimate B
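The effect described in the bullets above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the background level, the noise levels, and all variable names are hypothetical and are not taken from the dataset used in these slides. It compares the spread of log-ratios computed with no subtraction, with each spot's own noisy background estimate, and with a background estimate pooled over all spots.

```r
set.seed(1)
n <- 5000                       # number of simulated spots (hypothetical)
bg_true <- 100                  # true background, identical under every spot

# True signals: both channels have the same expected level, so the true log-ratio is 0
signal_R <- exp(log(200) + rnorm(n, 0, 0.1))
signal_G <- exp(log(200) + rnorm(n, 0, 0.1))

# Observed foreground = background + signal
F_R <- bg_true + signal_R
F_G <- bg_true + signal_G

# Single-observation background estimates: one noisy measurement per spot
B_R <- bg_true + rnorm(n, 0, 15)
B_G <- bg_true + rnorm(n, 0, 15)

M_none   <- log2(F_R / F_G)                               # no subtraction
M_single <- log2((F_R - B_R) / (F_G - B_G))               # spot-by-spot subtraction
M_pooled <- log2((F_R - mean(B_R)) / (F_G - mean(B_G)))   # pooled background estimate

sd_none   <- sd(M_none)
sd_single <- sd(M_single)
sd_pooled <- sd(M_pooled)
print(round(c(none = sd_none, single = sd_single, pooled = sd_pooled), 3))
```

In this setup the spot-by-spot subtraction adds the noise of each background measurement to the signal, while the pooled estimate averages that noise away. Skipping the subtraction gives the smallest spread, but only because the constant background shrinks every ratio toward 1.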

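Foreground (F)

Background (B)

Old Model

$$F = e^{\mu}\,\eta, \quad \text{where } \log(\eta) \sim N(0, \sigma^2)$$

$$x = \log(F) = \mu + \varepsilon, \quad \text{where } \varepsilon = \log(\eta) \sim N(0, \sigma^2) \;\Rightarrow\; x \sim N(\mu, \sigma^2)$$

$$x_W \sim N(\mu_W, \sigma_W^2) \text{ and } x_C \sim N(\mu_C, \sigma_C^2) \;\Rightarrow\; x_r = x_W - x_C \sim N(\mu_r = \mu_W - \mu_C,\ \sigma_r^2)$$

New Model

$$F = B + e^{\mu}\,\eta, \quad \text{where } \log(\eta) \sim N(0, \sigma_X^2),\ B \sim N(\mu_B, \sigma_B^2)$$

$$x = \log(F - B) = \mu + \varepsilon, \quad \text{where } \varepsilon = \log(\eta) \sim N(0, \sigma_X^2) \;\Rightarrow\; x \sim N(\mu, \sigma_X^2)$$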
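To make the difference between the two models concrete, here is a minimal simulation sketch. It assumes the additive-background model has the form F = B + exp(mu) * eta, with log(eta) normal and B normal. All parameter values and variable names are hypothetical. The sketch also uses the true background of every spot, which is never available in practice, so it shows only the idealized case.

```r
set.seed(2)
n <- 10000
mu <- log(300)          # true log signal (hypothetical value)
sigma_X <- 0.2          # SD of the multiplicative error on the log scale (hypothetical)

B  <- rnorm(n, mean = 100, sd = 10)             # additive background
fg <- B + exp(mu + rnorm(n, 0, sigma_X))        # observed foreground

x_old <- log(fg)        # old model: background ignored
x_new <- log(fg - B)    # new model: background removed before taking logs

bias_old <- mean(x_old) - mu
bias_new <- mean(x_new) - mu
print(round(c(bias_old = bias_old, bias_new = bias_new), 3))
```

Under these assumptions log(F) overestimates mu because the background is still part of the measurement, while log(F - B) is centered on mu.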


limma

Data to import: http://eh3.uc.edu/data/51-C1-3-vs-W1-5.gpr

File descriptions: http://eh3.uc.edu/data/WTargets.txt

Spot descriptions: http://eh3.uc.edu/data/WSpotTypes.txt

Importing data: source("http://eh3.uc.edu/LimmaDataImport.R")



limma

library(limma)

data.directory <- "http://eh3.uc.edu/data/"
targets <- readTargets("http://eh3.uc.edu/data/WTargets.txt")
spottypes <- readSpotTypes("http://eh3.uc.edu/data/WSpotTypes.txt")

LimmadataC <- read.maimages(files = targets$FileName, source = "genepix",
    path = data.directory,
    columns = list(Gf = "F532 Median", Gb = "B532 Median",
                   Rf = "F635 Median", Rb = "B635 Median"),
    annotation = c("Name", "ID", "Block", "Row", "Column"),
    wt.fun = wtflags(0))


RGList class

> attributes(LimmadataC)

$names

[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes"

$class

[1] "RGList"

attr(,"package")

[1] "limma"


RGList class

> LimmadataC$genes[1,]

Name ID Block Row Column

1 no name Rn30000100 1 1 1

> LimmadataC$R[1:3,]

51-C1-3-vs-W1-5 60-W2-3-vs-C2-5 72-C3-3-vs-W3-5 79-W4-3-vs-C4-5 82-C5-3-vs-W5-5 97-W6-3-vs-C6-5

[1,] 85 57 91 71 67 111

[2,] 358 1102 2394 1685 575 882

[3,] 168 376 620 670 206 293

> LimmadataC$Rb[1:3,]


[1,] 81 55 65 51 52 72

[2,] 81 56 65 51 51 72

[3,] 82 57 64 50 48 69


RGList class

LimmadataC$genes$Status<-controlStatus(spottypes,LimmadataC)

LimmadataC$weights[LimmadataC$genes$ID=="Blank"]<-0

LimmadataC$printer<-getLayout(LimmadataC$genes)

> attributes(LimmadataC)

$names

[1] "R" "G" "Rb" "Gb" "weights" "targets" "genes" "printer"

$class

[1] "RGList"

attr(,"package")

[1] "limma"

> LimmadataC$genes[1,]

Name ID Block Row Column Status

1 no name Rn30000100 1 1 1 cDNA
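The idea behind assigning a Status to every spot and then zeroing the weights of blank spots can be mimicked in base R. The sketch below is not limma's implementation: the spot-type table, its patterns, and all IDs except Rn30000100 are hypothetical, and the contents of WSpotTypes.txt are not shown in these slides. It assumes that patterns are applied in order, with later rows overriding earlier ones.

```r
# Hypothetical spot-type table: a catch-all first, more specific patterns afterwards
spottypes <- data.frame(
  SpotType = c("cDNA", "Blank", "Control"),
  ID       = c("*", "Blank", "Ctrl*"),
  stringsAsFactors = FALSE
)

# Spot IDs: the first one appears in the output above, the others are made up
ids <- c("Rn30000100", "Blank", "Ctrl01", "Rn30000101")

# Apply each wildcard pattern in turn; later matches overwrite earlier ones
status <- rep(NA_character_, length(ids))
for (i in seq_len(nrow(spottypes))) {
  hit <- grepl(glob2rx(spottypes$ID[i]), ids)
  status[hit] <- spottypes$SpotType[i]
}

# Blank spots get weight 0, everything else weight 1
weights <- ifelse(ids == "Blank", 0, 1)

print(data.frame(ID = ids, Status = status, weight = weights))
```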


Plotting data in an RGList object

> plotMA(LimmadataC,array=1,xlim=c(-1,16),ylim=c(-3,8))

[Figure: MA plot (M vs. A) for array 51-C1-3-vs-W1-5; legend: cDNA, Blank, Control, Empty]


limma

• plotMA automatically subtracts the background intensities before plotting the data

• It does not plot spots with weight 0

• If you want to plot all data or the data without subtracting background, you need to do a little work

• source("http://eh3.uc.edu/BackgroundEffects.R")
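To see what is being plotted, M and A can be computed by hand. The sketch below uses the usual definitions M = log2(R) - log2(G) and A = (log2(R) + log2(G))/2. The red-channel values are the first three spots of the first array as printed earlier; the green-channel values are hypothetical because they are not shown in these slides.

```r
R  <- c(85, 358, 168)    # red foreground, first array, spots 1-3 (from the output above)
Rb <- c(81, 81, 82)      # red background, same spots (from the output above)
G  <- c(70, 300, 150)    # green foreground: hypothetical values
Gb <- c(75, 80, 80)      # green background: hypothetical values

# log2 that returns NA instead of NaN for non-positive input
log2_pos <- function(x) {
  out <- rep(NA_real_, length(x))
  ok <- x > 0
  out[ok] <- log2(x[ok])
  out
}

# With background subtraction
r <- log2_pos(R - Rb)
g <- log2_pos(G - Gb)
M_sub <- r - g
A_sub <- (r + g) / 2

# Without background subtraction
M_raw <- log2(R) - log2(G)
A_raw <- (log2(R) + log2(G)) / 2

print(cbind(M_sub, A_sub, M_raw, A_raw))
```

In this made-up example the first spot is lost after subtraction even though its red channel is above background, because the green foreground falls below the green background and the log is undefined.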



limma

> NBLimmadataC<-backgroundCorrect(LimmadataC,method="none")

> attributes(NBLimmadataC)

$names

[1] "R" "G" "weights" "targets" "genes" "printer"

$class

[1] "RGList"

attr(,"package")

[1] "limma"

• Note that background measurements are gone


Scatter with and without background subtraction

[Figure: two MA plots (M vs. A) for array 51-C1-3-vs-W1-5, with and without background subtraction]


• Background-subtracted data are more spread out

• More data points are plotted without background subtraction


Plotting all data points

• Want to plot data points with weight 0 as well

• Create datasets with and without subtracting background and set all weights to 1

SpotsPerArray<-dim(LimmadataC$R)[1]

Narrays<-dim(LimmadataC$R)[2]

Limmadata<-LimmadataC

Limmadata$weights[1:SpotsPerArray,1:Narrays]<-1

NBLimmadata<-NBLimmadataC

NBLimmadata$weights[1:SpotsPerArray,1:Narrays]<-1
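The same kind of assignment can be tried on a small stand-alone matrix. The sketch below uses a hypothetical 3 x 2 weight matrix to show that indexing all rows and columns explicitly, as above, gives the same result as the shorter empty-index form, and that both keep the matrix dimensions.

```r
# Hypothetical weight matrix: 3 spots by 2 arrays
w <- matrix(c(0, 1, 1, 0, 1, 1), nrow = 3, ncol = 2)

# Explicit indexing, as in the slide
w_explicit <- w
w_explicit[1:nrow(w), 1:ncol(w)] <- 1

# Empty index: replaces every element but keeps dim and dimnames
w_short <- w
w_short[] <- 1

print(identical(w_explicit, w_short))
```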


Plotting all data points

[Figure: four MA plots (M vs. A) for array 51-C1-3-vs-W1-5, labeled "Background Subtracted" and "Raw", each shown for "All data" and "Zero-weight data removed"]


Which one to use?

• Removing points with weight zero seems reasonable

• Subtracting background costs us some data points even when one channel is above background, because only differences of log-transformed measurements are used and the log is undefined if either background-subtracted channel is not positive

• Subtracting background seems to increase the variability, but it is unclear how this would affect the results

• For now proceed without background subtraction, but compare results at the end

• Exploring other proposed background-adjustment methods also seems like a good idea


Data Analysis

Loess normalization

source("http://eh3.uc.edu/LimmaLoess.R")

> NNBLimmadataC<-normalizeWithinArrays(NBLimmadataC, method="loess")

> attributes(NNBLimmadataC)

$names

[1] "weights" "targets" "genes" "printer" "M" "A"

$class

[1] "MAList"

attr(,"package")

[1] "limma"
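The idea of loess normalization is to estimate the intensity-dependent trend of M as a smooth function of A and subtract it. The sketch below illustrates this on simulated data with base R's loess(); it is not what normalizeWithinArrays does internally, and the bias curve, noise level, and span are hypothetical choices.

```r
set.seed(3)
n <- 2000
A <- runif(n, 6, 16)                            # average log-intensity
bias <- 0.02 * (A - 11)^2 + 0.1 * (A - 11)      # hypothetical intensity-dependent dye bias
M <- bias + rnorm(n, 0, 0.3)                    # no true differential expression

# Fit a smooth trend of M on A and remove it
fit <- loess(M ~ A, span = 0.3, degree = 1, family = "symmetric")
M_norm <- M - fitted(fit)

cor_before <- cor(A, M)
cor_after  <- cor(A, M_norm)
print(round(c(before = cor_before, after = cor_after), 3))
```

After the fitted trend is subtracted, the normalized M values should be centered near zero across the whole range of A.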


Loess-normalized data

[Figure: MA plots (M vs. A) of the loess-normalized data, one panel per array: 51-C1-3-vs-W1-5, 60-W2-3-vs-C2-5, 72-C3-3-vs-W3-5, 79-W4-3-vs-C4-5, 82-C5-3-vs-W5-5, 97-W6-3-vs-C6-5; legend: cDNA, Blank, Control, Empty]



Paired t-test using limma

source("http://eh3.uc.edu/LimmaTTest.R")

> design<-modelMatrix(targets, ref="C")
Found unique target names:
 C W
> design
                 W
51-C1-3-vs-W1-5  1
60-W2-3-vs-C2-5 -1
72-C3-3-vs-W3-5  1
79-W4-3-vs-C4-5 -1
82-C5-3-vs-W5-5  1
97-W6-3-vs-C6-5 -1
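The signs in the design column follow from which sample went on which dye. The sketch below rebuilds the column in base R from a hypothetical targets table. The Cy3 and Cy5 assignments are inferred from the array names (for example, "C1-3" read as sample C1 on Cy3) and are an assumption, since the contents of WTargets.txt are not shown here.

```r
# Hypothetical targets table inferred from the array names
targets_sketch <- data.frame(
  Array = c("51-C1-3-vs-W1-5", "60-W2-3-vs-C2-5", "72-C3-3-vs-W3-5",
            "79-W4-3-vs-C4-5", "82-C5-3-vs-W5-5", "97-W6-3-vs-C6-5"),
  Cy3 = c("C", "W", "C", "W", "C", "W"),
  Cy5 = c("W", "C", "W", "C", "W", "C"),
  stringsAsFactors = FALSE
)

# M = log2(Cy5/Cy3), so W contributes +1 when it is on Cy5 and -1 when it is on Cy3
design_W <- (targets_sketch$Cy5 == "W") - (targets_sketch$Cy3 == "W")
names(design_W) <- targets_sketch$Array
print(design_W)
```

With C as the reference, every array then estimates the same W-versus-C log-ratio once its dye-swap sign is taken into account.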


> LimmaPTT<-lmFit(MA,design)

Error in .class1(object) : Object "MA" not found

> LimmaPTT<-lmFit(NNBLimmadataC,design)

>

> attributes(LimmaPTT)

$names

[1] "coefficients" "stdev.unscaled" "sigma" "df.residual"

[5] "cov.coefficients" "pivot" "method" "design"

[9] "genes" "Amean"

$class

[1] "MArrayLM"

attr(,"package")

[1] "limma"



> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])

[1] -0.03068036

> var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])^0.5

[1] 0.3176068

> 1/(6^0.5)

[1] 0.4082483

> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])/((var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[2,])^0.5)*(1/(6^0.5)))

[1] -0.2366172

>

> LimmaPTT$coefficients[2]

[1] -0.03068036

> LimmaPTT$stdev.unscaled[2]

[1] 0.4082483

> LimmaPTT$sigma[2]

[1] 0.3176068

> LimmaPTT$coefficients[2]/(LimmaPTT$sigma[2]*LimmaPTT$stdev.unscaled[2])

[1] -0.2366172
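The agreement shown above can be reproduced in base R for a single gene: a linear model with the +1/-1 design column and no intercept gives the same coefficient, residual standard deviation, and unscaled standard error as the hand computation, and the resulting statistic matches a one-sample t-test on the sign-corrected M values. The M values in this sketch are hypothetical.

```r
x <- c(1, -1, 1, -1, 1, -1)                       # design column (dye-swap signs)
M <- c(0.31, -0.12, 0.45, 0.08, -0.27, -0.36)     # hypothetical M values for one gene

# Hand computation on sign-corrected values
d <- x * M
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# Same quantities from a no-intercept linear model
fit <- lm(M ~ 0 + x)
s <- summary(fit)
coef_lm        <- unname(coef(fit))               # corresponds to "coefficients"
sigma_lm       <- s$sigma                         # corresponds to "sigma"
stdev_unscaled <- sqrt(s$cov.unscaled[1, 1])      # corresponds to "stdev.unscaled"
t_lm <- coef_lm / (sigma_lm * stdev_unscaled)

# One-sample t-test on the sign-corrected values
t_ttest <- unname(t.test(d)$statistic)

print(c(manual = t_manual, lm = t_lm, t.test = t_ttest))
```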



> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])

[1] 0.1425021

> var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])^0.5

[1] 0.2395690

> 1/(6^0.5)

[1] 0.4082483

> mean(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])/((var(c(1,-1,1,-1,1,-1)*NNBLimmadataC$M[1,])^0.5)*(1/(6^0.5)))

[1] 1.457023

>

> LimmaPTT$coefficients[1]

[1] 0.1875361

> LimmaPTT$stdev.unscaled[1]

[1] 0.5

> LimmaPTT$sigma[1]

[1] 0.2831248

> LimmaPTT$coefficients[1]/(LimmaPTT$sigma[1]*LimmaPTT$stdev.unscaled[1])

[1] 1.324760

> NNBLimmadataC$weights[1,]


0 0 1 1 1 1
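The zero weights on the first two arrays appear to explain why the hand computation and the limma fit disagree for this gene: the hand computation uses all six arrays, while a weighted fit uses only the four arrays with non-zero weight, which gives 1/sqrt(4) = 0.5 as the unscaled standard error. The sketch below shows this with base R's lm(); the M values are hypothetical and lm() stands in for lmFit here.

```r
x <- c(1, -1, 1, -1, 1, -1)                       # design column
w <- c(0, 0, 1, 1, 1, 1)                          # spot weights as printed above
M <- c(0.40, 0.10, 0.25, -0.05, 0.35, -0.30)      # hypothetical M values for one gene

# Weighted no-intercept fit: observations with weight 0 do not contribute
fit_w <- lm(M ~ 0 + x, weights = w)
s_w <- summary(fit_w)

coef_w           <- unname(coef(fit_w))
stdev_unscaled_w <- sqrt(s_w$cov.unscaled[1, 1])
df_w             <- fit_w$df.residual

# Mean of the sign-corrected values over the arrays that were kept
mean_kept <- mean((x * M)[w > 0])

print(c(coef = coef_w, mean_kept = mean_kept,
        stdev.unscaled = stdev_unscaled_w, df = df_w))
```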



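> # keep only genes with positive residual degrees of freedom
> dfp<-LimmaPTT$df.residual>0

> LimmaPTT$LimmaTStat<-LimmaPTT$coefficients/(LimmaPTT$sigma*LimmaPTT$stdev.unscaled)

> LimmaPTT$LimmaTPvalue<-rep(NA,SpotsPerArray)

> # two-sided p-value: use |t| so that negative statistics are handled correctly
> LimmaPTT$LimmaTPvalue[dfp]<-2*pt(abs(LimmaPTT$LimmaTStat[dfp]),LimmaPTT$df.residual[dfp],lower.tail=FALSE)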

> attributes(LimmaPTT)

$names

[1] "coefficients" "stdev.unscaled" "sigma" "df.residual"

[5] "cov.coefficients" "pivot" "method" "design"

[9] "genes" "Amean" "LimmaTStat" "LimmaTPvalue"

$class

[1] "MArrayLM"

attr(,"package")

[1] "limma"
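A two-sided p-value has to be computed from the absolute value of the t statistic. The sketch below uses the statistic printed earlier for the second gene and assumes 5 residual degrees of freedom (six arrays, one coefficient, all weights non-zero). It shows that doubling the upper-tail probability of a negative statistic gives a value above 1.

```r
t_stat <- -0.2366172    # t statistic for the second gene, as printed earlier
df <- 5                 # assumed: six arrays minus one estimated coefficient

# Doubling the upper tail without abs() fails for negative statistics
p_upper_doubled <- 2 * pt(t_stat, df, lower.tail = FALSE)

# Two-sided p-value based on |t|
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

print(c(upper_doubled = p_upper_doubled, two_sided = p_two_sided))
```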



limma so far

• Facilitates easy data import and normalization

• Keeps track of "bad" spots

• Running the basic t-test takes a bit of additional work

• If we were to use the empirical Bayes statistics as implemented in limma, it would be even easier

• Empirical Bayes is generally BETTER than the simple t-test

• Will talk about this type of analysis next week

• Limma also allows fitting models with multiple factors, which we will also talk about next week

• Next time: multiple hypothesis testing and p-value adjustments
