Bringing A Statistical Package To The Biologist’s Fingertips With Applications to Microarray Analysis

Upload: sammy17

Post on 28-Nov-2014

Page 1: Bringing A Statistical Package To The Biologist's Fingertips

Bringing A Statistical Package To The Biologist’s Fingertips

With Applications to Microarray Analysis

Page 2: Bringing A Statistical Package To The Biologist's Fingertips

Microarray Experiments

Some examples of the many types of microarray experiments currently being considered.
• Comparison to normal cells.
• Comparison of many cell types using an appropriate pool of RNA as a reference.
• Time series using either time 0 or a past time as a reference.
• Knockout experiments
• Factor experiments

Page 3: Bringing A Statistical Package To The Biologist's Fingertips

Statistical issues to be addressed.

Image analysis.
• Spot identification
• Background correction

Data analysis
• Normalisation
• Transformation
• Significant genes
• Large amounts of data
• ………………….

Need a flexible approach.

Page 4: Bringing A Statistical Package To The Biologist's Fingertips

A tool for analysis : R

R is free, open-source software that is rapidly becoming very widely used.

It can handle the large data files used to analyse microarrays.

Is available for Unix, Linux and Windows.

Has excellent documentation and help available.

Page 5: Bringing A Statistical Package To The Biologist's Fingertips

Image Analysis and R

In collaboration with the CSIRO (Sydney), Jean Yee Hwa Yang and Terry Speed have developed a microarray image analysis package, which is currently being implemented using Z-image and R.

This automated image analysis program overcomes some of the problems and limitations of other commercial packages.

Output will automatically be set up for further analysis in R.

Page 6: Bringing A Statistical Package To The Biologist's Fingertips

Using R at WEHI

Currently only available on unix02.

Access from a Macintosh is limited to command line window only. The graphics window can only be seen if an X-Windows program is installed on the Mac.

However, if there is a demand for use of R at WEHI then Computer Centre will investigate options to change this situation.

Alternatively, install R for Windows on a PC, or install R for Linux.

Page 7: Bringing A Statistical Package To The Biologist's Fingertips

Using R at WEHI (2)

NAT>R

R : Copyright 2000, The R Development Core Team
Version 1.0.0 (February 29, 2000)
Type "demo()" for some demos, "help()" for on-line help, or
"help.start()" for a HTML browser interface to help.
Type "q()" to quit R.

> q()
Save workspace image? [y/n/c]: y

NAT>R --vsize=50M --nsize=2000k

Page 8: Bringing A Statistical Package To The Biologist's Fingertips

How to make a vector

> x<-c(1,3,5,4,7,8)
> x
[1] 1 3 5 4 7 8

> t(x)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    3    5    4    7    8

> length(x)
[1] 6

> index<-c(2,3,4)
> x[index]
[1] 3 5 4

Page 9: Bringing A Statistical Package To The Biologist's Fingertips

How to make a matrix

> xmat<-matrix(x,nrow=2,ncol=3,byrow=T)
> xmat
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    4    7    8

> xmat[1,2]
[1] 3
> xmat[,3]
[1] 5 8

> xmat<-matrix(x,nrow=2,ncol=3,byrow=F)
> xmat
     [,1] [,2] [,3]
[1,]    1    5    7
[2,]    3    4    8
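The two fills above differ only in the order in which the elements of x are placed into the matrix. A minimal sketch of how they relate, reusing x from the slide (the names by_row, by_col and same are illustrative only, not from the slides):

```r
x <- c(1, 3, 5, 4, 7, 8)

# R fills a matrix column by column unless byrow = TRUE is given.
by_row <- matrix(x, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(x, nrow = 2, ncol = 3, byrow = FALSE)

# Filling a 2 x 3 matrix by row gives the same result as filling a
# 3 x 2 matrix by column and then transposing it.
same <- identical(by_row, t(matrix(x, nrow = 3, ncol = 2)))
print(same)
```

Spelling out TRUE and FALSE rather than T and F is a common precaution, since T and F are ordinary variables that can be overwritten.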

Page 10: Bringing A Statistical Package To The Biologist's Fingertips

Adding and removing a column

> addcol<-c(9,2)
> newxmat<-cbind(xmat,addcol)
> newxmat
           addcol
[1,] 1 5 7      9
[2,] 3 4 8      2

> oldxmat<-newxmat[,-4]
> oldxmat

[1,] 1 5 7
[2,] 3 4 8

Page 11: Bringing A Statistical Package To The Biologist's Fingertips

A script to find mean of columns

> for( i in 1:3){
+ print(mean(xmat[,i]))
+ }
[1] 2
[1] 4.5
[1] 7.5

The same loop written as a script:

for( i in 1:3){
 print(mean(xmat[,i]))
}

To store the column means in a vector m instead of printing them:

m<-0
for( i in 1:3){
m<-c(m,mean(xmat[,i]))
}
m<-m[-1]

> dim(xmat)
[1] 2 3
> m<-0
> for( i in 1:3){
+ m<-c(m,mean(xmat[,i]))
+ }
> m<-m[-1]
> m
[1] 2.0 4.5 7.5
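The explicit loop is useful for learning, but the same column means can be computed without one. The sketch below is not from the slides; it compares the loop with apply() and colMeans() from base R, on the same xmat. The names m_loop, m_apply and m_colmeans are illustrative.

```r
x <- c(1, 3, 5, 4, 7, 8)
xmat <- matrix(x, nrow = 2, ncol = 3, byrow = FALSE)

# Loop version, preallocating the result instead of growing it
# and dropping a dummy first element afterwards.
m_loop <- numeric(ncol(xmat))
for (i in seq_len(ncol(xmat))) {
  m_loop[i] <- mean(xmat[, i])
}

# apply() calls mean() on each column (margin 2).
m_apply <- apply(xmat, 2, mean)

# colMeans() is the dedicated base R function for this.
m_colmeans <- colMeans(xmat)

print(m_colmeans)
```

All three give the same vector of column means; colMeans() is usually the shortest and fastest choice for a numeric matrix.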

Page 12: Bringing A Statistical Package To The Biologist's Fingertips

Reading in Data

num GR GC SR SC NAME   X       Y      CH1I        CH1B       CH1ISD     CH1BSD    CH2I        CH2B       CH2ISD     CH2BSD
1   1  1  1  1  CL0001 1220.00 890.00 1223.317505 168.473679 435.352264 37.599304 1014.603149 139.578949 446.614960 21.937578
2   1  1  1  2  CL0001 1400.00 890.00 1257.714233 233.368423 337.946320 90.568703 975.333313  142.684204 354.194031 22.934818
3   1  1  1  3  CL0008 1580.00 890.00 333.555542  144.000000 145.992569 15.944347 277.730164  126.842102 156.314529 9.719757

Page 13: Bringing A Statistical Package To The Biologist's Fingertips

Reading in data from a text file

> # check that the file has the same number of fields
> # on each line for all lines
> count.fields(file="tp04sk1.txt",sep="\t",skip=0)
 . . . . . . . . . . . . . . . . . 16 16 16 16
[9145] 16 16 16 16 16 16 16 16 16 16 16 16 16 16
[9169] 16 16 16 16 16 16 16 16 16 16 16 16 16 16
[9193] 16 16 16 16 16 16 16 16 16 16 16 16 16 16
[9217] 16

> tp4sk1<- read.table("tp04sk1.txt", header=T, sep="\t", skip=0, row.names=1)
> attach(tp4sk1)
> median(CH1I)
[1] 375.627
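Scanning thousands of field counts by eye is error-prone, so the check can be reduced to a single TRUE or FALSE. The sketch below is an illustration only: it writes a tiny made-up tab-delimited file (not the real tp04sk1.txt) so that it runs on its own, and the names tmp, n_fields, consistent and spots are hypothetical.

```r
# Write a small made-up tab-delimited file so the example is self-contained.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "num\tNAME\tCH1I\tCH1B",
  "1\tspotA\t1200\t160",
  "2\tspotB\t1250\t230",
  "3\tspotC\t330\t140"
), tmp)

# One count per line of the file, header included.
n_fields <- count.fields(tmp, sep = "\t")

# TRUE only if every line has the same number of fields.
consistent <- length(unique(n_fields)) == 1

# The first column (num) becomes the row names.
spots <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)

# Columns can be reached through the data frame, without attach().
med_ch1i <- median(spots$CH1I)
print(med_ch1i)
```

Using spots$CH1I instead of attach() avoids name clashes when several data files are open in one session; this is a design preference, not something the slides require.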

Page 14: Bringing A Statistical Package To The Biologist's Fingertips

Getting spot info from the dataframe

> cy3 <- CH2I # Green
> cy5 <- CH1I # Red
>
> cy3bc <- CH2I-CH2B # Background corrected.
> cy5bc <- CH1I-CH1B

> # Get duplicates.
> d1 <- seq(1,(dim(tp4sk1)[1]-1),2)
> d2 <- seq(2,(dim(tp4sk1)[1]),2)
>
> cy3d1 <- cy3bc[d1]
> cy3d2 <- cy3bc[d2]
> cy5d1 <- cy5bc[d1]
> cy5d2 <- cy5bc[d2]
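The duplicate indexing above relies on each spot and its duplicate sitting in adjacent rows, so odd rows hold the first copy and even rows the second. A small sketch with made-up intensities shows what the index vectors look like; averaging the two copies at the end is an added illustration, not a step taken from the slides.

```r
# Made-up background-corrected intensities: 6 spots = 3 duplicate pairs.
cy3bc <- c(100, 110, 250, 240, 80, 90)

n <- length(cy3bc)
d1 <- seq(1, n - 1, by = 2)   # rows holding the first copy of each pair
d2 <- seq(2, n, by = 2)       # rows holding the second copy of each pair

cy3d1 <- cy3bc[d1]
cy3d2 <- cy3bc[d2]

# One possible use of the pairs: average the two copies of each spot.
pair_mean <- (cy3d1 + cy3d2) / 2
print(pair_mean)
```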

Page 15: Bringing A Statistical Package To The Biologist's Fingertips

Always log the intensities

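> # Cy3 and Cy5 (capitalised) are assumed to hold the log base 2
> # intensities, e.g. Cy3 <- log2(cy3) and Cy5 <- log2(cy5).
> par(mfrow=c(2,3))
> hist(cy3,col="green")
> plot(density(cy3),col="green")
> plot(density(Cy3),col="green") # Use Log base 2
> hist(cy5,col="red")
> plot(density(cy5),col="red")
> plot(density(Cy5),col="red")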
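One general reason for working on the log base 2 scale, stated here as a property of logarithms rather than something shown on the slide, is that fold changes become symmetric around zero. A small worked example with made-up intensities:

```r
# Made-up intensities for three spots.
red   <- c(200, 100, 50)
green <- c(100, 100, 100)

# Raw ratios are asymmetric: a doubling is 2, a halving is 0.5.
ratio <- red / green

# On the log2 scale a doubling is +1 and a halving is -1.
log_ratio <- log2(ratio)
print(log_ratio)
```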

Page 16: Bringing A Statistical Package To The Biologist's Fingertips

Normalisation

> par(mfrow=c(2,1))
> plot(density(Cy3),type="n")
> lines(density(Cy3),col="green")
> lines(density(Cy5),col="red")
> plot(Cy3,Cy5,xlab="Log(cy3) Background Corrected",
+ ylab="Log(cy5) Background Corrected",
+ main="The Need For Normalisation Between Green and Red Intensities")
> lines(lowess(Cy3,Cy5),col="yellow")

Page 17: Bringing A Statistical Package To The Biologist's Fingertips

Normalisation (2)

Green intensity is a multiple of the red intensity:
cy3 = k*cy5

So when you take logs (base 2), writing K = log2(k):
log2(cy3) = K + log2(cy5)

Therefore, estimate K by the median difference of log intensities.

> K <- median( log2(cy3)-log2(cy5) )
> k <- 2**K
> cy5n <- k*cy5
> Cy5n <- log2(cy5n)

Equivalently, using the logged intensities Cy3 and Cy5:

K <- median( Cy3 - Cy5 )
k <- 2**(K)
cy5n <- k*cy5
Cy5n <- log2(cy5n)
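The effect of this median normalisation can be checked on simulated data. The sketch below is an illustration under stated assumptions: the intensities are simulated from log-normal distributions, the green channel is set to 1.5 times the red channel with some multiplicative noise, and true_k and the simulation settings are made up for the example.

```r
set.seed(42)

true_k <- 1.5                                 # assumed scaling, for the simulation only
cy5 <- rlnorm(1000, meanlog = 6, sdlog = 1)   # simulated red intensities
# Green is a multiple of red, up to multiplicative noise.
cy3 <- true_k * cy5 * rlnorm(1000, meanlog = 0, sdlog = 0.1)

Cy3 <- log2(cy3)
Cy5 <- log2(cy5)

# Estimate the log scale factor by the median difference of log intensities.
K <- median(Cy3 - Cy5)
k <- 2^K

# Rescale red, then log it again.
cy5n <- k * cy5
Cy5n <- log2(cy5n)

print(k)                     # estimated scale factor
print(median(Cy5n - Cy3))    # median log ratio after normalisation
```

The estimate k should land close to the simulated factor, and the median log ratio after normalisation is zero apart from rounding error. The median is used rather than the mean so that a minority of genuinely changed genes has little influence on the estimate.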

Page 18: Bringing A Statistical Package To The Biologist's Fingertips

Approximate normality of log ratios

> par(mfrow=c(2,1))
> plot(density(Cy5n-Cy3),col="purple")
> qqnorm(Cy5n-Cy3,
+ col=c("red","red","yellow","yellow","green","green","blue","blue",
+ "pink","pink","orange","orange","purple","purple","black","black"))

Page 19: Bringing A Statistical Package To The Biologist's Fingertips

A question of significance

> par(mfrow=c(1,1))
> plot(0.5*(Cy3+Cy5n),Cy5n-Cy3,
+ xlab="Average of logRed and logGreen",
+ ylab="Difference of logRed and logGreen",
+ main="Variation In Intensities Is Not Constant",
+ col=c("red","red","yellow","yellow","green","green","blue","blue",
+ "pink","pink","orange","orange","purple","purple","black","black"))
> lines(lowess(0.5*(Cy3+Cy5n),Cy5n-Cy3),col="yellow")

Page 20: Bringing A Statistical Package To The Biologist's Fingertips

Identifying a spot on a plot

> par(mfrow=c(1,1))
> plot(0.5*(Cy3+Cy5n),Cy5n-Cy3,
+ xlab="Average of logRed and logGreen",
+ ylab="Difference of logRed and logGreen",
+ main="Variation In Intensities Is Not Constant",
+ type="n",ylim=c(-4,4),xlim=c(6,12))
> # Draw each spot as its row number instead of a plotting symbol.
> text(0.5*(Cy3+Cy5n),Cy5n-Cy3,labels=as.character(1:9216),
+ col=c("red","red","yellow","yellow","green","green","blue","blue",
+ "pink","pink","orange","orange","purple","purple","black","black"),
+ cex=1)
> lines(lowess(0.5*(Cy3+Cy5n),Cy5n-Cy3),col="yellow")
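With over nine thousand labels on one plot, reading off a spot number can be hard. As a complement, not taken from the slides, the most extreme spots can also be listed numerically. The sketch below uses simulated values with two planted outliers; A, M, top and extreme are illustrative names, with M standing in for Cy5n-Cy3 and A for 0.5*(Cy3+Cy5n).

```r
set.seed(1)

n <- 200
A <- runif(n, min = 6, max = 12)   # simulated average log intensity
M <- rnorm(n, sd = 0.3)            # simulated log ratio
M[c(17, 123)] <- c(3.2, -2.8)      # two planted extreme spots

# Row numbers of the spots with the largest absolute log ratio.
top <- order(abs(M), decreasing = TRUE)[1:2]

# Collect the row number together with its A and M values.
extreme <- data.frame(spot = top, A = A[top], M = M[top])
print(extreme)
```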

Page 21: Bringing A Statistical Package To The Biologist's Fingertips

Saving graphics to a file (postscript)

> postscript("filename.ps")
> par(mfrow=c(1,1))
> plot(0.5*(Cy3+Cy5n),Cy5n-Cy3,
+ xlab="Average of logRed and logGreen",
+ ylab="Difference of logRed and logGreen",
+ main="Variation In Intensities Is Not Constant",
+ type="n",ylim=c(-0.1,1),xlim=c(10,11))
> text(0.5*(Cy3+Cy5n),Cy5n-Cy3,labels=as.character(1:9216),
+ col=c("red","red","yellow","yellow","green","green","blue","blue",
+ "pink","pink","orange","orange","purple","purple","black","black"),
+ cex=1)
> dev.off()
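The pattern is always the same: open a file device, draw, then close the device so the file is written. A minimal self-contained sketch, using a temporary file and a made-up plot rather than the microarray data (the name out is illustrative):

```r
# Temporary output file, so the example does not overwrite anything.
out <- tempfile(fileext = ".ps")

postscript(out)                     # open the PostScript file device
plot(1:10, (1:10)^2,
     xlab = "x", ylab = "x squared",
     main = "Example plot")         # drawing goes to the file, not the screen
invisible(dev.off())                # close the device; this writes the file

print(file.exists(out))
```

Forgetting dev.off() is the usual reason for an empty or incomplete file. Base R also provides pdf(), which is used the same way.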

Page 22: Bringing A Statistical Package To The Biologist's Fingertips

Using R help

> ?plot

Generic X-Y Plotting

Description:

     Generic function for plotting of R objects. For more details
     about the graphical parameter arguments, see `par'.

Usage:

     plot(x, ...)
     plot(x, y, xlim=range(x), ylim=range(y), type="p",
          main, xlab, ylab, ...)
     plot(y ~ x, ...)

Arguments:

       x: the coordinates of points in the plot. Alternatively, a
          single plotting structure or any R object with a `plot'
          method can be provided.
:

Page 23: Bringing A Statistical Package To The Biologist's Fingertips

Using R help (2)

> help.start()

Page 24: Bringing A Statistical Package To The Biologist's Fingertips

R Help (3)

Page 25: Bringing A Statistical Package To The Biologist's Fingertips

[Figure: layout of the array. Sixteen grids, numbered 1 to 16. Within each grid the spots are numbered row by row, 24 per row, so grid 1 holds spots 1 to 576 and grid 2 holds spots 577 to 1152.]

Page 26: Bringing A Statistical Package To The Biologist's Fingertips

Level colour plot of background

> bkgmat<-matrix(1:24,nrow=24,ncol=1)
> for(i in 1:16){
+ s<-c((((i-1)*576)+1):(i*576))
+ m<-matrix(CH1B[s],nrow=24,ncol=24,byrow=T)
+ bkgmat<-cbind(bkgmat,m)
+ }
> bkgmat<-bkgmat[,-1]
> m1<-bkgmat[,1:96]
> m2<-bkgmat[,(97:192)]
> m3<-bkgmat[,(193:(3*96))]
> m4<-bkgmat[,(((3*96)+1):(4*96))]
> bkg<-rbind(m1,m2,m3,m4)

> filled.contour(1:96,1:96,bkg,nlevels=100,color.palette=heat.colors)
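The code above rebuilds the physical layout of the slide: each block of 576 values becomes a 24 x 24 grid, four grids are placed side by side, and four such bands are stacked to give a 96 x 96 matrix. The sketch below does the same assembly with lapply() and do.call(), on simulated values standing in for CH1B. It is an alternative formulation, not the slides' code, and bg, grid_block, blocks and band are illustrative names.

```r
set.seed(1)
bg <- runif(16 * 576)   # simulated stand-in for the CH1B background column

# Grid i as a 24 x 24 matrix, filled row by row as on the slide.
grid_block <- function(i) {
  s <- ((i - 1) * 576 + 1):(i * 576)
  matrix(bg[s], nrow = 24, ncol = 24, byrow = TRUE)
}
blocks <- lapply(1:16, grid_block)

# Band r holds grids 4(r-1)+1 to 4r, placed side by side (24 x 96).
band <- function(r) do.call(cbind, blocks[((r - 1) * 4 + 1):(r * 4)])

# Stack the four bands to get the full 96 x 96 background matrix.
bkg <- do.call(rbind, lapply(1:4, band))

print(dim(bkg))
```

This avoids the dummy first column that has to be removed afterwards. The resulting bkg can be passed to filled.contour() exactly as above.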

Page 27: Bringing A Statistical Package To The Biologist's Fingertips

Conclusion

R is flexible and powerful

• Easy to read in data.

• Enables manipulation of data.

• Extensive control over graphics, and a wide range of plot types.

• Wide range of statistical functions.

• Add-on packages available.

• Can write scripts as a text file to send to collaborators for importing into R. (Use source("filename") to import and execute code).

• Can save all the work you do in a session.
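As a small illustration of the script-sharing point, the sketch below writes a two-line script to a temporary file and runs it with source(). The file contents and the names script, env and x_mean are made up for the example. Evaluating into a separate environment is a precaution added here, not something the slides call for.

```r
# Write a tiny script to a temporary file, as a collaborator might send it.
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(1, 3, 5, 4, 7, 8)",
  "x_mean <- mean(x)"
), script)

# Run the script in its own environment so it cannot overwrite
# objects that already exist in the workspace.
env <- new.env()
source(script, local = env)

print(env$x_mean)
```

Calling source(script) without the local argument runs the code in the workspace itself, which is what the bullet above describes.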

Page 28: Bringing A Statistical Package To The Biologist's Fingertips

Acknowledgements

Terry Speed

Melanie Bahlo

Asa Wirapati

George Rudy

Jean Yee Hwa Yang

Chuang Fong Kong

Keith Slattery