
Page 1: Statistical Analyses of Microarray Data Rafael A. Irizarry Department of Biostatistics rafa@jhu.edu ririzarr

Statistical Analyses of Microarray Data

Rafael A. Irizarry

Department of Biostatistics

rafa@jhu.edu

http://biosun01.biostat.jhsph.edu/~ririzarr


Outline

• Scientific questions

• Review of technology

• Role of statistics

• Two case studies


Scientific Questions

• Expression

• Differential expression

• Expression patterns

“To understand gene function, it is helpful to know when and where it is expressed and…”

“…under what circumstances the expression level is affected.”

“… questions concerning functional pathways and how cellular components work together to regulate and carry out cellular processes.”

Lipshutz et al. (1999) Nature Genetics, 21, pp. 20-21


What do Microarrays do?

Interrogate labeled nucleic acid samples

model systems, microdissections, cell lines, human tissue bank

[Figure labels: kanR, UPTAG, DOWNTAG]

• RNA samples

• Oligonucleotide barcodes


How do they do it?

Probes

Labeled targets


cDNA Arrays

[Workflow figure]

• cDNA clones (probes)

• PCR product amplification, purification

• printing (0.1 nl/spot)

• microarray

• mRNA target

• Hybridize target to microarray

• excitation (laser 1, laser 2)

• emission

• scanning

• overlay image and normalize

• analysis


High Density Oligonucleotide Arrays

[Figure: GeneChip Probe Array (1.28 cm) with >200,000 different complementary probes. Each hybridized probe cell (24 µm) holds millions of copies of a specific oligonucleotide probe. Single stranded, labeled RNA target binds the oligonucleotide probe. Inset: image of hybridized probe array.]

Compliments of D. Gerhold


Role of Statistics


[Flowchart]

Biological question: differentially expressed genes, sample class prediction, etc.

• Experimental design

• Microarray experiment

• Image analysis

• Quantify expression

• Normalization

• Estimation, testing, clustering, discrimination

• Biological verification and interpretation


Part of the image of one channel, false-coloured on a scale running from white (very high) and red (high), through yellow and green (medium), to blue (low) and black.


Does one size fit all?


Segmentation: limitation of the fixed circle method

[Figure panels: SRG (seeded region growing) and fixed circle]

Inside the boundary is spot (fg), outside is not.
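
To make the fixed circle limitation concrete, here is a minimal sketch on a synthetic image patch. The patch, the spot radius, and the helper name `fixed_circle_mask` are all assumptions made for illustration; none of them come from the slides. When the circle is larger than the true spot, background pixels fall inside the boundary and pull the foreground mean down.

```python
import numpy as np

def fixed_circle_mask(shape, center, radius):
    """Boolean mask that is True for pixels inside a circle (hypothetical helper)."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    return (rows - center[0]) ** 2 + (cols - center[1]) ** 2 <= radius ** 2

# Synthetic 21x21 patch: background level 100 with a small bright spot of radius 3.
patch = np.full((21, 21), 100.0)
true_spot = fixed_circle_mask(patch.shape, (10, 10), 3)
patch[true_spot] = 1000.0

# A fixed circle of radius 6 is too large for this particular spot.
fg_mask = fixed_circle_mask(patch.shape, (10, 10), 6)
fg_mean = patch[fg_mask].mean()       # diluted by background pixels inside the circle
bg_mean = patch[~fg_mask].mean()      # everything outside the boundary
true_mean = patch[true_spot].mean()   # what an adaptive segmentation would ideally recover

print(f"fixed-circle foreground mean: {fg_mean:.1f}, true spot mean: {true_mean:.1f}")
```

An adaptive method such as SRG aims to follow the actual spot boundary instead of assuming one radius for every spot.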


Some local backgrounds

We use something different again: a smaller, less variable value.

Single channel, grey scale


Quantification of Expression

For each spot on the slide we calculate

Red intensity = Rfg - Rbg

Green intensity = Gfg - Gbg

where fg = foreground and bg = background, and combine them in the log (base 2) ratio

log2(Red intensity / Green intensity)

We now have one differential expression value for each gene on each array.
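
The calculation can be sketched in a few lines. The function name `log2_ratio`, the example intensities, and the choice to return NaN when a background-corrected intensity is not positive are assumptions for illustration; the slide does not say how such spots are handled.

```python
import numpy as np

def log2_ratio(r_fg, r_bg, g_fg, g_bg):
    """Background-correct each channel, then return log2(red / green) per spot."""
    red = np.asarray(r_fg, dtype=float) - np.asarray(r_bg, dtype=float)
    green = np.asarray(g_fg, dtype=float) - np.asarray(g_bg, dtype=float)
    # Assumption: spots with a non-positive corrected intensity get NaN,
    # because the log ratio is undefined for them.
    valid = (red > 0) & (green > 0)
    out = np.full(red.shape, np.nan)
    out[valid] = np.log2(red[valid] / green[valid])
    return out

# Four made-up spots; the last one has red foreground below its background.
m = log2_ratio(
    r_fg=[1100, 600, 150, 90],
    r_bg=[100, 100, 100, 100],
    g_fg=[600, 1100, 200, 300],
    g_bg=[100, 100, 100, 100],
)
print(m)
```

A value of 1 means the red channel is twice the green channel after background correction, and -1 means half.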


Top 2.5% of ratios red, bottom 2.5% of ratios green

The red-green ratios can be spatially biased
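
One way to check for this kind of bias is to flag the extreme 2.5% of log ratios at each end and look at where they sit on the slide. The sketch below uses simulated log ratios with a deliberate left-to-right gradient; the grid size, the gradient, and all variable names are assumptions for illustration, not data from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 40, 40

# Simulated log ratios: noise plus a left-to-right gradient standing in for spatial bias.
log_ratios = rng.normal(0.0, 0.3, (n_rows, n_cols)) + np.linspace(-0.5, 0.5, n_cols)

# Thresholds for the bottom and top 2.5% of all ratios on the array.
lo, hi = np.quantile(log_ratios, [0.025, 0.975])
top = log_ratios >= hi       # would be drawn red
bottom = log_ratios <= lo    # would be drawn green

# Without spatial bias, about half of each flagged set would fall in each half of the slide.
half = n_cols // 2
top_right_share = top[:, half:].sum() / top.sum()
bottom_left_share = bottom[:, :half].sum() / bottom.sum()
print(f"top spots in right half: {top_right_share:.2f}")
print(f"bottom spots in left half: {bottom_left_share:.2f}")
```

Shares far from one half suggest the extreme ratios reflect position on the slide as much as biology.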


Another example


Oligo Array Image Analysis

• About 100 pixels per probe cell

• These intensities are combined to form one number representing expression for the probe cell oligo
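
The slide does not say which summary is used to combine the pixels. As an illustration only, the sketch below uses the 75th percentile, a choice often described for GeneChip image processing; treat that choice, the function name `summarize_probe_cell`, and the simulated pixel values as assumptions.

```python
import numpy as np

def summarize_probe_cell(pixels, q=75):
    """Collapse the pixel intensities of one probe cell into one number (assumed: q-th percentile)."""
    return float(np.percentile(pixels, q))

rng = np.random.default_rng(1)
cell = rng.normal(500.0, 50.0, (10, 10))   # about 100 pixels per probe cell
cell[0, 0] = 50_000.0                      # one artefact pixel, e.g. a dust speck

value = summarize_probe_cell(cell)
print(f"percentile summary: {value:.1f}, plain mean: {cell.mean():.1f}")
```

A percentile is far less sensitive to a single artefact pixel than the mean, which is one reason a summary of this kind might be preferred.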


Normalization at Probe Level


Normalization at Probe Level


Dilution Experiment Data


Dilution Experiment Data


[Figure panels: PM (perfect match) and MM (mismatch)]


Default until 2002

• GeneChip® software uses Avg.diff

$$\mathrm{Avg.diff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$$

with A a set of “suitable” pairs chosen by software.

• Log ratio version is also used.

• For differential expression Avg.diffs are compared between chips.
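
A minimal sketch of the Avg.diff calculation follows. The rule the software uses to choose the set A is not given on the slide, so the caller passes A as a boolean mask; the function name `avg_diff`, the mask argument, and the probe values are assumptions for illustration.

```python
import numpy as np

def avg_diff(pm, mm, suitable=None):
    """Mean of PM - MM over the probe pairs in A (all pairs if no mask is given)."""
    diffs = np.asarray(pm, dtype=float) - np.asarray(mm, dtype=float)
    if suitable is None:
        suitable = np.ones(diffs.shape, dtype=bool)
    suitable = np.asarray(suitable, dtype=bool)
    return float(diffs[suitable].mean())

# Made-up probe set with four PM/MM pairs; in the last pair MM exceeds PM.
pm = [1200, 900, 1500, 800]
mm = [400, 300, 500, 2000]

all_pairs = avg_diff(pm, mm)                              # (800 + 600 + 1000 - 1200) / 4
subset = avg_diff(pm, mm, [True, True, True, False])      # (800 + 600 + 1000) / 3
print(all_pairs, subset)
```

The example shows how strongly the result depends on which pairs are judged suitable: one aberrant pair moves the summary from 800 to 300.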


What is the evidence? Lockhart et al. Nature Biotechnology 14 (1996)


Two case studies


Spike-In Experiments

• Add concentrations (0.5 pM - 100 pM) of 11 foreign species cRNAs to hybridization mixture

• Set A: 11 control cRNAs were spiked in, all at the same concentration, which varied across chips.

• Set B: 11 control cRNAs were spiked in, all at different concentrations, which varied across chips. The concentrations were arranged in a 12x12 cyclic Latin square (with 3 replicates).
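
For readers unfamiliar with the design, a cyclic Latin square can be built by shifting one list of levels by one position per row. The sketch below uses the 12 distinct concentration values that appear in the Spike-In B table; whether these are exactly the design levels, and how the 11 cRNAs map onto the 12 columns, is not stated on the slides, so both are assumptions here.

```python
# Assumed levels (pM): the 12 distinct values seen in the Spike-In B table.
levels = [0.5, 1.0, 1.5, 2.0, 3.0, 5.0, 12.5, 25.0, 35.7, 50.0, 75.0, 100.0]

def cyclic_latin_square(levels):
    """Row i is the list of levels rotated left by i positions."""
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]

square = cyclic_latin_square(levels)

# Each row could be one hybridization mixture and each column one spiked-in transcript.
for row in square[:3]:
    print(row)
```

In such a square every level appears exactly once in each row and once in each column, so each transcript is observed at every concentration.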


Set A: Probe Level Data (12 chips)


Spike-In B

Probe Set   Conc 1 (pM)   Conc 2 (pM)   Rank
BioB-5      100           0.5           1
BioB-3      0.5           25.0          2
BioC-5      2.0           75.0          3
BioB-M      1.0           35.7          4
BioDn-3     1.5           50.0          5
DapX-3      35.7          3.0           6
CreX-3      50.0          5.0           7
CreX-5      12.5          2.0           8
BioC-3      25.0          100           9
DapX-5      5.0           1.5           10
DapX-M      3.0           1.0           11
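
The Rank column appears to order the probe sets by the size of their true fold change between the two concentrations, largest first. That reading is an inference from the numbers rather than something stated on the slide; the sketch below checks it by ranking on the absolute log2 fold change.

```python
import math

# (Conc 1, Conc 2) in pM, copied from the Spike-In B table.
conc = {
    "BioB-5": (100, 0.5),
    "BioB-3": (0.5, 25.0),
    "BioC-5": (2.0, 75.0),
    "BioB-M": (1.0, 35.7),
    "BioDn-3": (1.5, 50.0),
    "DapX-3": (35.7, 3.0),
    "CreX-3": (50.0, 5.0),
    "CreX-5": (12.5, 2.0),
    "BioC-3": (25.0, 100),
    "DapX-5": (5.0, 1.5),
    "DapX-M": (3.0, 1.0),
}

# Absolute log2 fold change treats a 4-fold increase and a 4-fold decrease alike.
abs_lfc = {name: abs(math.log2(c2 / c1)) for name, (c1, c2) in conc.items()}
ranked = sorted(conc, key=lambda name: abs_lfc[name], reverse=True)

for rank, name in enumerate(ranked, start=1):
    print(rank, name, round(abs_lfc[name], 2))
```

Under this reading, an expression measure that ranks genes well should place these 11 probe sets near the top of its own list, in roughly this order.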

Later we consider 23 different combinations of concentrations


Observed Ranks

Gene      AvDiff   MAS 5.0   Li&Wong   AvLog(PM-BG)
BioB-5    6        2         77        1
BioB-3    16       1         33        2
BioC-5    74       6         22        5
BioB-M    30       3         6         3
BioDn-3   44       5         27        4
DapX-3    239      24        796       7
CreX-3    333      73        386       11
CreX-5    3276     33        43        9
BioC-3    2709     8572      12        10300
DapX-5    2709     102       59        17
DapX-M    165      19        30        6


[Figure, panels A and B. Labels: kanR, UPTAG, DOWNTAG; circular pRS416 and EcoRI linearized pRS416 (MCS, CEN/ARS, URA3, ttaaaatt); transformation into deletion pool; select for Ura+ transformants; genomic DNA preparation; PCR; Cy5 labeled PCR products and Cy3 labeled PCR products; oligonucleotide array hybridization; NHEJ defective.]


[Figure: two panels, each labeled YKU70, NEJ1, YKU80]


Average Red and Green Scatter Plot


Average Red and Green MVA plot
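
An MVA plot shows M, the log ratio of the two channels, against A, their average log intensity. The slide shows only the plot, so the sketch below is an assumption about the usual construction, with a hypothetical function name `mva` and made-up intensities.

```python
import numpy as np

def mva(red, green):
    """Return (M, A): M = log2(R) - log2(G), A = (log2(R) + log2(G)) / 2."""
    log_r = np.log2(np.asarray(red, dtype=float))
    log_g = np.log2(np.asarray(green, dtype=float))
    return log_r - log_g, (log_r + log_g) / 2

# Three made-up spots: 4-fold up in red, unchanged, and 4-fold down in red.
m, a = mva(red=[1024, 256, 64], green=[256, 256, 256])
print(m, a)
```

Plotting M against A amounts to rotating the red versus green scatter plot by 45 degrees, which makes intensity-dependent departures from M = 0 easier to see.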


Histograms


QQ-Plot


Z-Scores


Average Red and Green MVA Plot


Average Red and Green Scatter Plot


Summary

• Simple data exploration is a useful tool for quality assessment

• Statistical thinking is helpful for interpretation

• Statistical models may help find signals in noise


Acknowledgements

UC Berkeley Stat: Ben Bolstad, Sandrine Dudoit, Terry Speed, Jean Yang

MBG (SOM): Jef Boeke, Siew-Loon Ooi, Marina Lee, Forrest Spencer

Biostatistics: Karl Broman, Leslie Cope, Carlo Coulantoni, Giovanni Parmigiani, Scott Zeger

Gene Logic: Francois Colin, Uwe Scherf’s Group

PGA: Tom Cappola, Skip Garcia, Joshua Hare

WEHI: Bridget Hobbs, Natalie Thorne