Darlene Goldstein A Comparison of Microarray Platforms NUS – IMS Workshop 7 January 2004

Darlene Goldstein

A Comparison of Microarray Platforms

NUS – IMS Workshop, 7 January 2004

Talk Outline

• Bioinformatics Core Facility at ISREC

• Purpose of study

• Platform technologies and study design

• Comparisons between platforms

• Conclusions and study completion

BCF: What is it?

• ISREC-based, supported by the NCCR for molecular oncology, member group of the SIB

• Created by the NCCR molecular oncology to assist its DAF (which is now absorbed into the DAFL) and its microarray users in their biomedical research

• A group devoted to the bioinformatics and statistical aspects of gene expression research, in particular to the analysis of data generated with microarray technologies

[Diagram: the BCF shown in relation to the DAF / DAFL & BIOINF, NCCR biomedical microarray research, NCCR biomedical bioinf. research, and EPFL biostatistics]

BCF: Main Components

• Technical Support

– advice in experimental design and data analysis

– production, control, development of spotted arrays

– processing of microarray data, quality assessment

• Education

– practical training through classes / workshops

• Collaboration

– statistical data analysis of research projects

• Research & Development

– development / testing tools & methods

Platform Comparison Study

• Purpose

– to assess accuracy and reproducibility of different gene expression platforms

– to compare features of different measurement types

– to understand the system (important for normalization and downstream analysis)

• Impact

– practical advice to DAF(L) and to NCCR microarray users

– benefit to the wider scientific community, especially if results can somehow be combined across array types

Platforms and Study Design

• Platforms

– Affymetrix GeneChips, high-density short oligo arrays

– Agilent long oligo arrays

– in-house spotted cDNA arrays

– MPSS (massively parallel signature sequencing, a digital gene expression technology patented by Lynx); in collaboration with the Ludwig Institute for Cancer Research; originally intended as ‘gold standard’

• Basic Design

– 3 replicate measurements for two mRNAs (human placenta and testis)

– dye swap for two-color systems (Agilent, cDNA)

– 2 to 3 million tags sequenced for MPSS

Methods

• Experimental Method (as recommended by ‘specialists’):

– Affymetrix: Biozentrum Basel

– Agilent: Institut Gustave Roussy, Paris

– Spotted cDNA arrays: Otto Hagenbuechle's group (DAF, now DAFL)

– MPSS: Lynx (California), Victor Jongeneel's group (LICR)

– qRT-PCR follow-up (~250 genes): Robert Lyle, Patrick Descombes (UniGE)

• Expression Quantification: as recommended by ‘specialists’ (above), but RMA for Affymetrix

Spotted cDNA arrays

Human 10k Array 8x4 subarrays

Affymetrix GeneChips

Image of hybridized array

MPSS

• Uses microbeads with ~100k identical DNA molecules attached

• Captures and identifies transcript sequences of expressed genes by counting the number of individual mRNA molecules representing each gene

• Individual mRNAs are identified through a generated 17- to 20-base signature sequence

• Can be used without organism sequence information

• ‘MPSS can accurately quantify transcripts as low as 5 transcripts per million (tpm) to above 50,000 tpm’ (information from Lynx web site)

Other comparison studies (I)

• Yuen et al. 2002; Nuc. Acids Res. 30(10):e48

– Affy MGU-74A, cDNA; cell lines; qRT-PCR 47 genes

– both arrays sensitive (TP) and specific (TN) at identifying regulated transcripts

– found comparable rank-order of gene regulation, but only modest correlation in fold-change

– both array types biased downwards (FC under-estimated compared to qRT-PCR)

• Evans et al. 2002; Eur. J. Neuroscience 16:409-413

– Affy RG-U34A, SAGE to detect brain transcripts; 43 rat hippocampi; evaluation based on 1000 transcripts

– ~55% low, ~90% high abundance transcripts detected

Other comparison studies (II)

• Li et al. 2002; Toxicological Sciences 69:383-390

– Affy HuGene FL, HGU-95Av2, IncyteGenomics UniGemV 2.0 (‘long cDNA’); drug-treated cell lines at 8h and 24h; qRT-PCR 9 genes

– cross-hyb contributed to platform discrepancies

– found Affy ‘more reliable’ (sensitive)

• Kuo et al. 2002; Bioinformatics 18:405-412

– Affy HU6800, cDNA, publicly available data on NCI 60; 2895 genes

– found low correlation between measurements (but no control over lab procedures: different groups had performed the original studies)

Other comparison studies (III)

• Barczak et al. 2003; Genome Res. 13:1775-1785

– 2 versions of spotted long oligo (Operon), Affy HGU-95Av2; cell lines; 7344 genes

– this large-scale analysis found strong correlations between relative expression measurements

– similar results for amplified and unamplified targets

• Tan et al. 2003; Nuc. Acids Res. 31:5676-5684

– Agilent Human 1, Affy HGU-95Av2, Amersham Codelink UniSet Human I (30-mers); cell lines in serum-rich medium and 24h after serum removal; 2009 genes

– modest correlations

– little overlap in genes called DE

– best agreement on DE calls (varying criteria) only 21%

• comparison studies by other groups world-wide are also in progress

Comparison Principle

• Cross-platform gene matching done through the trome database of transcripts (constructed with the Transcriptome Analyzer program tromer)

• Use only those genes we classify as ‘reliably mapped’ between platforms (~2500 genes); we have not (yet) looked at probe(set)s that could not be well-mapped to known transcripts

• ‘Peak technical performance’ : this is a case study, not a systematic study; does not take into account normal user variation, other mRNAs, etc.

• Comparison based on M (log ratio) and A (average log intensity)

• Unfortunately, accuracy cannot be properly assessed, as true M values are not known
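The M and A quantities used for the comparison can be sketched as follows. This is a minimal illustration, assuming base-2 logarithms and two positive intensities per gene (the two channels of a two-color array, or the two samples on single-channel arrays); the function name `ma_values` and the example intensities are hypothetical and not taken from the study.

```python
import math

def ma_values(sample1, sample2):
    """Return (M, A) for one gene from two positive intensities.

    M = log2(sample1 / sample2)                    (log ratio)
    A = 0.5 * (log2(sample1) + log2(sample2))      (average log intensity)
    """
    log1 = math.log2(sample1)
    log2_ = math.log2(sample2)
    return log1 - log2_, 0.5 * (log1 + log2_)

# Hypothetical intensities for one gene (e.g. placenta vs testis).
m, a = ma_values(4096.0, 1024.0)
print(m, a)  # 2.0 11.0
```

Plotting M against A for all genes gives the MA plots used in the slides that follow.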

cDNA array Performance

MA plots (examples)

[Figure: example MA plots for Affy U133A, NCCR h10kd and Agilent, with annotations marking ‘range’ and ‘background’]

|M| (putative effect) densities

(Difference in M) vs. A: reproducibility

[Figure: one panel each for Affy U133A, Agilent h1A and NCCR h10k; y = difference in M, x = average A]

D (error) densities
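As a rough sketch of the reproducibility measure, D can be read as the per-gene difference in M between two replicate measurements on the same platform. The code below assumes that reading; the summary statistic (median absolute D), the function names and the numbers are illustrative choices, not the study's. For a dye-swap pair, one replicate's M would presumably have its sign flipped before differencing.

```python
from statistics import median

def replicate_differences(m_rep1, m_rep2):
    """Per-gene difference D in M between two replicate measurements."""
    if len(m_rep1) != len(m_rep2):
        raise ValueError("replicates must cover the same genes")
    return [x - y for x, y in zip(m_rep1, m_rep2)]

def median_abs(values):
    """Median absolute value, a robust summary of the spread of D."""
    return median(abs(v) for v in values)

# Hypothetical M values for four genes on two replicate arrays.
rep1 = [2.0, -1.0, 0.5, 3.0]
rep2 = [1.5, -1.0, 1.0, 2.0]
d = replicate_differences(rep1, rep2)
print(d)              # [0.5, 0.0, -0.5, 1.0]
print(median_abs(d))  # 0.5
```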

Gene Matching

[Venn diagram: genes matched across Agilent, Affy and NCCR; region counts as extracted, in order: 3365, 549, 435, 2514, 5797, 5099, 5977]

Platform      Probe(set)s  Genes
Agilent h1A         18325  15688
Affy U133A          24808  14876
NCCR h10k            7812   6853

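Gene matching also with MPSS

[Venn diagram: Agilent, Affy and NCCR restricted to genes also matched with MPSS; 2494 in the central region, all other regions shown as dashes]

2494 Tromer clusters
4060 Affy probesets
2869 Agilent probes
2685 NCCR clones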

Concordance in M density plots (I)

[Figure: density plots with panels for Agilent, Affy and NCCR]

Concordance in M density plots (II)

[Figure: density plots with panels for Agilent, Affy and NCCR]

Difficulty in comparing to MPSS ratios

ID                                     MPSS TESTIS  MPSS PLAC  MPSS M  AFFY M ave  AGIL M ave  NCCR M ave  genedescr

Selection on: Affy up
HTR005199_MPSS_1_AGIL_1_AFFY_1_NCCR_1            0          9    3.32        6.38        3.91        7.12  Fc fragment of IgE, high affinity I, receptor for; alpha polypeptide
HTR000581_MPSS_1_AGIL_1_AFFY_1_NCCR_1           14       5117    8.41        6.45        4.07        5.91  cytochrome P450, family 19, subfamily A, polypeptide 1
HTR000581_MPSS_1_AGIL_2_AFFY_1_NCCR_1           14       5117    8.41        6.45        1.34        5.91  cytochrome P450, family 19, subfamily A, polypeptide 1
HTR004581_MPSS_1_AGIL_1_AFFY_1_NCCR_1            1       3635   10.83        6.89        4.12        4.84  pregnancy-associated plasma protein A
HTR006015_MPSS_1_AGIL_1_AFFY_1_NCCR_1            9       9632    9.91        6.99        4.20        6.60  glycoprotein hormones, alpha polypeptide
HTR000790_MPSS_1_AGIL_1_AFFY_2_NCCR_1            0        280    8.13        7.94        2.20        4.38  hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 1
HTR000790_MPSS_1_AGIL_1_AFFY_2_NCCR_2            0        280    8.13        7.94        2.20        5.69  hydroxy-delta-5-steroid dehydrogenase, 3 beta- and steroid delta-isomerase 1

Selection on: Agil up
HTR010250_MPSS_1_AGIL_1_AFFY_1_NCCR_1            4        211    5.41        5.87        4.56        6.15  Homo sapiens adrenomedullin (ADM), mRNA.
HTR004842_MPSS_1_AGIL_1_AFFY_1_NCCR_1            6       1328    7.57        5.91        4.98        5.36  glypican 3
HTR004414_MPSS_1_AGIL_1_AFFY_1_NCCR_1            0          6    2.81        1.58        5.15        4.56  estrogen-related receptor gamma
HTR004414_MPSS_1_AGIL_1_AFFY_2_NCCR_1            0          6    2.81        4.02        5.15        4.56  estrogen-related receptor gamma
HTR002717_MPSS_1_AGIL_1_AFFY_1_NCCR_1            0        828    9.70        6.06        6.17        7.04  insulin-like growth factor binding protein 1
HTR002717_MPSS_1_AGIL_1_AFFY_1_NCCR_1            0        828    9.70        6.06        6.17        7.04  insulin-like growth factor binding protein 1

Selection on: NCCR up
HTR010250_MPSS_1_AGIL_1_AFFY_1_NCCR_1            4        211    5.41        5.87        4.56        6.15  Homo sapiens adrenomedullin (ADM), mRNA.
HTR003344_MPSS_1_AGIL_1_AFFY_1_NCCR_1            0        726    9.51        0.02        1.75        6.23  placental growth factor, vascular endothelial growth factor-related protein
HTR003344_MPSS_1_AGIL_1_AFFY_2_NCCR_1            0        726    9.51        5.01        1.75        6.23  placental growth factor, vascular endothelial growth factor-related protein
HTR006015_MPSS_1_AGIL_1_AFFY_1_NCCR_1            9       9632    9.91        6.99        4.20        6.60  glycoprotein hormones, alpha polypeptide
HTR002717_MPSS_1_AGIL_1_AFFY_1_NCCR_1            0        828    9.70        6.06        6.17        7.04  insulin-like growth factor binding protein 1
HTR005199_MPSS_1_AGIL_1_AFFY_1_NCCR_1            0          9    3.32        6.38        3.91        7.12  Fc fragment of IgE, high affinity I, receptor for; alpha polypeptide
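The slide does not say how the MPSS M column is derived from the tag counts. The tabulated values are, however, consistent with a base-2 log ratio of placenta to testis counts after adding a pseudo-count of 1 to each. The sketch below checks that reading against a few rows of the table above; the formula is an inference from the numbers, not a documented method, and `mpss_m` is a hypothetical name.

```python
import math

def mpss_m(testis_count, placenta_count, pseudo=1.0):
    """log2 ratio placenta/testis with a pseudo-count added to both counts.

    The pseudo-count of 1 is an assumption inferred from the tabulated values.
    """
    return math.log2((placenta_count + pseudo) / (testis_count + pseudo))

# (testis, placenta, tabulated MPSS M) taken from rows of the table above.
rows = [(0, 9, 3.32), (14, 5117, 8.41), (1, 3635, 10.83), (0, 828, 9.70)]
for testis, plac, tabulated in rows:
    print(testis, plac, tabulated, round(mpss_m(testis, plac), 2))
```

With zero or near-zero counts in one tissue, the ratio is driven almost entirely by the pseudo-count, which is one way to see why MPSS ratios for low-abundance tags are hard to compare with array M values.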

MPSS difficulties, another illustration

Correlations

First quartile (25% least frequent RNAs)

            MPSS   AGILENT 1  AGILENT 2  AFFY   NCCR 1  NCCR 2
MPSS         1.00       0.43      -0.45   0.44    0.47   -0.47
AGILENT 1    0.43       1.00      -0.97   0.65    0.72   -0.73
AGILENT 2   -0.45      -0.97       1.00  -0.66   -0.73    0.73
AFFY         0.44       0.65      -0.66   1.00    0.72   -0.73
NCCR 1       0.47       0.72      -0.73   0.72    1.00   -0.98
NCCR 2      -0.47      -0.73       0.73  -0.73   -0.98    1.00

Fourth quartile (25% most frequent RNAs)

            MPSS   AGILENT 1  AGILENT 2  AFFY   NCCR 1  NCCR 2
MPSS         1.00       0.72      -0.73   0.73    0.76   -0.76
AGILENT 1    0.72       1.00      -0.98   0.77    0.79   -0.79
AGILENT 2   -0.73      -0.98       1.00  -0.77   -0.79    0.80
AFFY         0.73       0.77      -0.77   1.00    0.81   -0.81
NCCR 1       0.76       0.79      -0.79   0.81    1.00   -0.98
NCCR 2      -0.76      -0.79       0.80  -0.81   -0.98    1.00
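The negative entries for AGILENT 2 and NCCR 2 are what one would expect if those are the dye-swapped hybridizations: exchanging the labels flips the sign of M, so the correlation with any other measurement changes sign but not magnitude. The following sketch illustrates this with a plain Pearson correlation; it assumes that interpretation, and all data are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical M values for five genes.
forward = [2.0, -1.0, 0.5, 3.0, -2.0]
# Dye-swapped hybridization: labels exchanged, so M flips sign
# (small made-up deviations stand in for measurement noise).
swapped = [-1.8, 1.1, -0.4, -3.2, 2.1]

r = pearson(forward, swapped)
r_flipped = pearson(forward, [-m for m in swapped])
print(round(r, 2), round(r_flipped, 2))
```

Under this reading, only the absolute values in the matrices matter when comparing platforms.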

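Agreement: top up 200 (placenta)

[Venn diagram of the top 200 up genes per platform (Agilent, Affy, NCCR): 96 in all three; 26, 34 and 30 shared by exactly two platforms; 44, 48 and 40 on one platform only. The extracted counts do not show which region belongs to which platform.]

M range
Affy: 1.66 to 7.94
Agil: 1.48 to 6.17
NCCR: 1.83 to 7.12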

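Agreement: top down 200 (testis)

[Venn diagram of the top 200 down genes per platform (Agilent, Affy, NCCR): 87 in all three; 26, 34 and 41 shared by exactly two platforms; 53, 46 and 38 on one platform only. The extracted counts do not show which region belongs to which platform.]

M range
Affy: -8.27 to -1.65
Agil: -6.07 to -1.47
NCCR: -6.18 to -1.79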
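The agreement counts on the two "top 200" slides are the region sizes of a three-set Venn diagram. A minimal sketch of how such counts could be computed is given below; the helper names, gene identifiers and M values are hypothetical, and the example uses the top 3 of 6 genes rather than the top 200.

```python
def venn3_counts(a, b, c):
    """Sizes of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "all_three": len(a & b & c),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
    }

def top_n(m_by_gene, n, up=True):
    """Genes with the n largest (up) or n smallest (down) M values."""
    ranked = sorted(m_by_gene, key=m_by_gene.get, reverse=up)
    return set(ranked[:n])

# Hypothetical M values for six genes on three platforms.
affy = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 0.1, "g5": -1.0, "g6": -2.0}
agil = {"g1": 4.5, "g2": 0.2, "g3": 3.5, "g4": 3.9, "g5": -0.5, "g6": -1.5}
nccr = {"g1": 6.0, "g2": 3.0, "g3": 0.0, "g4": 0.3, "g5": 4.0, "g6": -1.0}

counts = venn3_counts(top_n(affy, 3), top_n(agil, 3), top_n(nccr, 3))
print(counts)
```

For each platform, the four regions it takes part in must add up to the list size (200 on the slides), which gives a quick consistency check on the diagram.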

Comparison with MPSS, 99% CI (up)

Comparison with MPSS, 99% CI (down)

MPSS CI Overlap

Overlap with the 99% CI for MPSS

Affy 31.23%
Agilent 28.62%
NCCR 30.52%

Overlap with the 99.9% CI for MPSS

Affy 41.53%
Agilent 38.90%
NCCR 40.52%

Overlap with MPSS

[Venn diagram, MPSS vs NCCR: 88 in common; 112 NCCR only; 74 MPSS only; a further 38 MPSS genes missing or classified as unreliably mapped (tag to gene not unique)]

(similar numbers also for Affy and Agilent); 56 of the 88 are in common to all 4

Conclusions (I)

• The three microarray platforms compared performed very similarly in terms of which genes are detected as differentially expressed, distributions of M values, variability between replicate measurements ...

... so similarly that it seems hard to find real differences

• Most disagreement for low-expressed genes

• RMA M values (Affy) are better variance-stabilized, but reproducibility is good for all platforms except for weak signals in Agilent (likely due to bg treatment)

• RMA M values are more strongly compressed towards zero at low intensity; reduces false positive calls but might make DE at low intensity undetectable (but is it detectable at all?)

Conclusions (II)

Microarrays vs MPSS

M values, quantitative comparison:

the disagreement is large ...

... so large that it is hard to reconcile the values, making it impossible to use MPSS as the ‘gold standard’

M values, qualitative comparison:

there is a good degree of agreement, approximately the same for all three microarray platforms

Conclusions (III)

• MPSS predicts many more low-abundance genes to be (strongly) differentially expressed

• The hybridization methods lose signal of low-abundance genes (due to the background fluorescence estimation?)

• Microarrays miss detection of most of the differential expression of low-abundance transcripts, but it is also possible that MPSS is biased for many genes or less precise than this approach suggests

Approach with confidence intervals for MPSS (currently an approximate CI that takes into consideration the sampling error on the counts; we have no replicated measurements for MPSS)
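The slides do not specify how the approximate CI was constructed. One common way to account for sampling error on counts is to treat each tag count as Poisson and apply the delta method to the log ratio; the sketch below does that, purely as an illustration. The Poisson assumption, the pseudo-count of 1 and the function names are assumptions of this sketch, not the study's documented method.

```python
import math
from statistics import NormalDist

def mpss_log2_ratio_ci(plac, testis, level=0.99, pseudo=1.0):
    """Approximate CI for log2((plac + pseudo) / (testis + pseudo)).

    Assumes Poisson counts, so Var(ln count) is roughly 1 / count
    (delta method); this is an illustrative choice only.
    """
    a = plac + pseudo
    b = testis + pseudo
    m = math.log2(a / b)
    se_ln = math.sqrt(1.0 / a + 1.0 / b)         # standard error on the ln scale
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    half = z * se_ln / math.log(2)               # convert ln scale to log2 scale
    return m - half, m + half

def within(m_array, ci):
    """True if a microarray M value falls inside the MPSS interval."""
    return ci[0] <= m_array <= ci[1]

# Counts and array M values taken from the MPSS ratio table earlier in the talk.
ci_igfbp1 = mpss_log2_ratio_ci(828, 0)      # IGFBP1: testis 0, placenta 828
ci_cyp19 = mpss_log2_ratio_ci(5117, 14)     # CYP19A1: testis 14, placenta 5117
print(within(7.04, ci_igfbp1), within(6.45, ci_cyp19))
```

Intervals of this kind become very wide when one of the counts is near zero, so overlap with the array M values is easiest to achieve exactly where MPSS is least informative.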

Completion of Study

Choose genes for qRT-PCR for which the platforms and MPSS disagree and (attempt to) address the questions:

• which platform is more accurate?

• how does accuracy depend on the signal intensity?

• do the microarrays miss DE frequently ...?

• ... and especially at weak signal intensity?

• which platform best detects low abundance RNAs?

• does MPSS agree with qRT-PCR?

• Suggestions are welcome!

Acknowledgements

• Ludwig Institute for Cancer Research: Victor Jongeneel, Christian Iseli, Brian Stephenson

• DAF/DAFL: Otto Hagenbuechle, Josiane Wyniger

• UniGE: Robert Lyle, Patrick Descombes

• BCF: Mauro Delorenzi, Eugenia Migliavacca

• and everyone I inadvertently left out!