affydecomp: towards a benchmark for differential expression methods

AffyDEComp: towards a benchmark for differential expression methods Richard Pearson School of Computer Science University of Manchester


Post on 19-Jan-2016



TRANSCRIPT

Page 1: AffyDEComp: towards a benchmark for differential expression methods

AffyDEComp: towards a benchmark for differential

expression methods

Richard Pearson

School of Computer Science

University of Manchester

Page 2: AffyDEComp: towards a benchmark for differential expression methods

Overview

Why benchmark DE methods?

The Golden Spike data set

AffyDEComp

Conclusions

Recommendations

Page 3: AffyDEComp: towards a benchmark for differential expression methods

The need for benchmarks

Microarray analysis has many stages

Competing methods at each stage

Methodologists good at showing superiority

Results can appear contradictory

Confused end users: choice driven by…

What they are familiar with

What colleagues use

What was used in their favourite paper

…and not by a scientific comparison

Page 4: AffyDEComp: towards a benchmark for differential expression methods

Benchmarking requirements

Methods: a set we wish to compare

Benchmark data: where truth is known

Metrics: by which to compare methods

Affycomp

Methods: summarisation methods

Benchmark data: various spike-in studies

Metrics: various, including, e.g., area under ROC curve for a fold change classifier
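To make the "area under ROC curve for a fold change classifier" metric concrete, here is a minimal sketch in plain Python. It is not Affycomp's own code: the function name `roc_auc` and the six log fold changes are invented for illustration, and the classifier is assumed to rank probesets by absolute log fold change.

```python
def roc_auc(scores, is_positive):
    """Area under the ROC curve, computed with the trapezoid rule.

    scores: larger means "more likely differentially expressed".
    is_positive: True for known true positives, False for true negatives.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    # Walk through the probesets from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    auc = 0.0
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    i = 0
    while i < len(order):
        # Probesets with tied scores are added together as one step.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if is_positive[order[j]]:
                tp += 1
            else:
                fp += 1
            j += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auc


# Hypothetical log2 fold changes for six probesets (illustrative values only).
log_fc = [2.1, 1.4, 0.2, -0.1, 1.0, 0.3]
truth = [True, True, False, False, True, False]

# Fold change classifier: rank probesets by absolute log fold change.
auc = roc_auc([abs(x) for x in log_fc], truth)
print(f"AUC = {auc:.3f}")
```

An AUC near 1 means the true positives are ranked ahead of the true negatives, and 0.5 corresponds to a random ranking.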

Affycomp doesn’t compare DE methods

Page 5: AffyDEComp: towards a benchmark for differential expression methods

A benchmark for DE methods

Methods:

DE methods depend on summarisation

Compare summarisation/DE combinations

Benchmark data:

Affycomp spike-ins have few DE genes

Golden spike data has many DE genes, but also a few “issues”!

Metrics:

Based around areas under ROC curves

Page 6: AffyDEComp: towards a benchmark for differential expression methods

The Golden Spike data

3 “sample”, 3 “control” arrays

Many RNAs “spiked-in” at known levels

“DE”, “Equal” and “Empty” probesets.

Controversial data set

Non-uniform null p-value distributions - use ROC

Spike-in concentrations high - unrepresentative

“DE” spike-ins all up-regulated - unrepresentative

Concentrations and FC confounded - loess

Different FC between “Equal” and “Empty”

Page 7: AffyDEComp: towards a benchmark for differential expression methods

“Empty” has higher FC than “Equal”

Most analyses have treated both Empty and Equal as True Negatives - to what effect?

Page 8: AffyDEComp: towards a benchmark for differential expression methods

“Empty” has higher FC than “Equal”

To illustrate how analysis choices affect results, I’ll treat Empty and Equal as true negatives (TN) and DE <= 1.2 as true positives (TP)

Page 9: AffyDEComp: towards a benchmark for differential expression methods

2-sided test

Large apparent difference between methods

Can you guess which paper used this chart?

Page 10: AffyDEComp: towards a benchmark for differential expression methods

2-sided test

Large apparent difference between methods

Are TP correctly identified as up-regulated?

Page 11: AffyDEComp: towards a benchmark for differential expression methods

1-sided test of up-regulation

Probesets identified as up-regulated not TP

Page 12: AffyDEComp: towards a benchmark for differential expression methods

1-sided test of down-regulation

DE probesets are mostly being identified as down-regulated, despite the fact that they are in truth up-regulated

We appear to be identifying TP as down-regulated

Page 13: AffyDEComp: towards a benchmark for differential expression methods

DE <=1.2 lower than Empty

TP are identified as down-regulated because most TN are “Empty” which have higher FC than DE <= 1.2
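The effect can be reproduced on a toy example. The sketch below is an assumption-laden illustration, not the Golden Spike data: all test statistics are invented to mimic the pattern described (Empty probesets with higher apparent fold change than the low-FC DE probesets), "DE <= 1.2" is read as DE probesets with FC <= 1.2, and `pairwise_auc` is a hypothetical helper name.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This equals the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented signed test statistics (positive = apparent up-regulation).
de_low_fc = [0.5, 0.8, 0.6, 0.9]    # TP: DE probesets with FC <= 1.2
empty = [1.5, 1.2, 1.8, 1.1, 1.6]   # "Empty": apparent FC above the TP
equal = [0.1, -0.2, 0.0]            # "Equal": centred near zero

tn_with_empty = empty + equal

# 1-sided test of up-regulation: a larger statistic is stronger evidence.
auc_up = pairwise_auc(de_low_fc, tn_with_empty)

# 1-sided test of down-regulation: negate so smaller statistics rank first.
auc_down = pairwise_auc([-s for s in de_low_fc], [-s for s in tn_with_empty])

# Same up-regulation test, but with only "Equal" probesets as TN.
auc_up_equal_only = pairwise_auc(de_low_fc, equal)

print(auc_up, auc_down, auc_up_equal_only)
```

With Empty included in TN, the up-regulation AUC falls below 0.5 and the down-regulation AUC rises above it, even though every TP is up-regulated relative to Equal.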

Page 14: AffyDEComp: towards a benchmark for differential expression methods

Remove “empty” probesets

We can remedy this by using just Equal probesets as our TN…

…bearing in mind that this makes the data somewhat atypical

Page 15: AffyDEComp: towards a benchmark for differential expression methods

Up-regulation - Empty in TN

Probesets identified as up-regulated generally not TP when using Empty in TN

Page 16: AffyDEComp: towards a benchmark for differential expression methods

Up-regulation - TN Equal

Probesets identified as up-regulated more likely to be TP when using only Equal as TN

Page 17: AffyDEComp: towards a benchmark for differential expression methods

Down-regulation - Empty in TN

DE probesets are mostly being identified as down-regulated, despite the fact that they are in truth up-regulated

We appear to be identifying TP as down-regulated when including Empty in TN

Page 18: AffyDEComp: towards a benchmark for differential expression methods

Down-regulation - TN Equal

We generally don’t identify TP as down-regulated when excluding Empty in TN

Page 19: AffyDEComp: towards a benchmark for differential expression methods

“Recommended” test

We recommend using just Equal as TN, and all DE as TP
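As a rough sketch of how this recommendation translates into truth labels, the hypothetical helper below maps probeset categories to TP/TN and leaves Empty probesets out unless asked otherwise. The function name, its parameter, and the category strings are assumptions made for illustration and are not taken from AffyDEComp.

```python
def build_labels(categories, include_empty_in_tn=False):
    """Map probeset categories to truth labels for ROC analysis.

    Returns (index, is_true_positive) pairs. Probesets excluded from the
    comparison ("Empty" under the recommended choice) are skipped.
    """
    labels = []
    for i, cat in enumerate(categories):
        if cat == "DE":
            labels.append((i, True))       # all DE probesets are TP
        elif cat == "Equal":
            labels.append((i, False))      # Equal probesets are always TN
        elif cat == "Empty":
            if include_empty_in_tn:
                labels.append((i, False))  # non-recommended alternative
        else:
            raise ValueError(f"unknown category: {cat!r}")
    return labels


# Hypothetical category annotation for five probesets.
cats = ["DE", "Empty", "Equal", "DE", "Empty"]

recommended = build_labels(cats)
with_empty = build_labels(cats, include_empty_in_tn=True)
print(recommended)
print(with_empty)
```

Keeping the Empty choice as an explicit flag makes it easy to rerun a comparison both ways and see how much a result depends on it.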

Page 20: AffyDEComp: towards a benchmark for differential expression methods

Recommended Up-reg

Using our recommendations, tests of up-regulation generally find TP, as expected

Page 21: AffyDEComp: towards a benchmark for differential expression methods

Recommended Down-reg

Using our recommendations, tests of down-regulation generally don’t find TP, as expected

Page 22: AffyDEComp: towards a benchmark for differential expression methods

Analysis decisions to make

Summarisation method

DE method

Direction of DE (recommend up)

Choice of true negatives (equal only)

Choice of true positives (all DE)

Post-summarisation normalisation (loess using equal only)

Type of ROC chart (standard ROC)

Proportion of x-axis to display (all)
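One way to keep these decisions explicit in an analysis script is to record them in a single settings object whose defaults follow the recommendations listed above. This is only a sketch: the class name, field names and string values are invented here, and the two method names in the usage line are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisChoices:
    """One benchmark run; defaults follow the recommended choices."""
    summarisation_method: str                # no default: this is compared
    de_method: str                           # no default: this is compared
    direction: str = "up"                    # direction of DE
    true_negatives: str = "equal"            # Equal probesets only
    true_positives: str = "all_de"           # all DE probesets
    post_normalisation: str = "loess_equal"  # loess fitted on Equal only
    roc_type: str = "standard"               # standard ROC chart
    x_axis_proportion: float = 1.0           # show the whole x-axis

    def __post_init__(self):
        if self.direction not in ("up", "down", "two-sided"):
            raise ValueError(f"unknown direction: {self.direction!r}")
        if not 0.0 < self.x_axis_proportion <= 1.0:
            raise ValueError("x_axis_proportion must be in (0, 1]")


# Placeholder method names; only the two compared factors need to be given.
choices = AnalysisChoices(summarisation_method="method_A", de_method="method_B")
print(choices)
```

Making the object immutable and validating it on creation means each reported AUC can be traced back to one unambiguous set of choices.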

Page 23: AffyDEComp: towards a benchmark for differential expression methods

AffyDEComp - charts

Page 24: AffyDEComp: towards a benchmark for differential expression methods

AffyDEComp - comparison

Page 25: AffyDEComp: towards a benchmark for differential expression methods

AUCs - recommended choices

Page 26: AffyDEComp: towards a benchmark for differential expression methods

Conclusions

First step towards a reliable benchmark for DE

Golden Spike data has some value if use of empty probesets is revisited

Certain combinations of summarisation/DE methods seem poor

Keep it open (Bioconductor) - because science should be reproducible!

Page 27: AffyDEComp: towards a benchmark for differential expression methods

Recommendations

Create a new spike-in data set where

Spike-in concentrations are realistic

DE spike-ins both up- and down-regulated

Concentrations and FC not confounded

Larger number of arrays

Benchmarks using regulatory information

Benchmarks for Illumina data

Benchmarks for SNP chips (GWA studies)

manchester.ac.uk/bioinformatics/affydecomp