Alex Lewin (Imperial College) Sylvia Richardson (IC Epidemiology) Tim Aitman (IC Microarray Centre) In collaboration with Anne-Mette Hein, Natalia Bochkina (IC Epidemiology) Helen Causton (IC Microarray Centre) Peter Green (Bristol) Bayesian Modelling for Differential Gene Expression

Post on 28-Mar-2015


Page 1:

Bayesian Modelling for Differential Gene Expression

Alex Lewin (Imperial College)

Sylvia Richardson (IC Epidemiology)

Tim Aitman (IC Microarray Centre)

In collaboration with

Anne-Mette Hein, Natalia Bochkina (IC Epidemiology)

Helen Causton (IC Microarray Centre)

Peter Green (Bristol)

Page 2:

Insulin-resistance gene Cd36

cDNA microarray: hybridisation signal for SHR much lower than for Brown Norway and SHR.4 control strains

Aitman et al 1999, Nature Genet 21:76-83

Page 3:

Larger microarray experiment: look for other genes associated with Cd36

Microarray Data

3 SHR compared with 3 transgenic rats (with Cd36)

3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out

12000 genes on each array

Biological Question

Find genes which are expressed differently between animals with and without Cd36.

Page 4:

• Bayesian Hierarchical Model for Differential Expression

• Decision Rules

• Predictive Model Checks

• Simultaneous estimation of normalization and differential expression

• Gene Ontology analysis for differentially expressed genes

Page 5:

Microarray analysis is a multi-step process

Low-level Model (how gene expression is estimated from signal)

Normalisation (to make arrays comparable)

Differential Expression

Clustering, Partition Model

We aim to integrate all the steps in a common statistical framework

Page 6:

Bayesian Modelling Framework

• Model different sources of variability simultaneously: within array, between array, …

• Uncertainty propagated from data to parameter estimates (so not over-optimistic in conclusions).

• Share information in appropriate ways to get robust estimates.

Page 7:

Gene Expression Data

3 wildtype mice, fat tissue hybridised to Affymetrix chips

[Figure: axes labelled sd and mean]

Newton et al. 2001: showed data fit well by Gamma or Log Normal distributions

Kerr et al. 2000: linear model on log scale

Page 8:

Bayesian hierarchical model for differential expression

Data: $y_{gsr}$ = log expression for gene g, condition s, replicate r

$\alpha_g$ = gene effect

$\delta_g$ = differential effect for gene g between 2 conditions

$\beta_{r(g)s}$ = array effect (expression-level dependent)

$\sigma_{gs}^2$ = gene variance

• 1st level

$y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r(g)1},\ \sigma_{g1}^2)$

$y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r(g)2},\ \sigma_{g2}^2)$

$\sum_r \beta_{r(g)s} = 0$

$\beta_{r(g)s}$ = function of $\alpha_g$, parameters {a} and {b}
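The first-level model can be illustrated by simulating replicates for a single gene. This is a minimal sketch, not the authors' code: the function name `simulate_gene` is hypothetical, and the array effects are set to zero for simplicity, which is an assumption made only to keep the example short.

```python
import random


def simulate_gene(alpha, delta, sigma1, sigma2, n_reps=3, seed=0):
    """Draw log-expression replicates for one gene under two conditions.

    Condition 1 has mean alpha - delta/2, condition 2 has mean alpha + delta/2.
    Array effects are assumed to be zero here (a simplification).
    """
    rng = random.Random(seed)
    y1 = [rng.gauss(alpha - 0.5 * delta, sigma1) for _ in range(n_reps)]
    y2 = [rng.gauss(alpha + 0.5 * delta, sigma2) for _ in range(n_reps)]
    return y1, y2


if __name__ == "__main__":
    y1, y2 = simulate_gene(alpha=8.0, delta=1.0, sigma1=0.3, sigma2=0.3)
    print("condition 1:", y1)
    print("condition 2:", y2)
```

With both standard deviations set to zero the draws collapse onto the two condition means, which makes the role of the ½ δ terms easy to check.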

Page 9:

Priors for gene effects

Mean effect $\alpha_g$

$\alpha_g \sim$ Unif (much wider than data range)

Differential effect $\delta_g$

$\delta_g \sim N(0, 10^4)$: "fixed" effects (no structure in prior)

OR mixture:

$\delta_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \eta_1) + \pi_2 G_+(1.5, \eta_2)$

[Labels on the mixture: the point mass $\delta_0$ is H0; the two Gamma components are the explicit modelling of the alternative]

Page 10:

References

Fixed Effects

Kerr et al. 2000

Mixture Models

Newton et al. 2004 (non-parametric mixture)

Lönnstedt and Speed 2003, Smyth 2004 (conjugate mixture prior)

Broët et al. 2002 (several levels of DE)

Page 11:

Prior for gene variances

Two extreme cases:

(1) Constant variance: $\epsilon_{gsr} \sim N(0, \sigma^2)$. Too stringent, poor fit.

(2) Independent variances: $\epsilon_{gsr} \sim N(0, \sigma_g^2)$. Variance estimates based on few replications are highly variable.

Need to share information between genes to better estimate their variance, while allowing some variability: Hierarchical model

Page 12:

Prior for gene variances

• 2nd level

$\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \text{logNormal}(\mu_s, \tau_s)$

Hyper-parameters $\mu_s$ and $\tau_s$ can be influential.

Empirical Bayes (e.g. Lönnstedt and Speed 2003, Smyth 2004): fixes $\mu_s$, $\tau_s$

Fully Bayesian

• 3rd level

$\mu_s \sim N(c, d)$, $\tau_s \sim \text{Gamma}(e, f)$

Page 13:

Gene specific variances are stabilised

Variances estimated using information from all G x R measurements (~12000 x 3) rather than just 3

Variances stabilised and shrunk towards average variance

Page 14:

Prior for array effects (Normalization)

Spline Curve

$\beta_{r(g)s}$ = quadratic in $\alpha_g$ for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$

with coeff $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, k = 1, …, #breakpoints

Locations of break points not fixed

Must do sensitivity checks on # break points

[Figure: spline curve with break points a0, a1, a2, a3 marked]

Page 15:

Array effect as a function of gene effect

[Figure legend: loess; Bayesian posterior mean]

Page 16:

Effect of normalisation on density

[Figure panels: Wildtype, Knockout]

Before ($y_{gsr}$)

After ($y_{gsr} - \hat{\beta}_{r(g)s}$)

Page 17:

Bayesian hierarchical model for differential expression

• 1st level

– $y_{gsr} \mid \alpha_g, \delta_g, \sigma_{gs} \sim N(\alpha_g \mp \tfrac{1}{2}\delta_g + \beta_{r(g)s},\ \sigma_{gs}^2)$ (minus sign for s = 1, plus sign for s = 2)

• 2nd level

– Fixed effect priors for $\alpha_g$, $\delta_g$

– Array effect coefficients, Normal and Uniform

– $\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \text{logNormal}(\mu_s, \tau_s)$

• 3rd level

– $\mu_s \sim N(c, d)$

– $\tau_s \sim \text{Gamma}(e, f)$

Page 18:

WinBUGS software for fitting Bayesian models

Declare the model:

    for (i in 1 : ngenes) {
      for (j in 1 : nreps) {
        y1[i, j] ~ dnorm(x1[i, j], tau1[i])
        x1[i, j] <- alpha[i] - 0.5*delta[i] + beta1[i, j]
      }
    }
    for (i in 1 : ngenes) {
      tau1[i] <- 1.0/sig21[i]
      sig21[i] <- exp(lsig21[i])
      lsig21[i] ~ dnorm(mm1, tt1)
    }
    mm1 ~ dnorm(0.0, 1.0E-3)
    tt1 ~ dgamma(0.01, 0.01)

WinBUGS does the calculations

Page 19:

WinBUGS software for fitting Bayesian models

Whole posterior distribution

Posterior means, medians, quantiles

Page 20:

• Bayesian Hierarchical Model for Differential Expression

• Decision Rules

• Predictive Model Checks

• Simultaneous estimation of normalization and differential expression

• Gene Ontology analysis for differentially expressed genes

Page 21:

Decision Rules for Inference

So far, discussed fitting the model.

How do we decide which genes are differentially expressed?

Parameters of interest: $\alpha_g$, $\delta_g$, $\sigma_g$

– What quantity do we consider: $\delta_g$, $\delta_g / \sigma_g$, …?

– How do we summarize the posterior distribution?

Page 22:

Fixed Effects Model

Inference on δ

(1) $d_g = E(\delta_g \mid \text{data})$, posterior mean

Like point estimate of log fold change.

Decision Rule: gene g is DE if $|d_g| > \delta_{cut}$ ($\delta_{cut}$: biological interest)

(2) $p_g = P(|\delta_g| > \delta_{cut} \mid \text{data})$, posterior probability (incorporates uncertainty)

Decision Rule: gene g is DE if $p_g > p_{cut}$ ($\delta_{cut}$: biological interest; $p_{cut}$: statistical confidence)

This allows the biologist to specify what size of effect is interesting (not just statistical significance)
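Both decision rules for the fixed effects model can be computed directly from posterior draws of δg. The sketch below assumes the MCMC output for one gene is available as a plain list of draws; the function names are hypothetical and not taken from the talk.

```python
import math


def posterior_mean(draws):
    """d_g: posterior mean of delta_g, estimated from MCMC draws."""
    return sum(draws) / len(draws)


def prob_exceeds(draws, delta_cut):
    """p_g: estimated posterior probability that |delta_g| > delta_cut."""
    return sum(1 for d in draws if abs(d) > delta_cut) / len(draws)


def is_de_by_mean(draws, delta_cut):
    """Rule (1): gene is DE if |d_g| > delta_cut."""
    return abs(posterior_mean(draws)) > delta_cut


def is_de_by_prob(draws, delta_cut, p_cut):
    """Rule (2): gene is DE if p_g > p_cut."""
    return prob_exceeds(draws, delta_cut) > p_cut


if __name__ == "__main__":
    draws = [0.9, 1.1, 0.2, 1.3, 0.8]   # toy draws, not real output
    cut = math.log(2)                   # a two-fold change on the log scale
    print(posterior_mean(draws), prob_exceeds(draws, cut))
```

Rule (2) can disagree with rule (1) when the posterior is wide: a gene whose posterior mean exceeds the cut-off may still have too little posterior mass beyond it.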

Page 23:

Fixed Effects Model

Inference on δ, σ

(1) $t_g = E(\delta_g \mid \text{data}) / E(\sigma_g \mid \text{data})$

Like t-statistic.

Decision Rule: gene g is DE if $|t_g| > t_{cut}$ ($t_{cut}$: statistical confidence)

(2) $p_g = P(|\delta_g / \sigma_g| > t_{cut} \mid \text{data})$

Decision Rule: gene g is DE if $p_g > p_{cut}$ ($p_{cut}$: statistical confidence)

Bochkina and Richardson (in preparation)

Page 24:

Mixture Model

$\delta_g \sim \pi_0 \delta_0 + \pi_1 G_-(1.5, \eta_1) + \pi_2 G_+(1.5, \eta_2)$

[Labels on the mixture: the point mass $\delta_0$ is H0; the two Gamma components are the explicit modelling of the alternative]

(1) $d_g = E(\delta_g \mid \text{data})$, posterior mean

Shrunk estimate of log fold change.

Decision Rule: gene g is DE if $|d_g| > \delta_{cut}$

(2) Classify genes into the mixture components.

$p_g = P(\text{gene } g \text{ not in } H_0 \mid \text{data})$

Decision Rule: gene g is DE if $p_g > p_{cut}$

Page 25:

Illustration of decision rule

$p_g = P(|\delta_g| > \log(2) \text{ and } \alpha_g > 4 \mid \text{data})$

[Figure legend: x marks genes with $p_g > 0.8$; Δ marks genes with t-statistic > 2.78 (95% CI)]

Page 26:

• Bayesian Hierarchical Model for Differential Expression

• Decision Rules

• Predictive Model Checks

• Simultaneous estimation of normalization and differential expression

• Gene Ontology analysis for differentially expressed genes

Page 27:

Bayesian P-values

• Compare observed data to a “null” distribution

• P-value: probability of an observation from the null distribution being more extreme than the actual observation

• If all observations come from the null distribution, the distribution of p-values is Uniform

Page 28:

Cross-validation p-values

Idea of cross validation is to split the data: one part for fitting the model, the rest for validation

n units of observation

For each observation $y_i$, run model on rest of data $y_{-i}$, predict new data $y_i^{new}$ from posterior distribution.

Bayesian p-value $p_i = \Pr(y_i^{new} > y_i \mid \text{data } y_{-i})$

Distribution of p-values $\{p_i, i = 1, \dots, n\}$ is approximately Uniform if model adequately describes the data.

Page 29:

Posterior Predictive p-values

For large n, not possible to run model n times.

Run model on all data. For each observation $y_i$, predict new data $y_i^{new}$ from posterior distribution.

Bayesian p-value $p_i = \Pr(y_i^{new} > y_i \mid \text{all data})$

"all data" includes $y_i$, so p-values are less extreme than they should be

p-values are conservative (not quite Uniform).

Page 30:

Example: Check priors on gene variances

1) Compare equal and exchangeable variance models

2) Compare different exchangeable priors

Want to compare data for each gene, not gene and replicate, so use sample variance $S_g^2$ (suppress index s here)

Bayesian p-value $\Pr(S_g^{2\,new} > S_g^{2\,obs} \mid \text{data})$

Page 31:

WinBUGS code for posterior predictive checks

    for (i in 1 : ngenes) {
      for (j in 1 : nreps) {
        y1[i, j] ~ dnorm(x1[i, j], tau1[i])
        # replicate relevant sampling distribution
        ynew1[i, j] ~ dnorm(x1[i, j], tau1[i])
        x1[i, j] <- alpha[i] - 0.5*delta[i] + beta1[i, j]
      }
      # calculate sample variances
      s21[i] <- pow(sd(y1[i, ]), 2)
      s2new1[i] <- pow(sd(ynew1[i, ]), 2)
      # count no. times predicted sample variance is bigger than observed sample variance
      pval1[i] <- step(s2new1[i] - s21[i])
    }
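Outside WinBUGS, the same check amounts to averaging an indicator over MCMC iterations. The sketch below assumes the predicted replicates for one gene have been saved as a list of lists, one inner list per iteration; the function name is hypothetical. It uses `>=` to mirror WinBUGS `step`, which returns 1 when its argument is zero or positive.

```python
import statistics


def predictive_p_value(y_obs, y_new_draws):
    """Fraction of iterations in which the predicted sample variance is
    at least as large as the observed sample variance for one gene."""
    s2_obs = statistics.variance(y_obs)
    exceed = sum(
        1 for y_new in y_new_draws if statistics.variance(y_new) >= s2_obs
    )
    return exceed / len(y_new_draws)


if __name__ == "__main__":
    y_obs = [1.0, 2.0, 3.0]          # toy observed replicates
    y_new_draws = [                  # toy predicted replicates, 4 iterations
        [0.0, 0.0, 0.0],
        [0.0, 2.0, 4.0],
        [0.0, 3.0, 6.0],
        [5.0, 5.0, 5.0],
    ]
    print(predictive_p_value(y_obs, y_new_draws))
```

Collecting these p-values over all genes and plotting their histogram gives the comparison against the Uniform distribution described in the talk.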

Page 32:

Posterior predictive

[Figure: graph showing structure of model. Nodes: prior parameters, mean parameters, $\sigma_g^2$, $y_{gr}$, $S_g^2$, $y_{gr}^{new}$, $S_g^{2\,new}$; plates r = 1:R and g = 1:G]

Page 33:

Mixed predictive

[Figure: graph showing structure of model. Nodes: prior parameters, mean parameters, $\sigma_g^2$, $\sigma_g^{2\,new}$, $y_{gr}$, $S_g^2$, $y_{gr}^{new}$, $S_g^{2\,new}$; plates r = 1:R and g = 1:G]

Less conservative than posterior predictive (Marshall and Spiegelhalter, 2003)

Page 34:

Four models for gene variances

Equal variance model:

Model 1: $\sigma^2 \sim \text{logNormal}(0, 10000)$

Exchangeable variance models:

Model 2: $\sigma_g^{-2} \sim \text{Gamma}(2, \beta)$

Model 3: $\sigma_g^{-2} \sim \text{Gamma}(\alpha, \beta)$

Model 4: $\sigma_g^2 \sim \text{logNormal}(\mu, \tau)$

(α, β, μ, τ all parameters)

Page 35:

Bayesian predictive p-values

Page 36:

• Bayesian Hierarchical Model for Differential Expression

• Decision Rules

• Predictive Model Checks

• Simultaneous estimation of normalization and differential expression

• Gene Ontology analysis for differentially expressed genes

Page 37:

Expression level dependent normalization

Many gene expression data sets need normalization which depends on expression level.

Usually normalization is performed in a pre-processing step before the model for differential expression is used.

These analyses ignore the fact that the expression level is measured with variability.

Ignoring this variability leads to bias in the function used for normalization.

Page 38:

Simulated Data

Gene variances: similar range and distribution to mouse data

Array effects: cubic functions of expression level

Differential effects:

900 genes: $\delta_g = 0$

50 genes: $\delta_g \sim N(\log(3), 0.1^2)$

50 genes: $\delta_g \sim N(-\log(3), 0.1^2)$
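The simulated differential effects can be generated in a few lines. This sketch reads the slide's "0.12" as a variance of 0.1², i.e. a standard deviation of 0.1, which is an assumption about a superscript lost in extraction; the function name and seed are hypothetical.

```python
import math
import random


def simulate_differential_effects(seed=0):
    """Return 1000 differential effects: 900 null genes, 50 up, 50 down.

    The standard deviation 0.1 is an assumed reading of the slide.
    """
    rng = random.Random(seed)
    null = [0.0] * 900
    up = [rng.gauss(math.log(3), 0.1) for _ in range(50)]
    down = [rng.gauss(-math.log(3), 0.1) for _ in range(50)]
    return null + up + down


if __name__ == "__main__":
    delta = simulate_differential_effects()
    print(len(delta), sum(1 for d in delta if d != 0.0))
```

Because the true status of every gene is known, the observed False Discovery Rate of any decision rule can be computed exactly on such data.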

Page 39:

Array Effects and Variability for Simulated Data

[Figure]

Data points: $y_{gsr} - \bar{y}_g$ (r = 1…3)

Curves: $\beta_{r(g)s}$ (r = 1…3)

Page 40:

Two-step method (using loess)

1) Use loess smoothing to obtain array effects $\beta^{loess}_{r(g)s}$

2) Subtract loess array effects from data: $y^{loess}_{gsr} = y_{gsr} - \beta^{loess}_{r(g)s}$

3) Run our model on $y^{loess}_{gsr}$ with no array effects

Page 41:

Decision rules for selecting differentially expressed genes

If P( |δg| > δcut | data) > pcut then gene g is called differentially expressed.

δcut chosen according to biological hypothesis of interest (here we use log(3) ).

pcut corresponds to the error rate (e.g. False Discovery Rate or Mis-classification Penalty) considered acceptable.

Page 42:

Full model v. two-step method

Plot observed False Discovery Rate against pcut (averaged over 5 simulations)

Solid line for full model

Dashed line for pre-normalized method

Page 43:

Different two-step methods

1) $y^{loess}_{gsr} = y_{gsr} - \beta^{loess}_{r(g)s}$

2) $y^{model}_{gsr} = y_{gsr} - E(\beta_{r(g)s} \mid \text{data})$

Results from 2 different two-step methods are much closer to each other than to full model results.

Page 44:

• Bayesian Hierarchical Model for Differential Expression

• Decision Rules

• Predictive Model Checks

• Simultaneous estimation of normalization and differential expression

• Gene Ontology analysis for differentially expressed genes

Page 45:

Gene Ontology (GO)

Database of biological terms

Arranged in graph connecting related terms

Directed Acyclic Graph: links indicate more specific terms

~16,000 terms

from QuickGO website (EBI)

Page 46:

Gene Ontology (GO)

from QuickGO website (EBI)

Page 47:

Gene Annotations

• Genes/proteins annotated to relevant GO terms

• Gene may be annotated to several GO terms

• GO term may have 1000s of genes annotated to it (or none)

• A gene annotated to term A is also annotated to all ancestors of A (terms that are related and more general)
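The ancestor rule in the last bullet can be made concrete with a small graph traversal. The sketch below uses a toy graph with made-up term names ("root", "A", "B", "C") and hypothetical function names; it does not read the real GO database.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate_annotations(direct, parents):
    """Extend each gene's direct annotations with every ancestor term."""
    full = {}
    for gene, terms in direct.items():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        full[gene] = closure
    return full


if __name__ == "__main__":
    # Toy DAG: C has two parents (A and B); both descend from root.
    parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
    direct = {"gene1": {"C"}, "gene2": {"A"}}
    print(propagate_annotations(direct, parents))
```

Since a term in a directed acyclic graph may have several parents, the traversal tracks visited terms so that shared ancestors are counted once.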

Page 48:

GO annotations of genes associated with the insulin-resistance gene Cd36

Compare GO annotations of genes most and least differentially expressed

Most differentially expressed ↔ pg > 0.5 (280 genes)

Least differentially expressed ↔ pg < 0.2 (11171 genes)

Page 49:

GO annotations of genes associated with the insulin-resistance gene Cd36

For each GO term, Fisher's exact test on

proportion of differentially expressed genes with annotations

v.

proportion of non-differentially expressed genes with annotations

                             genes annot.      genes not annot.
                             to GO term        to GO term
    genes most diff. exp.    A                 B
    genes least diff. exp.   C                 D

observed O = A

expected E = C*(A+B)/(C+D) if no association of GO annotation with DE

FatiGO website

http://fatigo.bioinfo.cnio.es/
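For a single GO term, the observed and expected counts and a Fisher's exact p-value can be computed as sketched below. The sketch assumes the 2×2 layout in which A and B count the most differentially expressed genes (annotated, not annotated) and C and D the least differentially expressed genes, which is the layout under which the slide's formula for E is the expected count. The one-sided (over-representation) tail is also an assumption, since the slide does not say which alternative was used; the function names are hypothetical.

```python
from math import comb


def observed_expected(a, b, c, d):
    """Return (O, E) as on the slide: O = A, E = C*(A+B)/(C+D)."""
    return a, c * (a + b) / (c + d)


def fisher_one_sided(a, b, c, d):
    """P(X >= A) under the hypergeometric null with all margins fixed."""
    n_total = a + b + c + d
    n_annot = a + c    # genes annotated to the GO term
    n_most = a + b     # genes in the most differentially expressed list
    upper = min(n_annot, n_most)
    tail = sum(
        comb(n_annot, x) * comb(n_total - n_annot, n_most - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(n_total, n_most)


if __name__ == "__main__":
    # Toy counts, not taken from the Cd36 analysis.
    print(observed_expected(3, 1, 1, 3))
    print(fisher_one_sided(3, 1, 1, 3))
```

Exact integer arithmetic with `math.comb` avoids overflow and rounding in the binomial coefficients, at the cost of speed for very large gene lists.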

Page 50:

GO annotations of genes associated with the insulin-resistance gene Cd36

O = observed no. differentially expressed genes

E = expected no. differentially expressed genes

Page 51:

All GO ancestors of Inflammatory response

[Figure: graph of GO terms, listed here without the links]

Biological process

Physiological process

Organismal movement

Response to stimulus

Response to external stimulus (O=12, E=4.7)

Response to biotic stimulus (O=14, E=6.9)

Response to external biotic stimulus *

Response to stress (O=12, E=5.9)

Defense response (O=11, E=5.8)

Immune response (O=9, E=4.5)

Response to pest, pathogen or parasite (O=8, E=2.6)

Response to wounding (O=6, E=1.8)

Inflammatory response (O=4, E=1.2)

* This term was not accessed by FatiGO

Relations between GO terms were found using QuickGO: http://www.ebi.ac.uk/ego/

Page 52:

Further Work to do on GO

• Account for dependencies between GO terms

• Multiple testing corrections

• Uncertainty in annotation

( work in preparation )

Page 53:

Summary

• Bayesian hierarchical model flexible, estimates variances robustly

• Predictive model checks show exchangeable prior good for gene variances

• Useful to find GO terms over-represented in the most differentially-expressed genes

Paper available (Lewin et al. 2005, Biometrics, in press)

http://www.bgx.org.uk/

Page 55:

Decision Rules

• In full Bayesian framework, introduce latent allocation variable $z_g$ = 0, 1 for gene g in null, alternative

• For each gene, calculate posterior probability of belonging to unmodified component: $p_g = \Pr(z_g = 0 \mid \text{data})$

• Classify using cut-off on $p_g$ (Bayes rule corresponds to 0.5)

• For any given cut-off on $p_g$, can estimate FDR, FNR.

For gene-list S, est. (FDR | data) = $\sum_{g \in S} p_g / |S|$
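The estimated FDR for a gene list is the average posterior probability of the null over the list. The sketch below assumes the posterior null probabilities are held in a dictionary keyed by gene identifier; the function names and gene identifiers are hypothetical.

```python
def select_genes(p_null, p_cut=0.5):
    """Gene list S: genes whose posterior probability of the null is below p_cut."""
    return [gene for gene, p in p_null.items() if p < p_cut]


def estimated_fdr(p_null, selected):
    """est.(FDR | data): mean of p_g over the genes in the list S."""
    return sum(p_null[gene] for gene in selected) / len(selected)


if __name__ == "__main__":
    # Toy posterior probabilities of the null, not real results.
    p_null = {"g1": 0.1, "g2": 0.3, "g3": 0.9}
    s = select_genes(p_null)
    print(s, estimated_fdr(p_null, s))
```

Lowering the cut-off shortens the list and lowers the estimated FDR, which is how a cut-off can be tuned to an acceptable error rate.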

Page 56:

The Null Hypothesis

Composite Null

Point Null, alternative not modelled

Point Null, alternative modelled