Identifying Differentially Regulated Genes

Nirmalya Bandyopadhyay, Manas Somaiya, Sanjay Ranka, and Tamer Kahveci
Bioinformatics Lab., CISE Department, University of Florida


Post on 30-Dec-2015



Gene interaction through regulatory networks

• Gene networks: The genes are nodes and the interactions are directed edges.

• Neighbors: each gene has incoming neighbors and outgoing neighbors.

• A gene can change the state of other genes:
  – Activation
  – Inhibition

[Figure: example gene network with nodes K-Ras, Raf, MEK, ERK, JNK, RalGDS, Ral, RalBP1, PLD1, and Cdc42/Rac]
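The network view above can be sketched as a small data structure. This is a minimal illustration, assuming a hypothetical three-gene network; the gene names, the edge list, and the helper `build_neighbors` are invented for the example and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy network: (source gene, target gene, effect).
# Directed edges run from the regulating gene to the regulated gene.
TOY_EDGES = [
    ("g1", "g2", "activation"),
    ("g1", "g3", "inhibition"),
    ("g2", "g3", "activation"),
]

def build_neighbors(edges):
    """Return (incoming, outgoing) neighbor maps for a directed gene network."""
    incoming = defaultdict(dict)
    outgoing = defaultdict(dict)
    for source, target, effect in edges:
        outgoing[source][target] = effect
        incoming[target][source] = effect
    return incoming, outgoing

incoming, outgoing = build_neighbors(TOY_EDGES)
```

Here `incoming["g3"]` lists the genes that can change the state of g3, together with the type of effect.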

Perturbation experiments

[Figure: the example gene network (K-Ras, Raf, MEK, ERK, JNK, RalGDS, Ral, RalBP1, PLD1, Cdc42/Rac) with a perturbation applied]

• In a perturbation experiment, a stimulant (radiation, a toxic element, or medication), also known as a perturbation, is applied to tissues.

• Gene expression is measured before and after the perturbation.

• A gene can change its expression as a result of the perturbation.

• Differentially expressed (DE) gene: its expression changes after the perturbation.

• Equally expressed (EE) gene: its expression does not change.


Perturbation experiment: single dataset

• Primarily affected genes: directly affected by the perturbation.

• Secondarily affected genes: affected indirectly, through the primarily affected genes that act on them.

[Figure: the example gene network under a perturbation, with genes marked as primarily affected or secondarily affected]

Differentially and Equally regulated

• Some datasets inherently have two groups.
  – Fasting vs. non-fasting, Caucasian American vs. African American

• For these datasets, a gene is
  – Differentially regulated (DR): DE in one group and EE in the other.
  – Equally regulated: DE in both groups or EE in both groups.
  – Here, gene g1 is DE in data DA and EE in DB. Hence, it is DR.
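The definition above reduces to comparing a gene's state in the two groups. A minimal sketch follows, assuming the states are given as the strings "DE" and "EE"; the function name is hypothetical.

```python
VALID_STATES = ("DE", "EE")

def is_differentially_regulated(state_a, state_b):
    """True if the gene is DE in exactly one of the two data groups."""
    for state in (state_a, state_b):
        if state not in VALID_STATES:
            raise ValueError(f"unknown state: {state!r}")
    return state_a != state_b

# Gene g1 from the slide: DE in DA, EE in DB.
g1_is_dr = is_differentially_regulated("DE", "EE")
```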

[Figure: a five-gene network (g1 to g5) drawn once for data group DA and once for DB, with each gene marked as differentially expressed or equally expressed]

Two datasets: Primary and secondary effects

• Primarily differentially regulated (PDR) genes: differentially regulated as a direct effect of the perturbation.

• Secondarily differentially regulated (SDR) genes: differentially regulated through the effect of other genes, namely their incoming neighbors.

[Figure: the five-gene network (g1 to g5) with the metagene g0, drawn for data groups DA and DB; genes are marked as primarily differentially expressed, secondarily differentially expressed, or equally expressed]

Problem & method

• Input: Gene expression (control and non-control) of two data groups DA and DB.

• Problem: Analyzing the primary and secondary effects of the perturbation.
  – Estimate the probability that a gene is differentially regulated because of the perturbation or because of other genes (its incoming neighbors).
  – What are the primarily differentially regulated genes?

• Method
  – Probabilistic Bayesian method, in which we employ a Markov random field to leverage domain knowledge.

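Notation

• Observed variables
  – Microarray datasets:
    • Two data groups: DA, DB
    • A single gene gi in group C (C ∈ {A, B}), with its control and non-control expression measurements: $Y_{Ci} = \{y_{Ci}, y'_{Ci}\}$
    • For all genes in a group C: $Y_C = \bigcup_{i=1}^{M} Y_{Ci}$
  – Neighborhood variables:

$$W_{ij} = \begin{cases} 1, & \text{if there is an edge from } g_i \text{ to } g_j \\ 0, & \text{otherwise} \end{cases}$$

• Hidden variables
  – State variables:

$$S_{Ci} = \begin{cases} \text{DE}, & \text{if } g_i \text{ is DE in group } C \\ \text{EE}, & \text{if } g_i \text{ is EE in group } C \end{cases}$$

  – Regulation variables: Zi
  – Interaction variables: Xij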

SAi  SBi  SAj  SBj  Zi  Zj  Xij
DE   DE   DE   DE   1   1   1
DE   DE   DE   EE   1   2   2
DE   DE   EE   DE   1   3   3
DE   DE   EE   EE   1   4   4
DE   EE   DE   DE   2   1   5
DE   EE   DE   EE   2   2   6
DE   EE   EE   DE   2   3   7
DE   EE   EE   EE   2   4   8
EE   DE   DE   DE   3   1   9
EE   DE   DE   EE   3   2   10
EE   DE   EE   DE   3   3   11
EE   DE   EE   EE   3   4   12
EE   EE   DE   DE   4   1   13
EE   EE   DE   EE   4   2   14
EE   EE   EE   DE   4   3   15
EE   EE   EE   EE   4   4   16
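The table follows a regular pattern: Zi encodes the pair (SAi, SBi) as a value from 1 to 4, and Xij equals 4(Zi − 1) + Zj. The sketch below reproduces the table under that reading; the closed-form expression is inferred from the rows rather than stated on the slide, and the function names are hypothetical.

```python
# Encoding of a gene's states in groups A and B, as read from the table.
STATE_PAIR_TO_Z = {
    ("DE", "DE"): 1,
    ("DE", "EE"): 2,
    ("EE", "DE"): 3,
    ("EE", "EE"): 4,
}

def regulation_variable(state_a, state_b):
    """Z_i for a gene with state_a in group A and state_b in group B."""
    return STATE_PAIR_TO_Z[(state_a, state_b)]

def interaction_variable(z_i, z_j):
    """X_ij for an edge from g_i to g_j (pattern inferred from the table)."""
    return 4 * (z_i - 1) + z_j

def decode_interaction(x_ij):
    """Recover (Z_i, Z_j) from X_ij in 1..16."""
    z_i, z_j = divmod(x_ij - 1, 4)
    return z_i + 1, z_j + 1

# Row "DE EE EE DE" of the table.
x_example = interaction_variable(
    regulation_variable("DE", "EE"), regulation_variable("EE", "DE")
)
```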

Problem formulation

• Input to the problem:
  – Microarray expression: Y
  – Gene network V = {G, W}
    • G = {g0, g1, g2, …, gM}, where g0 is the metagene.

• Goal:
  – Estimate the density p(Xij | X − Xij, Y, V, Wij = 1) for every pair (i, j) with Wij = 1. This density estimates the probability that a gene is DR due to the perturbation or due to an incoming neighbor gene.
  – Note: A higher value of p(Xij ∈ {2, 3} | X − Xij, Y, V, Wij = 1) indicates a higher chance that gj is affected by gi.

Bayesian distribution

• We propose a Bayesian model because it allows us to incorporate our beliefs into the model.
  – The joint probability distribution over X:

$$
\underbrace{p(X \mid Y, V, \theta_X, \theta_Y)}_{\text{posterior density}}
= \frac{\overbrace{p(Y \mid X, V, \theta_Y)}^{\text{likelihood density}} \; \overbrace{p(X \mid V, \theta_X)}^{\text{prior density}}}{\sum_{X} p(Y \mid X, V, \theta_Y)\, p(X \mid V, \theta_X)}
$$

  – We can derive the density of Xij, p(Xij | X − Xij, Y, V, Wij = 1), from the joint density function.
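Because each Xij takes one of 16 values, its conditional density can be obtained by evaluating the joint density at each value, with all other variables held fixed, and normalizing. The sketch below shows only that normalization step; `toy_log_joint` is a hypothetical stand-in for the log of likelihood times prior and is not the model from the slides.

```python
import math

def conditional_over_values(log_joint, values=range(1, 17)):
    """Normalize exp(log_joint(x)) over the candidate values of one X_ij."""
    values = list(values)
    scores = [log_joint(x) for x in values]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)
    return {x: weight / total for x, weight in zip(values, weights)}

def toy_log_joint(x):
    """Hypothetical unnormalized log joint: favors X_ij in {2, 3}."""
    return 1.0 if x in (2, 3) else 0.0

conditional = conditional_over_values(toy_log_joint)
# Probability mass on X_ij in {2, 3}.
prob_affected = conditional[2] + conditional[3]
```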

Prior density function: Markov random field

• The MRF is an undirected graph Ψ = (X, E).
  – X = {Xij}: each node Xij represents an edge in the gene network.
  – E = {(Xij, Xpj) | Wpi = Wij = 1} ∪ {(Xij, Xik) | Wjk = Wij = 1}

• An edge in the MRF corresponds to two edges in the gene network.
  – (X23, X25) corresponds to (g2, g3) and (g3, g5).

[Figure: (a) Gene network over g0 to g5, drawn for data groups DA and DB. (b) Markov random field with nodes (values in parentheses): X01 (2), X02 (1), X03 (1), X04 (4), X05 (3), X12 (5), X13 (5), X14 (8), X23 (1), X25 (7), X35 (3)]
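The construction of Ψ from the gene network can be sketched as follows. This is an illustrative reading of the definition of E, with one added assumption: both endpoints of an MRF edge must themselves be gene-network edges, since MRF nodes only exist for those. The function name and the four-gene edge list are hypothetical.

```python
def build_mrf(gene_edges):
    """Build MRF nodes and undirected MRF edges from directed gene edges.

    gene_edges: iterable of (i, j) pairs, meaning W_ij = 1.
    Returns (nodes, mrf_edges); each MRF edge is a frozenset of two nodes.
    """
    nodes = set(gene_edges)
    mrf_edges = set()
    for (i, j) in nodes:
        for (p, q) in nodes:
            # Left case: (X_ij, X_pj) with W_pi = 1.
            if q == j and p != i and (p, i) in nodes:
                mrf_edges.add(frozenset({(i, j), (p, q)}))
            # Right case: (X_ij, X_ik) with W_jk = 1, where (p, q) = (i, k).
            if p == i and q != j and (j, q) in nodes:
                mrf_edges.add(frozenset({(i, j), (p, q)}))
    return nodes, mrf_edges

# Hypothetical four-gene network.
toy_edges = [(1, 2), (2, 3), (3, 4), (1, 3), (2, 4)]
toy_nodes, toy_mrf_edges = build_mrf(toy_edges)
```

The double loop is quadratic in the number of gene-network edges, which is acceptable for a sketch; indexing edges by source and target would be the natural refinement for a network with tens of thousands of interactions.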

Prior density function: Feature functions

• Three beliefs relevant to our model:
  – In a data group, the metagene g0 can affect the states of all other genes (modeled by adding directed edges from g0 to all other genes).
  – In a data group, a gene can affect the state of its outgoing neighbors.
  – A gene has a high probability of being equally regulated.

• We incorporate these beliefs into the MRF graph using seven feature functions.

• Feature function: a unary or binary function over the nodes of the MRF. A feature function allows us to introduce our beliefs into the graph.

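Feature functions

• Unary: capture the frequency of Xij.

$$
F_1(X_{ij}) = \begin{cases} 1, & \text{if } X_{ij} = 2 \\ 0, & \text{otherwise} \end{cases}
\qquad
F_2(X_{ij}) = \begin{cases} 1, & \text{if } X_{ij} = 3 \\ 0, & \text{otherwise} \end{cases}
$$

  F3(Xij) is defined in terms of F1(Xij) and F2(Xij) [the operators of this formula are missing from the transcript].

• Binary: encapsulate the second belief, that in a data group a gene can affect the state of its outgoing neighbors.

  Left external equality:

$$F_4(X_{ij}) = \sum_{p,\; W_{pi} = 1,\; W_{ij} = 1} f_4(X_{ij}, X_{pj})$$

  Right external equality:

$$F_5(X_{ij}) = \sum_{k,\; W_{jk} = 1,\; W_{ij} = 1} f_5(X_{ij}, X_{ik})$$

• Unary: capture the third belief, that a gene has a high probability of being equally regulated.

  Left internal equality:

$$F_6(X_{ij}) = \sum_{t \in \{1,\dots,4,13,\dots,16\},\; W_{ij} = 1} f_6(X_{ij}, t)$$

  Right internal equality:

$$F_7(X_{ij}) = \sum_{t \in \{1,4,5,8,9,12,13,16\},\; W_{ij} = 1} f_7(X_{ij}, t)$$

• Prior density function

$$p(X \mid \theta_X) = \frac{1}{\Delta} \exp\Big( \sum_{i,j,\; W_{ij} = 1} \; \sum_{k \in \{1,2,\dots,7\}} \gamma_k F_k(X_{ij}) \Big)$$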

Binary: External feature functions

• The external feature functions encapsulate the belief that in a data group, a gene can affect the state of its outgoing neighbors.

• Left equality
  – Xij = Xpj, that is, Zi = Zp

• Right equality
  – Xij = Xik, that is, Zj = Zk

[Figure: (a) Gene network over g1 to g4. (b) MRF network with nodes X12, X13, X23, X24, and X34, highlighting the left equality and the right equality for X23]

Unary: Internal feature functions

• The internal feature function represents the belief that a gene has high probability of being equally regulated.

• gi is equally regulated.
  – Xij ∈ {1,2,3,4}: Zi = 1 (DE in both groups)
  – Xij ∈ {13,14,15,16}: Zi = 4 (EE in both groups)

• gj is equally regulated.
  – Xij ∈ {1,5,9,13}: Zj = 1 (DE in both groups)
  – Xij ∈ {4,8,12,16}: Zj = 4 (EE in both groups)
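The four value sets above can be derived from the encoding of Xij rather than listed by hand. The sketch below assumes the relation Xij = 4(Zi − 1) + Zj, which is inferred from the notation table; the names are hypothetical.

```python
def decode_interaction(x_ij):
    """Recover (Z_i, Z_j) from X_ij, assuming X_ij = 4 * (Z_i - 1) + Z_j."""
    z_i, z_j = divmod(x_ij - 1, 4)
    return z_i + 1, z_j + 1

# Z = 1 (DE in both groups) and Z = 4 (EE in both groups) mean equally regulated.
EQUALLY_REGULATED_Z = {1, 4}

# Values of X_ij for which g_i is equally regulated (left internal equality).
left_internal = {
    x for x in range(1, 17) if decode_interaction(x)[0] in EQUALLY_REGULATED_Z
}

# Values of X_ij for which g_j is equally regulated (right internal equality).
right_internal = {
    x for x in range(1, 17) if decode_interaction(x)[1] in EQUALLY_REGULATED_Z
}
```

Under this assumption the two sets coincide with the index sets over which the left and right internal feature functions are summed.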


Objective function optimization

• Obtain an initial estimate of the state variables.

• Estimate the parameters of the likelihood density.

• Estimate the parameters that maximize the prior density.

• Estimate the parameters that maximize the pseudo-likelihood density.

• Rank the DE genes based on the likelihood w.r.t. the metagene.

Techniques labeled in the flowchart: ICM, differential evolution, Student's t.
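The slide names ICM (iterated conditional modes) without giving details. The following is a generic sketch of that technique, not the authors' implementation: each variable in turn is set to the value that maximizes a local score while all other variables are held fixed, until a full sweep changes nothing. The score function, the three-node chain, and all names are hypothetical.

```python
def icm(initial_labels, neighbors, local_score, values, max_sweeps=20):
    """Generic iterated conditional modes over a labeled graph.

    local_score(node, value, labels, neighbors) returns the local score of
    assigning `value` to `node` given the current labels of the other nodes.
    """
    labels = dict(initial_labels)
    for _ in range(max_sweeps):
        changed = False
        for node in labels:
            best_value = labels[node]
            best_score = local_score(node, best_value, labels, neighbors)
            for value in values:
                score = local_score(node, value, labels, neighbors)
                if score > best_score:  # change only on strict improvement
                    best_value, best_score = value, score
            if best_value != labels[node]:
                labels[node] = best_value
                changed = True
        if not changed:
            break
    return labels

def agreement_score(node, value, labels, neighbors):
    """Hypothetical score: number of neighbors that carry the same label."""
    return sum(1 for other in neighbors[node] if labels[other] == value)

toy_neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_result = icm({"a": 1, "b": 2, "c": 1}, toy_neighbors, agreement_score, [1, 2])
```

ICM converges to a local optimum that depends on the initial labels and the visiting order, which is why a good initial estimate of the state variables matters.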

Dataset and experimental setup

• Dataset
  – Real: Adapted from Smirnov et al.; generated by applying 10 Gy of ionizing radiation to immortalized B cells obtained from 155 donors.
  – Real/Synthetic: We created synthetic data to simulate the perturbation experiment based on the real dataset. The simulation model is taken from “Modeling of Multiple Valued Gene Regulatory Networks” by Garg et al.
  – Gene regulatory network: 24,663 genetic interactions over 2,335 genes, collected from the KEGG database.

• Experimental setup
  – Implemented our method in MATLAB and Java.
  – Ran our code on a quad-core AMD Opteron 2 GHz workstation with 32 GB of memory.

Comparison with other methods

• We compared our method with three other methods:
  – SMRF: Our earlier method, developed to analyze the effect of an external perturbation on a single data group.
  – SSEM: A method to differentiate between the primary and secondary effects of a perturbation on a gene expression dataset.
  – Two-sample t-test (Student's t-test)
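For reference, the two-sample Student's t statistic used as a baseline can be computed as below. This is a minimal sketch with pooled variance, assuming two lists of expression values for one gene (for example, control and non-control samples); the function name and the toy numbers are hypothetical, and the slides do not say how the baseline was configured.

```python
import math
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    dof = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    # Standard error of the difference in means; zero if both samples are constant.
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, dof

# Toy expression values for a single gene.
t_stat, dof = student_t([1, 2, 3], [2, 3, 4])
```

A gene with a large absolute t statistic would be called DE by this baseline, which looks at each gene in isolation and ignores the regulatory network.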


Conclusions

• Our method could find primarily affected genes with high accuracy.

• It achieved significantly better accuracy than SMRF, SSEM, and Student's t-test.

• Our method produces a probability distribution rather than a fixed binary decision.


Acknowledgement

This work was supported partially by NSF under grants CCF-0829867 and IIS-0845439.


Thank you!