Network Inference Umer Zeeshan Ijaz 1

Upload: james-drake

Post on 28-Mar-2015


Page 1: Network Inference Umer Zeeshan Ijaz 1. Overview Introduction Application Areas cDNA Microarray EEG/ECoG Network Inference Pair-wise Similarity Measures

Network Inference

Umer Zeeshan Ijaz


Page 2

Overview

• Introduction
• Application Areas
    • cDNA Microarray
    • EEG/ECoG
• Network Inference
    • Pair-wise Similarity Measures
        • Cross-correlation (STATIC)
        • Coherence (STATIC)
    • Autoregressive
        • Granger Causality (STATIC)
    • Probabilistic Graphical Models
        • Directed
            • Kalman-filtering based EM algorithm (STATIC)
        • Undirected
            • Kernel-weighted logistic regression method (DYNAMIC)
            • Graphical Lasso-model (STATIC)

Page 3

Introduction

Page 4

cDNA Microarray

Page 5

ECoG/EEG

Page 6

Cross-correlation based (1)

For a pair of time series $x_i[t]$ and $x_j[t]$ of length $n$, the sample cross-correlation at lag $\tau$ is

\[ C_{ij}[\tau] = \frac{1}{\hat\sigma_i \hat\sigma_j (n - 2\tau)} \sum_{t=1}^{n-\tau} (x_i[t] - \bar{x}_i)(x_j[t+\tau] - \bar{x}_j) \]

where $\bar{x}_i$ and $\hat\sigma_i$ are the sample mean and standard deviation of $x_i[t]$.

Measure of coupling is the maximum cross-correlation:

\[ s_{ij} = \max_{\tau} |C_{ij}[\tau]| \]

The variance of the cross-correlation at lag $l$ is estimated as

\[ \mathrm{var}(C_{ij}[l]) = \frac{1}{n - l} \sum_{\tau=-n}^{n} C_{ii}[\tau] \, C_{jj}[\tau] \]

\[ z_{ij} = \frac{s_{ij}}{\sqrt{\mathrm{var}(C_{ij}[\hat{l}_{ij}])}} \]

is the scaled value, where $\hat{l}_{ij}$ is the lag corresponding to $s_{ij}$.

Use a p-value test to compare $z_{ij}$ with a standard normal distribution with mean zero and variance 1.
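The coupling measure on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenter's code: the function names, the `max_lag` search window, and the synthetic test signal are assumptions introduced here.

```python
import numpy as np

def cross_correlation(xi, xj, tau):
    """Sample cross-correlation C_ij[tau]; negative lags use C_ij[-tau] = C_ji[tau]."""
    if tau < 0:
        return cross_correlation(xj, xi, -tau)
    n = len(xi)
    di = xi - xi.mean()
    dj = xj - xj.mean()
    total = np.sum(di[: n - tau] * dj[tau:])
    return total / (xi.std() * xj.std() * (n - 2 * tau))

def max_cross_correlation(xi, xj, max_lag):
    """Return (s_ij, lag): the largest |C_ij[tau]| over |tau| <= max_lag."""
    lags = list(range(-max_lag, max_lag + 1))
    values = [abs(cross_correlation(xi, xj, tau)) for tau in lags]
    best = int(np.argmax(values))
    return values[best], lags[best]

# Hypothetical example: x_j is x_i delayed by three samples.
rng = np.random.default_rng(0)
x_i = rng.standard_normal(500)
x_j = np.roll(x_i, 3)
s_ij, lag = max_cross_correlation(x_i, x_j, max_lag=10)
```

Restricting the search to a window of lags is a practical choice made for the sketch; the slide itself does not state a range for the lag.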

Page 7

Cross-correlation based (2)

Significance test: ANALYTIC METHOD

Use Fisher Transformation:

\[ C^F_{ij}[\tau] = \frac{1}{2} \ln \frac{1 + C_{ij}[\tau]}{1 - C_{ij}[\tau]} \]

The resulting distribution is normal and has the standard deviation of $1/\sqrt{n-3}$.

Use the scaled value

\[ z^F_{ij} = \frac{s^F_{ij}}{\sqrt{\mathrm{var}(C^F_{ij}[\hat{l}])}} \]

that is expected to behave like the maximum of the absolute value of a sequence of random numbers. Using now established results for statistics of this form, we obtain therefore that

\[ P[z] = \exp\{-2 \exp[-a_n (z - b_n)]\}, \quad \text{where } P[z] = \Pr\{z^F_{ij} \le z\} \]

\[ a_n = \sqrt{2 \ln n}, \qquad b_n = a_n - (2 a_n)^{-1} (\ln \ln n + \ln 4\pi) \]

*M. A. Kramer, U. T. Eden, S. S. Cash, E. D. Kolaczyk, Network inference with confidence from multivariate time series. Physical Review E 79, 061916, 2009
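As a rough sketch of the analytic test, the Fisher transformation and the extreme-value distribution $P[z]$ can be evaluated directly. The function names are assumptions made for illustration, and the p-value is taken here as $1 - P[z]$.

```python
import math

def fisher_transform(c):
    """Fisher transformation of a correlation value c in (-1, 1)."""
    return 0.5 * math.log((1.0 + c) / (1.0 - c))

def extreme_value_cdf(z, n):
    """P[z] = exp{-2 exp[-a_n (z - b_n)]} for a series of length n (n >= 3)."""
    a_n = math.sqrt(2.0 * math.log(n))
    b_n = a_n - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * a_n)
    return math.exp(-2.0 * math.exp(-a_n * (z - b_n)))

def analytic_p_value(z, n):
    """Probability of a scaled maximum at least as large as z under the null."""
    return 1.0 - extreme_value_cdf(z, n)

p_small = analytic_p_value(5.0, 100)   # strongly scaled maximum
p_large = analytic_p_value(2.0, 100)   # weakly scaled maximum
```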

Page 8

Cross-correlation based (3)

Significance test: FREQUENCY DOMAIN BOOTSTRAP METHOD

1) Compute the power spectrum (Hanning tapered) of each series and average these power spectra over all the time series to obtain $\bar{p}[\omega]$

2) Compute the standardized and whitened residuals for each time series:

\[ e_i[t] = \mathcal{F}^{-1}\!\left( \tilde{x}_i[\omega] / \sqrt{\bar{p}[\omega]} \right), \quad \text{where } \tilde{x}_i[\omega] = \mathcal{F}(x_i[t]) \]

3) For each bootstrap replicate, RESAMPLE WITH REPLACEMENT and compute the surrogate data:

\[ \hat{x}_i[t] = \mathcal{F}^{-1}\!\left( \tilde{e}_i[\omega] \sqrt{\bar{p}[\omega]} \right) \]

where $\tilde{e}_i[\omega]$ is the Fourier transform of the residual $e_i[t]$ resampled with replacement

4) Compute $N_s$ such instances and calculate the maximum cross-correlation $\hat{s}_{ij}$ for each pair of nodes i and j

5) Finally compare $s_{ij}$ with the bootstrap distribution of $\hat{s}_{ij}$ and assign a p-value
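Steps 1 to 3 of the bootstrap can be sketched as follows. This is an illustrative reading of the procedure under stated assumptions: the function name, the small constant added to avoid division by zero, and the example data are all introduced here, and residuals are whitened by dividing by the square root of the averaged power and re-coloured by multiplying with it.

```python
import numpy as np

def bootstrap_surrogate(x, rng):
    """Return one surrogate data set for x of shape (N series, n samples)."""
    N, n = x.shape
    x = x - x.mean(axis=1, keepdims=True)
    # Step 1: Hanning-tapered power spectra, averaged over all series.
    spectra = np.abs(np.fft.fft(x * np.hanning(n), axis=1)) ** 2
    root_p = np.sqrt(spectra.mean(axis=0)) + 1e-12  # assumption: guard against zeros
    # Step 2: whitened, standardized residuals.
    e = np.real(np.fft.ifft(np.fft.fft(x, axis=1) / root_p, axis=1))
    e = (e - e.mean(axis=1, keepdims=True)) / e.std(axis=1, keepdims=True)
    # Step 3: resample residuals with replacement, then restore the spectrum.
    idx = rng.integers(0, n, size=(N, n))
    e_star = np.take_along_axis(e, idx, axis=1)
    return np.real(np.fft.ifft(np.fft.fft(e_star, axis=1) * root_p, axis=1))

rng = np.random.default_rng(1)
data = rng.standard_normal((4, 256))        # hypothetical recordings
surrogate = bootstrap_surrogate(data, rng)
```

Repeating the call $N_s$ times and recording the maximum cross-correlation of each pair in every surrogate gives the bootstrap distribution used in steps 4 and 5.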

Page 9

Cross-correlation based (4)

False Detection Rate Test

1) Order the $m = N(N-1)/2$ p-values: $p_1 \le p_2 \le \dots \le p_m$

2) Choose FDR level $q$

3) Compare each $p_i$ to the critical value $q \cdot i / m$ and find the maximum $i$, denoted $k$, such that $p_k \le q \cdot k / m$

4) We reject the null hypothesis that time series $x_i[t]$ and $x_j[t]$ are uncoupled for $p_1 \le p_2 \le \dots \le p_k$

*M. A. Kramer, U. T. Eden, S. S. Cash, and E. D. Kolaczyk. Network inference with confidence from multivariate time series, Physical Review E 79(061916), 1-13, 2009
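The four steps of the test translate into a short routine. The sketch below is illustrative; the function name and the example p-values are assumptions, not values from the talk.

```python
import numpy as np

def fdr_reject(p_values, q):
    """Boolean mask of rejected null hypotheses at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # step 1: order the p-values
    critical = q * np.arange(1, m + 1) / m     # step 3: critical values q*i/m
    below = np.nonzero(p[order] <= critical)[0]
    reject = np.zeros(m, dtype=bool)
    if below.size:
        k = below[-1]                          # largest index with p_k <= q*k/m
        reject[order[: k + 1]] = True          # step 4: reject p_1 ... p_k
    return reject

# Hypothetical p-values for five node pairs.
mask = fdr_reject([0.001, 0.8, 0.012, 0.04, 0.5], q=0.05)
```

Each rejected pair corresponds to an edge kept in the inferred network.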

Page 10

Coherence based

Coherence: a value of 1 at frequency $f$ means the signals are fully correlated with a constant phase shift, although they may differ in amplitude

\[ |C_{xy}(f)|^2 = \frac{|S_{xy}(f)|^2}{S_{xx}(f) \, S_{yy}(f)} \]

Cross-phase spectrum: Provides information on time-relationships between two signals as a function of frequency. Phase displacement may be converted into time displacement

\[ \Phi(f) = \arctan\!\left( \frac{\mathrm{Im}\, S_{xy}(f)}{\mathrm{Re}\, S_{xy}(f)} \right) \]
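A minimal sketch of both quantities, assuming the spectra are estimated by averaging Hanning-windowed, non-overlapping segments (the slide does not say how $S_{xx}$, $S_{yy}$ and $S_{xy}$ are estimated; the function name and `nperseg` are illustrative):

```python
import numpy as np

def coherence_and_phase(x, y, nperseg=256):
    """Squared coherence and cross-phase spectrum from segment-averaged spectra."""
    nseg = len(x) // nperseg
    win = np.hanning(nperseg)
    X = np.fft.rfft(x[: nseg * nperseg].reshape(nseg, nperseg) * win, axis=1)
    Y = np.fft.rfft(y[: nseg * nperseg].reshape(nseg, nperseg) * win, axis=1)
    Sxx = np.mean(np.abs(X) ** 2, axis=0)
    Syy = np.mean(np.abs(Y) ** 2, axis=0)
    Sxy = np.mean(np.conj(X) * Y, axis=0)
    coherence = np.abs(Sxy) ** 2 / (Sxx * Syy)
    phase = np.arctan2(Sxy.imag, Sxy.real)
    return coherence, phase

rng = np.random.default_rng(2)
x = rng.standard_normal(2048)
coh_scaled, phase_scaled = coherence_and_phase(x, 2.0 * x)      # amplitude differs only
coh_indep, _ = coherence_and_phase(x, rng.standard_normal(2048))
```

The scaled copy illustrates the statement above: a pure amplitude difference leaves the coherence at 1 and the phase at 0.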

Page 11

Coherence based(2)

*S. Weiss, and H. M. Mueller. The contribution of EEG coherence to the investigation of language, Brain and Language 85(2), 325-343, 2003

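Page 12

Granger Causality

Directed Transfer Function: Directional influences between any given pair of channels in a multivariate data set

Bivariate autoregressive process

\[ X_1(t) = \sum_{j=1}^{p} A_{11}(j) X_1(t-j) + \sum_{j=1}^{p} A_{12}(j) X_2(t-j) + E_1(t) \]

\[ X_2(t) = \sum_{j=1}^{p} A_{21}(j) X_1(t-j) + \sum_{j=1}^{p} A_{22}(j) X_2(t-j) + E_2(t) \]

If the variance of the prediction error is reduced by the inclusion of the other series, then, based on Granger causality, one series depends on the other. Now taking the Fourier transform

\[ \begin{pmatrix} A_{11}(f) & A_{12}(f) \\ A_{21}(f) & A_{22}(f) \end{pmatrix} \begin{pmatrix} X_1(f) \\ X_2(f) \end{pmatrix} = \begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix} \]

\[ A_{lm}(f) = \delta_{lm} - \sum_{j=1}^{p} A_{lm}(j) e^{-i 2 \pi f j}, \qquad \delta_{lm} = \begin{cases} 1 & \text{when } l = m \\ 0 & \text{when } l \ne m \end{cases} \]

\[ \begin{pmatrix} X_1(f) \\ X_2(f) \end{pmatrix} = \begin{pmatrix} A_{11}(f) & A_{12}(f) \\ A_{21}(f) & A_{22}(f) \end{pmatrix}^{-1} \begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix} = \begin{pmatrix} H_{11}(f) & H_{12}(f) \\ H_{21}(f) & H_{22}(f) \end{pmatrix} \begin{pmatrix} E_1(f) \\ E_2(f) \end{pmatrix} \]

where

\[ \begin{pmatrix} H_{11}(f) & H_{12}(f) \\ H_{21}(f) & H_{22}(f) \end{pmatrix} = \begin{pmatrix} A_{11}(f) & A_{12}(f) \\ A_{21}(f) & A_{22}(f) \end{pmatrix}^{-1} \]

Granger causality from channel j to i:

\[ I^2_{j \to i}(f) = |H_{ij}(f)|^2 = \frac{|A_{ij}(f)|^2}{|\mathbf{A}(f)|^2} \]

where $|\mathbf{A}(f)|$ is the determinant of the matrix $\mathbf{A}(f)$.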
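Given fitted autoregressive coefficients, the frequency-domain measure above can be evaluated as in this sketch. The coefficient values and the function name are hypothetical; fitting the AR model itself is not shown.

```python
import numpy as np

def granger_spectrum(coeffs, f):
    """Return (I2, A_f) at frequency f (cycles per sample).

    coeffs has shape (p, 2, 2) and holds the lag matrices A(1), ..., A(p).
    I2[i, j] is the influence from channel j to channel i.
    """
    p = coeffs.shape[0]
    lags = np.arange(1, p + 1)
    phase = np.exp(-2j * np.pi * f * lags)
    A_f = np.eye(2) - np.tensordot(phase, coeffs, axes=1)   # A_lm(f)
    H_f = np.linalg.inv(A_f)                                # transfer matrix H(f)
    return np.abs(H_f) ** 2, A_f

# Hypothetical order-1 model: channel 1 drives channel 2, not the reverse.
coeffs = np.array([[[0.5, 0.0],
                    [0.4, 0.3]]])
I2, A_f = granger_spectrum(coeffs, f=0.1)
```

With these coefficients `I2[1, 0]` is positive and `I2[0, 1]` vanishes, matching the one-directional coupling built into the example.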

Page 13

Kalman Filter

- State Space Model (State Variable Model; State Evolution Model)

State Equation:

\[ x_{k+1} = A x_k + w_k, \qquad x_k \in \mathbb{R}^{N \times 1}, \qquad E[w_k w_k^T] = Q_k \in \mathbb{R}^{N \times N} \]

Measurement Equation:

\[ g_k = C x_k + v_k, \qquad g_k \in \mathbb{R}^{M \times 1}, \qquad E[v_k v_k^T] = R_k \in \mathbb{R}^{M \times M} \]

Measurement Update (Filtering):

\[ K_k = P_{k|k-1} C^T [C P_{k|k-1} C^T + R_k]^{-1} \]

\[ x_{k|k} = x_{k|k-1} + K_k [g_k - C x_{k|k-1}] \]

\[ P_{k|k} = (I_N - K_k C) P_{k|k-1} \]

Time Update (Prediction):

\[ x_{k+1|k} = A x_{k|k} \]

\[ P_{k+1|k} = A P_{k|k} A^T + Q_k \]

where

\[ E[(x_k - x_{k|k})(x_k - x_{k|k})^T] = P_{k|k} \]
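One filtering cycle, a measurement update followed by a time update, can be written as a single function. This is a minimal sketch; the function name and the scalar example are illustrative.

```python
import numpy as np

def kalman_step(x_pred, P_pred, g, A, C, Q, R):
    """One cycle: measurement update of (x_pred, P_pred) with g, then prediction."""
    # Measurement update (filtering)
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    x_filt = x_pred + K @ (g - C @ x_pred)
    P_filt = (np.eye(len(x_pred)) - K @ C) @ P_pred
    # Time update (prediction)
    x_next = A @ x_filt
    P_next = A @ P_filt @ A.T + Q
    return x_filt, P_filt, x_next, P_next

# Scalar example with N = M = 1 (values chosen for easy checking).
one = np.array([[1.0]])
x_filt, P_filt, x_next, P_next = kalman_step(
    x_pred=np.array([0.0]), P_pred=one, g=np.array([2.0]),
    A=one, C=one, Q=np.array([[0.0]]), R=one)
```

In the scalar example the gain is 0.5, so the filtered state lies halfway between the prediction 0 and the measurement 2.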

Page 14

Probabilistic graphical models(1)

Joint distribution over a set $X = \{X_1, \dots, X_n\}$

Bayesian Networks associate with each variable $X_i$ a conditional probability $P(X_i \mid U_i)$, $U_i \subset X$

The resulting product is of the form

\[ P(X_1, \dots, X_n) = \prod_i P(X_i \mid U_i) \]

Example graph: nodes A, B, C, D, E, with edges A→C, B→C, A→D, C→E (as implied by the factorization below)

P(C|A,B)

A  B  |  C=0    C=1
0  0  |  0.9    0.1
0  1  |  0.2    0.8
1  0  |  0.9    0.1
1  1  |  0.01   0.99

\[ P(A, B, C, D, E) = P(A) \, P(B) \, P(C \mid A, B) \, P(D \mid A) \, P(E \mid C) \]
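The factorization can be checked numerically on the sub-network A, B, C. The table P(C|A,B) is the one from the slide; the priors P(A) and P(B) are hypothetical values chosen only to make the example runnable.

```python
import itertools

P_A = {0: 0.7, 1: 0.3}    # hypothetical prior, not from the slide
P_B = {0: 0.4, 1: 0.6}    # hypothetical prior, not from the slide
P_C_given_AB = {          # conditional probability table from the slide
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.2, 1: 0.8},
    (1, 0): {0: 0.9, 1: 0.1},
    (1, 1): {0: 0.01, 1: 0.99},
}

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(a) P(b) P(c | a, b)."""
    return P_A[a] * P_B[b] * P_C_given_AB[(a, b)][c]

total = sum(joint(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
p_c1 = sum(joint(a, b, 1) for a, b in itertools.product((0, 1), repeat=2))
```

Summing the joint over A and B gives the marginal P(C=1), which is how any query on the network reduces to sums of products of the local tables.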

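Page 15

EM Algorithm: Predicting gene regulatory network

State-space model without inputs:

\[ x_{t+1} = A x_t + w_t \]

\[ g_t = C x_t + v_t \]

State-space model with inputs:

\[ x_{t+1} = A x_t + B g_t + w_t \]

\[ g_t = C x_t + D g_{t-1} + v_t \]

Define new state vector $\tilde{x}_t = [x_t^T, g_t^T]^T$. Then the new state space form is

\[ \tilde{x}_{t+1} = A_0 \tilde{x}_t + \tilde{w}_t \]

\[ g_t = H \tilde{x}_t \]

where

\[ A_0 = \begin{pmatrix} A & B \\ CA & CB + D \end{pmatrix}, \qquad H = (0 \;\; I), \qquad \tilde{w}_t = \begin{pmatrix} w_t \\ C w_t + v_{t+1} \end{pmatrix}, \qquad \tilde{Q} = \begin{pmatrix} Q & Q C^T \\ C Q & C Q C^T + R \end{pmatrix} \]

Constructing the network:

\[ H_0: (CB + D)_{i,j} = 0 \quad \text{(No connection)} \]

\[ H_1: (CB + D)_{i,j} \ne 0 \quad \text{(Connection)} \]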
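The augmented form can be verified numerically: stacking hidden states and observations into one vector and applying $A_0$ should reproduce one step of the original model. The sketch below uses random matrices, noise-free dynamics, and illustrative dimensions, all of which are assumptions for the check.

```python
import numpy as np

def augment(A, B, C, D):
    """Build A_0 and H for the stacked state [x_t; g_t]."""
    K, p = A.shape[0], D.shape[0]
    A0 = np.block([[A, B], [C @ A, C @ B + D]])
    H = np.hstack([np.zeros((p, K)), np.eye(p)])
    return A0, H

rng = np.random.default_rng(3)
K, p = 2, 3                                  # hidden states, observed genes
A, B = rng.standard_normal((K, K)), rng.standard_normal((K, p))
C, D = rng.standard_normal((p, K)), rng.standard_normal((p, p))
x_t, g_t = rng.standard_normal(K), rng.standard_normal(p)

# One noise-free step of the original model.
x_next = A @ x_t + B @ g_t
g_next = C @ x_next + D @ g_t

A0, H = augment(A, B, C, D)
stacked_next = A0 @ np.concatenate([x_t, g_t])
```

The block $CB + D$ in the lower right of `A0` is the gene-to-gene interaction matrix tested for non-zero entries when constructing the network.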

Page 16

EM Algorithm: Predicting gene regulatory network(2)

\[ x_{t+1} = A x_t + B g_t + w_t \]

\[ g_t = C x_t + D g_{t-1} + v_t \]

B and D are the input to state and observation matrices, respectively

Conditional distribution of state and observables

\[ P(x_{t+1} \mid x_t, g_t) \sim \mathcal{N}(A x_t + B g_t, \; Q) \]

\[ P(g_t \mid x_t, g_{t-1}) \sim \mathcal{N}(C x_t + D g_{t-1}, \; R) \]

\[ P(x_{t+1} \mid x_t, g_t) = \frac{\exp\{ -\tfrac{1}{2} (x_{t+1} - A x_t - B g_t)^T Q^{-1} (x_{t+1} - A x_t - B g_t) \}}{(2\pi)^{K/2} |Q|^{1/2}} \]

\[ P(g_t \mid x_t, g_{t-1}) = \frac{\exp\{ -\tfrac{1}{2} (g_t - C x_t - D g_{t-1})^T R^{-1} (g_t - C x_t - D g_{t-1}) \}}{(2\pi)^{p/2} |R|^{1/2}} \]

Factorization rule for Bayesian network

\[ P(\{x_t\}, \{g_t\} \mid \theta) = P(x_1) \prod_{t=1}^{T-1} P(x_{t+1} \mid x_t, g_t) \prod_{t=1}^{T} P(g_t \mid x_t, g_{t-1}) \]

Unknowns in the system

\[ \theta = \{A, B, C, D, Q, R, Q_1, \mu_1\} \]

Page 17

EM Algorithm: Predicting gene regulatory network(4)

Construct the likelihood

\[ L(\theta; x, g) = \log P(x, g \mid \theta) \]

\[ \hat\theta = \arg\max_{\theta} L(\theta; x, g) \]

Marginalize with respect to x and introduce a distribution Q, which gives a lower bound

\[ L(\theta; x, g) \ge F(Q(x), \theta) \]

\[ \text{E step: } Q_{k+1} = \arg\max_{Q} F(Q, \theta_k) \]

\[ \text{M step: } \theta_{k+1} = \arg\max_{\theta} F(Q_{k+1}, \theta) \]

where $Q_{k+1}(x) = P(x \mid g, \theta_k)$

Page 18

Kalman filter based: Inferring network from microarray expression data(5)

\[ \theta = \{A, B, C, D, Q, R, Q_1, \mu_1\} \]

Let's say we want to compute C:

\[ \frac{\partial}{\partial C} \left[ -2 \log P(\{x_t\}, \{g_t\} \mid \theta) \right] = \frac{\partial}{\partial C} \sum_{t} \left[ (g_t - C x_t - D g_{t-1})^T R^{-1} (g_t - C x_t - D g_{t-1}) \right] \]

Setting the derivative to zero:

\[ \sum_{t} \left( -g_t x_t^T + C x_t x_t^T + D g_{t-1} x_t^T \right) = 0 \]

Taking expectations

\[ C = \left( \sum_{t} g_t \hat{x}_t^T - D \sum_{t} g_{t-1} \hat{x}_t^T \right) \left( \sum_{t} P_t \right)^{-1} \]

We can repeat the same procedure for D and obtain the other parameters, but when we do so we will require

\[ \hat{x}_t = E[x_t \mid \{g\}], \qquad P_t = E[x_t x_t^T \mid \{g\}], \qquad P_{t,t-1} = E[x_t x_{t-1}^T \mid \{g\}] \]

which we will obtain from the KALMAN-SMOOTHER
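The closed-form update for C can be exercised on synthetic data. In this sketch the smoothed moments are replaced by exact values from a noise-free simulation, which is an assumption made so that the true C is known; in practice $\hat{x}_t$ and $P_t$ come from the Kalman smoother. All names are illustrative.

```python
import numpy as np

def update_C(g, g_prev, x_hat, P, D):
    """M-step update for C.

    g, g_prev: (T, p) observations g_t and g_{t-1}
    x_hat:     (T, K) smoothed means E[x_t | {g}]
    P:         (T, K, K) smoothed second moments E[x_t x_t^T | {g}]
    """
    cross = g.T @ x_hat - D @ (g_prev.T @ x_hat)   # sum_t (g_t - D g_{t-1}) x_t^T
    return cross @ np.linalg.inv(P.sum(axis=0))

rng = np.random.default_rng(4)
T, K, p = 50, 2, 3
C_true = rng.standard_normal((p, K))
D = rng.standard_normal((p, p))
x = rng.standard_normal((T, K))
g_prev = rng.standard_normal((T, p))
g = x @ C_true.T + g_prev @ D.T                    # g_t = C x_t + D g_{t-1}, no noise
P = np.einsum("ti,tj->tij", x, x)                  # exact second moments x_t x_t^T
C_est = update_C(g, g_prev, x, P, D)
```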

Page 19

Experimental Results: A standard T-Cell activation model

*Claudia Rangel, John Angus, Zoubin Ghahramani, Maria Lioumi, Elizabeth Sotheran, Alessia Gaiba, David L. Wild, Francesco Falciani: Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics 20(9): 1361-1372 (2004)

Kalman filter based: Inferring network from microarray expression data(9)

Page 20

Probabilistic graphical models(2)

Markov Networks represent joint distribution as a product of potentials

\[ P(X_1, \dots, X_n) = \frac{1}{Z} \prod_j \pi_j[c_j] \]

Example graph: nodes A, B, C, D, E

A  B  |  π1(A,B)
0  0  |  1.0
0  1  |  0.5
1  0  |  0.5
1  1  |  2.0

\[ P(A, B, C, D, E) = \frac{1}{Z} \, \pi_1(A, B) \, \pi_2(B, C, E) \, \pi_3(C, D) \, \pi_4(A, D) \]

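Page 21

Kernel-weighted logistic regression method(1)

Figure: example graph with nodes x1, x2, x3, x4, x5, x6, x8

Pair-wise Markov Random Field

\[ P_{\theta^{(t)}}(X^{(t)}) = \frac{1}{Z(\theta^{(t)})} \exp\!\left( \sum_{(u,v) \in E^{(t)}} \theta^{(t)}_{uv} X^{(t)}_u X^{(t)}_v \right) \]

Logistic Function

\[ P_{\theta^{(t)}_{\setminus u}}(X^{(t)}_u \mid X^{(t)}_{\setminus u}) = \frac{\exp\!\left( 2 X^{(t)}_u \langle \theta^{(t)}_{\setminus u}, X^{(t)}_{\setminus u} \rangle \right)}{\exp\!\left( 2 X^{(t)}_u \langle \theta^{(t)}_{\setminus u}, X^{(t)}_{\setminus u} \rangle \right) + 1} \]

Log Likelihood

\[ \gamma(\theta^{(t)}_{\setminus u}; X^{(t)}) = \log P_{\theta^{(t)}_{\setminus u}}(X^{(t)}_u \mid X^{(t)}_{\setminus u}) \]

Optimization problem

\[ \hat\theta^{(t)}_{\setminus u} = \arg\min_{\theta^{(t)}_{\setminus u} \in \mathbb{R}^{p-1}} \left\{ -\sum_{i=1}^{n} w^{(t)}(t_i) \, \gamma(\theta^{(t)}_{\setminus u}; X^{(t_i)}) + \lambda_1 \|\theta^{(t)}_{\setminus u}\|_1 \right\} \]

where $\theta^{(t)}_{\setminus u} = \{\theta^{(t)}_{uv} \mid v \in V \setminus u\}$ is $(p-1)$-dimensional and $\langle a, b \rangle = a^T b$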
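The neighbourhood estimate for one node can be sketched with a plain proximal-gradient (soft-thresholding) loop. This is not the solver used by the authors of the method: the optimizer, the Gaussian kernel $\exp(-(t - t_i)^2 / h)$ for the weights, the step size, and the synthetic data are all assumptions made for illustration. Variables are coded as ±1.

```python
import numpy as np

def kernel_weights(times, t, h):
    """Normalized weights w^(t)(t_i), assuming a Gaussian kernel of bandwidth h."""
    k = np.exp(-((times - t) ** 2) / h)
    return k / k.sum()

def neighborhood(X, u, w, lam, step=0.1, iters=500):
    """L1-penalized, kernel-weighted logistic regression of node u on the rest."""
    y = X[:, u]
    Z = np.delete(X, u, axis=1)
    theta = np.zeros(Z.shape[1])
    for _ in range(iters):
        margin = 2.0 * y * (Z @ theta)
        # Gradient of -sum_i w_i * log sigmoid(margin_i)
        grad = -(w * 2.0 * y / (1.0 + np.exp(margin))) @ Z
        theta = theta - step * grad
        # Soft-thresholding handles the L1 penalty
        theta = np.sign(theta) * np.maximum(np.abs(theta) - step * lam, 0.0)
    return theta

# Hypothetical data: node 1 copies node 0 90% of the time, node 2 is independent.
rng = np.random.default_rng(5)
n = 400
x0 = rng.choice([-1.0, 1.0], size=n)
x1 = np.where(rng.random(n) < 0.9, x0, -x0)
x2 = rng.choice([-1.0, 1.0], size=n)
X = np.column_stack([x0, x1, x2])
w = kernel_weights(np.linspace(0.0, 1.0, n), t=0.5, h=10.0)
theta_hat = neighborhood(X, u=0, w=w, lam=0.1)
```

Non-zero entries of `theta_hat` are the estimated neighbours of node u at time t; repeating the fit for every node and every time point of interest gives the time-varying network.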

Page 22

Kernel-weighted logistic regression method(2)

\[ \hat\theta^{(t)}_{\setminus u} = \arg\min_{\theta^{(t)}_{\setminus u} \in \mathbb{R}^{p-1}} \left\{ -\sum_{i=1}^{n} w^{(t)}(t_i) \, \gamma(\theta^{(t)}_{\setminus u}; X^{(t_i)}) + \lambda_1 \|\theta^{(t)}_{\setminus u}\|_1 \right\} \]

Page 23

Kernel-weighted logistic regression method(3)

Interaction between gene ontological groups related to developmental process undergoing dynamic rewiring. The weight of an edge between two ontological groups is the total number of connections between genes in the two groups. In the visualization, the width of an edge is proportional to the edge weight. The edge weight is thresholded at 30 so that only those interactions exceeding this number are displayed. The average network on the left is produced by averaging the networks on the right. In this case, the threshold is set to 20

*L. Song, M. Kolar, and E. P. Xing. KELLER: estimating time-varying interactions between genes. Bioinformatics 25, i128-i136, 2009

Page 24

Graphical Lasso Model(1)

Let $Z = (Z_1, \dots, Z_p)^T$ be a p-variate Gaussian distribution with mean $\mu$ and covariance matrix $\Sigma$

- Key Idea: If the $ij$th component of $\Sigma^{-1}$ is zero then $Z_i$ and $Z_j$ are conditionally independent

\[ \hat\Theta = \arg\max_{\Theta \succ 0} \; \log\det\Theta - \mathrm{tr}(S\Theta) - \rho \|\Theta\|_1 \quad \text{is an estimator of } \Sigma^{-1} \]

Empirical covariance matrix

\[ S = \frac{1}{n} \sum_{k=1}^{n} (Z^{(k)} - \bar{Z})(Z^{(k)} - \bar{Z})^T \]

\[ W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^T & s_{22} \end{pmatrix} \]

W is an estimator of $\Sigma$

\[ \min_{\beta} \left\{ \frac{1}{2} \| W_{11}^{1/2} \beta - b \|^2 + \rho \|\beta\|_1 \right\} \]

where $b = W_{11}^{-1/2} s_{12}$, and after solving for $\beta$ we can obtain $w_{12} = W_{11} \beta$.

*O. Banerjee, L. E. Ghaoui, A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research 9, 485-516, 2008

Page 25

Graphical Lasso Model(2)

Initialise: $W = S + \rho I$

Solve the lasso problem for $w_{12}$ over the $j$th column, one column at a time

Illustration for p = 5: the rows and columns of S and W are permuted so that each variable in turn occupies the last row and column, and the partition below is re-formed for that variable

\[ W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix}, \qquad S = \begin{pmatrix} S_{11} & s_{12} \\ s_{12}^T & s_{22} \end{pmatrix} \]

When converged, we can recover $\hat\Theta = W^{-1}$ from knowing that

\[ \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^T & w_{22} \end{pmatrix} \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^T & \theta_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0^T & 1 \end{pmatrix} \]

In such case, we can use the prestored matrices $\hat\beta$ and $W$ as follows:

\[ \hat\theta_{12} = -\hat\beta \, \hat\theta_{22} \]

\[ \hat\theta_{22} = 1 / (w_{22} - w_{12}^T \hat\beta) \]

\[ \mathrm{tr}(W^{-1} S) - p + \rho \|W^{-1}\|_1 \]

*O. Banerjee, L. E. Ghaoui, A. d'Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. Journal of Machine Learning Research 9, 485-516, 2008
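The column-by-column procedure can be sketched as follows. This is a minimal illustration rather than the software mentioned in the talk: the inner lasso is solved by coordinate descent on the equivalent quadratic form $\tfrac{1}{2}\beta^T W_{11} \beta - s_{12}^T \beta + \rho\|\beta\|_1$, and the iteration counts, tolerance, and example matrix are assumptions.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def graphical_lasso(S, rho, n_sweeps=100, n_inner=100, tol=1e-8):
    """Return (W, Theta): estimates of the covariance and its inverse."""
    p = S.shape[0]
    W = S + rho * np.eye(p)                 # initialise W = S + rho * I
    B = np.zeros((p, p))                    # column j stores beta for variable j
    for _sweep in range(n_sweeps):
        W_old = W.copy()
        for j in range(p):
            rest = np.arange(p) != j
            W11, s12 = W[np.ix_(rest, rest)], S[rest, j]
            beta = B[rest, j].copy()
            for _it in range(n_inner):      # coordinate descent for the lasso
                for k in range(p - 1):
                    r = s12[k] - W11[k] @ beta + W11[k, k] * beta[k]
                    beta[k] = soft(r, rho) / W11[k, k]
            B[rest, j] = beta
            W[rest, j] = W[j, rest] = W11 @ beta     # w12 = W11 beta
        if np.abs(W - W_old).max() < tol:
            break
    Theta = np.zeros((p, p))
    for j in range(p):                      # recover Theta from stored betas
        rest = np.arange(p) != j
        theta22 = 1.0 / (W[j, j] - W[rest, j] @ B[rest, j])
        Theta[j, j] = theta22
        Theta[rest, j] = -B[rest, j] * theta22
    return W, Theta

S = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])             # hypothetical empirical covariance
W_dense, Theta_dense = graphical_lasso(S, rho=0.0)
W_sparse, Theta_sparse = graphical_lasso(S, rho=1.0)
```

With `rho=0.0` the estimate reduces to the ordinary inverse of S, and with a penalty larger than every off-diagonal entry of S all edges are removed, which brackets the behaviour of the method between no sparsity and an empty graph.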

Page 26

Graphical Lasso Model(3)

*Software under development @ Oxford Complex Systems Group with Nick Jones

*Results shown for Google Trend Dataset

Page 27

THE END