

Estimation and Inference for Differential Networks

Mladen Kolar ([email protected])


Collaborators

Byol Kim (U Chicago)

Song Liu (U Bristol)


Motivation

Example: ADHD-200 brain imaging dataset (Biswal et al. 2010)

The ADHD-200 dataset is a collection of resting-state functional MRI of subjects with and without attention deficit hyperactivity disorder.

- subjects: 491 controls, 195 cases diagnosed with ADHD of various types
- for each subject there are between 76 and 276 R-fMRI scans
- focus on 264 voxels as the regions of interest (ROI), extracted by Power et al. (2011)

We are interested in understanding how the structure of the neural network varies with the age of subjects. (Lu, Kolar, and Liu 2017)


Estimated Brain Networks

The differences between junior, median and senior neural networks.

- The green edges only exist in $\hat{G}(11.75)$ and the red edges only exist in $\hat{G}(8)$.
- The green edges only exist in $\hat{G}(20)$ and the red edges only exist in $\hat{G}(11.75)$.

[Figure: estimated networks in coronal, sagittal, and transverse views. (a) $\hat{G}(8)$ vs. $\hat{G}(11.75)$; (b) $\hat{G}(11.75)$ vs. $\hat{G}(20)$.]


Limitations

While we can estimate the networks and their difference, it is hard to quantify uncertainty in the estimates and perform statistical inference.

The goal here is to develop methodology capable of doing so.


Probabilistic Graphical Models

- Graph $G = (V, E)$ with $p$ nodes
- Random vector $X = (X_1, \ldots, X_p)^\top$

Represents conditional independence relationships between nodes

Useful for exploring associations between measured variables

$$(a, b) \notin E \iff X_a \perp X_b \mid X_{\setminus \{a, b\}}$$

$$X_1 \perp X_6 \mid X_2, \ldots, X_5$$

[Figure: an example graph on nodes 1–6.]
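In the Gaussian case this equivalence is concrete: an edge is absent exactly when the corresponding entry of the precision matrix $\Omega = \Sigma^{-1}$ is zero. A small self-contained illustration (the numbers are ours, chosen so that $X_1 \perp X_3$ given $X_2$):

```python
import numpy as np

# In a Gaussian graphical model, (a, b) not in E corresponds exactly
# to a zero entry Omega[a, b] of the precision matrix.
Omega = np.array([[2.0, 0.6, 0.0],
                  [0.6, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])   # X_1 and X_3 conditionally independent
Sigma = np.linalg.inv(Omega)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=50_000)

# The empirical precision matrix recovers the (near-)zero in position (0, 2).
Omega_hat = np.linalg.inv(np.cov(X, rowvar=False))
print(np.round(Omega_hat, 2))
```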


Structure Learning Problem

Given an i.i.d. sample $\mathcal{D}_n = \{x_i\}_{i=1}^n$ from a distribution $P \in \mathcal{P}$, learn the set of conditional independence relationships

$$\hat{G} = \hat{G}(\mathcal{D}_n)$$

[Figure: a graph on nodes 1–6 with unknown edge set.]

Some literature: Yuan and Lin (2007), O. Banerjee, El Ghaoui, and d'Aspremont (2008), Friedman, Hastie, and Tibshirani (2008), T. T. Cai, Liu, and Luo (2011), Meinshausen and Bühlmann (2006), Ravikumar, Wainwright, and Lafferty (2010), Xue, Zou, and Cai (2012), Yang et al. (2012), Yang et al. (2015), Yang et al. (2013), Yang et al. (2014), ...


Difference Estimation


Literature Review: Gaussian Graphical Models

Penalized estimation (B. Zhang and Wang 2012; Danaher, Wang, and Witten 2014)

$$(\hat\Omega_x, \hat\Omega_y) = \arg\max_{\Omega_x \succ 0,\ \Omega_y \succ 0} \sum_{c \in \{x, y\}} \Big( \log |\Omega_c| - \operatorname{tr}\big(\hat\Sigma_c \Omega_c\big) - \lambda \|\Omega_c\|_1 \Big) - \lambda_2 \|\Omega_x - \Omega_y\|_1$$

Direct difference estimation (S. D. Zhao, Cai, and Li 2014)

$$\hat\Delta = \arg\min_{\Delta} \|\Delta\|_1 \quad \text{subject to} \quad \big\| \hat\Sigma_x \Delta \hat\Sigma_y - (\hat\Sigma_y - \hat\Sigma_x) \big\|_\infty \le \lambda$$
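The direct difference program above is a convex problem, so it can be prototyped directly. A minimal sketch using cvxpy (our choice of solver, not the authors' implementation); Sx and Sy are assumed to be the two sample covariance matrices:

```python
import cvxpy as cp
import numpy as np

def direct_difference(Sx, Sy, lam):
    """Sketch of the direct difference estimator of Zhao, Cai, and Li (2014):
    minimize ||Delta||_1 subject to ||Sx Delta Sy - (Sy - Sx)||_max <= lam."""
    p = Sx.shape[0]
    Delta = cp.Variable((p, p))
    residual = Sx @ Delta @ Sy - (Sy - Sx)
    problem = cp.Problem(cp.Minimize(cp.sum(cp.abs(Delta))),   # elementwise l1 norm
                         [cp.max(cp.abs(residual)) <= lam])    # elementwise sup-norm
    problem.solve()
    return Delta.value
```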


Literature Review: Inference for Graphical Models

Inference is available for single group testing:

- Ren et al. (2015), Lu, Kolar, and Liu (2017), J. Wang and Kolar (2016), Barber and Kolar (2015), Yu, Gupta, and Kolar (2016), Chen et al. (2015), Jankova and van de Geer (2015, 2016)

Xia, Cai, and Cai (2015) combines inference results for each precision matrix based on Ren et al. (2015). We will discuss this result more later.


Inference for Differential Markov Random Fields


The Setting

Let $\mathcal{F} = \{f(\,\cdot\,; \theta)\}$ be a parametric family of probability distributions of the form

$$f(y; \theta) = Z(\theta)^{-1} \exp\Big( \sum_{1 \le i \le j \le m} \theta_{ij}\, \psi_{ij}(y_i, y_j) \Big) = Z(\theta)^{-1} \exp\big( \theta^\top \psi(y) \big)$$

Given samples

- $\{x_i\}_{i \in [n_x]}$ from $f(x; \theta_x)$, and
- $\{y_i\}_{i \in [n_y]}$ from $f(y; \theta_y)$,

our goal is to estimate the change

$$\delta_\theta = \theta_x - \theta_y$$
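As a concrete instance, in the pairwise (Ising-type) case with $\psi_{ij}(y_i, y_j) = y_i y_j$, the sufficient statistics stack into the single vector $\psi(y)$ appearing in $\theta^\top \psi(y)$. A minimal sketch of that stacking (the function name is ours):

```python
import numpy as np
from itertools import combinations_with_replacement

def pairwise_psi(y):
    """Stack psi_ij(y_i, y_j) = y_i * y_j over all 1 <= i <= j <= m
    into one vector, matching theta^T psi(y) above."""
    pairs = combinations_with_replacement(range(len(y)), 2)
    return np.array([y[i] * y[j] for i, j in pairs])

# e.g. the sufficient statistics of one 3-node configuration
print(pairwise_psi(np.array([1.0, -1.0, 1.0])))
```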


Density Ratio Approach

Kullback-Leibler importance estimation procedure (KLIEP) (Sugiyama et al. 2008)

$$r(y; \delta_\theta) = \frac{f_x(y)}{f_y(y)} = \left( \frac{Z(\theta_x)}{Z(\theta_y)} \right)^{-1} \exp\big( (\underbrace{\theta_x - \theta_y}_{\delta_\theta})^\top \psi(y) \big)$$

$$\delta_\theta = \arg\min_{\delta_\theta}\, D_{\mathrm{KL}}\big( f_x \,\big\|\, r(\,\cdot\,; \delta_\theta)\, f_y \big) = \arg\min_{\delta_\theta} \; -\mathbb{E}_x\big[ \delta_\theta^\top \psi(x) \big] + \log \mathbb{E}_y\big[ \exp\big( \delta_\theta^\top \psi(y) \big) \big].$$


Sparse Direct Difference Estimation

Given the data, we replace the expectations with the appropriate sample averages to form the empirical KLIEP loss

$$\ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) = -\frac{1}{n_x} \sum_{i=1}^{n_x} \delta_\theta^\top \psi(x_i) + \log\Big[ \frac{1}{n_y} \sum_{j=1}^{n_y} \exp\big( \delta_\theta^\top \psi(y_j) \big) \Big]$$

Under a suitable set of assumptions, the penalized estimator

$$\hat\delta_\theta = \arg\min_{\delta_\theta} \; \ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) + \lambda \|\delta_\theta\|_1$$

can be shown to be consistent (S. Liu et al. 2017; Fazayeli and Banerjee 2016).
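A minimal numpy sketch of this penalized estimator via proximal gradient (ISTA) with soft-thresholding for the $\ell_1$ penalty; psi_x and psi_y are assumed to be precomputed matrices of sufficient statistics (rows $\psi(x_i)$ and $\psi(y_j)$), and the step size and iteration count are illustrative, not tuned:

```python
import numpy as np
from scipy.special import logsumexp

def kliep_loss(delta, psi_x, psi_y):
    """Empirical KLIEP loss; psi_x is (n_x, p), psi_y is (n_y, p)."""
    return -(psi_x @ delta).mean() + logsumexp(psi_y @ delta) - np.log(len(psi_y))

def kliep_grad(delta, psi_x, psi_y):
    # softmax weights: normalized density ratios r(y_j; delta)
    w = np.exp(psi_y @ delta - logsumexp(psi_y @ delta))
    return -psi_x.mean(axis=0) + w @ psi_y

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def penalized_kliep(psi_x, psi_y, lam, step=0.1, n_iter=1000):
    """l1-penalized KLIEP estimate of delta_theta via ISTA."""
    delta = np.zeros(psi_x.shape[1])
    for _ in range(n_iter):
        delta = soft_threshold(delta - step * kliep_grad(delta, psi_x, psi_y),
                               step * lam)
    return delta
```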


Inference For a Low Dimensional Component

Let $\delta_\theta = (\delta_{\theta,1}, \delta_{\theta,2})$. Our goal is construction of an asymptotically Normal estimator of $\delta_{\theta,1}$.

An oracle estimator

$$\tilde\delta_{\theta,1} = \arg\min_{\delta_{\theta,1}} \ell_{\mathrm{KLIEP}}(\delta_{\theta,1}, \delta^*_{\theta,2})$$

is asymptotically Normal.

What about a feasible estimator

$$\hat\delta_{\theta,1} = \arg\min_{\delta_{\theta,1}} \ell_{\mathrm{KLIEP}}(\delta_{\theta,1}, \hat\delta_{\theta,2})?$$


Performance of a Naive Estimator


Feasible Plug-in Estimator

$$0 = \partial_1 \ell_{\mathrm{KLIEP}}(\hat\delta_{\theta,1}, \hat\delta_{\theta,2}) = \partial_1 \ell_{\mathrm{KLIEP}}(\delta^*_{\theta,1}, \delta^*_{\theta,2}) + H_{11}\big(\hat\delta_{\theta,1} - \delta^*_{\theta,1}\big) + H_{12}\big(\hat\delta_{\theta,2} - \delta^*_{\theta,2}\big) + o_P(1)$$

- $H = \nabla^2 \ell_{\mathrm{KLIEP}}(\delta^*_\theta; Y)$
- $\hat\delta_{\theta,2}$ is a consistent estimator of $\delta^*_{\theta,2}$ with $\|\hat\delta_{\theta,2} - \delta^*_{\theta,2}\|_2 \lesssim_P \sqrt{s \log(p)/n}$

The problem is that the limiting distribution depends on the model selection procedure used to estimate $\delta^*_{\theta,2}$.


ω-Corrected Score Function

Consider a linear combination of the components of the score vector

$$S_\omega(\delta_{\theta,1}, \delta_{\theta,2}) = \partial_1 \ell_{\mathrm{KLIEP}}(\delta_{\theta,1}, \delta_{\theta,2}) - \omega^\top \partial_2 \ell_{\mathrm{KLIEP}}(\delta_{\theta,1}, \delta_{\theta,2})$$

Our estimator $\hat\delta_{\theta,1}$ is a solution to the following estimating equation:

$$0 = S_\omega(\hat\delta_{\theta,1}, \hat\delta_{\theta,2})$$


ω-Corrected Score Function

$$0 = \sqrt{n}\, S_\omega(\hat\delta_{\theta,1}, \hat\delta_{\theta,2}) = \sqrt{n}\, S_\omega(\delta^*_{\theta,1}, \delta^*_{\theta,2}) + \sqrt{n}\, \big( H_{11} - \omega^\top H_{21} \big) \big( \hat\delta_{\theta,1} - \delta^*_{\theta,1} \big) + \sqrt{n}\, \big( H_{12} - \omega^\top H_{22} \big) \big( \hat\delta_{\theta,2} - \delta^*_{\theta,2} \big) + o_P(1)$$

In order to have $\sqrt{n}\big( \hat\delta_{\theta,1} - \delta^*_{\theta,1} \big) \to_D N\big(0, \sigma^2_{\delta_{\theta,1}}\big)$ we need:

- $\mathbb{E}[S_\omega(\delta^*_{\theta,1}, \delta^*_{\theta,2})] = 0$
- $\sqrt{n}\, \big( H_{12} - \omega^\top H_{22} \big) \big( \hat\delta_{\theta,2} - \delta^*_{\theta,2} \big)$ vanishes sufficiently fast


ω-Corrected Score Function

Idea from Ning and Liu (2017): construct $\omega$ as a solution to

$$H_{12} = \omega^\top H_{22},$$

that is, $\omega = H_{22}^{-1} H_{21}$.

A sample estimate is obtained as

$$\hat\omega = \arg\min_\omega \; \tfrac{1}{2}\, \omega^\top H(\hat\delta_\theta; Y)\, \omega - \omega^\top e_1 + \lambda_2 \|\omega\|_1$$

- $\hat\omega$ needs to converge to $\omega$ sufficiently fast (a sketch of this program follows below)
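A minimal sketch of this quadratic program, again via proximal gradient; kliep_hessian computes the Hessian of the empirical KLIEP loss, which for this loss is the covariance of $\psi(y)$ under the softmax-reweighted sample. The helper names, step size, and iteration count are ours:

```python
import numpy as np
from scipy.special import logsumexp

def kliep_hessian(delta, psi_y):
    """Hessian of the empirical KLIEP loss at delta: the covariance of
    psi(y) under the normalized density-ratio (softmax) weights."""
    w = np.exp(psi_y @ delta - logsumexp(psi_y @ delta))  # weights sum to 1
    mean = w @ psi_y
    return (psi_y * w[:, None]).T @ psi_y - np.outer(mean, mean)

def estimate_omega(H, lam2, step=0.05, n_iter=1000):
    """Solve min_w 0.5 * w'Hw - w'e1 + lam2 * ||w||_1 by proximal gradient."""
    p = H.shape[0]
    e1 = np.zeros(p)
    e1[0] = 1.0
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = H @ w - e1                                          # smooth part
        w = w - step * grad
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam2, 0.0)  # l1 prox
    return w
```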


Our Practical Procedure

Step 1: Compute an initial estimate $\hat\delta_\theta$:

$$\hat\delta_\theta \leftarrow \arg\min_{\delta_\theta} \; \ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) + \lambda_1 \|\delta_\theta\|_1$$

$$\hat\delta_\theta \leftarrow \arg\min_{\delta_\theta} \; \Big\{ \ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) \;:\; \operatorname{supp}(\delta_\theta) \subseteq \operatorname{supp}(\hat\delta_\theta) \Big\}$$

Step 2: Compute a sparse approximation $\hat\omega$ to the corresponding row of the inverse of the Hessian:

$$\hat\omega \leftarrow \arg\min_\omega \; \tfrac{1}{2}\, \omega^\top H(\hat\delta_\theta; Y)\, \omega - \omega^\top e_1 + \lambda_2 \|\omega\|_1$$

Step 3: Re-estimate on the union of the supports from Steps 1 and 2:

$$(\hat\delta_{\theta,1}, \hat\delta_{\theta,2}) \leftarrow \arg\min_{\delta_\theta} \; \Big\{ \ell_{\mathrm{KLIEP}}(\delta_\theta; X, Y) \;:\; \operatorname{supp}(\delta_\theta) \subseteq \operatorname{supp}(\hat\delta_\theta) \cup \operatorname{supp}(\hat\omega) \Big\}$$
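An end-to-end sketch of the three steps, reusing penalized_kliep, kliep_loss, kliep_hessian, and estimate_omega from the sketches above; the support-restricted refits use a generic smooth optimizer, and the tuning parameters are placeholders:

```python
import numpy as np
from scipy.optimize import minimize

def refit_on_support(psi_x, psi_y, support):
    """Minimize the unpenalized KLIEP loss over `support`, zeros elsewhere."""
    p = psi_x.shape[1]
    def loss(sub):
        delta = np.zeros(p)
        delta[support] = sub
        return kliep_loss(delta, psi_x, psi_y)
    res = minimize(loss, np.zeros(len(support)), method="BFGS")
    delta = np.zeros(p)
    delta[support] = res.x
    return delta

def practical_procedure(psi_x, psi_y, lam1, lam2):
    # Step 1: l1-penalized KLIEP, then refit on the selected support.
    delta0 = penalized_kliep(psi_x, psi_y, lam1)
    supp1 = np.flatnonzero(delta0)
    delta_hat = refit_on_support(psi_x, psi_y, supp1)
    # Step 2: sparse approximation to the first row of the inverse Hessian.
    omega = estimate_omega(kliep_hessian(delta_hat, psi_y), lam2)
    # Step 3: refit on the union of supports, always keeping coordinate 1
    # (index 0), the parameter we want to do inference on.
    supp = np.union1d(np.append(supp1, 0), np.flatnonzero(omega)).astype(int)
    return refit_on_support(psi_x, psi_y, supp)
```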


Technical Conditions

High-level technical conditions:

- consistency of the initial parameter estimate
- deviation bounds for the gradient and Hessian
- local smoothness of the loss
- conditions for the central limit theorem and estimation of the variance


Technical Conditions

- $\|\psi(y)\|_\infty = M < \infty$ with probability 1
- $\lambda_{\min}(H) \ge \lambda > 0$
- $\omega$ is approximately sparse
- for all $\Delta \in \mathcal{E} \cap B(0, \|\delta^*_\theta\|_1)$,

$$\frac{1}{n_y^2} \sum_{1 \le j < j' \le n_y} \big\langle \psi(y_j) - \psi(y_{j'}),\, \Delta \big\rangle^2 \;\ge\; C_2 \Big( \kappa_1 \|\Delta\|_2^2 - \kappa_2\, \frac{\log p}{n_y}\, \|\Delta\|_1^2 \Big)$$

with probability $1 - \delta_{n_y}$


Main Result

Suppose the regularity conditions hold. Let $n = n_x \wedge n_y$ be the smaller sample size and $\rho = \lim_{n \to \infty} n/(n_x \vee n_y)$.

With appropriately chosen regularization parameters $\lambda_1, \lambda_2$, our estimator $\hat\delta_{\theta,1}$ satisfies

$$\sqrt{n}\, \sigma^{-1} \big( \hat\delta_{\theta,1} - \delta^*_{\theta,1} \big) \to_D N(0, 1 + \rho),$$

where

$$\sigma^2 = \big( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \big)^{-1}.$$

Furthermore, $\sigma^2$ can be consistently estimated.
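This result yields a plug-in confidence interval for $\delta^*_{\theta,1}$. A minimal sketch, assuming the point estimate delta1_hat and a consistent variance estimate sigma_hat have already been computed:

```python
import numpy as np
from scipy.stats import norm

def confidence_interval(delta1_hat, sigma_hat, nx, ny, alpha=0.05):
    """Asymptotic (1 - alpha) CI from sqrt(n) * (delta1_hat - delta1*) / sigma
    ~ N(0, 1 + rho), with n = min(nx, ny) and rho = n / max(nx, ny)."""
    n = min(nx, ny)
    rho = n / max(nx, ny)
    z = norm.ppf(1 - alpha / 2)
    half_width = z * sigma_hat * np.sqrt((1 + rho) / n)
    return delta1_hat - half_width, delta1_hat + half_width
```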


Simulation Plot


Power Plot


Simultaneous Inference

For a potentially large set $\mathcal{I}$, we would like to simultaneously test

$$H_{0k} : \delta^*_{\theta,k} = \delta_{\theta,0k}, \quad k \in \mathcal{I} \subseteq [p],$$

or construct simultaneous confidence intervals

$$P\Big( \delta^*_{\theta,k} \in \big( \hat\delta_{\theta,k} - \hat\sigma_k\, t_\alpha(\mathcal{I})/\sqrt{n},\;\; \hat\delta_{\theta,k} + \hat\sigma_k\, t_\alpha(\mathcal{I})/\sqrt{n} \big) \;\; \forall k \in \mathcal{I} \Big) \ge 1 - \alpha$$


Simulation Assisted Simultaneous Inference

Inference is based on the following test statistic:

$$\max_{k \in \mathcal{I}} \sqrt{n}\, \big| \hat\delta_{\theta,k} - \delta^*_{\theta,k} \big|.$$

Approximate the distribution of the test statistic with

$$W = \max_{k} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \Big\langle \hat\omega_k,\; -\psi(x_i) + \frac{1}{n_y} \sum_{j=1}^{n_y} \psi(y_j)\, r(y_j; \delta^*_\theta) \Big\rangle\, \xi_i,$$

where $\{\xi_i\}_{i=1}^n$ are i.i.d. standard normal random variables.

The bootstrap critical value is the empirical quantile

$$t_\alpha(\mathcal{I}) = \inf\big\{ t \in \mathbb{R} : P(W \le t \mid X, Y) \ge 1 - \alpha \big\}.$$
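A minimal sketch of this Gaussian multiplier bootstrap. It assumes a precomputed n-by-|I| matrix scores whose (i, k) entry is the inner product above (with the estimate $\hat\delta_\theta$ plugged in for $\delta^*_\theta$); the absolute value mirrors the two-sided statistic, and all names are ours. The returned quantile plugs into the simultaneous intervals on the previous slide.

```python
import numpy as np

def bootstrap_critical_value(scores, alpha=0.05, n_boot=1000, seed=0):
    """Empirical (1 - alpha) quantile of W = max_k |n^{-1/2} sum_i scores[i, k] * xi_i|
    over n_boot draws of i.i.d. standard normal multipliers xi."""
    rng = np.random.default_rng(seed)
    n = scores.shape[0]
    xi = rng.standard_normal((n_boot, n))              # one row of multipliers per draw
    W = np.abs(xi @ scores).max(axis=1) / np.sqrt(n)   # max over k in I, per draw
    return np.quantile(W, 1 - alpha)
```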


Future Work

Lower bounds

Can we say something about uncertainty when confounders are present?

- The problem is how to account for the fact that we are estimating the effect of confounders.


References

Banerjee, O., L. El Ghaoui, and A. d'Aspremont. 2008. "Model Selection Through Sparse Maximum Likelihood Estimation." J. Mach. Learn. Res. 9 (3): 485–516.
Barber, Rina Foygel, and Mladen Kolar. 2015. "ROCKET: Robust Confidence Intervals via Kendall's Tau for Transelliptical Graphical Models." arXiv e-prints, arXiv:1502.07641, February.
Biswal, Bharat B, Maarten Mennes, Xi-Nian Zuo, Suril Gohel, Clare Kelly, Steve M Smith, Christian F Beckmann, Jonathan S Adelstein, Randy L Buckner, and Stan Colcombe. 2010. "Toward Discovery Science of Human Brain Function." Proceedings of the National Academy of Sciences 107 (10): 4734–9.
Cai, T. Tony, W. Liu, and X. Luo. 2011. "A Constrained ℓ1 Minimization Approach to Sparse Precision Matrix Estimation." J. Am. Stat. Assoc. 106 (494): 594–607.
Chen, Mengjie, Zhao Ren, Hongyu Zhao, and Harrison H. Zhou. 2015. "Asymptotically Normal and Efficient Estimation of Covariate-Adjusted Gaussian Graphical Model." Journal of the American Statistical Association. doi:10.1080/01621459.2015.1010039.
Danaher, P., P. Wang, and Daniela M. Witten. 2014. "The Joint Graphical Lasso for Inverse Covariance Estimation Across Multiple Classes." J. R. Stat. Soc. B 76 (2): 373–97. doi:10.1111/rssb.12033.
Fazayeli, Farideh, and Arindam Banerjee. 2016. "Generalized Direct Change Estimation in Ising Model Structure." In Proceedings of the 33rd International Conference on Machine Learning, edited by Maria Florina Balcan and Kilian Q. Weinberger, 48:2281–90. PMLR. http://proceedings.mlr.press/v48/fazayeli16.html.
Friedman, Jerome H., Trevor J. Hastie, and Robert J. Tibshirani. 2008. "Sparse Inverse Covariance Estimation with the Graphical Lasso." Biostatistics 9 (3): 432–41.
Liu, Song, Taiji Suzuki, Raissa Relator, Jun Sese, Masashi Sugiyama, and Kenji Fukumizu. 2017. "Support Consistency of Direct Sparse-Change Learning in Markov Networks." Ann. Statist. 45 (3): 959–90. https://doi.org/10.1214/16-AOS1470.
Lu, Junwei, Mladen Kolar, and Han Liu. 2017. "Post-Regularization Inference for Dynamic Nonparanormal Graphical Models." Journal of Machine Learning Research (to appear), December.
Meinshausen, Nicolas, and Peter Bühlmann. 2006. "High-Dimensional Graphs and Variable Selection with the Lasso." Ann. Stat. 34 (3): 1436–62.
Power, Jonathan D, Alexander L Cohen, Steven M Nelson, Gagan S Wig, Kelly Anne Barnes, Jessica A Church, Alecia C Vogel, et al. 2011. "Functional Network Organization of the Human Brain." Neuron 72 (4): 665–78.
Ravikumar, Pradeep, Martin J. Wainwright, and J. D. Lafferty. 2010. "High-Dimensional Ising Model Selection Using ℓ1-Regularized Logistic Regression." Ann. Stat. 38 (3): 1287–1319. doi:10.1214/09-AOS691.
Ren, Zhao, Tingni Sun, Cun-Hui Zhang, and Harrison H. Zhou. 2015. "Asymptotic Normality and Optimalities in Estimation of Large Gaussian Graphical Models." Ann. Stat. 43 (3): 991–1026. doi:10.1214/14-AOS1286.
Sugiyama, Masashi, Shinichi Nakajima, Hisashi Kashima, Paul V. Buenau, and Motoaki Kawanabe. 2008. "Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation." In Advances in Neural Information Processing Systems 20, edited by J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, 1433–40. Curran Associates, Inc. http://papers.nips.cc/paper/3248-direct-importance-estimation-with-model-selection-and-its-application-to-covariate-shift-adaptation.pdf.
Wang, J., and Mladen Kolar. 2016. "Inference for High-Dimensional Exponential Family Graphical Models." In Proc. of AISTATS, edited by Arthur Gretton and Christian C. Robert, 51:751–60.
Xia, Yin, Tianxi Cai, and T. Tony Cai. 2015. "Testing Differential Networks with Applications to the Detection of Gene-Gene Interactions." Biometrika 102 (2): 247–66. doi:10.1093/biomet/asu074.
Xue, Lingzhou, Hui Zou, and Tianxi Cai. 2012. "Nonconcave Penalized Composite Conditional Likelihood Estimation of Sparse Ising Models." Ann. Stat. 40 (3): 1403–29. http://projecteuclid.org/euclid.aos/1344610588.
Yang, Eunho, Genevera I. Allen, Zhandong Liu, and Pradeep Ravikumar. 2012. "Graphical Models via Generalized Linear Models." In Advances in Neural Information Processing Systems 25, edited by F. Pereira, C.J.C. Burges, L. Bottou, and K.Q. Weinberger, 1358–66. Curran Associates, Inc. http://papers.nips.cc/paper/4617-graphical-models-via-generalized-linear-models.pdf.
Yang, Eunho, Yulia Baker, Pradeep Ravikumar, Genevera I. Allen, and Zhandong Liu. 2014. "Mixed Graphical Models via Exponential Families." In Proc. 17th Int. Conf. Artif. Intel. Stat., 1042–50.
Yang, Eunho, Pradeep Ravikumar, Genevera I. Allen, and Zhandong Liu. 2013. "On Poisson Graphical Models." In Advances in Neural Information Processing Systems 26, edited by C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, 1718–26. Curran Associates, Inc. http://papers.nips.cc/paper/5153-on-poisson-graphical-models.pdf.
Yang, Eunho, Pradeep Ravikumar, Genevera I. Allen, and Zhandong Liu. 2015. "On Graphical Models via Univariate Exponential Family Distributions." J. Mach. Learn. Res. 16: 3813–47. http://jmlr.org/papers/v16/yang15a.html.
Yu, Ming, Varun Gupta, and Mladen Kolar. 2016. "Statistical Inference for Pairwise Graphical Models Using Score Matching." In Advances in Neural Information Processing Systems 29. Curran Associates, Inc.
Yuan, M., and Y. Lin. 2007. "Model Selection and Estimation in the Gaussian Graphical Model." Biometrika 94 (1): 19–35.
Zhang, Bai, and Yue Wang. 2012. "Learning Structural Changes of Gaussian Graphical Models in Controlled Experiments." arXiv:1203.3532, March. http://arxiv.org/abs/1203.3532v1.
Zhao, Sihai Dave, T. Tony Cai, and Hongzhe Li. 2014. "Direct Estimation of Differential Networks." Biometrika 101 (2): 253–68. doi:10.1093/biomet/asu009.
