Multi-task learning for Gaussian graphical models (GGM)

Inferring Multiple Graph Structures
Julien Chiquet (1), Yves Grandvalet (2), Christophe Ambroise (1)
(1) Statistique et Génome, CNRS & Université d'Évry Val d'Essonne
(2) Heudiasyc, CNRS & Université de Technologie de Compiègne
NeMo, 21 June 2010
Chiquet, Grandvalet, Ambroise, arXiv preprint: Inferring multiple Gaussian graphical structures.
Chiquet, Grasseau, Charbonnier and Ambroise, R package SIMoNe. http://stat.genopole.cnrs.fr/~jchiquet/fr/softwares/simone

TRANSCRIPT


2. Problem
Inference: few arrays, few examples, but lots of genes (high dimension); which interactions? (very high dimension). The main trouble is the low-sample-size, high-dimensional setting. Our main hope is to benefit from sparsity: few genes interact.

3. Handling the scarcity of data
Merge several experimental conditions (experiment 1, experiment 2, experiment 3).

4. Handling the scarcity of data
Inferring each graph independently does not help: for each experiment t, a graph is inferred from (X_1^{(t)}, ..., X_{n_t}^{(t)}) separately.

5. Handling the scarcity of data
By pooling all the available data: a single inference from (X_1, ..., X_n), with n = n_1 + n_2 + n_3.

7. Handling the scarcity of data
By breaking the separability: run the T inferences jointly, coupling them across experiments.

9. Outline
Statistical model; Multi-task learning; Algorithms and methods; Model selection; Experiments.

10. Outline (Statistical model)

11. Gaussian graphical modeling
Let X = (X_1, ..., X_p) ~ N(0_p, \Sigma), assume n i.i.d. copies of X, let \mathbf{X} be the n x p matrix whose kth row is X_k, and let \Theta = (\theta_{ij})_{i,j \in P} = \Sigma^{-1} be the concentration matrix.
Graphical interpretation: since

\mathrm{corr}(X_i, X_j \mid X_{P \setminus \{i,j\}}) = -\theta_{ij} / \sqrt{\theta_{ii} \theta_{jj}} \quad \text{for } i \neq j,

we have \theta_{ij} = 0 \iff X_i \perp X_j \mid X_{P \setminus \{i,j\}} \iff no edge (i, j) in the network. The nonzeroes of \Theta describe the graph structure.

12. The model likelihood
Let S = n^{-1} \mathbf{X}^\top \mathbf{X} be the empirical variance-covariance matrix: S is a sufficient statistic for \mathbf{X}, so L(\Theta; \mathbf{X}) = L(\Theta; S). The log-likelihood is

L(\Theta; S) = \frac{n}{2} \log\det(\Theta) - \frac{n}{2} \operatorname{trace}(S\Theta) - \frac{np}{2} \log(2\pi).

The MLE of \Theta is S^{-1}: not defined for n < p, and not sparse (fully connected graph).

13. Penalized approaches
Penalized likelihood (Banerjee et al., 2008):

\max_{\Theta \in \mathcal{S}_+} L(\Theta; S) - \lambda \|\Theta\|_1

well defined for n < p; sparse, hence a sensible graph; an SDP of size O(p^2) (solved by Friedman et al., 2007).
Neighborhood selection (Meinshausen & Bühlmann, 2006): for each node j,

\hat\beta = \arg\min_{\beta \in \mathbb{R}^{p-1}} \frac{1}{n} \|X_j - \mathbf{X}_{\setminus j} \beta\|_2^2 + \lambda \|\beta\|_1

where X_j is the jth column of \mathbf{X} and \mathbf{X}_{\setminus j} is \mathbf{X} deprived of X_j; not symmetric, not positive-definite; p independent lasso problems of size p - 1.

15. Neighborhood vs. likelihood
Pseudo-likelihood (Besag, 1975):

P(X_1, \dots, X_p) \approx \prod_{j=1}^{p} P(X_j \mid \{X_k\}_{k \neq j})

which yields the pseudo-log-likelihood (compare with the log-likelihood above)

\tilde{L}(\Theta; S) = \frac{n}{2} \log\det(D) - \frac{n}{2} \operatorname{trace}(S \Theta D^{-1} \Theta) - \frac{np}{2} \log(2\pi), \quad D = \operatorname{diag}(\Theta).

Proposition (Ambroise, Chiquet, Matias, 2008): neighborhood selection leads to the graph maximizing the penalized pseudo-log-likelihood. Proof: \hat\beta^j_i = -\tilde\theta_{ij} / \tilde\theta_{jj}, where \tilde\Theta = \arg\max_\Theta \tilde{L}(\Theta; S) - \lambda \|\Theta\|_1.

17. Outline (Multi-task learning)

18. Multi-task learning
We have T samples (experimental conditions) of the same variables: \mathbf{X}^{(t)} is the tth data matrix, S^{(t)} the corresponding empirical covariance, and examples are assumed to be drawn from N(0, \Sigma^{(t)}). Ignoring the relationships between the tasks leads to separable objectives:

\max_{\Theta^{(t)} \in \mathbb{R}^{p \times p}} L(\Theta^{(t)}; S^{(t)}) - \lambda \|\Theta^{(t)}\|_1, \quad t = 1, \dots, T.

Multi-task learning = solving the T tasks jointly. We may couple the objectives through the fitting term, or through the penalty term.

20. Coupling through the fitting term
Intertwined lasso:

\max_{\Theta^{(t)},\, t=1,\dots,T} \sum_{t=1}^{T} L(\Theta^{(t)}; \tilde{S}^{(t)}) - \lambda \|\Theta^{(t)}\|_1

where \bar{S} = \frac{1}{n} \sum_{t=1}^{T} n_t S^{(t)} is the pooled-tasks covariance matrix and \tilde{S}^{(t)} = \alpha S^{(t)} + (1 - \alpha) \bar{S} is a mixture between specific and pooled covariance matrices. Here \alpha = 0 pools the data sets and infers a single graph; \alpha = 1 separates the data sets and infers T graphs independently; \alpha = 1/2 in all our experiments.
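The single-task building block above, one l1-penalized regression per node (slide 13), can be sketched in a few lines. This is a minimal pure-Python illustration of the lasso subproblem via cyclic coordinate descent with soft-thresholding; the function name and setup are mine, not the SIMoNe implementation.

```python
import math
import random

def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for
        min_beta (1/n) * ||y - X beta||_2^2 + lam * ||beta||_1.
    X is a list of n rows of length p. Pure-Python sketch, not optimized."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # per-column mean squared norms, (1/n) * ||x_k||^2
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) / n for k in range(p)]
    for _ in range(n_iter):
        for k in range(p):
            # partial residual, leaving feature k out of the fit
            r = [y[i] - sum(X[i][m] * beta[m] for m in range(p) if m != k)
                 for i in range(n)]
            z = sum(X[i][k] * r[i] for i in range(n)) / n
            # soft-thresholding update for coordinate k
            beta[k] = math.copysign(max(abs(z) - lam / 2.0, 0.0), z) / col_sq[k]
    return beta
```

Running this once per column j of the data matrix (regressing X_j on the remaining columns) and connecting j to the nonzero coefficients gives the neighborhood-selection estimate of the graph.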
21. Coupling through penalties: group-lasso
We group parameters by sets of corresponding edges across graphs (illustration on X_1, ..., X_4: the same sparsity pattern in every task).
Graphical group-lasso:

\max_{\Theta^{(t)},\, t=1,\dots,T} \sum_{t=1}^{T} L(\Theta^{(t)}; S^{(t)}) - \lambda \sum_{i \neq j} \Big( \sum_{t=1}^{T} (\theta_{ij}^{(t)})^2 \Big)^{1/2}

Sparsity pattern shared between graphs: identical graphs across tasks.
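The group penalty acts on each edge's vector of coefficients across the T tasks through a joint shrinkage. A minimal sketch of the corresponding proximal operator (block soft-thresholding); the function name is mine:

```python
import math

def group_soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_2, where v gathers one edge's
    coefficients across the T tasks: shrink the group jointly, zeroing it
    as a whole when its Euclidean norm falls below lam."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * x for x in v]
```

Because the whole group is zeroed or kept together, the group-lasso selects or drops an edge simultaneously in all tasks, which is exactly the "identical graphs across tasks" behavior of the slide.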
26. Coupling through penalties: cooperative-lasso
Same grouping, and a bet that correlations are likely to be sign-consistent: gene interactions are either inhibitory or activating across assays.
Graphical cooperative-lasso:

\max_{\Theta^{(t)},\, t=1,\dots,T} \sum_{t=1}^{T} L(\Theta^{(t)}; S^{(t)}) - \lambda \sum_{i \neq j} \bigg[ \Big( \sum_{t=1}^{T} [\theta_{ij}^{(t)}]_+^2 \Big)^{1/2} + \Big( \sum_{t=1}^{T} [\theta_{ij}^{(t)}]_-^2 \Big)^{1/2} \bigg]

where [u]_+ = \max(0, u) and [u]_- = \min(0, u). Plausible in many other situations. Sparsity pattern shared between graphs, which may nevertheless differ.
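The cooperative penalty splits each edge's cross-task coefficient vector into positive and negative parts before taking norms. A small sketch (function names are mine) makes the sign-consistency bet concrete:

```python
import math

def l2(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))

def coop_penalty(v):
    """Cooperative-lasso penalty for one edge across tasks:
    separate l2 norms of the positive and the negative parts."""
    pos = [max(x, 0.0) for x in v]    # [u]_+ applied coordinate-wise
    neg = [max(-x, 0.0) for x in v]   # magnitude of [u]_-
    return l2(pos) + l2(neg)
```

On a sign-consistent vector such as [3, 4] the penalty equals the plain group norm (5.0), while the mixed-sign vector [3, -4] costs 3 + 4 = 7.0 instead of 5.0: sign-inconsistent patterns are penalized more, so they are discouraged without being forbidden.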
33. Outline (Algorithms and methods)

34. A geometric view of sparsity: constrained optimization
The penalized problem

\max_{\beta_1, \beta_2} L(\beta_1, \beta_2) - \lambda \, \Omega(\beta_1, \beta_2)

is equivalent to the constrained problem

\max_{\beta_1, \beta_2} L(\beta_1, \beta_2) \quad \text{s.t.} \quad \Omega(\beta_1, \beta_2) \leq c

(figure: level sets of L and the admissible set in the (\beta_1, \beta_2) plane).
37. A geometric view of sparsity: supporting hyperplane
A hyperplane supports a set iff the set is contained in one half-space and the set has at least one point on the hyperplane. There are supporting hyperplanes at all points of convex sets: they generalize tangents.

42. A geometric view of sparsity: dual cone
The dual cone generalizes normals. The shape of the dual cones determines the sparsity pattern.
46. Group-lasso balls
Admissible set for 2 tasks (T = 2) and 2 coefficients (p = 2). Unit ball:

\sum_{i=1}^{2} \Big( \sum_{t=1}^{2} (\beta_i^{(t)})^2 \Big)^{1/2} \leq 1

(figure: two-dimensional cross-sections of the unit ball).

50. Cooperative-lasso balls
Same setting (T = 2, p = 2). Unit ball:

\sum_{j=1}^{2} \bigg[ \Big( \sum_{t=1}^{2} [\beta_j^{(t)}]_+^2 \Big)^{1/2} + \Big( \sum_{t=1}^{2} [\beta_j^{(t)}]_-^2 \Big)^{1/2} \bigg] \leq 1

(figure: two-dimensional cross-sections of the unit ball).
56. Decomposition strategy
Estimate the jth neighborhood of the T graphs:

\max_{K^{(t)},\, t=1,\dots,T} \sum_{t=1}^{T} L(K^{(t)}; S^{(t)}) - \lambda \, \Omega(K^{(t)})

decomposes into p convex optimization problems of size T(p - 1):

\hat\beta_j = \arg\min_{\beta \in \mathbb{R}^{T(p-1)}} f_j(\beta) + \lambda \, \Omega(\beta)

where \hat\beta_j is a minimizer iff 0 \in \partial\big( f_j(\beta) + \lambda \, \Omega(\beta) \big).
Group-lasso: \Omega(\beta) = \sum_{i=1}^{p-1} \|\beta_i^{[1:T]}\|_2, where \beta_i^{[1:T]} is the vector gathering the coefficients of edge (i, j) across graphs.
Coop-lasso: \Omega(\beta) = \sum_{i=1}^{p-1} \big( \|[\beta_i^{[1:T]}]_+\|_2 + \|[\beta_i^{[1:T]}]_-\|_2 \big).

59. Active set algorithm (presented as three refinements, "yellow", "orange" and "green belt"; the complete version reads:)

// 0. INITIALIZATION
beta <- 0, A <- {}
while 0 not in the subdifferential of the objective at beta do
    // 1. MASTER PROBLEM: OPTIMIZATION WITH RESPECT TO A
    // Find a solution h to the smooth problem
    //   grad f(beta_A + h) + lambda * grad Omega(beta_A + h) = 0,  where h = {h_A}
    beta_A <- beta_A + h
    // 2. IDENTIFY NEWLY ZEROED VARIABLES
    while exists i in A with beta_i = 0 and min_nu |grad f(beta)_i + lambda * nu_i| = 0 do
        A <- A \ {i}
    end
    // 3. IDENTIFY NEW NON-ZERO VARIABLES
    // Select the candidate i in A^c that most violates the optimality conditions,
    // i.e. for which an infinitesimal change of beta_i yields the highest gain
    i <- argmax_{j in A^c} v_j,  where v_j = min_nu |grad f(beta)_j + lambda * nu_j|
    if v_i != 0 then A <- A + {i}
    else stop and return beta, which is optimal
end

62. Outline (Model selection)

63. Tuning the penalty parameter: what does the literature say?
Theory-based penalty choices:
1. Optimal order of the penalty in the p >> n framework: \lambda \propto \sqrt{\log p / n} (Bunea et al. 2007, Bickel et al. 2009).
2. Control of the probability of connecting two distinct connectivity sets (Meinshausen et al. 2006, Banerjee et al. 2008, Ambroise et al. 2009).
In practice, much too conservative.
Cross-validation: optimal in terms of prediction, not in terms of selection; problematic with small samples, since the sparsity constraint changes with the sample size.

64. Tuning the penalty parameter: BIC/AIC
Theorem (Zou et al. 2008): \mathrm{df}(\hat\beta^{\text{lasso}}_\lambda) = \|\hat\beta^{\text{lasso}}_\lambda\|_0.
Straightforward extensions to the graphical framework:

\mathrm{BIC}(\lambda) = L(\hat\theta_\lambda; \mathbf{X}) - \mathrm{df}(\hat\theta_\lambda) \frac{\log n}{2}, \qquad \mathrm{AIC}(\lambda) = L(\hat\theta_\lambda; \mathbf{X}) - \mathrm{df}(\hat\theta_\lambda).

These rely on asymptotic approximations, but remain relevant for small data sets.

65. Outline (Experiments)

66. Data generation
We set the number of nodes p, the number of edges K, and the number of examples n. Process:
1. Generate a random adjacency matrix with 2K off-diagonal terms.
2. Compute the normalized Laplacian L.
3. Generate a symmetric matrix of random signs R.
4. Compute the concentration matrix K_{ij} = L_{ij} R_{ij}.
5. Compute \Sigma by pseudo-inversion of K.
6. Generate correlated Gaussian data from N(0, \Sigma).

67. Simulating related tasks
Generate 1. an ancestor with p = 20 nodes and K = 20 edges; 2. T = 4 children by adding and deleting edges; 3. T = 4 Gaussian samples.
Figure: ancestor and children, each child obtained with 2 perturbations.
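The BIC/AIC criteria of slide 64 are cheap to evaluate along the penalty path once the log-likelihood and degrees of freedom of each fit are known. A minimal sketch; the function names and the candidate values in the test are mine, for illustration only:

```python
import math

def bic_score(loglik, df, n):
    """BIC-type criterion from the slides: penalized log-likelihood with
    df = number of nonzero estimated parameters (Zou et al., 2008)."""
    return loglik - 0.5 * math.log(n) * df

def aic_score(loglik, df):
    """AIC analogue: one unit of penalty per estimated parameter."""
    return loglik - df

def select_lambda(candidates, n):
    """candidates: list of (lam, loglik, df) tuples along the penalty path;
    return the lam maximizing the BIC criterion."""
    return max(candidates, key=lambda c: bic_score(c[1], c[2], n))[0]
```

A denser graph raises the log-likelihood but also df, so the criterion trades fit against sparsity, exactly the compromise the slides advocate for small samples.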
70. Simulation results: evaluation
Precision/recall curve: precision = TP/(TP+FP) vs. recall = TP/P (power).
ROC curve: fallout = FP/N (type I error) vs. recall = TP/P (power).

71-75. Simulation results (figures)
Each figure shows precision-recall and ROC curves along the penalty path (\lambda: \lambda_{max} \to 0) for five methods: CoopLasso, GroupLasso, Intertwined, Independent and Pooled.
71-73. Large sample size: n_t = 100, perturbation levels 1, 3 and 5.
74-75. Medium sample size: n_t = 50, perturbation levels 1 and 3.
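The metrics of slide 70 compare an estimated edge set to the true one. A small sketch (the function name is mine), treating edges as pairs:

```python
def edge_metrics(true_edges, est_edges, n_possible):
    """Precision = TP/(TP+FP), recall = TP/P and fallout = FP/N for an
    estimated edge set vs. the true edge set (both sets of (i, j) pairs);
    n_possible is the total number of candidate edges."""
    tp = len(true_edges & est_edges)      # true positives
    fp = len(est_edges - true_edges)      # false positives
    pos = len(true_edges)                 # P: true edges
    neg = n_possible - pos                # N: true non-edges
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return precision, tp / pos, fp / neg
```

Sweeping the penalty from \lambda_{max} down to 0 and recording these three numbers at each step traces exactly the precision-recall and ROC curves shown in the figures.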
76. Simulation results (figure): medium sample size, n_t = 50, perturbation level 5.
77-79. Simulation results (figures): small sample size, n_t = 25, perturbation levels 1, 3 and 5; same five methods and curves as above.

80. Breast cancer
Prediction of the outcome of preoperative chemotherapy. Patient response can be classified either as 1. pathologic complete response (PCR) or 2. residual disease (not PCR). Gene expression data: 133 patients (99 not PCR, 34 PCR); 26 identified genes (differential analysis).

81. Package demo
Cancer data: Coop-Lasso.
82. Conclusions
To sum up: we clarified the links between neighborhood selection and the graphical lasso; identified the relevance of multi-task learning in network inference; proposed the first methods for inferring multiple Gaussian graphical models; obtained consistent improvements upon the available baseline solutions. Everything is available in the R package SIMoNe.
Perspectives: explore model-selection capabilities; other applications of the cooperative-lasso; theoretical analysis (uniqueness, selection consistency).