

Page 1: Structural Deep Brain Network Mining - UIC Computer Science (clu/doc/kdd17_SDBN.pdf)

Structural Deep Brain Network Mining

Shen Wang, University of Illinois at Chicago, Chicago, USA, [email protected]
Lifang He*, University of Illinois at Chicago, Chicago, USA, [email protected]
Bokai Cao, University of Illinois at Chicago, Chicago, USA, [email protected]
Chun-Ta Lu, University of Illinois at Chicago, Chicago, USA, [email protected]
Philip S. Yu, University of Illinois at Chicago, Chicago, USA; Tsinghua University, Beijing, China, [email protected]
Ann B. Ragin, Northwestern University, Chicago, USA, [email protected]

ABSTRACT

Mining from neuroimaging data is becoming increasingly popular in the field of healthcare and bioinformatics, due to its potential to discover clinically meaningful structure patterns that could facilitate the understanding and diagnosis of neurological and neuropsychiatric disorders. Most recent research concentrates on applying subgraph mining techniques to discover connected subgraph patterns in the brain network. However, the underlying brain network structure is complicated. As a shallow linear model, subgraph mining cannot capture the highly non-linear structures, resulting in sub-optimal patterns. Therefore, how to learn representations that capture the high non-linearity of brain networks and preserve the underlying structures is a critical problem.

In this paper, we propose a Structural Deep Brain Network mining method, namely SDBN, to learn highly non-linear and structure-preserving representations of brain networks. Specifically, we first introduce a novel graph reordering approach based on module identification, which rearranges the order of the nodes to preserve the modular structure of the graph. Next, we perform structural augmentation to further enhance the spatial information of the reordered graph. Then we propose a deep feature learning framework that combines supervised and unsupervised learning in a small-scale setting, by augmenting a Convolutional Neural Network (CNN) with decoding pathways for reconstruction. With the help of multiple layers of non-linear mapping, the proposed SDBN approach can capture the highly non-linear structure of brain networks. Further, it has better generalization capability for high-dimensional brain networks and works well even for small-sample learning. Benefiting from CNN's task-oriented learning style, the learned hierarchical representation is meaningful for the clinical

∗Corresponding author.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. KDD'17, August 13–17, 2017, Halifax, NS, Canada. © 2017 ACM. ISBN 978-1-4503-4887-4/17/08...$15.00. DOI: http://dx.doi.org/10.1145/3097983.3097988

task. To evaluate the proposed SDBN method, we conduct extensive experiments on four real brain network datasets for disease diagnosis. The experimental results show that SDBN can capture discriminative and meaningful structural graph representations for brain disorder diagnosis.

CCS CONCEPTS

• Information systems → Data mining; • Computing methodologies → Machine learning;

KEYWORDS

brain network, deep learning, graph reordering

1 INTRODUCTION

In many neurological and neuropsychiatric disorders, brain involvement, including irreversible loss of brain tissue and deterioration in cognitive function [19], has deleterious consequences for judgment and function. Early detection in subclinical periods is critical for aiding clinical diagnosis, clarifying underlying mechanisms and informing neuroprotective interventions to slow or reverse neural injury for a broad spectrum of brain disorders, e.g., HIV infection, attention-deficit/hyperactivity disorder (ADHD), and bipolar disorder (BP). The last several decades have witnessed rapid progress in noninvasive neuroimaging techniques, e.g., functional resting-state magnetic resonance imaging (fMRI) and structural diffusion tensor imaging (DTI), from which an underlying brain connectivity network (a.k.a. connectome [42]) can be interrogated. New capabilities for investigating the vast patterns of connectivity involving billions of neurons promise to yield new insights concerning the normal brain and many disorders. Brain connectomes are often represented as graphs, where nodes represent brain regions and edges represent connections between pairs of regions.

Learning discriminative and clinically meaningful representations from brain networks poses the following challenges:

• High non-linearity: [30] found that the underlying structure of graph data is often non-linear. The brain network, as a typical example of graph data, has highly non-linear interactions between cortical regions [36]. How to build a model that captures this highly non-linear structure is challenging.


• Structure preserving: The structure of the brain network plays a key role in brain network analysis. However, the underlying structure of the brain network is highly complicated [37]. Therefore, preserving the structure during brain network mining is difficult.

• Noise: The analysis of real brain networks usually suffers from noise. For example, some brain connections may be wrongly placed during fMRI analysis. This kind of noise may introduce errors into the diagnosis of brain disorders.

• Limited quantity and high dimensionality: Imaging studies of neurological disorders often involve a limited number of samples, due to the limited number of available patients of interest and the high costs of acquiring the data, which significantly limits the graph mining methods that can be used.

In the literature, many brain network mining methods have been proposed. Most of them concentrate on selecting frequent substructures, such as connected subgraph patterns [6, 9, 22, 26]. However, since connections between brain regions are characterized by massive complexity, it is difficult for traditional subgraph mining models to accurately capture or approximate the complicated, highly non-linear structure of brain networks. Moreover, these methods only search for the most frequent simple patterns, and do not consider any underlying structural information of brain networks. Additionally, as the number of subgraph patterns is exponential in the size of the graphs, the subgraph mining process is both time consuming and memory intensive. Recently, some researchers have leveraged kernel techniques to perform brain network learning [23, 46], where a graph kernel is used to integrate local connectivity and global topological properties of brain networks. From the perspective of model architecture depth, all the above methods are shallow models. As stated in [47] and [50], shallow models are recognized as encountering the curse of dimensionality and having limited capability in learning feature representations in complex situations.

To address the above challenges, we propose a Structural Deep Brain Network Mining method, namely SDBN, to learn highly non-linear and structure-preserving representations of brain networks. Specifically, we extract the locally connected ROIs as the receptive fields in the brain network and construct a multilayer convolutional neural network (CNN) architecture, which consists of multiple non-linear gating functions, for learning the graph representation in the highly non-linear latent space. To address structure preservation and graph noise, we exploit the underlying graph structure in three ways. First, we propose a novel graph reordering approach based on spectral clustering to rearrange the order of the nodes, which preserves the modular structure of the graph. Second, we perform structural augmentation to further enhance the spatial information of the reordered graph for structure-preserving and noise-robust feature learning. Third, we propose a deep feature learning framework for joint supervised and unsupervised learning in a small-scale setting by augmenting a multilayer CNN with decoding pathways for reconstruction. Benefiting from the multiple layers of non-linear mapping, the proposed SDBN method can capture the highly non-linear structure of brain networks. With the help of the regularization power of the unsupervised learning augmentation, the proposed deep model is robust to small datasets. Moreover, CNN learns features in a task-oriented style and can capture meaningful features for clinical tasks.

In summary, this paper makes the following contributions:

• We propose a Structural Deep Brain Network mining method, namely SDBN, to learn discriminative and meaningful graph representations from brain networks. The learned graph representation, residing in a highly non-linear latent space, preserves the graph structure and is robust to noise.

• A graph reordering technique for brain networks is proposed with the help of spectral clustering. This technique puts brain networks from different subjects into correspondence and makes them comparable. As a result, CNN-based deep feature learning becomes more reliable and the underlying graph structure can be exposed.

• A structural augmentation is introduced to enhance the structural information by adding spatial information and to alleviate the effect of noise.

• A deep feature learning framework for joint supervised and unsupervised learning is proposed in a small-scale setting by augmenting existing neural networks with decoding pathways for reconstruction. The unsupervised learning augmentation has a regularization effect that gives the learned features better generalization from high-dimensional data.

• The proposed method is extensively evaluated on brain disorder detection with four real brain network datasets. The experiments show that SDBN can capture discriminative and meaningful structural graph representations for brain disorder diagnosis.

2 PRELIMINARY AND PROBLEM FORMULATION

In this section, we first establish the key notational conventions that we will use in the sequel. We then define the brain network mining problem we are concerned with.

Notations. Throughout this paper, vectors are denoted by boldface lowercase letters (e.g., a), matrices are denoted by boldface capital letters (e.g., A), and cubes are denoted by calligraphic letters (e.g., A). For a matrix A ∈ R^{I×J}, its elements are denoted by a_ij, and its i-th row and j-th column are denoted by a^i and a_j, respectively. We denote a weighted graph by a triple G = (V, E, W), where V = {v_1, v_2, ..., v_N} is the set of nodes, E ⊆ V × V is the set of edges connecting nodes, and W is the connectivity weight matrix encoding the connectivity strengths of the edges. In particular, N = |V| denotes the number of nodes and M = |E| denotes the number of edges of the graph. We will often use calligraphic letters (G, H, ...) to denote general sets or spaces, regardless of their specific nature.

Before proceeding to define our problem, we first introduce some related concepts of brain networks. Hereafter we use "brain network" and "graph" interchangeably whenever there is no ambiguity.

Definition 2.1. (Brain Network) A brain network (or connectome) is a weighted graph G = (V, E, W) with |V| nodes and |E| edges, reflecting brain regions of interest (ROIs) and connectivities between ROIs, respectively. The weights in the adjacency matrix W represent the connectivity degree, where a larger weight corresponds to a higher connectivity degree, reflecting stronger functional correlations in fMRI and tighter fiber connections in DTI.


Figure 1: A brain network example, associated with modules, cross edges, internal and bridge nodes

The module structure of the brain network is critical in the study of the brain. To facilitate brain analysis, brain network mining is required to preserve the module structure, which can be characterized by locality and similarity. Locality describes that nodes in the same module are connected tightly, while similarity describes that nodes in the same module often have similar neighborhood structure. The definition of "module" is given as follows.

Definition 2.2. (Module) A module (also called a community or group) in a graph is a subset of nodes which are densely connected to each other, but sparsely connected to the nodes in other modules. A node with all its neighboring nodes in the same module is an internal node. A node with neighboring nodes belonging to different modules is a bridge node.

To better understand and illustrate the module structure, an example of a brain network is shown in Figure 1, in which the Frontal, Parietal and Temporal lobes are represented by the modules in blue, yellow and green, respectively. There are a number of nodes representing brain regions and a number of edges representing connections between pairs of regions. Among these nodes, white nodes indicate internal nodes and red nodes indicate bridge nodes. Between different modules, bridge nodes are connected by cross edges marked in blue. One can notice that within each module, the nodes are more tightly connected to each other than to nodes in other regions, which exhibits strong locality and similarity.

Now we formally define the problem studied in this paper. Given N brain networks G = {G_1, G_2, ..., G_N}, we denote their corresponding connectivity matrices W = {W_1, W_2, ..., W_N}, where W_i ∈ R^{N×N}. Let H = {h_1, ..., h_N}. We are concerned with discovering a good vector representation h_i for each brain network G_i. Specifically, we are interested in learning a representation function f : G → H, which maps each brain network G_i ∈ G into a discriminative, highly non-linear, structure-preserving, and clinically meaningful representation h_i ∈ H. The representation function is estimated on a collection of brain networks and their connectivity matrices.

Deep learning methods have powerful representation learning capability, especially in capturing highly non-linear representations. Among them, the Convolutional Neural Network (CNN) provides an efficient framework to discover highly non-linear, discriminative and task-focused representations. It can capture local stationary structures hierarchically, which has achieved success in many fields, such as computer vision, signal processing and natural language processing. However, applying CNN straightforwardly to brain networks is not reliable. The connections between different brain regions lie on an irregular or non-Euclidean domain. This kind of edge is a universal representation of heterogeneous pairwise relations. In terms of the connectivity matrices, the receptive field of CNN cannot encode the module structure well. Therefore it is usually difficult to identify common patterns and differing patterns among different subjects. Here we formulate a graph reordering technique to generalize CNN to learn a structure-preserving representation H. Its definition is as follows.

Definition 2.3. (Graph Reordering) Given a collection of unlabeled graphs G = {G_1, G_2, ..., G_N}, the goal of graph reordering is to find a labeling ℓ such that for any two graphs G_i, G_j ∈ G drawn uniformly at random from G, the expected difference between the distance of the graph connectivity matrices based on ℓ and the distance of the graphs in graph space is minimized. Let d_G be a distance measure on graphs G, and d_W be a distance measure on connectivity matrices W. Graph reordering can be formulated as the following optimization problem:

arg min_ℓ  E_G [ ‖ d_W(W_i^ℓ, W_j^ℓ) − d_G(G_i, G_j) ‖ ]   (1)

Graph reordering is essentially a graph labeling procedure, which assigns labels to the nodes, the edges, or both. In general, Eq. (1) is a difficult combinatorial problem: solving it would require N! trials in the worst case. Therefore optimal graph reordering is NP-hard and computationally infeasible. Alternatively, we borrow the idea of graph compression [8], handling this problem by performing module identification in the brain.
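As a concrete (toy) illustration of the objective in Eq. (1), the cost of one candidate labeling, expressed as a node permutation per graph, can be evaluated empirically. Taking the matrix distance d_W as the Frobenius norm and assuming the graph-space distances d_G are precomputed are our own simplifications, not choices specified in the paper:

```python
import itertools
import numpy as np

def reordering_cost(Ws, d_G, perms):
    """Empirical form of Eq. (1): average |d_W(W_i^l, W_j^l) - d_G(G_i, G_j)|
    over all graph pairs, for one candidate labeling given as a node
    permutation perms[i] for each graph i."""
    pairs = list(itertools.combinations(range(len(Ws)), 2))
    cost = 0.0
    for i, j in pairs:
        Wi = Ws[i][np.ix_(perms[i], perms[i])]  # reorder rows and columns
        Wj = Ws[j][np.ix_(perms[j], perms[j])]
        d_w = np.linalg.norm(Wi - Wj)           # Frobenius matrix distance
        cost += abs(d_w - d_G[i, j])
    return cost / len(pairs)
```

Minimizing this quantity over all permutations is exactly the combinatorial problem described above, which is why SDBN sidesteps it via module identification.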

In the following, we elaborate our approach to the above concepts.

3 PROPOSED METHOD

As discussed in the introduction, learning good feature representations from brain networks is challenging due to their complex and non-linear structure. To overcome these challenges, we propose a structural deep brain network mining (SDBN) method. An overview of SDBN is shown in Figure 2, which consists of three main steps: (1) Graph Reordering; (2) Structural Augmentation; and (3) Convolutional Deep Feature Learning. These steps collectively address the challenges of brain network representation learning.

3.1 Graph Reordering

The objective of graph reordering is to leverage a graph labeling procedure to reorder the nodes in the connectivity matrix such that the difference of distances between the connectivity matrices of graphs is minimized, in the sense that nodes close to each other are more likely to have strong connections. In this way, a permutation of the nodes across different brain networks is established. By graph reordering, nodes belonging to the same module will be located close to each other [13] and the structural topology within the graphs becomes very similar [33]. This makes CNN better able to bring out the latent network structures shared by the different networks. However, as optimal graph reordering is NP-hard, it is infeasible to solve Eq. (1) directly. Alternatively, we borrow the idea from graph compression [2] to address this problem. The intuitive idea of graph compression is to produce a compressed graph of a given weighted graph by finding an ordering of nodes such that the connectivity matrix is close to block-diagonal. Specifically, it was noted that finding modules is a useful conceptual tool for analyzing


[Figure 2 pipeline: Brain Network G → Connectivity Matrix W → Graph Reordering → Reordered Connectivity Matrix → Structural Augmentation → Structure-Preserving Connectivity Matrix X = [W; M] → Convolutional Deep Feature Learning → Feature Representation → Brain Disorder Classification → Label]

Figure 2: An overview of the proposed SDBN method. The connectivity matrix, reordered connectivity matrix and structure-preserving connectivity matrix are shown as heat maps, where a high heat color indicates a strong connection. The structure-preserving connectivity matrix consists of two channels: the first channel is the reordered connectivity matrix with the weights of the off-block-diagonal regions lowered; the second channel is the module identification matrix M.

relations between the original and compressed graphs. Based on this insight, we identify the modules among brain networks to approximate the optimal graph reordering.

Various methods have been used for module identification. Among the existing methods, spectral clustering often yields superior performance compared to the others. Spectral methods have also been successfully applied to graph reordering problems [24]. Therefore, we employ spectral clustering to perform module identification. The key idea of spectral clustering is to convert a clustering problem into a graph partitioning problem and then solve this problem by means of matrix theory. Let K be the number of modules to be identified, and W̄ ∈ R^{N×N} be the average connectivity matrix across all graphs. Then, spectral clustering can be formulated as follows:

min_F  Σ_{i,j=1}^{N} w̄_ij ‖f_i − f_j‖_2^2 = tr(F^T L F)   s.t.  F^T F = I_K   (2)

where tr(·) denotes the trace function, the superscript T denotes transposition, I_K denotes the identity matrix of size K, and L is the Laplacian matrix [45] obtained from W̄.

By applying K-means to the eigenvectors of the Laplacian matrix L, we obtain K modules M = {M_1, M_2, ..., M_K}, with V = M_1 ∪ M_2 ∪ ... ∪ M_K and M_i ∩ M_j = ∅ for every pair i, j with i ≠ j. We treat the module sequence 1, 2, ..., K as a permutation of the nodes. Based on this, the reordered connectivity matrix W for each brain network can be established. Since all reordered connectivity matrices are obtained according to the global module structure, different subjects are more comparable. An example of the reordered connectivity matrices for abnormal and normal subjects is shown in the second column, (b) and (e), of Figure 3.
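The module identification and reordering steps above can be sketched as follows (numpy only; the deterministic farthest-point initialization for the k-means step is our simplification, not part of the paper):

```python
import numpy as np

def spectral_modules(W_avg, K, n_iter=50):
    """Eq. (2) in practice: embed nodes with the K smallest eigenvectors
    of the graph Laplacian of the average connectivity matrix, then
    cluster the embedded rows with a small k-means loop."""
    L = np.diag(W_avg.sum(axis=1)) - W_avg   # unnormalized Laplacian
    _, eigvecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    F = eigvecs[:, :K]                       # N x K spectral embedding
    centers = [F[0]]                         # deterministic farthest-point init
    while len(centers) < K:
        d = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):                  # Lloyd's iterations
        labels = np.argmin(((F[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = F[labels == k].mean(axis=0)
    return labels

def reorder(W, labels):
    """Permute nodes so that members of one module are contiguous,
    making the connectivity matrix close to block-diagonal."""
    perm = np.argsort(labels, kind="stable")
    return W[np.ix_(perm, perm)], perm
```

Applying `reorder` with the same global `labels` to every subject's connectivity matrix yields the comparable, near block-diagonal matrices described above.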

3.2 Structural Augmentation

According to the graph reordering procedure above, we obtain the reordered connectivity matrix for each brain network. This representation is more meaningful and beneficial to the CNN architecture. However, two problems still affect the use of CNN. The first is the spatial information loss caused by CNN itself. The second is the noise in the brain network. To address these problems, we propose a structural augmentation approach to further enhance the spatial information of the reordered graph for structure-preserving and noise-robust feature learning.

CNN adopts max pooling to achieve invariance to translation, rotation and shifting, which results in spatial information loss. However, the spatial relation of each receptive field is also important structural information for brain network mining. At the top layer of the network, the original spatial information is lost. Naturally, clinical structure patterns only occur in specific brain regions, and those occurring in other brain regions are usually invalid.

To tackle this problem, we augment the reordered connectivity matrix with an additional channel. Specifically, we define a module identification matrix M to further encode the module structure information, whose elements are:

m_ij = { k,  for v_i, v_j ∈ M_k, i, j = 1, ..., N
       { 0,  otherwise   (3)

The constructed M is a block-diagonal matrix. It is concatenated with the reordered connectivity matrix as an additional channel, as shown in Figure 2. With such information augmentation, we can preserve the spatial information during the feature learning procedure.
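A minimal sketch of constructing M from the module labels (assuming modules are numbered 1, ..., K, with a 0-based label per node as returned by a typical clustering routine):

```python
import numpy as np

def module_matrix(labels):
    """Module identification matrix M of Eq. (3): m_ij = k when v_i and
    v_j both belong to module M_k (k = 1, ..., K), and 0 otherwise.
    `labels` holds a 0-based module index per node."""
    lab = np.asarray(labels)
    same = lab[:, None] == lab[None, :]          # are i and j in the same module?
    return np.where(same, lab[:, None] + 1, 0)   # store k (1-based), else 0
```

Because graph reordering makes module members contiguous, the resulting M is block-diagonal, matching the description above.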

On the other hand, brain networks usually suffer from noise introduced by errors in acquisition and in image analysis. For the reordered connectivity matrix obtained from graph reordering, the intra-module neighbor edges reside in the block-diagonal region and the cross edges lie in the off-diagonal region. The intra-module neighbor edges preserve the structure within each module and are more reliable for learning a structure-preserving graph representation, while the cross edges are more complicated, may have negative effects on studying the modular structure, and some of them may be invalid connections [18]. To alleviate the effects of cross edges, we further augment the reordered connectivity matrix W by lowering the weights of the off-block-diagonal regions such that

w_ij = { 1,  for i, j ∈ M_k, k = 1, ..., K
       { ε,  for i, j ∉ M_k, k = 1, ..., K   (4)

An example of the resulting new reordered connectivity matrices for abnormal and normal subjects is shown in the third column, (c) and


Figure 3: The first row and second row show the connectivity matrices of the abnormal subjects and of the normal subjects, respectively. From left to right, the first column, Figures (a) and (d), presents the original connectivity matrices mapped from the brain network; the second column, Figures (b) and (e), shows the reordered connectivity matrices after graph reordering; and the third column, Figures (c) and (f), shows the first channel of the structure-preserving connectivity matrices.

(f), of Figure 3. Together with the module information augmentation, we obtain the structure-preserving connectivity matrix X = [W; M], as shown in Figure 2, which encodes more structural information and is beneficial for CNN.
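Putting Eqs. (3) and (4) together, the two-channel input X = [W; M] can be sketched as below. We read the attenuation in Eq. (4) as a multiplicative mask on the reordered weights, with ε a small constant; both of these readings are our own assumptions where the text is terse:

```python
import numpy as np

def structure_preserving_input(W_reordered, labels, eps=0.1):
    """Build X = [W; M]: channel 0 keeps intra-module weights and
    attenuates cross-module (off-block-diagonal) weights by eps per
    Eq. (4); channel 1 is the module identification matrix of Eq. (3)."""
    lab = np.asarray(labels)
    same = lab[:, None] == lab[None, :]
    W_aug = W_reordered * np.where(same, 1.0, eps)   # Eq. (4) as a mask
    M = np.where(same, lab[:, None] + 1, 0)          # Eq. (3)
    return np.stack([W_aug, M.astype(float)])        # shape (2, N, N)
```

The stacked array is what gets fed to the convolutional network in the next section, one two-channel "image" per subject.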

3.3 Unsupervised Learning Augmented CNN

In this section, the resulting connectivity matrices are fed into a CNN architecture for feature learning. Specifically, we introduce a novel CNN architecture to learn highly non-linear feature representations from the brain network data while dealing with the problems of small sample size and high dimensionality.

CNN is a supervised feature learning technique which has demonstrated state-of-the-art performance for image analysis [27]. Recent works [14, 39, 43] commonly nest a group of convolutional layers and a pooling layer to construct a basic feature extractor at different scales. The extracted internal representation preserves highly non-linear properties. Considering the feature extractors at different scales as cells, the input data passes through a number of convolution-pooling cells and fully connected layers, and is finally fed to the top-layer classifier. A CNN with L cells can be defined as follows

X_l = f_l(X_{l−1}; Θ_l)   for l = 1, 2, ..., L + 1   (5)

where X_0 = X is the input, i.e., the resulting structure-preserving connectivity matrix, and f_l(·) for l = 1, 2, ..., L are the feature mapping functions of the l-th cell with parameters Θ_l. The final classification loss can be represented as follows

C(X_0, y) = ℓ(X_{L+1}, y)   (6)

where y corresponds to the ground-truth label, which indicates the patient's disease diagnosis result, and ℓ denotes the classification loss function.
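Eqs. (5)-(6) amount to composing the cell mappings and scoring the final output against the label; schematically (the toy cells below and the softmax cross-entropy choice for ℓ are illustrative stand-ins, not the paper's exact configuration):

```python
import numpy as np

def cnn_forward(X0, cells, classifier):
    """Eq. (5): X_l = f_l(X_{l-1}; Theta_l); the (L+1)-th map is the
    top-layer classifier producing class scores X_{L+1}."""
    X = X0
    for f in cells:                 # the L convolution-pooling cells
        X = f(X)
    return classifier(X)            # X_{L+1}

def classification_loss(scores, y):
    """Eq. (6): C(X_0, y) = l(X_{L+1}, y), here softmax cross-entropy."""
    e = np.exp(scores - scores.max())
    return -np.log(e[y] / e.sum())
```

Averaging this loss over all N subjects gives the supervised training objective described below.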

[Figure 4 architectures: each variant pairs an encoding pathway (input → conv1 → pool1 → conv2 → pool2 → fc3 → fc4 → softmax loss against the label) with mirrored decoding pathways (deconv/depool layers producing reconstruction losses).]

(a) DC-first: CNN architecture with reconstruction loss at the first layer
(b) DC-all: CNN architecture with reconstruction loss at all layers
(c) DC-layerwise: CNN architecture with layerwise reconstruction loss

Figure 4: CNN architecture with three types of unsupervised learning augmentation

Given a set of subject data X_1, X_2, ..., X_N with corresponding labels y_1, y_2, ..., y_N, the CNN is trained by minimizing the following objective function:

(1/N) Σ_{i=1}^{N} C(X_i, y_i)    (7)

However, CNN usually requires a large amount of labeled data for training, which is infeasible for current brain health research due to the limited number of available patients of interest and the high cost of acquiring the data. This can limit the performance of CNN.

Motivated by recent attempts [28, 35, 41, 49] to couple unsupervised and supervised learning in the same phase, so that the unsupervised objective can influence network training even after supervised learning takes place, we augment the supervised learning objective with an auxiliary unsupervised learning objective to address the above problem. The joint objective function is as follows:

(1/N) Σ_{i=1}^{N} (C(X_i, y_i) + λ U(X_i))    (8)

where U(·) is the unsupervised objective function, which can be computed from one or more auxiliary pathways attached to the convolution-pooling cells of the original CNN.
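The joint objective in Eq. (8) can be sketched as below. The loss functions C and U here are toy stand-ins, not the paper's networks: C penalises the distance of a mean activation from the label, and U is a squared reconstruction error against a clipped "decode".

```python
import numpy as np

def joint_objective(Xs, ys, C, U, lam):
    """Eq. (8): mean over subjects of the supervised loss C plus a
    lambda-weighted unsupervised reconstruction loss U."""
    return float(np.mean([C(X, y) + lam * U(X) for X, y in zip(Xs, ys)]))

# toy stand-ins for illustration only
rng = np.random.default_rng(0)
Xs = [rng.standard_normal((4, 4)) for _ in range(3)]
ys = [0, 1, 0]
C = lambda X, y: (X.mean() - y) ** 2
U = lambda X: float(((X - X.clip(-1, 1)) ** 2).sum())
obj = joint_objective(Xs, ys, C, U, lam=0.1)
```

Setting lam=0 recovers the purely supervised objective of Eq. (7); a positive lam trades some supervised fit for reconstruction fidelity.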

Specifically, given the CNN architecture for classification, we take the sub-network consisting of all the convolution-pooling cells as the encoding pathway, and build a mirrored decoder network as an auxiliary pathway of the original network. The fully connected layers are not used for reconstruction, since they are more related to the supervised learning objective. For a CNN with 2 convolution-pooling cells, we consider three types of unsupervised learning augmentation via decoding: DC-first, DC-all and DC-layerwise. Their frameworks are shown in Figure 4. The decoding pathway is defined as follows:

X̂_L = X_L,   X̂_{l-1} = f_l^{dec}(X̂_l; Θ'_l),   X̂ = X̂_0    (9)

where Θ'_l are the decoder parameters.

Figure 4(a) shows the framework of DC-first. In DC-first, the CNN is simply augmented with a stacked autoencoder. The decoding starts from the pooled features generated by the second pooling layer and reconstructs the original input. The reconstruction error is measured between the final reconstruction and the CNN input. The auxiliary training signals of DC-first come from the bottom of the decoding pathway and are incorporated into the encoder pathway together with the top-down signals from supervised learning. The corresponding unsupervised loss is as follows:

U_{DC-first}(X) = ||X̂ − X||_2^2    (10)

Figure 4(b) shows the framework of DC-all. Different from DC-first, it allows more gradients to flow directly to the earlier cells. This model makes the autoencoder a better mirrored architecture by matching the activations of all the cells. Its unsupervised learning loss is as follows:

U_{DC-all}(X) = Σ_{l=1}^{L-1} ||X̂_l − X_l||_2^2    (11)

Figure 4(c) shows the framework of DC-layerwise, which is another autoencoder variant with a layerwise decoding architecture. It reconstructs the output activations of every cell back to that cell's input. Its unsupervised learning loss is the same as that of DC-all. Different from DC-all, its decoding pathway is defined as follows:

X̂_{l-1} = f_l^{dec}(X_l; Θ'_l)    (12)

The first two methods encourage the top-level convolutional features to preserve as much information as possible. In contrast, the decoding pathway in DC-layerwise concentrates on inverting the clean intermediate activations to the input of the associated cell, enabling parallel layerwise training.
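The three reconstruction losses of Eqs. (10)-(12) can be sketched with NumPy. The toy activations and decoders below are hypothetical (one "cell" that doubles its input, with an exact inverse as decoder), used only to show where each loss reads its inputs.

```python
import numpy as np

def dc_first(X, X_hat):
    """Eq. (10): squared error between the decoded reconstruction
    X_hat and the original CNN input X."""
    return float(np.sum((X_hat - X) ** 2))

def dc_all(acts, acts_hat):
    """Eq. (11): match the top-down decoder's activations to the
    encoder's activations X_l cell by cell, summing squared errors."""
    return float(sum(np.sum((ah - a) ** 2) for a, ah in zip(acts, acts_hat)))

def dc_layerwise(acts, decoders):
    """Eq. (12): each cell's decoder inverts the *clean* activation
    X_l back to that cell's input X_{l-1}; the per-cell squared
    errors are summed as in DC-all."""
    return float(sum(np.sum((dec(acts[l]) - acts[l - 1]) ** 2)
                     for l, dec in enumerate(decoders, start=1)))

# toy encoder with one cell (multiply by 2) and its exact inverse
rng = np.random.default_rng(0)
X0 = rng.standard_normal((4, 4))
acts = [X0, 2 * X0]
decoders = [lambda Z: Z / 2]
```

With a perfect inverse decoder all three losses vanish; imperfect decoders yield positive losses that feed the auxiliary training signals.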

The training procedure of our proposed method is as follows: (1) train the CNN with labels; (2) train the "layerwise" decoding pathways; (3) train the top-down decoding pathways; and (4) fine-tune the entire augmented network. With the help of the proposed deep convolutional feature learning framework, we can address the three problems above and learn discriminative graph feature representations.

4 EXPERIMENTS AND RESULTS

In this section, we evaluate the proposed methods on four real-world datasets. We first introduce the datasets used in the experiments, the compared methods and the experimental setting. Next, we show the performance of the compared methods on the four brain network datasets described below. We further conduct several case studies, including the effectiveness of graph reordering using different spectral clustering configurations (corresponding to different Laplacian matrices) and the effectiveness of unsupervised learning augmentation.

Table 1: The statistics for each dataset

Name      HIV-fMRI  HIV-DTI  BP-fMRI  ADHD-fMRI
Category  fMRI      DTI      fMRI     fMRI
Quantity  77        77       97       776
Size      90 x 90   90 x 90  82 x 82  90 x 90

4.1 Data Collection

In the experiments, we evaluate the proposed methods on four brain network datasets from three different neurological disorders.

• HIV-fMRI-77 & HIV-DTI-77: The Chicago Early HIV Infection Study at Northwestern University [34] collected both fMRI and DTI data for 56 HIV subjects (positive) and 21 seronegative controls (negative). By propagating the Automated Anatomical Labeling (AAL) to each image [44], each brain is represented as a graph with 90 nodes corresponding to cerebral regions (45 for each hemisphere). A detailed description of data acquisition and preprocessing is available in [5].

• BP-fMRI-97: This dataset consists of 52 bipolar I subjects who are currently in euthymia and 45 age- and gender-matched healthy controls. Using the FreeSurfer-generated cortical/subcortical gray matter regions, functional brain networks were derived using pairwise BOLD signal correlations of 82 brain regions. A detailed description of data acquisition and preprocessing is available in [6].

• ADHD-fMRI-776: This dataset is collected from the ADHD-200 global competition dataset¹. The dataset contains resting-state fMRI records for 776 subjects, which are labeled as real patients (positive) and normal controls (negative). Similarly, each brain is parcellated into 90 brain regions.

The statistics for each dataset are summarized in Table 1. HIV-fMRI, HIV-DTI and BP-fMRI are relatively small datasets, while ADHD-fMRI is a relatively big dataset.

4.2 Compared Methods

To evaluate the performance of the proposed method, we compare it with other state-of-the-art methods. All the compared methods are summarized as follows:

• gSSC: A semi-supervised subgraph feature selection method [25] based upon both labeled and unlabeled graphs.

• Freq: An unsupervised subgraph feature selection method based upon frequency. The top-n frequent subgraph features are selected.

• Conf, Ratio, Gtest, HSIC: Supervised subgraph selection methods [26] based upon confidence, frequency ratio, G-test score and HSIC, respectively. The top-n discriminative subgraph features are selected in terms of the different discrimination criteria.

• AE: An unsupervised feature learning method based on an autoencoder [48], which learns the latent representation of all the connectivity data without considering any structural information.

¹ http://neurobureau.projects.nitrc.org/ADHD200


• CNN: A plain vanilla convolutional architecture stacking the convolution layer, pooling layer, non-linear gating layer and fully connected layer. Graph reordering and structural augmentation are not applied.

Since the proposed SDBN consists of several components, we consider several variations of SDBN as follows:

• CNN-GN: A convolutional architecture with graph reordering of nodes. It has three versions, SP, NJW and SM [45], each of which uses a different spectral clustering method. SP indicates graph reordering using unnormalized spectral clustering, which aims to minimize the ratio cut. NJW indicates graph reordering with normalized spectral clustering by the Ng-Jordan-Weiss algorithm [32]. SM indicates graph reordering with normalized spectral clustering by Shi and Malik [38], which aims to minimize the normalized cut.

• CNN-GN-A: A convolutional architecture with graph reordering and structural augmentation.

• CNN-GN-A-DC: A convolutional architecture with graph reordering, structural augmentation and unsupervised learning augmentation. It has three variations: DC-first, DC-all and DC-layerwise. DC-first uses the augmentation with reconstruction loss at the first layer. DC-all uses the augmentation with reconstruction loss at all layers. DC-layerwise uses the augmentation with a layerwise architecture.

• SDBN: The method proposed in this work, which can learn structure-preserving graph feature representations that lie in a highly non-linear latent space. There are two versions of SDBN: one with a shallow architecture (SDBN-S) and one with a deep architecture (SDBN-D). We select the best parameter configuration for them.
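The graph-reordering idea behind CNN-GN can be illustrated with a simplified one-dimensional spectral ordering: rather than the full clustering-based reordering (SP, NJW and SM differ in which Laplacian they use), the sketch below sorts nodes by the Fiedler vector of the unnormalized Laplacian, which already pulls tightly connected modules into contiguous blocks. The two-module toy graph is hypothetical.

```python
import numpy as np

def spectral_reorder(W):
    """Sort nodes by the Fiedler vector (eigenvector of the second
    smallest eigenvalue) of the unnormalized Laplacian L = D - W, so
    the permuted adjacency matrix becomes closer to block-diagonal."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    order = np.argsort(vecs[:, 1])       # Fiedler vector
    return W[np.ix_(order, order)], order

# toy graph: two 3-node modules joined by one weak edge, then scrambled
B = np.block([[np.ones((3, 3)), np.zeros((3, 3))],
              [np.zeros((3, 3)), np.ones((3, 3))]])
B[2, 3] = B[3, 2] = 0.1
perm = np.array([0, 3, 1, 4, 2, 5])      # interleave the two modules
W = B[np.ix_(perm, perm)]
W_reordered, order = spectral_reorder(W)
modules = perm[order] // 3               # original module label per position
```

After reordering, nodes of the same module become adjacent, so the permuted adjacency matrix exposes the block structure that the CNN's local receptive fields can exploit.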

4.3 Experimental Setting

To evaluate the quality of the learned brain network representation, we feed the learned feature representation of each compared method to a softmax classifier. For the subgraph mining methods, the thresholds for the four datasets are 0.9, 0.3, 0.5 and 0.5, respectively. We select the top-100 features for classification as in [48].

For SDBN, we first select the configuration of the graph reordering. By empirical study, we choose the SM spectral clustering method and set the cluster size K = 9. Then we choose the configuration of the CNN. We build convolutional architectures with different layers as shown in Table 2, stacking the convolution layer, pooling layer, non-linear gating layer and fully connected layer. For a fair comparison, we set the number of neurons in the fully connected layer to 100. SDBN-S has the setup LayerC-LayerP-LayerN-LayerF-LayerN, while SDBN-D has the setup LayerC-LayerP-LayerN-LayerC-LayerP-LayerN-LayerF-LayerN. To prevent overfitting, dropout is added after the fully connected layer with a dropout rate of 0.5. In the experiments, we apply AdaGrad [11], an adaptive gradient-based optimization approach that automatically determines a per-parameter learning rate, for parameter learning. Since there are a limited number of subjects in the datasets, we perform 3-fold cross validation as in [3] on balanced datasets and report the average results.
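The AdaGrad rule cited above can be sketched as follows. This is the generic per-parameter update from [11], shown here on a toy scalar quadratic rather than the SDBN parameters.

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=1.0, eps=1e-8):
    """One AdaGrad update: accumulate squared gradients per parameter,
    then scale each parameter's step by 1/sqrt of its accumulator."""
    G = G + grad ** 2
    theta = theta - lr * grad / (np.sqrt(G) + eps)
    return theta, G

# toy demo: minimise f(theta) = theta^2, whose gradient is 2*theta
theta, G = np.array([5.0]), np.zeros(1)
for _ in range(200):
    theta, G = adagrad_step(theta, 2 * theta, G)
```

Parameters that keep receiving large gradients see their effective learning rate shrink, which is what makes the per-parameter rate "adaptive".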

Table 2: The configuration of different cell-layers in the convolutional architecture. LayerC, LayerP, LayerN and LayerF indicate the convolution layer, the pooling layer, the non-linear gating function and the fully connected layer, respectively.

Name       LayerC      LayerP    LayerN  LayerF
Type       conv        max-pool  gating  fc
Parameter  5 × 5 × 50  2 × 2     relu    100
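Given Table 2, the spatial sizes through the network can be traced. The paper does not state padding or stride, so the sketch below assumes stride-1 "same" 5×5 convolutions (spatial size unchanged) and non-overlapping 2×2 pooling (size halved, rounding down).

```python
def trace_shapes(size, n_cells, pool=2):
    """Spatial size after each conv-pool cell, assuming 'same'
    convolutions (size unchanged) and non-overlapping pooling."""
    sizes = [size]
    for _ in range(n_cells):
        sizes.append(sizes[-1] // pool)   # conv keeps size; pool halves it
    return sizes

print(trace_shapes(90, 2))   # SDBN-D on a 90 x 90 brain network -> [90, 45, 22]
print(trace_shapes(82, 2))   # SDBN-D on the 82 x 82 BP-fMRI networks -> [82, 41, 20]
```

Under these assumptions, the 100-neuron fully connected layer of SDBN-D would read from 50 feature maps of size 22 × 22 on the 90-node datasets.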

4.4 Performance on Brain Disorder Detection

In this section, we investigate the effectiveness of the learned structure-preserving graph feature representations for brain disease detection. The average classification performance is shown in Table 3. Two evaluation metrics are used: accuracy and F1 score. It is easy to see that SDBN outperforms all the other baseline methods on all four datasets. Compared with subgraph mining methods that search in a potentially exponential space, the proposed methods achieve better performance and consume around 83% less memory and computation resources. In addition, we can see that AE performs a little better than the subgraph mining methods in most cases. This is probably because the connectivity patterns learned by AE can capture the non-linear complex structure in the brain network. CNN is nearly the worst among all the models, since applying CNN directly to brain networks is not reliable. Compared with the shallow feature learning methods, our shallow model SDBN-S outperforms AE. This is due to the fact that SDBN can identify more complex graph representations, while AE can only identify simple representations. It supports our premise that more discriminative graph feature representations can be identified by SDBN. Considering depth, our deep architecture SDBN-D achieves a better result than the shallow architecture SDBN-S, since the features learned by the deep architecture can capture hierarchical and highly non-linear structure patterns. It is also worth noticing that the deep model yields limited improvement over the shallow model on small datasets such as HIV-fMRI, HIV-DTI and BP-fMRI, while showing significant improvement on a relatively big dataset such as ADHD-fMRI. When the dataset is big, the deep learning methods show their superiority.

4.5 Case Study of Graph Reordering

In this part, we study the importance of graph reordering in the proposed method. We compare the performance of CNN with and without graph reordering. In addition, we study how graph reordering using different spectral clustering methods and cluster sizes affects the quality of the learned feature representation.

We conduct experiments for brain disorder detection with three different spectral clustering methods and two cluster numbers on all four datasets. The clustering methods used in the experiments include SP, NJW and SM [45], as mentioned above. Two cluster sizes, 6 and 9, are considered. The average accuracies are reported in Figure 5. Compared with the CNN without graph reordering, CNNs with graph reordering show a significant gain except for the ones using SP, which indicates the importance of graph reordering. As an unnormalized spectral clustering method, SP has limited clustering ability, especially for brain networks. It yields limited improvement


Table 3: Performance of the compared methods. The results are reported as "average performance (rank)".

Dataset         Measure  gSSC      Freq      Conf       Ratio     Gtest      HSIC      AE        CNN       SDBN-S    SDBN-D
HIV-fMRI-77     ACC      60.0 (4)  54.3 (8)  58.6 (6)   54.3 (8)  50.0 (9)   58.7 (5)  62.9 (3)  55.2 (9)  65.3 (2)  66.5 (1)
                F1       62.5 (4)  58.2 (8)  64.2 (3)   62.0 (5)  52.5 (10)  59.5 (7)  60.4 (6)  56.7 (9)  65.4 (2)  66.7 (1)
HIV-DTI-77      ACC      59.5 (4)  64.6 (3)  52.4 (9)   59.3 (5)  59.3 (5)   57.9 (6)  57.8 (7)  54.3 (8)  64.7 (2)  65.9 (1)
                F1       59.6 (4)  63.9 (3)  46.1 (10)  57.9 (7)  58.5 (5)   58.3 (6)  56.8 (8)  53.5 (9)  64.5 (2)  65.6 (1)
BP-fMRI-97      ACC      56.7 (5)  56.8 (4)  50.8 (10)  54.2 (8)  55.2 (6)   54.9 (7)  60.4 (3)  53.8 (9)  63.0 (2)  64.8 (1)
                F1       57.8 (4)  57.6 (5)  49.1 (10)  53.7 (8)  53.9 (7)   55.8 (6)  59.8 (3)  52.1 (9)  62.5 (2)  63.7 (1)
ADHD-fMRI-776   ACC      55.7 (4)  48.5 (7)  54.5 (6)   48.5 (7)  54.5 (6)   46.1 (8)  61.3 (3)  55.3 (5)  68.9 (2)  71.2 (1)
                F1       55.9 (4)  55.0 (5)  51.4 (7)   55.0 (5)  51.4 (7)   50.5 (8)  60.1 (3)  54.2 (6)  67.6 (2)  69.9 (1)

Figure 5: Case study of graph reordering. In each bar chart, the CNN methods with and without graph reordering (GR) are compared. For CNN w/GN, different cluster numbers are tested. Panels: (a) HIV-fMRI-77, (b) HIV-DTI-77, (c) BP-fMRI-97, (d) ADHD-fMRI-776.

on HIV-DTI, BP-fMRI and ADHD-fMRI compared with CNN without graph reordering. The clusters it produces cannot preserve the underlying structure well and are not helpful for CNN feature learning. Of the remaining two clustering methods, SM is slightly better than NJW, due to its better clustering ability. Hence, the spectral clustering method is key to graph reordering. Comparing graph reordering with different cluster sizes, the runs with 9 clusters are slightly better than those with 6 clusters, which confirms the observation in [31]. This indicates that the cluster size is also an important factor for graph reordering. Overall, the quality of spectral clustering is important to graph reordering, and good spectral clustering will produce a graph reordering close to the optimal one.

Spectral clustering not only supports graph reordering but also exposes the rich graph structure of brain networks. The generated clustering information can be reused for structural augmentation. To test its effectiveness, we add the modular structure matrix during the feature learning process. We find that with structural augmentation, we usually obtain about 1% improvement over CNN with good graph reordering.

Figure 6: Case study of unsupervised learning augmentation. In each bar chart, the method with good graph reordering (SM, k=9) and the methods with additional unsupervised learning augmentation are compared. Panels: (a) HIV-fMRI-77, (b) HIV-DTI-77, (c) BP-fMRI-97, (d) ADHD-fMRI-776.

In conclusion, graph reordering is critical for learning graph representations directly from brain network data with a CNN architecture. After graph reordering, structure-preserving locally connected regions can be extracted and serve as the receptive fields of the CNN, allowing the CNN framework to learn effective graph representations.

4.6 Case Study of Unsupervised Learning Augmentation

In this part, we investigate the effectiveness of unsupervised learning augmentation in the proposed method. With the best reordering configuration from the previous section, we compare CNN with and without unsupervised learning augmentation. We conduct experiments for brain disease detection with three different unsupervised learning augmentations on all four datasets. We take the sub-network of the CNN composed of all the convolution-pooling layers as the encoder and generate a fully mirrored decoder network as an auxiliary pathway of the original network. We measure the reconstruction loss in three ways, DC-first, DC-all and DC-layerwise as described above, and augment the supervised loss with these reconstruction losses. The average accuracies are shown in Figure 6. We can see that on all four datasets, CNNs with unsupervised learning augmentation outperform the ones without augmentation. This demonstrates the effectiveness of unsupervised learning augmentation, and the performance of the different augmentation methods is comparable. Among them, DC-layerwise is slightly better than the other two methods, and DC-all is slightly better than DC-first.

Overall, unsupervised learning augmentation is helpful for learning graph representations. This can be explained in three ways. First, spatial information is critical for brain network analysis. Clinically disrupted connectivities are modular in nature and usually occur among certain ROIs. The traditional CNN's max pooling operation preserves spatial invariance but removes spatial information, whereas the reconstruction objective is spatially corresponded, which helps keep the spatial information. Second, the decoding pathways help the supervised objective reach a better optimum and regularize the final solution. Third, the layerwise reconstruction loss is an effective way to regularize the network training.

5 RELATED WORK

Our work is related to deep learning, graph reordering and brain data mining. We briefly discuss them in the following.

5.1 Deep Learning

Representation learning has long been an important problem in machine learning, and many works aim at learning representations for samples [1]. In recent years, deep learning has become more and more popular, due to its powerful ability to learn features with different kinds of properties and its applicability to different kinds of data. It has made successes in different fields, such as computer vision [14, 27, 39, 43], natural language processing [40] and speech recognition [21]. Several recent works have proposed neural architectures for graph input data [12, 20, 29]; these methods try to apply convolutional neural networks to simple graph-structured data. However, to the best of our knowledge, there have been few deep learning works on brain networks [48], especially on learning brain network representations.

5.2 Graph Reordering

Graph reordering is very active in the fields of databases and data mining. It is widely used for graph compression, which focuses on finding homogeneous regions in the graph so that nodes inside a region are more tightly connected to each other than to nodes in other regions. In terms of the adjacency matrix, the target is to find an ordering of nodes so that the adjacency matrix is close to block-diagonal, containing more "square" blocks. Spectral clustering [32, 38], co-clustering [10], cross-associations [7] and shingle ordering [8] are typical examples of such approaches. Graph reordering also has other applications. Recently, [33] proposed a reordering method for the nodes of the neighborhood graph to map from the unordered graph space to a vector space with a linear order, and learned from graphs with a CNN while considering node attributes.

5.3 Brain Data Mining

Applications of data mining techniques to brain network data analysis are related to our work as well. Wee et al. used clustering coefficients of each brain region and a multi-kernel SVM for early detection of brain abnormalities in Alzheimer's disease [46]. Jie et al. used a graph kernel to integrate local connectivity and global topological properties of fMRI brain networks [23]. Kong et al. proposed a discriminative subgraph selection approach for uncertain fMRI networks [6, 26]. Cao et al. proposed a supervised tensor factorization technique to analyze EEG brain networks [4]. Ma et al. investigated the interior-node topology of brain networks [31]. Zhang et al. presented an autoencoder architecture with the guidance of side information for brain network analysis [48]. He et al. proposed several tensor-based methods to analyze brain data [15-17]. In contrast to the existing works, this work presents a novel approach to discover highly non-linear patterns from brain network data based on a deep convolutional architecture.

6 CONCLUSION AND OUTLOOK

In this paper, a structural deep brain network mining method, namely SDBN, is proposed to learn structure-preserving and clinically meaningful representations from brain networks. SDBN consists of several components: (1) graph reordering, (2) structural augmentation, and (3) convolutional deep feature learning. Based on spectral clustering, graph reordering is proposed to impose an order on the nodes of the neighborhood graph so as to preserve the global and local structure of the brain network. Next, we perform structural augmentation to further enhance the spatial information of the reordered graph. Then a deep feature learning framework is introduced, which joins supervised and unsupervised learning in a small-scale setting by augmenting the existing supervised CNN with decoding pathways for unsupervised reconstruction. With the help of multiple layers of non-linear mapping, it can capture the highly non-linear network structure of the brain network. To evaluate our methods, we conduct experiments on four real brain network datasets for disease diagnosis. The experiments show that SDBN can discover discriminative and meaningful structural graph representations for brain disorder diagnosis. Since the proposed deep feature learning framework is end-to-end and task-oriented, its application is not limited to binary disease classification. It can be easily extended to other clinical tasks with objectives such as multi-class classification, clustering, regression and ranking. We plan to apply our framework to other medical tasks.

7 ACKNOWLEDGEMENTS

This work is supported in part by NSF through grants IIS-1526499 and CNS-1626432, NSFC through grants 61503253 and 61672313, and NIH through grant R01-MH080636. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used for this research.

REFERENCES[1] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A

review and new perspectives. IEEE Transactions on Pa�ern Analysis and MachineIntelligence, 35(8):1798–1828, 2013.

[2] Paolo Boldi and Sebastiano Vigna. �e webgraph framework i: compressiontechniques. In Proceedings of the 13th International Conference on World Wide

Page 10: Structural Deep Brain Network Mining - UIC Computer Scienceclu/doc/kdd17_SDBN.pdf · We propose a Structural Deep Brain Network mining method, namely SDBN, to learn discriminative

Web, pages 595–602, 2004.[3] Bokai Cao, Lifang He, Xiangnan Kong, Philip S. Yu, Zhifeng Hao, and Ann B.

Ragin. Tensor-based multi-view feature selection with applications to braindiseases. In IEEE International Conference on Data Mining, pages 40–49, 2014.

[4] Bokai Cao, Lifang He, Xiaokai Wei, Mengqi Xing, Philip S. Yu, Heide Klumpp,and Alex D Leow. t-bne: Tensor-based brain network embedding. In Proceedingsof the SIAM International Conference on Data Mining, 2017.

[5] Bokai Cao, Xiangnan Kong, Jingyuan Zhang, Philip S. Yu, and Ann B. Ragin. Iden-tifying HIV-induced subgraph pa�erns in brain networks with side information.Brain Informatics, 2015.

[6] Bokai Cao, Liang Zhan, Xiangnan Kong, Philip S. Yu, Nathalie Vizueta, Lori LAltshuler, and Alex D Leow. Identi�cation of discriminative subgraph pa�ernsin fmri brain networks in bipolar a�ective disorder. In International Conferenceon Brain Informatics and Health, pages 105–114, 2015.

[7] Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra S Modha, and Chris-tos Faloutsos. Fully automatic cross-associations. In Proceedings of the 10thACM SIGKDD International Conference on Knowledge Discovery and Data Mining,pages 79–88, 2004.

[8] Flavio Chieriche�i, Ravi Kumar, Silvio La�anzi, Michael Mitzenmacher, Alessan-dro Panconesi, and Prabhakar Raghavan. On compressing social networks. InProceedings of the 15th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, pages 219–228, 2009.

[9] Ian Davidson, Sean Gilpin, Owen Carmichael, and Peter Walker. Networkdiscovery via constrained tensor analysis of fmri data. In Proceedings of the 19thACM SIGKDD International Conference on Knowledge Discovery and Data Mining,pages 194–202, 2013.

[10] Inderjit S Dhillon, Subramanyam Mallela, and Dharmendra S Modha.Information-theoretic co-clustering. In Proceedings of the 9th ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining, pages 89–98,2003.

[11] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methodsfor online learning and stochastic optimization. Journal of Machine LearningResearch, 12:2121–2159, 2011.

[12] David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell,Timothy Hirzel, Alan Aspuru-Guzik, and Ryan P Adams. Convolutional networkson graphs for learning molecular �ngerprints. In Advances in Neural InformationProcessing Systems, pages 2224–2232, 2015.

[13] Daniel Gamermann, Arnau Montagud, RA Infante, Julian Triana, PF de Cordoba,and JF Urchueguıa. Pynetmet: Python tools for e�cient work with networksand metabolic models. arXiv preprint arXiv:1211.7196, 2012.

[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learningfor image recognition. arXiv preprint arXiv:1512.03385, 2015.

[15] Lifang He, Xiangnan Kong, Philip S. Yu, Xiaowei Yang, Ann B. Ragin, and ZhifengHao. Dusk: A dual structure-preserving kernel for supervised tensor learningwith applications to neuroimages. In Proceedings of the SIAM InternationalConference on Data Mining, pages 127–135, 2014.

[16] Lifang He, Chun-Ta Lu, Hao Ding, Shen Wang, Linlin Shen, Philip S. Yu, andAnn B. Ragin. Multi-way multi-level kernel modeling for neuroimaging classi-�cation. In Proceedings of the IEEE Conference on Computer Vision and Pa�ernRecognition, 2017.

[17] Lifang He, Chun-Ta Lu, Guixiang Ma, Shen Wang, Linlin Shen, Philip S. Yu, andAnn B. Ragin. Kernelized support tensor machines. In Proceedings of the 34thInternational Conference on Machine Learning, 2017.

[18] Lifang He, Chun-Ta Lu, Jiaqi Ma, Jianping Cao, Linlin Shen, and Philip S. Yu.Joint community and structural hole spanner detection via harmonic modularity.In Proceedings of the 22nd ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, pages 875–884, 2016.

[19] RK Heaton, DB Cli�ord, DR Franklin, SP Woods, C Ake, F Vaida, RJ Ellis, SL Le-tendre, TD Marco�e, JH Atkinson, et al. Hiv-associated neurocognitive disorderspersist in the era of potent antiretroviral therapy charter study. Neurology,75(23):2087–2096, 2010.

[20] Mikael Hena�, Joan Bruna, and Yann LeCun. Deep convolutional networks ongraph-structured data. arXiv preprint arXiv:1506.05163, 2015.

[21] Geo�rey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed,Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara NSainath, et al. Deep neural networks for acoustic modeling in speech recognition:�e shared views of four research groups. IEEE Signal Processing Magazine, 2012.

[22] Shuai Huang, Jing Li, Jieping Ye, Adam Fleisher, Kewei Chen, Teresa Wu, and EricReiman. Brain e�ective connectivity modeling for alzheimer’s disease by sparsegaussian bayesian network. In Proceedings of the 17th ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining, pages 931–939, 2011.

[23] Biao Jie, Daoqiang Zhang, Wei Gao, Qian Wang, Chong-Yaw Wee, and Ding-gang Shen. Integration of network topological and connectivity properties forneuroimaging classi�cation. IEEE Transactions on Biomedical Engineering, 2014.

[24] Martin Juvan and Bojan Mohar. Optimal linear labelings and eigenvalues ofgraphs. Discrete Applied Mathematics, 36(2):153–168, 1992.

[25] Xiangnan Kong and Philip S. Yu. Semi-supervised feature selection for graphclassi�cation. In Proceedings of the 16th ACM SIGKDD International Conference

on Knowledge Discovery and Data Mining, pages 793–802, 2010.[26] Xiangnan Kong, Philip S. Yu, Xue Wang, and Ann B. Ragin. Discriminative

feature selection for uncertain graph classi�cation. In Proceedings of the SIAMInternational Conference on Data Mining, pages 82–93, 2013.

[27] Alex Krizhevsky, Ilya Sutskever, and Geo�rey E Hinton. Imagenet classi�cationwith deep convolutional neural networks. In Advances in Neural InformationProcessing Systems, pages 1097–1105, 2012.

[28] Hugo Larochelle and Yoshua Bengio. Classi�cation using discriminative re-stricted boltzmann machines. In Proceedings of the 25th International Conferenceon Machine Learning, pages 536–543, 2008.

[29] Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graphsequence neural networks. arXiv preprint arXiv:1511.05493, 2015.

[30] Dijun Luo, Feiping Nie, Heng Huang, and Chris H Ding. Cauchy graph embed-ding. In Proceedings of the 28th International Conference on Machine Learning,pages 553–560, 2011.

[31] Guixiang Ma, Lifang He, Bokai Cao, Jiawei Zhang, Philip S. Yu, and Ann B. Ragin.Multi-graph clustering based on interior-node topology with applications tobrain networks. In Joint European Conference onMachine Learning and KnowledgeDiscovery in Databases, pages 476–492, 2016.

[32] Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysisand an algorithm. In Advances in Neural Information Processing Systems, pages849–856, 2002.

[33] Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning con-volutional neural networks for graphs. In Proceedings of the 33rd InternationalConference on Machine Learning, 2016.

[34] Ann B. Ragin, Hongyan Du, Renee Ochs, Ying Wu, Christina L Sammet, Alfred Shoukry, and Leon G Epstein. Structural brain alterations can be detected early in HIV infection. Neurology, 2012.

[35] Marc'Aurelio Ranzato and Martin Szummer. Semi-supervised learning of compact document representations with deep networks. In Proceedings of the 25th International Conference on Machine Learning, pages 792–799, 2008.

[36] Mikail Rubinov, Stuart A Knock, Cornelis J Stam, Sifis Micheloyannis, Anthony WF Harris, Leanne M Williams, and Michael Breakspear. Small-world properties of nonlinear brain activity in schizophrenia. Human Brain Mapping, 30(2):403–416, 2009.

[37] Blake Shaw and Tony Jebara. Structure preserving embedding. In Proceedings of the 26th International Conference on Machine Learning, pages 937–944, 2009.

[38] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

[39] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[40] Richard Socher, Alex Perelygin, Jean Y Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, Christopher Potts, et al. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, 2013.

[41] Kihyuk Sohn, Guanyu Zhou, Chansoo Lee, and Honglak Lee. Learning and selecting features jointly with point-wise gated Boltzmann machines. In Proceedings of the 30th International Conference on Machine Learning, pages 217–225, 2013.

[42] Olaf Sporns, Giulio Tononi, and Rolf Kötter. The human connectome: a structural description of the human brain. PLoS Computational Biology, 2005.

[43] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1–9, 2015.

[44] Nathalie Tzourio-Mazoyer, Brigitte Landeau, Dimitri Papathanassiou, Fabrice Crivello, Olivier Etard, Nicolas Delcroix, Bernard Mazoyer, and Marc Joliot. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 2002.

[45] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 2007.

[46] Chong-Yaw Wee, Pew-Thian Yap, Daoqiang Zhang, Kevin Denny, Jeffrey N Browndyke, Guy G Potter, Kathleen A Welsh-Bohmer, Lihong Wang, and Dinggang Shen. Identification of MCI individuals using structural and functional connectivity networks. Neuroimage, 2012.

[47] Zhaoquan Yuan, Jitao Sang, Changsheng Xu, and Yan Liu. A unified framework of latent feature learning in social media. IEEE Transactions on Multimedia, 16(6):1624–1635, 2014.

[48] Jingyuan Zhang, Bokai Cao, Sihong Xie, Chun-Ta Lu, Philip S. Yu, and Ann B. Ragin. Identifying connectivity patterns for brain diseases via multi-side-view guided deep architectures. In Proceedings of the SIAM International Conference on Data Mining, pages 36–44, 2016.

[49] Yuting Zhang, Kibok Lee, and Honglak Lee. Augmenting supervised neural networks with unsupervised objectives for large-scale image classification. In Proceedings of the 33rd International Conference on Machine Learning, 2016.

[50] Jinfeng Zhuang, Ivor W Tsang, and Steven CH Hoi. Two-layer multiple kernel learning. In AISTATS, pages 909–917, 2011.