Efficiently Learning Structure
Edwin Hancock, Department of Computer Science, University of York
Supported by a Royal Society Wolfson Research Merit Award

TRANSCRIPT

Page 1:

Efficiently Learning Structure

Edwin Hancock Department of Computer Science

University of York

Supported by a Royal Society Wolfson Research Merit Award

Page 2:

Structural Variations

Page 3:

Variation in object appearance

• These graphs represent different views of the same object.

• They vary in detail, but represent the same object in different poses.

Page 4:

Protein-Protein Interaction Networks

Page 5:

Variations in complexity

• These graphs represent PPIs from organisms at different stages of evolution.

• The organisms and their PPIs differ in structure and complexity.

Page 6:

Page 7:

Financial Data

• Represents trading on the NYSE over a 6000-day period.

• Nodes are stocks; edges link stocks whose closing-price time series are correlated.

• The modularity (structure) of the network changes sharply during trading crises.

Page 8:

Questions

• Can we learn models of structure when there are both different variants of the same object present and objects of different intrinsic complexity?

• How do we capture variance and complexity at the structural level?


Page 9:

Graph data

• Problems based on graphs arise in areas such as language processing, proteomics/chemoinformatics, data mining, computer vision and complex systems.

• Relatively little methodology is available, and vectorial methods from statistical machine learning are not easily applied, since there is no canonical ordering of the nodes in a graph.

• Considerable progress can be made by developing permutation-invariant characterisations of variations in graph structure, as the sketch below illustrates.
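To make the last point concrete, here is a minimal sketch (Python with networkx and numpy; the karate-club graph is just a stand-in example) showing that the sorted Laplacian spectrum is one such permutation-invariant characterisation: it is unchanged when the nodes are relabelled.

```python
import numpy as np
import networkx as nx

# Take a graph and a randomly relabelled copy of it.
G = nx.karate_club_graph()
perm = np.random.permutation(G.number_of_nodes())
H = nx.relabel_nodes(G, {u: int(perm[u]) for u in G.nodes()})

def laplacian_spectrum(graph):
    """Sorted Laplacian eigenvalues: a permutation-invariant feature vector."""
    L = nx.laplacian_matrix(graph).toarray().astype(float)
    return np.linalg.eigvalsh(L)  # returned in ascending order

# The spectra agree even though the adjacency matrices differ.
assert np.allclose(laplacian_spectrum(G), laplacian_spectrum(H))
```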


Page 12:

Characterising graphs

• Topological: e.g. average degree, degree distribution, edge density, diameter, cycle frequencies, etc.

• Spectral or algebraic: use the eigenvalues of the adjacency matrix or Laplacian, or equivalently the coefficients of the characteristic polynomial.

• Complexity: use information-theoretic measures of structure (e.g. Shannon entropy).

Page 13:

Complex systems

• Spatial and topological indices: node degree statistics; edge density.

• Communicability: communities, measures of centrality, separation, etc. (Barabási, Watts and Strogatz, Estrada).

• Processes on graphs: Markov processes, Ising models, random walks, searchability (Kleinberg).

Page 14:

Links explored in this talk

• Structure: discriminate between graphs on the basis of their detailed structure.

• Complexity: determine whether different non-isomorphic structures are of similar or different intrinsic complexity.

• Learning: learn a generative model of structure that gives a minimum-complexity description of the training data (MDL).

Page 15:

Complexity

Information theory, graphs and kernels.

Page 16:

Graph Entropy

• Entropic measures of complexity: many possibilities – Shannon, Erdős-Rényi, von Neumann.

• Problems: difficult to compute for graphs; they require either a probability distribution over the nodes of the graph or a combinatorial (micro-canonical state) characterisation. This remains an open problem in the literature.

• Uses: complexity-level analysis of graphs, learning structure via description length, and constructing information-theoretic kernels.

Page 17:

Entropy

• Thermodynamics: a measure of disorder in a system. The change in entropy with energy measures the temperature of the system: $dE = T\,dH$.

• Statistical mechanics: entropy measures the uncertainty over the microstates of a system (Boltzmann): $H = -k \sum_i p_i \ln p_i$.

• Quantum mechanics: confusion of states, expressed via the density matrix $\rho$ for the states of an operator (von Neumann): $H = -k\,\mathrm{Tr}[\rho \ln \rho]$.

• Information theory: Shannon information $H = -\sum_i p_i \ln p_i$, in terms of the probability of transmission of a message through an information channel.

Page 18:

Von Neumann entropy

• Passerini and Severini – the normalised Laplacian $\hat{L} = D^{-1/2}(D - A)D^{-1/2}$, scaled by $1/|V|$, is a density matrix for a graph.

• Exploited to compute approximate VN entropy for undirected graphs by Han et al (PRL) 2013 and by Cheng et al (Phys Rev E 2014) for directed graphs.

• Used for graph kernel construction (Bai JMIV 2013) and learning generative models of graphs (Han SIMBAD 2011).

• Recently used as unary feature to analyse and classify complex network time series.

Page 19:

Von Neumann entropy and node degree

Page 20:

Von-Neumann Entropy

• Passerini and Severini – the normalised Laplacian is a density matrix for the graph Hamiltonian:

$$\hat{L} = \hat{D}^{-1/2} (D - A) \hat{D}^{-1/2} = \hat{\Phi} \hat{\Lambda} \hat{\Phi}^{T}$$

• The associated von Neumann entropy is

$$H_{VN} = - \sum_{i=1}^{|V|} \frac{\hat{\lambda}_i}{|V|} \ln \frac{\hat{\lambda}_i}{|V|}$$
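A direct numerical reading of the two formulas above, as a minimal Python sketch using networkx (assuming a simple undirected graph with no isolated nodes, so that the normalised Laplacian has trace $|V|$):

```python
import numpy as np
import networkx as nx

def von_neumann_entropy(G):
    """H_VN = -sum_i (lambda_i/|V|) ln(lambda_i/|V|), where lambda_i are the
    eigenvalues of the normalised Laplacian D^{-1/2}(D - A)D^{-1/2}."""
    L_hat = nx.normalized_laplacian_matrix(G).toarray()
    lam = np.linalg.eigvalsh(L_hat)
    p = lam / G.number_of_nodes()   # trace(L_hat) = |V|, so p sums to one
    p = p[p > 1e-12]                # drop zeros: 0 ln 0 = 0 by convention
    return float(-np.sum(p * np.log(p)))

print(von_neumann_entropy(nx.erdos_renyi_graph(100, 0.1, seed=0)))
```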

Page 21:

Simplified entropy

The quadratic approximation of the von Neumann entropy reduces to

$$H_{VN} = 1 - \frac{1}{|V|} - \frac{1}{|V|^{2}} \sum_{(u,v) \in E} \frac{1}{d_u d_v}$$

This can be computed in quadratic time. Most spectral methods are at least cubic, and some graph-entropy computations are combinatorial.
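The approximation needs only degree statistics, so it reads off directly from an edge scan; a minimal sketch (the random-graph generators are illustrative inputs, not the talk's data):

```python
import networkx as nx

def von_neumann_entropy_quadratic(G):
    """Quadratic approximation: H_VN = 1 - 1/|V| - (1/|V|^2) * sum over
    edges (u, v) of 1/(d_u * d_v). Assumes an undirected graph."""
    n = G.number_of_nodes()
    s = sum(1.0 / (G.degree(u) * G.degree(v)) for u, v in G.edges())
    return 1.0 - 1.0 / n - s / (n * n)

# Degree statistics separate the standard random-graph families.
for name, g in [("Erdos-Renyi", nx.erdos_renyi_graph(200, 0.05, seed=0)),
                ("small-world", nx.watts_strogatz_graph(200, 10, 0.1, seed=0)),
                ("scale-free", nx.barabasi_albert_graph(200, 5, seed=0))]:
    print(name, von_neumann_entropy_quadratic(g))
```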

Page 22:

Properties

Based on degree statistics.

Extremal values for cycle and star graphs.

Can be used to distinguish Erdős-Rényi, small-world, and scale-free networks.

Page 23:

Uses

• Complexity-based clustering (especially protein-protein interaction networks).

• Defining information theoretic (Jensen-Shannon) kernels.

• Controlling complexity of generative models of graphs.

Page 24:

Entropy component analysis

Page 25:

➢ Entropy Component Analysis

- For each graph, construct a 2D histogram indexed by the node degrees of each edge.
- Increment each bin by the entropies of the edges with the relevant degree configuration.
- Vectorise the histogram bin contents and perform PCA on the sample of vectors for different graphs. A sketch of this pipeline follows below.
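A minimal sketch of the pipeline. The per-edge entropy contribution here is taken, as an illustrative assumption, to be the edge term $1/(|V|^2 d_u d_v)$ from the quadratic approximation above, and degrees are clipped into a fixed number of bins:

```python
import numpy as np
import networkx as nx
from sklearn.decomposition import PCA

def entropy_histogram(G, bins=8, max_deg=32):
    """2D histogram indexed by the endpoint degrees of each edge,
    incremented by that edge's entropy contribution."""
    H = np.zeros((bins, bins))
    n = G.number_of_nodes()
    bin_edges = np.linspace(1, max_deg, bins + 1)
    for u, v in G.edges():
        du, dv = G.degree(u), G.degree(v)
        contrib = 1.0 / (n * n * du * dv)  # per-edge entropy term (assumption)
        i = min(int(np.digitize(du, bin_edges)) - 1, bins - 1)
        j = min(int(np.digitize(dv, bin_edges)) - 1, bins - 1)
        i, j = sorted((i, j))              # canonical order: edges are undirected
        H[i, j] += contrib
    return H.ravel()                       # vectorised bin contents

# Embed a sample of graphs from two families into two dimensions.
graphs = [nx.erdos_renyi_graph(60, 0.1, seed=s) for s in range(20)] + \
         [nx.barabasi_albert_graph(60, 3, seed=s) for s in range(20)]
Y = PCA(n_components=2).fit_transform(np.array([entropy_histogram(g) for g in graphs]))
```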

Page 26:

Financial Market Data

• Look at the time-series correlations for a set of leading stocks.

• Create undirected or directed links on the basis of time-series correlation, as sketched below.
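An illustrative sketch of the construction; the correlation threshold, window length, and synthetic returns are assumptions, not the talk's settings:

```python
import numpy as np

def correlation_graph(returns, threshold=0.6):
    """Undirected stock graph from a (days x stocks) window of returns:
    link two stocks when their time-series correlation exceeds a threshold."""
    C = np.corrcoef(returns.T)               # stocks x stocks Pearson correlations
    A = (np.abs(C) > threshold).astype(int)  # threshold to get an adjacency matrix
    np.fill_diagonal(A, 0)                   # no self-loops
    return A

rng = np.random.default_rng(0)
window = rng.normal(size=(28, 100))          # synthetic stand-in: 28 days x 100 stocks
A = correlation_graph(window)
```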

Page 27:

Black Monday

• Directed von Neumann entropy change during Black Monday, 1987.

• The entropy shows a sharp drop on Black Monday and recovers within a few trading days.

Page 28:

Fig. 4. PCA plot for directed graph embedding on financial stock market data. Black: Black Monday, cyan and green: background, blue: dot-com bubble, red: subprime crisis.

The four clusters representing different eras are clearly seen and Black Monday is also separated, implying our graph characterization is effective.

PCA applied to entropy feature vectors: distinct epochs of market evolution occupy different regions of the subspace and can be separated. Black Monday is a clear outlier. There appears to be some underlying manifold structure.

Page 29:

Graph kernels

Page 30:

Jensen-Shannon Kernel

• Defined in terms of the J-S divergence.

• Properties: extensive, positive semidefinite.

• The JSD is the difference between the entropy of the graph union and the individual graph entropies:

$$JS(G_i, G_j) = H(G_i \oplus G_j) - \frac{1}{2} \{ H(G_i) + H(G_j) \}$$

$$K_{JS}(G_i, G_j) = \ln 2 - JS(G_i, G_j)$$

A numerical sketch follows below.
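A hedged sketch, reusing the quadratic entropy approximation from earlier and assuming, purely for illustration, that the graph union $G_i \oplus G_j$ is the disjoint union of the two graphs (the talk's construction is based on a supergraph, described later):

```python
import numpy as np
import networkx as nx

def vn_entropy(G):
    """Quadratic approximation of the von Neumann entropy (see earlier)."""
    n = G.number_of_nodes()
    s = sum(1.0 / (G.degree(u) * G.degree(v)) for u, v in G.edges())
    return 1.0 - 1.0 / n - s / (n * n)

def js_kernel(Gi, Gj):
    """K_JS(G_i, G_j) = ln 2 - JS(G_i, G_j), with
    JS = H(G_i (+) G_j) - (H(G_i) + H(G_j)) / 2."""
    union = nx.disjoint_union(Gi, Gj)  # assumption: union = disjoint union
    jsd = vn_entropy(union) - 0.5 * (vn_entropy(Gi) + vn_entropy(Gj))
    return np.log(2.0) - jsd

print(js_kernel(nx.cycle_graph(10), nx.star_graph(9)))
```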

Page 31:

Page 32:

Page 33:

Structural Learning

Page 34:

Deep learning

• Deep belief networks (Hinton 2006, Bengio 2007).

• Compositional networks (Amit+Geman 1999, Fergus 2010).

• Markov models (Leonardis 200

• Stochastic image grammars (Zhu, Mumford, Yuille)

• Taxonomy/category learning (Todorovic+Ahuja, 2006-2008).

Page 35:

Description length

• Wallace+Freeman: minimum message length.

• Rissanen: minimum description length.

Use the log-posterior probability to locate the model that is optimal with respect to code length.

Page 36:

Similarities/differences

• MDL: model selection is the aim; the model parameters are simply a means to this end. Parameters are usually maximum-likelihood estimates. The prior on the parameters is flat.

• MML: recovery of the model parameters is central. The parameter prior may be more complex.

Page 37:

Coding scheme

• Usually assumed to follow an exponential distribution.

• Alternatives are universal codes and predictive codes.

• MML uses two-part codes (model + parameters). In MDL the codes may be one- or two-part.

Page 38:

Method

• Model: a supergraph (i.e. a graph prototype) formed by graph union.

• Sample-data observation model: a Bernoulli distribution over the nodes and edges.

• Model complexity: the von Neumann entropy of the supergraph.

• Fitting criterion: MDL-like – make ML estimates of the Bernoulli parameters; MML-like – a two-part code for the data-model fit plus the supergraph complexity.

Page 39:

Model overview

• Description length criterion: code length = negative log-likelihood + model code length (entropy):

$$L(G, \Gamma) = LL(G|\Gamma) + H(\Gamma)$$

• Data set: a set of graphs G.

• Model: a prototype graph plus correspondences with it.

• Updates by expectation maximisation: model-graph adjacency matrix (M-step) + correspondence indicators (E-step).

Page 40:

Data Codelength

• Depends on the correspondences S between the data-graph adjacency matrix elements D and the model-graph adjacency matrix elements M, as sketched below.
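A hedged sketch of the two-part code length $L(G, \Gamma) = LL(G|\Gamma) + H(\Gamma)$. The Bernoulli data term and the von Neumann complexity term follow the slides, but the hard correspondence matrices S and the thresholding of the supergraph parameters Theta are simplifying assumptions, not the talk's exact coding scheme:

```python
import numpy as np

def data_code_length(D_list, S_list, Theta, eps=1e-9):
    """Negative Bernoulli log-likelihood of the data graphs given the model."""
    nll = 0.0
    for D, S in zip(D_list, S_list):
        P = np.clip(S @ Theta @ S.T, eps, 1 - eps)  # model edge probs on data nodes
        nll -= np.sum(D * np.log(P) + (1 - D) * np.log(1 - P))
    return nll

def model_code_length(Theta):
    """H(Gamma): von Neumann entropy of the supergraph, obtained here by
    thresholding Theta (assumes the result has no isolated nodes)."""
    A = (Theta > 0.5).astype(float)
    d = np.maximum(A.sum(axis=1), 1.0)
    L_hat = np.eye(len(A)) - A / np.sqrt(np.outer(d, d))  # normalised Laplacian
    p = np.linalg.eigvalsh(L_hat) / len(A)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

def total_code_length(D_list, S_list, Theta):
    # L(G, Gamma) = LL(G | Gamma) + H(Gamma)
    return data_code_length(D_list, S_list, Theta) + model_code_length(Theta)

# Tiny demo: one data graph identical to a 3-node supergraph.
Theta = np.array([[0.0, 0.9, 0.9], [0.9, 0.0, 0.1], [0.9, 0.1, 0.0]])
D = (Theta > 0.5).astype(float)
S = np.eye(3)                              # identity correspondence
print(total_code_length([D], [S], Theta))
```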


Page 41:

Experiments

Delaunay graphs from images of different objects.

COIL dataset Toys dataset

Page 42:

Experiments---validation

■ COIL dataset: the model complexity increases, the graph-data log-likelihood increases, and the overall code length decreases over the iterations.

■ Toys dataset: the model complexity decreases, the graph-data log-likelihood increases, and the overall code length decreases over the iterations.

Page 43:

Experiments---classification task

We compare the classification performance of our learned supergraph with two alternative constructions: the median graph and a supergraph learned without MDL. The table below shows the average classification rates from 10-fold cross-validation, followed by their standard errors.

Page 44:

Experiments---graph embedding

Pairwise graph distances based on the Jensen-Shannon divergence and the von Neumann entropy of graphs.

Compute a supergraph for each pair of graphs.

Page 45:

Experiments---graph embedding

Edit distance JSD distance

Page 46:

Generative model

• Train on graphs with a set of predetermined characteristics.

• Sample using Monte Carlo, as sketched below.

• Reproduces the characteristics of the training set, e.g. spectral gap, node degree distribution, etc.
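A minimal sketch of the sampling step under the Bernoulli observation model from the method slides: independent-edge Monte Carlo sampling, with a random stand-in Theta for the learned parameters (the talk's sampler may be more elaborate):

```python
import numpy as np

def sample_graph(Theta, rng):
    """Draw one graph: each potential edge (i, j) is an independent
    Bernoulli trial with probability Theta[i, j]."""
    U = rng.random(Theta.shape)
    A = np.triu((U < Theta).astype(int), k=1)  # sample the upper triangle only
    return A + A.T                             # symmetrise: undirected graph

rng = np.random.default_rng(1)
Theta = rng.uniform(0.0, 0.3, size=(20, 20))   # stand-in for learned parameters
Theta = (Theta + Theta.T) / 2
samples = [sample_graph(Theta, rng) for _ in range(100)]

# Check a reproduced characteristic, e.g. the mean node degree.
print(np.mean([A.sum(axis=1).mean() for A in samples]))
```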

Page 47:

Erdős-Rényi

Page 48:

Barabási-Albert (scale free)

Page 49:

Delaunay Graphs

Page 50:

Quantum machine learning

Page 51:

Quantum machine learning

• Learn kernels by optimising the quantum Jensen-Shannon divergence.

• Work with density matrices rather than probability distributions.

• Quantum walkers can hit exponentially faster than classical ones on symmetric structures.

• Also sensitive to long-range structure and interference effects. A sketch of the quantum JSD follows below.
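A hedged sketch of the core quantities: the time-averaged density matrix of a continuous-time quantum walk (taking the adjacency matrix as the Hamiltonian) and the quantum Jensen-Shannon divergence between two such matrices. The full kernel of Rossi et al. evolves interfering walks on a merged graph; this sketch shows only the ingredients, assuming equal-sized graphs and a uniform initial state:

```python
import numpy as np

def time_averaged_density_matrix(A, psi0):
    """Infinite-time average of |psi(t)><psi(t)| for psi(t) = exp(-iAt) psi0.
    Averaging removes cross terms between distinct eigenvalues, leaving
    rho = sum_k P_k |psi0><psi0| P_k over the eigenspace projectors P_k."""
    lam, Phi = np.linalg.eigh(A)
    lam_r = np.round(lam, 8)                 # group (near-)equal eigenvalues
    rho = np.zeros_like(A, dtype=float)
    for lv in np.unique(lam_r):
        V = Phi[:, lam_r == lv]              # eigenvectors of this eigenspace
        v = V @ (V.T @ psi0)                 # P_k psi0
        rho += np.outer(v, v)
    return rho

def vn_entropy(rho):
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log(w)))

def qjsd(rho, sigma):
    """Quantum JSD: S((rho + sigma)/2) - (S(rho) + S(sigma))/2."""
    return vn_entropy((rho + sigma) / 2) - 0.5 * (vn_entropy(rho) + vn_entropy(sigma))

# Compare the walks on a 10-node cycle graph and a 10-node path graph.
n = 10
A1 = np.zeros((n, n)); A2 = np.zeros((n, n))
for i in range(n):
    A1[i, (i + 1) % n] = A1[(i + 1) % n, i] = 1
for i in range(n - 1):
    A2[i, i + 1] = A2[i + 1, i] = 1
psi0 = np.ones(n) / np.sqrt(n)               # uniform initial state
print(qjsd(time_averaged_density_matrix(A1, psi0),
           time_averaged_density_matrix(A2, psi0)))
```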


Page 52:

Initialization

• Construct a compositional graph.

• Allow the initial walks to interfere.

• Emphasise constructive and destructive interference. For isomorphic graphs, the kernel maximises the distinguishability between the states of the time-averaged density matrices.

Page 53:

Bibliography

• C. Ye, R. C. Wilson, C. H. Comin, L. da F. Costa and E. R. Hancock, "Approximate von Neumann Entropy for Directed Graphs", Physical Review E, 89, 052804, 2014.

• L. Han, R. C. Wilson and E. R. Hancock, "Generative Graph Prototypes from Information Theory", IEEE TPAMI, 2015.

• L. Rossi, A. Torsello, E. R. Hancock and R. C. Wilson, "Characterizing Graph Symmetries through Quantum Jensen-Shannon Divergence", Physical Review E, 88, 032806, 2013.

• L. Bai, L. Rossi and E. R. Hancock, "An Aligned Subtree Kernel for Weighted Graphs", International Conference on Machine Learning (ICML), 2015.

Page 54:

Conclusions

• Shown how the von Neumann entropy can be used as a characterisation of graph complexity for component analysis, kernel construction and structural learning.

• Presented an MDL framework which uses this complexity characterisation to learn a generative model of graph structure.

• Future: deeper measures of structure (symmetry) and the detailed dynamics of network evolution.