sum-product networks: a new deep architecture · 60 conclusion sum-product networks (spns) dag of...

60
1 Sum-Product Networks: A New Deep Architecture Hoifung Poon Microsoft Research Joint work with Pedro Domingos

Upload: others

Post on 11-Sep-2019

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

11

Sum-Product Networks:

A New Deep Architecture

Hoifung PoonMicrosoft Research

Joint work with Pedro Domingos

Page 2: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

2

Graphical Models: Challenges

2

Bayesian Network Markov Network

Sprinkler Rain

Grass Wet

Advantage: Compactly represent probability

Problem: Inference is intractable

Problem: Learning is difficult

Restricted Boltzmann

Machine (RBM)

Page 3: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Stack many layers

E.g.: DBN [Hinton & Salakhutdinov, 2006]

CDBN [Lee et al., 2009]

DBM [Salakhutdinov & Hinton, 2010]

Potentially much more powerful than

shallow architectures [Bengio, 2009]

But …

Inference is even harder

Learning requires extensive effort3

Deep Learning

3

Page 4: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Learning: Requires approximate inference

Inference: Still approximate

Graphical

Models

4

Page 5: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

E.g., hierarchical mixture model,

thin junction tree, etc.

Problem: Too restricted

Graphical

Models

Existing

Tractable

Models

5

Page 6: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

This Talk: Sum-Product Networks

Compactly represent partition function

using a deep network

Graphical

Models

Existing

Tractable

Models

Sum-Product

Networks

6

Page 7: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Graphical

Models

Existing

Tractable

Models

Sum-Product

Networks

Exact inference linear time

in network size

7

Page 8: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Graphical

Models

Existing

Tractable

Models

Sum-Product

Networks

Can compactly represent

many more distributions

8

Page 9: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Graphical

Models

Existing

Tractable

Models

Sum-Product

Networks

Learn optimal way to

reuse computation, etc.

9

Page 10: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

1010

Outline

Sum-product networks (SPNs)

Learning SPN

Experimental results

Conclusion

Page 11: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Bottleneck: Summing out variables

E.g.: Partition function

Sum of exponentially many products

Why Is Inference Hard?

1 1

1( , , ) , ,N j N

j

P X X X XZ

j

X j

Z X

11

Page 12: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Alternative Representation

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

P(X) = 0.4 I[X1=1] I[X2=1]

+ 0.2 I[X1=1] I[X2=0]

+ 0.1 I[X1=0] I[X2=1]

+ 0.3 I[X1=0] I[X2=0]

Network Polynomial [Darwiche, 2003]

12

Page 13: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Alternative Representation

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

P(X) = 0.4 I[X1=1] I[X2=1]

+ 0.2 I[X1=1] I[X2=0]

+ 0.1 I[X1=0] I[X2=1]

+ 0.3 I[X1=0] I[X2=0]

13

Network Polynomial [Darwiche, 2003]

Page 14: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Shorthand for Indicators

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

P(X) = 0.4 X1 X2

+ 0.2 X1 X2

+ 0.1 X1 X2

+ 0.3 X1 X2

14

Network Polynomial [Darwiche, 2003]

Page 15: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Sum Out Variables

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

P(e) = 0.4 X1 X2

+ 0.2 X1 X2

+ 0.1 X1 X2

+ 0.3 X1 X2

e: X1 = 1

Set X1 = 1, X1 = 0, X2 = 1, X2 = 1

Easy: Set both indicators to 115

Page 16: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Graphical Representation

0.4

0.2 0.10.3

X1 X2X1

X2

X1 X2 P(X)

1 1 0.4

1 0 0.2

0 1 0.1

0 0 0.3

16

Page 17: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

But … Exponentially Large

Example: Parity

Uniform distribution over states with even number of 1’s

17

X2 X2

X3 X3

X1 X1 X4

X4

X5 X5

2N-1

N2N-1

17

Page 18: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

But … Exponentially Large

Example: Parity

Uniform distribution over states of even number of 1’s

18

X2 X2

X3 X3

X1 X1 X4

X4

X5 X5

Can we make this more compact?

18

Page 19: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

19

Use a Deep Network

19

O(N)

Example: Parity

Uniform distribution over states with even number of 1’s

Page 20: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

20

Use a Deep Network

20

Example: Parity

Uniform distribution over states of even number of 1’s

Induce many hidden layers

Reuse partial computation

Page 21: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Arithmetic Circuits (ACs)

Data structure for efficient inference

Darwiche [2003]

Compilation target of Bayesian networks

Key idea: Use ACs instead to define a

new class of deep probabilistic models

Develop new deep learning algorithms

for this class of models

21

Page 22: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

22

Sum-Product Networks (SPNs)

Rooted DAG

Nodes: Sum, product, input indicator

Weights on edges from sum to children

22

0.7 0.3

X1 X2

0.80.30.10.20.70.90.4

0.6

X1

X2

Page 23: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

23

Distribution Defined by SPN

P(X) S(X)

23

0.7 0.3

X1 X2

0.80.30.10.20.70.90.4

0.6

X1

X2

23

Page 24: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

2424

0.7 0.3

X1 X2

0.80.30.10.20.70.90.4

0.6

X1

X2

1 0 1 1e: X1 = 1

P(e) Xe S(X) S(e)

Can We Sum Out Variables?

=?

Page 25: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

25

Valid SPN

SPN is valid if S(e) = Xe S(X) for all e

Valid Can compute marginals efficiently

Partition function Z can be computed by setting all indicators to 1

25

Page 26: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

26

Valid SPN: General Conditions

Theorem: SPN is valid if it is complete & consistent

26

Incomplete Inconsistent

Complete: Under sum, children

cover the same set of variables

Consistent: Under product, no variable

in one child and negation in another

S(e) Xe S(X) S(e) Xe S(X)

Page 27: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Semantics of Sums and Products

Product Feature Form feature hierarchy

Sum Mixture (with hidden var. summed out)

27

I[Yi = j]

i

j

…… ……

j

wij

i

…… ……

wijSum out Yi

Page 28: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

28

Inference

Probability: P(X) = S(X) / Z

0.7 0.3

X1 X2

0.80.30.10.20.70.90.4

0.6

X1 X2

1 0 0 1

0.6 0.9 0.7 0.8

0.42 0.72

X: X1 = 1, X2 = 0

X1 1

X1 0

X2 0

X2 1

0.51

Page 29: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

29

Inference

If weights sum to 1 at each sum node

Then Z = 1, P(X) = S(X)

0.7 0.3

X1 X2

0.80.30.10.20.70.90.4

0.6

X1 X2

1 0 0 1

0.6 0.9 0.7 0.8

0.42 0.72

X: X1 = 1, X2 = 0

X1 1

X1 0

X2 0

X2 1

0.51

Page 30: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

30

Inference

Marginal: P(e) = S(e) / Z

0.7 0.3

X1 X2

0.80.30.10.20.70.90.4

0.6

X1 X2

1 0 1 1

0.6 0.9 1 1

0.6 0.9

0.69 = 0.51 0.18e: X1 = 1

X1 1

X1 0

X2 1

X2 1

Page 31: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

31

Inference

0.7 0.3

X1 X2

0.80.30.10.20.70.90.4

0.6

X1 X2

1 0 1 1

0.6 0.9 0.7 0.8

0.42 0.72

0.3 0.72 = 0.216e: X1 = 1

X1 1

X1 0

X2 1

X2 1

MPE: Replace sums with maxes

MAX MAX MAX MAX

MAX

0.7 0.42 = 0.294

Darwiche [2003]

Page 32: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

32

Inference

0.7 0.3

X1 X2

0.80.30.10.20.70.90.4

0.6

X1 X2

1 0 1 1

0.6 0.9 0.7 0.8

0.42 0.72

0.3 0.72 = 0.216e: X1 = 1

X1 1

X1 0

X2 1

X2 1

MAX: Pick child with highest value

MAX MAX MAX MAX

MAX

0.7 0.42 = 0.294

Darwiche [2003]

Page 33: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

33

Handling Continuous Variables

Sum Integral over input

Simplest case: Indicator Gaussian

SPN compactly defines a very large

mixture of Gaussians

Page 34: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

34

SPNs Everywhere

Graphical models

34

• Existing tractable mdls. & inference mthds.

• Determinism, context-specific indep., etc.

• Can potentially learn the optimal way

Page 35: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

35

SPNs Everywhere

Graphical models

Methods for efficient inference

35

E.g., arithmetic circuits,

AND/OR graphs, case-factor diagrams

SPNs are a class of probabilistic models

SPNs have validity conditions

SPNs can be learned from data

Page 36: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

36

SPNs Everywhere

Graphical models

Models for efficient inference

General, probabilistic convolutional network

36

Sum: Average-pooling

Max: Max-pooling

Page 37: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

37

SPNs Everywhere

Graphical models

Models for efficient inference

General, probabilistic convolutional network

Grammars in vision and language

37

E.g., object detection grammar,

probabilistic context-free grammar

Sum: Non-terminal

Product: Production rule

Page 38: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

3838

Outline

Sum-product networks (SPNs)

Learning SPN

Experimental results

Conclusion

Page 39: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

39

General Approach

Start with a dense SPN

Find the structure by learning weights

Zero weights signify absence of connections

Can learn with gradient descent or EM

Page 40: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

4040

The Challenge

Gradient diffusion: Gradient quickly dilutes

Similar problem with EM

Hard EM overcomes this problem

Page 41: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

4141

Our Learning Algorithm

Online learning Hard EM

Sum node maintains counts for each child

For each example

Find MPE instantiation with current weights

Increment count for each chosen child

Renormalize to set new weights

Repeat until convergence

Page 42: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

4242

Outline

Sum-product networks (SPNs)

Learning SPN

Experimental results

Conclusion

Page 43: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

43

Task: Image Completion

Methodology:

Learn a model from training images

Complete unseen test images

Measure mean square errors

Very challenging

Good for evaluating deep models

Page 44: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

44

Datasets

Main evaluation: Caltech-101 [Fei-Fei et al., 2004]

101 categories, e.g., faces, cars, elephants

Each category: 30 – 800 images

Also, Olivetti [Samaria & Harter, 1994] (400 faces)

Each category: Last third for test

Test images: Unseen objects

Page 45: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

45

SPN Architecture

Whole Image

Region

Pixel

......

......

x

......

......

y

Page 46: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

46

Decomposition

Page 47: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

47

Decomposition

……

……

Page 48: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

4848

Systems

SPN

DBM [Salakhutdinov & Hinton, 2010]

DBN [Hinton & Salakhutdinov, 2006]

PCA [Turk & Pentland, 1991]

Nearest neighbor [Hays & Efros, 2007]

Page 49: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

49

Caltech: Mean-Square Errors

DBNNN SPN

Left Bottom

PCA DBM DBNNN SPNPCA DBM

Page 50: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

50

SPN vs. DBM / DBN

SPN is order of magnitude faster

No elaborate preprocessing, tuning

Reduced errors by 30-60%

Learned up to 46 layers

SPN DBM / DBN

Learning 2-3 hours Days

Inference < 1 second Minutes or hours

Page 51: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

51

Example Completions

SPN

DBN

Nearest Neighbor

DBM

PCA

Original

Page 52: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

52

Example Completions

SPN

DBN

Nearest Neighbor

DBM

PCA

Original

Page 53: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

53

Example Completions

SPN

DBN

Nearest Neighbor

DBM

PCA

Original

Page 54: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

54

Example Completions

SPN

DBN

Nearest Neighbor

DBM

PCA

Original

Page 55: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

55

Example Completions

SPN

DBN

Nearest Neighbor

DBM

PCA

Original

Page 56: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

56

Example Completions

SPN

DBN

Nearest Neighbor

DBM

PCA

Original

Page 57: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

5757

Open Questions

Other learning algorithms

Discriminative learning

Architecture

Continuous SPNs

Sequential domains

Other applications

Page 58: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

End-to-End Comparison

58

DataGeneral

Graphical ModelsPerformance

DataSum-Product

NetworksPerformance

Approximate Approximate

Approximate Exact

Given same computation budget,

which approach has better performance?

Page 59: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

Graphical

Models

Existing

Tractable

Models

Sum-Product

Networks

59

True Model Approximate Inference

Optimal SPN

Page 60: Sum-Product Networks: A New Deep Architecture · 60 Conclusion Sum-product networks (SPNs) DAG of sums and products Compactly represent partition function Learn many layers of hidden

6060

Conclusion

Sum-product networks (SPNs)

DAG of sums and products

Compactly represent partition function

Learn many layers of hidden variables

Exact inference: Linear time in network size

Deep learning: Online hard EM

Substantially outperform state of the art on image completion