structure learning in undirected graphical models mark schmidt › ~schmidtm › documents ›...

171
Structure Learning in Undirected Graphical Models Mark Schmidt INRIA - SIERRA team Laboratoire d’Informatique de l’Ecole Normale Suprieure January 20, 2011

Upload: others

Post on 03-Jul-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Structure Learning in Undirected Graphical Models

Mark Schmidt

INRIA - SIERRA teamLaboratoire d’Informatique de l’Ecole Normale Suprieure

January 20, 2011

Page 2: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Outline

1 Motivation, Classical Methods

2 Gausian and Ising graphical models: `1-Regularization

3 General pairwise models: Group `1-Regularization

4 High-order models: Structured Sparsity

5 Further Extensions

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 3: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivation for Graphical Model Structure Learning

car drive files hockey mac league pc win

0 0 1 0 1 0 1 00 0 0 1 0 1 0 11 1 0 0 0 0 0 00 1 1 0 1 0 0 00 0 1 0 0 0 1 1

What words are related?

Is a post with (car,drive,hockey,pc,win) spam?

What is p(car|drive)? What about p(car|drive,files)?

Can we ‘fill in’ some variables given the others?

Can we generate more items that look like this?

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 4: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivation for Graphical Model Structure Learning

car drive files hockey mac league pc win

0 0 1 0 1 0 1 00 0 0 1 0 1 0 11 1 0 0 0 0 0 00 1 1 0 1 0 0 00 0 1 0 0 0 1 1

What words are related?

Is a post with (car,drive,hockey,pc,win) spam?

What is p(car|drive)? What about p(car|drive,files)?

Can we ‘fill in’ some variables given the others?

Can we generate more items that look like this?

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 5: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivation for Graphical Model Structure Learning

car drive files hockey mac league pc win

0 0 1 0 1 0 1 00 0 0 1 0 1 0 11 1 0 0 0 0 0 00 1 1 0 1 0 0 00 0 1 0 0 0 1 1

What words are related?

Is a post with (car,drive,hockey,pc,win) spam?

What is p(car|drive)? What about p(car|drive,files)?

Can we ‘fill in’ some variables given the others?

Can we generate more items that look like this?

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 6: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivation for Graphical Model Structure Learning

car drive files hockey mac league pc win

0 0 1 0 1 0 1 00 0 0 1 0 1 0 11 1 0 0 0 0 0 00 1 1 0 1 0 0 00 0 1 0 0 0 1 1

What words are related?

Is a post with (car,drive,hockey,pc,win) spam?

What is p(car|drive)? What about p(car|drive,files)?

Can we ‘fill in’ some variables given the others?

Can we generate more items that look like this?

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 7: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivation for Graphical Model Structure Learning

car drive files hockey mac league pc win

0 0 1 0 1 0 1 00 0 0 1 0 1 0 11 1 0 0 0 0 0 00 1 1 0 1 0 0 00 0 1 0 0 0 1 1

What words are related?

Is a post with (car,drive,hockey,pc,win) spam?

What is p(car|drive)? What about p(car|drive,files)?

Can we ‘fill in’ some variables given the others?

Can we generate more items that look like this?

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 8: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivation for Graphical Model Structure Learning

car drive files hockey mac league pc win

0 0 1 0 1 0 1 00 0 0 1 0 1 0 11 1 0 0 0 0 0 00 1 1 0 1 0 0 00 0 1 0 0 0 1 1

What words are related?

Is a post with (car,drive,hockey,pc,win) spam?

What is p(car|drive)? What about p(car|drive,files)?

Can we ‘fill in’ some variables given the others?

Can we generate more items that look like this?

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 9: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Example of Learned Graph Structure

baseball

games

league

players

bible

christian

god

jesus

question

car

dealerdrive engine

card

driver

graphics

pc

problem

system

video

windows

case

course

evidence

fact

government

human

lawnumber power

rights

state

world

children

president

religionwar

computer

data

email

program

science

software

university

memory

research

space

disk

files

display

imagedos

mac scsi

earth

orbit

format

ftp

help

phone

jews

fans

hockey

team

version

nhl

season

win

gun

health

insurance

israel

launch moon

nasa

shuttle

technology

won

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 10: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Example of Learned Graph Structure

baseball

games

league

players

bible

christian

god

jesus

question

car

dealerdrive engine

card

driver

graphics

pc

problem

system

video

windows

case

course

evidence

fact

government

human

lawnumber power

rights

state

world

children

president

religionwar

computer

data

email

program

science

software

university

memory

research

space

disk

files

display

imagedos

mac scsi

earth

orbit

format

ftp

help

phone

jews

fans

hockey

team

version

nhl

season

win

gun

health

insurance

israel

launch moon

nasa

shuttle

technology

won

baseball

games

league

players

bible

christian

god

jesus

question

car

dealerdrive engine

card

driver

graphics

pc

problem

system

video

windows

case

course

evidence

fact

government

human

lawnumber power

rights

state

world

children

president

religionwar

computer

data

email

program

science

software

university

memory

research

space

disk

files

display

imagedos

mac scsi

earth

orbit

format

ftp

help

phone

jews

fans

hockey

team

version

nhl

season

win

gun

health

insurance

israel

launch moon

nasa

shuttle

technology

won

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 11: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Estimation in Graphical Models with Unknown Structure

X1

X3

X8

X7

X9

X4

X5

X6

X2

X1

X3

X8

X7

X9

X4

X5

X6

X2

Undirected graphical models are used to efficiently representprobability distributions in various applications.

Often the graph structure is known (or assumed).

We consider parameter estimation with an unknown structure.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 12: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Estimation in Graphical Models with Unknown Structure

X1

X3

X8

X7

X9

X4

X5

X6

X2

X1

X3

X8

X7

X9

X4

X5

X6

X2

Undirected graphical models are used to efficiently representprobability distributions in various applications.

Often the graph structure is known (or assumed).

We consider parameter estimation with an unknown structure.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 13: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Estimation in Graphical Models with Unknown Structure

X1

X3

X8

X7

X9

X4

X5

X6

X2

X1

X3

X8

X7

X9

X4

X5

X6

X2

Undirected graphical models are used to efficiently representprobability distributions in various applications.

Often the graph structure is known (or assumed).

We consider parameter estimation with an unknown structure.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 14: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivations for doing Structure Learning

One approach to this task is to simply fit a dense model.

Alternately, we can search for a sparse set of edges.

Reasons why we might prefer the sparse approach:

Statistical efficiencyComputational efficiencyStructural discovery

There are two classical methods for estimating sparse models:

Constraint-based approachesSearch and score approaches

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 15: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivations for doing Structure Learning

One approach to this task is to simply fit a dense model.

Alternately, we can search for a sparse set of edges.

Reasons why we might prefer the sparse approach:

Statistical efficiencyComputational efficiencyStructural discovery

There are two classical methods for estimating sparse models:

Constraint-based approachesSearch and score approaches

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 16: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivations for doing Structure Learning

One approach to this task is to simply fit a dense model.

Alternately, we can search for a sparse set of edges.

Reasons why we might prefer the sparse approach:

Statistical efficiencyComputational efficiencyStructural discovery

There are two classical methods for estimating sparse models:

Constraint-based approachesSearch and score approaches

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 17: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivations for doing Structure Learning

One approach to this task is to simply fit a dense model.

Alternately, we can search for a sparse set of edges.

Reasons why we might prefer the sparse approach:

Statistical efficiencyComputational efficiencyStructural discovery

There are two classical methods for estimating sparse models:

Constraint-based approachesSearch and score approaches

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 18: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Constraint-based Methods 1: Marginal Independence

Perform a series of (in)dependence tests to discover the edges.

One approach is using a pairwise (in)dependence statistic to:

Select the ‘top-k’ neighbors.Select those above a threshold.

Assesses marginal instead of conditional dependence:

‘true’ neighbors may not have highest marginal dependence.all variables may be marginally dependent in sparse graphs.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 19: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Constraint-based Methods 1: Marginal Independence

Perform a series of (in)dependence tests to discover the edges.

One approach is using a pairwise (in)dependence statistic to:

Select the ‘top-k’ neighbors.Select those above a threshold.

Assesses marginal instead of conditional dependence:

‘true’ neighbors may not have highest marginal dependence.all variables may be marginally dependent in sparse graphs.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 20: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Constraint-based Methods 1: Marginal Independence

Perform a series of (in)dependence tests to discover the edges.

One approach is using a pairwise (in)dependence statistic to:

Select the ‘top-k’ neighbors.Select those above a threshold.

Assesses marginal instead of conditional dependence:

‘true’ neighbors may not have highest marginal dependence.all variables may be marginally dependent in sparse graphs.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 21: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Constraint-based Methods 2: Conditional Independence

More advanced methods use conditional independence tests.[Verman & Pearl, 1990, Spirtes and Glymour, 1991]

In some cases, these methods recover the true structure.

However, there are several practical drawbacks:

Number and size of possible conditioning sets is exponential.Multiple testing gives low statistical power.Potential for propagation of errors.Tests don’t assess ability of structure to model the data.

Modern methods alleviate these, but aren’t the focus of talk.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 22: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Constraint-based Methods 2: Conditional Independence

More advanced methods use conditional independence tests.[Verman & Pearl, 1990, Spirtes and Glymour, 1991]

In some cases, these methods recover the true structure.

However, there are several practical drawbacks:

Number and size of possible conditioning sets is exponential.Multiple testing gives low statistical power.Potential for propagation of errors.Tests don’t assess ability of structure to model the data.

Modern methods alleviate these, but aren’t the focus of talk.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 23: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Constraint-based Methods 2: Conditional Independence

More advanced methods use conditional independence tests.[Verman & Pearl, 1990, Spirtes and Glymour, 1991]

In some cases, these methods recover the true structure.

However, there are several practical drawbacks:

Number and size of possible conditioning sets is exponential.Multiple testing gives low statistical power.Potential for propagation of errors.Tests don’t assess ability of structure to model the data.

Modern methods alleviate these, but aren’t the focus of talk.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 24: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Constraint-based Methods 2: Conditional Independence

More advanced methods use conditional independence tests.[Verman & Pearl, 1990, Spirtes and Glymour, 1991]

In some cases, these methods recover the true structure.

However, there are several practical drawbacks:

Number and size of possible conditioning sets is exponential.Multiple testing gives low statistical power.Potential for propagation of errors.Tests don’t assess ability of structure to model the data.

Modern methods alleviate these, but aren’t the focus of talk.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 25: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Search and Score 1: Greedy Forward/Backward

Classical search and score methods:

Start with the empty structureAdd the edge that improves the likelihood the most.Test for sufficient improvement in the likelihood.Stop when the test fails.

[Dempster, 1972, Goodman, 1971](you can also start with the full structure and work backwards)

Very expensive in high dimensions:

Fits O(p2) models at each of O(p2) steps.In Gaussian graphical models, fitting model require O(p3).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 26: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Search and Score 1: Greedy Forward/Backward

Classical search and score methods:

Start with the empty structureAdd the edge that improves the likelihood the most.Test for sufficient improvement in the likelihood.Stop when the test fails.

[Dempster, 1972, Goodman, 1971](you can also start with the full structure and work backwards)

Very expensive in high dimensions:

Fits O(p2) models at each of O(p2) steps.In Gaussian graphical models, fitting model require O(p3).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 27: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Search and Score 1: Greedy Forward/Backward

Classical search and score methods:

Start with the empty structureAdd the edge that improves the likelihood the most.Test for sufficient improvement in the likelihood.Stop when the test fails.

[Dempster, 1972, Goodman, 1971](you can also start with the full structure and work backwards)

Very expensive in high dimensions:

Fits O(p2) models at each of O(p2) steps.In Gaussian graphical models, fitting model require O(p3).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 28: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Search and Score 2: Restricted Model Classes

Modern search and score methods:

Define a score on structure and parameters.Use combinatorial-search techniques to optimize the score.Consider a restricted class of models (chordal, low treewidth).Use heuristics to approximately evaluate O(p2) candidates.

But these methods still have drawbacks:

The search space is enormous, 2p(p−1)/2 possible models.Each step may still be very expensive, still need to re-fit.Restricted classes may be inefficient or ineffective for modellingsome distributions.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 29: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Search and Score 2: Restricted Model Classes

Modern search and score methods:

Define a score on structure and parameters.Use combinatorial-search techniques to optimize the score.Consider a restricted class of models (chordal, low treewidth).Use heuristics to approximately evaluate O(p2) candidates.

But these methods still have drawbacks:

The search space is enormous, 2p(p−1)/2 possible models.Each step may still be very expensive, still need to re-fit.Restricted classes may be inefficient or ineffective for modellingsome distributions.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 30: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivation for NOT doing Structure Learning

Recall the reasons we wanted to do structure learning:

Statistical efficiencyComputational efficiencyStructural discovery

But, even greedy search methods are extremely expensive.

A high-dimensional alternative is fit single dense model but:

use regularization to improve statistical efficiencyuse approximations to improve computational efficiencyinterpret our parameter estimates for structural discovery.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 31: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivation for NOT doing Structure Learning

Recall the reasons we wanted to do structure learning:

Statistical efficiencyComputational efficiencyStructural discovery

But, even greedy search methods are extremely expensive.

A high-dimensional alternative is fit single dense model but:

use regularization to improve statistical efficiencyuse approximations to improve computational efficiencyinterpret our parameter estimates for structural discovery.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 32: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Motivation for NOT doing Structure Learning

Recall the reasons we wanted to do structure learning:

Statistical efficiencyComputational efficiencyStructural discovery

But, even greedy search methods are extremely expensive.

A high-dimensional alternative is fit single dense model but:

use regularization to improve statistical efficiencyuse approximations to improve computational efficiencyinterpret our parameter estimates for structural discovery.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 33: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Graphical Model Structure Learning with `1-Regularization

We focus on an intermediate between fitting a dense andsparse model:

Fit a single dense model (possibly with approximations).Use `1-regularization to encourage parameter sparsity.

We parameterize the model so that parameter sparsity isequivalent to graph sparsity.

Estimates a sparse model by fitting a single dense model.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 34: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Summary of Contributions

There has been growing interest in this approach:

Gives regularized estimate (like `2-regularization).Gives sparse estimate (like search methods).Formulated as a convex optimization.

But previous work usually makes two unrealistic assumptions:

Parameters and edges have a one-to-one correspondence.The model only includes pairwise dependencies.

This talk outlines methods that remove these assumptions.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 35: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Summary of Contributions

There has been growing interest in this approach:

Gives regularized estimate (like `2-regularization).Gives sparse estimate (like search methods).Formulated as a convex optimization.

But previous work usually makes two unrealistic assumptions:

Parameters and edges have a one-to-one correspondence.The model only includes pairwise dependencies.

This talk outlines methods that remove these assumptions.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 36: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

MotivationClassical MethodsRegularization Methods

Summary of Contributions

There has been growing interest in this approach:

Gives regularized estimate (like `2-regularization).Gives sparse estimate (like search methods).Formulated as a convex optimization.

But previous work usually makes two unrealistic assumptions:

Parameters and edges have a one-to-one correspondence.The model only includes pairwise dependencies.

This talk outlines methods that remove these assumptions.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 37: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Outline

1 Motivation, Classical Methods

2 Gausian and Ising graphical models: `1-RegularizationPairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

3 General pairwise models: Group `1-Regularization

4 High-order models: Structured Sparsity

5 Further Extensions

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 38: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Pairwise Undirected Graphical Models (UGMs)

Pairwise UGMs represent multivariate distributions as anormalized product of non-negative potential functions:

p(x1, x2, . . . , xp) =1

Z

p∏i=1

φi (xi )∏

(i ,j)∈E

φij(xi , xj)

Z is the constant that makes the distribution integrate to one.

Models the pairwise statistics of all pairs of variables in E .

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 39: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Continuous Structure Learning in UGMs

Pairwise UGMs represent multivariate distributions as anormalized product of non-negative potentials functions:

p(x1, x2, . . . , xp) =1

Z

p∏i=1

φi (xi )∏

(i ,j)∈E

φij(xi , xj)

Structure learning is the task of choosing the edge set E.

Removing the edge is the same as setting φij(xi , xj) = 1,∀ij .We parameterize so that zero parameters make φij(xi , xj) = 1.

This lets us perform structure learning with `1-regularization.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 40: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Continuous Structure Learning in UGMs

Pairwise UGMs represent multivariate distributions as anormalized product of non-negative potentials functions:

p(x1, x2, . . . , xp) =1

Z

p∏i=1

φi (xi )∏

(i ,j)∈E

φij(xi , xj)

Structure learning is the task of choosing the edge set E.

Removing the edge is the same as setting φij(xi , xj) = 1,∀ij .We parameterize so that zero parameters make φij(xi , xj) = 1.

This lets us perform structure learning with `1-regularization.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 41: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Continuous Structure Learning in UGMs

Pairwise UGMs represent multivariate distributions as anormalized product of non-negative potentials functions:

p(x1, x2, . . . , xp) =1

Z

p∏i=1

φi (xi )∏

(i ,j)∈E

φij(xi , xj)

Structure learning is the task of choosing the edge set E.

Removing the edge is the same as setting φij(xi , xj) = 1,∀ij .We parameterize so that zero parameters make φij(xi , xj) = 1.

This lets us perform structure learning with `1-regularization.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 42: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Optimization with `1-Regularization

Various fields are now interested in `1-regularization:

minw

f (w) +

p∑i=1

λi |wi |

There are efficient algorithms for solving this type of problem.

Under suitable assumptions, yields a sparse solution:

Many coefficients wi are exactly zero.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 43: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Optimization with `1-Regularization

Various fields are now interested in `1-regularization:

minw

f (w) +

p∑i=1

λi |wi |

There are efficient algorithms for solving this type of problem.

Under suitable assumptions, yields a sparse solution:

Many coefficients wi are exactly zero.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 44: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

`2-Regularization vs. `1-Regularization

`2-regularization is equivalent to optimization over an `2-norm ball:

Unconstrained Solution

L2-Regularized Solution

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 45: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

`2-Regularization vs. `1-Regularization

`1-regularization is equivalent to optimization over an `1-norm ball:

Unconstrained Solution

L1-Regularized Solution

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 46: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Continuous Variables: Gaussian Graphical Models (GGMs)

Structure learning with `1-regularization was first explored forGaussian graphical models (GGMs).

GGMs model a multivariate distribution over continuousvariables as a multivariate Gaussian distribution:

p(x1, x2, . . . , xp) =1

Zexp(−1

2(x− b)TW (x− b))

The normalizing constant Z is

Z = (2π)p/2|W |−1/2

Edges correspond to non-zero elements of the precision W .

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 47: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Continuous Variables: Gaussian Graphical Models (GGMs)

X1

X3X8

X7X4

X5

X6

X2

X9

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 48: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Continuous Variables: Gaussian Graphical Models (GGMs)

X1

X3X8

X7X4

X5

X6

X2

X9

-0.2 -1.8

2.3

0.8

0.3

1.3

-0.7

-0.7

0.9

1.1

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 49: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Continuous Variables: Gaussian Graphical Models (GGMs)

X1

X3X8

X7X4

X5

X6

X2

X9

-0.2 -1.8

2.3

0.8

0.3

1.3

-0.7

-0.7

0.9

1.1

0

0

0

0

0

0

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 50: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Continuous Variables: Gaussian Graphical Models (GGMs)

GGM structure learning with `1-regularization of the precision:

minW�0,b

−n∑

m=1

log p(xm|W ,b) +

p∑i=1

p∑j=1

λij |Wij |

First explored in [Dahl et al., 2005, Banerjee et al., 2006,Meinshausen & Buhlmann, 2006, Yuan and Lin, 2007].

Sometimes called the graphical LASSO.

Convex optimization is easily solved with 1000s of variables.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 51: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Binary Variables: Ising Graphical Models (IGMs)

This idea was next explored for Ising graphical models:

p(x1, x2, . . . , xp) =1

Zexp(

p∑i=1

xibi +∑

(i ,j)∈E

xixjWij)

The normalizing constant Z is

Z =∑x′

exp(

p∑i=1

x ′i bi +∑

(i ,j)∈E

x ′i x′jWij)

Setting the edge weight Wij to zero removes the edge.IGM structure learning with `1-regularization:

minW ,b−

n∑m=1

log p(xm|W ,b) +

p∑i=1

p∑j=1

λij |Wij |

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 52: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Binary Variables: Ising Graphical Models (IGMs)

This idea was next explored for Ising graphical models:

p(x1, x2, . . . , xp) =1

Zexp(

p∑i=1

xibi +∑

(i ,j)∈E

xixjWij)

The normalizing constant Z is

Z =∑x′

exp(

p∑i=1

x ′i bi +∑

(i ,j)∈E

x ′i x′jWij)

Setting the edge weight Wij to zero removes the edge.IGM structure learning with `1-regularization:

minW ,b−

n∑m=1

log p(xm|W ,b) +

p∑i=1

p∑j=1

λij |Wij |

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 53: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Pairwise Undirected Graphical ModelsOptimization with `1-RegularizationGaussian and Ising Graphical Models

Approximations for IGMs

IGM case is more difficult than GGM case because of Z :

Z can be computed in O(p3) for GGMsIn general, it is #P-hard to evaluate Z in IGMs.

Several ways to address this have been explored:

Asymmetric pseudo-likelihood [Wainwright et al., 2006].Bethe approximation [Lee et al., 2006].Symmetric pseudo-likelihood [Schmidt et al., 2008].Mean-field approximation, convex Bethe approximation.Logdet approximation [Banerjee et al., 2008].Cutting-plane refinement [Kolar and Xing, 2008].

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 54: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Outline

1 Motivation, Classical Methods

2 Gausian and Ising graphical models: `1-Regularization

3 General pairwise models: Group `1-RegularizationGroup-Sparse ModelsGroup `1-RegularizationExperiments

4 High-order models: Structured Sparsity

5 Further Extensions

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 55: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Structure Learning with Group `1-Regularization

In GGMs/IGMs, there is a one-to-one correspondence betweenparameters and edges.

In some case, we want sparsity in groups of parameters:

General log-linear models [Lee et al., 2006].Blockwise-sparse models [Duchi et al., 2008].Conditional random fields [Schmidt et al., 2008].

In these cases, we can use group `1-regularization.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 56: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

General Pairwise Log-Linear Models

In log-linear models, the log-potentials are linear functions.

IGMs are a special case with binary variables.

log φij(xi , xj ,wij) = xixjwij

But log-linear models allow non-binary discrete variables.

Also useful for (discretized) non-Gaussian continuous data.

The potentials for an edge between three-state variables:

log φij(·, ·,wij) =

wij11 wij12 wij13

wij21 wij22 wij23

wij31 wij32 wij33

We must set all 9 elements to zero to remove the edge.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 57: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

General Pairwise Log-Linear Models

In log-linear models, the log-potentials are linear functions.

IGMs are a special case with binary variables.

log φij(xi , xj ,wij) = xixjwij

But log-linear models allow non-binary discrete variables.

Also useful for (discretized) non-Gaussian continuous data.

The potentials for an edge between three-state variables:

log φij(·, ·,wij) =

wij11 wij12 wij13

wij21 wij22 wij23

wij31 wij32 wij33

We must set all 9 elements to zero to remove the edge.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 58: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

General Pairwise Log-Linear Models

In log-linear models, the log-potentials are linear functions.

IGMs are a special case with binary variables.

log φij(xi , xj ,wij) = xixjwij

But log-linear models allow non-binary discrete variables.

Also useful for (discretized) non-Gaussian continuous data.

The potentials for an edge between three-state variables:

log φij(·, ·,wij) =

wij11 wij12 wij13

wij21 wij22 wij23

wij31 wij32 wij33

We must set all 9 elements to zero to remove the edge.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 59: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

General Pairwise Log-Linear Models

X1

X3X8

X7X4

X5

X6

X2

X9

-0.2 -1.4 -0.2

0.9 -1.4 -0.2

-0.8 0.5 1.4

1.1 -1.5 2.4

1.5 -0.7 -0.6

0.1 -1.1 0.7

0.0 0.4 -1.1

1.5 -0.2 0.0

-0.8 1.1 0.6

-0.3 0.9 -0.8

0.3 -1.1 -2.9

-0.8 -1.1 1.4

2.8 0.7 -0.2

-1.4 -0.1 -0.1

3.0 0.7 1.5

0.3 -1.7 0.4

-0.8 -0.1 0.3

1.4 -0.2 -0.80.0 1.1 0.1

-0.2 1.1 -1.2

0.6 -0.9 -1.10.5 0.9 -0.4

1.8 0.3 0.3

-2.3 -1.3 3.61.4 -1.2 0.5

1.4 0.7 1.0

0.7 1.6 0.7

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 60: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

General Pairwise Log-Linear Models

X1

X3X8

X7X4

X5

X6

X2

X9

-0.2 -1.4 -0.2

0.9 -1.4 -0.2

-0.8 0.5 1.4

1.1 -1.5 2.4

1.5 -0.7 -0.6

0.1 -1.1 0.7

0.0 0.4 -1.1

1.5 -0.2 0.0

-0.8 1.1 0.6

-0.3 0.9 -0.8

0.3 -1.1 -2.9

-0.8 -1.1 1.4

2.8 0.7 -0.2

-1.4 -0.1 -0.1

3.0 0.7 1.5

0.3 -1.7 0.4

-0.8 -0.1 0.3

1.4 -0.2 -0.80.0 1.1 0.1

-0.2 1.1 -1.2

0.6 -0.9 -1.10.5 0.9 -0.4

1.8 0.3 0.3

-2.3 -1.3 3.61.4 -1.2 0.5

1.4 0.7 1.0

0.7 1.6 0.7

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 61: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Blockwise Sparsity

X

Y

Y

X

Z

Z

In blockwise-sparse models, each variable has a type.

We expect some types to be conditionally independent.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 62: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Blockwise Sparsity

X

Y

Y

X

Z

Z

In blockwise-sparse models, each variable has a type.

We expect some types to be conditionally independent.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 63: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Blockwise Sparsity

X

Y

Y

X

Z

Z

In blockwise-sparse models, each variable has a type.

We expect some types to be conditionally independent.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 64: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Blockwise Sparsity

X

Y

Y

X

Z

Z

In blockwise-sparse models, each variable has a type.

We expect some types to be conditionally independent.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 65: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Blockwise Sparsity

In GGMs/IGMs, corresponds to blockwise-sparsity in matrix.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 66: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Conditional Random Fields

X1

X3X8

X7X4

X5

X6

X2

X9

-0.2 -1.4 -0.2

0.9 -1.4 -0.2

-0.8 0.5 1.4

1.1 -1.5 2.4

1.5 -0.7 -0.6

0.1 -1.1 0.7

0.0 0.4 -1.1

1.5 -0.2 0.0

-0.8 1.1 0.6

-0.3 0.9 -0.8

0.3 -1.1 -2.9

-0.8 -1.1 1.4

2.8 0.7 -0.2

-1.4 -0.1 -0.1

3.0 0.7 1.5

0.3 -1.7 0.4

-0.8 -0.1 0.3

1.4 -0.2 -0.80.0 1.1 0.1

-0.2 1.1 -1.2

0.6 -0.9 -1.10.5 0.9 -0.4

1.8 0.3 0.3

-2.3 -1.3 3.61.4 -1.2 0.5

1.4 0.7 1.0

0.7 1.6 0.7

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 02.8 0.7 -0.2

-1.4 -0.1 -0.1

3.0 0.7 1.5

1.4 -1.2 0.5

1.4 0.7 1.0

0.7 1.6 0.7

-0.3 0.9 -0.8

0.3 -1.1 -2.9

-0.8 -1.1 1.4

0 0 0

0 0 0

0 0 0

0.3 -1.7 0.4

-0.8 -0.1 0.3

1.4 -0.2 -0.8

0.0 1.1 0.1

-0.2 1.1 -1.2

0.6 -0.9 -1.10.5 0.9 -0.4

1.8 0.3 0.3

-2.3 -1.3 3.6

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

-0.2 -1.4 -0.2

0.9 -1.4 -0.2

-0.8 0.5 1.4

1.1 -1.5 2.4

1.5 -0.7 -0.6

0.1 -1.1 0.7

0.0 0.4 -1.1

1.5 -0.2 0.0

-0.8 1.1 0.6

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

2.8 0.7 -0.2

-1.4 -0.1 -0.1

3.0 0.7 1.5

In some scenarios, we also have covariates.

We can consider doing conditional structure learning.

Here, we have a tensor of variables associated with each edge.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 67: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Group `1-Regularization

In all these cases, we want sparsity in groups of parameters.

This can be accomplished with group `1-regularization:

minw

f (w) +∑g

λg ||wg ||2

Applies `1-regularization to the lengths of the groups.

An alternative is group `1-regularization with the `∞-norm:

minw

f (w) +∑g

λg ||wg ||∞

Applies `1-regularization to the maximums of the groups.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 68: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Group `1-Regularization

In all these cases, we want sparsity in groups of parameters.

This can be accomplished with group `1-regularization:

minw

f (w) +∑g

λg ||wg ||2

Applies `1-regularization to the lengths of the groups.

An alternative is group `1-regularization with the `∞-norm:

minw

f (w) +∑g

λg ||wg ||∞

Applies `1-regularization to the maximums of the groups.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 69: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Group `1-Regularization

w1

w2

||wg1||2

Unconstrained Solution

Group-L1 Regularized

||wg1||2

||wg2||2w4

w3

||wg2||2

p=2

w1

w2

||wg1||! w4

w3

||wg2||!

Unconstrained Solution

Group-L1 Regularized

||wg1||!

||wg2||!

p=!

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 70: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Group `1-Regularization with Matrix Groups

In several of the examples, the groups form matrices.

For matrix groups, an alternative is the nuclear norm:

minW1,W2,...,WG

f (W1,W2, . . . ,WG ) +∑g

λg ||Wg ||σ

The nuclear norm, ||Wg ||σ, is the sum of singular values.

Applies `1-regularization to the singular values of the groups.

Encourages the matrices to be low-rank.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 71: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Group `1-Regularization with Matrix Groups

In several of the examples, the groups form matrices.

For matrix groups, an alternative is the nuclear norm:

minW1,W2,...,WG

f (W1,W2, . . . ,WG ) +∑g

λg ||Wg ||σ

The nuclear norm, ||Wg ||σ, is the sum of singular values.

Applies `1-regularization to the singular values of the groups.

Encourages the matrices to be low-rank.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 72: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Structure Learning with Group `1-Regularization

X1

X3X8

X7X4

X5

X6

X2

X9

-0.2 -1.4 -0.2

0.9 -1.4 -0.2

-0.8 0.5 1.4

1.1 -1.5 2.4

1.5 -0.7 -0.6

0.1 -1.1 0.7

0.0 0.4 -1.1

1.5 -0.2 0.0

-0.8 1.1 0.6

-0.3 0.9 -0.8

0.3 -1.1 -2.9

-0.8 -1.1 1.4

2.8 0.7 -0.2

-1.4 -0.1 -0.1

3.0 0.7 1.5

0.3 -1.7 0.4

-0.8 -0.1 0.3

1.4 -0.2 -0.80.0 1.1 0.1

-0.2 1.1 -1.2

0.6 -0.9 -1.10.5 0.9 -0.4

1.8 0.3 0.3

-2.3 -1.3 3.61.4 -1.2 0.5

1.4 0.7 1.0

0.7 1.6 0.7

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

Group `1-Regularization with the `2 group norm.

Encourage group sparsity.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 73: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Structure Learning with Group `1-Regularization

X1

X3X8

X7X4

X5

X6

X2

X9

-0.2 -1.2 -0.2

0.9 -1.2 -0.2

-0.8 0.5 1.2

0.2 -0.2 0.2

0.2 -0.2 -0.2

0.1 0.2 0.2

0.0 0.2 -0.2

0.2 -0.2 0.0

-0.2 0.2 0.2

-0.3 0.8 -0.8

0.3 -0.8 -0.8

-0.8 -0.8 0.8

0.3 0.3 -0.2

-0.3 -0.1 -0.1

0.3 0.3 0.3

0.2 -0.2 0.2

-0.2 -0.1 0.2

0.2 -0.2 -0.20.0 0.8 0.1

-0.2 0.8 -1.2

0.6 -0.8 -0.80.5 0.7 -0.4

0.7 0.3 0.3

-0.7 -0.7 0.71.4 -1.2 0.5

1.4 0.7 1.0

0.7 1.6 1.6

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

Group `1-Regularization with the `∞ group norm.

Encourage group sparsity and parameter tieing.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 74: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Structure Learning with Group `1-Regularization

X1

X3X8

X7X4

X5

X6

X2

X90.7

-0.1

1.5

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

0 0 0

2.8 0.7 -0.2

-0.7

1.0

0.7

1.4 -1.2 0.5 0 0 0

0 0 0

0 0 0

-0.1

0.3

-0.8

0.3 -1.7 0.4

0 0 0

0 0 0

0 0 00.3

0.3

3.6

0.5 0.9 -0.4

0 0 0

0 0 0

0 0 0

0.3 0.4

0.3 0.1

3.6 -1.1

0.5 0.9 -0.4

0.1 1.3 0.3

Group `1-Regularization with the nuclear group norm.

Encourage group sparsity and low-rank.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 75: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Experiments Comparing Parameterizations and Norms

We tested three log-linear edge parameterizations:

log φij(·, ·,wij) =

wij 0 00 wij 00 0 wij

(Ising potentials)

log φij(·, ·,wij) =

wij1 0 00 wij2 00 0 wij3

(gIsing potentials)

log φij(·, ·,wij) =

wij11 wij12 wij13

wij21 wij22 wij23

wij31 wij32 wij33

(full potentials)

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 76: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Experiments Comparing Parameterizations and Norms

We also tested six regularization strategies:

Tree: Maximum-likelihood tree structure.L2: `2-Regularization (squared).L1: `1-Regularization.L12: Group `1-Regularization (`2-norm).L1inf: Group `1-Regularization (`∞-norm).L1nuc: Group `1-Regularization (nuclear norm).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 77: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Experimental Comparison of Different Norms

Results on heart wall motion abnormality data (16 nodes, 5 states):

Ising gIsing full

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Tree L2 L1 Tree L2 L1 L12 L1inf Tree L2 L1 L12 L1inf L1nuc

test

se

t re

lativ

e n

eg

ativ

e lo

g!

pse

ud

o!

like

liho

od

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 78: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Experimental Comparison of Different Norms

Results on USPS digits data (256 nodes, 4 discretization levels):

full

0

0.005

0.01

0.015

0.02

0.025

L2 L1 L12 L1inf L1nuc

test

se

t re

lativ

e n

eg

ativ

e lo

g!

pse

ud

o!

like

liho

od

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 79: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Experimental Comparison of Different Norms

Results on USPS digits data (256 nodes, 8 discretization levels):

full

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

L2 L1 L12 L1inf L1nuc

test

se

t re

lative

ne

ga

tive

log

!p

seu

do

!lik

elih

oo

d

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 80: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Group-Sparse ModelsGroup `1-RegularizationExperiments

Experimental Comparison of Different Norms

Estimated structure on USPS data:1,2

2,12,2

1,3

2,3

1,4

1,5

2,41,6

2,51,7

2,61,8

2,71,9

2,81,10

2,91,11

2,101,12

2,111,13

2,121,14

2,13

1,15

2,14

2,15 1,16

2,16

3,16

3,13,2

3,3

3,4

3,5

3,6

3,7

3,8

3,9

3,10

3,11

3,12

3,13

3,14

3,15

4,1

4,2

4,3

4,4

4,5

4,6

4,7

4,8

4,9

4,10

4,11

4,12

4,13

4,14

4,15

4,16

5,1

5,2

5,3

5,4

5,5

5,6

5,7

5,8

5,9

5,10

5,11

5,12

5,13

5,14

5,15

5,16

6,1

6,2

6,3

6,4

6,5

6,6

6,7

6,8

6,9

6,10

6,11

6,12

6,13

6,14

6,15

6,16

7,1

7,2

7,3

7,4

7,5

7,6

7,7

7,8

7,9

7,10

7,11

7,12

7,13

7,14

7,15

7,16

8,1

8,2

8,3

8,4

8,5

8,6

8,7

8,8

8,9

8,10

8,11

8,12

8,13

8,14

8,15

8,16

9,1

9,2

9,3

9,4

9,5

9,6

9,7

9,8

9,9

9,10

9,11

9,12

9,13

9,14

9,15

9,16

10,15

10,1

10,2

10,3

10,4

10,5

10,6

10,7

10,8

10,9

10,10

10,11

10,12

10,13

10,14

10,16

11,15

11,1

11,2

11,3

11,4

11,5

11,6

11,7

11,8

11,9

11,10

11,11

11,12

11,13

11,14

11,16

12,15

12,1

12,2

12,3

12,4

12,5

12,6

12,7

12,8

12,9

12,10

12,11

12,12

12,13

12,14

12,16

13,15

13,1

13,2

13,3

13,4

13,5

13,6

13,7

13,8

13,9

13,10

13,11

13,12

13,13

13,14

13,16

14,15

14,1

14,2

14,3

14,4

14,5

14,6

14,7

14,8

14,9

14,10

14,11

14,12

14,13

14,14

14,16

15,1

15,2

15,3

15,4

15,5

15,6

15,7

15,8

15,9

15,10

15,11

15,12

15,13

15,14 15,15 15,16

16,1

16,2

16,3

16,4

16,5

16,6

16,7

16,8

16,9

16,10

16,11

16,12

16,13

16,14

16,15

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 81: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Outline

1 Motivation, Classical Methods

2 Gausian and Ising graphical models: `1-Regularization

3 General pairwise models: Group `1-Regularization

4 High-order models: Structured SparsityHierarchical Log-Linear ModelsActive Set MethodExperiments

5 Further Extensions

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 82: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structure Learning with `1-Regularization

A list of papers on this topic (incomplete):

[Li & Yang, 2004], [Li & Yang, 2005], [Banerjee et al., 2006], [Huang et

al., 2006], [Lee et al., 2006], [Meinshausen & Buhlmann, 2006],

[Wainwright et al., 2006], [Dahinden et al., 2007], [Schmidt et al., 2007],

[Shimamura et al., 2007], [Yuan & Lin, 2007], [d’ Aspremont et al.,

2008], [Banerjee et al., 2008], [Dahl et al., 2008], [Duchi et al., 2008],

[Friedman et al., 2008], [Kolar & Xing, 2008], [Levina et al., 2008],

[Schmidt et al., 2008], [Fan & Feng, 2009], [Holing & Tibshirani, 2009],

[Krishnamurphy & d’Aspremont, 2009], [Lu, 2009a], [Lu, 2009b], [Marlin

et al., 2009a], [Marlin et al., 2009b], [Schmidt et al., 2009], [Schmidt &

Murphy, 2009], [Schnitzspan et al., 2009], [Yuan, 2009], [Vidaurre et al.,

2010].

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 83: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structure Learning with `1-Regularization

Many of these papers have made the pairwise assumption:

[Li & Yang, 2004], [Li & Yang, 2005], [Banerjee et al., 2006], [Huang et

al., 2006], [Lee et al., 2006], [Meinshausen & Buhlmann, 2006],

[Wainwright et al., 2006], [Dahinden et al., 2007], [Schmidt et al., 2007],

[Shimamura et al., 2007], [Yuan & Lin, 2007], [d’ Aspremont et al.,

2008], [Banerjee et al., 2008], [Dahl et al., 2008], [Duchi et al., 2008],

[Friedman et al., 2008], [Kolar & Xing, 2008], [Levina et al., 2008],

[Schmidt et al., 2008], [Fan & Feng, 2009], [Holing & Tibshirani, 2009],

[Krishnamurphy & d’Aspremont, 2009], [Lu, 2009a], [Lu, 2009b], [Marlin

et al., 2009a], [Marlin et al., 2009b], [Schmidt et al., 2009], [Schmidt &

Murphy, 2009], [Schnitzspan et al., 2009], [Yuan, 2009], [Vidaurre et al.,

2010].

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 84: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structure Learning with `1-Regularization

Many of these papers have made the pairwise assumption:

[Li & Yang, 2004], [Li & Yang, 2005], [Banerjee et al., 2006], [Huang et

al., 2006], [Lee et al., 2006], [Meinshausen & Buhlmann, 2006],

[Wainwright et al., 2006], [Dahinden et al., 2007], [Schmidt et al., 2007],

[Shimamura et al., 2007], [Yuan & Lin, 2007], [d’ Aspremont et al.,

2008], [Banerjee et al., 2008], [Dahl et al., 2008], [Duchi et al., 2008],

[Friedman et al., 2008], [Kolar & Xing, 2008], [Levina et al., 2008],

[Schmidt et al., 2008], [Fan & Feng, 2009], [Holing & Tibshirani, 2009],

[Krishnamurphy & d’Aspremont, 2009], [Lu, 2009a], [Lu, 2009b], [Marlin

et al., 2009a], [Marlin et al., 2009b], [Schmidt et al., 2009], [Schmidt &

Murphy, 2009], [Schnitzspan et al., 2009], [Yuan, 2009], [Vidaurre et al.,

2010].

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 85: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structure Learning with `1-Regularization

Many of these papers have made the pairwise assumption:

[Li & Yang, 2004], [Li & Yang, 2005], [Banerjee et al., 2006], [Huang et

al., 2006], [Lee et al., 2006], [Meinshausen & Buhlmann, 2006],

[Wainwright et al., 2006], [Dahinden et al., 2007], [Schmidt et al., 2007],

[Shimamura et al., 2007], [Yuan & Lin, 2007], [d’ Aspremont et al.,

2008], [Banerjee et al., 2008], [Dahl et al., 2008], [Duchi et al., 2008],

[Friedman et al., 2008], [Kolar & Xing, 2008], [Levina et al., 2008],

[Schmidt et al., 2008], [Fan & Feng, 2009], [Holing & Tibshirani, 2009],

[Krishnamurphy & d’Aspremont, 2009], [Lu, 2009a], [Lu, 2009b], [Marlin

et al., 2009a], [Marlin et al., 2009b], [Schmidt et al., 2009], [Schmidt &

Murphy, 2009], [Schnitzspan et al., 2009], [Yuan, 2009], [Vidaurre et al.,

2010].

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 86: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structure Learning with `1-Regularization

Many of these papers have made the pairwise assumption:

[Li & Yang, 2004], [Li & Yang, 2005], [Banerjee et al., 2006], [Huang et

al., 2006], [Lee et al., 2006], [Meinshausen & Buhlmann, 2006],

[Wainwright et al., 2006], [Dahinden et al., 2007], [Schmidt et al., 2007],

[Shimamura et al., 2007], [Yuan & Lin, 2007], [d’ Aspremont et al.,

2008], [Banerjee et al., 2008], [Dahl et al., 2008], [Duchi et al., 2008],

[Friedman et al., 2008], [Kolar & Xing, 2008], [Levina et al., 2008],

[Schmidt et al., 2008], [Fan & Feng, 2009], [Holing & Tibshirani, 2009],

[Krishnamurphy & d’Aspremont, 2009], [Lu, 2009a], [Lu, 2009b], [Marlin

et al., 2009a], [Marlin et al., 2009b], [Schmidt et al., 2009], [Schmidt &

Murphy, 2009], [Schnitzspan et al., 2009], [Yuan, 2009], [Vidaurre et al.,

2010].

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 87: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Beyond Pairwise Potentials

The pairwise assumption is inherent to Gaussian models.

The pairwise assumption has not traditionally been associatedwith log-linear models [Goodman, 1971], [Bishop et al., 1975].

The assumption is restrictive if higher-order statistics matter.

Eg. Mutations in both gene A and gene B lead to cancer.

We want to go beyond pairwise potentials.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 88: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Beyond Pairwise Potentials

The pairwise assumption is inherent to Gaussian models.

The pairwise assumption has not traditionally been associatedwith log-linear models [Goodman, 1971], [Bishop et al., 1975].

The assumption is restrictive if higher-order statistics matter.

Eg. Mutations in both gene A and gene B lead to cancer.

We want to go beyond pairwise potentials.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 89: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Beyond Pairwise Potentials

The pairwise assumption is inherent to Gaussian models.

The pairwise assumption has not traditionally been associatedwith log-linear models [Goodman, 1971], [Bishop et al., 1975].

The assumption is restrictive if higher-order statistics matter.

Eg. Mutations in both gene A and gene B lead to cancer.

We want to go beyond pairwise potentials.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 90: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Beyond Pairwise Potentials

The pairwise assumption is inherent to Gaussian models.

The pairwise assumption has not traditionally been associatedwith log-linear models [Goodman, 1971], [Bishop et al., 1975].

The assumption is restrictive if higher-order statistics matter.

Eg. Mutations in both gene A and gene B lead to cancer.

We want to go beyond pairwise potentials.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 91: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

General Log-Linear Models

In log-linear models [Bishop et al., 1975] we write the probabilityof a vector x ∈ {1, 2, . . . , k}p as a normalized product

p(x) ,1

Z

∏A⊆S

φA(xA),

over each subset A of S , {1, 2, . . . , p},(except the null set)

We consider gIsing and full parameterizations of these potentials.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 92: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

General Log-Linear Models

In log-linear models [Bishop et al., 1975] we write the probabilityof a vector x ∈ {1, 2, . . . , k}p as a normalized product

p(x) ,1

Z

∏A⊆S

φA(xA),

over each subset A of S , {1, 2, . . . , p},(except the null set)

We consider gIsing and full parameterizations of these potentials.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 93: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

General Log-Linear Models

The full parameterization for a threeway potential on binary nodes,

log φijk (xijk ) = I(xi = 1, xj = 1, xk = 1)wijk111 + I(xi = 1, xj = 1, xk = 2)wijk112

+ I(xi = 1, xj = 2, xk = 1)wijk121 + I(xi = 1, xj = 2, xk = 2)wijk122

+ I(xi = 2, xj = 1, xk = 1)wijk211 + I(xi = 2, xj = 1, xk = 2)wijk212

+ I(xi = 2, xj = 2, xk = 1)wijk221 + I(xi = 2, xj = 2, xk = 2)wijk222.

φA(xA) has k |A| parameters wA.

Setting wA = 0 is equivalent to removing the potential.

In pairwise models we assume wA = 0 if |A| > 2.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 94: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

General Log-Linear Models

The full parameterization for a threeway potential on binary nodes,

log φijk (xijk ) = I(xi = 1, xj = 1, xk = 1)wijk111 + I(xi = 1, xj = 1, xk = 2)wijk112

+ I(xi = 1, xj = 2, xk = 1)wijk121 + I(xi = 1, xj = 2, xk = 2)wijk122

+ I(xi = 2, xj = 1, xk = 1)wijk211 + I(xi = 2, xj = 1, xk = 2)wijk212

+ I(xi = 2, xj = 2, xk = 1)wijk221 + I(xi = 2, xj = 2, xk = 2)wijk222.

φA(xA) has k |A| parameters wA.

Setting wA = 0 is equivalent to removing the potential.

In pairwise models we assume wA = 0 if |A| > 2.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 95: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

General Log-Linear Models

The full parameterization for a threeway potential on binary nodes,

log φijk (xijk ) = I(xi = 1, xj = 1, xk = 1)wijk111 + I(xi = 1, xj = 1, xk = 2)wijk112

+ I(xi = 1, xj = 2, xk = 1)wijk121 + I(xi = 1, xj = 2, xk = 2)wijk122

+ I(xi = 2, xj = 1, xk = 1)wijk211 + I(xi = 2, xj = 1, xk = 2)wijk212

+ I(xi = 2, xj = 2, xk = 1)wijk221 + I(xi = 2, xj = 2, xk = 2)wijk222.

φA(xA) has k |A| parameters wA.

Setting wA = 0 is equivalent to removing the potential.

In pairwise models we assume wA = 0 if |A| > 2.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 96: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Group `1-Regularization for General Log-Linear Models

We can extend the work on pairwise models to the general case bysolving [Dahinden et al., 2007]:

minw−

n∑i=1

log p(xi |w) +∑A⊆S

λA||wA||2,

However,

Sparsity in the groups A does not correspond to conditionalindependence.

Without a cardinality restriction, we have an exponentialnumber of variables.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 97: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Group `1-Regularization for General Log-Linear Models

We can extend the work on pairwise models to the general case bysolving [Dahinden et al., 2007]:

minw−

n∑i=1

log p(xi |w) +∑A⊆S

λA||wA||2,

However,

Sparsity in the groups A does not correspond to conditionalindependence.

Without a cardinality restriction, we have an exponentialnumber of variables.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 98: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Hierarchical Log-Linear Models

Instead of using a cardinality restriction, we use:

Hierarchical Inclusion Restriction:If wA = 0 and A ⊂ B, then wB = 0.

We can only have (1, 2, 3) if we also have (1, 2), (1, 3), and (2, 3).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 99: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Hierarchical Log-Linear Models

Instead of using a cardinality restriction, we use:

Hierarchical Inclusion Restriction:If wA = 0 and A ⊂ B, then wB = 0.

We can only have (1, 2, 3) if we also have (1, 2), (1, 3), and (2, 3).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 100: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Hierarchical Log-Linear Models

This is the well-known class of hierarchical log-linear models[Bishop et al., 1975].

Much larger than the set of pairwise models.

Can represent any positive distribution.

Group-sparsity corresponds to conditional independence.

But, we can’t enforce the hierarchical constraint with(disjoint) group `1-regularization.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 101: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Hierarchical Log-Linear Models

This is the well-known class of hierarchical log-linear models[Bishop et al., 1975].

Much larger than the set of pairwise models.

Can represent any positive distribution.

Group-sparsity corresponds to conditional independence.

But, we can’t enforce the hierarchical constraint with(disjoint) group `1-regularization.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 102: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Hierarchical Log-Linear Models

This is the well-known class of hierarchical log-linear models[Bishop et al., 1975].

Much larger than the set of pairwise models.

Can represent any positive distribution.

Group-sparsity corresponds to conditional independence.

But, we can’t enforce the hierarchical constraint with(disjoint) group `1-regularization.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 103: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Hierarchical Log-Linear Models

This is the well-known class of hierarchical log-linear models[Bishop et al., 1975].

Much larger than the set of pairwise models.

Can represent any positive distribution.

Group-sparsity corresponds to conditional independence.

But, we can’t enforce the hierarchical constraint with(disjoint) group `1-regularization.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 104: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structured Sparsity for Hierarchical Constraints

Bach [2008], Zhao et al. [2009] enforce hierarchical inclusionrestrictions with overlapping group `1-regularization.(also known as structured sparsity)

Example:

We can enforce that B is zero whenever A is zero by usingtwo groups: {B} and {A,B}.The resulting regularizer is λB ||wB ||2 + λA,B ||wA,B ||2

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 105: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structured Sparsity for Hierarchical Constraints

Bach [2008], Zhao et al. [2009] enforce hierarchical inclusionrestrictions with overlapping group `1-regularization.(also known as structured sparsity)

Example:

We can enforce that B is zero whenever A is zero by usingtwo groups: {B} and {A,B}.The resulting regularizer is λB ||wB ||2 + λA,B ||wA,B ||2

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 106: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structured Sparsity for Hierarchical Log-Linear Models

We can learn hierarchical log-linear models by solving

minw−

n∑i=1

log p(xi |w) +∑A⊆S

λA(∑

{B|A⊆B}

||wB ||22)1/2.

Under reasonable assumptions, a minimizer of this convexoptimization problem will satisfy hierarchical inclusion.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 107: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structured Sparsity for Hierarchical Log-Linear Models

We can learn hierarchical log-linear models by solving

minw−

n∑i=1

log p(xi |w) +∑A⊆S

λA(∑

{B|A⊆B}

||wB ||22)1/2.

Under reasonable assumptions, a minimizer of this convexoptimization problem will satisfy hierarchical inclusion.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 108: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active Set Method

We want to avoid considering the exponential number ofpossible higher-order potentials.

We know the solution will be hierarchical, so we propose toonly consider groups that satisfy hierarchical inclusion.

The resulting method guarantees a weak form of globaloptimality.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 109: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active Set Method

We want to avoid considering the exponential number ofpossible higher-order potentials.

We know the solution will be hierarchical, so we propose toonly consider groups that satisfy hierarchical inclusion.

The resulting method guarantees a weak form of globaloptimality.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 110: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active Set Method

We want to avoid considering the exponential number ofpossible higher-order potentials.

We know the solution will be hierarchical, so we propose toonly consider groups that satisfy hierarchical inclusion.

The resulting method guarantees a weak form of globaloptimality.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 111: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active, Inactive, Boundary Groups

We call A an active group if A or some superset of A isnon-zero.

If A is not active, and some subset of A is zero, we call A aninactive group.

The remaining groups are called boundary group.

Boundary groups can be made non-zero without violatinghierarchical inclusion.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 112: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active, Inactive, Boundary Groups

We call A an active group if A or some superset of A isnon-zero.

If A is not active, and some subset of A is zero, we call A aninactive group.

The remaining groups are called boundary group.

Boundary groups can be made non-zero without violatinghierarchical inclusion.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 113: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active, Inactive, Boundary Groups

We call A an active group if A or some superset of A isnon-zero.

If A is not active, and some subset of A is zero, we call A aninactive group.

The remaining groups are called boundary group.

Boundary groups can be made non-zero without violatinghierarchical inclusion.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 114: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active, Inactive, Boundary Groups

We call A an active group if A or some superset of A isnon-zero.

If A is not active, and some subset of A is zero, we call A aninactive group.

The remaining groups are called boundary group.

Boundary groups can be made non-zero without violatinghierarchical inclusion.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 115: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active Set Method

Similar to Bach [2008], we use an active set method:

Find the active groups, and sub-optimal boundary groups.

Solve the problem with respect to these variables.

This adds groups that satisfy hierarchical inclusion, and where themodel poorly estimates the higher-moment in the data.

(analogous to the greedy method of [Gevarter, 1987] for fittingmaximum entropy distributions subject to marginal constraints[Cheeseman, 1983]).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 116: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active Set Method

Similar to Bach [2008], we use an active set method:

Find the active groups, and sub-optimal boundary groups.

Solve the problem with respect to these variables.

This adds groups that satisfy hierarchical inclusion, and where themodel poorly estimates the higher-moment in the data.

(analogous to the greedy method of [Gevarter, 1987] for fittingmaximum entropy distributions subject to marginal constraints[Cheeseman, 1983]).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 117: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Active Set Method

Similar to Bach [2008], we use an active set method:

Find the active groups, and sub-optimal boundary groups.

Solve the problem with respect to these variables.

This adds groups that satisfy hierarchical inclusion, and where themodel poorly estimates the higher-moment in the data.

(analogous to the greedy method of [Gevarter, 1987] for fittingmaximum entropy distributions subject to marginal constraints[Cheeseman, 1983]).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 118: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Initial boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 119: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Optimize initial boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 120: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Find new active groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 121: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Find new boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 122: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Optimize active groups and sub-optimal boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 123: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Find new active groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 124: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Find new boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 125: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Optimize active groups and sub-optimal boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 126: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Find new active groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 127: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Find new boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 128: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Optimize active groups and sub-optimal boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 129: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Find new active groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 130: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Find new boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 131: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Optimize active groups and sub-optimal boundary groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 132: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

Find new active groups.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 133: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

No new boundary groups, so we are done.

1,2,3 1,2,4 1,2,5 1,3,4 1,3,5 1,4,5 2,3,4 2,3,5 2,4,5 3,4,5

1,2,3,4 1,2,3,5 1,2,4,5 1,3,4,5 2,3,4,5

1,2,3,4,5

1,2 1,3 1,4 1,5 2,3 2,4 2,5 3,4 3,5 4,5

1 2 3 4 5

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 134: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

We only considered 4 of 10 possible threeway interactions, 1of 5 fourway interactions, and no fiveway interactions.

The active set method can save us from looking at anexponential number of higher-order factors.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 135: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Example of Active Set Method

We only considered 4 of 10 possible threeway interactions, 1of 5 fourway interactions, and no fiveway interactions.

The active set method can save us from looking at anexponential number of higher-order factors.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 136: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Multivariate Flow Cytometry Experiments

Does it empirically help to have higher-order potentials?

We first consider a small data set where we can tractably computethe normalizing constant:

Multivariate flow cytometry [Sachs et al., 2005].

We compared:

Pairwise with `2-regularization and group `1-regularization.

Threeway with `2-regularization and group `1-regularization.

Hierarchical with overlapping group `1-regularization.

We trained on 1/3, used 1/3 to select λ, and used 1/3 as a testset (for 10 random splits).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 137: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Multivariate Flow Cytometry Experiments

Does it empirically help to have higher-order potentials?

We first consider a small data set where we can tractably computethe normalizing constant:

Multivariate flow cytometry [Sachs et al., 2005].

We compared:

Pairwise with `2-regularization and group `1-regularization.

Threeway with `2-regularization and group `1-regularization.

Hierarchical with overlapping group `1-regularization.

We trained on 1/3, used 1/3 to select λ, and used 1/3 as a testset (for 10 random splits).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 138: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Multivariate Flow Cytometry Experiments

Does it empirically help to have higher-order potentials?

We first consider a small data set where we can tractably computethe normalizing constant:

Multivariate flow cytometry [Sachs et al., 2005].

We compared:

Pairwise with `2-regularization and group `1-regularization.

Threeway with `2-regularization and group `1-regularization.

Hierarchical with overlapping group `1-regularization.

We trained on 1/3, used 1/3 to select λ, and used 1/3 as a testset (for 10 random splits).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 139: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Multivariate Flow Cytometry Experiments

Does it empirically help to have higher-order potentials?

We first consider a small data set where we can tractably computethe normalizing constant:

Multivariate flow cytometry [Sachs et al., 2005].

We compared:

Pairwise with `2-regularization and group `1-regularization.

Threeway with `2-regularization and group `1-regularization.

Hierarchical with overlapping group `1-regularization.

We trained on 1/3, used 1/3 to select λ, and used 1/3 as a testset (for 10 random splits).

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 140: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Flow Cytometry Data

Pairwise Threeway HLLM

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

L2 L1 L2 L1 L1

test

se

t re

lativ

e n

eg

ativ

e lo

g!

like

liho

od

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 141: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Traffic and USPS Experiments

We next consider two larger data sets:

USPS digits data discretized into four states.

Traffic flow level [Shahaf et al., 2009].

On these experiments we used gIsing potentials, and used apseudo-likelihood for training/test.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 142: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

USPS Data

Pairwise Threeway HLLM

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

L2 L1 L2 L1 L1

test

se

t re

lative

ne

ga

tive

log

!p

seu

do

!lik

elih

oo

d

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 143: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Traffic Flow Data

Pairwise Threeway HLLM

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

L2 L1 L2 L1 L1

test

se

t re

lativ

e n

eg

ativ

e lo

g!

pse

ud

o!

like

liho

od

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 144: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structure Estimation

We sought to test whether the HLLM model could recover atrue structure.

We generated samples from a 10-node data set with potentials(2, 3)(4, 5, 6)(7, 8, 9, 10) and parameters from N (0, 1).

We recorded the number of false positives of different ordersfor the first model along the regularization path that includesthe true model.

Eg., with 20000 samples the order was(8,10)(7,9)(9,10)(7,10)(4,5)(8,9)(2,3)(4,6)(8,9,10)(7,8)(7,8,9)(7,8,10)(5,6)(1,8)(5,9)(3,8)(3,7)(4,5,6)(1,7)(7,9,10)(7,8,9,10)

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 145: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structure Estimation

We sought to test whether the HLLM model could recover atrue structure.

We generated samples from a 10-node data set with potentials(2, 3)(4, 5, 6)(7, 8, 9, 10) and parameters from N (0, 1).

We recorded the number of false positives of different ordersfor the first model along the regularization path that includesthe true model.

Eg., with 20000 samples the order was(8,10)(7,9)(9,10)(7,10)(4,5)(8,9)(2,3)(4,6)(8,9,10)(7,8)(7,8,9)(7,8,10)(5,6)(1,8)(5,9)(3,8)(3,7)(4,5,6)(1,7)(7,9,10)(7,8,9,10)

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 146: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structure Estimation

We sought to test whether the HLLM model could recover atrue structure.

We generated samples from a 10-node data set with potentials(2, 3)(4, 5, 6)(7, 8, 9, 10) and parameters from N (0, 1).

We recorded the number of false positives of different ordersfor the first model along the regularization path that includesthe true model.

Eg., with 20000 samples the order was(8,10)(7,9)(9,10)(7,10)(4,5)(8,9)(2,3)(4,6)(8,9,10)(7,8)(7,8,9)(7,8,10)(5,6)(1,8)(5,9)(3,8)(3,7)(4,5,6)(1,7)(7,9,10)(7,8,9,10)

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 147: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Structure Estimation

We sought to test whether the HLLM model could recover atrue structure.

We generated samples from a 10-node data set with potentials(2, 3)(4, 5, 6)(7, 8, 9, 10) and parameters from N (0, 1).

We recorded the number of false positives of different ordersfor the first model along the regularization path that includesthe true model.

Eg., with 20000 samples the order was(8,10)(7,9)(9,10)(7,10)(4,5)(8,9)(2,3)(4,6)(8,9,10)(7,8)(7,8,9)(7,8,10)(5,6)(1,8)(5,9)(3,8)(3,7)(4,5,6)(1,7)(7,9,10)(7,8,9,10)

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 148: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

Hierarchical Log-Linear ModelsActive Set MethodExperiments

Synethetic Data: Types of Errors

Types of errors made by HLLM:

0 50 100 150 2000

5

10

15

20

25

Training Examples (thousands)

Fa

lse

Po

sitiv

es

Pairwise

Threeway

Fourway

Fiveway

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 149: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Outline

1 Motivation, Classical Methods

2 Gausian and Ising graphical models: `1-Regularization

3 General pairwise models: Group `1-Regularization

4 High-order models: Structured Sparsity

5 Further ExtensionsExtensionsSummary

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 150: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Group Sparse Priors for Covariance Estimation

Earlier we discussed blockwise-sparse models.

What if the blocks aren’t completely sparse?

What if we don’t know the variable types?

We give bounds on integrals of priors over positive-definitematrices, and a variational method that learns the types.[Marlin, Schmidt, Murphy, 2009]

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 151: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Group Sparse Priors for Covariance Estimation

Earlier we discussed blockwise-sparse models.

What if the blocks aren’t completely sparse?

What if we don’t know the variable types?

We give bounds on integrals of priors over positive-definitematrices, and a variational method that learns the types.[Marlin, Schmidt, Murphy, 2009]

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 152: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Group Sparse Priors for Covariance Estimation

Earlier we discussed blockwise-sparse models.

What if the blocks aren’t completely sparse?

What if we don’t know the variable types?

We give bounds on integrals of priors over positive-definitematrices, and a variational method that learns the types.[Marlin, Schmidt, Murphy, 2009]

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 153: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Group Sparse Priors for Covariance Estimation

Learned variable types on mutual fund data:[Scott & Carvalho, 2008]

The methods discover the ‘stocks’ and ‘bonds’ groups.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 154: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

The difference between conditioning by observation andconditioning by intervention in the ‘hungry at work’ problem:

If I see that my watch says 11:55, then it’s almost lunch timeIf I set my watch so it says 11:55, it doesn’t help

Without knowing the difference, predictions may be useless.

Methods that model interventions are typically called causal.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 155: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

The difference between conditioning by observation andconditioning by intervention in the ‘hungry at work’ problem:

If I see that my watch says 11:55, then it’s almost lunch timeIf I set my watch so it says 11:55, it doesn’t help

Without knowing the difference, predictions may be useless.

Methods that model interventions are typically called causal.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 156: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

The difference between conditioning by observation andconditioning by intervention in the ‘hungry at work’ problem:

If I see that my watch says 11:55, then it’s almost lunch timeIf I set my watch so it says 11:55, it doesn’t help

Without knowing the difference, predictions may be useless.

Methods that model interventions are typically called causal.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 157: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

The difference between conditioning by observation andconditioning by intervention in the ‘hungry at work’ problem:

If I see that my watch says 11:55, then it’s almost lunch timeIf I set my watch so it says 11:55, it doesn’t help

Without knowing the difference, predictions may be useless.

Methods that model interventions are typically called causal.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 158: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

The difference between conditioning by observation andconditioning by intervention in the ‘hungry at work’ problem:

If I see that my watch says 11:55, then it’s almost lunch timeIf I set my watch so it says 11:55, it doesn’t help

Without knowing the difference, predictions may be useless.

Methods that model interventions are typically called causal.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 159: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

Interventional Cell Signaling Data [Sachs et al., 2005]

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 160: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

Causal learning methods are usually evaluated in terms of a‘true’ underlying DAG.

For real data, the structure may not be known, or even a DAG.

Why not evaluate causal models in terms of modeling theeffects of interventions?

Given this task, there are a variety of approaches to causality.[Eaton & Murphy, 2007][Schmidt & Murphy, 2009][Duvenaud, Eaton, Murphy, Schmidt, 2010]

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 161: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

Causal learning methods are usually evaluated in terms of a‘true’ underlying DAG.

For real data, the structure may not be known, or even a DAG.

Why not evaluate causal models in terms of modeling theeffects of interventions?

Given this task, there are a variety of approaches to causality.[Eaton & Murphy, 2007][Schmidt & Murphy, 2009][Duvenaud, Eaton, Murphy, Schmidt, 2010]

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 162: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

Causal learning methods are usually evaluated in terms of a‘true’ underlying DAG.

For real data, the structure may not be known, or even a DAG.

Why not evaluate causal models in terms of modeling theeffects of interventions?

Given this task, there are a variety of approaches to causality.[Eaton & Murphy, 2007][Schmidt & Murphy, 2009][Duvenaud, Eaton, Murphy, Schmidt, 2010]

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 163: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

Causal learning methods are usually evaluated in terms of a‘true’ underlying DAG.

For real data, the structure may not be known, or even a DAG.

Why not evaluate causal models in terms of modeling theeffects of interventions?

Given this task, there are a variety of approaches to causality.[Eaton & Murphy, 2007][Schmidt & Murphy, 2009][Duvenaud, Eaton, Murphy, Schmidt, 2010]

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 164: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Causality: Modeling Interventions

Interventional Cell Signaling Data [Sachs et al., 2005]:

5

5.2

5.4

5.6

5.8

6

6.2

6.4

6.6

6.8

MM

UG

M

DA

G

MM

UG

M

MM

UG

M

DA

G

DA

G

NLL on Sachs

Ignore Independent Conditional Perfect

MM

MM

MM

UG

M

UG

M

UG

M

DA

G

DA

G

DA

G

6

5

6.8

6.6

6.4

6.2

5.8

5.6

5.4

5.2Av

era

ge

Ne

ga

tiv

e L

og

-Lik

eli

ho

od

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 165: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Other Selected Extensions

Some topics not discussed:

The methods can be extended to handle missing data orhidden variables.

We can consider mixtures of sparse graphical models.

Stochastic approximation methods allow MCMC for inference.

Can be used as sub-routines in variational Bayes methods.

Can be used as sub-routines in consistent estimation methods.

Methods might be useful for other types of structure learning.

Non-convex alternatives to `1-regularization.

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 166: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Summary

`1-Regularization is an appealing approach for graphical modelstructure learning.

Prior work focuses on Gaussian and Ising graphical models.

We considered models with group sparsity:

General discrete pairwise models.Blockwise-sparse models.Conditional models.

We discussed methods for going beyond pairwise potentials.

Code is on-line (or will be soon).

Thank you for inviting me!

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 167: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Summary

`1-Regularization is an appealing approach for graphical modelstructure learning.

Prior work focuses on Gaussian and Ising graphical models.

We considered models with group sparsity:

General discrete pairwise models.Blockwise-sparse models.Conditional models.

We discussed methods for going beyond pairwise potentials.

Code is on-line (or will be soon).

Thank you for inviting me!

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 168: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Summary

`1-Regularization is an appealing approach for graphical modelstructure learning.

Prior work focuses on Gaussian and Ising graphical models.

We considered models with group sparsity:

General discrete pairwise models.Blockwise-sparse models.Conditional models.

We discussed methods for going beyond pairwise potentials.

Code is on-line (or will be soon).

Thank you for inviting me!

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 169: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Summary

`1-Regularization is an appealing approach for graphical modelstructure learning.

Prior work focuses on Gaussian and Ising graphical models.

We considered models with group sparsity:

General discrete pairwise models.Blockwise-sparse models.Conditional models.

We discussed methods for going beyond pairwise potentials.

Code is on-line (or will be soon).

Thank you for inviting me!

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 170: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Summary

`1-Regularization is an appealing approach for graphical modelstructure learning.

Prior work focuses on Gaussian and Ising graphical models.

We considered models with group sparsity:

General discrete pairwise models.Blockwise-sparse models.Conditional models.

We discussed methods for going beyond pairwise potentials.

Code is on-line (or will be soon).

Thank you for inviting me!

Mark Schmidt Structure Learning in Undirected Graphical Models

Page 171: Structure Learning in Undirected Graphical Models Mark Schmidt › ~schmidtm › Documents › 2011_INRA_Undire… · w o n Mark Schmidt Structure Learning in Undirected Graphical

Motivation, Classical MethodsGausian and Ising graphical models: `1-Regularization

General pairwise models: Group `1-RegularizationHigh-order models: Structured Sparsity

Further Extensions

ExtensionsSummary

Summary

`1-Regularization is an appealing approach for graphical modelstructure learning.

Prior work focuses on Gaussian and Ising graphical models.

We considered models with group sparsity:

General discrete pairwise models.Blockwise-sparse models.Conditional models.

We discussed methods for going beyond pairwise potentials.

Code is on-line (or will be soon).

Thank you for inviting me!

Mark Schmidt Structure Learning in Undirected Graphical Models