
Graphical models: approximate inference and learning

CA6b, lecture 5

Bayesian Networks

General Factorization

D-separation: Example

Trees

Undirected tree / Directed tree / Polytree

Converting Directed to Undirected Graphs (2)

Additional links

Inference on a Chain

Inference in an HMM

E step: belief propagation

[Figure: chain $s_1, \dots, s_{n-1}, s_n, s_{n+1}, \dots, s_N$]

Belief propagation in an HMM

E step: belief propagation

[Figure: chain $s_1, \dots, s_{n-1}, s_n, s_{n+1}, \dots, s_N$]

Expectation maximization in an HMM

E step: belief propagation

[Figure: chain $s_1, \dots, s_{n-1}, s_n, s_{n+1}, \dots, s_N$]

The Junction Tree Algorithm

• Exact inference on general graphs.
• Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.

Factor Graphs

Factor Graphs from Undirected Graphs

The Sum-Product Algorithm (6)

The Sum-Product Algorithm (5)

The Sum-Product Algorithm (3)

The Sum-Product Algorithm (7)

Initialization

[Figure: message passing on a tree. Sensory observations enter at the leaves (bottom-up pass); prior expectations enter at the root (top-down pass); forest / tree / stem structure with nodes $x_1, \dots, x_6$.]
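The schedule in this figure (collect messages from the leaves up to the root, then distribute them back down) can be written out directly. Below is a minimal sketch of sum-product message passing on a small binary tree; the tree structure, potentials and node names are toy assumptions of mine, not the lecture's example.

```python
# Minimal sketch: sum-product belief propagation on a small binary tree
# (bottom-up pass from the leaves to the root, then top-down back to the
# leaves). Tree, potentials and node names are toy assumptions.
import numpy as np

children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}   # node 0 is the root
parent = {1: 0, 2: 0, 3: 1, 4: 1}

psi = np.array([[0.8, 0.2],      # pairwise potential psi(x_parent, x_child),
                [0.2, 0.8]])     # favours parent and child agreeing
phi = {n: np.ones(2) for n in children}                   # unary evidence
phi[3] = np.array([0.9, 0.1])    # leaf 3: evidence for state 0
phi[4] = np.array([0.2, 0.8])    # leaf 4: evidence for state 1

msg = {}                         # msg[(i, j)] = message from node i to node j

def upward(n):
    """Bottom-up: send messages from the leaves towards the root."""
    for c in children[n]:
        upward(c)
        incoming = np.prod([msg[(g, c)] for g in children[c]], axis=0) \
                   if children[c] else np.ones(2)
        # m_{c->n}(x_n) = sum_{x_c} psi(x_n, x_c) phi_c(x_c) prod_g m_{g->c}(x_c)
        msg[(c, n)] = psi @ (phi[c] * incoming)

def downward(n):
    """Top-down: send messages from the root back towards the leaves."""
    for c in children[n]:
        others = [msg[(k, n)] for k in children[n] if k != c]
        if n in parent:
            others.append(msg[(parent[n], n)])
        incoming = np.prod(others, axis=0) if others else np.ones(2)
        msg[(n, c)] = psi.T @ (phi[n] * incoming)
        downward(c)

upward(0)
downward(0)

def marginal(n):
    nbrs = children[n] + ([parent[n]] if n in parent else [])
    b = phi[n] * np.prod([msg[(k, n)] for k in nbrs], axis=0)
    return b / b.sum()

print({n: marginal(n).round(3) for n in children})
```

On a tree this two-pass schedule gives exact marginals; the loopy version discussed later in the lecture reuses the same message updates but iterates them.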

Consequence of failing inhibition in hierarchical inference

Causal model / Pairwise factor graph

Bayesian network and factor graph

Pairwise graphs

Log belief ratio

Log message ratio

Belief propagation and inhibitory loops
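To make the link between message passing and inhibition concrete, here is a toy, linearised caricature (my own numbers and coupling, not the lecture's model) of log-domain message exchange between two nodes: each node should subtract the message it just received before replying, and when that inhibitory subtraction fails the same evidence circulates around the loop and the log belief ratio becomes inflated.

```python
# Toy illustration (my own numbers): in log-domain message passing each node
# subtracts the message it just received from a neighbour before sending its
# own message back. If that subtraction ("inhibition") fails, the same
# evidence is counted repeatedly and the log belief ratio is inflated.
import numpy as np

def run(alpha_inhibition, n_iters=20, w=0.5, evidence=(1.0, 0.5)):
    """alpha_inhibition = 1: subtraction intact; 0: subtraction removed."""
    L = np.array(evidence, dtype=float)   # log likelihood ratios at the 2 nodes
    m = np.zeros(2)                       # m[i] = last message received by node i
    for _ in range(n_iters):
        belief = L + m                    # log belief ratio at each node
        # message node i sends back: its belief minus what that neighbour told
        # it (the inhibitory subtraction), attenuated by the coupling w
        sent = w * (belief - alpha_inhibition * m)
        m = sent[::-1]                    # each node receives the other's message
    return L + m                          # final log belief ratios

print("intact inhibition :", run(1.0).round(2))
print("failing inhibition:", run(0.0).round(2))
```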


Tight excitatory/inhibitory balance is both required and sufficient.

Okun and Lampl, Nat Neurosci 2008

Inhibition

Excitation

Lewis et al., Nat Rev Neurosci 2005

Controls vs. schizophrenia

Support for impaired inhibition in schizophrenia

See also: Benes, Neuropsychopharmacology 2010; Uhlhaas and Singer, Nat Rev Neurosci 2010…

GAD67

Circular inference:

Impaired inhibitory loops

Circular inference and overconfidence:

Renaud Jardri, Alexandra Litvinova & Sandrine Duverne

The Fisher Task

Prior

Sensory evidence

Posterior confidence

Mean group responses

[Figure: confidence as a function of the log likelihood ratio and of the log prior ratio, for controls and for patients with schizophrenia (confidence from -8 to 8, ratios from -4 to 4).]

Simple Bayes:

[Figure series: confidence as a function of the log likelihood ratio and of the log prior ratio (ratios from -2 to 2, confidence from -3 to 3), repeated across successive slides.]

Mean parameter values

[Figure: fitted parameter values (mean + sd, scale 0.00 to 0.75) for controls (CTL) vs. patients (SCZ), with significance markers (*, ***); PANSS positive factor.]

Inference loops and psychosis

[Figure: non-clinical beliefs (PDI-21 scores) plotted against the strength of inference loops.]

The Junction Tree Algorithm

• Exact inference on general graphs.
• Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.
• Intractable on graphs with large cliques.

What if exact inference is intractable?

• Loopy belief propagation works in some scenarios.
• Markov chain Monte Carlo (MCMC) sampling methods (a sketch follows below).
• Variational methods (not covered here).
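A minimal sketch of the sampling option, assuming a toy pairwise binary Markov random field of my own (the graph, coupling $J$ and biases $h$ are not from the lecture): Gibbs sampling resamples one variable at a time from its conditional distribution and estimates marginals from the visited states.

```python
# Minimal sketch: MCMC (Gibbs sampling) for approximate inference in a small
# pairwise binary MRF. Graph, coupling and biases are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)

# 4 binary variables x_i in {-1, +1} on a loopy square graph
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
J = 0.8                                    # pairwise coupling (agreement preferred)
h = np.array([0.5, 0.0, -0.3, 0.0])        # local evidence (biases)

def neighbours(i):
    return [b for a, b in edges if a == i] + [a for a, b in edges if b == i]

x = rng.choice([-1, 1], size=4)
samples = []
for t in range(5000):
    for i in range(4):
        # conditional p(x_i = +1 | rest) is a logistic function of the local field
        field = h[i] + J * sum(x[j] for j in neighbours(i))
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))
        x[i] = 1 if rng.random() < p_plus else -1
    if t >= 1000:                          # discard burn-in samples
        samples.append(x.copy())

marginals = (np.array(samples) == 1).mean(axis=0)
print("estimated p(x_i = +1):", marginals.round(3))
```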

Loopy Belief Propagation

• Sum-product on general graphs.
• Initial unit messages are passed across all links, after which messages are passed around until convergence (not guaranteed!); a sketch follows below.
• Approximate but tractable for large graphs.
• Sometimes works well, sometimes not at all.
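And a minimal sketch of loopy belief propagation itself, on a small graph with a single cycle (the graph and potentials are my own toy choices, not the lecture's): messages start at unit values on every directed edge and are updated repeatedly until they stop changing.

```python
# Minimal sketch: loopy belief propagation on a pairwise binary graph with a
# cycle. Messages are initialised to unit (uniform) values on every directed
# edge and iterated until convergence (not guaranteed in general).
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]               # a single loop
psi = np.array([[1.5, 0.5], [0.5, 1.5]])               # pairwise potential
phi = [np.array([0.7, 0.3]), np.ones(2), np.ones(2), np.array([0.4, 0.6])]

directed = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
msg = {e: np.ones(2) for e in directed}                # unit initialisation

for _ in range(100):
    new = {}
    for (i, j) in directed:
        # product of messages arriving at i from everyone except j
        incoming = np.ones(2)
        for (k, l) in directed:
            if l == i and k != j:
                incoming = incoming * msg[(k, i)]
        m = psi.T @ (phi[i] * incoming)                # sum over x_i
        new[(i, j)] = m / m.sum()                      # normalise for stability
    if max(np.abs(new[e] - msg[e]).max() for e in directed) < 1e-6:
        msg = new
        break
    msg = new

def belief(i):
    b = phi[i].copy()
    for (k, l) in directed:
        if l == i:
            b = b * msg[(k, i)]
    return b / b.sum()

print([belief(i).round(3) for i in range(4)])
```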


Neural code for uncertainty: sampling

Alternative neural code for uncertainty: sampling

Berkes et al., Science 2011

Learning in graphical models

More generally: learning parameters in latent variable models.

Visible variables $x$, hidden variables $h$, model $p(x, h \mid \theta)$.

$\hat{\theta} = \arg\max_{\theta} \; p(x \mid \theta)$

$p(x \mid \theta) = \sum_{h} p(x, h \mid \theta)$

The sum over all hidden configurations is huge!

Mixture of Gaussians (clustering algorithm)

Data (unsupervised).

Generative model: M possible clusters, each described by a Gaussian distribution.

Parameters: the mixing proportions, means and covariances of the M clusters.

Expectation stage:
Given the current parameters and the data, what are the expected hidden states? This yields the "responsibility" of each cluster for each data point.

Maximization stage:
Given the responsibilities of each cluster, update the parameters to maximize the likelihood of the data.
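As a concrete sketch of these two stages (a minimal implementation of my own, with spherical covariances and toy defaults rather than the lecture's code), EM for a mixture of Gaussians alternates computing responsibilities and re-estimating means, variances and mixing proportions.

```python
# Minimal sketch of EM for a mixture of Gaussians. Spherical covariances and
# the variable names are simplifying assumptions, not the lecture's code.
import numpy as np

def em_gmm(x, M, n_iters=50):
    """x: (N, D) data; M: number of clusters."""
    N, D = x.shape
    rng = np.random.default_rng(0)
    mu = x[rng.choice(N, M, replace=False)]        # cluster means
    var = np.full(M, x.var())                      # spherical variances
    pi = np.full(M, 1.0 / M)                       # mixing proportions

    for _ in range(n_iters):
        # E step: responsibilities r[n, k] = p(cluster k | x_n, current params)
        sq_dist = ((x[:, None, :] - mu[None]) ** 2).sum(-1)            # (N, M)
        log_r = np.log(pi) - 0.5 * D * np.log(2 * np.pi * var) - sq_dist / (2 * var)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M step: re-estimate parameters from the responsibilities
        Nk = r.sum(axis=0)                         # effective cluster counts
        mu = (r.T @ x) / Nk[:, None]
        var = np.array([(r[:, k] * ((x - mu[k]) ** 2).sum(-1)).sum() / (D * Nk[k])
                        for k in range(M)])
        pi = Nk / N
    return pi, mu, var
```

With full covariance matrices and a log-likelihood convergence check this becomes the standard EM for Gaussian mixtures; the spherical simplification just keeps the sketch short.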

Learning in hidden Markov models

[Figure: HMM with hidden states $x_{t-1}, x_t, x_{t+1}$ and observations $s_t$; the hidden cause generates the observations through the forward model (sensory likelihood), and inference corresponds to the inverse model.]

[Figure: example over time, with hidden state $x_{t-\delta t}, x_t, x_{t+\delta t}$ = object present or not, and observations $s_{1,t}, s_{2,t}$ = receptor spike or not. The log-odds $L_t$ is driven by a leak term plus the synaptic input $\sum_i w_i s_{i,t}$.]

Bayesian integration corresponds to leaky integration.
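As a toy illustration of that claim (the parameters and code are my own, not the lecture's), the sketch below filters a two-state HMM exactly: the prediction step keeps pulling the posterior back towards the prior (the leak), while each observation adds evidence (the synaptic input), so the log-odds $L_t$ behaves like a leaky integrator of the incoming spikes.

```python
# Toy sketch: exact filtering of a two-state HMM (object present or not, with
# spike-like observations), tracked through the log-odds L_t. The prediction
# step acts as a leak, the observations act as synaptic input.
import numpy as np

rng = np.random.default_rng(1)
dt, T = 0.01, 2000
p_on, p_off = 0.02, 0.02            # transition probabilities per time step
r_on, r_off = 50.0, 10.0            # spike rates (Hz) when present / absent

# simulate the hidden state and the spikes
x = np.zeros(T, dtype=int)
for t in range(1, T):
    flip = rng.random() < (p_on if x[t - 1] == 0 else p_off)
    x[t] = 1 - x[t - 1] if flip else x[t - 1]
spikes = rng.random(T) < np.where(x == 1, r_on, r_off) * dt

# exact recursive filtering, via the probability p_t = p(x_t = 1 | s_1..t)
p, L = 0.5, np.zeros(T)
for t in range(T):
    p = p * (1 - p_off) + (1 - p) * p_on              # prediction: leak towards prior
    lik1 = r_on * dt if spikes[t] else 1 - r_on * dt  # evidence: synaptic input
    lik0 = r_off * dt if spikes[t] else 1 - r_off * dt
    p = p * lik1 / (p * lik1 + (1 - p) * lik0)        # Bayes update
    L[t] = np.log(p / (1 - p))                        # log-odds of "object present"

print("mean log-odds when present:", L[x == 1].mean().round(2),
      " when absent:", L[x == 0].mean().round(2))
```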

Expectation maximization in an HMM

[Figure: chain $s_1, \dots, s_{n-1}, s_n, s_{n+1}, \dots, s_N$]

Multiple training sequences: $s_1^u, s_2^u, \dots, s_N^u$ for $u = 1, 2, \dots$

What are the parameters?
Transition probabilities: $r_{ij} = p(x_{n+1} = i \mid x_n = j)$
Observation probabilities: $q_{jk} = p(s_n = k \mid x_n = j)$

Expectation stage

E step: belief propagation
[Figure: chain $s_1, \dots, s_{n-1}, s_n, s_{n+1}, \dots, s_N$]
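Putting the two stages together gives the standard Baum-Welch procedure. The sketch below is my own minimal version (toy sequences and array names are assumptions, not the lecture's code): the E step runs a forward-backward pass, playing the role of belief propagation over each training sequence, and the M step re-estimates the transition and observation probabilities from the expected counts.

```python
# Minimal sketch of EM (Baum-Welch) for a discrete HMM with hidden states x_n
# and observations s_n. Toy data and array names are assumptions.
import numpy as np

def baum_welch(seqs, n_states, n_obs, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    r = rng.dirichlet(np.ones(n_states), size=n_states)   # r[j, i] = p(x_{n+1}=i | x_n=j)
    q = rng.dirichlet(np.ones(n_obs), size=n_states)      # q[j, k] = p(s_n=k | x_n=j)
    pi = np.full(n_states, 1.0 / n_states)                 # initial state distribution

    for _ in range(n_iters):
        A_counts = np.zeros((n_states, n_states))
        B_counts = np.zeros((n_states, n_obs))
        pi_counts = np.zeros(n_states)
        for s in seqs:
            N = len(s)
            alpha = np.zeros((N, n_states))                # forward messages
            beta = np.zeros((N, n_states))                 # backward messages
            alpha[0] = pi * q[:, s[0]]
            alpha[0] /= alpha[0].sum()
            for n in range(1, N):
                alpha[n] = (alpha[n - 1] @ r) * q[:, s[n]]
                alpha[n] /= alpha[n].sum()
            beta[-1] = 1.0
            for n in range(N - 2, -1, -1):
                beta[n] = r @ (q[:, s[n + 1]] * beta[n + 1])
                beta[n] /= beta[n].sum()
            gamma = alpha * beta                           # posterior over x_n
            gamma /= gamma.sum(axis=1, keepdims=True)
            # expected transition counts xi[j, i] ~ p(x_n=j, x_{n+1}=i | s)
            for n in range(N - 1):
                xi = alpha[n][:, None] * r * (q[:, s[n + 1]] * beta[n + 1])[None, :]
                A_counts += xi / xi.sum()
            for n in range(N):
                B_counts[:, s[n]] += gamma[n]
            pi_counts += gamma[0]
        # M step: normalise the expected counts
        r = A_counts / A_counts.sum(axis=1, keepdims=True)
        q = B_counts / B_counts.sum(axis=1, keepdims=True)
        pi = pi_counts / pi_counts.sum()
    return pi, r, q

# toy usage: three short observation sequences over an alphabet of size 3
seqs = [np.array([0, 0, 1, 2, 2]), np.array([2, 2, 1, 0]), np.array([0, 1, 2])]
pi, r, q = baum_welch(seqs, n_states=2, n_obs=3)
print(np.round(r, 2), np.round(q, 2))
```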

Using “on-line” expectation maximization, a neuron can adapt to the statistics of its input.

$q_{i1}, q_{i0}$

$r_{\mathrm{on}}, r_{\mathrm{off}}$

Fast adaptation in single neurons

Adaptation to temporal statistics? Fairhall et al., 2001
