
Dependency Parsing by Belief Propagation.

Presenter: Roy Adams

David A. Smith and Jason Eisner

Undirected Graphical Models

[Figure: factor graph over variables Y1 Y2 Y3 Y4 and X1 X2 X3 X4.]

Legend:

- Variable not observed at test time (Y1 through Y4).
- Variable always observed (X1 through X4).
- Factor: encodes correlations between variables.


Marginal Inference

[Figure: the same factor graph; the goal is the marginal distribution of each Yi given the observed X1 X2 X3 X4.]
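Concretely, marginal inference means summing out all the other variables. The slide's equation did not survive; the following is the standard definition for a factor graph, reconstructed:

```latex
% Marginal of Y_i given the observed x: sum over all assignments
% to the remaining variables that agree on Y_i = v.
p(Y_i = v \mid \mathbf{x})
  = \sum_{\mathbf{y}\,:\,y_i = v} p(\mathbf{y} \mid \mathbf{x})
  \;\propto\; \sum_{\mathbf{y}\,:\,y_i = v} \prod_{F} F(\mathbf{y}, \mathbf{x})
```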

Sum-Product Message Passing

Initialize all messages to 1.
Until convergence:

- For each variable Yi, for each factor F in Neighborhood(Yi), for each v in Dom(Yi), update the variable-to-factor message: $\mu_{Y_i \to F}(v) = \prod_{F' \in N(Y_i) \setminus \{F\}} \mu_{F' \to Y_i}(v)$.
- For each factor F, for each variable Yi in Neighborhood(F), for each v in Dom(Yi), update the factor-to-variable message: $\mu_{F \to Y_i}(v) = \sum_{\mathbf{y}\,:\,y_i = v} F(\mathbf{y}) \prod_{Y_j \in N(F) \setminus \{Y_i\}} \mu_{Y_j \to F}(y_j)$.

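The loop above is easy to state concretely in code. Here is a minimal sketch of sum-product on a discrete factor graph, assuming numpy; the function name sum_product and its argument format are my own for illustration, not from the paper.

```python
# A minimal sketch of the message-passing loop above, assuming numpy.
import numpy as np

def sum_product(domains, factors, n_iters=50, tol=1e-8):
    """domains: {var: domain size}.
    factors: list of (vars, table), where table is an ndarray with
    one axis per variable in vars, giving the factor's values."""
    # Messages in both directions, keyed by (factor index, variable),
    # initialized to all ones as in the pseudocode.
    edges = [(i, v) for i, (vs, _) in enumerate(factors) for v in vs]
    msg_vf = {e: np.ones(domains[e[1]]) for e in edges}
    msg_fv = {e: np.ones(domains[e[1]]) for e in edges}
    for _ in range(n_iters):
        delta = 0.0
        # Variable -> factor: product of messages from the OTHER factors.
        for i, (vs, _) in enumerate(factors):
            for v in vs:
                m = np.ones(domains[v])
                for j, (vs2, _) in enumerate(factors):
                    if j != i and v in vs2:
                        m = m * msg_fv[(j, v)]
                msg_vf[(i, v)] = m / m.sum()  # normalize for stability
        # Factor -> variable: multiply in the other variables' messages
        # along their axes, then sum out every axis but the target's.
        for i, (vs, table) in enumerate(factors):
            for k, v in enumerate(vs):
                t = table.astype(float)
                for k2, v2 in enumerate(vs):
                    if k2 != k:
                        shape = [1] * t.ndim
                        shape[k2] = domains[v2]
                        t = t * msg_vf[(i, v2)].reshape(shape)
                m = t.sum(axis=tuple(a for a in range(t.ndim) if a != k))
                m = m / m.sum()
                delta = max(delta, float(np.abs(m - msg_fv[(i, v)]).max()))
                msg_fv[(i, v)] = m
        if delta < tol:
            break
    # Belief at each variable: product of all incoming factor messages.
    beliefs = {}
    for var, d in domains.items():
        b = np.ones(d)
        for i, (vs, _) in enumerate(factors):
            if var in vs:
                b = b * msg_fv[(i, var)]
        beliefs[var] = b / b.sum()
    return beliefs

# Tiny example: two binary variables with a unary factor on Y1 and
# a pairwise factor that prefers Y1 == Y2.
doms = {"Y1": 2, "Y2": 2}
facs = [(("Y1",), np.array([1.0, 3.0])),
        (("Y1", "Y2"), np.array([[4.0, 1.0], [1.0, 4.0]]))]
print(sum_product(doms, facs))
```

On a tree-structured graph like the tiny example, this returns exact marginals; on a loopy graph it is the loopy approximation discussed on the next slide.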

Sum-Product Message Passing

[Figure: the variable-to-factor message from Yi.]

"Based on what my other neighbors said, this is my distribution over my values."

Sum-Product Message Passing

[Figure: the factor-to-variable message to Yi.]

"Based on what my other neighbors said, this is my distribution over your values."

Sum-Product Message Passing

- If it converges, we get a belief for each variable: an estimate of its marginal distribution (see the equation after this list).
- For tree-structured graphs, it is guaranteed to converge, and the marginals will be exact.
- For loopy graphs, it may not converge, and even if it does converge, the marginals may not be correct.
- With some tricks, it tends to work very well in practice.
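The belief at a variable is the normalized product of its incoming factor messages; this is the standard definition, reconstructed rather than taken from the slide:

```latex
% Belief at Y_i: normalized product of incoming factor messages.
b_i(v) \;\propto\; \prod_{F \in N(Y_i)} \mu_{F \to Y_i}(v)
% On a tree, b_i(v) = p(Y_i = v \mid \mathbf{x}) exactly.
```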

Questions so far?

High-Order Factors

[Figure: a single factor connected to many variables Y1, Y2, Y3, ..., Yn.]

Why do we want them?

They can be used to encode structure, e.g. "Y1 through Yn must all take different values."

Why are they hard?

Basic message passing has complexity exponential in the neighborhood size of the largest factor.
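To make that cost concrete (a standard back-of-the-envelope count, not from the slide): the factor-to-variable update sums over every joint assignment to the factor's other neighbors, so for a factor of arity k over variables with domain size d,

```latex
% One outgoing message from a factor F of arity k, domain size d:
\mu_{F \to Y_i}(v)
  = \sum_{\mathbf{y}\,:\,y_i = v} F(\mathbf{y}) \prod_{j \neq i} \mu_{Y_j \to F}(y_j)
% The sum ranges over d^{k-1} joint assignments for each value v.
```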

Fixed High-Order Factors

[Figure: a single fixed (constant) factor connected to Y1, Y2, Y3, ..., Yn.]

Fixed factors help in two ways:

1) A fixed factor doesn't depend on the parameters, so we don't need its marginal to calculate gradients in MLE training.

2) Its structure is so constrained that the sum in message passing becomes tractable.
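As an illustration of point 2, here is my own example, not one from the slides: for a constraint factor requiring exactly one of n binary variables to be on, each outgoing message has a closed form, so we never enumerate the 2^(n-1) assignments a brute-force sum would visit.

```python
# Hypothetical illustration (my example, not the paper's): messages
# out of an Exactly-1 factor over binary variables Y1..Yn, where
# F(y) = 1 if exactly one yi is 1, else 0. Closed form is O(n).
import numpy as np

def exactly_one_messages(incoming):
    """incoming: (n, 2) array; row j is Yj's message to the factor,
    [mu_j(0), mu_j(1)]. Returns the (n, 2) outgoing messages.
    Assumes every mu_j(0) > 0; a robust version handles zeros."""
    mu0, mu1 = incoming[:, 0], incoming[:, 1]
    P = np.prod(mu0)   # prod_j mu_j(0)
    r = mu1 / mu0      # odds that each variable is on
    S = r.sum()
    out = np.empty_like(incoming, dtype=float)
    # Yi = 1: every OTHER variable must be 0.
    out[:, 1] = P / mu0                 # prod_{j != i} mu_j(0)
    # Yi = 0: exactly one OTHER variable is 1.
    out[:, 0] = (P / mu0) * (S - r)
    return out
```

Smith and Eisner's TREE factor rests on the same principle: the sum over all dependency trees is computed at once, with the inside-outside algorithm (projective case) or the matrix-tree theorem (non-projective case), rather than by explicit enumeration.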

Other Structures (Smith and Eisner)

- Parse trees
- Unique labels
- Label ordering
- Segmentation

Questions?