Conditional Random Fields - A probabilistic graphical model
Stefan Mutter, Machine Learning Group


Page 1: Conditional Random Fields - A probabilistic graphical model

Stefan Mutter, Machine Learning Group

Page 2: Motivation

[Figure: overview diagram relating the models discussed in this talk - Bayesian networks, naive Bayes, Markov random fields, hidden Markov models, logistic regression, linear-chain CRFs and general CRFs]

Page 3: Outline

• different views on building a conditional random field (CRF)
  – from directed to undirected graphical models
  – from generative to discriminative models
  – sequence models

• from HMMs to CRFs

• CRFs and maximum entropy Markov models (MEMMs)

• parameter estimation / inference

• applications

Page 4: Overview: directed graphical models

[Figure: the model-overview diagram from the Motivation slide]

Page 5: Bayesian Networks: directed graphical models

• in general:
  – a graphical model is a family of probability distributions that factorise according to an underlying graph
  – there is a one-to-one correspondence between nodes and random variables
  – a set V of random variables, consisting of a set X of input variables and a set Y of output variables to predict

• independence assumption via a topological ordering:
  – a node v is conditionally independent of its predecessors in the ordering given its direct parents π(v)

• direct probabilistic interpretation:
  – the family of distributions factorises as shown below

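The factorisation formula on this slide was only present as an image; a hedged reconstruction of the standard directed factorisation, following the cited Sutton and McCallum tutorial, is:

```latex
p(x_1, \dots, x_{|V|}) \;=\; \prod_{v \in V} p\bigl(x_v \mid x_{\pi(v)}\bigr)
```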

Page 6: Overview: undirected graphical models

[Figure: the model-overview diagram from the Motivation slide]

Page 7: Markov Random Field: undirected graphical models

• an undirected graph for the joint probability p(x) allows no direct probabilistic interpretation
  – instead, define potential functions Ψ_A on the maximal cliques A of the graph
    • each potential maps a joint assignment to a non-negative real number
    • this requires normalisation:

p(x) = (1/Z) ∏_A Ψ_A(x_A),        Z = Σ_x ∏_A Ψ_A(x_A)

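To make the normalisation concrete, here is a minimal Python sketch (not from the slides) that brute-forces Z for a toy MRF with two binary variables and a single pairwise potential; the variable names and potential values are illustrative assumptions.

```python
import itertools

# Toy MRF: two binary variables (a, b) with one pairwise potential on the clique {a, b}.
# psi maps a joint assignment of the clique to a non-negative number.
psi = {
    (0, 0): 2.0, (0, 1): 0.5,
    (1, 0): 0.5, (1, 1): 2.0,   # favours assignments where a == b
}

# Normalisation constant Z: sum of the potential over all joint assignments.
Z = sum(psi[(a, b)] for a, b in itertools.product([0, 1], repeat=2))

def p(a, b):
    """p(x) = (1/Z) * prod_A Psi_A(x_A); here there is only one clique."""
    return psi[(a, b)] / Z

print(Z)        # 5.0
print(p(0, 0))  # 0.4
print(p(0, 1))  # 0.1
```

Brute-force summation is only feasible for tiny models; real MRFs and CRFs use dynamic programming or approximate inference to compute Z.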

Page 8: Markov Random Fields and CRFs

• a CRF is a Markov random field globally conditioned on the observations X
• what do the potential functions look like?

p(y | x) = (1/Z(x)) ∏_A Ψ_A(x_A, y_A),        Z(x) = Σ_y ∏_A Ψ_A(x_A, y_A)


Page 9: Overview: generative vs. discriminative models

[Figure: the model-overview diagram from the Motivation slide]

Page 10: Generative models

• based on the joint probability distribution p(y,x)
• includes a model of p(x), which is not needed for classification
• interdependent features:
  – either enhance the model structure to represent them
    • leads to complexity problems
  – or make simplifying independence assumptions
    • e.g. naive Bayes: once the class label is known, all features are independent

Page 11: Discriminative models

• based directly on the conditional probability p(y|x)
• need no model for p(x)
• simply:
  – make independence assumptions among y, but not among x

• in general:

p_g(y, x; θ) = p_g(x; θ) · p_g(y | x; θ)
p_g(y | x; θ) = p_g(y, x; θ) / p_g(x; θ)        (computed by inference)

p(x) = p_c(x; θ') = Σ_y p_g(y, x; θ')
p_c(y | x; θ) = p_g(y, x; θ) / p_g(x; θ)
p_c(y, x) = p_c(x; θ') · p_c(y | x; θ)        (conditional approach: more freedom to fit the data)

Page 12: Naive Bayes and logistic regression (1)

• naive Bayes and logistic regression form a generative-discriminative pair

• naive Bayes:

    p(y, x) = p(y) ∏_{k=1}^{K} p(x_k | y)

• it can be shown that a Gaussian naive Bayes (GNB) classifier implies the parametric form of p(y|x) of its discriminative counterpart, logistic regression


• LR is an MRF globally conditioned on X
• use log-linear models as the potential functions in CRFs
• LR is a very simple CRF
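The logistic regression formula referred to on this slide appeared only as an image; a hedged reconstruction of the usual log-linear form (as in the cited Sutton and McCallum tutorial) is:

```latex
p(y \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \exp\Bigl\{ \theta_y + \sum_{k=1}^{K} \theta_{y,k}\, x_k \Bigr\},
\qquad
Z(\mathbf{x}) \;=\; \sum_{y'} \exp\Bigl\{ \theta_{y'} + \sum_{k=1}^{K} \theta_{y',k}\, x_k \Bigr\}
```

Viewed this way, LR is a CRF over a single output variable: the exponent is a sum of weighted feature functions and Z(x) normalises over the labels only.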

Page 13: Naive Bayes and logistic regression (2)

• if the GNB assumptions hold, then GNB and LR converge asymptotically toward identical classifiers
• in a generative model, the set of parameters must represent both the input distribution and the conditional well
• discriminative models are not as strongly tied to the input distribution
  – e.g. LR fits its parameters to the data even if the naive Bayes assumption is violated
• in other words: there are more (complex) joint models than GNB whose conditionals also have the "LR form"
• the GNB-LR relationship mirrors the relationship between HMMs and linear-chain CRFs

Page 14: Overview: sequence models

[Figure: the model-overview diagram from the Motivation slide]

Page 15: Sequence models: HMMs

• power of graphical models: modelling many interdependent variables
• an HMM models the joint distribution p(y, x)
  – it uses two independence assumptions to do so tractably:
    • given its direct predecessor, each state is independent of its earlier ancestors
    • each observation depends only on the current state

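The HMM factorisation on this slide was an image; the standard form implied by the two independence assumptions (a hedged reconstruction) is:

```latex
p(\mathbf{y}, \mathbf{x}) \;=\; \prod_{t=1}^{T} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)
```

Here p(y_1 | y_0) is read as the initial state distribution.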

Page 16: From HMMs to linear chain CRFs (1)

• key: the conditional distribution p(y|x) of an HMM is a CRF with a particular choice of feature functions
  – the parameters are not required to be log probabilities, therefore a normalisation constant is introduced
  – rewrite the HMM using feature functions (see the reconstruction below):

    with λ_ij = log p(y' = i | y = j)
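The rewritten HMM itself was lost to an image placeholder; a hedged reconstruction following the cited Sutton and McCallum tutorial, which also introduces emission weights μ_oi, is:

```latex
p(\mathbf{y}, \mathbf{x}) \;=\; \frac{1}{Z} \exp\Bigl\{
  \sum_{t} \sum_{i,j \in S} \lambda_{ij}\, \mathbf{1}\{y_t = i\}\, \mathbf{1}\{y_{t-1} = j\}
  \;+\; \sum_{t} \sum_{i \in S} \sum_{o \in O} \mu_{oi}\, \mathbf{1}\{y_t = i\}\, \mathbf{1}\{x_t = o\}
\Bigr\}
```

with μ_oi = log p(x = o | y = i); choosing exactly these log-probability weights makes Z = 1 and recovers the HMM.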

Page 17: From HMMs to linear chain CRFs (2)

• last step: write down the conditional probability p(y|x) for the HMM (see the reconstruction below)
• this is a linear-chain CRF that includes only the HMM features; richer features are possible

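The conditional written on this slide was an image; a hedged reconstruction in the same feature-function notation is:

```latex
p(\mathbf{y} \mid \mathbf{x})
\;=\; \frac{p(\mathbf{y}, \mathbf{x})}{\sum_{\mathbf{y}'} p(\mathbf{y}', \mathbf{x})}
\;=\; \frac{\exp\bigl\{ \sum_{t} \sum_{k} \theta_k\, f_k(y_t, y_{t-1}, x_t) \bigr\}}
           {\sum_{\mathbf{y}'} \exp\bigl\{ \sum_{t} \sum_{k} \theta_k\, f_k(y'_t, y'_{t-1}, x_t) \bigr\}}
```

where the f_k collect the indicator features of the previous slide and θ_k the corresponding λ and μ weights.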

Page 18: Linear chain conditional random fields

• definition: see the reconstruction below
• for general CRFs, use arbitrary cliques instead of the chain structure

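The definition on this slide was an image; the standard linear-chain CRF definition from the cited tutorial (a hedged reconstruction) is:

```latex
p(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T}
  \exp\Bigl\{ \sum_{k=1}^{K} \lambda_k\, f_k(y_t, y_{t-1}, \mathbf{x}_t) \Bigr\},
\qquad
Z(\mathbf{x}) \;=\; \sum_{\mathbf{y}} \prod_{t=1}^{T}
  \exp\Bigl\{ \sum_{k=1}^{K} \lambda_k\, f_k(y_t, y_{t-1}, \mathbf{x}_t) \Bigr\}
```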

Page 19: Side trip: maximum entropy Markov models

• entropy: a measure of the uniformity of a distribution
• a maximum entropy model maximises entropy, subject to constraints imposed by the training data
• MEMMs model the conditional probability of reaching a state s given an observation o and the previous state s', instead of joint probabilities
  – observations sit on the transitions
  – split P(s|s',o) into |S| separately trained transition functions P_s'(s|o)
• this leads to per-state normalisation (see the sketch below)

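The per-state distribution on this slide was an image; a hedged sketch of the usual maximum-entropy (exponential) form is:

```latex
P_{s'}(s \mid o) \;=\; \frac{1}{Z(o, s')} \exp\Bigl\{ \sum_{k} \lambda_k\, f_k(o, s) \Bigr\}
```

Note that Z(o, s') normalises over the next state s only; this per-state normalisation is exactly what causes the label bias problem on the next slide.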

Page 20: Side trip: label bias problem

• MEMMs are CRF-like log-linear models, but they suffer from the label bias problem:
  – per-state normalisation requires that the probabilities of the transitions leaving a state sum to one
    • conservation of probability mass
    • states with a single outgoing transition effectively ignore the observation

[Figure: example finite-state automaton and probability calculation illustrating the label bias problem]

Page 21: Inference in a linear chain CRF

• slight variants of the HMM algorithms:
  – Viterbi: use the definition from the HMM
  – but define the per-position factors from the CRF weights
  – because the CRF model can be written as a product of such factors (see the sketch below)

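The factor definitions on this slide were images; the Python sketch below (not from the slides) uses one common convention, a per-position factor Ψ_t(y_t, y_{t-1}, x) = exp(Σ_k λ_k f_k(y_t, y_{t-1}, x_t)), and runs Viterbi over it. The labels, feature set and weights are illustrative assumptions.

```python
# Toy linear-chain CRF: two labels and a tiny hand-made feature/weight set (illustrative only).
LABELS = ["A", "B"]

def features(y_curr, y_prev, x_t):
    """Binary feature functions f_k(y_t, y_{t-1}, x_t) for one position."""
    return {
        f"trans:{y_prev}->{y_curr}": 1.0,
        f"emit:{y_curr}:{x_t}": 1.0,
    }

weights = {
    "trans:A->A": 1.0, "trans:A->B": -0.5,
    "trans:B->B": 1.0, "trans:B->A": -0.5,
    "emit:A:x": 2.0,  "emit:B:y": 2.0,
}

def log_psi(y_curr, y_prev, x_t):
    """log Psi_t(y_t, y_{t-1}, x) = sum_k lambda_k * f_k(y_t, y_{t-1}, x_t)."""
    return sum(weights.get(k, 0.0) * v for k, v in features(y_curr, y_prev, x_t).items())

def viterbi(x):
    """Most probable label sequence: argmax_y prod_t Psi_t."""
    # delta[t][y] = best log-score of any label prefix ending in label y at position t.
    # "START" is a dummy previous label for t = 0 (its transition features have weight 0).
    delta = [{y: log_psi(y, "START", x[0]) for y in LABELS}]
    back = []
    for t in range(1, len(x)):
        delta.append({})
        back.append({})
        for y in LABELS:
            best_prev = max(LABELS, key=lambda yp: delta[t - 1][yp] + log_psi(y, yp, x[t]))
            back[-1][y] = best_prev
            delta[-1][y] = delta[t - 1][best_prev] + log_psi(y, best_prev, x[t])
    # Backtrack from the best final label.
    y_last = max(LABELS, key=lambda y: delta[-1][y])
    path = [y_last]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return list(reversed(path))

print(viterbi(["x", "x", "y"]))   # ['A', 'A', 'B']
```

Because Z(x) is the same for every label sequence, it cancels in the argmax, so Viterbi never needs to compute it.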

Page 22: Parameter estimation in general

• one major drawback so far:
  – generative models tend to have a higher asymptotic error, but
  – a generative model approaches its asymptotic error faster than a discriminative one, with the number of training examples needed being logarithmic rather than linear in the number of parameters
• remember: discriminative models make no independence assumptions about the observations x

Page 23: Principles in parameter estimation

• basic principle: maximum likelihood estimation with the conditional log likelihood of the training data
  – advantage: the conditional log likelihood is concave, therefore every local optimum is a global one
• use gradient-based optimisation, e.g. quasi-Newton methods
  – runtime is in O(T·M²·N·G), with T the sequence length, M the number of labels, N the number of training instances and G the number of required gradient computations

ℓ(θ) = Σ_{i=1}^{N} log p(y^(i) | x^(i))
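For completeness, the gradient used by the quasi-Newton methods (a hedged reconstruction following the cited Sutton and McCallum tutorial) is the difference between empirical and expected feature counts:

```latex
\frac{\partial \ell(\theta)}{\partial \lambda_k}
\;=\; \sum_{i=1}^{N} \sum_{t} f_k\bigl(y^{(i)}_t, y^{(i)}_{t-1}, x^{(i)}_t\bigr)
\;-\; \sum_{i=1}^{N} \sum_{t} \sum_{y, y'} f_k\bigl(y, y', x^{(i)}_t\bigr)\,
      p\bigl(y_t = y,\, y_{t-1} = y' \mid \mathbf{x}^{(i)}\bigr)
```

The pairwise marginals p(y_t, y_{t-1} | x^(i)) come from forward-backward, which is where the O(T·M²·N·G) runtime comes from.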

Page 24: Application: gene prediction

• use finite-state CRFs to locate introns and exons in DNA sequences
• advantages of CRFs:
  – ability to straightforwardly incorporate homology evidence from protein databases
  – used feature functions:
    • e.g. frequencies of base conjunctions and disjunctions in sliding windows over 20 bases upstream and 40 bases downstream (motivation: splice site detection)
      – e.g. how many times did "C or G" occur in the prior 40 bases, with a sliding window of size 5? (see the sketch below)
    • e.g. frequencies of how many times a base appears in a related protein (found via BLAST search)
• outperforms a fifth-order hidden semi-Markov model, raising the harmonic mean of precision and recall from 84.55 to 86.09 (roughly a 10% error reduction)
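As an illustration of the sliding-window count features described above, here is a minimal Python sketch (not from the cited paper; the function and parameter names are made up) that counts how often "C or G" occurs in windows of size 5 over the 40 bases preceding a position:

```python
def window_counts(seq, pos, bases=("C", "G"), upstream=40, window=5):
    """Count occurrences of any base in `bases` within each sliding window of
    length `window` over the `upstream` bases preceding position `pos`."""
    start = max(0, pos - upstream)
    region = seq[start:pos]
    counts = []
    for i in range(max(0, len(region) - window + 1)):
        counts.append(sum(1 for b in region[i:i + window] if b in bases))
    return counts

dna = "ATGCGCGTATTCGGGCCATAGCGTTACGCGCATCGATCGGTACGATCGTAGC"
print(window_counts(dna, pos=50))   # one count per window position
```

In a CRF these counts would be turned into feature functions f_k(y_t, y_{t-1}, x, t) that fire together with particular label values.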

Page 25: Summary: graphical models

[Figure: summary diagram of the relationships among the graphical models discussed]

Page 26: The end

Questions?

Page 27: References

• Charles Sutton and Andrew McCallum. An Introduction to Conditional Random Fields for Relational Learning. In: Lise Getoor and Ben Taskar (eds.), Introduction to Statistical Relational Learning. MIT Press, 2006. (Source of figures and formulae.)
• H. Wallach. Efficient Training of Conditional Random Fields. Master's thesis, University of Edinburgh, 2002. http://citeseer.ist.psu.edu/wallach02efficient.html
• John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of ICML-01, pages 282-289, 2001.
• Aron Culotta, David Kulp, and Andrew McCallum. Gene Prediction with Conditional Random Fields. Technical Report UM-CS-2005-028, University of Massachusetts, Amherst, April 2005.

Page 28: References (continued)

• Kevin Murphy. An Introduction to Graphical Models. Intel Research Technical Report, 2001. http://citeseer.ist.psu.edu/murphy01introduction.html
• Andrew Y. Ng and Michael Jordan. On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. In NIPS 14, 2002.
• T. Minka. Discriminative Models, Not Discriminative Training. Technical Report, Microsoft Research Cambridge, 2005.
• P. Blunsom. Maximum Entropy Classification. Lecture slides 433-680, 2005. http://www.cs.mu.oz.au/680/lectures/week06a.pdf