
Hierarchical Bayesian Models for Modeling Cognitive Processes

Çağrı Çöltekin
Center for Language and Cognition, University of Groningen
[email protected]

March 11, 2009

Overview

Bayesian Inference
  Introduction (with a comparison to frequentist statistics)
  A Simple Example

Bayesian Inference and Learning
  MLE and MAP
  Bayesian Learning

Hierarchical Bayesian Models
  A simple HBM example

Using HBMs for Learning Grammar
  Categorial Grammars
  A Bayesian CG learner
  A hierarchical extension

Two views on statistics

◮ Frequentist view
  ◮ Probabilities are long-run frequencies.
  ◮ We can talk about probabilities only for well-defined experiments with random outcomes.
  ◮ Emphasis is on objectivity: the data must speak for themselves.
◮ Bayesian view
  ◮ Probabilities are degrees of belief.
  ◮ Probabilities can be assigned to any statement.
  ◮ Inference is subjective (though methods for somewhat objective inference exist).

Statistical Inference

◮ We collect a sample X from a population.
◮ We know (or assume) that X is distributed according to a known distribution that can be parametrized by θ:

  p(θ|X) = p(X|θ) p(θ) / p(X)

◮ p(θ|X): posterior
◮ p(X|θ): likelihood (L(θ))
◮ p(θ): prior
◮ p(X): marginal probability of the data, ∫ p(X|θ) p(θ) dθ

Statistical Inference: the two approaches

p(θ|X) = p(X|θ) p(θ) / p(X) ∝ p(X|θ) p(θ)

◮ Frequentist approach:
  ◮ θ is fixed but unknown.
  ◮ Inference is done via the Maximum Likelihood Estimate (MLE); prior information is never used.
  ◮ We need to assess the reliability of the estimate by significance tests (p-values, confidence intervals).
◮ Bayesian approach:
  ◮ θ is treated just like any other random variable.
  ◮ The posterior contains all the necessary ingredients for inference.

A simple example

We toss a coin 10 times; the outcome is HHTHHHHTTH (7 heads, 3 tails). Let θ represent the probability that the coin comes up 'H'.

◮ Frequentist: no prior. θ = argmax_θ L(θ) = 0.7 (p = 0.2059).
◮ Bayesian 1: prior Beta(1,1); posterior Beta(8,4).
◮ Bayesian 2: prior Beta(5,5); posterior Beta(12,8).

In general, a Beta(α, β) prior combined with h heads in n tosses yields the posterior Beta(α + h, β + n − h): the uniform Beta(1,1) prior gives Beta(8,4), while the more confident Beta(5,5) prior pulls its posterior, Beta(12,8), closer to θ = 0.5.
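A minimal sketch of both analyses (assuming SciPy is available; the conjugate Beta update is the one stated above):

    from scipy import stats

    outcome = "HHTHHHHTTH"
    heads = outcome.count("H")          # 7
    tails = len(outcome) - heads        # 3

    # Frequentist: the MLE for a Binomial proportion is the sample frequency.
    theta_mle = heads / len(outcome)    # 0.7

    # Bayesian: a Beta(a, b) prior is conjugate to the Binomial likelihood.
    for a, b in [(1, 1), (5, 5)]:
        post = stats.beta(a + heads, b + tails)
        print(f"Beta({a},{b}) prior -> Beta({a + heads},{b + tails}), "
              f"posterior mean {post.mean():.3f}")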

Bayesian Inference: summary

p(θ|X) ∝ p(X|θ) p(θ)

Pros
◮ Allows complete inference: the posterior distribution contains all the information needed.
◮ Interpretation of the results is straightforward.

Cons
◮ Computationally expensive: calculating posteriors is not always easy.
◮ Choice of priors: it is sometimes difficult to justify the use of (subjective) priors.

Statistical Inference and Learning

◮ Statistical inference aims to draw conclusions about an underlying population using a sample drawn from this population.
◮ Learning can be viewed as making generalizations about the target concept by looking at a sample of data consistent with this concept.
◮ Statistical methods are the most common learning methods in machine learning.
◮ More and more psychological phenomena are explained by sensitivity to statistical information in the environment.
◮ Language acquisition is no exception: children are known to exploit statistical regularities in the input in language learning.

(non-Bayesian) Statistical Learning

p(θ|X) = p(X|θ) p(θ) / p(X) ∝ p(X|θ) p(θ)

We want to learn the parameter θ.

◮ Maximum Likelihood Estimate (MLE): θ = argmax_θ p(X|θ)
◮ Maximum a posteriori (MAP) estimate: θ = argmax_θ p(X|θ) p(θ)
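Both estimates have closed forms in the Beta/Binomial setting of the coin example; a small sketch (the Beta(5,5) prior is the 'Bayesian 2' prior from earlier):

    heads, tails = 7, 3
    a, b = 5, 5   # Beta prior pseudo-counts

    # MLE maximizes p(X | theta) alone.
    theta_mle = heads / (heads + tails)                        # 0.700

    # MAP maximizes p(X | theta) p(theta); the prior acts as pseudo-data.
    theta_map = (heads + a - 1) / (heads + tails + a + b - 2)  # 0.611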

Bayesian Learning

p(θ|X) ∝ p(X|θ) p(θ)

◮ No point estimates.
◮ No maximization.
◮ A Bayesian learner learns the posterior distribution, p(θ|X).
◮ If needed, point estimates can be made using the posterior mean,

  E[θ] = ∫ θ p(θ|X) dθ

  Note the difference from the MAP estimate.
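To make the difference concrete, a sketch using the Beta(12,8) posterior from the coin example (SciPy assumed):

    from scipy import stats

    posterior = stats.beta(12, 8)      # posterior under the Beta(5,5) prior

    # Posterior mean E[theta] = 12 / (12 + 8) = 0.600
    print(posterior.mean())

    # Compare the MAP estimate (posterior mode): (12 - 1) / (12 + 8 - 2) = 0.611
    print((12 - 1) / (12 + 8 - 2))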

Bayesian Learning: how?

p(θ|X) = p(X|θ) p(θ) / ∫ p(X|θ) p(θ) dθ

◮ Given the probability density (or distribution) functions for the prior and the likelihood, we simply multiply and normalize.
◮ Note that the likelihood and the prior do not have to be proper: multiplication by a constant does not change the results.

Problem 1: computation
The computation involved is not always easy to carry out. Approximate methods, such as MCMC, are frequently used.

Problem 2: priors
We need to choose a prior distribution.
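A minimal grid-approximation sketch of 'multiply and normalize', using the coin data; the arbitrarily scaled prior illustrates that an unnormalized prior is harmless:

    import numpy as np

    theta = np.linspace(0.001, 0.999, 999)   # grid over the parameter space
    prior = 2.0 * np.ones_like(theta)        # any constant scaling works
    likelihood = theta**7 * (1 - theta)**3   # 7 heads, 3 tails
    unnorm = likelihood * prior
    posterior = unnorm / (unnorm.sum() * (theta[1] - theta[0]))  # density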

Choice of priors

◮ Subjective priors: based on previous experience.
◮ Non-informative priors: try to be as objective as possible.
◮ Conjugate priors: for computational efficiency.
◮ Empirical Bayes: priors from data.
◮ Hierarchical priors: combining information from different variables.

Digression: Graphical models (or 'Bayesian networks')

◮ Often multiple random variables interact.
◮ Bayesian networks are a convenient way to visualize the dependency relations between variables. For the simple coin model, the parameter θ is the parent of the observed data X:

  θ → X

Hierarchical Bayesian Models

In our coin-toss example we chose a Beta(α, β) prior with fixed α and β:

p(θ|X) ∝ p(X|θ) p(θ)

where X ∼ Binomial(θ) and θ ∼ Beta(α, β); graphically, θ → X with α and β fixed.

We can further extend this model if the choice of α and β can be guided by additional information, treating them as random variables as well:

p(θ, α, β|X) ∝ p(X|θ) p(θ|α, β) p(α, β)

where, e.g., X ∼ Binomial(θ), θ ∼ Beta(α, β), α ∼ N(µα, σ), β ∼ N(µβ, σ); graphically, the chain gains a level: (α, β) → θ → X.
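A hedged sketch of this two-level generative story; the Normal draws are clipped to stay positive, a detail the slide leaves implicit, and the hyperparameter values are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    mu_a, mu_b, sigma = 5.0, 5.0, 1.0           # assumed hyperparameters

    alpha = max(rng.normal(mu_a, sigma), 1e-2)  # alpha ~ N(mu_a, sigma)
    beta = max(rng.normal(mu_b, sigma), 1e-2)   # beta  ~ N(mu_b, sigma)
    theta = rng.beta(alpha, beta)               # theta ~ Beta(alpha, beta)
    X = rng.binomial(n=10, p=theta)             # X ~ Binomial(10, theta)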

A simple example: marbles in boxes∗

Boxes contain either white or black marbles. Having observed a few marbles from several boxes, what should we expect of a new, yet-unopened box?

Level 1: box properties. Individual boxes are mostly black or mostly white.
Level 2: boxes in general. Color varies among boxes, not much within boxes.

∗ This example is adapted from a talk by J. Tenenbaum

HBM for marbles in boxes

◮ The xi are the data.
◮ θi models the proportions in box i: xi ∼ Binomial(θi).
◮ α, β model the boxes in general: θi ∼ Beta(α, β).
◮ λ models prior expectations about boxes in general.

Graphically: λ → (α, β) → θi → xi, for boxes i = 1, …, 4.

p(λ, α, β, θ|X) ∝ p(X|θ) p(θ|λ, α, β) p(α, β|λ) p(λ)
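A sketch of the generative side of this model. The slide does not pin down λ's exact role; here it is read, as one assumption, as a prior over the concentration of (α, β):

    import numpy as np

    rng = np.random.default_rng(1)

    lam = rng.gamma(2.0, 2.0)               # lambda: expectations about boxes
    mu = rng.beta(1, 1)                     # overall tendency toward black
    alpha, beta = lam * mu, lam * (1 - mu)  # boxes in general
    thetas = rng.beta(alpha, beta, size=4)  # theta_i: one proportion per box
    xs = rng.binomial(20, thetas)           # x_i: 20 marbles drawn per box

A large λ makes boxes resemble each other; a small λ concentrates each θi near 0 or 1, matching the intuition that color varies among boxes but not much within them.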

Summary: HBMs for Modeling Human Learning

◮ Bayesian statistics provides complete inference: the posterior distribution contains all we need.
◮ Bayesian learning is incremental: the posterior can be used as the prior for the next step.
◮ Hierarchical models provide a way to include information from different sources as prior knowledge.

Overview (continued)

Next: Using HBMs for Learning Grammar
  Categorial Grammars
  A Bayesian CG learner
  A hierarchical extension

Categorial Grammars

◮ Categorial Grammar (CG) encodes all the language-specific syntactic information in the lexicon.
◮ A CG lexicon contains lexical items of the form

  φ := σ : µ

  where φ is the phonological form, σ is the syntactic category, and µ is the meaning of the lexical item.
◮ Syntactic categories in CG are
  ◮ either a basic category, such as N, NP, S,
  ◮ or a complex category of the form X\Y or X/Y, where X and Y are any (basic or complex) CG categories. Informally,
    X/Y says: 'I need a Y to my right to become X'
    X\Y says: 'I need a Y to my left to become X'

Categorial Grammars (contd.)

◮ More formally, CG has two rules:
  Forward Application:  X/Y Y ⇒ X  (>)
  Backward Application: Y X\Y ⇒ X  (<)
◮ An example derivation:

  Mary   likes        Peter
  NP     (S\NP)/NP    NP
         ----------------- >
         S\NP
  ----------------------- <
  S

  The same derivation as a tree: S dominates NP (Mary) and S\NP, which in turn dominates (S\NP)/NP (likes) and NP (Peter).
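A minimal sketch of the two application rules, with complex categories as nested tuples (this encoding is mine, not from the talk):

    # A category is a basic string ('NP') or a tuple (slash, result, argument).
    def forward(left, right):
        """X/Y  Y  =>  X   (>)"""
        if isinstance(left, tuple) and left[0] == '/' and left[2] == right:
            return left[1]
        return None

    def backward(left, right):
        """Y  X\\Y  =>  X   (<)"""
        if isinstance(right, tuple) and right[0] == '\\' and right[2] == left:
            return right[1]
        return None

    likes = ('/', ('\\', 'S', 'NP'), 'NP')   # (S\NP)/NP
    vp = forward(likes, 'NP')                # S\NP, by (>)
    s = backward('NP', vp)                   # S, by (<)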

A simple CG learner for learning word-grammars

◮ Input is a set of words.
◮ We want to assign a probability, θ, to possible lexical items (〈φ, σ〉 pairs).
◮ Note: the probability is the system's belief that the 〈φ, σ〉 pair at hand is a lexical item.
◮ Only 3 categories (known in advance):
  ◮ W: free morpheme (word, or stem)
  ◮ W/W: prefix
  ◮ W\W: suffix
◮ We adopt a Beta/Binomial model.
◮ We assume each input word provides evidence for the lexical hypothesis in question, if the hypothesis is used in the interpretation of the input.

The CG learner: a simple algorithm

◮ Input is unsegmented words (and the lexicon).
◮ Output is the lexicalized grammar with probability assignments.

For each input word w (a sketch in code follows this list):

1. Try to segment the input using the current lexicon.
2. If there is no possible segmentation, assume that we have found evidence for a lexical item w := W.
3. If we can segment the input as w = φ1 … φN, assume that we have observed evidence for each pair 〈φi, σj〉 which yields a correct parse of w.
4. Update the parameters of the Beta distributions associated with the lexical hypotheses.
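A rough sketch of this loop, simplified to two-part splits and omitting the parse check in step 3; names like hypotheses and observe are illustrative, not from the talk:

    lexicon = {}   # (phi, cat) -> [a, b]: Beta pseudo-counts (system's belief)

    def observe(phi, cat):
        entry = lexicon.setdefault((phi, cat), [1.0, 1.0])
        entry[0] += 1.0                       # one more success for Beta(a, b)

    def hypotheses(word):
        hyps = {(word, 'W')}                  # the whole word as free morpheme
        for i in range(1, len(word)):
            left, right = word[:i], word[i:]
            if (left, 'W') in lexicon:        # known stem: right may be a suffix
                hyps |= {(left, 'W'), (right, 'W\\W')}
            if (right, 'W\\W') in lexicon:    # known suffix: left may be a stem
                hyps |= {(left, 'W'), (right, 'W\\W')}
            if (right, 'W') in lexicon:       # known stem: left may be a prefix
                hyps |= {(left, 'W/W'), (right, 'W')}
        return hyps

    for w in ['book', 'pens', 'books', 'pens']:
        for phi, cat in hypotheses(w):
            observe(phi, cat)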

An example

(Each step of the talk shows Beta densities for the hypotheses book:=W, pens:=W, books:=W and s:=W\W; the plots are not reproduced here, only the trace.)

Step 1. Input: book. Lexicon before: {}
  Hypotheses: book:=W
  Parse: book as W
  Lexicon after: {book:=W}

Step 2. Input: pens. Lexicon before: {book:=W}
  Hypotheses: pens:=W
  Parse: pens as W
  Lexicon after: {book:=W, pens:=W}

Step 3. Input: books. Lexicon before: {book:=W, pens:=W}
  Hypotheses: book:=W, books:=W, s:=W, book:=W/W, s:=W\W
  Parses: books as W; book + s as W W\W ⇒ W (<); book + s as W/W W ⇒ W (>)
  Lexicon after: {book:=W, pens:=W, s:=W\W}

Step 4. Input: pens. Lexicon before: {book:=W, pens:=W, s:=W\W}
  Hypotheses: pen:=W, pens:=W, s:=W, pen:=W/W, s:=W\W
  Parses: pens as W; pen + s as W W\W ⇒ W (<); pen + s as W/W W ⇒ W (>)
  Lexicon after: {book:=W, pens:=W, s:=W\W}
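Tallying the hypotheses in the trace above as simple pseudo-counts (one observation per step in which a hypothesis appears; this flattens the talk's actual Beta update):

    from collections import Counter

    steps = [
        ['book:=W'],
        ['pens:=W'],
        ['book:=W', 'books:=W', 's:=W', 'book:=W/W', 's:=W\\W'],
        ['pen:=W', 'pens:=W', 's:=W', 'pen:=W/W', 's:=W\\W'],
    ]
    counts = Counter(h for step in steps for h in step)
    # book:=W, pens:=W and s:=W\W each appear twice; books:=W only once,
    # which is consistent with s:=W\W (but not books:=W) entering the lexicon.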

A hierarchical extension

◮ Our current model assumes rather non-informative values for α and β.
◮ We can extend this model to get more informative priors.
◮ We treat α and β as random variables.
◮ We make use of context predictability as another source of information, providing a hierarchical, informative prior for possible segments.

As a graphical model, the chain grows by one level: (α, β) → θ → X.
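The talk does not spell out how predictability sets α and β; one plausible reading, sketched here purely as an assumption, is to turn a predictability score for a candidate segment into Beta pseudo-counts:

    def predictability_prior(score, strength=10.0):
        """Map a segment-predictability score in [0, 1] to Beta(alpha, beta).

        A segment whose context makes it highly predictable as a unit gets a
        prior biased toward being a lexical item. Purely illustrative.
        """
        alpha = 1.0 + strength * score
        beta = 1.0 + strength * (1.0 - score)
        return alpha, beta

    # e.g. 'pen' before the frequent suffix 's' might score high:
    a, b = predictability_prior(0.8)   # -> Beta(9.0, 3.0), prior mean 0.75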

Hierarchical extension: example

The learner sees the same input sequence, now with the hierarchical, predictability-informed priors. Steps 1 to 3 proceed as before; the final step differs:

Step 1. Input: book. Lexicon before: {}
  Hypotheses: book:=W
  Lexicon after: {book:=W}

Step 2. Input: pens. Lexicon before: {book:=W}
  Hypotheses: pens:=W
  Lexicon after: {book:=W, pens:=W}

Step 3. Input: books. Lexicon before: {book:=W, pens:=W}
  Hypotheses: book:=W, books:=W, s:=W, book:=W/W, s:=W\W
  Lexicon after: {book:=W, pens:=W, s:=W\W}

Step 4. Input: pens. Lexicon before: {book:=W, pens:=W, s:=W\W}
  Hypotheses: pen:=W, pens:=W, s:=W, pen:=W/W, s:=W\W
  Lexicon after: {book:=W, pens:=W, s:=W\W, pen:=W}

Unlike the flat model, the more informative hierarchical prior lets the learner also acquire pen:=W from the ambiguous input pens.

Summary

◮ Bayesian statistics provides a different approach to statistical inference and learning.
◮ Use of (subjective) priors is not always bad: modeling cognitive processes is a good example.
◮ Hierarchical priors are a good way to combine information from different sources.