Priors and predictions in everyday cognition
Tom Griffiths, Cognitive and Linguistic Sciences

Page 1:

Priors and predictions in everyday cognition
Tom Griffiths, Cognitive and Linguistic Sciences

Page 2:

(figure: data → behavior)

What computational problem is the brain solving?

Does human behavior correspond to an optimal solution to that problem?

Page 3:

Inductive problems

• Inferring structure from data

• Perception
– e.g. structure of the 3D world from 2D visual data

(figure: data → hypotheses; the same 2D line drawing could be a cube or a shaded hexagon)

Page 4:

Inductive problems

• Inferring structure from data

• Perception
– e.g. structure of the 3D world from 2D data

• Cognition
– e.g. the relationship between variables from samples

(figure: data → hypotheses)

Page 5:

(figure: portrait of Reverend Thomas Bayes)

Page 6:

Bayes' theorem

p(h | d) = p(d | h) p(h) / Σ_{h′ ∈ H} p(d | h′) p(h′)

posterior probability = likelihood × prior probability, normalized by a sum over the space of hypotheses H

h: hypothesis
d: data

Page 7:

Bayes' theorem

p(h | d) ∝ p(d | h) p(h)

h: hypothesis
d: data
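To make the theorem concrete, here is a minimal sketch in Python; the coin hypotheses and the numbers are made up for illustration, not taken from the talk:

# Minimal sketch of Bayes' theorem over a discrete hypothesis space.

def posterior(prior, likelihood, d):
    """p(h|d) = p(d|h) p(h) / sum over h' in H of p(d|h') p(h')."""
    unnorm = {h: likelihood(d, h) * p_h for h, p_h in prior.items()}
    z = sum(unnorm.values())  # normalizing sum over the hypothesis space H
    return {h: p / z for h, p in unnorm.items()}

prior = {"fair": 0.9, "biased": 0.1}           # p(h)

def likelihood(d, h):                          # p(d|h) for a single coin flip
    p_heads = 0.5 if h == "fair" else 0.9
    return p_heads if d == "H" else 1 - p_heads

print(posterior(prior, likelihood, "H"))
# {'fair': 0.833..., 'biased': 0.166...}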

Page 8:

Perception is optimal

Körding & Wolpert (2004)

Page 9:

Cognition is not

Page 10:

Do people use priors?

Standard answer: no (Tversky & Kahneman, 1974)

p(h | d) ∝ p(d | h) p(h)

This talk: yes

What are people’s priors?

Page 11:

Explaining inductive leaps

• How do people
– infer causal relationships
– identify the work of chance
– predict the future
– assess similarity and make generalizations
– learn functions, languages, and concepts

. . . from such limited data?

Page 12:

Explaining inductive leaps

• How do people
– infer causal relationships
– identify the work of chance
– predict the future
– assess similarity and make generalizations
– learn functions, languages, and concepts

. . . from such limited data?

• What knowledge guides human inferences?

Page 13:

Prior knowledge matters when…

• …using a single datapoint
– predicting the future
– joint work with Josh Tenenbaum (MIT)

• …using secondhand data
– effects of priors on cultural transmission

Page 14:

Outline

• …using a single datapoint
– predicting the future
– joint work with Josh Tenenbaum (MIT)

• …using secondhand data
– effects of priors on cultural transmission
– joint work with Mike Kalish (Louisiana)

• Conclusions

Page 15:

Outline

• …using a single datapoint
– predicting the future
– joint work with Josh Tenenbaum (MIT)

• …using secondhand data
– effects of priors on cultural transmission
– joint work with Mike Kalish (Louisiana)

• Conclusions

Page 16:

Predicting the future

(figure: Google News screenshot)

How often is Google News updated?

t = time since last update

t_total = time between updates

What should we guess for t_total given t?

Page 17:

Making predictions

• You encounter a phenomenon that has existed for t units of time. How long will it continue into the future? (i.e. what's t_total?)

• We could replace “time” with any other variable that ranges from 0 to some unknown upper limit

Page 18:

Everyday prediction problems

• You read about a movie that has made $60 million to date. How much money will it make in total?

• You see that something has been baking in the oven for 34 minutes. How long until it’s ready?

• You meet someone who is 78 years old. How long will they live?

• Your friend quotes to you from line 17 of his favorite poem. How long is the poem?

• You see taxicab #107 pull up to the curb in front of the train station. How many cabs are in this city?

Page 19:

Bayesian inference

p(t_total | t) ∝ p(t | t_total) p(t_total)

posterior probability ∝ likelihood × prior

Page 20:

Bayesian inference

p(t_total | t) ∝ p(t | t_total) p(t_total)

p(t_total | t) ∝ (1/t_total) p(t_total)

assume random sampling: p(t | t_total) = 1/t_total for 0 < t < t_total

posterior probability ∝ likelihood × prior

Page 21:

Bayesian inference

p(t_total | t) ∝ p(t | t_total) p(t_total)

p(t_total | t) ∝ (1/t_total) (1/t_total)

random sampling: p(t | t_total) = 1/t_total for 0 < t < t_total
"uninformative" prior: p(t_total) ∝ 1/t_total

posterior probability ∝ likelihood × prior

Page 22:

Bayesian inference

p(t_total | t) ∝ (1/t_total) (1/t_total)   [random sampling; "uninformative" prior]

What is the best guess for t_total? How about the maximal value of p(t_total | t)?

(figure: the posterior p(t_total | t) as a function of t_total is maximal at t_total = t)

Page 23:

Bayesian inference

p(t_total | t) ∝ (1/t_total) (1/t_total)   [random sampling; "uninformative" prior]

What is the best guess for t_total? Instead, compute t* such that p(t_total > t* | t) = 0.5:

(figure: the posterior p(t_total | t) with the median t* marked)

Page 24:

Bayesian inference

p(t_total | t) ∝ (1/t_total) (1/t_total)   [random sampling; "uninformative" prior]

What is the best guess for t_total? Compute t* such that p(t_total > t* | t) = 0.5. Normalizing the posterior over t_total > t gives p(t_total | t) = t / t_total², so p(t_total > t* | t) = t / t*.

This yields Gott's Rule: p(t_total > t* | t) = 0.5 when t* = 2t, i.e., the best guess for t_total is 2t.
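A quick numeric check of this derivation, as a sketch in Python (the observed t and the grid resolution are arbitrary choices):

import numpy as np

# Check Gott's rule numerically: with likelihood 1/t_total (for t_total > t)
# and prior 1/t_total, the posterior median of t_total is 2t.
t = 34.0                                  # observed duration so far (arbitrary)
grid = np.linspace(t, 500 * t, 500_000)   # candidate t_total values above t
post = 1.0 / grid**2                      # unnormalized posterior
cdf = np.cumsum(post) / post.sum()
t_star = grid[np.searchsorted(cdf, 0.5)]
print(t_star / t)                         # ~2.0, i.e. t* = 2t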

Page 25:

Applying Gott's rule

t ≈ 4000 years, so t* ≈ 8000 years

Page 26:

Applying Gott's rule

t ≈ 130,000 years, so t* ≈ 260,000 years

Page 27:

Predicting everyday events

• You meet someone who is 35 years old. How long will they live?
– "70 years" seems reasonable

• Not so simple:
– You meet someone who is 78 years old. How long will they live?
– You meet someone who is 6 years old. How long will they live?

Page 28:

The effects of priors

• Different kinds of priors p(t_total) are appropriate in different domains.

Uninformative: p(t_total) ∝ 1/t_total

Page 29:

The effects of priors

• Different kinds of priors p(t_total) are appropriate in different domains.

(figure: a power-law prior, e.g. wealth, and a Gaussian prior, e.g. height)

Page 30:

The effects of priors

Page 31:

Evaluating human predictions

• Different domains with different priors:
– a movie has made $60 million [power-law]
– your friend quotes from line 17 of a poem [power-law]
– you meet a 78-year-old man [Gaussian]
– a movie has been running for 55 minutes [Gaussian]
– a U.S. congressman has served for 11 years [Erlang]

• Prior distributions derived from actual data

• Use 5 values of t for each

• People predict t_total
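The model predictions can be computed as below; a sketch in Python, with illustrative prior parameters rather than the values actually estimated from the data:

import numpy as np

def median_prediction(t, prior_pdf, upper, n=200_000):
    """Posterior median t* for t_total, using the random-sampling
    likelihood p(t | t_total) = 1/t_total for 0 < t < t_total."""
    grid = np.linspace(t, upper, n)
    post = prior_pdf(grid) / grid               # prior x likelihood
    cdf = np.cumsum(post) / post.sum()
    return grid[np.searchsorted(cdf, 0.5)]

# Illustrative priors (assumed parameters, not the fitted ones):
power_law = lambda x: x ** -1.5                           # e.g. movie grosses
gaussian = lambda x: np.exp(-0.5 * ((x - 75) / 8) ** 2)   # e.g. life spans

print(median_prediction(60, power_law, upper=60_000))   # multiplicative rule
print(median_prediction(78, gaussian, upper=120))       # a little past 78

Under the power-law prior the prediction scales multiplicatively with t; under the Gaussian prior it hugs the prior mean, which is the qualitative pattern the experiment tests.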

Page 32:

(figure: people's predictions compared with Bayesian models using parametric and empirical priors, and with Gott's rule)

Page 33:

Nonparametric priors

You arrive at a friend’s house, and see that a cake has been in the oven for 34 minutes. How long will it be in the oven?

Page 34:

You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign?

No direct experience

Page 35:

You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh's reign. How long did he reign?

How long did the typical pharaoh reign in ancient Egypt?

No direct experience

Page 36:

…using a single datapoint

• People produce accurate predictions for the duration and extent of everyday events

• Strong prior knowledge
– form of the prior (power-law or exponential)
– distribution given that form (parameters)
– non-parametric distribution when necessary

• Reveals a surprising correspondence between probabilities in the mind and in the world

Page 37:

Outline

• …using a single datapoint
– predicting the future
– joint work with Josh Tenenbaum (MIT)

• …using secondhand data
– effects of priors on cultural transmission
– joint work with Mike Kalish (Louisiana)

• Conclusions

Page 38:

Cultural transmission

• Most knowledge is based on secondhand data

• Some things can only be learned from others
– cultural objects transmitted across generations

• Cultural transmission provides an opportunity for priors to influence cultural objects

Page 39:

Iterated learning (Briscoe, 1998; Kirby, 2001)

• Each learner sees data, forms a hypothesis, produces the data given to the next learner

• cf. the playground game "telephone"

(figure: a chain of learners, each passing data to the next)

Page 40:

Objects of iterated learning

• Languages

• Religious concepts

• Social norms

• Myths and legends

• Causal theories

Page 41:

Explaining linguistic universals

• Human languages are a subset of all logically possible communication schemes
– universal properties common to all languages (Comrie, 1981; Greenberg, 1963; Hawkins, 1988)

• Two questions:
– why do linguistic universals exist?
– why are particular properties universal?

Page 42:

Explaining linguistic universals

• Traditional answer:
– linguistic universals reflect innate constraints specific to a system for acquiring language

• Alternative answer:
– iterated learning imposes an "information bottleneck"
– universal properties survive this bottleneck (Briscoe, 1998; Kirby, 2001)

Page 43:

Analyzing iterated learning

What are the consequences of iterated learning?

(figure: approaches arranged along two axes, simulations vs. analytic results and complex vs. simple learning algorithms; simulations with complex algorithms include Kirby, 2001; Brighton, 2002; Smith, Kirby, & Brighton, 2003; analytic results with simple algorithms include Komarova, Niyogi, & Nowak, 2002; a "?" marks analytic results for complex algorithms)

Page 44:

Iterated Bayesian learning

(figure: a chain of learners; each infers a hypothesis from data via p(h|d) and generates data for the next learner via p(d|h))

Learners are rational Bayesian agents (covers a wide range of learning algorithms)

Page 45:

Markov chains

• x(t+1) is independent of the history given x(t)

• Converges to a stationary distribution under easily checked conditions

x(1) → x(2) → x(3) → x(4) → …

Transition matrix: P(x(t+1) | x(t))
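A minimal sketch in Python (the 3-state transition matrix is made up): iterating the transition kernel from any starting distribution approaches the stationary distribution.

import numpy as np

# A made-up 3-state chain; rows are P(x(t+1) = j | x(t) = i) and sum to 1.
P = np.array([[0.9, 0.05, 0.05],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])

dist = np.array([1.0, 0.0, 0.0])  # start in state 0 with certainty
for _ in range(200):              # run the chain forward
    dist = dist @ P
print(dist)                       # stationary distribution: pi = pi P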

Page 46:

Markov chain Monte Carlo

• A strategy for sampling from complex probability distributions

• Key idea: construct a Markov chain which converges to the target distribution
– e.g. the Metropolis algorithm
– e.g. Gibbs sampling

Page 47:

Gibbs sampling

For variables x = x_1, x_2, …, x_n:

Draw x_i(t+1) from P(x_i | x_-i), where

x_-i = x_1(t+1), x_2(t+1), …, x_{i-1}(t+1), x_{i+1}(t), …, x_n(t)

(a.k.a. the heat bath algorithm in statistical physics)

(Geman & Geman, 1984)
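As a sketch of the update rule, here is a Gibbs sampler in Python for a bivariate Gaussian (the correlation value is an arbitrary choice), where each full conditional P(x_i | x_-i) is itself a one-dimensional Gaussian:

import math
import random

rho = 0.8                 # correlation of the target bivariate Gaussian
x1, x2 = 0.0, 0.0
xs = []
for _ in range(20_000):
    # Draw x1(t+1) from P(x1 | x2) = Gaussian(rho * x2, 1 - rho^2)
    x1 = random.gauss(rho * x2, math.sqrt(1 - rho**2))
    # Draw x2(t+1) from P(x2 | x1) = Gaussian(rho * x1, 1 - rho^2)
    x2 = random.gauss(rho * x1, math.sqrt(1 - rho**2))
    xs.append((x1, x2))

# The empirical correlation of the samples should approach rho.
n = len(xs)
m1 = sum(a for a, _ in xs) / n
m2 = sum(b for _, b in xs) / n
cov = sum((a - m1) * (b - m2) for a, b in xs) / n
v1 = sum((a - m1) ** 2 for a, _ in xs) / n
v2 = sum((b - m2) ** 2 for _, b in xs) / n
print(cov / math.sqrt(v1 * v2))   # ~0.8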

Page 48:

Gibbs sampling

(figure: Gibbs sampling illustration from MacKay, 2002)

Page 49:

Iterated Bayesian learning

• Defines a Markov chain on (h,d)

Page 50:

Iterated Bayesian learning

• Defines a Markov chain on (h,d)

• This Markov chain is a Gibbs sampler for

p(d,h) = p(d | h) p(h)

Page 51:

Iterated Bayesian learning

• Defines a Markov chain on (h, d)

• This Markov chain is a Gibbs sampler for p(d, h) = p(d | h) p(h)

• Rate of convergence is geometric
– the Gibbs sampler converges geometrically (Liu, Wong, & Kong, 1995)

Page 52:

Analytic results

• Iterated Bayesian learning converges to p(d, h) = p(d | h) p(h)

• Corollaries:
– distribution over hypotheses converges to p(h)
– distribution over data converges to p(d)
– the proportion of a population of iterated learners with hypothesis h converges to p(h)
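A sketch of this result with a made-up two-hypothesis space: a chain of Bayesian learners, each sampling a hypothesis from its posterior and generating data for the next; the long-run frequency of each hypothesis matches the prior.

import math
import random

prior = {"fair": 0.7, "biased": 0.3}        # illustrative prior p(h)
p_heads = {"fair": 0.5, "biased": 0.9}      # p(heads | h)

def sample_data(h, n=5):
    """Generate n coin flips from p(d|h)."""
    return ["H" if random.random() < p_heads[h] else "T" for _ in range(n)]

def sample_hypothesis(d):
    """Sample h from the posterior p(h|d), proportional to p(d|h) p(h)."""
    def lik(h):
        return math.prod(p_heads[h] if f == "H" else 1 - p_heads[h] for f in d)
    w = {h: lik(h) * prior[h] for h in prior}
    r = random.random() * sum(w.values())
    for h, wh in w.items():
        r -= wh
        if r <= 0:
            return h
    return h

h = "biased"                                 # arbitrary starting hypothesis
counts = {"fair": 0, "biased": 0}
for _ in range(50_000):
    h = sample_hypothesis(sample_data(h))    # one generation of learning
    counts[h] += 1
print({h: c / 50_000 for h, c in counts.items()})  # approaches the prior: 0.7 / 0.3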

Page 53:

Implications for linguistic universals

• Two questions:
– why do linguistic universals exist?
– why are particular properties universal?

• Different answers:
– existence explained through iterated learning
– universal properties depend on the prior

• Focuses inquiry on the priors of the learners
– cultural objects reflect the human mind

Page 54:

A method for discovering priors

Iterated learning converges to the prior…

…so we can evaluate people's priors by running iterated learning in the lab

Page 55:

Iterated function learning

• Each learner sees a set of (x, y) pairs

• Makes predictions of y for new x values

• Predictions are data for the next learner

(figure: data → hypotheses)

Page 56:

Function learning in the lab

(figure: experimental interface showing the stimulus, a response slider, and feedback)

Examine iterated learning with different initial data

Page 57:

(figure: iterated function learning results; rows show chains started from different initial data, columns show iterations 1–9)

Page 58:

…using secondhand data

• Iterated Bayesian learning converges to the prior

• Constrains explanations of linguistic universals

• Open questions in Bayesian language evolution
– variation in priors
– other selective pressures

• Provides a method for evaluating priors
– concepts, causal relationships, languages, …

Page 59:

Outline

• …using a single datapoint
– predicting the future

• …using secondhand data
– effects of priors on cultural transmission

• Conclusions

Page 60:

Bayes' theorem

p(h | d) ∝ p(d | h) p(h)

A unifying principle for explaining inductive inferences

Page 61:

Bayes' theorem

behavior = f(data, knowledge)

(figure: data → behavior)

Page 62:

Bayes' theorem

behavior = f(data, knowledge)

(figure: data → behavior, with knowledge as an input)

Page 63:

Explaining inductive leaps

• How do people
– infer causal relationships
– identify the work of chance
– predict the future
– assess similarity and make generalizations
– learn functions, languages, and concepts

. . . from such limited data?

• What knowledge guides human inferences?

Page 64:

Page 65:

HHTHT

Page 66:

HHTHT

HHHHT

Page 67:

What's the computational problem?

p(HHTHT | random)?

p(random | HHTHT)?

An inference about the structure of the world

Page 68:

Page 69:

An example: Gaussians

• If we assume…
– data, d, is a single real number, x
– hypotheses, h, are means of a Gaussian, μ
– prior, p(μ), is Gaussian(μ_0, σ_0²)

• …then p(x_{n+1} | x_n) is Gaussian(μ_n, σ_x² + σ_n²), where

μ_n = (x_n / σ_x² + μ_0 / σ_0²) / (1 / σ_x² + 1 / σ_0²)

σ_n² = 1 / (1 / σ_x² + 1 / σ_0²)

Page 70:

μ_0 = 0, σ_0² = 1, x_0 = 20

Iterated learning results in rapid convergence to prior
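A sketch of this simulation in Python, under the stated assumptions (σ_x² = 1 is an assumption; the slide leaves the data noise implicit):

import math
import random

mu0, s0_sq = 0.0, 1.0    # prior over the mean: Gaussian(mu_0, sigma_0^2)
sx_sq = 1.0              # data noise sigma_x^2 (assumed)
x = 20.0                 # x_0 = 20: initial data, far from the prior mean

for i in range(10):
    # Posterior over mu given x is Gaussian(mu_n, s_n^2), per the formulas above.
    s_n_sq = 1.0 / (1.0 / sx_sq + 1.0 / s0_sq)
    mu_n = (x / sx_sq + mu0 / s0_sq) * s_n_sq
    mu = random.gauss(mu_n, math.sqrt(s_n_sq))   # learner samples a hypothesis
    x = random.gauss(mu, math.sqrt(sx_sq))       # and generates the next datum
    print(f"iteration {i + 1}: x = {x:.2f}")
# x drifts from 20 back to fluctuating around the prior mean 0 within a few steps.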

Page 71:

An example: linear regression

• Assume
– data, d, are pairs of real numbers (x, y)
– hypotheses, h, are functions

• An example: linear regression
– hypotheses have slope θ and pass through the origin
– p(θ) is Gaussian(θ_0, σ_0²)

(figure: a line through the origin; the y value at x = 1 gives the slope)

Page 72:

(figure: iterated learning chains for linear regression; the y value at x = 1 tracks each learner's slope)

θ_0 = 1, σ_0² = 0.1, y_0 = −1

Page 73: