Bayesian models of inductive learning · cocosci.princeton.edu/tom/talks/tutorial08part1.pdf
TRANSCRIPT
![Page 1: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/1.jpg)
Bayesian models of inductive learning
Tom Griffiths, UC Berkeley
Josh Tenenbaum, MIT
Charles Kemp, CMU
![Page 2: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/2.jpg)
What you will get out of this tutorial
• Our view of what Bayesian models have to offer cognitive science
• In-depth examples of basic and advanced models: how the math works & what it buys you
• A sense for how to go about making your own Bayesian models
• Some (not extensive) comparison to other approaches
• Opportunities to ask questions
![Page 3: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/3.jpg)
Resources…
• “Bayesian models of cognition” chapter in Handbook of Computational Psychology
• Tom’s Bayesian reading list:
– http://cocosci.berkeley.edu/tom/bayes.html
– tutorial slides will be posted there!
• Trends in Cognitive Sciences special issue on probabilistic models of cognition (vol. 10, iss. 7)
• IPAM graduate summer school on probabilistic models of cognition (with videos!)
![Page 4: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/4.jpg)
Outline
• Morning
– Introduction: Why Bayes? (Josh)
– Basics of Bayesian inference (Josh)
– How to build a Bayesian cognitive model (Tom)
• Afternoon
– Hierarchical Bayesian models and learning structured representations (Charles)
– Monte Carlo methods and nonparametric Bayesian models (Tom)
![Page 5: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/5.jpg)
Why probabilistic models of cognition?
![Page 6: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/6.jpg)
The big question
How does the mind get so much out of so little?
How do we make inferences, generalizations, models, theories and decisions about the world from impoverished (sparse, incomplete, noisy) data?
“The problem of induction”
![Page 7: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/7.jpg)
Visual perception
(Marr)
![Page 8: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/8.jpg)
Learning the meanings of words
“horse” “horse” “horse”
![Page 9: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/9.jpg)
The objects of planet Gazoob“tufa”
“tufa”
“tufa”
![Page 10: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/10.jpg)
The big question
How does the mind get so much out of so little?
– Perceiving the world from sense data
– Learning about kinds of objects and their properties
– Learning and interpreting the meanings of words, phrases, and sentences
– Inferring causal relations
– Inferring the mental states of other people (beliefs, desires, preferences) from observing their actions
– Learning social structures, conventions, and rules

The goal: A general-purpose computational framework for understanding how people make these inferences, and how they can be successful.
![Page 11: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/11.jpg)
The problems of induction
1. How does abstract knowledge guide inductive learning, inference, and decision-making from sparse, noisy or ambiguous data?
2. What is the form and content of our abstract knowledge of the world?
3. What are the origins of our abstract knowledge? To what extent can it be acquired from experience?
4. How do our mental models grow over a lifetime, balancing simplicity versus data fit (Occam), accommodation versus assimilation (Piaget)?
5. How can learning and inference proceed efficiently and accurately, even in the presence of complex hypothesis spaces?
![Page 12: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/12.jpg)
A toolkit for reverse-engineering induction
1. Bayesian inference in probabilistic generative models
2. Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas
3. Hierarchical probabilistic models, with inference at all levels of abstraction
4. Models of unbounded complexity (“nonparametric Bayes” or “infinite models”), which can grow in complexity or change form as observed data dictate.
5. Approximate methods of learning and inference, such as belief propagation, expectation-maximization (EM), Markov chain Monte Carlo (MCMC), and sequential Monte Carlo (particle filtering).
![Page 13: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/13.jpg)
Grammar G:
S → NP VP
NP → Det [Adj] Noun [RelClause]
RelClause → [Rel] NP V
VP → VP NP
VP → Verb

Grammar G → Phrase structure S → Utterance U
• P(S | G): top-down
• P(U | S): bottom-up

$$P(S \mid U, G) \propto P(U \mid S) \times P(S \mid G)$$
![Page 14: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/14.jpg)
“Universal Grammar”: hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)
↓ P(grammar | UG)
Grammar:
S → NP VP
NP → Det [Adj] Noun [RelClause]
RelClause → [Rel] NP V
VP → VP NP
VP → Verb
↓ P(phrase structure | grammar)
Phrase structure
↓ P(utterance | phrase structure)
Utterance
↓ P(speech | utterance)
Speech signal
![Page 15: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/15.jpg)
(Han and Zhu, 2006)
Vision as probabilistic parsing
![Page 16: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/16.jpg)
![Page 17: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/17.jpg)
Learning word meanings
• Principles: whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias
• Structure
• Data
![Page 18: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/18.jpg)
Causal learning and reasoning
Principles
Structure
Data
![Page 19: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/19.jpg)
Goal-directed action (production and comprehension)
(Wolpert et al., 2003)
![Page 20: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/20.jpg)
Why Bayesian models of cognition?
• A framework for understanding how the mind can solve fundamental problems of induction.
• Strong, principled quantitative models of human cognition.
• Tools for studying people’s implicit knowledge of the world.
• Beyond classic limiting dichotomies: “rules vs. statistics”, “nature vs. nurture”, “domain-general vs. domain-specific”.
• A unifying mathematical language for all of the cognitive sciences: AI, machine learning and statistics, psychology, neuroscience, philosophy, linguistics…. A bridge between engineering and “reverse-engineering”.

Why now? Much recent progress, in computational resources, theoretical tools, and interdisciplinary connections.
![Page 21: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/21.jpg)
Outline
• Morning
– Introduction: Why Bayes? (Josh)
– Basics of Bayesian inference (Josh)
– How to build a Bayesian cognitive model (Tom)
• Afternoon
– Hierarchical Bayesian models & probabilistic models over structured representations (Charles)
– Monte Carlo methods of approximate learning and inference; nonparametric Bayesian models (Tom)
![Page 22: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/22.jpg)
Bayes’ rule
For any hypothesis h and data d,

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')}$$

• p(h | d): posterior probability
• p(d | h): likelihood
• p(h): prior probability
• Denominator: sum over space of alternative hypotheses
![Page 23: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/23.jpg)
Bayesian inference
• Bayes’ rule:

$$P(h \mid d) = \frac{P(h)\, P(d \mid h)}{\sum_{h_i} P(h_i)\, P(d \mid h_i)}$$

• An example
– Data: John is coughing
– Some hypotheses:
1. John has a cold
2. John has lung cancer
3. John has a stomach flu
– Prior P(h) favors 1 and 3 over 2
– Likelihood P(d|h) favors 1 and 2 over 3
– Posterior P(h|d) favors 1 over 2 and 3
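The coughing example can be sketched numerically. The slide gives only the qualitative ordering of priors and likelihoods, so every number below is a hypothetical value chosen to respect that ordering, and `posterior` is an illustrative helper name rather than anything from the tutorial.

```python
def posterior(prior, likelihood):
    """Return P(h | d) for every hypothesis h using Bayes' rule."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # denominator: sum over all hypotheses h_i
    return {h: p / evidence for h, p in joint.items()}

# Hypothetical numbers (assumptions, not from the slides):
# the prior favors cold and stomach flu over lung cancer,
# the likelihood of coughing favors cold and lung cancer over stomach flu.
prior = {"cold": 0.50, "lung cancer": 0.01, "stomach flu": 0.49}
likelihood = {"cold": 0.80, "lung cancer": 0.80, "stomach flu": 0.10}

post = posterior(prior, likelihood)
for h, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print(f"{h}: {p:.3f}")
```

Under these assumed values the cold hypothesis ends up with most of the posterior mass, because it is the only hypothesis supported by both the prior and the likelihood.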
![Page 24: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/24.jpg)
Plan for this lecture
• Some basic aspects of Bayesian statistics
– Comparing two hypotheses
– Model fitting
– Model selection
• Two (very brief) case studies in modeling human inductive learning
– Causal learning
– Concept learning
![Page 25: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/25.jpg)
Coin flipping
• Comparing two hypotheses
– data = HHTHT or HHHHH
– compare two simple hypotheses: P(H) = 0.5 vs. P(H) = 1.0
• Parameter estimation (Model fitting)
– compare many hypotheses in a parameterized family P(H) = θ : Infer θ
• Model selection
– compare qualitatively different hypotheses, often varying in complexity: P(H) = 0.5 vs. P(H) = θ
![Page 26: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/26.jpg)
Coin flipping
HHTHT
HHHHH
What process produced these sequences?
![Page 27: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/27.jpg)
Comparing two hypotheses
• Contrast simple hypotheses:
– h1: “fair coin”, P(H) = 0.5
– h2: “always heads”, P(H) = 1.0
• Bayes’ rule:

$$P(h \mid d) = \frac{P(h)\, P(d \mid h)}{\sum_{h_i} P(h_i)\, P(d \mid h_i)}$$

• With two hypotheses, use odds form
![Page 28: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/28.jpg)
Comparing two hypotheses
D: HHTHT
H1, H2: “fair coin”, “always heads”
P(D|H1) = 1/2^5, P(H1) = ?
P(D|H2) = 0, P(H2) = 1 − ?

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$
![Page 29: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/29.jpg)
Comparing two hypotheses
D: HHTHT
H1, H2: “fair coin”, “always heads”
P(D|H1) = 1/2^5, P(H1) = 999/1000
P(D|H2) = 0, P(H2) = 1/1000

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{1/32}{0} \times \frac{999}{1} = \infty$$
![Page 30: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/30.jpg)
Comparing two hypotheses
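D: HHHHH
H1, H2: “fair coin”, “always heads”
P(D|H1) = 1/2^5, P(H1) = 999/1000
P(D|H2) = 1, P(H2) = 1/1000

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{1/32}{1} \times \frac{999}{1} \approx 30$$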
![Page 31: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/31.jpg)
Comparing two hypotheses
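D: HHHHHHHHHH
H1, H2: “fair coin”, “always heads”
P(D|H1) = 1/2^10, P(H1) = 999/1000
P(D|H2) = 1, P(H2) = 1/1000

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{1/1024}{1} \times \frac{999}{1} \approx 1$$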
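The odds-form calculations for the three sequences can be reproduced with exact fractions. This is a minimal sketch: the function names are illustrative, and the default prior P(H1) = 999/1000 is the value used on the slides.

```python
from fractions import Fraction

def likelihood(seq, p_heads):
    """P(seq | coin with P(H) = p_heads), treating flips as independent."""
    result = Fraction(1)
    for flip in seq:
        result *= p_heads if flip == "H" else 1 - p_heads
    return result

def posterior_odds(seq, prior_h1=Fraction(999, 1000)):
    """Posterior odds P(H1|D) / P(H2|D): fair coin (H1) vs. always heads (H2)."""
    like_h1 = likelihood(seq, Fraction(1, 2))
    like_h2 = likelihood(seq, Fraction(1))
    prior_odds = prior_h1 / (1 - prior_h1)
    if like_h2 == 0:
        return float("inf")  # the data rule out "always heads" entirely
    return float(like_h1 / like_h2 * prior_odds)

for seq in ["HHTHT", "HHHHH", "HHHHHHHHHH"]:
    print(seq, posterior_odds(seq))
```

The exact values for the two all-heads sequences are 999/32 and 999/1024, which the slides round to about 30 and about 1.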
![Page 32: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/32.jpg)
Measuring prior knowledge
1. The fact that HHHHH looks like a “mere coincidence”, without making us suspicious that the coin is unfair, while HHHHHHHHHH does begin to make us suspicious, measures the strength of our prior belief that the coin is fair.
– If θ is the threshold for suspicion in the posterior odds, and D* is the shortest suspicious sequence, the prior odds for a fair coin are roughly θ/P(D*|“fair coin”).
– If θ ~ 1 and D* is between 10 and 20 heads, the prior odds for a fair coin are roughly between 1,000 and 1,000,000 to 1 (that is, prior odds for a trick coin of roughly 1/1,000 to 1/1,000,000).
2. The fact that HHTHT looks representative of a fair coin, and HHHHH does not, reflects our prior knowledge about possible causal mechanisms in the world.
– Easy to imagine how a trick all-heads coin could work: low (but not negligible) prior probability.
– Hard to imagine how a trick “HHTHT” coin could work: extremely low (negligible) prior probability.
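The first point can be turned into a small calculation. The sketch below assumes the alternative hypothesis is an always-heads coin, so that P(D* | trick) = 1, and solves the odds form of Bayes' rule for the prior odds in favor of the fair coin; `implied_prior_odds` is a hypothetical helper name.

```python
def implied_prior_odds(n_heads, threshold=1.0):
    """Prior odds (fair : trick) implied if suspicion begins at n_heads heads in a row.

    posterior odds = P(D|fair) / P(D|trick) * prior odds, with P(D|trick) = 1,
    so prior odds = threshold / P(D|fair).
    """
    p_data_given_fair = 0.5 ** n_heads
    return threshold / p_data_given_fair

for n in (10, 20):
    print(f"suspicious after {n} heads -> prior odds of about {implied_prior_odds(n):,.0f} to 1")
```

With a threshold of 1, runs of 10 and 20 heads give odds of 2^10 and 2^20 to 1, which is where the rough range of one thousand to one million comes from.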
![Page 33: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/33.jpg)
Coin flipping
• Basic Bayes
– data = HHTHT or HHHHH
– compare two hypotheses: P(H) = 0.5 vs. P(H) = 1.0
• Parameter estimation (Model fitting)
– compare many hypotheses in a parameterized family P(H) = θ : Infer θ
• Model selection
– compare qualitatively different hypotheses, often varying in complexity: P(H) = 0.5 vs. P(H) = θ
![Page 34: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/34.jpg)
Parameter estimation
• Assume data are generated from a parameterized model:
[Graphical model: parameter θ, with P(H) = θ, generates observations d1, d2, d3, d4]
• What is the value of θ ?
– each value of θ is a hypothesis H
– requires inference over infinitely many hypotheses
![Page 35: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/35.jpg)
Model selection
• Assume hypothesis space of possible models:
– Fair coin: P(H) = 0.5, generating d1, d2, d3, d4
– P(H) = θ: parameter θ generating d1, d2, d3, d4
– Hidden Markov model: hidden states s1, s2, s3, s4 with s_i ∈ {Fair coin, Trick coin}, generating d1, d2, d3, d4
• Which model generated the data?
– requires summing out hidden variables
– requires some form of Occam’s razor to trade off complexity with fit to the data.
![Page 36: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/36.jpg)
Parameter estimation vs. Model selection across learning and development
• Causality: learning the strength of a relation vs. learning the existence and form of a relation
• Language acquisition: learning a speaker's accent, or frequencies of different words vs. learning a new tense or syntactic rule (or learning a new language, or the existence of different languages)
• Concepts: learning what horses look like vs. learning that there is a new species (or learning that there are species)
• Intuitive physics: learning the mass of an object vs. learning about gravity or angular momentum
![Page 37: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/37.jpg)
A hierarchical learning framework
[Diagram: model M → parameter setting w → data D]

Parameter estimation:

$$p(w \mid D, M) \propto p(D \mid w, M)\, p(w \mid M)$$
![Page 38: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/38.jpg)
A hierarchical learning framework
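[Diagram: model class C → model M → parameter setting w → data D]

Model selection:

$$p(M \mid D, C) \propto p(D \mid M)\, p(M \mid C)$$

$$p(D \mid M) = \sum_{w} p(D \mid w, M)\, p(w \mid M)$$

(the sum over parameter settings w becomes an integral when w is continuous)

Parameter estimation:

$$p(w \mid D, M) \propto p(D \mid w, M)\, p(w \mid M)$$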
![Page 39: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/39.jpg)
Bayesian parameter estimation
• Assume data are generated from a model:
[Graphical model: parameter θ, with P(H) = θ, generates observations d1, d2, d3, d4]
• What is the value of θ ?
– each value of θ is a hypothesis H
– requires inference over infinitely many hypotheses
![Page 40: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/40.jpg)
Some intuitions
• D = 10 flips, with 5 heads and 5 tails.
• θ = P(H) on next flip? 50%
• Why? 50% = 5 / (5+5) = 5/10.
• Why? “The future will be like the past”

• Suppose we had seen 4 heads and 6 tails.
• P(H) on next flip? Closer to 50% than to 40%.
• Why? Prior knowledge.
![Page 41: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/41.jpg)
Integrating prior knowledge and data
• Posterior distribution P(θ | D) is a probability density over θ = P(H)
• Need to specify likelihood P(D | θ ) and prior distribution P(θ ).

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta')\, p(\theta')\, d\theta'}$$
![Page 42: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/42.jpg)
Likelihood and prior
• Likelihood: Bernoulli distribution

$$P(D \mid \theta) = \theta^{N_H} (1 - \theta)^{N_T}$$

– NH: number of heads
– NT: number of tails
• Prior: P(θ ) ∝ ?
![Page 43: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/43.jpg)
Some intuitions
• D = 10 flips, with 5 heads and 5 tails.
• θ = P(H) on next flip? 50%
• Why? 50% = 5 / (5+5) = 5/10.
• Why? Maximum likelihood:

$$\hat{\theta} = \arg\max_{\theta} P(D \mid \theta)$$

• Suppose we had seen 4 heads and 6 tails.
• P(H) on next flip? Closer to 50% than to 40%.
• Why? Prior knowledge.
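The maximum-likelihood answer can be checked by brute force. The sketch below evaluates the Bernoulli log-likelihood on an evenly spaced grid (the grid resolution and function names are illustrative choices) and picks the best θ, which should agree with the closed form NH / (NH + NT).

```python
import math

def log_likelihood(theta, n_heads, n_tails):
    """log P(D | theta) for a Bernoulli sequence with the given counts."""
    return n_heads * math.log(theta) + n_tails * math.log(1 - theta)

def mle_grid(n_heads, n_tails, steps=1000):
    """Approximate argmax over theta of P(D | theta) on a grid inside (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: log_likelihood(t, n_heads, n_tails))

print(mle_grid(5, 5))  # compare with the closed form 5 / (5 + 5)
print(mle_grid(4, 6))  # compare with the closed form 4 / (4 + 6)
```

Maximum likelihood alone gives 40% after 4 heads and 6 tails; the intuition that the answer should sit closer to 50% is what the prior has to supply.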
![Page 44: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/44.jpg)
A simple method of specifying priors
• Imagine some fictitious trials, reflecting a set of previous experiences
– strategy often used with neural networks or building invariance into machine vision.
• e.g., F = {1000 heads, 1000 tails} ~ strong expectation that any new coin will be fair
• In fact, this is a sensible statistical idea...
![Page 45: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/45.jpg)
Likelihood and prior
• Likelihood: Bernoulli(θ ) distribution

$$P(D \mid \theta) = \theta^{N_H} (1 - \theta)^{N_T}$$

– NH: number of heads
– NT: number of tails
• Prior: Beta(FH,FT) distribution

$$P(\theta) \propto \theta^{F_H - 1} (1 - \theta)^{F_T - 1}$$

– FH: fictitious observations of heads
– FT: fictitious observations of tails
![Page 46: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/46.jpg)
Shape of the Beta prior
![Page 47: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/47.jpg)
Bayesian parameter estimation

$$P(\theta \mid D) \propto P(D \mid \theta)\, P(\theta) \propto \theta^{N_H + F_H - 1} (1 - \theta)^{N_T + F_T - 1}$$

• Posterior is Beta(NH+FH,NT+FT)
– same form as prior!
![Page 48: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/48.jpg)
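Bayesian parameter estimation
[Graphical model: FH, FT → θ → observations d1, d2, d3, d4 and the next flip H]
D = NH, NT

$$P(\theta \mid D) \propto P(D \mid \theta)\, P(\theta) \propto \theta^{N_H + F_H - 1} (1 - \theta)^{N_T + F_T - 1}$$

• Posterior predictive distribution (“hypothesis averaging”):

$$P(H \mid D, F_H, F_T) = \int_0^1 P(H \mid \theta)\, P(\theta \mid D, F_H, F_T)\, d\theta$$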
![Page 49: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/49.jpg)
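Bayesian parameter estimation
[Graphical model: FH, FT → θ → observations d1, d2, d3, d4 and the next flip H]
D = NH, NT

$$P(\theta \mid D) \propto P(D \mid \theta)\, P(\theta) \propto \theta^{N_H + F_H - 1} (1 - \theta)^{N_T + F_T - 1}$$

• Posterior predictive distribution:

$$P(H \mid D, F_H, F_T) = \frac{N_H + F_H}{N_H + F_H + N_T + F_T}$$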
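The closed-form predictive probability is the mean of the Beta(NH+FH, NT+FT) posterior, and it can be checked against the hypothesis-averaging integral directly. The sketch below uses a simple midpoint rule; the step count and all function names are illustrative assumptions.

```python
import math

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))

def predictive_closed_form(nh, nt, fh, ft):
    """(NH + FH) / (NH + FH + NT + FT)."""
    return (nh + fh) / (nh + fh + nt + ft)

def predictive_by_averaging(nh, nt, fh, ft, steps=20000):
    """Midpoint-rule estimate of the integral of P(H|theta) * P(theta|D) over (0, 1)."""
    a, b = nh + fh, nt + ft  # posterior is Beta(NH + FH, NT + FT)
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * width
        total += theta * beta_pdf(theta, a, b) * width  # P(H | theta) = theta
    return total

print(predictive_closed_form(4, 6, 3, 3))
print(predictive_by_averaging(4, 6, 3, 3))
```

The two numbers should agree to several decimal places, which illustrates that averaging over every value of θ collapses to a simple ratio of real plus fictitious counts.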
![Page 50: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/50.jpg)
Conjugate priors
• A prior p(θ ) is conjugate to a likelihood function p(D | θ ) if the posterior has the same functional form as the prior.
– Parameter values in the prior can be thought of as a summary of “fictitious observations”.
– Different parameter values in the prior and posterior reflect the impact of observed data.
– Conjugate priors exist for many standard models (e.g., all exponential family models)
![Page 51: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/51.jpg)
Some examples
• e.g., F = {1000 heads, 1000 tails} ~ strong expectation that any new coin will be fair
• After seeing 4 heads, 6 tails, P(H) on next flip = 1004 / (1004+1006) = 49.95%
• e.g., F = {3 heads, 3 tails} ~ weak expectation that any new coin will be fair
• After seeing 4 heads, 6 tails, P(H) on next flip = 7 / (7+9) = 43.75%
– Prior knowledge too weak
![Page 52: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/52.jpg)
But… flipping thumbtacks
• e.g., F = {4 heads, 3 tails} ~ weak expectation that tacks are slightly biased towards heads
• After seeing 2 heads, 0 tails, P(H) on next flip = 6 / (6+3) = 67%
• Some prior knowledge is always necessary to avoid jumping to hasty conclusions...
• Suppose F = { }: After seeing 1 heads, 0 tails, P(H) on next flip = 1 / (1+0) = 100%
![Page 53: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/53.jpg)
Origin of prior knowledge
• Tempting answer: prior experience
• Suppose you have previously seen 2000 coin flips: 1000 heads, 1000 tails
![Page 54: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/54.jpg)
Problems with simple empiricism
• Haven’t really seen 2000 coin flips, or any flips of a thumbtack
– Prior knowledge is stronger than raw experience justifies
• Haven’t seen exactly equal number of heads and tails
– Prior knowledge is smoother than raw experience justifies
• Should be a difference between observing 2000 flips of a single coin versus observing 10 flips each for 200 coins, or 1 flip each for 2000 coins
– Prior knowledge is more structured than raw experience
![Page 55: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/55.jpg)
A simple theory
• “Coins are manufactured by a standardized procedure that is effective but not perfect, and symmetric with respect to heads and tails. Tacks are asymmetric, and manufactured to less exacting standards.”
– Justifies generalizing from previous coins to the present coin.
– Justifies smoother and stronger prior than raw experience alone.
– Explains why seeing 10 flips each for 200 coins is more valuable than seeing 2000 flips of one coin.
![Page 56: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/56.jpg)
A hierarchical Bayesian model
[Graphical model: physical knowledge → FH, FT → θ1, θ2, ..., θ200 for Coin 1, Coin 2, ..., Coin 200, with each θ ~ Beta(FH,FT) generating that coin's flips d1, d2, d3, d4]
• Qualitative physical knowledge (symmetry) can influence estimates of continuous parameters (FH, FT).
• Explains why 10 flips of 200 coins are better than 2000 flips of a single coin: more informative about FH, FT.
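One way to see why many coins are more informative about FH, FT is to compare how sharply each data set discriminates between candidate prior strengths. The sketch below makes several assumptions that are not in the slides: a symmetric prior FH = FT = f, a small grid of candidate f values, and two made-up data sets with identical totals (2000 flips, 1000 heads). It integrates θ out analytically for each coin and measures the spread of the log evidence across the grid.

```python
import math

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(heads, flips, f):
    """log P(one coin's flip sequence | Beta(f, f) prior), with theta integrated out."""
    return log_beta(f + heads, f + flips - heads) - log_beta(f, f)

F_GRID = [1, 10, 100, 1000]  # hypothetical candidate strengths FH = FT = f

# Hypothetical data sets with the same totals: 2000 flips, 1000 heads.
one_coin = [(1000, 2000)]
pattern = [5, 4, 6, 5, 3, 7, 5, 6, 4, 5]      # heads out of 10 flips per coin
many_coins = [(h, 10) for h in pattern * 20]  # 200 coins, 10 flips each

def log_evidence(data, f):
    """log P(all sequences | f), with an independent theta for every coin."""
    return sum(log_marginal(h, n, f) for h, n in data)

def spread(data):
    """Range of the log evidence over F_GRID: larger means f is better constrained."""
    values = [log_evidence(data, f) for f in F_GRID]
    return max(values) - min(values)

print("one coin, 2000 flips:", round(spread(one_coin), 2))
print("200 coins, 10 flips :", round(spread(many_coins), 2))
```

With these made-up data the evidence varies far more across f for the 200-coin data set, because each coin contributes a separate sample of θ, whereas a single coin reveals only one value of θ however many times it is flipped.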
![Page 57: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/57.jpg)
Summary: Bayesian parameter estimation
• Learning the parameters of a generative model as Bayesian inference.
• Prediction by Bayesian hypothesis averaging.
• Conjugate priors
– an elegant way to represent simple kinds of prior knowledge.
• Hierarchical Bayesian models
– integrate knowledge across instances of a system, or different systems within a domain, to explain the origins of priors.
![Page 58: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/58.jpg)
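A hierarchical learning framework
[Diagram: model class C → model M → parameter setting w → data D]
Model selection: $p(M \mid D, C) \propto p(D \mid M)\, p(M \mid C)$
Model fitting: $p(w \mid D, M) \propto p(D \mid w, M)\, p(w \mid M)$
where $p(D \mid M) = \sum_{w} p(D \mid w, M)\, p(w \mid M)$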
![Page 59: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/59.jpg)
Stability versus Flexibility
• Can all domain knowledge be represented with conjugate priors?
• Suppose you flip a coin 25 times and get all heads. Something funny is going on …
• But with F = {1000 heads, 1000 tails}, P(heads) on next flip = 1025 / (1025+1000) = 50.6%. Looks like nothing unusual.
• How do we balance stability and flexibility?
– Stability: 6 heads, 4 tails → θ ~ 0.5
– Flexibility: 25 heads, 0 tails → θ ~ 1
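The arithmetic on this slide can be checked with a few lines. This is a minimal sketch in which the prior F is treated as pseudo-counts added to the observed counts; the function name `predict_heads` and the weak Beta(1,1) prior used for contrast are assumptions, not part of the slide.

```python
def predict_heads(n_heads, n_tails, f_h, f_t):
    """P(heads on next flip) when prior pseudo-counts (f_h, f_t) are added to the data."""
    return (n_heads + f_h) / (n_heads + n_tails + f_h + f_t)

strong = predict_heads(25, 0, 1000, 1000)  # F = {1000 heads, 1000 tails}
weak = predict_heads(25, 0, 1, 1)          # hypothetical weak prior, for contrast
print(round(strong, 3))  # 0.506
print(round(weak, 3))    # 0.963
```

A single conjugate prior is either stable (strong F barely moves after 25 heads) or flexible (weak F moves a lot), which is the tension the slide raises.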
![Page 60: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/60.jpg)
Bayesian model selection
• Which provides a better account of the data: the simple hypothesis of a fair coin, or the complex hypothesis that P(H) = θ?
[Diagram: fair coin, P(H) = 0.5, generating d1 d2 d3 d4, vs. P(H) = θ, with θ generating d1 d2 d3 d4]
![Page 61: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/61.jpg)
Comparing simple and complex hypotheses
• P(H) = θ is more complex than P(H) = 0.5 in two ways:
– P(H) = 0.5 is a special case of P(H) = θ
– for any observed sequence D, we can choose θ such that D is more probable than if P(H) = 0.5
![Page 62: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/62.jpg)
Comparing simple and complex hypotheses
$P(D \mid \theta) = \theta^{n} (1 - \theta)^{N - n}$
[Plot: probability of D = HHHHH under θ = 0.5]
![Page 63: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/63.jpg)
Comparing simple and complex hypotheses
$P(D \mid \theta) = \theta^{n} (1 - \theta)^{N - n}$
[Plot: probability of D = HHHHH under θ = 0.5 and θ = 1.0]
![Page 64: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/64.jpg)
Comparing simple and complex hypotheses
$P(D \mid \theta) = \theta^{n} (1 - \theta)^{N - n}$
[Plot: probability of D = HHTHT under θ = 0.5 and θ = 0.6]
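The comparisons in these plots can be reproduced directly from the likelihood formula. A minimal sketch, with function names chosen for illustration; `best_theta` simply uses the observed proportion of heads.

```python
def likelihood(seq, theta):
    """P(D | theta) = theta^n * (1 - theta)^(N - n) for a string of 'H' and 'T'."""
    n = seq.count("H")
    return theta ** n * (1 - theta) ** (len(seq) - n)

def best_theta(seq):
    """The value of theta that maximizes the likelihood: the fraction of heads."""
    return seq.count("H") / len(seq)

for seq in ("HHHHH", "HHTHT"):
    # Fair-coin likelihood next to the likelihood at the best-fitting theta.
    print(seq, likelihood(seq, 0.5), likelihood(seq, best_theta(seq)))
```

For both sequences the best-fitting θ does at least as well as θ = 0.5, which is why likelihood at the best parameter alone cannot be used to compare the two hypotheses.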
![Page 65: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/65.jpg)
Comparing simple and complex hypotheses
• P(H) = θ is more complex than P(H) = 0.5 in two ways:
– P(H) = 0.5 is a special case of P(H) = θ
– for any observed sequence X, we can choose θ such that X is more probable than if P(H) = 0.5
• How can we deal with this?
– Some version of Occam’s razor?
– Bayes: automatic version of Occam’s razor follows from the “law of conservation of belief”.
![Page 66: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/66.jpg)
Comparing simple and complex hypotheses
$\frac{P(h_1 \mid D)}{P(h_0 \mid D)} = \frac{P(D \mid h_1)}{P(D \mid h_0)} \times \frac{P(h_1)}{P(h_0)}$
$P(D \mid h_0) = (1/2)^{n} (1 - 1/2)^{N - n} = 1/2^{N}$
$P(D \mid h_1) = \int_0^1 P(D \mid \theta, h_1)\, p(\theta \mid h_1)\, d\theta$
The “evidence” or “marginal likelihood”: The probability that randomly selected parameters from the prior would generate the data.
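As a worked instance of the marginal likelihood, assume a uniform prior on θ under h1, that is $p(\theta \mid h_1) = 1$ on [0, 1]. The slide does not fix a prior at this point, so this choice is an assumption made for illustration. The integral then has a closed form:

```latex
P(D \mid h_1) = \int_0^1 \theta^{n} (1 - \theta)^{N - n} \, d\theta
              = \frac{n! \, (N - n)!}{(N + 1)!}

% D = HHHHH (n = 5, N = 5):
P(D \mid h_1) = \frac{5! \, 0!}{6!} = \frac{1}{6}, \qquad
P(D \mid h_0) = \frac{1}{32}, \qquad
\frac{P(D \mid h_1)}{P(D \mid h_0)} = \frac{16}{3} \approx 5.3

% D = HHTHT (n = 3, N = 5):
P(D \mid h_1) = \frac{3! \, 2!}{6!} = \frac{1}{60}, \qquad
P(D \mid h_0) = \frac{1}{32}, \qquad
\frac{P(D \mid h_1)}{P(D \mid h_0)} = \frac{8}{15} \approx 0.53
```

Under this assumption the flexible hypothesis wins for five heads in a row but loses for the mixed sequence, even though a well-chosen θ fits both sequences at least as well as θ = 0.5.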
![Page 67: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/67.jpg)
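[Plot: log likelihood ratio $\log \frac{P(D \mid h_1)}{P(D \mid h_0)}$]
$P(D \mid h_0) = 1/2^{N}$
$P(D \mid h_1) = \int_0^1 P(D \mid \theta, h_1)\, p(\theta \mid h_1)\, d\theta$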
![Page 68: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/68.jpg)
Stability versus Flexibility revisited
• Model class hypothesis: is this coin fair or unfair?
• Example probabilities:
– P(fair) = 0.999
– P(θ | fair) is Beta(1000,1000)
– P(θ | unfair) is Beta(1,1)
• 25 heads in a row propagates up, affecting θ and then P(fair | D)
[Diagram: fair/unfair? → FH, FT → θ → d1 d2 d3 d4]
$\frac{P(\text{fair} \mid 25 \text{ heads})}{P(\text{unfair} \mid 25 \text{ heads})} = \frac{P(25 \text{ heads} \mid \text{fair})}{P(25 \text{ heads} \mid \text{unfair})} \times \frac{P(\text{fair})}{P(\text{unfair})} \approx 0.001$
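The posterior odds of about 0.001 can be reproduced from the example probabilities on this slide. This is a sketch that integrates θ out analytically using the Beta function; the helper names are illustrative, and log-gamma is used only to avoid overflow with the large Beta(1000,1000) parameters.

```python
from math import lgamma, exp, log

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(n_heads, n_tails, a, b):
    """log P(sequence | theta ~ Beta(a, b)), with theta integrated out."""
    return log_beta(a + n_heads, b + n_tails) - log_beta(a, b)

p_fair = 0.999
log_odds = (log_marginal(25, 0, 1000, 1000)   # P(25 heads | fair)
            - log_marginal(25, 0, 1, 1)       # P(25 heads | unfair)
            + log(p_fair) - log(1 - p_fair))  # prior odds
posterior_odds = exp(log_odds)                # P(fair | D) / P(unfair | D)
p_fair_given_d = posterior_odds / (1 + posterior_odds)
print(posterior_odds, p_fair_given_d)
```

Even with a prior of 0.999 on "fair", 25 heads in a row shifts nearly all belief to "unfair", while a handful of mixed flips would leave the fair hypothesis dominant.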
![Page 69: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/69.jpg)
Bayesian Occam’s Razor
[Plot: p(D = d | M) across all possible data sets d, for models M1 and M2]
For any model M, $\sum_{\text{all } d \in D} p(D = d \mid M) = 1$
Law of “conservation of belief”: A model that can predict many possible data sets must assign each of them low probability.
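Conservation of belief can be checked by brute force for the coin example. The sketch below assumes two models that the slide does not name explicitly: a fair coin, and a coin with θ drawn from a uniform prior, whose marginal likelihood for a sequence with n heads in N flips is n!(N-n)!/(N+1)!.

```python
from itertools import product
from math import factorial

def p_fair(seq):
    """P(sequence | fair coin)."""
    return 0.5 ** len(seq)

def p_flexible(seq):
    """P(sequence | theta ~ Uniform(0, 1)), with theta integrated out."""
    n, N = seq.count("H"), len(seq)
    return factorial(n) * factorial(N - n) / factorial(N + 1)

N = 5
all_seqs = ["".join(s) for s in product("HT", repeat=N)]
total_fair = sum(p_fair(s) for s in all_seqs)
total_flexible = sum(p_flexible(s) for s in all_seqs)
print(total_fair, total_flexible)                  # both totals equal 1 up to rounding
print(p_fair("HHHHH"), p_flexible("HHHHH"))        # flexible model wins on extreme data
print(p_fair("HHTHT"), p_flexible("HHTHT"))        # fair coin wins on typical data
```

Both models spend exactly one unit of probability over the 32 sequences; the flexible model puts more of it on extreme sequences and therefore has less left for balanced ones.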
![Page 70: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/70.jpg)
Occam’s Razor in curve fitting
![Page 71: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/71.jpg)
![Page 72: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/72.jpg)
[Plot: p(D = d | M) over data sets D for models M1, M2, and M3, with the observed data marked]
$\sum_{\text{all } d \in D} p(D = d \mid M) = 1$
M1: A model that is too simple is unlikely to generate the data.
M3: A model that is too complex can generate many possible data sets, so it is unlikely to generate this particular data set at random.
![Page 73: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/73.jpg)
Summary so far
• Three kinds of Bayesian inference
– Comparing two simple hypotheses
– Parameter estimation: the importance and subtlety of prior knowledge
– Model selection: Bayesian Occam’s razor, the blessing of abstraction
• Key concepts
– Probabilistic generative models
– Hierarchies of abstraction, with statistical inference at all levels
– Flexibly structured representations
![Page 74: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/74.jpg)
Plan for this lecture
• Some basic aspects of Bayesian statistics
– Comparing two hypotheses
– Model fitting
– Model selection
• Two (very brief) case studies in modeling human inductive learning
– Causal learning
– Concept learning
![Page 75: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/75.jpg)
Learning causation from correlation
“Does C cause E?” (rate on a scale from 0 to 100)
[Table: 2×2 contingency table crossing C present (c+) / C absent (c-) with E present (e+) / E absent (e-); the four cell counts are labeled a, b, c, d]
![Page 76: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/76.jpg)
Learning with graphical models
• Strength: how strong is the relationship?
[Diagram: graph with B → E (weight w0) and C → E (weight w1); Delta-P, Power PC, …]
• Structure: does a relationship exist?
[Diagram: h1 (B → E and C → E) vs. h0 (B → E only)]
![Page 77: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/77.jpg)
Bayesian learning of causal structure
• Hypotheses: h1 (B → E and C → E) vs. h0 (B → E only)
• Bayesian causal inference:
$P(d \mid h_1) = \int_0^1 \int_0^1 P(d \mid w_0, w_1)\, p(w_0, w_1 \mid h_1)\, dw_0\, dw_1$
$P(d \mid h_0) = \int_0^1 P(d \mid w_0)\, p(w_0 \mid h_0)\, dw_0$
support = $\log \frac{P(d \mid h_1)}{P(d \mid h_0)}$: the likelihood ratio (Bayes factor) gives evidence in favor of h1
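The two integrals can be approximated numerically once a functional form and priors are chosen. The sketch below assumes a noisy-OR combination of the background B and the cause C, and uniform priors on w0 and w1; neither assumption is stated on this slide, and the function names and example counts are hypothetical.

```python
from math import log

def likelihood(data, p_e_c_absent, p_e_c_present):
    """P(d | parameters) for counts d = (e+ with c+, e- with c+, e+ with c-, e- with c-)."""
    e_cp, ne_cp, e_ca, ne_ca = data
    return (p_e_c_present ** e_cp * (1 - p_e_c_present) ** ne_cp
            * p_e_c_absent ** e_ca * (1 - p_e_c_absent) ** ne_ca)

def causal_support(data, grid=200):
    """log P(d | h1) / P(d | h0), using midpoint-rule integration on a grid."""
    ws = [(i + 0.5) / grid for i in range(grid)]
    # h0: C has no effect, so P(e+) = w0 whether or not C is present.
    p_h0 = sum(likelihood(data, w0, w0) for w0 in ws) / grid
    # h1 (assumed noisy-OR): P(e+ | c+) = 1 - (1 - w0)(1 - w1), P(e+ | c-) = w0.
    p_h1 = sum(likelihood(data, w0, 1 - (1 - w0) * (1 - w1))
               for w0 in ws for w1 in ws) / grid ** 2
    return log(p_h1 / p_h0)

strong = causal_support((8, 0, 0, 8))  # E always follows C, never occurs without it
flat = causal_support((4, 4, 4, 4))    # E equally likely with or without C
print(strong, flat)
```

With these made-up counts the support is positive when E tracks C and negative when it does not, so the extra parameter in h1 is only credited when the data call for it.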
![Page 78: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/78.jpg)
Bayesian Occam’s Razor
[Plot: P(d | h) across all data sets d, for h0 (no relationship) and h1 (positive relationship); regions marked P(e+|c+) ~ P(e+|c-) and P(e+|c+) >> P(e+|c-)]
For any model h, $\sum_{d} P(d \mid h) = 1$
![Page 79: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/79.jpg)
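Comparison with human judgments
[Figure: judgments for contingencies with ΔP = 0, 0.25, 0.5, 0.75, and 1, in four panels: People, ΔP, Power PC, and Bayesian structure learning. Side diagrams: “Assume structure: Estimate strength w1” (graph with B → E, weight w0, and C → E, weight w1), and a comparison of a graph with weights w0 and w1 vs. a graph with w0 only.]
(Buehner & Cheng, 1997; 2003)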
![Page 80: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/80.jpg)
Inferences about causal structure depend on the functional form of causal relations
![Page 81: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/81.jpg)
Concept learning: the number game
• Program input: number between 1 and 100
• Program output: “yes” or “no”
• Learning task:
– Observe one or more positive (“yes”) examples.
– Judge whether other numbers are “yes” or “no”.
![Page 82: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/82.jpg)
Concept learning: the number game
[Figure: examples of “yes” numbers paired with generalization judgments (N = 20)]
• 60: Diffuse similarity
• 60 80 10 30: Rule: “multiples of 10”
• 60 52 57 55: Focused similarity: numbers near 50-60
![Page 83: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/83.jpg)
Bayesian model
• H: Hypothesis space of possible concepts:
– H1: Mathematical properties: multiples and powers of small numbers.
– H2: Magnitude: intervals with endpoints between 1 and 100.
• X = {x1, . . . , xn}: n examples of a concept C.
• Evaluate hypotheses given data:
– p(h) [prior]: domain knowledge, pre-existing biases
– p(X|h) [likelihood]: statistical information in examples.
– p(h|X) [posterior]: degree of belief that h is the true extension of C.
$p(h \mid X) = \frac{p(X \mid h)\, p(h)}{\sum_{h' \in H} p(X \mid h')\, p(h')}$
![Page 84: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/84.jpg)
Generalizing to new objects
Given p(h|X), how do we compute $p(y \in C \mid X)$, the probability that C applies to some new stimulus y?
[Diagram: background knowledge → h → X = x1 x2 x3 x4, and h → “y ∈ C?”]
$p(y \in C \mid X) = \sum_{h \in H} p(y \in C \mid h)\, p(h \mid X)$
![Page 85: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/85.jpg)
Likelihood: p(X|h)
• Size principle: Smaller hypotheses receive greater likelihood, and exponentially more so as n increases.
$p(X \mid h) = \left[ \frac{1}{\text{size}(h)} \right]^{n}$ if $x_1, \ldots, x_n \in h$
$p(X \mid h) = 0$ if any $x_i \notin h$
• Follows from assumption of randomly sampled examples + law of “conservation of belief”:
$\sum_{\text{all } d \in D} p(D = d \mid M) = 1$
• Captures the intuition of a “representative” sample.
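The size principle is short enough to write out in full. In this sketch hypotheses are represented as Python sets of numbers between 1 and 100; the function name and the choice of the two hypotheses compared are illustrative.

```python
def size_principle_likelihood(examples, hypothesis):
    """p(X | h) = (1 / size(h))^n if every example is in h, else 0."""
    if not all(x in hypothesis for x in examples):
        return 0.0
    return (1.0 / len(hypothesis)) ** len(examples)

multiples_of_10 = set(range(10, 101, 10))  # size 10
even_numbers = set(range(2, 101, 2))       # size 50

for n in (1, 2, 4):
    X = [60, 80, 10, 30][:n]
    ratio = (size_principle_likelihood(X, multiples_of_10)
             / size_principle_likelihood(X, even_numbers))
    print(n, ratio)  # the advantage of the smaller hypothesis grows as 5**n
```

Each additional consistent example multiplies the advantage of the smaller hypothesis by the ratio of the sizes, which is the sense in which the preference grows exponentially with n.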
![Page 86: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/86.jpg)
Illustrating the size principle
2 4 6 8 10
12 14 16 18 20
22 24 26 28 30
32 34 36 38 40
42 44 46 48 50
52 54 56 58 60
62 64 66 68 70
72 74 76 78 80
82 84 86 88 90
92 94 96 98 100
[Grid regions labeled h1 and h2]
![Page 87: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/87.jpg)
Illustrating the size principle
2 4 6 8 10
12 14 16 18 20
22 24 26 28 30
32 34 36 38 40
42 44 46 48 50
52 54 56 58 60
62 64 66 68 70
72 74 76 78 80
82 84 86 88 90
92 94 96 98 100
[Grid regions labeled h1 and h2]
Data slightly more of a coincidence under h1
![Page 88: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/88.jpg)
Illustrating the size principle
2 4 6 8 10
12 14 16 18 20
22 24 26 28 30
32 34 36 38 40
42 44 46 48 50
52 54 56 58 60
62 64 66 68 70
72 74 76 78 80
82 84 86 88 90
92 94 96 98 100
[Grid regions labeled h1 and h2]
Data much more of a coincidence under h1
![Page 89: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/89.jpg)
Prior: p(h)
• Choice of hypothesis space embodies a strong prior: effectively, p(h) ~ 0 for many logically possible but conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural hypotheses, e.g. “multiples of 10 except 50 and 70”.
e.g., X = {60 80 10 30}:
$p(X \mid \text{multiples of 10}) = (1/10)^{4} = 0.0001$
$p(X \mid \text{multiples of 10 except 50, 70}) = (1/8)^{4} = 0.00024$
![Page 90: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/90.jpg)
Posterior:
$p(h \mid X) = \frac{p(X \mid h)\, p(h)}{\sum_{h' \in H} p(X \mid h')\, p(h')}$
• X = {60, 80, 10, 30}
• Why prefer “multiples of 10” over “even numbers”? p(X|h).
• Why prefer “multiples of 10” over “multiples of 10 except 50 and 20”? p(h).
• Why does a good generalization need both high prior and high likelihood? p(h|X) ~ p(X|h) p(h)
Occam’s razor: balancing simplicity and fit to data
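The interplay of prior and likelihood for X = {60, 80, 10, 30} can be sketched with a three-hypothesis toy space. The prior weights below are hypothetical numbers chosen only to make the "except" rule conceptually unnatural; they are not values from the tutorial, and the real model uses a much larger hypothesis space.

```python
def likelihood(X, h):
    """Size-principle likelihood: (1 / size(h))^n if all examples fall in h, else 0."""
    return (1.0 / len(h)) ** len(X) if all(x in h for x in X) else 0.0

hypotheses = {
    "multiples of 10": set(range(10, 101, 10)),
    "even numbers": set(range(2, 101, 2)),
    "multiples of 10 except 50 and 20": set(range(10, 101, 10)) - {50, 20},
}
# Hypothetical prior: the "except" rule gets very little weight.
prior = {
    "multiples of 10": 0.495,
    "even numbers": 0.495,
    "multiples of 10 except 50 and 20": 0.01,
}

X = [60, 80, 10, 30]
unnormalized = {name: likelihood(X, h) * prior[name] for name, h in hypotheses.items()}
Z = sum(unnormalized.values())
posterior = {name: value / Z for name, value in unnormalized.items()}

def p_in_concept(y):
    """Generalization by hypothesis averaging: sum of p(h | X) over h containing y."""
    return sum(p for name, p in posterior.items() if y in hypotheses[name])

print(posterior)
print(p_in_concept(50), p_in_concept(52))
```

"Even numbers" loses on likelihood and the "except" rule loses on prior, so "multiples of 10" dominates and 50 is judged far more likely to be in the concept than 52.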
![Page 91: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/91.jpg)
Prior: p(h)
• Choice of hypothesis space embodies a strong prior: effectively, p(h) ~ 0 for many logically possible but conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural hypotheses, e.g. “multiples of 10 except 50 and 70”.
• p(h) encodes relative weights of alternative theories:
H: Total hypothesis space, with p(H1) = λ and p(H2) = 1-λ
• H1: Mathematical properties (24): even numbers, powers of two, multiples of three ...
– p(h) = λ / 24
• H2: Magnitude intervals (5050): 10-15, 20-32, 37-54 …
– p(h) = (1-λ) / 5050 * Gamma(s;σ)
![Page 92: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/92.jpg)
[Figure: human generalization judgments alongside Bayesian model predictions for the example sets 60; 60 80 10 30; 60 52 57 55; 16; 16 8 2 64; 16 23 19 20]
![Page 93: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/93.jpg)
Stability versus Flexibility
• Higher-level hypothesis: is this concept mathematical or magnitude-based?
• Example probabilities:
– P(math) = λ
– P(h | math) …
– P(h | magnitude) …
[Diagram: math/magnitude? → h → X = x1 x2 x3 x4]
• Just a few examples may be sufficient to infer the kind of concept, under the size-principle likelihood
– if an a priori reasonable hypothesis of one kind fits much more tightly than all reasonable hypotheses of the other kind.
• Just a few examples can give all-or-none, “rule-like” generalization or more graded, “similarity-like” generalization.
– More all-or-none when the smallest consistent hypothesis is much smaller than all other reasonable hypotheses; otherwise more graded.
![Page 94: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/94.jpg)
Conclusion: Contributions of Bayesian models
• A framework for understanding how the mind can solve fundamental problems of induction.
• Strong, principled quantitative models of human cognition.
• Tools for studying people’s implicit knowledge of the world.
• Beyond classic limiting dichotomies: “rules vs. statistics”, “nature vs. nurture”, “domain-general vs. domain-specific”.
• A unifying mathematical language for all of the cognitive sciences: AI, machine learning and statistics, psychology, neuroscience, philosophy, linguistics…. A bridge between engineering and “reverse-engineering”.
![Page 95: Bayesian models of inductive learningcocosci.princeton.edu/tom/talks/tutorial08part1.pdf · The big question How does the mind get so much out of so little? –Perceiving the world](https://reader033.vdocuments.net/reader033/viewer/2022060600/6054169d08f0214bfa1214e3/html5/thumbnails/95.jpg)
A toolkit for reverse-engineering induction
1. Bayesian inference in probabilistic generative models
2. Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas
3. Hierarchical probabilistic models, with inference at all levels of abstraction
4. Models of unbounded complexity (“nonparametric Bayes” or “infinite models”), which can grow in complexity or change form as observed data dictate.
5. Approximate methods of learning and inference, such as belief propagation, expectation-maximization (EM), Markov chain Monte Carlo (MCMC), and sequential Monte Carlo (particle filtering).