Bayesian models of inductive learning
Josh Tenenbaum & Tom Griffiths
MIT Computational Cognitive Science Group
Department of Brain and Cognitive Sciences
Computer Science and AI Lab (CSAIL)
What to expect
• What you’ll get out of this tutorial:
  – Our view of what Bayesian models have to offer cognitive science.
  – In-depth examples of basic and advanced models: how the math works & what it buys you.
  – Some comparison to other approaches.
  – Opportunities to ask questions.
• What you won’t get:
  – Detailed, hands-on how-to.
• Where you can learn more: http://bayesiancognition.com
Outline
• Morning
  – Introduction (Josh)
  – Basic case study #1: Flipping coins (Tom)
  – Basic case study #2: Rules and similarity (Josh)
• Afternoon
  – Advanced case study #1: Causal induction (Tom)
  – Advanced case study #2: Property induction (Josh)
  – Quick tour of more advanced topics (Tom)
Bayesian models in cognitive science
• Vision
• Motor control
• Memory
• Language
• Inductive learning and reasoning….
Everyday inductive leaps
• Learning concepts and words from examples

[Images: three examples, each labeled “horse”.]
Learning concepts and words

[Images: several novel objects, three of them labeled “tufa”.]

Can you pick out the tufas?
Inductive reasoning

Input:
  Cows can get Hick’s disease.          (premises)
  Gorillas can get Hick’s disease.
  -----------------------------------
  All mammals can get Hick’s disease.   (conclusion)

Task: Judge how likely the conclusion is to be true, given that the premises are true.
Inferring causal relations

Input:
          Took vitamin B23   Headache
  Day 1   yes                no
  Day 2   yes                yes
  Day 3   no                 yes
  Day 4   yes                no
  . . .   . . .              . . .

Does vitamin B23 cause headaches?

Task: Judge the probability of a causal link given several joint observations.
Everyday inductive leaps

How can we learn so much about . . .
  – Properties of natural kinds
  – Meanings of words
  – Future outcomes of a dynamic process
  – Hidden causal properties of an object
  – Causes of a person’s action (beliefs, goals)
  – Causal laws governing a domain
. . . from such limited data?
The Challenge
• How do we generalize successfully from very limited data?
  – Just one or a few examples
  – Often only positive examples
• Philosophy:
  – Induction is a “problem”, a “riddle”, a “paradox”, a “scandal”, or a “myth”.
• Machine learning and statistics:
  – Focus on generalization from many examples, both positive and negative.
Rational statistical inference (Bayes, Laplace)

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')}$$

(Posterior probability = Likelihood × Prior probability, normalized by a sum over the space of hypotheses.)
Bayesian models of inductive learning: some recent history
• Shepard (1987)
  – Analysis of one-shot stimulus generalization, to explain the universal exponential law.
• Anderson (1990)
  – Models of categorization and causal induction.
• Oaksford & Chater (1994)
  – Model of conditional reasoning (Wason selection task).
• Heit (1998)
  – Framework for category-based inductive reasoning.
Theory-Based Bayesian Models
• Rational statistical inference (Bayes):

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')}$$

• Learners’ domain theories generate their hypothesis space H and prior p(h).
  – Well-matched to the structure of the natural world.
  – Learnable from limited data.
  – Computationally tractable inference.
What is a theory?
• Working definition: an ontology and a system of abstract principles that generates a hypothesis space of candidate world structures, along with their relative probabilities.
• Analogy to grammar in language.
• Example: Newton’s laws.
Structure and statistics
• A framework for understanding how structured knowledge and statistical inference interact.
  – How structured knowledge guides statistical inference, and is itself acquired through higher-order statistical learning. → Hierarchical Bayes.
  – How simplicity trades off with fit to the data in evaluating structural hypotheses. → Bayesian Occam’s Razor.
  – How increasingly complex structures may grow as required by new data, rather than being pre-specified in advance. → Non-parametric Bayes.
Alternative approaches to inductive generalization
• Associative learning
• Connectionist networks
• Similarity to examples
• Toolkit of simple heuristics
• Constraint satisfaction
• Analogical mapping
Marr’s Three Levels of Analysis
• Computation: “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?”
• Representation and algorithm: cognitive psychology
• Implementation: neurobiology
Why Bayes?
• A framework for explaining cognition.
  – How people can learn so much from such limited data.
  – Why process-level models work the way that they do.
  – Strong quantitative models with minimal ad hoc assumptions.
• A framework for understanding how structured knowledge and statistical inference interact.
  – How structured knowledge guides statistical inference, and is itself acquired through higher-order statistical learning.
  – How simplicity trades off with fit to the data in evaluating structural hypotheses (Occam’s razor).
  – How increasingly complex structures may grow as required by new data, rather than being pre-specified in advance.
Outline
• Morning
  – Introduction (Josh)
  – Basic case study #1: Flipping coins (Tom)
  – Basic case study #2: Rules and similarity (Josh)
• Afternoon
  – Advanced case study #1: Causal induction (Tom)
  – Advanced case study #2: Property induction (Josh)
  – Quick tour of more advanced topics (Tom)
Coin flipping
Coin flipping
HHTHT
HHHHH
What process produced these sequences?
Bayes’ rule

For data D and a hypothesis H, we have:

$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$

• “Posterior probability”: P(H | D)
• “Prior probability”: P(H)
• “Likelihood”: P(D | H)
The origin of Bayes’ rule
• A simple consequence of using probability to represent degrees of belief.
• For any two random variables:

$$p(A \,\&\, B) = p(A)\, p(B \mid A)$$
$$p(A \,\&\, B) = p(B)\, p(A \mid B)$$
$$\Rightarrow\quad p(B)\, p(A \mid B) = p(A)\, p(B \mid A)$$
$$\Rightarrow\quad p(A \mid B) = \frac{p(A)\, p(B \mid A)}{p(B)}$$
Why represent degrees of belief with probabilities?
• Good statistics
  – consistency, and worst-case error bounds.
• Cox Axioms
  – necessary to cohere with common sense.
• “Dutch Book” + Survival of the Fittest
  – if your beliefs do not accord with the laws of probability, then you can always be out-gambled by someone whose beliefs do so accord.
• Provides a theory of learning
  – a common currency for combining prior knowledge and the lessons of experience.
Hypotheses in Bayesian inference
• Hypotheses H refer to processes that could have generated the data D
• Bayesian inference provides a distribution over these hypotheses, given D
• P(D|H) is the probability of D being generated by the process identified by H
• Hypotheses H are mutually exclusive: only one process could have generated D
Hypotheses in coin flipping

D = HHTHT

Describe processes by which D could be generated:
• Fair coin, P(H) = 0.5
• Coin with P(H) = p
• Markov model
• Hidden Markov model
• ...

These hypotheses are statistical models, and in particular generative models.
Representing generative models
• Graphical model notation
  – Pearl (1988), Jordan (1998)
• Variables are nodes; edges indicate dependency
• Directed edges show the causal process of data generation

[Diagrams for D = HHTHT (d1 d2 d3 d4 d5): fair coin, P(H) = 0.5, with unconnected nodes d1 ... d4; Markov model with a chain of edges d1 → d2 → d3 → d4.]
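As a concrete illustration, the two generative processes above can be written as samplers. This is a minimal sketch, not from the slides; the function names and the transition probability `p_stay` are our assumptions:

```python
import random

def fair_coin(n):
    # independent flips, each with P(H) = 0.5
    return "".join(random.choice("HT") for _ in range(n))

def markov_chain(n, p_stay=0.8):
    # each outcome depends on the previous one; p_stay is illustrative
    seq = [random.choice("HT")]
    for _ in range(n - 1):
        if random.random() < p_stay:
            seq.append(seq[-1])          # repeat the previous outcome
        else:
            seq.append("T" if seq[-1] == "H" else "H")
    return "".join(seq)

print(fair_coin(5), markov_chain(5))
```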
Models with latent structure
• Not all nodes in a graphical model need to be observed
• Some variables reflect latent structure, used in generating D but unobserved

[Diagrams for D = HHTHT: a hidden Markov model with a latent chain s1 → s2 → s3 → s4, each si emitting di; and a coin with latent parameter p, P(H) = p, where p has an edge to every di.]
Coin flipping
• Comparing two simple hypotheses
  – P(H) = 0.5 vs. P(H) = 1.0
• Comparing simple and complex hypotheses
  – P(H) = 0.5 vs. P(H) = p
• Comparing infinitely many hypotheses
  – P(H) = p
• Psychology: Representativeness
Comparing two simple hypotheses
• Contrast simple hypotheses:
  – H1: “fair coin”, P(H) = 0.5
  – H2: “always heads”, P(H) = 1.0
• Bayes’ rule:

$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$

• With two hypotheses, use the odds form.
Bayes’ rule in odds form

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$

• D: data
• H1, H2: models
• P(H1 | D): posterior probability that H1 generated the data
• P(D | H1): likelihood of the data under model H1
• P(H1): prior probability that H1 generated the data
Coin flipping
HHTHT
HHHHH
What process produced these sequences?
Comparing two simple hypotheses

D = HHTHT;  H1: “fair coin”, H2: “always heads”

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$

P(D | H1) = 1/2^5     P(H1) = 999/1000
P(D | H2) = 0         P(H2) = 1/1000

P(H1 | D) / P(H2 | D) = infinity
Comparing two simple hypotheses

D = HHHHH;  H1: “fair coin”, H2: “always heads”

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$

P(D | H1) = 1/2^5     P(H1) = 999/1000
P(D | H2) = 1         P(H2) = 1/1000

P(H1 | D) / P(H2 | D) ≈ 30
Comparing two simple hypotheses

D = HHHHHHHHHH;  H1: “fair coin”, H2: “always heads”

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$

P(D | H1) = 1/2^10    P(H1) = 999/1000
P(D | H2) = 1         P(H2) = 1/1000

P(H1 | D) / P(H2 | D) ≈ 1
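A minimal sketch of these three computations (our code, not the authors’; the function name is an assumption):

```python
def posterior_odds(n_heads, n_tails):
    """Posterior odds of H1 ("fair coin") over H2 ("always heads"),
    with the priors used above: P(H1) = 999/1000, P(H2) = 1/1000."""
    prior_odds = (999 / 1000) / (1 / 1000)
    lik_h1 = 0.5 ** (n_heads + n_tails)    # fair coin
    lik_h2 = 1.0 if n_tails == 0 else 0.0  # always heads
    if lik_h2 == 0.0:
        return float("inf")                # a single tail rules out H2
    return (lik_h1 / lik_h2) * prior_odds

print(posterior_odds(3, 2))   # HHTHT: infinity
print(posterior_odds(5, 0))   # HHHHH: ~31
print(posterior_odds(10, 0))  # ten heads: ~0.98, the prior is overwhelmed
```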
Comparing two simple hypotheses
• Bayes’ rule tells us how to combine prior beliefs with new data
  – top-down and bottom-up influences
• As a model of human inference, it:
  – predicts conclusions drawn from data
  – identifies the point at which prior beliefs are overwhelmed by new experiences
• But… more complex cases?
Coin flipping
• Comparing two simple hypotheses
  – P(H) = 0.5 vs. P(H) = 1.0
• Comparing simple and complex hypotheses
  – P(H) = 0.5 vs. P(H) = p
• Comparing infinitely many hypotheses
  – P(H) = p
• Psychology: Representativeness
Comparing simple and complex hypotheses
• Which provides a better account of the data: the simple hypothesis of a fair coin, or the complex hypothesis that P(H) = p?

[Diagrams: fair coin, P(H) = 0.5, with independent nodes d1 ... d4, vs. a model with latent parameter p connected to d1 ... d4.]
Comparing simple and complex hypotheses
• P(H) = p is more complex than P(H) = 0.5 in two ways:
  – P(H) = 0.5 is a special case of P(H) = p
  – for any observed sequence X, we can choose p such that X is more probable than if P(H) = 0.5
Comparing simple and complex hypotheses

[Plots: probability of the observed sequence under P(H) = p, as a function of p. For HHHHH the probability is maximized at p = 1.0; for HHTHT it is maximized at p = 0.6.]
Comparing simple and complex hypotheses
• P(H) = p is more complex than P(H) = 0.5 in two ways:
  – P(H) = 0.5 is a special case of P(H) = p
  – for any observed sequence X, we can choose p such that X is more probable than if P(H) = 0.5
• How can we deal with this?
  – frequentist: hypothesis testing
  – information theorist: minimum description length
  – Bayesian: just use probability theory!
Comparing simple and complex hypotheses

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$

Computing P(D | H1) is easy:

$$P(D \mid H_1) = 1/2^N$$

Compute P(D | H2) by averaging over p:

$$P(D \mid H_2) = \int_0^1 P(D \mid p)\, P(p)\, dp$$
Comparing simple and complex hypotheses

[Plot: P(D | H2) for different sequences D. The distribution is an average over all values of p.]
Comparing simple and complex hypotheses
• Simple and complex hypotheses can be compared directly using Bayes’ rule
  – requires summing over latent variables
• Complex hypotheses are penalized for their greater flexibility: “Bayesian Occam’s razor”
• This principle is used in model selection methods in psychology (e.g. Myung & Pitt, 1997)
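A minimal sketch of this comparison (our illustration, not from the slides; a uniform prior on p is assumed, for which the average over p has a closed form):

```python
from math import comb

def marginal_likelihood_h2(n_heads, n_tails):
    # With a uniform prior: ∫ p^NH (1-p)^NT dp = NH! NT! / (NH+NT+1)!
    n = n_heads + n_tails
    return 1.0 / ((n + 1) * comb(n, n_heads))

def bayes_factor(n_heads, n_tails):
    """P(D|H1) / P(D|H2): fair coin vs. coin with unknown p."""
    return 0.5 ** (n_heads + n_tails) / marginal_likelihood_h2(n_heads, n_tails)

print(bayes_factor(3, 2))  # HHTHT: ~1.9, the simpler fair-coin model wins
print(bayes_factor(5, 0))  # HHHHH: ~0.19, the flexible model wins
```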
Coin flipping
• Comparing two simple hypotheses
  – P(H) = 0.5 vs. P(H) = 1.0
• Comparing simple and complex hypotheses
  – P(H) = 0.5 vs. P(H) = p
• Comparing infinitely many hypotheses
  – P(H) = p
• Psychology: Representativeness
Comparing infinitely many hypotheses
• Assume data are generated from a model with latent parameter p, where P(H) = p:

[Diagram: node p with edges to d1 ... d4.]

• What is the value of p?
  – each value of p is a hypothesis H
  – requires inference over infinitely many hypotheses
Comparing infinitely many hypotheses
• Flip a coin 10 times and see 5 heads, 5 tails.
• P(H) on next flip? 50%.
• Why? 50% = 5 / (5+5) = 5/10.
• “Future will be like the past.”

• Suppose we had seen 4 heads and 6 tails.
• P(H) on next flip? Closer to 50% than to 40%.
• Why? Prior knowledge.
Integrating prior knowledge and data
• The posterior distribution P(p | D) is a probability density over p = P(H).
• Need to work out the likelihood P(D | p) and specify the prior distribution P(p):

$$P(p \mid D) \propto P(D \mid p)\, P(p)$$
Likelihood and prior
• Likelihood:

$$P(D \mid p) = p^{N_H} (1-p)^{N_T}$$

  – NH: number of heads
  – NT: number of tails
• Prior:

$$P(p) \propto p^{F_H - 1} (1-p)^{F_T - 1} \ ?$$
A simple method of specifying priors
• Imagine some fictitious trials, reflecting a set of previous experiences
  – a strategy often used with neural networks
• e.g., F = {1000 heads, 1000 tails} ~ strong expectation that any new coin will be fair
• In fact, this is a sensible statistical idea...
Likelihood and prior
• Likelihood:

$$P(D \mid p) = p^{N_H} (1-p)^{N_T}$$

  – NH: number of heads
  – NT: number of tails
• Prior:

$$P(p) \propto p^{F_H - 1} (1-p)^{F_T - 1} \quad [\text{Beta}(F_H, F_T)]$$

  – FH: fictitious observations of heads
  – FT: fictitious observations of tails
Conjugate priors
• Exist for many standard distributions
  – there is a formula for exponential-family conjugacy
• Define the prior in terms of fictitious observations
• Beta is conjugate to Bernoulli (coin flipping)

[Plots: Beta priors with FH = FT = 1 (flat), FH = FT = 3 (broad peak at 0.5), and FH = FT = 1000 (sharp peak at 0.5).]
Comparing infinitely many hypotheses
• Posterior is Beta(NH+FH, NT+FT)
  – same form as the conjugate prior

$$P(p \mid D) \propto P(D \mid p)\, P(p) = p^{N_H+F_H-1} (1-p)^{N_T+F_T-1}$$

• Posterior mean: (NH+FH) / (NH+FH+NT+FT)
• Posterior predictive distribution: P(H) on the next flip equals the posterior mean.
Some examples
• e.g., F = {1000 heads, 1000 tails} ~ strong expectation that any new coin will be fair
  – After seeing 4 heads, 6 tails, P(H) on next flip = 1004 / (1004+1006) = 49.95%
• e.g., F = {3 heads, 3 tails} ~ weak expectation that any new coin will be fair
  – After seeing 4 heads, 6 tails, P(H) on next flip = 7 / (7+9) = 43.75%
  – Prior knowledge too weak.
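These updates reduce to one line of arithmetic; a minimal sketch (the function name is ours, not from the slides):

```python
def predictive_p_heads(nh, nt, fh, ft):
    """Posterior predictive P(heads), given NH, NT observed flips
    and a Beta(FH, FT) prior (FH, FT fictitious observations)."""
    return (nh + fh) / (nh + fh + nt + ft)

print(predictive_p_heads(4, 6, 1000, 1000))  # 0.4995
print(predictive_p_heads(4, 6, 3, 3))        # 0.4375
print(predictive_p_heads(2, 0, 4, 3))        # 0.667 (thumbtack example below)
```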
But… flipping thumbtacks
• e.g., F = {4 heads, 3 tails} ~ weak expectation that tacks are slightly biased towards heads
  – After seeing 2 heads, 0 tails, P(H) on next flip = 6 / (6+3) = 67%
• Some prior knowledge is always necessary to avoid jumping to hasty conclusions...
  – Suppose F = { }: after seeing 2 heads, 0 tails, P(H) on next flip = 2 / (2+0) = 100%
Origin of prior knowledge
• Tempting answer: prior experience
• Suppose you have previously seen 2000 coin flips: 1000 heads, 1000 tails
• By assuming all coins (and flips) are alike, these observations of other coins are as good as observations of the present coin
Problems with simple empiricism
• Haven’t really seen 2000 coin flips, or any flips of a thumbtack
  – Prior knowledge is stronger than raw experience justifies
• Haven’t seen exactly equal numbers of heads and tails
  – Prior knowledge is smoother than raw experience justifies
• Should be a difference between observing 2000 flips of a single coin versus observing 10 flips each for 200 coins, or 1 flip each for 2000 coins
  – Prior knowledge is more structured than raw experience
A simple theory
• “Coins are manufactured by a standardized procedure that is effective but not perfect.”
  – Justifies generalizing from previous coins to the present coin.
  – Justifies a smoother and stronger prior than raw experience alone.
  – Explains why seeing 10 flips each for 200 coins is more valuable than seeing 2000 flips of one coin.
• “Tacks are asymmetric, and manufactured to less exacting standards.”
Limitations
• Can all domain knowledge be represented so simply, in terms of an equivalent number of fictitious observations?
• Suppose you flip a coin 25 times and get all heads. Something funny is going on…
• But with F = {1000 heads, 1000 tails}, P(H) on next flip = 1025 / (1025+1000) = 50.6%. Looks like nothing unusual.
Hierarchical priors
• Higher-order hypothesis: is this coin fair or unfair?
• Example probabilities:
  – P(fair) = 0.99
  – P(p | fair) is Beta(1000, 1000)
  – P(p | unfair) is Beta(1, 1)
• 25 heads in a row propagates up, affecting p and then P(fair | D):

[Diagram: fair → p → d1 ... d4]

$$\frac{P(\text{fair} \mid \text{25 heads})}{P(\text{unfair} \mid \text{25 heads})} = \frac{P(\text{25 heads} \mid \text{fair})\, P(\text{fair})}{P(\text{25 heads} \mid \text{unfair})\, P(\text{unfair})} \approx 9 \times 10^{-5}$$
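A minimal sketch of that computation (our code, not the authors’; Beta marginal likelihoods are computed in closed form via log-gamma):

```python
from math import lgamma, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(nh, nt, fh, ft):
    # ∫ p^nh (1-p)^nt Beta(p; fh, ft) dp = B(nh+fh, nt+ft) / B(fh, ft)
    return log_beta(nh + fh, nt + ft) - log_beta(fh, ft)

log_lik_ratio = log_marginal(25, 0, 1000, 1000) - log_marginal(25, 0, 1, 1)
posterior_odds_fair = exp(log_lik_ratio) * (0.99 / 0.01)
print(posterior_odds_fair)  # ~9e-5: 25 heads overwhelm the 99% prior on "fair"
```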
More hierarchical priors
• Latent structure can capture coin variability
• 10 flips from 200 coins is better than 2000 flips from a single coin: allows estimation of FH, FT

[Diagram: shared parameters FH, FT at the top; for each coin (Coin 1, Coin 2, ..., Coin 200), p ~ Beta(FH, FT), and that coin’s p generates its flips d1 ... d4.]
Yet more hierarchical priors
• Discrete beliefs (e.g. symmetry) can influence estimation of continuous properties (e.g. FH, FT)

[Diagram: physical knowledge at the top generates FH, FT; for each coin, p ~ Beta(FH, FT) generates that coin’s flips.]
Comparing infinitely many hypotheses
• Apply Bayes’ rule to obtain a posterior probability density
• Requires a prior over all hypotheses
  – computation simplified by conjugate priors
  – richer structure with hierarchical priors
• Hierarchical priors indicate how simple theories can inform statistical inferences
  – one step towards structure and statistics
Coin flipping
• Comparing two simple hypotheses
  – P(H) = 0.5 vs. P(H) = 1.0
• Comparing simple and complex hypotheses
  – P(H) = 0.5 vs. P(H) = p
• Comparing infinitely many hypotheses
  – P(H) = p
• Psychology: Representativeness
Psychology: Representativeness

Which sequence is more likely from a fair coin?
  HHTHT   ← more representative of a fair coin
  HHHHH

(Kahneman & Tversky, 1972)
What might representativeness mean?

Evidence for a random generating process:

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \times \frac{P(H_1)}{P(H_2)}$$

• H1: random process (fair coin)
• H2: alternative processes
• Representativeness corresponds to the likelihood-ratio term P(D | H1) / P(D | H2).
A constrained hypothesis space

Four hypotheses (with a sample sequence from each):
  h1: fair coin            HHTHTTTH
  h2: “always alternates”  HTHTHTHT
  h3: “mostly heads”       HHTHTHHH
  h4: “always heads”       HHHHHHHH
Representativeness judgments

[Plot: human representativeness judgments for sequences under the four hypotheses.]
Results
• Good account of representativeness data, with three pseudo-free parameters: r = 0.91
  – “always alternates” means alternates 99% of the time
  – “mostly heads” means P(H) = 0.85
  – “always heads” means P(H) = 0.99
• With a scaling parameter, r = 0.95

(Tenenbaum & Griffiths, 2001)
The role of theories

The fact that HHTHT looks representative of a fair coin and HHHHH does not reflects our implicit theories of how the world works.
  – Easy to imagine how a trick all-heads coin could work: high prior probability.
  – Hard to imagine how a trick “HHTHT” coin could work: low prior probability.
Summary
• Three kinds of Bayesian inference:
  – comparing two simple hypotheses
  – comparing simple and complex hypotheses
  – comparing an infinite number of hypotheses
• Critical notions:
  – generative models, graphical models
  – Bayesian Occam’s razor
  – priors: conjugate, hierarchical (theories)
Outline
• Morning
  – Introduction (Josh)
  – Basic case study #1: Flipping coins (Tom)
  – Basic case study #2: Rules and similarity (Josh)
• Afternoon
  – Advanced case study #1: Causal induction (Tom)
  – Advanced case study #2: Property induction (Josh)
  – Quick tour of more advanced topics (Tom)
Rules and similarity
Structure versus statistics

Rules, Logic, Symbols  vs.  Statistics, Similarity, Typicality
A better metaphor

[Images.]
Structure and statistics

Rules, Logic, Symbols  and  Statistics, Similarity, Typicality
Structure and statistics
• Basic case study #1: Flipping coins
  – Learning and reasoning with structured statistical models.
• Basic case study #2: Rules and similarity
  – Statistical learning with structured representations.
The number game
• Program input: number between 1 and 100
• Program output: “yes” or “no”
The number game
• Learning task:
  – Observe one or more positive (“yes”) examples.
  – Judge whether other numbers are “yes” or “no”.
The number game

Examples of “yes” numbers    Generalization judgments (N = 20)
  60                         Diffuse similarity
  60 80 10 30                Rule: “multiples of 10”
  60 52 57 55                Focused similarity: numbers near 50-60
The number game

Examples of “yes” numbers    Generalization judgments (N = 20)
  16                         Diffuse similarity
  16 8 2 64                  Rule: “powers of 2”
  16 23 19 20                Focused similarity: numbers near 20
The number game

Main phenomena to explain:
  – Generalization can appear either similarity-based (graded) or rule-based (all-or-none).
  – Learning from just a few positive examples.
Rule/similarity hybrid models
• Category learning
  – Nosofsky, Palmeri et al.: RULEX
  – Erickson & Kruschke: ATRIUM
Divisions into “rule” and “similarity” subsystems
• Category learning
  – Nosofsky, Palmeri et al.: RULEX
  – Erickson & Kruschke: ATRIUM
• Language processing
  – Pinker, Marcus et al.: past-tense morphology
• Reasoning
  – Sloman
  – Rips
  – Nisbett, Smith et al.
Rule/similarity hybrid models
• Why two modules?
• Why do these modules work the way that they do, and interact as they do?
• How do people infer a rule or similarity metric from just a few positive examples?
Bayesian model
• H: Hypothesis space of possible concepts:
  – h1 = {2, 4, 6, 8, 10, 12, …, 96, 98, 100} (“even numbers”)
  – h2 = {10, 20, 30, 40, …, 90, 100} (“multiples of 10”)
  – h3 = {2, 4, 8, 16, 32, 64} (“powers of 2”)
  – h4 = {50, 51, 52, …, 59, 60} (“numbers between 50 and 60”)
  – . . .
• Representational interpretations for H:
  – Candidate rules
  – Features for similarity
  – “Consequential subsets” (Shepard, 1987)
Inferring hypotheses from similarity judgments

Additive clustering (Shepard & Arabie, 1977):

$$s_{ij} = \sum_k w_k f_{ik} f_{jk}$$

• s_ij: similarity of stimuli i, j
• w_k: weight of cluster k
• f_ik: membership of stimulus i in cluster k (1 if stimulus i is in cluster k, 0 otherwise)

Equivalent to similarity as a weighted sum of common features (Tversky, 1977).
Additive clustering for the integers 0-9 ($s_{ij} = \sum_k w_k f_{ik} f_{jk}$):

Rank   Weight   Stimuli in cluster   Interpretation
1      .444     2 4 8                powers of two
2      .345     0 1 2                small numbers
3      .331     3 6 9                multiples of three
4      .291     6 7 8 9              large numbers
5      .255     2 3 4 5 6            middle numbers
6      .216     1 3 5 7 9            odd numbers
7      .214     1 2 3 4              smallish numbers
8      .172     4 5 6 7 8            largish numbers
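To make the model concrete, here is a minimal sketch (our code, not from the slides) that recovers pairwise similarity from the clusters in the table above:

```python
# Each feature is a (weight, member-set) pair from the table above.
clusters = [
    (0.444, {2, 4, 8}),        # powers of two
    (0.345, {0, 1, 2}),        # small numbers
    (0.331, {3, 6, 9}),        # multiples of three
    (0.291, {6, 7, 8, 9}),     # large numbers
    (0.255, {2, 3, 4, 5, 6}),  # middle numbers
    (0.216, {1, 3, 5, 7, 9}),  # odd numbers
    (0.214, {1, 2, 3, 4}),     # smallish numbers
    (0.172, {4, 5, 6, 7, 8}),  # largish numbers
]

def similarity(i, j):
    # s_ij = sum of weights of clusters containing both i and j
    return sum(w for w, members in clusters if i in members and j in members)

print(similarity(2, 4))  # 0.913: share "powers of two", "middle", "smallish"
print(similarity(2, 7))  # 0.0: no shared cluster
```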
Three hypothesis subspaces for number concepts
• Mathematical properties (24 hypotheses):
  – Odd, even, square, cube, prime numbers
  – Multiples of small integers
  – Powers of small integers
• Raw magnitude (5050 hypotheses):
  – All intervals of integers with endpoints between 1 and 100.
• Approximate magnitude (10 hypotheses):
  – Decades (1-10, 10-20, 20-30, …)
Hypothesis spaces and theories
• Why a hypothesis space is like a domain theory:
  – Represents one particular way of classifying entities in a domain.
  – Not just an arbitrary collection of hypotheses, but a principled system.
• What’s missing?
  – Explicit representation of the principles.
• Hypothesis spaces (and priors) are generated by theories. Some analogies:
  – Grammars generate languages (and priors over structural descriptions).
  – Hierarchical Bayesian modeling.
Bayesian model
• H: Hypothesis space of possible concepts:
  – Mathematical properties: even, odd, square, prime, . . .
  – Approximate magnitude: {1-10}, {10-20}, {20-30}, . . .
  – Raw magnitude: all intervals between 1 and 100.
• X = {x1, . . . , xn}: n examples of a concept C.
• Evaluate hypotheses given data:

$$p(h \mid X) = \frac{p(X \mid h)\, p(h)}{\sum_{h' \in H} p(X \mid h')\, p(h')}$$

  – p(h) [“prior”]: domain knowledge, pre-existing biases.
  – p(X | h) [“likelihood”]: statistical information in examples.
  – p(h | X) [“posterior”]: degree of belief that h is the true extension of C.
Likelihood: p(X|h)
• Size principle: Smaller hypotheses receive greater likelihood, and exponentially more so as n increases.
• Follows from assumption of randomly sampled examples.
• Captures the intuition of a representative sample.
hxx
n
nhhXp
,,if
1)size(
1)|(
hxi any if 0
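A minimal sketch of the size principle in action (our code and toy hypothesis set, not the authors’; a uniform prior is assumed for illustration):

```python
hypotheses = {
    "even numbers": set(range(2, 101, 2)),
    "multiples of 10": set(range(10, 101, 10)),
    "powers of 2": {2, 4, 8, 16, 32, 64},
    "numbers 50-60": set(range(50, 61)),
}

def posterior(X):
    prior = 1 / len(hypotheses)  # uniform, for illustration
    scores = {}
    for name, h in hypotheses.items():
        if all(x in h for x in X):
            scores[name] = (1 / len(h)) ** len(X) * prior  # size principle
        else:
            scores[name] = 0.0  # hypothesis inconsistent with an example
    z = sum(scores.values())
    return {name: round(s / z, 3) for name, s in scores.items()}

print(posterior([60]))              # mass spread out: diffuse generalization
print(posterior([60, 80, 10, 30]))  # "multiples of 10" dominates: rule-like
```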
Illustrating the size principle

[Number line 2-100 with two hypotheses marked, h1 (larger) and h2 (smaller). With a few consistent examples, the data are slightly more of a coincidence under h1; as more consistent examples arrive, the data become much more of a coincidence under h1.]
Bayesian Occam’s Razor

[Plot: p(D = d | M) across all possible data sets d, for a simpler model M1 (concentrated) and a more complex model M2 (spread out).]

For any model M,

$$\sum_{\text{all } d} p(D = d \mid M) = 1$$

the law of “Conservation of Belief”.
Comparing simple and complex hypotheses

[Plot, repeated from earlier: P(D | H2) is an average over all values of p.]
Prior: p(h)
• Choice of hypothesis space embodies a strong prior: effectively, p(h) ~ 0 for many logically possible but conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural hypotheses, e.g. “multiples of 10 except 50 and 70”.
![Page 112: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/112.jpg)
Prior: p(h)
• Choice of hypothesis space embodies a strong prior: effectively, p(h) ~ 0 for many logically possible but conceptually unnatural hypotheses.
• Prevents overfitting by highly specific but unnatural hypotheses, e.g. “multiples of 10 except 50 and 70”.
• p(h) encodes relative weights of alternative theories:

H: Total hypothesis space: p(H1) = 1/5, p(H2) = 3/5, p(H3) = 1/5

H1: Math properties (24 hypotheses)
– even numbers
– powers of two
– multiples of three
….
p(h) = p(H1) / 24

H2: Raw magnitude (5050 hypotheses)
– 10-15
– 20-32
– 37-54
….
p(h) = p(H2) / 5050

H3: Approx. magnitude (10 hypotheses)
– 10-20
– 20-30
– 30-40
….
p(h) = p(H3) / 10
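A sketch of how such a prior could be assembled in code. The class counts (24, 5050, 10) come from the slide; the particular H1 and H3 members listed are only a few illustrative examples, so the sketch renormalizes at the end:

```python
# Sketch of the slide's prior: mass split across theory classes H1/H2/H3,
# then spread uniformly within each class.
def make_prior():
    prior = {}
    math_hyps = [frozenset(range(2, 101, 2)),        # even numbers
                 frozenset({2, 4, 8, 16, 32, 64}),   # powers of two
                 frozenset(range(3, 101, 3))]        # multiples of three (3 of the 24)
    for h in math_hyps:
        prior[h] = (1/5) / 24                        # p(H1) = 1/5 over 24 hypotheses
    intervals = [frozenset(range(lo, hi + 1))        # all 5050 intervals [lo, hi]
                 for lo in range(1, 101) for hi in range(lo, 101)]
    for h in intervals:
        prior[h] = prior.get(h, 0) + (3/5) / 5050    # p(H2) = 3/5 over 5050 hypotheses
    decades = [frozenset(range(d, d + 11)) for d in range(10, 100, 10)]
    for h in decades:                                # p(H3) = 1/5 over ~10 hypotheses
        prior[h] = prior.get(h, 0) + (1/5) / 10
    Z = sum(prior.values())                          # renormalize the partial sketch
    return {h: p / Z for h, p in prior.items()}
```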
![Page 113: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/113.jpg)
A more complex approach to priors
• Start with a base set of regularities R and combination operators C.
• Hypothesis space = closure of R under C.
– C = {and, or}: H = unions and intersections of regularities in R (e.g., “multiples of 10 between 30 and 70”).
– C = {and-not}: H = regularities in R with exceptions (e.g., “multiples of 10 except 50 and 70”).
• Two qualitatively similar priors:
– Description length: number of combinations in C needed to generate hypothesis from R.
– Bayesian Occam's Razor, with model classes defined by number of combinations: more combinations → more hypotheses → lower prior
![Page 114: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/114.jpg)
Posterior: p(h|X)

• X = {60, 80, 10, 30}
• Why prefer "multiples of 10" over "even numbers"? p(X|h).
• Why prefer "multiples of 10" over "multiples of 10 except 50 and 20"? p(h).
• Why does a good generalization need both high prior and high likelihood? p(h|X) ∝ p(X|h) p(h)

p(h|X) = p(X|h) p(h) / Σ_{h′∈H} p(X|h′) p(h′)
![Page 115: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/115.jpg)
Bayesian Occam's Razor
Probabilities provide a common currency for balancing model complexity with fit to the data.
![Page 116: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/116.jpg)
Generalizing to new objects
Given p(h|X), how do we compute p(y ∈ C | X), the probability that C applies to some new stimulus y?
![Page 117: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/117.jpg)
Generalizing to new objects
Hypothesis averaging:

Compute the probability that C applies to some new object y by averaging the predictions of all hypotheses h, weighted by p(h|X):

p(y ∈ C | X) = Σ_{h∈H} p(y ∈ C | h) p(h|X)

where p(y ∈ C | h) = 1 if y ∈ h, 0 if y ∉ h; hence

p(y ∈ C | X) = Σ_{h ⊇ {y, X}} p(h|X)
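A sketch of hypothesis averaging, reusing the likelihood() function from the earlier sketch over a toy two-hypothesis space:

```python
# Generalization by hypothesis averaging: p(y in C | X) = sum of p(h|X)
# over hypotheses h that contain both y and all of X.
def posterior(X, hyps, prior):
    post = {h: prior[h] * likelihood(X, h) for h in hyps}  # likelihood() from above
    Z = sum(post.values())                                 # assumes some h covers X
    return {h: p / Z for h, p in post.items()}

def p_in_concept(y, X, hyps, prior):
    return sum(p for h, p in posterior(X, hyps, prior).items() if y in h)

hyps = [frozenset(range(10, 101, 10)),   # multiples of 10
        frozenset(range(2, 101, 2))]     # even numbers
prior = {h: 0.5 for h in hyps}
X = [60, 80, 10, 30]
print(p_in_concept(20, X, hyps, prior))  # ~1.0: both surviving hypotheses contain 20
print(p_in_concept(42, X, hyps, prior))  # ~0.002: only "even numbers" contains 42
```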
![Page 118: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/118.jpg)
Examples: 16
![Page 119: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/119.jpg)
Connection to feature-based similarity
• Additive clustering model of similarity:

s_ij = Σ_k w_k f_ik f_jk

• Bayesian hypothesis averaging:

p(y ∈ C | X) = Σ_{h ⊇ {y, X}} p(h|X)

• Equivalent if we identify features f_k with hypotheses h, and weights w_k with p(h|X).
![Page 120: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/120.jpg)
Examples: 16 8 2 64
![Page 121: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/121.jpg)
Examples: 16 23 19 20
![Page 122: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/122.jpg)
Model fits

Examples of "yes" numbers:
– 60
– 60 80 10 30
– 60 52 57 55

[Figure: generalization judgments (N = 20) alongside Bayesian model predictions]
Bayesian Model (r = 0.96)
![Page 123: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/123.jpg)
Model fits

Examples of "yes" numbers:
– 16
– 16 8 2 64
– 16 23 19 20

[Figure: generalization judgments (N = 20) alongside Bayesian model predictions]
Bayesian Model (r = 0.93)
![Page 124: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/124.jpg)
Summary of the Bayesian model
• How do the statistics of the examples interact with prior knowledge to guide generalization?
• Why does generalization appear rule-based or similarity-based?
posterior ∝ likelihood × prior
(likelihood: size principle; posterior: hypothesis averaging)

broad p(h|X): similarity gradient; narrow p(h|X): all-or-none rule
![Page 125: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/125.jpg)
Summary of the Bayesian model
• How do the statistics of the examples interact with prior knowledge to guide generalization?
• Why does generalization appear rule-based or similarity-based?
posterior ∝ likelihood × prior
(likelihood: size principle; posterior: hypothesis averaging)

Many h of similar size: broad p(h|X). One h much smaller: narrow p(h|X).
![Page 126: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/126.jpg)
Alternative models• Neural networks
[Figure: network mapping input numbers (60, 80, 10, 30) to candidate rules: even, multiple of 10, power of 2, multiple of 3]
Alternative models• Neural networks
• Hypothesis ranking and elimination
[Figure: same network over numbers and candidate rules, plus a hypothesis ranking: 1, 2, 3, 4, ….]
![Page 128: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/128.jpg)
Alternative models
• Neural networks
• Hypothesis ranking and elimination
• Similarity to exemplars
– Average similarity:

p(y ∈ C | X) = (1/|X|) Σ_{x_j ∈ X} sim(y, x_j)

[Figure: data vs. model (r = 0.80) for example sets 60; 60 80 10 30; 60 52 57 55]
![Page 129: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/129.jpg)
Alternative models
• Neural networks
• Hypothesis ranking and elimination
• Similarity to exemplars
– Max similarity:

p(y ∈ C | X) = max_{x_j ∈ X} sim(y, x_j)

[Figure: data vs. model (r = 0.64) for example sets 60; 60 80 10 30; 60 52 57 55]
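For comparison, a sketch of the two exemplar models; the exponential-decay similarity function and its scale parameter are assumptions made here for illustration:

```python
import math

def sim(y, x, s=10.0):            # exponential-decay similarity; s is an assumed scale
    return math.exp(-abs(y - x) / s)

def avg_sim(y, X):                # average similarity to the examples
    return sum(sim(y, x) for x in X) / len(X)

def max_sim(y, X):                # max similarity to the examples
    return max(sim(y, x) for x in X)

X = [60, 52, 57, 55]
print(avg_sim(56, X), max_sim(56, X))
```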
![Page 130: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/130.jpg)
Alternative models
• Neural networks
• Hypothesis ranking and elimination
• Similarity to exemplars– Average similarity– Max similarity– Flexible similarity? Bayes.
![Page 131: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/131.jpg)
Alternative models
• Neural networks
• Hypothesis ranking and elimination
• Similarity to exemplars
• Toolbox of simple heuristics– 60: “general” similarity– 60 80 10 30: most specific rule (“subset principle”).– 60 52 57 55: similarity in magnitude
Why these heuristics? When to use which heuristic? Bayes.
![Page 132: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/132.jpg)
Summary
• Generalization from limited data possible via the interaction of structured knowledge and statistics.
– Structured knowledge: space of candidate rules; theories generate hypothesis space (c.f. hierarchical priors)
– Statistics: Bayesian Occam's razor.
• Better understand the interactions between traditionally opposing concepts:
– Rules and statistics
– Rules and similarity
– Rules and representativeness
• Explains why central but notoriously slippery processing-level concepts work the way they do.
– Similarity
– Representativeness
![Page 133: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/133.jpg)
Why Bayes?• A framework for explaining cognition.
– How people can learn so much from such limited data.
– Why process-level models work the way that they do.
– Strong quantitative models with minimal ad hoc assumptions.
• A framework for understanding how structured knowledge and statistical inference interact.– How structured knowledge guides statistical inference, and is itself acquired
through higher-order statistical learning.
– How simplicity trades off with fit to the data in evaluating structural hypotheses (Occam’s razor).
– How increasingly complex structures may grow as required by new data, rather than being pre-specified in advance.
![Page 134: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/134.jpg)
Theory-Based Bayesian Models

• Rational statistical inference (Bayes):

p(h|d) = p(d|h) p(h) / Σ_{h′∈H} p(d|h′) p(h′)

• Learners' domain theories generate their hypothesis space H and prior p(h).
– Well-matched to structure of the natural world.
– Learnable from limited data.
– Computationally tractable inference.
![Page 135: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/135.jpg)
Looking towards the afternoon
• How do we apply these ideas to more natural and complex aspects of cognition?
• Where do the hypothesis spaces come from?
• Can we formalize the contributions of domain theories?
![Page 136: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/136.jpg)
![Page 137: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/137.jpg)
Outline• Morning
– Introduction (Josh) – Basic case study #1: Flipping coins (Tom)– Basic case study #2: Rules and similarity (Josh)
• Afternoon– Advanced case study #1: Causal induction (Tom)– Advanced case study #2: Property induction (Josh)– Quick tour of more advanced topics (Tom)
![Page 138: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/138.jpg)
Outline• Morning
– Introduction (Josh) – Basic case study #1: Flipping coins (Tom)– Basic case study #2: Rules and similarity (Josh)
• Afternoon– Advanced case study #1: Causal induction (Tom)– Advanced case study #2: Property induction (Josh)– Quick tour of more advanced topics (Tom)
![Page 139: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/139.jpg)
Marr’s Three Levels of Analysis
• Computation: “What is the goal of the computation, why is it
appropriate, and what is the logic of the strategy by which it can be carried out?”
• Representation and algorithm: Cognitive psychology
• Implementation: Neurobiology
![Page 140: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/140.jpg)
Working at the computational level
• What is the computational problem?
– input: data
– output: solution
[Figure: data mapped to solution by statistical inference]
![Page 141: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/141.jpg)
Working at the computational level
• What is the computational problem?
– input: data
– output: solution
• What knowledge is available to the learner?
• Where does that knowledge come from?
[Figure: data mapped to solution by statistical inference]
![Page 142: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/142.jpg)
Theory-Based Bayesian Models

• Rational statistical inference (Bayes):

p(h|d) = p(d|h) p(h) / Σ_{h′∈H} p(d|h′) p(h′)

• Learners' domain theories generate their hypothesis space H and prior p(h).
– Well-matched to structure of the natural world.
– Learnable from limited data.
– Computationally tractable inference.
![Page 143: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/143.jpg)
Causality
![Page 144: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/144.jpg)
Bayes nets and beyond...
• Increasingly popular approach to studying human causal inferences
(e.g. Glymour, 2001; Gopnik et al., 2004)
• Three reactions:– Bayes nets are the solution!– Bayes nets are missing the point, not sure why…– what is a Bayes net?
![Page 145: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/145.jpg)
Bayes nets and beyond...
• What are Bayes nets?– graphical models– causal graphical models
• An example: elemental causal induction
• Beyond Bayes nets…– other knowledge in causal induction– formalizing causal theories
![Page 146: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/146.jpg)
Bayes nets and beyond...
• What are Bayes nets?– graphical models– causal graphical models
• An example: elemental causal induction
• Beyond Bayes nets…– other knowledge in causal induction– formalizing causal theories
![Page 147: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/147.jpg)
Graphical models
• Express the probabilistic dependency structure among a set of variables (Pearl, 1988)
• Consist of– a set of nodes, corresponding to variables– a set of edges, indicating dependency– a set of functions defined on the graph that
defines a probability distribution
![Page 148: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/148.jpg)
Undirected graphical models
• Consist of– a set of nodes– a set of edges– a potential for each clique, multiplied together to yield the distribution over variables
• Examples– statistical physics: Ising model, spin glasses– early neural networks (e.g. Boltzmann machines)

[Figure: undirected graph over variables X1, …, X5]
![Page 149: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/149.jpg)
Directed graphical models

[Figure: directed acyclic graph over variables X1, …, X5]
• Consist of– a set of nodes– a set of edges– a conditional probability distribution for each node, conditioned on its parents, multiplied
together to yield the distribution over variables
• Constrained to directed acyclic graphs (DAG)• AKA: Bayesian networks, Bayes nets
![Page 150: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/150.jpg)
Bayesian networks and Bayes
• Two different problems– Bayesian statistics is a method of inference– Bayesian networks are a form of representation
• There is no necessary connection– many users of Bayesian networks rely upon
frequentist statistical methods (e.g. Glymour)– many Bayesian inferences cannot be easily
represented using Bayesian networks
![Page 151: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/151.jpg)
Properties of Bayesian networks
• Efficient representation and inference– exploiting dependency structure makes it easier
to represent and compute with probabilities
• Explaining away– pattern of probabilistic reasoning characteristic
of Bayesian networks, especially early use in AI
![Page 152: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/152.jpg)
• Three binary variables: Cavity, Toothache, Catch
Efficient representation and inference
![Page 153: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/153.jpg)
• Three binary variables: Cavity, Toothache, Catch
• Specifying P(Cavity, Toothache, Catch) requires 7 parameters (1 for each set of values, minus 1 because it’s a probability distribution)
• With n variables, we need 2^n − 1 parameters• Here n = 3. Realistically, many more: X-ray, diet, oral hygiene, personality, . . . .
Efficient representation and inference
![Page 154: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/154.jpg)
Conditional independence

• All three variables are dependent, but Toothache and Catch are independent given the presence or absence of Cavity
• In probabilistic terms:

P(ache, catch | cav) = P(ache | cav) P(catch | cav)
P(ache, catch | ¬cav) = P(ache | ¬cav) P(catch | ¬cav)

• With n evidence variables x1, …, xn, we need only 2n conditional probabilities:

P(xi | cav), P(xi | ¬cav)
![Page 155: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/155.jpg)
A simple Bayesian network

• Graphical representation of relations between a set of random variables:

Cavity → Toothache,  Cavity → Catch

• Probabilistic interpretation: factorizing complex terms

P(Ache, Catch, Cav) = P(Ache, Catch | Cav) P(Cav)
                    = P(Ache | Cav) P(Catch | Cav) P(Cav)

In general:  P(A, B, C) = Π_{V ∈ {A, B, C}} P(V | parents[V])
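A sketch of this factorization with made-up CPT values: five numbers specify the full joint over three binary variables.

```python
# Factorized joint for the dental example; CPT numbers are invented for illustration.
# P(Cavity, Ache, Catch) = P(Cavity) P(Ache|Cavity) P(Catch|Cavity): 5 parameters
# instead of the 2**3 - 1 = 7 an unrestricted joint would need.
P_cav = 0.1
P_ache = {True: 0.8, False: 0.05}   # P(ache = 1 | cavity)
P_catch = {True: 0.9, False: 0.1}   # P(catch = 1 | cavity)

def joint(cav, ache, catch):
    p = P_cav if cav else 1 - P_cav
    p *= P_ache[cav] if ache else 1 - P_ache[cav]
    p *= P_catch[cav] if catch else 1 - P_catch[cav]
    return p

# Sanity check: the eight joint probabilities sum to 1.
total = sum(joint(c, a, k) for c in (True, False)
            for a in (True, False) for k in (True, False))
assert abs(total - 1) < 1e-12
```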
![Page 156: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/156.jpg)
A more complex system

[Figure: Battery → Radio; Battery → Ignition; Ignition, Gas → Starts; Starts → On time to work]

• Joint distribution sufficient for any inference:

P(B, R, I, G, S, O) = P(B) P(R|B) P(I|B) P(G) P(S|I,G) P(O|S)

P(O | G) = P(O, G) / P(G) = Σ_{B,R,I,S} P(B) P(R|B) P(I|B) P(G) P(S|I,G) P(O|S) / P(G)
![Page 157: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/157.jpg)
A more complex system

[Figure: same network]

• Joint distribution sufficient for any inference:

P(B, R, I, G, S, O) = P(B) P(R|B) P(I|B) P(G) P(S|I,G) P(O|S)

P(O | G) = P(O, G) / P(G) = Σ_B P(B) Σ_I P(I|B) Σ_S P(S|I,G) P(O|S)
![Page 158: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/158.jpg)
A more complex system

[Figure: same network]

• Joint distribution sufficient for any inference:

P(B, R, I, G, S, O) = P(B) P(R|B) P(I|B) P(G) P(S|I,G) P(O|S)

• General inference algorithm: local message passing (belief propagation; Pearl, 1988)
– efficiency depends on sparseness of graph structure
![Page 159: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/159.jpg)
Explaining away

[Figure: Rain → Grass Wet ← Sprinkler]

• Assume grass will be wet if and only if it rained last night, or if the sprinklers were left on:

P(R, S, W) = P(R) P(S) P(W | R, S)
P(W = w | R, S) = 1 if R = r or S = s; 0 if R = ¬r and S = ¬s.
![Page 160: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/160.jpg)
Explaining away

[Figure: Rain → Grass Wet ← Sprinkler]

Compute probability it rained last night, given that the grass is wet:

P(r | w) = P(w | r) P(r) / P(w)

(model as above: P(R, S, W) = P(R) P(S) P(W | R, S), with W a deterministic OR of R and S)
![Page 161: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/161.jpg)
Explaining away

[Figure: Rain → Grass Wet ← Sprinkler]

Compute probability it rained last night, given that the grass is wet:

P(r | w) = P(w | r) P(r) / Σ_{r′, s′} P(w | r′, s′) P(r′, s′)
![Page 162: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/162.jpg)
Explaining away

[Figure: Rain → Grass Wet ← Sprinkler]

Compute probability it rained last night, given that the grass is wet:

P(r | w) = P(r) / [P(r, s) + P(r, ¬s) + P(¬r, s)]
![Page 163: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/163.jpg)
Explaining away

[Figure: Rain → Grass Wet ← Sprinkler]

Compute probability it rained last night, given that the grass is wet:

P(r | w) = P(r) / [P(r) + P(¬r, s)]
![Page 164: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/164.jpg)
Explaining away

[Figure: Rain → Grass Wet ← Sprinkler]

Compute probability it rained last night, given that the grass is wet:

P(r | w) = P(r) / [P(r) + P(¬r) P(s)]  ≥  P(r)

(the denominator lies between P(s) and 1)
![Page 165: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/165.jpg)
Explaining away

[Figure: Rain → Grass Wet ← Sprinkler]

Compute probability it rained last night, given that the grass is wet and sprinklers were left on:

P(r | w, s) = P(w | r, s) P(r | s) / P(w | s)

Both terms P(w | r, s) and P(w | s) equal 1.
![Page 166: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/166.jpg)
Explaining away

[Figure: Rain → Grass Wet ← Sprinkler]

Compute probability it rained last night, given that the grass is wet and sprinklers were left on:

P(r | w, s) = P(r | s) = P(r)
![Page 167: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/167.jpg)
Explaining away

[Figure: Rain → Grass Wet ← Sprinkler]

P(r | w, s) = P(r | s) = P(r),  whereas  P(r | w) = P(r) / [P(r) + P(¬r) P(s)] ≥ P(r)

"Discounting" to prior probability.
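The whole derivation can be checked by brute-force enumeration; the priors P(r) and P(s) below are assumed for illustration:

```python
# Brute-force check of explaining away in the Rain/Sprinkler/Wet network.
# W is a deterministic OR of R and S, as on the slides.
P_r, P_s = 0.2, 0.3

def joint(r, s, w):
    p_w = 1.0 if w == (1 if (r or s) else 0) else 0.0
    return (P_r if r else 1 - P_r) * (P_s if s else 1 - P_s) * p_w

def cond(query, evidence):
    """P(query | evidence), each a dict over the variables 'r', 's', 'w'."""
    states = [dict(r=r, s=s, w=w) for r in (0, 1) for s in (0, 1) for w in (0, 1)]
    match = lambda st, d: all(st[k] == v for k, v in d.items())
    num = sum(joint(**st) for st in states if match(st, {**evidence, **query}))
    den = sum(joint(**st) for st in states if match(st, evidence))
    return num / den

print(cond({'r': 1}, {'w': 1}))          # ~0.45 > P(r): wet grass suggests rain
print(cond({'r': 1}, {'w': 1, 's': 1}))  # 0.2 = P(r): the sprinkler explains it away
```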
![Page 168: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/168.jpg)
• Formulate IF-THEN rules:– IF Rain THEN Wet– IF Wet THEN Rain
• Rules do not distinguish directions of inference• Requires combinatorial explosion of rules
Contrast w/ production system

[Figure: Rain → Grass Wet ← Sprinkler]

IF Wet AND NOT Sprinkler THEN Rain
![Page 169: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/169.jpg)
• Observing rain, Wet becomes more active. • Observing grass wet, Rain and Sprinkler become more active.• Observing grass wet and sprinkler, Rain cannot become less active. No explaining away!
• Excitatory links: Rain → Wet, Sprinkler → Wet

Contrast w/ spreading activation

[Figure: Rain and Sprinkler each linked excitatorily to Grass Wet]
![Page 170: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/170.jpg)
• Observing grass wet, Rain and Sprinkler become more active.• Observing grass wet and sprinkler, Rain becomes less active: explaining away.
• Excitatory links: Rain → Wet, Sprinkler → Wet
• Inhibitory link between Rain and Sprinkler

Contrast w/ spreading activation

[Figure: same network with an added inhibitory Rain–Sprinkler link]
![Page 171: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/171.jpg)
• Each new variable requires more inhibitory connections.• Interactions between variables are not causal.• Not modular.
– Whether a connection exists depends on what other connections exist, in non-transparent ways. – Big holism problem. – Combinatorial explosion.
Contrast w/ spreading activation

[Figure: network extended with a new cause, Burst pipe]
![Page 172: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/172.jpg)
Graphical models
• Capture dependency structure in distributions
• Provide an efficient means of representing and reasoning with probabilities
• Allow kinds of inference that are problematic for other representations: explaining away– hard to capture in a production system– hard to capture with spreading activation
![Page 173: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/173.jpg)
Bayes nets and beyond...
• What are Bayes nets?– graphical models– causal graphical models
• An example: causal induction
• Beyond Bayes nets…– other knowledge in causal induction– formalizing causal theories
![Page 174: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/174.jpg)
Causal graphical models
• Graphical models represent statistical dependencies among variables (i.e. correlations)– can answer questions about observations
• Causal graphical models represent causal dependencies among variables– express underlying causal structure– can answer questions about both observations and
interventions (actions upon a variable)
![Page 175: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/175.jpg)
Observation and intervention

[Figure: the Battery / Radio / Ignition / Gas / Starts / On time to work network]
Graphical model: P(Radio|Ignition)
Causal graphical model: P(Radio|do(Ignition))
![Page 176: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/176.jpg)
Observation and intervention

[Figure: the Battery / Radio / Ignition / Gas / Starts / On time to work network]
Graphical model: P(Radio|Ignition)
Causal graphical model: P(Radio|do(Ignition))
“graph surgery” produces “mutilated graph”
![Page 177: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/177.jpg)
Assessing interventions
• To compute P(Y|do(X=x)), delete all edges coming into X and reason with the resulting Bayesian network (“do calculus”; Pearl, 2000)
• Allows a single structure to make predictions about both observations and interventions
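A sketch of the observation/intervention contrast on the car network, with assumed CPTs for the Battery → Radio and Battery → Ignition links:

```python
# Observation vs. intervention (sketch; CPT numbers are invented). Seeing the
# ignition on is evidence that the battery works, so it raises P(Radio on);
# do(Ignition = on) severs the Battery -> Ignition edge, so it carries no evidence.
P_b = 0.9                          # P(battery works)
P_radio = {1: 0.95, 0: 0.0}        # P(radio on | battery)
P_ign = {1: 0.9, 0: 0.0}           # P(ignition on | battery)

def p_radio_given_ignition():      # ordinary conditioning on Ignition = on
    num = sum((P_b if b else 1 - P_b) * P_ign[b] * P_radio[b] for b in (0, 1))
    den = sum((P_b if b else 1 - P_b) * P_ign[b] for b in (0, 1))
    return num / den

def p_radio_given_do_ignition():   # graph surgery: Ignition no longer informs Battery
    return sum((P_b if b else 1 - P_b) * P_radio[b] for b in (0, 1))

print(p_radio_given_ignition())     # 0.95
print(p_radio_given_do_ignition())  # 0.855: forcing the ignition tells us nothing
```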
![Page 178: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/178.jpg)
Causality simplifies inference

• Using a representation in which the direction of causality is correct produces sparser graphs
• Suppose we get the direction of causality wrong, thinking that "symptoms" cause "diseases":

[Figure: Ache → Cavity ← Catch]

• Does not capture the correlation between symptoms: falsely believe P(Ache, Catch) = P(Ache) P(Catch).
![Page 179: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/179.jpg)
Causality simplifies inference

• Using a representation in which the direction of causality is correct produces sparser graphs
• Suppose we get the direction of causality wrong, thinking that "symptoms" cause "diseases":

[Figure: Ache → Cavity ← Catch, plus a new Ache → Catch arrow]

• Inserting a new arrow allows us to capture this correlation.
• But this model is too complex: it fails to encode the belief that P(Ache, Catch | Cav) = P(Ache | Cav) P(Catch | Cav).
![Page 180: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/180.jpg)
Causality simplifies inference

• Using a representation in which the direction of causality is correct produces sparser graphs
• Suppose we get the direction of causality wrong, thinking that "symptoms" cause "diseases":

[Figure: Ache, Catch, and X-ray all pointing into Cavity, with arrows among the symptoms]

• New symptoms require a combinatorial proliferation of new arrows. This reduces efficiency of inference.
![Page 181: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/181.jpg)
Learning causal graphical models

• Strength: how strong is a relationship?
• Structure: does a relationship exist?

[Figure: h1: B → E ← C vs. h0: B → E only]
![Page 182: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/182.jpg)
Causal structure vs. causal strength

• Strength: how strong is a relationship?

[Figure: h1: B → E ← C vs. h0: B → E only]
![Page 183: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/183.jpg)
Causal structure vs. causal strength

• Strength: how strong is a relationship?
– requires defining nature of relationship

[Figure: h1: B → E ← C with strengths w0, w1 vs. h0: B → E with strength w0]
![Page 184: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/184.jpg)
Parameterization

• Structures:  h1 = B → E ← C;   h0 = B → E

• Generic parameterization:

C B    h1: P(E = 1 | C, B)    h0: P(E = 1 | C, B)
0 0           p00                    p0
1 0           p10                    p0
0 1           p01                    p1
1 1           p11                    p1
![Page 185: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/185.jpg)
Parameterization

• Structures:  h1 = B → E ← C;   h0 = B → E
• w0, w1: strength parameters for B, C

• Linear parameterization:

C B    h1: P(E = 1 | C, B)    h0: P(E = 1 | C, B)
0 0           0                      0
1 0           w1                     0
0 1           w0                     w0
1 1           w1 + w0                w0
![Page 186: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/186.jpg)
Parameterization

• Structures:  h1 = B → E ← C;   h0 = B → E
• w0, w1: strength parameters for B, C

• "Noisy-OR" parameterization:

C B    h1: P(E = 1 | C, B)    h0: P(E = 1 | C, B)
0 0           0                      0
1 0           w1                     0
0 1           w0                     w0
1 1           w1 + w0 − w1 w0        w0
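The noisy-OR table in one line of Python (background cause B assumed always present, as in the models that follow):

```python
# Noisy-OR: P(E=1 | C, B=1) = 1 - (1 - w0) * (1 - w1)**C
def noisy_or(c, w0, w1):
    return 1 - (1 - w0) * (1 - w1) ** c

# With the cause present this equals w1 + w0 - w1*w0, matching the table above.
assert abs(noisy_or(1, 0.3, 0.5) - (0.5 + 0.3 - 0.5 * 0.3)) < 1e-12
```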
![Page 187: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/187.jpg)
Parameter estimation
• Maximum likelihood estimation:
maximize Π_i P(b_i, c_i, e_i; w0, w1)
• Bayesian methods: as in the “Comparing infinitely many hypotheses” example…
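A crude maximum-likelihood sketch for the first bullet: grid search over (w0, w1) under the noisy-OR model from the previous sketch. The contingency data here are invented:

```python
import itertools
import math

def log_lik(data, w0, w1):
    """data: (c, e) pairs, background cause always present; noisy_or() from above."""
    ll = 0.0
    for c, e in data:
        p = min(max(noisy_or(c, w0, w1), 1e-12), 1 - 1e-12)
        ll += math.log(p if e else 1 - p)
    return ll

# Invented data: e occurs 6/8 times with the cause present, 2/8 without.
data = [(1, 1)] * 6 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 6
grid = [i / 100 for i in range(101)]
w0, w1 = max(itertools.product(grid, grid), key=lambda w: log_lik(data, *w))
print(w0, w1)   # ~0.25 and ~0.67: w1 recovers Cheng's causal power for these data
```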
![Page 188: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/188.jpg)
Causal structure vs. causal strength

• Structure: does a relationship exist?

[Figure: h1: B → E ← C vs. h0: B → E only]
![Page 189: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/189.jpg)
Approaches to structure learning

• Constraint-based
– dependency from statistical tests (e.g. χ2)
– deduce structure from dependencies

[Figure: B → E ← C]
(Pearl, 2000; Spirtes et al., 1993)
![Page 190: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/190.jpg)
Approaches to structure learning

• Constraint-based:
– dependency from statistical tests (e.g. χ2)
– deduce structure from dependencies

[Figure: B → E ← C]
(Pearl, 2000; Spirtes et al., 1993)
![Page 191: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/191.jpg)
Approaches to structure learning

• Constraint-based:
– dependency from statistical tests (e.g. χ2)
– deduce structure from dependencies

[Figure: B → E ← C]
(Pearl, 2000; Spirtes et al., 1993)
![Page 192: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/192.jpg)
Approaches to structure learning

• Constraint-based:
– dependency from statistical tests (e.g. χ2)
– deduce structure from dependencies

Attempts to reduce inductive problem to deductive problem

[Figure: B → E ← C]
(Pearl, 2000; Spirtes et al., 1993)
![Page 193: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/193.jpg)
Approaches to structure learning

• Constraint-based:
– dependency from statistical tests (e.g. χ2)
– deduce structure from dependencies
(Pearl, 2000; Spirtes et al., 1993)

• Bayesian:
– compute posterior probability of structures, given observed data

P(S | data) ∝ P(data | S) P(S)

compare P(S1 | data) vs. P(S0 | data)

[Figure: candidate structures S1: B → E ← C and S0: B → E]
(Heckerman, 1998; Friedman, 1999)
![Page 194: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/194.jpg)
Causal graphical models
• Extend graphical models to deal with interventions as well as observations
• Respecting the direction of causality results in efficient representation and inference
• Two steps in learning causal models– parameter estimation– structure learning
![Page 195: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/195.jpg)
Bayes nets and beyond...
• What are Bayes nets?– graphical models– causal graphical models
• An example: elemental causal induction
• Beyond Bayes nets…– other knowledge in causal induction– formalizing causal theories
![Page 196: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/196.jpg)
Elemental causal induction

"To what extent does C cause E?"

            C present    C absent
E present       a            c
E absent        b            d
![Page 197: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/197.jpg)
Causal structure vs. causal strength

• Strength: how strong is a relationship?
• Structure: does a relationship exist?

[Figure: h1: B → E ← C with strengths w0, w1 vs. h0: B → E with strength w0]
![Page 198: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/198.jpg)
Causal strength

• Assume structure:  B → E ← C, with strengths w0 (background B) and w1 (cause C)

• Leading models (ΔP and causal power) are maximum likelihood estimates of the strength parameter w1, under different parameterizations for P(E|B,C):
– linear → ΔP;  Noisy-OR → causal power
![Page 199: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/199.jpg)
Causal structure

• Hypotheses:  h1 = B → E ← C;   h0 = B → E

• Bayesian causal inference:

support = log [ P(data | h1) / P(data | h0) ]

P(data | h1) = ∫0^1 ∫0^1 P(data | w0, w1) p(w0, w1 | h1) dw0 dw1

P(data | h0) = ∫0^1 P(data | w0) p(w0 | h0) dw0
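A sketch of causal support computed by grid integration, assuming the noisy-OR parameterization for h1 and uniform priors on the strengths:

```python
import math

def lik(counts, w0, w1):
    """Likelihood of contingency counts {(c, e): N}; noisy-OR, background always on."""
    p1 = 1 - (1 - w0) * (1 - w1)    # P(e=1 | c=1)
    p0 = w0                         # P(e=1 | c=0)
    return (p1 ** counts[(1, 1)] * (1 - p1) ** counts[(1, 0)]
            * p0 ** counts[(0, 1)] * (1 - p0) ** counts[(0, 0)])

def support(counts, steps=100):
    grid = [(i + 0.5) / steps for i in range(steps)]     # midpoint rule on [0, 1]
    m1 = sum(lik(counts, w0, w1) for w0 in grid for w1 in grid) / steps ** 2
    m0 = sum(lik(counts, w0, 0.0) for w0 in grid) / steps
    return math.log(m1 / m0)

print(support({(1, 1): 6, (1, 0): 2, (0, 1): 2, (0, 0): 6}))  # > 0: favors h1
```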
![Page 200: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/200.jpg)
Buehner and Cheng (1997)

[Figure: human judgments vs. model predictions]
– People
– ΔP (r = 0.89)
– Power (r = 0.88)
– Support (r = 0.97)
![Page 201: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/201.jpg)
The importance of parameterization

• Noisy-OR incorporates mechanism assumptions:
– generativity: causes increase probability of effects
– each cause is sufficient to produce the effect
– causes act via independent mechanisms
(Cheng, 1997)

• Consider other models:
– statistical dependence: χ2 test
– generic parameterization (Anderson, computer science)
![Page 202: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/202.jpg)
[Figure: model fits for People, Support (Noisy-OR), χ2, and Support (generic)]
![Page 203: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/203.jpg)
Generativity is essential

• Predictions result from "ceiling effect"
– ceiling effects only matter if you believe a cause increases the probability of an effect

[Figure: causal support (scale 0–100) for contingencies with P(e+|c+) = P(e+|c−), at base rates 8/8, 6/8, 4/8, 2/8, 0/8]
![Page 204: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/204.jpg)
Bayes nets and beyond...
• What are Bayes nets?– graphical models– causal graphical models
• An example: elemental causal induction
• Beyond Bayes nets…– other knowledge in causal induction– formalizing causal theories
![Page 205: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/205.jpg)
Hamadeh et al. (2002), Toxicological Sciences

[Figure: gene-expression profiles; chemicals (Clofibrate, Wyeth 14,643, Gemfibrozil, Phenobarbital) × genes (p450 2B1, Carnitine Palmitoyl Transferase 1)]
![Page 206: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/206.jpg)
[Figure: same chemicals × genes profiles, now with an unknown chemical X]
Hamadeh et al. (2002), Toxicological Sciences
![Page 207: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/207.jpg)
[Figure: chemical X matches the profile of the peroxisome proliferators (marked +++)]
Hamadeh et al. (2002), Toxicological Sciences
![Page 208: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/208.jpg)
Using causal graphical models
• Three questions (usually solved by researcher)– what are the variables?– what structures are plausible?– how do variables interact?
• How are these questions answered if causal graphical models are used in cognition?
![Page 209: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/209.jpg)
Bayes nets and beyond...
• What are Bayes nets?– graphical models– causal graphical models
• An example: elemental causal induction
• Beyond Bayes nets…– other knowledge in causal induction– formalizing causal theories
![Page 210: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/210.jpg)
Theory-based causal induction

Causal theory
– Ontology
– Plausible relations
– Functional form

Generates a hypothesis space of causal graphical models:

[Figure: two candidate causal graphs h1 and h0 over variables X, Y, Z, with background B]
priors P(h1) and P(h0) = 1 − P(h1)

Evaluated by statistical inference:  P(h | data) ∝ P(data | h) P(h)
![Page 211: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/211.jpg)
Blicket detector (Gopnik, Sobel, and colleagues)
See this? It’s a blicket machine. Blickets make it go.
Let’s put this oneon the machine.
Oooh, it’s a blicket!
![Page 212: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/212.jpg)
"Blocking"

– Two objects: A and B
– Trial 1: A on detector – detector active
– Trial 2: B on detector – detector inactive
– Trials 3, 4: A B on detector – detector active
– 3- and 4-year-olds judge whether each object is a blicket
• A: a blicket
• B: not a blicket

[Figure: Trial 1 (A), Trial 2 (B), Trials 3–4 (A and B)]
![Page 213: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/213.jpg)
A deductive inference?• Causal law: detector activates if and only if one or
more objects on top of it are blickets. • Premises:
– Trial 1: A on detector – detector active– Trial 2: B on detector – detector inactive– Trials 3,4: A B on detector – detector active
• Conclusions deduced from premises and causal law:– A: a blicket– B: not a blicket
![Page 214: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/214.jpg)
"Backwards blocking" (Sobel, Tenenbaum & Gopnik, 2004)

– Two objects: A and B
– Trial 1: A B on detector – detector active
– Trial 2: A on detector – detector active
– 4-year-olds judge whether each object is a blicket
• A: a blicket (100% of judgments)
• B: probably not a blicket (66% of judgments)

[Figure: Trial 1 (A and B), Trial 2 (A)]
![Page 215: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/215.jpg)
Theory

• Ontology
– Types: Block, Detector, Trial
– Predicates: Contact(Block, Detector, Trial), Active(Detector, Trial)

• Constraints on causal relations
– For any Block b and Detector d, with prior probability q: Cause(Contact(b,d,t), Active(d,t))

• Functional form of causal relations
– Causes of Active(d,t) are independent mechanisms, with causal strengths wi. A background cause has strength w0. Assume a near-deterministic mechanism: wi ≈ 1, w0 ≈ 0.
![Page 216: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/216.jpg)
Theory

• Ontology
– Types: Block, Detector, Trial
– Predicates: Contact(Block, Detector, Trial), Active(Detector, Trial)

[Figure: causal graph over variables A, B, E]
![Page 217: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/217.jpg)
Theory

• Ontology
– Types: Block, Detector, Trial
– Predicates: Contact(Block, Detector, Trial), Active(Detector, Trial)

[Figure: causal graph over variables A, B, E]

A = 1 if Contact(block A, detector, trial), else 0
B = 1 if Contact(block B, detector, trial), else 0
E = 1 if Active(detector, trial), else 0
![Page 218: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/218.jpg)
Theory

• Constraints on causal relations
– For any Block b and Detector d, with prior probability q: Cause(Contact(b,d,t), Active(d,t))

Four hypotheses over the links A → E ("A is a blicket") and B → E:

h00: neither link      P(h00) = (1 − q)^2
h10: A → E only        P(h10) = q(1 − q)
h01: B → E only        P(h01) = (1 − q)q
h11: both links        P(h11) = q^2

No hypotheses with E → B, E → A, A → B, etc.
![Page 219: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/219.jpg)
Theory

• Functional form of causal relations
– Causes of Active(d,t) are independent mechanisms, with causal strengths wb. A background cause has strength w0. Assume a near-deterministic mechanism: wb ≈ 1, w0 ≈ 0.

"Activation law": E = 1 if and only if A = 1 or B = 1, counting only blocks with links to E.

                       h00   h01   h10   h11
P(E=1 | A=0, B=0):      0     0     0     0
P(E=1 | A=1, B=0):      0     0     1     1
P(E=1 | A=0, B=1):      0     1     0     1
P(E=1 | A=1, B=1):      0     1     1     1

P(h00) = (1 − q)^2   P(h01) = (1 − q)q   P(h10) = q(1 − q)   P(h11) = q^2
![Page 220: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/220.jpg)
Bayesian inference

• Evaluating causal models in light of data:

P(hi | d) = P(d | hi) P(hi) / Σ_{hj∈H} P(d | hj) P(hj)

• Inferring a particular causal relation:

P(A → E | d) = Σ_{hj∈H} P(A → E | hj) P(hj | d)
![Page 221: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/221.jpg)
Modeling backwards blocking

[Table as above: P(E = 1 | A, B) under h00, h01, h10, h11, with priors (1 − q)^2, (1 − q)q, q(1 − q), q^2]

Before any trials, the odds that B is a blicket are the prior odds:

P(B → E | d) / P(B ↛ E | d) = [P(h01) + P(h11)] / [P(h00) + P(h10)] = q / (1 − q)
![Page 222: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/222.jpg)
Modeling backwards blocking

After trial 1 (A=1, B=1, E=1), h00 is ruled out:

  P(B → E | d) / P(¬(B → E) | d) = [P(h01) + P(h11)] / P(h10) = 1 / (1 – q)

                      h00   h01   h10   h11
P(E=1 | A=1, B=1):     0     1     1     1

P(h00) = (1 – q)^2    P(h10) = q(1 – q)
P(h01) = (1 – q) q    P(h11) = q^2
![Page 223: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/223.jpg)
Modeling backwards blocking

After trial 2 (A=1, B=0, E=1), h01 is ruled out as well:

  P(B → E | d) / P(¬(B → E) | d) = P(h11) / P(h10) = q / (1 – q)

                      h01   h10   h11
P(E=1 | A=1, B=0):     0     1     1
P(E=1 | A=1, B=1):     1     1     1

P(h01) = (1 – q) q    P(h10) = q(1 – q)    P(h11) = q^2

So after both trials the odds on B return to the prior: P(B is a blicket | d) = q, even though B was never tested alone (see the sketch below).
![Page 224: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/224.jpg)
Manipulating the prior

I. Pre-training phase: Blickets are rare . . . .
II. Backwards blocking phase: Trial 1, A and B on the detector (it activates); Trial 2, A alone (it activates).

After each trial, adults judge the probability that each object is a blicket.
![Page 225: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/225.jpg)
• “Rare” condition: First observe 12 objects on detector, of which 2 set it off.
![Page 226: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/226.jpg)
• “Common” condition: First observe 12 objects on detector, of which 10 set it off.
![Page 227: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/227.jpg)
Inferences from ambiguous data

I. Pre-training phase: Blickets are rare . . . .
II. Two trials: A B → detector (activates), then B C → detector (activates).

After each trial, adults judge the probability that each object (A, B, or C) is a blicket.
![Page 228: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/228.jpg)
Same domain theory generates hypothesis space for 3 objects:

• Hypotheses: h000, h100, h010, h001, h110, h011, h101, h111
  (one causal graph for each subset of the links {A → E, B → E, C → E})

• Likelihoods:

  P(E=1 | A, B, C; h) = 1  if A = 1 and A → E exists,
                           or B = 1 and B → E exists,
                           or C = 1 and C → E exists;
                      = 0  otherwise.
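The same enumeration extends directly to three blocks; a sketch of the ambiguous-data inference, again assuming q = 1/6:

```python
# Posterior over the 8 hypotheses after the ambiguous trials AB+ and BC+.
from itertools import product

q = 1 / 6  # assumed "rare" prior
hypotheses = list(product([0, 1], repeat=3))   # which of A, B, C cause E

def prior(h):
    p = 1.0
    for c in h:
        p *= q if c else 1 - q
    return p

def consistent(trial, h):
    contacts, E = trial
    return int(any(c and cause for c, cause in zip(contacts, h))) == E

data = [((1, 1, 0), 1),    # A and B on detector; it activates
        ((0, 1, 1), 1)]    # B and C on detector; it activates

scores = {h: prior(h) if all(consistent(t, h) for t in data) else 0.0
          for h in hypotheses}
z = sum(scores.values())
for i, name in enumerate("ABC"):
    p = sum(s for h, s in scores.items() if h[i]) / z
    print(f"P({name} is a blicket | d) = {p:.3f}")
```

B, which can explain both trials by itself, comes out far more probable than A or C: a graded inference from ambiguous data.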
![Page 229: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/229.jpg)
• “Rare” condition: First observe 12 objects on detector, of which 2 set it off.
![Page 230: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/230.jpg)
The role of causal mechanism knowledge
• Is mechanism knowledge necessary?
  – Constraint-based learning using 2 tests of conditional independence.
• How important is the deterministic functional form of causal relations?
  – Bayes with “noisy sufficient causes” theory (cf. Cheng’s causal power theory).
![Page 231: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/231.jpg)
Bayes with correct theory: [model vs. human judgments]

Bayes with “noisy sufficient causes” theory: [model vs. human judgments]
![Page 232: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/232.jpg)
Theory-based causal induction
• Explains one-shot causal inferences about physical systems: blicket detectors
• Captures a spectrum of inferences:
  – unambiguous data: adults and children make all-or-none inferences
  – ambiguous data: adults and children make more graded inferences
• Extends to more complex cases with hidden variables, dynamic systems: come to my talk!
![Page 233: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/233.jpg)
Summary
• Causal graphical models provide a language for asking questions about causality
• Key issues in modeling causal induction:
  – what do we mean by causal induction?
  – how do knowledge and statistics interact?
• Bayesian approach allows exploration of different answers to these questions
![Page 234: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/234.jpg)
Outline

• Morning
  – Introduction (Josh)
  – Basic case study #1: Flipping coins (Tom)
  – Basic case study #2: Rules and similarity (Josh)
• Afternoon
  – Advanced case study #1: Causal induction (Tom)
  – Advanced case study #2: Property induction (Josh)
  – Quick tour of more advanced topics (Tom)
![Page 235: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/235.jpg)
Property induction
![Page 236: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/236.jpg)
Collaborators
Charles Kemp, Neville Sanjana, Lauren Schmidt, Amy Perfors, Fei Xu, Liz Baraff, Pat Shafto
![Page 237: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/237.jpg)
The Big Question
• How can we generalize new concepts reliably from just one or a few examples?
  – Learning word meanings
“horse” “horse” “horse”
![Page 238: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/238.jpg)
The Big Question
• How can we generalize new concepts reliably from just one or a few examples?
  – Learning word meanings, causal relations, social rules, ….
  – Property induction: how probable is the conclusion (target) given the premises (examples)?

    Gorillas have T4 cells.
    Squirrels have T4 cells.
    All mammals have T4 cells.
![Page 239: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/239.jpg)
The Big Question
• How can we generalize new concepts reliably from just one or a few examples?
  – Learning word meanings, causal relations, social rules, ….
  – Property induction

    Gorillas have T4 cells.        Gorillas have T4 cells.
    Squirrels have T4 cells.       Chimps have T4 cells.
    All mammals have T4 cells.     All mammals have T4 cells.

  More diverse examples → stronger generalization
![Page 240: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/240.jpg)
Is rational inference the answer?

• Everyday induction often appears to follow principles of rational scientific inference.
  – Could that explain its success?
• Goal of this work: a rational computational model of human inductive generalization.
  – Explain people’s judgments as approximations to optimal inference in natural environments.
  – Close quantitative fits to people’s judgments with a minimum of free parameters or assumptions.
![Page 241: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/241.jpg)
Theory-Based Bayesian Models

• Rational statistical inference (Bayes):

  p(h|d) = p(d|h) p(h) / Σ_{h′ ∈ H} p(d|h′) p(h′)

• Learners’ domain theories generate their hypothesis space H and prior p(h).
  – Well-matched to structure of the natural world.
  – Learnable from limited data.
  – Computationally tractable inference.
![Page 242: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/242.jpg)
The plan

• Similarity-based models
• Theory-based model
• Bayesian models
  – “Empiricist” Bayes
  – Theory-based Bayes, with different theories
• Connectionist (PDP) models
• Advanced Theory-based Bayes
  – Learning with multiple domain theories
  – Learning domain theories
![Page 243: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/243.jpg)
The plan

• Similarity-based models
• Theory-based model
• Bayesian models
  – “Empiricist” Bayes
  – Theory-based Bayes, with different theories
• Connectionist (PDP) models
• Advanced Theory-based Bayes
  – Learning with multiple domain theories
  – Learning domain theories
![Page 244: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/244.jpg)
An experiment (Osherson et al., 1990)

• 20 subjects rated the strength of 45 arguments:

    X1 have property P.
    X2 have property P.
    X3 have property P.
    All mammals have property P.

• 40 different subjects rated the similarity of all pairs of 10 mammals.
![Page 245: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/245.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

[Figure: the 10 mammals as points; “x” marks the examples in X]
![Page 246: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/246.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

[Figure: the 10 mammals as points; “x” marks the examples in X]
![Page 247: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/247.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

[Figure: the 10 mammals as points; “x” marks the examples in X]
![Page 248: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/248.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

[Figure: the 10 mammals as points; “x” marks the examples in X]
![Page 249: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/249.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

• Sum-Similarity:  sim(i, X) = Σ_{j ∈ X} sim(i, j)
![Page 250: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/250.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

• Max-Similarity:  sim(i, X) = max_{j ∈ X} sim(i, j)
![Page 251: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/251.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

• Max-Similarity:  sim(i, X) = max_{j ∈ X} sim(i, j)
![Page 252: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/252.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

• Max-Similarity:  sim(i, X) = max_{j ∈ X} sim(i, j)
![Page 253: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/253.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

• Max-Similarity:  sim(i, X) = max_{j ∈ X} sim(i, j)
![Page 254: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/254.jpg)
Similarity-based models (Osherson et al.)

  strength(“all mammals” | X) = Σ_{i ∈ mammals} sim(i, X)

• Max-Similarity:  sim(i, X) = max_{j ∈ X} sim(i, j)
![Page 255: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/255.jpg)
Sum-sim versus Max-sim

• Two models appear functionally similar:
  – Both increase monotonically as new examples are observed.
• Reasons to prefer Sum-sim:
  – Standard form of exemplar models of categorization, memory, and object recognition.
  – Analogous to kernel density estimation techniques in statistical pattern recognition.
• Reasons to prefer Max-sim:
  – Fit to generalization judgments . . . .
![Page 256: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/256.jpg)
Data vs. models

[Scatter plots: model predictions (x-axis) vs. human judgments (y-axis)]

Each “•” represents one argument:
    X1 have property P.
    X2 have property P.
    X3 have property P.
    All mammals have property P.
![Page 257: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/257.jpg)
Three data sets

Conclusion kind:     “all mammals”   “horses”   “horses”
Number of examples:  3               2          1, 2, or 3

[Scatter plots: human judgments vs. Max-sim (top row) and Sum-sim (bottom row) for each data set]
![Page 258: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/258.jpg)
Feature rating data (Osherson and Wilkie)
• People were given 48 animals, 85 features, and asked to rate whether each animal had each feature.
• E.g., elephant: 'gray' 'hairless' 'toughskin' 'big' 'bulbous' 'longleg' 'tail' 'chewteeth' 'tusks' 'smelly' 'walks' 'slow' 'strong' 'muscle’ 'quadrapedal' 'inactive' 'vegetation' 'grazer' 'oldworld' 'bush' 'jungle' 'ground' 'timid' 'smart' 'group'
![Page 259: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/259.jpg)
• Compute similarity based on Hamming distance, or cosine.
• Generalize based on Max-sim or Sum-sim (a sketch follows below).

[Matrix: Species 1 … 10 × known features, plus a “new property” column with “?” for the unobserved species]
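Here is a small sketch of how the two rules behave on a binary feature matrix. The matrix values are made up (the actual study used the 48-animal × 85-feature ratings above), and the similarity measure is the Hamming option from the first bullet.

```python
# Max-sim and Sum-sim generalization scores from a toy feature matrix.
import numpy as np

F = np.array([[1, 1, 0, 1, 0],    # species 0
              [1, 1, 0, 0, 0],    # species 1
              [0, 1, 1, 0, 1],    # species 2
              [0, 0, 1, 1, 1]])   # species 3

def sim(i, j):
    # 1 minus normalized Hamming distance (cosine is the other option)
    return 1.0 - np.mean(F[i] != F[j])

examples = [0, 1]   # species observed to have the new property
for i in range(len(F)):
    max_sim = max(sim(i, j) for j in examples)
    sum_sim = sum(sim(i, j) for j in examples)
    print(f"species {i}: Max-sim = {max_sim:.2f}, Sum-sim = {sum_sim:.2f}")
```

Note the design difference: Sum-sim grows with every example added, while Max-sim only changes when a new example is closer to the target than all previous ones.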
![Page 260: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/260.jpg)
Three data sets

Conclusion kind:     “all mammals”   “horses”   “horses”
Number of examples:  3               2          1, 2, or 3

Max-Sim:  r = 0.77    r = 0.75   r = 0.94
Sum-Sim:  r = –0.21   r = 0.63   r = 0.19
![Page 261: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/261.jpg)
Problems for sim-based approach

• No principled explanation for why Max-Sim works so well on this task, and Sum-Sim so poorly, when Sum-Sim is the standard in other similarity-based models.
• Free parameters mixing similarity and coverage terms, and possibly Max-Sim and Sum-Sim terms.
• Does not extend to induction with other kinds of properties, e.g., from Smith et al., 1993:

    Dobermanns can bite through wire.            Poodles can bite through wire.
    German shepherds can bite through wire.      German shepherds can bite through wire.
![Page 262: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/262.jpg)
Marr’s Three Levels of Analysis

• Computation:
  “What is the goal of the computation, why is it appropriate, and what is the logic of the strategy by which it can be carried out?”
• Representation and algorithm:
  Max-sim, Sum-sim
• Implementation:
  Neurobiology
![Page 263: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/263.jpg)
The plan

• Similarity-based models
• Theory-based model
• Bayesian models
  – “Empiricist” Bayes
  – Theory-based Bayes, with different theories
• Connectionist (PDP) models
• Advanced Theory-based Bayes
  – Learning with multiple domain theories
  – Learning domain theories
![Page 264: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/264.jpg)
Theory-based induction

• Scientific biology: species generated by an evolutionary branching process.
  – A tree-structured taxonomy of species.
• Taxonomy also central in folk biology (Atran).
![Page 265: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/265.jpg)
Theory-based induction

Begin by reconstructing intuitive taxonomy from similarity judgments:

[Hierarchical clustering dendrogram over: chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal]
![Page 266: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/266.jpg)
How taxonomy constrains induction

• Atran (1998): “Fundamental principle of systematic induction” (Warburton 1967, Bock 1973)
  – Given a property found among members of any two species, the best initial hypothesis is that the property is also present among all species that are included in the smallest higher-order taxon containing the original pair of species.
![Page 267: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/267.jpg)
[Dendrogram over the 10 mammals, with the “all mammals” node highlighted]

Cows have property P.
Dolphins have property P.
Squirrels have property P.
All mammals have property P.

Strong: 0.76 [max = 0.82]
![Page 268: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/268.jpg)
[Dendrogram over the 10 mammals, with the “large herbivores” node highlighted]

Cows have property P.              Cows have property P.
Dolphins have property P.          Horses have property P.
Squirrels have property P.         Rhinos have property P.
All mammals have property P.       All mammals have property P.

Strong: 0.76 [max = 0.82]          Weak: 0.17 [min = 0.14]
![Page 269: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/269.jpg)
[Dendrogram over the 10 mammals, with the “all mammals” node highlighted]

Cows have property P.              Seals have property P.
Dolphins have property P.          Dolphins have property P.
Squirrels have property P.         Squirrels have property P.
All mammals have property P.       All mammals have property P.

Strong: 0.76 [max = 0.82]          Weak: 0.30 [min = 0.14]
![Page 270: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/270.jpg)
Conclusion kind:     “all mammals”   “horses”   “horses”
Number of examples:  3               2          1, 2, or 3

[Scatter plots: human judgments vs. Max-sim, Sum-sim, and taxonomic distance]
![Page 271: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/271.jpg)
The challenge

• Can we build models with the best of both traditional approaches?
  – Quantitatively accurate predictions.
  – Strong rational basis.
• Will require novel ways of integrating structured knowledge with statistical inference.
![Page 272: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/272.jpg)
The plan

• Similarity-based models
• Theory-based model
• Bayesian models
  – “Empiricist” Bayes
  – Theory-based Bayes, with different theories
• Connectionist (PDP) models
• Advanced Theory-based Bayes
  – Learning with multiple domain theories
  – Learning domain theories
![Page 273: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/273.jpg)
The Bayesian approach
[Matrix: Species 1 … 10 × known features, plus a “new property” column filled with “?”]
![Page 274: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/274.jpg)
The Bayesian approach
[Matrix: Species 1 … 10 × features; a candidate hypothesis h fills in the “new property” column, and d marks the observed examples (generalization)]
![Page 275: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/275.jpg)
The Bayesian approach
[Matrix: Species 1 … 10 × features; a candidate hypothesis h fills in the “new property” column, and d marks the observed examples (generalization)]
![Page 276: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/276.jpg)
The Bayesian approach
[Matrix: Species 1 … 10 × features; a candidate hypothesis h fills in the “new property” column, and d marks the observed examples (generalization)]
![Page 277: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/277.jpg)
The Bayesian approach
[Matrix: Species 1 … 10 × features; a candidate hypothesis h fills in the “new property” column, and d marks the observed examples (generalization)]
![Page 278: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/278.jpg)
The Bayesian approach
[Matrix: Species 1 … 10 × features; a candidate hypothesis h fills in the “new property” column, and d marks the observed examples (generalization)]
![Page 279: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/279.jpg)
The Bayesian approach
[Matrix: Species 1 … 10 × features; a candidate hypothesis h fills in the “new property” column, and d marks the observed examples (generalization)]
![Page 280: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/280.jpg)
The Bayesian approach
[Matrix as above, now annotated with the prior p(h) over hypotheses and the likelihood p(d|h) of the observed examples]
![Page 281: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/281.jpg)
Bayes’ rule:

  p(h|d) = p(d|h) p(h) / Σ_{h′ ∈ H} p(d|h′) p(h′)

[Matrix: Species 1 … 10 × features, with hypothesis h, data d, prior p(h), likelihood p(d|h)]
![Page 282: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/282.jpg)
Probability that property Q holds for species x:

  p(Q(x)|d) = Σ_{h consistent with Q(x)} p(h|d)

[Matrix: Species 1 … 10 × features, with hypothesis h, data d, prior p(h), likelihood p(d|h)]
![Page 283: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/283.jpg)
“Size principle”: |h| = number of positive instances of h

  p(d|h) = 1/|h|   if d is consistent with h
         = 0       otherwise

[Matrix: Species 1 … 10 × features, with hypothesis h and data d]
![Page 284: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/284.jpg)
The size principle

h1 = “even numbers”    h2 = “multiples of 10”

[Number line 1 to 100 marking the extensions of h1 and h2]
![Page 285: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/285.jpg)
The size principle

h1 = “even numbers”    h2 = “multiples of 10”

[Number line 1 to 100 with the observed examples marked]

Data slightly more of a coincidence under h1
![Page 286: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/286.jpg)
The size principle

h1 = “even numbers”    h2 = “multiples of 10”

[Number line 1 to 100 with more observed examples marked]

Data much more of a coincidence under h1
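A quick sketch of the arithmetic behind these three slides, under the stated assumption that examples are sampled uniformly from the true concept’s extension:

```python
# Size-principle likelihoods for h1 = even numbers (|h1| = 50)
# and h2 = multiples of 10 (|h2| = 10).
h1 = {n for n in range(1, 101) if n % 2 == 0}
h2 = {n for n in range(1, 101) if n % 10 == 0}

def likelihood(data, h):
    # each example drawn uniformly at random from h's extension
    return (1.0 / len(h)) ** len(data) if set(data) <= h else 0.0

for data in ([20], [20, 40, 60]):
    ratio = likelihood(data, h2) / likelihood(data, h1)
    print(f"data {data}: likelihood ratio h2 : h1 = {ratio:.0f} : 1")
```

One consistent example favors h2 only 5:1 (“slightly more of a coincidence”); three favor it 125:1 (“much more of a coincidence”).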
![Page 287: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/287.jpg)
Illustrating the size principle

Which argument is stronger?

Grizzly bears have property P.        Grizzly bears have property P.
                                      Brown bears have property P.
                                      Polar bears have property P.
All mammals have property P.          All mammals have property P.

“Non-monotonicity”
![Page 288: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/288.jpg)
Probability that property Q holds for species x:

  p(Q(x)|d) = [ Σ_{h consistent with Q(x), d} p(h)/|h| ] / [ Σ_{h consistent with d} p(h)/|h| ]

[Matrix: Species 1 … 10 × features, with candidate hypotheses, data d, prior p(h), likelihood p(d|h), and the resulting p(Q(x)|d)]
![Page 289: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/289.jpg)
Probability that property Q holds for species x:

  p(Q(x)|d) = [ Σ_{h consistent with Q(x), d} p(h)/|h| ] / [ Σ_{h consistent with d} p(h)/|h| ]

[Matrix: Species 1 … 10 × features, with hypothesis h, data d, prior p(h), likelihood p(d|h)]
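A minimal sketch of this generalization formula. One size-principle factor 1/|h| is applied per example (so n examples contribute 1/|h|^n); the hypotheses and prior are toy assumptions:

```python
# Hypothesis averaging with size-principle likelihoods.
def p_generalize(x, examples, hypotheses, prior):
    n = len(examples)
    num = den = 0.0
    for h, p in zip(hypotheses, prior):
        if all(e in h for e in examples):      # h consistent with d
            w = p / (len(h) ** n)              # p(h) / |h|^n
            den += w
            if x in h:                         # h also consistent with Q(x)
                num += w
    return num / den

hypotheses = [{"horse", "cow"},
              {"horse", "cow", "rhino"},
              {"horse", "cow", "rhino", "dolphin"}]
prior = [0.5, 0.3, 0.2]
print(p_generalize("rhino", ["horse", "cow"], hypotheses, prior))
```

Because smaller consistent hypotheses get larger weight, generalization tightens around the examples as more of them are observed.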
![Page 290: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/290.jpg)
Specifying the prior p(h)

• A good prior must focus on a small subset of all 2^n possible hypotheses, in order to:
  – Match the distribution of properties in the world.
  – Be learnable from limited data.
  – Support computationally efficient inference.
• We consider two approaches:
  – “Empiricist” Bayes: unstructured prior based directly on known features.
  – “Theory-based” Bayes: structured prior based on rational domain theory, tuned to known features.
![Page 291: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/291.jpg)
“Empiricist” Bayes (Heit, 1998)

[Species × features matrix; the candidate hypotheses h1 … h12 are the labelings observed among the known features]

p(h): ten of the hypotheses get prior 1/15 each; the remaining two get 2/15 and 3/15, in proportion to how many known features share each extension.
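A sketch of one way to read this prior: every known feature column counts as a vote for the hypothesis with the same extension, so repeated extensions get proportionally more prior mass. The 4-species matrix is an assumption:

```python
# Feature-counting prior over labelings, in the spirit of the slide.
from collections import Counter

feature_columns = [(1, 1, 0, 0), (1, 1, 0, 0), (0, 1, 1, 0),
                   (1, 0, 0, 1), (1, 1, 0, 0)]

counts = Counter(feature_columns)
total = sum(counts.values())
prior = {h: c / total for h, c in counts.items()}
for h, p in prior.items():
    print(h, p)   # the thrice-repeated extension gets prior 3/5
```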
![Page 292: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/292.jpg)
Results

Max-Sim:            r = 0.77   r = 0.75   r = 0.94
“Empiricist” Bayes: r = 0.38   r = 0.16   r = 0.79
![Page 293: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/293.jpg)
Why doesn’t “Empiricist” Bayes work?

• With no structural bias, requires too many features to estimate the prior reliably.
• An analogy: Estimating a smooth probability density function by local interpolation.

[Density-estimation plots for N = 5, N = 100, N = 500 samples]
![Page 294: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/294.jpg)
Why doesn’t “Empiricist” Bayes work?

• With no structural bias, requires too many features to estimate the prior reliably.
• An analogy: Estimating a smooth probability density function by local interpolation.

[Two N = 5 plots] Assuming an appropriately structured form for the density (e.g., Gaussian) leads to better generalization from sparse data.
![Page 295: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/295.jpg)
“Theory-based” Bayes

Theory: Two principles based on the structure of species and properties in the natural world.

1. Species generated by an evolutionary branching process.
   – A tree-structured taxonomy of species (Atran, 1998).
2. Features generated by stochastic mutation process and passed on to descendants.
   – Novel features can appear anywhere in tree, but some distributions are more likely than others.
![Page 296: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/296.jpg)
[Matrix: Species 1 … 10 × features; tree T over s1 … s10 generates the prior p(h|T)]

Mutation process generates p(h|T):
– Choose label for root.
– Probability that label mutates along branch b:  (1 – e^(–2λ|b|)) / 2
  λ = mutation rate, |b| = length of branch b
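A sketch of sampling labelings from this prior, assuming the two-state reading of the flip probability above, (1 – e^(–2λ|b|))/2; the tree shape, branch lengths, and λ are all made-up values:

```python
# Sample a binary property down a toy taxonomy via the mutation process.
import math, random

def flip_prob(lam, length):
    # probability the label changes state along a branch of this length
    # (assumed symmetric two-state mutation process)
    return 0.5 * (1.0 - math.exp(-2.0 * lam * length))

def sample_labels(tree, label, lam):
    """tree = (branch_length, leaf_name) or (branch_length, [subtrees])."""
    length, rest = tree
    if random.random() < flip_prob(lam, length):
        label = 1 - label
    if isinstance(rest, str):
        return {rest: label}
    labels = {}
    for sub in rest:
        labels.update(sample_labels(sub, label, lam))
    return labels

tree = (0.0, [(0.4, [(0.2, "horse"), (0.2, "cow")]),
              (1.2, "dolphin")])
random.seed(0)
print(sample_labels(tree, label=0, lam=0.5))
```

Labelings that require few flips (e.g., horse and cow labeled together) are sampled far more often than ones that cut the tree in many places, which is exactly the bias illustrated two slides on.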
![Page 297: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/297.jpg)
[Tree T with mutation events (“x”) marked on branches, generating p(h|T)]

Mutation process generates p(h|T):
– Choose label for root.
– Probability that label mutates along branch b:  (1 – e^(–2λ|b|)) / 2
  λ = mutation rate, |b| = length of branch b
![Page 298: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/298.jpg)
Samples from the prior

• Labelings that cut the data along fewer branches are more probable:

  [“monophyletic” labeling]  >  [“polyphyletic” labeling]
![Page 299: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/299.jpg)
Samples from the prior

• Labelings that cut the data along longer branches are more probable:

  [“more distinctive” labeling]  >  [“less distinctive” labeling]
![Page 300: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/300.jpg)
[Matrix: Species 1 … 10 × features; tree T over s1 … s10; prior p(h|T)]

• Mutation process over tree T generates p(h|T).
• Message passing over tree T efficiently sums over all h.
• How do we know which tree T to use?
![Page 301: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/301.jpg)
[Matrix: Species 1 … 10 × features; tree T over s1 … s10; prior p(h|T)]

The same mutation process generates p(Features|T):
– Assume each feature generated independently over the tree.
– Use MCMC to infer most likely tree T and mutation rate λ given observed features.
– No free parameters!
![Page 302: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/302.jpg)
Results

Max-Sim:               r = 0.77   r = 0.75   r = 0.94
“Empiricist” Bayes:    r = 0.38   r = 0.16   r = 0.79
“Theory-based” Bayes:  r = 0.91   r = 0.95   r = 0.91
![Page 303: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/303.jpg)
Grounding in similarity

Reconstruct intuitive taxonomy from similarity judgments:

[Hierarchical clustering dendrogram over: chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal]
![Page 304: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/304.jpg)
Conclusion kind:     “all mammals”   “horses”   “horses”
Number of examples:  3               2          1, 2, or 3

[Scatter plots: human judgments vs. Max-sim, Sum-sim, and Theory-based Bayes]
![Page 305: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/305.jpg)
Explaining similarity

• Why does Max-sim fit so well?
  – An efficient and accurate approximation to this Theory-Based Bayesian model.
  – Theorem. Nearest neighbor classification approximates evolutionary Bayes in the limit of high mutation rate, if domain is tree-structured.
  – Correlation with Bayes on three-premise general arguments, over 100 simulated trees: mean r = 0.94. [Histogram of correlations (r)]
![Page 306: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/306.jpg)
Alternative feature-based models

• Taxonomic Bayes (strictly taxonomic hypotheses, with no mutation process)

  [“monophyletic” labelings only; no “polyphyletic” ones]
![Page 307: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/307.jpg)
Alternative feature-based models

• Taxonomic Bayes (strictly taxonomic hypotheses, with no mutation process)
• PDP network (Rogers and McClelland)

  [Network diagram mapping Species input units to Features output units]
![Page 308: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/308.jpg)
Results

PDP network:         r = 0.41   r = 0.62   r = 0.71   (bias is too weak)
Taxonomic Bayes:     r = 0.51   r = 0.53   r = 0.85   (bias is too strong)
Theory-based Bayes:  r = 0.91   r = 0.95   r = 0.91   (bias is just right!)
![Page 309: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/309.jpg)
Mutation principle versus pure Occam’s Razor
• Mutation principle provides a version of Occam’s Razor, by favoring hypotheses that span fewer disjoint clusters.
• Could we use a more generic Bayesian Occam’s Razor, without the biological motivation of mutation?
![Page 310: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/310.jpg)
[Matrix: Species 1 … 10 × features; tree T over s1 … s10; prior p(h|T)]

Mutation process generates p(h|T):
– Choose label for root.
– Probability that label mutates along branch b:  (1 – e^(–2λ|b|)) / 2
  λ = mutation rate, |b| = length of branch b
![Page 311: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/311.jpg)
[Matrix: Species 1 … 10 × features; tree T over s1 … s10; prior p(h|T)]

Mutation process generates p(h|T):
– Choose label for root.
– Probability that label mutates along branch b:  (1 – e^(–2λ|b|)) / 2
  λ = mutation rate, |b| = length of branch b
![Page 312: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/312.jpg)
Premise typicality effect (Rips, 1975; Osherson et al., 1990):

Strong:                              Weak:
Horses have property P.              Seals have property P.
All mammals have property P.         All mammals have property P.

Conclusion kind: “all mammals”    Number of examples: 1

[Scatter plots: human judgments vs. Max-sim, Bayes (taxonomy+mutation), and Bayes (taxonomy+Occam)]
![Page 313: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/313.jpg)
Typicality meets hierarchies

• Collins and Quillian: semantic memory structured hierarchically.
• Traditional story: Simple hierarchical structure uncomfortable with typicality effects & exceptions.
• New story: Typicality & exceptions compatible with rational statistical inference over hierarchy.
![Page 314: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/314.jpg)
Intuitive versus scientific theories of biology

• Same structure for how species are related.
  – Tree-structured taxonomy.
• Same probabilistic model for traits.
  – Small probability of occurring along any branch at any time, plus inheritance.
• Different features.
  – Scientist: genes
  – People: coarse anatomy and behavior
![Page 315: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/315.jpg)
Induction in Biology: summary

• Theory-based Bayesian inference explains taxonomic inductive reasoning in folk biology.
• Insight into processing-level accounts.
  – Why Max-sim over Sum-sim in this domain?
  – How is hierarchical representation compatible with typicality effects & exceptions?
• Reveals essential principles of domain theory.
  – Category structure: taxonomic tree.
  – Feature distribution: stochastic mutation process + inheritance.
![Page 316: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/316.jpg)
The plan

• Similarity-based models
• Theory-based model
• Bayesian models
  – “Empiricist” Bayes
  – Theory-based Bayes, with different theories
• Connectionist (PDP) models
• Advanced Theory-based Bayes
  – Learning with multiple domain theories
  – Learning domain theories
![Page 317: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/317.jpg)
Property type:    Generic “essence”
Theory structure: Taxonomic tree

[Taxonomic tree over: Lion, Cheetah, Hyena, Giraffe, Gazelle, Gorilla, Monkey]
![Page 318: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/318.jpg)
Property type:    Generic “essence”   Size-related   Food-carried
Theory structure: Taxonomic tree      Dimensional    Directed acyclic network

[Three structures over the same animals (Lion, Cheetah, Hyena, Giraffe, Gazelle, Gorilla, Monkey): a taxonomic tree, a one-dimensional ordering, and a food web]
![Page 319: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/319.jpg)
One-dimensional predicates

• Q = “Have skins that are more resistant to penetration than most synthetic fibers”.
  – Unknown relevant property: skin toughness
  – Model influence of known properties via judged prior probability that each species has Q.

[Latent dimension, increasing skin toughness: house cat < camel < elephant < rhino, with a threshold for Q somewhere along it]
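A sketch of the threshold idea, with a uniform prior over cutoffs standing in for the judged priors mentioned above (the species ordering follows the slide; everything else is an assumption):

```python
# Hypotheses are cutoffs on the latent dimension; Q holds above the cutoff.
species = ["house cat", "camel", "elephant", "rhino"]  # by toughness

# hypothesis k: Q holds for the k toughest species (k = 0 .. 4)
hypotheses = [set(species[len(species) - k:]) for k in range(len(species) + 1)]
prior = [1.0 / len(hypotheses)] * len(hypotheses)

def p_has_Q(x, examples):
    num = den = 0.0
    for h, p in zip(hypotheses, prior):
        if all(e in h for e in examples):   # cutoff consistent with examples
            den += p
            if x in h:
                num += p
    return num / den

print(p_has_Q("rhino", ["camel"]))      # tougher species: strong generalization
print(p_has_Q("house cat", ["camel"]))  # less tough species: weak generalization
```

Learning that the camel has Q generalizes with certainty to everything tougher, but only partially downward, which is the asymmetry the one-dimensional theory is built to capture.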
![Page 320: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/320.jpg)
One-dimensional predicates

[Scatter plots: human judgments vs. Max-sim, Bayes (taxonomy+mutation), and Bayes (1D model)]
![Page 321: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/321.jpg)
Food web model fits (Shafto et al.)

              Mammals      Island
Disease:      r = 0.77     r = 0.82
Property:     r = –0.35    r = –0.05
![Page 322: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/322.jpg)
Taxonomic tree model fits (Shafto et al.)

              Mammals      Island
Disease:      r = –0.12    r = 0.16
Property:     r = 0.81     r = 0.62
![Page 323: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/323.jpg)
The plan

• Similarity-based models
• Theory-based model
• Bayesian models
  – “Empiricist” Bayes
  – Theory-based Bayes, with different theories
• Connectionist (PDP) models
• Advanced Theory-based Bayes
  – Learning with multiple domain theories
  – Learning domain theories
![Page 324: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/324.jpg)
Theory:
• Species organized in taxonomic tree structure
• Feature i generated by mutation process with rate λi

Domain structure: tree S over species S1 … S10 (p(S|T))
Data: species × features matrix, features F1 … F14 attached to the branches where they arise (p(D|S))

λ10 high ~ weight low: a feature like F10 that recurs on several branches gets a high inferred mutation rate and hence low weight.
![Page 325: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/325.jpg)
Theory:
• Species organized in taxonomic tree structure
• Feature i generated by mutation process with rate λi

Domain structure: tree S over species S1 … S10 (p(S|T))
Data: species × features matrix (p(D|S)), plus a new Species X whose features are all unobserved (“?”)
![Page 326: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/326.jpg)
Theory:
• Species organized in taxonomic tree structure
• Feature i generated by mutation process with rate λi

Domain structure: tree S over species S1 … S10, extended with a position SX for the new Species X (p(S|T))
Data: species × features matrix (p(D|S))
![Page 327: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/327.jpg)
Where does the domain theory come from?

• Innate.
  – Atran (1998): The tendency to group living kinds into hierarchies reflects an “innately determined cognitive structure”.
• Emerges (only approximately) through learning in unstructured connectionist networks.
  – McClelland and Rogers (2003).
![Page 328: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/328.jpg)
Bayesian inference to theories

• Challenge to the nativist-empiricist dichotomy.
  – We really do have structured domain theories.
  – We really do learn them.
• Bayesian framework applies over multiple levels:
  – Given hypothesis space + data, infer concepts.
  – Given theory + data, infer hypothesis space.
  – Given X + data, infer theory.
![Page 329: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/329.jpg)
Bayesian inference to theories

• Candidate theories for biological species and their features:
  – T0: Features generated independently for each species (cf. naive Bayes, Anderson’s rational model).
  – T1: Features generated by mutation in tree-structured taxonomy of species.
  – T2: Features generated by mutation in a one-dimensional chain of species.
• Score theories by likelihood on the object-feature matrix:

  p(D|T) = Σ_S p(D|S, T) p(S|T) ≈ max_S p(D|S, T) p(S|T)
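For the simplest candidate theory this score can be computed directly. A sketch, on a toy matrix, of the max-approximation for T0, where the only “structure” to fit is a base rate per feature:

```python
# Approximate p(D|T0): with no latent structure S to search over (treating
# p(S|T0) as constant), the best fit is a per-feature Bernoulli base rate.
import numpy as np

D = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]])          # toy species x features matrix

def log_p_D_given_T0(D):
    total = 0.0
    for col in D.T:
        p = np.clip(col.mean(), 1e-6, 1 - 1e-6)   # ML base rate for feature
        total += np.sum(col * np.log(p) + (1 - col) * np.log(1 - p))
    return total

print(log_p_D_given_T0(D))
```

Scoring T1 or T2 repeats the same computation with a tree or chain mutation model in place of independent base rates, which is what produces the likelihood comparisons on the next slides.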
![Page 330: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/330.jpg)
T0:
• No organizational structure for species.
• Features distributed independently over species.

[Data: species S1 … S10, each bearing an arbitrary subset of the features F1 … F14, with no shared structure]
![Page 331: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/331.jpg)
T0:
• No organizational structure for species.
• Features distributed independently over species.

[Data: species × features matrix in which pairs of species (S1–S2, S3–S4, …) share nearly identical feature sets]
![Page 332: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/332.jpg)
T1:
• Species organized in taxonomic tree structure.
• Features distributed via stochastic mutation process.

T0:
• No organizational structure for species.
• Features distributed independently over species.

(Figure: the same data shown both ways: under T1, species S1–S10 arranged on a taxonomic tree with features F1–F14 placed on its branches; under T0, the flat object-feature matrix.)
![Page 333: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/333.jpg)
T1: p(Data|T1) ≈ 2.42 × 10^-32
• Species organized in taxonomic tree structure.
• Features distributed via stochastic mutation process.

T0: p(Data|T0) ≈ 1.83 × 10^-41
• No organizational structure for species.
• Features distributed independently over species.

(Figure: the same tree layout under T1 and flat object-feature matrix under T0 as on the previous slide. On these data, whose features cluster along a tree, T1's likelihood beats T0's by nine orders of magnitude.)
![Page 334: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/334.jpg)
T0:
• No organizational structure for species.
• Features distributed independently over species.

T1:
• Species organized in taxonomic tree structure.
• Features distributed via stochastic mutation process.

(Figure: a second, unstructured data set shown both ways: the flat object-feature matrix over S1–S10 under T0, and the best-fitting tree under T1, whose branches now carry no coherent pattern of features F1–F14.)
![Page 335: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/335.jpg)
T0: p(Data|T0) ≈ 2.29 × 10^-42
• No organizational structure for species.
• Features distributed independently over species.

T1: p(Data|T1) ≈ 4.38 × 10^-53
• Species organized in taxonomic tree structure.
• Features distributed via stochastic mutation process.

(Figure: the unstructured data set under both models. Here the comparison reverses: T0's likelihood beats T1's by eleven orders of magnitude.)
![Page 336: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/336.jpg)
Empirical tests
• Synthetic data: 32 objects, 120 features, generated from:
  – a tree-structured generative model
  – a linear chain generative model
  – an unconstrained model (independent features)
• Real data:
  – Animal feature judgments: 48 species, 85 features.
  – US Supreme Court decisions, 1981-1985: 9 people, 637 cases.
![Page 337: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/337.jpg)
Results

Data set                            Preferred model
Synthetic, tree-structured          Tree
Synthetic, linear chain             Linear
Synthetic, unconstrained            Null
Animal feature judgments            Tree
US Supreme Court decisions          Linear
![Page 338: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/338.jpg)
Theory acquisition: summary
• So far, just a computational proof of concept.
• Future work:
  – Experimental studies of theory acquisition in the lab, with adult and child subjects.
  – Modeling developmental or historical trajectories of theory change.
• Sources of hypotheses for candidate theories:
  – What is innate?
  – Role of analogy?
![Page 339: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/339.jpg)
Outline
• Morning
  – Introduction (Josh)
  – Basic case study #1: Flipping coins (Tom)
  – Basic case study #2: Rules and similarity (Josh)
• Afternoon
  – Advanced case study #1: Causal induction (Tom)
  – Advanced case study #2: Property induction (Josh)
  – Quick tour of more advanced topics (Tom)
![Page 340: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/340.jpg)
Advanced topics
![Page 341: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/341.jpg)
Structure and statistics
• Statistical language modeling
  – topic models
• Relational categorization
  – attributes and relations
![Page 342: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/342.jpg)
Structure and statistics
• Statistical language modeling
  – topic models
• Relational categorization
  – attributes and relations
![Page 343: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/343.jpg)
Statistical language modeling
• A variety of approaches to statistical language modeling are used in cognitive science:
  – e.g. LSA (Landauer & Dumais, 1997)
  – distributional clustering (Redington, Chater, & Finch, 1998)
• Generative models have unique advantages:
  – identify the assumed causal structure of language
  – make use of standard tools of Bayesian statistics
  – easily extended to capture more complex structure
![Page 344: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/344.jpg)
Generative models for language
(Diagram: latent structure → observed data.)
![Page 345: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/345.jpg)
Generative models for language
(Diagram: meaning → sentences.)
![Page 346: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/346.jpg)
Topic models
• Each document is a mixture of topics.
• Each word is chosen from a single topic.
• Introduced by Blei, Ng, and Jordan (2001) as a reinterpretation of PLSI (Hofmann, 1999).
• The idea of probabilistic topics is widely used (e.g. Bigi et al., 1997; Iyer & Ostendorf, 1996; Ueda & Saito, 2003).
![Page 347: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/347.jpg)
Generating a document
(Diagram: θ, a distribution over topics, generates topic assignments z1, z2, z3, …, each of which generates an observed word w1, w2, w3, ….)
![Page 348: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/348.jpg)
Two example topics, where P(w|z = 1) = φ^(1) and P(w|z = 2) = φ^(2):

word          P(w|z = 1)   P(w|z = 2)
HEART         0.2          0.0
LOVE          0.2          0.0
SOUL          0.2          0.0
TEARS         0.2          0.0
JOY           0.2          0.0
SCIENTIFIC    0.0          0.2
KNOWLEDGE     0.0          0.2
WORK          0.0          0.2
RESEARCH      0.0          0.2
MATHEMATICS   0.0          0.2
![Page 349: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/349.jpg)
Choose mixture weights for each document, then generate a "bag of words":

θ = {P(z = 1), P(z = 2)}

{0, 1}         MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK
{0.25, 0.75}   SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART
{0.5, 0.5}     MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART
{0.75, 0.25}   WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL
{1, 0}         TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
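The sketch below reproduces this generation process for the two topics above: for each word, draw a topic from the document's mixture weights θ, then draw the word from that topic's distribution. The seed and document length are arbitrary choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["HEART", "LOVE", "SOUL", "TEARS", "JOY",
         "SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]
phi = np.array([[0.2] * 5 + [0.0] * 5,     # P(w | z = 1): "emotion" topic
                [0.0] * 5 + [0.2] * 5])    # P(w | z = 2): "science" topic

def generate_document(theta, n_words=10):
    """Bag of words: sample a topic z_n ~ theta for each word,
    then sample the word from phi[z_n]."""
    zs = rng.choice(2, size=n_words, p=theta)
    return [vocab[rng.choice(len(vocab), p=phi[z])] for z in zs]

for theta in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    print(theta, " ".join(generate_document(theta)))
```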
![Page 350: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/350.jpg)
A selection of topics (from 500)

(Figure: columns of high-probability words from several of the 500 learned topics, e.g. THEORY SCIENTISTS EXPERIMENT OBSERVATIONS HYPOTHESIS … (scientific method); SPACE EARTH MOON PLANET ROCKET ASTRONAUTS … (space exploration); ART PAINT ARTIST PAINTING MUSEUM … (art); STUDENTS TEACHER CLASSROOM LEARNING … (education); BRAIN NERVE SENSES SENSORY … (the nervous system); CURRENT ELECTRICITY CIRCUIT VOLTAGE BATTERY … (electricity); NATURE HUMAN PHILOSOPHY MORAL TRUTH … (philosophy); THIRD FIRST SECOND FOURTH GRADE … (ordinals).)
A selection of topics (from 500)

(Figure: further topics: STORY STORIES TELL CHARACTER PLOT … (narrative); MIND DREAM IMAGINATION THOUGHT CONSCIOUSNESS … (the mind); WATER FISH SEA SWIM DOLPHINS … (sea life); DISEASE BACTERIA GERMS INFECTION VIRUS … (disease); FIELD MAGNETIC MAGNET WIRE COMPASS … (magnetism); SCIENCE STUDY SCIENTISTS KNOWLEDGE RESEARCH … (science); BALL GAME TEAM FOOTBALL BASEBALL … (sports); JOB WORK CAREER EMPLOYMENT SKILLS … (careers).)
![Page 352: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/352.jpg)
![Page 353: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/353.jpg)
Learning topic hierarchies
(Blei, Griffiths, Jordan, & Tenenbaum, 2004)
![Page 354: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/354.jpg)
Syntax and semantics from statistics
(Diagram: a hidden Markov chain of syntactic classes x1, x2, x3, … emits the words w1, w2, w3, …; one designated class delegates its emissions to the topic model's semantic variables z.)

semantics: probabilistic topics
syntax: probabilistic regular grammar

Factorization of language based on statistical dependency patterns:
• long-range, document-specific dependencies → semantics (topics)
• short-range dependencies, constant across all documents → syntax

(Griffiths, Steyvers, Blei, & Tenenbaum, submitted)
![Page 355: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/355.jpg)
(Figure: the composite model's parameters. The semantic class (x = 1) mixes topic z = 1 (HEART, LOVE, SOUL, TEARS, JOY, each 0.2; weight 0.4) with topic z = 2 (SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, MATHEMATICS, each 0.2; weight 0.6); class x = 3 emits THE 0.6, A 0.3, MANY 0.1; class x = 2 emits OF 0.6, FOR 0.3, BETWEEN 0.1. The arcs between classes carry transition probabilities 0.9, 0.1, 0.2, 0.8, 0.7, and 0.3.)
![Page 356: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/356.jpg)
(Figure: generating from the model, step 1: the chain starts in the determiner class and emits "THE ………………………………".)
![Page 357: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/357.jpg)
(Figure: step 2: the chain moves to the semantic class, samples topic z = 1, and emits "LOVE": "THE LOVE……………………".)
![Page 358: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/358.jpg)
(Figure: step 3: the preposition class emits "OF": "THE LOVE OF………………".)
![Page 359: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/359.jpg)
(Figure: step 4: back in the semantic class, topic z = 2 is sampled and "RESEARCH" emitted, yielding "THE LOVE OF RESEARCH ……".)
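A compact simulation of this composite generator is sketched below. The emission distributions and mixture weights come from the slides; the class-transition matrix is a hypothetical assignment of the pictured probabilities to specific arcs, since the diagram's arrows cannot be recovered from the transcript.

```python
import numpy as np

rng = np.random.default_rng(1)

theta = [0.4, 0.6]                  # document's weights on topics z = 1, 2
topics = {1: (["HEART", "LOVE", "SOUL", "TEARS", "JOY"], [0.2] * 5),
          2: (["SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH",
               "MATHEMATICS"], [0.2] * 5)}
classes = {2: (["OF", "FOR", "BETWEEN"], [0.6, 0.3, 0.1]),
           3: (["THE", "A", "MANY"], [0.6, 0.3, 0.1])}

# Hypothetical assignment of the slide's transition probabilities to arcs.
trans = {3: {1: 0.9, 2: 0.1}, 1: {2: 0.7, 3: 0.3}, 2: {1: 0.8, 3: 0.2}}

def emit(x):
    if x == 1:                      # semantic class: choose a topic, then a word
        words, probs = topics[rng.choice([1, 2], p=theta)]
    else:                           # syntactic class: emit directly
        words, probs = classes[x]
    return rng.choice(words, p=probs)

x, words = 3, []
for _ in range(4):                  # enough steps for e.g. THE LOVE OF RESEARCH
    words.append(emit(x))
    x = rng.choice(list(trans[x]), p=list(trans[x].values()))
print(" ".join(words))
```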
![Page 360: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/360.jpg)
Semantic categories

(Figure: word clusters recovered by the model, e.g. FOOD FOODS NUTRIENTS DIET … (nutrition); MAP NORTH EARTH SOUTH POLE EQUATOR … (geography); DOCTOR PATIENT HEALTH HOSPITAL … (medicine); BOOK BOOKS READING LIBRARY … (books); GOLD IRON SILVER COPPER METAL … (metals and minerals); BEHAVIOR SELF PERSONALITY EMOTIONAL … (psychology); CELLS CELL ORGANISMS BACTERIA … (cell biology); PLANTS PLANT LEAVES SEEDS SOIL … (plants).)
![Page 361: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/361.jpg)
Syntactic categories

(Figure: word classes recovered by the model, e.g. GOOD SMALL NEW IMPORTANT GREAT … (adjectives); THE HIS THEIR YOUR … (determiners); MORE SUCH LESS MUCH GREATER … (comparatives); ON AT INTO FROM WITH … (prepositions); SAID ASKED THOUGHT TOLD … (verbs of communication); ONE SOME MANY TWO EACH … (quantifiers); HE YOU THEY I SHE … (pronouns); BE MAKE GET HAVE GO … (verbs).)
![Page 362: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/362.jpg)
Statistical language modeling
• Generative models provide:
  – transparent assumptions about the causal process
  – opportunities to combine and extend models
• Richer generative models:
  – probabilistic context-free grammars
  – paragraph- or sentence-level dependencies
  – more complex semantics
![Page 363: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/363.jpg)
Structure and statistics
• Statistical language modeling
  – topic models
• Relational categorization
  – attributes and relations
![Page 364: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/364.jpg)
Relational categorization
• Most approaches to categorization in psychology and machine learning focus on attributes, i.e. properties of objects:
  – words in titles of CogSci posters
• But a significant portion of knowledge is organized in terms of relations:
  – co-authors on posters
  – who talks to whom

(Kemp, Griffiths, & Tenenbaum, 2004)
![Page 365: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/365.jpg)
Attributes and relations
Data and model:
• X: an objects × attributes matrix, modeled with a mixture model (cf. Anderson, 1990):

$$P(X) = \prod_{i} \sum_{z_i} P(z_i) \prod_{k} P(x_{ik} \mid z_i)$$

• Y: an objects × objects relation matrix, modeled with a stochastic blockmodel:

$$P(Y) = \sum_{z} \prod_{i,j} P(y_{ij} \mid z_i, z_j) \prod_{i} P(z_i)$$
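As a sketch of the attribute-side computation, the code below evaluates the mixture likelihood of a binary objects × attributes matrix X, summing out each object's latent class. The two-class parameters are illustrative, not from the slides.

```python
import numpy as np

def log_p_X(X, pi, mu):
    """Mixture model: object i has class z_i ~ pi, and attribute k is then
    Bernoulli(mu[z_i, k]); the class is summed out for each object."""
    # ll[i, c] = log P(x_i | z_i = c), summed over attributes
    ll = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T
    return float(np.sum(np.logaddexp.reduce(ll + np.log(pi), axis=1)))

X = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
pi = np.array([0.5, 0.5])               # prior over the two classes
mu = np.array([[0.9, 0.9, 0.1],         # attribute rates, class 0
               [0.1, 0.1, 0.9]])        # attribute rates, class 1
print(log_p_X(X, pi, mu))
```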
![Page 366: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/366.jpg)
Stochastic blockmodels
• For any pair of objects (i, j), the probability of a relation is determined by their classes (z_i, z_j).
• Allows types of objects and class probabilities to be learned from data.

(Figure: a matrix η of link parameters, with entry η_ab giving the probability of a relation from type a to type b; each entity i has a type z_i in Z.)

$$P(Z, \eta \mid Y) \propto P(Y \mid Z, \eta)\, P(Z)\, P(\eta)$$
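And for the relational side, a minimal sketch of P(Y | Z, η): each directed link i → j is Bernoulli with parameter η[z_i, z_j]. The toy data and parameters are invented (self-links are included for simplicity); a full treatment would place priors on Z and η and search or sample over the posterior above.

```python
import numpy as np

def log_p_Y(Y, z, eta):
    """Stochastic blockmodel: the link i -> j is Bernoulli(eta[z[i], z[j]])."""
    P = eta[np.ix_(z, z)]               # per-pair link probabilities
    return float(np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P)))

Y = np.array([[0, 1, 0, 0],             # dense links within each type,
              [1, 0, 0, 0],             # sparse links between types
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
z = np.array([0, 0, 1, 1])              # type assignments Z
eta = np.array([[0.8, 0.1],             # eta[a, b] = P(link from type a to b)
                [0.1, 0.8]])
print(log_p_Y(Y, z, eta))
```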
![Page 367: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/367.jpg)
Stochastic blockmodels

(Figure: example relational systems whose entities fall into types A, B, C, and D, each pattern of types inducing a different structure of links.)
![Page 368: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/368.jpg)
Categorizing words
• Relational data: word association norms (Nelson, McEvoy, & Schreiber, 1998).
• 5018 x 5018 matrix of associations:
  – symmetrized
  – all words with < 50 and > 10 associates
  – 2513 nodes, 34716 links
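A small sketch of this preprocessing, assuming the norms arrive as a binary cue → associate matrix (the actual file format is an assumption here): symmetrize, then keep words whose number of associates falls in the stated range.

```python
import numpy as np

def preprocess(A, lo=10, hi=50):
    """Symmetrize a binary cue -> associate matrix, then keep words with
    more than lo and fewer than hi associates (as on the slide)."""
    S = ((A + A.T) > 0).astype(int)     # link if association in either direction
    np.fill_diagonal(S, 0)
    deg = S.sum(axis=1)
    keep = (deg > lo) & (deg < hi)
    return S[keep][:, keep]

# Stand-in for the 5018 x 5018 Nelson et al. matrix:
A = (np.random.default_rng(2).random((500, 500)) < 0.04).astype(int)
S = preprocess(A)
print(S.shape, int(S.sum() // 2), "links")
```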
![Page 369: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/369.jpg)
![Page 370: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/370.jpg)
![Page 371: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/371.jpg)
Categorizing words
(Figure: word classes recovered by the blockmodel, e.g. BAND INSTRUMENT BLOW HORN FLUTE BRASS GUITAR PIANO TUBA TRUMPET (musical instruments); TIE COAT SHOES ROPE LEATHER SHOE HAT PANTS WEDDING STRING (clothing); SEW MATERIAL WOOL YARN WEAR TEAR FRAY JEANS COTTON CARPET (fabric); WASH LIQUID BATHROOM SINK CLEANER STAIN DRAIN DISHES TUB SCRUB (cleaning).)
![Page 372: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/372.jpg)
Categorizing actors
• Internet Movie Database (IMDB) data, from the start of cinema to 1960 (Jeremy Kubica).
• Relational data: collaboration.
• 5000 x 5000 matrix of most prolific actors:
  – all actors with < 400 and > 1 collaborators
  – 2275 nodes, 204761 links
![Page 373: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/373.jpg)
![Page 374: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/374.jpg)
![Page 375: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/375.jpg)
Categorizing actors
(Figure: actor classes recovered by the blockmodel, labeled Germany, UK, British comedy, Italian, and US Westerns. Examples: Albert Lieven, Karel Stepanek, Walter Rilla, Anton Walbrook (Germany); Moore Marriott, Laurence Hanray, Gus McNaughton, Gordon Harker, Helen Haye, Margaret Lockwood, … (UK / British comedy); Gino Cervi, Nadia Gray, Amedeo Nazzari, Gina Lollobrigida, … (Italian); Archie Ricks, Helen Gibson, Buck Moulton, Tex Phelps, … (US Westerns).)
![Page 376: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/376.jpg)
Structure and statistics
• The Bayesian approach allows us to specify structured probabilistic models.
• Explore novel representations and domains:
  – topics for semantic representation
  – relational categorization
• Use powerful methods for inference, developed in statistics and machine learning.
![Page 377: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/377.jpg)
Other methods and tools...
• Inference algorithms:
  – belief propagation
  – dynamic programming
  – the EM algorithm and variational methods
  – Markov chain Monte Carlo
• More complex models:
  – Dirichlet processes and Bayesian non-parametrics
  – Gaussian processes and kernel methods
Reading list at http://www.bayesiancognition.com
![Page 378: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/378.jpg)
Taking stock
![Page 379: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/379.jpg)
Bayesian models of inductive learning
• Inductive leaps can be explained with hierarchical Theory-based Bayesian models:

(Diagram: Domain Theory → Structural Hypotheses → Data; each downward arrow is a probabilistic generative model, and Bayesian inference runs back up the hierarchy.)
![Page 380: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/380.jpg)
Bayesian models of inductive learning
• Inductive leaps can be explained with hierarchical Theory-based Bayesian models:

(Diagram: a theory T generates several candidate structures S, each of which generates many observable data sets D.)
![Page 381: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/381.jpg)
Bayesian models of inductive learning
• Inductive leaps can be explained with hierarchical Theory-based Bayesian models.
• What the approach offers:
  – Strong quantitative models of generalization behavior.
  – Flexibility to model the different patterns of reasoning that arise across tasks and domains, using differently structured theories but the same general-purpose Bayesian engine.
  – A framework for explaining why inductive generalization works: where knowledge comes from as well as how it is used.
![Page 382: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/382.jpg)
Bayesian models of inductive learning
• Inductive leaps can be explained with hierarchical Theory-based Bayesian models.
• Challenges:
  – Theories are hard.
![Page 383: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/383.jpg)
Bayesian models of inductive learning
• Inductive leaps can be explained with hierarchical Theory-based Bayesian models.
• The interaction between structure and statistics is crucial:
  – How structured knowledge supports statistical learning, by constraining hypothesis spaces.
  – How statistics supports reasoning with, and learning of, structured knowledge.
  – How complex structures can grow from data, rather than being fully specified in advance.
![Page 384: Bayesian models of inductive learning Josh Tenenbaum & Tom Griffiths MIT Computational Cognitive Science Group Department of Brain and Cognitive Sciences](https://reader038.vdocuments.net/reader038/viewer/2022102900/55181f3b550346ac318b4968/html5/thumbnails/384.jpg)