Learning overhypotheses with hierarchical Bayesian models
Charles Kemp, Amy Perfors, Josh Tenenbaum (Developmental Science, 2007)

Learning word meanings from examples
[Images of horses, each labeled “horse”]
The “shape bias” in word learning (Landau, Smith, Jones 1988)
This is a dax. Show me the dax…
• English-speaking children have a “shape bias”, picking the object with the same shape.
• The shape bias is a useful inductive constraint or “overhypothesis”: the majority of early words are labels for object categories, and shape may be the best cue to object-category membership.
Overhypotheses
• Syntax: Universal Grammar (Chomsky)
• Phonology: Faithfulness constraints, Markedness constraints (Prince, Smolensky)
• Word learning: Shape bias, Principle of contrast, Whole object bias (Markman)
• Folk physics: Objects are unified, bounded and persistent bodies (Spelke)
• Predicability: M-constraint (Keil)
• Folk biology: Taxonomic principle (Atran)
• …
1. How do overhypotheses guide learning from sparsely observed data?
2. What form do overhypotheses take, across different domains and tasks?
3. How are overhypotheses themselves acquired?
4. How can overhypotheses provide constraints yet maintain flexibility, balancing assimilation and accommodation?
The “shape bias” in word learning (Landau, Smith, Jones 1988)
This is a dax. Show me the dax…
• English-speaking children have a “shape bias” at 24 months of age, but 20-month-olds do not.
Is the shape bias learned?
• Smith et al. (2002) trained 17-month-olds on labels for 4 artificial categories:
• After 8 weeks of training (20 min/week), 19-month-olds show the shape bias:
[Stimuli: training categories labeled “wib”, “lug”, “zup”, “div”; test: “This is a dax. Show me the dax…”]
“Learned attentional bias”
“Transfer learning”
Transfer to real-world vocabulary
The puzzle: The shape bias is a powerful inductive constraint, yet can be learned from very little data.
Learning about feature variability
[Training stimuli: categories labeled “wib”, “lug”, “zup”, “div”]
The intuition:
– Shape varies across categories but is relatively constant within categories.
– Other features (size, color, texture) vary both across and within nameable object categories.
A hierarchical model

Level 2: Bags in general — “Color varies across bags but not much within bags”
Level 1: Bag proportions — mostly red, mostly brown, mostly blue, mostly yellow, mostly green?
Data: draws from each bag …
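A formal sketch of the two lower levels, in notation of our own choosing (the slides give only the picture): write θ_i for bag i’s color proportions, α for how variable colors are within a bag, and β for the average color distribution across bags. Then:

```latex
% Sketch of the bag model (notation ours): Level 2 is (alpha, beta),
% Level 1 is each bag's own proportions, the data are draws per bag.
\theta_i \mid \alpha, \beta \;\sim\; \mathrm{Dirichlet}(\alpha\beta)
\qquad
y_i \mid \theta_i \;\sim\; \mathrm{Multinomial}(n_i, \theta_i)
```

A small α makes each θ_i nearly degenerate on one color, which is exactly “color varies across bags (via β) but not much within bags (via small α)”.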
A hierarchical Bayesian model

Level 3: Prior expectations on bags in general
Level 2: Bags in general
Level 1: Bag proportions
Data

Simultaneously infer:
– “Bag 1 is mostly red”
– “Bag 2 is mostly yellow”
– “Color varies across bags but not much within bags”
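A minimal generative sketch of all three levels in Python, assuming the hyperpriors commonly paired with this model (α exponential, β uniform Dirichlet); the variable names and constants here are illustrative, not from the paper:

```python
# Minimal sketch of the three-level generative process (assumptions ours:
# alpha ~ Exponential(1), beta ~ uniform Dirichlet; constants illustrative).
import numpy as np

rng = np.random.default_rng(0)
K = 5            # number of colors
n_bags = 4       # number of bags
n_draws = 20     # marbles drawn per bag

# Level 3 -> Level 2: prior expectations on "bags in general"
alpha = rng.exponential(scale=1.0)   # within-bag variability (small = pure bags)
beta = rng.dirichlet(np.ones(K))     # average color distribution across bags

# Level 2 -> Level 1: each bag's own color proportions
thetas = rng.dirichlet(alpha * beta, size=n_bags)

# Level 1 -> Data: marbles drawn from each bag
data = np.array([rng.multinomial(n_draws, theta) for theta in thetas])
print(data)  # one row of color counts per bag
```

Inference runs the other way: given the counts, the model simultaneously recovers each bag’s θ_i and the shared (α, β), so a single draw from a new bag already supports a guess about the rest of that bag.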
Learning the shape bias

[Training stimuli: exemplars of categories labeled “wib”, “lug”, “zup”, “div”]

Assuming independent Dirichlet-multinomial models for each dimension, we learn that:
– Shape varies across categories but not within categories.
– Texture, color, size vary across and within categories.
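To make this inference concrete, here is a hedged sketch that scores candidate α values for each feature dimension by the Dirichlet-multinomial marginal likelihood of the per-category counts, with β fixed to uniform for brevity (the full model infers β as well; all names and toy counts below are ours):

```python
# Hedged sketch: per-dimension alpha by maximum marginal likelihood under a
# Dirichlet-multinomial model with fixed uniform beta (simplification ours).
import numpy as np
from scipy.special import gammaln

def log_marginal(counts, alpha, K):
    """Log Dirichlet-multinomial likelihood of one category's feature counts
    (the multinomial coefficient is dropped; it does not depend on alpha)."""
    a = alpha / K                      # uniform beta: pseudo-count alpha/K per value
    n = counts.sum()
    return (gammaln(alpha) - gammaln(n + alpha)
            + np.sum(gammaln(counts + a) - gammaln(a)))

def best_alpha(category_counts, K, grid=np.logspace(-2, 2, 200)):
    """Pick the alpha maximizing the summed marginal likelihood over categories."""
    scores = [sum(log_marginal(c, a, K) for c in category_counts) for a in grid]
    return grid[int(np.argmax(scores))]

K = 4  # feature values per dimension (illustrative)
# Shape: every exemplar in a category shares one value -> pure counts.
shape = [np.array([6, 0, 0, 0]), np.array([0, 6, 0, 0]), np.array([0, 0, 6, 0])]
# Color: values scattered within every category.
color = [np.array([2, 2, 1, 1]), np.array([1, 2, 2, 1]), np.array([2, 1, 1, 2])]

print(best_alpha(shape, K))  # small alpha: low within-category variability
print(best_alpha(color, K))  # large alpha: high within-category variability
```

A dimension whose categories are internally pure, like shape here, is best explained by a small α; a scattered dimension, like color, by a large one. A small shape-α then acts as the shape bias: one “dax” exemplar pins down the shape of all daxes.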
Extensions
• Learning with weaker shape representations.
• Learning to transfer selectively, dependent on knowledge of ontological kinds.
– By age ~3, children know that a shape bias is appropriate for solid object categories (ball, book, toothbrush, …), while a material bias is appropriate for nonsolid substance categories (juice, sand, toothpaste, …).
[Stimulus representation: each category is coded on shape features (holes, curvature, edges, aspect ratio) and other features (main color, color distribution, oriented texture, roughness); the table of training and test feature values is omitted here, with the test category’s shape features unobserved (“?”).]
[Figure: variability in solidity, shape, and material within kind 1 — categories labeled “dax”, “zav”, “fep”, “wif”, “wug”, “toof”, spanning solid and non-solid kinds; e.g., “dax” is a solid category organized by shape, “toof” a non-solid category organized by material.]
Modeling selective transfer
Let z_i be the ontological kind of category i. Given z, we could learn a separate Dirichlet-multinomial model for each ontological kind: one capturing variability in solidity, shape, and material within kind 1, another within kind 2.
Chicken-and-egg problem: we don’t know the partition into ontological kinds — the input is just the labeled exemplars.
Solution: define a nonparametric prior over this partition.
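The standard choice of nonparametric prior over partitions is the Chinese restaurant process (CRP), which is what we assume in this hedged sketch of drawing category-to-kind assignments (the concentration value and names are illustrative):

```python
# Hedged sketch: sample a partition of categories into ontological kinds
# from a Chinese restaurant process (CRP). The CRP is a prior over all
# partitions that does not fix the number of kinds in advance.
import numpy as np

def sample_crp_partition(n_categories, concentration=1.0, seed=0):
    rng = np.random.default_rng(seed)
    kinds = []                     # kinds[j] = number of categories in kind j
    assignment = []                # assignment[i] = kind index of category i
    for i in range(n_categories):
        # An existing kind is chosen with probability proportional to its
        # size; a brand-new kind, proportional to the concentration parameter.
        weights = np.array(kinds + [concentration], dtype=float)
        j = rng.choice(len(weights), p=weights / weights.sum())
        if j == len(kinds):
            kinds.append(0)        # open a new kind
        kinds[j] += 1
        assignment.append(j)
    return assignment

print(sample_crp_partition(6))  # e.g. kinds for “dax”, “zav”, “fep”, “wif”, “wug”, “toof”
```

Inference then jointly recovers the partition z and a per-kind Dirichlet-multinomial model, resolving the chicken-and-egg problem: categories that pattern alike (shape-consistent vs. material-consistent) end up in the same kind.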
Learning to transfer selectively

[Figure: the model partitions categories into solid and non-solid kinds — exemplars of “zav”, “dax”, “wif”, and “wug” — and transfers the appropriate bias (shape or material) within each kind.]
(cf. Roy & Kaelbling, IJCAI 2007)
Summary
• Inductive constraints or “overhypotheses” are critical for learning so much, so fast. New overhypotheses can be learned by children, often very early in development.
– Not just innate, nor the result of gradual abstraction from many specific experiences.
• Hierarchical Bayesian models (HBMs) may help explain the role of overhypotheses in learning, as well as how overhypotheses may themselves be acquired from experience (even relatively little experience).
– The “blessing of abstraction”
• Overhypotheses must constrain learning yet also be flexible, capable of revision, extension or growth.
– Nonparametric HBMs can navigate this “assimilation vs. accommodation” tradeoff.