
Learning overhypotheses with hierarchical Bayesian models

Charles Kemp, Amy Perfors, Josh Tenenbaum (Developmental Science, 2007)

Learning word meanings from examples

[Three example images, each labeled “horse”]

The “shape bias” in word learning (Landau, Smith, Jones 1988)

This is a dax. Show me the dax…

• English-speaking children have a “shape bias”, picking the object with the same shape.

• The shape bias is a useful inductive constraint or “overhypothesis”: the majority of early words are labels for object categories, and shape may be the best cue to object category membership.

What is the relation between y and x?

[Three slides showing y-versus-x data]

Overhypotheses

• Syntax: Universal Grammar (Chomsky)

• Phonology: Faithfulness constraints, Markedness constraints (Prince, Smolensky)

• Word learning: Shape bias, Principle of contrast, Whole object bias (Markman)

• Folk physics: Objects are unified, bounded and persistent bodies (Spelke)

• Predicability: M-constraint (Keil)

• Folk biology: Taxonomic principle (Atran)

• …

Overhypotheses

1. How do overhypotheses guide learning from sparsely observed data?

2. What form do overhypotheses take, across different domains and tasks?

3. How are overhypotheses themselves acquired?

4. How can overhypotheses provide constraints yet maintain flexibility, balancing assimilation and accommodation?

The “shape bias” in word learning (Landau, Smith, Jones 1988)

This is a dax. Show me the dax…

• English-speaking children have a “shape bias” at 24 months of age, but 20-month-olds do not. …

Is the shape bias learned?

• Smith et al. (2002) trained 17-month-olds on labels for 4 artificial categories:

• After 8 weeks of training (20 min/week), 19-month-olds show the shape bias:

[Four artificial training categories labeled “wib”, “lug”, “zup”, “div”]

This is a dax.

Show me the dax…

“Learned attentional bias”

“Transfer learning”

Transfer to real-world vocabulary

The puzzle: The shape bias is a powerful inductive constraint, yet can be learned from very little data.

Learning about feature variability

[The four training categories “wib”, “lug”, “zup”, “div”]

The intuition:

– Shape varies across categories but is relatively constant within categories.

– Other features (size, color, texture) vary both across and within nameable object categories.
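A toy count makes this concrete. The exemplars below are invented (only the category names are borrowed from the training study); the script simply tallies how many distinct values each feature takes across the whole set versus within a single category:

```python
# Invented exemplars: (category, shape, color, size).
exemplars = [
    ("wib", "star",  "red",   "small"), ("wib", "star",  "blue",  "large"),
    ("lug", "ring",  "green", "small"), ("lug", "ring",  "red",   "large"),
    ("div", "cross", "blue",  "small"), ("div", "cross", "green", "small"),
]

for j, feature in enumerate(["shape", "color", "size"], start=1):
    across = len({e[j] for e in exemplars})  # distinct values overall
    per_category = {}
    for e in exemplars:
        per_category.setdefault(e[0], set()).add(e[j])
    within = sum(len(v) for v in per_category.values()) / len(per_category)
    print(f"{feature}: {across} values across categories, "
          f"{within:.1f} on average within a category")
# shape: 3 across, 1.0 within -- varies across categories only.
# color and size: vary both across and within categories.
```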

Learning about feature variability

Marbles of different colors: …

[Marbles drawn from several bags; the contents of a new bag are marked “?”]

A hierarchical model

Level 2: Bags in general (“Color varies across bags but not much within bags”)

Level 1: Bag proportions (mostly red, mostly brown, mostly blue, mostly yellow, mostly green?)

Data

A hierarchical Bayesian model

Level 3: Prior expectations on bags in general

Level 2: Bags in general

Level 1: Bag proportions

Data

Simultaneously infer all three levels from the observed draws.

The same hierarchy supports inferences at every level:

– Level 1: “Bag 1 is mostly red”; “Bag 2 is mostly yellow”

– Level 2: “Color varies across bags but not much within bags”
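In this model, each bag's color proportions are drawn from a Dirichlet distribution whose mean (Level 2's “bags in general”) and concentration (how uniform each bag is) are themselves inferred; Level 3 supplies priors on those hyperparameters. The sketch below is a minimal illustration, not the authors' implementation: it assumes just two colors (so the Dirichlet-multinomial reduces to a beta-binomial), rough grid priors over the hyperparameters, and invented training counts.

```python
# Minimal sketch of Level-2 inference in a two-color Dirichlet-multinomial
# hierarchy (illustrative; not the authors' code). alpha controls
# within-bag variability; beta is the overall proportion of black marbles.
import numpy as np
from scipy.special import betaln

# Invented data: (black, white) counts drawn from each training bag.
# Each bag is nearly uniform in color, as in the marbles example.
bags = np.array([[19, 1], [18, 2], [1, 19], [2, 18], [20, 0]])

# Grids over the hyperparameters (assumed priors: uniform over the grid,
# i.e. roughly uniform in log alpha and uniform in beta).
alphas = np.logspace(-2, 2, 100)      # small alpha => bags are near-uniform
betas = np.linspace(0.01, 0.99, 99)   # overall fraction of black marbles

def log_marginal(counts, a, b):
    """Beta-binomial log evidence for one bag, up to a constant."""
    x, y = counts
    return betaln(a + x, b + y) - betaln(a, b)

# Joint posterior over the (alpha, beta) grid given all training bags.
log_post = np.array([[sum(log_marginal(c, al * be, al * (1 - be)) for c in bags)
                      for be in betas] for al in alphas])
post = np.exp(log_post - log_post.max())
post /= post.sum()

# A new bag yields a single black marble. Holding the (alpha, beta)
# posterior fixed, P(next draw is black) = E[(alpha*beta + 1)/(alpha + 1)].
A, B = np.meshgrid(alphas, betas, indexing="ij")
p_black = (((A * B + 1) / (A + 1)) * post).sum()
print(f"P(next marble from the new bag is black) = {p_black:.2f}")  # near 1
```

Because the training bags are internally uniform, the posterior concentrates on small alpha, so one black marble is enough to conclude the new bag is mostly black: an overhypothesis learned at Level 2 drives confident generalization from a single example.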

Learning the shape bias

[Training stimuli: the four categories “wib”, “lug”, “zup”, “div”; model results for “wib”, “lug”, “div”]

Assuming independent Dirichlet-multinomial models for each dimension, we learn that:

– Shape varies across categories but not within categories.

– Texture, color, and size vary both across and within categories.
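A sketch of how that conclusion arises, run independently on each dimension. For brevity it fixes the Level-2 value distribution to uniform and infers only the concentration alpha per dimension; the feature counts are invented stand-ins for the training stimuli:

```python
# Per-dimension concentration inference (illustrative; invented counts).
# A small inferred alpha means the feature is uniform within categories.
import numpy as np
from scipy.special import gammaln

K = 4  # assumed number of distinct values a feature can take

def log_evidence(counts, alpha):
    """Dirichlet-multinomial log evidence for one category's value counts,
    under a symmetric Dirichlet(alpha/K) prior on its value proportions."""
    return (gammaln(alpha) - gammaln(alpha + counts.sum())
            + np.sum(gammaln(alpha / K + counts) - gammaln(alpha / K)))

def mean_alpha(per_category_counts, alphas=np.logspace(-2, 2, 200)):
    """Posterior mean of alpha on a grid (prior uniform over the grid)."""
    log_post = np.array([sum(log_evidence(c, a) for c in per_category_counts)
                         for a in alphas])
    post = np.exp(log_post - log_post.max())
    return (alphas * post).sum() / post.sum()

# Three exemplars per category: shape takes one value per category,
# while color is scattered within each category.
shape = [np.array([3, 0, 0, 0]), np.array([0, 3, 0, 0]), np.array([0, 0, 3, 0])]
color = [np.array([1, 1, 1, 0]), np.array([0, 1, 1, 1]), np.array([1, 0, 1, 1])]

print("alpha for shape:", mean_alpha(shape))  # small: uniform within categories
print("alpha for color:", mean_alpha(color))  # larger: varies within categories
```

The asymmetry in the inferred concentrations is the learned shape bias: a single “dax” exemplar then pins down the shape of daxes but says little about their color.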

Learning the shape bias

This is a dax. Show me the dax…

[Model generalization results: training vs. test]

Extensions

• Learning with weaker shape representations.

• Learning to transfer selectively, depending on knowledge of ontological kinds.

– By age ~3, children know that a shape bias is appropriate for solid object categories (ball, book, toothbrush, …), while a material bias is appropriate for nonsolid substance categories (juice, sand, toothpaste, …).

Category features:

– Shape features: holes, curvature, edges, aspect ratio

– Other features: main color, color distribution, oriented texture, roughness

[Training and test feature matrices for the six categories “dax”, “zav”, “fep”, “wif”, “wug”, “toof”]

Variability in solidity, shape, and material within kind 1

[Figure; labels include “dax” (shape), “toof” (material), solid, non-solid]

Modeling selective transfer

Let z_i be the ontological kind of category i. Given z, we could learn a separate Dirichlet-multinomial model for each ontological kind:

Variability in solidity, shape, and material within kind 2

Chicken-and-egg problem: We don’t know the partition into ontological kinds.

The input: [the raw category data, with kind assignments unobserved]

Solution: Define a nonparametric prior over this partition.
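A standard choice for such a prior is the Chinese restaurant process (CRP), under which each category joins an existing kind with probability proportional to that kind's size, or founds a new kind with probability proportional to a concentration parameter. The sketch below shows only how the CRP generates partitions (the parameter value is illustrative); the full model would couple these kind assignments with per-kind Dirichlet-multinomial models and infer everything jointly.

```python
# Sampling a partition of categories into ontological kinds from a
# Chinese restaurant process prior (illustrative sketch).
import numpy as np

def sample_crp(n_categories, gamma=1.0, seed=0):
    """Return a list mapping each category to a kind index."""
    rng = np.random.default_rng(seed)
    kinds = []   # kinds[i] = kind assigned to category i
    sizes = []   # sizes[k] = number of categories currently in kind k
    for _ in range(n_categories):
        # Existing kinds are chosen in proportion to their size;
        # a brand-new kind is chosen in proportion to gamma.
        weights = np.array(sizes + [gamma], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(sizes):
            sizes.append(0)  # open a new kind
        sizes[k] += 1
        kinds.append(k)
    return kinds

print(sample_crp(10))  # e.g. a partition like [0, 0, 1, 0, 0, 2, ...]
```

Because the CRP puts no ceiling on the number of kinds, the model can introduce a new ontological kind when the data demand it, which is exactly the flexibility the chicken-and-egg problem requires.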

Learning to transfer selectively

[Generalization results for solid vs. non-solid categories “zav”, “dax”, “wif”, “wug”]

(cf. Roy & Kaelbling, IJCAI 2007)

Summary

• Inductive constraints or “overhypotheses” are critical for learning so much so fast. New overhypotheses can be learned by children, often very early in development.

– Not just innate, nor the result of gradual abstraction from many specific experiences.

• Hierarchical Bayesian models (HBMs) may help explain the role of overhypotheses in learning, as well as how overhypotheses may themselves be acquired from experience (even relatively little experience).

– The “blessing of abstraction”

• Overhypotheses must constrain learning yet also be flexible, capable of revision, extension, or growth.

– Nonparametric HBMs can navigate this “assimilation vs. accommodation” tradeoff.