Learning overhypotheses with hierarchical Bayesian models

Charles Kemp, Amy Perfors, Josh Tenenbaum (Developmental Science, 2007)


TRANSCRIPT

  • Learning overhypotheses with hierarchical Bayesian models. Charles Kemp, Amy Perfors, Josh Tenenbaum (Developmental Science, 2007)

  • Learning word meanings from examples. [Figure: three pictures, each labeled "horse".]

  • The "shape bias" in word learning (Landau, Smith & Jones, 1988). English-speaking children have a shape bias, picking the object with the same shape. The shape bias is a useful inductive constraint, or overhypothesis: the majority of early words are labels for object categories, and shape may be the best cue to object-category membership. ["This is a dax." "Show me the dax."]

  • What is the relation between y and x? [Three slides showing y-versus-x data plots.]

  • Overhypotheses:
    - Syntax: Universal Grammar (Chomsky)
    - Phonology: faithfulness constraints, markedness constraints (Prince, Smolensky)
    - Word learning: shape bias, principle of contrast, whole-object bias (Markman)
    - Folk physics: objects are unified, bounded and persistent bodies (Spelke)
    - Predicability: M-constraint (Keil)
    - Folk biology: taxonomic principle (Atran)
    - ...

  • Overhypotheses:
    1. How do overhypotheses guide learning from sparsely observed data?
    2. What form do overhypotheses take, across different domains and tasks?
    3. How are overhypotheses themselves acquired?
    4. How can overhypotheses provide constraints yet maintain flexibility, balancing assimilation and accommodation?

  • The "shape bias" in word learning (Landau, Smith & Jones, 1988). English-speaking children have a shape bias at 24 months of age, but 20-month-olds do not. ["This is a dax." "Show me the dax."]

  • Is the shape bias learned? Smith et al. (2002) trained 17-month-olds on labels for 4 artificial categories. After 8 weeks of training (20 min/week), 19-month-olds show the shape bias: ["This is a dax." "Show me the dax."] [Figure labels: "learned attentional bias"; "transfer learning".]

  • Transfer to real-world vocabulary. The puzzle: the shape bias is a powerful inductive constraint, yet it can be learned from very little data.

  • Learning about feature variability. The intuition: shape varies across categories but is relatively constant within categories; other features (size, color, texture) vary both across and within nameable object categories.

  • Learning about feature variability. Marbles of different colors: [Figure, shown on two slides: several bags of marbles, each mostly one color, and a new bag marked "?".]

  • A hierarchical model. Level 2: bags in general ("color varies across bags but not much within bags"). Level 1: bag proportions (mostly red, mostly brown, mostly blue, mostly green, mostly yellow, ?). Bottom: the observed data.

  • A hierarchical Bayesian model. Level 3: prior expectations on bags in general. Level 2: bags in general. Level 1: bag proportions. Bottom: the observed data.
  • A hierarchical Bayesian model. Inference at Level 1: "Bag 1 is mostly red."
  • A hierarchical Bayesian model. Inference at Level 1: "Bag 2 is mostly yellow."
  • A hierarchical Bayesian model. Inference at Level 2: "Color varies across bags but not much within bags."
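A minimal Python sketch of this three-level model may help fix ideas. It is an illustration added here, not the paper's code: the grid prior over the concentration alpha, the uniform color mean beta, and all the specific numbers are assumptions.

```python
# Illustrative sketch (not the paper's code) of the three-level model:
# Level 3: a grid prior over the concentration alpha (assumed here).
# Level 2: "bags in general" = alpha plus a uniform color mean beta.
# Level 1: each bag's color proportions theta_i ~ Dirichlet(alpha * beta).
# Data:    marble counts x_i ~ Multinomial(theta_i).
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)
n_colors, n_bags, n_draws = 5, 6, 20
beta = np.ones(n_colors) / n_colors   # uniform color mean (an assumption)

# Simulate bags where color varies across bags but not much within bags:
# a small Dirichlet parameter pushes each bag toward a single color.
theta = rng.dirichlet(np.full(n_colors, 0.05), size=n_bags)
counts = np.stack([rng.multinomial(n_draws, t) for t in theta])

def log_evidence(alpha):
    """log P(counts | alpha, beta): Dirichlet-multinomial marginal
    likelihood of all bags, with each bag's theta integrated out."""
    a = alpha * beta
    return sum(gammaln(a.sum()) - gammaln(a.sum() + n.sum())
               + np.sum(gammaln(a + n) - gammaln(a)) for n in counts)

# Level 3 -> Level 2 inference: score alpha on a log-spaced grid (flat prior).
grid = np.logspace(-2, 2, 41)
alpha_hat = grid[np.argmax([log_evidence(al) for al in grid])]
print(f"inferred alpha ~ {alpha_hat:.2f} (small => each bag is mostly one color)")

# Level 1 prediction for a new bag after a single draw of color 0 ("black"):
# the posterior predictive (alpha*beta[0] + 1) / (alpha + 1) is near 1 when
# alpha is small, i.e. one marble is enough to guess the rest of the bag.
p_next = (alpha_hat * beta[0] + 1) / (alpha_hat + 1)
print(f"P(next marble is black | one black draw) ~ {p_next:.2f}")
```

The point of the sketch is the level-2 inference: once alpha is known to be small, a single marble from a new bag licenses a confident prediction about the whole bag.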

  • Learning the shape bias. Assuming independent Dirichlet-multinomial models for each dimension, we learn that shape varies across categories but not within categories, while texture, color, and size vary both across and within categories. [Training figure: categories wib, lug, div.]
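To connect this to the shape bias, here is a toy version of the per-dimension fit in the same style as the sketch above. Only the category names (wib, lug, div) come from the slide; the feature coding, exemplar counts, and uniform means are assumptions of this illustration.

```python
# Toy version of the slide's assumption: an independent Dirichlet-multinomial
# model per feature dimension, fitted by grid search over the concentration.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)
n_values, n_exemplars = 8, 6   # 8 possible values per dimension (assumed)

# Feature-value counts per category: shape repeats within a category,
# color is drawn at random.
shape_counts = np.stack([np.bincount([k] * n_exemplars, minlength=n_values)
                         for k in range(3)])                       # wib, lug, div
color_counts = np.stack([np.bincount(rng.integers(0, n_values, n_exemplars),
                                     minlength=n_values) for _ in range(3)])

def log_evidence(counts, alpha):
    """Dirichlet-multinomial evidence with a uniform mean, summed over categories."""
    a = np.full(counts.shape[1], alpha / counts.shape[1])
    return sum(gammaln(a.sum()) - gammaln(a.sum() + n.sum())
               + np.sum(gammaln(a + n) - gammaln(a)) for n in counts)

grid = np.logspace(-2, 2, 41)
for name, counts in [("shape", shape_counts), ("color", color_counts)]:
    alpha_hat = grid[np.argmax([log_evidence(counts, al) for al in grid])]
    print(f"{name}: fitted alpha ~ {alpha_hat:.2f}")
# A small alpha for shape and a large alpha for color is the learned
# overhypothesis: generalize a new label ("dax") by shape, not by color.
```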

  • Learning the shape bias. ["This is a dax." "Show me the dax."]

  • Extensions. Learning with weaker shape representations. Learning to transfer selectively, dependent on knowledge of ontological kinds: by age ~3, children know that a shape bias is appropriate for solid object categories (ball, book, toothbrush, ...), while a material bias is appropriate for nonsolid substance categories (juice, sand, toothpaste, ...). [Table: training and test feature values per category; shape features: holes, curvature, edges, aspect ratio; other features: main color, color distribution, oriented texture, roughness.]

  • Modeling selective transfer. Let z_i be the ontological kind of category i. Given z, we could learn a separate Dirichlet-multinomial model for each ontological kind: one capturing variability in solidity, shape, and material within kind 1, another within kind 2. [Figure: categories dax, zav, fep, wif, wug, toof, with labels "solid", "non-solid", "dax shape", "toof material".]
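A short sketch of this "given z" case, in the same style as above (the shape counts and the kind assignments z are hypothetical numbers, not data from the paper): partition the categories by their known kind and fit the variability separately inside each kind.

```python
# Sketch of the "given z" case: with known ontological kinds we simply group
# the categories by kind and fit per-dimension variability within each group.
import numpy as np
from scipy.special import gammaln

def log_evidence(counts, alpha):
    a = np.full(counts.shape[1], alpha / counts.shape[1])   # uniform mean
    return sum(gammaln(a.sum()) - gammaln(a.sum() + n.sum())
               + np.sum(gammaln(a + n) - gammaln(a)) for n in counts)

def fit_alpha(counts, grid=np.logspace(-2, 2, 41)):
    return grid[np.argmax([log_evidence(counts, al) for al in grid])]

# Shape-value counts per category (rows). Hypothetical numbers: solid
# categories repeat their shape, non-solid ones do not.
counts = np.array([[4, 0, 0], [0, 4, 0], [0, 0, 4],   # dax, zav, fep (solid)
                   [2, 1, 1], [1, 2, 1], [1, 1, 2]])  # wif, wug, toof (non-solid)
z = np.array([0, 0, 0, 1, 1, 1])                      # known kind of each category

for kind in (0, 1):
    print(f"kind {kind}: shape alpha ~ {fit_alpha(counts[z == kind]):.2f}")
# Kind 0 gets a small alpha (shape is diagnostic), kind 1 a large one (shape
# varies freely), so the shape bias transfers only within the solid kind.
```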

  • Learning to transfer selectively. Chicken-and-egg problem: we don't know the partition into ontological kinds. The input: [Figure: exemplars of zav, dax, wif, wug, labeled solid vs. non-solid.] Solution: define a nonparametric prior over this partition (cf. Roy & Kaelbling, IJCAI 2007).
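One concrete choice of nonparametric prior is a Chinese restaurant process (CRP) over partitions. Below is a brute-force sketch in the same style as the earlier snippets; the CRP concentration, the counts, and the max-over-grid stand-in for integrating out alpha are all assumptions of this illustration, not details from the slide.

```python
# Brute-force sketch: score every partition of six categories by a CRP prior
# plus per-block Dirichlet-multinomial evidence, and report the best one.
import numpy as np
from scipy.special import gammaln

names = ["dax", "zav", "fep", "wif", "wug", "toof"]
counts = np.array([[4, 0, 0], [0, 4, 0], [0, 0, 4],   # hypothetical shape counts
                   [2, 1, 1], [1, 2, 1], [1, 1, 2]])

def partitions(items):
    """Yield every set partition of a list, as a list of blocks."""
    if not items:
        yield []
        return
    head, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [head]] + part[i + 1:]
        yield [[head]] + part

def log_crp(part, gamma=1.0):
    """log CRP prior of a partition with concentration gamma."""
    n = sum(len(b) for b in part)
    return (len(part) * np.log(gamma) + sum(gammaln(len(b)) for b in part)
            + gammaln(gamma) - gammaln(gamma + n))

def log_evidence(block_counts, grid=np.logspace(-2, 2, 21)):
    """Best-alpha Dirichlet-multinomial evidence for one block (uniform mean)."""
    def one(alpha):
        a = np.full(block_counts.shape[1], alpha / block_counts.shape[1])
        return sum(gammaln(a.sum()) - gammaln(a.sum() + c.sum())
                   + np.sum(gammaln(a + c) - gammaln(a)) for c in block_counts)
    return max(one(alpha) for alpha in grid)

best = max(partitions(list(range(6))),
           key=lambda p: log_crp(p) + sum(log_evidence(counts[b]) for b in p))
print([[names[i] for i in b] for b in best])
# With these counts the solid vs. non-solid split should score best: the model
# recovers the ontological kinds and the partition they induce together.
```

The CRP resolves the chicken-and-egg problem because it assigns prior probability to every possible partition, letting the data decide how many kinds there are and which categories belong to each.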

  • Summary. Inductive constraints or overhypotheses are critical for learning so much so fast. New overhypotheses can be learned by children, often very early in development: they are not just innate, nor the result of gradual abstraction from many specific experiences. Hierarchical Bayesian models (HBMs) may help explain the role of overhypotheses in learning, as well as how overhypotheses may themselves be acquired from experience (even relatively little experience): the "blessing of abstraction". Overhypotheses must constrain learning yet also be flexible, capable of revision, extension, or growth. Nonparametric HBMs can navigate this assimilation vs. accommodation tradeoff.

    I'm going to tell you about a broad research program.

    The problems that intrigue me are all things which people do effortlessly and for the most part quite well, but which we still don't know how to get computers to do -- which is a sign that we don't understand the computational basis of how people do these things.