
Prosody as a means of acquiring syntactic categories and building a syntactic skeleton
Gutman A.*, Dautriche I.#, Crabbé B.+ & Christophe A.#

*Zukunftskolleg, Konstanz University, Germany; #LSCP, École Normale Supérieure/CNRS, Paris; +Alpage, Paris Diderot University

Background & Objective

How can children learn a language without explicit training? Children acquiring their native language have to learn abstract syntactic categories (nouns, verbs, adjectives, etc.): these may facilitate the acquisition of word meanings [1] as well as of the syntactic structures of the language. Two-year-olds already have an abstract representation of nouns and verbs [2].

What underlies this feat? We propose to simulate this process with a computational model based on experimental evidence.

A model for bootstrapping syntactic acquisition

We already know that:
- 9-month-olds perceive prosodic boundaries [3]
- Function words are frequent and situated at the edges of prosodic phrases [4]
- Toddlers use function words to categorize the following content words [5]

Given this knowledge, we may construct an approximate syntactic structure [6]:

- Prosodic structure is treated as an approximate shallow syntactic structure: a prosodic phrase = a syntactic phrase.

- Function words and content words serve to categorically label the prosodic phrases (NP, VP).

Result: A syntactic skeleton.

Goal: Testing the feasibility of the labeling component of the model.

[The little boy] [is running fast]
[The little boy]NP [is running fast]VP

(S (NP The little boy) (VP is running fast))

Overview of the computational model

Input: Transcribed corpus of child-directed speech, augmented with prosodic boundaries following the algorithm of [7].

Learning simulation: A prosodic phrase is seen as a probabilistic event, to which we associate features:

- The observed features are the words at the edges of the phrase (L, R).
- The hidden feature is the syntactic category of the phrase (NP, VP).

Ex: [The_{L−1} rabbit_{R−1}]_NP [is_{L0} eating_{R0}]_VP [the_{L1} grass of the garden_{R1}]_NP

We predict the category ϕ of each prosodic phrase from its features f_i using Naïve Bayes:

ϕ = argmax_φ p(φ) ∏_{i=1..l} p(f_i | φ)

The prior probability of the category p(φ) and the probabilities of the different features given that category p(f_i|φ) are estimated via the Expectation-Maximization (EM) algorithm [8]:

- Initialization: Initially assign the prosodic phrases to syntactic clusters.
- Maximization: Estimate the parameters of the model.
- Expectation: Re-assign the prosodic phrases to new syntactic clusters.
- Reiterate the last two steps until convergence.
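As a concrete illustration, the classification and EM steps above can be sketched as follows. This is a simplified hard-assignment ("Viterbi") EM with add-one smoothing, not the authors' actual implementation; the data format and function name are invented for the example:

```python
import math
from collections import defaultdict

def naive_bayes_em(phrases, init_labels, categories, n_iter=20):
    """Hard-assignment EM for a Naive Bayes model over prosodic phrases.

    phrases: list of (left_word, right_word) edge-word pairs, one per phrase.
    init_labels: initial category assignment for each phrase.
    categories: list of category names (e.g. ["NP", "VP"]).
    Returns the final assignment of each phrase to a category.
    """
    vocab = {w for pair in phrases for w in pair}
    V = len(vocab)
    labels = list(init_labels)
    for _ in range(n_iter):
        # Maximization: estimate p(category) and p(word | category, edge)
        # from the current assignment, with add-one smoothing.
        prior = {c: 1.0 for c in categories}
        left = defaultdict(lambda: 1.0)   # (category, word) -> smoothed count
        right = defaultdict(lambda: 1.0)
        for (l, r), c in zip(phrases, labels):
            prior[c] += 1
            left[(c, l)] += 1
            right[(c, r)] += 1

        def score(c, l, r):
            # log p(c) + log p(l | c) + log p(r | c), up to a shared constant
            return (math.log(prior[c])
                    + math.log(left[(c, l)] / (prior[c] + V))
                    + math.log(right[(c, r)] / (prior[c] + V)))

        # Expectation: re-assign each phrase to its most probable category.
        new_labels = [max(categories, key=lambda c: score(c, l, r))
                      for (l, r) in phrases]
        if new_labels == labels:  # convergence: the assignment is stable
            break
        labels = new_labels
    return labels
```

On a toy corpus, a phrase such as ("the", "fish") that was initially mis-assigned gets pulled into the NP cluster because its left edge word "the" overwhelmingly occurs in NP-labelled phrases.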

References

[1] Gillette et al. (1999). Human simulations of vocabulary learning. Cognition, 73, 165–176.
[2] Bernal, S. et al. (2010). Two-year-olds compute syntactic structure on-line. Developmental Science, 13, 69–73.
[3] Gerken et al. (1994). 9-month-olds' sensitivity to phonological versus syntactic phrases. Cognition, 51(3), 237–265.
[4] Morgan, J. L. et al. (1996). Perceptual bases of rudimentary grammatical categories. pp. 263–283.
[5] Cauvet, E. et al. (in press). Function words constrain on-line recognition of verbs and nouns in French 18-month-olds. LLD.
[6] Christophe, A. et al. (2008). Bootstrapping lexical and syntactic acquisition. Language & Speech, 51, 61–75.
[7] Nespor, M. & Vogel, I. (1986). Prosodic Phonology, volume 28. Walter de Gruyter.
[8] Dempster, A. P. et al. (1977). Maximum likelihood from incomplete data via the EM algorithm. JRSS, 39(1), 1–38.
[9] Bergelson, E. et al. (2012). At 6 to 9 months, human infants know the meanings of many common nouns. PNAS, 109.

Experiment 1 - Unsupervised clustering using Function words

Assumptions

- Children use function words to classify phrases.
- Function words are frequent and often the first word of a phrase [4].

Implementation

- Assign each prosodic phrase to a cluster labelled by its first word (the feature L).
- Retain only the clusters corresponding to the most frequent function words.
- Re-assign the rest of the corpus using the EM algorithm.
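The initialization step of Experiment 1 can be sketched as follows (the corpus format, function name, and number of retained clusters are assumptions made for this example, not details from the poster):

```python
from collections import Counter

def initial_clusters(phrases, n_clusters=10):
    """Cluster prosodic phrases by their first word (the feature L).

    phrases: list of word lists, one list per prosodic phrase.
    Returns (labels, unassigned): labels maps phrase index -> cluster label
    for the retained clusters (those headed by the most frequent first words,
    which in child-directed speech are overwhelmingly function words such as
    "the" or "is"); unassigned lists the remaining phrase indices, to be
    re-assigned later by the EM algorithm.
    """
    first_words = [p[0] for p in phrases]
    frequent = {w for w, _ in Counter(first_words).most_common(n_clusters)}
    labels, unassigned = {}, []
    for i, w in enumerate(first_words):
        if w in frequent:
            labels[i] = w          # cluster named after its first word
        else:
            unassigned.append(i)   # left for the EM re-assignment step
    return labels, unassigned
```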

Evaluation & Results

The quality of each cluster is evaluated by examining the largest syntactic category which it has captured:

purity(Cl) = (number of hits of the largest category) / (cluster size)
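The purity measure amounts to a few lines of code (a minimal sketch; `purity` is a hypothetical helper name):

```python
from collections import Counter

def purity(gold_categories):
    """Purity of one cluster: the share of the cluster taken up by its
    largest gold syntactic category.

    gold_categories: the gold category of each phrase in the cluster.
    """
    counts = Counter(gold_categories)
    return max(counts.values()) / len(gold_categories)
```

For instance, a cluster containing three NPs and one VP has purity 3/4 = 0.75.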

Up-side:
- Relying on function words appearing at the edge of the prosodic phrase provides a powerful classification heuristic.

Down-side:
- No cluster corresponds to a single category (VP or NP); instead, these categories are distributed over several clusters.

Experiment 2 - Semi-supervised classification using Content words

Assumptions

- The child already knows some of the most frequent content words [9], and can identify them as objects (nouns) or actions (verbs) [4].
- Based on this knowledge she can identify the class of the current prosodic phrase.

Implementation

- Pre-define a list of known frequent content words based on the corpus: a semantic seed.
- Use the last word of each phrase (the feature R) to assign it to one of three classes: Nominal, Verbal, or Other.
- Re-assign the unclassified part of the corpus using the EM algorithm.
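The seeding step might look as follows (the seed word lists here are invented toy examples, not the actual seed extracted from the corpus; unseeded phrases are simply left for the EM re-assignment step):

```python
# Hypothetical semantic seed: a few content words the child is assumed to
# know, identified as objects (nouns) or actions (verbs).
SEED_NOUNS = {"boy", "dog", "ball"}
SEED_VERBS = {"eat"}

def seed_class(phrase):
    """Classify a prosodic phrase from its last word (the feature R).

    phrase: list of words. Returns "Nominal" or "Verbal" when the last word
    belongs to the semantic seed, and None when the phrase stays unclassified
    (to be handled by the EM re-assignment step).
    """
    last = phrase[-1]
    if last in SEED_NOUNS:
        return "Nominal"
    if last in SEED_VERBS:
        return "Verbal"
    return None
```

Usage: `seed_class(["the", "little", "boy"])` yields `"Nominal"`, while a phrase ending in an unknown word is left unclassified.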

[Figure: Proportion of each category (NP, VP, Other) in each cluster, for the smallest seed (2% of phrases initially classified).]

Evaluation & Results

We vary the size of the semantic seed (known words), starting from {3 Nouns, 1 Verb}, and compare to a uniform random clustering with 3 clusters:

- The precision is very good even with a small semantic seed (highly precise VP and NP clusters, which span about 50% of these categories).
- The recall is lower, since the known words do not cover all contexts. But precision is more important than recall for learning!
- Although based on content words, the learning algorithm ultimately relies on function words: knowing a few content words allows the learner to discover the function words associated with the category label.

Conclusion

- The syntactic skeleton hypothesis is corroborated: the language learner can rely on prosodic boundaries, function words and content words in order to construct a syntactic skeleton.
- Experiment 1: While the prosodic boundaries give the structure of the syntactic skeleton, our random-EM baseline shows that the prosodic structure alone does not permit the inference of meaningful phrasal categories. However, relying on function words permits the construction of good categories (high purity).
- Experiment 2: Relying on knowledge of frequent content words can lead to the emergence of abstract syntactic categories. These abstract categories (VP and NP) are grounded in linguistic experience (frequency of words) and semantic experience (verbs = actions, nouns = objects).
- Our model shows that a small semantic seed allows the discovery of function words, and that knowledge of function words may help in the classification of novel words.

Corresponding author: [email protected]