1
LING 696B: Gradient phonotactics and well-formedness
2
Vote on remaining topics
Topics that have been fixed:
- Morpho-phonological learning (Emily) + (LouAnn's lecture) + Bayesian learning
- Rule induction (Mans) + decision tree learning
- Self-organization (Andy's lecture)
3
Voting on remaining topics
Select 2-3 from the following (need a ranking):
- OT and Stochastic OT
- Alternatives to OT: random fields / maximum entropy
- Minimal Description Length word chopping
- Feature-based lexical access
5
Well-formedness of words (following Mike's talk)
A word "sounds like English" if:
- It is a close neighbor of some words that sound really English, e.g. "pand" is a neighbor of sand, band, pad, pan, ...
- It agrees with what English grammar says an English word should look like, e.g. gradient phonotactics says blick > bnick
Today: relate these two ideas to the non-parametric and parametric perspectives
8
Many ways of calculating probability of a sequence
- Unigrams, bigrams, trigrams, syllable parts, transition probabilities ... no bound on the number of creative ways
- What does it mean to say the "probability" of a phonological word? Objective/frequentist vs. subjective/Bayesian: philosophical (but important)
- Thinking "parametrically" may clarify things: "likelihood" = "probability" calculated from a model
10
Parametric approach to phonotactics
Example: "bag of sounds" assumption / exchangeable distributions: p(blik) = p(lbik) = p(kbli)
Unigram models: N - 1 parameters
[Diagram: four independent segment nodes B L I K]
What is θ? How do we get the estimate θ-hat? How do we assign a probability to "blick"? (See the sketch below.)
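A minimal sketch of these three steps in Python, assuming a toy lexicon and maximum-likelihood estimation (the words and counts are invented for illustration):

```python
from collections import Counter

# Toy training lexicon of segment strings (invented).
lexicon = ["blik", "sand", "band", "pad", "pan", "stik"]

# theta-hat: maximum-likelihood unigram probabilities
# p(w) = count(w) / total segments. N segment types give
# N - 1 free parameters, since the probabilities sum to 1.
counts = Counter(seg for word in lexicon for seg in word)
total = sum(counts.values())
theta = {seg: c / total for seg, c in counts.items()}

def unigram_prob(word):
    """Bag-of-sounds probability: product of segment
    probabilities, invariant under reordering."""
    p = 1.0
    for seg in word:
        p *= theta.get(seg, 0.0)  # zero for unseen segments
    return p

print(unigram_prob("blik") == unigram_prob("kbli"))  # True
```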
11
Parametric approach to phonotactics
Unigram model with overlapping observations: N² - 1 parameters
[Diagram: chain of overlapping two-segment units over B L I K]
Note: the input is #B BL LI IK K#
What is θ? How do we get θ-hat? How do we assign a probability to "blick"? (See the sketch below.)
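The same estimator, but over overlapping two-segment windows of the boundary-padded word; a sketch continuing the toy lexicon above, treating each window as an independent draw:

```python
def windows(word):
    """Overlapping units of a boundary-padded word:
    'blik' -> ['#b', 'bl', 'li', 'ik', 'k#']."""
    padded = "#" + word + "#"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

unit_counts = Counter(u for word in lexicon for u in windows(word))
unit_total = sum(unit_counts.values())

def overlap_prob(word):
    # Each overlapping unit is treated as an independent
    # observation, which is the simplification the slide notes.
    p = 1.0
    for u in windows(word):
        p *= unit_counts.get(u, 0) / unit_total
    return p
```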
12
Parametric approach to phonotactics
Unigram with annotated observations (Coleman and Pierrehumbert)
[Diagram: BL annotated "osif" (onset of strong initial/final syllable); IK annotated "rsif" (rhyme of strong initial/final syllable)]
Input: segments annotated with a syllable parse (see the sketch below)
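A sketch of the annotated-unit idea: observations are (constituent, annotation) pairs, and probabilities are estimated within each annotation class. The one-syllable parse function and its labels here are hypothetical stand-ins, not Coleman and Pierrehumbert's actual parser:

```python
from collections import Counter, defaultdict

def annotate(word):
    """Hypothetical parse: split a CCVC word into an onset and a
    rhyme, tagged as strong initial/final ('osif'/'rsif')."""
    return [(word[:2], "osif"), (word[2:], "rsif")]

class_counts = defaultdict(Counter)
for word in lexicon:  # the toy lexicon from above
    for unit, tag in annotate(word):
        class_counts[tag][unit] += 1

def annotated_prob(word):
    # Product of p(unit | annotation) over the syllable parse.
    p = 1.0
    for unit, tag in annotate(word):
        c = class_counts[tag]
        p *= c.get(unit, 0) / (sum(c.values()) or 1)
    return p
```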
13
Parametric approach to phonotactics
Bigram model: N(N - 1) parameters {p(w_n | w_{n-1})} (how many for a trigram?)
[Diagram: Markov chain B -> L -> I -> K]
Input: segment sequence (see the sketch below)
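A sketch of the bigram model over the same toy lexicon, with # as the word boundary. Conditioning on two previous segments instead of one answers the trigram question: N² contexts times (N - 1) free probabilities each, i.e. N²(N - 1) parameters.

```python
from collections import Counter, defaultdict

bigram_counts = defaultdict(Counter)
for word in lexicon:  # toy lexicon from above
    padded = "#" + word + "#"
    for prev, cur in zip(padded, padded[1:]):
        bigram_counts[prev][cur] += 1

def bigram_prob(word):
    """p(word) = prod_n p(w_n | w_{n-1}), boundaries included."""
    padded = "#" + word + "#"
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        total = sum(bigram_counts[prev].values())
        p *= bigram_counts[prev][cur] / total if total else 0.0
    return p

print(bigram_prob("blik") > bigram_prob("bnik"))  # blick > bnick?
```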
15
Ways that theory might help calculate probability
- Probability calculation must be based on an explicit model
- Need a story about what sequences are
- How can phonology help with calculating sequence probability? More delicate representations; more complex models
- But: phonology is not quite about what sequences are ...
16
More delicate representations
- Would CV phonology help? Auto-segmental tiers, features, gestures?
- The chains are no longer independent: more sophisticated models are needed
- Limit: a generative model of speech production (very hard)
[Diagram: multi-tier representation over the segments B L I K I T]
17
More complex models
Mixture of unigrams: used in document classification (see the sketch below)
[Diagram: latent "lexical strata" node selecting a unigram model over B L I K]
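A sketch of a mixture of unigrams for phonotactics: each lexical stratum has its own unigram distribution and a prior weight. The strata and numbers below are invented; in practice both would be learned, e.g. with EM:

```python
# Hypothetical strata (e.g. native vs. loan vocabulary) with
# invented priors p(z) and per-stratum unigram parameters.
strata = {
    "native": (0.7, {"b": 0.2, "l": 0.2, "i": 0.3, "k": 0.3}),
    "loan":   (0.3, {"b": 0.1, "l": 0.1, "i": 0.4, "k": 0.4}),
}

def mixture_prob(word):
    """p(word) = sum_z p(z) * prod_seg p(seg | z)."""
    total = 0.0
    for prior, theta in strata.values():
        p = prior
        for seg in word:
            p *= theta.get(seg, 0.0)
        total += p
    return total

print(mixture_prob("blik"))
```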
18
More complex models
- More structure in the Markov chain
- Can also model the length distribution with so-called semi-Markov models
[Diagram: states "onset", "rhyme V", "rhyme VC" emitting the chunks BL, IK]
19
More complex models
Probabilistic context-free grammar:
  Syllable --> C + VC  (0.6)
  Syllable --> C + V   (0.35)
  Syllable --> C + C   (0.05)
  C --> _  (0.01)
  C --> b  (0.05)
  ...
See 439/539
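A sketch of how a PCFG scores a word: the probability of a derivation is the product of its rule probabilities. The grammar fragment mirrors the slide; the VC expansion probability is an invented placeholder:

```python
# Rule probabilities; the VC expansion is a placeholder
# that was not given on the slide.
rules = {
    ("Syllable", ("C", "VC")): 0.6,
    ("Syllable", ("C", "V")):  0.35,
    ("Syllable", ("C", "C")):  0.05,
    ("C", ("b",)):   0.05,
    ("VC", ("ik",)): 0.1,  # placeholder
}

def derivation_prob(derivation):
    """Product of rule probabilities along a derivation."""
    p = 1.0
    for rule in derivation:
        p *= rules[rule]
    return p

# "bik" parsed as Syllable -> C VC, C -> b, VC -> ik
print(derivation_prob([
    ("Syllable", ("C", "VC")),
    ("C", ("b",)),
    ("VC", ("ik",)),
]))  # 0.6 * 0.05 * 0.1 = 0.003
```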
20
What's the benefit of doing more sophisticated things?
- Recall: maximum likelihood needs more data to produce a better estimate
- Data sparsity problem: training data is often insufficient for estimating all the parameters, e.g. zero counts
- Lexicon size: we don't have infinitely many words from which to estimate phonotactics
- Smoothing: properly done, it has a Bayesian interpretation (though often it is not; see the sketch below)
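A sketch of add-λ smoothing for unigram counts; with a symmetric Dirichlet(λ) prior, the smoothed estimate is exactly the posterior mean, which is the Bayesian interpretation alluded to above. The counts and inventory are invented:

```python
from collections import Counter

def smoothed_theta(counts, inventory, lam=1.0):
    """Add-lambda smoothing: no segment gets zero probability.
    Equals the posterior mean under a Dirichlet(lambda) prior."""
    total = sum(counts.values()) + lam * len(inventory)
    return {seg: (counts.get(seg, 0) + lam) / total
            for seg in inventory}

toy_counts = Counter({"b": 3, "l": 2, "i": 5, "k": 4})
theta = smoothed_theta(toy_counts, set("blikz"))
print(theta["z"])  # unseen segment, but nonzero after smoothing
```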
21
Probability and well-formedness
- Generative modeling: characterize a distribution over strings
- Why should we care about this distribution? Hope: it may have something to do with grammaticality judgements
- But: judgements are also affected by what other words "sound like". Puzzle of mrupect/mrupation
- It may be easier to model a function with input = string, output = judgements
23
Bailey and Hahn
- Tried all kinds of ways of calculating phonotactics and neighborhood density, and saw which combination "works the best"
- Typical reasoning: "metric X and Y as factors explain 15% of the variance"
- Methodology: ANOVA. Model (1-way): data = overall mean + effect + error, i.e. y_ij = μ + α_i + ε_ij
- What can ANOVA do for us? How do we check whether ANOVA makes sense? What is the "explained variance"? (See the sketch below.)
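A sketch of "explained variance" in the one-way layout: the between-group sum of squares as a fraction of the total sum of squares. The group sizes and ratings are invented:

```python
def explained_variance(groups):
    """R^2 = SS_between / SS_total for a one-way layout,
    where `groups` is a list of lists of ratings."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups)
    return ss_between / ss_total

# Toy ratings grouped by condition (invented numbers)
print(explained_variance([[3, 4, 5], [6, 7, 8], [2, 3, 2]]))
```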
24
Non-parametric approach to similarity neighborhood
- A hint from B&H: in the neighborhood model, d_ij is a weighted edit distance, with A, B, C, D estimated by polynomial regression
- Recall: radial basis functions F(x) = Σ_i a_i K(x, x_i), with K(x, x_i) = e^{-d(x, x_i)}
- The quadratic weighting is ad hoc; one should just do general nonlinear regression with RBFs
25
Non-parametric approach to similarity neighborhood
- Recall: RBF as a "soft" neighborhood model
- Now think of strings also as data points, with the neighborhood defined by some string distance (e.g. edit distance)
- Same kind of regression with RBFs
26
Non-parametric approach to similarity neighborhood
- Key technical point: choosing the right kernel
- Edit-distance kernel: K(x, x_i) = e^{-edit(x, x_i)} (see the sketch below)
- Sub-string kernel: measures the length of common sub-sequences (mrupation)
- Key experimental data: controlled stimuli, split into training and test sets (equal phonotactic probability)
- No need to transform the rating scale
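A sketch of RBF regression over strings with the edit-distance kernel, fit by regularized least squares (kernel ridge regression); the training strings, ratings, and ridge constant are invented for illustration:

```python
import math
import numpy as np

def edit(a, b):
    """Levenshtein distance by dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

def K(x, y):
    """Edit-distance kernel K(x, x_i) = exp(-edit(x, x_i))."""
    return math.exp(-edit(x, y))

# Toy training items with invented well-formedness ratings.
train = ["blick", "bnick", "sand", "mrupe"]
ratings = np.array([6.0, 3.0, 7.0, 2.0])

# Solve (G + lam*I) a = y for the RBF coefficients a_i.
G = np.array([[K(a, b) for b in train] for a in train])
coef = np.linalg.solve(G + 1e-3 * np.eye(len(train)), ratings)

def predict(x):
    """F(x) = sum_i a_i K(x, x_i): a soft neighborhood score."""
    return sum(a * K(x, xi) for a, xi in zip(coef, train))

print(predict("blick"), predict("pand"))
```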
27
Non-parametric approach to similarity neighborhood
A whole range of questions opens up with the non-parametric perspective:
- Would a yes/no task lead to word "anchors", like support vectors?
- Would the new words interact with each other, as seen in transductive inference?
- What type of metric is most appropriate for inferring well-formedness from neighborhoods?
28
Integration
- Hard to integrate with a probabilistic (parametric) model: neighborhood density has a strong non-parametric character -- it grows with the data
- Possible to integrate phonotactic probability into a non-parametric model via kernel algebra: aK1(x,y) + bK2(x,y) and K1(x,y)*K2(x,y) are also kernels
- p-kernel: K(x1, x2) = Σ_h p(x1|h) p(x2|h) p(h), where p comes from a parametric model (see the sketch below)
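A sketch of both combination routes: kernel algebra over the edit-distance kernel from the previous sketch, and a p-kernel summing over a set of toy unigram hypotheses. All weights and parameters are invented:

```python
import math

def unigram_prob(word, theta):
    p = 1.0
    for seg in word:
        p *= theta.get(seg, 1e-6)  # small floor for unseen segments
    return p

# Toy hypotheses h with priors p(h) and unigram parameters p(.|h).
hypotheses = [
    (0.7, {"b": 0.10, "l": 0.10, "i": 0.20, "k": 0.20}),
    (0.3, {"b": 0.05, "l": 0.05, "i": 0.30, "k": 0.30}),
]

def K_p(x, y):
    """p-kernel: K(x1, x2) = sum_h p(x1|h) p(x2|h) p(h)."""
    return sum(ph * unigram_prob(x, th) * unigram_prob(y, th)
               for ph, th in hypotheses)

def K_edit(x, y):
    return math.exp(-edit(x, y))  # edit() as in the sketch above

# Kernel algebra: nonnegative sums and products of kernels
# are again valid kernels.
def K_sum(x, y, a=0.5, b=0.5):
    return a * K_p(x, y) + b * K_edit(x, y)

def K_prod(x, y):
    return K_p(x, y) * K_edit(x, y)
```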