NELS 38, University of Ottawa, Oct. 27th, 2007

Constructing constraints from language data: The case of Canadian English diphthongs1

Paul Boersma and Joe Pater
University of Amsterdam and University of Massachusetts, Amherst

1. Overview

(1) § 2 Proposal

Learning of structural constraints from schema applied to observed forms

§ 3 Illustration

Learning semi-predictable distribution of diphthong pairs [ai] / [ʌi] and [au] / [ʌu] in Canadian English, including natural and unnatural constraints

§ 4 Proposal, Illustration

How natural phonological constraints emerge from phonetic biases in a bidirectional model of phonology and phonetics: phonologization of gradient raising

2. Learning of structural constraints from observed forms

2.1 Background assumptions

Optimality Theory (OT) translated to Harmonic Grammar (HG) (original translation due to Prince and Smolensky 1993/2004: 236; HG, and probably the more general notion of numerically weighted linguistic constraints, is due to Legendre, Miyata and Smolensky 1990; original application of HG to phonology is Goldsmith 1990/1993; for further references and discussion see Smolensky and Legendre 2006 and Pater, Bhatt and Potts 2007)

(2)               2         1
      bat       NOCODA     MAX      H
        bat                 1       1
    ☞   ba        1                 2

The HG tableau conventions assumed here:

(3) i. The cell above each constraint's name is that constraint's weight

ii. Constraints assign positive scores for satisfaction

iii. The rightmost cell in each candidate's row is that candidate's Harmony (sum of weighted constraint scores)

1 Special thanks to Michael Becker for indispensable technical help (AKA OT-Help), and to Karen Jesney for help with native speaker judgments. Thanks also to them and Adam Albright, Ricardo Bermúdez-Otero, Bruce Hayes, Bill Idsardi, Elliot Moreton and Matt Wolf for very useful discussion.


iv. The optimal candidate is the one with the highest Harmony
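For concreteness, the conventions in (3i-iv) amount to a few lines of arithmetic. The sketch below is only an illustration under our own dictionary-based encoding, not any existing implementation:

```python
# A minimal sketch of HG evaluation as defined in (3): constraints assign positive
# satisfaction scores, Harmony is the weighted sum of those scores, and the candidate
# with the highest Harmony is optimal. Illustrative encoding only.

def harmony(scores, weights):
    """scores: constraint name -> satisfaction score for one candidate."""
    return sum(weights[c] * s for c, s in scores.items())

def optimal(candidates, weights):
    """candidates: list of (output, scores) pairs; returns the highest-Harmony pair."""
    return max(candidates, key=lambda cand: harmony(cand[1], weights))

# Tableau (2): /bat/ with NOCODA weighted 2 and MAX weighted 1.
weights = {'NOCODA': 2, 'MAX': 1}
candidates = [('bat', {'NOCODA': 0, 'MAX': 1}),    # H = 1
              ('ba',  {'NOCODA': 1, 'MAX': 0})]    # H = 2, the indicated optimum
print(optimal(candidates, weights))                # -> ('ba', {'NOCODA': 1, 'MAX': 0})
```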

On-line error-driven OT/HG learner (Tesar 1995 et seq.; Boersma 1998 et seq.; Jäger 2003; Pater 2007)

(4) i. Correct input-output pair (i1, o11) presented to learner

ii. Learner computes optimal output (o12) given i1, and its current ranking/weighting of the constraints

iii. If o11 does not match o12 (an 'error'), learning is triggered

For example, if the correct pair (bat, bat) were presented to a learner with the weighting in (2) above, the indicated optimal pairing (bat, ba) would be an error.

HG update rule (Jäger 2003; Soderstrom, Mathis & Smolensky 2006; Boersma & Weenink 2007; Pater 2007):

(5) For each constraint, the updated value is calculated by subtracting the incorrect mapping's score from that of the correct mapping, multiplying this value by a constant n, and adding the result to that constraint's weight

Violation scores of the correct mapping (C) and error (E) in our example, and their difference:

(6)                  NOCODA     MAX
     C (bat, bat)                 1
     E (bat, ba)        1
     C − E             −1         1

If n = 1, the result of adding C − E in (6) to the weights in (2) is shown in (7). As also shown in (7), the grammar will successfully parse the next coda in the learning data:

(7)               2         1
      bat        MAX     NOCODA     H
    ☞   bat       1                 2
        ba                  1       1
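The update in (5)-(7) is a perceptron-style rule; the sketch below simply replays the [bat] example with the same toy encoding as above:

```python
# A sketch of the HG update rule in (5): add n times (scores of the correct mapping
# minus scores of the erroneous mapping) to the current weights. Replays (6)-(7).

def hg_update(weights, correct, error, n=1.0):
    return {c: w + n * (correct.get(c, 0) - error.get(c, 0)) for c, w in weights.items()}

weights = {'NOCODA': 2, 'MAX': 1}          # the weights in (2)
correct = {'NOCODA': 0, 'MAX': 1}          # C = (bat, bat)
error   = {'NOCODA': 1, 'MAX': 0}          # E = (bat, ba)
print(hg_update(weights, correct, error))  # -> {'NOCODA': 1.0, 'MAX': 2.0}, as in (7)
```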

Convergence proofs and testing of this learning procedure for HG and HG-like theories of grammar, including ones that deal with variation (unlike standard OT):

(8) Fischer (2005), Boersma and Pater (2007)

To achieve a sufficiently restrictive end-state grammar (Smolensky 1996, Hayes 2004, Prince and Tesar 2004) we assume that the structural constraints (like NOCODA) begin with an initial weight greater than the faithfulness constraints (like MAX). Comparison of HG with OT in this regard, including a proposal that the weightings of structural constraints change faster than those of faithfulness constraints:


(9) Jesney and Tessier (today, 2007), Magri (in prep.)

Other applications of HG linear models and HG-like log-linear models to phonology and phonological learning (amongst others, we're sure):

(10) Goldwater and Johnson (2003), Wilson (2006), Hayes and Wilson (2007), Goldrick and Daland (2006), Coetzee and Pater (2007), Albright, Michaels and Magri (2007)

2.2 The proposal

An issue:

(11) How can an on-line learner acquire constraints that generalize across observed forms?

Our answer:

(12) i. For each observed form, the learner creates constraints encoding the relationships between structural elements

ii. 'Generalization' comes from the subsequent re-weighting of constraints as more forms are encountered

Our general conception of constraint construction (see Burzio 2002 on implicational constraints):

(13) Given structural element e1 and structural element e2 that co-occur in an observed form, create a constraint:

     e1 → e2    Assign a reward of 1 to each instance of e1 that co-occurs with e2.
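For concreteness only, (13) might be coded up roughly as follows; the segment-by-segment encoding and the adjacency window are our illustrative assumptions, not a claim about the right implementation:

```python
# A sketch of the construction schema in (13). A form is a list of segments, each a set
# of structural elements (a root-node label plus features). For every e1, e2 that
# co-occur within a segment or in adjacent segments, a constraint (e1, e2) is created,
# read as "assign a reward of 1 to each e1 that co-occurs with e2". Encoding is ours.

def construct_constraints(form):
    constraints = set()
    for i, seg in enumerate(form):
        neighbours = seg | (form[i + 1] if i + 1 < len(form) else set())
        for e1 in seg:
            for e2 in neighbours - {e1}:
                constraints.add((e1, e2))
    return constraints

# [ʌis] 'ice' as a raised diphthong followed by a voiceless consonant:
ice = [{'D', '[-low]'}, {'C', '[-vce]'}]
for c in sorted(construct_constraints(ice)):
    print(c)
# includes ('D', '[-low]')      ~ RAISED DIPHTHONG in (17a)
# and      ('[-low]', '[-vce]') ~ RAISED/VOICELESS in (17b), amongst many others
```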

This general idea could be implemented in many ways. The best approach can only bedetermined through large-scale implementation and testing, which we haven't yet undertaken.

(14) a. Assignment of negative penalties for violation (as in standard OT)?2

b. Constraints that refer to sets of elements: what size are the sets?

c. Constraints that refer to elements in particular domains/structural positions: what are the domains/windows within which the constraints are created?

We have been inspired by Hayes and Wilson's (2007) approach to phonotactic learning, which provides concrete proposals about some of the above. The main differences between our approaches (critical comparison would be premature):

2 We state our constraints positively for simplicity. We are aware of the infinite goodness problem. Whether this will be a problem for our system will depend on whether we adopt an OT-style GEN (cf. Soderstrom et al. 2006). If we do, the problem can be avoided by simply translating all of our constraints into penalty assigners.


(15) a. We use a grammar of a form familiar from OT (a MaxEnt version would estimate conditional probabilities, as in Goldwater and Johnson 2003, Jäger 2003, Wilson 2006), with an on-line learner

b. Hayes and Wilson create all logically possible constraints fitting their schema, independent of whether they are satisfied in any observed form or not, and then winnow down that set (see relatedly Boersma, Escudero and Hayes 2003)

3. Illustration: Distribution of Canadian English diphthongs

3.1 Generalization through learning and re-weighting constraints

Generalization about Canadian English diphthongs (to be refined!):

(16) 'Raised' diphthongs [ʌi], [ʌu] before voiceless consonants, [ai], [au] elsewhere

Given a single observed form [ʌis] 'ice', a learner could create the following constraints, amongst many others (we use 'D' as an abbreviation for the relevant 'root node' characteristics of the diphthongs, and 'C' for consonant):

(17) Observed structure           Constructed constraint

  a.  D                           RAISED DIPHTHONG: A diphthong must be [-low]
      [-low]                      (Assign a reward of 1 to each diphthong that is [-low])

  b.  D      C                    RAISED/VOICELESS: A diphthong preceding a voiceless
      [-low] [-vce]               consonant must be raised
                                  (Assign a reward of 1 to each diphthong that is [-low] that
                                  precedes a voiceless consonant)

The constraint that encodes the correct generalization is of course (17b). The learner will discover this after encountering further data, like [ai] 'eye' and [saiz] 'size', which yield the constraints in (18):

(18) Observed structure           Constructed constraint

  a.  D                           LOW DIPHTHONG: A diphthong must be [low]
      [low]

  b.  D      C                    LOW/VOICED: A diphthong preceding a voiced
      [low]  [+vce]               consonant must be [low]

We'll make the simplifying assumption that these structural constraints interact with a single faithfulness constraint:


(19) Faithfulness constraint

IDENT-[LOW]    Assign a reward of 1 to an input vowel whose output correspondent has the same specification for Low

We give the structural constraints an initial weighting value of 10, and the Faithfulness constraint an initial value of 0. Learning data (supplied in equal proportions):

(20) (ʌis, ʌis)   (ai, ai)   (saiz, saiz)

The constraint values found by the Praat HG learning implementation (Boersma and Weenink 2007) are shown in the following tableaux.

(21)
                          11          10          10          9           1
   Input     Output    LOW        RAISED /     LOW /      RAISED      IDENT-     H
                       DIPHTHONG  VOICELESS    VOICED     DIPHTHONG   LOW

   ʌis    ☞  ʌis                      1                       1          1       20
             ais           1                                                     11
   ai     ☞  ai            1                                             1       12
             ʌi                                               1                   9
   saiz   ☞  saiz          1                       1                     1       22
             sʌiz                                             1                   9

Idsardi (2006) provides the following clever example incontrovertibly demonstrating productivity (and also that richness of the base must yield raising, rather than devoicing of voiced consonants following raised vowels, as in Mielke, Armstrong and Hume 2003):

(22) 'i' [ai], 'y' [wai] versus 'i-th' [ʌiθ], 'y-th' [wʌiθ]

We derive the raising pattern: not only do these values pick the correct forms, they also produce the correct results in a richness of the base test,3 where the diphthongs are contextually misplaced in the input.

3 We do not wish to imply that all things done under the rubric of 'richness of the base' are necessarily psychologically real. In the absence of alternations like those in (23), this can be interpreted as shorthand for what a speaker does in performing a phonological grammaticality judgment task: mapping of an observed form to an identical underlying form, and then creating an input-output map to see if it matches (as in Coetzee 2004; this is similar to the procedure in error-driven learning described above). Such a task may or may not be possible in practice, depending on whether speakers are able to robustly perceive the phonologically ill-formed structure.


(23)
                          11          10          10          9           1
   Input     Output    LOW        RAISED /     LOW /      RAISED      IDENT-     H
                       DIPHTHONG  VOICELESS    VOICED     DIPHTHONG   LOW

   ais    ☞  ʌis                      1                       1                  19
             ais           1                                             1       12
   ʌi     ☞  ai            1                                                     11
             ʌi                                               1          1       10
   sʌiz   ☞  saiz          1                       1                             21
             sʌiz                                             1          1       10
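The weights in (21) and (23) can be checked mechanically. In the sketch below the constructed constraints are stated as reward functions over toy transcriptions ('A' for the raised diphthong [ʌi], 'a' for low [ai]); the string encoding is our own illustrative assumption, while the weights are those reported in the tableaux:

```python
# A verification sketch for (21)-(23): the four structural constraints and IDENT-[LOW]
# as reward functions over toy strings ('A' = raised diphthong, 'a' = low diphthong),
# weighted as in the tableaux.

VOICELESS, VOICED = set('st'), set('zd')

def nuclei(form):
    """Yield (diphthong nucleus, following consonant or None), skipping the offglide."""
    for i, seg in enumerate(form):
        if seg in 'Aa':
            rest = form[i + 1:].lstrip('i')
            yield seg, (rest[0] if rest else None)

def scores(inp, out):
    return {'LOW DIPHTHONG':    sum(d == 'a' for d, _ in nuclei(out)),
            'RAISED/VOICELESS': sum(d == 'A' and c in VOICELESS for d, c in nuclei(out)),
            'LOW/VOICED':       sum(d == 'a' and c in VOICED for d, c in nuclei(out)),
            'RAISED DIPHTHONG': sum(d == 'A' for d, _ in nuclei(out)),
            'IDENT-LOW':        sum(1 for x, y in zip(inp, out) if x in 'Aa' and x == y)}

W = {'LOW DIPHTHONG': 11, 'RAISED/VOICELESS': 10, 'LOW/VOICED': 10,
     'RAISED DIPHTHONG': 9, 'IDENT-LOW': 1}

def winner(inp, candidates):
    return max(candidates, key=lambda out: sum(W[c] * s for c, s in scores(inp, out).items()))

tests = [('Ais', ['Ais', 'ais']), ('ai', ['ai', 'Ai']), ('saiz', ['saiz', 'sAiz']),
         ('ais', ['Ais', 'ais']), ('Ai', ['ai', 'Ai']), ('sAiz', ['saiz', 'sAiz'])]
for inp, cands in tests:
    print(inp, '->', winner(inp, cands))
# Ais->Ais, ai->ai, saiz->saiz, ais->Ais, Ai->ai, sAiz->saiz: raising before voiceless
# consonants and lowering elsewhere, for faithful and rich-base inputs alike.
```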

A simple point here, which is common to this approach and the standard OT account of learning, with a large universal constraint set (Tesar 1995 et seq.):

(24) Generalization comes from constraint ranking/weighting

A more novel point (though see precedents discussed above):

(25) Phonological constraints can be directly induced from observed forms

3.2 An application of constraint learning: a 'crazy contrast'

Some arguments that constraints must be learned:

(26) i. Claims about innateness require better motivation than typological observations about things like attestation of post-nasal voicing and absence of pre-nasal voicing (see Pater 1995/1999/2004 on the typology, and Hayes and Stivers 1995 and Hayes 1999 on the phonetics and a learning proposal; note that the claim in Pater 1999 is that the constraint is universal, not innate - perhaps better would be 'universally available'. Note too that Prince and Smolensky 1993/2004 seem never to use the word 'innate' (though they do speak of UG); see Tesar and Smolensky 2000 for relevant discussion)

ii. Morpheme-specific constraints (first posited in OT for Lardil and Tagalog in Prince and Smolensky 1993/2004) are necessarily language-specific (see Pater to appear for a learning proposal)

iii. If phonological categories are learned, then the constraints that refer to them must be learned (Boersma 1998; see Maye 2000 and Mielke 2004 for related work)

iv. Needed to analyze 'crazy rules' (Bach and Harms 1972 et seq.)

Here we demonstrate another consequence of learned constraints:

(27) Analysis, including learning, of 'crazy contrasts'

The distribution of Canadian English diphthongs includes an instance of a crazy contrast (see Joos 1942, Vance 1982, Mielke, Armstrong and Hume 2003, and Hayes 2004 for variants of contrast-based analyses; see Chomsky 1957, Halle 1962, Chomsky 1964, Chomsky and Halle 1968, Chambers 1973/1975 and Idsardi 2006 (source of references) for an alternative approach, and Bermúdez-Otero 2003 for yet another):

(28) Contrast before flaps

     [tʰʌiɾl̩]   title
     [bɹaiɾl̩]   bridle
     [mʌiɾɹ̩]   mitre
     [saiɾɹ̩]   cider

This is the only segmental4 environment for the Canadian5 English contrast. This contrast is 'crazy' in that there seems to be no phonetic reason that the flap should license it.

The challenge (Bermúdez-Otero 2003):

(29) To show how a learner can arrive at a grammar that allows the pre-flap contrast, but still productively chooses the correct form of the diphthongs elsewhere

Some more evidence of productivity (Note that some of these could be lexicalized, and that they haven't all been well checked. Note too that 'psychology' and other similar examples will require an expanded constraint set; we have experimented with a syllable-based analysis, which seems to work fine):

(30) a. 'south' [ʌwθ] versus 'southie' [awð]

b. 'house' [ʌws] versus 'house' [awz] (also alternate plural form 'houses' [hawzəz])

c. 'life' [ʌjf] versus 'lives' [ajv]

d. 'psych' [ʌjk] versus 'psychology' [aj.k]

e. 'scythe' [ʌjθ] versus 'scythes' [aiðz]

Presenting the words in (28) to the learner yields two more constraints, amongst many others:

(31) Observed structure           Constructed constraint

      D      C                    RAISED/FLAP: A raised diphthong is followed by a flap
      [-low] [flap]

      D      C                    LOW/FLAP: A low diphthong is followed by a flap
      [+low] [flap]

4 In the second author's speech, it appears to be the case that a diphthong in a primary stressed syllable followed by a secondary stressed syllable is invariably unraised (e.g. icon, cyclops, micron, eyeful). However, there are reports of contrast and morphological influences in this environment (see Bermúdez-Otero 2003 and Mielke et al. 2003 for discussion and references), and informal observation seems to indicate that there may even be intra-speaker variation conditioned by speech rate. Clearly, Canadian raising provides a continuing source of opportunities for both empirical and theoretical research.

5 See Vance (1987) for evidence of contrast in upstate New York. We have not come across similar data in Canadian speech.

A difference between HG and OT is that we can get the contrast as a gang effect between these constraints and IDENT-[LOW], as exemplified in the following tableaux:

(32)
                      2          2          1
   tʰʌjɾl̩         RAISED /    LOW /     IDENT-      H
                    FLAP       FLAP      LOW

  ☞  tʰʌjɾl̩          1                     1        3
     tʰajɾl̩                      1                  2

(33)
                      2          2          1
   bɹaiɾl̩         RAISED /    LOW /     IDENT-      H
                    FLAP       FLAP      LOW

  ☞  bɹaiɾl̩                      1         1        3
     bɹʌiɾl̩          1                              2

Because IDENT-LOW can maintain a low position in the constraint hierarchy and produce contrast, in our slightly larger learning simulation we still get the emergent alternations.
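The arithmetic behind the gang effect is worth spelling out; the tiny sketch below restates the cell values of (32)-(33) with our own labels:

```python
# The gang effect in (32)-(33): neither flap constraint outweighs the other, so the
# weight-1 IDENT-[LOW] reward decides, and both pre-flap diphthongs survive.
W = {'RAISED/FLAP': 2, 'LOW/FLAP': 2, 'IDENT-LOW': 1}

def H(scored):
    return sum(W[c] for c in scored)

# 'title', input raised: the faithful raised candidate beats the lowered one, 3 > 2.
print(H(['RAISED/FLAP', 'IDENT-LOW']), H(['LOW/FLAP']))
# 'bridle', input low: the faithful low candidate beats the raised one, 3 > 2.
print(H(['LOW/FLAP', 'IDENT-LOW']), H(['RAISED/FLAP']))
```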

The constraint set and learned values (RAISED / VOICED comes from [tʰʌjɾl̩]):

(34) LOW DIPHTHONG        12
     RAISED / VOICELESS   11
     RAISED / FLAP        11
     RAISED / VOICED      11
     LOW / FLAP            9
     RAISED DIPHTHONG      8
     LOW / VOICED          8
     IDENT-VOICE           3
     IDENT-LOW             2
     IDENT-FLAP            0

The observed mappings (both in the learning data, and final state output):

(35) (bɹaiɾl̩, bɹaiɾl̩)
     (tʰʌiɾl̩, tʰʌiɾl̩)
     (saiz, saiz)
     (ai, ai)
     (ʌis, ʌis)

Generalization to unseen forms:

Page 9: Constructing constraints from language data...Constructing constraints from language data: The case of Canadian English diphthongs1 Paul Boersma and Joe Pater University of Amsterdam

(36) (aid, aid)
     (ʌid, aid)
     (ait, ʌit)
     (ʌit, ʌit)
     (ʌiz, aiz)
     (ai, ai)
     (ais, ʌis)

4. The emergence of universal and language-specific constraints

So we maintain that structural constraints express language-specific structural facts about representations. The big question then becomes: how come so many languages have the same kind of structural constraints, and the same kinds of rankings/weightings? Here we give a schematic answer.

Since pre-voiceless raising of low vowels seems to have emerged independently in various places (for an overview, see Moreton & Thomas 2004), Canadian raising is a good case for answering the universality question. We minimally need the following model of connections within and between the phonology and the phonetics (Boersma 1998, 2007; articulatory form and auditory form collapsed into one phonetic form for simplicity):

(37)    <morphemes>
             |               lexical constraints
        |underlying form|
             |               faithfulness constraints
        /surface form/       ← structural constraints
             |               cue constraints
        [phonetic form]      ← articulatory constraints
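Read top-down, (37) is a chain of four representations, with one constraint family on each link and one on each of the two lower levels. The sketch below just fixes that picture as a data structure; the field names are ours, not the authors' code:

```python
# A minimal encoding of a candidate chain in the model in (37). Each field is one
# representation level; the comments name the constraint family that evaluates it or
# its link to the level above. Field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class CandidateChain:
    morphemes: str     # e.g. '<ice>'
    underlying: str    # e.g. '|ʌis|'  -- lexical constraints link morphemes and UF
    surface: str       # e.g. '/ʌis/'  -- faithfulness links UF and SF;
                       #                  structural constraints evaluate the SF itself
    phonetic: str      # e.g. '[ʌ̝is]'  -- cue constraints link SF and phonetic form;
                       #                  articulatory constraints evaluate the phonetic form

# In comprehension the phonetic form is fixed and the other levels are filled in;
# in production the morphemes are fixed. Harmony sums the weighted rewards of all families.
chain = CandidateChain('<ice>', '|ʌis|', '/ʌis/', '[ʌ̝is]')
```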

4.1 Generation 1 optimizes her comprehension

Suppose that the first generation of learners is confronted with a 'perfect' language environment in which the vowels in <ice> and <size> are pronounced with the exact same quality [ʌi], despite their different durations:

(38) Language environment for a learner of generation 1

     <ice>  [ʌ̆is]
     <size> [sʌːiz]

The learner's task as a listener is to make sense of these given combinations of sound and meaning. She will have to construct the intermediate representations (underlying and surface form). The learner's first action will be to classify the sounds into appropriate categories. On the basis of [ʌ̆i] and [ʌːi] she will have no reason to posit any different category of vowel quality than /ʌi/. The learner's sound input will be highly variable, i.e. for the vowel /ʌi/ it will contain realizations like [ʌ̞i], [ʌi], and [ʌ̝i]. On the basis of the language data, then, the learner will create new cue constraints for all possible combinations of sound and category (Boersma, Escudero & Hayes 2003:1015):

(39) Cue constraints for vowel quality created by a learner of generation 1

     /ʌi/[ʌ̞i]
     /ʌi/[ʌi]
     /ʌi/[ʌ̝i]

It has been shown in computer simulations that a Stochastic OT or Stochastic HG learner who optimizes her comprehension (i.e. her capability of disambiguating vowels) will end up ranking her cue constraints in such a way that they prefer more peripheral auditory values over more central auditory values (Boersma 2006a, Boersma & Hamann 2007). For the case at hand, where peripheral means 'low', the cue constraints will end up being weighted higher for lowered realizations than for raised realizations:

(40) Comprehension-optimizing weighting of cue constraints for a learner of generation 1

     /ʌi/[ʌ̞i] >> /ʌi/[ʌi] >> /ʌi/[ʌ̝i]

The learner will also create cue constraints for the relation between vowel duration and voicing. The duration of the vowel before voiceless consonants is assumed to have been phoneticized in an English-specific way (i.e. any automatic articulatory or acoustic motivation has been lost), and is therefore regulated by cue constraints that are weighted in an English-specific way (Boersma 2006c):

(41) Cue constraints for vowel duration created by a learner of generation 1

     /V/[V̆ −vce] >> /V/[V −vce] >> /V/[Vː −vce]
     /V/[Vː +vce] >> /V/[V +vce] >> /V/[V̆ +vce]

The learner's second action is to link the perceived vowel /ʌi/ to something in the lexicon, presumably preferably the underlying form |ʌi|. The learner can do this by creating new faithfulness constraints as soon as she has created the relevant categories (Boersma 1998:278), which are here only /ʌi/:

(42) Faithfulness constraint created by a learner of generation 1

     |ʌi|/ʌi/

Finally, stored underlying forms have to be linked to morphemes (lemmas) and to meaning. As soon as a morpheme co-occurs with an underlying form, the learner can create a new lexical constraint (Boersma 2001, Escudero 2005:214–236, Apoussidou 2006). For instance, one lexical constraint connects the morpheme <ice> to the underlying form |ʌis|. The speaker has not encountered any other UFs for <ice>, nor any other morphemes for |ʌis|; hence she needs only a single positive constraint for each connection:

Page 11: Constructing constraints from language data...Constructing constraints from language data: The case of Canadian English diphthongs1 Paul Boersma and Joe Pater University of Amsterdam

(43) Lexical constraints created by a learner of generation 1

     <ice>|ʌis|
     <size>|sʌiz|

If these constraints are available, and the learner hears the sound [ʌ̆is], she will be able to access the correct morpheme, as shown by the following parallel comprehension tableau:

(44) Parallel comprehension in generation 1

                           10       8        7           7          5       3       2
   [ʌ̆is]               <ice>    |ʌi|     /V/         /V/        /ʌi/    /ʌi/    /ʌi/      H
                         |ʌis|    /ʌi/    [V̆ −vce]    [Vː +vce]   [ʌ̞i]    [ʌi]    [ʌ̝i]

  ☞  <ice> |ʌis| /ʌis/   +1       +1       +1                             +1              +28

Given the positive constraints mentioned in this section, and restricting GEN in such a way that every connection between adjacent representations in (37) has to positively satisfy at least one constraint, the tableau can contain only one candidate, and it is a quite good one. The only thing that would be even more harmonic (H = +30) is the comprehension of the enhanced (motherese?) utterance [ʌ̞̆is]...
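The single comprehension candidate in (44) can be recomputed directly from the weights shown there; the ASCII constraint labels below are ours:

```python
# A sketch of the comprehension harmony in (44): the heard token [ʌ̆is] is short and has
# plain [ʌi] quality, so it collects the lexical, faithfulness, duration-cue and
# plain-quality-cue rewards. Weights as in (44); label strings are illustrative.

W = {'lex <ice>=|Ais|': 10, 'faith |Ai|=/Ai/': 8,
     'cue short_V_before_voiceless': 7, 'cue long_V_before_voiced': 7,
     'cue /Ai/=[lowered]': 5, 'cue /Ai/=[plain]': 3, 'cue /Ai/=[raised]': 2}

heard = ['lex <ice>=|Ais|', 'faith |Ai|=/Ai/',
         'cue short_V_before_voiceless', 'cue /Ai/=[plain]']
print(sum(W[c] for c in heard))        # 28, the Harmony of the single candidate in (44)

# A hyper-lowered ("motherese") token would trade the plain cue for the lowered one:
enhanced = ['lex <ice>=|Ais|', 'faith |Ai|=/Ai/',
            'cue short_V_before_voiceless', 'cue /Ai/=[lowered]']
print(sum(W[c] for c in enhanced))     # 30, the H mentioned in the text
```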

4.2 Generation 1 introduces a phonetic contrast in production

The generation-1 learner has become successful as a listener. But she will also become a speaker herself, and during a phase of sensorimotor acquisition she will learn that the sounds [ʌ̞], [ʌ], and [ʌ̝] can be implemented by the articulations [ʌ̞], [ʌ], and [ʌ̝], respectively. On the basis of vocal play, then, she will create new articulatory constraints (Boersma 1998:280), separately for long and short realizations:

(45) Articulatory constraints created by a learner of generation 1

     *[ʌ̞̆]   *[ʌ̞ː]
     *[ʌ̆]    *[ʌː]
     *[ʌ̝̆]   *[ʌ̝ː]

We note that in speaking she will have to take into account that shortened vowels (such as typically occur in English before voiceless consonants) are more difficult to pronounce with a fully open jaw. This leads to a fixed weighting of articulatory constraints that prefer raised realizations over lowered realizations (if they are short) and long realizations over short realizations (if they are lowered):

(46) Effort-based weighting of articulatory constraints for speakers of generation 1

     *[ʌ̞̆] >> *[ʌ̆] >> *[ʌ̝̆]
     *[ʌ̞̆] >> *[ʌ̞ː]

We next have to assume bidirectional use of constraints, i.e. in production the learner will use the same weightings that have optimized her comprehension earlier (Boersma 2007, 2006a; Boersma & Hamann 2007). The cue constraints then come to interact with the articulatory constraints. With a certain intertwining of these two constraint families, the <ice> morpheme will be pronounced with a raised diphthong:

(47) Parallel production in generation 1

                              10      8      7         6       5      4       3      2      1
   <ice>                    <ice>   |ʌi|   /V/       *[ʌ̞̆]   /ʌi/   *[ʌ̆]   /ʌi/   /ʌi/   *[ʌ̝̆]    H
                             |ʌis|   /ʌi/  [V̆ −vce]           [ʌ̞i]           [ʌi]   [ʌ̝i]

  ☞  |ʌis| /ʌis/ [ʌ̝̆is]     +1      +1     +1                                       +1     −1     +26
     |ʌis| /ʌis/ [ʌ̆is]      +1      +1     +1                        −1      +1                   +24
     |ʌis| /ʌis/ [ʌ̞̆is]     +1      +1     +1        −1      +1                                   +24
     |ʌis| /ʌis/ [ʌ̞ːis]     +1      +1                       +1                                   +23

The voiced case, unhindered by any high-weighted articulatory constraints, ends up with the low vowel that is preferred by the cue constraints:

(48) Parallel production in generation 1

                               10       8      7          6       5      4       3      2      1
   <size>                   <size>   |ʌi|   /V/        *[ʌ̞̆]   /ʌi/   *[ʌ̆]   /ʌi/   /ʌi/   *[ʌ̝̆]    H
                             |sʌiz|   /ʌi/  [Vː +vce]           [ʌ̞i]           [ʌi]   [ʌ̝i]

     |sʌiz| /sʌiz/ [sʌ̝ːiz]   +1      +1     +1                                        +1            +27
     |sʌiz| /sʌiz/ [sʌːiz]    +1      +1     +1                                +1                    +28
  ☞  |sʌiz| /sʌiz/ [sʌ̞ːiz]   +1      +1     +1                 +1                                   +30
     |sʌiz| /sʌiz/ [sʌ̝̆iz]   +1      +1                                               +1     −1     +19

Within a single generation, then, the language has introduced a small phonetic bias on the basis of differential articulatory effort: low vowels get raised before voiceless consonants. (Comment: the present explanation involved duration; other explanations in terms of articulatory effort, biomechanics, physiological-acoustic relations, or acoustic transmission would give similar results.)
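The production asymmetry of §4.2 can likewise be recomputed from the weights in (47)-(48): before a voiceless consonant the vowel is short, so the effort-based articulatory penalties favour raising; before a voiced consonant the long, cue-preferred lowered vowel wins. The candidate names and labels below are ours:

```python
# A sketch of bidirectional production in generation 1, with the weights of (47)-(48):
# rewards for lexical, faithfulness and cue constraints, minus penalties for the
# starred articulatory constraints.

W = {'lex': 10, 'faith': 8, 'dur_cue': 7,
     '*lowered_short': 6, 'cue_lowered': 5, '*plain_short': 4,
     'cue_plain': 3, 'cue_raised': 2, '*raised_short': 1}

def H(rewards, penalties=()):
    return sum(W[c] for c in rewards) - sum(W[c] for c in penalties)

# <ice>: the duration cue enforces a short vowel, and raising is the cheapest short articulation.
ice = {'raised short':  H(['lex', 'faith', 'dur_cue', 'cue_raised'],  ['*raised_short']),   # 26
       'plain short':   H(['lex', 'faith', 'dur_cue', 'cue_plain'],   ['*plain_short']),    # 24
       'lowered short': H(['lex', 'faith', 'dur_cue', 'cue_lowered'], ['*lowered_short']),  # 24
       'lowered long':  H(['lex', 'faith', 'cue_lowered'])}                                 # 23
print(max(ice, key=ice.get))     # 'raised short', as in (47)

# <size>: long articulations incur no such penalty, so the cue-preferred lowered vowel wins.
size = {'raised long':  H(['lex', 'faith', 'dur_cue', 'cue_raised']),    # 27
        'plain long':   H(['lex', 'faith', 'dur_cue', 'cue_plain']),     # 28
        'lowered long': H(['lex', 'faith', 'dur_cue', 'cue_lowered'])}   # 30
print(max(size, key=size.get))   # 'lowered long', as in (48)
```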

4.3 Generation 2 has a full phonetic contrast (i.e. in comprehension and production)

The children of the learner of generation 1 are confronted with a slightly different language environment than their parents:

(49) Language environment for a learner of generation 2

     <ice>  [ʌ̝̆is]
     <size> [sʌ̞ːiz]

If the difference between the two realizations is not large enough to create two separate vowel quality categories, the learner will create the same category /ʌi/ and therefore the same faithfulness and lexical constraints as her parents did. She will also create the same cue constraints for vowel duration. But there are new kinds of variation: the <ice> vowel will vary between [ʌ̆i], [ʌ̝̆i], and [ə̆i], and the <size> vowel between [ʌːi], [ʌ̞ːi], and [aːi]. The learner will therefore create different cue constraints for vowel quality, and these will be sensitive to the voicing of the following consonant (just as the cue constraints for vowel duration):

(50) Cue constraints for vowel quality created by a learner of generation 2

     (/ʌi/[ai −vce])     /ʌi/[ai +vce]
     (/ʌi/[ʌ̞i −vce])    /ʌi/[ʌ̞i +vce]
     /ʌi/[ʌi −vce]       /ʌi/[ʌi +vce]
     /ʌi/[ʌ̝i −vce]     (/ʌi/[ʌ̝i +vce])
     /ʌi/[əi −vce]      (/ʌi/[əi +vce])

A computer simulation of the optimization of comprehension (as in Boersma 2006 and Boersma & Hamann 2007) shows that these constraints will be weighted partly by the peripherality bias, partly by their language-specific use: when compared to the rankings in §4.2, the weighting scale for the voiceless case is compressed, that for the voiced case expanded. With rounded figures, we get the weights in (51) and (52).

In <ice>, we now get an exaggerated raising:

(51) Parallel production in generation 2

                             10     8     7        6      4      3          3         3          3         1      0
   <ice>                   <ice>  |ʌi|  /V/      *[ʌ̞̆] *[ʌ̆]  /ʌi/       /ʌi/      /ʌi/       /ʌi/      *[ʌ̝̆] *[ə̆]   H
                            |ʌis|  /ʌi/ [V̆ −vce]                [ʌ̞i −vce] [ʌi −vce] [ʌ̝i −vce] [əi −vce]

  ☞  |ʌis| /ʌis/ [ə̆is]     +1     +1    +1                                                      +1                −1   +28
     |ʌis| /ʌis/ [ʌ̝̆is]    +1     +1    +1                                            +1                   −1         +27
     |ʌis| /ʌis/ [ʌ̆is]     +1     +1    +1              −1                  +1                                         +24
     |ʌis| /ʌis/ [ʌ̞̆is]    +1     +1    +1       −1            +1                                                      +22

And in <size>, we get an exaggerated lowering:

(52) Parallel production in generation 2

                                10       8       7           7            6            3            0
   <size>                    <size>   |ʌi|    /V/         /ʌi/         /ʌi/         /ʌi/         /ʌi/         H
                              |sʌiz|   /ʌi/   [Vː +vce]    [ai +vce]    [ʌ̞i +vce]    [ʌi +vce]    [ʌ̝i +vce]

     |sʌiz| /sʌiz/ [sʌ̝ːiz]    +1      +1      +1                                                  +1          +25
     |sʌiz| /sʌiz/ [sʌːiz]     +1      +1      +1                                      +1                      +28
     |sʌiz| /sʌiz/ [sʌ̞ːiz]    +1      +1      +1                          +1                                  +31
  ☞  |sʌiz| /sʌiz/ [saːiz]     +1      +1      +1            +1                                                +32


Within two generations, then, the language has grown a large phonetic distinction: [ə̆i] before voiceless consonants, [aːi] before voiced consonants. (Comment: The process of enhancement will not continue catastrophically. In the present example it has led to the fairly large phonetic contrast [ə̆i]~[aːi], but it might well have stopped short before had the parameters been slightly different. In general the cue constraints will force a halt before the auditory realization enters the realm of a neighbouring category, such as [ei] in the present case.)

4.4 Generation 3a creates a phonological contrast

For the speakers of generation 2, the contrast between [ə̆i] and [aːi] was 'just phonetic'. But their children are confronted with a bigger contrast than the generation-2 speakers had been when young:

(53) Language environment for a learner of generation 3

     <ice>  [ə̆is]
     <size> [saːiz]

The difference can be large enough for their children to reanalyse the distinction as categorical at the surface level, i.e. as the vowel categories /ʌi/ and /ai/. The learner will create cue constraints that take this contrast into account:

(54) Cue constraints for vowel quality created by a learner of generation 3

     (/ʌi/[ai])      /ai/[ai]
     (/ʌi/[ʌ̞i])     /ai/[ʌ̞i]
     (/ʌi/[ʌi])     (/ai/[ʌi])
     /ʌi/[ʌ̝i]      (/ai/[ʌ̝i])
     /ʌi/[əi]       (/ai/[əi])

The cue constraints of this generation need no longer be conditioned by voicing.

In addition to the cue constraints, the listener can now also create new structural constraints for the co-occurrence of the features [lowered/raised] and [+vce/−vce] (Boersma, Escudero & Hayes 2003:1016; Hayes & Wilson 2006):

(55) Structural constraints created by a learner of generation 3 (the same as in §2)

     /−vce/ → /ʌi/
     /+vce/ → /ai/

Please notice that these constraints can be considered natural, hence universal constraints! Yet, the learner has created them from the language data with the help of her learning procedure ('innately guided empiricism'). What makes the constraints universal is that they ultimately (historically) derive from a universal phonetic bias.

The next question is what the underlying form contains. It is possible that the surface level can contain major allophones that do not contrast lexically (Benders in prep.). If so, then the /ʌi/~/ai/ contrast can be restricted to the surface level, whereas the underlying form could still just contain |ʌi|, perhaps on the basis of alternations such as <life>~<life-s> [lʌ̝if]~[laːivz]. If this situation can be realistic, the 'faithfulness' constraints that the learner creates must have an arbitrary element:

(56) 'Faithfulness' constraints created by a learner of generation 3a

     |ʌi|/ʌi/
     |ʌi|/ai/

The lexical constraints that generation 3a creates will be identical to those that their parents and grandparents created.

The choice between the allophones is now completely determined by structural constraints:

(57) Parallel production in generation 3a

                             10       8       8       7             7             5        5
   <ice>                   <ice>    |ʌi|    |ʌi|    /−vce/        /+vce/        /ʌi/     /ai/      H
                            |ʌis|    /ʌi/    /ai/    → /ʌi/        → /ai/        [ʌ̝i]     [ai]

  ☞  |ʌis| /ʌis/ [ʌ̝is]     +1       +1               +1                          +1                +30
     |ʌis| /ais/ [ais]      +1                +1                                            +1      +23

4.5 Generation 3b creates a lexical contrast

If alternations like <life>~<life-s> [lʌ̝if]~[laːivz] are unimportant, the lexicon receives no support for positing a single lexical phoneme. Therefore, the lexicon may start to contain |ʌi| and |ai|. The faithfulness constraints will be simplified:

(58) Faithfulness constraints created by a learner of generation 3b

     |ʌi|/ʌi/
     |ai|/ai/

The lexical constraints will now change as well:

(59) Lexical constraints created by a learner of generation 3b

     <ice>|ʌis|
     <size>|saiz|

Production proceeds as follows:


(60) Parallel production in generation 3b

                             10       8       8       7             7             5        5
   <ice>                   <ice>    |ʌi|    |ai|    /−vce/        /+vce/        /ʌi/     /ai/      H
                            |ʌis|    /ʌi/    /ai/    → /ʌi/        → /ai/        [ʌ̝i]     [ai]

  ☞  |ʌis| /ʌis/ [ʌ̝is]     +1       +1               +1                          +1                +30
     |ʌis| /ʌis/ [ais]      +1       +1               +1                                            +25
     |ʌis| /ais/ [ais]      +1                                                              +1      +15

This is just the story of how small phonetic learning biases can grow and then percolate up into the phonology and the lexicon: a story about phoneticization, phonologization, and lexicalization. It works in OT and in HG, and relies on the emergence of articulatory, cue, structural, faithfulness, and lexical constraints from the language input; without this emergence, the story would have been much more complicated.

A point to take home from §4 (in case you had not heard it yet from Boersma & Hamann 2007 or Boersma 2006b) is that the Bidirectional Phonology & Phonetics model solves the question (e.g. Hayes & Wilson 2006: 43) of how universally fixed phonetically-based scales of constraint rankings/weightings can exist when there are learning algorithms around that rank/weight the constraints in a way that is appropriate for the language. The answer is that, as in §4.3, computer simulations show that (cue or faithfulness) constraint rankings/weightings that have been learned from the ambient language while optimizing comprehension may cause phonetically-looking biases when they are reused in production.

⟨Some⟩ conclusions

Learning abstract URs:

(61) You may have noticed that the analysis of the interaction between Canadian raising and flapping did not involve learning underlying /t/ and /d/ for flap. This avoids the issue of how these neutralized segments are recovered by the learner.

We have not shown, or even suggested:

(62) ...that such recovery is in principle impossible, or that all cases can be reanalyzed along the lines we have proposed for this one case.

On natural constraints:

(63) You may have noticed that our learner is able to learn constraints that could produce patterns that probably exist in no human language.


An on-going line of research:

(64) Experimental studies of the learning of natural and unnatural patterns, other studies of potential learning biases (Esper 1925, Schane et al. 1974, Finley yesterday), and computational modeling of these results (Wilson 2006)

Some possibilities based on our proposals:

(65) i. A more fully developed procedure for learning structural constraints might be used to model biases for simple featural generalizations

ii. Other biases might be in the phonetics itself (note that the model in section 4 renders moot the difficult question of how to incorporate phonetic biases into the phonology; cf. Hayes 1999, Wilson 2006, Flack 2007. This is also true in Flemming's 2001 HG work, but it is not clear how his single-level model can incorporate learning of unnatural phonotactic constraints)

Another ongoing line of research (section 4, Boersma 2006b, Boersma and Hamann 2007; see Kochetov 1999, Zuraw 2000, Wedel 2004 and others for related work):

(66) Explicit computational modeling of historical changes from neutral or unnatural patterns into natural ones

We have not shown, or even suggested:

(67) ...that doing phonological analysis, and making typological predictions, with an interacting set of natural constraints, is a bad idea. On the contrary, we think this is an excellent way of discovering important generalizations about language.

On phonologization and optimization:

(68) As far as we know, the proposal in section 4 is the only extant formally explicit analysis of the process of creating a phonological constraint or rule from a gradient pattern. Phonologization of gradient processes is often presented as an alternative to Optimality Theory (e.g. Blevins 2004). It's worth noting that this account of phonologization has at its heart an optimization-based choice procedure.

Some references

Albright, Adam, Giorgio Magri & Jennifer Michaels (2007). Paper to be presented at BUCLD.
Apoussidou, Diana (2007). The learnability of metrical phonology. PhD thesis, University of Amsterdam.
Bermúdez-Otero, Ricardo (2003). The acquisition of phonological opacity. In Jennifer Spenader, Anders Eriksson & Östen Dahl (eds.), Variation within Optimality Theory: Proceedings of the Stockholm Workshop on 'Variation within Optimality Theory', 25–36. Stockholm: Department of Linguistics, Stockholm University. [Long version ROA 593]
Blevins, Juliette (2004). Evolutionary phonology: The emergence of sound patterns. Cambridge: Cambridge University Press.
Boersma, Paul (1998). Functional phonology. PhD thesis, University of Amsterdam.
Boersma, Paul (2006a). Prototypicality judgments as inverted perception. In Gisbert Fanselow, Caroline Féry, Matthias Schlesewsky & Ralf Vogel (eds.), Gradedness in grammar. Oxford: Oxford University Press. 167–184.
Boersma, Paul (2006b). The acquisition and evolution of faithfulness rankings. Talk at MFM 14, Manchester, May 27, 2006. [http://www.fon.hum.uva.nl/paul/]
Boersma, Paul (2006c). Parallel bidirectional phonology and phonetics. Talk at III Congresso Internacional de Fonética e Fonologia, Belo Horizonte, November 29, 2006. [http://www.fon.hum.uva.nl/paul/]
Boersma, Paul, Paola Escudero & Rachel Hayes (2003). Learning abstract phonological from auditory phonetic categories: An integrated model for the acquisition of language-specific sound categories. Proceedings of the 15th International Congress of Phonetic Sciences. 1013–1016.
Boersma, Paul & Silke Hamann (2007). The evolution of auditory contrast. ROA 909.
Boersma, Paul & Joe Pater (2007). Convergence properties of a gradual learner for Harmonic Grammar. Ms.
Burzio, Luigi (2002). Surface-to-surface morphology: when your representations turn into constraints. In P. Boucher (ed.), Many morphologies. Cascadilla Press. 142–177. [Preliminary version ROA 341]
Escudero, Paola (2005). The attainment of optimal perception in second-language acquisition. PhD thesis, Utrecht University.
Fischer, Markus (2005). A Robbins-Monro type learning algorithm for an entropy maximizing version of stochastic Optimality Theory. MA thesis, Humboldt University, Berlin. [ROA 767]
Flack, Kathryn (2007). Inducing functionally grounded constraints. Ms., University of Massachusetts, Amherst. ROA 920.
Goldsmith, John (1990). Autosegmental and metrical phonology. Malden: Blackwell.
Goldwater, Sharon & Mark Johnson (2003). Learning OT constraint rankings using a maximum entropy model. In Jennifer Spenader, Anders Eriksson & Östen Dahl (eds.), Proceedings of the Workshop on Variation within Optimality Theory. Stockholm University. 111–120.
Idsardi, William (2006). Canadian raising, opacity and rephonemization. Canadian Journal of Linguistics.
Jäger, Gerhard (2003). Maximum entropy models and stochastic Optimality Theory. To appear in Jane Grimshaw, Joan Maling, Chris Manning, Jane Simpson & Annie Zaenen (eds.), Architectures, rules, and preferences: a Festschrift for Joan Bresnan. Stanford: CSLI. [Available as Rutgers Optimality Archive 625]
Legendre, Géraldine, Yoshiro Miyata & Paul Smolensky (1990). Harmonic Grammar – a formal multi-level connectionist theory of linguistic wellformedness: An application. In Proceedings of the twelfth annual conference of the Cognitive Science Society, 884–891. Cambridge, MA: Lawrence Erlbaum.
Magri, Giorgio (in prep.). MIT evals paper.
Moreton, Elliott & Erik R. Thomas (2004). Origins of Canadian raising in voiceless-coda effects: a case study in phonologization. To appear in Jennifer Cole & José I. Hualde (eds.), Papers in Laboratory Phonology 9. Berlin: Mouton de Gruyter.
Pater, Joe (1999). Austronesian nasal substitution and other NC effects. The prosody-morphology interface. CUP.
Pater, Joe (2007). Gradual learning and convergence. Linguistic Inquiry.
Pater, Joe, Rajesh Bhatt & Christopher Potts (2007). Linguistic optimization. Ms., University of Massachusetts, Amherst.
Smolensky, Paul & Géraldine Legendre (2006). The harmonic mind: From neural computation to Optimality-Theoretic grammar. Cambridge, MA: MIT Press.
Soderstrom, Melanie, Donald Mathis & Paul Smolensky (2006). Abstract genomic encoding of Universal Grammar in Optimality Theory. In Paul Smolensky & Géraldine Legendre (eds.), The harmonic mind. MIT Press. 403–471.
Wilson, Colin (2006). Learning phonology with substantive bias: an experimental and computational study of velar palatalization. Cognitive Science 30: 945–982.