uc-berkeley phonetic precursors and 2014 may 31 (sat...

16
UC-Berkeley Phonetic precursors and 2014 May 31 (Sat.) Sound Change structural typology Elliott Moreton 1 University of North Carolina, Chapel Hill Address for correspondence about this talk: [email protected]. 1 Introduction (1) Learning a language involves learning patterns which can be described in terms of logical relations between features. A couple of recent examples: a. The Scottish Vowel Length Rule: [i u aI] are long if and only if they precede a voiced fricative or [r] or a morpheme boundary (Stuart-Smith, Thursday). b. Northern Kammu tonotactics: If a syllable has a voiceless stop or a sonorant onset, then the syllable can bear either high or low tone; else if its onset is a glottal stop, then its tone is low; else its tone is high (Svantesson, Friday). (2) This talk addresses an apparent mismatch between, on the one hand, the typological frequency of logical structures in natural-language phonology, and, on the other, the difficulty of learning analogous artificial patterns in the lab. Logical structure Situation A if and only if B (regardless of C ) At least two of A, B, C Lab Harder Easier Nature More common Less common (3) Proposed solution: The typological frequency of different structures is affected by the abstract structure of the phonetic precursors. (4) Outline of the talk: §2 Lab studies of phonological pattern learning. SHJ Type II easier than Types III–V. §3 Typology: Type II is common, Type IV rare. How come? §4 Phonetic precursors of the SHJ types. §5 Illustration: English Diphthong Raising. §6 Discussion. 1 Much of the work reported here is part of a continuing collaboration with Joe Pater and Katya Pertsova. I am indebted to several other colleagues for ideas, discussion, and critique, including Ricardo Berm´ udez-Otero, Jeff Mielke, Jen Smith, and Erik Thomas, as well as to audiences at the Manchester Phonology Meeting and at NELS in 2012. Any remaining errors are to be ascribed to the author. Some of the work was funded by an internal grant from the University of North Carolina at Chapel Hill. 1

Upload: truongmien

Post on 31-Mar-2018

213 views

Category:

Documents


0 download

TRANSCRIPT

UC-Berkeley Phonetic precursors and 2014 May 31 (Sat.)

Sound Change structural typology

Elliott Moreton1

University of North Carolina, Chapel HillAddress for correspondence about this talk: [email protected].

1 Introduction

(1) Learning a language involves learning patterns which can be described in terms of logicalrelations between features. A couple of recent examples:

a. The Scottish Vowel Length Rule: [i u aI] are long if and only if they precede a voiced fricativeor [r] or a morpheme boundary (Stuart-Smith, Thursday).

b. Northern Kammu tonotactics: If a syllable has a voiceless stop or a sonorant onset, then thesyllable can bear either high or low tone; else if its onset is a glottal stop, then its tone islow; else its tone is high (Svantesson, Friday).

(2) This talk addresses an apparent mismatch between, on the one hand, the typological frequencyof logical structures in natural-language phonology, and, on the other, the difficulty of learninganalogous artificial patterns in the lab.

Logical structureSituation A if and only if B (regardless of C) At least two of A, B, C

Lab Harder EasierNature More common Less common

(3) Proposed solution: The typological frequency of different structures is affected by the abstractstructure of the phonetic precursors.

(4) Outline of the talk:

§2 Lab studies of phonological pattern learning. SHJ Type II easier than Types III–V.

§3 Typology: Type II is common, Type IV rare. How come?

§4 Phonetic precursors of the SHJ types.

§5 Illustration: English Diphthong Raising.

§6 Discussion.

1Much of the work reported here is part of a continuing collaboration with Joe Pater and Katya Pertsova. Iam indebted to several other colleagues for ideas, discussion, and critique, including Ricardo Bermudez-Otero, JeffMielke, Jen Smith, and Erik Thomas, as well as to audiences at the Manchester Phonology Meeting and at NELSin 2012. Any remaining errors are to be ascribed to the author. Some of the work was funded by an internal grantfrom the University of North Carolina at Chapel Hill.

1

2 Structural bias in pattern learning

(5) Framework: What determines synchronic phonological typology?

a. Inductive bias (aka analytic bias), a property of the learner’s pattern-induction processes,makes some patterns easier to learn than others. Bias is inescapable for any inductive learner(see Pinker 1979; Mitchell 1990; Gallistel et al. 1991 for reviews. A similarity metric countsas an inductive bias; see Watanabe 1965; Quine 1969).

b. Channel bias, a property of the articulatory-acoustic-perceptual channel, systematically dis-torts the phonological form of utterance in transmission, thereby introducing new, phonetically-motivated patterns into the learner’s input (Hyman, 1976; Ohala, 1992, 1993). Bias is prob-ably also inescapable in any real-world communication channel.

c. Together, inductive and channel bias skew language change, and hence also skew the long-termsteady-state frequencies of different kinds of patterns (Bell, 1970, 1971; Greenberg, 1978).

Research problem: Infer humans’ inductive and channel biases by observing typology, lab experi-ments, acquisition, change, etc.

(6) Recent examples of proposals about inductive and channel biases:

a. Inductive bias: The phonological learner detects patterns based on sound environment, noton lexical category (Bybee, Thursday).

b. Channel bias: Individuals differ in the degree to which they coarticulate and compensate forcoarticulation (Yu, Friday).

2.1 Kinds of bias

(7) Structure vs. substance: Here are three patterns which have the same structure but differentsubstance:

a. PhonologyConsonant

Vowel short long

short *lam lamm

long la:m *la:mm

b. MorphologyNumber

Case sing. pl.

Acc. mur mur-sNom. mur-s mur

c. Non-linguistic gameShapes

Colors One Many

One Illegal Legal

Many Legal Illegal

Swedish: Either the vowelor the consonant of a closedstressed syllable is long, butnot both (Lofstedt, 1992).

Old French: /-s/ is attachedto an o-stem noun if it is nom-inative or plural, but not both(Luquiens, 1909, §289).

Qwirkle: In a row of tiles, ei-ther the colors or the shapesmust differ, but not both(Ross, 2006, 2).

(8) Much theoretical and experimental work in phonology has been motivated by the hypothesisthat learners have a substantive inductive bias which favors phonetically-motivated hypotheses overothers (Stampe, 1973; Prince and Smolensky, 1993; Archangeli and Pulleyblank, 1994; Hayes, 1999;Steriade, 2001; Wilson, 2006). Examples of hypothesized substantive biases:

a. There is a natural (innate) phonological process of final-obstruent devoicing, but not one offinal-obstruent voicing or final-sonorant devoicing (Donegan and Stampe, 2009).

b. CON includes Onset and NoCoda, but not NoOnset or Coda (Prince and Smolensky,1993)

2

c. Dep[p t k]/ V universally dominates Dep[P]/ V (Steriade, 2001)

d. The plasticity of a markedness constraint depends on the perceptual similarity of the soundsin the largest change which the constraint can motivate (Wilson, 2006).

(N.B. I’m citing these hypotheses, not necessarily endorsing them.)

(9) Substantive inductive bias has loomed very large in linguistic theory, for at least two rea-sons:

a. One of the most striking trends in phonological typology is, it tends to look like phonetics.Substantive bias offers an explanation for why.

b. Since substantive bias applies to inductive problems that only arise in phonology, finding anysubstantive bias at all would be good evidence that there are cognitive specializations forphonology.

(10) But structural bias also has a history in linguistic theory. Examples of hypothesized structuralbiases:

a. A phonological rule can only add or delete an association line on a Feature-Geometric tier(McCarthy, 1988).

b. When two markedness constraints are equally effective at distinguishing licit from illicit forms,rank the one that refers to more features, or that uses more disjunctions, lower (slightlygeneralized from Gordon 2004).

c. Every learnable phonotactic pattern can be represented as a finite-state machine (Heinz andIdsardi, 2011).

(11) Why would linguists care about structural inductive bias?

a. It is informative about the architecture of the learner, e.g., rule-based vs. prototype-basedlearning (Medin and Schwanenflugel, 1981). (Soskuthy, Thursday.)

b. The lab evidence for it is strong (see review in Moreton and Pater 2012a,b).

c. Structural bias is a potential link between phonology and other domains (morphology, syn-tax, psychology, biology, computer science, philosophy, . . . ). (Babel, McGuire, & Russell,Thursday.)

d. And . . .

(12) Most relevantly for this week’s theme, structural inductive bias might guide sound change

a. At the point of phonologization (this talk), or

b. In maintenance and transmission of a phonologized pattern over time (Seinhorst, today?)

3

2.2 Structural inductive bias in phonotactic patterns (Moreton et al., 2013, Exp.1)

(13) Research in visual pattern learning has focused on a family of pattern types first studied byShepard et al. (1961):

a. Type I is a simple one-feature affirmation

b. Type II is IFF/XOR on two features

c. Types III-V need all three features, though subsets can be described with two

d. Type VI is a three-way IFF/XOR; every subset needs three features

(N.B. We’re focusing on the SHJ types because there’s a big literature on them. The completefamily of ways to divide a 2x2x2 stimulus space has been studied by Feldman (2000). There isalso an extensive literature on 2x2 spaces, e.g., Bruner et al. (1956); Neisser and Weene (1962);Haygood and Bourne (1965). The SHJ patterns were first discussed in phonology by Silverman1999.)

(14) Supervised learning (participant sees a shape, classifies it as A or B, receives right/wrongfeedback, then on to the next one); no test of generalization outside the training set. The rate oflearning decreased with the number of critical features: I > II > III, IV, V > V I. This result hasbeen replicated many times in the vision literature (reviewed in Kurtz et al. 2013).

⇒ Not all of these patterns are equally easy to learn. The human learner’s inductive biases makesome of them harder than others.

(15) Some recent visual studies have also found that the advantage for Type II over Type IV canbe reduced or even reversed by changing the stimuli or task conditions:

a. Using unsupervised training (Love, 2002)

b. Not instructing participants to look for a rule (Love, 2002; Love and Markman, 2003; Lewandowsky,2011; Kurtz et al., 2013)

c. Making stimulus dimensions harder to verbalize, or perceptually less-separable (Nosofsky andPalmeri, 1996; Kurtz et al., 2013)

All of these changes make the task more like language learning.

4

(16) Stimulus design: MBROLA-synthesized C1V1C2V2 words defined by 8 binary features.(Thesestimuli have been used in several other experiments, including Moreton 2008; Lin 2009; Kapatsinski2011; Moreton 2012)

Stimulus positionσ1 σ2

Feature C1 V1 C2 V2

voiced ± ±Coronal ± ±high ± ±back ± ±

Consonants Vowels

k t g d æ O i u

– – + +

– + – +

– – + +

– + – +

(17) For each participant, 3 of the 8 stimulus features were randomly chosen as the relevant features,and then randomly mapped onto the three logical features defining the Shepard pattern to producethe “language” for that participant. Examples of artificial phonotactic patterns instantiating SHJTypes I, II, and IV.

L1 (TYPE I): C1 is voiceddigu, gada, dika, gugu, . . .

L2 (TYPE II): C1 is voiced iff V2 is back.digu, tægi, kagæ gada, . . .

L3 (TYPE IV): At least two of: C1 is voiced, V2 is high, V2 is backkaku, digu, guki, dæka, . . .

(18) N.B. This procedure deliberately generated “crazy rules” (Bach and Harms, 1972; Anderson,1981), phonetically unmotivated and typologically unattested, since the goal of the experiment wasto study purely structural effects. (Phonetic motivation and typological frequency in any case haveweaker effects than those of structure; Moreton and Pater 2012a,b.)

(19) Procedure:

a. Instructions: Participants (who were run in a lab, by a human) were told they would learnto pronounce words in an artificial language, and then be tested on ability to recognize wordsin that language.

b. Familiarization: They listened to and repeated aloud 32 randomly-chosen pattern-conformingstimuli 4 times over.

c. Test : Then they heard 32 randomly-chosen pairs of new stimuli (one pattern-conforming, onenot) and tried to identify the one that was “a word in the language you were studying”.

5

(20) This figure shows the proportion of pattern-conforming responses for each participant (Moretonet al., 2013, Figure 8):2

(21) Analysis by mixed-effects logistic regression (lmer in R), with Type II as reference categoryand a random intercept for Participant.3 Fixed-effects part of fitted model (4608 responses from144 participants; log-likelihood = −2879):

Coefficient Estimate SE z Pr(> |z|)(Intercept) 0.12803 0.14337 0.893 0.371865I 0.81999 0.17583 4.663 < 0.0001III 0.47202 0.17322 2.725 0.006432IV 0.63399 0.17423 3.639 0.000274V 0.38984 0.17265 2.258 0.023952VI -0.06134 0.17119 -0.358 0.720114Both Same Genus 0.11078 0.24329 0.455 0.648850Both Same Segment 0.69318 0.25149 2.756 0.005845CorrFirst 0.27396 0.06348 4.316 < 0.0001Redup -0.77231 0.10142 -7.615 < 0.0001

(22) ⇒ Unlike SHJ, Type II is harder than Types III–V.

2Plotting symbols for Type II: + = all relevant features are in the same segment analogue; 4 = all relevantfeatures are of the same type (agreement/disagreement pattern); ◦ = other.

3Since repeated measures were not made on items, no random intercept was included for Item.

6

3 The typological conundrum

(23) Problem: This result doesn’t seem to square with natural-language typology. In the nextthirty seconds,

a. If you were born in an odd-numbered year, please write down as many natural-languagephonotactic patterns as you can think of that have the form “A if and only if B”, or “Aunless B”.

b. If you were born in an even-numbered year, please write down as many as you can that havethe form “At least two of A, B, C”, or “At most two of A, B, C”.

(24) Some of this (apparent) asymmetry may be due to a special subclass of Type II patterns, theagreement and disagreement patterns, which are both more frequent in nature, and easier to learnin the lab, than other Type II patterns (Moreton, 2008; Lin, 2009; Moreton, 2012). However, thatcan’t be the whole story:

a. There are still many iff/xor distributional patterns involving instances of two different features(typical Ling 101 complementary distribution patterns)

b. This residue is especially noteworthy since feature systems are deliberately engineered tomake iff/xor patterns look like agreement (the idea behind Feature Geometry).

c. If you take the phonologically active classes in P-Base (Mielke, 2008), coerce them to SHJtypes relative to the SPE feature system, and compare the results to a chance model, Type IIis more overrepresented than Types III–V (Moreton and Pertsova, 2012). This can’t possiblybe due to a bias for agreement/disagreement patterns, since all the relevant features co-occurin a single segment.

(25) ⇒ There is apparently a mismatch between the relative difficulty of Type II and Types III–Vin the lab on the one hand, and their relative frequency in natural-language typology on the otherhand. Possible responses:

a. “Pattern learning in the lab has nothing to do with natural-language typology.”

b. “The typological asymmetry in favor of Type II isn’t coming from inductive bias, so it mustbe coming from channel bias”, i.e., from the phonetic precursors.

Both are worth considering, but today’s talk will focus on the second one.

4 Structural bias in phonetic precursors

(26) If the (apparent) typological bias isn’t coming from inductive bias, where is it coming from?

Maybe it’s inherited from the phonetic precursors: If phonetic variables tend to interact with eachother in ways that are analogous to Type II rather than to Types III–V, then the available poolof phonetic variation could be skewed enough towards Type II to overcome inductive bias in theother direction.

7

(27) Scenario: Formation of three-feature distributional patterns starting from

a. Two already-phonologized phonetic variables x and y, instantiating phonological features Xand Y with means x+, x− and y+, y− and standard deviations σx, σy. (For simplicity, assumeno covariance between x and y.)

b. One unphonologized phonetic variable z

(28) Assumption: Phonologization consists of

a. Introducing a binary phonological feature Z that divides the phonetic z axis to create a 2x2x2phonological partition of the (x, y, z) space

b. Designating densely-populated cells as “permitted” and sparsely-populated ones as “forbid-den”.

(29) Put together two ideas:

a. Minimum-difficulty surface (Lindblom, 1990): For a given (x, y), E[z] defaults to the corre-sponding point f(x, y).

b. Bimodal marginal distribution of z facilitates phonologization (Maye et al., 2002, 2008).

(30) Example: Phonologization to Type II. Note cartoon projections of the marginal z distributionat left. The equatorial plane (added in phonologization) falls on a notch in the distribution.

Phon

etic v

ariab

le x

-0.5

0.0

0.5

1.0

1.52.0

Phonetic variable y-0.5

0.0

0.5

1.01.5

Phonetic variable z

0.0

0.5

1.0

- -

- -- +

- +

+ -

+ -

+ +

+ +

8

(31) This figure illustrates an important special case, namely, the one where the minimum-difficultysurface is a (possibly tilted) plane. I.e., f(x, y) is a linear function of x and y.

We might expect f to often be linear, or nearly so, since real-world functions are differentiable, soa small enough piece is close to being a (possibly tilted) plane.

(32) A linear f restricts the possible ways to get a bi- or multimodal marginal distribution of z.The possibilities are:

a. A bimodal distribution in which two adjacent x-y cells cluster together with a high z and theother two with a low z, as in (30)

b. A unimodal, or broadly smeared, distribution in which there is no clear separation. Phonol-ogization would create a Type I pattern.

Phone

tic va

riable

x

-0.5

0.0

0.5

1.0

1.52.0

Phonetic variable y-0.5

0.0

0.51.0

1.5Phonetic variable z

0.0

0.2

0.4

0.6

0.8

1.0

- -- -- +- +

+ -+ -+ +

+ +

9

c. A trimodal distribution in which the +− and −+ cells cluster in the middle, with ++ aboveand −− below. Phonologization would create a Type V pattern.

Phone

tic va

riable

x

-0.50.00.51.01.52.0

Phonetic variable y-0.5

0.00.5

1.01.5

Phonetic variable z

-1

0

1

2

3

- -

- -

- +

- +

+ -

+ -

+ +

+ +

(33) The (X,Y ) categories are four points on a plane. Phonologization draws a straight line acrossthat plane, cutting off one point or two adjacent points. This holds even if, e.g., the x and y axesaren’t parallel.

10

(34) Not to say that Type IV would have no possible precursor in this scenario; it would just haveto be very non-linear:

Phone

tic va

riable

x

0.0

0.5

1.0

Phonetic variable y 0.0

0.5

1.0

z

0.0

0.5

1.0

- -- -

- +- +

+ -+ -

+ +

+ +

(with the (−X,−Y ) combination ruled out by some independent factor).

(35) In XY z phonologization scenarios, then, the precursor pool is predicted to be skewed towardsTypes I, II, and V, perhaps enough so to overcome inductive bias against Type II.

5 Example: The English Diphthong Raising family

(36) For a real-life illustration of these patterns, we turn to a family of alternations in Englisharound the world, in which the historical /aI/ diphthong has higher vs. lower allophones dependingon following context (TIGHT/TIDE/PRICE/PRIZE):

SVLR SGW CR–cont +cont –cont +cont –cont +cont

–voice 2i 2i aI aI 2i 2i+voice 2i aI a: a: aI aI

(37) Good example of phonologization because

a. Versions of it keep getting independently re-innovated, even in the 20th Century (see reviewin Moreton and Thomas 2007).

b. Has lexical exceptions (Vance, 1987)

c. The difference between the diphthongs is contrastive before flap (Mielke et al., 2003; Boersmaand Pater, 2007; Pater, 2014)

(38) Phonetic basis of CR/SGW: Abbreviation of diphthong nuclei and exaggeration of offglidesbefore voiceless codas (Thomas, 2000; Moreton, 2004).

11

(39) What about the SVLR-related height alternation? Bermudez-Otero (2014) suggests unitingit with CR/SGW via Pre-Fortis Clipping. Here is an alternative hypothesis: Diphthongs are alsomore open (lower) before fricatives than before stops.

Test this with [eI] in West Coast U.S. English to maximize chances of seeing unphonologized pho-netic effect.

(40) Isolated-word production study of 16 American English speakers, mostly from California (ex-ceptions: 1 moved around a lot, 1 Ohio, 1 NJ, 1 Oregon). Read word list aloud in 3 differentrandom orders:

Context Target vocoid

[cont] [voiced] E eI– – t bet, fret, set bait, fate, rate– + d Fred, bed, said fade, raid+ – s cess, fess, press race, base, face+ + z Prez, fez phase, raise

Dependent measure was F1 at the F1 maximum, which normally occurred early in the vowel.Measurement made in Hz with Praat (Boersma and Weenink, 2010), converted to Bark.

(41) Quadratic surface fit to the four condition means (R Development Core Team, 2005):

[continuant]

-0.5

0.0

0.5

1.0

1.5

[voiced]

-0.5

0.0

0.5

1.0

1.5

F1 at F1 max (Bark from

--)

-0.2

0.0

0.2

0.4

- -

- -

- +

- +

+ -

+ -

+ +

+ +

(42) Linear mixed-effects model (lmer function in lme4 package; random effects (1 + Continuant*

Voice | Participant))

Coefficient Estimate SE χ2 Pr(> |χ2|)(intercept) 4.62677 0.11029Continuant 0.12757 0.04064 9.0658 <0.003Voice 0.26426 0.04811 45.3538 <0.001Continuant:Voice -0.07348 0.04394 2.2180 0.1364

12

(43) Results jibe with expectations about precursor:

a. Voicing is associated with higher F1 (lower articulations), as in Canadian Raising, SouthernGlide Weakening, and similar patterns.

b. Continuancy is also associated with higher F1, as in the Scottish [aI] pattern.

c. The precursor surface is approximately planar, and is oriented so that there is a vertical gapbetween the voiced and the voiceless obstruents, as in the (very common) CR/SGW, not the(apparently unique) Scottish pattern.

6 Discussion

(44) Summary:

a. Inductive bias: Human phonotactic pattern learners are worse at learning Type II (iff/xor)patterns than at Type IV (at least two of . . . )

b. Typology : Although quantitative data is scarce, it does seem that Type II patterns are morecommon cross-linguistically than Type IV patterns. Hence, typology mismatches inductivebias.

c. Channel bias: In the XY z phonologization scenario, if minimal-difficulty surfaces are ap-proximately planar, then we expect phonologization to produce Type I, Type II, or Type V(depending on the disposition of the XY points and the orientation of the surface).

(45)⇒ Phonetic precursors have structural properties, and those structural properties may explaintypological facts that are not explained by structural inductive biases.

(46) Some questions raised:

a. What do the structural effects on pattern-learning difficulty reveal about the architectureof the learner? E.g., Type II vs. Type IV is often used as an indicator of rule-based vs.cue-based learning (Shepard et al., 1961; Gluck and Bower, 1988; Ashby et al., 1998; Love,2002; Smith et al., 2012).

b. What are the logical structures of phonetic precursors, and how frequent are they? In theXY z scenario, under what circumstances can we trust the minimum-difficulty surface to beapproximately planar?

c. What, in fact, are the relative typological frequencies of different logical structures amongphonological patterns?

d. Are the same logical-structure effects seen in phonologization and in the historical develop-ment of an existing pattern (e.g., generalization, (Cristia et al., 2013))?

13

References

Anderson, S. R. (1981). Why phonology isn’t “natural”. Linguistic Inquiry 12, 493–539.Archangeli, D. and D. Pulleyblank (1994). Grounded Phonology. Cambridge, Massachusetts: MIT Press.Ashby, F. G., L. A. Alfonso-Reese, A. U. Turken, and E. M. Waldron (1998). A neuropsychological theory

of multiple systems in category learning. Psychological Review 105 (3), 442–481.Bach, E. and R. T. Harms (1972). How do languages get crazy rules? In R. P. Stockwell and R. K. S.

Macaulay (Eds.), Linguistic change and generative theory, Chapter 1, pp. 1–21. Bloomington: IndianaUniversity Press.

Bell, A. (1970). A state-process approach to syllabicity and syllabic structure. Ph. D. thesis, StanfordUniversity.

Bell, A. (1971). Some patterns of the occurrence and formation of syllabic structure. Working Papers onLanguage Universals 6, 23–138.

Bermudez-Otero, R. (2014). Philadelphia /ai/-raising without rule insertion. Handout from the Symposiumon Historical Phonology, Edinburgh, January 13.

Boersma, P. and J. Pater (2007, October). Constructing constraints from language data: the case of CanadianEnglish diphthongs. Handout, NELS 38, University of Ottawa.

Boersma, P. and D. Weenink (2010). PRAAT Version 5.1.31. Software, www.praat.org.Bruner, J. S., J. J. Goodnow, and G. A. Austin (1956). A study of thinking. New York: John Wiley and

Sons.Cristia, A., J. Mielke, R. Daland, and S. Peperkamp (2013). Similarity in the generalization of implicitly

learned sound patterns. Laboratory Phonology 4, 259–285.Donegan, P. J. and D. Stampe (2009). Hypotheses of Natural Phonology. MS, University of Hawai’i.Feldman, J. (2000). Minimization of Boolean complexity in human concept learning. Nature 407, 630–633.Gallistel, C. R., A. L. Brown, S. Carey, R. Gelman, and F. C. Keil (1991). Lessons from animal learning for

the study of cognitive development. In S. G. Carey and R. G. Gelman (Eds.), The epigenesis of mind,Chapter 1, pp. 3–36. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Gluck, M. A. and G. H. Bower (1988). Evaluating an adaptive network model of human learning. Journalof Memory and Language 27, 166–195.

Gordon, M. (2004). Syllable weight. In B. Hayes, R. Kirchner, and D. Steriade (Eds.), Phonetically-basedphonology, pp. 277–312. Cambridge, England: Cambridge University Press.

Greenberg, J. H. (1978). Diachrony, synchrony, and language universals. In J. H. Greenberg, C. A. Ferguson,and E. A. Moravcsik (Eds.), Universals of human language, volume 1, method and theory, pp. 61–91.Stanford, California: Stanford University Press.

Hayes, B. (1999). Phonetically driven phonology: the role of optimality in inductive grounding. In M. Dar-nell, E. Moravcsik, M. Noonan, F. Newmeyer, and K. Wheatly (Eds.), Functionalism and Formalism inLinguistics, Volume 1: General Papers, pp. 243–285. Amsterdam: John Benjamins.

Haygood, R. C. and L. E. Bourne (1965). Attribute- and rule-learning aspects of conceptual behavior.Psychological Review 72 (3), 175–195.

Heinz, J. and W. Idsardi (2011). Sentence and word complexity. Science 333 (6040), 295–297.Hyman, L. M. (1976). Phonologization. In A. Juilland (Ed.), Linguistic studies offered to Joseph Greenberg:

second volume: phonology, pp. 407–418. Saratoga, California: Anma Libri.Kapatsinski, V. (2011). Modularity in the channel: the link between separability of features and learnability

of dependencies between them. Proceedings of the XVIIth International Congress of Phonetic Sciences.Kurtz, K. J., K. R. Levering, R. D. Stanton, J. Romero, and S. N. Morris (2013). Human learning of

elemental category structures: revising the classic result of Shepard, Hovland, and Jenkins (1961). Journalof Experimental Psychology: Learning, Memory, and Cognition 39 (2), 552–572.

Lewandowsky, S. (2011). Working memory capacity and categorization: individual differences and modelling.Journal of Experimental Psychology: Learning, Memory, and Cognition 37 (3), 720–738.

Lin, Y. (2009). Tests of analytic bias in native Mandarin speakers and native Southern Min speakers. InY. Xiao (Ed.), 21st North American Conference on Chinese Linguistics, Smithfield, Rhode Island, pp.81–92. Bryant University.

Lindblom, B. (1990). Explaining phonetic variation: a sketch of the H&H theory. In W. J. Hardcastle andA. Marchal (Eds.), Speech Production and Speech Modelling, pp. 403–439.

Lofstedt, I. (1992). Swedish segment length and Structure Preservation. Studia Linguistica 46 (2), 93–127.Love, B. C. (2002). Comparing supervised and unsupervised category learning. Psychonomic Bulletin and

14

Review 9 (4), 829–835.Love, B. C. and A. B. Markman (2003). The nonindependence of stimulus properties in human category

learning. Memory and Cognition 31 (5), 790–799.Luquiens, F. B. (1909). An introduction to Old French phonology and morphology. New Haven, Connecticut:

Yale University Press.Maye, J., D. J. Weiss, and R. N. Aslin (2008). Statistical phonetic learning in infants: facilitation and feature

generalization. Developmental Science 11 (1), 122–134.Maye, J., J. F. Werker, and L. Gerken (2002). Infant sensitivity to distributional information can affect

phonetic discrimination. Cognition 82, B101–B111.McCarthy, J. J. (1988). Feature Geometry and dependency: a review. Phonetica 45, 84–108.Medin, D. L. and P. J. Schwanenflugel (1981). Linear separability in classification learning. Journal of

Experimental Psychology: Human Learning and Memory 7 (5), 355–368.Mielke, J. (2008). The emergence of distinctive features. Oxford, England: Oxford University Press.Mielke, J., M. Armstrong, and E. Hume (2003). Looking through opacity. Theoretical Linguistics 29,

123–139.Mitchell, T. M. (1980 [1990]). The need for biases in learning generalizations. In J. W. Shavlik and T. G. Det-

terich (Eds.), Readings in machine learning, Chapter 2.4.1, pp. 184–191. San Mateo, California: MorganKaufman.

Moreton, E. (2004). Realization of the English postvocalic [voice] contrast in F1 and F2. Journal ofPhonetics 32 (1), 1–33.

Moreton, E. (2008). Analytic bias and phonological typology. Phonology 25 (1), 83–127.Moreton, E. (2012). Inter- and intra-dimensional dependencies in implicit phonotactic learning. Journal of

Memory and Language 67 (1), 165–183.Moreton, E. and J. Pater (2012a). Structure and substance in artificial-phonology learning: Part I, structure.

Language and Linguistics Compass 6 (11), 686–701.Moreton, E. and J. Pater (2012b). Structure and substance in artificial-phonology learning: Part II, sub-

stance. Language and Linguistics Compass 6 (11), 702–718.Moreton, E., J. Pater, and K. Pertsova (2013, October). Phonological concept learning. MS submitted to

Cognition.Moreton, E. and K. Pertsova (2012, October). Pastry phonotactics: Is phonological learning special? Pre-

sentation at the 43rd Annual Meeting of the Northeast Linguistic Society, City University of New York.Moreton, E. and E. R. Thomas (2007). Origins of Canadian Raising in voiceless-coda effects: a case study

in phonologization. In J. S. Cole and J. I. Hualde (Eds.), Papers in Laboratory Phonology 9, Berlin, pp.37–64. Mouton.

Neisser, U. and P. Weene (1962). Hierarchies in concept attainment. Journal of Experimental Psychol-ogy 64 (6), 640–645.

Nosofsky, R. M. and T. J. Palmeri (1996). Learning to classify integral-dimension stimuli. PsychonomicBulletin and Review 3 (2), 222–226.

Ohala, J. J. (1992). What’s cognitive, what’s not, in sound change. In G. Kellerman and M. D. Morissey(Eds.), Diachrony within synchrony: language history and cognition, pp. 309–355. Frankfurt am Main:Peter Lang.

Ohala, J. J. (1993). The phonetics of sound change. In C. Jones (Ed.), Historical linguistics: problems andperspectives, pp. 237–278. Harlow: Longman.

Pater, J. (2014). Canadian Raising with language-specific weighted constraints. Language 90 (1), 230–240.Pinker, S. (1979). Formal models of language learning. Cognition 7, 217–283.Prince, A. and P. Smolensky (1993). Optimality Theory: constraint interaction in generative grammar.

Department of Linguistics, Rutgers University.Quine, W. V. O. (1969). Natural kinds. In W. V. O. Quine (Ed.), Ontological relativity and other essays.

New York: Columbia University Press.R Development Core Team (2005). R: A language and environment for statistical computing. Vienna,

Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.Ross, S. M. (2006). Qwirkle: mix, match, score, and win! Game instructions, MindWare Company.Shepard, R. N., C. L. Hovland, and H. M. Jenkins (1961). Learning and memorization of classifications.

Psychological Monographs 75 (13, Whole No. 517).Silverman, D. (1999, April). On allophonic relations: phonetic similarity or functional identity. Handout,

GLOW99 (Generative Linguistics in the Old World) Workshop on Phonetics in Phonology.Smith, J. D., M. E. Berg, R. G. Cook, M. S. Murphy, M. J. Crossley, J. Boomer, B. Spiering, M. J. Beran,

15

B. A. Church, F. G. Ashby, and R. C. Grace (2012). Implicit and explicit categorization: a tale of fourspecies. Neuroscience and Biobehavioral Reviews xx (xx), xxxx–xxxx.

Stampe, D. (1973). A dissertation on Natural Phonology. Ph. D. thesis, University of Chicago.Steriade, D. (2001). The phonology of perceptibility effects: the P-map and its consequences for constraint

organization. MS, Department of Linguistics, University of California, Los Angeles.Thomas, E. R. (2000). Spectral differences in /ai/ offsets conditioned by voicing of the following consonant.

Journal of Phonetics 28 (1), 1–25.Vance, T. J. (1987). “canadian Raising” in some dialects of the northern United States. American Speech 62,

195–210.Watanabe, S. (1965). Une explication mathematique du classement d’objets. In S. Dockx and P. Bernays

(Eds.), Information and prediction in science, pp. 39–76. New York: Academic Press.Wilson, C. (2006). Learning phonology with substantive bias: an experimental and computational study of

velar palatalization. Cognitive Science 30 (5), 945–982.

16