Laboratory Phonology 11, 30 June - 2 July 2008, Wellington, New Zealand

The Gradient Phonotactics of English CVC Syllables

Olga Dmitrieva & Arto Anttila

Department of Linguistics, Stanford University

Introduction

Factors affecting the well-formedness of English CVC syllables:

• OCP-place: gradient prohibition against homorganic consonants in C1 and C2 (e.g. gag vs. gap).

HYPOTHESIS: Syllables with C1 and C2 of the same place of articulation are underrepresented.

• Prominence alignment between stress, vowel height, and consonant place:

HYPOTHESIS: Syllables that violate prominence alignment are underrepresented.

Methods

Material:

• CMU pronunciation dictionary and CELEX lemma lexicon.
• Stress: primary stress vs. no stress.
• Consonants: coronal, dorsal, labial.
• Vowels: high (= high or reduced) and low (= low or mid).

Effect size evaluation:

• Observed frequency/Expected frequency ratio (O/E ratio):

P(dorsal-V-dorsal) = P(onset=dorsal) * P(coda=dorsal)

E(dorsal-V-dorsal) = P(dorsal-V-dorsal) * Total

• Multiple regression.
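
The O/E computation above can be illustrated with a minimal sketch. It assumes each syllable has already been reduced to an (onset place, coda place) pair; the function name `oe_ratios` and the toy counts are hypothetical and are not the CMU or CELEX data.

```python
from collections import Counter
from itertools import product

PLACES = ("coronal", "dorsal", "labial")

def oe_ratios(syllables):
    """Return {(onset, coda): O/E} for a list of (onset_place, coda_place) pairs."""
    total = len(syllables)
    onset_counts = Counter(onset for onset, _ in syllables)
    coda_counts = Counter(coda for _, coda in syllables)
    observed = Counter(syllables)
    ratios = {}
    for onset, coda in product(PLACES, repeat=2):
        # P(onset-V-coda) = P(onset) * P(coda);  E = P(onset-V-coda) * Total
        expected = (onset_counts[onset] / total) * (coda_counts[coda] / total) * total
        # A place that never occurs gives E = 0, so the ratio is undefined.
        ratios[(onset, coda)] = observed[(onset, coda)] / expected if expected else float("nan")
    return ratios

# Hypothetical toy counts (assumption): homorganic pairs are made rare on purpose.
toy = (
    [("dorsal", "dorsal")] * 1
    + [("dorsal", "labial")] * 3
    + [("labial", "dorsal")] * 3
    + [("labial", "labial")] * 1
)
ratios = oe_ratios(toy)
print(ratios[("dorsal", "dorsal")])  # 0.5: underrepresented
print(ratios[("dorsal", "labial")])  # 1.5: overrepresented
```

In the toy data each place accounts for half of the onsets and half of the codas, so every attested combination has an expected count of 2 out of 8 syllables.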

Results

O/E values by place of articulation (axes: Onset and Coda), first panel:

            Coronal   Dorsal   Labial
Coronal     0.94      1.21     1.17
Dorsal      1.03      0.51     1.63
Labial      1.09      0.84     0.31

Second panel:

            Coronal   Dorsal   Labial
Coronal     0.86      1.20     1.29
Dorsal      1.25      0.42     0.78
Labial      1.23      0.77     0.40

[Figure, two panels. Y-axis: 0% to 100%. X-axis: Coda (Labial, Dorsal, Coronal). Legend: Coronal, Dorsal, Labial.]

Regression

OT Analysis

Conclusions

O/E values by consonant place, syllable position, and stress, first panel:

            Onset                     Coda
            Stressed   Unstressed     Stressed   Unstressed
Coronal     0.80       1.14           0.95       1.04
Dorsal      1.30       0.79           0.96       1.03
Labial      1.27       0.87           1.68       0.65

Second panel:

            Onset                     Coda
            Stressed   Unstressed     Stressed   Unstressed
Coronal     0.82       1.12           0.88       1.03
Dorsal      1.21       0.86           0.99       1.01
Labial      1.15       0.83           1.18       0.88

[Figure. Y-axis: 0% to 100%. X-axis: Stressed, Unstressed. Legend: Coronal, Dorsal, Labial.]

O/E values by consonant place, syllable position, and vowel height, first panel:

            Onset             Coda
            Low     High      Low     High
Coronal     0.78    1.11      0.92    1.04
Dorsal      1.42    0.70      1.06    0.90
Labial      1.23    0.94      1.87    0.68

Second panel:

            Onset             Coda
            Low     High      Low     High
Coronal     0.76    1.15      0.96    1.03
Dorsal      1.41    0.64      1.03    0.87
Labial      1.28    0.83      1.12    0.93

[Figure. Y-axis: 0% to 100%. X-axis: Low, High. Legend: Coronal, Dorsal, Labial.]

O/E values by vowel quality (rows) and syllable type (columns), first panel:

            Stressed   Unstressed
High        0.46       1.29
Low         2.42       0.24

Second panel:

            Stressed   Unstressed
High        0.45       1.37
Low         1.89       0.41

[Figure. Y-axis: 0% to 100%. X-axis: Stressed, Unstressed. Legend: Coronal, Dorsal, Labial.]

[Figure. Y-axis: 0% to 100%. X-axis: Low, High. Legend: Coronal, Dorsal, Labial.]

[Figure. Y-axis: 0% to 100%. X-axis: Stressed, Unstressed. Legend: Low, High.]

1. Syllables that violate OCP-place are underrepresented. Onset-coda co-occurrences (O/E values), panels: CMU, CELEX.

2. Syllables that violate consonant-stress alignment are underrepresented (panels: CMU, CELEX):

a. Labials and dorsals in unstressed syllables.
b. Coronals in stressed syllables.

3. Syllables that violate consonant-vowel alignment are underrepresented (panels: CMU, CELEX).

4. Syllables that violate vowel-stress alignment are underrepresented (panels: CMU, CELEX):

a. Low vowels in unstressed syllables.
b. High vowels in stressed syllables.

R = 0.943 (F(6, 35) = 38.689, p < 0.001)

R = 0.945 (F(6, 35) = 13.515, p < 0.001)

25,888 CVC syllables from CELEX

83,798 CVC syllables from CMU

repeat: [rI] [pit]

O/E ratio > 1.00: overrepresentation
O/E ratio < 1.00: underrepresentation

Cases: 36 syllable types (3 onset places * 3 coda places * 2 stress levels * 2 vowel heights),
e.g. LLHS: labial-labial, high vowel, stressed.

In CMU significant factors:
• Vowel-stress alignment
• OCP
• No labial/dorsal in unstressed syllables

In CELEX significant factors:
• Vowel-stress alignment
• OCP
• No labial/dorsal with high vowels
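
The 36 cases can be enumerated mechanically. The sketch below generalizes the four-letter code from the single example LLHS; the letters C, D, and U (coronal, dorsal, unstressed) and the function names are assumptions, since the poster shows only that one code.

```python
from itertools import product

# Letter codes are assumed by analogy with the poster's example "LLHS".
PLACES = {"C": "coronal", "D": "dorsal", "L": "labial"}
HEIGHTS = {"H": "high", "L": "low"}
STRESS = {"S": "stressed", "U": "unstressed"}

def syllable_types():
    """All codes of the form onset place + coda place + vowel height + stress."""
    return ["".join(code) for code in product(PLACES, PLACES, HEIGHTS, STRESS)]

def describe(code):
    """Spell out a four-letter code in the poster's wording."""
    onset, coda, height, stress = code
    return f"{PLACES[onset]}-{PLACES[coda]}, {HEIGHTS[height]} vowel, {STRESS[stress]}"

types = syllable_types()
print(len(types))        # 36
print(describe("LLHS"))  # labial-labial, high vowel, stressed
```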

CMU

CELEX

• A set of unranked OT constraints generates implicational universals that reflect relative phonotactic markedness:

More marked forms entail less marked forms.

More marked forms surface less frequently.

• Sample universal:

If a language allows gag (violates OCP), it also allows gap.

Gap is always more frequent than gag.

• The implicational universals can be described graphically as a partial order.

• Precision (how many of the predicted relationships are correct): CMU 0.85, CELEX 0.86
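
One way to read the precision figure is as the share of predicted (more marked, less marked) pairs whose frequencies come out in the predicted direction. The sketch below follows that reading. The pair list, the counts, and the choice to count ties as correct are all assumptions for illustration; the poster does not say how ties were handled or list the predicted pairs.

```python
def precision(predicted_pairs, frequency):
    """Share of (more_marked, less_marked) pairs where the less marked
    form is at least as frequent as the more marked one."""
    correct = sum(frequency[less] >= frequency[more] for more, less in predicted_pairs)
    return correct / len(predicted_pairs)

# Hypothetical predictions and counts (assumption), using four-letter type codes:
# onset place, coda place, vowel height, stress.
pairs = [("DDLS", "DLLS"), ("LLLS", "LCLS"), ("DDHU", "DCHU")]
freq = {"DDLS": 2, "DLLS": 5, "LLLS": 4, "LCLS": 3, "DDHU": 1, "DCHU": 1}

p = precision(pairs, freq)
print(round(p, 2))  # 0.67: two of the three predicted relationships hold
```

The first pair mirrors the gag/gap example: a dorsal-dorsal type is predicted to be no more frequent than the corresponding dorsal-labial type.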

Gradient OCP-place is active in all CVC syllables (not just monosyllabic words, cf. Berkley 1994).

Prominence alignment in CVC syllables:
• The best stressed syllable has low or mid vowels.
• The best unstressed syllable has high or reduced vowels and coronal consonants.

Positional neutralization and augmentation for vowels.
Only positional neutralization for consonants.

References:

Anttila, A. (2008). Gradient phonotactics and the Complexity Hypothesis. To appear in Natural Language and Linguistic Theory.

Anttila, A. & Andrus, C. (2006). T-Orders. Ms., Stanford University.

Baayen, R. H., Piepenbrock, R., & Gulikers, L. (1995). The CELEX Lexical Database (Release 2). Philadelphia, PA: Linguistic Data Consortium, University of Pennsylvania [Distributor].

Berkley, D. (1994). Variability in Obligatory Contour Principle effects. CLS 30, pp. 1-12.

Coetzee, A., & Pater, J. (2008). Weighted constraints and gradient restrictions on place co-occurrence in Muna and Arabic. To appear in Natural Language and Linguistic Theory.

Weide, R. (1998). The CMU pronunciation dictionary (Release 0.6). Carnegie Mellon University. Available online at http://www.speech.cs.cmu.edu/cgi-bin/cmudict.

Constraints (significant regression factors):

OCP      Avoid homorganic C1 and C2
*x/a     Avoid unstressed low vowels
*X/I     Avoid stressed high vowels
*x/p_    Avoid labial/dorsal C1 in unstressed syllables
*x/_p    Avoid labial/dorsal C2 in unstressed syllables
*p_/I    Avoid labial/dorsal C1 + high vowel
*i/_p/   Avoid high vowel + labial/dorsal C2
Faith    Do not change input segments
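
The markedness constraints listed above translate directly into binary violation marks per syllable type. The following is a sketch under assumptions: syllable types are written as four-letter codes (onset place C/D/L, coda place C/D/L, vowel height H/L, stress S/U), a scheme extrapolated from the poster's single example LLHS, and the function name `violations` is hypothetical. Faith is left out because it evaluates input-output mappings rather than surface types.

```python
def violations(code):
    """Return 1 (violates) or 0 (does not violate) for each markedness constraint."""
    onset, coda, height, stress = code
    lab_or_dor = {"L", "D"}  # labial or dorsal place
    return {
        "OCP": int(onset == coda),                          # homorganic C1 and C2
        "*x/a": int(stress == "U" and height == "L"),       # unstressed low vowel
        "*X/I": int(stress == "S" and height == "H"),       # stressed high vowel
        "*x/p_": int(stress == "U" and onset in lab_or_dor),
        "*x/_p": int(stress == "U" and coda in lab_or_dor),
        "*p_/I": int(onset in lab_or_dor and height == "H"),
        "*i/_p/": int(height == "H" and coda in lab_or_dor),
    }

profile = violations("LLHS")  # labial-labial, high vowel, stressed
print(profile)
```

For LLHS the marks are OCP, *X/I, *p_/I, and *i/_p/; the three constraints that refer to unstressed syllables are satisfied.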

Graphic representation of implicational relationships in CELEX data.

[Figure. Y-axis: 0% to 100%. X-axis: Stressed, Unstressed. Legend: Low, High.]

Dependent variable:
• Log of the observed frequency.

Independent variables:
• Log of the expected frequency.
• Binary coded phonotactic factors: 1 = violates, 0 = does not violate.

CMU:

Factors                      Coefficient   Std. Error   t        p
Expected                     0.857         0.077        11.120   <0.001
OCP                          -0.961        0.204        -4.709   <0.001
Stressed/High                -1.708        0.254        -6.723   <0.001
Unstressed/Low               -1.169        0.257        -4.557   <0.001
Lab&Dor onset/Unstressed     -0.820        0.251        -3.268   <0.01
Lab&Dor coda/Unstressed      -0.737        0.250        -2.947   <0.01

CELEX:

Factors                      Coefficient   Std. Error   t        p
Expected                     0.856         0.088        9.689    <0.001
OCP                          -0.783        0.16         -4.599   <0.001
Stressed/High                -1.830        0.2          -4.151   <0.001
Unstressed/Low               -1.513        0.199        -7.591   <0.001
Lab&Dor onset/High           -0.660        0.196        -3.372   <0.01
Lab&Dor coda/High            -0.477        0.196        -2.43    <0.05
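
The regression setup (log observed frequency predicted from log expected frequency plus binary violation factors) can be sketched with ordinary least squares. This is an illustration under assumptions, not the authors' procedure: the intercept term, the natural logarithm, the function names, and the toy data are all hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_log_linear(observed, expected, factors):
    """OLS of log(observed) on an intercept, log(expected), and binary factor columns."""
    y = [math.log(o) for o in observed]
    rows = [[1.0, math.log(e), *map(float, f)] for e, f in zip(expected, factors)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical toy data (assumption): one binary factor that lowers log frequency by 1.
expected = [100.0, 200.0, 400.0, 800.0]
factors = [[0], [1], [0], [1]]
observed = [e * math.exp(-1.0 * f[0]) for e, f in zip(expected, factors)]

coef = fit_log_linear(observed, expected, factors)
print([round(c, 3) for c in coef])  # intercept, slope on log(expected), factor coefficient
```

A negative coefficient on a binary factor means that syllable types violating it are less frequent than their expected frequency alone would predict, which is how the negative coefficients in the tables above read.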

Consonant-vowel alignment violations (underrepresented):
a. Low vowels with coronals.
b. High vowels with labials or dorsals.

Prominence scales:
stressed > unstressed
low vowel > high vowel
labial/dorsal > coronal