TRANSCRIPT
Testing Functional Explanations of Word Order Universals
Michael Hahn (Stanford), Richard Futrell (UC Irvine)
(Greenberg 1963)
U3: ‘Languages with dominant VSO order are always prepositional.’
U4: ‘With overwhelmingly greater than chance frequency, languages with normal SOV order are postpositional.’
‘Relative position of adposition & noun ~ relative position of verb & object’
OV languages with postpositions
VO languages with prepositions
Source: https://wals.info/feature/95A
Why do these universals hold?
Innate constraints on language, ‘Universal Grammar’? (Chomsky 1981)
Facilitation of language processing? (Dryer 1992, Hawkins 1994)
Make languages learnable? (Culbertson 2017)
Approach: Test functional explanations by implementing efficiency measures, optimizing grammars, and checking whether universals hold in optimized grammars.
Three Efficiency Measures
Dependency Length Minimization (Rijkhoff, 1986; Hawkins, 1994, 2003; Gibson 1998)
Surprisal (Gildea and Jaeger, 2015; Ferrer-i Cancho, 2017)
Parsability (Hawkins, 1994, 2003)
Dependency Length Minimization: Dependencies are shorter than expected at random (Futrell et al., 2015)
[Figure: dependency length as a function of sentence length, comparing random orderings, real English, and the theoretical optimum]
Idea: In certain models, short dependencies reduce memory load (Gibson 1998)
Argued to explain several of the Greenberg correlations (Rijkhoff, 1986; Hawkins, 1994, 2003)
Example: three dependency arcs of lengths 2, 1, and 1 give a total dependency length of 2 + 1 + 1 = 4.
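The sum above can be sketched in code; the head-index representation below is an illustrative assumption, not the authors' implementation:

```python
def total_dependency_length(heads):
    """Sum of the lengths |dependent - head| of all dependency arcs.

    `heads` maps each word's 1-based position to the position of its
    head (0 marks the root, which has no incoming arc)."""
    return sum(abs(dep - head) for dep, head in heads.items() if head != 0)

# Three arcs of lengths 2, 1, and 1, as in the slide:
print(total_dependency_length({1: 3, 2: 3, 3: 0, 4: 3}))  # 4
```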
Three Efficiency Measures: Surprisal
Surprisal(w1...wn) = −Σi log P(wi | w1...wi-1)
[Figure: reading time as a function of surprisal; reading time increases with surprisal (Smith and Levy 2013)]
Estimated using recurrent neural networks, the strongest existing methods for estimating surprisal and predicting reading times (Frank 2011; Goodkind & Bicknell 2018).
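As a minimal sketch of the surprisal computation, the following uses an add-one-smoothed bigram model as a stand-in for the recurrent networks used in the actual work (the function and its smoothing are illustrative assumptions):

```python
import math
from collections import Counter

def bigram_surprisal(sentence, corpus, vocab_size):
    """Total surprisal -sum_i log2 P(w_i | w_{i-1}) under an
    add-one-smoothed bigram model."""
    unigrams = Counter()   # counts of each token as a left context
    bigrams = Counter()
    for sent in corpus:
        toks = ["<s>"] + sent
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    total = 0.0
    prev = "<s>"
    for w in sentence:
        p = (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)
        total += -math.log2(p)
        prev = w
    return total

corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
# A sentence with familiar bigrams is less surprising than a scrambled one:
print(bigram_surprisal(["the", "dog", "barks"], corpus, vocab_size=6))
print(bigram_surprisal(["dog", "the", "barks"], corpus, vocab_size=6))
```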
Three Efficiency Measures: Parsability
Example: Mary has two green books.
Parsability(utterance) := log P(tree | utterance)
Estimated using a neural network parser (Dozat and Manning 2017) with extremely generic architecture.
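A toy sketch of the parsability score, assuming an arc-factored parser that outputs a head distribution per word (an illustrative simplification; the real estimate comes from the neural parser cited above):

```python
import math

def parsability(gold_heads, head_probs):
    """log P(tree | utterance): the sum of the log-probabilities the
    parser assigns to each word's gold head.  `gold_heads` maps a
    word's position to its head's position; `head_probs[dep][head]`
    is the parser's probability for that arc."""
    return sum(math.log(head_probs[dep][head])
               for dep, head in gold_heads.items())

# A parser that is confident about the gold arcs scores higher
# (closer to 0) than one that guesses uniformly.
gold = {1: 2, 2: 0}  # word 1 attaches to word 2; word 2 is the root
confident = {1: {2: 0.9, 0: 0.1}, 2: {0: 0.9, 1: 0.1}}
uniform = {1: {2: 0.5, 0: 0.5}, 2: {0: 0.5, 1: 0.5}}
print(parsability(gold, confident) > parsability(gold, uniform))  # True
```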
Combining Parsability + Surprisal
Utility = Informativity − λ · Cost
Informativity (amount of meaning that can be extracted from the utterance) ~ Parsability
Cost of processing the utterance ~ Surprisal
λ can take values in (0,1); we will give similar weight to both factors (λ=0.9).
Formalizes Zipf’s (1949) Forces of Diversification & Unification.
Long tradition as an explanation of language (Gabelentz 1903, Zipf 1949, Horn 1984, …).
Formalized in Rational-Speech-Acts models (Frank and Goodman 2012).
Related to Signal Processing (Rate-Distortion Theory, Information Bottleneck).
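The combined objective then reduces to a one-liner; λ = 0.9 follows the talk, and the input values are illustrative:

```python
def utility(parsability, surprisal, lam=0.9):
    """Utility = Informativity - lambda * Cost, with informativity
    approximated by parsability and processing cost by surprisal
    (lambda = 0.9 as in the talk)."""
    return parsability - lam * surprisal

# E.g. a grammar with parsability -1.0 and surprisal 5.0:
print(utility(-1.0, 5.0))  # -5.5
```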
Testing Functional Explanations
Approach: Optimize the word orders of languages for the three objectives, keeping syntactic structures unchanged.
Languages have word order regularities ⇒ it is not sufficient to optimize the word orders of individual sentences.
Instead: optimize the word order rules of entire languages.
That is: optimized languages have optimized but internally consistent grammatical regularities in word order, and agree with an actual natural language in all other respects.
Dependency Corpus: Mary has two green books (arcs: nsubj, obj, nummod, amod)
Tree Topologies: the unordered dependency trees extracted from the corpus.
Ordering Grammar: one parameter per dependency relation, e.g.
  amod (NOUN ADJ): 0.3 — “Adjective precedes noun”
  nummod (NOUN NUM): 0.7 — “Numerals follow adjectives & precede nouns”
  nsubj (VERB NOUN): −0.2
  obj (VERB NOUN): 0.8 — “Object follows verb”
  ...
Counterfactual Corpus: linearizing the tree topologies with the ordering grammar yields a counterfactual ordering of each sentence.
Each parameter setting generates a different counterfactual corpus (e.g. parameter settings 0.9, 0.1, 0.5, 0.2 or 0.1, 0.95, 0.42, 0.82 order the same trees differently).
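A hypothetical linearizer illustrating how an ordering grammar turns tree topologies into a counterfactual corpus. The sign-based direction encoding and the weights below are illustrative assumptions, not the paper's exact parameterization:

```python
def linearize(tree, params):
    """Order an unordered dependency tree with an ordering grammar.

    Illustrative semantics: a dependent is placed after its head iff
    its relation's weight is positive, and same-side dependents are
    sorted by weight, so more-negative weights land farther left.
    `tree` maps a node id to (word, head_id, relation)."""
    children = {}
    root = None
    for node, (word, head, rel) in tree.items():
        if head == 0:
            root = node
        else:
            children.setdefault(head, []).append((node, rel))

    def order(node):
        deps = children.get(node, [])
        before = sorted((n for n, r in deps if params[r] <= 0),
                        key=lambda n: params[tree[n][2]])
        after = sorted((n for n, r in deps if params[r] > 0),
                       key=lambda n: params[tree[n][2]])
        out = []
        for c in before:
            out.extend(order(c))
        out.append(node)
        for c in after:
            out.extend(order(c))
        return out

    return [tree[n][0] for n in order(root)]

# Hypothetical weights: subject and modifiers precede their heads,
# the object follows the verb, numerals sit farther left than adjectives.
params = {"nsubj": -0.5, "obj": 0.8, "amod": -0.3, "nummod": -0.7}
tree = {1: ("Mary", 2, "nsubj"), 2: ("has", 0, "root"),
        3: ("two", 5, "nummod"), 4: ("green", 5, "amod"),
        5: ("books", 2, "obj")}
print(" ".join(linearize(tree, params)))  # Mary has two green books
```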
We compute the processing measures on each counterfactual corpus, e.g.:
Dependency Length 2.3, Surprisal 5.8, Parsability 1.8
Each parameter setting results in different values for the processing measures (e.g. 2.9 / 4.5 / 2.9, or 3.4 / 7.8 / 1.2).
Which settings optimise the measures?
Do the optimised settings replicate the Greenberg correlations?
For each objective, find the parameters that optimise it: Minimize Dependency Length, Minimize Surprisal, Maximize Parsability, and Optimize Parsability+Surprisal. Each objective yields its own optimized ordering grammar.
Repeat this for corpora from 51 real languages from the Universal Dependencies project.
1. How do the objectives compare?
2. Which universals are predicted?
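The optimization step can be sketched as simple hill climbing over grammar weights; the actual work uses stochastic optimization, and `optimize_grammar` and the toy objective below are illustrative assumptions:

```python
import random

def optimize_grammar(relations, objective, steps=200, seed=0):
    """Hill climbing over ordering-grammar weights (a simple stand-in
    for the stochastic optimization in the actual work).
    `objective(params)` scores a parameter setting, e.g. by generating
    a counterfactual corpus and computing negative dependency length,
    negative surprisal, or parsability."""
    rng = random.Random(seed)
    params = {r: rng.uniform(-1, 1) for r in relations}
    best = objective(params)
    for _ in range(steps):
        cand = {r: w + rng.gauss(0, 0.3) for r, w in params.items()}
        score = objective(cand)
        if score > best:
            params, best = cand, score
    return params, best

# Toy objective: prefer the object after the verb (weight near 1)
# and the subject before it (weight near -1).
def toy_objective(p):
    return -(p["obj"] - 1) ** 2 - (p["nsubj"] + 1) ** 2

best_params, best_score = optimize_grammar(["obj", "nsubj"], toy_objective)
print(best_params, best_score)
```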
Surprisal and Parsability minimize Dependency Length.
Functional Utility predicts Dependency Length Minimization.
Language optimizes Surprisal and Parsability.
[Figure: Better Parsability (x-axis) vs. Lower Surprisal (y-axis), z-transformed on the level of languages; plotted grammars: Random Grammars, Grammars fit to Real Orderings, Optimized for Surprisal, Optimized for Parsability, Optimized for Parsability+Surprisal]
(Dryer 1992, in Language)
‘Relative position of adposition & noun ~ relative position of verb & object’
We formalize the correlations in the Universal Dependencies format.
For any word order grammar, we can then check which correlations it satisfies.
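Checking a correlation against a grammar can be sketched as a sign test on the relevant relations' weights. The sign-based direction encoding and the `case`/`obj` mapping below are illustrative assumptions:

```python
def satisfies(params, universal):
    """Check one correlation universal on an ordering grammar.
    `universal` pairs two (relation, placed_after_head) conditions
    that must either both hold or both fail.  Illustrative encoding:
    weight > 0 places the dependent after its head."""
    (rel_a, after_a), (rel_b, after_b) = universal
    holds_a = (params[rel_a] > 0) == after_a
    holds_b = (params[rel_b] > 0) == after_b
    return holds_a == holds_b

# 'VO languages are prepositional': the object follows the verb iff
# the adposition (UD relation `case`) precedes the noun.
vo_prepositional = (("obj", True), ("case", False))
print(satisfies({"obj": 0.8, "case": -0.4}, vo_prepositional))  # True
print(satisfies({"obj": -0.3, "case": 0.5}, vo_prepositional))  # True (OV, postpositional)
print(satisfies({"obj": 0.8, "case": 0.5}, vo_prepositional))   # False
```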
Are the universals satisfied by models fit to the actual orderings for our 51 languages?
[Figure: percentage of fitted grammars satisfying each universal; annotated exceptions: prevalence of SVO (Dryer 1992), limitation of the formalisation]
Percentage of grammars optimized for each objective satisfying the universal.
Assessing Significance: X = “Object precedes verb”, Y = “Object-patterner precedes verb-patterner”
Logistic model: Y ~ X + (1+X|family) + (1+X|language)
Predictions largely complementary
Predictions mostly agree.
Functional Utility replicates the predictions of Dependency Length Minimization.
Both measures predict most of the correlation universals.
Two Objectives
Dependency Length Minimization: a particular component of complexity.
Utility: a broad description of functional efficiency in general.
Our results support the idea that Dependency Length Minimization emerges from optimizing for Parsability and Predictability (Futrell et al. 2017).
Why do these universals hold?
Innate constraints on language, ‘Universal Grammar’? (Chomsky 1981)
Facilitation of language processing? (Dryer 1992, Hawkins 1994)
Make languages learnable? (Culbertson 2017)
● These ideas need not be mutually exclusive
● If UG or learnability are relevant, our results suggest they may be tilted towards efficiency.
Conclusion
● Tested explanations of Greenberg correlation universals in terms of efficiency of human language processing
● Using corpora from 51 languages, constructed counterfactual optimized languages
● Most of the correlations can be derived from pressure to shorten dependencies, decrease surprisal, or increase parsability
● Clear evidence for functional explanations of word order universals