
A computational model of S-selection

Aaron Steven White (1,2) and Kyle Rawlins (1)

Semantics and Linguistic Theory 26
University of Texas, Austin
14th May, 2016

Johns Hopkins University
1 Department of Cognitive Science
2 Center for Language and Speech Processing
2 Science of Learning Institute

Slides available at aswhite.net

Introduction

Preliminary
Traditional distributional analyses have had tremendous success in helping us understand S(emantic)-selection.

S-selection
What type signatures does a predicate's denotation have?

Challenge
These analyses can be difficult to scale to an entire lexicon.


Goals

1. Demonstrate a combined experimental-computational method for scaling distributional analysis
2. Show that this method provides insight into general principles governing lexical semantic structure

Basic idea

1. Formalize S(emantic)-selection, projection rules, and lexical idiosyncrasy at Marr's (1982) computational level
2. Collect data on ∼1000 verbs' syntactic distributions
3. Given syntactic distribution data, use computational techniques to automate inference of projection rules and verbs' semantic types, controlling for lexical idiosyncrasy


Focus

Clause-embedding predicates (∼1000 in English)

Case study
Responsive predicates: take both interrogatives and declaratives

(1) John knows {that, whether} it's raining.

Importance
Deep literature on the S-selection properties of responsives: do they take questions, propositions, or both? (Karttunen 1977, Groenendijk & Stokhof 1984, Heim 1994, Ginzburg 1995, Lahiri 2002, George 2011, Rawlins 2013, Spector & Egré 2015, Uegaki 2015)


Outline

Introduction
Selection and clausal embedding
The MegaAttitude data set
Model fitting and results
Conclusions and future directions
Appendix

Selection and clausal embedding

Multiplicity

Many verbs are syntactically multiplicitous:

(2) a. John knows {that, whether} it's raining.
    b. John wants {it to rain, rain}.

Syntactic multiplicity does not imply semantic multiplicity:

(3) a. John knows [what the answer is]S.
    b. John knows [the answer]NP.

⟦(3b)⟧ = ⟦(3a)⟧ suggests it is possible for type(⟦NP⟧) = type(⟦S⟧).


Projection

What do the projection rules look like?
How are a verb's semantic type signatures projected onto its syntactic type signatures (subcategorization frames)? (Gruber 1965, Jackendoff 1972, Carter 1976, Grimshaw 1979, 1990, Chomsky 1981, Pesetsky 1982, 1991, Pinker 1984, 1989, Levin 1993)

[Diagram: a semantic type signature (written [_ Q] in Grimshaw's notation, or ⟨⟨⟨s,t⟩,t⟩,t⟩ in Montagovian notation) is projected onto syntactic type signatures such as [_ S] and [_ NP].]


A model of S-selection and projection

[Diagram: Semantic Type feeds, via Projection Rules, into Idealized Syntactic Distribution; Lexical Noise maps this onto Observed Syntactic Distribution; a Noise Model links that to the Acceptability Judgment Data.]

Lexical idiosyncrasy

Observed syntactic distributions are not a perfect reflection of semantic type + projection rules.

Example
Some Q(uestion)-selecting verbs allow concealed questions...

(4) a. Mary asked what time it was.
    b. Mary asked the time.

...others do not (Grimshaw 1979, Pesetsky 1982, 1991, Nathan 2006, Frana 2010, a.o.)

(5) a. Mary wondered what time it was.
    b. *Mary wondered the time.


Two kinds of lexical idiosyncrasy

Grimshaw (1979)
Verbs are related to semantic type signatures (S-selection) and syntactic type signatures (C-selection).

Pesetsky (1982, 1991)
Verbs are related to semantic type signatures (S-selection); C-selection is an epiphenomenon of verbs' abstract case.

Shared core
Lexical noise (idiosyncrasy) alters verbs' idealized syntactic distributions.



Specifying the model

Question
How do we represent each object in the model?

A minimalistic answer
Every object is a matrix of boolean values.

Strategy
1. Give the model in terms of sets and functions
2. Convert this model into a boolean matrix model



A boolean model of S-selection

know → {[_ P], [_ Q]}    think → {[_ P]}    wonder → {[_ Q]}

S =
            [_ P]   [_ Q]   ···
    think     1       0     ···
    know      1       1     ···
    wonder    0       1     ···
      ⋮       ⋮       ⋮     ⋱
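As a concrete rendering, here is a minimal sketch (illustrative Python with numpy; the verb and type labels are just the toy example above) of the S-selection matrix as a boolean array:

```python
import numpy as np

verbs = ["think", "know", "wonder"]
semtypes = ["[_ P]", "[_ Q]"]

# Rows are verbs, columns are semantic type signatures
S = np.array([
    [1, 0],   # think:  [_ P] only
    [1, 1],   # know:   both [_ P] and [_ Q]
    [0, 1],   # wonder: [_ Q] only
], dtype=bool)

print(S[verbs.index("know"), semtypes.index("[_ Q]")])  # True: know S-selects [_ Q]
```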


A boolean model of projection

[_ P] → {[_ that S], [_ NP], ...}    [_ Q] → {[_ whether S], [_ NP], ...}

Π =
            [_ that S]   [_ whether S]   [_ NP]   ···
    [_ P]        1             0            1     ···
    [_ Q]        0             1            1     ···
      ⋮          ⋮             ⋮            ⋮     ⋱
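The projection matrix can be sketched the same way (again illustrative Python, using the toy frames above):

```python
import numpy as np

semtypes = ["[_ P]", "[_ Q]"]
frames = ["[_ that S]", "[_ whether S]", "[_ NP]"]

# Rows are semantic type signatures, columns are syntactic type signatures
Pi = np.array([
    [1, 0, 1],   # [_ P] projects onto that-clauses and NPs
    [0, 1, 1],   # [_ Q] projects onto whether-clauses and NPs
], dtype=bool)
```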


A boolean model of idealized syntactic distribution

D̂(VERB, SYNTYPE) = ⋁_{t ∈ SEMTYPES} S(VERB, t) ∧ Π(t, SYNTYPE)

D̂(know, [_ that S]) = ⋁_{t ∈ {[_ P], [_ Q], ...}} S(know, t) ∧ Π(t, [_ that S])
D̂(wonder, [_ NP]) = ⋁_{t ∈ {[_ P], [_ Q], ...}} S(wonder, t) ∧ Π(t, [_ NP])

Probabilistic counterpart (derived below):

D̂(know, [_ that S]) = 1 − ∏_{t ∈ {[_ P], [_ Q], ...}} (1 − S(know, t) × Π(t, [_ that S]))

Boolean S and its probabilistic counterpart:

            [_ P]   [_ Q]   ···              [_ P]   [_ Q]   ···
    think     1       0     ···      think    0.94    0.03   ···
    know      1       1     ···      know     0.97    0.91   ···
    wonder    0       1     ···      wonder   0.17    0.93   ···

Boolean Π and its probabilistic counterpart:

            [_ that S]   [_ whether S]   [_ NP]   ···            [_ that S]   [_ whether S]   ···
    [_ P]        1             0            1     ···    [_ P]       0.99          0.12       ···
    [_ Q]        0             1            1     ···    [_ Q]       0.07          0.98       ···

Boolean D̂ and its probabilistic counterpart:

            [_ that S]   [_ whether S]   [_ NP]   ···             [_ that S]   [_ whether S]   ···
    think        1             0            1     ···     think       0.97          0.14       ···
    know         1             1            1     ···     know        0.95          0.99       ···
    wonder       0             1            1     ···     wonder      0.12          0.99       ···
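With 0/1 arrays, the disjunction of conjunctions above is exactly boolean matrix multiplication, and the probabilistic counterpart is a noisy-OR. A minimal numpy sketch, using the toy values from the matrices above:

```python
import numpy as np

S = np.array([[1, 0], [1, 1], [0, 1]], dtype=bool)     # think, know, wonder x [_ P], [_ Q]
Pi = np.array([[1, 0, 1], [0, 1, 1]], dtype=bool)      # [_ P], [_ Q] x that S, whether S, NP

# D-hat[v, f] = OR_t (S[v, t] AND Pi[t, f]): a boolean matrix product
D_hat = (S[:, :, None] & Pi[None, :, :]).any(axis=1)
print(D_hat.astype(int))
# [[1 0 1]
#  [1 1 1]
#  [0 1 1]]

# Probabilistic counterpart: wrap each cell in a probability and take a noisy-OR
P_S = np.array([[0.94, 0.03], [0.97, 0.91], [0.17, 0.93]])
P_Pi = np.array([[0.99, 0.12], [0.07, 0.98]])          # columns: that S, whether S
P_D_hat = 1 - np.prod(1 - P_S[:, :, None] * P_Pi[None, :, :], axis=1)
```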



A boolean model of observed syntactic distribution

∀t ∈ SYNTYPES : D(wonder, t) = D̂(wonder, t) ∧ N(wonder, t)

D̂ (idealized syntactic distribution):
            [_ that S]   [_ whether S]   [_ NP]   ···
    think        1             0            1     ···
    know         1             1            1     ···
    wonder       0             1            1     ···

N (lexical noise):
            [_ that S]   [_ whether S]   [_ NP]   ···
    think        1             1            1     ···
    know         1             1            1     ···
    wonder       1             1            0     ···

D = D̂ ∧ N (observed syntactic distribution):
            [_ that S]   [_ whether S]   [_ NP]   ···
    think        1             0            1     ···
    know         1             1            1     ···
    wonder       0             1            0     ···
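In matrix form the noise step is just an elementwise AND; a short illustrative numpy rendering of the three matrices above:

```python
import numpy as np

# think, know, wonder x that S, whether S, NP
D_hat = np.array([[1, 0, 1],
                  [1, 1, 1],
                  [0, 1, 1]], dtype=bool)   # idealized distribution
N = np.array([[1, 1, 1],
              [1, 1, 1],
              [1, 1, 0]], dtype=bool)       # lexical noise: wonder's [_ NP] cell is masked
D = D_hat & N                               # observed distribution
print(D.astype(int))                        # wonder loses [_ NP] (no concealed questions)
```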


Animating abstractions

Question
What is this model useful for?

Answer
In conjunction with modern computational techniques, this model allows us to scale distributional analysis to an entire lexicon.

Basic idea
Distributional analysis corresponds to reversing the arrows in the model.


The MegaAttitude data set

MegaAttitude materials

Ordinal (1-7 scale) acceptability ratings
for 1000 clause-embedding verbs
× 50 syntactic frames


Verb selection

[Figure]


Sentence construction

Challenge
Automate construction of a very large set of frames in a way that is sufficiently general to apply to many verbs.

Solution
Construct semantically bleached frames using indefinites.

(6) Examples of responsives
    a. know + NP V {that, whether} S
       Someone knew {that, whether} something happened.
    b. tell + NP V NP {that, whether} S
       Someone told someone {that, whether} something happened.

Frame construction

[Diagram: frames are assembled from a syntactic type ([_ NP], [_ PP], [_ S], [_ NP S], [_ NP PP], [_ PP S], built from NP, PP, and S) crossed with voice (ACTIVE, PASSIVE), complementizer (COMP: that; [+Q]: whether, which NP; for; ∅), and tense (TENSE: [+FIN]: -ed, would; [-FIN]: to, ∅, -ing). A sketch of this combinatorial space follows.]
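As a rough rendering of that combinatorial scheme (illustrative Python only; the feature names follow the diagram, and the deck's actual 50 frames are a curated subset rather than the full cross-product):

```python
from itertools import product

syn_types = ["NP", "PP", "S", "NP S", "NP PP", "PP S"]
voices = ["ACTIVE", "PASSIVE"]
comps = ["that", "whether", "which NP", "for", None]   # None = no complementizer
tenses = ["-ed", "would", "to", None, "-ing"]          # [+FIN]: -ed, would; [-FIN]: to, None, -ing

# Candidate frame specifications: complementizer and embedded tense
# only apply when the frame has a clausal argument
frames = [
    {"syntype": s, "voice": v, "comp": c, "tense": t}
    for s, v, c, t in product(syn_types, voices, comps, tenses)
    if "S" in s or (c is None and t is None)
]
print(len(frames), "candidate frame specifications")
```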



Data collection

• 1,000 verbs × 50 syntactic frames = 50,000 sentences
• 1,000 lists of 50 items each (a construction meeting these constraints is sketched below)
  • Each verb occurs only once per list
  • Each frame occurs only once per list
• 727 unique Mechanical Turk participants
  • Annotators allowed to do multiple lists, but never the same list twice
• 5 judgments per item
  • No annotator sees the same sentence more than once
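One simple way to build lists satisfying the verb-once and frame-once constraints is a cyclic Latin-square design over 20 blocks of 50 verbs. This is an illustrative reconstruction, not necessarily the authors' exact procedure:

```python
n_verbs, n_frames = 1000, 50
verbs = [f"verb{i}" for i in range(n_verbs)]     # placeholders for the 1,000 verbs
frames = [f"frame{j}" for j in range(n_frames)]  # placeholders for the 50 frames

lists = []
for block in range(n_verbs // n_frames):          # 20 blocks of 50 verbs each
    block_verbs = verbs[block * n_frames:(block + 1) * n_frames]
    for offset in range(n_frames):                # 50 cyclically shifted lists per block
        lists.append([(block_verbs[i], frames[(i + offset) % n_frames])
                      for i in range(n_frames)])

# 1,000 lists x 50 items = 50,000 sentences; each verb and each frame occurs
# exactly once per list, and each verb-frame pair lands in exactly one list
assert len(lists) == 1000 and all(len(lst) == 50 for lst in lists)
```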


Task

Turktools (Erlewine & Kotek 2015)

Validating the data

Interannotator agreement
Spearman rank correlation, calculated by list, on a pilot of 30 verbs

Pilot verb selection
Same verbs used by White (2015) and White et al. (2015), selected based on Hacquard & Wellwood's (2012) attitude verb classification

1. Linguist-to-linguist: median 0.70, 95% CI [0.62, 0.78]
2. Linguist-to-annotator: median 0.55, 95% CI [0.52, 0.58]
3. Annotator-to-annotator: median 0.56, 95% CI [0.53, 0.59]
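By-list agreement of this kind can be computed with scipy; a small sketch with simulated ratings (the data here are made up, not the MegaAttitude judgments):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
# Simulated 1-7 ratings from two annotators on one 50-item list
ratings_a = rng.integers(1, 8, size=50)
ratings_b = np.clip(ratings_a + rng.integers(-2, 3, size=50), 1, 7)

rho, _ = spearmanr(ratings_a, ratings_b)
print(f"Spearman rho for this list: {rho:.2f}")
```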

Results

[Figure: each verb's acceptability in the "NP V whether S" frame (y-axis, 1-7) plotted against its acceptability in the "NP V S" frame (x-axis, 1-7)]

Results

[Figure: the same acceptability plot ("NP V S" vs. "NP V whether S") with know, think, want, and wonder labeled]

Model fitting and results


Fitting the model

Goal
Find representations of verbs' semantic type signatures and projection rules that best explain the acceptability judgments.

Challenges

1. Infeasible to search over 2^(1000T) × 2^(50T) possible configurations (T = number of type signatures)
2. Finding the single best boolean model fails to capture the uncertainty inherent in judgment data


Fitting the model

Solution
Search probability distributions over verbs' semantic type signatures and projection rules.

Going probabilistic
Wrap boolean expressions in probability measures.



Wrapping with probabilities

Assuming the relevant events are independent (both between S and Π and across type signatures t):

P(S[VERB, t] ∧ Π[t, SYNTYPE]) = P(S[VERB, t]) P(Π[t, SYNTYPE] | S[VERB, t])
                              = P(S[VERB, t]) P(Π[t, SYNTYPE])

P(⋁_t S[VERB, t] ∧ Π[t, SYNTYPE]) = P(¬⋀_t ¬(S[VERB, t] ∧ Π[t, SYNTYPE]))
                                  = 1 − P(⋀_t ¬(S[VERB, t] ∧ Π[t, SYNTYPE]))
                                  = 1 − ∏_t P(¬(S[VERB, t] ∧ Π[t, SYNTYPE]))
                                  = 1 − ∏_t (1 − P(S[VERB, t] ∧ Π[t, SYNTYPE]))
                                  = 1 − ∏_t (1 − P(S[VERB, t]) P(Π[t, SYNTYPE]))
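A quick numerical sanity check of this derivation (illustrative Python; the probabilities are the toy values used earlier): draw independent Bernoulli S and Π cells and compare the Monte Carlo estimate of P(⋁_t S ∧ Π) with the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
p_S = np.array([0.97, 0.91])    # P(S[know, t]) for t in {[_ P], [_ Q]}
p_Pi = np.array([0.99, 0.07])   # P(Pi[t, [_ that S]])

closed_form = 1 - np.prod(1 - p_S * p_Pi)

n = 200_000
draws_S = rng.random((n, 2)) < p_S
draws_Pi = rng.random((n, 2)) < p_Pi
monte_carlo = (draws_S & draws_Pi).any(axis=1).mean()

print(closed_form, monte_carlo)  # agree up to sampling error
```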

Fitting the model

Algorithm
Projected gradient descent with adaptive gradient (Duchi et al. 2011); a toy sketch follows.

Remaining challenge
We don't know the number of type signatures T.

Standard solution
Fit the model with many different numbers of type signatures and compare using an information criterion, e.g., the Akaike Information Criterion (AIC).
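A minimal sketch of that fitting procedure, under simplifying assumptions: judgments binarized to acceptable/unacceptable with a Bernoulli likelihood (the actual model links the noisy-OR probabilities to the 1-7 ordinal ratings and includes the lexical- and annotator-noise components), gradients derived by hand, AdaGrad scaling, and projection onto [0, 1] by clipping.

```python
import numpy as np

rng = np.random.default_rng(0)
n_verbs, n_frames, T = 6, 4, 2          # toy sizes; the real data are 1000 x 50
Y = rng.integers(0, 2, (n_verbs, n_frames)).astype(float)  # toy binarized judgments

S = rng.uniform(0.2, 0.8, (n_verbs, T))    # P(S[v, t]), constrained to [0, 1]
Pi = rng.uniform(0.2, 0.8, (T, n_frames))  # P(Pi[t, f]), constrained to [0, 1]
gS2, gPi2 = np.zeros_like(S), np.zeros_like(Pi)            # AdaGrad accumulators
eps, lr = 1e-6, 0.5

for step in range(2000):
    terms = 1.0 - S[:, :, None] * Pi[None, :, :]           # shape (v, t, f)
    Q = np.prod(terms, axis=1)                             # P(not D-hat)
    P = np.clip(1.0 - Q, eps, 1.0 - eps)                   # noisy-OR prediction
    dP = -Y / P + (1.0 - Y) / (1.0 - P)                    # d(neg log lik)/dP
    # dP/dS[v,t] = Pi[t,f] * Q[v,f] / terms[v,t,f]; symmetrically for Pi
    safe = np.clip(terms, eps, None)
    gS = np.sum(dP[:, None, :] * Pi[None, :, :] * Q[:, None, :] / safe, axis=2)
    gPi = np.sum(dP[:, None, :] * S[:, :, None] * Q[:, None, :] / safe, axis=0)
    gS2 += gS ** 2
    gPi2 += gPi ** 2
    # AdaGrad step, then project back onto the [0, 1] box
    S = np.clip(S - lr * gS / np.sqrt(gS2 + eps), eps, 1 - eps)
    Pi = np.clip(Pi - lr * gPi / np.sqrt(gPi2 + eps), eps, 1 - eps)
```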


Akaike Information Criterion

High-level idea
Measures the information-theoretic "distance" to the true model from the best model with T type signatures (Akaike 1974)

Low-level idea (cf. Gelman et al. 2013)
For each datapoint...
1. ...remove that datapoint from the dataset
2. ...fit the model to the remaining data
3. ...predict the held-out datapoint

In the limit, the model with the lowest error on step 3 has the lowest AIC.
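In practice AIC is computed directly from the maximized log-likelihood and the parameter count, AIC = 2k − 2 log L̂, and compared across candidate values of T. A small sketch (the log-likelihoods below are made-up placeholders, not the paper's fitted values):

```python
def aic(log_likelihood, n_params):
    # AIC = 2k - 2 ln L-hat (Akaike 1974); lower is better
    return 2 * n_params - 2 * log_likelihood

n_verbs, n_frames = 1000, 50
# Hypothetical maximized log-likelihoods for models with T type signatures
fits = {10: -61200.0, 12: -58900.0, 14: -58800.0}

# One parameter per verb-type cell of S plus one per type-frame cell of Pi
scores = {T: aic(ll, n_params=n_verbs * T + T * n_frames) for T, ll in fits.items()}
best_T = min(scores, key=scores.get)
print(scores, "-> best T:", best_T)
```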


Fitting the model

Result
12 is the optimal number of type signatures according to AIC.

Reporting findings
Remainder of talk: best model with 12 type signatures.

Page 105: A computational model of S-selection · 14th May, 2016 Johns Hopkins University 1Department of Cognitive Science 2Center for Language and Speech Processing 2Science of Learning Institute

Findings

Three findings

1. Cognitive predicates
   1.1 Two distinct type signatures [ P] and [ Q]
   1.2 Coercion of [ P] to [ Q] and [ Q] to [ P]

2. Communicative predicates
   2.1 Two unified type signatures [ (Ent) P⊕Q] (optional recipient) and [ Ent P⊕Q] (obligatory recipient)

Findings

[ P] → [ that S]        [ Q] → [ whether S]

[ (Ent) P⊕Q] → [ to NP that S], [ to NP whether S]
[ Ent P⊕Q] → [ NP that S], [ NP whether S]

Hybrid types

Question
What do we mean by P⊕Q?

Example
Structures with potentially both informative and inquisitive content (Groenendijk & Roelofsen 2009, a.o.)

• S-selectional behavior of responsive predicates on some accounts (Uegaki 2012; Rawlins 2013)

• Some attitudes whose content is a hybrid Lewisian (1988) subject matter (Rawlins 2013 on think v. think about)
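
To make the hybrid idea slightly more concrete (the slide leaves the formalization implicit): in inquisitive semantics, a clause denotes a set of alternative propositions, its informative content is the union of those alternatives, and it is inquisitive just in case it contains more than one alternative. A P⊕Q denotation is then one that can be informative and inquisitive at the same time.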

Projection

[Figure: the full inventory of syntactic frames in the data, from NP Ved that S and NP Ved whether S through passives (NP was Ved that S), nonfinite frames (NP Ved to VP, NP Ved NP to VP), and S, I V]

Projection

[Figure: each frame's projection probability from the [ P] type signature (x-axis, 0–1) against the [ Q] type signature (y-axis, 0–1)]

Projection

[Figure: verbs plotted by their probability of s-selecting P (x-axis) against s-selecting Q (y-axis); labeled verbs include accept, acknowledge, admit, affirm, agree, announce, assume, attest, believe, decide, detect, expect, figure out, find out, guarantee, hope, swear, wish]

[Figure: the same axes with a second set of labeled verbs: accept, analyze, assume, brainstorm, clarify, contemplate, decide, detect, figure out, find out, miss, outline, query, question]

Projection

[Figure: each frame's projection probability from [ P] (x-axis, 0–1) against [ (Ent) P⊕Q] (y-axis, 0–1)]

S-selection

[Figure: verbs plotted by their probability of s-selecting P (x-axis) against s-selecting (Ent) P⊕Q (y-axis); labeled verbs include acknowledge, advertise, announce, babble, chat, claim, complain, confirm, deny, explain, fax, lie, repeat, reveal, say, share, signal, write]

Projection

[Figure: each frame's projection probability from [ Ent P] (x-axis, 0–1) against [ Ent P⊕Q] (y-axis, 0–1)]

S-selection

[Figure: verbs plotted by their probability of s-selecting Ent P (x-axis) against s-selecting Ent P⊕Q (y-axis); labeled verbs: advise, alert, ask, bet, email, fax, notify, remind, tell]

Discussion

What we conclude
Proposition and question types live alongside hybrid types, and the presence of a hybrid type correlates with communicativity

What we can exclude
Accounts that reduce (or unify) declarative and interrogative selection solely to S-selection of a single type + coercion

Methodological point
Coercion can have measurable effects

Conclusions and future directions

Conclusions

Goals

1. Demonstrate a combined experimental-computational method for scaling distributional analysis

2. Show that this method provides insight into general principles governing lexical semantic structure

Basic idea

1. Formalize S(emantic)-selection, projection rules, and lexical idiosyncrasy at Marr's (1982) computational level

2. Collect data on ∼1000 verbs' syntactic distributions

3. Given syntactic distribution data, use computational techniques to automate inference of projection rules and verbs' semantic types, controlling for lexical idiosyncrasy

Conclusions

Focus

Clause-embedding predicates (∼1000 in English)

Case study
Responsive predicates and the features that underlie their selectional behavior.

(7) John knows {that, whether} it’s raining.

By looking at such a large data set, we can discover the relevant s-selectional features, and get an angle on the problem at the scale of the entire lexicon.

Future directions

Further investigation of type signatures
Seven other type signatures that are also remarkably coherent

Example
Many nonfinite-taking verbs

Future directions

Atomic v. structured type signatures
Currently treating type signatures as atomic, but type signatures have rich structure

Example
Preliminary experiments with models that represent type structure suggest that our glosses for the types are correct

Future directions

Homophony v. regular polysemy v. underspecification
Patterns in how semantic type signatures distribute across verbs may reflect regular polysemy rules

Example
Preliminary experiments with a more elaborated model suggest responsive predicates display a regular polysemy (cf. George 2011)

Thanks

We are grateful to audiences at Johns Hopkins University for discussion of this work. We would like to thank Shevaun Lewis and Drew Reisinger in particular for useful comments on this talk.

This work was funded by NSF DDRIG-1456013 (Doctoral Dissertation Research: Learning attitude verb meanings), NSF INSPIRE BCS-1344269 (Gradient symbolic computation), and the JHU Science of Learning Institute.

Bibliography

Akaike, Hirotugu. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19(6). 716–723.

Carter, Richard. 1976. Some linking regularities. In On Linking: Papers by Richard Carter. Cambridge, MA: Center for Cognitive Science, MIT (Lexicon Project Working Papers No. 25).

Chomsky, Noam. 1981. Lectures on Government and Binding: The Pisa Lectures. Walter de Gruyter.

Duchi, John, Elad Hazan & Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research 12. 2121–2159.

Erlewine, Michael Yoshitaka & Hadas Kotek. 2015. A streamlined approach to online linguistic surveys. Natural Language & Linguistic Theory 1–15. doi:10.1007/s11049-015-9305-9. http://link.springer.com/article/10.1007/s11049-015-9305-9.

Frana, Ilaria. 2010. Concealed questions: In search of answers. University of Massachusetts at Amherst dissertation.

Gelman, Andrew, Jessica Hwang & Aki Vehtari. 2013. Understanding predictive information criteria for Bayesian models. Statistics and Computing 1–20.

George, Benjamin Ross. 2011. Question embedding and the semantics of answers. University of California Los Angeles dissertation.

Ginzburg, Jonathan. 1995. Resolving questions, II. Linguistics and Philosophy 18(6). 567–609.

Grimshaw, Jane. 1979. Complement selection and the lexicon. Linguistic Inquiry 10(2). 279–326.

Grimshaw, Jane. 1990. Argument structure. Cambridge, MA: MIT Press.

Groenendijk, Jeroen & Floris Roelofsen. 2009. Inquisitive semantics and pragmatics. Paper presented at the Stanford workshop on Language, Communication, and Rational Agency.

Groenendijk, Jeroen & Martin Stokhof. 1984. On the semantics of questions and the pragmatics of answers. Varieties of Formal Semantics 3. 143–170.

Gruber, Jeffrey Steven. 1965. Studies in lexical relations. Massachusetts Institute of Technology dissertation.

Hacquard, Valentine & Alexis Wellwood. 2012. Embedding epistemic modals in English: A corpus-based study. Semantics and Pragmatics 5(4). 1–29.

Heim, Irene. 1994. Interrogative semantics and Karttunen's semantics for know. In Proceedings of IATL, vol. 1, 128–144.

Jackendoff, Ray. 1972. Semantic interpretation in generative grammar. Cambridge, MA: MIT Press.

Karttunen, Lauri. 1977. Syntax and semantics of questions. Linguistics and Philosophy 1(1). 3–44.

Lahiri, Utpal. 2002. Questions and answers in embedded contexts. Oxford University Press.

Levin, Beth. 1993. English verb classes and alternations: A preliminary investigation. University of Chicago Press.

Lewis, David. 1988. Relevant implication. Theoria 54(3). 161–174.

Marr, David. 1982. Vision: A computational investigation into the human representation and processing of visual information. Henry Holt and Co.

Nathan, Lance Edward. 2006. On the interpretation of concealed questions. Massachusetts Institute of Technology dissertation.

Pesetsky, David. 1982. Paths and categories. MIT dissertation.

Pesetsky, David. 1991. Zero syntax: vol. 2: Infinitives.

Pinker, Steven. 1984. Language learnability and language development. Harvard University Press.

Pinker, Steven. 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, MA: MIT Press.

Rawlins, Kyle. 2013. About 'about'. In Semantics and Linguistic Theory, vol. 23, 336–357.

Spector, Benjamin & Paul Egre. 2015. A uniform semantics for embedded interrogatives: An answer, not necessarily the answer. Synthese 192(6). 1729–1784.

Uegaki, Wataru. 2012. Content nouns and the semantics of question-embedding predicates. In Ana Aguilar-Guevara, Anna Chernilovskaya & Rick Nouwen (eds.), Proceedings of SuB 16.

Uegaki, Wataru. 2015. Interpreting questions under attitudes. MIT dissertation.

White, Aaron Steven. 2015. Information and incrementality in syntactic bootstrapping. University of Maryland dissertation.

White, Aaron Steven, Valentine Hacquard & Jeffrey Lidz. 2015. Projecting attitudes.

Appendix

The response model

Two functions

1. Normalize participants' judgments so they are comparable
2. Control for lexicosyntactic noise

The response model

Why normalize judgments?
Necessary to control for differences in participants' use of the scale

[Figure: five participants' 1–7 rating scales, each used differently]

The response model

[Figure: participant-specific cutpoints mapping the latent acceptability scale (−3 to 3) onto the 1–7 response scale]
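
The cutpoint picture suggests one standard way to implement the normalization: an ordinal response model in which each participant carves a shared latent acceptability scale into their own 1–7 regions. A toy sketch, assuming such a cutpoint model (the slides do not spell out the exact response model):

```python
import numpy as np

def respond(latent_r, cutpoints):
    """Map a latent acceptability score to a 1-7 response: the category
    whose cutpoint interval contains latent_r."""
    return int(np.searchsorted(cutpoints, latent_r)) + 1

# Two hypothetical participants with different uses of the scale
lenient = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])  # spreads over the scale
strict = np.array([-0.5, 0.0, 0.5, 1.0, 1.5, 2.0])     # hugs the low end

for r in (-1.0, 0.2, 1.8):
    print(r, respond(r, lenient), respond(r, strict))  # same r, different ratings
```

Fitting the cutpoints jointly with the latent scores is what makes judgments comparable across participants.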

The response model

[Figure: judged acceptability of NP V S (x-axis) against NP V whether S (y-axis), first on the unbounded latent scale, then on the raw 1–7 scale]

The response model

[Figure: the same 1–7 plot with the verbs know, think, want, and wonder labeled]


The response model

[Figure: normalized acceptability of NP V S (x-axis) against NP V whether S (y-axis), both rescaled to 0–1]

Fitting the model

Subgoal
Find the optimal number T of type signatures

Goodness of T ↔ model's ability to...
...fit observed judgments
...predict unobserved judgments

• T too small → bad fit, bad prediction
• T too large → good fit, bad prediction

Measure
Akaike Information Criterion (AIC) trades off fit to observed data and prediction of unobserved data

Fitting the model

Number of type signatures

Low extreme
All verbs' syntactic distributions explained by a single rule

High extreme
# types ≥ # frames: every syntactic frame has a separate rule

Model comparison

[Figure: AIC as a function of the number of semantic type signatures (1–15); the curve reaches its minimum at 12]

[Figure: zoomed view of the AIC curve for 8–15 type signatures, confirming the minimum at 12]
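
The comparison behind these two figures amounts to a simple loop: refit the model for each candidate number of type signatures T and keep the T with the lowest AIC. A sketch with a hypothetical fit_model helper standing in for the actual fitting routine:

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 log L-hat."""
    return 2 * n_params - 2 * log_likelihood

def select_num_types(data, fit_model, candidate_Ts=range(1, 16)):
    """fit_model(data, T) is assumed to return (log_likelihood, n_params)."""
    scores = {T: aic(*fit_model(data, T)) for T in candidate_Ts}
    return min(scores, key=scores.get)  # 12 on the data reported here
```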