
Page 1: Syntax: Constituency and Dependency Structure

Syntax: Constituency and Dependency Structure

CS 490A, Fall 2021

https://people.cs.umass.edu/~brenocon/cs490a_f21/

Laure Thompson and Brendan O'Connor

College of Information and Computer Sciences

University of Massachusetts Amherst

Page 2: Syntax: Constituency and Dependency Structure

Announcements

• HW3 coming out shortly (show)
  • Due Friday, Oct 29
  • Work with POS tags, NER, dependency parsing

• Project proposals due tomorrow!
  • Question: how will they be graded?

• Discuss project ideas after class?
  • (outdoors please)


Page 3: Syntax: Constituency and Dependency Structure

• Syntax: how do words structurally combine to form sentences and meaning?

• Representations

• Constituents

• [the big dogs] chase cats

• [colorless green clouds] chase cats

• Dependencies

• The dog ← chased the cat.

• My dog, who's getting old, chased the cat.

• Idea of a grammar (G): global template for how sentences / utterances / phrases w are formed, via latent syntactic structure y

• Linguistics: what do G and P(w,y | G) look like?

• Generation: score with, or sample from, P(w, y | G)

• Parsing: predict P(y | w, G)


Page 4: Syntax: Constituency and Dependency Structure

Syntax for NLP

• If we could predict syntactic structure from raw text, that could help with...

• Language understanding: meaning formed from structure

• Grammar checking

• Preprocessing: Extract phrases and semantic relationships between words for features, viewing, etc.

• Provides a connection between the theory of generative linguistics and computational modeling of language


Page 5: Syntax: Constituency and Dependency Structure

Is language context-free?

• Regular language: repetition of repeated structures

• e.g. "base noun phrases": (Noun | Adj)* Noun

• subset of the JK pattern

• Context-free: hierarchical recursion

• Center-embedding: classic theoretical argument for CFG vs. regular languages

• (10.1)  The cat is fat.

• (10.2)  The cat that the dog chased is fat.

• (10.3)  *The cat that the dog is fat.

• (10.4)  The cat that the dog that the monkey kissed chased is fat.

• (10.5)  *The cat that the dog that the monkey chased is fat.

• Competence vs. Performance?

[Examples from Eisenstein (2017)]
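As a sketch of the regular-language claim above: the base-NP pattern (Noun | Adj)* Noun compiles directly into a regular expression over a POS-tag sequence. The tag names here are illustrative, and matching against a space-joined string of tags is just a convenience, not a standard API.

```python
import re

# A minimal sketch of the "base noun phrase" pattern (Noun | Adj)* Noun as a
# regular expression over a space-delimited POS-tag sequence.  The tag names
# (ADJ, NOUN, ...) are illustrative, not tied to a particular tagger.
BASE_NP = re.compile(r"\b(?:(?:ADJ|NOUN) )*NOUN\b")

tags = "DET ADJ ADJ NOUN VERB DET NOUN"
print([m.group(0) for m in BASE_NP.finditer(tags)])
# -> ['ADJ ADJ NOUN', 'NOUN']
```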

Page 6: Syntax: Constituency and Dependency Structure

Hierarchical view of syntax

• “a Sentence made of Noun Phrase followed by a Verb Phrase”


!"#$%&'$!(") *+#+,%'- (" '(./,%,$!0+ #1"$,2 -,#.+$ $-!# '-,33+"4+ 51 #++6!"4 $( '-,%,'$+%!7+-&.," 3,"4&,4+# !" $+%.# (8 &"!0+%#,3 #1"$,'$!'/%(/+%$!+#9 :-!'- .,1 %+83+'$ $-+ '-!3;<# !"",$+6"(:3+;4+9 ,"; "("=&"!0+%#,3 '3&#$+%# (8 #1"$,'$!'/%(/+%$!+# $-,$ /,$$+%" $(4+$-+% ,'%(## 3,"4&,4+#9,"; -+"'+ .,1 5+ 3+,%"+; ,# , 4%(&/) >-&#9 $-+#$&;1 (8 '(./,%,$!0+ #1"$,2 ,"; $-+ #$&;1 (8 3,"=4&,4+ 3+,%"!"4 ,%+ '3(#+31 %+3,$+;) ?!"" !""#$%"%&&#"' (")*%+&#, -+#..#+@

!"#$%&'#(%)* +! *,#(%-(.-(/'+0,

$1234565 .7817169

A3.(#$ ,33 ,''(&"$# (8 $-+ ;!#'%+$+ !"8!"!$1 /%(/+%$1(8 ",$&%,3 3,"4&,4+ #1"$,2 #$,%$ 8%(. $-+ "($!("$-,$ #+"$+"'+# '("#!#$ (8 .(%+ $-," B&#$ #+C&+"'+#(8 :(%;#) D" $-+ .!";# (8 #/+,6+%# ,"; 3!#$+"+%#9#+"$+"'+# ,%+ -!+%,%'-!',331 #$%&'$&%+; %+/%+#+"$,=$!("#9 !" :-!'- :(%;# ,%+ 4%(&/+; $(4+$-+% $( 8(%./-%,#+#9 :-!'- !" $&%" '(.5!"+ $( 8(%. 3,%4+%/-%,#+#) E(% +2,./3+9 , .!"!.,3 #+"$+"'+ (8 F"4=3!#-9 #&'- ,# GH(-" ,%%!0+;<9 '("$,!"# , #&5B+'$ ,"; ,/%+;!',$+9 5&$ $-+ %(3+# (8 #&5B+'$ ,"; /%+;!',$+.,1 5+ %+/3,'+; 51 /-%,#+# (8 ,%5!$%,%1 '(./3+2=!$1) I1 %+/%+#+"$!"4 /(##!53+ #&5B+'$# ,"; /%+;!=',$+# ,# #$%# &'()*"* ?JK#@ ,"; +"(, &'()*"* ?LK#@%+#/+'$!0+319 $-+ #$%&'$&%+ (8 .,"1 /(##!53+ #+"=$+"'+# ?M@ '," 5+ ',/$&%+;) >-!# 5,#!' G$+./3,$+<8(% #+"$+"'+# (8 F"43!#- '," 5+ +2/%+##+; ,# , $%++#$%&'$&%+9 ,# !" ?N,@9 (% ,# , /-%,#+ #$%&'$&%+ %&3+9 ,#!" ?N5@)

Sa.

b. S NP VP

NP VPJohnthe manthe elderly janitor

arrivedate an applelooked at his watch

(1)

{ {{ {H&#$ ,# %&3+# 3!6+ M ! JK LK /%(0!;+ $+./3,$+#

8(% #+"$+"'+#9 $+./3,$+# '," ,3#( 5+ #/+'!8!+; 8(%$-+ !"$+%",3 #$%&'$&%+ (8 "(&" /-%,#+#9 0+%5/-%,#+#9 ,"; .,"1 ($-+% /-%,#+=$1/+#) F0+" ,#.,33 "&.5+% (8 /-%,#+ #$%&'$&%+ %&3+# ,"; ,#.,33 3+2!'(" '," 4+"+%,$+ 3,%4+ "&.5+%# (8 #+"=$+"'+#) O!$- ("31 $-+ 8!0+ /-%,#+ #$%&'$&%+ %&3+# !"?P@ ,"; , QR=:(%; 3+2!'(" ?'("#!#$!"4 (8 NR "(&"#9NR ;+$+%.!"+%#9 ,"; NR 0+%5#@ NPP9NRR ;!88+%+"$ #+"=$+"'+# '," 5+ 4+"+%,$+;)

M ! JK LK

LK ! LJK

LK ! L

JK ! S+$ JK

JK ! J "P#

*&3+# $-,$ ,33(: , /-%,#+ $( 5+ +.5+;;+; !"#!;+,"($-+% /-%,#+ (8 $-+ #,.+ $1/+ ,%+ 6"(:" ,# ("-%(.*/+" %&3+#) T((%;!",$!(" ?Q@9 .(;!8!',$!(" ?U@9 ,";#+"$+"$!,3 '(./3+.+"$,$!(" ?V@ ,33 !"0(30+ %+'&%=#!(") >-+1 '," $-&# 5+ !"0(6+; ,%5!$%,%!31 .,"1$!.+# !" , #!"43+ #+"$+"'+) M&'- %&3+# !"'%+,#+ $-++2/%+##!0+ /(:+% (8 $-+ 4%,..,% 8%(. .+%+31 0,#$$( '3+,%31 !"8!"!$+) >-+%+ ,%+ (50!(&# /%,'$!',3 3!.!=$,$!("# (" $-+ 3+"4$- ,"; '(./3+2!$1 (8 ",$&%,331(''&%%!"4 #+"$+"'+#9 5&$ #&'- 3!.!$,$!("# ,%+ $1/!'=,331 ,$$%!5&$+; $( !";+/+";+"$ 3!.!$,$!("# (" ,$$+"=$!(" ,"; .+.(%1)

JK ! JK T("B JK

LK ! LK T("B LK

T("B ! )#0 "Q#

LK ! LK KK

JK ! JK KK "U#

LK ! L M$

M$ ! T(./ M

T(./ ! 1')1 "V#

A3$-(&4- $-+ %&3+# 3!#$+; !" ?N@W?V@ 8,33 8,% #-(%$(8 $-+ +2/%+##!0+ /(:+% (8 F"43!#-9 +0+" $-!##.,33 8%,4.+"$ #-(:# -(: ",$&%,3 3,"4&,4+#1"$,2 &#+# 8!"!$+ .+,"# $( 4+"+%,$+ !"8!"!$+31.,"1 #+"$+"'+#) ?!"" /0+#&% 1$+23$2+% #"' 456#+70%8+9@

&:61;<617= *64>36>452? -:72616>5739

>-+ #1"$,'$!'!,"<# $((35(2 !"'3&;+# , "&.5+% (8#$%&'$&%,3 $+#$# $-,$ '," 5+ &#+; ,# ,!;# !" ;!,4"(#=!"4 #+"$+"'+ #$%&'$&%+#X 8(% +2,./3+9 -$#*1/1%"#1* (8#+"$+"'+# '," 4+"+%,331 5+ '("B&"'$# !" '((%;!",$+#$%&'$&%+#9 ,# !# #-(:" 8(% JK# ,"; LK# !" ?Y,9 5@)Z$-+% $+#$# $-,$ #-(: $-+ '("#$!$&+"'1 (8 LK# !"='3&;+ #&5#$!$&$!(" (8 $-+ +2/%+##!(" G;( #(< 8(% , LK?[,@9 ,"; 8%("$!"4 (8 $-+ LK $( , '3,&#+=!"!$!,3 /(#=!$!(" ?[5@)

,! O,33,'+ 8+$'-+;%JK $-+ '-++#+& ,"; %JK $-+'%,'6+%#&

5! O,33,'+%LK #3!'+; $-+ '-++#+& ,"; %LK (/+"+;$-+ '%,'6+%#& "Y#

QPR !2#1)3

[From Phillips (2003)]

Page 7: Syntax: Constituency and Dependency Structure

Context-free grammars (CFG)

• A CFG is a 4-tuple:



• pushdown automata define context-free languages;

• Turing machines define recursively-enumerable languages.

In the Chomsky hierarchy, context-free languages (CFLs) are a strict generalization of regular languages.

regular languages        context-free languages
regular expressions      context-free grammars (CFGs)
finite-state machines    pushdown automata
paths                    derivations

Context-free grammars define CFLs. They are sets of permissible productions which allow you to derive strings composed of surface symbols. An important feature of CFGs is recursion, in which a nonterminal can be derived from itself.

More formally, a CFG is a tuple ⟨N, Σ, R, S⟩:

  N   a set of non-terminals
  Σ   a set of terminals (distinct from N)
  R   a set of productions, each of the form A → β, where A ∈ N and β ∈ (Σ ∪ N)*
  S   a designated start symbol

Context-free grammars provide rules for generating strings.

• The left-hand side (LHS) of each production is a non-terminal ∈ N

• The right-hand side (RHS) of each production is a sequence of terminals or non-terminals, {n, σ}*, n ∈ N, σ ∈ Σ.

A derivation t is a sequence of steps from S to a surface string w ∈ Σ*, which is the yield of the derivation. A derivation can be viewed as a tree or as a bracketing, as shown in Figure 11.4.

If there is some derivation t in grammar G such that w is the yield of t, then w is in the language defined by the grammar. Equivalently, for grammar G, we can write that |T_G(w)| ≥ 1. When there are multiple derivations of w in grammar G, this is a case of derivational ambiguity; if any such w exists, then we can say that the grammar itself is ambiguous.


• Derivation: a sequence of rewrite steps from S to a string (sequence of terminals, i.e. words)

• Yield: the final string (sentence)

• The parse tree or constituency tree corresponds to the rewrite steps that were used to derive the string

• A CFG is a “boolean language model”

• A probabilistic CFG is a probabilistic language model:

• Every production rule has a probability; defines prob dist. over strings.

Example: see handout!
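The worksheet's grammar isn't reproduced in this transcript, so here is a minimal sketch (with a made-up grammar) of what "a sequence of rewrite steps from S" looks like operationally: each step replaces one non-terminal by a sampled right-hand side, and the terminals left over are the yield. With the probabilities attached, this is exactly sampling from a PCFG language model.

```python
import random

# A minimal sketch of a derivation as code: repeatedly rewrite non-terminals
# until only terminals remain.  With rule probabilities this samples from a
# PCFG; the grammar below is made up (it is not the worksheet's grammar).
RULES = {
    "S":   [(("NP", "VP"), 1.0)],
    "NP":  [(("Det", "N"), 0.7), (("NP", "PP"), 0.3)],
    "VP":  [(("V", "NP"), 0.8), (("VP", "PP"), 0.2)],
    "PP":  [(("P", "NP"), 1.0)],
    "Det": [(("the",), 1.0)],
    "N":   [(("dog",), 0.5), (("cat",), 0.5)],
    "V":   [(("chased",), 1.0)],
    "P":   [(("near",), 1.0)],
}

def derive(symbol):
    """Rewrite `symbol` until only terminals remain; returns the yield."""
    if symbol not in RULES:                       # terminal: part of the yield
        return [symbol]
    rhss, probs = zip(*RULES[symbol])
    rhs = random.choices(rhss, weights=probs)[0]  # sample one rewrite step
    return [word for child in rhs for word in derive(child)]

print(" ".join(derive("S")))  # e.g. "the dog chased the cat near the dog"
```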

Page 8: Syntax: Constituency and Dependency Structure

• Example: derivation from worksheet's grammar


Page 9: Syntax: Constituency and Dependency Structure

Example

• All useful grammars are ambiguous: multiple derivations with same yield

• [Parse tree representations: Nested parens or non-terminal spans]


[Figure 10.1 from Eisenstein (2017): two parse trees for "She eats sushi with chopsticks", one attaching the PP "with chopsticks" inside the object NP, the other attaching it to the VP. As bracketings:]

(S (NP (PRP She)) (VP (VBZ eats) (NP (NP (NN sushi)) (PP (IN with) (NP (NNS chopsticks))))))

(S (NP (PRP She)) (VP (VBZ eats) (NP (NN sushi)) (PP (IN with) (NP (NNS chopsticks)))))

Figure 10.1: Two derivations of the same sentence, shown as both parse trees and bracketings


[Examples from Eisenstein (2017)]
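As a quick check of the ambiguity claim, here is a sketch using NLTK (one convenient CFG implementation; the slides don't prescribe a toolkit). A toy grammar in the spirit of Figure 10.1 yields exactly the two bracketings shown above.

```python
import nltk

# A toy grammar that licenses both PP attachments for the Figure 10.1 sentence.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> PRP | NN | NP PP
    VP -> VBZ NP | VP PP
    PP -> IN NP
    PRP -> 'She'
    VBZ -> 'eats'
    NN -> 'sushi' | 'chopsticks'
    IN -> 'with'
""")
parser = nltk.ChartParser(grammar)
for tree in parser.parse("She eats sushi with chopsticks".split()):
    print(tree)   # two trees: PP attached inside the NP vs. to the VP
```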

Page 10: Syntax: Constituency and Dependency Structure

Is language context-free?

• CFGs nicely explain nesting and agreement (if you stuff grammatical features into the non-terminals)

• The processor has 10 million times fewer transistors on it than today's typical microprocessors, runs much more slowly, and operates at five times the voltage...

• S → NN VP
  VP → VP3S | VPN3S | ...
  VP3S → VP3S, VP3S, and VP3S | VBZ | VBZ NP | ...

[Examples from Eisenstein (2017)]

Page 11: Syntax: Constituency and Dependency Structure

• Ambiguities in syntax



Time: the time complexity is O(M³ |R|). At each cell, we search over O(M) split points and |R| productions, where |R| is the number of production rules in the grammar.

Notice that these are considerably worse than the finite-state algorithms of Viterbi and forward-backward, which are linear time; generic shortest-path for finite-state automata has complexity O(M log M). As usual, these are worst-case asymptotic complexities. But in practice, things can be worse than worst-case! (See Figure 11.2.) This is because longer sentences tend to "unlock" more of the grammar — they involve non-terminals that do not appear in shorter sentences.

Figure 11.2: Figure from Dan Klein’s lecture slides

11.2 Ambiguity in parsing

In many applications, we don't just want to know whether a sentence is grammatical, we want to know what structure is the best analysis. Unfortunately, syntactic ambiguity is endemic to natural language:²

Attachment ambiguity: we eat sushi with chopsticks, I shot an elephant in my pajamas.

Modifier scope: southern food store

Particle versus preposition: The puppy tore up the staircase.

Complement structure: The tourists objected to the guide that they couldn't hear.

Coordination scope: "I see," said the blind man, as he picked up the hammer and saw.

Multiple gap constructions: The chicken is ready to eat

These forms of ambiguity can combine, so that a seemingly simple sentence like Fed raises interest rates can have dozens of possible analyses, even in a minimal grammar. Real-size broad coverage grammars permit millions of parses of typical sentences. Faced with this ambiguity, classical parsers faced a tradeoff:

²Examples borrowed from Dan Klein's slides

[Examples from Eisenstein (2017)]

Page 12: Syntax: Constituency and Dependency Structure

Probabilistic CFGs


• Defines a probabilistic generative process for words in a sentence

• Can parse with a modified form of CKY

• How to learn? Fully supervised with a treebank... unsupervised learning possible too, but doesn't give great results...
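As a sketch of what "a modified form of CKY" amounts to: the probabilistic (Viterbi) variant fills a chart of best log-probabilities per span and non-terminal, then reads the best tree off backpointers. The toy CNF grammar and its probabilities below are made up for illustration; the function names are mine.

```python
import math
from collections import defaultdict

# A minimal probabilistic CKY (Viterbi) sketch for a PCFG in Chomsky normal
# form.  The toy grammar and all probabilities are invented for illustration.
LEXICAL = {("VBZ", "eats"): 1.0, ("IN", "with"): 1.0,          # A -> word
           ("NP", "she"): 0.3, ("NP", "sushi"): 0.3, ("NP", "chopsticks"): 0.3}
BINARY = [("S", "NP", "VP", 1.0),                              # A -> B C
          ("VP", "VBZ", "NP", 0.6), ("VP", "VP", "PP", 0.4),
          ("NP", "NP", "PP", 0.1), ("PP", "IN", "NP", 1.0)]

def cky(words):
    best = defaultdict(lambda: float("-inf"))   # (i, j, A) -> best log-prob
    back = {}                                   # backpointers for the argmax
    for i, w in enumerate(words):
        for (A, word), p in LEXICAL.items():
            if word == w:
                best[i, i + 1, A] = math.log(p)
    n = len(words)
    for width in range(2, n + 1):               # widest spans last
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):           # split point
                for A, B, C, p in BINARY:
                    s = math.log(p) + best[i, k, B] + best[k, j, C]
                    if s > best[i, j, A]:
                        best[i, j, A], back[i, j, A] = s, (k, B, C)
    return best, back

def tree(back, words, i, j, A):
    if (i, j, A) not in back:                   # lexical cell
        return f"({A} {words[i]})"
    k, B, C = back[i, j, A]
    return f"({A} {tree(back, words, i, k, B)} {tree(back, words, k, j, C)})"

words = "she eats sushi with chopsticks".split()
best, back = cky(words)
print(math.exp(best[0, len(words), "S"]))       # probability of the best parse
print(tree(back, words, 0, len(words), "S"))    # its bracketing (VP attachment)
```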


S → NP VP [.80]
S → Aux NP VP [.15]
S → VP [.05]
NP → Pronoun [.35]
NP → Proper-Noun [.30]
NP → Det Nominal [.20]
NP → Nominal [.15]
Nominal → Noun [.75]
Nominal → Nominal Noun [.20]
Nominal → Nominal PP [.05]
VP → Verb [.35]
VP → Verb NP [.20]
VP → Verb NP PP [.10]
VP → Verb PP [.15]
VP → Verb NP NP [.05]
VP → VP PP [.15]
PP → Preposition NP [1.0]

Det → that [.10] | a [.30] | the [.60]
Noun → book [.10] | flight [.30] | meal [.15] | money [.05] | flights [.40] | dinner [.10]
Verb → book [.30] | include [.30] | prefer [.40]
Pronoun → I [.40] | she [.05] | me [.15] | you [.40]
Proper-Noun → Houston [.60] | TWA [.40]
Aux → does [.60] | can [.40]
Preposition → from [.30] | to [.30] | on [.20] | near [.15] | through [.05]

Figure 14.1: A PCFG which is a probabilistic augmentation of the L1 miniature English CFG grammar and lexicon of Fig. ?? in Ch. 13. These probabilities were made up for pedagogical purposes and are not based on a corpus (since any real corpus would have many more rules, and so the true probabilities of each rule would be much smaller).

…or as P(RHS | LHS). Thus if we consider all the possible expansions of a non-terminal, the sum of their probabilities must be 1:

∑_β P(A → β) = 1

Fig. 14.1 shows a PCFG: a probabilistic augmentation of the L1 miniature English CFG grammar and lexicon. Note that the probabilities of all of the expansions of each non-terminal sum to 1. Also note that these probabilities were made up for pedagogical purposes. In any real grammar there are a great many more rules for each non-terminal and hence the probabilities of any particular rule would tend to be much smaller.

A PCFG is said to be consistent if the sum of the probabilities of all sentences in the language equals 1. Certain kinds of recursive rules cause a grammar to be inconsistent by causing infinitely looping derivations for some sentences. For example a rule S → S with probability 1 would lead to lost probability mass due to derivations that never terminate. See Booth and Thompson (1973) for more details on consistent and inconsistent grammars.

How are PCFGs used? A PCFG can be used to estimate a number of useful probabilities concerning a sentence and its parse tree(s), including the probability of a par…

[J&M textbook]

Page 13: Syntax: Constituency and Dependency Structure


( (S (NP-SBJ (NNP General) (NNP Electric) (NNP Co.) ) (VP (VBD said) (SBAR (-NONE- 0) (S (NP-SBJ (PRP it) ) (VP (VBD signed) (NP (NP (DT a) (NN contract) ) (PP (-NONE- *ICH*-3) )) (PP (IN with) (NP (NP (DT the) (NNS developers) ) (PP (IN of) (NP (DT the) (NNP Ocean) (NNP State) (NNP Power) (NN project) )))) (PP-3 (IN for) (NP (NP (DT the) (JJ second) (NN phase) ) (PP (IN of) (NP (NP (DT an) (JJ independent) (ADJP (QP ($ $) (CD 400) (CD million) ) (-NONE- *U*) ) (NN power) (NN plant) ) (, ,) (SBAR (WHNP-2 (WDT which) ) (S (NP-SBJ-1 (-NONE- *T*-2) ) (VP (VBZ is) (VP (VBG being) (VP (VBN built) (NP (-NONE- *-1) ) (PP-LOC (IN in) (NP (NP (NNP Burrillville) ) (, ,) (NP (NNP R.I) ))))))))))))))))

Penn Treebank

Page 14: Syntax: Constituency and Dependency Structure

• A dependency parse is a tree or directed graph among the words in the sentence

• Directly encodes word-to-word grammatical relationships. Each edge is between:

• Head (or governor), to the
• Dependent (or child, or modifier)

• [Example]


Page 15: Syntax: Constituency and Dependency Structure

• There isn't really a generative grammar view of dependencies
  • They're more of a descriptive formalism

• Dependency parsers are often convenient to use for downstream applications
  • Subtree = a phrase
  • "Dependency bigrams": (parent, child) pairs in the tree
  • Do these correspond to phrases?

• Dependency treebanks are available for many languages (https://universaldependencies.org/), and therefore parsers are widely available.
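For a sense of how convenient this is in practice, here is a sketch with spaCy, one widely used parser (an implementation choice, not something the slides prescribe; it assumes `pip install spacy` and `python -m spacy download en_core_web_sm` have been run, and the dependency labels follow that model's own scheme).

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("My dog, who's getting old, chased the cat.")
for token in doc:
    # each token has exactly one head; the root token's head is itself
    print(f"{token.text:10} <-{token.dep_:10}- {token.head.text}")

# a subtree behaves like a phrase
root = next(t for t in doc if t.dep_ == "ROOT")
print([t.text for t in root.subtree])   # the root's whole phrase, in order
```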


Page 16: Syntax: Constituency and Dependency Structure

• Theories of grammar postulate that every phrase has a head word, which contains or typifies the grammatical content of the phrase


From Constituents to Dependencies

Page 17: Syntax: Constituency and Dependency Structure

From Constituents to Dependencies



[Figure 11.1a, lexicalized constituency parse, as a bracketing:]

(S(scratch) (NP(cats) (DT The) (NNS cats)) (VP(scratch) (VB scratch) (NP(people) (NNS people)) (PP(with) (IN with) (NP(claws) (NNS claws)))))

[Figure 11.1b, unlabeled dependency tree over the words:] The cats scratch people with claws

Figure 11.1: Dependency grammar is closely linked to lexicalized context free grammars: each lexical head has a dependency path to every other word in the constituent. (This example is based on the lexicalization rules from § 10.5.2, which make the preposition the head of a prepositional phrase. In the more contemporary Universal Dependencies annotations, the head of with claws would be claws, so there would be an edge scratch → claws.)

…occupies the central position for the noun phrase, with the word the playing a supporting role.

The relationships between words in a sentence can be formalized in a directed graph, based on the lexicalized phrase-structure parse: create an edge (i, j) iff word i is the head of a phrase whose child is a phrase headed by word j. Thus, in our example, we would have scratch → cats and cats → the. We would not have the edge scratch → the, because although S(scratch) dominates DET(the) in the phrase-structure parse tree, it is not its immediate parent. These edges describe syntactic dependencies, a bilexical relationship between a head and a dependent, which is at the heart of dependency grammar.

Continuing to build out this dependency graph, we will eventually reach every word in the sentence, as shown in Figure 11.1b. In this graph — and in all graphs constructed in this way — every word has exactly one incoming edge, except for the root word, which is indicated by a special incoming arrow from above. Furthermore, the graph is weakly connected: if the directed edges were replaced with undirected edges, there would be a path between all pairs of nodes. From these properties, it can be shown that there are no cycles in the graph (or else at least one node would have to have more than one incoming edge), and therefore, the graph is a tree. Because the graph includes all vertices, it is a spanning tree.

11.1.1 Heads and dependents

A dependency edge implies an asymmetric syntactic relationship between the head and dependent words, sometimes called modifiers. For a pair like the cats or cats scratch, how…


[Eisenstein (2017)]
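The edge-reading rule in this excerpt is easy to state as code. Below is a sketch using a hypothetical minimal tree encoding (each node is a pair of its head word and its list of children); applied to the Figure 11.1 tree, it produces exactly the edges the text lists.

```python
# A sketch of the edge-reading rule: given a lexicalized tree, add an edge
# head(parent) -> head(child) whenever the two head words differ.  The tree
# encoding here is a made-up minimal one: (head word, list of children).
def edges(node):
    head, children = node
    out = []
    for child in children:
        if child[0] != head:              # child headed by a different word
            out.append((head, child[0]))
        out.extend(edges(child))          # recurse into the subtree
    return out

# "The cats scratch people with claws", lexicalized as in Figure 11.1a
tree = ("scratch", [
    ("cats", [("the", []), ("cats", [])]),
    ("scratch", [("scratch", []),
                 ("people", [("people", [])]),
                 ("with", [("with", []), ("claws", [("claws", [])])])]),
])
print(edges(tree))
# [('scratch', 'cats'), ('cats', 'the'), ('scratch', 'people'),
#  ('scratch', 'with'), ('with', 'claws')]
```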

Page 18: Syntax: Constituency and Dependency Structure

• stopped here 10/19


Page 19: Syntax: Constituency and Dependency Structure


Adding Headwords to Trees

(S (NP (DT the) (NN lawyer)) (VP (Vt questioned) (NP (DT the) (NN witness))))

⇒ with headwords added:

(S(questioned)
  (NP(lawyer) (DT(the) the) (NN(lawyer) lawyer))
  (VP(questioned) (Vt(questioned) questioned)
                  (NP(witness) (DT(the) the) (NN(witness) witness))))

Heads in constits.


Page 21: Syntax: Constituency and Dependency Structure

Head rules

• Idea: Every phrase has a head word

• Head rules: for every nonterminal in tree, choose one of its children to be its “head”. This will define head words.

• Every nonterminal type has a different head rule; e.g. from Collins (1997):


• If parent is NP,

• Search from right-to-left for first child that’s NN, NNP, NNPS, NNS, NX, JJR

• Else: search left-to-right for first child which is NP

• Heads are super useful if you want just a single token to stand in for a phrase!

• Not just dep parsing. Entities, etc.
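The NP rule quoted above translates almost directly into code. This is a sketch: the Node type is a hypothetical stand-in for a real treebank node, and Collins (1997) has a full rule table covering every category, with more fallback cases than shown here.

```python
from collections import namedtuple

Node = namedtuple("Node", ["label", "word"])   # hypothetical minimal tree node

def np_head(children):
    """Pick the head child of an NP node, given its children in surface order."""
    for child in reversed(children):              # right-to-left pass
        if child.label in {"NN", "NNP", "NNPS", "NNS", "NX", "JJR"}:
            return child
    for child in children:                        # else: left-to-right for an NP
        if child.label == "NP":
            return child
    return children[-1]                           # fallback: rightmost child

print(np_head([Node("DT", "the"), Node("JJ", "big"), Node("NNS", "dogs")]))
# -> Node(label='NNS', word='dogs')
```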

Page 22: Syntax: Constituency and Dependency Structure


• Dependencies tend to be less specific than constituent structure


[Figure 12.3: three CFG analyses of the same verb phrase, plus the dependency representation:]

(a) Flat: (VP (V ate) (NP dinner) (PP on the table) (PP with a fork))

(b) Two-level (PTB-style): (VP (VP (V ate) (NP dinner)) (PP on the table) (PP with a fork))

(c) Chomsky adjunction: (VP (VP (VP (V ate) (NP dinner)) (PP on the table)) (PP with a fork))

(d) Dependency representation: ate dinner on the table with a fork — dinner and both PPs attach directly to ate

Figure 12.3: The three different CFG analyses of this verb phrase all correspond to a single dependency structure.

As shown in Figure 12.3d, these three cases all look the same in a dependency parse. So if you didn't think there was any meaningful difference between these three constituent representations, you may view this as an advantage of the dependency representation.

Dependency grammar still leaves open some tricky representational decisions. For example, coordination is a challenge: in the sentence Abigail and Max like kimchi (Figure 12.4), which word is the immediate dependent of the main verb like? Choosing either Abigail or Max seems arbitrary; for fairness we might choose and, but this seems in some ways to be the least important word in the noun phrase. One typical solution is to simply choose the left-most item in the coordinated structure — in this case, Abigail. Another alternative, as shown in Figure 12.4c, is a collapsed dependency grammar in which conjunctions are not included as nodes in the graph, but are instead used to label the edges (De Marneffe et al., 2006). Popel et al. (2013) survey alternatives for handling this phenomenon across several dependency treebanks.

The same logic that makes us reluctant to accept and as the head of a coordinated noun phrase may also make us reluctant to accept a preposition as the head of a prepositional phrase. In the sentence cats scratch people with claws, surely the word claws is more central than the word with — and it is precisely the bilexical relations between scratch, claws, and people that help guide us to the correct syntactic interpretation. Yet there are also arguments for preferring the preposition as the head — as we saw in section 11.5, the preposition itself is what helps us to choose verb attachment in meet the President on Monday and noun attachment in meet the President of Mexico. Collapsed dependency grammar…


[Eisenstein (2017)]