dependency parsing joakim nivre. dependency grammar old tradition in descriptive grammar modern...
TRANSCRIPT
Dependency Parsing
Joakim Nivre
Dependency Grammar
• Old tradition in descriptive grammar• Modern theroretical developments:
– Structural syntax (Tesnière)
– Meaning-Text Theory (Mel’čuk)
– Word Grammar (Hudson)
– Functional Generative Description (Prague)
• Basic idea:– Syntactic structure consists of binary, asymmetrical
relations between the words of a sentence
Dependency Representations
• Directed graphs:– V is a set of nodes (tokens)– E is a set of arcs (dependency relations)– L is a labeling function on E (dependency types)
• Example:
PPPå
(In)
NN60-talet
(the-60’s)
VBmålade
(painted)
PNhan (he)
JJdjärva (bold)
NNtavlor
(pictures)
ADV
PR
OBJ
SUB ATT
Graph Constraints
• Commonly imposed constraints:– Single-head (at most one head per node)– Connectedness (no dangling nodes)– Acyclicity (no cycles in the graph)– Projectivity:
• An arc i j is projective iff, for every k occurring between i and j in the input string, i j.
• A graph is projective iff every arc in A is projective.
Dependency Parsing
• Dependency-based grammar parsing:– Given a dependency grammar G and an input
string x *, derive some or all of the dependency graphs y assigned to x by G.
• Dependency-based text parsing:– Given a text T = (x1, …, xn), derive the correct
dependency graph yi for every sentence xi T.
• Text parsing may be grammar-driven or not.
Parsing Methods
• Three main approaches:– Dynamic programming algorithms applied to
context-free (projective) dependency grammars– Eliminative parsing techniques applied to
constraint-based formulations of (non-projective) dependency grammar
– Deterministic parsing algorithms combined with weak grammars or data-driven methods
Dynamic Programming
• Early formalizations:– Hays/Gaifman: Equivalent to a subset of
context-free grammars (roughly lexicalized)– Tabular parsing techniques (cf. CKY parsing)
• Modern developments:– Link grammar (Sleator and Temperley)– Bilexical grammar (Eisner):
• Lexicalized parsing in O(n3) time by combining spans instead of constituents.
Constraint Satisfaction
• Constraints on dependency graphs (Maruyama):pos(i) = D [dep(i) = DET pos(head(i)) = N i < head(i)]
pos(i) = N [dep(i) = SUBJ pos(head(i)) = V i < head(i)]
pos(i) = V [dep(i) = ROOT head(i) = nil]
[head(i) = head(j) dep(i) = dep(j)] i = j
• Graph satisfying the above constraints:
Da
Ndog
Vruns
SUBJDET
Parsing with Constraints
• Eliminative parsing:– Start with all formally possible analyses (in a
compact representation)– Eliminate representations that violate
constraints until only valid analyses remain
• Variations:– Weighted constraints– Transformational parsing
Deterministic Parsing
• Covington’s fundamental algorithm:– Accept words one by one starting at the
beginning of the sentence, and try linking each word as head or dependent of every previous word.
• Variations on shift-reduce parsing:– Standard (Kudo, Matsumoto, Yamada)– Arc-eager (Nivre)
Arc-Eager Parsing
• Configuration: C = S, I, AS = StackI = Input (remaining)A = Arc relation (current)
• Initialization:nil, W,
• Termination:S, nil, A for any S, A
• Acceptance:S, nil, A if (W, A) is connected
Transitions
• Left-Arc (LA): wi|S, wj|I, A S, wj|I, A {(wj, wi)}
if a : a A dep(a) = wi
• Right-Arc (RA):wi|S, wj|I, A wj|wi|S, I, A {(wi, wj)}
if a : a A dep(a) = wj
• Reduce (RE):wi|S, I, A S, I, A
if a : a A dep(a) = wi
• Shift (SH):S, wi|I, A wi|S, I, A
MaltParser
• Robust, data-driven dependency parsing using a combination of:– Deterministic parsing (e.g. arc-eager)– Discriminative machine learning (e.g. MBL)– User-defined feature models:
• Lexical features
• Part-of-speech features
• Dependency type features
Why (Not) Dependency Parsing?
• Potential advantages of dependency parsing:– Dependency relations are close to semantic relations,
which facilitates semantic interpretation– Dependency representations are more constrained (less
complex), which facilitates parsing– Dependency representations are more suitable for
languages with free or flexible word order
• Potential disadvantages:– Dependency representations are less expressive– Dependency representations are less well understood
formally and computationally