TRANSCRIPT
Supertagging
CMSC 35100: Natural Language Processing
January 31, 2006
Roadmap
Motivation: Tagging, Parsing & Lexicalization
Supertags: Definition & Examples
Parsing as disambiguation: Structural filters; Unigram & N-gram models
Results & Discussion
Motivation: Good, Robust Parsing
Main approaches:
Finite-state parsers: Shallow parsing; longest match resolves ambiguity
Hand-crafted; rules partitioned into domain-independent and domain-dependent parts
Statistical parsers: Assign some structure to any string, with a probability
Automatically extracted from manual annotations
Not linguistically transparent; hard to modify
Lexicalization
Lexicalization:
Integrates the syntax and semantics of lexical items
Enforces subcategorization and semantic constraints
Finite set of elementary structures (trees, strings, etc.), each anchored on a lexical item
Each lexical item is associated with at least one elementary grammar structure
A finite set of operations combines the structures
Framework
FB-LTAG: Feature-Based, Lexicalized Tree-Adjoining Grammar
Elementary trees:
Lexical item (anchor) on the frontier
Provide a complex description of the anchor
Specify the domain of locality for syntactic/semantic constraints
Initial (non-recursive) and auxiliary (recursive) trees
Derived trees are built by substitution and adjunction
Supertags
Inspired by POS taggers:
Locally resolve much POS ambiguity before parsing (96-97% accurate)
Limited context, e.g. trigrams
Elementary trees localize dependencies:
All and only the dependent elements are in the tree
Supertag = elementary structure
Highly ambiguous: one supertag per lexical item per distinct use
A word with a single POS (probably) has many supertags
E.g. a word that is always a verb still has many subcategorization frames
Extended Domain of Locality
Each supertag must contain all and only the arguments of its anchor in the structure
For each lexical item, the grammar must contain a supertag for each syntactic environment in which the item can appear
Factoring Recursion
Recursive constructs are represented as auxiliary trees/supertags
Initial supertags define domains of locality for agreement and subcategorization
Auxiliary trees can capture long-distance dependencies via adjunction
Supertags: Ambiguity and Parsing
One supertag per word in a complete parse; the parser must select among them
Problem: massive ambiguity
Solution: manage it with local disambiguation
Supertags localize dependencies, so local n-gram constraints apply before parsing
POS disambiguation makes parsing easier; supertag disambiguation makes it trivial: just structure combination
Example (tree figures not preserved in the transcript)
Structural Filtering
Simple local tests for whether a supertag can be used:
Span of supertag: the minimum number of lexical items it covers can't be larger than the input string
Left/right span constraint: the span to the left or right of the anchor can't be longer than the input on that side
Lexical items: a supertag can't be used if its terminals don't appear in the input
Features, etc.
Reduces ambiguity by 50% before parsing
Verbs, especially light verbs, are worst: even reduced by 50%, still > 250 supertags per POS
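To make the filters concrete, here is a minimal Python sketch; the Supertag record and its fields (min_span, left_span, right_span, terminals) are invented stand-ins for the real XTAG elementary trees, which carry far more structure.

```python
from dataclasses import dataclass

@dataclass
class Supertag:
    name: str
    min_span: int         # minimum number of lexical items the tree covers
    left_span: int        # items the tree requires to the left of its anchor
    right_span: int       # items the tree requires to the right of its anchor
    terminals: frozenset  # lexical terminals built into the tree itself

def passes_filters(tag, words, anchor_pos):
    """True if the supertag survives all three structural filters."""
    n = len(words)
    # Span filter: the tree cannot cover more items than the input has.
    if tag.min_span > n:
        return False
    # Left/right span filter: material demanded on either side of the
    # anchor cannot exceed the input available on that side.
    if tag.left_span > anchor_pos or tag.right_span > n - anchor_pos - 1:
        return False
    # Lexical filter: terminals built into the tree must occur in the input.
    if not tag.terminals <= set(words):
        return False
    return True
```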
N-gram Supertagging
Initially modeled (POS, supertag) pairs
Trigram model trained on 5K WSJ sentences: 68% accuracy (small corpus, little smoothing)
Dependency model: avoids a fixed context length; a dependency holds if one tree substitutes or adjoins into another
Limited by the size of the LTAG-parsed corpus; too much like regular parsing
Smoothed N-gram Models: Data
Training data from two sources:
XTAG parses of WSJ, IBM, and ATIS sentences: a small corpus, but clean TAG derivations
Converted Penn Treebank WSJ sentences: each lexical item is associated with a supertag using the parse
Requires heuristics over local tree contexts (labels of dominating nodes, siblings, parent's siblings); approximate
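As a rough sketch of the kind of local tree context these heuristics consult, the following walks a bracketed parse and collects, for each leaf, the labels of its dominating nodes and its siblings; the nested-tuple tree encoding is assumed for illustration, and the rule table that maps such contexts to supertag names is not shown.

```python
def leaf_contexts(tree, ancestors=()):
    """tree: (label, children) with str leaves; yields (word, context)."""
    label, children = tree
    for i, child in enumerate(children):
        if isinstance(child, str):
            # Sibling labels at this level; parent's siblings could be
            # gathered analogously by also passing them down.
            siblings = [c[0] for j, c in enumerate(children)
                        if j != i and not isinstance(c, str)]
            yield child, {"dominating": ancestors + (label,),
                          "siblings": siblings}
        else:
            yield from leaf_contexts(child, ancestors + (label,))

# Example: ("S", [("NP", ["John"]), ("VP", [("V", ["sleeps"])])])
```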
Smoothed N-gram Models: Unigram
Disambiguation redux: assume the structural filters have been applied
Unigram approach: select the supertag for a word by its preference:
Pr(t|w) = freq(t,w) / freq(w)
i.e. the most frequent supertag for the word in training
Unseen word: back off to the most frequent supertag for the word's POS
Results: 73-77% top-1 accuracy (vs. 91% for POS tagging)
Errors: verb subcategorization, PP attachment, NP head/modifier
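A minimal sketch of this unigram tagger with POS backoff, assuming pre-tagged training triples; the function and table names are invented for illustration.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_corpus):
    """tagged_corpus: iterable of (word, pos, supertag) triples."""
    by_word = defaultdict(Counter)
    by_pos = defaultdict(Counter)
    for word, pos, supertag in tagged_corpus:
        by_word[word][supertag] += 1   # freq(t, w)
        by_pos[pos][supertag] += 1     # freq(t, pos), for backoff
    return by_word, by_pos

def unigram_supertag(word, pos, by_word, by_pos):
    """Pick argmax_t Pr(t|w); back off to the word's POS if unseen."""
    if word in by_word:
        return by_word[word].most_common(1)[0][0]
    return by_pos[pos].most_common(1)[0][0]
```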
Smoothed N-gram Models: Trigram
Enhances the previous models:
Trigram: lexicalized over (word, supertag) pairs; adds context to the unigram
Ideally:
T = argmax_T P(T_1, ..., T_n) * P(W_1, ..., W_n | T_1, ..., T_n)
Really:
P(T_1, ..., T_n) ≈ ∏_{i=1..N} P(T_i | T_{i-2}, T_{i-1})
P(W_1, ..., W_n | T_1, ..., T_n) ≈ ∏_{i=1..N} P(W_i | T_i)
Good-Turing smoothing, Katz backoff
Results: up to 92% accuracy with 1M words of training
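The argmax over tag sequences can be computed with standard Viterbi decoding over states of the last two tags. A minimal sketch, assuming pre-smoothed tables p_trans ~ P(t | t-2, t-1) and p_emit ~ P(w | t); the 1e-12 floor merely stands in for real Good-Turing/Katz smoothing of unseen events.

```python
import math

def viterbi_supertag(words, tagset, p_trans, p_emit, start="<s>"):
    """Return the supertag sequence maximizing
    prod_i P(T_i | T_{i-2}, T_{i-1}) * P(W_i | T_i)."""
    # State: the last two tags; value: (log-prob, best sequence so far).
    best = {(start, start): (0.0, [])}
    for w in words:
        new_best = {}
        for (t2, t1), (score, seq) in best.items():
            # In practice, restrict tagset to the word's candidate
            # supertags (after structural filtering).
            for t in tagset:
                s = (score
                     + math.log(p_trans.get((t2, t1, t), 1e-12))  # P(t|t2,t1)
                     + math.log(p_emit.get((w, t), 1e-12)))       # P(w|t)
                if (t1, t) not in new_best or s > new_best[(t1, t)][0]:
                    new_best[(t1, t)] = (s, seq + [t])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]
```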
Supertagging & Parsing
Supertagger as a front end to the parser, analogous to POS tagging
Key: disambiguate supertags before parsing
A) Pick the most probable supertag by the trigram method: 4 vs. 120 seconds per sentence to parse, but if the tag is wrong, the parse is wrong
B) Pick the n-best supertags: recovers many more parses, but slower
Still fails on ill-formed input
Conclusion
Integrates a lightweight n-gram approach with a rich, lexicalized representation
N-gram supertagging produces an "almost parse"
Good tagging and parsing effectiveness (> 90%); faster than XTAG by a factor of 30
Issues: Treebank conversion, ill-formed input, etc.
Applications
Lightweight dependency analysis
Enhanced information retrieval: exploiting the syntactic structure of the query can raise precision from 33% to 79%
Supertagging in other formalisms, e.g. CCG
Lightweight Dependency Analysis
Heuristic, linear-time, deterministic (see the sketch after this list):
Pass 1: modifier supertags
Pass 2: non-modifier supertags
Compute dependencies for each supertag s (with anchor w):
For each frontier node d in s:
Connect w to a word on its left or right, by position
Label the arc to d with the internal node
P = 82.3, R = 93.8; robust to fragments
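A crude sketch of the two-pass procedure, assuming each word already carries its disambiguated supertag with left/right frontier slots; the Token representation and the adjacent-word attachment heuristic are simplifications invented here, not the actual LDA, which uses span information to pick dependents.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    word: str
    is_modifier: bool   # modifier (auxiliary) vs. non-modifier supertag
    slots: list         # frontier slots of the supertag: "left" / "right"
    deps: list = field(default_factory=list)

def lda(tokens):
    """Two deterministic passes: modifier supertags first, then the rest."""
    for modifier_pass in (True, False):
        for i, tok in enumerate(tokens):
            if tok.is_modifier != modifier_pass:
                continue
            for slot in tok.slots:
                # Crude positional heuristic: attach to the adjacent word
                # on the slot's side.
                j = i - 1 if slot == "left" else i + 1
                if 0 <= j < len(tokens):
                    tok.deps.append((slot, tokens[j].word))
    return tokens
```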