TRANSCRIPT
Supertagging
CMSC 35100: Natural Language Processing
January 31, 2006
Roadmap
Motivation: Tagging, Parsing & Lexicalization
Supertags: Definition & Examples
Parsing as disambiguation: Structural filters; Unigram & N-gram models
Results & Discussion
Motivation: Good, Robust Parsing
Main approaches:
Finite-state parsers: Shallow parsing; longest match resolves ambiguity
Hand-crafted; rules partitioned into domain-independent and domain-dependent parts
Statistical parsers: Assign some structure to any string, with a probability
Automatically extracted from manual annotations
Not linguistically transparent; hard to modify
Lexicalization
Lexicalization:
Integrates the syntax and semantics of lexical items
Enforces subcategorization and semantic constraints
Finite set of elementary structures (trees, strings, etc.), each anchored on a lexical item
Each lexical item is associated with at least one elementary grammar structure
A finite set of operations combines the structures
Framework
FB-LTAG: Feature-Based, Lexicalized Tree-Adjoining Grammar
Elementary trees:
Lexical item (anchor) on the frontier
Provide a complex description of the anchor
Specify the domain of locality for syntactic/semantic constraints
Initial (non-recursive) and auxiliary (recursive) trees
Derived trees are built by substitution and adjunction
Supertags
Inspired by POS taggers:
Locally resolve much POS ambiguity before parsing (96-97% accurate)
Limited context, e.g. trigrams
Elementary trees localize dependencies:
All and only the dependent elements are in the tree
Supertag = elementary structure
Highly ambiguous: one supertag per lexical item per distinct use
A word with a single POS (probably) has many supertags
E.g. a word that is always a verb still has many subcategorization frames
Extended Domain of Locality
Each supertag must contain all and only the arguments of its anchor in the structure
For each lexical item, the grammar must contain a supertag for each syntactic environment in which the item can appear
Factoring Recursion
Recursive constructs are represented as auxiliary trees/supertags
Initial supertags define domains of locality for agreement and subcategorization
Auxiliary trees can capture long-distance dependencies via adjunction
Supertags: Ambiguity and Parsing
One supertag per word in a complete parse; the parser must select among them
Problem: massive ambiguity
Solution: manage it with local disambiguation
Supertags localize dependencies, so local n-gram constraints apply before parsing
POS disambiguation makes parsing easier; supertag disambiguation makes it trivial: just structure combination
Example (tree figures not preserved in the transcript)
Structural Filtering
Simple local tests for whether a supertag can be used:
Span of supertag: the minimum number of lexical items it covers can't be larger than the input string
Left/right span constraint: the span to the left or right of the anchor can't be longer than the input on that side
Lexical items: a supertag can't be used if its terminals don't appear in the input
Features, etc.
Reduces ambiguity by 50% before parsing
Verbs, especially light verbs, are worst: even reduced by 50%, still > 250 supertags per POS
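To make the filters concrete, here is a minimal Python sketch; the Supertag record and its fields (min_span, left_span, right_span, terminals) are invented stand-ins for the real XTAG elementary trees, which carry far more structure.

```python
from dataclasses import dataclass

@dataclass
class Supertag:
    name: str
    min_span: int         # minimum number of lexical items the tree covers
    left_span: int        # items the tree requires to the left of its anchor
    right_span: int       # items the tree requires to the right of its anchor
    terminals: frozenset  # lexical terminals built into the tree itself

def passes_filters(tag, words, anchor_pos):
    """True if the supertag survives all three structural filters."""
    n = len(words)
    # Span filter: the tree cannot cover more items than the input has.
    if tag.min_span > n:
        return False
    # Left/right span filter: material demanded on either side of the
    # anchor cannot exceed the input available on that side.
    if tag.left_span > anchor_pos or tag.right_span > n - anchor_pos - 1:
        return False
    # Lexical filter: terminals built into the tree must occur in the input.
    if not tag.terminals <= set(words):
        return False
    return True
```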
N-gram Supertagging
Initially modeled (POS, supertag) pairs
Trigram model trained on 5K WSJ sentences: 68% accuracy (small corpus, little smoothing)
Dependency model: avoids a fixed context length; a dependency holds if one tree substitutes or adjoins into another
Limited by the size of the LTAG-parsed corpus; too much like regular parsing
Smoothed N-gram Models: Data
Training data from two sources:
XTAG parses of WSJ, IBM, and ATIS sentences: a small corpus, but clean TAG derivations
Converted Penn Treebank WSJ sentences: each lexical item is associated with a supertag using the parse
Requires heuristics over local tree contexts (labels of dominating nodes, siblings, parent's siblings); approximate
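As a rough sketch of the kind of local tree context these heuristics consult, the following walks a bracketed parse and collects, for each leaf, the labels of its dominating nodes and its siblings; the nested-tuple tree encoding is assumed for illustration, and the rule table that maps such contexts to supertag names is not shown.

```python
def leaf_contexts(tree, ancestors=()):
    """tree: (label, children) with str leaves; yields (word, context)."""
    label, children = tree
    for i, child in enumerate(children):
        if isinstance(child, str):
            # Sibling labels at this level; parent's siblings could be
            # gathered analogously by also passing them down.
            siblings = [c[0] for j, c in enumerate(children)
                        if j != i and not isinstance(c, str)]
            yield child, {"dominating": ancestors + (label,),
                          "siblings": siblings}
        else:
            yield from leaf_contexts(child, ancestors + (label,))

# Example: ("S", [("NP", ["John"]), ("VP", [("V", ["sleeps"])])])
```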
Smoothed N-gram Models: Unigram
Disambiguation redux: assume the structural filters have been applied
Unigram approach: select the supertag for a word by its preference:
Pr(t|w) = freq(t,w) / freq(w)
i.e. the most frequent supertag for the word in training
Unseen word: back off to the most frequent supertag for the word's POS
Results: 73-77% top-1 accuracy (vs. 91% for POS tagging)
Errors: verb subcategorization, PP attachment, NP head/modifier
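A minimal sketch of this unigram tagger with POS backoff, assuming pre-tagged training triples; the function and table names are invented for illustration.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_corpus):
    """tagged_corpus: iterable of (word, pos, supertag) triples."""
    by_word = defaultdict(Counter)
    by_pos = defaultdict(Counter)
    for word, pos, supertag in tagged_corpus:
        by_word[word][supertag] += 1   # freq(t, w)
        by_pos[pos][supertag] += 1     # freq(t, pos), for backoff
    return by_word, by_pos

def unigram_supertag(word, pos, by_word, by_pos):
    """Pick argmax_t Pr(t|w); back off to the word's POS if unseen."""
    if word in by_word:
        return by_word[word].most_common(1)[0][0]
    return by_pos[pos].most_common(1)[0][0]
```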
Smoothed N-gram Models: Trigram
Enhances the previous models:
Trigram: lexicalized over (word, supertag) pairs; adds context to the unigram
Ideally:
T = argmax_T P(T_1, ..., T_n) * P(W_1, ..., W_n | T_1, ..., T_n)
Really:
P(T_1, ..., T_n) ≈ ∏_{i=1..N} P(T_i | T_{i-2}, T_{i-1})
P(W_1, ..., W_n | T_1, ..., T_n) ≈ ∏_{i=1..N} P(W_i | T_i)
Good-Turing smoothing, Katz backoff
Results: up to 92% accuracy with 1M words of training
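The argmax over tag sequences can be computed with standard Viterbi decoding over states of the last two tags. A minimal sketch, assuming pre-smoothed tables p_trans ~ P(t | t-2, t-1) and p_emit ~ P(w | t); the 1e-12 floor merely stands in for real Good-Turing/Katz smoothing of unseen events.

```python
import math

def viterbi_supertag(words, tagset, p_trans, p_emit, start="<s>"):
    """Return the supertag sequence maximizing
    prod_i P(T_i | T_{i-2}, T_{i-1}) * P(W_i | T_i)."""
    # State: the last two tags; value: (log-prob, best sequence so far).
    best = {(start, start): (0.0, [])}
    for w in words:
        new_best = {}
        for (t2, t1), (score, seq) in best.items():
            # In practice, restrict tagset to the word's candidate
            # supertags (after structural filtering).
            for t in tagset:
                s = (score
                     + math.log(p_trans.get((t2, t1, t), 1e-12))  # P(t|t2,t1)
                     + math.log(p_emit.get((w, t), 1e-12)))       # P(w|t)
                if (t1, t) not in new_best or s > new_best[(t1, t)][0]:
                    new_best[(t1, t)] = (s, seq + [t])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]
```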
Supertagging & Parsing
Supertagger as a front end to the parser, analogous to POS tagging
Key: disambiguate supertags before parsing
A) Pick the most probable supertag by the trigram method: 4 vs. 120 seconds per sentence to parse, but if the tag is wrong, the parse is wrong
B) Pick the n-best supertags: recovers many more parses, but slower
Still fails on ill-formed input
Conclusion
Integrates a lightweight n-gram approach with a rich, lexicalized representation
N-gram supertagging produces an "almost parse"
Good tagging and parsing effectiveness (> 90%); faster than XTAG by a factor of 30
Issues: Treebank conversion, ill-formed input, etc.
Applications
Lightweight dependency analysis
Enhanced information retrieval: exploiting the syntactic structure of the query can raise precision from 33% to 79%
Supertagging in other formalisms, e.g. CCG
Lightweight Dependency Analysis
Heuristic, linear-time, deterministic (see the sketch after this list):
Pass 1: modifier supertags
Pass 2: non-modifier supertags
Compute dependencies for each supertag s (with anchor w):
For each frontier node d in s:
Connect w to a word on its left or right, by position
Label the arc to d with the internal node
P = 82.3, R = 93.8; robust to fragments
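A crude sketch of the two-pass procedure, assuming each word already carries its disambiguated supertag with left/right frontier slots; the Token representation and the adjacent-word attachment heuristic are simplifications invented here, not the actual LDA, which uses span information to pick dependents.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    word: str
    is_modifier: bool   # modifier (auxiliary) vs. non-modifier supertag
    slots: list         # frontier slots of the supertag: "left" / "right"
    deps: list = field(default_factory=list)

def lda(tokens):
    """Two deterministic passes: modifier supertags first, then the rest."""
    for modifier_pass in (True, False):
        for i, tok in enumerate(tokens):
            if tok.is_modifier != modifier_pass:
                continue
            for slot in tok.slots:
                # Crude positional heuristic: attach to the adjacent word
                # on the slot's side.
                j = i - 1 if slot == "left" else i + 1
                if 0 <= j < len(tokens):
                    tok.deps.append((slot, tokens[j].word))
    return tokens
```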