CS 730: Text Mining for Social Media & Collaboratively Generated Content

Lecture 3: Parsing and Chunking

TRANSCRIPT

Page 1: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS 730: Text Mining for Social Media & Collaboratively Generated Content

Lecture 3: Parsing and Chunking

Page 2: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 2

“Big picture” of the course

Language models (word, n-gram, …)
Classification and sequence models
WSD, Part-of-speech tagging
Syntactic parsing and tagging
• Next week: semantics
• Next->Next week: Info Extraction + Text Mining Intro
– Fall break
• Part II: Social media (research papers start)

9/20/2010

Page 3: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 3

Today’s Lecture Plan

• Phrase Chunking

• Syntactic Parsing

• Machine Learning-based Syntactic Parsing

9/20/2010

Page 4: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 49/20/2010

Phrase Chunking

Page 5: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 59/20/2010

Why Chunking/Parsing?

Page 6: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 69/20/2010

Phrase Structure (continued)

Page 7: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 79/20/2010

Types of Phrases
• Phrases: classify by part of speech of main word or by syntactic role
– subject and predicate; noun phrase and verb phrase
In "The young cats drink milk.":
"The young cats" is a noun phrase and the subject; "drink milk" is a verb phrase and the predicate.
The main word is the head of the phrase: "cats" in "the young cats".
• Verb complements and modifiers
– types of complements: noun phrases, adjective phrases, prepositional phrases, particles
noun phrase: I served a brownie.
adjective phrase: I remained very rich.
prepositional phrase: I looked at Fred.
particle: He looked up the number.
– clauses; clausal complements
• I dreamt that I won a million brownies.
– tenses: simple past, present, future; progressive, perfect
simple present: John bakes cookies.
present progressive: John is baking cookies.
present perfect: John has baked cookies.
– active vs. passive
active: Bernie ate the banana.
passive: The banana was eaten by Bernie.

Page 8: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 89/20/2010

Noun Phrase Structure
• Left modifiers:
– determiner, quantifier, adjective, noun: the five shiny tin cans
• Right modifiers: prepositional phrases and apposition
– prepositional phrase: the man in the moon
– apposition: Scott, the Arctic explorer
• Relative clauses
the man who ate the popcorn
the popcorn which the man ate
the man who is eating the popcorn
the tourist who was eaten by a lion
• Reduced relative clauses
the man eating the popcorn
the man eaten by a lion

Page 9: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 99/20/2010

Attachment Ambiguities

Page 10: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 109/20/2010

Preliminaries: Constraint Grammars/CFGs

Page 11: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 119/20/2010

CFG (applying rewrite rules)

Page 12: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 129/20/2010

Preliminaries: CFG (continued)

Page 13: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 139/20/2010

Parsing

Page 14: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 149/20/2010

Human parsing

Page 15: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 159/20/2010

Chunking (from Abney 1994)
• “I begin with an intuition: when I read a sentence, I read it a chunk at a time.”
– Breaks up something like this:
• [I begin] [with an intuition]: [when I read] [a sentence], [I read it] [a chunk] [at a time]
• Chunks correspond to prosodic patterns.
– Strongest stresses in the sentence fall one to a chunk
– Pauses are most likely to fall between chunks
• A typical chunk consists of a single content word surrounded by a constellation of function words, matching a fixed template.
• A simple context-free grammar is often adequate to describe the structure of chunks.

Page 16: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 169/20/2010

Chunking (continued)
• Text chunking subsumes a range of tasks.
– The simplest is finding noun groups or base NPs:
• non-recursive noun phrases up to the head (for English).
– More ambitious systems may add additional chunk types, such as verb groups
– Seek a complete partitioning of the sentence into chunks of different types:
[NP He ] [VP reckons ] [NP the current account deficit ] [VP will narrow ] [PP to ] [NP only $1.8 billion ] [PP in ] [NP September ] .
(Steve Abney, Parsing by Chunks)
– The chunks are non-recursive structures which can be handled by finite-state methods (CFGs)
• Why do text chunking?
– Full parsing is expensive, and is not very robust.
– Partial parsing much faster, more robust, sufficient for many applications (IE, QA).
– Can also serve as a possible first step for full parsing

Page 17: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 179/20/2010

Chunking: Rule-based

• Quite high performance on NP chunking can be obtained with a small number of regular expressions (a minimal sketch follows below)
• With a larger rule set, using Constraint Grammar rules, Voutilainen reports recall of 98%+ with precision of 95-98% for noun chunks.
– Atro Voutilainen, NPtool, a Detector of English Noun Phrases, WVLC 93.
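A minimal sketch of the regular-expression idea (not Voutilainen's NPtool): match determiner-adjective-noun sequences over the POS tags of an already tagged sentence. The tag set and the single pattern here are illustrative assumptions, not the rule set reported above.

```python
import re

# Minimal regex-based NP chunking over POS tags (illustrative, not NPtool):
# an optional determiner, any number of adjectives, one or more nouns.
NP_PATTERN = re.compile(r"(DT )?(JJ )*(NN[SP]* ?)+")

def chunk_nps(tagged):
    """tagged: list of (word, POS) pairs; returns the NP word spans."""
    tags = " ".join(tag for _, tag in tagged) + " "
    chunks = []
    for m in NP_PATTERN.finditer(tags):
        start = tags[:m.start()].count(" ")          # token index of match start
        length = m.group(0).strip().count(" ") + 1   # number of tags matched
        chunks.append([w for w, _ in tagged[start:start + length]])
    return chunks

sent = [("The", "DT"), ("young", "JJ"), ("cats", "NNS"),
        ("drink", "VBP"), ("milk", "NN")]
print(chunk_nps(sent))   # [['The', 'young', 'cats'], ['milk']]
```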

Page 18: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 189/20/2010

Why can chunking be difficult?
• Two major sources of error (and these are also error sources for simple finite-state patterns for baseNP): participles and conjunction.
• Whether a participle is part of a noun phrase will depend on the particular choice of words:
He enjoys writing letters.
He sells writing paper.
– and sometimes is genuinely ambiguous ...
He enjoys baking potatoes.
He has broken bottles in the basement.
• The rules for conjoined NPs are complicated by the bracketing rules of the Penn Tree Bank.
– Conjoined prenominal nouns are generally treated as part of a single baseNP:
"brick and mortar university" (with "brick and mortar" modifying "university").
– Conjoined heads with shared modifiers are also to be treated as a single baseNP:
"ripe apples and bananas"
• If the modifier is not shared, there are two baseNPs:
"ripe apples and cinnamon".
• Modifier sharing, however, is hard for people to judge and is not consistently annotated.

Page 19: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 199/20/2010

Transformation-Based Learning for Chunking

• Adapted the TBL method from Brill's POS tagger. One-level NP chunking is restated as a word tagging task.
• Used 3 tags:
I (inside a baseNP)
O (outside a baseNP)
B (the start of a baseNP which immediately follows another baseNP)
(An encoding sketch follows below.)
• Initial tags assigned based on the most likely tag for a given part-of-speech.
• The contexts for TBL rules: words, part-of-speech assignments, and prior IOB tags.

Ramshaw & Marcus, Text Chunking using Transformation-Based Learning, WVLC 1995
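A small sketch of the I/O/B encoding itself (not Ramshaw & Marcus's implementation): bracketed baseNPs are turned into one tag per word, with B used only when a baseNP immediately follows another baseNP.

```python
# Sketch of the I/O/B encoding: I = inside a baseNP, O = outside,
# B = first word of a baseNP that immediately follows another baseNP.
def to_iob(chunked):
    """chunked: list of items, each either a word (outside any NP)
    or a list of words forming one baseNP."""
    tags, prev_was_np = [], False
    for item in chunked:
        if isinstance(item, list):                   # a baseNP
            first = "B" if prev_was_np else "I"
            tags.extend([(w, first if i == 0 else "I")
                         for i, w in enumerate(item)])
            prev_was_np = True
        else:                                        # a word outside any NP
            tags.append((item, "O"))
            prev_was_np = False
    return tags

# [NP He] reckons [NP the current account deficit] ...
print(to_iob([["He"], "reckons", ["the", "current", "account", "deficit"]]))
# [('He', 'I'), ('reckons', 'O'), ('the', 'I'), ('current', 'I'),
#  ('account', 'I'), ('deficit', 'I')]
```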

Page 20: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 209/20/2010

TBL-based Chunking (2)

• Results can be scored based on the correct assignment of tags, or on recall and precision of complete baseNPs.
– The latter is normally used as the metric, since it corresponds to the actual objective -- different tag sets can be used as an intermediate representation.
– Obtained about 92% recall and precision with their system for baseNPs, using 200K words of training.
– Without lexical information: 90.5% recall and precision.

Ramshaw & Marcus, Text Chunking using Transformation-Based Learning, WVLC 1995

Page 21: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 219/20/2010

Chunking: Classification-based

• Classification task:
– NP or not NP?
• Using classifiers for Chunking
– The best performance on the base NP and chunking tasks was obtained using a Support Vector Machine method.
– They obtained an accuracy of 94.22% with the small data set of Ramshaw and Marcus, and 95.77% by training on almost the entire Penn Treebank.

Taku Kudo & Yuji Matsumoto, Chunking with Support Vector Machines, Proc. NAACL 01.

Page 22: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 229/20/2010

Hand-tuning vs. Machine Learning
• BaseNP chunking is a task for which people (with some linguistics training) can write quite good rules quickly.
• This raises the practical question of whether we should be using machine learning at all.
– If there is already a large relevant resource, it makes sense to learn from it.
– However, if we have to develop a chunker for a new language, is it cheaper to annotate some data or to write the rules directly?
• Ngai and Yarowsky addressed this question.
– They also considered selecting the data to be annotated.
– Traditional training is based on sequential text annotation ... we just annotate a series of documents in sequence.
– Can we do better?
• Ngai, G. and D. Yarowsky, Rule Writing or Annotation: Cost-efficient Resource Usage for Base Noun Phrase Chunking. ACL 2000

Page 23: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 239/20/2010

Active Learning

• Instead of annotating training examples sequentially, choose good examples

• Usually, choose examples “on the boundary” – i.e., for which classifier has low confidence

• Very often allows training to converge much faster than sequential/batch learning.

• Drawback: requires user in the loop.

Ngai &Yarowsky ACL 2000

Page 24: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 249/20/2010

Active Learning (continued)
Ngai & Yarowsky, ACL 2000

Page 25: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 259/20/2010

Rule Writing vs. Active Learning
Ngai & Yarowsky, ACL 2000

Page 26: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 269/20/2010

Rule Writing vs. Annotation Learning

• Annotation:
– Can continue indefinitely
– Can combine efforts of multiple annotators
– More consistent results
– Accuracy can be improved by better learning algorithms
• Rule writing:
– Must keep in mind rule interactions
– Difficult to combine rules from different experts
– Requires more skill
– Accuracy limited by the set of rules (will not improve further)

Ngai & Yarowsky, ACL 2000

Page 27: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 279/20/2010

The parsing problem

[Diagram: a PARSER, given a grammar, parses test sentences; a scorer compares its output against the correct test trees to measure accuracy.]

Recent parsers are quite accurate (Eisner, Collins, Charniak, etc.)

Page 28: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 289/20/2010

Applications of parsing (1/2)

Machine translation (Alshawi 1996, Wu 1997, ...)
[Diagram: tree operations mapping an English tree to a Chinese tree.]

Speech synthesis from parses (Prevost 1996)
The government plans to raise income tax.
The government plans to raise income tax the imagination.

Speech recognition using parsing (Chelba et al. 1998)
Put the file in the folder.
Put the file and the folder.

Page 29: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 299/20/2010

Applications of parsing (2/2) Grammar checking (Microsoft)

Indexing for information retrieval (Woods 1997)

... washing a car with a hose ... vehicle maintenance

Information extraction (Hobbs 1996)

[Diagram: parsed NY Times archive feeding a database that answers queries.]

Page 30: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 309/20/2010

Parsing for the Turing Test

Most linguistic properties are defined over trees. One needs to parse to see subtle distinctions. E.g.:

Sara dislikes criticism of her. (her ≠ Sara)

Sara dislikes criticism of her by anyone. (her ≠ Sara)

Sara dislikes anyone’s criticism of her. (her = Sara or her ≠ Sara)

Page 31: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 319/20/2010

What makes a good grammar

• Conjunctions must match
– I ate a hamburger and on the stove.
– I ate a cold hot dog and well burned.
– I ate the hot dog slowly and a hamburger.

Page 32: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 329/20/2010

Vanilla CFG not sufficient for NL

• Number agreement
– a men
• DET selection
– a apple
• Tense, mood, etc. agreement
• For now, let’s see what it would take to parse English with a vanilla CFG

Page 33: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 339/20/2010

Parsing re-defined

Page 34: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 349/20/2010

Revised CFG

Page 35: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 359/20/2010

In: cats scratch people with claws

Page 36: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 369/20/2010

Soundness and Completeness in Parsing

Page 37: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 379/20/2010

Top-Down Parsing
• Top-down parsing is goal directed.
• A top-down parser starts with a list of constituents to be built.
• The top-down parser rewrites the goals in the goal list by matching one against the LHS of the grammar rules, and expanding it with the RHS, attempting to match the sentence to be derived.
• If a goal can be rewritten in several ways, then there is a choice of which rule to apply (search problem)
• Can use depth-first or breadth-first search, and goal ordering.

Page 38: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 389/20/2010

Simple Top-down parsing algorithm
1. Start with initial state ((S) 1) and no backup states.
2. Select current state: Take the first state off the possibilities list and call it C.
3. If the possibilities list is empty, then the algorithm fails (that is, no successful parse is possible).
4. If C consists of an empty symbol list and the word position is at the end of the sentence, then the algorithm succeeds. Otherwise, generate the next possible states.
5. If the first symbol on the symbol list of C is a lexical symbol, and the next word in the sentence can be in that class, then create a new state by removing the first symbol from the symbol list and updating the word position, and add it to the possibilities list. Otherwise, if the first symbol on the symbol list of C is a non-terminal, generate a new state for each rule in the grammar that can rewrite that nonterminal symbol and add them all to the possibilities list.
(A depth-first version is sketched in code below.)
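A minimal depth-first rendering of the algorithm above. States are (symbols-to-find, word-position) pairs and the possibilities list is a stack; the small grammar and lexicon follow the "The dogs cried" example, but the exact rule set of the original slides is not shown here, so treat them as illustrative.

```python
# Depth-first top-down recognizer: pop a state, expand its first symbol
# (non-terminal -> push one new state per rule; lexical -> consume a word).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["ART", "N"], ["ART", "ADJ", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
LEXICON = {"the": {"ART"}, "dogs": {"N", "V"}, "cried": {"V"}}

def top_down_parse(words):
    stack = [(("S",), 0)]                           # initial state ((S) 1)
    while stack:
        symbols, pos = stack.pop()                  # select current state
        if not symbols:
            if pos == len(words):
                return True                         # empty goals, all words used
            continue
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:                        # non-terminal: expand by each rule
            for rhs in reversed(GRAMMAR[first]):
                stack.append((tuple(rhs) + rest, pos))
        elif pos < len(words) and first in LEXICON.get(words[pos], set()):
            stack.append((rest, pos + 1))           # lexical symbol matched next word
    return False                                    # possibilities list exhausted

print(top_down_parse("the dogs cried".split()))     # True
```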

Page 39: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 399/20/2010

Top-down as search

• For a depth-first strategy, the possibilities list is a stack. In other words, step 1 always takes the first element off the list, and step 3 always puts the new states on the front of the list, yielding a last-in first-out (LIFO) strategy.
• In contrast, in a breadth-first strategy the possibilities list is manipulated as a queue. Step 3 adds the new positions onto the end of the list, rather than the beginning, yielding a first-in first-out (FIFO) strategy.

Page 40: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 409/20/2010

Top-down example
• Grammar: same CFG as before
• Lexicon:
– cried: V
– dogs: N, V
– the: ART
• Input: The/1 dogs/2 cried/3
• A typical parse state:
– ((N VP) 2)
– Parser needs to find N followed by VP, starting at position 2

Page 41: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 419/20/2010

Parsing “The dogs cried”

Step | Current State     | Backup States                    | Comment
1.   | ((S) 1)           |                                  | initial position
2.   | ((NP VP) 1)       |                                  | rewriting S by rule 1
3.   | ((ART N VP) 1)    | ((ART ADJ N VP) 1)               | rewriting NP by rules 2 & 3
4.   | ((N VP) 2)        | ((ART ADJ N VP) 1)               | matching ART with "the"
5.   | ((VP) 3)          | ((ART ADJ N VP) 1)               | matching N with "dogs"
6.   | ((V) 3)           | ((V NP) 3), ((ART ADJ N VP) 1)   | rewriting VP by rules 5-8
7.   |                   |                                  | the parse succeeds as V is matched to "cried", leaving an empty grammatical symbol list with an empty sentence

Page 42: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 429/20/2010

Problems with Top-down

• Left recursive rules

• A top-down parser will do badly if there are many different rules for the same LHS. Consider if there are 600 rules for S, 599 of which start with NP, but one of which starts with V, and the sentence starts with V.

• Useless work: expands things that are possible top-down but not there

• Top-down parsers do well if there is useful grammar-driven control: search is directed by the grammar

• Top-down is hopeless for rewriting parts of speech (preterminals) with words (terminals). In practice that is always done bottom-up as lexical lookup.

• Repeated work: anywhere there is common substructure

Page 43: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 439/20/2010

• Bottom-up parsing is data directed.
• The initial goal list of a bottom-up parser is the string to be parsed. If a sequence in the goal list matches the RHS of a rule, then this sequence may be replaced by the LHS of the rule.
• Parsing is finished when the goal list contains just the start category.
• If the RHS of several rules match the goal list, then there is a choice of which rule to apply (search problem)
• Can use depth-first or breadth-first search, and goal ordering.
• The standard presentation is as shift-reduce parsing (sketched below).
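A bare-bones shift-reduce sketch of the bottom-up idea: shift words onto a stack, replacing any stack suffix that matches the RHS of a rule with its LHS. This greedy version has no backtracking, so it only illustrates the data-directed control flow; the tiny grammar and lexicon are assumptions for the example.

```python
# Greedy shift-reduce recognizer (no search): shift, then reduce while possible.
RULES = [                       # (LHS, RHS) pairs
    ("NP", ("ART", "N")),
    ("VP", ("V",)),
    ("S",  ("NP", "VP")),
]
LEXICON = {"the": "ART", "dogs": "N", "cried": "V"}

def shift_reduce(words):
    stack = []
    for w in words:
        stack.append(LEXICON[w])                 # shift (with lexical lookup)
        reduced = True
        while reduced:                           # reduce as long as some RHS matches
            reduced = False
            for lhs, rhs in RULES:
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]
                    reduced = True
                    break
    return stack                                 # ["S"] means the whole string parsed

print(shift_reduce("the dogs cried".split()))    # ['S']
```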

Page 44: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CS730: Text Mining for Social Media, F2010 449/20/2010

Problems with Bottom-up Parsing
• Unable to deal with empty categories: termination problem, unless rewriting empties as constituents is somehow restricted (but then it’s generally incomplete)
• Useless work: locally possible, but globally impossible.
• Inefficient when there is great lexical ambiguity (grammar-driven control might help here)
• Conversely, it is data-directed: it attempts to parse the words that are there.
• Repeated work: anywhere there is common substructure
• Both TD (LL) and BU (LR) parsers can (and frequently do) do work exponential in the sentence length on NLP problems.

Page 45: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Dynamic Programming for Parsing

• Systematically fill in tables of solutions to sub-problems.

• Store subtrees for each of the various constituents in the input as they are discovered

• Cocke-Kasami-Younger (CKY) algorithm, Earley’s algorithm, and chart parsing.

9/20/2010 45CS730: Text Mining for Social Media, F2010

Page 46: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CKY algorithm (BU), recognizer version

Input: string of n words
Output: yes/no (since it’s only a recognizer)
Data structure: n x n table
– rows labeled 0 to n-1
– columns labeled 1 to n
– cell [i,j] lists constituents found between i and j
(A minimal implementation is sketched below.)

9/20/2010 46CS730: Text Mining for Social Media, F2010
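A minimal CKY recognizer over that table layout, for a grammar in Chomsky Normal Form. The toy grammar and lexicon here are illustrative assumptions (the lecture's "miniature grammar" slide is a figure and is not reproduced in the transcript).

```python
# Minimal CKY recognizer: cell table[i][j] holds the constituents found
# between word positions i and j; wider spans are built from two smaller ones.
UNARY  = {"book": {"N", "V"}, "that": {"Det"}, "flight": {"N"}}
BINARY = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP", "S"}, ("NP", "VP"): {"S"}}

def cky_recognize(words, start="S"):
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):                     # lexical cells [i, i+1]
        table[i][i + 1] = set(UNARY.get(w, set()))
    for width in range(2, n + 1):                     # wider spans, bottom up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                 # split point
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]

print(cky_recognize("book that flight".split()))      # True
```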

Page 47: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Miniature Grammar

9/20/2010 47CS730: Text Mining for Social Media, F2010

Page 48: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CKY Example

9/20/2010 48CS730: Text Mining for Social Media, F2010

Page 49: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CKY Algorithm

9/20/2010 49CS730: Text Mining for Social Media, F2010

Page 50: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

9/20/2010 50CS730: Text Mining for Social Media, F2010

Page 51: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CKY: Fill last column after “Houston”

[Slides 51-55 step through filling the last column of the CKY chart after reading “Houston”; only chart figures appear on these slides.]

Page 56: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

CKY Algorithm: Additional Information

• More formal algorithm analysis/description:http://www.cs.uiuc.edu/class/sp09/cs373/lectures/lect_15.pdf

• Online demo: http://homepages.uni-tuebingen.de/student/martin.lazarov/demos/cky.html

9/20/2010 CS730: Text Mining for Social Media, F2010 56

Page 57: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Feature-Augmented CFGs
• Motivation: Agreement
– Most verbs in English can appear in two forms in the present tense:
• the form used for third-person singular subjects (the flight does). Called 3sg. Has final -s.
• the form used for all other kinds of subjects (all the flights do, I do). Let’s call it non-3sg. Usually does not have final -s.
– Sentences in which the subject does not agree with the verb are ungrammatical:
• *[What flight] leave in the morning?
• *Does [NP you] have a flight from Boston to Fort Worth?
• *Do [NP this flight] stop in Dallas?

9/20/2010 57CS730: Text Mining for Social Media, F2010

Page 58: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Agreement (continued)
• Rule for yes-no-questions:
– S → Aux NP VP
• We could replace it with two rules:
– S → 3sgAux 3sgNP VP
– S → Non3sgAux Non3sgNP VP
• Also have to add rules for the lexicon:
– 3sgAux → does | has | can | . . .
– Non3sgAux → do | have | can | . . .
• Also need to add rules for 3sgNP and Non3sgNP:
– Make two copies of each rule for NP
• The problem with this method of dealing with number agreement is that it doubles the size of the grammar

9/20/2010 58CS730: Text Mining for Social Media, F2010

Page 59: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Other Agreement Issues

• Head nouns and determiners have to agree:
– this flight / *this flights
– those flights / *those flight
• Problems in languages like German or French, which have gender agreement
• Solutions:
– Proliferate rules (lots and lots of CFG rules)
– Or…

9/20/2010 59CS730: Text Mining for Social Media, F2010

Page 60: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Feature-Augmented CFGs
• Number agreement feature:
[NUMBER SG]
• Adding an additional feature-value pair to capture person:
[NUMBER SG, PERSON 3]
• Encode the grammatical category of the constituent:
[CAT NP, NUMBER SG, PERSON 3]
• Represents the 3sgNP category of noun phrases.
• Corresponding plural version of this structure:
[CAT NP, NUMBER PL, PERSON 3]

9/20/2010 60CS730: Text Mining for Social Media, F2010

Page 61: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Features (continued)

9/20/2010 61CS730: Text Mining for Social Media, F2010

Page 62: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Features in Grammar

• CFG rules with constraint features
– Example: Number agreement
S → NP VP
[NP NUMBER] = [VP NUMBER]
• Feature augmentation changes CFGs
– No longer blind concatenation of non-terminals
• Can be used as filters in Earley’s algorithm
(A small agreement-checking sketch follows below.)

9/20/2010 62CS730: Text Mining for Social Media, F2010
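A toy illustration of the [NP NUMBER] = [VP NUMBER] constraint, assuming constituents carry feature dictionaries; this is not how an Earley filter is implemented, just the agreement check itself.

```python
# S -> NP VP applies only if the NUMBER features of NP and VP agree
# (or one side leaves NUMBER unspecified).
def agree(left, right, feature):
    a, b = left.get(feature), right.get(feature)
    return a is None or b is None or a == b

def apply_s_rule(np, vp):
    if np["CAT"] == "NP" and vp["CAT"] == "VP" and agree(np, vp, "NUMBER"):
        return {"CAT": "S", "NUMBER": np.get("NUMBER") or vp.get("NUMBER")}
    return None                                      # constraint blocks the rule

flights = {"CAT": "NP", "NUMBER": "PL", "PERSON": 3}
leave   = {"CAT": "VP", "NUMBER": "PL"}
leaves  = {"CAT": "VP", "NUMBER": "SG"}
print(apply_s_rule(flights, leave))    # {'CAT': 'S', 'NUMBER': 'PL'}
print(apply_s_rule(flights, leaves))   # None -- number clash
```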

Page 63: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Feature Augmentation Key Ideas

• The elements of context-free grammar rules have feature-based constraints associated with them. Shift from atomic grammatical categories to more complex categories with properties.

• The constraints associated with individual rules can refer to, and manipulate, the feature structures associated with the parts of the rule to which they are attached.

9/20/2010 63CS730: Text Mining for Social Media, F2010

Page 64: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Dependency Grammars

– I gave him my address• All links between lexical (word) nodes• About 35 syntactic and semantic relations

9/20/2010 64CS730: Text Mining for Social Media, F2010

Page 65: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Dependency Grammars (continued)

• Advantages:– Free word order

• Implementations:– Link parser (Sleator). Freely available

9/20/2010 65CS730: Text Mining for Social Media, F2010

Page 66: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

LEARNING TO PARSE

9/20/2010 CS730: Text Mining for Social Media, F2010 66

Page 67: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Parsing in the early 1990s

• The parsers produced detailed, linguistically rich representations

• Parsers had uneven and usually rather poor coverage
– E.g., 30% of sentences received no analysis
• Even quite simple sentences had many possible analyses
– Parsers either had no method to choose between them or a very ad hoc treatment of parse preferences
• Parsers could not be learned from data
• Parser performance usually wasn’t or couldn’t be assessed quantitatively, and the performance of different parsers was often incommensurable

9/20/2010 67CS730: Text Mining for Social Media, F2010

Page 68: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Ambiguity

• John saw Mary
– Typhoid Mary
– Phillips screwdriver Mary
note how rare rules interact
• I see a bird
– is this 4 nouns – parsed like “city park scavenger bird”?
rare parts of speech, plus systematic ambiguity in noun sequences
• Time flies like an arrow
– Fruit flies like a banana
– Time reactions like this one
– Time reactions like a chemist
– or is it just an NP?

The official seat, center of authority, jurisdiction, or office of a bishop

9/20/2010 68CS730: Text Mining for Social Media, F2010

Page 69: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Our bane: Ambiguity

• John saw Mary
– Typhoid Mary
– Phillips screwdriver Mary
note how rare rules interact
• I see a bird
– is this 4 nouns – parsed like “city park scavenger bird”?
rare parts of speech, plus systematic ambiguity in noun sequences
• Time | flies like an arrow (NP VP)
– Fruit flies | like a banana (NP VP)
– Time | reactions like this one (V[stem] NP)
– Time reactions | like a chemist (S PP)
– or is it just an NP?

9/20/2010 69CS730: Text Mining for Social Media, F2010

Page 70: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

May 2007 example…

9/20/2010 CS730: Text Mining for Social Media, F2010 70

Page 71: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

How to solve this combinatorial explosion of ambiguity?

1. First try parsing without any weird rules, throwing them in only if needed.

2. Better: every rule has a weight. A tree’s weight is the total weight of all its rules. Pick the overall lightest parse of the sentence.

3. Can we pick the weights automatically? Yes: Statistical Parsing

9/20/2010 71CS730: Text Mining for Social Media, F2010

Page 72: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Statistical parsing

• Over the last 12 years statistical parsing has succeeded wonderfully!

• NLP researchers have produced a range of (often free, open source) statistical parsers, which can parse any sentence and often get most of it correct

• These parsers are now a commodity component

• The parsers are still improving.9/20/2010 72CS730: Text Mining for Social Media, F2010

Page 73: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Learning to Parse: A Taste
• Penn Treebank project (about 1M words)

9/20/2010 73CS730: Text Mining for Social Media, F2010

Page 74: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Using a Treebank as Grammar

Among the roughly 4,500 different rules for expanding VP are separate rules for PP sequences of any length, and for every possible arrangement of verb arguments:

9/20/2010 74CS730: Text Mining for Social Media, F2010

Page 75: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Naïve Treebank Grammar
• 17,500 distinct rule types.

9/20/2010 75CS730: Text Mining for Social Media, F2010

Page 76: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Spoken Language Syntax
• Utterances (vs. sentences)
• Much higher rate of pronouns
• Repair phenomena (~40% of sentences)
– use of the words uh and um, word repetitions, restarts, and word fragments (“uh” is the most common word)

9/20/2010 76CS730: Text Mining for Social Media, F2010

Page 77: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Treebanks for Speech: LDC Switchboard

9/20/2010 77CS730: Text Mining for Social Media, F2010

Page 78: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Statistical parsing applications

• High precision question answering systems (Pasca and Harabagiu SIGIR 2001)

• Improving biological named entity extraction (Finkel et al. JNLPBA 2004):

• Syntactically based sentence compression (Lin and Wilbur Inf. Retr. 2007)

• Extracting people’s opinions about products (Bloom et al. NAACL 2007)

• Improved interaction in computer games (Gorniak and Roy, AAAI 2005)

• Helping linguists find data (Resnik et al. BLS 2005)

9/20/2010 78CS730: Text Mining for Social Media, F2010

Page 79: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Probabilistic CKY

9/20/2010 79CS730: Text Mining for Social Media, F2010

Page 80: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Running example: time 1 flies 2 like 3 an 4 arrow 5

Weighted grammar (lower weight is better):
1  S → NP VP      1  VP → V NP      1  NP → Det N
6  S → Vst NP     2  VP → VP PP     2  NP → NP PP
2  S → S PP                         3  NP → NP NP
0  PP → P NP

Lexical entries filling the chart diagonal:
time: NP 3, Vst 3;  flies: NP 4, VP 4;  like: P 2, V 5;  an: Det 1;  arrow: N 8

9/20/2010 80 CS730: Text Mining for Social Media, F2010

[Pages 81-102 (chart figures): the chart is filled cell by cell. Each new cell [i,j] combines an entry from [i,k] with an entry from [k,j] using a grammar rule and adds the weights: cell [3,5] gets NP 10 (Det 1 + N 8 + 1 for NP → Det N), cell [2,5] gets PP 12 and VP 16, cell [1,5] gets NP 18, VP 18 and S 21, and the full-sentence cell [0,5] ends up with entries such as NP 24, S 22 and S 27.]

[Pages 103-107 (chart figures): follow backpointers from the best S in cell [0,5] to recover the parse: S → NP VP, the VP expands as VP PP, the PP as P NP, and the NP as Det N, yielding (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow))))).]

[Pages 108-113 (chart figures): which entries do we need? Duplicate entries with worse weights are not worth keeping, since they just breed worse options. Keep only the best-in-class entry for each category in each cell (plus backpointers so you can recover the parse).]

Page 114: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Probabilistic Trees

• Instead of the lightest-weight tree, take the highest-probability tree.
• Given any tree, a generator should have some probability of producing it!
• Just like using n-grams to choose among strings …
• What is the probability of this tree?

(S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow)))))

9/20/2010 114 CS730: Text Mining for Social Media, F2010

Page 115: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Probabilistic or stochastic context-free grammars (PCFGs)
• G = (T, N, S, R, P)
– T is a set of terminals
– N is a set of nonterminals
• For NLP, we usually distinguish a set P ⊂ N of preterminals, which always rewrite as terminals
• S is the start symbol (one of the nonterminals)
• R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly an empty sequence)
• P(R) gives the probability of each rule:
for every X ∈ N:  Σ over rules X → γ in R of P(X → γ) = 1
• A grammar G generates a language model L:
Σ over strings γ ∈ T* of P(γ) = 1

9/20/2010 115 CS730: Text Mining for Social Media, F2010

Page 116: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

The probability of trees and strings

• P(t) -- the probability of a tree is the product of the probabilities of the rules used to generate it.
• P(w1n) -- the probability of the string is the sum of the probabilities of the trees which have that string as their yield:

P(w1n) = Σj P(w1n, tj)   where tj is a parse of w1n
       = Σj P(tj)

9/20/2010 116 CS730: Text Mining for Social Media, F2010

Page 117: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

A Simple PCFG (in CNF)

S → NP VP    1.0        NP → NP PP         0.4
VP → V NP    0.7        NP → astronomers   0.1
VP → VP PP   0.3        NP → ears          0.18
PP → P NP    1.0        NP → saw           0.04
P → with     1.0        NP → stars         0.18
V → saw      1.0        NP → telescope     0.1

9/20/2010 117CS730: Text Mining for Social Media, F2010

Page 118: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

9/20/2010 118

CS730: Text Mining for Social Media, F2010

Page 119: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

9/20/2010 119

CS730: Text Mining for Social Media, F2010

Page 120: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Tree and String Probabilities
• w15 = astronomers saw stars with ears
• P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
• P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804
• P(w15) = P(t1) + P(t2) = 0.0009072 + 0.0006804 = 0.0015876
(The arithmetic is reproduced in the sketch below.)

9/20/2010 120CS730: Text Mining for Social Media, F2010
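A short check of the arithmetic above, using the rule probabilities from the "Simple PCFG (in CNF)" slide; the two rule lists t1 and t2 spell out the NP-attachment and VP-attachment parses of "astronomers saw stars with ears".

```python
import math

# Rule probabilities from the simple PCFG above; t1 attaches the PP to the
# object NP ("stars with ears"), t2 attaches it to the VP.
P = {
    ("S", "NP VP"): 1.0, ("VP", "V NP"): 0.7, ("VP", "VP PP"): 0.3,
    ("PP", "P NP"): 1.0, ("P", "with"): 1.0, ("V", "saw"): 1.0,
    ("NP", "NP PP"): 0.4, ("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18,
    ("NP", "saw"): 0.04, ("NP", "stars"): 0.18, ("NP", "telescope"): 0.1,
}

def tree_prob(rules):
    return math.prod(P[r] for r in rules)

t1 = [("S", "NP VP"), ("NP", "astronomers"), ("VP", "V NP"), ("V", "saw"),
      ("NP", "NP PP"), ("NP", "stars"), ("PP", "P NP"), ("P", "with"),
      ("NP", "ears")]
t2 = [("S", "NP VP"), ("NP", "astronomers"), ("VP", "VP PP"), ("VP", "V NP"),
      ("V", "saw"), ("NP", "stars"), ("PP", "P NP"), ("P", "with"),
      ("NP", "ears")]

print(tree_prob(t1))                  # ~0.0009072
print(tree_prob(t2))                  # ~0.0006804
print(tree_prob(t1) + tree_prob(t2))  # ~0.0015876 = P(w15)
```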

Page 121: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Chomsky Normal Form
• All rules are of the form X → Y Z or X → w.
• A transformation to this form doesn’t change the generative capacity of CFGs.
– With some extra book-keeping in symbol names, you can even reconstruct the same trees with a detransform
– Unaries/empties are removed recursively
– N-ary rules introduce new nonterminals:
• VP → V NP PP becomes VP → V @VP-V and @VP-V → NP PP (sketched below)
• In practice it’s a pain
– Reconstructing n-aries is easy
– Reconstructing unaries can be trickier
• But it makes parsing easier/more efficient

9/20/2010 121 CS730: Text Mining for Social Media, F2010
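A minimal sketch of the n-ary binarization step just described (unary removal not shown); the intermediate-symbol naming loosely follows the @VP->_V style used on the binarization slides.

```python
# Split an n-ary rule into a chain of binary rules via new "@" nonterminals
# that record the already-seen prefix of the RHS.
def binarize(lhs, rhs):
    rules, left, seen = [], lhs, []
    while len(rhs) > 2:
        seen.append(rhs[0])
        new = "@%s->_%s" % (lhs, "_".join(seen))
        rules.append((left, (rhs[0], new)))
        left, rhs = new, rhs[1:]
    rules.append((left, tuple(rhs)))
    return rules

for rule in binarize("VP", ("V", "NP", "PP")):
    print(rule)
# ('VP', ('V', '@VP->_V'))
# ('@VP->_V', ('NP', 'PP'))
```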

Page 122: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Treebank binarization

[Diagram: N-ary Trees in the Treebank are converted to Binary Trees (TreeAnnotations.annotateTree), from which the Lexicon and Grammar are read off for Parsing.]

9/20/2010 122 CS730: Text Mining for Social Media, F2010

Page 123: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

An example: before binarization…

(ROOT (S (NP (N cats)) (VP (V scratch) (NP (N people)) (PP (P with) (NP (N claws))))))

9/20/2010 123 CS730: Text Mining for Social Media, F2010

Page 124: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

After binarization…

(ROOT (S (NP (N cats)) (@S->_NP (VP (V scratch) (@VP->_V (NP (N people)) (@VP->_V_NP (PP (P with) (@PP->_P (NP (N claws))))))))))

9/20/2010 124 CS730: Text Mining for Social Media, F2010

Page 125: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Probabilistic Trees
• Instead of the lightest-weight tree, take the highest-probability tree.
• Given any tree, your assignment 1 generator would have some probability of producing it!
• Just like using n-grams to choose among strings …
• What is the probability of this tree?

(S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow)))))

9/20/2010 125 CS730: Text Mining for Social Media, F2010

Page 126: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Chain rule: One node at a time

9/20/2010 CS730: Text Mining for Social Media, F2010 126

The tree probability is built up by expanding one node at a time, conditioning each expansion on the entire partial tree generated so far:

p(tree | S) = p(S → NP VP | S)
            × p(NP expands to “time” | partial tree so far)
            × p(VP → VP PP | partial tree so far)
            × p(VP expands to “flies” | partial tree so far)
            × …

Page 127: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Chain rule + backoff

9/20/2010 CS730: Text Mining for Social Media, F2010 127

The same chain-rule decomposition, but each conditional probability is backed off: instead of conditioning on the entire partial tree generated so far, each expansion is conditioned on only part of that context.

Page 128: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Simplified notation

9/20/2010 CS730: Text Mining for Social Media, F2010 128

With the PCFG independence assumption, each factor is conditioned only on the nonterminal being rewritten:

p(tree | S) = p(S → NP VP | S) × p(NP → time | NP)
            × p(VP → VP PP | VP) × p(VP → flies | VP) × …

Page 129: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Already have a CKY alg for weights …

The same decomposition works additively with weights:

w(tree | S) = w(S → NP VP) + w(NP → time)
            + w(VP → VP PP) + w(VP → flies) + …

Just let w(X → Y Z) = -log p(X → Y Z | X). Then the lightest tree has the highest probability. (A weighted CKY sketch follows below.)

9/20/2010 129 CS730: Text Mining for Social Media, F2010
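A sketch of weighted ("Viterbi") CKY along those lines: weights add along a tree, and only the lightest entry per category is kept in each cell, together with a backpointer. The grammar and lexicon weights are the ones from the "time flies like an arrow" chart example earlier in the lecture.

```python
from collections import defaultdict

# Weighted CKY: keep only the best-in-class (lightest) entry per category
# per cell, plus a backpointer so the parse can be recovered.
LEX = {"time": {"NP": 3, "Vst": 3}, "flies": {"NP": 4, "VP": 4},
       "like": {"P": 2, "V": 5}, "an": {"Det": 1}, "arrow": {"N": 8}}
RULES = [("S", "NP", "VP", 1), ("S", "Vst", "NP", 6), ("S", "S", "PP", 2),
         ("VP", "V", "NP", 1), ("VP", "VP", "PP", 2),
         ("NP", "Det", "N", 1), ("NP", "NP", "PP", 2), ("NP", "NP", "NP", 3),
         ("PP", "P", "NP", 0)]

def viterbi_cky(words):
    n = len(words)
    best = defaultdict(dict)          # best[(i, j)][category] = (weight, backpointer)
    for i, w in enumerate(words):
        for cat, wt in LEX[w].items():
            best[(i, i + 1)][cat] = (wt, w)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for lhs, b, c, wt in RULES:
                    if b in best[(i, k)] and c in best[(k, j)]:
                        total = wt + best[(i, k)][b][0] + best[(k, j)][c][0]
                        if lhs not in best[(i, j)] or total < best[(i, j)][lhs][0]:
                            best[(i, j)][lhs] = (total, (b, k, c))
    return best

chart = viterbi_cky("time flies like an arrow".split())
print(chart[(0, 5)]["S"][0])   # 22, the weight of the best S in the chart example
```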

Page 130: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Pruning for Speed

• Heuristically throw away constituents that probably won’t make it into the best complete parse.
• Use probabilities to decide which ones.
– So probs are useful for speed as well as accuracy!
• Both safe and unsafe methods exist:
– Throw x away if p(x) < 10^-200 (and lower this threshold if we don’t get a parse)
– Throw x away if p(x) < 100 * p(y) for some y that spans the same set of words
– Throw x away if p(x)*q(x) is small, where q(x) is an estimate of the probability of all rules needed to combine x with the other words in the sentence
(A tiny illustration of the first two thresholds follows below.)

9/20/2010 130CS730: Text Mining for Social Media, F2010
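A tiny illustration of the first two heuristics on one chart cell. The relative threshold is read here as "drop an entry that is more than 100 times less probable than the best entry over the same span"; the numbers are made up.

```python
# Prune a cell's entries by an absolute probability floor and a relative
# beam against the best entry spanning the same words.
def prune_cell(entries, floor=1e-200, beam=100.0):
    """entries: dict mapping constituent label -> probability for one span."""
    best = max(entries.values())
    return {label: p for label, p in entries.items()
            if p >= floor and p * beam >= best}

cell = {"NP": 3e-8, "S": 2e-12, "VP": 4e-10}
print(prune_cell(cell))   # {'NP': 3e-08, 'VP': 4e-10}; S is >100x worse, pruned
```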

Page 131: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Agenda (“Best-First”) Parsing

• Explore best options first
– Should get some good parses early on – grab one & go!
• Prioritize constits (and dotted constits)
– Whenever we build something, give it a priority
• How likely do we think it is to make it into the highest-prob parse?
– usually related to log prob. of that constit
– might also hack in the constit’s context, length, etc.
– if priorities are defined carefully, obtain an A* algorithm
• Put each constit on a priority queue (heap)
• Repeatedly pop and process the best constituent.
– CKY style: combine w/ previously popped neighbors.
– Earley style: scan/predict/attach as usual. What else?

9/20/2010 131CS730: Text Mining for Social Media, F2010

Page 132: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Preprocessing

• First “tag” the input with parts of speech:
– Guess the correct preterminal for each word, using HMMs
– Now only allow one part of speech per word
– This eliminates a lot of crazy constituents!
– But if you tagged wrong you could be hosed
• Raise the stakes:
– What if the tag says not just “verb” but “transitive verb”? Or “verb with a direct object and 2 PPs attached”? (“supertagging”)
• Safer to allow a few possible tags per word, not just one …

9/20/2010 132CS730: Text Mining for Social Media, F2010

Page 133: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

How good are PCFGs?

• Robust (usually admit everything, but with low probability)

• Partial solution for grammar ambiguity: a PCFG gives some idea of the plausibility of a sentence

• But not so good because the independence assumptions are too strong

• Gives a probabilistic language model
– But in a simple case it performs worse than a trigram model
• The problem seems to be that it lacks the lexicalization of a trigram model

9/20/2010 133CS730: Text Mining for Social Media, F2010

Page 134: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Lexicalization

• Lexical heads are important for certain classes of ambiguities (e.g., PP attachment):

• Lexicalizing the grammar creates a much larger grammar.
– Sophisticated smoothing needed
– Smarter parsing algorithms needed
– More data needed
• How necessary is lexicalization?
– Bilexical vs. monolexical selection
– Closed vs. open class lexicalization

9/20/2010 134CS730: Text Mining for Social Media, F2010

Page 135: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Lexicalized Parsing

• peel the apple on the towel
– ambiguous
• put the apple on the towel
– on attaches to put (is the other reading even possible?)
• put the apple on the towel in the box
• VP[head=put] → V[head=put] NP PP
• VP[head=put] → V[head=put] NP PP[head=on]
• study the apple on the towel
– study dislikes on (how can the PCFG express this?)
• VP[head=study] → VP[head=study] PP[head=on]
• study it on the towel
– it dislikes on even more – a PP can’t attach to a pronoun

9/20/2010 135CS730: Text Mining for Social Media, F2010

Page 136: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Lexicalized Parsing

• the plan that Natasha would swallow
– ambiguous between content of plan and relative clause
• the plan that Natasha would snooze
– snooze dislikes a direct object (plan)
• the plan that Natasha would make
– make likes a direct object (plan)
• the pill that Natasha would swallow
– pill can’t express a content-clause the way plan does
– pill is a probable direct object for swallow
• How to express these distinctions in a CFG or PCFG?

9/20/2010 136CS730: Text Mining for Social Media, F2010

Page 137: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Putting words into PCFGs

• A PCFG uses the actual words only to determine the probability of parts-of-speech (the preterminals)

• In many cases we need to know about words to choose a parse

• The head word of a phrase gives a good representation of the phrase’s structure and meaning– Attachment ambiguities

The astronomer saw the moon with the telescope

– Coordination: the dogs in the house and the cats

– Subcategorization frames: put versus like

9/20/2010 137CS730: Text Mining for Social Media, F2010

Page 138: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

(Head) Lexicalization
• put takes both an NP and a PP
– Sue put [ the book ]NP [ on the table ]PP
– * Sue put [ the book ]NP
– * Sue put [ on the table ]PP
• like usually takes an NP and not a PP
– Sue likes [ the book ]NP
– * Sue likes [ on the table ]PP
• We can’t tell this if we just have a VP with a verb, but we can if we know what verb it is

9/20/2010 138CS730: Text Mining for Social Media, F2010

Page 139: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

(Head) Lexicalization

• Collins 1997, Charniak 1997
• Puts the properties of words into a PCFG

[Lexicalized tree: (S[walked] (NP[Sue] Sue) (VP[walked] (V[walked] walked) (PP[into] (P[into] into) (NP[store] (DT[the] the) (NP[store] store)))))]

9/20/2010 139 CS730: Text Mining for Social Media, F2010

Page 140: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Lexicalization sharpens probabilities: rule expansion

• E.g., probability of different verbal complement frames (often called “subcategorizations”):

Local Tree        come    take    think   want
VP → V            9.5%    2.6%    4.6%    5.7%
VP → V NP         1.1%    32.1%   0.2%    13.9%
VP → V PP         34.5%   3.1%    7.1%    0.3%
VP → V SBAR       6.6%    0.3%    73.0%   0.2%
VP → V S          2.2%    1.3%    4.8%    70.8%
VP → V NP S       0.1%    5.7%    0.0%    0.3%
VP → V PRT NP     0.3%    5.8%    0.0%    0.0%
VP → V PRT PP     6.1%    1.5%    0.2%    0.0%

9/20/2010 140CS730: Text Mining for Social Media, F2010

Page 141: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Lexicalization sharpens probabilities: Predicting heads

“Bilexical probabilities”

• p(prices | n-plural) = .013
• p(prices | n-plural, NP) = .013
• p(prices | n-plural, NP, S) = .025
• p(prices | n-plural, NP, S, v-past) = .052
• p(prices | n-plural, NP, S, v-past, fell) = .146

9/20/2010 141CS730: Text Mining for Social Media, F2010

Page 142: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Naïve Lexicalized Parsing

• Can, in principle, use CKY on lexicalized PCFGs– O(Rn3) time and O(Sn2) memory– But R = rV2 and S = sV– Result is completely impractical (why?)– Memory: 10K rules * 50K words * (40 words)2 * 8 bytes ≈ 6TB

• Can modify CKY to exploit lexical sparsity– Lexicalized symbols are a base grammar symbol and a pointer

into the input sentence, not any arbitrary word– Result: O(rn5) time, O(sn3)– Memory: 10K rules * (40 words)3 * 8 bytes ≈ 5GB

9/20/2010 142CS730: Text Mining for Social Media, F2010
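The back-of-the-envelope memory estimates above, reproduced as a quick check (assuming 8-byte cells, a 10K-rule grammar, a 50K-word vocabulary, and a 40-word sentence).

```python
# Memory estimates from the slide: naive lexicalization vs. symbol+position.
rules, vocab, length, cell_bytes = 10_000, 50_000, 40, 8

naive  = rules * vocab * length**2 * cell_bytes   # lexicalized symbols range over all words
sparse = rules * length**3 * cell_bytes           # lexicalized symbols point into the sentence
print(naive / 1e12, "TB")    # ~6.4 TB
print(sparse / 1e9, "GB")    # ~5.1 GB
```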

Page 143: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Charniak (1997) linear interpolation/shrinkage

9/20/2010 CS730: Text Mining for Social Media, F2010 143

Page 144: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Charniak (1997) shrinkage example

9/20/2010 CS730: Text Mining for Social Media, F2010 144

Page 145: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Lexicalized Parsing was seen as the breakthrough of the late 90s

• Eugene Charniak, 2000 JHU workshop: “To do better, it is necessary to condition probabilities on the actual words of the sentence. This makes the probabilities much tighter:– p(VP V NP NP) = 0.00151– p(VP V NP NP | said) = 0.00001– p(VP V NP NP | gave) = 0.01980 ”

• Michael Collins, 2003 COLT tutorial: “Lexicalized Probabilistic Context-Free Grammars … perform vastly better than PCFGs (88% vs. 73% accuracy)”

9/20/2010 145CS730: Text Mining for Social Media, F2010

Page 146: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Lexicalized parsing results (Labeled Constituent Precision/Recall F1)

• Demo: http://nlp.stanford.edu:8080/parser/

9/20/2010 CS730: Text Mining for Social Media, F2010 146

Page 147: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Sparseness & the Penn Treebank

• The Penn Treebank – 1 million words of parsed English WSJ – has been a key resource (because of the widespread reliance on supervised learning)

• But 1 million words is like nothing:
– 965,000 constituents, but only 66 WHADJP, of which only 6 aren’t how much or how many; yet there is an infinite space of these

• How clever/original/incompetent (at risk assessment and evaluation) …

• Most of the probabilities that you would like to compute, you can’t compute

9/20/2010 147CS730: Text Mining for Social Media, F2010

Page 148: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Sparseness & the Penn Treebank (2)

• Many parse preferences depend on bilexical statistics: likelihoods of relationships between pairs of words (compound nouns, PP attachments, …)

• Extremely sparse, even on topics central to the WSJ:
– stocks plummeted: 2 occurrences
– stocks stabilized: 1 occurrence
– stocks skyrocketed: 0 occurrences
– #stocks discussed: 0 occurrences

• So far there has been very modest success in augmenting the Penn Treebank with extra unannotated materials or using semantic classes – once there is more than a little annotated training data. – Cf. Charniak 1997, Charniak 2000; but see McClosky et al. 2006

9/20/2010 148CS730: Text Mining for Social Media, F2010

Page 149: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Motivating discriminative parsing

• In discriminative models, it is easy to incorporate different kinds of features
– Often just about anything that seems linguistically interesting
• In generative models, it’s often difficult, and the model suffers because of false independence assumptions

• This ability to add informative features is the real power of discriminative models for NLP.

9/20/2010 149CS730: Text Mining for Social Media, F2010

Page 150: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Discriminative Parsers

• Discriminative Dependency Parsing
– Not as computationally hard (tiny grammar constant)
– Explored considerably recently. E.g. McDonald et al. 2005
• Make parser action decisions discriminatively
– E.g. with a shift-reduce parser
• Dynamic-programming Phrase Structure Parsing
– Resource intensive! Most work on sentences of length <= 15
– The need to be able to dynamic-program limits the feature types you can use
• Post-Processing: Parse reranking
– Just work with the output of a k-best generative parser
1. Distribution-free methods
2. Probabilistic model methods

9/20/2010 150CS730: Text Mining for Social Media, F2010

Page 151: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Charniak and Johnson (ACL 2005): Coarse-to-fine n-best parsing and MaxEnt discriminative reranking

• Builds a maxent discriminative reranker over parses produced by (a slightly bugfixed and improved version of) Charniak (2000).
• Gets the 50 best parses from the Charniak (2000) parser
– Doing this exploits the “coarse-to-fine” idea to heuristically find good candidates
• The maxent model for reranking uses heads, etc. as in the generative model, but also nice linguistic features:
– Conjunct parallelism
– Right branching preference
– Heaviness (length) of constituents factored in

• Gets 91% LP/LR F1 (on all sentences! – up to 80 wd)

9/20/2010 151CS730: Text Mining for Social Media, F2010

Page 152: CS 730:  Text Mining  for  Social Media  & Collaboratively Generated Content

Readings for Next Week

• FSNLP Chapters 11 and 12

9/20/2010 CS730: Text Mining for Social Media, F2010 152