Lecture 4: Ngrams Smoothing

CSCE 771 Natural Language Processing – January 23, 2013


Page 1: Lecture 4 Ngrams Smoothing

Lecture 4: Ngrams Smoothing

Topics: Python NLTK, N-grams, Smoothing

Readings: Chapter 4, Jurafsky and Martin

January 23, 2013

CSCE 771 Natural Language Processing

Page 2: Lecture 4 Ngrams Smoothing


Last Time: Slides from Lecture 1 (30-)

Regular expressions in Python (grep, vi, emacs, Word)? Eliza

Morphology

Today: Smoothing N-gram models – Laplace (plus 1), Good-Turing discounting, Katz backoff, Kneser-Ney

Page 3: Lecture 4 Ngrams Smoothing


Problem

Let’s assume we’re using N-grams.

How can we assign a probability to a sequence where one of the component n-grams has a value of zero?

Assume all the words are known and have been seen. Options: go to a lower-order n-gram, i.e. back off from bigrams to unigrams, or replace the zero with something else (a minimal backoff sketch follows below).
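A minimal sketch of the "back off from bigrams to unigrams" option. Everything below (the toy corpus and the function name) is invented for illustration; Katz backoff, covered later, additionally discounts and renormalizes the backed-off mass.

from collections import Counter

# Toy corpus; the data is made up for illustration only.
tokens = "i want chinese food i want english food".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
N = len(tokens)

def backoff_prob(prev, word):
    """Crude backoff: use the bigram MLE when the bigram was seen,
    otherwise fall back to the unigram MLE."""
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    return unigrams[word] / N

print(backoff_prob("want", "chinese"))    # seen bigram  -> bigram estimate (0.5)
print(backoff_prob("chinese", "english"))  # unseen bigram -> unigram estimate (0.125)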

Page 4: Lecture 4 Ngrams Smoothing


Smoothing

Smoothing: re-evaluating some of the zero and low-probability N-grams and assigning them non-zero values.

Add-One (Laplace)

Make the zero counts 1; really, start counting at 1.

Rationale: they’re just events you haven’t seen yet. If you had seen them, chances are you would only have seen them once… so make the count equal to 1.

Page 5: Lecture 4 Ngrams Smoothing


Add-One Smoothing

Terminology

N – number of total words

V – vocabulary size == number of distinct words

Maximum likelihood estimate:

$P(w_i) = \frac{c(w_i)}{\sum_x c(w_x)} = \frac{c(w_i)}{N}$
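For concreteness, a minimal sketch of the maximum likelihood estimate in Python (the tiny corpus is invented, not from the slides):

from collections import Counter

tokens = "to be or not to be".split()
counts = Counter(tokens)   # c(w): raw word counts
N = len(tokens)            # total number of word tokens

def p_mle(word):
    """Maximum likelihood estimate: P(w) = c(w) / N."""
    return counts[word] / N

print(p_mle("to"))     # 2/6, about 0.33
print(p_mle("zebra"))  # 0.0 for an unseen word -- the zero-probability problem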

Page 6: Lecture 4 Ngrams Smoothing


Adjusted counts C*

Terminology

N – number of total words

V – vocabulary size == number of distinct words

Adjusted count C*:

$c_i^* = (c_i + 1)\,\frac{N}{N + V}$

Adjusted probabilities:

$p_i^* = \frac{c_i^*}{N}$
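A minimal sketch of the adjusted counts and probabilities for the unigram case (toy corpus with one vocabulary word that never occurs; this is not the BERP data):

from collections import Counter

tokens = "to be or not to be".split()
counts = Counter(tokens)
vocab = set(tokens) | {"zebra"}   # pretend one vocabulary word was never observed
N = len(tokens)                   # total tokens
V = len(vocab)                    # vocabulary size

def c_star(word):
    """Adjusted count: c* = (c + 1) * N / (N + V)."""
    return (counts[word] + 1) * N / (N + V)

def p_star(word):
    """Adjusted probability: p* = c* / N, i.e. (c + 1) / (N + V)."""
    return c_star(word) / N

print(p_star("to"))     # 3/11, about 0.27
print(p_star("zebra"))  # 1/11, about 0.09 -- no longer zero
print(sum(p_star(w) for w in vocab))  # approximately 1.0, still a valid distribution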

Page 7: Lecture 4 Ngrams Smoothing


Discounting View

Discounting: lowering some of the larger non-zero counts to get the “probability” to assign to the zero entries.

d_c – the discount applied to the counts (the ratio of the discounted count to the original count)

The discounted probabilities can then be directly calculated.

$d_c = \frac{c^*}{c}$

$p_i^* = \frac{c_i + 1}{N + V}$
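A quick worked check of the discounting view (the token count N = 10000 is hypothetical; V = 1616 is the BERP vocabulary size on the next slide): the scaling factor is $N/(N+V) = 10000/11616 \approx 0.861$, so an event counted 1000 times is discounted to $c^* \approx 861.7$, giving $d_c = c^*/c \approx 0.86$, while each zero-count event rises to $c^* \approx 0.86$ and receives probability $1/(N+V) \approx 8.6 \times 10^{-5}$.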

Page 8: Lecture 4 Ngrams Smoothing


Original BERP counts (Figure 4.1)

Berkeley Restaurant Project data

V = 1616

Page 9: Lecture 4 Ngrams Smoothing


Figure 4.5: Add-one (Laplace) counts and probabilities (tables not reproduced in the transcript)

Page 10: Lecture 4 Ngrams Smoothing


Figure 6.6: Add-one counts and probabilities (tables not reproduced in the transcript)

Page 11: Lecture 4 Ngrams Smoothing


Add-One smoothed bigram counts (table not reproduced in the transcript)


Page 12: Lecture 4 Ngrams Smoothing


Good-Turing Discounting

Singleton – a word that occurs only once.

Good-Turing: estimate the probability of words that occur zero times using the probability of a singleton.

Generalize from words to bigrams, trigrams, … events.

Page 13: Lecture 4 Ngrams Smoothing


Calculating Good-Turing
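The equations on this slide did not survive the transcript. The standard Good-Turing estimates (as in Jurafsky and Martin), with $N_c$ the number of N-grams that occur exactly $c$ times, are

$c^* = (c + 1)\,\frac{N_{c+1}}{N_c}$

and the total probability mass reserved for unseen events is $\frac{N_1}{N}$.

A minimal count-of-counts sketch in Python (the bigram counts below are invented, not the BERP data):

from collections import Counter

# Toy bigram counts; the values are made up for illustration.
bigram_counts = Counter({("i", "want"): 3, ("want", "to"): 2,
                         ("to", "eat"): 1, ("eat", "lunch"): 1,
                         ("eat", "dinner"): 1})
N = sum(bigram_counts.values())       # total bigram tokens observed
Nc = Counter(bigram_counts.values())  # N_c: number of bigram types seen exactly c times

def c_star(c):
    """Good-Turing adjusted count: c* = (c + 1) * N_{c+1} / N_c."""
    return (c + 1) * Nc[c + 1] / Nc[c]

print(Nc[1] / N)   # mass reserved for unseen bigrams: N_1 / N = 3/8
print(c_star(1))   # singletons discounted to 2 * N_2 / N_1 = 2/3
# In practice the N_c curve is smoothed first (e.g. Simple Good-Turing),
# since N_{c+1} is often 0 for larger c.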

Page 14: Lecture 4 Ngrams Smoothing


Witten-Bell

Think about the occurrence of an unseen item (word, bigram, etc.) as an event.

The probability of such an event can be measured in a corpus by just looking at how often it happens.

Just take the single-word case first.

Assume a corpus of N tokens and T types.

How many times was an as-yet-unseen type encountered?

Page 15: Lecture 4 Ngrams Smoothing


Witten-Bell

First compute the probability of an unseen event.

Then distribute that probability mass equally among the as-yet-unseen events (the standard formulas are sketched below). That should strike you as odd for a number of reasons: in the case of words… in the case of bigrams…
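The slide's formulas are not in the transcript. In the standard single-word formulation (e.g. Jurafsky and Martin), every first occurrence of a type counts as one "new event", so with N tokens and T types there were T such events out of N + T total. Writing Z for the number of still-unseen vocabulary words (Z is notation introduced here, not on the slide):

$P(\text{any unseen word}) = \frac{T}{N + T}$

$p_i^* = \frac{T}{Z\,(N + T)}$ for each word with zero count, and $p_i^* = \frac{c_i}{N + T}$ for each seen word.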

Page 16: Lecture 4 Ngrams Smoothing


Witten-Bell

In the case of bigrams, not all conditioning events are equally promiscuous: P(x|the) vs. P(x|going).

So distribute the mass assigned to the zero-count bigrams according to their promiscuity (see the bigram form sketched below).
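A hedged sketch of the standard bigram form (the functions T(·), N(·), Z(·) are notation introduced here, not on the slide): let $T(w_{i-1})$ be the number of distinct word types observed after $w_{i-1}$, $N(w_{i-1})$ the number of tokens following it, and $Z(w_{i-1})$ the number of vocabulary words never seen after it. Then

$P^*(w_i \mid w_{i-1}) = \frac{T(w_{i-1})}{Z(w_{i-1})\,(N(w_{i-1}) + T(w_{i-1}))}$ if $c(w_{i-1}w_i) = 0$

$P^*(w_i \mid w_{i-1}) = \frac{c(w_{i-1}w_i)}{N(w_{i-1}) + T(w_{i-1})}$ otherwise.

A promiscuous history like "the", which has been followed by many different types, therefore reserves more total mass for its unseen continuations than a restrictive history like "going".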

Page 17: Lecture 4 Ngrams Smoothing


Witten-Bell

Finally, renormalize the whole table so that you still have a valid probability.

Page 18: Lecture 4 Ngrams Smoothing


Original BERP counts; now the add-1 counts (tables not reproduced in the transcript)

Page 19: Lecture 4 Ngrams Smoothing


Witten-Bell Smoothed and Reconstituted (table not reproduced in the transcript)

Page 20: Lecture 4 Ngrams Smoothing


Add-One Smoothed BERP, Reconstituted (table not reproduced in the transcript)