NLP Programming Tutorial 2 – Bigram Language Models


TRANSCRIPT

  • Slide 1/19

    NLP Programming Tutorial 2 – Bigram Language Models

    Graham Neubig, Nara Institute of Science and Technology (NAIST)

  • Slide 2/19

    "e#ie$%&alculating Sentence Probabilities

    ● We want the probability of

      W = "speech recognition system"

    ● Represent this mathematically as:

      P(|W| = 3, w_1 = "speech", w_2 = "recognition", w_3 = "system") =
          P(w_1 = "speech"      | w_0 = "<s>")
        × P(w_2 = "recognition" | w_0 = "<s>", w_1 = "speech")
        × P(w_3 = "system"      | w_0 = "<s>", w_1 = "speech", w_2 = "recognition")
        × P(w_4 = "</s>"        | w_0 = "<s>", w_1 = "speech", w_2 = "recognition", w_3 = "system")

    NOTE: sentence start <s> and end </s> symbols

    NOTE: P(w_0 = "<s>") = 1

  • Slide 3/19

    Incremental Computation

    ● Previous equation can be written:

      P(W) = ∏_{i=1}^{|W|+1} P(w_i | w_0 … w_{i−1})

    ● Unigram model ignored context:

      P(w_i | w_0 … w_{i−1}) ≈ P(w_i)
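    A minimal Python sketch of this incremental computation under the unigram approximation (not part of the original slides; the probabilities below are made-up placeholders for what a trained model would supply):

      # Minimal sketch: score a sentence with a unigram model.
      # Assumed unigram probabilities; in practice these come from a trained model file.
      probs = {"speech": 0.01, "recognition": 0.005, "system": 0.02, "</s>": 0.05}

      def unigram_sentence_prob(words, probs):
          """P(W) = product over i of P(w_i), including the sentence-end symbol."""
          words = words + ["</s>"]      # append the sentence-end symbol
          p = 1.0
          for w in words:
              p *= probs[w]             # unigram approximation: ignore all context
          return p

      print(unigram_sentence_prob("speech recognition system".split(" "), probs))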

  • Slide 4/19

    Unigram Models Ignore Word Order!

    ● Ignoring context, probabilities are the same:

      P_uni(W = "speech recognition system") =
        P(w = "speech") × P(w = "recognition") × P(w = "system") × P(w = "</s>")

                                     =

      P_uni(W = "system recognition speech") =
        P(w = "speech") × P(w = "recognition") × P(w = "system") × P(w = "</s>")

  • Slide 5/19

    Unigram Models Ignore Agreement!

    ● Good sentences (words agree):

    ● Bad sentences (words don't agree):

  • Slide 6/19

    Solution: Add More Context!

    ● Unigram model ignored context:

      P(w_i | w_0 … w_{i−1}) ≈ P(w_i)

    ● Bigram model adds one word of context:

      P(w_i | w_0 … w_{i−1}) ≈ P(w_i | w_{i−1})

    ● Trigram model adds two words of context:

      P(w_i | w_0 … w_{i−1}) ≈ P(w_i | w_{i−2} w_{i−1})

    ● Four-gram, five-gram, six-gram, etc...

  • Slide 7/19

    Maximum Likelihood Estimation of n-gram Probabilities

    ● Calculate counts of n-word and (n−1)-word strings:

      P(w_i | w_{i−n+1} … w_{i−1}) = c(w_{i−n+1} … w_i) / c(w_{i−n+1} … w_{i−1})

      i live in osaka . </s>
      i am a graduate student . </s>
      my school is in nara . </s>

      n = 2 →
      P(nara | in)  = c(in nara) / c(in)  = 1 / 2 = 0.5
      P(osaka | in) = c(in osaka) / c(in) = 1 / 2 = 0.5
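    A minimal Python sketch (not the tutorial's reference solution) of these maximum likelihood estimates, computed directly from the three example sentences above:

      # Minimal sketch: maximum likelihood bigram probabilities from the toy corpus above.
      from collections import defaultdict

      corpus = [
          "i live in osaka .",
          "i am a graduate student .",
          "my school is in nara .",
      ]

      counts = defaultdict(int)          # counts of bigrams (w_{i-1}, w_i)
      context_counts = defaultdict(int)  # counts of contexts w_{i-1}

      for line in corpus:
          words = ["<s>"] + line.split(" ") + ["</s>"]
          for i in range(1, len(words)):
              counts[(words[i - 1], words[i])] += 1
              context_counts[words[i - 1]] += 1

      def p_ml(word, context):
          """P_ML(word | context) = c(context word) / c(context)."""
          return counts[(context, word)] / context_counts[context]

      print(p_ml("nara", "in"))   # 1 / 2 = 0.5
      print(p_ml("osaka", "in"))  # 1 / 2 = 0.5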

  • Slide 8/19

    Still Problems of Sparsity

    ● When n-gram frequency is 0, probability is 0:

      P(nara | in)   = c(in nara) / c(in)   = 1 / 2 = 0.5
      P(osaka | in)  = c(in osaka) / c(in)  = 1 / 2 = 0.5
      P(school | in) = c(in school) / c(in) = 0 / 2 = 0!!

    ● Like the unigram model, we can use linear interpolation:

      Bigram:  P(w_i | w_{i−1}) = λ_2 P_ML(w_i | w_{i−1}) + (1 − λ_2) P(w_i)

      Unigram: P(w_i) = λ_1 P_ML(w_i) + (1 − λ_1) (1 / N)
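    A minimal sketch of the interpolated probabilities; the probability tables, λ values, and vocabulary size N below are illustrative placeholders rather than values from the slides:

      # Minimal sketch: linearly interpolated unigram and bigram probabilities.
      # probs_uni and probs_bi stand in for ML estimates read from a trained model.
      probs_uni = {"nara": 0.05, "osaka": 0.05, "school": 0.05, "</s>": 0.15}
      probs_bi = {("in", "nara"): 0.5, ("in", "osaka"): 0.5}

      lambda_1 = 0.95   # weight on the ML unigram estimate
      lambda_2 = 0.95   # weight on the ML bigram estimate
      N = 1_000_000     # vocabulary size (for the uniform fallback term)

      def p_unigram(w):
          """P(w) = lambda_1 * P_ML(w) + (1 - lambda_1) * 1/N"""
          return lambda_1 * probs_uni.get(w, 0.0) + (1 - lambda_1) / N

      def p_bigram(w, prev):
          """P(w | prev) = lambda_2 * P_ML(w | prev) + (1 - lambda_2) * P(w)"""
          return lambda_2 * probs_bi.get((prev, w), 0.0) + (1 - lambda_2) * p_unigram(w)

      print(p_bigram("school", "in"))   # nonzero even though c(in school) = 0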

  • Slide 9/19

    Choosing Values of λ: Grid Search

    ● One method to choose λ_2, λ_1: try many values

      λ_2 = 0.95, λ_1 = 0.95
      λ_2 = 0.95, λ_1 = 0.90
      λ_2 = 0.95, λ_1 = 0.85
        …
      λ_2 = 0.95, λ_1 = 0.05
      λ_2 = 0.90, λ_1 = 0.95
      λ_2 = 0.90, λ_1 = 0.90
        …
      λ_2 = 0.05, λ_1 = 0.10
      λ_2 = 0.05, λ_1 = 0.05

    Problems:
    ● Too many options → choosing takes time!
    ● Using the same λ for all n-grams → there is a smarter way!

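    A minimal sketch of such a grid search; entropy_fn is an assumed stand-in for whatever held-out evaluation is used (for example, per-word entropy on a development set):

      # Minimal sketch: grid search over (lambda_1, lambda_2) by held-out entropy.
      import itertools

      def grid_search(entropy_fn, step=0.05):
          """Try every (lambda_2, lambda_1) pair on a grid and keep the best one."""
          values = [round(step * k, 2) for k in range(1, int(round(1 / step)))]  # 0.05 .. 0.95
          best = None
          for lambda_2, lambda_1 in itertools.product(values, values):
              h = entropy_fn(lambda_1, lambda_2)
              if best is None or h < best[0]:
                  best = (h, lambda_1, lambda_2)
          return best  # (entropy, lambda_1, lambda_2)

      # Toy usage with a made-up objective, just to show the mechanics:
      print(grid_search(lambda l1, l2: (l1 - 0.3) ** 2 + (l2 - 0.8) ** 2))

    Even with a 0.05 grid this is already 19 × 19 = 361 evaluations, which is the first problem above; the context-dependent smoothing on the next slides is the "smarter way".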

  • Slide 10/19

    Context-Dependent Smoothing

    ● Make the interpolation depend on the context:

    High-frequency word: "Tokyo"
      c(Tokyo city) = 40   c(Tokyo is) = 35
      c(Tokyo was) = 24    c(Tokyo tower) = 15   c(Tokyo port) = 10
      …
      Most 2-grams already exist → large λ is better!

    Low-frequency word: "Tottori"
      c(Tottori is) = 2    c(Tottori city) = 1
      c(Tottori was) = 0
      Many 2-grams will be missing → small λ is better!

      P(w_i | w_{i−1}) = λ_{w_{i−1}} P_ML(w_i | w_{i−1}) + (1 − λ_{w_{i−1}}) P(w_i)


  • Slide 11/19

    Witten-Bell Smoothing

    ● One of the many ways to choose λ_{w_{i−1}}

    ● For example:

      λ_{w_{i−1}} = 1 − u(w_{i−1}) / (u(w_{i−1}) + c(w_{i−1}))

      u(w_{i−1}) = number of unique words after w_{i−1}

      c(Tottori is) = 2   c(Tottori city) = 1   c(Tottori) = 3   u(Tottori) = 2

      λ_Tottori = 1 − 2 / (2 + 3) = 0.6

      c(Tokyo city) = 40   c(Tokyo is) = 35   …   c(Tokyo) = 270   u(Tokyo) = 30

      λ_Tokyo = 1 − 30 / (30 + 270) = 0.9
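    A minimal sketch of this weight computation, using illustrative count tables shaped like the example above (the names counts and context_counts are assumptions, not from the slides):

      # Minimal sketch: Witten-Bell interpolation weight for a given context word.
      # Hypothetical counts matching the slide's "Tottori" example.
      counts = {("Tottori", "is"): 2, ("Tottori", "city"): 1}   # bigram counts
      context_counts = {"Tottori": 3}                           # c(w_{i-1})

      def witten_bell_lambda(context, counts, context_counts):
          """lambda_{w_{i-1}} = 1 - u(w_{i-1}) / (u(w_{i-1}) + c(w_{i-1}))."""
          # u(w_{i-1}): number of unique words seen after the context word
          u = len({w for (ctx, w) in counts if ctx == context})
          c = context_counts[context]
          return 1 - u / (u + c)

      print(witten_bell_lambda("Tottori", counts, context_counts))  # 1 - 2/(2+3) = 0.6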


  • Slide 12/19

    Programming Techniques


  • Slide 13/19

    Inserting into Arrays

    ● To calculate n-grams easily, you may want to:

      my_words = ["this", "is", "a", "pen"]
            ↓
      my_words = ["<s>", "this", "is", "a", "pen", "</s>"]

    ● This can be done with:

      my_words.append("</s>")     # Add to the end
      my_words.insert(0, "<s>")   # Add to the beginning


  • Slide 14/19

    "emo#ing from rrays

    ● Given an n-gram with w_{i−n+1} … w_i, we may want the context w_{i−n+1} … w_{i−1}

    ● This can be done with:

      my_ngram = "tokyo tower"
      my_words = my_ngram.split(" ")    # Change into ["tokyo", "tower"]
      my_words.pop()                    # Remove the last element ("tower")
      my_context = " ".join(my_words)   # Join the array back together
      print(my_context)                 # → "tokyo"


  • Slide 15/19

    Exercise


  • Slide 16/19

    Exercise

    ● Write two programs:
      ● train-bigram: Creates a bigram model
      ● test-bigram: Reads a bigram model and calculates entropy on the test set

    ● Test train-bigram on test/02-train-input.txt

    ● Train the model on data/wiki-en-train.word

    ● Calculate entropy on data/wiki-en-test.word
      (if using linear interpolation, test different values of λ_2)

    ● Challenge:
      ● Use Witten-Bell smoothing (linear interpolation is easier)
      ● Create a program that works with any n (not just bigrams)


  • Slide 17/19

    train-bigram (Linear Interpolation)

    create map counts, context_counts
    for each line in the training_file
      split line into an array of words
      append "</s>" to the end and "<s>" to the beginning of words
      for each i in 1 to length(words)-1          # Note: starting at 1, after <s>
        counts["w_{i-1} w_i"] += 1                # Add bigram and bigram context
        context_counts["w_{i-1}"] += 1
        counts["w_i"] += 1                        # Add unigram and unigram context
        context_counts[""] += 1

    open the model_file for writing
    for each ngram, count in counts
      split ngram into an array of words          # "w_{i-1} w_i" → {"w_{i-1}", "w_i"}
      remove the last element of words            # {"w_{i-1}", "w_i"} → {"w_{i-1}"}
      join words into context                     # {"w_{i-1}"} → "w_{i-1}"
      probability = counts[ngram] / context_counts[context]
      print ngram, probability to model_file
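    One possible Python realization of this pseudocode (a sketch under the assumption of a simple command-line interface, not the tutorial's reference solution):

      # Minimal sketch: train-bigram with maximum likelihood estimates.
      # Assumed usage: python train_bigram.py training_file model_file
      import sys
      from collections import defaultdict

      counts = defaultdict(int)
      context_counts = defaultdict(int)

      with open(sys.argv[1], encoding="utf-8") as training_file:
          for line in training_file:
              words = ["<s>"] + line.strip().split(" ") + ["</s>"]
              for i in range(1, len(words)):               # start at 1, after <s>
                  counts[words[i - 1] + " " + words[i]] += 1   # bigram and its context
                  context_counts[words[i - 1]] += 1
                  counts[words[i]] += 1                        # unigram and its context
                  context_counts[""] += 1

      with open(sys.argv[2], "w", encoding="utf-8") as model_file:
          for ngram, count in counts.items():
              words = ngram.split(" ")
              words.pop()                                  # drop the last word to get the context
              context = " ".join(words)
              probability = count / context_counts[context]
              print(ngram, probability, file=model_file)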


  • Slide 18/19

    test-bigram (Linear Interpolation)

    λ_1 = ???, λ_2 = ???, V = 1000000, W = 0, H = 0

    load model into probs
    for each line in the test_file
      split line into an array of words
      append "</s>" to the end and "<s>" to the beginning of words
      for each i in 1 to length(words)-1            # Note: starting at 1, after <s>
        P1 = λ_1 probs["w_i"] + (1 − λ_1) / V           # Smoothed unigram probability
        P2 = λ_2 probs["w_{i-1} w_i"] + (1 − λ_2) P1    # Smoothed bigram probability
        H += -log_2(P2)
        W += 1
    print "entropy = " + H/W
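    And a matching Python sketch of test-bigram under the same assumptions, with the λ values left as placeholders to tune:

      # Minimal sketch: test-bigram with linear interpolation.
      # Assumed usage: python test_bigram.py model_file test_file
      import sys
      import math

      lambda_1 = 0.95      # placeholder values; tune on held-out data
      lambda_2 = 0.95
      V = 1000000          # vocabulary size, including unknown words
      W = 0                # number of words scored
      H = 0.0              # accumulated negative log2 probability

      probs = {}
      with open(sys.argv[1], encoding="utf-8") as model_file:
          for line in model_file:
              ngram, prob = line.rsplit(" ", 1)
              probs[ngram] = float(prob)

      with open(sys.argv[2], encoding="utf-8") as test_file:
          for line in test_file:
              words = ["<s>"] + line.strip().split(" ") + ["</s>"]
              for i in range(1, len(words)):
                  p1 = lambda_1 * probs.get(words[i], 0) + (1 - lambda_1) / V
                  p2 = lambda_2 * probs.get(words[i - 1] + " " + words[i], 0) + (1 - lambda_2) * p1
                  H += -math.log2(p2)
                  W += 1

      print("entropy = " + str(H / W))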


  • Slide 19/19

    Thank You!