nlp programming en 02 bigramlm
TRANSCRIPT
-
NLP Programming Tutorial 2 – Bigram Language Models
Graham Neubig
Nara Institute of Science and Technology (NAIST)
-
"e#ie$%&alculating Sentence Probabilities
● 'e $ant the robability of
●
"eresent this mathematically as%
' ) seech recognition system
P(*'* ) +, $).seech., $
2).recognition., $
+).system.! )
P($)/seech. * $
0 ) /1s.!
3 P($2).recognition. * $
0 ) /1s., $
)/seech.!
3 P($+).system. * $
0 ) /1s., $
)/seech., $
2).recognition.!
3 P($4).15s. * $
0 ) /1s., $
)/seech., $
2).recognition., $
+).system.!
N6T7%sentence start 1s and end 15s symbol
N6T7%
P($0 ) 1s! )
-
Incremental &omutation
● Pre#ious e8uation can be $ritten%
● 9nigram model ignored conte:t%
P(W )=∏i=1
∣W ∣+ 1 P(wi∣w0…wi−1)
P(wi∣w0…wi−1)≈ P (w i)
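Below is a minimal Python sketch of this product under the unigram approximation; the function name, the `unigram_probs` dictionary, and the toy probabilities are illustrative assumptions, not part of the tutorial.

    # Sentence probability under the unigram approximation:
    # P(W) = product over i of P(w_i), including the final "</s>".
    def sentence_probability(words, unigram_probs):
        prob = 1.0
        for w in words + ["</s>"]:               # i = 1 ... |W|+1
            prob *= unigram_probs.get(w, 0.0)    # P(w_i | w_0 ... w_{i-1}) ≈ P(w_i)
        return prob

    # Toy example (probabilities are made up for illustration):
    probs = {"speech": 0.1, "recognition": 0.05, "system": 0.2, "</s>": 0.2}
    print(sentence_probability("speech recognition system".split(), probs))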
-
Unigram Models Ignore Word Order!
● Ignoring context, the probabilities are the same:
  P_uni(W = "speech recognition system") =
    P(w = "speech") * P(w = "recognition") * P(w = "system") * P(w = "</s>")
  P_uni(W = "system recognition speech") =
    P(w = "speech") * P(w = "recognition") * P(w = "system") * P(w = "</s>")
-
Unigram Models Ignore Agreement!
● Good sentences (words agree):
● Bad sentences (words don't agree):
-
Solution% dd More &onte:t;
● 9nigram model ignored conte:t%
● Bigram model adds one $ord of conte:t
●
Trigram model adds t$o $ords of conte:t
●
=our-gram, fi#e-gram, si:-gram, etc>>>
P(wi∣w0…wi−1)≈ P (w i)
P(wi∣w0…wi−1)≈ P (w i∣wi−1)
P(wi∣w0…wi−1)≈ P (w i∣wi−2w i−1)
-
Maximum Likelihood Estimation of n-gram Probabilities
● Calculate counts of n-word and (n-1)-word strings:
  P(w_i | w_{i-n+1} … w_{i-1}) = c(w_{i-n+1} … w_i) / c(w_{i-n+1} … w_{i-1})
● Example corpus:
  i live in osaka . </s>
  i am a graduate student . </s>
  my school is in nara . </s>
● n = 2 →
  P(nara | in) = c(in nara)/c(in) = 1/2 = 0.5
  P(osaka | in) = c(in osaka)/c(in) = 1/2 = 0.5
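These counts can be collected in a few lines of Python. Below is a minimal sketch over the example corpus; the variable names are illustrative and not part of the tutorial.

    from collections import defaultdict

    corpus = [
        "i live in osaka .",
        "i am a graduate student .",
        "my school is in nara .",
    ]

    bigram_counts = defaultdict(int)
    context_counts = defaultdict(int)
    for line in corpus:
        words = ["<s>"] + line.split() + ["</s>"]
        for i in range(1, len(words)):
            bigram_counts[(words[i - 1], words[i])] += 1   # c(w_{i-1} w_i)
            context_counts[words[i - 1]] += 1              # c(w_{i-1})

    # P(nara | in) = c(in nara) / c(in) = 1/2 = 0.5
    print(bigram_counts[("in", "nara")] / context_counts["in"])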
-
Still Problems of Sparsity
● When an n-gram's frequency is 0, its probability is 0:
  P(nara | in) = c(in nara)/c(in) = 1/2 = 0.5
  P(osaka | in) = c(in osaka)/c(in) = 1/2 = 0.5
  P(school | in) = c(in school)/c(in) = 0/2 = 0!!
● Like the unigram model, we can use linear interpolation:
  Bigram:  P(w_i | w_{i-1}) = λ_2 P_ML(w_i | w_{i-1}) + (1 − λ_2) P(w_i)
  Unigram: P(w_i) = λ_1 P_ML(w_i) + (1 − λ_1) 1/N
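A minimal sketch of the two interpolated probabilities follows; the dictionaries `p_ml_unigram` and `p_ml_bigram` (maximum-likelihood estimates) and the `vocab_size` parameter (N above) are assumed inputs, not names from the tutorial.

    def smoothed_unigram(w, p_ml_unigram, lambda1, vocab_size):
        # P(w_i) = λ_1 P_ML(w_i) + (1 − λ_1) 1/N
        return lambda1 * p_ml_unigram.get(w, 0.0) + (1 - lambda1) / vocab_size

    def smoothed_bigram(w, prev, p_ml_bigram, p_ml_unigram, lambda2, lambda1, vocab_size):
        # P(w_i | w_{i-1}) = λ_2 P_ML(w_i | w_{i-1}) + (1 − λ_2) P(w_i)
        p1 = smoothed_unigram(w, p_ml_unigram, lambda1, vocab_size)
        return lambda2 * p_ml_bigram.get((prev, w), 0.0) + (1 - lambda2) * p1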
-
&hoosing alues of C% Grid Search
● 6ne method to choose C2, C
% try many #alues
λ2=0.95,λ1=0.95
Too many otionsA &hoosing ta?es time;
9sing same C for all n-gramsA There is a smarter $ay;
Problems%
λ2=0.95,λ
1=0.90
λ2=0.95,λ1=0.85
λ2=0.95,λ1=0.05λ
2=0.90,λ
1=0.95
λ2=0.90,λ1=0.90
λ2=0.05,λ1=0.05
λ2=0.05,λ1=0.10
D
D
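The grid search itself is just two nested loops. In the sketch below, `evaluate_entropy` stands in for running the test program on held-out data; its dummy body is only there so the loop runs and is not part of the tutorial.

    def evaluate_entropy(lambda2, lambda1):
        # Dummy stand-in: in practice this would measure entropy on held-out data.
        return (lambda2 - 0.90) ** 2 + (lambda1 - 0.30) ** 2

    best = (None, None, float("inf"))
    for l2 in [i / 100 for i in range(5, 100, 5)]:        # λ_2 = 0.05, 0.10, ..., 0.95
        for l1 in [i / 100 for i in range(5, 100, 5)]:    # λ_1 = 0.05, 0.10, ..., 0.95
            h = evaluate_entropy(l2, l1)
            if h < best[2]:
                best = (l2, l1, h)
    print("best: lambda2 =", best[0], ", lambda1 =", best[1])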
-
&onte:t Eeendent Smoothing
● Ma?e the interolation deend on the conte:t
Figh fre8uency $ord% /To?yo.
c(To?yo city! ) 40c(To?yo is! ) +@
c(To?yo $as! ) 24c(To?yo to$er! ) @c(To?yo ort! ) 0
D
Most 2-grams already e:istA Large C is better;
Lo$ fre8uency $ord% /Tottori.
c(Tottori is! ) 2c(Tottori city! )
c(Tottori $as! ) 0
Many 2-grams $ill be missingA Small C is better;
P(w
i∣w
i−1)=λwi−1 P
ML (w
i∣w
i−1)+ (1
−λwi−1
) P
(w
i)
-
Witten-Bell Smoothing
● One of the many ways to choose λ_{w_{i-1}}
● For example:
  λ_{w_{i-1}} = 1 − u(w_{i-1}) / (u(w_{i-1}) + c(w_{i-1}))
  u(w_{i-1}) = number of unique words after w_{i-1}
● c(Tottori is) = 2   c(Tottori city) = 1
  c(Tottori) = 3   u(Tottori) = 2
  λ_Tottori = 1 − 2/(2+3) = 0.6
● c(Tokyo city) = 40   c(Tokyo is) = 35   …
  c(Tokyo) = 270   u(Tokyo) = 30
  λ_Tokyo = 1 − 30/(30+270) = 0.9
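A minimal sketch of computing this λ from bigram counts, assuming the counts are stored in a dict keyed on (previous word, word) pairs as in the earlier counting sketch; the names are illustrative.

    from collections import defaultdict

    def witten_bell_lambda(w_prev, bigram_counts):
        # u(w_prev): number of unique words that follow w_prev
        unique_after = len({w for (p, w) in bigram_counts if p == w_prev})
        # c(w_prev): total count of w_prev as a context
        total = sum(c for (p, _), c in bigram_counts.items() if p == w_prev)
        if total == 0:
            return 0.0   # unseen context: fall back entirely to the unigram term
        return 1 - unique_after / (unique_after + total)

    # Example from the slide: c(Tottori is) = 2, c(Tottori city) = 1
    counts = defaultdict(int, {("Tottori", "is"): 2, ("Tottori", "city"): 1})
    print(witten_bell_lambda("Tottori", counts))   # 1 − 2/(2+3) = 0.6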
-
Programming Techniques
-
Inserting into Arrays
● To calculate n-grams easily, you may want to go from:
  my_words = ["this", "is", "a", "pen"]
  to:
  my_words = ["<s>", "this", "is", "a", "pen", "</s>"]
● This can be done with:
  my_words.append("</s>")     # Add to the end
  my_words.insert(0, "<s>")   # Add to the beginning
-
"emo#ing from rrays
● Gi#en an n-gram $ith $i-nK
D $i, $e may $ant the
conte:t $i-nK
D $i-
● This can be done $ith%
my_ngram ) /to?yo to$er.my_words ) my_ngram>split(/ /! J &hange into H/to?yo., /to$er.my_words.pop(! J "emo#e the last element (/to$er.!my_context ) / /> join(my_words! J oin the array bac? together
print my_context
-
Exercise
-
Exercise
● Write two programs:
  ● train-bigram: creates a bigram model
  ● test-bigram: reads a bigram model and calculates entropy on the test set
● Test train-bigram on test/02-train-input.txt
● Train the model on data/wiki-en-train.word
● Calculate entropy on data/wiki-en-test.word (if using linear interpolation, test different values of λ_2)
● Challenge:
  ● Use Witten-Bell smoothing (linear interpolation is easier)
  ● Create a program that works with any n (not just bigrams)
-
train-bigram (Linear Interpolation)

create map counts, context_counts
for each line in the training_file
    split line into an array of words
    append "</s>" to the end and "<s>" to the beginning of words
    for each i in 1 to length(words)-1        # Note: starting at 1, after <s>
        counts["w_{i-1} w_i"] += 1            # Add bigram and bigram context
        context_counts["w_{i-1}"] += 1
        counts["w_i"] += 1                    # Add unigram and unigram context
        context_counts[""] += 1
open the model_file for writing
for each ngram, count in counts
    split ngram into an array of words        # "w_{i-1} w_i" → {"w_{i-1}", "w_i"}
    remove the last element of words          # {"w_{i-1}", "w_i"} → {"w_{i-1}"}
    join words into context                   # {"w_{i-1}"} → "w_{i-1}"
    probability = counts[ngram] / context_counts[context]
    print ngram, probability to model_file
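A minimal Python sketch of the pseudocode above. The tab-separated model format and the command-line handling are illustrative choices rather than requirements of the tutorial.

    from collections import defaultdict
    import sys

    def train_bigram(training_file, model_file):
        counts = defaultdict(int)
        context_counts = defaultdict(int)
        with open(training_file) as f:
            for line in f:
                words = ["<s>"] + line.split() + ["</s>"]
                for i in range(1, len(words)):
                    counts[words[i - 1] + " " + words[i]] += 1   # bigram count
                    context_counts[words[i - 1]] += 1            # bigram context
                    counts[words[i]] += 1                        # unigram count
                    context_counts[""] += 1                      # unigram context
        with open(model_file, "w") as out:
            for ngram, count in sorted(counts.items()):
                context = " ".join(ngram.split()[:-1])           # drop the last word
                out.write("%s\t%f\n" % (ngram, count / context_counts[context]))

    if __name__ == "__main__":
        train_bigram(sys.argv[1], sys.argv[2])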
-
test-bigram (Linear Interpolation)

λ_1 = ???, λ_2 = ???, V = 1000000, W = 0, H = 0
load model into probs
for each line in the test_file
    split line into an array of words
    append "</s>" to the end and "<s>" to the beginning of words
    for each i in 1 to length(words)-1        # Note: starting at 1, after <s>
        P1 = λ_1 probs["w_i"] + (1 − λ_1) / V             # Smoothed unigram probability
        P2 = λ_2 probs["w_{i-1} w_i"] + (1 − λ_2) P1      # Smoothed bigram probability
        H += -log2(P2)
        W += 1
print "entropy = " + H/W
-
Thank You!