NLP Programming Tutorial 2 – Bigram Language Models


TRANSCRIPT

  • Slide 1/19

    NLP Programming Tutorial 2 – Bigram Language Models

    Graham Neubig, Nara Institute of Science and Technology (NAIST)

  • Slide 2/19

    "e#ie$%&alculating Sentence Probabilities

    ● We want the probability of

      W = "speech recognition system"

    ● Represent this mathematically as:

      P(|W| = 3, w_1 = "speech", w_2 = "recognition", w_3 = "system") =
          P(w_1 = "speech"      | w_0 = "<s>")
        × P(w_2 = "recognition" | w_0 = "<s>", w_1 = "speech")
        × P(w_3 = "system"      | w_0 = "<s>", w_1 = "speech", w_2 = "recognition")
        × P(w_4 = "</s>"        | w_0 = "<s>", w_1 = "speech", w_2 = "recognition", w_3 = "system")

    NOTE: sentence start <s> and end </s> symbols

    NOTE: P(w_0 = "<s>") = 1

  • Slide 3/19

    Incremental Computation

    ● Previous equation can be written:

      P(W) = ∏_{i=1}^{|W|+1} P(w_i | w_0 … w_{i−1})

    ● Unigram model ignored context:

      P(w_i | w_0 … w_{i−1}) ≈ P(w_i)
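    A minimal Python sketch of this incremental computation under the unigram approximation (not part of the original slides; the probabilities below are made-up placeholders for what a trained model would supply):

      # Minimal sketch: score a sentence with a unigram model.
      # Assumed unigram probabilities; in practice these come from a trained model file.
      probs = {"speech": 0.01, "recognition": 0.005, "system": 0.02, "</s>": 0.05}

      def unigram_sentence_prob(words, probs):
          """P(W) = product over i of P(w_i), including the sentence-end symbol."""
          words = words + ["</s>"]      # append the sentence-end symbol
          p = 1.0
          for w in words:
              p *= probs[w]             # unigram approximation: ignore all context
          return p

      print(unigram_sentence_prob("speech recognition system".split(" "), probs))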

  • Slide 4/19

    Unigram Models Ignore Word Order!

    ● Ignoring context, probabilities are the same:

      P_uni(W = "speech recognition system") =
        P(w = "speech") × P(w = "recognition") × P(w = "system") × P(w = "</s>")

                                     =

      P_uni(W = "system recognition speech") =
        P(w = "speech") × P(w = "recognition") × P(w = "system") × P(w = "</s>")

  • Slide 5/19

    Unigram Models Ignore Agreement!

    ● Good sentences (words agree):

    ● Bad sentences (words don't agree):

  • Slide 6/19

    Solution: Add More Context!

    ● Unigram model ignored context:

      P(w_i | w_0 … w_{i−1}) ≈ P(w_i)

    ● Bigram model adds one word of context:

      P(w_i | w_0 … w_{i−1}) ≈ P(w_i | w_{i−1})

    ● Trigram model adds two words of context:

      P(w_i | w_0 … w_{i−1}) ≈ P(w_i | w_{i−2} w_{i−1})

    ● Four-gram, five-gram, six-gram, etc...

  • Slide 7/19

    Maximum Likelihood Estimation of n-gram Probabilities

    ● Calculate counts of n-word and (n−1)-word strings:

      P(w_i | w_{i−n+1} … w_{i−1}) = c(w_{i−n+1} … w_i) / c(w_{i−n+1} … w_{i−1})

      i live in osaka . </s>
      i am a graduate student . </s>
      my school is in nara . </s>

      n = 2 →
      P(nara | in)  = c(in nara) / c(in)  = 1 / 2 = 0.5
      P(osaka | in) = c(in osaka) / c(in) = 1 / 2 = 0.5
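    A minimal Python sketch (not the tutorial's reference solution) of these maximum likelihood estimates, computed directly from the three example sentences above:

      # Minimal sketch: maximum likelihood bigram probabilities from the toy corpus above.
      from collections import defaultdict

      corpus = [
          "i live in osaka .",
          "i am a graduate student .",
          "my school is in nara .",
      ]

      counts = defaultdict(int)          # counts of bigrams (w_{i-1}, w_i)
      context_counts = defaultdict(int)  # counts of contexts w_{i-1}

      for line in corpus:
          words = ["<s>"] + line.split(" ") + ["</s>"]
          for i in range(1, len(words)):
              counts[(words[i - 1], words[i])] += 1
              context_counts[words[i - 1]] += 1

      def p_ml(word, context):
          """P_ML(word | context) = c(context word) / c(context)."""
          return counts[(context, word)] / context_counts[context]

      print(p_ml("nara", "in"))   # 1 / 2 = 0.5
      print(p_ml("osaka", "in"))  # 1 / 2 = 0.5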

  • Slide 8/19

    Still Problems of Sparsity

    ● When n-gram frequency is 0, probability is 0:

      P(nara | in)   = c(in nara) / c(in)   = 1 / 2 = 0.5
      P(osaka | in)  = c(in osaka) / c(in)  = 1 / 2 = 0.5
      P(school | in) = c(in school) / c(in) = 0 / 2 = 0!!

    ● Like the unigram model, we can use linear interpolation:

      Bigram:  P(w_i | w_{i−1}) = λ_2 P_ML(w_i | w_{i−1}) + (1 − λ_2) P(w_i)

      Unigram: P(w_i) = λ_1 P_ML(w_i) + (1 − λ_1) (1 / N)
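    A minimal sketch of the interpolated probabilities; the probability tables, λ values, and vocabulary size N below are illustrative placeholders rather than values from the slides:

      # Minimal sketch: linearly interpolated unigram and bigram probabilities.
      # probs_uni and probs_bi stand in for ML estimates read from a trained model.
      probs_uni = {"nara": 0.05, "osaka": 0.05, "school": 0.05, "</s>": 0.15}
      probs_bi = {("in", "nara"): 0.5, ("in", "osaka"): 0.5}

      lambda_1 = 0.95   # weight on the ML unigram estimate
      lambda_2 = 0.95   # weight on the ML bigram estimate
      N = 1_000_000     # vocabulary size (for the uniform fallback term)

      def p_unigram(w):
          """P(w) = lambda_1 * P_ML(w) + (1 - lambda_1) * 1/N"""
          return lambda_1 * probs_uni.get(w, 0.0) + (1 - lambda_1) / N

      def p_bigram(w, prev):
          """P(w | prev) = lambda_2 * P_ML(w | prev) + (1 - lambda_2) * P(w)"""
          return lambda_2 * probs_bi.get((prev, w), 0.0) + (1 - lambda_2) * p_unigram(w)

      print(p_bigram("school", "in"))   # nonzero even though c(in school) = 0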

  • Slide 9/19

    Choosing Values of λ: Grid Search

    ● One method to choose λ_2, λ_1: try many values

      λ_2 = 0.95, λ_1 = 0.95
      λ_2 = 0.95, λ_1 = 0.90
      λ_2 = 0.95, λ_1 = 0.85
        …
      λ_2 = 0.95, λ_1 = 0.05
      λ_2 = 0.90, λ_1 = 0.95
      λ_2 = 0.90, λ_1 = 0.90
        …
      λ_2 = 0.05, λ_1 = 0.10
      λ_2 = 0.05, λ_1 = 0.05

    Problems:
    ● Too many options → choosing takes time!
    ● Using the same λ for all n-grams → there is a smarter way!

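    A minimal sketch of such a grid search; entropy_fn is an assumed stand-in for whatever held-out evaluation is used (for example, per-word entropy on a development set):

      # Minimal sketch: grid search over (lambda_1, lambda_2) by held-out entropy.
      import itertools

      def grid_search(entropy_fn, step=0.05):
          """Try every (lambda_2, lambda_1) pair on a grid and keep the best one."""
          values = [round(step * k, 2) for k in range(1, int(round(1 / step)))]  # 0.05 .. 0.95
          best = None
          for lambda_2, lambda_1 in itertools.product(values, values):
              h = entropy_fn(lambda_1, lambda_2)
              if best is None or h < best[0]:
                  best = (h, lambda_1, lambda_2)
          return best  # (entropy, lambda_1, lambda_2)

      # Toy usage with a made-up objective, just to show the mechanics:
      print(grid_search(lambda l1, l2: (l1 - 0.3) ** 2 + (l2 - 0.8) ** 2))

    Even with a 0.05 grid this is already 19 × 19 = 361 evaluations, which is the first problem above; the context-dependent smoothing on the next slides is the "smarter way".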

  • Slide 10/19

    Context-Dependent Smoothing

    ● Make the interpolation depend on the context:

    High-frequency word: "Tokyo"
      c(Tokyo city) = 40   c(Tokyo is) = 35
      c(Tokyo was) = 24    c(Tokyo tower) = 15   c(Tokyo port) = 10
      …
      Most 2-grams already exist → large λ is better!

    Low-frequency word: "Tottori"
      c(Tottori is) = 2    c(Tottori city) = 1
      c(Tottori was) = 0
      Many 2-grams will be missing → small λ is better!

      P(w_i | w_{i−1}) = λ_{w_{i−1}} P_ML(w_i | w_{i−1}) + (1 − λ_{w_{i−1}}) P(w_i)


  • Slide 11/19

    Witten-Bell Smoothing

    ● One of the many ways to choose λ_{w_{i−1}}

    ● For example:

      λ_{w_{i−1}} = 1 − u(w_{i−1}) / (u(w_{i−1}) + c(w_{i−1}))

      u(w_{i−1}) = number of unique words after w_{i−1}

      c(Tottori is) = 2   c(Tottori city) = 1   c(Tottori) = 3   u(Tottori) = 2

      λ_Tottori = 1 − 2 / (2 + 3) = 0.6

      c(Tokyo city) = 40   c(Tokyo is) = 35   …   c(Tokyo) = 270   u(Tokyo) = 30

      λ_Tokyo = 1 − 30 / (30 + 270) = 0.9
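    A minimal sketch of this weight computation, using illustrative count tables shaped like the example above (the names counts and context_counts are assumptions, not from the slides):

      # Minimal sketch: Witten-Bell interpolation weight for a given context word.
      # Hypothetical counts matching the slide's "Tottori" example.
      counts = {("Tottori", "is"): 2, ("Tottori", "city"): 1}   # bigram counts
      context_counts = {"Tottori": 3}                           # c(w_{i-1})

      def witten_bell_lambda(context, counts, context_counts):
          """lambda_{w_{i-1}} = 1 - u(w_{i-1}) / (u(w_{i-1}) + c(w_{i-1}))."""
          # u(w_{i-1}): number of unique words seen after the context word
          u = len({w for (ctx, w) in counts if ctx == context})
          c = context_counts[context]
          return 1 - u / (u + c)

      print(witten_bell_lambda("Tottori", counts, context_counts))  # 1 - 2/(2+3) = 0.6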


  • Slide 12/19

    Programming Techniques


  • Slide 13/19

    Inserting into Arrays

    ● To calculate n-grams easily, you may want to:

      my_words = ["this", "is", "a", "pen"]
            ↓
      my_words = ["<s>", "this", "is", "a", "pen", "</s>"]

    ● This can be done with:

      my_words.append("</s>")     # Add to the end
      my_words.insert(0, "<s>")   # Add to the beginning


  • Slide 14/19

    "emo#ing from rrays

    ● Given an n-gram with w_{i−n+1} … w_i, we may want the context w_{i−n+1} … w_{i−1}

    ● This can be done with:

      my_ngram = "tokyo tower"
      my_words = my_ngram.split(" ")    # Change into ["tokyo", "tower"]
      my_words.pop()                    # Remove the last element ("tower")
      my_context = " ".join(my_words)   # Join the array back together
      print(my_context)                 # → "tokyo"


  • Slide 15/19

    Exercise


  • Slide 16/19

    Exercise

    ● Write two programs:
      ● train-bigram: Creates a bigram model
      ● test-bigram: Reads a bigram model and calculates entropy on the test set

    ● Test train-bigram on test/02-train-input.txt

    ● Train the model on data/wiki-en-train.word

    ● Calculate entropy on data/wiki-en-test.word
      (if using linear interpolation, test different values of λ_2)

    ● Challenge:
      ● Use Witten-Bell smoothing (linear interpolation is easier)
      ● Create a program that works with any n (not just bigrams)


  • Slide 17/19

    train-bigram (Linear Interpolation)

    create map counts, context_counts
    for each line in the training_file
      split line into an array of words
      append "</s>" to the end and "<s>" to the beginning of words
      for each i in 1 to length(words)-1          # Note: starting at 1, after <s>
        counts["w_{i-1} w_i"] += 1                # Add bigram and bigram context
        context_counts["w_{i-1}"] += 1
        counts["w_i"] += 1                        # Add unigram and unigram context
        context_counts[""] += 1

    open the model_file for writing
    for each ngram, count in counts
      split ngram into an array of words          # "w_{i-1} w_i" → {"w_{i-1}", "w_i"}
      remove the last element of words            # {"w_{i-1}", "w_i"} → {"w_{i-1}"}
      join words into context                     # {"w_{i-1}"} → "w_{i-1}"
      probability = counts[ngram] / context_counts[context]
      print ngram, probability to model_file
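    One possible Python realization of this pseudocode (a sketch under the assumption of a simple command-line interface, not the tutorial's reference solution):

      # Minimal sketch: train-bigram with maximum likelihood estimates.
      # Assumed usage: python train_bigram.py training_file model_file
      import sys
      from collections import defaultdict

      counts = defaultdict(int)
      context_counts = defaultdict(int)

      with open(sys.argv[1], encoding="utf-8") as training_file:
          for line in training_file:
              words = ["<s>"] + line.strip().split(" ") + ["</s>"]
              for i in range(1, len(words)):               # start at 1, after <s>
                  counts[words[i - 1] + " " + words[i]] += 1   # bigram and its context
                  context_counts[words[i - 1]] += 1
                  counts[words[i]] += 1                        # unigram and its context
                  context_counts[""] += 1

      with open(sys.argv[2], "w", encoding="utf-8") as model_file:
          for ngram, count in counts.items():
              words = ngram.split(" ")
              words.pop()                                  # drop the last word to get the context
              context = " ".join(words)
              probability = count / context_counts[context]
              print(ngram, probability, file=model_file)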


  • Slide 18/19

    test-bigram (Linear Interpolation)

    λ_1 = ???, λ_2 = ???, V = 1000000, W = 0, H = 0

    load model into probs
    for each line in the test_file
      split line into an array of words
      append "</s>" to the end and "<s>" to the beginning of words
      for each i in 1 to length(words)-1            # Note: starting at 1, after <s>
        P1 = λ_1 probs["w_i"] + (1 − λ_1) / V           # Smoothed unigram probability
        P2 = λ_2 probs["w_{i-1} w_i"] + (1 − λ_2) P1    # Smoothed bigram probability
        H += -log_2(P2)
        W += 1
    print "entropy = " + H/W
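    And a matching Python sketch of test-bigram under the same assumptions, with the λ values left as placeholders to tune:

      # Minimal sketch: test-bigram with linear interpolation.
      # Assumed usage: python test_bigram.py model_file test_file
      import sys
      import math

      lambda_1 = 0.95      # placeholder values; tune on held-out data
      lambda_2 = 0.95
      V = 1000000          # vocabulary size, including unknown words
      W = 0                # number of words scored
      H = 0.0              # accumulated negative log2 probability

      probs = {}
      with open(sys.argv[1], encoding="utf-8") as model_file:
          for line in model_file:
              ngram, prob = line.rsplit(" ", 1)
              probs[ngram] = float(prob)

      with open(sys.argv[2], encoding="utf-8") as test_file:
          for line in test_file:
              words = ["<s>"] + line.strip().split(" ") + ["</s>"]
              for i in range(1, len(words)):
                  p1 = lambda_1 * probs.get(words[i], 0) + (1 - lambda_1) / V
                  p2 = lambda_2 * probs.get(words[i - 1] + " " + words[i], 0) + (1 - lambda_2) * p1
                  H += -math.log2(p2)
                  W += 1

      print("entropy = " + str(H / W))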


  • Slide 19/19

    Thank You!