Learning Bit by Bit Class 4 - Ngrams

Posted on 22-Dec-2015

Ngrams

• Counting words
• Using observation to make predictions
• Corpus/Corpora: the body (or bodies) of text whose words we count

Unigram

• “how’s the weather out there?”
• [how’s, the, weather, out, there]
• How many words are there?
• How many times does “weather” occur?
• Prob “weather” = occurrences of “weather” / total # words
• P(“weather”) = c(“weather”) / c(total)
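
A minimal Python sketch of the unigram count. The tokenizer (lowercase, keep letters and straight apostrophes, drop other punctuation) is an assumption for illustration and is not from the slides; `tokenize` and `unigram_prob` are hypothetical names.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and straight apostrophes.

    This tokenizer is an illustrative assumption; punctuation such as "?"
    is simply dropped.
    """
    return re.findall(r"[a-z']+", text.lower())

def unigram_prob(word, tokens):
    """P(word) = c(word) / c(total): the word's share of all tokens."""
    counts = Counter(tokens)
    return counts[word] / len(tokens)

tokens = tokenize("how's the weather out there?")
print(tokens)                           # ["how's", 'the', 'weather', 'out', 'there']
print(len(tokens))                      # 5
print(unigram_prob("weather", tokens))  # 0.2
```

A word that never appears gets probability 0.0, since `Counter` returns 0 for missing keys.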

Bigram

• “the storm swept through the land”
• [(the, storm), (storm, swept), (swept, through), (through, the), (the, land)]
• How many times does “storm” follow “the”?
• How many times does the word “the” occur?
• Prob “storm” given “the” = occurrences of “the storm” / occurrences of “the”
• Written P(“storm” | “the”); in general, P(word n | word n-1)
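
The bigram counts on this slide can be reproduced with a short Python sketch. It assumes whitespace tokenization of the example sentence; `bigrams` and `bigram_prob` are hypothetical helper names, and returning 0.0 for an unseen previous word is a convention chosen here, not something the slides specify.

```python
from collections import Counter

def bigrams(tokens):
    """Adjacent word pairs: (w1, w2), (w2, w3), ..."""
    return list(zip(tokens, tokens[1:]))

def bigram_prob(prev, word, tokens):
    """Estimate P(word | prev) = c(prev word) / c(prev)."""
    pair_counts = Counter(bigrams(tokens))
    word_counts = Counter(tokens)
    if word_counts[prev] == 0:
        return 0.0  # assumption: unseen history gets probability 0
    return pair_counts[(prev, word)] / word_counts[prev]

tokens = "the storm swept through the land".split()
print(bigrams(tokens))
# [('the', 'storm'), ('storm', 'swept'), ('swept', 'through'), ('through', 'the'), ('the', 'land')]
print(bigram_prob("the", "storm", tokens))  # 0.5
```

“the” occurs twice and is followed by “storm” once, so the estimate is 1/2.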

Markov Assumption

• The assumption that the probability of a word depends only on the previous word, or only on a fixed number of previous words (N-1 of them in an N-gram model)
• P(“land” | “the storm swept through the”) is approximated by P(“land” | “the”)
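
Under the Markov assumption, a whole sentence can be scored by multiplying bigram probabilities. The sketch below is an illustration, not part of the slides: it conditions on the first word instead of adding start/end symbols, uses `fractions.Fraction` to keep the arithmetic exact, and `sentence_prob_bigram` is a hypothetical name. The corpus is the single sentence from the bigram example.

```python
from collections import Counter
from fractions import Fraction

def sentence_prob_bigram(sentence, corpus_tokens):
    """Bigram (first-order Markov) score of a sentence, given its first word:
    P(w2 | w1) * P(w3 | w2) * ... * P(wn | w(n-1)).
    """
    pair_counts = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    word_counts = Counter(corpus_tokens)
    words = sentence.split()
    prob = Fraction(1)
    for prev, word in zip(words, words[1:]):
        if word_counts[prev] == 0:
            return Fraction(0)  # unseen history: no estimate, treated as zero here
        prob *= Fraction(pair_counts[(prev, word)], word_counts[prev])
    return prob

corpus = "the storm swept through the land".split()
print(sentence_prob_bigram("the storm swept through the land", corpus))  # 1/4
print(sentence_prob_bigram("the land swept through the storm", corpus))  # 0
```

The scrambled sentence scores 0 because “land swept” never occurs in the corpus, much like the few search results for an unusual word order.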

N-gram

• Extends the bigram model to condition on the previous N-1 words (a bigram looks back 1 word, a trigram 2)

Maximum Likelihood Estimation

• N-gram probabilities are estimated from corpus counts
• P(word n | word n-1) = count of word n-1 followed by word n / count of all occurrences of word n-1

Trigram

• “the quick red fox jumped the quick black bear. The quick red fox hopped away.”
• [(the, quick, red), (quick, red, fox), (red, fox, jumped), (fox, jumped, the), (jumped, the, quick), (the, quick, black), (quick, black, bear), (the, quick, red), (quick, red, fox), (red, fox, hopped), (fox, hopped, away)]
• How many times does “the quick red” occur?
• How many times does “the quick” occur?
• Prob “red” given “the quick” = occurrences of “the quick red” / occurrences of “the quick”
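
The trigram counts can be checked with a Python sketch that works for any N. It assumes the text is lowercased, split into sentences on “.”, and that N-grams are taken within each sentence, which is what the trigram list does (it has no trigram spanning “bear. The”). `sentence_ngrams` and `trigram_prob` are hypothetical names.

```python
import re
from collections import Counter

def sentence_ngrams(text, n):
    """Collect n-grams within each sentence; windows never cross a '.'."""
    grams = []
    for sentence in text.lower().split("."):
        words = re.findall(r"[a-z']+", sentence)
        # A sentence shorter than n contributes nothing (empty range).
        grams.extend(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return grams

def trigram_prob(w1, w2, w3, text):
    """Estimate P(w3 | w1 w2) = c(w1 w2 w3) / c(w1 w2)."""
    tri = Counter(sentence_ngrams(text, 3))
    bi = Counter(sentence_ngrams(text, 2))
    return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0

text = "the quick red fox jumped the quick black bear. The quick red fox hopped away."
print(len(sentence_ngrams(text, 3)))                               # 11
print(Counter(sentence_ngrams(text, 3))[("the", "quick", "red")])  # 2
print(Counter(sentence_ngrams(text, 2))[("the", "quick")])         # 3
print(trigram_prob("the", "quick", "red", text))                   # 0.6666666666666666
```

“the quick” occurs three times and is followed by “red” twice, so P(“red” | “the quick”) = 2/3.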

Test it in Google

• Google “the weather”: how many results?
• Google “the weather is”: how many results?
• Google “the weather out”: how many results?
• Google “weather the out”: how many results?
• Prob “out” given “the weather” = count of “the weather out” / count of “the weather”, i.e. P(“out” | “the weather”)
• Why so few results for “weather the out”?

Training and Testing

• Training set: the larger share of the corpus, e.g. 80-90%
• Testing set: the smaller, held-out share, e.g. 10-20%
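
A minimal sketch of an 80/20 split, assuming the corpus is a list of sentences. The placeholder sentences, the fixed `seed`, and the name `train_test_split` are illustrative (this is not scikit-learn's function of the same name). Shuffling first keeps the test set from being just the last part of the corpus.

```python
import random

def train_test_split(sentences, train_fraction=0.8, seed=0):
    """Shuffle a copy of the sentences and split it into (train, test)."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

sentences = [f"sentence {i}" for i in range(10)]  # placeholder corpus
train, test = train_test_split(sentences, train_fraction=0.8)
print(len(train), len(test))  # 8 2
```

N-gram counts would then be taken from `train` only, and `test` used to see how well those counts predict unseen text.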

Examples