Faster and Smaller N-Gram LMs
Adam Pauls and Dan Klein
Presented by SUN Jun
Overview
• N-gram LMs
• A short review of LM implementation
– Trie
– Array: implicit Trie
• This work: a combination of multiple techniques
– Implicit encoding of the query word
– Variable-length encoding for compression
– Speed-ups for the decoder
Back-Off LM
• LM: an n-gram LM assigns a probability to each word in a sequence given its history, i.e. the preceding n−1 words
• Back-off LM: trust the highest-order model that contains the n-gram; if it is absent, back off to a lower-order model
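The back-off rule above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `probs` and `backoffs` are hypothetical dicts keyed by word tuples, and the unknown-word floor is an arbitrary assumption.

```python
# Back-off scoring sketch: trust the highest-order model that contains
# the n-gram; otherwise multiply the context's back-off weight by the
# lower-order probability. Toy data structures, not the paper's arrays.
def backoff_prob(ngram, probs, backoffs):
    if ngram in probs:
        return probs[ngram]
    if len(ngram) == 1:
        return 1e-7  # unknown-word floor (assumption for this sketch)
    # back-off weight of the context times the lower-order probability
    alpha = backoffs.get(ngram[:-1], 1.0)
    return alpha * backoff_prob(ngram[1:], probs, backoffs)

probs = {("the", "cat"): 0.2, ("cat",): 0.1, ("sat",): 0.05}
backoffs = {("the",): 0.4}
print(backoff_prob(("the", "cat"), probs, backoffs))  # prints 0.2
print(backoff_prob(("the", "sat"), probs, backoffs))  # backs off: 0.4 * 0.05
```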
Implementation of Back-off LM
• File based
• Trie
• Reverse Trie
• Array-a: implicit Trie
• Array-b: implicit Trie with reverse index to parent
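A plain trie, the baseline the array encodings improve on, can be sketched as follows. This is a minimal illustration with assumed names, not the packed representation the paper uses; a reverse trie would simply iterate the n-gram back to front.

```python
# Toy trie for n-gram storage: each node maps a word to a child node,
# and a node's value holds the n-gram's log-probability (if any).
class TrieNode:
    __slots__ = ("children", "logprob")
    def __init__(self):
        self.children = {}
        self.logprob = None

def insert(root, ngram, logprob):
    node = root
    for w in ngram:          # a reverse trie would use reversed(ngram)
        node = node.children.setdefault(w, TrieNode())
    node.logprob = logprob

def lookup(root, ngram):
    node = root
    for w in ngram:
        node = node.children.get(w)
        if node is None:
            return None      # n-gram not stored
    return node.logprob

root = TrieNode()
insert(root, ("the", "cat"), -0.7)
print(lookup(root, ("the", "cat")))  # prints -0.7
```

Pointer-chasing through dict nodes like this is exactly the memory overhead that the implicit array encodings below avoid.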
This paper
• This work: a combination of multiple techniques
– Implicit encoding of the query word
– Variable-length encoding for compression
– Speed-ups for the decoder
Implicit Encoding of query word
• Sorted array
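The sorted-array variant can be sketched as below, assuming (as in implicit-trie encodings generally) that each key is a (context offset, word id) pair and that a hit's index serves as the context offset for the next-higher order. The data here is illustrative.

```python
import bisect

# Sorted-array trie sketch: one packed, sorted list of
# (context_offset, word_id) keys per n-gram order; lookup is binary
# search, and the matching index becomes the context offset that
# higher-order n-grams point to.
bigram_keys = sorted([(0, 5), (0, 9), (1, 2), (1, 5)])

def find(keys, context_offset, word_id):
    i = bisect.bisect_left(keys, (context_offset, word_id))
    if i < len(keys) and keys[i] == (context_offset, word_id):
        return i          # this index is the (n+1)-gram context offset
    return None

print(find(bigram_keys, 1, 2))  # prints 2
```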
Implicit Encoding of query word
• Hash Table
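The slide gives no detail on the hash-table variant, so here is a generic open-addressing table with linear probing, the standard alternative to binary search for n-gram lookup; sizes and data are toy assumptions.

```python
# Minimal open-addressing hash table with linear probing: constant-time
# expected lookup in exchange for some empty slots. Toy capacity.
SIZE = 11
keys = [None] * SIZE
vals = [None] * SIZE

def put(key, val):
    i = hash(key) % SIZE
    while keys[i] is not None and keys[i] != key:
        i = (i + 1) % SIZE      # linear probe past collisions
    keys[i], vals[i] = key, val

def get(key):
    i = hash(key) % SIZE
    while keys[i] is not None:
        if keys[i] == key:
            return vals[i]
        i = (i + 1) % SIZE
    return None                 # hit an empty slot: key is absent

put((3, 7), -1.2)   # (context_offset, word_id) -> log-probability
print(get((3, 7)))  # prints -1.2
```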
Implicit Encoding of query word
• We can exploit this redundancy by storing only the context offsets in the main array, using as many bits as needed to encode all context offsets (32 bits for Web1T).
• In auxiliary arrays, one for each n-gram order, we store for each word w_i the beginning and end of the range of the trie array in which all (w_i, c) keys are stored.
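The two bullets above can be sketched as follows. All names and data here are illustrative stand-ins: the main array holds only context offsets, and an auxiliary table maps each word id to the [begin, end) range of the main array that holds that word's n-grams, so the word never needs to be stored in the key.

```python
import bisect

# Word-implicit encoding sketch: the query word selects a range via the
# auxiliary table; binary search over context offsets happens only
# inside that range.
context_offsets = [0, 4, 9,   1, 4,   2]       # main array (toy data)
word_range = {5: (0, 3), 7: (3, 5), 9: (5, 6)}  # word_id -> (begin, end)

def find_ngram(word_id, context_offset):
    begin, end = word_range.get(word_id, (0, 0))
    i = bisect.bisect_left(context_offsets, context_offset, begin, end)
    if i < end and context_offsets[i] == context_offset:
        return i          # index doubles as the higher-order context offset
    return None

print(find_ngram(7, 4))  # prints 4
```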
Variable length encoding for compression
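As a concrete instance of variable-length encoding, here is standard variable-byte coding; the paper's scheme differs in detail, so treat this as a generic stand-in showing why sorted, mostly-small integers compress well.

```python
# Variable-byte encoding sketch: small integers (e.g. deltas between
# consecutive sorted offsets) take fewer bytes; the high bit of a byte
# marks the final byte of each number.
def vbyte_encode(n):
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)    # low 7 bits, continuation implied
        n >>= 7
    out.append(n | 0x80)        # high bit set on the terminating byte
    return bytes(out)

def vbyte_decode(data):
    n, shift = 0, 0
    for i, b in enumerate(data):
        if b & 0x80:            # terminator: assemble and report length
            return n | ((b & 0x7F) << shift), i + 1
        n |= b << shift
        shift += 7

encoded = vbyte_encode(300)
print(encoded.hex())             # prints 2c82 (2 bytes instead of 4)
print(vbyte_decode(encoded)[0])  # prints 300
```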
Speed up decoder
• Repetitive Queries– By cache
• Scrolling Queries
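The caching idea for repetitive queries can be sketched with Python's `functools.lru_cache`; the `table` dict here is a hypothetical stand-in for the expensive trie lookup that a real decoder would perform.

```python
from functools import lru_cache

# Repetitive queries: a decoder scores the same n-grams many times, so
# a small cache in front of the expensive LM lookup pays off.
table = {("the", "cat"): -0.7, ("cat", "sat"): -1.1}

@lru_cache(maxsize=4096)
def cached_score(ngram):
    return table.get(ngram, -10.0)   # toy floor for missing n-grams

cached_score(("the", "cat"))
cached_score(("the", "cat"))           # served from the cache
print(cached_score.cache_info().hits)  # prints 1
```

Scrolling queries exploit a different redundancy: consecutive decoder queries share all but one word, so the previous query's trie position can seed the next lookup instead of starting from the root.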
Experiments
END