Forgetting Counts: Constant Memory Inference for a Dependent Hierarchical Pitman-Yor Process
Forgetting Counts: Constant Memory Inference for a Dependent Hierarchical Pitman-Yor Process
Nicholas Bartlett, David Pfau, Frank Wood
Presented by Yingjian Wang
Nov. 17, 2010
Outline
• Background
• The sequential memoizer
• Forgetting
• The dependent HPY
• Experiment results
Background
2006, Teh, ‘A hierarchical Bayesian language model based on Pitman-Yor processes’
N-gram Markov chain language model with the HPY prior.
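In the Chinese-restaurant representation, the HPY predictive probability for the next symbol interpolates between discounted counts and the parent distribution. A minimal sketch (not the paper's code; `base` stands in for the parent restaurant's predictive distribution, which in the hierarchy is queried recursively):

```python
def py_predictive(counts, tables, d, c, base, symbol):
    """Pitman-Yor CRP predictive probability of `symbol`.
    counts[s] = customers eating dish s, tables[s] = tables serving s,
    d = discount, c = concentration, base(s) = parent probability of s."""
    n = sum(counts.values())   # total customers in this restaurant
    t = sum(tables.values())   # total tables in this restaurant
    if n == 0:
        return base(symbol)    # empty restaurant: fall back to the parent
    seated = counts.get(symbol, 0) - d * tables.get(symbol, 0)
    return (seated + (c + d * t) * base(symbol)) / (n + c)
```

Setting d = 0 recovers the familiar Dirichlet-process CRP rule; the probabilities over the vocabulary sum to one by construction.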
2009, Wood, ‘A Stochastic Memoizer for Sequence Data’
The Sequence Memoizer (SM) with a linear space/time inference scheme (lossless).
2010, Gasthaus, ‘Lossless compression based on the Sequence Memoizer’
Combines the SM with an arithmetic coder to develop a compressor (PLUMP/dePLUMP); see www.deplump.com.
2010, Bartlett, ‘Forgetting Counts: Constant Memory Inference for a Dependent HPY’
Develops constant-memory inference for the SM by using a dependent HPY (lossy).
SM - two concepts
• Memoizer (Donald Michie, 1968): a device that returns previously computed results for the same input instead of recomputing them, in order to save time.
• Stochastic Memoizer (Wood, 2009): the returned results can change, since the predictive probability is based on a stochastic process.
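Michie's idea in its deterministic form is a few lines; the stochastic memoizer replaces the cached value with a draw from a stochastic process, so repeated queries need not agree:

```python
from functools import wraps

def memoize(f):
    """Classic memoizer: cache results so repeated calls with the same
    input return the stored answer instead of recomputing it."""
    cache = {}
    @wraps(f)
    def wrapped(x):
        if x not in cache:
            cache[x] = f(x)
        return cache[x]
    return wrapped
```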
SM - model and trie
• Model: each symbol is drawn from a Pitman-Yor process indexed by its preceding context.
• The prefix trie: one restaurant per n-gram context.
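The trie of context restaurants can be sketched as follows (a simplified illustration; the class and function names are mine, not the paper's):

```python
class Restaurant:
    """One node of the context trie: a restaurant plus child contexts."""
    def __init__(self):
        self.children = {}   # one-symbol-longer context -> child Restaurant
        self.counts = {}     # per-symbol customer counts at this context

def context_restaurant(root, context):
    """Walk from the empty context, extending by one earlier symbol per
    level and creating restaurants along the way, so context 'ab'
    lives under root -> 'b' -> 'a'."""
    node = root
    for sym in reversed(context):
        node = node.children.setdefault(sym, Restaurant())
    return node
```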
SM - the NSP (1)
• The Normalized Stable Process (Perman, 1990):
Pitman-Yor process: G ~ PY(d, c, H)
Normalized Stable Process (concentration parameter c = 0): G ~ PY(d, 0, H)
Dirichlet process (discount parameter d = 0): G ~ PY(0, c, H)
SM - the NSP (2)
• Collapse the middle restaurants.
Theorem: if G_1 | G_0 ~ PY(d_1, 0, G_0) and G_2 | G_1 ~ PY(d_2, 0, G_1), then marginally G_2 | G_0 ~ PY(d_1 d_2, 0, G_0).
• Prefix tree: restaurants (Weiner, 1973; Ukkonen, 1995).
SM - linear space inference
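The transcript omits this slide's figures; the key idea is that non-branching chains in the prefix trie are collapsed using the NSP theorem above, leaving a tree whose node count is linear in the sequence length. A toy path-compression sketch (the node layout is mine; real SM nodes also carry the skipped context symbols as edge labels):

```python
def compress(node):
    """Collapse non-branching chains of restaurants: a child with exactly
    one grandchild and no counts of its own is merged with that
    grandchild, multiplying the discounts (the NSP collapse).
    Nodes are dicts {'d': discount, 'counts': {...}, 'children': {...}}."""
    for sym, child in list(node['children'].items()):
        compress(child)
        while len(child['children']) == 1 and not child['counts']:
            (_, gchild), = child['children'].items()
            child = {'d': child['d'] * gchild['d'],   # collapsed discount
                     'counts': gchild['counts'],
                     'children': gchild['children']}
            node['children'][sym] = child
    return node
```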
Forgetting
• Motivation: achieve constant-memory inference on top of the SM. How?
• Method: forget (delete) restaurants, the basic memory units in the context tree.
• How to delete? Two deletion schemes: random deletion; greedy deletion.
• A restaurant u stores customer counts c_u and table counts t_u over the vocabulary V, so size(c_u, t_u) = 2|V|.
Deletion schemes
• Random deletion: uniformly delete one leaf restaurant.
• Greedy deletion: delete the leaf restaurant whose removal least negatively impacts the estimated likelihood of the observed sequence.
Leaf restaurants
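The two schemes can be sketched as follows, with a hypothetical per-leaf score `ll_drop(u)` standing in for the estimated drop in sequence log-likelihood when leaf u is removed (the scores would come from the model; here they are given):

```python
import random

def random_delete(leaves, rng=random):
    """Random scheme: pick the leaf restaurant to delete uniformly."""
    return rng.choice(list(leaves))

def greedy_delete(leaves, ll_drop):
    """Greedy scheme: delete the leaf whose removal least decreases
    the estimated log-likelihood of the observed sequence."""
    return min(leaves, key=ll_drop)
```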
The SMC (sequential Monte Carlo) algorithm
The dependent HPY
• But wait: what do we get after deletion and re-addition? Will the processes remain independent? No, since the seating arrangement in the parent restaurant has been changed.
Experiment results