Prediction Games in Infinitely Rich Worlds
Omid Madani
Yahoo! Research
“Rather, the formation and use of categories is the stuff of experience.”
Philosophy in the Flesh, Lakoff and Johnson.
Motivation
• Higher intelligence requires myriad inter-related categories
• How can such categories be acquired?
• Programming them is unlikely to be successful:
  • Limits of our explicit knowledge
  • Unknown/unfamiliar domains
  • Making the system operational..
Learn? … How?
• “Supervised” learning is likely inadequate. Required:
  • ~millions of categories and beyond..
  • Billions of weights, and beyond..
  • Inaccessible “knowledge” (see last slide!)
• Other approaches fall short (incomplete, etc.): clustering, RL, active learning, etc..
This Work: An Exploration
• An avenue: “prediction games in infinitely rich worlds”
• Exciting part:
  • World provides unbounded learning opportunity! (world is the teacher!)
  • World enjoys many regularities (e.g. “hierarchical”)
This Work
• Describe the setting
  • The games, categories, …
• Discuss:
  • Desiderata/constraints
  • Some of the many challenges/problems
• Preliminary system/observations..
The Game
• Repeat (loop sketched below):
  • Hide part(s) of the stream
  • Predict (use context)
  • Update
  • Move on
• Goal: predict better ... (subject to constraints)
• In the process: categories at different levels of abstraction are learned
• Some details: what parts to hide? How much context? What order?
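A minimal sketch of this loop, assuming a hypothetical Predictor object with predict() (returning ranked candidates) and update() methods; here the hidden part is simply the next character and the context is a fixed-size window, which is only one of the possible choices listed above.

```python
# Minimal sketch of the prediction game loop over a character stream.
# `Predictor` is a hypothetical interface; the real system's hiding policy,
# context size, and update rule are not specified here.

def play(stream, predictor, context_size=8):
    correct = 0
    for i in range(context_size, len(stream)):
        context = stream[i - context_size:i]   # visible context
        hidden = stream[i]                      # hidden part to predict
        ranked = predictor.predict(context)     # ranked candidate predictions
        if ranked and ranked[0] == hidden:
            correct += 1                        # top prediction was right
        predictor.update(context, hidden)       # learn from the revealed answer
    return correct
```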
In a Nutshell
[Diagram: a Prediction System observes a stream (…0011101110000…), repeatedly predicting and then observing & updating. After a while, it predicts not only low-level categories (bits, characters, edges, …) but also higher-level categories, i.e. bigger chunks (e.g. words, digits, phrases, phone numbers, faces, visual objects, home pages, sites, …).]
Example of Games (text)
• .. d?an..
• System predictions (ranked, or assigned probabilities, or..; a toy sketch follows below):
  • “r”
  • “e”
  • “o”
  • …
• I ? my bike to school.
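A toy illustration, not the actual system: ranking candidate characters for the hidden position by simple counts of (context, next character) pairs. The names observe and ranked are hypothetical.

```python
from collections import Counter, defaultdict

# Toy ranking of candidates for a hidden character by (context, char) counts.
counts = defaultdict(Counter)

def observe(context, char):
    counts[context][char] += 1

def ranked(context, k=3):
    return [c for c, _ in counts[context].most_common(k)]

# After observing a few contexts and the characters that followed them:
for ctx, ch in [("d", "e"), ("d", "e"), ("d", "r"), ("d", "o")]:
    observe(ctx, ch)

print(ranked("d"))   # ['e', 'r', 'o']
```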
Categories
• Building blocks of intelligence?
• Patterns that frequently occur
  • External
  • Internal..
• Useful for predicting other categories!
• They can have structure/regularities (sketched after this list):
1. Composition (~conjunctions) of others
2. Grouping (~disjunctions)
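A minimal sketch of these two kinds of structure, using hypothetical class names: a composition is roughly a conjunction of parts occurring together in order, and a grouping is roughly a disjunction of interchangeable members.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Primitive:
    symbol: str                       # e.g. a single character

@dataclass(frozen=True)
class Composition:                    # ~conjunction: all parts, in sequence
    parts: Tuple["Category", ...]

@dataclass(frozen=True)
class Grouping:                       # ~disjunction: any member can stand in
    members: Tuple["Category", ...]

Category = Union[Primitive, Composition, Grouping]

# Example: the word "an" as a composition of characters, and a grouping of
# two words that could fill the blank in "I ? my bike to school."
an = Composition((Primitive("a"), Primitive("n")))
rode = Composition(tuple(Primitive(c) for c in "rode"))
ride = Composition(tuple(Primitive(c) for c in "ride"))
rode_or_ride = Grouping((rode, ride))
```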
Categories
• Low-level examples: 0 and 1, or characters
  • Provided to the system
• Higher levels:
  • Sequence of k bits
  • Words
  • Phrases
  • Regular expressions
  • Phone number, contact info, resume, ...
Prediction Objective
• Desirable: learn higher level categories (bigger/abstract categories are useful externally)
• Question: how does this relate to improving predictions?
1. Higher level categories improve “context” and can save memory
2. Bigger categories save time in playing the game (categories are atomic)
Goal (evaluation criterion)
• Number of bits (characters) correctly predicted per unit time (or per prediction action); see the sketch below
• Subject to constraints (space, time,..)
• How about entropy/perplexity? Categories are structured..
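One way to make the criterion concrete, a sketch under the simplifying assumptions that each prediction action proposes a (possibly multi-character) string and is credited with the length of its correct prefix; this is not the author's exact procedure, and the predictor interface is hypothetical.

```python
import time

def evaluate(stream, predictor):
    """Characters correctly predicted per prediction action and per second."""
    pos, actions, correct_chars = 0, 0, 0
    start = time.time()
    while pos < len(stream):
        guess = predictor.predict(stream[:pos]) or "?"   # hypothetical interface
        actions += 1
        matched = 0
        while (matched < len(guess) and pos + matched < len(stream)
               and guess[matched] == stream[pos + matched]):
            matched += 1                 # credit the correctly predicted prefix
        correct_chars += matched
        pos += len(guess)                # simplification: move past the guess
    elapsed = time.time() - start
    return correct_chars / max(actions, 1), correct_chars / max(elapsed, 1e-9)
```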
Desiderata/Challenges/Issues
• Lots of data!
• Efficiency: space and time!
• Noise:
  • Statistical insignificance
  • Significance, but only for a short time..
• Variety (need for abstraction)
• Drift (e.g. developments within the system)
• These motivate (primarily) online algorithms/systems
Desiderata/Challenges
• Why the need for “systems”?
  • Multiple algorithms/parts needed
  • Persistence
• Long term learning: how can we make sure noise/errors do not accumulate?
• Control of the input stream..
Why Now?
• Learning many categories is possible/efficient!
  • Online
  • Noise tolerant
• Expectation: other problems are solvable..
Preliminary Report
• Work in Progress!
• Plays the game in text
• Begins at character level
• No segmentation, just a stream
• Makes and predicts larger sequences (composition)
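An illustrative sketch only, not the algorithm from the talk: one simple way to build larger sequences from an unsegmented character stream is to repeatedly merge adjacent units that co-occur frequently.

```python
from collections import Counter

def compose(stream, passes=3, min_count=5):
    units = list(stream)                           # start at the character level
    for _ in range(passes):
        pair_counts = Counter(zip(units, units[1:]))
        frequent = {p for p, c in pair_counts.items() if c >= min_count}
        merged, i = [], 0
        while i < len(units):
            if i + 1 < len(units) and (units[i], units[i + 1]) in frequent:
                merged.append(units[i] + units[i + 1])   # a new, bigger chunk
                i += 2
            else:
                merged.append(units[i])
                i += 1
        units = merged
    return units

# e.g. compose("the cat sat on the mat " * 20) merges characters into
# progressively larger chunks across passes.
```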
Preliminary Observations
• Ran on Reuters RCV1 (text body) (simply zcat dir/file*)
• 800k articles
• >= 150 million learning/prediction episodes
• Over 10 million categories built
• 3-4 hours per pass
Observations
• Performance on held-out data (one of the Reuters files):
  • 8-9 characters long to predict, on average
  • Almost two characters correct on average, per prediction action
• Can overfit/memorize! (long categories)
• Current: stop category generation in first pass
Current/Future
• Much work:
  • Learn groupings
  • Recognize/use “syntactic” categories?
  • Is the prediction objective OK?
  • Category generation.. what’s a good method?
• Compare: language modeling, etc
Much Related Work!
• Online learning, clustering, deep learning, Bayesian methods, hierarchical learning
• Importance of predictions (“On Intelligence”, “natural computations”)
• Models of neocortex (“circuits of the mind”), concepts (“big book of concepts”)
• Cumulative learning, neural nets, compression, learning an index of categories!