Georgios’s Visions (interactive learning representations)
![Page 1: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/1.jpg)
Georgios’s Visions (interactive learning representations)
MIT CSAIL
HHMM…
![Page 2: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/2.jpg)
Ed Wood (characterized as the worst filmmaker ever)
"Home? I have no home. Hunted, despised, living like an animal! The jungle is my home. But I will show the world that I can be its master! I will perfect my own race of people. A race of atomic supermen which will conquer the world!"
![Page 3: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/3.jpg)
Why?
Learning from delayed reward is hopeless (in my opinion)
Supervised learning is impractical
Humans and animals live in societies
Need something above RL and below supervised learning
![Page 4: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/4.jpg)
Possible Titles
Social learning
Interactive learning
Learning to communicate
Classroom learning
Competitive learning
Do what I mean, not what I say
What do you mean?
Let’s talk
Robot apprentices
Searching for the right representations
![Page 5: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/5.jpg)
Final Product
Observations, Actions, Rewards, State modification
Erik’s representation
Pavlov’s representation
Georgios’s representation
PHYSICAL ENVIRONMENT
![Page 6: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/6.jpg)
Obstacles
A mathematical framework for interactive learning (reward shaping?)
What are objects (sensory, motor sequences)?
How do they relate to each other? What are the representations (atomic, propositional, first-order)?
![Page 7: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/7.jpg)
Example Systems
A robot that learns to navigate by interaction with a human trainer
A personalized web agent (active information extraction)
Personal assistants (office)
![Page 8: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/8.jpg)
Tools & Concepts
H-POMDPs?
What is missing? Dynamic abstractions (structure learning)
Teleological abstractions
Relational structure
Factorization (hierarchical reuse)
Multiagency / concurrency
![Page 9: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/9.jpg)
Grounded Projects
Other H-POMDP applications
Model reduction in POMDPs with macros
Structure learning of H-POMDPs
Theoretical localization results in grid-worlds with structure
Mathematical framework for interactive learning
Efficient algorithms for learning stochastic models
![Page 10: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/10.jpg)
Other H-POMDP Applications
Passive “hierarchical” HMM applications:
Policy recognition (AMM) (Hung Bui)
Video structure discovery (HHMM) (Lexing Xie)
Human activity recognition (Nuria Oliver)
Emotion recognition (multi-level HMM) (Ira Cohen)
Natural English text & cursive handwriting (HHMM) (Fine)
Information extraction (HHMM) (Skounakis)
Active recognition/learning:
Active object detection/recognition (RL) (Lucas Paletta)
Selective perception policies for guiding sensing (layered HMM) (Nuria Oliver, Eric Horvitz)
Active learning of HMMs (Tobias Scheffer)
What can we do? (active learning?) (active recognition == POMDP planning?)
Recognition of office activity
Active recognition of office activity
Active learning of model parameters
![Page 11: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/11.jpg)
POMDPs & Macro-Actions
Model-based RL over a dynamic grid abstraction in belief space with macro-actions (NIPS 2003):
Consider only the needed part of belief space
Learn faster than with primitive actions alone
Ability to do information gathering
What’s next?
A new minimized POMDP other than the belief state representation (PSRs? Non-linear dimensionality reductions? Smaller HMMs?)
Other domains
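As a reminder of what "belief space" means here, the following is a minimal sketch of the standard Bayes-filter belief update for a discrete POMDP. It is not the NIPS 2003 algorithm. The model layout (`T`, `O`), the two-state "listen" example, and the treatment of a macro-action as a fixed sequence of primitive actions are all simplifying assumptions made for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step for a discrete POMDP.

    belief: list of probabilities over states
    T[a][s][s2]: probability of moving from s to s2 under action a (assumed layout)
    O[a][s2][o]: probability of observing o after reaching s2 via a (assumed layout)
    """
    n = len(belief)
    # Prediction: push the belief through the transition model.
    predicted = [sum(belief[s] * T[action][s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction: weight each state by the likelihood of the observation.
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


def run_macro(belief, macro, observations, T, O):
    """Apply a macro-action, assumed here to be a fixed sequence of primitive actions."""
    for action, observation in zip(macro, observations):
        belief = belief_update(belief, action, observation, T, O)
    return belief


# Hypothetical two-state model: "listen" leaves the state unchanged
# and reports the true state with probability 0.85.
T = {"listen": [[1.0, 0.0], [0.0, 1.0]]}
O = {"listen": [[0.85, 0.15], [0.15, 0.85]]}

b0 = [0.5, 0.5]
b1 = belief_update(b0, "listen", 0, T, O)
b2 = run_macro(b0, ["listen", "listen"], [0, 0], T, O)
```

Each call maps one point in belief space to another, so a macro-action moves the belief several steps at once. That is one way to read "consider only the needed part of belief space": only beliefs reachable under the chosen macros ever need to be represented.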
![Page 12: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/12.jpg)
Structure Learning
Natural language approaches:
Sequitur (Nevill-Manning)
Unsupervised language acquisition (Carl G. de Marcken)
Structure learning in graphical models:
Discovering hidden state (X. Boyen)
From data mining:
Bursty and hierarchical structure in streams (Jon Kleinberg)
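To give a feel for how hierarchical structure can be pulled out of a flat sequence, here is a rough sketch of greedy digram replacement. It is a simplified offline stand-in, not Sequitur itself, which works incrementally and enforces digram-uniqueness and rule-utility constraints. The function names and the `R0`, `R1`, ... rule labels are hypothetical, and the input is assumed not to contain those labels already.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return (pair, count) for the most common adjacent pair, or (None, 0)."""
    counts = Counter(zip(seq, seq[1:]))
    if not counts:
        return None, 0
    return counts.most_common(1)[0]


def replace_pair(seq, pair, symbol):
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def induce_grammar(seq, min_count=2):
    """Repeatedly replace the most frequent pair with a new rule symbol."""
    seq = list(seq)
    rules = {}
    while True:
        pair, count = most_frequent_pair(seq)
        if pair is None or count < min_count:
            break
        symbol = f"R{len(rules)}"  # hypothetical naming scheme for new rules
        rules[symbol] = pair
        seq = replace_pair(seq, pair, symbol)
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a rule symbol back into terminal symbols."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


compressed, rules = induce_grammar("abcabcabc")
restored = "".join(t for s in compressed for t in expand(s, rules))
```

The rules form a hierarchy because later rules can refer to earlier ones, and expanding the compressed sequence recovers the original input.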
![Page 13: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/13.jpg)
Localizing in Flat Grid Worlds is NP-hard
In flat POMDPs, finding localization plans that are within a log factor of optimal is NP-hard (Sven Koenig)
Does the same hold for H-POMDPs?
![Page 14: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/14.jpg)
Mathematical Framework for Interactive Learning
[Figure: diagram of an AGENT (Policy), an ENVIRONMENT (T, R, O), and a Supervisor. Arrow labels: Action a, State s, Reward r, z; State s and Reward r each appear twice.]
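One possible reading of the agent, environment, and supervisor loop is that the supervisor adds its own reward signal on top of the environment's. The sketch below illustrates that reading with tabular Q-learning on a made-up corridor world. The environment, the supervisor's shaping term (the form gamma * phi(s') - phi(s) with the state index as potential), and all hyperparameters are assumptions made for illustration, not the framework the slide proposes.

```python
import random


def chain_step(state, action, n_states):
    """Hypothetical corridor: action is -1 or +1; reward 1 on reaching the right end."""
    next_state = min(max(state + action, 0), n_states - 1)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def supervisor_reward(state, next_state, gamma):
    """Hypothetical supervisor signal: gamma * phi(s') - phi(s) with phi(s) = s."""
    return gamma * next_state - state


def train(n_states=6, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1,
          seed=0, use_supervisor=True):
    rng = random.Random(seed)
    actions = (-1, 1)
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = chain_step(state, action, n_states)
            if use_supervisor:
                # The supervisor's signal is simply added to the environment reward.
                reward += supervisor_reward(state, next_state, gamma)
            best_next = max(q[(next_state, a)] for a in actions)
            target = reward if done else reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            steps += 1
    return q


q = train()
# Greedy action in each non-terminal state after training.
greedy_policy = [max((-1, 1), key=lambda a: q[(s, a)]) for s in range(5)]
```

Calling `train(use_supervisor=False)` runs the same loop on the sparse environment reward alone, which gives a simple baseline for comparing learning with and without the supervisor.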
![Page 15: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/15.jpg)
Interactive Learning Literature
Programmable RL agents (David Andre)
Principled methods for advising RL agents (Garrison Cottrell)
Machine discovery of effective admissible heuristics (Armand E. Prieditis)
Supervised learning combined with an actor-critic architecture (Michael Rosenstein)
Shaping in RL by changing the physics of the problem (Jette Randløv)
What if the teacher needs to learn too?
![Page 16: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/16.jpg)
Efficient Learning Algorithms for Models of Stochastic Processes
Parameter learning in graphical models is inefficient (structure learning impractical)
Can we do better?
Train the model where it needs to be trained
Do informed searching when learning structure
![Page 17: Georgios’s Visions ( interactive learning representations )](https://reader035.vdocuments.net/reader035/viewer/2022062315/5681553b550346895dc30fd5/html5/thumbnails/17.jpg)
Conclusions
Big results require big ambitions
To make progress towards AI, we need to make learning and planning more interactive
This will keep me busy for a while