CS 188: Artificial IntelligenceSpring 2007
Lecture 29: Post-midterm course review
5/8/2007
Srini Narayanan – ICSI and UC Berkeley
Final Exam
8:10 to 11 AM on 5/15/2007 at 50 Birge. The final prep page is up; the exam includes all topics (see the page).
Weighted toward post-midterm topics.
Two double-sided cheat sheets are allowed, as is a calculator.
Final exam review: Thursday, 4 PM, 306 Soda.
Utility-Based Agents
Today
Review of post-midterm topics relevant for the final:
Reasoning about time: Markov models, HMMs, the forward algorithm, the Viterbi algorithm
Classification: Naïve Bayes, perceptron
Reinforcement learning: MDPs, value iteration, policy iteration, TD value learning, Q-learning
Advanced topics: applications to NLP
Questions
What is the basic conditional independence assertion for Markov models?
What is a problem with Markov models for prediction far into the future?
What are the basic CI assertions for HMMs? How do the inference algorithms (the forward algorithm, the Viterbi algorithm) exploit those CI assertions?
Markov Models
A Markov model is a chain-structured BN: each node is identically distributed (stationarity), and the value of X at a given time is called the state. As a BN:

X1 → X2 → X3 → X4

Parameters: the transition probabilities (or dynamics) specify how the state evolves over time (plus the initial state probabilities).
Conditional Independence
Basic conditional independence: the past and the future are independent given the present; each time step depends only on the previous one. This is called the (first-order) Markov property.
Note that the chain is just a (growing) BN; we can always use generic BN reasoning on it (if we truncate the chain).

X1 → X2 → X3 → X4
Example
Whether we start from an initial observation of sun or an initial observation of rain, the successive distributions P(X1), P(X2), P(X3), … converge to the same stationary distribution P(X∞).
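The convergence above can be checked with a few lines of Python (a mini-forward sketch; the transition numbers below are hypothetical stand-ins for the slide's sun/rain example):

```python
# Mini-forward algorithm: push a belief over states through the
# transition model to get P(X_t) for increasing t.
# Transition probabilities are illustrative, not from the slides.
T = {"sun": {"sun": 0.9, "rain": 0.1},
     "rain": {"sun": 0.3, "rain": 0.7}}

def step(belief, T):
    """One step: P(X_{t+1} = x2) = sum_x1 P(x2 | x1) P(X_t = x1)."""
    return {x2: sum(belief[x1] * T[x1][x2] for x1 in belief)
            for x2 in T}

belief = {"sun": 1.0, "rain": 0.0}   # initial observation: sun
for _ in range(100):                  # iterate toward the stationary distribution
    belief = step(belief, T)
```

With these numbers the chain mixes quickly: after many steps, `belief` is (essentially) the stationary distribution, regardless of the initial state.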
Hidden Markov Models
Markov chains are not so useful for most agents: eventually you don't know anything anymore. We need observations to update our beliefs.
Hidden Markov models (HMMs): an underlying Markov chain over states S, and you observe outputs (effects) at each time step. As a Bayes' net:

X1 → X2 → X3 → X4 → X5, with an observed emission Et below each hidden state Xt (E1 … E5)
Example
An HMM is defined by: an initial distribution P(X1), transitions P(Xt | Xt−1), and emissions P(Et | Xt).
Conditional Independence
HMMs have two important independence properties: the hidden process is Markov (the future depends on the past via the present), and the current observation is independent of everything else given the current state.
Quiz: does this mean that observations are independent given no evidence? [No: they are correlated by the hidden state.]

(Same HMM Bayes' net as before: X1 → … → X5 with emissions E1 … E5.)
Forward Algorithm
We can ask the same questions for HMMs as for Markov chains. Given the current belief state, how do we update it with evidence?
This is called monitoring or filtering.
Formally, we want P(Xt | e1:t).

(Same HMM Bayes' net as before: X1 → … → X5 with emissions E1 … E5.)
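A minimal sketch of the filtering update in Python, assuming the model is given as dictionaries (the names and data layout are illustrative, not the lecture's own code):

```python
def forward(prior, T, E, observations):
    """Filtering: maintain B(x) = P(x_t | e_1:t) as evidence arrives.
    T[x1][x2] is a transition prob, E[x][e] an emission prob."""
    B = dict(prior)
    for e in observations:
        # time update: push the belief through the dynamics
        B = {x2: sum(B[x1] * T[x1][x2] for x1 in B) for x2 in B}
        # observation update: weight by the emission likelihood, renormalize
        B = {x: p * E[x][e] for x, p in B.items()}
        z = sum(B.values())
        B = {x: p / z for x, p in B.items()}
    return B
```

Each step costs O(|X|²): the CI assertions let us carry a belief over the current state instead of a growing joint over all states so far.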
Viterbi Algorithm

Question: what is the most likely state sequence given the observations? Slow answer: enumerate all possibilities. Better answer: a cached incremental version.

(Same HMM Bayes' net as before: X1 → … → X5 with emissions E1 … E5.)
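The "cached incremental version" can be sketched as follows, assuming the same dictionary-based model as the filtering sketch (log space avoids underflow; all names are placeholders):

```python
import math

def viterbi(prior, T, E, observations):
    """Most likely state sequence given the observations."""
    states = list(prior)
    # delta[x]: log prob of the best path ending in state x
    delta = {x: math.log(prior[x]) + math.log(E[x][observations[0]])
             for x in states}
    back = []   # back-pointers, one dict per time step
    for e in observations[1:]:
        ptr, new = {}, {}
        for x2 in states:
            best = max(states, key=lambda x1: delta[x1] + math.log(T[x1][x2]))
            ptr[x2] = best
            new[x2] = delta[best] + math.log(T[best][x2]) + math.log(E[x2][e])
        back.append(ptr)
        delta = new
    # recover the sequence by following back-pointers from the best final state
    path = [max(states, key=lambda x: delta[x])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Like filtering, each step maximizes over only the previous state, so the whole computation is O(T·|X|²) instead of enumerating |X|^T sequences.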
Classification
Supervised models:
  Generative models: Naïve Bayes
  Discriminative models: Perceptron
Unsupervised models: k-means, agglomerative clustering
Parameter estimation
What are the parameters for Naïve Bayes?
What is Maximum Likelihood estimation for NB?
What are the problems with ML estimates?
General Naïve Bayes
A general naïve Bayes model:

C → E1, E2, …, En

We only specify how each feature depends on the class, so the total number of parameters is linear in n: |C| parameters for the prior P(C) and n × |E| × |C| parameters for the conditionals P(Ei | C), versus |C| × |E|^n parameters for the full joint.
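With those parameters, classification is just an argmax over classes. A sketch in log space (the data layout and the example numbers in the usage are made up for illustration):

```python
import math

def nb_classify(prior, cond, features):
    """Naive Bayes prediction: argmax_c P(c) * prod_i P(f_i | c).
    prior[c] = P(c); cond[c][i][f] = P(feature i takes value f | c).
    Computed in log space for numerical stability."""
    def score(c):
        return math.log(prior[c]) + sum(math.log(cond[c][i][f])
                                        for i, f in enumerate(features))
    return max(prior, key=score)
```

For example, with a hypothetical one-feature spam model where P(spam) = 0.4, P("free" | spam) = 0.8, and P("free" | ham) = 0.1, the word "free" is classified as spam.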
Estimation: Smoothing
Problems with maximum likelihood (relative frequency) estimates: if I flip a coin once and it's heads, what's the estimate for P(heads)? What if I flip 10 times with 8 heads? What if I flip 10M times with 8M heads?
Basic idea: we have some prior expectation about parameters (here, the probability of heads). Given little evidence, we should skew towards our prior; given a lot of evidence, we should listen to the data.
Estimation: Laplace Smoothing
Laplace's estimate (extended): pretend you saw every outcome k extra times:

P(x) = (count(x) + k) / (N + k|X|)

What's Laplace with k = 0? (Maximum likelihood.) k is the strength of the prior.
Laplace for conditionals: smooth each condition independently:

P(x | y) = (count(x, y) + k) / (count(y) + k|X|)

(Example data: H H T)
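The estimate above is a one-liner; the sketch below (illustrative names) shows how k = 0 recovers maximum likelihood:

```python
def laplace(counts, k, domain_size):
    """Laplace-smoothed estimate: P(x) = (count(x) + k) / (N + k * |X|).
    k = 0 gives the maximum likelihood (relative frequency) estimate."""
    n = sum(counts.values())
    return {x: (c + k) / (n + k * domain_size) for x, c in counts.items()}
```

On the example data H H T with k = 1, P(H) = (2 + 1)/(3 + 2) = 0.6: pulled toward the uniform prior relative to the ML estimate of 2/3.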
Types of Supervised classifiers
Generative Models Naïve Bayes
Discriminative Models Perceptron
Questions
What is a binary threshold perceptron?
How can we make a multi-class perceptron?
What sorts of patterns can perceptrons classify correctly?
The Binary Perceptron
Inputs are features; each feature has a weight; the sum is the activation.
If the activation is positive, output 1; if negative, output 0.

(Diagram: inputs f1, f2, f3 weighted by w1, w2, w3, summed, and compared against 0.)
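The threshold unit and its mistake-driven training rule fit in a few lines (a sketch with illustrative names, using the 1/0 output convention from the slide):

```python
def perceptron_output(weights, features):
    """Binary threshold unit: output 1 if w . f > 0, else 0."""
    activation = sum(w * f for w, f in zip(weights, features))
    return 1 if activation > 0 else 0

def perceptron_update(weights, features, label):
    """On a mistake, add the feature vector for a missed positive,
    subtract it for a false positive; otherwise leave w unchanged."""
    pred = perceptron_output(weights, features)
    if pred != label:
        sign = 1 if label == 1 else -1
        weights = [w + sign * f for w, f in zip(weights, features)]
    return weights
```

Running the update over the training data repeatedly converges whenever the data are linearly separable.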
The Multiclass Perceptron
If we have more than two classes: have a weight vector for each class, calculate an activation for each class, and the highest activation wins.
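A sketch of the multiclass rule (names are placeholders): prediction is an argmax of per-class activations, and on a mistake the true class's weights are raised while the wrongly predicted class's are lowered.

```python
def multiclass_predict(weight_vectors, features):
    """One weight vector per class; the highest activation wins."""
    def activation(c):
        return sum(w * f for w, f in zip(weight_vectors[c], features))
    return max(weight_vectors, key=activation)

def multiclass_update(weight_vectors, features, label):
    """Mistake-driven update: add f to the true class, subtract from the guess."""
    guess = multiclass_predict(weight_vectors, features)
    if guess != label:
        weight_vectors[label] = [w + f for w, f
                                 in zip(weight_vectors[label], features)]
        weight_vectors[guess] = [w - f for w, f
                                 in zip(weight_vectors[guess], features)]
    return weight_vectors
```

With two classes this reduces to the binary rule, since only the difference of the two weight vectors matters.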
Linear Separators
Binary classification can be viewed as the task of separating classes in feature space:

w · x = 0 (the decision boundary); w · x > 0 (one side); w · x < 0 (the other)
Feature design
Can we design features f1 and f2 that let a perceptron separate the two classes?
MDP and Reinforcement Learning
What is an MDP (basics)?
What is Bellman's equation, and how is it used in value iteration?
What is reinforcement learning? TD value learning, Q-learning, exploration vs. exploitation.
Markov Decision Processes

Markov decision processes (MDPs) have:
A set of states s ∈ S
A model T(s, a, s′) = P(s′ | s, a): the probability that action a in state s leads to s′
A reward function R(s, a, s′) (sometimes just R(s) for leaving a state, or R(s′) for entering one)
A start state (or distribution), and maybe a terminal state

MDPs are the simplest case of reinforcement learning; in general reinforcement learning, we don't know the model or the reward function.
Bellman’s Equation for Selecting actions
The definition of utility leads to a simple relationship amongst optimal utility values: the optimal reward is obtained by maximizing over the first action and then following the optimal policy.
Formally, Bellman's equation:

V*(s) = max_a Σ_s′ T(s, a, s′) [R(s, a, s′) + γ V*(s′)]

"That's my equation!"
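Bellman's equation turns directly into the value-iteration update. The sketch below assumes the MDP is given as functions; all names and the tiny test MDP are placeholders, not from the slides:

```python
def value_iteration(states, actions, T, R, gamma, iters):
    """Repeatedly apply the Bellman update:
    V(s) <- max_a sum_s' T(s, a, s') * (R(s, a, s') + gamma * V(s'))."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        new_V = {}
        for s in states:
            new_V[s] = max(
                sum(p * (R(s, a, s2) + gamma * V[s2])
                    for s2, p in T(s, a).items())   # T(s, a) -> {s': prob}
                for a in actions(s))
        V = new_V
    return V
```

Each sweep is a contraction (for γ < 1), so the values converge to V* regardless of initialization.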
Elements of RL
Transition model T: how an action influences states. Reward R: the immediate value of a state-action transition. Policy π: maps states to actions.

(Diagram: the agent-environment loop. The agent observes the state and reward from the environment and chooses an action via its policy, producing a trajectory s0, a0, r0, s1, a1, r1, s2, …)
MDPs
Which of the following are true?

(Options A through E were shown as figures.)
Reinforcement Learning
What’s wrong with the following agents?
Model-Free Learning
Big idea: why bother learning T? Update each time we experience a transition; frequent outcomes will contribute more updates (over time).
Temporal difference learning (TD): the policy is still fixed! Move values toward the value of whatever successor occurs.

(Backup diagram: state s, action a, the pair (s, a), then outcome s′.)
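The TD update is a single exponential-moving-average step. A sketch (names are illustrative) of moving V(s) toward the observed sample r + γ V(s′):

```python
def td_update(V, s, r, s_next, alpha, gamma):
    """TD value learning for a fixed policy:
    V(s) <- (1 - alpha) * V(s) + alpha * (r + gamma * V(s'))."""
    sample = r + gamma * V[s_next]
    V[s] = (1 - alpha) * V[s] + alpha * sample
    return V
```

Because updates happen per experienced transition, frequent successors pull V(s) harder, which is exactly how the model T gets absorbed without being learned explicitly.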
Problems with TD Value Learning
TD value learning is model-free for policy evaluation. However, if we want to turn our value estimates into a policy, we're sunk: choosing the best action requires the model.
Idea: learn state-action pairings (Q-values) directly. This makes action selection model-free too!

(Backup diagram: state s, action a, the pair (s, a), then outcome s′.)
Q-Learning
Learn Q*(s, a) values:
Receive a sample (s, a, s′, r) (selecting a using ε-greedy).
Consider your old estimate Q(s, a) and your new sample estimate r + γ max_a′ Q(s′, a′).
Nudge the old estimate towards the new sample:

Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max_a′ Q(s′, a′))

Set s = s′; repeat until s is terminal.
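The two pieces of the loop above, sketched with illustrative names (Q is a dict keyed by (state, action) pairs):

```python
import random

def q_update(Q, s, a, r, s_next, all_actions, alpha, gamma):
    """Q-learning update:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))."""
    sample = r + gamma * max(Q[(s_next, a2)] for a2 in all_actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
    return Q

def epsilon_greedy(Q, s, all_actions, eps):
    """Exploration vs. exploitation: random action with probability eps,
    otherwise the action with the highest current Q estimate."""
    if random.random() < eps:
        return random.choice(all_actions)
    return max(all_actions, key=lambda a: Q[(s, a)])
```

Note the max over a′ in the sample: Q-learning learns about the optimal policy even while the ε-greedy behavior policy explores (off-policy learning).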
Applications to NLP
How can generative models play a role in MT, speech, and NLP?
What are three kinds of ambiguities often found in language?
NLP Applications of Bayes' Rule!!

Handwriting recognition: P(text | strokes) ∝ P(text) · P(strokes | text)
Spelling correction: P(text | typos) ∝ P(text) · P(typos | text)
OCR: P(text | image) ∝ P(text) · P(image | text)
MT: P(english | french) ∝ P(english) · P(french | english)
Speech recognition: P(text | sound) ∝ P(text) · P(sound | text), where the prior P(text) is the language model (LM)
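All of these share the same noisy-channel shape: score each candidate by prior times likelihood and take the argmax. A toy sketch (the candidate list, prior, and likelihood function are entirely hypothetical):

```python
def noisy_channel_best(candidates, prior, likelihood, observed):
    """Pick argmax over candidates of P(text) * P(observed | text).
    The normalizer P(observed) is constant, so it can be dropped."""
    return max(candidates, key=lambda t: prior[t] * likelihood(observed, t))
```

For spelling correction, `candidates` would be dictionary words near the typo, `prior` a language model, and `likelihood` an error (typo) model.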
Ambiguities
Headlines:
Iraqi Head Seeks Arms
Ban on Nude Dancing on Governor's Desk
Juvenile Court to Try Shooting Defendant
Teacher Strikes Idle Kids
Stolen Painting Found by Tree
Kids Make Nutritious Snacks
Local HS Dropouts Cut in Half
Hospitals Are Sued by 7 Foot Doctors

Why are these funny?
Learning
I hear and I forget
I see and I remember
I do and I understand
attributed to Confucius 551-479 B.C.
Thanks!
And good luck on the final and for the future!
Srini Narayanan
Phase II: Update Means
Move each mean to the average of its assigned points:
This also can only decrease the total distance… (Why?)
Fun fact: the point y with minimum squared Euclidean distance to a set of points {x} is their mean
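The mean-update phase is a per-cluster average. A sketch (names are illustrative; clusters are assumed nonempty):

```python
def update_means(points, assignment, k):
    """Phase II of k-means: move each mean to the average of the points
    assigned to it. points: list of tuples; assignment[i]: cluster of points[i]."""
    means = []
    for j in range(k):
        cluster = [p for p, a in zip(points, assignment) if a == j]
        dim = len(cluster[0])   # assumes every cluster has at least one point
        means.append(tuple(sum(p[d] for p in cluster) / len(cluster)
                           for d in range(dim)))
    return means
```

Because the mean minimizes total squared Euclidean distance to its cluster's points (the "fun fact" above), this phase can only decrease the k-means objective, just like the assignment phase.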