CS 415 – A.I. Slide Set 12
TRANSCRIPT
Chapter 5 – Stochastic Learning
Heuristics – apply to problems that either don’t have an exact solution, or whose state spaces are prohibitively large
Stochastic Methodology – also good for these situations
Based on counting the elements of an application domain
Addition and Multiplication Rule
Set A: |A| – the cardinality of A (number of elements)
A could be: empty, finite, countably infinite, or uncountably infinite
U – Universe (a.k.a. Domain): the set of ALL elements that could be in A
A′ – complement of A. Example:
U – people in a room; A – males from U; A′ – females in the room
Permutations and Combinations
Permutation – an arranged sequence of elements of a set (each used only once)
Question: how many unique permutations are there of a set of size n?
n × (n−1) × (n−2) × … × 1 = n!
Question: how many ways can we arrange a set of 10 books on a shelf where only 6 books can fit?
nPr = n!/(n−r)!; here 10P6 = 10!/(10−6)! = 151,200
Combination – any subset of the elements that can be formed
Question: how many combinations are there using all n elements of a set?
Just 1 combination for all n elements – order DOES NOT MATTER
Question: how many combinations taken r at a time? (How many ways can I form a four-person committee from 10 people?)
nCr = n!/(r!(n−r)!); here 10C4 = 210 (see the sketch below)
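A quick check of these counts (a minimal sketch; math.perm and math.comb require Python 3.8+):

```python
from math import comb, factorial, perm

print(factorial(10))   # 3628800: all orderings of 10 distinct books
print(perm(10, 6))     # 151200:  10 books, shelf holds only 6 (10P6)
print(comb(10, 4))     # 210:     four-person committees from 10 people (10C4)
```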
Examples
What is the probability of rolling a 7 or 11 with two fair dice?
Sample Space Size? 36
Event Size? 8
For 7: 1,6; 2,5; 3,4; 4,3; 5,2; 6,1. For 11: 5,6; 6,5
Probability 8/36 = 2/9. Add the two probabilities together because the events are mutually exclusive (they cannot both occur) – verified by enumeration below
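A brute-force check of this count, as a minimal sketch:

```python
from itertools import product

# All 36 equally likely rolls of two fair dice.
rolls = list(product(range(1, 7), repeat=2))
event = [r for r in rolls if sum(r) in (7, 11)]
print(len(event), len(rolls))    # 8 36
print(len(event) / len(rolls))   # 0.2222... = 2/9
```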
How many four-of-a-kind hands can be dealt in all possible five card hands?
Sample space? 52 cards taken 5 at a time: 52C5 = 2,598,960
Event space? 13C1 × 4C4 × 48 = 624
(number of different kinds of cards) × (number of ways to pick all four cards of the same kind) × (number of ways the fifth card can be chosen)
See top of pg. 172: 624 / 2,598,960 ≈ 0.00024 (computed below)
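The same arithmetic in Python:

```python
from math import comb

event = 13 * comb(4, 4) * 48   # 13 ranks * one way to take all four * 48 fifth cards
sample = comb(52, 5)           # all five-card hands
print(event, sample)           # 624 2598960
print(event / sample)          # 0.00024009603841536616
```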
The probability of any event E from the sample space S is: p(E) = |E| / |S| (assuming equally likely outcomes)
The sum of the probabilities of all possible outcomes is 1
The probability of the complement of an event: p(A′) = 1 − p(A)
This is the probability of the contradictory or false outcome of the event
Are Two Events Independent?
Random bit strings of length four; 2 events:
1. String has an even number of ones
2. Bit string ends with a zero
A total of 2^4 = 16 bit strings – 8 strings end with zero – 8 strings have an even number of ones
They are independent events: p(A ∩ B) = 4/16 = 1/4 = p(A) × p(B) (checked below)
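A brute-force independence check over all 16 strings:

```python
from itertools import product

strings = list(product("01", repeat=4))             # all 2^4 = 16 bit strings
a = {s for s in strings if s.count("1") % 2 == 0}   # even number of ones
b = {s for s in strings if s[-1] == "0"}            # ends with zero
p_a, p_b, p_ab = len(a) / 16, len(b) / 16, len(a & b) / 16
print(p_a, p_b, p_ab)      # 0.5 0.5 0.25
print(p_ab == p_a * p_b)   # True -> the events are independent
```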
Probabilistic Inference: Example
3 Boolean random variables, each either true or false:
S – traffic is slowing down
A – there is an accident
C – there is road construction
Given state traffic data in Table 5.1 (next slide). Note: all possibilities sum to 1
Can use these numbers to calculate the probability of a traffic slowdown, the probability of construction without a slowdown, etc. (see the sketch below)
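A sketch of such inference from a joint distribution. The numbers below are illustrative placeholders, NOT the actual values of Table 5.1:

```python
joint = {
    # (slowdown, accident, construction): probability
    (True,  True,  True ): 0.01, (True,  True,  False): 0.02,
    (True,  False, True ): 0.16, (True,  False, False): 0.01,
    (False, True,  True ): 0.01, (False, True,  False): 0.02,
    (False, False, True ): 0.12, (False, False, False): 0.65,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # all possibilities sum to 1

# Marginal: probability of a traffic slowdown.
p_s = sum(p for (s, a, c), p in joint.items() if s)

# Conditional: probability of construction given no slowdown.
p_c_and_not_s = sum(p for (s, a, c), p in joint.items() if c and not s)
print(p_s, p_c_and_not_s / (1 - p_s))   # 0.2 0.1625
```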
Table 5.1 The joint probability distribution for the traffic slowdown, S, accident, A, and construction, C, variables of the example of Section 5.3.2
Fig 5.1 A Venn diagram representation of the probability distributions of Table 5.1; S is traffic slowdown, A is accident, C is construction.
Random Variables: Individual Probability Computation
1. Combinatorial Methods (Analytical). Ex: probability of rolling a 5 on a 6-sided die
2. Sampling Events (Empirical). For times when it isn’t that simple to analyze
Assumptions:
Not all events are equally likely (Easier if they are)
The probability of an event lies between 0 and 1; the rule for the probability of a union of sets still holds
Use “Random Variables”
Random Variable Example
Random variable – Season; domain – {spring, summer, fall, winter}
Discrete random variable: p(Season = spring) = 0.75
Boolean random variable: p(Season = spring) = true
Example: A fair roulette wheel
Integers 0–36, equally spaced. Each player places $1 on any number. If the wheel stops on that number, the player wins $35
Else – loses the $1
Reward of a win: $35, probability 1/37
Cost of a loss: $1, probability 36/37
Ex(E) = 35(1/37) + (−1)(36/37) ≈ −0.027 (computed below)
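The expected-value arithmetic, spelled out:

```python
# Expected value of a $1 bet on one number of a fair 37-slot wheel (0-36).
p_win = 1 / 37
ex = 35 * p_win + (-1) * (36 / 37)
print(round(ex, 3))   # -0.027: the player loses about 2.7 cents per play
```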
Conditional Probability
2 kinds of probabilities:
1. Prior Probabilities
What’s the probability of getting a 2 or a 3 on a fair die?
2. Conditional (Posterior) Probabilities
If a patient has symptoms X, Y, and Z, then what is the probability that he has the flu?
Fig 5.2 A Venn diagram illustrating the calculations of P(d|s) as a function of p(s|d).
The chain rule for two sets: p(A ∩ B) = p(A|B) p(B)
The generalization of the chain rule to multiple sets: p(X_1 ∩ X_2 ∩ … ∩ X_n) = p(X_1) p(X_2|X_1) … p(X_n | X_1 ∩ … ∩ X_{n-1})
We make an inductive argument to prove the chain rule. Consider the nth case: p(X_1 ∩ … ∩ X_n)
We apply the rule for the intersection of two sets to get: p(X_1 ∩ … ∩ X_n) = p(X_n | X_1 ∩ … ∩ X_{n-1}) p(X_1 ∩ … ∩ X_{n-1})
And then reduce again, considering that: p(X_1 ∩ … ∩ X_{n-1}) = p(X_{n-1} | X_1 ∩ … ∩ X_{n-2}) p(X_1 ∩ … ∩ X_{n-2})
Until the base case p(X_1 ∩ X_2) = p(X_2|X_1) p(X_1) is reached, which we have already demonstrated (written out below)
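The same induction written out as a LaTeX derivation (a restatement of the steps above, nothing new assumed):

```latex
% Base case (two sets): p(X_1 \cap X_2) = p(X_2 \mid X_1)\, p(X_1)
\begin{align*}
p(X_1 \cap \dots \cap X_n)
  &= p(X_n \mid X_1 \cap \dots \cap X_{n-1})\, p(X_1 \cap \dots \cap X_{n-1}) \\
  &= p(X_n \mid X_1 \cap \dots \cap X_{n-1})\,
     p(X_{n-1} \mid X_1 \cap \dots \cap X_{n-2})\, p(X_1 \cap \dots \cap X_{n-2}) \\
  &\;\;\vdots \\
  &= p(X_1) \prod_{i=2}^{n} p(X_i \mid X_1 \cap \dots \cap X_{i-1})
\end{align*}
```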
You say [t ow m ey t ow] and I say [t ow m aa t ow]…
– Ira Gershwin, “Let’s Call the Whole Thing Off”
Fig 5.3 A probabilistic finite state acceptor for the pronunciation of “tomato”, adapted from Jurafsky and Martin (2000).
Phoneme Recognition Problem
Use the “tomato”-style stochastic finite state acceptor
Interpret ambiguous collections of phonemes by seeing how well the phonemes match the path through the state machine for this and other words
A phoneme is the smallest structural unit that distinguishes meaning. Phonemes are not the physical segments themselves but, in theoretical terms, cognitive abstractions or categorizations of them
Example: the /t/ sound in the words tip, stand, water, and cat
Phoneme Recognition Problem
Suppose an algorithm has identified the phones [ni], occurring just after other recognized speech: “I”
Need to associate the phones with either a word or the first part of a word
How? Brown corpus
1 million word collection of sentences from 500 texts
Switchboard corpus
1.4 million word collection of phone conversations
Together: ~2.5 million words that let us sample written and spoken words
How to proceed?
Can determine which word containing the phones [ni] is most likely
See Table 5.2. Most likely: “the”
Table 5.2 The ni words with their frequencies and probabilities from the Brown and Switchboard corpora of 2.5M words, adapted from Jurafsky and Martin (2000).
Use Bayes’ theorem: p(word | [ni]) ∝ p([ni] | word) × p(word)
See Table 5.3. Most likely: “new”
But “I new” doesn’t make sense; “I need” does (sketched below)
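A sketch of this ranking. The candidate words follow the example, but the probabilities below are illustrative placeholders, NOT the actual Table 5.2/5.3 corpus values:

```python
candidates = {
    # word: (p([ni] | word), p(word)) -- placeholder values
    "new":  (0.36, 0.001),
    "the":  (0.0024, 0.046),
    "need": (0.11, 0.00056),
    "knee": (1.0, 0.000024),
}
scores = {w: lik * prior for w, (lik, prior) in candidates.items()}
for w in sorted(scores, key=scores.get, reverse=True):
    print(w, scores[w])
# With these numbers "new" wins, illustrating why higher-level context
# ("I need" vs. "I new") is still required to pick the right word.
```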
Table 5.3 The ni phone/word probabilities from the Brown and Switchboard corpora (Jurafsky and Martin, 2000).
Bayes’ Theorem
Review: one disease and one symptom
Individual hypotheses h_i, each disjoint from the others
Set of hypotheses H; set of evidence E
p(h_i|E) = (p(E|h_i) × p(h_i)) / p(E)
Can use this to determine which hypothesis is strongest given E
Drop the denominator (it is the same for every h_i) to get the arg max (maximum likelihood) hypothesis: h_ML = arg max_i p(E|h_i) p(h_i)
Finding p(E)
Given: the entire space is partitioned by the set of hypotheses h_i
Partition of a set = split of the set into disjoint subsets
p(E) = Σ_i p(E|h_i) p(h_i) (see the sketch below)
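A minimal Python sketch of both ideas; the hypothesis names and all probabilities are hypothetical placeholders:

```python
hypotheses = {
    # h_i: (p(E | h_i), p(h_i))
    "h1": (0.8, 0.1),
    "h2": (0.3, 0.6),
    "h3": (0.1, 0.3),
}

# Total probability: p(E) = sum_i p(E|h_i) p(h_i), valid because the h_i
# partition the space.
p_e = sum(lik * prior for lik, prior in hypotheses.values())

# Maximum likelihood hypothesis: p(E) is the same for every h_i, so the
# denominator can be dropped when only the arg max is needed.
h_ml = max(hypotheses, key=lambda h: hypotheses[h][0] * hypotheses[h][1])
print(p_e)   # 0.29
print(h_ml)  # h2  (0.3 * 0.6 = 0.18 beats 0.08 and 0.03)
```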
Bayes’ Theorem: General Form
The general form of Bayes’ theorem, where we assume the set of hypotheses H partitions the evidence set E:
p(h_i|E) = p(E|h_i) p(h_i) / Σ_k p(E|h_k) p(h_k)
Example
Suppose you want to buy a car. Let d_1 be the event of going to dealer 1 and a_1 the event of purchasing a car at dealer 1
Necessary for using Bayes’:
All prior probabilities of the various h_i must be known
All relationships between evidence and hypotheses, {p(E|h_i)}, must be known
The application of Bayes’ rule to the car purchase problem: p(d_1|a_1) = p(a_1|d_1) p(d_1) / Σ_i p(a_1|d_i) p(d_i) (see the sketch below)
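A sketch of this application; the dealer priors p(d_i) and purchase likelihoods p(a_1 | d_i) below are hypothetical placeholders, not numbers from the text:

```python
priors = {"d1": 0.2, "d2": 0.4, "d3": 0.4}        # p(d_i); must sum to 1
likelihoods = {"d1": 0.7, "d2": 0.3, "d3": 0.5}   # p(a1 | d_i)

# p(a1) by total probability over the dealer partition.
p_a1 = sum(likelihoods[d] * priors[d] for d in priors)

# Posterior probability that the purchase happened at dealer 1.
p_d1_given_a1 = likelihoods["d1"] * priors["d1"] / p_a1
print(round(p_d1_given_a1, 3))   # 0.304
```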