Probability of random events. Random event. Statistical and classic determination of probability of random events. Set-theoretic consideration of random events.

Upload: irma-lydia-daniel

Post on 21-Jan-2016

History

• Games of chance: 300 BC

• 1565: first formalizations

• 1654: Fermat & Pascal, conditional probability

• Reverend Bayes: 1750s

• 1933: Kolmogorov, axiomatic approach

• Objectivists vs subjectivists

– (frequentists vs Bayesians)

• Frequentists build one model

• Bayesians use all possible models, with priors

Concerns

• Future: what is the likelihood that a student will get a CS job given his grades?

• Current: what is the likelihood that a person has cancer given his symptoms?

• Past: what is the likelihood that Marilyn Monroe committed suicide?

• Combining evidence.

• Always: Representation & Inference

Basic Idea

• Attach degrees of belief to propositions.

• Theorem: Probability theory is the best way to do this.

– If someone does it differently, you can play a game with him and win his money.

• Unlike logic, probability theory is non-monotonic.

• Additional evidence can lower or raise belief in a proposition.

Probability Models: Basic Questions

• What are they?

– Analogous to constraint models, with probabilities on each table entry

• How can we use them to make inferences?

– Probability theory

• How does new evidence change inferences?

– Non-monotonic problem solved

• How can we acquire them?

– Experts for model structure, hill-climbing for parameters

Discrete Probability Model

• Set of random variables V1, V2, …, Vn

• Each RV has a discrete set of values

• Joint probability known or computable

• For all vi in domain(Vi), Prob(V1=v1, V2=v2, …, Vn=vn) is known and non-negative, and these probabilities sum to 1 over all combinations of values.

Random Variable

• Intuition: A variable whose value belongs to a known set of values, the domain.

• Math: its distribution is a non-negative function on the domain (called the sample space) whose values sum to 1.

• Boolean RV: John has a cavity.

– cavity domain = {true, false}

• Discrete RV: Weather Condition

– wc domain = {snowy, rainy, cloudy, sunny}

• Continuous RV: John’s height

– John’s height domain = {positive real numbers}

Cross-Product RV

• If X is an RV with values x1, …, xn and

– Y is an RV with values y1, …, ym, then

– Z = X x Y is an RV with n*m values <x1,y1>, …, <xn,ym>

• This will be very useful!

• This does not mean P(X,Y) = P(X)*P(Y).

Discrete Probability Distribution

• If a discrete RV X has values v1, …, vn, then a prob distribution for X is a non-negative real-valued function p such that sum p(vi) = 1.

• This is just a (normalized) histogram.

• Example: a coin is flipped 10 times and heads occur 6 times.

• What is the best probability model to predict this result?

• Biased coin model: prob head = .6, trials = 10

From Model to Prediction: Use Math or Simulation

• Math: X = number of heads in 10 flips

• P(X = 0) = .4^10

• P(X = 1) = 10 * .6 * .4^9

• P(X = 2) = Comb(10,2) * .6^2 * .4^8, etc.

• Where Comb(n,m) = n! / ((n-m)! * m!).

• Simulation: Do many times: flip coin (p = .6) 10 times, record heads.

• Math is exact, but sometimes too hard.

• Computation is inexact and expensive, but doable.
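The two routes above, exact math and simulation, can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the slides: the function names are made up here, and the random seed is an arbitrary choice, so simulated frequencies will not match any particular table digit for digit.

```python
import random
from math import comb

def binomial_pmf(k, n, p):
    """Exact probability of k heads in n flips of a coin with P(head) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def simulate_heads(n, p, runs, rng):
    """Estimate the head-count distribution from `runs` simulated experiments."""
    counts = [0] * (n + 1)
    for _ in range(runs):
        # One experiment: flip the coin n times and count the heads.
        heads = sum(rng.random() < p for _ in range(n))
        counts[heads] += 1
    return [c / runs for c in counts]

exact = [binomial_pmf(k, 10, 0.6) for k in range(11)]
approx = simulate_heads(10, 0.6, 1000, random.Random(0))  # seed 0 is arbitrary

for k in range(11):
    print(k, round(exact[k], 3), round(approx[k], 3))
```

Raising `runs` should pull the simulated column toward the exact one, at the cost of more computation.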

Heads (p=.6)   Exact    Sim 10   Sim 100   Sim 1000
0              .0001    .0       .0        .0
1              .001     .0       .0        .002
2              .010     .0       .01       .011
3              .042     .0       .04       .042
4              .111     .2       .05       .117
5              .200     .1       .24       .200
6              .250     .6       .22       .246
7              .214     .1       .16       .231
8              .120     .0       .18       .108
9              .040     .0       .09       .035
10             .006     .0       .01       .008

Heads (p=.5)   Exact    Sim 10   Sim 100   Sim 1000
0              .0009    .0       .0        .002
1              .009     .0       .01       .011
2              .043     .0       .07       .044
3              .117     .1       .13       .101
4              .205     .2       .24       .231
5              .246     .0       .28       .218
6              .205     .3       .15       .224
7              .117     .3       .08       .118
8              .043     .1       .04       .046
9              .009     .0       .0        .009
10             .0009    .0       .0        .001

Learning Model: Hill Climbing

• Theoretically it can be shown that p = .6 is the best model.

• Without theory, pick a random p value and simulate. Now try a larger and a smaller p value, and keep whichever explains the data better.

• Maximize P(Data|Model). Get the model that gives the highest probability to the data.

• This approach extends to more complicated models (variables, parameters).
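A minimal hill-climbing sketch for the coin example (6 heads in 10 flips) follows. It makes two simplifying assumptions that are not in the slides: P(Data|Model) is computed exactly with the binomial formula rather than by simulation, and the step size is halved whenever neither neighbor improves. The names `likelihood` and `hill_climb` and the starting value are illustrative only.

```python
from math import comb

def likelihood(p, heads=6, flips=10):
    """P(Data | Model): probability of `heads` heads in `flips` flips if P(head) = p."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

def hill_climb(p=0.2, step=0.1, tol=1e-6):
    """Try a smaller and a larger p; move if it helps, otherwise halve the step."""
    while step > tol:
        candidates = [q for q in (p - step, p + step) if 0.0 <= q <= 1.0]
        best = max(candidates, key=likelihood)
        if likelihood(best) > likelihood(p):
            p = best          # a neighbor explains the data better
        else:
            step /= 2         # no improvement: search more finely
    return p

p_hat = hill_climb()
print(round(p_hat, 4))
```

With a simulated likelihood the comparison between neighbors becomes noisy, so more runs per evaluation would be needed before trusting a move.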

Another Data Set

What’s going on?

Heads   Frequency
0       .34
1       .38
2       .19
3       .05
4       .01
5       .02
6       .08
7       .20
8       .30
9       .26
10      .1

Mixture Model

• Data generated from two simple models

• coin 1: prob of heads = .8

• coin 2: prob of heads = .1

• With prob .5 pick coin 1, otherwise coin 2, and flip.

• Model has more parameters

• Experts are supposed to supply the model.

• Use data to estimate the parameters.
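The mixture can be simulated directly. The sketch below assumes one reading of the slide: for each experiment a coin is picked once (coin 1 with prob .5) and that same coin is then flipped 10 times. The function names, the number of runs, and the seed are illustrative choices.

```python
import random

def sample_mixture(rng, flips=10, p_coin1=0.8, p_coin2=0.1, pick_coin1=0.5):
    """Pick one of the two coins, then count heads in `flips` flips of that coin."""
    p = p_coin1 if rng.random() < pick_coin1 else p_coin2
    return sum(rng.random() < p for _ in range(flips))

def head_count_frequencies(runs, rng, flips=10):
    """Relative frequency of each head count over `runs` experiments."""
    counts = [0] * (flips + 1)
    for _ in range(runs):
        counts[sample_mixture(rng, flips)] += 1
    return [c / runs for c in counts]

freqs = head_count_frequencies(10_000, random.Random(1))  # seed 1 is arbitrary
for heads, f in enumerate(freqs):
    print(heads, round(f, 3))
```

The resulting histogram has two humps, one near 1 head and one near 8 heads, which no single biased coin can reproduce.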

Continuous Probability

• If RV X has values in R, then a prob distribution for X is a non-negative real-valued function p such that the integral of p over R is 1 (called the prob density function).

• Standard distributions include the uniform and the normal (Gaussian); the Poisson is a standard discrete one.

• May resort to an empirical distribution if it can’t be computed analytically, i.e., use a histogram.

Joint Probability: full knowledge

• If X and Y are discrete RVs, then the prob distribution for X x Y is called the joint prob distribution.

• Let x be in domain of X, y in domain of Y.

• If P(X=x,Y=y) = P(X=x)*P(Y=y) for every x and y, then X and Y are independent.

• Standard Shorthand: P(X,Y)=P(X)*P(Y), which means exactly the statement above.

Marginalization

• Given the joint probability for X and Y, you can compute everything.

• Joint probability to individual probabilities.

• P(X=x) is the sum of P(X=x and Y=y) over all y

• Conditioning is similar:

– P(X=x) = sum of P(X=x|Y=y)*P(Y=y) over all y

Marginalization Example

• Compute Prob(X is healthy) from

• P(X healthy & X tests positive) = .1

• P(X healthy & X tests neg) = .8

• P(X healthy) = .1 + .8 = .9

• P(flush) = P(heart flush) + P(spade flush) + P(diamond flush) + P(club flush)

Conditional Probability

• P(X=x | Y=y) = P(X=x, Y=y)/P(Y=y).

• Intuition: use simple examples

• One-card hand: X = value of the card, Y = suit of the card

– P(X=ace | Y=heart) = 1/13

– also P(X=ace, Y=heart) = 1/52

– P(Y=heart) = 1/4

– P(X=ace, Y=heart)/P(Y=heart) = 1/13.
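The one-card example can be checked by brute-force enumeration. The sketch below builds the 52-card deck as the cross product of values and suits and assumes every card is equally likely; the names `DECK` and `prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

VALUES = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
          "jack", "queen", "king"]
SUITS = ["heart", "spade", "diamond", "club"]
DECK = list(product(VALUES, SUITS))  # 52 equally likely (value, suit) outcomes

def prob(event):
    """Probability of an event (a predicate on a card) under a uniform deck."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

p_joint = prob(lambda card: card == ("ace", "heart"))  # P(X=ace, Y=heart)
p_heart = prob(lambda card: card[1] == "heart")        # P(Y=heart)
p_cond = p_joint / p_heart                             # P(X=ace | Y=heart)

print(p_joint, p_heart, p_cond)
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared with 1/13 without rounding concerns.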

Formula

• Shorthand: P(X|Y) = P(X,Y)/P(Y).

• Product Rule: P(X,Y) = P(X |Y) * P(Y)

• Bayes Rule:

– P(X|Y) = P(Y|X)*P(X)/P(Y).

• Remember the abbreviations.

Conditional Example

• P(A = 0) = .7

• P(A = 1) = .3

P(A,B) = P(B,A)

P(B,A)= P(B|A)*P(A)

P(A,B) = P(A|B)*P(B)

P(A|B) = P(B|A)*P(A)/P(B)

B   A   P(B|A)
0   0   .2
0   1   .9
1   0   .8
1   1   .1

Exact and simulated

A   B   P(A,B)   Sim 10   Sim 100   Sim 1000
0   0   .14      .1       .18       .14
0   1   .56      .6       .55       .56
1   0   .27      .2       .24       .24
1   1   .03      .1       .03       .06

Note: Joint yields everything

• Via marginalization

• P(A=0) = P(A=0,B=0) + P(A=0,B=1) = .14 + .56 = .7

• P(B=0) = P(B=0,A=0) + P(B=0,A=1) = .14 + .27 = .41
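The numbers above can be reproduced from P(A) and the table for P(B|A). The sketch below applies the product rule to build the joint, marginalizes to get P(B), and then applies Bayes rule; the dictionary layout and function names are illustrative choices.

```python
# Inputs from the conditional example: P(A) and P(B | A).
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.2, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.1}  # keys are (b, a)

# Product rule: P(A=a, B=b) = P(B=b | A=a) * P(A=a).
joint = {(a, b): p_b_given_a[(b, a)] * p_a[a] for a in (0, 1) for b in (0, 1)}

def marginal_b(b):
    """Marginalization: P(B=b) = sum over a of P(A=a, B=b)."""
    return sum(joint[(a, b)] for a in (0, 1))

def posterior_a(a, b):
    """Bayes rule via the joint: P(A=a | B=b) = P(A=a, B=b) / P(B=b)."""
    return joint[(a, b)] / marginal_b(b)

print(joint)
print(marginal_b(0), posterior_a(0, 0))
```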

Simulation

• Given prob for A and prob for B given A

• First, choose a value for A, according to its prob

• Now use the conditional table to choose a value for B with the correct probability.

• That constructs one world.

• Repeat lots of times and count the number of times A=0 & B=0, A=0 & B=1, etc.

• Turn counts into probabilities.
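The procedure above can be sketched as follows, using P(A=1) = .3 and the P(B|A) table from the conditional example. The constant and function names, the number of runs, and the seed are illustrative assumptions.

```python
import random
from collections import Counter

P_A1 = 0.3                        # P(A = 1)
P_B1_GIVEN_A = {0: 0.8, 1: 0.1}   # P(B = 1 | A = a)

def sample_world(rng):
    """Construct one world: choose A first, then B from the conditional table."""
    a = 1 if rng.random() < P_A1 else 0
    b = 1 if rng.random() < P_B1_GIVEN_A[a] else 0
    return a, b

def estimate_joint(runs, rng):
    """Count how often each (A, B) world occurs and turn counts into probabilities."""
    counts = Counter(sample_world(rng) for _ in range(runs))
    return {(a, b): counts[(a, b)] / runs for a in (0, 1) for b in (0, 1)}

estimate = estimate_joint(100_000, random.Random(0))  # seed 0 is arbitrary
print(estimate)
```

With only 10 or 100 runs the estimates scatter widely around the exact joint, which is the pattern the simulated columns show.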

Consequences of Bayes Rule

• P(X|Y,Z) = P(Y,Z|X)*P(X)/P(Y,Z).

proof: Treat Y&Z as new product RV U

P(X|U) = P(U|X)*P(X)/P(U) by Bayes

• P(X1,X2,X3) = P(X3|X1,X2)*P(X1,X2)

= P(X3|X1,X2)*P(X2|X1)*P(X1) or

• P(X1,X2,X3) = P(X1)*P(X2|X1)*P(X3|X1,X2).

• Note: These equations make no assumptions!

• Last equation is called the Chain or Product Rule

• Can pick any ordering of variables.

Extensions of P(A) +P(~A) = 1

• P(X|Y) + P(~X|Y) = 1

• Semantic Argument

– conditional just restricts worlds

• Syntactic Argument: lhs equals

– P(X,Y)/P(Y) + P(~X,Y)/P(Y) =

– (P(X,Y) + P(~X,Y))/P(Y) = (marginalization)

– P(Y)/P(Y) = 1.

Bayes Rule Example

• Meningitis causes stiff neck (.5).

– P(s|m) = 0.5

• Prior prob of meningitis = 1/50,000.

– p(m) = 1/50,000 = .00002

• Prior prob of stiff neck (1/20).

– p(s) = 1/20.

• Does patient have meningitis?

– p(m|s) = p(s|m)*p(m)/p(s) = 0.0002.

• Is this reasonable? The evidence changes the prior by a factor of p(s|m)/p(s) = 10.
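The arithmetic can be checked with exact fractions. This is only a worked calculation of the numbers on the slide; the variable names are illustrative.

```python
from fractions import Fraction

p_s_given_m = Fraction(1, 2)      # P(stiff neck | meningitis)
p_m = Fraction(1, 50_000)         # prior P(meningitis)
p_s = Fraction(1, 20)             # prior P(stiff neck)

# Bayes rule: P(m | s) = P(s | m) * P(m) / P(s).
p_m_given_s = p_s_given_m * p_m / p_s

# Factor by which observing a stiff neck scales the prior.
change = p_s_given_m / p_s

print(p_m_given_s, float(p_m_given_s), change)
```

The posterior is ten times the prior, yet still only 1 in 5,000, because meningitis is so rare to begin with.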

Bayes Rule: multiple symptoms

• Given symptoms s1, s2, …, sn, estimate the probability of disease D.

• P(D|s1,s2,…,sn) = P(D,s1,…,sn)/P(s1,s2,…,sn).

• If each symptom is boolean, need tables of size 2^n. E.g., breast cancer data has 73 features per patient. 2^73 is too big.

• Approximate!

Notation: max arg

• Conceptual definition, not operational

• Max arg f(x) is a value of x that maximizes f(x).

• MaxArg over prob(heads) of Prob(X = 6 heads | prob heads) yields prob(heads) = .6
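Although max arg is a conceptual definition, for a one-parameter model it can be made operational by brute force. The sketch below evaluates the probability of 6 heads in 10 flips on a grid of candidate values; the grid resolution of 0.01 and the function name are arbitrary choices made here.

```python
from math import comb

def prob_six_heads(p, flips=10, heads=6):
    """Prob(X = 6 heads | prob heads = p) for 10 flips."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# Candidate values 0.00, 0.01, ..., 1.00 for prob(heads).
grid = [i / 100 for i in range(101)]

# max arg: the candidate that maximizes the function, not the maximum value itself.
best_p = max(grid, key=prob_six_heads)
print(best_p, prob_six_heads(best_p))
```

A grid search like this stops being practical once a model has more than a few parameters, which is where hill climbing comes in.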

Idiot or Naïve Bayes: First learning Algorithm

Goal: max arg P(D|s1,..sn) over all diseases

= max arg P(s1,..sn|D)*P(D)/P(s1,..sn)

= max arg P(s1,..sn|D)*P(D) (why?)

~ max arg P(s1|D)*P(s2|D)…P(sn|D)*P(D).

• Assumes conditional independence.

• Enough data to estimate each P(si|D) and P(D).

• Not necessary to get prob right: only order.

• Pretty good but Bayes Nets do it better.
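The decision rule above can be sketched as below. The disease names and all probabilities are made-up numbers for illustration only; they come from neither the slides nor any dataset. The function name and the data layout are assumptions as well.

```python
from math import prod

def naive_bayes_argmax(priors, cond, symptoms):
    """Return the disease d that maximizes P(d) * product over i of P(s_i | d).

    priors:   {disease: P(disease)}
    cond:     {disease: [P(symptom_i present | disease), ...]}
    symptoms: list of booleans, one per symptom (True = present)
    """
    def score(d):
        # Conditional independence: multiply per-symptom probabilities.
        return priors[d] * prod(
            p if present else 1 - p
            for p, present in zip(cond[d], symptoms)
        )
    return max(priors, key=score)

# Hypothetical model with two diseases and two symptoms [fever, cough].
priors = {"flu": 0.1, "cold": 0.9}
cond = {"flu": [0.9, 0.8], "cold": [0.05, 0.5]}

print(naive_bayes_argmax(priors, cond, [True, True]))
print(naive_bayes_argmax(priors, cond, [False, True]))
```

The scores are not normalized by P(s1,..sn), so they are not probabilities; only their order matters for picking the disease.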

Chain Rule and Markov Models

• Recall P(X1, X2, …, Xn) = P(X1)*P(X2|X1)*…*P(Xn|X1,X2,…,Xn-1).

• If X1, X2, etc. are values at time points 1, 2, … and if Xn only depends on the k previous times, then this is a Markov model of order k.

• MM0: independent of previous times

– P(X1,…,Xn) = P(X1)*P(X2)*…*P(Xn)

Markov Models

• MM1: depends only on previous time

– P(X1,…,Xn) = P(X1)*P(X2|X1)*…*P(Xn|Xn-1).

• May also be used for approximating probabilities. Much simpler to estimate.

• MM2: depends on previous 2 times

– P(X1,X2,…,Xn) = P(X1,X2)*P(X3|X1,X2)*…*P(Xn|Xn-2,Xn-1)

Common DNA application

• Looking for needles: surprising frequency?

• Goal: Compute P(gataag) given lots of data

• MM0 = P(g)*P(a)*P(t)*P(a)*P(a)*P(g).

• MM1 = P(g)*P(a|g)*P(t|a)*P(a|t)*P(a|a)*P(g|a).

• MM2 = P(ga)*P(t|ga)*P(a|at)*P(a|ta)*P(g|aa).

• Note: each approximation requires less data and less computation time than the full chain rule.
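One way to estimate these products from data is to count substrings. The sketch below assumes the training data is a single string and uses plain relative frequencies with no smoothing, so any unseen context or transition gives probability 0. The function names and the toy training string are hypothetical.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every length-n substring of text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def markov_probability(seq, training, k):
    """Estimate P(seq) with an order-k Markov model fitted to `training` by counting."""
    prob = 1.0
    if k > 0:
        # Leading factor, e.g. P(g) for MM1 or P(ga) for MM2.
        starts = ngram_counts(training, k)
        total = sum(starts.values())
        if total == 0:
            return 0.0
        prob *= starts[seq[:k]] / total
    longer = ngram_counts(training, k + 1)
    for i in range(k, len(seq)):
        context = seq[i - k:i]
        # Occurrences of the context that are followed by some symbol.
        context_total = sum(c for gram, c in longer.items() if gram[:k] == context)
        if context_total == 0:
            return 0.0
        prob *= longer[context + seq[i]] / context_total
    return prob

training = "acgt" * 25  # toy data for illustration, not real DNA
for k in (0, 1, 2):
    print(k, markov_probability("acgt", training, k))
```

On this perfectly periodic toy string the higher-order models assign much more probability to "acgt" than MM0 does, which is the kind of surprising frequency the application looks for.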

Good luck