Modeling Social Data, Lecture 6: Classification with Naive Bayes
Classification: Naive Bayes
APAM E4990
Modeling Social Data
Jake Hofman
Columbia University
February 27, 2015
Learning by example
• How did you solve this problem?
• Can you make this process explicit (e.g. write code to do so)?
Diagnoses a la Bayes [1]
• You’re testing for a rare disease:
  • 1% of the population is infected
• You have a highly sensitive and specific test:
  • 99% of sick patients test positive
  • 99% of healthy patients test negative
• Given that a patient tests positive, what is the probability that the patient is sick?
1. Wiggins, SciAm 2006
Diagnoses a la Bayes
Population: 10,000 ppl
• 1% sick: 100 ppl
  • 99% test +: 99 ppl
  • 1% test -: 1 ppl
• 99% healthy: 9,900 ppl
  • 1% test +: 99 ppl
  • 99% test -: 9,801 ppl
So given that a patient tests positive (198 ppl), there is a 50% chance the patient is sick (99 ppl)!
The small error rate on the large healthy population produces many false positives.
Natural frequencies a la Gigerenzer [2]
2. http://bit.ly/ggbbc
Inverting conditional probabilities
Bayes’ Theorem
Equate the far right- and left-hand sides of the product rule
p(y|x) p(x) = p(x, y) = p(x|y) p(y)
and divide to get the probability of y given x from the probability of x given y:
p(y|x) = \frac{p(x|y) \, p(y)}{p(x)}
where p(x) = \sum_{y \in \Omega_Y} p(x|y) \, p(y) is the normalization constant.
Diagnoses a la Bayes
Given that a patient tests positive, what is the probability that the patient is sick?
p(sick|+) = \frac{p(+|sick) \, p(sick)}{p(+)} = \frac{(99/100)(1/100)}{198/100^2} = \frac{99}{198} = \frac{1}{2}
where p(+) = p(+|sick) p(sick) + p(+|healthy) p(healthy) = 99/100^2 + 99/100^2 = 198/100^2.
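A minimal Python sketch of this calculation, using the rates from the slides (1% prevalence, 99% sensitivity and specificity); the variable names are my own:

    # rates from the slides
    p_sick = 0.01               # 1% of the population is infected
    p_pos_given_sick = 0.99     # 99% of sick patients test positive
    p_pos_given_healthy = 0.01  # 1% of healthy patients test positive

    # normalization constant: p(+) = p(+|sick) p(sick) + p(+|healthy) p(healthy)
    p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

    # Bayes' rule
    p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
    print(p_sick_given_pos)  # 0.5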
(Super) Naive Bayes
We can use Bayes’ rule to build a one-word spam classifier:
p(spam|word) = p(word|spam) p(spam) / p(word)
where we estimate these probabilities with ratios of counts:
p(word|spam) = (# spam docs containing word) / (# spam docs)
p(word|ham) = (# ham docs containing word) / (# ham docs)
p(spam) = (# spam docs) / (# docs)
p(ham) = (# ham docs) / (# docs)
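A minimal Python sketch of this one-word classifier (the function and variable names are my own, not part of the course code); p(word) is expanded as p(word|spam) p(spam) + p(word|ham) p(ham):

    def p_spam_given_word(n_spam, n_ham, n_spam_with_word, n_ham_with_word):
        # prior class probabilities from document counts
        n_docs = n_spam + n_ham
        p_spam = n_spam / n_docs
        p_ham = n_ham / n_docs
        # word likelihoods within each class
        p_word_given_spam = n_spam_with_word / n_spam
        p_word_given_ham = n_ham_with_word / n_ham
        # p(word) = p(word|spam) p(spam) + p(word|ham) p(ham)
        p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
        return p_word_given_spam * p_spam / p_word

    # counts from the "meeting" run on the next slide
    print(p_spam_given_word(1500, 3672, 16, 153))  # roughly 0.09

With the counts from the next slide this gives a value close to the script's P(spam|meeting); small differences presumably come from rounding inside the script.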
(Super) Naive Bayes
$ ./enron_naive_bayes.sh meeting
1500 spam examples
3672 ham examples
16 spam examples containing meeting
153 ham examples containing meeting
estimated P(spam) = .2900
estimated P(ham) = .7100
estimated P(meeting|spam) = .0106
estimated P(meeting|ham) = .0416
P(spam|meeting) = .0923
(Super) Naive Bayes
$ ./enron_naive_bayes.sh money
1500 spam examples
3672 ham examples
194 spam examples containing money
50 ham examples containing money
estimated P(spam) = .2900
estimated P(ham) = .7100
estimated P(money|spam) = .1293
estimated P(money|ham) = .0136
P(spam|money) = .7957
(Super) Naive Bayes
$ ./enron_naive_bayes.sh enron
1500 spam examples
3672 ham examples
0 spam examples containing enron
1478 ham examples containing enron
estimated P(spam) = .2900
estimated P(ham) = .7100
estimated P(enron|spam) = 0
estimated P(enron|ham) = .4025
P(spam|enron) = 0
(A word that never appears in the spam training examples forces the estimated posterior to zero; the smoothing discussed at the end addresses this.)
Naive Bayes
Represent each document by a binary vector \vec{x} where x_j = 1 if the j-th word appears in the document (x_j = 0 otherwise).
Modeling each word as an independent Bernoulli random variable, the probability of observing a document \vec{x} of class c is:
p(\vec{x}|c) = \prod_j \theta_{jc}^{x_j} (1 - \theta_{jc})^{1 - x_j}
where \theta_{jc} denotes the probability that the j-th word occurs in a document of class c.
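A small Python sketch of this likelihood, where x is a document's binary vector and theta_c holds the per-word probabilities θ_jc for a single class c (the names are my own):

    def bernoulli_likelihood(x, theta_c):
        # p(x|c) = prod_j theta_jc^x_j * (1 - theta_jc)^(1 - x_j)
        p = 1.0
        for x_j, theta_jc in zip(x, theta_c):
            p *= theta_jc if x_j == 1 else (1.0 - theta_jc)
        return p

    # example: 3-word vocabulary, document contains words 1 and 3
    print(bernoulli_likelihood([1, 0, 1], [0.2, 0.5, 0.9]))  # 0.2 * 0.5 * 0.9 = 0.09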
Naive Bayes
Using this likelihood in Bayes’ rule and taking a logarithm, we have:
\log p(c|\vec{x}) = \log \frac{p(\vec{x}|c) \, p(c)}{p(\vec{x})}
                  = \sum_j x_j \log \frac{\theta_{jc}}{1 - \theta_{jc}} + \sum_j \log(1 - \theta_{jc}) + \log \frac{\theta_c}{p(\vec{x})}
where \theta_c is the probability of observing a document of class c.
Naive Bayes
We can eliminate p(\vec{x}) by calculating the log-odds:
\log \frac{p(1|\vec{x})}{p(0|\vec{x})} = \sum_j x_j \underbrace{\log \frac{\theta_{j1}(1 - \theta_{j0})}{\theta_{j0}(1 - \theta_{j1})}}_{w_j} + \underbrace{\sum_j \log \frac{1 - \theta_{j1}}{1 - \theta_{j0}} + \log \frac{\theta_1}{\theta_0}}_{w_0}
which gives a linear classifier of the form \vec{w} \cdot \vec{x} + w_0
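As a sanity check, here is a short Python sketch (with made-up parameter values of my own) that computes the log-odds two ways: directly from the Bernoulli likelihoods and priors, and via the linear form w · x + w_0; the two agree up to floating point:

    import math

    theta_j1 = [0.8, 0.1, 0.5]  # theta_{j1}: P(word j appears | class 1)
    theta_j0 = [0.2, 0.3, 0.4]  # theta_{j0}: P(word j appears | class 0)
    theta1, theta0 = 0.3, 0.7   # class priors
    x = [1, 0, 1]               # a document as a binary vector

    def log_lik(x, theta):
        # log of prod_j theta_j^x_j (1 - theta_j)^(1 - x_j)
        return sum(math.log(t if xj else 1 - t) for xj, t in zip(x, theta))

    direct = (log_lik(x, theta_j1) + math.log(theta1)) - (log_lik(x, theta_j0) + math.log(theta0))

    w = [math.log(t1 * (1 - t0) / (t0 * (1 - t1))) for t1, t0 in zip(theta_j1, theta_j0)]
    w0 = sum(math.log((1 - t1) / (1 - t0)) for t1, t0 in zip(theta_j1, theta_j0)) + math.log(theta1 / theta0)
    linear = sum(wj * xj for wj, xj in zip(w, x)) + w0

    print(direct, linear)  # same value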
Naive Bayes
We train by counting words and documents within classes to estimate \theta_{jc} and \theta_c:
\theta_{jc} = \frac{n_{jc}}{n_c} \qquad \theta_c = \frac{n_c}{n}
and use these to calculate the weights w_j and bias w_0:
w_j = \log \frac{\theta_{j1}(1 - \theta_{j0})}{\theta_{j0}(1 - \theta_{j1})}
w_0 = \sum_j \log \frac{1 - \theta_{j1}}{1 - \theta_{j0}} + \log \frac{\theta_1}{\theta_0}
We predict by simply adding the weights of the words that appear in the document to the bias term.
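Putting the pieces together, a minimal Python sketch of this procedure on a tiny made-up corpus; all document data and names below are my own, and no smoothing is applied yet (see the final slide):

    import math

    def train(docs, labels, vocab):
        # theta_jc = n_jc / n_c and theta_c = n_c / n, estimated by counting
        n = len(docs)
        n_c = [labels.count(0), labels.count(1)]
        theta_c = [n_c[0] / n, n_c[1] / n]
        theta = [[sum(1 for d, y in zip(docs, labels) if y == c and word in d) / n_c[c]
                  for word in vocab]
                 for c in (0, 1)]
        # convert the estimates to weights w_j and bias w_0
        w = [math.log(theta[1][j] * (1 - theta[0][j]) / (theta[0][j] * (1 - theta[1][j])))
             for j in range(len(vocab))]
        w0 = sum(math.log((1 - theta[1][j]) / (1 - theta[0][j])) for j in range(len(vocab)))
        w0 += math.log(theta_c[1] / theta_c[0])
        return w, w0

    def predict(doc, vocab, w, w0):
        # add the weights of the words that appear in the document to the bias
        score = w0 + sum(w[j] for j, word in enumerate(vocab) if word in doc)
        return int(score > 0)

    # tiny made-up corpus: 1 = spam, 0 = ham
    vocab = ["money", "meeting", "offer"]
    docs = [{"money", "offer"}, {"money"}, {"meeting", "offer"},  # spam
            {"meeting"}, {"meeting", "money"}, {"offer"}]         # ham
    labels = [1, 1, 1, 0, 0, 0]

    w, w0 = train(docs, labels, vocab)
    print(predict({"money", "offer"}, vocab, w, w0))  # 1 (classified as spam)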
Naive Bayes
In practice, this works better than one might expect, given its simplicity [3]
3. http://www.jstor.org/pss/1403452
Naive Bayes
Training is computationally cheap and scalable, and the model is easy to update given new observations [3]
3. http://www.springerlink.com/content/wu3g458834583125/
Naive Bayes
Performance varies with document representations and corresponding likelihood models [3]
3. http://ceas.cc/2006/15.pdf
Naive Bayes
It’s often important to smooth parameter estimates (e.g., by adding pseudocounts) to avoid overfitting:
\theta_{jc} = \frac{n_{jc} + \alpha}{n_c + \alpha + \beta}
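A small Python sketch of this smoothed estimate (pseudocounts alpha and beta as on the slide; with alpha = beta = 1 this is Laplace smoothing), shown on the counts from the earlier "enron" example:

    def smoothed_theta(n_jc, n_c, alpha=1.0, beta=1.0):
        # theta_jc = (n_jc + alpha) / (n_c + alpha + beta)
        return (n_jc + alpha) / (n_c + alpha + beta)

    # Unsmoothed, P(enron|spam) = 0/1500 = 0, which forced P(spam|enron) to 0.
    print(smoothed_theta(0, 1500))     # ~0.00067 instead of exactly 0
    print(smoothed_theta(1478, 3672))  # ~0.4026, essentially unchanged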