How to Spot a Bear - An Intro to Machine Learning for SEO
TRANSCRIPT
List of rules (first half, when I asked in the office):
1. Four legs. 2. Breathes. 3. Furry. 4. Long snout.
List of rules:
1. Four legs. 2. Breathes. 3. Furry. 4. Long snout.
5. Brown. 6. Not always brown. 7. Mammal. 8. No tail.
(How do you spot a mammal?!)
Some Possible Spam Signals
Are there adverts on the page?
Are there lots of spelling mistakes?
Is there little text content?
Are there Calls To Action in ALL CAPS?
List of pages we’ve manually classified.
List of attributes that we believe are important for classifying pages.
Example Data
Site    Adverts on page?  More than 5 spelling mistakes?  Less than 200 words of content?  CTA in ALL CAPS?  Classification
site A  Y                 Y                               Y                                Y                 Spam Site
site B  N                 N                               Y                                Y                 Good Site
site C  Y                 N                               N                                N                 Spam Site
site D  N                 Y                               N                                Y                 Spam Site
site E  N                 Y                               N                                N                 Good Site
Inputs: 1, 0, 1, 0
Weights: 0.5, 0.5, 0.5, 0.5
1 x 0.5 = 0.5
0 x 0.5 = 0
1 x 0.5 = 0.5
0 x 0.5 = 0
Total: 1
if: inputs >= 1, output TRUE
Output: TRUE
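The arithmetic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name `neuron_fires` and the `threshold` parameter are made up for the example, while the inputs, the weights, and the ">= 1" rule are the ones shown on the slide.

```python
def neuron_fires(inputs, weights, threshold=1.0):
    """Return True when the weighted sum of the inputs reaches the threshold."""
    # Multiply each input by its weight, then add the results up.
    total = sum(x * w for x, w in zip(inputs, weights))
    return total >= threshold


# Inputs 1, 0, 1, 0 with every weight at 0.5: 0.5 + 0 + 0.5 + 0 = 1
print(neuron_fires([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # True
```

Switching an input off or lowering a weight can pull the total below 1, in which case the same function returns False.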
Inputs: 1, 0, 0, 0
Weights: 0.5, 0.5, 0.5, 0.5
1 x 0.5 = 0.5
0 x 0.5 = 0
0 x 0.5 = 0
0 x 0.5 = 0
Total: 0.5
if: inputs >= 1, output TRUE
Output: FALSE
Inputs: 1, 0, 1, 0
Weights: 0.5, 0.5, 0.4, 0.5
1 x 0.5 = 0.5
0 x 0.5 = 0
1 x 0.4 = 0.4
0 x 0.5 = 0
Total: 0.9
if: inputs >= 1, output TRUE
Output: FALSE
Untrained Neuron
Is site spam?
adverts: weight 0.5
>5 spelling mistakes: weight 0.5
< 200 words content: weight 0.5
CTA in ALL CAPS: weight 0.5
if: inputs >= 1, output TRUE
Training
adverts: input 0, weight 0.5
>5 spelling mistakes: input 0, weight 0.5
< 200 words content: input 1, weight 0.5
CTA in ALL CAPS: input 1, weight 0.5
if: inputs >= 1, output TRUE
Output: SPAM!
Training
adverts: weight 0.5
>5 spelling mistakes: weight 0.5
< 200 words content: weight 0.6
CTA in ALL CAPS: weight 0.6
if: inputs >= 1, output TRUE
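The slides do not say how the weights are adjusted between training steps. One common choice is the perceptron learning rule, sketched here purely as an assumption: when the neuron gets a page wrong, nudge the weights of the inputs that were switched on, up if it missed spam and down if it raised a false alarm. The names `predict` and `train_step`, and the `learning_rate` of 0.1, are hypothetical, and this sketch is not meant to reproduce the numbers on the slides.

```python
def predict(inputs, weights, threshold=1.0):
    """Fire (True = spam) when the weighted sum reaches the threshold."""
    return sum(x * w for x, w in zip(inputs, weights)) >= threshold


def train_step(inputs, weights, is_spam, learning_rate=0.1):
    """One perceptron-style update (an assumed rule, not taken from the talk)."""
    # error is +1 for missed spam, -1 for a false alarm, 0 when correct.
    error = int(is_spam) - int(predict(inputs, weights))
    # Only weights whose input is 1 move; a correct prediction changes nothing.
    return [w + learning_rate * error * x for x, w in zip(inputs, weights)]


# Hypothetical example: a good page flagged as spam pulls the active weights down.
new_weights = train_step([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5], is_spam=False)
print(new_weights)
```

Repeating a step like this over every manually classified page, several times over, is what "training" amounts to in this simple model.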
After training: 4/5 sites correct
Is site spam?
adverts: weight 0.2
>5 spelling mistakes: weight 0.7
< 200 words content: weight 0.4
CTA in ALL CAPS: weight 0.5
if: inputs >= 1, output TRUE
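The "4/5 sites correct" figure can be checked against the Example Data with a short script. This is a sketch under two assumptions: Y/N answers are encoded as 1/0, and the four trained weights are listed in the same order as the four attributes. `SITES`, `is_spam`, and `count_correct` are names invented for the example.

```python
# Rows of the Example Data table: (adverts, >5 spelling mistakes,
# <200 words of content, CTA in ALL CAPS) with Y=1 / N=0,
# followed by the manual label (True = Spam Site).
SITES = {
    "site A": ([1, 1, 1, 1], True),
    "site B": ([0, 0, 1, 1], False),
    "site C": ([1, 0, 0, 0], True),
    "site D": ([0, 1, 0, 1], True),
    "site E": ([0, 1, 0, 0], False),
}


def is_spam(inputs, weights, threshold=1.0):
    """Apply the neuron's rule: weighted sum >= 1 means spam."""
    return sum(x * w for x, w in zip(inputs, weights)) >= threshold


def count_correct(weights):
    """Count the sites where the neuron agrees with the manual label."""
    return sum(is_spam(inputs, weights) == label for inputs, label in SITES.values())


trained = [0.2, 0.7, 0.4, 0.5]
print(f"{count_correct(trained)}/{len(SITES)} sites correct")  # 4/5 sites correct
```

Under these assumptions the one miss is site C: its only signal is adverts, which carries a weight of 0.2, so the total stays below 1 even though the page was labelled spam.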
We’re better than machines…
Source: Pawan Sinha (http://web.mit.edu/bcs/sinha/papers/sinha_recog_review_NN.pdf)