deSEO: Combating Search-Result Poisoning
John P. John, Fang Yu, Yinglian Xie, Arvind Krishnamurthy, Martin Abadi
University of Washington & MSR Silicon Valley

The malware pipeline

• Malware links spread through spam emails, spam IMs, social networks, search results, etc.
• We look at search results

Pipeline (diagram): find vulnerable web servers → compromise web servers and host malicious content → spread malicious links via email, IM, search results → bad stuff


Is this really a problem?

• ~40% of popular searches contain at least one malicious link in the top results

• Scareware fraud made $150M in profit last year

Contributions

• How does the search poisoning attack work?
  - Examined a live attack involving 5,000 compromised sites
• What can we learn about such attacks?
  - Identified common features in search poisoning attacks
• How can we defend against them?
  - Developed deSEO, which detected new live SEO attacks on 1,000+ domains

Anatomy of SEO attack

search query → search engine → compromised Web server → redirection server → exploit server
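This flow can be replayed offline once the redirects have been recorded (for example, with an instrumented browser). The sketch below is a minimal illustration, not deSEO code: it follows a hypothetical URL-to-URL redirect map from the compromised page toward the exploit page and stops at the end of the chain, at a loop, or after `max_hops`. All host names are made-up placeholders for the server roles above.

```python
from typing import Dict, List


def trace_redirects(start: str, redirects: Dict[str, str], max_hops: int = 10) -> List[str]:
    """Follow recorded redirects from `start`; stop at a dead end, a loop, or max_hops."""
    chain = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        nxt = redirects.get(current)
        if nxt is None or nxt in seen:  # end of chain, or a redirect loop
            break
        chain.append(nxt)
        seen.add(nxt)
        current = nxt
    return chain


# Hypothetical recorded redirects: compromised site -> redirection server -> exploit server.
observed = {
    "http://compromised.example/images/page.php?page=kobayashi+arrested":
        "http://redirect.example/in.php",
    "http://redirect.example/in.php": "http://exploit.example/scan.html",
}
chain = trace_redirects(
    "http://compromised.example/images/page.php?page=kobayashi+arrested", observed)
for hop in chain:
    print(hop)
```

A plain URL map only captures redirects after the fact; JavaScript- or Flash-driven hops have to be observed by something that executes the page.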


Analysis of an attack

• Examine a specific attack

• August - October 2010

• 5,000 compromised domains

• Tens of thousands of compromised keywords

• Millions of SEO pages generated


How are servers compromised?

• Sites running osCommerce

• Unpatched vulnerabilities

• Allows attackers to host any file on the Web server, including executables

www.example.com/admin/file_manager.php/login.php?action=processuploads
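On the defensive side, an operator could scan access logs for requests shaped like the example URL. This is a heuristic sketch, assuming (as the URL suggests) that the attack appends `/login.php` to an admin script path; the `BYPASS_PATTERN` regex and `suspicious_requests` helper are hypothetical, and request paths are assumed to be already extracted from the log.

```python
import re
from typing import Iterable, List

# Assumption: the bypass appears as "/admin/<script>.php/login.php".
# This is a guess from the example URL, not a complete exploit signature.
BYPASS_PATTERN = re.compile(r"/admin/[^/?]+\.php/login\.php", re.IGNORECASE)


def suspicious_requests(paths: Iterable[str]) -> List[str]:
    """Return request paths that look like the admin-login bypass."""
    return [p for p in paths if BYPASS_PATTERN.search(p)]


log_paths = [
    "/index.php?cPath=21",
    "/admin/login.php",  # a normal login page request: not flagged
    "/admin/file_manager.php/login.php?action=processuploads",
]
print(suspicious_requests(log_paths))
```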

What files are uploaded?

• PHP shell to manage file operations

• HTML templates, images

• PHP script to generate SEO web pages

The main PHP script

www.example.com/images/page.php?page=kobayashi+arrested
→ returns a generated page about "kobayashi arrested"
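The keyword the script builds a page around travels in the query string, so it can be read straight from the URL. A minimal sketch, assuming the parameter is named `page` as in this example (other campaigns may use other names); `keyword_from_url` is a hypothetical helper. `parse_qs` decodes `+` to a space.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def keyword_from_url(url: str, param: str = "page") -> Optional[str]:
    """Return the keyword carried in `param`, with '+' decoded to spaces."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


print(keyword_from_url("http://www.example.com/images/page.php?page=kobayashi+arrested"))
```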


• Obfuscated script

• Simple encryption using nested evals
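Nested-eval obfuscation can usually be unwrapped statically, without running any PHP, by decoding one layer at a time. The slide does not say which encoding each layer used. This sketch assumes the common `eval(base64_decode('...'))` wrapper, and `peel_layers` is a hypothetical helper. Real samples may mix in other functions such as `gzinflate` or `str_rot13`.

```python
import base64
import re

# Assumption: every layer looks like eval(base64_decode('<payload>'));
LAYER = re.compile(r"^\s*eval\(base64_decode\('([A-Za-z0-9+/=]+)'\)\);?\s*$")


def peel_layers(source: str, max_layers: int = 50) -> str:
    """Decode nested eval(base64_decode(...)) wrappers without executing anything."""
    for _ in range(max_layers):
        match = LAYER.match(source)
        if match is None:  # no recognizable wrapper left
            break
        source = base64.b64decode(match.group(1)).decode("utf-8")
    return source


# Build a harmless three-layer sample, then peel it back.
sample = "echo 'hello';"
for _ in range(3):
    encoded = base64.b64encode(sample.encode("utf-8")).decode("ascii")
    sample = f"eval(base64_decode('{encoded}'));"
print(peel_layers(sample))
```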


The main script (de-obfuscated)

1. Check if the visitor is a search crawler
2. Generate a page for the keyword
3. Fetch snippets from Google and images from Bing
4. Add links to other compromised sites
5. Cache the page
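Below is a toy model of that control flow, assuming the steps run in the order listed. Everything in it is hypothetical: the user-agent markers, the stub fetchers that stand in for the Google and Bing requests, and especially the non-crawler branch. The slide does not show that branch, and this sketch assumes ordinary visitors are sent onward, returned here as the string `"REDIRECT"`. Nothing touches the network.

```python
from typing import Callable, Dict, List

CRAWLER_MARKERS = ("googlebot", "bingbot")  # assumed user-agent check


def is_search_crawler(user_agent: str) -> bool:
    """Step 1: crude user-agent test (real scripts may also check IP ranges)."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def serve(keyword: str, user_agent: str, cache: Dict[str, str],
          fetch_snippets: Callable[[str], List[str]],
          fetch_images: Callable[[str], List[str]],
          peer_sites: List[str]) -> str:
    """Model of the de-obfuscated script's control flow for one request."""
    if not is_search_crawler(user_agent):
        return "REDIRECT"  # assumption: not shown on the slide
    if keyword in cache:  # reuse a page cached by step 5
        return cache[keyword]
    lines = [f"<title>{keyword}</title>"]  # step 2: page for keyword
    lines += [f"<p>{s}</p>" for s in fetch_snippets(keyword)]  # step 3: snippets
    lines += [f'<img src="{u}">' for u in fetch_images(keyword)]  # step 3: images
    lines += [f'<a href="{s}">{keyword}</a>' for s in peer_sites]  # step 4: links
    page = "\n".join(lines)
    cache[keyword] = page  # step 5: cache page
    return page


# Usage with stub fetchers: the second crawler request is served from the cache.
fetch_log: List[str] = []


def stub_snippets(keyword: str) -> List[str]:
    fetch_log.append(keyword)
    return [f"news about {keyword}"]


cache: Dict[str, str] = {}
first = serve("kobayashi arrested", "Mozilla/5.0 (compatible; Googlebot/2.1)", cache,
              stub_snippets, lambda k: ["img1.jpg"], ["http://peer.example/"])
second = serve("kobayashi arrested", "Googlebot/2.1", cache,
               stub_snippets, lambda k: ["img1.jpg"], ["http://peer.example/"])
visitor = serve("kobayashi arrested", "Mozilla/5.0 (Windows NT 10.0)", cache,
                stub_snippets, lambda k: [], [])
print(first)
```

Caching is what lets one script serve thousands of keyword pages cheaply. Each keyword costs one round of fetches.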


Dense link structure

• Other compromised domains found by crawling the included links

• Each site linked to 200 other sites

• ~5,000 compromised domains identified

• Each site hosted 8,000 SEO pages

• 40 million pages total
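The totals are consistent: 5,000 domains × 8,000 pages per domain = 40,000,000 pages. Every site links to about 200 others, so a breadth-first crawl from one known compromised site can enumerate much of the campaign. This is a minimal sketch. It assumes the domain-level link graph has already been extracted from the SEO pages, and it leaves fetching and parsing out of scope. The domain names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set


def discover_domains(seeds: Iterable[str], links: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first search over an already-extracted domain link graph."""
    found = set(seeds)
    queue = deque(found)
    while queue:
        domain = queue.popleft()
        for neighbor in links.get(domain, set()):
            if neighbor not in found:
                found.add(neighbor)
                queue.append(neighbor)
    return found


graph = {
    "site-a.example": {"site-b.example", "site-c.example"},
    "site-c.example": {"site-d.example", "site-a.example"},
    "site-e.example": {"site-a.example"},  # links in, but nothing links to it
}
print(sorted(discover_domains(["site-a.example"], graph)))
print(5_000 * 8_000)  # pages across the campaign
```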

Poisoned keywords

• 20,000+ popular search terms poisoned

• Sources: Google Trends + Bing related searches, e.g.
  - haiti earthquake
  - senate elections
  - veterans day 2010
  - halloween 2010
  - thanksgiving 2010 ...

• 95% of Google Trends keywords poisoned

Redirection servers

• Three domains used for redirection

• Over 1,000 exploit URLs fetched

[Figure: timeline with times τ0, τ1, τ2, τ3 and τ0+T and intervals δ1, δ2, δ3; accompanying chart not recoverable]

• Almost 100,000 victims over 10 weeks


Evasive techniques

• Why can’t redirection behavior be easily detected?
  - Cloaking
  - Requiring user interaction
  - Redirection through JavaScript or Flash
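A standard way to expose cloaking is to fetch the same URL twice, once with a crawler user agent and once with a browser user agent, and then compare the two responses. The sketch covers only the comparison step and assumes both responses have already been fetched. The word-set Jaccard measure and the 0.3 threshold are illustrative choices, not values from the deck, and this check alone does not address the other two evasions.

```python
import re
from typing import Set


def word_set(html: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a response, with tags stripped crudely."""
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a & b| / |a | b|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_cloaked(crawler_view: str, browser_view: str, threshold: float = 0.3) -> bool:
    """Flag a URL whose crawler-facing and browser-facing responses share few words."""
    return jaccard(word_set(crawler_view), word_set(browser_view)) < threshold


# Hypothetical responses for the same URL under two user agents.
crawler_view = "<h1>halloween 2010</h1><p>halloween 2010 costume ideas</p>"
browser_view = "<script>window.location='http://redirect.example/'</script>"
print(looks_cloaked(crawler_view, browser_view))
print(looks_cloaked(crawler_view, crawler_view))
```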


What are prominent features in search poisoning?

• Dense link structure

• Automatic generation of relevant pages

• Large number of pages with popular keywords

• Behavior of compromised sites
  - before: diverse content and behavior
  - after: similar content and behavior
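The before/after contrast can be quantified as the average pairwise similarity among the domains in a group. This sketch scores each pair with Jaccard similarity over per-domain feature sets. The feature strings and domains are invented for illustration, and the deck does not say which similarity measure deSEO actually uses.

```python
from itertools import combinations
from typing import Dict, Set


def mean_pairwise_similarity(features: Dict[str, Set[str]]) -> float:
    """Average Jaccard similarity over all pairs of domains in a group."""
    pairs = list(combinations(features.values(), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


# Hypothetical per-domain feature sets before and after compromise.
before = {
    "shop.example": {"oscommerce", "cart", "shoes"},
    "blog.example": {"wordpress", "travel", "photos"},
    "club.example": {"forum", "chess", "events"},
}
after = {
    "shop.example": {"page.php", "keyword", "snippets", "images"},
    "blog.example": {"page.php", "keyword", "snippets", "images"},
    "club.example": {"page.php", "keyword", "snippets", "links"},
}
print(mean_pairwise_similarity(before), mean_pairwise_similarity(after))
```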

deSEO steps

1. History-based filtering
   Select domains where many new pages have been set up that differ from older pages
2. Clustering suspicious domains
   Using K-means++
3. Group similarity analysis
   Select groups where new pages are similar across domains
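K-means++ differs from plain k-means only in how it picks the initial centers. Each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. Below is a self-contained sketch of that seeding plus standard Lloyd iterations. The three-number vectors (number of arguments, argument length, keyword length in words) are made-up stand-ins for deSEO's URL features, and k is fixed by hand. In practice a library implementation such as scikit-learn's `KMeans`, whose default initialization is k-means++, would be the usual choice.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding: sample each new center proportionally to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def kmeans(points: List[Point], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    """Lloyd iterations from k-means++ seeds; returns a cluster label per point."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical URL feature vectors: two clearly separated groups.
points = [(1, 25, 4), (1, 22, 3), (1, 24, 4), (3, 6, 1), (4, 5, 1), (3, 7, 1)]
labels = kmeans(points, k=2)
print(labels)
```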


Sample web URLs with trendy keywords

http://www.askania-fachmaerkte.de/images/news.php?page=justin+bieber+breaks+neck
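deSEO's first step flags domains that suddenly host many new pages unlike their older ones. The slides do not spell out how "different" is measured. This sketch assumes a crude proxy: a URL's shape, meaning its path plus its parameter names. The thresholds `min_new` and `min_novel_fraction` and the example domains are hypothetical.

```python
from typing import Dict, List, Set
from urllib.parse import urlsplit


def url_shape(url: str) -> str:
    """Coarse URL 'shape': path plus sorted parameter names, values dropped."""
    parts = urlsplit(url)
    names = sorted(p.split("=", 1)[0] for p in parts.query.split("&") if p)
    return parts.path + "?" + "&".join(names)


def history_filter(old: Dict[str, List[str]], new: Dict[str, List[str]],
                   min_new: int = 3, min_novel_fraction: float = 0.8) -> List[str]:
    """Flag domains with many new URLs whose shapes never appeared in older snapshots."""
    flagged = []
    for domain, new_urls in new.items():
        old_shapes: Set[str] = {url_shape(u) for u in old.get(domain, [])}
        novel = [u for u in new_urls if url_shape(u) not in old_shapes]
        if len(novel) >= min_new and len(novel) / len(new_urls) >= min_novel_fraction:
            flagged.append(domain)
    return flagged


old = {
    "shop.example": ["http://shop.example/index.php?cat=3",
                     "http://shop.example/product.php?id=17"],
    "news.example": ["http://news.example/article.php?id=1"],
}
new = {
    "shop.example": ["http://shop.example/images/news.php?page=" + k
                     for k in ("justin+bieber+breaks+neck", "halloween+2010",
                               "veterans+day+2010")],
    "news.example": ["http://news.example/article.php?id=%d" % i for i in (2, 3, 4)],
}
print(history_filter(old, new))
```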

Sample web URLs with trendy keywords → History-based detection → Domain clustering

Domain clustering: lexical features of URLs
• String features: keyword separators, arguments, filename, path
• Numerical features: number of arguments, length of arguments, length of keywords
• Bag of words: set of keywords
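Below is a sketch of extracting these features from the sample URL shown earlier. The exact definitions are assumptions: the keyword is taken to be the longest argument value, separators are its non-alphanumeric characters, and "length of keywords" is read as the number of words. The query string is split by hand rather than with `parse_qs`, because `parse_qs` would decode `+` to a space and hide the separator.

```python
import re
from typing import Dict
from urllib.parse import urlsplit


def lexical_features(url: str) -> Dict[str, object]:
    """Lexical URL features of the kinds listed above (definitions are assumptions)."""
    parts = urlsplit(url)
    args = [a for a in parts.query.split("&") if a]
    names = [a.split("=", 1)[0] for a in args]
    values = [a.split("=", 1)[1] if "=" in a else "" for a in args]
    keyword = max(values, key=len, default="")  # assume longest value is the keyword
    words = [w for w in re.split(r"[^A-Za-z0-9]+", keyword) if w]
    directory, _, filename = parts.path.rpartition("/")
    return {
        # String features
        "separators": sorted(set(re.findall(r"[^A-Za-z0-9]", keyword))),
        "arguments": names,
        "filename": filename,
        "path": directory + "/",
        # Numerical features
        "num_arguments": len(args),
        "argument_lengths": [len(v) for v in values],
        "keyword_length": len(words),  # counted in words (assumption)
        # Bag of words
        "bag_of_words": {w.lower() for w in words},
    }


features = lexical_features(
    "http://www.askania-fachmaerkte.de/images/news.php?page=justin+bieber+breaks+neck")
print(features)
```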

Sample web URLs with trendy keywords → History-based detection → Domain clustering (lexical features of URLs) → Group analysis (web page feature similarity) → Regular expressions (to match URLs not in our sample)

Example regular expression:
.*\/xmlrpc\.php\/\?showc=\w+(\+\w+)+$
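The regular expression generalizes a detected group to URLs that were never sampled. A quick way to sanity-check such a pattern is to run it against a few URLs. The host names below are hypothetical, and `matches_campaign` is an illustrative helper. Python accepts `\/` as a literal `/`, so the pattern compiles unchanged. It requires at least two `+`-separated words, so a single-word keyword does not match.

```python
import re

# The example pattern from the slide, compiled as-is.
PATTERN = re.compile(r".*\/xmlrpc\.php\/\?showc=\w+(\+\w+)+$")


def matches_campaign(url: str) -> bool:
    return PATTERN.match(url) is not None


candidates = [
    "http://blog.example/xmlrpc.php/?showc=halloween+2010",
    "http://blog.example/xmlrpc.php/?showc=halloween",
    "http://blog.example/xmlrpc.php",
]
for url in candidates:
    print(url, matches_campaign(url))
```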


deSEO findings

• 11 malicious groups found in a sampled web graph from January 2011
  - 957 domains
  - 15,482 URLs
• Revealed a new search poisoning attack:
  - compromised WordPress installations
  - cloaking to avoid detection
  - different link topology


Applying to search results

• 120 keyword searches in Google and Bing

• 163 malicious URLs detected in results

• 43 search terms affected

Conclusion

• Malware and malicious SEO are big problems

• Analyzed an ongoing scareware campaign

• Identified thousands of compromised domains

• Identified prominent features in SEO attacks and used them to build deSEO

• Promising results on a partial dataset from Bing

• Identified multiple live SEO attacks