Defending against large-scale crawls in online social networks

Mainack Mondal† Bimal Viswanath† Allen Clement†

Peter Druschel† Krishna Gummadi† Alan Mislove‡ Ansley Post†*

†MPI-SWS ‡Northeastern University *Now at Google

CoNEXT, December 2012

Lots of personal data on Online Social Networks (OSNs)

What is the concern with aggregation of this large data?

Aggregators can mine this data to infer attributes missing from it, e.g. sexual orientation

Aggregators can republish this data in easily accessible form

Neither users nor the OSN have control over how crawled data is used

This is a problem for OSN operators:
User data is a valuable asset to OSN operators
OSN operators are blamed for misuse of user data [NYTimes ’10]

OSNs need to limit large-scale aggregation of user data

In 2010, the data of 171 M Facebook users was published on BitTorrent

Challenge

We are defending against a crawler who
Wants to crawl as many accounts as possible
Wants to crawl as fast as possible

Our goal is to
Limit the rate of crawling
Make the crawlers as slow as possible

Existing solution: Simple rate-limiting

OSNs rate-limit on a per-account or per-IP-address basis

Crawlers can defeat rate-limiting by using multiple accounts:
The crawlers can create multiple fake accounts (Sybils)
Or the crawlers can use compromised accounts
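
The weakness of per-account limits can be illustrated with a minimal Python sketch. The `PerAccountLimiter` class, the quota of 100 views per window, and the account names are hypothetical and are not taken from any real OSN; the point is only that the total crawl volume grows with the number of accounts the crawler controls.

```python
from collections import defaultdict


class PerAccountLimiter:
    """Hypothetical limiter: each account may make `quota` views per window."""

    def __init__(self, quota):
        self.quota = quota
        self.used = defaultdict(int)

    def allow(self, account):
        # Allow the view only while this account is under its own quota.
        if self.used[account] >= self.quota:
            return False
        self.used[account] += 1
        return True


def crawl(limiter, accounts, targets):
    """Spread `targets` profile views over the accounts; return how many succeed."""
    crawled = 0
    for i in range(targets):
        account = accounts[i % len(accounts)]
        if limiter.allow(account):
            crawled += 1
    return crawled


# One account is stopped at the quota; 50 Sybil accounts get 50 times as far.
single = crawl(PerAccountLimiter(quota=100), ["crawler-0"], 10_000)
sybils = crawl(PerAccountLimiter(quota=100),
               [f"sybil-{i}" for i in range(50)], 10_000)
print(single, sybils)  # 100 5000
```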

Our solution: Genie

Assumption: Social links to good users are harder to get than accounts

Replace user-account-based rate-limiting with link-based rate-limiting

Outline

Background and key idea

Genie design
Credit networks
How to use credit networks to defend against crawlers
Using the difference between user and crawler activity

Genie evaluation

Credit Networks [EC ‘11]

Nodes trust each other by providing pair-wise credit
Credit is used to pay for the services received

[Figure: example credit network with nodes A and B; links are labeled with pair-wise credit values]

To obtain a service, find path(s) with sufficient credits

[Figure: credit network with nodes A, C and B; links are labeled with available credit]
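
The path search can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the credit values are made up, `find_path` and `pay` are hypothetical helper names, only a single path is used (the slide allows several), and a payment is assumed to lower credit in the direction of payment and raise it in the reverse direction.

```python
from collections import deque


def find_path(credit, src, dst, amount):
    """BFS for a path from src to dst whose every link has at least `amount` credit."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt, available in credit.get(node, {}).items():
            if nxt not in parent and available >= amount:
                parent[nxt] = node
                queue.append(nxt)
    return None


def pay(credit, src, dst, amount):
    """Pay `amount` along one path; return the path, or None if none has enough credit."""
    path = find_path(credit, src, dst, amount)
    if path is None:
        return None
    for u, v in zip(path, path[1:]):
        credit[u][v] -= amount  # credit is consumed in the direction of payment
        credit[v][u] += amount  # assumption: it becomes available in reverse
    return path


# Hypothetical values: credit[u][v] is the credit available from u towards v.
credit = {
    "A": {"C": 3},
    "C": {"A": 2, "B": 4},
    "B": {"C": 1},
}
first = pay(credit, "A", "B", 2)   # succeeds via C
second = pay(credit, "A", "B", 2)  # fails: only 1 credit is left from A towards C
```

After the first payment the link from A towards C holds 1 credit, so the second request finds no path with sufficient credit and is refused.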

How can we map an OSN to a credit network?

The OSN operator forms a credit network from the social network
The operator replenishes credit on each link at a fixed rate

Credit is deducted from links to view another user’s profile

[Figure: credit network with nodes A, C, D and B; links are labeled with available credit]
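
Deduction and replenishment can be sketched together in a few lines of Python. Everything concrete here is an assumption made for illustration: the initial credit of 1 per link, the rate of 1 credit per week, the cap of 3, the cost of 1 credit per link, and the helper names. The slides state only that credit is deducted per view and replenished at a fixed rate.

```python
def make_links(path, initial):
    """Build a credit map with `initial` credit in both directions of each link."""
    credit = {}
    for u, v in zip(path, path[1:]):
        credit.setdefault(u, {})[v] = initial
        credit.setdefault(v, {})[u] = initial
    return credit


def view_profile(credit, path, cost=1):
    """Charge `cost` on every link of `path`; refuse if any link is short of credit."""
    links = list(zip(path, path[1:]))
    if any(credit[u][v] < cost for u, v in links):
        return False
    for u, v in links:
        credit[u][v] -= cost
        credit[v][u] += cost  # assumption: reverse direction gains the credit
    return True


def replenish(credit, rate, weeks, cap):
    """Add `rate` credits per week to each directed link that is below `cap`."""
    for u in credit:
        for v in credit[u]:
            if credit[u][v] < cap:
                credit[u][v] = min(cap, credit[u][v] + rate * weeks)


path = ["A", "C", "D", "B"]
credit = make_links(path, initial=1)
first = view_profile(credit, path)    # allowed, links towards B drop to 0
second = view_profile(credit, path)   # refused, no credit left
replenish(credit, rate=1, weeks=1, cap=3)
third = view_profile(credit, path)    # allowed again after one week
```

The replenishment rate therefore bounds how often the same links can carry a view, which is the knob the evaluation later tunes.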

How do credit networks defend against crawlers?

Amount of crawling is proportional to attack cut

[Figure: crawler accounts are connected to the rest of the network (normal users) through the attack cut]

Sybil accounts: attack cut is small (SybilRank, NSDI 2012)

Compromised accounts: attack cut may be larger

Difference between normal users and crawlers

Reciprocity in profile views
Normal users are more reciprocal than crawlers

Repeated profile views
Normal users repeatedly visit the same set of profiles

Locality of views

Difference in locality between normal users and crawlers

Renren graph and user browsing trace [IMC ‘10]: 33 K users, 96 K activities (2 weeks)

Most of the normal views are local

[Figure: % of views; one series is labeled crawler activity]

Flickr: Mislove et al. [WOSN ‘08]

Orkut: Cha et al. [IMC ‘09]

Genie design principles

Use a credit network to rate limit links

Exploit the difference between normal and crawler activity to discriminate crawlers
Charge more for views further away

Genie design

New charging model: Pay more to view profiles far away

Credit charged per link = (shortest-path distance between the two nodes) - 1

Rate of crawling decreases with increased path length

[Figure: credit network with the path A, C, D, B; each of the three links is labeled -2 and +2]
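
The charging rule can be worked through on the four-node path from the figure. This is a sketch under assumptions: the graph below contains only the path A, C, D, B, the helper names are hypothetical, and the total assumes the charge is applied to every link of one shortest path.

```python
from collections import deque


def shortest_distance(graph, src, dst):
    """Hop distance between src and dst in an undirected graph, or None if unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


def charge_per_link(distance):
    """Charging rule from the slide: shortest-path distance minus one."""
    return distance - 1


graph = {"A": ["C"], "C": ["A", "D"], "D": ["C", "B"], "B": ["D"]}

d = shortest_distance(graph, "A", "B")   # 3 hops: A, C, D, B
per_link = charge_per_link(d)            # 2 credits on each link
total = d * per_link                     # 6 credits over the whole path
friend_cost = charge_per_link(shortest_distance(graph, "A", "C"))  # 0 for a direct friend
```

Under this rule the total credit consumed grows roughly quadratically with distance, so local views stay cheap while views of distant profiles drain credit quickly.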

Genie evaluation

Does Genie limit attackers while allowing normal users?

The parameter to tweak: credit replenishment rate per link
Replenishment rate too high: crawlers will be allowed
Replenishment rate too low: users will be heavily penalized

Experimental setup

Genie simulator written in C++
Input: social graph and user activity trace
Output: allowed/flagged for each activity

Normal user activity trace from Renren
Generated multiple synthetic traces for other graphs

We model a strong and efficient crawler
Crawler controls compromised user accounts
Each good user profile is crawled once
Crawlers try to crawl as many profiles as possible

Does Genie limit crawlers?

[Figure: % of users crawled per week versus credits/week per link]

The crawlers are slowed down ~3000 times

Only 2.7% of the network is crawled in 1 week

Does Genie penalize good users?

[Figure: % of user activity flagged versus credits/week per link]

2.6% of total activities, from 0.8% of users, are flagged

[Figure: % of user activity flagged and % of users crawled per week versus credits/week per link, with the trade-off point marked]

Who are these flagged users?

3 users with a very high number of random profile views
Show crawler-like behavior
70% of the flagged activity is by these users

Users with a normal number of profile views but very few friends
99% of flagged users have fewer than 5 friends
Adding 4 more friends unflags 97% of these users

Efficiency of Genie

In our Genie simulator
To scale up Genie we used the Canal library [EuroSys ’12]
Multithreaded implementation
Used a machine with 24 cores and 48 GB of physical memory for evaluation

For a million-node social graph
Memory overhead: 5 GB
Each view request processed in 0.65 ms on average

Summary

We propose rate-limiting links to defend against crawlers

We strengthen our defense using difference between normal user and crawler activities

We evaluated Genie on a real-world user activity trace

Thank you
