Defending against large-scale crawls in online social networks
Mainack Mondal† Bimal Viswanath† Allen Clement†
Peter Druschel† Krishna Gummadi† Alan Mislove‡ Ansley Post†*
†MPI-SWS ‡Northeastern University *Now at Google
CoNEXT, December 2012
Lots of personal data on Online Social Networks (OSNs)
What is the concern with aggregating this data at large scale?
Aggregators can mine this data to infer attributes missing from it, e.g., sexual orientation
Aggregators can republish this data in easily accessible form
Neither the user nor the OSN has control over how crawled data is used
Problem for OSN operators:
User data is a valuable asset to OSN operators
OSN operators are blamed for misuse of user data [NYTimes ’10]
OSNs need to limit large-scale aggregation of user data
In 2010, data of 171 M Facebook users was published on BitTorrent
Challenge
We are defending against a crawler who:
wants to crawl as many accounts as possible
wants to crawl as fast as possible
Our goal is to:
limit the rate of crawling
make crawlers as slow as possible
Existing solution: Simple rate-limiting
OSNs rate-limit on a per-account or per-IP-address basis
Crawlers can defeat rate-limiting by using multiple accounts
Crawlers can create multiple fake accounts (Sybils)
Or, crawlers can use compromised accounts
Our solution: Genie
Assumption: Social links to good users are harder to get than accounts
Replace user-account-based rate-limiting with link-based rate-limiting
Outline
Background and key idea
Genie design: credit networks; how to use credit networks to defend against crawlers; using the difference between user and crawler activity
Genie evaluation
Credit Networks [EC ‘11]
Nodes trust each other by providing pair-wise credit
Credit is used to pay for services received
[Figure: example credit network between nodes A and B, with a credit value on each link]
To obtain a service, find path(s) with sufficient credit
[Figure: nodes A, C, and B, with credit values on each link]
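The path-finding step can be read as a max-flow question: a payment of `amount` credits is feasible if the maximum flow from payer to payee, using each directed link's available credit as its capacity, is at least `amount`. Below is a minimal Python sketch under that reading; the dict-of-directed-links representation and the names `max_credit_flow` and `can_pay` are hypothetical, and Edmonds-Karp is simply our choice of max-flow algorithm, not necessarily what [EC ’11] uses.

```python
from collections import deque


def max_credit_flow(credit, src, dst):
    """Edmonds-Karp max flow; credit[(u, v)] is credit usable to move payment u -> v."""
    residual = dict(credit)  # work on a copy so the caller's credit is untouched
    for u, v in list(credit):
        residual.setdefault((v, u), 0)  # reverse edges for the residual graph
    adj = {}
    for u, v in residual:
        adj.setdefault(u, []).append(v)

    flow = 0
    while True:
        # Breadth-first search for a path that still has spare credit.
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return flow  # no more augmenting paths

        # Walk back from dst to collect the path and its bottleneck credit.
        path = []
        v = dst
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck


def can_pay(credit, src, dst, amount):
    """A service costing `amount` credits is obtainable if enough credit can flow."""
    return max_credit_flow(credit, src, dst) >= amount
```

For example, with 3 credits on A→C, 2 on C→B, and 1 on A→B, at most 3 credits can flow from A to B, so a payment of 3 is feasible and a payment of 4 is not.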
How can we map an OSN to a credit network?
The OSN operator forms a credit network from the social network
The operator replenishes credit on each link at a fixed rate
Viewing another user’s profile deducts credit from the links on a path to that user
[Figure: example network of nodes A, C, D, and B, with credit values on each link]
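The operator's bookkeeping can be sketched as a small class: every friendship becomes a pair of directed credit links, a periodic `replenish` call tops links up at a fixed rate, and a profile view is paid for by charging credit along a path. This is a hypothetical Python sketch; the starting balance, the cap on accumulated credit, and the rule that the reverse direction gains what the forward direction loses (the usual credit-network convention) are assumptions, not details given on the slide.

```python
class CreditNetwork:
    """Hypothetical sketch of an operator-managed credit network over a social graph."""

    def __init__(self, friendships, rate, cap):
        self.rate = rate  # credits added to each directed link per time unit (assumed)
        self.cap = cap    # assumed ceiling that replenishment will not exceed
        self.credit = {}
        for u, v in friendships:
            # Each social link becomes a credit link in both directions, starting full.
            self.credit[(u, v)] = cap
            self.credit[(v, u)] = cap

    def replenish(self, elapsed):
        """Top up every link that is below the cap, at the fixed rate."""
        for link, value in self.credit.items():
            if value < self.cap:
                self.credit[link] = min(self.cap, value + self.rate * elapsed)

    def charge(self, path, cost):
        """Pay `cost` on every link of `path`; refuse the view if any link is short."""
        links = list(zip(path, path[1:]))
        if any(self.credit[link] < cost for link in links):
            return False
        for u, v in links:
            self.credit[(u, v)] -= cost  # payer side of the link loses credit
            self.credit[(v, u)] += cost  # reverse direction gains the same amount
        return True
```

A refused charge leaves every balance unchanged, so a flagged view costs the viewer nothing but also gets no service.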
How do credit networks defend against crawlers?
The amount of crawling is proportional to the size of the attack cut (the links between crawler-controlled accounts and the rest of the network)
[Figure: crawler-controlled accounts connected to the rest of the network (normal users) through the attack cut]
Sybil accounts: the attack cut is small (SybilRank, NSDI 2012)
Compromised accounts: the attack cut may be larger
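The proportionality to the attack cut can be made concrete with a back-of-the-envelope bound: if every view a crawler makes must pay at least `cost_per_view` credits on some link crossing the cut, the crawler cannot sustain more views per period than the credit replenished on the cut allows. A hypothetical Python sketch, assuming an undirected edge list and ignoring any credit already stored on the cut when the crawl starts:

```python
def attack_cut(edges, crawler_nodes):
    """Links with exactly one endpoint controlled by the crawler."""
    bad = set(crawler_nodes)
    return [(u, v) for u, v in edges if (u in bad) != (v in bad)]


def max_views_per_period(edges, crawler_nodes, credit_per_link, cost_per_view=1):
    """Upper bound on crawler views per replenishment period (steady state)."""
    cut = attack_cut(edges, crawler_nodes)
    # Each cut link supplies `credit_per_link` new credits per period in the
    # outward direction, and each view spends at least `cost_per_view` of them.
    return len(cut) * credit_per_link // cost_per_view
```

Two Sybils with a single honest friend each expose only two cut links, so the bound stays small no matter how many accounts sit behind them; a crawler using compromised accounts inherits their friendships, which is why its cut may be larger.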
Difference between normal users and crawlers
Reciprocity in profile views: normal users are more reciprocal than crawlers
Repeated profile views: normal users repeatedly visit the same set of profiles
Locality of views
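The first two differences can be measured directly from a trace of profile views. The sketch below assumes a hypothetical trace format, a list of `(viewer, viewed)` pairs, and defines reciprocity as the fraction of distinct viewer-viewed pairs whose reverse also appears, and the repeat fraction as the share of views that revisit an already-seen pair; these definitions are our own illustration, not necessarily the metrics used in the study.

```python
def view_stats(views):
    """views: list of (viewer, viewed) profile-view events (hypothetical format)."""
    pairs = set(views)
    # Distinct pairs whose reverse view also occurs somewhere in the trace.
    reciprocal = sum(1 for (u, v) in pairs if (v, u) in pairs)
    # Views beyond the first visit to a given (viewer, viewed) pair.
    repeated = len(views) - len(pairs)
    return {
        "reciprocity": reciprocal / len(pairs) if pairs else 0.0,
        "repeat_fraction": repeated / len(views) if views else 0.0,
    }
```

A crawler that visits each profile once and is never viewed back scores zero on both measures, while a user who keeps checking a few friends who check back scores high on both.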
Difference in locality between normal users and crawlers
Renren graph and user browsing trace [IMC ’10]: 33 K users, 96 K activities over 2 weeks
Most normal profile views are local
[Figure: % of views for normal user activity and crawler activity]
Flickr: Mislove et al. [WOSN ’08]
Orkut: Cha et al. [IMC ’09]
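Locality can be quantified by the hop distance between viewer and viewed profile in the social graph. A minimal Python sketch, assuming an adjacency-list graph and a hypothetical trace of `(viewer, viewed)` pairs; it returns the percentage of views at each hop distance, with `None` standing for pairs that are not connected.

```python
from collections import Counter, deque


def hop_distance(adj, src, dst):
    """Breadth-first search for the shortest-path distance; None if unreachable."""
    if src == dst:
        return 0
    seen = {src}
    frontier = deque([(src, 0)])
    while frontier:
        u, d = frontier.popleft()
        for v in adj.get(u, ()):
            if v == dst:
                return d + 1
            if v not in seen:
                seen.add(v)
                frontier.append((v, d + 1))
    return None


def locality_histogram(adj, views):
    """Percentage of views at each hop distance."""
    if not views:
        return {}
    counts = Counter(hop_distance(adj, u, v) for u, v in views)
    total = sum(counts.values())
    return {d: 100.0 * c / total for d, c in counts.items()}
```

On a chain a-b-c-d, the views a→b, a→c, b→a, a→d fall at distances 1, 2, 1, 3, giving 50% at one hop and 25% each at two and three hops.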
Genie design principles
Use a credit network to rate-limit links
Exploit the difference between normal and crawler activity to discriminate crawlers: charge more for views further away
Genie design
New charging model: Pay more to view profiles far away
Credit charged per link = (shortest-path distance between viewer and viewed profile) - 1
Rate of crawling decreases with increased path length
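[Figure: A views B over the path A-C-D-B (distance 3); each link on the path is charged 3 - 1 = 2 credits, deducted in the direction of payment (-2) and added in the reverse direction (+2)]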
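The charging rule can be sketched as follows: find a shortest path from viewer to target, charge `distance - 1` credits on every link of that path, and flag the view if any link lacks the credit. This hypothetical Python sketch uses a single BFS shortest path and the usual credit-network convention that the reverse direction gains what the forward direction loses; the function names and the single-path choice are assumptions, not necessarily how Genie routes payments.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Return one shortest path from src to dst as a node list, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def genie_view(adj, credit, viewer, target):
    """Charge (distance - 1) credits per link on one shortest path; True if allowed."""
    path = shortest_path(adj, viewer, target)
    if path is None:
        return False
    cost = max(0, len(path) - 2)  # hop distance minus 1
    links = list(zip(path, path[1:]))
    if any(credit[link] < cost for link in links):
        return False  # flagged: not enough credit somewhere on the path
    for u, v in links:
        credit[(u, v)] -= cost
        credit[(v, u)] += cost
    return True
```

Viewing a friend (distance 1) costs nothing, while each extra hop raises the price on every link of the path, so a crawler sweeping distant profiles drains credit much faster than a user browsing nearby ones.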
Genie evaluation
Does Genie limit attackers while allowing normal users?
The parameter to tweak: credit replenishment rate per link
Replenishment rate too high: crawlers are allowed through
Replenishment rate too low: normal users are heavily penalized
Experimental setup
Genie simulator written in C++
Input: social graph and user activity trace
Output: allowed/flagged for each activity
Normal user activity trace from Renren
Generated multiple synthetic traces for the other graphs
We model a strong and efficient crawler:
The crawler controls compromised user accounts
Each good user profile is crawled once
The crawler tries to crawl as many profiles as possible
Does Genie limit crawlers?
Crawlers are slowed down ~3000 times
[Figure: % of users crawled per week vs. credits/week per link]
Only 2.7% of the network is crawled in 1 week
Does Genie penalize good users?
[Figure: % of user activity flagged vs. credits/week per link]
2.6% of all activities, coming from 0.8% of users, are flagged
[Figure: % of user activity flagged and % of users crawled per week vs. credits/week per link, marking the trade-off point]
Who are these flagged users?
3 users with a very high number of random profile views: they show crawler-like behavior, and 70% of the flagged activity comes from these users
Users with a normal number of profile views but very few friends: 99% of flagged users have fewer than 5 friends, and adding 4 more friends unflags 97% of these users
Efficiency of Genie
In our Genie simulator:
To scale up Genie, we used the Canal library [EuroSys ’12]
Multithreaded implementation
Evaluated on a 24-core machine with 48 GB of physical memory
For a million-node social graph:
Memory overhead: 5 GB
Each view request is processed in 0.65 ms on average
Summary
We propose rate-limiting links to defend against crawlers
We strengthen our defense using the difference between normal user and crawler activity
We evaluated Genie on a real-world user activity trace
Thank you