What’s the Gist? Privacy-Preserving Aggregation of User Profiles

Page 1: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

What’s the Gist? Privacy-Preserving Aggregation of User Profiles

Igor Bilogrevic (Google), Julien Freudiger (PARC), Emiliano De Cristofaro (UCL), Ersin Uzun (PARC)

Scott Kildall – Data Crystals

Page 2: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Data is the Crux of Internet Economy

Corporations seek personal data for better targeting

More data and more sensitive data

Data Brokers

Third Parties

Users

Credit card transactions, interests, political party, apps usage, browsing history, mobility patterns, …

Page 3: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Issues with Current Approach

Privacy: What personal data is collected? How much and how good is it?

Transparency: Who knows what about me? [1] Where does this data come from?

Remuneration: Users value their data. Users don’t get money for it.

Data Brokers: A Call for Transparency and Accountability (FTC, May 2014)

[1] aboutthedata.com

Page 4: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

“This question calls for Acxiom to provide information that would reveal business practices that are of a highly competitive nature. Acxiom cannot provide a list of each entity that has provided data from, or about, consumers to us.”

ACXIOM

Page 5: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Page 6: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Julian Oliver - 2013

Page 7: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

An Emerging Model

Data Brokers

Third Parties

Users

Participatory Data Brokers

Benefits: Users retain control over who accesses what about them. Users decide what data can be monetized. Users get some revenue.

Page 8: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

“What if Facebook paid you? Several startups envision an era in which we are all the brokers, and beneficiaries, of our own personal data.”

David Zax, Is personal data the new currency? MIT Tech Review

You

Page 9: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Our Contribution

What’s the Gist? A method for monetizing users’ personal data with privacy: users choose what to share, and brokers are not required to be trustworthy

Idea: Rather than selling data as-is, monetize a model of the data

Figure: User data (age), e.g. User1 22, User2 56, User3 43, User4 33, …, is turned into an aggregate (age): a pdf over age, with axis ticks from 20 to 60

Page 10: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

System Architecture

Entities: Third Party, Aggregator, Users

1. Query

2. Select users

3. Queries

4. Extract features

5. Noisy encrypted answers

6. Aggregate, decrypt, sample, and monetize

7. Answer

Interactive mode: Customer queries for certain desired aggregates

Batch mode: Aggregator prepares certain aggregates

Page 11: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Users – Profile Computation

Each user $i$ has a profile $p_i$ with $K$ attributes $\{a_{i,j}\}$

Each element $a_{i,j}$ is an integer representing a value or a preference

User $i$: $p_i = [a_{i,1}, a_{i,2}, a_{i,3}, \ldots, a_{i,K}]$

Example: $p_i = [28, 223, 5, 6, \ldots, 2, 3]$ for the attributes Age, # of friends, Action movies, Drama movies, …, Rock music, History books

Page 12: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Users – Feature Computation

Features depend on the chosen probability model

For a Gaussian model, each user $i$ computes

$f_i = \{[a_{i,1}, a_{i,1}^2], \ldots, [a_{i,K}, a_{i,K}^2]\}$

Example: $[28, 28^2], [223, 223^2], [5, 5^2], [6, 6^2], \ldots, [2, 2^2], [3, 3^2]$ for the profile $p_i$ with attributes Age, # of friends, Action movies, Drama movies, …, Rock music, History books
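The feature step for the Gaussian model can be illustrated with a minimal Python sketch. The function name gaussian_features is hypothetical, and the profile below uses only the six values visible on the slide (the elided middle entries are left out).

```python
def gaussian_features(profile):
    """Pair each integer attribute a with its square a^2.

    Summing these pairs over many users later yields the first and
    second moments that a Gaussian model needs.
    """
    return [(a, a * a) for a in profile]


# Visible entries of the example profile: age, # of friends, action movies,
# drama movies, rock music, history books.
profile = [28, 223, 5, 6, 2, 3]
features = gaussian_features(profile)
```

Each user would compute this locally, so only the pairs (after noise and encryption) ever leave the device.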

Page 13: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Private Aggregation

Privacy: Differentially private noise $r_i$ prevents the aggregator from deducing user data [1]

Security: Aggregator can only decrypt the sum. No shared secret, no pairwise distributed computations

Diagram: User 1, …, User i, …, User n and the Aggregator, with rows labeled Assume, Knows, Computes

[1] E Shi et al. Privacy-Preserving Aggregation of Time-Series Data. NDSS, 2011
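The property that the aggregator can only recover the sum can be illustrated with a toy Python sketch. This is not the scheme of [1]: it assumes a trusted setup that hands out keys summing to zero modulo a public modulus, replaces the group-based encryption with plain additive masking, is only meaningful for a single round, and uses fixed placeholder values instead of properly sampled differentially private noise. All names (MODULUS, setup_keys, encrypt, aggregate) are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical public modulus, far larger than any possible sum


def setup_keys(n_users):
    """Return (aggregator_key, user_keys); all keys together sum to 0 mod MODULUS."""
    user_keys = [secrets.randbelow(MODULUS) for _ in range(n_users)]
    aggregator_key = (-sum(user_keys)) % MODULUS
    return aggregator_key, user_keys


def encrypt(value, noise, key):
    """Mask a noisy value with the user's key; a single ciphertext looks random."""
    return (value + noise + key) % MODULUS


def aggregate(ciphertexts, aggregator_key):
    """Adding the aggregator key cancels every mask, leaving only the noisy sum."""
    total = (sum(ciphertexts) + aggregator_key) % MODULUS
    # Map back to a signed integer so that a negative noisy sum is handled.
    return total - MODULUS if total > MODULUS // 2 else total


ages = [22, 56, 43, 33]
noises = [1, -2, 0, 2]  # placeholder for r_i; a real system would sample this
agg_key, keys = setup_keys(len(ages))
ciphertexts = [encrypt(a, r, k) for a, r, k in zip(ages, noises, keys)]
noisy_sum = aggregate(ciphertexts, agg_key)
```

The aggregator learns the noisy total but cannot unmask any individual contribution, because removing a single mask would require that user's key.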

Page 14: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Aggregator – Gaussian Approximation

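Entities contribute $\mathrm{Enc}[a_1], \mathrm{Enc}[a_1^2], \ldots, \mathrm{Enc}[a_i], \mathrm{Enc}[a_i^2]$

Broker aggregates to compute mean $\mu$ and variance $\sigma^2$

Obtains Gaussian approximation $\mathcal{N}(\mu, \sigma^2)$ for each attribute

Figure: pdf of $\mathcal{N}(\mu, \sigma^2)$ over age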
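A short Python sketch shows how the mean and variance could be recovered from the two decrypted sums. It assumes the aggregator holds the sum of the values and the sum of their squares for n users. The function names are hypothetical, and the clamp to zero is an added safeguard, since noisy sums can make the estimate slightly negative.

```python
import math


def gaussian_from_sums(sum_a, sum_a_sq, n):
    """Estimate (mean, variance) from the aggregated sums of a and a^2."""
    mean = sum_a / n
    # Population variance E[a^2] - E[a]^2, clamped at 0 for noisy inputs.
    variance = max(sum_a_sq / n - mean ** 2, 0.0)
    return mean, variance


def gaussian_pdf(x, mean, variance):
    """Density of N(mean, variance) at x; variance must be positive."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


ages = [22, 56, 43, 33]  # the example ages shown earlier in the deck
mean, variance = gaussian_from_sums(sum(ages), sum(a * a for a in ages), len(ages))
```

Only two numbers per attribute are needed, which is why each user contributes the pair (a, a^2) rather than the raw profile.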

Page 15: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Aggregator - Attribute Ranking

Assumption: Attributes with a uniform distribution reveal less information about individual entities

Measure divergence: Distance between two probability distributions, using the Jensen-Shannon (JS) divergence. A small JS divergence from the uniform distribution means low value

Figure: pdf of an attribute compared with the uniform distribution
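The ranking criterion can be sketched in Python as the JS divergence between a discretized Gaussian and a uniform distribution over the same bins. The bin range, the discretization, and the function names are illustrative assumptions, and base-2 logarithms are used so that the divergence lies between 0 and 1. Some texts reserve "JS distance" for the square root of this quantity.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def discretized_gaussian(mean, variance, bins):
    """Normalize Gaussian weights over a finite list of bin centers."""
    weights = [math.exp(-((x - mean) ** 2) / (2 * variance)) for x in bins]
    total = sum(weights)
    return [w / total for w in weights]


bins = list(range(18, 91))  # hypothetical age range
uniform = [1 / len(bins)] * len(bins)
narrow = discretized_gaussian(40, 25, bins)    # concentrated attribute
wide = discretized_gaussian(40, 2500, bins)    # nearly flat attribute
ranking = sorted(
    [("narrow", js_divergence(narrow, uniform)), ("wide", js_divergence(wide, uniform))],
    key=lambda item: item[1],
    reverse=True,
)
```

Under the slide's assumption, the concentrated attribute ranks first because it diverges more from uniform and is therefore considered more valuable.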

Page 16: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Performance

Dataset and implementation: 100,000 real users from U.S. Census [data.gov, July 2013]; 3 types of attributes (income, education, age); Java, measurements on Core i5 2.53 GHz, 8 GB RAM

Metrics: Accuracy of Gaussian approximation; information leakage for each attribute; revenue; overhead

Page 17: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Figure panels: Income, Education, Age; for 100 users, 1,000 users, 100,000 users

Page 18: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Gaussian Approximations

Accuracy improves quickly with the number of users (100 users is already good)

Fit for income and age is 3x better than for education

Page 19: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Information Leakage vs Uniform

Maximum information leakage achieved at about 1,000 users

Information leakage not necessarily increasing with number of users (stable after a while)

Larger user samples do not necessarily provide better discriminating features

Page 20: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Revenue Model

Value of user information: from $0.0005[2] to $33[1]

Where w=0.1 is the commission.

[1] J. P. Carrascal, C. Riederer, V. Erramilli, M. Cherubini, and R. de Oliveira. Your browsing behavior for a Big Mac: Economics of personal information online. WWW, 2013

[2] L. Olejnik, T. Minh-Dung, and C. Castelluccia. Selling off privacy at auction. NDSS, 2014

Page 21: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Revenue per Attribute

Three privacy sensitivity distributions

User revenue is small and does not increase with the number of participants. Revenue is similar to Amazon Mechanical Turk

Broker is incentivized to collect as many users as possible ($0.07 to $2897)

Third parties are incentivized to select a demographic group of size 100

Page 22: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Overhead

User: 1 ms total; independent of number of users

Aggregator: 1.5 min for 100 users, 27.7 h for 100,000 users; can and should be parallelized

Page 23: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Related Work

Privacy-preserving aggregation

Modified version of the Paillier encryption scheme [1,2], but requires P2P communications between participants

Homomorphic encryption and differential privacy [3,4], but differential privacy is applied by a third party and contributions are linkable to users before aggregation

[1] Z. Erkin and G. Tsudik. Private computation of spatial and temporal power consumption with smart meter. ACNS, 2012

[2] E. Shi, R. Zhang, Y. Liu, and Y. Zhang. Prisense: privacy-preserving data aggregation in people-centric urban sensing systems. INFOCOM, 2010

[3] R. Chen, I. E. Akkus, and P. Francis. Splitx: high-performance private analytics. SIGCOMM, 2013

[4] R. Chen, A. Reznichenko, P. Francis, and J. Gehrke. Towards statistical queries over distributed private user data. NSDI, 2012

Page 24: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Related Work

Privacy-preserving monetization

Local user profile generation, categorization, and ad selection[1,2]

Anonymizing proxies to shield users’ behavioral data from third parties[3]

[1] V. Toubiana, A. Narayanan, D. Boneh, H. Nissenbaum, and S. Barocas. Adnostic: Privacy preserving targeted advertising. NDSS, 2010

[2] S. Guha, B. Cheng, and P. Francis. Privad: practical privacy in online advertising. NSDI, 2011

[3] C. Riederer, V. Erramilli, A. Chaintreau, B. Krishnamurthy, and P. Rodriguez. For sale: your data: by: you. HotNets, 2011

Page 25: What’s the Gist?  Privacy-Preserving Aggregation of User Profiles

Conclusion

Designed a method to monetize sensitive data with privacy

If data is the new currency, we are creating the marketplace

Evaluation shows practical performance, good accuracy with as few as 100 users, and good incentives for the parties involved

Future work: Enhance security features (range checks to thwart pollution attacks, fault tolerance, efficient key establishment). Enable targeting of users after aggregation. Enable subsequent collection of more than the model (i.e., black swans)