Do You Trust Your Recommender? An Exploration of Privacy and Trust in Recommender Systems Dan Frankowski, Dan Cosley, Shilad Sen, Tony Lam, Loren Terveen, John Riedl University of Minnesota


TRANSCRIPT

Page 1

Do You Trust Your Recommender? An Exploration of Privacy and Trust in Recommender Systems

Dan Frankowski, Dan Cosley, Shilad Sen, Tony Lam, Loren Terveen, John Riedl

University of Minnesota

Page 2

CDT Spring Research Forum 2007

Story: Finding “Subversives”

“… few things tell you as much about a person as the books he chooses to read.”

– Tom Owad, applefritter.com

Page 3

Session Outline

- Exposure: undesired access to a person’s information
  - Privacy risks
  - Preserving privacy
- Bias and Sabotage: manipulating a trusted system to manipulate users of that system

Page 4

Why Do I Care?

As a businessperson:

- The nearest competitor is one click away
- Lose your customers’ trust, and they will leave
- Lose your credibility, and they will ignore you

As a person:

- Let’s not build Big Brother

Page 5

Risk of Exposure in One Slide

Private Dataset (YOU) + Public Dataset (YOU) + algorithms = Your private data linked!

Seems bad. How can privacy be preserved?

Page 6

movielens.org

-Started ~1995

-Users rate movies ½ to 5 stars

-Users get recommendations

-Private: no one outside GroupLens can see a user’s ratings

Page 7

Anonymized Dataset

-Released 2003

-Ratings, some demographic data, but no identifiers

-Intended for research

-Public: anyone can download

Page 8

movielens.org Forums

-Started June 2005

-Users talk about movies

-Public: on the web, no login to read

-Can forum users be identified in our anonymized dataset?

Page 9

CDT Spring Research Forum 20079

Research Questions

RQ1: RISKS OF DATASET RELEASE: What are the risks to user privacy when releasing a dataset?

RQ2: ALTERING THE DATASET: How can dataset owners alter the dataset they release to preserve user privacy?

RQ3: SELF DEFENSE: How can users protect their own privacy?

Page 10

Motivation: Privacy Loss

- MovieLens forum users did not agree to reveal their ratings
- Anonymized ratings + public forum data = privacy violation?
- More generally: dataset 1 + dataset 2 = privacy risk?
  - What kinds of datasets? What kinds of risks?

Page 11

Vulnerable Datasets

We talk about datasets from a sparse relation space, which:

- Relates people to items
- Is sparse (few relations per person out of all possible relations)
- Has a large space of items

     i1  i2  i3  …
p1   X
p2       X
p3           X
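A sparse relation space like the one above can be sketched as a mapping from each person to the few items they relate to, out of a much larger item space. This is a minimal illustration; the names and item-space size are hypothetical, not from the talk.

```python
# A sparse relation space: each person relates to only a few items
# out of a large item space. All names here are illustrative.
relations = {
    "p1": {"i1"},
    "p2": {"i2"},
    "p3": {"i3"},
}

ITEM_SPACE_SIZE = 10_000  # large space of possible items


def sparsity(relations, item_space_size):
    """Average fraction of the item space each person relates to."""
    fractions = [len(items) / item_space_size for items in relations.values()]
    return sum(fractions) / len(fractions)


print(sparsity(relations, ITEM_SPACE_SIZE))  # 0.0001 -> very sparse
```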

Page 12

Example Sparse Relation Spaces

- Customer purchase data from Target
- Songs played from iTunes
- Articles edited in Wikipedia
- Books/albums/beers… mentioned by bloggers or on forums
- Research papers cited in a paper (or review)
- Groceries bought at Safeway
- …

We look at movie ratings and forum mentions, but there are many sparse relation spaces.

Page 13

Risks of Re-identification

Re-identification is matching a user in two datasets by using some linking information (e.g., name and address, or movie mentions).

Re-identifying to an identified dataset (e.g., one with names and addresses, or social security numbers) can result in severe privacy loss.

Page 14

Story: Finding Medical Records (Sweeney 2002)

[Photo: former Governor of Massachusetts]

87% of people in the 1990 U.S. census were identifiable by (zip + birthdate + gender)!
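Sweeney’s observation amounts to checking how small the groups get when records are grouped by quasi-identifiers (the k-anonymity idea cited later in this deck); k = 1 means someone is unique. A toy sketch, with fabricated records:

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Smallest group size when records are grouped by the
    quasi-identifier columns; k = 1 means someone is unique."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    return min(Counter(keys).values())


# Fabricated records for illustration only.
records = [
    {"zip": "55455", "birthdate": "1945-07-31", "gender": "M"},
    {"zip": "55455", "birthdate": "1945-07-31", "gender": "M"},
    {"zip": "55414", "birthdate": "1972-01-02", "gender": "F"},
]
print(k_anonymity(records, ["zip", "birthdate", "gender"]))  # 1 -> someone is unique
```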

Page 15

The Rebus Form

Identified public records (e.g., voter rolls) + anonymized medical data = Governor’s medical records!

Page 16

Related Work

- Anonymizing datasets: k-anonymity (Sweeney 2002)
- Privacy-preserving data mining (Verykios et al. 2004; Agrawal et al. 2000; …)
- Privacy-preserving recommender systems (Polat et al. 2003; Berkovsky et al. 2005; Ramakrishnan et al. 2001)
- Text mining of user comments and opinions (Drenner et al. 2006; Dave et al. 2003; Pang et al. 2002)

Page 17

RQ1: Risks of Dataset Release

RQ1: What are the risks to user privacy when releasing a dataset?

RESULT: a 1-identification rate of 31%

- Ignores rating values entirely!
- Can do even better if text analysis produces a rating value
- Rarely-rated items were more identifying

Page 18

Glorious Linking Assumption

People mostly talk about things they know => people tend to have rated what they mentioned.

Measured P(u rated m | u mentioned m), averaged over all forum users: 0.82
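That linking probability can be estimated directly from the two datasets. A minimal sketch, assuming ratings and mentions are stored as per-user sets of movie ids (the data layout and toy numbers are assumptions, not the study’s data):

```python
def linking_probability(ratings, mentions):
    """P(u rated m | u mentioned m), averaged over users with mentions.

    ratings:  dict user -> set of rated movie ids
    mentions: dict user -> set of mentioned movie ids
    """
    per_user = []
    for user, mentioned in mentions.items():
        if not mentioned:
            continue  # skip users with no mentions
        rated = ratings.get(user, set())
        per_user.append(len(mentioned & rated) / len(mentioned))
    return sum(per_user) / len(per_user)


# Toy data: u1 rated 4 of the 5 movies they mentioned, u2 rated 1 of 1.
ratings = {"u1": {1, 2, 3, 4, 9}, "u2": {7}}
mentions = {"u1": {1, 2, 3, 4, 5}, "u2": {7}}
print(linking_probability(ratings, mentions))  # 0.9 on this toy data
```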

Page 19

Algorithm Idea

[Venn diagram: within the set of all users, the users who rated a popular item, the users who rated a rarely rated item, and the small overlap of users who rated both]
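The intuition in the diagram — intersect the sets of raters of each item a forum user mentioned, until few candidates remain — can be sketched as a simple set-intersection pass. This is a simplified sketch of that idea (the data layout is an assumption), not the paper’s full Scoring or TF-IDF algorithms:

```python
def set_intersection_candidates(mentioned_items, raters_of):
    """Users in the ratings dataset who rated every mentioned item.

    mentioned_items: items a forum user mentioned
    raters_of: dict item -> set of user ids who rated that item
    """
    candidates = None
    for item in mentioned_items:
        raters = raters_of.get(item, set())
        candidates = raters if candidates is None else candidates & raters
    return candidates or set()


# Toy data: a rarely rated item narrows the candidates to one user.
raters_of = {
    "popular": {"a", "b", "c", "d"},
    "rare": {"c"},
}
print(set_intersection_candidates(["popular", "rare"], raters_of))
# {'c'} -> a single candidate: 1-identification
```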

Page 20

Probability of 1-identification vs. algorithm

[Chart: probability of 1-identification (0 to 1) by # mentions bin (and # users in bin): 1 (25), 2..3 (21), 4..7 (23), 8..15 (22), 16..31 (18), 32..63 (13), >64 (11); one curve per algorithm: Exact Rating, Fuzzy Rating, Scoring, TF-IDF, Set Intersection]

- >=16 mentions and we often 1-identify
- More mentions => better re-identification

Page 21

RQ2: ALTERING THE DATASET

How can dataset owners alter the dataset they release to preserve user privacy?

- Perturbation: change rating values. Oops, Scoring doesn’t need values
- Generalization: group items (e.g., by genre). Dataset becomes less useful
- Suppression: hide data. IDEA: release a ratings dataset suppressing all “rarely-rated” items
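Suppressing rarely-rated items can be sketched as a filter over the released ratings; the threshold below is an arbitrary illustration, not a value from the study:

```python
from collections import Counter


def suppress_rarely_rated(ratings, min_raters):
    """Drop every (user, item) rating whose item has fewer than
    min_raters raters. ratings: list of (user, item) pairs."""
    counts = Counter(item for _, item in ratings)
    return [(u, i) for (u, i) in ratings if counts[i] >= min_raters]


# Toy data: the single rating of "rare" is suppressed.
ratings = [("u1", "popular"), ("u2", "popular"), ("u3", "popular"),
           ("u1", "rare")]
print(suppress_rarely_rated(ratings, min_raters=2))
# [('u1', 'popular'), ('u2', 'popular'), ('u3', 'popular')]
```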

Page 22

Database-level suppression curves

[Chart: fraction of users 1-identified (0 to 0.6) vs. fraction of items suppressed (0 to 1)]

- Drop 88% of items to protect current users against 1-identification
- 88% of items => 28% of ratings

Page 23

RQ3: SELF DEFENSE

RQ3: How can users protect their own privacy?

- Similar to RQ2, but now per-user
- Users can change ratings or mentions. We focus on mentions
- Users can perturb, generalize, or suppress. As before, we study suppression

Page 24

User-level suppression curves

[Chart: fraction of users 1-identified (0 to 1) vs. fraction of user mentions (per user) suppressed (0 to 0.5)]

- Suppressing 20% of mentions dropped 1-identification some, but not all
- Suppressing >20% is not reasonable for a user

Page 25

Another Strategy: Misdirection

What if users mention items they did NOT rate? This might misdirect a re-identification algorithm.

Create a misdirection list of items. Each user takes an unrated item from the list and mentions it. Repeat until not identified.

What are good misdirection lists? Remember: rarely-rated items are identifying.
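The repeat-until-not-identified loop can be sketched as follows. Here `is_identified` is a stand-in for running a re-identification algorithm against the released dataset, and all names and the cut-off are hypothetical:

```python
def misdirect(user_mentions, user_ratings, misdirection_list, is_identified,
              max_extra=20):
    """Add unrated items from misdirection_list to a user's mentions
    until the re-identification check fails (or we give up)."""
    mentions = set(user_mentions)
    for item in misdirection_list:
        if not is_identified(mentions):
            break  # no longer 1-identified: stop adding decoys
        if item not in user_ratings and item not in mentions:
            mentions.add(item)
        if len(mentions) - len(user_mentions) >= max_extra:
            break  # give up: too many decoy mentions for a real user
    return mentions


# Toy check: the stand-in identifier gives up once mentions grow past 3 items.
is_identified = lambda mentions: len(mentions) <= 3
result = misdirect({"m1", "m2"}, {"m1"}, ["pop1", "pop2", "pop3"], is_identified)
print(sorted(result))  # ['m1', 'm2', 'pop1', 'pop2']
```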

Page 26

User 1-identification vs. number of misdirecting mentions

[Chart: fraction of users 1-identified (0 to 0.35) vs. # misdirecting mentions (0 to 20); one curve per misdirection list: Rare, rated>=1; Rare, rated>=16; Rare, rated>=1024; Rare, rated>=8192; Popular]

- Rarely-rated items don’t misdirect!
- Popular items do better, though 1-identification isn’t zero
- Better to misdirect to a large crowd
- Rarely-rated items are identifying; popular items are misdirecting

Page 27

Exposure: What Have We Learned?

REAL RISK

- Re-identification can lead to loss of privacy
- We found substantial risk of re-identification in our sparse relation space
- There are a lot of sparse relation spaces
- We’re probably in more and more of them, available electronically

HARD TO PRESERVE PRIVACY

- The dataset owner had to suppress a lot of their dataset to protect privacy
- Users had to suppress a lot to protect privacy
- Users could misdirect somewhat with popular items

Page 28

Advice: Keep Customers’ Trust

Share data rarely

- Remember the governor: (zip + birthdate + gender) is not anonymous

Reduce exposure

- Example: Google will anonymize search data older than 24 months

Page 29

AOL: 650K users, 20M queries

- Data wants to be free: government subpoena, research, commerce
- People do not know the risks
- AOL was text; this is items
- NY Times: user 4417749 searched for “dog that urinates on everything.”

Page 30

Discussion #1: Exposure

Examples of sparse relation spaces?

Examples of re-identification risks?

How to preserve privacy?