privacy, ethics, and big (smartphone) data, at mobisys 2014

Post on 27-Jan-2015

103 Views

Category:

Technology

0 Downloads

Preview:

Click to see full reader

DESCRIPTION

Keynote talk I gave at the Mobile and Cloud Workshop at Mobisys 2014. I talk about my experiences and reflections on privacy, focusing on (1) Urban Analytics, (2) Google Glass, and (3) PrivacyGrade.

TRANSCRIPT

©2

00

9 C

arn

eg

ie M

ello

n U

niv

ers

ity :

1

Privacy, Ethics, and Big (Smartphone) Data

Mobile Cloud Computing and ServicesJune 16, 2014

Jason Hong

ComputerHumanInteraction:MobilityPrivacySecurity

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

2

Smartphones are Pervasive

• 58% penetration in the US as of early 2014

• About 1.2M Android and iOS apps

• Over 75 billion apps downloaded on each of Android and iOS

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

3

Smartphones are Intimate

• Mobile phones and millennials (Cisco 2012):• 75% use in bed before sleep • 83% sleep with their phones• 90% check first thing in the

morning• A third use in bathroom (!!)• A fifth check every ten

minutes

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

4

Smartphone Data is Intimate

Who we know(contact list)

Who we call(call log)

Who we text(sms log)

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

5

Smartphone Data is Intimate

Where we go(gps, foursquare)

Photos(some geotagged)

Sensors(accel, sound, light)

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

6

The Opportunity

• We are creating a worldwide sensor network with these smartphones

• We can now capture and analyze human behavior at unprecedented fidelity and scale

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

7

Understanding Individuals

Min et al, Toss ‘n’ Turn: Smartphone as Sleep and Sleep Quality Detector, CHI 2014.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

8

Understanding Small Groups

Better communication

ComputationalSocial Science

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

9

Understanding Urban Areas

• AT&T Work on Human Mobility

Median distance traveled per day

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

10

Laborshed for Morristown NJ

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

11

Eric Fischer’s Maps of Tourists vs Locals

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

12

The Challenge

• Mobile and Cloud computing a clear part of the future

• Lots of fun technical challenges here– Computer scientists are great here!– We’re awesome at optimizing things

• But biggest challenges will likelyrelate to privacy and ethics

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

13

Three Stories

• Reflections and personal experiences• Livehoods

– How much can we infer about people?– Who wins, who doesn’t with tech?

• Google Glass– Why so much negative feedback?– Are there lessons we can learn?

• PrivacyGrade– What are ways of protecting people?– What is the role of developers?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

14

Story 1: The Challenge of Getting Data About Our Cities

• Today’s methods for getting city data slow, costly, limited– Ex. Travel Behavioral Inventory– US Census 2010 cost $13b– Quality of life surveys

• Some approaches today:– Call Data Records– Deploy a custom app

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

15

The Vision: Urban Analytics

• How can we use smartphones + social media + machine learning to offer new and useful insights about cities in a manner that is cheap, fast, and scalable?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

16

Livehoods, Our First Urban Analytics Tool• The character of an urban area is defined

not just by the types of places found there, but also by the people that make it part of their daily life

Cranshaw et al, The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City, ICWSM 2012.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

17

Two Perspectives

“Politically constructed” “Socially constructed”

Neighborhoods have fixed borders defined by the city government.

Neighborhoods are organic, cultural artifacts. Borders are blurry, imprecise, and may be different to different people.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

18

Two Perspectives of Cities

“Socially constructed”

Neighborhoods are organic, cultural artifacts. Borders are blurry, imprecise, and may be different to different people.

Can we discover automated ways of identifying the “organic” boundaries of the city?

Can we extract local cultural knowledge from social media?

Can we build a collective cognitive map from data?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

19

Livehoods Data Source

• Crawled 18m check-ins from foursquare– Claims 20m users– People who linked their

foursquare accts to Twitter

• Spectral clustering basedon geographic and socialproximity

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

20

☺☺

If you watch check-ins over time, you’ll notice that groups of like-minded people tend to stay in the same areas.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

21We can aggregate these

patterns to compute relationships between check-in venues.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

22These relationships can

then be used to identify natural borders in the urban landscape.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

23

Livehood 2

Livehood 1

We call the discovered clusters “Livehoods” reflecting their dynamic character.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

24

Try it out at livehoods.org

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

25

South Side Pittsburgh

Safety

LH8LH9LH7

LH6

LH7 vs LH8“Whenever I was living down on 15th Street [LH7] I had to worry about drunk people following me home, but on 23rd [LH8] I need to worry about people trying to mug you... so it’s different. It’s not something I had anticipated, but there is a distinct difference between the two areas of the South Side.”

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

26

South Side Pittsburgh

Demographic Differences

LH8LH9LH7

LH6

LH6“There is this interesting mix of people there I don’t see walking around the neighborhood. I think they are coming to the Giant Eagle [grocery store] from lower income neighborhoods... I always assumed they came from up the hill.”

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

27

South Side Pittsburgh“I always assumed they came from up the hill.”

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

28

South Side Pittsburgh

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

29

Other Potential Urban Analytics

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

30

Topic Modeling (LDA)

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

31

Reflections on Urban AnalyticsFew Privacy Issues• Publicly visible data with no

expectations of privacy– No IRB issues

• Remove venues labeled as “home”– We only received one request to remove

a venue (wasn’t labeled as a home)

• We only show data about geographic areas vs individuals

• So far, so good

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

32

Reflections on Urban AnalyticsBut Lots of Questions About Ethics• Discussions with lots of different folks

on how data like this might be used in the future– Urban planners, Yahoo, Facebook,

Google Maps, Startups, and more

• Lots of great questions– Not so many great answers– Share some of these with you

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

33

How Much Can Be Inferred?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

34

How Much Can Be Inferred?

• Very likely much more can be inferred using rich data like this– Demographics, socioeconomic, friends– Physical and mental health (depression)– How “risky” you are (bars, clinics, etc)

• Unclear how far inferencing can go– Also, not much can stop advertisers, NSA,

startups, etc– And, not just what you do, but what others

like you are doing

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

35

Built a logistic regression to predict sexuality based on what your friends on Facebook disclosed

How Much Can Be Inferred?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

36

“[An analyst at Target] was able to identify about 25 products that… allowed him to assign each shopper a ‘pregnancy prediction’ score. [H]e could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.” (NYTimes)

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

37

How Much Can Be Inferred?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

38

A New Kind of Redlining?

• “denying, or charging more for, services such as banking, insurance, access to health care, … supermarkets, or denying jobs … against a particular group of people” (Wikipedia)

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

39

Huge Risk of New Kind of Redlining

Map of Philadelphia showing redlining of lower income neighborhoods.Households and businesses in the red zones could not get mortgages or business loans. (Wikipedia)

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

40

Huge Risk of New Kind of Redlining

Johnson says his jaw dropped when he read one of the reasons American Express gave for lowering his credit limit:

"Other customers who have used their card at establishments where you recently shopped have a poor repayment history with American Express."

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

41

Potential Biases in the Data?

Pittsburgh’s Hill District

Was ground zero for Jazz musicians in 20th century

8,000 residents and 400 businesses, decimating the economic center of African-American Pittsburgh

Median Income (2009): $17,939

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

42

Potential Biases in the Data?

• Socioeconomic bias– Little data from lower socioeconomic areas

• Urban bias– Twitter, Flickr, Foursquare all more active

per capita in cities

• Age and gender bias– Most young, male, technology-savvy

• Is this a problem that will solve itself?– Or, can we address this in our models?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

43

Who Gains From this Data?

• Today, most data only flows one way– Mainly to advertisers (and NSA)– Also banks, insurance, credit cards

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

44

Who Gains From this Data?

• Can we design systems that share the value across more people?– People co-create data and gain value– Participatory design philosophy

• Can we also make people feel more invested in the cities they live in?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

45

Tiramisu Bus Tracker App

• People can see incoming bus data

• People can alsoshare info– Got on bus– #seats available

• New feature: people can discuss transit

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

46

Story 2: Google Glass

• Why there has been so much negative backlash about Google Glass?

• Are there lessons we can learn here about privacy and adoption of tech?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

47

Origins of Ubicomp

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

48

Popular Press Reaction

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

49

Why Such Negative Press?

• PARC knew privacy a big problem– But didn’t know what to do– So didn’t build any privacy protections

• Unclear value proposition– Focused on technical aspects– What benefits to end-users?

• Google Glass is replaying the past

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

50

Why Such Negative Press?

• Grudin’s law– Why does groupware fail?– “When those who benefit are not

those who do the work, then the technology is likely to fail, or at least be subverted”

• Privacy corollary– When those who bear the privacy risks do

not benefit in proportion to the perceived risks, the tech is likely to fail

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

51

But Expectations Can Change

• Initial perceptionsof mobile phone users– Rude, annoying– Casual chat, driving

• Six weeks later…– Had same behaviors

• People with more exposure to mobilephones better

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

52

But Expectations Can Change

• Famous 1890article definingprivacy as “theright to be let alone” was aboutphotography

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

53

• People objected to having phones in their homes because it “permitted intrusion… by solicitors, purveyors of inferior music, eavesdropping operators, and even wire-transmitted germs”

But Expectations Can Change

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

54

But Expectations Can Change

One resort felt the trend so heavily that it posted a notice: “PEOPLE ARE FORBIDDEN TO USE THEIR KODAKS ON THE BEACH.” Other locations were no safer. For a time, Kodak cameras were banned from the Washington Monument. The “Hartford Courant” sounded the alarm as well, declaring the “the sedate citizen can’t indulge in any hilariousness without the risk of being caught in the act and having his photograph passed around among his Sunday School children.”

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

55

And Framing Really Matters

• Ubicomp -> Invisible Computing– Talked less about the tech– More about how it could help people– More positive press

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

56

Privacy Hump

• A lot of our concerns about tech fall under umbrella term “privacy”– Value, fears, expectations,

what others around us thinkMany legitimate concerns

Many alarmist rants

“Right” way to deploy?

Value proposition?

Rules on proper use?

Things have settled down

Few fears materialized

People feel in control

People get tangible value

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

57

But How to Get Over Hump?

• Still a big gap in knowledge on best ways of mitigating privacy issues

• Prime example: Facebook news feed

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

58

“We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.”

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

59

• Privacy Placebos: Things that make people feel better about privacy, but doesn’t offer much.

• Other examples: Privacy policies, access logs• How ethical is it to use these approaches?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

60

Story 3: PrivacyGrade

• – every SMS, search, and phone#

• How can we help improve privacy? – Developers– End-users

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

61

What Do Developers Know about Privacy?

• Interviews with 13 app developers• Surveys with 228 app developers

• What tools? Knowledge? Incentives?• Points of leverage?

Balebako et al, The Privacy and Security Behaviors of Smartphone App Developers. USEC 2014.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

62

Summary of Findings

• App developers not evil• But need to monetize

– Ads and analytics heavily used– Turns out libraries big source of problems

• Also don’t know what to do– “I haven’t even read [our privacy policy]. I

mean, it’s just legal stuff that’s required, so I just put in there.”

– Often just ask other people around them– Low awareness of guidelines

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

63

Developers Need a Lot of Help!

• Good set best practices for security– SSL, hashing of passwords, randomization– Common attacks: SQL, XSS, CSRF

• What are best practices for privacy???

• Developers also have many problems– App functionality, bandwidth, power,

making money… privacy far down the list

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

64

What are your apps really doing?

Shares your location,gender, unique phone ID,phone# with advertisers

Uploads your entire contact list to their server(including phone #s)

Lin et al, Expectation and Purpose: Understanding User’s Mental Models of Mobile App Privacy thru Crowdsourcing. Ubicomp 2012.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

65

Many Smartphone Apps Have “Unusual” Permissions

Location Data

Unique device ID

Location Data

Network Access

Unique device ID

Location Data

Unique device ID

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

66

Expectations vs Reality

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

67

Privacy as Expectations

• Apply this same idea of mental models for privacy– Compare what people expect an app

to do vs what an app actually does– Emphasize the biggest gaps,

misconceptions that many people had

App Behavior(What an app actually does)

User Expectations(What people think

the app does)

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

68

10% users were surprised this app wrote contents to their SD card.

25% users were surprised this app sent their approximate location to dictionary.com for searching nearby words.

85% users were surprised this app sent their phone’s unique ID to mobile ads providers.

0% users were surprised this app could control their audio settings.

See all

90% users were surprised this app sent their precise location to mobile ads providers.

95% users were surprised this app sent their approximate location to mobile ads providers.

95% users were surprised this app sent their phone’s unique ID to mobile ads providers.

See all

0% users were surprised this app can control camera flashlight.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

69

Results for Location Data (N=20 per app, Expectations Condition)

App Comfort Level (-2 – 2)

Maps 1.52

GasBuddy 1.47

Weather Channel 1.45

Foursquare 0.95

TuneIn Radio 0.60

Evernote 0.15

Angry Birds -0.70

Brightest Flashlight Free -1.15

Toss It -1.2

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

70

How to Scale It Up?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

71

How PrivacyGrade Works

• Libraries can offer us insight into the purpose of a permission request– Location used by Google Maps library vs

Location used by advertising library

• Create a model of people’s concerns of permission by purpose

Lin et al, Modeling Users’ Mobile App Privacy Preferences: Restoring Usability in a Sea of Permission Settings. SOUPS 2014.

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

72

How PrivacyGrade Works

• We crowdsourced people’s expectations of a core set of 837 apps– Ex. “How comfortable are you with

Fruit Ninja using your location for ads?”– 20 people per question (as above)

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

73

How PrivacyGrade Works

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

74

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

75

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

76

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

77

Early Discussion of PrivacyGrade

• Can we use big data for privacy?• Many points of leverage in ecosystem

– Policy makers / third party advocates– Developers– OS / Hardware / app store– End-users

• Legal issues– Still under discussion with CMU lawyers– DMCA, liability issues

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

78

How far should we go in inferring behaviors?

How can we minimize biases in our data and

models?How can we help ensure

data doesn’t flow one way?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

79

What are better ways of going over the privacy

hump?

How acceptable is it to use privacy placebos?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

80

How can we helpdevelopers with privacy?

Can we use big data approaches for privacy?

What are ways of getting our work out there?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

81

How can we work with public policy makers

to create better guidelines

around privacy?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

82

How can we createa connected world we

would all want to live in?

ComputerHumanInteraction:MobilityPrivacySecurity

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

83

Thanks!

More info at cmuchimps.orgor email jasonh@cs.cmu.edu

Special thanks to:• Army Research Office• NSF• Alfred P. Sloan• NQ Mobile

• DARPA• Google• CMU Cylab

• Shah Amini• Justin Cranshaw• Afsaneh Doryab• Jialiu Lin

• Song Luan• Jun-Ki Min• Dan Tasse• Jason Wiese

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

84

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

85

Summary

• Smartphones and cloud computing

offer big opportunity to understand human behavior

• Also pose many large challenges, in privacy and ethics

• But I’m optimistic

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

86

86

Inte

nsity

feat

ures

Nu

mb

er

of

co-l

ocati

on

s

With

out inte

nsity

Full model

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

87

Using Location Data to Infer Friendships

• 2.8m location sightings of 489 users of Locaccino friend finder in Pittsburgh

• Place entropy for inferring social quality of a place– #unique people seen in a place– 0.0002 x 0.0002 lat/lon grid,

~30m x 30m

Cranshaw et al, Bridging the Gap Between Physical Location and Online Social Networks, Ubicomp 2010

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

88

Inferring Friendships

• 67 different machine learning features– Location diversity (and entropy)– Intensity and Duration– Specificity (TF-IDF)– Graph structure (overlap in friends)

• 92% accuracy in predicting friend/not– Location entropy improves performance

over shallow features like #co-locations

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

89

• Insert graph here• Describe entropy

Co-location data to infer friendshipUsing place entropy, accuracy of 92%Can also infer number of friends

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

90

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

91

• Insert graph here• Describe entropy

Could infer friend / not-friend based on co-location patterns and place entropy at 92%

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

92

Brooklyn Queens Expressway

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

93

Bezerkeley, CA

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

94

Expectations Condition

Why do you think Angry Birds uses your location data?

How comfortable are you with Angry Birds using your location data?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

95

Purpose Condition

Angry Birds uses your location data for advertising.

How comfortable are you with Angry Birds using your location data?

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

96

Few people read privacy policies• We want to install the app• Reading policies not part of main task• Complexity (the pain!!!)• Clear cost (time) for unclear benefit

©2

01

4 C

arn

eg

ie M

ello

n U

niv

ers

ity :

97

How PrivacyGrade Works (1)

• We operationalize privacy as people’s expectations of data use– Ex. Most people don’t expect Fruit Ninja

to use location data, but it actually does– Ex. Most people do expect Google Maps

to use location data

top related