Geo-Indistinguishability: Differential Privacy for Location Based Services Miguel Andres, Nicolas Bordenabe, Konstantinos Chatzikokolakis, Catuscia Palamidessi


Post on 13-Dec-2015


Page 1: Miguel Andres, Nicolas Bordenabe, Konstantinos Chatzikokolakis, Catuscia Palamidessi

Geo-Indistinguishability: Differential Privacy for Location Based Services

Miguel Andres, Nicolas Bordenabe, Konstantinos Chatzikokolakis, Catuscia Palamidessi

Page 2: Outline

Overview
Formal Definition
Mechanism for Geo-Indistinguishability
Enhancing Location Based Services
Case Study
Strengths and Weaknesses
Future Work

Page 3: Real-World Example

Suppose a tourist in Paris wishes to obtain information about restaurants near the Eiffel Tower
However, sending an exact location to the service raises many potential privacy issues

Page 4: Location Based Services (LBS)

Provide information based on a user’s location
Fine vs. Coarse Grained
◦ Coarse Grained: weather, location-based advertising, etc.
◦ Fine Grained: Point of Interest (POI) services involving exact location

Page 5: Problem

Smartphones equipped with GPS use LBSs
Untrusted LBSs could lead to a user privacy breach
◦ Discover home location
◦ Develop user profiles
No current way to use LBSs without revealing your location to a server

Page 6: Problem (Continued)

LBSs need user coordinates in order to provide their service
Trade-off: The user wants privacy, but also good results
The method of obtaining privacy must be computationally efficient enough to run on a smartphone

Page 7: Solution: Geo-Indistinguishability (GI)

Add controlled noise to the user’s location
Send the approximate location to the LBS
Achieves quasi-indistinguishability within a certain area
From the point of view of the LBS, the user is almost equally likely to be anywhere within a radius r of the Eiffel Tower
Generalization of the notion of differential privacy

Page 8: Solution (Continued)

User specified:
◦ Radius: r
◦ Level of discrepancy between two points: l
Tradeoff:
◦ As r gets larger, privacy level becomes greater but results become more inaccurate
Ratio of l to r is the level of privacy ε, that is, ε = l / r
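The relationship between l, r and ε on this slide can be made concrete with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the values l = ln(4) and r = 200 (metres) are an assumed example choice, not something stated on the slide.

```python
import math

def epsilon_from(l: float, r: float) -> float:
    """Privacy parameter per unit of distance: epsilon = l / r."""
    if l <= 0 or r <= 0:
        raise ValueError("l and r must be positive")
    return l / r

def level_at(epsilon: float, distance: float) -> float:
    """Privacy level enjoyed within `distance`: l = epsilon * distance."""
    return epsilon * distance

# Assumed example values: l = ln(4) within r = 200 metres.
eps = epsilon_from(math.log(4), 200.0)
print("epsilon per metre:", eps)
print("level within 400 m:", level_at(eps, 400.0))
```

For a fixed ε, doubling the distance doubles the level l, so the protection is strongest for points close to the true location and degrades gradually farther away.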

Page 9: Formal Definition

Since none of the existing approaches work well, the authors came up with GI
Say a user is located at some point x
The value that the user reports to the LBS is a point z
What constitutes a truly private value z?
◦ Must report a value within Paris or else it won’t be useful

Page 10: Formal Definition (Continued)

When the radius of interest is small, the user needs strong privacy (a small level l) in order to be well-protected
When the radius of interest is large, the privacy does not need to be as strong (a larger level l is acceptable) in order to be well-protected
Therefore the level l is proportional to the radius of the user’s choice: l = εr

Page 11: Formal Definition (Continued)

Page 12: Formal Definition (Continued)

GI is independent of any side information the attacker may have
ε is the level of privacy at one unit of distance: two points that are one unit apart are almost equally likely to produce any given reported value
The level of privacy depends on the distance between the two points
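As a rough illustration of a guarantee that depends on the distance between two points, the check below assumes a mechanism whose density is the two-dimensional Laplace density eps^2 / (2*pi) * exp(-eps * d(x, z)). It verifies numerically that two true locations x1 and x2 give likelihoods for the same reported point z that differ by at most a factor exp(eps * d(x1, x2)). The function names and the sample coordinates are illustrative and do not come from the slides.

```python
import math

def planar_laplace_pdf(eps, x, z):
    """Density of reporting z when the true location is x (assumed mechanism)."""
    return (eps ** 2) / (2 * math.pi) * math.exp(-eps * math.dist(x, z))

def satisfies_bound(eps, x1, x2, z):
    """Check pdf(z | x1) <= exp(eps * d(x1, x2)) * pdf(z | x2)."""
    lhs = planar_laplace_pdf(eps, x1, z)
    rhs = math.exp(eps * math.dist(x1, x2)) * planar_laplace_pdf(eps, x2, z)
    return lhs <= rhs * (1 + 1e-12)  # small tolerance for floating-point rounding

# Illustrative planar coordinates in metres.
x1, x2, z = (0.0, 0.0), (100.0, 50.0), (30.0, -20.0)
print(satisfies_bound(0.01, x1, x2, z))
print(satisfies_bound(0.01, x2, x1, z))
```

The bound follows from the triangle inequality: d(x2, z) is at most d(x1, z) + d(x1, x2), so the two exponents can differ by at most eps * d(x1, x2).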

Page 13: Comparison to Differential Privacy

Similar to DP, GI is independent of the attacker’s side information
Euclidean distance vs. Hamming distance
◦ Euclidean distance: spatial, straight-line distance between two points
◦ Hamming distance: distance between sets of data (the number of entries in which they differ)
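The two notions of distance can be contrasted in a short sketch. The helper names and the toy inputs are illustrative assumptions; the Hamming distance here counts the positions at which two equal-length datasets differ.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points in the plane."""
    return math.dist(p, q)

def hamming(db1, db2):
    """Number of positions at which two equal-length datasets differ."""
    if len(db1) != len(db2):
        raise ValueError("datasets must have the same length")
    return sum(a != b for a, b in zip(db1, db2))

print(euclidean((0.0, 0.0), (3.0, 4.0)))
print(hamming(["alice", "bob", "carol"], ["alice", "dave", "carol"]))
```

Differential privacy bounds how much the output can change between datasets at Hamming distance 1, while GI scales the bound with the Euclidean distance between two locations.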

Page 14: Mechanism for GI

Output perturbation using the Laplace distribution
Three-step process:
◦ Add Laplacian noise on a continuous space
◦ Discretize the result so that it is useful for real-world coordinates
◦ Truncate the result to a reasonable set of points

Page 15: Continuous Domain

Perturb the output with noise generated by the Laplace distribution
Results in a Probability Density Function (PDF)
Draw a random point according to the PDF
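Drawing a point in the continuous domain can be sketched as follows, assuming planar coordinates (for example metres on a local flat projection) and a two-dimensional Laplace density centred on the true location. The angle is uniform, and the radius is drawn from a Gamma distribution with shape 2 and scale 1/eps, which matches the radial density eps^2 * r * exp(-eps * r). This Gamma shortcut is a substitution chosen to keep the sketch within the standard library; it is not necessarily how the authors compute the radius. The function name is hypothetical.

```python
import math
import random

def sample_planar_laplace(eps, x, rng=random):
    """Return a noisy point around x drawn from a planar Laplace distribution."""
    theta = rng.uniform(0.0, 2.0 * math.pi)    # direction, uniform on the circle
    radius = rng.gammavariate(2.0, 1.0 / eps)  # distance, Gamma(shape=2, scale=1/eps)
    return (x[0] + radius * math.cos(theta),
            x[1] + radius * math.sin(theta))

rng = random.Random(42)
true_location = (0.0, 0.0)
print(sample_planar_laplace(0.01, true_location, rng))
```

With this distribution the expected distance between the true and the reported point is 2/eps, so a smaller eps means more noise.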

Page 16: Discrete Domain

Coordinates on a map are given as discrete points (latitude and longitude)
Map the random point chosen in the continuous domain to the nearest point in a discrete domain
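A minimal sketch of the discretization step, assuming the discrete domain is a regular square grid with spacing `step` in the same planar units as the coordinates. The grid shape and the function name are assumptions made for illustration.

```python
def snap_to_grid(point, step):
    """Map a continuous point to the nearest point of a regular grid."""
    if step <= 0:
        raise ValueError("step must be positive")
    return (round(point[0] / step) * step,
            round(point[1] / step) * step)

print(snap_to_grid((12.4, -7.6), 5.0))
```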

Page 17: Truncate

Eliminate unrealistic points that may be returned by the output perturbation function
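One possible way to realize the truncation step is to remap any point that falls outside an acceptable region to the closest point inside it. The sketch below assumes the region is an axis-aligned rectangle, for which clamping each coordinate gives the closest point. Both the rectangular region and the function name are illustrative assumptions.

```python
def truncate_to_box(point, x_min, x_max, y_min, y_max):
    """Remap a point to the closest point inside an axis-aligned rectangle."""
    x = min(max(point[0], x_min), x_max)
    y = min(max(point[1], y_min), y_max)
    return (x, y)

# A point that landed outside a hypothetical 1000 x 1000 region.
print(truncate_to_box((1200.0, -35.0), 0.0, 1000.0, 0.0, 1000.0))
```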

Page 18: Enhancing Location Based Services

The concept and mechanism of GI are most appropriately applied to LBSs on smartphones
LBSs use a simple client-server model to obtain information
The user sends the current location x and the server sends back POI info

Page 19: Mild vs. Highly Location-Sensitive LBS

An approximate location will be generated on the client and sent to the LBS
For mildly location-sensitive LBSs, results are approximately the same even if the reported location is relatively far away
For highly location-sensitive LBSs, results are only useful if the reported location lies within the specified radius

Page 20: Enhancing Location Based Services

For highly location-sensitive LBSs, an area of retrieval larger than the intended area of retrieval must be specified
Data sent to the server is only the approximate location and area of retrieval
Results from the LBS are filtered on the client to match the user’s original area of retrieval
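The enlarged area of retrieval and the client-side filter can be sketched as below, assuming planar Laplace noise, whose distance from the true location has CDF 1 - (1 + eps * r) * exp(-eps * r). The retrieval radius is the radius of interest plus the noise radius that is not exceeded with a chosen confidence, found here by bisection. All function names, the POI record layout and the numbers are hypothetical.

```python
import math

def radius_cdf(eps, r):
    """P(noise distance <= r) under the assumed planar Laplace noise."""
    return 1.0 - (1.0 + eps * r) * math.exp(-eps * r)

def noise_radius_for(eps, confidence):
    """Smallest r with radius_cdf(eps, r) >= confidence (0 < confidence < 1)."""
    lo, hi = 0.0, 1.0 / eps
    while radius_cdf(eps, hi) < confidence:  # grow the bracket until it contains r
        hi *= 2.0
    for _ in range(100):                     # plain bisection
        mid = (lo + hi) / 2.0
        if radius_cdf(eps, mid) < confidence:
            lo = mid
        else:
            hi = mid
    return hi

def retrieval_radius(eps, interest_radius, confidence):
    """Radius to request from the server so the area of interest is likely covered."""
    return interest_radius + noise_radius_for(eps, confidence)

def filter_results(pois, true_location, interest_radius):
    """Client-side filter: keep only POIs inside the original area of interest."""
    return [p for p in pois if math.dist(p["pos"], true_location) <= interest_radius]

eps = 0.01
print(retrieval_radius(eps, 300.0, 0.95))
pois = [{"name": "cafe", "pos": (100.0, 50.0)}, {"name": "museum", "pos": (900.0, 0.0)}]
print(filter_results(pois, (0.0, 0.0), 300.0))
```

The server only ever sees the noisy location and the enlarged radius; the true location is used solely on the client for the final filter.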

Page 21: Area of Interest

Page 22: Area of Interest vs. Reported Position

Page 23: Area of Interest vs. Area of Retrieval

Page 24: Potential Locations within Area of Retrieval

Page 25: Potential Locations within Area of Interest

Page 26: Case Study: U.S. Census Bureau

The Census Bureau dataset contains information in the form of (hBlock, wBlock)
◦ hBlock: where the worker lives
◦ wBlock: where the worker works
This data is publicly available in sanitized form
The authors’ goal is to sanitize information from the U.S. Census Bureau with GI and compare the result to the original sanitized data

Page 27: Case Study (Continued)

The GI algorithm takes each point of the census data and randomizes it according to specified values of l and r
Home-to-work commute distance was used as verification
As the value of l decreases for a given r, the sanitized results begin to differ more from the actual results
Therefore, as the privacy level increases, the accuracy of the data decreases
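The trend described on this slide can be reproduced on synthetic data. The sketch below does not use census data: it generates random home and work points on a hypothetical 20 km square, perturbs only the home point with planar Laplace noise (an assumption), and reports the mean absolute change in commute distance for several values of l at a fixed r. All names and numbers are illustrative.

```python
import math
import random

def sample_planar_laplace(eps, x, rng):
    """Noisy point around x: uniform angle, Gamma(2, 1/eps) distance."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.gammavariate(2.0, 1.0 / eps)
    return (x[0] + radius * math.cos(theta), x[1] + radius * math.sin(theta))

def mean_commute_error(l, r, n_records=2000, seed=0):
    """Mean |sanitized commute - true commute| over synthetic (home, work) pairs."""
    rng = random.Random(seed)
    eps = l / r
    total = 0.0
    for _ in range(n_records):
        home = (rng.uniform(0.0, 20000.0), rng.uniform(0.0, 20000.0))
        work = (rng.uniform(0.0, 20000.0), rng.uniform(0.0, 20000.0))
        noisy_home = sample_planar_laplace(eps, home, rng)
        total += abs(math.dist(noisy_home, work) - math.dist(home, work))
    return total / n_records

# Smaller l at fixed r means smaller eps, hence more noise.
for l in (math.log(4), math.log(2), math.log(1.4)):
    print(round(l, 3), round(mean_commute_error(l, r=200.0), 1))
```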

Page 28: Case Study (Continued)

Page 29: Strengths and Weaknesses

Strengths:
◦ Formalized definition of GI
◦ Allows users the ability to choose privacy levels
◦ Still provides useful data from LBS
Weaknesses:
◦ Paper does not present a software solution
◦ Current method of user privacy settings could be tedious
◦ Encryption of user preferences
◦ Case study was not a complete verification of the process

Page 30: Future Work

Adding a software solution
◦ Appears that this has been attempted through Location Guard
Client add-on to POI services