
Location Privacy

Mark van Cuijk and Barry Weymes

December 5, 2010

Abstract

As more individuals carry mobile devices that have the capability to determine their current location and communicate this information to a global network, location based services which provide the user with personalized information emerge. Several researchers have attempted to formalize the privacy impact of such services and the level of detailed knowledge they obtain about users. Several algorithms to cloak the exact location of individuals have been designed, each of them delivering a certain balance between privacy and usability. This paper presents the results of a small-scale interview performed by the authors, summarizes several methods to cloak location data and explains an algorithm for a privacy-aware location query processor.

1 Introduction

As location based services (LBS) proliferate and the ability to locate oneself on a map using GPS becomes commonplace, the possibility of abuse by others grows as well. LBS will have a significant role to play in the future. Uses such as finding the nearest shop in unfamiliar terrain or detecting traffic patterns are already being utilized more and more these days.

While these new technologies may have benefits, they also carry the possibility of violating one's privacy. For example, a simple location based query to find the nearest train to your location could be used to track one's movements if it is issued sufficiently frequently. Tracking the movements of a user may allow an adversary to deduce additional information, e.g. someone frequently visiting a cancer treatment facility probably has cancer; someone that unexpectedly visits a competitor of his employer might be shopping for a new job.

There are many privacy enhancing technologies (PET) that can help protect one's privacy. We will outline some of the different PETs that are being researched in the field of location privacy. The main idea in most of these PETs is to be anonymous within a group of others at your location, thereby preserving one's privacy. k-anonymity is the general term describing that one subject cannot be distinguished from k − 1 other subjects at the same location at a given time.

This paper starts with a small-scale interview study we conducted, in order to find out whether students of the Radboud University in Nijmegen are using LBS and whether they are concerned about their privacy when using such services. Section 3 introduces the concept of k-anonymity and demonstrates how this notion can be applied to location information. Several algorithms are discussed that combine location data from several users in order to present anonymized data to an LBS. For an LBS to use anonymized data, it needs modified search algorithms, one of which is presented in section 4: finding a candidate set of nearest objects, given a cloaked user location. Section 5 discusses location PETs in a social setting: they allow two users of the system to determine the physical distance between each other, without revealing exact location information in the first place. Finally, section 6 wraps up.

2 Interview

As part of this project we performed a small interview. This section lists the questions we posed, describes the subjects we interviewed and presents some results of the interview.

2.1 Questions

We wanted to find out whether people are already using location based services and to gauge their level of understanding of location privacy.

In question section 1, we ask the students about their usage of location based services to get them in the correct frame of mind. In question sections 2a and 2b we ask about the students' current usage to gauge their experiences with the technology. Question section 3 is designed to query the students' impressions of location privacy after determining their usage levels. Finally, in question section 4, we ask the students about scenarios that may change the way they think about the subject. By comparing the answers to the questions in sections 3 and 4, we want to determine whether the concerns students acknowledge agree with how they respond to specific scenarios.

Section 1 Do you use location based services on:

• Q1. Your smart phone
• Q2. In-car navigation system
• Q3. Laptop or desktop computer

Section 2a Have you ever used location based services to find:

• Q1. An ATM
• Q2. A gas station
• Q3. A specific shop
• Q4. Currently nearby friends

Section 2b Have you ever used:

• Q1. Trackr!1

• Q2. FourSquare2

• Q3. Twitter location tags3

• Q4. Photo camera with GPS

Section 3 Do you have location privacy concerns with:

• Q1. Google Maps
• Q2. Photo camera with GPS
• Q3. FourSquare

Section 4 Are you concerned about the following scenarios:

• Q1. “While on vacation in Cuba you make known to the public that you visited there. Some years later while entering the US a customs officer asks you what you were doing in Cuba.”

• Q2. “You forgot to tell your boyfriend or girlfriend that you're going to someone else's house to study. He or she finds out because of LBS and asks for an explanation.”

• Q3. “For your job, you visit a customer. On the way back to the office you make a lengthy detour. Your employer finds out about the detour and asks for an explanation.”

2.2 Results

We questioned 69 students of the Radboud University, with a slight bias towards students following a technically-oriented educational program. These results are therefore not representative of society, but we expect this target group to resemble higher educated individuals of the younger generation.

Figure 1: The answers from the students to the questions in sections 1, 2a and 2b. Each column represents a question.

Figure 2: This figure shows the answers to the questions of section 3. Each column represents a privacy concern.

1. trackr.nl allows users to track the location of a device and display the movements on a map in real-time.
2. FourSquare is a service that allows users to check in at a location in order to share this information in social networks.
3. Twitter is a micro-blogging service, allowing users to post 140 character messages to a stream. Certain Twitter clients can report the location of the user at the time the message is composed.


Figure 3: This figure shows the results of the questions of section 4.

2.3 Analysis

Section 1 shows that very few students currently use smart phones and in-car navigation systems, while all students use location based services – like Google Maps or Bing Maps – on their personal computer. We expect that smart phone usage is low because of the high prices of these devices compared to normal phones and the fact that our target group is students, who have a relatively low budget. Car usage is also low among students, which probably explains why in-car navigation systems are not much used in our target group.

Section 2a shows that not many students use location based services to find anything other than the location of certain shops. In an additional ad-hoc questionnaire, we discovered very few students are aware that they can find ATMs using an LBS like Google Maps. As with section 1, we note that many students don't own a car and therefore probably never need to find a gas station. Many students also don't use the functionality of being able to detect whether a friend is in close proximity to them. The most common use for location based services was finding a specific shop, e.g. finding the closest GAP shop: 64 of the students have used this type of service.

In section 2b, we asked the students which location based services they use, other than finding the location of a certain object. The result is that most students don't use or don't know about many of these services. Only one person indicated using Trackr! to plot his movements on an interactive map and only two people use a camera with GPS functionality, like the iPhone 4 that automatically adds the GPS coordinates to each picture. Some students responded that they don't know whether their photo camera records GPS coordinates.

In section 3, we ask about the students' concerns about location privacy. 85% of the students report that they are concerned about Google recording their locations when using Google Maps. Fewer people are concerned with the privacy issues of having a GPS-equipped camera and the website FourSquare.

Finally, in section 4, students report they are most concerned that their partner would use an LBS to discover their movements to a friend's house. Regarding questions 1 and 3, some noted that deciding to visit the US gives its officials the right to interrogate you as a foreigner and that an employer is entitled to request an explanation of what an employee does during work hours.

We expected students to show less concern on the questions in section 3 and expected students who learn about the potential dangers that researchers talk about – by hearing about the scenarios in section 4 – to raise more concern. However, the results show the contrary: in section 3 students claim they are rather concerned about their privacy, while the answers in section 4 show they accept government officials or employers questioning their behavior. Although this is probably more a psychological topic, it could be interesting to do further research given these results.

3 Location k-Anonymity

In 2002, Sweeney described a model that allows queries on a database containing personal information to obtain valuable results, while at the same time retaining a certain level of privacy. This section describes the original k-anonymity model by Sweeney, explains how Gruteser and Grunwald apply this model to location information and describes two algorithms that are built upon the k-anonymity concept: CliqueCloak by Gedik and Liu and a peer-to-peer cloaking algorithm by Chow, Mokbel and Liu.

3.1 k-Anonymity in general

To do business, organizations need to keep personal information. For example, a bank needs to maintain financial data on an individual and a hospital needs to maintain medical records. Often, this data is useful for research, which is frequently conducted by a third party, but laws and privacy policies force organizations to be careful with revealing such information about individuals.

In [1], Sweeney describes a common practice for organizations to remove explicit identifiers from person-specific data, like name, address and telephone number. The assumption made is that the lack of explicit identifiers makes the data secure with respect to the ability to identify persons. Sweeney shows that 87% of the population in the United States can likely be uniquely identified based only on {5-digit ZIP, gender, date of birth} [2]. Combining data from two sources – medical data made available to researchers [3] and a voters list that Sweeney purchased for twenty dollars [4] – it turned out to be possible to link diagnoses, procedures and medications to particularly named individuals.

[Figure 4, reproduced from [1], shows two overlapping sets of attributes. Medical Data: Ethnicity, Visit date, Diagnosis, Procedure, Medication, Total charge. Voter List: Name, Address, Date registered, Party affiliation, Date last voted. Their intersection: ZIP, Birth date, Sex.]

Figure 4: The left circle contains part of the information Sweeney obtained from [3]; the right circle contains the information Sweeney obtained from [4]. The three attributes in the intersection allow the two data sets to be successfully combined.

As it was shown that just removing explicit identifiers is not sufficient to properly anonymize a data set, Sweeney proposes a model that allows organizations to publish or sell information from their database, while making sure that the identity of individuals cannot be deduced by combining the information from this data set with any other source. To reach such a strict goal, the k-anonymity model is proposed to ensure that any release of information about a single individual cannot be distinguished from the information about at least k − 1 other individuals. A formal definition is given in [1]:

Attributes Let B(A1, . . . , An) be a table with a finite number of tuples. The finite set of attributes of B are A1, . . . , An.

Quasi-identifier Given a population of entities U, an entity-specific table T(A1, . . . , An), fc : U → T and fg : T → U′, where U ⊆ U′. A quasi-identifier of T, written QT, is a set of attributes {Ai, . . . , Aj} ⊆ {A1, . . . , An} where: ∃ pi ∈ U such that fg(fc(pi)[QT]) = pi.

k-anonymity Let RT(A1, . . . , An) be a table and QIRT be the quasi-identifier associated with it. RT is said to satisfy k-anonymity if and only if each sequence of values in RT[QIRT] appears with at least k occurrences in RT[QIRT].
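To make the k-anonymity definition concrete, the following minimal Python sketch (all names are our own, not from [1]) checks whether a table satisfies k-anonymity with respect to a given quasi-identifier:

```python
from collections import Counter

def satisfies_k_anonymity(table, quasi_identifier, k):
    """True if every combination of quasi-identifier values in `table`
    (a list of dict rows) occurs in at least k rows."""
    counts = Counter(
        tuple(row[attr] for attr in quasi_identifier) for row in table
    )
    return all(count >= k for count in counts.values())

# Example with {ZIP, birth date, sex} as the quasi-identifier:
rows = [
    {"zip": "6525", "birth": "1988-01-02", "sex": "M", "diagnosis": "A"},
    {"zip": "6525", "birth": "1988-01-02", "sex": "M", "diagnosis": "B"},
    {"zip": "6511", "birth": "1990-05-06", "sex": "F", "diagnosis": "C"},
]
print(satisfies_k_anonymity(rows, ["zip", "birth", "sex"], 2))
# False: the third row's quasi-identifier combination is unique
```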

3.2 Applied to location data

Gruteser and Grunwald [5] look at a way to apply Sweeney's method [1] of anonymizing data in general to the field of location data. Their algorithm seeks to provide anonymity regardless of the population density by setting a minimum number of possible nodes that any communication could have originated from. The number used to describe this is called kmin.

The system model contains an anonymity server that receives location information from nodes that communicate with it. This anonymity server acts as a middle man for all services that the nodes wish to use. Once a node requests a service, the anonymity server perturbs the location information and forwards the message to the external service.

The location of the node can be represented by the tuple ([x1, x2], [y1, y2], [t1, t2]), where [x1, x2] and [y1, y2] describe a two-dimensional area that the subject was present in and [t1, t2] represents the time range during which the subject was in the area. This tuple can be considered anonymous when it also describes the location of other subjects at the same time.

The key approach of this algorithm is that the anonymity server monitors the locations of its nodes. A node's request for a service will be delayed until the required kmin nodes have visited the area; only then is the request anonymous. The time t2 is set to the current time and t1 is set to the time of the request minus a random cloaking value.

The location tuple can be considered k-anonymous once an adversary cannot uniquely identify a subject through observation, since the tuple also matches k − 1 other subjects.
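The delay-based part of this scheme can be sketched as follows. This is a simplified illustration under our own assumptions (the spatial subdivision of [5] is omitted and all names are hypothetical), not the authors' exact algorithm:

```python
import random
import time

def cloak_request(area, visitors_in_area, request_time, k_min, max_cloak=60.0):
    """Return an anonymized tuple ([x1,x2], [y1,y2], [t1,t2]) for a request
    issued at `request_time` from somewhere in `area`, or None while the
    request must still be delayed. `visitors_in_area` is the number of
    distinct subjects the anonymity server has observed in `area`."""
    if visitors_in_area < k_min:
        return None  # keep delaying until k_min subjects have visited the area
    (x1, x2), (y1, y2) = area
    t2 = time.time()                                    # t2: the current time
    t1 = request_time - random.uniform(0.0, max_cloak)  # t1: request time minus a random cloaking value
    return ((x1, x2), (y1, y2), (t1, t2))
```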

3.2.1 Analysis

However, in some circumstances an adversary can reveal a node's location because of a weakness in the algorithm.

Consider the following location tuples:

1. ([0, 1], [0, 1], [t1, t2])

2. ([1, 2], [0, 1], [t1, t2])

3. ([0, 1], [1, 2], [t1, t2])

4. ([0, 2], [0, 2], [t1, t2])


Figure 5: This figure displays a weakness in the algorithm, obtained from [5]. If node 4 requests a service it can be singled out, because the algorithm must widen the quadrant if kmin is greater than 1 – which it must be to preserve privacy.

All four tuples share the same time attributes. Let us take kmin to be 3, for example. In figure 5, node 4 can be singled out in communications, as all other quadrants have sufficient possible nodes to prevent a privacy violation. If each numbered node requests a service at the same time, node 4 could be identified. This simple example shows that the privacy of the subjects cannot be guaranteed.

3.3 CliqueCloak

While the algorithm by Gruteser and Grunwald (section 3.2) allows the client to specify the acceptable spatial and temporal resolution for each individual message, the minimum anonymity set size kmin is a global parameter and is therefore the same for all messages.

In an attempt to overcome this limitation, Gedik and Liu designed the CliqueCloak algorithm, which can handle messages that each have individual spatial and temporal resolution requirements, and also individual privacy constraints, by setting a kmin on each message [6].

3.3.1 Concept of CliqueCloak

The algorithm takes as input a set of messages ms ∈ S : 〈uid, rno, {t, x, y}, k, {dt, dx, dy}, C〉 from mobile nodes and transforms these into a set of transformed messages mt ∈ T : 〈uid, rno, {X : [xs, xe], Y : [ys, ye], I : [ts, te]}, C〉. The message with sequence number rno encodes the location (x, y) of user uid at timestamp t, the requested minimum anonymity set size k, the required spatio-temporal resolution (dx, dy, dt) and a user message C. The box defined by the ranges [x − dx, x + dx], [y − dy, y + dy] and [t − dt, t + dt] is called the anonymity constraint box. In the output message, the spatio-temporal information is transformed into a cloaking box defined by a range for both spatial dimensions X and Y and also for the temporal dimension I.

Each output message mt relates to exactly one input message ms; each input message ms relates to at most one output message mt. This means that the algorithm cannot “come up” with new messages, but is able to drop an input message if it is unable to construct an output message with the requested spatio-temporal resolution and privacy requirements.

Gedik and Liu introduce a couple of properties that must hold for all messages:

Spatial and Temporal Containment state that the cloaking box contains the exact location of the user: ms.x ∈ mt.X, ms.y ∈ mt.Y and ms.t ∈ mt.I

Spatial and Temporal Resolution state that the anonymity constraint box contains the cloaking box entirely: mt.X ⊂ [ms.x − ms.dx, ms.x + ms.dx], mt.Y ⊂ [ms.y − ms.dy, ms.y + ms.dy] and mt.I ⊂ [ms.t − ms.dt, ms.t + ms.dt]

Content Preservation states that the user message is passed on unmodified: ms.C = mt.C

Location k-anonymity states that there are at least k − 1 other messages, each from a different node, that are mapped to the same spatio-temporal cloaking box: ∃ T′ ⊆ T, such that mt ∈ T′, |T′| ≥ ms.k, ∀ {mt,i, mt,j} ∈ T′ : mt,i.uid ≠ mt,j.uid and ∀ mt,i ∈ T′ : mt.X = mt,i.X ∧ mt.Y = mt,i.Y ∧ mt.I = mt,i.I

3.3.2 Algorithm description

The idea of the CliqueCloak algorithm is to define an undirected graph, such that each vertex represents a message ms ∈ S and an edge between two vertices ms,1 and ms,2 means that the messages ms,1 and ms,2 can be transformed into the same cloaking box while respecting all requested resolution and privacy constraints.

In a graph G(S, E), S being the set of vertices and E the set of edges, an edge e = (ms,i, ms,j) ∈ E exists if and only if P(ms,i) ∈ Bcn(ms,j), P(ms,j) ∈ Bcn(ms,i) and ms,i.uid ≠ ms,j.uid, where P(ms) is the (x, y, t) coordinate of message ms and Bcn(ms) denotes the anonymity constraint box of message ms. This means that the two messages must be sent by two different users and that both locations must lie within each other's anonymity constraint box.

Having this graph, it is straightforward to translate the problem of finding a set of messages that can be transformed to the same cloaking box into the problem of finding cliques in the graph G(S, E) of a size that is at least equal to the largest kmin of the individual messages.

Each time a new message is received, the algorithm creates a vertex in the graph, determines which messages can share a cloaking box with the new message and creates the corresponding edges. Since only vertices that are within the constraint box of the new vertex can form edges, the CliqueCloak algorithm can be optimized to collect all vertices that are within the constraint box and then test the reverse requirement on each of the vertices in this set. To implement this efficiently, the vertices can be indexed in a multi-dimensional index. Pseudo-code for this algorithm is given in Algorithm 1 of [6].
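A minimal sketch of the edge test and a brute-force local clique search is given below. This is our own simplification under stated assumptions (a plain list instead of a multi-dimensional index, exhaustive subset search instead of the heuristics of [6]); all names are hypothetical:

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Msg:
    uid: int    # user identifier
    x: float    # exact location ...
    y: float
    t: float    # ... and timestamp
    k: int      # requested minimum anonymity set size
    dx: float   # spatial and temporal
    dy: float   # resolution constraints
    dt: float

def in_constraint_box(p, m):
    """Is the (x, y, t) point of message p inside the anonymity
    constraint box of message m?"""
    return (abs(p.x - m.x) <= m.dx and abs(p.y - m.y) <= m.dy
            and abs(p.t - m.t) <= m.dt)

def is_edge(a, b):
    """Edge condition: different users, mutually contained in each
    other's anonymity constraint box."""
    return (a.uid != b.uid
            and in_constraint_box(a, b) and in_constraint_box(b, a))

def find_clique(new, pending):
    """Search for a clique containing `new` whose size is at least the
    largest k among its members. Brute force over the neighbors of
    `new`, acceptable only for small neighbor sets."""
    nbrs = [m for m in pending if is_edge(new, m)]
    for size in range(len(nbrs), 0, -1):
        for subset in combinations(nbrs, size):
            group = [new, *subset]
            if (len(group) >= max(m.k for m in group)
                    and all(is_edge(a, b) for a, b in combinations(group, 2))):
                return group
    return None
```

Once such a clique is found, its messages are assigned a shared cloaking box covering all their locations and removed from the pending set; messages whose deadline passes without a clique are dropped.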

3.3.3 Analysis

Several methods to form cliques have been tested, each giving different results. The best performing method is called nbr-k and in this summarized overview we only look at the results of this method. We mention some of the interesting results; a more comprehensive overview of the results of these tests can be found in section 6 of [6], including a description of the test methodology.

In general, messages with a lower value for k can be anonymized more easily than messages with a higher value for k. For the nbr-k method, the success rate decreases from around 80% for messages with k = 2 to around 60% for messages with k = 5. Success means that a message has not been dropped before it expired.

One of the properties of the CliqueCloak algorithm is that each message can have a different value for k. This means that messages with different k can be combined in a single cloaking box. A measure called relative anonymity has been introduced for an individual message, which equals the ratio between the number of messages that are in the cloaking box and the value of k for this message, e.g. a relative anonymity value of 2 means that the number of messages in the cloaking box is twice the value of k. The location k-anonymity property ensures that the relative anonymity is at least 1 for each message.
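In the notation of section 3.3.1, this measure can be written compactly (our own restatement, with T′ the set of messages sharing the cloaking box of mt):

```latex
r(m_s) = \frac{|T'|}{m_s.k} \geq 1
```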

For the nbr-k method, the average relative anonymity ranges from 1.7 for messages with k = 2 to 1.2 for messages with k = 4. For messages with k = 5, the relative anonymity is always 1, since the algorithm never searches for cliques larger than what is required by the highest k value in a subset and the tests didn't include messages with values for k above 5.

3.4 Peer-to-peer

In the previous algorithms, an intermediate server – or proxy – must be used to anonymize the location of users. In [7], Chow et al. propose a peer-to-peer spatial cloaking algorithm that doesn't require such a server to be in place. The idea is that before sending location information to the service provider, the requester forms a group of nodes and masks its location within this group. It uses the group to request information, thereby moving the responsibility of cloaking its location away from the usual central server to the nodes themselves.

Figure 6: An example of how peer-to-peer spatial cloaking works

Let's start with a simple example. In figure 6, we can see 5 cars on the road. A wishes to find out where the nearest gas station is. A looks for other peers and finds B, C, D and E. A will then cloak its exact location by randomly selecting a peer, e.g. D, to communicate for the group; D is called the agent. This agent takes A's request and forwards it to the location-based database servers. Once the information is computed, the agent D receives the information and forwards the reply to A.

This algorithm has two modes of use: on-demand and proactive. In the on-demand mode, the cloaking algorithm is only used when information is required by one of the nodes, while in proactive mode the nodes periodically seek other nodes to form a group with, just in case they or others wish to request information.

The system model is similar to the other algorithms above. A node or mobile client sends its location and its query to the location based database servers via the base station. Each node has its own privacy profile, with which it can set its desired level of privacy. To do this, two variables are set: k and Amin. k describes the level of k-anonymity the node desires; a high k means a high privacy requirement. Amin is the minimum resolution of the cloaked spatial region. The higher Amin, the higher the chance of finding an appropriate group, but the lower the quality of the returned answer.

Each node has two communication devices attached to it. One is used to communicate with other peers and the other is used to communicate with the location based database servers. The method of communication can be varied, for example using Bluetooth, GSM or wireless LAN.


Figure 7: System model
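A sketch of on-demand group formation might look as follows. This is our own simplified reading of [7]; the peer discovery mechanism, the coordinate attributes and the fixed expansion step are all assumptions:

```python
import random

def form_group_and_cloak(requester, discover_peers, k, a_min, step=10.0):
    """On-demand peer-to-peer cloaking sketch: gather k-1 peers, compute a
    bounding region of at least area a_min covering the whole group, and
    pick a random agent to forward the query to the database servers."""
    peers = discover_peers(requester, needed=k - 1)  # e.g. via Bluetooth or WLAN
    if len(peers) < k - 1:
        return None  # not enough peers: the privacy requirement cannot be met
    group = [requester, *peers]
    xs = [p.x for p in group]
    ys = [p.y for p in group]
    x1, x2, y1, y2 = min(xs), max(xs), min(ys), max(ys)
    # Grow the bounding box symmetrically until it reaches the minimum area.
    while (x2 - x1) * (y2 - y1) < a_min:
        x1 -= step; x2 += step; y1 -= step; y2 += step
    # The agent queries the LBS on the requester's behalf.
    agent = random.choice(peers) if peers else requester
    return agent, (x1, x2, y1, y2)
```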

3.4.1 Analysis

In experimental studies it has been shown that proactive mode outperforms the on-demand mode in terms of response time, since a group has already been formed at the moment the user initiates a request. However, the overhead of using proactive mode when the system is not used means that there is a higher idle power consumption, which may not be acceptable for mobile devices.

4 Query processing

In section 3, several techniques were presented that can be used to hide the identity of individuals from a privacy-aware LBS and – more importantly for the topic of this section – to cloak the exact location of an individual. This means that the query processor – the component of an LBS that has the task of processing queries presented by clients – has to cope with location uncertainty.

There are several kinds of queries that can be performed on location data. In [8], Mokbel, Chow and Aref present a method for processing location queries in a privacy-aware setting, which can be implemented in conjunction with existing geographic information systems.

The kind of query that is being processed by the explored algorithm is of the kind “Please, find me the location of the nearest object of type T,” where T can be an object with a fixed location – like an ATM or a gas station – or a moving object – like a known friend or a police officer. It is even possible that the location of this object is stored as a privacy-aware region, as described in section 4.2. This kind of query is called a nearest-neighbor query.

Since the query processor is unaware of the exact location of the client, it can intuitively be seen that it might not always be possible to return a single correct answer. Given cloaked location data for the client, section 4.1 introduces an algorithm that returns a minimal set of candidate results among objects with a known exact location. Section 4.2 extends the algorithm to cope with locating the nearest neighbor among objects whose locations are also cloaked.

4.1 Private queries over public data

Section 5.1 of [8] describes an algorithm that can handle queries in this situation. When dealing with public data for objects of type T, the query processor has access to the full list of objects of that specific type, which are probably stored in a database: t ∈ objT : 〈id, {x, y}〉, identified by an object identifier id and a position (x, y).

Besides this list, the query processor obtains as input the (xl, xr, yu, yl) bounds of a rectangle that contains the position of the client that performs the request. Every point in this rectangle has an equal probability of being the location of the client. The output of the algorithm is the set of objects that can be the nearest neighbor for any client located inside the input rectangle. Formally, this can be described as the set of all objects t ∈ objT for which ∃ (x, y) with xl ≤ x ≤ xr ∧ yl ≤ y ≤ yu such that ∄ t′ ∈ objT : D(P(x, y), t′) < D(P(x, y), t), with D(p1, p2) being the distance between points p1 and p2.

The algorithm consists of four steps to obtain the set of output objects:

Step 1: filter objects The cloaked region that contains the client is rectangle-shaped and therefore has four corners for which the algorithm knows the exact locations: v1 : (xr, yl), v2 : (xl, yl), v3 : (xl, yu) and v4 : (xr, yu). For each corner, the algorithm selects the database object of type T that is closest to the corner. A set of at most four filter objects is constructed in this way.


Figure 8: Result of step 1. T12, T13, T16 and T17 are the selected filter objects, as they are closest to the corner points of the cloaked region.

Step 2: middle points For each pair of corners that share a border of the cloaked region, the algorithm picks the intersection of the perpendicular bisector of the associated filter objects with the border between the two region corners. If both corners have the same filter object, the perpendicular bisector does not exist. This isn't a problem, as the single filter object will be the closest object for each point on the border.


Figure 9: Result of step 2. m12, m13, m34 and m24 are the selected middle points.

Step 3: extended area For each border, the algorithm determines the distance from both corners to their associated filter objects and the distance from the border's middle point to the filter object that is closest to that point. In case the border's corners share a single filter object, the last value does not exist and is therefore not taken into account. Among these two or three distances, the algorithm selects the maximum and extends the search area by moving the border outwards over this distance, forming a rectangle-shaped extended area AEXT.


Figure 10: Result of step 3. The new border is the border of the extended area. The arrows indicate the position of the point that contributed to the distance over which each border was moved outwards.

Step 4: candidate list Given the extended area AEXT – which by the proof in section 5.1.2 of [8] contains the nearest neighbor – the query processor selects all points that are contained in this area and returns that set to the client.


Figure 11: Result of step 4. All points that are within the extended area are selected and returned to the client. Since the client knows its exact location, it can select the nearest neighbor from this set.
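The four steps over public data can be condensed into the following sketch. It is our own simplification (a linear scan stands in for the server's nearest-neighbor and range queries, and all names are hypothetical), not the paper's Algorithm 2 verbatim:

```python
from math import dist

def bisector_point(a, b, ti, tj):
    """Intersection of the perpendicular bisector of filter objects ti and tj
    with the border segment from corner a to corner b."""
    d = (b[0] - a[0], b[1] - a[1])
    e = (ti[0] - tj[0], ti[1] - tj[1])
    c = (ti[0]**2 + ti[1]**2 - tj[0]**2 - tj[1]**2) / 2.0
    s = (c - (a[0] * e[0] + a[1] * e[1])) / (d[0] * e[0] + d[1] * e[1])
    s = min(max(s, 0.0), 1.0)  # clamp defensively onto the segment
    return (a[0] + s * d[0], a[1] + s * d[1])

def nn_candidates(xl, xr, yl, yu, targets):
    """Candidate nearest neighbors for a client anywhere in the cloaked
    rectangle [xl, xr] x [yl, yu], over public target points."""
    v = [(xr, yl), (xl, yl), (xl, yu), (xr, yu)]              # corners v1..v4
    # Step 1: filter objects -- nearest target to each corner.
    f = [min(targets, key=lambda t: dist(c, t)) for c in v]
    # Steps 2 and 3: middle point and maximum distance per border.
    maxd = []
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:             # bottom, left, top, right
        ds = [dist(v[i], f[i]), dist(v[j], f[j])]
        if f[i] != f[j]:
            m = bisector_point(v[i], v[j], f[i], f[j])
            ds.append(dist(m, f[i]))                          # equidistant to f[j]
        maxd.append(max(ds))
    # Move each border outwards by its maxd to form A_EXT.
    exl, eyl = xl - maxd[1], yl - maxd[0]
    exr, eyu = xr + maxd[3], yu + maxd[2]
    # Step 4: a range query over A_EXT yields the candidate list.
    return [t for t in targets if exl <= t[0] <= exr and eyl <= t[1] <= eyu]
```

The client then computes the exact nearest neighbor locally, since it does know its own position.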

4.2 Private queries over private data

One of the more interesting things that Mokbel, Chow and Aref present in [8] is an algorithm that allows not only the client that issues a query to present privacy-aware location data, but also allows the locations of target objects in the database to be cloaked. Instead of a single point that represents the exact location of a target object, its location is stored as a rectangle-shaped area, with any point in that area having an equal probability of being the exact location of the object.

The algorithm for processing queries over private data follows the same steps as the one described in section 4.1 and is, in fact, a generalized version. In each step, one of the corners of a cloaked region is used as the input for the algorithm:

Step 1: filter objects When determining the nearest target object T for each of the corners cu,1 to cu,4 of the cloaked location of user u, the algorithm uses the corner cT,i of T that has the largest distance to cu,i.

Step 2: middle points The method to define a middle point mi,j on the edge between corners cu,i and cu,j is the same as in the original algorithm, with the corners of the target object regions selected such that the length of the line Li,j connecting the two points is maximal.


Figure 12: Result of steps 1 and 2. t1 to t4 are the location regions of the selected filter objects. m12, m13, m34 and m24 are the selected middle points.

Step 3: extended area To determine the distance that is used to move the borders to create the extended area, the maximum of two or three distances is used. For the distance between a corner cu,i and its filter object, the largest distance between the corner cu,i and any of the corners cT,j is used. The third distance used is the one between the middle point mi,j and any of the endpoints of line Li,j – both giving the same distance, as point mi,j lies on the perpendicular bisector between these points.


Figure 13: Result of step 3. The bold border is the border of the extended area.

Step 4: candidate list Given the extended area, all target objects T that have a cloaked area that overlaps with the extended area have a probability of more than 0 of being the nearest neighbor and can be returned to the user u as part of the candidate list.
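Putting the four steps together, the following sketch shows how a query processor could compute such a candidate list for a cloaked query over cloaked data. This is our own rough reconstruction of the steps as we understand them from [8], not code from that paper: rectangles are axis-aligned (xmin, ymin, xmax, ymax) tuples, all helper names are ours, and each border is simply moved outward by the largest of the per-edge distances from step 3.

from itertools import product
from math import dist

def corners(rect):
    # the four corners of an axis-aligned rectangle (xmin, ymin, xmax, ymax)
    return [(x, y) for x, y in product((rect[0], rect[2]), (rect[1], rect[3]))]

def far_corner(rect, p):
    # furthest corner of a cloaked rectangle from point p (pessimistic distance)
    return max(corners(rect), key=lambda c: dist(c, p))

def candidate_list(A, targets):
    vs = [(A[0], A[1]), (A[2], A[1]), (A[2], A[3]), (A[0], A[3])]
    # Step 1: per corner of A, pick the filter object minimizing the distance
    # to its own furthest corner.
    filt = [min(targets, key=lambda t: dist(v, far_corner(t, v))) for v in vs]
    maxd = []
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:    # bottom, right, top, left
        vi, vj, ti, tj = vs[i], vs[j], filt[i], filt[j]
        ei, ej = far_corner(ti, vj), far_corner(tj, vi)  # endpoints of L_ij
        di = dist(vi, far_corner(ti, vi))
        dj = dist(vj, far_corner(tj, vj))
        dm = 0.0
        if ei != ej:
            # Step 2: m_ij is the point on edge v_i v_j equidistant from e_i
            # and e_j; the difference of squared distances is affine along the
            # edge, so the root has a closed form.
            f0 = dist(vi, ei)**2 - dist(vi, ej)**2
            f1 = dist(vj, ei)**2 - dist(vj, ej)**2
            if f0 != f1:
                lam = min(max(f0 / (f0 - f1), 0.0), 1.0)
                m = (vi[0] + lam * (vj[0] - vi[0]),
                     vi[1] + lam * (vj[1] - vi[1]))
                dm = max(dist(m, ei), dist(m, ej))
        # Step 3: each border moves outward by the largest of the distances.
        maxd.append(max(di, dj, dm))
    ext = (A[0] - maxd[3], A[1] - maxd[0], A[2] + maxd[1], A[3] + maxd[2])
    # Step 4: every target whose cloaked area overlaps the extended area.
    return [t for t in targets if t[0] <= ext[2] and t[2] >= ext[0]
            and t[1] <= ext[3] and t[3] >= ext[1]]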

4.3 Correctness

For each of the algorithms described in sections 4.1 and 4.2, two theorems are given in [8]:

Completeness Given a cloaked area A for a user u located anywhere within A, the algorithm returns a candidate list that includes the exact nearest target to u.

Minimum number of results Given a cloaked area A for a user u and a set of filter target objects t1 to t4, the algorithm issues the minimum possible range query to get the candidate list.

These two theorems can be written more formally as follows, where it should be noted that the phrasing “minimum possible range query” can be interpreted as requiring the AEXT area to be rectangle-shaped, such that it can be expressed using bounds (xl, xr, yu, yl):

Completeness ∀(x, y) ∈ A : (∃t ∈ AEXT : (∄t′ ∈ objT : D(t′, P(x, y)) < D(t, P(x, y))))

Minimum number of results ∀t ∈ AEXT : (∃(x, y) ∈ A : (∄t′ ∈ objT : D(t′, P(x, y)) < D(t, P(x, y))))

Although [8] contains a proof for both theorems, which are correct under the interpreted requirement on the shape of AEXT, we have determined that the proof for the minimum result theorem is incorrect if no restriction on the shape of AEXT applies. We can even provide a counter-example using the figures in section 4.1 to demonstrate that the theorem itself isn't valid without this restriction. Note that this restriction hasn't explicitly been stated in [8]. Although this counter-example invalidates the theorem, its impact is only small and the presented algorithm remains very useful.

The example figures used in section 4.1 already contain a target object T21 that can be used as a counter-example. The object lies within the extended area (T21 ∈ AEXT), while it can never be the nearest object, since for all points (x, y) between v1 and v2, and therefore all points (x, y) in the dark area A, the distance to T16 is smaller than the distance to T21: ∀(x, y) ∈ A : D(P(x, y), T16) < D(P(x, y), T21).
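Such a dominance claim is easy to check mechanically: the difference of squared distances |p − t|² − |p − t′|² is affine in p, so one target is strictly closer than another on the whole rectangle A exactly when it is strictly closer at all four corners. The coordinates in the sketch below are hypothetical, chosen only to mimic the configuration in the figures.

from math import dist

def never_nearest(A, t_prime, t):
    """True if target t is strictly closer than t_prime everywhere in rect A."""
    cs = [(A[0], A[1]), (A[2], A[1]), (A[2], A[3]), (A[0], A[3])]
    # |p - t|^2 - |p - t'|^2 is affine in p, so checking the corners is exact.
    return all(dist(c, t) < dist(c, t_prime) for c in cs)

A = (0.0, 0.0, 2.0, 1.0)            # cloaked user area (hypothetical)
t16, t21 = (1.0, 1.5), (1.0, 9.0)   # hypothetical stand-ins for T16 and T21
print(never_nearest(A, t21, t16))   # True: t21 can lie in AEXT yet never win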

5 Social location privacy

The privacy enhancing technologies presented in the previous sections describe situations where a user wants to query a database containing location information about objects of a specific kind, i.e. the nearest neighbor query. In all cases – except for the peer-to-peer solution described in section 3.4 – there is an intermediate system that knows the exact location of users and has the task of transforming messages in such a way that they can be relayed to an LBS provider without revealing user identification and exact location information.

Zhong, et al., take a different approach in [9], where they describe three protocols that are capable of conditionally revealing the exact location of two users of a system to each other, without revealing this information to a third party, like a social network provider. The three protocols each take a different approach and therefore result in a different kind of information being revealed, but have the common property that specific location information is only revealed when the distance between both users is within a specified range.

5.1 The Louis Protocol

The Louis protocol is described in section 4 of [9] and consists of two phases. In the first phase, Alice and Bob are able to learn whether the distance between them is within a certain range, while an optional run of phase two allows them to reveal their exact locations. The protocol makes use of a third party, which will not learn the locations of either Alice or Bob.


Figure 14: System model of the Louis protocol. Alice and Bob want to share location data, while Trent is a third party that won't learn anything about the location of Alice and Bob. The solid arrows indicate protocol messages for the first phase, the dashed arrows indicate messages for the second phase.

Define (x, y) as the location of Alice and (u, v) as the location of Bob. The distance between both individuals is then given by

√((x − u)² + (y − v)²).

Testing whether this is below a certain maximum range r can be done efficiently by checking whether d = (x − u)² + (y − v)² − r² < 0. The goal of the first phase of the protocol is to perform this check without revealing both (x, y) and (u, v) to a single party.

In the protocol description, PA(·) is the Paillier encryption function [10] using the public key of Alice, RT(·) is a public-key encryption function (say, RSA) using the public key of Trent, H(·) is a cryptographic hash function, sigA(m) is a signature by Alice on message m and similarly sigT(m) is a signature by Trent. Note that the Paillier encryption function is non-deterministic and that it has the following homomorphic property: PA(m1 + m2) = PA(m1) · PA(m2).

The messages below form the protocol; the first four messages form phase one and the two remaining messages form phase two. Both the connection between Alice and Bob and the one between Alice and Trent are assumed to be confidential.

A → B PA(x² + y²), PA(2x), PA(2y), r, H(x‖y‖sA)

B → A PA(d + k), RT(k), H(u‖v‖sB), H(k)

A → T d + k, RT(k), sigA(d + k), sigA(RT(k))

T → A answer, sigT(answer ‖ sigA(d + k) ‖ sigA(RT(k)))

A → B answer, d + k, sigA(d + k), sigA(RT(k)), sigT(answer ‖ sigA(d + k) ‖ sigA(RT(k))), x, y, sA

B → A u, v, sB, k

sA is a random salt chosen by Alice and sB is one chosen by Bob; r is the desired maximum radius chosen by Alice; k is a random value selected by Bob. After receiving the first message, Bob can decide to abort the protocol if he doesn't like the value of r. Note that Bob is able to compute PA(d + k) using the homomorphic property of the Paillier encryption function, using the values PA(x² + y²), PA(2x) and PA(2y) he received from Alice and the values PA(u² + v²), PA(2u), PA(2v), PA(k) and PA(r²) he computes himself:

PA(d + k)
= PA((x − u)² + (y − v)² − r² + k)
= PA(x² + u² − 2xu + y² + v² − 2yv − r² + k)
= (PA(x² + y²) · PA(u² + v²) · PA(k)) / ((PA(2x))^u · (PA(2y))^v · PA(r²))

After receiving the third message, Trent can decrypt RT(k) and subtract it from d + k to obtain d. If d < 0, Trent sets the answer to ‘YES’; otherwise, Trent sets the answer to ‘NO’. After receiving the fourth message, Alice knows whether the distance to Bob is smaller than r and can decide whether or not to proceed with the second phase of the protocol.
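To make the homomorphic bookkeeping of the first phase concrete, here is a minimal and deliberately insecure Paillier sketch (tiny primes, no padding, no signatures or hashes); the function names, primes and example coordinates are our own assumptions, not taken from [9] or [10].

import math, random

def keygen(p=1789, q=1867):             # toy primes; never use in practice
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                           # standard simple choice of generator
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)

def enc(pk, m):
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:          # r must be invertible modulo n
        r = random.randrange(1, n)
    return pow(g, m % n, n * n) * pow(r, n, n * n) % (n * n)

def dec(sk, c):
    lam, mu, n = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

pk, sk = keygen()
n2 = pk[0] * pk[0]
x, y, u, v, r, k = 5, 9, 7, 6, 4, 1234  # Alice at (5, 9), Bob at (7, 6)

# Alice -> Bob: PA(x^2 + y^2), PA(2x), PA(2y)
c_xy, c_2x, c_2y = enc(pk, x*x + y*y), enc(pk, 2*x), enc(pk, 2*y)

# Bob builds PA(d + k) exactly as in the derivation above; the division
# becomes a modular inverse of the denominator ciphertext.
num = c_xy * enc(pk, u*u + v*v) % n2 * enc(pk, k) % n2
den = pow(c_2x, u, n2) * pow(c_2y, v, n2) % n2 * enc(pk, r*r) % n2
c_dk = num * pow(den, -1, n2) % n2

dk = dec(sk, c_dk)                      # Alice decrypts and learns only d + k
d = (dk - k) % pk[0]                    # Trent subtracts k to obtain d
d = d - pk[0] if d > pk[0] // 2 else d  # map the residue back to a signed value
assert d == (x - u)**2 + (y - v)**2 - r*r   # -3 here: d < 0 means "nearby"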

In the second phase, Alice reveals her location to Bob and sends him all information that he needs to verify her location and the answer from Trent. Bob can now decide to reveal his location to Alice. By including his salt and random k, Alice can verify the protocol run.


5.1.1 Louis protocol analysis

Assuming an honest protocol run, after the first phase no party will have learned the exact location of any of the other parties involved. Trent will learn nothing about the locations of Alice or Bob and he won't even learn the distance between them. The only thing he will learn is the difference between the squared distance and the squared radius of the circle that is recognized as “nearby”. Although neither Alice nor Bob will learn this value, Trent does send the sign of this value to Alice, so she'll learn whether the distance to Bob is above or below the radius r that she came up with. After the second phase, both Alice and Bob will learn each other's exact locations, while Trent will not learn any new information.

When one or more of the parties involved in the protocol are dishonest, some interesting conclusions can be drawn. The hash values used in the message exchange ensure that both parties can verify that the locations revealed in the second phase are the same as those committed to in the first phase. However, the locations that are reported by Alice and Bob can in no way be verified by this protocol. Therefore, the easiest attack possible for both Alice and Bob is to use incorrect location data during the entire protocol run. Such behavior defeats the purpose of this protocol and may damage the social relation between both parties.

One interesting attack Alice can perform is to execute the first phase of the protocol several times, using different faked values for her location. This way, she can probe Bob for being at a set of likely locations. Bob can prevent such an attack by refusing to take part in the protocol when it is invoked multiple times within a short timeframe.

The protocol has no way to prevent or detect collusion between Alice or Bob and Trent. When Alice and Trent collude, they are able to determine the distance between Alice and Bob:

√((d + k) − k + r²).

When Bob and Trent collude, they can determine the distance between Alice and Bob in the same way, but it's even more powerful to have Trent always return ‘YES’, in order to persuade Alice to reveal her exact location to Bob, after which Bob can abort the protocol.

5.2 The Lester Protocol

The Louis protocol described in section 5.1 requires a semi-trusted third party. Although this third party learns only the difference between the distance between Alice and Bob and the requested threshold radius, Alice and Bob may be interested in using a protocol that doesn't depend on a third party. The Lester protocol presented in section 5 of [9] removes the need for the third party, at the cost of only disclosing the distance between the two users – instead of the exact location(s) – to Alice and disclosing nothing to Bob. It is noted, however, that the protocol can be run a second time with the roles reversed to perform a mutual exchange of information.

In the protocol description, a ∈ Zq is the private key of Alice, b is the private key of Bob and CA(m) = (ca, cb) = (g^r mod p, A^(r+m) mod p) is the CGS97 encryption function [11], using a generator g ∈ Zp, a random value r ∈R Zq and the public key of Alice A = g^a. Alice and Bob can both compute C = A^b = B^a, as in Diffie-Hellman key exchange. Like the Paillier encryption function, the CGS97 encryption function is both non-deterministic and homomorphic: CA(m1 + m2) = (c1,a · c2,a, c1,b · c2,b). CGS97 also has the property that decryption involves computing a discrete logarithm and therefore takes O(√M) time, where M is the number of possible plaintext messages, when using the Pollard lambda method described in [12].

As before, (x, y) defines the location of Alice and (u, v) the location of Bob. D = (x − u)² + (y − v)² is the squared distance between Alice and Bob. The protocol requires two messages to be exchanged:

A → B CA(x² + y²), CA(2x), CA(2y)

B → A t, CA(b · (D · 2^t + s))

In the second message, Bob introduces a random salt s of length t bits. The length t determines the amount of work Alice has to do to successfully decrypt the message. Note that Bob is able to compute CA(b · (D · 2^t + s)) using the homomorphic property of CGS97, using CA(2^t·b · D) and CA(b · s) as input ciphertexts. Bob can compute CA(2^t·b · D) using the values CA(x² + y²), CA(2x) and CA(2y) he received from Alice and the values CA(u² + v²), CA(2u) and CA(2v) he computes himself. In the derivation below, the · operator is overloaded as a pairwise multiplication operator when used with two CGS97 ciphertexts as operands:

CA(2^t·b · ((x − u)² + (y − v)²))
= CA(2^t·b · (x² + u² − 2xu + y² + v² − 2yv))
= CA(x² + u² − 2xu + y² + v² − 2yv)^(2^t·b)
= ((CA(x² + y²) · CA(u² + v²)) / ((CA(2x))^u · (CA(2y))^v))^(2^t·b)

Using CA(b · (D · 2^t + s)) = (g^r, A^(r + b·(D·2^t + s))) = (g^r, A^r · A^(b·(D·2^t + s))) and A^r = g^(ar), Alice can compute A^(b·(D·2^t + s)) = C^(D·2^t + s) from the received message.


If Alice wants to determine whether the distance to Bob is less than some threshold r, she tries to solve the discrete logarithm by finding D · 2^t + s in the range [0, r²·2^t]. Using the Pollard lambda method [12], this takes O(√(r²·2^t)) = O(2^(t/2) · r) time. Notice that this is linear in the radius r chosen by Alice and exponential in the work factor t chosen by Bob.

If Alice succeeds in solving the discrete logarithm, she can obtain D from D · 2^t + s by shifting the value right by t bits (or performing an integer division by 2^t) and she can compute the distance r̂ = √D to Bob. If Alice fails to solve the discrete logarithm, she can conclude that the distance r̂ to Bob is larger than r.
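The sketch below runs this exchange end to end in a toy group, with the message encoded in the exponent of A as above and a brute-force scan standing in for the Pollard lambda method; the group, helper names and coordinates are all our own assumptions.

import random

p, g = 65537, 3                        # tiny prime field; g generates Z_p^*
a = random.randrange(1, p - 1, 2)      # odd keys keep C at full order (toy detail)
b = random.randrange(1, p - 1, 2)
A, B = pow(g, a, p), pow(g, b, p)
C = pow(B, a, p)                       # = A^b = g^(ab), computable by both sides

def enc(m):                            # CA(m) = (g^r, A^(r+m)) mod p
    r = random.randrange(1, p - 1)
    return pow(g, r, p), pow(A, r + m, p)

def cmul(c, d):                        # homomorphic addition of the plaintexts
    return c[0] * d[0] % p, c[1] * d[1] % p

def cpow(c, e):                        # homomorphic scaling of the plaintext by e
    return pow(c[0], e, p), pow(c[1], e, p)

x, y, u, v, t = 3, 4, 6, 8, 6          # D = (3-6)^2 + (4-8)^2 = 25; work factor t
s = random.randrange(0, 2**t)          # Bob's t-bit salt

cxy, c2x, c2y = enc(x*x + y*y), enc(2*x), enc(2*y)           # Alice -> Bob

num = cmul(cxy, enc(u*u + v*v))                              # Bob's side
den = cmul(cpow(c2x, u), cpow(c2y, v))
cD = cmul(num, (pow(den[0], -1, p), pow(den[1], -1, p)))     # CA(D)
msg = cmul(cpow(cD, (2**t) * b), enc(b * s))                 # CA(b(D*2^t + s))

# Alice strips A^r, leaving C^(D*2^t + s), then searches the exponent range.
target = msg[1] * pow(pow(msg[0], a, p), -1, p) % p
r_max = 7                              # Alice probes "is Bob within distance 7?"
for e in range(r_max * r_max * 2**t + 1):
    if pow(C, e, p) == target:         # Pollard lambda would need O(2^(t/2) r)
        print("D =", e >> t)           # prints D = 25, i.e. distance 5
        break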

5.2.1 Lester protocol analysis

In the Lester protocol, only Alice and Bob participate, so there is no third party Trent that can learn anything about the locations of Alice or Bob or can collude with any of them. Like the Louis protocol, the Lester protocol cannot enforce that Alice and Bob use their real locations in the message exchange, but neglecting to do so may damage the social relation between them.

Regarding the protocol, it is interesting to note that it is difficult for Bob to determine a correct value for t. A value too high may prevent Alice from successfully learning the distance between her and Bob, while a value too low will allow Alice to determine the distance in a situation where it is larger than Bob is comfortable to reveal.

Since the time to solve the discrete logarithm is linear in the probed radius, Alice can simply double her effort to double her search radius. Another effect is that Bob must know what kind of equipment Alice is using in order to determine a suitable value for t; failure to do so introduces a mismatch between his privacy requirements and the ability of Alice to find his location. For example, if Bob chooses a t that allows Alice to find him with a mobile device within a certain range, she can probably find him within the same timeframe in a much larger area using her personal computer. We believe this critical property makes the Lester protocol unsuitable for real-world scenarios.

5.3 The Pierre Protocol

One disadvantage of the Lester protocol described in section 5.2 is that Alice can decide to do more work to discover the distance between her and Bob over a longer range. For example, if Bob decides on a work factor that allows Alice to determine the distance within 500 meters, Alice can choose to do twice the amount of work and determine the distance between them even when it is between 500 meters and one kilometer.

In section 6 of [9], the Pierre protocol is described, which doesn't have this property and can therefore give Bob more confidence in his privacy. The downside is that even less information is disclosed using this protocol. Instead of a continuous coordinate system, the Pierre protocol assumes a grid. Therefore, the coordinates of Alice become (xr, yr) = (⌊x/r⌋, ⌊y/r⌋), with r being the size of the grid cells.


Figure 15: Grid coordinates as used in the Pierre protocol. Notice that the squared distance Dr equals 0 when Alice and Bob are in the same cell, equals 1 when they're in adjacent cells and equals 2 when they are in diagonally touching cells.
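As a quick worked example of this mapping, with a hypothetical cell size r = 500 and made-up coordinates:

r = 500
(x, y), (u, v) = (1250, 2720), (1740, 2280)        # Alice, Bob
xr, yr, ur, vr = x // r, y // r, u // r, v // r    # cells (2, 5) and (3, 4)
Dr = (xr - ur)**2 + (yr - vr)**2                   # 2: diagonally touching cells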

The protocol again relies on the use of a homomorphic encryption function, and the authors of [9] state that both Paillier [10] and CGS97 [11] can be used. In the description below, the notation from section 5.2 – and therefore the CGS97 function – is used.

A → B r, CA(xr² + yr²), CA(2xr), CA(2yr)

B → A CA(ρ0 · Dr), CA(ρ1 · (Dr − 1)), CA(ρ2 · (Dr − 2))

The values ρ0, ρ1 and ρ2 are random elements of Z*p and Dr = (xr − ur)² + (yr − vr)² is the squared distance between Alice and Bob. Bob can compute the messages he must send because the protocol uses a homomorphic encryption function: he learned CA(xr² + yr²), CA(2xr) and CA(2yr) from Alice and can compute CA(ur² + vr²), CA(2ur) and CA(2vr) himself:

CA(ρi · ((xr − ur)² + (yr − vr)² − i))
= CA(xr² + ur² − 2xr·ur + yr² + vr² − 2yr·vr)^ρi / CA(i·ρi)
= ((CA(xr² + yr²) · CA(ur² + vr²)) / ((CA(2xr))^ur · (CA(2yr))^vr))^ρi / CA(i·ρi)


Alice can now check whether one of the three decrypted values equals zero: if Alice and Bob are in the same grid cell, Dr = 0 and therefore ρ0 · Dr = 0; if Alice and Bob are in adjacent cells, Dr = 1 and therefore ρ1 · (Dr − 1) = 0; and if Alice and Bob are in diagonally touching cells, Dr = 2 and therefore ρ2 · (Dr − 2) = 0. The authors note that checking whether the plaintext of (c1, c2) = (g^r, A^(r+m)) equals zero using CGS97 doesn't require decryption, since Alice can just check whether c1^a = c2.
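Below is a toy version of Bob's blinding and Alice's zero test, reusing the same insecure CGS97-style group as the Lester sketch in section 5.2; for brevity Dr is computed in the clear here, whereas in the real protocol Bob derives the three ciphertexts homomorphically as shown above, and the names and values are our own assumptions.

import random

p, g = 65537, 3                          # same toy group as the Lester sketch
a = random.randrange(1, p - 1, 2)        # Alice's (odd, toy) private key
A = pow(g, a, p)

def enc(m):                              # CA(m) = (g^r, A^(r+m)) mod p
    r = random.randrange(1, p - 1)
    return pow(g, r, p), pow(A, (r + m) % (p - 1), p)

xr, yr, ur, vr = 4, 7, 5, 7              # adjacent grid cells
Dr = (xr - ur)**2 + (yr - vr)**2         # = 1 (toy shortcut; see note above)

# Bob blinds Dr - i with a fresh random rho_i, so any non-zero plaintext
# decrypts to a random-looking value.
replies = [enc(random.randrange(1, p - 1) * (Dr - i)) for i in range(3)]

# Alice's zero test: the plaintext is 0 exactly when c1^a == c2, so no
# discrete-logarithm search is needed (up to a negligible false-positive
# chance in this tiny group).
for i, (c1, c2) in enumerate(replies):
    if pow(c1, a, p) == c2:
        print("Dr =", i)                 # prints "Dr = 1": adjacent cells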

5.3.1 Pierre protocol analysis

Like the Lester protocol, the Pierre protocol doesn't require a third party and is unable to verify that the location data used by the participants of the protocol is correct.

Since the three values returned by Bob include a random element, Alice is unable to obtain any information about the location of Bob relative to her own, unless one of the three messages decrypts to zero. Bob learns nothing about the location of Alice.

6 Conclusion

Regarding privacy in location based services, four main parts have been examined in this paper: we did a small-scale interview to gauge the usage of LBS and privacy concerns of students, we discussed several algorithms that can cloak the location of a user using k-anonymity as the main privacy constraint, we discussed an algorithm that allows an LBS provider to resolve nearest neighbor queries when the location of the user is cloaked, and three protocols have been discussed that allow location information to be exchanged in a social setting.

6.1 Interview

In our interview, we questioned 69 students of the Radboud University. The most interesting result of this interview is that students show concerns regarding privacy when using an LBS, but seem to rate this privacy as less important than the right of government officials or an employer to ask people about their whereabouts. As this is contrary to what we expected, it can be an interesting topic for further research. It should be noted that the group we questioned isn't representative of society, so more accurate results can only be obtained by repeating the interview with a more representative sample.

For a previous version of this paper, we conducted a different interview, and we can conclude that the questions in this newer version have been more carefully designed. In the previous interview, we asked people whether they knew about the possible consequences of using an LBS, without explaining those consequences; we were therefore relying on people's self-judgement of how good their knowledge of this topic is. In this newer version, we first used a question to make people agree on the meaning of an LBS by giving explicit devices and services to think about.

After establishing this agreement, the questions of section 3 were used to determine how concerned people are, without actually informing them about the dangers that researchers talk about, while the questions of section 4 were designed to gauge how concerned people really are about their privacy by framing an explicit scenario.

6.2 Discussion of protocols

Using k-anonymity to quantify the level of privacy a user needs, several protocols have been described that allow cloaking of the exact location of a user when using an LBS. We have seen a straightforward approach by Gruteser, et al. [5], which keeps splitting areas to the lowest possible size that meets the k-anonymity constraint. The disadvantages of this method are that a single value for kmin is used in the system and that a fundamental flaw has been described that can breach privacy.

One attempt to overcome the limitation of using a single kmin value is the CliqueCloak algorithm by Gedik, et al. [6], allowing the sender of a message to specify a different kmin value for each individual message. Like the Gruteser algorithm, CliqueCloak relies on a central anonymizing proxy, which is a property that Chow, et al. have tried to overcome in a peer-to-peer algorithm [7].

Since the papers that introduce each algorithm don't use a common method to test performance and efficiency, it is impossible to quantitatively compare them. However, the flaw in the Gruteser algorithm has been replaced by a failure to send a message to the LBS in the other two algorithms. From a privacy point of view we believe this is a better approach, but users will probably perceive this as a deficiency of a system implementing such an approach.

We believe that the requirement of setting up peer-to-peer communications in the last algorithm may be difficult and may cause suboptimal grouping as a result of peer communications being blocked by obstacles; we therefore give a slight preference to the CliqueCloak algorithm. Regarding this preference, we want to note that mobile phone carriers might be a suitable party to implement the anonymizing proxy, as they must already be aware of the location of the client in order to provide cell routing and are already bound by law to keep this information private. However, proper comparative research should be conducted to make a definitive decision.

In section 4 we described the query processing algorithm introduced by Mokbel, et al. [8], which can be used by an LBS provider to construct a list of target objects that can be returned to a client with a cloaked location as the result of a nearest neighbor query. The completeness property of this algorithm proves its usefulness, while the property that it returns the minimum number of results given a rectangular search area makes it suitable for use with mobile clients. Although we identified a minor flaw in this latter property, we believe this algorithm is very useful in an LBS setting.

Finally, we discussed three protocols introduced by Zhong, et al. [9] that can be used in a social setting. The Louis protocol makes use of a semi-trusted third party and includes message hashes that allow the participating parties to verify the committed location values after the protocol has finished. The Lester protocol removes the need for the third party, at the price of revealing less information to the participating parties. We believe the property that finding the distance between the participating parties has a complexity linear in the search range to be a very important flaw in this protocol. Finally, the Pierre protocol reveals even less information, but has much stricter control over what information is revealed compared to the Lester protocol, while also not depending on a third party.

References

[1] Latanya Sweeney. k-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 10:557–570, October 2002.

[2] Latanya Sweeney. Uniqueness of Simple Demographics in the U.S. Population. LIDAP-WP4, Carnegie Mellon University, Laboratory for International Data Privacy, Pittsburgh, PA, 2000.

[3] Group Insurance Commission. Testimony before the Massachusetts Health Care Committee, 1997. See Session of the Joint Committee on Health Care.

[4] City of Cambridge, Massachusetts. Cambridge voters list database, Cambridge: February 1997.

[5] Marco Gruteser and Dirk Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, MobiSys '03, pages 31–42, New York, NY, USA, 2003. ACM.

[6] Bugra Gedik and Ling Liu. A customizable k-anonymity model for protecting location privacy. In ICDCS, pages 620–629, 2004.

[7] Chi-Yin Chow. A peer-to-peer spatial cloaking algorithm for anonymous location-based services. In ACM GIS 2006, pages 171–178. ACM Press, 2006.

[8] Mohamed F. Mokbel, Chi-Yin Chow, and Walid G. Aref. The New Casper: Query processing for location services without compromising privacy. In VLDB, pages 763–774, 2006.

[9] Ge Zhong, Ian Goldberg, and Urs Hengartner. Louis, Lester and Pierre: Three protocols for location privacy. In Privacy Enhancing Technologies, pages 62–76, 2007.

[10] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology — EUROCRYPT '99, volume 1592 of Lecture Notes in Computer Science, pages 223–238. Springer Berlin / Heidelberg, 1999.

[11] Ronald Cramer, Rosario Gennaro, and Berry Schoenmakers. A secure and optimally efficient multi-authority election scheme. In Advances in Cryptology — EUROCRYPT '97, volume 1233 of Lecture Notes in Computer Science, pages 103–118. Springer Berlin / Heidelberg, 1997.

[12] J. M. Pollard. Monte Carlo methods for index computation. Mathematics of Computation, 32(143):918–924, 1978.
