

Understanding identity exposure in pervasive computing environments

Pervasive and Mobile Computing 8 (2012) 777–794


Feng Zhu a,∗, Sandra Carpenter b, Ajinkya Kulkarni a

a Department of Computer Science, N 300 Technology Hall, The University of Alabama in Huntsville, Huntsville, AL 35899, United States
b Department of Psychology, Morton Hall 333, The University of Alabama in Huntsville, Huntsville, AL 35899, United States

Article info

Article history:
Received 3 March 2010
Received in revised form 25 June 2011
Accepted 29 June 2011
Available online 18 July 2011

Keywords:
Identity management
Pervasive computing
Privacy

Abstract

Various miniaturized computing devices that store our identity information are emerging rapidly and are likely to become ubiquitous in the future. They allow private information to be exposed and accessed easily via wireless networks. When identity and context information is gathered by pervasive computing devices, personal privacy might be sacrificed to a greater extent than ever before. People whose information is targeted may have different privacy protection skills, awareness, and privacy preferences. In this research, we studied the following issues and their relations: (a) identity information that people think is important to keep private; (b) actions that people claim to take to protect their identities and privacy; (c) privacy concerns; (d) how people expose their identity information in pervasive computing environments; and (e) how our RationalExposure model can help minimize unnecessary identity exposure. We conducted the research in three stages: a comprehensive survey and two in-lab experiments. We built a simulated pervasive computing shopping system, called InfoSource. It consisted of two applications and our RationalExposure model. Our data show that identity exposure decisions depended on participants’ attitudes about maintaining privacy, but did not depend on participants’ concerns or security actions that they claimed to have taken. Our RationalExposure model did help the participants reduce unnecessary disclosures.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Today we face serious challenges in maintaining our personal privacy, especially in pervasive computing environments, where various types of identity information are expressed in digital forms (e.g., health conditions via a wearable medical device, birth date on a driver’s license with an RFID tag). A major advantage of these digital formats is that people can easily expose or share identity information via wireless networks if they choose to do so. A disadvantage, however, is that the identity information might be acquired relatively easily without users’ consent via these pervasive computing devices. Without appropriate control over such exposure, pervasive computing environments could become pervasive surveillance systems [1].

Personal privacy, however, is neither static nor able to be simply defined by rules [2]. Instead, it may be understood as a dynamic process that continuously draws the boundaries between disclosure and concealment [3]. It is based on multiple factors such as contexts, preferences, knowledge, and experiences, going beyond secrecy and the concern of present exposure. More importantly, it includes concerns about aggregated personal information and potential abusive uses of that information [4].

People may fall prey to a wide range of identity exposure threats such as identity theft and price discrimination. Some service providers collect more identity information than is needed for a transaction; in fact, some collect as many as 100 pieces of identity information from a user [5]. Third-party service providers such as ‘‘DoubleClick’’ (one of the largest Internet advertising service providers) specialize in collecting, compiling, and analyzing users’ information [6]. Often, people do not know the potential negative consequences of their identity exposure [7]. Research shows that the combination of zip code, birth date, and gender can uniquely identify 87% of individuals in the United States [8], a fact of which most people are unaware. Identity exposure decisions are usually left to individual users, who may not understand the risks associated with disclosure. Thus, in a pervasive computing environment, people may need to be educated about how their identity information may be acquired and used without their knowledge or consent.

∗ Corresponding author.
E-mail addresses: [email protected], [email protected] (F. Zhu), [email protected] (S. Carpenter), [email protected] (A. Kulkarni).

1574-1192/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.pmcj.2011.06.007

People are highly concerned about privacy in general [9,10]. They differ in their levels of willingness to expose information [9]. In an online shopping context, their privacy exposure behaviors do not always match their privacy preferences [11]; people may disclose even when they report having more conservative attitudes about exposure. The goal of these studies was to influence the design of the Platform for Privacy Preferences Project (P3P)—a standard that enables Web sites to express their privacy practices in a format that can also be retrieved automatically and interpreted easily by users’ Internet browsers [12]. In contrast, few studies have focused on privacy issues and disclosure in pervasive computing environments. It is unclear whether people are aware of the importance and sensitivity of various types of identity information, and how their specific concerns about identity exposure and privacy are related to actions that they have taken. In this paper, we discuss our research on these issues.

Privacy issues raised in pervasive computing environments have not been adequately addressed [13]. Many experts believe that information privacy law does not yet effectively protect privacy and may have systematic deficiencies [4,7]. Several research projects have used policy-based approaches to protect privacy and prevent unnecessary identity exposure in pervasive computing environments [14–17]. However, complex policies may require users to have special skills to specify them, and thus these policies suffer from usability issues [18]. We propose a game theoretic approach, called RationalExposure, to make rational suggestions (automatically) to users in pervasive computing environments [19]. In this paper, we describe the model’s effectiveness in helping people reduce the amount of risky and/or unnecessary information that they disclose. The major goals of our current research are (a) to ascertain the types of information that people are willing to share with service providers and (b) to test the effectiveness of our software (i.e., RationalExposure) that alerts users to potential threats and offers suggestions for behaviors that are alternatives to information exposure.

The main contributions of the paper are our analyses that include users’ concerns, the actions they claim to have taken for privacy protection, the identity elements that they think are important, their exposure behaviors in pervasive computing environments, and the effectiveness of the suggestions generated by our RationalExposure model. ‘‘RationalExposure’’ refers both to the theoretical model of how to make low-risk decisions about which information to disclose and to the software designed to reduce information exposure. To the best of our knowledge, this is the first paper that provides a detailed analysis and discussion of identity exposure and the relationships among users’ attitudes, concerns, and behaviors relevant to disclosure of private information.

We conducted three stages of research. In the first stage, we used an online survey to ask participants about their perceptions related to identity exposure: the importance of keeping different types of identity information private, their privacy concerns, and actions that they had taken to protect their privacy. There were 229 participants who completed this survey. In the second stage, we conducted an in-lab experiment with follow-up surveys to study people’s identity exposure preferences and behaviors. We implemented two applications that simulated in-store CD shopping and checkout processes. A different group of 100 participants provided data in this stage. In the third stage, we conducted another in-lab experiment to study 56 (different) participants’ identity exposure behaviors when multiple, less sensitive types of identity information were requested.

Our statistical analyses of the data show that participants reported being highly concerned about privacy in general, and they claimed that they engaged in a variety of actions to protect their privacy. In the second and third stages, participants were asked to provide several pieces of identity information. Although most participants had similar opinions about which identity information was important to keep private, few of them actually protected their identity information in the pervasive computing environment by themselves (as evidenced in the control conditions of the experiments). In the experimental conditions of the second and third stages, our RationalExposure software suggested that participants not expose identity information, or provide fake information, respectively. These suggestions helped most participants minimize their identity exposure, but participants’ exposure decisions also depended on the degree to which they thought the information was important to keep private. Exposure behaviors, in both the experimental and control conditions, however, were independent of participants’ stated privacy concerns and the actions they claimed to have taken to increase security and privacy.

1.1. Our definition of identity elements

An abstract definition of identity may be found in the Oxford Dictionary as ‘‘the fact of being who or what a person or thing is’’. Marx classifies identity elements into seven categories [20]: a person’s legal name, address, unique symbols (alphabetic or numerical) to identify a person, pseudonyms that cannot be linked back to a person, a person’s distinctive appearance or behavior patterns, social categorization (such as gender, ethnicity, religion, etc.), and possession of knowledge (such as passwords and secret codes). At the dawn of the post-PC era, when pervasive computing has become more prevalent, we provide a definition that includes both the traditional definition and the influences of emerging computer technologies.


We define an identity element as a component of a person’s identity. An ‘‘identity’’ may include one (e.g., social security number) or multiple identity elements (e.g., name, address, physical characteristics on a driver’s license). An identity element is a characteristic that differentiates some people from others. It may be an element to identify who you are (e.g., eye color), what you have (e.g., a sports car owner), what you like (e.g., your favorite sports), or where you are (e.g., via a ‘‘foursquare’’ app to report one’s location). Some elements may never change (e.g., fingerprint), whereas others may change (e.g., phone number) or be constantly changing (e.g., current location).
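As an illustration only (this data structure is our sketch, not part of the paper’s system; all names are hypothetical), the element categories and change rates described above could be captured as:

```python
from dataclasses import dataclass
from enum import Enum

class Mutability(Enum):
    """How often an identity element's value changes."""
    PERMANENT = "never changes"          # e.g., fingerprint
    OCCASIONAL = "may change"            # e.g., phone number
    CONTINUOUS = "constantly changing"   # e.g., current location

@dataclass
class IdentityElement:
    """One component of a person's identity."""
    name: str          # e.g., "eye color", "favorite sport"
    category: str      # who you are / what you have / what you like / where you are
    mutability: Mutability

elements = [
    IdentityElement("fingerprint", "who you are", Mutability.PERMANENT),
    IdentityElement("phone number", "what you have", Mutability.OCCASIONAL),
    IdentityElement("current location", "where you are", Mutability.CONTINUOUS),
]

# Elements that can never be revoked once exposed deserve the most protection.
permanent = [e.name for e in elements if e.mutability is Mutability.PERMANENT]
```

A permanent element such as a fingerprint, once disclosed, can never be changed, which is one reason the mutability distinction matters for exposure decisions.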

1.2. Emerging privacy challenges in mobile commerce and our experimental scenario

It is projected that the payment of bills via mobile devices will grow, worldwide, from $170 billion in 2010 to $630 billion by 2014 [21]. In addition, more and more commodities have RFID tags attached. Via near field communication (NFC), the data on RFID tags can be read easily. In the near future, it is likely that smartphones will be equipped with RFID readers and other NFC devices to facilitate users’ interactions with service providers.

The following scenario provides a ‘‘peek into the near future’’ and lays the foundation for how pervasive computing might be used to elicit identity elements from users.

Michelle sees a new CD release from a popular singer at a Best Buy store. She uses her cell phone with an embedded RFID reader to read the RFID tag on the CD. Using the URL emitted from the RFID tag, her cell phone shows a web page. Via the web page, Michelle can watch a sample music video and read other information about the CD. She likes the CD and decides to buy it. She goes to the checkout counter and uses her cell phone to pay for the CD. Her cell phone contains and controls all of her digital identification tokens and bank cards. She selects a credit card to pay for the CD and completes the transaction.

This scenario displays how pervasive computing can aid the interaction between users and service providers and also indicates how related devices may contain sensitive information that users may want to keep private. In the experimental phases of our research (stages two and three), we employ a similar scenario to investigate users’ disclosure of identity elements to service providers.

The remainder of the paper is structured as follows. We describe related work in Section 2. In Section 3, we describe our experimental design, illustrate statistical analyses, and present our key findings. Then, in Section 4, we discuss the lessons that we learned while conducting the research. Last, in Section 5, we outline our future work and conclude by highlighting the contributions of our current work.

2. Related work

Related studies on privacy attitudes and identity exposure behaviors. Early work on privacy attitudes may be found in Westin’s studies (e.g., general privacy [22], medical information, and computer fear [23]). His surveys over the last three decades measured people’s attitudes about privacy and how those attitudes changed over time. In 1999, a survey of Internet users’ attitudes toward identity information privacy by Ackerman et al. indicated that users were very worried about their privacy in e-commerce settings [9]. Olson et al. found that these privacy attitudes vary along two dimensions: the types of information that should be kept private and the types of people with whom one would or would not share the information [24]. Also, a recent survey by Nguyen et al. showed that participants were highly concerned about information privacy in general and concerned about unfamiliar emerging computing devices such as RFID tags, but significantly less concerned about everyday tracking and recording technologies (e.g., credit cards, loyalty cards, and cameras) [10]. A further study on a wearable camera (SenseCam) revealed that people accepted SenseCam, but they were concerned about protecting themselves and preferred to be informed that someone in the vicinity was using a SenseCam [25]. This series of studies highlights the fact that people are concerned about their privacy and somewhat aware that new technologies may threaten their privacy.

People’s identity exposure behaviors, however, do not always match their privacy attitudes. Berendt et al. studied users’ privacy exposure behaviors in an Internet shopping experience [26]. They found that participants were willing to reveal personal information in spite of having indicated that this information was important to keep private. They further found that some users were ‘‘identity concerned’’ and exposed less identity information, whereas other users were ‘‘profiling averse’’ and revealed less information about their interests, hobbies, and health status. Acquisti and Grossklags pointed out that people often lack adequate information about how to protect their identity elements [27]. Even with enough information, people often trade their privacy for short-term benefits. Consolvo et al. investigated why and when people are willing to share their location information with friends, family members, and colleagues [28]. They found that participants’ privacy attitudes were not a good predictor of how the participants actually responded to location requests. Thus, people may not behave in ways that match their privacy attitudes. Exposure behaviors should nonetheless be related to the perceived risk of disclosing a particular identity element to a particular provider in a particular situation.

Unlike these previous studies, our research focused on understanding whether users were aware of the importance of protecting their identity elements, and we tested our game theoretic approach to minimizing identity exposure. In the experiments, we provided concise and informative suggestions about appropriate disclosure and observed whether participants adopted our suggestions.


Privacy and trust. When people trust a service provider, they are more likely to make decisions that could potentially be risky. The concept of trust can be found in the sociological, psychological, and economic literature. Relatively recent treatments of the concept have been reviewed and applied to organizations [29] and to e-commerce [30–32]. These researchers have identified several variables that users consider when interacting with technology (e.g., a website), including the interface with the technology (e.g., brand, navigation) and information properties of the technology (e.g., seals of approval, presentation, policies of disclosure of information, and options for how a consumer’s data might be used in a different context). Thus, users’ beliefs and intentions to use a technology are influenced by the perceived effectiveness of the technology to reduce risk. Trust can be built across time and is built on positive experiences with a service provider [33]. People behave more conservatively when they interact with service providers that they do not trust. Petronio has provided a rule-based boundary system that describes how communications are shared to maximize benefits and minimize risks [34]. She identified three strategies to reduce risk: withholding information (depending on levels of risk and trust), falsifying information, and seeking information. Metzger has verified that these strategies are typical methods used to maintain privacy in a computing environment [35]. On the basis of these findings, we predicted that (a) willingness to disclose identity information would be correlated with the frequency with which users interacted with specific service providers, and (b) participants would be more likely to fake information if they perceived that the information was important to keep private, or simply not provide the information, especially if the service provider was unknown. These hypotheses are tested in stage 1 of our research.

Interfaces for helping people make better decisions. The ‘‘five privacy design pitfalls’’ of Lederer et al. inspired our RationalExposure software design [36]. Specifically, we adopted their suggestions to help users understand the private information flow. Via the screen on a handheld device, participants were given suggestions by the RationalExposure model as to whether they should provide particular identity elements. Interested readers may refer to Iachello and Hong’s comprehensive survey on end-user privacy from the perspective of human–computer interaction [37].

Policy-based privacy frameworks. Policy-based privacy protection mechanisms have been adopted in multiple pervasive computing projects (e.g., the Privacy Awareness System (pawS) [16], Cranor and Reagle’s ‘‘buckets’’ approach [38], and an adapted model based on Lampson’s access matrix and the Bell and LaPadula (BLP) security labels [15]). These approaches suffered from usability issues, and they were considered to be too complex for average users [18]. Hong and Landay designed a toolkit, Confab, for application developers to use to enforce policies, to send privacy notifications, and to manipulate private data [17]. Confab also emphasized usability; for example, it used three basic interaction patterns for privacy-sensitive applications for end-users. Privacy Bird, an Internet Explorer plug-in, could notify users if a website’s P3P policy did not meet a user’s privacy preferences [39] and enabled users to easily determine how information provided to a site might be used (e.g., one’s health or medical information might be used for marketing). Our approach complements these approaches and helps users make rational decisions about which identity elements are appropriate to expose.

The RationalExposure model. We propose an identity exposure model, called RationalExposure, for pervasive computing environments [19]. Using this model, a subset of a person’s identity is stored in a hierarchical tree structure that represents identity elements from a general to a precise level. An identity element is more general if a larger number of people have the same identity element (e.g., gender). During the interactions between users and service providers, our model suggests that users expose the most general identity elements that service providers are willing to accept. When identity exposure is unnecessary, our model suggests that users not expose information or, alternatively, provide falsified information.
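As a concrete sketch of the general-to-precise idea (our invention for illustration; the element names, population shares, and acceptance check below are hypothetical and not the paper’s implementation), a single branch of such a hierarchy and a ‘‘most general acceptable element’’ lookup might look like:

```python
# Each level lists (element_name, approximate share of people with the same value).
# A higher share means the element is more general, and thus safer to expose.
birth_info_chain = [
    ("age bracket (20-29)", 0.15),   # most general
    ("birth year", 0.015),
    ("full birth date", 0.00004),    # most precise
]

def most_general_acceptable(chain, provider_accepts):
    """Walk the chain from general to precise and return the first element
    the provider will accept, or None if exposure can be avoided entirely."""
    for element, _share in chain:
        if provider_accepts(element):
            return element
    return None  # the model would then suggest withholding or falsifying

# A provider that only needs a coarse age check is satisfied at the top level:
print(most_general_acceptable(birth_info_chain,
                              lambda e: e.startswith("age bracket")))
# → age bracket (20-29)
```

If a provider insists on the exact birth date for a transaction that does not require it, the lookup only succeeds at the most precise level, which is the kind of overreach the model flags as unnecessary exposure.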

Specifically, we model the interaction between a user and a service provider as an ‘‘extensive game’’ [40]. An extensive game means that a user and a service provider take turns making decisions and taking actions. Extensive games are mathematical models, often used in economics, to model two parties’ behavioral choices, in turn, as they make decisions and take actions. In our shopping scenario, for example, the store’s computer asks Michelle for her digital driver’s license. If she refuses the request, the store computer decides whether to complete the transaction. During a game, there may be multiple actions with different payoffs. For instance, a customer may provide driver’s license information, or she may propose to provide another type of identification. The interaction between two parties can be expressed in a tree structure. A tree node represents a party making a decision, and a branch represents an action that a party may take.

We use a process known as backward induction in game theory [40] to find optimal actions for both parties. The process starts from the leaves of a game tree, then compares and selects the actions with the highest achievable payoffs until the root of the tree is reached. The results (actions and final states) are called ‘‘subgame perfect equilibria’’. These are optimal solutions for identity exposure games for both parties.
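To make the procedure concrete, here is a minimal backward-induction sketch over a toy version of the checkout game. The tree shape and payoff numbers are invented for illustration; they are not the paper’s actual game trees or payoffs.

```python
def backward_induction(node):
    """Return (payoffs, chosen_path) for the subgame rooted at `node`.
    A leaf is a (user_payoff, provider_payoff) pair; an internal node is
    (mover, {action: subtree}), where mover is 'user' or 'provider'."""
    if isinstance(node, tuple) and len(node) == 2 and all(
            isinstance(x, (int, float)) for x in node):
        return node, []          # leaf: payoffs are known, no further actions
    mover, branches = node
    idx = 0 if mover == "user" else 1   # each party maximizes its own payoff
    best_action, best_payoffs, best_path = None, None, None
    for action, subtree in branches.items():
        payoffs, path = backward_induction(subtree)
        if best_payoffs is None or payoffs[idx] > best_payoffs[idx]:
            best_action, best_payoffs, best_path = action, payoffs, path
    return best_payoffs, [best_action] + best_path

# Toy game: the store asks for a driver's license; the user may comply,
# refuse, or offer a more general credential; the store then decides
# whether to complete the sale.
game = ("provider", {
    "request license": ("user", {
        "provide license": (1, 3),          # sale happens, heavy exposure
        "offer general ID": ("provider", {
            "accept and sell": (3, 2),      # sale happens, minimal exposure
            "abort sale": (0, 0),
        }),
        "refuse": ("provider", {
            "sell anyway": (2, 2),
            "abort sale": (0, 0),
        }),
    }),
})

payoffs, path = backward_induction(game)
print(payoffs, path)
# → (3, 2) ['request license', 'offer general ID', 'accept and sell']
```

The returned path is the subgame perfect equilibrium of this toy tree: with these payoffs, offering a more general credential is optimal for the user, and accepting it is optimal for the provider, mirroring the model’s suggestion to expose the most general element the provider will accept.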

3. Research phases and results

In pervasive computing environments, people’s identity information is readily available to share or be accessed via computing devices. Things that people carry and wear may be accessed via attached RFID tags and embedded computer systems. This includes their identification cards such as driver’s licenses and passports [41]. Activity recognition systems identify people’s current activity via the sensors that they wear and carry or via sensors in the environment [42,43]. Location sensing technologies express people’s current location information as coordinates or semantically meaningful places [44,45]. Our research focuses on privacy concerns and disclosure in this pervasive computing environment.


In our survey and experiments, we wanted to understand the following issues of identity exposure and their relations. We use the italicized phrases below to summarize the issues. Later in the paper, we refer to these issues using these phrases as a brief reference.

1. Privacy attitudes. What identity elements do people think are important to keep private? Our goals were to identify (a) the relative importance of keeping different identity elements private and (b) elements that participants consider equally important to protect.

2. Privacy concerns. Are people very concerned about identity exposure and privacy in general? What are their strongestprivacy concerns?

3. Claimed security actions. What actions have people taken to protect their identity information?

4. Knowledge about service providers. Is the disclosure of private information related to the frequency of interaction with a provider? Are known service providers perceived to be less risky as a function of previous transactions and disclosure?

5. Disclosure behaviors. What are people’s behaviors when they are asked to expose their identity elements in pervasive computing environments? Are their behaviors consistent with their privacy concerns, their attitudes toward protecting the identity elements, and the security actions that they claim to have taken?

6. Behaviors, multiple identity elements. What are people’s behaviors when service providers ask them for multiple identity elements? A single identity element may not be unique, but the combination of multiple identity elements may uniquely identify a user.

7. RationalExposure model. Can our RationalExposure model help people make less risky decisions about disclosing their private information? Our software suggests that users not provide specific information or, rather, provide fake identity information. We wanted to know the extent of users’ acceptance of our suggestions.

To study the seven issues, we conducted three stages of research. In the first stage, participants completed an online survey. The survey focused on the first four issues. In the second stage, we focused on issues 5 and 7. Participants came to our lab and used our software (called InfoSource), which provided a rich CD shopping experience and the RationalExposure model for the checkout process. In the third stage, we studied issues 6 and 7. Participants used a modified version of the InfoSource software, going through the CD shopping experience and using the RationalExposure software in the context of multiple requests for personal identity elements.

3.1. Participants

We conducted the research in the spring and fall semesters of 2009. There were 385 college students who participated. Their ages ranged from 17 to 40; approximately 90% of the participants were 23 years old or younger. Only 29% of them were male students. They were students taking introductory psychology courses that are widely used as general education options. Thus, the students had various majors, coming from areas such as engineering, science, nursing, business, and liberal arts. In the first stage, 229 participants completed an online survey. In the second stage, 100 participants attended experiments and surveys in our lab. In the third stage, 56 participants finished our study in the lab. Students participated in only one of the stages, so that their responses/behaviors would not be influenced by previous experience with research on privacy attitudes or disclosure.

It is the practice of psychology departments at research universities in the United States to expect students to have ‘‘hands-on’’ experiences with research. We posted our experimental descriptions, and the students chose to participate in our study. In return for participating, they received ‘‘activity points’’ toward their course assignments; they were not compensated in any other way.

3.2. Stage one

The goal of stage one was to ascertain privacy attitudes (issue 1), privacy concerns (issue 2), claimed security actions (issue 3), and how previous experience with a service provider impacts disclosure (issue 4). Participants (n = 229) finished a 30-min online survey that was ostensibly about their music preferences and music purchasing behaviors. We asked four sets of questions: (a) their music preferences and the extent of their online music purchasing experience, (b) their demographic data, (c) their attitudes and concerns about privacy and security, and (d) the frequency of interaction with specific providers and their willingness to disclose various identity elements to each provider.

First, we asked participants to identify their music preferences and to indicate how they bought music (e.g., CDs, downloads). Second, we asked them to provide a variety of demographic characteristics. Third, we asked them to indicate how frequently (on a 4-point scale ranging from ‘‘never’’ to ‘‘very often’’) they used a variety of service providers (i.e., auto insurance company, Best Buy, restaurant/bar, university bookstore, Amazon.com, Walmart, Netflix, and a fictitious chocolate company—for a validity check) and what types of information they would be willing to give each service provider. Fourth, we asked specific questions related to identity and privacy concerns, such as identity theft, transfer of their private information to other businesses, and profiling and price discrimination. Fifth, we asked participants 18 questions related to the actions that they had taken to protect their identity, privacy, and security, such as falsifying their information on a website, deleting cookies from their computers, responding to unsolicited emails, and carefully reading privacy policies on websites. Last, participants rated 26 identity elements in terms of how important it is to keep them private. The identity elements ranged from social security number, to zip code, to their favorite TV programs.

Many identity elements in our list may be read or inferred from RFID tags embedded in items such as driver’s licenses, credit cards, textbooks, and wine bottles. Location information, one of the most important types of context information in pervasive computing, has attracted much academic interest in the study of people’s privacy preferences [46]. In addition, we included some identity elements, such as favorite TV programs, income, and email addresses, that have been used in previous studies [9,24], so that we could compare our results to those of previous studies and ground them in the context of other research findings.

The extensive survey data that we acquired in the first stage gave us a clearer understanding of participants’ attitudes, concerns, and the actions that they claimed to have taken with respect to security and privacy. Our findings guided us in the second and third stages of the experiments.

3.2.1. Importance of keeping identity information private

To address issue 1 (privacy attitudes), we asked participants to rate 26 identity elements on a scale of ‘‘not at all important’’, ‘‘somewhat important’’, ‘‘substantially important’’, and ‘‘extremely important’’. The text read: ‘‘The following questions ask you to identify the types of information that you think are important to keep private. In this context, ‘privacy’ refers to information about yourself that you think should not be accessed without your consent or control. Please indicate how important it is to keep each of the types of information private’’. Fig. 1 shows the histograms of their ratings. Each bin in a histogram represents the percent of the participants giving the rating at that level. In Fig. 1, we arranged the identity elements in order from the least important element to keep private to the most important element to protect, according to our participants’ ratings. Overall, the ratings ranged from near-unanimous agreement on whether an element was important to protect to very different perceptions of importance for some identity elements.

Most participants indicated that their favorite TV programs, favorite hobbies, frequency of tobacco and alcohol usage, college majors, and frequency of Internet usage were not sensitive information to keep private. On the other end of the spectrum, most of them agreed that credit card numbers, driver’s license numbers, and social security numbers were highly sensitive information to keep private. These results are consistent with the findings by Ackerman et al. [9].

In addition, more participants rated credit card numbers as important to protect than driver’s license numbers, even though credit card numbers are usually easier to invalidate and change. In the study conducted by Olson et al. [24], credit card numbers were considered even more important to protect than social security numbers. We speculate that the monetary risk of having one’s credit card used by someone else is more prevalent in the media and news than is identity theft, such that it seems a more likely threat. Future research could test this hypothesis.

Participants’ ratings for the number of credit cards, monthly income, first and last names, IP addresses, phone numbers, and their locations were diverse; that is, the standard deviations of the ratings for these elements were large. Participants did not agree about the sensitivity of these identity elements. In reality, first and last names, location information, and phone numbers may be very sensitive identity elements. Furthermore, compared to the participants in the survey by Ackerman et al. [9], our participants seemed more conservative about providing their email addresses (49% vs. 76%) and home addresses (15% vs. 44%).

We used formal cluster analysis to reveal the similarity of participants’ importance ratings across identity elements and to determine which identity elements they considered to be similarly important to protect. We chose the average linkage clustering method to measure the psychological distance between identity elements. The average linkage method statistically measures the mean distance between the identity elements in one cluster and the identity elements in the other cluster. It uses a more central measure of location compared to other linkage methods [47]. As can be seen in Fig. 2, three clusters were obtained.
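As an illustration of the average-linkage rule described above, the sketch below performs agglomerative clustering over a handful of identity elements. This is our own minimal sketch with made-up rating vectors, not the authors’ analysis code; the element names and data are hypothetical.

```python
# Minimal sketch of average-linkage agglomerative clustering (illustrative only;
# element names and rating vectors are hypothetical, not the study's data).

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def average_linkage(c1, c2, ratings):
    # Mean pairwise distance between the elements of two clusters.
    dists = [euclidean(ratings[i], ratings[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def cluster(ratings, n_clusters):
    clusters = [[name] for name in ratings]  # start with one cluster per element
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average-linkage distance.
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], ratings))
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# Each identity element is represented by the ratings it received (1-4 scale).
ratings = {
    "ssn":         [4, 4, 4, 4, 4],
    "credit_card": [4, 4, 3, 4, 4],
    "phone":       [2, 3, 2, 3, 2],
    "tv_programs": [1, 1, 2, 1, 1],
}
print(cluster(ratings, 2))  # → [['ssn', 'credit_card'], ['phone', 'tv_programs']]
```

Recording the order of merges (rather than stopping at a fixed cluster count) is what yields a dendrogram like the one in Fig. 2.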

Driver’s license numbers, social security numbers, and credit card numbers appeared in one cluster. This was the group of identity elements that participants thought were the most important to protect. They also rated first and last names, phone numbers, email addresses, and location information similarly (second cluster). IP address was rated quite differently from other identity elements; it might be a unique digital identity, if one connects to the Internet directly via an ISP, or, at least, a unique identity within the first hop of the network. The remaining identity elements were in a third cluster. The underlying dimensions of these clusters can be evaluated in future research. The clusters are provided here as descriptive statistics.

Fig. 1. Importance ratings of the identity elements. (X axis—1. Not at all important, 2. Somewhat important, 3. Substantially important, and 4. Extremely important; Y axis—Percent of participants.)

Fig. 2. Dendrogram of the importance ratings for the identity elements.

To test whether participants’ experience with particular providers influenced their willingness to disclose private information (issue 4, knowledge of service providers), their ratings of experience and disclosure were compared. It was assumed that people who had had negative experiences with a given service provider would have fewer interactions. Spearman correlations for ordinal data were calculated for each type of identity information. The hypothesis that participants would be more willing to provide information to companies with which they had more experience was supported, with the caveat that the information needed to be relevant to the service transaction. For example, the only statistically significant correlation (r = 0.15, p-value < 0.01) between frequency of use and willingness to give student identification information was for the university bookstore. The correlation between frequency of patronage and willingness to give credit card information was significant for Best Buy (r = 0.16, p-value < 0.01), restaurant/bar (r = 0.21, p-value < 0.01), Amazon.com (r = 0.31, p-value < 0.01), Walmart (r = 0.18, p-value < 0.01), and Netflix (r = 0.21, p-value < 0.01). The strongest correlations between frequency of patronage and willingness to disclose information were for providing a zip code to Amazon.com (r = 0.34, p-value < 0.01) and providing a driver’s license number to an auto insurance company (r = 0.24, p-value < 0.01). These correlations support our hypothesis that disclosure to a service provider is related to the amount of previous experience with the provider. Note that the significant correlations are for identity information that would be relevant and appropriate for the provider to request (e.g., a zip code is necessary for Amazon.com to ship purchases).
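For readers unfamiliar with the statistic, a Spearman correlation is simply a Pearson correlation computed on ranks, with ties given average ranks. The sketch below is our own illustration with fabricated responses, not the study’s data or code:

```python
# Illustrative Spearman rank correlation between patronage frequency (ordinal
# 1-4) and willingness to disclose (0/1). All data here are fabricated.

def ranks(values):
    # Average ranks: tied values share the mean of their 1-based positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def spearman(x, y):
    # Spearman = Pearson correlation of the rank-transformed data.
    return pearson(ranks(x), ranks(y))

# Hypothetical responses: how often a participant shops at a provider,
# and whether they were willing to give it a credit card number.
freq = [1, 2, 2, 3, 4, 4, 1, 3, 4, 2]
gave = [0, 0, 1, 1, 1, 1, 0, 0, 1, 0]
print(round(spearman(freq, gave), 2))  # → 0.72
```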

Thus, experience with a service provider did impact users’ willingness to disclose appropriate information. Note that these correlations, although statistically significant, were small in magnitude. This is not surprising, given that the measure of willingness to provide identity information was dichotomous, necessarily leading to attenuated correlations. Also, a fictitious chocolate company was used as a validity check; participants accurately reported that they had never interacted with this provider. (Participants who were not honest, or alternatively were not paying attention to the survey, might have indicated that they had some consumer interaction with the chocolate company. In that case, their data would have been removed from the dataset, but this, in fact, did not occur.)

3.2.2. Participants’ privacy concerns

To investigate issue 2 (privacy concerns), we asked participants to rate privacy concerns from ‘‘not at all concerned’’, ‘‘a little concerned’’, ‘‘somewhat concerned’’, to ‘‘very concerned’’. (Higher values indicate more concern.) The text read: ‘‘For each of the items below, please indicate your level of concern’’. Fig. 3 shows participants’ ratings in histograms.

Fig. 3. Participants’ privacy concerns. (X axis—1. Not at all concerned, 2. A little concerned, 3. Somewhat concerned, and 4. Very concerned; Y axis—Percent of the participants.)

Participants were concerned about their private information being collected in general. They were also concerned about their private information being transferred to others, being hacked, or being stolen. However, not many participants worried about law enforcement acquiring their private information. When we asked participants questions about specific information being known by others, they showed various levels of concern. They showed the most concern about their current location being known and about infiltration of their computers, while they showed the least concern about people knowing their hobbies or their clothes’ and shoes’ brands. Note that participants’ serious concerns about location information are contrary to the conclusion in [10] (less concerned). There are several differences between the two studies: (1) a general question about location privacy vs. location tracking via everyday tracking and recording technologies, (2) a direct question about concerns about disclosure of a current location vs. indirect questions about location privacy via technologies, and (3) college students vs. customers at shopping malls. Future work is needed to better understand people’s concerns about location privacy. For health conditions, financial situations, purchases, visited websites, and emails being read, participants had divergent opinions. Some showed great concern, some did not worry at all, and some had a little concern. The standard deviations of these ratings were very large. Future research can address factors that impact these differences between people. Moreover, participants expressed great concern about identity theft and being harassed, but did not worry about profiling and price discrimination.

To provide descriptive information about the similarities between participants’ ratings, the same type of cluster analysis was applied to the privacy concerns, revealing interesting similarities among them, as shown in the dendrogram in Fig. 4. We separated the concerns into five groups. Participants seemed to worry that many aspects of their information could be used against them, because they rated the seven privacy concerns shown in the left portion of the dendrogram most similarly. Information about their current physical location, however, did not cluster with other concerns. Why these particular items clustered together is a question for future research.

3.2.3. Privacy protection actions that participants claimed to have taken

Last in the online survey, we addressed issue 3 (claimed security actions). The text read: ‘‘For each of the items below, please indicate how frequently you engage in the behavior’’. Participants could respond with ‘‘never’’, ‘‘almost never’’, ‘‘sometimes’’, ‘‘frequently’’, or ‘‘very often’’. It should be noted that we have only the participants’ reports that they actually engage in these behaviors.

Participants expressed prudence when they interacted with unfamiliar parties and did not actively provide information. As shown in the first row of Fig. 5, most of them indicated that they did not respond to telemarketing calls, unsolicited emails, and unknown instant messenger chat requests.

They claimed to protect their private information actively. About 70% of the participants reported that they never or almost never gave their information for better prices and services. More than 47% of them used more than one email address for privacy reasons. Approximately 50% of them reported that they had falsified their personal information on the Internet, at least sometimes, to protect their identity information and privacy. However, it seems that only 20% of the participants paid not to be listed in phone directories.

The participants’ responses indicated that they were familiar with computers. They claimed to have taken actions to secure their computers and protect their digital identities. Most of them used anti-virus software and firewalls, and downloaded security patches. About 70% of them also deleted cookies, at least sometimes. In their daily lives, the majority of participants protected their identity and financial information by shredding credit card receipts (70%) and checking credit card statements (79%), but over 67% of the participants did not order or check their credit reports.


Fig. 4. Dendrogram of the privacy concern ratings.

Fig. 5. Actions that participants claimed to have taken. (X axis—1. Never, 2. Almost never, 3. Sometimes, 4. Frequently, 5. Very often; Y axis—Percent of the participants.)

Approximately 18% of the participants claimed that they were frequently interested in finding out how their personal information was used by companies. Similarly, 18% of the participants reported that they frequently read privacy policies carefully. Note that this self-reported rate might seem similar to the rates (23% in general cases and 43% in e-commerce scenarios) reported by Jensen et al. [48], but Jensen and Potts observed that the actual rate of people reading privacy policies was merely 0.24% [49]. The enormous mismatch between people’s self-reported behaviors and actual behaviors is worthy of further investigation. People who believe that they are aware of privacy policies, but do not actually understand them, may be at increased risk.

The dendrogram in Fig. 6 suggests that those who cared about how their information was used also claimed that they carefully read privacy policies. About 40% of the participants claimed that they used encryption, at least sometimes, to protect their email messages. This percentage is much higher than we expected. The patterns of similarities and differences in the actions people claim to take to protect their privacy provide some intriguing questions for future research.

Fig. 6. Dendrogram of the claimed actions taken.

In summary of stage one of our research, participants indicated that they (a) had the strongest privacy attitudes about their addresses, driver’s licenses, credit card numbers, and social security numbers, (b) had the strongest concerns about their computers being infiltrated, identity theft, and having their private information collected, transferred, or stolen, (c) most frequently engaged in the security behaviors of blocking pop-ups, checking their credit card statements, and using anti-virus software, (d) most frequently disclosed to providers with which they were familiar, and (e) had differential responses to requests for information, including disclosure, faking, and refusal. Overall, participants’ attitudes, concerns, and claimed behaviors seem to be congruent with each other, at least in terms of the information that they provided to us in the survey.

The question remains, however, as to whether people actually take personally appropriate actions, that is, actions that match their attitudes and concerns. Do people truly act in ways that best protect their privacy and security? If so, why do so many people fall prey to identity theft and email scams? Anderson indicated that real attacks exploit psychology at least as much as technology [50]. The second portion (stages two and three) of our research project focused on participants’ behaviors in a simulated shopping situation in our lab in which some personal information (i.e., name, phone number, age, and driver’s license information) was requested. This pervasive computing shopping experience was simulated using PDAs. Some of the ‘‘shoppers’’ were given no help in maintaining privacy, whereas others used software that helped them maintain privacy in the form of warnings not to expose private information.

3.3. Stage two

In stage two, the major goal was to evaluate whether people (a) actually behave according to their stated privacy preferences and (b) reduce their privacy exposure in the presence of our RationalExposure model. Participants (n = 100) came to our lab for the second stage of the experiment. Each session took approximately 30 min, as participants engaged in the shopping at their own pace. Between 1 and 4 students (mode = 2) participated in a session. Each participant was given a PDA, a pair of earphones, and a computer. In addition, they were given instructions on how to use the InfoSource (Version 1) software. The PDAs were used to store their identity information. Participants completed a computerized follow-up survey after their shopping experience. From the 100 participants, we acquired complete experimental data from 97. Two participants withdrew because they felt that it was unsafe to give their credit card numbers; another participant did not have a credit card.

We told participants that the experiment was designed to study their music preferences and to simulate a future music shopping experience in which shoppers would use handheld devices. First, participants entered three pieces of identity information on a PDA: their phone number, a credit card number, and driver’s license information, as shown in Fig. 7(a). Then, they supplied a password to encrypt all of the identity information. We asked them to pretend that the PDA was their personal cell phone, into which they entered information once and could use it many times.

Devices such as the PDAs used in our experiment may serve as both one’s cell phone and a digital wallet in the future. Such devices would provide a much richer user interface (e.g., a touch screen, a microphone, and a speaker) for using and managing identity information than RFID-tagged identity cards such as one’s driver’s license and credit cards.

When participants entered their identity information, they may have entered false information, but we asked them to treat the identity information as though it were real during the experiment. No matter what participants entered, the driver’s license numbers and the first 12 digits of the credit card numbers that they provided were replaced with asterisks (∗) during processing, for the participants’ security. (The participants did not know that we replaced the information.) Each participant removed all of his or her information from the PDA before the end of the experiment. To protect participants’ identity information during our experiment, we recorded only whether they provided a certain piece of information, rather than recording the actual information. In addition, the lab was configured in such a way that wireless communication was encrypted, and no PDA or computer was connected to the Internet or to any other computer that was not used for this study. (Participants did not know that we disconnected the PDAs and computers from the Internet.)

Fig. 7. InfoSource screen shots. (a) A participant enters identity information. (b) More detailed information about a CD and a 30 s sample of the music are accessible from the PDA.

Fig. 8. (a) Checkout process without the RationalExposure model. (b) The RationalExposure model provides suggestions to users.

During the shopping simulation, participants went to the two shelves where CDs were displayed. They were asked to look at CDs as if they were shopping in a store. All participants acquired detailed information about multiple CDs via the PDAs. They read additional information about CDs on the PDAs, as shown in Fig. 7(b), and listened to sample songs. Some participants were so interested in interacting via the PDAs that they browsed through every CD.

After participants selected CDs, they went through the checkout process at a checkout desk, which had a PC to interact with their PDAs. They used the PDAs to provide their credit card numbers and other information. The interactions between their PDAs and our server were over a wireless network. We asked for the following information: credit card information to pay for the CDs, a phone number, and driver’s license information to verify the buyer’s name on the credit card. In addition, they were offered VIP memberships, which required them to provide additional identity information.

Participants needed to provide their credit card numbers to check out or, alternatively, could quit the checkout process. Similarly, a driver’s license was mandatory to finish the transaction. Participants could finish the checkout process without giving their phone numbers or becoming VIP members. Most of the identity elements requested for the VIP membership could be acquired from one’s driver’s license automatically. If participants wanted to become a VIP member but not send their information from the PDA, they could manually edit the fields.

We let participants send the information via the PDAs. This is analogous to letting a store read a customer’s information from his or her driver’s license with an RFID tag. Forty-two participants used the software without the RationalExposure model. They were asked for identity information and needed to make their own decisions, as shown in Fig. 8(a).

Fifty-five participants used InfoSource software with our RationalExposure model. The RationalExposure model made identity exposure suggestions when phone numbers and driver’s license information were requested. Then, users made final decisions, as shown in Fig. 8(b). Behind the scenes, the RationalExposure model analyzed the identity exposure game trees [19]. Via the PDAs, the RationalExposure model suggested that users not provide their phone numbers, and that they provide only their names when driver’s licenses were requested. The server was preprogrammed to complete the transaction when phone numbers were not provided or when only names were provided.

Last, participants filled out a survey. We asked them to evaluate their experience of using the InfoSource software and to indicate (a) the importance of keeping eight identity elements private (e.g., zip code, home address, and credit card number), (b) their identity exposure and privacy concerns (6 questions), and (c) the actions that they took to protect their information privacy (10 questions). To keep the experiment session reasonable (within 30 min), we selected a subset of the questions used in the first stage survey; our selection process is described in detail in Section 3.3.2. After we had collected all of the data for this experiment, we sent the participants a ‘‘debriefing’’ email, indicating that we had tested whether or not they would provide personal information to us during the shopping experience.


Table 1
Participants’ behaviors by experimental condition.

                        Without the RationalExposure model   With the RationalExposure model
Number of participants  42                                   55
Phone number            36 (86%) provided                    21 (38%) provided
Name verification       37 (88%) provided                    49 (89%) provided
VIP membership          19 (45%) applied                     24 (44%) applied

3.3.1. Participants’ identity exposure behaviors (issues 5 and 7)

Among the 42 participants who needed to make their own decisions, 36 (86%) provided their phone numbers, as shown in Table 1. Note that providing the phone number was not mandatory to finish the transaction. Of the 55 participants who used the RationalExposure model, only 21 (38%) provided their phone numbers. This was very close to the percentage (35%) of the participants who rated phone numbers as not important or somewhat important to protect in the stage one survey. Thus, the RationalExposure model’s suggestions seem to help participants make decisions that better match their attitudes toward privacy and security.

Driver’s license information was required to finish the shopping transaction. During the checkout process, participants were shown (with a message) that their driver’s licenses were used to verify their names. Table 1 shows that 37 of the 42 participants who did not use the RationalExposure model provided their full driver’s license information by clicking a ‘‘Yes’’ button. The other 5 participants stopped the transactions. Four of them explained that it was not safe to give their digital driver’s licenses. One person reported not having a driver’s license. Thus, 88% of the participants gave their full driver’s license information, which included unique information such as their driver’s license numbers, when only name information was required.

For the participants in the RationalExposure condition, the software automatically negotiated with the checkout server. First, participants saw a message that their driver’s licenses were being requested. Then, the system started negotiations with the server. Last, participants were prompted that only their names were required. Six participants did not provide their names, whereas 49 (89%) did so. Note that in the RationalExposure condition, essentially the same percentage of participants concluded the checkout process, but by providing only their names rather than their full driver’s license information. Thus, the RationalExposure model’s suggestions protected participants by encouraging them to expose minimal identity information. Interested readers may refer to our paper on rational exposure, negotiations, and best outcomes [19].
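The negotiation can be pictured with a toy exchange like the following. This is purely our illustrative sketch (the field names, the `verify_name` purpose label, and the `negotiate` helper are hypothetical); the actual protocol and its game-tree analysis are described in [19].

```python
# Hypothetical sketch of minimal-exposure negotiation: when a composite
# credential (a driver's license) is requested, counter with the smallest
# subset of its fields that still satisfies the server's stated purpose.
# Names and message shapes here are ours, not the InfoSource implementation.

LICENSE_FIELDS = {"name", "license_number", "home_address", "date_of_birth"}

def server_minimum(purpose):
    # The server's true requirement per stated purpose (assumed mapping).
    return {"verify_name": {"name"}}.get(purpose, LICENSE_FIELDS)

def negotiate(requested, purpose):
    required = server_minimum(purpose)
    if required < requested:  # proper subset: offer to expose less
        return required
    return requested

offer = negotiate(LICENSE_FIELDS, "verify_name")
print(sorted(offer))  # → ['name']
```

In the experiment, the user still made the final decision; the negotiation only reduced what the server asked for before that decision was presented.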

After the checkout process, all participants were notified by the checkout server that they could become VIP members, which would always give them the best price and high-tech shopping carts. All participants made their own decisions; we did not provide rational suggestions. Overall, 43 participants (44%) chose to provide their information for better prices or services. Both groups had similar percentages of participants choosing to join, as shown in Table 1. To obtain VIP status, all 43 participants provided their monthly income, email address, home address, and date of birth. Their PDAs automatically read the home address and date of birth from their driver’s licenses. Seven of those who wanted to acquire VIP cards clicked the edit button to modify the information before they sent it. (Four of the seven participants were in the RationalExposure condition and three were in the control condition.) Thus, only 7% (7/97) of the participants were unwilling to provide the additional private information required while still indicating a desire to acquire VIP cards. Recall that all participants made decisions without RationalExposure suggestions at this point. It does not seem that participants who had prior experience with the RationalExposure suggestions were more prudent with their private information.

To further understand why participants provided or did not provide their information to become VIP members, participants could be asked about their decisions during a debriefing. In ongoing research, we are using interviews to learn why participants are willing to expose their identity information and the contexts in which they will provide that information. This part of the experiment could be improved by offering a VIP-type benefit outside the context of the study, to examine in situ exposure behaviors, because the VIP membership we offered might seem unrealistic and participants would know that they would not receive the benefit.

3.3.2. The relationship among attitudes, concerns, claimed actions, actual behaviors, and our RationalExposure model

We used logistic regression to analyze the relationship among participants’ attitudes, their privacy concerns, their claimed actions to protect their privacy, and their actual behaviors with and without rational suggestions, as shown in Eq. (1). Because the exposure of driver’s license information differed between the two groups (one group needed to expose the full driver’s license information, whereas the other group needed to expose only names), in the discussion below we focus only on the exposure of phone numbers, which differed as a function of experimental condition (as shown in Table 1).

Phone no. exposure = β0 + β1x1 + β2x2 + β3x3 + β4x4 (1)

where x1 = ‘‘Used RationalExposure model’’ (dummy coded with no RationalExposure = 0).

x2 = ‘‘Attitudes’’.
x3 = ‘‘Concerns’’.
x4 = ‘‘Claimed actions’’.
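To make the fitted model concrete, the sketch below plugs the coefficients reported in Table 2 into the logistic (inverse-logit) function to obtain a predicted probability of exposure; logistic regression models the log-odds of the outcome, so Eq. (1)’s linear predictor is passed through this function. The predictor values below are hypothetical; only the coefficients come from the paper.

```python
# Predicted probability of phone-number exposure, using the coefficients in
# Table 2. Predictor values are made up for illustration (attitudes and
# concerns on a 1-4 scale, claimed actions on a 1-5 scale).
import math

def exposure_probability(used_model, attitude, concern, claimed_actions):
    b0, b1, b2, b3, b4 = 7.07924, -2.35530, -1.02465, -0.0171817, -0.397237
    z = b0 + b1 * used_model + b2 * attitude + b3 * concern + b4 * claimed_actions
    return 1 / (1 + math.exp(-z))  # logistic (inverse-logit) function

p_control = exposure_probability(0, 2.5, 2.5, 3.0)   # without RationalExposure
p_rational = exposure_probability(1, 2.5, 2.5, 3.0)  # with RationalExposure
print(round(p_control, 2), round(p_rational, 2))     # → 0.96 0.72
```

Consistent with Table 1, the same hypothetical participant is predicted to be far more likely to expose a phone number without the RationalExposure suggestions than with them.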


Table 2
Logistic regression results for the first model (Eq. (1)).

Predictor                    Coef        Z      P      Odds ratio
Constant                     7.07924     2.96   0.003
Used RationalExposure model  −2.35530    −4.21  0.000  0.09
Attitudes                    −1.02465    −2.61  0.009  0.36
Concerns                     −0.0171817  −0.05  0.959  0.98
Claimed actions              −0.397237   −0.97  0.330  0.67

Table 3
Logistic regression results for the second model (Eq. (2)).

Predictor                    Coef      Z      P      Odds ratio  95% CI lower
Constant                     5.79306   3.53   0.000
Used RationalExposure model  −2.43147  −4.38  0.000  0.09        0.03
Attitudes                    −1.00426  −2.63  0.008  0.37        0.17

Goodness-of-fit tests

Method           Chi-square  DF  P
Pearson          1.74945     5   0.883
Deviance         2.23248     5   0.816
Hosmer–Lemeshow  0.64930     4   0.957

Measures of association (between the response variable and predicted probabilities)

Pairs       Number  Percent  Summary measures
Concordant  1662    72.9     Somer’s D              0.62
Discordant  250     11.0     Goodman–Kruskal gamma  0.74
Ties        368     16.1     Kendall’s Tau-a        0.30
Total       2280    100.0

The statistical results in Table 2 indicate that the coefficients for participants’ attitudes and for whether they used the RationalExposure model are nonzero and statistically significant. Thus, participants’ attitudes toward privacy, coupled with the experimental manipulation, predicted their disclosure behaviors. However, participants’ concerns and their claimed actions did not seem to be related to their actual behaviors.

We subsequently tested a model that included only the significant predictors obtained in the first analysis—their attitudes and whether they used our rational model—as shown in Eq. (2).

Phone no. exposure = β0 + β1x1 + β2x2 (2)

where x1 = ‘‘Used RationalExposure model’’.

x2 = ‘‘Attitudes’’.

In Table 3, the negative coefficient for participants’ attitudes suggests that the more sensitive the participants considered the identity elements, the less likely they were to expose their phone numbers. Similarly, the negative coefficient for the RationalExposure model indicates that participants who were given RationalExposure suggestions were less likely to expose their phone numbers. The odds ratio (comparing the exposure behaviors in the two experimental conditions) further suggests that, given the same rating of the importance of the identity elements, those who were provided the RationalExposure model were much less likely (odds ratio = 0.09) to expose their phone numbers. In addition, the Pearson, Deviance, and Hosmer–Lemeshow goodness-of-fit tests show that the model fits the data adequately. The summary measures of association (Somers’ D, Goodman–Kruskal gamma, and Kendall’s tau-a) range from 0.30 to 0.74, indicating moderate to strong predictive ability.
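The odds ratios in Table 3 follow directly from exponentiating the logistic regression coefficients; a quick check:

```python
import math

# Coefficients of the second model (Eq. (2)), from Table 3.
coef_rational = -2.43147   # "Used RationalExposure model"
coef_attitudes = -1.00426  # "Attitudes"

# An odds ratio is the exponentiated logistic regression coefficient.
or_rational = math.exp(coef_rational)    # ~0.09
or_attitudes = math.exp(coef_attitudes)  # ~0.37
```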

To summarize, the research in stage two showed that participants in the experimental condition were advantaged by our RationalExposure model. They were less likely to provide phone number and driver’s license identity information than those in the control condition. These privacy behaviors, however, did not generalize to a context in which the RationalExposure model was not present; participants in the RationalExposure condition exposed as much identity information in the VIP context as those in the control condition. On the basis of the results of the survey that participants completed immediately following the experiment, their attitudes toward privacy, coupled with the experimental condition, predicted their exposure behaviors. Their privacy concerns and the actions that they claimed to take to protect their security were not, however, significant predictors of their actual exposure behaviors.

3.4. Stage three

Stage three was designed to investigate participants’ behaviors when multiple requests for identity information were made. Fifty-six participants attended the third stage experiment in our lab. A participant usually spent 30 min completing the experiment. The experiment setting was very similar to the setting in stage two in that participants used a PDA to assist their CD shopping.

Fig. 9. InfoSource Version 2. (a) Shopping agent. (b) Participants may play a sample music video related to a CD album.

Fig. 10. InfoSource Version 2 and the RationalExposure model. (a) RationalExposure makes an exposure suggestion. (b) RationalExposure provides an explanation of the rational action.

We made several improvements in the new version, InfoSource Version 2. First, the communication between a PDA and the InfoSource server was encrypted using AES. Second, when a participant shopped for a CD, he or she interacted with an animated shopping agent, Alice, as shown in Fig. 9(a). When the application loaded, Alice introduced herself and greeted the user. When she talked, her mouth moved. (We recorded a real female voice and played it back.) Alice guided a user through the CD shopping. If a user was interested, Alice presented the CD’s background information, its popularity, sales information, and other details. When she presented the information, related photos were displayed in a slide show with the key phrases shown on the screen (Fig. 9(b)). A user could click a ‘‘skip’’ button at any time to skip the information and resume interactions with Alice. Alice also offered sample music videos of the songs on the CD. To facilitate full-screen video mode, the InfoSource software ran in landscape orientation throughout the experiments. A user could stop a video by tapping the screen and return to the interaction with Alice. When Alice asked questions, users employed a stylus to input text.

We used the animated agent to increase participants’ attention and interest. On the other hand, we did not want to introduce extraneous factors that might affect participants’ identity exposure behaviors. In the experiments, Alice stated detailed information and facts about the CDs. We did not use persuasive approaches that an animated software agent might take.

Alice asked participants to provide four identity elements (age, gender, birthday, and zip code) within the context of a shopping experience. The requests for identity information were interleaved with the CD-related information that Alice presented. For example, after a participant watched a sample music video, Alice asked the participant’s age.

Unlike the experiment in stage two, the RationalExposure software ran as an independent application on the PDAs. The exposure suggestions were displayed in a small message box in the bottom right corner, as shown in Fig. 10(a). A user could access two levels of information provided by the software. At the first level, an exposure action was suggested (an example is shown in Fig. 10(a)). At the next level, we provided an explanation of the rational action (as shown in Fig. 10(b)). In this research, we made rational suggestions, but users made their own decisions. They could adopt the suggestions, click the skip button on the screen and not input anything, or ignore the suggestions and input the true data. After all data had been collected for this stage, participants were debriefed via an email that indicated that our main goal in the research was to determine what types of identity information they would provide and under what conditions. The differences between the experimental and control conditions were described.

3.4.1. Participants’ identity exposure behaviors when multiple identity elements were requested (issue 6)

In this phase of the research, 24 participants were in the control group and used the software without the RationalExposure model. All of them provided true information for their gender and date of birth (birthday and age), as shown in Table 4. Three participants did not provide their true zip codes. One wrote in the follow-up survey (after the experiment) that she did not know the zip code of her current residence and provided the zip code of her home town. Another explained that she accidentally skipped the step; the third participant did not provide a reason. All other participants reported providing their true zip codes. Therefore, across all four identity elements, only three participants did not provide all of the requested information.


Table 4
Participants’ exposure behaviors when multiple identity elements were requested.

                                                         Without the RationalExposure model   With the RationalExposure model
Number of participants                                   24                                   32
Age                                                      24 (100%)                            23 (72%)
Birthday                                                 24 (100%)                            18 (56%)
Gender                                                   24 (100%)                            32 (100%)
Zip code                                                 21 (88%)                             17 (53%)
Number of participants who provided all the information  21 (88%)                             14 (44%)

Thirty-two participants used the software with the RationalExposure model. When Alice asked participants’ ages, the game theoretic identity exposure software generated a pop-up window that suggested participants consider giving an approximate age. In addition, an example was given: ‘‘If you are 21, you may enter 25’’. Nevertheless, 23 participants (72%) reported giving their true ages; 5 participants (16%) said in their surveys that they falsified their ages after reading the message; and 4 participants (13%) stated that they would have falsified their ages even without the message. Gender is one of the least unique identity elements, so people might think it is therefore unimportant either to expose or to fake. All participants chose to give their true gender information. When a birthday was requested, the game theoretic model was programmed to suggest that participants give fake information, as shown in Fig. 10(a). Eighteen participants (56%) gave their true birthdays; 9 participants (28%) gave fake birthdays after they saw the suggestion; and 5 participants (16%) said that they would have falsified their birthdays on their own. Similarly, for zip codes, the software suggested considering a fake yet plausible zip code. Seventeen participants (53%) gave their own zip codes, 12 participants (38%) gave fake zip codes, and 3 participants (9%) reported that they would have given fake data on their own. In summary, 14 participants (44%) still provided their true identity information for all four identity elements.

We ran two-proportion tests (left-tailed, with α = 0.05) to compare whether the participants in the RationalExposure condition were less likely to provide their true information than those in the control group. For age, birthday, and zip code, the tests were all statistically significant; the p-values of the three tests were all less than 0.004.
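The two-proportion tests can be reproduced from the counts in Table 4. The sketch below uses a pooled-variance z approximation with the standard normal CDF built from `math.erf`; the exact procedure in a statistical package may differ slightly, but the conclusion (all p-values below 0.004) agrees:

```python
import math

def two_prop_left_p(x1, n1, x2, n2):
    """Left-tailed two-proportion z-test (H1: p1 < p2), pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

# Counts from Table 4: RationalExposure group (n=32) vs. control (n=24).
p_age = two_prop_left_p(23, 32, 24, 24)
p_birthday = two_prop_left_p(18, 32, 24, 24)
p_zip = two_prop_left_p(17, 32, 21, 24)
```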

Overall, the RationalExposure suggestions helped users to expose less identity information. Participants were less likely to disclose their age, birth date, and zip code in the context of the RationalExposure suggestions. Even in the RationalExposure condition, however, all participants revealed their gender. As indicated in Fig. 1, gender information is not considered, by most people, to be important information to protect. This finding, therefore, indicates one of the boundary conditions of the utility of the RationalExposure model.

4. Limitations and lessons learned

Our survey and experimental results reflect a specific group – college students – whose ages primarily ranged from 17 to 23 years. Perhaps this group is representative of those who will use mobile and pervasive computing devices more often than other groups. However, some of the survey results will not generalize broadly (e.g., their attitudes toward marital status, employment status, monthly income, and monthly expenses). In addition, younger people might have fewer concerns about keeping health conditions private than older adults. They may also have less experience with their information being collected and used (e.g., profiling and identity theft). This younger age group, however, will face these issues in the very near future. It would be ideal to help such people make rational decisions before they make costly mistakes by sharing identity information. To understand the larger population, we are studying identity exposure attitudes, concerns, and behaviors in a more diverse group of participants – from teenagers to seniors – and also expanding the types of identity elements that we study.

Participants had diverse backgrounds, and some of them might not have understood technical terms such as price discrimination, IP addresses, and cookies. Thus, our survey results might be less accurate for these particular attitudes and behaviors. In our future studies, we may need to better explain terms in our surveys and experiments. An alternative approach to improving our surveys would be to ask participants to define (or choose from a set of definitions for) technical terms before we ask them about their related attitudes and behaviors.

Our research was approved by the university IRB, and we provided an informed consent form to inform participants about potential risks. Because we promised no risk to participants, those who read the consent form carefully would reasonably have felt ‘‘safer’’ disclosing private information in this study than they might to a retailer. This feature of laboratory experiments may complicate the interpretation of the results and restrict their generality.

It is challenging to design surveys to study people’s privacy attitudes, concerns, and claimed privacy protection actions. When privacy issues are salient, people might become more cautious, and thus their answers might be biased toward greater security. To reduce this potential problem in our research, we mixed our target questions with other questions, such as those about participants’ music preferences. This same strategy may also be used by service providers to elicit more sensitive identity elements from users. Another challenge in survey research is that participants may interpret the questions differently or may consider the privacy questions in different contexts. For example, most people are likely to provide information about their health conditions to doctors or family members, yet they may not want to inform strangers. Participants may also become fatigued or reluctant to spend a long time answering an extensive number of survey questions,


such that their answers may be of lower quality (e.g., less accurate) toward the later part of the surveys. For this reason, we did not provide detailed contexts for each of the questions, because this would increase the length of the survey. We are, however, currently exploring disclosure in various contexts (e.g., to whom and under what conditions do people disclose?) in our ongoing work.

Although we may acquire users’ attitudes toward keeping identity elements private via questionnaires, we may not be able to acquire information about all of their attitudes, because there are too many combinations of multiple identity elements to study. If a service provider wants to acquire a sensitive identity element from a user, it might request several identity elements that are less sensitive and use the combination of those identity elements to achieve the same purpose, as we discussed in Section 3.4.

The experiments might be further improved in their degree of realism, or by occurring in typical pervasive computing environments. We could implement the software using RFID tags to augment the CDs and RFID readers attached via the SD slot on our PDAs. Users could therefore have a better CD shopping experience through automated information exchange between a CD, a PDA, and a backend server. There was, however, only one SD slot on each of the PDAs that we used. Although we implemented the interaction using RFID readers for PDAs, we elected to use the slot for an SD memory card for temporary audio and video storage rather than for the RFID readers and tags.

In our RationalExposure approach, we suggested that users employ the strategy of exposing general identity elements rather than specific identity elements. Often, this may be a good, conservative approach to protecting identity. A general identity element, however, may still be sensitive enough to keep private (e.g., being a patient of a drug rehabilitation clinic). In these cases, our future software should allow users to specify identity elements that may not be used in ordinary negotiations utilizing the RationalExposure model.

5. Conclusion and future work

The goal of this research was to address seven issues related to identity exposure attitudes, concerns, claimed actions, and actual exposure behaviors. The research tested the hypothesis that the number of interactions with a service provider would be correlated with the sensitivity of the identity information exposed to that provider. That is, knowledge of a service provider was expected to predict participants’ level of exposure. In addition, the research tested the effectiveness of the RationalExposure model in helping people minimize the amount of identity information that they expose.

Participants indicated that their addresses, driver’s licenses, credit card numbers, and social security numbers were the most important identity elements to keep private. They also indicated that they had the strongest concerns about their computers being infiltrated, identity theft, and having their private information collected, transferred, or stolen. The most frequently claimed security actions were blocking pop-ups, checking their credit card statements, and using anti-virus software. When asked to provide identity information, they varied in their responses, including disclosure, faking, and refusal. Participants who indicated previous interactions with particular service providers (e.g., Amazon.com) were also more likely to indicate that they would disclose personal information to those providers.

An interesting finding that can be seen in one of the dendrograms (Fig. 2) of our study is that participants considered some identity elements to be of similar importance to keep private. For example, survey participants rated their phone numbers and home addresses very similarly. Thus, if a user believes that her home address is important to keep private, she might also be likely to keep other unique information, such as her phone number, private. These patterns should be re-evaluated with a different sample of participants, in a conceptual replication, to test the generality of our results.

In the two experimental stages of the research, participants were less likely to reveal identifying information when aided by the RationalExposure model suggestions. An important finding of this research is that although participants followed the suggestions provided by the RationalExposure model for disclosure of phone numbers and of full driver’s license information, they did not learn or choose to be more prudent in their disclosures when suggestions were not provided (as in the VIP privacy disclosures). If future research replicates this result, we could speculate that people either need more practice in negotiating which identity information to expose or need explicit instructions from the RationalExposure model when they are engaging in exposure behaviors. Our data do provide evidence that participants’ exposure behaviors were not related to their privacy concerns or to their claimed security actions. In addition, our experimental data indicate that participants were willing to disclose multiple, less sensitive items of identity information that, unfortunately, could be used in combination to uniquely identify individuals.

Another observation is that we may calculate the approximate sensitivity of identity elements from some basic statistics. The calculation may be useful in determining the sensitivity of a combination of multiple identity elements. If we divide the population of the United States (300 million people) by 40,000 zip codes, 365 days a year, 80 possible ages, and 2 gender types, the result is less than 0.13; on average, fewer than one person shares each combination, so the combination can uniquely identify a person. Therefore, if a user wants to keep her home address (unique identity information) private, she may also want to keep the combination of her zip code, date of birth, and gender private. Sweeney’s research showed multiple examples of how a few seemingly anonymous identity elements can be used to uniquely identify individuals [51]. Thus, when a service provider cumulatively asks for multiple identity elements, our RationalExposure software may notify users that the combination of the identity elements will uniquely identify the user to the service provider.
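The back-of-the-envelope uniqueness calculation above can be checked directly:

```python
# Approximate US population divided by the number of
# (zip code, birthday, age, gender) combinations.
population = 300_000_000
combinations = 40_000 * 365 * 80 * 2  # 2.336 billion combinations

people_per_combination = population / combinations  # well below 1
```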

We are working on several aspects to improve the suggestions that our RationalExposure model provides. We want to increase the suggestion adoption rate, especially when the requested identity elements are not perceived by users to be unique or sensitive. First, we are using psychological approaches, grounded in empirical research on persuasion, to identify strategies that can help users to be more prudent in their identity exposure. We are evaluating the effectiveness of a variety of types of messages and alerts in our RationalExposure model. Second, we are collecting facts and data about the potential risks of identity exposure, so that we can provide users with more detailed information about the possible consequences of their disclosure, to better protect their privacy and security interests.

More research needs to be done on these topics. As technology advances, it will be essential to understand users’ attitudes and behaviors related to the privacy and security of their personal information in pervasive computing environments. Studying this ‘‘human factor’’ from a psychological perspective will help to provide software developers with the information they need to develop effective security systems for privacy protection.

References

[1] R. Campbell, et al., Towards security and privacy for pervasive computing, in: International Symposium on Software Security, Tokyo, Japan, 2002.
[2] I. Altman, The Environment and Social Behavior: Privacy, Personal Space, Territory, and Crowding, Brooks/Cole Publishing Company, Monterey, California, 1975.
[3] L. Palen, P. Dourish, Unpacking ‘‘privacy’’ for a networked world, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Ft. Lauderdale, Florida, 2003.
[4] D.J. Solove, The Digital Person: Technology and Privacy in the Information Age, New York University Press, 2004.
[5] L. Sweeney, k-anonymity: a model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (2002) 557–570.
[6] D.J. Solove, M. Rotenberg, P.M. Schwartz, Privacy, Information, and Technology, Aspen Publishers, 2006.
[7] D. Lyon, in: T. May (Ed.), Surveillance Society: Monitoring Everyday Life, in: Issues in Society, Open University Press, 2001.
[8] L. Sweeney, Uniqueness of simple demographics in the US population, Carnegie Mellon University, Laboratory for International Data Privacy, Pittsburgh, 2000.
[9] M.S. Ackerman, L.F. Cranor, J. Reagle, Privacy in e-commerce: examining user scenarios and privacy preferences, in: Proceedings of the 1st ACM Conference on Electronic Commerce, Denver, Colorado, 1999.
[10] D.H. Nguyen, A. Kobsa, G.R. Hayes, An empirical investigation of concerns of everyday tracking and recording technologies, in: Proceedings of the 10th International Conference on Ubiquitous Computing, Seoul, Korea, 2008.
[11] S. Spiekermann, J. Grossklags, B. Berendt, E-privacy in 2nd generation e-commerce: privacy preferences versus actual behavior, in: Proceedings of the 3rd ACM Conference on Electronic Commerce, Tampa, Florida, 2001.
[12] L. Cranor, et al., The platform for privacy preferences 1.1 (P3P1.1) specification, W3C, November 2006.
[13] M. Langheinrich, Privacy by design—principles of privacy-aware ubiquitous systems, in: Ubicomp 2001 Proceedings, 1st ed., in: Lecture Notes in Computer Science, vol. 2201, 2001, pp. 273–291.
[14] E. Snekkenes, Concepts for personal location privacy policies, in: 3rd ACM Conference on Electronic Commerce, Tampa, Florida, USA, 2001.
[15] U. Leonhardt, J. Magee, Security considerations for a distributed location service, Journal of Network and Systems Management 6 (1998) 51–70.
[16] M. Langheinrich, A privacy awareness system for ubiquitous computing environments, in: 4th International Conference on Ubiquitous Computing, Göteborg, Sweden, 2002.
[17] J. Hong, J. Landay, An architecture for privacy-sensitive ubiquitous computing, in: 2nd International Conference on Mobile Systems, Applications, and Services, Boston, MA, 2004.
[18] A. Soppera, T. Burbridge, Maintaining privacy in pervasive computing—enabling acceptance of sensor-based services, BT Technology Journal 22 (2004) 106–118.
[19] F. Zhu, W. Zhu, RationalExposure: a game theoretic approach to optimize identity exposure in pervasive computing environments, in: IEEE Annual Conference on Pervasive Computing and Communications, PerCom 2009, Galveston, TX, 2009.
[20] G. Marx, Identity and anonymity: some conceptual distinctions and issues for research, in: J. Caplan, J.C. Torpey (Eds.), Documenting Individual Identity: The Development of State Practices in the Modern World, 2001.
[21] K.J. Bannan, Cell phone payment system options multiply, creditcards.com, May 24, 2010.
[22] A. Westin, 1994 Equifax/Harris consumer privacy survey, 1994. Available: www.cis.gsu.edu/~dstraub/CIS8680/.../equifax_executive_summary.doc.
[23] P. Kumaraguru, L.F. Cranor, Privacy Indexes: A Survey of Westin’s Studies, Carnegie Mellon University, 2005.
[24] J.S. Olson, J. Grudin, E. Horvitz, A study of preferences for sharing and privacy, in: 2005 Conference on Human Factors in Computing Systems, CHI 2005, Portland, Oregon, 2005.
[25] D.H. Nguyen, et al., Encountering SenseCam: personal recording technologies in everyday life, in: Proceedings of the 11th International Conference on Ubiquitous Computing, Orlando, Florida, 2009.
[26] B. Berendt, O. Günther, S. Spiekermann, Privacy in e-commerce: stated preferences vs. actual behavior, Communications of the ACM 48 (2005).
[27] A. Acquisti, J. Grossklags, Privacy and rationality in individual decision making, IEEE Security and Privacy (2005) 26–33.
[28] S. Consolvo, et al., Location disclosure to social relations: why, when, & what people want to share, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2005.
[29] R.M. Kramer, Trust and distrust in organizations: emerging perspectives, enduring questions, Annual Review of Psychology 50 (1999) 569–598.
[30] F.N. Egger, Affective design of e-commerce user interfaces: how to maximize perceived trustworthiness, in: International Conference on Affective Human Factors Design, London, 2001.
[31] T.A. Hemphill, Electronic commerce and consumer privacy: establishing online trust in the US digital economy, Business and Society Review 107 (2002) 221–239.
[32] D.H. McKnight, V. Choudhury, C. Kacmar, Developing and validating trust measures for e-commerce: an integrative typology, Information Systems Research 13 (2002) 334–359.
[33] R. Boyle, P. Bonacich, The development of trust and mistrust in mixed-motive games, Sociometry 33 (1970) 123–139.
[34] S. Petronio, W.T. Durham, Communication privacy management theory, in: L.A. Baxter, D.O. Braithwaite (Eds.), Engaging Theories in Interpersonal Communication: Multiple Perspectives, Sage, 2008.
[35] M.J. Metzger, Communication privacy management in electronic commerce, Journal of Computer-Mediated Communication 12 (2007) 1–27.
[36] S. Lederer, et al., Personal privacy through understanding and action: five pitfalls for designers, Personal and Ubiquitous Computing 8 (2004) 440–454.
[37] G. Iachello, J. Hong, End-user privacy in human–computer interaction, Foundations and Trends in Human–Computer Interaction 1 (2007).
[38] L.F. Cranor, J. Reagle (Eds.), Designing a Social Protocol: Lessons Learned from the Platform for Privacy Preferences Project (Telephony, the Internet, and the Media), Lawrence Erlbaum Associates, Mahwah, 1998.
[39] A. Brandt, Privacy watch: a little bird that guards your online privacy, PC World, 2002.
[40] M. Osborne, An Introduction to Game Theory, Oxford University Press, New York, 2004.
[41] United States to require RFID chips in passports, PC World, October 26, 2005.
[42] B. Logan, et al., A long-term evaluation of sensing modalities for activity recognition, in: Proceedings of the 9th International Conference on Ubiquitous Computing, Innsbruck, Austria, 2007.
[43] K. Kunze, P. Lukowicz, Dealing with sensor displacement in motion-based on-body activity recognition systems, in: Tenth International Conference on Ubiquitous Computing, Seoul, Korea, 2008.
[44] M. Hazas, J. Scott, J. Krumm, Location-aware computing comes of age, IEEE Computer 37 (2004).
[45] D.H. Kim, et al., Discovering semantically meaningful places from pervasive RF-beacons, in: 11th International Conference on Ubiquitous Computing, Orlando, Florida, 2009.
[46] E. Toch, et al., Empirical models of privacy in location sharing, in: 12th ACM International Conference on Ubiquitous Computing, Copenhagen, Denmark, 2010.
[47] Minitab statistical software, Release 15 for Windows.
[48] C. Jensen, C. Potts, C. Jensen, Privacy practices of internet users: self-reports versus observed behavior, International Journal of Human–Computer Studies 63 (2005) 203–227.
[49] C. Jensen, C. Potts, Privacy policies as decision-making tools: an evaluation of online privacy notices, in: Conference on Human Factors in Computing Systems, CHI 2004, Vienna, Austria, 2004.
[50] R. Anderson, Security Engineering: A Guide to Building Dependable Distributed Systems, 2nd ed., Wiley, 2008.
[51] L. Sweeney, Computational disclosure control, Ph.D. dissertation, Laboratory for Computer Science, Massachusetts Institute of Technology, 2001.