
THE UNIVERSITY OF CHICAGO

RETROSPECTIVE FILE MANAGEMENT FOR PRIVACY AND SECURITY IN

CLOUD STORAGE SERVICES

A DISSERTATION SUBMITTED TO

THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES

IN CANDIDACY FOR THE DEGREE OF

MASTER’S DEGREE

DEPARTMENT OF COMPUTER SCIENCE

BY

MARIA HYUN

CHICAGO, ILLINOIS

NOVEMBER 2017

Copyright © 2017 by Maria Hyun

All Rights Reserved

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
ABSTRACT

1 INTRODUCTION
  1.1 Motivations
  1.2 Our Work
  1.3 Findings

2 RELATED WORK
  2.1 Cloud Storage
  2.2 Privacy and Security Concerns for Cloud Storage
  2.3 Retrospective Privacy of Social Media
  2.4 User Conceptualization of File Sharing
  2.5 Personal Information Management

3 METHODOLOGY
  3.1 Cloud Storage Services
  3.2 Data Collection and Ethics
  3.3 Recruitment and Inclusion Criteria
  3.4 File Selection
  3.5 Survey Structure
    3.5.1 Generic Questions
    3.5.2 File-Specific Questions
    3.5.3 Features and Demographics

4 DATA ANALYSIS
  4.1 Aggregation and Basic Statistics
  4.2 Qualitative Coding
  4.3 Regression Model

5 RESULTS
  5.1 Participant Demographics and Account Usage
  5.2 Account Archeology
  5.3 File Recognition
  5.4 File Management
  5.5 File Sharing
  5.6 File Co-ownership
  5.7 File Automation

6 DISCUSSION AND LIMITATIONS
  6.1 Discussion
  6.2 Limitations

7 CONCLUSION AND FUTURE WORK
  7.1 Conclusion
  7.2 Future Work

REFERENCES

A SURVEY INSTRUMENT
  A.1 General question
  A.2 Content specific question
  A.3 Features and Demographics

LIST OF FIGURES

3.1 An overview of the survey procedures from the perspective of a participant
3.2 Screenshot of file-specific questions
5.1 Number of files based on creation date
5.2 Number of files based on last modification date
5.3 Comparison of file ownership and remembrance
5.4 File recollection and management decisions
5.5 Comparison of file deletion and file ownership levels
5.6 Comparison of file encryption and participant technical background
5.7 Future access and file management decision
5.8 Participant management decisions for additional copies
5.9 The effect of security perception on file management decisions
5.10 Ability to access the files and file management decisions
5.11 Participant preferences for sharing decisions
5.12 Sharing type and sharing decisions
5.13 Original shared status and sharing decision
5.14 Cloud storage and file co-ownership
5.15 The effect of file ownership on co-ownership
5.16 Original shared status and file versioning
5.17 Sharing method and file versioning
5.18 Comparison of auto-archiving and delay tolerance

LIST OF TABLES

3.1 Categories for selecting files in our stratified sample
5.1 Participant demographics
5.2 Descriptive statistics of participant accounts
5.3 Factors correlated with file recognition
5.4 Factors correlated with file remembrance
5.5 Factors correlated with preferences for file deletion
5.6 Factors correlated with preferences for file encryption
5.7 Factors correlated with wanting to stop sharing

ACKNOWLEDGMENTS

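Do not be anxious about anything, but in everything, by prayer and petition,

with thanksgiving, present your requests to God.

And the peace of God, which transcends all understanding,

will guard your hearts and your minds in Christ Jesus.

(Philippians 4:6-7)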

First, I would like to express my sincere gratitude to my advisor, Prof. Ur, for his continuous support of my study and for his patience, motivation, and immense knowledge. His guidance helped me in researching and writing this thesis. I could not have imagined having a better advisor and mentor for my MS thesis.

Besides my advisor, I would like to thank Prof. Kanich for his insightful comments and encouragement, and also for the hard questions that pushed me to widen my research and account for various perspectives. My sincere thanks also go to Taha; it was always my pleasure to work with you. I also thank Miranda, who helped me organize and edit my thesis.

Last but not least, I would like to thank my family: my parents, my sister, Young-Eun, and Modu, Dalbi, and Bana, for supporting me spiritually throughout the writing of this thesis and in my life in general.


ABSTRACT

Users have accumulated years of personal data in cloud storage, creating potential privacy and security risks. This agglomeration includes files retained or shared with others simply out of momentum rather than intention. We presented 100 online survey participants with a stratified sample of 10 files currently stored in their own Dropbox or Google Drive accounts. We asked about the origin of each file, whether the participant remembered that the file was stored there, and, when applicable, about the file’s sharing status. We also recorded participants’ preferences moving forward for keeping, deleting, or encrypting those files, as well as for adjusting sharing settings. Participants had forgotten that roughly half of the files they saw were in the cloud: they recalled that 50.6% of the files were stored in the cloud, did not recognize 13.5% at all, and recognized the remaining 35.9% but had forgotten that those files were stored in the cloud. Out of the ten files we asked about, the median number that a participant remembered storing in their cloud account was five. Overall, 83% of participants wanted to delete at least one file they saw, while 13% wanted to unshare at least one file. In addition, 81% of participants responded that it was important to keep at least one of the ten presented files safe from unauthorized access, yet they had forgotten that the file was stored in the cloud. Our combined results suggest directions for retrospective cloud data management.


CHAPTER 1

INTRODUCTION

1.1 Motivations

As cloud platforms for storage and backup have matured, many users have implicitly become

long-term users of these platforms. These users have years of their personal data stored in the

cloud, yet they have likely forgotten about the existence of most of this data. This state of

affairs has two troubling consequences. First, the agglomeration of a user’s personal data in

one location presents attackers with a very attractive single target. If an attacker successfully

impersonates the user (e.g., by guessing his or her password), the attacker can potentially

access all of the user’s data. Second, maintaining this large amount of data such that all of

it is accessible to the user on a moment’s notice is a tremendous waste of resources.

However, many of these concerns could be mitigated if users had an active role in managing their data and better understood which files were stored in their cloud. Although researchers have analyzed user perceptions and system limitations, there has been little research from a user-centered perspective about what data users have stored in the cloud and forgotten about, as well as what they would like to do with that data. We take the first steps toward filling that gap. We investigated cloud storage usage, including why participants originally stored files in the cloud, to determine optimal file management decisions.

1.2 Our Work

This thesis focuses on our user study to characterize the data participants have stored in

their cloud accounts. We also investigated three types of remediations for retrospective data

management: deleting old data, automatically encrypting old data, and moving old data to

low-energy archives. In our study, participants were first given a series of file management


options and instructed to identify their preferred outcome for each file and the relevant

parameters (i.e., service, file type, access permission, size of file).

We conducted a 100-participant online survey using Amazon’s Mechanical Turk. To ground this survey concretely in each participant’s own stored data, we focused the survey questions on ten files selected as a stratified sample from the participant’s own Dropbox or Google Drive account. We used the APIs for Dropbox and Google Drive to show participants these files and to characterize their accounts more broadly.

Our survey consisted of three parts. The first part was composed of generic questions related to account information, such as account age and the main reason for using cloud storage. Second, we asked questions related to the ten different files selected from the user’s account. We investigated whether participants knew what the file was and whether they remembered that it was stored in the cloud, and we gauged whether they wanted to keep the file as-is, delete it, or encrypt it. If the file was shared with other users, either by name or via a shared link, we also asked about the origin of this sharing, as well as whether sharing the file was still desired. Finally, we asked about user demographics and general preferences related to the possibility of automated retrospective file management.

1.3 Findings

Our participants used either Google Drive or Dropbox for storing and sharing a nontrivial

number of files, and they had varied goals in using these services. The median number of

files stored in each participant’s cloud was 444.5. 71% used cloud storage for collaboration,

83% for sharing, 92% for archival purposes, and 5% for other reasons.

Overall, we found that the cloud storage accounts of these participants contained a mass

of data that was indeed forgotten, but not gone. Participants recalled that 50.6% of the files

they saw in the study were stored in the cloud. Participants did not recognize 13.5% of the

files they saw. Participants recognized the remaining 35.9%, but had forgotten that the file


was stored in the cloud. Moreover, out of the ten files we asked about, the median number

of files a participant remembered storing in their cloud account was five. The likelihood a

participant remembered a file was stored in the cloud varied significantly based on a number

of other factors, including file type, the participant’s access to the file (owner, editor, viewer),

file size, and when the file was last modified.

Participants’ responses to our questions about managing files in their cloud storage and

the sharing settings of those files revealed a latent need for retrospective data management.

83% of participants wanted to delete at least one file of the ten presented, and 13% wanted

to unshare at least one previously shared file.

Our study is the first to focus on cloud-user needs for retrospective file management by

grounding questions in a sample of the files stored in participants’ own cloud storage accounts.

81% of participants responded that it was important to keep at least one of the ten presented

files safe from unauthorized access, yet they had forgotten that file was stored in the cloud.

Such latent risks are exactly those that users have difficulty effectively understanding or

managing.

Moreover, using mixed-effects logistic regression, we investigated possible predictive factors for these file management preferences. Beyond a small number of factors, like the participant’s access to the file, these models did not capture much of the rationale underlying the decisions of participants. However, our study is the first step toward designing interfaces and mechanisms for enabling retrospective file management in the cloud. Further research into both understanding user perceptions of these archives and new methods of effectively managing them can empower users to better deal with privacy and security threats.


CHAPTER 2

RELATED WORK

Here we summarize the history of cloud storage and associated privacy and security concerns

that have emerged. We then describe work that has been done to improve retrospective

privacy in social media and personal information management in email and other archives.

2.1 Cloud Storage

The advent of cloud storage was driven by increasing amounts of data and decreasing costs for storage. Cloud storage allows “ubiquitous, convenient, on-demand network access” to its users at a low cost [42]. Moreover, cloud storage provides broad network access by supporting both thick and thin client platforms [42]. Data availability is also ensured because cloud storage companies protect against failures [20,36]. As a result, cloud storage has gained significant popularity. Consumer cloud services have developed primarily over the last decade. Box announced online file sharing for personal use in 2005 and Dropbox followed soon after. Eventually big companies such as Microsoft and Google started their services in 2012 [20], and some researchers predict that the global market for personal cloud storage will reach $71.3 billion USD by 2020 [25].

2.2 Privacy and Security Concerns for Cloud Storage

Despite its benefits, cloud storage has many implications for privacy and security. Careful

analysis of the architecture and workloads of such systems highlights vulnerabilities in their

usage and impact on users [20, 26, 63]. Computer experts have found security issues in the

implementations of cloud storage. Hu et al. evaluated cloud storage options from Mozy,

Carbonite, Dropbox, and CrashPlan, and found that no company offered any guarantees for

data integrity and availability, nor did they assume any liability for security breaches or data


loss [31]. Moreover, most free services do not offer data encryption, forcing data safety to

become the responsibility of the user. Although some solutions have been proposed to allow

users to take advantage of the cloud without compromising privacy and autonomy [52],

personal cloud storage is still vulnerable to many attacks. When personal information is

at risk, as in the 2014 case of Dropbox’s link disclosure vulnerability [28], users are left

vulnerable. While legal protections on data stored in the cloud dictate that users do have a

reasonable expectation of security and privacy [33], the question remains: how do providers

implement user-centered data management?

These issues are exacerbated because users do not fully understand how their data is managed. It is not uncommon for private information to be uploaded to the cloud unintentionally. Clark et al. found that the majority of cloud users did not know that their private photos were uploaded to their cloud storage [17].

Moreover, users still express distrust in the cloud. In Ion et al.’s cross-cultural study of cloud usage, most participants perceived cloud storage to be less secure than local storage [32]. This would explain why users are reluctant to store sensitive data in the cloud [2, 16, 43, 49, 54].

Many of these concerns can be mitigated if users have a more active role in managing their

data and better understand which files are stored in their cloud. Although researchers have

analyzed user perceptions and system limitations, there has been little research from a user-

centered perspective about what data users have stored in the cloud and forgotten about and

what they would like to do with such data. We investigated cloud storage usage, including

why participants originally stored files in the cloud, to determine optimal file management

decisions.


2.3 Retrospective Privacy of Social Media

While surprisingly little work has investigated retrospective data management for cloud storage, researchers have examined analogous questions concerning social media. Safeguarding privacy in social media is especially complex because users make dynamic privacy decisions based on context [50]. Nevertheless, social media is a useful point of comparison for cloud storage because both support content that can either be shared publicly or kept private.

According to current research, temporality mediates whether users perceive content to be public or private. It also explains the changing relevance of posts over time [3,9,62]. The passage of time plays an important role in predicting the behaviors of users. Zhao et al. found that there were two regions in social network services, personal and public, and that people gradually moved their posts from the public region to the private one by reevaluating and reselecting the content over time [62]. Ayalon et al. showed that the willingness to share and the relevance of posts dropped significantly as time advanced [3].

However, temporality cannot always predict what a user’s preferences will be in the

future [9]. Bauer et al. examined longitudinal aspects of Facebook privacy and expected

that content would become less valuable over time and thus users would want that content

to “fade away.” However, they found much more complex preferences. Participants wanted

roughly one-third of posts to indeed fade away after a month, but they surprisingly wanted

another one-third of posts to become more widely accessible one month later. Through the

qualitative responses of these participants, Bauer et al. found that these changes were caused

by a number of factors ranging from life events to nostalgia [9].

A study of retrospective privacy on Twitter demonstrates the limitations of current content control mechanisms in social media. Even if users withdraw tweets (e.g., by deleting them), retweets may provide residual evidence and may even highlight when deleted tweets are missing [44]. Cloud storage can create similar problems for users, who may not be fully aware of the consequences of changing file-sharing settings. In this study, we investigated the optimal choices that should be offered to cloud storage users and how to minimize the negative consequences that could result from sharing files.

2.4 User Conceptualization of File Sharing

One of the advantages of cloud storage is the ability to share files with other users of the service. However, there is no clear characterization of sharing for cloud storage [27, 45, 64], and users have trouble understanding the functionality of cloud providers because they have inaccurate conceptual models of the cloud [40]. In addition to unclear sharing characterization, insufficient visibility of collaborator activities is one of the major problems of cloud sharing [55,56,60].

Sharing privileges pose many issues within the cloud file sharing paradigm. Local files

usually have a single owner and others are only given editor and viewer privileges. In the

cloud, however, owner privileges can be assigned to others or to multiple people. This distinction

is not always intuitive to users and requires more explanation [15]. Nuances of sharing

are central to how users understand the concept of cloud storage [29]. Users often refrain

from making decisions about shared-ownership files, even when they can and should, mainly

because they relegate authority to the original creator [47, 61]. Also, reluctance to delete

files results in clutter, frustrates shared users [46], and creates problems for file management.

Moreover, it is challenging and confusing for most users to understand the implications of

deleting from a shared folder [48]. This can be exacerbated in shared repositories, and users

must develop a variety of management structures and strategies [41].

2.5 Personal Information Management

Research on personal information management (PIM) began in the 1980s to help users better store, organize, and retrieve collections of data [7, 10, 11, 14, 39]. Researchers have suggested PIM interfaces for web activities [21, 35], email [4, 5, 10, 51, 58, 59], and local files [6, 8].

There are several technical and usability limitations of existing PIM systems, and users

struggle to manage volumes of information constantly increasing over time [12, 13]. People

naturally want to organize their information, regardless of data type or storage location, and

building software that is aware of these expectations will greatly decrease costs, errors, and

frustrations [53]. PIM must be adequately supported by current technologies.

Critics of PIM have focused on current research trends. These researchers have adopted a cognitive approach to their PIM tools. One focus has been on understanding memory issues, and these researchers found that memory problems hinder PIM [19, 23, 34]. Furthermore, many groups have designed systems to support known characteristics of memory [1, 18, 21, 24, 34, 35, 38, 57]. Elsweiler et al. tried to understand the memory lapses related to PIM. Their study focused on retrospective lapses, defined as forgetting details of past events or previously acquired information, and found that these caused problems for actions performed in the present. They promoted reducing cognitive overload and maintaining several organizational systems that minimized cognitive effort [23].

Other researchers adopted machine learning techniques to reduce user effort expended on

PIM [4, 53]. Ayodele et al. suggested an intelligent email assistant manager, which applied

semantic content learning tools to capture email conversation threads in order to group

them according to contextual relevancies [4]. Stumpf and Herlocker suggested TaskTracer,

which used machine learning techniques and past activities to reduce physical and cognitive

costs and errors to increase productivity [53]. TaskTracer monitored user interactions with a

computer, collected detailed records of user activities and resources accessed, used machine

learning techniques to detect task switches, and was thereby able to predict what the current

task was.

Lastly, little work has focused on PIM for the unique complications of consumer cloud storage. Cloudsweeper, a cloud-based email protection system for PIM, let users remove or “lock up” sensitive, unexpected, and rarely used information. While it effectively protected some sensitive files [51], Cloudsweeper’s methods do not map directly to cloud storage. Thus, we tried to determine participants’ preferences for file management to provide better insight into PIM for cloud storage through a retrospective file management system.


CHAPTER 3

METHODOLOGY

[Figure 3.1 (flowchart). Box labels, in order: Obtain User Consent; OAuth Flow; Collect Data From API; Generic Questions; Display Selected File; File Specific Questions; "Repeated for 10 selected files"; Features and Demographics Questions.]

Figure 3.1: An overview of the survey procedures from the perspective of a participant.

To map out the needs and opportunities for helping users manage forgotten files in their

cloud storage accounts, our procedure combined programmatic access to the stored files

with a dynamic online survey. Due to their popularity and API availability, we chose to

implement our survey instrument for both Google Drive and Dropbox. The survey has three

main sections: (1) a set of generic questions regarding the use of cloud storage, (2) detailed

questions about a stratified sample of 10 files that each participant had in their actual Google

Drive or Dropbox account, and (3) a final section in which we asked about the potential for

automating file management and collected participant demographics. Figure 3.1 summarizes

our survey flow. Each step is detailed in the following sections.

3.1 Cloud Storage Services

While Dropbox has existed since 2007, Google Drive was only introduced in 2012. Both services

offer free and paid tiers. Dropbox offers 2GB of free storage, while Google Drive provides

15GB. Google’s free 15GB, however, are shared between all Google services, including Gmail

and Google Photos.

While the services offered by Google Drive and Dropbox are similar in the grand scheme,

some small differences impacted our study design. Dropbox and Google Drive provide sharing

in two distinct ways: the first one is sharing files via email, which is done on an individual


basis. The second method of sharing is via a link, where anyone with a link has access to

the file. Additionally, sharing can be transitive: a file shared from user A to user B can then

be shared from user B to user C, depending upon the permissions granted by user A. How

sharing works differs slightly between services: a Dropbox user sharing an individual file can

only give others viewing access; granting edit access requires the entire folder containing

the file to be shared. On the other hand, Google Drive allows its users to grant view and

edit access for both files and folders. Furthermore, for link sharing, Dropbox users with free

accounts are limited to sharing links with view access only, whereas Google Drive links can

grant view or edit access. When asking specifically about shared files in our survey, we

did not consider Dropbox files shared via a link because they do not enable collaboration.

3.2 Data Collection and Ethics

An essential part of our study involved showing participants files in their own cloud storage

accounts and asking questions to gauge their receptiveness to different data management

options. We first presented users with a consent form explaining what API access we needed

and what information we would retain on our servers. After participants consented to the

study, we requested authorized access to the service using OAuth2, which allows our application

to programmatically access the files stored within the account. This mechanism allowed

us to be granted temporary access to these accounts without having to ask users for their

passwords. This access can be revoked by the user at any time.
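The authorization request described above can be sketched as follows. This is a minimal illustration, not the study's code: the thesis names neither its endpoint nor its scope, so the Google authorization endpoint, the metadata-only Drive scope, and the helper name `build_authorization_url` are assumptions chosen to show how a narrow scope is requested without ever handling a password.

```python
import secrets
from urllib.parse import urlencode

# Assumed values for illustration; the thesis does not list its endpoint or scope.
GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
METADATA_ONLY_SCOPE = "https://www.googleapis.com/auth/drive.metadata.readonly"


def build_authorization_url(client_id, redirect_uri,
                            scope=METADATA_ONLY_SCOPE,
                            endpoint=GOOGLE_AUTH_ENDPOINT):
    """Return (url, state) for the first step of an OAuth2 authorization-code flow."""
    # Random value that the provider echoes back; the app compares it to detect CSRF.
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",   # ask for an authorization code, not a password
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,            # request only what the survey needs
        "state": state,
    }
    return f"{endpoint}?{urlencode(params)}", state
```

The participant would be redirected to the returned URL, approve or decline the request on the provider's own page, and could later revoke the grant from their account settings.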

After obtaining participant authorization, we used the official APIs provided by Dropbox

and Google Drive to collect the data. Specifically, we used the Dropbox API v2 and Google

Drive API v3. Because the number of files per account varied widely, and we needed the

full list of files in the account to perform a stratified sample, we optimized API calls to

ensure that the collection process was robust and relatively quick. As shown in Figure 3.1,

we programmatically collected this data while the participant completed the generic portion


of our survey.

Throughout this process, our primary concern was to maintain the privacy of all participants and to collect data in an ethical manner. We used multiple techniques to protect user safety. The survey was hosted on an HTTPS domain with a valid certificate. We provided participants with our detailed privacy policy, including our contact details. For both cloud services, we limited the OAuth2 permission scope and requested only basic account information along with the file/folder metadata needed for our survey. In terms of data storage, we only stored the information we needed, including one-way hashes for any unique identifiers to avoid retaining personally identifiable information (PII). Furthermore, information such as file names and the names of other users who shared files with the participants was only displayed in-browser via direct API calls and was not retained on our servers.
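One way to store one-way hashes in place of unique identifiers is sketched below. The thesis does not say which hash function or keying scheme was used; this example assumes HMAC-SHA256 with a server-side secret, and the function name `pseudonymize` is hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map a unique identifier to a stable hex token that cannot be reversed."""
    # Assumption: a keyed hash (HMAC-SHA256) so tokens cannot be recomputed
    # by someone who knows the identifier format but not the key.
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()
```

Because the same identifier always maps to the same token, responses about one account or file can still be linked to each other during analysis without the raw identifier being stored.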

3.3 Recruitment and Inclusion Criteria

We recruited participants on Amazon’s Mechanical Turk. We limited participants to North

America and also required them to have a previous approval rating of 95%+. As our goal

was to investigate temporal file management and sharing decisions for cloud storage,

we performed a preliminary screening of the survey participants using metadata from their

accounts and verified that they met our criteria for inclusion, which we also presented to

prospective participants in our Mechanical Turk HIT description. Our criteria included the

following stipulations:

• More than 50 total files in the cloud storage account

• At least one file that was older than 30 days

• At least one shared folder on Dropbox or at least ten shared files on Google Drive
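The three criteria above can be expressed as a simple screening check. This is a minimal sketch: the `AccountSummary` fields and the service labels are hypothetical names chosen for the example, not the data model actually used in the study.

```python
from dataclasses import dataclass


@dataclass
class AccountSummary:
    """Hypothetical per-account metadata summary used for screening."""
    total_files: int
    oldest_file_age_days: int
    shared_folders: int
    shared_files: int


def is_eligible(service: str, acct: AccountSummary) -> bool:
    """Apply the three inclusion criteria listed above."""
    if acct.total_files <= 50:           # more than 50 total files
        return False
    if acct.oldest_file_age_days <= 30:  # at least one file older than 30 days
        return False
    if service == "dropbox":             # at least one shared folder
        return acct.shared_folders >= 1
    if service == "google_drive":        # at least ten shared files
        return acct.shared_files >= 10
    raise ValueError(f"unknown service: {service}")
```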

These filters ensured that the participants’ accounts were sufficiently well used for us to

ask about various use cases. We had additional sanity checks to ensure that participants

could not attempt to trivially meet our requirements without using their own legitimate

account.

Index  Selected File Description
1      Largest shared file of any type
2      Largest unshared file of any type
3      Shared media file of size greater than 250KB
4      Unshared media file of size greater than 250KB
5      Recently modified shared document
6      Recently modified unshared document
7      Old modified shared document
8      Old modified unshared document
9      Any shared file where participant is an editor
10     Any file shared via link (Google Drive Only)

Table 3.1: Categories for selecting files in our stratified sample.

We recruited participants through two classes of HITs. In the first, we asked participants

to select the service they used more often for cloud storage, which resulted in 17 Dropbox users

and 67 Google Drive users. To be able to compare across services, we posted additional

Dropbox-only HITs, which resulted in an additional 16 Dropbox users.

3.4 File Selection

To gauge the various factors that might affect the file management choices of participants,

we asked each participant about ten different files from their cloud storage account. While

random sampling of files would allow us to make statistical inferences about the entire

contents of the cloud storage account, our focus was instead on collecting perceptions about

as broad a set of files and use cases as possible. We therefore used a stratified sampling

strategy, which is outlined in Table 3.1. Within each of these ten categories, we randomly

selected one file from all files that met the specified criteria. If no files in the user’s account

matched a category (or if we had already asked about the only such file), we selected a

random file from the account in its place.
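The selection procedure with its random fallback can be sketched as follows. This is an illustration under stated assumptions, not the study's code: each category is modeled as a predicate over a file's metadata, the dictionary keys (`id`, `shared`, `size`) are hypothetical, and the sketch assumes that the fallback also avoids files already chosen. Categories defined by "largest" would need a ranking step that is omitted here.

```python
import random
from typing import Any, Callable, Dict, List

File = Dict[str, Any]
Category = Callable[[File], bool]


def select_files(files: List[File], categories: List[Category],
                 rng: random.Random) -> List[File]:
    """Pick one file per category, falling back to a random unused file."""
    chosen: List[File] = []
    chosen_ids = set()
    for matches in categories:
        unused = [f for f in files if f["id"] not in chosen_ids]
        pool = [f for f in unused if matches(f)]
        if not pool:
            # No (remaining) file fits this category: use any unused file.
            pool = unused
        if not pool:
            break  # the account has fewer files than categories
        pick = rng.choice(pool)
        chosen.append(pick)
        chosen_ids.add(pick["id"])
    return chosen


# Hypothetical account with 20 files and three example categories.
example_files = [{"id": i, "shared": i % 2 == 0, "size": i * 1000}
                 for i in range(20)]
example_categories = [
    lambda f: f["shared"],
    lambda f: not f["shared"],
    lambda f: f["size"] > 10**9,  # matches nothing, so the fallback is used
]
picks = select_files(example_files, example_categories, random.Random(0))
```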

The first two categories (#1 and #2) were used to gauge perceptions of file size and

sharing. We selected the largest shared file and the largest unshared file present in a participant’s

cloud storage. Categories #3–#8 selected files by varying file type, recency of edits, and

sharing status. Finally, to investigate how sharing modality affects answers, we varied the

sharing modality for categories #9 and #10. Because one cannot share a file for editing via

link, category #10 was replaced for Dropbox users with a file that satisfied category #3

instead. For Google Drive users, category #10 also let us ask about link-based sharing practices.

This stratified file selection enabled us to study various metrics across individual file types.

After running this study with 100 participants, we had collected information about 1,000 files

in total. Due to an error, our survey software did not record three of these 1,000 responses. We

therefore report results for 997 files.

3.5 Survey Structure

Our online survey consisted of three main sections. The first and third sections covered

generic questions about cloud storage usage and demographics. The second section, which

was repeated ten times, asked a series of questions about each of the ten files selected in the

stratified sample. The questionnaire used for the survey can be found in the Appendix.

3.5.1 Generic Questions

The first set of questions targeted account information and usage trends. Specifically, we

asked about 1) account history, 2) account type, 3) reasons for using cloud storage, 4) device

usage and storage patterns, and 5) account management.

We asked participants when they originally created their cloud storage accounts. We then

inquired whether they had a free or a paid version, as this may impact expectations or use cases.

The next part of the generic survey queried whether the participant used that account for

work (or school), as well as for personal purposes. We further asked whether the account

was used for collaboration, sharing, file backup, or a combination of factors.

One benefit of cloud storage is that access is not limited to an individual machine. To

investigate how participants accessed their accounts, a subset of the generic survey questions

asked about how frequently this storage was accessed, and whether that access was through

the service’s website, desktop, or mobile application.

We then asked participants how often they replicated their cloud storage files on local

computers, and what proportions of their local files were also backed up in the cloud. We

defined a local file as any file stored on a user’s computer and accessible without an Internet

connection, and a cloud file as any file accessible only with an Internet connection. We

speculated that responses to this set of questions would provide insight into the overall file

management strategies of participants.

Since cloud storage provides a finite capacity, we asked participants how often they ran

out of storage space on their cloud accounts. Along the same lines, we also asked how

frequently they organized their cloud storage by deleting unnecessary files, moving files to

different folders, or performing similar clean-up tasks. Finally, we presented the participants

with a comprehensive list of popular cloud services and asked about the ones they had

used. Our list included Amazon Cloud Drive, Apple iCloud, Box, Dropbox, Google Drive,

Microsoft OneDrive, and SpiderOak One.

3.5.2 File-Specific Questions

We proceeded to the file-specific questions once we selected the ten files. The questions were

repeated for each specific file (ten times). Before participants began answering the questions

about each file, we had them view the file via a preview link provided by the respective cloud

service API. When participants clicked the file name on the screen, the actual file opened in

a new tab. This step was mandatory: the next button was disabled until that link was viewed.

Figure 3.2 shows a screenshot of what a participant saw at the beginning of each set of

file-specific questions.

Figure 3.2: What participants see at the beginning of a file-specific question. Clicking the view button triggers a new browser tab with a file preview provided by the cloud storage service.

The first set of questions asked to what extent the participants remembered storing the

file. We defined two levels of recall: recognition and remembrance. Recognition refers to the

individual knowing what the file is after looking at it. Remembrance indicates that the user

remembered that the file resided in their cloud storage account prior to taking the survey.¹

For files the participant recognized, we asked when and why they originally stored the file

and when they would most likely access it in the future.

We also presented participants with three hypothetical file management decisions for

each file: keep the file as-is, delete the file, and encrypt the file. We described the benefits

and disadvantages of each decision. For instance, leaving the file as-is provides instant access

but leaves the file vulnerable in the case of an account compromise. Deleting files eliminates

sensitive personal information but is irreversible, making the file inaccessible in the future.

Encryption protects a user’s data from attackers yet would entail the overhead of managing

an encryption key or password.

1. We also asked about a third level of recall, that of remembering whether the file was still retained anywhere, including local or offline storage. It was highly correlated with remembrance (ρ = 0.91, p < 0.001), and thus we excluded it from further analysis.

Participants chose their preferred management decision from these three choices for each

file. They also explained their decision in a follow-up free-response question. Lastly, we

asked whether the participants would want to automatically apply the same decision to other

files on their account.

If a file we showed the participant was shared with others, we asked a set of questions

regarding how and why that file was shared. First, we randomly selected up to three of the people

with whom the file was shared and asked participants whether they knew each person and had been in

contact with them in the past year. We also asked participants whether they wanted to change the

sharing preferences of the file and why or why not. To understand how users conceptualize

changes to files that were shared but not edited recently, we also asked whether they would

like their copy of a shared file to reflect the changes others made to the file, and vice versa.

Finally, for Google Drive participants, we asked a subset of similar questions about files

shared via link. These questions did not list the names of any users with whom the

file was shared, but still aimed to capture the same set of concepts.

3.5.3 Features and Demographics

The final section of our survey included questions pertinent to additional features that could

possibly be added to cloud storage services in the future. Specifically, we asked about auto-

matic file management. That is, we asked whether auto-deletion, auto-archiving, and auto-

encryption would be useful for the participant. We also inquired about what kinds of files or

folders they would want to apply these automatic decisions to, if any. Finally, we collected

optional demographic information about our participants, including gender, age, occupation,

and if they had a degree or job in computer science or a related technical field.

CHAPTER 4

DATA ANALYSIS

4.1 Aggregation and Basic Statistics

Besides survey responses, we collected non-sensitive, non-personally identifiable metadata

from participant cloud storage accounts. Specifically, we calculated basic descriptive account

statistics, such as the number of bytes stored in the account, number of files, and percent

of files shared in each account. We then aggregated all this file metadata with our survey

analysis for further interpretation.

4.2 Qualitative Coding

We used a standard coding process to analyze free text responses. First, a researcher created

a codebook based on the text responses. This codebook included labels for each response

with definitions. After the first researcher finished creating the codebook, that researcher

and another researcher read through the same survey responses and assigned a code to

each using the codebook. After calibrating on a small number of responses, both researchers

independently coded all participant answers and calculated Cohen’s kappa coefficient to

determine agreement on the coding. The codebook for each question contained between three

and fifteen categories, and Cohen’s kappa between the two coders was at least

0.61 for each question. After both researchers finished their coding, they also reconciled their

coding results by discussing each text response and its assigned codes.
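For reference, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance given each coder's label frequencies. The following is a minimal sketch of that computation; the label names and example codings are invented for illustration and are not drawn from the study's codebook.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable],
                 coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(coder_a)
    # Observed agreement: fraction of items given the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented example: six responses coded by two researchers.
codes_a = ["backup", "work", "backup", "memory", "work", "backup"]
codes_b = ["backup", "work", "work", "memory", "work", "backup"]
kappa = cohens_kappa(codes_a, codes_b)  # equals 17/23, about 0.74
```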

4.3 Regression Model

We ran a series of mixed-effects logistic regressions to understand what file-level metadata,

information about a given cloud storage account, and participant demographics correlated

with participant ability to recognize or remember files, and the decisions they made concerning

managing the file and its sharing settings. We chose a mixed model because each participant

contributed ten files to our data set; our mixed-effects logistic regression therefore

included a participant-specific random factor to account for this non-independence of data.
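A model of this kind can be written out as follows. The notation is introduced here for illustration and assumes that the participant-specific random factor is a normally distributed random intercept, which the text does not state explicitly.

```latex
\[
\operatorname{logit} \Pr(y_{ij} = 1)
  = \beta_0 + \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + u_j,
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2)
\]
```

Here $y_{ij}$ is the binary outcome for file $i$ of participant $j$, $\mathbf{x}_{ij}$ collects the fixed-effect predictors for that file, account, and participant, and $u_j$ is the intercept shared by all ten files of participant $j$.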

We included the following account-specific independent variables in each of our regression

models:

• Service (Dropbox or Google Drive)

• Age of the cloud storage account (years)

• Whether the account was used for work purposes

• Whether the account was used for personal purposes

We included the following file-specific factors:

• File type (document, image, spreadsheet, video, or other)

• File access permissions (owner, editor, or viewer)

• Number of days (log10) since the file was last modified

• Size of the file (log10)

• Whether the file was shared, either with specific users or using a shared link

Because we hypothesized that usage patterns and management decisions might differ

between Dropbox and Google Drive, we included terms to capture the interaction between

the service and all five file-specific factors.

We included the following participant-specific factors:

• Participant’s age

• Participant’s technical background (degree or job in computer science or related fields)

We also ran a mixed-effects logistic regression to identify correlations between these

factors and whether or not participants preferred to keep sharing that file with up to three

different individuals with whom that file was shared (sharing recipients). The dependent

variable was ordinal and captured a preference to keep sharing (1), indifference about whether

the file was shared (2), or a preference to stop sharing (3). We removed the sharing status independent

variable from the regression because we only modeled shared files in this regression. However,

we added an independent variable for participant responses about how recently they had been

in touch with the sharing recipient (within the past year, over a year ago, or they did not

know who that person was). We treated both the participant and the file as random factors

in our mixed model. Because shared files were only a fraction of our data set, we did not

include interaction terms in our model. In the body of the paper, we report the p-values for

factors that were significant in our models.

CHAPTER 5

RESULTS

5.1 Participant Demographics and Account Usage

Here we present an overview of the results of our survey as well as a statistical analysis of the

file recollection and management decisions of users as a function of various factors related

to the user and files in question.

                                   Dropbox   GDrive
Total # Participants                    33       67

Gender                Male              21       37
                      Female            11       30
                      Not answered       1        0

Age                   <20                1        0
                      20-34             18       47
                      35-49              8       18
                      50+                5        2
                      Not answered       1        0

Technical Background  Yes               11       19
                      No                21       48
                      Not answered       1        0

Table 5.1: Participant demographics.

Our participant pool contained a total of 100 individuals, of which 58% were male and

41% were female. The remaining 1% did not declare their gender. The ages of the individuals

varied from 19 to 68 with a mean of 32 years. We classified participant technical background

based on whether the individual had a degree or a job in a computer-science-related field. Our

survey indicated that 69% of our participants did not have a strong technical background.

Table 5.1 provides the demographic details of our participants.

From the collected responses, we evaluated the overall cloud storage services that our

participants used. While our survey data included 33 Dropbox and 67 Google Drive users,

33% of our participants also used Microsoft OneDrive and 24% had an Apple iCloud account.

Other commonly used services included Amazon Cloud Drive and Box. Most participants

used at least two different cloud storage services: only 17% of participants used a single cloud

storage service, whereas 36% of participants used two different cloud storage services and

47% of participants used three or more cloud storage services.

Information regarding the usage of these accounts is presented in Table 5.2. While both

Dropbox and Google Drive services have attracted significant numbers of new users in recent

years [30,37], our participants had been using these services for quite some time: 85% of the

Google Drive and 94% of Dropbox accounts were more than three years old. Also, the

median age of the accounts of our participants (for both cloud storage

services) was 4.9 years.¹

There is also a good amount of variety in how these accounts are used. More than 80%

of the participants used accounts for both work/school and personal reasons, which can lead

to an intermingling of files stored for different purposes with different sensitivities. 48% of

participants used their accounts for either work/school or personal purposes at least once

a week and 86% of participants used them at least once a month. However, it was relatively rare

for the cloud to completely supplant local file storage: 88% of individuals retained at least a

subset of their cloud files on some local storage medium.

Participants frequently used synced folders, the official website, and smartphones to

access their cloud storage. 12% of participants accessed their cloud storage through synced

folders at least once a day, while 15% of participants accessed their cloud storage daily

through the service’s website. However, smartphone use was quite limited. 19% of partici-

pants never used their smartphones to access their cloud storage and only 9% of participants

used their smartphones to access the storage medium on a daily basis.

1. We calculated account age based on the oldest file(s) in a participant’s cloud storage account.

To investigate the organization pattern of the cloud storage accounts, we asked our participants

how often they ran out of storage space. Although Google Drive and Dropbox only

offer limited cloud storage capacity, our participants did not run out of storage space frequently.

45.5% of Dropbox users and 83.6% of Google Drive users never ran out of storage

space. However, storage capacity also correlated with storage shortage. 9.1% of our Dropbox

users and 4.5% of Google Drive users almost always ran out of storage space.

Even though more than half of our participants answered that they never ran out of storage space,

many still organized their cloud storage from time to time. 33% of participants organized their

cloud storage at least once a year, but less than once a month. 33% of participants organized

it less than once a year, but occasionally. However, 17% of participants never took the time

to organize their cloud storage.

Property             Service   Min     Median   Max     Mean (SD)

Account Age (Years)  DB        0.4     4.9      8.2     4.7 (2.2)
                     GD        0.05    4.9      5.3     4.1 (1.5)

Account Size (GB)    DB        0.12    2.0      54.1    3.9 (9.1)
                     GD        0.002   1.2      63.3    3.4 (8.4)

# of Files           DB        53      514      66.6K   3.5K (11.4K)
                     GD        59      424      22.1K   1.8K (3.6K)

Avg. File (MB)       DB        0.015   3.1      26.7    6.0 (7.3)
                     GD        0.15    7.3      131.1   14.6 (21.5)

Largest File (MB)    DB        3.1     295.9    4000    571 (820)
                     GD        8.4     506.9    9600    1100 (1660)

Shared Files (%)     DB        0.02    21.5     100     38.6 (38.5)
                     GD        0.3     44.0     99.7    46.7 (34.4)

Table 5.2: Descriptive statistics of the Dropbox (DB) and Google Drive (GD) accounts of participants.

5.2 Account Archeology

The statistics we collected about each cloud storage service reflected the underlying characteristics

of each storage system. Google Drive storage was shared among all Google services, including Gmail

and Google Photos, while Dropbox only provided cloud storage services. Google Drive’s

median account size was smaller than that of Dropbox, which suggests that Google Drive users

used their cloud storage as one part of their broader Google account. Moreover, Google Drive was more

sharing oriented than Dropbox: the median percentage of files shared with others was 44.0% for

Google Drive accounts and 21.5% for Dropbox accounts (Table 5.2).

Most files in the cloud storage accounts of participants were modified within the

past three years, but participants also kept files that were older than three years. Figure 5.1

and Figure 5.2 show file creation and modification dates.

Figure 5.1: [Based on creation date, Google Drive only.] Participants kept their files more than 3 years. Some participants kept their files for 9 years. (Plot; x-axis: Weeks, y-axis: # of files.)

Figure 5.2: [Based on modification date, Dropbox and Google Drive.] Our sample showed that participants only continued to modify files for three years. (Plot; x-axis: Weeks, y-axis: # of files.)

5.3 File Recognition

First, we asked participants whether they recognized a file by directly asking: “(After looking

at this file), do you know what it is?” We found that the vast majority of the files we asked

about were recognized: Only 9.7% of Dropbox files and 15.5% of Google Drive files were not

recognized.

As described in the methodology, we ran a mixed-effects logistic regression to investigate

what factors specific to the file, account, or participant correlated with whether participants

recognized the files they were shown. Table 5.3 includes the results of this logistic regression.

Figure 5.3: Comparison of file ownership and remembrance. File ownership had a significant positive correlation with remembering that the file was stored in the cloud (χ²(8, N = 862) = 32.24, p < .001). (Stacked bars for Owner, Editor, and Viewer on a scale of 0 to 100; legend: Strongly agree, Agree, Neutral, Disagree, Strongly disagree.)

Compared to the “other” file type,² participants were more likely to recognize documents

(p < .001) and images (p = .027). Unsurprisingly, compared to files for which they were

the owner, participants were less likely to recognize files owned by others and for which

they only had editor (p = .001) or viewer (p = .011) permissions. We observed a significant

interaction effect in which participants were more likely to recognize files for which they had

editor permissions if they used Dropbox, rather than Google Drive (p = .018), but the cloud

storage service otherwise did not significantly impact file recognition. We did not observe

any significant correlations between whether the participant recognized the file and any of

the other file metadata factors or participant-specific factors we collected.

In addition to asking whether a participant recognized a file, we asked whether they

remembered that they retained it in cloud storage. Compared to recognizing the file, partici-

pants remembered retaining far fewer files. Users did not remember that 39.39% of Dropbox

files and 34.18% of Google Drive files were retained in cloud storage. While our non-random

sampling approach is not representative of all files stored within these accounts, this result

suggests that even though recalling the act of saving a file is not hard, with such large and

long-lived accounts it is difficult to keep track of what has been retained.

2. We categorized files into five types based on file extension: document, image, spreadsheet, video, and other. The “other” category is the baseline file type.

Using logistic regression (Table 5.4), we found that compared to files in the “other”

category, participants were more likely to remember video files (p = .025), yet less likely to

remember image files (p < .001). Unsurprisingly, participants were less likely to remember

files if they had only editor (p = .013) or viewer (p < .001) permissions, as opposed to being

the owner of the file. Participants were also more likely to remember a file the more recently

it had been modified (p < .001) or the larger its file size (p < .001). They were also more

likely to remember shared files than unshared files (p < .001). Participants were less likely to

remember a file if their cloud storage account was older (p < .001), although they were more

likely to remember a file if they, the participant, were older in age (p < .001). Participants

were less likely to remember files if they used their account for work purposes (p < .001) and

more likely to remember files if they used their account for personal purposes (p < .001).

To investigate the utility of these stored files, we asked participants to self-report when

they last accessed each file.³ We found that most files that we asked about were not recently

accessed. 28.76% of Dropbox files and 43.15% of Google Drive files were last accessed between

one month and one year ago. 41.18% of Dropbox files and 40.93% of Google Drive files

were last accessed between one and five years ago. Regarding potential future utility, our

participants answered that 30.13% of Google Drive files and 23.03% of Dropbox files would

most likely never be accessed again. While copious cheap or free storage makes such “write

only” archives tenable, if a user is to store sensitive data here without expecting it to provide

future benefit, the risks of such an archive clearly outweigh the rewards.

3. Last access time was not available via the API; only the modification date was available. However, the modification date was not limited to survey participants. The last modification date also recorded when other owners or editors modified the file.

Table 5.3: The results of a mixed-effects logistic regression to identify what factors were correlated with recognizing what the file was (baseline: not recognized). Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.

Factor                                     Baseline / Units     Coefficient  Std. Error   z value       p
Service: Dropbox                           Google Drive              -0.727       2.219    -0.328   0.743
File Type: Document                        Other                      1.997       0.488     4.090   <.001
File Type: Image                           Other                      0.940       0.424     2.215   0.027
File Type: Spreadsheet                     Other                      1.209       0.896     1.349   0.177
File Type: Video                           Other                      1.483       1.070     1.386   0.166
Access: Editor                             Owner                     -2.143       0.653    -3.281   0.001
Access: Viewer                             Owner                     -1.690       0.664    -2.546   0.011
Days Since Modified                        log10(days+1)             -0.308       0.285    -1.082   0.279
File Size                                  log10(bytes)               0.264       0.142     1.862   0.062
Shared                                     Not shared                -0.058       0.541    -0.107   0.915
Account Age                                Years                      0.274       0.166     1.654   0.098
Participant Tech. Background               No                         0.350       0.429     0.816   0.414
Participant Age                            Years                      0.016       0.021     0.772   0.440
Account for Work Purposes                  No                         0.491       0.427     1.150   0.250
Account for Personal Purposes              No                        -0.147       0.528    -0.278   0.781
Service: Dropbox * File Type: Document     Google Drive, Other        0.050       0.866     0.058   0.954
Service: Dropbox * File Type: Image        Google Drive, Other        0.425       0.733     0.580   0.562
Service: Dropbox * File Type: Spreadsheet  Google Drive, Other        0.152       1.499     0.102   0.919
Service: Dropbox * File Type: Video        Google Drive, Other        0.187       1.702     0.110   0.912
Service: Dropbox * Access Type: Editor     Google Drive, Owner        2.983       1.258     2.371   0.018
Service: Dropbox * Access Type: Viewer     Google Drive, Owner        1.038       1.688     0.615   0.539
Service: Dropbox * Days Since Modified     Google Drive, N/A         -0.538       0.591    -0.911   0.362
Service: Dropbox * File Size               Google Drive, N/A          0.276       0.213     1.297   0.195
Service: Dropbox * Shared                  Google Drive, Not          0.754       0.883     0.853   0.393

Participants used their cloud storage for various reasons. We also investigated the reasons why participants

originally stored the files using the cloud. 21.0% of files were stored for backup, 15.9% for

work, and 12.3% for access advantages. Even though it was not the main reason for storing the

files, 3.4% of participants used their cloud storage for keeping personal memories, such as

family pictures and love letters with their spouses. P45, a Google Drive user, answered that

he kept the files, “Just because they are my father’s feet. haha I know it sounds weird but

the day he is gone, I want to remember everything :( ”

Lastly, we analyzed recognition and remembrance by participants. Most participants had

at least one file that they did not recognize or remember. 59% of participants had at least

one file that they did not recognize and 81% of participants had at least one file that they

did not remember. The average number of files that participants did not recognize was 1.35.

The average number of files that participants did not remember was 3.3. Our participants

only remembered five out of ten files, and only 10% of participants fully recognized and

remembered their survey files.

Table 5.4: The results of a mixed-effects logistic regression to identify what factors were correlated with remembering that the file was stored in the cloud, which was recorded on a five-point Likert scale coded as an integer from -2 (the participant strongly disagrees that they remembered the file was stored in the cloud) to 2 (the participant strongly agrees that they remembered the file was stored in the cloud). Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.

Factor                                     Baseline / Units     Coefficient  Std. Error   z value       p
Service: Dropbox                           Google Drive              -1.444       1.111    -1.300   0.193
File Type: Document                        Other                      0.102       0.220     0.465   0.642
File Type: Image                           Other                     -0.150       0.002   -88.197   <.001
File Type: Spreadsheet                     Other                     -0.063       0.599    -0.104   0.917
File Type: Video                           Other                      1.467       0.655     2.239   0.025
Access: Editor                             Owner                     -1.067       0.428    -2.493   0.013
Access: Viewer                             Owner                      -3.09       0.455    -6.789   <.001
Days Since Modified                        log10(days+1)             -0.658       0.002  -385.835   <.001
File Size                                  log10(bytes)               0.058       0.002    34.251   <.001
Shared                                     Not shared                 0.375       0.002   220.280   <.001
Account Age                                Years                     -0.126       0.002   -73.893   <.001
Participant Tech. Background               No                        -0.459       0.433    -1.061   0.289
Participant Age                            Years                      0.011       0.002     6.491   <.001
Account for Work Purposes                  No                        -0.382       0.002  -212.773   <.001
Account for Personal Purposes              No                         0.726       0.002   404.017   <.001
Service: Dropbox * File Type: Document     Google Drive, Other        0.025       0.483     0.051   0.959
Service: Dropbox * File Type: Image        Google Drive, Other        0.106       0.403     0.263   0.793
Service: Dropbox * File Type: Spreadsheet  Google Drive, Other        0.682       0.988     0.690   0.490
Service: Dropbox * File Type: Video        Google Drive, Other       -2.348       0.917    -2.560   0.010
Service: Dropbox * Access Type: Editor     Google Drive, Owner        1.549       0.688     2.250   0.024
Service: Dropbox * Access Type: Viewer     Google Drive, Owner        4.084       1.187     3.441   <.001
Service: Dropbox * Days Since Modified     Google Drive, N/A         -0.294       0.259    -1.133   0.257
Service: Dropbox * File Size               Google Drive, N/A          0.294       0.095     3.086   0.002
Service: Dropbox * Shared                  Google Drive, Not         -0.493       0.428    -1.154   0.249

5.4 File Management

To assess file management needs, we asked our participants what file management decision

they wanted to perform for each file: encrypt the file in place, delete it, or keep it as-is.

Participants wanted to keep 57.93% of the files they saw in our survey as-is, delete 35.24% of

files, and encrypt 6.83% of files. Recollection of files was highly correlated with file manage-

ment decisions. Participants were more likely to delete files if they did not recognize them.

They typically kept the file as-is if they recognized and remembered it. Also, participants

were more likely to encrypt the file if they recalled it (Figure 5.4).
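The association between recall level and management decision is the kind of relationship a chi-square test of independence examines. A 3 × 3 table (three recall levels by three decisions) has (3 - 1)(3 - 1) = 4 degrees of freedom, which is consistent with the statistic reported for Figure 5.4. The following minimal sketch computes the statistic and degrees of freedom. The counts are invented for illustration and are not the study's data, and the sketch assumes every row and column total is positive.

```python
from typing import List, Tuple


def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts. Rows: not recognized, recognized but not remembered,
# recognized and remembered. Columns: keep as-is, encrypt, delete.
example_counts = [
    [10, 2, 40],
    [30, 5, 35],
    [60, 8, 10],
]
example_stat, example_dof = chi_square_independence(example_counts)
```

Converting the statistic to a p-value requires the chi-square distribution, which is left to a statistics library.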

Figure 5.4: Participant management decisions on files across the possible combinations of file recognition and remembrance. Our statistics suggest these correlations are significant (χ²(4, N = 997) = 260.26, p < .001). (Stacked bars for Not recognized, Recognized but not remembered, and Recognized and remembered on a scale of 0 to 100; legend: Keep as-is, Encrypt, Delete.)

Participants were more likely to prefer deleting files if they only had editor (p = .008)

or viewer (p < .001) permissions, as opposed to being the owner of the file. This effect,

however, was far more muted for files on Dropbox than for those on Google Drive. There

was a significant negative interaction between the service and the access permissions in

predicting preferences for file deletion. We did not observe any other significant main effects

for predicting which files a participant would express a preference for deleting, nor which

files participants were more likely to delete rather than keep as-is (Table 5.5).

As with our regression model for preferences to delete a file, we observed few significant

correlations between file-based, account-based, and participant-based features and participant

preferences to encrypt a file. We observed

that participants with a technical background, relative to participants without such a back-

ground, were more likely to choose to encrypt a file (p = .036). In addition, participants who

used their cloud storage account for work purposes were less likely to choose to encrypt a

file (p = .013). We did not observe any other significant correlations (Table 5.6).

We asked participants why they made each file management decision. Participants had

multiple reasons for wanting to keep files as-is. 21.1% of keep as-is decisions were based on the

fact that participants might need the file in the future. P5, a Google Drive user, mentioned,

“I might need it if I am ever audited, and I don’t know how long I need to keep tax-related

for” for a tax-related file they were storing. For 19.1% of keep as-is decisions, participants

suggested that they did not care about the file because it did not contain any private or

sensitive information and they wanted to keep it. For instance, one participant described,

“There is nothing about the file that I would be concerned about during a data breach”.

17.7% of the responses emphasized saving files for backup purposes, and 13.9% mentioned

that files should be kept as-is because participants wanted to access the files remotely and

across multiple devices.

For 69.2% of delete decisions, participants mentioned that the files were no longer useful.

When questioned about one of the images we displayed, P27 said, “I don’t need it anymore

and that folder is full of junk photos.” 12.0% of delete decisions were determined because

participants did not remember the file in question. Participants answered that they wanted to

delete files to clear up space for 10% of delete decisions. Another common reason for deleting

files was that the content was personal and participants wanted to prevent unauthorized

access. One participant worried about a personal photo and said, “It’s a personal

photo of my wife and I don’t want anyone else to see it”.

Encryption was not as common as deletion. For 30.6% of encrypt decisions, participants

answered that the file contained private information. 27.8% of encryption decisions were

for security purposes. The responses suggested participants encrypted files that contained

sensitive information. P44’s recorded response said “It is a financial document that I would

not want to be public”. We also found instances where users wanted to encrypt pictures and

videos.

Whether participants expected to access a file in the future also affected file management

decisions. 79% of files that participants said that they would access in the future were kept

as-is, yet participants answered that they wanted to delete 10.7% of files that they would

access in the future. Therefore, we investigated why participants might want to delete files

even if they thought they would access them in the future. Participants had second copies of

27.4% of files that they wanted to delete. However, participants also answered that they

wanted to delete 60% of those files because they did not need the files, and they elected to

delete 13.3% of those files to save space. Participants stated that they wanted to delete

64.6% of files that they would not need to access in the future (Figure 5.10).

[Figure 5.5 chart omitted. Rows (Ownership): Owner; Editor; Viewer. Bars (Deletion Decision): Do not delete, Delete; scale 0 to 100.]

Figure 5.5: Comparison of deletion and file ownership levels. The ownership level significantly correlated with the decision to delete a file (χ²(2, N = 928) = 13.81, p = .001).

[Figure 5.6 chart omitted. Rows (Participant Background): Technical; Non-technical. Bars (Encryption Decision): Do not encrypt, Encrypt; scale 0 to 100.]

Figure 5.6: Comparison of file encryption and participant technical background. If the participant had a technical background, they were more likely to encrypt the file (χ²(1, N = 645) = 8.14, p = .004).

[Figure 5.7 chart omitted. Rows (Future Access): Access; Do not access. Bars (File Management Decision): Keep as-is, Encrypt, Delete; scale 0 to 100.]

Figure 5.7: Our participants were more likely to delete files when they expected they would never need to access the files in the future (χ²(2, N = 997) = 272.84, p < .001).

[Figure 5.8 chart omitted. Rows (Second Copy): Yes; No; Do not know. Bars (File Management Decision): Keep as-is, Encrypt, Delete; scale 0 to 100.]

Figure 5.8: Our participants were more likely to keep files as-is when they had a second copy. However, when they did not have a second copy, they were more likely to delete the file (χ²(4, N = 997) = 130.85, p < .001).

In terms of managing multiple copies of files, we thought participants would be more

likely to delete their files when they had a second copy. However, our results showed that

participants were more likely to keep their files when they had extra copies. This suggested

that participants kept multiple copies when they thought files were important (Figure 5.8).

Lastly, we also investigated how security perceptions affected file management decisions.

Our participants thought that never losing the ability to access a file was more important

than security. 26.3% of participants answered that keeping the file safe from unauthorized

access was important, while 40.3% of participants thought never losing the ability to access

a file was important. Participants who considered security important were more likely to

choose file encryption (Figure 5.9), while participants who thought ability to access a file

was important were more likely to keep a file as-is. However, if ability to access a file was not

important to participants, they were more likely to delete the files (Figure 5.10). We think

that participants who do not want to lose the ability to access a file also assume that other

people can easily access the file. Also, participants do not want to spend time on encryption.

They were likely to delete files whether the file had personal information or not.

[Figure 5.9 chart omitted. Rows (Keep the file safe from unauthorized access): Important; Not important. Bars (File Management Decision): Keep as-is, Encrypt, Delete; scale 0 to 100.]

Figure 5.9: Our participants were more likely to encrypt files when they wanted to prevent unauthorized file access (χ²(2, N = 997) = 205.55, p < .001).

[Figure 5.10 chart omitted. Rows (Never lose the ability to access the file): Important; Not important. Bars (File Management Decision): Keep as-is, Encrypt, Delete; scale 0 to 100.]

Figure 5.10: Our participants were more likely to keep the files as-is when they thought never losing the ability to access a file was important (χ²(2, N = 997) = 310.42, p < .001).

Table 5.5: The results of a mixed-effects logistic regression to identify what factors were correlated with expressing a preference to delete the file shown, as opposed to keeping the file as-is. Files the participant wanted to encrypt are excluded from this model. Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.

Factor                                     Baseline / Units      Coefficient  Std. Error  z value       p
Service: Dropbox                           Google Drive               -2.342       1.556   -1.505   0.132
File Type: Document                        Other                      -0.375       0.335   -1.121   0.262
File Type: Image                           Other                      -0.226       0.337   -0.671   0.502
File Type: Spreadsheet                     Other                       1.233       0.696    1.772   0.076
File Type: Video                           Other                      -1.143       0.683   -1.673   0.094
Access: Editor                             Owner                       1.379       0.518    2.665   0.008
Access: Viewer                             Owner                       2.054       0.535    3.838   <.001
Days Since Modified                        log10(days+1)               0.077       0.189    0.407   0.684
File Size                                  log10(bytes)               -0.149       0.105   -1.419   0.156
Shared                                     Not shared                 -0.361       0.359   -1.006   0.314
Account Age                                Years                      -0.186       0.142   -1.316   0.188
Participant Tech. Background               No                         -0.129       0.362   -0.355   0.722
Participant Age                            Years                      -0.018       0.017   -1.012   0.312
Account for Work Purposes                  No                         -0.045       0.361   -0.123   0.902
Account for Personal Purposes              No                         -0.420       0.435   -0.964   0.335
Service: Dropbox * File Type: Document     Google Drive, Other         1.024       0.631    1.623   0.105
Service: Dropbox * File Type: Image        Google Drive, Other         1.208       0.602    2.007   0.045
Service: Dropbox * File Type: Spreadsheet  Google Drive, Other        -0.158       1.278   -0.123   0.902
Service: Dropbox * File Type: Video        Google Drive, Other         1.350       1.073    1.258   0.208
Service: Dropbox * Access Type: Editor     Google Drive, Owner        -2.924       0.864   -3.385   <.001
Service: Dropbox * Access Type: Viewer     Google Drive, Owner        -3.322       1.624   -2.045   0.041
Service: Dropbox * Days Since Modified     Google Drive, N/A           0.351       0.355    0.989   0.322
Service: Dropbox * File Size               Google Drive, N/A           0.020       0.160    0.127   0.899
Service: Dropbox * Shared                  Google Drive, Not           0.859       0.611    1.406   0.160

Table 5.6: The results of a mixed-effects logistic regression to identify what factors were correlated with expressing a preference to encrypt the file shown, as opposed to keeping the file as-is. Files the participant wanted to delete are excluded from this model. Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.

Factor                                     Baseline / Units      Coefficient  Std. Error  z value       p
Service: Dropbox                           Google Drive               -4.400       2.808   -1.567   0.117
File Type: Document                        Other                      -0.348       0.645   -0.539   0.590
File Type: Image                           Other                      -0.424       0.680   -0.624   0.533
File Type: Spreadsheet                     Other                     -24.054     420.899   -0.057   0.954
File Type: Video                           Other                      -1.527       1.340   -1.139   0.255
Access: Editor                             Owner                       0.255       0.982    0.260   0.795
Access: Viewer                             Owner                     -23.491     280.600   -0.084   0.933
Days Since Modified                        log10(days+1)              -0.034       0.341   -0.099   0.921
File Size                                  log10(bytes)               -0.256       0.189   -1.355   0.176
Shared                                     Not shared                 -0.086       0.634   -0.135   0.892
Account Age                                Years                       0.105       0.234    0.451   0.652
Participant Tech. Background               No                          1.177       0.562    2.095   0.036
Participant Age                            Years                      -0.032       0.029   -1.089   0.276
Account for Work Purposes                  No                          -1.37       0.549   -2.486   0.013
Account for Personal Purposes              No                         -0.594       0.624   -0.952   0.341
Service: Dropbox * File Type: Document     Google Drive, Other         1.098       1.228    0.894   0.371
Service: Dropbox * File Type: Image        Google Drive, Other         1.375       1.108    1.241   0.215
Service: Dropbox * File Type: Spreadsheet  Google Drive, Other        28.213     420.896    0.067   0.947
Service: Dropbox * File Type: Video        Google Drive, Other         1.583       1.987    0.797   0.426
Service: Dropbox * Access Type: Editor     Google Drive, Owner         0.156       1.442    0.108   0.914
Service: Dropbox * Access Type: Viewer     Google Drive, Owner        24.414     280.597    0.087   0.931
Service: Dropbox * Days Since Modified     Google Drive, N/A          -0.086       0.571   -0.150   0.881
Service: Dropbox * File Size               Google Drive, N/A           0.517       0.290    1.783   0.075
Service: Dropbox * Shared                  Google Drive, Not           0.189       1.056    0.179   0.858

5.5 File Sharing

Besides asking about file retention for each file, we also asked whether users wanted to

maintain sharing relationships. We asked this question for 212 files, yielding 447 file-recipient

pairs. Each shared file had between 1 and 19 recipients. If a file was shared with only

1–3 individuals, we asked about each of them; if it was shared with four or more individuals,

we randomly selected three. As a result, we focused on 80 files that were shared with one

person, 29 files that were shared with two people, and 103 files that were shared with three

or more people. Most participants made the same sharing decision for every recipient of a

file: only 20.1% of files shared with two people and 13.5% of files shared with three people

had differing sharing decisions.
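The recipient-selection rule described above can be sketched as follows. This is an illustrative reconstruction, not the study's survey code; the function name, the constant, and the example addresses are hypothetical.

```python
import random

# Assumption taken from the text: at most three recipients are asked about per file.
MAX_RECIPIENTS_ASKED = 3


def recipients_to_ask_about(recipients, rng=random):
    """Return the recipients a participant is asked about for one shared file.

    Files shared with one to three people keep every recipient; files shared
    with four or more people are reduced to a random sample of three.
    """
    recipients = list(recipients)
    if len(recipients) <= MAX_RECIPIENTS_ASKED:
        return recipients
    return rng.sample(recipients, MAX_RECIPIENTS_ASKED)


# Hypothetical file shared with 19 people (the largest case reported above).
many = [f"recipient{i}@example.com" for i in range(19)]
print(recipients_to_ask_about(many, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes the selection reproducible, which may be useful when re-running an analysis.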

Participants wanted to keep sharing 40.7% of these file-recipient pairs, stop sharing 11.4%

of these file-recipient pairs, and did not care about 47.9% of the file-recipient pairs.

In our regression of participant preferences about whether or not to continue sharing files

that were shared with one or more other users by name, rather than through a shared link,

we found that a handful of factors correlated with participant preferences, and the regression

results are shown in Table 5.7. Unsurprisingly, participants tended toward continuing to share

files when they had communicated in the past year with the person with whom the file was

shared (p < .001). Dropbox participants were more likely to want to keep sharing files than

Google Drive participants (p = .011). Furthermore, the use of accounts for work purposes

had a nuanced correlation with sharing preferences (p = .044).

Whether participants were in touch (had communicated with the sharing recipient in the

last year) was highly correlated with participants wanting to keep sharing files. Participants

in touch with the recipient definitely wanted to keep sharing with the recipient for 58% of

file-recipient pairs. In contrast, they definitely wanted to keep sharing for only 20% of file-

recipient pairs when they were out of touch (had not communicated in the past year) and

12% of files in cases when they did not know who the recipient was. Participants definitely

wanted to stop sharing files for 5% of pairs when they were in touch with the recipient, 25%

of pairs when they were out of touch, and 21% of pairs when they did not know who the

recipient was.

While the proportion of files participants definitely wanted to stop sharing with a partic-

ular person was similar for Dropbox (12%) and Google Drive (15%), the difference was in the

strength of the preference to keep sharing. For particular file-recipient pairs, 59% of Dropbox

participants definitely wanted to keep sharing the file, but Google Drive participants only

definitely wanted to keep sharing 22% of file-recipient pairs. For the majority of Google Drive

pairs (63%), participants did not care whether or not the file was still shared, whereas the

same was true for only 29% of Dropbox pairs. Participants who used their accounts for work

purposes did not care whether or not files continued to be shared for 56% of file-recipient

pairs. The same was true for only 33% of pairs when participants did not use their accounts

for work purposes. Participants who did not use their accounts for work purposes wanted to

both definitely keep sharing and stop sharing files at higher rates than participants who did

use their accounts for work purposes.

When participants were asked why they originally shared a file, the most common reason was

work (38.9%). Another major reason to share files was to provide access, which accounted

for 17.0% of the responses. Participants who mentioned they would like to continue sharing

the file stated similar reasons as to why they shared the file in the first place. Participants

answered that they wanted to provide access for 44.1% of keep sharing decisions. 17.6%

of keep sharing decisions were based on the fact that participants wanted to keep sharing

files they collaborated on with other individuals. Participants said that there was no obvious

reason to stop sharing the file for 3% of keep sharing decisions. As an example, P25 mentioned

“There is no reason to stop. They don’t need access to it for anything important, but its not

necessary to stop sharing.”

On the other hand, participants also had reasons for deciding to stop sharing files.

Participants decided to stop sharing files 41.6% of the time because they could not

remember the recipient or were no longer in communication. Participants also answered that

the task pertinent to the file was completed in relation to 41.7% of stop sharing decisions.

One participant who wanted to stop sharing said: “Because I don’t remember sharing it with

them in the first place.”

[Figure 5.11 chart omitted. Rows (Recipient): In touch; Out of touch; Don’t know them. Bars (Sharing Decision): Keep sharing, Neutral, Stop sharing; scale 0 to 100.]

Figure 5.11: Participant preferences for definitely continuing to share files (keep sharing), not caring whether or not the file continues to be shared (neutral), or to definitely stop sharing a file (stop sharing) across file-recipient pairs based on whether the participant said they were in touch with the recipient (had communicated in the past year), out of touch with the recipient (had not communicated in the past year), or did not know the recipient (don’t know them).

[Figure 5.12 chart omitted. Rows (Sharing Method): Via e-mail; Via link. Bars (Sharing Decision): Keep sharing, Neutral, Stop sharing; scale 0 to 100.]

Figure 5.12: Our participants were more likely to keep sharing a file if they had shared it via e-mail. However, they were more likely to stop sharing or they did not care about the sharing status if they had used a link to share the file (χ²(2, N = 562) = 17.25, p < .001).

[Figure 5.13 chart omitted. Rows (Original Sharer): Shared with all of them; Shared with no one; Don’t know. Bars (Sharing Decision): Keep sharing, Neutral, Stop sharing; scale 0 to 100.]

Figure 5.13: If a participant originally shared the file, they were more likely to want to stop sharing it. However, if the participant was the recipient, they were less likely to want to stop sharing (χ²(6, N = 447) = 54.16, p < .001).

Table 5.7: The results of a mixed-effects logistic regression to identify what factors were correlated with expressing a preference to stop sharing the file shown. In particular, the dependent variable is an ordinal variable reflecting a preference to keep sharing (1), whether the sharing setting does not matter (2), or to stop sharing (3). Non-italicized values in the baseline column specify the baseline category for terms representing categorical variables. Italicized values in the baseline column indicate the units for numerical terms. Significant p-values are bolded.

Factor                             Baseline / Units                Coefficient  Std. Error  z value       p
Service: Dropbox                   Google Drive                         -5.159       2.030   -2.541   0.011
File Type: Document                Other                                -0.400       1.552   -0.258   0.797
File Type: Image                   Other                                 0.059       1.505    0.039   0.969
File Type: Spreadsheet             Other                                 1.202       2.109    0.570   0.569
File Type: Video                   Other                                -4.871       2.618   -1.861   0.063
Access: Editor                     Owner                                 0.748       1.217    0.615   0.539
Access: Viewer                     Owner                                 1.475       3.650    0.404   0.686
Days Since Modified                log10(days+1)                         2.055       1.112    1.849   0.065
File Size                          log10(bytes)                          0.748       0.460    1.625   0.104
Account Age                        Years                                -0.237       0.823   -0.288   0.773
Participant Tech. Background       No                                   -1.341       1.845   -0.727   0.467
Participant Age                    Years                                -0.092       0.110   -0.836   0.403
Account for Work Purposes          No                                   -3.837       1.903   -2.016   0.044
Account for Personal Purposes      No                                    3.069       1.930    1.590   0.112
Relationship to Sharing Recipient  Have communicated in past year        3.104       0.740    4.193   <.001

5.6 File Co-ownership

One potential way to handle shared files long after their original use is simply to provide

each participant with their own independent copy, which then diverges as any edits are made.

We asked whether users would prefer the edits of others to be reflected in their files, or if

they would prefer not to receive those edits and keep their own copy. Thus, we asked two

questions related to file co-ownership. We asked participants to indicate on a

five-point Likert scale whether they agree with these statements: “If anyone other than me

changes (modifies or deletes) the file, my copy of the file should also reflect their changes

(Others’ changes → My copy),” and “If I change (modify or delete) this file, other people’s

copies of the file should also reflect my changes (My changes → Others copies).”

For 60.8% of Dropbox files and 27.6% of Google Drive files, our participants preferred to

receive edits, and conversely for 51.2% of Dropbox files and 39.1% of Google Drive files our

participants preferred that their own edits be reflected in someone else’s copy of the shared

file. We think this difference originated from the sharing characteristics of each cloud storage

service. For Dropbox users, sharing an individual file can only give others viewer access;

granting editor access requires making a “team” inside their cloud storage service and the

entire folder containing the file has to be shared. On the other hand, Google Drive allows

its users to grant view and edit access for both files and folders. Therefore, the relationship

between individuals who share files with each other is stronger in Dropbox than in Google

Drive.

This decision was also impacted by whether the participant was an owner or editor of

the file. For files owned by the participant, they preferred that their files reflect external

changes 39.2% of the time and that their changes should be applied to other copies 51.6% of

the time. For files with editing rather than ownership permissions, the participants preferred

that changes be reflected in their copy 53.7% of the time and that their changes be applied

to other copies 43.6% of the time.

[Figure 5.14 chart omitted. Rows (Cloud storage): Dropbox; Google Drive. Panels: Others’ changes → My copy; My changes → Others copies. Bars: Strongly agree, Agree, Neutral, Disagree, Strongly disagree; scale 0 to 100.]

Figure 5.14: Dropbox users were more likely to have the same version of shared files than Google Drive users (Others’ changes → My copy: χ²(4, N = 327) = 49.3, p < .001; My changes → Others copies: χ²(4, N = 327) = 9.88, p = .04).

[Figure 5.15 chart omitted. Rows (Ownership): Owner; Editor; Viewer. Panels: Others’ changes → My copy; My changes → Others copies. Bars: Strongly agree, Agree, Neutral, Disagree, Strongly disagree; scale 0 to 100.]

Figure 5.15: If a participant owned the file in question, they were less likely to accept the changes of others and were more likely to want their changes to be reflected in the shared copies (Others’ changes → My copy: χ²(8, N = 327) = 23.07, p = .003; My changes → Others copies: χ²(4, N = 327) = 14.43, p = .07).

[Figure 5.16 chart omitted. Rows (Shared with): All of them; None; Don’t know. Panels: Others’ changes → My copy; My changes → Others copies. Bars: Strongly agree, Agree, Neutral, Disagree, Strongly disagree; scale 0 to 100.]

Figure 5.16: If a participant was the original sharer of the file in question, they were less likely to accept the changes of others and more likely to want their changes reflected in the shared copies (Others’ changes → My copy: χ²(12, N = 327) = 33.64, p < .001; My changes → Others copies: χ²(12, N = 327) = 43.41, p < .001).

We asked whether the participant originally shared the file or not. If the participant

was the original sharer of the file, they preferred that changes be reflected in their copy for

32.48% of files and that their changes be applied to other copies for 53.5% of files. However,

if the participant was the recipient of the file, they preferred that changes be reflected in

their copy for 44.09% of files and that their changes be applied to other copies for 28.34%

of files.

The sharing method also impacted these decisions. For 47.17% of e-mail-based and 21.74%

of link-based shared files, participants preferred to receive edits. For 46.22% of e-mail-based

and 38.26% of link-based shared files, participants preferred that their own edits be reflected

in someone else’s copy of the shared file.

[Figure 5.17 chart omitted. Rows (Sharing Method): E-mail; Link. Panels: Others’ changes → My copy; My changes → Others copies. Bars: Strongly agree, Agree, Neutral, Disagree, Strongly disagree; scale 0 to 100.]

Figure 5.17: Participants preferred to keep the same version of files for e-mail-based rather than link-based shared files (Others’ changes → My copy: χ²(4, N = 337) = 23.88, p < .001; My changes → Others copies: χ²(4, N = 337) = 3.40, p = .49).

We asked participants why their copy of the file should also reflect the changes of oth-

ers. 57.4% answered that they want to receive updates. P99 emphasized the importance of

updates: “modifications should absolutely be shown. better to have record of what has hap-

pened with a file than not to have record of it.” 11.9% answered that accepting the changes

of others was a natural part and even the point of collaboration. P18 replied, “It was a group

assignment, so it should be up to date with everyone’s information.” Trusting the work of

others was one reason why participants wanted to accept collaborator edits. P55 said, “I trust

that the changes they make are appropriate.” However, participants had different reasons

for not wanting to receive updates for other selected files. 40.3% indicated that they did not

care about updates. P58 answered, “I don’t need to see their changes or care about them.”

Also, 22.9% answered that they wanted to keep the original version of files. P75 replied, “I

want to remember it the way it was, rather than someone changing it.”

Also, we asked why other people’s copies of the file should also reflect a participant’s

changes. 51.0% answered that they wanted to receive updates. P33 mentioned that “Changes

to that file indicate changes to our budget, so I’d like to keep that up to date.” 10.4% answered

other cloud service members needed to accept their changes because they collaborated with

each other. P88 replied that “All other collaborators within a file or project should know

who is editing our project, including myself.” 9.4% answered that they were the owner of the

file in question. 6.3% answered that they trusted the changes of others. P55 answered that

“The other person would be OK with me making any changes to this file.” However, the

participants who did not want to impact the file copies of others had different answers.

22.8% said that they did not care about updates. P16 answered that “I don’t care what

happens to the files at this point.” 13.2% answered that they were not the owner of the file

in question. 11.4% stated that shared members could keep their own working copies. 9.6%

answered that receiving a participant’s updates should be the decision of another shared file

user. P17 answered that “I think others should make their own decision.”

5.7 File Automation

We suggested three types of file automation features that could be added in the future:

deleting, encrypting, and moving the data to low-energy archives. To investigate user

preferences, we asked the participants to indicate how much they agreed with each statement

of the form “It would be helpful if I could specify that a file on my cloud storage should be

automatically encrypted, deleted, or moved to an archive.” We classified participants as

supporting a file management automation feature if they selected “strongly agree” or “agree,”

and as not supporting it if they selected “neutral,” “disagree,” or “strongly disagree.”

Participants had a more positive attitude toward auto-encryption (72%) than auto-deletion

(32%) and auto-archiving (37%).
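The collapsing of five-point Likert responses into "support" versus "did not support" can be sketched as below. The function names and the sample responses are hypothetical and only illustrate the coding rule described above; they are not the study's data or analysis code.

```python
# Responses counted as support, following the coding rule described in the text.
SUPPORTING_RESPONSES = {"strongly agree", "agree"}


def supports_automation(response: str) -> bool:
    """True if a Likert response counts as support for an automation feature."""
    return response.strip().lower() in SUPPORTING_RESPONSES


def support_rate(responses) -> float:
    """Percentage of responses that count as support (0.0 for an empty list)."""
    responses = list(responses)
    if not responses:
        return 0.0
    supporters = sum(supports_automation(r) for r in responses)
    return 100.0 * supporters / len(responses)


# Hypothetical responses to one automation statement.
sample = ["Agree", "Neutral", "Strongly agree", "Disagree"]
print(f"{support_rate(sample):.1f}% support")
```

Note that under this rule "neutral" is grouped with disagreement, so the reported support rates are conservative with respect to undecided participants.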

Participants who supported auto-encryption answered that it offered better security (37.5%),

and it was easy to use (26.4%). P34 mentioned “that [he] wouldn’t have to implement [his]

own encryption solution.” However, 13.9% answered that having auto-encryption would be

useful whether they would use it or not. P65 stated, “I don’t know when I would need this

feature, but it would always be nice to know it was there when I do.” To know which fac-

tors should be considered for auto-encryption, we also asked participants to identify files or

folders in their account that should be automated. 36.1% of participants answered that a

“pre-defined rule” should be used for auto-encryption. P40 wanted a radio button that could

be used for auto-encryption: “I’d want to tag it with a radio button (in the shape of a lock,

maybe?) mouseover of course would explain that it would encrypt the file. You could also

have a section in settings that scans the filetype and automatically encrypts based on what

filetype you specify (such as, *.mp3, *.wav, *.pdf).” Moreover, 20.8% of participants replied

that “specific file type” should be considered for auto-encryption. P7 said, “Maybe by asking

first if you would like this ”type” of file to be automatically encrypted let’s say the file type

was a “Package” file. Or it was a JPG file, or something like that and then it could just

automatically assume you’d want that package file to be encrypted.”
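As a rough sketch of the file-type rule that P40 and P7 describe, the snippet below checks file names against user-specified patterns such as `*.mp3`, `*.wav`, and `*.pdf`. It is a hypothetical illustration of a pre-defined rule, not a feature of Dropbox or Google Drive, and the function name is an assumption.

```python
from fnmatch import fnmatchcase


def should_auto_encrypt(filename: str, patterns) -> bool:
    """Return True if the file name matches any user-specified pattern.

    Matching is made case-insensitive by lowercasing both sides, so that
    "Report.PDF" matches "*.pdf".
    """
    name = filename.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)


# Patterns taken from P40's example; the file names are hypothetical.
user_patterns = ["*.mp3", "*.wav", "*.pdf"]
for name in ["Taxes-2016.PDF", "notes.txt", "demo.wav"]:
    print(name, should_auto_encrypt(name, user_patterns))
```

A rule keyed only on file type is simple to explain to users, although the regression results above suggest that file type alone did not predict which files participants wanted to encrypt.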

Participants who supported auto-deletion answered that auto-deletion would delete junk

(43.7%) and free up space (18.8%). P34 said, “Sometimes I put things in my Dropbox

temporarily and then after I use the, I forget they’re there. I would like to set some of these

files to delete automatically after a set amount of time.” However, there was no significant

factor that could be applied for auto-deletion.

Many of our participants expressed concern about unintentional deletion. 37.5% of par-

ticipants answered that they wanted to decide whether they deleted the files or not. P24

answered, “I want to decide if I want to get rid of certain things.” Moreover, 25.0% of par-

ticipants worried about accidental deletion. P16 mentioned that “you might accidentally put

something in there you need.”

28.9% of participants who supported auto-archiving answered that auto-archiving would

be helpful to “save energy.” P1 stated, “I would like to be able to help save energy with things

I haven’t used in a long time but might need later.” 28.9% answered that auto-archiving

would “save space.” P95 mentioned that auto-archiving freed up space while remaining reversible:

“This automatically free up space but it is also non-permanent. I still have the photos.” To

know which factors should be considered for auto-archiving, we asked participants to identify

files or folders in their account that should be automated. 47.3% of participants answered

that “time” should be a factor for things to be considered for auto-archiving. P17 answered,

“If it could be identified by year. Everything before a certain period of time could be archived.

After 2 years, send it to archive.”
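P17's time-based rule could look roughly like the following sketch. The two-year threshold comes from the quote; everything else (the function name, approximating a year as 365 days, and using the last-modified timestamp as the reference point) is an assumption made for illustration.

```python
from datetime import datetime, timedelta

# "After 2 years, send it to archive": a year is approximated as 365 days here.
ARCHIVE_AFTER = timedelta(days=2 * 365)


def should_archive(last_modified: datetime, now: datetime,
                   threshold: timedelta = ARCHIVE_AFTER) -> bool:
    """True if the file has been untouched for at least the threshold."""
    return now - last_modified >= threshold


# Hypothetical timestamps.
now = datetime(2017, 11, 1)
print(should_archive(datetime(2015, 1, 1), now))   # older than two years
print(should_archive(datetime(2017, 1, 1), now))   # modified this year
```

Because archiving is reversible, a rule like this carries less risk than the same rule applied to deletion, which matches the concern about unintentional deletion that participants raised.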

Participants who had technical backgrounds were more likely to support file automation.

Among participants with technical backgrounds, 83.3% answered that auto-encryption was

useful, 36.7% supported auto-deletion, and 46.7% supported auto-archiving. Among

non-technical participants, 68.0% supported auto-encryption, 30.3% supported auto-deletion,

and 33.3% supported auto-archiving.

[Figure 5.18 chart omitted. Rows (Auto-archiving): Support; Do not support. Bars (Delay Tolerance): No delay, Minutes, Hours, Days; scale 0 to 100.]

Figure 5.18: Comparison of auto-archiving and delay tolerance. Participants who support auto-archiving tolerate a longer delay before a file is retrieved (χ²(3, N = 534) = 38.31, p < .001).

Also, based on attitudes toward auto-archiving, the tolerance for how long it takes to

retrieve a file changed. If participants thought that auto-archiving was useful, they could

accept a longer delay, but only up to a few minutes (57.6%).

CHAPTER 6

DISCUSSION AND LIMITATIONS

6.1 Discussion

Our participants had forgotten that a high proportion of the files they saw in our study were

stored in the cloud, yet many participants wanted to delete or encrypt at least one of those

files. Further, participants did not even recognize 13.5% of the files they saw, and wanted

to delete or encrypt 83.6% of these unrecognized files. These combined results highlight the

need for retrospective file management mechanisms in the cloud.

Some retrospection tools already exist in other domains. For instance, Facebook has

an “on this day” feature to highlight an old post, though this mechanism is focused on

resharing. Whereas Facebook’s feature is meant to drive reminiscence and engagement, our

results suggest that cloud users also need such retrospective mechanisms to remind them of

forgotten files, particularly those likely to arouse privacy concerns.

While automated retrospective file management mechanisms would be helpful, we did not

find many significant predictors in our regression models. Basic file metadata and information

about the participant alone were not enough to predict the file management decision.

Content clustering, closer interaction with users during the discovery process, and deeper

analyses of file contents, not just metadata, might enable better predictions on the way to

automated file management.

6.2 Limitations

A core limitation of our study is that we report on a convenience sample. Our participants

may not represent the typical user of cloud storage services, particularly since Mechanical

Turk workers tend to be more technically oriented than the population at large. Furthermore,

prospective participants with particularly sensitive files stored in the cloud might be reluctant


to participate since they needed to give our software OAuth permissions to access their files.

That said, even among individuals who were willing to participate, we observed many files

participants would want to delete or encrypt.

Our study focused on Dropbox and Google Drive, which are only two of the many cloud

storage services available, albeit the two most popular. We had an unequal distribution of

Dropbox and Google Drive participants in our sample. A more comparably sized sample of

the two services would provide a more accurate point of comparison.

In this research, we did not include online document-creation services in our analysis.

However, these services accelerate collaboration by allowing co-editing, a

feature that sets their documents apart from locally stored files that are merely shared. For example,

documents created in Google Drive do not consume local storage space, yet they can be modified frequently.

An additional comparison of files generated by these web-based editing tools would have helped us develop more

comparable insights across the two cloud storage platforms.


CHAPTER 7

CONCLUSION AND FUTURE WORK

7.1 Conclusion

By investigating participant perspectives on a stratified sample of files stored in their own

Google Drive or Dropbox account, we built a better understanding of the contents of cloud

storage accounts, identifying latent needs for retrospective file management tools. We used a

stratified sample to measure a broad cross-section of files users retain in their cloud storage

accounts, rather than focusing on the files most likely to arouse security and privacy concerns

(e.g., files named “taxreturn2017.pdf” or that contain saved passwords). Even so, we found

that 83% of participants wanted to permanently delete at least one file from this sample

of ten. This result highlights the disconnect between the desired file management decisions

of our participants and the high overhead of retrospectively managing thousands of files

in a cloud storage account. Thus, our results highlight the need for retrospective privacy

mechanisms that empower users to manage the risks latent in their file archives without

expending unreasonable effort.

7.2 Future Work

According to our research, the average number of files stored on each participant’s cloud

account was 444.5, and 46.6% of files were forgotten after they were stored in the

cloud. Thus, managing these files is demanding work for users. We believe our

user-centered research can help people manage their files more easily, particularly

by informing the development of a predictive model. Because it is infeasible to ask users

to retrospectively revisit all of their previous files, our survey data can be used to build a

predictive model of which files might be safely deleted, automatically encrypted, or moved

to cold storage. Predictive models could combine techniques from machine learning with


insights drawn from human-computer interaction work concerning user security and

privacy personas [22]. Based on this latter body of work, we expect that users can naturally

be categorized into a small set of different approaches to data management (e.g., those who

favor deletion, those who hoard files in cold storage, etc.). A predictive model could combine

a deep understanding of a user’s preferred mode of archive management with the specific

management decisions already made for certain files. After the user makes a few

representative file management decisions, these more advanced methods might be able to partially

automate file management.
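
One way to picture such partial automation is a nearest-neighbor rule that copies the decision from the most similar file the user has already labeled. The sketch below is a hypothetical illustration, not a model we built or evaluated. The features (file age in years, whether the file is shared), the labels, and all names are assumptions, and our regression results suggest that metadata alone may predict poorly.

```python
def predict_decision(labeled, features):
    """Return the decision of the labeled file closest to `features`.

    `labeled` is a list of (features, decision) pairs, where features are
    equal-length tuples of numbers. Ties go to the earliest labeled file.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, decision = min(labeled, key=lambda pair: sq_dist(pair[0], features))
    return decision


# Hypothetical decisions a user already made:
# (age in years, is_shared) -> chosen action.
labeled = [
    ((0.2, 1), "keep"),
    ((4.0, 0), "delete"),
    ((2.5, 1), "encrypt"),
]
print(predict_decision(labeled, (3.8, 0)))  # closest labeled file is the 4-year-old one
```

In practice the features would need scaling, and a suggestion like this would be shown to the user for confirmation rather than applied automatically.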

Moreover, we did not fully investigate file sharing practices, even though file sharing

foregrounds several security and privacy issues. Rader reveals that users almost exclusively touch

files they have created themselves and are particularly reluctant to delete files that could

be useful to someone else in the future. Such behavior can result in clutter and frustrate

users [46]. Even though only a few participants wanted to stop sharing, keeping every shared

file in cloud storage eventually creates file management problems. In our study, the

median percentage of shared files was 34.9%. Thus, to develop a better file management system,

we need to conceptualize user preferences concerning sharing decisions and file versioning.

Lastly, we expect that studying online document creation services would be greatly

helpful for understanding current cloud storage practices. Although we did not include

online document creation tools in our research, they have many implications for future research.

Many people currently use online document creation tools, yet to our knowledge this area has not

been studied. We need to investigate these tools to gain

better insight into the current state of cloud storage.


REFERENCES

[1] John Robert Anderson. 1985. Cognitive psychology and its implications. A series of books in psychology. (1985).

[2] Ibrahim Arpaci, Kerem Kilicer, and Salih Bardakci. 2015. Effects of security and privacy concerns on educational use of cloud services. Computers in Human Behavior 45 (2015), 93–98.

[3] Oshrat Ayalon and Eran Toch. 2013. Retrospective privacy: Managing longitudinal privacy in online social networks. In Proc. 9th Symposium on Usable Privacy and Security. ACM, 4.

[4] Taiwo Ayodele, Galyna Akmayeva, and Charles A Shoniregun. 2012. Machine learning approach towards email management. In Internet Security (WorldCIS), 2012 World Congress on. IEEE, 106–109.

[5] Olle Balter. 1997. Strategies for organizing email messages. HCI 1997 (1997), 21–38.

[6] Deborah Barreau and Bonnie A Nardi. 1995. Finding and reminding: file organization from the desktop. ACM SigChi Bulletin 27, 3 (1995), 39–43.

[7] Deborah K Barreau. 1995a. Context as a factor in personal information management systems. Journal of the American Society for Information Science 46, 5 (1995), 327.

[8] Deborah K Barreau. 1995b. Context as a factor in personal information management systems. Journal of the American Society for Information Science 46, 5 (1995), 327.

[9] Lujo Bauer, Lorrie Faith Cranor, Saranga Komanduri, Michelle L Mazurek, Michael K Reiter, Manya Sleeper, and Blase Ur. 2013. The post anachronism: The temporal dimension of Facebook privacy. In Proc. 12th ACM Workshop on privacy in the electronic society. ACM, 1–12.

[10] Victoria Bellotti, Nicolas Ducheneaut, Mark Howard, Ian Smith, and Christine Neuwirth. 2002. Innovation in extremis: evolving an application for the critical work of email and information management. In Proc. 4th conference on Designing interactive systems: processes, practices, methods, and techniques. ACM, 181–192.

[11] Ofer Bergman, Richard Boardman, Jacek Gwizdka, and William Jones. 2004. Personal information management. In Proc. CHI. ACM, 1598–1599.

[12] Richard Boardman and M Angela Sasse. 2004. Stuff goes into the computer and doesn’t come out: a cross-tool study of personal information management. In Proc. CHI. ACM, 583–590.

[13] Richard Boardman, Robert Spence, and M Angela Sasse. 2003. Too many hierarchies? The daily struggle for control of the workspace. In Proc. HCI international, Vol. 1. 616–620.


[14] Richard Peter Boardman. 2004. Improving tool support for personal information management. Ph.D. Dissertation. University of London.

[15] Robert Capra, Emily Vardell, and Kathy Brennan. 2014. File synchronization and sharing: User practices and challenges. Proc. ASIS&T 51, 1 (2014).

[16] Richard Chow, Philippe Golle, Markus Jakobsson, Elaine Shi, Jessica Staddon, Ryusuke Masuoka, and Jesus Molina. 2009. Controlling data in the cloud: outsourcing computation without outsourcing control. In Proc. 2009 ACM workshop on Cloud computing security. ACM, 85–90.

[17] Jason W Clark, Peter Snyder, Damon McCoy, and Chris Kanich. 2015. I Saw Images I Didn’t Even Know I Had: Understanding User Perceptions of Cloud Storage Privacy. In Proc. CHI. ACM, 1641–1644.

[18] Edward Cutrell, Daniel Robbins, Susan Dumais, and Raman Sarin. 2006. Fast, flexible filtering with phlat. In Proc. CHI. ACM, 261–270.

[19] Mary Czerwinski and Eric Horvitz. 2002. An investigation of memory for daily computing events. People and Computers (2002), 229–246.

[20] Idilio Drago, Marco Mellia, Maurizio M Munafo, Anna Sperotto, Ramin Sadre, and Aiko Pras. 2012. Inside dropbox: understanding personal cloud storage services. In Proc. 2012 ACM conference on Internet measurement conference. ACM, 481–494.

[21] Susan Dumais, Edward Cutrell, Jonathan J Cadiz, Gavin Jancke, Raman Sarin, and Daniel C Robbins. 2016. Stuff I’ve seen: a system for personal information retrieval and re-use. In ACM SIGIR Forum, Vol. 49. ACM, 28–35.

[22] Janna Lynn Dupree, Richard Devries, Daniel M. Berry, and Edward Lank. 2016. Privacy Personas: Clustering Users via Attitudes and Behaviors toward Security Practices. In Proc. CHI.

[23] David Elsweiler, Ian Ruthven, and Christopher Jones. 2007. Towards memory supporting personal information management tools. Journal of the Association for Information Science and Technology 58, 7 (2007), 924–946.

[24] Eric Freeman and David Gelernter. 1996. Lifestreams: A storage model for personal data. ACM SIGMOD Record 25, 1 (1996), 80–86.

[25] Global Industry Analysts. Inc. Accessed 2017. personal cloud-a global strategic business report. http://www.strategyr.com/MarketResearch/PersonalC loudMarketT rends.asp. (Accessed 2017).

[26] Glauber Goncalves, Idilio Drago, Ana Paula Couto Da Silva, Alex Borges Vieira, and Jussara M Almeida. 2014. Modeling the dropbox client behavior. In Proc. ICC.


[27] Raul Gracia-Tinedo, Pedro García-Lopez, Alberto Gomez, and Anastasio Illana. 2016. Understanding data sharing in private personal clouds. In Cloud Computing (CLOUD), 2016 IEEE 9th International Conference on. IEEE, 392–399.

[28] Graham Cluley. Accessed 2017. Dropbox users leak tax returns, mortgage applications and more. https://www.grahamcluley.com/dropbox-box-leak/. (Accessed 2017).

[29] Jane Gruning and Sian Lindley. 2016. Things We Own Together: Sharing Possessions at Home. In Proc. CHI. ACM, 1176–1186.

[30] Drew Houston and Arash Ferdowsi. 2016. Celebrating half a billion users. https://blogs.dropbox.com/dropbox/2016/03/500-million/. (2016).

[31] Wenjin Hu, Tao Yang, and Jeanna N Matthews. 2010. The good, the bad and the ugly of consumer cloud storage. ACM SIGOPS Operating Systems Review 44, 3 (2010), 110–115.

[32] Iulia Ion, Niharika Sachdeva, Ponnurangam Kumaraguru, and Srdjan Capkun. 2011. Home is safer than the cloud!: privacy concerns for consumer cloud storage. In Proc. 7th Symposium on Usable Privacy and Security. ACM, 13.

[33] Eric Johnson. 2017. Lost in the Cloud: Cloud Storage, Privacy, and Suggestions for Protecting Users’ Data. Stan. L. Rev. 69 (2017), 867.

[34] William Jones, Charles F Munat, Harry Bruce, and Austin Foxley. 2005. The universal labeler: Plan the project and let your information follow. Proc. Association for Information Science and Technology 42, 1 (2005).

[35] Victor Kaptelinin. 2003. UMEA: translating interaction histories into project contexts. In Proc. CHI. ACM, 353–360.

[36] Beom Heyn Kim, Wei Huang, and David Lie. 2012. Unity: secure and durable personal cloud storage. In Proc. 2012 ACM Workshop on Cloud computing security. ACM, 31–36.

[37] Felix Kollmar. 2017. Cloud Storage Report 2017. https://blog.cloudrail.com/cloud-storage-report-2017/. (2017).

[38] Aparna Krishnan and Steve Jones. 2005. TimeSpace: activity-based temporal visualisation of personal information spaces. Personal and Ubiquitous Computing 9, 1 (2005), 46–65.

[39] Mark W Lansdale. 1988. The psychology of personal information management. Applied ergonomics 19, 1 (1988), 55–66.

[40] Cathy Marshall and John C Tang. 2012. That syncing feeling: early user experiences with the cloud. In Proc. Designing Interactive Systems Conference. ACM, 544–553.


[41] Charlotte Massey, Thomas Lennig, and Steve Whittaker. 2014. Cloudy forecast: an exploration of the factors underlying shared repository use. In Proc. CHI.

[42] Peter Mell, Tim Grance, and others. 2011. The NIST definition of cloud computing. (2011).

[43] Adriana Mijuskovic and Mexhid Ferati. 2015. User awareness of existing privacy and security risks when storing data in the cloud. In Proc. International Conference on e-Learning, Vol. 15. 268–273.

[44] Mainack Mondal, Johnnatan Messias, Saptarshi Ghosh, Krishna P Gummadi, and Aniket Kate. 2017. Longitudinal Privacy Management in Social Media: The Need for Better Controls. IEEE Internet Computing 21, 3 (2017), 48–55.

[45] Michael Nebeling, Matthias Geel, Oleksiy Syrotkin, and Moira C Norrie. 2015. MUBox: Multi-User Aware Personal Cloud Storage. In Proc. CHI.

[46] Emilee Rader. 2009. Yours, mine and (not) ours: social influences on group information repositories. In Proc. CHI. ACM, 2095–2098.

[47] Emilee Rader. 2010. The effect of audience design on labeling, organizing, and finding shared files. In Proc. CHI. ACM, 777–786.

[48] Kopo Marvin Ramokapane, Awais Rashid, and Jose Such. 2017. “I feel stupid I can’t delete...”: a study of users’ cloud deletion practices and coping strategies. In Proc. 13th Symposium on Usable Privacy and Security. ACM.

[49] Esther Schindler. Accessed 2017. Cloud development survey. Evans Data Corporation, Strategic Reports, July 2010. https://evansdata.com/reports/viewRelease.php?reportID=27. (Accessed 2017).

[50] Manya Sleeper, William Melicher, Hana Habib, Lujo Bauer, Lorrie Faith Cranor, and Michelle L Mazurek. 2016. Sharing personal content online: Exploring channel choice and multi-channel behaviors. In Proc. CHI.

[51] Peter Snyder and Chris Kanich. 2013. Cloudsweeper: enabling data-centric document management for secure cloud archives. In Proc. 2013 ACM workshop on Cloud computing security. ACM, 47–54.

[52] Luke Stark and Matt Tierney. 2014. Lockbox: mobility, privacy and values in cloud storage. Ethics and Information Technology 16, 1 (2014), 1–13.

[53] Simone Stumpf and Jon Herlocker. 2006. Tasktracer: Enhancing personal information management through machine learning. 2nd Invitational Workshop on Personal Information Management at SIGIR 2006 (2006), 105.

[54] Nabil Ahmed Sultan. 2011. Reaching for the ‘cloud’: How SMEs can manage. International journal of information management 31, 3 (2011), 272–278.


[55] Amy Voida, Judith S Olson, and Gary M Olson. 2013. Turbulence in the clouds: challenges of cloud-based information work. In Proc. CHI. ACM, 2273–2282.

[56] Stephen Voida, W Keith Edwards, Mark W Newman, Rebecca E Grinter, and Nicolas Ducheneaut. 2006. Share and share alike: exploring the user interface affordances of file sharing. In Proc. CHI. ACM, 221–230.

[57] James Wen. 2003. Post-valued recall web pages: User disorientation hits the big time. It & Society 1, 3 (2003), 184–194.

[58] Steve Whittaker, Victoria Bellotti, and Jacek Gwizdka. 2006. Email in personal information management. Commun. ACM 49, 1 (2006), 68–73.

[59] Steve Whittaker and Candace Sidner. 1996. Email overload: exploring personal information management of email. In Proc. CHI. ACM, 276–283.

[60] Jinsheng Xu, Jinghua Zhang, T Harvey, and J Young. 2008. A survey of asynchronous collaboration tools. Information Technology Journal 7, 8 (2008), 1182–1187.

[61] Hong Zhang and Michael Twidale. 2012. Mine, yours and ours: using shared folders in personal information management. Personal Information Management (2012).

[62] Xuan Zhao, Niloufar Salehi, Sasha Naranjit, Sara Alwaalan, Stephen Voida, and Dan Cosley. 2013. The many faces of Facebook: Experiencing social media as performance, exhibition, and personal archive. In Proc. CHI. ACM, 1–10.

[63] Diao Zhe, Wang Qinghong, Su Naizheng, and Zhang Yuhan. 2017. Study on Data Security Policy Based on Cloud Storage. In Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), 2017 IEEE 3rd International Conference on. IEEE, 145–149.

[64] Jianying Zhou. 2014. On the security of cloud data storage and sharing. In Proc. 2nd international workshop on Security in cloud computing. ACM, 1–2.


APPENDIX A

SURVEY INSTRUMENT

A.1 General question

G1 For approximately how long have you had the Cloud Storage account you are using for this study?
◦ Less than 1 year (1)
◦ At least 1 year, but less than 2 years (2)
◦ At least 2 years, but less than 3 years (3)
◦ At least 3 years, but less than 4 years (4)
◦ At least 4 years, but less than 5 years (5)
◦ More than 5 years (6)

G1-1 Cloud storage providers offer both free accounts and paid accounts, where the latter offers more storage space. Do you use a free Cloud Storage account or a paid Cloud Storage account?
◦ Free account (1)
◦ Paid account (2)
◦ I’m not sure (3)

Display This Question: If Cloud storage providers offer both free accounts and paid accounts, where the latter offers more storage space. Do you use a free Cloud Storage account or a paid Cloud Storage account? Paid account Is Selected
G1-2 How much do you pay per month?

G2 How often do you use this Cloud Storage account for work or school purposes?
◦ At least once a week (1)
◦ At least once a month, but less than once a week (2)
◦ At least once a year, but less than once a month (3)
◦ Less than once a year, but sometimes (4)
◦ I do not use it for work or school purposes (5)


G3 How often do you use this Cloud Storage account for personal purposes (i.e., for purposes other than for work or school)?
◦ At least once a week (1)
◦ At least once a month, but less than once a week (2)
◦ At least once a year, but less than once a month (3)
◦ Less than once a year, but sometimes (4)
◦ I do not use it for personal purposes (5)

G4 I use this Cloud Storage account for the following purposes: (Check all that apply)
□ Collaborating with co-workers, classmates, or professional contacts by jointly creating and editing files (1)
□ Collaborating with friends and family by jointly creating and editing files (2)
□ Sharing files that I have created with co-workers, classmates, or other professional contacts (3)
□ Sharing files that I have created with family and friends (4)
□ Backing up files related to my job, school, or career (5)
□ Backing up files that are not related to my job, school, or career (6)
□ Other (7)

G5 There are multiple ways you can access files in your Cloud Storage account. One of these ways is by installing Cloud Storage software on your computer so that certain folders are automatically synced with your Cloud Storage account. How often do you access (view or edit) files or folders on your computer that are automatically synced with your Cloud Storage account?
◦ Daily or more frequently (1)
◦ Every few days (2)
◦ Weekly (3)
◦ Monthly (4)
◦ Less than once a month, but sometimes (5)
◦ Never (6)

G6 Another way to access files in your Cloud Storage account is by using a web browser like Chrome, Firefox, or Safari to log into the Cloud Storage website. How often do you log into the Cloud Storage website using this account?
◦ Daily or more frequently (1)
◦ Every few days (2)
◦ Weekly (3)
◦ Monthly (4)
◦ Less than once a month, but sometimes (5)
◦ Never (6)


G7 Yet another way to access files in your Cloud Storage account is by using an app on your smartphone (iPhone or Android). How often do you use a smartphone app to access files or folders stored in this Cloud Storage account?
◦ Daily or more frequently (1)
◦ Every few days (2)
◦ Weekly (3)
◦ Monthly (4)
◦ Less than once a month, but sometimes (5)
◦ Never, though I do use a smartphone (6)
◦ Never; I do not use a smartphone (7)

The following two questions concern the following distinction:
A file stored locally is accessible on your computer (i.e., stored on the hard drive) even if you are not connected to the Internet.
A file stored in the cloud is accessible only if you are connected to the Internet.
Note that a given file might be stored both locally and in the cloud.

G8-1 Which statement best describes your current situation regarding cloud files?
◦ All of my cloud files are also stored locally on my computer (1)
◦ Most of my cloud files are also stored locally on my computer (2)
◦ Some of my cloud files are also stored locally on my computer (3)
◦ None of my cloud files are also stored locally on my computer (4)

G8-2 Which statement best describes your current situation regarding files stored locally?
◦ All of my locally stored files are also accessible in the cloud via Cloud Storage (1)
◦ Most of my locally stored files are also accessible in the cloud via Cloud Storage (2)
◦ Some of my locally stored files are also accessible in the cloud via Cloud Storage (3)
◦ None of my locally stored files are also accessible in the cloud via Cloud Storage (4)

G9 On average, how often do you run out of storage space on your Cloud Storage account?
◦ I am almost always out of storage space (1)
◦ At least once a month (2)
◦ At least once a year, but less than once a month (3)
◦ Less than once a year, but sometimes (4)
◦ I have never run out of storage space (5)
◦ I don’t know (6)


G10 On average, how often do you organize your Cloud Storage by deleting unnecessary files, moving files to different folders, or performing similar clean-up tasks?
◦ At least once a week (1)
◦ At least once a month, but less than once a week (2)
◦ At least once a year, but less than once a month (3)
◦ Less than once a year, but sometimes (4)
◦ I have never organized my Cloud Storage (5)
◦ I don’t know (6)

G11 Overall, which of the following cloud services do you use? (Check all that apply)
□ Amazon Cloud Drive (1)
□ Apple iCloud (2)
□ Box (3)
□ Dropbox (4)
□ Google Drive (5)
□ Microsoft OneDrive (6)
□ SpiderOak One (7)
□ Other (8)


A.2 Content specific question

CF-1 After looking at this file, do you know what it is? (Note: You might not know what it is if the file was automatically created or automatically saved to your cloud storage.)
◦ Yes (1)
◦ No (2)

Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f· · · Yes Is Selected
CF-11 Prior to this survey, I remembered that this file was stored on any device or service I use.
◦ Strongly Agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f· · · Yes Is Selected
CF-12 Prior to this survey, I remembered that this file was stored in my Cloud Storage.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f· · · Yes Is Selected
CF-13 As far as you can remember, why did you originally store this file on Cloud Storage?

Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f· · · Yes Is Selected
CF-14 As far as you can remember, when did you originally store this file on Cloud Storage?
◦ Within the last week (1)
◦ At least a week ago, but less than a month ago (2)
◦ At least a month ago, but less than a year ago (3)
◦ At least a year ago, but less than five years ago (4)
◦ At least five years ago (5)
◦ I don’t know (6)
◦ As far as I remember, I did not store it on Cloud Storage (7)


CF-2 Which of these statements best characterizes what you would like to happen to this file?
◦ I would like to keep this file stored as-is in my Cloud Storage. (1)
◦ I would like to keep only an encrypted version of this file in my Cloud Storage. (2)
◦ I would like to delete this file from my Cloud Storage. (3)

Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f· · · Yes Is Selected
CF-15 As far as you remember, when is the last time you accessed (viewed or modified) this file?
◦ Less than a week ago (1)
◦ Over 1 week ago, but less than 1 month ago (2)
◦ Over 1 month ago, but less than 1 year ago (3)
◦ Over 1 year ago, but less than 5 years ago (4)
◦ Over 5 years ago (5)
◦ As far as I know, I have never accessed this file (6)
◦ I don’t remember (7)

Display This Question: If After looking at this file, do you know what it is? (Note: You might not know what it is if the f· · · Yes Is Selected
CF-16 When do you next expect to access (view or modify) this file in the future?
◦ Within the next week (1)
◦ Over 1 week from now, but less than 1 month from now (2)
◦ Over 1 month from now, but less than 1 year from now (3)
◦ Over 1 year from now, but less than 5 years from now (4)
◦ Over 5 years from now, but eventually (5)
◦ Never (6)


Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to keep this file stored as-is in my Cloud Storage. Is Selected
Or Which of these statements best characterizes what you would like to happen to this file? I would like to keep only an encrypted version of this file in my Cloud Storage. Is Selected
CF-21 Files could potentially be stored in the cloud in a way that saves energy. However, this would mean that the file could only be accessed with some delay, rather than instantaneously. When I next try to access this file· · ·
◦ · · · no delay in the file being available is acceptable (1)
◦ · · · a delay of up to a few minutes in being able to access the file is acceptable (2)
◦ · · · a delay of up to a few hours in being able to access the file is acceptable (3)
◦ · · · a delay of up to a few days in being able to access the file is acceptable (4)

Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to keep only an encrypted version of this file in my Cloud Storage. Is Selected
CFE-1 It is important to me that this file is encrypted, rather than remaining as-is in my Cloud Storage.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to delete this file from my Cloud Storage. Is Selected
CFD-1 It is important to me that this file is deleted, rather than remaining as-is in my Cloud Storage.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to keep this file stored as-is in my Cloud Storage. Is Selected
CFA-1 Why would you want to continue storing this file as-is on Cloud Storage?


Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to keep only an encrypted version of this file in my Cloud Storage. Is Selected
CFE-2 Why would you want to keep an encrypted version of this file on Cloud Storage?

Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to delete this file from my Cloud Storage. Is Selected
CFD-2 Why would you want to delete this file from Cloud Storage?

CF-3 It is important to me to keep this file safe from unauthorized access.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

CF-4 It is important to me that I never lose the ability to access this file.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

CF-5 As far as you know, do you have a copy of this file on any other device or service you use?
◦ Yes, I have another copy of the file somewhere (1)
◦ No, I do not have any other copies of this file (2)
◦ I’m not sure (3)

Display This Question: If Which of these statements best characterizes what you would like to happen to this file? I would like to delete this file from my Cloud Storage. Is Selected
CFD-3 Which of the following two statements better describes what you would want to happen?
◦ Although I would like to delete this file from my Cloud Storage account, I would want to keep a copy of the file on a local device (e.g., my computer or smartphone) (1)
◦ I would like to delete this file from my Cloud Storage account, and I would not want to keep a copy of the file on any of my local devices (2)


CF-6 Are there any other files stored in your Cloud Storage account for which you would want to apply the same file-management decision (keep as-is, encrypt, delete) as for this file?
◦ Yes (1)
◦ No (2)
◦ I’m not sure (3)

Display This Question: If Are there any other files stored in your Cloud Storage account for which you would want to· · · Yes Is Selected
CF-61 For what other files in your Cloud Storage would you want to apply the same file-management decision? Please describe those files using whatever language you use to think about them, rather than constraining yourself to the current Cloud Storage interface.

Display This Question: If Are there any other files stored in your Cloud Storage account for which you would want to· · · No Is Selected
CF-62 Why would you not want to apply the same file-management decision from this file to other files in your Cloud Storage?

CSP-1 For each of these people with whom the file is shared, indicate belowwhether you know who the person is.

Columns (left to right): I know who this is, and I have talked to them within the last year | I know who this is, but I have not talked to them in over a year | I do not know who this is

Member A ◦ ◦ ◦

Member B ◦ ◦ ◦

Member C ◦ ◦ ◦

CSP-2 For each of these people, indicate below whether you would want to keep sharing this particular file with that person, stop sharing this particular file with that person, or whether it doesn’t matter to you.

Columns (left to right): Definitely keep sharing | Doesn’t matter | Definitely stop sharing

Member A ◦ ◦ ◦

Member B ◦ ◦ ◦

Member C ◦ ◦ ◦


CSP-3 To your knowledge, were you the person who originally shared this filewith those people?◦ I am the person who shared this file with all of those people (1)◦ I am the person who shared this file with some, but not all, of those people (2)◦ I am not the person who shared this file with any of those people (3)◦ I don’t know (4)

Display This Question:If For each of these people, indicate below whether you would want to keep sharing thisparticular f· · · MamberA - Definitely keep sharing Is SelectedOr For each of these people, indicate below whether you would want to keep sharing thisparticular f· · · MamberB - Definitely keep sharing Is SelectedOr For each of these people, indicate below whether you would want to keep sharing thisparticular f· · · MamberC - Definitely keep sharing Is SelectedCSP-11 You indicated that you want to keep sharing this file with at least oneother person. Why do you want to keep sharing this file with them?

Display This Question:
If For each of these people, indicate below whether you would want to keep sharing this particular f· · · Member A - Definitely stop sharing Is Selected
Or For each of these people, indicate below whether you would want to keep sharing this particular f· · · Member B - Definitely stop sharing Is Selected
Or For each of these people, indicate below whether you would want to keep sharing this particular f· · · Member C - Definitely stop sharing Is Selected

CSP-12 You indicated that you want to stop sharing this file with at least one other person. Why do you want to stop sharing this file with them?

Display This Question:
If To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with all of those people Is Selected
Or To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with some, but not all, of those people Is Selected

CSP-31 If you remember, when did you first share this file with other people?
◦ Less than a week ago (1)
◦ Over 1 week ago, but less than 1 month ago (2)
◦ Over 1 month ago, but less than 1 year ago (3)
◦ Over 1 year ago, but less than 5 years ago (4)
◦ Over 5 years ago (5)
◦ I don’t know (6)


Display This Question:
If MemberA Is Not Equal to —
And If
If To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with all of those people Is Selected
Or To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with some, but not all, of those people Is Selected

CSP-32 (Optional) If you remember, why did you originally share this file with Member A?

Display This Question:
If MemberB Is Not Equal to —
And If
If To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with all of those people Is Selected
Or To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with some, but not all, of those people Is Selected

CSP-33 (Optional) If you remember, why did you originally share this file with Member B?

Display This Question:
If MemberC Is Not Equal to —
And If
If To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with all of those people Is Selected
Or To your knowledge, were you the person who originally shared this file with those people? I am the person who shared this file with some, but not all, of those people Is Selected

CSP-34 (Optional) If you remember, why did you originally share this file with Member C?

CSP-4 If anyone other than me changes (modifies or deletes) the file, my copy of the file should also reflect their changes.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

CSP-41 Why?


CSP-5 If I change (modify or delete) this file, other people’s copies of the file should also reflect my changes.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

CSP-51 Why?

CSI-1 To your knowledge, were you the person who created a shareable link for this file?
◦ Yes, I am the person who created the link for sharing this file (1)
◦ No, I am not the person who created the link for sharing this file (2)
◦ I don’t know (3)

Display This Question:
If To your knowledge, were you the person who created a shareable link for this file? Yes, I am the person who created the link for sharing this file Is Selected

CSI-11 To your knowledge, with how many people have you shared the link to access this file?
◦ No one other than yourself (1)
◦ 1 - 5 people (2)
◦ 6 - 10 people (3)
◦ 11 - 15 people (4)
◦ 16 - 20 people (5)
◦ More than 20 people (6)
◦ I don’t know (7)

CSI-2 Do you want to keep sharing this particular file with others using a link, stop sharing this particular file with others using a link, or does it not matter to you?
◦ Definitely keep sharing using a link (1)
◦ Doesn’t matter (2)
◦ Definitely stop sharing using a link (3)

Display This Question:
If Do you want to keep sharing this particular file with others using a link, stop sharing this part· · · Definitely keep sharing using a link Is Selected

CSI-21 You indicated that you want to keep sharing this file using a link. Why do you want to keep sharing this file?


Display This Question:
If Do you want to keep sharing this particular file with others using a link, stop sharing this part· · · Definitely stop sharing using a link Is Selected

CSI-22 You indicated that you want to stop sharing this file using a link. Why do you want to stop sharing this file?

Display This Question:
If To your knowledge, were you the person who created a shareable link for this file? Yes, I am the person who created the link for sharing this file Is Selected

CSI-12 If you remember, when did you set this file to be shared using a link?
◦ Less than a week ago (1)
◦ Over 1 week ago, but less than 1 month ago (2)
◦ Over 1 month ago, but less than 1 year ago (3)
◦ Over 1 year ago, but less than 5 years ago (4)
◦ Over 5 years ago (5)
◦ I don’t know (6)

Display This Question:
If To your knowledge, were you the person who created a shareable link for this file? Yes, I am the person who created the link for sharing this file Is Selected

CSI-13 (Optional) If you remember, why did you originally share this file using a link?

CSI-3 If anyone other than me changes (modifies or deletes) the file, my copy of the file should also reflect their changes.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

CSI-31 Why?

CSI-4 If I change (modify or delete) this file, other people’s copies of the file should also reflect my changes.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

CSI-41 Why?


A.3 Features and Demographics

Our last few questions cover features that Cloud Storage could possibly add in the future. Please respond to the following statements.

DE (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatically encrypted.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

Display This Question:
If (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Strongly agree Is Selected
Or (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Agree Is Selected

DE-11 Why would it be helpful?

Display This Question:
If (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Strongly agree Is Selected
Or (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Agree Is Selected

DE-12 How, if at all, could Cloud Storage identify files or folders in your account that should be automatically encrypted?

Display This Question:
If (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Disagree Is Selected
Or (Regardless of whether or not I would want to encrypt any of the files I saw in today’s study,) it would be helpful if I could specify that a file on my Cloud Storage should be automatic Strongly disagree Is Selected

DE-2 Why would it not be helpful?


DD It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

Display This Question:
If It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Strongly agree Is Selected
Or It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Agree Is Selected

DD-11 Why would it be helpful?

Display This Question:
If It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Strongly agree Is Selected
Or It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Agree Is Selected

DD-12 How, if at all, could Cloud Storage automatically identify files or folders in your account that should be automatically deleted?

Display This Question:
If It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Disagree Is Selected
Or It would be helpful if I could choose that certain files or folders would automatically and permanently delete themselves from my Cloud Storage account after a period of time I specify. Strongly disagree Is Selected

DD-2 Why would it not be helpful?


DA It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time I specify.
◦ Strongly agree (1)
◦ Agree (2)
◦ Neutral (3)
◦ Disagree (4)
◦ Strongly disagree (5)

Display This Question:
If It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Strongly agree Is Selected
Or It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Agree Is Selected

DA-11 Why would it be helpful?

Display This Question:
If It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Strongly agree Is Selected
Or It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Agree Is Selected

DA-12 How, if at all, could Cloud Storage automatically identify files or folders in your account that should be automatically moved to an energy-saving archive?

Display This Question:
If It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Disagree Is Selected
Or It would be helpful if I could specify that certain files or folders would automatically move to an archive (saving energy, but causing a delay when I try to access the file) after a period of time Strongly disagree Is Selected

DA-2 Why would it not be helpful?


DC (Optional) Do you have any other comments about anything in today’s survey?

DP-1 With what gender do you identify?
◦ Male (1)
◦ Female (2)
◦ Other (3)
◦ Prefer not to answer (4)

DP-2 Are you majoring in, or do you have a degree or job in, computer science, computer engineering, information technology, or a related field?
◦ Yes (1)
◦ No (2)
◦ Prefer not to answer (3)

DP-3 How old are you?

DP-4 What is your occupation?
