riffle: an efficient communication system with strong

Riffle: An Efficient Communication System with Strong

Anonymity

by

Young Hyun Kwon

B.S., University of Pennsylvania (2013)

ARCHNESMASSACHUSETTS tNSTITJTE

OF TECHNOLOLGY

JUL 0 7 2015

LIBRARIES

Submitted to the Department of Electrical Engineering and Computer Sciencein partial fulfillment of the requirements for the degree of

Master of Science in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2015

Massachusetts Institute of Technology 2015. All rights reserved.

Author ........Signature redacted

................Department of Electrical Engineering and Computer Science

May 20, 2015

0((Signature redacted

Certified by.............................................................Srini Devadas

Edwin Sibley Webster Professor of Electrical Engineering and Computer ScienceThesis Supervisor

Accepted by .................. Signature redactedI e L6t(i. Kolodziejski

Chair, Department Committee on Graduate Students

I

...

Riffle: An Efficient Communication System with Strong Anonymityby

Young Hyun Kwon

Submitted to the Department of Electrical Engineering and Computer Scienceon May 20, 2015, in partial fulfillment of the

requirements for the degree ofMaster of Science in Electrical Engineering and Computer Science

Abstract

Anonymous communication is an important part of democratic societies and freedom ofspeech. Whistleblowers, protest organizers, and, more broadly, anyone with controversialviewpoints have been using the limited form of anonymity the Internet provides to protecttheir privacy. Unfortunately, the basic anonymity the Internet guarantees is too weak toprotect their identities from even the weakest adversaries. As a result, more and more usershave adopted privacy enhancing technologies to protect themselves.

All existing anonymity systems, however, sacrifice anonymity for efficient communicationor vice-versa. Onion-routing achieves low latency, high bandwidth, and scalable anonymouscommunication, but is susceptible to traffic analysis attacks. Designs based on DC-Nets, onthe other hand, protect the users against traffic analysis attacks, but sacrifice bandwidth.Verifiable mixnets maintain strong anonymity with low bandwidth overhead, but suffer fromhigh computation overhead instead.

In this thesis, we present Riffle, a bandwidth and computation efficient communicationsystem with strong anonymity. Riffle consists of a small set of anonymity servers and alarge number of users, and guarantees anonymity as long as there exists at least one honestserver. Riffle uses a new hybrid verifiable shuffle technique and private information retrievalfor bandwidth- and computation-efficient anonymous communication. We have evaluatedRiffle in two different applications: file sharing and microblogging. Our evaluation showsthat Riffle can achieve a bandwidth of over 100KB/s per user in an anonymity set of 200users in the case of file sharing, and handle over 100,000 users with less than 10 secondlatency in the case of microblogging.

Thesis Supervisor: Srini DevadasTitle: Edwin Sibley Webster Professor of Electrical Engineering and Computer Science

3

Acknowledgments

First and foremost, I would like to thank my advisor, Srini Devadas, for his endless support

and guidance. Your energy and advice have guided me through the maze that is graduate

school thus far, and helped me grow immensely both as a person and a researcher. I especially

thank you for the amount of research freedom you have granted me, allowing me to explore

different areas of computer science without any restrictions. I really cannot imagine a better

advisor, and I look forward to our road ahead.I would like to thank my collaborators, Bryan Ford and David Lazar, for the long dis-

cussions we had, both technical and personal. This project was very much a rollercoasterride: almost daily, we hit highs when we found new ideas and directions, and, just hours

later, we hit lows when we realized new flaws in the system that broke our ideas entirely.If it were not for Bryan and David, I would have given up on this project a long time ago;

thank you for making the highs higher and lows not so low. As a result of this collaboration,I also found some new research directions I would like to pursue in the future; for that, Iam extremely grateful.

I have also been fortunate enough to find a research home with people of infinite academicinterests and capabilities. I would like to thank everyone in our group, especially ChrisFletcher, Ling Ren, Charles Herder, Mashael Al-Sabah, and Marten Van Dijk (who is now

a professor at University of Connecticut) for everything they have done for and with mesince I started at MIT. When I say something crazy, perhaps even stupid, they are the

people who find the silver-lining, and develop it into an interesting research idea. I love the

open-minded attitude of our group towards research, and I am excited to see what more wecan accomplish.

I also want to thank my undergraduate advisor, Andr6 DeHon. He took me under

his wing when I was just a lost sophomore in college, and helped me discover my passion

for computer security. Without him, not only would I not be pursuing a Ph.D., but Iwould probably not even be working on computer security. I would like to also thank

professors Jonathan Smith, Boon Thau Loo, and Sanjeev Khanna who have provided me

with invaluable advice on both personal and professional matters. Without their help, I am

honestly not sure where I would be.Thank you to all my MIT and non-MIT friends. I cannot mention everyone as the

list would be longer than the actual thesis, but I would like to say special thanks to (in

alphabetical order): Misha Borochin, Daniel Grier, Michal Grzadkowski, Nick Howarth, "G"Kamath, Twan Koolen, Jerry Li, Daniel Ramos, Po-An Tsai, Yang Yang Zhao, and Alex

Zheng. I need not say more than "You guys are the best."

Finally, the biggest thanks to my grandmother, my parents, my brother Fred, and my

better half Fannie Liu. They believe in me even when I find it hard to, and constantly

remind me why my life is so fantastic. Fannie, you are now starting graduate school, and Ihope I can repay some of the things you have done for me by doing the same for you!

5

Contents

1 Introduction 131.1 Motivation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131.2 Thesis Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Related Work 172.1 Proxy-Based Anonymity Systems . . . . . . . . . . . . . . . . . . . . . . . . . 172.2 Anonymity versus Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.3 Anonymity versus Computation . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3 Background 213.1 Verifiable Shuffle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2 Authenticated Encryption . . . . . ........................ 21

3.3 Riffle Deployment Model and Assumptions . . . . . . . . . . . . . . . . . . . . 22

4 Riffle Architecture 254.1 Riffle Protocol Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.2 Hybrid Verifiable Shuffle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.3 Private Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . 274.4 Accusation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.5 Bandwidth Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294.6 Security Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5 Applications 335.1 File Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5.1.1 Non-Bijective Requests and Uploads . . . . . . . . . . . . . . . . . . . 34

5.1.2 Bandwidth Overhead in File Sharing . . . . . . . . . . . . . . . . . . . 355.2 Microblogging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

6 Prototype Evaluation 376.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6.2.1 File sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376.2.2 Microblogging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

7 Discussion, Future Work, and Conclusion 417.1 D icussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

7.1.1 Alternate usage model . . . . . ... . . . . . . . . . . . . . . . . . . . . 41

7

7.1.2 M alicious Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417.1.3 Server to server bandwidth . . . . . . . . . . . . . . . . . . . . . . . . 427.1.4 Network churns and group changes . . . . . . . ... . . . . . . . . . . . 427.1.5 Intersection attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

7.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

8

List of Figures

3-1 Deployment model of Riffle . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4-1 Anonymous communication in Riffle . . . . . . . . . . . . . . . . . . . . . . . 26

5-1 Anonymous File Sharing Protocol . . . . . . . . . . . . . . . . . . . . . . . . . 34

6-1 Testbed topology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386-2 Average time taken to share a 300MB file and effective bandwidth with dif-

ferent group configurations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396-3 Breakdown of the total time taken to share two different size files for varying

numbers of clients with 3 servers. . . . . . . . . . . . . . . . . . . . . . . . . . 396-4 Average latency to share a microblogging post for different message sizes with

3 servers. ....... .............. .............. ...... ... 40

9

List of Tables

2.1 Trade-offs made by existing systems. . . . . . . . . . . . . . . . . . . . . . . . 17

3.1 Term inology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

11

Chapter 1

Introduction

1.1 Motivation

The right to remain anonymous is a fundamental right in a democratic society and is crucialfor freedom of speech [49]. Many whistleblowers, protest organizers, and, more broadly,anyone with controversial viewpoints have long been relying on the limited form of anonymitythe Internet provides to voice their ideas. Recently, anonymizing networks based on relayssuch as Tor [251 have been gaining popularity as a practical privacy enhancing technologyamong users seeking higher levels of privacy. However, such systems are susceptible to trafficanalysis attacks [44, 36] by powerful adversaries such as an authoritarian government or astate controlled ISP, and have recently been attacked by even weaker adversaries monitoringonly the users' traffic [32, 40, 15, 51].

There are two major branches of work that offer traffic analysis resistance even in thepresence of a powerful adversary. The first is the Dining-Cryptographer Networks (DC-Nets) [161, which offers information theoretically secure anonymous communication for itsusers as long as one other participant is honest. Dissent [53] improved upon DC-Nets bymoving to the anytrust model, where the network is organized as servers and clients, andguarantee anonymity as long as there exists one honest server. The second is verifiablemixnets, based on mix networks [18]. In this design, the mixes use verifiable shuffle [14, 28,37, 8] to permute the ciphertexts, and produce a third-party verifiable proof of the correctnessof the shuffle without revealing the actual permutation. Similar to DC-Net based systems,verifiable mixnets guarantee anonymity as long as one mix in the network is honest.

Both designs, however, suffer from serious drawbacks. DC-Nets and DC-Net based sys-tems, by design, implement a broadcast channel. That is, they were primarily designed forthe case where one client messages everyone in the network. Thus, when multiple users wishto communicate simultaneously, every user must transfer a message of size proportional tothe number of clients who wish to communicate. As a result, DC-Net based designs sufferfrom a large bandwidth overhead, and only scale to a few thousand clients [53, 23]. Verifiablemixnets, on the other hand, allow the clients to send messages proportional only to theirown messages, and thus can be bandwidth efficient. However, the high computation over-head of verifiable shuffles has prevented verifiable mixnets from supportig high bandwidthcommunication.

13

1.2 Thesis Contributions

In this thesis, we present Riffle, a system for bandwidth- and computation-efficient anony-

mous communication. Riffle addresses the problems of DC-Nets and verifiable mixnets,while offering the same level of anonymity. At a high level, Riffle is organized as servers and

clients, similar to previous works [53, 21, 23]. Riffle focuses on minimizing the bandwidthconsumption of the clients, who may be connecting from bandwidth-constrained environ-

ments such as their mobile phones, and reduce the computation overhead of the servers, who

may have to compute on large amounts of data. Specifically, the clients in Riffle consume

upstream bandwidth proportional only to the size of their messages rather than the number

of clients, and the server computation only involves fast symmetric key cryptography in thecommon case. This allows the users to efficiently exchange messages, making it suitable

for applications like file sharing that no current strong anonymity system can support well.

Moreover, Riffle provides strong anonymity for all clients as long as one of the servers is

honest.Riffle achieves bandwidth and computation efficiency by employing two privacy primi-

tives: a new hybrid verifiable shuffle for upstream communication, and private informationretrieval (PIR) [201 for downstream communication. Our novel hybrid verifiable shuffle

scheme avoids expensive traditional verifiable shuffle in the critical path, and improves both

bandwidth and computation overhead of the shuffle without losing verifiability by using au-

thenticated encryption. We also propose a novel application of private information retrieval

in the anytrust setting. Previous strong anonymity systems made a trade-off between com-

putation and bandwidth by either broadcasting all messages to all users (low computation,high bandwidth) [53, 23, 21], or using computationally expensive PIR (high computation,low bandwidth) [47]. In the anytrust model, we show that PIR can minimize the download

bandwidth with minimal computation overhead.

We develop a Riffle prototype, and two applications. The first is an anonymous file

sharing application, where each message is large and is potentially of interest to only a small

number of users. Sharing large files is a scenario that has not been considered carefully byprevious strong anonymity systems [8, 53, 21], and we propose a new file sharing protocol

using Riffle in this thesis. The second is an anonymous microblogging application, similar

to the applications studied by previous works [53, 21]. In this setting, each user posts small

messages to the servers, and each message is of interest to many users.

Our prototype demonstrates effectiveness for both applications. In file sharing, the pro-

totype achieves high bandwidth (over 100KB/s) for more than 200 clients. In microblogging,the prototype can support more than 10,000 clients with latency less than a second, or han-

dle more than 100,000 clients with latency of less than 10 seconds. We show that our results

are orders of magnitude better in terms of both scalability and bandwidth compared to

previous systems [48, 53, 81 in comparable scenarios.

To summarize, this thesis makes the following contributions:

1. A hybrid verifiable shuffle that uses symmetric encryption, and avoids expensive public

key verifiable shuffling [14, 28, 37, 8] in the common case.

2. A novel application of private information retrieval [20] in anytrust settings.

3. A bandwidth- and computation-efficient anonymous communication system that is

resilient against traffic analysis attacks and malicious clients.

4. Evaluation of Riffle that demonstrates efficiency in different applications-'

14

1.3 Thesis Organization

In Chapter 2, we describe related work. In Chapter 3, we describe cryptographic primitives

relevant to Riffle, and explain the threat model and deployment model of Riffle. In Chapter 4,

we present the Riffle architecture in detail. We then describe the example of applications

Riffle in Chapter 5, and evaluate our prototype in Chapter 6. Finally, we discuss future

work and conclude in Chapter 7.

15

Chapter 2

Related Work

In this chapter, we describe related work, focusing on the anonymity, bandwidth, and com-putation trade-offs made by existing anonymity systems. Table 2.1 summarizes the trade-offsdescribed in this chapter, along with Riffle's characteristics.

Table 2.1: Trade-offs made by existing systems.

2.1 Proxy-Based Anonymity Systems

Tor [25] is a popular anonymity system that focuses on scalability and low-latency by rout-ing the users' messages through their decentralized volunteer relays without any :delays orcover traffic. While this design allows Tor to scale to millions of users [7], it also allows apowerful adversary who can observe traffic going in and out of the relay network (e.g., astate controlled ISP) to deanonymize the users [44, 36]. Moreover, a recent class of machinelearning based attacks enables even a local adversary only observing traffic to entry relaysto deanonymize the users with high probability [32, 40, 15, 51]. Other systems that alsouse anonymizing proxies [33, 35] focus on different aspects of communication while makingsimilar sacrifices of anonymity.

Mix networks (mixnets) [18] and systems that build on them [24, 45, 27, 46] aim toprevent traffic analysis attacks via routing each user's message through a set of anonymityservers called mixes. In mixnets, mixes collect many users' inputs and shuffle them beforesending any out, making it difficult to correlate the inputs and the outputs even for aglobal adversary. Because the mixes can be distributed and decentralized, mixnets havebeen shown to scale to a large number of clients [46], and provide reasonable latency and

Offers good bandwidth only if the database is small.

17

System Anonymity Bandwidth ComputationTor [25] X

Mixnets [18] XDC-Nets [16, 22, 30] /

Dissent [53] XRiposte 121] Al

Verifiable Mixnets [37, 8] VRiffle

bandwidth when the mixes are well-provisioned. With malicious mixes in the network,however, mixnets fail to provide the level of anonymity required for sensitive activities likewhistleblowing. Specifically, several proposed attacks [42, 41, 38, 52] allow a malicious mixto deanonymize the users through dropping, modifying, or duplicating the input messagesbefore sending them out.

2.2 Anonymity versus Bandwidth

Unlike systems using anonymizing proxies, Dining Cryptographer Networks (DC-Nets) [16]provide information theoretic anonymity even in the face of global adversaries as long as thereexists at least one other honest participant. However, they require communication betweenevery user to broadcast one message from a single user, and thus incur high bandwidthoverhead. As a result, systems that build on DC-Nets could not scale to more than a fewtens of clients [30, 22]. Recent DC-Net based systems 153, 23] use the anytrust model, wherethe network is organized as servers and clients, and guarantee anonymity as long as one ofthe servers is honest. The new model allowed the designs to scale to a few thousand clientswith strong anonymity, but they still suffer from a bandwidth penalty proportional to thenumber of clients and the message size.

Riposte [21] is a recent system that is optimized for anonymous microblogging. Eachclient in Riposte uses a novel "private information storage" technique to write into a shareddatabase maintained by multiple servers. It follows a similar threat model as Dissent, whereanonymity is guaranteed as long as one server is honest, and offers good throughput forsmall messages. However, each Riposte client must submit a message proportional to thesquare root of the size of the whole database (i.e., collection of all clients' data), making itunsuitable for sharing large messages among many clients.

2.3 Anonymity versus Computation

Another design that provides strong anonymity is a verifiable mixnet. Verifiable mixnetsaim to solve the security problems of plain mixnets by using verifiable shuffles [14, 28, 37, 8].In this design, when a mix shuffles the inputs, it also generates a zero-knowledge proof thatproves that the outputs form a valid permutation of the input ciphertexts. 2 Using the proofand the ciphertexts, other mixes in the network can verify that the mix did not tamper withany message, while learning nothing about the actual permutation. Assuming that at leastone of the mixes is honest, a verifiable mixnet is secure even with compromised mixes inthe network: The honest mix will shuffle inputs sufficiently, which prevents traffic analysisattacks, and the malicious mixes cannot tamper with the inputs without generating a badproof. However, generation and verification of such proofs is computationally expensive,resulting in high latency and low bandwidth. The state-of-the-art verifiable shuffle by Bayerand Groth [8], for instance, takes 2 minutes to prove and verify a shuffle of 100,000 shortmessages.

Another privacy primitive that trades off computation for anonymity is private infor-mation retrieval (PIR) [20]. While most anonymity systems mentioned previously focus onprotecting the senders' anonymity, PIR protects the privacy of the downloader. In the set-ting of PIR, a client accesses some data in a database managed by a server or a collection of

2The encryption scheme used must be probabilistic, and the mix re-randomizes the ciphertexts beforeoutputting. Otherwise, other mixes can trivially learn the permutation.

18

servers, and the goal is to hide which data was accessed. There are variants of PIR for differ-

ent settings [20, 19, 29], but many schemes have complex formulation, and incur significant

overheads in computation. On the other hand, the original PIR scheme proposed by Chor

et al. [20] is efficient in terms of both computation and bandwidth, but its threat model is

limiting: it requires multiple servers each with a copy of the database, and at least one of

the servers needs to be honest. However, this is precisely the setting of anytrust, and we

show that the efficient PIR can be used in a practical system to minimize the downstream

bandwidth overhead.

19

20

.............. gm,"mm - ', I,

Chapter 3

Background

This chapter summarizes two important cryptographic primitives used in Riffle, and de-scribes the system and threat model.

3.1 Verifiable Shuffle

As mentioned in Chapter 2, verifiable shuffles [37, 8J can be used to construct mixnets withstrong anonymity guarantees. In general, verifiable shuffles consist of two parties: a proverP and a verifier V. Most known verifiable shuffles operate in groups Z, for some prime p,and is used to shuffle ElGamal ciphertexts. In this setting, the problem is parameterized bythe following: prime p, generator g E Z, with prime multiplicative order q. There are twoinput sequences provided to the two parties: X1, . . . , X" and Y, . .. , Y" where Xi, Yi E ZpVi.Furthermore, there are constants c, d E Zq known only to P, but the commitments C = g'and D = gd are known to both parties. Given these inputs, verifiable shuffle requires P toprove to V that there exists a permutation 7r such that yjd = X without revealing anyinformation about 7r, c, and d. In mixnets, the mix that performs the shuffle is the proverand all other mixes in the network are the verifiers. Each mix uses random c and d tore-randomize the ciphertexts, and generates a zero-knowledge proof that the re-randomizedciphertexts form a valid permutation of the original inputs.

3.2 Authenticated Encryption

Authenticated encryption aims to provide confidentiality, integrity, and authenticity of theciphertexts for private key encryption schemes. In practice, an authenticated encryptionscheme AEnc consists of (1) symmetric encryption for privacy, and (2) message authen-tication code (MAC) for authenticity. There are three common ways to combine the twoprimitives:

" Encrypt-and-MAC: AEnck(M) = EnCk (M) MACk(m). Append the MAC of theplaintext to the ciphertext of the encryption. The receiver decrypts the ciphertextfirst, and verifies the MAC after.

" MAC-then-Encrypt: AEnck(M) = Enck(MIMACk(m)). Append the MAC of theplaintext before encryption, and encrypt them together. The receiver decrypts theciphertext first, and verifies the MAC afterwards.

21

* Encrypt-then-MAC: AEnck(m) = Enck(m)IIMACk(Enck(m)). Append the MAC ofthe ciphertext to the ciphertext. The receiver verifies MAC first, and decrypts theciphertext after.

Bellare and Namprempre 110] showed that if the symmetric encryption scheme is semanticallysecure and the MAC is unforgeable under chosen message attack, then Encrypt-then-MACprovides both privacy and authenticity. To meet the semantic security requirement, manysecure authenticated encryption schemes implement randomized encryption with a nonce.That is, AEnc uses a key k and a nonce r to encrypt a message m (i.e., AEnck,,(m)),and the guarantee is that if r is never repeated, the ciphertexts will not be distinguishable.Bellare and Namprempre also presented constructions of the other two modes of operation(albeit only theoretical, not realistic, constructions) that show that privacy is not preserved.

3.3 Riffle Deployment Model and Assumptions

Riffle Group XServer B

Server A Server C

Figure 3-1: Deployment model of Riffle

Riffle consists of a group of clients and servers, as illustrated in Figure 3-1. In each group,the clients form the anonymity set for the individuals in the group who wish to communicate

anonymously, and the servers collectively form an anonymity provider. For deployment,we assume that each server is run by separately administered entities, such as different

commercial or non-profit anonymity providers. Each client in a Riffle group is connected

to his or her preferred server called primary server (e.g., by location, hosting organization,

etc). This thesis focuses on a single group where the clients wish to communicate with the

other clients in the same group, not with other groups or the general Internet. Chapter 7

discusses other possible usage models of Riffle.

In this setting, we assume that the precious resource is the bandwidth between the clients

and the servers. Having a high bandwidth network between a small number of servers is

feasible, and already common (e.g., between data centers, large companies, etc). However,we cannot expect all clients to have high bandwidth connections to the servers (e.g., clients

connecting from locations with bad infrastructure, their mobile devices, 104.', Therefore,Riffle focuses on minimizing the client-server bandwidth requirements.

22

For the threat model, Riffle assumes an anytrust model, first presented in [531. Riffle

does not depend on a fraction of the servers being honest, or even a particular server, to

guarantee anonymity. We rely only on the assumption that there exists an honest server. In

particular, despite the clients being connected to only one server, we guarantee anonymity

of all clients in a group as long as there exists an honest server in the group. Apart from

having one honest server, we do not limit the adversaries' power in any way, and allow any

number of servers to collude. We do not, however, consider denial-of-service attacks from

the servers. We note that such an attack would be trivial for malicious servers since they

can simply drop all messages sent to them.

In this thesis, we assume that a group is already established, and focus on the operation

of a single group. That is, we do not consider how the clients select the servers to ensure

presence of an honest server, or how each client determines the appropriate group (anonymity

set) size. Previous works 130, 48] have explored these problems in detail, and the solutions

are applicable to Riffle as well.

We note that any communication over the network must be done through authenticated

and encrypted channels. Moreover, each message needs to be padded to a fixed length to

prevent privacy leakage through the size of the message. We assume this is done for all

communication, and omit this detail in the rest of the thesis to simplify presentation.

Table 3.1 summarizes the terminologies used throughout this thesis.

Table 3.1: Terminology

Terminology Description

C Set of Riffle clients

n The number of clients

S Set of Riffle servers

m The number of servers

b Size of a message

A Security parameter

23

Chapter 4

Riffle Architecture

In this chapter, we describe the Riffle architecture, which achieves near optimal bandwidthwith strong anonymity and minimal computation overhead. We first describe the systemand threat model, and our assumptions. We then give an overview of the Riffle protocol,and describe our hybrid shuffle and our application of private information retrieval. We alsodescribe how to hold a malicious client accountable, without any privacy leakage and per-formance overhead during regular operation. Finally, we analyze the asymptotic bandwidthrequirement, and provide a security argument for Riffle.

4.1 Riffle Protocol Overview

Riffle uses verifiable shuffle at its core for anonymous communication. At a high level, itconsists of two phases: setup and communication. The setup phase is used to share keys andhappens only once in the beginning. The communication phase consists of multiple rounds,similar to previous designs [53, 21], and clients upload and download messages to and fromthe servers in each round.

During the setup phase, every server Si generates a public key pi, and the clients down-load all pi's. To upload a message, a client Ci onion encrypts his or her message with allkeys pi's, and uploads the ciphertext to his or her primary server. To be fully traffic analysisresistant, all users are required to upload a message, even if they do not wish to communi-cate that round. Once all ciphertexts are uploaded, the first server Si collects the messages,and performs verifiable decryption and verifiable shuffle. It sends the proof of decryptionand shuffle along with the decrypted ciphertexts to all other servers, who will then verifythe proofs. Decryption, shuffling, and verification of proofs are repeated until the last serverfinally reveals all plaintexts to the servers. Once the plaintext messages are available, theservers broadcast all messages to their clients.

We note that this design already achieves a significant client-server bandwidth savingsfor uploads compared to DC-Net based designs [53, 23], both in practice and asymptotically;the ciphertext each client uploads is proportional only to the actual message.

4.2 Hybrid Verifiable Shuffle

Despite the significant bandwidth savings, the computational overhead of verifiable shufflesmakes it unsuitable for high bandwidth communication. As a concrete example, the state-of-the-art verifiable shuffle proposed by Bayer and Groth [8] takes more than 2 minutes

25

(a) Setup: Clients share keys with ev-ery server through verifiable shuffle, andshare secrets used for PIR.

(c) Shuffle: Servers authenticate, de-crypt, and shuffle the ciphertexts.

(b) Upload: Clients upload ciphertextlayer encrypted using authenticated en-cryption.

(d) Download: Clients download themessages either through PIR or broad-cast.

Figure 4-1: Anonymous communication in Riffle

to shuffle 100,000 ElGamal 1261 ciphertexts, each of which corresponds to a small message

(e.g., a symmetric encryption key). Furthermore, randomized public key encryption schemes

commonly used for verifiable shuffle, such as ElGamal, often result in ciphertext expansion

of at least 2, which halves the effective bandwidth.

To address the issues of traditional verifiable shuffles, we propose a new hybrid verifiable

shuffle. In hybrid shuffle, verifiable shuffle is performed only once to share encryption keys

that are used throughout an epoch, which consists of many rounds. Riffle uses authenticated

encryption 1101 with the shared keys for the communication phase.

During the setup phase, the clients share pairwise secrets (keys) with the servers using

verifiable shuffle, and each server Si retains permutation ?rj used during the verifiable shuffle.

To upload, C, now onion encrypts his or her message Mj for round r using authenticated

encryption with the keys shared during the setup phase, and sends the resulting ciphertext.

When Si collects the ciphertexts, it authenticates and decrypts the ciphertexts using the

shared keys, and shuffles them using the 7ri saved from the setup phase. It then sends the

shuffled ciphertexts (with one less layer of encryption) to the next server, and the servers

repeat the authentication, decryption, and shuffling until the last server finally reveals the

plaintext messages to all servers. The details of the hybrid verifiable shuffle are presented

in Algorithm 1.

It is important to note that the authenticated encryption in practice is symmetric 110],which lowers computational overhead by orders of magnitude compared to public key en-

cryption schemes. Furthermore, this eliminates expensive generation and verification of

26

Algorithm 1 Hybrid Shuffle

1. Sharing keys: Cj submits m onion-encrypted keys (secrets) to its primary server.

Each key kij for Si is encrypted with keys p1, ... , pi. Starting with Si, Si's decrypt a

layer on all submitted encrypted keys, and perform verifiable shuffle on all keys using

7ir, except for kij's . Si then retains the permutation xi and the keys kij for 1 < j 5 n,and sends the other (still encrypted) keys to the next server. Once this step finishes,kij will be at position ri_.1(... (iri(j)) .. .) in Si. This step happens only once per

epoch.

2. Upload: In round r, Cj onion-encrypts, using authenticated encryption, the message

MJ using keys kij's and the round number r as a nonce:

AEnc1,...,m(MJ) = AEnCkij,r(. . . (AEnckm,r (MJ)) ... ).

Each client uploads AEnc1 ,...,m(Mr) to its primary server.

3. Shuffle: Si collects all uploaded ciphertexts for round r. Starting with Si, all the

Si's verify authenticity of the ciphertexts and decrypt a layer using the shared secrets

and r. Then, using the permutation 7ri, Si shuffles the messages and sends them to

Si+1. Note that this means in Si, ciphertext AEnci,...,m(Mj) of Cj is at position

7ri(... (7r1(j)) .. .), which is where the matching key kij is. This is repeated until the

last server learns the permuted plaintext messages M,, and broadcasts M, to all other

servers. The final permutation of the messages is ir = 7rmnm-1(... (72(71)) ... ))-

proofs, and the servers simply need to authenticate the ciphertext to verify the shuffle.

4.3 Private Information Retrieval

There are situations where a client is not interested in the majority of the messages. For

example, consider two clients chatting through Riffle. Here, the clients can learn the expected

location of the messages after one round of communication (since our hybrid shuffle uses

the same 7r for all rounds in an epoch), but the clients are forced to download all available

messages due to broadcast. In these scenarios, we can use multi-server private information

retrieval (PIR) proposed by Chor et al. 1201 to improve the download efficiency. Let Ij be

the index (location) of the message client C, wants to download. To download the message,

C, first generates m - 1 random bit masks each of length n, and computes a mask such that

the XOR of all m masks results in a bit mask with 1 only at position Ij. Each mask is sent

to a server, and each server Si XORs the messages at positions with 1 in the bit mask to

generate its response rij for Cj. Finally, C, downloads all rij's and XORs them together to

learn the plaintext message.

Although this scheme is fairly efficient, we can further reduce the bandwidth overhead

using pseudo-random number generators (PRNGs). To avoid sending masks to all servers

every round, C, shares an initial mask with every server during the setup phase. Each server

then updates the mask internally using a PRNG every round, and C, only sends a mask to

its primary server Sp, to ensure the XOR of all masks has 1 only in position Ij.

To avoid downloading a message from every server, C, can ask Sp, to collect all responses

27

Algorithm 2 Private Information Retrieval

1. Sharing Secrets: Each client Cj shares two secrets mij and sij with each Si exceptfor the primary server Spj it is connected to. This step happens only once per epoch.

2. Mask Generation: Let index Ij be the index of the message C, wants to download.

Ci generates mpj, such that EQi mi = el, where ei, is a bit mask with 1 only in slotIj. C, then sends mjj to Spj.

3. Response Generation: Each server Si computes the response rij for C, by com-puting the XOR sum of the messages at the positions of 1's in the mij, and XORingthe secret sij. Specifically, response rij = (@e fm [V] - M) E sij, where M is the ithplaintext message. Then, the servers send rij to Sp , and S, computes rj:

ri = (@ rij) = ( mz ij[f]M) si

= M, 8 ( s

4. Message Download: C, downloads rj from Sp,,, and XORs all sij's to compute themessage of interest, MI, = rj E ( sij).

5. Update Secrets: Both C and S apply PRNG to their masks and secrets to get freshmasks and secrets.

and XOR them together. However, doing so naively results in Sp, learning the message C,downloaded. To prevent this privacy leakage, C, shares another set of secrets with everyserver during the setup, and each server XORs its secret into the response. Sp, can nowcollect the responses and XOR them, while learning nothing about which message C, isinterested in. Finally, C, can download the response from Spj (i.e., the message XORedwith the shared secrets), and remove the secrets to recover the message. Similar to themasks, the servers and the clients can internally update the secrets using a PRNG everyround. Since PIR hides which data was accessed, we note that sharing of masks and secretsneed not be through verifiable shuffle; we do not need to hide which secret is associatedwith which client. However, each client must perform PIR every round to remain resistantto traffic analysis attacks even if the client is not interested in any message. Algorithm 2demonstrates the exact details of the optimized PIR, and Figure 4-1 illustrates the finalRiffle protocol.

4.4 Accusation

In DC-nets [16] or DC-net based designs [53], it is easy for a malicious client to denial-of-service attack the whole network. Namely, any client can XOR arbitrary bits at any timeto corrupt a message from any user. This problem has led to complex, and often costly,solutions to hold a client accountable, such as trap protocols [50], trap bits [53], and verifiableDC-nets [23], or limited the size of the group to minimize the attack surface 130].

28

Without any precautions, a similar attack is possible in Riffle as well: a malicious clientcould send an unauthenticated ciphertext, and the system will come to a halt. However,unlike DC-nets, shuffling does not allow one client's input to corrupt any others' inputs.Leveraging this fact, Riffle provides an efficient way to hold a client accountable whileintroducing no overhead during regular operation and leaking no privacy of honest clients:when a server Si detects that a ciphertext at its position j is not authenticated correctly,it begins the accusation process by revealing j to Si-1. Si-1 then reveals j' = r- 1(j), andthis continues until Si reveals the client who sent the problem ciphertext.

This naive approach, however, could be abused by a malicious server: any server couldflag an error for any client, and that client will be deanonymized since other servers haveno way to verify the claim. To prevent this problem, Riffle requires all servers to commitall their shared secrets to all other servers during the setup phase. When a server Si wishesto accuse a client corresponding to position j, it reveals j, the associated secret kij, and theproblem ciphertext Eij to all upstream servers. The servers then run the following steps,where each step executes only if the previous step finishes successfully:

1. Server Si-1 confirms the Eij is indeed the ciphertext at location j.

2. All upstream servers (Sj for f < i) verify the secret kij with the commitment, andcheck that Eij is not authenticated correctly.

3. Si-1 reveals j' = r- (j), the corresponding secret ki_,y, and ciphertext Ei_1,j.

4. Upstream servers verify that decryption of Eji_,' using ki_1,' is Eij.

This is repeated until Si finally announces the real client.We note that a malicious client cannot perform a similar attack during a download

phase. When the messages are broadcast, there is nothing that a client can do to disruptother clients. When the clients use PIR, a client can only corrupt its own response sinceeach client generates its own masks and secrets.

4.5 Bandwidth Overhead

Riffle achieves near optimal bandwidth between a client and a server when sending a message.A client only uploads a ciphertext of layered authenticated encryption of size b + mA, whereb is the size of the message, m is the number of servers, and A is the size of the messageauthentication codes (MACs) [9] used with the authenticated encryption [101. If the client isinterested in only one message and the index is known, then the only overhead is sending themask of size n, the number of clients, to the primary server. The total upstream bandwidthis then b + mA + n, and the total downstream bandwidth is b per client per round. We notethat even though upstream bandwidth grows linearly with n, it only requires 1 bit per client.In the general case where the index of the message is not known, the download bandwidthis nb per client due to the broadcast, but the upload bandwidth decreases by n, the size ofthe mask.

The-bandwidth requirement between the servers, on the other hand, grows linearly withthe number of users. Every server must download n ciphertexts, and upload n ciphertexts(with one removed layer) to the next downstream server. The last server also needs to

,send the plaintexts to all other servers as well. Furthermore, though PIR reduces thedownload bandwidth overhead of the clients, the server to server bahdwidth increases: with

29

our optimizations, each server needs to send ri to the clients' primary servers. Therefore, theservers also need (m - 1)nb additional bandwidth for the PIR responses. In total, the serverto server bandwidth requirement per round is approximately 3(m - 1)nb. Even though thetotal grows linearly with the number of clients, we note that this is asymptotically betterthan previous anytrust systems. For example, Dissent [53] requires m(m - 1)nb to performDC-Net among the servers with all clients' data.

4.6 Security Analysis

We present the security argument for Riffle. The security of the setup phase of the Riffleprotocol follows immediately from the security of the verifiable shuffle. Any server down-stream from the honest server cannot correlate a secret to a particular client if the proof ofthe shuffle is zero-knowledge.

Verifiability and Zero-Knowledge of Hybrid Shuffle. Following the proposalof [101, let us assume that the authenticated encryption used by Riffle is semantically se-cure [31] and is secure against forgery. For Si to tamper with the inputs and not be detectedby Si+1, Si needs to generate outputs such that some of the outputs are not decryptions ofthe inputs, but still authenticate properly under the keys of Si+1. However, if ciphertextsare unforgeable and the keys of Si+1 are unknown, then Si cannot generate such outputs.We also use the round number, which is provided internally by Si+1, as the nonce to au-thenticated encryption to guarantee the freshness of the data, and stop replay attacks by Si.Therefore, shuffle and decryption of Si is valid if and only if Si+1 can authenticate all outputciphertexts. Moreover, if the encryption is semantically secure, then the outputs cannot becorrelated to the inputs, and thus the shuffle is zero-knowledge.

Anonymity of Uploads. Anonymity of the uploads relies on the fact that the shuffleis verifiable and zero-knowledge. The verifiability ensures that a malicious server cannotprovide the downstream servers with invalid inputs designed to deanonymize a client (e.g.,inputs "marked" to reveal the final position of a client). The zero-knowledge property ofthe shuffle guarantees that the inputs and the outputs cannot be correlated. Moreover,in the anytrust setting, it guarantees that the permutation irH of the honest server SH

is unknown to the adversary. Since 7rH is generated randomly independently of all otherpermutations, the final permutation 7r = 7rm(7m-1(... (r 1 ) .. .)) is also random and unknownto the adversary.

Anonymity of Downloads. The anonymity of the downloads depends on the securityof PIR and PRNGs. The security of PIR used in Riffle was proven by Chor et al. [20].Intuitively, if the masks are generated at random, then the mth mask cannot be inferredfrom the other m - 1 masks. The optimization to reduce mask sharing is secure if the PRNGis cryptographically secure, which says that the masks cannot be distinguished from trulyrandom masks. Finally, collecting the responses at one server is also secure as long as notall secrets are known to the malicious servers.

Anonymity of Accusation. Assume that a malicious server Si starts a false accu-sation by claiming that a ciphertext is mis-authenticated when it was actually correctlyauthenticated. Let SH be an honest server. There are two cases: (1) i < H (maliciousupstream server) or (2) i > H (malicious downstream server). In the first case, because theaccusation does not involve the honest server at all, the honest server will not reveal any ofits permutation. As noted before, one unknown random permutation is sufficient to keep allclients anonymous, and thus malicious upstream servers cannot deanonymize a client.

30

In the second case, eventually the honest server SH will see all ciphertexts EH+1, ... , Ei

flagged by the downstream servers along with the associated secrets. Apart from checking

the secrets against the commitments and the authenticity of Ej, the honest server also

checks the validity of the ciphertexts by (1) confirming EH+1 is the ciphertext given to

SH+1 and (2) checking that the decryptions of EH+1 matches EH+2, .. . , Ei. If SH finds any

mismatching ciphertexts, it will not reveal the permutation, thus preserving anonymity. If

the commitment scheme is binding and a ciphertext cannot decrypt to two different values

using one key, then the malicious servers cannot generate a correct decryption of EH+1 that

results in a mis-authenticated ciphertext. Therefore, no server can falsely accuse a client,and reveal the client's true identity.

31

Chapter 5

Applications

The efficiency of Riffle makes it suitable for high bandwidth applications like anonymous file

sharing, and latency sensitive applications like anonymous microblogging. In this chapter,

we present the details of the two applications in Riffle.

5.1 File Sharing

File sharing within a Riffle group is similar to that of BitTorrent [21, despite the differences

in the system model (server-client of Riffle vs. peer-to-peer of BitTorrent). When a client

wants to share a file, he or she generates a torrent file, which contains the hashes of all blocks

(the smallest unit used for file sharing) of the file. Then, using Riffle, the client uploads

the torrent file to the servers. The servers play the role of torrent trackers in BitTorrent,

and manage all available files in that group. In the simplest design, the file descriptors are

broadcast to all connected clients, and clients can locally choose a file to download. Since the

torrent files are fairly small even for large files (only a few 100KB in practice) and sharing

them is a one-time cost, we assume broadcasting them is inexpensive and focus on sharing

blocks.With the torrent files distributed, the clients can now share files anonymously using

Riffle. There are three major steps:

1. Requesting Blocks: Each C, identifies a file F of interest, and the hashes of the

blocks of the file HF via its torrent file. C, then requests a block of F by uploading

the hash of the block Hj (E HF to Sp using Riffle. When a client has no blocks to

request, he or she sends a random value as a (non-)request to remain traffic analysis

resistant. All requests H, are broadcast to the clients at the end of this step.

2. Uploading Blocks: Each C, checks if it possesses any requested block by checking

the hashes of the blocks it owns with H. If a matching block Mj is found, then C,

uploads Mj using Riffle. Once the plaintext blocks are available to the servers, each

server broadcasts the hashes of the available blocks H' .

3. Downloading Blocks: From Hj and H', C, learns the index Ij of the block he or

she requested. Using PIR, C, downloads the block.

The rounds can (and should) be pipelined for performance since each request is independent.

That is, we allow for many outtinding requests. Figure 5-1 summarizes our file sharing

protocol.

33

(a) Setup: Clients share the torrent filesanonymously.

(c) Upload: Each client uploads an en-crypted file based on the requests usingRiffle.

(b) Request: Each client requests a fileby uploading the hash of the file usingRiffle.

(d) Download: Each client downloadsthe file he or she requested using PIR.

Figure 5-1: Anonymous File Sharing Protocol

5.1.1 Non-Bijective Requests and Uploads

In a given round, it is possible that the mapping from requests to uploaders is not bijective.

It could be that

1. Multiple C2's request blocks from the same Ck,

2. Cj has no matching blocks for a round, and

3. Multiple C,'s have the same block

There are two possible sub-cases for Case 1. When the Cj's request the same block,the protocol can be carried out normally from Ck's perspective. When Ck needs to service

multiple blocks, this becomes a problem. One possible solution is to have multiple rounds of

uploading. All participants can detect the problem by checking the request hashes H, and

the hashes of available uploaded blocks. If any are missing, then more rounds of uploads

happen until all blocks are available. The client who does not possess any of the missing

blocks will upload a dummy block (e.g., identically 0 block) for anonymity. Another possible

solution is simply ignoring the missing content, and have the clients request the block again.

The first solution wastes bandwidth for those uploading dummy blocks, while the second

one does not guarantee fairness.

Case 2 happens when (1) some clients did not request a block, or (2) Case 1 happened.

In either situation, the clients with jio matching blocks must upload a dummy block to

remain resistant to traffic analysis

34

Case 3 is only a problem when several Cj's upload the same block when they actually

could have uploaded different blocks, resulting in wasted bandwidth. To mitigate this issue,

the servers could keep track of the number of clients who possess a block (i.e., the rarity of

the block). When the servers broadcast the request hashes, they can annotate the download

requests with their rarities, and the clients will upload the rarest blocks first.

5.1.2 Bandwidth Overhead in File Sharing

To share a block among users in peer-to-peer file sharing such as BitTorrent 12], each client

only needs to request and download a block. The client also may need to upload a file if it

receives a request. If each request is of size h, each client consumes h +b of upload bandwidth

(assuming it has a block to upload to another client), and b of download bandwidth. In Riffle

with PIR, there are three sources of bandwidth overhead for a client: (1) downloading the

requests, (2) uploading MACs of authenticated encryption, and (3) the mask uploaded to the

primary server. Thus, the total bandwidth between a client and a server is h + b + n + 2mA

of upload, and b + 2hn of download. We note that even though both bandwidths grow with

the number of clients, n and 2nh are much smaller than b in file sharing scenarios for a

reasonable number of clients and block size.

5.2 Microblogging

Microblogging in Riffle is similar to previous work 153, 211: each client submits a small

message per round that will be published, and all plaintext messages are broadcast to all

clients every round. In microblogging scenarios, Riffle uses broadcast instead of PIR because

the users are interested in many messages posted by different users; users would have to

perform p PIRs to download p messages, and therefore the benefits of PIR decreases as p

increases.

35

Chapter 6

Prototype Evaluation

6.1 Implementation

We have implemented a Riffle prototype in Go [4] using Go's native crypto library alongwith the DeDis Advanced Crypto library [1]. We used ElGamal encryption [26] usingCurve25519 [11] and Neff's shuffle [37] with Chaum-Pederson proofs [17] for verifiable shuffleand decryption. For authenticated encryption, we use Go's Secretbox implementation [5, 6],which internally uses Salsa20 [12] for encryption and Poly1305 [13] for authentication. Forthe pseudo-random number generator used to update the masks and the secrets neededfor PIR, we used keyed AES [39]. Our prototype supports two different modes of opera-tion: (1) file sharing, and (2) microblogging. We implemented the applications described inChapter 5.

6.2 Evaluation

To evaluate our system, we used the Emulab [3] testbed. Emulab provided a stable en-vironment to test our prototype, while allowing us to easily vary the group topology andserver configurations. The servers were connected to each other via a shared LAN, and theclients were distributed evenly among all servers. The clients connected to their primaryserver using one shared 100 Mbps link with 20ms delay, and the servers were connected toeach other through a 1 Gbps link with 10ms delay. Due to resource and time constraints,we used 50 physical nodes to simulate all clients. Each server was equipped with an 8-coreIntel Xeon E5530 Nehalem processor1 , and the majority of the client nodes were using adual core Intel Pentium 4 processor. As shown in the next sections, we have found thatthe "outdated" processors did not impact our results much. Figure 6-1 shows the testbedtopology used for the majority of our experiments.

6.2.1 File sharing

We simulated users sharing large files by first creating a pool of files, each of which was300MB. From the pool, each client chose one file to request and one file to share, and eachfile was divided into 256KB blocks, similar to BitTorrent's typical block size [43]. We have

'The most powerful machines with Sandy Bridge Xeon with 10 Gigabit Ethernet were not readily avail-able.

37

El

em/ 1d

Figure 6-1: Testbed topology.

experimented with different block sizes, and found that the block size changed the effectivebandwidth by less than 5% in all experiments as we varied it from 128KB to 1MB.

Figure 6-2a shows the total time spent and the effective bandwidth (the size of the sharedfile over the total time spent) when sharing a 300MB file.2 We also plotted the "ideal" Riffle,where we computed the expected time to share a 300MB file based on our analytic bandwidthmodel (Chapter 5.1.2), assuming computation is free and network bandwidth is perfect. Wehave created a similar model for Dissent 1531 as well for comparison.

In this experiments, Riffle performed competitively up to 200 clients, supporting 100KB/sof effective bandwidth per client. Our prototype matches the analytical model fairly closely,showing that the amount of computation is minimal, and the primary limitation is the serverto server bandwidth. If the servers were connected through 10 Gbps connections, we expectthe effective bandwidth to improve by an order of magnitude. The discrepancy between theidealized model and the prototype for larger numbers of clients is due to two factors: First,the ideal model ignores cost of computation, which increases linearly with the number ofclients. Though symmetric decryption is inexpensive, the cost becomes non-negligible whenthe number of clients is large. Second, the effective bandwidth per client decreases since weare sharing a 100 Mbps link among a few hundred clients.

When comparing to the ideal model of Dissent 1531, we see an order of magnitude speedup. This is expected since the client-server bandwidth overhead in Dissent grows linearlywith the number of clients, and each client only has access to a small amount of bandwidth.

We also tested the impact of different server configurations on performance. Figure 6-2bshows the effective bandwidth of 200 clients sharing 300MB files as we varied the number ofservers. As observed in our analytical model (Chapter 5.1.2), the server to server bandwidth

requirement grows with the number of servers, so the average bandwidth of the clients dropsas we increase the number of servers.

Finally, we evaluated the full system performance of Riffle, including the setup phase of

~Note that the y-axis of the time graph is in log scale to properly display the time taken for Dissent.

38

- Riffle .. Ideal Riffle I-deal Dissent

--- -- -

.... .. . . EP

1 1 2 2 0 3 - 450- --

'90 100 150 200 250 300 35 400 450 -500Number of clients

(a) Varying numbers of clients with 3 servers.

VC

10'

le

l0'

120

100

s0so40

20

- Riffle -.- dea1 Riffle -- Ideal Dissent

--- - - - - - - - - - - - - - -

.-.-----.

4 7

Number of servers

(b) 200 clients with varying numbers of servers.

Figure 6-2: Average time taken to share a 300MB file and effective bandwidth with different

group configurations.

EI-

10000

:o00

6000

4000

2000

- Setup - Upload- Request - Download

Numuer or ciinis

(a) 10MB file.

Figure 6-3: Breakdown of the total time

numbers of clients with 3 servers.

Number of clients

(b) 300MB file.

taken to share two different size files for varying

an epoch. Figure 6-3a and Figure 6-3b present the breakdown of the time spent to share a

10MB file and a 300MB file for different numbers of clients, which takes 40 and 1200 rounds

respectively with 256KB blocks. Specifically, the graph shows the time spent in the setup

phase (verifiable shuffle of keys, and sharing secrets for PIR), and the three steps of file

sharing rounds (request, upload, and download). When sharing a 10MB file, the verifiable

shuffle took more than half of the total time, proving to be quite costly. However, as

demonstrated by Figure 6-3b, the verifiable shuffle is a one-time operation in the beginning

of an epoch, and the cost becomes less significant for longer epochs. Moreover, the verifiable

shuffle used by the Riffle prototype 1371 is not the most efficient shuffle, and we can reduce

the setup time if we implement faster verifiable shuffles 181. We also note that with more

powerful machines, the verifiable shuffle should be much faster, as computation dominates

the overhead in this case.

39

10

EFloll

10'

4001

300

200

100

4

700

600

500

S 400

EF 300

200

100

ImNDU

I Infk

I-34 7

6.2.2 Microblogging

We simulated a microblogging scenario by creating tens to hundreds of thousands of clientsall of whom submitted a small message (160 byte or 320 byte) per round, and broadcastingthe messages at the end of each round. Due to physical resource limitation, we createdhundreds to thousands of "super" clients, each of which submitted hundreds of messages tosimulate a large number of users. For our microblogging experiment, we fixed the numberof servers at 3 as we varied the number of clients.

Figure 6-4 shows the average latency of posting one message. For latency sensitivemicroblogging, we can support up to 10,000 users with less than one second latency with160 byte messages. If the messages can tolerate some delay, we can support more than100,000 users with less than 10 seconds of latency. This figure also demonstrates the abilityof Riffle to exchange latency for message size: if we double the message size, the latency alsodoubles. Because Riffle is bandwidth and computation efficient, the latency is determinedsolely by the total number of bits of the messages in a round. This makes it easy to makea conscientious trade-off between the latency, the message size, and the number of clients.

Riffle can support an order of magnitude more clients compared to Dissent [531 forlatency sensitive microblogging (10,000 in Riffle vs. 1,000 in Dissent). For a large numberof clients, Riffle outperforms previous works [53, 8] by orders of magnitude. For instance,it takes 2 minutes just to verifiably shuffle messages of 100,000 users [8], ignoring networkcommunication. In Riffle, it takes a fraction of a second to shuffle and verify, and takes lessthan 10 seconds in total for 100,000 users, including the time spent in the network. Finally,though it is hard to compare Riffle to Riposte [21] directly due to lack of a database inRiffle, we expect Riffle to perform better as the database of microblog posts grows larger:the bandwidth requirement of Riposte clients grows with the size of the database, while thebandwidth requirement of Riffle clients remains the same.

25

- 160 byte message - - 320 byte message

0 - . ................ ...................... --.-...............

0-20000 40000 60000 80000 100000 120000

Number of clients

Figure 6-4: Average latency to share a microblogging post for different message sizes with 3servers.

40

Chapter 7

Discussion, Future Work, andConclusion

7.1 Dicussion

In this section, we discuss a few limitations and some aspects of Riffle we did not consider

in this thesis, and how they could be addressed in future work.

7.1.1 Alternate usage model

We have assumed that the clients want to communicate with others in the same group.

However, the clients may also want to interact with (1) clients in other Riffle groups, or (2)

the general Internet. In the first setting, we can use ideas from Herbivore [30], and connect

the servers of different groups to each other. The clients can then communicate with any

client in any group through the network of servers. In the second setting, we could use

the Riffle servers as "exit" nodes: each client can submit his or her message for someone

outside through Riffle, and the servers interact with the Internet on the client's behalf. For

example, each client uploads a BitTorrent request for a file to the servers. The servers then

participate in a regular BitTorrent protocol in the open Internet, and the clients can use PIR

to securely download their files. Moreover, the servers could use a highly scalable anonymity

system with a large anonymity set (e.g., Tor [25]) to expand the anonymity set of the clients

beyond just one Riffle group, while providing strong anonymity within the group.

7.1.2 Malicious Servers

Verifiability and zero-knowledge property of the hybrid shuffle (Chapter 4.6) prevent any

malicious server from deanonymizing a client, and make it possible for an honest server

to reveal a malicious server. However, since the identity of the honest server is unknown,the clients cannot rely on the claims of the servers. Concretely, there is no mechanism

to prevent m - 1 malicious servers from claiming that the one honest server is malicious.

Though this scenario should be rare with independently administered servers, we hope to

provide a mechanism to prevent such an attack in the future.

41

7.1.3 Server to server bandwidth

As noted in our bandwidth analysis and evaluation, the server to server bandwidth require-ment grows linearly with the number of clients. This quickly became the bottleneck ofRiffle, and limited the effective throughput (Chapter 6). One potential solution is for eachprovider to manage a "super" server that consists of smaller servers, and each smaller serverhandles a fraction of the traffic. Essentially, we can increase the server to server bandwidthby replicating the connections.

Though we hope to lower the actual overhead in future work, we believe that some of thecost is fundamental to the anytrust model proposed by 153] and assumed by Riffle. Namely,if there is only one honest server and the identity of the honest server is unknown, it seemsnecessary for all data to pass through all servers.

7.1.4 Network churns and group changes

In a realistic network, different clients can have drastically different connection speeds, andclients can leave or join the group dynamically. The default Riffle protocol requires thateveryone in a group submit a message every round to maintain anonymity, which makesthe overall latency as bad as the worst individual latency. Worse, any changes in the grouprequire Riffle to perform the expensive verifiable shuffle to create a new permutation for thenew set of active clients. To handle the case of clients dropping out, we currently ask eachclient to submit some cover traffic to the first server. When a client disconnects, we use thecover traffic to carry on the rounds with the active clients, and perform the verifiable shufflein the background to create a new permutation. However, this scheme wastes bandwidth,and does not allow for dynamic growth of the group. Tolerating network churns and changesin the group more effectively is deferred to future work.

7.1.5 Intersection attacks

A powerful adversary monitoring clients and network over longer periods of time can corre-late the presence of some messages with the online status of the clients [34]. For instance,if messages related to a protest are only posted when a particular client is online, then theadversary can link the messages to the client. Though Riffle does not protect against theseclass of attacks, it could benefit from prior work on mitigating these attacks [54].

7.2 Conclusion

Riffle is an anonymous communication system that provides traffic analysis resistance andstrong anonymity while keeping the bandwidth and computation overhead to minimum. Weachieved this by developing a new hybrid shuffle technique which avoids expensive verifiableshuffles in the critical path for uploading messages, and using private information retrieval fordownloading messages. We have also demonstrated through a prototype the bandwidth andcomputation efficiency of Riffle via anonymous file sharing and microblogging applications,and that strong anonymity can indeed scale to a large number of users.

42

Bibliography

[1] Advanced crypto library for the go language. https://github.com/DeDiS/crypto.

[2] Bittorrent. https://bittorrent.com.

[3] Emulab network emulation testbed. http://www.emulab.net/.

[4] The go programming language. https://golang.org/.

[51 Secret-key authenticated encryption. http://nacl.cr.yp.to/secretbox.html.

[61 Secretbox - godoc. https://godoc.org/golang.org/x/crypto/nacl/secretbox.

[7] Tor metrics portal. https://metrics.torproject.org.

[81 Stephanie Bayer and Jens Groth. Efficient zero-knowledge argument for correctness of

a shuffle. In Proceedings of the 31st Annual International Conference on Theory and

Applications of Cryptographic Techniques, EUROCRYPT'12, pages 263-280, Berlin,

Heidelberg, 2012. Springer-Verlag.

[9] Mihir Bellare, Ran Canetti, and Hugo Krawczyk. Keying hash functions for message

authentication. pages 1-15. Springer-Verlag, 1996.

[101 Mihir Bellare and Chanathip Namprempre. Authenticated encryption: Relations

among notions and&#xOOaO;analysis of the generic composition paradigm. J. Cryptol.,

21(4):469-491, September 2008.

[11] Daniel J. Bernstein. Curve25519: new diffie-hellman speed records. In In Public Key

Cryptography (PKC), Springer- Verlag LNCS 3958, page 2006, 2006.

[121 Daniel J. Bernstein. New stream cipher designs. chapter The Salsa20 Family of Stream

Ciphers, pages 84-97. Springer-Verlag, Berlin, Heidelberg, 2008.

[13] DanielJ. Bernstein. The poly1305-aes message-authentication code. In Henri Gilbert

and Helena Handschuh, editors, Fast Software Encryption, volume 3557 of Lecture Notes

in Computer Science, pages 32-49. Springer Berlin Heidelberg, 2005.

[14] Justin Brickell and Vitaly Shmatikov. Efficient anonymity-preserving data collection.

In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Dis-

covery and Data Mining, KDD '06, pages 76-85, New York, NY, USA, 2006. ACM.

[15] Xiang Cai, Xincheng Zhang, Brijesh Joshi, and Rob Johnson. Touching from a distance:

Website fingerprinting attacks and defenses. In Proceedings of the 19th A CM conference

on Computer and Communications Security (CCS 2012), October 2012.

43

[16] D. Chaum. The dining cryptographers problem: Unconditional sender and recipientuntraceability. J. Cryptol., 1(1):65-75, March 1988.

[17] David Chaum and Torben P. Pedersen. Wallet databases with observers. In Proceedingsof the 12th Annual International Cryptology Conference on Advances in Cryptology,CRYPTO '92, pages 89-105, London, UK, UK, 1993. Springer-Verlag.

[18] David L. Chaum. Untraceable electronic mail, return addresses, and digitalpseudonyms. Commun. ACM, 24(2):84-90, February 1981.

[19] Benny Chor and Niv Gilboa. Computationally private information retrieval (extendedabstract). In Proceedings of the Twenty-ninth Annual ACM Symposium on Theory ofComputing, STOC '97, pages 304-313, New York, NY, USA, 1997. ACM.

[20] Benny Chor, Eyal Kushilevitz, Oded Goldreich, and Madhu Sudan. Private informationretrieval. J. ACM, 45(6):965-981, November 1998.

[211 H. Corrigan-Gibbs, D. Boneh, and D. Mazieres. Riposte: An Anonymous MessagingSystem Handling Millions of Users. ArXiv e-prints, March 2015.

[22] Henry Corrigan-Gibbs and Bryan Ford. Dissent: Accountable anonymous group mes-saging. In Proceedings of the 17th ACM Conference on Computer and CommunicationsSecurity, CCS '10, pages 340-350, New York, NY, USA, 2010. ACM.

[23] Henry Corrigan-Gibbs, David Isaac Wolinsky, and Bryan Ford. Proactively accountableanonymous messaging in verdict. In Presented as part of the 22nd USENIX SecuritySymposium (USENIX Security 13), pages 147-162, Washington, D.C., 2013. USENIX.

[24] George Danezis, Roger Dingledine, David Hopwood, and Nick Mathewson. Mixminion:Design of a type iii anonymous remailer protocol. In In Proceedings of the 2003 IEEESymposium on Security and Privacy, pages 2-15, 2003.

[25] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The Second-GenerationOnion Router. In Proceedings of the 13th USENIX Security Symposium, pages 303-320,August 2004.

[26] T. Elgamal. A public key cryptosystem and a signature scheme based on discretelogarithms. Information Theory, IEEE Transactions on, 31(4):469-472, Jul 1985.

[27] Michael J. Freedman and Robert Morris. Tarzan: A peer-to-peer anonymizing networklayer. In Proceedings of the 9th ACM Conference on Computer and CommunicationsSecurity, CCS '02, pages 193-206, New York, NY, USA, 2002. ACM.

[28] Jun Furukawa and Kazue Sako. An efficient scheme for proving a shuffle. In In Proc.of CRYPTO dAZ01, pages 368-387. Springer-Verlag, 2001.

[29] Niv Gilboa and Yuval Ishai. Distributed point functions and their applications. InPhongQ. Nguyen and Elisabeth Oswald, editors, Advances in Cryptology - EURO-CRYPT 2014, volume 8441 of Lecture Notes in Computer Science, pages 640-658.Springer Berlin Heidelberg, 2014.

[30] Sharad Goel, Mark Robson, Milo Polte, and Emin Gun Sirer. Herbivore: A Scalableand Efficient Protocol for Anonymous Communication. Technical Report 2003-1890,Cornell University, Ithaca, NY, February 2003.

44

RRONOMMMMIN MOP

[311 Shafi Goldwasser and Silvio Micali. Probabilistic encryption & how to play mental

poker keeping secret all partial information. In Proceedings of the Fourteenth Annual

A CM Symposium on Theory of Computing, STOC '82, pages 365-377, New York, NY,USA, 1982. ACM.

[32] Dominik Herrmann, Rolf Wendolsky, and Hannes Federrath. Website fingerprinting:Attacking popular privacy enhancing technologies with the multinomial naïve-bayes classifier. In Proceedings of the 2009 ACM Workshop on Cloud Computing Secu-rity, CCSW '09, pages 31-42, New York, NY, USA, 2009. ACM.

[331 Stevens Le Blond, David Choffnes, Wenxuan Zhou, Peter Druschel, Hitesh Ballani,and Paul Francis. Towards efficient traffic-analysis resistant anonymity networks. InProceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, SIGCOMM '13,pages 303-314, New York, NY, USA, 2013. ACM.

134] Nick Mathewson and Roger Dingledine. Practical traffic analysis: extending and resist-ing statistical disclosure. In 4th International Workshop on Privacy Enhancing Tech-nologies, May 2004.

[35] Prateek Mittal and Nikita Borisov. Shadowwalker: Peer-to-peer anonymous communi-cation using redundant structured topologies. In Proceedings of the 16th ACM Confer-

ence on Computer and Communications Security, CCS '09, pages 161-172, New York,NY, USA, 2009. ACM.

[36] Steven J. Murdoch and George Danezis. Low-cost traffic analysis of tor. In Proceed-

ings of the 2005 IEEE Symposium on Security and Privacy, SP '05, pages 183-195,Washington, DC, USA, 2005. IEEE Computer Society.

[37] C. Andrew Neff. A verifiable secret shuffle and its application to e-voting. In Proceedings

of the 8th ACM Conference on Computer and Communications Security, CCS '01, pages

116-125, New York, NY, USA, 2001. ACM.

[38] Lan Nguyen and Rei Safavi-naini. Breaking and mending resilient mix-nets. In Proc.

PETaAZ03, Springer-Verlag, LNCS 2760, pages 66-80. Springer-Verlag, LNCS, 2003.

[39] National Institute of Standards and Technology. Specification for the advanced en-

cryption standard (aes). In Federal Information Processing Standards Publication 197.2001.

[40] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. Website Fin-gerprinting in Onion Routing Based Anonymization Networks. In Proceedings of the

ACM Workshop on Privacy in the Electronic Society (WPES), pages 103-114, October

2011.

[41] Birgit Pfitzmann. Breaking an efficient anonymous channel. In In EUROCRYPT, pages

332-340. Springer-Verlag, 1995.

[42] Birgit Pfitzmann and Andreas Pfitzmann. How to break the direct rsa-implementation

of mixes. In Advances in Cryptology-EUROCRYPT '89 Proceedings, pages 373-381.

Springer-Verlag, 1990.

45

[43] Johan Pouwelse, Pawe Garbacki, Dick Epema, and Henk Sips. The bittorrent p2pfile-sharing system: Measurements and analysis. In Miguel Castro and Robbert vanRenesse, editors, Peer-to-Peer Systems IV, volume 3640 of Lecture Notes in ComputerScience, pages 205-216. Springer Berlin Heidelberg, 2005.

[44] Jean-Frangois Raymond. Traffic Analysis: Protocols, Attacks, Design Issues, and OpenProblems. In H. Federrath, editor, Proceedings of Designing Privacy Enhancing Tech-nologies: Workshop on Design Issues in Anonymity and Unobservability, pages 10-29.Springer-Verlag, LNCS 2009, July 2000.

[45] Michael K. Reiter and Aviel D. Rubin. Anonymous web transactions with crowds.Commun. ACM, 42(2):32-48, February 1999.

[46] Marc Rennhard and Bernhard Plattner. Introducing morphmix: Peer-to-peer basedanonymous internet usage with collusion detection. In Proceedings of the 2002 ACMWorkshop on Privacy in the Electronic Society, WPES '02, pages 91-102, New York,NY, USA, 2002. ACM.

[47] Len Sassaman, Bram Cohen, and Nick Mathewson. The pynchon gate: A secure methodof pseudonymous mail retrieval. In Proceedings of the 2005 ACM Workshop on Privacyin the Electronic Society, WPES '05, pages 1-9, New York, NY, USA, 2005. ACM.

[481 Emin Gun Sirer, Sharad Goel, and Mark Robson. Eluding carnivores: File sharing withstrong anonymity. In In Proc. of ACM SIGOPS European Workshop, 2004.

[49] Al Teich, Mark S. Frankel, Rob Kling, and Yaching Lee. Anonymous communicationpolicies for the internet: Results and recommendations of the aaas conference. Infor-mation Society, 15(2), 1999.

[50] Michael Waidner and Birgit Pfitzmann. The dining cryptographers in the disco: Uncon-ditional sender and recipient untraceability with computationally secure serviceability.In Proceedings of the Workshop on the Theory and Application of Cryptographic Tech-niques on Advances in Cryptology, EUROCRYPT '89, pages 690-, New York, NY, USA,1990. Springer-Verlag New York, Inc.

[51] Tao Wang and Ian Goldberg. Improved website fingerprinting on tor. In Proceedingsof the Workshop on Privacy in the Electronic Society (WPES 2013). ACM, November2013.

[52] Douglas Wikstrom. Four practical attacks for "optimistic mixing for exit-polls", 2003.

[53] David Isaac Wolinsky, Henry Corrigan-Gibbs, Bryan Ford, and Aaron Johnson. Dissentin numbers: Making strong anonymity scale. In Presented as part of the 10th USENIXSymposium on Operating Systems Design and Implementation (OSDI 12), pages 179-182, Hollywood, CA, 2012. USENIX.

[54] David Isaac Wolinsky, Ewa Syta, and Bryan Ford. Hang with your buddies to resistintersection attacks. In Proceedings of the 2013 ACM SIGSAC conference on Computerand Communications Security, CCS '13, pages 1153-1166, New York, NY, USA, 2013.ACM.

46

riffle: an efficient communication system with strong

Documents