

Research Collection

Master Thesis

Secure High-Speed Anonymity Systems on Future Internet Architectures

Author(s): Asoni, Daniele Enrico

Publication Date: 2015

Permanent Link: https://doi.org/10.3929/ethz-a-010540726

Rights / License: In Copyright - Non-Commercial Use Permitted

This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use.

ETH Library


Secure High-Speed Anonymity Systems on Future Internet Architectures

Master Thesis

Daniele Enrico Asoni

April 13, 2015

Advisors: Dr. D. Barrera, Prof. Dr. A. Perrig

Department of Computer Science, ETH Zurich


Abstract

We design and evaluate HORNET, a protocol for anonymous communications that operates at the network level. HORNET allows endpoints to communicate anonymously as long as not all nodes on a path are compromised. The protocol uses asymmetric cryptography only for the setup, and symmetric cryptography for data forwarding, enabling low-latency anonymous communication suitable for real-time chat and video.

We evaluate the security of HORNET by analyzing possible attacks and showing the strengths and limits of the protocol in defending against them. We also evaluate HORNET's performance by implementing it and simulating the processing of data packets, and find that the overhead introduced by the protocol is typically under 2 ms, one tenth of the average network delay.

The current version of HORNET proves to be fast, scalable and secure, making it a well-suited protocol to be part of the next generation of Internet architectures.



Contents

1 Introduction
  1.1 Organization
2 Background
  2.1 Networking
    2.1.1 OSI Model
    2.1.2 Network Infrastructure
    2.1.3 Future Internet Architectures
  2.2 Cryptographic Tools
    2.2.1 Symmetric-Key Cryptographic Primitives
    2.2.2 Cryptographic Hash Functions
    2.2.3 Asymmetric Cryptography
3 Anonymous Communications
  3.1 Terminology
  3.2 Mix Networks
    3.2.1 Sphinx
  3.3 Onion Routing Networks
    3.3.1 Lightweight Anonymity for FIAs
4 HORNET: High-speed Onion Routing at the NETwork Layer
  4.1 Design objectives and assumptions
    4.1.1 Network Model
    4.1.2 Requirements for Performance and Scalability
    4.1.3 Threat Model
    4.1.4 Security Goals
  4.2 Protocol Design
    4.2.1 Overview
    4.2.2 Notation
    4.2.3 Forwarding Segment Collection and Distribution
    4.2.4 Session Setup Phase
    4.2.5 Data Transmission Phase
  4.3 Enhancements
    4.3.1 Session Re-establishment
    4.3.2 End-to-End Secure Channel
5 Analysis and Discussion
  5.1 Security Analysis
    5.1.1 Passive Attacks on Anonymity
    5.1.2 Active Attacks on Anonymity
    5.1.3 Forward Secrecy
    5.1.4 Protecting the Nodes: DDoS attacks
  5.2 Anonymous Path Retrieval
  5.3 Memory-Bandwidth Trade-off
  5.4 Composability with Other Protocols
6 Evaluation
  6.1 Implementation
    6.1.1 Interfaces and Modularity
    6.1.2 Notable Issues and Lessons Learned
  6.2 Initial Profiling
  6.3 Performance Measurements
    6.3.1 Experiment Design
    6.3.2 Payload Size and Maximum Path Length
    6.3.3 Performance Results
7 Conclusions
  7.1 Future work
Bibliography


Chapter 1

Introduction

Transparency is for those who carry out public duties and exercise public power. Privacy is for everyone else.

— Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

Over the last decade the design of the Internet has shown increasingly alarming problems of scalability, manageability and security, prompting researchers to investigate alternative designs for a next-generation Internet [35][47]. Among the most important problems is the surveillance-prone nature of today's Internet, which was recently brought to the attention of the public in the context of the leaks about the U.S. National Security Agency's mass surveillance programs [19]. Large volumes of global Internet traffic traverse a small set of routers, many of which are located in jurisdictions where large-scale data collection is legal. However, in these future Internet architectures anonymity and censorship resistance were almost never considered first-class citizens.

In the National Security Agency (NSA) leaks, Edward Snowden, a former data analyst for the NSA, revealed classified programs which indiscriminately targeted the online browsing activities, emails, and phone calls of hundreds of millions of people. Supporters of these programs often claim that only those who engage in criminal activities have "something to hide" [43]. There are, however, clear cases in which anonymity would be desirable for (non-criminal) end users. For example, anonymity allows users to look up sensitive information, e.g., medical data, without revealing their interest in that information. It allows journalists to report while avoiding censorship and retaliation, and without revealing their sources. Medical privacy and freedom of expression are issues that, indeed, affect everyone, not only criminals.

Some solutions have been proposed to help users regain some of their lost privacy, some of them designed specifically for future Internet architectures. To date, however, proposed solutions tend to trade off one or more of security, usability or performance, leaving users with solutions that are highly secure but very slow, very user-friendly but not secure, or very fast but not resistant to sophisticated attacks. This limits their use to only a very small subset of the users of the Internet.

In this thesis we present HORNET (High-speed Onion Routing at the NETwork layer), a new, highly scalable solution based on next-generation Internet architectures, which enables secure, high-speed, low-latency anonymous communication. HORNET does not trade off security for speed, and works transparently with upper-layer protocols, requiring no user-facing changes to software. HORNET allows endpoints to establish anonymous communication channels such that no observer (be it a government-level adversary or an ISP) can see who is communicating. To achieve this, HORNET uses layered encryption to completely mask communications at each point in the network. It mostly requires only very efficient computations, to achieve low latency, and it uses packet-carried state, which obviates the need for routers to keep state, achieving optimal scalability. With these properties HORNET is an ideal candidate for bringing anonymity into the core of the network architecture of the future Internet.

HORNET is the result of joint work with C. Chen (first author of the research), Dr. D. Barrera, and Prof. Dr. A. Perrig from ETH Zurich, and with Prof. Dr. G. Danezis from UCL.

1.1 Organization

The remainder of this thesis is organized as follows. In chapter 2 we discuss the background topics of computer networking and cryptography. We cover the high-level structure of network protocols and architectures, providing some further information on future Internet architectures. For cryptography we describe the fundamental functionalities (primitives) that are needed to understand the details of HORNET. In chapter 3 we present anonymous communications, acquainting the reader with the terminology of the field and with some of the existing work. In chapter 4 we present the details of HORNET. The potential attacks on our scheme and its defenses, as well as a number of other aspects of the protocol, are discussed in chapter 5. We evaluate the performance of HORNET in chapter 6. Chapter 7 presents conclusions and future work.


Chapter 2

Background

In this chapter we provide some background on computer networking and a very short description of the cryptographic primitives that will be used. A reader who is already familiar with these topics may want to skip this chapter, and come back to it later through references if needed.

2.1 Networking

The Internet is a global infrastructure which interconnects a multitude of computer networks, allowing billions of users (private individuals, companies and organizations) to communicate with each other. Though mostly hidden from the average user, the degree of technological complexity of this infrastructure is very high: it includes a myriad of aspects, ranging from the details of the physical transmission of information to the protocols that handle the routing of data packets through the networks, and from the handling of end-to-end data flows to the high-level application protocols (which enable, for instance, email and web browsing).

It is beyond the scope of this thesis to present all of these parts, and we will assume that the reader has some familiarity with the topic. Here we will only briefly describe the Open Systems Interconnection (OSI) model by the International Organization for Standardization (ISO), which is an abstraction widely used for computer networks. We present more details only for one part of this model, the network layer, which is the one that actually allows internetworking, and could arguably be defined as the core of the Internet. Afterwards, in subsection 2.1.3, we discuss proposals for alternative architectures for the Internet, which will be of fundamental importance in this thesis.


2.1.1 OSI Model

The ISO Open Systems Interconnection (OSI) model for communication systems [51] defines seven abstraction layers. The first (or lowest) is called the physical layer and consists of the hardware that allows the transmission of bits; examples are Ethernet cables and optical fibers, but also equipment that can generate and receive electromagnetic waves. The second layer is called the data link layer, and handles the underlying physical medium, using it to transmit sequences of bits (called frames).

Figure 2.1: Comparison between the ISO OSI model and the TCP/IP model.

The third layer is the network layer: this is the most important for our treatment. It handles the transmission of packets over multiple links of the network, from one end point to another. The network layer is operated by routers, devices whose purpose is to forward packets in such a way that they will eventually reach their destination. Typically the original sender and recipient of a packet (we will call them source and destination) are end hosts, computers that, unlike routers, do not forward network traffic. Both end hosts and routers are identified by a network address, sometimes called IP address¹. There are a number of protocols in practice that allow routers to exchange information among themselves in order to discover how they need to route a packet, given its destination address.

The fourth layer is the transport layer: it is implemented on end hosts, but usually not on routers. This layer is responsible for the reliable transmission of so-called segments of data from a source to a destination, and is oblivious to the way the packets will be routed by the underlying layer: these details are hidden by the network layer, so that at the transport layer we have the abstraction of direct end-to-end communication.

¹ In the TCP/IP protocol suite, the most commonly used in the Internet, IP (Internet Protocol) is the main protocol for packet routing.

We will not present the upper layers of the model, also because they do not reflect well the structure of the TCP/IP protocol suite, the most widely used in the Internet; instead we adopt for this part the structure of the TCP/IP model, which considers only one layer above transport, called the application layer (see Figure 2.1). Any protocol using the transport layer is considered to be in the application layer, so besides a number of well-known standardized protocols, user-defined protocols are also at this layer.

2.1.2 Network Infrastructure

Often the network layer is depicted as a uniform grid of routers, but in reality it is realized through a hierarchical structure, where each unit is called an Internet Service Provider (ISP) or Autonomous System (AS)²: as the name suggests, each Autonomous System is an independent network of routers. These ASes are usually controlled by a single entity or organization; a set of business relationships and policies between them regulates how network traffic should be exchanged. Routers on the border of the ASes (i.e., connected to routers of other ASes), called edge routers, are aware of the policies, and use a protocol called Border Gateway Protocol (BGP) to exchange routing information. Internal routers instead are not aware of what is outside the AS they are part of.

The ASes are organized in a shallow hierarchy of three tiers, as shown in Figure 2.2. The tier structure is not regulated; it is just a naming convention. Tier 3 networks are at the bottom of the hierarchy: they are Internet Service Providers (ISPs) with a number of customers (end users) to which they provide the possibility to communicate with each other and access the Internet. However, to provide connectivity to the rest of the Internet these ISPs need to purchase transit from another network. They are typically the ASes with the smallest traffic volume.

Tier 2 networks are also called ISPs, as they provide Internet access either to end users or to tier 3 networks (or both). They also purchase transit from other ISPs, but what distinguishes them from tier 3 networks is that they also have a number of peering links to other ASes. These peering links are

² To be precise, the terms 'ISP' and 'AS' do not mean exactly the same thing: an ISP is an organization that controls an AS, which is a network with an assigned autonomous system number (ASN). In practice a number of ISPs are sometimes visible to the rest of the Internet as a single AS, but for simplicity we will not consider this case and assume in this treatment that the two terms can be used as synonyms.


Figure 2.2: ISPs in their hierarchical structure: tier 1 networks at the top, tier 3 networks at the bottom. Solid lines are customer-provider links (c2p), dashed lines are peering links (p2p).

typically established in mutual interest (neither AS having to purchase transit through the link from the other), with each AS typically implementing the policy of only accepting incoming traffic destined to one of its customers. Tier 1 networks only have peering links to other networks, and their customers are other ISPs, not end users. These networks are those with the largest capacity (i.e., transferring the largest amounts of data), though tier 2 ISPs often have a high number of peering links to other tier 2 ISPs to avoid as much as possible sending traffic through their tier 1 providers (which they have to pay according to the traffic volume).

With this we conclude our overview of the current state of the Internet. We invite the interested reader to consult a networking textbook, e.g., Tanenbaum's Computer Networks [46] or Kurose and Ross's Computer Networking: A Top-Down Approach [26]. We will now move to new proposals for alternative architectures for the Internet, known as Future Internet Architectures. These will be at the base of our research; most concepts in these architectures are unchanged with respect to what was presented so far.

2.1.3 Future Internet Architectures

The Internet was created around 40 years ago, but it is only in the last 20 years that it has expanded rapidly to become the colossal infrastructure we know today [28]. Used by billions of people, it has a scope and a variety of applications unimaginable at the time it was designed. It comes therefore as no surprise that the original design of the Internet has turned out to be unable to match many of the requirements of today's applications, in particular in terms of security, reliability and scalability [35][21]. For this reason, over a decade ago a number of projects were started in order to find possible alternatives that could overcome the many issues that affect the current design of the Internet [35]. These new designs are generally called Future Internet Architectures (FIAs).

Recent examples of FIAs are NIRA [49], Pathlets [17] and SCION [50]: they are all based on the principle of source routing, meaning that, unlike in the current Internet design, users can determine (to some extent) the path through which the packets they send will be routed. We will see that this principle can be very useful for anonymity protocols. NIRA [49] focuses on scalability, and introduces the idea of splitting the path into a sender part and a destination part; SCION [50] builds in part on these ideas, but additionally introduces the concept of trust domains (also called isolation domains or ISDs), which allow for higher resilience and security. Pathlets [17] instead focuses on flexibility of the routing policies, which can be expressed through simple configuration rules.

2.2 Cryptographic Tools

We define in this section a series of well-known and widely used cryptographic tools (called cryptographic primitives) which allow us to construct secure protocols, for instance by providing authentication and confidentiality for communications. We will assume some familiarity with the concepts, but we deem an intuitive understanding of them to be sufficient to follow at least the high-level ideas in this thesis³. We will however assume a good understanding of statistics and a basic knowledge of abstract algebra (group theory).

As we will also see in the next chapter (3), in Information Security there is always the concept of an adversary against which one tries to defend (though sometimes it is not explicitly stated): it is arguably the main aspect that distinguishes Information Security from other fields in Computer Science. To specify the strength of the adversary, usually a security parameter k is provided, indicating that the adversary can only perform a number of computational steps that is much smaller⁴ than 2^k. We will use this parameter throughout this thesis, and when needed for the implementation and evaluation we will assume that k = 128.

³ For an extensive introduction to cryptography see for example Cryptography: Theory and Practice by Stinson [44].

⁴ To be precise, the number of steps the adversary performs must be polynomial in k.

Before presenting the cryptographic primitives we introduce some basic notation (mostly standard in computer science). We will in general work with bit sequences (or strings). The set of bit sequences of length l is written {0,1}^l; 0^l denotes the all-zero bit string of length l. The set of all bit strings (of any length) is written {0,1}^*. For a bit string x, x[a..b] denotes the substring consisting of the bits from position a to position b, inclusive (indices start from 0, which denotes the leftmost bit). We use a special index end to denote the end of the string. |x| denotes the length of x. If y is another bit string, x ∥ y denotes the concatenation of x and y. We denote the empty string (i.e., the string of length 0) by ε.
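The notation above maps directly onto ordinary string operations. As a small illustrative sketch (representing bit strings as Python strings of '0'/'1' characters, an assumption of this example only):

```python
# Bit strings as Python strings of '0'/'1' characters.
# x[a..b] (inclusive, 0-indexed from the left) becomes x[a:b+1].

x = "10110"
y = "01"

assert len(x) == 5         # |x| = 5
assert x[1:4] == "011"     # x[1..3]
assert x[2:] == "110"      # x[2..end]
assert x + y == "1011001"  # x ∥ y (concatenation)
assert "0" * 3 == "000"    # 0^3, the all-zero string of length 3
assert "" == ""            # ε, the empty string
```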

2.2.1 Symmetric-Key Cryptographic Primitives

Most primitives in cryptography need a key, which is a sequence of bits of length k that is assumed to be uniformly random, or at least pseudo-random, meaning that an adversary cannot distinguish it from a string chosen uniformly at random. Note that, because of the way the security parameter is defined, the probability that an adversary is able to guess such a key is negligible (for each guess the probability that it is correct is 2^(-k)). We describe three basic cryptographic primitives, modelled as functions, which need keys: the message authentication code (MAC), the pseudo-random permutation (PRP) and the pseudo-random generator (PRG).

MAC : {0,1}^k × {0,1}^* → {0,1}^k: A message authentication code function takes as input a key and a string of arbitrary length, which is the message that has to be authenticated, and returns a bit string of length k, which is also called the MAC (admittedly, the terminology is a bit ambiguous). This is typically used when two parties share a secret key, known to no one else: to authenticate a message, one party computes the MAC with the shared key over the message (the second parameter of the function), and then sends the message to the other party together with the MAC. Once the other party receives them, it can recompute the MAC over the message and check whether it is equal: if it is, it knows that the first party sent that message, because the function is such that without knowing the key it is not possible to obtain the correct MAC for any message.
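The compute-then-verify workflow just described can be sketched with HMAC-SHA256 from Python's standard library. HMAC here merely stands in for the abstract MAC function; the truncation to k bits and the example key and message are assumptions of this sketch, not details of the thesis's protocol.

```python
import hashlib
import hmac
import secrets

K = 16  # k = 128 bits, as assumed in the thesis's evaluation

def mac(key: bytes, message: bytes) -> bytes:
    """MAC(key; message): instantiated here with HMAC-SHA256,
    truncated to k bits so the tag length matches the definition."""
    return hmac.new(key, message, hashlib.sha256).digest()[:K]

# Shared secret key, known only to the two parties.
s = secrets.token_bytes(K)

# The sender computes the tag and transmits (message, tag).
m = b"transfer 100 CHF to Bob"
tag = mac(s, m)

# The receiver recomputes the tag and compares in constant time.
assert hmac.compare_digest(tag, mac(s, m))

# A modified message fails verification.
assert not hmac.compare_digest(tag, mac(s, b"transfer 999 CHF to Bob"))
```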

PRP : {0,1}^k × {0,1}^l → {0,1}^l: For any input key, a pseudo-random permutation is a permutation⁵ (a bijective function from and to {0,1}^l) such that, if the key is pseudo-random, it is indistinguishable from a permutation chosen uniformly at random from the set of all permutations over {0,1}^l (this ideal primitive is called a uniform random permutation, or URP). We always assume that one can also compute the inverse, PRP^(-1), which is such that for any key s and message m (with |m| = l), PRP^(-1)(s; PRP(s; m)) = PRP(s; PRP^(-1)(s; m)) = m⁶. It should be clear from the definition that, if someone does not know the key s, knowing the value c = PRP(s; m) will leak

⁵ Strictly speaking, a family of permutations dependent on l.

⁶ In this and in the following functions we use the semicolon (;) to separate the arguments. This makes the notation more readable and removes the ambiguity due to the fact that in the literature the comma (,) is sometimes used to denote string concatenation.


no information about m. An important thing to know about existing implementations of PRPs is that they usually have a small value of l⁷. Constructions with larger l exist [4], but there has been less research on them as they are computationally more expensive.
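As a toy illustration of a keyed permutation together with its inverse, the sketch below builds a small Feistel network with a hash-based round function (a classical way to obtain an invertible permutation from a non-invertible function). The round count, block size and round function are illustrative assumptions, not a vetted cipher; real instantiations of PRPs are block ciphers such as AES.

```python
import hashlib

L_BLOCK = 16  # block length l in bytes (128 bits)

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function: a hash of (key, round index, input half), truncated.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:L_BLOCK // 2]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def prp(key: bytes, m: bytes, rounds: int = 4) -> bytes:
    """PRP(key; m): a 4-round Feistel permutation over 16-byte blocks."""
    left, right = m[:L_BLOCK // 2], m[L_BLOCK // 2:]
    for i in range(rounds):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def prp_inv(key: bytes, c: bytes, rounds: int = 4) -> bytes:
    """PRP^(-1)(key; c): undo the Feistel rounds in reverse order."""
    left, right = c[:L_BLOCK // 2], c[L_BLOCK // 2:]
    for i in reversed(range(rounds)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

s = b"\x00" * 16
m = b"sixteen byte msg"
c = prp(s, m)
assert prp_inv(s, c) == m  # the inverse recovers the plaintext
assert len(c) == L_BLOCK   # a permutation preserves the block length
```

Note that, as the text observes, the permutation is defined only for inputs of the fixed length l; the stream-cipher construction in the next subsection is what lifts encryption to arbitrary message sizes.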

PRG : {0,1}^k → {0,1}^l: Given an input key, a pseudo-random generator returns a pseudo-random bit string of length l. Typically this is used with l much larger than k. In practice PRGs are realized through PRPs, though we will not show the construction here.

Encryption and decryption. We use the PRG to define another pair of primitives, encryption and decryption. The typical definition is simply that of two keyed functions enc and dec such that dec(s; enc(s; m)) = m and such that an adversary does not learn any information about m from c = enc(s; m) (c is called the ciphertext, m the plaintext). We see that the definitions of PRP and PRP^(-1) satisfy these requirements, but the problem is that we want encryption and decryption for plaintexts of arbitrary size, and as mentioned, implementations of PRPs support only small message sizes (l).

Instead we define encryption and decryption as enc(s; m) = dec(s; m) = m ⊕ PRG(s). It is easy to see that dec(s; enc(s; m)) = m (remember that, if |x| = l, then x ⊕ x = 0^l, the identity element for the xor operation). This construction is called a stream cipher. It turns out that this construction is not secure if the same key is used to encrypt more than one payload. What is usually done to avoid this problem is to use a modified PRG that takes an extra parameter, which for simplicity we consider an extension of the key. Encryption and decryption become the following:

enc(s; v; m) = dec(s; v; m) = m ⊕ PRG(s ∥ v)

The way this is used is that the party encrypting message m generates a value v, called a nonce (for number used once), uses it for the encryption, and then sends or stores it together with the ciphertext. The party that later wants to decrypt the ciphertext uses the nonce v to do so. As long as no nonce is used twice with the same key, this scheme can be used to encrypt multiple messages.
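The nonce-based stream cipher enc(s; v; m) = m ⊕ PRG(s ∥ v) can be sketched as follows; the hash-counter PRG inside is an illustrative stand-in for the abstract PRG primitive, and the key and nonce sizes are assumptions of this sketch:

```python
import hashlib
import secrets

def prg(seed: bytes, l: int) -> bytes:
    # Hash-counter PRG sketch (stands in for the PRG primitive).
    out = b""
    i = 0
    while len(out) < l:
        out += hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
        i += 1
    return out[:l]

def enc(s: bytes, v: bytes, m: bytes) -> bytes:
    # enc(s; v; m) = m XOR PRG(s || v)
    return bytes(a ^ b for a, b in zip(m, prg(s + v, len(m))))

dec = enc  # encryption and decryption coincide for this stream cipher

s = secrets.token_bytes(16)  # shared key
v = secrets.token_bytes(16)  # nonce: must never be reused with this key
m = b"anonymity loves company"

c = enc(s, v, m)
assert dec(s, v, c) == m
```

Reusing a key with a fresh nonce yields an unrelated keystream, which is exactly why the nonce rescues the one-key-many-messages case.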

While in this construction the encryption and decryption operations are the same, sometimes other constructions are used that do not have this property. To be more flexible we will therefore not require that enc and dec be the same, and where this is needed we will instead explicitly use the PRG function and xor its output with the plaintext.

⁷ In AES [32] the key length is in {128, 192, 256} bits, while the block length is l = 128; for simplicity in our evaluation we use l = k = 128.


2.2.2 Cryptographic Hash Functions

A conceptually very simple but very useful primitive is the cryptographic hash function. These are functions with signature {0,1}^* → {0,1}^l for some number l, which are typically assumed to act as publicly available pseudo-random functions (PRFs). Analogously to the case of PRPs and URPs, which we saw previously, a PRF is a family of keyed functions that, when their key is chosen at random, are computationally indistinguishable from an ideal primitive called a uniform random function (URF), which is a function chosen at random from the set of all functions with signature⁸ {0,1}^* → {0,1}^l.

To be precise, a publicly available PRF without a key is what is called a random oracle (RO). It has been shown that there are schemes that are secure in the RO model, but not under any concrete instantiation of it [7]. For this reason cryptographic hash functions are instead defined only by a series of properties, specified as games which an adversary should not be able to win with non-negligible probability. We will not go into further details here, and assume instead that the hash functions we use behave like ROs.

One of the main uses we will make of these functions is to derive a set of keys from a single key. Specifically, in our scheme we will often have the scenario where one shared key is available but a number of keyed primitives have to be used (see chapter 4). Since it would be insecure to reuse the same key for different primitives, we assume instead that for each primitive a different key is derived from the main shared key by using a different hash function. We will usually subscript the hash function with the name of the primitive for which the key is generated: for example, hash_PRG(s) would be the key of a PRG, derived from the main key s.
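The family of per-primitive hash functions hash_PRG, hash_MAC, hash_PRP can be sketched with a single hash plus domain separation, i.e., prefixing a distinct label per primitive before hashing. The labelling scheme and key sizes below are assumptions of this sketch, not the thesis's concrete key-derivation function:

```python
import hashlib

def derive_key(master: bytes, primitive: str, k: int = 16) -> bytes:
    """hash_<primitive>(s): a distinct hash per primitive, modelled
    by a domain-separation label hashed together with the master key."""
    return hashlib.sha256(primitive.encode() + b":" + master).digest()[:k]

s = b"\x42" * 16  # the single shared key

k_prg = derive_key(s, "PRG")
k_mac = derive_key(s, "MAC")
k_prp = derive_key(s, "PRP")

# Each primitive gets an independent-looking key from one master key.
assert len({k_prg, k_mac, k_prp}) == 3
assert derive_key(s, "PRG") == k_prg  # deterministic: both ends agree
```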

2.2.3 Asymmetric Cryptography

In 1976 Diffie and Hellman presented a scheme [12] based on exponentiation in finite cyclic groups which allows the establishment of a shared key between two parties over an insecure channel (e.g., the Internet). This publication represents a fundamental turn in the history of cryptography, which up to that point had only considered schemes relying on keys already established between the parties (assuming they had been agreed upon through some offline method, like a face-to-face meeting).

Diffie-Hellman Key Exchange. The basic idea of this scheme works as follows. A cyclic group G of order q ≈ 2^(2k) with generator g is given. Parties A and B each have a secret integer in Z*_q, a and b, respectively, and a public group element, g^a and g^b, respectively. Values a and b are called private keys, g^a and g^b public keys. A and B can exchange their public keys over an insecure channel. Each party computes the exponentiation of the other party's public key with its own private key: A computes (g^b)^a = g^ab and B computes (g^a)^b = g^ba. Since the group operation is commutative in cyclic groups, g^ab = g^ba, so the two parties actually obtain the same value.

8The signature is the same as that of the URF, so in other cases the input domain could be limited, e.g., to {0, 1}^t for some t.

However, it is believed that (in certain groups) computing g^ab from g^a and g^b is hard9, so an adversary eavesdropping on the network would not learn g^ab. This means that this value can be used by A and B as a shared secret key (as long as the right groups are chosen).
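As a concrete illustration, the exchange can be sketched in a few lines. This is a toy example over Z*_p with a Mersenne prime and fixed exponents; real deployments use properly chosen groups of order ≈ 2^(2k) and randomly generated private keys.

```python
p = 2**127 - 1   # a Mersenne prime; the group is Z*_p (toy-sized here)
g = 3            # public generator

a = 123456789            # A's private key (random in practice)
b = 987654321            # B's private key
A_pub = pow(g, a, p)     # g^a, sent over the insecure channel
B_pub = pow(g, b, p)     # g^b, sent over the insecure channel

# Each party exponentiates the other's public key with its own secret
key_A = pow(B_pub, a, p)   # (g^b)^a = g^ab
key_B = pow(A_pub, b, p)   # (g^a)^b = g^ba
assert key_A == key_B      # both parties hold the same shared secret
```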

When using this scheme in practice, the keys could be generated just before a communication and used only for that communication (temporary keys), or be long-term keys, used over longer periods of time for an arbitrary number of key exchanges. In the latter case, these keys are often registered and distributed through so-called public key infrastructures (PKIs), where trusted entities provide guarantees that a certain public key belongs to a certain party. This is done through signed certificates, which are based on another kind of public key cryptography that we will not present here.

In general there are other schemes that use this same concept of private and public key pairs (collectively called asymmetric cryptography), most notably RSA [40], which allows direct encryption of messages with a public key such that they can then only be decrypted with the corresponding private key. In this thesis, however, we will only need the Diffie-Hellman key exchange.

Forward Secrecy. We briefly discuss a property called forward secrecy [13] which is related to the use of temporary keys as opposed to long-term keys. We saw that if during a Diffie-Hellman key exchange between A and B an adversary E is eavesdropping on a communication channel, E learns g^a and g^b. We also mentioned that from these two elements E cannot learn the established shared key g^ab. Assume that A and B use the shared key to exchange confidential messages by encrypting them: the adversary sees only the ciphertext of these encrypted messages.

Now consider what happens if, some time after the communication ended, E is able to compromise one of the parties, say A, thereby learning all the secret keys stored on A's computer. If A used his long-term key for the Diffie-Hellman key establishment with B, E learns the value a from A's computer, and can thus compute (g^b)^a = g^ab. If E stored all the encrypted messages it saw before, it is now able to decrypt them. If A had used a temporary key instead, this would not have happened, since at the point when E gets access to A's computer the temporary private key a has already been deleted. Cryptographic schemes that remain secure even if at some point in the future some of the parties are compromised (their long-term private keys are leaked to the adversary) are said to have perfect forward secrecy, or just forward secrecy.

9This is called the Computational Diffie-Hellman assumption (CDH). It is even believed that (in certain groups) it is hard to distinguish g^ab from a random group element given g^a and g^b, which is called the Decisional Diffie-Hellman assumption (DDH). We will assume that a group is used that satisfies DDH.

Performance. While asymmetric cryptography is very powerful and has advantages over symmetric cryptography10, its great disadvantage is performance. The difference in the time it takes to perform operations of asymmetric cryptography and operations of symmetric cryptography is of at least three to four orders of magnitude. This means that when designing any protocol that uses cryptographic primitives one should try to restrict the use of asymmetric cryptography as much as possible. In chapter 4 we describe how our proposed anonymity protocol restricts the use of asymmetric cryptography to the essential, switching to symmetric cryptography for most of its processing.

10One of these advantages is the fact that in general the security of asymmetric cryptography schemes is based on hardness assumptions about well-defined mathematical problems, while for symmetric cryptography more ad-hoc methods are used.


Chapter 3

Anonymous Communications

When first approaching the topic of anonymous communications it may seem easy to define the main goal: to hide, from some adversarial entity, which party is communicating with which other party. In practice, however, this concept can assume very different connotations. Often it is the case that a user wishes to access some service on the Internet without that service (or some other entity in the network) learning the identity of the user: typical examples are companies browsing for patents who wish to avoid revealing their interest, but also whistleblowers submitting documents to a newspaper, or activists and journalists working under censorship regimes, who want to avoid retaliation. More and more it is also simply private citizens who, because of the increase in user tracking by companies on the web [31] and in light of the recent disclosures about global mass surveillance [19], wish to have a means of ensuring their privacy is respected.

A different kind of anonymous communication is that of anonymous services, which wish to be accessible without disclosing their location; there are even cases where two parties who know each other wish to communicate without anyone else learning about it. For example, a scenario where this could happen is that of two companies in the early stages of a merger discussion; another possibility is that of a research group of a company communicating with the patent attorneys of their company who are in another city1.

Yet more variety in the field is introduced by the different types of adversaries that are considered by different systems offering anonymous communications: from curious destinations to powerful Dolev-Yao-like adversaries [15], many possibilities have been considered in the literature. All these points make a unified treatment difficult. To the complexity of the topic comes also the fact that anonymous communications constitute a relatively new research area in Information Security, compared for example with confidential communications2 [6][10]: all this explains why no treatment of the subject has arisen so far that is generally accepted by the researchers in the field [45] (or that can be applied well to all cases).

1These two examples are suggested by Danezis et al. [10].

We will not consider all aspects of the problem, and will instead limit the scope to those elements that are essential to this thesis. In this section we will present the terminology that we adopt, and then discuss the two most important approaches that can be taken to designing anonymity systems, which are that of mix networks (section 3.2) and that of onion routing networks (section 3.3). In the latter category we also present a number of recent proposals for anonymity systems built for future Internet architectures (FIAs, see subsection 2.1.3), which constitute an important piece of related work for our proposal (which we will see in chapter 4). For a more detailed treatment we refer the reader to the article Systems of Anonymous Communication by Danezis, Diaz and Syverson [10].

3.1 Terminology

We use the widely adopted terminology from Pfitzmann and Hansen3 [36] as presented by Danezis et al. [10]. The quotations below are from the latter unless otherwise specified. In particular we are interested in the definitions of anonymity, undetectability and unlinkability.

Anonymity. Anonymity of a subject is defined in relation to a set of subjects “with potentially the same attributes”, as “the state of being not identifiable within a set of subjects, the anonymity set”; the scope of this definition is an “action or transaction”. In this thesis we will consider only (wired) network communications, so the “actions” will be either single messages (packets) sent over the network, or higher-level communications, which might consist of a number of packets. These communications are between two end hosts, a source (or sender) and a destination (or receiver)4: accordingly, the subject to which the definition of anonymity refers can be either a source or a destination.

For a certain communication, the anonymity set composed of the possible sources and the anonymity set of the possible destinations are independent: they can have some or all entities in common, or be completely disjoint. These sets provide a simple way to measure anonymity, the intuition being that a larger set implies better anonymity5.

2Privacy of communications has been a concern for centuries, but until recently it was dealt with by legislative means rather than technological ones: the first publication in the field was by Chaum in 1981 [8], made possible by the development of computer networks as well as public key cryptography. On the other hand, there have been attempts to achieve confidential communications by technological means since antiquity (e.g., the Caesar Cipher) [27].

3Hansen was named ‘Kohntopp’ at the time of the original publication [36].

4For bidirectional communication the source is the end host which initiates the communication.

Danezis et al. [10] also rephrase the property as follows: “a subject carries on [a] transaction anonymously if he cannot be adequately distinguished (by an adversary) from other subjects”. This definition introduces a second element, which is the adversary: as always in information security, the specification of the adversarial model is of paramount importance, as no meaningful definition of security can be made without it.

The importance of the specification of the adversary in our network setting becomes clear in the following example. Assume that a source S wishes to contact a web server D anonymously, in such a way that no single entity in the network (a router or the web server) learns that S is communicating with D. Considering D’s point of view first, it sees the incoming communication, but if the sender address is somehow hidden, D might have little information about which end host the communication is coming from, so the anonymity set of the possible sources could be very large. By the definition above we can say that S has anonymity or is anonymous; but this is only with respect to D. If instead we consider the point of view of the source’s ISP, we see that this entity knows exactly which source is participating in which communication, so S’s anonymity set size is 1: S is not anonymous to its ISP.

Undetectability. The example above shows that anonymity, as defined, is not well suited to our network communication scenario, or at least not sufficient. Indeed, to be anonymous from its ISP, the source would need an additional property called undetectability6, meaning that the ISP should not be able to detect that the source is communicating at all. In wireless networks a source could achieve undetectability through mechanisms like spread spectrum, while in other schemes this property is achieved by introducing padding traffic (i.e., dummy traffic that will be discarded by the destination) in such a way that, except for the source and destination, no one is able to tell whether a packet contains significant traffic or not.

However, we note that a scheme that instead hides the destination of a communication from the source’s ISP (without undetectability) might still be desirable for the source: in such a case the ISP would see the source communicating, but might not know with which destination. By our definition, in such a case it would be the destination that is anonymous (within some anonymity set). So, intuitively, for a source it could be sufficient if for a certain communication its ISP is not able to de-anonymize both of its end points at the same time. More generally, if an entity can identify the source (or restrict it to a very small anonymity set) then the anonymity set of the destination must be large, and vice versa. The concept of unlinkability captures this notion.

5Measuring anonymity by the anonymity set size has important limits, however, and has been criticized for this reason [45][41]. We will not go into the details here.

6Pfitzmann and Hansen originally called this property “unobservability” [36], and Danezis et al. [10] still use this terminology, but meanwhile Pfitzmann and Hansen have opted for “undetectability”.

Unlinkability. Danezis et al. [10] define unlinkability as follows:

“Unlinkability of two or more Items Of Interest (IOIs: e.g., subjects, messages, events, actions, . . . ) means that within the system (comprising these and possibly other items), from the attacker’s perspective, these items of interest are no more and no less related after his observation than they were related concerning his a-priori knowledge.”

This definition is very generic, and we will restrict it to the case where the two IOIs are subjects (end hosts of a network), a source and a destination. We will therefore often use the expression source-destination unlinkability. We see that this definition fits our previous scenario much better: what the source wants is to be unlinkable to the destination it is communicating with, with respect to an adversary, be it its ISP, a router in the Internet, or the destination itself.

To be precise, the above definition is what Danezis et al. call “relative unlinkability”; they also present a definition of “absolute unlinkability”, which differs in that the attacker is not able to determine a relation between two IOIs at all. We will not use the latter definition, however, because it is too strict and difficult to use, as in most scenarios the adversary knows some information about the system from the beginning (which web services are most popular, in what country the most active Internet users are at a certain time, etc.).

3.2 Mix Networks

In 1981 Chaum published a paper on anonymous email communications [8] which introduced the concept of mix networks, and more generally that of anonymous communications in computer networks. His scheme consists of a set of mixes which forward messages while hiding the link between them and the corresponding output messages (through bitwise unlinkability). The sender of a message specifies the sequence of nodes that need to be traversed, using their RSA public keys for this purpose. In the scheme, each mix only knows the previous and the next mix. The mixes “mix” the messages they forward by changing their order, hence the name for this kind of scheme.
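The layered (“onion”) encryption underlying Chaum’s design can be sketched as follows. This is purely a toy illustration: the per-mix keys here are stand-ins for the symmetric keys that, in the original scheme, would be wrapped with each mix’s RSA public key, and the hash-based XOR keystream is not a real cipher.

```python
import hashlib
import itertools

def keystream(key: bytes):
    # Toy stream cipher: SHA-256 in counter mode (illustration only)
    for ctr in itertools.count():
        yield from hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # XOR is its own inverse, so this both adds and removes a layer
    return bytes(b ^ k for b, k in zip(data, keystream(key)))

# The sender wraps the message once per mix, innermost layer last,
# so that the first mix on the path removes the outermost layer.
route = [("mix1", b"k1"), ("mix2", b"k2"), ("mix3", b"k3")]
onion = b"anonymous email"
for _, key in reversed(route):
    onion = xor_crypt(key, onion)

# Each mix peels its own layer and forwards the result to the next hop
for _, key in route:
    onion = xor_crypt(key, onion)
assert onion == b"anonymous email"   # the last mix recovers the message
```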

Mix networks tend to be high-latency (hours), since they require the use of asymmetric cryptography (see subsection 2.2.3). Additionally, any traffic from a host which is relayed through a mix network must be mixed with the traffic of other hosts to provide anonymity guarantees. Extra traffic (often called padding or cover traffic) is sometimes added as well. Thus, due to their slow performance, mix networks are impractical for real-time communications.

The principle that enables this type of anonymous communication is called partitioned routing information: every node on the path of a message obtains only a fraction of the information on how to route the message from sender to receiver, ideally only as much as is needed to correctly forward it to the next mix. This principle holds in general for all network anonymity systems.

Notable schemes that were proposed later and are based on the same principles are Babel [20] and Mixmaster (a now-expired IETF Internet-Draft) [48]. More recently Mixminion was proposed [9], which is widely deployed today and considered a state-of-the-art remailer [10]. For Mixminion, a replacement for the packet format and associated processing algorithms was recently proposed by Danezis and Goldberg, called Sphinx [11]; Sphinx can also be seen as an autonomous protocol. It is a scheme that, as we will see, is fundamental for our proposal, HORNET, which we will present in chapter 4. For this reason we provide a more detailed overview of it here.

3.2.1 Sphinx

Sphinx [11] is an anonymity protocol which is provably secure against an adversary capable of actively injecting or manipulating traffic, with access to all network links and capable of compromising all mix nodes but one on the path (if fewer nodes are compromised, the anonymity-set size of the source increases). Sphinx itself does not do any mixing or batching, since it is meant to be used as part of Mixminion, which handles these operations.

We use Sphinx in HORNET (see chapter 4). HORNET is designed to be a low-latency anonymity system, so we use Sphinx without any mixing or batching, as these would introduce a large overhead. Another important change in our use of Sphinx with respect to the original specification in the paper concerns replay protection. Sphinx requires the mix nodes to store hashes of seen messages (only of part of their headers, to be precise) to prevent replays of the same message. In our use of this protocol we will assume that this part of the processing is skipped, as it would be too expensive in terms of memory usage in our scenario (see chapter 4). Even without mixing, batching and replay protection, the scheme provides sufficient security properties for our purposes.

At the core of Sphinx is the use of Diffie-Hellman (DH) key exchanges (which we explained in subsection 2.2.3), through which the sender of a message establishes a shared secret with each node on the path: from each shared secret a number of keys are derived (through cryptographic hash functions) which are used for encryption, integrity protection, and other steps of packet processing. In particular, the scheme allows the sender to establish the shared keys with all the mixes on a path with one single packet, as the packet traverses the mixes.

The establishment of these keys is achieved by the source generating a temporary public key g^x, which is inserted in the header of the packet and used to perform a DH key exchange with the first mix (for this the long-term public key of the node is used, so the source can compute the key that will be established before sending the packet). This mix then computes a blinding factor b based on the established key, and uses it to blind the public key of the source, by computing g^xb. This becomes the public key the source uses for the second mix. Each mix can also verify the integrity of the header and extract from it the address of the next mix. This process is repeated until one mix receives, instead of the address of the next node, an identifier that indicates that the packet has reached the intended destination.
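This key-blinding chain can be sketched as follows. The group, keys, and helper names are toy values chosen for illustration; Sphinx itself operates in a prime-order elliptic-curve group and derives the blinding factor with the scheme's own hash functions.

```python
import hashlib

p = 2**127 - 1   # toy multiplicative group Z*_p; exponents live mod p-1
g = 3

def blinding(alpha: int, secret: int) -> int:
    # Blinding factor b = hash(current public key, established key),
    # computable by both the source and the mix.
    h = hashlib.sha256(alpha.to_bytes(16, "big") + secret.to_bytes(16, "big"))
    return int.from_bytes(h.digest(), "big") % (p - 1)

mix_priv = [11, 22, 33]                     # mixes' long-term private keys
mix_pub = [pow(g, x, p) for x in mix_priv]  # their long-term public keys

# Source side: from one temporary key x it precomputes every shared key
x = 123456789
exp, alpha = x, pow(g, x, p)
source_keys = []
for y in mix_pub:
    s = pow(y, exp, p)                 # key shared with this mix
    source_keys.append(s)
    b = blinding(alpha, s)
    exp = (exp * b) % (p - 1)          # fold the blinding into the exponent
    alpha = pow(alpha, b, p)           # blinded public key for the next mix

# Mix side: each mix derives the same key from the alpha in the header
alpha = pow(g, x, p)
for xi, expected in zip(mix_priv, source_keys):
    s = pow(alpha, xi, p)              # DH with the mix's private key
    assert s == expected               # matches the source's precomputation
    alpha = pow(alpha, blinding(alpha, s), p)
```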

The design of the scheme allows the header to be very compact: it can be as small as 64 bytes plus 32 bytes per hop. The scheme fixes a maximum number of hops that are allowed to be on a path. Sphinx also provides receiver anonymity, which means that an anonymous user can construct a header which can be used by a so-called nymserver (which handles pseudonyms) to send back a reply to that user.

For payload encryption the protocol uses a block cipher with block size equal to the size of the entire message (the authors suggest LIONESS [4]); while less efficient than a stream cipher, such a block cipher guarantees that if the payload is modified at any step, the message it carries will be lost completely.

3.3 Onion Routing Networks

Between 1996 and 1998, a low-latency alternative to mix networks was proposed by researchers at the U.S. Naval Research Lab (NRL) which introduced the concept of onion routing [18][39]. This scheme later evolved into one of the most popular anonymity systems used today, Tor [14]. The fundamental idea of these schemes, which clearly distinguishes them from mix networks, is that they work by first establishing a circuit between a source and a destination, and then allowing fast (i.e., without asymmetric cryptography) bidirectional communication over the circuit. While these networks are faster, they provide less security than mix networks, so in general they assume less powerful opponents.

One difference between the original onion routing design and Tor is that the former uses a single packet to establish the whole circuit, while Tor has a telescopic path-extension mechanism, in which the client opening a circuit contacts each onion router sequentially, extending the circuit one hop at a time. While this requires a larger network overhead (each router has to send back a reply that allows the source to obtain the established shared key), it allows the scheme to achieve perfect forward secrecy (see subsection 2.2.3).
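The telescopic extension can be sketched as follows. This uses a toy group and fixed ephemeral values for illustration; in Tor each step is additionally tunneled through the already-established hops and authenticated, which we omit here.

```python
p, g = 2**127 - 1, 3   # toy Diffie-Hellman group

def relay_handshake(y: int, client_pub: int):
    # The relay replies with g^y and derives the shared key (g^x)^y
    return pow(g, y, p), pow(client_pub, y, p)

relay_ephemerals = [111, 222, 333]   # each relay's fresh secret y
circuit_keys = []
for i, y in enumerate(relay_ephemerals):
    x = 1000 + i                     # client's fresh secret for this hop
    # The extend request carrying g^x travels through the hops built so
    # far, each adding/removing an onion layer keyed by circuit_keys.
    relay_pub, relay_key = relay_handshake(y, pow(g, x, p))
    client_key = pow(relay_pub, x, p)    # (g^y)^x
    assert client_key == relay_key       # one shared key per hop
    circuit_keys.append(client_key)

# The ephemeral secrets x and y can now be deleted, which is what
# gives the circuit perfect forward secrecy.
assert len(circuit_keys) == 3
```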

Other notable schemes that fall into the category of onion routing networks are peer-to-peer networks like Tarzan [16].

3.3.1 Lightweight Anonymity for FIAs

Recently a number of schemes have been proposed that are based on future Internet architectures (FIAs, see subsection 2.1.3) and that exploit the properties of their design. LAP [23] leverages source routing and a very relaxed attacker model to obtain a lightweight anonymity protocol that does not require any online public key cryptography operations and uses near-shortest-path routes. This scheme is especially interesting for this thesis as it has a similar ambition to the one behind our proposal, HORNET: to make anonymity one of the fundamental tasks of the network architecture, rather than something to be achieved as an overlay.

One of the key aspects of LAP, which is not shared by HORNET, is that while the source specifies the path to the destination and back, it does not establish any shared key with the nodes, and has no way to ensure that the packet is actually being routed correctly, nor indeed whether it is being anonymized at all, should the first node on the path be compromised7. Also, the payload is never encrypted, and the entire packet looks the same at each hop. Trying to improve on some of the flaws of LAP by allowing for a slightly stronger attacker model, Dovetail [41] was proposed. This scheme still has no onion-encryption for the payload, and the packets look the same at each hop. Both schemes reveal the length of the path and the position of a node on the path8.

Also related to LAP and Dovetail, though not specifically meant for FIAs, is a workshop proposal named Tor Instead of IP [30], in which the authors suggest integrating Tor’s onion routing into the network routing protocols.

7One of the assumptions in LAP is that the first node is honest.

8LAP provides a way to hide the exact path length to some extent, but simple timing analysis through the measurement of the round-trip time of packets could still reveal it.


Chapter 4

HORNET: High-speed Onion Routing at the NETwork Layer

In this chapter we describe our protocol, HORNET, a secure high-speed anonymity protocol designed to be part of the network layer of Future Internet Architectures (FIAs, see section 2.1). The vision behind HORNET is that the networking infrastructure should have anonymity as one of its principal tasks. Compared to previous proposals with similar goals, HORNET is designed to provide higher security while maintaining an acceptable level of performance overhead.

We first show the rationale underlying our design choices (section 4.1), then describe the protocol (section 4.2), providing a high-level overview of its structure before going into a more detailed and formal specification. Finally, we mention possible extensions to the protocol in section 4.3.

HORNET is the result of joint work together with C. Chen (first author of the research), Dr. D. Barrera, and Prof. Dr. A. Perrig, from ETH Zurich, and with Prof. Dr. G. Danezis from UCL.

4.1 Design objectives and assumptions

The purpose of our protocol is to enable end hosts of a network to communicate with each other anonymously. To be more precise, according to the definitions in chapter 3, for each anonymous communication HORNET guarantees to the source1 that source and destination are unlinkable by the adversary (as defined in the threat model, see below).

1The naming reflects the way the anonymous communication is initiated, but traffic may actually flow in both directions. See section 2.1 for more details about the adopted terminology for networking.


HORNET was designed with the main objectives of high performance, scalability, and security. The first two properties are strictly dependent on the network model we adopt, while security (by which we mainly mean the guarantee of anonymity) is strictly connected to the threat model. In this section we present these models, which constitute the fundamental assumptions underlying our design, and show how the main features of our protocol are derived from them. In particular, we show what requirements our network model imposes on our system for it to be fast and scalable, and what security properties are needed to defend against the adversary defined in our threat model.

4.1.1 Network Model

We consider a scenario where a large network (Internet-scale)2 is involved in anonymous communications; in particular, we assume that a large part of the traffic traversing this network (e.g., 10%) is anonymized through HORNET. We assume this network is made of sparsely connected nodes3 and end hosts. These nodes are responsible for the forwarding of anonymized traffic, and cooperate with sources to establish what we call anonymous sessions (or just sessions) to their intended destinations. Their role is equivalent to that of mixes in mix networks, or of onion routers in onion routing networks (see chapter 3).

We assume that each node has generated a public/private key pair and has made the public key available for lookup in a public-key infrastructure (PKI) [1]. This PKI allows any end host to retrieve and verify a node’s public key. Public key management and distribution is a complex research problem which is complementary to ours, so we do not present a full solution here. However, we imagine that promising proposals such as RPKI [24] could be used to provide such functionality. In addition to the network’s nodes, end hosts acting as destinations in anonymous communications also have a public key, which for simplicity we imagine to be registered through the same PKI4. Because of similarities in HORNET’s packet processing by destinations and by traffic-forwarding nodes, we will often extend the definition of nodes to include destinations.

2We try to define the network model in a generic way, so that it may adapt to different scenarios, but in practice we imagine that this network would be the whole Internet.

3The reason why we use this abstraction is that a node would model different entities in practice, depending on the FIA that is used. We imagine that in SCION [50] and NIRA [49] a node would be an autonomous domain (AD), and the actual processing would be done by an edge router of the AD, while in Pathlets [17] a node would correspond to a vnode (a virtual node, which could be implemented as part of a router or as a group of routers, depending on the situation).

4This could be the case for web pages and other web services with public access, but alternatives would be AIP [3] or PGP key servers [52], the latter requiring an extra layer of indirection (the source would connect anonymously to the key server first, the key of which the source would need to retrieve through the PKI or AIP).

As we mentioned, the protocol is to be part of the network layer of FIAs which allow source routing, i.e., which allow sources to specify, to some extent, how packets should be routed from the source to the intended destination. In our model, a source can specify the entire sequence of nodes through which traffic should be routed: we call such a sequence of nodes a path. More precisely, when a source wants to establish an anonymous session it can specify both the path for the traffic it sends to the destination and the path for the traffic the destination sends back. We call these paths the forward path and the backward path, respectively. In general these paths can involve different routers (asymmetric paths), but one could also simply be the reverse of the other (symmetric paths).

Source routing constitutes the basis for moving anonymity to the network layer, as it provides a way to have partitioned routing information (see section 3.2), which current solutions can only achieve as overlay networks (i.e., at the application layer). In order to have anonymous source routing, however, sources need a way to obtain the paths – and the public keys of the nodes on those paths – without revealing their intended destination. Providing a scheme that achieves this is not trivial, but it is not our primary focus. In section 5.2 we discuss possible approaches to this problem; for now we maintain the assumption that one such scheme is available.

4.1.2 Requirements for Performance and Scalability

From the assumptions on the network model we just presented we can derive the requirements our protocol must fulfill to achieve high performance and high scalability. For performance, two points are to be considered: the latency of the network, and the time required to process each packet. For network latency our protocol is already optimal, as it gives control of routing to the source, allowing it to choose the shortest path to the destination. Processing time therefore becomes our main concern for performance.

For security, to establish an anonymous session with a destination a source needs to share keys with every node on the paths (forward and backward), which will be used to verify the integrity of packets and to encrypt/decrypt them in such a way that they look different at every hop (see subsection 4.1.3 below). This is done for each packet in mix networks, but the reason we cannot just use such systems for all traffic is that they require asymmetric cryptography to be used by every node, for each packet: this is prohibitively expensive in terms of processing time, especially considering the large traffic volume that nodes would have to handle according to our network model. The first requirement for our system is therefore


4. HORNET: High-speed Onion Routing at the NETwork Layer

that, for the overwhelming majority of the packets, only symmetric-key cryptography must be used.

Since the shared keys are not something that our model provides, HORNET needs to provide a way to establish them. For this, asymmetric cryptography is unavoidable, so the structure of the protocol should have two phases: a short initial phase where the keys are established, and a phase where data is transmitted and only symmetric cryptography is used. This scheme is typically used in onion routing systems such as Tor [14].

The problem with such a scheme is that it requires the nodes to keep state (at least the shared key) for every session that passes through them. Unfortunately, this clashes with our goal of having a highly scalable system: if a node keeps state and is implemented in a distributed or parallelized way (for example using multiple routers), then the state needs to be shared or replicated if every unit is to be able to process all sessions. What makes it even worse is the fact that in our model we imagine that nodes at central positions in the network might be handling many millions of anonymous sessions under normal operating conditions, meaning that the state held by such a node could be on the order of gigabytes. These points lead us to state another requirement for our protocol: it should require no per-session state on nodes⁵.
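To get a feeling for the scale, a back-of-the-envelope calculation shows how quickly per-session state adds up; the figures below (10 million concurrent sessions, roughly 100 bytes of state each) are illustrative assumptions of our own, not measurements:

```python
# Rough estimate of per-session state on a busy node. Both figures are
# assumptions for illustration: 10 million concurrent sessions, and about
# 100 bytes of state each (shared key, routing info, bookkeeping).
sessions = 10_000_000
state_bytes_per_session = 100

total_bytes = sessions * state_bytes_per_session
print(f"total state: {total_bytes / 10**9:.1f} GB")  # -> total state: 1.0 GB
```

With these assumptions a single node already holds about a gigabyte of state, which would have to be shared or replicated across every router implementing the node.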

As we will show in more detail in section 4.2, the key to resolving the conflict between the restrictions on the use of cryptography and the requirement of not storing per-session state on nodes is packet-carried state: in our protocol every packet will carry, in its header, the state of all nodes on the path.

4.1.3 Threat Model

As discussed in subsection 3.3.1, low-latency anonymity networks are unable to offer effective protection against global adversaries that can eavesdrop on all links and compromise a large number of nodes, which is instead the opponent typically considered in the more secure mix networks. We therefore assume an adversary that can observe only a fraction of the network. He can compromise nodes⁶, thereby learning all their secret keys and seeing all the traffic traversing them. The adversary can also actively inject, modify,

⁵ There would probably be ways around this requirement: for parallelization, there could be ways to ensure that each router of a node only needs to handle a subset of the sessions traversing the node, thus still allowing scalability. Also, routers might have memories of the required order of magnitude (though a DDoS attack could still cause memory exhaustion). All considered, however, we prefer to keep this requirement, which guarantees that our protocol will scale in a very simple and flexible way.

⁶ We consider nodes owned by the adversary as compromised nodes.


replay and drop packets. However, we assume that the adversary wishes to avoid detection, i.e., to avoid revealing which nodes are compromised.

We consider the network topology to be fixed and publicly known. This means that, when an anonymized packet traverses a compromised node, the adversary can restrict the anonymity sets of possible sources and destinations, because the network does not allow traffic to be routed arbitrarily (and also because, as we will see, our protocol has to set an upper bound on the length of the paths). We call this identity leakage from the topology (ILT); it is an inherent characteristic against which a high-speed protocol at the network layer like HORNET cannot protect.

In section 5.4 we discuss possible ways in which our protocol could be enhanced or used by upper-layer protocols in order to protect against stronger adversaries.

4.1.4 Security Goals

We mentioned before (subsection 4.1.2) that our protocol will use packet-carried state, meaning that every packet will carry the state of each node on the path it traverses (only the state of the session the packet belongs to). For each node this state consists mainly of a key shared with the source, but as we will see it will also contain the information on how to route the packet to the next hop. Clearly the adversary should not learn this information for any honest node, so HORNET must provide confidentiality for the carried state.

Our protocol should also prevent packets from cryptographically leaking to any node its position on a given path (i.e., the distance to the source and destination) and the total length of the path. Note that while packets might cryptographically hide these two characteristics, the adversary could still learn partial information because of the ILT: since, as we saw, this cannot be avoided, our aim is to prevent the adversary from gaining extra knowledge beyond that from the ILT.

Since the adversary can be active (injecting, modifying or replaying packets), but also wants to avoid detection, HORNET should try to reduce the adversary to a passive one by making such behavior detectable. In particular this means either having integrity protection that allows an honest node to immediately drop illegitimate packets (which can be noticed by the source or the destination), or having end-to-end integrity protection (directly checked by source and destination)⁷.

⁷ Defense against replay is the most problematic aspect, and we discuss it extensively in subsection 5.1.2.


The scenario behind the characterization of the adversary presented in subsection 4.1.3 is that of a government-class opponent (a local or national government entity, or a large criminal organization) which is not focusing on just a handful of targets, but is instead analyzing all the traffic it can observe in order to do user profiling, which over time could allow it to identify individuals and links between them according to some criteria. To frustrate such an attacker HORNET should provide session unlinkability, meaning that by observing the packets of two sessions an adversary is not able to discover whether they have one end point (or both) in common.

We also want to make it more difficult for an adversary to use traffic analysis to discover whether two sessions observed at two different points on the path are the same. More precisely, HORNET's packet processing at each node should be such that, at different points in the network, packets from the same flow are computationally indistinguishable from packets of different flows: we call this packet blinding. Note that this alone will not prevent an adversary from linking packets seen at different points to the same session, but it means that there is, for instance, no identifier in a packet that would allow him to do so trivially. Instead, the adversary would have to do more complex traffic analysis, which would require him to store and compare large sequences of packets instead of just storing and comparing, e.g., the first packet seen for each session: this makes the attack much more expensive. It also allows defenses against traffic analysis to be added by upper-layer protocols which use HORNET; finding such strategies, however, is out of the scope of this thesis.

To recapitulate, we list all the specific security goals we identified:

• Confidentiality for the carried state
• Node position hiding
• Path length hiding
• Integrity protection
• Session unlinkability
• Packet blinding

4.2 Protocol Design

Now that the assumptions, requirements and goals have been defined we can describe HORNET. In particular we show how it establishes a single anonymous session between a source S and a destination D, assuming that S has already determined the forward path and backward path to use, and has retrieved the public keys of all the nodes on those paths (see subsection 4.1.1). Before going into the details, we provide an overview in which we show how the requirements and goals lead to our fundamental design


decisions. In the overview we also show the main challenges that the design needs to overcome.

4.2.1 Overview

We saw in subsection 4.1.2 that, given our assumptions on the network model, our protocol should be structured in two phases to achieve high performance. The first is the session setup phase (or just setup phase), in which keys shared between the source and each node are established (using asymmetric cryptography). The second is the data transmission phase, in which only symmetric-key cryptography is used. In order to process and route a packet during the data transmission phase, each node needs the key it established with the source during the setup phase, as well as the information about how the packet needs to be forwarded⁸.

However, all this information constitutes per-flow state, which as we saw cannot be stored on the nodes for scalability reasons. The solution is to use packet-carried state (this is very similar to the notion of "packet-carried forwarding state" in LAP [23], the difference being that in LAP the state only contains the routing information to forward packets). To explain how this works, let us first consider only the forward path (from S to D): during the setup, after having established the shared key with the source, each node encrypts the state with a secret value SV known only to the node itself. We call this encrypted state a forwarding segment, or FS for brevity. The following equation shows the concept:

FS = ENC(SV; shared key ∥ routing info) (4.1)

The FSes of all nodes are then collected by the source (we will come back to how this is done). During the data transmission phase the source will include all the FSes in the header of every data packet it sends, so that each node will be able to retrieve its state for the session by decrypting the FS it previously generated.
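As a toy illustration of Equation 4.1 and of how a node later reopens its own FS, the sketch below seals the per-session state under the node's secret value SV. The SHA-256 keystream is a stand-in for a real cipher and the 16-byte key size is an assumption; note also that, as discussed later in subsection 4.2.4, encrypting many FSes under the same SV without a nonce is not actually secure, which is why the protocol refines this construction.

```python
import hashlib

def _stream(seed: bytes, n: int) -> bytes:
    """SHA-256-based keystream; a stand-in for a real cipher (assumption)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def make_fs(sv: bytes, shared_key: bytes, routing_info: bytes) -> bytes:
    """FS = ENC(SV; shared key || routing info), cf. Equation 4.1."""
    state = shared_key + routing_info
    return bytes(a ^ b for a, b in zip(state, _stream(sv, len(state))))

def open_fs(sv: bytes, fs: bytes) -> tuple[bytes, bytes]:
    """Only the node knowing SV can recover its per-session state."""
    state = bytes(a ^ b for a, b in zip(fs, _stream(sv, len(fs))))
    return state[:16], state[16:]          # 16-byte shared key assumed

sv = hashlib.sha256(b"node secret value").digest()
fs = make_fs(sv, b"K" * 16, b"next-hop:n1")
key, routing = open_fs(sv, fs)
```

The node needs no per-session storage: everything it requires arrives inside the packet, sealed under a value only it knows.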

From this description three non-trivial points emerge that need to be solved. First, how the FSes can be collected anonymously during the setup, i.e., without breaking source–destination unlinkability. Second, how the FSes can be put in the headers of data packets so that each node can obtain its own FS, but no node can learn its position on the path or the length of the path. And third, how the destination can be enabled to send anonymized data packets back to the source during the data transmission phase.

⁸ We can imagine this routing information to be simply the name or address of the next node on the path, though in practice it would depend on the FIA on which HORNET is implemented.


Anonymity in the Setup. We address the first issue, how to collect the FSes anonymously during the setup, by leveraging an existing mix network protocol, Sphinx⁹ [11], of which we gave an overview in subsection 3.2.1. By having each node act like a Sphinx mix node, the source can send Sphinx packets that will traverse the network anonymously (since Sphinx uses public-key cryptography, we will limit its use to the minimum possible).

The idea for the setup is to use one round trip made of two Sphinx packets, the first sent by the source along the forward path and the second sent back by the destination along the backward path (using a Sphinx reply header constructed by the source and included in the payload of the first packet). In addition to processing the packets, each node also creates an FS, containing the shared key established by Sphinx. Each node then "appends" the FS to the packet, so that when the source receives the second packet sent back by the destination it obtains all the FSes of both the forward and the backward path.

FS Collection and Distribution. The problem of "appending" the FSes to the packets without leaking information about the path length or the position on the path to the nodes is dual to the second non-trivial issue identified previously, i.e., the distribution of the FSes during the data forwarding phase. To solve these problems we use two constructions which we call the FS payload and the FS dispenser. The FS payload can be seen as a stack to which nodes can push their FSes, which does not leak how many FSes it already contains and whose contents can be retrieved (popped) only by the source. Similarly, the FS dispenser can be seen as a stack that the source fills with all the FSes of a path, and from which each node can get its own FS (but none of the following ones). Like the FS payload, the FS dispenser hides the number of FSes it contains. We describe the details of these constructions in subsection 4.2.3.

Anonymous Replies in the Data Transmission Phase. The third issue identified above, how the destination can send anonymous packets back to the source, is easier to solve now that we have explained the high-level structure of the setup and the way the FS dispenser works. Once the source obtains the second setup packet it will be able to get all the FSes from the FS payload, and will construct two FS dispensers, one with the FSes of the nodes on the forward path and the other with the FSes of the nodes on the backward path. It will use these FS dispensers to make the data packet headers, which we call anonymous headers, for the two paths. At this point the source can create data packets with the first header that will be forwarded to the destination anonymously, and it will immediately send one containing the anonymous header for the backward path to the destination. Once

⁹ The reasons for this choice are discussed in subsection 5.1.3.


the destination obtains its header, both end hosts can communicate, and at this point we consider the setup concluded and the anonymous session established. We describe the setup in detail in subsection 4.2.4, and the data transmission phase in subsection 4.2.5. In the next section we present some notation that will be needed for the formal description of certain aspects of the protocol.

4.2.2 Notation

We denote by r the maximum length of a path, that is, the maximum number of nodes on the path, including the destination. In general we write a path as a sequence of nodes, for example p = n_0, n_1, . . . , n_{n−1}, where n ≤ r is the length of the path.

When needed we add a superscript f to indicate that a certain element refers to the forward path, and a superscript b for elements of the backward path. For example, the forward path is denoted p^f = n^f_0, n^f_1, . . . , n^f_{n^f−1} with n^f ≤ r. Here n^f_0 represents the node closest to S, while n^f_{n^f−1} is the destination. Similarly, the backward path is p^b = n^b_0, n^b_1, . . . , n^b_{n^b−1} with n^b ≤ r, where n^b_0 is the node closest to D and n^b_{n^b−1} is the node closest to the source (not the source itself, which we do not consider a node).

Other notation used in this chapter was defined in the cryptographic background, section 2.2. Note that in the background several functions were presented that have a length parameter for their input and/or output domain. In our treatment it should always be clear from the context what these parameters are¹⁰.

4.2.3 Forwarding Segment Collection and Distribution

We now show in detail how the FSes are collected during the setup and distributed during the data transmission phase. As we have seen, we use two constructions for this, the FS payload and the FS dispenser, respectively. Their goal is to hide their contents, in particular not to leak the length of the path or the position of a node on the path. They also provide integrity protection. These constructions are based on the one used in Sphinx [11] for the Sphinx header.

Collection. The FS payload is a bit sequence of constant length (|FS| + k)·r. The basic idea of the construction is that each node n_i, when receiving an FS payload P, can add its forwarding segment FS_i to the front of P (the leftmost end), then encrypt the resulting sequence and compute a MAC over it. Like

¹⁰ We also do not want to clutter this treatment with too much notation, so for some of the finer details we invite the reader to look at our well-documented implementation (see chapter 6).


with the FS, the node then prepends this MAC to the sequence (i.e., it adds it to the leftmost end). Doing so would however increase the length of the FS payload, so the first thing n_i actually does is drop the rightmost |FS| + k bits before prepending FS_i (remember from section 2.2 that the length of a MAC is k). The scheme is shown in Figure 4.1.

Figure 4.1: Insertion of an FS into an FS payload: (a.) shows the input FS payload, (b.) shows how the last |FS| + k bits are dropped, (c.) how the FS is prepended, and (d.) how the whole is encrypted and MAC-ed. The FS payload thus obtained has the same length as the input FS payload.

In Algorithm 1 we formally specify the process. We refer to this procedure as add_fs. From its specification it can be seen that the actual construction allows each node to insert not just its FS, but also an additional string (z). We will see in subsection 4.2.4 what this is used for, but in general it has a fixed size (the same for each node). The total length of the FS payload is therefore actually (|FS| + |z| + k)·r.

Algorithm 1 add_fs: adds an FS and a string z to an FS payload
Input: s — key shared with the source
       FS — forwarding segment to be inserted into the FS payload
       z — additional information to be inserted into the FS payload
       P_in — the FS payload received
Output: P_out — new FS payload containing FS and z

1: d ← |FS| + |z| + k
2: P_tmp ← {FS ∥ z ∥ P_in[0..(r−1)d−1]} ⊕ PRG(h_PRG(s))[k..end]
3: α ← MAC(h_MAC(s); P_tmp)
4: P_out ← α ∥ P_tmp
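A runnable sketch of the add_fs procedure follows. The hash-based PRG, HMAC-SHA-256 truncated to k = 16 bytes, and the concrete sizes are stand-ins of our own choosing; a real implementation would use the primitives from section 2.2.

```python
import hashlib, hmac, os

K = 16                                      # MAC length k in bytes (assumption)

def _prg(seed: bytes, n: int) -> bytes:
    """Hash-based keystream; a stand-in PRG."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def add_fs(s: bytes, fs: bytes, z: bytes, p_in: bytes, r: int) -> bytes:
    """One hop of add_fs: drop the tail, prepend FS || z, encrypt, MAC."""
    d = len(fs) + len(z) + K
    assert len(p_in) == r * d
    body = fs + z + p_in[: (r - 1) * d]     # rightmost d bytes are dropped
    body = _xor(body, _prg(s, r * d)[K:])   # one layer of encryption
    return hmac.new(s, body, hashlib.sha256).digest()[:K] + body

# The FS payload keeps a constant length hop after hop:
r, fs_len, z_len = 5, 8, 4
p = os.urandom(r * (fs_len + z_len + K))    # initial random FS payload
p = add_fs(os.urandom(16), os.urandom(fs_len), os.urandom(z_len), p, r)
p = add_fs(os.urandom(16), os.urandom(fs_len), os.urandom(z_len), p, r)
```

Because the length never changes, an observer (or a node) cannot tell from the payload alone how many FSes it already contains.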

To create an initial empty FS payload P_init, the source simply fills it with random bits. Hop after hop, the nodes using add_fs drop blocks of this


initial FS payload (to which every previous node added a layer of encryption). Clearly an FS payload of length (|FS| + |z| + k)·r should not be used for more than r hops, or else the (r + i)-th node would drop what the i-th node previously added.

The process used by the source to retrieve all the FSes from an FS payload is basically the reverse of the add_fs operations done by the nodes. Given that the source knows all the shared keys that were used by the nodes, it can remove the layers of encryption added by each node, from last to first, and after every decryption S obtains one FS in plaintext. To verify the integrity, the source can then, starting with the initial random FS payload P_init, simulate each step done by the nodes and thus check whether all MACs are correct.

The reason why the source needs to first retrieve all the FSes and then do a second pass for the integrity verification is that it cannot compute the MACs directly. This is due to the fact that with add_fs a node computes a MAC over P_tmp (line 3), but the next node on the path then drops the last part of P_tmp. Without the complete string over which a MAC has been computed it is not possible to verify the MAC. There is in fact a way for the source to be more efficient than this, by precomputing only the blocks that are dropped by the nodes at each step. This is what is done in Algorithm 2 (retrieve_fses), but we will not discuss the reasoning behind the computation used to do so.
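The straightforward two-pass retrieval described above (peel the layers to recover the FSes, then replay the hops from P_init to check integrity) can be sketched as follows. This is the simple version, not Algorithm 2's optimized computation, and the sizes and hash-based primitives are illustrative assumptions:

```python
import hashlib, hmac, os

K, FS_LEN, Z_LEN, R = 16, 8, 4, 5           # illustrative sizes (assumptions)
D = FS_LEN + Z_LEN + K

def _prg(seed: bytes, n: int) -> bytes:
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def add_fs(s: bytes, fs: bytes, z: bytes, p_in: bytes) -> bytes:
    """Node-side add_fs (Algorithm 1)."""
    body = _xor(fs + z + p_in[: (R - 1) * D], _prg(s, R * D)[K:])
    return hmac.new(s, body, hashlib.sha256).digest()[:K] + body

def retrieve_fses(p_final: bytes, keys: list, p_init: bytes) -> list:
    """Source side: peel layers last-to-first, then replay to verify MACs."""
    known, out = p_final, []
    for s in reversed(keys):                # pass 1: decrypt layer by layer
        body = known[K:]
        dec = _xor(body, _prg(s, R * D)[K:K + len(body)])
        out.append((dec[:FS_LEN], dec[FS_LEN:FS_LEN + Z_LEN]))
        known = dec[FS_LEN + Z_LEN:]        # known prefix of previous payload
    out.reverse()
    p = p_init                              # pass 2: simulate every hop
    for s, (fs, z) in zip(keys, out):
        p = add_fs(s, fs, z, p)
    if p != p_final:
        raise ValueError("FS payload verification failed")
    return out

keys = [os.urandom(16) for _ in range(3)]
fses = [os.urandom(FS_LEN) for _ in range(3)]
zs = [os.urandom(Z_LEN) for _ in range(3)]
p_init = os.urandom(R * D)
p = p_init
for s, fs, z in zip(keys, fses, zs):
    p = add_fs(s, fs, z, p)
recovered = retrieve_fses(p, keys, p_init)
```

Note that in the first pass the known prefix of the payload shrinks by D bytes per peeled layer, mirroring the blocks the nodes dropped; the replay in the second pass recomputes every per-hop MAC, so any tampering makes the final comparison fail.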

Distribution. The construction of the FS dispenser, which is used during the data transmission phase, is in some respects similar to the one we just saw for the FS payload. We first show how the actual distribution works, i.e., how each node retrieves its FS, before showing how the FS dispenser is constructed. At each hop the FS dispenser is a bit string composed of three parts: a plaintext FS for the current node, a MAC γ, and an encrypted part β. When the i-th node receives the FS dispenser, it can take its forwarding segment FS_i from the front, and use its secret value SV_i to decrypt it and obtain the shared key s_i contained in FS_i. With this key it can verify the MAC γ_i (which is computed over FS_i ∥ β_i), and decrypt β_i.

To keep the length of the FS dispenser constant, before the decryption n_i appends a zero-padding of length |FS| + k to the end of β_i, and then decrypts the whole resulting string. This decryption is done by XORing with the output of a PRG (see subsection 2.2.1), i.e., by computing the following:

(β_i ∥ 0^c) ⊕ PRG(h_PRG(s_i))

where c = |FS| + k. The result is the new FS dispenser for the next node, which consists of the bit string FS_{i+1} ∥ γ_{i+1} ∥ β_{i+1}. This process is depicted


Algorithm 2 retrieve_fses: retrieves all FSes and additional z strings from the FS payload
Input: P_final — final FS payload
       P_init — initial FS payload
       s_0, . . . , s_{n−1} — keys shared with the nodes
Output: FS_0, . . . , FS_{n−1} — list of FSes retrieved from P_final
        z_0, . . . , z_{n−1} — list of z strings retrieved from P_final

1: d ← |FS| + |z| + k
2: y ← P_init[(r−l)d..rd−1] ⊕ PRG(h_PRG(s_0))[(r−l+1)d..end] ∥ 0^d
       ⊕ PRG(h_PRG(s_1))[(r−l+2)d..end] ∥ 0^{2d}
       · · ·
       ⊕ PRG(h_PRG(s_{l−2}))[(r−1)d..end] ∥ 0^{(l−1)d}
3: P_full ← P_final ∥ y        ▷ append the part made of all the dropped blocks
4: for i ← (n−1), . . . , 0 do
5:     check P_full[0..k−1] = MAC(h_MAC(s_i); P_full[k..rd−1])
6:     P_full ← P_full ⊕ (PRG(h_PRG(s_i)) ∥ 0^{(i+1)d})
7:     FS_i ← P_full[k..k+|FS|−1]
8:     z_i ← P_full[k+|FS|..d−1]
9:     P_full ← P_full[d..end]
10: end for

in Figure 4.2. We refer to this procedure as get_fs.

Figure 4.2: Retrieval of FS_i from the FS dispenser and decryption of the part for FS_{i+1}. (a.) The FS dispenser received by n_i has FS_i and the MAC in plaintext, directly accessible. (b.) n_i removes FS_i and the MAC from the front and appends a zero-padding of length |FS| + k. (c.) Node n_i decrypts the entire block, obtaining the new FS dispenser for the next node.

The construction of the initial FS dispenser is done by the source by computing the reverse of the get_fs procedure for each node, starting from the FS dispenser as it looks after the processing by the last node. The difficulty in this comes from the fact that, as we saw, when computing get_fs each node adds a zero-padding which is then XORed with part of the output of a PRG. This means that when the FS dispenser reaches the last node (the


destination) it will contain a large final part made of successively encrypted zero blocks. The source therefore first computes just this part, which we denote by φ, and only once this is done performs the reverse processing of get_fs. We show the details of this scheme in Algorithm 3. We call the procedure create_fs_dispenser.

Algorithm 3 create_fs_dispenser: creates an FS dispenser
Input: s_0, . . . , s_{n−1} — keys shared with the nodes
       FS_0, . . . , FS_{n−1} — list of FSes to insert into the FS dispenser
Output: FS_0 ∥ γ_0 ∥ β_0 — initial FS dispenser

1: φ_0 ← ε
2: for i ← 1, . . . , l−1 do
3:     φ_i ← (φ_{i−1} ∥ 0^c) ⊕ {PRG(h_PRG(s_{i−1}))[(r−i)c..rc−1]}
4: end for
5: β_{l−1} ← rand(c(r−l)) ∥ φ_{l−1}        ▷ rand(l): random bit string of length l
6: γ_{l−1} ← MAC(h_MAC(s_{l−1}); FS_{l−1} ∥ β_{l−1})
7: for i ← (l−2), . . . , 0 do
8:     β_i ← {FS_{i+1} ∥ γ_{i+1} ∥ β_{i+1}[0..c(r−2)−1]} ⊕ PRG(h_PRG(s_i))[0..c(r−1)−1]
9:     γ_i ← MAC(h_MAC(s_i); FS_i ∥ β_i)
10: end for

With this we conclude our description of the constructions of the FS payload and the FS dispenser. For further details the reader may wish to consult our well-documented implementation, and, in particular for the FS dispenser, also the header construction of Sphinx [11].
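The dispenser construction can be exercised end to end with the following sketch of create_fs_dispenser and get_fs. The hash-based stand-in primitives, the sizes, and the sub-key derivations h_PRG and h_MAC (simple prefixed hashes) are all assumptions of our own. The source builds a dispenser for l = 3 nodes with maximum path length r = 5; each node then pops its own FS, checks the MAC, and passes on a dispenser of unchanged length:

```python
import hashlib, hmac, os

K, FS_LEN, R = 16, 8, 5                     # illustrative sizes (assumptions)
C = FS_LEN + K

def _prg(seed, n):
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _h_prg(s): return hashlib.sha256(b"prg" + s).digest()   # sub-key derivation
def _h_mac(s): return hashlib.sha256(b"mac" + s).digest()   # (assumption)

def _mac(s, m): return hmac.new(_h_mac(s), m, hashlib.sha256).digest()[:K]

def create_fs_dispenser(keys, fses):
    """Source side (cf. Algorithm 3): builds the initial FS dispenser."""
    l = len(keys)
    streams = [_prg(_h_prg(s), C * R) for s in keys]
    phi = b""                               # padding keeping lengths consistent
    for i in range(1, l):
        phi = _xor(phi + b"\x00" * C, streams[i - 1][(R - i) * C:])
    beta = os.urandom(C * (R - l)) + phi    # beta for the last node
    gamma = _mac(keys[l - 1], fses[l - 1] + beta)
    for i in range(l - 2, -1, -1):          # fold back towards the first node
        beta = _xor(fses[i + 1] + gamma + beta[: C * (R - 2)],
                    streams[i][: C * (R - 1)])
        gamma = _mac(keys[i], fses[i] + beta)
    return fses[0] + gamma + beta

def get_fs(s, disp):
    """Node side (get_fs): pop own FS, verify the MAC, derive next dispenser."""
    fs, gamma, beta = disp[:FS_LEN], disp[FS_LEN:C], disp[C:]
    if gamma != _mac(s, fs + beta):
        raise ValueError("integrity check failed")
    return fs, _xor(beta + b"\x00" * C, _prg(_h_prg(s), C * R))

keys = [os.urandom(16) for _ in range(3)]
fses = [os.urandom(FS_LEN) for _ in range(3)]
disp = create_fs_dispenser(keys, fses)
recovered = []
for s in keys:
    fs, disp = get_fs(s, disp)
    recovered.append(fs)
```

The φ padding is what makes the lengths work out: the zero blocks each node appends decrypt, at the next hop, to exactly the tail the source precomputed, so every node sees a dispenser of the same length C·r regardless of its position.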

4.2.4 Session Setup Phase

The setup phase consists of a round trip made of two packets which are routed using Sphinx [11] (see subsection 3.2.1), plus one data packet. The first setup packet is sent by the source along the forward path p^f to the destination, and it contains in its payload a reply Sphinx header which the destination then uses to send back the second setup packet along the backward path p^b.

The format of these first two packets is shown in Figure 4.3. In particular they contain an FS payload: as these packets traverse the forward and backward path, each node adds the FS it creates to this payload before forwarding the packet. The header also includes an EXP timestamp¹¹, indicating the

¹¹ There are a number of complications with this timestamp. One is its integrity protection: to solve it the source extends the per-hop MACs included in the Sphinx header to cover the EXP timestamp. Other problems include the fact that this timestamp could constitute an


Figure 4.3: Format of the first two packets of the session setup phase. The type field is mainly used to allow a node to distinguish a setup packet from a data packet (i.e., one of the data transmission phase). EXP indicates the expiration time for a session. The Sphinx header and payload are exactly as specified in the original protocol [11]. The FS payload is described in subsection 4.2.3. Note that the different parts are not to scale; in particular the Sphinx header and payload and the FS payload are much larger compared to the type and EXP fields.

expiration time of the session, which, as we will see below, each node includes in its FS.

We now describe in more detail how the setup protocol is defined, analyzing the main steps:

1. the creation of the first packet by the source;
2. the processing of setup packets by the nodes;
3. the processing of the first packet and creation of the second by the destination;
4. the processing of the second packet by the source, the construction of the anonymous headers, and the creation of the third packet, which delivers the backward anonymous header to the destination, concluding the setup phase.

After this we will see how the protocol can be adapted to achieve forward secrecy.

Initialization by the Source. To set up a HORNET session the source S creates the first setup packet as follows. It determines the expiration time EXP of the session, and generates an empty FS payload P^f_init (i.e., one made of random bits). Then S generates a Sphinx header to be routed along p^f, and includes in the corresponding payload a reply Sphinx header to be routed along p^b (which will be used by D). It then sends this packet out to n^f_0.

Processing by the Nodes. The nodes on both p^f and p^b first do the processing of the Sphinx header and payload: if this results in an error (because of a failed verification of the MAC, or an incorrect packet format), the entire packet is dropped. Otherwise the Sphinx processing returns the forwarding information extracted from the header. After this a new FS is generated and added to the FS payload using add_fs (Algorithm 1). The processed

identifier of the packet over multiple hops, which is something that our security goals require us to avoid. We discuss how this can still be achieved in subsection 5.1.1.


elements (FS payload and Sphinx header and payload) are then forwarded, with the type and EXP fields of the setup packet remaining the same.

Regarding the creation of the FS, the way we defined it in Equation 4.1 was just to convey the general idea, but it may actually be insecure. As we discussed in subsection 2.2.1, encryption without a nonce can be insecure if the same key is used to encrypt different messages. This is exactly the case here, since a node uses its secret value SV to generate the FSes for all the setup packets that traverse it.

To solve this we could simply require encryption with nonces instead, but this increases the size of the FS, which is something we wish to avoid¹². We could also use a PRP keyed with SV, which would solve this issue, but this solution is limited in practice by the fact that the block size of available PRP implementations is always smaller than the total length of the contents of an FS. Instead we use the following, slightly more complex construction:

FS_i = PRP(SV_i; s_i) ∥ ENC(SV_i; hash_v(s_i); R_i ∥ EXP_i) (4.2)

An FS therefore contains the shared key s_i, the routing information R_i and the expiration time EXP_i. To retrieve them from FS_i, node n_i first decrypts the first part (the leftmost k bits) to obtain the key. Then it computes the hash hash_v(s_i) to get the initialization vector to decrypt the rest.
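A sketch of the construction in Equation 4.2 and of how a node reopens its FS. The 4-round Feistel network over a 32-byte block is a stand-in PRP built from HMAC-SHA-256 (a real deployment would use an actual block cipher), the keystream encryption under the hash-derived IV is likewise a stand-in, and the key sizes and labels are our own assumptions:

```python
import hashlib, hmac, os

BLOCK = 32                                  # PRP block size: fits a 32-byte key s_i

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key, i, half):                       # Feistel round function (stand-in)
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]

def prp(key, block):                        # 4-round Feistel: a stand-in PRP
    L, R = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        L, R = R, _xor(L, _f(key, i, R))
    return L + R

def prp_inv(key, block):                    # inverse: rounds in reverse order
    L, R = block[:BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        L, R = _xor(R, _f(key, i, L)), L
    return L + R

def _stream(seed, n):
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def make_fs(sv, s, routing, exp):
    """FS_i = PRP(SV_i; s_i) || ENC(SV_i; hash_v(s_i); R_i || EXP_i)."""
    iv = hashlib.sha256(b"iv" + s).digest()           # hash_v stand-in
    tail = _xor(routing + exp, _stream(sv + iv, len(routing) + len(exp)))
    return prp(sv, s) + tail

def open_fs(sv, fs, routing_len):
    s = prp_inv(sv, fs[:BLOCK])             # first recover the shared key
    iv = hashlib.sha256(b"iv" + s).digest() # then derive the IV from it
    rest = _xor(fs[BLOCK:], _stream(sv + iv, len(fs) - BLOCK))
    return s, rest[:routing_len], rest[routing_len:]

sv, s = os.urandom(16), os.urandom(BLOCK)
fs = make_fs(sv, s, b"next:n3", b"20150413")
s2, routing, exp = open_fs(sv, fs, len(b"next:n3"))
```

Because the IV is derived from the (random, per-session) key s_i, two FSes created under the same SV are encrypted with different keystreams, which is the point of the construction.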

Processing by the Destination. When destination D receives the first setup packet, it first processes it as the previous nodes did. This includes creating a forwarding segment and adding it to the FS payload; in doing so D obtains the final FS payload of the forward path, P^f_final. It then retrieves the reply Sphinx header from the payload and uses it to construct the second packet. For the EXP timestamp D uses the same value as in the first packet. It puts P^f_final in the Sphinx payload, and generates a new FS payload as P^b_init = PRG(h_PRG(s_D)), where s_D is the newly established key shared with the source. Having thus composed the second packet, the destination sends it to n^b_0 (the first node on the backward path).

Processing by the Source. Once the source receives the second setup packet it obtains the final FS payload for the backward path, P^b_final, and can retrieve the final FS payload for the forward path, P^f_final, from the Sphinx payload. Using the procedure retrieve_fses (Algorithm 2), S can get the forwarding segments of all the nodes on the forward and backward paths. For the

12An increase in the length of each FS by c bits means an increase in the size of theanonymous headers by c · r bits, where r is the maximum number of hops on a path. Largerheaders in turn mean that more bandwidth is used just for the routing process, so the maxi-mal throughput of the protocol in terms of number of payload bytes transferred per unit oftime decreases. We discuss this further in section 5.3.

35

Page 43: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

4. HORNET: High-speed Onion Routing at the NETwork Layer

latter it will need the initial FS payload Pbinit generated by the destination,

which S can easily recompute.

With the FSes the source then constructs the initial FS dispensers with pro-cedure create fs dispenser (algorithm 3), one for the forward path, T f

init(containing the FSes of the nodes on p f ), and one for the backward pathTb

init (containing the FSes of the nodes on pb). Using T finit the source can

send data packets to D, as specified in the next section (4.2.5). The thirdsetup packet is of this type, and the source uses it to deliver the backwardanonymous header, constructed using Tb

init, to the destination. Once D ob-tains the anonymous header both end host can communicate using datapackets, and we consider the setup concluded and the session established.

Adding Forward Secrecy. For now we have assumed that the nodes wouldstore inside their FSes the key established with Sphinx. While this is apossibility, it does not provide forward secrecy, since the shared keys inSphinx are established using the long-term public keys of the nodes. Instead,we use some additional steps to still be able to achieve this property.

At each hop Sphinx allows a node to establish a shared key with the sourcethrough a Diffie-Hellman key exchange (see subsection 2.2.3). As we saw insubsection 3.2.1, in order to establish the shared keys, in the Sphinx protocolthe source initially generates a temporary public key gx, and includes it inthe Sphinx header. The first node (mix) uses this for the key establishment,then blinds it with a value b0, obtaining gxb0 . This group element is usedto replace gx in the processed header, and is then used by the second nodefor the key establishment. For the whole path it is therefore as if the sourcewould perform a separate Diffie-Hellman key exchange which each node,using the following temporary private keys:

xS,0, xS,1, . . . , xS,n�1 = x, xb0, . . . , (xb0 . . . bn�2) .

To achieve forward secrecy each node ni generates a temporary private keyxi,temp, and computes the key si = (gxS,i)xi,temp . This key si is then stored (en-crypted) inside the forwarding segment f si. To allow the source to computethe same key, ni inserts into the FS payload not just FSi but also the tempo-rary public key gxi,temp (as the z string, see procedure add fs, algorithm 1).This way when the source uses retrieve fses (algorithm 2) it also gets allthe temporary public keys of the nodes and can complete the Diffie-Hellmankey exchanges and get the new shared keys s0, . . . , sn�1. Note that for therest of the processing of the FS payload both the nodes and the source usethe keys established with Sphinx (else the source would not be able to re-trieve the FSes from the final FS payload).

It should be noted that forward secrecy as described covers only the datatransmission, not the session setup. We discuss this point in subsection 5.1.3

36

Page 44: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

4.2. Protocol Design

4.2.5 Data Transmission Phase

When a session setup is completed, both end host, source and destination,have an anonymous header which allow each to send anonymous traffic tothe other. An anonymous header mainly consists of an FS dispenser which,as we saw in subsection 4.2.3, allows each node on the path to obtain theFS it created during the setup. From its FS, each node ni obtains the key sishared with the source, the expiration time of the session EXP, and the rout-ing information to forward the packet. If according to EXP the session hasexpired, the packet is dropped. After having processed the FS dispenser(with get fs, see subsection 4.2.3), ni processes also the data payload (ei-ther encrypting or decrypting it, see below), and then forwards the packetaccording to the routing information in the FS.

type nonce

FS Dispenser

Data Payload

Figure 4.4: Format of data packets. The type fieldis used to allow a node to distinguish a data packetfrom a setup packet, and to see if it is a forwardor a backward packet. The nonce field contains thenonce that was last used to encrypt the data payload.The FS dispenser is described in subsection 4.2.3.The colored part is the anonymous header. Notethat the di↵erent parts are not in scale, in particularthe FS dispenser and the data payload are muchlarger compared to the type and nonce fields.

In Figure 4.4 the format of these packets is specified. The type field allowsa node to distinguish a data packet from a setup packet, but also to knowwhether it is a forward or a backward packet. The nonce is needed for theprocessing of the payload, which we now describe.

Payload Creation and Processing.

To process a data payload, each node checks the packet type: if it is a for-ward packet, it decrypts the payload, if it is a backward packet, it encryptsit. For both operations it uses a nonce derived from the nonce included inthe packet header by PRP encryption. The used nonce is then included inthe processed packet, which is then forwarded to the next node.

For each new data packet the source creates it first generates a new nonce,and simulates the processing of it by the nodes, successively encrypting itwith the PRP. Doing so S obtains the nonces used by each node on the for-ward path. After this it onion-encrypts the data13 it wants to send, starting

13To prevent an adversary from using the length of the payload as a means to recognizepackets across different nodes, the source first needs to pad the data up to a fixed length.

37

Page 45: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

4. HORNET: High-speed Onion Routing at the NETwork Layer

from the destination and adding one layer of encryption for each node onthe reversed forward path, using the corresponding nonce. This way whenthe payload is processed each node can correctly remove one encryptionlayer, and finally the destination can get the original plaintext.

Sending data packets on the backward path works similarly. The destinationgenerates a new nonce and uses it to encrypt once the data it wishes to send.It includes this nonce in the packet header. This time each node on thepath adds a layer of encryption. When the source receives the packet it willremove the layers of encryption starting from the last one added, and ateach step it will also compute the previous nonce by decrypting the currentone (with PRP�1). Once the last layer of encryption is removed the sourceobtains the original data sent by the destination.

4.3 Enhancements

In the previous section we have presented HORNET in its basic form, whichis also the one that was used in the implementation (chapter 6). In the dis-cussion of the protocol in the next chapter (section 5.4) we suggest way thisprotocol might be composed with other protocols to achieve higher security.Here we suggest two ways in which the protocol itself could be modified inorder to have additional features or better security.

4.3.1 Session Re-establishment

We saw that the destination creates a forwarding segment that is then in-cluded by the source into the forward anonymous header. One benefit ofdoing this is that there is no need for the destination to keep state beforereceiving the third setup packet. However, it also allows for an even betterfeature which is to have a fast session re-establishment.

There could be cases in which the destination loses the backward anony-mous header for a certain session. For example, a server with many anony-mous users may want to reduce the amount of anonymous headers it has tostore by deleting some of them, for instance those corresponding to sessionsthat have been inactive for some time. In such a case it may seem that asource of such a session has no other option than to perform a new setup.This would be very inefficient, however. A much better alternative is for thesource to just send the backward anonymous header again, basically justreplaying the third setup packet.

Note that this would actually work without changing the specification ofthe destination. This scheme assumes that it is not a problem for a source tostore the backward anonymous headers for all its active sessions, which isusually the case.

38

Page 46: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

4.3. Enhancements

4.3.2 End-to-End Secure Channel

Since our protocol uses onion routing, it actually provides not just anonymitybut also a confidential channel between the source and the destination. Thismight be a desired property for the upper-layer protocols using HORNET. Inmany cases however applications need a secure channel, meaning one thatis integrity protected in addition to confidential. This can be easily achievedhowever: since source and destination share a key, they can add MACs tothe payload to protect its integrity.

To be precise, this would be a confidential and authenticated channel fromthe destination to the source, meaning that the source knows that the mes-sages it receives through it come from the intended destination (the onewhose public key the source used in the setup). For the reverse, the desti-nation cannot know the identity of the source from HORNET (which is ofcourse one of the main security properties of our protocol). However it stillachieves what is called sender invariance, meaning that once the session is es-tablished the destination knows that all the messages it receives come fromthe same source.

39

Page 47: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State
Page 48: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

Chapter 5

Analysis and Discussion

5.1 Security Analysis

In this section we discuss several attacks against HORNET, informally argu-ing the security properties that the protocol provides. We discuss passiveattacks, where the adversary is only able to observe traffic between nodesat some (possibly many) points on the network; active attacks where theadversary can manipulate and inject traffic into the network; and forwardsecrecy.

While a formal proof of security may appear to be preferable to an informaldiscussion of attacks and defenses, the complexity of our protocol (includinga standalone setup phase followed by a data transmission phase, each ofwhich operate with different cryptographic primitives) presently precludesus from formally analyzing its properties.

5.1.1 Passive Attacks on Anonymity

As we saw when presenting our threat model and security goals (see sec-tion 4.1), the protocol’s aim is not to protect against highly targeted attackswhere an adversary spends a large amount of effort on a small set of knowntargets. Instead the goal is to frustrate an attacker doing mass surveillanceof network traffic in an indiscriminate way. This is also why HORNET doesnot try to protect against confirmation analysis, where an adversary onlyneeds to confirm a suspicion on a single link. Instead our protocol defendsagainst traffic analysis, by restricting the ways in which the adversary we de-scribed could dynamically narrow down the scope of its search, which couldbe done by profiling of users, by looking for the same bit patterns at differ-ent points in the network or by monitoring other identifiers (for examplepath lengths).

Session Unlinkability. A source generates a random temporary public/pri-

41

Page 49: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

5. Analysis and Discussion

vate key pair for each session it creates, and uses no long-term identifiers.As a consequence, two different sessions are cryptographically completelyindependent from the source that created them. This means that a nodecannot tell whether two sessions that traverse him are from the same sourceor not. This property is called session unlinkability, and it effectively preventsuser profiling.

Bitwise Unlinkability. From the protocol description it is possible to seethat almost all parts of each packet are either encrypted or decrypted ateach hop, meaning that these parts will be statistically independent1 whenseen at different point on the path. Also consider that when the public keyof the source is blinded at each hop (see subsection 3.2.1 and the Sphinxprotocol specification [11]), the blinding factors are derived from the estab-lished shared keys. When a session is set up traversing two compromisednon-consecutive nodes, the public key of the source is blinded by at leastone factor dependent on the key established with an uncompromised node,which by the DDH assumption (see subsection 2.2.3) is indistinguishablefrom randomness by the adversary. This means that the two public keys ofthe source as seen by the adversary at the two compromised nodes are tohim statistically independent.

There is however a problem with those fields that are not blinded. Specifi-cally, these are the type field and the EXP field (see Figure 4.4 and Figure 4.3).While the type field has only a very restricted number of possible values2, theEXP field was defined as a timestamp, so it could very well become a com-mon identifier for a packet across all the nodes on its path. As we discussbelow, having a timestamp is very important however to limit the ability ofthe adversary to replay setup packets. As we will see, for this we also needto have a loose time synchronization between nodes and end-hosts. To solvethe problem we therefore use a format for the EXP field with low precision(e.g., to 1 minute) and a fixed session duration of similar size (1 minute), sothat because of the time synchronization all sessions active at the same timewill have the same or only a very limited set of possible values.

Path Length and Node Position Hiding. HORNET hides (both from exter-nal observers and from nodes the nodes on a path) the total length of thepath and what the position of each node on the path is, within the limitsof the information leakage from the topology (ILT, see subsection 4.1.3). Asthe reader can easily verify, our constructions of the FS payload and FS dis-penser (see subsection 4.2.3) require each the exact same processing by all

1Here and in the rest of the session, we always refer to computationally bounded adver-saries, so the statistical independence is meant only for efficient tests.

2According to our specification the possible values are 3. For each of them there is adifferent class within which we want to achieve our unlinkability goals. However a setuppacket can never be indistinguishable from a forward data packet, for instance.

42

Page 50: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

5.1. Security Analysis

nodes, and all parts have the same length. Furthermore, the length of thesestructures is padded up to the maximum number of hops on a path (r). Theother parts of the packets also do not leak path length nor a node’s position:in the setup packets this protection is given by Sphinx’s packet format, whilein data packets the payload has always the same length.

In a number of low-latency anonymity schemes timing analysis can be usedto determine the distance to the source or destination by measuring theelapsed time between the transit of packets on the forward path and thetransit of what is guessed to be the response on the backward path [22]. Inour scheme this is more difficult to achieve for two reasons. First, the sourcecould use asymmetric paths that have the least possible amount of commonnodes, so most nodes would see for a certain session only forward or onlybackward packets. Second, in HORNET the shared keys along the two pathsare established independently from each other (from a cryptographic per-spective), so the forward packets and backward packets of the same sessionare indistinguishable from packets of two different sessions3. The possibili-ties for an adversary to do this kind of timing attacks are therefore greatlylimited.

5.1.2 Active Attacks on Anonymity

We assumed in our threat model that the adversary can be active, whichmeans that he is able to inject, modify, replay and drop packets. We alsoassumed that the adversary would want to avoid revealing which nodes arecompromised. In our protocol we try therefore to make malicious behaviordetectable, preferably by honest nodes on paths, but at worst by the endpoints of the paths.

The reasoning behind this assumption on the adversary is that we imaginethat if HORNET were deployed, a number of independent organizationsand researchers would constantly monitor the network for signs of activemisbehavior. Note that, because of properties like session unlinkability thatwe mentioned above, an adversary is forced to choose the sessions to target(with active behavior) randomly. This means that even if an adversary had astrategy presenting active behavior, that allowed him to gain information ona certain flow, in order to use this strategy effectively for mass surveillancehe would have to target a large amount of sessions, which would be highlydetectable.

3Note that the destination will always be able to do this type of round-trip timing. Whilethis is less severe than if the distances to the end points were revealed to an intermediatenode, it is still possible to mitigate this attack by adding some randomized delay beforeanswering on the source. However, this impacts performance, and can only hide the pathlength to a certain extent.

43

Page 51: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

5. Analysis and Discussion

If an adversary engaged in such large scale active behavior, we expect thatit would be possible to “triangulate” the position of the misbehaving nodesby using multiple observation points. We expect, that is, that by comparingdifferent paths, on all of which malicious behavior was seen, it would besufficient to look for nodes that are on all or most of those paths to identifycompromised nodes. Based on this, a trustworthiness score of nodes couldbe measured. In such a scheme, if a node had low score it would be likelythat the node has been compromised. If the compromise is an illegal breachin security, the owner of the node would have an incentive to fix the problem.If the compromise is due to legal coercion, then the fact that large-scaleactive behavior would cause financial damage to the node’s owner mightstill deter the legal authorities from performing such active surveillance.

Integrity Protection. In order to make modifications detectable, both setupand data packet headers are integrity protected, the mac being computed atevery hop. The data payloads are not, but as we mentioned in section 4.3 itwould be easy to add end-to-end integrity, so that the source or destinationwould notice if the payload was modified. It should also be noted that, sincethe payload is encrypted or decrypted at each step, and its size is fixed, itdoes not offer the possibility to effectively tag a packet in order to make itrecognizable at a later point on the path (the adversary would not be able todistinguish a tagged from a non-tagged packet).

The nonce in the header of data packets is not integrity protected, but,equally to what just saw for the payload, a modification of the nonce wouldnot be recognizable at later points in the networks. Since the nonce is used toprocess the payload, if modified it would cause also the payload to change,so if end-to-end integrity is used the modification would be detected.

The last part of the packets we need to consider is the FS payload, which isincluded in the session setup packets. The FS payload is such that once ithas been processed by an honest node, all that was added to it previously isintegrity protected and cannot be changed without the source detecting it4.

Replay Attacks. A more difficult challenge than protection against packetmodification is protection against packet replay. Such protection can beachieved by requiring that all nodes store a hash of each packets they see,so that they can verify that a packet is not a replay by checking that itshash in not among the stored hashes. This is usually done in mix networks,

4In an early design stage this integrity protection was not in place, and we found thatthis made the protocol vulnerable to a position guessing attack. In the attack, a maliciousnode present on both forward and backward path could, during the session setup, try tomodify the FS payload of the forward path included in the Sphinx payload of the secondsetup packet. If the modification (e.g., a flipped bit) was in a position corresponding to theFS of that node (for the forward path), the node could then detect it during data transmission,and learn its position on the path.

44

Page 52: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

5.1. Security Analysis

but is clearly against our principle of not keeping state on nodes (see sub-section 4.1.2). The only alternative is to restrict the validity of packets byincluding an expiration timestamp. This is what we do in HORNET, wherethe timestamp EXP (which we discussed in the previous section) is includedin the header of the setup packets and later in the forwarding segments(FSes) of each node for the data packets. For this we assume that all nodesand end hosts have a loose time synchronization. Within the expiration time,however, replays are still possible. For better protection the end host couldcheck for duplicates. Upper layer protocols might provide simple mecha-nisms to do so (as, e.g., the TCP protocol [46]).

A related attack is that of packet dropping. Both this and replay attacksallow an adversary to introduce timing signatures which could allow himto link together sessions seen at different points in the network. This kindof attack cannot be prevented without padding traffic, so as for the previousattack, solutions to packet dropping will likely require involvement fromthe end-host for detection.

Active Tra�c Analysis. It has been shown that an adversary may exploitthe injection of large amounts of traffic to trace an anonymous path by thechanges in the latency [5][22]. This can be a very effective type of attack, butit is also very noisy and, therefore, highly detectable. It could still be reason-able for an adversary to use this attack to link a small set of known targetsto the destinations they are connected to, but as we mentioned HORNET isnot designed to protect against such a scenario.

5.1.3 Forward Secrecy

We introduced forward secrecy for HORNET at the end of subsection 4.2.4.To have this property means that even if after a session has expired a node iscompromised (the adversary learns the long term private keys), that sessionstill remains secure. First we must clarify that we are assuming that thesecret value SV of each node is a short term key, i.e., the nodes change itfrequently5.

Then we also observe that the forward secrecy we can provide is restricted tothe data transmission phase and the keys used therein. The setup howeveris based on Sphinx, which does not provide forward secrecy. This meansthat, when a node is compromised, the adversary can actually learn the es-tablished Sphinx key, and as a consequence learn the routing information

5This is not a problem as the SV needs to be known to no one else; to still be able todecrypt all forwarding segments for open sessions, nodes could keep at all time two SVvalues whose lifetimes are the same and overlapping for half of them. For new sessions thenode would always use the newer of those values. To avoid having to do two decryptions (incase for the first the MAC check fails), one bit of the FS could be reserved to indicate whichof the two keys needs to be used.

45

Page 53: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

5. Analysis and Discussion

for the compromised node of all the sessions that where stored by the ad-versary. This inherently restricts the property of forward secrecy we canachieve in HORNET to the confidentiality of the data payloads rather thanfor the unlinkability of source and destination.

We compare this to Tor [14], which instead achieves forward secrecy foranonymity protection. Tor is able to do this by using its telescopic pathextension mechanism (see section 3.3): it first establishes a shared key thathas forward secrecy with an onion router, and then, using this key, it requestsa circuit extension to the next onion router. The drawback of such a schemeis that it incurs in higher network latency, as for each node (onion router)the source sends one packet and has to wait for the reply. It is also a schemethat cannot support asymmetric paths, and that cannot hide the position ofa node on a path from an attacker using timing analysis.

Still, if forward secrecy for anonymity were considered a necessary securityproperty, we imagine that a HORNET variant could be defined that uses aTor-like circuit setup to collect the FSes instead of Sphinx. The data trans-mission phase could remain the same.

5.1.4 Protecting the Nodes: DDoS attacks

So far we have always focused on providing security properties for thesource. Here we consider the case of distributed denial of service attacksin which we want instead to protect the nodes from malicious sources.

Since HORNET guarantees that the nodes do not need to store per-sessionstate, denial of service attacks aiming to cause memory exhaustion are notpossible. On the other hand, however, the computations required to de-crypt the packet-carried state (the FS) for each data packet, and even morethe computations required to process each setup packet, make the schemevulnerable to distributed denial of service attacks aiming to exhaust the pro-cessing unit(s) of a node.

To protect against this type of attack the nodes should agree with their neigh-bors on the maximum rate of HORNET packets to be exchanged for eachlink. In particular this should be enforced by the nodes connected to endhosts (these nodes would usually be the ISPs of the end host). Each nodewould then just block traffic exceeding the limit rate, and possibly hold thesender of the extra traffic accountable for the violation, depending on thebusiness agreements between them.

The maximum rate should be established separately and be lower for setuppackets, which require more processing time. We will further discuss DDoSattacks when presenting the results of the evaluation in subsection 6.3.3.

46

Page 54: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

5.2. Anonymous Path Retrieval

5.2 Anonymous Path Retrieval

If before using HORNET to connect to a web server D a source had to openlyask a path server (a DNS-like system6) for the path to reach D, all possibleefforts done by HORNET to keep the link between the source and the webserver secret would be pointless, as the communication with the path serverwould already leak that link. This shows that the process to obtain a pathmust also be anonymized.

The easiest way to achieve this is to use HORNET to connect to such apath server. The source S would first openly request and obtain a pathto reach the path server P (this is a functionality that must be providedby the network architecture). Then S can establish an anonymous sessionwith our protocol to connect anonymously to P, and obtain the path tothe web server D. Other possibilities would be private information retrieval(PIR) schemes [33][34], which allow the source to performs queries withouteven the queried server(s) knowing what information was retrieved. Suchschemes tend to impose a large overhead however.

Similar considerations to those made for path lookup can also be madeabout the retrieval of the public keys of the nodes, so a good solution wouldbe to have a single system that can provide a path to the desired destinationas well as the public keys of all the nodes on that path.

5.3 Memory-Bandwidth Trade-o↵

By using packet-carried state, as discussed in section 4.1, we allow HORNETto be highly scalable, and deployable on devices with low memory. Howeverwhat is gained in terms of saved memory is lost in terms of bandwidth. Thefact that the state of the nodes is included in the header of each packetimposes a considerable overhead on the size of the packet headers, meaningthat for the same amount of data transmitted an anonymous session requiressignificantly more bandwidth than a plain (insecure) transmission.

As we will see more in detail in subsection 6.3.2 when evaluating the im-plementation of HORNET, the size of a data packet in our protocol is givenby r · 52B + N, where r is the maximum number of nodes on a path andN is the size of the payload. In the evaluation we argue that r = 7 andN = 1000B are reasonable assumptions: for these values the size of the datapacket header is 364 B, and the of the whole packet is 1364 B. This showsthat the header takes up over one fourth of the total packet size. In practicethis would mean that if a fraction a of the network traffic were anonymized,

6DNS is the Domain Name System, used to lookup network addresses of services bytheir name [46].

47

Page 55: Rights / License: Research Collection In Copyright - Non …48196/eth-48196-01.pdf · — Glenn Greenwald, No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State

5. Analysis and Discussion

then the capacity of the links would need to be increased on average by afactor of 0.27a.

This is a generous estimate, as the size of the data packet header could easilybe smaller. The per-hop size of 52 B is obtained from the elements that areadded for each hop to the FS dispenser: one MAC (16 B), one shared key(16 B), the routing information (16 B) and the expiration timestamp EXP (4B). In some cases it is possible to reduce some of these sizes: if a MAC needsto resist only online attacks, as is the case in our protocol (the validity of anFS is limited by EXP), it might be sufficient to reduce its size to 8 B. Also therouting information could very well be smaller, as it only has to point oneof the adjacent nodes. It could even be as small as 4 bytes (which is the sizeof IP addresses [46]). So in this best case scenario the header size would ber · 32 = 224 for r = 7.

Even in this case the overhead is still significant. Additionally, the payload is padded to its maximum length, which introduces further overhead in a number of scenarios. To address this problem either the capacity of the links in the network has to be adapted, or per-node restrictions on the anonymized traffic must be enforced (like those just discussed for DDoS protection, subsection 5.1.4).

5.4 Composability with Other Protocols

HORNET can be used together with other schemes to achieve stronger security, i.e., to protect against a stronger adversary. One possibility is to compose our protocol with a lower-level scheme (layer 2, see subsection 2.1.1), for example link encryption between nodes. Since in our model each node is associated with a public key, it is easy for a pair of adjacent nodes to dynamically establish a shared key and use it to encrypt, and possibly authenticate, all the communications between them. This would protect against a stronger adversary that is able to eavesdrop on network links (making our protocol closer to Tor [14], which uses Transport Layer Security, TLS).

The other possibility is to compose HORNET with a higher-level protocol. Here there are many options; a simple but potentially very effective one is to have proxies that relay the traffic through anonymous HORNET sessions. Just one relay would probably be enough to remove the information leakage from the topology (ILT) which we saw in subsection 4.1.3, thus increasing the anonymity set sizes of the source and destination. Other such schemes could be obtained by composing HORNET with protocols of the transport or application layer, for example TLS, VPNs and even Tor.


Chapter 6

Evaluation

In this chapter we present the evaluation of our implementation of HORNET. We first give an overview of the implementation itself, which also serves as a detailed specification of the protocol. Afterwards we present the results of a number of tests to evaluate HORNET's performance. We first show in section 6.2 the preliminary code profiling we did to find and, if possible, remove bottlenecks, and generally to optimize the protocol.

We then present our main findings in section 6.3. These results are intended as a preliminary performance assessment, rather than as a complete and definitive evaluation of the limits of HORNET's theoretical design.

6.1 Implementation

We implemented HORNET to evaluate its performance, but also as a way to check the correctness of the protocol. As programming language we used Python [38], which allows rapid prototyping1. We also implemented Sphinx [11], which as we saw in section 4.2 is needed for setting up HORNET sessions. Rather than using the reference implementation of Sphinx provided by Danezis and Goldberg, we re-implemented the protocol from scratch, which enabled us to include the changes necessary for the protocol to work within HORNET.

Our implementation of the entire protocol (including Sphinx) has a total of 1891 LOCs (543 of which for Sphinx) and 21 classes (exceptions excluded, 5 classes for Sphinx). Though the implementation was meant to serve mainly the testing purposes we mentioned, its code was written to a high standard of quality

1 As an interpreted language, Python is in general slower than a compiled low-level language like C/C++. For maximum performance in an implementation meant for production we would opt for the latter kind of language. Our Python implementation can still serve as a blueprint for such an implementation.


so that it could still be used as a base for a future production implementation: the ratio of comment lines to code lines is about 2:3, and the pylint (a source code quality metric [37]) score of the code is 9.2/10.

6.1.1 Interfaces and Modularity

The code was designed with a modular approach, which allows it to be used on top of any FIA implementation (as long as it can be interfaced with Python). The main classes of our Python implementation (HornetSource, HornetNode and HornetDestination) expose an interface in which methods that require packets to be sent return these packets to the caller instead of sending them directly, e.g., through a socket. This provides maximum flexibility.

These interfaces all expose packet-processing methods that return instances of a ProcessingResult class: these return values indicate, for example, that a new session was established, returning an identifier for that session (which can then be used to send data packets), or that the processing of a packet failed (e.g., because of an incorrect MAC). In general, in the design of these interfaces we avoided exception-driven programming for better performance.
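The interface style described above can be sketched as follows. This is a hypothetical illustration of the design, not the actual implementation: the class and field names are simplified stand-ins.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SESSION_ESTABLISHED = "session_established"
PROCESSING_FAILED = "failed"

@dataclass
class ProcessingResult:
    status: str
    session_id: Optional[int] = None
    # Packets are returned to the caller rather than written to a socket,
    # keeping the protocol logic independent of the underlying FIA.
    packets_to_send: Tuple[bytes, ...] = ()

class ToyNode:
    def process_packet(self, raw: bytes) -> ProcessingResult:
        # A real node would verify the MAC and decrypt one layer; here an
        # empty packet stands in for a packet with an invalid MAC, and the
        # failure is reported as a result value, not raised as an exception.
        if not raw:
            return ProcessingResult(status=PROCESSING_FAILED)
        return ProcessingResult(status=SESSION_ESTABLISHED,
                                session_id=1,
                                packets_to_send=(raw,))
```

Returning result objects instead of raising exceptions avoids the cost of exception handling on the hot path and lets the caller decide how to transmit the returned packets.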

6.1.2 Notable Issues and Lessons Learned

As we mentioned, one of the benefits of having an implementation is that it allows us to verify that all parts of the protocol have been considered and that the whole is correct (formally, this means that it works in the absence of an adversary). Indeed, it helped us find certain aspects that we had not considered during the design, and even forced us to change the design in some points.

It was helpful in finding the right way to concretely handle the addition of forward secrecy to the protocol. Another point was payload padding: in some cases padding schemes can lead to security vulnerabilities (leaking the actual payload length), so it should not be done in an ad-hoc way. We used the ISO/IEC 7816-4 padding scheme, which requires a 0x80 byte followed by 0x00 bytes. However, the best way to avoid attacks based on the padding would be to have end-to-end MACs, which we discussed among the possible protocol enhancements in section 4.3.
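The padding scheme just described can be implemented in a few lines; the following is a minimal sketch (function names are ours, not from the implementation), assuming a 16 B block size.

```python
def iso7816_pad(data: bytes, block_size: int = 16) -> bytes:
    # ISO/IEC 7816-4: append a single 0x80 byte, then 0x00 bytes up to
    # the next block boundary (a full extra block if already aligned).
    pad_len = block_size - (len(data) % block_size)
    return data + b"\x80" + b"\x00" * (pad_len - 1)

def iso7816_unpad(padded: bytes) -> bytes:
    # Strip trailing 0x00 bytes, then the mandatory 0x80 marker.
    stripped = padded.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("invalid ISO/IEC 7816-4 padding")
    return stripped[:-1]
```

Because the 0x80 marker is mandatory, the scheme is unambiguous even when the plaintext itself ends in 0x80 or 0x00 bytes.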

The limitations on cryptography were probably the most notable problem we encountered. While in theory a number of primitives are assumed to be available, in practice the commonly used libraries only provide a restricted set of operations2. This affected our protocol in two points in particular.

2 In a way this is a security feature, as the cryptanalysis and testing of these implementations is more thorough when the efforts of the community are focused on a small number of primitives; it also restricts the possibilities for non-expert users to use cryptographic schemes in ways that could be insecure.


One is the creation of forwarding segments, where the availability of a PRP with a large block size would have allowed us to use a simpler scheme (cf. Equation 4.1 with Equation 4.2). The other point, which we did not mention in chapter 4, is the opposite, i.e., the unavailability of PRPs for small block sizes3: we would have needed such a scheme to support small nonces for payload encryption (see subsection 4.2.5). Instead, in the implementation we had to use 16-byte nonces (16 bytes is the smallest block size for AES [32]).

These problems show the importance of considering which schemes are actually available and tested when designing protocols that require cryptography, as the availability of certain schemes may well influence the design.

6.2 Initial Profiling

For the initial profiling we used cProfile, a standard Python tool that, for a protocol run, produces results such as the list of functions that were called, how many times they were called, and how much time was spent inside each of them. Such measurement tools have limits, as they are usually not able to (completely) remove the perturbation they add to the results. Nonetheless, they are very useful for finding bottlenecks and other problems in the code.
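A profiling run of the kind described above can be set up as follows; the workload function here is a toy stand-in for a simulated protocol run.

```python
import cProfile
import io
import pstats

def simulated_run():
    # Toy workload standing in for a simulated protocol run.
    return sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
simulated_run()
profiler.disable()

# List the functions sorted by cumulative time, as used to spot bottlenecks.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
```

The resulting report lists, for each function, the number of calls and the time spent inside it, which is exactly the information used in this section.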

We ran the profiling tool over a simulated protocol run, with no actual network communication. For the setup phase we found a problem with the stream ciphers we were using. We had implemented stream cipher encryption (which we formally specified as x-oring the plaintext with the output of a PRG) through AES [32] in counter mode, available from the Python Cryptography Toolkit (PyCrypto) [29], which is the standard cryptographic library in Python. It turned out that this type of encryption required a great amount of CPU time, so much that on the whole it took more than the asymmetric cryptography processing. This was completely unexpected, as a Diffie-Hellman key establishment notoriously requires a number of CPU cycles over three orders of magnitude larger than that for encryption with AES.
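The formal model of stream encryption mentioned above (XOR with PRG output) can be illustrated with standard-library code only. Note that the hash-based PRG below is a stand-in for exposition: the actual implementation uses AES in counter mode from PyCrypto, not SHA-256.

```python
import hashlib

def prg(key: bytes, nonce: bytes, length: int) -> bytes:
    # Illustrative PRG built from SHA-256 in counter mode; a stdlib
    # stand-in for the AES-CTR keystream used in the implementation.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def stream_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Stream-cipher encryption as formally defined: XOR the plaintext
    # with the PRG output. Decryption is the same operation.
    keystream = prg(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))
```

Since encryption and decryption are the same XOR operation, applying the function twice with the same key and nonce recovers the plaintext.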

The profiling further revealed that the main culprit of the slow processing was the initialization of the Crypto.Util.Counter object. The rest of the processing done by AES was also slower than expected, however. This seems to be a problem of the Python implementation, since known performance evaluations of this kind of encryption in low-level compiled languages show much

3 Such schemes exist, and are used for example in credit card applications, but they are not available in Python libraries.


better results4. This is a problem that we cannot solve (we did not find alternative cryptographic libraries), so if we were to implement HORNET for production we would use a compiled language for which high-performance cryptographic modules are available, instead of Python.

Comparison with Simulation. During the research on HORNET, a simulated implementation in C of a node was tested, and the results showed that much better performance can indeed be achieved using a compiled low-level programming language. We will discuss this simulated implementation further when presenting the results of our performance measurements, but we will not go into its details, as the author of this thesis did not contribute to this part of the research.

6.3 Performance Measurements

We now present the performance evaluation of HORNET. We measured the protocol at an algorithmic level, not at a system level, because an evaluation of the latter kind would require fixing an underlying network architecture (e.g., one of the FIAs presented in subsection 2.1.3), which is something we wish to avoid. A system-level evaluation would have to consider many additional factors, like multi-threading, buffering strategies, etc., while we wished to focus on the protocol alone. For this reason we did not measure performance by the throughput of a source or of a node; we measured it instead by the processing time of the algorithms.

For performance measurements we used the timeit Python utility, which allows running a certain piece of code a given number of times, measuring the total time of each run (no intermediate measurements). All tests were run on a desktop machine running Red Hat Enterprise 6.6, with an Intel® Core™ i5-4570 processor (3.20 GHz) and 16 GB of RAM.
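A measurement of this kind looks as follows; the stub routine is a hypothetical stand-in for the code being timed, and the 100-run/4-replication shape matches the experiment design described below.

```python
import timeit

def packet_stub():
    # Hypothetical stand-in for one packet-processing routine.
    return bytes(x ^ 0x5A for x in range(256))

# Four replications of 100 runs each; timeit.repeat reports the total
# time of each replication, with no intermediate measurements.
totals = timeit.repeat(packet_stub, number=100, repeat=4)
per_call_us = [t / 100 * 1e6 for t in totals]  # mean microseconds per call
```

Dividing each total by the number of runs yields the mean per-call time, which is the quantity reported in the rest of this section.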

6.3.1 Experiment Design

For the experiments we identified three main factors we could vary that could influence the performance of HORNET: the maximum number of nodes on a path (r), the number of nodes on the chosen path (n) and the size of the data payload (relevant only for the data transmission phase). For simplicity we consider a scenario where the path is symmetric, i.e., the same nodes are on the forward and backward path. We did not consider other factors, such as the security parameter k, which we fixed as constants5.

4 Actually PyCrypto uses native code to run most parts of the algorithms, so the problem really seems to be in the Python initialization.

5 The available cryptographic implementations also limit the key sizes they allow, making an evaluation for different values of the parameter k impossible.


For each factor we decided on a small number of possible levels6. For the size of the payload we looked at a relatively recent survey on the distribution of packet sizes in the Internet [42], which shows that packet sizes are strongly bimodal, with one mode at 40 B (40%) and another at 1500 B (20%), and that the remaining packet sizes lie between these modes and close to them. This led us to choose one level at 500 B and the other at 1500 B. The reason for choosing the level of 500 B rather than 40 B is that the payload size has to be large enough to contain the backward anonymous header when this is sent to the destination to conclude the session setup (see subsection 4.2.4). In the rest of this section we let N denote the payload size of a data packet for brevity.

The reason why we do not fix the value of r is that different values would be reasonable depending on the underlying network architecture. If nodes correspond to ASes, as might be the case for NIRA [49] or SCION [50], then our discussion of network topology in subsection 4.1.1 suggests that r = 7 would be a reasonable choice, assuming that the longest path would be between two tier 3 ISPs, traversing two tier 2 and two tier 1 ISPs (remember that in the path length we also count the destination as a node). Other FIAs like Pathlets [17] might have longer paths, but because of the lack of real-world FIA testbeds or deployments, reliable path length measurements are not available. Thus we chose r = 10 as the second level, as we think that longer paths would add excessive overhead. Should an implementation require a higher maximum, we would advise trying to find a different mapping of the node abstraction onto the network entities7.

For n we chose the set {2, ..., 10}, excluding n = 1, which would be the case of a destination connected directly to the source, a scenario that would provide no anonymity at all. The upper limit is due to the fact that n ≤ r. Each experiment was made by taking 100 samples (i.e., measuring the timing of 100 runs), and all experiments were replicated 4 times to obtain the average mean and the average standard deviation. In all results we found that the standard deviation was below 1% of the mean (coefficient of variation < 0.01), a result that we attribute to the fact that the machine on which we ran the tests was not running any other user applications and therefore exhibited very little noise. When presenting the evaluation results we will therefore omit the standard deviation.
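The coefficient-of-variation check used above is straightforward to compute; the samples below are hypothetical timing values for illustration only.

```python
import statistics

# Hypothetical timing samples in seconds; in the experiments each run
# produced 100 samples and was replicated 4 times.
samples = [1.002e-3, 0.998e-3, 1.001e-3, 0.999e-3, 1.000e-3]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)       # sample standard deviation
cv = stdev / mean                       # coefficient of variation
# In all our experiments cv stayed below 0.01.
```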

6 By levels we mean the possible values of a factor. In general we adopt the terminology of Jain [25].

7 We will see below that this factor does not have a significant impact on performance, so the results would be similar for cases where r > 10.


6.3.2 Payload Size and Maximum Path Length

The data payload size N and the maximum path length r are two important factors in that together they determine the size of the data packets, and r additionally determines the size of the first two setup packets. Specifically, the size of a data packet is given by (52 · r) B + N, and the size of the first and second setup packet8 by (48 + 116 · r) B.
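The two size formulas can be written directly as functions (a sketch for illustration; the function names are ours):

```python
def data_packet_size(r: int, payload: int) -> int:
    # Data packet: (52 * r) B anonymous header plus the payload.
    return 52 * r + payload

def setup_packet_size(r: int) -> int:
    # First and second setup packets: (48 + 116 * r) B.
    return 48 + 116 * r
```

For the levels used in the evaluation (r = 7, N = 1000 B) this gives a 1364 B data packet, and for r = 10 the setup packets grow to 1208 B.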

Figure 6.1: Comparison of the total processing time for the transmission of a data packet for different payload sizes, 500 B and 1500 B, showing that the payload size has a limited influence. The two configurations considered, (n = 2, r = 7) and (n = 10, r = 10), are the limit cases for the considered factor levels.

However, we expected these two factors to have only a marginal impact on performance, since both only influence the length of certain values that are encrypted or decrypted, but not the number of encryptions and decryptions that are performed. Since for these symmetric-key operations the most expensive part is the initialization of the ciphers rather than the actual processing of the plaintext or ciphertext [32] (confirmed also by our profiling results), it seemed reasonable to assume that the impact of r and N would be low.

Our evaluation showed that our assumption was correct. In Figure 6.1 we plot the differences between the two payload levels in the two limit cases of (n = 2, r = 7) and (n = 10, r = 10). As can be seen, the impact of the payload size is minimal. For r the results were even more pronounced, with the difference being below 2% in all configurations (all combinations of n and

8 We will not consider the size of the setup packets further, since the typical applications we assumed for HORNET involve browsing web pages, which can have a size of a few hundred kB to a few MB; with a payload size of 1000 B such a communication would require hundreds to thousands of data packets, making the size of the setup negligible in comparison.


N). For this reason in the next section we will only focus on the results for r = 7.

Similarly, we decided to fix the payload size for the rest of the evaluation to the average of the two chosen levels, i.e., N = 1000 B. The reasoning behind this choice is based on the size of data packets. As we saw, the size of the anonymous header is 52 · r bytes, meaning that for r = 7 it is 364 B9: a payload that is too small would therefore waste computing resources, since the processing time depends mostly on the header, as we saw. On the other hand, a payload that is too large would waste bandwidth, so we chose an intermediate value of 1000 B. In practice, applications that use HORNET should try whenever possible to use the whole length of the payload. Alternatively, it would also be possible to allow two different sizes, but this would reduce the anonymity guarantees of the protocol.

6.3.3 Performance Results

Having fixed the maximum number of nodes r to 7 and the data payload size N to 1000 B, we show the evaluation results for the different levels chosen for n, the effective number of nodes on the path from the source to the destination (remember that we considered symmetric paths). Since n ≤ r, the set of levels reduces to {2, ..., 7}.

Data transmission. Figure 6.2 shows the results for the transmission of one data packet. The plot focuses on the complete transmission, including the total processing time required by all nodes. We see that even for the longest paths the latency due to computation is around 1 ms. If we compare this to typical network latencies, which are around 18 ms10, we see that the delay introduced by HORNET constitutes an increase of less than 10%.

The figure also allows us to see how much of this time is spent by the end hosts (and, by difference, how much is due to the processing of the intermediate nodes). Comparing the effort required of the source and of the destination for the transmission of one packet, we note that the source does a greater amount of computation (mainly consisting in the onion encryption of the payload) than the destination (which only needs to remove one encryption layer). This is a positive aspect that gives web servers acting as destinations a minimum degree of protection against DDoS attacks. These results, and in general the complete evaluation of the data packet processing time, do not change significantly when considering packets sent by the destination along the backward path to the source.

9 This value could be reduced, for instance, by using smaller MACs.

10 We measured the roundtrip time to the top 10 global websites (from Alexa's analytics [2]) by pinging each website 10 times, taking the minimum for each, and dividing the median of these values by two to obtain an average latency of 17.9 ms. Since these are popular websites we expect that in general the average value would actually be even higher.


Figure 6.2: Processing time of the transmission of one data packet. The total (upper curve) includes the processing time by all nodes.

The processing time of a data packet at a node (each intermediate node performs exactly the same steps) was 130 µs, independently of the path length11. This translates to an upper bound on the throughput of a node of about 7700 packets per second, or equivalently of 62 Mbps with payload size N = 1000 B. More precisely, this is a limit on the throughput of a single CPU, so given the high parallelizability of the protocol described in subsection 4.1.2, we expect that for a correctly implemented multi-threaded node with, e.g., 16 cores, the throughput of the whole node would be around 1 Gbps12.
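The throughput figures above follow directly from the measured per-packet time; the arithmetic can be checked as follows (the 16-core scaling assumes ideal parallelization, as noted above).

```python
# Back-of-the-envelope throughput derived from the measured per-packet
# processing time (figures taken from the evaluation above).
t_packet_s = 130e-6            # per-node processing time per data packet
payload_bits = 1000 * 8        # N = 1000 B payload

pkts_per_second = 1 / t_packet_s                         # ~7 700 packets/s per CPU
mbps_single_cpu = pkts_per_second * payload_bits / 1e6   # ~62 Mbps
mbps_16_cores = mbps_single_cpu * 16                     # ~1 Gbps, assuming linear scaling
```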

We do not think that this is the best that can be achieved, however. We expect that a more optimized implementation in a compiled language like C, rather than Python, could achieve performance greater than ours by one or even two orders of magnitude. The evaluation of the simulated implementation (see section 6.2) strongly suggests that this may be the case, as on a 16-core software router it achieved a throughput of 93 Gbps.

With this we conclude the presentation of our main findings for the data transmission phase of HORNET. To complete the evaluation we now analyze the performance of the session setup phase.

Session Setup. For our evaluation of the setup phase we followed the same scheme we used for the data transmission phase, with the same parameters.

11 If the processing time depended on the number of hops on the path, our scheme would leak this information, i.e., the length of the path, to the nodes, which is something we wanted to protect against in our security goals (see subsection 4.1.4).

12 We leave the verification of this assumption as future work.


In Figure 6.3 one can see the entire processing time required for a complete setup, including both end hosts and intermediate nodes. As for the transmission of a data packet, we ran the whole processing, from the creation of the first setup packet to the processing of the third setup packet by the destination, on one machine. For example, when a HornetNode object returned a processed packet as a byte sequence, this was immediately given as input to the next HornetNode object on the path.

Figure 6.3: Processing time of a complete session setup. The total (upper curve) includes the processing time by all nodes.

For n = r = 7 the setup required 37 ms. Given a one-way network delay of 18 ms as considered earlier, the total network delay for the three packets of the setup would amount to 54 ms, and the total time required for a setup would be below 100 ms13, which seems acceptable for usability. Comparing this value with the time required to send a data packet, which was approximately 1 ms, we see that if over 400 data packets are sent (400 kB) then the time required by the setup is less than 10% of the total.
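The latency budget above can be checked with a few lines of arithmetic (using the measured and assumed figures from this section):

```python
# Rough latency budget for a session setup, using the figures above.
compute_ms = 37                 # total setup processing for n = r = 7
one_way_ms = 18                 # assumed one-way network delay
network_ms = 3 * one_way_ms     # three setup packets traverse the path

total_ms = compute_ms + network_ms   # 91 ms, below the 100 ms target

# Amortization: the 37 ms of setup computation vs. ~1 ms per data
# packet over 400 data packets.
setup_fraction = compute_ms / (400 * 1.0)   # below 10%
```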

Like for the data transmission, the setup-packet processing time for each node was constant, independently of the number of nodes on the path, as expected. The processing time was 0.92 ms. This could potentially be a problem, as it means that even a CPU entirely dedicated to processing setup packets could not achieve a throughput higher than about 1080 setup packets per second. While some of the considerations made previously about the processing of data packets by nodes might still apply here, in this case most of the processing time is due to the Diffie-Hellman key establishments. For these operations we used native modules, and we therefore do not expect strong benefits from changing programming language. Indeed, the evaluation of our simulated implementation (see section 6.2) returned a processing time of around 250 µs, which is “only” 4 times less than what the implementation in Python achieves.

13 This computation assumes that in a network implementing HORNET the time packets spend in queues (buffers) on HORNET nodes is similar to the queuing time on current Internet routers.

To avoid DDoS attacks on nodes (discussed in subsection 5.1.4) based on sending a large number of setup packets through targeted nodes, the only solution we see is to have all nodes restrict the number of setup packets they will accept; in particular, this should be enforced at the nodes that receive traffic directly from sources. For instance, assuming an underlying network architecture like NIRA [49] or SCION [50], where nodes would correspond to autonomous domains, it should be the ISPs that limit the number of setup packets that may be sent by their users.

With this we have summarized our most important findings about the performance of our Python implementation of HORNET. A number of questions remain open. As future work we would try to address them, in particular by doing a complete implementation of HORNET nodes, possibly in a language like C/C++, and performing an evaluation at the system level, to see how much traffic an actual node would be able to handle.


Chapter 7

Conclusions

In this thesis we have presented the motivation, design, implementation and evaluation of a protocol for anonymous communications, HORNET.

We have shown that the design of HORNET, while complex, follows a clear logical structure, which derives from the assumptions on the network and threat models. The design builds on the state of the art in anonymous protocol design, in that it tries as much as possible to build upon existing constructions: most importantly, for the setup of an anonymous session it leverages a provably secure mix network protocol, Sphinx [11]. As part of the design we have also isolated two cryptographic constructions, called FS payload and FS dispenser in the context of the protocol, which we expect could be useful to construct other schemes. We introduced the fundamental concept of packet-carried state for HORNET's design, which allows the protocol to scale much further than current anonymity networks.

A detailed analysis of possible attacks, both active and passive, showed that our protocol effectively achieves the security goals we previously defined based on our threat model. We also discussed possibilities to extend the protocol and compose it with other existing schemes.

Our performance evaluation shows that the protocol can work with acceptable levels of overhead. The design of our protocol requires no state to be maintained at intermediate nodes, allowing the number of concurrent anonymous connections to scale massively. These two properties would enable many millions of Internet users to enjoy the benefits of anonymous low-latency communications. Through a detailed evaluation, along with up-to-date statistics on real-world Internet topologies, we have also identified optimal values for a number of protocol parameters.

In conclusion, we have defined a scheme that can be included in the network layer of a future Internet architecture, and could potentially become one of the key building blocks of tomorrow's Internet.


7.1 Future work

While the security and performance results seem promising thus far, there are possible improvements that could be made in future work. In particular, we would like to implement a full-fledged node running HORNET in a compiled language such as C/C++: this would allow us to better investigate the real performance limits of our protocol.

We would also like to provide further guarantees of the security of HORNET, either by means of an automated theorem prover or by a formal proof. A last line of research that looks promising is finding a scheme to automatically identify misbehaving nodes, both in the middle of the network (done by nodes) and at its outer edges (done by end hosts).


Bibliography

[1] Carlisle Adams and Steve Lloyd. "Understanding PKI: Concepts, Standards, and Deployment Considerations". In: (May 2002).

[2] Alexa Top 500 Global Sites. url: http://www.alexa.com/topsites (visited on 04/11/2015).

[3] David G. Andersen et al. "Accountable internet protocol (AIP)". In: ACM SIGCOMM Computer Communication Review 38.4 (2008), p. 339.

[4] Ross Anderson and Eli Biham. "Two Practical and Provably Secure Block Ciphers: BEAR and LION". In: Lecture Notes in Computer Science 1039 (1996), pp. 113–120.

[5] N. Borisov et al. "Denial of service or denial of security?" In: Proceedings of the 14th ACM Conference on Computer and Communications Security (2007), pp. 92–102.

[6] Jan Camenisch and Anna Lysyanskaya. "A Formal Treatment of Onion Routing". In: Advances in Cryptology – CRYPTO 2005. Vol. 3621. Springer, 2005, pp. 169–187.

[7] Ran Canetti, Oded Goldreich, and Shai Halevi. "The random oracle methodology, revisited". In: Journal of the ACM 51.4 (July 2004), pp. 557–594.

[8] David L. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. 1981.

[9] G. Danezis, R. Dingledine, and N. Mathewson. "Mixminion: design of a type III anonymous remailer protocol". In: 2003 Symposium on Security and Privacy (2003).

[10] George Danezis, Claudia Diaz, and P. Syverson. "Systems for anonymous communication". In: Handbook of Financial Cryptography and Security. Chapman & Hall/CRC, 2009, pp. 341–390.


[11] George Danezis and Ian Goldberg. “Sphinx: A compact and provablysecure mix format”. In: Proceedings - IEEE Symposium on Security andPrivacy. 2009, pp. 269–282.

[12] W. Diffie and M. Hellman. “New directions in cryptography”. In: IEEETransactions on Information Theory 22.6 (Nov. 1976), pp. 644–654.

[13] Whitfield Diffie, Paul C. Van Oorschot, and Michael J. Wiener. “Au-thentication and authenticated key exchanges”. In: Designs, Codes andCryptography 2.2 (1992), pp. 107–125.

[14] Roger Dingledine, Nick Mathewson, and Paul Syverson. “Tor: Thesecond-generation onion router”. In: SSYM’04 Proceedings of the 13thconference on USENIX Security Symposium (2004).

[15] D. Dolev and A. Yao. “On the security of public key protocols”. In: IEEE Transactions on Information Theory 29.2 (Mar. 1983), pp. 198–208.

[16] Michael J. Freedman and Robert Morris. “Tarzan: A Peer-to-Peer Anonymizing Network Layer”. In: Proceedings of the 9th ACM Conference on Computer and Communications Security (2002), pp. 193–206.

[17] P. Brighten Godfrey et al. Pathlet routing. 2009.

[18] David Goldschlag, Michael Reed, and Paul Syverson. “Hiding Routing Information”. In: Information Hiding. Ed. by Ross Anderson. Vol. 1174. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 1996, pp. 137–150.

[19] Glenn Greenwald. No Place to Hide: Edward Snowden, the NSA, and the U.S. Surveillance State. 2014.

[20] C. Gulcu and G. Tsudik. “Mixing E-mail with Babel”. In: Proceedings of Internet Society Symposium on Network and Distributed Systems Security (1996).

[21] Danny Hillis. The Internet could crash. We need a Plan B. 2013. url: http://www.ted.com/talks/danny_hillis_the_internet_could_crash_we_need_a_plan_b (visited on 03/30/2015).

[22] Nicholas Hopper, Eugene Y. Vasserman, and Eric Chan-Tin. How much anonymity does network latency leak? Feb. 2010.

[23] Hsu-Chun Hsiao et al. “LAP: Lightweight anonymity and privacy”. In: Proceedings - IEEE Symposium on Security and Privacy. 2012.

[24] IETF Secure Inter-Domain Routing (sidr). url: https://datatracker.ietf.org/wg/sidr/charter/ (visited on 04/04/2015).

[25] R. Jain. The Art of Computer Systems Performance Analysis. Ch. 32 – Queueing Networks. Wiley, 1991, p. 716.

[26] James F. Kurose and Keith W. Ross. Computer networking: a top-down approach. Boston: Addison-Wesley, 2003.


[27] Karl de Leeuw and Jan Bergstra, eds. The History of Information Security: A Comprehensive Handbook. Elsevier, 2007, p. 900.

[28] Barry M. Leiner et al. “A brief history of the internet”. In: ACM SIGCOMM Computer Communication Review 39.5 (Oct. 2009), p. 22.

[29] Dwayne Litzenberger. PyCrypto - The Python Cryptography Toolkit. url: https://www.dlitz.net/software/pycrypto/ (visited on 04/10/2015).

[30] Vincent Liu et al. “Tor instead of IP”. In: Proceedings of the 10th ACM Workshop on Hot Topics in Networks - HotNets ’11 (2011), pp. 1–6.

[31] Jonathan R. Mayer and John C. Mitchell. “Third-Party Web Tracking: Policy and Technology”. In: 2012 IEEE Symposium on Security and Privacy. IEEE, May 2012, pp. 413–427.

[32] Frederic P. Miller, Agnes F. Vandome, and John McBrewster. “Advanced Encryption Standard”. In: (Dec. 2009). url: http://dl.acm.org/citation.cfm?id=1823209.

[33] Prateek Mittal et al. “PIR-Tor: Scalable Anonymous Communication Using Private Information Retrieval”. In: USENIX Security Symposium. 2011.

[34] Femi Olumofin and Ian Goldberg. “Revisiting the computational practicality of private information retrieval”. In: Lecture Notes in Computer Science. Vol. 7035 LNCS. 2012, pp. 158–172.

[35] J. Pan, S. Paul, and R. Jain. “A survey of the research on future internet architectures”. In: Communications Magazine, IEEE 49.7 (2011), pp. 26–36.

[36] Andreas Pfitzmann and Marit Köhntopp. “Anonymity, Unobservability, and Pseudonymity – A Proposal for Terminology”. In: Designing Privacy Enhancing Technologies. Vol. 2009. 2001, pp. 1–9.

[37] Pylint - code analysis for Python. url: http://www.pylint.org/ (visited on 04/13/2015).

[38] Python Software Foundation. Python Programming Language – Official Website. 2011. url: http://www.python.org/.

[39] Michael G. Reed, Paul F. Syverson, and David M. Goldschlag. “Anonymous connections and onion routing”. In: IEEE Journal on Selected Areas in Communications 16.4 (1998), pp. 482–493.

[40] R. L. Rivest, A. Shamir, and L. Adleman. “A method for obtaining digital signatures and public-key cryptosystems”. In: Communications of the ACM 21.2 (Feb. 1978), pp. 120–126.


[41] Jody Sankey and Matthew Wright. “Dovetail: Stronger anonymity in next-generation internet routing”. In: Lecture Notes in Computer Science. Vol. 8555 LNCS. Springer Verlag, May 2014, pp. 283–303. arXiv: 1405.0351.

[42] Rishi Sinha, Christos Papadopoulos, and John Heidemann. “Internet packet size distributions: Some observations”. In: USC/Information Sciences Institute technical report (2007), pp. 1–7.

[43] Daniel J. Solove. Nothing to Hide: The False Tradeoff Between Privacy and Security. Yale University Press, 2011.

[44] Douglas R. Stinson. Cryptography: Theory and Practice, Third Edition. 2005.

[45] Paul Syverson. “Why I’m not an entropist”. In: Lecture Notes in Computer Science. Vol. 7028 LNCS. 2013, pp. 213–230.

[46] Andrew S. Tanenbaum. Computer Networks (4th Edition). Prentice Hall Professional Technical Reference, 2002, p. 912.

[47] The National Science Foundation. NSF NeTS FIND Initiative. url: http://www.nets-find.net/ (visited on 04/13/2015).

[48] U. Möller, L. Cottrell, P. Palfrader, and L. Sassaman. Mixmaster Protocol — Version 2. 2004. url: https://tools.ietf.org/html/draft-sassaman-mixmaster-03.

[49] Xiaowei Yang, David Clark, and Arthur W. Berger. “NIRA: A New Inter-Domain Routing Architecture”. In: IEEE/ACM Transactions on Networking 15.4 (Aug. 2007), pp. 775–788.

[50] Xin Zhang et al. “SCION: Scalability, control, and isolation on next-generation networks”. In: Proceedings - IEEE Symposium on Security and Privacy (2011), pp. 1–16.

[51] H. Zimmermann. “OSI Reference Model – The ISO Model of Architecture for Open Systems Interconnection”. In: IEEE Transactions on Communications 28.4 (Apr. 1980), pp. 425–432.

[52] Philip R. Zimmermann. “The official PGP user’s guide”. In: (May 1995). url: http://dl.acm.org/citation.cfm?id=202735.


Addendum

The work on the HORNET project presented in this thesis has been extended since its submission, and was condensed into a conference paper that has been accepted and presented at the ACM Conference on Computer and Communications Security (CCS) 2015, held in Denver, Colorado.

The reference for this paper is the following:

Chen, C., Asoni, D. E., Barrera, D., Danezis, G., & Perrig, A. (2015). HORNET: High-speed Onion Routing at the Network Layer. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS ’15).

Acknowledgments

I would like to take the opportunity of this addendum to thank all my co-authors, with whom I had the pleasure of working on this project. In particular, I would like to thank David for dedicating a lot of time to supervising my work, for helping me out when I needed guidance, and for supporting my ideas. I would like to thank Chen, who allowed me to join the project and who was always open to my questions, arguments, and ideas. And I would like to thank Adrian for his help and useful comments, and for letting me join the group and continue doing research in this area.

I would also like to thank my family and friends for their support, and most of all Francesca, friend and family at the same time, who supported me during the long months of work on this thesis.

Zürich, 25 October 2015

Daniele Enrico Asoni