
A Conflict-Free Replicated JSON Datatype

Martin Kleppmann and Alastair R. Beresford

Abstract—Many applications model their data in a general-purpose storage format such as JSON. This data structure is modified by the application as a result of user input. Such modifications are well understood if performed sequentially on a single copy of the data, but if the data is replicated and modified concurrently on multiple devices, it is unclear what the semantics should be. In this paper we present an algorithm and formal semantics for a JSON data structure that automatically resolves concurrent modifications such that no updates are lost, and such that all replicas converge towards the same state (a conflict-free replicated datatype or CRDT). It supports arbitrarily nested list and map types, which can be modified by insertion, deletion and assignment. The algorithm performs all merging client-side and does not depend on ordering guarantees from the network, making it suitable for deployment on mobile devices with poor network connectivity, in peer-to-peer networks, and in messaging systems with end-to-end encryption.

Index Terms—CRDTs, Collaborative Editing, P2P, JSON, Optimistic Replication, Operational Semantics, Eventual Consistency.


1 INTRODUCTION

Users of mobile devices, such as smartphones, expect applications to continue working while the device is offline or has poor network connectivity, and to synchronize its state with the user’s other devices when the network is available. Examples of such applications include calendars, address books, note-taking tools, to-do lists, and password managers. Similarly, collaborative work often requires several people to simultaneously edit the same text document, spreadsheet, presentation, graphic, and other kinds of document, with each person’s edits reflected on the other collaborators’ copies of the document with minimal delay.

What these applications have in common is that the application state needs to be replicated to several devices, each of which may modify the state locally. The traditional approach to concurrency control, serializability, would cause the application to become unusable at times of poor network connectivity [1]. If we require that applications work regardless of network availability, we must assume that users can make arbitrary modifications concurrently on different devices, and that any resulting conflicts must be resolved.

The simplest way to resolve conflicts is to discard some modifications when a conflict occurs, for example using a “last writer wins” policy. However, this approach is undesirable as it incurs data loss. An alternative is to let the user manually resolve the conflict, which is tedious and error-prone, and therefore should be avoided whenever possible.

Current applications solve this problem with a range of ad-hoc and application-specific mechanisms. In this paper we present a general-purpose datatype that provides the full expressiveness of the JSON data model, and supports concurrent modifications without loss of information. As we shall see later, our approach typically supports the automatic merging of concurrent modifications into a JSON data structure. We introduce a single, general mechanism (a multi-value register) into our model to record conflicting updates to leaf nodes in the JSON data structure. This mechanism then provides a consistent basis on which applications can resolve any remaining conflicts through programmatic means, or via further user input. We expect that implementations of this datatype will drastically simplify the development of collaborative and state-synchronizing applications for mobile devices.

M. Kleppmann and A.R. Beresford are with the University of Cambridge Computer Laboratory, Cambridge, UK. Email: [email protected], [email protected].

1.1 JSON Data Model

JSON is a popular general-purpose data encoding format, used in many databases and web services. It has similarities to XML, and we compare them in Section 3.2. The structure of a JSON document can optionally be constrained by a schema; however, for simplicity, this paper discusses only untyped JSON without an explicit schema.

A JSON document is a tree containing two types of branch node:

Map: A node whose children have no defined order, and where each child is labelled with a string key. A key uniquely identifies one of the children. We treat keys as immutable, but values as mutable, and key-value mappings can be added and removed from the map. A JSON map is also known as an object.

List: A node whose children have an order defined by the application. The list can be mutated by inserting or deleting list elements. A JSON list is also known as an array.

A child of a branch node can be either another branch node, or a leaf node. A leaf of the tree contains a primitive value (string, number, boolean, or null). We treat primitive values as immutable, but allow the value of a leaf node to be modified by treating it as a register that can be assigned a new value.

This model is sufficient to express the state of a wide range of applications. For example, a text document can be represented by a list of single-character strings; character-by-character edits are then expressed as insertions and deletions of list elements. In Section 3.1 we describe four more complex examples of using JSON to model application data.
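As a minimal, single-replica illustration of the text-document example above, the following Python sketch stores a document as a list of single-character strings and expresses edits as list insertions and deletions. The names insert_char, delete_char and render are hypothetical helpers introduced only for this illustration; no replication or conflict handling is involved.

```python
# Illustrative only: a plain Python list stands in for the JSON list type.
document = {"text": list("helo")}  # ["h", "e", "l", "o"]


def insert_char(doc, index, ch):
    """Insert a single-character string so that it ends up at position `index`."""
    if len(ch) != 1:
        raise ValueError("list elements are single-character strings")
    doc["text"].insert(index, ch)


def delete_char(doc, index):
    """Delete the list element at position `index`."""
    del doc["text"][index]


def render(doc):
    """Concatenate the list elements back into a string."""
    return "".join(doc["text"])


insert_char(document, 3, "l")  # "helo"   -> "hello"
insert_char(document, 5, "!")  # "hello"  -> "hello!"
delete_char(document, 5)       # "hello!" -> "hello"
```

Every edit here is a change to one list element, which is the granularity at which concurrent edits are later merged.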

arXiv:1608.03960v3 [cs.DC] 15 Aug 2017


1.2 Replication and Conflict Resolution

We consider systems in which a full copy of the JSON document is replicated on several devices. Those devices could be servers in datacenters, but we focus on mobile devices such as smartphones and laptops, which have intermittent network connectivity. We do not distinguish between devices owned by the same user and different users. Our model allows each device to optimistically modify its local replica of the document, and to asynchronously propagate those edits to other replicas.

Our only requirement of the network is that messages sent by one replica are eventually delivered to all other replicas, by retrying if delivery fails. We assume the network may arbitrarily delay, reorder and duplicate messages.
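One way a receiving replica might cope with such a network is sketched below in Python. This is an assumption made for illustration, not the mechanism defined in this section: we suppose that every message carries a unique operation ID and the set of IDs it causally depends on, so that duplicates can be ignored and early arrivals can be buffered. The class name Inbox and its fields are hypothetical.

```python
class Inbox:
    """Buffers incoming operations until each can be applied exactly once."""

    def __init__(self):
        self.applied = set()  # IDs of operations already applied
        self.pending = {}     # op_id -> (deps, payload), waiting for dependencies
        self.log = []         # payloads, in the order they were applied

    def receive(self, op_id, deps, payload):
        if op_id in self.applied or op_id in self.pending:
            return  # duplicate delivery: ignore it
        self.pending[op_id] = (frozenset(deps), payload)
        self._drain()

    def _drain(self):
        # Keep applying buffered operations until none of them is ready.
        progress = True
        while progress:
            progress = False
            for op_id, (deps, payload) in list(self.pending.items()):
                if deps <= self.applied:
                    del self.pending[op_id]
                    self.applied.add(op_id)
                    self.log.append(payload)
                    progress = True


inbox = Inbox()
inbox.receive("op2", {"op1"}, "insert eggs")  # arrives early, so it is buffered
inbox.receive("op1", set(), "create list")    # unblocks op2
inbox.receive("op1", set(), "create list")    # duplicate, ignored
```

In this sketch reordering is absorbed by the buffer and duplication by the ID check; eventual delivery remains the responsibility of the sender's retries.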

Our algorithm works client-side and does not depend on any server to transform or process messages. This approach allows messages to be delivered via a peer-to-peer network as well as a secure messaging protocol with end-to-end encryption [2]. The details of the network implementation and cryptographic protocols are outside of the scope of this paper.

In Section 4 we define formal semantics describing how conflicts are resolved when a JSON document is concurrently modified on different devices. Our design is based on three simple principles:

1) All replicas of the data structure should automatically converge towards the same state (a requirement known as strong eventual consistency [3]).

2) No user input should be lost due to concurrent modifications.

3) If all sequential permutations of a set of updates lead to the same state, then concurrent execution of those updates also leads to the same state [4].

1.3 Our Contributions

Our main contribution in this work is to define an algorithm and formal semantics for collaborative, concurrent editing of JSON data structures with automatic conflict resolution. Although similar algorithms have previously been defined for lists, maps and registers individually (see Section 2), to our knowledge this paper is the first to integrate all of these structures into an arbitrarily composable datatype that can be deployed on any network topology.

A key requirement of conflict resolution is that after any sequence of concurrent modifications, all replicas eventually converge towards the same state. In Section 4.4 and the appendix we prove a theorem to show that our algorithm satisfies this requirement.

Composing maps and lists into arbitrarily nested structures opens up subtle challenges that do not arise in flat structures, due to the possibility of concurrent edits at different levels of the tree. We illustrate some of those challenges by example in Section 3.1. Nested structures are an important requirement for many applications. Consequently, the long-term goal of our work is to simplify the development of applications that use optimistic replication by providing a general algorithm for conflict resolution whose details can largely be hidden inside an easy-to-use software library.

2 RELATED WORK

In this section we discuss existing approaches to optimistic replication, collaborative editing and conflict resolution.

2.1 Operational Transformation

Algorithms based on operational transformation (OT) have long been used for collaborative editing applications [5], [6], [7], [8]. Most of them treat a document as a single ordered list (of characters, for example) and do not support nested tree structures that are required by many applications. Some algorithms generalize OT to editing XML documents [9], [10], [11], which provides nesting of ordered lists, but these algorithms do not support key-value maps as defined in this paper (see Section 3.2). The performance of OT algorithms degrades rapidly as the number of concurrent operations increases [12], [13].

Most deployed OT collaboration systems, including Google Docs [14], Etherpad [15], Novell Vibe [16] and Apache Wave (formerly Google Wave [11]), rely on a single server to decide on a total ordering of operations [17], a design decision inherited from the Jupiter system [8]. This approach has the advantage of making the transformation functions simpler and less error-prone [18], but it does not meet our requirements, since we want to support peer-to-peer collaboration without requiring a single server.

Many secure messaging protocols, which we plan to use for encrypted collaboration, do not guarantee that different recipients will see messages in the same order [2]. Although it is possible to decide on a total ordering of operations without using a single server by using an atomic broadcast protocol [19], such protocols are equivalent to consensus [20], so they can only safely make progress if a quorum of participants are reachable. We expect that in peer-to-peer systems of mobile devices participants will frequently be offline, and so any algorithm requiring atomic broadcast would struggle to reach a quorum and become unavailable. Without quorums, the strongest guarantee a system can give is causal ordering [21].

The Google Realtime API [22] is to our knowledge the only implementation of OT that supports arbitrary nesting of lists and maps. Like Google Docs, it relies on a single server [17]. As a proprietary product, details of its algorithms have not been published.

2.2 CRDTs

Conflict-free replicated datatypes (CRDTs) are a family of data structures that support concurrent modification and guarantee convergence of concurrent updates. They work by attaching additional metadata to the data structure, making modification operations commutative by construction. The JSON datatype described in this paper is a CRDT.

CRDTs for registers, counters, maps, and sets are well-known [3], [23], and have been implemented in various deployed systems such as Riak [24], [25]. For ordered lists, various algorithms have been proposed, including WOOT [26], RGA [27], Treedoc [28], Logoot [29], and LSEQ [30]. Attiya et al. [31] analyze the metadata overhead of collaboratively edited lists, and provide a correctness proof of the RGA algorithm. However, none of them support nesting: all of the aforementioned algorithms assume that each of their elements is a primitive value, not another CRDT.

The problem of nesting one CRDT inside another (also known as composition or embedding) has only been studied more recently. Riak allows nesting of counters and registers inside maps, and of maps within other maps [24], [25]. Embedding counters inside maps raises questions of semantics, which have been studied by Baquero, Almeida and Lerche [32]. Almeida et al. [33] also define delta mutations for nested maps, and Baquero et al. [34] define a theoretical framework for composition of state-based CRDTs, based on lattices. None of this work integrates CRDTs for ordered lists, but the treatment of causality in these datatypes forms a basis for the semantics developed in this paper.

Burckhardt et al. [35] define cloud types, which are similar to CRDTs and can be composed. They define cloud arrays, which behave similarly to our map datatype, and entities, which are like unordered sets or relations; ordered lists are not defined in this framework.

On the other hand, Martin et al. [36] generalize Logoot [29] to support collaborative editing of XML documents – that is, a tree of nested ordered lists without nested maps. As discussed in Section 3.2, such a structure does not capture the expressiveness of JSON.

Although CRDTs for registers, maps and ordered lists have existed for years in isolation, we are not aware of any prior work that allows them all to be composed into an arbitrarily nested CRDT with a JSON-like structure.

2.3 Other Approaches

Many replicated data systems need to deal with the problem of concurrent, conflicting modifications, but the solutions are often ad-hoc. For example, in Dynamo [37] and CouchDB, if several values are concurrently written to the same key, the database preserves all of these values, and leaves conflict resolution to application code – in other words, the only datatype it supports is a multi-value register. Naively chosen merge functions often exhibit anomalies such as deleted items reappearing [37]. We believe that conflict resolution is not a simple matter that can reasonably be left to application programmers.

Another frequently-used approach to conflict resolution is last writer wins (LWW), which arbitrarily chooses one among several concurrent writes as “winner” and discards the others. LWW is used in Apache Cassandra, for example. It does not meet our requirements, since we want no user input to be lost due to concurrent modifications.

Resolving concurrent updates on tree structures has been studied in the context of file synchronization [38], [39].

Finally, systems such as Bayou [40] allow offline nodes to execute transactions tentatively, and confirm them when they are next online. This approach relies on all servers executing transactions in the same serial order, and deciding whether a transaction was successful depending on its preconditions. Bayou has the advantage of being able to express global invariants such as uniqueness constraints, which require serialization and cannot be expressed using CRDTs [41]. Bayou’s downside is that tentative transactions may be rolled back, requiring explicit handling by the application, whereas CRDTs are defined such that operations cannot fail after they have been performed on one replica.

3 COMPOSING DATA STRUCTURES

In this section we informally introduce our approach to collaborative editing of JSON data structures, and illustrate some peculiarities of concurrent nested data structures. A formal presentation of the algorithm follows in Section 4.

3.1 Concurrent Editing Examples

The sequential semantics of editing a JSON data structure are well-known, and the semantics of concurrently editing a flat map or list data structure have been thoroughly explored in the literature (see Section 2). However, when defining a CRDT for JSON data, difficulties arise due to the interactions between concurrency and nested data structures.

In the following examples we show some situations that might occur when JSON documents are concurrently modified, demonstrate how they are handled by our algorithm, and explain the rationale for our design decisions. In all examples we assume two replicas, labelled p (drawn on the left-hand side) and q (right-hand side). Local state for a replica is drawn in boxes, and modifications to local state are shown with labelled solid arrows; time runs down the page. Since replicas only mutate local state, we make communication of state changes between replicas explicit in our model. Network communication is depicted with dashed arrows.

Our first example is shown in Figure 1. In a document that maps “key” to a register with value “A”, replica p sets the value of the register to “B”, while replica q concurrently sets it to “C”. As the replicas subsequently exchange edits via network communication, they detect the conflict. Since we do not want to simply discard one of the edits, and the strings “B” and “C” cannot be meaningfully merged, the system must preserve both concurrent updates. This datatype is known as a multi-value register: although a replica can only assign a single value to the register, reading the register may return a set of multiple values that were concurrently written.

A multi-value register is hardly an impressive CRDT, since it does not actually perform any conflict resolution. We use it only for primitive values for which no automatic merge function is defined. Other CRDTs could be substituted in its place: for example, a counter CRDT for a number that can only be incremented and decremented, or an ordered list of characters for a collaboratively editable string (see also Figure 4).
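The behavior of the multi-value register in Figure 1 can be sketched in a few lines of Python. This is a simplified illustration, not the formal semantics of Section 4: it assumes that each assignment carries a unique operation ID plus the set of IDs the writer had already seen (the ones it overwrites), and that operations are delivered exactly once in causal order. The class MultiValueRegister and the operation IDs "a1", "p1" and "q1" are hypothetical.

```python
class MultiValueRegister:
    def __init__(self):
        self.values = {}  # op_id -> value, one entry per assignment not yet overwritten

    def assign(self, op_id, value):
        """Assign locally and return the operation to send to other replicas."""
        op = {"id": op_id, "value": value, "overwrites": frozenset(self.values)}
        self.apply(op)
        return op

    def apply(self, op):
        # Remove only the assignments the writer had seen; concurrent ones survive.
        for old_id in op["overwrites"]:
            self.values.pop(old_id, None)
        self.values[op["id"]] = op["value"]

    def read(self):
        return set(self.values.values())


# Scenario of Figure 1: both replicas start with the value "A".
initial = {"id": "a1", "value": "A", "overwrites": frozenset()}
replica_p, replica_q = MultiValueRegister(), MultiValueRegister()
replica_p.apply(initial)
replica_q.apply(initial)

op_b = replica_p.assign("p1", "B")  # concurrent with op_c
op_c = replica_q.assign("q1", "C")

replica_p.apply(op_c)  # network communication
replica_q.apply(op_b)
```

Both assignments overwrite "A", but neither overwrites the other, so reading the register on either replica yields both "B" and "C".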

Figure 2 gives an example of concurrent edits at different levels of the JSON tree. Here, replica p adds “red” to a map of colors, while replica q concurrently blanks out the entire map of colors and then adds “green”. Instead of assigning an empty map, q could equivalently remove the entire key “colors” from the outer map, and then assign a new empty map to that key. The difficulty in this example is that the addition of “red” occurs at a lower level of the tree, while concurrently the removal of the map of colors occurs at a higher level of the tree.

One possible way of handling such a conflict would be to let edits at higher levels of the tree always override concurrent edits within that subtree. In this case, that would mean the addition of “red” would be discarded, since it


Replica p:
  {"key": "A"}
  doc.get("key") := "B";
  {"key": "B"}

Replica q:
  {"key": "A"}
  doc.get("key") := "C";
  {"key": "C"}

Both replicas, after network communication:
  {"key": {"B", "C"}}

Fig. 1. Concurrent assignment to the register at doc.get("key") by replicas p and q.

Replica p:
  {"colors": {"blue": "#0000ff"}}
  doc.get("colors").get("red") := "#ff0000";
  {"colors": {"blue": "#0000ff", "red": "#ff0000"}}

Replica q:
  {"colors": {"blue": "#0000ff"}}
  doc.get("colors") := {};
  {"colors": {}}
  doc.get("colors").get("green") := "#00ff00";
  {"colors": {"green": "#00ff00"}}

Both replicas, after network communication:
  {"colors": {"red": "#ff0000", "green": "#00ff00"}}

Fig. 2. Modifying the contents of a nested map while concurrently the entire map is overwritten.

Replica p:
  {}
  doc.get("grocery") := [];
  {"grocery": []}
  doc.get("grocery").idx(0).insertAfter("eggs");
  {"grocery": ["eggs"]}
  doc.get("grocery").idx(1).insertAfter("ham");
  {"grocery": ["eggs", "ham"]}

Replica q:
  {}
  doc.get("grocery") := [];
  {"grocery": []}
  doc.get("grocery").idx(0).insertAfter("milk");
  {"grocery": ["milk"]}
  doc.get("grocery").idx(1).insertAfter("flour");
  {"grocery": ["milk", "flour"]}

Both replicas, after network communication:
  {"grocery": ["eggs", "ham", "milk", "flour"]}

Fig. 3. Two replicas concurrently create ordered lists under the same map key.


Replica p:
  ["a", "b", "c"]
  doc.idx(2).delete;
  ["a", "c"]
  doc.idx(1).insertAfter("x");
  ["a", "x", "c"]

Replica q:
  ["a", "b", "c"]
  doc.idx(0).insertAfter("y");
  ["y", "a", "b", "c"]
  doc.idx(2).insertAfter("z");
  ["y", "a", "z", "b", "c"]

Both replicas, after network communication:
  ["y", "a", "x", "z", "c"]

Fig. 4. Concurrent editing of an ordered list of characters (i.e., a text document).

Replica p:
  {}
  doc.get("a") := {};
  {"a": {}}
  doc.get("a").get("x") := "y";
  {"a": {"x": "y"}}

Replica q:
  {}
  doc.get("a") := [];
  {"a": []}
  doc.get("a").idx(0).insertAfter("z");
  {"a": ["z"]}

Both replicas, after network communication:
  {mapT("a"): {"x": "y"}, listT("a"): ["z"]}

Fig. 5. Concurrently assigning values of different types to the same map key.

Replica p:
  {"todo": [{"title": "buy milk", "done": false}]}
  doc.get("todo").idx(1).delete;
  {"todo": []}

Replica q:
  {"todo": [{"title": "buy milk", "done": false}]}
  doc.get("todo").idx(1).get("done") := true;
  {"todo": [{"title": "buy milk", "done": true}]}

Both replicas, after network communication:
  {"todo": [{"done": true}]}

Fig. 6. One replica removes a list element, while another concurrently updates its contents.


would be overridden by the blanking-out of the entire map of colors. However, that behavior would violate our requirement that no user input should be lost due to concurrent modifications. Instead, we define merge semantics that preserve all changes, as shown in Figure 2: “blue” must be absent from the final map, since it was removed by blanking out the map, while “red” and “green” must be present, since they were explicitly added. This behavior matches that of CRDT maps in Riak [24], [25].
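The merge behavior of Figure 2 can be approximated with the following Python sketch. It is an illustration under stated assumptions rather than the algorithm of Section 4: it models only the inner map of colors, treats the assignment of an empty map as a "clear" operation, gives every write a unique operation ID, and lets a clear remove only the writes its issuer had observed. The class ObservedRemoveMap and all operation IDs are hypothetical, and operations are assumed to arrive exactly once in causal order.

```python
class ObservedRemoveMap:
    def __init__(self):
        self.entries = {}  # key -> {op_id: value}

    def put(self, op_id, key, value):
        """Write locally and return the operation to send to other replicas."""
        observed = frozenset(self.entries.get(key, {}))
        op = ("put", op_id, key, value, observed)
        self.apply(op)
        return op

    def clear(self):
        """Blank out the map: remove every write this replica has seen so far."""
        observed = frozenset(i for slot in self.entries.values() for i in slot)
        op = ("clear", observed)
        self.apply(op)
        return op

    def apply(self, op):
        if op[0] == "put":
            _, op_id, key, value, observed = op
            slot = self.entries.setdefault(key, {})
            for old_id in observed:
                slot.pop(old_id, None)
            slot[op_id] = value
        else:
            _, observed = op
            for slot in self.entries.values():
                for old_id in observed:
                    slot.pop(old_id, None)

    def view(self):
        # A key is present only while at least one write to it survives.
        return {k: set(s.values()) for k, s in self.entries.items() if s}


# Scenario of Figure 2: both replicas start with {"blue": "#0000ff"}.
initial = ("put", "i1", "blue", "#0000ff", frozenset())
colors_p, colors_q = ObservedRemoveMap(), ObservedRemoveMap()
colors_p.apply(initial)
colors_q.apply(initial)

op_red = colors_p.put("p1", "red", "#ff0000")
op_clear = colors_q.clear()                        # has seen only "i1"
op_green = colors_q.put("q2", "green", "#00ff00")

colors_p.apply(op_clear)  # network communication
colors_p.apply(op_green)
colors_q.apply(op_red)
```

Because the clear lists only the write it observed, the concurrent addition of "red" survives while "blue" disappears. Values are returned as sets here because each key behaves like a multi-value register.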

Figure 3 illustrates another problem with maps: two replicas can concurrently insert the same map key. Here, p and q each independently create a new shopping list under the same map key “grocery”, and add items to the list. In the case of Figure 1, concurrent assignments to the same map key were left to be resolved by the application, but in Figure 3, both values are lists and so they can be merged automatically. We preserve the ordering and adjacency of items inserted at each replica, so “ham” appears after “eggs”, and “flour” appears after “milk” in the merged result. There is no information on which replica’s items should appear first in the merged result, so the algorithm can make an arbitrary choice between “eggs, ham, milk, flour” and “milk, flour, eggs, ham”, provided that all replicas end up with the items in the same order.

Figure 4 shows how a collaborative text editor can be implemented, by treating the document as a list of characters. All changes are preserved in the merged result: “y” is inserted before “a”; “x” and “z” are inserted between “a” and “c”; and “b” is deleted.
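To make the list examples concrete, here is a small Python sketch in the spirit of RGA-style list CRDTs (see Section 2.2). It is not the formal algorithm of Section 4. It assumes that every list element is identified by the ID of the operation that inserted it, that IDs are (counter, replica name) pairs compared lexicographically, that deletions leave tombstones, and that operations arrive exactly once in causal order. The class ReplicatedList, the helper functions and all IDs are hypothetical.

```python
class ReplicatedList:
    HEAD = None  # virtual position before the first element

    def __init__(self):
        self.items = []  # each item is [op_id, value, deleted]

    def _position(self, op_id):
        for i, item in enumerate(self.items):
            if item[0] == op_id:
                return i
        raise KeyError(op_id)

    def insert_after(self, prev_id, op_id, value):
        i = 0 if prev_id is self.HEAD else self._position(prev_id) + 1
        # Skip elements with a greater ID, so that concurrent insertions at the
        # same position end up in the same order on every replica.
        while i < len(self.items) and self.items[i][0] > op_id:
            i += 1
        self.items.insert(i, [op_id, value, False])

    def delete(self, op_id):
        self.items[self._position(op_id)][2] = True  # keep a tombstone

    def view(self):
        return [value for _, value, deleted in self.items if not deleted]


def initial_list():
    lst = ReplicatedList()
    lst.insert_after(ReplicatedList.HEAD, (1, "s"), "a")
    lst.insert_after((1, "s"), (2, "s"), "b")
    lst.insert_after((2, "s"), (3, "s"), "c")
    return lst


def apply_op(lst, op):
    if op[0] == "delete":
        lst.delete(op[1])
    else:
        lst.insert_after(op[1], op[2], op[3])


# Edits of Figure 4, with list positions translated into element IDs.
p_ops = [("delete", (2, "s")), ("insert", (1, "s"), (5, "p"), "x")]
q_ops = [("insert", None, (4, "q"), "y"), ("insert", (1, "s"), (5, "q"), "z")]

p_list, q_list = initial_list(), initial_list()
for op in p_ops + q_ops:  # p applies its own edits first, then q's
    apply_op(p_list, op)
for op in q_ops + p_ops:  # q applies its own edits first, then p's
    apply_op(q_list, op)
```

Both replicas converge, with "y" before "a", "b" deleted, and "x" and "z" between "a" and "c". With the tie-breaking rule assumed here, "z" lands before "x", whereas Figure 4 shows "x" before "z"; the relative order of the two concurrent insertions is an arbitrary choice, as long as every replica makes the same one.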

Figure 5 demonstrates a variant of the situation in Figure 3, where two replicas concurrently insert the same map key, but they do so with different datatypes as values: p inserts a nested map, whereas q inserts a list. These datatypes cannot be meaningfully merged, so we preserve both values separately. We do this by tagging each map key with a type annotation (mapT, listT, or regT for a map, list, or register value respectively), so each type inhabits a separate namespace.
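The effect of type-tagged keys can be illustrated with a short Python sketch. It shows only the namespace separation, not the merge algorithm itself; the helper functions type_tag and assign, and the representation of a tagged key as a (tag, key) tuple, are assumptions made for this illustration.

```python
MAP_T, LIST_T, REG_T = "mapT", "listT", "regT"


def type_tag(value):
    """Choose the namespace for a value based on its datatype."""
    if isinstance(value, dict):
        return MAP_T
    if isinstance(value, list):
        return LIST_T
    return REG_T


def assign(node, key, value):
    """Store `value` under `key`, namespaced by the type of the value."""
    node[(type_tag(value), key)] = value


# Merged result of Figure 5: p's map and q's list coexist under the key "a".
merged = {}
assign(merged, "a", {"x": "y"})  # from replica p
assign(merged, "a", ["z"])       # from replica q
```

Because the tag is part of the key, the two assignments do not collide, and an application can read either value by asking for the type it expects.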

Finally, Figure 6 shows a limitation of the principle of preserving all user input. In a to-do list application, one replica removes a to-do item from the list, while another replica concurrently marks the same item as done. As the changes are merged, the update of the map key “done” effectively causes the list item to be resurrected on replica p, leaving a to-do item without a title (since the title was deleted as part of deleting the list item). This behavior is consistent with the example in Figure 2, but it is perhaps surprising. In this case it may be more desirable to discard one of the concurrent updates, and thus preserve the implicit schema that a to-do item has both a “title” and a “done” field. We leave the analysis of developer expectations and the development of a schema language for future work.

3.2 JSON Versus XML

The most common alternative to JSON is XML, and collaborative editing of XML documents has been previously studied [9], [10], [11]. Besides the superficial syntactical differences, the tree structure of XML and JSON appears quite similar. However, there is an important difference that we should highlight.

CMD ::= let x = EXPR            x ∈ VAR
      | EXPR := v               v ∈ VAL
      | EXPR.insertAfter(v)     v ∈ VAL
      | EXPR.delete
      | yield
      | CMD; CMD

EXPR ::= doc
       | x                      x ∈ VAR
       | EXPR.get(key)          key ∈ String
       | EXPR.idx(i)            i ≥ 0
       | EXPR.keys
       | EXPR.values

VAR ::= x                       x ∈ VarString

VAL ::= n                       n ∈ Number
      | str                     str ∈ String
      | true | false | null
      | {} | []

Fig. 7. Syntax of command language for querying and modifying a document.

doc := {};
doc.get("shopping") := [];
let head = doc.get("shopping").idx(0);
head.insertAfter("eggs");
let eggs = doc.get("shopping").idx(1);
head.insertAfter("cheese");
eggs.insertAfter("milk");

// Final state:
{"shopping": ["cheese", "eggs", "milk"]}

Fig. 8. Example of programmatically constructing a JSON document.

JSON has two collection constructs that can be arbitrarily nested: maps for unordered key-value pairs, and lists for ordered sequences. In XML, the children of an element form an ordered sequence, while the attributes of an element are unordered key-value pairs. However, XML does not allow nested elements inside attributes – the value of an attribute can only be a primitive datatype. Thus, XML supports maps within lists, but not lists within maps. In this regard, XML is less expressive than JSON: the scenarios in Figures 3 and 5 cannot occur in XML.

Some applications may attach map-like semantics to the children of an XML document, for example by interpreting the child element name as key. However, this key-value structure is not part of XML itself, and would not be enforced by existing collaborative editing algorithms for XML. If multiple children with the same key are concurrently created, existing algorithms would create duplicate children with the same key rather than merging them like in Figure 3.

3.3 Document Editing API

To define the semantics for collaboratively editable data structures, we first define a simple command language that is executed locally at any of the replicas, and which allows that replica’s local copy of the document to be queried and modified. Performing read-only queries has no side-effects, but modifying the document has the effect of producing operations describing the mutation. Those operations are immediately applied to the local copy of the document, and also enqueued for asynchronous broadcasting to other replicas.

The syntax of the command language is given in Figure 7. It is not a full programming language, but rather an API through which the document state is queried and modified. We assume that the program accepts user input and issues a (possibly infinite) sequence of commands to the API. We model only the semantics of those commands, and do not assume anything about the program in which the command language is embedded. The API differs slightly from the JSON libraries found in many programming languages, in order to allow us to define consistent merge semantics.

We first explain the language informally, before giving its formal semantics. The expression construct EXPR is used to construct a cursor which identifies a position in the document. An expression starts with either the special token doc, identifying the root of the JSON document tree, or a variable x that was previously defined in a let command. The expression defines, left to right, the path the cursor takes as it navigates through the tree towards the leaves: the operator .get(key) selects a key within a map, and .idx(n) selects the nth element of an ordered list. Lists are indexed starting from 1, and .idx(0) is a special cursor indicating the head of a list (a virtual position before the first list element).

The expression construct EXPR can also query the state of the document: keys returns the set of keys in the map at the current cursor, and values returns the contents of the multi-value register at the current cursor. (values is not defined if the cursor refers to a map or list.)

A command CMD either sets the value of a local variable (let), performs network communication (yield), or modifies the document. A document can be modified by writing to a register (the operator := assigns the value of the register identified by the cursor), by inserting an element into a list (insertAfter places a new element after the existing list element identified by the cursor, and .idx(0).insertAfter inserts at the head of a list), or by deleting an element from a list or a map (delete removes the element identified by the cursor).

Figure 8 shows an example sequence of commands that constructs a new document representing a shopping list. First doc is set to {}, the empty map literal, and then the key "shopping" within that map is set to the empty list []. The third line navigates to the key "shopping" and selects the head of the list, placing the cursor in a variable called head. The list element “eggs” is inserted at the head of the list. In line 5, the variable eggs is set to a cursor pointing at the list element “eggs”. Then two more list elements are inserted: “cheese” at the head, and “milk” after “eggs”.

Note that the cursor eggs identifies the list element by identity, not by its index: after the insertion of “cheese”, the element “eggs” moves from index 1 to 2, but “milk” is nevertheless inserted after “eggs”. As we shall see later, this feature is helpful for achieving desirable semantics in the presence of concurrent modifications.
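The difference between addressing by identity and addressing by index can be illustrated with a minimal single-replica sketch in Python. The class `SketchList` and its method names are hypothetical and not part of the paper's API; element IDs are plain integers here, standing in for the operation identifiers introduced in Section 4.2.1.

```python
class SketchList:
    """Single-replica list whose elements are addressed by a stable ID."""

    HEAD = "head"  # virtual position before the first element, like .idx(0)

    def __init__(self):
        self._order = [self.HEAD]   # IDs in list order, starting with HEAD
        self._values = {}           # ID -> value
        self._next_id = 0

    def idx(self, i):
        """Return the ID (cursor) of the i-th element; idx(0) is the head."""
        return self._order[i]

    def insert_after(self, cursor, value):
        """Insert value directly after the element identified by cursor."""
        self._next_id += 1
        new_id = self._next_id
        self._values[new_id] = value
        self._order.insert(self._order.index(cursor) + 1, new_id)
        return new_id

    def to_list(self):
        return [self._values[i] for i in self._order[1:]]


shopping = SketchList()
head = shopping.idx(0)
shopping.insert_after(head, "eggs")
eggs = shopping.idx(1)                # cursor captured while "eggs" is at index 1
shopping.insert_after(head, "cheese")
shopping.insert_after(eggs, "milk")   # still lands after "eggs", now at index 2
print(shopping.to_list())             # ['cheese', 'eggs', 'milk']
```

Because `eggs` holds an ID rather than a position, the later insertion of "cheese" does not change which element it refers to.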

4 FORMAL SEMANTICS

We now explain formally how to achieve the concurrent semantics outlined in Section 3. The state of replica p is described by A_p, a finite partial function. The evaluation rules of the command language inspect and modify this local state A_p, and they do not depend on A_q (the state of any other replica q ≠ p). The only communication between replicas occurs in the evaluation of the yield command, which we discuss later. For now, we concentrate on the execution of commands at a single replica p.

4.1 Expression Evaluation

Figure 9 gives the rules for evaluating EXPR expressions in the command language, which are evaluated in the context of the local replica state A_p. The EXEC rule formalizes the assumption that commands are executed sequentially. The LET rule allows the program to define a local variable, which is added to the local state (the notation A_p[x ↦ cur] denotes a partial function that is the same as A_p, except that A_p(x) = cur). The corresponding VAR rule allows the program to retrieve the value of a previously defined variable.

The following rules in Figure 9 show how an expression is evaluated to a cursor, which unambiguously identifies a particular position in a JSON document by describing a path from the root of the document tree to some branch or leaf node. A cursor consists only of immutable keys and identifiers, so it can be sent over the network to another replica, where it can be used to locate the same position in the document.

For example,

cursor(〈mapT(doc), listT(“shopping”)〉, id1)

is a cursor representing the list element "eggs" in Figure 8, assuming that id1 is the unique identifier of the operation that inserted this list element (we will discuss these identifiers in Section 4.2.1). The cursor can be interpreted as a path through the local replica state structure A_p, read from left to right: starting from the doc map at the root, it traverses through the map entry “shopping” of type listT, and finishes with the list element that has identifier id1.

In general, cursor(〈k_1, . . . , k_{n−1}〉, k_n) consists of a (possibly empty) vector of keys 〈k_1, . . . , k_{n−1}〉, and a final key k_n which is always present. k_n can be thought of as the final element of the vector, with the distinction that it is not tagged with a datatype, whereas the elements of the vector are tagged with the datatype of the branch node being traversed, either mapT or listT.

The DOC rule in Figure 9 defines the simplest cursor cursor(〈〉, doc), referencing the root of the document using the special atom doc. The GET rule navigates a cursor to a particular key within a map. For example, the expression doc.get("shopping") evaluates to cursor(〈mapT(doc)〉, “shopping”) by applying the DOC and GET rules. Note that the expression doc.get(...) implicitly asserts that doc is of type mapT, and this assertion is encoded in the cursor.
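As a rough illustration of how the DOC and GET rules build up a cursor, the following Python sketch represents a cursor as an immutable pair of a tagged key vector and an untagged final key. The names `Cursor`, `DOC`, `get` and `list_head` are assumptions made for this sketch, and representing the head marker as the string "head" is a simplification (it would clash with a map key of the same name).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Cursor:
    keys: Tuple[Tuple[str, object], ...]  # tagged path: ("mapT", k) or ("listT", k)
    final: object                         # untagged final key k_n


# DOC rule: the simplest cursor, cursor(<>, doc)
DOC = Cursor(keys=(), final="doc")


def get(cur: Cursor, key: str) -> Cursor:
    """GET rule: tag the old final key as mapT and make `key` the new final key."""
    if cur.final == "head":
        raise ValueError("cannot apply .get to the head of a list")
    return Cursor(keys=cur.keys + (("mapT", cur.final),), final=key)


def list_head(cur: Cursor) -> Cursor:
    """First step of IDX1: tag the old final key as listT and point at the head."""
    return Cursor(keys=cur.keys + (("listT", cur.final),), final="head")


shopping = get(DOC, "shopping")
print(shopping)
# Cursor(keys=(('mapT', 'doc'),), final='shopping')
```

Since the dataclass is frozen and holds only keys and tags, a value of this type could be serialized and sent to another replica, mirroring the property of cursors stated above.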

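The rules IDX1...5 define how to evaluate the expression .idx(n), moving the cursor to a particular element of a list. IDX1 constructs a cursor representing the head of the list, and delegates to the subsequent rules to scan over the list. IDX2 recursively descends the local state according to the vector of keys in the cursor. When the vector of keys is empty, the context ctx is the subtree of A_p that stores the list in question, and the rules IDX3,4,5 iterate over that list until the desired element is found.

\[
\textsc{Exec}\quad
\frac{\mathit{cmd}_1 : \mathrm{CMD} \qquad A_p,\ \mathit{cmd}_1 \Longrightarrow A'_p}
     {A_p,\ \langle \mathit{cmd}_1 ; \mathit{cmd}_2 ; \dots \rangle \Longrightarrow A'_p,\ \langle \mathit{cmd}_2 ; \dots \rangle}
\]

\[
\textsc{Doc}\quad
\frac{}{A_p,\ \mathsf{doc} \Longrightarrow \mathrm{cursor}(\langle\rangle,\ \mathsf{doc})}
\]

\[
\textsc{Let}\quad
\frac{A_p,\ \mathit{expr} \Longrightarrow \mathit{cur}}
     {A_p,\ \mathsf{let}\ x = \mathit{expr} \Longrightarrow A_p[\, x \mapsto \mathit{cur} \,]}
\qquad
\textsc{Var}\quad
\frac{x \in \mathrm{dom}(A_p)}
     {A_p,\ x \Longrightarrow A_p(x)}
\]

\[
\textsc{Get}\quad
\frac{A_p,\ \mathit{expr} \Longrightarrow \mathrm{cursor}(\langle k_1, \dots, k_{n-1} \rangle,\ k_n) \qquad k_n \neq \mathsf{head}}
     {A_p,\ \mathit{expr}.\mathsf{get}(\mathit{key}) \Longrightarrow \mathrm{cursor}(\langle k_1, \dots, k_{n-1}, \mathsf{mapT}(k_n) \rangle,\ \mathit{key})}
\]

\[
\textsc{Idx}_1\quad
\frac{A_p,\ \mathit{expr} \Longrightarrow \mathrm{cursor}(\langle k_1, \dots, k_{n-1} \rangle,\ k_n) \qquad
      A_p,\ \mathrm{cursor}(\langle k_1, \dots, k_{n-1}, \mathsf{listT}(k_n) \rangle,\ \mathsf{head}).\mathsf{idx}(i) \Longrightarrow \mathit{cur}'}
     {A_p,\ \mathit{expr}.\mathsf{idx}(i) \Longrightarrow \mathit{cur}'}
\]

\[
\textsc{Idx}_2\quad
\frac{k_1 \in \mathrm{dom}(\mathit{ctx}) \qquad
      \mathit{ctx}(k_1),\ \mathrm{cursor}(\langle k_2, \dots, k_{n-1} \rangle,\ k_n).\mathsf{idx}(i) \Longrightarrow \mathrm{cursor}(\langle k_2, \dots, k_{n-1} \rangle,\ k'_n)}
     {\mathit{ctx},\ \mathrm{cursor}(\langle k_1, k_2, \dots, k_{n-1} \rangle,\ k_n).\mathsf{idx}(i) \Longrightarrow \mathrm{cursor}(\langle k_1, k_2, \dots, k_{n-1} \rangle,\ k'_n)}
\]

\[
\textsc{Idx}_3\quad
\frac{i > 0 \,\land\, \mathit{ctx}(\mathsf{next}(k)) = k' \,\land\, k' \neq \mathsf{tail} \qquad
      \mathit{ctx}(\mathsf{pres}(k')) \neq \{\} \qquad
      \mathit{ctx},\ \mathrm{cursor}(\langle\rangle,\ k').\mathsf{idx}(i-1) \Longrightarrow \mathit{ctx}'}
     {\mathit{ctx},\ \mathrm{cursor}(\langle\rangle,\ k).\mathsf{idx}(i) \Longrightarrow \mathit{ctx}'}
\]

\[
\textsc{Idx}_4\quad
\frac{i > 0 \,\land\, \mathit{ctx}(\mathsf{next}(k)) = k' \,\land\, k' \neq \mathsf{tail} \qquad
      \mathit{ctx}(\mathsf{pres}(k')) = \{\} \qquad
      \mathit{ctx},\ \mathrm{cursor}(\langle\rangle,\ k').\mathsf{idx}(i) \Longrightarrow \mathit{cur}'}
     {\mathit{ctx},\ \mathrm{cursor}(\langle\rangle,\ k).\mathsf{idx}(i) \Longrightarrow \mathit{cur}'}
\]

\[
\textsc{Idx}_5\quad
\frac{i = 0}
     {\mathit{ctx},\ \mathrm{cursor}(\langle\rangle,\ k).\mathsf{idx}(i) \Longrightarrow \mathrm{cursor}(\langle\rangle,\ k)}
\]

\[
\mathrm{keys}(\mathit{ctx}) = \{\, k \mid \mathsf{mapT}(k) \in \mathrm{dom}(\mathit{ctx}) \,\lor\, \mathsf{listT}(k) \in \mathrm{dom}(\mathit{ctx}) \,\lor\, \mathsf{regT}(k) \in \mathrm{dom}(\mathit{ctx}) \,\}
\]

\[
\textsc{Keys}_1\quad
\frac{A_p,\ \mathit{expr} \Longrightarrow \mathit{cur} \qquad A_p,\ \mathit{cur}.\mathsf{keys} \Longrightarrow \mathit{keys}}
     {A_p,\ \mathit{expr}.\mathsf{keys} \Longrightarrow \mathit{keys}}
\]

\[
\textsc{Keys}_2\quad
\frac{\mathit{map} = \mathit{ctx}(\mathsf{mapT}(k)) \qquad
      \mathit{keys} = \{\, k \mid k \in \mathrm{keys}(\mathit{map}) \,\land\, \mathit{map}(\mathsf{pres}(k)) \neq \{\} \,\}}
     {A_p,\ \mathrm{cursor}(\langle\rangle,\ k).\mathsf{keys} \Longrightarrow \mathit{keys}}
\]

\[
\textsc{Keys}_3\quad
\frac{k_1 \in \mathrm{dom}(\mathit{ctx}) \qquad
      \mathit{ctx}(k_1),\ \mathrm{cursor}(\langle k_2, \dots, k_{n-1} \rangle,\ k_n).\mathsf{keys} \Longrightarrow \mathit{keys}}
     {\mathit{ctx},\ \mathrm{cursor}(\langle k_1, k_2, \dots, k_{n-1} \rangle,\ k_n).\mathsf{keys} \Longrightarrow \mathit{keys}}
\]

\[
\textsc{Val}_1\quad
\frac{A_p,\ \mathit{expr} \Longrightarrow \mathit{cur} \qquad A_p,\ \mathit{cur}.\mathsf{values} \Longrightarrow \mathit{val}}
     {A_p,\ \mathit{expr}.\mathsf{values} \Longrightarrow \mathit{val}}
\]

\[
\textsc{Val}_2\quad
\frac{\mathsf{regT}(k) \in \mathrm{dom}(\mathit{ctx}) \qquad \mathit{val} = \mathrm{range}(\mathit{ctx}(\mathsf{regT}(k)))}
     {\mathit{ctx},\ \mathrm{cursor}(\langle\rangle,\ k).\mathsf{values} \Longrightarrow \mathit{val}}
\]

\[
\textsc{Val}_3\quad
\frac{k_1 \in \mathrm{dom}(\mathit{ctx}) \qquad
      \mathit{ctx}(k_1),\ \mathrm{cursor}(\langle k_2, \dots, k_{n-1} \rangle,\ k_n).\mathsf{values} \Longrightarrow \mathit{val}}
     {\mathit{ctx},\ \mathrm{cursor}(\langle k_1, k_2, \dots, k_{n-1} \rangle,\ k_n).\mathsf{values} \Longrightarrow \mathit{val}}
\]

Fig. 9. Rules for evaluating expressions.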

IDX5 terminates the iteration when the index reaches zero, while IDX3 moves to the next element and decrements the index, and IDX4 skips over list elements that are marked as deleted. The structure resembles a linked list: each list element has a unique identifier k, and the partial function representing local state maps next(k) to the ID of the list element that follows k.

Deleted elements are never removed from the linked list structure, but only marked as deleted (they become so-called tombstones). To this end, the local state maintains a presence set pres(k) for the list element with ID k, which is the set of all operations that have asserted the existence of this list element. When a list element is deleted, the presence set is set to the empty set, which marks it as deleted; however, a concurrent operation that references the list element can cause the presence set to become non-empty again (leading to the situations in Figures 2 and 6). Rule IDX4 handles list elements with an empty presence set by moving to the next list element without decrementing the index (i.e., not counting them as list elements).
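The list traversal described by IDX3, IDX4 and IDX5 can be sketched iteratively in Python. This is an illustration under assumptions rather than the paper's implementation: the context is modelled as a plain dict whose keys are the pairs `("next", k)` and `("pres", k)`, and the function name `idx` and the use of `IndexError` for the stuck case are choices made for this sketch.

```python
HEAD, TAIL = "head", "tail"


def idx(ctx, i, k=HEAD):
    """Return the ID of the i-th non-deleted element after k."""
    while i > 0:                         # IDX5: stop once the index reaches 0
        nxt = ctx[("next", k)]
        if nxt == TAIL:
            # The formal rules get stuck here; a runtime error stands in for that.
            raise IndexError("index past end of list")
        if ctx.get(("pres", nxt), set()):
            i -= 1                       # IDX3: visible element, count it
        # IDX4: empty presence set (tombstone), move on without counting
        k = nxt
    return k


# A three-element list a, b, c in which b has been deleted (empty presence set).
ctx = {
    ("next", HEAD): "a", ("next", "a"): "b",
    ("next", "b"): "c", ("next", "c"): TAIL,
    ("pres", "a"): {"op1"}, ("pres", "b"): set(), ("pres", "c"): {"op3"},
}
print(idx(ctx, 1))  # a
print(idx(ctx, 2))  # c  (the tombstone b is skipped)
```

The tombstone "b" remains in the linked structure, so a cursor that refers to it stays meaningful even though index-based access no longer counts it.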

The KEYS1,2,3 rules allow the application to inspect the set of keys in a map. This set is determined by examining the local state, and excluding any keys for which the presence set is empty (indicating that the key has been deleted).


Finally, the VAL1,2,3 rules allow the application to read the contents of a register at a particular cursor position, using a similar recursive rule structure as the IDX rules. A register is expressed using the regT type annotation in the local state. Although a replica can only assign a single value to a register, a register can nevertheless contain multiple values if multiple replicas concurrently assign values to it.

4.2 Generating Operations

When commands mutate the state of the document, they generate operations that describe the mutation. In our semantics, a command never directly modifies the local replica state A_p, but only generates an operation. That operation is then immediately applied to A_p so that it takes effect locally, and the same operation is also asynchronously broadcast to the other replicas.

4.2.1 Lamport Timestamps

Every operation in our model is given a unique identifier, which is used in the local state and in cursors. Whenever an element is inserted into a list, or a value is assigned to a register, the new list element or register value is identified by the identifier of the operation that created it.

In order to generate globally unique operation identifiers without requiring synchronous coordination between replicas we use Lamport timestamps [42]. A Lamport timestamp is a pair (c, p) where p ∈ ReplicaID is the unique identifier of the replica on which the edit is made (for example, a hash of its public key), and c ∈ ℕ is a counter that is stored at each replica and incremented for every operation. Since each replica generates a strictly monotonically increasing sequence of counter values c, the pair (c, p) is unique.

If a replica receives an operation with a counter value c that is greater than the locally stored counter value, the local counter is increased to the value of the incoming counter. This ensures that if operation o_1 causally happened before o_2 (that is, the replica that generated o_2 had received and processed o_1 before o_2 was generated), then o_2 must have a greater counter value than o_1. Only concurrent operations can have equal counter values.

We can thus define a total ordering < for Lamport timestamps:

\[
(c_1, p_1) < (c_2, p_2) \quad\text{iff}\quad (c_1 < c_2) \,\lor\, (c_1 = c_2 \,\land\, p_1 < p_2).
\]

If one operation happened before another, this ordering is consistent with causality (the earlier operation has a lower timestamp). If two operations are concurrent, their order according to < is arbitrary but deterministic. This ordering property is important for our definition of the semantics of ordered lists.
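A minimal Python sketch of such a clock is given below. The class name `LamportClock` and its methods `tick` and `observe` are hypothetical; replica IDs are assumed to be strings. Python compares tuples lexicographically, which coincides with the ordering < defined above, so timestamps are represented directly as `(c, p)` tuples.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    replica_id: str
    counter: int = 0

    def tick(self):
        """Generate the timestamp (c, p) for a new local operation."""
        self.counter += 1
        return (self.counter, self.replica_id)

    def observe(self, timestamp):
        """On receiving an operation, advance the local counter if it is behind."""
        c, _ = timestamp
        self.counter = max(self.counter, c)


a, b = LamportClock("a"), LamportClock("b")
t1 = a.tick()      # (1, 'a')
t2 = b.tick()      # (1, 'b'), concurrent with t1
b.observe(t1)      # b has now processed t1
t3 = b.tick()      # (2, 'b'), causally after t1 and t2
print(t1 < t2 < t3)  # True
```

Here t1 and t2 have equal counters because they are concurrent, and the replica ID breaks the tie; t3 has a strictly greater counter than both operations that causally precede it.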

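4.2.2 Operation Structure

An operation is a tuple of the form

\[
\begin{array}{l}
\mathrm{op}( \\
\quad \mathit{id} : \mathbb{N} \times \mathrm{ReplicaID}, \\
\quad \mathit{deps} : \mathcal{P}(\mathbb{N} \times \mathrm{ReplicaID}), \\
\quad \mathit{cur} : \mathrm{cursor}(\langle k_1, \dots, k_{n-1} \rangle,\ k_n), \\
\quad \mathit{mut} : \mathrm{insert}(v) \mid \mathrm{delete} \mid \mathrm{assign}(v) \qquad v : \mathrm{VAL} \\
)
\end{array}
\]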

where id is the Lamport timestamp that uniquely identifies the operation, cur is the cursor describing the position in the document being modified, and mut is the mutation that was requested at the specified position.

deps is the set of causal dependencies of the operation. It is defined as follows: if operation o_2 was generated by replica p, then a causal dependency of o_2 is any operation o_1 that had already been applied on p at the time when o_2 was generated. In this paper, we define deps as the set of Lamport timestamps of all causal dependencies. In a real implementation, this set would become impracticably large, so a compact representation of causal history would be used instead – for example, version vectors [43], state vectors [5], or dotted version vectors [44]. However, to avoid ambiguity in our semantics we give the dependencies as a simple set of operation IDs.

The purpose of the causal dependencies deps is to impose a partial ordering on operations: an operation can only be applied after all operations that “happened before” it have been applied. In particular, this means that the sequence of operations generated at one particular replica will be applied in the same order at every other replica. Operations that are concurrent (i.e., where there is no causal dependency in either direction) can be applied in any order.

4.2.3 Semantics of Generating Operations

The evaluation rules for commands are given in Figure 10. The MAKE-ASSIGN, MAKE-INSERT and MAKE-DELETE rules define how these respective commands mutate the document: all three delegate to the MAKE-OP rule to generate and apply the operation. MAKE-OP generates a new Lamport timestamp by choosing a counter value that is 1 greater than any existing counter in A_p(ops), the set of all operation IDs that have been applied to replica p.

MAKE-OP constructs an op() tuple of the form described above, and delegates to the APPLY-LOCAL rule to process the operation. APPLY-LOCAL does three things: it evaluates the operation to produce a modified local state A′_p, it adds the operation to the queue of generated operations A_p(queue), and it adds the ID to the set of processed operations A_p(ops).
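The interplay of MAKE-OP and APPLY-LOCAL can be sketched in Python as follows. The replica state is modelled as a dict, and the names `Op`, `new_replica`, `make_op` and `apply_local` are assumptions made for this sketch; evaluating the operation against the document is replaced by a placeholder that only records it.

```python
from collections import namedtuple

Op = namedtuple("Op", ["id", "deps", "cur", "mut"])


def new_replica(replica_id):
    return {"id": replica_id, "ops": set(), "queue": [], "applied": []}


def apply_local(state, op):
    """APPLY-LOCAL: apply the operation, enqueue it for sending, record its ID."""
    state["applied"].append(op)   # placeholder for evaluating op against the document
    state["queue"].append(op)
    state["ops"].add(op.id)


def make_op(state, cur, mut):
    """MAKE-OP: choose a counter one greater than any seen so far, then apply."""
    ctr = max({0} | {c for (c, _p) in state["ops"]})
    op = Op(id=(ctr + 1, state["id"]),
            deps=frozenset(state["ops"]),   # everything applied so far
            cur=cur, mut=mut)
    apply_local(state, op)
    return op


p = new_replica("p")
o1 = make_op(p, "cursor-1", ("assign", 1))
o2 = make_op(p, "cursor-2", ("delete",))
print(o1.id, o2.id, sorted(o2.deps))  # (1, 'p') (2, 'p') [(1, 'p')]
```

Taking the maximum over all IDs in `ops`, rather than over locally generated IDs only, is what makes the counter advance past operations received from other replicas.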

The yield command, inspired by Burckhardt et al. [35], performs network communication: sending and receiving operations to and from other replicas, and applying operations from remote replicas. The rules APPLY-REMOTE, SEND, RECV and YIELD define the semantics of yield. Since any of these rules can be used to evaluate yield, their effect is nondeterministic, which models the asynchronicity of the network: a message sent by one replica arrives at another replica at some arbitrarily later point in time, and there is no message ordering guarantee in the network.

The SEND rule takes any operations that were placed in A_p(queue) by APPLY-LOCAL and adds them to a send buffer A_p(send). Correspondingly, the RECV rule takes operations in the send buffer of replica q and adds them to the receive buffer A_p(recv) of replica p. This is the only rule that involves more than one replica, and it models all network communication.

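Once an operation appears in the receive buffer A_p(recv), the rule APPLY-REMOTE may apply. Under the preconditions that the operation has not already been processed and that its causal dependencies are satisfied, APPLY-REMOTE applies the operation in the same way as APPLY-LOCAL, and adds its ID to the set of processed operations A_p(ops).

\[
\textsc{Make-Assign}\quad
\frac{A_p,\ \mathit{expr} \Longrightarrow \mathit{cur} \qquad \mathit{val} : \mathrm{VAL} \qquad
      A_p,\ \mathrm{makeOp}(\mathit{cur}, \mathrm{assign}(\mathit{val})) \Longrightarrow A'_p}
     {A_p,\ \mathit{expr} := \mathit{val} \Longrightarrow A'_p}
\]

\[
\textsc{Make-Insert}\quad
\frac{A_p,\ \mathit{expr} \Longrightarrow \mathit{cur} \qquad \mathit{val} : \mathrm{VAL} \qquad
      A_p,\ \mathrm{makeOp}(\mathit{cur}, \mathrm{insert}(\mathit{val})) \Longrightarrow A'_p}
     {A_p,\ \mathit{expr}.\mathsf{insertAfter}(\mathit{val}) \Longrightarrow A'_p}
\]

\[
\textsc{Make-Delete}\quad
\frac{A_p,\ \mathit{expr} \Longrightarrow \mathit{cur} \qquad
      A_p,\ \mathrm{makeOp}(\mathit{cur}, \mathrm{delete}) \Longrightarrow A'_p}
     {A_p,\ \mathit{expr}.\mathsf{delete} \Longrightarrow A'_p}
\]

\[
\textsc{Make-Op}\quad
\frac{\mathit{ctr} = \max(\{0\} \cup \{\, c_i \mid (c_i, p_i) \in A_p(\mathsf{ops}) \,\}) \qquad
      A_p,\ \mathrm{apply}(\mathrm{op}((\mathit{ctr} + 1, p),\ A_p(\mathsf{ops}),\ \mathit{cur},\ \mathit{mut})) \Longrightarrow A'_p}
     {A_p,\ \mathrm{makeOp}(\mathit{cur}, \mathit{mut}) \Longrightarrow A'_p}
\]

\[
\textsc{Apply-Local}\quad
\frac{A_p,\ \mathit{op} \Longrightarrow A'_p}
     {A_p,\ \mathrm{apply}(\mathit{op}) \Longrightarrow A'_p[\, \mathsf{queue} \mapsto A'_p(\mathsf{queue}) \cup \{\mathit{op}\},\ \mathsf{ops} \mapsto A'_p(\mathsf{ops}) \cup \{\mathit{op.id}\} \,]}
\]

\[
\textsc{Apply-Remote}\quad
\frac{\mathit{op} \in A_p(\mathsf{recv}) \qquad \mathit{op.id} \notin A_p(\mathsf{ops}) \qquad
      \mathit{op.deps} \subseteq A_p(\mathsf{ops}) \qquad A_p,\ \mathit{op} \Longrightarrow A'_p}
     {A_p,\ \mathsf{yield} \Longrightarrow A'_p[\, \mathsf{ops} \mapsto A'_p(\mathsf{ops}) \cup \{\mathit{op.id}\} \,]}
\]

\[
\textsc{Send}\quad
\frac{}{A_p,\ \mathsf{yield} \Longrightarrow A_p[\, \mathsf{send} \mapsto A_p(\mathsf{send}) \cup A_p(\mathsf{queue}) \,]}
\]

\[
\textsc{Recv}\quad
\frac{q : \mathrm{ReplicaID}}
     {A_p,\ \mathsf{yield} \Longrightarrow A_p[\, \mathsf{recv} \mapsto A_p(\mathsf{recv}) \cup A_q(\mathsf{send}) \,]}
\]

\[
\textsc{Yield}\quad
\frac{A_p,\ \mathsf{yield} \Longrightarrow A'_p \qquad A'_p,\ \mathsf{yield} \Longrightarrow A''_p}
     {A_p,\ \mathsf{yield} \Longrightarrow A''_p}
\]

Fig. 10. Rules for generating, sending, and receiving operations.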

The actual document modifications are performed by applying the operations, which we discuss next.
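The two preconditions of APPLY-REMOTE (the operation is new, and its causal dependencies have been applied) can be sketched as a delivery loop in Python. The function `deliver_ready` and the reduced `Op` tuple are hypothetical; applying the mutation itself is replaced by appending the ID to a history list.

```python
from collections import namedtuple

Op = namedtuple("Op", ["id", "deps"])   # cursor and mutation omitted for brevity


def deliver_ready(ops_applied, recv, history):
    """Apply buffered operations whose dependencies are met, until none is ready."""
    progress = True
    while progress:
        progress = False
        for op in list(recv):
            if op.id in ops_applied:
                continue                    # already processed
            if op.deps <= ops_applied:      # causal dependencies satisfied
                history.append(op.id)       # placeholder for applying the mutation
                ops_applied.add(op.id)
                progress = True


o1 = Op((1, "q"), frozenset())
o2 = Op((2, "q"), frozenset({(1, "q")}))

ops, history = set(), []
deliver_ready(ops, {o2}, history)       # o2 arrives first: it must wait for o1
print(history)                          # []
deliver_ready(ops, {o1, o2}, history)   # once o1 is available, both are applied
print(history)                          # [(1, 'q'), (2, 'q')]
```

An operation that arrives before its dependencies simply stays in the receive buffer, which is how the semantics tolerates a network without ordering guarantees.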

4.3 Applying Operations

Figure 11 gives the rules that apply an operation op to a context ctx, producing an updated context ctx′. The context is initially the replica state A_p, but may refer to subtrees of the state as rules are recursively applied. These rules are used by APPLY-LOCAL and APPLY-REMOTE to perform the state updates on a document.

When the operation cursor’s vector of keys is non-empty, the DESCEND rule first applies. It recursively descends the document tree by following the path described by the keys. If the tree node already exists in the local replica state, CHILD-GET finds it, otherwise CHILD-MAP and CHILD-LIST create an empty map or list respectively.

The DESCEND rule also invokes ADD-ID1,2 at each tree node along the path described by the cursor, adding the operation ID to the presence set pres(k) to indicate that the subtree includes a mutation made by this operation.

The remaining rules in Figure 11 apply when the vector of keys in the cursor is empty, which is the case once the rules have descended to the context of the tree node to which the mutation applies. The ASSIGN rule handles assignment of a primitive value to a register, EMPTY-MAP handles assignment where the value is the empty map literal {}, and EMPTY-LIST handles assignment of the empty list []. These three rules for assign have a similar structure: first clearing the prior value at the cursor (as discussed in the next section), then adding the operation ID to the presence set, and finally incorporating the new value into the tree of local state.

The INSERT1,2 rules handle insertion of a new element into an ordered list. In this case, the cursor refers to the list element prev, and the new element is inserted after that position in the list. INSERT1 performs the insertion by manipulating the linked list structure. INSERT2 handles the case of multiple replicas concurrently inserting list elements at the same position, and uses the ordering relation < on Lamport timestamps to consistently determine the insertion point. Our approach for handling insertions is based on the RGA algorithm [27]. We show later that these rules ensure all replicas converge towards the same state.
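The effect of INSERT1 and INSERT2 on the linked list can be sketched in Python as below. This is a simplified illustration, not the paper's implementation: presence sets, nesting and causal delivery checks are left out, the list is a dict with hypothetical fields `next` and `value`, and operation IDs are Lamport timestamps represented as `(counter, replica)` tuples.

```python
HEAD, TAIL = "head", "tail"


def new_list():
    return {"next": {HEAD: TAIL}, "value": {}}


def insert(lst, prev, op_id, value):
    """Insert value with ID op_id after element prev."""
    nxt = lst["next"][prev]
    while nxt != TAIL and op_id < nxt:   # INSERT2: skip past elements with greater IDs
        prev, nxt = nxt, lst["next"][nxt]
    lst["next"][prev] = op_id            # INSERT1: splice in between prev and nxt
    lst["next"][op_id] = nxt
    lst["value"][op_id] = value


def to_list(lst):
    out, k = [], lst["next"][HEAD]
    while k != TAIL:
        out.append(lst["value"][k])
        k = lst["next"][k]
    return out


# Two replicas concurrently insert at the head; each applies the operations
# in a different order.
a, b = (1, "p"), (1, "q")
r1, r2 = new_list(), new_list()
insert(r1, HEAD, a, "x"); insert(r1, HEAD, b, "y")
insert(r2, HEAD, b, "y"); insert(r2, HEAD, a, "x")
print(to_list(r1), to_list(r2))  # ['y', 'x'] ['y', 'x']
```

Both replicas place the element with the greater timestamp first, so the order in which the two concurrent insertions arrive does not affect the resulting list.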

4.3.1 Clearing Prior State

Assignment and deletion operations require that prior state (the value being overwritten or deleted) is cleared, while also ensuring that concurrent modifications are not lost, as illustrated in Figure 2. The rules to handle this clearing process are given in Figure 12. Intuitively, the effect of clearing something is to reset it to its empty state by undoing any operations that causally precede the current operation, while leaving the effect of any concurrent operations untouched.

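A delete operation can be used to delete either an element from an ordered list or a key from a map, depending on what the cursor refers to. The DELETE rule shows how this operation is evaluated by delegating to CLEAR-ELEM. In turn, CLEAR-ELEM uses CLEAR-ANY to clear out any data with a given key, regardless of whether it is of type mapT, listT or regT, and also updates the presence set to include any nested operation IDs, but exclude any operations in deps.

\[
\textsc{Descend}\quad
\frac{\mathit{ctx},\ k_1 \Longrightarrow \mathit{child} \qquad
      \mathit{child},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle k_2, \dots, k_{n-1} \rangle, k_n), \mathit{mut}) \Longrightarrow \mathit{child}' \qquad
      \mathit{ctx},\ \mathrm{addId}(k_1, \mathit{id}, \mathit{mut}) \Longrightarrow \mathit{ctx}'}
     {\mathit{ctx},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle k_1, k_2, \dots, k_{n-1} \rangle, k_n), \mathit{mut}) \Longrightarrow \mathit{ctx}'[\, k_1 \mapsto \mathit{child}' \,]}
\]

\[
\textsc{Child-Get}\quad
\frac{k \in \mathrm{dom}(\mathit{ctx})}
     {\mathit{ctx},\ k \Longrightarrow \mathit{ctx}(k)}
\qquad
\textsc{Child-Map}\quad
\frac{\mathsf{mapT}(k) \notin \mathrm{dom}(\mathit{ctx})}
     {\mathit{ctx},\ \mathsf{mapT}(k) \Longrightarrow \{\}}
\]

\[
\textsc{Child-List}\quad
\frac{\mathsf{listT}(k) \notin \mathrm{dom}(\mathit{ctx})}
     {\mathit{ctx},\ \mathsf{listT}(k) \Longrightarrow \{\, \mathsf{next}(\mathsf{head}) \mapsto \mathsf{tail} \,\}}
\qquad
\textsc{Child-Reg}\quad
\frac{\mathsf{regT}(k) \notin \mathrm{dom}(\mathit{ctx})}
     {\mathit{ctx},\ \mathsf{regT}(k) \Longrightarrow \{\}}
\]

\[
\textsc{Presence}_1\quad
\frac{\mathsf{pres}(k) \in \mathrm{dom}(\mathit{ctx})}
     {\mathit{ctx},\ \mathsf{pres}(k) \Longrightarrow \mathit{ctx}(\mathsf{pres}(k))}
\qquad
\textsc{Presence}_2\quad
\frac{\mathsf{pres}(k) \notin \mathrm{dom}(\mathit{ctx})}
     {\mathit{ctx},\ \mathsf{pres}(k) \Longrightarrow \{\}}
\]

\[
\textsc{Add-Id}_1\quad
\frac{\mathit{mut} \neq \mathrm{delete} \qquad
      k_{\mathit{tag}} \in \{ \mathsf{mapT}(k), \mathsf{listT}(k), \mathsf{regT}(k) \} \qquad
      \mathit{ctx},\ \mathsf{pres}(k) \Longrightarrow \mathit{pres}}
     {\mathit{ctx},\ \mathrm{addId}(k_{\mathit{tag}}, \mathit{id}, \mathit{mut}) \Longrightarrow \mathit{ctx}[\, \mathsf{pres}(k) \mapsto \mathit{pres} \cup \{\mathit{id}\} \,]}
\]

\[
\textsc{Add-Id}_2\quad
\frac{\mathit{mut} = \mathrm{delete}}
     {\mathit{ctx},\ \mathrm{addId}(k_{\mathit{tag}}, \mathit{id}, \mathit{mut}) \Longrightarrow \mathit{ctx}}
\]

\[
\textsc{Assign}\quad
\frac{\mathit{val} \neq [\,] \,\land\, \mathit{val} \neq \{\} \qquad
      \mathit{ctx},\ \mathrm{clear}(\mathit{deps}, \mathsf{regT}(k)) \Longrightarrow \mathit{ctx}', \mathit{pres} \qquad
      \mathit{ctx}',\ \mathrm{addId}(\mathsf{regT}(k), \mathit{id}, \mathrm{assign}(\mathit{val})) \Longrightarrow \mathit{ctx}'' \qquad
      \mathit{ctx}'',\ \mathsf{regT}(k) \Longrightarrow \mathit{child}}
     {\mathit{ctx},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle\rangle, k), \mathrm{assign}(\mathit{val})) \Longrightarrow \mathit{ctx}''[\, \mathsf{regT}(k) \mapsto \mathit{child}[\, \mathit{id} \mapsto \mathit{val} \,] \,]}
\]

\[
\textsc{Empty-Map}\quad
\frac{\mathit{val} = \{\} \qquad
      \mathit{ctx},\ \mathrm{clearElem}(\mathit{deps}, k) \Longrightarrow \mathit{ctx}', \mathit{pres} \qquad
      \mathit{ctx}',\ \mathrm{addId}(\mathsf{mapT}(k), \mathit{id}, \mathrm{assign}(\mathit{val})) \Longrightarrow \mathit{ctx}'' \qquad
      \mathit{ctx}'',\ \mathsf{mapT}(k) \Longrightarrow \mathit{child}}
     {\mathit{ctx},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle\rangle, k), \mathrm{assign}(\mathit{val})) \Longrightarrow \mathit{ctx}''[\, \mathsf{mapT}(k) \mapsto \mathit{child} \,]}
\]

\[
\textsc{Empty-List}\quad
\frac{\mathit{val} = [\,] \qquad
      \mathit{ctx},\ \mathrm{clearElem}(\mathit{deps}, k) \Longrightarrow \mathit{ctx}', \mathit{pres} \qquad
      \mathit{ctx}',\ \mathrm{addId}(\mathsf{listT}(k), \mathit{id}, \mathrm{assign}(\mathit{val})) \Longrightarrow \mathit{ctx}'' \qquad
      \mathit{ctx}'',\ \mathsf{listT}(k) \Longrightarrow \mathit{child}}
     {\mathit{ctx},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle\rangle, k), \mathrm{assign}(\mathit{val})) \Longrightarrow \mathit{ctx}''[\, \mathsf{listT}(k) \mapsto \mathit{child} \,]}
\]

\[
\textsc{Insert}_1\quad
\frac{\mathit{ctx}(\mathsf{next}(\mathit{prev})) = \mathit{next} \qquad
      \mathit{next} < \mathit{id} \,\lor\, \mathit{next} = \mathsf{tail} \qquad
      \mathit{ctx},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle\rangle, \mathit{id}), \mathrm{assign}(\mathit{val})) \Longrightarrow \mathit{ctx}'}
     {\mathit{ctx},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle\rangle, \mathit{prev}), \mathrm{insert}(\mathit{val})) \Longrightarrow \mathit{ctx}'[\, \mathsf{next}(\mathit{prev}) \mapsto \mathit{id},\ \mathsf{next}(\mathit{id}) \mapsto \mathit{next} \,]}
\]

\[
\textsc{Insert}_2\quad
\frac{\mathit{ctx}(\mathsf{next}(\mathit{prev})) = \mathit{next} \qquad
      \mathit{id} < \mathit{next} \qquad
      \mathit{ctx},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle\rangle, \mathit{next}), \mathrm{insert}(\mathit{val})) \Longrightarrow \mathit{ctx}'}
     {\mathit{ctx},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle\rangle, \mathit{prev}), \mathrm{insert}(\mathit{val})) \Longrightarrow \mathit{ctx}'}
\]

Fig. 11. Rules for applying insertion and assignment operations to update the state of a replica.

\[
\textsc{Delete}\quad
\frac{\mathit{ctx},\ \mathrm{clearElem}(\mathit{deps}, k) \Longrightarrow \mathit{ctx}', \mathit{pres}}
     {\mathit{ctx},\ \mathrm{op}(\mathit{id}, \mathit{deps}, \mathrm{cursor}(\langle\rangle, k), \mathrm{delete}) \Longrightarrow \mathit{ctx}'}
\]

\[
\textsc{Clear-Elem}\quad
\frac{\mathit{ctx},\ \mathrm{clearAny}(\mathit{deps}, k) \Longrightarrow \mathit{ctx}', \mathit{pres}_1 \qquad
      \mathit{ctx}',\ \mathsf{pres}(k) \Longrightarrow \mathit{pres}_2 \qquad
      \mathit{pres}_3 = \mathit{pres}_1 \cup \mathit{pres}_2 \setminus \mathit{deps}}
     {\mathit{ctx},\ \mathrm{clearElem}(\mathit{deps}, k) \Longrightarrow \mathit{ctx}'[\, \mathsf{pres}(k) \mapsto \mathit{pres}_3 \,],\ \mathit{pres}_3}
\]

\[
\textsc{Clear-Any}\quad
\frac{\mathit{ctx},\ \mathrm{clear}(\mathit{deps}, \mathsf{mapT}(k)) \Longrightarrow \mathit{ctx}_1, \mathit{pres}_1 \qquad
      \mathit{ctx}_1,\ \mathrm{clear}(\mathit{deps}, \mathsf{listT}(k)) \Longrightarrow \mathit{ctx}_2, \mathit{pres}_2 \qquad
      \mathit{ctx}_2,\ \mathrm{clear}(\mathit{deps}, \mathsf{regT}(k)) \Longrightarrow \mathit{ctx}_3, \mathit{pres}_3}
     {\mathit{ctx},\ \mathrm{clearAny}(\mathit{deps}, k) \Longrightarrow \mathit{ctx}_3,\ \mathit{pres}_1 \cup \mathit{pres}_2 \cup \mathit{pres}_3}
\]

\[
\textsc{Clear-None}\quad
\frac{k \notin \mathrm{dom}(\mathit{ctx})}
     {\mathit{ctx},\ \mathrm{clear}(\mathit{deps}, k) \Longrightarrow \mathit{ctx},\ \{\}}
\]

\[
\textsc{Clear-Reg}\quad
\frac{\mathsf{regT}(k) \in \mathrm{dom}(\mathit{ctx}) \qquad
      \mathit{concurrent} = \{\, \mathit{id} \mapsto v \mid (\mathit{id} \mapsto v) \in \mathit{ctx}(\mathsf{regT}(k)) \,\land\, \mathit{id} \notin \mathit{deps} \,\}}
     {\mathit{ctx},\ \mathrm{clear}(\mathit{deps}, \mathsf{regT}(k)) \Longrightarrow \mathit{ctx}[\, \mathsf{regT}(k) \mapsto \mathit{concurrent} \,],\ \mathrm{dom}(\mathit{concurrent})}
\]

\[
\textsc{Clear-Map}_1\quad
\frac{\mathsf{mapT}(k) \in \mathrm{dom}(\mathit{ctx}) \qquad
      \mathit{ctx}(\mathsf{mapT}(k)),\ \mathrm{clearMap}(\mathit{deps}, \{\}) \Longrightarrow \mathit{cleared}, \mathit{pres}}
     {\mathit{ctx},\ \mathrm{clear}(\mathit{deps}, \mathsf{mapT}(k)) \Longrightarrow \mathit{ctx}[\, \mathsf{mapT}(k) \mapsto \mathit{cleared} \,],\ \mathit{pres}}
\]

\[
\textsc{Clear-Map}_2\quad
\frac{k \in \mathrm{keys}(\mathit{ctx}) \,\land\, k \notin \mathit{done} \qquad
      \mathit{ctx},\ \mathrm{clearElem}(\mathit{deps}, k) \Longrightarrow \mathit{ctx}', \mathit{pres}_1 \qquad
      \mathit{ctx}',\ \mathrm{clearMap}(\mathit{deps}, \mathit{done} \cup \{k\}) \Longrightarrow \mathit{ctx}'', \mathit{pres}_2}
     {\mathit{ctx},\ \mathrm{clearMap}(\mathit{deps}, \mathit{done}) \Longrightarrow \mathit{ctx}'',\ \mathit{pres}_1 \cup \mathit{pres}_2}
\]

\[
\textsc{Clear-Map}_3\quad
\frac{\mathit{done} = \mathrm{keys}(\mathit{ctx})}
     {\mathit{ctx},\ \mathrm{clearMap}(\mathit{deps}, \mathit{done}) \Longrightarrow \mathit{ctx},\ \{\}}
\]

\[
\textsc{Clear-List}_1\quad
\frac{\mathsf{listT}(k) \in \mathrm{dom}(\mathit{ctx}) \qquad
      \mathit{ctx}(\mathsf{listT}(k)),\ \mathrm{clearList}(\mathit{deps}, \mathsf{head}) \Longrightarrow \mathit{cleared}, \mathit{pres}}
     {\mathit{ctx},\ \mathrm{clear}(\mathit{deps}, \mathsf{listT}(k)) \Longrightarrow \mathit{ctx}[\, \mathsf{listT}(k) \mapsto \mathit{cleared} \,],\ \mathit{pres}}
\]

\[
\textsc{Clear-List}_2\quad
\frac{k \neq \mathsf{tail} \,\land\, \mathit{ctx}(\mathsf{next}(k)) = \mathit{next} \qquad
      \mathit{ctx},\ \mathrm{clearElem}(\mathit{deps}, k) \Longrightarrow \mathit{ctx}', \mathit{pres}_1 \qquad
      \mathit{ctx}',\ \mathrm{clearList}(\mathit{deps}, \mathit{next}) \Longrightarrow \mathit{ctx}'', \mathit{pres}_2}
     {\mathit{ctx},\ \mathrm{clearList}(\mathit{deps}, k) \Longrightarrow \mathit{ctx}'',\ \mathit{pres}_1 \cup \mathit{pres}_2}
\]

\[
\textsc{Clear-List}_3\quad
\frac{k = \mathsf{tail}}
     {\mathit{ctx},\ \mathrm{clearList}(\mathit{deps}, k) \Longrightarrow \mathit{ctx},\ \{\}}
\]

Fig. 12. Rules for applying deletion operations to update the state of a replica.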

The premises of CLEAR-ANY are satisfied by CLEAR-MAP1, CLEAR-LIST1 and CLEAR-REG if the respective key appears in ctx, or by CLEAR-NONE (which does nothing) if the key is absent.

As defined by the ASSIGN rule, a register maintains a mapping from operation IDs to values. CLEAR-REG updates a register by removing all operation IDs that appear in deps (i.e., which causally precede the clearing operation), but retaining all operation IDs that do not appear in deps (from assignment operations that are concurrent with the clearing operation).
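The register behaviour described here can be sketched in Python, modelling a register as a dict from operation ID to value. The function names `clear_reg` and `assign` are hypothetical, and presence sets are omitted; the sketch only shows how values from concurrent assignments survive a clear while causally preceding values are removed.

```python
def clear_reg(reg, deps):
    """CLEAR-REG: keep only entries whose operation ID is not in deps."""
    return {op_id: v for op_id, v in reg.items() if op_id not in deps}


def assign(reg, op_id, deps, value):
    """ASSIGN on a register: clear causally preceding values, then add the new one."""
    cleared = clear_reg(reg, deps)
    cleared[op_id] = value
    return cleared


a, b = (1, "p"), (1, "q")               # two concurrent assignments
reg = assign({}, a, set(), "A")
reg = assign(reg, b, set(), "B")
print(sorted(reg.values()))             # ['A', 'B']  (multi-value register)

c = (2, "p")
# c was generated after seeing only a, so the concurrent value from b survives.
print(sorted(assign(reg, c, {a}, "C").values()))     # ['B', 'C']
# If c had seen both a and b, it would overwrite both.
print(sorted(assign(reg, c, {a, b}, "C").values()))  # ['C']
```

Reading the register then corresponds to taking the range of this mapping, which may contain more than one value after concurrent assignments.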

Clearing maps and lists takes a similar approach: each element of the map or list is recursively cleared using clearElem, and presence sets are updated to exclude deps. Thus, any list elements or map entries whose modifications causally precede the clearing operation will end up with empty presence sets, and thus be considered deleted. Any map or list elements containing operations that are concurrent with the clearing operation are preserved.

4.4 Convergence

As outlined in Section 1.2, we require that all replicas automatically converge towards the same state – a key requirement of a CRDT. We now formalize this notion, and show that the rules in Figures 9 to 12 satisfy this requirement.

Definition 1 (valid execution). A valid execution is a set of operations generated by a set of replicas {p_1, . . . , p_k}, each reducing a sequence of commands 〈cmd_1 ; . . . ; cmd_n〉 without getting stuck.

A reduction gets stuck if there is no application of rules in which all premises are satisfied. For example, the IDX3,4 rules in Figure 9 get stuck if idx(n) tries to iterate past the end of a list, which would happen if n is greater than the number of non-deleted elements in the list; in a real implementation this would be a runtime error. By constraining valid executions to those that do not get stuck, we ensure that operations only refer to list elements that actually exist.


Note that it is valid for an execution to never perform any network communication, either because it never invokes the yield command, or because the nondeterministic execution of yield never applies the RECV rule. We need only a replica’s local state to determine whether reduction gets stuck.

Definition 2 (history). A history is a sequence of operations in the order in which they were applied at one particular replica p by application of the rules APPLY-LOCAL and APPLY-REMOTE.

Since the evaluation rules sequentially apply one operation at a time at a given replica, the order is well-defined. Even if two replicas p and q applied the same set of operations, i.e. if A_p(ops) = A_q(ops), they may have applied any concurrent operations in a different order. Due to the premise op.deps ⊆ A_p(ops) in APPLY-REMOTE, histories are consistent with causality: if an operation has causal dependencies, it appears at some point after those dependencies in the history.

Definition 3 (document state). The document state of a replica p is the subtree of A_p containing the document: that is, A_p(mapT(doc)) or A_p(listT(doc)) or A_p(regT(doc)), whichever is defined.

A_p contains variables defined with let, which are local to one replica, and not part of the replicated state. The definition of document state excludes these variables.

Theorem. For any two replicas p and q that participated in a valid execution, if A_p(ops) = A_q(ops), then p and q have the same document state.

This theorem is proved in the appendix. It formalizes the safety property of convergence: if two replicas have processed the same set of operations, possibly in a different order, then they are in the same state. In combination with a liveness property, namely that every replica eventually processes all operations, we obtain the desired notion of convergence: all replicas eventually end up in the same state.

The liveness property depends on assumptions of replicas invoking yield sufficiently often, and all nondeterministic rules for yield being chosen fairly. We will not formalize the liveness property in this paper, but assert that it can usually be provided in practice, as network interruptions are usually of finite duration.

5 CONCLUSIONS AND FURTHER WORK

In this paper we demonstrated how to compose CRDTs for ordered lists, maps and registers into a compound CRDT with a JSON data model. It supports arbitrarily nested lists and maps, and it allows replicas to make arbitrary changes to the data without waiting for network communication. Replicas asynchronously send mutations to other replicas in the form of operations. Concurrent operations are commutative, which ensures that replicas converge towards the same state without requiring application-specific conflict resolution logic.

This work focused on the formal semantics of the JSON CRDT, represented as a mathematical model. We are also working on a practical implementation of the algorithm, and will report on its performance characteristics in follow-on work.

Our principle of not losing input due to concurrent modifications appears at first glance to be reasonable, but as illustrated in Figure 6, it leads to merged document states that may be surprising to application programmers who are more familiar with sequential programs. Further work will be needed to understand the expectations of application programmers, and to design data structures that are minimally surprising under concurrent modification. It may turn out that a schema language will be required to support more complex applications. A schema language could also support semantic annotations, such as indicating that a number should be treated as a counter rather than a register.

The CRDT defined in this paper supports insertion, deletion and assignment operations. In addition to these, it would be useful to support a move operation (to change the order of elements in an ordered list, or to move a subtree from one position in a document to another) and an undo operation. Moreover, garbage collection (tombstone removal) is required in order to prevent unbounded growth of the data structure. We plan to address these missing features in future work.

ACKNOWLEDGEMENTS

This research was supported by a grant from The Boeing Company. Thank you to Dominic Orchard, Diana Vasile, and the anonymous reviewers for comments that improved this paper.

REFERENCES

[1] S. B. Davidson, H. Garcia-Molina, and D. Skeen, “Consistency in partitioned networks,” ACM Computing Surveys, vol. 17, no. 3, pp. 341–370, Sep. 1985.

[2] N. Unger, S. Dechand, J. Bonneau, S. Fahl, H. Perl, I. Goldberg, and M. Smith, “SoK: Secure messaging,” in 36th IEEE Symposium on Security and Privacy, May 2015.

[3] M. Shapiro, N. Preguica, C. Baquero, and M. Zawirski, “Conflict-free replicated data types,” in 13th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), Oct. 2011, pp. 386–400.

[4] A. Bieniusa, M. Zawirski, N. Preguica, M. Shapiro, C. Baquero, V. Balegas, and S. Duarte, “Brief announcement: Semantics of eventually consistent replicated sets,” in 26th International Symposium on Distributed Computing (DISC), Oct. 2012.

[5] C. Ellis and S. J. Gibbs, “Concurrency control in groupware systems,” in ACM International Conference on Management of Data (SIGMOD), May 1989, pp. 399–407.

[6] M. Ressel, D. Nitsche-Ruhland, and R. Gunzenhauer, “An integrating, transformation-oriented approach to concurrency control and undo in group editors,” in ACM Conference on Computer Supported Cooperative Work (CSCW), Nov. 1996, pp. 288–297.

[7] C. Sun and C. Ellis, “Operational transformation in real-time group editors: Issues, algorithms, and achievements,” in ACM Conference on Computer Supported Cooperative Work (CSCW), Nov. 1998, pp. 59–68.

[8] D. A. Nichols, P. Curtis, M. Dixon, and J. Lamping, “High-latency, low-bandwidth windowing in the Jupiter collaboration system,” in 8th Annual ACM Symposium on User Interface Software and Technology (UIST), Nov. 1995, pp. 111–120.

[9] A. H. Davis, C. Sun, and J. Lu, “Generalizing operational transformation to the Standard General Markup Language,” in ACM Conference on Computer Supported Cooperative Work (CSCW), Nov. 2002, pp. 58–67.

[10] C.-L. Ignat and M. C. Norrie, “Customizable collaborative editor relying on treeOPT algorithm,” in 8th European Conference on Computer-Supported Cooperative Work (ECSCW), Sep. 2003, pp. 315–334.

[11] D. Wang, A. Mah, S. Lassen, and S. Thorogood. (2015, Aug.) Apache Wave (incubating) protocol documentation, release 0.4. Apache Software Foundation. [Online]. Available: https://people.apache.org/~al/wave_docs/ApacheWaveProtocol-0.4.pdf

[12] D. Li and R. Li, “A performance study of group editing algorithms,” in 12th International Conference on Parallel and Distributed Systems (ICPADS), Jul. 2006, pp. 300–307.

[13] M. Ahmed-Nacer, C.-L. Ignat, G. Oster, H.-G. Roh, and P. Urso, “Evaluating CRDTs for real-time document editing,” in 11th ACM Symposium on Document Engineering (DocEng), Sep. 2011, pp. 103–112.

[14] J. Day-Richter. (2010, Sep.) What’s different about the new Google Docs: Making collaboration fast. [Online]. Available: https://drive.googleblog.com/2010/09/whats-different-about-new-google-docs.html

[15] AppJet, Inc. (2011, Mar.) Etherpad and EasySync technical manual. [Online]. Available: https://github.com/ether/etherpad-lite/blob/e2ce9dc/doc/easysync/easysync-full-description.pdf

[16] D. Spiewak. (2010, May) Understanding and applying operational transformation. [Online]. Available: http://www.codecommit.com/blog/java/understanding-and-applying-operational-transformation

[17] M. Lemonik, Personal communication, Mar. 2016.

[18] A. Imine, P. Molli, G. Oster, and M. Rusinowitch, “Proving correctness of transformation functions in real-time groupware,” in 8th European Conference on Computer-Supported Cooperative Work (ECSCW), Sep. 2003, pp. 277–293.

[19] X. Defago, A. Schiper, and P. Urban, “Total order broadcast and multicast algorithms: Taxonomy and survey,” ACM Computing Surveys, vol. 36, no. 4, pp. 372–421, Dec. 2004.

[20] T. D. Chandra and S. Toueg, “Unreliable failure detectors for reliable distributed systems,” Journal of the ACM, vol. 43, no. 2, pp. 225–267, Mar. 1996.

[21] H. Attiya, F. Ellen, and A. Morrison, “Limitations of highly-available eventually-consistent data stores,” in ACM Symposium on Principles of Distributed Computing (PODC), Jul. 2015.

[22] Google, Inc. (2015) Google Realtime API. [Online]. Available: https://developers.google.com/google-apps/realtime/overview

[23] M. Shapiro, N. Preguica, C. Baquero, and M. Zawirski, “A comprehensive study of convergent and commutative replicated data types,” INRIA, Tech. Rep. 7506, 2011.

[24] R. Brown, S. Cribbs, C. Meiklejohn, and S. Elliott, “Riak DT map: a composable, convergent replicated dictionary,” in 1st Workshop on Principles and Practice of Eventual Consistency (PaPEC), Apr. 2014.

[25] R. Brown. (2013, Oct.) A bluffers guide to CRDTs in Riak. [Online]. Available: https://gist.github.com/russelldb/f92f44bdfb619e089a4d

[26] G. Oster, P. Urso, P. Molli, and A. Imine, “Data consistency for P2P collaborative editing,” in ACM Conference on Computer Supported Cooperative Work (CSCW), Nov. 2006.

[27] H.-G. Roh, M. Jeon, J.-S. Kim, and J. Lee, “Replicated abstract data types: Building blocks for collaborative applications,” Journal of Parallel and Distributed Computing, vol. 71, no. 3, pp. 354–368, 2011.

[28] N. Preguica, J. Manuel Marques, M. Shapiro, and M. Letia, “A commutative replicated data type for cooperative editing,” in 29th IEEE International Conference on Distributed Computing Systems (ICDCS), Jun. 2009.

[29] S. Weiss, P. Urso, and P. Molli, “Logoot-Undo: Distributed collaborative editing system on P2P networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 8, pp. 1162–1174, Jan. 2010.

[30] B. Nedelec, P. Molli, A. Mostefaoui, and E. Desmontils, “LSEQ: an adaptive structure for sequences in distributed collaborative editing,” in 13th ACM Symposium on Document Engineering (DocEng), Sep. 2013, pp. 37–46.

[31] H. Attiya, S. Burckhardt, A. Gotsman, A. Morrison, H. Yang, and M. Zawirski, “Specification and complexity of collaborative text editing,” in ACM Symposium on Principles of Distributed Computing (PODC), Jul. 2016, pp. 259–268.

[32] C. Baquero, P. S. Almeida, and C. Lerche, “The problem with embedded CRDT counters and a solution,” in 2nd Workshop on the Principles and Practice of Consistency for Distributed Data (PaPoC), Apr. 2016.

[33] P. S. Almeida, A. Shoker, and C. Baquero, “Delta state replicated data types,” arXiv:1603.01529 [cs.DC], Mar. 2016. [Online]. Available: http://arxiv.org/abs/1603.01529

[34] C. Baquero, P. S. Almeida, A. Cunha, and C. Ferreira, “Composition of state-based CRDTs,” HASLab, Tech. Rep., May 2015. [Online]. Available: http://haslab.uminho.pt/cbm/files/crdtcompositionreport.pdf

[35] S. Burckhardt, M. Fahndrich, D. Leijen, and B. P. Wood, “Cloud types for eventual consistency,” in 26th European Conference on Object-Oriented Programming (ECOOP), Jun. 2012.

[36] S. Martin, P. Urso, and S. Weiss, “Scalable XML collaborative editing with undo,” in On the Move to Meaningful Internet Systems, Oct. 2010, pp. 507–514.

[37] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, “Dynamo: Amazon’s highly available key-value store,” in 21st ACM Symposium on Operating Systems Principles (SOSP), Oct. 2007, pp. 205–220.

[38] S. Balasubramaniam and B. C. Pierce, “What is a file synchronizer?” in 4th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), Oct. 1998, pp. 98–108.

[39] N. Ramsey and E. Csirmaz, “An algebraic approach to file synchronization,” in 8th European Software Engineering Conference (ESEC/FSE-9), Sep. 2001.

[40] D. B. Terry, M. M. Theimer, K. Petersen, A. J. Demers, M. J. Spreitzer, and C. H. Hauser, “Managing update conflicts in Bayou, a weakly connected replicated storage system,” in 15th ACM Symposium on Operating Systems Principles (SOSP), Dec. 1995, pp. 172–182.

[41] P. Bailis, A. Fekete, M. J. Franklin, A. Ghodsi, J. M. Hellerstein, and I. Stoica, “Coordination avoidance in database systems,” Proceedings of the VLDB Endowment, vol. 8, no. 3, pp. 185–196, Nov. 2014.

[42] L. Lamport, “Time, clocks, and the ordering of events in a distributed system,” Communications of the ACM, vol. 21, no. 7, pp. 558–565, Jul. 1978.

[43] D. S. Parker, Jr, G. J. Popek, G. Rudisin, A. Stoughton, B. J. Walker, E. Walton, J. M. Chow, D. Edwards, S. Kiser, and C. Kline, “Detection of mutual inconsistency in distributed systems,” IEEE Transactions on Software Engineering, vol. SE-9, no. 3, pp. 240–247, May 1983.

[44] N. Preguica, C. Baquero, P. S. Almeida, V. Fonte, and R. Goncalves, “Brief announcement: Efficient causality tracking in distributed storage systems with dotted version vectors,” in 31st ACM Symposium on Principles of Distributed Computing (PODC), Jul. 2012.

Martin Kleppmann is a Research Associate in the Computer Laboratory at the University of Cambridge. His current research project, TRVE Data, is working towards better security and privacy in cloud applications by applying end-to-end encryption to collaboratively editable application data. His book Designing Data-Intensive Applications was published by O’Reilly Media in 2017. Previously, he worked as a software engineer and entrepreneur at several internet companies, including Rapportive and LinkedIn.

Alastair R. Beresford is a Senior Lecturer in the Computer Laboratory at the University of Cambridge. His research work explores the security and privacy of large-scale distributed systems, with a particular focus on networked mobile devices such as smartphones, tablets and laptops. He looks at the security and privacy of the devices themselves, as well as the security and privacy problems induced by the interaction between mobile devices and cloud-based Internet services.

APPENDIX
PROOF OF CONVERGENCE

Theorem 1. For any two replicas $p$ and $q$ that participated in a valid execution, if $A_p(\mathit{ops}) = A_q(\mathit{ops})$, then $p$ and $q$ have the same document state.

Proof. Consider the histories $H_p$ and $H_q$ at $p$ and $q$ respectively (see Definition 2). The rules APPLY-LOCAL and APPLY-REMOTE maintain the invariant that an operation is added to $A_p(\mathit{ops})$ or $A_q(\mathit{ops})$ if and only if it was applied to the document state at $p$ or $q$. Thus, $A_p(\mathit{ops}) = A_q(\mathit{ops})$ iff $H_p$ and $H_q$ contain the same set of operations (potentially ordered differently).

The history $H_p$ at replica $p$ is a sequence of $n$ operations: $H_p = \langle o_1, \dots, o_n \rangle$, and the document state at $p$ is derived from $H_p$ by starting in the empty state and applying the operations in order. Likewise, the document state at $q$ is derived from $H_q$, which is a permutation of $H_p$. Both histories must be consistent with causality, i.e. for all $i$ with $1 \le i \le n$, we require $o_i.\mathit{deps} \subseteq \{o_j.\mathit{id} \mid 1 \le j < i\}$. The causality invariant is maintained by the APPLY-* rules.
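The causality condition on a history can be checked mechanically. The following Python sketch is illustrative only: the `Op` record and the function `is_causally_consistent` are hypothetical names that do not appear in the paper, and an operation is reduced to its identifier and its dependency set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for an operation: only id and deps are modeled."""
    id: str
    deps: frozenset = frozenset()


def is_causally_consistent(history):
    """Check that o_i.deps is a subset of {o_j.id | j < i} for every o_i."""
    seen = set()  # identifiers of the operations applied so far
    for op in history:
        if not op.deps <= seen:
            return False  # op would be applied before one of its dependencies
        seen.add(op.id)
    return True


a = Op("a")
b = Op("b", frozenset({"a"}))
c = Op("c", frozenset({"a"}))  # concurrent with b

print(is_causally_consistent([a, b, c]))  # True
print(is_causally_consistent([a, c, b]))  # True: a permutation that respects causality
print(is_causally_consistent([b, a, c]))  # False: b is applied before its dependency a
```

Both `[a, b, c]` and `[a, c, b]` are admissible histories in the sense used here; they differ only in the order of the concurrent operations `b` and `c`.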

We can prove the theorem by induction over the length of history $n$.

Base case: An empty history with $n = 0$ describes the empty document state. The empty document is always the same, and so any two replicas that have not executed any operations are by definition in the same state.

Induction step: Given histories $H_p$ and $H_q$ of length $n$, such that $H_p = \langle o_1, \dots, o_n \rangle$ and $H_q$ is a permutation of $H_p$, and such that applying $H_p$ results in the same document state as applying $H_q$, we can construct new histories $H'_p$ and $H'_q$ of length $n + 1$ by inserting a new operation $o_{n+1}$ at any causally ready position in $H_p$ or $H_q$ respectively. We must then show that for all the histories $H'_p$ and $H'_q$ constructed this way, applying the sequence of operations in order results in the same document state.

In order to prove the induction step, we examine the insertion of $o_{n+1}$ into $H_p$ and $H_q$. Each history can be split into a prefix, which is the minimal subsequence $\langle o_1, \dots, o_j \rangle$ such that $o_{n+1}.\mathit{deps} \subseteq \{o_1.\mathit{id}, \dots, o_j.\mathit{id}\}$, and a suffix, which is the remaining subsequence $\langle o_{j+1}, \dots, o_n \rangle$. The prefix contains all operations that causally precede $o_{n+1}$, and possibly some operations that are concurrent with $o_{n+1}$; the suffix contains only operations that are concurrent with $o_{n+1}$. The earliest position where $o_{n+1}$ can be inserted into the history is between the prefix and the suffix; the latest position is at the end of the suffix; or it could be inserted at any point within the suffix.
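As an illustration of the prefix/suffix split, the sketch below computes the minimal prefix and the resulting set of causally ready insertion positions. It is a simplified model, not part of the formal semantics: histories are plain lists of operation identifiers, and `split_history` and `insertion_positions` are hypothetical helper names.

```python
def split_history(history, new_deps):
    """Split `history` (operation ids in application order) into (prefix, suffix),
    where prefix is the shortest prefix containing every id in `new_deps`."""
    remaining = set(new_deps)
    j = 0
    while remaining:
        if j == len(history):
            raise ValueError("history does not contain all dependencies")
        remaining.discard(history[j])
        j += 1
    return history[:j], history[j:]


def insertion_positions(history, new_deps):
    """Indices at which the new operation may be inserted: anywhere from the end
    of the prefix up to and including the end of the suffix."""
    prefix, _suffix = split_history(history, new_deps)
    return list(range(len(prefix), len(history) + 1))


history = ["a", "b", "c", "d"]
prefix, suffix = split_history(history, {"a", "b"})
print(prefix, suffix)                             # ['a', 'b'] ['c', 'd']
print(insertion_positions(history, {"a", "b"}))   # [2, 3, 4]
```

An operation without dependencies yields an empty prefix, so it may be inserted at any of the `len(history) + 1` positions.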

We need to show that the effect on the document state is the same, regardless of the position at which $o_{n+1}$ is inserted, and regardless of whether it is inserted into $H_p$ or $H_q$. We do this in Lemma 8 by showing that $o_{n+1}$ is commutative with respect to all operations in the suffix, i.e. with respect to any operations that are concurrent with $o_{n+1}$.

Before we can prove the commutativity of operations, we must first define some more terms and prove some preliminary lemmas.

Definition 4 (appearing after). In the ordered list $\mathit{ctx}$, list element $k_j$ appears after list element $k_1$ if there exists a (possibly empty) sequence of list elements $k_2, \dots, k_{j-1}$ such that for all $i$ with $1 \le i < j$, $\mathit{ctx}(\mathsf{next}(k_i)) = k_{i+1}$. Moreover, we say $k_j$ appears immediately after $k_1$ if that sequence is empty, i.e. if $\mathit{ctx}(\mathsf{next}(k_1)) = k_j$.

The definition of appearing after corresponds to the order in which the IDX rules iterate over the list.
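Definition 4 amounts to reachability along the chain of next pointers. A minimal sketch follows, assuming the list context is modeled as a Python dictionary `next_ptr` that maps each key to its successor; this representation and the function names are illustrative assumptions, not the paper's notation.

```python
HEAD, TAIL = "head", "tail"


def appears_after(next_ptr, k1, kj):
    """True if kj is reachable from k1 by following one or more next pointers."""
    cur = next_ptr.get(k1)
    while cur is not None:
        if cur == kj:
            return True
        cur = next_ptr.get(cur)  # TAIL has no successor, which ends the walk
    return False


def appears_immediately_after(next_ptr, k1, kj):
    """True if kj is the direct successor of k1 (the intermediate sequence is empty)."""
    return next_ptr.get(k1) == kj


# The list head -> x -> y -> z -> tail
next_ptr = {HEAD: "x", "x": "y", "y": "z", "z": TAIL}

print(appears_after(next_ptr, "x", "z"))              # True
print(appears_after(next_ptr, "z", "x"))              # False
print(appears_immediately_after(next_ptr, "x", "y"))  # True
print(appears_immediately_after(next_ptr, "x", "z"))  # False
```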

Lemma 2. If $k_2$ appears after $k_1$ in an ordered list, and the list is mutated according to the evaluation rules, $k_2$ also appears after $k_1$ in all later document states.

Proof. The only rule that modifies the next pointers in the context is INSERT$_1$, and it inserts a new list element between two existing list elements (possibly $\mathsf{head}$ and/or $\mathsf{tail}$). This modification preserves the appears-after relationship between any two existing list elements. Since no other rule affects the list order, appears-after is always preserved.

Note that deletion of an element from a list does not remove it from the sequence of next pointers, but only clears its presence set $\mathsf{pres}(k)$.
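The following sketch illustrates why deletion cannot disturb the appears-after relation: a deleted element remains in the chain of next pointers as a tombstone and is merely hidden from view. The dictionary representation of next pointers and presence sets, and the names `delete` and `visible_elements`, are assumptions made for this example.

```python
HEAD, TAIL = "head", "tail"


def delete(pres, key, deps):
    """Remove the operation ids in `deps` from the presence set of `key`.
    The next pointers are deliberately not touched."""
    pres[key] = pres.get(key, set()) - deps


def visible_elements(next_ptr, pres):
    """Walk the next pointers and return the keys with a non-empty presence set."""
    out = []
    cur = next_ptr[HEAD]
    while cur != TAIL:
        if pres.get(cur):
            out.append(cur)
        cur = next_ptr[cur]
    return out


next_ptr = {HEAD: "x", "x": "y", "y": "z", "z": TAIL}
pres = {"x": {"op1"}, "y": {"op2"}, "z": {"op3"}}
pointers_before = dict(next_ptr)

delete(pres, "y", {"op1", "op2", "op3"})

print(visible_elements(next_ptr, pres))  # ['x', 'z']
print(next_ptr == pointers_before)       # True: y is still linked between x and z
```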

Lemma 3. If one replica inserts a list element $k_\mathrm{new}$ between $k_1$ and $k_2$, i.e. if $k_\mathrm{new}$ appears after $k_1$ in the list and $k_2$ appears after $k_\mathrm{new}$ in the list on the source replica after applying APPLY-LOCAL, then $k_\mathrm{new}$ appears after $k_1$ and $k_2$ appears after $k_\mathrm{new}$ on every other replica where that operation is applied.

Proof. The rules for generating list operations ensure that $k_1$ is either $\mathsf{head}$ or an operation identifier, and $k_2$ is either $\mathsf{tail}$ or an operation identifier.

When the insertion operation is generated using the MAKE-OP rule, its operation identifier is given a counter value $\mathit{ctr}$ that is greater than the counter of any existing operation ID in $A_p(\mathit{ops})$. If $k_2$ is an operation identifier, we must have $k_2 \in A_p(\mathit{ops})$, since both APPLY-LOCAL and APPLY-REMOTE add operation IDs to $A_p(\mathit{ops})$ when applying an insertion. Thus, either $k_2 < k_\mathrm{new}$ under the ordering relation $<$ for Lamport timestamps, or $k_2 = \mathsf{tail}$.
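The counter argument can be made concrete with a small sketch. It assumes that an operation ID is represented as a (counter, replica) pair compared lexicographically, which is the usual ordering for Lamport timestamps; `make_id` and `lamport_less` are hypothetical names and only approximate what the MAKE-OP rule does.

```python
def make_id(applied_ids, replica):
    """Return a fresh ID whose counter exceeds every counter in `applied_ids`."""
    ctr = 1 + max((c for c, _ in applied_ids), default=0)
    return (ctr, replica)


def lamport_less(a, b):
    """Assumed total order: compare counters first, then replica identifiers."""
    (c1, p1), (c2, p2) = a, b
    return c1 < c2 or (c1 == c2 and p1 < p2)


applied = {(1, "p"), (2, "q"), (2, "p")}  # IDs already applied at replica p
k_new = make_id(applied, "p")

print(k_new)                                          # (3, 'p')
print(all(lamport_less(k, k_new) for k in applied))   # True
```

Because every ID already applied at the generating replica is smaller than the new ID, any existing element $k_2$ that follows the insertion position satisfies $k_2 < k_\mathrm{new}$, as the proof requires.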

When the insertion operation is applied on another replica using APPLY-REMOTE and INSERT$_{1,2}$, $k_2$ appears after $k_1$ on that replica (by Lemma 2 and causality). The cursor of the operation is $\mathsf{cursor}(\langle \dots \rangle, k_1)$, so the rules start iterating the list at $k_1$, and therefore $k_\mathrm{new}$ is inserted at some position after $k_1$.

If other concurrent insertions occurred between $k_1$ and $k_2$, their operation IDs may be greater than or less than $k_\mathrm{new}$, and thus either INSERT$_1$ or INSERT$_2$ may apply. In particular, INSERT$_2$ skips over any list elements whose Lamport timestamp is greater than $k_\mathrm{new}$. However, we know that $k_2 < k_\mathrm{new} \lor k_2 = \mathsf{tail}$, and so INSERT$_1$ will apply with $\mathit{next} = k_2$ at the latest. The INSERT$_{1,2}$ rules thus never iterate past $k_2$, and thus $k_\mathrm{new}$ is never inserted at a list position that appears after $k_2$.
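The behaviour described in this proof can be sketched as a loop over next pointers. This is a simplified model of the two insertion rules under stated assumptions: operation IDs are (counter, replica) tuples compared lexicographically, the list is a dictionary of next pointers, and `insert` and `to_list` are illustrative names rather than the paper's rules.

```python
HEAD, TAIL = "head", "tail"


def insert(next_ptr, cursor, new_id):
    """Insert new_id at some position after `cursor`."""
    prev, nxt = cursor, next_ptr[cursor]
    # Skipping step (in the spirit of INSERT2): pass over elements with a greater ID.
    while nxt != TAIL and new_id < nxt:
        prev, nxt = nxt, next_ptr[nxt]
    # Splicing step (in the spirit of INSERT1): nxt is TAIL or has a smaller ID.
    next_ptr[prev] = new_id
    next_ptr[new_id] = nxt


def to_list(next_ptr):
    out, cur = [], next_ptr[HEAD]
    while cur != TAIL:
        out.append(cur)
        cur = next_ptr[cur]
    return out


k1, k2 = (1, "p"), (2, "p")
next_ptr = {HEAD: k1, k1: k2, k2: TAIL}

insert(next_ptr, k1, (4, "r"))  # a concurrent insertion between k1 and k2
insert(next_ptr, k1, (3, "q"))  # k_new: skips (4, "r"), stops in front of k2

print(to_list(next_ptr))  # [(1, 'p'), (4, 'r'), (3, 'q'), (2, 'p')]
```

In this run `k_new = (3, "q")` skips the concurrent element with the greater ID but is spliced in before `k2 = (2, "p")`, since `k2 < k_new`.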

Definition 5 (common ancestor). In a history $H$, the common ancestor of two concurrent operations $o_r$ and $o_s$ is the latest document state that causally precedes both $o_r$ and $o_s$.

The common ancestor of $o_r$ and $o_s$ can be defined more formally as the document state resulting from applying a sequence of operations $\langle o_1, \dots, o_j \rangle$ that is the shortest prefix of $H$ that satisfies $(o_r.\mathit{deps} \cap o_s.\mathit{deps}) \subseteq \{o_1.\mathit{id}, \dots, o_j.\mathit{id}\}$.

Definition 6 (insertion interval). Given two concurrent operations $o_r$ and $o_s$ that insert into the same list, the insertion interval of $o_r$ is the pair of keys $(k_r^\mathrm{before}, k_r^\mathrm{after})$ such that $o_r.\mathit{id}$ appears after $k_r^\mathrm{before}$ when $o_r$ has been applied, $k_r^\mathrm{after}$ appears after $o_r.\mathit{id}$ when $o_r$ has been applied, and $k_r^\mathrm{after}$ appears immediately after $k_r^\mathrm{before}$ in the common ancestor of $o_r$ and $o_s$. The insertion interval of $o_s$ is the pair of keys $(k_s^\mathrm{before}, k_s^\mathrm{after})$ defined similarly.

It may be the case that $k_r^\mathrm{before}$ or $k_s^\mathrm{before}$ is $\mathsf{head}$, and that $k_r^\mathrm{after}$ or $k_s^\mathrm{after}$ is $\mathsf{tail}$.

Lemma 4. For any two concurrent insertion operations $o_r, o_s$ in a history $H$, if $o_r.\mathit{cur} = o_s.\mathit{cur}$, then the order in which the inserted elements appear in the list after applying $H$ is deterministic and independent of the order of $o_r$ and $o_s$ in $H$.

Proof. Without loss of generality, assume that $o_r.\mathit{id} < o_s.\mathit{id}$ according to the ordering relation on Lamport timestamps. (If the operation ID of $o_r$ is greater than that of $o_s$, the two operations can be swapped in this proof.) We now distinguish the two possible orders of applying the operations:

1) $o_r$ is applied before $o_s$ in $H$. Thus, at the time when $o_s$ is applied, $o_r$ has already been applied. When applying $o_s$, since $o_r$ has a lesser operation ID, the rule INSERT$_1$ applies with $\mathit{next} = o_r.\mathit{id}$ at the latest, so the insertion position of $o_s$ must appear before $o_r$. It is not possible for INSERT$_2$ to skip past $o_r$.

2) $o_s$ is applied before $o_r$ in $H$. Thus, at the time when $o_r$ is applied, $o_s$ has already been applied. When applying $o_r$, the rule INSERT$_2$ applies with $\mathit{next} = o_s.\mathit{id}$, so the rule skips past $o_s$ and inserts $o_r$ at a position after $o_s$. Moreover, any list elements that appear between $o_s.\mathit{cur}$ and $o_s$ at the time of inserting $o_r$ must have a Lamport timestamp greater than $o_s.\mathit{id}$, so INSERT$_2$ also skips over those list elements when inserting $o_r$. Thus, the insertion position of $o_r$ must be after $o_s$.

Thus, the insertion position of $o_r$ appears after the insertion position of $o_s$, regardless of the order in which the two operations are applied. The ordering depends only on the operation IDs, and since these IDs are fixed at the time the operations are generated, the list order is determined by the IDs.

Lemma 5. In an operation history $H$, an insertion operation is commutative with respect to concurrent insertion operations to the same list.

Proof. Given any two concurrent insertion operations $o_r, o_s$ in $H$, we must show that the document state does not depend on the order in which $o_r$ and $o_s$ are applied.

Either $o_r$ and $o_s$ have the same insertion interval as defined in Definition 6, or they have different insertion intervals. If the insertion intervals are different, then by Lemma 3 the operations cannot affect each other, and so they have the same effect regardless of their order. So we need only analyze the case in which they have the same insertion interval $(k^\mathrm{before}, k^\mathrm{after})$.

If $o_r.\mathit{cur} = o_s.\mathit{cur}$, then by Lemma 4, the operation with the greater operation ID appears first in the list, regardless of the order in which the operations are applied. If $o_r.\mathit{cur} \neq o_s.\mathit{cur}$, then one or both of the cursors must refer to a list element that appears between $k^\mathrm{before}$ and $k^\mathrm{after}$, and that did not yet exist in the common ancestor (Definition 5).

Take a cursor that differs from $k^\mathrm{before}$: the list element it refers to was inserted by a prior operation, whose cursor in turn refers to another prior operation, and so on. Following this chain of cursors for a finite number of steps leads to an operation $o_\mathrm{first}$ whose cursor refers to $k^\mathrm{before}$ (since an insertion operation always inserts at a position after the cursor).

Note that all of the operations in this chain are causally dependent on $o_\mathrm{first}$, and so they must have a Lamport timestamp greater than $o_\mathrm{first}$. Thus, we can apply the same argument as in Lemma 4: if INSERT$_2$ skips over the list element inserted by $o_\mathrm{first}$, it will also skip over all of the list elements that are causally dependent on it; if INSERT$_1$ inserts a new element before $o_\mathrm{first}$, it is also inserted before the chain of operations that is based on it.

Therefore, the order of $o_r$ and $o_s$ in the final list is determined by the Lamport timestamps of the first insertions into the insertion interval after their common ancestor, in the chains of cursor references of the two operations. Since the argument above applies to all pairs of concurrent operations $o_r, o_s$ in $H$, we deduce that the final order of elements in the list depends only on the operation IDs but not the order of application, which shows that concurrent insertions to the same list are commutative.
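The claim of Lemma 5 can be checked by brute force on a small example. The sketch below uses a simplified model of list insertion (lexicographically ordered (counter, replica) IDs, a dictionary of next pointers) and enumerates every causally valid ordering of four insertions: two replicas each insert an element at the head and then a second element after their own first one. All function names are assumptions made for this example, and such a test is of course no substitute for the proof.

```python
from itertools import permutations

HEAD, TAIL = "head", "tail"


def insert(next_ptr, cursor, new_id):
    """Simplified insertion: skip greater IDs after the cursor, then splice in."""
    prev, nxt = cursor, next_ptr[cursor]
    while nxt != TAIL and new_id < nxt:
        prev, nxt = nxt, next_ptr[nxt]
    next_ptr[prev] = new_id
    next_ptr[new_id] = nxt


def apply_history(history):
    """Apply (id, cursor, deps) insertions in order; return the resulting list."""
    next_ptr = {HEAD: TAIL}
    for op_id, cursor, _deps in history:
        insert(next_ptr, cursor, op_id)
    out, cur = [], next_ptr[HEAD]
    while cur != TAIL:
        out.append(cur)
        cur = next_ptr[cur]
    return out


def causally_valid(history):
    seen = set()
    for op_id, _cursor, deps in history:
        if not deps <= seen:
            return False
        seen.add(op_id)
    return True


a = ((1, "p"), HEAD, frozenset())
b = ((2, "p"), (1, "p"), frozenset({(1, "p")}))  # inserted after a, depends on a
c = ((1, "q"), HEAD, frozenset())
d = ((2, "q"), (1, "q"), frozenset({(1, "q")}))  # inserted after c, depends on c

valid = [h for h in permutations([a, b, c, d]) if causally_valid(h)]
results = {tuple(apply_history(h)) for h in valid}

print(len(valid))    # 6 causally valid orderings
print(len(results))  # 1: every ordering yields the same list
```

In this example the chain started by the greater first insertion, `(1, "q")`, ends up in front of the chain started by `(1, "p")` in every ordering, so the final order is determined by the IDs of the first insertions in each chain.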

Lemma 6. In a history $H$, a deletion operation is commutative with respect to concurrent operations.

Proof. Given a deletion operation $o_d$ and any other concurrent operation $o_c$, we must show that the document state after applying both operations does not depend on the order in which $o_d$ and $o_c$ were applied.

The rules in Figure 12 define how a deletion operation $o_d$ is applied: starting at the cursor in the operation, they recursively descend the subtree, removing $o_d.\mathit{deps}$ from the presence set $\mathsf{pres}(k)$ at all branch nodes in the subtree, and updating all registers to remove any values written by operations in $o_d.\mathit{deps}$.

If $o_c$ is an assignment or insertion operation, the ASSIGN rule adds $o_c.\mathit{id}$ to the mapping from operation ID to value for a register, and the DESCEND, ASSIGN, EMPTY-MAP and EMPTY-LIST rules add $o_c.\mathit{id}$ to the presence sets $\mathsf{pres}(k)$ along the path through the document tree described by the cursor.

If $o_d.\mathit{cur}$ is not a prefix of $o_c.\mathit{cur}$, the operations affect disjoint subtrees of the document, and so they are trivially commutative. Any state changes by DESCEND and ADD-ID$_1$ along the shared part of the cursor path are applied using the set union operator $\cup$, which is commutative.

Now consider the case where $o_d.\mathit{cur}$ is a prefix of $o_c.\mathit{cur}$. Since $o_c$ is concurrent with $o_d$, we know that $o_c.\mathit{id} \notin o_d.\mathit{deps}$. Therefore, if $o_c$ is applied before $o_d$ in the history, the CLEAR-* rules evaluating $o_d$ will leave any occurrences of $o_c.\mathit{id}$ in the document state undisturbed, while removing any occurrences of operations in $o_d.\mathit{deps}$.

If $o_d$ is applied before $o_c$, the effect on presence sets and registers is the same as if they had been applied in the reverse order. Moreover, $o_c$ applies in the same way as if $o_d$ had not been applied previously, because applying a deletion only modifies presence sets and registers, without actually removing map keys or list elements, and because the rules for applying an operation are not conditional on the previous content of presence sets and registers.

Thus, the effect of applying $o_c$ before $o_d$ is the same as applying $o_d$ before $o_c$, so the operations commute.
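For a single presence set, the argument reduces to the fact that removing a set of IDs and adding an ID outside that set can be done in either order. The sketch below shows this under the assumption that a presence set is a plain set of operation IDs; `apply_delete` and `apply_add` are illustrative names, not rules from the semantics.

```python
def apply_delete(pres, deps):
    """Effect of a deletion on one presence set: remove the causally prior IDs."""
    return pres - deps


def apply_add(pres, op_id):
    """Effect of a concurrent assignment or insertion: add its ID by set union."""
    return pres | {op_id}


pres0 = frozenset({"a", "b"})    # presence set before either operation
od_deps = frozenset({"a", "b"})  # the deletion has seen a and b
oc_id = "c"                      # concurrent operation, so oc_id is not in od_deps

delete_last = apply_delete(apply_add(pres0, oc_id), od_deps)
delete_first = apply_add(apply_delete(pres0, od_deps), oc_id)

print(delete_last == delete_first)  # True
print(sorted(delete_first))         # ['c']
```

The element therefore stays present after both operations, which matches the principle that a concurrent update is not lost to a deletion.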

Lemma 7. In a history $H$, an assignment operation is commutative with respect to concurrent operations.

Proof. Given an assignment $o_a$ and any other concurrent operation $o_c$, we must show that the document state after applying both operations does not depend on the order in which $o_a$ and $o_c$ were applied.

The rules ASSIGN, EMPTY-MAP and EMPTY-LIST define how an assignment operation $o_a$ is applied, depending on the value being assigned. All three rules first clear any causally prior state from the cursor at which the assignment is occurring; by Lemma 6, this clearing operation is commutative with concurrent operations, and leaves updates by concurrent operations untouched.

The rules also add $o_a.\mathit{id}$ to the presence set identified by the cursor, and DESCEND adds $o_a.\mathit{id}$ to the presence sets on the path from the root of the document tree described by the cursor. These state changes are applied using the set union operator $\cup$, which is commutative.

Finally, in the case where the value assigned by $o_a$ is a primitive and the ASSIGN rule applies, the mapping from operation ID to value is added to the register by the expression $\mathit{child}[\,\mathit{id} \mapsto \mathit{val}\,]$. If $o_c$ is not an assignment operation or if $o_a.\mathit{cur} \neq o_c.\mathit{cur}$, the operations are independent and thus trivially commutative.

If $o_a$ and $o_c$ are assignments to the same cursor, we use the commutativity of updates to a partial function: $\mathit{child}[\,\mathit{id}_1 \mapsto \mathit{val}_1\,][\,\mathit{id}_2 \mapsto \mathit{val}_2\,] = \mathit{child}[\,\mathit{id}_2 \mapsto \mathit{val}_2\,][\,\mathit{id}_1 \mapsto \mathit{val}_1\,]$ provided that $\mathit{id}_1 \neq \mathit{id}_2$. Since operation IDs (Lamport timestamps) are unique, two concurrent assignments add two different keys to the mapping, and their order is immaterial.

Thus, all parts of the process of applying $o_a$ have the same effect on the document state, regardless of whether $o_c$ is applied before or after $o_a$, so the operations commute.
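The register part of this argument can be illustrated with a dictionary from operation ID to value. The sketch assumes that applying an assignment first removes the entries written by its causal dependencies and then adds its own entry; the function `assign` and the record layout are hypothetical and model only a single register.

```python
def assign(register, op_id, deps, value):
    """Clear values written by causally prior operations, then add op_id -> value."""
    cleared = {k: v for k, v in register.items() if k not in deps}
    cleared[op_id] = value
    return cleared


reg0 = {(1, "p"): "old"}

# Two concurrent assignments to the same register; both have seen only (1, "p").
oa = dict(op_id=(2, "p"), deps={(1, "p")}, value="A")
oc = dict(op_id=(2, "q"), deps={(1, "p")}, value="C")

a_then_c = assign(assign(reg0, **oa), **oc)
c_then_a = assign(assign(reg0, **oc), **oa)

print(a_then_c == c_then_a)  # True
print(len(a_then_c))         # 2: both concurrent values are retained
```

Neither assignment removes the other's entry, because neither ID is in the other's dependency set, so the register ends up holding both values regardless of the order of application.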

Lemma 8. Given an operation history $H = \langle o_1, \dots, o_n \rangle$ from a valid execution, a new operation $o_{n+1}$ from that execution can be inserted at any point in $H$ after $o_{n+1}.\mathit{deps}$ have been applied. For all histories $H'$ that can be constructed this way, the document state resulting from applying the operations in $H'$ in order is the same, and independent of the ordering of any concurrent operations in $H$.

Proof. $H$ can be split into a prefix and a suffix, as described in the proof of Theorem 1. The suffix contains only operations that are concurrent with $o_{n+1}$, and we allow $o_{n+1}$ to be inserted at any point after the prefix. We then prove the lemma case-by-case, depending on the type of mutation in $o_{n+1}$.

If $o_{n+1}$ is a deletion, by Lemma 6 it is commutative with all operations in the suffix, and so $o_{n+1}$ can be inserted at any point within, before, or after the suffix without changing its effect on the final document state. Similarly, if $o_{n+1}$ is an assignment, by Lemma 7 it is commutative with all operations in the suffix.

If $o_{n+1}$ is an insertion, let $o_c$ be any operation in the suffix, and consider the cases of $o_{n+1}$ being inserted before and after $o_c$ in the history. If $o_c$ is a deletion or assignment, it is commutative with $o_{n+1}$ by Lemma 6 or Lemma 7 respectively. If $o_c$ is an insertion into the same list as $o_{n+1}$, then by Lemma 5 the operations are commutative. If $o_c$ is an insertion into a different list in the document, its effect is independent from $o_{n+1}$ and so the two operations can be applied in any order.

Thus, $o_{n+1}$ is commutative with respect to any concurrent operation in $H$. Therefore, $o_{n+1}$ can be inserted into $H$ at any point after its causal dependencies, and the effect on the final document state is independent of the position at which the operation is inserted.

This completes the induction step in the proof of Theorem 1, and thus proves convergence of our datatype.