persistent data structure

8/7/2019 PERSISTENT DATA STRUCTURE


PERSISTENT DATA STRUCTURE

1. Introduction
2. Requirement of Persistent Data structure
3. Ephemeral v/s Persistent Data Structure
4. Advantages
5. Applications
   a. Persistent Linked List
   b. Persistent Binary Tree
   c. Random Access List
   d. Planar Point Location
6. Methods to make the data structure persistent
   a. The Fat-Node Method
   b. The Node-Copying Method
7. Simple Application Code



Introduction

A persistent data structure is a data structure that always preserves the previous version of itself when it is modified. Such a data structure is effectively immutable: its operations do not visibly update the structure in place, but instead always yield a new, updated structure.

The data structure is partially persistent if all versions can be accessed but

only the newest version can be modified.

The data structure is fully persistent if every version can be both accessed

and modified.

If there is also a meld or merge operation that can create a new version from two previous versions, the data structure is called confluently persistent. In such data structures, we use combinators to combine the inputs of more than one previous version into a single new version. Rather than a branching tree, combinations of versions induce a DAG (directed acyclic graph) structure on the version graph.

Structures that do not preserve previous versions are called ephemeral.
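A minimal Java sketch of full persistence, assuming an immutable singly linked list (the names PList and VersionsDemo are hypothetical): two new versions derived from the same old version form a branch in the version tree, and all three versions remain readable afterwards.

```java
import java.util.StringJoiner;

// Immutable singly linked list: every "modification" returns a new version.
final class PList {
    static final PList EMPTY = new PList(0, null);   // sentinel for the empty list

    final int head;      // value stored in this cell (unused for EMPTY)
    final PList tail;    // rest of the list; null only for EMPTY

    private PList(int head, PList tail) { this.head = head; this.tail = tail; }

    boolean isEmpty() { return tail == null; }

    // Returns a new version with x in front; this version is left untouched.
    PList push(int x) { return new PList(x, this); }

    @Override public String toString() {
        StringJoiner sj = new StringJoiner(", ", "[", "]");
        for (PList p = this; !p.isEmpty(); p = p.tail) sj.add(Integer.toString(p.head));
        return sj.toString();
    }
}

class VersionsDemo {
    public static void main(String[] args) {
        PList v0 = PList.EMPTY.push(1);   // [1]
        PList v1 = v0.push(2);            // [2, 1], derived from v0
        PList v2 = v0.push(3);            // [3, 1], also derived from v0: a branch
        System.out.println(v0 + " " + v1 + " " + v2); // [1] [2, 1] [3, 1]
    }
}
```

Because any version (not only the newest) can be the starting point of a new one, this list is fully persistent rather than merely partially persistent.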



Requirement of Persistent Data structure

The typical data structures most programmers know and use require

imperative programming: they fundamentally depend on replacing the

values of fields with assignment statements. A particular data structure

represents the state of something at that particular moment in time, and

that moment only. If you want to know what the state was in the past, you

would have needed to make a copy of the entire data structure back then

and keep it around until you needed it.

Alternatively, you could keep a log of changes made to the data structure

that you could play in reverse until you get the previous state - and then

play it back forwards to get back to where you are now. Both these

techniques are typically used to implement undo/redo.

Or you could use a persistent data structure. A persistent data structure

allows you to access previous versions at any time without having to do any

copying. All you needed to do at the time was to save a pointer to the data

structure. If you have a persistent data structure, your undo/redo

implementation is simply a stack of pointers that you push a pointer onto

after you make any change to the data structure.
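The undo/redo idea above can be sketched in Java as two stacks of pointers to immutable versions. This is an illustrative sketch, not a prescribed design: the classes Words and Editor are hypothetical, and a sentinel is used for the empty document because java.util.ArrayDeque does not accept null elements.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Immutable list of words, newest first; add() returns a new version
// that shares every existing cell with the old one.
final class Words {
    static final Words EMPTY = new Words(null, null);   // sentinel (ArrayDeque rejects null)
    final String word;
    final Words rest;

    private Words(String word, Words rest) { this.word = word; this.rest = rest; }

    Words add(String w) { return new Words(w, this); }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Words p = this; p != EMPTY; p = p.rest) sb.insert(0, p.word + " ");
        return sb.toString().trim();
    }
}

// Undo/redo as stacks of pointers to immutable versions: no version is ever copied.
class Editor {
    private Words current = Words.EMPTY;
    private final Deque<Words> undoStack = new ArrayDeque<>();
    private final Deque<Words> redoStack = new ArrayDeque<>();

    void type(String w) {
        undoStack.push(current);      // save a pointer to the version before the change
        redoStack.clear();            // a new edit invalidates the redo history
        current = current.add(w);
    }

    boolean undo() {
        if (undoStack.isEmpty()) return false;
        redoStack.push(current);
        current = undoStack.pop();
        return true;
    }

    boolean redo() {
        if (redoStack.isEmpty()) return false;
        undoStack.push(current);
        current = redoStack.pop();
        return true;
    }

    String text() { return current.toString(); }

    public static void main(String[] args) {
        Editor e = new Editor();
        e.type("persistent"); e.type("data"); e.type("structure");
        e.undo();
        System.out.println(e.text());   // persistent data
        e.redo();
        System.out.println(e.text());   // persistent data structure
    }
}
```

Each undo or redo step only moves a pointer between the two stacks; the versions themselves are never copied or modified.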



Advantages

1. In text editing: while we edit text, the undo facility gives access to the most recent previous changes made to the text.

2. In file editing.

3. Planar Point Location: The point location problem is a fundamental topic of computational geometry. It finds applications in areas that deal with processing geometrical data, for example GPS and Google Maps. It can be one-dimensional point location or 2-D point location (example shown later).

4. Persistent data structures are immutable, i.e. they not only allow us to make more assumptions about our data but also expand the capabilities of the language.

5. With such an immutable data structure and a functional style, we can also freely use MEMOIZATION. (In computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs.) Because the arguments are immutable, a cached result can never become stale.

Often, the same function is applied to the same set of arguments repeatedly during the execution of a program. In such cases it is better to store the result of the computation in a table, together with the function-argument combination that gave rise to it, so that we avoid computing the value from the beginning. The Fibonacci function is a classic example of how memoization can speed up a deterministic computation.

The Fibonacci function is defined mathematically by the following rule: by definition, the first two Fibonacci numbers are 0 and 1, and each subsequent number is the sum of the previous two.

Fib(0) = 0, Fib(1) = 1
Fib(n) = Fib(n-1) + Fib(n-2) for n ≥ 2

where n is the index of the Fibonacci number to be calculated.

Here, we take the example of computing Fib(6).

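public class Fibonacci {

    // Plain recursion: Fib(n - 2) is computed again inside Fib(n - 1),
    // so the number of calls grows exponentially with n
    public static long fib(int n) {
        if (n <= 1) {
            return n; // Fib(0) = 0, Fib(1) = 1
        }
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(fib(6)); // prints 8
    }
}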

For the Fibonacci function, memoization turns a procedure that takes exponential time to run into one that takes linear time.
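A memoized Fibonacci function can be sketched as follows. This is an illustrative version rather than the author's original listing; the class name MemoFib and the HashMap cache are assumptions. Each Fib(k) is computed at most once and then looked up, so the number of real computations grows linearly with n.

```java
import java.util.HashMap;
import java.util.Map;

// Memoized Fibonacci: results are stored together with the argument that produced them.
class MemoFib {
    private static final Map<Integer, Long> memo = new HashMap<>();

    static long fib(int n) {
        if (n <= 1) return n;                 // Fib(0) = 0, Fib(1) = 1
        Long cached = memo.get(n);
        if (cached != null) return cached;    // previously processed input: no recomputation
        long result = fib(n - 1) + fib(n - 2);
        memo.put(n, result);                  // remember the result for argument n
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fib(6));   // 8
        System.out.println(fib(50));  // 12586269025
    }
}
```

The cache is filled with an explicit get/put instead of Map.computeIfAbsent, because HashMap does not allow the mapping function of computeIfAbsent to modify the same map, which a recursive call would do.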

6. Persistent data structures can share structure, which reduces memory usage. For example, consider the list [1, 2, 3, 4] in Haskell and in an imperative language like Java. To produce a new list with an extra element at the front, in Haskell you only have to create one new cons cell (a pair of a value and a reference to the next element) and connect it to the previous list. In Java, with a mutable list, you have to create a completely new list so as not to damage the previous one.
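The Haskell-style sharing can be imitated in Java with an immutable cons cell. In this sketch (the classes Cons and SharingDemo are hypothetical), building 0 : [1, 2, 3, 4] allocates a single new cell, whereas keeping a mutable ArrayList intact requires copying it first.

```java
import java.util.ArrayList;
import java.util.List;

// Immutable cons cell, like Haskell's (x : xs).
final class Cons {
    final int head;
    final Cons tail;   // null marks the end of the list
    Cons(int head, Cons tail) { this.head = head; this.tail = tail; }
}

class SharingDemo {
    public static void main(String[] args) {
        // [1, 2, 3, 4]
        Cons xs = new Cons(1, new Cons(2, new Cons(3, new Cons(4, null))));
        // 0 : xs allocates exactly one new cell; the other four are shared
        Cons ys = new Cons(0, xs);
        System.out.println(ys.tail == xs);      // true: same object, not a copy

        // With a mutable java.util.List, keeping the old version intact means copying it
        List<Integer> a = new ArrayList<>(List.of(1, 2, 3, 4));
        List<Integer> b = new ArrayList<>(a);   // O(n) copy
        b.add(0, 0);
        System.out.println(a + " " + b);        // [1, 2, 3, 4] [0, 1, 2, 3, 4]
    }
}
```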

7. Persistent (immutable) data structures greatly simplify concurrency. Multiple versions can be accessed at the same time, and one can even compare two different versions simultaneously; since no version is ever modified in place, readers of one version do not interfere with threads that create new versions.

8. Content management systems use persistent data structures. For example, real-time collaborative editing lets geographically distributed users work together through individual contributions. Examples of such tools are Google Wave live editing and Google Maps. Such collaborative editing applications not only need to efficiently maintain the editing history of large amounts of data, but are also expected to persistently keep track of all changes to documents. Compared to standalone document editing, co-authored documents are less controlled, especially in an open community. For example, users may query the authorship of recently inserted text or attach metadata (such as a query or doubt about the recent changes made to particular things) to pieces of data.

9. Object database management systems (ODMS) use a persistent model. Whenever changes are made to the database, the program that maintains it uses a data structure that keeps the recent updates while maintaining the database; such programs therefore use persistent data structures.

10. Version control systems for software development use a fully persistent database. It helps keep track of all the previous changes (states) that were made in the past.

A version control system is used to maintain a systematic set of all the versions of files that are made over a long time. Version control systems allow people to access previous revisions of each file and to compare any two revisions to view the changes made between them. In this way, version control keeps a historically accurate and retrievable log of a file's revisions. More importantly, version control systems help several people (even when they are located in different geographical regions) work together on a development project over the Internet by merging their changes into the same source repository.

11. Since it is possible to reconstruct all previous states of a persistent data structure, it is also useful in programs that require backtracking.
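As an illustration of backtracking, the sketch below counts N-queens solutions while passing the placed queens down as an immutable list (the class Queens is hypothetical). Each recursive call receives its own version, so nothing has to be undone when a branch fails: the caller's version is simply still there.

```java
// Immutable list of queen columns, most recently placed row first.
final class Queens {
    final int col;
    final Queens rest;   // null = no queens placed yet
    Queens(int col, Queens rest) { this.col = col; this.rest = rest; }

    // Can a queen go in column c of the next row?
    static boolean safe(Queens placed, int c) {
        int distance = 1;   // row distance to the queen being examined
        for (Queens q = placed; q != null; q = q.rest, distance++) {
            if (q.col == c || Math.abs(q.col - c) == distance) return false;
        }
        return true;
    }

    // Counts solutions. Each branch extends its own version of `placed`,
    // so there is no "remove the last queen" step when backtracking.
    static int solve(int n, int row, Queens placed) {
        if (row == n) return 1;
        int count = 0;
        for (int c = 0; c < n; c++) {
            if (safe(placed, c)) count += solve(n, row + 1, new Queens(c, placed));
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(solve(8, 0, null)); // 92
    }
}
```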

Applications

Persistent Linked List

The singly linked list is one of the most widely used data structures in programming. It consists of a series of nodes linked together one after another. Each node has a reference to the node that comes after it, and the last node in the list terminates with a null reference. To traverse a singly linked list, we begin at the head of the list and move from one node to the next until we reach the node we are looking for or the last node:

Let's insert a new item into the list. This list is not persistent, meaning that it can be changed in place without generating a new version. After looking at the insertion operation on a non-persistent list, we'll look at the same operation on a persistent list.

Inserting a new item into a singly linked list involves creating a new node:

We will insert the new node at the fourth position in the list. First, we traverse the list until we reach that position. Then the node that will precede the new node is unlinked from the next node and relinked to the new node. The new node is, in turn, linked to the remaining nodes in the list, as follows:

Inserting a new item into a persistent singly linked list does not alter the existing list but creates a new version with the item inserted into it. Instead of copying the entire list and then inserting the item into the copy, a better strategy is to reuse as much of the old list as possible. Since the nodes themselves are persistent (immutable), we don't have to worry about aliasing problems.



To insert a new node at the fourth position, we traverse the list as before, but copy each node along the way (up to the desired location). Each copied node is linked to the next copied node:

The last copied node is linked to the new node, and the new node is linked

to the remaining nodes in the old list:
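The persistent insertion can be sketched in Java as follows (the class Node and the helper insertAt are hypothetical names). Only the nodes before the insertion point are copied; the new node is linked to the unchanged suffix of the old list, which both versions share.

```java
// Immutable singly linked list node.
final class Node {
    final int value;
    final Node next;   // null terminates the list
    Node(int value, Node next) { this.value = value; this.next = next; }

    // Returns a new list with x inserted at index i (0-based). The i nodes before
    // the insertion point are copied; the remaining nodes are shared with `list`.
    static Node insertAt(Node list, int i, int x) {
        if (i == 0) return new Node(x, list);                  // new node -> old remainder
        if (list == null) throw new IndexOutOfBoundsException();
        return new Node(list.value, insertAt(list.next, i - 1, x)); // copy this node
    }

    static String show(Node list) {
        StringBuilder sb = new StringBuilder("[");
        for (Node p = list; p != null; p = p.next) sb.append(p.value).append(p.next != null ? ", " : "");
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        Node old = new Node(10, new Node(20, new Node(30, new Node(40, new Node(50, null)))));
        Node updated = insertAt(old, 3, 35);   // fourth position
        System.out.println(show(old));         // [10, 20, 30, 40, 50]
        System.out.println(show(updated));     // [10, 20, 30, 35, 40, 50]
        // The node holding 40 is shared by both versions
        System.out.println(updated.next.next.next.next == old.next.next.next); // true
    }
}
```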

On average, about N/2 nodes will be copied in the persistent version for insertions and deletions, where N is the number of nodes in the list. This isn't very efficient, but it does give us some savings. One persistent data structure for which this approach to the singly linked list saves us a lot is the stack: pushing and popping only touch the head of the list, so no existing nodes need to be copied.
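A persistent stack along these lines can be sketched in Java as follows (PStack and PStackDemo are hypothetical names). push links a new node in front of the old stack, and pop simply returns the old tail, so both operations run in O(1) and every earlier version stays usable.

```java
import java.util.NoSuchElementException;

// Persistent stack backed by an immutable singly linked list.
final class PStack<T> {
    private static final PStack<?> EMPTY = new PStack<>(null, null);
    private final T top;
    private final PStack<T> below;

    private PStack(T top, PStack<T> below) { this.top = top; this.below = below; }

    @SuppressWarnings("unchecked")
    static <T> PStack<T> empty() { return (PStack<T>) EMPTY; }

    boolean isEmpty() { return this == EMPTY; }

    // The old stack becomes the tail of the new one: nothing is copied.
    PStack<T> push(T x) { return new PStack<>(x, this); }

    T peek() {
        if (isEmpty()) throw new NoSuchElementException();
        return top;
    }

    // The tail already is the previous version.
    PStack<T> pop() {
        if (isEmpty()) throw new NoSuchElementException();
        return below;
    }
}

class PStackDemo {
    public static void main(String[] args) {
        PStack<String> s1 = PStack.<String>empty().push("a").push("b");
        PStack<String> s2 = s1.push("c");
        PStack<String> s3 = s1.pop();
        System.out.println(s2.peek() + " " + s1.peek() + " " + s3.peek()); // c b a
    }
}
```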

Persistent Binary Tree

A binary tree is a collection of nodes in which each node contains two links, one to its left child and another to its right child. Each child is itself a node, and either or both of the child nodes can be null, meaning that a node may have zero to two children. In the binary search tree version, each node usually stores a key/value pair. The tree is searched and ordered according to its keys: the key stored at a node is always greater than the keys stored in its left descendants and always less than the keys stored in its right descendants. This makes searching for any particular key very fast, since a search only follows one path from the root downwards.

Here is an example of a binary search tree. The keys are listed as numbers; the values have been omitted but are assumed to exist. Notice how each key is less than the key of its parent as we descend to the left, and vice versa as we descend to the right:

Changing the value of a particular node in a non-persistent tree involves starting at the root of the tree, searching for the key associated with that value, and then changing the value once the node has been found. Changing a persistent tree, on the other hand, generates a new version of the tree. We will use the same strategy in implementing a persistent binary tree as we did for the persistent singly linked list, which is to reuse as much of the data structure as possible when making a new version.

Let's change the value stored in the node with the key 7. As the search for the key leads us down the tree, we copy each node along the way. If we descend to the left, we point the previously copied node's left child to the currently copied node, while the previous node's right child continues to point to nodes in the older version. If we descend to the right, we do just the opposite.

This illustrates the spine of the search down the tree. The red nodes are the only nodes that need to be copied in making a new version of the tree (the rest of the tree is shared with the old version):

Insertions and deletions work in the same way, except that steps should be taken to keep the tree balanced, for example by using an AVL tree. If a binary tree becomes degenerate, we run into the same efficiency problems as we did with the singly linked list.
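Path copying can be sketched in Java as follows (the class Tree and the keys in main are hypothetical, and no rebalancing is done). Each call to put copies only the nodes on the search path and reuses every subtree it does not descend into, so the old root still describes the old version.

```java
// Immutable binary search tree node; put() copies only the nodes on the search path.
final class Tree {
    final int key;
    final String value;
    final Tree left, right;

    Tree(int key, String value, Tree left, Tree right) {
        this.key = key; this.value = value; this.left = left; this.right = right;
    }

    // Returns the root of a new version in which key maps to value.
    // Handles both updating an existing key and inserting a new one.
    static Tree put(Tree t, int key, String value) {
        if (t == null) return new Tree(key, value, null, null);             // insertion
        if (key < t.key) return new Tree(t.key, t.value, put(t.left, key, value), t.right); // right shared
        if (key > t.key) return new Tree(t.key, t.value, t.left, put(t.right, key, value)); // left shared
        return new Tree(key, value, t.left, t.right);                        // update: both shared
    }

    static String get(Tree t, int key) {
        while (t != null) {
            if (key < t.key) t = t.left;
            else if (key > t.key) t = t.right;
            else return t.value;
        }
        return null;
    }

    public static void main(String[] args) {
        Tree v1 = null;
        for (int k : new int[]{8, 3, 10, 1, 6, 14, 4, 7, 13}) v1 = put(v1, k, "v" + k);
        Tree v2 = put(v1, 7, "changed");                      // copies 8, 3, 6 and 7 only
        System.out.println(get(v1, 7) + " " + get(v2, 7));    // v7 changed
        System.out.println(v1.right == v2.right);             // true: subtree of 10 is shared
    }
}
```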

Random Access List

An interesting persistent data structure that combines the singly linked list with the binary tree is Chris Okasaki's random-access list. This data structure allows random access to its items as well as adding and removing items at the beginning of the list. It is structured as a singly linked list of completely balanced binary trees. Its advantage is that it allows access, insertion, and removal at the head of the list in O(1) time, while providing logarithmic performance when accessing items at random positions.

Here is a random-access list with 13 items:

When a node is added to the list, the first two root nodes (if they exist) are checked to see whether they have the same height. If so, the new node is made the parent of the first two nodes: the current head of the list becomes the left child of the new node, and the second root node becomes the right child. If the first two root nodes do not have the same height, the new node is simply placed at the beginning of the list and linked to the current first tree in the list.

To remove the head of the list, the root node at the beginning of the list is

removed, with its left child becoming the new head and its right child

becoming the root of the second tree in the list, which is in turn linked to the next root node in the list:

The algorithm for finding a node at a specific index has two parts: first, we find the tree in the list that contains the node we're looking for; second, we descend into that tree to find the node itself. The following algorithm finds a node in the list at a specific index:

1. Let I be the index of the node we're looking for. Set T to the head of the list, where T will be our reference to the root node of the tree in the list we're currently examining.
2. If I is equal to 0, we've found the node we're looking for; terminate the algorithm. Else, if I is greater than or equal to the number of nodes in T, subtract the number of nodes in T from I, set T to the root of the next tree in the list, and repeat step 2. Else, if I is less than the number of nodes in T, go to step 3.
3. Set S to the number of nodes in T divided by 2, ignoring the fractional part (for example, if the number of nodes in the current sub-tree is 3, S will be 1). S is the size of each of T's two subtrees: the root has index 0, the left subtree holds indices 1 to S, and the right subtree holds indices S + 1 to 2S.
4. If I is less than or equal to S, subtract 1 from I and set T to T's left child. Else, subtract (S + 1) from I and set T to T's right child.
5. If I is equal to 0, we've found the node we're looking for; terminate the algorithm. Else, go to step 3.
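Below is a Java sketch of the random-access list (the class RAList and its method names are hypothetical). cons merges the first two trees when they have the same size, which for complete binary trees is the same as having the same height; tail splits the first tree into its two children; get first skips whole trees and then descends, treating the root of a tree as index 0, its left subtree as indices 1 to S and its right subtree as S + 1 to 2S.

```java
import java.util.NoSuchElementException;

// Sketch of Okasaki's random-access list: a singly linked list of complete binary trees.
// Every operation returns a new version; existing versions are never modified.
final class RAList<T> {
    private static final class Tree<E> {
        final E item;
        final Tree<E> left, right;
        final int size;                   // number of nodes: 1, 3, 7, 15, ...
        Tree(E item, Tree<E> left, Tree<E> right) {
            this.item = item; this.left = left; this.right = right;
            this.size = 1 + (left == null ? 0 : left.size + right.size);
        }
    }

    private final Tree<T> tree;           // first tree; null for the empty list
    private final RAList<T> next;         // the remaining trees

    private RAList(Tree<T> tree, RAList<T> next) { this.tree = tree; this.next = next; }

    static <T> RAList<T> empty() { return new RAList<>(null, null); }

    boolean isEmpty() { return tree == null; }

    // O(1): merge the first two trees under a new root if they have the same size.
    RAList<T> cons(T x) {
        if (!isEmpty() && !next.isEmpty() && tree.size == next.tree.size)
            return new RAList<>(new Tree<>(x, tree, next.tree), next.next);
        return new RAList<>(new Tree<>(x, null, null), this);
    }

    T head() {
        if (isEmpty()) throw new NoSuchElementException();
        return tree.item;
    }

    // O(1): drop the first root; its children become the first two trees.
    RAList<T> tail() {
        if (isEmpty()) throw new NoSuchElementException();
        if (tree.size == 1) return next;
        return new RAList<>(tree.left, new RAList<>(tree.right, next));
    }

    // O(log n): find the tree that holds index i, then descend into it.
    T get(int i) {
        if (i < 0) throw new IndexOutOfBoundsException(String.valueOf(i));
        RAList<T> list = this;
        while (!list.isEmpty() && i >= list.tree.size) {   // skip whole trees
            i -= list.tree.size;
            list = list.next;
        }
        if (list.isEmpty()) throw new IndexOutOfBoundsException();
        Tree<T> t = list.tree;
        while (i != 0) {
            int s = t.size / 2;                            // size of each subtree
            if (i <= s) { i -= 1; t = t.left; }
            else { i -= s + 1; t = t.right; }
        }
        return t.item;
    }

    public static void main(String[] args) {
        RAList<Integer> list = RAList.empty();
        for (int k = 12; k >= 0; k--) list = list.cons(k);  // 13 items: 0, 1, ..., 12
        System.out.println(list.get(10));                    // 10
        System.out.println(list.tail().get(0));              // 1
    }
}
```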

This illustrates using the algorithm to find the 10th item in the list:

Keep in mind that operations that change a random-access list do not change the existing list but rather generate a new version representing the change, reusing as much of the old list as possible.



Planar Point Location

Planar point location is one of the fundamental topics of computational geometry. It has several uses, for example in Global Positioning Systems. We are given an arrangement of polygons that divides the plane into regions: the Euclidean plane is subdivided into polygons by n line segments that intersect only at their endpoints.

Query: given a query point p, determine which polygon contains p.

We measure the data structure by three parameters:

Preprocessing time

Query time

Space


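The plane can be divided into multiple vertical slabs by drawing a vertical line through every segment endpoint.

Within each vertical slab we can then perform a search to locate the point.

Within each slab the lines are totally ordered, since no two of them cross inside the slab.

Now we can create a search tree, Ti, for each slab i, containing the lines of that slab, and associate with each line the region that lies directly above it.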
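The slab idea can be sketched in Java as below. This is a non-persistent version written for illustration (the class SlabLocator and its methods are hypothetical): it sorts the slab boundaries, stores each slab's segments ordered bottom to top, and answers a query with two binary searches, one over the slabs and one inside the chosen slab. Mapping the returned rank to a polygon name is left out. Storing every slab separately can take quadratic space in the worst case; making the per-slab search trees persistent, so that neighbouring slabs share structure, is the known way (due to Sarnak and Tarjan) to bring this down to linear space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Slab-decomposition sketch for planar point location (not persistent).
// Assumes non-vertical segments that meet only at their endpoints.
class SlabLocator {
    static final class Seg {
        final double x1, y1, x2, y2;          // endpoints, with x1 < x2
        Seg(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
        double yAt(double x) { return y1 + (y2 - y1) * (x - x1) / (x2 - x1); }
    }

    private final double[] xs;                                // sorted slab boundaries
    private final List<List<Seg>> slabs = new ArrayList<>();  // slab i spans [xs[i], xs[i+1]]

    SlabLocator(List<Seg> segs) {
        TreeSet<Double> bounds = new TreeSet<>();             // one vertical line per endpoint
        for (Seg s : segs) { bounds.add(s.x1); bounds.add(s.x2); }
        xs = bounds.stream().mapToDouble(Double::doubleValue).toArray();
        for (int i = 0; i + 1 < xs.length; i++) {
            final double mid = (xs[i] + xs[i + 1]) / 2;
            List<Seg> inSlab = new ArrayList<>();
            for (Seg s : segs) if (s.x1 <= xs[i] && s.x2 >= xs[i + 1]) inSlab.add(s);
            // No two segments cross inside a slab, so ordering by y at the midpoint is a total order
            inSlab.sort((a, b) -> Double.compare(a.yAt(mid), b.yAt(mid)));
            slabs.add(inSlab);
        }
    }

    // Returns how many segments of the containing slab lie below (px, py),
    // or -1 if px is outside every slab. The rank tells which two consecutive
    // segments of the slab the point lies between.
    int locate(double px, double py) {
        int i = Arrays.binarySearch(xs, px);
        if (i < 0) i = -i - 2;                 // slab whose left boundary is left of px
        if (i < 0 || i >= slabs.size()) return -1;
        List<Seg> slab = slabs.get(i);
        int lo = 0, hi = slab.size();          // binary search for the first segment not below the point
        while (lo < hi) {
            int m = (lo + hi) >>> 1;
            if (slab.get(m).yAt(px) < py) lo = m + 1; else hi = m;
        }
        return lo;
    }
}
```

For a triangle with corners (0, 0), (4, 0) and (2, 4), a point inside the triangle gets rank 1 (one edge below it), a point above it rank 2, and a point below it rank 0.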

Methods to make the data structure persistent

The Fat-Node Method

This method records all changes made to node fields in the nodes themselves, without erasing old values of the fields. This requires that we allow nodes to become arbitrarily "fat", i.e., to hold an arbitrary number of values of each field. To be more precise, each fat node contains the same information and pointer fields as an ephemeral node (holding the original field values), along with space for an arbitrary number of extra field values. Each extra field value has an associated field name and a version stamp. The version stamp indicates the version in which the named field was changed to have the specified value. In addition, each fat node has its own version stamp, indicating the version in which the node was created.
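A fat node as described above can be sketched in Java like this, assuming partial persistence (only the newest version is modified). The class FatNode, its Extra records and the string field names are hypothetical. A read for version v scans the extra values from newest to oldest and returns the first one for that field whose stamp is at most v, falling back to the original value; a real implementation would typically keep one sorted list per field and use binary search instead of a linear scan.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Fat-node sketch (partial persistence): a node keeps its original field values
// plus an unbounded log of extra values, each tagged with a field name and version stamp.
final class FatNode {
    private static final class Extra {
        final String field; final int version; final Object value;
        Extra(String field, int version, Object value) {
            this.field = field; this.version = version; this.value = value;
        }
    }

    final int createdAt;                                  // version stamp of the node itself
    private final Map<String, Object> original;           // values as in an ephemeral node
    private final List<Extra> extras = new ArrayList<>(); // appended in version order

    FatNode(int createdAt, Map<String, Object> fields) {
        this.createdAt = createdAt;
        this.original = new HashMap<>(fields);
    }

    // Records that `field` has `value` from `version` on; old values are never erased.
    void set(String field, int version, Object value) {
        int newest = extras.isEmpty() ? createdAt : extras.get(extras.size() - 1).version;
        if (version < newest) throw new IllegalArgumentException("only the newest version can be modified");
        extras.add(new Extra(field, version, value));
    }

    // Value of `field` in `version`: the latest extra value stamped at or before it,
    // otherwise the original value.
    Object get(String field, int version) {
        if (version < createdAt) throw new IllegalArgumentException("node does not exist in version " + version);
        for (int i = extras.size() - 1; i >= 0; i--) {
            Extra e = extras.get(i);
            if (e.version <= version && e.field.equals(field)) return e.value;
        }
        return original.get(field);
    }

    public static void main(String[] args) {
        FatNode n = new FatNode(0, Map.of("key", 7, "value", "a"));
        n.set("value", 1, "b");
        n.set("value", 3, "c");
        System.out.println("" + n.get("value", 0) + n.get("value", 2) + n.get("value", 3)); // abc
    }
}
```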

The Node-Copying Method

It allows nodes in the persistent structure to hold only a fixed number of

field values. When we run out of space in a node, we create a new copy of

the node, containing only the newest value of each field. We must also store

pointers to the new copy in all predecessors of the copied node in the

newest version. If there is no space in a predecessor for such a pointer, the

predecessor, too, must be copied. Nevertheless, if we assume that the underlying ephemeral structure has nodes of constant-bounded in-degree and we allow sufficient extra space in each node of the persistent structure, then we can derive an O(1) amortized bound on the number of nodes copied and the time required per update step.
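The node-copying idea can be sketched for the simplest case, a partially persistent singly linked list whose nodes have in-degree at most one (the class NodeCopyingList and its methods are hypothetical). Each node keeps its original fields plus a single extra slot for one (version, field, value) change. When that slot is already used, the node is copied with its newest values, and the predecessor must be updated to point to the copy, which may in turn force the predecessor to be copied, possibly all the way up to the head of the list.

```java
import java.util.ArrayList;
import java.util.List;

// Node-copying sketch for a partially persistent singly linked list of ints.
class NodeCopyingList {
    private static final class Node {
        final int value;              // original fields, never overwritten
        final Node next;
        int modVersion = -1;          // the single extra slot; -1 means empty
        String modField;
        Object modValue;

        Node(int value, Node next) { this.value = value; this.next = next; }

        int value(int v) {            // field as seen by version v
            return modVersion != -1 && modVersion <= v && modField.equals("value") ? (Integer) modValue : value;
        }

        Node next(int v) {
            return modVersion != -1 && modVersion <= v && modField.equals("next") ? (Node) modValue : next;
        }
    }

    private final List<Node> roots = new ArrayList<>();   // roots.get(v) = head in version v

    NodeCopyingList(int... items) {
        Node head = null;
        for (int i = items.length - 1; i >= 0; i--) head = new Node(items[i], head);
        roots.add(head);                                   // version 0
    }

    int get(int version, int index) {
        Node n = roots.get(version);
        for (int k = 0; k < index; k++) n = n.next(version);
        return n.value(version);
    }

    // Creates a new version in which element `index` is x; returns its version number.
    int set(int index, int x) {
        int cur = roots.size() - 1, nv = cur + 1;
        roots.add(roots.get(cur));                         // new version starts with the same head
        List<Node> path = new ArrayList<>();               // nodes from the head to the target
        Node n = roots.get(cur);
        for (int k = 0; k <= index; k++) { path.add(n); n = n.next(cur); }
        write(path, index, "value", x, nv);
        return nv;
    }

    // Stores field := val in path[i] for version nv, copying nodes whose slot is full.
    private void write(List<Node> path, int i, String field, Object val, int nv) {
        Node n = path.get(i);
        if (n.modVersion == -1) {                          // room left: record the change in place
            n.modVersion = nv; n.modField = field; n.modValue = val;
            return;
        }
        // Slot full: copy the node with its newest field values plus this change.
        Node copy = field.equals("value")
                ? new Node((Integer) val, n.next(nv))
                : new Node(n.value(nv), (Node) val);
        if (i == 0) roots.set(nv, copy);                   // head copied: new root for this version
        else write(path, i - 1, "next", copy, nv);         // predecessor must point to the copy
    }

    public static void main(String[] args) {
        NodeCopyingList list = new NodeCopyingList(1, 2, 3);
        int v1 = list.set(1, 20);    // fills the extra slot of the node holding 2
        int v2 = list.set(1, 200);   // slot full: node copied, predecessor's slot used
        System.out.println(list.get(0, 1) + " " + list.get(v1, 1) + " " + list.get(v2, 1)); // 2 20 200
    }
}
```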



Simple Application Code

An application program implementing a persistent data structure:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

class ImmutableBuilder {

    // Wraps a handler in a run-time generated proxy that implements the target interface
    @SuppressWarnings("unchecked")
    static <T> T of(Immutable immutable) {
        Class<?> targetClass = immutable.getTargetClass();
        return (T) Proxy.newProxyInstance(targetClass.getClassLoader(),
                new Class<?>[]{targetClass},
                immutable);
    }

    // Creates the first version, with no fields set
    public static <T> T of(Class<T> aClass) {
        return of(new Immutable(aClass, new HashMap<>()));
    }
}

class Immutable implements InvocationHandler {

    private final Class<?> targetClass;
    private final Map<String, Object> fields;   // field values of this one version

    public Immutable(Class<?> aTargetClass, Map<String, Object> immutableFields) {
        targetClass = aTargetClass;
        fields = immutableFields;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getName().equals("toString")) {
            // XXX: toString() result can be cached
            return fields.toString();
        }
        if (method.getName().equals("hashCode")) {
            // XXX: hashCode() result can be cached
            return fields.hashCode();
        }
        if (method.getName().equals("equals") && args != null && args.length == 1) {
            // Two versions are equal when they hold the same field values
            Object other = args[0];
            if (other == null || !Proxy.isProxyClass(other.getClass())) {
                return false;
            }
            InvocationHandler h = Proxy.getInvocationHandler(other);
            return h instanceof Immutable && fields.equals(((Immutable) h).fields);
        }
        // XXX: naming policy here
        String fieldName = method.getName();
        if (method.getReturnType().equals(targetClass)) {
            // "Setter": copy the fields into a new version; this version stays unchanged
            Map<String, Object> newFields = new HashMap<>(fields);
            newFields.put(fieldName, args[0]);
            return ImmutableBuilder.of(new Immutable(targetClass, newFields));
        } else {
            // "Getter": read the value stored in this version
            return fields.get(fieldName);
        }
    }

    public Class<?> getTargetClass() {
        return targetClass;
    }

    public static void main(String[] args) {
        Person mark = ImmutableBuilder.of(Person.class).name("mark");
        Person a = mark.age(34);
        Person john = mark.name("john");
        Person b = john.age(21);
        System.out.println(mark);
        System.out.println(john);
        System.out.println(a);
        System.out.println(b);
    }
}

interface Person {
    String name();
    Person name(String name);
    int age();
    Person age(int age);
}

Explanations:

IMMUTABLE BUILDER: ImmutableBuilder.of() wraps an Immutable handler, which holds the field values of one version, in a proxy object that implements the requested interface. ImmutableBuilder.of(Person.class) therefore returns an empty Person, and each slightly modified copy of an immutable instance is obtained by wrapping a new handler in a new proxy.

INVOCATION HANDLER: InvocationHandler is the interface implemented by the invocation handler of a proxy instance. Each proxy instance has an associated invocation handler. When a method is invoked on a proxy instance, the method invocation is encoded and dispatched to the invoke method of its invocation handler.

We have defined an interface called Person, which has name and age as its attributes, along with getters and setters for them. Since an interface cannot be instantiated directly, we create a proxy object for it. Whenever we call the setter of an attribute, a new object is created instead of the existing one being modified. For that we use proxy classes (classes generated at run time) whose invocation handler handles calls to every method: whenever a setter whose return type is Person is called, the handler copies the current fields, sets the new value in the copy, and returns a new object wrapping it. In main, mark, a, john and b are four separate versions, and printing them shows that none of the earlier versions was changed.
