persistent data structure
8/7/2019 PERSISTENT DATA STRUCTURE
PERSISTENT DATA STRUCTURE
1. Introduction
2. Requirement of Persistent Data Structure
3. Ephemeral v/s Persistent Data Structure
4. Advantages
5. Applications
   a. Persistent Linked List
   b. Persistent Binary Tree
   c. Random Access List
   d. Planar Point Location
6. Methods to make the data structure persistent
   a. The Fat-Node Method
   b. The Node-Copying Method
7. Simple Application Code
Introduction
A persistent data structure is a data structure that always preserves the
previous version of itself when it is modified. Such a data structure is
effectively immutable, as its operations do not visibly update the structure
in place but instead always yield a new, updated structure.
The data structure is partially persistent if all versions can be accessed but
only the newest version can be modified.
The data structure is fully persistent if every version can be both accessed
and modified.
If there is also a meld or merge operation that can create a new version
from two previous versions, the data structure is called confluently
persistent. In such data structures, combinators combine more than one
previous version into a single new version. Rather than a branching tree,
combinations of versions induce a DAG (directed acyclic graph) structure on
the version graph.
Structures that do not preserve previous versions are called ephemeral.
Requirement of Persistent Data structure
The typical data structures most programmers know and use require
imperative programming: they fundamentally depend on replacing the
values of fields with assignment statements. A particular data structure
represents the state of something at that particular moment in time, and
that moment only. If you want to know what the state was in the past, you
need to have made a copy of the entire data structure back then and kept it
around until you needed it.
Alternatively, you could keep a log of changes made to the data structure
that you could play in reverse until you get the previous state - and then
play it back forwards to get back to where you are now. Both these
techniques are typically used to implement undo/redo.
Or you could use a persistent data structure. A persistent data structure
allows you to access previous versions at any time without having to do any
copying. All you need to do at the time is save a pointer to the data
structure. If you have a persistent data structure, your undo/redo
implementation is simply a stack of pointers onto which you push a pointer
after you make any change to the data structure.
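The undo/redo idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names VersionHistory and Words are hypothetical and do not come from the original slides, and the versioned structure is a minimal immutable word list.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Undo/redo history that stores only references to immutable versions (hypothetical name). */
final class VersionHistory<V> {
    private final Deque<V> undoStack = new ArrayDeque<>();
    private final Deque<V> redoStack = new ArrayDeque<>();
    private V current;

    VersionHistory(V initial) { current = initial; }

    V current() { return current; }

    /** Records a new version; the old one is kept by reference, not copied. */
    void commit(V next) {
        undoStack.push(current);
        redoStack.clear();
        current = next;
    }

    /** Steps back one version; returns false if there is nothing to undo. */
    boolean undo() {
        if (undoStack.isEmpty()) return false;
        redoStack.push(current);
        current = undoStack.pop();
        return true;
    }

    /** Steps forward one version; returns false if there is nothing to redo. */
    boolean redo() {
        if (redoStack.isEmpty()) return false;
        undoStack.push(current);
        current = redoStack.pop();
        return true;
    }
}

/** Minimal persistent list of words used as the versioned structure (hypothetical name). */
final class Words {
    static final Words EMPTY = new Words(null, null);
    final String head;
    final Words tail; // shared with older versions, never mutated

    private Words(String head, Words tail) { this.head = head; this.tail = tail; }

    /** Returns a new version with the word appended; this version is untouched. */
    Words add(String word) { return new Words(word, this); }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Words p = this; p != EMPTY; p = p.tail) sb.insert(0, p.head + " ");
        return sb.toString().trim();
    }
}
```

Each commit costs one push of a reference; no version of the word list is ever copied.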
Advantages
1. In text editing: while editing text, one can access recent previous
changes made to the text through the undo facility.
2. In file editing.
3. Planar point location: the point location problem is a fundamental topic
of computational geometry. It finds applications in areas that deal with
processing geometrical data, for example GPS and Google Maps. It can be
posed as 1-dimensional or 2-dimensional point location (an example is shown
later).
4. A persistent data structure is immutable, which not only lets us make
more assumptions about the data but also expands the capabilities of the
language.
5. With such an immutable data structure and a functional style we can also
freely use memoization. (In computing, memoization is an optimization
technique used primarily to speed up computer programs by having function
calls avoid repeating the calculation of results for previously processed
inputs.)
Often the application of a certain function to a particular set of arguments
happens repeatedly during the execution of a program. In such cases it is
better to store the result of the computation in a table together with the
function-argument combination that gave rise to it, so that we can avoid
computing the value from the beginning. An example of how memoization can
speed up a deterministic computation comes from the Fibonacci function.
The Fibonacci function is defined mathematically by the following rule. By
definition, the first two Fibonacci numbers are 0 and 1, and each subsequent
number is the sum of the previous two.
Fib(n) = Fib(n-1) + Fib(n-2)
where n is the index of the Fibonacci number to be calculated.
Here we take the example of finding Fib(6).
public class Fibonacci {
    // Direct translation of the definition: Fib(0) = 0, Fib(1) = 1.
    public static long fib(int n) {
        if (n <= 1) return n;
        return fib(n - 1) + fib(n - 2);
    }
}
Computed this way, fib recomputes the same values many times and takes
exponential time. Memoization turns this procedure into one that takes
linear time.
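A memoized version could look like the following sketch. The class name MemoFibonacci and the use of a HashMap as the table are assumptions made for illustration, not the code from the original slides.

```java
import java.util.HashMap;
import java.util.Map;

final class MemoFibonacci {
    // Table of previously computed results, keyed by argument.
    private static final Map<Integer, Long> cache = new HashMap<>();

    static long fib(int n) {
        if (n <= 1) return n;                 // Fib(0) = 0, Fib(1) = 1
        Long cached = cache.get(n);
        if (cached != null) return cached;    // reuse a stored result
        long result = fib(n - 1) + fib(n - 2);
        cache.put(n, result);                 // store it for later calls
        return result;
    }
}
```

Each value Fib(k) is computed once and then looked up, so MemoFibonacci.fib(6) returns 8 after a linear number of additions.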
6. Persistent data structures can share structure, which reduces memory
usage. For example, consider the list [1, 2, 3, 4] in Haskell and in an
imperative language like Java. To produce a new list with an extra element
at the front in Haskell, you only have to create a new cons cell (a pair of
a value and a reference to the next element) and connect it to the previous
list. In Java, with a mutable list, you have to create a completely new list
so as not to damage the previous one.
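The cons-cell idea can also be expressed in Java when the list is written as an immutable type. The following is a minimal sketch; the class name Cons and the use of null for the empty list are assumptions made for illustration.

```java
/** Immutable cons cell: a value plus a reference to the rest of the list (null = empty). */
final class Cons<T> {
    final T head;
    final Cons<T> tail;

    Cons(T head, Cons<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    /** Number of cells reachable from this one. */
    static <T> int length(Cons<T> list) {
        int n = 0;
        for (Cons<T> p = list; p != null; p = p.tail) n++;
        return n;
    }
}
```

Prepending 0 to the list [1, 2, 3, 4] is then `new Cons<>(0, oldList)`: one new cell is allocated, and its tail is the old list itself, which remains usable and unchanged.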
7. Persistent (immutable) data structures greatly improve concurrency, since
multiple versions can be accessed at the same time and one can even compare
two different versions simultaneously. This feature makes persistent data
structures well suited to concurrent use.
8. Content management systems use persistent data structures. For example,
real-time collaborative editing lets geographically distributed users work
together through individual contributions. Examples of such tools are Google
Wave live editing and Google Maps. Such collaborative editing applications
not only need to maintain all the editing histories for large amounts of
data efficiently, but are also expected to keep track of all changes to
documents persistently. Compared to standalone document editing, co-authored
documents are less controlled, especially in an open community. For example,
users may query the authorship of recently inserted text or attach meta-data
(for example, a question about recent changes) to pieces of data.
9. Object database management systems (ODMS) use a persistent model.
Whenever changes are made to the database, the program that maintains the
database uses a data structure that keeps track of recent updates; such
programs therefore use persistent data structures.
10. Version control systems for software development use a fully persistent
database. It helps in keeping track of all the previous changes (states)
that were made in the past.
A version control system is used to maintain a systematic set of all the
versions of files that are made over a long time. Version control systems
allow people to access previous revisions of each file and to compare any
two revisions to view the changes made between them. In this way, version
control keeps a historically accurate and retrievable log of a file's
revisions. More importantly, version control systems help several people
(including people located in different geographical regions) to work
together on a development project over the Internet by merging their changes
into the same source repository.
11. Since it is possible to reconstruct all previous states of the data
structure, it is also useful in programs that require backtracking.
Applications
Persistent Linked List
The singly linked list is one of the most widely used data structures in
programming. It consists of a series of nodes linked together one right
after another. Each node has a reference to the node that comes after it,
and the last node in the list terminates with a null reference.
To traverse a singly linked list, we begin at the head of the list and move
from one node to the next until we have reached the node we are looking for
or the last node.
Let's insert a new item into the list. This list is not persistent, meaning
that it can be changed in place without generating a new version. After
taking a look at the insertion operation on a non-persistent list, we'll
look at the same operation on a persistent list.
Inserting a new item into a singly linked list involves creating a new node.
We will insert the new node at the fourth position in the list. First, we
traverse the list until we've reached that position. Then the node that will
precede the new node is unlinked from the next node and relinked to the new
node. The new node is, in turn, linked to the remaining nodes in the list.
Inserting a new item into a persistent singly linked list will not alter the
existing list but will create a new version with the item inserted into it.
Instead of copying the entire list and then inserting the item into the
copy, it is always a better strategy to reuse as much of the old list as
possible. Since the nodes themselves are persistent, we don't have to worry
about aliasing problems.
To insert a new node at the fourth position, we traverse the list as before,
only copying each node along the way (up to the desired location). Each
copied node is linked to the next copied node.
The last copied node is linked to the new node, and the new node is linked
to the remaining nodes in the old list.
On average, about N/2 nodes will be copied in the persistent version for
insertions and deletions, where N is the number of nodes in the list. This
isn't very efficient but does give us some savings. One persistent data
structure where this approach to the singly linked list saves us a lot is
the stack, because a push or pop touches only the head and copies nothing.
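The copy-the-prefix insertion described above can be sketched as follows. The class name PNode and its helper methods are hypothetical; positions are counted from 0, so the fourth position is index 3.

```java
import java.util.StringJoiner;

/** Immutable singly linked list node holding an int (hypothetical name). */
final class PNode {
    final int value;
    final PNode next; // null marks the end of the list

    PNode(int value, PNode next) {
        this.value = value;
        this.next = next;
    }

    /**
     * Returns a new version with value inserted at index. The first `index`
     * nodes are copied; every node from `index` onward is shared with the
     * old version, which is left untouched.
     */
    static PNode insert(PNode list, int index, int value) {
        if (index == 0) return new PNode(value, list);
        if (list == null) throw new IndexOutOfBoundsException("index past end of list");
        return new PNode(list.value, insert(list.next, index - 1, value));
    }

    /** Builds a list from the given values, front to back. */
    static PNode of(int... values) {
        PNode list = null;
        for (int i = values.length - 1; i >= 0; i--) list = new PNode(values[i], list);
        return list;
    }

    static String show(PNode list) {
        StringJoiner sj = new StringJoiner(", ", "[", "]");
        for (PNode p = list; p != null; p = p.next) sj.add(Integer.toString(p.value));
        return sj.toString();
    }
}
```

Inserting at index 0 is the stack push: it allocates a single node and copies nothing, which is why the stack benefits most from this representation.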
Persistent Binary Tree
A binary tree is a collection of nodes in which each node contains two
links, one to its left child and another to its right child. Each child is
itself a node, and either or both of the child nodes can be null, meaning
that a node may have zero to two children. In the binary search tree
version, each node usually stores a key/value pair. The tree is searched and
ordered according to its keys. The key stored at a node is always greater
than the keys stored in its left descendants and always less than the keys
stored in its right descendants. This makes searching for any particular key
very fast.
Consider an example of a binary search tree. The keys are numbers; the
values are omitted but are assumed to exist. Each key we reach by descending
to the left is less than the key of its parent, and each key we reach by
descending to the right is greater.
Changing the value of a particular node in a non-persistent tree involves
starting at the root of the tree, searching for the key associated with that
value, and then changing the value once the node has been found. Changing a
persistent tree, on the other hand, generates a new version of the tree. We
will use the same strategy in implementing a persistent binary tree as we
did for the persistent singly linked list, which is to reuse as much of the
data structure as possible when making a new version.
Let's change the value stored in the node with the key 7. As the search for
the key leads us down the tree, we copy each node along the way. If we
descend to the left, we point the previously copied node's left child to the
currently copied node. The previous node's right child continues to point to
nodes in the older version. If we descend to the right, we do just the
opposite.
This path is the spine of the search down the tree. The nodes on the spine
(shown in red in the figure) are the only nodes that need to be copied in
making a new version of the tree; the rest of the tree remains the same.
Insertions and deletions work in the same way, except that steps should be
taken to keep the tree in balance, such as using an AVL tree. If a binary
tree becomes degenerate, we run into the same efficiency problems as we did
with the singly linked list.
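The spine-copying update can be sketched like this. The class name PTree is hypothetical, keys are ints and values are Strings purely for illustration, and no balancing is attempted, so this is not an AVL tree.

```java
/** Immutable binary search tree node (hypothetical name, unbalanced). */
final class PTree {
    final int key;
    final String value;
    final PTree left, right;

    PTree(int key, String value, PTree left, PTree right) {
        this.key = key;
        this.value = value;
        this.left = left;
        this.right = right;
    }

    /**
     * Returns a new version in which key maps to value. Only the nodes on
     * the search path are copied; every subtree off the path is shared.
     */
    static PTree put(PTree t, int key, String value) {
        if (t == null) return new PTree(key, value, null, null);
        if (key < t.key) return new PTree(t.key, t.value, put(t.left, key, value), t.right);
        if (key > t.key) return new PTree(t.key, t.value, t.left, put(t.right, key, value));
        return new PTree(key, value, t.left, t.right); // key found: copy with the new value
    }

    /** Ordinary binary search; works on any version. */
    static String get(PTree t, int key) {
        while (t != null) {
            if (key == t.key) return t.value;
            t = key < t.key ? t.left : t.right;
        }
        return null;
    }
}
```

After `PTree v2 = PTree.put(v1, 7, "changed")`, both v1 and v2 remain valid roots: v1 still answers queries with the old value, and v2 shares every subtree that the search for 7 did not pass through.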
Random Access List
An interesting persistent data structure that combines the singly linked
list with the binary tree is Chris Okasaki's random-access list. This data
structure allows for random access of its items as well as adding and
removing items from the beginning of the list. It is structured as a singly
linked list of completely balanced binary trees. The advantage of this data
structure is that it allows access, insertion, and removal of the head of
the list in O(1) time and provides logarithmic performance in randomly
accessing its items.
Consider a random-access list with 13 items.
When a node is added to the list, the first two root nodes (if they exist)
are checked to see if they both have the same height. If so, the new node is
made the parent of the first two nodes; the current head of the list is made
the left child of the new node, and the second root node is made the right
child. If the first two root nodes do not have the same height, the new node
is simply placed at the beginning of the list and linked to the next tree in
the list.
To remove the head of the list, the root node at the beginning of the list
is removed, with its left child becoming the new head and its right child
becoming the root of the second tree in the list. The new head of the list
is linked to its right child, which is in turn linked to the next root node
in the list.
The algorithm for finding a node at a specific index is in two parts: in the
first part, we find the tree in the list that contains the node we're
looking for. In the second part, we descend into the tree to find the node
itself. The following algorithm is used to find a node in the list at a
specific index:
1. Let I be the index of the node we're looking for. Set T to the head of
   the list, where T will be our reference to the root node of the current
   tree in the list we're examining.
2. If I is equal to 0, we've found the node we're looking for; terminate
   the algorithm. Else if I is greater than or equal to the number of nodes
   in T, subtract the number of nodes in T from I, set T to the root of the
   next tree in the list, and repeat step 2. Else if I is less than the
   number of nodes in T, go to step 3.
3. Set S to the number of nodes in T divided by 2 (the fractional part of
   the division is ignored; for example, if the number of nodes in the
   current sub-tree is 3, S will be 1).
4. If I is less than or equal to S, subtract 1 from I and set T to T's left
   child. Else subtract (S + 1) from I and set T to T's right child.
5. If I is equal to 0, we've found the node we're looking for; terminate
   the algorithm. Else go to step 3.
The same steps are used, for example, to find the 10th item in the list.
Keep in mind that operations that change a random-access list do not change
the existing list but rather generate a new version representing the change.
As much of the old list as possible is reused in creating a new version.
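The add, remove, and lookup operations described above can be sketched as follows. The class and method names (RList, cons, tail, get) are hypothetical. One assumption differs slightly from the text: each list cell stores the number of nodes in its tree, and sizes are compared instead of heights, which is equivalent because the trees are complete.

```java
/** Persistent random-access list: a linked list of complete binary trees (hypothetical names). */
final class RList<T> {
    static final class Tree<E> {
        final E value;
        final Tree<E> left, right;
        Tree(E value, Tree<E> left, Tree<E> right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    final int size;       // number of nodes in this cell's tree
    final Tree<T> tree;   // root of this cell's tree
    final RList<T> rest;  // next cell, or null at the end of the list

    private RList(int size, Tree<T> tree, RList<T> rest) {
        this.size = size; this.tree = tree; this.rest = rest;
    }

    /** Adds x at the front; a null list is the empty list. */
    static <T> RList<T> cons(T x, RList<T> list) {
        if (list != null && list.rest != null && list.size == list.rest.size) {
            // First two trees match: the new node becomes their parent.
            return new RList<>(1 + 2 * list.size,
                    new Tree<>(x, list.tree, list.rest.tree), list.rest.rest);
        }
        return new RList<>(1, new Tree<>(x, null, null), list);
    }

    static <T> T head(RList<T> list) { return list.tree.value; }

    /** Removes the front item; the old list is left untouched. */
    static <T> RList<T> tail(RList<T> list) {
        if (list.size == 1) return list.rest;
        int half = list.size / 2;
        return new RList<>(half, list.tree.left, new RList<>(half, list.tree.right, list.rest));
    }

    /** Lookup by index, following steps 1 to 5 of the algorithm above. */
    static <T> T get(RList<T> list, int i) {
        if (i < 0) throw new IndexOutOfBoundsException("negative index");
        while (list != null && i >= list.size) { // part 1: find the tree
            i -= list.size;
            list = list.rest;
        }
        if (list == null) throw new IndexOutOfBoundsException("index past end of list");
        Tree<T> t = list.tree;
        int size = list.size;
        while (i != 0) {                         // part 2: descend into the tree
            int s = size / 2;
            if (i <= s) { i -= 1; t = t.left; }
            else { i -= s + 1; t = t.right; }
            size = s;
        }
        return t.value;
    }
}
```

Building a 13-item list with repeated cons calls yields trees of 3, 3, and 7 nodes, in that order.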
Planar Point Location
Planar point location is one of the fundamental topics of computational
geometry. It has several uses, such as in Global Positioning Systems. An
arrangement of polygons divides the plane into regions: the Euclidean plane
is subdivided into polygons by n line segments that intersect only at their
endpoints.
Query: given a query point p, determine which polygon contains p.
A data structure for this problem is measured by three parameters:
Preprocessing time
Query time
Space
The plane can be divided into multiple vertical slabs, and within each
vertical slab we can use a search method to locate the point.
Within each slab the lines are totally ordered.
We can therefore create a search tree Ti per slab containing the lines that
cross it, and associate with each line the polygon directly above it.
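The slab idea can be sketched with sorted lists standing in for the per-slab search trees. Everything here is an illustrative assumption: the names SlabLocator and Segment are hypothetical, segments are assumed non-vertical with x1 < x2 and non-crossing, and the query returns the label of the segment directly below the point rather than a polygon.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class SlabLocator {
    /** Non-vertical segment with x1 < x2 (assumption of this sketch). */
    static final class Segment {
        final String label;
        final double x1, y1, x2, y2;
        Segment(String label, double x1, double y1, double x2, double y2) {
            this.label = label; this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
        double yAt(double x) { return y1 + (y2 - y1) * (x - x1) / (x2 - x1); }
    }

    private final double[] xs;                                   // slab boundaries, sorted
    private final List<List<Segment>> slabs = new ArrayList<>(); // per slab, bottom to top

    SlabLocator(List<Segment> segments) {
        TreeSet<Double> cuts = new TreeSet<>();
        for (Segment s : segments) { cuts.add(s.x1); cuts.add(s.x2); }
        xs = cuts.stream().mapToDouble(Double::doubleValue).toArray();
        for (int i = 0; i + 1 < xs.length; i++) {
            final double lo = xs[i], hi = xs[i + 1], mid = (lo + hi) / 2;
            List<Segment> crossing = new ArrayList<>();
            for (Segment s : segments) {
                if (s.x1 <= lo && s.x2 >= hi) crossing.add(s);   // spans the whole slab
            }
            // Non-crossing segments are totally ordered inside a slab.
            crossing.sort(Comparator.comparingDouble((Segment s) -> s.yAt(mid)));
            slabs.add(crossing);
        }
    }

    /** Label of the segment directly below (px, py), or null if there is none. */
    String segmentBelow(double px, double py) {
        if (xs.length < 2 || px < xs[0] || px > xs[xs.length - 1]) return null;
        int lo = 0, hi = xs.length - 2;
        while (lo < hi) {                        // binary search for the slab
            int mid = (lo + hi) / 2;
            if (px > xs[mid + 1]) lo = mid + 1; else hi = mid;
        }
        List<Segment> slab = slabs.get(lo);
        int a = 0, b = slab.size();
        while (a < b) {                          // binary search within the slab
            int m = (a + b) / 2;
            if (slab.get(m).yAt(px) <= py) a = m + 1; else b = m;
        }
        return a == 0 ? null : slab.get(a - 1).label;
    }
}
```

Storing a separate sorted list per slab can take quadratic space in the worst case. Since neighbouring slabs differ by only a few segments, keeping the slabs as versions of one persistent search tree avoids that duplication, which is why this problem appears among the applications.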
Methods to make the data structure persistent
The Fat-Node Method
The fat-node method records all changes made to node fields in the nodes
themselves, without erasing old values of the fields. This requires that we
allow nodes to become arbitrarily fat, i.e., to hold an arbitrary number of
values of each field. To be more precise, each fat node will contain the
same information and pointer fields as an ephemeral node (holding original
field values), along with space for an arbitrary number of extra field
values. Each extra field value has an associated field name and a version
stamp. The version stamp indicates the version in which the named field was
changed to have the specified value. In addition, each fat node has its own
version stamp, indicating the version in which the node was created.
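A fat node can be sketched as a map from field names to version-stamped values. The class name FatNode, the use of a TreeMap per field, and integer version stamps are assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** One fat node: every field keeps all of its values, keyed by version stamp. */
final class FatNode<T> {
    final int createdIn; // version stamp of the node itself
    private final Map<String, TreeMap<Integer, T>> history = new HashMap<>();

    FatNode(int createdIn) { this.createdIn = createdIn; }

    /** Records that `field` was changed to `value` in `version`; old values are kept. */
    void set(String field, int version, T value) {
        history.computeIfAbsent(field, f -> new TreeMap<>()).put(version, value);
    }

    /** Value of `field` as seen in `version`: the latest write at or before it, or null. */
    T get(String field, int version) {
        TreeMap<Integer, T> values = history.get(field);
        if (values == null) return null;
        Map.Entry<Integer, T> entry = values.floorEntry(version);
        return entry == null ? null : entry.getValue();
    }
}
```

Reading a field in version v looks up the most recent stamp not greater than v, so a read costs a logarithmic search over that field's history rather than constant time.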
The Node-Copying Method
The node-copying method allows nodes in the persistent structure to hold
only a fixed number of field values. When we run out of space in a node, we
create a new copy of the node, containing only the newest value of each
field. We must also store pointers to the new copy in all predecessors of
the copied node in the newest version. If there is no space in a predecessor
for such a pointer, the predecessor, too, must be copied. Nevertheless, if
we assume that the underlying ephemeral structure has nodes of constant
bounded in-degree and we allow sufficient extra space in each node of the
persistent structure, then we can derive an O(1) amortized bound on the
number of nodes copied and the time required per update step.
Simple Application Code
An application program implementing a persistent data structure:
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

class ImmutableBuilder {
    // Wraps the handler in a run-time proxy that implements the target interface.
    @SuppressWarnings("unchecked")
    static <T> T of(Immutable<T> immutable) {
        Class<T> targetClass = immutable.getTargetClass();
        return (T) Proxy.newProxyInstance(targetClass.getClassLoader(),
                new Class<?>[]{targetClass},
                immutable);
    }

    // Creates an instance with no fields set yet.
    public static <T> T of(Class<T> aClass) {
        return of(new Immutable<T>(aClass, new HashMap<String, Object>()));
    }
}

class Immutable<T> implements InvocationHandler {
    private final Class<T> targetClass;
    private final Map<String, Object> fields;

    public Immutable(Class<T> aTargetClass, Map<String, Object> immutableFields) {
        targetClass = aTargetClass;
        fields = immutableFields;
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        if (method.getName().equals("toString")) {
            // XXX: toString() result can be cached
            return fields.toString();
        }
        if (method.getName().equals("hashCode")) {
            // XXX: hashCode() result can be cached
            return fields.hashCode();
        }
        // XXX: naming policy here
        String fieldName = method.getName();
        if (method.getReturnType().equals(targetClass)) {
            // Setter: copy the field map, apply the change, return a new proxy.
            Map<String, Object> newFields = new HashMap<String, Object>(fields);
            newFields.put(fieldName, args[0]);
            return ImmutableBuilder.of(new Immutable<T>(targetClass, newFields));
        } else {
            // Getter: read the stored value (null if the field was never set).
            return fields.get(fieldName);
        }
    }

    public Class<T> getTargetClass() {
        return targetClass;
    }

    public static void main(String[] args) {
        Person mark = ImmutableBuilder.of(Person.class).name("mark");
        Person a = mark.age(34);
        Person john = mark.name("john");
        Person b = john.age(21);
        System.out.println(mark);
        System.out.println(john);
        System.out.println(a);
        System.out.println(b);
    }
}

interface Person {
    String name();
    Person name(String name);
    int age();
    Person age(int age);
}
Explanations:
IMMUTABLE BUILDER: ImmutableBuilder creates instances of the target
interface as run-time proxies backed by an Immutable handler. Because each
handler keeps its own private copy of the field map, it is easy to create
slightly modified copies of an immutable instance.
INVOCATION HANDLER: InvocationHandler is the interface implemented by the
invocation handler of a proxy instance. Each proxy instance has an
associated invocation handler. When a method is invoked on a proxy
instance, the method invocation is encoded and dispatched to the invoke
method of its invocation handler.
We have defined an interface called Person, in the style of a plain old Java
object (POJO), which has name and age as its attributes, with getters and
setters for them. We create a proxy because an interface cannot be
instantiated directly. Whenever we call the setter of an attribute, a new
object is created instead of the existing one being modified. For that we
use proxy classes (classes generated at run time) whose invocation handler
handles calls to every method: whenever a setter, that is, a method whose
return type is Person, is called, a new object is created and returned with
the new value set.