persistent data structure

Upload: amritesh-kumar

Post on 08-Apr-2018


  • 8/7/2019 PERSISTENT DATA STRUCTURE


    PERSISTENT DATA STRUCTURE

    1. Introduction
    2. Requirement of Persistent Data structure
    3. Ephemeral v/s Persistent Data Structure
    4. Advantages
    5. Applications
       a. Persistent Linked List
       b. Persistent Binary Tree
       c. Random Access List
       d. Planar Point Location
    6. Methods to make the data structure persistent
       a. The Fat-Node Method
       b. The Node-Copying Method
    7. Simple Application Code


    Introduction

    A persistent data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not visibly update the structure in place but instead always yield a new, updated structure.

    A data structure is partially persistent if all versions can be accessed but only the newest version can be modified.

    A data structure is fully persistent if every version can be both accessed and modified.

    If there is also a meld or merge operation that can create a new version from two previous versions, the data structure is called confluently persistent. In such data structures, combinators combine more than one previous version into a single new version. Rather than a branching tree, combinations of versions induce a DAG (directed acyclic graph) on the version graph.

    Structures that do not preserve previous versions are called ephemeral.


    Requirement of Persistent Data structure

    The typical data structures most programmers know and use require imperative programming: they fundamentally depend on replacing the values of fields with assignment statements. A particular data structure represents the state of something at one particular moment in time, and that moment only. If you want to know what the state was in the past, you need to have made a copy of the entire data structure back then and kept it around until you needed it.

    Alternatively, you could keep a log of changes made to the data structure that you could play in reverse until you reach the previous state, and then play forwards again to get back to where you are now. Both of these techniques are typically used to implement undo/redo.

    Or you could use a persistent data structure. A persistent data structure allows you to access previous versions at any time without having to do any copying. All you need to do at the time is save a pointer to the data structure. If you have a persistent data structure, your undo/redo implementation is simply a stack of pointers onto which you push a pointer after every change to the data structure.
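    The undo/redo idea described above can be sketched in Java as follows. This is a minimal sketch under stated assumptions: the class names `PStack` and `History` are hypothetical and do not come from the original document, and the edited state is modelled as an immutable stack of values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical immutable stack: push returns a new version, the old one stays valid.
final class PStack<T> {
    final T head;          // null only for the empty stack
    final PStack<T> tail;  // shared with every older version
    final int size;

    private PStack(T head, PStack<T> tail, int size) {
        this.head = head;
        this.tail = tail;
        this.size = size;
    }

    static <T> PStack<T> empty() {
        return new PStack<>(null, null, 0);
    }

    PStack<T> push(T value) {
        return new PStack<>(value, this, size + 1);
    }
}

// Undo/redo as two stacks of pointers to versions; the data itself is never copied.
final class History<T> {
    private final Deque<PStack<T>> undo = new ArrayDeque<>();
    private final Deque<PStack<T>> redo = new ArrayDeque<>();
    private PStack<T> current = PStack.empty();

    PStack<T> current() {
        return current;
    }

    void apply(T value) {
        undo.push(current);            // remember a pointer to the old version
        redo.clear();                  // a fresh edit invalidates the redo chain
        current = current.push(value);
    }

    boolean undo() {
        if (undo.isEmpty()) return false;
        redo.push(current);
        current = undo.pop();
        return true;
    }

    boolean redo() {
        if (redo.isEmpty()) return false;
        undo.push(current);
        current = redo.pop();
        return true;
    }
}
```

    Each call to `undo` or `redo` only moves a pointer between the two stacks, which is why no copy of the data is needed.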


    Advantages

    1. In text editing: while editing text, one can access recent changes through the undo facility.

    2. In file editing.

    3. Planar point location: the point location problem is a fundamental topic of computational geometry. It finds applications in areas that process geometrical data, for example GPS and Google Maps. It arises both as 1-dimensional and as 2-dimensional point location (an example is shown later).

    4. A persistent data structure is immutable, which not only allows us to make more assumptions about the data but also expands the capabilities of the language.

    5. With such an immutable data structure and a functional style, we can also freely use memoization. (In computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs.)

    Often the same function is applied to the same set of arguments repeatedly during the execution of a program. In such cases it is better to store the result of the computation in a table, together with the function-argument combination that gave rise to it, so that we avoid computing the value again from the beginning. An example of how memoization can speed up a deterministic computation comes from the Fibonacci function.

    The Fibonacci function is defined mathematically by the following rule. By definition, the first two Fibonacci numbers are 0 and 1, and each subsequent number is the sum of the previous two.

    Fib(n) = Fib(n-1) + Fib(n-2), with Fib(0) = 0 and Fib(1) = 1

    where n is the index of the Fibonacci number to be calculated.

    Here we take the example of finding Fib(6).

    public class Fibonacci {
        // Naive recursion. The listing is cut off after "if (n",
        // so the body below is a reconstruction, not the original text.
        public static long fib(int n) {
            if (n <= 1) return n;
            return fib(n - 1) + fib(n - 2);
        }
    }

    In the above case, we can see that memoization turns a procedure that takes exponential time to run into one that takes linear time.
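    A memoized Fibonacci function can be sketched as follows. This is a minimal sketch rather than the original listing: the class name `MemoFib` and the use of a `HashMap` as the result table are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class MemoFib {
    // Table of already computed results, keyed by the argument n.
    private static final Map<Integer, Long> cache = new HashMap<>();

    static long fib(int n) {
        if (n <= 1) return n;                  // Fib(0) = 0, Fib(1) = 1
        Long cached = cache.get(n);
        if (cached != null) return cached;     // reuse a stored result
        long result = fib(n - 1) + fib(n - 2); // each n is computed only once
        cache.put(n, result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fib(6)); // prints 8
    }
}
```

    Because every value of n is computed at most once and then looked up, the number of additions grows linearly with n instead of exponentially.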

    6. Persistent data structures can share structure, which reduces memory usage.

    For example, consider the list [1, 2, 3, 4] in Haskell and in an imperative language such as Java. To produce a new list with an extra element at the front in Haskell, you only have to create one new cons cell (a pair of a value and a reference to the next element) and connect it to the previous list. With a mutable list in Java, you have to create a completely new copy so as not to damage the previous one.
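    The same kind of sharing can be expressed in Java if the list is built from immutable cons cells. The sketch below is illustrative only; the class name `Cons` is an assumption and not part of the original document.

```java
// Hypothetical immutable cons cell: a value plus a reference to the rest of the list.
final class Cons<T> {
    final T head;
    final Cons<T> tail;   // null marks the end of the list

    Cons(T head, Cons<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    public static void main(String[] args) {
        // The list [1, 2, 3, 4].
        Cons<Integer> base = new Cons<>(1, new Cons<>(2, new Cons<>(3, new Cons<>(4, null))));
        // A new list [0, 1, 2, 3, 4]: one new cell, the other four are shared.
        Cons<Integer> extended = new Cons<>(0, base);
        System.out.println(extended.tail == base); // prints true
    }
}
```

    Sharing is safe here only because the cells are never modified after construction.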

    7. Persistent (immutable) data structures greatly improve concurrency, since multiple versions can be accessed at the same time and one can even compare two different versions simultaneously. This property makes persistent data structures well suited to concurrent use.

    8. Content management systems use persistent data structures. For example, real-time collaborative editing lets geographically distributed users work together through individual contributions. Examples of such tools are Google Wave live editing and Google Maps. Such collaborative editing applications not only need to maintain all the editing histories for large amounts of data efficiently, but are also expected to keep track of all changes to documents persistently. Compared to standalone document editing, co-authored documents are less controlled, especially in an open community. For example, users may query the authorship of recently inserted text or attach metadata (such as a question about a recent change) to pieces of data.

    9. Object Database Management Systems (ODMS) use a persistent model. Whenever changes are made to the database, the program that maintains it uses a data structure that keeps the recent updates; such programs therefore use persistent data structures.

    10. Version control systems for software development use a fully persistent database. It helps in keeping track of all the previous changes (states) that were made in the past.

    A version control system is used to maintain a systematic set of all the versions of files that are made over time. Version control systems allow people to access previous revisions of each file, and to compare any two revisions to view the changes made between them. In this way, version control keeps a historically accurate and retrievable log of a file's revisions. More importantly, version control systems help several people (including people situated in different geographical regions) to work together on a development project over the Internet by merging their changes into the same source repository.

    11. Since it is possible to reconstruct all previous states of the data structure, it is also useful in programs that require backtracking.

    Applications

    Persistent Linked List

    The singly linked list is one of the most widely used data structures in programming. It consists of a series of nodes linked together one right after another. Each node has a reference to the node that comes after it, and the last node in the list terminates with a null reference. To traverse a singly linked list, we begin at the head of the list and move from one node to the next until we have reached the node we are looking for or have reached the last node.


    Let's insert a new item into the list. This list is not persistent, meaning that it can be changed in place without generating a new version. After taking a look at the insertion operation on a non-persistent list, we'll look at the same operation on a persistent list.

    Inserting a new item into a singly linked list involves creating a new node.

    We will insert the new node at the fourth position in the list. First, we traverse the list until we've reached that position. Then the node that will precede the new node is unlinked from the next node and relinked to the new node. The new node is, in turn, linked to the remaining nodes in the list.

    Inserting a new item into a persistent singly linked list will not alter the existing list but create a new version with the item inserted into it. Instead of copying the entire list and then inserting the item into the copy, it is always a better strategy to reuse as much of the old list as possible. Since the nodes themselves are persistent, we don't have to worry about aliasing problems.


    To insert a new node at the fourth position, we traverse the list as before, only this time copying each node along the way (up to the desired location). Each copied node is linked to the next copied node.

    The last copied node is linked to the new node, and the new node is linked to the remaining nodes in the old list.

    On average, about N/2 nodes will be copied in the persistent version for insertions and deletions, where N is the number of nodes in the list. This isn't very efficient but does give us some savings. One persistent data structure where this approach to the singly linked list saves us a lot is the stack, because a stack inserts and removes only at the head, so no existing nodes need to be copied.
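    The copy-the-prefix insertion described above can be sketched in Java as follows. This is a minimal sketch under assumptions: the names `PNode` and `insertAt` are hypothetical, and positions are counted from 0, so the fourth position is index 3.

```java
// Hypothetical immutable list node.
final class PNode<T> {
    final T value;
    final PNode<T> next;   // null marks the end of the list

    PNode(T value, PNode<T> next) {
        this.value = value;
        this.next = next;
    }

    // Returns a new list with `value` inserted at `index`; `list` itself is left untouched.
    static <T> PNode<T> insertAt(PNode<T> list, int index, T value) {
        if (index == 0) {
            return new PNode<>(value, list);   // link the new node to the old remainder
        }
        if (list == null) {
            throw new IndexOutOfBoundsException("index past end of list");
        }
        // Copy one node of the prefix and continue down the list.
        return new PNode<>(list.value, insertAt(list.next, index - 1, value));
    }
}
```

    Inserting at index 0 copies nothing at all, which is the stack case mentioned above.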

    Persistent Binary Tree

    A binary tree is a collection of nodes in which each node contains two links, one to its left child and another to its right child. Each child is itself a node, and either or both of the child nodes can be null, meaning that a node may have zero to two children. In the binary search tree version, each node usually stores a key/value pair. The tree is searched and ordered according to its keys. The key stored at a node is always greater than the keys stored in its left descendants and always less than the keys stored in its right descendants. This makes searching for any particular key fast: in a balanced tree a search takes O(log N) steps.

    Here is an example of a binary search tree. The keys are listed as numbers; the values have been omitted but are assumed to exist. Notice how each key as we descend to the left is less than the key of its predecessor, and vice versa as we descend to the right:

    Changing the value of a particular node in a non-persistent tree involves starting at the root of the tree, searching for the key associated with that value, and then changing the value once the node has been found. Changing a persistent tree, on the other hand, generates a new version of the tree. We will use the same strategy in implementing a persistent binary tree as we did for the persistent singly linked list, which is to reuse as much of the data structure as possible when making a new version.

    Let's change the value stored in the node with the key 7. As the search for the key leads us down the tree, we copy each node along the way. If we descend to the left, we point the previously copied node's left child to the currently copied node. The previous node's right child continues to point to nodes in the older version. If we descend to the right, we do just the opposite.

    This illustrates the spine of the search down the tree. The red nodes are the only nodes that need to be copied in making a new version of the tree (the rest of the tree remains the same):


    Insertions and deletions work in the same way, except that steps should be taken to keep the tree balanced, for example by using an AVL tree. If a binary tree becomes degenerate, we run into the same efficiency problems as we did with the singly linked list.
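    The path-copying update described in this section can be sketched in Java as follows. This is a minimal sketch under assumptions: the class name `PTree` and its methods are hypothetical, keys are plain `int` values, and no balancing (such as AVL rotations) is performed.

```java
// Hypothetical immutable binary search tree node.
final class PTree {
    final int key;
    final String value;
    final PTree left, right;

    PTree(int key, String value, PTree left, PTree right) {
        this.key = key;
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Returns the root of a new version; only nodes on the search path are copied.
    static PTree put(PTree node, int key, String value) {
        if (node == null) {
            return new PTree(key, value, null, null);
        }
        if (key < node.key) {
            // Descend left: copy this node, keep pointing at the old right subtree.
            return new PTree(node.key, node.value, put(node.left, key, value), node.right);
        }
        if (key > node.key) {
            // Descend right: copy this node, keep pointing at the old left subtree.
            return new PTree(node.key, node.value, node.left, put(node.right, key, value));
        }
        // Key found: copy the node with the new value, share both subtrees.
        return new PTree(key, value, node.left, node.right);
    }

    static String get(PTree node, int key) {
        while (node != null) {
            if (key < node.key) node = node.left;
            else if (key > node.key) node = node.right;
            else return node.value;
        }
        return null;   // key not present
    }
}
```

    Both the old root and the new root remain valid, so either version can still be searched after the update.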

    Random Access List

    An interesting persistent data structure that combines the singly linked list with the binary tree is Chris Okasaki's random-access list. This data structure allows for random access of its items as well as adding and removing items from the beginning of the list. It is structured as a singly linked list of completely balanced binary trees. The advantage of this data structure is that it allows access, insertion, and removal of the head of the list in O(1) time as well as provides logarithmic performance in randomly accessing its items.

    Here is a random-access list with 13 items:

    When a node is added to the list, the first two root nodes (if they exist) are checked to see if they both have the same height. If so, the new node is made the parent of the first two nodes; the current head of the list is made the left child of the new node, and the second root node is made the right child. If the first two root nodes do not have the same height, the new node is simply placed at the beginning of the list and linked to the next tree in the list.

    To remove the head of the list, the root node at the beginning of the list is

    removed, with its left child becoming the new head and its right child

    becoming the root of the second tree in the list. The new head of the list is

    right linked with the next root node in the list:


    The algorithm for finding a node at a specific index has two parts: in the first part, we find the tree in the list that contains the node we're looking for. In the second part, we descend into the tree to find the node itself. The following algorithm is used to find a node in the list at a specific index:

    1. Let I be the index of the node we're looking for. Set T to the head of the list, where T will be our reference to the root node of the current tree in the list we're examining.

    2. If I is equal to 0, we've found the node we're looking for; terminate the algorithm. Else if I is greater than or equal to the number of nodes in T, subtract the number of nodes in T from I, set T to the root of the next tree in the list, and repeat step 2. Else if I is less than the number of nodes in T, go to step 3.

    3. Set S to the number of nodes in T divided by 2 (the fractional part of the division is ignored; for example, if the number of nodes in the current sub-tree is 3, S will be 1).

    4. If I is less than or equal to S (the left sub-tree holds the S nodes at offsets 1 through S), subtract 1 from I and set T to T's left child. Else subtract (S + 1) from I and set T to T's right child.

    5. If I is equal to 0, we've found the node we're looking for; terminate the algorithm. Else go to step 3.
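    The insertion rule and the lookup steps above can be sketched in Java as follows. This is a minimal sketch under assumptions: the names `RList`, `Tree`, `cons` and `get` are hypothetical, the empty list is represented by `null`, and trees are compared by node count instead of height (for complete binary trees the two comparisons agree).

```java
// Hypothetical random-access list: a linked list of complete binary trees.
final class RList<T> {
    static final class Tree<T> {
        final T value;
        final Tree<T> left, right;
        final int size;   // number of nodes in this tree

        Tree(T value, Tree<T> left, Tree<T> right) {
            this.value = value;
            this.left = left;
            this.right = right;
            this.size = 1 + (left == null ? 0 : left.size) + (right == null ? 0 : right.size);
        }
    }

    final Tree<T> tree;   // root of the first tree
    final RList<T> rest;  // remaining trees, null at the end

    RList(Tree<T> tree, RList<T> rest) {
        this.tree = tree;
        this.rest = rest;
    }

    // Adds a value at the front and returns the new version; `list` is unchanged.
    static <T> RList<T> cons(T value, RList<T> list) {
        if (list != null && list.rest != null && list.tree.size == list.rest.tree.size) {
            // First two trees match: the new node becomes their parent.
            return new RList<>(new Tree<>(value, list.tree, list.rest.tree), list.rest.rest);
        }
        return new RList<>(new Tree<>(value, null, null), list);
    }

    static <T> T get(RList<T> list, int i) {
        if (i < 0) throw new IndexOutOfBoundsException("negative index");
        // Part 1: skip whole trees until the index falls inside one.
        while (list != null && i >= list.tree.size) {
            i -= list.tree.size;
            list = list.rest;
        }
        if (list == null) throw new IndexOutOfBoundsException("index past end of list");
        // Part 2: descend into the tree.
        Tree<T> t = list.tree;
        while (i != 0) {
            int s = t.size / 2;          // size of each sub-tree
            if (i <= s) {
                i -= 1;                  // skip the root, go left
                t = t.left;
            } else {
                i -= s + 1;              // skip the root and the left sub-tree, go right
                t = t.right;
            }
        }
        return t.value;
    }
}
```

    Building a 13-item list with `cons` yields trees of 3, 3 and 7 nodes, and looking up index 9 skips the first two trees before descending left and then right in the third.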

    This illustrates using the algorithm to find the 10th item in the list:

    Keep in mind that all operations that change a random-access list do not change the existing list but rather generate a new version representing the change. As much of the old list as possible is reused in creating a new version.


    Planar Point Location

    Planar point location is one of the fundamental topics of computational geometry. It has several uses, such as in Global Positioning Systems. There is an arrangement of polygons which divides the plane into regions: the Euclidean plane is subdivided into polygons by n line segments that intersect only at their endpoints.

    Query: given a query point p, determine which polygon contains p.

    We measure a data structure by three parameters:

    Preprocessing time

    Query time

    Space


    This plane can be divided into multiple vertical sections.

    And in each vertical slab we can perform search methods to locate the point.


    Within each slab the lines are totally ordered.

    Now we can create a search tree, Ti, per slab containing the lines and associate with each line.
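    The slab idea can be sketched in Java as follows. This is a simplified sketch under assumptions: the names `SlabLocator`, `Segment` and `locate` are hypothetical, each slab stores its segments in a plain sorted array instead of a search tree, and a region is identified by its slab index together with the number of segments lying below the query point. Storing every slab separately can take quadratic space in the worst case; a persistent search tree lets neighbouring slabs share structure, which is where persistence enters this application.

```java
import java.util.Arrays;

final class SlabLocator {
    // A non-vertical segment that spans its slab, given by two endpoints.
    static final class Segment {
        final double x1, y1, x2, y2;

        Segment(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }

        // Height of the segment at horizontal position x.
        double yAt(double x) {
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1);
        }
    }

    final double[] slabX;     // sorted x-coordinates of the slab boundaries
    final Segment[][] slabs;  // slabs[i]: segments crossing slab i, ordered bottom to top

    SlabLocator(double[] slabX, Segment[][] slabs) {
        this.slabX = slabX;
        this.slabs = slabs;
    }

    // Returns {slab index, number of segments below the point}, or null if x is outside all slabs.
    int[] locate(double x, double y) {
        int pos = Arrays.binarySearch(slabX, x);
        int slab = pos >= 0 ? pos : -pos - 2;   // slab i covers [slabX[i], slabX[i + 1])
        if (slab < 0 || slab >= slabs.length) return null;
        Segment[] segs = slabs[slab];
        int lo = 0, hi = segs.length;
        while (lo < hi) {                       // binary search in the vertical order
            int mid = (lo + hi) >>> 1;
            if (segs[mid].yAt(x) <= y) lo = mid + 1;
            else hi = mid;
        }
        return new int[] { slab, lo };
    }
}
```

    A query therefore costs two binary searches: one over the slab boundaries and one over the segments inside the chosen slab.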

    Methods to make the data structure persistent

    The Fat-Node Method

    The fat-node method records all changes made to node fields in the nodes themselves, without erasing old values of the fields. This requires that we allow nodes to become arbitrarily fat, i.e., to hold an arbitrary number of values of each field. To be more precise, each fat node will contain the same information and pointer fields as an ephemeral node (holding original field values), along with space for an arbitrary number of extra field values. Each extra field value has an associated field name and a version stamp. The version stamp indicates the version in which the named field was changed to have the specified value. In addition, each fat node has its own version stamp, indicating the version in which the node was created.
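    A fat node can be sketched in Java as follows. This is a minimal sketch under assumptions: the class name `FatNode` and its methods are hypothetical, version stamps are plain integers that increase over time (the partially persistent case), and each field keeps its history in a `TreeMap` keyed by version stamp.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class FatNode {
    final int createdIn;   // version stamp of the version that created this node

    // field name -> (version stamp -> value written in that version)
    private final Map<String, TreeMap<Integer, Object>> fields = new HashMap<>();

    FatNode(int createdIn) {
        this.createdIn = createdIn;
    }

    // Records a new value without erasing any older one.
    void set(String field, int version, Object value) {
        fields.computeIfAbsent(field, k -> new TreeMap<>()).put(version, value);
    }

    // Value of `field` as seen in `version`: the write with the largest stamp <= version.
    Object get(String field, int version) {
        TreeMap<Integer, Object> history = fields.get(field);
        if (history == null) return null;
        Map.Entry<Integer, Object> entry = history.floorEntry(version);
        return entry == null ? null : entry.getValue();
    }
}
```

    Reading a field costs a search through its history, which is the price paid for never overwriting old values.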

    The Node-Copying Method

    The node-copying method allows nodes in the persistent structure to hold only a fixed number of field values. When we run out of space in a node, we create a new copy of the node, containing only the newest value of each field. We must also store pointers to the new copy in all predecessors of the copied node in the newest version. If there is no space in a predecessor for such a pointer, the predecessor, too, must be copied. Nevertheless, if we assume that the underlying ephemeral structure has nodes of constant bounded in-degree and we allow sufficient extra space in each node of the persistent structure, then we can derive an O(1) amortized bound on the number of nodes copied and the time required per update step.


    Simple Application Code

    An application program for implementing persistent data structure:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;
    import java.util.HashMap;
    import java.util.Map;

    class ImmutableBuilder {

        // Wraps the handler in a run-time proxy that implements the target interface.
        @SuppressWarnings("unchecked")
        static <T> T of(Immutable immutable) {
            Class<?> targetClass = immutable.getTargetClass();
            return (T) Proxy.newProxyInstance(targetClass.getClassLoader(),
                    new Class<?>[]{targetClass},
                    immutable);
        }

        public static <T> T of(Class<T> aClass) {
            return ImmutableBuilder.<T>of(new Immutable(aClass, new HashMap<String, Object>()));
        }
    }

    class Immutable implements InvocationHandler {

        private final Class<?> targetClass;
        private final Map<String, Object> fields;

        public Immutable(Class<?> aTargetClass, Map<String, Object> immutableFields) {
            targetClass = aTargetClass;
            fields = immutableFields;
        }

        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if (method.getName().equals("toString")) {
                // XXX: toString() result can be cached
                return fields.toString();
            }
            if (method.getName().equals("hashCode")) {
                // XXX: hashCode() result can be cached
                return fields.hashCode();
            }
            if (method.getName().equals("equals")) {
                // Identity comparison; without this branch equals() would fall
                // through to the field lookup below and fail on a null result.
                return proxy == args[0];
            }
            // XXX: naming policy here
            String fieldName = method.getName();
            if (method.getReturnType().equals(targetClass)) {
                // Setter: copy the field map and return a new proxy; this one stays unchanged.
                Map<String, Object> newFields = new HashMap<String, Object>(fields);
                newFields.put(fieldName, args[0]);
                return ImmutableBuilder.of(new Immutable(targetClass, newFields));
            } else {
                // Getter: read the stored value.
                return fields.get(fieldName);
            }
        }

        public Class<?> getTargetClass() {
            return targetClass;
        }

        public static void main(String[] args) {
            Person mark = ImmutableBuilder.of(Person.class).name("mark");
            Person a = mark.age(34);
            Person john = mark.name("john");
            Person b = john.age(21);
            System.out.println(mark);
            System.out.println(john);
            System.out.println(a);
            System.out.println(b);
        }
    }

    interface Person {
        String name();
        Person name(String name);
        int age();
        Person age(int age);
    }

    Explanations:

    IMMUTABLE BUILDER: ImmutableBuilder is a small factory. It wraps an Immutable invocation handler in a run-time proxy that implements the target interface. Because every setter call produces a new handler with a copied field map, it is easy to create slightly modified copies of an immutable instance.

    INVOCATION HANDLER: InvocationHandler is the interface implemented by the invocation handler of a proxy instance. Each proxy instance has an associated invocation handler. When a method is invoked on a proxy instance, the method invocation is encoded and dispatched to the invoke method of its invocation handler.

    We have declared an interface called Person with name and age as its attributes, together with getters and setters for them. Because an interface cannot be instantiated directly, we create a proxy for it. Whenever we call the setter of an attribute, a new object is created instead of the existing one being modified. To achieve this we use proxy classes (classes generated at run time) whose invocation handler handles calls to every method: whenever a setter, recognizable by its return type Person, is called, a new object with the new value set is created and returned.