indexing data structure

Vivek Kantariya(09bce020)

Guided by :Prof. Vibha Patel

Manage large data Provide faster access Easy search Reduce unwanted memory access Proper memory allocation Increase efficiency

It contains a search key and a pointer. Search key - an attribute or set of

attributes that is used to look up the records in a file.

Pointer - contains the address of where the data is stored in memory.

Five Factors involved when choosing the indexing technique:

1)access type2)access time3)insertion time4)deletion time5)space overhead

1) Access type - is the type of access being used.

2) Access time - time required to locate the data.

3) Insertion time - time required to insert the new data.

4) Deletion time - time required to delete the data.

5) Space overhead - the additional space occupied by the added data structure.

It is for multi- dimension data. Used to describe 2D or 3D objects. Real world usage.

Examples are : R tree , R+ tree , KD tree , A tree , Hilbert tree , etc

Computer Aided Design (CAD) Geographic applications (like maps) Multimedia Applications (like X-rays) Biological Databases

Any Type of Geometry Point

City Line

Trail Polygon

Border A Collection of Geometries

Ski Resort Trails Any Coordinate System

Meters Pixels WGS84 (GPS)

• Proposed by • Antonin Guttman• UC Berkley

• All Spatial Data Enveloped• Minimum Bounding Rectangle (MBR)

• Stored and Indexed According to MBR• Structure Resembles B+-tree• Height Balanced

• For an index record <I, tuple-identifier>• I = (I0, I1, … In)• n = Number of Dimensions in the Geometry• Each I is a set of the form [a,b] describing the range

of the rectangle along the dimension• a or b can be equal to infinity

• Tuple-identifier points to a record• Non-leaf nodes are in the form:

<I, child-pointer>

• M is the maximum number of entries in one node• m specifies the minimum number of entries in a

node , where m ≤ M/2• Properties :1. Every Leaf Node Contains Between m and M index

records unless it is root.2. For each index record, <I, tuple-identifier> in a leaf

node is the smallest rectangle that spatially contains the n-dimensional data object.

3. Every non-leaf node has between m and M children unless it is the root.

4. For each entry <I, child-pointer> in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child nodes.

5. The root node has at least two children unless it is a leaf.

6. All leaves appear on the same level.

1. Search2. Insert3. Delete4. Nearest Neighbor

1. Given R-tree with root T and and all records overlap with Search rectangle S.

2. If T is not leaf, check each entry E to determine whether Ei overlaps with S.

3. For all overlapping entries invoke search on each of them with root as node pointed by Ep.

4. If T is a leaf check each entry E. If it overlaps output it.

1) Start at the root node2) Select the child that needs the least

enlargement in order to fit the new geometry.

3) Repeat until at a leaf node.4) If leaf node has available space then insert.

5) Else split the entry into two nodes.• Update parent nodes• Update the entry that pointed to the node with

a new MBR [ Minimum Bounding Rectangle ] .• Add a new entry for the second new node

6) If there is no space in the parent node, split and repeat.

Make sure nodes are split so they cover the smallest possible area.

Split should minimize average search time.

GOOD SPLIT!

BAD!

1) Remove index node E from R-Tree.2) Find node containing record.3) Remove E.4) If node contains fewer than m records

remove the node and add it to Queue.5) Move up and do the same reducing

covering rectangles.6) Reinsert all records from Queue.

• Split Entries in the tree so that there is no overlap• No more multiple paths to reach a solution• Child pointers duplicated within the tree

R-Tree MBRs R+-Tree MBRs

Do not split nodes on insertTake entries from the overfull node and reinsert

them into the tree Changes MBRs

Saves time and possibly rebalances the tree

1. www.ieeexplore.ieee.org◦ A NEW APPROACH TO CREATING SPATIAL INDEX

WITH R-TREE byZe-Bao Zhang, Jian-Pei Zhang, Jing Yang, Yue Yang

◦ A NEW VARIATION OF R-TREE FOR INDEXING SPACIAL DATA IN GIS byChen Yongkang , Zhou Xintie , Shi Tailai , Feng Xiaoming

2. http://wikipedia.org/wiki/R_tree

http://www.ieeexplore.ieee.org/

indexing data structure

Education

time required

search key

index record

tree mbrs

smallest rectangle

leaf node

data

leaf