Improving Min/Max Aggregation over Spatial Objects
Donghui Zhang, Vassilis J. Tsotras
University of California, Riverside
ACM GIS’01
Outline
• Problem Definition
• Straightforward Solutions
• Our Solution
• Performance Results
• By-Product: Optimized the MSB-tree
• Conclusions
Problem Definition
• Consider a collection of spatial objects.
• Each object: rectangle r, value v.
• Spatial Aggregation: find aggregate value over objects intersecting a given rectangle. We focus on MAX.
• E.g.: a database of rainfalls over geographical areas. Find max rainfall in Los Angeles area.
[Figure: example objects drawn as rectangles with values 5, 4, 2, 7 and 1.]
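The problem statement can be illustrated with a brute-force baseline. This is only a sketch: the names `Rect`, `SpatialObject` and `max_over`, the closed-rectangle intersection test, and the sample data are assumptions made for illustration and do not come from the slides.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float

    def intersects(self, other: "Rect") -> bool:
        # Closed rectangles: touching edges count as intersecting (an assumption).
        return (self.x1 <= other.x2 and other.x1 <= self.x2
                and self.y1 <= other.y2 and other.y1 <= self.y2)


class SpatialObject(NamedTuple):
    box: Rect
    value: float


def max_over(objects: Sequence[SpatialObject], query: Rect) -> Optional[float]:
    """MAX of the values of all objects whose box intersects the query rectangle."""
    hits = [o.value for o in objects if o.box.intersects(query)]
    return max(hits) if hits else None


# Hypothetical rainfall data.
rainfall = [
    SpatialObject(Rect(0, 0, 4, 4), 5.0),
    SpatialObject(Rect(3, 3, 8, 8), 7.0),
    SpatialObject(Rect(10, 10, 12, 12), 9.0),
]
print(max_over(rainfall, Rect(2, 2, 5, 5)))  # 7.0
```

This linear scan touches every object; the index structures on the following slides aim to avoid that.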
Straightforward Solutions
• Use an R*-tree [BKS+90] to index the objects.
• Reduce to range search.
• Better approach: aR-tree [PKZ+01, LM01]. Store the MAX of the sub-tree in internal nodes.
• If the query rectangle contains a sub-tree, there is no need to search it.
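The containment shortcut of the aR-tree might be sketched as below. The `Node` layout, the hand-built tree and the function names are hypothetical; a real aR-tree also handles paging, insertion and node splits, which are omitted here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class Node:
    box: Box                       # bounding box of the sub-tree (or the object box)
    max_value: float               # MAX over the sub-tree (or the object value)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf object


def query_max(node: Node, query: Box) -> Optional[float]:
    if not intersects(node.box, query):
        return None
    # Leaf object, or sub-tree fully inside the query: the stored MAX is the answer.
    if not node.children or contains(query, node.box):
        return node.max_value
    results = [r for r in (query_max(c, query) for c in node.children) if r is not None]
    return max(results) if results else None


# Hand-built two-level example tree (hypothetical data).
left = Node((0, 0, 5, 5), 7, [Node((0, 0, 2, 2), 5), Node((3, 3, 5, 5), 7)])
right = Node((6, 0, 10, 5), 9, [Node((6, 0, 8, 2), 9), Node((9, 3, 10, 5), 1)])
root = Node((0, 0, 10, 5), 9, [left, right])

print(query_max(root, (-1, -1, 5.5, 6)))  # 7: `left` is contained, so it is not searched
print(query_max(root, (9, 4, 11, 6)))     # 1: `right` is only partially covered, so it is searched
```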
Our Solution -- overview
• The MR-tree: a specialized index for Min/Max aggregation. It uses the R*-tree and four optimization techniques:
k-max: increase the chance for the search algorithm to stop at higher tree levels;
box-elimination: erase information from the tree that will not contribute to any query;
union: do not insert an object which will not contribute to any query;
area-reduction: reduce the area of the object to be inserted.
Optimization Techniques
The k-max Optimization
• Motivation: The aR-tree is not efficient if the query rectangle intersects but does not fully contain a sub-tree rectangle.
[Figure: example objects and sub-tree rectangles labeled with their values.]
The k-max Optimization
• Along with each index record r, store the k max-value objects in sub-tree(r).
• Upon query, if the query rectangle intersects any of the k objects at r, omit sub-tree(r): the largest intersecting object among the k gives the MAX for that sub-tree.
• Trade-off: a larger k lets more sub-trees be omitted during query, but also costs more space and update time.
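A minimal sketch of the k-max test at a single index record follows. The helper names `top_k` and `kmax_shortcut` and the sample sub-tree are assumptions for illustration. The shortcut is sound because every object outside the stored top k has a value no larger than the k-th one.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Entry = Tuple[Box, float]                # (object box, object value)


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def top_k(objects: List[Entry], k: int) -> List[Entry]:
    """The k max-value objects of a sub-tree, in descending value order."""
    return sorted(objects, key=lambda e: e[1], reverse=True)[:k]


def kmax_shortcut(kmax: List[Entry], query: Box) -> Optional[float]:
    """Return the sub-tree MAX if the stored k objects decide it, else None.

    `kmax` must be in descending value order, so the first intersecting
    object is the largest one that intersects the query.
    """
    for box, value in kmax:
        if intersects(box, query):
            return value
    return None  # undecided: the search has to descend into the sub-tree


# Hypothetical sub-tree contents.
subtree = [((0, 0, 1, 1), 9.0), ((2, 2, 3, 3), 6.0),
           ((4, 4, 5, 5), 4.0), ((0, 4, 1, 5), 2.0)]
stored = top_k(subtree, k=2)

print(kmax_shortcut(stored, (2.5, 2.5, 6, 6)))  # 6.0, sub-tree omitted
print(kmax_shortcut(stored, (4, 4, 6, 6)))      # None, sub-tree must be searched
```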
The box-elimination Optimization
• Motivation: if for objects o1 and o2, o1.box contains o2.box and o1.value ≥ o2.value, then o2 is obsolete, i.e. it does not contribute to any query and thus can be deleted.
[Figure: o1 (value 7) containing o2 (value 5).]
The box-elimination Optimization
• Similar for object o1 and index record r2, i.e. if o1.box contains r2.box and o1.value ≥ max value in sub-tree(r2), the whole sub-tree is obsolete.
• The optimization: at insertion, remove obsolete objects and sub-trees along the insertion path.
• Ideally, remove all obsolete objects/sub-trees, but that is too expensive. Instead, pick c (c: constant) paths.
• Trade-off: a larger c gives a smaller index size and faster query time, but also more update time.
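The obsolescence test itself is small enough to sketch. Here a flat list stands in for the objects met along one insertion path; that simplification, and the names `Obj`, `makes_obsolete` and `insert_with_elimination`, are assumptions rather than the MR-tree's actual procedure.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


class Obj(NamedTuple):
    box: Box
    value: float


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def makes_obsolete(o1: Obj, o2: Obj) -> bool:
    """True if o2 can never change a MAX query result while o1 is present."""
    return contains(o1.box, o2.box) and o1.value >= o2.value


def insert_with_elimination(existing: List[Obj], new: Obj) -> List[Obj]:
    """Insert `new`, dropping every existing object that it makes obsolete."""
    kept = [o for o in existing if not makes_obsolete(new, o)]
    kept.append(new)
    return kept


# Hypothetical data: the new object covers the first one and has a larger value.
objs = [Obj((2, 2, 4, 4), 5.0), Obj((0, 0, 1, 1), 9.0)]
result = insert_with_elimination(objs, Obj((1, 1, 6, 6), 7.0))
print(result)  # the (2, 2, 4, 4) object with value 5.0 is gone
```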
The union Optimization
• Motivation 1: if a new object o1 is obsolete due to an existing object o2, o1 should not be inserted.
• Motivation 2: a new object o1 may be obsolete due to the union of several existing objects.
[Figure: new object o1 (value 2) covered by the union of existing objects with values 8 and 7.]
The union Optimization
• Along with each index record r, store the union of boxes of all objects in sub-tree(r); also store the MIN value of all these objects.
• Do not perform the insertion of object o1 if:
o1.box is contained in r.union, and
o1.value ≤ r.min.
• Question: how is the union computed and stored?
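The insertion test above could look roughly like this, assuming r.union is held as a small list of boxes. The containment check uses a coordinate grid over the new box, which is one straightforward choice and not necessarily what the MR-tree does; `covered_by_union` and `can_skip_insert` are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def covered_by_union(target: Box, boxes: List[Box]) -> bool:
    """True if every point of `target` lies in at least one of `boxes`.

    Assumes `target` has positive width and height. The box edges that fall
    inside `target` split it into grid cells; each cell is either fully inside
    or fully outside any given box, so testing cell centers is sufficient.
    """
    tx1, ty1, tx2, ty2 = target
    xs = sorted({tx1, tx2, *(x for b in boxes for x in (b[0], b[2]) if tx1 < x < tx2)})
    ys = sorted({ty1, ty2, *(y for b in boxes for y in (b[1], b[3]) if ty1 < y < ty2)})
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            cx, cy = (xa + xb) / 2, (ya + yb) / 2
            if not any(b[0] <= cx <= b[2] and b[1] <= cy <= b[3] for b in boxes):
                return False
    return True


def can_skip_insert(new_box: Box, new_value: float,
                    union_boxes: List[Box], min_value: float) -> bool:
    """The two conditions from the slide: containment in r.union and value <= r.min."""
    return new_value <= min_value and covered_by_union(new_box, union_boxes)


# Hypothetical index record: union stored as two boxes, MIN value 7.
r_union = [(0, 0, 4, 4), (3, 0, 8, 4)]
r_min = 7.0

print(can_skip_insert((1, 1, 6, 3), 2.0, r_union, r_min))  # True
print(can_skip_insert((1, 1, 9, 3), 2.0, r_union, r_min))  # False: sticks out of the union
print(can_skip_insert((1, 1, 6, 3), 8.0, r_union, r_min))  # False: value exceeds r.min
```

Note that the new box in the first call is not contained in either stored box alone; only their union covers it.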
The union Optimization
• Store an approximate union representation using t (t: constant) boxes.
• The approximation should be fully contained in the actual union, and should cover as much space as possible.
• Def: given a set of n boxes S = {s1, …, sn}, the covered t-union of S is a set of t boxes A = {a1, …, at} s.t. ∪si covers ∪ai, and ∪ai covers the max area possible.
The union Optimization
• To compute the exact covered t-union: O(n^(2t+4)).
• We propose a much faster approximate algorithm: O(n log n).
• Idea of our algorithm: pick the t largest boxes and expand them.
The area-reduction Optimization
• Motivation: the box of a new object o1 can be reduced if an existing object o2 intersects it with a larger or equal value.
[Figure: existing object o2 (value 8) overlapping new object o1 (value 6).]
The area-reduction Optimization
• Reduce the area of new object o1 when:
∃ index record r s.t. r.union intersects o1.box and r.min ≥ o1.value, or
one of the k max-value objects intersects o1 with a larger or equal value, or
∃ leaf object o2 s.t. o2.box intersects o1.box and o2.value ≥ o1.value.
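For the leaf-object case, shrinking the new box might be sketched as follows. The rule used here is an assumption: the box is trimmed only when the existing box spans the new box completely in one dimension, so that the remainder is still a rectangle. The slides do not spell out the exact rule, and `reduce_box` is a hypothetical name.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def reduce_box(new: Box, new_value: float,
               existing: Box, existing_value: float) -> Optional[Box]:
    """Shrink `new` by the part covered by `existing`, if `existing` dominates it.

    Returns None when `new` is covered entirely (the new object is obsolete),
    and `new` unchanged when no rectangular trim is possible.
    """
    if existing_value < new_value or not intersects(new, existing):
        return new
    nx1, ny1, nx2, ny2 = new
    ex1, ey1, ex2, ey2 = existing
    spans_x = ex1 <= nx1 and ex2 >= nx2
    spans_y = ey1 <= ny1 and ey2 >= ny2
    if spans_x and spans_y:
        return None
    if spans_y:
        if ex1 <= nx1:      # left part of `new` is covered
            nx1 = ex2
        elif ex2 >= nx2:    # right part of `new` is covered
            nx2 = ex1
    elif spans_x:
        if ey1 <= ny1:      # bottom part of `new` is covered
            ny1 = ey2
        elif ey2 >= ny2:    # top part of `new` is covered
            ny2 = ey1
    return (nx1, ny1, nx2, ny2)


# Hypothetical example mirroring the slide: o2 has value 8, o1 has value 6.
print(reduce_box((0, 0, 10, 4), 6.0, (6, -1, 12, 5), 8.0))   # (0, 0, 6, 4)
print(reduce_box((0, 0, 10, 4), 6.0, (-1, -1, 11, 5), 8.0))  # None
print(reduce_box((0, 0, 10, 4), 6.0, (6, 2, 12, 5), 8.0))    # (0, 0, 10, 4), corner overlap
```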
The area-reduction Optimization
• Benefit 1: reduce overlap among sibling nodes.
• Benefit 2: increase the chance to make new objects obsolete.
[Figure: sibling index records r1 (min=9) and r2 (min=7) with a new object (value 8), and the smaller object actually inserted after area reduction.]
Performance Results
• Datasets: 5 million square objects, size randomly chosen from 10 to 10000 (space in each dimension is 1 to one million).
• Implemented algorithms:
R*: the R*-tree [BKS+90];
aR: the aR-tree [PKZ+01, LM01];
kaR: the aR-tree with k-max optimization;
MR: the MR-tree (with all the optimizations).
Index Sizes
[Figure: bar chart of index sizes (MB, axis from 0 to 150) for R*, aR, kaR and MR.]
Query Performance (log scale)
• Query time is the total of 100 random queries of the same query rectangle size.
[Figure: query time (sec, log scale from 0.01 to 10000) versus query rectangle area (%, from 0.0001 to 50) for R*, aR, kaR and MR.]
Optimizing the MSB-tree
• The MSB-tree [YW00]: efficiently maintains and computes MIN/MAX aggregates over 1-dim interval data.
• Insertion/Query: O(log_B m), where B is the page capacity and m is the number of leaf records.
• [YW00]: periodically reconstruct the whole tree to maintain a small m. During reconstruction, the index is off-line.
• Can avoid reconstruction by applying the box-elimination optimization. Idea: if a new interval contains all intervals in a sub-tree and has a larger value, the sub-tree is obsolete.
Conclusions
• Addressed the MIN/MAX aggregation problem over spatial objects;
• Four optimization techniques;
• The MR-tree;
• Much smaller index size and query time;
• By-product: optimized the MSB-tree.