ICDE 2002, San Jose, CA
Efficient Temporal Join Processing using Indices
Donghui Zhang
University of California, Riverside
Vassilis J. Tsotras
University of California, Riverside
Bernhard Seeger
University of Marburg, Germany
Contents
• Problem definition: GTE-Join
• Straightforward approaches
• Temporal indexing
• Proposed join algorithms
• Performance study
• Conclusions
Problem Definition
Temporal record: (key, start, end, attributes)
TE-Join: two records qualify for join if
• their time intervals intersect; and
• their keys are equal.
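DeptLocation

| Dept | Start | End | Location |
|---|---|---|---|
| D1 | 1 | 7 | Boston |
| D1 | 8 | 20 | Riverside |
| D2 | 1 | 20 | Los Angeles |
| D3 | 1 | 15 | New York |
| D3 | 16 | 20 | San Jose |

DeptManager

| Dept | Start | End | Manager |
|---|---|---|---|
| D1 | 1 | 10 | John |
| D1 | 11 | 20 | Mart |
| D2 | 1 | 20 | Jane |
| D3 | 1 | 20 | Alice |

DeptLocationManager

| Dept | Start | End | Location | Manager |
|---|---|---|---|---|
| D1 | 1 | 7 | Boston | John |
| D1 | 8 | 10 | Riverside | John |
| D1 | 11 | 20 | Riverside | Mart |
| D2 | 1 | 20 | Los Angeles | Jane |
| D3 | 1 | 15 | New York | Alice |
| D3 | 16 | 20 | San Jose | Alice |

TE-Join: “find the locations and managers of all departments over time”.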
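The TE-Join definition can be illustrated with a minimal nested-loop sketch over the example tables. The `Rec` type and the `te_join` function are names introduced here for illustration only, and the sketch assumes closed intervals (both `start` and `end` inclusive), as the example tables suggest. It is a naive, non-indexed baseline, not one of the algorithms proposed in the talk.

```python
from typing import NamedTuple

class Rec(NamedTuple):
    key: str
    start: int
    end: int    # assumed inclusive, as in the example tables
    attr: str

def te_join(left, right):
    """Nested-loop TE-Join: keys are equal and the closed intervals intersect."""
    out = []
    for a in left:
        for b in right:
            if a.key == b.key and a.start <= b.end and b.start <= a.end:
                # The result interval is the intersection of the two intervals.
                out.append((a.key, max(a.start, b.start), min(a.end, b.end),
                            a.attr, b.attr))
    return out

dept_location = [
    Rec("D1", 1, 7, "Boston"), Rec("D1", 8, 20, "Riverside"),
    Rec("D2", 1, 20, "Los Angeles"),
    Rec("D3", 1, 15, "New York"), Rec("D3", 16, 20, "San Jose"),
]
dept_manager = [
    Rec("D1", 1, 10, "John"), Rec("D1", 11, 20, "Mart"),
    Rec("D2", 1, 20, "Jane"), Rec("D3", 1, 20, "Alice"),
]

result = te_join(dept_location, dept_manager)
for row in result:
    print(*row)
```

The printed rows correspond to the DeptLocationManager table of the example.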
Problem Definition
GTE-Join: general TE-Join. Record keys should be in a certain range r and time intervals should intersect a given interval i.
[Figure: the query rectangle in the key-time plane, defined by key range r and time interval i.]
Interesting because:
• temporal relations are large;
• TE-Join is a special case, when r and i are (-∞, +∞).
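DeptLocation

| Dept | Start | End | Location |
|---|---|---|---|
| D1 | 1 | 7 | Boston |
| D1 | 8 | 20 | Riverside |
| D2 | 1 | 20 | Los Angeles |
| D3 | 1 | 15 | New York |
| D3 | 16 | 20 | San Jose |

DeptManager

| Dept | Start | End | Manager |
|---|---|---|---|
| D1 | 1 | 10 | John |
| D1 | 11 | 20 | Mart |
| D2 | 1 | 20 | Jane |
| D3 | 1 | 20 | Alice |

DeptLocationManager

| Dept | Start | End | Location | Manager |
|---|---|---|---|---|
| D1 | 5 | 7 | Boston | John |
| D1 | 8 | 10 | Riverside | John |
| D2 | 5 | 10 | Los Angeles | Jane |
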
GTE-Join: “find the locations and managers of departments in range [D1, D2] during time [5, 10]”.
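A minimal sketch of the GTE-Join semantics on the same example data follows. The function name `gte_join` and the tuple layout are assumptions made for illustration. The sketch also assumes that a pair qualifies only when the two record intervals and the query interval i share at least one instant, and that result intervals are clipped to i, which is what the example result suggests. It scans both relations in full, so it shows the semantics only, not an efficient method.

```python
def gte_join(left, right, key_range, interval):
    """Naive GTE-Join over (key, start, end, attr) tuples with closed intervals."""
    k_lo, k_hi = key_range
    t_lo, t_hi = interval
    out = []
    for (ka, sa, ea, xa) in left:
        if not (k_lo <= ka <= k_hi):
            continue  # key outside the query range r
        for (kb, sb, eb, xb) in right:
            if ka != kb:
                continue
            # Common part of both record intervals and the query interval i.
            s, e = max(sa, sb, t_lo), min(ea, eb, t_hi)
            if s <= e:
                out.append((ka, s, e, xa, xb))
    return out

dept_location = [
    ("D1", 1, 7, "Boston"), ("D1", 8, 20, "Riverside"),
    ("D2", 1, 20, "Los Angeles"),
    ("D3", 1, 15, "New York"), ("D3", 16, 20, "San Jose"),
]
dept_manager = [
    ("D1", 1, 10, "John"), ("D1", 11, 20, "Mart"),
    ("D2", 1, 20, "Jane"), ("D3", 1, 20, "Alice"),
]

rows = gte_join(dept_location, dept_manager, ("D1", "D2"), (5, 10))
for row in rows:
    print(*row)
```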
Straightforward Solutions
• Non-indexed join;
• Unsynchronized join;
• Synchronized join using B+-trees;
• Synchronized join using R-trees.
Straightforward Solutions
1. Non-indexed join: existing TE-Join research [Zur97] focuses on non-indexed join; not efficient for GTE-Join due to full scan.
2. Unsynchronized join: separate the selection and join phases; not efficient because:
• the intermediate result must be stored;
• selection in one relation ignores the data distribution of the other relation.
Straightforward Solutions
3. Synchronized join using B+-trees
If clustered on start: not efficient.
[Figure: time axis from tmin to tmax showing the query interval i and the labels x1 and x2.]
Clustering on end is similar.
Straightforward Solutions
3. Synchronized join using B+-trees
If clustered on key:
• records with keys in r are stored together and are sorted;
• focus on these records in each relation and sort-merge join, while skipping those whose intervals do not intersect i.
However, this is not efficient since records in the query rectangle are scattered.
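The key-clustered variant can be sketched in memory as follows. Sorted Python lists stand in for relations clustered on key, and `bisect` stands in for the B+-tree lookup of the key range; the function names and the `scanned` counter are assumptions added for illustration. The counter shows the cost the slide points at: every record with a key in r is read, even if its interval misses i.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby

def key_groups(rel, key_range, interval):
    """Return [(key, records intersecting i)] for keys in r, plus records read."""
    k_lo, k_hi = key_range
    t_lo, t_hi = interval
    keys = [rec[0] for rec in rel]
    run = rel[bisect_left(keys, k_lo):bisect_right(keys, k_hi)]  # contiguous run
    groups = []
    for key, grp in groupby(run, key=lambda rec: rec[0]):
        kept = [rec for rec in grp if rec[1] <= t_hi and rec[2] >= t_lo]
        groups.append((key, kept))
    return groups, len(run)

def sort_merge_gte(left, right, key_range, interval):
    """Merge the two key-sorted runs, joining groups with equal keys."""
    t_lo, t_hi = interval
    lg, l_scanned = key_groups(left, key_range, interval)
    rg, r_scanned = key_groups(right, key_range, interval)
    out, i, j = [], 0, 0
    while i < len(lg) and j < len(rg):
        if lg[i][0] < rg[j][0]:
            i += 1
        elif lg[i][0] > rg[j][0]:
            j += 1
        else:
            for (k, sa, ea, xa) in lg[i][1]:
                for (_, sb, eb, xb) in rg[j][1]:
                    s, e = max(sa, sb, t_lo), min(ea, eb, t_hi)
                    if s <= e:
                        out.append((k, s, e, xa, xb))
            i += 1
            j += 1
    return out, l_scanned + r_scanned

dept_location = [
    ("D1", 1, 7, "Boston"), ("D1", 8, 20, "Riverside"),
    ("D2", 1, 20, "Los Angeles"),
    ("D3", 1, 15, "New York"), ("D3", 16, 20, "San Jose"),
]
dept_manager = [
    ("D1", 1, 10, "John"), ("D1", 11, 20, "Mart"),
    ("D2", 1, 20, "Jane"), ("D3", 1, 20, "Alice"),
]
rows, scanned = sort_merge_gte(dept_location, dept_manager, ("D1", "D2"), (5, 10))
print(rows, scanned)
```

On real data with many versions per key, the number of records read can be far larger than the number that fall inside the query rectangle.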
Straightforward Solutions
4. Synchronized join using R-trees
• Store each record as a two-dimensional interval in the R-tree;
• Use existing R-tree join algorithms [BKS93, HJR97];
• Modification: integrate the selection regarding the query rectangle.
However, not efficient since R-trees do not handle long intervals well.
Our Solutions
• Synchronized join using temporal indices.
• Multi-version B+-tree (MVBT) [BGO+96]: asymptotically optimal space, update, query.
• We propose: two categories of synchronized, MVBT-based join algorithms (apply to other temporal indices as well).
Review of MVBT
Suppose a page holds up to 3 records.
[Figure, built up over several slides: records and pages in the key-time plane, with the time axis marked t0, t1, t2 and now. Root 1 covers [t0, t1), Root 2 covers [t1, t2), Root 3 covers [t2, now).]
Review of MVBT
• A “forest”: different trees may overlap;
• Root nodes correspond to contiguous, non-intersecting time intervals;
• A record may be stored in multiple pages; the end time of all but the last copy is +∞.
• Range-Interval selection algorithms [BS96]: avoid duplicates by reporting the first copy.
The Incorrect End Time Problem
[Figure: records x and y in the key-time plane; the time axis marks t1 (copy point) and t2.]
[BS96] reports the first copy of x (whose end is +∞), which would lead GTE-Join algorithms to join x with y.
Solution: report the rightmost copy!
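A toy illustration of why the reported copy matters is sketched below. The concrete numbers and the layout (one stored end time per copy, oldest copy first) are assumptions chosen for illustration; they are not taken from the figure or from the MVBT implementation.

```python
import math

INF = math.inf

# Hypothetical record x: starts at time 2 and was copied once.
# Only the last (rightmost) copy stores the real end time.
x_start = 2
x_copy_ends = [INF, 8]   # end time stored in each copy, oldest copy first

# Hypothetical record y with the same key, alive during [9, 12].
y_start, y_end = 9, 12

def intersects(s1, e1, s2, e2):
    """Closed intervals [s1, e1] and [s2, e2] share at least one instant."""
    return s1 <= e2 and s2 <= e1

# Using the first copy: x appears to live forever, so it wrongly joins with y.
joins_with_first_copy = intersects(x_start, x_copy_ends[0], y_start, y_end)

# Using the rightmost copy: x ends at 8, before y starts, so there is no join.
joins_with_rightmost_copy = intersects(x_start, x_copy_ends[-1], y_start, y_end)

print(joins_with_first_copy, joins_with_rightmost_copy)
```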
Top-down Approaches
Idea: for each pair of trees, one from each MVBT forest, perform a synchronized tree traversal (STT).
STT for two trees:
• initially, join root nodes;
• to join two nodes, join their children;
• eventually, join elements in leaf pages.
Question: what is the join condition?
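The three STT steps can be sketched on two small in-memory trees. The `Node` class, the rectangle layout, and the choice of plain rectangle intersection as the node-level join condition are all assumptions made for this sketch; the talk refines the join condition for MVBT pages, which is not modeled here. The sketch also assumes both trees have the same height.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    rect: tuple                                    # (key_lo, key_hi, t_lo, t_hi), inclusive
    children: list = field(default_factory=list)   # child Nodes (empty for a leaf)
    records: list = field(default_factory=list)    # (key, start, end) tuples in a leaf

def rects_intersect(a, b):
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]

def stt_join(a, b, out):
    """Synchronized traversal of two equal-height trees."""
    if not rects_intersect(a.rect, b.rect):
        return                                     # prune this pair of subtrees
    if not a.children and not b.children:
        # Leaf level: join the elements of the two leaf pages.
        for (ka, sa, ea) in a.records:
            for (kb, sb, eb) in b.records:
                if ka == kb and sa <= eb and sb <= ea:
                    out.append(((ka, sa, ea), (kb, sb, eb)))
        return
    # Inner level: to join two nodes, join their children.
    for ca in a.children:
        for cb in b.children:
            stt_join(ca, cb, out)

tree_a = Node((1, 4, 1, 20), children=[
    Node((1, 2, 1, 10), records=[(1, 1, 7), (2, 1, 10)]),
    Node((3, 4, 1, 20), records=[(3, 5, 20), (4, 1, 3)]),
])
tree_b = Node((1, 3, 1, 20), children=[
    Node((1, 2, 5, 20), records=[(1, 5, 9), (2, 11, 20)]),
    Node((3, 3, 1, 4), records=[(3, 1, 4)]),
])

pairs = []
stt_join(tree_a, tree_b, pairs)   # initially, join the root nodes
print(pairs)
```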
Balancing Condition Optimization (BCO)
To find <x, y>, page 3 and page 0 have to join.
[Figure: pages 0 to 6 in the key-time plane, with records x and y.]
In general, this means joining two pages even though they do not intersect. Inefficient!
BCO: balancing two conditions:
(1) only intersecting pages join;
(2) examine records even if not the last copy.
E.g. join <x, y> when joining page 2 with page 0.
Virtual Height Optimization (VHO)
[Figure: two trees. Tree A has nodes A1 (also drawn as A1’), A2, A3 and A4; tree B has nodes B1, B2, B3 and B4 to B7.]
At the middle level, STT joins: <A2, B2>, <A3, B2>, <A4, B2>, <A2, B3>, <A3, B3>, <A4, B3>
With VHO: <A1, B2>, <A1, B3>
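The difference in middle-level node pairs can be reproduced with a few lines. The tree shapes are an assumption read off the node labels (A1 with children A2, A3, A4; B1 with children B2, B3); the sketch only enumerates the pairs listed on the slide and does not implement VHO itself.

```python
from itertools import product

# Assumed shapes: A1 has children A2..A4; B1 has children B2, B3.
a_children = ["A2", "A3", "A4"]
b_children = ["B2", "B3"]

# Plain STT: after joining the roots, every child of A1 meets every child of B1.
plain_pairs = list(product(a_children, b_children))

# With VHO: A1 itself is paired with the children of B1 at the middle level.
vho_pairs = list(product(["A1"], b_children))

print(len(plain_pairs), plain_pairs)
print(len(vho_pairs), vho_pairs)
```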
Sideways Approach 1: Link-based
In each leaf page, store a pointer to its predecessor.
[Figure: leaf pages A, B, C and D.]
For GTE-Join:
• find pairs of data pages that intersect with the right border of the query rectangle and with each other;
• keep such pairs in a priority queue; sweep left synchronously;
• special techniques to avoid duplicates.
Sideways Approach 2: Plane Sweep
• Similar to link-based;
• Maintain two priority queues, one for each MVBT;
• At each step, access the leaf page with the largest end time and add records to buffer;
• To add records to buffer, join with existing records from the other MVBT;
• Throw away useless records.
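The sweep idea can be sketched at the level of individual records rather than leaf pages. This is a simplification: the two heaps below hold records, not MVBT pages, and the function name and tuple layout are assumptions. A record counts as useless once its start lies to the right of the sweep position, because every record still to come ends at or before that position.

```python
import heapq

def plane_sweep_join(rel_a, rel_b):
    """Join (key, start, end) records by sweeping from the largest end time leftwards."""
    heap_a = [(-end, key, start) for (key, start, end) in rel_a]
    heap_b = [(-end, key, start) for (key, start, end) in rel_b]
    heapq.heapify(heap_a)
    heapq.heapify(heap_b)
    buf_a, buf_b, out = [], [], []
    while heap_a or heap_b:
        # Take the record with the largest end time from either queue.
        take_a = bool(heap_a) and (not heap_b or heap_a[0][0] <= heap_b[0][0])
        neg_end, key, start = heapq.heappop(heap_a if take_a else heap_b)
        sweep = -neg_end
        # Throw away records that can no longer intersect anything to the left.
        buf_a = [r for r in buf_a if r[1] <= sweep]
        buf_b = [r for r in buf_b if r[1] <= sweep]
        rec = (key, start, sweep)
        own, other = (buf_a, buf_b) if take_a else (buf_b, buf_a)
        # Join the new record with buffered records from the other relation.
        for o in other:
            if o[0] == key:
                out.append((rec, o) if take_a else (o, rec))
        own.append(rec)
    return out

locations = [("D1", 1, 7), ("D1", 8, 20), ("D2", 1, 20), ("D3", 1, 15), ("D3", 16, 20)]
managers = [("D1", 1, 10), ("D1", 11, 20), ("D2", 1, 20), ("D3", 1, 20)]
pairs = plane_sweep_join(locations, managers)
print(len(pairs))
```

Each qualifying pair is reported once, at the moment the record with the smaller end time is added to its buffer.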
Performance Study

| Notation | Meaning |
|---|---|
| mvbt_df | Synchronized MVBT, depth-first |
| mvbt_bf | Synchronized MVBT, breadth-first |
| mvbt_link | Synchronized MVBT, link-based |
| mvbt_ps | Synchronized MVBT, plane-sweep |
| mvbt_sm | Unsynchronized, sort-merge after selection |
| b+ | Synchronized B+-tree, index on key |
| r*_df | Synchronized R*-tree, depth-first |
| r*_bf | Synchronized R*-tree, breadth-first |
Experimental Setup
• Implemented in GNU C++;
• Sun Enterprise 250 Server machine with two UltraSPARC-II processors using Solaris 2.8;
• Page size = 8KB;
• Buffer size = 10MB; LRU buffer;
• Each data set: 10 million records;
• QRS: size ratio between the query rectangle and the whole space.
• Long intervals: 1/100 of time space;
• Short intervals: 1/10,000 of time space.
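A few of these parameters can be related with simple arithmetic. The sketch assumes binary units (1KB = 1024 bytes), and the example split of the query rectangle into key and time fractions is hypothetical; the slides give only the overall QRS values.

```python
from fractions import Fraction

PAGE_SIZE = 8 * 1024            # 8KB page, assuming 1KB = 1024 bytes
BUFFER_SIZE = 10 * 1024 * 1024  # 10MB buffer
buffer_pages = BUFFER_SIZE // PAGE_SIZE   # pages that fit in the LRU buffer

def qrs(key_fraction, time_fraction):
    """Area of the query rectangle relative to the whole key-time space."""
    return key_fraction * time_fraction

# Hypothetical split: 10% of the key space times 10% of the time space.
example_qrs = qrs(Fraction(1, 10), Fraction(1, 10))

# Long intervals (1/100 of time space) versus short ones (1/10,000).
long_to_short = Fraction(1, 100) / Fraction(1, 10_000)

print(buffer_pages, example_qrs, long_to_short)
```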
GTE-Join Performance
Joining mainly long intervals.
[Figure: bar chart of total time (# sec), split into IO and CPU, for mvbt_df, mvbt_bf, mvbt_link, mvbt_ps, mvbt_sm, b+, r*_df and r*_bf; y-axis from 0 to 4000.]
GTE-Join Performance
Joining mainly short intervals.
[Figure: bar chart of total time (# sec), split into IO and CPU, for mvbt_df, mvbt_bf, mvbt_link, mvbt_ps, mvbt_sm, b+, r*_df and r*_bf; y-axis from 0 to 2500.]
GTE-Join Performance
Varying QRS.
[Figure: total time (# sec, log scale from 10 to 100000) at QRS = 0.1%, 1% and 10% for mvbt_df, mvbt_link, mvbt_ps, mvbt_sm, b+ and r*_df.]
Conclusions
• We addressed the GTE-Join;
• Unsynchronized approach not efficient;
• Synchronized approaches based on traditional indices (B+-tree, R-tree) also not efficient;
• We proposed synchronized approaches based on temporal indices (MVBT);
• We also proposed BCO and VHO optimizations;
• Experiments: link-based is the best.