XJoin: Getting Fast Answers From Slow and Bursty Networks
DESCRIPTION
XJoin: Getting Fast Answers From Slow and Bursty Networks. T. Urhan, M. J. Franklin. IACS, CSD, University of Maryland. Presented by: Abdelmounaam Rezgui. CS-TR-3994. The problem: how to improve the interactive performance of queries over widely distributed data sources?
![Page 1: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/1.jpg)
XJoin: Getting Fast Answers From Slow and Bursty Networks
T. Urhan, M. J. Franklin
IACS, CSD, University of Maryland
Presented by: Abdelmounaam Rezgui
CS-TR-3994
![Page 2: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/2.jpg)
The Problem
How can we improve the interactive performance of queries over widely distributed data sources?
![Page 3: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/3.jpg)
The Problem
[Figure: a join of R and S fed by tuples arriving from remote Source A and Source B.]
![Page 4: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/4.jpg)
Why is the response time unpredictable?
• Remote sources
• Intermediate sites
• Communication links
are vulnerable to:
• Overloading
• Congestion
• Failures
Significant and unpredictable delays
Unresponsive and unusable systems
![Page 5: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/5.jpg)
Different classes of delays
• Initial delay: a longer than expected wait to receive the first tuple.
• Slow delivery: data arrive at a fairly constant but slower than expected rate.
• Bursty arrival: bursts of data followed by long periods of no arrivals.
![Page 6: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/6.jpg)
Some Join variants
• Nested Loops Join
• Block Nested Loops Join
• Index Nested Loops Join
• Sort-Merge Join
• Classic Hash Join
• Simple Hash Join
• Grace Hash Join
• Hybrid Hash Join (HHJ)
• TID Hash Join
• Symmetric Hash Join (SHJ)
• XJoin
![Page 7: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/7.jpg)
Query Scrambling
Reacts to data delivery problems by on-the-fly rescheduling of query operators and restructuring of the query execution plan.
• improves the response time for the entire query
• may slow down the return of some initial results
To be presented on November 22, 1999
![Page 8: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/8.jpg)
Traditional query processing techniques
• Reduce the memory requirements
• Reduce disk I/O
• Target delivery of the entire query result (online users would like to receive initial results as soon as possible)
• Slow and bursty delivery of data from remote sources can stall query execution.
![Page 9: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/9.jpg)
XJoin: Fundamental principles
• improves the interactive performance by producing results incrementally (as they become available)
• allows progress to be made even when one or more sources experience delays (delays are exploited to produce more tuples earlier)
![Page 10: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/10.jpg)
XJoin : The key idea
When inputs are delayed, run background processing on the previously received tuples.
![Page 11: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/11.jpg)
XJoin: The challenges
• Managing the flow of tuples between memory and secondary storage.
• Controlling the background processing.
• Producing the full answer (all the tuples are produced).
• Generating no duplicate tuples.
![Page 12: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/12.jpg)
SHJoin (Symmetric Hash Join)
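[Figure: tuples from Source 1 and Source 2 are inserted into Hash table 1 and Hash table 2; each arriving tuple is matched against the hash table of the other source.]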
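A minimal sketch of the symmetric hash join idea, assuming both hash tables fit in memory. The input format (a list of `(source, key, payload)` triples in arrival order) and the function name are hypothetical choices for illustration, not taken from the slides.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals):
    """Yield joined pairs as tuples arrive.

    `arrivals` is an iterable of (source, key, payload) with source 1 or 2,
    in arrival order (an assumed input format for this sketch).
    """
    tables = {1: defaultdict(list), 2: defaultdict(list)}
    for source, key, payload in arrivals:
        other = 2 if source == 1 else 1
        # Insert the new tuple into the hash table of its own source ...
        tables[source][key].append(payload)
        # ... then probe the hash table of the other source for matches.
        for match in tables[other].get(key, []):
            yield (payload, match) if source == 1 else (match, payload)


stream = [(1, "k1", "a1"), (2, "k1", "b1"), (2, "k2", "b2"), (1, "k2", "a2")]
shj_results = list(symmetric_hash_join(stream))
```

Each result is emitted as soon as its second tuple arrives, whichever source that tuple comes from, which is why neither input can stall the other.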
![Page 13: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/13.jpg)
SHJoin requires the hash tables for both of its inputs to be memory resident.
This is unacceptable for complex queries.
![Page 14: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/14.jpg)
XJoin
Partitioning:
• each input is partitioned into a number of partitions based on a hash function.
• each partition i of source A, PiA, consists of a memory-resident portion MPiA and a disk-resident portion DPiA:
PiA = MPiA ∪ DPiA
MPiA ∩ DPiA = ∅
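The partition layout can be sketched as below. This is an illustration under assumptions: the class and function names are hypothetical, a Python list stands in for the on-disk file, and the hash function is simply Python's built-in `hash`.

```python
class Partition:
    """One hash partition of one source: memory portion MP and disk portion DP."""

    def __init__(self):
        self.memory = []  # MP: tuples currently in memory
        self.disk = []    # DP: tuples flushed to disk (a list stands in for a file)

    def insert(self, tup):
        self.memory.append(tup)

    def flush(self):
        # Move the whole memory-resident portion to the disk-resident portion,
        # so a tuple is never in both portions at once.
        self.disk.extend(self.memory)
        self.memory = []

    def all_tuples(self):
        # P = MP ∪ DP
        return self.disk + self.memory


def route(key, num_partitions):
    """Pick the partition index for a join key (assumed hash choice)."""
    return hash(key) % num_partitions


partitions_a = [Partition() for _ in range(4)]
for key in [3, 7, 8, 12]:
    partitions_a[route(key, 4)].insert(key)
partitions_a[3].flush()                     # keys 3 and 7 move to disk
partitions_a[route(11, 4)].insert(11)       # a later arrival stays in memory
```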
![Page 15: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/15.jpg)
[Figure: XJoin partitioning. Tuples arriving from SOURCE-A and SOURCE-B are hashed to one of n partitions of their source (e.g. hash(Tuple A) = 1, hash(Tuple B) = n). Each source has memory-resident partitions and disk-resident partitions; memory-resident partitions are flushed to disk.]
![Page 16: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/16.jpg)
Stage 1: Memory-to-memory Joins
[Figure: a tuple from SOURCE-A with hash(record A) = i is inserted into memory-resident partition i of source A and probes partition i of source B; a tuple from SOURCE-B with hash(record B) = j is inserted into partition j of source B and probes partition j of source A. Matches go to the output.]
![Page 17: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/17.jpg)
Stage 2: Disk-to-memory Joins
[Figure: the disk-resident portion of partition i of source A (DPiA) is read from disk and probes the memory-resident portion of partition i of source B (MPiB). Matches go to the output.]
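A rough sketch of one Stage 2 run, assuming each tuple carries an arrival timestamp (ATS) and a departure timestamp (DTS, set when the tuple is flushed to disk). The names `Tup` and `stage2_run` are hypothetical, and the sketch only screens out pairs that Stage 1 would already have produced; it does not handle duplicates from earlier Stage 2 runs.

```python
from collections import namedtuple

# dts is None while the tuple is still in memory.
Tup = namedtuple("Tup", "key ats dts")


def stage2_run(disk_a, memory_b, now):
    """Join the disk-resident portion of a partition of A with the
    memory-resident portion of the same partition of B at logical time `now`."""
    index_b = {}
    for b in memory_b:
        index_b.setdefault(b.key, []).append(b)

    output = []
    for a in disk_a:                      # tuples read back from disk
        for b in index_b.get(a.key, []):  # probe the memory-resident portion
            # b arrived before a was flushed: both were in memory together,
            # so Stage 1 already produced this pair.
            if b.ats <= a.dts:
                continue
            output.append((a, b))
    # Entry that could be kept for later duplicate detection: (DTSlast, ProbeTS).
    dts_last = max(a.dts for a in disk_a) if disk_a else None
    return output, (dts_last, now)


disk_a = [Tup(5, 1, 10), Tup(6, 2, 10)]
memory_b = [Tup(5, 8, None), Tup(5, 12, None), Tup(6, 15, None)]
stage2_output, history_entry = stage2_run(disk_a, memory_b, now=20)
```

Here the pair with B's tuple that arrived at time 8 is skipped, because A's tuple was still in memory until time 10.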
![Page 18: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/18.jpg)
Stage 3: Clean-up
• Stage 1 fails to join tuples that were not in memory at the same time.
• Stage 2 fails to join two tuples if one of them is not in memory when the other is brought in from disk.
• Stage 3 runs after all tuples have been received and joins all the partitions (memory-resident and disk-resident portions) of the two sources.
![Page 19: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/19.jpg)
Handling duplicates
• Timestamps: each tuple X carries an arrival timestamp (ATS) and a departure timestamp (DTS).
Tuple X: ATS, DTS
• Example
Tuple X: ATS = 99, DTS = 235
• Counter 51
![Page 20: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/20.jpg)
Detecting tuples joined in the 1st stage
• Tuples joined in the first stage (overlapping [ATS, DTS] intervals):
Tuple A: ATS = 102, DTS = 234
Tuple B1: ATS = 178, DTS = 198
• Tuples not joined in the first stage (non-overlapping [ATS, DTS] intervals):
Tuple A: ATS = 102, DTS = 234
Tuple B2: ATS = 348, DTS = 601
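The overlap test on this slide can be written as a short interval check. This is a sketch using the timestamp values shown above; the function name and the `(ATS, DTS)` pair representation are assumptions for illustration.

```python
def joined_in_stage1(a, b):
    """a and b are (ATS, DTS) pairs; True if their in-memory intervals overlap,
    i.e. both tuples were memory resident at the same time."""
    a_ats, a_dts = a
    b_ats, b_dts = b
    return a_ats <= b_dts and b_ats <= a_dts


tuple_a = (102, 234)
tuple_b1 = (178, 198)   # arrived and left while A was in memory
tuple_b2 = (348, 601)   # arrived after A had been flushed
```

A later stage can call this check before emitting a pair and skip the pair when it returns `True`.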
![Page 21: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/21.jpg)
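Detecting tuples joined in the 2nd stage
Each history entry is a (DTSlast, ProbeTS) pair recorded for the corresponding partition.
Tuple A: ATS = 100, DTS = 200; history list of its partition: (20, 340), (250, 550), (300, 700)
Tuple B: ATS = 500, DTS = 600; history list of its partition: (100, 300), (800, 900)
Overlap: the entry (250, 550) has DTSlast ≥ DTS of Tuple A and a ProbeTS inside the [ATS, DTS] interval of Tuple B, so this pair was already joined in the 2nd stage.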
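The history-list check can be sketched as follows, using the values from this slide. The reading of each entry as a `(DTSlast, ProbeTS)` pair and the function name are assumptions made for this illustration.

```python
def joined_in_stage2(disk_tuple, memory_tuple, history):
    """Tuples are (ATS, DTS) pairs. `history` holds the (dts_last, probe_ts)
    entries recorded for the partition that disk_tuple belongs to."""
    _, disk_dts = disk_tuple
    mem_ats, mem_dts = memory_tuple
    for dts_last, probe_ts in history:
        # The disk tuple was part of the disk-resident portion that was read.
        was_on_disk = disk_dts <= dts_last
        # The other tuple was memory resident when the probe took place.
        was_in_memory = mem_ats <= probe_ts <= mem_dts
        if was_on_disk and was_in_memory:
            return True
    return False


history_a = [(20, 340), (250, 550), (300, 700)]
history_b = [(100, 300), (800, 900)]
tuple_a = (100, 200)
tuple_b = (500, 600)
```

With these values only the entry (250, 550) in A's history satisfies both conditions; no entry in B's history does.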
![Page 22: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/22.jpg)
Optimization 1: Adding a cache
• Stage 2 joins DPiA and MPiB.
• Tuples of DPiA are discarded after use.
The idea: retain some tuples of DPiA in a cache, so that they can be used by a subsequent run of Stage 2 joining DPiB and MPiA.
![Page 23: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/23.jpg)
[Figure: first and second runs of Stage 2 with a cache. Labels: partitions of Source A and Source B in memory and on disk, cache, probe, insert, output.]
![Page 24: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/24.jpg)
Optimization 2: Controlling Stage 2
• Overhead incurred by Stage 2 is hidden only when both inputs experience delays.
• Reduce the aggressiveness of Stage 2.
• Dynamic activation threshold (e.g., 0.01 → 0.02)
![Page 25: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/25.jpg)
Experiment Environment
PREDATOR, an object-relational DBMS
• XJoin operator added.
• Query optimizer extended to:
- account for XJoin.
- provide some of the statistics and calculations required by XJoin.
![Page 26: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/26.jpg)
Arrival Patterns
Two arrival patterns were chosen:
Fig. 1: Bursty arrival. Avg. rate: 23.5 KB/s
Fig. 2: Fast arrival. Avg. rate: 129.6 KB/s
![Page 27: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/27.jpg)
• 100,000-tuple Wisconsin benchmark relations.
• Each tuple: 288 bytes.
• Unique, unclustered integer join attribute.
• Result cardinality: 100,000.
• Sun Ultra 5 workstation:
- Solaris 2.6
- 128 MB of real memory
- Disk space (approx.): 4 GB
- Disk and memory pages: 8 KB
• Storage manager buffer size: 800 KB
![Page 28: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/28.jpg)
Results
Experiment 1: Basic performance of XJoin
• Memory space allocated to the join operators: 3 MB.
• Input relations: 28.8 MB each
• Activation threshold (of stage 2): 0.01
• 4 delay scenarios
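As a quick consistency check of these figures, the relation size follows from the tuple count and tuple size given for the benchmark relations. The sketch assumes decimal megabytes (1 MB = 1,000,000 bytes), which is what makes the numbers line up; the variable names are illustrative.

```python
TUPLES_PER_RELATION = 100_000
TUPLE_BYTES = 288
JOIN_MEMORY_MB = 3

# Size of one input relation in decimal megabytes (assumed unit).
relation_mb = TUPLES_PER_RELATION * TUPLE_BYTES / 1_000_000
# Share of one input that fits in the memory given to the join operator.
memory_fraction = JOIN_MEMORY_MB / relation_mb

print(f"relation size: {relation_mb:.1f} MB")
print(f"join memory as a share of one input: {memory_fraction:.1%}")
```

So in this experiment the join operator can hold only about a tenth of one input in memory, which forces most tuples through the disk-resident partitions.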
![Page 29: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/29.jpg)
![Page 30: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/30.jpg)
Case 1: Slow Network (both sources are slow)
• XJoin improves the delivery time of initial answers.
• The reactive background processing is an effective way to exploit delays.
• The use of a cache can further improve performance.
![Page 31: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/31.jpg)
Case 2: Mixed Network (slow build/fast probe and fast build/slow probe)
• XJoin variants perform better.
• (cf. Case 1) XJoin variants with the 2nd stage perform better.
![Page 32: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/32.jpg)
Case 3: Fast Network (both sources are fast)
• XJoin variants deliver initial results earlier.
• HHJ delivers the 2nd half of the result faster than XJoin-NoCache and XJoin.
• XJoin-No2nd delivers the last 60% of the result faster than the other XJoin variants.
![Page 33: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/33.jpg)
Experiment 2: Controlling the 2nd stage
The 2nd stage:
• improves interactive performance with slow and bursty data sources.
• degrades the overall response time in the case of fast, reliable sources.
Fig. 7: Slow relations. Fig. 8: Fast relations.
![Page 34: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/34.jpg)
• Stage 2 should be employed less aggressively (less often).
• A dynamic activation threshold.
![Page 35: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/35.jpg)
XJoin-Dyn
• aggressive in the early stages of the query.
• becomes less aggressive as more of the results are produced.
• starts with a low activation threshold (0.01) and then linearly increases it to 0.02.
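The linear schedule can be sketched as a simple interpolation. The slides do not say what the increase is measured against, so this sketch assumes it tracks the fraction of the expected result produced so far; the function name and parameters are hypothetical.

```python
def activation_threshold(fraction_done, start=0.01, end=0.02):
    """Linearly interpolate the Stage 2 activation threshold.

    fraction_done is the share of the expected result produced so far,
    clamped to [0, 1] (an assumed progress measure).
    """
    f = min(max(fraction_done, 0.0), 1.0)
    return start + (end - start) * f


thresholds = [round(activation_threshold(f), 4) for f in (0.0, 0.5, 1.0)]
```

A higher threshold makes Stage 2 run less often, which matches the goal of being aggressive early and more conservative later.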
![Page 36: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/36.jpg)
Experiment 3: The effect of memory size
• Recall: the prime motivation for designing XJoin was the huge memory requirement of the symmetric hash join.
• XJoin reduces the memory requirements but adds overhead (disk I/O and duplicate detection).
![Page 37: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/37.jpg)
• Size of the input relations: 8.6 MB.
• 3 different memory allocations:
- 3 MB (neither of the inputs fits into memory)
- 10 MB (one input fits into memory)
- 20 MB (both inputs fit into memory)
Fig. 9: Slow Network, varying memory
Fig. 10: Fast Network, varying memory
![Page 38: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/38.jpg)
• XJoin performs better in both:
- interactive performance
- completion time.
![Page 39: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/39.jpg)
Experiment 4: Impact of query complexity
• 2 to 6 relations (1 to 5 joins)
• 3 MB allocated to each join operator
Fig. 11: Tuple production rates of XJoin and HHJ (secs), Slow Network
![Page 40: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/40.jpg)
Experiment 4: Impact of query complexity
Fig. 12: Tuple production rates of XJoin and HHJ (secs), Fast Network
XJoin delivers the initial results faster.
![Page 41: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/41.jpg)
Conclusions
XJoin: an effective query processing technique for providing fast query responses to users in the presence of slow and bursty remote sources.
![Page 42: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/42.jpg)
• lowers the memory requirements (partitioning)
• improves the interactive performance.
• reacts to delays and takes advantage of silent periods to produce more tuples faster.
![Page 43: XJoin : Getting Fast Answers From Slow and Bursty Networks](https://reader035.vdocuments.net/reader035/viewer/2022081514/56815876550346895dc5d62b/html5/thumbnails/43.jpg)
Perspectives
What do you think about PJoin: A Multithreaded Parallel XJoin Using the Cilk Language?