An Adaptive Query Execution Engine for Data Integration
… Based on Zachary Ives, Alon Halevy & Hector Garcia-Molina’s Slides
2
ReviewsSh ip p in gO rd ersIn ven toryBooks
m ybooks .com M edia ted S chem a
W e s t
...
F e dE x
W A N
a lt.bo o ks .re v ie w s
In te rne tIn te rne t In te rne t
UP S
E a s t O rde rs C us to me rR e v ie w s
NY Time s
...
M o rga n-K a ufma n
P re ntic e -Ha ll
Data Integration Systems
Uniform query capability across autonomous, heterogeneous data sources on LAN, WAN, or Internet: in enterprises, WWW, big science.
3
parse
convert
apply laws
estimate result sizes
consider physical plans estimate costs
pick best
execute
{P1,P2,…..}
{(P1,C1),(P2,C2)...}
Pi
answer
SQL query
parse tree
logical query plan
“improved” l.q.p
l.q.p. +sizes
statistics
Traditional Query Processing
4
Example: SQL query
SELECT titleFROM StarsInWHERE starName IN (
SELECT nameFROM MovieStarWHERE birthdate LIKE ‘%1960’
);
(Find the movies with stars born in 1960)
5
Example: Logical Query Plan
title
starName=name
StarsIn name
birthdate LIKE ‘%1960’
MovieStar
6
Example: Improved Logical Query Plan
title
starName=name
StarsIn name
birthdate LIKE ‘%1960’
MovieStar
7
Example: Estimate Result Sizes
Need expected size
StarsIn
MovieStar
8
Example: One Physical Plan
Parameters: join order, memory size, project
attributes,...
Hash join
SEQ scan index scan Parameters:Select Condition,...
StarsIn MovieStar
9
Hash function h, range 0 k Buckets for R1: G0, G1, ... Gk Buckets for R2: H0, H1, ... Hk
Algorithm (Grace Hash Join)(1) Hash R1 tuples into G buckets(2) Hash R2 tuples into H buckets(3) For i = 0 to k do
match tuples in Gi, Hi bucketsPartitioning (steps 1—2); probing (step 3)
Example: (Grace) Hash Join
10
Simple example hash: even/odd
R1 R2 Buckets R1 R22 5 Even: 4 4 3 12 Odd: 5 38 139 8
1114
2 4 8 4 12 8 14
3 5 9 5 3 13 11
11
Hash Join Variations
Question: what if there is enough memoryto hold R1?
R2 R154
123
138
1114
243589
12
Hash Join VariationsQuestion: what if there is enough memory to hold R1?
R2 R154
123
…
2, 4, 8 (1) Load entire R1 into memory
(2) Build hash table for R1(using hash function h)
(2) For each tuple r2 in R2 do- read r2- probe hash table for R1 using h(r2)- for matching tuple r1,
output <r1,r2>
R1 hashtable
3, 5, 95
<5,5>
13
Example: Estimate costs
L.Q.P
P1 P2 …. Pn
C1 C2 …. Cn Pick best!
14
New Challenges for Processing Queries in DI Systems
Little information for cost estimates
Unreliable, overlapping sources
Unpredictable data transfer rates
Want initial results quickly
Need “smarter” execution
15
The Tukwila Project
Key idea: build adaptive features into the core of the system
Interleave planning and execution (replan when you know more about your data) Compensate for lack of information Rule-based mechanism for changing behavior
Adaptive query operators: Revival of the double-pipelined join better
latency Collectors (a.k.a. “smart union”) handling
overlap
16
Tukwila Data Integration System
ExecutionEngine
Optim izer,Scheduler
Tem p Store
EventHandler
OperatorExecution
LogicalP lan
Exec.QueryP lan
Data from sources
Reformulator
Execution Stats
Q uery
Results
Novel components: Event handler Optimization-execution loop
17
Handling Execution Events
Adaptive execution via event-condition-action rules
During execution, events generatedTimeout, n tuples read, operator opens/closes, memory
overflows, execution fragment completes, …
Events trigger rules: Test conditions
Memory free, tuples read, operator state, operator active, …
Execution actionsRe-optimize, reduce memory, activate/deactivate
operator, …
18
Interleaving Planning and ExecutionRe-optimize if at unexpected state: Evaluate at key
points, re-optimize un-executed portion of plan [Kabra/DeWitt SIGMOD98]
Plan has pipelined units, fragments
Send back statistics to optimizer.
Maintain optimizer state for later reuse.
Fragm ent 1
Fragm ent 0
H ashJo in
East
H ashJo in
M ateria lize& Test
FedExOrders
WHEN end_of_fragment(0) IF card(result) > 100,000 THEN re-optimize
19
Handling LatencyOrders from data source A
OrderNo1234123513991500
TrackNo01-23-4502-90-8502-90-85 03-99-10
UPS from data source BTrackNo01-23-4502-90-8503-99-1004-08-30
StatusIn TransitDeliveredDeliveredUndeliverable
tuple
relation
SelectStatus = “Delivered” UPS
attribute
Data from B often delayed due to high volume of requests
20
Join Operation
Orders
UPS
Need to combine tuples from Orders & UPS:
JoinOrders.TrackNo = UPS.TrackNo (Orders, UPS)
OrderNo1234123513991500
TrackNo01-23-4502-90-8502-90-8503-99-10
StatusIn TransitDeliveredDeliveredDelivered
(2nd TrackNo attribute is removed)
OrderNo1234123513991500
TrackNo01-23-4502-90-8502-90-85 03-99-10
TrackNo01-23-4502-90-8503-99-1004-08-30
StatusIn TransitDeliveredDeliveredUndeliverable
21
Query plan represented as data-flow tree:
Query Plan Execution
Pipelining vs. materialization
Control flow? Iterator (top-down)
Most common database model
Easier to implement
Data-driven (bottom-up)
Threads or external scheduling
Better concurrency
SelectStatus = “Delivered”
JoinOrders.TrackNo = UPS.TrackNo
ReadOrders
ReadUPS
“Show which orders havebeen delivered”
22
Standard Hash Join
Standard Hash Joinread entire inneruse outer tuples to probe inner hash table
Double Pipelined Hash Join
23
Standard Hash Join
Standard hash join has 2 phases: Non-pipelined: Read entire inner relation, build hash table Pipelined: Use tuples from outer relation to probe the hash
table
Advantages: Only one hash table Low CPU overhead
Disadvantages: High latency: need to wait for all inner tuples Asymmetric: need to estimate for inner Long time to completion: sum of two data sources
24
Adaptive Operators: Double Pipelined Join
Standard Hash JoinDouble Pipelined Hash Join Hash table per source As tuple comes in, add
to hash table and probe opposite table
25
Double-Pipelined Hash Join
R2 R14
123
…
2, 4
R1 hashtable
3, 5
<5,5>
89
R2 hashtable
5
5
probe
(1) 2,4,3,5 arrived(2) 5 arrives
store5
26
Double-Pipelined Hash Join
Proposed for parallel main-memory databases (Wilschut 1990)
Advantages: Results as soon as tuples received Can produce results even when one source delays Symmetric (do not need to distinguish inner/outer)
Disadvantages: Require memory for two hash tables Data-driven!
27
Double-Pipelined Join Adapted to Iterator Model
Use multiple threads with queues Each child (A or B) reads tuples until full,
then sleeps & awakens parent
DPJoin sleeps until awakened, then: Joins tuples from QA or QB, returning all
matches as output
Wakes owner of queue
Allows overlap of I/O & computation in iterator model
Little overlap between multiple computations
DPJoin
A B
QA QB
28
Experimental Results(Joining 3 data sources)
Normal: DPJoin (yellow) as fast as optimal standard join (pink)
Slow sources: DPJoin much better
DPJoinOptimal StdSuboptimal Std
0
60
120
180
1 31 61 91
Tuples (100's)
Tim
e (
s)
0
80
160
240
1 31 61 91
Tuples (100's)
Tim
e (
s)
29
Insufficient Memory?
May not be able to fit hash tables in RAM
Strategy for standard hash join Swap some buckets to overflow files
As new tuples arrive for those buckets, write to files
After current phase, clear memory, repeat join on overflow files
30
Overflow Strategies
3 overflow resolution methods: Naive conversion — conversion into standard join;
asymmetric
Flush left hash table to disk, pause left source
Read in right source
Pipeline tuples from left source
Incremental Left Flush — “lazy” version of above
As above, but flush minimally
When reading tuples from right, still probe left hash table
Incremental Symmetric Flush — remains symmetric
Choose same hash bucket from both tables to flush
31
Simple Algorithmic Comparison
Assume s tuples per source (i.e. sources of equal size), memory fits m tuples: Left Flush:
If m/2 < s m (only left overflows): Cost = 2(s - m/2) = 2s - m
If m < s 2m (both overflow): Cost = 2(s + m2/2s - 3m/2) + 2(s - m) = 4s - 4m + m2/s
Symmetric Flush:If s < 2m:
Cost = 2(2s - m) = 4s - 2m
32
Experimental Results
0
140
280
420
1 31 61
Tuples (100's)
Tim
e (
s)
0
150
300
450
1 31 61
Tuples (100's)
Tim
e (
s)
Low memory (4MB): symmetric as fast as optimal
Medium memory (8MB): incremental left is nearly optimal
SymmetricIncrem. LeftNaiveOptimal (16MB)
Adequate performance for overflow cases Left flush consistent output; symmetric sometimes faster
33
Adaptive Operators: Collector
Utilize mirrors and overlapping sources to produce results quickly Dynamically adjust
to source speed & availability
Scale to many sources without exceeding net bandwidth
Based on policy expressed via rules
C
CustReviews
NYTim es
alt.books
WHEN timeout(CustReviews) DO activate(NYTimes), activate(alt.books)
WHEN timeout(NYTimes) DO activate(alt.books)
34
Summary
DPJoin shows benefits over standard joins Possible to handle out-of-memory conditions
efficiently Experiments suggest optimal strategy depends
on: Sizes of sources Amount of memory I/O speed
35
The Unsolved Problem
Find interleaving points? When to switch from optimization to execution?
Some straightforward solutions worked reasonably, but student who was supposed to solve the problem graduated prematurely.
Some work on this problem: Rick Cole (Informix) Benninghoff & Maier (OGI).
One solution being explored: execute first and break pipeline later as needed.
Another solution: change operator ordering in mid-flight (Eddies, Avnur & Hellerstein).
36
More Urgent Problems
Users want answers immediately: Optimize time to first tuple Give approximate results earlier.
XML emerges as a preferred platform for data integration: But all existing XML query processors are
based on first loading XML into a repository.
37
Tukwila Version 2
Able to transform, integrate and query arbitrary XML documents.
Support for output of query results as early as possible: Streaming model of XML query execution.
Efficient execution over remote sources that are subject to frequent updates.
Philosophy: how can we adapt relational and object-relational execution engines to work with XML?
38
Tukwila V2 Highlights
The X-scan operator that maps XML data into tuples of subtrees.
Support for efficient memory representation of subtrees (use references to minimize replication).
Special operators for combining and structuring bindings as XML output.
39
Tukwila V2 Architecture
40
Example XML File<db> <book publisher="mkp"> <title>Readings in Database Systems</title> <editors> <name>Stonebraker</name> <name>Hellerstein</name> </editors> <isbn>123-456-X</isbn> </book><company ID="mkp"> <name>Morgan Kaufmann</title> <city>San Mateo</city> <state>CA</state> </company></db>
41
XML Data Graph
db
#1
#2
#7mkp
ReadingsIn DatabaseSystems
123-456-X
#2#3 #6
Principlesof TransactionProcessing
235-711-Y
#8
#9
Morgan Kaufmann
San Mateo
#11 #12 #13
#4
StonebrakerHellerstein
#5
CA
#4
#5
#10
Bernstein
Newcomer
42
Example Query
WHERE <db> <book publisher=$pID> <title>$t</> </> ELEMENT_AS $b </> IN "books.xml", <db> <publication title=$t> <source ID=$pID>$p</>
<price>$pr</> </> </> IN "amazon.xml", $pr < 49.95CONSTRUCT <book> <name>$t</> <publisher>$p</> </>
43
Query
Execution
Plan
44
X-ScanThe operator at the leaves of the plan.Given an XML stream and a set of
regular expressions – produces a set of bindings.
Supports both trees and graph data.Uses a set of state machines to traverse
match the patterns.Maintains a list to unseen element Ids,
and resolves them upon arrival.
45
X-scan Data Structures
Structural Index
. . .
ID index
<db> <lab ID=... <name>Seattle... <location> <city>Seattle... ...
XML Tree Mgr
State MachinesStack
l c
#1 #3
Bindings
BindingTuples
ID2
. . .
ID1ID3
UnresolvedIDREFs
. . .
46
State Machines for X-scan
Mb:
MpID:
Mt:
1 2 3
4 5
db book
@publisher
6 7title
47
Other Features of Tukwila V.2
X-scan: Can also be made to preserve XML order. Careful handling of cycles in the XML graph. Can apply certain selections to the bindings.
Uses much of the code of Tukwila I.No modifications to traditional
operators.XML output producing operators.Nest operator.
48
In the “Pipeline”Partial answers: no blocking. Produce
approximate answers as data is streaming.
Policies for recovering from memory overflow [More Zack].
Efficient updating of XML documents (and an XML update language) [w/Tatarinov]
Dan Suciu: a modular/composable toolset for manipulating XML.
Automatic generation of data source descriptions (Doan & Domingos)
49
First 5 Results
4.971.48
6.381.61 3.50
34.6
0.64 2.78
47.0 190
0
10
20
30
40
50
Order 1234(R, 5MB)
LineItemQty < 32
(R, 31MB)
Orders xCust (R,
5x0.5MB)
LineItems xOrders (R,31x7MB)
Papersunder
Confs (I,0.2x9MB)
Papers withConfRefs(D, 39MB)
Query
Exe
c. T
ime
Tukwila
DOM Parse
Xalan
XT
Relational
938
50
Completion Time
22.13
104.75
20.04
363.53
208.57175.67
508.29
69.6
190
0
50
100
150
200
250
300
350
400
Order 1234(R, 5MB)
LineItemQty < 32
(R, 31MB)
Orders xCust (R,
5x0.5MB)
LineItems xOrders (R,31x7MB)
Papersunder
Confs (I,0.2x9MB)
Papers withConfRefs(D, 39MB)
Query
Exe
c. T
ime
Tukwila
Xalan
XT
Relational
938
51
Intermediate Conclusions
First scalable XML query processor for networked data.
Work done in relational query processing is very relevant to XML query processing.
We want to avoid decomposing XML data into relational structures.
52
Some Observations from Nimble
What is Nimble? Founded in June, 1999 with Dan Weld. Data integration engine built on an XML
platform. Query language is XML-QL. Mostly geared to enterprise integration,
some advanced web applications. 70+ person company (and hiring!) Ships in trucks (first customer is Paccar).
53
XML Query
User Applications
Lens™ File InfoBrowser™Software
Developers Kit
NIMBLE™ APIs
Front-End
XML
Lens Builder™Lens Builder™
Management Tools
Management Tools
Integration Builder
Integration Builder
Security T
ools
Data Administrator
Data Administrator
System Architecture
Concordance Developer
Concordance Developer
Integration
Layer
Nimble Integration Engine™
Compiler Executor
MetadataServerCache
Relational Data Warehouse/ Mart
Legacy Flat File Web Pages
Common XML View
54
The Current State of Enterprise Information
Explosion of intranet and extranet information
80% of corporate information is unmanaged
By 2004 30X more enterprise data than 1999
The average company: maintains 49 distinct
enterprise applications spends 35% of total IT
budget on integration-related efforts
1995 1997 1999 2001 2003 2005
Enterprise Information
Source: Gartner, 1999
55
Design Issues Query language for XML: tracking the W3C
committee. The algebra:
Needs to handle XML, relational, hierarchical and support it all efficiently!
Need to distinguish physical from logical algebra.
Concordance tables need to be an integral part of the system. Need to think of data cleaning.
Need to deal with down times of data sources (or refusal times).
Need to provide range of options between on-demand querying and pre-materialization.
56
Non-Technical Issues
SQL not really a standard.Legacy systems are not necessarily old.IT managers skeptical of truths.People are very confused out there.Need a huge organization to support the
effort.