07. Overview of Query Processing
![Page 1: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/1.jpg)
Distributed Database Systems, Autumn 2007
Chapter 7
Overview of Query Processing
![Page 2: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/2.jpg)
SQL: Non-Procedural Language of RDB
Tuple calculus
◦ $\{\, t \mid F(t) \,\}$ where:
t : tuple variable
F(t) : well formed formula
Example
◦ Get the No. and name of all managers
$\{\, t(\text{ENO}, \text{ENAME}) \mid t \in \text{EMP} \wedge t(\text{TITLE}) = \text{"MANAGER"} \,\}$
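The example query (number and name of all managers) maps almost directly onto a set comprehension. A minimal sketch in Python, assuming a small hypothetical EMP relation: the rows are invented for illustration, and only the attribute names ENO, ENAME and TITLE come from the slide.

```python
from typing import NamedTuple

class Emp(NamedTuple):
    ENO: str
    ENAME: str
    TITLE: str

# Hypothetical sample data, not taken from the slides.
EMP = [
    Emp("E1", "J. Doe", "MANAGER"),
    Emp("E2", "M. Smith", "ANALYST"),
    Emp("E3", "A. Lee", "MANAGER"),
]

# { t(ENO, ENAME) | t in EMP and t(TITLE) = "MANAGER" }
# The comprehension states WHAT is wanted, not how to scan EMP.
managers = {(t.ENO, t.ENAME) for t in EMP if t.TITLE == "MANAGER"}

print(sorted(managers))
```

The tuple variable `t` ranges over whole rows, and attributes are picked by name.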
![Page 3: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/3.jpg)
SQL: Non-Procedural Language of RDB
Domain calculus
$\{\, x_1, x_2, \cdots, x_n \mid F(x_1, x_2, \cdots, x_n) \,\}$
where:
$x_i$ : domain variables
$F(x_1, x_2, \cdots, x_n)$ : well formed formula
Example
$\{\, x, y \mid E(x, y, \text{"manager"}) \,\}$
Variables are position sensitive!
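Position sensitivity can be seen by writing the domain calculus example as a comprehension over plain tuples. This is only a sketch: the rows of E are hypothetical, and the column order (number, name, title) is an assumption.

```python
# Hypothetical rows of E as plain tuples: (employee no., name, title).
E = [
    ("E1", "J. Doe", "manager"),
    ("E2", "M. Smith", "analyst"),
    ("E3", "A. Lee", "manager"),
]

# { x, y | E(x, y, "manager") }: x and y are bound by position, not by name.
result = {(x, y) for (x, y, z) in E if z == "manager"}

# Swapping the positions of the variables changes what each one ranges over.
swapped = {(x, y) for (y, x, z) in E if z == "manager"}

print(sorted(result))
print(sorted(swapped))
```

In `result`, `x` holds employee numbers; in `swapped`, the same variable holds names, purely because it sits in a different position.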
![Page 4: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/4.jpg)
SQL: Non-Procedural Language of RDB
SQL is a tuple calculus language
SELECT ENO, ENAME
FROM EMP
WHERE TITLE = 'manager'
End users express queries in non-procedural languages.
![Page 5: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/5.jpg)
Query Processor
Query processor transforms queries into procedural operations to access data
![Page 6: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/6.jpg)
Query Processor
Distributed query processor has to deal with
◦query decomposition, and
◦data localization
![Page 7: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/7.jpg)
7.1 Query Processing Problems
![Page 8: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/8.jpg)
7.1 Query Processing Problems
Centralized query processor must
◦ transform calculus query into algebra operation, and
◦ choose the best execution plan
Example:
SELECT ENAME
FROM E, G
WHERE E.ENO = G.ENO
AND RESP = 'manager'
![Page 9: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/9.jpg)
7.1 Query Processing Problems
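Relational Algebra 1
$\pi_{\text{ENAME}}\big(\sigma_{\text{RESP}=\text{"Manager"} \,\wedge\, E.\text{ENO}=G.\text{ENO}}(E \times G)\big)$
Relational Algebra 2
$\pi_{\text{ENAME}}\big(E \bowtie_{\text{ENO}} \sigma_{\text{RESP}=\text{"Manager"}}(G)\big)$
Execution plan 2 is better because it consumes fewer resources: it avoids the Cartesian product of E and G!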
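The difference between the Cartesian-product plan and the join plan can be made concrete by counting the intermediate tuples each one builds. The sketch below assumes tiny hypothetical relations E and G and that ENO is unique in E; only the attribute names follow the slides.

```python
from itertools import product

# Hypothetical sample relations; only the attribute names follow the slides.
E = [{"ENO": f"E{i}", "ENAME": f"emp{i}"} for i in range(1, 6)]
G = [
    {"ENO": "E1", "RESP": "Manager"},
    {"ENO": "E2", "RESP": "Analyst"},
    {"ENO": "E3", "RESP": "Manager"},
    {"ENO": "E4", "RESP": "Programmer"},
]

def plan_product(E, G):
    """Cartesian product first, then selection, then projection."""
    pairs = list(product(E, G))  # |E| * |G| intermediate tuples
    kept = [(e, g) for e, g in pairs
            if g["RESP"] == "Manager" and e["ENO"] == g["ENO"]]
    return {e["ENAME"] for e, _ in kept}, len(pairs)

def plan_join(E, G):
    """Selection on G first, then join on ENO, then projection."""
    managers = [g for g in G if g["RESP"] == "Manager"]
    by_eno = {e["ENO"]: e for e in E}  # assumes ENO is unique in E
    joined = [(by_eno[g["ENO"]], g) for g in managers if g["ENO"] in by_eno]
    return {e["ENAME"] for e, _ in joined}, len(managers)

names_product, size_product = plan_product(E, G)
names_join, size_join = plan_join(E, G)
print(names_product == names_join, size_product, size_join)
```

Both plans return the same names, but the product plan materializes every pair of tuples before filtering, while the join plan only carries the selected G tuples forward.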
![Page 10: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/10.jpg)
7.1 Query Processing Problems
In DDB, the query processor must also consider the communication cost and select the best site!
Same query as in the last example, but G and E are distributed.
Simple plan:
◦ Transport all fragments to the query site and execute there. This causes too much network traffic and is very costly.
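To see why shipping everything is costly, the transfer volume of the simple plan can be compared with a plan that selects before shipping. This is a rough sketch under stated assumptions: the site names, fragment sizes and match counts are hypothetical, and each selected G tuple is assumed to join with exactly one E tuple at a known E site.

```python
# Hypothetical fragment placement and sizes; the slides give no figures.
fragments = {
    "site1": {"relation": "G", "tuples": 500, "managers": 10},
    "site2": {"relation": "G", "tuples": 500, "managers": 10},
    "site3": {"relation": "E", "tuples": 200},
    "site4": {"relation": "E", "tuples": 200},
}

def ship_everything(fragments):
    """Simple plan: every fragment is sent whole to the query site."""
    return sum(f["tuples"] for f in fragments.values())

def select_before_shipping(fragments):
    """Apply the RESP selection at the G sites, ship only the matches to the
    E sites, join there, and ship the joined tuples to the query site.
    Assumes each selected G tuple joins with exactly one E tuple."""
    matches = sum(f.get("managers", 0) for f in fragments.values())
    to_e_sites = matches     # selected G tuples sent to the E sites
    to_query_site = matches  # join results sent to the query site
    return to_e_sites + to_query_site

print(ship_everything(fragments), select_before_shipping(fragments))
```

With these made-up numbers the simple plan moves 1400 tuples and the alternative moves 40; the gap depends entirely on how selective the predicate is.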
![Page 11: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/11.jpg)
7.1 Query Processing Problems
Distributed Query Example
◦ Distribution of E and G
![Page 12: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/12.jpg)
7.1 Query Processing Problems
Distributed Query Example
◦ Query
$\pi_{\text{ENAME}}\big(E \bowtie_{\text{ENO}} \sigma_{\text{RESP}=\text{"Manager"}}(G)\big)$
![Page 13: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/13.jpg)
7.1 Query Processing Problems
Distributed Query Example
◦ Optimized Processing
![Page 14: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/14.jpg)
7.2 Objectives of Query Processing
![Page 15: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/15.jpg)
7.2 Objectives of Query Processing
Two-fold objectives:
◦Transformation, and
◦Optimization
![Page 16: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/16.jpg)
7.2 Objectives of Query Processing
Cost to be considered for optimization:
◦CPU time
◦I/O time, and
◦Communication time
WAN: the last cost (communication) is dominant
LAN: all three costs are of comparable weight
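The three cost components can be combined into a single total-time estimate. The sketch below is one common way to write such a cost function, not necessarily the one used in the course: the class and field names are hypothetical, communication time is assumed to split into a per-message and a per-byte part, and the unit costs are invented to contrast a LAN with a WAN.

```python
from dataclasses import dataclass

@dataclass
class PlanCounts:
    instructions: int  # CPU work
    disk_ios: int      # I/O operations
    messages: int      # messages sent
    bytes_sent: int    # data shipped

@dataclass
class UnitCosts:
    cpu: float   # seconds per instruction
    io: float    # seconds per disk I/O
    msg: float   # fixed seconds per message
    byte: float  # seconds per byte transferred

def total_time(p: PlanCounts, u: UnitCosts) -> float:
    """Total time = CPU time + I/O time + communication time."""
    cpu_time = u.cpu * p.instructions
    io_time = u.io * p.disk_ios
    comm_time = u.msg * p.messages + u.byte * p.bytes_sent
    return cpu_time + io_time + comm_time

# Hypothetical plan and unit costs, chosen only to show the contrast.
plan = PlanCounts(instructions=1_000_000, disk_ios=100, messages=10, bytes_sent=100_000)
lan = UnitCosts(cpu=1e-6, io=1e-2, msg=1e-3, byte=1e-6)
wan = UnitCosts(cpu=1e-6, io=1e-2, msg=1e-1, byte=1e-4)

print(total_time(plan, lan), total_time(plan, wan))
```

With these numbers the same plan is dominated by communication on the WAN, while on the LAN the CPU and I/O terms carry most of the weight.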
![Page 17: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/17.jpg)
7.3 Complexity of Relational Algebra Operations
![Page 18: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/18.jpg)
7.3 Complexity of Relational Algebra Operations
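Complexity is measured in terms of n (the relation cardinality); tuples are sorted on the comparison attributes.

| Operation | Complexity |
|---|---|
| $\sigma$, $\pi$ (without duplicate elimination) | $O(n)$ |
| $\pi$ (with duplicate elimination), GROUP | $O(n \log n)$ |
| $\bowtie$, $\ltimes$, $\div$, $\cup$, $\cap$, $-$ | $O(n \log n)$ |
| $\times$ | $O(n^2)$ |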
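The gap between the join and the Cartesian product shows up when two join algorithms are placed side by side. A minimal sketch, assuming rows are dictionaries and the join is an equi-join on one attribute; the function names are illustrative.

```python
def nested_loop_join(R, S, key):
    """Compare every pair: O(n^2) comparisons for inputs of size n."""
    return [(r, s) for r in R for s in S if r[key] == s[key]]

def sort_merge_join(R, S, key):
    """Sort both inputs (O(n log n)), then merge them in one pass.
    The merge is linear apart from the pairs it has to output."""
    R = sorted(R, key=lambda t: t[key])
    S = sorted(S, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][key] < S[j][key]:
            i += 1
        elif R[i][key] > S[j][key]:
            j += 1
        else:
            # Emit all pairs that share this key value.
            k = R[i][key]
            j_start = j
            while i < len(R) and R[i][key] == k:
                j = j_start
                while j < len(S) and S[j][key] == k:
                    out.append((R[i], S[j]))
                    j += 1
                i += 1
    return out

# Hypothetical sample rows.
R = [{"ENO": 2, "A": "x"}, {"ENO": 1, "A": "y"}, {"ENO": 2, "A": "z"}]
S = [{"ENO": 2, "B": "p"}, {"ENO": 3, "B": "q"}, {"ENO": 2, "B": "r"}]

def canon(pairs):
    """Order-independent view of a join result, for comparison."""
    return sorted((r["ENO"], r["A"], s["B"]) for r, s in pairs)

print(canon(sort_merge_join(R, S, "ENO")) == canon(nested_loop_join(R, S, "ENO")))
```

The sort step is where the n log n term comes from; if the inputs are already sorted on the join attribute, only the merge pass remains.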
![Page 19: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/19.jpg)
7.4 Characterization of Query Processor
![Page 20: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/20.jpg)
7.4.1 Languages
For users:
◦ calculus or algebra based languages.
For query processor:
◦ map the input into internal form of algebra augmented with communication primitives.
![Page 21: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/21.jpg)
7.4.2 Types of Optimization
Exhaustive search
◦ Workable for small solution space
Heuristics
◦ Perform σ, π first, semi-join, etc. for large solution space
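The trade-off between the two approaches can be sketched on join ordering. Everything here is an assumption made for illustration: the relation J, the cardinalities, the selectivities, and the cost model (sum of intermediate result sizes of a left-deep plan). The greedy rule is a stand-in heuristic, not the selection/projection-first rule named on the slide.

```python
from itertools import permutations

# Hypothetical cardinalities and join selectivities (illustrative only).
card = {"E": 400, "G": 1000, "J": 50}
sel = {frozenset(("E", "G")): 0.001, frozenset(("G", "J")): 0.02}

def next_size(joined, size, rel):
    """Estimated size after joining rel to the current intermediate result."""
    factor = 1.0
    for other in joined:
        factor *= sel.get(frozenset((rel, other)), 1.0)  # 1.0 means no join predicate
    return size * card[rel] * factor

def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep join order."""
    joined, size, cost = [order[0]], card[order[0]], 0.0
    for rel in order[1:]:
        size = next_size(joined, size, rel)
        cost += size
        joined.append(rel)
    return cost

def exhaustive(relations):
    """Try every permutation: n! candidates, feasible only for small n."""
    return min(permutations(relations), key=plan_cost)

def greedy(relations):
    """Heuristic: start from the smallest relation, then keep adding the
    relation that gives the smallest next intermediate result."""
    remaining = set(relations)
    first = min(remaining, key=lambda r: card[r])
    order, size = [first], card[first]
    remaining.remove(first)
    while remaining:
        best = min(remaining, key=lambda r: next_size(order, size, r))
        size = next_size(order, size, best)
        order.append(best)
        remaining.remove(best)
    return tuple(order)

best, quick = exhaustive(["E", "G", "J"]), greedy(["E", "G", "J"])
print(best, plan_cost(best), quick, plan_cost(quick))
```

On this input the exhaustive search finds a cheaper order than the greedy rule, at the price of examining all six permutations; with more relations the factorial growth is what makes heuristics necessary.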
![Page 22: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/22.jpg)
7.4.3 Optimization Timing
Static
◦ Done at compile time using statistics; appropriate for exhaustive search, since the query is optimized once but executed many times.
Dynamic
◦ Done at execution time; accurate, but repeated for every execution and therefore expensive.
![Page 23: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/23.jpg)
7.4.4 Statistics
Facts of
◦ Cardinalities
◦ Attribute value distribution
◦ Size of relation, etc.
Provided to query optimizer and periodically updated.
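A sketch of how an optimizer might turn such statistics into estimates. The record layout and the numbers are hypothetical, and the selection estimate assumes attribute values are uniformly distributed, which is a simplification.

```python
from dataclasses import dataclass, field

@dataclass
class RelationStats:
    cardinality: int   # number of tuples
    tuple_bytes: int   # average tuple length in bytes
    distinct: dict = field(default_factory=dict)  # attribute -> distinct values

def size_bytes(s: RelationStats) -> int:
    """Size of the relation = cardinality * average tuple length."""
    return s.cardinality * s.tuple_bytes

def est_equality_selection(s: RelationStats, attr: str) -> float:
    """Estimated cardinality of a selection attr = constant,
    assuming values of attr are uniformly distributed."""
    return s.cardinality / s.distinct[attr]

# Hypothetical statistics for relation G.
g_stats = RelationStats(cardinality=1000, tuple_bytes=40, distinct={"RESP": 5})

print(size_bytes(g_stats), est_equality_selection(g_stats, "RESP"))
```

If the stored figures drift from the real data, the estimates drift with them, which is why the statistics need periodic updates.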
![Page 24: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/24.jpg)
7.4.5 Decision Site
Query optimization may be done by
◦ Single site – centralized approach, or
◦ All the sites involved – distributed, or
◦ Hybrid – one site makes the major decisions in cooperation with other sites making local decisions
![Page 25: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/25.jpg)
7.4.6 Exploitation of the Network Topology
WAN
◦ communication cost is dominant
LAN
◦ communication cost is comparable to I/O cost. Broadcasting capability, star network, satellite network should be considered.
![Page 26: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/26.jpg)
7.4.7 Exploitation of Replicated Fragments
Use replicated fragments to minimize communication costs.
![Page 27: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/27.jpg)
7.4.8 Use of Semi-joins
Semi-joins reduce the size of operand relations and so cut down communication costs; they pay off when the semi-join overhead is not significant.
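The idea can be sketched for a join of E and G stored at two different sites. The site assignment, the sample rows and the unit of cost (one shipped key value or tuple counts as one unit) are assumptions made for illustration.

```python
def semijoin_plan(E, G, key="ENO"):
    """Join E (at site 1) with G (at site 2) using a semi-join reducer.
    Returns the join result and the number of items shipped."""
    # Step 1: site 1 ships the distinct join-attribute values of E to site 2.
    keys = {e[key] for e in E}
    shipped = len(keys)
    # Step 2: site 2 keeps only the G tuples that will find a join partner.
    G_reduced = [g for g in G if g[key] in keys]
    # Step 3: site 2 ships the reduced G to site 1, where the join runs.
    shipped += len(G_reduced)
    by_key = {}
    for e in E:
        by_key.setdefault(e[key], []).append(e)
    result = [{**e, **g} for g in G_reduced for e in by_key[g[key]]]
    return result, shipped

# Hypothetical sample data.
E = [{"ENO": "E1", "ENAME": "a"}, {"ENO": "E2", "ENAME": "b"}]
G = [{"ENO": eno, "RESP": "r"} for eno in ("E1", "E3", "E4", "E5", "E2", "E6")]

result, shipped_semijoin = semijoin_plan(E, G)
shipped_plain = len(G)  # alternative: ship all of G to site 1
print(len(result), shipped_semijoin, shipped_plain)
```

Here the semi-join ships 4 items instead of 6. If almost every G tuple had a partner in E, the extra round of key values would be pure overhead, which is the caveat on the slide.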
![Page 28: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/28.jpg)
7.5 Layers of Query Processing
![Page 29: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/29.jpg)
Generic Layering Scheme for Distributed Query Processing
![Page 30: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/30.jpg)
7.5.1 Query Decomposition
Decompose calculus query into algebra query using global conceptual schema information.
Step 1 – calculus normalization
Step 2 – semantic analysis to reject incorrect queries
Step 3 – simplification to eliminate redundant components
Step 4 – translation of calculus query into optimized algebra query.
![Page 31: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/31.jpg)
7.5.2 Data Localization
The distributed query is mapped into a fragment query, which is then simplified to produce a good one.
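One simplification at this layer is dropping fragments that cannot contribute to the result. The sketch below assumes a hypothetical horizontal fragmentation of E on an integer ENO; the fragment names and ranges are invented.

```python
# Hypothetical horizontal fragmentation of E on an integer ENO (inclusive ranges).
FRAGMENTS = {
    "E1": (1, 100),
    "E2": (101, 200),
    "E3": (201, 300),
}

def localize(lo, hi):
    """Map the selection lo <= ENO <= hi on the global relation E to the
    fragments that can hold matching tuples. A fragment is dropped when its
    defining range contradicts the selection range."""
    return [name for name, (f_lo, f_hi) in FRAGMENTS.items()
            if not (hi < f_lo or lo > f_hi)]

print(localize(150, 150))
print(localize(90, 120))
```

A point query on ENO = 150 touches only E2, so the other two sites need not be contacted at all; a range that straddles a boundary keeps both neighbouring fragments.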
![Page 32: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/32.jpg)
7.5.3 Global Query Optimization
Find an execution strategy close to optimal.
Find the best ordering of operations in the fragment query, including communication operations.
A cost function defined in terms of time is required.
![Page 33: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/33.jpg)
7.5.4 Local Query Optimization
Centralized system algorithms
(to be discussed in chapter 9)
![Page 34: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/34.jpg)
7.6 Conclusions
![Page 35: 07. Overview of Query Processing](https://reader036.vdocuments.net/reader036/viewer/2022081418/553cf1514a7959d8258b4b24/html5/thumbnails/35.jpg)
7.6 Conclusions
Query processor – must be able to find a good execution plan for a calculus query, such that CPU time, I/O time and communication time are minimized.
Method: layering of
◦ decomposition
◦ localization
◦ global query optimization
◦ local query optimization