Michael Armbrust
A Functional Query Optimization Framework
What is Query Optimization?
SQL is a declarative language:
Queries express what data to retrieve,
not how to retrieve it.
The database is free to pick the ‘best’ execution strategy through a process known as optimization.
Naïve Query Planning
SELECT name
FROM (
SELECT id, name
FROM People) p
WHERE p.id = 1
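One plausible reading of "naïve" planning is a clause-by-clause translation of the query into a tree of operators. The sketch below assumes hypothetical, simplified case classes (`Relation`, `Project`, `Filter`); they are illustrative stand-ins, not Catalyst's actual classes.

```scala
object NaivePlan {
  // Hypothetical, simplified logical operators (not Catalyst's real classes).
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Project(columns: Seq[String], child: Plan) extends Plan
  case class Filter(condition: String, child: Plan) extends Plan

  // Clause-by-clause translation of the query above: the inner SELECT
  // becomes a Project over People, the WHERE becomes a Filter on top of it,
  // and the outer SELECT becomes another Project.
  val plan: Plan =
    Project(Seq("name"),
      Filter("id = 1",
        Project(Seq("id", "name"),
          Relation("People"))))

  // Count the operators in a plan.
  def size(p: Plan): Int = p match {
    case Relation(_)       => 1
    case Project(_, child) => 1 + size(child)
    case Filter(_, child)  => 1 + size(child)
  }
}
```

In this form the filter runs only after the inner projection has been computed, which is the kind of pattern an optimizer would want to rewrite.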
Optimized Execution
Writing imperative code to optimize such patterns in general is hard.
Instead, write simple rules:
• Each rule makes one small change.
• Run many rules together to a fixed point.
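Running rules "to fixed point" can be sketched as a loop that reapplies every rule until the plan stops changing. The names `Rule`, `toFixedPoint`, and `maxIterations` are assumptions made for this illustration and do not come from the talk.

```scala
object FixedPoint {
  // A rule is a function from a plan to a (possibly) rewritten plan.
  type Rule[T] = T => T

  // Apply all rules in order, repeating until a full pass leaves the plan
  // unchanged or maxIterations is reached (a guard against rules that
  // never settle).
  def toFixedPoint[T](plan: T, rules: Seq[Rule[T]], maxIterations: Int = 100): T = {
    var current = plan
    var changed = true
    var i = 0
    while (changed && i < maxIterations) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      i += 1
    }
    current
  }
}
```

The loop relies on structural equality (`!=`) to detect that nothing changed, which is one reason immutable case classes are a convenient representation for plans.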
Optimizing with Rules
Prior Work: Optimizer Generators
Volcano / Cascades:
• Create a custom language for expressing rules that rewrite trees of relational operators.
• Build a compiler that generates executable code for these rules.
Cons: Developers need to learn this custom language. Language might not be powerful enough.
Introducing Catalyst
Idea: Use high-level language features from a modern functional language (Scala) to build an optimizer generator.
TreeNode Library
Easily transformable trees of operators
• Standard collection functionality (foreach, map, etc.)
• Transform function – recursive modification of trees that match a specified pattern
• Debugging support
Tree Transformations
Expressed as PartialFunction[TreeType, TreeType] (i.e., a function that can be applied to some subset of all trees and that returns a new tree).
1. If the function does apply to a given operator, that operator is replaced with the result.
2. When the function does not apply to a given operator, that operator is left unchanged.
3. The transformation is also applied recursively to the operator's children.
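The three steps above can be sketched on a toy expression tree. `Expr`, `Literal`, `Add`, `withChildren`, and `dropZero` are hypothetical names chosen for this example; Catalyst's real TreeNode is richer, and this is only a minimal approximation of the idea.

```scala
object TreeTransform {
  // Hypothetical toy tree; not Catalyst's actual TreeNode.
  sealed trait Expr {
    def children: Seq[Expr]
    def withChildren(newChildren: Seq[Expr]): Expr

    def transform(rule: PartialFunction[Expr, Expr]): Expr = {
      // Steps 1 and 2: replace the node if the rule applies, else keep it.
      val afterRule = if (rule.isDefinedAt(this)) rule(this) else this
      // Step 3: recurse into the children of the resulting node.
      afterRule.withChildren(afterRule.children.map(_.transform(rule)))
    }
  }
  case class Literal(value: Int) extends Expr {
    def children: Seq[Expr] = Nil
    def withChildren(newChildren: Seq[Expr]): Expr = this
  }
  case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
    def withChildren(newChildren: Seq[Expr]): Expr =
      Add(newChildren(0), newChildren(1))
  }

  // Example rule: adding zero is a no-op.
  val dropZero: PartialFunction[Expr, Expr] = {
    case Add(e, Literal(0)) => e
    case Add(Literal(0), e) => e
  }
}
```

A single top-down pass like this does not re-examine a node after it has been replaced, which is one reason rules are run repeatedly until nothing changes.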
Writing Rules as Tree Transformations
1. Find filters on top of projections.
2. Check that the filter can be evaluated without the result of the project.
3. If so, switch the operators.
val newPlan = queryPlan transform {
  case f @ Filter(_, p @ Project(_, grandChild))
      if (f.references subsetOf grandChild.output) =>
    p.copy(child = f.copy(child = grandChild))
}
Annotations from the slide build:
• Tree: queryPlan. Partial Function: the { case ... } block passed to transform.
• Find Filter on Project: the case pattern.
• Check that the filter can be evaluated without the result of the project: the if guard.
• If so, switch the order: the right-hand side of the case.
Scala features used:
• Pattern Matching (the case pattern)
• Collections library (subsetOf)
• Copy Constructors (copy)
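To try the filter push-down rule outside of Catalyst, the sketch below defines just enough of a toy operator tree to run it. Everything here is an assumption made for illustration: attributes are plain strings, `Filter` carries its `references` explicitly, and `transform` is a minimal top-down traversal rather than the TreeNode library's implementation.

```scala
object PushDown {
  // Hypothetical, simplified operators; attributes are plain strings here.
  sealed trait Plan {
    def output: Set[String] // attributes this operator produces

    // Minimal top-down transform: apply the rule if it is defined for this
    // node, then recurse into the child of the resulting node.
    def transform(rule: PartialFunction[Plan, Plan]): Plan = {
      val after = if (rule.isDefinedAt(this)) rule(this) else this
      after match {
        case Project(cols, child)    => Project(cols, child.transform(rule))
        case f @ Filter(_, _, child) => f.copy(child = child.transform(rule))
        case leaf                    => leaf
      }
    }
  }
  case class Relation(name: String, output: Set[String]) extends Plan
  case class Project(columns: Seq[String], child: Plan) extends Plan {
    def output: Set[String] = columns.toSet
  }
  case class Filter(condition: String, references: Set[String], child: Plan)
      extends Plan {
    def output: Set[String] = child.output
  }

  // Same shape as the rule on the slide: a Filter on top of a Project is
  // swapped when the filter only needs attributes the grandchild produces.
  val pushFilterBelowProject: PartialFunction[Plan, Plan] = {
    case f @ Filter(_, _, p @ Project(_, grandChild))
        if f.references.subsetOf(grandChild.output) =>
      p.copy(child = f.copy(child = grandChild))
  }
}
```

Applied to `Filter("id = 1", Set("id"), Project(Seq("id", "name"), people))`, the rule yields a `Project` whose child is the `Filter`, so the filter is evaluated directly on the relation.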
Prototype Status
Prototype Shark without using the Hive optimizer.
• 1 Developer, 1 month
• Able to plan SELECT, WHERE, GROUP BY, ORDER BY, COUNT, SUM, AVERAGE, COUNT DISTINCT.
• Catalyst plans global aggregates better than current Shark (using Spark Accumulator variables).
• ~700 Lines of code for planning + execution using Spark
• Working on integrating BlinkDB (approximate queries using sampling).
Conclusion
Catalyst is a framework for optimizing trees of relational operators:
• TreeNode library adds collection / transformation methods to tree nodes.
• Library for representing common logical relational operators and expressions.
• Machinery for executing batches of transformations to fixed point / enumerating possible physical plans for a given logical plan
Questions?
Expression Library
• Contains common expressions (+, -, etc.) and aggregates (SUM, COUNT, etc.).
• Operators can reason functionally about data type and nullability.
• Assigns GUIDs to all attributes in a plan to allow reasoning about where values are produced even after transformations have destroyed scoping information.
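As a rough illustration of these three points, the sketch below models expressions that report their own data type and nullability, and attributes that carry a unique id. All names (`DataType`, `Attribute`, `Add`, `nextId`) and the type-widening rule are assumptions for this example, not the library's actual definitions; a simple counter stands in for GUIDs.

```scala
import java.util.concurrent.atomic.AtomicLong

object Expressions {
  sealed trait DataType
  case object IntType extends DataType
  case object DoubleType extends DataType

  sealed trait Expression {
    def dataType: DataType
    def nullable: Boolean
  }

  // Counter used as a stand-in for globally unique ids.
  private val nextId = new AtomicLong(0)

  // An attribute carries a unique id, so two columns that share a name
  // can still be told apart after the plan has been rewritten.
  case class Attribute(name: String, dataType: DataType, nullable: Boolean,
                       id: Long = nextId.getAndIncrement()) extends Expression

  case class Literal(value: Int) extends Expression {
    val dataType: DataType = IntType
    val nullable: Boolean = false
  }

  case class Add(left: Expression, right: Expression) extends Expression {
    // Assumed rule: widen to Double if either side is a Double.
    def dataType: DataType =
      if (left.dataType == DoubleType || right.dataType == DoubleType) DoubleType
      else IntType
    // A sum can be null whenever either input can be null.
    def nullable: Boolean = left.nullable || right.nullable
  }
}
```

Because `dataType` and `nullable` are computed from the children, a rewritten expression tree reports consistent answers without any separate bookkeeping.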
Cost Models: Selecting the Cheapest Plan
• Many possible physical plans for most queries.
• Optimizer needs to decide which to use.
• Cost Model – Estimate cost of execution based on data statistics.
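Choosing among candidate physical plans can be sketched as estimating a cost for each and taking the minimum. The two join strategies and their cost formulas below are made up purely for illustration; the talk does not describe a specific cost model, and row counts stand in for "data statistics".

```scala
object CostModel {
  // Hypothetical physical plans for a join, described only by row counts.
  sealed trait PhysicalPlan
  case class BroadcastJoin(leftRows: Long, rightRows: Long) extends PhysicalPlan
  case class ShuffleJoin(leftRows: Long, rightRows: Long) extends PhysicalPlan

  // Made-up cost estimates, measured in rows moved over the network.
  def cost(plan: PhysicalPlan, workers: Int): Long = plan match {
    // Every worker receives a full copy of the smaller side.
    case BroadcastJoin(l, r) => math.min(l, r) * workers
    // Both sides are moved across the network once.
    case ShuffleJoin(l, r) => l + r
  }

  // Pick the candidate with the lowest estimated cost.
  def cheapest(candidates: Seq[PhysicalPlan], workers: Int): PhysicalPlan =
    candidates.minBy(cost(_, workers))
}
```

With one small input the broadcast plan wins; with two large inputs the shuffle plan does, so the choice depends on the statistics rather than on the query text.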