HAWQ Architecture HL++ 2015 Moscow
TRANSCRIPT
![Page 1: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/1.jpg)
HAWQ Architecture Alexey Grishchenko
![Page 2: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/2.jpg)
Who I am
Enterprise Architect @ Pivotal
• 7 years in data processing
• 5 years of experience with MPP
• 4 years with Hadoop
• Using HAWQ since the first internal Beta
• Responsible for designing most of the EMEA HAWQ and Greenplum implementations
• Spark contributor
• http://0x0fff.com
![Page 8: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/8.jpg)
Agenda
• What is HAWQ
• Why you need it
• HAWQ Components
• HAWQ Design
• Query execution example
• Competitive solutions
![Page 15: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/15.jpg)
What is HAWQ
• Analytical SQL-on-Hadoop engine
• HAdoop With Queries
• Lineage: Postgres → Greenplum → HAWQ
– 2005: Fork of Postgres 8.0.2
– 2009: Rebase onto Postgres 8.2.15
– 2011: Fork of GPDB 4.2.0.0
– 2013: HAWQ 1.0.0.0
– 2015: HAWQ 2.0.0.0, Open Source
![Page 23: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/23.jpg)
HAWQ is …
• 1’500’000 lines of C and C++ code
– 200’000 of them in headers only
• 180’000 Python LOC
• 60’000 Java LOC
• 23’000 Makefile LOC
• 7’000 Shell scripts LOC
• More than 50 enterprise customers
– More than 10 of them in EMEA
![Page 24: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/24.jpg)
Apache HAWQ
• Apache HAWQ (incubating) since 09’2015
– http://hawq.incubator.apache.org
– https://github.com/apache/incubator-hawq
• What’s in Open Source
– Sources of HAWQ 2.0 alpha
– HAWQ 2.0 beta is planned for 2015’Q4
– HAWQ 2.0 GA is planned for 2016’Q1
• The community is still young – come and join!
![Page 31: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/31.jpg)
Why do we need it?
• SQL interface for BI solutions to Hadoop data, compliant with ANSI SQL-92, -99, -2003
– Example: a 5000-line query with a number of window functions, generated by Cognos
• Universal tool for ad hoc analytics on top of Hadoop data
– Example: parse a URL to extract the protocol, host name, port and GET parameters
• Good performance
– How many times would the data hit the HDD during a single Hive query?
![Page 34: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/34.jpg)
HAWQ Cluster
[Diagram: cluster layout. Servers 1–4 host the master services: NameNode, SNameNode, ZK and JM instances, YARN RM, YARN App Timeline, HAWQ Master and HAWQ Standby. Servers 5, 6 … N each host an HDFS Datanode, a YARN NM and a HAWQ Segment. All servers are connected through the interconnect.]
![Page 35: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/35.jpg)
Master Servers
[Diagram: the HAWQ cluster layout, with the HAWQ Master and HAWQ Standby called out.]
![Page 36: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/36.jpg)
Master Servers
• HAWQ Master
– Query Parser
– Query Optimizer
– Global Resource Manager
– Distributed Transactions Manager
– Query Dispatch
– Metadata Catalog
• HAWQ Standby Master
– Query Parser
– Query Optimizer
– Global Resource Manager
– Distributed Transactions Manager
– Query Dispatch
– Metadata Catalog
• WAL replication from HAWQ Master to HAWQ Standby Master
![Page 37: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/37.jpg)
Segments
[Diagram: the HAWQ cluster layout, with the HAWQ Segments on the Datanode servers (Server5, Server6 … ServerN) called out.]
![Page 38: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/38.jpg)
Segments
• HAWQ Segment
– Query Executor
– libhdfs3
– PXF
• HDFS Datanode
• Local Filesystem
– Temporary Data Directory
– Logs
• YARN Node Manager
![Page 45: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/45.jpg)
Metadata
• HAWQ metadata structure is similar to the Postgres catalog structure
• Statistics
– Number of rows and pages in the table
– Most common values for each field
– Histogram of values distribution for each field
– Number of unique values in the field
– Number of null values in the field
– Average width of the field in bytes
![Page 51: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/51.jpg)
Statistics
• No Statistics: How many rows would the join of two tables produce?
✓ From 0 to infinity
• Row Count: How many rows would the join of two 1000-row tables produce?
✓ From 0 to 1’000’000
• Histograms and MCV: How many rows would the join of two 1000-row tables produce, given known field cardinality, value distribution histogram, number of nulls and most common values?
✓ ~ From 500 to 1’500
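The narrowing of the estimate can be illustrated with a small Python sketch. This is not HAWQ's optimizer code: the function names are hypothetical, and the formula is the common textbook equi-join estimate (non-null rows of both sides divided by the larger number of distinct keys), used here only to show why column statistics tighten the range.

```python
def join_size_bounds(rows_a, rows_b):
    """Bounds available from row counts alone: empty result up to a full cross product."""
    return 0, rows_a * rows_b


def estimate_equijoin_rows(rows_a, rows_b, distinct_a, distinct_b,
                           null_frac_a=0.0, null_frac_b=0.0):
    """Textbook equi-join estimate (an assumption, not HAWQ's exact formula).

    NULL keys never match, so they are excluded first; the remaining rows are
    assumed to be spread uniformly over the larger of the two key domains.
    """
    non_null_a = rows_a * (1.0 - null_frac_a)
    non_null_b = rows_b * (1.0 - null_frac_b)
    return non_null_a * non_null_b / max(distinct_a, distinct_b)


# Row counts only: anything from 0 to 1'000'000 rows.
low, high = join_size_bounds(1000, 1000)

# With per-column statistics (here: both join keys unique, no NULLs).
estimate = estimate_equijoin_rows(1000, 1000, distinct_a=1000, distinct_b=1000)
print(low, high, estimate)
```

Most common values and histograms refine this further by replacing the uniformity assumption with the observed value distribution.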
![Page 55: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/55.jpg)
Metadata
• Table structure information
– Distribution fields
– Number of hash buckets
– Partitioning (hash, list, range)

| ID | Name | Num | Price |
|---|---|---|---|
| 1 | Apple | 10 | 50 |
| 2 | Pear | 20 | 80 |
| 3 | Banana | 40 | 40 |
| 4 | Orange | 25 | 50 |
| 5 | Kiwi | 5 | 120 |
| 6 | Watermelon | 20 | 30 |
| 7 | Melon | 40 | 100 |
| 8 | Pineapple | 35 | 90 |

[Diagram: the rows of this table are distributed into hash buckets by hash(ID), and the buckets are then split into partitions.]
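A minimal Python sketch of hash distribution follows. It assumes a fixed number of buckets and uses `zlib.crc32` purely as a deterministic stand-in hash; HAWQ's real hash function and bucket assignment are not shown in the slides, and the names `bucket_for` and `distribute` are hypothetical.

```python
import zlib
from collections import defaultdict

# Toy rows: (ID, Name, Num, Price)
ROWS = [
    (1, "Apple", 10, 50),
    (2, "Pear", 20, 80),
    (3, "Banana", 40, 40),
    (4, "Orange", 25, 50),
    (5, "Kiwi", 5, 120),
    (6, "Watermelon", 20, 30),
    (7, "Melon", 40, 100),
    (8, "Pineapple", 35, 90),
]


def bucket_for(key, num_buckets):
    """Map a distribution key to a bucket number (crc32 is only a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def distribute(rows, num_buckets, key_index=0):
    """Group rows by the bucket of their distribution field (ID by default)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key_index], num_buckets)].append(row)
    return dict(buckets)


buckets = distribute(ROWS, num_buckets=4)
for bucket_id in sorted(buckets):
    print(bucket_id, [row[0] for row in buckets[bucket_id]])
```

Because the bucket depends only on the key, two tables distributed on the same field with the same number of buckets keep matching rows in the same bucket, which is what makes co-located joins possible.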
![Page 58: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/58.jpg)
Metadata
• Table structure information
– Distribution fields
– Number of hash buckets
– Partitioning (hash, list, range)
• General metadata
– Users and groups
– Access privileges
• Stored procedures
– PL/pgSQL, PL/Java, PL/Python, PL/Perl, PL/R
![Page 61: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/61.jpg)
Query Optimizer
• HAWQ uses cost-based query optimizers
• You have two options
– Planner – evolved from the Postgres query optimizer
– ORCA (Pivotal Query Optimizer) – developed specifically for HAWQ
• Optimizer hints work just like in Postgres
– Enable/disable specific operations
– Change the cost estimates for basic actions
![Page 67: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/67.jpg)
Storage Formats
Which storage format is optimal?
✓ It depends on what you mean by “optimal”
– Minimal CPU usage for reading and writing the data
– Minimal disk space usage
– Minimal time to retrieve a record by key
– Minimal time to retrieve a subset of columns
– etc.
![Page 69: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/69.jpg)
Storage Formats
• Row-based storage format
– Similar to Postgres heap storage
• No TOAST
• No ctid, xmin, xmax, cmin, cmax
– Compression
• No compression
• QuickLZ
• zlib levels 1 – 9
![Page 72: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/72.jpg)
Storage Formats
• Apache Parquet
– Mixed row-columnar table store: the data is split into “row groups”, each stored in columnar format
– Compression
• No compression
• Snappy
• Gzip levels 1 – 9
– The “row group” size and the page size can be set for each table separately
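The row-group idea can be sketched in a few lines of Python. This is a simplified illustration, not the Parquet file format: the helper names are hypothetical, and the group size is counted in rows here for readability, whereas real row group and page sizes are configured in bytes.

```python
def to_row_groups(rows, column_names, rows_per_group):
    """Split rows into groups, storing each group column by column."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        # Inside a group, all values of one column sit next to each other.
        groups.append({
            name: [row[i] for row in chunk]
            for i, name in enumerate(column_names)
        })
    return groups


def read_column(groups, name):
    """Read a single column without touching the other columns' values."""
    values = []
    for group in groups:
        values.extend(group[name])
    return values


columns = ["id", "name", "price"]
rows = [(1, "a", 50), (2, "b", 80), (3, "c", 40), (4, "d", 50), (5, "e", 120)]

groups = to_row_groups(rows, columns, rows_per_group=2)
print(len(groups), read_column(groups, "price"))
```

A query that needs only `price` reads one list per row group, which is the property that makes the layout attractive for retrieving a subset of columns.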
![Page 77: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/77.jpg)
Resource Management
• Two main options
– Static resource split – HAWQ and YARN do not know about each other
– YARN – HAWQ asks the YARN Resource Manager for query execution resources
• Flexible cluster utilization
– A query might run on a subset of nodes if it is small
– A query might have many executors on each cluster node to make it run faster
– You can control the parallelism of each query
![Page 84: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/84.jpg)
Resource Management
• A Resource Queue can be set with
– Maximum number of parallel queries
– CPU usage priority
– Memory usage limits
– CPU cores usage limit
– MIN/MAX number of executors across the system
– MIN/MAX number of executors on each node
• Can be set up per user or group
![Page 86: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/86.jpg)
External Data
• PXF
– Framework for external data access
– Easy to extend, many public plugins available
– Official plugins: CSV, SequenceFile, Avro, Hive, HBase
– Open Source plugins: JSON, Accumulo, Cassandra, JDBC, Redis, Pipe
• HCatalog
– HAWQ can query tables from HCatalog the same way as HAWQ native tables
![Page 87: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/87.jpg)
Query Example
[Diagram: query execution on a HAWQ cluster. HAWQ Master: Postmaster, Metadata, Transaction Mgr., Query Parser, Query Optimizer, Query Dispatch, Resource Mgr. Also shown: NameNode and YARN RM. Server1, Server2 … ServerN each run a HAWQ Segment (Postmaster) and an HDFS Datanode and have a local directory.]
Stages: Plan → Resource → Prepare → Execute → Result → Cleanup (this slide: Plan)
![Page 92: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/92.jpg)
Query Example (same cluster diagram)
Stages: Plan (current), Resource, Prepare, Execute, Result, Cleanup
Processes: QE
Plan operators: Scan Bars b; Filter b.city = 'San Francisco'; Motion Redist(b.name); Scan Sells s; HashJoin b.name = s.bar; Project s.beer, s.price; Motion Gather
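The plan operators on this slide can be traced with a small single-process Python sketch. It only illustrates how a redistribute motion, a per-segment hash join and a gather motion fit together; it is not HAWQ's executor. The sample rows, the `segment_of` routing function, the segment count, and the assumption that Sells is already distributed on `bar` are hypothetical; only the table, column and operator names come from the slide.

```python
import zlib
from collections import defaultdict

# Hypothetical sample data; only the table and column names come from the slide.
BARS = [
    {"name": "Toronado", "city": "San Francisco"},
    {"name": "Zeitgeist", "city": "San Francisco"},
    {"name": "McSorleys", "city": "New York"},
]
SELLS = [
    {"bar": "Toronado", "beer": "Pliny", "price": 7.0},
    {"bar": "Zeitgeist", "beer": "Anchor", "price": 5.0},
    {"bar": "McSorleys", "beer": "Ale", "price": 3.0},
]


def segment_of(value, n_segments):
    """Route a key to a segment with a deterministic hash (assumed scheme)."""
    return zlib.crc32(value.encode("utf-8")) % n_segments


def run_plan(bars, sells, n_segments=3):
    # Assumption: Sells is already hash-distributed on s.bar, so Scan Sells s
    # reads only the rows that live on each segment.
    sells_on = defaultdict(list)
    for s in sells:
        sells_on[segment_of(s["bar"], n_segments)].append(s)

    # Scan Bars b, Filter b.city = 'San Francisco', then Motion Redist(b.name):
    # every surviving row is sent to the segment that owns its join key.
    bars_on = defaultdict(list)
    for b in bars:
        if b["city"] == "San Francisco":
            bars_on[segment_of(b["name"], n_segments)].append(b)

    gathered = []
    for seg in range(n_segments):
        # HashJoin b.name = s.bar: build a hash table on the local Bars rows,
        # then probe it with the local Sells rows.
        build = defaultdict(list)
        for b in bars_on[seg]:
            build[b["name"]].append(b)
        for s in sells_on[seg]:
            for _match in build.get(s["bar"], []):
                # Project s.beer, s.price
                gathered.append((s["beer"], s["price"]))

    # Motion Gather: results from all segments are collected in one place.
    # Sorting is only there to make the output order deterministic.
    return sorted(gathered)


if __name__ == "__main__":
    print(run_plan(BARS, SELLS))
```

Because both join inputs are routed by the same hash of the join key, matching rows always meet on the same segment, so the result does not depend on the number of segments.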
![Page 93: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/93.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (current), Prepare, Execute, Result, Cleanup
Processes: QE
![Page 94: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/94.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (current), Prepare, Execute, Result, Cleanup
Processes: QE
Callout: "I need 5 containers, each with 1 CPU core and 256 MB RAM"
![Page 95: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/95.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (current), Prepare, Execute, Result, Cleanup
Processes: QE
Callout: "I need 5 containers, each with 1 CPU core and 256 MB RAM"
Callout: "Server 1: 2 containers, Server 2: 1 container, Server N: 2 containers"
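The exchange on this slide (a request for 5 containers of 1 CPU core and 256 MB RAM each, answered with a per-server split) can be mimicked with a toy allocator. This is a minimal sketch under stated assumptions, not YARN's scheduler or HAWQ's resource manager: the `Host` class, the free-capacity figures and the first-fit policy are hypothetical, chosen so that the outcome matches the split shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Free capacity of one server (hypothetical bookkeeping structure)."""
    name: str
    free_cores: int
    free_mem_mb: int


def allocate(hosts, count, cores=1, mem_mb=256):
    """Grant `count` containers first-fit across hosts, or raise if impossible."""
    grants = {}
    remaining = count
    for host in hosts:
        if remaining == 0:
            break
        fit = min(host.free_cores // cores, host.free_mem_mb // mem_mb, remaining)
        if fit > 0:
            grants[host.name] = fit
            remaining -= fit
    if remaining:
        # Nothing has been reserved yet, so a failed request leaves no trace.
        raise RuntimeError("not enough free resources for the request")
    for host in hosts:
        granted = grants.get(host.name, 0)
        host.free_cores -= granted * cores
        host.free_mem_mb -= granted * mem_mb
    return grants


def release(hosts, grants, cores=1, mem_mb=256):
    """Return previously granted containers to their hosts."""
    for host in hosts:
        granted = grants.get(host.name, 0)
        host.free_cores += granted * cores
        host.free_mem_mb += granted * mem_mb


def example_cluster():
    # Assumed free capacity per server; not taken from the slides.
    return [
        Host("Server1", free_cores=2, free_mem_mb=1024),
        Host("Server2", free_cores=1, free_mem_mb=256),
        Host("ServerN", free_cores=4, free_mem_mb=2048),
    ]


if __name__ == "__main__":
    cluster = example_cluster()
    print(allocate(cluster, count=5))
```

Reserving capacity only after the whole request is known to fit keeps the sketch all-or-nothing, which makes the later release step a simple mirror of the grant.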
![Page 96: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/96.jpg)
![Page 97: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/97.jpg)
![Page 98: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/98.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (current), Prepare, Execute, Result, Cleanup
Processes: QE, plus five more QEs
Callout: "I need 5 containers, each with 1 CPU core and 256 MB RAM"
Callout: "Server 1: 2 containers, Server 2: 1 container, Server N: 2 containers"
![Page 99: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/99.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (done), Prepare (current), Execute, Result, Cleanup
Processes: QE, plus five more QEs
![Page 100: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/100.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (done), Prepare (current), Execute, Result, Cleanup
Processes: QE, plus five more QEs
Plan operators: Scan Bars b; Filter b.city = 'San Francisco'; Motion Redist(b.name); Scan Sells s; HashJoin b.name = s.bar; Project s.beer, s.price; Motion Gather
![Page 101: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/101.jpg)
![Page 102: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/102.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (done), Prepare (done), Execute (current), Result, Cleanup
Processes: QE, plus five more QEs
Plan operators: Scan Bars b; Filter b.city = 'San Francisco'; Motion Redist(b.name); Scan Sells s; HashJoin b.name = s.bar; Project s.beer, s.price; Motion Gather
![Page 103: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/103.jpg)
![Page 104: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/104.jpg)
![Page 105: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/105.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (done), Prepare (done), Execute (done), Result (current), Cleanup
Processes: QE, plus five more QEs
![Page 106: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/106.jpg)
![Page 107: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/107.jpg)
![Page 108: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/108.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (done), Prepare (done), Execute (done), Result (done), Cleanup (current)
Processes: QE, plus five more QEs
![Page 109: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/109.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (done), Prepare (done), Execute (done), Result (done), Cleanup (current)
Processes: QE, plus five more QEs
Callout: "Free query resources: Server 1: 2 containers, Server 2: 1 container, Server N: 2 containers"
![Page 110: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/110.jpg)
Query Example (same cluster diagram)
Stages: Plan (done), Resource (done), Prepare (done), Execute (done), Result (done), Cleanup (current)
Processes: QE, plus five more QEs
Callout: "Free query resources: Server 1: 2 containers, Server 2: 1 container, Server N: 2 containers"
Callout: "OK"
![Page 111: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/111.jpg)
![Page 112: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/112.jpg)
![Page 113: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/113.jpg)
![Page 114: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/114.jpg)
Query Example (same cluster diagram)
Stages: Plan, Resource, Prepare, Execute, Result, Cleanup (all complete)
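The six stages that the Query Example slides step through can be written down as a tiny driver loop. This is a sketch for orientation only: `run_lifecycle` and the per-stage callables are hypothetical names, and running Cleanup in a `finally` block is a design choice of this sketch (so that granted containers would be handed back even if an earlier stage fails), not something the slides state about HAWQ.

```python
# Stage names in the order shown on the slides.
STAGES = ["Plan", "Resource", "Prepare", "Execute", "Result", "Cleanup"]


def run_lifecycle(steps):
    """Run one callable per stage, in slide order.

    `steps` maps each stage name to a zero-argument callable (hypothetical
    interface). Cleanup runs even when an earlier stage raises.
    """
    completed = []
    try:
        for name in STAGES[:-1]:
            steps[name]()
            completed.append(name)
    finally:
        steps["Cleanup"]()
        completed.append("Cleanup")
    return completed


if __name__ == "__main__":
    trace = []
    steps = {name: (lambda n=name: trace.append(n)) for name in STAGES}
    run_lifecycle(steps)
    print(trace)
```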
![Page 115: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/115.jpg)
Query Performance
• Data does not hit the disk unless this cannot be avoided
![Page 116: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/116.jpg)
• Data is not buffered on the segments unless this cannot be avoided
![Page 117: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/117.jpg)
• Data is transferred between the nodes by UDP
![Page 118: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/118.jpg)
• HAWQ has a good cost-based query optimizer
![Page 119: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/119.jpg)
• The C/C++ implementation is more efficient than the Java implementations of competing solutions
![Page 120: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/120.jpg)
• Query parallelism can be easily tuned
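The Query Performance points about avoiding disk writes and avoiding buffering describe pipelined execution, where each operator hands rows to the next one as soon as they are produced. The Python generators below are a generic illustration of that idea, not HAWQ code; the operator functions, the `trace` list and the sample rows are hypothetical.

```python
def scan(rows, trace):
    """Yield rows one at a time, recording each row id as it is read."""
    for row in rows:
        trace.append(row["id"])
        yield row


def select(rows, predicate):
    """Pass through only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, columns):
    """Keep only the requested columns of each row."""
    for row in rows:
        yield tuple(row[c] for c in columns)


def build_pipeline(rows, trace):
    # No operator materializes its input: a row flows through all three
    # operators before the next row is read.
    in_sf = select(scan(rows, trace), lambda r: r["city"] == "San Francisco")
    return project(in_sf, ["id"])


ROWS = [
    {"id": 1, "city": "San Francisco"},
    {"id": 2, "city": "New York"},
    {"id": 3, "city": "San Francisco"},
]

if __name__ == "__main__":
    trace = []
    pipeline = build_pipeline(ROWS, trace)
    print(next(pipeline), trace)   # only the first row has been read so far
    print(list(pipeline), trace)   # the remaining rows are read on demand
```

Operators that need their whole input before producing output (a hash-join build side or a sort, for example) are where buffering becomes unavoidable.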
![Page 121: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/121.jpg)
Competitive Solutions
Compared: Hive, SparkSQL, Impala, HAWQ
Criterion: Query Optimizer
![Page 122: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/122.jpg)
Criterion: ANSI SQL
![Page 123: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/123.jpg)
Criterion: Built-in Languages
![Page 124: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/124.jpg)
Criterion: Disk IO
![Page 125: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/125.jpg)
Criterion: Parallelism
![Page 126: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/126.jpg)
Criterion: Distributions
![Page 127: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/127.jpg)
Criterion: Stability
![Page 128: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/128.jpg)
Criterion: Community
![Page 129: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/129.jpg)
Roadmap
• AWS and S3 integration
![Page 130: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/130.jpg)
• Mesos integration
![Page 131: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/131.jpg)
• Better Ambari integration
![Page 132: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/132.jpg)
• Native support for the Cloudera, MapR and IBM Hadoop distributions
![Page 133: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/133.jpg)
• Make the best SQL-on-Hadoop engine ever!
![Page 134: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/134.jpg)
Summary
• Modern SQL-on-Hadoop engine
• For structured data processing and analysis
• Combines the best techniques of competing solutions
• Just released as open source
• Community is very young
Join our community and contribute!
![Page 135: HAWQ Architecture HL++ 2015 Moscow](https://reader031.vdocuments.net/reader031/viewer/2022022418/58a1aa1c1a28ab89478ba363/html5/thumbnails/135.jpg)
Questions
Apache HAWQ
http://hawq.incubator.apache.org
Reach me on http://0x0fff.com