orca and postgresql

21
1 © Copyright 2015 Pivotal. All rights reserved. 1 © Copyright 2015 Pivotal. All rights reserved. ORCA and PostgreSQL Venkatesh Raghavan South Bay PostgreSQL Meetup, San Jose

Upload: phungnhan

Post on 13-Feb-2017

252 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: ORCA and PostgreSQL

1© Copyright 2015 Pivotal. All rights reserved. 1© Copyright 2015 Pivotal. All rights reserved.

ORCA and PostgreSQLVenkatesh Raghavan

South Bay PostgreSQL Meetup, San Jose

Page 2: ORCA and PostgreSQL

2© Copyright 2015 Pivotal. All rights reserved.

PostgreSQL Planner• Addresses join re-ordering• Treats everything else as add-on (grouping, with clause, etc.)• Imposes order on specific optimization steps• Recursively descends into subqueries

• Sometimes planner is unable to unnest complex correlated subqueries• High code complexity

• Maximum: 102 (Orca 8.5)• Minimum: 6.4 (Orca 1.5)

Page 3: ORCA and PostgreSQL

3© Copyright 2015 Pivotal. All rights reserved.

Join Ordering vs. “Everything Else”• TPC-H Query 5

– 6 Tables– “Harmless” query

“Everything Else”Size of search space ~230,000,000

Join Order ProblemSize of search space

< 100,000

Page 4: ORCA and PostgreSQL

4© Copyright 2015 Pivotal. All rights reserved.

Orca Architecture• Orca is not baked into one host system

Orca

ParserHost System

< />SQL

Q2DXL

DXLQuery

MD Provider

MD

req.

Catalog Executor

DXLMD

DXLPlan

DXL2Plan

Results

Page 5: ORCA and PostgreSQL

5© Copyright 2015 Pivotal. All rights reserved.

Key Features• Smarter partition elimination• Subquery unnesting• Optimizing common table expressions (CTE)• Additional functionality

• Improved join ordering• Join-aggregate reordering• Sort order optimization

Page 6: ORCA and PostgreSQL

6© Copyright 2015 Pivotal. All rights reserved.

TPC-DS Orca vs. Planner on HAWQ

TPC-DS 10TB, 16 nodes, 48 GB/node

Planner Plan Hangs

Orca Plan Better

Planner Plan Better

Page 7: ORCA and PostgreSQL

7© Copyright 2015 Pivotal. All rights reserved. 7© Copyright 2015 Pivotal. All rights reserved. 7© Copyright 2015 Pivotal. All rights reserved.

Subquery Unnesting

Page 8: ORCA and PostgreSQL

8© Copyright 2015 Pivotal. All rights reserved.

Subqueries: Impact

● Heavily used in many workloads○ BI/reporting tools generate substantial number of subqueries○ TPC-H workload: 40% of the 22 queries○ TPC-DS workload: 20% of the 111 queries

● Inefficient plans mean query takes a long time or does not terminate● Optimizations

○ De-correlation○ Conversion of subqueries to joins

Page 9: ORCA and PostgreSQL

9© Copyright 2015 Pivotal. All rights reserved.

Subquery Handling: Orca vs. Planner

CSQ Class Planner Orca

CSQ in select list Correlated Execution Join

CSQ in disjunctive filter Correlated Execution Join

Multi-Level CSQ No Plan Join

CSQ with group by and inequality Correlated Execution Join

CSQ must return one row Correlated Execution Join

CSQ with correlation in select list Correlated Execution Correlated Execution

Page 10: ORCA and PostgreSQL

10© Copyright 2015 Pivotal. All rights reserved. 10© Copyright 2015 Pivotal. All rights reserved. 10© Copyright 2015 Pivotal. All rights reserved.

How to enable Orca on Postgres?

Page 11: ORCA and PostgreSQL

11© Copyright 2015 Pivotal. All rights reserved.

• GPORCA: https://github.com/greenplum-db/gporca

• White Paper: http://bit.ly/1ntrE8v

• Pivotal Tracker: http://bit.ly/1m1WGDn

Get Your Hands On It!

Page 12: ORCA and PostgreSQL

12© Copyright 2015 Pivotal. All rights reserved.

• Step 1: Lift and shift translators already implemented in GPDB

• Step 2: Functions in lsyscache.* and gucs* need to moved

• Step 3: Update makefiles (most of this already done in GPDB)

Getting Orca on Postgres

Page 13: ORCA and PostgreSQL

13© Copyright 2015 Pivotal. All rights reserved.

Publications• Optimization of Common Table Expressions in MPP Database Systems, VLDB 2015

– Amr El-Helw, Venkatesh Raghavan, Mohamed A. Soliman, George C. Caragea, Zhongxian Gu, Michalis Petropoulos

• Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014– Mohamed A. Soliman, Lyublena Antova, Venkatesh Raghavan, Amr El-Helw, Zhongxian Gu, Entong Shen, George C. Caragea, Carlos Garcia-

Alvarado, Foyzur Rahman, Michalis Petropoulos, Florian Waas, Sivaramakrishnan Narayanan, Konstantinos Krikellas, Rhonda Baldwin

• Optimizing Queries over Partitioned Tables in MPP Systems, SIGMOD 2014– Lyublena Antova, Amr El-Helw, Mohamed Soliman, Zhongxian Gu, Michalis Petropoulos, Florian Waas

• Reversing Statistics for Scalable Test Databases Generation, DBTest 2013– Entong Shen, Lyublena Antova

• Total Operator State Recall - Cost-Effective Reuse of Results in Greenplum Database, ICDE Workshops 2013– George C. Caragea, Carlos Garcia-Alvarado, Michalis Petropoulos, Florian M. Waas

• Testing the Accuracy of Query Optimizers, DBTest 2012– Zhongxian Gu, Mohamed A. Soliman, Florian M. Waas

– Automatic Capture of Minimal, Portable, and Executable Bug Repros using AMPERe, DBTest 2012– Lyublena Antova, Konstantinos Krikellas, Florian M. Waas

– Automatic Data Placement in MPP Databases, ICDE Workshops 2012– Carlos Garcia-Alvarado, Venkatesh Raghavan, Sivaramakrishnan Narayanan, Florian M. Waas

Page 14: ORCA and PostgreSQL

14© Copyright 2015 Pivotal. All rights reserved. 14© Copyright 2015 Pivotal. All rights reserved. 14© Copyright 2015 Pivotal. All rights reserved.

Backup

Page 15: ORCA and PostgreSQL

15© Copyright 2015 Pivotal. All rights reserved.

What Is GP-Orca?

• State-of-the-art query optimization framework designed from scratch to:– Improve – performance, ease-of-use– Enable – foundation for future research and development– Connect – applies to multiple host systems (GPDB, HAWQ,

Postgres)

Page 16: ORCA and PostgreSQL

16© Copyright 2015 Pivotal. All rights reserved.

● A query that is nested inside an outer query block● Correlated Subquery (CSQ) is a subquery that uses values

from the outer query

Subqueries: Definition

SELECT * FROM part p1WHERE price > (SELECT avg(price) FROM part p2 WHERE p2.brand = p1.brand)

Page 17: ORCA and PostgreSQL

17© Copyright 2015 Pivotal. All rights reserved.

Subqueries in Disjunctive Filters

• Find parts with: size > 40 OR price > the average brand price

SELECT *FROM part p1WHERE p_size > 40 ORp_retailprice >(SELECT avg(p_retailprice) FROM part p2 WHERE p2.p_brand = p1.p_brand)

Page 18: ORCA and PostgreSQL

18© Copyright 2015 Pivotal. All rights reserved.

Subqueries in Disjunctive Filters

Page 19: ORCA and PostgreSQL

19© Copyright 2015 Pivotal. All rights reserved.

Currently in Apache HAWQ

Parser Executor

Orca

PlannerSQL ResultsQuery Plan

Page 20: ORCA and PostgreSQL

20© Copyright 2015 Pivotal. All rights reserved.

When Orca is exercised

Parser Executor

Orca

Planner

Query Plan

SQL Results

Page 21: ORCA and PostgreSQL

21© Copyright 2015 Pivotal. All rights reserved.

When Orca fallbacks

Parser Executor

Orca

Planner

Query

SQL Results

Fallback

Plan

Orca

Orca will automatically fallback to the legacy optimizerfor unsupported features