Apache Impala (incubating) 2.5 Performance Update


Upload: cloudera-inc

Post on 17-Feb-2017


Page 1: Apache Impala (incubating) 2.5 Performance Update

© Cloudera, Inc. All rights reserved.

Apache Impala 2.5 (Incubating): Performance improvements overview

Page 2: Apache Impala (incubating) 2.5 Performance Update

Agenda

• What is Impala?
• Impala at Apache
• What is new in Impala 2.5 (CDH 5.7)
• Impala performance update
• Roadmap
• Q&A

Page 3: Apache Impala (incubating) 2.5 Performance Update

SQL-on-Hadoop engines

SQL-on-Apache Hadoop: choosing the right tool for the right job

[Figure: SQL-on-Hadoop engines; SQL: Impala]

Page 4: Apache Impala (incubating) 2.5 Performance Update

What is Impala?

• General-purpose SQL engine
• Real-time queries in Apache Hadoop
• Generally available (v1.0) since April 2013
• Analytic SQL functionality (v2.0) since October 2014
• Apache Incubator project since December 2015
• Previous release: 2.3 (CDH 5.5), November 2015
• Current release: 2.5 (CDH 5.7), April 2016 (today's topic)

Page 5: Apache Impala (incubating) 2.5 Performance Update

Impala overview

• Query speed over Hadoop that meets or exceeds that of a proprietary analytic DBMS
• General-purpose SQL query engine:
  • Targeted at analytical workloads
  • Supports queries that take from milliseconds to hours
• Runs directly within Hadoop:
  • Reads widely used Hadoop file formats
  • Talks to widely used Hadoop storage managers
  • Runs on the same nodes that run Hadoop processes
• Highly available
• High performance:
  • C++ instead of Java
  • Runtime code generation

Page 6: Apache Impala (incubating) 2.5 Performance Update

Impala Use Cases

• Interactive BI/analytics on more data
• Asking new questions: exploration, ML (Ibis)
• Data processing with tight SLAs
• Queryable archive with full fidelity

Page 7: Apache Impala (incubating) 2.5 Performance Update


• Incubator project since December 2015

• Development process slowly moving to ASF infrastructure (see IMPALA-3221)

• Help wanted!

Where to find the Impala community:

[email protected]

[email protected]

http://impala.io

@apacheimpala

Impala at Apache

Page 8: Apache Impala (incubating) 2.5 Performance Update

New in Impala 2.5

Usability Enhancements
• Admission control improvements
• Null-safe join/equals

Performance and Scalability
• Runtime filters
• Improved cardinality estimation and join ordering
• Query start-up improvements
• Additional codegen and code optimizations
• Decimal arithmetic improvements
• Fast min/max values on partition columns (with query option)

Integrations
• Support for EMC DSSD

Page 9: Apache Impala (incubating) 2.5 Performance Update

New in Impala 2.5

Performance and Scalability (covered today)
• Runtime filters
• Improved cardinality estimation and join ordering
• Query start-up improvements
• Additional codegen and code optimizations
• Decimal arithmetic improvements
• Incremental metadata updates (DDL)
• Fast min/max values on partition columns (with query option)

Page 10: Apache Impala (incubating) 2.5 Performance Update

Impala 2.5 (CDH 5.7) improvements vs Impala 2.3 (CDH 5.5)

• 2.2x speedup for TPC-H
• 1.7x speedup for TPC-H (Nested)
• 4.3x speedup for TPC-DS

Page 11: Apache Impala (incubating) 2.5 Performance Update

Runtime filtering

• General idea: some predicates can only be computed at runtime
• Example:
    SELECT count(*)
    FROM date_dim dt, store_sales
    WHERE dt.d_date_sk = store_sales.ss_sold_date_sk
      AND dt.d_moy = 12;
• How does Impala execute this query?

Page 12: Apache Impala (incubating) 2.5 Performance Update

Runtime filters

SELECT dt.d_year, item.i_brand brand, sum(ss_ext_sales_price) sum_agg
FROM date_dim dt, store_sales, item
WHERE dt.d_date_sk = store_sales.ss_sold_date_sk
  AND store_sales.ss_item_sk = item.i_item_sk
  AND i_category = "Books"
  AND i_class = "fiction"
  AND dt.d_moy = 12
GROUP BY dt.d_year, item.i_brand
ORDER BY dt.d_year, sum_agg DESC, i_brand
LIMIT 100

Query plan:
store_sales scan: 43 billion rows
-> Broadcast Join #1 (build side: item, 198 rows): 290 million rows out
-> Broadcast Join #2 (build side: date_dim, 6,200 rows): 47 million rows out
-> Aggregate

Page 13: Apache Impala (incubating) 2.5 Performance Update

Runtime filters: the opportunity

● The planner doesn't know which ss_sold_date_sk and ss_item_sk values will qualify, even with statistics.
● There is an opportunity to save some work: why bother sending all 43 billion rows to the joins?
● Runtime filters compute this predicate at runtime.

Page 14: Apache Impala (incubating) 2.5 Performance Update

Runtime filters

Step 1: The planner tells Join #1 to produce a Bloom filter for qualifying i_item_sk values, and Join #2 to produce a Bloom filter for qualifying d_date_sk values.

Page 15: Apache Impala (incubating) 2.5 Performance Update

Runtime filters

Step 2: Each join reads all rows from its build side (right input) and computes a filter containing all distinct values of i_item_sk (Join #1) and d_date_sk (Join #2).

Page 16: Apache Impala (incubating) 2.5 Performance Update

Runtime filters

Step 3: Joins #1 and #2 send their filters to the store_sales scan. The scan eliminates rows that have no match in the Bloom filters.

Page 17: Apache Impala (incubating) 2.5 Performance Update

Runtime filters

Query plan with runtime filters applied:
store_sales scan: 47 million rows
-> Broadcast Join #1 (build side: item, 198 rows): 47 million rows out
-> Broadcast Join #2 (build side: date_dim, 6,200 rows): 47 million rows out
-> Aggregate

The store_sales scan uses the Bloom filter from Join #2 to filter out partitions (on ss_sold_date_sk) and the Bloom filter from Join #1 to filter out rows that don't qualify (on ss_item_sk).
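The build-then-probe flow in the steps above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the class `BloomFilter`, the helpers `BuildFilter` and `FilterScan`, the two hash functions and the filter size are all hypothetical, and none of it is Impala's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Minimal Bloom filter over 64-bit join keys. The hashing and sizing are
// illustrative assumptions, not Impala's implementation.
class BloomFilter {
 public:
  // num_bits must be greater than zero.
  explicit BloomFilter(std::size_t num_bits) : bits_(num_bits, false) {}

  void Insert(int64_t key) {
    bits_[Hash1(key) % bits_.size()] = true;
    bits_[Hash2(key) % bits_.size()] = true;
  }

  // May return true for a key that was never inserted (false positive),
  // but never returns false for a key that was inserted.
  bool MayContain(int64_t key) const {
    return bits_[Hash1(key) % bits_.size()] && bits_[Hash2(key) % bits_.size()];
  }

 private:
  static uint64_t Hash1(int64_t key) { return std::hash<int64_t>{}(key); }
  static uint64_t Hash2(int64_t key) {
    uint64_t x = static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
    return x ^ (x >> 32);
  }
  std::vector<bool> bits_;
};

// Build side: insert the join keys of the dimension rows that survived
// their predicates (e.g. d_date_sk values where d_moy = 12).
BloomFilter BuildFilter(const std::vector<int64_t>& build_keys, std::size_t num_bits) {
  BloomFilter filter(num_bits);
  for (int64_t k : build_keys) filter.Insert(k);
  return filter;
}

// Probe side: the scan drops rows whose join key cannot possibly match.
std::vector<int64_t> FilterScan(const std::vector<int64_t>& scan_keys,
                                const BloomFilter& filter) {
  std::vector<int64_t> kept;
  for (int64_t k : scan_keys) {
    if (filter.MayContain(k)) kept.push_back(k);
  }
  return kept;
}
```

Because a Bloom filter can report false positives but never false negatives, the join still has to run on the surviving rows; the filter only shrinks its input.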

Page 18: Apache Impala (incubating) 2.5 Performance Update

Runtime filters

• 914x reduction in the number of rows coming out of the scan: 43 billion -> 47 million
• 6x reduction in the number of rows coming out of the join: 290 million -> 47 million

Page 19: Apache Impala (incubating) 2.5 Performance Update

Runtime filters variation: Global filters

SELECT c_email_address, sum(ss_ext_sales_price) sum_agg
FROM store_sales, customer, customer_demographics
WHERE ss_customer_sk = c_customer_sk
  AND cd_demo_sk = c_current_cdemo_sk
  AND cd_gender = 'M'
  AND cd_purchase_estimate = 10000
  AND cd_credit_rating = 'Low Risk'
GROUP BY c_email_address
ORDER BY sum_agg DESC

Query plan:
store_sales scan (43 billion rows) and customer scan (3.8 million rows), both shuffled
-> Shuffle Join #1: 43 billion rows out
-> Broadcast Join #2 (build side: customer_demographics, 2,400 rows): 49 million rows out
-> Aggregate
Page 20: Apache Impala (incubating) 2.5 Performance Update

Runtime filters variation: Global filters

Joins #1 and #2 are expensive because the left side of each join has 43 billion rows.

Page 21: Apache Impala (incubating) 2.5 Performance Update

Runtime filters variation: Global filters

Create a Bloom filter from Join #2 on cd_demo_sk and push it down to the customer table scan.

Page 22: Apache Impala (incubating) 2.5 Performance Update

Runtime filters variation: Global filters

Customer rows reduced by 826x: 3.8 million -> 4,600 rows.

Page 23: Apache Impala (incubating) 2.5 Performance Update

Runtime filters variation: Global filters

Create a Bloom filter from Join #1 on c_customer_sk and push it down to the store_sales table scan.

Page 24: Apache Impala (incubating) 2.5 Performance Update

Runtime filters variation: Global filters

Query plan with global runtime filters applied:
store_sales scan (49 million rows) and customer scan (4,600 rows), both shuffled
-> Shuffle Join #1: 49 million rows out
-> Broadcast Join #2 (build side: customer_demographics, 2,400 rows): 49 million rows out
-> Aggregate

877x reduction in rows: 43 billion -> 49 million rows

set RUNTIME_FILTER_MODE=GLOBAL;

Page 25: Apache Impala (incubating) 2.5 Performance Update

Runtime filters: real-world results

• Runtime filters can be highly effective. Some benchmark queries are more than 30 times faster in Impala 2.5.0.
• As always, results depend on your queries, your schemas and your cluster environment.
• By default, runtime filters are enabled in limited 'local' mode in Impala 2.5.0. They can be enabled fully by setting RUNTIME_FILTER_MODE=GLOBAL.
• Other runtime filter parameters include (defaults in brackets):
  • RUNTIME_BLOOM_FILTER_SIZE: [1048576]
  • RUNTIME_FILTER_WAIT_TIME_MS: [0]

Page 26: Apache Impala (incubating) 2.5 Performance Update

Improved Cardinality Estimates and Join Order

1. More robust scan cardinality estimation
  • Mitigate correlated predicates (exponential backoff)
  • Example: SELECT * FROM cars WHERE cars.make = 'Toyota' AND cars.model = 'Camry'
2. Improved join cardinality estimation
  • Special treatment of the common case of PK/FK joins
  • Detect selective joins by applying the selectivity of build-side predicates to the estimated join cardinality

• TPC-H Q8 impact: >8x speedup (91s in Impala 2.3 -> 11s in Impala 2.5)
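To see why backoff helps with correlated predicates such as make = 'Toyota' AND model = 'Camry', here is a small C++ sketch comparing the naive independence assumption with a damped product. The function names are hypothetical, and the damping exponent 1/(i+1) is an assumption chosen for illustration; the slides do not state the exact formula Impala uses.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive estimate: treat predicates as independent and multiply selectivities.
double IndependentSelectivity(const std::vector<double>& sels) {
  double result = 1.0;
  for (double s : sels) result *= s;
  return result;
}

// Backoff estimate (illustrative form): sort ascending so the most selective
// predicate counts in full, then damp the i-th one with exponent 1/(i+1).
double BackoffSelectivity(std::vector<double> sels) {
  std::sort(sels.begin(), sels.end());
  double result = 1.0;
  for (std::size_t i = 0; i < sels.size(); ++i) {
    result *= std::pow(sels[i], 1.0 / static_cast<double>(i + 1));
  }
  return result;
}
```

With selectivities 0.1 and 0.01, the independent estimate is 0.001, while the damped estimate is 0.01 * sqrt(0.1), roughly 0.0032. The damped value stays closer to the most selective predicate, which is the safer guess when one predicate largely implies the other.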

Page 27: Apache Impala (incubating) 2.5 Performance Update


Query start-up: performance impact

Page 28: Apache Impala (incubating) 2.5 Performance Update

LLVM Codegen Support in Impala

Operations:
• Hash join
• Aggregation
• Scans: Text, Sequence, Avro
• Expressions in all operators
• Sort
• Top-N

Data Types:
• TINYINT, SMALLINT, INT, BIGINT
• FLOAT, DOUBLE
• BOOLEAN
• STRING, VARCHAR
• DECIMAL

Legend: New in Impala 2.5 / Extended in Impala 2.5

Page 29: Apache Impala (incubating) 2.5 Performance Update

Codegen for Order by & Top-N

void* ExprContext::GetValue(Expr* e, TupleRow* row) {
  switch (e->type_.type) {
    case TYPE_BOOLEAN: { .. .. }
    case TYPE_TINYINT: { .. .. }
    case TYPE_INT: { .. .

int Compare(TupleRow* lhs, TupleRow* rhs) const {
  for (int i = 0; i < sort_cols_lhs_.size(); ++i) {
    void* lhs_value = sort_cols_lhs_[i]->GetValue(lhs);
    void* rhs_value = sort_cols_rhs_[i]->GetValue(rhs);

    if (lhs_value == NULL && rhs_value != NULL) return nulls_first_[i];
    if (lhs_value != NULL && rhs_value == NULL) return -nulls_first_[i];

    int result = RawValue::Compare(lhs_value, rhs_value, sort_cols_lhs_[i]->root()->type());
    if (!is_asc_[i]) result = -result;
    if (result != 0) return result;
    // Otherwise, try the next Expr
  }
  return 0; // fully equivalent key
}

Page 30: Apache Impala (incubating) 2.5 Performance Update

Codegen for Order by & Top-N

Codegen code

int CompareCodegened(TupleRow* lhs, TupleRow* rhs) const {
  int64_t lhs_value = sort_columns[i]->GetBigIntVal(lhs); // i = 0
  int64_t rhs_value = sort_columns[i]->GetBigIntVal(rhs); // i = 1

  int result = lhs_value > rhs_value ? 1 : (lhs_value < rhs_value ? -1 : 0);
  if (result != 0) return result;
  // Otherwise, try the next Expr
  return 0; // fully equivalent key
}

• Perfectly unrolls the "for each grouping column" loop
• No switching on input type(s)
• Removes branching on ASCENDING/DESCENDING, NULLS FIRST/LAST

Original code: the Compare() function shown on the previous slide.

Page 31: Apache Impala (incubating) 2.5 Performance Update

Codegen for Order by & Top-N

Codegen code vs original code (same listings as on the previous slide):

10x more efficient code
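A self-contained way to see what the specialization removes is to write both comparators by hand. The sketch below is not generated code and not Impala's types: `Value`, `Row`, `SortKey`, `CompareGeneric` and `CompareSpecialized` are hypothetical stand-ins, and the specialized version assumes a single non-nullable BIGINT key sorted ascending.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for a row and its sort keys.
enum class ColType { kBigInt, kDouble };

struct Value {
  bool is_null;
  int64_t i;  // used when the column type is kBigInt
  double d;   // used when the column type is kDouble
};

using Row = std::vector<Value>;

struct SortKey {
  int col;       // index of the column in the row
  ColType type;  // resolved at runtime in the generic path
  bool ascending;
  bool nulls_first;
};

// Generic ("interpreted") comparator: loops over keys, branches on NULLs,
// switches on the type and checks the sort direction on every call.
int CompareGeneric(const std::vector<SortKey>& keys, const Row& lhs, const Row& rhs) {
  for (const SortKey& k : keys) {
    const Value& a = lhs[k.col];
    const Value& b = rhs[k.col];
    if (a.is_null && b.is_null) continue;
    if (a.is_null) return k.nulls_first ? -1 : 1;
    if (b.is_null) return k.nulls_first ? 1 : -1;
    int result = 0;
    switch (k.type) {
      case ColType::kBigInt: result = a.i > b.i ? 1 : (a.i < b.i ? -1 : 0); break;
      case ColType::kDouble: result = a.d > b.d ? 1 : (a.d < b.d ? -1 : 0); break;
    }
    if (!k.ascending) result = -result;
    if (result != 0) return result;
  }
  return 0;
}

// What a specialized comparator could look like for ORDER BY col0 ASC where
// col0 is a non-nullable BIGINT: no loop, no type switch, no flag checks.
int CompareSpecialized(const Row& lhs, const Row& rhs) {
  const int64_t a = lhs[0].i;
  const int64_t b = rhs[0].i;
  return a > b ? 1 : (a < b ? -1 : 0);
}
```

Both functions return the same ordering for that one query shape; the difference is how many branches run per comparison, which matters when a sort performs billions of them.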

Page 32: Apache Impala (incubating) 2.5 Performance Update

Decimal arithmetic and aggregation

Float/Double vs Decimal?

Pros for Float/Double
• Uses less memory.
• Faster, because floating-point math operations are natively supported by processors. (Note: Decimal uses fixed-point hardware types: int64 and __int128.)
• Can represent a larger range of numbers.

Cons for Float/Double
• Precision errors compound during aggregations.
• Can't do math with a wide number of significant digits (123456789.1 * .0000987654321).

Float/Double is a no-go for applications requiring high precision and accuracy. What about the performance penalty of Decimal?
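The "precision errors compound during aggregations" point is easy to reproduce. The sketch below sums the value 0.10 repeatedly, once as a double and once as a scaled 64-bit integer (the fixed-point idea behind Decimal). The function names and the scale of 2 are assumptions made for illustration only.

```cpp
#include <cstdint>

// Sums n copies of 0.10 using binary floating point. 0.10 has no exact
// binary representation, so a small error is added on every step.
double SumAsDouble(int n) {
  double total = 0.0;
  for (int i = 0; i < n; ++i) total += 0.10;
  return total;
}

// Sums n copies of 0.10 as a fixed-point value with scale 2, i.e. the
// integer 10 stands for 0.10. Integer addition is exact (barring overflow).
int64_t SumAsScaledInt(int n) {
  int64_t total = 0;
  for (int i = 0; i < n; ++i) total += 10;
  return total;
}
```

After one million additions the scaled integer is exactly 10,000,000 (that is, 100000.00), while the double drifts slightly away from 100000.0. The drift is tiny here, but it grows with row count and is unacceptable for financial totals.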

Page 33: Apache Impala (incubating) 2.5 Performance Update

Decimal arithmetic and aggregation

SELECT l_returnflag, l_linestatus,
       Sum(l_quantity) AS SUM_QTY,
       Sum(l_extendedprice) AS SUM_BASE_PRICE,
       Sum(l_extendedprice * (1 - l_discount)) AS SUM_DISC_PRICE
FROM lineitem
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus

3x speedup

● Simplified overflow check for decimal.
● Extended the codegen framework to support aggregations involving decimal.
● Bridged the performance gap between double and decimal.

Page 34: Apache Impala (incubating) 2.5 Performance Update

Distributed Aggregations in Impala

select cust_id, sum(dollars) from sales group by cust_id;

[Diagram: on each node, Scan -> Preagg; rows then cross the network to Merge]

• Impala aggregations have two phases:
  • Pre-aggregation phase
  • Merge phase
• The pre-aggregation phase greatly reduces network traffic if there are many input rows per grouping value.
  • E.g. many sales per customer.

Page 35: Apache Impala (incubating) 2.5 Performance Update

Downsides of Pre-aggregations

select distinct * from sales;

[Diagram: on each node, Scan -> Preagg; rows then cross the network to Merge]

• Pre-aggregations consume:
  • Memory
  • CPU cycles
• Pre-aggregations are not always effective at reducing network traffic
  • E.g. select distinct for nearly-distinct rows
• Pre-aggregations can spill to disk under memory pressure
  • Disk I/O is expensive: it is better to send rows to the merge aggregation than to disk

Page 36: Apache Impala (incubating) 2.5 Performance Update

Streaming Pre-aggregations in Impala 2.5

select distinct * from sales;

[Diagram: on each node, Scan; rows then cross the network to Merge]

• The reduction factor is dynamically estimated based on the actual data processed
• Pre-aggregation expands memory usage only if the reduction factor is good
• Benefits:
  • Certain aggregations with a low reduction factor see speedups of up to 40%
  • Memory consumption can be reduced by 50% or more
  • Streaming pre-aggregations don't spill to disk
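The "grow only if the reduction factor is good, otherwise pass rows through" behaviour can be sketched as follows. This is a toy model under explicit assumptions: the class `StreamingPreAgg`, its doubling growth policy and the `min_reduction` threshold are hypothetical, and the aggregate is a plain SUM grouped by one integer key.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical streaming pre-aggregator for SELECT key, SUM(val) GROUP BY key.
// initial_capacity is assumed to be at least 1.
class StreamingPreAgg {
 public:
  StreamingPreAgg(std::size_t initial_capacity, double min_reduction)
      : capacity_(initial_capacity), min_reduction_(min_reduction) {}

  // Consumes one input row. Rows that are not absorbed into the hash table
  // are appended to `passthrough` for the merge phase to aggregate.
  void Add(int64_t key, int64_t val,
           std::vector<std::pair<int64_t, int64_t>>* passthrough) {
    auto it = groups_.find(key);
    if (it != groups_.end()) {  // existing group: aggregate in place
      it->second += val;
      ++rows_absorbed_;
      return;
    }
    if (groups_.size() >= capacity_) {
      // Table is full. Estimate the reduction factor from the data seen so
      // far: rows absorbed by the table per group held in it.
      const double reduction = static_cast<double>(rows_absorbed_) /
                               static_cast<double>(groups_.size());
      if (reduction < min_reduction_) {
        passthrough->emplace_back(key, val);  // stream the row, never spill
        return;
      }
      capacity_ *= 2;  // reduction is good enough to justify more memory
    }
    groups_.emplace(key, val);
    ++rows_absorbed_;
  }

  std::size_t num_groups() const { return groups_.size(); }

 private:
  std::size_t capacity_;
  double min_reduction_;
  uint64_t rows_absorbed_ = 0;
  std::unordered_map<int64_t, int64_t> groups_;
};
```

With nearly distinct keys the table stays at its initial size and most rows are forwarded untouched, which mirrors the lower memory use reported for the streaming variant. With heavily repeated keys the observed reduction rises and the table is allowed to grow.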

Page 37: Apache Impala (incubating) 2.5 Performance Update

Streaming Pre-aggregations in Impala 2.5

Baseline: finished in 23.13 seconds

Operator      #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
06:AGGREGATE  1       366.581ms  366.581ms  1        1           72.00 KB   -1.00 B        FINALIZE
05:EXCHANGE   1       149.923us  149.923us  15       1           0          -1.00 B        UNPARTITIONED
02:AGGREGATE  15      243.604ms  248.701ms  15       1           12.00 KB   10.00 MB
04:AGGREGATE  15      8s887ms    9s585ms    450.00M  437.91M     1.53 GB    245.01 MB      FINALIZE
03:EXCHANGE   15      827.770ms  932.785ms  450.00M  437.91M     0          0              HASH(o_orderkey)
01:AGGREGATE  15      9s995ms    11s484ms   450.00M  437.91M     1.64 GB    3.59 GB
00:SCAN HDFS  15      142.192ms  189.179ms  450.00M  450.00M     150.94 MB  88.00 MB       tpch_300_parquet.orders

With streaming pre-aggregation enabled: finished in 14.9 seconds

Operator      #Hosts  Avg Time   Max Time   #Rows    Est. #Rows  Peak Mem   Est. Peak Mem  Detail
06:AGGREGATE  1       356.667ms  356.667ms  1        1           72.00 KB   -1.00 B        FINALIZE
05:EXCHANGE   1       110.924us  110.924us  15       1           0          -1.00 B        UNPARTITIONED
02:AGGREGATE  15      246.188ms  250.408ms  15       1           12.00 KB   10.00 MB
04:AGGREGATE  15      11s174ms   11s753ms   450.00M  437.91M     1.51 GB    245.01 MB      FINALIZE
03:EXCHANGE   15      750.620ms  805.099ms  450.00M  437.91M     0          0              HASH(o_orderkey)
01:AGGREGATE  15      5s670ms    6s715ms    450.00M  437.91M     153.40 MB  3.59 GB        STREAMING
00:SCAN HDFS  15      151.746ms  201.804ms  450.00M  450.00M     150.95 MB  88.00 MB       tpch_300_parquet.orders

Page 38: Apache Impala (incubating) 2.5 Performance Update

Optimization for partition key scans

• Use metadata to avoid table accesses for partition key scans:
  • select min(month), max(year) from functional.alltypes;
  • month, year are partition keys of the table
• Enabled by query option OPTIMIZE_PARTITION_KEY_SCANS
• Applicable to:
  • min(), max(), ndv() and aggregate functions with the distinct keyword
  • partition keys only

Plan without optimization:

03:AGGREGATE [FINALIZE]
|  output: min:merge(month), max:merge(year)
|
02:EXCHANGE [UNPARTITIONED]
|
01:AGGREGATE
|  output: min(month), max(year)
|
00:SCAN HDFS [functional.alltypes]
   partitions=24/24 files=24 size=478.45KB

Plan with optimization:

01:AGGREGATE [FINALIZE]
|  output: min(month), max(year)
|
00:UNION
   constant-operands=24

Page 39: Apache Impala (incubating) 2.5 Performance Update

Competitive benchmark: TPC-DS

Hardware: 21-node cluster, each node with
● 384GB memory, 2 sockets, 12 total cores, Intel Xeon CPU E5-2630L 0 at 2.00GHz
● 12 disk drives at 932GB each (one for the OS, the rest for HDFS)

Comparative Set
● Impala 2.5
  ○ RUNTIME_FILTER_MODE = 2;
● Spark SQL 1.6
  ○ Thrift JDBC server used to avoid startup cost
  ○ --master yarn --deploy-mode client --driver-memory 24G --driver-cores 8 --executor-memory 24G --num-executors 240

Workload
● TPC-DS 15TB stored in Parquet file format (default 256MB block size)
● Unmodified TPC-DS queries: 3, 7, 8, 19, 25, 27, 34, 42, 43, 46, 47, 52, 53, 55, 59, 61, 63, 68, 73, 79, 88, 89, 96, 98
● Caveats:
  ○ Spark SQL failed to run:
    ■ Q25: Bad plan
    ■ Q47: StackOverflowError
    ■ Q89: StackOverflowError

Page 40: Apache Impala (incubating) 2.5 Performance Update

Competitive benchmark

Query complexity varied from Q3:

SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, Sum(ss_ext_sales_price) sum_agg
FROM date_dim dt, store_sales, item
WHERE dt.d_date_sk = store_sales.ss_sold_date_sk
  AND store_sales.ss_item_sk = item.i_item_sk
  AND item.i_manufact_id = 436
  AND dt.d_moy = 12
GROUP BY dt.d_year, item.i_brand, item.i_brand_id
ORDER BY dt.d_year, sum_agg DESC, brand_id
LIMIT 100;

to Q25 (fact-to-fact joins):

SELECT i_item_id, i_item_desc, s_store_id, s_store_name,
       Stddev_samp(ss_net_profit), Stddev_samp(sr_net_loss),
       Stddev_samp(cs_net_profit) AS catalog_sales_profit
FROM store_sales, store_returns, catalog_sales,
     date_dim d1, date_dim d2, date_dim d3, store, item
WHERE d1.d_moy = 4
  AND d1.d_year = 2001
  AND d1.d_date_sk = ss_sold_date_sk
  AND i_item_sk = ss_item_sk
  AND s_store_sk = ss_store_sk
  AND ss_customer_sk = sr_customer_sk
  AND ss_item_sk = sr_item_sk
  AND ss_ticket_number = sr_ticket_number
  AND sr_returned_date_sk = d2.d_date_sk
  AND d2.d_moy BETWEEN 4 AND 10
  AND d2.d_year = 2001
  AND sr_customer_sk = cs_bill_customer_sk
  AND sr_item_sk = cs_item_sk
  AND cs_sold_date_sk = d3.d_date_sk
  AND d3.d_moy BETWEEN 4 AND 10
  AND d3.d_year = 2001
GROUP BY i_item_id, i_item_desc, s_store_id, s_store_name
ORDER BY i_item_id, i_item_desc, s_store_id, s_store_name
LIMIT 100;

Page 41: Apache Impala (incubating) 2.5 Performance Update


Competitive benchmark

Page 42: Apache Impala (incubating) 2.5 Performance Update


Competitive benchmark

Impala 2.5 is 11x faster (based on geomean)

Page 43: Apache Impala (incubating) 2.5 Performance Update

Performance Benchmark Takeaways

• Impala unlocks BI usage directly on Hadoop
  • Meets BI low-latency and multi-user requirements
  • Advantage expands when going from a single user to just 10 users
• Spark SQL enables easier Spark application development
  • Enables mixed procedural Spark (Java/Scala) and SQL job development
• Mid-term trends will further favor Impala's design approach for latency and concurrency
  • More data sets move to memory (HDFS caching, in-memory joins, Intel joint roadmap)
  • CPU efficiency will increase in importance
  • Native code enables easy optimizations for CPU instruction sets

Page 44: Apache Impala (incubating) 2.5 Performance Update

Impala and Cloud

• Available today in Impala 2.5:
  • All the same Impala functionality, performance, and third-party integrations
  • Supported across our cloud partners
  • Deployment via Director
  • Modular architecture enables a future built on the cloud's decoupled storage and elasticity
• Available soon in Impala 2.6:
  • Impala read/write to S3 in addition to local HDFS (IMPALA-1878)
  • Dynamically sized runtime filters
  • Parquet scanner optimization
  • Faster joins, aggregations, sorts and decimal arithmetic
  • Rack-aware scheduling
  • Faster code generation

Page 45: Apache Impala (incubating) 2.5 Performance Update

Impala Roadmap

2H 2015
• SQL Support & Usability
  • Nested structures
  • Kudu updates (beta)
• Management & Security
  • Record reader service (beta)
  • Finer-grained security (Sentry)
• Integration
  • Isilon support
  • Python interface (Ibis)
• Performance & Scale
  • Improved predictability under concurrency

1H 2016
• Performance & Scale
  • Continued scalability and concurrency
  • Initial perf/scale improvements
• Management & Security
  • Improved admission control
  • Resource utilization and showback
• SQL Support & Usability
  • Dynamic partitioning

2016
• Performance & Scale
  • >20x performance
  • Multi-threaded joins/aggregations
  • Continued scale work
• Cloud
  • S3 read/write support
• Management & Security
  • Improved YARN integration
  • Automated metadata
• SQL Support & Usability
  • Data type improvements
  • Added SQL extensions

Page 46: Apache Impala (incubating) 2.5 Performance Update


Appendix.

Page 47: Apache Impala (incubating) 2.5 Performance Update


Page 48: Apache Impala (incubating) 2.5 Performance Update

Query start-up improvements

• Pre Impala 2.5:
  • Coordinator starts receiving fragments before senders
  • Problem:
    • Serializes startup
    • Startup gets slower as scale and plan complexity grow
• Impala 2.5:
  • Coordinator starts fragments in any order
  • Added wait logic for senders and receivers

Page 49: Apache Impala (incubating) 2.5 Performance Update

Scheduling Small Queries

The query scheduler assigns scan ranges to workers (running impalad). First it selects an HDFS datanode to read from.

[Diagram: A, B, C]

Selection will always start with the same replica to make optimal use of OS buffer caches. This can lead to hot-spots for some workloads.

Improvement: pick an impalad at random.

Page 50: Apache Impala (incubating) 2.5 Performance Update

New Query Option: random_replica

Disabled by default.
set random_replica = 1;

Also has a corresponding query hint:
SELECT AVG(c1) FROM t /* +SCHEDULE_RANDOM_REPLICA */;
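The difference between the two selection policies can be shown with a very small C++ sketch. It models only the choice of a replica index; the function names are hypothetical and nothing here reflects the scheduler's real interfaces.

```cpp
#include <cstddef>
#include <random>

// Policy described for the default behaviour: always start with the same
// replica, which keeps that host's OS buffer cache warm.
std::size_t PickFirstReplica(std::size_t num_replicas) {
  (void)num_replicas;  // unused: the first replica is always chosen
  return 0;
}

// Random policy: choose uniformly among the replicas so that repeated small
// queries spread across hosts. Assumes num_replicas >= 1.
std::size_t PickRandomReplica(std::size_t num_replicas, std::mt19937* rng) {
  std::uniform_int_distribution<std::size_t> dist(0, num_replicas - 1);
  return dist(*rng);
}
```

Under the first policy every query lands on replica 0, which is the hot-spot scenario; under the random policy the same workload touches all replicas, at the cost of caching the same data on more hosts.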

Page 51: Apache Impala (incubating) 2.5 Performance Update

Where It Can Help
• Large number of small queries, each with few input tables.
• High load on only one of multiple replicas of a table.
• Queries are CPU bound.
• Benefit: Distribute load more evenly over replicas.
• Tradeoff: Distribution of local reads will increase buffer cache usage.

What's Next
• Add possibility to prefer remote reads.
• Switch remote impalad selection from round-robin to load-based.
• Add rack-awareness.

Page 52: Apache Impala (incubating) 2.5 Performance Update


Catalog Improvements

Incrementally update table metadata instead of force-reloading all table metadata during DDL/DML operations

Reload metadata of only ‘dirty’ partitions

Reuse descriptors of HDFS files to avoid loading file/block metadata for files that haven’t been modified

Significantly reduce the latency of DDL/DML operations that change a small fraction of table metadata (e.g. alter table foo partition (year = 2010) set location 'blah')

Page 53: Apache Impala (incubating) 2.5 Performance Update


Catalog Improvements - Results