Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next

Upload: hortonworks

Post on 28-May-2015

DESCRIPTION

Earlier this year, the Apache open source community delivered the Stinger Initiative to improve speed, scale and SQL semantics in Apache Hive. Now Stinger.next is underway to build on those initial successes. In this presentation, from a webinar hosted by Hortonworks co-founder Alan Gates and Hortonworks Hive product manager Raj Bains, you can learn more about Stinger.next and innovation in Apache Hive. Alan and Raj cover new Hive functionality for more speed, scale and SQL in HDP 2.2. Specific topics include transactions with ACID semantics, the cost-based optimizer and dynamic query optimizations. The presentation also shows future plans for the Stinger.next initiative.

TRANSCRIPT

Page 1: Discover HDP 2.2: Even Faster SQL Queries with Apache Hive and Stinger.next

Page 1 © Hortonworks Inc. 2014

Discover HDP 2.2: Even Faster SQL Queries with Apache Hive & Stinger.next

Hortonworks. We do Hadoop.

Page 2

Speakers

Justin Sears

Hortonworks Product Marketing Manager

Alan Gates

Hortonworks Co-Founder and Apache Hive Committer & PMC Member

Raj Bains

Hortonworks Senior Manager of Product Management for Apache Hive

Page 3

Agenda

•  Introduction to Stinger.next
•  New Innovation in Apache Hive 0.14
   §  SQL: Transactions with ACID semantics
   §  Speed: Cost based optimizer for star and bushy joins
   §  Scale: Dynamic query optimizations
•  The Road Ahead for Stinger.next
•  Q & A

We’ll move quickly:
•  Attendee phone lines are muted
•  Text any questions to Raj Bains using Webex chat
•  Questions answered at the end
•  Unanswered questions and answers in upcoming blog post

Page 4

Big Data, Hadoop & Data Center Re-platforming

Business Drivers

•  From reactive analytics to proactive interactions

•  Insights that drive competitive advantage & optimal returns

Financial Drivers

•  Cost of data systems, as % of IT spend, continues to grow

•  Cost advantages of commodity hardware & open source software

Technical Drivers

•  Data is growing exponentially & existing systems overwhelmed

•  Predominantly driven by NEW types of data that can inform analytics

There is an inequitable balance between vendor and customer in the market

Page 5

Hadoop Value: New Types of Data

•  Clickstream: Capture and analyze website visitors’ data trails and optimize your website
•  Sensors: Discover patterns in data streaming automatically from remote sensors and machines
•  Server Logs: Research logs to diagnose process failures and prevent security breaches
•  Sentiment: Understand how your customers feel about your brand and products – right now
•  Geographic: Analyze location-based data to manage operations where they occur
•  Unstructured: Understand patterns in files across millions of web pages, emails, and documents

Page 6

A Shift from Reactive to Proactive Interactions

HDP and Hadoop allow organizations to use data to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision):

•  A shift in Advertising: from mass branding to 1x1 Targeting
•  A shift in Financial Services: from Educated Investing to Automated Algorithms
•  A shift in Healthcare: from mass treatment to Designer Medicine
•  A shift in Retail: from static branding to Real-time Personalization
•  A shift in Telco: from break then fix to repair before break

Page 7

Enterprise Goals for the Modern Data Architecture

•  Consolidate siloed data sets, structured and unstructured

•  Central data set on a single cluster

•  Multiple workloads across batch, interactive and real time

•  Central services for security, governance and operation

•  Preserve existing investment in current tools and platforms

•  Single view of the customer, product, supply chain

[Diagram: Modern Data Architecture. Applications: Business Analytics, Custom Applications, Packaged Applications. Data System: RDBMS, EDW and MPP alongside YARN: Data Operating System (Interactive, Real-Time, Batch) on HDFS (Hadoop Distributed File System). Sources: Existing Systems (CRM, ERP, Other), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured.]

Page 8

YARN Transformed Hadoop & Opened a New Era

YARN The Architectural Center of Hadoop

•  Common data platform, many applications

•  Support multi-tenant access & processing

•  Batch, interactive & real-time use cases

[Diagram: YARN: Data Operating System (Cluster Resource Management) on HDFS (Hadoop Distributed File System), providing batch, interactive & real-time data access. Engines shown: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines). Tez and Slider appear beneath several of the engines.]

Page 9

YARN Extends Hadoop to Other Data Center Leaders

YARN The Architectural Center of Hadoop

•  Common data platform, many applications

•  Support multi-tenant access & processing

•  Batch, interactive & real-time use cases

•  Supports 3rd-party ISV tools (e.g. SAS, Syncsort, Actian)

YARN Ready Applications: Facilitate ongoing innovation and enterprise adoption via an ecosystem of new and existing “YARN Ready” solutions

[Diagram: same YARN data access stack as on Page 8.]

Page 10

Enterprise Hadoop: Central Set of Services

Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for:

•  Governance

•  Operations

•  Security

Everything that plugs into Hadoop inherits these services

[Diagram: Enterprise Hadoop stack.
Governance: load data and manage according to policy.
Operations: deploy and effectively manage the platform; Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie).
Security: provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection.
Batch, interactive & real-time data access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider, on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System).]

Page 11

Hortonworks Development Investment for the Enterprise

Vertical Integration with YARN and HDFS

[Diagram: same Enterprise Hadoop stack as on Page 10.]

•  Ensure engines can run reliably and respectfully in a YARN based cluster

•  Implement features throughout the stack to accommodate

Page 12

Hortonworks Development Investment for the Enterprise

Horizontal Integration for Enterprise Services

[Diagram: same Enterprise Hadoop stack as on Page 10.]

•  Ensure consistent enterprise services are applied across the entire Hadoop stack

•  Integrate with and extend existing data center solutions for these key requirements

Page 13

Hortonworks Data Platform 2.2

HDP Delivers Enterprise Hadoop

[Diagram: HDP 2.2 stack.
Governance (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS.
Batch, interactive & real-time data access: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider, on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System).
Security: Authentication, Authorization, Audit, Data Protection. Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox.
Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie).
Deployment Choice: Linux, Windows, On-Premises, Cloud.]

YARN is the architectural center of HDP

•  Common data set across all applications

•  Batch, interactive & real-time workloads

•  Multi-tenant access & processing

Provides comprehensive enterprise capabilities

•  Governance

•  Security

•  Operations

Enables broad ecosystem adoption

•  ISVs can plug directly into Hadoop

The widest range of deployment options

•  Linux & Windows

•  On premises & cloud

Page 14

Hortonworks Data Platform 2.2

HDP Delivers Enterprise Hadoop

[Repeat of the Page 13 diagram and bullet text, with the SQL engine (Hive, Tez) listed separately in the data access layer.]

Page 15

Introduction to Stinger.next

Page 16

Stinger.next – Enterprise SQL at Hadoop Scale

Stinger (Hive 0.13, Tez, ORC File)
•  Scale to Petabytes
•  Batch to Interactive Queries
•  Read-Only Data
•  Substantial SQL Support
•  Single Tool for Multiple SQL workloads – Interactive, Reporting and ETL
•  MapReduce, Tez Engines

Stinger.next
•  Scale to Petabytes
•  Sub-Second Queries
•  Modify Data with Transactions
•  Comprehensive SQL:2011 Analytics
•  Single Tool for Multiple SQL workloads – Interactive, Reporting, ETL, ML
•  MapReduce, Tez, Spark Engines

Page 17

SQL in Hive 0.14: Transactions with ACID Semantics

Page 18

Transaction Use Cases

Reporting with Analytics (YES)
•  Reporting on data with occasional updates
•  Corrections to the fact tables, evolving dimension tables
•  Low concurrency updates, low TPS

Operational Reporting (YES, next)
•  High throughput ingest from operational (OLTP) database
•  Periodic inserts every 5-30 minutes
•  Requires tool support and changes in our Transactions

Operational (OLTP) Database (NO)
•  Small Transactions, each doing single line inserts
•  High Concurrency - Hundreds to thousands of connections

[Diagram labels: Hive with Analytics Modifications; OLTP to Hive Replication; Hive with High Concurrency OLTP]

Page 19

Deep Dive: Transactions

Transaction Support in Hive with ACID semantics
•  Hive native support for INSERT, UPDATE, DELETE.
•  Split Into Phases:
   •  Phase 1: Hive Streaming Ingest (append) [Done]
   •  Phase 2: INSERT / UPDATE / DELETE Support [HDP 2.2]
   •  Phase 3: BEGIN / COMMIT / ROLLBACK Txn [Next]

1. Original File: Task reads the latest Read-Optimized ORCFile
2. Edits Made: Task reads the ORCFile and merges the delta file with the edits
3. Edits Merged: Task reads the updated (merged) Read-Optimized ORCFile

Hive ACID Compactor periodically merges the delta files in the background.
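
The Phase 2 statements can be tried in a short HiveQL session. This is a minimal sketch, not taken from the slides: the table `customer_dim`, its columns and the bucket count are hypothetical, and the session settings are the ones Hive 0.14 documents for enabling transactions (they may already be set cluster-wide, and later Hive releases changed or dropped some of them).

```sql
-- Session settings Hive 0.14 expects for ACID tables
-- (often configured in hive-site.xml instead).
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical dimension table. In Hive 0.14 an ACID table must be
-- bucketed, stored as ORC, and flagged as transactional.
CREATE TABLE customer_dim (
  customer_id INT,
  name        STRING,
  state       STRING
)
CLUSTERED BY (customer_id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Each statement is its own auto-committed transaction;
-- BEGIN / COMMIT / ROLLBACK is Phase 3 and not available yet.
INSERT INTO TABLE customer_dim VALUES (1, 'Ann', 'CA'), (2, 'Bob', 'NY');
UPDATE customer_dim SET state = 'WA' WHERE customer_id = 1;
DELETE FROM customer_dim WHERE customer_id = 2;

-- Edits are written as delta files. A compaction can be requested by hand
-- instead of waiting for the background compactor.
ALTER TABLE customer_dim COMPACT 'major';
SHOW COMPACTIONS;
```

The UPDATE deliberately changes a non-bucketing column, since Hive 0.14 does not allow updates to bucketing or partition columns. The background compactor runs in the metastore and is expected to need `hive.compactor.initiator.on=true` and at least one `hive.compactor.worker.threads` there.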

Page 20

Speed in Hive 0.14: Cost Based Optimizer

Page 21

TPC-DS Query 17

SELECT i_item_id,
       i_item_desc,
       s_state,
       Count(ss_quantity) AS store_sales_quantitycount,
       Avg(ss_quantity) AS store_sales_quantityave,
       Stddev_samp(ss_quantity) AS store_sales_quantitystdev,
       Stddev_samp(ss_quantity) / Avg(ss_quantity) AS store_sales_quantitycov,
       Count(sr_return_quantity) as_store_returns_quantitycount,
       Avg(sr_return_quantity) as_store_returns_quantityave,
       Stddev_samp(sr_return_quantity) as_store_returns_quantitystdev,
       Stddev_samp(sr_return_quantity) / Avg(sr_return_quantity) AS store_returns_quantitycov,
       Count(cs_quantity) AS catalog_sales_quantitycount,
       Avg(cs_quantity) AS catalog_sales_quantityave,
       Stddev_samp(cs_quantity) / Avg(cs_quantity) AS catalog_sales_quantitystdev,
       Stddev_samp(cs_quantity) / Avg(cs_quantity) AS catalog_sales_quantitycov
FROM store_sales, store_returns, catalog_sales,
     date_dim d1, date_dim d2, date_dim d3, store, item
WHERE d1.d_quarter_name = '2000Q1'
  AND d1.d_date_sk = store_sales.ss_sold_date_sk
  AND ss_sold_date BETWEEN '2000-01-01' AND '2000-03-31'
  AND item.i_item_sk = store_sales.ss_item_sk
  AND store.s_store_sk = store_sales.ss_store_sk
  AND store_sales.ss_customer_sk = store_returns.sr_customer_sk
  AND store_sales.ss_item_sk = store_returns.sr_item_sk
  AND store_sales.ss_ticket_number = store_returns.sr_ticket_number
  AND store_returns.sr_returned_date_sk = d2.d_date_sk
  AND d2.d_quarter_name IN ( '2000Q1', '2000Q2', '2000Q3' )
  AND sr_returned_date BETWEEN '2000-01-01' AND '2000-09-01'
  AND store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk
  AND store_returns.sr_item_sk = catalog_sales.cs_item_sk
  AND catalog_sales.cs_sold_date_sk = d3.d_date_sk
  AND d3.d_quarter_name IN ( '2000Q1', '2000Q2', '2000Q3' )
  AND cs_sold_date BETWEEN '2000-01-01' AND '2000-09-31'
GROUP BY i_item_id, i_item_desc, s_state
ORDER BY i_item_id, i_item_desc, s_state
LIMIT 100;

Page 22

CBO on Selected Queries – 17

[Diagram: join graph for Query 17. The fact tables store_sales, store_returns and catalog_sales each join one date_dim instance (d1, d2, d3) on date_sk, with a quarter filter and a date filter on each. store_sales joins store_returns on customer_sk, item_sk and ticket_number; store_returns joins catalog_sales on customer_sk and item_sk; store_sales joins item on item_sk and store on store_sk.]

Page 23

OLD: Left Deep Plan

•  Map 1: Table_scan d1, filter
•  Map 2: Table_scan catalog_sales
•  Map 6: Table_scan d2, filter
•  Map 7: Table_scan d3, filter
•  Map 8: Table_scan store
•  Map 9: Table_scan store_sales
•  Map 11: Table_scan item
•  Map 12: Table_scan Store_returns
•  Reducer 10: Merge join 12, 9
•  Reducer 3: Merge join 2 & 10; Map join 1; Map join 6; Map Join 7; Map Join 8 store; Map Join 11 item; Filter; Group By; Reduce
•  Reducer 4: Group_By; Reduce
•  Reducer 5: Limit

Large Fact tables joined together without filters

Page 24

NEW: Complex Bushy Plan

•  Map 1: Table_scan d1, filter
•  Map 2: Table_scan d1, filter
•  Map 9: Table_scan d1, filter
•  Map 3: Store_sales, Map join
•  Map 8: Store_returns, Map join
•  Map 11: catalog_sales, Map Join
•  Map 10: table_scan store
•  Map 12: Table_scan item
•  Reducer 4: Merge join 3 & 8; Map join store; Map join item; Reduce
•  Reducer 5: Merge_Join; Group_By; Reduce
•  Reducer 6: Group by; Reduce
•  Reducer 7: Limit

All 3 Large Fact tables joined with date dimension limiting data to few quarters

Page 25

Performance Improvement – Query 17

Scale = 30TB, Input records ~186mil

CBO | Elapsed Time (sec) | Elapsed Time | Intermediate data (GB) | Output and Intermediate Records
OFF | 10,683 | ~3 hrs | 5,017 | 135,647,792,123
ON | 1,284 | ~20 mins | 275 | 8,543,232,360
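
The slides do not show how the CBO ON run was configured. As a hedged sketch, these are the Hive 0.14 settings and statistics commands usually associated with the cost based optimizer, applied to a cut-down form of Query 17. The choice of tables to analyze and the shortened query are illustrative only.

```sql
-- Turn on the cost based optimizer and let it read stored statistics.
SET hive.cbo.enable=true;
SET hive.stats.fetch.column.stats=true;
SET hive.stats.fetch.partition.stats=true;

-- The optimizer can only cost alternative join orders if statistics exist.
-- Dimension tables are shown here; the fact tables need the same treatment.
ANALYZE TABLE item COMPUTE STATISTICS;
ANALYZE TABLE item COMPUTE STATISTICS FOR COLUMNS;
ANALYZE TABLE store COMPUTE STATISTICS;
ANALYZE TABLE store COMPUTE STATISTICS FOR COLUMNS;
ANALYZE TABLE date_dim COMPUTE STATISTICS;
ANALYZE TABLE date_dim COMPUTE STATISTICS FOR COLUMNS;

-- Inspect the join order the optimizer picks. Re-running with
-- hive.cbo.enable=false shows the plan without CBO for comparison.
EXPLAIN
SELECT s_state, Count(ss_quantity)
FROM store_sales, date_dim d1, store, item
WHERE d1.d_quarter_name = '2000Q1'
  AND d1.d_date_sk = store_sales.ss_sold_date_sk
  AND store.s_store_sk = store_sales.ss_store_sk
  AND item.i_item_sk = store_sales.ss_item_sk
GROUP BY s_state;
```

For scale, the figures in the table work out to roughly an 8x drop in elapsed time (10,683 s to 1,284 s) and roughly an 18x drop in intermediate data (5,017 GB to 275 GB).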

Page 26

Scale in Hive 0.14: Dynamic Query Optimization

Page 27

Auto Reducer Parallelism

Use dynamic data volume during execution, rather than estimates from query compilation, to determine the number of reducers

Leads to faster query execution and better resource utilization

[Diagram: App Master (Vertex Manager, Vertex State) and machines over time, with tasks for a single map vertex and tasks for a single reduce vertex. Steps: 1. Data size statistics; 2. Set parallelism; 3. Re-route; 4. Cancel task; 5. Tasks Completed; 6. Start Tasks; 7. Start]

Page 28

Auto Reducer Parallelism

use tpcds_bin_partitioned_orc_30000;
set hive.tez.auto.reducer.parallelism=true;
set hive.tez.min.partition.factor=0.125;

SELECT ss_promo_sk,
       Sum(ss_sales_price),
       Count(*)
FROM store_sales
WHERE ss_sold_date < '1998-03-01'
GROUP BY ss_promo_sk
ORDER BY 2 DESC
LIMIT 10;
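
The two settings on this slide work together with a few related properties. The sketch below spells them out with comments. The numeric values other than 0.125 are illustrative assumptions rather than values from the webinar, and the reducer counts in the comments are approximate.

```sql
SET hive.execution.engine=tez;

-- Let Tez adjust reducer counts at run time from observed map output sizes.
SET hive.tez.auto.reducer.parallelism=true;

-- Compile-time estimate: roughly one reducer per this many input bytes
-- (256 MB here, an illustrative value).
SET hive.exec.reducers.bytes.per.reducer=268435456;

-- Over-partition factor: the shuffle is planned for up to
-- estimate x 2.0 reducers so there is room to adjust.
SET hive.tez.max.partition.factor=2.0;

-- Lower bound: the reducer count may shrink to about estimate x 0.125.
SET hive.tez.min.partition.factor=0.125;

-- With a compile-time estimate of 80 reducers, these factors give a
-- window of about 10 (80 x 0.125) to 160 (80 x 2.0) reducers.
```

A smaller minimum factor gives the run-time decision more room to cut reducers for queries whose filters remove most of the data, such as the date-filtered aggregation on this slide.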

Page 29

Dynamic Partition Pruning

Example: Join of
•  a large Fact table with multiple partitions
•  with a dimension table that has a filter

[Diagram: store_sales (partitions d1, d2, d3, d4, …) joined with filtered date_dim d1 on ss_sold_date_sk = date_sk]

Table Definition
create table store_sales (...) partitioned by (ss_sold_date_sk int) stored as orc;

The ss_sold_date_sk partitions that can be pruned away at join time are not known until the filter is applied at runtime

Compile Time Design
•  Insert synthetic conditions for each join representing "x in (keys of other side in join)". Optimizer will push it as far down as possible
•  If the condition hits a table scan and the column involved is a partition column:
   •  Setup Operator to send key events to AM
•  else:
   •  Remove synthetic predicate

[Diagram: App Master (Vertex Manager, Vertex State) and machines; tasks for a single map vertex send events for partition pruning (step 1) toward the tasks for another map vertex]

Page 30

Dynamic Pruning

TPC-DS Query 3

SELECT dt.d_year,
       item.i_brand_id brand_id,
       item.i_brand brand,
       Sum(ss_ext_sales_price) sum_agg
FROM date_dim dt, store_sales, item
WHERE dt.d_date_sk = store_sales.ss_sold_date_sk
  AND store_sales.ss_item_sk = item.i_item_sk
  AND item.i_manufact_id = 436
  AND dt.d_moy = 12
GROUP BY dt.d_year, item.i_brand, item.i_brand_id
ORDER BY dt.d_year, sum_agg DESC, brand_id
LIMIT 100;
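
To see whether pruning applies to a query like Query 3, the plan can be inspected with the feature switched on. This is a sketch under stated assumptions: it assumes the Tez execution engine and a store_sales table partitioned by ss_sold_date_sk, as in the table definition shown earlier, and the two-table query is a simplified form of Query 3.

```sql
SET hive.execution.engine=tez;

-- Hive 0.14 switch for dynamic partition pruning on Tez.
SET hive.tez.dynamic.partition.pruning=true;

-- The filter on date_dim (d_moy = 12) selects a set of d_date_sk values only
-- at run time; those keys decide which store_sales partitions get scanned.
EXPLAIN
SELECT dt.d_year,
       Sum(ss_ext_sales_price) sum_agg
FROM date_dim dt, store_sales
WHERE dt.d_date_sk = store_sales.ss_sold_date_sk
  AND dt.d_moy = 12
GROUP BY dt.d_year;
```

When pruning is in effect, the plan for the date_dim vertex is expected to include an extra operator that emits the join keys as events (it typically appears as a dynamic partitioning event operator), targeting the ss_sold_date_sk partition column of the store_sales scan.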

Page 31

Stinger.next: The Road Ahead

Page 32

Stinger.next - Delivery Themes

Beyond Read-Only (2nd Half 2014)
•  Transactions with ACID allowing insert, update and delete
•  Temporary Tables
•  Cost Based Optimizer optimizes star and bushy join queries

Sub-Second (1st Half 2015)
•  Sub-Second queries with LLAP
•  Hive-Spark Machine Learning integration
•  Operational reporting with Hive Streaming Ingest and Transactions
•  Replication and SQL/CBO improvements

Richer Analytics (2nd Half 2015)
•  Toward SQL:2011 Analytics
•  Materialized Views
•  Cross-Geo Queries
•  Workload Management via YARN and LLAP integration

Page 33

Q & A

Page 34

Thank you! Learn more at: hortonworks.com/hadoop/hive/

Register for the remaining 6 Discover HDP 2.2 Webinars

Hortonworks.com/webinars