alan gates, hortonworks_hadoop&sql

3
Stinger Overview Page 1 An initiative, not a project or product Includes changes to Hive and a new project Tez Two main goals –Improve Hive performance 100x over Hive 0.10 –Extend Hive SQL to include features needed for analytics Hive will support: –BI tools connecting to Hadoop –Analysts performing ad-hoc, interactive queries –Still excellent at the large batch jobs it is used for today © 2013 Hortonworks

Upload: the-hive

Post on 27-Jan-2015

103 views

Category:

Documents


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Alan Gates, Hortonworks_Hadoop&SQL

Stinger Overview

Page 1

•An initiative, not a project or product• Includes changes to Hive and a new project Tez•Two main goals

–Improve Hive performance 100x over Hive 0.10–Extend Hive SQL to include features needed for analytics

•Hive will support:–BI tools connecting to Hadoop–Analysts performing ad-hoc, interactive queries–Still excellent at the large batch jobs it is used for today

© 2013 Hortonworks

Page 2: Alan Gates, Hortonworks_Hadoop&SQL

Stinger Mileposts

Page 2© 2013 Hortonworks

Stinger Phase 3• Buffer Cache• Cost Based

Optimizer

Stinger Phase 2• YARN Resource Mgmnt• Hive on Apache Tez• Remove startup latency• Vectorized Operators

Stinger Phase 1• Reduce # of MR Jobs• SQL Analytics• ORCFile Format

1 2 Improve existing tools & preserve investments

Enable Hive to support interactive workloads

Released in Hive 0.11

Current Work

Roadmap

Page 3: Alan Gates, Hortonworks_Hadoop&SQL

© Hortonworks Inc. 2013

Performance Trajectory

Page 3

Hive 10_x000d_T

ext

Hive 10_x000d_R

C

Hive 11_x000d_R

C

Hive 11_x000d_O

RC

Hive 11 CP_x000d_O

RC, Tez_x000d_Vectoriza-

tion

0X

5X

10X

15X

20X

25X

1X2X

12X11X

21X

Query 27 Speedup

Hive 10_x000d_

Text

Hive 10_x000d_

RC

Hive 11_x000d_

RC

Hive 11_x000d_

ORC

Hive 11 CP_x000d_ORC, Tez

0X

10X

20X

30X

40X

50X

60X

70X

80X

90X

1X

14X

44X

57X

78X

Query 82 Speedup