Alan Gates, Hortonworks: Hadoop & SQL
Stinger Overview
• An initiative, not a project or product
• Includes changes to Hive and a new project, Tez
• Two main goals
– Improve Hive performance 100x over Hive 0.10
– Extend Hive SQL to include features needed for analytics
• Hive will support:
– BI tools connecting to Hadoop
– Analysts performing ad-hoc, interactive queries
– Still excellent at the large batch jobs it is used for today
© 2013 Hortonworks
Stinger Mileposts
Stinger Phase 1 (released in Hive 0.11)
• Reduce the number of MapReduce jobs
• SQL analytics
• ORCFile format
Stinger Phase 2 (current work)
• YARN resource management
• Hive on Apache Tez
• Remove startup latency
• Vectorized operators
Stinger Phase 3 (roadmap)
• Buffer cache
• Cost-based optimizer
Goals
1. Improve existing tools & preserve investments
2. Enable Hive to support interactive workloads
Performance Trajectory
[Bar chart: Query 27 Speedup. Vertical axis: 0X to 25X.]
• Configurations, left to right: Hive 10 Text; Hive 10 RC; Hive 11 RC; Hive 11 ORC; Hive 11 CP (ORC, Tez, Vectorization)
• Bar labels, in the same order: 1X, 2X, 12X, 11X, 21X
[Bar chart: Query 82 Speedup. Vertical axis: 0X to 90X.]
• Configurations, left to right: Hive 10 Text; Hive 10 RC; Hive 11 RC; Hive 11 ORC; Hive 11 CP (ORC, Tez)
• Bar labels, in the same order: 1X, 14X, 44X, 57X, 78X