Tomer Shiran, MapR_Hadoop&SQL


Upload: the-hive

Post on 20-Aug-2015


TRANSCRIPT

Page 1: Tomer Shiran, MapR_Hadoop&SQL

SQL-in-Hadoop

• SQL is hot again!
– Apache Hive (+ Stinger/Tez)
– Apache Drill
– Shark/Spark
– Impala
– Phoenix
– Greenplum (HAWQ)
– Cascading Lingual
– Hadapt
– Splice Machine

• MapR provides the broadest SQL support
– Apache Hive 0.11: GA
– Impala on MapR: Private beta (25-50% faster)
– Apache Drill 1.0: Alpha this month

• Hadoop BI tools can do a lot more than SQL queries

Page 2: Tomer Shiran, MapR_Hadoop&SQL

Why Apache Drill?

• Community-driven project
– SQL is an application interface
– Users don’t want vendor lock-in

• Next-generation SQL-in-Hadoop
– Full ANSI SQL:2003
– Schema is optional
– Nested data: JSON, Protobuf, …
– Highly extensible
– YARN integration

Who’s contributing?
– MapR
– Pentaho
– Oracle
– VMWare
– Microsoft
– Thoughtworks
– UT Austin
– UW Madison
– RJMetrics
– XingCloud

Lines of code: > 100K
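
The "Schema is optional" and "Nested data" bullets above can be pictured with a small Python sketch. This is not Drill code and uses no Drill API; it only shows the idea of projecting fields out of self-describing JSON records without declaring a schema first. The sample records and the helper names `get_path` and `select` are hypothetical, made up for this illustration.

```python
import json

# Hypothetical sample data: no schema is declared up front,
# and the set of fields differs from record to record.
RAW = """
{"name": "alice", "address": {"city": "Austin", "zip": "78701"}}
{"name": "bob", "address": {"city": "Madison"}, "tags": ["beta"]}
{"name": "carol"}
"""


def get_path(record, path, default=None):
    """Follow a dotted path such as 'address.city' through nested dicts."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default  # a missing field yields a null-like value, not an error
        current = current[key]
    return current


def select(lines, *paths):
    """Project the given dotted paths out of each JSON line, one row per record."""
    rows = []
    for line in lines.splitlines():
        if line.strip():
            record = json.loads(line)
            rows.append(tuple(get_path(record, p) for p in paths))
    return rows


rows = select(RAW, "name", "address.city")
```

Each record is interpreted as it is read, so a record that lacks `address` still produces a row, with `None` in place of the missing nested field.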

Page 3: Tomer Shiran, MapR_Hadoop&SQL

It’s Not Just About Queries…

• Real-time data loading so you don’t query stale data
– HDFS was not designed for these workloads

• Common storage and resource mgmt for all Big Data applications
– Enterprise-grade: HA, DP (snapshots), DR (mirrors)
– Multi-tenancy
– Read/write access (POSIX)
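
As a rough illustration of what "Read/write access (POSIX)" and real-time loading imply for an application, the Python sketch below appends records with ordinary file calls and reads them back immediately. It is a sketch under stated assumptions: a temporary local directory stands in for a POSIX-mounted cluster path, and the file name `events.log` and the helpers `append_record` and `read_records` are hypothetical.

```python
import os
import tempfile


def append_record(path, line):
    """Append one record with plain POSIX-style file I/O and force it to storage."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to persist the write before returning


def read_records(path):
    """Read back every record currently in the file."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Assumption: this temporary directory stands in for a mounted cluster path.
with tempfile.TemporaryDirectory() as mount:
    log_path = os.path.join(mount, "events.log")
    append_record(log_path, "event-1")
    first_read = read_records(log_path)   # sees the record just written
    append_record(log_path, "event-2")
    second_read = read_records(log_path)  # sees the new record without a reload step
```

The point of the sketch is that the writer needs no special client library: the same open, append, and read calls that work on a local disk are what a read/write POSIX interface exposes.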