tomer shiran, mapr_hadoop&sql

SQL-in-Hadoop

• SQL is hot again!
– Apache Hive (+ Stinger/Tez)
– Apache Drill
– Shark/Spark
– Impala
– Phoenix
– Greenplum (HAWQ)
– Cascading Lingual
– Hadapt
– Splice Machine

• MapR provides the broadest SQL support
– Apache Hive 0.11: GA
– Impala on MapR: Private beta (25-50% faster)
– Apache Drill 1.0: Alpha this month

• Hadoop BI tools can do a lot more than SQL queries

Why Apache Drill?

• Community-driven project
– SQL is an application interface
– Users don’t want vendor lock-in

• Next-generation SQL-in-Hadoop
– Full ANSI SQL:2003
– Schema is optional
– Nested data: JSON, Protobuf, …
– Highly extensible
– YARN integration

Who’s contributing?

MapR, Pentaho, Oracle, VMWare, Microsoft, Thoughtworks, UT Austin, UW Madison, RJMetrics, XingCloud

Lines of code: > 100K

It’s Not Just About Queries…

• Real-time data loading so you don’t query stale data
– HDFS was not designed for these workloads

• Common storage and resource mgmt for all Big Data applications
– Enterprise-grade: HA, DP (snapshots), DR (mirrors)
– Multi-tenancy
– Read/write access (POSIX)
