Tomer Shiran, MapR: Hadoop & SQL
SQL-in-Hadoop
• SQL is hot again!
– Apache Hive (+ Stinger/Tez)
– Apache Drill
– Shark/Spark
– Impala
– Phoenix
– Greenplum (HAWQ)
– Cascading Lingual
– Hadapt
– Splice Machine
• MapR provides the broadest SQL support
– Apache Hive 0.11: GA
– Impala on MapR: private beta (25-50% faster)
– Apache Drill 1.0: alpha this month
• Hadoop BI tools can do a lot more than SQL queries
Why Apache Drill?
• Community-driven project
– SQL is an application interface
– Users don’t want vendor lock-in
• Next-generation SQL-in-Hadoop
– Full ANSI SQL:2003
– Schema is optional
– Nested data: JSON, Protobuf, …
– Highly extensible
– YARN integration
• Who’s contributing?
– MapR
– Pentaho
– Oracle
– VMWare
– Microsoft
– Thoughtworks
– UT Austin
– UW Madison
– RJMetrics
– XingCloud
• Lines of code: > 100K
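To make the "schema is optional" and "nested data" points on this slide concrete, here is a minimal sketch in plain Python. It is not Drill code and does not use any Drill API. The sample records and the helper names (`get_path`, `select`) are hypothetical, chosen only to illustrate projecting nested fields from JSON records whose shape is discovered at read time rather than declared up front.

```python
import json

# Hypothetical sample records: no schema is declared up front, and the
# second record carries a field ("coupon") that the first one lacks.
RAW = """
{"id": 1, "user": {"name": "ann", "address": {"city": "Austin"}}, "items": [{"sku": "a", "qty": 2}]}
{"id": 2, "user": {"name": "bob", "address": {"city": "Madison"}}, "items": [], "coupon": "X1"}
"""


def get_path(record, path, default=None):
    """Follow a dotted path such as 'user.address.city' into nested dicts.

    Returns `default` when any step of the path is missing, so records
    with different shapes can be queried with the same path.
    """
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def select(records, paths, where=None):
    """Project the given dotted paths from each record, with an optional filter."""
    rows = []
    for record in records:
        if where is None or where(record):
            rows.append(tuple(get_path(record, p) for p in paths))
    return rows


# Each line is parsed independently; the structure is discovered per record.
records = [json.loads(line) for line in RAW.strip().splitlines()]

# Roughly "SELECT id, user.address.city, coupon" over schema-less data.
rows = select(records, ["id", "user.address.city", "coupon"])
print(rows)
```

A missing field simply comes back as `None` instead of causing an error, which is the behavior a schema-optional engine needs when records in the same data set have different shapes.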
It’s Not Just About Queries…
• Real-time data loading so you don’t query stale data
– HDFS was not designed for these workloads
• Common storage and resource mgmt for all Big Data applications
– Enterprise-grade: HA, DP (snapshots), DR (mirrors)
– Multi-tenancy
– Read/write access (POSIX)