Apache Drill Overview - Tokyo Apache Drill Meetup 2015/09/15
Post on 13-Feb-2017
TRANSCRIPT
-
2015 MapR Technologies 1
Apache Drill Overview
M.C. Srivas, CTO and Co-Founder, MapR Technologies
Data Engineer, MapR Technologies
2015/09/15
-
2015 MapR Technologies 2
(@nagix) MapR Technologies
NS-SHAFT
!
-
2015 MapR Technologies 3
-
2015 MapR Technologies 4
Apache Drill 1.0 (5/19) http://drill.apache.org
-
2015 MapR Technologies 5
Apache Drill
-
2015 MapR Technologies 6
Apache Drill
-
2015 MapR Technologies 7
[Chart: timeline, 1980 to 2020; 80%]
Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data
-
2015 MapR Technologies 8
[Chart: timeline, 1980 to 2020; DB; GB to TB, TB to PB]
-
2015 MapR Technologies 9
SQL
SQL NoSQL
SQL
BI (Tableau, MicroStrategy, ...)
HDFS (Parquet, JSON, ...), HBase
-
2015 MapR Technologies 10
Industry's First Schema-free SQL engine
for Big Data
-
2015 MapR Technologies 11
&
BI
ITBI
BI
ITBI ITBI
BI
IT
IT ETL
IT
1980 -1990 2000
-
2015 MapR Technologies 12
Hadoop
Hadoop
:
:
-
2015 MapR Technologies 13
Drill
(Hive )
2
SCHEMA ON WRITE
SCHEMA BEFORE READ
SCHEMA ON THE FLY
-
2015 MapR Technologies 14
Drill
JSON BSON
HBase
Parquet Avro
CSV TSV
RDBMS/SQL-on-Hadoop:
Name      Gender  Age
Michael   M       6
Jennifer  F       3

Apache Drill:
{
  name: {
    first: Michael,
    last: Smith
  },
  hobbies: [ski, soccer],
  district: Los Altos
}
{
  name: {
    first: Jennifer,
    last: Gates
  },
  hobbies: [sing],
  preschool: CCLC
}
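The two record shapes on this slide differ: the second JSON record has no `district` field and the first has no `preschool`. As a rough illustration of reading such records without a predeclared schema, here is a minimal Python sketch. It is not Drill code; the quoted JSON values and the helper names `project` and `select` are assumptions made for the example.

```python
import json

# Two records shaped like the slide's example. The slide omits JSON quoting,
# so the quoted form below is an assumption.
RECORDS = """
{"name": {"first": "Michael", "last": "Smith"}, "hobbies": ["ski", "soccer"], "district": "Los Altos"}
{"name": {"first": "Jennifer", "last": "Gates"}, "hobbies": ["sing"], "preschool": "CCLC"}
"""


def project(record, path):
    """Follow a dotted path through nested dicts; return None if a key is absent."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def select(lines, paths):
    """Yield one tuple per JSON line, with None for fields a record lacks."""
    for line in lines.splitlines():
        if line.strip():
            record = json.loads(line)
            yield tuple(project(record, p) for p in paths)


rows = list(select(RECORDS, ["name.first", "district", "preschool"]))
for row in rows:
    print(row)
```

Fields that a record does not carry come back as `None`, in the same spirit as a query over self-describing data returning NULL for a missing column.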
-
2015 MapR Technologies 15
- - HBase - Hive
Drill SQL on Everything
SELECT * FROM dfs.yelp.`business.json`
- - Hive - HBase
- DFS (Text, Parquet, JSON)
- HBase/MapR-DB
- Hive/HCatalog
- Hadoop API
-
2015 MapR Technologies 16
(drillbit)
(MapReduce, Spark, Tez)
ZooKeeper drillbit ZooKeeper drillbit ZooKeeper drillbit
-
2015 MapR Technologies 17
Drill
HDFS / MapR-FS: DataNode + drillbit
HBase / MapR-DB: RegionServer + drillbit
MongoDB: mongod + drillbit
[Diagram: three nodes, each running a drillbit next to a DataNode/RegionServer/mongod, with ZooKeeper alongside]
-
2015 MapR Technologies 18
SELECT*
drillbit ZooKeeper
(JDBC, ODBC, REST)
1. drillbit
3. 4.
ZooKeeper ZooKeeper
drillbit drillbit
2. drillbit
5.
* CTAS (CREATE TABLE AS SELECT) 14
-
2015 MapR Technologies 19
drillbit
SQL Hive
HBase
MongoDB
DFS
RPC
-
2015 MapR Technologies 20
-
2015 MapR Technologies 21
M.C. Srivas MapR Technologies CTO
MapReduce, Bigtable
Netapp
AFS AFS
-
2015 MapR Technologies 22
Drill
Raw Data Exploration JSON Analytics Data Hub Analytics
Hive HBase
{JSON}, Parquet Text
-
2015 MapR Technologies 23
IOT
SaaS Apache Drill JSON BI ODBC
ETL
-
2015 MapR Technologies 24
SQL Hadoop
MapR Drill PigHiveQLSQL
Drill Tableau Squirrel
MapR 1/100 $1,000 / TB MapR Drill BI SQL Hadoop
SQL $100,000 / TB
ETL SQL
-
2015 MapR Technologies 25
Customer-facing Analytics as a Service Drill
MapR Drill Drill
Hadoop SQL
Drill
JSON Parquet 10 GB to 4 TB 160
SLA
-
2015 MapR Technologies 26
MapR Optimized Data Architecture
, SaaS,
, E
, ,
, ,
Data Movement
Data Access
BI,
,
Optimized Data Architecture
MAPR DISTRIBUTION FOR HADOOP
(Spark Streaming,
Storm)
MapR Data Platform MapR-DB
MAPR DISTRIBUTION FOR HADOOP
(MapReduce,
Spark, Hive, Pig)
MapR-FS
(Drill,
Impala)
-
2015 MapR Technologies 27
-
2015 MapR Technologies 28
Apache Drill
Drill Beta (2014/9 - 2015/4)
Drill 1.0 (2015/5)
Drill 1.1 (2015/7)
Drill 1.2 (2015/9)
Drill 1.3()
-
2015 MapR Technologies 29
Apache Drill (2015)
Drill 1.1
ANSI SQL
o Window functions (Rank, Row_number, OVER, PARTITION BY)
o CTAS
o Hive &
o Hive UDF
o Hive Impersonation
o AVRO
(Beta) JDBC

Drill 1.2
ANSI SQL
o Window functions (Lead, Lag, First_Value, Last_value, NTile)
o Drop Table
o Hive
o Hive
o MapR-DB
Drill Web UI

Drill 1.3
ANSI SQL
o Insert/Append
o Drill on MapR-DB JSON
o MapR-DB
o Parquet
-
2015 MapR Technologies 30
Hive BI Hive Hive
Hive Hive Drill Hive UDF Hive Drill Impersonation
Hive
Parquet & Text
Hive
Drill
Drill ODBC
Drill JDBC
1.1
1.2
-
2015 MapR Technologies 31
MapR-DB BI (Tableau, MicroStrategy, Qlikview, ...)
MapR-DB KV MapR-DB JSON
MapR-DB SQL ES
MapR-DB
MapR-DB
Drill
Drill ODBC
Drill JDBC
1.2 1.3
1.3
-
2015 MapR Technologies 32
ANSI SQL
Count/Avg/Min/Max/Sum
Over/Partition By
Rank, Dense_Rank, Percent_Rank, Row_Number, Cume_Dist
Lead, Lag, First_Value, Last_Value, Ntile
SQL DDL
Parquet
Drop table
Insert/Append
1.1
1.2
1.1 1.2 1.3
1.1
-
2015 MapR Technologies 33
PAM +
Impersonation
Drill View
JDBC/ODBC
Web UI Files HBase Hive
Drill View 1
Drill View 2
U U U
User
1.2
-
2015 MapR Technologies 34
&
BI
-
2015 MapR Technologies 35
-
2015 MapR Technologies 36
Drill (e-Stat)
-
2015 MapR Technologies 37
Drill (e-Stat)
e-Stat Apache Drill http://nagix.hatenablog.com/entry/2015/05/21/232526
-
2015 MapR Technologies 38
-
2015 MapR Technologies 39
-
2015 MapR Technologies 40
Installing Drill (requires JDK 7)
$ wget http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz
$ tar -xvzf apache-drill-1.1.0.tar.gz
$ apache-drill-1.1.0/bin/drill-embedded
0: jdbc:drill:zk=local>
-
2015 MapR Technologies 41
$ ls -l
-
2015 MapR Technologies 42
README
$ cat README
-
2015 MapR Technologies 43
-
2015 MapR Technologies 44
MySQL
DROP TABLE IF EXISTS ``;
CREATE TABLE `` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `createdon` timestamp NULL DEFAULT NULL,
  `createdby` int(11) DEFAULT NULL,
  ...
) ENGINE=InnoDB AUTO_INCREMENT=36993336 DEFAULT CHARSET=utf8;
LOCK TABLES `` WRITE;
INSERT INTO `` VALUES (9,'2002-01-17 02:15:08',0,'2011-10-14 13:47:31',20,2,2,1,1,0,19630, ... ),( ... ), ... ,( ... );
INSERT INTO `` VALUES (2297,'2002-03-19 22:13:14',0,'2011-10-14 15:47:29',11,3,2,1,2,0,21891, ... ),( ... ), ... ,( ... );
...
-
2015 MapR Technologies 45
CSV
-
2015 MapR Technologies 46
MySQL dump to CSV
#!/usr/bin/perl
while (<>) {
  s/^(--|\/\*| |\)|DROP|CREATE|LOCK).*//g; #
  s/^INSERT INTO .+ VALUES \(//g; # INSERT
  s/(?
-
2015 MapR Technologies 47
SELECT against the CSV file
About 31.97 million rows
0: jdbc:drill:zk=local> SELECT count(*) FROM dfs.`/tmp/.csv`;
+-----------+
| EXPR$0    |
+-----------+
| 31971575  |
+-----------+
1 row selected (32.733 seconds)
-
2015 MapR Technologies 48
CSV SELECT
Each CSV row is returned as a single array column named columns: [a,b,...]
0: jdbc:drill:zk=local> !set maxwidth 160
0: jdbc:drill:zk=local> SELECT * FROM dfs.`/tmp/.csv` LIMIT 3;
+---------+
| columns |
+---------+
| ["9","2002-01-17 02:15:08","0","2011-10-14 13:47:31","20","2","2","1","1","0","19630","","",""," Ave.","Suite ","To |
| ["10","2002-01-17 02:22:35","0","2011-10-14 13:47:31","10","2","3","2","2","0","19631","","",""," Ave","","York Region"," |
| ["11","2002-01-17 20:17:27","0","2011-10-14 13:47:32","0","2","2","1","2","0","19632","","","","","","Toronto",""," |
+---------+
3 rows selected (0.564 seconds)
-
2015 MapR Technologies 49
CSV SELECT
columns[0], columns[1]
0: jdbc:drill:zk=local> SELECT columns[0], columns[1], columns[2], columns[3], columns[4] FROM dfs.`/tmp/.csv` LIMIT 3;
+---------+----------------------+---------+----------------------+---------+
| EXPR$0  | EXPR$1               | EXPR$2  | EXPR$3               | EXPR$4  |
+---------+----------------------+---------+----------------------+---------+
| 9       | 2002-01-17 02:15:08  | 0       | 2011-10-14 13:47:31  | 20      |
| 10      | 2002-01-17 02:22:35  | 0       | 2011-10-14 13:47:31  | 10      |
| 11      | 2002-01-17 20:17:27  | 0       | 2011-10-14 13:47:32  | 0       |
+---------+----------------------+---------+----------------------+---------+
3 rows selected (0.356 seconds)
-
2015 MapR Technologies 50
CSV SELECT
MySQL
0: jdbc:drill:zk=local> SELECT columns[0] AS id, columns[1] AS createdon, columns[2] AS createdby, columns[3] AS updatedon, columns[4] AS updatedby FROM dfs.`/tmp/.csv` LIMIT 3;
+-----+----------------------+------------+----------------------+------------+
| id  | createdon            | createdby  | updatedon            | updatedby  |
+-----+----------------------+------------+----------------------+------------+
| 9   | 2002-01-17 02:15:08  | 0          | 2011-10-14 13:47:31  | 20         |
| 10  | 2002-01-17 02:22:35  | 0          | 2011-10-14 13:47:31  | 10         |
| 11  | 2002-01-17 20:17:27  | 0          | 2011-10-14 13:47:32  | 0          |
+-----+----------------------+------------+----------------------+------------+
3 rows selected (0.327 seconds)
-
2015 MapR Technologies 51
CSV SELECT
CSV values are read as VARCHAR; convert them with CAST(<column> AS <type>)
0: jdbc:drill:zk=local> SELECT CAST(columns[0] AS INT) AS id, CAST(columns[1] AS TIMESTAMP) AS createdon, CAST(columns[2] AS INT) AS createdby, CAST(columns[3] AS TIMESTAMP) AS updatedon, CAST(columns[4] AS INT) AS updatedby FROM dfs.`/tmp/.csv` LIMIT 3;
Error: SYSTEM ERROR: NumberFormatException:
Fragment 1:2
[Error Id: 33d800c9-78ea-473a-8e41-b13e38307af3 on node1:31010] (state=,code=0)
-
2015 MapR Technologies 52
Treating empty CSV fields as NULL
Option 1: a CASE expression
CASE WHEN columns[2] = '' THEN NULL ELSE CAST(columns[2] AS INT) END
Option 2: a system option
0: jdbc:drill:zk=local> ALTER SYSTEM SET `drill.exec.functions.cast_empty_string_to_null` = true;
+-------+----------------------------------------------------------+
| ok    | summary                                                  |
+-------+----------------------------------------------------------+
| true  | drill.exec.functions.cast_empty_string_to_null updated.  |
+-------+----------------------------------------------------------+
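The failure and the fix on these slides come down to one rule: an empty string cannot be parsed as a number, so it has to be mapped to NULL before the cast. The Python sketch below mirrors the CASE expression for illustration only; it is not Drill code, the helper names `cast_int` and `cast_timestamp` are made up, and the sample row is hypothetical.

```python
from datetime import datetime


def cast_int(value):
    """Return None for an empty field, otherwise parse an integer."""
    return None if value == "" else int(value)


def cast_timestamp(value):
    """Return None for an empty field, otherwise parse 'YYYY-MM-DD HH:MM:SS'."""
    return None if value == "" else datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


# Hypothetical row with an empty third field, as a CSV reader would return it.
row = ["9", "2002-01-17 02:15:08", "", "2011-10-14 13:47:31", "20"]

# A plain cast fails on the empty string, much like the NumberFormatException.
try:
    int(row[2])
    plain_cast_failed = False
except ValueError:
    plain_cast_failed = True

typed = (
    cast_int(row[0]),
    cast_timestamp(row[1]),
    cast_int(row[2]),
    cast_timestamp(row[3]),
    cast_int(row[4]),
)
print(plain_cast_failed, typed)
```

The per-column helper corresponds to writing the CASE expression for each column, while the system option applies the same rule to every cast at once.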
-
2015 MapR Technologies 53
CSV SELECT 2
0: jdbc:drill:zk=local> SELECT CAST(columns[0] AS INT) AS id, CAST(columns[1] AS TIMESTAMP) AS createdon, CAST(columns[2] AS INT) AS createdby, CAST(columns[3] AS TIMESTAMP) AS updatedon, CAST(columns[4] AS INT) AS updatedby FROM dfs.`/tmp/.csv` LIMIT 3;
+-----+------------------------+------------+------------------------+------------+
| id  | createdon              | createdby  | updatedon              | updatedby  |
+-----+------------------------+------------+------------------------+------------+
| 9   | 2002-01-17 02:15:08.0  | 0          | 2011-10-14 13:47:31.0  | 20         |
| 10  | 2002-01-17 02:22:35.0  | 0          | 2011-10-14 13:47:31.0  | 10         |
| 11  | 2002-01-17 20:17:27.0  | 0          | 2011-10-14 13:47:32.0  | 0          |
+-----+------------------------+------------+------------------------+------------+
3 rows selected (0.734 seconds)
-
2015 MapR Technologies 54
25
1 2
0: jdbc:drill:zk=local> SELECT columns[25] AS gender, count(*) AS number, TRUNC(100.0 * count(*) / 31971575, 2) AS percent FROM dfs.`/tmp/.csv` GROUP BY columns[25] ORDER BY columns[25];
+---------+-----------+----------+
| gender  | number    | percent  |
+---------+-----------+----------+
|         | 9809      | 0.03     |
| 0       | 2         | 0.0      |
| 1       | 4414808   | 13.8     |
| 2       | 27546956  | 86.16    |
+---------+-----------+----------+
4 rows selected (31.79 seconds)
-
2015 MapR Technologies 55
0: jdbc:drill:zk=local> SELECT columns[0] AS pnum, columns[1] AS email FROM dfs.`/tmp/.csv` WHERE columns[1] = 'barack.obama@whitehouse.gov';
+-----------+------------------------------+
| pnum      | email                        |
+-----------+------------------------------+
| 12655726  | barack.obama@whitehouse.gov  |
+-----------+------------------------------+
1 row selected (10.566 seconds)
-
2015 MapR Technologies 56
The view definition is saved under /tmp as a .view.drill JSON file
0: jdbc:drill:zk=local> CREATE VIEW dfs.tmp.`` AS SELECT
. . . . . . . . . . . > CAST(columns[0] AS INT) AS id,
. . . . . . . . . . . > CAST(columns[1] AS TIMESTAMP) AS createdon,
. . . . . . . . . . . > CAST(columns[2] AS INT) AS createdby,
. . . . . . . . . . . > CAST(columns[3] AS TIMESTAMP) AS updatedon,
. . . . . . . . . . . > CAST(columns[4] AS INT) AS updatedby
. . . . . . . . . . . > ...
. . . . . . . . . . . > FROM
. . . . . . . . . . . > dfs.`/tmp/.csv`
. . . . . . . . . . . > ;
-
2015 MapR Technologies 57
2642 CSV files
$ ls Transactions
2008-03-21_downloaded.csv 2010-08-19_downloaded.csv 2013-01-16_downloaded.csv
2008-03-22_downloaded.csv 2010-08-20_downloaded.csv 2013-01-17_downloaded.csv
2008-03-23_downloaded.csv 2010-08-21_downloaded.csv 2013-01-18_downloaded.csv
2008-03-24_downloaded.csv 2010-08-22_downloaded.csv 2013-01-19_downloaded.csv
2008-03-25_downloaded.csv 2010-08-23_downloaded.csv 2013-01-20_downloaded.csv
2008-03-26_downloaded.csv 2010-08-24_downloaded.csv 2013-01-21_downloaded.csv
2008-03-27_downloaded.csv 2010-08-25_downloaded.csv 2013-01-22_downloaded.csv
2008-03-28_downloaded.csv 2010-08-26_downloaded.csv 2013-01-23_downloaded.csv
2008-03-29_downloaded.csv 2010-08-27_downloaded.csv 2013-01-24_downloaded.csv
2008-03-30_downloaded.csv 2010-08-28_downloaded.csv 2013-01-25_downloaded.csv
2008-03-31_downloaded.csv 2010-08-29_downloaded.csv 2013-01-26_downloaded.csv
2008-04-01_downloaded.csv 2010-08-30_downloaded.csv 2013-01-27_downloaded.csv
2008-04-02_downloaded.csv 2010-08-31_downloaded.csv 2013-01-28_downloaded.csv
2008-04-03_downloaded.csv 2010-09-01_downloaded.csv 2013-01-29_downloaded.csv
...
-
2015 MapR Technologies 58
Top 10
0: jdbc:drill:zk=local> SELECT columns[19] AS TXT_COUNTRY, count(*) AS number from dfs.`/tmp/Transactions` GROUP BY columns[19] ORDER BY count(*) DESC LIMIT 10;
+--------------+----------+
| TXT_COUNTRY  | number   |
+--------------+----------+
| US           | 7591509  |
| CA           | 823746   |
| BR           | 197032   |
| AU           | 146745   |
| TW           | 118338   |
| CL           | 109875   |
| ZA           | 78126    |
| AR           | 75314    |
| JP           | 74165    |
| GB           | 57901    |
+--------------+----------+
-
2015 MapR Technologies 59
CSV
$ cd Transactions
$ for file in `ls *.csv`; do
> dir=`echo $file | cut -c 1-7 | tr - /`
> if [ ! -d $dir ]; then
> mkdir -p $dir
> fi
> mv $file $dir
> done
$ ls
2008 2009 2010 2011 2012 2013 2014 2015
$ ls 2008
03 04 05 06 07 08 09 10 11 12
$ ls 2008/03
2008-03-21_downloaded.csv 2008-03-25_downloaded.csv 2008-03-29_downloaded.csv
2008-03-22_downloaded.csv 2008-03-26_downloaded.csv 2008-03-30_downloaded.csv
2008-03-23_downloaded.csv 2008-03-27_downloaded.csv 2008-03-31_downloaded.csv
2008-03-24_downloaded.csv 2008-03-28_downloaded.csv
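The shell loop on this slide takes the first seven characters of each file name (for example 2008-03), turns the hyphen into a slash, and moves the file into that year/month directory. The same step can be sketched in Python as follows; the function name `partition_by_month` is made up, the file names are a small hypothetical sample, and the sketch assumes every name starts with YYYY-MM.

```python
import shutil
import tempfile
from pathlib import Path


def partition_by_month(root):
    """Move each YYYY-MM-DD_*.csv file in root into root/YYYY/MM/."""
    root = Path(root)
    # sorted() materializes the listing before any file is moved.
    for path in sorted(root.glob("*.csv")):
        year, month = path.name[:7].split("-")  # "2008-03" -> ("2008", "03")
        target_dir = root / year / month
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))


# Demonstration in a throwaway directory with three sample file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "2008-03-21_downloaded.csv",
        "2008-04-01_downloaded.csv",
        "2013-01-16_downloaded.csv",
    ):
        (Path(tmp) / name).touch()
    partition_by_month(tmp)
    layout = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.csv"))

print(layout)
```

With this layout, the first directory level holds the year and the second the month, which is what the dir0 and dir1 columns refer to when the top-level directory is queried.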
-
2015 MapR Technologies 60
dir0, dir1
0: jdbc:drill:zk=local> SELECT dir0 AS year, dir1 AS month, TRUNC(SUM(CAST(REGEXP_REPLACE(REGEXP_REPLACE(columns[2], '^\\(', '-'), ',|\\)', '') AS DOUBLE)), 2) AS amount from dfs.`/tmp/Transactions` WHERE columns[2] <> 'AMOUNT' GROUP BY dir0, dir1 ORDER BY dir0, dir1;
+-------+-------+-----------------+
| dir0  | dir1  | amount          |
+-------+-------+-----------------+
| 2008  | 03    | 97676.25        |
| 2008  | 04    | 266162.39       |
| 2008  | 05    | 1330456.45      |
| 2008  | 06    | 1630110.26      |
| 2008  | 07    | 2590733.03      |
| 2008  | 08    | 2743130.11      |
| 2008  | 09    | 2436655.66      |
| 2008  | 10    | 2534268.59      |
| 2008  | 11    | 2934391.31      |
...
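The nested REGEXP_REPLACE calls in this query turn an accounting-style amount into something that can be cast to a number: a leading "(" becomes a minus sign, and commas and the closing ")" are dropped. A minimal Python sketch of that cleanup plus the grouping by dir0 and dir1 follows. The function names and the sample rows are assumptions for illustration, and the TRUNC to two decimals is left out.

```python
import re
from collections import defaultdict


def parse_amount(text):
    """Mirror the two REGEXP_REPLACE steps: '(1,234.50)' -> -1234.5."""
    text = re.sub(r"^\(", "-", text)   # leading "(" marks a negative amount
    text = re.sub(r",|\)", "", text)   # drop thousands separators and ")"
    return float(text)


def monthly_totals(rows):
    """Sum amounts per (dir0, dir1), skipping header rows whose amount is 'AMOUNT'."""
    totals = defaultdict(float)
    for year, month, amount in rows:
        if amount != "AMOUNT":
            totals[(year, month)] += parse_amount(amount)
    return dict(sorted(totals.items()))


# Hypothetical (dir0, dir1, columns[2]) triples, including one header row.
sample_rows = [
    ("2008", "03", "AMOUNT"),
    ("2008", "03", "100.25"),
    ("2008", "03", "(0.25)"),
    ("2008", "04", "1,000"),
]
totals = monthly_totals(sample_rows)
print(totals)
```

The header check plays the role of the WHERE clause, which is needed because each CSV file carries its own header line.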
-
2015 MapR Technologies 61
-
2015 MapR Technologies 62
Apache Drill
-
2015 MapR Technologies 63
Q & A @mapr_japan maprjapan
sales-jp@mapr.com
MapR
maprtech
mapr-technologies