Getting Started with SAS and Hadoop
TRANSCRIPT
![Page 1: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/1.jpg)
#analyticsx
Copyright © 2016, SAS Institute Inc. All rights reserved.
Getting Started with SAS and Hadoop
Jeff Bailey
![Page 2: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/2.jpg)
Why Hadoop?
![Page 3: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/3.jpg)
HOW MUCH DOES THIS DRIVE COST?
3 TB
![Page 4: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/4.jpg)
HOW MUCH DOES THIS DRIVE COST?
3 TB
Silly, you couldn’t get a 3TB drive in 1980!
1980 $1,312,500,000
![Page 5: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/5.jpg)
HOW MUCH DOES THIS DRIVE COST?
3 TB
That’s $0.03 per GB!
TODAY $69
2010 $270
2005 $3,720
2000 $33,000
1995 $3,360,000
1990 $33,600,000
1985 $315,000,000
1980 $1,312,500,000
![Page 6: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/6.jpg)
HOW MUCH DOES THIS DRIVE COST?
That’s $0.03 per GB!
3 TB
TODAY $92
2010 $270
2005 $3,720
2000 $33,000
1995 $3,360,000
1990 $33,600,000
1985 $315,000,000
1980 $1,312,500,000
Insight: Disk Space is FREE!
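The per-gigabyte figure can be checked with a little arithmetic. The sketch below assumes that 3 TB means 3,000 GB (decimal units), which the slide does not state. The prices are the ones listed on the slide, using the $92 figure for today; with that figure the result rounds to $0.03 per GB.

```python
# Prices for a 3 TB drive as listed on the slide (US dollars).
PRICES_3TB = {
    "1980": 1_312_500_000,
    "1985": 315_000_000,
    "1990": 33_600_000,
    "1995": 3_360_000,
    "2000": 33_000,
    "2005": 3_720,
    "2010": 270,
    "TODAY": 92,
}

GB_PER_3TB = 3_000  # assumption: decimal units, 1 TB = 1,000 GB


def cost_per_gb(price_3tb: float) -> float:
    """Return the price per gigabyte for a 3 TB drive."""
    return price_3tb / GB_PER_3TB


if __name__ == "__main__":
    for year, price in PRICES_3TB.items():
        print(f"{year:>5}: ${cost_per_gb(price):,.2f} per GB")
```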
![Page 7: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/7.jpg)
IT’S NOT JUST ABOUT COST!
3 TB
How long does it take to read 3 TB of data?
![Page 8: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/8.jpg)
IT’S NOT JUST ABOUT COST!
How long does it take to read 3 TB of data?
3 TB: 4.17 Hours
![Page 9: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/9.jpg)
IT’S NOT JUST ABOUT COST!
How long does it take to read 3 TB?
3 TB: 4.17 Hours
3 TB: 4.17 Hours
What happens if you add more disks?
![Page 10: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/10.jpg)
HOW LONG DOES IT TAKE TO READ A 3 TB FILE?
1 disk: 4.17 hr
100 disks: 2.5 min
1000 disks: 15 sec
![Page 11: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/11.jpg)
HOW LONG DOES IT TAKE TO READ A 3 TB FILE?
1 disk: 4.17 hr
100 disks: 2.5 min
1000 disks: 15 sec
Insight: More Disks are FASTER!
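The three timings are consistent with a simple model: each disk streams at roughly 200 MB/s, and the file is split evenly across disks that are read in parallel. That throughput is inferred from the 4.17-hour figure rather than stated on the slide, and the model ignores network and coordination overhead, so treat this as a back-of-the-envelope sketch.

```python
THREE_TB = 3e12  # assumption: decimal terabytes (3 * 10**12 bytes)


def read_time_seconds(size_bytes: float, disks: int,
                      throughput_bytes_per_s: float = 200e6) -> float:
    """Time to read a file spread evenly over `disks`, all read in parallel.

    The 200 MB/s default is an assumption inferred from the slide's
    4.17-hour single-disk figure.
    """
    if disks < 1:
        raise ValueError("disks must be at least 1")
    return size_bytes / (disks * throughput_bytes_per_s)


if __name__ == "__main__":
    for disks in (1, 100, 1000):
        seconds = read_time_seconds(THREE_TB, disks)
        print(f"{disks:>4} disk(s): {seconds:,.0f} s")
```

Under these assumptions the single-disk case takes 15,000 seconds, which is about 4.17 hours.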
![Page 12: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/12.jpg)
What is Hadoop?
![Page 13: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/13.jpg)
Hadoop is a Storage Platform
• Distributed Storage Performs Great
• Data is Replicated
• Reasonable Cost
• Sits on the OS File System
[Diagram: Hadoop stack (HDFS)]
![Page 14: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/14.jpg)
Hadoop is a Processing Platform
• MapReduce/YARN
• Distributed Processing
• Data Locality
• Usually Java
[Diagram: Hadoop stack (YARN / MapReduce; HDFS)]
![Page 15: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/15.jpg)
Apache Pig
• Scripting Language
• Higher level than programming Java MapReduce
• Pig Latin scripts are converted to MapReduce jobs
• Great for joining data
• Great for transforming data
[Diagram: Hadoop stack (Pig; YARN / MapReduce; HDFS)]
![Page 16: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/16.jpg)
Apache Pig: Example Program
• Distributed Processing
• Data Locality
• Map Phase
• Reduce Phase
[Diagram: Cloudera Hadoop stack (YARN / MapReduce; HDFS)]
people = LOAD '/user/training/customers' AS (cust_id, name);
orders = LOAD '/user/training/orders' AS (ord_id, cust_id, cost);
groups = GROUP orders BY cust_id;
totals = FOREACH groups GENERATE group, SUM(orders.cost) AS t;
result = JOIN totals BY group, people BY cust_id;
DUMP result;
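For readers new to Pig Latin, the data flow of the example program on this slide can be traced in plain Python. This only illustrates what the script computes, not how Pig executes it on a cluster. The sample rows and function names are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the two HDFS files.
people = [(1, "Alice"), (2, "Bob")]                    # (cust_id, name)
orders = [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0)]  # (ord_id, cust_id, cost)


def totals_by_customer(order_rows):
    """GROUP orders BY cust_id, then SUM(orders.cost) for each group."""
    totals = defaultdict(float)
    for _ord_id, cust_id, cost in order_rows:
        totals[cust_id] += cost
    return dict(totals)


def join_totals_with_people(totals, people_rows):
    """JOIN totals BY group, people BY cust_id (an inner join)."""
    return [(cust_id, totals[cust_id], cust_id, name)
            for cust_id, name in people_rows if cust_id in totals]


if __name__ == "__main__":
    result = join_totals_with_people(totals_by_customer(orders), people)
    for row in result:  # plays the role of DUMP result
        print(row)
```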
![Page 17: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/17.jpg)
Apache Hive
• SQL on Hadoop
• Similar to traditional SQL
• Reduces development time
• Enables BI on Hadoop
• Schema-on-Read
• You choose underlying file format
[Diagram: Hadoop stack (Pig, Hive2; YARN / MapReduce; HDFS)]
![Page 18: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/18.jpg)
Apache Hive
• SQL on Hadoop
• Similar to traditional SQL
• Reduces development time
• Enables BI on Hadoop
• Schema-on-Read
• You choose underlying file format
[Diagram: Cloudera Hadoop stack (Pig, Hive2; YARN / MapReduce; HDFS)]
SELECT zipcode, SUM(cost) AS total
FROM customers
JOIN orders
ON (customers.cust_id = orders.cust_id)
WHERE zipcode LIKE '63%'
GROUP BY zipcode
ORDER BY total DESC;
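The Hive query on this slide uses only constructs it shares with standard SQL, so its logic can be tried locally without a cluster. The sketch below runs it against an in-memory SQLite database from Python. SQLite stands in for Hive purely for illustration, and the table layouts and rows are invented.

```python
import sqlite3

# The query from the slide, unchanged.
QUERY = """
SELECT zipcode, SUM(cost) AS total
FROM customers
JOIN orders
ON (customers.cust_id = orders.cust_id)
WHERE zipcode LIKE '63%'
GROUP BY zipcode
ORDER BY total DESC;
"""


def run_example():
    """Load hypothetical rows into SQLite and run the slide's query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (cust_id INTEGER, name TEXT, zipcode TEXT);
        CREATE TABLE orders (ord_id INTEGER, cust_id INTEGER, cost REAL);
        INSERT INTO customers VALUES (1, 'Alice', '63101');
        INSERT INTO customers VALUES (2, 'Bob', '63105');
        INSERT INTO customers VALUES (3, 'Carol', '27513');
        INSERT INTO orders VALUES (10, 1, 25.0);
        INSERT INTO orders VALUES (11, 1, 5.0);
        INSERT INTO orders VALUES (12, 2, 40.0);
        INSERT INTO orders VALUES (13, 3, 99.0);
    """)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for zipcode, total in run_example():
        print(zipcode, total)
```

Only customers whose zipcode starts with 63 contribute, so the row for zipcode 27513 is filtered out before grouping.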
![Page 19: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/19.jpg)
Apache Impala is a SQL Engine
• High-performance SQL engine
• Handles concurrency well
• Does not rely on MapReduce
• Supports a dialect of SQL very similar to Hive’s
• 100% open source
• Apache License
[Diagram: Cloudera Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
![Page 20: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/20.jpg)
How can SAS Interact with Hadoop?
![Page 21: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/21.jpg)
Using Base SAS 9.4 with Hadoop
#1: FILEREF, HDFS (Data Files)
![Page 22: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/22.jpg)
SAS FILENAME Statement for Hadoop
[Diagram: SAS and the Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
![Page 23: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/23.jpg)
SAS FILENAME Statement for Hadoop
[Diagram: SAS and the Cloudera Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
options set=SAS_HADOOP_CONFIG_PATH="\\sashq\cdh45p1";
options set=SAS_HADOOP_JAR_PATH="\\sashq\cdh45";
FILENAME hdp1 hadoop 'test.txt';
/* Write file to HDFS */
data _null_;
    file hdp1;
    put ' Test Test Test';
run;
/* Read file from HDFS */
data test;
    infile hdp1;
    input textline $15.;
run;
![Page 24: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/24.jpg)
Using Base SAS 9.4 with Hadoop
#1: FILEREF, HDFS (Data Files)
#2: PROC Hadoop, Hadoop (MapReduce + HDFS commands)
![Page 25: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/25.jpg)
Hadoop Procedure
• Submit HDFS commands
• Submit MapReduce Jobs
• Submit Pig Latin programs
[Diagram: SAS and the Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
![Page 26: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/26.jpg)
How Do I Submit HDFS Commands?
[Diagram: SAS and the Cloudera Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
filename cfg 'C:\Hadoop_cfg\cdh57.xml';
/* Copy war_and_peace.txt to HDFS. */
/* Copy moby_dick.txt to HDFS. */
proc hadoop options=cfg username="sasxjb" verbose;
    HDFS mkdir='/user/sasxjb/Books';
    HDFS COPYFROMLOCAL="C:\Hadoop_data\moby_dick.txt"
         OUT='/user/sasxjb/Books/moby_dick.txt';
    HDFS COPYFROMLOCAL="C:\Hadoop_data\war_and_peace.txt"
         OUT='/user/sasxjb/Books/war_and_peace.txt';
run;
![Page 27: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/27.jpg)
How Do I Submit MapReduce Jobs?
[Diagram: SAS and the Cloudera Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
filename cfg 'C:\Hadoop_cfg\cdh57.xml';
proc hadoop options=cfg user="sasxjb" verbose;
    mapreduce input='/user/sasxjb/Books/moby_dick.txt'
        output='/user/sasxjb/outBook'
        jar='C:\Hadoop_examples\hadoop-examples-1.2.0.1.3-96.jar'
        outputkey="org.apache.hadoop.io.Text"
        outputvalue="org.apache.hadoop.io.IntWritable"
        reduce="org.apache.hadoop.examples.WordCount$IntSumReducer"
        combine="org.apache.hadoop.examples.WordCount$IntSumReducer"
        map="org.apache.hadoop.examples.WordCount$TokenizerMapper";
run;
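The class names in this job come from the stock Hadoop WordCount example: the mapper emits a (word, 1) pair for every whitespace-separated token, and the reducer, which is also used as the combiner, sums the counts for each word. A minimal single-process Python sketch of that logic might look like the following. The function names are hypothetical and nothing here runs on Hadoop.

```python
from collections import defaultdict


def tokenizer_map(line):
    """Map phase: emit a (word, 1) pair for each whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def int_sum_reduce(word, counts):
    """Reduce phase: sum all counts seen for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, group by key (the shuffle), then reduce, all in one process."""
    grouped = defaultdict(list)
    for line in lines:
        for word, one in tokenizer_map(line):
            grouped[word].append(one)
    return dict(int_sum_reduce(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["Call me Ishmael", "Call me"]))
```

On a cluster the map calls run in parallel near the data blocks, and the grouping step is the shuffle between the map and reduce phases.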
![Page 28: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/28.jpg)
How Do I Submit Pig Latin Programs?
[Diagram: SAS and the Cloudera Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
filename cfg 'C:\Hadoop_cfg\cdh57.xml';
proc hadoop options=cfg username="sasxjb" verbose;
    pig code=pigcode;
run;
![Page 29: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/29.jpg)
Using Base SAS 9.4 with Hadoop
#1: FILEREF, HDFS (Data Files)
#2: PROC Hadoop, Hadoop (MapReduce + HDFS commands)
#3: SAS/ACCESS, HiveServer2 (HiveQL, Result sets)
![Page 30: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/30.jpg)
SAS/ACCESS Interface to Hadoop
• Generates HiveQL
• Connects via JDBC
• Makes Hive tables look like SAS data sets
• Bulk loads directly to HDFS
• Can read directly from HDFS
[Diagram: SAS and the Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
![Page 31: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/31.jpg)
How Does SAS/ACCESS Talk to Hadoop?
proc sql;
    select count(*) from mycdh.customer_dim
    where loyalty_program='Chocolate Club';
quit;
?
![Page 32: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/32.jpg)
How Does SAS/ACCESS Talk to Hadoop?
proc sql;
    select count(*) from mycdh.customer_dim
    where loyalty_program='Chocolate Club';
quit;
SAS Generated This SQL:
select COUNT(*) from `CUSTOMER_DIM` TXT_1
WHERE TXT_1.`loyalty_program` = 'Chocolate Club'
![Page 33: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/33.jpg)
How Does SAS/ACCESS Talk to Hadoop?
OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;
proc sql;
    select count(*) from mycdh.customer_dim
    where loyalty_program='Chocolate Club';
quit;
SAS Generated This SQL:
select COUNT(*) from `CUSTOMER_DIM` TXT_1
WHERE TXT_1.`loyalty_program` = 'Chocolate Club'
![Page 34: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/34.jpg)
SAS/ACCESS Interface to Hadoop
• Generates HiveQL
• Connects via JDBC
• Makes Hive tables look like SAS data sets
• Bulk loads directly to HDFS
• Can read directly from HDFS
[Diagram: SAS and the Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
![Page 35: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/35.jpg)
We Can Write Our Own HiveQL!
Explicit Pass-Through
proc sql;
    connect to hadoop (server=quickstart
                       user=cloudera);
    execute (create table store_cnt
             row format delimited
             fields terminated by '\001'
             stored as parquet
             as
             select customer_rk, count(*) as tot
             from order_fact
             group by customer_rk) by hadoop;
quit;
![Page 36: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/36.jpg)
What about Apache Impala?
SAS/ACCESS Interface to Impala:
• Connects via ODBC
• Makes Hive tables look like SAS data sets
• Bulk loads directly to HDFS
[Diagram: Cloudera Hadoop stack (Pig, Hive2, Impala; YARN / MapReduce; HDFS)]
#4: SAS/ACCESS to Impala, Impala (HiveQL, Result sets)
![Page 37: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/37.jpg)
In-Database: Code Accelerator
![Page 38: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/38.jpg)
proc ds2 indb=yes;
thread tpgm / overwrite=yes;
method run();
set hdplib.intable;
output;
end;
endthread;
run;
data hdplib.outdata
(overwrite=yes);
dcl thread tpgm hdpdata;
method run();
set from hdpdata;
end;
enddata;
run;
quit;
What is SAS In-Database Code Accelerator?
SAS In-Database Code Accelerators let you run SAS code inside Hadoop. With this you get:
• DS2 processing (modern DATA Step)
• More Data Types
• Code Packages
• More Programming Structures
• Parallel Database Operations
• Thread Programs Run Inside Database
![Page 39: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/39.jpg)
In-Database Code Accelerator Runs in Hadoop
[Diagram: Hadoop stack (Pig, Hive2, SAS EP; YARN / MapReduce; HDFS)]
proc ds2 indb=yes;
thread tpgm / overwrite=yes;
method run();
set hdplib.intable;
output;
end;
endthread;
run;
data hdplib.outdata
(overwrite=yes);
dcl thread tpgm hdpdata;
method run();
set from hdpdata;
end;
enddata;
run;
quit;
![Page 40: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/40.jpg)
In-Database: Scoring Accelerator
![Page 41: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/41.jpg)
What is SAS In-Database Scoring Accelerator?
SAS In-Database Scoring Accelerator lets you score models inside the cluster. With this you get:
• Uses the SAS Embedded Process
• Faster Scoring
• Less data movement: score data where it lives
• Uses fewer resources
[Diagram: Hadoop stack (Pig, Hive2, SAS EP; YARN / MapReduce; HDFS)]
![Page 42: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/42.jpg)
What does the Scoring Process Look like?
![Page 43: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/43.jpg)
Data Loader for Hadoop
![Page 44: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/44.jpg)
Data Loader for Hadoop – Self Service Big Data
• Easy to use UI
• Query Data
• Manage Data
• Transform Data
• Run Custom Code
• Move Data
![Page 45: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/45.jpg)
SAS Grid Manager for Hadoop
![Page 46: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/46.jpg)
What is SAS Grid Manager for Hadoop?
![Page 47: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/47.jpg)
Servers
![Page 48: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/48.jpg)
SAS Viya + Hadoop Architecture
Microservices In-Memory Engine
Cloud Analytic Services (CAS)
SAS Data Connector to Hadoop
SAS Data Connect Accelerator for Hadoop
Hadoop
HDFS as infrastructure
![Page 49: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/49.jpg)
Feel Free to Contact Me!
http://www.linkedin.com/in/jeffreydbailey
https://github.com/Jeff-Bailey/SAS13341_SAS_Hadoop
![Page 50: Getting Started with SAS and Hadoop](https://reader034.vdocuments.net/reader034/viewer/2022050714/5b92036f09d3f215288cf467/html5/thumbnails/50.jpg)