What's Hot: Big Data Analytics with Hadoop · Pig Hive Map Reduce HDFS Base SAS &...
TRANSCRIPT
Copyright © 2014, SAS Institute Inc. All rights reserved.
Big Data Analytics with Hadoop
Jos van Dongen
Arno Klijnman
What is…
Distributed storage and processing of (big) data on large clusters of commodity hardware
• HDFS
• Map/Reduce
HDFS - Distributed storage for big files
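A minimal sketch of the idea behind distributed storage for big files: a file is cut into fixed-size blocks, and each block is copied to several machines. The 128 MB block size and replication factor of 3 are common Hadoop 2.x defaults, not values stated on the slide. The node names and the round-robin placement are hypothetical; real HDFS placement is rack-aware.

```python
def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Illustrative only: this is NOT the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    nodes = ["node1", "node2", "node3", "node4"]   # hypothetical DataNodes
    for block_id, replicas in place_replicas(len(blocks), nodes).items():
        print(block_id, blocks[block_id], replicas)
```

Because every block lives on several nodes, losing one machine does not lose data, and different blocks of the same file can be read in parallel.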
Map/Reduce - Distributed processing for big data
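The Map/Reduce model can be illustrated with the classic word count, sketched here in plain Python on a single machine. The function names are illustrative and not part of any Hadoop API; on a cluster the framework would run many mappers and reducers in parallel and perform the shuffle between them.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big files"]))
```

Each mapper only needs its own slice of the input and each reducer only needs the values for its own keys, which is what lets the work be spread over many machines.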
The Hadoop Jungle
SAS & Hadoop Capabilities
WITH Hadoop ON Hadoop IN Hadoop
HDFS
• SAS Data Quality Accelerator
• SAS Scoring Accelerator
• SAS Code Accelerator
SAS & Hadoop Integration
Layers: User Interface · Metadata · Data Access · Data Processing · File System
Users: SAS® User · Next-Gen SAS® User
User interfaces: SAS® Data Management · SAS® Visual Analytics · SAS® Visual Statistics · SAS® Enterprise Miner™ · SAS® Studio · SAS® In-memory Statistics for Hadoop
SAS components: SAS Metadata · Base SAS & SAS/ACCESS® to Hadoop™ · In-Memory Data Access · SAS® LASR™ Analytic Server · SAS Embedded Process
Hadoop components: Hive · Pig · Map Reduce · HDFS
Two Paradigms
• Hadoop as a Data Platform
• Hadoop as a core component of next generation analytical platform
MANAGE DATA · EXPLORE DATA · DEVELOP MODELS · DEPLOY & MONITOR
Paradigm two: Hadoop as a core component of next generation analytical platform
MANAGE DATA
• SAS/ACCESS
• SAS Data Management
• SAS Federation Server
• SAS Event Stream Processing
• SAS Data Loader for Hadoop
• SAS Data Quality Accelerator for Hadoop
• SAS Code Accelerator for Hadoop
EXPLORE DATA
• SAS Data Loader for Hadoop
• SAS Visual Analytics
• SAS In-memory Statistics for Hadoop
DEVELOP MODELS
• SAS High Performance Analytics Products
• SAS Visual Statistics
• SAS In-memory Statistics for Hadoop
DEPLOY & MONITOR
• SAS Scoring Accelerator for Hadoop
• SAS Decision Manager
• SAS Visual Analytics
SAS runs the Entire Analytical Lifecycle in/on/with Hadoop
IDENTIFY / FORMULATE PROBLEM
DATA PREPARATION
• BASE SAS
• SAS/ACCESS
• SAS Data Loader for Hadoop
• SAS DI Studio
DATA EXPLORATION
• SAS Visual Analytics
• SAS Visual Statistics
• SAS High Performance Analytics Offerings
• SAS In-Memory Statistics for Hadoop
TRANSFORM & SELECT
• Done using either the Data Preparation, Data Exploration or Build Model Tools
BUILD MODEL
• SAS High Performance Analytics Offerings
• SAS In-Memory Statistics for Hadoop
• SAS Visual Statistics
VALIDATE MODEL
• Done using the Build Model Tools and other checks
DEPLOY MODEL
• SAS Scoring Accelerator for Hadoop
• SAS Code Accelerator for Hadoop
EVALUATE / MONITOR RESULTS
• SAS Visual Analytics
USER ROLES & THE ANALYTICS LIFECYCLE
Lifecycle: IDENTIFY / FORMULATE PROBLEM · DATA PREPARATION · DATA EXPLORATION · TRANSFORM & SELECT · BUILD MODEL · VALIDATE MODEL · DEPLOY MODEL · EVALUATE / MONITOR RESULTS
BUSINESS MANAGER
• Domain Expert
• Makes Decisions
• Evaluates Processes and ROI
IT SYSTEMS / MANAGEMENT
• Model Validation
• Model Deployment
• Data Preparation
BUSINESS ANALYST
• Data Exploration
• Data Visualization
• Report Creation
ANALYST / DATA SCIENTIST
• Exploratory Analysis
• Descriptive Segmentation
• Predictive Modeling
SAS® Data Loader for Hadoop
A new SAS Web-based Business user interface
• Point & Click
• User Menus
• Little or no Hadoop experience needed
• Self-Service UI
• HTML 5 Interface
Enables Self-Service approach to managing data in Hadoop environment
SAS® Data Loader for Hadoop
Transform Data in Hadoop
• Filtering Rules
• Column Selections
• Aggregation
No coding, scripting or specialized skills required
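For readers who want to see what the three transformations amount to, here is a small plain-Python sketch of filtering, column selection and aggregation over rows held as dictionaries. It is not how SAS Data Loader works internally; the function name, the column names and the sample rows are all made up for illustration.

```python
from collections import defaultdict


def transform(rows, predicate, columns, group_by, measure):
    """Filter rows, keep selected columns, then sum `measure` per `group_by`.

    `group_by` and `measure` must both be listed in `columns`.
    """
    # Filtering rule: keep only rows that satisfy the predicate.
    filtered = [row for row in rows if predicate(row)]
    # Column selection: drop every column that was not asked for.
    selected = [{c: row[c] for c in columns} for row in filtered]
    # Aggregation: total the measure for each group.
    totals = defaultdict(float)
    for row in selected:
        totals[row[group_by]] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical sample data.
    sales = [
        {"region": "north", "year": 2014, "amount": 10.0},
        {"region": "north", "year": 2014, "amount": 20.0},
        {"region": "south", "year": 2014, "amount": 5.0},
        {"region": "south", "year": 2013, "amount": 99.0},
    ]
    print(transform(sales, lambda r: r["year"] == 2014,
                    ["region", "amount"], "region", "amount"))
```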
SAS® Data Loader for Hadoop
Query Hadoop data
• Select Source Tables
• Apply Query Criteria
• See subset of data in Table Viewer
Simple Drag & Drop approach to Query Data inside Hadoop
SAS® Data Loader for Hadoop
Profile Hadoop Data
• Select Source Table
• View Reports in Column Display
• View Reports in Table Display
Run standard metrics on data inside Hadoop and generate reports
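The slides do not say which metrics the profile reports contain. As an assumption, the sketch below computes a few metrics that data profiling typically includes (row count, missing values, distinct values, minimum, maximum) for each column of a small in-memory table. The function names and sample rows are hypothetical.

```python
def profile_column(values):
    """Compute basic profiling metrics for one column's values."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dictionaries.

    Column names are taken from the first row.
    """
    columns = list(rows[0].keys()) if rows else []
    return {c: profile_column([row.get(c) for row in rows]) for c in columns}


if __name__ == "__main__":
    # Hypothetical sample data.
    people = [
        {"age": 30, "city": "Huizen"},
        {"age": None, "city": "Huizen"},
        {"age": 41, "city": "Utrecht"},
    ]
    for column, metrics in profile_table(people).items():
        print(column, metrics)
```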
View Data
SAS® Data Loader for Hadoop
Copy Data to distributed SAS® LASR Server
• Select Source Table
• Copy Data To distributed SAS® LASR Servers
• Optional: Visualize Data in SAS® Visual Analytics
Explore Hadoop data quickly and easily for faster insights
SAS Scoring Accelerator for Hadoop
• SAS Model Manager
• Export Score Code (EM, SAS/STAT, VS)
• Scoring File(s)
• Hadoop Publish Macro
Demo flow
Categories: SAS Data Management · SAS Interactive Analytics On Hadoop · SAS Analytics
Tools: SAS DI · Data Loader · VA Explorer · VS · IMSTAT · Scoring Accelerator
• Access to Hadoop
• Transform
• Write back to Hadoop
• Write to LASR
• Show table
• Profile
• Build Query
• Write result to LASR
• Discover relations
• Understand the data
• Discover a model
• Determine significance
• Cluster variables
• Recommendation
• Datastep to enrich original dataset with recommendation results
• Write to LASR
• Deploy model
• Run model
• Back to Data Loader
SAS & Hadoop: 3 Things to Remember
WITH Hadoop ON Hadoop IN Hadoop
HDFS
Demo Environment Infrastructure
AWS-Cloud · Elastic IP Address · Internet
Setup:
- CentOS operating system
- Local users on all Amazon servers
- Internal network for all Amazon servers
- Open firewall for all ports between workstation & server
- No mail server integration
- No SSL
9 October 2014
Huizen