Real-time and big data analytics - SAS · data streams event streaming engine real time decision...


Post on 01-Apr-2018


Company Confidential - For Internal Use Only

Copyright © 2014, SAS Institute Inc. All rights reserved.

REAL-TIME AND BIG DATA ANALYTICS

HYPE OR REALITY?

JONAS LIE-NIELSEN, PRINCIPAL SOLUTION ARCHITECT, NORDIC ENTERPRISE ANALYTICS COE


"Big data is what happened when

the cost of storing information became less than the cost of making the decision to throw it away."

George Dyson

BIG DATA THE ERA OF ABUNDANCE


Discovery-centric

Everything is permitted unless it is forbidden

Focus on value

Technology empowered

TWO ERAS … TWO MINDSETS


THE CHANGE FROM SCARCITY TO ABUNDANT THINKING

[Figure labels: scarcity is paired with cost; abundant is paired with value]


BIG DATA INFORMATION SOURCES

Source: Gartner (September 2013)


BIG DATA BUSINESS USE CASES

Source: Hortonworks


HADOOP


WHAT IS HADOOP? DICTIONARY DEFINITION

“Hadoop is one way of using a set of cheap computers to store an enormous amount of data and then to process that data in parallel.”


WHAT IS HADOOP? AS A DATA PLATFORM, STORAGE COSTS ARE MUCH LOWER…

[Chart: Total Cost ($0 to $18,000,000) against Number of Gigabytes (axis ticks at 1, 10, 100, 1000) for Hadoop, Teradata Warehouse Appliance, Oracle Exadata and IBM Netezza]


WHAT IS HADOOP? MAKING HADOOP EASY AND ENTERPRISE READY…


WHAT IS HADOOP? BUZZWORDS: MAP AND REDUCE – PROCESSING DATA!

• Computing power of lots of small servers

• Standard processing approach

• Custom coding to exploit the environment

• Designed for batch processing
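
The map and reduce idea named on this slide can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are assumptions made for the example. On a real cluster the framework runs many mappers and reducers in parallel on separate servers and performs the grouping step between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values that share one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over a batch of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input batch; each string stands for one stored record.
word_counts = run_job(["big data", "Big analytics", "data streams data"])
```

Because each mapper call sees only one record and each reducer call sees only one key, the work can be spread over many small servers, which is what the first bullet refers to.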


BIG DATA ADOPTION RATES

The adoption of Hadoop has started

• Apache Hadoop (pure open source) has the biggest adoption

• The pure players (Cloudera, Hortonworks, MapR) are coming fast

• Low current adoption: Cloudera currently has 350 customers

• Fast growing: Hortonworks has added 250 new customers in the last 5 quarters

• The big vendors follow straight after, with IBM in front


WHY SHOULD YOU CARE?

SOME OF THE ORGANISATIONS THAT PUBLICLY STATE USE OF HADOOP


HOW DOES SAS DEAL WITH THIS?


BIG DATA HOW TO PROCESS IT?

BIG DATA FOCUS IS SHIFTING TO STREAMING DATA ANALYSIS FOR LOW LATENCY DECISION MAKING


SAS DEPLOYMENT PATTERNS

[Diagram: three deployment patterns for SAS with Teradata: Traditional SAS, In-Database, and In-Memory (67xx, 27xx, Teradata 720)]

Even with In-Database processing there will still be some work performed on the SAS server

Even with In-Memory processing there will still be some work performed on the SAS server
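
The difference between the traditional and the in-database pattern can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for the database; the `sales` table, its columns and its rows are hypothetical, and nothing here reflects how SAS or Teradata implement the pattern. The point is only that pushing the aggregation into the database moves fewer rows to the client while giving the same answer.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5), ("south", 1.5)],
)

# Traditional pattern: pull every row to the client, then aggregate there.
client_totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    client_totals[region] = client_totals.get(region, 0.0) + amount

# In-database pattern: the database aggregates; only summary rows travel.
in_db_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Rows that had to cross the wire in each pattern.
rows_moved_traditional = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
rows_moved_in_db = len(in_db_totals)
conn.close()
```

Even in the pushed-down version the client still issues the query and receives the summary, which matches the slide's remark that some work always remains on the SAS server.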


SAS HIGH PERFORMANCE ANALYTICS SOLUTIONS


BIG DATA WHY HIGH PERFORMANCE ANALYTICS?

“We had one customer who was spending about five and a half hours building an attribution model. With high-performance data mining, they’re now building it in about three minutes. Plus, we were able to get a factor of about two times more lift, meaning millions of dollars for the customer in terms of return on investment.”

~ Wayne Thompson, Chief Data Scientist, SAS

“Since in-memory processing is so fast, the time to process advanced analytics on big data is reduced. This frees up more time to actually think differently, experiment with different approaches, fine-tune your champion model, and eventually increase predictive power.”

[Figure: Data volume (Large data to Big data) against Analytics (Medium computation to Heavy computation). Labels: Analytic models based on samples; Omni channel; Marketing optimization; Full scale high frequency analytic models; Social media analytics; Sensor/log analytics; Real time analytics]


SAS HIGH LEVEL ARCHITECTURE

[Diagram layers: Data sources; Data management; Forecasting; Advanced analytics]

[Diagram components: SAS Compute Grid; DWH (In-database); SAP BW; Other sources; XML & files; Data streams; Event Streaming Engine; Real time decision engine; In-Memory Analytics Engine(s); High performance analytics]


SAS IN-DATABASE


THE VALUE OF INDB PERFORMANCE

[Chart: INITIAL PERFORMANCE NUMBERS. TIME (S), 0 to 200, against NUMBER OF ROWS, 0 to 25,000,000, for Hadoop INDB, Not INDB and Greenplum INDB]

From: DS2IP_Hive Demo Slides


THE GOAL OF EP RUN TK IN THE DATABASE

[Diagram, two configurations]

TK runs on Client (Old): SAS Server with SAS Procs and TK; Database with Data

TK runs in Database (New): SAS Server with SAS Procs; Database with Data and Database Processes hosting EP and TK


SAS AND HADOOP


SAS FOR HADOOP TECHNOLOGY DIRECTION

SAS/Access to Hadoop - Extract data from Hadoop into SAS

Embedded Process - Push some SAS processing to Hadoop with Map Reduce

In-Memory Analytics - Use Hadoop for storage persistence and commodity computing.

[Diagram labels: SAS; Hive; Impala; Score A; Code A; HPA; LASR]

Some inspiration: https://www.youtube.com/watch?v=J3b8nMUMo4Y


SAS FOR HADOOP SAS / ACCESS

SAS/Access to Hadoop

SAS/Access to Cloudera Impala

1. SAS/Access to Hadoop – Push some SAS processing to Hadoop via Hive QL

[Diagram labels: SAS SERVER; SAS/ACCESS Hadoop; HADOOP; Data]


SAS FOR HADOOP SAS / EMBEDDED PROCESS

2. SAS Embedded Process – Push SAS data management processing to Hadoop with Map Reduce

SAS/Scoring Accelerator for Hadoop

SAS/Code Accelerator for Hadoop (July 2014)

SAS/Data Quality Accelerator for Hadoop (July 2014)

proc ds2;
  /* thread ~ equivalent to a mapper */
  thread map_program;
    method run();
      set dbmslib.intab;
      /* program statements */
    end;
  endthread;
  run;

  /* program wrapper */
  data hdf.data_reduced;
    dcl thread map_program map_pgm;
    method run();
      set from map_pgm threads=N;
      /* reduce steps */
    end;
  enddata;
  run;
quit;

[Diagram labels: SAS SERVER; SAS Data Step and DS2 Jobs; HADOOP]


SAS HADOOP ARCHITECTURE IN-MEMORY SOLUTION

4. In-Memory Analytics – Process in Memory, use Hadoop for Storage persistence and commodity computing

SAS ANALYTIC HADOOP ENVIRONMENT

[Diagram: SAS® LASR ANALYTIC SERVER made up of several SAS® IN-MEMORY nodes on HADOOP, with WEB CLIENTS and APPLICATIONS]

• Visual Analytics
• Visual Statistics
• Visual Scenario Designer
• In-Memory Statistics
• Visual Data Builder

[Diagram labels, data paths: Streaming (ESP); Distributed (EP/SQOOP); Query (SQL/FTP etc …)]

[Diagram labels, data sources: ERP; SCM; CRM; Images; Audio and Video; Machine Logs; Text; Web and Social]


SAS TYPICAL SOLUTION (ANALYTICS FOCUSED)

[Diagram labels]

Data sources: cdr; Probes & NE counters; alarms; Social data; Unstructured data

High Volume: Incremental, Streaming & batch

Hadoop: Data lake; Analytical tables; DQ gate

SAS analytics; Data Integration; In-Memory Analytics Engine(s) (Deploy, Read)

DWH (In-database, Deploy); Operational system (In-database, Deploy)


SAS® IN-MEMORY STATISTICS FOR HADOOP: HIGH LEVEL ARCHITECTURE

DISTRIBUTED DEPLOYMENT ON COMMODITY HARDWARE (DEDICATED RACK)

[Diagram labels]

WEB-BASED CLIENT: SAS® Studio

MID-TIER; WORKSPACE SERVER; METADATA SERVER (Optional)

IN-MEMORY STORE: SAS® LASR™ ANALYTIC SERVER

MPP DATASTORE, BLADE ENVIRONMENT: HADOOP (Cloudera, Hortonworks); SAS Embedded Process

Other: RDBMS; Nonrelational; Click Stream; PC Files; Hadoop

Notes on the diagram: Not part of IMSTAT; Can be separated


SAS IMSTAT

FEATURE SUMMARY

www.sas.com/hpa

ANALYTICAL LIFECYCLE: DATA MANAGEMENT & EXPLORATION; MODEL DEVELOPMENT; MODEL DEPLOYMENT

Data Management: APPEND, BALANCE, TABLEINFO, COLUMNINFO, COMPUTE, DELETEROWS, DROPTABLE, FETCH, PARTITION, PROMOTE, PURGETEMPTABLES, SCHEMA, SCORE, SET, TABLE, UPDATE

Data Exploration: DISTINCT, BOXPLOT, CORR, CROSSTAB, DISTRIBUTIONINFO, FREQUENCY, HISTOGRAM, KDE, MDSUMMARY, PERCENTILE, SUMMARY, TOPK, GROUPBY

Descriptive Modeling: CLUSTER, ARM

Predictive Modeling: ASSESS, DECISIONTREE, FORECAST, GENMODEL, GLM, LOGISTIC, OPTIMIZE, RANDOMWOODS, Regression Trees

Recommender: CLUSTER, KNN, ARM (Rule mining), SVD, ENSEMBLE methods

Text Analytics: Parsing, SVD, Topic generation, Document projection, Sentiment analysis

Deployment: SCORE

Miscellaneous: EXTERNAL (C API), FREE, REPLAY, SAVE, STORE

[Diagram labels: IMXFER; sasiola; sashdat; Anyfile Reader]


SAS ESP


ANALYTICS AND INSIGHTS ON STREAMING DATA

TECHNOLOGY INTEGRATION

• SAS® Event Stream Processing Engine is integrated into some SAS Solutions and can be deployed at the front-end of most others

• Complements batch and real-time capabilities of SAS solutions with streaming data analysis

STREAMING ANALYTICS

• Enables SAS analytic solutions to process streaming events

• Leverages analytical model results to provide real time insights and action on streaming data

• Enables the deployment of additive and incremental analytic models on streaming data
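
To make "additive and incremental" concrete, here is a minimal sketch of a statistic that is updated one event at a time without storing the stream. It uses Welford's online algorithm for mean and variance, chosen here only as a well-known example of an incremental computation; the slides do not say which models SAS ESP uses, and the class name `RunningStats` and the sample values are assumptions.

```python
class RunningStats:
    """Mean and sample variance updated per event (Welford's online algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 until two observations are seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical stream of numeric readings, processed one at a time.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

Each update touches only three numbers, so memory use stays constant no matter how many events have passed, which is the property that makes a model suitable for scoring on a stream.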


SAS® EVENT STREAM PROCESSING ENGINE: 3 KEY CHARACTERISTICS

TECHNOLOGY

The SAS® ESP Engine provides the architecture to process streams of data and business events, on the move, prior to storage, when events happen

SPEED

The SAS® ESP Engine can process huge volumes of streaming data flowing at very high rates (millions of events/sec) with very short latency (<1 millisecond)

ACTIONABLE INTELLIGENCE

The SAS® ESP Engine filters/aggregates/correlates the stream to focus on and detect specific events, patterns or characteristics that will help the business


SAS EVENT STREAM PROCESSING: DATAFLOW CENTRIC

[Diagram: several Event Streams enter the SAS EVENT STREAM PROCESSING ENGINE as DATA IN (Events) and leave it as DATA OUT (Events)]

Design of the rule model (called “Continuous Query”) using components (called “Windows”)

[Diagram: DATA IN (Events) enters through SOURCE1, SOURCE2 and SOURCE3 WINDOWs, passes through FILTER, CALCULATIONS, JOIN and THRESHOLD WINDOWs, and leaves as DATA OUT (Events)]
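
The idea of a continuous query assembled from windows can be sketched with chained Python generators. This is an illustration of the dataflow style only and does not use or imitate the SAS ESP API: the function names, the event fields (`sensor`, `value`, `moving_avg`), the window size and the threshold are all assumptions, and the join window from the diagram is left out to keep the sketch short.

```python
from collections import deque


def source_window(events):
    """Source window: the entry point where raw events are published."""
    yield from events


def filter_window(stream, predicate):
    """Filter window: pass on only the events that satisfy the predicate."""
    return (event for event in stream if predicate(event))


def calculation_window(stream, size):
    """Calculation window: attach a moving average over the last `size` values."""
    recent = deque(maxlen=size)
    for event in stream:
        recent.append(event["value"])
        yield {**event, "moving_avg": sum(recent) / len(recent)}


def threshold_window(stream, limit):
    """Threshold window: emit an alert when the moving average exceeds `limit`."""
    for event in stream:
        if event["moving_avg"] > limit:
            yield {"sensor": event["sensor"], "moving_avg": event["moving_avg"]}


# Hypothetical sensor readings; None models a missing measurement.
readings = [{"sensor": "s1", "value": v} for v in (10, 12, 50, 60, 11, None, 70)]

# Wire the windows together into one query: source -> filter -> calc -> threshold.
query = threshold_window(
    calculation_window(
        filter_window(source_window(readings), lambda e: e["value"] is not None),
        size=2,
    ),
    limit=40,
)
alerts = list(query)
```

Each event flows through every window as soon as it arrives, so an alert is produced before anything is written to storage, which is the behaviour the slides describe as processing data "on the move".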


BIG DATA TARGET ARCHITECTURE

[Diagram labels: Data & Decision Management; Hadoop Data store; In Memory Data store; Streaming data; dwh; Data store; Analytical tables; transactions; datastream; Real time data stream; batch; Realtime; Event stream processing; DQ gate; Probes & sensors; alarms; alerts; Web Services; Monitoring & Reporting; Data exploration & visualization; Statistical exploration & Modelling; Statistical programming; High performance analytics (scoring, Forecasting, Modelling)]

• Monitoring

• Operation center

• SMS