Predictive Analytics (Not Only) for Big Data. Peter Mravčák, SAP Slovensko s.r.o.

Post on 16-Feb-2019

Predictive Analytics and Big Data: Making (Business) Sense Out of Big Data

© 2012 SAP AG. All rights reserved. 3

Digital Era: Big Data

GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops

The Potential to Connect "Things" (and Get Data) Is Immense

Source: Beecham Research

• Buildings (Commercial, Institutional, Industrial): HVAC, Transport, Fire & Safety, Lighting, Security, Access, etc.
• Energy (Supply/Demand, Alternative, Oil/Gas): Turbines, Windmills, UPS, Batteries, Generators, Meters, Drills, Fuel Cells, etc.
• Consumer & Home (Infrastructure, Awareness & Safety, Comfort & Convenience): TVs, Power Systems, Dishwashers, Lighting, Washers/Dryers, Meters/Lights, Alarms, etc.
• Healthcare & Life Sciences (Care, In Vivo/Home, Research): MRI, PDAs, Implants, Surgical Equipment, Pumps, Monitors, Telemedicine, etc.
• Industrial (Resource Automation, Fluid/Processes, Converting/Discrete): Pumps, Valves, Vats, Conveyors, Pipelines, Motors, Drives, Converting, Fabrication, Assembly/Packaging, Vessels/Tanks, etc.
• Transportation (Trans Systems, Vehicles, Non-Vehicular): Vehicles, Lights, Ships, Planes, Signage, Tolls, Containers, etc.
• Retail (Stores, Hospitality, Specialty): POS Terminals, Tags, Cash Registers, Vending Machines, Signs, etc.
• Security / Public Safety (Tracking, Equipment, Surveillance): Cars, Ambulances, Fire, Breakdown, Lone Worker, Homeland Security, Environment Monitor, etc.
• IT & Networks (Enterprise, Public): Servers, Storage, PCs, Routers, Switches, PBXs, etc.

Data, data, data... So what?

Predictive Analytics: Extracting Information/Knowledge from Data

Data sources: CRM, ERP, Billing, Clickstreams, Mobile, Social Media, Logs, Sensors, Email, IoT

Example: Machine Health Prediction

The same principles apply to churn, propensity-to-buy, Ebola detection and similar predictions.

• Predict machine/part failure to lower service costs and increase machine up-time
• Potentially interesting attributes (predictors):
  • Sensor data such as temperatures, pressures, machine conditions
  • Failure codes
  • Status sequences
  • Machine master data

Example: Machine Health Prediction Using Sensor Data

• Measurements: every 10 seconds
• Dataset: 1,200 variables, 140,000 records
• Target variable: failure in the next 24 hours
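The target variable "failure in the next 24 hours" has to be derived from raw timestamps before any model can be trained. The following is a minimal sketch of one way to do that labeling, not the method used in the presented project. The function name `label_failure_within` and the sample timestamps are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def label_failure_within(reading_times, failure_times, horizon=timedelta(hours=24)):
    """Return 1 for each reading followed by a failure within `horizon`, else 0."""
    failures = sorted(failure_times)
    labels = []
    for t in reading_times:
        # Index of the first failure strictly after the reading time.
        i = bisect_right(failures, t)
        has_failure = i < len(failures) and failures[i] - t <= horizon
        labels.append(1 if has_failure else 0)
    return labels

# Hypothetical data: three readings 10 seconds apart, one failure a day later.
start = datetime(2014, 1, 1, 8, 0, 0)
readings = [start + timedelta(seconds=10 * k) for k in range(3)]
failures = [datetime(2014, 1, 2, 8, 0, 15)]
labels = label_failure_within(readings, failures)
print(labels)
```

Only the last reading falls within 24 hours of the failure, so only it is labeled positive. Sorting the failures once and using binary search keeps the labeling cheap even with one record every 10 seconds.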

Predictive Analytics 101

What Is Predictive Analytics?

Typical questions predictive analytics / data mining may answer:

• Classification: Who will (need intervention | be at risk of fraud) next (week | month | year)?
• Prediction: How will (debt | crime | budget) be next (week | month)?
• Forecasting: How will the (budget | debt | grant) be over the next (year | month)?
• Clustering/Segmentation: What are the groups of (constituents | businesses | employees) with a similar (behaviour | profile)?
• (Social) Network Analysis: Analyse interactions to identify (communities | influencers)
• Association Rules: Analyse transactions to identify events likely to occur together

Doing Predictions Is Basically a Four-Step Approach

Historic data is used to learn. These learnings are used to create prediction models. These models can then be applied to current data. They need to be systematically controlled and maintained to ensure the best possible results.

Diagram labels: Historic Data (transactions, demographics, sensors, ...); Prediction Models; Current Data (transactions, sensors, ...); Control & Maintenance
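The four steps can be sketched in a few lines of code. This is a toy illustration under strong assumptions: the "model" is a single learned threshold on one hypothetical sensor value, and the function names and the 0.8 accuracy limit are invented for the example. Real tools fit far richer models.

```python
def learn_threshold(historic):
    """Steps 1-2: learn a one-variable model (a threshold) from labeled history.

    `historic` is a list of (sensor_value, failed) pairs. The model predicts
    failure when the value is at or above the learned threshold.
    """
    candidates = sorted({value for value, _ in historic})

    def accuracy(threshold):
        hits = sum((value >= threshold) == bool(failed) for value, failed in historic)
        return hits / len(historic)

    return max(candidates, key=accuracy)

def apply_model(threshold, current_values):
    """Step 3: score current data with the learned model."""
    return [int(value >= threshold) for value in current_values]

def needs_retraining(threshold, recent_labeled, min_accuracy=0.8):
    """Step 4: flag the model when accuracy on recent labeled data drops."""
    hits = sum((value >= threshold) == bool(failed) for value, failed in recent_labeled)
    return hits / len(recent_labeled) < min_accuracy

# Hypothetical history: (temperature, failed within 24h)
historic = [(61.0, 0), (64.5, 0), (70.2, 0), (88.1, 1), (91.7, 1), (95.3, 1)]
model = learn_threshold(historic)
scores = apply_model(model, [66.0, 93.0])
stale = needs_retraining(model, [(62.0, 1), (90.0, 0), (94.0, 1)])
print(model, scores, stale)
```

The last call simulates a change in machine behaviour: the learned threshold no longer separates failures from normal operation, so the control step asks for the model to be rebuilt.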

Creating Analytical Dataset: Example

• Real-time sensor data (billion records/year)
• CRM, ERP, EAM, DWH data

Figure: source systems (CRM, ERP, Sensor data, DWH) feed one analytical record whose variables (Var1, Var2, Var3, ...) are grouped by domain (Domain 1, Domain 2).
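An analytical record is one row per entity that combines attributes from several source systems. The sketch below assumes two hypothetical extracts, ERP master data and raw sensor readings, and joins them into one record per machine. All field names are illustrative.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical source extracts; field names are illustrative only.
erp_machines = {
    "M1": {"machine_type": "pump", "age_years": 4},
    "M2": {"machine_type": "drill", "age_years": 9},
}
sensor_readings = [
    {"machine_id": "M1", "temperature": 70.0},
    {"machine_id": "M1", "temperature": 74.0},
    {"machine_id": "M2", "temperature": 90.0},
]

def build_analytical_records(master, readings):
    """Join master data with per-machine sensor summaries, one record per machine."""
    temps = defaultdict(list)
    for reading in readings:
        temps[reading["machine_id"]].append(reading["temperature"])

    records = []
    for machine_id, attrs in sorted(master.items()):
        values = temps.get(machine_id, [])
        record = {"machine_id": machine_id, **attrs}
        record["temp_mean"] = mean(values) if values else None
        record["temp_max"] = max(values) if values else None
        record["reading_count"] = len(values)
        records.append(record)
    return records

records = build_analytical_records(erp_machines, sensor_readings)
for record in records:
    print(record)
```

Machines without any readings still get a record, with the sensor summaries left empty, so that missing data stays visible to the modeling step.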

Doing Predictions...

Diagram labels: Historic Data (sensors, transactions, ...); Derived attributes; Prediction Models; Current Data (sensors, ...); Control & Maintenance

Enriching Analytical Dataset: Derived Attributes

Derived attributes:
• Time window aggregates
• Sequences
• Text/log analytics
• Link/network/social attributes
• Co-location events/transactions
• Geolocation path identification

Figure: source systems (CRM, ERP, Sensor data, DWH) feed the analytical record, whose variables Var1, Var2, Var3, ..., Var n-2, Var n-1, Var n are grouped into Domain 1, Domain 2, ..., Domain N-1, Domain N.
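Time window aggregates are the simplest kind of derived attribute: the same raw events summarized over several trailing windows. The sketch below is an illustration with hypothetical names (`window_aggregates`, the `1h` and `24h` windows) and made-up readings.

```python
from datetime import datetime, timedelta

def window_aggregates(events, as_of, windows):
    """Summarize (timestamp, value) events over trailing windows ending at `as_of`.

    `windows` maps a label to a timedelta; each label yields count, max and mean.
    """
    features = {}
    for name, width in windows.items():
        values = [value for ts, value in events if as_of - width < ts <= as_of]
        features[f"count_{name}"] = len(values)
        features[f"max_{name}"] = max(values) if values else None
        features[f"mean_{name}"] = sum(values) / len(values) if values else None
    return features

# Hypothetical temperature readings for one machine.
as_of = datetime(2014, 1, 2, 12, 0)
events = [
    (datetime(2014, 1, 2, 11, 30), 80.0),
    (datetime(2014, 1, 2, 6, 0), 70.0),
    (datetime(2014, 1, 1, 6, 0), 60.0),
]
features = window_aggregates(
    events, as_of, {"1h": timedelta(hours=1), "24h": timedelta(hours=24)}
)
print(features)
```

Each additional window or statistic multiplies the number of columns, which is how a handful of raw signals can turn into hundreds or thousands of derived attributes.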

SAP InfiniteInsight Explorer: Analytical data sets with clicks, not code

• Metadata-based modelling
• Create 1000s of derived attributes
• Define metadata once
• Select time-stamped population
• Builds analytic dataset automatically

Prepare: Reusable | Reduces Human Error | Self-Service

Prediction Modelling: Families of Problems & Algorithms Used for Predictive Analytics

• Classification: Who will (need intervention | be at risk of fraud) next (week | month | year)?
• Prediction: How will (debt | crime | budget) be next (week | month)?
• Forecasting: How will the (budget | debt | grant) be over the next (year | month)?
• (Social) Network Analysis: Analyse interactions to identify (communities | influencers)
• Clustering/Segmentation: What are the groups of (constituents | businesses | employees) with a similar (behaviour | profile)?
• Association Rules: Analyse transactions to identify events likely to occur together

Modelling with SAP InfiniteInsight

Automation of repeatable and time-consuming modeling steps:
• Missing data
• Outliers
• Skewed distributions
• Correlations
• Data encoding, etc.

Automated model building & optimization
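Two of the preparation steps listed above, missing data and outliers, can be illustrated with a short sketch. The choices here (median imputation and capping at Tukey fences) are common textbook defaults picked for the example; the source does not say which techniques SAP InfiniteInsight applies internally.

```python
from statistics import median, quantiles

def prepare_column(values):
    """Impute missing values with the median, then cap outliers at Tukey fences."""
    present = [v for v in values if v is not None]
    fill = median(present)
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    # Replace None by the median, then clamp every value into [low, high].
    return [min(max(fill if v is None else v, low), high) for v in values]

# Hypothetical sensor column with one gap and one implausible spike.
raw = [10.0, 12.0, None, 11.0, 13.0, 500.0, 12.0, 11.0, 12.0, 13.0, 10.0]
clean = prepare_column(raw)
print(clean)
```

Running the same function over every numeric column is what makes the step repeatable: the rules are defined once and applied without manual inspection of each variable.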

SAP InfiniteInsight Modeler: Predictive power in days, not months

• Fully automated modeling process:
  • Regression
  • Classification
  • Segmentation
  • Time series forecasting
  • Association rules
• Identify key variables
• Executive and operational reports

Build: Easy to Use | Time to Market | More Models

SAP InfiniteInsight (Social) Networks Analysis: Improve insight with (social) networks

• Use link/social variables for enhanced prediction
• Identify communities amongst your customers
• Find influencers to make your campaigns viral

Social: Improve Insight | Extend Reach | Boost ROI

SAP InfiniteInsight Scorer: Put scores into action

• One-click deployment of scores into production environment
• In-database scoring (SQL)
• Interface with business apps via scoring equations in:
  • C++
  • Java
  • PMML
  • SAS

Deploy: Non-Intrusive | Time to Value | Repeatable
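In-database scoring means the model is expressed as a SQL expression, so scores are computed where the data lives. The sketch below shows the idea for a hypothetical linear model, using SQLite from the Python standard library as a stand-in database. The function `scoring_sql`, the table, the columns and the coefficients are all assumptions made for this example and do not reproduce the code that SAP InfiniteInsight generates.

```python
import sqlite3

def scoring_sql(intercept, coefficients, table):
    """Render a linear model as a SQL query (identifiers are trusted, not user input)."""
    terms = " + ".join(
        f"({weight!r} * {column})" for column, weight in coefficients.items()
    )
    return f"SELECT machine_id, {intercept!r} + {terms} AS score FROM {table}"

# Hypothetical table of current sensor data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (machine_id TEXT, temperature REAL, vibration REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("M1", 60.0, 0.5), ("M2", 90.0, 1.5)],
)

# Hypothetical model: score = -3.0 + 0.05 * temperature + 1.2 * vibration
query = scoring_sql(-3.0, {"temperature": 0.05, "vibration": 1.2}, "readings")
scores = dict(conn.execute(query).fetchall())
print(query)
print(scores)
```

Because the score is plain SQL, it can run inside a view, a scheduled job or an application query without moving the data to a separate scoring engine.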

SAP InfiniteInsight Factory: Every model at peak performance

• Refresh analytic data sets and models automatically
• Deploy scores to production
• Alert on data and model deviations

Improve: No Programming | Scale | Manage By Exception
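Alerting on data deviations requires a measure of how far current data has drifted from the data the model was trained on. The slide does not name a method; the sketch below swaps in the population stability index (PSI), a widely used drift measure, and treats the 0.25 alert limit as a rule-of-thumb assumption. Bin edges and sample values are hypothetical.

```python
from math import log

def population_stability_index(expected, actual, edges):
    """PSI between two samples, using shared bin edges (values >= edge go right)."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for value in values:
            counts[sum(value >= edge for edge in edges)] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    expected_shares, actual_shares = shares(expected), shares(actual)
    return sum(
        (a - e) * log(a / e) for e, a in zip(expected_shares, actual_shares)
    )

# Hypothetical temperatures: training period versus the current period.
training = [60, 62, 65, 70, 72, 75, 80, 85]
current = [62, 70, 78, 80, 82, 85, 88, 90]
edges = [65, 75]

psi = population_stability_index(training, current, edges)
alert = psi > 0.25  # assumed rule-of-thumb limit for a significant shift
print(round(psi, 3), alert)
```

Computing such an index for every input variable and every model score is what allows management by exception: only the models whose data has shifted need attention.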

Predictive Analytics and Big Data: Challenges

Challenge No. 1: The traditional predictive analytics approach takes too long to prepare, deploy and manage models

• Manual
• Repetitive
• Prone to error

Source: Adapted from Phases of the Pattern Mining Process by Gartner

Many models needed to cover all needs

• N machine types × M failure types = N × M models
• K marketing campaigns × N customer segments × M communication channels = K × N × M models
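The multiplication is easy to underestimate, so here is a small sketch that enumerates the model combinations. The category lists and the values chosen for N, M and K are hypothetical; the slide only gives the symbols.

```python
from itertools import product

# Hypothetical categories; the slide only names the symbols N, M and K.
machine_types = ["pump", "drill", "turbine"]   # N = 3
failure_types = ["overheating", "leak"]        # M = 2
machine_models = list(product(machine_types, failure_types))

campaigns = range(4)   # K = 4
segments = range(5)    # N = 5
channels = range(3)    # M = 3
marketing_models = list(product(campaigns, segments, channels))

print(len(machine_models), len(marketing_models))
```

Even these small numbers give 6 machine-health models and 60 marketing models, each of which has to be built, deployed and monitored.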

Need for Faster and Better Predictive Models

• Advanced techniques & tools for ADS (analytical dataset) building
• Automation of model creation
  • Automated data preparation
  • High quality: comparable or better models
  • Short "Time-to-Market"
  • Low TCO per model
• Production use in real-time environments
• Control & maintenance of models

SAP InfiniteInsight

• Dataset automation
• Model transformed into SQL code
• SQL code deployed to real-time environments

Challenge No. 2: Traditional predictive modeling can't handle Big Data

• Can't scale across wide data sets
• Hard to interpret semi-structured and unstructured data
• Exhausts data scientists' "a priori" knowledge

High Dimensionality Problems: data preparation & encoding, model overfitting, scalability of modeling & deployment

• Before 2010 (Transactions): CRM, ERP, Billing; Profile, Products, Purchase History, Usage
• Now (Behaviors): SmartGrid, Web, Mobile, Social Media, Logs, Sensors, IoT, M2M, CRM, ERP, Campaign

Figure labels: 100s of Derived Attributes; Big Data; Handcrafted; SAP InfiniteInsight

SAP InfiniteInsight: Automated Machine Learning Approach

The more data, the better the models. More data (generally) "beats" better algorithms.

• 20 variables: demographics + simple aggregates
• 500 variables: time-pivoted behavior, social communities

SAP InfiniteInsight: Automated Machine Learning Approach

Identify any and all information that has predictive power
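One simple way to look for predictive power across many variables is to rank them by their relationship with the target. The sketch below uses the absolute Pearson correlation as a stand-in measure; this is an assumption for illustration, it only captures linear relationships, and the source does not describe the measure SAP InfiniteInsight actually uses. Column names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column carries no information about the target.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0

def rank_variables(columns, target):
    """Order variable names from strongest to weakest absolute correlation."""
    strength = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(strength, key=strength.get, reverse=True)

# Hypothetical analytical dataset: three candidate variables, one binary target.
columns = {
    "temperature": [60, 65, 70, 88, 92, 95],
    "humidity": [40, 55, 42, 50, 41, 52],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
target = [0, 0, 0, 1, 1, 1]
ranking = rank_variables(columns, target)
print(ranking)
```

With thousands of candidate variables the same loop simply runs longer, which is why automated screening scales where manual variable selection does not.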

Real-life Examples: Deep Insight from Big Data

Customer                 No. of Variables
UniCredit                675
Cox                      800
Sears                    900
Large Wireless Telco     1,000
Lowe's                   1,100
Mobilink                 1,100
Large UK Retail Bank     2,000
Experian                 2,000
Vodafone D2              2,500
MonotaRO                 2,500
Bell Canada              3,000
Rogers Wireless          3,000
Discover                 10,000
U.S. eBusiness           15,000
Shutterfly               28,000+

SAP Predictive Analytics and Big Data: Summary

Why SAP InfiniteInsight?

• Productive: Predictive analytics process made efficient. Automated data preparation, modeling and deployment tasks. Models in minutes or hours.
• No PhD Required: Easy yet sophisticated. Model building and deployment in clicks.
• Big Data Made Easy: Scales for terabytes and petabytes of data. Rapid insight from 1,000s and 10,000s of variables with no expert intervention.
• Fast & Accurate: Automation cuts human time. Increased accuracy by including all potentially predictive variables and eliminating manual errors.
• Corporate Knowledge: Models incorporated into the business process. Knowledge shared and retained across the organization.
• Quick Win: Quick installation. Short training. Leapfrog to best-in-class analytics.
• Lower TCO: Leverage existing infrastructure. No need for additional resources. Payback in weeks.

SAP InfiniteInsight & Big Data

• Not just large records, but high dimensions
• Lack of data knowledge, complex domain knowledge
• Textual information, weblogs, transactions, phone calls, location data, sensor signals…
• Fast modeling & scoring, (social) network analysis, recommendations

Some Proof Points

• Allegro: 100M+ personalized recommendations a day
• Mobilink: Social graphs on 70M distinct nodes and 900M links out of 4.3 billion CDRs
• Shutterfly: Model with 28,000+ columns

• Vodafone: Churn and cross-sell management with 700 models
• Rhapsody: Survival analysis
• Mobilink: Find the influencers for a variety of business questions

• E.ON: Analyze call center logs (text) to enhance customer targeting
• Firmenich: Text, memos & chemical attributes used to predict the likelihood of a fragrance to sell
• Vodafone: Variety of data types (transactions, geo, …)

• Mobilink: Graphs built in 30 hours
• Shutterfly: Model built in a day
• Telco: Processing SNA analysis on billions of CDR transactions in hours

Thank You!

Peter Mravčák [email protected]