big data and predictive analytics

Big Data and Predictive Analytics: Unravel the BIG mystery. Antarip Biswas, Sept 26th 2013. “In God we trust, all others must bring data”


Post on 23-Feb-2016


TRANSCRIPT

Page 1: Big Data and Predictive Analytics

Big Data and Predictive Analytics

Unravel the BIG mystery

Antarip Biswas

Sept 26th 2013

“In God we trust, all others must bring data”

Page 2: Big Data and Predictive Analytics

Agenda / Table of Contents


Introduction to Big Data

Drivers of Big Data Analytics

Data Sciences

Use Cases and Success Stories – Class 3

Social Media Analytics

Technical Deep Dive, Real Life Projects

Real Life Projects – Class 3

Page 3: Big Data and Predictive Analytics

Use Cases and Success Stories

Page 4: Big Data and Predictive Analytics


Success Stories - FareCast

Air fare prediction

For an online airfare, predicts whether the fare will go UP, go DOWN, or STAY THE SAME in the future

Acquired for $100M by Microsoft

Employed machine learning technologies over big data

Page 5: Big Data and Predictive Analytics


Tesco Loyalty Program

Done by Dunnhumby

Data for Loyalty Program

Basic demographic information such as address, age, gender, the number of members in a household and their ages, and dietary habits.

Purchase history appended

Summary attributes

Cluster analysis

Crucible: a massive database holding not only applicant information and purchase history, but also information purchased and collected elsewhere about participating consumers. Credit reports, loan applications, magazine subscription lists, the Office for National Statistics, and the Land Registry are all sources of additional information stored in Crucible.
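The slide names cluster analysis without saying which method was used. As a rough sketch only, the following uses plain k-means to group households by two summary attributes. The attribute choice, the data, and the starting centroids are all invented for illustration.

```python
import math

# Hypothetical summary attributes per household: (household size, weekly spend).
# A real analysis would scale the attributes first so one does not dominate.
households = [
    (1, 20.0), (2, 25.0), (1, 22.0),      # small households, low spend
    (4, 90.0), (5, 110.0), (4, 100.0),    # larger households, high spend
]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Start from one household of each apparent type (an arbitrary choice).
centroids, clusters = kmeans(households, centroids=[households[0], households[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Each resulting cluster can then be treated as a segment for targeting.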

Page 6: Big Data and Predictive Analytics


Tesco Loyalty Program - Benefits

1. Loyalty

2. Cross-sells

3. Inventory, distribution and store network planning

4. Optimal targeting and use of manufacturer promotions

5. Consumer insight generation and marketing those insights

Tesco achieved a 3.6-fold increase in coupon redemption rates by using big-data predictive analytics to predict which consumers are most likely to redeem which coupons.

Page 7: Big Data and Predictive Analytics


Big Data – Success Story

Page 8: Big Data and Predictive Analytics


Netflix Recommendations

Existing recommendation system – Cinematch

KorBell team was the winner

107 algorithms explored

Machine learning and data mining

Employed SVD (singular value decomposition) and RBM (restricted Boltzmann machines)

Achieved an 8.43% improvement in recommendations over the existing system
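The Netflix Prize scored entries by root-mean-square error (RMSE) on held-out ratings, so an improvement figure like the one above is a relative reduction in RMSE against the baseline system. A minimal sketch of that calculation follows. The ratings and both sets of predictions are made up, and neither stands in for Cinematch or the winning blend.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def improvement_pct(baseline_rmse, new_rmse):
    """Relative RMSE reduction, in percent, of a new model over a baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Made-up held-out ratings and two sets of predictions (illustrative only).
actual = [4, 3, 5, 2, 1]
baseline_predictions = [3.0, 3.0, 4.0, 3.0, 2.0]
candidate_predictions = [3.5, 3.0, 4.5, 2.5, 1.5]

baseline_error = rmse(baseline_predictions, actual)
candidate_error = rmse(candidate_predictions, actual)
print(improvement_pct(baseline_error, candidate_error))
```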

Page 9: Big Data and Predictive Analytics


Google Flu Spread Prediction

Prediction of the spread of flu in real time during the 2009 H1N1 outbreak

Google tested a mammoth 450 million different mathematical models against the search terms, comparing their predictions with the actual flu cases; 45 important parameters were found

The model was put to the test when the H1N1 crisis struck in 2009, and it gave more meaningful and valuable real-time information than any official public health system

Page 10: Big Data and Predictive Analytics


Prediction – High Frequency Trading

Objective: predict the impact of earnings announcements on stock prices

Use historical financial data to get a time series of quarterly expected and actual earnings announcements

Use historical financial data of stock price movements after the announcement

Approach

Categorize stocks by market capitalization so that similar-sized companies are grouped together

Split the historical data into an in-sample (training) set and an out-of-sample (validation) set

Fit a linear regression model on the in-sample data, where the independent variable (feature) is the difference between the actual and estimated earnings and the dependent variable is the impact on stock price

Achieved a return of 1%, or 100 basis points
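The approach above can be sketched for a single market-cap group as an ordinary least-squares fit on the in-sample data, checked against the out-of-sample data. All numbers below are invented, and the 75/25 chronological split is an assumption rather than something the slide specifies.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical observations for one market-cap group, oldest first:
# (actual minus estimated earnings per share, % price move after the announcement)
observations = [
    (-0.10, -2.1), (-0.05, -0.9), (0.00, 0.1), (0.05, 1.0),
    (0.10, 2.0), (0.15, 3.1), (-0.02, -0.5), (0.08, 1.5),
]

# Chronological split: earlier data is in-sample, later data is out-of-sample.
split = int(len(observations) * 0.75)
in_sample, out_sample = observations[:split], observations[split:]

slope, intercept = fit_line(*zip(*in_sample))

# Mean absolute error on the held-out observations.
validation_mae = sum(
    abs((slope * surprise + intercept) - move) for surprise, move in out_sample
) / len(out_sample)
print(slope, intercept, validation_mae)
```

A positive slope means a positive earnings surprise is associated with a price rise in this toy data. The validation error shows how well that relationship holds on data the model has not seen.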

Page 11: Big Data and Predictive Analytics


Predictive Analytics for Couponing

Test Group: list of households from the analytic engine

Control Group: list of households getting the same offer

Run the same campaign on both lists

Verify efficacy of household recommendation by demonstrating significant variance from the Control Group

Measure results: redemption (primary), clips (secondary)

Evaluate impact: Control Group vs. Test Group
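One common way to check for "significant variance" between the two groups is a two-proportion z-test on redemption rates. The slide does not name a test, so this choice is an assumption, and the counts below are invented.

```python
import math

def redemption_lift(test_redeemed, test_size, control_redeemed, control_size):
    """Return (lift, z) comparing test-group and control-group redemption rates."""
    p_test = test_redeemed / test_size
    p_control = control_redeemed / control_size
    # Pooled rate under the null hypothesis that both groups redeem equally.
    pooled = (test_redeemed + control_redeemed) / (test_size + control_size)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / test_size + 1 / control_size))
    return p_test / p_control, (p_test - p_control) / std_err

# Hypothetical campaign: 1,000 households in each list.
lift, z = redemption_lift(test_redeemed=120, test_size=1000,
                          control_redeemed=80, control_size=1000)
print(lift, z)  # |z| above 1.96 is significant at the usual 5% level
```

The same function can be run on clips, the secondary measure, by passing clip counts instead of redemptions.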

Page 12: Big Data and Predictive Analytics


Improve Recommendations/Allocations

Customer deviation in buying behavior refined by customer profile changes

• Taxonomy-based approach to identify business semantics

Major events that determine a change in buying pattern: location change, change in marital status, change in income group, birth of a child, …

Sources for this information: social channels, purchase deviation, …

Identify specific product categories relevant for the major event

Association of product categories to various customer classifications

For instance, customers with kids buy candies, and customers with pets buy pet food

[Diagram: recommendation pipeline. Labels: Exploratory techniques; Customer Transactions; Customer 360; Association & Clustering; Time Series; Customer groups based on classifications; Cluster assignments; Probabilistic product affinities based on segment’s behavior; Time-specific product and associated products; Product classification and customer segment association; Products eligible for recommendation; Products list for target customer’s cluster; Matching / Filtering; Target Recommendation; Personalized Recommendation List; Campaign results; Refine classifiers]

Page 13: Big Data and Predictive Analytics


Improve Recommendations/Allocations

Products bought by similar customers, but not by current customer

• Identification of similar customers more accurately with availability of extensive profile information

Classification of customers by predetermined attributes

Usage of exploratory techniques to identify clusters of similar customers

• Identify product propensity for specific segments

Determined by clustering and classification techniques

[Diagram: recommendation pipeline. Labels: Exploratory techniques; Customer Transactions; Customer 360 - NoSQL; Association & Clustering; Segment-specific product lists; Customer groups based on classifications; Cluster assignments; Probabilistic product affinities based on segment’s behavior; Products eligible for recommendation; Products list for target customer’s cluster; Matching / Filtering; Target Recommendation; Personalized Recommendation List; Campaign results; Refine classifiers]
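The idea on this slide, recommending products bought by similar customers but not yet by the current customer, can be sketched as follows. Note the substitution: instead of the clustering and classification the slide names, this uses a simpler nearest-neighbour lookup with Jaccard similarity on purchase sets. Customer names and baskets are invented.

```python
from collections import Counter

# Hypothetical purchase history: customer -> set of products bought.
purchases = {
    "alice": {"bread", "butter", "wine"},
    "bob": {"bread", "butter", "cheese"},
    "carol": {"pet food", "litter"},
    "dave": {"bread", "wine", "cheese"},
}

def jaccard(a, b):
    """Share of products two customers have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, purchases, k=2):
    """Products bought by the k most similar customers but not by the target."""
    own = purchases[target]
    neighbours = sorted(
        (c for c in purchases if c != target),
        key=lambda c: jaccard(own, purchases[c]),
        reverse=True,
    )[:k]
    # Count how many neighbours bought each product the target lacks.
    counts = Counter(item for c in neighbours for item in purchases[c] - own)
    return [item for item, _ in counts.most_common()]

print(recommend("alice", purchases))
```

With richer profile data, the similarity function would be the natural place to add demographic attributes alongside purchases.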

Page 14: Big Data and Predictive Analytics

[Diagram label: Exploratory techniques]

Improve Recommendations/Allocations

Determine correlated items not bought by current customer

• Link association to determine products that are bought together – bread and butter, wine and cheese, …

• Identify products bought by customer, but not the correlated item

• Recommendation based on absence of product

[Diagram: recommendation pipeline. Labels: Customer Transactions; Customer 360 - NoSQL; Association & Clustering; Association rules; Segment-specific product and associated products; Customer groups based on classifications; Cluster assignments; Probabilistic product affinities based on segment’s behavior; Products eligible for recommendation; Products list for target customer’s cluster; Matching / Filtering; Target Recommendation; Personalized Recommendation List; Campaign results; Refine classifiers]
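Recommending on the absence of a correlated product can be sketched with pairwise association-rule confidence. This assumes confidence(A → B) is computed as the share of baskets containing A that also contain B. The baskets and the 0.6 threshold are made up.

```python
# Hypothetical transaction baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"wine", "cheese"},
    {"bread", "milk"},
    {"wine", "cheese", "bread"},
]

def confidence(antecedent, consequent, baskets):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

def missing_correlated_items(customer_items, baskets, min_confidence=0.6):
    """Items the customer has NOT bought that usually accompany items they have."""
    catalog = set().union(*baskets)
    suggestions = {}
    for bought in customer_items:
        for other in catalog - set(customer_items):
            c = confidence(bought, other, baskets)
            if c >= min_confidence:
                suggestions[other] = max(c, suggestions.get(other, 0.0))
    return suggestions

print(missing_correlated_items({"wine"}, baskets))
```

A production rule miner would also filter on support, so that rules based on a handful of baskets are not trusted.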

Page 15: Big Data and Predictive Analytics

Identify what customers want – and when

Transaction details for filtered customer list: buyers of Cat food / Cat food Generic 4 oz

• Salary

• Zipcode

• No. of kids

• House owner

• Gender

Cross-tabulated data

• Brand1, Brand2, … Brandn

• Weight, Size, Volume

• Brand

• Category1, Category2, …

• Offer clipped category1, …

Affinity models

Associated variables: single or multiple variables by different segments

Scattergrams, correlation, and regression, using a multi-model approach

Prediction models

Customer list by probability

Transaction details merged with customer data to provide contextual information as required for inference

Models generated using historical data by the analytic engine to identify affinity of specific variables

Application of variable affinity to the customer list to identify the probability that non-purchasers will purchase cat food / cat food Generic 4 oz

Sample technique
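As a simplified stand-in for the affinity and prediction models described above, the sketch below cross-tabulates historical purchase rates by customer segment and uses them to rank non-purchasers. The two segment attributes and every record are hypothetical. The slide's multi-model approach (correlation, regression) is not reproduced here.

```python
from collections import defaultdict

# Hypothetical history: ((house_owner, has_kids), bought_cat_food)
history = [
    ((True, True), True), ((True, True), True), ((True, True), False),
    ((True, False), True), ((True, False), False),
    ((False, False), False), ((False, False), False),
    ((False, True), False), ((False, True), True),
]

def segment_rates(history):
    """Cross-tabulate: segment -> share of customers who bought the product."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [buyers, customers]
    for segment, bought in history:
        totals[segment][0] += int(bought)
        totals[segment][1] += 1
    return {segment: buyers / n for segment, (buyers, n) in totals.items()}

def rank_prospects(prospects, rates):
    """Score each non-purchaser with their segment's rate, highest first."""
    scored = [(name, rates.get(segment, 0.0)) for name, segment in prospects.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

rates = segment_rates(history)
# Hypothetical non-purchasers and their segments.
prospects = {"h1": (True, True), "h2": (False, False), "h3": (True, False)}
ranking = rank_prospects(prospects, rates)
print(ranking)
```

The output corresponds to the "customer list by probability" on the slide: households at the top of the ranking are the first candidates for the offer.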

Page 16: Big Data and Predictive Analytics

Contextualize information, correlate facts, predict and improve

Contextualize

• Offers clipped

• Customer transactions

• Product taxonomy

Correlate

• Associated products

• Associated variables

• Category association

Predict and improve

• Probability of customer purchase, based on rule sets, adjusted

• Machine learning to improve recommendation

• Predict customer patterns based on empirical data

Information from multiple operational and data warehousing systems that contain customer data, purchase details, …

Information from social channels that provide supporting information to create detailed customer profile

Rule sets from knowledgebase accumulated over the years

Advanced Analytics - Product association

[Diagram: product association example. Product taxonomy: Pet foods; Cat food; Brand 1 (Variety1, Variety2); Brand 1 (Variety1, Variety2). Customer taxonomy: Pet owners; Cat owners. Affinity: carpet cleaners, cat grooming tools. Need: litter box, litter. Steps: Filter (buyer of Cat Food / Generic Cat food 4 ounce); Transaction details for this customer list; Filtered high-volume categories; Associated products by affinity + confidence; Inferred rules; Customer list, probability]

Page 17: Big Data and Predictive Analytics

Obama for America Campaign 2012

Canvassing from youth

Canvassing from older generation

Page 18: Big Data and Predictive Analytics

Obama for America Campaign 2012

The Obama for America data science team used social media as a tool to efficiently recruit the human resources it needed leading into the election’s home stretch

Primary objective - determine who were the best messengers, who they might be able to persuade, and what actions they might be willing to take

Reason to harness social media -

• Youth majority unreachable through phone calls or neighborhood canvassing, but always connected to some form of social media

• Optimize resources by transforming voter intelligence into actionable intelligence.

Page 19: Big Data and Predictive Analytics

Traffic Congestion Control


• Big Data Analytics used for traffic congestion control

• Enables travellers to plan their routes to their destinations

• Enables traffic controllers to effectively route cars in order to avoid as much congestion as possible

• Implemented in LA by a joint initiative of Xerox and the LA transport department

Page 20: Big Data and Predictive Analytics

DNA Sequencing and Cancer Therapies

• Previously, only small portions of a person’s genes were sequenced

• Big Data technology enables a person’s entire DNA to be sequenced, which is especially helpful for cancer patients

• Enabled selecting therapies based on genetic markers and person-specific genetic makeup

• If one treatment became ineffective due to cancer mutation, different therapies could be used based on other gene markers.

• Steve Jobs was one of the first people in the world to have his entire DNA sequenced

Page 21: Big Data and Predictive Analytics


Thank You