Data Mining & Knowledge Discovery: Personalization Technologies for One to One Marketing Bhagi Narahari


Post on 30-Oct-2014

Page 1: Data Mining & Knowledge Discovery

Data Mining & Knowledge Discovery: Personalization Technologies for One to One Marketing

Bhagi Narahari

Page 2: Data Mining & Knowledge Discovery

Outline of Lecture

What and Why of Data Mining and KDD?
Importance and Applications to E-commerce
How? Personalization
  personalized one-to-one business on the internet
Part 1: Overview of Personalization
Part 2: The Data Mining Process

Page 3: Data Mining & Knowledge Discovery

Predictive Modelling

A “black box” that makes predictions about the future based on information from the past and present

Inputs: age, balance, income

Model (crystal ball?)

Output: How much will the customer spend on the next catalog order?

Page 4: Data Mining & Knowledge Discovery

What is Data Mining?

It is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.

Page 5: Data Mining & Knowledge Discovery

Why now? (A historical perspective)

Because data is now available (wasn’t always)

Distributed sources
Technology evolution
Competition (do what you can to outdo)

Page 6: Data Mining & Knowledge Discovery

Why DM?

CRM (Customer Relationship Management) - important success factor in e-commerce
  price differentiation is no longer enough
  customer service is more important

Links with suppliers already exist (B2B) - JIT, joint forecasting, planning, procurement

Current emphasis on links with customers - feedback, input in design, etc.

Page 7: Data Mining & Knowledge Discovery

CRM

Identifying profitable customers
Better service for more valued customers
Retaining profitable customers

Getting a new customer costs a lot more than retaining an existing one
  acquiring a new customer costs about 5X as much (Peppers & Rogers)

An increase from 75% to 80% in retention reduces costs by about 10%

Larger share of customer pool

Page 8: Data Mining & Knowledge Discovery

CRM

Product differentiations based on “price” and “quality” are increasingly difficult
  need to differentiate based on relationships
Increasingly sophisticated mass marketing
  increases probability of success
  cost of mass marketing is driven down by the internet (reach)

Page 9: Data Mining & Knowledge Discovery

CRM

Goal: Positively interact with your customers and prospects
  define customer segments
  lights-out execution of campaigns against segments
  attribution and evaluation of responses

Page 10: Data Mining & Knowledge Discovery

Personalization in Ecommerce

Positive: much better chance of personalization
  customer identification
  tracking across visits and within a visit
  ability to do ‘what if’ experiments
Negative:
  cost of switching is much lower
  is web-based shopping good for ‘touchy-feely’ things?
  price differentiation across geographies is not easy

Page 11: Data Mining & Knowledge Discovery

Personalization

Customer Chain: Product Discovery -> Product Evaluation -> Terms Negotiation -> Order Placement -> Order Payment -> Customer Service & Support

Producer Chain: Market Research -> Market Stimulation/Education -> Terms Negotiations -> Order Receipt -> Order Billing and Payment Management -> Customer Service & Support

Page 12: Data Mining & Knowledge Discovery

B2C Personalization Objectives

Know the customer profile - registration, cookies
Determine what the customer wants
  Ask: questionnaires
    what is the incentive for truthfulness?
  Deduce: click streams, history, collaborative filtering (Amazon!!)
Deliver
  customize the look and feel
  offer special promotions
  offer customized products (Holy Grail)

Page 13: Data Mining & Knowledge Discovery

Use of Personalization

In addition to storing and retrieving information on the individual’s profile “on the fly”, you can also use mining software to analyze the information in the database and make recommendations or comments specific to the individual

Page 14: Data Mining & Knowledge Discovery

Impact of Personalization

Customer relationship
Learn more about customers
  learn and understand why and how they prefer to do business with your organization
In tandem with tracking, provides you with a tool to monitor your website
  what works, what doesn’t, what makes your audience “click”

Page 15: Data Mining & Knowledge Discovery

Security and Privacy as Barrier to Personalization

Large number of customers concerned about personalization (DoubleClick!)
  will they pay more to preserve privacy?
Some falsify info to preserve privacy
Customers give more info to a trusted site
Need a secure site with clear privacy policies stated at the site

Page 16: Data Mining & Knowledge Discovery

Personalization

Identify: Login, Credit Card #

Know the Customer (Profile): Questionnaires, Past history, Click streams

Predicting the wants: Mapping to “peers”, Extrapolation from past, Extrapolation from peers (firefly.com)

Give the customer his/her wants: Look & feel, Product selection & promotions, New product
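
The "mapping to peers" and "extrapolation from peers" ideas can be sketched as a small user-based collaborative filter. This is only an illustration and not the method used by firefly.com or Amazon; the ratings, the cosine similarity measure, and the function names are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings (customer -> {product: rating}); made-up data.
RATINGS = {
    "howard": {"ties": 5, "shirts": 4},
    "john": {"ties": 4, "shirts": 5, "cufflinks": 5},
    "amy": {"dresses": 5, "shoes": 4},
}

def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(customer, ratings, top_n=3):
    """Rank products the customer has not rated by similarity-weighted peer ratings."""
    own = ratings[customer]
    scores = {}
    for peer, peer_ratings in ratings.items():
        if peer == customer:
            continue
        sim = cosine(own, peer_ratings)
        if sim <= 0:
            continue  # peers with nothing in common contribute nothing
        for product, rating in peer_ratings.items():
            if product not in own:
                scores[product] = scores.get(product, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling `recommend("howard", RATINGS)` suggests products bought by customers whose ratings resemble Howard's; a customer with no overlapping peers gets an empty list.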

Page 17: Data Mining & Knowledge Discovery

Know the customer

Cookies
  backlash (users do not trust them)
  OPS: Open Profiling Standard, combined with eTrust certification
Registration
User certificates: logons
Key question: how do you know that this online customer is the same one who goes to your storefront?
  need standard warehouse techniques like address resolution, credit card resolution, etc.
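
A minimal sketch of what address and credit card resolution could look like: normalize both fields, then compare. The record layout, the abbreviation table, and the matching rule are assumptions for illustration; production systems use far more robust matching and would not compare raw card numbers.

```python
import re

# Hypothetical abbreviation table; real address standardization is much larger.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "apt": "apartment"}

def normalize_address(address):
    """Lowercase, strip punctuation, and expand a few common abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def normalize_card(card_number):
    """Keep digits only so that spaces and dashes do not matter."""
    return re.sub(r"\D", "", card_number)

def same_customer(web_record, store_record):
    """Treat two records as one customer if the card or the address matches."""
    card_a = normalize_card(web_record.get("card", ""))
    card_b = normalize_card(store_record.get("card", ""))
    if card_a and card_a == card_b:
        return True
    addr_a = normalize_address(web_record.get("address", ""))
    addr_b = normalize_address(store_record.get("address", ""))
    return bool(addr_a) and addr_a == addr_b
```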

Page 18: Data Mining & Knowledge Discovery

Know the Customer: OPS

Two drivers
  user should not have to retype basic info again and again
  data is used in a fashion trusted by users (not leaked, other data not seen, etc.)
Two parts
  Common data
    demographics (country, zip, age, gender)
    contact (name, address, credit card…)
    user agent preferences
  Per-site sections (can be shared across sites, if user allows)

Page 19: Data Mining & Knowledge Discovery

What if no profile???

Deduce
  collect information: history of purchases, time spent on pages
  ask questions (offer rewards)
  combine with database marketing data
Predict behaviour
  buy probabilities
  build customer relationship
Mining is key!

Page 20: Data Mining & Knowledge Discovery

Personalization: Actions to Take - Look and Feel

Personalized pages
  specific data
  specific presentation and design
  sent through various media
Manage customers, not products: 1-1 marketing
Strategy.com
  delivers personalized pages
    e.g. stock portfolio, personal info including alarms, travel reservations
  uses different media
    WAP-enabled phones (e.g. Sprint PCS Web)

Page 21: Data Mining & Knowledge Discovery

Storefront Personalization

Customers visit the store website
  Howard buys ties
  Rob buys baby products
  Ray buys toys
  Amy buys clothes
Provide a view of the store to these customers
  present them with what they are likely to buy
  Howard: ties and men’s formal wear
  Ray: toys and gadgets
  Rob: infant and toddler section
  Amy: women’s clothes section
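
One simple way to pick a storefront view is to feature the sections a customer has bought from most often. The sketch below assumes a hypothetical product-to-section mapping and a plain purchase list; neither comes from the slides.

```python
from collections import Counter

# Hypothetical mapping from purchased products to storefront sections.
SECTION_OF = {
    "tie": "Men's formal wear",
    "toy": "Toys and gadgets",
    "baby product": "Infant and toddler",
    "dress": "Women's clothes",
}

def storefront_view(purchases, default="General storefront", top_n=2):
    """Return the sections to feature, most frequently purchased first."""
    counts = Counter(SECTION_OF[p] for p in purchases if p in SECTION_OF)
    if not counts:
        return [default]  # no usable history: fall back to the generic view
    return [section for section, _ in counts.most_common(top_n)]
```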

Page 22: Data Mining & Knowledge Discovery

More Actions: Product Presentations & Promotions

Basic Storefront Product Hierarchy

Clothes
  Men’s: Shirts, Pants
  Women’s: Casuals, Evening
  Children’s: Infants, Kids

John’s View, Mary’s View

Page 23: Data Mining & Knowledge Discovery

BroadVision.com

BroadVision One-to-One application
  allows businesses to develop and manage personalized web sites
  interactively profiles each visitor and dynamically matches info based on their profile and on business rules specified by providers of the site and services
  users do not have to jump through hoops to find relevant data

Page 24: Data Mining & Knowledge Discovery

DM Terminology

OLAP

ROLAP

Data Warehouse

Data Marts

Data Stores

Neural Networks

Genetic Algorithms

Data Mining

Rule Based Systems

SQL

Page 25: Data Mining & Knowledge Discovery

How?

Determine the probability of buying as a function of customer attributes such as age, income, past buying patterns, ...
Target customers by ranking from highest to lowest probabilities
Other techniques: Decision Trees, Neural Networks, ...
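
As a rough sketch of scoring and ranking, the code below uses a logistic model, which is one common way to turn attributes into a probability; the slide does not name a specific technique. The coefficients and attribute names are hypothetical and would normally be fitted to past purchase data.

```python
from math import exp

# Hypothetical coefficients; in practice they are estimated from historical data.
COEFFICIENTS = {"intercept": -3.0, "age": 0.02, "income_k": 0.01, "past_orders": 0.5}

def buy_probability(customer):
    """Logistic model: p = 1 / (1 + exp(-z)), where z is a weighted sum of attributes."""
    z = COEFFICIENTS["intercept"]
    for name in ("age", "income_k", "past_orders"):
        z += COEFFICIENTS[name] * customer[name]
    return 1.0 / (1.0 + exp(-z))

def rank_customers(customers):
    """Sort customers from highest to lowest purchase probability."""
    return sorted(customers, key=buy_probability, reverse=True)
```

A campaign would then target the top of the ranked list, for example the first N customers or everyone above a chosen probability cutoff.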

Page 26: Data Mining & Knowledge Discovery

KDD

Knowledge Discovery in Databases
It is the process of identifying valid, novel, potentially useful, and understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth)

It involves data preparation, pattern extraction, knowledge evaluation, and refinement, in iteration

Page 27: Data Mining & Knowledge Discovery

KDD

Data mining is a step in the KDD process that involves the application of certain algorithms to extract patterns

Steps in the KDD process:
  Select data
  Data cleansing and pre-processing
  Data mining
  Results interpretation
  Implementation

Page 28: Data Mining & Knowledge Discovery

Pre-processing in KDD

80-90% of the effort in the KDD process is spent here
Why?
  operational data is incomplete, inconsistent, and in different formats across systems
  DM techniques might require data in a specific format
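
The kind of work this step involves can be sketched as follows: bring dates from different systems into one format, standardize category codes, and fill in missing values. The field names, the list of date formats, and the choice of median imputation are assumptions for illustration only.

```python
from datetime import datetime

# Hypothetical date formats used by different source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def parse_date(text):
    """Try each known format and return an ISO date string, or None if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_records(records):
    """Standardize dates and gender codes; fill missing incomes with the (upper) median."""
    incomes = sorted(r["income"] for r in records if r.get("income") is not None)
    median = incomes[len(incomes) // 2] if incomes else None
    cleaned = []
    for r in records:
        cleaned.append({
            "signup": parse_date(r.get("signup", "")),
            "gender": GENDER_CODES.get(str(r.get("gender", "")).strip().lower()),
            "income": r["income"] if r.get("income") is not None else median,
        })
    return cleaned
```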

Page 29: Data Mining & Knowledge Discovery

Data Mining Problems

Classification/Segmentation
  Binary (Yes/No)
  Multiple category (Large/Medium/Small)
Forecasting (how much)
Association rule extraction (market basket analysis)
Sequence detection
  balance increase -> missed payment -> default

Page 30: Data Mining & Knowledge Discovery

Typical DM tasks

Prediction and Classification
  Directed
  Decision trees, neural networks, memory-based reasoning, logistic regression
  Examples:
    How many units will be sold on a given day?
    What will be the stock price on a given day?
    Will a customer buy the product or not?

Page 31: Data Mining & Knowledge Discovery

DM tasks

Affinity grouping
  Undirected
  Which products go together naturally?
  The beer-diaper syndrome?
  Market basket analysis
  Examples:
    Which products peak in demand simultaneously?

Page 32: Data Mining & Knowledge Discovery

DM tasks

Clustering task
  Undirected
  Segmenting into similar clusters
  Different from classification
  Examples:
    Customers with similar buying profiles
    Products with similar demand patterns
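
To make the clustering task concrete, here is a bare-bones k-means sketch. K-means is one common geometric method, chosen here for illustration; the slides do not prescribe it. The points (for example, orders per month and average basket value) and the starting centroids are made up.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid (squared Euclidean distance).
        assignment = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignment, centroids
```

Unlike classification, no labels are supplied: the segments emerge from the data, and the analyst still has to interpret what each cluster means.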

Page 33: Data Mining & Knowledge Discovery

DM success factors

Integration with data warehouses and DSS
Users should develop a good understanding of techniques
Recognize that these tools cannot automatically find patterns without being told what to do

Most methods now used are extensions of analytical methods that have been around for decades

Page 34: Data Mining & Knowledge Discovery

Legal and Ethical Issues

Privacy concerns are becoming more important
  will impact the way that data can be used and analyzed
  ownership issues
  European data laws have implications for the US
Often data included in the data warehouse cannot legally be used in the decision-making process
  race, gender, age
Data contamination will become critical

Page 35: Data Mining & Knowledge Discovery

Making Decisions

Data, Data, Data, Data -> Data Warehouse? -> Models -> Decisions

Page 36: Data Mining & Knowledge Discovery

Data Warehouse

Bill Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.”

It is managed data that is situated after and outside the operational systems

Page 37: Data Mining & Knowledge Discovery

Data Warehousing

Increasing need to find, summarize, and interpret large amounts of data effectively
  especially when data is distributed across many different databases
Transaction processing (TP) systems are not easily accessible to other systems
  plus TP systems have time constraints

Page 38: Data Mining & Knowledge Discovery

Enter the Data Warehouse

To deliver decision data to decision makers
  by integrating data from various TPS into a single store
  which can then feed a range of decision support applications
  through an OLAP interface!

Page 39: Data Mining & Knowledge Discovery

Data Complications

Noise
Missing data
Transformation
  numeric data
  text
Need to differentiate between variables you can control and those you cannot
  Actionable: size of discount, number of offers, etc.
  Non-actionable: age, income, ...

Page 40: Data Mining & Knowledge Discovery

Data Mining Techniques

Market Basket Analysis
Memory Based Reasoning
Cluster Detection
Link Analysis
Decision Trees and Rule Induction
Neural Networks
Genetic Algorithms
OLAP

Page 41: Data Mining & Knowledge Discovery

OLAP: On Line Analytical Processing

While a data warehouse brings data together, OLAP lets you look at the data and manipulate it interactively
OLAP allows users to “slice and dice” data
Allows the user to drill down into detail data

Page 42: Data Mining & Knowledge Discovery

Relational vs Multidimensional

Page 43: Data Mining & Knowledge Discovery

Consolidations

Page 44: Data Mining & Knowledge Discovery

Multidimensional Terminology

East, West, Central are input members of the Region dimension. Total Region is an output member of the Region dimension. Similarly, Nuts, Screws, Bolts, Washers, and Total are members of the Product dimension.

Variables are typically numerical measures like Sales, Costs, Profits, Expenses, and so forth.

Dimensions are roughly equivalent to Fields in a relational database. Cells are roughly equivalent to Records.
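
The terminology can be illustrated with a tiny two-dimensional cube held in a dictionary. The Sales figures are invented, and the helper names `consolidate` and `slice_cube` are hypothetical; an OLAP server does the same thing at scale with precomputed aggregates.

```python
# Hypothetical Sales variable; each cell is keyed by (region, product).
SALES = {
    ("East", "Nuts"): 50, ("East", "Bolts"): 40,
    ("West", "Nuts"): 60, ("West", "Bolts"): 70,
    ("Central", "Nuts"): 30, ("Central", "Bolts"): 20,
}

def consolidate(cube, axis):
    """Add a 'Total' output member along one dimension (0 = Region, 1 = Product)."""
    totals = {}
    for key, value in cube.items():
        rolled = list(key)
        rolled[axis] = "Total"
        totals[tuple(rolled)] = totals.get(tuple(rolled), 0) + value
    return {**cube, **totals}

def slice_cube(cube, axis, member):
    """Slice: fix one dimension to a single member and return the matching cells."""
    return {key: value for key, value in cube.items() if key[axis] == member}
```

Here East, West, and Central are input members, while the `"Total"` cells are output members derived from them. Note that `consolidate` should be applied once per dimension, since consolidating an axis that already holds a total would double count.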

Page 45: Data Mining & Knowledge Discovery

Steps in DW and OLAP

Data, Data, Data -> Data Loader -> Data Converter -> Data Scrubber -> Data Transformer -> Data Warehouse -> OLAP Server -> OLAP Interface
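
A toy version of the loading stages, chained in the order they are listed. The slide only names the stages, so what each function does here (type casting, name cleanup, aggregation) is an assumption made to show how such a pipeline composes.

```python
def load(rows):
    """Data Loader: read raw rows from a source system (here, an in-memory list)."""
    return list(rows)

def convert(rows):
    """Data Converter: cast text fields to typed values."""
    return [{"region": r["region"], "sales": float(r["sales"])} for r in rows]

def scrub(rows):
    """Data Scrubber: normalize region names and drop rows with negative sales."""
    return [{"region": r["region"].strip().title(), "sales": r["sales"]}
            for r in rows if r["sales"] >= 0]

def transform(rows):
    """Data Transformer: aggregate to the grain kept in the warehouse (sales per region)."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["sales"]
    return totals

def run_pipeline(raw_rows):
    """Chain the four stages; the result is what would be written to the warehouse."""
    return transform(scrub(convert(load(raw_rows))))
```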

Page 46: Data Mining & Knowledge Discovery
Page 47: Data Mining & Knowledge Discovery

Cluster Detection

Undirected data mining
Finds records that are similar to each other (clusters)
Clusters are found using geometric methods, statistical methods, and neural networks

Good way to start any analysis

Page 48: Data Mining & Knowledge Discovery

Market Basket Analysis

Form of clustering used for finding items that occur together (in a transaction or market basket)

Expresses the likelihood of different products being purchased together as rules

Uses: planning store layouts, limiting specials to one of the products in a set, ...

Page 49: Data Mining & Knowledge Discovery

Transaction data

Customer  Products
1         Milk, Soda
2         Milk, Beer, Diapers
3         Milk, Cleaner
4         Beer, Diapers, Soda
5         Beer, Soda

Page 50: Data Mining & Knowledge Discovery

Co-occurrence matrix

          Beer  Cleaner  Milk  Soda  Diapers
Beer         3        0     1     2        2
Cleaner      0        1     1     0        0
Milk         1        1     3     1        1
Soda         2        0     1     3        1
Diapers      2        0     1     1        2
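
A co-occurrence matrix like this can be derived mechanically from the transaction data. The sketch below re-enters the five sample transactions and counts, for each pair of items, how many baskets contain both; the function name and the dictionary-of-dictionaries layout are choices made for the example.

```python
from itertools import combinations_with_replacement

# The five sample transactions from the transaction data slide.
TRANSACTIONS = [
    {"milk", "soda"},
    {"milk", "beer", "diapers"},
    {"milk", "cleaner"},
    {"beer", "diapers", "soda"},
    {"beer", "soda"},
]

def cooccurrence(transactions):
    """Count, for every item pair (including an item with itself), the baskets holding both."""
    items = sorted(set().union(*transactions))
    counts = {a: {b: 0 for b in items} for a in items}
    for basket in transactions:
        for a, b in combinations_with_replacement(sorted(basket), 2):
            counts[a][b] += 1
            if a != b:
                counts[b][a] += 1  # keep the matrix symmetric
    return counts
```

The diagonal entry `counts[x][x]` is simply the number of transactions that contain item `x`.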

Page 51: Data Mining & Knowledge Discovery

Support and confidence

For a rule that says: If A then B
Support is defined as the ratio of the number of transactions that include both A and B to the total number of transactions
Confidence is defined as the ratio of the number of transactions that include both A and B to the number of transactions that include A
How do you specify ‘significant’ support and confidence?
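
The two definitions translate directly into code. This sketch represents each transaction as a set of items and reuses the five sample transactions; the function names are hypothetical.

```python
# The five sample transactions from the transaction data slide.
TRANSACTIONS = [
    {"milk", "soda"},
    {"milk", "beer", "diapers"},
    {"milk", "cleaner"},
    {"beer", "diapers", "soda"},
    {"beer", "soda"},
]

def support(transactions, antecedent, consequent):
    """Fraction of all transactions that contain both A and B."""
    both = set(antecedent) | set(consequent)
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Among transactions that contain A, the fraction that also contain B."""
    a = set(antecedent)
    both = a | set(consequent)
    with_a = sum(1 for t in transactions if a <= t)
    with_both = sum(1 for t in transactions if both <= t)
    return with_both / with_a if with_a else 0.0
```

Support is symmetric in A and B, while confidence is not: "if beer then diapers" and "if diapers then beer" share one support value but can have different confidence values.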

Page 52: Data Mining & Knowledge Discovery

Algorithm for Finding Association Rules

Input is Min-Support and Min-Confidence
Find all sets of items with Min-Support (frequent itemsets)
Frequent Itemsets Property: every subset of a frequent itemset must also be a frequent itemset
Iterative algorithm: start with frequent itemsets with one item, and construct larger itemsets using only smaller frequent itemsets
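
The iterative search can be sketched in the style of the Apriori algorithm, which relies on exactly this subset property. The code below covers only the frequent-itemset step; generating rules and filtering them by Min-Confidence is left out. The 25% threshold in the usage note and all names are assumptions for the example.

```python
from itertools import combinations

# The five sample transactions from the transaction data slide.
TRANSACTIONS = [
    {"milk", "soda"},
    {"milk", "beer", "diapers"},
    {"milk", "cleaner"},
    {"beer", "diapers", "soda"},
    {"beer", "soda"},
]

def frequent_itemsets(transactions, min_support):
    """Level-wise search: build size-k candidates only from frequent (k-1)-itemsets."""
    n = len(transactions)

    def is_frequent(itemset):
        return sum(1 for t in transactions if itemset <= t) / n >= min_support

    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if is_frequent(frozenset([i]))]
    result = list(level)
    while level:
        previous = set(level)
        size = len(level[0]) + 1
        # Join step: unions of two frequent itemsets that have exactly `size` items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, size - 1))}
        level = [c for c in candidates if is_frequent(c)]
        result.extend(level)
    return result
```

With `min_support=0.25`, the candidate {beer, diapers, soda} is pruned without counting it, because its subset {diapers, soda} is not frequent.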

Page 53: Data Mining & Knowledge Discovery

MBA example

Using the sample data, create a co-occurrence table

Let relevant Support = 25% and Confidence = 50%:
  Beer and Diapers appear together in 2/5 = 40% of transactions
  “If beer then diapers” has confidence of 2/3 = 67%
  Thus, “If customer buys beer then customer buys diapers” satisfies 25% support and 50% confidence

Conclusion drawn by mining system: Customers who buy beer also buy diapers

Page 54: Data Mining & Knowledge Discovery

Applying MBA Results

Is the relationship useful?
  Beer and Diapers may not be of use
  Victoria’s Secret transaction mining led to specific apparel being sent to specific stores -- Microstrategy software
Who defines “usefulness”?
  only as good as rules specified by humans/marketing workforce
  NBA mining: designers of the software did not include height mismatches at first… coaches made the correction

Page 55: Data Mining & Knowledge Discovery

Data Mining Algorithms

Four algorithms commonly cited
  Association Rule (used in over 90% of the cases!)
  Nearest Neighbor
    quick and easy, but models get large
  Decision Tree
  Neural Network
    difficult to interpret and time-consuming

Page 56: Data Mining & Knowledge Discovery

Decision Trees

Series of if/then rules
  easy to understand, complexity in implementation

[Diagram: decision tree with splits on Balance (< 10K vs. > 10K) and Age (< 48 vs. > 48), leading to Yes/No leaves]
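
A decision tree reads naturally as nested if/then rules. The transcript does not preserve which branch leads to which leaf, so the structure and leaf labels below are assumed purely for illustration, as is the handling of the boundary values (exactly 10K, exactly 48).

```python
def predict(balance, age):
    """Walk a two-level tree: split on balance first, then on age (assumed layout)."""
    if balance < 10_000:
        return "no"   # assumed leaf for the low-balance branch
    if age > 48:
        return "yes"  # assumed leaf: high balance, older customer
    return "no"       # assumed leaf: high balance, younger customer
```

Reading the rules is easy; the complexity lies in learning which attributes and thresholds to split on from the data.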

Page 57: Data Mining & Knowledge Discovery

CRM and Data Mining

Recall: customer segmentation is key in CRM
Data mining can help
  improves understanding of customer behaviour
  helps locate meaningful segments from customer data
Users want to turn that understanding into automated interactions with their customers

Page 58: Data Mining & Knowledge Discovery

Integrating Data Mining & CRM

Data mining application owns the modelling process

CRM application owns the campaign execution process

Goals:
  minimize pain involved with using models in campaigns
  score records only when and where necessary

Page 59: Data Mining & Knowledge Discovery

Integrating Mining & CRM

Step 1:
  analytic user creates model using mining system
  model is then exported into campaign management system
Step 2:
  marketing user creates campaign that includes predictive models
  when campaign executes, data mining engine scores customers dynamically
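
The two steps can be sketched as an export/import handshake followed by scoring at execution time. The JSON layout, the logistic model form, and every function name here are assumptions; real mining and campaign products use their own exchange formats.

```python
import json
from math import exp

def export_model(coefficients):
    """Step 1: the analytic user serializes a fitted model for the campaign system."""
    return json.dumps({"type": "logistic", "coefficients": coefficients})

def load_scorer(exported):
    """Campaign side: rebuild a scoring function from the exported definition."""
    coefficients = json.loads(exported)["coefficients"]

    def score(customer):
        z = coefficients["intercept"] + sum(
            weight * customer[name]
            for name, weight in coefficients.items() if name != "intercept")
        return 1.0 / (1.0 + exp(-z))
    return score

def run_campaign(customers, in_segment, score, threshold=0.5):
    """Step 2: score only customers inside the campaign segment, at execution time."""
    return [c["id"] for c in customers if in_segment(c) and score(c) >= threshold]
```

Because `in_segment` is checked first, customers outside the segment are never scored, which is the point of scoring records only when and where necessary.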

Page 60: Data Mining & Knowledge Discovery

Benefits of Integration

Pre-generated model selection
Score defined segments “on the fly”
  eliminates need to score entire database
  improves efficiency of campaigns
Reduces manual intervention and error
Accelerates the market cycle
  increases likelihood of reaching customers before competitors
  improves campaign results and lowers costs

Page 61: Data Mining & Knowledge Discovery

Summary

“Using the new media of the one-to-one future, you will be able to communicate directly with customers individually…..” - Don Peppers & Martha Rogers (One-to-One Future)

“What are you afraid of?… Even if you’re not afraid of these things, the beauty is, with proper marketing, we can make you afraid” -- Michael Saylor, CEO Microstrategy.