business intelligence & data mining-14

Upload: binzidd007

Post on 02-Jun-2018

  • 8/10/2019 Business Intelligence & Data Mining-14

    Lessons & Challenges from Mining Retail E-Commerce Data

    Kohavi et al. (2004)

    Motivation

    - An important domain for data mining
      - Massive amounts of data are collected
      - Data collection is automatic and not prone to errors
      - The data is rich and has a lot of potential for discovering patterns
    - Three types of data: clickstream data, transactional data and user profile data
      - Combined mining of these three types of data is possible

    The E-Commerce Data Mining Suite

    - E-commerce data mining suite developed by Blue Martini Software
    - Purchased and used by many brand-name retailers: Debenhams, Harley-Davidson, Sainsbury's, Sprint, etc.
    - System designed specifically for BI
    - End-to-end solution:
      - Data collection
      - Data warehousing
      - Data transformations
      - Visualization
      - Data mining

    The Business Intelligence Process

    Data Sources
    → Data Cleaning and Data Integration
    → Data Warehouse
    → Selection and Reduction
    → Task-relevant Data
    → Data Mining
    → Pattern Evaluation

    The Experience Shared

    Business and technical lessons are shared from data mining projects executed for more than 20 clients:
    - Clients from different industry verticals with varying business models
    - Clients spread over the US, Europe, Asia and Africa
    - Data sizes of up to 100 million records
    - Diverse data:
      - Clickstream
      - User profile
      - Demographic
      - Responses to mail campaigns
      - Orders placed through the website, by telephone or in store

    Business Lessons

    Requirements Gathering is Challenging

    - Clients are reluctant to list business questions
      - They may not know what questions to ask
      - They do not understand the underlying technology and how much it can do
    - Clients present standard reporting-type questions, e.g.:
      - What is the distribution of customers by gender?
      - What is the response rate of the mail campaign by region?
    - Instead of asking questions like:
      - What are the characteristics of customers who spend more than $500?
      - What kind of people responded to the mail campaign?
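A reporting question only counts customers per attribute value. A mining question like "what characterizes customers who spend more than $500?" compares each value's share inside that segment with its share overall. The sketch below computes this ratio (lift) for one attribute. The function name, field names and toy records are hypothetical and chosen for illustration only.

```python
from collections import Counter

def characteristic_lift(customers, attribute, is_target):
    """For each value of `attribute`, divide its frequency among target
    customers by its frequency among all customers.
    Lift > 1 means the value is over-represented in the target segment."""
    target = [c for c in customers if is_target(c)]
    if not target:
        return {}
    all_counts = Counter(c[attribute] for c in customers)
    target_counts = Counter(c[attribute] for c in target)
    return {
        value: (target_counts[value] / len(target)) / (count / len(customers))
        for value, count in all_counts.items()
    }

# Hypothetical toy records; field names are illustrative only.
customers = [
    {"gender": "F", "region": "North", "spend": 820},
    {"gender": "F", "region": "South", "spend": 640},
    {"gender": "M", "region": "North", "spend": 120},
    {"gender": "M", "region": "South", "spend": 90},
    {"gender": "F", "region": "North", "spend": 300},
]

# "Characteristics of customers who spend more than $500", one attribute at a time
lift = characteristic_lift(customers, "gender", lambda c: c["spend"] > 500)
print(lift)
```

Here both high spenders are female, so "F" gets a lift of about 1.67 and "M" a lift of 0. A reporting query over the same table would just return the gender counts.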

    Educating the Users

    - Involving the users is critical for success
      - Understanding the business
      - Uncovering the real needs
    - Users will have to be educated
      - What can be achieved by BI
      - Prototypes / demo systems
      - Case studies

    Business Events

    - The architecture records:
      - Every customer search and the number of results returned (too many results, or none)
      - Shopping cart events: add to cart, change quantity, delete
      - Registration, log-in, checkout, payment, order confirmation
      - Any failures / crashes
      - The user's time zone
      - The technical capabilities of the user's computer
    - These details are collected specifically because they are useful for analysis

    Data Collection

    - Usual methods of data collection:
      - Stateless HTTP requests from multiple web servers
      - Parsing and loading them session-wise and user-wise
      - Difficult: web logs were designed for debugging web servers, not to provide data for BI
    - The Blue Martini architecture was designed for BI
      - Session and user data are collected and linked together at the application server level
      - Transactions are automatically tied to sessions
      - All data is automatically recorded in a database
      - Pre-processing and data cleaning are not required
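The "usual method" above has to rebuild sessions from stateless HTTP hits after the fact. A common heuristic, sketched below, works in two steps. First, group hits by a visitor identifier (assumed here to come from a cookie). Then start a new session whenever the visitor has been inactive longer than a timeout. The 30-minute threshold is a widely used convention, not something the slides specify. The function and field names are hypothetical. An application-server design like Blue Martini's avoids needing this step at all.

```python
from datetime import datetime, timedelta
from itertools import groupby

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold

def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (visitor_id, timestamp, url) hits into per-visitor sessions.

    A new session starts whenever the gap since the visitor's previous
    hit exceeds `timeout`. Returns a list of (visitor_id, [urls]) tuples.
    """
    sessions = []
    # groupby only groups adjacent items, so sort by visitor, then time
    ordered = sorted(hits, key=lambda h: (h[0], h[1]))
    for visitor, visitor_hits in groupby(ordered, key=lambda h: h[0]):
        current, last_time = [], None
        for _, ts, url in visitor_hits:
            if last_time is not None and ts - last_time > timeout:
                sessions.append((visitor, current))  # close the old session
                current = []
            current.append(url)
            last_time = ts
        if current:
            sessions.append((visitor, current))
    return sessions

t0 = datetime(2004, 1, 1, 10, 0)
hits = [
    ("cookie-A", t0, "/home"),
    ("cookie-B", t0 + timedelta(minutes=1), "/search?q=shoes"),
    ("cookie-A", t0 + timedelta(minutes=5), "/product/42"),
    ("cookie-A", t0 + timedelta(minutes=50), "/cart"),  # 45-minute gap
]
sessions = sessionize(hits)
print(sessions)
```

Visitor A's 45-minute gap splits their hits into two sessions, so three sessions are produced in total.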

    Data Collection Lessons

    - Collect the right data upfront
      - All data that could be useful should be collected and integrated
      - Stored in a database / data warehouse
    - Integrate with external events
      - Marketing events such as promotions
      - These cannot be captured by the data collection systems
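Because promotions live outside the site's collection systems, they have to be joined in from a separate source before analysis. A minimal sketch follows. It assumes a hypothetical promotion calendar of (name, start date, end date) and tags each order with every promotion active on its order date. Real integration would also need matching on product or customer segment, which is omitted here.

```python
from datetime import date

# Hypothetical promotion calendar maintained outside the web site
# (e.g. by marketing): (name, first day, last day), inclusive.
PROMOTIONS = [
    ("spring-sale", date(2004, 3, 1), date(2004, 3, 14)),
    ("free-shipping", date(2004, 3, 10), date(2004, 3, 20)),
]

def tag_orders_with_promotions(orders, promotions=PROMOTIONS):
    """Return copies of the orders with the names of all promotions
    active on each order's date added under the key "promotions"."""
    tagged = []
    for order in orders:
        active = [name for name, start, end in promotions
                  if start <= order["order_date"] <= end]
        tagged.append({**order, "promotions": active})
    return tagged

orders = [
    {"order_id": 1, "order_date": date(2004, 3, 2)},
    {"order_id": 2, "order_date": date(2004, 3, 12)},
    {"order_id": 3, "order_date": date(2004, 3, 25)},
]
tagged = tag_orders_with_promotions(orders)
for o in tagged:
    print(o["order_id"], o["promotions"])
```

Overlapping promotions are kept as a list, so an analysis can still separate orders placed under one promotion from those placed under several.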

    Creating the Data Warehouse

    - DW creation requires substantial data transformations
      - Can take 80% of the time of the complete BI exercise
    - Requires integration of several data sources:
      - Website
      - Payment gateway
      - Call center
      - POS terminals / shop systems
      - External systems / inputs (e.g. promotion / campaign data)

    Logical DW Architecture

    Data Warehousing: Challenges

    Loading and Maintaining Consistent Data

    Loading and Storing Large Volumes of Data

    Coping with Changes in Operational Definitions

    Providing Reasonable Response Times

    For an e-commerce site, the website itself will be outside the firewall, so data will have to be copied across the firewall

    Business Intelligence Tools

    - The software provided reports, visualization and data mining
    - Data mining algorithms included:
      - Rule induction
      - Anomaly (outlier) detection
      - Entropy-based statistics
      - Association rules
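The slides do not say how the suite implemented its association rules. The sketch below shows only the three standard measures used to judge a single rule "A → C" over shopping baskets: support, confidence and lift. It is not a full rule-search algorithm such as Apriori. The baskets are hypothetical.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    support    = P(A and C)
    confidence = P(C | A)
    lift       = confidence / P(C)   (> 1: A and C co-occur more than chance)
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(baskets)
    count_a = sum(1 for b in baskets if antecedent <= b)
    count_c = sum(1 for b in baskets if consequent <= b)
    count_ac = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift

# Hypothetical baskets (one set of items per order)
baskets = [
    {"shoes", "socks"},
    {"shoes", "socks", "belt"},
    {"shoes"},
    {"belt"},
]
support, confidence, lift = rule_metrics(baskets, {"shoes"}, {"socks"})
print(support, confidence, lift)
```

For "shoes → socks" the rule holds in 2 of 4 baskets (support 0.5) and in 2 of the 3 shoe baskets (confidence about 0.67). Lift is about 1.33, so socks appear with shoes more often than their overall rate would suggest.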

    Business Intelligence Lessons (1)

    - Operational transactions have higher priority than BI
      - BI can be taken up after the system stabilizes
      - Can take several months to get started
    - Users are happy with basic reports / MIS
      - Unexpectedly insightful findings capture their interest
      - This can start the BI process

    Business Intelligence Lessons (2)

    - Trained data analysts are required
      - Domain knowledge is important
      - Technical know-how is essential
    - Terminology needs to be defined
      - Users can misinterpret results
      - Potentially useful findings may be ignored, or unrealistic expectations can arise

    Business Intelligence: Challenges

    Designing a user-friendly interactive interface

    Automatic feature construction

    Building models that users can interpret

    Making users understand that correlation does not imply causality

    Explaining insights

    Linking ROI to insights

    Deployment

    - Insights need to be shared
      - Insights obtained by data mining need to be shared across the organization
      - Easy-to-use tools for capturing and communicating them (e.g. by e-mail) will help
    - Taking action
      - Business users must see the value
      - Acting on the results may be difficult (e.g. designing a campaign for a special segment of customers)
      - A good architecture would help

    Technical Lessons

    Data Collection and Management Lessons

    - Collect data at the right level
      - Data was collected at the application server level
      - Reduced pre-processing of weblog data
    - Design the GUI with data mining in mind
      - All useful data can be captured
      - Default values should be avoided
      - Validate data to reduce cleaning effort
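Suppose a form pre-selects a real value, such as the first entry of a drop-down. Then users who skip the field cannot be told apart from users who really chose that value. The sketch below reflects the two GUI lessons above in three ways:
- The form starts on a placeholder that counts as "not answered" rather than on a real default.
- Obviously invalid entries are rejected at entry time.
- Nothing is left to be cleaned later.

The field names, the placeholder text and the age range are hypothetical.

```python
PLACEHOLDER = "-- select --"  # hypothetical "no choice made" option

def validate_registration(form):
    """Return a list of problems; an empty list means the record is accepted.

    The gender drop-down starts on PLACEHOLDER instead of a real value,
    so an untouched field is detected rather than silently stored.
    """
    problems = []
    if form.get("gender", PLACEHOLDER) == PLACEHOLDER:
        problems.append("gender not chosen")
    age = form.get("age")
    if not isinstance(age, int) or not 13 <= age <= 120:  # assumed valid range
        problems.append("age missing or out of range")
    if "@" not in form.get("email", ""):  # deliberately minimal format check
        problems.append("email malformed")
    return problems

bad = validate_registration({"gender": PLACEHOLDER, "age": 250, "email": "x"})
good = validate_registration({"gender": "F", "age": 34, "email": "a@example.com"})
print(bad)
print(good)
```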

    Data Collection and Management Challenges

    - Should data be sampled?
      - E-commerce data is huge in volume
      - Is it necessary to store all the data?
      - Will rare events be missed if sampling is done?
    - Slowly changing dimensions
      - Customers evolve (e.g. changes over their lifetime, lifestyle changes)
      - Products evolve (e.g. new lines, new technology)
    - Frequency of DW uploads
      - DW uploads take time and processing power
      - They should not disrupt the BI analysts' work
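One standard way to handle slowly changing dimensions is the "Type 2" approach from the data-warehousing literature. The slides do not say which approach was used. When a customer's attributes change, the current dimension row is closed and a new version is added. Past facts therefore still join to the attributes that were valid at the time.

The sketch below is an in-memory simplification with hypothetical field names. A real warehouse would use a table with surrogate keys.

```python
from datetime import date

def apply_scd2(dimension, customer_id, new_attrs, change_date):
    """Type 2 update: close the current row and append a new version.

    `dimension` is a list of dicts with keys customer_id, attrs,
    valid_from and valid_to (valid_to=None marks the current row).
    """
    current = next((r for r in dimension
                    if r["customer_id"] == customer_id and r["valid_to"] is None),
                   None)
    if current is not None:
        if current["attrs"] == new_attrs:
            return dimension  # nothing changed, keep the current version
        current["valid_to"] = change_date  # close the old version
    dimension.append({"customer_id": customer_id, "attrs": dict(new_attrs),
                      "valid_from": change_date, "valid_to": None})
    return dimension

def attrs_as_of(dimension, customer_id, day):
    """Return the attribute version that was valid on `day`, or None."""
    for r in dimension:
        if (r["customer_id"] == customer_id and r["valid_from"] <= day
                and (r["valid_to"] is None or day < r["valid_to"])):
            return r["attrs"]
    return None

dim = []
apply_scd2(dim, 7, {"marital_status": "single"}, date(2002, 1, 1))
apply_scd2(dim, 7, {"marital_status": "married"}, date(2003, 6, 1))
apply_scd2(dim, 7, {"marital_status": "married"}, date(2004, 1, 1))  # no change
print(attrs_as_of(dim, 7, date(2002, 12, 31)))
print(attrs_as_of(dim, 7, date(2004, 2, 1)))
```

An order placed in 2002 is still analysed against the "single" version. The repeated, unchanged update does not create a third row.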

    Data Cleaning and Pre-processing Lessons & Challenges

    - Time-outs, incomplete sessions, crashes
      - These need to be detected
      - What should be done with such data?
    - Duplicates
      - Same customer with more than one ID
      - Same account used by multiple customers
      - Guest log-ins
    - Missing, unknown, not applicable or default values
    - Hierarchical attributes
      - Most algorithms cannot handle hierarchical attributes
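A first pass at the "same customer with more than one ID" problem is to group IDs that share a normalized contact key. The sketch below uses a lower-cased, trimmed e-mail address as that key, which is a simplifying assumption. Real matching usually combines several fields and fuzzy comparison. It also does not address shared accounts or guest log-ins. Field names are hypothetical.

```python
from collections import defaultdict

def normalize_email(email):
    """Lower-case and trim an e-mail address (a deliberately simple key)."""
    return email.strip().lower()

def find_duplicate_ids(customers):
    """Return {normalized_email: [customer_ids]} for every e-mail address
    that is shared by more than one customer ID."""
    ids_by_key = defaultdict(list)
    for c in customers:
        ids_by_key[normalize_email(c["email"])].append(c["customer_id"])
    return {key: ids for key, ids in ids_by_key.items() if len(ids) > 1}

customers = [
    {"customer_id": 101, "email": "Ann.Lee@example.com"},
    {"customer_id": 245, "email": " ann.lee@example.com "},
    {"customer_id": 310, "email": "bob@example.com"},
]
dupes = find_duplicate_ids(customers)
print(dupes)
```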

    An Attribute Hierarchy

    Levels (top to bottom): all > region > country > city > office

    all
      Europe
        Germany
          Frankfurt
          ...
        Spain
          ...
      North_America
        Canada
          Vancouver
            L. Chan
            M. Wind
            ...
          Toronto
          ...
        Mexico
          ...
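Since most algorithms cannot consume a hierarchical attribute directly, one common workaround is to flatten each value into one column per level. A model can then be built at whichever level is appropriate. The sketch below hard-codes parent links for the named nodes of the location hierarchy on this slide, leaving out the office level and the elided "..." nodes. It then rolls hypothetical city-level sales up to a chosen level. The function names and sales figures are illustrative assumptions.

```python
from collections import defaultdict

# Parent links for the named nodes of the location hierarchy
# (office level and elided nodes omitted).
PARENT = {
    "Vancouver": "Canada", "Toronto": "Canada", "Frankfurt": "Germany",
    "Canada": "North_America", "Mexico": "North_America",
    "Germany": "Europe", "Spain": "Europe",
    "North_America": "all", "Europe": "all",
}
LEVELS = ["city", "country", "region"]

def flatten(city):
    """Turn one hierarchical value into flat columns, one per level."""
    row, node = {}, city
    for level in LEVELS:
        row[level] = node
        node = PARENT[node]  # walk one step up the hierarchy
    return row

def roll_up(sales_by_city, level):
    """Aggregate city-level sales to the requested level of the hierarchy."""
    totals = defaultdict(float)
    for city, amount in sales_by_city.items():
        totals[flatten(city)[level]] += amount
    return dict(totals)

sales = {"Vancouver": 120.0, "Toronto": 80.0, "Frankfurt": 50.0}  # hypothetical
print(flatten("Vancouver"))
print(roll_up(sales, "country"))
print(roll_up(sales, "region"))
```

The flat `city`/`country`/`region` columns can be fed to an ordinary learner. Choosing which of them to use is the "right level of the attribute hierarchy" decision.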

    Analysis Lessons & Challenges

    - Enriching the data
      - Add demographic attributes
      - Create derived attributes
      - Calculate weighted averages, moving averages
    - Exploration
      - Visualization
      - Domain knowledge can help in gaining insight
      - Customer propensity scoring
    - Building models
      - Start with simple models (easy to explain to users)
      - Build models at the right level of the attribute hierarchy
      - Address scalability issues (to maintain users' interest and confidence)
      - Test and validate the models
      - Estimate accuracy levels
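The derived attributes mentioned under "Enriching the data" are usually simple aggregates over a customer's history. The sketch below computes two of them from a hypothetical series of monthly spend values:
- A trailing moving average.
- A weighted average that gives recent months more weight.

The window size and the linear weights are illustrative choices, not values from the slides.

```python
def moving_average(values, window):
    """Trailing moving average; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def weighted_average(values, weights):
    """Weighted mean of `values`; weights need not sum to 1."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

monthly_spend = [100.0, 40.0, 70.0, 130.0]  # hypothetical customer history
ma3 = moving_average(monthly_spend, 3)
recency_weighted = weighted_average(monthly_spend, [1, 2, 3, 4])  # newest weighs most
print(ma3)               # [None, None, 70.0, 80.0]
print(recency_weighted)  # 91.0
```

Each result becomes one more column per customer. For example, the recency-weighted average (91.0) is noticeably higher than the plain mean (85.0) because spending has recently increased.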