Data analysis results from the first-prize-winning team...
TRANSCRIPT
ABC Bookstore Analysis
The Three Data Musketeers
NIDA Business Analytics and Data Sciences Contest
September 2, 2016
Thearasak Phaladisailoed
Team Position: Senior Data Scientist
- Faculty of Information Technology, KMITL
- Experienced with Python
- Machine Learning Enthusiast
James Rakratchatakul
Team Position: Analyst
- Faculty of Engineering, Chulalongkorn University
- Big Data & Statistical Analysis
- Experienced with RapidMiner
- Tableau Analyst
Korkrid Akepanidtaworn
Team Position: Data Scientist
- Economics and Statistics
- London School of Economics
- Big Data & Statistical Analysis
- Experienced with R Programming, STATA, SAS, SPSS, PSPP, Python, BI Tools, and Big Data Services
September, 2, 2016 Business Analytics and Data Sciences Contest 2
Our Team
Morning Agenda
1. Understanding our Dataset
2. Problem Statement
3. Data Preprocessing
4. Descriptive Analytics
5. Predictive Analytics
6. Data Insight Recap
Analytics Life Cycle
“Our team uses a broad set of analytics tools to build a predictive model, communicate results, operationalize data, and drive discovery”
Dataset Snapshot
The dataset contains 395 observations of 64 variables and comes with a data dictionary describing each variable.
Problem Statement
ABC Bookstore is currently experiencing problems with a constant decline in profits, customer loyalty, and customer satisfaction.
Our team will use data analytics to investigate ways to recover customer satisfaction, the book re-purchase rate, and bookstore subscriptions.
Data Preprocessing
The ABC Bookstore dataset needs cleansing prior to data analysis and visualization. The key challenges are:
• Outlier Detection (delete or impute extreme values)
• Missing Value Treatment
• Binning or Discretization
• Central Tendency Imputation
Missing Values Replacement Policies:
• Ignore the records with missing values.
• Replace them with a global constant (e.g., “?”).
• Fill in missing values manually based on your domain knowledge.
• Replace them with the variable mean (if numerical) or the most frequent value (if categorical).
• Use modeling techniques such as nearest neighbors, Bayes’ rule, decision tree, or EM algorithm.
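The mean / most-frequent-value replacement policy can be sketched in a few lines of plain Python. This is only an illustration, not the team's actual preprocessing: the column names `age` and `gender` and the sample values are hypothetical, and `None` stands in for a missing value.

```python
from statistics import mean, mode

def impute_column(values):
    """Fill None entries with the mean (numerical) or the most frequent value (categorical)."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)   # numerical variable: use the mean
    else:
        fill = mode(present)   # categorical variable: use the most frequent value
    return [fill if v is None else v for v in values]

# Hypothetical survey columns with missing entries.
age = [20, None, 30, 40]
gender = ["F", "M", None, "F"]

age_filled = impute_column(age)
gender_filled = impute_column(gender)
```

Mean imputation keeps every record but shrinks the variance of the filled variable, which is one reason the model-based policies in the list may be preferred when many values are missing.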
Descriptive Analytics
What Happened When?
Explaining the past
Demographic Analysis
What will happen?
Predicting the future
Customer Satisfaction Model
Algorithm: Stepwise Linear Regression
How does the algorithm work? It predicts the value of a numerical target variable by building a model from one or more predictors (numerical or categorical variables).
Business Objective: regain customer satisfaction and maximize customer utility
Goal: predict the level of customer satisfaction from purchasing books at our store
Approach:
• Cleanse the data and select relevant features
• Interpret the linear regression model
• Examine the coefficient of determination and p-values
• Tune parameters
• Select and evaluate the model
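As a rough illustration of how a linear regression maps a predictor to a numerical target, the sketch below fits ordinary least squares with a single predictor. It is not the team's stepwise model (stepwise selection additionally adds or removes predictors one at a time), and the variable names and data are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Covariance-like and variance-like sums give the closed-form slope.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Hypothetical data: a service rating (predictor) and a satisfaction score (target).
service_rating = [1, 2, 3, 4, 5]
satisfaction = [3, 5, 7, 9, 11]   # constructed as 1 + 2 * rating

intercept, slope = fit_simple_linear_regression(service_rating, satisfaction)
predicted = intercept + slope * 6   # prediction for an unseen rating of 6
```

The slope is read as the expected change in the target per unit change in the predictor, which is how the coefficients of a fitted model are usually interpreted.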
Regression Model Result
Model Evaluation – R Squared
The coefficient of determination (R²) summarizes the explanatory power of the regression model and is computed from the sums-of-squares terms.
R² describes the proportion of variance of the dependent variable explained by the regression model. If the regression model is “perfect”, SSE (the sum of squared errors) is zero and R² is 1. If the regression model is a total failure, SSE is equal to SST (the total sum of squares), no variance is explained by the regression, and R² is zero.
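The two boundary cases can be checked with a small sketch that uses the standard definition R² = 1 − SSE/SST. The data values are hypothetical and chosen only to reproduce the "perfect" and "total failure" cases.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))   # unexplained variation
    sst = sum((a - mean_actual) ** 2 for a in actual)            # total variation
    return 1 - sse / sst

actual = [2.0, 4.0, 6.0, 8.0]                # hypothetical target values

perfect = r_squared(actual, actual)          # predictions equal the data: SSE = 0
mean_only = r_squared(actual, [5.0] * 4)     # always predict the mean: SSE = SST
```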
Re-Purchasing Algorithms
Algorithm: Linear Regression
How does the algorithm work? It predicts the value of a numerical target variable by building a model from one or more predictors (numerical or categorical variables).
Business Objective: identify the rate at which a customer wants to re-purchase our books
Goal: predict the level of customers' intention to re-purchase books from our store
Approach:
• Cleanse the data and select relevant features
• Interpret the linear regression model
• Examine the coefficient of determination and p-values
• Tune parameters
• Select and evaluate the model
Re-Purchasing Regression
Model Evaluation - RMSE
RMSE (root mean squared error) is a popular measure of the error of a regression model. However, it can only be compared between models whose errors are measured in the same units.
RMSE = 7.29605
Accuracy = 92.70395 (equal to 100 − RMSE)
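For reference, RMSE is the square root of the mean of the squared prediction errors. The sketch below computes it on hypothetical values; it does not reproduce the figure reported on the slide.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target variable."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical targets and predictions; every prediction is off by exactly 1.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [4.0, 4.0, 8.0, 8.0]

error = rmse(actual, predicted)
```

Because the result is expressed in the units of the target, a given RMSE is only meaningful relative to the scale of that target.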
Subscription Model
Algorithm: Decision Tree
How does the algorithm work? A decision tree builds classification or regression models in the form of a tree structure. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. Decision trees can handle both categorical and numerical data.
Business Objective: identify customers' preference to subscribe to ABC Bookstore
Goal: classify customers as apply vs. not apply using classification algorithms
Approach:
• Cleanse the data and select relevant features
• Interpret the decision tree model
• Tune parameters: entropy, information gain, and decision rules
• Select and evaluate the model
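Entropy and information gain, the splitting criteria named in the approach, can be sketched as follows. This is a generic illustration rather than the team's tool configuration; the labels and the two candidate features are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Reduction in entropy obtained by splitting the labels on a feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy after the split, weighted by the size of each subset.
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Hypothetical target and two candidate split features.
subscribed = ["apply", "apply", "not apply", "not apply"]
feature_a = ["high", "high", "low", "low"]   # separates the classes perfectly
feature_b = ["x", "y", "x", "y"]             # carries no information about the class

gain_a = information_gain(subscribed, feature_a)
gain_b = information_gain(subscribed, feature_b)
```

A tree-growing procedure would choose the feature with the larger gain (here `feature_a`) as the next decision node and repeat the process on each resulting subset.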
Model Result
Significant variables: CS5, CS8, CS9 + age, CS14, CS30
Model Evaluation
A confusion matrix shows the number of correct and incorrect predictions made by the classification model compared with the actual outcomes (target values) in the data.
• Accuracy: the proportion of all predictions that were correct.
• Positive Predictive Value or Precision: the proportion of predicted positive cases that are actually positive.
• Negative Predictive Value: the proportion of predicted negative cases that are actually negative.
• Sensitivity or Recall: the proportion of actual positive cases that are correctly identified.
• Specificity: the proportion of actual negative cases that are correctly identified.
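The five measures follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not the team's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics derived from a 2x2 confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),                   # positive predictive value
        "negative_predictive_value": tn / (tn + fn),
        "recall": tp / (tp + fn),                      # sensitivity
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts: true positives, false positives, false negatives, true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
```

Precision and negative predictive value are computed over predicted cases, while recall and specificity are computed over actual cases, so the two pairs can differ considerably on imbalanced data.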
What’s Next?
Data-driven business strategy in the era of Thailand 4.0
• Better Campaign Performance
• Better Market Share
• Better Product Development
• Lasting Revenue
• Better customer satisfaction
• Better books re-purchasing rate
• Better bookstore subscription
Business Analytics: Data-Driven ABC Bookstore
The Three Data Musketeers
NIDA Business Analytics and Data Sciences Contest
September 2, 2016
Business Plan: Table of Contents
1. Business Challenges and Expectations
2. Industry 4.0: Data-Driven Business
3. Marketing Strategy
4. Management
5. Service Recommendation
6. Project Timeline
7. Financial Analysis
Key Challenges & Outcomes
Key Challenges:
• Constant decline in profits
• Customer utility drops
• Low re-purchase
• Low bookstore subscription
Outcomes:
• Sustainable business growth
• Maximized customer utility
• Higher re-purchase
• High incentive to subscribe
Industry 4.0
Data-Driven Marketing
If ABC Bookstore wants to succeed in the era of Thailand 4.0, the company needs to deliver innovation, speed, cost reduction, and creative management
the right message to the right person
at the right time for the right price
Strategy to Maximize Utility
The preceding analysis found that “the employee can answer questions”, “activities/discussions”, “a variety of books”, and “book wrapping” are statistically significant for increasing customer satisfaction.
Management Challenge
• Lack of employee training
• Lack of store activities
Service Challenge
• A variety of books offered
• Add-on for book wrapping?
Strategy to Re-purchase
The preceding analysis found that the top 10 factors statistically associated with re-purchasing are a variety of books, bag claiming, respect for customers, good service from employees, quality of book categorization, a feeling of security or freedom in reading, a place to read books, ease of searching for books, and discounts or promotions.
Strategy to Increase Subscription
• Freedom for customers
• Well-mannered
• There’s a book that I want!
• Book Wrapping Service
• Well-organized
• Quality in book categorization
Creative Management 4.0
The data indicate that the following attributes are most important:
• Well-mannered
• Service-minded
• No pressure on customers
• Freedom for customers
• Respect for customers
Balance of “Can Do” capability and “Will Do” motivation:
• Appropriate training for employees
• Adequate skills and competencies, especially in data-driven decisions
• Updated knowledge: always keep up to date on social media
• Motivate employees through recognition, love of work, career structure, and social respect
ABC Serenade
Novel Business
Academic Magazine
ABC Serenade
Novel Top10
ABC Serenade
Novel
ABC Counter
Timeline
Months shown: 2016/9, 2016/10, 2016/11, 2016/12, 2016/13
Milestones shown:
• Set up Impression Box
• Train Employees 1st
• App Development for Serenade
• Serenade Zone
• Full Operation
• Social Media Marketing
• Result Check-up
• Purchase computers and materials for Serenade Zone
Financial Analysis
Service            Cost
Impression Box     200 B
Training Venue     10,000 B
Computer           20,000 B
App Development    10,000 B
Serenade Bar       10,000 B
Total              50,200 B
Thank You
Time for Q & A