2002/1/17 IDS Lab Seminar
Evaluating a clustering solution: An application in the tourism market
Advisor: Dr. Hsu
Graduate: Yung-Chu Lin
Outline
- Motivation
- Objective
- The various paradigms
- The number of clusters
- Utility concepts
- Proposed approach
- A tourism market application
- Conclusion
Motivation
- To evaluate a clustering solution
Objective
- Propose a framework for evaluating a clustering solution
- Advocate a multimethodological approach
The various paradigms
- Statistical methods
  - Measures of association, association tests, Automatic Interaction Detection (AID), Classification and Regression Trees (CART), Discriminant Analysis and Logistic Regression
- Machine Learning
  - Tree classification algorithm (C4.5) and propositional rules (CN2)
- Combining methodologies sets the stage for dealing with rich and complex problems
Statistical methodologies
- Association between two nominal variables
- Cramér statistic (Cramér's V)
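The formula on this slide did not survive extraction. As a rough sketch, assuming the usual definition V = sqrt(chi2 / (n * (min(r, c) - 1))), Cramér's V for two nominal variables could be computed as below. The function name `cramers_v` and the toy labels are illustrative assumptions, not taken from the seminar.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of nominal labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row = Counter(xs)              # row marginals
    col = Counter(ys)              # column marginals
    chi2 = 0.0
    for r, nr in row.items():
        for c, nc in col.items():
            expected = nr * nc / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    # With a single category on either side the measure is undefined; return 0.
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# Hypothetical data: cluster membership versus a yes/no questionnaire answer.
clusters = ["heavy", "heavy", "regular", "regular", "heavy", "regular"]
answer = ["yes", "yes", "no", "no", "yes", "no"]
print(cramers_v(clusters, answer))
```

V ranges from 0 (no association) to 1 (each category of one variable maps to exactly one category of the other), which is the case in the toy data above.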
Statistical methodologies (cont’d)
- Uncertainty Coefficient
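The slide's formula is missing from the transcript. A minimal sketch, assuming the standard definition U(X|Y) = I(X;Y) / H(X) (mutual information divided by the entropy of the variable being explained), might look like this. Function names and sample labels are illustrative assumptions.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a sequence of nominal labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(xs, ys):
    """I(X;Y) estimated from the joint and marginal frequencies."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())

def uncertainty_coefficient(xs, ys):
    """U(X|Y): share of the uncertainty in xs that is removed by knowing ys."""
    hx = entropy(xs)
    return mutual_information(xs, ys) / hx if hx > 0 else 0.0

# Hypothetical data: cluster labels versus one questionnaire variable.
clusters = ["heavy", "heavy", "regular", "regular"]
purpose = ["business", "business", "leisure", "leisure"]
print(uncertainty_coefficient(clusters, purpose))
```

Unlike Cramér's V, this measure is asymmetric: U(X|Y) and U(Y|X) generally differ, so it matters which variable is treated as the cluster label.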
Statistical methodologies (cont’d)
- Mutual Information
- ANOVA
- MANOVA
- CART
- Discriminant Analysis
- Logistic Regression
Machine learning methodologies
- Decision Trees
  - Provide a hierarchical process and model of classification
  - Non-backtracking, greedy optimisation algorithm
- Propositional Rules
  - Provide logic models
  - Represented by “if condition then cluster”
- Neural Networks
- Naive Bayes
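To illustrate what "greedy, non-backtracking" means for a decision tree, here is a small sketch of one split decision: pick the attribute with the highest information gain and never revisit the choice. Note that this uses plain information gain for simplicity, whereas C4.5 itself uses the gain ratio; the records and attribute names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Reduction in label entropy after partitioning rows on attr."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Greedy step: choose the single attribute with the largest gain."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))

# Hypothetical questionnaire records and their cluster labels.
rows = [
    {"age": "young", "travels_for": "leisure"},
    {"age": "old", "travels_for": "leisure"},
    {"age": "young", "travels_for": "business"},
    {"age": "old", "travels_for": "business"},
]
labels = ["regular", "regular", "heavy", "heavy"]
print(best_split(rows, labels))
```

A full tree would apply `best_split` recursively to each partition; each path from root to leaf then reads as one "if condition then cluster" rule.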
The number of clusters
- May be set a priori
- May be an outcome of the clustering process itself
- The best number is obtained by comparing measures of model fit for alternative numbers of clusters
The number of clusters (cont’d)
- Mixture Model
- Akaike Information Criterion (AIC)
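As a sketch of how AIC can be used to compare alternative numbers of clusters, assuming the standard form AIC = 2k - 2 ln L (k free parameters, L the maximised likelihood of the mixture model): the log-likelihoods and parameter counts below are invented for illustration and do not come from the seminar.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: number of clusters -> (maximised log-likelihood, free parameters).
fits = {2: (-1250.0, 9), 3: (-1190.0, 14), 4: (-1187.0, 19)}

scores = {k: aic(ll, p) for k, (ll, p) in fits.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

Here the fourth cluster improves the likelihood only slightly, so the penalty for its extra parameters makes the three-cluster model the preferred one.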
Utility concepts
- The main question in evaluating a clustering is a question about utility
- Utility is evaluated by judgement
Proposed approach
preprocess
Proposed approach (cont’d)
- The choice of discriminant and classification methodologies depends on the nature of the variables
- Regarding discrimination, complementary dimensions offer a new perspective and understanding
- An integration of methodologies and techniques based on the Statistical and Machine Learning paradigms
A tourism market application
The clustering solution
Evaluation of clustering solution
Database
- The answers to a questionnaire: Portuguese clients of Pousadas de Portugal
- 49 questions
- 200 variables
- 2500 Portuguese clients
Clustering
- Model sample: 1647 clients (65%); validation sample: 897 clients (35%)
- Use a priori and K-Means procedures
- 4 variables expressing the frequency and type of Pousadas
  - CH, CSUP, C and B type
- 3 clusters (first-time users, regular users and heavy users)
  - Model: 18%, 60% and 22%
  - Validation: 16%, 62% and 22%
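The split into model and validation samples followed by K-Means can be sketched as below. This is a plain Lloyd-style K-Means written for illustration, not the software used in the study; it uses two variables instead of the four on the slide, and the stay counts, starting centroids and seed are all hypothetical.

```python
import random

def split_sample(records, model_share=0.65, seed=0):
    """Shuffle and split records into a model sample and a validation sample."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * model_share)
    return shuffled[:cut], shuffled[cut:]

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        assign = [nearest(p, centroids) for p in points]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assign) if a == j]
            # An empty cluster keeps its previous centroid.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

# Hypothetical stay counts per client for two Pousada types.
stays = [(1, 0), (1, 1), (0, 1), (5, 4), (4, 5), (5, 5), (16, 12), (15, 14), (17, 13)]
centroids, assignment = k_means(stays, [(0, 0), (5, 5), (15, 15)])

model, validation = split_sample(list(range(100)))
print(assignment, len(model), len(validation))
```

Comparing cluster shares between the two samples, as the slide does with 18/60/22% against 16/62/22%, is a simple check that the solution is stable.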
Clustering (cont’d)
- 2 clusters (heavy users and regular users)
  - Model: 16 Pousadas and 5 Pousadas
  - Validation: 17 Pousadas and 4 Pousadas
A tourism market application
The clustering solution
Evaluation of clustering solution
Evaluation of clustering solution
Analysis of association between clusters and clustering base
- Measure the rate of correct classification
  - Model: 82.6%; validation: 91.5%
- The linear combinations of the clustering base variables that maximise the ratio of between-cluster to within-cluster variation
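For two clusters, the linear combination that maximises between-cluster relative to within-cluster variation is Fisher's discriminant direction, w = Sw^-1 (m_b - m_a). The sketch below covers only that two-cluster, two-variable case, with hypothetical data, and is not expected to reproduce the percentages reported on the slide.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def dot(w, x):
    return w[0] * x[0] + w[1] * x[1]

def fisher_direction(group_a, group_b):
    """Return (w, threshold) with w = Sw^-1 (m_b - m_a); assumes Sw is invertible."""
    ma, mb = mean(group_a), mean(group_b)
    axx, axy, ayy = scatter(group_a, ma)
    bxx, bxy, byy = scatter(group_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy   # pooled within-cluster scatter
    det = sxx * syy - sxy * sxy
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    threshold = (dot(w, ma) + dot(w, mb)) / 2          # midpoint of projected means
    return w, threshold

def hit_rate(points, labels, w, threshold):
    """Share of points whose projected score falls on the side of their own cluster."""
    predicted = ["b" if dot(w, p) > threshold else "a" for p in points]
    return sum(p == t for p, t in zip(predicted, labels)) / len(labels)

# Hypothetical clustering-base values for regular ("a") and heavy ("b") users.
regular = [(1, 2), (2, 1), (2, 3), (3, 2)]
heavy = [(6, 7), (7, 6), (7, 8), (8, 7)]
w, threshold = fisher_direction(regular, heavy)
rate = hit_rate(regular + heavy, ["a"] * 4 + ["b"] * 4, w, threshold)
print(w, threshold, rate)
```

Running `hit_rate` once on the model sample and once on the validation sample gives the pair of correct-classification rates of the kind quoted on the slide.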
Analysis of association between clusters and other variables
- Chi-square tests the strength of association between clusters and variables
- Rule induction procedures discriminate and classify on the basis of attributes significantly associated with clusters
- Rule induction provides a better comprehension of the facts discriminating the clusters
- C4.5 and CN2 are evaluated on both the model sample and the validation sample
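To make the "if condition then cluster" idea concrete, a rule can be stored as a set of attribute tests plus a predicted cluster and then scored on each sample. The rule, attribute names and records below are hypothetical, and this is only a way of evaluating a given rule, not the C4.5 or CN2 induction procedure itself.

```python
def matches(rule, record):
    """True when the record satisfies every attribute test in the rule's condition."""
    return all(record.get(attr) == value for attr, value in rule["if"].items())

def evaluate(rule, records, clusters):
    """Return (coverage, accuracy) of a rule on a labelled sample."""
    covered = [c for r, c in zip(records, clusters) if matches(rule, r)]
    coverage = len(covered) / len(records)
    accuracy = covered.count(rule["then"]) / len(covered) if covered else 0.0
    return coverage, accuracy

# Hypothetical rule: if purpose = business then cluster = heavy.
rule = {"if": {"purpose": "business"}, "then": "heavy"}

model_records = [
    {"purpose": "business", "age": "40+"},
    {"purpose": "business", "age": "<40"},
    {"purpose": "leisure", "age": "40+"},
    {"purpose": "leisure", "age": "<40"},
]
model_clusters = ["heavy", "heavy", "regular", "regular"]

validation_records = [
    {"purpose": "business", "age": "40+"},
    {"purpose": "business", "age": "<40"},
    {"purpose": "leisure", "age": "40+"},
]
validation_clusters = ["heavy", "regular", "regular"]

print(evaluate(rule, model_records, model_clusters))
print(evaluate(rule, validation_records, validation_clusters))
```

A rule that is accurate on the model sample but much less so on the validation sample, as in this toy case, is a sign that it describes the sample rather than the cluster.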
Analysis of association between clusters and other variables (cont’d)
- Memorize a group/beam of the best solutions
Global evaluation
- Discriminant Analysis and Logistic Regression clearly show the differences between clusters
- Chi-square tests the association between variables and clusters
- C4.5 and CN2 provide a more complex and richer perspective
Conclusion
- Identifying significant associations characterising the clustered entities guided the discriminant and classification analysis
- Propositional rule induction is suitable for discriminating purposes
- A multimethodological approach should consider not only inference but also descriptive analysis