Post on 17-Jan-2018

2002/1/17 IDS Lab Seminar

Evaluating a clustering solution: An application in the tourism market

Advisor: Dr. Hsu
Graduate: Yung-Chu Lin

Outline
  Motivation
  Objective
  The various paradigms
  The number of clusters
  Utility concepts
  Proposed approach
  A tourism market application
  Conclusion

Motivation
  To evaluate a clustering solution

Objective
  Propose a framework for evaluating a clustering solution
  Advocate a multimethodological approach

The various paradigms
  Statistical methods
    Measures of association, association tests, Automatic Interaction Detection (AID), Classification and Regression Trees (CART), Discriminant Analysis and Logistic Regression
  Machine Learning
    Tree classification algorithm (C4.5) and propositional rules (CN2)
  Combining methodologies sets the stage for dealing with rich and complex problems

Statistical methodologies
  Association between two nominal variables
  Cramér statistic
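The formula on this slide did not survive extraction. As a minimal sketch, assuming the slide refers to the usual textbook definition V = sqrt(chi2 / (n * (min(r, c) - 1))), the statistic can be computed from a contingency table of clusters against a nominal variable. The counts below are hypothetical and are not taken from the seminar data.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows of counts).

    Assumes every row and column has a non-zero total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def cramers_v(table):
    """Cramér's V: 0 means no association, 1 means perfect association."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square(table) / (n * (k - 1)))

# Hypothetical counts: rows are clusters, columns are categories of a nominal variable.
independent = [[10, 20], [20, 40]]   # column proportions identical in both clusters
perfect = [[30, 0], [0, 30]]         # each cluster maps to exactly one category

print(cramers_v(independent))
print(cramers_v(perfect))
```

The two tables are the extreme cases: identical column proportions in every cluster give a value of 0, and a one-to-one mapping between clusters and categories gives 1.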

Statistical methodologies (cont’d)
  Uncertainty coefficient
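The slide's formula is missing from the transcript. A sketch under the common definition (an assumption here) is U(X|Y) = I(X;Y) / H(X), where I is the mutual information between the two nominal variables, which the seminar also lists, and H is the entropy of the variable being explained. The tables are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information(table):
    """Mutual information between the row and column variables of a count table."""
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    col_p = [sum(col) / n for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                p = count / n
                mi += p * math.log(p / (row_p[i] * col_p[j]))
    return mi

def uncertainty_coefficient(table):
    """U(row | column): share of the row variable's entropy explained by the column variable."""
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    return mutual_information(table) / entropy(row_p)

# Hypothetical counts: rows are clusters, columns are categories of another variable.
independent = [[10, 20], [20, 40]]
perfect = [[30, 0], [0, 30]]

print(uncertainty_coefficient(independent))
print(uncertainty_coefficient(perfect))
```

Unlike Cramér's statistic, this measure is asymmetric: swapping rows and columns changes the denominator, so U(cluster | variable) and U(variable | cluster) generally differ.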

Statistical methodologies (cont’d)
  Mutual Information
  ANOVA
  MANOVA
  CART
  Discriminant Analysis
  Logistic Regression

Machine learning methodologies
  Decision Trees
    Provide a hierarchical process and model of classification
    Non-backtracking, greedy optimisation algorithm
  Propositional Rules
    Provide logic models
    Represented as “if condition then cluster”
  Neural Networks
  Naive Bayes
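To make the "if condition then cluster" form concrete, here is a small sketch of an ordered rule list in which the first matching rule assigns the cluster. The attribute name `stays_per_year` and its thresholds are hypothetical illustrations, not rules reported in the seminar; only the cluster names come from the application described later.

```python
# Each rule is a (condition, cluster) pair; rules are tried in order.
# The attribute and the thresholds are hypothetical.
RULES = [
    (lambda r: r["stays_per_year"] >= 6, "Heavy users"),
    (lambda r: r["stays_per_year"] >= 1, "Regular users"),
]
DEFAULT_CLUSTER = "First time user"

def classify(record, rules=RULES, default=DEFAULT_CLUSTER):
    """Return the cluster of the first rule whose condition holds, else the default."""
    for condition, cluster in rules:
        if condition(record):
            return cluster
    return default

print(classify({"stays_per_year": 8}))
print(classify({"stays_per_year": 2}))
print(classify({"stays_per_year": 0}))
```

Because each rule is a readable condition, the model can be inspected directly, which is the sense in which rules "provide logic models".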

The number of clusters
  May be set a priori
  May be an outcome of the clustering process itself
  The best number is obtained by comparing measures of model fit for alternative numbers of clusters

The number of clusters (cont’d)
  Mixture Model
  Akaike Information Criterion (AIC)
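As an illustration of comparing fit across alternative numbers of clusters, the sketch below uses the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximised likelihood, and picks the number of clusters with the lowest AIC. The log-likelihoods and parameter counts are hypothetical placeholders, not results from the seminar.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

def best_number_of_clusters(fits):
    """fits maps a number of clusters to (log_likelihood, n_params)."""
    return min(fits, key=lambda k: aic(*fits[k]))

# Hypothetical mixture-model fits for 2, 3 and 4 clusters.
fits = {
    2: (-1250.0, 9),
    3: (-1200.0, 14),
    4: (-1198.0, 19),
}

for k, (loglik, n_params) in fits.items():
    print(k, aic(loglik, n_params))
print(best_number_of_clusters(fits))
```

In this made-up example the fourth cluster improves the likelihood only slightly, so the penalty for its extra parameters outweighs the gain and three clusters are preferred.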

Utility concepts
  The main question in evaluating a clustering solution is a question about utility
  Utility is evaluated by judgement

Proposed approach

preprocess

Proposed approach (cont’d)
  The choice of discriminant and classification methodologies depends on the nature of the variables
  Regarding discrimination, complementary dimensions offer a new perspective and understanding
  An integration of methodologies and techniques based on the Statistical and Machine Learning paradigms

A tourism market application

The clustering solution

Evaluation of clustering solution

Data base
  The answers to a questionnaire: Portuguese clients of Pousadas de Portugal
  49 questions
  200 variables
  2500 Portuguese clients

Clustering
  Model sample: 1647 clients (65%); Validation sample: 897 clients (35%)
  Use a priori and a K-Means procedure
  4 variables expressing the frequency and type of Pousadas
    CH, CSUP, C and B type
  3 clusters (First time user, Regular users and Heavy users)
    Model: 18%, 60% and 22%
    Validation: 16%, 62% and 22%
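The procedure on this slide (split the clients into a model and a validation sample, cluster the model sample, then compare cluster proportions across both samples) can be sketched as follows. This is a plain K-Means written for illustration, not the software used in the study. The synthetic two-dimensional points stand in for the four clustering variables, and assigning validation clients to the nearest model centroid is an assumption that the slides do not spell out.

```python
import random

def split_sample(records, model_share=0.65, seed=0):
    """Randomly split records into a model sample and a validation sample."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * model_share)
    return shuffled[:cut], shuffled[cut:]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))

def k_means(points, k, iterations=100, seed=0):
    """Basic K-Means: random initial centroids, then alternate assign and update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # An empty group keeps its previous centroid.
        new_centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

def cluster_shares(points, centroids):
    """Proportion of points assigned to each centroid."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return [c / len(points) for c in counts]

# Hypothetical data: three synthetic groups of 40 points each.
data_rng = random.Random(1)
points = [(data_rng.gauss(c, 0.5), data_rng.gauss(c, 0.5)) for c in (0, 5, 10) for _ in range(40)]

model, validation = split_sample(points)
centroids = k_means(model, k=3)
print(cluster_shares(model, centroids))
print(cluster_shares(validation, centroids))
```

Similar proportions in the two samples, as in the 18%/60%/22% versus 16%/62%/22% figures above, suggest that the cluster structure is not an artefact of one particular sample.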

Clustering (cont’d)
  2 clusters (Heavy users and Regular users)
    Model: 16 Pousadas and 5 Pousadas
    Validation: 17 Pousadas and 4 Pousadas

Evaluation of clustering solution

Analysis of association between clusters and clustering base
  Measure the degree of correct classification
    Model: 82.6%; Validation: 91.5%
  The linear combinations of the clustering base variables that maximise the ratio of between-cluster to within-cluster variation
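The last point describes the usual discriminant analysis criterion. As a sketch in standard notation (the symbols are assumptions, since the slides give no formula): for G clusters with n_g members and mean vectors \bar{x}_g, and overall mean \bar{x}, a discriminant function is a weight vector a that maximises

```latex
\lambda(\mathbf{a}) = \frac{\mathbf{a}^{\top} \mathbf{B}\, \mathbf{a}}{\mathbf{a}^{\top} \mathbf{W}\, \mathbf{a}},
\qquad
\mathbf{B} = \sum_{g=1}^{G} n_g (\bar{\mathbf{x}}_g - \bar{\mathbf{x}})(\bar{\mathbf{x}}_g - \bar{\mathbf{x}})^{\top},
\qquad
\mathbf{W} = \sum_{g=1}^{G} \sum_{i \in g} (\mathbf{x}_i - \bar{\mathbf{x}}_g)(\mathbf{x}_i - \bar{\mathbf{x}}_g)^{\top}
```

Here B is the between-cluster and W the within-cluster scatter matrix. The maximising vectors are the eigenvectors of W^{-1}B, and the percentage of correct classification reported above is obtained by classifying each client with the resulting functions and comparing the result with the client's cluster.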

Analysis of association between clusters and clustering base (cont’d)

Analysis of association between clusters and other variables
  Chi-square: the strength of association between clusters and variables
  Rule induction procedures discriminate and classify on the basis of attributes significantly associated with clusters
  Rule induction provides a better comprehension of the facts discriminating the clusters
  C4.5 and CN2 are applied to both the Model sample and the Validation sample

Analysis of association between clusters and other variables (cont’d)
  Memorize a group/beam of the best solutions
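Keeping a beam of the best solutions is the idea behind beam search over rule conditions: at each step every condition in the beam is extended by one more attribute test, and only the best few extensions are kept. The sketch below illustrates this under stated assumptions. The records, the attributes and the Laplace-corrected accuracy used as the quality score are hypothetical choices made for brevity, not the seminar's data or necessarily the scoring used by CN2.

```python
# Hypothetical records: (attributes, cluster).
RECORDS = [
    ({"age": "senior", "region": "north"}, "Heavy users"),
    ({"age": "senior", "region": "south"}, "Heavy users"),
    ({"age": "young", "region": "north"}, "Regular users"),
    ({"age": "young", "region": "south"}, "Regular users"),
    ({"age": "senior", "region": "north"}, "Heavy users"),
    ({"age": "young", "region": "north"}, "Regular users"),
]
SELECTORS = [("age", "senior"), ("age", "young"), ("region", "north"), ("region", "south")]

def laplace_accuracy(condition, target="Heavy users", n_classes=2):
    """Laplace-corrected share of covered records that belong to the target cluster."""
    covered = [label for attrs, label in RECORDS
               if all(attrs[a] == v for a, v in condition)]
    hits = sum(label == target for label in covered)
    return (hits + 1) / (len(covered) + n_classes)

def beam_search(selectors, score, beam_width=3, max_length=2):
    """Grow conjunctions of selectors, keeping only the beam_width best at each step."""
    beam = [frozenset()]   # start from the empty condition, which covers every record
    best = frozenset()
    for _ in range(max_length):
        candidates = {c | {s} for c in beam for s in selectors if s not in c}
        if not candidates:
            break
        # Sort by score (descending); ties are broken by the sorted selectors for determinism.
        ranked = sorted(candidates, key=lambda c: (-score(c), sorted(c)))
        beam = ranked[:beam_width]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best

print(sorted(beam_search(SELECTORS, laplace_accuracy)))
```

A beam wider than one lets the search recover from a first test that looks best in isolation but leads to weaker conjunctions, at the cost of evaluating more candidates.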

Analysis of association between clusters and other variables(cont’d)

Global evaluation
  Discriminant Analysis and Logistic Regression clearly show the differences between clusters
  Chi-square tests association between variables and clusters
  C4.5 and CN2 provide a more complex and richer perspective

Conclusion
  Identifying significant associations characterising the clustered entities guided the discriminant and classification analysis
  Propositional rule induction is suitable for discriminating purposes
  A multimethodological approach should consider not only inference but also descriptive analysis
