A generalized cluster centroid based classifier for text categorization

Intelligent Database Systems Lab. Presenter: BEI-YI JIANG. Authors: GUANSONG PANG, SHENGYI JIANG. 2013, INFORMATION PROCESSING AND MANAGEMENT. A generalized cluster centroid based classifier for text categorization.

Page 1: A generalized cluster centroid based classifier for text categorization

Intelligent Database Systems Lab

Presenter : BEI-YI JIANG

Authors : GUANSONG PANG, SHENGYI JIANG

2013. INFORMATION PROCESSING AND MANAGEMENT

A generalized cluster centroid based classifier for text categorization

Page 2

Outlines

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Page 3

Motivation

• KNN
− With the exponential growth of online textual information, how to organize text data effectively and efficiently has become an important and demanding issue.

• Rocchio
− Fails to obtain an expressive categorization model due to its inherent linear separability assumption.

Page 4

Objectives

• To strengthen the expressiveness of the Rocchio model.

• To employ the improved Rocchio model to speed up the categorization process of KNN.

Page 5

Methodology

• KNN

• Rocchio
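To make the two baselines named on this slide concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the formulation used in the paper: texts are assumed to be sparse term-weight dictionaries, similarity is assumed to be cosine, and the Rocchio variant shown is the common centroid-only simplification (one mean vector per class, no negative-example term). The names `cosine`, `knn_predict`, `rocchio_fit` and `rocchio_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query, k=3):
    """KNN: majority vote among the k training texts most similar to the query.

    Every training text is compared with the query, so the cost of one
    prediction grows with the size of the training set.
    """
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Centroid-only Rocchio (assumed simplification): one mean vector per class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for vec, label in train:
        counts[label] += 1
        for term, weight in vec.items():
            sums[label][term] += weight
    return {c: {t: w / counts[c] for t, w in s.items()} for c, s in sums.items()}


def rocchio_predict(centroids, query):
    """Assign the class whose centroid is most similar to the query."""
    return max(centroids, key=lambda c: cosine(centroids[c], query))


# Hypothetical toy data: (term-weight vector, class label).
train = [
    ({"goal": 1.0, "match": 1.0}, "sports"),
    ({"goal": 1.0, "team": 1.0}, "sports"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"stock": 1.0, "bank": 1.0}, "finance"),
]
query = {"goal": 1.0, "team": 1.0}

knn_label = knn_predict(train, query, k=3)
rocchio_label = rocchio_predict(rocchio_fit(train), query)
```

In this sketch KNN scans all training texts per query, while Rocchio compares the query with one centroid per class. That is the speed and expressiveness trade-off the Motivation and Objectives slides refer to.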

Page 6

Methodology

Page 7

Methodology

Page 8

Methodology

• GCCC

• Determine the threshold
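The slide gives no detail on how clusters are built or how the threshold is chosen, so the following Python sketch only illustrates the general idea of a cluster centroid classifier. It assumes a simple one-pass scheme: each training text joins the most similar existing cluster of its own class if the cosine similarity reaches a threshold, and otherwise starts a new cluster. The scheme, the fixed hand-picked threshold, and the names `Cluster`, `fit_clusters` and `predict` are all assumptions for illustration, not the authors' algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Cluster:
    """A group of same-class texts summarised by the running sum of its members."""

    def __init__(self, vec, label):
        self.total = dict(vec)
        self.size = 1
        self.label = label

    def add(self, vec):
        for term, weight in vec.items():
            self.total[term] = self.total.get(term, 0.0) + weight
        self.size += 1

    def centroid(self):
        return {t: w / self.size for t, w in self.total.items()}


def fit_clusters(train, threshold):
    """One pass over the training set (assumed scheme, see the lead-in)."""
    clusters = []
    for vec, label in train:
        same_class = [c for c in clusters if c.label == label]
        best = max(same_class, key=lambda c: cosine(c.centroid(), vec), default=None)
        if best is not None and cosine(best.centroid(), vec) >= threshold:
            best.add(vec)  # close enough: absorb into the existing cluster
        else:
            clusters.append(Cluster(vec, label))  # otherwise open a new cluster
    return clusters


def predict(clusters, query):
    """Assign the label of the cluster whose centroid is most similar to the query."""
    return max(clusters, key=lambda c: cosine(c.centroid(), query)).label


# Hypothetical toy data: "sports" has two unrelated sub-topics.
train = [
    ({"goal": 1.0, "match": 1.0}, "sports"),
    ({"goal": 1.0, "team": 1.0}, "sports"),
    ({"serve": 1.0, "racket": 1.0}, "sports"),
    ({"stock": 1.0, "market": 1.0}, "finance"),
    ({"stock": 1.0, "bank": 1.0}, "finance"),
]
clusters = fit_clusters(train, threshold=0.4)
tennis_label = predict(clusters, {"serve": 1.0, "net": 1.0})
bank_label = predict(clusters, {"bank": 1.0})
```

Because a class may be represented by several centroids, a class with unrelated sub-topics is no longer forced into a single mean vector. A higher threshold yields more, smaller clusters, and a lower one yields fewer, larger clusters.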

Page 9

Experiments

Page 10

Experiments

Page 11

Experiments

Page 12

Experiments

Page 13

Experiments

Page 14

Experiments

Page 15

Experiments

Page 16

Conclusions

• strengthen the expressiveness of the Rocchio model
• GCCC and its variants achieve impressive performance
• obtain near linear time complexity in modeling
• GCCC’s modeling stage is more time-consuming

Page 17

Comments

• Advantages
- relatively stable
- favorable performance

• Applications
- online categorization