A generalized cluster centroid based classifier for text categorization
Intelligent Database Systems Lab
Presenter: BEI-YI JIANG
Authors: GUANSONG PANG, SHENGYI JIANG
2013. INFORMATION PROCESSING AND MANAGEMENT
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
With the exponential growth of online textual information, organizing text data effectively and efficiently has become an important and demanding issue.
• KNN − Categorization is computationally expensive, because every new document must be compared with all training documents.
• Rocchio − Fails to obtain an expressive categorization model due to its inherent linear separability assumption.
Objectives
• To strengthen the expressiveness of the Rocchio model.
• To employ the improved Rocchio model to speed up the categorization process of KNN.
Methodology
• KNN
• Rocchio
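The two baselines can be illustrated with a minimal pure-Python sketch. It assumes documents are already dense term-weight vectors (for example TF-IDF) compared by cosine similarity; the function names and the toy labels are hypothetical. The Rocchio part uses the common simplified form in which each class is represented by the mean of its training vectors, without a negative-example term. KNN keeps every training document and scans all of them for each query, while Rocchio compresses each class into a single centroid, which is cheap but yields linear decision boundaries between classes.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, labels, x, k=3):
    """KNN: majority vote among the k training documents most similar to x."""
    # Every training document is scored against x: this full scan is KNN's cost.
    ranked = sorted(zip(train, labels), key=lambda p: cosine(p[0], x), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train, labels):
    """Simplified Rocchio: one centroid per class, the mean of its vectors."""
    sums, counts = {}, Counter(labels)
    for v, y in zip(train, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, value in enumerate(v):
            acc[i] += value
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def rocchio_predict(centroids, x):
    """Assign x to the class whose centroid is most cosine-similar."""
    return max(centroids, key=lambda y: cosine(centroids[y], x))
```

For instance, with the toy vectors `[[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.2]]` labeled `sports`, `sports`, `tech`, `tech`, calling `knn_predict` on `[0.8, 0.2, 0]` with the default `k=3` returns `"sports"`.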
Methodology
• GCC
• Determine the threshold
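The slides do not reproduce the GCCC algorithm itself, so the following is only a sketch of the general idea under stated assumptions, not the authors' method. Each class may own several centroids, built by a hypothetical one-pass clustering: a document joins the most similar existing cluster of its own class if the cosine similarity reaches a threshold, and otherwise starts a new cluster. A query takes the label of its most similar centroid, and the threshold is picked by accuracy on a held-out set. The helper names and the grid-search selection rule are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fit_cluster_centroids(train, labels, threshold):
    """Hypothetical one-pass, per-class threshold clustering (not the paper's GCCC).

    Returns a list of (label, centroid) pairs; one class may own several centroids.
    """
    clusters = []  # each entry: [label, running_sum_vector, member_count]
    for v, y in zip(train, labels):
        best, best_sim = None, -math.inf
        for c in clusters:
            if c[0] != y:
                continue  # a document may only join a cluster of its own class
            # Cosine is scale-invariant, so the running sum stands in for the mean.
            sim = cosine(c[1], v)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best[1] = [s + x for s, x in zip(best[1], v)]
            best[2] += 1
        else:
            clusters.append([y, list(v), 1])  # start a new cluster
    return [(y, [s / n for s in total]) for y, total, n in clusters]


def predict(model, x):
    """Return the label of the centroid most cosine-similar to x."""
    return max(model, key=lambda pair: cosine(pair[1], x))[0]


def choose_threshold(train, labels, val, val_labels, candidates):
    """Hypothetical grid search: first candidate with the best held-out accuracy."""
    best_t, best_acc = None, -1.0
    for t in candidates:
        model = fit_cluster_centroids(train, labels, t)
        hits = sum(predict(model, x) == y for x, y in zip(val, val_labels))
        acc = hits / len(val)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

In this sketch the threshold interpolates between the two baselines. With non-negative vectors and a threshold of 0, every document joins its class's single cluster, which reproduces the Rocchio class centroids; with a threshold above 1, no document ever joins a cluster, so each training document becomes its own centroid and prediction reduces to 1-nearest-neighbor. Intermediate values give several centroids per class, which can represent a class whose documents lie in separate regions that one centroid cannot cover. The clustering pass compares each document only with the existing clusters, so for a bounded number of clusters its cost grows linearly with the number of documents.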
Experiments
Conclusions
• Strengthens the expressiveness of the Rocchio model
• GCCC and its variants achieve impressive performance
• Obtains near-linear time complexity in modeling
• GCCC's modeling stage is more time-consuming
Comments
• Advantages
− Relatively stable
− Favorable performance
• Applications
− Online categorization