Using Density-Based Clustering Approaches to Address Product Classification based on Kaggle Data
Denis Whelan & Jin Ming
December 3, 2017
Introduction: Product Classification Challenge
▶ Background:
  ◦ The Otto Group is one of the largest e-commerce companies in the world.
  ◦ Arranging millions of products from a variety of different products and countries is a complex task that requires a sophisticated approach.
▶ Data:
  ◦ ~62,000 samples, 93 numeric features, 9 target labels
  ◦ Target labels are hidden but represent key categories such as electronics, fashion, etc.
▶ Kaggle Competition Purpose:
  ◦ Build a predictive model which can accurately classify products into the 9 appropriate categories (supervised learning)
▶ Our Purpose:
  ◦ Apply density-based clustering methods to this product classification problem to cluster all products (unsupervised)
Methods: CLARA, DBSCAN, NG-DBSCAN
● CLARA (1990):
  ● The basic k-medoids method for large data applications
● DBSCAN (1996):
  ● The original density-based method
● NG-DBSCAN (2016):
  ● Modified DBSCAN method
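To make the CLARA idea concrete, here is a minimal pure-Python sketch: run a PAM-style medoid search on a few random samples and keep the medoids that score best on the full data. It is an illustration under assumptions, not the implementation used for these slides; the function names (`total_cost`, `pam`, `clara`), the greedy swap search, and the defaults `n_draws=5` and `sample_size=40` are all hypothetical.

```python
import math
import random

def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, dist, rng):
    """Greedy PAM-style swap search; only practical on a small list of points."""
    medoids = rng.sample(points, k)
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                # Try replacing medoid i with the candidate point.
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(points, k, dist=math.dist, n_draws=5, sample_size=40, seed=0):
    """Run PAM on several random samples; keep the medoids best on ALL points."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_draws):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, dist, rng)
        cost = total_cost(points, medoids, dist)  # scored on the full data set
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

Because the expensive swap search only ever sees `sample_size` points, the cost per draw stays small even when the full data set is large, which is the reason CLARA is suited to large applications.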
CLARA, DBSCAN
● CLARA (Clustering Large Applications, 1990):
  ● Sampling with PAM
● DBSCAN (Ester et al., 1996):
  ● The use of density-reachable points and density-connected points
  ● Groups data packed in high-density regions of the feature space
  ● Separates 'core points' from 'noise points'
  ● Recognizes clusters with arbitrary shapes
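The DBSCAN bullets above can be sketched in a few lines of pure Python. This is a textbook-style illustration, not the code behind the reported results; the names `region_query` and `dbscan` and the brute-force neighbor scan are assumptions made for clarity. A point with at least `min_pts` points within `eps` (itself included) is treated as a core point, clusters grow through density-reachable points, and whatever is never reached stays noise (label -1).

```python
import math

NOISE = -1

def region_query(points, i, eps, dist):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts, dist=math.dist):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps, dist)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps, dist)
            if len(j_neighbors) >= min_pts:  # j is core too: keep expanding
                queue.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

The brute-force `region_query` makes this sketch quadratic in the number of points, so it is only meant for small examples.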
NG-DBSCAN (Lulli et al. 2016)
● Limitations of DBSCAN
  ● Scalability is limited
  ● Cannot handle arbitrary similarity measures, only uses Euclidean distance
  ● Results depend on the choice of Eps and MinPts
● NG-DBSCAN:
  ● An approximated and distributed implementation of DBSCAN
  ● More efficient because of approximation
  ● Can represent item dissimilarity through any symmetric distance function
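As a small illustration of what "any symmetric distance function" can mean, here is one non-Euclidean example: Jaccard distance between sets. The choice of Jaccard and the name `jaccard_distance` are assumptions for illustration only; the slides do not say which distances were tried.

```python
def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention: two empty sets are identical
    return 1.0 - len(a & b) / len(union)

# Symmetric by construction: swapping the arguments gives the same value.
d = jaccard_distance({1, 2, 3}, {2, 3, 4})  # 2 shared items out of 4 -> 0.5
```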
NG-DBSCAN
● Phase 1:
  ● create ε-graph
    i. form neighbor graph by connecting each node to k random other nodes
    ii. edges are added to ε-graph if the distance is less than ε
    iii. as soon as a node has M_max neighbors in the ε-graph, remove it from neighbor graph
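The three Phase 1 steps can be sketched on a single machine as follows. This is a simplified, sequential illustration and not the distributed algorithm of Lulli et al.: the function name `build_eps_graph`, the fixed iteration count `n_iter`, the two-hop candidate search, and the rule of keeping the `k` closest candidates are all assumptions made to keep the example short. In this sketch, a removed node simply stops searching; other nodes may still reach it.

```python
import math
import random

def build_eps_graph(points, eps, k=5, m_max=10, n_iter=10,
                    dist=math.dist, seed=0):
    """Approximate eps-graph as {node: set(eps-neighbors)}; needs len(points) > k."""
    rng = random.Random(seed)
    n = len(points)
    eps_graph = {i: set() for i in range(n)}
    # i. Neighbor graph: connect every node to k random other nodes.
    neighbor = {}
    for i in range(n):
        others = [j for j in rng.sample(range(n), k + 1) if j != i]
        neighbor[i] = set(others[:k])
    for _ in range(n_iter):
        if not neighbor:
            break
        for i in list(neighbor):
            # Candidates: current neighbors plus their neighbors (two hops).
            candidates = set(neighbor[i])
            for j in neighbor[i]:
                candidates |= neighbor.get(j, set())
            candidates.discard(i)
            # ii. Add an edge to the eps-graph if the distance is less than eps.
            for j in candidates:
                if dist(points[i], points[j]) < eps:
                    eps_graph[i].add(j)
                    eps_graph[j].add(i)
            # Keep the k closest candidates so the search drifts toward
            # genuinely nearby nodes on the next iteration.
            closest = sorted(candidates,
                             key=lambda j: dist(points[i], points[j]))[:k]
            neighbor[i] = set(closest)
        # iii. Nodes with at least m_max eps-neighbors leave the neighbor graph.
        for i in [i for i in neighbor if len(eps_graph[i]) >= m_max]:
            del neighbor[i]
    return eps_graph
```

The result is approximate: every edge it reports is a true ε-neighbor pair, but some true pairs may be missed, which is the trade-off that buys the efficiency gain.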
(Lulli et al. 2016)
NG-DBSCAN
● Phase 2:
  ● discovering dense regions
    i. coreness dissemination
    ii. seed identification
    iii. seed propagation
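A sequential stand-in for the three Phase 2 steps is sketched below. It assumes the ε-graph is given as a dict mapping each node to the set of its ε-neighbors, and it simplifies the distributed procedure of Lulli et al. considerably: the function name `discover_dense_regions`, the rule that a node is core when it has at least `min_pts - 1` ε-neighbors (counting itself toward MinPts), and the "adopt the largest seed" propagation are illustrative assumptions.

```python
def discover_dense_regions(eps_graph, min_pts):
    """Return {node: cluster seed id}, with None for noise nodes."""
    # i. Coreness dissemination: decide which nodes are core.
    core = {i for i, nbrs in eps_graph.items() if len(nbrs) + 1 >= min_pts}
    # ii. Seed identification: every core node starts as its own seed.
    seed = {i: i for i in core}
    # iii. Seed propagation: adopt the largest seed seen among core
    #      neighbors until no seed changes any more.
    changed = True
    while changed:
        changed = False
        for i in core:
            best = max([seed[i]] + [seed[j] for j in eps_graph[i] if j in core])
            if best != seed[i]:
                seed[i] = best
                changed = True
    labels = {}
    for i, nbrs in eps_graph.items():
        if i in core:
            labels[i] = seed[i]
        else:
            # Border nodes join a neighboring core's cluster; the rest is noise.
            core_nbrs = [j for j in nbrs if j in core]
            labels[i] = seed[max(core_nbrs)] if core_nbrs else None
    return labels

# Two 4-node cliques, one border node (4) attached to node 3, one isolated node (9).
graph = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3},
    5: {6, 7, 8}, 6: {5, 7, 8}, 7: {5, 6, 8}, 8: {5, 6, 7}, 9: set(),
}
labels = discover_dense_regions(graph, min_pts=4)
# labels == {0: 3, 1: 3, 2: 3, 3: 3, 4: 3, 5: 8, 6: 8, 7: 8, 8: 8, 9: None}
```

Each connected group of core nodes ends up sharing the largest node id in the group as its cluster label.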
(Lulli et al. 2016)
Results: CLARA
▶ Runtime:
  ◦ 0.95 seconds
Results: CLARA & DBSCAN
Results: DBSCAN
▶ Runtime:
  ◦ 18.3 minutes
Questions?