kdd cup 2007
![Page 1: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/1.jpg)
2007_12_31 KDD CUP 2007 Neural Network HW2
Group 14
Yu Szu-Hsien (M9609208)
Ciou Yun-Rong (M9608305)
![Page 2: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/2.jpg)
Group 14 HW 2
How?
(method & system)
![Page 3: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/3.jpg)
1. Make into a matrix
By analyzing the types of films that a customer has rated, we can predict that customer's ratings for other films of the same type.
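A minimal sketch of the matrix step in Python with NumPy. The (customer, film, rating) triples, the toy values, and the function name `build_rating_matrix` are illustrative assumptions, not taken from the slides; a 0 entry marks a film the customer has not rated.

```python
import numpy as np

def build_rating_matrix(triples):
    """Return (matrix, customer_ids, film_ids) built from (customer, film, rating) triples."""
    customer_ids = sorted({c for c, _, _ in triples})
    film_ids = sorted({f for _, f, _ in triples})
    row = {c: i for i, c in enumerate(customer_ids)}
    col = {f: j for j, f in enumerate(film_ids)}
    # Rows are customers, columns are films; 0 means "not rated".
    matrix = np.zeros((len(customer_ids), len(film_ids)))
    for c, f, r in triples:
        matrix[row[c], col[f]] = r
    return matrix, customer_ids, film_ids

# Hypothetical toy data: (customer id, film id, rating)
triples = [(1, 10, 5), (1, 11, 3), (2, 10, 4), (3, 12, 2)]
matrix, customers, films = build_rating_matrix(triples)
```

A dense array like this only works for a small sample; for the full dataset a sparse representation would probably be needed.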
![Page 4: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/4.jpg)
2. The characteristics of the problem
The problem is based on the data in an enormous database.
Each customer's series of ratings reflects that customer's personality, preferences, and the time intervals between ratings.
For every movie, we can compile statistics on how many customers rated it at different times and treat them as a time series.
For every customer, we can compile statistics on what that customer rated and treat them as a time series.
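The per-movie and per-customer time series described above could be compiled roughly as follows. This is only a sketch: the record layout (customer, movie, ISO date string), the monthly buckets, and the name `ratings_per_period` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def ratings_per_period(records, key_index):
    """Count ratings per month for each entity.

    key_index 0 groups by customer, key_index 1 groups by movie.
    """
    series = defaultdict(Counter)
    for record in records:
        period = record[2][:7]  # "YYYY-MM" taken from "YYYY-MM-DD"
        series[record[key_index]][period] += 1
    # Sort each series chronologically so it can be read as a time series.
    return {key: sorted(counts.items()) for key, counts in series.items()}

# Hypothetical toy records: (customer id, movie id, rating date)
records = [
    (1, 10, "2005-01-15"),
    (2, 10, "2005-01-20"),
    (1, 11, "2005-02-03"),
    (3, 10, "2005-02-11"),
]
movie_series = ratings_per_period(records, key_index=1)
customer_series = ratings_per_period(records, key_index=0)
```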
![Page 5: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/5.jpg)
Methods → How do we find similar films and similar users?
Similarity measures
Poisson regression
Clustering analysis
Association rules
Random forests
Collaborative filtering (group filtering or social filtering)
Singular value decomposition (SVD)
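As one possible similarity measure, the sketch below computes cosine similarity between the rows of a rating matrix. The slides do not say which measure was used, so the choice of cosine similarity, the toy matrix, and the name `cosine_similarity_matrix` are assumptions. Unrated films are stored as 0 here, which is a simplification.

```python
import numpy as np

def cosine_similarity_matrix(ratings):
    """Pairwise cosine similarity between the rows of `ratings`."""
    norms = np.linalg.norm(ratings, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for rows with no ratings
    unit = ratings / norms
    return unit @ unit.T

# Hypothetical toy matrix: rows are users, columns are films, 0 = not rated
ratings = np.array([
    [5.0, 3.0, 0.0],
    [4.0, 0.0, 0.0],
    [0.0, 0.0, 2.0],
])
user_similarity = cosine_similarity_matrix(ratings)    # user-to-user
film_similarity = cosine_similarity_matrix(ratings.T)  # film-to-film
```

Passing the transposed matrix gives film-to-film similarity, so the same function covers both halves of the question.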
![Page 6: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/6.jpg)
System
<Weka> : multilayer perceptron (MLP). Data mining software in Java.
<MATLAB> : backpropagation. The language of technical computing.
<MS SQL 2005> : clustering. A comprehensive, integrated data management and analysis software.
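For readers without Weka or MATLAB, the idea of a multilayer perceptron trained by backpropagation can be sketched in Python with NumPy. This is not the configuration used in the slides: the single tanh hidden layer, the learning rate, the toy inputs, and the ratings scaled to [0, 1] are all assumptions chosen to keep the example short.

```python
import numpy as np

def train_mlp(X, y, hidden=4, epochs=500, lr=0.1, seed=0):
    """Train a one-hidden-layer MLP with gradient-descent backpropagation."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        pred = h @ W2 + b2
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error
        grad_pred = 2.0 * err / len(X)
        grad_W2 = h.T @ grad_pred
        grad_b2 = grad_pred.sum(axis=0)
        grad_z = (grad_pred @ W2.T) * (1.0 - h ** 2)  # tanh derivative
        grad_W1 = X.T @ grad_z
        grad_b1 = grad_z.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    """Run the forward pass with trained parameters."""
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Hypothetical toy data: two binary inputs, ratings scaled to [0, 1]
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.2], [0.6], [0.6], [1.0]])
params, losses = train_mlp(X, y)
predictions = predict(params, X)
```

The `hidden` and `epochs` arguments correspond to the two knobs a tool such as Weka exposes for an MLP: the number of hidden neurons and the number of training iterations.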
![Page 7: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/7.jpg)
Result (training & test set)
![Page 8: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/8.jpg)
Difficulties encountered
“Out of memory!!”: the dataset is too large.
The dataset does not provide enough features.
Which features are valuable and really needed?
Which algorithm should be used?
![Page 9: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/9.jpg)
Training & Test set
Downsize the dataset: group the records by their features (using SQL), then sample from the groups for training
Make the sampled dataset into a matrix
Train in the tools: Weka, MATLAB
Evaluate the accuracy by RMSE
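The downsizing and evaluation steps above can be sketched in plain Python. The slides did the grouping in SQL; the in-memory grouping, the grouping key, the sample size, and the names `sample_per_group` and `rmse` below are assumptions for illustration only.

```python
import math
import random
from collections import defaultdict

def sample_per_group(records, group_of, per_group, seed=0):
    """Draw up to `per_group` records at random from each group."""
    groups = defaultdict(list)
    for record in records:
        groups[group_of(record)].append(record)
    rng = random.Random(seed)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical toy records: (group label, rating)
records = [("drama", 4), ("drama", 5), ("drama", 3), ("comedy", 2), ("comedy", 4)]
training = sample_per_group(records, group_of=lambda r: r[0], per_group=2)

# Errors are -1 and 2, so RMSE = sqrt((1 + 4) / 2) = sqrt(2.5)
error = rmse([3.0, 5.0], [4.0, 3.0])
```

Sampling a fixed number of records per group keeps every group represented in the training set while bounding its size.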
![Page 10: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/10.jpg)
The Sketch
![Page 11: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/11.jpg)
SQL Server
![Page 12: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/12.jpg)
MATLAB(1/2)
![Page 13: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/13.jpg)
MATLAB(2/2)
(# Training Data = 10040, Test Data = 42)
![Page 14: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/14.jpg)
Weka
(# Training Data = 118, Test Data = 13)
![Page 15: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/15.jpg)
Analysis (why)
![Page 16: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/16.jpg)
Analysis
<Weka> We regard the data as a matrix of movies and users.
• Defect: the matrix is enormous.
Solution: classify the movies or users first.
Minimizing the error rate: adjust the number of neurons in the multilayer perceptron and the number of training iterations.
<MATLAB> Not enough features (only one feature, about movie classification). We will find more features describing the dependence between movies and customers (using SVD).
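A rough sketch of how SVD could supply additional descriptors of movies and customers, using `numpy.linalg.svd`. The fully observed toy matrix, the choice of k = 2, and the name `latent_features` are assumptions; real rating data is mostly missing and would need extra handling that is not shown here.

```python
import numpy as np

def latent_features(ratings, k):
    """Return k-dimensional customer and movie features from a truncated SVD."""
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    customer_features = U[:, :k] * s[:k]  # scale by the singular values
    movie_features = Vt[:k, :].T
    return customer_features, movie_features

# Hypothetical toy matrix: rows are customers, columns are movies
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])
customer_features, movie_features = latent_features(ratings, k=2)

# Rank-2 approximation of the original matrix
approximation = customer_features @ movie_features.T
```

Each row of `customer_features` and `movie_features` could then be fed to the network as extra inputs alongside the movie classification.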
![Page 17: KDD CUP 2007](https://reader034.vdocuments.net/reader034/viewer/2022051316/5681503c550346895dbe3a1e/html5/thumbnails/17.jpg)