classification in weka - ijskt.ijs.si/petrakralj/ips_dm_0910/handsonweka-part1.pdf ·...
Post on 06-Feb-2018
221 Views
Preview:
TRANSCRIPT
Petra.Kralj@ijs.si
Classification in WEKA 2009/11/10
Petra Kralj Novak
Petra.Kralj@ijs.si
Petra.Kralj@ijs.si
Practice with Weka
1. Build a decision tree with the ID3 algorithm on the lenses dataset, evaluate on a separate test set
2. Classification on the CAR dataset– Preparing the data– Building decision trees– Naive Bayes classifier– Understanding the Weka output
Petra.Kralj@ijs.si
Weka
Download
version
3.6
Weka is open source softwere for machine learning and data mining.
http://www.cs.waikato.ac.nz/ml/weka/
Petra.Kralj@ijs.si
Run Weka
Choose Explorer
Petra.Kralj@ijs.si
Exercise 1: Lenses dataset
• In the Weka data mining tool induce a decision
tree for the lenses dataset with the ID3
algorithm.
• Data:– lensesTrain.arff
– lensesTest.arff
• Compare the outcome with the manually
obtained results.
Petra.Kralj@ijs.si
Load the data
Petra.Kralj@ijs.si
Load the data - 2
lensesTrain.arff
Petra.Kralj@ijs.si
The data are loaded
Target variable
Choose
“Classify”
Petra.Kralj@ijs.si
Choose algoritem
Petra.Kralj@ijs.si
trees
Id3
Petra.Kralj@ijs.si
12
3
5
lensesTest.arff
4
Petra.Kralj@ijs.si
Decision tree
Petra.Kralj@ijs.si
Classification accuracy
Confusion
matrix
Petra.Kralj@ijs.si
Exercise 2: CAR dataset
• 1728 examples
• 6 attributes– 6 nominal
– 0 numeric
• Nominal target variable– 4 classes: unacc, acc, good, v-good
– Distribution of classes• unacc (70%), acc (22%), good (4%), v-good (4%)
• No missing values
Petra.Kralj@ijs.si
Preparing the data for WEKA - 1
Data in a spreadsheet
(e.g. MS Excel)
- Rows are examples
- Columns are attributes
- The last column is the target variable
Petra.Kralj@ijs.si
Preparing the data for WEKA - 2
Save as “.csv”- Careful with dots “.”,
commas “,” and semicolons “;”!
Petra.Kralj@ijs.si
Load the data
Target variable
Car.csv
Petra.Kralj@ijs.si
Choose algorithm J48
Petra.Kralj@ijs.si
Building and evaluating the tree
Petra.Kralj@ijs.si
Classified as
Actual values
Classification
accuracy
Petra.Kralj@ijs.si
Right
mouse
click
Petra.Kralj@ijs.si
Tree pruning
Set the minimal
number of objects
per leaf to 15
Parameters of the
algorithm
(right mouse click)
Petra.Kralj@ijs.si
Reduced
number of
leaves and
nodesEasier to interpret
Lower
classification
accuracy
Petra.Kralj@ijs.si
Petra.Kralj@ijs.si
Naïve Bayes classifier
Petra.Kralj@ijs.si
Petra.Kralj@ijs.si
top related