Petra.Kralj@ijs.si

Classification in WEKA 2009/11/10

Petra Kralj Novak

Practice with Weka

1. Build a decision tree with the ID3 algorithm on the lenses dataset, evaluate on a separate test set

2. Classification on the CAR dataset
– Preparing the data
– Building decision trees
– Naive Bayes classifier
– Understanding the Weka output

Weka

Weka is open-source software for machine learning and data mining.

Download version 3.6: http://www.cs.waikato.ac.nz/ml/weka/

Run Weka

Choose Explorer

Exercise 1: Lenses dataset

• In the Weka data mining tool induce a decision tree for the lenses dataset with the ID3 algorithm.

• Data:
– lensesTrain.arff
– lensesTest.arff

• Compare the outcome with the manually obtained results.
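
For the manual comparison, it can help to recompute the quantity ID3 uses to choose a split. The sketch below computes entropy and information gain for nominal attributes and picks the attribute with the highest gain. The toy rows and the helper names (`entropy`, `information_gain`, `best_attribute`) are hypothetical and are not the lenses data; this is an illustration of the criterion, not Weka's Id3 implementation.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on `attribute`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def best_attribute(rows, attributes, target):
    """Attribute with the highest information gain (the ID3 split choice)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy rows, for illustration only.
TOY_ROWS = [
    {"tears": "reduced", "astigmatic": "no", "lenses": "none"},
    {"tears": "reduced", "astigmatic": "yes", "lenses": "none"},
    {"tears": "normal", "astigmatic": "no", "lenses": "soft"},
    {"tears": "normal", "astigmatic": "yes", "lenses": "hard"},
]

for name in ("tears", "astigmatic"):
    print(name, round(information_gain(TOY_ROWS, name, "lenses"), 3))
print("split on:", best_attribute(TOY_ROWS, ["tears", "astigmatic"], "lenses"))
```

Applying the same functions to the rows of lensesTrain.arff should give the root attribute to compare against the tree Weka prints.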

Load the data

Load the data - 2

lensesTrain.arff

The data are loaded

Target variable

Choose “Classify”

Choose algorithm

trees

Id3

[Screenshot: steps 1 to 5 for evaluating on the separate test set lensesTest.arff]

Decision tree

Classification accuracy

Confusion matrix
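
The two quantities are linked: classification accuracy is the sum of the diagonal of the confusion matrix divided by the total number of test examples. A minimal sketch, assuming rows are actual classes and columns are predicted classes; the matrix values are hypothetical and not taken from the lenses results.

```python
def accuracy(confusion):
    """Fraction of correctly classified examples: diagonal sum over total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical 3-class matrix; rows = actual class, columns = predicted class.
EXAMPLE_MATRIX = [
    [3, 0, 1],
    [0, 2, 0],
    [1, 0, 5],
]

print(f"accuracy = {accuracy(EXAMPLE_MATRIX):.3f}")
```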

Exercise 2: CAR dataset

• 1728 examples

• 6 attributes
– 6 nominal
– 0 numeric

• Nominal target variable
– 4 classes: unacc, acc, good, v-good
– Distribution of classes: unacc (70%), acc (22%), good (4%), v-good (4%)

• No missing values
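
Because the classes are so unbalanced, a useful reference point when reading accuracy figures later is the majority-class baseline: a classifier that always answers with the most frequent class. The sketch below derives it from the rounded percentages listed above; the function name `majority_baseline` is made up for this illustration.

```python
# Class shares as listed above (rounded percentages).
CLASS_SHARE = {"unacc": 0.70, "acc": 0.22, "good": 0.04, "v-good": 0.04}


def majority_baseline(shares):
    """Return the most frequent class and the accuracy of always predicting it."""
    label = max(shares, key=shares.get)
    return label, shares[label]


label, baseline = majority_baseline(CLASS_SHARE)
print(f"always predicting '{label}' gives about {baseline:.0%} accuracy")
```

A tree or Naive Bayes model on this dataset is only informative to the extent that it beats this figure.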

Preparing the data for WEKA - 1

Data in a spreadsheet (e.g. MS Excel)

- Rows are examples

- Columns are attributes

- The last column is the target variable

Preparing the data for WEKA - 2

Save as “.csv”

- Careful with dots “.”, commas “,” and semicolons “;”!
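
Depending on regional settings, a spreadsheet may export semicolons as field separators. One way to normalise such a file before loading it is sketched below with Python's `csv` module, assuming the loader expects comma-separated fields. The function name `to_comma_csv` and the sample rows are illustrative, not part of the exercise files.

```python
import csv
import io


def to_comma_csv(text, delimiter=";"):
    """Re-write delimited text as comma-separated text, quoting fields where needed."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Illustrative two-line export with semicolons as separators.
sample = "buying;maint;class\nhigh;low;unacc\n"
print(to_comma_csv(sample))
```

Fields that themselves contain a comma (for example a decimal comma such as `1,5`) are quoted by the writer rather than split, so they remain a single value.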

Load the data

Target variable

Car.csv

Choose algorithm J48

Building and evaluating the tree

Classified as

Actual values

Classification accuracy

Right mouse click

Tree pruning

Set the minimal number of objects per leaf to 15

Parameters of the algorithm (right mouse click)
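
To get a feel for why a larger minimum shrinks the tree, the sketch below applies a simplified C4.5-style rule: a candidate split is kept only if at least two of its branches would hold the minimum number of examples. This rule is an assumption made for illustration, not something stated on the slides, and J48's full pruning behaviour involves more than this check. The function name and branch sizes are hypothetical.

```python
def split_allowed(branch_sizes, min_objects):
    """True if at least two branches would hold at least `min_objects` examples."""
    large_enough = sum(1 for size in branch_sizes if size >= min_objects)
    return large_enough >= 2


# Hypothetical split sending 40, 12 and 3 examples down its three branches.
print(split_allowed([40, 12, 3], 2))   # all three branches reach the minimum
print(split_allowed([40, 12, 3], 15))  # only one branch reaches the minimum
```

With the minimum raised to 15, splits that isolate small groups are rejected, so the node stays a leaf.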

Reduced number of leaves and nodes

Easier to interpret

Lower classification accuracy

Naïve Bayes classifier
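
As a reminder of what this classifier computes on nominal data such as the CAR dataset, here is a minimal sketch: it multiplies the class prior by the per-attribute conditional probabilities and returns the class with the highest score. The add-one (Laplace) smoothing, the function names and the toy rows are assumptions made for this illustration; Weka's NaiveBayes output may differ in detail.

```python
from collections import Counter, defaultdict


def train(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[target] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    values = defaultdict(set)            # attribute -> set of observed values
    for row in rows:
        for attr, val in row.items():
            if attr != target:
                value_counts[(attr, row[target])][val] += 1
                values[attr].add(val)
    return class_counts, value_counts, values


def predict(model, example):
    """Return the class with the highest (unnormalised) posterior score."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n in class_counts.items():
        score = n / total  # class prior
        for attr, val in example.items():
            # Add-one smoothing keeps unseen values from zeroing the product.
            score *= (value_counts[(attr, cls)][val] + 1) / (n + len(values[attr]))
        scores[cls] = score
    return max(scores, key=scores.get)


# Hypothetical toy rows, for illustration only.
TOY_ROWS = [
    {"safety": "low", "persons": "2", "class": "unacc"},
    {"safety": "low", "persons": "4", "class": "unacc"},
    {"safety": "med", "persons": "2", "class": "unacc"},
    {"safety": "high", "persons": "4", "class": "acc"},
    {"safety": "high", "persons": "more", "class": "acc"},
]

model = train(TOY_ROWS, "class")
print(predict(model, {"safety": "high", "persons": "4"}))
print(predict(model, {"safety": "low", "persons": "2"}))
```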
