Machine Learning with WEKA

based on notes by Eibe Frank, Department of Computer Science, University of Waikato, New Zealand

WEKA: A Machine Learning Toolkit

The Explorer
• Classification and Regression
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization
The Experimenter
The Knowledge Flow GUI
Conclusions

04/10/23 University of Waikato 2

WEKA: the bird

Copyright: Martin Kramer

WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  • Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  • Graphical user interfaces (incl. data visualization)
  • Environment for comparing learning algorithms

WEKA: versions
• There are several versions of WEKA:
  • WEKA 3.0: “book version”, compatible with the description in the 1st edition of the data mining book
  • WEKA 3.2: “GUI version”, adds graphical user interfaces (the earlier version is command-line only)
  • WEKA 3.4++: on SoC Linux and ISS Windows
• This talk is based on snapshots of WEKA 3.3 … with some extra up-to-date snapshots
• The only changes are “layout” and some extras

WEKA only deals with “flat” files

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
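To make the structure of the heart-disease listing concrete, here is a minimal Python sketch that reads such a flat ARFF text into a header and a list of rows. It is not WEKA's own loader: the function name `parse_arff` is hypothetical, and the sketch assumes only numeric and nominal attributes, no quoted values, and `?` as the missing-value marker, as in the listing.

```python
SAMPLE = """
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    Hypothetical helper for illustration; handles only numeric and
    nominal attributes and unquoted, comma-separated values.
    """
    relation = None
    attributes = []  # (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for value, (_, kind) in zip(values, attributes):
                if value == "?":          # missing value
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(value))
                else:                     # nominal value kept as a string
                    row.append(value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE)
print(relation, len(attributes), len(rows))
```

Each data row has exactly one value per declared attribute, which is what "flat" means here: one table, no nested or linked records.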

Explorer: pre-processing the data
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• BUT it may be easier to reformat to ARFF yourself (write a program in Python / Java … or just use WordPad to type in the text, but make sure the format is right!); this also helps with data understanding
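As an illustration of reformatting to ARFF with a small program, the following Python sketch turns CSV text with a header row into ARFF text. The function name `csv_to_arff` and the type-guessing rule (a column is numeric if every non-missing cell parses as a number, otherwise nominal) are assumptions made for this example, not part of WEKA.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (first row = column names) into an ARFF string.

    Hypothetical helper: "?" is treated as a missing value, and values
    are assumed not to need quoting.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [[cell.strip() for cell in row] for row in reader if row]

    def is_number(cell):
        try:
            float(cell)
            return True
        except ValueError:
            return False

    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        # Ignore missing values when guessing the attribute type.
        column = [row[index] for row in rows if row[index] != "?"]
        if column and all(is_number(cell) for cell in column):
            lines.append("@attribute " + name.strip() + " numeric")
        else:
            values = sorted(set(column))
            lines.append("@attribute " + name.strip() + " {" + ",".join(values) + "}")
    lines.append("")
    lines.append("@data")
    for row in rows:
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


CSV_TEXT = (
    "age,sex,cholesterol,class\n"
    "63,male,233,not_present\n"
    "67,male,286,present\n"
    "38,female,?,not_present\n"
)
print(csv_to_arff(CSV_TEXT, "heart-disease-simplified"))
```

Writing the converter yourself forces a decision about the type of every column, which is part of the data understanding the slide mentions.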

Explorer: building “classifiers”
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  • Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• You explore by trying different classifiers and seeing which works best for you…

WEKA from ISS PC

2009

@relation ukus

@attribute center numeric
@attribute centre numeric
@attribute centerpercent numeric
@attribute color numeric
@attribute colour numeric
@attribute colorpercent numeric
@attribute english {UK,US}

@data
1,32,3, 0,20,0, UK
0,25,0, 0,12,0, UK
9,27,25, 0,84,0, UK
0,19,0, 0,24,0, UK
0,16,0, 0,14,0, UK
0,16,0, 0,12,0, UK
0,21,0, 0,38,0, UK
0,25,0, 0,34,0, UK
2,26,7, 2,3,40, UK
2,32,5, 1,59,2, UK
31,0,100, 55,0,100, US
61,0,100, 26,0,100, US
24,0,100, 11,0,100, US
12,1,92, 21,4,84, US
8,0,100, 4,2,67, US
10,0,100, 8,0,100, US
19,0,100, 22,0,100, US
14,0,100, 7,0,100, US
14,0,100, 6,0,100, US
8,5,62, 24,0,100, US
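The slides do not say how the ukus rows were produced. One plausible reading, offered only as an assumption, is that each row counts the two spellings in one text and that the percent columns give the US spelling's share of each pair. Most rows fit that reading, but not all (for example 2,32,5), so the Python sketch below is illustrative only; the function name `spelling_features`, the exact-word matching, and the rounding rule are hypothetical.

```python
import re


def spelling_features(text):
    """Return [center, centre, centerpercent, color, colour, colorpercent].

    Assumed definition: the percent columns are the US spelling's share
    of the pair, rounded to a whole number, and 0 when neither occurs.
    Only the exact word forms are counted (not "colours", "centered", ...).
    """
    words = re.findall(r"[a-z]+", text.lower())
    center = words.count("center")
    centre = words.count("centre")
    color = words.count("color")
    colour = words.count("colour")

    def percent(us_count, uk_count):
        total = us_count + uk_count
        return round(100 * us_count / total) if total else 0

    return [
        center, centre, percent(center, centre),
        color, colour, percent(color, colour),
    ]


sample_text = "The colour of the town centre. The centre was busy; the color faded."
features = spelling_features(sample_text)
# One ARFF data row; the class label is unknown here, so it is written as "?".
print(",".join(str(value) for value in features) + ",?")
```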

@relation test

@attribute center numeric
@attribute centre numeric
@attribute centerpercent numeric
@attribute color numeric
@attribute colour numeric
@attribute colorpercent numeric
@attribute english {UK,US}

@data
10,5,33, 0,20,0, UK
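To show what classifying the test instance against the ukus training data can look like, here is a minimal Python sketch of a 1-nearest-neighbour classifier, one of the instance-based schemes the slides mention. The slides do not say which WEKA classifier was run on this data, so the choice of method, the unscaled Euclidean distance, and the function name `nearest_neighbour` are assumptions for illustration.

```python
# Training instances copied from the ukus data: six numeric attributes + class.
TRAIN = [
    ([1, 32, 3, 0, 20, 0], "UK"),
    ([0, 25, 0, 0, 12, 0], "UK"),
    ([9, 27, 25, 0, 84, 0], "UK"),
    ([0, 19, 0, 0, 24, 0], "UK"),
    ([0, 16, 0, 0, 14, 0], "UK"),
    ([0, 16, 0, 0, 12, 0], "UK"),
    ([0, 21, 0, 0, 38, 0], "UK"),
    ([0, 25, 0, 0, 34, 0], "UK"),
    ([2, 26, 7, 2, 3, 40], "UK"),
    ([2, 32, 5, 1, 59, 2], "UK"),
    ([31, 0, 100, 55, 0, 100], "US"),
    ([61, 0, 100, 26, 0, 100], "US"),
    ([24, 0, 100, 11, 0, 100], "US"),
    ([12, 1, 92, 21, 4, 84], "US"),
    ([8, 0, 100, 4, 2, 67], "US"),
    ([10, 0, 100, 8, 0, 100], "US"),
    ([19, 0, 100, 22, 0, 100], "US"),
    ([14, 0, 100, 7, 0, 100], "US"),
    ([14, 0, 100, 6, 0, 100], "US"),
    ([8, 5, 62, 24, 0, 100], "US"),
]

# The single instance from the test relation, without its class label.
TEST_INSTANCE = [10, 5, 33, 0, 20, 0]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbour(train, query):
    """Return the class label of the training instance closest to query."""
    best_label = None
    best_distance = None
    for features, label in train:
        distance = squared_distance(features, query)
        if best_distance is None or distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


print(nearest_neighbour(TRAIN, TEST_INSTANCE))  # prints UK
```

The prediction is driven mostly by the colorpercent attribute: every US training row has a value of 67 or more there, while the test instance has 0.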

WEKA has more…
• Clustering data into groups
• Finding associations between attributes
• Visualisation - online analytical processing
• Experimenter to run and compare different ML algorithms
• Knowledge Flow GUI
• 3rd-party add-ons: sourceforge.net
• http://www.cs.waikato.ac.nz/ml/weka