Department of Computer Science,
University of Waikato, New Zealand
Eibe Frank
◼ WEKA: A Machine Learning Toolkit
◼ The Explorer
• Classification and Regression
• Clustering
• Association Rules
• Attribute Selection
• Data Visualization
◼ The Experimenter
◼ The Knowledge Flow GUI
◼ Conclusions
Machine Learning with WEKA
10/15/2018 University of Waikato 2
WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)
WEKA: the software
◼ Machine learning/data mining software written in Java (distributed under the GNU General Public License)
◼ Used for research, education, and applications
◼ Complements “Data Mining” by Witten & Frank
◼ Main features:
◆ Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
◆ Graphical user interfaces (incl. data visualization)
◆ Environment for comparing learning algorithms
WEKA: versions
◼ There are several versions of WEKA:
◆ WEKA 3.0: “book version” compatible with description in data mining book
◆ WEKA 3.2: “GUI version” adds graphical user interfaces (book version is command-line only)
◆ WEKA 3.3: “development version” with lots of improvements
◼ This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
WEKA only deals with “flat” files: each instance is one row of attribute values in a single table, and “?” marks a missing value
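To make the “flat” file idea concrete, here is a minimal Python sketch that reads the ARFF example above into a list of rows, turning “?” into a missing value (None). It is an illustration under simplifying assumptions, not WEKA's own loader: it understands only `@relation`, `@attribute` (numeric or nominal) and `@data` lines plus `%` comments, and the names `parse_arff` and `SAMPLE` are hypothetical.

```python
NUMERIC_TYPES = ("numeric", "real", "integer")


def parse_arff(text):
    """Parse a minimal ARFF document into (relation, attributes, rows).

    attributes is a list of (name, type) pairs where type is "numeric" or the
    list of legal nominal values; missing values ("?") become None.
    Quoted names, string/date attributes and sparse rows are not supported.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: "{ female, male}" -> ["female", "male"]
                attributes.append((name, [v.strip() for v in spec.strip("{}").split(",")]))
            elif spec.lower() in NUMERIC_TYPES:
                attributes.append((name, "numeric"))
            else:
                raise ValueError(f"unsupported attribute type: {spec}")
        elif keyword == "@data":
            in_data = True
        elif in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line}")
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)  # missing value
                elif kind == "numeric":
                    row.append(float(value))
                elif value in kind:
                    row.append(value)  # legal nominal value
                else:
                    raise ValueError(f"{value!r} is not a legal value of {name}")
            rows.append(row)
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""
```

Calling `parse_arff(SAMPLE)` returns the relation name, six attribute definitions and four rows; the last row has None where its cholesterol value would be.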
Explorer: Exploring the data
◼ Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
◼ Data can also be read from a URL or from an SQL database (using JDBC)
◼ Pre-processing tools in WEKA are called “filters”
◼ WEKA contains filters for:
◆ Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
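As an illustration of what a filter does, the following Python sketch performs unsupervised equal-width discretization of one numeric attribute: the observed range is split into a fixed number of equally wide intervals and each value is replaced by the label of its interval. It is a simplified stand-in, not WEKA's filter code; the function name `discretize_equal_width` and the `binN` label format are assumptions, and missing values (None) pass through unchanged.

```python
def discretize_equal_width(values, bins=3):
    """Replace numeric values by equal-width bin labels ("bin0", "bin1", ...).

    The range [min, max] of the non-missing values is split into `bins`
    intervals of equal width; None (missing) stays None.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        if v is None:
            labels.append(None)
            continue
        # Interval index of v; the maximum value belongs to the last bin.
        index = 0 if width == 0 else min(int((v - lo) / width), bins - 1)
        labels.append(f"bin{index}")
    return labels
```

For the cholesterol values 233, 286 and 229 plus one missing value, three bins of width 19 give `["bin0", "bin2", "bin0", None]`.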
Explorer: building “classifiers”
◼ Classifiers in WEKA are models for predicting nominal or numeric quantities
◼ Implemented learning schemes include:
◆ Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
◼ “Meta”-classifiers include:
◆ Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
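Instance-based classifiers are the easiest of the schemes listed above to write down. The Python sketch below is a k-nearest-neighbour classifier that uses Euclidean distance on numeric attribute vectors and a majority vote among the k closest training instances. It is an illustrative sketch, not WEKA's implementation; the `KNN` class and its `fit`/`predict` methods are hypothetical names, and attribute scaling, nominal attributes and missing values are deliberately left out.

```python
import math
from collections import Counter


class KNN:
    """k-nearest-neighbour classifier for numeric attribute vectors."""

    def __init__(self, k=1):
        self.k = k
        self.instances = []

    def fit(self, X, y):
        # "Training" an instance-based learner just stores the instances.
        self.instances = list(zip(X, y))
        return self

    def predict(self, x):
        # The k stored instances closest to x in Euclidean distance.
        neighbours = sorted(self.instances, key=lambda inst: math.dist(inst[0], x))[: self.k]
        votes = Counter(label for _, label in neighbours)
        # most_common(1) breaks ties in favour of the label seen first,
        # i.e. the label of the nearer neighbour.
        return votes.most_common(1)[0][0]
```

Because nothing is learned beyond storing the data, the cost of a prediction grows with the size of the training set; that is the usual trade-off of instance-based learning.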
Explorer: clustering data
◼ WEKA contains “clusterers” for finding groups of similar instances in a dataset
◼ Implemented schemes are:
◆ k-Means, EM, Cobweb, X-means, FarthestFirst
◼ Clusters can be visualized and compared to “true” clusters (if given)
◼ Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
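The first scheme in the list, k-means, alternates between assigning every instance to its nearest cluster centre and moving every centre to the mean of the instances assigned to it, until no assignment changes. The following Python sketch shows that loop for numeric points. It is not WEKA's clusterer code: the starting centres are simply the first k points (an assumption made to keep the example deterministic; practical implementations usually choose them at random), and `k_means` is a hypothetical name.

```python
import math


def k_means(points, k, max_iter=100):
    """Cluster numeric points with Lloyd's k-means; returns (centres, assignment)."""
    # Deterministic initialisation for this sketch: the first k points.
    centres = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centre for every point.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centres are stable
        assignment = new_assignment
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centres, assignment
```

Because the result depends on the starting centres, k-means is commonly run several times from different starts and the best clustering is kept.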
Explorer: finding associations
◼ WEKA contains an implementation of the Apriori algorithm for learning association rules
◆ Works only with discrete data
◼ Can identify statistical dependencies between groups of attributes:
◆ milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
◼ Apriori can compute all rules that have a given minimum support and exceed a given confidence
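Support and confidence can be made concrete with a small sketch. The Python code below follows the level-wise Apriori idea: candidate itemsets of size n+1 are built only from frequent itemsets of size n, because every subset of a frequent itemset must itself be frequent. Support is counted as the number of transactions that contain an itemset (an absolute count, as in “support 2000” above), and the confidence of X ⇒ Y is support(X ∪ Y) / support(X). The function names are hypothetical and this is not WEKA's Apriori code.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions (sets of items) that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for all itemsets with support >= min_support."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        # Candidates of size n+1: unions of two frequent n-itemsets, kept only
        # if all of their n-subsets are frequent (the Apriori pruning step).
        size = len(level[0]) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
            and support(c, transactions) >= min_support
        ]
    return frequent


def rules(frequent, min_confidence):
    """Yield (antecedent, consequent, confidence) for every rule meeting min_confidence."""
    for itemset, count in frequent.items():
        for n in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, n)):
                confidence = count / frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, confidence
```

Counting support by rescanning every transaction for each candidate keeps the sketch short; efficient implementations avoid these repeated scans.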
Explorer: attribute selection
◼ Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
◼ Attribute selection methods consist of two parts:
◆ A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
◆ An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
◼ Very flexible: WEKA allows (almost) arbitrary combinations of these two
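One of the combinations mentioned above, information gain as the evaluation method with ranking as the search method, is easy to sketch. For a nominal attribute A and class C, the gain is H(C) − Σ_v P(A = v) · H(C | A = v), where H is the entropy in bits. The Python code below computes this for every attribute and returns the attributes sorted by gain. The function names are hypothetical, it is not WEKA's evaluator code, and numeric attributes would have to be discretized first.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Information gain of a nominal attribute column with respect to the class."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        # Class labels of the instances that have this attribute value.
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(rows, names, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [(name, info_gain([r[i] for r in rows], labels)) for i, name in enumerate(names)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)
```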
Explorer: data visualization
◼ Visualization is very useful in practice: e.g., it helps to determine the difficulty of the learning problem
◼ WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
◆ To do: rotating 3-d visualizations (XGobi-style)
◼ Color-coded class values
◼ “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
◼ “Zoom-in” function
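The “jitter” option can be understood as adding a small random offset to each plotted point, so that instances sharing the same nominal value (and hence the same screen position) no longer hide one another. A minimal Python sketch of the idea follows; the uniform offset, its default size and the function name `jitter` are assumptions, and this is not WEKA's plotting code.

```python
import random


def jitter(points, amount=0.1, seed=1):
    """Return copies of 2-d points, each shifted by a uniform offset in [-amount, amount]."""
    rng = random.Random(seed)  # fixed seed so the same plot is reproduced
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]
```

Five instances with identical coded nominal values, for example `[(0, 1)] * 5`, are drawn on top of each other without jitter but appear as five nearby points after it.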
Performing experiments
◼ The Experimenter makes it easy to compare the performance of different learning schemes
◼ For classification and regression problems
◼ Results can be written to a file or a database
◼ Evaluation options: cross-validation, learning curve, hold-out
◼ Can also iterate over different parameter settings
◼ Significance testing built in!
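Significance testing here means comparing two schemes on the same evaluation runs (for example the same cross-validation folds). The sketch below computes the textbook paired t statistic over the per-run differences d, t = mean(d) / (s_d / √n), where s_d is the sample standard deviation of the differences; the result is compared against a Student t critical value with n − 1 degrees of freedom. It is written for illustration only: the Experimenter's own tester may use a different or corrected variant, and the function name is hypothetical.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two schemes evaluated on the same n runs."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with at least two runs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences are constant; t is undefined")
    return mean / (sd / math.sqrt(len(diffs)))
```

With ten runs (nine degrees of freedom), the two-sided 5% critical value is about 2.262, so |t| above that would indicate a significant difference at that level.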
The Knowledge Flow GUI
◼ New graphical user interface for WEKA
◼ JavaBeans-based interface for setting up and running machine learning experiments
◼ Data sources, classifiers, etc. are beans and can be connected graphically
◼ Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
◼ Layouts can be saved and loaded again later
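The dataflow idea behind the Knowledge Flow can be sketched in a few lines of Python: each component transforms the data it receives and pushes the result to whatever is connected downstream. The component names mirror the chain “data source” -> “filter” -> “classifier” -> “evaluator”, but the classes and functions are hypothetical stand-ins, not WEKA's beans; the “classifier” is a trivial majority-class predictor so the example stays self-contained.

```python
from collections import Counter


class Component:
    """A node in a dataflow graph that pushes its output to connected components."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform
        self.listeners = []

    def connect(self, other):
        """Connect `other` downstream of this component and return it for chaining."""
        self.listeners.append(other)
        return other

    def receive(self, data):
        """Transform incoming data and forward the result to every listener."""
        result = self.transform(data)
        for listener in self.listeners:
            listener.receive(result)
        return result


def drop_missing(rows):
    """Filter: discard rows that contain a missing value (None)."""
    return [r for r in rows if None not in r]


def majority_classifier(rows):
    """Classifier: build a model that always predicts the most frequent class (last column)."""
    majority = Counter(r[-1] for r in rows).most_common(1)[0][0]
    return rows, (lambda row: majority)


def accuracy(rows_and_model):
    """Fraction of rows whose class (last column) the model predicts correctly."""
    rows, model = rows_and_model
    return sum(model(r) == r[-1] for r in rows) / len(rows)


def build_flow(sink):
    """Wire data source -> filter -> classifier -> evaluator; scores are appended to sink."""

    def evaluator(rows_and_model):
        score = accuracy(rows_and_model)
        sink.append(score)
        return score

    source = Component("data source", lambda rows: rows)
    (source.connect(Component("filter", drop_missing))
           .connect(Component("classifier", majority_classifier))
           .connect(Component("evaluator", evaluator)))
    return source
```

The evaluator here scores the model on its own training data purely to keep the example short; in a real flow it would receive a separate test set or cross-validation folds.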
Conclusion: try it yourself!
◼ WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
▪ Also has a list of projects based on WEKA
▪ WEKA contributors:
Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang