getting started with weka -...
TRANSCRIPT
Getting started with WekaYishuang Geng, Kexin Shi, Pei Zhang, Angel
Trifonov, Jiefeng He, Xiaolu Xiong
Lesson 1.1 - Introduction
Purpose of this course
● Take the mystery out of data mining.
● How to use the Weka workbench for data mining.
● Explain the basic principles of several popular algorithms
Data mining with Weka
● What’s data mining?○ We are overwhelmed with data○ Data mining is about going from the raw data to
information.
● What could data mining do?○ You’re at the supermarket checkout and you’re
happy with your bargains … and the supermarket is happy you’ve bought some more stuff
○ You want a child, but you and your partner can’t have one.
○ You want to monitor the firefighters status but you cannot get into the burning houses to watch them.
What is Weka?
1. A bird found only in New Zealand2. Waikato Environment for Knowledge
Analysis
● Weka includes:○ 100+ algorithms for classification○ 75 for data preprocessing○ 25 to assist with feature
selection○ 20 for clustering, finding
association rules, etc
Textbook
Data Mining: Practical machinelearning tools and techniques,by Ian H. Witten, Eibe Frank andMark A. Hall. Morgan Kaufmann, 2011
Learning outcome of the course
● Load data into Weka and look at it● Use filters to preprocess it● Explore it using interactive visualization● Apply classification algorithms● Interpret the output● Understand evaluation methods and their implications● Understand various representations for models● Explain how popular machine learning algorithms work● Be aware of common pitfalls with data mining
Use Weka on your own data… and understand what you are doing!
A simple application
○ You want to monitor the firefighters status but you cannot get into the burning houses to watch them.
A simple application
Motion Detection Using RF Signals for the FirstResponder in Emergency Operations● Firefighters
○ Sensor to monitor their physiological information, which personal area communication capability to a centroid node.
○ Centroid node has local area communication capability to link the terminals out of burning house.
○ If we want to monitor their motion, what should we do?
Existing approaches
● Pros○ High detection rate.○ Low computational cost.
● Cons○ Add extra load to firefighter.○ Limited sensor location, usually on shoes.○ Lack of capability on detecting multiple
motions,mainly used for fall detection.
Raw data
Data mining
Information from the raw data
Summary
● Why taking that course● Materials
○ Weka○ Textbook
● Course schedule○ Lectures○ Activities○ Assessments
● Learning outcome● A simple application
Lesson 1.2 - Exploring the Explorer
Setting up Weka
● Download latest (Weka 3.6.10) from http://www.cs.waikato.ac.nz/ml/weka/downloading.html
● Self-extracting executable ○ Java VM included (if needed)
● Create shortcut to Data folder in your Computer’s My Documents
● Use the Weka shortcut from the program folder
Weka Interface
● Weka interfaces○ Explorer○ Experimenter○ GUI○ Command-line
● Explorer will be used the most
Explorer Interface
● Explorer Panels○ Preprocess
● Opening datasets○ File
● Filter○ Supervised○ Unsupervised
Filters
● Difference
● An additional two kinds of filtering○ Instances○ Attributes
More Preprocess Information
● Relation○ Attributes○ Instances
● Selected Attribute○ Name○ Type○ Other Info
● Attributes○ Editing○ Removing
● Class Visualization● Status and log
Lesson 1.3 - Exploring datasets
Classification
Nominal vs. Numerical
ARFF file format
Lesson 1.4 - Building a classifier
Classifying the glass datasetInterpreting J48 outputJ48 configuration panel... option: pruned vs unpruned trees ... option: avoid small leaves
Jiefeng
Click Here
Use the confusion matrix to determine how many headlamps instances were misclassified as build wind float?
3What is the percentage ofcorrectly classified instances?
Turning pruning off results in larger trees, and often yields worse results because the classifier may "overfit" the data. However, in some cases the unpruned tree performs better than the pruned one.
1.4 Summary
Building a classifierClassifying the glass datasetInterpreting J48 outputJ48 configuration panel... option: pruned vs unpruned trees ... option: avoid small leaves
Lesson 1.5 - Using a filter
Use a filter to remove an attribute
Open weather.nominal.arff
Check the filters
Set attributeIndices to 3 and click OK
Apply the filter
Lesson 1.6 - Visualizing your data
Raw data visualization
Sepalwidth vs. petalwidth
Zoom in
Zoom in
Error visualization
Error visualization
Thank you!
Questions?