Third Colloquium:
Application of Data Mining in Education
SITI KHADIJAH MOHAMAD
FACULTY OF EDUCATION
APRIL 10 & 11, 2018
1. Introduction: Data Mining, Software, RQs
Data Mining
Data mining is a technique used to discover patterns in data and gain knowledge from them.
Machine learning provides the algorithms used in data mining.
Types of DM: Decision tree, Association rules, Clustering, etc.
Supervised and Unsupervised Learning?
Cross validation?
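As a rough illustration of the cross-validation question above: in k-fold cross-validation the data are split into k parts, and each part is used once for testing while the remaining parts are used for training. The sketch below only produces the index splits; the function name `k_fold_indices` and the sizes used are assumptions for illustration, not part of any particular tool.

```python
def k_fold_indices(n_samples, k):
    """Return k (train, test) pairs of index lists covering 0..n_samples-1."""
    indices = list(range(n_samples))
    # Spread any remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical example: 10 samples, 5 folds.
folds = k_fold_indices(10, 5)
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample appears in exactly one test fold, so the averaged test result uses all the data without ever testing on a sample the model was trained on in that round.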
Software
Types: WEKA, Microsoft SQL Server 2008, RapidMiner, Clementine, R
Download: http://www.cs.waikato.ac.nz/ml/weka/
Supported platforms: Linux, Windows, Mac OS
Created by: Researchers at the University of Waikato, New Zealand
Research Question
Association, Clustering and Decision tree are NOT cause-effect analyses.
They are actually relationship analyses.
Examples of RQs:
1. To develop a decision tree model that can predict students’ performance based on the
mechanisms of metacognitive scaffolding prompted by the instructor in Facebook discussion.
2. To formulate learning performance pathways based on reflective thinking and types of
feedback through educational blogging.
3. How do the provision of feedback and reflective thinking shape the reflection process through
educational blogging?
4. To develop deaf students’ learning patterns when using the e-learning environment in studying
Nuclear Energy.
Decision Tree
• This example relates lifestyle to heart disease risk.
• Attributes: Age, Smoker (y/n), Diet (good/poor), and a label, Risk
(Less Risk/More Risk).
• The biggest influence on Risk turns out to be the
Smoker attribute.
• Smoker therefore becomes the first branch in our tree.
• For smokers, the next most influential attribute happens to
be Age; for non-smokers, however, the data indicate
that Diet has a bigger influence on the risk.
• The tree keeps branching into two nodes at each level until a
classification is reached.
• A decision tree can be a great way to visualize how a
decision is derived from the attributes in your
data.
Credit to: refactorthis.net
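The idea that the most influential attribute becomes the first branch can be sketched with information gain, one common splitting criterion (tools may use a variant of it). The eight records below are invented purely for illustration and are not the data behind the slide; Age is simplified to two hypothetical groups, and the function names are assumptions.

```python
from collections import Counter
from math import log2

# Hypothetical toy records (NOT the slide's data): Age is grouped for simplicity.
ROWS = [
    {"Age": "old",   "Smoker": "yes", "Diet": "good", "Risk": "More Risk"},
    {"Age": "old",   "Smoker": "yes", "Diet": "poor", "Risk": "More Risk"},
    {"Age": "old",   "Smoker": "yes", "Diet": "good", "Risk": "More Risk"},
    {"Age": "young", "Smoker": "yes", "Diet": "poor", "Risk": "Less Risk"},
    {"Age": "old",   "Smoker": "no",  "Diet": "good", "Risk": "Less Risk"},
    {"Age": "old",   "Smoker": "no",  "Diet": "good", "Risk": "Less Risk"},
    {"Age": "young", "Smoker": "no",  "Diet": "good", "Risk": "Less Risk"},
    {"Age": "young", "Smoker": "no",  "Diet": "poor", "Risk": "More Risk"},
]


def entropy(rows):
    """Shannon entropy (in bits) of the Risk label over the given rows."""
    counts = Counter(r["Risk"] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, attribute):
    """Reduction in label entropy obtained by splitting rows on attribute."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder


def best_attribute(rows, attributes):
    """Attribute with the highest information gain on these rows."""
    return max(attributes, key=lambda a: information_gain(rows, a))


root = best_attribute(ROWS, ["Age", "Smoker", "Diet"])
smokers = [r for r in ROWS if r["Smoker"] == "yes"]
non_smokers = [r for r in ROWS if r["Smoker"] == "no"]
print("first branch:", root)
print("next for smokers:", best_attribute(smokers, ["Age", "Diet"]))
print("next for non-smokers:", best_attribute(non_smokers, ["Age", "Diet"]))
```

On this toy data the first split is Smoker, followed by Age for smokers and Diet for non-smokers, mirroring the shape of the tree described in the bullets.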
Association Rules
Q1 Q2 T1 conf: (1)
Q7 T3 conf: (0.92)
T2 Q2 conf: (0.5)
Support (coverage) and Confidence (accuracy)
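A minimal sketch of how support and confidence could be computed for a rule. The item names are borrowed from the slide, but the transactions and the two rules evaluated here are hypothetical, and the function names are assumptions.

```python
# Hypothetical transactions; each set lists the items observed together.
TRANSACTIONS = [
    {"Q1", "T1"},
    {"Q1", "Q2", "T1"},
    {"Q2", "T2"},
    {"T2"},
    {"Q1", "T1"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(antecedent, consequent, transactions):
    """Fraction of all transactions in which the whole rule holds (coverage)."""
    return count(antecedent | consequent, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions matching the antecedent, the fraction that also
    match the consequent (accuracy of the rule)."""
    matched = count(antecedent, transactions)
    if matched == 0:
        return 0.0
    return count(antecedent | consequent, transactions) / matched


for lhs, rhs in [({"Q1"}, {"T1"}), ({"T2"}, {"Q2"})]:
    print(sorted(lhs), "==>", sorted(rhs),
          "support:", support(lhs, rhs, TRANSACTIONS),
          "conf:", confidence(lhs, rhs, TRANSACTIONS))
```

A rule can have high confidence but low support (it is accurate but applies to few cases), so both measures are usually read together.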
Clustering
Credit to: Almodiel
2. WEKA Workbench
WEKA Workbench (1)
Performance Comparison
Graphical Interface
Classifiers
Command-line Interface
WEKA Workbench (2)
Supply data here
Details of the data
• Attributes == Variables
• Instances == Number of samples
Preprocess Tab
4 options to classify the data
WEKA Workbench (3)
Classify Tab (also known as postprocessing tab)
Results panel
Lists of algorithms
Right click here to view the tree
What Do Precision and Recall Tell Us?
Precision: Given all the predicted labels (for a given class X), how many
instances were correctly predicted?
Recall: For all instances that should have a label X, how many of these
were correctly captured?
Suppose a computer program for recognizing dogs in scenes from a
video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4
of the identifications are correct, but 3 are actually cats, the program's
precision is 4/7 while its recall is 4/9.
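The dog example can be checked with a few lines of code. This is a small sketch using the counts stated above; the function and variable names are assumptions.

```python
def precision(true_pos, false_pos):
    """Share of predicted positives that are actually positive."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Share of actual positives that were found."""
    return true_pos / (true_pos + false_neg)


identified = 7      # objects the program labelled as dogs
correct = 4         # of those, how many really are dogs (true positives)
dogs_in_scene = 9   # dogs actually present

false_pos = identified - correct      # cats labelled as dogs: 3
false_neg = dogs_in_scene - correct   # dogs the program missed: 5

dog_precision = precision(correct, false_pos)  # 4/7
dog_recall = recall(correct, false_neg)        # 4/9
print(round(dog_precision, 3), round(dog_recall, 3))
```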
Application & Interpretation
True Positives and True Negatives: correct classifications
False Positives: the outcome is incorrectly predicted as yes when it is actually no
False Negatives: the outcome is incorrectly predicted as no when it is actually yes
Credit to: Wikipedia
Calculate Recall for Class A:
FN_A = 1 + 1 = 2 (actual a, predicted as b or c)
Recall_A = TP_A / (TP_A + FN_A)
= 10 / (10 + 2)
= 0.83
                  Predicted Class
                  a     b     c     Total
Actual Class  a   10    1     1     12
              b   2     0     1     3
              c   1     0     0     1
Total             13    1     2     16
Application & Interpretation
Calculate Precision for Class A:
FP_A = 2 + 1 = 3 (predicted a, actually b or c)
Precision_A = TP_A / (TP_A + FP_A)
= 10 / (10 + 3)
= 0.769
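The two hand calculations can be reproduced for every class directly from the confusion matrix. The sketch below uses the counts from the matrix in this section (rows are actual classes, columns are predicted classes); the function name `per_class_metrics` is an assumption.

```python
LABELS = ["a", "b", "c"]
# Rows: actual class, columns: predicted class.
CONFUSION = [
    [10, 1, 1],
    [2, 0, 1],
    [1, 0, 0],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall)} computed from a confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column total: TP + FP
        actual_total = sum(matrix[i])                    # row total: TP + FN
        prec = true_pos / predicted_total if predicted_total else 0.0
        rec = true_pos / actual_total if actual_total else 0.0
        metrics[label] = (prec, rec)
    return metrics


metrics = per_class_metrics(CONFUSION, LABELS)
for label, (prec, rec) in metrics.items():
    print(label, "precision:", round(prec, 3), "recall:", round(rec, 3))
```

Precision divides by the column total (everything predicted as the class), while recall divides by the row total (everything that actually belongs to the class).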
Thank You! Questions?