rm world 2014: design and implementation of data mining case studies
Dr. Matthew North
Professor of Business & Information Systems
The College of Idaho
RapidMiner World 2014
Boston, USA
Design and Implementation of Data Mining Case Studies in RapidMiner
W&J/College of Idaho
A Focus on Teaching & Learning
Lots of folks do data mining
Lots of those use RapidMiner
Data mining education
Younger than the discipline
Strange collection of options
Science? Business? Math?
A Focus on Teaching & Learning
2005 – Present:
Books!
Tools!
Weka, Alphaminer, Clementine, more…
Education
Master’s, Certificates, Boot Camps
Data Mining for the Masses (2012)
Data Mining Cases in RapidMiner (2013)
A Focus on Teaching & Learning
The Case Method
Cases give context
My first Clementine class
Cases build on prior knowledge
Central Tendency > k-Means Clustering
Cases use Learning Theory
Concept Attainment
The Anatomy of a Data Mining Case
Activation
Stimulate prior knowledge/learning
Relevant to the data mining task
Addition
Introduce the new concept
k-Means Clustering
Comparison
Good/poor examples
Conclusions
RapidMiner World/Boston Example
Activation: Welcome to Boston!
There’s a lot to do here
Lots of cool/smart people
After hours connections can be valuable
Can data mining help make an effective fun/work connection?
Maybe so, if we rate options and then build option clusters
RapidMiner World/Boston Example
Addition: Options + Data = Choice
List our options, then rate from 0-3 across various types of fun
RapidMiner World/Boston Example
Addition: Modeling the data
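For readers following along without RapidMiner open, the modeling step might be sketched as below. This is a minimal pure-Python k-means, not the RapidMiner process shown in the talk. The option names, the three types of fun, the 0-3 ratings, the choice of k = 3, and the use of the first three options as starting centroids are all hypothetical assumptions made for illustration.

```python
# Hypothetical ratings (0-3) for made-up evening options across three
# assumed types of fun: (food, music, sports). Not data from the talk.
ratings = {
    "seafood dinner": (3, 1, 0),
    "jazz club": (1, 3, 0),
    "ballgame": (1, 0, 3),
    "food truck tour": (3, 0, 1),
    "live concert": (0, 3, 1),
    "sports bar": (2, 0, 3),
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise average of a non-empty list of tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Assign each point to its nearest centroid, re-average, repeat."""
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no option changed cluster, so we are done
        assignments = new_assignments
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return assignments, centroids


names = list(ratings)
points = list(ratings.values())
# Assumption: seed with the first three options so the run is repeatable.
assignments, centroids = k_means(points, list(points[:3]))

for cluster in range(len(centroids)):
    members = [n for n, a in zip(names, assignments) if a == cluster]
    print(cluster, centroids[cluster], members)
```

With these invented ratings the options fall into a food group, a music group, and a sports group. Different seeds or a different k can give different groups, which is part of what the Comparison step is meant to surface.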
RapidMiner World/Boston Example
Comparison: What do you see?
RapidMiner World/Boston Example
Conclusions: So what?
Does this help you make a decision?
How can you fine tune your model?
To what other problems/datasets could you apply what you’ve learned?
Response to Reviewers
Use of a toy example
Transfer of knowledge to other scenarios is ideal
Sometimes a little help is good…
Loan Analyst Example
Activation:
You review loans looking for red flags
You know how to spot anomalies
Your work is time-consuming
Addition:
Problem loans don’t look like average ones
k-Means Clustering uses averages
Averages help create different groups
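The link between averages and unusual loans in the Addition points above can be illustrated with a small sketch. It assumes a single group, whose centroid is simply the average loan, and ranks loans by how far they sit from that average. The loan IDs, the two features, and every value are hypothetical. In practice the features would usually be put on a comparable scale first, since Euclidean distance is sensitive to units.

```python
import math

# Hypothetical loans: (amount in $1000s, debt-to-income ratio in %).
# Values are invented for illustration only.
loans = {
    "L-001": (20, 30),
    "L-002": (22, 28),
    "L-003": (18, 32),
    "L-004": (21, 31),
    "L-005": (60, 70),
}


def average_loan(rows):
    """Component-wise mean: the centroid of a single cluster."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


center = average_loan(list(loans.values()))

# Euclidean distance of each loan from the average loan.
distance_from_average = {
    loan_id: math.dist(features, center)
    for loan_id, features in loans.items()
}

# The loan that looks least like the average is a candidate red flag.
most_unusual = max(distance_from_average, key=distance_from_average.get)
print(center, most_unusual)
```

Here the one loan with a much larger amount and ratio ends up farthest from the average, which is the intuition an analyst already applies by eye.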
Loan Analyst Example
Comparison:
Build a k-Means model with your loan data
You’re the expert, what do you see?
Compare your standard method results to the data mining results
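One way to make that comparison concrete is a simple set comparison between the loans the analyst flagged by hand and the loans the model singled out. The sketch below assumes both results are available as sets of loan IDs. The IDs and variable names are hypothetical, and how the model "flags" a loan (for example, membership in a small or distant cluster) is left as an assumption.

```python
# Hypothetical loan IDs; neither set comes from the talk.
analyst_flags = {"L-005", "L-011", "L-017"}
model_flags = {"L-005", "L-017", "L-023", "L-030"}

agreed = analyst_flags & model_flags           # both methods flagged these
missed_by_model = analyst_flags - model_flags  # only the analyst caught these
new_leads = model_flags - analyst_flags        # only the model raised these

# Share of the analyst's red flags that the model also found.
share_of_analyst_flags_found = len(agreed) / len(analyst_flags)

print(sorted(agreed), sorted(missed_by_model), sorted(new_leads))
print(round(share_of_analyst_flags_found, 2))
```

The loans only the model raised are the ones worth an expert's second look. They may be false alarms, or they may be problems the standard method misses.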
Conclusions:
Is the model useful?
Can it speed up your identification of problem loans?
Conclusions
Cases are fun/interesting
Cases are accessible to area experts
Learning data mining is often the hurdle
RapidMiner makes data mining accessible to non-experts
Now…
Who’s Ready to Hit the Town?!?