![Page 1: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/1.jpg)
An Evaluation of A Commercial Data Mining Suite
Oracle Data Mining
Presented by Emily Davis
Supervisor: John Ebden
![Page 2: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/2.jpg)
Oracle Data Mining: An Investigation
Emily Davis
Investigating the data mining tools and software available with Oracle9i.
Using Oracle Data Mining and JDeveloper (Java API) to run the algorithms in the data mining suite on sample data.
Evaluating the results using confusion matrices, lift charts and error rates, and comparing the effectiveness of the different algorithms.
Supervisor: John Ebden
Contact: [email protected]
http://www.cs.ru.ac.za/research/students/g01D1801/
Model A

| | Model Accept | Model Reject |
|---|---|---|
| Actual Accept | 600 | 25 |
| Actual Reject | 75 | 300 |

Oracle Data Mining, DM4J and JDeveloper
Adaptive Bayes
Naïve Bayes
![Page 3: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/3.jpg)
Problem Statement
To determine how Oracle provides data mining functionality:
- Ease of use
- Data preparation
- Model building
- Model testing
- Applying models to new data
![Page 4: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/4.jpg)
Problem Statement
To determine whether the algorithms used would find a pattern in a data set, and what happened when the models were applied to a new data set

To determine which algorithm built the most effective model and under what circumstances
![Page 5: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/5.jpg)
Problem Statement
To determine how models are tested and if this indicates how they will perform when applied to new data
To determine how the data affected the model building and how the test data affected the model testing
![Page 6: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/6.jpg)
Methodology
Two classification algorithms were selected:
- Naïve Bayes
- Adaptive Bayes Network

Both produce predictions, which could then be compared.
![Page 7: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/7.jpg)
Methodology
Weather data from http://www.ru.ac.za/weather/

Data recorded includes:
- Temperature (degrees F)
- Humidity (percent)
- Barometer (inches of mercury)
- Wind Direction (degrees, 360 = North, 90 = East)
- Wind Speed (MPH)
- High Wind Speed (MPH)
- Solar Radiation (Watts/m^2)
- Rainfall (inches)
- Wind Chill (computed from high wind speed and temperature)
![Page 8: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/8.jpg)
Data
Rainfall reading removed and replaced with a yes or no depending on whether rainfall was recorded
This variable, RAIN, was chosen as the target variable
2 data sets put into tables in the database:
- WEATHER_BUILD
- WEATHER_APPLY
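
The replacement of the rainfall reading with a yes/no RAIN target can be sketched as follows. This is a minimal illustration, not the preparation actually performed in Oracle: the field names, the sample readings, and the rule "any rainfall above zero counts as yes" are assumptions made for the example.

```python
def rain_label(rainfall_inches: float) -> str:
    """Return "yes" if any rainfall was recorded, otherwise "no" (assumed rule)."""
    return "yes" if rainfall_inches > 0 else "no"


def to_target_rows(rows):
    """Drop the numeric rainfall reading and add the categorical RAIN target."""
    prepared = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != "rainfall_in"}
        new_row["RAIN"] = rain_label(row["rainfall_in"])
        prepared.append(new_row)
    return prepared


# Hypothetical readings, not taken from the original weather data.
readings = [
    {"temperature_f": 61.2, "humidity_pct": 88, "rainfall_in": 0.02},
    {"temperature_f": 74.5, "humidity_pct": 41, "rainfall_in": 0.0},
]

prepared = to_target_rows(readings)
for row in prepared:
    print(row)
```

Removing the numeric reading matters here: if it stayed in the table, the model could read the target directly from it.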
![Page 9: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/9.jpg)
WEATHER_BUILD
- 2601 records
- Used to create build and test data with the Transformation Split wizard

WEATHER_APPLY
- 290 records
- Used to validate models
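
Splitting one table into build and test sets can be sketched as a seeded random split. This is only an illustration of the idea; it does not reproduce the Transformation Split wizard. The 30% test fraction and the seed are assumptions, since the slides do not state the split ratio.

```python
import random


def split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into (build, test) lists.

    test_fraction and seed are hypothetical values chosen for the example.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the 2601 WEATHER_BUILD records: just record indices.
rows = list(range(2601))
build, test = split_rows(rows)
print(len(build), len(test))
```

Fixing the seed makes the split repeatable, so the same build and test records are used each time a model is rebuilt.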
![Page 10: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/10.jpg)
Building and Testing the Models
- The Priors technique
- Training and tuning the models
- The models built
- Testing Results
![Page 11: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/11.jpg)
Data Preparation Techniques - Priors
[Figure: histogram for RAIN. x-axis: Bin Range (yes, no); y-axis: Bin Count (0 to 1400).]
![Page 12: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/12.jpg)
Priors
[Figure: histogram for RAIN. x-axis: Bin Range (yes, no); y-axis: Bin Count (0 to 1400).]
Stratified Sampling
![Page 13: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/13.jpg)
Priors
[Figure: two histograms for RAIN, each with x-axis Bin Range (yes, no) and y-axis Bin Count (0 to 1400).]
Stratified Sampling
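
The idea behind stratified sampling with priors can be sketched as follows. The sketch assumes that the Priors technique means building on a sample balanced across the target values while recording the original class proportions, so they can be supplied to the algorithm as prior probabilities. The function name, the sampling rule (downsample every class to the size of the smallest) and the toy rows are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_balance(rows, target="RAIN", seed=0):
    """Return (balanced_sample, priors).

    priors holds the class proportions of the ORIGINAL rows; the sample
    contains the same number of rows for each target value.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    total = len(rows)
    priors = {label: len(group) / total for label, group in by_class.items()}

    smallest = min(len(group) for group in by_class.values())
    sample = []
    for group in by_class.values():
        sample.extend(rng.sample(group, smallest))
    rng.shuffle(sample)
    return sample, priors


# Toy data: 6 "no" rows and 2 "yes" rows (not the real distribution).
rows = [{"id": i, "RAIN": "no" if i < 6 else "yes"} for i in range(8)]
sample, priors = stratified_balance(rows)
counts = Counter(row["RAIN"] for row in sample)
print(counts, priors)
```

The balanced sample gives the rarer class more influence during building, while the recorded priors let predictions be adjusted back towards the original distribution.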
![Page 14: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/14.jpg)
Training and Tuning the Models

| | Predicted No | Predicted Yes |
|---|---|---|
| Actual No | 384 | 34 |
| Actual Yes | 141 | 74 |

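
The figures in this confusion matrix can be summarised with a few standard ratios. The sketch below assumes that "yes" is treated as the positive class, so the 141 rainy records predicted as "no" are the false negatives. The variable names are illustrative.

```python
# Counts copied from the confusion matrix above.
tn, fp = 384, 34   # actual no:  predicted no, predicted yes
fn, tp = 141, 74   # actual yes: predicted no, predicted yes

total = tn + fp + fn + tp
accuracy = (tn + tp) / total      # share of all predictions that are correct
error_rate = 1 - accuracy
no_recall = tn / (tn + fp)        # share of actual "no" records found
yes_recall = tp / (tp + fn)       # share of actual "yes" records found

print(f"records:    {total}")
print(f"accuracy:   {accuracy:.1%}")
print(f"error rate: {error_rate:.1%}")
print(f"no recall:  {no_recall:.1%}")
print(f"yes recall: {yes_recall:.1%}")
```

Overall accuracy is about 72.4%, but only about 34.4% of the actual "yes" records are predicted correctly, against about 91.9% of the "no" records. Most of the errors are false negatives.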
![Page 15: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/15.jpg)
Training and Tuning the Models
Viable to introduce a weighting of 3 against false negatives
Makes a false negative prediction 3 times as costly as a false positive
Algorithm attempts to minimise costs
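
One common way to minimise cost with such a weighting is to choose the prediction with the lower expected cost. The sketch below illustrates that rule under stated assumptions and is not a description of Oracle's internal implementation: a false positive is assumed to cost 1, a false negative 3, and `p_yes` is a hypothetical model probability that the record is a "yes".

```python
def expected_costs(p_yes, fn_cost=3.0, fp_cost=1.0):
    """Expected cost of each possible prediction for one record."""
    return {
        # Predicting "yes" is wrong when the record is actually "no".
        "yes": (1.0 - p_yes) * fp_cost,
        # Predicting "no" is wrong when the record is actually "yes".
        "no": p_yes * fn_cost,
    }


def cheapest_prediction(p_yes, fn_cost=3.0, fp_cost=1.0):
    """Return the prediction with the lowest expected cost."""
    costs = expected_costs(p_yes, fn_cost, fp_cost)
    return min(costs, key=costs.get)


for p in (0.1, 0.2, 0.3, 0.6):
    print(p, cheapest_prediction(p), cheapest_prediction(p, fn_cost=1.0))
```

With equal costs the rule predicts "yes" only when the probability exceeds 0.5. With a weighting of 3 against false negatives the threshold drops to 0.25, so the model predicts "yes" more readily.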
![Page 16: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/16.jpg)
The Models
8 models in total, 4 using each algorithm:
- One using default settings
- One using the Priors technique
- One using weighting
- One using Priors and weighting
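
The eight models form a small grid of algorithm and settings, which can be enumerated as below. The dictionary keys are illustrative names, not Oracle Data Mining parameters.

```python
from itertools import product

algorithms = ["Naïve Bayes", "Adaptive Bayes Network"]

# Every combination of algorithm, Priors on/off and weighting on/off.
configs = [
    {"algorithm": algorithm, "priors": priors, "weighting": weighting}
    for algorithm, priors, weighting in product(algorithms, [False, True], [False, True])
]

for config in configs:
    print(config)
```

The configuration with both flags off corresponds to the default settings.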
![Page 17: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/17.jpg)
Testing the Models
Tested on test data set created from WEATHER_BUILD data set
Confusion matrices indicating accuracy of models
![Page 18: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/18.jpg)
Testing Results
[Figure: bar chart "Testing Results". x-axis: Model Settings (no weighting, no priors; no weighting, priors; weighting, no priors; weighting, priors); y-axis: Accuracy (0% to 90%); series: Naïve Bayes, Adaptive Bayes Network.]
![Page 19: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/19.jpg)
Applying the Models to New Data
Models were applied to the new data in WEATHER_APPLY

| Prediction | Probability | THE_TIME |
|---|---|---|
| no | 0.9999 | 1 |
| yes | 0.6711 | 138 |

| Prediction | Cost of incorrect prediction | THE_TIME |
|---|---|---|
| no | 0 | 1 |
| yes | 0.3288 | 138 |

Extracts showing 2 predictions in actual results
![Page 20: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/20.jpg)
Attribute Influence on Predictions
- Adaptive Bayes Network provides rules along with predictions
- Rules are in if...then format
- Rules showed that the attributes with most influence were:
  - Wind Chill
  - Wind Direction
![Page 21: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/21.jpg)
Results of Applying Models to New Data
[Figure: bar chart "Model Results". x-axis: Model Settings (no weighting, no priors; no weighting, priors; weighting, no priors; weighting, priors); y-axis: Accuracy (0% to 80%); series: Naïve Bayes, Adaptive Bayes Network.]
![Page 22: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/22.jpg)
Comparing Accuracy
[Figure: two bar charts shown together, "Model Results" (accuracy 0% to 80%) and "Testing Results" (accuracy 0% to 90%). Both have x-axis Model Settings (no weighting, no priors; no weighting, priors; weighting, no priors; weighting, priors), y-axis Accuracy, and series Naïve Bayes and Adaptive Bayes Network.]
![Page 23: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/23.jpg)
Observations
- Algorithms found a pattern in the weather data
- Most effective model: Adaptive Bayes Network algorithm using weighting
- Accuracy of Naïve Bayes models improves dramatically if weighting and Priors are used
- Significant difference between accuracy during testing of models and accuracy when applied to new data
![Page 24: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/24.jpg)
Conclusions
- Oracle Data Mining provides easy-to-use wizards that support all aspects of the data mining process
- Algorithms found a pattern in the weather data
  - Best case: the Adaptive Bayes Network model predicted 73.1% of RAIN outcomes correctly
![Page 25: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/25.jpg)
Conclusions
- Adaptive Bayes Network algorithm produced the most effective model: accuracy of 73.1% when applied to new data
  - Tuned using a weighting of 3 against false negatives
- Most effective model using Naïve Bayes: accuracy of 63.79%
  - Uses a weighting of 3 against false negatives and the Priors technique
![Page 26: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/26.jpg)
Conclusions
Accuracy during testing does not always indicate performance of model on new data
Test accuracy inflated if target attribute distribution in build and test data sets is similar
Shows the need for testing of a model on a variety of data sets
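
The dependence of accuracy on the target distribution can be illustrated with the confusion matrix shown earlier under "Training and Tuning the Models". The sketch makes a simplifying assumption: the model's per-class hit rates stay fixed and only the share of "yes" records in the data changes. The 50% mix is a hypothetical data set, not WEATHER_APPLY.

```python
def accuracy_for_mix(no_recall, yes_recall, yes_share):
    """Overall accuracy when a fraction yes_share of the records is actually "yes"."""
    return no_recall * (1.0 - yes_share) + yes_recall * yes_share


# Per-class hit rates taken from the earlier confusion matrix.
no_recall = 384 / 418
yes_recall = 74 / 215

test_share = 215 / 633   # share of "yes" records in that matrix
acc_test = accuracy_for_mix(no_recall, yes_recall, test_share)
acc_half = accuracy_for_mix(no_recall, yes_recall, 0.5)

print(f"accuracy at the matrix's own mix: {acc_test:.1%}")
print(f"accuracy at a 50% 'yes' mix:      {acc_half:.1%}")
```

Under this assumption the same model scores about 72.4% on data shaped like the matrix's own mix but about 63.1% on an evenly split data set, which is one reason to test a model on several data sets.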
![Page 27: An Evaluation of A Commercial Data Mining Suite Oracle Data Mining Presented by Emily Davis Supervisor: John Ebden](https://reader030.vdocuments.net/reader030/viewer/2022032803/56649e395503460f94b2aa79/html5/thumbnails/27.jpg)
Questions