Oracle Data Mining Update and Xerox Application
Charlie Berger
Sr. Director of Product Management, Life Sciences and Data Mining
[email protected]
Oracle Corporation
Raj Minhas
Research Scientist
Xerox Corporation
Session id: 40262
Copyright © 2003 Oracle Corporation
Agenda
Oracle Data Mining Update
ODM/DM4J Demonstration
Xerox Application
Oracle Business Intelligence Vision
Is to change this …
[Diagram: Database Engine, Data Integration Engine, OLAP Engine, Mining Engine]
Multiple databases
Multiple servers
Multiple engines
Proprietary interfaces
Complex environment
Slow conversion of data to information
Oracle Business Intelligence Vision
Into this …
[Diagram: Oracle 10g DB with Data Warehousing, ETL, OLAP, Data Mining]
Single database
Single server
Standard interfaces
Simplified environment
Fast conversion of data to information
What is Oracle Data Mining?
Oracle Data Mining (ODM) sifts through massive amounts of data to find hidden patterns and information
— valuable information that can help you better understand your customers and anticipate their behavior
ODM insights can be revealing, significant, and valuable, e.g.:
– Predict which customers are likely to churn
– Discover what factors are involved with a certain disease
– Identify fraudulent behavior
Oracle Data Mining: Overview & Differentiating Features
Data mining embedded in Oracle10g Database
– Simplifies the process, eliminates data movement, speeds analysis and deployment, and delivers security and scalability
Build models and applications simultaneously
– Build and evaluate models and automatically generate Java code
Enhance applications with predictions and insights
– For example, build churn prediction applications and enable call centers with greater customer insight
Oracle Data Mining: Business Intelligence Applications
CEOs can ask…
– How can I target the “right customers” to maximize profits?
Managers can answer…
– Which customers are likely to be interested in which offers and why?
Call Reps can…
– Suggest the right “offer” for the customer
Data Miners can…
– Discover patterns and insights hidden in the data
[Diagram labels: Information Producers, Information Consumers, Oracle Data Mining]
Information Consumers
Key factors that influence customers likely to purchase a product
Customers sorted by likelihood to purchase a product
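As a rough illustration of "customers sorted by likelihood to purchase", the sketch below ranks already-scored customers by a predicted purchase probability. It is not ODM or DM4J code: the `ScoredCustomer` type, the customer IDs, and the probability values are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class ScoredCustomer:
    customer_id: str      # hypothetical identifier
    purchase_prob: float  # hypothetical model score in [0, 1]


def rank_by_likelihood(customers, top_n=None):
    """Return customers ordered from most to least likely to purchase."""
    ranked = sorted(customers, key=lambda c: c.purchase_prob, reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Made-up scores, standing in for the output of a classification model.
scored = [
    ScoredCustomer("C001", 0.12),
    ScoredCustomer("C002", 0.87),
    ScoredCustomer("C003", 0.45),
]

for c in rank_by_likelihood(scored):
    print(f"{c.customer_id}  {c.purchase_prob:.2f}")
```

A campaign list would typically take only the top of this ranking, for example `rank_by_likelihood(scored, top_n=2)`.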
Oracle11i CRM Application: CRM / Data Mining Integration
Marketing analysts can design targeted campaigns without becoming data mining experts
– Build models, score lists
– Discover patterns & make predictions
Data mining increases effectiveness of targeted campaigns
10g Oracle Data Mining: Wide range of data mining algorithms
Feature Selection
– Attribute Importance
Supervised learning (classification & prediction)
– Naïve Bayes
– Adaptive Bayes Networks
– Support Vector Machines
Unsupervised learning (clustering and associations)
– Association Rules
– Orthogonal Clustering
– Enhanced k-means Cluster
Feature Extraction
– Non-Negative Matrix Factorization
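To make the supervised-learning entries more concrete, here is a minimal textbook sketch of a Naïve Bayes classifier over categorical attributes with add-one smoothing. It is a generic illustration, not Oracle's implementation; the function names, the attributes (plan type, complaint flag), and the toy training rows are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute, needed for smoothing.
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class with the highest smoothed log-posterior."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Add-one (Laplace) smoothing avoids log(0) for unseen values.
            score += math.log((count + 1) / (n_label + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (plan, filed_complaint) -> outcome.
rows = [("basic", "yes"), ("basic", "yes"), ("premium", "no"),
        ("premium", "no"), ("basic", "no"), ("premium", "yes")]
labels = ["churn", "churn", "stay", "stay", "stay", "churn"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("basic", "yes")))
print(predict(model, ("premium", "no")))
```

The same train-then-score split mirrors the build and apply steps that the slides describe for models in general.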
10g Additional Features
Text Mining
– Ability to combine structured data and unstructured data
ODM API
– Java
– PL/SQL
Scoring engine
Similarity Searches
– BLAST (Life sciences: genes and proteins)
Oracle Data Mining/DM4J Demonstration
DM4J2 New Features
Access Data
– Import flat file to db wizard
Visualize Data
– Data snapshot
– Standard summary statistics
– Attribute level histograms
Transform Data
– Create View / Table
– Random and Stratified sampling
– Aggregation
– Computed column
– Normalization
– Discretization
– Table Splits
– Filtering
– Recode
Modeling
– Building models
– Testing models
– Applying (scoring) models
– Visualize results
Deploy Models/Results
– Generate transformation code (PL/SQL)
– View and generate transformation lineage
– Generate model code (Java)
– Integrate with Oracle tools
  – JDeveloper
  – Oracle Warehouse Builder
  – Discoverer
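Two of the transformations listed under Transform Data, normalization and discretization, can be sketched in a few lines. This is a generic illustration of min-max scaling and equal-width binning, not the PL/SQL that DM4J generates; the function names and the sample `ages` column are assumptions.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, 30, 40, 60]  # hypothetical numeric attribute
print(min_max_normalize(ages))
print(equal_width_bins(ages, 4))
```

Other binning strategies (for example, equal-frequency bins) would follow the same shape with a different rule for choosing the bin boundaries.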
Oracle Data Mining: Enabling Data Mining Applications
DM4J GUI add-ins provide wizards for building and evaluating models
Oracle Data Mining: Enabling Data Mining Applications
Data analysts can build and review data mining models
Oracle Data Mining: Enabling Data Mining Applications
Comprehensive GUI for preparing data, building models, evaluating results and deploying models
DM4J provides features to transform and prepare the data
Oracle Data Mining: Enabling Data Mining Applications
Automated, scheduled, and event-driven business intelligence applications can be easily integrated into enterprise applications
DM4J automatically generates the Java code
Oracle 10g: SVM Classification of Multiple Tumor Types
Multiple examples of tumor tissue (public data from Whitehead/MIT)
[Diagram labels: DNA Microarray Data, Oracle Data Mining]
Confusion matrix (nonzero counts listed per actual class)
Actual\Predicted: BR PR LU CO LY BL ML UT LE RE PA OV MS BR
BREAST-BR: 1 1
PROSTATE-PR: 1 1
LUNG-LU: 1 2
COLON-CO: 3
LYMPHOMA-LY: 6
BLADDER-BL: 1 2
MELANOMA-ML: 1 1
UTERUS-UT: 2
LEUKEMIA-LE: 1 5
RENAL-RE: 3
PANCREAS-PA: 1 2
OVARY-OV: 1 2
MESOTHELIOMA-MS: 3
BRAIN-BR: 4
78.25% accuracy
Green=Correct Red=Errors
We feed data for multiple cancer types into the Oracle DB: 16,063 genes, 144 cancer patients, and 10 samples per class.
We mine the data using Support Vector Machines and create the confusion matrix.
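An accuracy figure like the one on this slide is read off a confusion matrix as the sum of the diagonal (correct predictions) divided by the total count. The sketch below shows that arithmetic on a small made-up 3-class matrix. The numbers are hypothetical and are not the tumor data from the slide.

```python
def accuracy(confusion):
    """confusion[i][j] = number of samples of actual class i predicted as class j."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical 3-class confusion matrix (rows: actual, columns: predicted).
toy = [
    [5, 1, 0],
    [0, 4, 2],
    [1, 0, 7],
]

print(f"accuracy = {accuracy(toy):.2%}")
print(per_class_recall(toy))
```

Per-class recall is often worth reporting alongside overall accuracy when, as here, there are many classes with only a few samples each.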
Oracle 10g: SVM Classification of Multiple Tumor Types
Oracle Data Mining’s SVM models classify the multi-class tumor problem with 78.25% accuracy.
TDWI
“Andrew Braunberg, a senior analyst with research firm TDWI suggests that DM4J should simplify the job of data analysts. Before Oracle released DM4J, Braunberg notes, analysts who used ODM had to write out all of the Java code that was required to build their predictive models. “This was a time-consuming process that slowed model development and deployment.” With DM4J, Braunberg notes, Java code is automatically written as data analysts build their predictive models. Moreover, developers or data analysts can re-use this code in other Java-based applications. As a result, he anticipates, DM4J will “enhance analysts’ ability to create predictive models using Oracle Data Mining.””
TDWI Brief: Oracle Data Mining gets GUI; IBM and Cognos' BI partnership, April 9, 2003, by Stephen Swoyer
http://www.dw-institute.com/research/display.asp?id=6632
Benefits of Oracle’s Approach
Oracle Data Mining Feature | Benefit
Data Mining algorithms embedded in database | Eliminates data movement and security exposure; Fastest: Data → Information
Wide range of data mining algorithms | Supports most data mining problems
Runs on multiple platforms | Applications may be developed and deployed
Built on Oracle Technology | Grid, RAC, integrated BI,…
SQL & PL/SQL available | Leverage existing skills
Questions & Answers