Page 1: Oracle Data Mining Update and Xerox Application Charlie Berger Sr. Director of Product Management, Life Sciences and Data Mining charlie.berger@oracle.com
Oracle Data Mining Update and Xerox Application

Charlie Berger
Sr. Director of Product Management, Life Sciences and Data Mining
charlie.berger@oracle.com
Oracle Corporation

Raj Minhas
Research Scientist
Xerox Corporation

[email protected]

Session id: 40262

Copyright © 2003 Oracle Corporation

Agenda

Oracle Data Mining Update
ODM/DM4J Demonstration
Xerox Application

Oracle Business Intelligence Vision

[Diagram: Database Engine, Data Integration Engine, OLAP Engine, Mining Engine]

Multiple databases
Multiple servers
Multiple engines
Proprietary interfaces
Complex environment
Slow conversion of data to information

Is to change this …

Oracle Business Intelligence Vision

Single database
Single server
Standard interfaces
Simplified environment
Fast conversion of data to information

[Diagram: Oracle 10g DB with Data Warehousing, ETL, OLAP, Data Mining]

Into this …

What is Oracle Data Mining?

Oracle Data Mining (ODM) sifts through massive amounts of data to find hidden patterns and information, valuable information that can help you better understand your customers and anticipate their behavior

ODM insights can be revealing, significant, and valuable, e.g.
– Predict which customers are likely to churn
– Discover what factors are involved with a certain disease
– Identify fraudulent behavior

Oracle Data Mining: Overview & Differentiating Features

Data mining embedded in Oracle10g Database
– Simplifies the process, eliminates data movement, speeds analysis and deployment, and delivers security and scalability
Build models and applications simultaneously
– Build and evaluate models and automatically generate Java code
Enhance applications with predictions and insights
– For example, build churn prediction applications and enable call centers with greater customer insight

Oracle Data Mining: Business Intelligence Applications

[Diagram labels: Information Producers, Information Consumers, Oracle Data Mining]

CEOs can ask…
– How can I target the “right customers” to maximize profits?
Managers can answer…
– Which customers are likely to be interested in which offers and why?
Call Reps can…
– Suggest the right “offer” for the customer
Data Miners can…
– Discover patterns and insights hidden in the data

Information Consumers

Key factors that influence customers likely to purchase a product

Customers sorted by likelihood to purchase a product
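A ranked customer list of this kind can be pictured with a small sketch. The customer IDs and probabilities are made-up illustrative values, and `rank_by_likelihood` is a hypothetical helper, not part of ODM or DM4J; in practice the scores would come from applying a model to customer data.

```python
def rank_by_likelihood(scores):
    """Return (customer_id, probability) pairs, most likely purchaser first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical purchase probabilities, as a scoring step might produce them.
scores = {"C001": 0.12, "C002": 0.87, "C003": 0.45, "C004": 0.66}

ranked = rank_by_likelihood(scores)
for customer_id, probability in ranked:
    print(f"{customer_id}: {probability:.2f}")
```

A campaign would then typically target only the top of such a list.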

Oracle11i CRM Application: CRM / Data Mining Integration

Marketing analysts can design targeted campaigns without becoming data mining experts
– Build models, score lists
– Discover patterns & make predictions

Data mining increases effectiveness of targeted campaigns

10g Oracle Data Mining: Wide range of data mining algorithms

Feature Selection
– Attribute Importance
Supervised learning (classification & prediction)
– Naïve Bayes
– Adaptive Bayes Networks
– Support Vector Machines
Unsupervised learning (clustering and associations)
– Association Rules
– Orthogonal Clustering
– Enhanced k-means Cluster
Feature Extraction
– Non-Negative Matrix Factorization

10g Additional Features

Text Mining
– Ability to combine structured data and unstructured data
ODM API
– Java
– PL/SQL
Scoring engine
Similarity Searches
– BLAST (Life sciences: genes and proteins)

Oracle Data Mining/DM4J Demonstration

DM4J2 New Features

Access Data
– Import flat file to db wizard
Visualize Data
– Data snapshot
– Standard summary statistics
– Attribute level histograms
Transform Data
– Create View / Table
– Random and Stratified sampling
– Aggregation
– Computed column
– Normalization
– Discretization
– Table Splits
– Filtering
– Recode
Modeling
– Building models
– Testing models
– Applying (scoring) models
– Visualize results
Deploy Models/Results
– Generate transformation code (PL/SQL)
– View and generate transformation lineage
– Generate model code (Java)
– Integrate with Oracle tools
  – JDeveloper
  – Oracle Warehouse Builder
  – Discoverer
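Two of the transformations named in this feature list, normalization and discretization, can be illustrated with a minimal sketch. It assumes min-max scaling and equal-width binning, which are common choices but not necessarily what DM4J applies; the function names and the `ages` values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]
    width = (high - low) / n_bins
    # The maximum value would land one past the last bin, so clamp it.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


ages = [20, 30, 40, 60]  # hypothetical attribute values
normalized = min_max_normalize(ages)
binned = equal_width_bins(ages, 4)
print(normalized)
print(binned)
```

Normalization keeps attributes with large ranges from dominating distance-based algorithms, while discretization turns a numeric attribute into a small set of categories.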

Oracle Data Mining: Enabling Data Mining Applications

DM4J GUI add-ins provide wizards for building and evaluating models

Oracle Data Mining: Enabling Data Mining Applications

Data analysts can build and review data mining models

Oracle Data Mining: Enabling Data Mining Applications

Comprehensive GUI for preparing data, building models, evaluating results and deploying models

DM4J provides features to transform and prepare the data

Oracle Data Mining: Enabling Data Mining Applications

Automated, scheduled, and event-driven business intelligence applications can be easily integrated into enterprise applications

DM4J automatically generates the Java code

Oracle 10g: SVM Classification of Multiple Tumor Types

[Diagram labels: Multiple examples of tumor tissue (public data from Whitehead/MIT); DNA Microarray Data; Oracle Data Mining]

We feed multiple cancer types data into the Oracle DB: 16,063 genes, 144 cancer patients and 10 samples per class.

We mine the data using Support Vector Machines and create the confusion matrix.

Confusion matrix (rows: actual class; columns: predicted class, in the order BR PR LU CO LY BL ML UT LE RE PA OV MS BR). Only the nonzero counts of each row survive; their column positions are not recoverable.

BREAST-BR: 1, 1
PROSTATE-PR: 1, 1
LUNG-LU: 1, 2
COLON-CO: 3
LYMPHOMA-LY: 6
BLADDER-BL: 1, 2
MELANOMA-ML: 1, 1
UTERUS-UT: 2
LEUKEMIA-LE: 1, 5
RENAL-RE: 3
PANCREAS-PA: 1, 2
OVARY-OV: 1, 2
MESOTHELIOMA-MS: 3
BRAIN-BR: 4

78.25% accuracy

Green=Correct Red=Errors

Oracle 10g: SVM Classification of Multiple Tumor Types

Oracle Data Mining’s SVM models predict the multi-class tumor problem with 78.25% accuracy.
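An accuracy figure for a multi-class classifier is the share of test cases that fall on the diagonal of the confusion matrix. The sketch below shows that arithmetic on a small hypothetical 3-class matrix; the numbers are invented for illustration and are not the tumor data from the slide.

```python
def accuracy(confusion):
    """Fraction of cases on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical matrix: rows are actual classes, columns are predicted classes.
confusion = [
    [5, 1, 0],
    [0, 4, 2],
    [1, 0, 7],
]

acc = accuracy(confusion)
print(f"{acc:.2%}")  # 16 of 20 cases lie on the diagonal
```

Off-diagonal cells show which classes get confused with each other, which a single accuracy number hides.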

TDWI

“Andrew Braunberg, a senior analyst with research firm TDWI suggests that DM4J should simplify the job of data analysts. Before Oracle released DM4J, Braunberg notes, analysts who used ODM had to write out all of the Java code that was required to build their predictive models. “This was a time-consuming process that slowed model development and deployment.” With DM4J, Braunberg notes, Java code is automatically written as data analysts build their predictive models. Moreover, developers or data analysts can re-use this code in other Java-based applications. As a result, he anticipates, DM4J will “enhance analysts’ ability to create predictive models using Oracle Data Mining.””

TDWI Brief: Oracle Data Mining gets GUI; IBM and Cognos' BI partnership
By Stephen Swoyer, April 9, 2003
http://www.dw-institute.com/research/display.asp?id=6632

Benefits of Oracle’s Approach

Oracle Data Mining Feature | Benefit
Data Mining algorithms embedded in database | Eliminates data movement and security exposure; Fastest: Data → Information
Wide range of data mining algorithms | Supports most data mining problems
Runs on multiple platforms | Applications may be developed and deployed
Built on Oracle Technology | Grid, RAC, integrated BI,…
SQL & PL/SQL available | Leverage existing skills

Q&A: Questions and Answers