
© 2002 page 1

Data Mining Tools For ZLE

Copying and Use Restrictions:

Material under this presentation is the Intellectual Property of HP Corporation and Genus Software. Any use of this material, in part or whole, except in the context of Genus Data Mining Integrator and Data Mart Builder, without written permission from HP and Genus is prohibited.


agenda

• data mining in ZLE solutions
• ZLE data mining toolkit
• toolkit demonstration


Meta Group

• process of identifying and/or extracting previously unknown, non-trivial, unanticipated, important information from large sets of data

Gartner Group

• process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies, statistical and mathematical techniques


• role
– determine most effective responses to business events
• ZLE facilitates mining by providing
– a rich, integrated, current data source
– an integrated operational environment into which models can be deployed
• data mining helps to realize the full business value of a ZLE system


ZLE data mining process

[process diagram showing the steps listed below; one annotation reads "typically about 75% of process"]

• understand the opportunity
– identify and define business opportunity
• prepare data
– profile and understand data
– derive attributes
– transform data
– create case set
• build models
– train models
– assess model performance
• use models
– deploy model
– monitor model performance




the ZLE data mining toolkit

• goal:
– provide tools that facilitate ZLE data mining
– reduce process cycle times dramatically
• three tools being developed by Genus Software:
– data preparation
– data transfer
– model deployment
• partners: Genus, MicroStrategy, SAS
• product names:
– Genus Mining Integrator for NonStop SQL (all three tools)
– Genus Mart Builder for NonStop SQL (first two tools only)


ZLE data mining analytical cycle

[cycle diagram; components as labeled]
• Data Store (NonStop SQL)
• Data Preparation (profiling/transforming data)
• Data Transfer (fast parallel streams)
• Mining Mart (Tru64/Windows)
• Modeling (SAS Enterprise Miner)
• Model Deployment (written to DB tables)
• Real-Time Scoring (using the Recommender): Scoring Engine, Rules Engine, Agg. Engine, Interaction Manager

legend: part of Genus toolkit; part of ZDK 3; available from SAS




toolkit demonstration

• credit card fraud detection example
• opportunity: use ZLE data store data to predict, in real time, which credit card purchases are likely to be fraudulent
• use tools to:
– build a case set table with one row describing each purchase
– transfer table to SAS server for modeling
– deploy predictive model to ZLE data store
– execute model in real time to make fraud predictions
• steps described, including many tool screen shots


toolkit data preparation solution

• based on the MicroStrategy (MSI) Business Intelligence toolset; leverages GUI, logical data model support, SQL generation, etc.
• uses NonStop SQL/MX DBMS; leverages sampling, TRANSPOSE, statistical functions, …
• custom tool developed by Genus using MSI SDK for NonStop SQL operations and functionality not supported by MSI tools


two main ZLE data preparation tasks

1. profile tables
– column names and types
– partitioning information, attributes, key structure, …
– column values

2. transform source tables
– derive new attributes
– aggregate to appropriate level
– clean data
– pivot
– combine to form case set
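The profiling task above (column names, types, and column values) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for NonStop SQL/MX, so the catalog query (`PRAGMA table_info`) is SQLite-specific. The `profile_table` helper and the `State` column are hypothetical names chosen for illustration; they are not part of the Genus or MicroStrategy tools.

```python
import sqlite3


def profile_table(conn, table):
    """Collect per-column type and value statistics for one table.

    Assumes `table` is a trusted identifier (it is interpolated into SQL).
    """
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    profile = {}
    for _cid, name, col_type, *_rest in columns:
        total, non_null, distinct, lo, hi = conn.execute(
            f'SELECT COUNT(*), COUNT("{name}"), COUNT(DISTINCT "{name}"), '
            f'MIN("{name}"), MAX("{name}") FROM {table}'
        ).fetchone()
        profile[name] = {
            "type": col_type,
            "nulls": total - non_null,
            "distinct": distinct,
            "min": lo,
            "max": hi,
        }
    return profile


# Illustrative data only; the column layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cust (CardNo INTEGER, State TEXT)")
conn.executemany("INSERT INTO Cust VALUES (?, ?)", [(1, "CA"), (2, "CA"), (3, None)])
cust_profile = profile_table(conn, "Cust")
```

A profile like this gives the analyst a quick view of missing values and value ranges before any transformation is designed.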


the MicroStrategy desktop


MSI profile report: fraud vs. billing state


NonStop SQL/MX sampling

• source table sampling
– insert into CustSamp
    select * from Cust sample random 1 percent clusters of 10 blocks
    union
    select * from Cust
    where CardNo in (select CardNo from FrdFlg)
• enables interactive and exploratory data prep
• cleanly integrated into SQL
• performed efficiently in DP2
• easily accessible through Genus tool
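The intent of the sampling statement above (a 1 percent random sample of Cust, plus every customer whose card appears in FrdFlg) can be sketched in plain Python. This is only an illustration under two assumptions: it samples individual rows rather than clusters of blocks as the SQL/MX SAMPLE clause does, and it assumes CardNo is unique in Cust. The function name `sample_with_fraud` is hypothetical.

```python
import random


def sample_with_fraud(customers, fraud_cards, fraction=0.01, seed=0):
    """Return a random sample of customers plus every fraud-flagged customer."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Row-level random sample (the SQL/MX statement samples clusters of blocks).
    sampled = {c["CardNo"]: c for c in customers if rng.random() < fraction}
    # UNION branch: add every customer whose card appears in the fraud flags.
    for c in customers:
        if c["CardNo"] in fraud_cards:
            sampled[c["CardNo"]] = c  # keyed by CardNo, so no duplicates
    return list(sampled.values())


# Illustrative data only.
customers = [{"CardNo": n} for n in range(10_000)]
cust_samp = sample_with_fraud(customers, fraud_cards={3, 77, 9_999})
```

Keeping all fraud-flagged customers matters because fraud is rare: a plain 1 percent sample would leave very few positive cases for exploration.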


creating a materialized sample table using the Genus Data Mart Builder


identifying source and sample method


specifying materialized sample table


transforming source data

[diagram: sample rows from the source tables shown side by side; annotations: Billions of Purchases, Millions of Accounts, Aggregate and Pivot]

Purchase                                     Store          Account         Purchase History            Item Summary               Fraud
PurchDt          Amt      Store  Acct        Size  Age  CS  CR  CrLim  Ten  P1  S1  A1    P3  S3  A3    Min  Max  Elec  Vid  Jewl  Frd?
102302 11:02:44  $4.50    423    8849940044  249   4    33  1   1000   8    0   0   0     0   0   0     $1   $3   0     0    0     0
102302 11:02:44  $88.38   221    8376636636  337   9    88  0   4600   46   1   1   $54   1   1   $54   $9   $17  1     1    0     1
102302 11:02:45  $121.33  221    8376636636  893   1    76  0   1700   15   0   0   0     0   0   0     $19  $42  0     0    1     0
102302 11:02:45  $19.99   73     3866493657  102   19   43  1   1700   15   0   0   0     0   0   0     $4   $9   0     1    0     0
…
102402 11:01:01  $43.84   743    8376636636  219   12   44  0   4600   89   1   1   $121  1   1   $121  $8   $19  1     0    0     0
102402 11:02:59  $77.01   23     5378366284  430   6    90  0   1000   1    1   1   $54   1   1   $54   $15  $22  1     1    0     0
102402 11:02:21  $11.63   189    8376636636  501   14   23  0   2000   20   2   2   $79   2   2   $79   $1   $3   0     0    0     0
102402 11:03:58  $144.00  270    3866493657  194   2    5   1   1500   12   1   1   $20   3   1   $60   $11  $42  1     1    1     0
…
102502 12:01:34  $289.08  45     6474538469  579   5    75  0   3000   30   0   0   0     0   0   0     $19  $98  0     0    1     0
102502 12:01:49  $71.99   301    3866493657  220   13   34  1   3300   28   2   1   $54   4   1   $59   $7   $22  0     1    0     0
102502 12:03:45  $38.23   219    5382638977  331   1    91  0   2900   29   0   0   0     0   0   0     $4   $9   0     1    0     0
102502 12:03:58  $58.84   17     3866493657  430   8    18  0   1800   16   3   2   $55   5   2   $58   $6   $14  1     0    1     1


result: a case set for modeling

PurchDt          Amt      Store  Acct        Size  Age  CS  CR  CrLim  Ten  P1  S1  A1    P3  S3  A3    Min  Max  Elec  Vid  Jewl  Frd?
102302 11:02:44  $4.50    423    8849940044  249   4    33  1   1000   8    0   0   0     0   0   0     $1   $3   0     0    0     0
102302 11:02:44  $88.38   221    8376636636  337   9    88  0   4600   46   1   1   $54   1   1   $54   $9   $17  1     1    0     1
102302 11:02:45  $121.33  221    8376636636  893   1    76  0   1700   15   0   0   0     0   0   0     $19  $42  0     0    1     0
102302 11:02:45  $19.99   73     3866493657  102   19   43  1   1700   15   0   0   0     0   0   0     $4   $9   0     1    0     0
…
102402 11:01:01  $43.84   743    4674847467  219   12   44  0   4600   89   1   1   $121  1   1   $121  $8   $19  1     0    0     1
102402 11:02:59  $77.01   23     5378366284  430   6    90  0   1000   1    1   1   $54   1   1   $54   $15  $22  1     1    0     1
102402 11:02:21  $11.63   189    8376636636  501   14   23  0   2000   20   2   2   $79   2   2   $79   $1   $3   0     0    0     0
102402 11:03:58  $144.00  270    3866493657  194   2    5   1   1500   12   1   1   $20   3   1   $60   $11  $42  1     1    1     1
…
102502 12:01:34  $289.08  45     6474538469  579   5    75  0   3000   30   0   0   0     0   0   0     $19  $98  0     0    1     0
102502 12:01:49  $71.99   301    3866493657  220   13   34  1   3300   28   2   1   $54   4   1   $59   $7   $22  0     1    0     0
102502 12:03:45  $38.23   219    5382638977  331   1    91  0   2900   29   0   0   0     0   0   0     $4   $9   0     1    0     0
102502 12:03:58  $58.84   17     3866493657  430   8    18  0   1800   16   3   2   $55   5   2   $58   $6   $14  1     0    1     1

• Hundreds of Attributes
• One Row Per Purchase
• Mix of Fraud and No-Fraud Purchases
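How a case set with one row per purchase might be assembled can be sketched as follows. This is a simplified illustration in plain Python, not the SQL generated by the Genus or MicroStrategy tools. The function `build_case_set` and all field names (`purchase_id`, `credit_limit`, `min_price`, and so on) are hypothetical. The sketch joins account attributes, aggregates item prices to the purchase level, and pivots item categories into 0/1 indicator columns.

```python
from collections import defaultdict


def build_case_set(purchases, accounts, items, fraud_ids):
    """Build one case-set row (a dict) per purchase."""
    # Group line items by purchase so they can be aggregated and pivoted.
    items_by_purchase = defaultdict(list)
    for item in items:
        items_by_purchase[item["purchase_id"]].append(item)
    categories = sorted({item["category"] for item in items})

    case_set = []
    for p in purchases:
        rows = items_by_purchase[p["purchase_id"]]
        prices = [r["price"] for r in rows]
        present = {r["category"] for r in rows}
        case = {
            "purchase_id": p["purchase_id"],
            "amount": p["amount"],
            **accounts[p["account"]],             # join: account attributes
            "min_price": min(prices, default=0),  # aggregate to purchase level
            "max_price": max(prices, default=0),
            "fraud": int(p["purchase_id"] in fraud_ids),  # target column
        }
        for category in categories:               # pivot: one 0/1 column per category
            case[category] = int(category in present)
        case_set.append(case)
    return case_set


# Illustrative data only.
purchases = [
    {"purchase_id": 1, "amount": 4.50, "account": "A"},
    {"purchase_id": 2, "amount": 88.38, "account": "B"},
]
accounts = {
    "A": {"credit_limit": 1000, "tenure": 8},
    "B": {"credit_limit": 4600, "tenure": 46},
}
items = [
    {"purchase_id": 2, "category": "elec", "price": 9},
    {"purchase_id": 2, "category": "vid", "price": 17},
]
case_set = build_case_set(purchases, accounts, items, fraud_ids={2})
```

In the toolkit this work is done in the database, so the case set can be built without moving the source tables out of the data store.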


MSI Datamart report summarizing items


data transfer tool

• task: transfer case set from data store to mining mart
• design

[architecture diagram labels: Web browser client, HTML, HTTP, Web server, Web App., JDBC; Data Store (NonStop SQL/MX) and Mining Mart, each with a coordinator; four parallel streams, each labeled transfer, receive, SAS import; ASCII files; SAS data set]
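The idea of moving a case set over several parallel streams can be sketched as below. This is a minimal illustration under stated assumptions, not the Genus transfer tool: local text files stand in for the network transfer, threads stand in for the parallel streams, and the function names `transfer_stream` and `parallel_transfer` are hypothetical. The SAS import step is not modeled.

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def transfer_stream(stream_id, rows, out_dir):
    """Write one partition of the case set to its own ASCII file."""
    path = os.path.join(out_dir, f"part_{stream_id}.txt")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def parallel_transfer(rows, out_dir, streams=4):
    """Split rows round-robin into partitions and write them concurrently."""
    partitions = [rows[i::streams] for i in range(streams)]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [
            pool.submit(transfer_stream, i, part, out_dir)
            for i, part in enumerate(partitions)
        ]
        return [f.result() for f in futures]


# Illustrative data only: write three partitions, then read them back.
rows = [[i, i * 2] for i in range(10)]
with tempfile.TemporaryDirectory() as out_dir:
    paths = parallel_transfer(rows, out_dir, streams=3)
    received = []
    for path in paths:
        with open(path, newline="") as f:
            received.extend(csv.reader(f))
```

Each partition is independent, so the receiving side could import the files as they arrive instead of waiting for the whole table.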


data transfer specification screen


transfer monitoring


modeling in SAS Enterprise Miner


model export

score converter node generates Java model code

reporter node exports code and HTML report to project directory


model deployment tool

• task
– copy model information to a ZLE Data Store
• design

[architecture diagram labels: Web browser client, HTML, HTTP, Web Server, Web App.; JDBC access; File/registry access; Data Store (NonStop SQL/MX); Mining Mart (SAS Enterprise Miner, File/SAS server, SAS Open Metadata server); Model export/registration]


starting the model deployment tool


connecting to a Data Store


a list of models in the Data Store


viewing a deployed model


selecting a SAS report directory


viewing available reports


viewing an Enterprise Miner report


deploying a model


deployment confirmation


real-time scoring using the Recommender

[diagram labels: Interaction Manager; Aggregation Engine, Scoring Engine, Rules Engine; Customer Data; Aggregate Definitions; Model Aggregates; Deployed Models; Model Scores; Business Rules; Offers / Advice]
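One plausible reading of the real-time scoring components named above is a three-stage flow: aggregates are computed from customer data, a deployed model turns them into a score, and business rules turn the score into advice. The sketch below illustrates that reading only. The data flow, the logistic model form, the weights, the thresholds, and the function names are all assumptions made for illustration; none of them are taken from the Recommender or from the exported SAS model code.

```python
import math


def aggregate(purchases):
    """Aggregation Engine stand-in: summarize a customer's recent purchases."""
    amounts = [p["amount"] for p in purchases]
    return {"count": len(amounts), "max_amount": max(amounts, default=0.0)}


def score(model, aggregates):
    """Scoring Engine stand-in: apply a logistic model to the aggregates."""
    z = model["intercept"] + sum(
        weight * aggregates[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def advise(rules, fraud_score):
    """Rules Engine stand-in: the first rule whose threshold is met wins."""
    for threshold, advice in rules:
        if fraud_score >= threshold:
            return advice
    return "approve"


# Hypothetical deployed model and business rules (illustrative values only).
model = {"intercept": -4.0, "weights": {"count": 0.5, "max_amount": 0.01}}
rules = [(0.9, "decline"), (0.5, "refer")]

aggregates = aggregate([{"amount": 500.0}, {"amount": 20.0}])
fraud_score = score(model, aggregates)
advice = advise(rules, fraud_score)
```

Separating the stages this way means a new model or a new rule set can be deployed as data without changing the code that handles each transaction.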


how to get the data mining tools

• Product Names
– Genus Mining Integrator for NonStop SQL (Data Preparation, Data Transfer, and Model Deployment tools)
– Genus Mart Builder for NonStop SQL (first two tools only)
• Can be ordered through HP, support provided by Genus
• Availability: calendar Q4 2002
• For more information, contact
– [email protected] (Product Manager)
– [email protected] (Program Manager)

[email protected] (Development)