© 2002 page 1
Data Mining Tools For ZLE
Copying and Use Restrictions:
Material under this presentation is the Intellectual Property of HP Corporation and Genus Software. Any use of this material, in part or whole, except in the context of Genus Data Mining Integrator and Data Mart Builder, without written permission from HP and Genus is prohibited.
page 2© 2002
agenda
•data mining in ZLE solutions
•ZLE data mining toolkit
•toolkit demonstration
© 2002 page 3
Meta Group
• process of identifying and/or extracting previously unknown, non-trivial, unanticipated, important information from large sets of data
Gartner Group
• process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies, statistical and mathematical techniques
© 2002 page 4
• role
– determine most effective responses to business events
• ZLE facilitates mining by providing
– a rich, integrated, current data source
– an integrated operational environment into which models can be deployed
• data mining helps to realize the full business value of a ZLE system
page 5© 2002
ZLE data mining process
• understand the opportunity
– identify and define business opportunity
• prepare data
– profile and understand data
– derive attributes
– transform data
– create case set
• build models
– train models
– assess model performance
• use models
– deploy model
– monitor model performance
[diagram: the steps above drawn as a cycle; one annotation reads "typically about 75% of process"]
page 6© 2002
agenda
•data mining in ZLE solutions
•ZLE data mining toolkit
•toolkit demonstration
page 7© 2002
the ZLE data mining toolkit
• goal:
– provide tools that facilitate ZLE data mining
– reduce process cycle times dramatically
• three tools being developed by Genus Software:
– data preparation
– data transfer
– model deployment
• partners: Genus, MicroStrategy, SAS
• product names:
– Genus Mining Integrator for NonStop SQL (all three tools)
– Genus Mart Builder for NonStop SQL (first two tools only)
page 8© 2002
ZLE data mining analytical cycle
• Data Store (NonStop SQL)
• Data Preparation (profiling/transforming data), part of Genus toolkit
• Data Transfer (fast parallel streams), part of Genus toolkit
• Mining Mart (Tru64/Windows)
• Modeling (SAS Enterprise Miner), available from SAS
• Model Deployment (written to DB tables), part of Genus toolkit
• Real-Time Scoring (using the Recommender), part of ZDK 3
– Scoring Engine, Rules Engine, Aggregation Engine, Interaction Manager
page 9© 2002
agenda
•data mining in ZLE solutions
•ZLE data mining toolkit
•toolkit demonstration
page 10© 2002
toolkit demonstration
• credit card fraud detection example
• opportunity: use ZLE data store data to predict, in real time, which credit card purchases are likely to be fraudulent
• use tools to:
– build a case set table with one row describing each purchase
– transfer table to SAS server for modeling
– deploy predictive model to ZLE data store
– execute model in real time to make fraud predictions
• steps described, including many tool screen shots
© 2002 page 11
toolkit data preparation solution
• based on the MicroStrategy (MSI) Business Intelligence toolset; leverages its GUI, logical data model support, SQL generation, etc.
• uses the NonStop SQL/MX DBMS; leverages sampling, TRANSPOSE, statistical functions, …
• custom tool developed by Genus using the MSI SDK for NonStop SQL operations and functionality not supported by MSI tools
page 12© 2002
two main ZLE data preparation tasks
1. profile tables
– column names and types
– partitioning information, attributes, key structure, …
– column values
2. transform source tables
– derive new attributes
– aggregate to appropriate level
– clean data
– pivot
– combine to form case set
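The profiling task above can be illustrated with a small sketch. This is not the Genus or MicroStrategy tooling: it uses Python's built-in `sqlite3` as a stand-in for NonStop SQL/MX, and the `Cust` table, its columns, and the `profile_table` helper are hypothetical names chosen for illustration. It reports, per column, the declared type, whether the column is part of the key, and a few value statistics.

```python
import sqlite3


def profile_table(conn, table):
    """Return per-column declared type, key flag, and simple value statistics."""
    profile = {}
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    for _cid, name, col_type, _notnull, _default, pk in columns:
        distinct, nulls, lo, hi = conn.execute(
            f'SELECT COUNT(DISTINCT "{name}"), SUM("{name}" IS NULL), '
            f'MIN("{name}"), MAX("{name}") FROM "{table}"'
        ).fetchone()
        profile[name] = {
            "type": col_type,
            "key": bool(pk),
            "distinct": distinct,
            "nulls": nulls,
            "min": lo,
            "max": hi,
        }
    return profile


# Hypothetical demo data; the real source tables live in the ZLE data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cust (CardNo INTEGER PRIMARY KEY, State TEXT, CrLim INTEGER)")
conn.executemany(
    "INSERT INTO Cust VALUES (?, ?, ?)",
    [(1, "CA", 1000), (2, "CA", 4600), (3, "TX", None), (4, "NY", 1700)],
)
for column, stats in profile_table(conn, "Cust").items():
    print(column, stats)
```

Partitioning information, which the slide also lists, has no SQLite equivalent and is left out of the sketch.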
page 13© 2002
the MicroStrategy desktop
page 14© 2002
MSI profile report: fraud vs. billing state
page 15© 2002
NonStop SQL/MX sampling
• source table sampling
    insert into CustSamp
    select * from Cust sample random 1 percent clusters of 10 blocks
    union
    select * from Cust
    where CardNo in (select CardNo from FrdFlg)
• enables interactive and exploratory data prep
• cleanly integrated into SQL
• performed efficiently in DP2
• easily accessible through Genus tool
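The idea behind the sampling statement (a small random sample of customers, unioned with every customer that has a fraud flag so the rare cases are not lost) can be tried out in a rough sketch. The SAMPLE clause is specific to NonStop SQL/MX, so this version substitutes SQLite's `random()` through Python's `sqlite3`; it samples individual rows, not clusters of blocks, and the table layouts and the `build_sample` helper are assumptions made for the example.

```python
import sqlite3

# Row-level stand-in for "sample random N percent": ((random() % 100) + 100) % 100
# is uniform over 0..99, so comparing it with :percent keeps about that share of rows.
SAMPLE_SQL = """
INSERT INTO CustSamp
SELECT * FROM Cust WHERE ((random() % 100) + 100) % 100 < :percent
UNION
SELECT * FROM Cust WHERE CardNo IN (SELECT CardNo FROM FrdFlg)
"""


def build_sample(conn, percent=1):
    """Rebuild the materialized sample table and return its row count."""
    conn.execute("DELETE FROM CustSamp")
    conn.execute(SAMPLE_SQL, {"percent": percent})
    return conn.execute("SELECT COUNT(*) FROM CustSamp").fetchone()[0]


# Hypothetical demo schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cust (CardNo INTEGER PRIMARY KEY, State TEXT);
CREATE TABLE CustSamp (CardNo INTEGER PRIMARY KEY, State TEXT);
CREATE TABLE FrdFlg (CardNo INTEGER);
""")
conn.executemany("INSERT INTO Cust VALUES (?, ?)", [(n, "CA") for n in range(1, 10001)])
conn.executemany("INSERT INTO FrdFlg VALUES (?)", [(7,), (4242,), (9999,)])

sample_size = build_sample(conn, percent=1)
print("rows in CustSamp:", sample_size)
```

As in the slide's statement, UNION removes duplicates, so a flagged customer that also lands in the random sample appears only once.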
page 16© 2002
creating a materialized sample table using the Genus Data Mart Builder
page 17© 2002
identifying source and sample method
page 18© 2002
specifying materialized sample table
page 19© 2002
transforming source data
[figure: source tables, with sample rows, are aggregated and pivoted into per-purchase attributes]
• Purchase (billions of purchases): PurchDt, Amt, Store, Acct
• Store: Size, Age, CS
• Account (millions of accounts): CR, CrLim, Ten
• Purchase History: P1, S1, A1, P3, S3, A3
• Item Summary: Min, Max, Elec, Vid, Jewl
• Fraud: Frd?
• operation: Aggregate and Pivot
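A minimal sketch of the aggregate-and-pivot step, assuming a simplified schema: the `Purchase`, `Item`, and `Fraud` layouts below and the `PurchId` key are hypothetical, and SQLite (via Python's `sqlite3`) stands in for NonStop SQL/MX, so conditional aggregation is used in place of TRANSPOSE. Item rows are collapsed to one row per purchase with Min, Max, Elec, Vid, and Jewl columns, and the fraud flag is attached.

```python
import sqlite3

CASE_SET_SQL = """
SELECT p.PurchId,
       p.Amt,
       MIN(i.Price)             AS MinPrice,
       MAX(i.Price)             AS MaxPrice,
       MAX(i.Category = 'Elec') AS Elec,   -- 1 if any item is electronics
       MAX(i.Category = 'Vid')  AS Vid,
       MAX(i.Category = 'Jewl') AS Jewl,
       p.PurchId IN (SELECT PurchId FROM Fraud) AS Frd
FROM Purchase p
JOIN Item i ON i.PurchId = p.PurchId
GROUP BY p.PurchId, p.Amt
ORDER BY p.PurchId
"""


def build_case_set(conn):
    """Return one tuple per purchase: item-level rows aggregated and pivoted."""
    return conn.execute(CASE_SET_SQL).fetchall()


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Purchase (PurchId INTEGER PRIMARY KEY, Amt REAL);
CREATE TABLE Item (PurchId INTEGER, Category TEXT, Price REAL);
CREATE TABLE Fraud (PurchId INTEGER);
INSERT INTO Purchase VALUES (1, 21.0), (2, 51.0);
INSERT INTO Item VALUES (1, 'Vid', 4.0), (1, 'Vid', 9.0), (1, 'Elec', 8.0),
                        (2, 'Jewl', 42.0), (2, 'Vid', 9.0);
INSERT INTO Fraud VALUES (2);
""")

for row in build_case_set(conn):
    print(row)
```

The same pattern extends to the store, account, and purchase-history attributes: each is aggregated to the purchase level and joined on so that the result keeps one row per purchase.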
page 20© 2002
result: a case set for modeling
[figure: the case set table with sample rows; one row per purchase, hundreds of attributes, a mix of fraud and no-fraud purchases]
case set columns shown:
• purchase: PurchDt, Amt, Store, Acct (first sample row: 102302 11:02:44, $4.50, 423, 8849940044)
• store: Size, Age, CS
• account: CR, CrLim, Ten
• purchase history: P1, S1, A1, P3, S3, A3
• item summary: Min, Max, Elec, Vid, Jewl
• fraud flag: Frd?
page 21© 2002
MSI Datamart report summarizing items
page 22© 2002
data transfer tool
• task: transfer case set from data store to mining mart
• design [diagram]
– Data Store: NonStop SQL/MX, coordinator
– Mining Mart: coordinator, ASCII files, SAS data set
– control path: Web browser client, HTML, HTTP, Web server, Web App., JDBC
– four parallel streams, each labeled transfer, receive, SAS import
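The parallel-stream design can be sketched in a few lines. This is only an illustration of the partition-and-write idea, not the Genus tool: rows are assumed to be already fetched into memory, the stream count, file names, and the `transfer` and `write_stream` helpers are hypothetical, and the SAS import step that follows each stream in the diagram is not modeled.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_stream(rows, path):
    """One stream: write its partition of the case set to an ASCII file."""
    with open(path, "w", newline="", encoding="ascii") as f:
        csv.writer(f).writerows(rows)
    return len(rows)


def transfer(rows, out_dir, streams=4):
    """Coordinator: split rows round-robin and run the streams in parallel."""
    out_dir = Path(out_dir)
    partitions = [rows[i::streams] for i in range(streams)]
    paths = [out_dir / f"caseset_{i}.txt" for i in range(streams)]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        counts = list(pool.map(write_stream, partitions, paths))
    return paths, counts
```

A call such as `transfer(rows, "/tmp/mart", streams=4)` would return the four file paths and the number of rows written by each stream, which a coordinator could compare with the source row count before starting the import.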
page 23© 2002
data transfer specification screen
page 24© 2002
transfer monitoring
page 25© 2002
modeling in SAS enterprise miner
© 2002 page 26
model export
• score converter node generates Java model code
• reporter node exports code and HTML report to project directory
page 27© 2002
model deployment tool
• task
– copy model information to a ZLE Data Store
• design [diagram]
– Mining Mart: SAS Enterprise Miner, SAS Open Metadata server, File/SAS server
– model export/registration
– Web Server and Web App., with file/registry access and JDBC access
– Data Store: NonStop SQL/MX
– Web browser client, HTML, HTTP
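As a rough illustration of the deployment task (copying exported model information into tables in the data store), the sketch below reads an exported code file and stores it under a model name. Everything specific here is an assumption: the `DeployedModels` table layout, the `deploy_model` and `list_models` helpers, and the use of SQLite through Python's `sqlite3` in place of JDBC access to NonStop SQL/MX.

```python
import sqlite3
from pathlib import Path


def deploy_model(conn, name, code_path):
    """Copy an exported model's code into a table in the data store."""
    code = Path(code_path).read_text(encoding="utf-8")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS DeployedModels "
        "(Name TEXT PRIMARY KEY, Code TEXT NOT NULL)"
    )
    # Redeploying under the same name replaces the earlier version.
    conn.execute("INSERT OR REPLACE INTO DeployedModels VALUES (?, ?)", (name, code))
    conn.commit()


def list_models(conn):
    """Return the names of the models currently in the data store."""
    return [row[0] for row in conn.execute("SELECT Name FROM DeployedModels ORDER BY Name")]
```

Keeping the model in a table means the scoring side can pick up a new version with an ordinary query, without a separate file distribution step.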
page 28© 2002
starting the model deployment tool
page 29© 2002
connecting to a Data Store
page 30© 2002
a list of models in the Data Store
page 31© 2002
viewing a deployed model
page 32© 2002
selecting a SAS report directory
page 33© 2002
viewing available reports
page 34© 2002
viewing an Enterprise Miner report
page 35© 2002
deploying a model
page 36© 2002
deployment confirmation
page 37© 2002
real-time scoring using the Recommender
[diagram]
• components: Interaction Manager, Aggregation Engine, Scoring Engine, Rules Engine
• data: Customer Data, Aggregate Definitions, Model Aggregates, Deployed Models, Model Scores, Business Rules, Offers / Advice
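One plausible reading of the diagram is a three-stage flow: aggregates are computed from customer data, a deployed model turns them into a score, and business rules turn the score into advice. The sketch below follows that reading, which is an assumption and not a description of the Recommender itself; the logistic model form, the weights, the threshold, and all function names are hypothetical.

```python
import math


def aggregate(purchases):
    """Aggregation step: summarize a customer's recent purchases."""
    amounts = [p["amt"] for p in purchases]
    return {"count": len(amounts), "total": sum(amounts), "max": max(amounts, default=0.0)}


def score(weights, bias, aggregates):
    """Scoring step: a logistic model over the aggregates, returning a value in (0, 1)."""
    z = bias + sum(w * aggregates[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def advise(fraud_score, threshold=0.5):
    """Rules step: map the model score to an action."""
    return "refer" if fraud_score >= threshold else "approve"


# Made-up model parameters and purchases, for illustration only.
weights, bias = {"count": 0.1, "max": 0.01}, -2.0
recent = [{"amt": 10.0}, {"amt": 300.0}]
print(advise(score(weights, bias, aggregate(recent))))
```

Separating the three steps mirrors the diagram's separate engines: aggregate definitions, deployed models, and business rules can each change without touching the other two.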
page 38© 2002
how to get the data mining tools
•Product Names
– Genus Mining Integrator for NonStop SQL (Data Preparation, Data Transfer, and Model Deployment tools)
– Genus Mart Builder for NonStop SQL (first two tools only)
•Can be ordered through HP, support provided by Genus
•Availability: calendar Q4 2002
•For more information, contact
– [email protected] (Product Manager)
– [email protected] (Program Manager)
– [email protected] (Development)