Mining Financial Data
Histograms & Contingency Tables

Shishir Gupta
Under the guidance of Dr. Mirsad Hadzikadic

In memory of Dr. Jan Zytkow
SEP 09 1944 - JAN 16 2001

Posted on 19-Dec-2015

Agenda
• Database
• Task goals
• Tool & technique used
• Data preparation and cleaning
• Attribute selection
• Data transformation
• Data Mining/Pattern Evaluation
• Knowledge presentation
• Pros/Cons
• Questions & Demonstration

Database
• Financial dataset from PKDD 1999
• Financial dataset from a Czech bank
• Relational dataset
• 8 relations
  – ACCOUNT
  – LOAN
  – DEMOGRAPH
  – ORDER
  – TRANSACTION
  – CARD
  – DISPOSITION
  – CLIENT

Task Goal
• Identify good clients in order to offer them additional services
• Identify bad clients in order to monitor them closely and minimize bank losses
• Services to offer:
  – Loan
  – Credit card

Technique Used - Histogram

SQL statement used:

SELECT age, COUNT(age)
FROM table_x
GROUP BY age
ORDER BY age
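A minimal sketch of the histogram step, assuming Python's built-in sqlite3 module as a stand-in for the RDBMS the tool ran on (the slides do not name it). The table_x and age names come from the slide; the demo rows and the text-bar rendering are hypothetical.

```python
import sqlite3

# Same GROUP BY query as on the slide: one (value, frequency) row per distinct age.
HISTOGRAM_SQL = """
    SELECT age, COUNT(age)
    FROM table_x
    GROUP BY age
    ORDER BY age
"""


def age_histogram(conn):
    """Return a list of (age, count) tuples ordered by age."""
    return conn.execute(HISTOGRAM_SQL).fetchall()


def render(rows):
    """Render each (value, count) pair as a text bar with one '#' per row."""
    return "\n".join(f"{value:>4} | {'#' * count} ({count})" for value, count in rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical demo table; only the age column matters for the histogram.
    conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.executemany(
        "INSERT INTO table_x (age) VALUES (?)",
        [(25,), (31,), (25,), (47,), (31,), (25,)],
    )
    print(render(age_histogram(conn)))
```

Note that COUNT(age) skips NULLs, so a NULL group would report 0; COUNT(*) would count missing values instead.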

Technique Used – C-Tables

SQL statement used:

SELECT sex, COUNT(sex), age
FROM table_x a, table_y b
WHERE a.id = b.fid
GROUP BY sex, age
ORDER BY sex, age
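The query returns one row per (sex, age) cell; arranging those rows into a two-way table is left to the tool. A sketch of that pivot, assuming sqlite3 and hypothetical demo tables in which table_x holds sex and table_y holds age keyed by fid:

```python
import sqlite3
from collections import defaultdict

# The slide's query: one row per (sex, age) cell of the contingency table.
CTABLE_SQL = """
    SELECT sex, COUNT(sex), age
    FROM table_x a, table_y b
    WHERE a.id = b.fid
    GROUP BY sex, age
    ORDER BY sex, age
"""


def pivot(rows):
    """Turn (sex, count, age) rows into (age_columns, {sex: [count per age]})."""
    ages = sorted({age for _, _, age in rows})
    cells = defaultdict(dict)
    for sex, count, age in rows:
        cells[sex][age] = count
    # Combinations that never occur are absent from the query result; show them as 0.
    table = {sex: [by_age.get(age, 0) for age in ages] for sex, by_age in sorted(cells.items())}
    return ages, table


def render(ages, table):
    """Format the cross-tabulation with one row per sex and one column per age."""
    lines = ["sex " + "".join(f"{age:>6}" for age in ages)]
    for sex, counts in table.items():
        lines.append(f"{sex:<4}" + "".join(f"{c:>6}" for c in counts))
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical demo tables: table_x holds sex, table_y holds age keyed by fid.
    conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, sex TEXT)")
    conn.execute("CREATE TABLE table_y (fid INTEGER, age INTEGER)")
    conn.executemany("INSERT INTO table_x VALUES (?, ?)", [(1, "F"), (2, "M"), (3, "F")])
    conn.executemany("INSERT INTO table_y VALUES (?, ?)", [(1, 30), (2, 30), (3, 40)])
    ages, table = pivot(conn.execute(CTABLE_SQL).fetchall())
    print(render(ages, table))
```

Filling absent cells with 0 matters because GROUP BY only returns combinations that actually occur in the joined data.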

Technique Used – Correlation

SQL statement used:

SELECT x, y
FROM table_x a, table_y b
WHERE a.id = b.fid
ORDER BY x, y
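The query only pairs up the two attributes; the coefficient itself has to be computed from the returned rows. The slides do not say which correlation function the tool applied, so this sketch assumes Pearson's r, computed directly from its definition; the demo tables and values are hypothetical.

```python
import math
import sqlite3

# The slide's query: paired values of two continuous attributes.
PAIRS_SQL = """
    SELECT x, y
    FROM table_x a, table_y b
    WHERE a.id = b.fid
    ORDER BY x, y
"""


def pearson(pairs):
    """Pearson correlation coefficient r for a sequence of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one attribute is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical demo tables with one continuous attribute each.
    conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, x REAL)")
    conn.execute("CREATE TABLE table_y (fid INTEGER, y REAL)")
    conn.executemany("INSERT INTO table_x VALUES (?, ?)", [(1, 1.0), (2, 2.0), (3, 3.0)])
    conn.executemany("INSERT INTO table_y VALUES (?, ?)", [(1, 2.5), (2, 3.9), (3, 6.1)])
    print(round(pearson(conn.execute(PAIRS_SQL).fetchall()), 3))
```

The ORDER BY clause does not affect r; it only makes the returned pairs easier to inspect.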

Tool - Architecture

Tool - Description

Data Cleaning
• Missing values
  – Relation DEMOGRAPHIC
• Incorrect values
  – Relation TRANSACTION
(Data reduced by 10% after cleaning)
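A sketch of the cleaning step, under stated assumptions: rows are dicts, a field counts as missing when it is None or an empty string, and the "incorrect value" rule is a hypothetical check that amount parses as a non-negative number (the slides do not say what made a TRANSACTION value incorrect). It also reports the percentage of rows removed, the kind of figure quoted above. The helper names and demo rows are hypothetical.

```python
def has_missing(row):
    """True if any field of the row (a dict) is None or an empty string."""
    return any(value is None or value == "" for value in row.values())


def valid_amount(row):
    """Hypothetical validity rule: 'amount' must parse as a non-negative number."""
    try:
        return float(row["amount"]) >= 0
    except (KeyError, TypeError, ValueError):
        return False


def clean(rows, is_valid=lambda row: True):
    """Drop rows with missing or invalid values; return (kept_rows, percent_removed)."""
    kept = [row for row in rows if not has_missing(row) and is_valid(row)]
    removed = len(rows) - len(kept)
    percent = 100.0 * removed / len(rows) if rows else 0.0
    return kept, percent


if __name__ == "__main__":
    # Made-up rows: one missing amount and one non-numeric amount.
    rows = [
        {"trans_id": 1, "amount": "1500"},
        {"trans_id": 2, "amount": None},
        {"trans_id": 3, "amount": "n/a"},
        {"trans_id": 4, "amount": "230.5"},
        {"trans_id": 5, "amount": "90"},
    ]
    kept, percent = clean(rows, valid_amount)
    print(len(kept), f"{percent:.1f}% removed")
```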

Data Preparation
• Relation CLIENT
  – Separating SEX and BDATE from BIRTHNUMBER
• All date fields converted to AGE
  – Reference date: 199901
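The slides do not show this conversion, but in the PKDD'99 CLIENT relation the birth number encodes both attributes: YYMMDD for men and YYMM+50DD for women. The sketch below assumes that encoding, assumes all years fall in the 1900s, and reads "199901" as the reference date 1 January 1999; parse_yymmdd, split_birth_number and age_at are hypothetical helper names.

```python
from datetime import date

# Assumption: the reference "199901" is read as 1 January 1999.
REFERENCE_DATE = date(1999, 1, 1)


def parse_yymmdd(value):
    """Parse a YYMMDD number (e.g. 930101) as a date, assuming the 1900s."""
    text = f"{int(value):06d}"
    return date(1900 + int(text[:2]), int(text[2:4]), int(text[4:]))


def split_birth_number(birth_number):
    """Return (sex, birth_date) from a birth number in YYMMDD / YYMM+50DD form."""
    text = f"{int(birth_number):06d}"
    yy, mm, dd = int(text[:2]), int(text[2:4]), int(text[4:])
    # Assumed PKDD'99 encoding: women have 50 added to the month.
    sex = "F" if mm > 50 else "M"
    if sex == "F":
        mm -= 50
    return sex, date(1900 + yy, mm, dd)


def age_at(born, reference=REFERENCE_DATE):
    """Completed years between born and the reference date."""
    before_birthday = (reference.month, reference.day) < (born.month, born.day)
    return reference.year - born.year - before_birthday


if __name__ == "__main__":
    sex, born = split_birth_number(706213)
    print(sex, born, age_at(born))
```

For example, birth number 706213 decodes to a woman born 13 December 1970, who is 28 on the assumed reference date. Other date fields can go through parse_yymmdd and age_at in the same way.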

Data Preparation (cont.)
• Creating table definitions
• Converting the data into a table-compatible format
• Loading the data into the database
• Evaluating loading errors and changing attribute definitions accordingly
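One way to make "evaluating loading errors" concrete is to insert rows one at a time and collect the failures. This sketch assumes sqlite3, semicolon-separated input with a header line, and a hypothetical loan table whose CHECK constraint rejects non-numeric amounts; it is not the table definition used in the project. Table and column names are interpolated into the SQL, so it is only suitable for trusted input.

```python
import csv
import io
import sqlite3

# Hypothetical table definition. SQLite applies column affinity first, so text such
# as "80952" is stored as a number; values that cannot be converted stay text and
# the CHECK constraint rejects them, which surfaces a loading error.
LOAN_DDL = """
    CREATE TABLE loan (
        loan_id INTEGER,
        amount  REAL CHECK (typeof(amount) IN ('integer', 'real')),
        status  TEXT NOT NULL
    )
"""


def load(conn, table, text, delimiter=";"):
    """Insert delimited rows (first line = header); return (loaded, errors)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    loaded, errors = 0, []
    for line_no, row in enumerate(reader, start=2):
        try:
            conn.execute(sql, row)
            loaded += 1
        except sqlite3.Error as exc:
            # Keep the offending line so the attribute definition can be revised.
            errors.append((line_no, row, str(exc)))
    return loaded, errors


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(LOAN_DDL)
    sample = "loan_id;amount;status\n1;80952;A\n2;abc;B\n3;30276.5;C\n"
    loaded, errors = load(conn, "loan", sample)
    print(loaded, "rows loaded")
    for line_no, row, message in errors:
        print(f"line {line_no}: {row} -> {message}")
```

Each failed line is reported with its number, so the column type or constraint can be adjusted and the file reloaded.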

Attribute Selection
• Decision relation
  – LOAN
• Decision attribute
  – STATUS
• Classification attributes
  – All other attributes that do not belong to the LOAN relation

[Figure: example decision tree with tests A4?, A6? and A1? on Y/N branches, leading to leaves Class1 and Class2]

Data Transformation
• Discretization: continuous attributes binned into 4 to 10 buckets
• Only transactions performed in 1997 were considered for relation TRANSACTION
  – Due to resource limitations
  – The largest number of loans was approved during this period
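The slides do not say how bucket boundaries were chosen. This sketch assumes equal-width binning, where k intervals of equal width (k between 4 and 10 on the slide) span the attribute's range; equal-frequency binning would be an equally plausible reading. The function name and demo amounts are hypothetical.

```python
def discretize(values, k):
    """Equal-width binning: map each value to a bucket 0..k-1; also return the k+1 edges."""
    if k < 1:
        raise ValueError("k must be positive")
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant attribute fits in a single bucket.
        return [0] * len(values), [lo, hi]
    width = (hi - lo) / k
    # min(..., k - 1) puts the maximum value into the last bucket instead of bucket k.
    buckets = [min(int((v - lo) / width), k - 1) for v in values]
    edges = [lo + i * width for i in range(k + 1)]
    return buckets, edges


if __name__ == "__main__":
    amounts = [1200, 5400, 800, 15000, 9700, 3300]  # made-up loan amounts
    buckets, edges = discretize(amounts, 4)
    print(buckets)
    print([round(e) for e in edges])
```

Equal-width buckets are easy to read in a histogram, but a skewed attribute can leave most values in one bucket, which is one reason to try several values of k.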


Data Mining/Pattern Evaluation
• Run histograms on all non-key attributes to study their distributions.
• Discretize continuous attributes.
• Run contingency tables to study the relationship between two attributes.
• If both attributes are continuous, check significance with the correlation function.
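A sketch of the first step, running a histogram query over every non-key attribute of one relation. It assumes sqlite3 and reads primary-key columns from PRAGMA table_info; foreign keys are not flagged there, so they are passed in through a hypothetical exclude argument. Unlike the histogram query shown earlier, it uses COUNT(*), so a NULL group shows how many values are missing. The demo table is hypothetical.

```python
import sqlite3


def non_key_columns(conn, table, exclude=()):
    """Column names of `table` that are not in its primary key and not excluded."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [name for _, name, _, _, _, pk in info if pk == 0 and name not in exclude]


def all_histograms(conn, table, exclude=()):
    """Return {column: [(value, count), ...]} for every non-key column of `table`."""
    result = {}
    for column in non_key_columns(conn, table, exclude):
        # COUNT(*) so that a NULL group reports how many values are missing.
        sql = (f'SELECT "{column}", COUNT(*) FROM {table} '
               f'GROUP BY "{column}" ORDER BY "{column}"')
        result[column] = conn.execute(sql).fetchall()
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE client (client_id INTEGER PRIMARY KEY, "
                 "district_id INTEGER, sex TEXT, age INTEGER)")
    conn.executemany("INSERT INTO client (district_id, sex, age) VALUES (?, ?, ?)",
                     [(1, "F", 30), (1, "M", 30), (2, "F", 40)])
    for column, rows in all_histograms(conn, "client", exclude={"district_id"}).items():
        print(column, rows)
```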

Knowledge Presentation - 1
• All loans on accounts where a second person is allowed to dispose of the account are GOOD LOANS (100%)
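A figure such as "100%" is a row percentage of a contingency table that crosses an account attribute with loan STATUS. The sketch below computes such percentages, assuming the PKDD'99 STATUS codes, in which A (contract finished, no problems) and C (running contract, OK so far) are good loans and B and D are bad; the category labels and sample pairs are hypothetical.

```python
from collections import Counter

# Assumption based on the PKDD'99 loan STATUS codes: A (finished, no problems) and
# C (running, OK so far) count as good; B (finished, not paid) and D (running,
# client in debt) count as bad.
GOOD_STATUSES = {"A", "C"}


def good_loan_rate(records):
    """records: (category, status) pairs. Return {category: percent of good loans}."""
    totals, good = Counter(), Counter()
    for category, status in records:
        totals[category] += 1
        if status in GOOD_STATUSES:
            good[category] += 1
    return {category: 100.0 * good[category] / totals[category] for category in totals}


if __name__ == "__main__":
    # Made-up (disposition, status) pairs, one per loan.
    sample = [("second person", "A"), ("second person", "C"),
              ("owner only", "B"), ("owner only", "A")]
    print(good_loan_rate(sample))
```

A 100% row only becomes convincing when the row total is large enough, so the counts behind each percentage are worth reporting alongside it.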

Knowledge Presentation - 2
• Permanent orders of type household & leasing indicate financial stability

Knowledge Presentation - 3
• Accounts with cash withdrawals are more likely to repay their loans

Knowledge Presentation - 4
• Accounts with low transaction amounts indicate good loans

Knowledge Presentation - 5
• Accounts that are in debt indicate BAD LOANS

Pros
• Flexibility to alter the data presentation to understand the nature of the data
• Customers with no background in data mining can appreciate the output because of its simplicity
• Since results can be stored in a file, subsequent analysis on a subset of the data becomes very easy

Cons
• Needs capability for multi-variable analysis.
• Some kind of quantification needs to be added.
• Performance issues when using an RDBMS.

Questions & Demonstration