
CREDIT CARD DUPLICATION AND CRIME PREVENTION USING MACHINE LEARNING

1DIVYA V, 2HEMAMALINIMURALI, 3INDU PRIYADARSINI T, 4ARTHIJAYABHARATHI

1Assistant Professor, CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India

2Student, CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India

[email protected]

ABSTRACT

Credit cards are a popular cashless payment mode accepted both offline and online. They make payments and other transactions easy and convenient. Credit card fraud, however, is growing along with the development of technology, and financial fraud in general is increasing sharply as global communication improves. The losses caused by these fraudulent acts are reported to run to billions of dollars every year. Fraudulent activities are carried out so skilfully that they closely resemble genuine transactions, so simple pattern-based techniques and other less complex methods are not adequate. An efficient fraud detection method has therefore become a necessity for all banks in order to minimize disruption and maintain order. Techniques such as machine learning, genetic programming, fuzzy logic and sequence alignment are used for detecting fraudulent credit card transactions. Along with these techniques, the KNN algorithm and outlier detection methods are implemented to find the best solution to the fraud detection problem. These approaches are reported to minimize false alarm rates and increase the fraud detection rate. Any of these methods can be implemented in a bank's credit card fraud detection system to detect and prevent fraudulent transactions.

KEYWORDS : Credit card fraud, KNN, Machine learning, Genetic Programming, fuzzy logic, sequence alignment

INTRODUCTION

In day-to-day use, credit cards support the purchase of products and services through online transactions or card swipes. This has led to an increase in online transactions using credit and debit cards, moving towards a world of effortless spending. Fraud involving credit cards has caused severe damage to users and service providers, and is expected to become even worse in the coming days. Fraudsters observe and adapt to the quick

International Journal of Pure and Applied MathematicsVolume 119 No. 16 2018, 4375-4387ISSN: 1314-3395 (on-line version)url: http://www.acadpubl.eu/hub/Special Issue http://www.acadpubl.eu/hub/


changes in technology and find clever ways to engage in illegal activities. The frauds committed by these skilled attackers are highly damaging. A well-informed fraudster can create several identities and conduct credit card transactions without being caught. In e-commerce, the major problem is that fraudulent transactions look so similar to legitimate ones. Hence an efficient and sophisticated fraud detection system is a must to prevent these fraudulent activities. The challenging part of the problem is detecting fraud in a huge dataset in which legitimate transactions dominate and fraudulent transactions are a tiny, almost negligible fraction.

Few credit card fraud detection methods have been published, because such methods cannot be tested without a dataset. It is therefore difficult to demonstrate the robustness, or even the likely success rate, of a method. Because credit card information is confidential, banks and service providers are also reluctant to share this data for experiments. KNN is widely used in this setting to rank and compare the results obtained. This paper investigates credit card fraud using the KNN algorithm and an outlier detection method. Outlier detection is an important research area that forms part of many application domains. Specific application domains call for specific detection techniques, while more generic techniques can be applied in a large number of scenarios with good results. This survey tries to provide a structured and comprehensive overview of the research on nearest-neighbor-based outlier detection, listing the techniques applicable to our area of research.

LITERATURE REVIEW

Because of the dramatic increase in fraud, which results in large financial losses worldwide each year, modern fraud detection techniques are continually being developed and applied in many business fields. Fraud detection involves monitoring the activities of populations of users in order to estimate, perceive or avoid undesirable behavior [1]. Undesirable behavior is a broad term that includes delinquency, fraud, intrusion and account defaulting. The work in [1] presents a survey of current techniques used in credit card fraud detection and telecommunication fraud, with the goal of providing a comprehensive review of different techniques to detect fraud.

Financial fraud is increasing significantly with the development of modern technology and the global superhighways of communication, resulting in the loss of billions of dollars worldwide each year. Companies and financial institutions lose huge amounts to fraud, and fraudsters continuously try to find new rules and tactics to commit illegal actions. Thus, fraud detection systems have become essential for all credit card issuing banks to minimize their losses. The most commonly used fraud detection methods are neural networks, rule-induction techniques, fuzzy systems, decision trees, Support Vector Machines (SVM), Artificial Immune Systems (AIS), genetic algorithms and K-Nearest Neighbor algorithms [7]. These techniques can be used alone or in combination, using ensemble or meta-learning techniques to build classifiers. The cited survey presents the various techniques used in credit card fraud detection and evaluates each


methodology based on certain design criteria [2]. This survey makes it possible to build a hybrid approach and develop effective algorithms that perform well on the classification problem with variable misclassification costs and achieve higher accuracy.

The credit card has become the most popular mode of payment for both online and regular purchases, and cases of fraud associated with it are also rising. Credit card fraud is increasing day by day despite the various techniques developed for its detection. Fraudsters are so skilled that they devise new ways of committing fraudulent transactions each day, which demands constant innovation in detection techniques. Most detection techniques are based on artificial intelligence, fuzzy logic, neural networks, logistic regression, naïve Bayes, machine learning, sequence alignment, decision trees, Bayesian networks, meta-learning, genetic programming and so on, and have evolved to detect various kinds of fraudulent credit card transactions [3]. A survey of the techniques used in various credit card fraud detection mechanisms is presented in [12].

One study develops a method that improves a credit card fraud detection solution currently used in a bank. With this solution each transaction is scored, and based on these scores transactions are classified as fraudulent or legitimate. In fraud detection solutions the typical objective is to minimize the number of wrongly classified transactions [4]. In reality, however, misclassifying different transactions does not have the same effect: if a card is in the hands of fraudsters, its whole available limit is used up. Thus, the misclassification cost should be taken as the available limit of the card, and this is the quantity the study aims to minimize. As a solution method, it proposes a novel combination of two well-known meta-heuristic approaches, namely genetic algorithms and scatter search. The method is applied to real data and very successful results are obtained compared to current practice [6].
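To make the cost argument concrete, the following minimal sketch compares a plain error count with a limit-weighted cost for two hypothetical classifiers. The transactions, limits and predictions are invented for illustration and are not taken from [6]; false alarms are assumed to cost nothing here, which a real system would not assume.

```python
# Hypothetical transactions: (is_fraud, available_limit). Values are invented.
TRANSACTIONS = [(True, 5000.0), (True, 200.0), (False, 1000.0), (False, 3000.0)]

def error_count(predictions, transactions=TRANSACTIONS):
    """Number of transactions whose predicted label differs from the truth."""
    return sum(pred != is_fraud
               for pred, (is_fraud, _) in zip(predictions, transactions))

def limit_weighted_cost(predictions, transactions=TRANSACTIONS):
    """Sum of available limits over frauds that were predicted legitimate."""
    return sum(limit
               for pred, (is_fraud, limit) in zip(predictions, transactions)
               if is_fraud and not pred)

classifier_a = [False, True, False, False]   # misses the high-limit fraud
classifier_b = [True, False, False, False]   # misses the low-limit fraud

print(error_count(classifier_a), error_count(classifier_b))
print(limit_weighted_cost(classifier_a), limit_weighted_cost(classifier_b))
```

Both classifiers make one error, but the limit-weighted cost separates them, which is the reason for preferring it as the objective.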

The increased use of credit cards for online as well as regular purchases has led to more credit card fraud. In electronic payment systems, fraudulent transactions are rising steadily. Modern techniques based on data mining, genetic programming and similar methods have been used to detect fraudulent transactions. One approach searches for an optimal solution to the problem and generates the results implicitly using a genetic algorithm [5]. The aim is to develop a method of generating test data and to detect fraudulent transactions with this algorithm. A genetic algorithm is an optimization and evolutionary search technique based on the principles of genetics and natural selection, and is a heuristic used to solve computational problems of high complexity. The work in [3] describes a credit card fraud detection mechanism and examines its results based on the principles of this algorithm. The benefit of detecting fraud is clear for both credit card companies and their clients. If fraudulent transactions are not prevented from being cleared, the company must bear the financial cost of those transactions. Detecting them reduces this cost, which would otherwise be passed on through higher interest rates and charges.

PROPOSED METHODOLOGY

In the existing credit card fraud detection business processing system, a fraudulent transaction is detected only after the transaction is completed. It is difficult to identify fraudulent transactions, and the resulting losses



will be borne by the issuing authorities. The KNN algorithm and outlier detection methods are implemented to find the best solution to the fraud detection problem. These approaches are reported to minimize false alarm rates and increase the fraud detection rate. Any of these methods can be implemented in a bank's credit card fraud detection system to detect and prevent fraudulent transactions. An outlier approach is very different from the traditional observation method: it detects unusual behavior of a system using a different mechanism.

Thus, no predictive model is required before classification. KNN achieves a high performance rate without prior assumptions about the distributions. KNN-based credit card fraud detection techniques depend on a distance or similarity measure between two data instances, which has to be chosen. The advantage of an unsupervised approach is that it does not require prior labeling of data or knowledge about fraudulent methods or transactions, so it need not be trained to discriminate between legal and illegal transactions. It simply models the normal behavior pattern and flags any deviation from it as unusual or fraudulent activity.
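As an illustration of nearest-neighbor-based outlier detection, the sketch below scores each transaction by its mean Euclidean distance to its k nearest neighbours and flags the highest score. The feature vectors, the choice of Euclidean distance and the value of k are assumptions made for this example, not settings taken from the paper.

```python
import math

def knn_outlier_scores(points, k=3):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical, already-scaled features per transaction: (amount, hour of day).
transactions = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 7.5)]

scores = knn_outlier_scores(transactions, k=2)
suspect = max(range(len(scores)), key=scores.__getitem__)
print(suspect, scores)
```

No labels are used: the last transaction is flagged only because it lies far from the cluster of ordinary behavior.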

SYSTEM IMPLEMENTATION

A credit card is a convenient substitute for cash and a convenient method of payment. It carries a preapproved credit amount that can be used for purchasing goods and services; payment for the purchases is collected later with agreed charges. Credit limits vary according to an individual's perceived creditworthiness and represent the maximum amount loaned; creditworthiness is an individual's ability and willingness to pay money back. Credit card fraud occurs when an individual uses someone else's credit card information to charge purchases, or removes funds from the account for personal reasons, without the owner's authorization.

Biometric cryptosystems combine cryptography and biometrics to provide security in the cloud computing environment. Biometrics is the technology of measuring and analyzing biological data of the human body, extracting a feature set from the acquired data, and comparing this set against the template set in the database. The proposed system contains the following modules: preprocessing, feature extraction, feature level fusion, key generation, fusion of the encryption key with face level features, and decryption.

The biometric traits include fingerprint, face, iris, voice, hand geometry, palm print and more. The proposed approach combines fingerprint and iris features, as well as encryption keys and face features, for cryptographic key generation. Integrating biometrics with cryptography removes the need to memorize or carry lengthy passwords or keys. The steps involved in the proposed approach based on multimodal biometrics for cryptographic key generation are as follows:

1. Normalization using histogram equalization

2. Feature extraction from fingerprint



3. Feature extraction from iris

4. Feature level fusion of fingerprint and iris features

5. Cryptographic key generation from fused features

6. Double encryption for cloud data security

7. Feature extraction from face

8. Fusion level of encryption message and face features

NORMALIZATION USING HISTOGRAM EQUALIZATION

Histogram normalization is one of the most commonly used preprocessing methods. The most commonly used histogram normalization technique is histogram equalization, which attempts to change the image histogram into one that is constant for all brightness values. This corresponds to a brightness distribution in which all values are equally probable. For fingerprint, iris and face images, the extracted features can then be combined easily using a fusion technique. For an image with $L$ grey levels, the probability of occurrence of grey level $r_k$ is

$p(r_k) = \frac{n_k}{N}, \quad k = 0, 1, \ldots, L-1$ (1)

where $n_k$ is the number of pixels with grey level $r_k$ and $N$ is the total number of pixels in the image. The transformation to a new intensity value is defined by:

$s_k = T(r_k) = \sum_{j=0}^{k} p(r_j) = \sum_{j=0}^{k} \frac{n_j}{N}$ (2)

Equation (2) defines a mapping of the pixels' intensity values from their original range (0-255) to the domain [0,1]. Thus, to obtain pixel values in the original domain, e.g., the 8-bit interval, the values have to be rescaled. The rescaled image is the output of the preprocessing step using the histogram equalization method [1] [2].
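A minimal NumPy sketch of the preprocessing step described by equations (1) and (2) follows. It assumes an 8-bit greyscale image and rescales the [0,1] mapping back to 0-255; the function name and the toy image are illustrative only.

```python
import numpy as np

def equalize_histogram(image, levels=256):
    """Histogram-equalize an 8-bit greyscale image (assumed dtype uint8)."""
    image = np.asarray(image, dtype=np.uint8)
    # Number of pixels at each grey level, then the cumulative
    # probability, which maps every grey level into [0, 1].
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / image.size
    # Rescale from [0, 1] back to the 8-bit interval.
    return np.round(cdf[image] * (levels - 1)).astype(np.uint8)

# Toy 2x2 image used only to exercise the function.
toy = np.array([[0, 0], [128, 255]], dtype=np.uint8)
equalized = equalize_histogram(toy)
print(equalized)
```

The mapping is monotonic, so the ordering of grey levels is preserved while the brightest level is stretched to 255.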

FEATURE EXTRACTION FROM FINGERPRINT

The system uses Gabor filters to extract image features. A Gabor filter is a linear filter whose impulse response is defined by a harmonic function multiplied by a Gaussian function. A set of Gabor filters with different frequencies and orientations is helpful for extracting useful features from an image [3], and Gabor filters are a popular tool for extracting the most relevant features from a fingerprint image. Each point is characterized by its local Gabor filter responses. A 2-D Gabor filter is obtained by modulating a 2-D sine wave (at a particular frequency and orientation) with a Gaussian envelope. The 2-D Gabor filter kernel is defined by

$g(x, y, \theta_k, \lambda) = \exp\left(-\frac{1}{2}\left[\frac{x_{\theta_k}^2}{\sigma_x^2} + \frac{y_{\theta_k}^2}{\sigma_y^2}\right]\right) \exp\left(i\,\frac{2\pi x_{\theta_k}}{\lambda}\right)$, with $x_{\theta_k} = x\cos\theta_k + y\sin\theta_k$ and $y_{\theta_k} = -x\sin\theta_k + y\cos\theta_k$ (3)



where $\sigma_x$ and $\sigma_y$ are the standard deviations of the Gaussian envelope along the x and y dimensions, respectively, and $\lambda$ and $\theta_k$ are the wavelength and orientation, respectively. The spread of the Gaussian envelope is defined using the wavelength $\lambda$. Rotating the x-y plane by an angle $\theta_k$ results in a Gabor filter at orientation $\theta_k$, which is expressed by

$\theta_k = \frac{\pi}{n}(k - 1), \quad k = 1, 2, \ldots, n$ (4)

where $n$ denotes the number of orientations. The Gabor local feature at a point of an image can be viewed as the response of all the different Gabor filters located at that point. A filter response is obtained by convolving the filter kernel (with specific $\lambda$ and $\theta_k$) with the image. Here Gabor filters are used with 8 orientations and 4 scales/wavelengths, as pictured in Figure 1. For a sampling point $(x, y)$, the Gabor filter response, denoted $G(x, y, \theta_k, \lambda)$, is represented as:

$G(x, y, \theta_k, \lambda) = \sum_{p} \sum_{q} I(p, q)\, g(x - p, y - q, \theta_k, \lambda)$ (5)

where $I(x, y)$ denotes a greyscale image. Applying all Gabor filters, at multiple frequencies and orientations, at a specific point $(x, y)$ gives a set of filter responses for that point. This set is called a Gabor jet. A jet is defined as the set of complex coefficients obtained from one image point, and can be written as

$J_j = a_j \exp(i \phi_j)$ (6)

where $a_j$ is the magnitude and $\phi_j$ is the phase of the $j$-th Gabor coefficient. Gabor filters optimally capture both local orientation and frequency information from a fingerprint image. By tuning a Gabor filter to a specific frequency and direction, the local frequency and orientation information can be obtained.
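The filter bank and the Gabor jet of equations (3) to (6) can be sketched as follows. The kernel is the standard complex Gabor form; the kernel size, the four wavelengths and the choice of sigma as half the wavelength are assumptions for illustration, since the paper does not list its parameter values.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma_x, sigma_y):
    """Complex 2-D Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by theta.
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (x_t ** 2 / sigma_x ** 2 + y_t ** 2 / sigma_y ** 2))
    return envelope * np.exp(1j * 2.0 * np.pi * x_t / wavelength)

def gabor_jet(image, row, col, kernels):
    """Responses of every kernel at one interior point (convolution at a point)."""
    half = kernels[0].shape[0] // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    # Convolution at a single point is the sum of patch times flipped kernel.
    return np.array([np.sum(patch * k[::-1, ::-1]) for k in kernels])

N_ORIENT = 8                          # 8 orientations, as in the text
WAVELENGTHS = [4.0, 6.0, 8.0, 12.0]   # 4 scales; these values are assumed
kernels = [gabor_kernel(15, lam, np.pi * (k - 1) / N_ORIENT, lam / 2, lam / 2)
           for lam in WAVELENGTHS for k in range(1, N_ORIENT + 1)]

image = np.random.default_rng(0).random((41, 41))   # stand-in for a fingerprint
jet = gabor_jet(image, 20, 20, kernels)
magnitude, phase = np.abs(jet), np.angle(jet)
print(len(kernels), jet.shape)
```

Each of the 32 jet entries is one complex coefficient, whose magnitude and phase correspond to the two parts of equation (6).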

FEATURE EXTRACTION FROM IRIS

The iris is an important part of the eye; its pattern is unique to each individual and remains constant over a person's life. The circular black disk in the center of the eye is the pupil. The pupil contracts when exposed to light and dilates in the dark, so its size varies with the light it is exposed to. The iris is the annular ring between the sclera and the pupil boundary and contains a flowery pattern unique to each individual. This unique texture information is extracted from the rest of the eye image and transformed into a strip, so that a pattern matching algorithm can be applied between the database and query iris images. The following are the important steps involved in iris recognition:

1. Pupil detection

2. Iris detection

3. Normalization



4. Feature extraction

To remove the effect of illumination, the iris image is converted to grayscale. As the pupil is the largest black area in the intensity image, its edges can be detected easily from the binarized image obtained by applying a suitable threshold to the intensity image. The basic idea of this technique is to find curves that can overcome artifacts such as shadows and noise. The procedure first finds the intensity gradient at all locations in the given image by convolving with the Sobel filters. The gradient images $G_x$ and $G_y$ along the x and y directions are obtained with kernels that detect horizontal and vertical changes in the image. The Sobel filter kernels are

$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$ (7)

$S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$ (8)

The absolute values of the gradient images along the vertical and horizontal directions are combined to form an absolute gradient image using the equation

$G = |G_x| + |G_y|$ (9)

where $G_x$ is the convolution of the image with $S_x$ and $G_y$ is the convolution of the image with $S_y$. The absolute gradient image is used to find edges. The edge image is scanned for pixels (P) having a true value, and the center is determined with the help of the following equations

(10)

To locate the outer iris boundary, an intensity variation approach is employed. In this approach, concentric circles of different radii are drawn from the detected center. Among these circles, the one with the maximum change in intensity with respect to the previously drawn circle is chosen as the iris circle. The approach works well if there is a sharp variation between the iris boundary and the sclera. The radii of the iris and pupil boundaries are used to transform the annular portion, called the strip, into a rectangular block.
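The gradient computation of equations (7) to (9), which precedes the boundary search, can be sketched as follows. The kernels are the standard Sobel kernels; the toy image and the threshold of half the maximum gradient are assumptions for illustration.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)

def convolve2d_valid(image, kernel):
    """Plain 2-D convolution without padding (output shrinks by kernel size - 1)."""
    flipped = kernel[::-1, ::-1]
    kh, kw = flipped.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * image[i:i + out_h, j:j + out_w]
    return out

def absolute_gradient(image):
    """|Gx| + |Gy|, the absolute gradient image used to find edges."""
    image = np.asarray(image, dtype=float)
    gx = convolve2d_valid(image, SOBEL_X)
    gy = convolve2d_valid(image, SOBEL_Y)
    return np.abs(gx) + np.abs(gy)

# Toy "eye": a dark square (pupil stand-in) on a bright background.
eye = np.full((9, 9), 200.0)
eye[3:6, 3:6] = 20.0
grad = absolute_gradient(eye)
edges = grad > 0.5 * grad.max()   # threshold chosen arbitrarily for the sketch
print(edges.astype(int))
```

Flat regions, including the interior of the dark square, give a zero gradient, so only the boundary pixels survive the threshold.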

Irises from different people may be captured at different sizes and, even for irises from the same eye, the size may change due to illumination variations and other factors. Such elastic deformation of the iris texture affects the results of iris matching. To achieve more accurate recognition results, it is necessary to compensate for the iris deformation. Daugman [4], [5], [6] solved this problem by projecting the original iris from a Cartesian coordinate system into a doubly dimensionless pseudo-polar coordinate system.

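To transform the annular region of the iris into its polar equivalent, the following set of equations is used:

$I(x(r, \theta), y(r, \theta)) \rightarrow I(r, \theta)$

with

$x(r, \theta) = (1 - r)\,x_p(\theta) + r\,x_i(\theta)$

$y(r, \theta) = (1 - r)\,y_p(\theta) + r\,y_i(\theta)$ (11)

where $(x_p(\theta), y_p(\theta))$ and $(x_i(\theta), y_i(\theta))$ are the coordinates of the pupillary and limbic boundaries in the direction $\theta$, which lie at distances $r_p$ and $r_i$, the radii of the pupil and the iris respectively, from the detected center. The value of $\theta$ belongs to $[0, 2\pi]$ and $r$ belongs to $[0, 1]$.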
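A minimal sketch of the mapping in equation (11) is given below. It assumes the pupil and iris boundaries are concentric circles around the detected center and uses nearest-pixel sampling; the strip resolution and the synthetic test image are illustrative choices.

```python
import numpy as np

def unwrap_iris(image, center, r_pupil, r_iris, n_radial=8, n_angular=64):
    """Map the annular iris region to a rectangular (r, theta) strip."""
    cy, cx = center
    theta = np.linspace(0.0, 2.0 * np.pi, n_angular, endpoint=False)
    r = np.linspace(0.0, 1.0, n_radial)
    # Boundary points in direction theta (concentric circles assumed).
    x_p, y_p = cx + r_pupil * np.cos(theta), cy + r_pupil * np.sin(theta)
    x_i, y_i = cx + r_iris * np.cos(theta), cy + r_iris * np.sin(theta)
    # Linear interpolation between the pupillary and limbic boundaries.
    x = (1.0 - r)[:, None] * x_p[None, :] + r[:, None] * x_i[None, :]
    y = (1.0 - r)[:, None] * y_p[None, :] + r[:, None] * y_i[None, :]
    rows = np.clip(np.rint(y).astype(int), 0, image.shape[0] - 1)
    cols = np.clip(np.rint(x).astype(int), 0, image.shape[1] - 1)
    return image[rows, cols]

# Synthetic image whose pixel value equals the distance from the center,
# so each row of the strip should be roughly constant.
yy, xx = np.mgrid[0:41, 0:41]
radial_image = np.hypot(yy - 20, xx - 20)
strip = unwrap_iris(radial_image, (20, 20), r_pupil=5, r_iris=15)
print(strip.shape)
```

Because r is normalized to [0, 1], the strip has the same size whatever the pupil dilation, which is what compensates for the elastic deformation.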

Features are the attributes or values extracted to capture the unique characteristics of the image. Features from the iris image are extracted using Gabor filters. It has been shown that the functional form of Gabor filters conforms closely to the receptive profiles of simple cortical cells, and Gabor filtering is an effective scheme for image representation. A two-dimensional (2D) even Gabor filter can be represented by the following equation in the spatial domain:

$G(x, y; f, \theta) = \exp\left\{-\frac{1}{2}\left[\frac{x'^2}{\delta_x^2} + \frac{y'^2}{\delta_y^2}\right]\right\} \cos(2\pi f x')$, with $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$ (12)

where $f$ is the frequency of the sinusoidal plane wave along the direction $\theta$ from the x-axis, and $\delta_x$ and $\delta_y$ are the space constants of the Gaussian envelope along the x and y axes respectively. Convolution with Gabor filters is still the major contributor to the overall feature extraction time. Here the filter frequency $f$ is set to the average ridge frequency $1/k$, where $k$ is the average inter-ridge distance. The average inter-ridge distance is approximately 6 pixels in a 600 dpi iris image. If $f$ is too large, spurious ridges are created in the filtered image, whereas if $f$ is too small, nearby ridges are merged into one. Each sub image is filtered by each of these Gabor filters. This leads to a total of 1120 output images (8 for each sub image) from which the iris features are extracted. The feature vectors (FV) are passed to the matching module to allow comparisons.

FEATURE LEVEL FUSION OF FINGERPRINT AND IRIS FEATURES

Fusion at this level can be applied to features extracted from the same modality or from different modalities. Feature level fusion refers to combining different feature vectors that are obtained from multiple sensors for the same biometric trait or from multiple biometric traits. When the feature vectors are homogeneous, a single feature vector can be calculated with "or" operations. When the feature vectors are non-homogeneous, they can be concatenated to form a single vector [7] [8]. The multimodal authentication system is shown in Fig. 1, and the face features are stored in the given database.
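The two fusion rules can be sketched as follows. The example treats homogeneous vectors as equal-length binary codes combined with a bitwise OR and non-homogeneous vectors as plain lists that are concatenated; the vectors themselves are made-up values.

```python
def fuse_homogeneous(a, b):
    """Combine two equal-length binary feature vectors with an element-wise OR."""
    if len(a) != len(b):
        raise ValueError("homogeneous vectors must have the same length")
    return [x | y for x, y in zip(a, b)]

def fuse_heterogeneous(*vectors):
    """Concatenate feature vectors of different kinds into one vector."""
    fused = []
    for vector in vectors:
        fused.extend(vector)
    return fused

# Hypothetical feature vectors, for illustration only.
fingerprint_code_1 = [1, 0, 0, 1]
fingerprint_code_2 = [0, 0, 1, 1]
iris_features = [0.42, 0.13, 0.77]

same_trait = fuse_homogeneous(fingerprint_code_1, fingerprint_code_2)
multimodal = fuse_heterogeneous(same_trait, iris_features)
print(same_trait, multimodal)
```

In practice the concatenated vector usually needs normalization so that one modality does not dominate the other; that step is left out of this sketch.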



Fig.1 Multimodal Authentication System

[Figure blocks: training data of face, fingerprint and iris images; test data of face, fingerprint and iris images; template generation; normalization; feature extraction; matching using AFSA-SVM; database storage; decision (imposter/user); feature level fusion; key generation and encryption of cloud data; decryption and return of feature level fusion values]

MULTIMODAL AUTHENTICATION SYSTEM USING ENSEMBLE SVM WITH AFSA

In this study, feature selection is performed using the Artificial Fish Swarm Algorithm (AFSA) and the GA, with SVM acting as the classifier. A classification model that assigns data to the correct categories can be established using classifiers. Features are first extracted from the input images, each of which has the correct category label. Part of the extracted features serve as training data and the remainder as test data. The training images are taken as input to the SVM classifiers, where the classification model is established, and the test images are later used to verify this model and obtain accurate classifications. To solve the problems detailed earlier, the artificial fish swarm is used to give specific definitions for the integrated weight algorithm. In biometrics, identification is more computationally intensive and time consuming than identity verification. Hence, to improve performance with less execution time, a highly specialized classification-based biometric system should be used. Gaussian mixture models, neural networks and KNN classifiers are the model-based classifiers most commonly used for different biometrics. Statistical learning theory is able to capture both the variability and the similarity between patterns. The Support Vector Machine (SVM) is a powerful learning tool grounded in statistical learning theory and machine learning. SVM has shown superior results in various classification and pattern recognition problems. In several pattern classification applications, SVM gives better generalization performance than conventional techniques, especially for larger numbers of input variables. With this in mind, SVM is evaluated on the fused feature vector. Pseudocode for the biometric authentication process is given below.

The algorithmic process of the SVM integrated weight using the artificial fish swarm algorithm proposed in the present work is as follows:

(1) The population size of the artificial fish swarm N, the maximum number of iterations, the visible domain of an artificial fish VISUAL, the maximum step length of an artificial fish STEP, and the crowding factor are given as:

(13)

(14)

Where, , and refer to the p-th elements of the parameter vectors including and

of the artificial fish, and of the next state of artificial fish. The random numbers referred

as in . Similar definition is given for symbols in the following

equations.

(2) At the initial iteration, artificial fish individuals are generated randomly in the feasible domain of the control variables to form the initial swarm, producing N groups of weights w. Each weight is a random number in [0,1];

(3) The food concentration FC at the current position of each artificial fish in the initial swarm is calculated and compared. The individual fish with the maximum FC is recorded on the bulletin board;

(4) Every artificial fish simulates the following and swarming behaviors in turn. After each action, the movement with the higher FC is selected for execution. The default behavior is foraging;

(15)

(5) After every single action of an artificial fish, its FC is compared with the FC on the bulletin board. The bulletin board entry is replaced by the fish's state if the fish's FC is better;

(6) The termination conditions are checked: whether the mean square error of successive values obtained is less than the permissible error, or whether num has reached the preset maximum number of iterations. If either criterion is met, the calculated result (i.e. the FC on the bulletin board) is output.

(7) Otherwise, return to step 4.
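The loop structure of steps (1) to (7) can be sketched in a simplified form as follows. The move rules are a common textbook form of AFSA and are an assumption, not necessarily the paper's equations (13) to (15); the food concentration `fc` is a toy function standing in for the classifier accuracy obtained with a weight vector, and only the iteration cap is used as the termination test.

```python
import math
import random

def afsa_maximize(fc, dim, n_fish=20, max_iter=60, visual=0.3, step=0.1,
                  crowd=0.6, tries=5, seed=0):
    """Simplified artificial fish swarm search for weights in [0, 1]^dim."""
    rng = random.Random(seed)

    def clip(v):
        return min(1.0, max(0.0, v))

    def move_toward(x, target):
        d = math.dist(x, target)
        if d == 0.0:
            return list(x)
        return [clip(a + rng.random() * step * (b - a) / d)
                for a, b in zip(x, target)]

    def forage(x):                       # default behavior
        for _ in range(tries):
            cand = [clip(a + visual * rng.uniform(-1, 1)) for a in x]
            if fc(cand) > fc(x):
                return move_toward(x, cand)
        return [clip(a + step * rng.uniform(-1, 1)) for a in x]

    def neighbours(i, swarm):
        return [f for j, f in enumerate(swarm)
                if j != i and visual > math.dist(f, swarm[i])]

    def swarm_move(i, swarm):            # move toward the local centre
        near = neighbours(i, swarm)
        if near:
            centre = [sum(c) / len(near) for c in zip(*near)]
            if fc(centre) / len(near) > crowd * fc(swarm[i]):
                return move_toward(swarm[i], centre)
        return forage(swarm[i])

    def follow(i, swarm):                # move toward the best neighbour
        near = neighbours(i, swarm)
        if near:
            best = max(near, key=fc)
            if fc(best) / len(near) > crowd * fc(swarm[i]):
                return move_toward(swarm[i], best)
        return forage(swarm[i])

    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fish)]  # step 2
    board = list(max(swarm, key=fc))                                     # step 3
    for _ in range(max_iter):                                            # steps 4-7
        for i in range(n_fish):
            a, b = swarm_move(i, swarm), follow(i, swarm)
            swarm[i] = a if fc(a) >= fc(b) else b
            if fc(swarm[i]) > fc(board):                                 # step 5
                board = list(swarm[i])
    return board, fc(board)

# Toy food concentration: positive everywhere, peaks at the weights (0.3, 0.7).
def toy_fc(w):
    return 1.0 / (1.0 + sum((a - t) ** 2 for a, t in zip(w, (0.3, 0.7))))

best_w, best_fc = afsa_maximize(toy_fc, dim=2)
print(best_w, best_fc)
```

The bulletin board only ever changes to a strictly better state, so the returned FC never falls below the best FC of the initial swarm.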

The biometric authentication process is as follows:

Bio-aut-process(Person P) {
    // Enrollment: store the templates of P under P.ID
    chck.db ← P.ID
    if (chck.db) {
        for each sample i of P {
            Template[i] ← Feature.extract(P)
            db ← Template[i]
        }
    }
    // Verification: Q claims to be P
    if (Q claims as P) {
        Q.feat ← Feature.extract(Q)
        AFSASVM matcher ← Q.feat
        AFSASVM matcher ← Template[i]
        if (AFSASVM matcher(Q.feat == Template[i]))
            Q is P
    }
    // Identification: no claim is made, so every stored template is tried
    else {
        Q.tmp ← Feature.extract(Q)
        AFSASVM matcher ← Q.tmp
        for each stored template j {
            AFSASVM matcher ← Template[j]
            if (AFSASVM matcher(Q.tmp == Template[j]))
                Q has passed authentication
        }
    }
    return
}
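A runnable sketch of the same enrollment, verification and identification flow is given below. The class name, the cosine-similarity matcher and the threshold are hypothetical stand-ins: the paper's matcher is the AFSA-SVM classifier, which is not reproduced here.

```python
import math

class BiometricStore:
    """Template database with verification (1:1) and identification (1:N)."""

    def __init__(self, extract, threshold=0.95):
        self.extract = extract          # feature extractor, supplied by the caller
        self.threshold = threshold      # assumed similarity threshold
        self.templates = {}             # person id -> list of templates

    def enroll(self, person_id, samples):
        stored = self.templates.setdefault(person_id, [])
        stored.extend(self.extract(s) for s in samples)

    def _match(self, a, b):
        # Stand-in matcher: cosine similarity instead of AFSA-SVM.
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return norm > 0 and dot / norm >= self.threshold

    def verify(self, claimed_id, sample):
        """Q claims to be claimed_id: compare only with that person's templates."""
        feat = self.extract(sample)
        return any(self._match(feat, t)
                   for t in self.templates.get(claimed_id, []))

    def identify(self, sample):
        """No claim: search every stored template and return the first match."""
        feat = self.extract(sample)
        for person_id, stored in self.templates.items():
            if any(self._match(feat, t) for t in stored):
                return person_id
        return None

# Feature vectors are passed in directly, so the extractor is the identity.
store = BiometricStore(extract=lambda sample: sample)
store.enroll("P", [[1.0, 0.0, 0.2]])
store.enroll("R", [[0.0, 1.0, 0.1]])
query = [0.9, 0.05, 0.2]
print(store.verify("P", query), store.verify("R", query), store.identify(query))
```

Verification compares against one person's templates, while identification scans the whole database, which is why the text describes identification as the more time-consuming task.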

CONCLUSION AND FUTURE ENHANCEMENTS

As global networking gives fraudsters many opportunities, building an accurate and easy-to-operate credit card fraud detection system is one of the major tasks for banks. There are several ways to detect a fraudulent transaction. To improve the security of financial transaction systems in an automatic and effective way, building a precise and well-organized credit card fraud detection system is essential. By performing oversampling and extracting the principal direction of the data, the KNN method can be used to determine the anomaly of a target instance. Hence the KNN method is suitable for detecting fraud under limited memory. Meanwhile, the outlier detection mechanism helps to detect credit card fraud with lower memory and computation requirements. Outlier detection in particular works fast and well on large online datasets. Compared with power methods and other known anomaly detection methods, experimental results indicate that the KNN method is accurate and efficient. A possible direction for future work is to unify the assumptions made by different techniques regarding normal and outlier behavior into a statistical or machine learning framework.



REFERENCES

[1] Renu, Suman, "Analysis on Credit Card Fraud Detection Methods", Volume 8, Number 1, Feb 2014.

[2] Divya Iyer, Arti Mohanpurkar, Sneha Janardhan, Dhanashree Rathod, Amruta Sardeshmuk, "Credit Card Fraud Detection Using Hidden Markov Model", 978-1-4673-0126-8/11/$26.00 © 2011 IEEE.

[3] K. Rama Kalyani, D. Uma Devi, "Fraud Detection of Credit Card Payment System by Genetic Algorithm", Volume 3, Issue 7, July 2012.

[4] Abhinav Srivastava, Amlan Kundu, Shamik Sural, and Arun K. Majumdar, "Credit Card Fraud Detection Using Hidden Markov Model", Vol. 5, No. 1, January-March 2012.

[5] Venkata Ratnam Ganji, "Credit Card Fraud Detection Using Anti-k Nearest Neighbor Algorithm", International Journal on Computer Science and Engineering (IJCSE), Vol. 4, 06 June 2012, (1035-1039).

[6] Ekrem Duman, M. Hamdi Ozcelik, "Detecting Credit Card Fraud by Genetic Algorithm and Scatter Search", Elsevier, Expert Systems with Applications, (2011), 38, (13057–13063).

[7] A.J. Graaff, A.P. Engelbrecht, "The Artificial Immune System for Fraud Detection in the Telecommunications Environment", 20 November 2014.

[8] S. Benson Edwin Raj, A. Annie Portia, "Analysis on Credit Card Fraud Detection Methods", International Conference on Computer, Communication and Electrical Technology – ICCCET2011, 18th & 19th March, 2011.

[9] Y. Sahin and E. Duman, "Detecting Credit Card Fraud by Decision Trees and Support Vector Machines", International Multiconference of Engineers and Computer Scientists, March 2011.

[10] S. Benson Edwin Raj, A. Annie Portia, "Analysis on Credit Card Fraud Detection Methods", IEEE-International Conference on Computer, Communication and Electrical Technology, (2011), pg. 152-156.

[11] P. Jayant, Vaishali, D. Sharma, "Survey on Credit Card Fraud Detection Techniques", International Journal of Engineering Research & Technology (IJERT), Vol. 3 Issue 3, March 2014, pg. 1545-1551.

[12] Priya Ravindra Shimpi, Prof. Vijayalaxmi Kadroli, "Survey on Credit Card Fraud Detection Techniques", International Journal Of Engineering And Computer Science, Volume 4 Issue 11, Nov 2015, Page No. 15010-15015.

