CREDIT CARD DUPLICATION AND CRIME PREVENTION USING MACHINE
LEARNING
1DIVYA V, 2HEMAMALINIMURALI, 3INDU PRIYADARSINI T, 4ARTHIJAYABHARATHI
1Assistant Professor, CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India
2Student, CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India
3Student, CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India
4Student, CSE, Bannari Amman Institute of Technology, Sathyamangalam, Tamilnadu, India
ABSTRACT
The credit card is a popular payment mode, accepted both offline and online, that enables cashless transactions. It is
an easy and convenient way to make payments and other transactions. Credit card fraud is growing along with
the development of technology, and economic fraud in general is increasing sharply as global
communication improves. The losses recorded every year due to these fraudulent acts amount to billions of
dollars. Fraudulent activities are carried out so skilfully that they resemble genuine transactions. Hence simple
pattern-matching techniques and other less complex methods are not effective. An efficient method of fraud
detection has become a necessity for all banks in order to minimize disruption and maintain order. Techniques
such as machine learning, genetic programming, fuzzy logic and sequence alignment are used for detecting
fraudulent credit card transactions. Along with these techniques, the KNN algorithm and outlier detection methods are
implemented to optimize the solution to the fraud detection problem. These approaches are shown to minimize
the false alarm rate and increase the fraud detection rate. Any of these methods can be implemented in a bank's
credit card fraud detection system to detect and prevent fraudulent transactions.
KEYWORDS : Credit card fraud, KNN, Machine learning, Genetic Programming, fuzzy logic, sequence alignment
INTRODUCTION
In day-to-day use, credit cards support the purchase of products and services both
online and by swiping the card at a terminal. This has led to an increase in online
transactions using credit and debit cards, moving towards a world of effortless expenditure. Fraud
involving credit cards has caused severe damage to users and service
providers and is expected to get worse in the coming days. Fraudsters observe and adapt to the quick
International Journal of Pure and Applied Mathematics, Volume 119 No. 16 2018, 4375-4387, ISSN: 1314-3395 (on-line version), url: http://www.acadpubl.eu/hub/, Special Issue http://www.acadpubl.eu/hub/
changes in technology and find clever ways to carry out illegal activities. Fraud committed by
these skilled attackers is highly damaging. A well-educated fraudster can create several
identities and conduct credit card transactions without being caught. In e-commerce,
the major problem is that fraudulent transactions closely resemble
legitimate ones. Hence an efficient and sophisticated fraud detection system is a must to prevent
these fraudulent activities. The challenging part of the problem is detecting fraud in a huge
dataset in which legitimate transactions dominate and fraudulent transactions are a negligible
fraction.
There are very few published ideas on credit card fraud detection methods, because these
methods cannot be tested without a dataset. It is therefore difficult to demonstrate the robustness, or even
the success rate, of a method. Because credit card information is
confidential, banks and service providers are reluctant to share such data for
experiments. KNN is widely used to obtain and rank the final results in
this line of research. This work investigates credit card fraud using the KNN algorithm and an outlier
detection method. Outlier detection is an important research area that forms part of many
application domains. Specific application domains call for specific detection techniques, while
more generic techniques can be applied in a large number of scenarios with good results. This survey
tries to provide a structured and comprehensive overview of the research on nearest-neighbor-based
outlier detection, listing various techniques applicable to our area of research.
LITERATURE REVIEW
Due to the dramatic increase in fraud, which causes large financial losses worldwide each
year, several modern fraud detection techniques are continually being developed and applied in many
business fields. Fraud detection involves monitoring the activities of populations of users in
order to estimate, perceive or avoid undesirable behavior [1]. Undesirable behavior is a broad
term that includes delinquency, fraud, intrusion and account defaulting. The work in [1] presents a survey of
current techniques used in credit card fraud detection and telecommunication fraud. Its goal is
to provide a comprehensive review of different techniques to detect fraud [1].
Financial fraud is increasing significantly with the development of modern technology
and the global superhighways of communication, resulting in the loss of billions of dollars
worldwide each year. Companies and financial institutions lose huge amounts to fraud,
and fraudsters continuously try to find new rules and tactics to commit illegal actions. Thus,
fraud detection systems have become essential for all credit card issuing banks to minimize their
losses. The most commonly used fraud detection methods are neural networks, rule-induction
techniques, fuzzy systems, decision trees, Support Vector Machines (SVM), Artificial Immune
Systems (AIS), genetic algorithms and K-Nearest Neighbor algorithms [7]. These techniques can be
used alone or in combination, using ensemble or meta-learning techniques, to build classifiers. The work in [2]
presents a survey of various techniques used in credit card fraud detection and evaluates each
methodology based on certain design criteria [2]. This survey makes it possible to build a hybrid
approach for developing effective algorithms that perform well on classification
problems with variable misclassification costs and achieve higher accuracy.
The credit card has become the most popular mode of payment for both online and
regular purchases, and cases of fraud associated with it are also rising. Credit card fraud is
increasing day by day despite the various techniques developed for its detection. Fraudsters are
so skilled that they devise new ways of committing fraudulent transactions each day, which
demands constant innovation in detection techniques. Most of the techniques that have evolved for
detecting fraudulent credit card transactions are based on Artificial Intelligence, Fuzzy Logic,
Neural Networks, Logistic Regression, Naïve Bayes, Machine Learning, Sequence Alignment,
Decision Trees, Bayesian Networks, meta-learning, Genetic Programming, etc. [3]. A survey of the
techniques used in various credit card fraud detection mechanisms is presented in [12].
The work in [6] develops a method that improves a credit card fraud detection solution currently
in use at a bank. With this solution, each transaction is scored, and based on these scores
transactions are classified as fraudulent or legitimate. In fraud detection solutions the typical
objective is to minimize the number of wrongly classified transactions [4]. In reality, however,
misclassifying different transactions does not have the same effect: if a card is in the hands
of fraudsters, its whole available limit is used up. Thus, the misclassification cost should be taken
as the available limit of the card, and this is the quantity the study aims to minimize. As the
solution method, it suggests a novel combination of two well-known meta-heuristic
approaches, namely genetic algorithms and scatter search. The method is applied to real
data, and very successful results are obtained compared to current practice [6].
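The idea of weighting errors by the card's available limit can be illustrated with a small sketch. The field names (`is_fraud`, `available_limit`), the fixed `false_alarm_cost` and the sample data are hypothetical and are not taken from [6]; the point is only that two classifiers with the same number of errors can carry very different costs.

```python
def misclassification_cost(transactions, predictions, false_alarm_cost=1.0):
    """Total cost of a set of predictions.

    A missed fraud costs the card's available limit (as argued in the text);
    a false alarm costs a fixed handling fee, which is an assumption here.
    """
    total = 0.0
    for tx, predicted_fraud in zip(transactions, predictions):
        if tx["is_fraud"] and not predicted_fraud:
            total += tx["available_limit"]
        elif not tx["is_fraud"] and predicted_fraud:
            total += false_alarm_cost
    return total


# Hypothetical data: two fraudulent cards with very different limits.
transactions = [
    {"is_fraud": True, "available_limit": 5000.0},
    {"is_fraud": True, "available_limit": 200.0},
    {"is_fraud": False, "available_limit": 1000.0},
]
model_a = [False, True, False]  # misses the high-limit card
model_b = [True, False, False]  # misses the low-limit card

cost_a = misclassification_cost(transactions, model_a)
cost_b = misclassification_cost(transactions, model_b)
```

Both models make exactly one error, yet their costs differ by a factor of 25, which is why a count of misclassified transactions alone can be a misleading objective.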
With the increasing use of credit cards for online as well as regular purchases,
credit card fraud is also increasing. In electronic payment systems, fraudulent transactions are rising
on a regular basis. Modern techniques based on data mining, genetic programming,
etc. have been used to detect fraudulent transactions. A genetic algorithm searches for an optimal solution to
the problem and generates the results implicitly [5]. The aim is to develop a
method of generating test data and to detect fraudulent transactions with this algorithm. The genetic
algorithm is an optimization technique and evolutionary search heuristic, based on the principles of genetics
and natural selection, that is used to solve computational problems of high complexity. The work in [3]
presents a credit card fraud detection mechanism and examines the results based on the
principles of this algorithm [3]. The benefit of detecting fraud is clear for both credit card
companies and their clients. If fraudulent transactions are not prevented from being cleared, the
company must bear the financial cost of those transactions. Detecting fraud therefore reduces the costs
passed on through higher interest rates and charges.
PROPOSED METHODOLOGY
In the existing credit card fraud detection business process, a fraudulent transaction
is detected only after the transaction is completed. It is difficult to identify the fraud, and the resulting losses
are borne by the issuing authorities. The KNN algorithm and outlier detection methods are
implemented to optimize the solution to the fraud detection problem. These approaches are
shown to minimize the false alarm rate and increase the fraud detection rate. Any of these
methods can be implemented in a bank's credit card fraud detection system to detect and prevent
fraudulent transactions. An outlier approach is very different from the traditional observation
method: it detects unusual behavior of a system using a different
mechanism.
Thus, no predictive model is required before classification.
KNN achieves a high performance rate without prior assumptions about the distributions.
KNN-based credit card fraud detection techniques need a distance or similarity measure between
two data instances to be estimated. An advantage of the
unsupervised approach is that it does not require prior labeling of data or knowledge about fraudulent
methods or transactions, so it need not be trained to discriminate between legal and illegal
transactions. It simply models the normal behavior pattern and treats deviations from it as unusual or fraudulent activity.
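A minimal sketch of distance-based KNN outlier scoring is given below. It is an illustration under stated assumptions, not the system evaluated in this paper: the two features per transaction, the choice of k, the Euclidean distance and the "three times the median score" threshold are all hypothetical.

```python
import math


def knn_outlier_scores(points, k=3):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(points, k=3, factor=3.0):
    """Flag points whose score exceeds `factor` times the median score (assumed rule)."""
    scores = knn_outlier_scores(points, k)
    median = sorted(scores)[len(scores) // 2]
    return [s > factor * median for s in scores]


# Hypothetical transactions: (amount in hundreds, time of day as a fraction of 24 h).
transactions = [(1.0, 0.50), (1.2, 0.55), (0.9, 0.45),
                (1.1, 0.52), (1.3, 0.48), (9.5, 0.10)]
flags = flag_outliers(transactions, k=3)
```

No labels are used anywhere: the last transaction is flagged only because it lies far from every other instance, which is the unsupervised behavior described above.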
SYSTEM IMPLEMENTATION
A credit card is a convenient substitute for cash and a convenient method of payment.
It provides a preapproved credit amount that can be used to purchase goods and services; payment for
the purchase is collected later with agreed charges. Credit limits vary based
on the individual's perceived creditworthiness, and the limit is the maximum amount loaned;
creditworthiness is an individual's ability and willingness to pay money back. Credit card fraud
occurs when an individual uses someone else's credit card information to charge purchases, or
to remove funds from the account for personal reasons, without the owner's authorization.
Biometric cryptosystems combine cryptography and biometrics to provide security in the
cloud computing environment. Biometrics is the technology of measuring and analyzing
biological data of the human body, extracting a feature set from the acquired data, and comparing this
set against the template set in the database. The proposed system contains modules such as
preprocessing, feature extraction, feature-level fusion, key generation, fusion of the encryption key and
face-level features, and decryption.
The various biometric traits comprise fingerprint, face, iris, voice, hand geometry, palm
print and more. The proposed approach combines fingerprint and iris, as well as encryption keys
and face features, for cryptographic key generation. Integrating biometrics with cryptography
avoids the need to memorize or carry lengthy passwords or keys. The steps
involved in the proposed approach based on multimodal biometrics for cryptographic key
generation are as follows:
1. Normalization using histogram equalization
2. Feature extraction from fingerprint
3. Feature extraction from iris
4. Feature level fusion of fingerprint and iris features
5. Cryptographic key generation from fused features
6. Double encryption for cloud data security
7. Feature extraction from face
8. Fusion level of encryption message and face features
NORMALIZATION USING HISTOGRAM EQUALIZATION
Histogram normalization is one of the most commonly used methods for preprocessing. The
most commonly used histogram normalization technique is histogram equalization, in which one
attempts to change the image histogram into a histogram that is constant for all brightness values.
This corresponds to a brightness distribution in which all values are equally probable. For
fingerprint, iris and face images, the extracted features can then be combined easily using the fusion
technique.
$$p(r_k) = \frac{n_k}{N}, \quad k = 0, 1, \dots, 255 \qquad (1)$$
where $r_k$ is the $k$-th grey level, $n_k$ is the number of pixels with that grey level and $N$ is the total number of pixels in the image. The transformation
to a new intensity value is defined by:
$$s_k = T(r_k) = \sum_{j=0}^{k} p(r_j) \qquad (2)$$
Equation (2) defines a mapping of the pixels' intensity values from their original range (0-255) to
the domain [0,1]. Thus, to obtain pixel values in the original domain, e.g., the 8-bit interval, the
values have to be rescaled. The rescaled values $s_k$ form the output image after the preprocessing step using the
histogram equalization method [1] [2].
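The preprocessing step can be sketched as follows. This is a minimal illustration that assumes the textbook form of histogram equalization (normalized histogram, cumulative sum, rescaling to the 8-bit range); the function name and the tiny sample image are hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a greyscale image given as a list of rows of ints."""
    n_pixels = sum(len(row) for row in image)

    # Histogram: number of pixels at each grey level.
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1

    # Cumulative distribution: maps each grey level to a value in [0, 1].
    cdf, running = [], 0
    for n_k in counts:
        running += n_k
        cdf.append(running / n_pixels)

    # Rescale from [0, 1] back to the original 8-bit interval.
    return [[round(cdf[v] * (levels - 1)) for v in row] for row in image]


# Hypothetical 2 x 5 image using only three grey levels.
image = [[50, 50, 100, 100, 100],
         [100, 200, 200, 200, 200]]
equalized = equalize_histogram(image)
```

The three grey levels, originally bunched between 50 and 200, are spread over the full 0-255 range in proportion to their cumulative frequency.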
FEATURE EXTRACTION FROM FINGERPRINT
To extract image features, the system uses Gabor filters. A Gabor filter is a linear filter
whose impulse response is defined by a harmonic function multiplied by a Gaussian function. A
set of Gabor filters with different frequencies and orientations may be helpful for extracting useful
features from an image [3]. Gabor filters are a popular tool for extracting the more
relevant features from the fingerprint image. Each point is characterized by local Gabor filter
responses. A 2-D Gabor filter is obtained by modulating a 2-D sine wave (at particular
frequencies and orientations) with a Gaussian envelope. The 2-D Gabor filter kernel is defined by
$$g(x, y; \lambda, \theta_k) = \exp\left(-\frac{1}{2}\left[\frac{x_{\theta_k}^2}{\sigma_x^2} + \frac{y_{\theta_k}^2}{\sigma_y^2}\right]\right) \exp\left(i\,\frac{2\pi x_{\theta_k}}{\lambda}\right), \quad x_{\theta_k} = x\cos\theta_k + y\sin\theta_k, \quad y_{\theta_k} = -x\sin\theta_k + y\cos\theta_k \qquad (3)$$
where $\sigma_x$ and $\sigma_y$ are the standard deviations of the Gaussian envelope along the x- and
y-dimensions, respectively, and $\lambda$ and $\theta_k$ are the wavelength and orientation, respectively. The spread
of the Gaussian envelope is defined using the wavelength $\lambda$. The rotation of the $xy$-plane by an
angle $\theta_k$ results in a Gabor filter at orientation $\theta_k$. $\theta_k$ is expressed by
$$\theta_k = \frac{\pi}{n}(k - 1), \quad k = 1, \dots, n \qquad (4)$$
where $n$ denotes the number of orientations. The Gabor local feature at a point of an image
can be viewed as the response of all the different Gabor filters located at that point. A filter response
is obtained by convolving the filter kernel (with specific $\lambda$, $\theta_k$) with the image. Here, Gabor
filters with 8 orientations and 4 scales/wavelengths are used, as pictured in figure 1. For a sampling point
$(x, y)$, the Gabor filter response, denoted $G(x, y; \lambda, \theta_k)$, is represented as:
$$G(x, y; \lambda, \theta_k) = I(x, y) * g(x, y; \lambda, \theta_k) \qquad (5)$$
where $I(x, y)$ denotes a greyscale image. Applying all Gabor filters at multiple
frequencies and orientations at a specific point $(x, y)$ gives a set of filter responses
for that point, which is called a Gabor jet. A jet is defined as the set of complex
coefficients obtained from one image point, and can be written as
$$J_j = a_j \exp(i\phi_j) \qquad (6)$$
where $a_j$ is the magnitude and $\phi_j$ is the phase of the Gabor features/coefficients. Gabor filters optimally
capture both local orientation and frequency information from a fingerprint image. By tuning a
Gabor filter to a specific frequency and direction, the local frequency and orientation information
can be obtained.
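A Gabor jet at one pixel can be sketched as below. The complex kernel follows the commonly used Gaussian-envelope-times-complex-sinusoid form with 8 orientations and 4 wavelengths, as stated in the text; the kernel size, the wavelength values, the choice `sigma = 0.5 * wavelength`, the zero padding and the striped test image are assumptions made only for this illustration.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, sigma_x, sigma_y):
    """Complex 2-D Gabor kernel as a size x size list of rows (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinate system by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
            row.append(envelope * cmath.exp(2j * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def gabor_bank(size=9, n_orientations=8, wavelengths=(4, 6, 8, 10)):
    """Bank of kernels: orientations pi*(k-1)/n for each wavelength."""
    bank = []
    for lam in wavelengths:
        sigma = 0.5 * lam  # assumed relation between spread and wavelength
        for k in range(1, n_orientations + 1):
            theta = math.pi * (k - 1) / n_orientations
            bank.append(gabor_kernel(size, lam, theta, sigma, sigma))
    return bank


def gabor_jet(image, px, py, bank):
    """(magnitude, phase) of every filter response at pixel (px, py)."""
    jet = []
    for kernel in bank:
        half = len(kernel) // 2
        total = 0j
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                y, x = py + dy, px + dx
                if 0 <= y < len(image) and 0 <= x < len(image[0]):
                    # Convolution: the kernel is flipped relative to the image.
                    total += image[y][x] * kernel[half - dy][half - dx]
        jet.append((abs(total), cmath.phase(total)))
    return jet


# Hypothetical test image: vertical stripes with a period of 4 pixels.
image = [[1 if x % 4 < 2 else 0 for x in range(16)] for _ in range(16)]
bank = gabor_bank()
jet = gabor_jet(image, 8, 8, bank)
```

For the striped image, the filter tuned to wavelength 4 and orientation 0 (`jet[0]`) responds far more strongly than the filter with the same wavelength at orientation pi/2 (`jet[4]`), which is the orientation selectivity the text relies on.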
FEATURE EXTRACTION FROM IRIS
The iris is an important part of the eye that is unique to each individual and remains constant
over a person's life. The circular black disk in the center of the eyeball is known as the pupil.
The pupil contracts when exposed to light and dilates in the dark, and hence the size of the pupil varies with
the light to which it is exposed. The iris is the annular ring between the sclera and the pupil boundary
and contains a flowery pattern unique to each individual. This texture
information is extracted from the rest of the eye image and transformed
into a strip so that a pattern matching algorithm can be applied between the database and query images of the iris. The
following are the important steps involved in iris recognition:
1. Pupil detection
2. Iris detection
3. Normalization
4. Feature extraction
To remove the effect of illumination, the iris image is converted to grayscale. As the pupil is the
largest black area in the intensity image, its edges can be detected easily from the binarized image
by applying a suitable threshold to the intensity image. The basic idea of this technique is to find
curves that can overcome artifacts such as shadows and noise. The procedure first finds the
intensity image gradient at all locations in the given image by convolving it with the Sobel
filters. The gradient images $G_x$ and $G_y$ along the x and y directions are obtained with
kernels that detect horizontal and vertical changes in the image. The Sobel filter kernels are
$$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \qquad (7)$$
$$S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \qquad (8)$$
The absolute values of the gradient images along the vertical and horizontal directions are added to
form an absolute gradient image using the equation
$$G_{abs}(I) = |G_x(I)| + |G_y(I)| \qquad (9)$$
where $G_x(I)$ is the convolution of image $I$ with $S_x$ and $G_y(I)$ is the convolution of
image $I$ with $S_y$. The absolute gradient image is used to find edges. The edge image is
scanned for pixels (P) having a true value, and the center is determined with the help of the following
equations
(10)
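The gradient and center-finding steps can be sketched as follows. The standard 3x3 Sobel kernels are assumed. Because the center equations are not legible in this copy, the sketch takes the centroid of the edge pixels as the center, which is an assumption and may differ from the authors' formula; the threshold and the synthetic image are also hypothetical.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(image, kernel, y, x):
    """Apply a 3x3 kernel at (y, x). The kernel is not flipped; for Sobel this
    only changes the sign of the result, which the absolute value removes."""
    return sum(kernel[i][j] * image[y + i - 1][x + j - 1]
               for i in range(3) for j in range(3))


def absolute_gradient(image):
    """|Gx| + |Gy| at every interior pixel; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    grad = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = filter3x3(image, SOBEL_X, y, x)
            gy = filter3x3(image, SOBEL_Y, y, x)
            grad[y][x] = abs(gx) + abs(gy)
    return grad


def edge_centroid(grad, threshold):
    """Centroid (x, y) of pixels whose gradient exceeds the threshold (assumed rule)."""
    pts = [(x, y) for y, row in enumerate(grad)
           for x, g in enumerate(row) if g > threshold]
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


# Hypothetical image: bright background with a dark 3x3 "pupil" centered at (5, 5).
image = [[255] * 11 for _ in range(11)]
for y in range(4, 7):
    for x in range(4, 7):
        image[y][x] = 0

grad = absolute_gradient(image)
center = edge_centroid(grad, threshold=0)
```

On this synthetic image the edge pixels form a symmetric ring around the dark square, so the estimated center coincides with the true center of the square.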
To demarcate the variation at the outer iris boundary, an intensity variation approach is
employed. In this approach, concentric circles of different radii are drawn from the detected
center. Among the many circles, the circle with the maximum change in intensity with respect to
the previously drawn circle is chosen as the iris circle. The approach works well if there is a sharp
variation between the iris boundary and the sclera. The radii of the iris and pupil boundaries help in
transforming the annular portion, called the strip, into a rectangular block.
Irises from different people may be captured at different sizes and, even for irises from the same
eye, the size may change due to illumination variations and other factors. Such elastic
deformation in iris texture will affect the results of iris matching. To achieve
more accurate recognition results, it is necessary to compensate for the iris deformation. Daugman
[4], [5], [6] solved this problem by projecting the original iris in a Cartesian coordinate system
into a doubly dimensionless pseudo-polar coordinate system.
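To transform the annular region of the iris into its polar equivalent, the following set of equations is
used:
$$I(x(r, \theta), y(r, \theta)) \rightarrow I(r, \theta)$$
with
$$\begin{aligned} x(r, \theta) &= (1 - r)\,x_p(\theta) + r\,x_i(\theta), & y(r, \theta) &= (1 - r)\,y_p(\theta) + r\,y_i(\theta), \\ x_p(\theta) &= x_{p0} + r_p\cos\theta, & y_p(\theta) &= y_{p0} + r_p\sin\theta, \\ x_i(\theta) &= x_{i0} + r_i\cos\theta, & y_i(\theta) &= y_{i0} + r_i\sin\theta \end{aligned} \qquad (11)$$
where $r_p$ and $r_i$ are respectively the radii of the pupil and the iris, $(x_{p0}, y_{p0})$ and $(x_{i0}, y_{i0})$ are the
centers of the pupil and iris circles, and $(x_p(\theta), y_p(\theta))$ and
$(x_i(\theta), y_i(\theta))$ are the coordinates of the pupillary and limbic boundaries in the direction $\theta$. The
value of $\theta$ belongs to $[0, 2\pi]$ and $r$ belongs to $[0, 1]$.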
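The unwrapping of the annular iris region into a rectangular strip can be sketched as below, assuming the usual linear interpolation between the pupillary and limbic boundary points for each angle. The sketch further assumes concentric circular boundaries and nearest-neighbour sampling; the strip resolution and the synthetic image are hypothetical.

```python
import math


def unwrap_iris(image, center, r_pupil, r_iris, n_radial=8, n_angular=32):
    """Map the annulus between two concentric circles to an n_radial x n_angular strip."""
    cx, cy = center
    strip = []
    for i in range(n_radial):
        r = i / (n_radial - 1)  # normalized radius in [0, 1]
        row = []
        for j in range(n_angular):
            theta = 2 * math.pi * j / n_angular
            # Boundary points in direction theta (concentric circles assumed).
            xp = cx + r_pupil * math.cos(theta)
            yp = cy + r_pupil * math.sin(theta)
            xi = cx + r_iris * math.cos(theta)
            yi = cy + r_iris * math.sin(theta)
            # Linear interpolation between pupillary and limbic boundaries.
            x = (1 - r) * xp + r * xi
            y = (1 - r) * yp + r * yi
            row.append(image[round(y)][round(x)])  # nearest-neighbour sampling
        strip.append(row)
    return strip


# Hypothetical image whose pixel value is the rounded distance from (20, 20).
image = [[round(math.hypot(x - 20, y - 20)) for x in range(41)] for y in range(41)]
strip = unwrap_iris(image, center=(20, 20), r_pupil=5, r_iris=15)
```

Each row of the strip corresponds to one normalized radius, so irises captured at different sizes map to strips of the same dimensions.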
Features are the attributes or values extracted to capture the unique characteristics of the image.
Features from the iris image are extracted using Gabor filters. It has been shown that the functional form
of Gabor filters conforms closely to the receptive profiles of simple cortical cells, and that Gabor
filtering is an effective scheme for image representation. A two-dimensional (2D) even Gabor
filter can be represented by the following equation in the spatial domain:
$$G(x, y; f, \theta) = \exp\left\{-\frac{1}{2}\left[\frac{x'^2}{\delta_x^2} + \frac{y'^2}{\delta_y^2}\right]\right\} \cos(2\pi f x'), \quad x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta \qquad (12)$$
where $f$ is the frequency of the sinusoidal plane wave along the direction $\theta$ from the x-axis,
and $\delta_x$ and $\delta_y$ are the space constants of the Gaussian envelope along the x and y axes, respectively.
Convolution with Gabor filters is still the major contributor to the overall feature extraction time.
Here the filter frequency $f$ is set to the average ridge frequency $1/k$, where $k$ is the average
inter-ridge distance. The average inter-ridge distance is approximately 6 pixels in a 600 dpi iris image.
If $f$ is too large, spurious ridges are created in the filtered image, whereas if $f$ is too small, nearby
ridges are merged into one. Each sub-image is filtered by these Gabor filters. This
leads to a total of 1120 (8 for each sub-image) output images from which the iris features are
extracted. The feature vectors (FV) are passed to the matching module to allow comparisons.
FEATURE LEVEL FUSION OF FINGERPRINT AND IRIS FEATURES
Fusion at this level can be applied to the extraction of different features from the same modality
or from different modalities. Feature-level fusion refers to combining different feature vectors
that are obtained from multiple sensors for the same biometric trait or from multiple biometric traits.
When the feature vectors are homogeneous, a single feature vector can be calculated with “or”
operations. When the feature vectors are non-homogeneous, they can be concatenated to form a
single vector [7] [8]. The multimodal authentication system is shown in Fig.1, and the face
features are stored in the given database.
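The two fusion rules can be sketched as follows. This is a minimal illustration: the function name, the explicit `homogeneous` flag and the sample vectors are hypothetical, and homogeneous vectors are assumed to be binary codes of equal length.

```python
def fuse_features(a, b, homogeneous=False):
    """Fuse two feature vectors at feature level.

    homogeneous=True  -> element-wise OR of two equal-length binary codes.
    homogeneous=False -> concatenation into a single longer vector.
    """
    if homogeneous:
        if len(a) != len(b):
            raise ValueError("homogeneous vectors must have the same length")
        return [x | y for x, y in zip(a, b)]
    return list(a) + list(b)


# Hypothetical binary codes from two sensors for the same trait.
fused_same_trait = fuse_features([1, 0, 0, 1], [0, 0, 1, 1], homogeneous=True)

# Hypothetical real-valued fingerprint and iris features.
fused_multimodal = fuse_features([0.2, 0.7], [3.0])
```

In practice, non-homogeneous vectors are often normalized to a common range before concatenation so that no single modality dominates; that step is omitted here for brevity.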
Fig.1 Multimodal Authentication System
[Fig.1 blocks: Training data of face, fingerprint and iris images; Test data of face, fingerprint and iris images; Template Generation; Normalization; Feature Extraction; Feature Level Fusion; Key generation and encrypt cloud data; Decrypt and return feature level fusion values; Matching using AFSA-SVM; Database Storage; Decision (Imposter/User)]
MULTIMODAL AUTHENTICATION SYSTEM USING ENSEMBLE SVM WITH AFSA
In this study, feature selection is performed using the Artificial Fish Swarm Algorithm (AFSA) and the GA,
with SVM acting as the classifier. A classification model can be established using
classifiers that assign data to the correct categories. Features are first extracted from the input images,
each of which has the correct category label. A portion of these extracted features is used
as training data and the remainder is regarded as test data. The training images
are given as input to the SVM classifiers, where the classification model is established, and
the test images are later used to verify this model and obtain accurate classifications. To
solve the problems detailed earlier, the artificial fish swarm helps to give specific
definitions of the integrated weight algorithm. In biometrics, identification is more computationally
expensive and time-consuming than identity verification. Hence, to
improve performance with less execution time, a highly specialized
classification-based biometric system should be used. Gaussian mixture models, neural networks
and KNN classifiers are the most commonly used model-based classifiers for different biometrics.
Statistical learning theory has the tendency to absorb both the variability and the similarity
between patterns. The Support Vector Machine (SVM) is a powerful learning tool based on
statistical learning theory and machine learning. SVM has shown superior results in solving various
classification and pattern recognition problems. In several
pattern classification applications, SVM gives better generalization performance than
conventional techniques, especially for larger numbers of input variables. With this in mind,
the SVM is evaluated on the fused feature vector. Pseudocode for the biometric
authentication process is given below.
The algorithmic process of SVM integrated weighting using the artificial fish swarm algorithm is
proposed in the present work as follows:
(1) The population size of the artificial fish swarm N, the maximum number of iterations, the visible
domain of the artificial fish VISUAL, the maximum step length of the artificial fish STEP, and the crowding
factor are given as;
(13)
(14)
Where, , and refer to the p-th elements of the parameter vectors including and
of the artificial fish, and of the next state of artificial fish. The random numbers referred
as in . Similar definition is given for symbols in the following
equations.
(2) At the initial iteration, artificial fish individuals are generated randomly
in the feasible domain of the controlled variable to form the initial swarm, producing N groups of w.
Each weight is a random number in [0,1];
(3) The food concentration FC at the current position of each artificial fish in the
initial swarm is calculated and the values are compared. The individual fish with the maximum FC is recorded on the bulletin board;
(4) Every artificial fish simulates the following and swarming behaviors, respectively. The
movement with the higher FC is selected for execution after each action. The default behavior is
foraging;
(15)
(5) After a single action of every artificial fish, its FC and the FC on the bulletin board are
compared. The latter is replaced by the former if the fish's FC is better than that on the bulletin
board;
(6) The terminal conditions are checked: whether the mean square error of successive
values obtained is less than the permissible error, or whether num has reached the preset maximum
number of iterations. If either criterion is met, the calculated result is output (i.e. the FC on the
bulletin board).
(7) Otherwise, return to step (4).
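The steps above can be sketched as a small optimizer. Since the update equations are not legible in this copy, the sketch uses a generic textbook variant of the artificial fish swarm algorithm (random-step foraging, swarming towards the neighbourhood center, following the best neighbour, a crowding test and a bulletin board); it is not the authors' exact formulation. All parameter values are hypothetical, and a toy food-concentration function stands in for the classifier quality that would be used with an SVM.

```python
import math
import random


def afsa_maximize(fc, dim, n_fish=20, max_iter=100, visual=0.3, step=0.1,
                  delta=0.6, tries=5, seed=0):
    """Maximize a positive food-concentration function fc over [0, 1]^dim."""
    rng = random.Random(seed)

    def clip(v):
        return min(1.0, max(0.0, v))

    def move_toward(x, target):
        d = math.dist(x, target)
        if d == 0:
            return list(x)
        s = rng.random() * step
        return [clip(xi + s * (ti - xi) / d) for xi, ti in zip(x, target)]

    def prey(x):
        # Foraging (default behavior): try random positions within the visual range.
        for _ in range(tries):
            cand = [clip(xi + visual * rng.uniform(-1, 1)) for xi in x]
            if fc(cand) > fc(x):
                return move_toward(x, cand)
        return [clip(xi + step * rng.uniform(-1, 1)) for xi in x]

    def neighbours(x, swarm):
        return [y for y in swarm if y is not x and math.dist(x, y) < visual]

    def swarm_move(x, swarm):
        nb = neighbours(x, swarm)
        if nb:
            centre = [sum(c) / len(nb) for c in zip(*nb)]
            if fc(centre) / len(nb) > delta * fc(x):  # not too crowded
                return move_toward(x, centre)
        return prey(x)

    def follow(x, swarm):
        nb = neighbours(x, swarm)
        if nb:
            best = max(nb, key=fc)
            if fc(best) / len(nb) > delta * fc(x):  # not too crowded
                return move_toward(x, best)
        return prey(x)

    swarm = [[rng.random() for _ in range(dim)] for _ in range(n_fish)]
    board = max(swarm, key=fc)  # bulletin board: best position seen so far
    for _ in range(max_iter):
        new_swarm = []
        for x in swarm:
            # Try both behaviors and keep the move with the higher FC.
            cand = max((swarm_move(x, swarm), follow(x, swarm)), key=fc)
            new_swarm.append(cand)
            if fc(cand) > fc(board):
                board = cand
        swarm = new_swarm
    return board, fc(board)


def toy_fc(w):
    """Hypothetical food concentration, maximal at w = (0.7, 0.7, 0.7)."""
    return 1.0 / (1.0 + sum((wi - 0.7) ** 2 for wi in w))


best_w, best_fc = afsa_maximize(toy_fc, dim=3, max_iter=30)
```

The bulletin board only ever changes when a fish finds a strictly better position, so the returned value can never be worse than the best fish in the initial swarm.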
The biometric authentication process is as follows:
Bio-aut-process(Person P) {
    chck.db ← P.ID
    If (chck.db) {
        for (i = 0; i++) {
            Template[i] ← Feature.extract(P)
            db ← Template[i]
        }
    }
    if Q claims as P {
        Q.feat ← Feature.extract(Q)
        AFSASVM matcher ← Q.feat
        AFSASVM matcher ← Template[i]
        If AFSASVM matcher(Q.feat == Template[i])
            Q is P
    }
    else {
        Q.tmp ← Feature.extract(Q)
        AFSASVM matcher ← Q.tmp
        for (j = 0; j++) {
            AFSASVM matcher ← Template[j]
            If AFSASVM matcher(Q.tmp == Template[j])
                Q has passed authentication
        }
    }
    return
}
CONCLUSION AND FUTURE ENHANCEMENTS
As global networking offers many opportunities to fraudsters, building an accurate and easy-to-handle
credit card fraud detection system is one of the major tasks for banks. There are several ways to
detect fraudulent transactions. To improve the security of monetary transaction systems in an
automatic and effective way, building a precise and well-organized credit card fraud detection
system is essential. By performing oversampling and
extracting the principal direction of the data, the KNN method can be used to determine the anomaly of
the target instance. Hence the KNN method is suitable for detecting fraud under memory
limitations. Meanwhile, the outlier detection mechanism helps to detect credit card fraud with
lower memory and computation requirements. In particular, outlier detection works fast and well on
large online datasets. Compared with power methods and other known anomaly detection
methods, experimental results show that the KNN method is accurate and efficient. A possible
future work would be to unify the assumptions made by different techniques regarding normal
and outlier behavior into a statistical or machine learning framework.
REFERENCES
[1] Renu, Suman, “Analysis on Credit Card Fraud Detection Methods”, Volume 8, Number 1, Feb 2014.
[2] Divya Iyer, Arti Mohanpurkar, Sneha Janardhan, Dhanashree Rathod, Amruta Sardeshmuk, “Credit card fraud detection using hidden Markov model”, 978-1-4673-0126-8/11/$26.00 © 2011 IEEE.
[3] K. Rama Kalyani, D. Uma Devi, “Fraud Detection of Credit Card Payment System by Genetic Algorithm”, Volume 3, Issue 7, July 2012.
[4] Abhinav Srivastava, Amlan Kundu, Shamik Sural, and Arun K. Majumdar, “Credit Card Fraud Detection Using Hidden Markov Model”, Vol. 5, No. 1, January-March 2012.
[5] Venkata Ratnam Ganji, “Credit card fraud detection using Anti-k Nearest Neighbor Algorithm”, International Journal on Computer Science and Engineering (IJCSE), Vol. 4, 06 June 2012, pp. 1035-1039.
[6] Ekrem Duman, M. Hamdi Ozcelik, “Detecting credit card fraud by genetic algorithm and scatter search”, Elsevier, Expert Systems with Applications, (2011), 38, pp. 13057-13063.
[7] A.J. Graaff, A.P. Engelbrecht, “The Artificial Immune System for Fraud Detection in the Telecommunications Environment”, 20 November 2014.
[8] S. Benson Edwin Raj, A. Annie Portia, “Analysis on Credit Card Fraud Detection Methods”, International Conference on Computer, Communication and Electrical Technology (ICCCET 2011), 18th & 19th March, 2011.
[9] Y. Sahin and E. Duman, “Detecting Credit Card Fraud by Decision Trees and Support Vector Machines”, International Multiconference of Engineers and Computer Scientists, March 2011.
[10] S. Benson Edwin Raj, A. Annie Portia, “Analysis on Credit Card Fraud Detection Methods”, IEEE International Conference on Computer, Communication and Electrical Technology, (2011), pp. 152-156.
[11] P. Jayant, Vaishali, D. Sharma, “Survey on Credit Card Fraud Detection Techniques”, International Journal of Engineering Research & Technology (IJERT), Vol. 3, Issue 3, March 2014, pp. 1545-1551.
[12] Priya Ravindra Shimpi, Prof. Vijayalaxmi Kadroli, “Survey on Credit Card Fraud Detection Techniques”, International Journal of Engineering and Computer Science, Volume 4, Issue 11, Nov 2015, pp. 15010-15015.