seminar presentation

VAIBHAV DHATTARWAL, CSE IDD 5TH YEAR

08211018

UNDER THE GUIDANCE OF DR. DURGA TOSHNIWAL

Artificial Neural Networks based Data Mining Techniques

Introduction

An introduction to the Knowledge Discovery in Databases (KDD) process and the components of the data mining process.

The main data mining techniques, with a brief description of each.

A brief overview of artificial neural networks and their role as a tool in data mining.

Applications of the techniques available to data mining practitioners, including Artificial Neural Networks, Regression, and Decision Trees.

Presentation Overview

KDD Process

Data Mining

CRISP-DM Model

Mining Techniques

Artificial Neural Networks

Back Propagation Algorithm

Applications

Conclusion

Knowledge Discovery in Databases (KDD) Process


The Knowledge Discovery in Databases (KDD) process is commonly defined by the following stages: Selection, Pre-processing, Transformation, Data Mining, and Interpretation/Evaluation.

Data Mining

Data mining is the term used to describe the process of extracting value from a database. A data warehouse is a repository in which information is stored; the type of data stored depends largely on the industry and the company.

Data mining, the analysis step of the Knowledge Discovery in Databases (KDD) process, is the process of discovering patterns in large data sets.

It utilizes methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Data Mining Process : Steps Involved

Data cleaning: this step removes noise and inconsistent data.

Data integration: multiple data sources are combined into an integrated collection of data.

Data selection: all the data relevant to the analysis task is retrieved from the database.

Data transformation: the data is transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.


Data mining: the critical step, in which intelligent methods are applied to extract data patterns.

Pattern evaluation: this step identifies the truly interesting patterns representing knowledge, based on interestingness measures.

Knowledge presentation: in the final step, visualization and knowledge representation techniques are used to present the mined knowledge to the user.
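
The first four steps above (cleaning, integration, selection, transformation) can be illustrated with a minimal pure-Python sketch. The records, field names and helper functions (clean, integrate, select, transform) are hypothetical and exist only to show what each step does; a real project would typically rely on a database or a data-frame library.

```python
# Hypothetical raw records from two sources (illustration only).
sales = [
    {"id": 1, "region": "north", "amount": 120.0},
    {"id": 2, "region": "south", "amount": None},   # missing value
    {"id": 3, "region": "north", "amount": 80.0},
    {"id": 3, "region": "north", "amount": 80.0},   # duplicate record
    {"id": 4, "region": "NORTH", "amount": 50.0},   # inconsistent coding
]
customers = {1: {"segment": "retail"}, 3: {"segment": "retail"}, 4: {"segment": "wholesale"}}

def clean(rows):
    """Data cleaning: drop rows with missing amounts, normalise codes, remove duplicates."""
    seen, out = set(), []
    for row in rows:
        if row["amount"] is None:
            continue
        row = {**row, "region": row["region"].lower()}
        key = (row["id"], row["region"], row["amount"])
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def integrate(rows, lookup):
    """Data integration: attach attributes from a second source via the shared id."""
    return [{**row, **lookup[row["id"]]} for row in rows if row["id"] in lookup]

def select(rows, fields):
    """Data selection: keep only the attributes relevant to the analysis task."""
    return [{f: row[f] for f in fields} for row in rows]

def transform(rows):
    """Data transformation: aggregate (sum) amounts per (region, segment)."""
    totals = {}
    for row in rows:
        key = (row["region"], row["segment"])
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals

prepared = transform(select(integrate(clean(sales), customers), ["region", "segment", "amount"]))
print(prepared)  # {('north', 'retail'): 200.0, ('north', 'wholesale'): 50.0}
```

The output of the transformation step is what the data mining step would then consume.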

Data Mining Functions

Classification: It infers the defining characteristics of a certain group.

Clustering: It identifies groups of items that share a particular characteristic.

Association: It identifies relationships between events that occur at the same time.

Sequencing: It is similar to association, except that the relationship exists over a period of time.

Forecasting: It estimates future values based on patterns within large sets of data.

Data Mining : Data Types

Data Mining is performed on the following types of data :

Relational databases

Data warehouses

Transactional databases

Advanced databases and information repositories

Cross-Industry Standard Process for Data Mining (CRISP-DM) Model

Business understanding - In this phase, the business objectives must be understood clearly by finding out what the client really wants to achieve. Next, we assess the situation by finding out about the resources, assumptions, constraints and other important factors. Then, from the business objectives and the current situation, we define data mining goals that achieve the business objectives within the current situation.

Data understanding - This phase starts with initial data collection from the available sources, to become familiar with the data. Data loading and data integration are carried out to ensure successful data collection. The data are then explored by tackling the data mining questions, which can be addressed using querying, reporting and visualization. Finally, data quality must be verified: we check whether the acquired data are complete and whether any values are missing.

Data preparation - Data preparation usually consumes most of the project time, by some estimates about 90%. The outcome of this phase is the final data set. Once the available data sources are identified, the data must be selected, cleaned, constructed and formatted into the desired form.


Modelling - Modelling techniques are selected and applied to the prepared dataset. A test scenario must be generated to validate the model's quality. One or more models are created by running the modelling tool on the prepared dataset. The created models must then be assessed carefully to ensure that they meet the business objectives.

Evaluation - In the evaluation phase, the model results are evaluated against the business objectives defined in the first phase. New business requirements may arise in this phase because new patterns have been discovered in the model results, or because of other factors. Gaining business understanding is an iterative process in data mining. The decision on whether to move to the deployment phase must be made in this step.

Deployment - The knowledge or information gained through the data mining process needs to be presented in a form that can be used whenever it is needed. From the project's point of view, a final review summarizes the project experience and identifies what needs to be improved.


Data Mining Techniques : Classification

Classification is one of the most commonly applied data mining techniques. It uses a set of pre-classified examples to develop a model that can classify the population of records at large.

Classification is a classic data mining technique based on machine learning. It assigns each item in a data set to one of a predefined set of classes or groups.

The data classification process involves two stages: learning and classification. In learning, a classification algorithm analyzes the training data to build a model, such as a set of classification rules. In classification, test data are used to estimate the accuracy of the classification rules; if the accuracy is acceptable, the rules can be applied to new data tuples. Classification methods make use of techniques such as decision trees, linear programming, neural networks and statistics. In short, classification builds software that can learn how to assign data items to groups.
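
The learning and classification stages described above can be sketched with a deliberately simple learner: a one-rule "decision stump" that thresholds a single numeric attribute. This is an illustrative stand-in, not one of the decision-tree, neural-network or statistical methods named above; the data, the two class labels "A" and "B", and the 0.9 acceptance level are all hypothetical.

```python
def learn_stump(train):
    """Learning: choose the threshold t and label order that misclassify the
    fewest training tuples, giving the rule: x <= t -> low label, else high label."""
    best = None
    for t, _ in train:                             # candidate thresholds = observed values
        for low, high in (("A", "B"), ("B", "A")):  # two hypothetical classes
            errors = sum((low if x <= t else high) != y for x, y in train)
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    _, t, low, high = best
    return lambda x: low if x <= t else high

def accuracy(rule, test):
    """Classification: estimate the learned rule's accuracy on held-out test tuples."""
    return sum(rule(x) == y for x, y in test) / len(test)

train = [(1.0, "A"), (2.0, "A"), (3.0, "A"), (6.0, "B"), (7.0, "B"), (8.0, "B")]
test = [(1.5, "A"), (2.5, "A"), (6.5, "B"), (9.0, "B")]

rule = learn_stump(train)
acc = accuracy(rule, test)
print(acc)
if acc >= 0.9:          # hypothetical acceptance level
    print(rule(5.0))    # apply the accepted rule to a new, unlabelled tuple
```

Keeping the test tuples separate from the training tuples is what makes the accuracy estimate meaningful.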

Data Mining Techniques : Clustering

Clustering can be defined as the identification of similar classes of objects. It is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters. Clustering techniques can also identify dense and sparse regions in object space and reveal the overall distribution pattern and the correlations among data attributes.

Because classification can become costly, clustering can also be used as a pre-processing step for attribute subset selection and classification.

In clustering, the classes are not known in advance: the technique itself defines the groups and places objects in them, whereas in classification objects are assigned to predefined classes.
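
A widely used clustering algorithm, k-means, makes the contrast with classification concrete: no class labels are supplied, and the groups emerge from the data. The sketch below is a minimal version under stated assumptions (2-D numeric points, Euclidean distance, k chosen in advance, naive initialisation from the first k points); it is one clustering method among many, and the sample points are hypothetical.

```python
import math

def kmeans(points, k, max_iters=100):
    """Minimal k-means: returns (centroids, labels) for a list of (x, y) tuples."""
    centroids = list(points[:k])   # naive initialisation (assumption): first k points
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                updated.append(centroids[c])   # leave an empty cluster's centroid in place
        if updated == centroids:               # centroids stopped moving: converged
            break
        centroids = updated
    return centroids, labels

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice, smarter initialisation (for example k-means++) reduces the risk of poor local solutions.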

Data Mining Techniques : Regression

Regression analysis helps in understanding how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. In other words, it estimates the average value of the dependent variable when the independent variables are fixed.

In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.
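
For a single independent variable, the regression function can be estimated by ordinary least squares. The sketch below assumes a straight-line model y = b0 + b1*x (an assumption; regression functions need not be linear) and uses hypothetical data. It then summarises the variation of the dependent variable around the fitted function with the residual variance.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with one independent variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx           # slope: change in y per unit change in x
    b0 = my - b1 * mx        # intercept: the line passes through (mean x, mean y)
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_line(xs, ys)

# Spread of the dependent variable around the regression function:
# residual variance with n - 2 degrees of freedom (two fitted parameters).
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s2 = sum(r * r for r in residuals) / (len(xs) - 2)
print(b0, b1, s2)
```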

Data Mining Techniques : Association Rules

Association is one of the best-known data mining techniques. In association, a pattern is discovered based on the relationship between a particular item and other items in the same transaction.

Association and correlation analysis is usually used to find frequent itemsets in large data sets. Such findings help businesses make decisions about catalogue design, cross marketing and customer shopping behaviour analysis.

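Association rules are usually required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. The support of a rule is the fraction of transactions that contain all of its items; its confidence is the fraction of transactions containing the rule's antecedent that also contain its consequent.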
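
Support and confidence can be computed directly by counting transactions. The brute-force sketch below enumerates only rules between pairs of items over a hypothetical basket data set; practical miners such as the Apriori algorithm instead use the support threshold to prune the search rather than enumerating every candidate.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "eggs"},
]

def count(itemset, db):
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in db)

def pair_rules(db, min_support, min_confidence):
    """Return rules lhs -> rhs between two items that meet both thresholds."""
    items = sorted(set().union(*db))
    found = []
    for a, b in combinations(items, 2):
        n_ab = count({a, b}, db)
        support = n_ab / len(db)
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = n_ab / count({lhs}, db)   # P(rhs | lhs) estimated from counts
            if confidence >= min_confidence:
                found.append((lhs, rhs, support, confidence))
    return found

found = pair_rules(transactions, min_support=0.5, min_confidence=0.7)
for lhs, rhs, s, c in found:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```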

Data Mining Techniques : Neural Networks

An Artificial Neural Network (ANN), usually called simply a neural network (NN), is a mathematical or computational model inspired by the structure and functional aspects of biological neural networks.

A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation.

In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.

Modern neural networks are non-linear statistical data modelling tools. They are usually used to model complex relationships between inputs and outputs or to find patterns in data.

Artificial Neural Network


Neural networks are non-linear statistical data modelling tools. They can be used to model complex relationships between inputs and outputs, or to find patterns in data and infer rules from them.

Neural networks are useful in providing information on associations, classifications, clusters, and forecasting. Using neural networks as a tool, data warehousing firms can harvest information from datasets in the data mining process.

Neural networks can estimate a function from sampled input-output pairs even when the form of the function is not known.

These two abilities, pattern recognition and function estimation, make neural networks a widely used tool in data mining. As model-free estimators with this dual nature, neural networks serve data mining in a variety of ways.

Feed Forward Neural Network

Input data is presented to the network and propagated through the network until it reaches the output layer. This forward process produces a predicted output.

The predicted output is subtracted from the actual output, and an error value for the network is calculated.

The neural network is then trained with a supervised learning algorithm, in most cases back propagation. Back propagation adjusts the weights, starting with the weights between the output-layer processing elements (PEs) and the last hidden layer's PEs and working backwards through the network.

Once back propagation has finished, the forward process starts again, and this cycle continues until the error between the predicted and actual outputs is minimized.
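
The forward pass and error computation can be written out for the common case of a single hidden layer. The notation is an assumption, not taken from the slides: x_i are inputs, h_j hidden activations, o_k predicted outputs, t_k actual (target) outputs, w_ij the input-to-hidden weights, w_jk the hidden-to-output weights, and σ the logistic sigmoid; bias terms are omitted and the error is the usual squared error.

```latex
\begin{aligned}
h_j &= \sigma\Big(\sum_i w_{ij}\, x_i\Big), \qquad
o_k = \sigma\Big(\sum_j w_{jk}\, h_j\Big), \qquad
\sigma(z) = \frac{1}{1 + e^{-z}},\\
E &= \frac{1}{2} \sum_k (t_k - o_k)^2 .
\end{aligned}
```

Back propagation then adjusts each weight in the direction that reduces E.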

Feed Forward Neural Network : Training

Back Propagation Algorithm

Initialize the weights in the network
Do
    For each example e in the training set
        O = neural-net-output(network, e)    ; forward pass
        T = teacher output for e
        Calculate error (T - O) at the output units
        Compute delta_wh for all weights from hidden layer to output layer    ; backward pass
        Compute delta_wi for all weights from input layer to hidden layer    ; backward pass continued
        Update the weights in the network
Until all examples classified correctly or stopping criterion satisfied
Return the network
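
The pseudocode above can be turned into a minimal pure-Python sketch. It assumes one hidden layer of sigmoid units, a single sigmoid output, a constant bias input, squared error, per-example weight updates, and "classified correctly" meaning the output lies on the correct side of 0.5. The names train, forward, lr and n_hidden, the random initialisation range, and the logical-OR training set are hypothetical choices for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(net, x):
    """Forward pass: returns (hidden activations, single output activation)."""
    w_ih, w_ho = net
    xb = list(x) + [1.0]                                   # append bias input
    h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in w_ih]
    o = sigmoid(sum(w * hi for w, hi in zip(w_ho, h + [1.0])))
    return h, o

def train(examples, n_hidden=3, lr=0.5, max_epochs=10000, seed=1):
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # Initialize the weights in the network (small random values).
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    net = (w_ih, w_ho)
    for epoch in range(max_epochs):
        for x, t in examples:
            h, o = forward(net, x)                         # forward pass
            delta_o = (o - t) * o * (1 - o)                # error term at the output unit
            # Backward pass: hidden deltas use the hidden->output weights before updating.
            delta_h = [hj * (1 - hj) * w_ho[j] * delta_o for j, hj in enumerate(h)]
            # Update hidden->output weights (the pseudocode's delta_wh).
            for j, hj in enumerate(h + [1.0]):
                w_ho[j] -= lr * delta_o * hj
            # Update input->hidden weights (the pseudocode's delta_wi).
            xb = list(x) + [1.0]
            for j in range(n_hidden):
                for i, xi in enumerate(xb):
                    w_ih[j][i] -= lr * delta_h[j] * xi
        # Stopping criterion: every example classified correctly at threshold 0.5.
        if all((forward(net, x)[1] >= 0.5) == (t == 1) for x, t in examples):
            return net, epoch + 1
    return net, max_epochs

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]   # logical OR
net, epochs = train(data)
print(epochs, [round(forward(net, x)[1], 2) for x, _ in data])
```

The max_epochs cap plays the role of the "stopping criterion satisfied" branch when the examples cannot all be classified correctly.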


Phase 1: Propagation. Every propagation involves the following steps:

Forward propagation of a training pattern's input through the neural network to generate the output activations.

Backward propagation of the output activations through the neural network, using the training pattern's target, to generate the deltas (error terms) of all output and hidden neurons.

Phase 2: Weight update. For each weight (synapse), the following steps are used:

Multiply its output delta by its input activation to get the gradient of the weight.

Move the weight in the opposite direction of the gradient by subtracting a fraction of the gradient (the learning rate) from the weight.

Repeat phases 1 and 2 until the performance of the network is satisfactory.
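
The Phase 2 rule can be made explicit under stated assumptions: one hidden layer of logistic sigmoid units, inputs x_i, hidden activations h_j, outputs o_k, targets t_k, input-to-hidden weights w_ij, hidden-to-output weights w_jk, squared error E = 1/2 Σ_k (t_k - o_k)^2, and learning rate η as the fraction subtracted from each weight. Here δ denotes the derivative of E with respect to a unit's net input.

```latex
\begin{aligned}
\delta_k &= (o_k - t_k)\, o_k (1 - o_k) && \text{output unit } k\\
\delta_j &= h_j (1 - h_j) \sum_k w_{jk}\, \delta_k && \text{hidden unit } j\\
\frac{\partial E}{\partial w_{jk}} &= \delta_k\, h_j, \qquad
\frac{\partial E}{\partial w_{ij}} = \delta_j\, x_i && \text{delta} \times \text{input activation}\\
w &\leftarrow w - \eta\, \frac{\partial E}{\partial w} && \text{step against the gradient}
\end{aligned}
```

The factor o(1 - o) is the derivative of the sigmoid, which is why it appears in every delta.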

Applications : Spatial Data Mining


Spatial Data Cube Construction: As with relational data, we can integrate spatial data to construct a data warehouse that facilitates spatial data mining. A spatial data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of both spatial and non-spatial data in support of spatial data mining and spatial-data-related decision-making processes.

There are three types of dimensions in a spatial data cube: a non-spatial dimension, a spatial-to-non-spatial dimension, and a spatial-to-spatial dimension.

Applications : Text Mining

Web Mining

The World Wide Web serves as a huge, widely distributed, global information service centre for news, advertisements, consumer information, financial management, education, government, e-commerce, and many other information services. The Web also contains a rich and dynamic collection of hyperlink information and Web page access and usage information, providing rich sources for data mining.

Challenges:

The Web seems too huge for effective data warehousing and data mining.

The complexity of Web pages is far greater than that of any traditional text document collection.

The Web is a highly dynamic information source.

The Web serves a broad diversity of user communities.

Only a small portion of the information on the Web is truly relevant or useful.

Besides mining Web contents and Web linkage structures, another important task for Web mining is Web usage mining.

Applications : Intrusion Detection

The security of our computer systems and data is at continual risk. The extensive growth of the Internet and the increasing availability of tools and tricks for intruding into and attacking networks have made intrusion detection a critical component of network administration. Some areas in which data mining technology is being applied or further developed for intrusion detection:

Development of data mining algorithms for intrusion detection

Association and correlation analysis, and aggregation to help select and build discriminating attributes

Analysis of stream data

Distributed data mining

Visualization and querying tools

Conclusions

Although the data mining process includes basic steps such as data cleaning, selection and transformation, the mining functions and techniques themselves are applied only in the critical step where intelligent methods are used to detect patterns.

The Cross-Industry Standard Process for Data Mining (CRISP-DM) model is an effective approach because it considers business requirements at every step.

Classification and clustering techniques are popular and easily applied in data mining; however, classification requires prior information about the classes, whereas clustering does not.

Artificial neural networks can be deployed to detect patterns and make predictions, which makes them capable tools in data mining. A feed-forward neural network is commonly trained with the back propagation algorithm.

Combining data mining techniques with geographic information system (GIS) techniques offers opportunities to explore various aspects of spatial data mining.

The growth of data available for processing, including multimedia content and the World Wide Web, creates greater opportunities for data mining techniques. However, pre-processing, selection and transformation need to be handled first.

Thank You
