data mining application


Upload: abhinaw-kumar-singh

Post on 12-May-2017


Page 1: Data Mining Application

1. Introduction:

The purpose of this term paper is to present the use of data mining in scientific applications such as geostatistics, kriging, inverse distance weighting, 3D reconstruction, bioinformatics, chemoinformatics, and handwriting recognition. A second purpose is to introduce direct kernel methods as general-purpose and powerful data mining tools for predictive modeling, feature selection, and visualization. Direct kernel methods are a class of algorithms for pattern analysis. The best-known member of the kernel-method family is the support vector machine (SVM). I selected direct kernel methods for this term paper because they are used in many scientific applications for mining information.

2. What is data mining?

2.1 Introduction of data mining

Data mining is about finding new information in a large amount of data. It can be defined as the extraction of interesting information from large, heterogeneous data sources in order to support decisions that benefit an organization. Data mining has its roots in statistics, probability theory, neural networks, and the systems angle of artificial intelligence. At the Knowledge Discovery and Data Mining (KDD) conference held in Montreal in 1995 [1], data mining was defined as: "data mining is the process of automatically extracting valid, novel, potentially useful, and ultimately comprehensible information from large ..."

2.2 Scientific data mining

Scientific data mining is defined as data mining applied to scientific problems, rather than to database marketing, finance, or business-driven applications. It distinguishes itself in that the nature of the datasets is often very different from that of traditional market-driven data mining applications. The datasets might involve vast amounts of precise, continuous data, and accounting for underlying system nonlinearities can be extremely challenging from a machine learning point of view. For example, in data mining of astronomy-based data, the sheer size of the datasets now poses a challenge on its own. As another example [2], in bioinformatics-related applications such as gene finding and protein folding, the datasets are more modest in size, but the modeling part can be extremely challenging.


2.3 The Data mining process

Figure 1: Data mining process

The data mining process involves gathering data, cleansing it, pre-processing and transforming a subset of the data into a flat file, building one or more models (predictive models, clusters, or data visualizations) that lead to the formulation of rules, and finally piecing together the larger picture.
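The stages of this process can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the procedure used in any particular study: the records, the field names `temperature` and `yield`, and the helper functions `cleanse`, `to_flat_file`, and `fit_line` are all hypothetical, and a simple least-squares line stands in for the "model" step.

```python
import csv
import io
import statistics

# Hypothetical raw records (assumed for illustration); two are unusable.
raw_records = [
    {"temperature": "20.0", "yield": "41.0"},
    {"temperature": "25.0", "yield": "51.0"},
    {"temperature": "", "yield": "47.0"},      # missing value
    {"temperature": "30.0", "yield": "61.0"},
    {"temperature": "35.0", "yield": "n/a"},   # unparseable value
    {"temperature": "40.0", "yield": "81.0"},
]

def cleanse(records):
    """Cleansing: keep only records whose fields all parse as floats."""
    clean = []
    for rec in records:
        try:
            clean.append({key: float(value) for key, value in rec.items()})
        except ValueError:
            continue  # drop records with missing or malformed fields
    return clean

def to_flat_file(records, fields):
    """Transformation: write the records as CSV text, one row per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def fit_line(xs, ys):
    """Model building: least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Gather -> cleanse -> flat file -> model -> rule.
clean = cleanse(raw_records)
flat = to_flat_file(clean, ["temperature", "yield"])
rows = list(csv.DictReader(io.StringIO(flat)))
xs = [float(row["temperature"]) for row in rows]
ys = [float(row["yield"]) for row in rows]
slope, intercept = fit_line(xs, ys)
rule = f"yield changes by about {slope:.1f} units per degree of temperature"
print(flat)
print(rule)
```

In this toy example the final "rule" is just a readable restatement of the fitted slope; in practice the model and the form of the rules depend on the problem at hand.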

2.4 Data mining methods and techniques

Data mining often involves clustering, attribute and feature selection, the building of predictive regression or classification models, the formation of rules, and outlier detection. These techniques can be based on probability theory, statistics, decision trees, Bayesian networks, association rules, evolutionary computation, neural networks, and fuzzy logic.
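Two of the techniques named above, clustering and outlier detection, can be sketched in a few lines of Python. This is a hedged illustration rather than a reference implementation: the function names `kmeans` and `zscore_outliers`, the sample data, the choice of the first `k` points as initial centroids, and the z-score threshold of 2.0 are all assumptions made to keep the example small and deterministic.

```python
import math
import statistics

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on tuples of floats.

    Assumption: the first k points serve as the initial centroids, which
    keeps the sketch deterministic but is not a robust initialization.
    """
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(statistics.fmean(dim) for dim in zip(*members))
    return labels, centroids

def zscore_outliers(values, threshold=2.0):
    """Flag values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]

# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)

# Hypothetical measurements with one value far from the rest.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 25.0]
print(zscore_outliers(readings))
```

The z-score rule assumes roughly bell-shaped data and is sensitive to the very outliers it is looking for, so on small samples a median-based rule may be a better choice.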