a little about data mining

12
1 A LITTLE ABOUT DATA MINING

Upload: nguyen-ngoc-binh-phuong

Post on 16-Mar-2018

153 views

Category:

Business


1 download

TRANSCRIPT

1

A LITTLE ABOUT DATA MINING

Data mining is focused on better understanding of characteristics and patterns among variables in large databases using a variety of statistical and analytical tools.◦ It is used to identify relationships among variables in

large data sets and understand hidden patterns that they may contain.

◦ XLMiner software implement many basic data mining procedures in a spreadsheet environment.

2

3

In supervised data mining techniques, there is a dependent variable the method is trying to predict.◦ The classification and prediction/forecasting methods

are supervised data mining techniques.

In unsupervised data mining techniques, there is no dependent variable. Instead, these techniques search for patterns and structure among all of the variables.◦ One popular unsupervised method is association

analysis (known in marketing as market basket analysis)

◦ The most common unsupervised method is clustering(known in marketing as segmentation).

4

De

sc

rip

tiv

e a

na

lyti

cs

Pre

dic

tiv

e a

na

lyti

cs

Suppose you had a basket and filled it with different kinds of fruits.

5

We have four types of fruits:

6

APPLE

BANANA

GRAPECHERRY

You already learn from your previous work about the physical characters of fruits. So arranging the same type of fruits at one place is easy now.

In data mining terminology the earlier work is called as training data. You already learn the things from your training data. This is because of response variable.

7

Suppose you have taken a new fruit from the basket then you will see the size, color, and shape of that particular fruit.◦ If size is Big, color is Red, the shape is rounded shape

with a depression at the top, you will confirm the fruit name as Apple and you will put in Apple group.

If you learn the thing before from training data and then applying that knowledge to the test data (for new fruit), this type of learning is called as supervised learning.

8

This time, you don’t know anything about the fruits, honestly saying this is the first time you have seen them. You have no clue about those.

So, how will you arrange them? What will you do first?

9

You will take a fruit and you will arrange them by considering the physical character of that particular fruit.

Suppose you have considered color. Then you will arrange them on considering base condition as color. Then the groups will be something like this:◦ RED COLOR GROUP: apples & cherries.

◦ GREEN COLOR GROUP: bananas & grapes.

So now you will take another physical character such as size.◦ RED COLOR AND BIG SIZE: apples.

◦ RED COLOR AND SMALL SIZE: cherries.

◦ GREEN COLOR AND BIG SIZE: bananas.

◦ GREEN COLOR AND SMALL SIZE: grapes.

10

Here you did not learn anything before, means no training data and no response variable.

In data mining, this kind of learning is known as unsupervised learning.

11