Data Mining Lab Introduction to Weka 11/12/2012

Data mining with WEKA

우 호영

WEKA: the software

“Waikato Environment for Knowledge Analysis”

Data mining software in Java

– a collection of machine learning algorithms for data mining tasks

– http://www.cs.waikato.ac.nz/ml/weka/

Includes

– data pre-processing, classification, regression, clustering, association rules, visualization

How to install WEKA

Download WEKA from

– http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html

– the course materials homepage

How to install WEKA

In the installer, click Next, I Agree, Next, Next, then Install.

WEKA GUI Chooser

– Explorer

– Experimenter

– KnowledgeFlow

– Command Line Interface

WEKA Explorer: Open file

– Open file: brings up a dialog box allowing you to browse for the data file on the local file system

– Open URL: asks for a Uniform Resource Locator address where the data is stored

– Open DB: reads data from a database

– Generate: enables you to generate artificial data from a variety of DataGenerators

Data can be imported from a file in various formats: ARFF, CSV, C4.5

ARFF format

An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes

ARFF files have two distinct sections

– Header: relation, attributes

– Data

ARFF data: a sample

Header:

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

Data:

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
...

ARFF data – header section

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

Relation - @relation <relation-name>

– The relation name is defined on the first line of the ARFF file

Attribute - @attribute <attribute-name> <datatype>

– Each @attribute statement uniquely defines the name of that attribute

– Data type

numeric (integer and real are treated as numeric)

<nominal-specification>

string

ARFF data – data section

Data - @data

– a single line denoting the start of the data segment in the file

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
...
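
The header/data layout described in the ARFF slides can be illustrated with a small reader. The following is only a minimal sketch in Python, not WEKA's own loader: the function name `parse_arff` is hypothetical, and it assumes unquoted names, comma-separated dense rows, and only numeric or nominal attributes (`%` comment lines and `?` missing values are the only extras it handles).

```python
def parse_arff(text):
    """Parse a minimal ARFF document into (relation, attributes, rows).

    Hypothetical helper for illustration only; it ignores quoting,
    sparse rows, date attributes and other parts of the full format.
    """
    relation = None
    attributes = []  # (name, kind); kind is "numeric", "string", or a list of nominal values
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_name, kind), value in zip(attributes, values):
                if value == "?":              # ARFF marker for a missing value
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):          # nominal specification: { a, b, c }
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):       # everything after this line is data
            in_data = True
    return relation, attributes, rows


SAMPLE = """\
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(attributes[1])
print(rows[0])
```

The sample text reuses the first two rows of the heart-disease example from the slides; numeric attributes come back as floats and nominal ones as plain strings.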

Open ARFF file

weather

– 14 samples

– 4 attributes

– binary class

No.  outlook   temperature  humidity  windy  class
1    sunny     85           85        FALSE  no
2    sunny     80           90        TRUE   no
3    overcast  83           86        FALSE  yes
4    rainy     70           96        FALSE  yes
5    rainy     68           80        FALSE  yes
6    rainy     65           70        TRUE   no
7    overcast  64           65        FALSE  yes
8    sunny     72           95        FALSE  no
9    sunny     69           70        FALSE  yes
10   rainy     75           80        TRUE   yes
11   sunny     75           70        TRUE   yes
12   overcast  72           90        TRUE   yes
13   overcast  81           75        FALSE  yes
14   rainy     71           91        TRUE   no

File information

Attribute & class information

File information – Visualize All

You can check the class distribution for each attribute's values
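
What this view shows, the class distribution for each value of an attribute, can also be computed by hand. The sketch below is an illustration in Python rather than anything WEKA runs; the function name `class_distribution` is hypothetical, and the two lists are the outlook and class columns of the weather data, in the order given on the slide.

```python
from collections import Counter, defaultdict


def class_distribution(attribute_values, class_labels):
    """Count how often each class label occurs for each attribute value."""
    distribution = defaultdict(Counter)
    for value, label in zip(attribute_values, class_labels):
        distribution[value][label] += 1
    return distribution


# outlook and class columns of the 14 weather samples, in slide order
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

dist = class_distribution(outlook, play)
for value, counts in dist.items():
    print(value, dict(counts))
```

Each printed line corresponds to one bar of the histogram for the outlook attribute, split by class.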

Classify section

– Select a classifier

– Test options

– Select the class attribute

Choose a classifier for classification

NaiveBayes

Classifier’s information & options

NaiveBayes

– Capabilities: click to see which data types the algorithm can use for attributes and for the class.

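Set test options

k-fold cross-validation (here k = 3)

– the data set is split into k folds; in each round, k-1 folds form the training set and 1 fold is the test set, so every fold is used for testing exactly once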
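
The k-1 / 1 split can be sketched in a few lines. This is a simplified illustration in Python, with the hypothetical function name `k_fold_splits`; it cuts the data into contiguous folds, whereas real tools typically shuffle (and often stratify) the data first, which is omitted here.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                  # 1 fold for testing
        train = indices[:start] + indices[start + size:]    # k-1 folds for training
        yield train, test
        start += size


# 14 samples (the size of the weather data) with k = 3, as on the slide
splits = list(k_fold_splits(14, 3))
for train, test in splits:
    print(len(train), "training samples,", len(test), "test samples")
```

The reported accuracy of a k-fold run is then based on the predictions collected over all k test folds.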

Training & test

Click the ‘Start’ Button

Prediction Accuracy

Choose other classifiers

Multi Layer Perceptron & J48

Prediction result

Naïve Bayes

Multi Layer Perceptron

J48

Exercise & homework #3

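By default, this assignment is to be carried out and submitted during the lab session on November 12. Students who could not attend the class may do the lab on their own and submit it.

Download the iris data from the UCI Machine Learning Repository, use WEKA to carry out data mining tasks such as classification and clustering, and copy the results into a report for submission.

(Print the report and hand it in at the next class.)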

Exercise & homework #3

Run cluster and associate in the same way as classify.

classification: try other classifiers.

cluster

– use the clusterer's options to set the number of clusters to match the number of classes in the data.

associate

– the Apriori algorithm cannot handle numeric values.
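
One common way around the numeric-value restriction is to discretize numeric attributes into nominal bins before running association rule mining (WEKA provides a Discretize filter for this purpose). The sketch below only illustrates the idea of equal-width binning in Python; the function name `equal_width_bins` and the labels `bin1`, `bin2`, ... are hypothetical, and the example values are the temperature column of the weather data from the earlier slide.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a nominal label 'bin1' .. 'bin<n_bins>'."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    labels = []
    for v in values:
        if width == 0:                 # all values identical: a single bin
            index = 0
        else:
            # The maximum value would land in bin n_bins + 1, so clamp it.
            index = min(int((v - low) / width), n_bins - 1)
        labels.append(f"bin{index + 1}")
    return labels


temperature = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
temperature_bins = equal_width_bins(temperature, 3)
print(temperature_bins)
```

After such a step every attribute is nominal, which is the kind of input an algorithm like Apriori expects.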

WEKA: not enough memory

If WEKA runs out of memory, start it from the command prompt (cmd.exe) with an increased memory allocation