Transcript
Page 1: WHY WE USE EXPLORATORY DATA ANALYSIS

WHY WE USE EXPLORATORY DATA ANALYSIS

[Flowchart; labels listed by decision node]

DATA

OUTLIERS, EXTREMES? YES / NO

WHY? CAN WE REMOVE THEM?

TRANSFORMATIONS

QUANTILE (ROBUST) ESTIMATES

DO DATA COME FROM NORMAL DISTRIBUTION? YES / NO

ESTIMATES BASED ON NORMAL DISTRIBUTION

KURTOSIS, SKEWNESS

TRANSFORMATIONS

QUANTILE (ROBUST) ESTIMATES

Page 2: WHY WE USE EXPLORATORY DATA ANALYSIS

METHODS OF EDA

Graphical:

dot plot

box plot

notched box plot

QQ plot

histogram

density plots

Tests:

tests of normality

minimal sample size

Page 3: WHY WE USE EXPLORATORY DATA ANALYSIS

DOT PLOT

Page 4: WHY WE USE EXPLORATORY DATA ANALYSIS

BOX PLOT

[Figure: box plot along a number axis. Labels: median, lower quartile, upper quartile, interquartile range (H), inner fences, outer fences.]
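The quantities labelled on the box plot can be computed directly. The sketch below is illustrative only: the fence multipliers (1.5 H for inner fences, 3 H for outer fences) follow Tukey's common convention and are an assumption, since the slide does not state them; the "inclusive" quartile rule and the sample values are also assumptions.

```python
import statistics

def box_plot_summary(values):
    """Return the quantities labelled on a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    h = q3 - q1  # interquartile range, called H on the slide
    return {
        "median": median,
        "lower_quartile": q1,
        "upper_quartile": q3,
        "H": h,
        # Assumption: Tukey's 1.5*H and 3*H multipliers; the slide gives none.
        "inner_fences": (q1 - 1.5 * h, q3 + 1.5 * h),
        "outer_fences": (q1 - 3.0 * h, q3 + 3.0 * h),
    }

# Made-up sample with one suspicious value (illustrative only).
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(data)
low, high = summary["inner_fences"]
outliers = [x for x in data if x < low or x > high]
print(summary)
print("outside inner fences:", outliers)
```

Values outside the inner fences are candidates for outliers; values outside the outer fences are usually treated as extremes.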

Page 5: WHY WE USE EXPLORATORY DATA ANALYSIS

NOTCHED BOX PLOT

interval estimate of median

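$I_{D,H} = M \pm \dfrac{1.57\,R_F}{\sqrt{n}}$

where $M$ is the median, $R_F$ the interquartile range and $n$ the sample size.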
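A minimal sketch of the notch computation, assuming the slide's formula reads M ± 1.57 · R_F / √n with R_F the interquartile range. The function name, the "inclusive" quartile rule and the sample values are assumptions made for illustration.

```python
import math
import statistics

def median_notch_interval(values):
    """Approximate interval estimate of the median: M +/- 1.57 * R_F / sqrt(n)."""
    n = len(values)
    q1, m, q3 = statistics.quantiles(values, n=4, method="inclusive")
    r_f = q3 - q1  # interquartile range (assumed meaning of R_F)
    half_width = 1.57 * r_f / math.sqrt(n)
    return m - half_width, m + half_width

# Made-up sample (illustrative only).
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
lower, upper = median_notch_interval(data)
print(f"interval estimate of median: ({lower:.3f}, {upper:.3f})")
```

When the notches of two box plots do not overlap, this is commonly read as evidence that the two medians differ.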

Page 6: WHY WE USE EXPLORATORY DATA ANALYSIS

Q-Q PLOT

X: theoretical quantiles of analysed distribution

Y: sample quantiles

[Figure: Q-Q plot. Line: ideal coincidence of sample values and theoretical distribution. Points: measured values.]
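The two axes of a normal Q-Q plot can be built as follows. This is a sketch under stated assumptions: the plotting positions (i − 0.5)/n are one of several conventions, the reference line uses the sample mean and standard deviation, and the data are made up.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Return (theoretical quantile, sample quantile) pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Assumption: plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def reference_line(values):
    """Intercept and slope of the line of ideal coincidence: y = mean + stdev * x."""
    return mean(values), stdev(values)

# Made-up measurements (illustrative only).
data = [48.1, 51.3, 49.7, 50.2, 52.8, 47.5, 50.9, 49.0]
for x, y in normal_qq_points(data):
    print(f"{x:6.3f}  {y:6.2f}")
intercept, slope = reference_line(data)
```

Points that follow the reference line closely suggest that the sample agrees with the theoretical distribution; systematic curvature suggests skewness or heavy or light tails.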

Page 7: WHY WE USE EXPLORATORY DATA ANALYSIS

Q-Q PLOT

[Figure: normal Q-Q plot. X axis: observed value (25 to 65). Y axis: expected normal value (-3 to 3).]

Page 8: WHY WE USE EXPLORATORY DATA ANALYSIS

Q-Q PLOT

[Figure: normal Q-Q plot. X axis: observed value (-20 to 120). Y axis: expected normal value (-3 to 3).]

Page 9: WHY WE USE EXPLORATORY DATA ANALYSIS

Q-Q plot

right-sided (skewed to the left)

left-sided (skewed to the right)

platykurtic ("flat")

leptokurtic ("steep")

Page 10: WHY WE USE EXPLORATORY DATA ANALYSIS

Page 11: WHY WE USE EXPLORATORY DATA ANALYSIS

Page 12: WHY WE USE EXPLORATORY DATA ANALYSIS

HISTOGRAM

[Figure: histogram of TLOUSTKY (Sheet1). X axis: TLOUSTKY (20 to 70). Y axis: frequency (0 to 30).]

Page 13: WHY WE USE EXPLORATORY DATA ANALYSIS

HISTOGRAM

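correct width of interval, set through the number of intervals L for a sample of size n:

$L = \operatorname{int}\left[2.46\,(n - 1)^{0.4}\right]$

$L = \operatorname{int}\left(2\sqrt{n}\right)$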
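As a quick illustration, assuming the slide's two rules read L = int[2.46 (n − 1)^0.4] and L = int(2 √n) for the number of class intervals L, the interval width follows from dividing the sample range by L. The function names are hypothetical.

```python
import math

def class_count_power(n):
    """Number of histogram classes: L = int(2.46 * (n - 1) ** 0.4)."""
    return int(2.46 * (n - 1) ** 0.4)

def class_count_sqrt(n):
    """Number of histogram classes: L = int(2 * sqrt(n))."""
    return int(2 * math.sqrt(n))

def class_width(values, class_count):
    """Equal class width covering the sample range with class_count intervals."""
    return (max(values) - min(values)) / class_count

n = 100
print("power rule:", class_count_power(n))
print("square-root rule:", class_count_sqrt(n))
print("width for range 20..70 and 10 classes:", class_width([20, 70], 10))
```

The two rules can give noticeably different class counts for the same n, so it is worth drawing the histogram with both before interpreting its shape.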

Page 14: WHY WE USE EXPLORATORY DATA ANALYSIS

HISTOGRAM – kernel density function

[Figure: density estimate of TLOUSTKY (Sheet1). X axis: TLOUSTKY (10 to 80). Y axis: density (0.000 to 0.060).]

Page 15: WHY WE USE EXPLORATORY DATA ANALYSIS

TRANSFORMATION

Aim of transformation:

reduction of variance

better level of symmetry (normality) of data

Transformation function:

non-linear function

monotonic function

Page 16: WHY WE USE EXPLORATORY DATA ANALYSIS

TRANSFORMATION – basic concept

[Figure: transformed data (Y axis, -0.4 to 0.8) plotted against original data, tree-ring widths in mm (X axis, 0 to 3.5). Marked: mean of original data; transformed mean and its projection to the original data set.]
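The idea in the figure, computing the mean on the transformed scale and projecting it back to the original scale, can be sketched as follows. The natural logarithm is used here as an assumed transformation (the slide does not say which function was plotted), and the widths are made up.

```python
import math
from statistics import mean

def retransformed_mean(values):
    """Mean computed on the ln scale, projected back to the original scale."""
    transformed = [math.log(x) for x in values]  # assumption: ln as the transformation
    return math.exp(mean(transformed))           # inverse transformation of the mean

# Made-up, right-skewed tree-ring widths in mm (illustrative only).
widths = [0.4, 0.5, 0.6, 0.7, 0.8, 1.0, 1.3, 3.2]
print("mean of original data:", round(mean(widths), 3))
print("retransformed mean:   ", round(retransformed_mean(widths), 3))
```

For right-skewed data the retransformed mean lies below the arithmetic mean, because it is less affected by the few large values.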

Page 17: WHY WE USE EXPLORATORY DATA ANALYSIS

TRANSFORMATION – logarithmic transformation

$x \rightarrow \ln x$

[Figure: two histograms of counts. C2: values from 0.0 to 800.0, counts up to 15. C7: values from 3.0 to 7.0, counts up to 10.]
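The effect of the logarithm on symmetry can be checked numerically by comparing the skewness before and after the transformation. This is a sketch only: the moment coefficient of skewness is one common definition, and the values are made up so that their logarithms are evenly spaced.

```python
import math
from statistics import mean

def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    centre = mean(values)
    m2 = sum((x - centre) ** 2 for x in values) / n
    m3 = sum((x - centre) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Made-up values spaced geometrically, so their logarithms are evenly spaced.
original = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in original]
print("skewness before:", skewness(original))
print("skewness after: ", skewness(logged))
```

A skewness close to zero after the transformation indicates that the transformed data are approximately symmetric.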

Page 18: WHY WE USE EXPLORATORY DATA ANALYSIS

TRANSFORMATION – power transformation

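$$
g(x) =
\begin{cases}
x^{\lambda} & \text{for } \lambda > 0 \\
\ln x & \text{for } \lambda = 0 \\
-x^{\lambda} & \text{for } \lambda < 0
\end{cases}
$$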
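A minimal sketch of a simple power transformation, assuming the common three-branch form: x^λ for λ > 0, ln x for λ = 0, and −x^λ for λ < 0. The garbled slide does not allow the exact form to be confirmed, so treat the branches as an assumption; the function name is hypothetical.

```python
import math

def power_transform(x, lam):
    """Simple power transformation of a positive value x with exponent lam."""
    if x <= 0:
        raise ValueError("x must be positive")
    if lam > 0:
        return x ** lam
    if lam == 0:
        return math.log(x)
    return -(x ** lam)  # lam < 0: the minus sign keeps the function increasing

values = [0.5, 1.0, 2.0, 4.0]
for lam in (-1, 0, 0.5, 2):
    print(lam, [round(power_transform(x, lam), 3) for x in values])
```

With this form the order of the data is preserved for every λ, which is what the requirement of a monotonic transformation function asks for.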

Page 19: WHY WE USE EXPLORATORY DATA ANALYSIS

TRANSFORMATION – Box-Cox

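$$
g(x) =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\
\ln x & \text{for } \lambda = 0
\end{cases}
$$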
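The Box-Cox transformation in its standard form, (x^λ − 1)/λ for λ ≠ 0 and ln x for λ = 0, can be written in a few lines. The function name is hypothetical; the loop only illustrates that the first branch approaches ln x as λ approaches 0.

```python
import math

def box_cox(x, lam):
    """Box-Cox transformation of a positive value x."""
    if x <= 0:
        raise ValueError("x must be positive")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

# The lam != 0 branch approaches ln(x) as lam approaches 0.
for lam in (1.0, 0.5, 0.001, 0.0):
    print(lam, round(box_cox(5.0, lam), 4))
```

Unlike the simple power transformation, this form is continuous in λ at λ = 0, which makes it convenient when λ is estimated from the data.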

Page 20: WHY WE USE EXPLORATORY DATA ANALYSIS

TRANSFORMATION – Box-Cox

Page 21: WHY WE USE EXPLORATORY DATA ANALYSIS

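TRANSFORMATION – estimate of optimal λ

[Figure: logarithm of the likelihood function for various values of λ. Marked: the optimal λ, the interval estimate of the parameter λ, the level max LF − 0.5 · quantile of χ², and the value λ = 1.00.]

λ = 1 is not included in the interval estimate of λ. Since λ = 1 leaves the shape of the data unchanged, this means that the transformation will probably be successful.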
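The procedure on this slide can be sketched with a grid search over λ. The code below assumes the standard Box-Cox profile log-likelihood, takes the cutoff as the maximum minus half of the 0.95 quantile of χ² with one degree of freedom (3.841; the confidence level is an assumption, the slide does not state it), and uses a made-up sample. All function names are hypothetical.

```python
import math

CHI2_95_DF1 = 3.841  # 0.95 quantile of chi-square with 1 degree of freedom

def box_cox(x, lam):
    """Box-Cox transformation of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def log_likelihood(values, lam):
    """Profile log-likelihood of lam (additive constants dropped)."""
    n = len(values)
    y = [box_cox(x, lam) for x in values]
    centre = sum(y) / n
    variance = sum((v - centre) ** 2 for v in y) / n
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(x) for x in values)

def estimate_lambda(values, grid):
    """Grid search for the optimal lam and its approximate interval estimate."""
    ll = [(lam, log_likelihood(values, lam)) for lam in grid]
    best_lam, max_ll = max(ll, key=lambda pair: pair[1])
    cutoff = max_ll - 0.5 * CHI2_95_DF1
    inside = [lam for lam, value in ll if value >= cutoff]
    return best_lam, (min(inside), max(inside))

# Made-up, strongly right-skewed positive sample (illustrative only).
sample = [3, 4, 5, 6, 8, 10, 13, 18, 26, 40, 65, 110, 200, 380, 760]
grid = [i / 20 for i in range(-40, 41)]  # lam from -2 to 2 in steps of 0.05
best, (low, high) = estimate_lambda(sample, grid)
print("optimal lam:", best, "interval:", (low, high))
print("lam = 1 inside interval:", low <= 1.0 <= high)
```

If λ = 1 falls outside the interval, the data are unlikely to be adequately described without a transformation, which matches the reading given on the slide.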

