1
WHY WE USE EXPLORATORY DATA ANALYSIS
DATA
DO DATA COME FROM NORMAL DISTRIBUTION? (KURTOSIS, SKEWNESS)
YES: ESTIMATES BASED ON NORMAL DISTRIBUTION
NO: TRANSFORMATIONS or QUANTILE (ROBUST) ESTIMATES
OUTLIERS, EXTREMES? (YES / NO)
YES: WHY? CAN WE REMOVE THEM? If not: QUANTILE (ROBUST) ESTIMATES
2
METHODS OF EDA
Graphical:
dot plot
box plot
notched box plot
QQ plot
histogram
density plots
Tests:
tests of normality
minimal sample size
3
DOT PLOT
4
BOX PLOT
[Figure: box plot drawn along a number axis, with these parts labelled:]
lower quartile
upper quartile
median
interquartile range (H)
inner fences
outer fences
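The quantities labelled on the box plot can be computed directly. The following is a minimal sketch, not the method used for the slide: the fence multipliers 1.5 and 3.0 are Tukey's usual convention and are an assumption here (the slide does not state them), the quartile method is Python's "inclusive" interpolation, and the names `box_plot_summary` and `classify` are hypothetical.

```python
import statistics


def box_plot_summary(values, inner=1.5, outer=3.0):
    """Quartiles, interquartile range and fences for a box plot.

    The multipliers 1.5 (inner) and 3.0 (outer) are an assumed convention.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "lower_quartile": q1,
        "median": median,
        "upper_quartile": q3,
        "iqr": iqr,
        "inner_fences": (q1 - inner * iqr, q3 + inner * iqr),
        "outer_fences": (q1 - outer * iqr, q3 + outer * iqr),
    }


def classify(x, summary):
    """Label a value: inside the inner fences, outlier, or extreme."""
    in_lo, in_hi = summary["inner_fences"]
    out_lo, out_hi = summary["outer_fences"]
    if in_lo <= x <= in_hi:
        return "inside"
    if out_lo <= x <= out_hi:
        return "outlier"   # between inner and outer fences
    return "extreme"       # beyond the outer fences


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
print(classify(14, summary), classify(30, summary))
```

Values between the inner and outer fences are labelled outliers and values beyond the outer fences extremes; that split is also an assumed convention.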
5
NOTCHED BOX PLOT
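interval estimate of median:
$I_{D,H} = M \pm \dfrac{1.57\,R_F}{\sqrt{n}}$
where $M$ is the median, $R_F$ the interquartile range and $n$ the sample size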
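As a small worked sketch of the notch limits, assuming the slide's formula reads median ± 1.57 · (interquartile range) / √n. The function name `median_notch` is hypothetical, and the "inclusive" quartile method is an assumption; other quartile definitions give slightly different limits.

```python
import math
import statistics


def median_notch(values):
    """Lower and upper notch limits: median +/- 1.57 * IQR / sqrt(n)."""
    n = len(values)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(n)
    return median - half_width, median + half_width


lower, upper = median_notch([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(lower, upper)
```

When the notches of two box plots do not overlap, this is commonly read as an indication that the two medians differ.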
6
Q-Q PLOT
X: theoretical quantiles of analysed distribution
Y: sample quantiles (measured values)
straight line: ideal coincidence of sample values and theoretical distribution
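The coordinates of a normal Q-Q plot can be sketched as follows. This is an illustration under stated assumptions: the theoretical distribution is the standard normal, the plotting positions (i - 0.5) / n are one common choice among several, and `normal_qq_points` is a hypothetical name.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pairs (theoretical quantile, sample quantile) for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n: an assumed, commonly used choice.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf(p) for p in probs]
    return list(zip(theoretical, ordered))


for x, y in normal_qq_points([41.0, 37.5, 52.0, 45.5, 48.0]):
    print(f"{x:6.3f}  {y}")
```

If the points fall close to a straight line, the sample agrees with the theoretical distribution.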
7
Q-Q PLOT
[Figure: normal Q-Q plot. X axis: observed value (25 to 65). Y axis: expected normal value (-3 to 3).]
8
Q-Q PLOT
[Figure: normal Q-Q plot. X axis: observed value (-20 to 120). Y axis: expected normal value (-3 to 3).]
9
Q-Q PLOT
right sided: skewed to left
left sided: skewed to right
platykurtic ("flat")
leptokurtic ("steep")
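The shapes seen in a Q-Q plot have numerical counterparts in skewness and kurtosis. The sketch below uses the plain moment-based definitions (an assumption; bias-corrected variants exist), and `skewness_kurtosis` is a hypothetical name.

```python
def skewness_kurtosis(values):
    """Moment-based skewness and kurtosis (kurtosis is 3 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis


print(skewness_kurtosis([1, 2, 3, 4, 5]))    # symmetric sample
print(skewness_kurtosis([1, 1, 1, 1, 10]))   # long right tail
```

Positive skewness indicates a longer right tail and negative skewness a longer left tail; kurtosis below 3 corresponds to a platykurtic shape and above 3 to a leptokurtic one.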
12
HISTOGRAM
[Figure: histogram of TLOUSTKY (Sheet1). X axis: TLOUSTKY (20 to 70). Y axis: frequency (0 to 30).]
13
HISTOGRAM
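correct width of interval (from the number of classes $L$ for sample size $n$):
$L = \operatorname{int}\left(2.46\,(n-1)^{0.4}\right)$
$L = \operatorname{int}\left(2\sqrt{n}\right)$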
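A short sketch of both rules, reading the slide's formulas as L = int(2.46 (n - 1)^0.4) and L = int(2 √n), where L is the number of classes. Deriving the interval width as range / L is an assumption about how the class count is applied, and the function names are hypothetical.

```python
import math


def class_count_power(n):
    """Number of classes L = int(2.46 * (n - 1) ** 0.4)."""
    return int(2.46 * (n - 1) ** 0.4)


def class_count_sqrt(n):
    """Number of classes L = int(2 * sqrt(n))."""
    return int(2 * math.sqrt(n))


def class_width(values, classes):
    """Interval width when the data range is split into equal classes."""
    return (max(values) - min(values)) / classes


n = 100
print(class_count_power(n), class_count_sqrt(n))
```

The two rules can give noticeably different class counts for the same sample size, so it is worth comparing the resulting histograms.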
14
HISTOGRAM – kernel density function
[Figure: density estimate of TLOUSTKY (Sheet1). X axis: TLOUSTKY (10 to 80). Y axis: density (0.000 to 0.060).]
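A kernel density estimate of this kind can be sketched with a Gaussian kernel. This is an illustration only: the slide does not say which kernel or bandwidth was used, so the Gaussian kernel and the normal-reference bandwidth 1.06 · s · n^(-1/5) are assumptions, and `gaussian_kde` is a hypothetical name.

```python
from statistics import NormalDist, stdev


def gaussian_kde(values, bandwidth=None):
    """Return a function estimating the density of `values` at a point x."""
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed default).
        bandwidth = 1.06 * stdev(values) * n ** (-0.2)
    kernel = NormalDist()

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return sum(kernel.pdf((x - v) / bandwidth) for v in values) / (n * bandwidth)

    return density


density = gaussian_kde([20.0, 30.0, 40.0, 50.0, 60.0])
print(density(40.0), density(0.0))
```

A smaller bandwidth follows the data more closely and a larger one gives a smoother curve.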
15
TRANSFORMATION
Aim of transformation:
reduction of variance
better level of symmetry (normality) of data
Transformation function:
non-linear function
monotonic function
16
TRANSFORMATION – basic concept
[Figure: transformed data (Y axis, -0.4 to 0.8) plotted against original data, tree-ring widths in mm (X axis, 0 to 3.5). Marked on the plot: the mean of the original data, and the transformed mean with its projection to the original data set.]
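The idea of projecting the transformed mean back to the original scale can be sketched as follows. The logarithm is used here only as an example of a monotonic transformation (an assumption; the slide does not name the function plotted), and `retransformed_mean` is a hypothetical name.

```python
import math
from statistics import mean


def retransformed_mean(values, forward=math.log, inverse=math.exp):
    """Mean computed on the transformed scale, projected back to the original scale."""
    transformed = [forward(v) for v in values]
    return inverse(mean(transformed))


widths = [1.0, 10.0, 100.0]
print(mean(widths))                # mean of original data
print(retransformed_mean(widths))  # transformed mean projected back
```

For right-skewed data the retransformed mean lies below the arithmetic mean and is less affected by the largest values.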
17
TRANSFORMATION – logarithmic transformation
$g(x) = \ln x$
[Figure: two histograms with count on the Y axis. C2: X axis from 0.0 to 800.0, counts up to 15. C7: X axis from 3.0 to 7.0, counts up to 10.]
18
TRANSFORMATION – power transformation
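$g(x) = \begin{cases} x^{\lambda} & \text{for } \lambda > 0 \\ \ln x & \text{for } \lambda = 0 \\ -x^{\lambda} & \text{for } \lambda < 0 \end{cases}$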
19
TRANSFORMATION – Box-Cox
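$g(x) = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\ \ln x & \text{for } \lambda = 0 \end{cases}$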
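A minimal sketch of the Box-Cox transformation in its standard form, (x^λ - 1) / λ for λ ≠ 0 and ln x for λ = 0. The function name `box_cox` and the tolerance used to detect λ = 0 are illustrative choices.

```python
import math


def box_cox(x, lam):
    """Box-Cox transformation of a single positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive data")
    if abs(lam) < 1e-12:
        # Limit of (x**lam - 1) / lam as lam approaches 0.
        return math.log(x)
    return (x ** lam - 1) / lam


print(box_cox(4.0, 0.5))   # square-root-like transformation
print(box_cox(4.0, 0.0))   # logarithm
print(box_cox(4.0, 1.0))   # only shifts the data by 1
```

With λ = 1 the data are only shifted, so the shape of the distribution is unchanged; this is why λ = 1 serves as the "no transformation" reference value.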
20
TRANSFORMATION – Box-Cox
21
TRANSFORMATION – estimate of optimal λ
logarithm of likelihood function for various values of λ
optimal λ
interval estimate of parameter λ
λ = 1 is not included in the interval estimate of λ. This means that the transformation will probably be successful.
[Figure: log-likelihood plotted against λ, with λ = 1.00 marked and a cut-off line at max LF - 0.5 · quantile of χ².]
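The procedure on this slide can be sketched as a grid search over the Box-Cox profile log-likelihood. This is an illustration under stated assumptions: the grid from -2 to 2 in steps of 0.01, the 0.95 confidence level (χ² quantile 3.841 with 1 degree of freedom) and all function names are choices made here, not taken from the slide.

```python
import math

CHI2_95_DF1 = 3.841  # 0.95 quantile of chi-square with 1 degree of freedom (assumed level)


def box_cox(x, lam):
    """Box-Cox transformation of a single positive value."""
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1) / lam


def log_likelihood(values, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(values)
    y = [box_cox(v, lam) for v in values]
    mean_y = sum(y) / n
    var_y = sum((t - mean_y) ** 2 for t in y) / n  # ML variance of transformed data
    return -0.5 * n * math.log(var_y) + (lam - 1) * sum(math.log(v) for v in values)


def estimate_lambda(values, grid=None):
    """Optimal lambda on the grid and its likelihood-based interval estimate."""
    if grid is None:
        grid = [i / 100 for i in range(-200, 201)]
    ll = [log_likelihood(values, lam) for lam in grid]
    best = max(ll)
    lam_hat = grid[ll.index(best)]
    cutoff = best - 0.5 * CHI2_95_DF1  # max LF - 0.5 * chi-square quantile
    inside = [lam for lam, v in zip(grid, ll) if v >= cutoff]
    return lam_hat, (min(inside), max(inside))


# Hypothetical right-skewed sample, for illustration only.
z = [-2.0, -1.5, -1.0, -0.7, -0.4, -0.2, 0.0, 0.2, 0.4, 0.7, 1.0, 1.5, 2.0]
sample = [math.exp(1 + 0.8 * t) for t in z]
lam_hat, (low, high) = estimate_lambda(sample)
print(lam_hat, low, high, "lambda = 1 inside interval:", low <= 1.0 <= high)
```

If the interval reaches an end of the grid, the grid should be widened before the interval is interpreted.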