Why we use exploratory data analysis
1
WHY WE USE EXPLORATORY DATA ANALYSIS
Do data come from a normal distribution?
YES: estimates based on the normal distribution
NO: why?
  kurtosis, skewness: transformations or quantile (robust) estimates
  outliers, extremes: can we remove them?
    YES: remove them
    NO: quantile (robust) estimates or transformations
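The flowchart above can be read as a small decision procedure. The sketch below encodes it in Python, assuming the analyst supplies the answers (for example from the plots and tests listed on the next slide). The names `Cause` and `recommend_estimates` are hypothetical, and the wording of the YES branch for outliers ("remove outliers and re-examine the data") is an assumption, since the slide shows only the arrow.

```python
from enum import Enum


class Cause(Enum):
    """Hypothetical labels for the answer to the 'why?' node of the flowchart."""
    SHAPE = "kurtosis, skewness"
    OUTLIERS = "outliers, extremes"


def recommend_estimates(is_normal, cause=None, can_remove=False):
    """Follow the EDA flowchart and return the suggested approach as text."""
    if is_normal:
        return "estimates based on the normal distribution"
    if cause is Cause.SHAPE:
        # Non-normal shape: change the scale or use quantile-based estimates.
        return "transformation or quantile (robust) estimates"
    if cause is Cause.OUTLIERS:
        if can_remove:
            # Assumption: after removal the data are examined again.
            return "remove outliers and re-examine the data"
        return "quantile (robust) estimates or transformation"
    raise ValueError("for non-normal data, give the cause (Cause.SHAPE or Cause.OUTLIERS)")


print(recommend_estimates(False, Cause.OUTLIERS, can_remove=False))
```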
2
METHODS OF EDA
Graphical:
dot plot
box plot
notched box plot
QQ plot
histogram
density plots
Tests:
tests of normality
minimal sample size
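As one example of a test of normality, the sketch below computes the moment estimates of skewness $g_1 = m_3/m_2^{3/2}$ and excess kurtosis $g_2 = m_4/m_2^2 - 3$ and combines them in the Jarque-Bera statistic $JB = \frac{n}{6}\left(g_1^2 + \frac{g_2^2}{4}\right)$, which is approximately $\chi^2_2$-distributed for normal data. The choice of the Jarque-Bera test is ours, not the slide's, and the function names are hypothetical. Because the $\chi^2_2$ approximation is asymptotic, the p-value is unreliable for small samples, which is one reason why a minimal sample size matters.

```python
import math
from statistics import fmean


def central_moment(data, k):
    """k-th central sample moment (divisor n)."""
    m = fmean(data)
    return fmean((x - m) ** k for x in data)


def skewness(data):
    """Moment estimate g1 = m3 / m2**1.5 (0 for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment estimate g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


def jarque_bera(data):
    """Return (JB statistic, asymptotic p-value from the chi-square(2) law)."""
    n = len(data)
    g1, g2 = skewness(data), excess_kurtosis(data)
    jb = n / 6 * (g1 ** 2 + g2 ** 2 / 4)
    # Chi-square with 2 degrees of freedom has survival function exp(-x / 2).
    return jb, math.exp(-jb / 2)


jb, p = jarque_bera([1.0] * 20 + [100.0])
print(f"JB = {jb:.1f}, p = {p:.2e}")  # one huge value: normality clearly rejected
```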
3
DOT PLOT
4
BOX PLOT
[Figure: box plot drawn over a number axis, labelled with the median, the lower and upper quartiles, the interquartile range (H), and the inner and outer fences on both sides.]
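The parts of the box plot can be computed directly. The sketch below assumes the usual Tukey convention, which the slide does not state: inner fences at $1.5H$ and outer fences at $3H$ beyond the quartiles, values between the inner and outer fences flagged as outliers and values beyond the outer fences as extremes. The quartiles use Python's `statistics.quantiles` with `method="inclusive"`, one of several quartile conventions; `box_plot_summary` is a hypothetical name.

```python
from statistics import quantiles


def box_plot_summary(data):
    """Quartiles, H-spread, inner/outer fences, and flagged points of a box plot."""
    # 'inclusive' linear interpolation is one of several quartile conventions.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    h = q3 - q1  # interquartile range H
    inner = (q1 - 1.5 * h, q3 + 1.5 * h)
    outer = (q1 - 3.0 * h, q3 + 3.0 * h)
    extremes = [x for x in data if x < outer[0] or x > outer[1]]
    outliers = [x for x in data
                if (x < inner[0] or x > inner[1]) and outer[0] <= x <= outer[1]]
    return {"q1": q1, "median": med, "q3": q3, "H": h,
            "inner": inner, "outer": outer,
            "outliers": outliers, "extremes": extremes}


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 15, 40])
print(summary["outliers"], summary["extremes"])  # [15] [40]
```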
5
NOTCHED BOX PLOT
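interval estimate of the median:
$$M \pm 1.57\,\frac{R_F}{\sqrt{n}}$$
where $M$ is the sample median, $n$ is the sample size, and $R_F = F_H - F_D$ is the fourth spread (interquartile range $H$) computed from the lower fourth $F_D$ and the upper fourth $F_H$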
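The notch is the interval estimate of the median, $M \pm 1.57\,R_F/\sqrt{n}$. The sketch below computes it, approximating the fourths $F_D$ and $F_H$ by quartiles from `statistics.quantiles` (an assumption; fourths and quartiles can differ slightly for small samples). `median_notch` is a hypothetical name. When the notches of two boxes do not overlap, this is commonly read as evidence, at roughly the 5% level, that the two medians differ.

```python
import math
from statistics import median, quantiles


def median_notch(data):
    """Interval estimate of the median: M +/- 1.57 * R_F / sqrt(n)."""
    n = len(data)
    m = median(data)
    # Lower and upper fourths approximated by inclusive quartiles (assumption).
    f_d, _, f_h = quantiles(data, n=4, method="inclusive")
    r_f = f_h - f_d  # fourth spread
    half_width = 1.57 * r_f / math.sqrt(n)
    return m - half_width, m + half_width


lo, hi = median_notch([1, 2, 3, 4, 5, 6, 7, 15, 40])
print(round(lo, 3), round(hi, 3))  # 2.907 7.093
```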
6
Q-Q PLOT
X axis: theoretical quantiles of the analysed distribution
Y axis: sample quantiles (measured values)
Points lying on a straight line indicate ideal agreement between the sample values and the theoretical distribution.
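The points of a normal Q-Q plot can be computed with the standard library alone. The sketch below pairs each sorted observation with a quantile of $N(0, 1)$ using the plotting positions $(i - 0.5)/n$; this choice is an assumption, since other conventions such as $i/(n + 1)$ are also in use. `normal_qq_points` is a hypothetical name. If the data are normal, the pairs lie close to a straight line whose intercept and slope roughly estimate the mean and standard deviation.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pairs (theoretical N(0,1) quantile, sample quantile) for a normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n; other conventions exist.
    return [(std_normal.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, start=1)]


for theo, obs in normal_qq_points([4.1, 2.3, 3.0, 5.2, 3.6]):
    print(f"{theo:6.3f}  {obs}")
```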
7
Q-Q PLOT
[Figure: normal Q-Q plot; x-axis: observed value (about 25 to 65), y-axis: expected normal value (-3 to 3).]
8
Q-Q PLOT
[Figure: normal Q-Q plot; x-axis: observed value (about -20 to 120), y-axis: expected normal value (-3 to 3).]
9
Q-Q PLOT
right-sided distribution: skewed to the left (negative skewness)
left-sided distribution: skewed to the right (positive skewness)
platykurtic ("flat") and leptokurtic ("steep") distributions
10
11
12
HISTOGRAM
[Figure: histogram of TLOUSTKY (thickness); x-axis: thickness (about 20 to 70), y-axis: frequency.]
13
HISTOGRAM
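appropriate number of intervals (classes) L, which sets the interval width $(x_{\max} - x_{\min})/L$:
$$L = \operatorname{int}\left[2.46\,(n-1)^{0.4}\right] \quad\text{or}\quad L = \operatorname{int}\left[2\sqrt{n}\right]$$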
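The two class-count rules and the resulting equal-width histogram can be sketched as follows. The function names are hypothetical, and splitting the range $[x_{\min}, x_{\max}]$ into $L$ equal classes with the maximum placed in the last class is an assumption about how the classes are built.

```python
import math


def class_count(n):
    """Number of classes L = int[2.46 (n - 1)^0.4]."""
    return int(2.46 * (n - 1) ** 0.4)


def class_count_sqrt(n):
    """Alternative rule L = int[2 * sqrt(n)]."""
    return int(2 * math.sqrt(n))


def histogram_counts(data, n_classes):
    """Equal-width classes spanning [min, max]; returns (width, counts)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for x in data:
        # The maximum falls on the right edge, so clamp it into the last class.
        k = min(int((x - lo) / width), n_classes - 1)
        counts[k] += 1
    return width, counts


print(class_count(100), class_count_sqrt(100))  # 15 20
print(histogram_counts(list(range(11)), 5))      # (2.0, [2, 2, 2, 2, 3])
```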
14
HISTOGRAM – kernel density estimate
[Figure: kernel density estimate of TLOUSTKY (thickness); x-axis: thickness (about 10 to 80), y-axis: density (0 to 0.06).]
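A kernel density estimate replaces each observation with a small smooth bump and averages them, giving a smooth alternative to the histogram. The slide does not say which kernel or bandwidth was used; the sketch below assumes a Gaussian kernel and Silverman's rule of thumb $h = 1.06\,s\,n^{-1/5}$. `gaussian_kde` is a hypothetical name here and is not the SciPy class of the same name.

```python
import math
from statistics import stdev


def gaussian_kde(data, h=None):
    """Return a Gaussian kernel density estimate f(x) and its bandwidth h."""
    n = len(data)
    if h is None:
        # Silverman's normal-reference rule of thumb (an assumption, not from the slides).
        h = 1.06 * stdev(data) * n ** (-1 / 5)
    scale = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        # Average of normal bumps of width h centred on each observation.
        return scale * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return density, h


f, h = gaussian_kde([1, 2, 3, 4, 5])
print(f"bandwidth h = {h:.3f}, f(3) = {f(3):.3f}")
```

A smaller $h$ follows the data more closely but gives a bumpier curve; a larger $h$ smooths more and can hide features such as a second mode.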
15
TRANSFORMATION
Aims of transformation:
reduction of variance
better symmetry (normality) of the data
Transformation function:
non-linear
monotonic
16
TRANSFORMATION – basic concept
[Figure: transformed data plotted against the original data (tree-ring widths in mm, 0 to 3.5), marking the mean of the original data and the transformed mean with its projection back onto the original data scale.]
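The point of the figure is that the mean computed on the transformed scale and projected back generally differs from the mean of the original data. For the logarithmic transformation the projected mean is the geometric mean, which never exceeds the arithmetic mean for positive data. The sketch below uses illustrative values, not the tree-ring data from the slide, and the helper name `back_transformed_mean` is hypothetical.

```python
import math
from statistics import fmean


def back_transformed_mean(data, transform, inverse):
    """Mean computed on the transformed scale, then projected back to the original scale."""
    return inverse(fmean(transform(x) for x in data))


widths = [1.0, 10.0, 100.0]  # illustrative values, not data from the slide
print(fmean(widths))                                      # 37.0
print(back_transformed_mean(widths, math.log, math.exp))  # about 10 (the geometric mean)
```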
17
TRANSFORMATION – logarithmic transformation
[Figure: histograms of the original data x (about 0 to 800) and of the log-transformed data ln x (about 3 to 7); y-axis: count.]
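The effect shown by the two histograms can be checked numerically. As a rough indicator of symmetry (our assumption, not a measure used on the slide), the sketch below compares the mean and the median before and after taking $\ln x$ for an illustrative right-skewed sample.

```python
import math
from statistics import fmean, median

x = [2.0 ** k for k in range(9)]  # illustrative right-skewed data: 1, 2, 4, ..., 256
ln_x = [math.log(v) for v in x]

# For right-skewed data the mean lies well above the median; after ln x the gap closes.
print(fmean(x) - median(x))        # about 40.8
print(fmean(ln_x) - median(ln_x))  # about 0
```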
18
TRANSFORMATION – power transformation
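$$x^{(\lambda)} = \begin{cases} x^{\lambda} & \text{for } \lambda > 0 \\ \ln x & \text{for } \lambda = 0 \\ -x^{\lambda} & \text{for } \lambda < 0 \end{cases}$$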
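The power transformation uses three cases: $x^{\lambda}$ for $\lambda > 0$, $\ln x$ for $\lambda = 0$ and $-x^{\lambda}$ for $\lambda < 0$. A direct sketch follows (hypothetical name `power_transform`, positive data assumed). The minus sign for negative $\lambda$ keeps the transformation increasing, so the ordering of the data is preserved.

```python
import math


def power_transform(x, lam):
    """Power transformation: x**lam (lam > 0), ln x (lam = 0), -x**lam (lam < 0)."""
    if x <= 0:
        raise ValueError("the power transformation needs x > 0")
    if lam > 0:
        return x ** lam
    if lam == 0:
        return math.log(x)
    # The minus sign keeps the transformation increasing for negative lambda.
    return -(x ** lam)


print([power_transform(v, -1) for v in (1, 2, 4)])  # [-1.0, -0.5, -0.25]
```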
19
TRANSFORMATION – Box-Cox
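$$x^{(\lambda)} = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\ \ln x & \text{for } \lambda = 0 \end{cases}$$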
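The Box-Cox form $(x^{\lambda} - 1)/\lambda$ differs from the plain power transformation in one useful way: as $\lambda \to 0$ it tends to $\ln x$, so the family is continuous in $\lambda$. The sketch below (hypothetical name `box_cox`, positive data assumed) treats $|\lambda| < 10^{-12}$ as zero; this tolerance is our choice.

```python
import math


def box_cox(x, lam, eps=1e-12):
    """Box-Cox transformation: (x**lam - 1) / lam, or ln x when lam is (numerically) 0."""
    if x <= 0:
        raise ValueError("the Box-Cox transformation needs x > 0")
    if abs(lam) < eps:
        return math.log(x)
    return (x ** lam - 1) / lam


# Continuity at lambda = 0: a small lambda gives nearly ln x.
print(box_cox(5.0, 1e-6), math.log(5.0))
```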
20
TRANSFORMATION – Box-Cox
21
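TRANSFORMATION – estimate of optimal λ
logarithm of the likelihood function $\ln L(\lambda)$ for various values of $\lambda$; the optimal $\hat{\lambda}$ lies at its maximum
interval estimate of the parameter $\lambda$: all $\lambda$ with $\ln L(\lambda) \ge \ln L_{\max} - 0.5\,\chi^2_{1}(1-\alpha)$
$\lambda = 1$ (no transformation, only a shift of the data) is not included in the interval estimate of $\lambda$. It means that the transformation will probably be successful.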
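The estimate and its interval can be sketched with a grid search. The sketch assumes the standard Box-Cox profile log-likelihood $\ln L(\lambda) = -\frac{n}{2}\ln s^2(\lambda) + (\lambda - 1)\sum_i \ln x_i$ (up to a constant), where $s^2(\lambda)$ is the variance, with divisor $n$, of the transformed data. The grid range $[-2, 2]$, the step 0.01 and $\alpha = 0.05$ are our choices, and all names are hypothetical. The $\chi^2_1$ quantile is obtained as the square of a standard normal quantile, which holds exactly for one degree of freedom.

```python
import math
from statistics import NormalDist, pvariance


def box_cox(x, lam):
    """Box-Cox transformation (x > 0)."""
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1) / lam


def box_cox_loglik(data, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(data)
    y = [box_cox(x, lam) for x in data]
    log_jacobian = sum(math.log(x) for x in data)
    return -0.5 * n * math.log(pvariance(y)) + (lam - 1) * log_jacobian


def estimate_lambda(data, lo=-2.0, hi=2.0, step=0.01, alpha=0.05):
    """Grid-search the optimal lambda and its likelihood-based interval estimate."""
    steps = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 10) for i in range(steps + 1)]
    ll = [box_cox_loglik(data, lam) for lam in grid]
    best = max(range(len(grid)), key=lambda i: ll[i])
    # chi-square(1) quantile as the square of a standard normal quantile.
    chi2_q = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    threshold = ll[best] - 0.5 * chi2_q
    accepted = [lam for lam, v in zip(grid, ll) if v >= threshold]
    return grid[best], (min(accepted), max(accepted))


data = [2.0 ** k for k in range(9)]  # illustrative right-skewed data
lam_hat, (lam_lo, lam_hi) = estimate_lambda(data)
print(lam_hat, (lam_lo, lam_hi))
print("lambda = 1 inside interval:", lam_lo <= 1 <= lam_hi)  # False
```

A grid search keeps the sketch dependency-free; a numerical optimizer would locate $\hat{\lambda}$ more precisely, but the interval check against $\lambda = 1$ works the same way.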