8/16/2019 5 Data Screening
http://slidepdf.com/reader/full/5-data-screening 1/33
DATA SCREENING &
TRANSFORMATION
STATISTICAL ANALYSIS
Data Screening/Cleaning
Before conducting any statistical test, the data need to be checked
There is always potential for errors to occur in:
-measurement
-transcribing
-entering the data
Qualitative Data
Coding is important: numerical codes are recommended to represent groups
Easy to check, since each variable can take only one of a limited number of values
Any value that is not allowable must be an error
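The check described above can be sketched in a few lines of Python. The coding scheme (1 = male, 2 = female), the function name and the data are assumptions made for illustration only; they are not taken from the slides.

```python
# Hypothetical coding scheme, assumed for illustration: 1 = male, 2 = female.
ALLOWED_SEX_CODES = {1, 2}


def find_invalid_codes(values, allowed):
    """Return (row_index, value) pairs whose value is not an allowable code."""
    return [(i, v) for i, v in enumerate(values) if v not in allowed]


# Made-up entries; 3 and 11 stand in for typing errors.
sex = [1, 2, 2, 1, 3, 2, 11, 1]
invalid = find_invalid_codes(sex, ALLOWED_SEX_CODES)
print(invalid)
```

Every flagged row would then be checked against the original record before being corrected.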
Numerical codes are recommended over string/text values because text is case-sensitive: a lowercase ‘male’ is treated as a different value from a capitalised ‘Male’.
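A small sketch of why text labels are fragile, using made-up entries: the same group splits into several categories unless the labels are normalised first. The normalisation shown (strip spaces, lowercase) is one possible choice, not a rule from the slides.

```python
from collections import Counter

# Made-up text entries for one variable; note the mixed case and a trailing space.
labels = ["male", "Male", "female", "MALE ", "Female"]

# Counting the raw strings treats every spelling as its own group.
raw_counts = Counter(labels)

# Stripping whitespace and lowercasing collapses them to the intended groups.
clean_counts = Counter(label.strip().lower() for label in labels)

print(len(raw_counts), "distinct raw labels")
print(dict(clean_counts))
```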
Quantitative Data
Very prone to errors & difficult to check
Out-of-range numbers should be removed
e.g. a decimal point is easily misplaced; any value that is out of range needs further checking
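A range check of this kind might look as follows. The variable (height in metres), the plausible limits and the data are all assumptions chosen for illustration.

```python
def flag_out_of_range(values, low, high):
    """Return (row_index, value) pairs that fall outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]


# Made-up heights in metres; 17.1 and 0.168 mimic a misplaced decimal point.
heights_m = [1.62, 1.75, 17.1, 1.58, 0.168, 1.80]

# The plausible range of 1.2 to 2.2 m is an assumption for this example.
suspect = flag_out_of_range(heights_m, 1.2, 2.2)
print(suspect)
```

The flagged values are candidates for further checking rather than automatic deletion.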
Testing For Normality
3 ways:
1) Graphs: histogram, Q-Q plots
2) Descriptive statistics: mean, median, skewness, kurtosis
3) Formal statistical tests: Kolmogorov-Smirnov one-sample test (K-S test), Shapiro-Wilk test
Graphs: Histogram & Q-Q plots
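As a rough sketch, the points of a normal Q-Q plot can be computed with the Python standard library. The plotting positions (i - 0.5)/n are one common convention and are an assumption here, as are the function name and the sample; statistical packages may use slightly different positions.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(data):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    n = len(data)
    ordered = sorted(data)
    # Normal distribution with the sample's own mean and standard deviation.
    fitted = NormalDist(mean(data), stdev(data))
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(ordered, start=1)]


sample = [4.1, 5.0, 5.2, 5.9, 6.3, 6.8, 7.4]  # made-up measurements
points = normal_qq_points(sample)
for theoretical, observed in points:
    print(f"{theoretical:6.2f} {observed:6.2f}")
```

If the data are close to normal, the pairs lie near the line where the observed value equals the theoretical quantile.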
Skewness & Kurtosis
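A minimal sketch of the two statistics, using the simple moment-based definitions. This is an assumption about which formulas to show: packages such as SPSS report bias-adjusted versions, so their values will differ somewhat for small samples. The data are made up.

```python
def central_moment(data, k):
    """k-th central moment, dividing by n."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)


def skewness(data):
    """Moment-based skewness: 0 for a symmetric sample, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]           # made-up, symmetric
right_skewed = [1, 1, 2, 2, 3, 10]    # made-up, one large value in the right tail

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), excess_kurtosis(right_skewed))
```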
Formal Statistical Test
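To give a feel for what the K-S test measures, the sketch below computes only the test statistic D: the largest gap between the sample's empirical distribution and a normal distribution fitted with the sample mean and standard deviation. It deliberately stops short of a p-value, because with estimated parameters the p-value needs a correction (Lilliefors) that is best left to a statistics package. The function name and the data are assumptions.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(data):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    n = len(data)
    ordered = sorted(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]     # made-up, symmetric
heavily_skewed = [1, 1, 1, 1, 2, 2, 3, 8, 30]   # made-up, long right tail

d_even = ks_statistic_normal(evenly_spread)
d_skewed = ks_statistic_normal(heavily_skewed)
print(d_even, d_skewed)
```

A larger D means a poorer fit to the normal curve; here the skewed sample gives the larger value.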
Flowchart for Formal Statistical Test
DATA TRANSFORMATION
WHY? Many statistical models are based on the mean and require it to be an appropriate measure of central tendency (the data need to be treated as normal)
Linear least-squares regression assumes that the relationship between two variables is linear
Can “straighten” a nonlinear relationship by transforming one or both of the variables
Often transformations will ‘fix’ problem distributions so that we can use least-squares regression
When transformations fail: use nonparametric regression, which makes fewer assumptions about the data
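The "straightening" idea can be illustrated with a made-up exponential relationship: the correlation between x and y is well below 1, while the correlation between x and log(y) is essentially 1, because log(y) is an exact linear function of x. The data, the coefficients and the helper function are assumptions for this sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = list(range(1, 11))
y = [2.0 * math.exp(0.5 * xi) for xi in x]  # made-up nonlinear relationship

r_raw = pearson_r(x, y)                            # curved: linear fit is poor
r_log = pearson_r(x, [math.log(yi) for yi in y])   # straight line after log

print(r_raw, r_log)
```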
TYPICAL TRANSFORMATION TYPES
-Logarithmic
-Square Root
-Angular
-Box-Cox
-Reciprocal(inverse)
-Power
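For reference, the listed transformations can be written as one-line Python functions. These follow the usual textbook definitions, which are assumed here rather than taken from the slides: base-10 logarithm, arcsine of the square root for the angular transformation of proportions, and the standard one-parameter Box-Cox form. The function names are illustrative.

```python
import math


def log_transform(x):
    """Base-10 logarithm; requires x > 0."""
    return math.log10(x)


def sqrt_transform(x):
    """Square root; requires x >= 0."""
    return math.sqrt(x)


def angular_transform(p):
    """Arcsine square root of a proportion 0 <= p <= 1, in radians."""
    return math.asin(math.sqrt(p))


def reciprocal_transform(x):
    """Reciprocal (inverse); requires x != 0."""
    return 1.0 / x


def power_transform(x, exponent):
    """Raise x to a chosen power, e.g. exponent=2 for squaring."""
    return x ** exponent


def box_cox(x, lam):
    """One-parameter Box-Cox; requires x > 0. lam = 0 gives the natural log."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


print(log_transform(100), sqrt_transform(9), reciprocal_transform(4))
```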
LOGARITHMIC TRANSFORMATIONS
SQUARE ROOT
TRANSFORMATIONS
RECIPROCAL/INVERSE
TRANSFORMATION
CAN ALL DATA BE TRANSFORMED?
A transformation may be skipped even though it can be done, because the results are difficult to interpret afterwards
If we have a large number of identical observations, usually at zero, then any transformation will still leave them (for example, half the observations) sharing the same value, at the extreme of the distribution. It is impossible to transform these data to a normal distribution.
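The point about tied observations can be checked on a made-up sample in which half the values are zero. Whatever monotonic transformation is applied, the tied values stay tied, so the same share of the sample remains piled up at the minimum. The data and the choice of transformations are assumptions for this sketch.

```python
import math


def share_at_minimum(values):
    """Fraction of observations equal to the smallest value."""
    lowest = min(values)
    return sum(1 for v in values if v == lowest) / len(values)


counts = [0, 0, 0, 0, 0, 1, 2, 4, 9, 16]  # made-up data, half of it zeros

share_raw = share_at_minimum(counts)
share_sqrt = share_at_minimum([math.sqrt(c) for c in counts])
share_log = share_at_minimum([math.log(c + 1) for c in counts])  # log(x + 1) to allow zeros

print(share_raw, share_sqrt, share_log)
```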
We can use non-parametric methods (e.g. the Mann-Whitney test). These will give us a significance test but usually no confidence interval
There are some data which should not be transformed. Sometimes we are interested in the data in the actual units only, e.g. cost data
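As a rough illustration of the Mann-Whitney idea, the sketch below computes the U statistics by comparing every pair of observations across the two groups, counting ties as one half. It stops at the statistic; turning U into a p-value requires tables or a statistics package. The function name and the data are assumptions.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b): pairwise wins for each group, with ties counted as 0.5."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


group_a = [12, 15, 18, 20]  # made-up measurements
group_b = [8, 9, 11, 12]    # made-up measurements

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

Because only the ordering of the values is used, the result is unchanged by any transformation that preserves order, which is why no normalising transformation is needed.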
Practical Session
Open SPSS Data: sga
Data transformation