mastering data analysis tools.pptx
Post on 14-Apr-2018
-
7/30/2019 Mastering Data Analysis Tools.pptx
Data Analysis
Using SPSS
Know where to find information and
how to use it, that's the secret of success: Albert Einstein
-
Workshop Objectives:
To develop Data Analysis Skills
Use of appropriate Statistical Techniques
Use of SPSS to perform Statistical Analysis
Understanding and Interpretation of Results
-
Types of Analysis:
Descriptive Analysis
Inferential Analysis
Model Building Techniques
Multivariate Analysis
IBM SPSS
-
Before we start analyzing data, let
me introduce SPSS and its basic structure
-
Descriptive Analysis:
Descriptive Analysis for Qualitative Variables
Descriptive Analysis for Quantitative Variables
-
Descriptive Analysis of Qualitative Data
Qualitative Data (Categorical Data)
Tables: One-Way Table, Two-Way Table, ..., N-Way Table
Graphs: Bar Chart, Pie Chart, Clustered Bar Chart
Numbers: Percentages
-
Descriptive Analysis for Quantitative Data
Quantitative Data (Numerical Data)
Tables: Frequency Distribution
Graphs: Stem and Leaf, Histogram, Box-Plot
Numbers:
  Center: Mean, Median, Mode, Geometric Mean, Harmonic Mean, Trimmed Mean
  Important Points: Median, Quartiles, Percentiles
  Variation: Range, Inter Quartile Range, Variance, Standard Deviation
  Distribution: Skewness, Kurtosis
-
Tabular Methods
-
Graphical Methods
[Figure: Histogram. Vertical axis: Frequency (0 to 250). Horizontal axis: bins from 15750 to 115750.]
-
Numerical Methods
-
Practice Session for Descriptive Analysis
Import Customers Databas.xls into SPSS
Label the data properly
Make one-way tables for the variables Age, Sex, OwnHome and Married. Also make a pie chart and a bar chart for these variables
Make two-way tables (Sex by OwnHome and Married by OwnHome). Also make a clustered bar chart for each table
Produce detailed numerical descriptive statistics for the variable Purchases (Mean, Median, ...). Also make a histogram, a stem-and-leaf plot and a box-plot for the variable Purchases
Perform the previous step by gender
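For readers who want to check the numerical part of this exercise outside SPSS, the following is a minimal sketch using only the Python standard library. The `purchases` values are made up for illustration and are not taken from the workshop file; the function name `describe` is likewise an assumption.

```python
import statistics

# Hypothetical purchase amounts (illustrative only; not the workshop data)
purchases = [120, 150, 150, 180, 200, 210, 250, 300, 320, 400]

def describe(values):
    """Return basic measures of center and variation for a numeric list."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }

summary = describe(purchases)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Quartile conventions differ between packages, so the interquartile range printed here may not match SPSS output exactly.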
-
Inferential Analysis: Comparing Groups
-
Parametric & Non-Parametric Inference
Comparing Groups
  One Group
    Measured Once
      Normality Assumption: Fulfilled / Not Fulfilled
    Measured Twice
      Normality Assumption: Fulfilled / Not Fulfilled
  Two Groups
    Normality Assumption: Fulfilled / Not Fulfilled
    Homogeneity of Variances Assumption: Fulfilled / Not Fulfilled
      Normality + Equal Variances
      Normality + Unequal Variances
  More than Two Groups
    Normality Assumption: Fulfilled / Not Fulfilled
    Homogeneity of Variances Assumption: Fulfilled / Not Fulfilled
      Normality + Equal Variances
      Normality + Unequal Variances
-
Comparing One Group
Kinds of Research Questions
For the one-sample situation, the prime concern in research is
examining a measure of central tendency (location) for the
population of interest. The best-known measures of location are the mean and median. For a one-sample situation, we
might want to know if the average waiting time in a doctor's
office is greater than one hour, or if the average growth of
roses is 4 inches or more with a certain fertilizer, or if the annual return is 10.2% for the banks that exercised comprehensive
planning.
-
Comparing Two Groups
Kinds of Research Questions
One of the most common tasks in research is to compare two populations (groups). We might want to compare the income level of two regions, the nitrogen content of two lakes, or the effectiveness of two drugs.
The first question that arises is which aspects (parameters) of the populations we should compare. We might consider comparing the averages, the medians, the standard deviations, the distributional shapes (histograms), or maximum values. We base the comparison parameter on our particular problem.
Perhaps the simplest comparison that we can make is between the means of the two populations.
-
Comparing more than two Groups
Kinds of Research Questions
One of the most common tasks in research is to compare several populations (groups). We might want to compare the income level of three regions, the nitrogen content of four lakes, or the effectiveness of four drugs.
The first question that arises concerns which aspects (parameters) of the populations we should compare. We might consider comparing the means, medians, standard deviations, distributional shapes (histograms), or maximum values. We base the comparison parameter on our particular problem.
Perhaps the simplest comparison that we can make is to compare the means of several populations.
-
One Sample t-test
One Sample t-test is used to compare one group to a
given standard on the basis of Arithmetic Average
(Mean).
-
Assumptions of the One-sample t-test
The data are continuous.
The data follow the Normal distribution.
The sample is a simple random sample from the
population.
-
Hypotheses and Formulas
$H_0: \mu = \mu_0, \quad H_A: \mu \neq \mu_0$
$t = \dfrac{\bar{X} - \mu_0}{\sqrt{s^2 / n}}$
With
$df = n - 1$
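The one-sample statistic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS procedure itself: the function name `one_sample_t` and the diameter values are hypothetical, and only the t statistic and degrees of freedom are computed (no p-value).

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for testing H0: mu = mu0 against a given standard."""
    n = len(sample)
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)      # sample variance s^2 (n - 1 denominator)
    t = (mean - mu0) / math.sqrt(s2 / n)  # t = (X-bar - mu0) / sqrt(s^2 / n)
    return t, n - 1

# Hypothetical measurements in millimeters (illustrative only)
diameters = [322.1, 321.9, 322.3, 322.0, 321.8, 322.2]
t_stat, df = one_sample_t(diameters, 322.0)
print(f"t = {t_stat:.4f}, df = {df}")
```

The resulting t would then be compared with the t distribution on n - 1 degrees of freedom.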
-
Case Study
A manufacturer of high-performance automobiles
produces disc brakes that must measure 322 millimeters
in diameter. The quality control manager randomly selects
128 discs and measures their diameters.
We can use the One Sample T Test to determine whether or
not the mean diameter of the brakes in the sample
differs significantly from 322 millimeters.
-
SPSS Analytic Procedure
-
The Sign Test
The sign test is perhaps the oldest of all the nonparametric procedures. This nonparametric test is based on the binomial distribution. It assumes two mutually exclusive outcomes, constant or stable probability of success or failure, and n independent trials.
The terminology, sign test, reinforces the point that the data are converted to a series of plus and minus signs. The test is based on the number of plus signs that occur. Zero differences are thrown out, and the sample size is reduced accordingly.
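The description above (plus and minus signs, zeros discarded, binomial distribution) can be sketched as an exact two-sided test. The function name `sign_test` and the salary figures are assumptions made for illustration; the p-value uses the Binomial(n, 0.5) distribution directly rather than a normal approximation.

```python
from math import comb

def sign_test(values, hypothesized_median):
    """Exact two-sided sign test. Returns (plus_count, n_used, p_value)."""
    # Convert to signed differences and drop zeros, reducing the sample size
    diffs = [v - hypothesized_median for v in values if v != hypothesized_median]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    # Under H0 the number of plus signs follows Binomial(n, 0.5)
    k = min(plus, n - plus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return plus, n, p_value

# Hypothetical salaries in thousands (illustrative only)
salaries = [52, 48, 55, 50, 61, 47, 58, 53, 50, 49]
plus_count, n_used, p_value = sign_test(salaries, 50)
print(plus_count, n_used, p_value)
```

Two of the ten values equal the hypothesized median, so the test runs on the remaining eight.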
-
Assumptions of the Sign Test
The data are continuous
The distribution of these data is symmetric.
The measurement scale is at least interval.
-
Hypotheses and Formulas
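$H_0: M = M_0$
$H_A: M \neq M_0, \; M > M_0, \; M < M_0$
$Z = \dfrac{w - \mu_w}{\sigma_w}$
$\mu_w = \dfrac{n(n+1)}{4}$
$\sigma_w = \sqrt{\dfrac{n(n+1)(2n+1)}{24}}$
$w = \sum R^{+}$, the sum of the ranks of the positive differences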
-
Case Study
A researcher believes that the median salary of HR managers is 50 thousand. To confirm this hypothesis, he selects a random sample of 1207 HR managers from different companies.
We can use the Sign Test to determine whether or not the median salary is significantly different from 50 thousand.
-
SPSS Analytic Procedure
-
Paired Samples t-test
Kinds of Research Questions
In the paired case, we take two measurements on the same individual at different times, or we have one measurement on each individual of a pair.
Examples of the first case are two insurance-claim adjusters assessing the damage for the same 15 cases, or the evaluation of the improvement in aerobic fitness for 15 subjects where measurements are made at the beginning of the fitness program and at the end of it.
An example of the second paired situation is the testing of the effectiveness of two drugs, A and B, on 20 pairs of patients who have been matched on physiological and psychological variables. One patient in the pair receives drug A, and the other patient gets drug B.
-
Assumptions of the paired-sample t-test
The data are continuous.
The data, i.e., the differences for the matched-pairs,
follow a Normal distribution.
The sample of pairs is a simple random sample from
its population.
-
Hypotheses and Formulas
$H_0: \mu_d = 0, \quad H_A: \mu_d \neq 0$
$t = \dfrac{\bar{X}_d - \mu_d}{\sqrt{s_d^2 / n}}$
With
$df = n - 1$
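A paired test is a one-sample test on the within-pair differences, which the short sketch below makes explicit. The function name `paired_t` and the before/after counts are hypothetical and chosen only to show the arithmetic.

```python
import math
import statistics

def paired_t(before, after, mu_d=0.0):
    """Return (t, df) for the paired-samples t-test on after - before."""
    diffs = [a - b for b, a in zip(before, after)]  # one difference per pair
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    s2_d = statistics.variance(diffs)               # sample variance of the differences
    t = (mean_d - mu_d) / math.sqrt(s2_d / n)
    return t, n - 1

# Hypothetical before/after measurements for 5 subjects (illustrative only)
before = [8, 6, 7, 5, 9]
after = [6, 5, 7, 3, 6]
t_paired, df_paired = paired_t(before, after)
print(f"t = {t_paired:.4f}, df = {df_paired}")
```

A negative t here means the values after treatment are lower on average than before.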
-
Case Study
A researcher in behavioral medicine believes that stress often makes asthma symptoms worse for people who suffer from this respiratory disorder. Therefore, the researcher decides to study the effect of relaxation training on the severity of their symptoms.
A sample of 5 patients is selected. During the week before treatment, the investigator records the severity of their symptoms by measuring how many doses of medication are needed for asthma attacks. Then the patients receive relaxation training. For the week following the training the researcher once again records the number of doses used by each patient.
Data from Gravetter and Wallnau (4th Ed.) p. 319.
-
SPSS Analytic Procedure
-
Wilcoxon Signed Rank test
The Wilcoxon Signed Rank test is used to test whether the
median difference is zero when the populations are
not normal.
-
Assumptions of the Wilcoxon Signed Rank test
The differences are continuous.
The distribution of these differences is symmetric.
The differences are mutually independent.
The measurement scale is at least interval.
-
Hypotheses and Formulas
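$H_0: M_1 = M_2$
$H_1: M_1 \neq M_2$
$Z = \dfrac{w - \mu_w}{\sigma_w}$
$\mu_w = \dfrac{n(n+1)}{4}$
$\sigma_w = \sqrt{\dfrac{n(n+1)(2n+1)}{24}}$
$w = \sum R^{+}$, the sum of the ranks of the positive differences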
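The normal approximation for the signed-rank statistic can be sketched as follows. The helper names `average_ranks` and `wilcoxon_signed_rank_z` and the scores are assumptions for illustration. Tied absolute differences receive average ranks, zero differences are dropped, and no tie correction is applied to the standard deviation, so results may differ slightly from SPSS when ties are present.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank_z(before, after):
    """Return (w, z): sum of positive ranks and its normal approximation."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu_w = n * (n + 1) / 4
    sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w, (w - mu_w) / sigma_w

# Hypothetical test scores before and after a new method (illustrative only)
scores_before = [60, 72, 55, 80, 66, 90, 75, 58]
scores_after = [68, 75, 54, 92, 70, 95, 73, 65]
w_stat, z_stat = wilcoxon_signed_rank_z(scores_before, scores_after)
print(w_stat, round(z_stat, 4))
```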
-
Case Study
An educationist wants to assess the effectiveness of a
new teaching method. For this, she selected 600
students and recorded their scores on a test of 150
marks. The scores were recorded before and after the new teaching method.
The Wilcoxon Signed Rank test can be used to test
the effectiveness of the new teaching method.
-
SPSS Analytic Procedure
-
Independent Samples t-test
Equal Variances
Independent sample t test is used to compare two
groups on the basis of their averages.
-
Assumptions of the two-sample t-test
The data are continuous
The data follow the Normal distribution.
The variances of the two populations are equal
The two samples are independent
Both samples are simple random samples from their
respective populations.
-
Hypotheses and Formulas
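$H_0: \mu_1 = \mu_2, \quad H_A: \mu_1 \neq \mu_2$
$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
With
$df = n_1 + n_2 - 2$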
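The pooled-variance statistic can be sketched as below, taking the hypothesized difference under the null as zero. The function name `pooled_t` and the spending figures are hypothetical.

```python
import math
import statistics

def pooled_t(x1, x2):
    """Return (t, df) for the equal-variance independent-samples t-test."""
    n1, n2 = len(x1), len(x2)
    s1_sq = statistics.variance(x1)
    s2_sq = statistics.variance(x2)
    # Pooled variance: weighted average of the two sample variances
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical spending for two ad groups (illustrative only)
promo_ad = [1650, 1720, 1580, 1810, 1690]
standard_ad = [1560, 1610, 1500, 1640, 1590]
t_pooled, df_pooled = pooled_t(promo_ad, standard_ad)
print(f"t = {t_pooled:.4f}, df = {df_pooled}")
```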
-
Case Study
An analyst at a department store wants to evaluate a
recent credit card promotion. To this end, 500
cardholders were randomly selected. Half received
an ad promoting a reduced interest rate on purchases made over the next three months, and
half received a standard seasonal ad.
We can use Independent-Samples T Test to compare
the spending of the two groups.
-
SPSS Analytic Procedure
-
Independent Samples t-test
Unequal Variances
Independent Samples t-test is used to compare two
independent groups on the basis of average. This test
does not require homogeneity of the variances.
-
Hypotheses and Formulas
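$H_0: \mu_1 = \mu_2, \quad H_A: \mu_1 \neq \mu_2$
$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
With
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}$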
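The unequal-variance statistic and its approximate degrees of freedom can be sketched as follows, again with the hypothesized difference taken as zero. The function name `welch_t` and the expenditure values are hypothetical.

```python
import math
import statistics

def welch_t(x1, x2):
    """Return (t, df) for the unequal-variance independent-samples t-test."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1  # s1^2 / n1
    v2 = statistics.variance(x2) / n2  # s2^2 / n2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(v1 + v2)
    # Approximate degrees of freedom; generally not an integer
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical monthly expenditures for two groups of students (illustrative only)
group_a = [210, 340, 150, 420, 280, 390]
group_b = [250, 260, 240, 270, 255]
t_welch, df_welch = welch_t(group_a, group_b)
print(f"t = {t_welch:.4f}, df = {df_welch:.2f}")
```

When both groups have the same size and the same sample variance, the degrees of freedom reduce to n1 + n2 - 2, matching the equal-variance test.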
-
Case Study
A researcher wishes to compare the
expenditure behavior of students. One of
the research questions is whether
expenditures differ by gender.
-
SPSS Analytic Procedure
-
Mann-Whitney Test
Mann-Whitney Test is used to compare the
two independent groups on the basis of
medians. This test does not require the
assumption of normality.
-
Mann-Whitney U Test Assumptions
The variable of interest is continuous. The measurement scale
is at least ordinal.
The probability distributions of the two populations are
identical, except for location.
The two samples are independent.
Both samples are simple random samples from their
respective populations.
-
Hypotheses and Formulas
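$H_0: M_1 = M_2$
$H_A: M_1 \neq M_2$
$u = w - \dfrac{n_1(n_1 + 1)}{2}$
$\mu_u = \dfrac{n_1 n_2}{2}$
$\sigma_u = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
$z = \dfrac{u - \mu_u}{\sigma_u}$
$w$ is the sum of ranks of the smaller sample (of size $n_1$)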
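A minimal sketch of the Mann-Whitney normal approximation follows, assuming U is formed from the rank sum of the smaller sample as U = W - n1(n1 + 1)/2. The helper names and the birth-weight values are hypothetical, and no tie correction is applied to the standard deviation.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_z(sample_a, sample_b):
    """Return (u, z) using the rank sum of the smaller sample."""
    small, large = sorted((list(sample_a), list(sample_b)), key=len)
    n1, n2 = len(small), len(large)
    ranks = average_ranks(small + large)  # rank the pooled observations
    w = sum(ranks[:n1])                   # rank sum of the smaller sample
    u = w - n1 * (n1 + 1) / 2
    mu_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mu_u) / sigma_u

# Hypothetical birth weights in kg for two care groups (illustrative only)
care_low = [3.1, 2.8, 3.5]
care_high = [3.6, 3.9, 4.0, 3.3]
u_stat, z_u = mann_whitney_z(care_low, care_high)
print(u_stat, round(z_u, 4))
```

With samples this small an exact table would normally be preferred to the normal approximation; the sizes are kept small here only so the ranks can be checked by hand.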
-
Case Study
Data on birth weight of infants born to mothers with
different levels of prenatal care. Two independent
samples data for univariate analysis. Test data for
Mann-Whitney U-Test, obtained from Howell, David D., Fundamental Statistics for the Behavioral Sciences,
3rd Edition, p. 385.
-
SPSS Analytic Procedure
-
One-Way Analysis of Variance
Equal Variances
One Way Analysis of Variance is used to
compare more than two groups on the basis
of their averages.
-
One-Way Analysis of Variance
Assumptions
The data are continuous.
The data follow the Normal distribution; each group is normally distributed.
The variances of the populations are equal.
The groups are independent.
Each group is a simple random sample from its population.
-
Hypotheses and Formulas
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
$H_A$: At least one pair is significantly different
$F = \dfrac{MSG}{MSE}$
MSG is the Mean Square of Group and MSE is the Mean Square Error
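The F ratio can be built up from the between-group and within-group sums of squares, as in the sketch below. The function name `one_way_anova_f` and the popularity ratings are hypothetical.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: group sizes times squared mean deviations
    ss_group = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (error) sum of squares
    ss_error = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    msg = ss_group / (k - 1)        # Mean Square of Group
    mse = ss_error / (n_total - k)  # Mean Square Error
    return msg / mse, k - 1, n_total - k

# Hypothetical popularity ratings for three age groups (illustrative only)
ratings = [[6, 7, 5, 8], [5, 4, 6, 5], [3, 4, 2, 4]]
f_stat, df_between, df_within = one_way_anova_f(ratings)
print(f"F = {f_stat:.4f}, df = ({df_between}, {df_within})")
```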
-
Example
This is a hypothetical data file that concerns the
popularity of a TV channel. Using a prototype, the
marketing team has collected focus group data. One
of the questions of interest is to see the difference in popularity of the TV channel in different age groups.
This hypothesis can be tested using One Way ANOVA.
-
SPSS Analytic Procedure
One Way Analysis of Variance
-
One-Way Analysis of Variance
Unequal Variances
Welch ANOVA is used to compare more than two
groups on the basis of averages. This test does not
require the homogeneity of variances.
Welch Analysis of Variance
-
Welch Analysis of Variance
Assumptions
The data are continuous
The data follow the Normal distribution, each group is
normally distributed.
The groups are independent.
Each group is a simple random sample from its population.
-
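$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
$H_A$: At least one pair is significantly different
$F = \dfrac{\dfrac{1}{k-1}\sum_{i=1}^{k} w_i \left(\bar{X}_{i.} - \bar{X}_{..}\right)^2}{1 + \dfrac{2(k-2)}{k^2 - 1}\sum_{i=1}^{k} \dfrac{1}{n_i - 1}\left(1 - \dfrac{w_i}{\sum_{j=1}^{k} w_j}\right)^2}$
where $w_i = \dfrac{n_i}{s_i^2}$ and $\bar{X}_{..} = \dfrac{\sum_{i=1}^{k} w_i \bar{X}_{i.}}{\sum_{i=1}^{k} w_i}$
With
$df_1 = k - 1, \quad df_2 = \dfrac{k^2 - 1}{3\sum_{i=1}^{k} \dfrac{1}{n_i - 1}\left(1 - \dfrac{w_i}{\sum_{j=1}^{k} w_j}\right)^2}$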
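A sketch of Welch's F statistic in its standard textbook form is given below: each group is weighted by n_i / s_i^2 and the grand mean is the weighted mean of the group means. The function name `welch_anova` and the training scores are hypothetical.

```python
import statistics

def welch_anova(groups):
    """Return (F, df1, df2) for Welch's unequal-variance one-way ANOVA."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [statistics.mean(g) for g in groups]
    # Weights w_i = n_i / s_i^2
    weights = [n / statistics.variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    numerator = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    # Shared term in the denominator of F and in df2
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)

# Hypothetical end-of-course scores for three training groups (illustrative only)
scores = [[62, 70, 58, 66, 64], [71, 80, 68, 77, 90], [82, 79, 85, 81, 83]]
f_welch, df1, df2 = welch_anova(scores)
print(f"F = {f_welch:.4f}, df1 = {df1}, df2 = {df2:.2f}")
```

The second degrees of freedom is generally not an integer.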
-
Case Study
A sales manager evaluates two new training courses.
Sixty employees, divided into three groups, all
receive standard training. In addition, group 2
receives technical training, and group 3 receives a hands-on tutorial. Each employee was tested at the
end of the training course and their score recorded.
-
SPSS Analytic Procedure
-
Kruskal-Wallis Test
Kruskal-Wallis H-test is used to compare more
than two groups on the basis of their medians.
Kruskal Wallis Test
-
Kruskal-Wallis Test
Assumptions
The variable of interest is continuous; the measurement scale is at least ordinal.
The probability distributions of the populations are identical, except for location.
The groups are independent.
All groups are simple random samples from their respective populations.
-
Hypotheses and Formulas
$H_0: M_1 = M_2 = \dots = M_k$
$H_A$: At least one pair of medians is significantly different
$H = \dfrac{12}{N(N+1)}\sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(N+1)$
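The H statistic can be sketched by ranking the pooled observations and summing ranks within each group. The helper names and the survival times are hypothetical, and no tie correction is applied.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical survival times in months by tumor size category (illustrative only)
survival = [[60, 72, 65, 80], [45, 50, 58, 40], [20, 35, 28, 30]]
h_stat = kruskal_wallis_h(survival)
print(f"H = {h_stat:.4f}")
```

For larger samples H is usually compared with a chi-square distribution on k - 1 degrees of freedom.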
-
Case Study
A health scientist wishes to compare the
survival experiences after breast cancer with
different Pathological Tumor Size (Categories).
We can use Kruskal-Wallis H-Test to determine
whether or not the median survival time of
the patients differs significantly across pathological tumor size categories.
-
SPSS Analytic Procedure
-
Model Building Techniques