8/6/2019 Mediha Ass of Spss
SUBMITTED TO: MS. FARHAT IQBAL
SUBMITTED BY: MEDIHA WAHEED
MAJOR: BBA (SEM-8)
DATE: 12TH MAY, 2011
Data Analysis: Correlation & Regression Tests
QUESTION NO. 1
The chief of police has given the department statistician the task of determining whether air
temperature is related to the number of traffic accidents in the city each day. On each of 8
randomly selected days the statistician records the maximum temperature and the number of
accidents that took place in the city. The results are:
Maximum Temperature (C) 14 23 16 22 30 34 19 27
No. of traffic accidents 4 6 3 6 8 11 5 7
Find the correlation coefficient. Test the hypothesis that air temperature is related to the number
of accidents.
SOLUTION
STEP 1
Hypothesis formulation:
Symbolically:
HO: r = 0
HA: r ≠ 0
Theoretically:
HO: Number of accidents is not related to air temperature
HA: Number of accidents is related to air temperature
STEP 2
Determine the level of significance:
Alpha (α) = 5% or 0.05
STEP 3
Correlations
                                              Max.temp   No.of.traffic.accidents
Max.temp                 Pearson Correlation  1          .962**
                         Sig. (2-tailed)                 .000
                         N                    8          8
No.of.traffic.accidents  Pearson Correlation  .962**     1
                         Sig. (2-tailed)      .000
                         N                    8          8
**. Correlation is significant at the 0.01 level (2-tailed).
STEP 4
Decision making:
The p-value (Sig. 2-tailed), reported as .000, is less than alpha (α) = 0.05, so we reject HO and
conclude that air temperature is significantly related to the number of accidents.
As the correlation coefficient (r) is 0.962, it shows a very strong positive linear relationship
between air temperature and the number of accidents. It is close to, but not exactly, a perfect
relationship (r = 1). As air temperature increases or decreases, the number of accidents tends to
increase or decrease with it.
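The coefficient in the Correlations table can be cross-checked outside SPSS. The following is a minimal sketch in plain Python, not the procedure SPSS itself runs: the helper name pearson_r and the critical value 2.447 (two-tailed 5% point of t with 6 degrees of freedom, read from a standard t table) are assumptions introduced here for illustration.

```python
import math

# Data from Question 1: maximum temperature (C) and number of traffic accidents.
temperature = [14, 23, 16, 22, 30, 34, 19, 27]
accidents = [4, 6, 3, 6, 8, 11, 5, 7]


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temperature, accidents)
n = len(temperature)

# t statistic for HO: r = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# Assumption: critical value taken from a t table (df = 6, two-tailed, 5%).
T_CRITICAL = 2.447

print(f"r = {r:.3f}, t = {t_stat:.3f}, reject HO: {abs(t_stat) > T_CRITICAL}")
```

The rounded r agrees with the .962 in the table, and the t statistic lies well beyond the critical value, which matches the decision to reject HO.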
QUESTION NO. 2
Checkout operators to be employed by a supermarket chain are given a one-week training
period and then given a speed and accuracy test. Separate scores are recorded for the two aspects.
For ten randomly chosen operators, the results were as follows (recorded on a scale of 0 to 100):
Speed score (X) 81 90 62 80 43 76 58 82 90 36
Accuracy Score(Y) 27 38 37 60 65 52 82 47 58 18
a) Draw a scatter diagram and decide which correlation test should be employed on this
data.
b) Calculate the value of correlation coefficient and interpret
c) Test the value of r for significance
SOLUTION
PART A:
As the scatter plot shows a non-linear relationship, we use the Spearman rank correlation
coefficient (r). Since the question does not specify a direction for the relationship, we opt
for the two-tailed test.
PART B:
STEP 1
Hypothesis formulation:
Symbolically:
HO: r = 0
HA: r ≠ 0
Theoretically:
HO: The accuracy score is not related to the speed score of the operators.
HA: The accuracy score is related to the speed score of the operators.
STEP 2
Determine the level of significance:
Alpha (α) = 5% or 0.05
STEP 3
SPSS test of correlation and its result
Correlations
                                                           Speed.Score   Accuracy.Score
Spearman's rho   Speed.Score      Correlation Coefficient  1.000         -.061
                                  Sig. (2-tailed)          .             .868
                                  N                        10            10
                 Accuracy.Score   Correlation Coefficient  -.061         1.000
                                  Sig. (2-tailed)          .868          .
                                  N                        10            10
STEP 4
As the value of the correlation coefficient (r) = -0.061, it shows a very weak negative relationship
between the speed and accuracy scores. In this sample the speed of the operators in the
supermarket chain is only slightly negatively related to their accuracy, so the accuracy score
is almost unrelated to the speed score.
PART C:
Testing the value of r for significance:
To test r for significance, we compare the p-value given in the above table with the level of
significance, alpha (α) = 5% or 0.05.
P-value = 0.868
Alpha (α) = 5% or 0.05
P-value > Alpha (α)
0.868 > 0.05
As our p-value is greater than the level of significance, we fail to reject the null hypothesis (HO).
The data do not give evidence in favor of HA.
This shows that our Spearman correlation coefficient (r) is not statistically significant for the
given speed and accuracy scores.
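The Spearman coefficient can be reproduced by ranking both score lists and applying the Pearson formula to the ranks. This is a minimal sketch under stated assumptions: the helper names average_ranks and pearson_r are hypothetical, tied values (the two speed scores of 90) receive their average rank, and the significance check uses a t approximation with the table value 2.306 (8 degrees of freedom, two-tailed, 5%) rather than the p-value SPSS prints.

```python
import math

# Data from Question 2.
speed = [81, 90, 62, 80, 43, 76, 58, 82, 90, 36]
accuracy = [27, 38, 37, 60, 65, 52, 82, 47, 58, 18]


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Spearman's coefficient is the Pearson coefficient of the ranks.
rho = pearson_r(average_ranks(speed), average_ranks(accuracy))
n = len(speed)
t_stat = rho * math.sqrt((n - 2) / (1 - rho ** 2))

# Assumption: critical value taken from a t table (df = 8, two-tailed, 5%).
T_CRITICAL = 2.306

print(f"rho = {rho:.3f}, t = {t_stat:.3f}, reject HO: {abs(t_stat) > T_CRITICAL}")
```

The rounded coefficient agrees with the -.061 in the table, and the t statistic is far inside the critical value, consistent with the large p-value of 0.868.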
QUESTION NO. 3
The expenditure on child care facilities in the previous year by a random sample of 6 local
councils, and the number of children under age 5 living in the electorates, are shown below:
Council 1 2 3 4 5 6
Expenditures (000Rs.) 125 180 154 90 102 63
Number of Children 1723 2510 1856 1525 1624 920
a) Draw a scatter diagram of the data
b) Find the least square regression line of expenditures on number of children
c) Interpret the four tables of results
d) Draw the line on the scatter diagram. Comment on whether you feel the line is a good fit to
the data.
e) Using the estimated line, predict the expenditures of a local council that has 1250
children under the age of 5.
f) Confirm the significance of alpha?
SOLUTION
PART A
Scatter diagram
PART B
STEP 1
Hypothesis formulation:
Symbolically:
HO: β = 0
HA: β ≠ 0
Theoretically:
HO: The expenditure on child care facilities is not related to the number of children.
HA: The expenditure on child care facilities is related to the number of children.
STEP 2
Determine the level of significance:
Alpha (α) = 5% or 0.05
STEP 3
Run the regression test in SPSS
Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       No.of.Children(a)   .                   Enter
a. All requested variables entered.
b. Dependent Variable: Expenditures
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .949(a)   .900       .875                15.17804
a. Predictors: (Constant), No.of.Children
Coefficients(a)
                   Unstandardized Coefficients   Standardized Coefficients                   95.0% Confidence Interval for B
Model              B          Std. Error         Beta                        t       Sig.    Lower Bound   Upper Bound
1 (Constant)       -15.185    23.164                                         -.656   .548    -79.498       49.128
  No.of.Children   .079       .013               .949                        6.012   .004    .043          .116
a. Dependent Variable: Expenditures
ANOVA(b)
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   8326.508         1    8326.508      36.144   .004(a)
  Residual     921.492          4    230.373
  Total        9248.000         5
a. Predictors: (Constant), No.of.Children
b. Dependent Variable: Expenditures
Least square regression line of expenditures on number of children
Ŷi = α + β xi = (-15.185) + (0.079) xi
S.E     = (23.164)  (0.013)
t-value = (-0.656)  (6.012)
p-value = (0.548)   (0.004)
R = 0.949, R² = 0.900
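The intercept, slope and slope t-value summarised above follow from the usual least-squares formulas. The sketch below recomputes them in plain Python as a cross-check on the SPSS output; the variable names are chosen here for illustration and do not come from SPSS.

```python
import math

# Data from Question 3: x = number of children, y = expenditure in 000 Rs.
children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]

n = len(children)
mean_x = sum(children) / n
mean_y = sum(expenditure) / n

# Corrected sums of squares and cross-products.
sxx = sum((x - mean_x) ** 2 for x in children)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(children, expenditure))

# Least-squares estimates of the slope (beta) and intercept (alpha).
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual standard error, then the standard error and t-value of the slope.
residuals = [y - (intercept + slope * x) for x, y in zip(children, expenditure)]
std_error_estimate = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
se_slope = std_error_estimate / math.sqrt(sxx)
t_slope = slope / se_slope

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"std. error of estimate = {std_error_estimate:.3f}")
print(f"SE(slope) = {se_slope:.3f}, t = {t_slope:.3f}")
```

To three decimals these values agree with the Coefficients and Model Summary tables (B, Std. Error, t and Std. Error of the Estimate).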
STEP 4
Decision making
As our p-value is less than the level of significance, alpha (α) = 5% or 0.05, we reject the null
hypothesis (HO) and accept the alternative hypothesis (HA). The data give evidence against HO and in
favor of HA.
P-value = 0.004 < α = 0.05
It confirms the significance of the slope, concluding that expenditure is significantly related to the
number of children. We reject HO and conclude that beta (β), the slope coefficient, is
significant.
PART C
Interpretation of the above results from the tables:
Interpretation of Alpha (α):
Alpha (α) is the intercept; it is the point where x = 0 and the regression line meets the y-axis.
Here, it shows that when the number of children (x) = 0, the estimated expenditure (y) is
-15.185 (000 Rs.). A negative expenditure has no practical meaning; x = 0 lies far outside the
observed range of 920 to 2510 children, so the intercept only fixes the position of the line.
Interpretation of Beta (β):
Beta (β) is the slope of the regression line; a one-unit change in x leads to a
0.079-unit increase in Y. If the number of children increases by one, expenditure increases by
0.079 (000 Rs.), that is, Rs. 79.
Interpretation of correlation coefficient (R)
As our correlation coefficient (r) is 0.949, it shows a strong positive linear relationship between
the number of children and expenditure. When the number of children increases or decreases,
the expenditure tends to increase or decrease in the same direction.
Interpretation of coefficient of determination (R²)
As our coefficient of determination (R²) is 0.900 or 90%, it shows that 90% of the variation in
expenditure (y) is explained by the number of children (x).
Interpretation of Anova Table
Total sum of squares = regression sum of squares + residual sum of squares
Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²
9248.000 = 8326.508 + 921.492
Interpretation of total sum of squares Σ(Yi - Ȳ)²
The total sum of squares measures the deviation of the actual values from their mean, which is the
best prediction when x is not used. Here the value 9248.000 is the total variation in expenditure
around its mean, in squared units of (000 Rs.).
Interpretation of regression sum of squares Σ(Ŷi - Ȳ)²
Out of the total variation, 8326.508 ≈ 8327 units are explained by the estimated regression line
of expenditures on number of children.
Interpretation of residual sum of squares Σ(Yi - Ŷi)²
Out of the total variation, 921.492 ≈ 921 units are not explained by the estimated regression line
of expenditures on number of children.
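The split of the total sum of squares into an explained and an unexplained part can be verified numerically. The following sketch refits the line and recomputes the three sums of squares, R² and F; it is an illustration of the identity, with variable names chosen here, not a reproduction of SPSS internals.

```python
# Data from Question 3: x = number of children, y = expenditure in 000 Rs.
children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]

n = len(children)
mean_x = sum(children) / n
mean_y = sum(expenditure) / n

sxx = sum((x - mean_x) ** 2 for x in children)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(children, expenditure))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in children]

# Total, regression (explained) and residual (unexplained) sums of squares.
sst = sum((y - mean_y) ** 2 for y in expenditure)
ssr = sum((f - mean_y) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(expenditure, fitted))

r_squared = ssr / sst
# F = mean square regression (1 df) / mean square residual (n - 2 df).
f_stat = (ssr / 1) / (sse / (n - 2))

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"R squared = {r_squared:.3f}, F = {f_stat:.3f}")
```

The sums agree with the ANOVA table, and SSR / SST gives the R Square of .900 reported in the Model Summary.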
PART D
Determination of whether the estimated line is a good
fit of the data.
Interpretation:
As the above scatter plot depicts a linear relationship, the estimated regression line is a good
fit to the given data on expenditure and number of children. The coefficient of determination (R²)
is 0.900 or 90%, which is high enough to support a well-fitted model, and the data points lie
close to the line.
PART E
Using the estimated line, prediction of the expenditures of a local council that has 1250 children
Putting x = 1250 into the estimated regression line to find the estimated value of y:
Ŷi = α + β xi = (-15.185) + (0.079) xi
= (-15.185) + (0.079)(1250)
= 83.565
This shows that for a council with 1250 children under the age of 5, the predicted expenditure on
child care facilities is 83.565 (000 Rs.).
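The prediction is a direct substitution into the fitted line. As a small sketch, the function below (predict and fit_line are names introduced here) evaluates the line at x = 1250 twice: once with the rounded coefficients printed by SPSS, as in the working above, and once with the unrounded least-squares estimates, to show how much the rounding of the slope matters.

```python
# Data from Question 3: x = number of children, y = expenditure in 000 Rs.
children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(x_new, intercept, slope):
    """Evaluate the fitted line at x_new."""
    return intercept + slope * x_new


# Using the coefficients as rounded in the SPSS table.
rounded_prediction = predict(1250, -15.185, 0.079)

# Using the unrounded estimates computed from the data.
intercept, slope = fit_line(children, expenditure)
unrounded_prediction = predict(1250, intercept, slope)

print(f"rounded coefficients:   {rounded_prediction:.3f} (000 Rs.)")
print(f"unrounded coefficients: {unrounded_prediction:.3f} (000 Rs.)")
```

Because 1250 is multiplied by the slope, rounding the slope to three decimals shifts the prediction by roughly 0.3 (000 Rs.); either figure is within the precision this small sample supports.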
PART F
Checking the significance of Alpha (α)
STEP 1
Hypothesis formulation:
Symbolically:
HO: α = 0
HA: α ≠ 0
Theoretically:
HO: The intercept of the regression line of expenditure on number of children is zero.
HA: The intercept of the regression line of expenditure on number of children is not zero.
STEP 2
Determine the level of significance:
Alpha (α) = 5% or 0.05
STEP 3
Checking the p-value for alpha (α) from the above tables and estimated regression model
Ŷi = α + β xi = (-15.185) + (0.079) xi
S.E     = (23.164)  (0.013)
t-value = (-0.656)  (6.012)
p-value = (0.548)   (0.004)
For the intercept:
t-value = -0.656
p-value = 0.548
STEP 4
Decision making
As our p-value is greater than the level of significance, alpha (α) = 5% or 0.05, we fail to reject the
null hypothesis (HO). The data do not give evidence in favor of HA.
P-value = 0.548 > α = 0.05
It confirms that alpha (α), the intercept of the simple linear regression, is statistically
insignificant.
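The t-value of the intercept comes from dividing the estimate by its standard error. The sketch below recomputes both from the data using the textbook formula SE(α) = s · sqrt(1/n + x̄²/Sxx); the critical value 2.776 (t with 4 degrees of freedom, two-tailed, 5%) is an assumption taken from a standard t table, used here in place of the p-value SPSS reports.

```python
import math

# Data from Question 3: x = number of children, y = expenditure in 000 Rs.
children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]

n = len(children)
mean_x = sum(children) / n
mean_y = sum(expenditure) / n
sxx = sum((x - mean_x) ** 2 for x in children)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(children, expenditure))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual standard error s with n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(children, expenditure))
s = math.sqrt(sse / (n - 2))

# Standard error and t-value of the intercept.
se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)
t_intercept = intercept / se_intercept

# Assumption: critical value taken from a t table (df = 4, two-tailed, 5%).
T_CRITICAL = 2.776

print(f"SE(intercept) = {se_intercept:.3f}, t = {t_intercept:.3f}")
print(f"reject HO: {abs(t_intercept) > T_CRITICAL}")
```

The standard error and t-value agree with the (Constant) row of the Coefficients table, and |t| is well below the critical value, in line with the p-value of 0.548.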
QUESTION NO. 4
Consider the following data for the variables X and Y.
X 1 2 3 4 5 6 7 8 9
Y 1 4 9 16 25 49 64 81 93
a) Plot the data on a scatter diagram.
b) Find the least square regression line of Y on X.
c) Draw the line on the diagram.
d) Interpret the results.
e) Use the line found in (b) to predict the value of Y for an X value of 6. Can you make
a better prediction without using this line? Why or why not?
f) Confirm the significance of alpha.
SOLUTION
PART A
Scatter diagram
PART B
STEP 1
Hypothesis formulation:
Symbolically:
HO: β = 0
HA: β ≠ 0
Theoretically:
HO: The Y variable is not related to the variable X
HA: The Y variable is related to the variable X
STEP 2
Determine the level of significance:
Alpha (α) = 5% or 0.05
STEP 3
Run the regression test in SPSS
Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       X(a)                .                   Enter
a. All requested variables entered.
b. Dependent Variable: Y
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .973(a)   .947       .939                8.56849
a. Predictors: (Constant), X
ANOVA(b)
Model          Sum of Squares   df   Mean Square   F         Sig.
1 Regression   9176.067         1    9176.067      124.982   .000(a)
  Residual     513.933          7    73.419
  Total        9690.000         8
a. Predictors: (Constant), X
b. Dependent Variable: Y
Least square regression line of Y on X
Ŷi = α + β xi = (-23.833) + (12.367) xi
S.E     = (6.225)   (1.106)
t-value = (-3.829)  (11.180)
p-value = (0.006)   (0.000)
R = 0.973, R² = 0.947
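The line and the two fit measures quoted above can be recomputed from the nine data points. This is a minimal sketch with names chosen for illustration; it obtains R directly as the correlation between X and Y and R² as its square, which is valid for a simple regression with one predictor.

```python
import math

# Data from Question 4.
x_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_values = [1, 4, 9, 16, 25, 49, 64, 81, 93]

n = len(x_values)
mean_x = sum(x_values) / n
mean_y = sum(y_values) / n

sxx = sum((x - mean_x) ** 2 for x in x_values)
syy = sum((y - mean_y) ** 2 for y in y_values)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))

# Least-squares line of Y on X.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Correlation coefficient and coefficient of determination.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

print(f"Y = {intercept:.3f} + {slope:.3f} X")
print(f"R = {r:.3f}, R squared = {r_squared:.3f}")
```

The printed coefficients and fit measures agree with the summary above to three decimals.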
STEP 4
Decision making
Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients                    95.0% Confidence Interval for B
Model          B          Std. Error         Beta                        t        Sig.    Lower Bound   Upper Bound
1 (Constant)   -23.833    6.225                                          -3.829   .006    -38.553       -9.114
  X            12.367     1.106              .973                        11.180   .000    9.751         14.982
a. Dependent Variable: Y
As our p-value is less than the level of significance, alpha (α) = 5% or 0.05, we reject the null
hypothesis (HO) and accept the alternative hypothesis (HA).
P-value = 0.000 < α = 0.05
It confirms the significance of the slope, concluding that variable Y is significantly related to
variable X. We reject HO and conclude that beta (β), the slope coefficient, is
significant.
PART C
Determination of whether the estimated line is a good
fit of the data.
Interpretation:
As the above scatter plot depicts a linear relationship, the estimated regression line is a good
fit to the given data on the Y and X variables. The coefficient of determination (R²) is
0.947 or about 95%, which is high enough to support a well-fitted model, and the data points lie
close to the line.
PART D
Interpretation of the above results from the tables:
Interpretation of Alpha (α):
Alpha (α) is the intercept; it is the point where x = 0 and the regression line meets the y-axis.
Here, it shows that when variable (x) = 0, the estimated value of variable (y) is -23.833 units.
Interpretation of Beta (β):
Beta (β) is the slope of the regression line; a one-unit change in x leads to a
12.367-unit increase in Y. If X increases by one unit, Y increases by 12.367 units.
Interpretation of correlation coefficient (R)
As our correlation coefficient (r) is 0.973, it shows a strong positive linear relationship between
X and Y. When X increases or decreases, Y tends to increase or decrease in the same direction.
Interpretation of coefficient of determination (R²)
As our coefficient of determination (R²) is 0.947 or 94.7% (about 95%), it shows that about 95% of
the variation in Y is explained by X.
Interpretation of Anova Table
Total sum of squares = regression sum of squares + residual sum of squares
Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²
9690.000 = 9176.067 + 513.933
Interpretation of total sum of squares Σ(Yi - Ȳ)²
The total sum of squares measures the deviation of the actual values of Y from their mean, which is
the best prediction when X is not used. Here the value 9690.000 is the total variation in Y around
its mean.
Interpretation of regression sum of squares Σ(Ŷi - Ȳ)²
Out of the total variation, 9176.067 ≈ 9176 units are explained by the estimated regression line
of variable Y on variable X.
Interpretation of residual sum of squares Σ(Yi - Ŷi)²
Out of the total variation, 513.933 ≈ 514 units are not explained by the estimated regression line
of variable Y on variable X.
PART E
Using the estimated line, prediction of the estimated value of Y for an X value of 6.
Putting x = 6 into the estimated regression line to find the estimated value of y:
Ŷi = α + β xi = (-23.833) + (12.367) xi
= (-23.833) + (12.367)(6)
= 50.369
When the value of X is 6, the predicted or estimated value of Y is about 50 units.
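Prediction without using the estimated regression line.
Without the regression line, the best available prediction would be the average Y value, 38. Because
the regression of Y on X is significant and R² is 0.947, the line's prediction of about 50 is closer
to the observed Y of 49 at X = 6, so the average would not give a better prediction.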
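One way to compare the two candidate predictors, the regression line and the average of Y, is to measure how far each lies from the observed values. The sketch below does this for the whole data set (sum of squared errors) and at X = 6; it is an illustrative check with names introduced here, and it uses the unrounded coefficients, so the line's value at X = 6 differs slightly from the 50.369 obtained with rounded ones.

```python
# Data from Question 4.
x_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_values = [1, 4, 9, 16, 25, 49, 64, 81, 93]

n = len(x_values)
mean_x = sum(x_values) / n
mean_y = sum(y_values) / n

sxx = sum((x - mean_x) ** 2 for x in x_values)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Sum of squared errors over all points for each predictor.
sse_line = sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(x_values, y_values))
sse_mean = sum((y - mean_y) ** 2 for y in y_values)

# Error of each predictor at X = 6, against the observed Y there.
observed_at_6 = y_values[x_values.index(6)]
line_at_6 = intercept + slope * 6
line_error = abs(line_at_6 - observed_at_6)
mean_error = abs(mean_y - observed_at_6)

print(f"SSE using the line: {sse_line:.3f}, using the mean: {sse_mean:.3f}")
print(f"at X = 6: line error = {line_error:.3f}, mean error = {mean_error:.3f}")
```

On both measures the regression line stays much closer to the data than the average of Y does.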
PART F
Checking the significance of Alpha (α)
STEP 1
Hypothesis formulation:
Symbolically:
HO: α = 0
HA: α ≠ 0
Theoretically:
HO: The intercept of the regression line of Y on X is zero
HA: The intercept of the regression line of Y on X is not zero
STEP 2
Determine the level of significance:
Alpha (α) = 5% or 0.05
STEP 3
Checking the p-value for alpha (α) from the above tables and estimated regression model
Ŷi = α + β xi = (-23.833) + (12.367) xi
S.E     = (6.225)   (1.106)
t-value = (-3.829)  (11.180)
p-value = (0.006)   (0.000)
For the intercept:
t-value = -3.829
p-value = 0.006
STEP 4
Decision making
As our p-value is less than the level of significance, alpha (α) = 5% or 0.05, we reject the null
hypothesis (HO) and accept the alternative hypothesis (HA). The data give evidence against HO and in
favor of HA.
P-value = 0.006 < α = 0.05
It confirms that alpha (α), the intercept of the simple linear regression, is statistically
significant.
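The same conclusion can be read from the 95% confidence interval for the intercept: if the interval does not contain zero, the intercept is significant at the 5% level. The sketch below rebuilds that interval from the data; the multiplier 2.365 (t with 7 degrees of freedom, two-tailed, 5%) is an assumption taken from a standard t table, so the bounds may differ from the SPSS figures in the third decimal.

```python
import math

# Data from Question 4.
x_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_values = [1, 4, 9, 16, 25, 49, 64, 81, 93]

n = len(x_values)
mean_x = sum(x_values) / n
mean_y = sum(y_values) / n
sxx = sum((x - mean_x) ** 2 for x in x_values)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual standard error and standard error of the intercept.
sse = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(x_values, y_values))
s = math.sqrt(sse / (n - 2))
se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)

# Assumption: multiplier taken from a t table (df = 7, two-tailed, 5%).
T_CRITICAL = 2.365

lower = intercept - T_CRITICAL * se_intercept
upper = intercept + T_CRITICAL * se_intercept
contains_zero = lower <= 0 <= upper

print(f"SE(intercept) = {se_intercept:.3f}")
print(f"95% CI for the intercept: ({lower:.3f}, {upper:.3f})")
print(f"interval contains zero: {contains_zero}")
```

Both bounds are negative, matching the Lower Bound and Upper Bound of the (Constant) row in the Coefficients table and the decision to reject HO for the intercept.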