SUBMITTED TO: MS. FARHAT IQBAL

SUBMITTED BY: MEDIHA WAHEED

MAJOR: BBA (SEM-8)

DATE: 12TH MAY, 2011

Data Analysis

Correlation & Regression Tests



QUESTION NO. 1

The chief of police has given the department statistician the task of determining whether air temperature is related to the number of traffic accidents in the city each day. On each of 8 randomly selected days, the statistician records the maximum temperature and the number of accidents that took place in the city. The results are:

Maximum temperature (°C)    14  23  16  22  30  34  19  27
No. of traffic accidents     4   6   3   6   8  11   5   7

Find the correlation coefficient. Test the hypothesis that air temperature is related to the number of accidents.

SOLUTION

STEP 1

Hypothesis formulation:

Symbolically:

HO: r = 0

HA: r ≠ 0

Theoretically:

HO: Number of accidents is not related to air temperature

HA: Number of accidents is related to air temperature

STEP 2

Determine the level of significance:

Alpha (α) = 5% or 0.05


STEP 3

Correlations

                                                Max.temp   No.of.traffic.accidents
Max.temp                  Pearson Correlation   1          .962**
                          Sig. (2-tailed)                  .000
                          N                     8          8
No.of.traffic.accidents   Pearson Correlation   .962**     1
                          Sig. (2-tailed)       .000
                          N                     8          8

**. Correlation is significant at the 0.01 level (2-tailed).

STEP 4

Decision making:

As the p-value (.000, i.e. less than 0.001) is less than the level of significance (0.05), we reject HO and accept HA.

The correlation coefficient (r) is 0.962, which shows a very strong (though not perfect) positive linear relationship between air temperature and the number of accidents. The number of accidents is significantly related to air temperature: as air temperature increases or decreases, the number of accidents also tends to increase or decrease.
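
The value of r reported by SPSS can be cross-checked by hand. The following is a minimal sketch in plain Python, not part of the original SPSS output; the function name `pearson_r` and the variable names are illustrative assumptions, and the critical value 2.447 is the two-tailed t value for alpha = 0.05 with 6 degrees of freedom taken from a standard t table.

```python
import math

# Data from Question 1.
temperature = [14, 23, 16, 22, 30, 34, 19, 27]
accidents = [4, 6, 3, 6, 8, 11, 5, 7]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temperature, accidents)
n = len(temperature)

# Test statistic for HO: r = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumed table value: two-tailed t, alpha = 0.05, 6 degrees of freedom.
T_CRITICAL = 2.447

print(f"r = {r:.3f}, t = {t_stat:.3f}")
print("Reject HO" if abs(t_stat) > T_CRITICAL else "Do not reject HO")
```

Comparing |t| with the table value leads to the same decision as comparing the SPSS p-value with alpha.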

QUESTION NO. 2

Checkout operators to be employed by a supermarket chain are given a one-week training period and then given a speed and accuracy test. Separate scores are recorded for the two aspects. For ten randomly chosen operators, the results were as follows (recorded on a scale of 0 to 100):

Speed score (X)      81  90  62  80  43  76  58  82  90  36
Accuracy score (Y)   27  38  37  60  65  52  82  47  58  18

a) Draw a scatter diagram and decide which correlation test should be employed on this data.

b) Calculate the value of the correlation coefficient and interpret it.

c) Test the value of r for significance.

SOLUTION

PART A:

As the following scatter plot shows a non-linear relationship, we use the Spearman rank correlation coefficient (r) test. The hypothesis about the scores is non-directional, so we opt for the two-tailed Spearman correlation (r) test.

PART B:

STEP 1

Hypothesis formulation:

Symbolically:

HO: r = 0

HA: r ≠ 0

Theoretically:

HO: The accuracy scores are not related to the speed scores of the operators.

HA: The accuracy scores are related to the speed scores of the operators.

STEP 2

Determine the level of significance:

Alpha (α) = 5% or 0.05

STEP 3

SPSS test of correlation and its result

Correlations

                                                            Speed.Score   Accuracy.Score
Spearman's rho   Speed.Score      Correlation Coefficient   1.000         -.061
                                  Sig. (2-tailed)           .             .868
                                  N                         10            10
                 Accuracy.Score   Correlation Coefficient   -.061         1.000
                                  Sig. (2-tailed)           .868          .
                                  N                         10            10


STEP 4

As the value of the correlation coefficient (r) = -0.061, it shows a very weak negative relationship between the two variables of speed and accuracy scores. In this sample, the speed of the checkout operators in the supermarket chain is only slightly negatively related to their accuracy. The accuracy scores have almost no relationship with the speed of the operators.

PART C:

Value of r for significance:

In order to check the value of r for significance, we compare the p-value given in the above table with the level of significance, alpha (α) = 5% or 0.05.

P-value = 0.868

Alpha (α) = 5% or 0.05

P-value > alpha (α)

0.868 > 0.05

As our p-value is greater than the level of significance, we do not reject the null hypothesis (HO). There is not enough evidence in favor of the alternative hypothesis (HA).

This shows that our Spearman correlation coefficient (r) is not significant for the given variables of speed and accuracy scores.
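
The Spearman coefficient can be reproduced by ranking both score lists (tied values share the average of their ranks) and then computing the Pearson correlation of the ranks. The sketch below is an illustration in plain Python rather than the SPSS procedure itself; `average_ranks` and `pearson_r` are hypothetical helper names, and 2.306 is the two-tailed t table value assumed for alpha = 0.05 with 8 degrees of freedom.

```python
import math

# Data from Question 2.
speed = [81, 90, 62, 80, 43, 76, 58, 82, 90, 36]
accuracy = [27, 38, 37, 60, 65, 52, 82, 47, 58, 18]


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Spearman's rho is the Pearson correlation of the ranks.
rho = pearson_r(average_ranks(speed), average_ranks(accuracy))
n = len(speed)
t_stat = rho * math.sqrt(n - 2) / math.sqrt(1 - rho ** 2)

# Assumed table value: two-tailed t, alpha = 0.05, 8 degrees of freedom.
T_CRITICAL = 2.306

print(f"rho = {rho:.3f}, t = {t_stat:.3f}")
print("Reject HO" if abs(t_stat) > T_CRITICAL else "Do not reject HO")
```

Because the speed score 90 occurs twice, the average-rank treatment matters here; both operators receive rank 9.5.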

QUESTION NO. 3

The expenditure on child care facilities in the previous year by a random sample of 6 local councils, and the number of children under age 5 living in the electorates, are shown below:

Council                  1      2      3      4      5      6
Expenditures (000 Rs.)   125    180    154    90     102    63
Number of children       1723   2510   1856   1525   1624   920

a) Draw a scatter diagram of the data.

b) Find the least squares regression line of expenditures on number of children.

c) Interpret the four tables of results.

d) Draw the line on the scatter diagram. Comment on whether you feel the line is a good fit of the data.

e) Using the estimated line, predict the expenditures of a local council that has 1250 children under the age of 5.

f) Confirm the significance of alpha.

SOLUTION

PART A

Scatter diagram



PART B

STEP 1

Hypothesis formulation:

Symbolically:

HO: β = 0

HA: β ≠ 0

Theoretically:

HO: The expenditure on child care facilities is not dependent on the number of children.

HA: The expenditure on child care facilities is dependent on the number of children.

STEP 2

Determine the level of significance:

Alpha (α) = 5% or 0.05

STEP 3

Run the regression test in SPSS

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       No.of.Children(a)   .                   Enter

a. All requested variables entered.
b. Dependent Variable: Expenditures

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .949(a)   .900       .875                15.17804

a. Predictors: (Constant), No.of.Children

Coefficients(a)

                     Unstandardized          Standardized                     95.0% Confidence
                     Coefficients            Coefficients                     Interval for B
Model                B         Std. Error    Beta           t       Sig.     Lower Bound   Upper Bound
1  (Constant)        -15.185   23.164                       -.656   .548     -79.498       49.128
   No.of.Children    .079      .013          .949           6.012   .004     .043          .116

a. Dependent Variable: Expenditures

ANOVA(b)

Model           Sum of Squares   df   Mean Square   F        Sig.
1  Regression   8326.508         1    8326.508      36.144   .004(a)
   Residual     921.492          4    230.373
   Total        9248.000         5

a. Predictors: (Constant), No.of.Children
b. Dependent Variable: Expenditures

Least squares regression line of expenditures on number of children

Yi = α + β xi = (-15.185) + (0.079) xi

S.E     = (23.164)   (0.013)
t-value = (-0.656)   (6.012)
p-value = (0.548)    (0.004)

R = 0.949, R² = 0.900
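
The intercept and slope in the SPSS output follow from the usual least squares formulas, slope = Sxy / Sxx and intercept = mean(Y) - slope × mean(X). The sketch below recomputes them in plain Python as a cross-check; `fit_line` and the variable names are illustrative assumptions, not SPSS terminology.

```python
# Data from Question 3 (expenditure in thousand Rs.).
children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(children, expenditure)
print(f"Y = {intercept:.3f} + {slope:.3f} x")
```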

STEP 4

Decision making

As our p-value is less than the level of significance, alpha (α) = 5% or 0.05, we reject the null hypothesis (HO) and accept the alternative hypothesis (HA). It gives evidence against HO and in favor of HA.

P-value = 0.004 < α = 0.05

It confirms the significance of the slope: expenditure is significantly related to the number of children. We reject HO and conclude that beta (β), the slope coefficient, is significant.

PART C

Interpretation of above results from the table:

Interpretation of Alpha (α):

Alpha (α) is the intercept. It shows the value of Y at the point where x = 0 and the regression line crosses the y-axis.

Here, it shows that when the number of children (x) = 0, the estimated expenditure (y) is -15.185 (000 Rs.). A negative expenditure has no practical meaning: x = 0 lies far outside the observed range of 920 to 2510 children, so the intercept only fixes the position of the line.

Interpretation of Beta (β)

Beta (β) is the slope of the regression line. It means that a one-unit change in x leads to a 0.079-unit change in Y. If there is an increase of one child, expenditure will increase by 0.079 (000 Rs.), i.e. about Rs. 79.

Interpretation of correlation coefficient (R)

As our correlation coefficient (r) is 0.949, it shows a strong positive linear relationship between the number of children and expenditure. Expenditure is strongly related to the number of children: when the number of children increases or decreases, the expenditure also tends to increase or decrease.

Interpretation of coefficient of determination (R²)

As our coefficient of determination (R²) is 0.900 or 90%, it shows that 90% of the variation in y is explained by the x variable. 90% of the variation in expenditure is explained by the number of children.

Interpretation of ANOVA Table

Total sum of squares = regression sum of squares + residual sum of squares

Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²

9248.000 = 8326.508 + 921.492

Interpretation of total sum of squares Σ(Yi - Ȳ)²

The total sum of squares measures the total variation of the actual values of Y around their mean. Here the value of 9248.000 represents the total variation in Y, Σ(Yi - Ȳ)². The above table shows that the total variation of the actual expenditures around their average is 9248 (in squared units of 000 Rs.).

Interpretation of regression sum of squares Σ(Ŷi - Ȳ)²

This shows that out of the total variation, 8326.508 (about 8327) units are explained by the estimated regression line of y on x. In the given data, 8327 units of variation are explained by the estimated regression line of expenditures on number of children.

Interpretation of residual sum of squares Σ(Yi - Ŷi)²

This shows that out of the total variation, 921.492 (about 921) units are not explained by the estimated regression line of y on x. In the given data, 921 units of variation are not explained by the estimated regression line of expenditures on number of children.
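
The three sums of squares can be recomputed from the data to confirm that the total splits exactly into an explained and an unexplained part. This is a hedged sketch in plain Python with illustrative variable names (`sst`, `ssr`, `sse`); it also derives R² and the F statistic from the same quantities.

```python
# Data from Question 3 (expenditure in thousand Rs.).
children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]

n = len(children)
mean_x = sum(children) / n
mean_y = sum(expenditure) / n
sxx = sum((x - mean_x) ** 2 for x in children)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(children, expenditure))

# Least squares line and its fitted values.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in children]

# Total, regression (explained) and residual (unexplained) sums of squares.
sst = sum((y - mean_y) ** 2 for y in expenditure)
ssr = sum((f - mean_y) ** 2 for f in fitted)
sse = sum((y - f) ** 2 for y, f in zip(expenditure, fitted))

r_squared = ssr / sst
# F = mean square regression (1 df) / mean square residual (n - 2 df).
f_stat = (ssr / 1) / (sse / (n - 2))

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"R squared = {r_squared:.3f}, F = {f_stat:.3f}")
```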

PART D

Determination of whether the estimated line is a good fit of the data.

Interpretation:

As the above scatter plot depicts a linear relationship, the estimated line or regression model is a good fit of the given data on expenditure and number of children. It is a good fit because the coefficient of determination (R²) is 0.900 or 90%, which is high enough to support a well-fitted model, and the data points lie close to the fitted line.

PART E

Using the estimated line, prediction of the expenditures of a local council that has 1250 children

Putting the value of x = 1250 in the estimated regression line to find the estimated value of y:

Yi = α + β xi = (-15.185) + (0.079) xi

   = (-15.185) + (0.079)(1250)

   = 83.565

This shows that for a local council with 1250 children under the age of 5, the predicted expenditure on child care facilities is 83.565 (000 Rs.), i.e. about Rs. 83,565.
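
The prediction above uses the coefficients as rounded in the SPSS table. As a rough check, the sketch below (plain Python, illustrative names) repeats the substitution with the rounded values and also with the unrounded least squares estimates, which give a slightly different figure because the slope is multiplied by a large x.

```python
# Data from Question 3 (expenditure in thousand Rs.).
children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]

n = len(children)
mean_x = sum(children) / n
mean_y = sum(expenditure) / n
sxx = sum((x - mean_x) ** 2 for x in children)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(children, expenditure))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

x_new = 1250

# Coefficients as printed (rounded to three decimals) in the SPSS table.
ROUNDED_INTERCEPT = -15.185
ROUNDED_SLOPE = 0.079
pred_rounded = ROUNDED_INTERCEPT + ROUNDED_SLOPE * x_new

# Same prediction with the unrounded estimates.
pred_full = intercept + slope * x_new

# The prediction is an interpolation only if x_new lies inside the data range.
in_range = min(children) <= x_new <= max(children)

print(f"rounded coefficients: {pred_rounded:.3f} (000 Rs.)")
print(f"full precision:       {pred_full:.3f} (000 Rs.)")
print(f"x within observed range: {in_range}")
```

Since 1250 lies between the smallest (920) and largest (2510) observed number of children, the line is being used within the range of the data.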

PART F

Checking the significance of Alpha (α)

STEP 1

Hypothesis formulation:

Symbolically:

HO: α = 0

HA: α ≠ 0

Theoretically:

HO: The intercept of the regression line of expenditure on number of children is zero.

HA: The intercept of the regression line of expenditure on number of children is not zero.

STEP 2

Determine the level of significance:

Alpha (α) = 5% or 0.05

STEP 3

Checking the p-value for alpha from the above tables and estimated regression model

Yi = α + β xi = (-15.185) + (0.079) xi

S.E     = (23.164)   (0.013)
t-value = (-0.656)   (6.012)
p-value = (0.548)    (0.004)

For the intercept, it is shown that

t-value = -0.656

p-value = 0.548

STEP 4

Decision making

As our p-value is greater than the level of significance, alpha (α) = 5% or 0.05, we do not reject the null hypothesis (HO). There is not enough evidence in favor of the alternative hypothesis (HA).

P-value = 0.548 > α = 0.05

This shows that alpha (α), the intercept of the simple linear regression, is statistically insignificant.
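
The standard error and t-value of the intercept can be derived from the residual mean square, using SE(α) = sqrt(MSE × (1/n + mean(X)² / Sxx)). The sketch below is an illustrative recomputation in plain Python; the variable names are assumptions, and 2.776 is the two-tailed t table value assumed for alpha = 0.05 with 4 degrees of freedom.

```python
import math

# Data from Question 3 (expenditure in thousand Rs.).
children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]

n = len(children)
mean_x = sum(children) / n
mean_y = sum(expenditure) / n
sxx = sum((x - mean_x) ** 2 for x in children)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(children, expenditure))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual mean square with n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(children, expenditure))
mse = sse / (n - 2)

# Standard error and t statistic of the intercept.
se_intercept = math.sqrt(mse * (1 / n + mean_x ** 2 / sxx))
t_intercept = intercept / se_intercept

# Assumed table value: two-tailed t, alpha = 0.05, 4 degrees of freedom.
T_CRITICAL = 2.776

print(f"SE = {se_intercept:.3f}, t = {t_intercept:.3f}")
print("Reject HO" if abs(t_intercept) > T_CRITICAL else "Do not reject HO")
```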

QUESTION NO. 4

Consider the following data for the variables X and Y.

X   1   2   3   4    5    6    7    8    9
Y   1   4   9   16   25   49   64   81   93

a) Plot the data on a scatter diagram.

b) Find the least squares regression line of Y on X.

c) Draw the line on the diagram.

d) Interpret the results.

e) Use the line found in (b) to predict the value of Y for an X value of 6. Can you make a better prediction without using this line? Why or why not?

f) Confirm the significance of alpha.

SOLUTION

PART A

Scatter diagram

PART B

STEP 1

Hypothesis formulation:

Symbolically:

HO: β = 0

HA: β ≠ 0

Theoretically:

HO: The Y variable is not dependent on the variable X

HA: The Y variable is dependent on the variable X

STEP 2

Determine the level of significance:

Alpha (α) = 5% or 0.05

STEP 3

Run the regression test in SPSS

Variables Entered/Removed(b)

Model   Variables Entered   Variables Removed   Method
1       X(a)                .                   Enter

a. All requested variables entered.
b. Dependent Variable: Y

ANOVA(b)

Model           Sum of Squares   df   Mean Square   F         Sig.
1  Regression   9176.067         1    9176.067      124.982   .000(a)
   Residual     513.933          7    73.419
   Total        9690.000         8

a. Predictors: (Constant), X
b. Dependent Variable: Y

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .973(a)   .947       .939                8.56849

a. Predictors: (Constant), X




Least squares regression line of Y on X

Yi = α + β xi = (-23.833) + (12.367) xi

S.E     = (6.225)    (1.106)
t-value = (-3.829)   (11.180)
p-value = (0.006)    (0.000)

R = 0.973, R² = 0.947

Coefficients(a)

                 Unstandardized          Standardized                      95.0% Confidence
                 Coefficients            Coefficients                      Interval for B
Model            B         Std. Error    Beta           t        Sig.     Lower Bound   Upper Bound
1  (Constant)    -23.833   6.225                        -3.829   .006     -38.553       -9.114
   X             12.367    1.106         .973           11.180   .000     9.751         14.982

a. Dependent Variable: Y

STEP 4

Decision making

As our p-value is less than the level of significance, alpha (α) = 5% or 0.05, we reject the null hypothesis (HO) and accept the alternative hypothesis (HA).

P-value = 0.000 < α = 0.05

It confirms the significance of the slope: variable Y is significantly related to variable X. We reject HO and conclude that beta (β), the slope coefficient, is significant.

PART C

Determination of whether the estimated line is a good fit of the data.

Interpretation:

As the above scatter plot depicts a broadly linear upward relationship, the estimated line or regression model is a good fit of the given data on the Y and X variables. It is a good fit because the coefficient of determination (R²) is 0.947 or about 95%, which is high enough to support a well-fitted model, and the data points lie close to the fitted line.

PART D

Interpretation of above results from the table:

Interpretation of Alpha (α):

Alpha (α) is the intercept. It shows the value of Y at the point where x = 0 and the regression line crosses the y-axis.

Here, it shows that when variable (x) = 0, the estimated value of variable (y) is -23.833 units.

Interpretation of Beta (β)

Beta (β) is the slope of the regression line. It means that a one-unit change in x leads to a 12.367-unit change in Y. If X increases by one unit, then Y will increase by 12.367 units.

Interpretation of correlation coefficient (R)

As our correlation coefficient (r) is 0.973, it shows a strong positive linear relationship between X and Y. Variable X is strongly related to variable Y: when X increases or decreases, Y also tends to increase or decrease.

Interpretation of coefficient of determination (R²)

As our coefficient of determination (R²) is 0.947 or 94.7% (about 95%), it shows that about 95% of the variation in Y is explained by the X variable.

Interpretation of ANOVA Table

Total sum of squares = regression sum of squares + residual sum of squares

Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²

9690.000 = 9176.067 + 513.933

Interpretation of total sum of squares Σ(Yi - Ȳ)²

The total sum of squares measures the total variation of the actual values of Y around their mean. Here the value of 9690.000 represents the total variation in Y, Σ(Yi - Ȳ)². The above table shows that the total variation of the actual Y values around their average is 9690 units.

Interpretation of regression sum of squares Σ(Ŷi - Ȳ)²

This shows that out of the total variation, 9176.067 (about 9176) units are explained by the estimated regression line of y on x. In the given data, 9176 units of variation are explained by the estimated regression line of variable Y on variable X.

Interpretation of residual sum of squares Σ(Yi - Ŷi)²

This shows that out of the total variation, 513.933 (about 514) units are not explained by the estimated regression line of y on x. In the given data, 514 units of variation are not explained by the estimated regression line of variable Y on variable X.

PART E

Using the estimated line, prediction of the estimated value of Y for an X value of 6.

Putting the value of x = 6 in the estimated regression line to find the estimated value of y:

Yi = α + β xi = (-23.833) + (12.367) xi

   = (-23.833) + (12.367)(6)

   = 50.369

This shows that when the value of X is 6, the predicted or estimated value of Y is 50.369, or about 50 units.


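Prediction without using the estimated regression line.

Without this line, the average Y value (38) would be used as the predicted value. This would not be a better prediction: the regression line explains 94.7% of the variation in Y, and its prediction of 50.369 is much closer to the observed value of 49 at X = 6 than the average is.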
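
To check whether the average Y value would predict better than the fitted line, the two predictors can be compared by their sums of squared errors over the data and by their error at X = 6. The following is a minimal sketch in plain Python with illustrative variable names; it is a cross-check and not part of the SPSS output.

```python
# Data from Question 4.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 4, 9, 16, 25, 49, 64, 81, 93]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Squared error when every Y is predicted by the average of Y.
sse_mean = sum((yi - mean_y) ** 2 for yi in y)
# Squared error when every Y is predicted by the regression line.
sse_line = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Compare both predictions with the value actually observed at X = 6.
observed_at_6 = y[x.index(6)]
pred_line_at_6 = intercept + slope * 6

print(f"SSE using mean: {sse_mean:.3f}")
print(f"SSE using line: {sse_line:.3f}")
print(f"At X = 6: observed {observed_at_6}, line {pred_line_at_6:.3f}, "
      f"mean {mean_y:.3f}")
```

The two error totals equal the total and residual sums of squares in the ANOVA table, which is why a high R² implies that the line predicts better than the average.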

PART F

Checking the significance of Alpha (α)

STEP 1

Hypothesis formulation:

Symbolically:

HO: α = 0

HA: α ≠ 0

Theoretically:

HO: The intercept of the regression line of Y on X is zero.

HA: The intercept of the regression line of Y on X is not zero.

STEP 2

Determine the level of significance:

Alpha (α) = 5% or 0.05

STEP 3

Checking the p-value for alpha from the above tables and estimated regression model

Yi = α + β xi = (-23.833) + (12.367) xi

S.E     = (6.225)    (1.106)
t-value = (-3.829)   (11.180)
p-value = (0.006)    (0.000)

For the intercept, it is shown that

t-value = -3.829

p-value = 0.006

STEP 4

Decision making

As our p-value is less than the level of significance, alpha (α) = 5% or 0.05, we reject the null hypothesis (HO) and accept the alternative hypothesis (HA). It gives evidence against HO and in favor of HA.

P-value = 0.006 < α = 0.05

This shows that alpha (α), the intercept of the simple linear regression, is statistically significant.
