correlation continued

39
ADDITIONAL INFORMATION Correlation Analysis continued… Chapter 2

Post on 20-Oct-2014

801 views

Category:

Education


0 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Correlation continued

ADDITIONAL INFORMATION

Correlation Analysis continued…Chapter 2

Page 2: Correlation continued

Examples of Correlation Sugar consumption and level of activity

of a person Sales volume versus expenditures Temperature and coffee sales Price and demand Production and Plant Capacity Outdoor temperature and gas

consumption

Page 3: Correlation continued

Characteristics of a Relationship

1. The direction of a relationship

a. Positive

b. Negative

2. The form of relationship

a. linear

b. curved (ex. Mood levels and dosage)

3. The degree of relationship

perfect positive

perfect negative

high degree of positive/ negative correlation

low degree of positive / negative correlation

Page 4: Correlation continued

Where and Why Correlations are Used?

1. Prediction ex. College admission with NCAE or HS grades

Sales and population

2. Validity ex. Employee performance evaluation should have tests on skills, achievements and company contribution of an employee

3. Reliability – it produces stable, consistent measurements* when reliability is high, the correlation between two measurement should be strong and positive

4. Theory Verification ex. Demand and supply

Page 5: Correlation continued

Correlation and Causation1. There is a direct cause-and-effect relationship between

variables.2. There is a reverse cause-and-effect relationship between

variables.3. The relationship between variables may be caused by a

third value variable.4. There may be a complexity of interrelationships among

variables.5. The relationship may be coincidental.

Page 6: Correlation continued

Learning Check!1. For each of the following, indicate whether you would

expect a positive or negative correlation. Justify.

a. Distance sprinted and recovery time

b. Sugar consumption and activity level for a group of children

c. Daily high temperature and daily energy consumption for 30 days in the summer.

d. Daily high temperature and daily energy consumption for 30 days on rainy season.

e. An individual’s expenditure and income

Page 7: Correlation continued

2. The data points would be clustered more closely around a straight line for a correlation of -0.80 than for a +0.05. (True or False?)

3. If the data points are tightly clustered together around a line that slopes down from left to right, then a good estimate of the correlation would be +0.90. (True or False?)

4. A correlation can never be greater than +1.00. (True or False?)

Page 8: Correlation continued

PROBABLE ERROR AND COEFFICIENT OF CORRELATION

Correlation Analysis continued…Chapter 2

Page 9: Correlation continued

Probable Error (PE)

It is a statistical device which measures the reliability and dependability of the value of coefficient of correlation

PE = 2 x standard error (or) = 0.6745 x standard error

3

Page 10: Correlation continued

Standard Error (SE)

SE = 1 – r2

√n

PE = 0.6745 x 1 – r²

√n

• if the value of `r’ is less than the PE, then there is no evidence of correlation• if the value of `r’ is six times more than the PE, the correlation is certain and significant• By adding and submitting PE from coefficient of correlation, we can find out the upperand lower limits within which the population coefficient of correlation may be expected to lie.

Page 11: Correlation continued

Uses of PE 1) PE is used to determine the limits

within which the population coefficient of correlation may be expected to lie.

2) It can be used to test whether the value of correlation coefficient of a sample is significant with that of the population

Page 12: Correlation continued

If r = 0.6 and N = 64, find out the PE and SE of the correlation coefficient. Also determine the limits of population correlation coefficient

Sol: r = 0.6 N=64

PE = 0.6745 x SESE = 1 – r2

√n

= 1 – 0.62 = 1- 0.36 = 0.64 / 8 = 0.08

√64 8

PE = 0.6745 x 0.08 = 0.05396Limits of Population Correlation Coefficient = r ± PE

= 0.6 ±0.05396= 0.54604 to

0.6540

Page 13: Correlation continued

Qn. 2 r and PE have values 0.9 and 0.04 for two series. Find n.

Sol: PE = 0.04= 0.6745 x 1 – r2 = 0.04

√n

= 1- 0.9² = 0.04

√n 0.6745

= 1-0.81 = 0.0593

√n

0.19 / √n = 0.059300.0593 x √n = 0.19√n = 0.19 ÷ 0.0593√n = 3.2N = 3.2² = 10.266N = 10

Page 14: Correlation continued

COEFFICIENT OF DETERMINATION

Correlation Analysis continued…Chapter 2

Page 15: Correlation continued

Square of Coefficient of Correlation

*Coefficient of Determination = (r2)

*Coefficient of Non-Determination = (K2)(K2) = 1- r2

The ratio of the explained variance to the total variance

Page 16: Correlation continued

Illustrative Example Calculate the coefficient of determination and

non-determination if coefficient of correlation is 0.8

Coefficient of determination = r2

= 0.82

= 0.64= 64%

Coefficient of non- determination = K2

=1- 0.82

= 1- 0.64= 36%

Page 17: Correlation continued

Merits of Pearson’s Coefficient Correlation

It is the most widely used algebraic method to measure the coefficient of correlation

It gives numerical value to express relationship between variables

It gives both direction and degree of relationship between variables

It can be used for further algebraic treatment such as coefficient of determination and non determination

It gives a single figure to explain the accurate degree of correlation between two variables

Page 18: Correlation continued

Demerits of Pearson’s Coefficient Correlation

It is very difficult to compute the value of coefficient of correlation.

It is very difficult to understand. It requires a complicated mathematical calculation. It takes more time It is unduly affected by extreme items. It assumes a linear relationship between the

variables. But in real life situation, it may not be so.

Page 19: Correlation continued

SPEARMAN’S RANK CORRELATION METHOD

Correlation Analysis continued…Chapter 2

Page 20: Correlation continued

This was developed by Charles Edward Spearman in 1904

The correlation of coefficient obtained from ranks of the variables.

6∑D2 N3 – N

Where:

D= difference of ranks between two variables

N = number of pairs

Definition

(R) = 1 -

Page 21: Correlation continued

Qn: Find the rank correlation between poverty and overcrowding from the information given below.

Town A B C D E F G H I J

Poverty 17 13 15 16 6 11 14 9 7 12

Overcrowding 36 46 35 24 12 18 27 22 2 8

Page 22: Correlation continued

Soln.

6∑D2 N3 – NN = 10

6x44 103 – 10

264990

= 1- 0.2667= +0.7333

(R) = 1 -

(R) = 1 -

(R) = 1 -

Page 23: Correlation continued

Qn: Following were the ranks given by three judges in a beauty contest. Determine which pair of judges has the nearest approach to common tastes in beauty.

Judge 1 1 6 5 10 3 2 4 9 7 8

Judge 2 3 5 8 4 7 10 2 1 6 9

Judge 3 6 4 9 8 1 2 3 10 5 7

Page 24: Correlation continued

Soln. 6∑D2

N3 – N N = 10

6x200

103 – 10

= 1- 1.2121

= - 0.2121

6x214

103 – 10

= 1- 1.297

= - 0.297

6x60

103 – 10

= 1- 0.364

= + 0.636

(R) = 1 -

(R) = 1 -Rank correlation between I&II

Rank correlation between I&II

Rank correlation between I&III

(R) = 1 -

(R) = 1 -

Page 25: Correlation continued

Qn: The coefficient rank of the marks obtained by 10 students in statistics & English was 0.2. It was later discovered that the difference in ranks of one of the students was wrongly taken as 7 instead of 9. Find the correct result.

R = 0.2

1-.0.2= 6∑D2 1 103 – 100.8 = 6∑D2 990

6∑D2 = 990x 0.8 = 792Correct ∑D2 = 792/6 = 132-72+92

= 164

6∑D2 N3 – N

(R) = 1 -= 0.2

Page 26: Correlation continued

Correct 6∑D2 N3 – N

6x164103-10

= 1 - 984 990

= 1- 0.9939= 0.0061

(R) = 1 -

(R) = 1 -

Page 27: Correlation continued

(R) = 1 - 6∑D2 = 0.8

N3 – N

1 - .08 = 6x33

N3 – N 0.2 x (N3 – N) = 198 N3 – N = 198/0.2

=990N = 10

Qn: The coefficient rank of the marks obtained by 10 students in statistics & English was 0.2. If the sum of the squares of the difference in ranks is 33, find the number of students in the group.

Page 28: Correlation continued

Computation of Rank Correlation Coefficient when Ranks are Equal

Where D – Difference of rank in the two series

N - Total number of pairs

m - Number of times each rank repeats

R = 1-

Page 29: Correlation continued

Qn:- Obtain rank correlation co-efficient for the data:-

X: 68 64 75 50 64 80 75 40 55 64

Y: 62 58 68 45 81 60 68 48 50 70

Page 30: Correlation continued

x y R1 R2D

(R1-R2) D²

68 62 4 5 1 1

64 58 6 7 1 1

75 68 2.5 3.5 1 1

50 45 9 10 1 1

64 81 6 1 5 25

80 60 1 6 5 25

75 68 2.5 3.5 1 1

40 48 10 9 1 1

55 50 8 8 0 0

64 70 6 2 4 16

∑D² 72

Page 31: Correlation continued

Merits of Rank Correlation Method It is very simple to understand. It can be applied to any type of data, i.e.

quantitative and qualitative It is the only way of studying correlation

between qualitative data such as honesty, beauty etc.

As the sum of rank differences of the two qualitative data is always equal to zero, this method facilitates a cross check on calculations.

Page 32: Correlation continued

Demerits of Rank Correlations Rank Correlation Coefficient is only an

approximate measure as the actual values are not used for calculations.

It is not convenient when the number of pairs (N) is large.

Further algebraic treatment is not possible. Combined correlation coefficient of different

series cannot be obtained as in the case of mean and standard deviation. In case of mean and standard deviation, it is possible to compute combine arithmetic mean and combined standard deviation.

Page 33: Correlation continued

CONCURRENT DEVIATION METHOD

Correlation Analysis continued…Chapter 2

Page 34: Correlation continued

Under this method, we only consider the directions of deviations.

If deviations of two variables are concurrent, then they move in the same direction, otherwise in the opposite direction.

ñ (2c-N)

NWhere N = no. of pairs of symbolC= No. of concurrent deviations (ie.No. of +signs in `dx

dy’ column

r =±

Page 35: Correlation continued

Steps1. Every value of `x’ series is compared with its

proceeding value. Increase is shown by`+’ symbol and decrease by`-’

2. The above step is repeated for `y’ series and we get `dy’

3. Multiply `dx’ by `dy’ and the product is shown in the next column. The column heading is `dxdy’

4. Take the total number of `+’ signs in `dxdy’ column. `+’ signs in `dxdy’ column denotes the concurrent deviations and it is indicated by `C’

5. Apply the formula.

Page 36: Correlation continued

Qn:- Calculate coefficient if correlation by concurrent deviation method:

Year : 2003 2004 2005 2006 2007 2008 2009 2010 2011

Supply : 160 164 172 182 166 170 178 192 186

Price : 292 280 260 234 266 254 230 190 200

Page 37: Correlation continued

Merits of concurrent deviation method:

1. It is very easy to calculate coefficient of correlation

2. It is very simple understand the method3. When the number of items is very large,

this method may be used to form quick idea about the degree of relationship

4. This method is more suitable,

Page 38: Correlation continued

Demerits of concurrent deviation method:

1. This method ignores the magnitude of changes. Ie. Equal weight is given for small and big changes.

2. The result obtained by this method is only a rough indicator of the presence or absence of correlation

3. Further algebraic treatment is not possible4. Combined coefficient of concurrent deviation of

Page 39: Correlation continued

Thank You!!!