Chapter 3: Correlation and Regression (Stat 145, Chapter 3 Part 1)


Chapter 3

TOPIC

Correlation Defined

Range of the Correlation Coefficient

Scatter Plots

Null and Alternative Hypotheses

Statistical Significance

Example 1

Example 2

Coefficient of Determination

Tutorials

• Obtaining the Correlation Coefficient in Excel 2007

CORRELATION AND REGRESSION


Chapter 3

CORRELATION


➊ Indicates how well the ranking of scores on one variable matches the ranking of scores on a second variable

➋ As the ranking of scores on the first variable increasingly matches the ranking of scores on the second variable, the correlation becomes stronger

• The fewer matched rankings, the weaker the correlation


➌ The rankings may match in the same direction (i.e., the score ranked first on variable 1 is also ranked first on variable 2) or in the opposite direction (i.e., the score ranked first on variable 1 is ranked last on variable 2)

➍ There is no correlation when the ranking of scores on one variable shows no consistent match, in either direction, with the ranking of scores on the second variable


➊ EXAMPLE: Five soccer players were ranked according to their soccer ability and their grade point average (GPA)

Perfect Positive r                      Perfect Negative r

Player   Soccer Ability   GPA           Player   Soccer Ability   GPA
A        1                1             A        1                5
B        2                2             B        2                4
C        3                3             C        3                3
D        4                4             D        4                2
E        5                5             E        5                1
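The two tables above can be checked numerically. The sketch below computes the Pearson correlation for each set of rankings; the helper name `pearson_r` and the variable names are illustrative choices, not part of the course materials (the slides obtain r in Excel).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

soccer_rank = [1, 2, 3, 4, 5]        # players A through E
gpa_rank_same = [1, 2, 3, 4, 5]      # rankings match in the same direction
gpa_rank_opposite = [5, 4, 3, 2, 1]  # rankings match in the opposite direction

r_positive = pearson_r(soccer_rank, gpa_rank_same)
r_negative = pearson_r(soccer_rank, gpa_rank_opposite)
print(r_positive, r_negative)
```

Identical rankings give a perfect positive correlation and fully reversed rankings give a perfect negative one.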


➊ The numeric value of the correlation coefficient ranges from -1.00 to +1.00, where zero indicates no correlation

• The closer the correlation coefficient is to +1.00 or -1.00, the stronger the correlation between two variables

• The closer the correlation coefficient is to 0, the weaker the correlation between two variables

  • A correlation coefficient equal to 0 means there is no correlation between two variables

➋ Which value represents a stronger correlation?

• +.65 or -.85
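One way to settle the question is to compare distances from zero and ignore the sign. This is a minimal sketch; the variable names are illustrative.

```python
candidates = [0.65, -0.85]

# Strength is judged by distance from zero, so compare absolute values.
stronger = max(candidates, key=abs)
print(stronger)
```

The sign only tells the direction of the relationship, so the negative value can still be the stronger correlation.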


➊ The correlation coefficient describes two characteristics:

• The sign of the correlation (positive or negative) indicates the direction of the relationship between the two variables

• The absolute value of the correlation indicates how strong the correlation is between the two variables

➋ The symbol for the correlation between two variables for a sample is a lowercase, italicized r


➊ Here is a rough guideline for defining the strength of a correlation coefficient:

• r = ±.80 to ±1.00   Strong Correlation

• r = ±.60 to ±.80   Moderate Correlation

• r = ±.40 to ±.60   Weak to Moderate Correlation

• r between -.40 and +.40   Weak Correlation

➋ The guideline above assumes a sample size of N ≥ 30
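The guideline can be written as a small lookup. This is only a sketch: the function name `strength_label` is hypothetical, and because the boundary values (.40, .60, .80) appear in two bands on the slide, assigning them to the stronger band is an assumption made here.

```python
def strength_label(r):
    """Label a correlation using the rough guideline (assumes N >= 30)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)  # strength ignores the sign
    # Assumption: a boundary value is assigned to the stronger band.
    if size >= 0.80:
        return "Strong"
    if size >= 0.60:
        return "Moderate"
    if size >= 0.40:
        return "Weak to Moderate"
    return "Weak"

labels = [strength_label(r) for r in (0.65, -0.85, 0.45, 0.10)]
print(labels)
```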


➊ A scatter plot is a graph that describes the direction and strength of the correlation between two variables

➋ The closer the points in the graph are to forming a straight line, the stronger the correlation between the two variables

• When the points in the graph form a circular pattern, the correlation will be close to or equal to zero

• When the pattern of points leans from lower right to upper left, the scatter plot indicates the correlation is negative

• When the pattern of points leans from lower left to upper right, the scatter plot indicates the correlation is positive

Chapter 3

SCATTER PLOTS


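➊ When the pattern is lower right to upper left, the correlation is negative:

[Figure: scatter plot of Y against X with the points leaning from lower right to upper left]

➋ When the pattern is lower left to upper right, the correlation is positive:

[Figure: scatter plot of Y against X with the points leaning from lower left to upper right]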


➊ A non-zero sample correlation does not necessarily mean two variables are related to each other

➋ There are two competing hypotheses:

• The alternative hypothesis (HA) contends there is a true correlation between the two variables for the population and the sample correlation observed is not solely due to random error

• The null hypothesis (H0) states that there is no correlation between the two variables for the population and that any sample correlation observed is solely due to random error

Chapter 3

NULL HYPOTHESIS


➊ When a correlation coefficient is sufficiently large, we can infer that it reflects not just random error but also how much the two variables have in common

• Remember that random error is present in everything we measure: you can’t get rid of it, and all statistics contain some amount of random error

• Smaller samples have more random error and larger samples have less


➊ A statistical conclusion is a statement that rejects or fails to reject the null hypothesis

• When we reject the null hypothesis, we are saying the sample correlation obtained is NOT solely due to random error but indicates a real correlation between the two variables for the population

• When we fail to reject the null hypothesis, we are acknowledging the observed sample correlation may be due only to random error and that there may not be any true correlation between the two variables for the population


➊ The stronger the sample correlation, the more likely there is a real correlation between the two variables for the population

➋ Whether a sample correlation is judged to be real depends on both the sample size and the strength of the correlation between the two variables

• As a general rule, the larger the sample size, the weaker the sample correlation needs to be in order to declare it statistically significant (meaning the null hypothesis is rejected)

  • In other words, the correlation coefficient needs to be stronger to reach significance when the data set is based on a small sample


➊ To determine whether a sample correlation is significant, we first work from the assumption that the null hypothesis is true

• We assume the null hypothesis is true because we haven’t analyzed the data yet (there’s no evidence of a correlation without analyzing the data)

➋ We only analyze the data from one sample, but to determine whether a sample correlation is statistically significant we have to remember that an infinite number of samples could have been selected

Chapter 3

STATISTICAL SIGNIFICANCE


➊ Assuming the null hypothesis is true, the correlation for the sample obtained is expected to be zero; if the value is not zero, we assume the difference is solely due to random error

➋ If we imagine obtaining the correlations for all possible samples (where each sample is the same size), we would find that the average of all sample correlations is equal to the population correlation

• Again, if the null hypothesis is true, the correlation between the two variables for the population will be zero


➊ If we imagine obtaining the correlations for all possible samples (where each sample is the same size), we could build a histogram using the sample correlation coefficients

• Since the histogram consists of all possible sample correlations, it is called a sampling distribution of sample correlations

• This histogram (or sampling distribution) will be flatter and wider when the sample correlations are based on smaller sample sizes and taller and narrower when the sample correlations are based on larger sample sizes
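A small simulation can approximate this sampling distribution. The sketch below is not part of the course materials: it assumes both variables are independent standard normal draws (so the null hypothesis is true by construction), uses a finite number of samples in place of "all possible samples", and fixes an arbitrary seed so the run is repeatable. The names `pearson_r` and `null_sample_correlations` are illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def null_sample_correlations(n, num_samples, rng):
    """Sample correlations for repeated samples of size n when the true correlation is 0."""
    results = []
    for _ in range(num_samples):
        # X and Y are drawn independently, so the population correlation is zero.
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        results.append(pearson_r(x, y))
    return results

rng = random.Random(145)  # arbitrary fixed seed
small = null_sample_correlations(n=10, num_samples=2000, rng=rng)
large = null_sample_correlations(n=100, num_samples=2000, rng=rng)

print("N = 10:  mean r =", round(mean(small), 3), "spread =", round(stdev(small), 3))
print("N = 100: mean r =", round(mean(large), 3), "spread =", round(stdev(large), 3))
```

Both sets of correlations should center near zero, with the N = 10 set noticeably more spread out than the N = 100 set.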


➊ The null hypothesis is rejected and the sample correlation is statistically significant when the obtained correlation value (from Excel) falls in the outer 5% of the histogram (or sampling distribution), that is, in the 2.5% at either end
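To see where the outer 5% begins, the cutoff can be estimated by simulation. This is a rough sketch under stated assumptions: independent standard normal draws for both variables, N = 25 (the sample size used in Example 1), 5,000 simulated samples, and an arbitrary seed. The estimate should land close to the table value of .396 quoted in Example 1, though it will not match it exactly.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(3)  # arbitrary fixed seed
n, num_samples = 25, 5000

# Absolute sample correlations under a true null hypothesis, sorted ascending.
abs_rs = sorted(
    abs(pearson_r([rng.gauss(0, 1) for _ in range(n)],
                  [rng.gauss(0, 1) for _ in range(n)]))
    for _ in range(num_samples)
)

# The outer 5% (2.5% in each tail) starts where 95% of |r| values fall below.
cutoff = abs_rs[int(0.95 * num_samples)]

# Share of simulated correlations beyond the table's critical values of ±.396.
share_beyond = sum(1 for value in abs_rs if value > 0.396) / num_samples
print(round(cutoff, 2), round(share_beyond, 3))
```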

[Figure: sampling distribution of r centered at 0. The region between the two critical values (rcrit) is labeled "Not Significant, Fail to Reject H0"; the 2.5% in each tail beyond a critical value is labeled "Significant, Reject H0".]

➊ The correlation values that mark off the outer 5% of the sampling distribution are called the critical values

➋ The critical values are found by using the r table on the class website

➌ To look up the critical value, you’ll need to know the sample size, N

• Locate the sample size in the first column

• Then, for the selected sample size, locate the critical value in the third column (.05 under ‘2-tailed testing’)
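The class r table is not reproduced in these slides. As a hedged sketch, standard r tables can be derived from the t distribution with N - 2 degrees of freedom; whether the class table was built this way is an assumption, although the results agree with the critical values quoted in the two worked examples (±.396 for N = 25 and ±.497 for N = 16). The function name `critical_r` is illustrative.

```python
from math import sqrt

def critical_r(t_crit, n):
    """Convert a two-tailed critical t value into a critical r for sample size n."""
    df = n - 2  # degrees of freedom for a correlation between two variables
    return t_crit / sqrt(t_crit ** 2 + df)

# Two-tailed .05 critical t values from a standard t table:
# 2.145 for df = 14 and 2.069 for df = 23.
r_crit_16 = critical_r(2.145, n=16)
r_crit_25 = critical_r(2.069, n=25)
print(round(r_crit_16, 3), round(r_crit_25, 3))
```

The larger sample has the smaller critical value, which is why a weaker correlation can reach significance when N is large.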


➊ A researcher recruited 25 adults ranging in age from 35 to 65 years old to find out if there is a relationship between number of television hours watched and blood pressure. The sample correlation obtained was +.65.

➋ State the null hypothesis for this problem

• The null hypothesis expects there to be no correlation between number of television hours watched and blood pressure for adults ranging in age from 35 to 65 years old. Any non-zero sample correlation observed is assumed to be solely due to random error.


Conduct a test of the null hypothesis at the 5% level. Be sure to properly state the statistical conclusion

• The sample correlation obtained in Excel is +.65

• The sample size is 25

• The critical values from the r table are ±.396

• The statistical conclusion is:

  • Since +.65 lies beyond the critical value of +.396, r(25) = +.65, p < .05; Reject H0
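The decision rule used here can be sketched as a short function. The name `statistical_conclusion` is hypothetical; the critical value still has to be looked up in the r table for the sample size at hand.

```python
def statistical_conclusion(r, r_crit):
    """Two-tailed test: reject H0 when r falls beyond either critical value."""
    if abs(r) > r_crit:
        return "Reject H0"
    return "Fail to reject H0"

# Example 1: r = +.65, N = 25, critical values of ±.396 from the r table.
conclusion = statistical_conclusion(r=0.65, r_crit=0.396)
print(conclusion)
```

Calling the same function with the values from Example 2 (r = +.45, critical values of ±.497) gives "Fail to reject H0".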


Provide an interpretation of the statistical conclusion using the variables from the description of the problem

• Based on the 25 adults surveyed, ranging in age from 35 to 65 years old, it appears that as the amount of television watched per day increases, there is an increase in blood pressure. The obtained sample correlation does not seem to be solely due to random error, but rather indicates a real correlation between amount of television watched per day and blood pressure.


➊ A marriage counselor believes that couples who spend more time making meals together are more satisfied with their relationship. Sixteen couples are recruited for the study and asked to keep track of how much time (in minutes) they spend preparing meals together each day for one month. At the end of the month, couples are asked to complete a survey on how satisfied they are with their current relationship. The sample correlation obtained was +.45.


➋ State the null hypothesis for this problem

• The null hypothesis expects there to be no correlation between amount of time couples spend together preparing meals and their satisfaction with their current relationship. Any non-zero sample correlation observed is assumed to be solely due to random error.


Conduct a test of the null hypothesis at the 5% level. Be sure to properly state the statistical conclusion

• The sample correlation obtained in Excel is +.45

• The sample size is 16

• The critical values from the r table are ±.497

• The statistical conclusion is:

  • Since +.45 does not reach the critical value of +.497, r(16) = +.45, p > .05; Fail to reject H0


Provide an interpretation of the statistical conclusion using the variables from the description of the problem

• Based on the 16 couples recruited for the study, there is not enough evidence that satisfaction with current relationship is related to how much time couples spend making meals together. The obtained sample correlation may be due to random error alone.


➊ The coefficient of determination or r² provides an estimate of the percentage of variance that is common to two variables (also known as covariance)

• Variance refers to all the things that cause scores on a given variable to be different

  • What causes people to be different heights?

    • Genes, nutrition, disease, age, race, and gender to name a few

    • Differences on these traits cause variance in heights across the population

Chapter 3

COEFFICIENT OF DETERMINATION


➊ If two variables are correlated, they must share some amount of variance

• There is a significant correlation between height and weight for the population

  • What is the variance shared between these two variables?

    • Both height and weight are influenced by genes, nutrition, disease, age, race, and gender

    • These variables likely explain why height and weight are correlated

• The variance shared by two variables is known as covariance


➊ What is the coefficient of determination or r² for the problem examining the relationship between amount of TV watched and blood pressure?

• To get the coefficient of determination, square the sample correlation obtained in Excel

  • r² = .65 × .65 = .42 or 42%

  • Interpretation: It is estimated that 42% of the variance in amount of TV watched per day is common to blood pressure. This estimate of covariance is based on a sample size of 25.
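The arithmetic above is easy to reproduce. This is a minimal sketch; the function name `coefficient_of_determination` is illustrative, and the slides perform the same step by squaring the Excel output.

```python
def coefficient_of_determination(r):
    """Square the sample correlation to estimate the shared variance."""
    return r ** 2

r_squared = coefficient_of_determination(0.65)

# Report the value as a two-decimal proportion and as a whole percentage.
print(f"r^2 = {r_squared:.2f} ({r_squared:.0%})")
```

Because r is squared, the result is the same for +.65 and -.65: the coefficient of determination carries no information about direction.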


➊ What is the coefficient of determination or r² for the problem examining the relationship between amount of time couples spend making meals together and level of satisfaction with their current relationship?

• r² = .45 × .45 = .20 or 20%

• Interpretation: It is estimated that 20% of the variance in amount of time couples spend making meals together is common to the level of satisfaction with their current relationship. This estimate of covariance is based on a sample size of 16.

• NOTE: The coefficient of determination was calculated for the example above for demonstration only. The coefficient of determination is not interpretable for non-significant correlations


End of Chapter 3 – Part 1