statistical analysis regression & correlation

Post on 03-Jan-2016


Statistical Analysis

Regression & Correlation

Psyc 250

Winter, 2013

Review: Types of Variables & Steps in Analysis

Variables & Statistical Tests

Variable Type        Example             Common Stat Method
Nominal by nominal   Blood type by gender   Chi-square
Scale by nominal     GPA by gender          T-test
                     GPA by major           Analysis of Variance
Scale by scale       Weight by height       Regression
                     GPA by SAT             Correlation

Evaluating an hypothesis

• Step 1: What is the relationship in the sample?

• Step 2: How confidently can one generalize from the sample to the universe from which it comes?

p < .05

Evaluating an hypothesis

Variables                    Relationship in Sample          Statistical Significance
2 nom. vars.                 Cross-tab / contingency table   “p value” from Chi Square
Scale dep. & 2-cat indep.    Means for each category         “p value” from t-test
Scale dep. & 3+ cat indep.   Means for each category         “p value” from ANOVA F ratio
2 scale vars.                Regression line                 “p value” from reg or correlation
                             Correlation r & r²


Relationships between Scale Variables

Regression

Correlation

Regression

• Amount that a dependent variable increases (or decreases) for each unit increase in an independent variable.

• Expressed as equation for a line: y = m(x) + b, the “regression line”

• Interpret by slope of the line: m

(Or: interpret by “odds ratio” in “logistic regression”)
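The slope m and constant b of a regression line can be computed directly from the data. The sketch below is one minimal way to do it in plain Python; the function name `least_squares_line` and the five data points are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Minimal least-squares fit of y = m(x) + b, using no libraries.
# The data below are hypothetical and only illustrate the arithmetic.

def least_squares_line(xs, ys):
    """Return (m, b) for the line that minimizes squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-products / sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the line passes through (mean_x, mean_y)
    return m, b

# Hypothetical points lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, b = least_squares_line(xs, ys)
print(f"y = {m:.2f}(x) + {b:.2f}")
```

Here m is read as "y increases by m for each unit increase in x", which is the interpretation given on the slide.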

Correlation

• Strength of association of scale measures

• r = -1 to 0 to +1

+1 perfect positive correlation

-1 perfect negative correlation

0 no correlation

• Interpret r in terms of variance
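To make the range of r concrete, here is a small sketch of the Pearson correlation in plain Python. The helper name `pearson_r` and the three tiny data sets are hypothetical; they were picked to land on the +1, -1 and 0 cases listed above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: cross-products scaled by both sets of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                      # hypothetical scale measure
r_pos = pearson_r(xs, [2, 4, 6, 8, 10])   # rises with x in a straight line
r_neg = pearson_r(xs, [10, 8, 6, 4, 2])   # falls with x in a straight line
r_zero = pearson_r(xs, [2, 1, 3, 1, 2])   # no linear trend
print(round(r_pos, 2), round(r_neg, 2), round(r_zero, 2))
```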

Mean & Variance

Example: Weight & Height

Survey of Class n = 42

• Height
• Mother’s height
• Mother’s education
• SAT
• Estimate IQ
• Well-being (7 pt. Likert)
• Weight
• Father’s education
• Family income
• G.P.A.
• Health (7 pt. Likert)


Frequency Table for: HEIGHT

Value    Frequency   Percent   Valid Percent   Cum Percent
59.00        1         2.4         2.4             2.4
61.00        2         4.8         4.8             7.1
62.00        3         7.1         7.1            14.3
63.00        3         7.1         7.1            21.4
65.00        5        11.9        11.9            33.3
66.00        3         7.1         7.1            40.5
67.00        4         9.5         9.5            50.0
68.00        5        11.9        11.9            61.9
69.00        1         2.4         2.4            64.3
70.00        6        14.3        14.3            78.6
71.00        1         2.4         2.4            81.0
72.00        4         9.5         9.5            90.5
73.00        3         7.1         7.1            97.6
74.00        1         2.4         2.4           100.0
Total       42       100.0       100.0

Valid cases 42    Missing cases 0

Descriptive Statistics for: HEIGHT

Variable   Mean    Std Dev   Variance   Range   Minimum   Maximum   Valid N
HEIGHT     67.33   3.87      14.96      15.00   59.00     74.00     42

Variance

$$s^2 = \frac{\sum_i (x_i - \text{Mean})^2}{N - 1}$$

Standard Deviation

$$s = \sqrt{s^2} = \sqrt{\text{variance}}$$
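As a check on the formulas, the sketch below recomputes the mean, variance and standard deviation of HEIGHT from the frequency table, dividing by N - 1 as in the variance formula. The values and frequencies come from the table; the variable names are just illustrative. The results should round to the 67.33, 14.96 and 3.87 reported in the descriptive statistics.

```python
import math

# HEIGHT values and their frequencies, copied from the frequency table.
height_freq = {59: 1, 61: 2, 62: 3, 63: 3, 65: 5, 66: 3, 67: 4,
               68: 5, 69: 1, 70: 6, 71: 1, 72: 4, 73: 3, 74: 1}

n = sum(height_freq.values())                        # number of valid cases
mean = sum(v * f for v, f in height_freq.items()) / n
# Sum of squared distances from the mean, weighted by frequency.
sum_sq = sum(f * (v - mean) ** 2 for v, f in height_freq.items())
variance = sum_sq / (n - 1)                          # s squared
std_dev = math.sqrt(variance)                        # s

print(n, round(mean, 2), round(variance, 2), round(std_dev, 2))
```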

Frequency Table for: WEIGHT

Value     Frequency   Percent   Valid Percent   Cum Percent
115.00        1         2.4         2.4             2.4
120.00        1         2.4         2.4             4.8
124.00        1         2.4         2.4             7.1
125.00        4         9.5         9.5            16.7
128.00        1         2.4         2.4            19.0
130.00        6        14.3        14.3            33.3
135.00        4         9.5         9.5            42.9
136.00        1         2.4         2.4            45.2
140.00        3         7.1         7.1            52.4
145.00        2         4.8         4.8            57.1
150.00        3         7.1         7.1            64.3
155.00        2         4.8         4.8            69.0
160.00        6        14.3        14.3            83.3
165.00        2         4.8         4.8            88.1
170.00        1         2.4         2.4            90.5
185.00        1         2.4         2.4            92.9
190.00        2         4.8         4.8            97.6
210.00        1         2.4         2.4           100.0
Total        42       100.0       100.0

Valid cases 42    Missing cases 0

Descriptive Statistics for: WEIGHT

Variable   Mean     Std Dev   Variance   Range   Minimum   Maximum   Valid N
WEIGHT     146.38   21.30     453.80     95.00   115.00    210.00    42

Relationship of weight & height:

Regression Analysis

“Least Squares” Regression Line

Dependent = ( B ) (Independent) + constant

weight = ( B ) ( height ) + constant

Regression line

Regression: WEIGHT on HEIGHT

Multiple R            .59254
R Square              .35110
Adjusted R Square     .33488
Standard Error      17.37332

Analysis of Variance
              DF   Sum of Squares   Mean Square
Regression     1      6532.61322     6532.61322
Residual      40     12073.29154      301.83229

F = 21.64319    Signif F = .0000

------------------ Variables in the Equation ------------------
Variable              B          SE B       Beta        T     Sig T
HEIGHT         3.263587       .701511    .592541    4.652     .0000
(Constant)   -73.367236     47.311093              -1.551
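Several figures in this output follow from the two sums of squares. The sketch below recomputes them in plain Python. The sums of squares and degrees of freedom are taken from the Analysis of Variance table; the adjusted R Square formula used is the standard one for a single predictor, and the variable names are illustrative.

```python
import math

# Values copied from the Analysis of Variance table.
ss_regression, df_regression = 6532.61322, 1
ss_residual, df_residual = 12073.29154, 40

ms_regression = ss_regression / df_regression    # mean square = SS / DF
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual            # F
std_error = math.sqrt(ms_residual)               # Standard Error of the estimate

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total              # share of variance explained
multiple_r = math.sqrt(r_square)

n = df_regression + df_residual + 1              # 42 cases
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual

print(round(f_ratio, 3), round(std_error, 3),
      round(multiple_r, 4), round(r_square, 4), round(adj_r_square, 4))
```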

[ Equation: Weight = 3.3 ( height ) - 73 ]

Regression line

W = 3.3 H - 73
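The fitted line can be used to predict weight at a given height. This sketch uses the unrounded B and constant from the regression output; the function name `predicted_weight` and the example heights are only for illustration. A least-squares line passes through the point of means, so the prediction at the mean height (67.33) should come out close to the mean weight (146.38).

```python
# Coefficients copied from "Variables in the Equation".
B_HEIGHT = 3.263587
CONSTANT = -73.367236

def predicted_weight(height):
    """Weight predicted by the regression line for a given height."""
    return B_HEIGHT * height + CONSTANT

# Example heights chosen for illustration; 67.33 is the sample mean height.
for h in (62, 67.33, 72):
    print(h, round(predicted_weight(h), 1))
```

Each additional unit of height raises the predicted weight by B, about 3.3.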

Strength of Relationship

“Goodness of Fit”: Correlation

How well does the regression line “fit” the data?

Correlation

• Strength of association of scale measures

• r = -1 to 0 to +1

+1 perfect positive correlation

-1 perfect negative correlation

0 no correlation

• Interpret r in terms of variance

Descriptive Statistics for: WEIGHT

Variable   Mean     Std Dev   Variance   Range   Minimum   Maximum   Valid N
WEIGHT     146.38   21.30     453.80     95.00   115.00    210.00    42

Variance = 454

[Figure: plot marking the mean and the regression line]

Correlation: “Goodness of Fit”

• Variance (average squared distance from the mean) = 454

• “Least squares” (average squared distance from the regression line) = 295

[Figure: plot marking the mean (s² = 454) and the regression line (l.s. = 295)]

Correlation: “Goodness of Fit”

• How much is variance reduced by calculating from regression line?

• 454 – 295 = 159

  159 / 454 = .35

• Variance is reduced 35% by calculating “least squares” from regression line

• r² = .35

r² = % of variance in WEIGHT “explained” by HEIGHT
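The same arithmetic as a short Python sketch, using the rounded figures from the slides (454 and 295). The variable names are illustrative. Taking the square root of the proportion gives the size of r; its sign comes from the slope of the regression line.

```python
import math

variance_from_mean = 454   # average squared distance from the mean
variance_from_line = 295   # average squared distance from the regression line

reduction = variance_from_mean - variance_from_line   # 159
r_squared = reduction / variance_from_mean            # proportion of variance "explained"
r_size = math.sqrt(r_squared)                         # magnitude of r

print(reduction, round(r_squared, 2), round(r_size, 2))
```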

Correlation coefficient = r

Correlation: HEIGHT with WEIGHT

            HEIGHT      WEIGHT
HEIGHT      1.0000       .5925
            (  42)      (  42)
            P= .        P= .000
WEIGHT       .5925      1.0000
            (  42)      (  42)
            P= .000     P= .

r = .59

r² = .35

HEIGHT “explains” 35% of variance in WEIGHT

Sentence & G.P.A.

• Regression: form of relationship

• Correlation: strength of relationship

• p value: statistical significance

Legal Attitudes Study:

1. Relationship of sentence length to G.P.A.?

2. Relationship of sentence length to Liberal-Conservative views

G. P. A.

[Figure: histogram of grade point average. x-axis: grade point average (values 3.00, 3.20, 3.33, 3.40, 3.70, 3.75, 3.80, 3.90, 4.00); y-axis: Frequency (0 to 7)]

Statistics: grade point average

N Valid            23
N Missing           1
Mean                3.5752
Std. Deviation       .35057
Variance             .12290

Length of Sentence (simulated data)

[Figure: histogram of jail sentence. x-axis: jail sentence (values .00, 2.00, 3.00, 4.00, 6.00, 9.00, 11.00, 12.00, 18.00); y-axis: Frequency (0 to 7)]

Statistics: jail sentence

N Valid            24
N Missing           0
Mean                5.1250
Std. Deviation      5.44788
Variance           29.67935

Scatterplot: Sentence on G.P.A.

[Figure: scatterplot. x-axis: grade point average (2.8 to 4.2); y-axis: jail sentence (-10 to 20)]

Regression Coefficients

Coefficients(a)

                          Unstandardized Coefficients   Standardized Coefficients
Model                     B          Std. Error         Beta                          t        Sig.
1  (Constant)             17.853     12.097                                           1.476    .155
   grade point average    -3.534      3.368             -.223                        -1.049    .306

a. Dependent Variable: jail sentence

Sentence = -3.5 G.P.A. + 18

“Least Squares” Regression Line

[Figure: scatterplot with fitted line Sent = -3.5 GPA + 18. x-axis: grade point average (2.8 to 4.2); y-axis: jail sentence (-10 to 20)]

Correlation: Sentence & G.P.A.

Correlations

                                             grade point average   jail sentence
grade point average   Pearson Correlation    1                     -.223
                      Sig. (2-tailed)        .                      .306
                      N                      23                     23
jail sentence         Pearson Correlation    -.223                  1
                      Sig. (2-tailed)        .306                   .
                      N                      23                     24

Statistical Significance

Regression: Sig. for grade point average = .306 (t = -1.049)

Correlation: Sig. (2-tailed) = .306 (r = -.223, N = 23)

p = .31

Interpreting Correlations

• r = -.22

• r² = .05, p = .31

G.P.A. “explains” 5% of the variance in length of sentence
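The r², t and p values for this example can be recovered from r and N alone. The sketch below is one way to do that in plain Python, assuming the usual t test for a correlation with N - 2 degrees of freedom. The helper names `t_pdf` and `two_tailed_p` are hypothetical, and the p value is found by numerically integrating the t density with Simpson's rule rather than by a statistics library, so treat it as an approximation.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: 1 minus the area between -|t| and +|t| (Simpson's rule)."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

r, n = -0.223, 23                          # from the Correlations table
df = n - 2
r_squared = r ** 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r_squared)
p_value = two_tailed_p(t_stat, df)

print(round(r_squared, 2), round(t_stat, 2), round(p_value, 2))
```

Because p is well above .05, the relationship in the sample cannot be confidently generalized.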

Write Results

“A regression analysis found that each one-unit increase in GPA was associated with a 3.5-month decrease in sentence length, but the correlation was low (r = -.22) and not statistically significant (p = .31).”

Multiple Regression

• Problem: relationship of weight and calorie consumption

• Both weight and calorie consumption related to height

• Need to “control for” height or assess relative effects of height and calorie consumption

[Figure: plot marking the regression line and the mean]

Multiple Regression: Residuals

[Figure: plot marking the regression line, the mean, and the residuals]

Multiple Regression

• Regress weight residuals (dependent variable) on caloric intake (independent variable)

• Statistically “controls” for height: removes effect or “confound” of height.

• How much variance in weight does caloric intake account for over and above height?
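The residual approach can be sketched in two steps: fit weight on height, then fit what is left over (the residuals) on caloric intake. Everything below is hypothetical: the six cases, the variable names and the helper `fit_line` are made up for illustration and are not from the class survey. Note that a full multiple regression would also remove height from caloric intake before the second step, so the second slope here follows the simplified description on the slide.

```python
def fit_line(xs, ys):
    """Least-squares slope and constant for ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical data for six people (not from the survey).
height = [62, 65, 67, 68, 70, 73]
calories = [1800, 2400, 2000, 2600, 2200, 2900]
weight = [120, 142, 140, 158, 155, 185]

# Step 1: regress weight on height and keep the residuals.
m_h, b_h = fit_line(height, weight)
residuals = [w - (m_h * h + b_h) for h, w in zip(height, weight)]

# Step 2: regress the weight residuals on caloric intake.
m_c, b_c = fit_line(calories, residuals)

print("weight residuals:", [round(e, 1) for e in residuals])
print("slope of residuals on calories:", round(m_c, 4))
```

The residuals are the part of weight that height does not account for, which is why they sum to zero and are uncorrelated with height.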

Multiple Regression

• How much variance in dependent measure (weight, length of sentence) do all independent variables combined account for?

multiple R²

• What is the best “model” for predicting the dependent variable?

Malamuth: Sexual Aggression

• Dependent Var: self-report aggression

• Indep / Predictor Vars:
    – Dominance
    – Hostility toward women
    – Acceptance of violence toward women
    – Psychoticism
    – Sexual Experience

+ interaction effects

Malamuth: multiple regressions

• Without “tumescence” index:
    multiple R = .55     w/ interactions R = .67
    multiple R² = .30    R² = .45

• With “tumescence” index:
    multiple R = .62     w/ interactions R = .87
    multiple R² = .38    R² = .75
