multiple regression analysis (mra)

Post on 05-Jan-2016

74 Views

Category:

Documents

7 Downloads

Preview:

Click to see full reader

DESCRIPTION

Multiple Regression Analysis (MRA). Design requirements Multiple regression model R 2 Comparing standardized regression coefficients. Steps in data analysis. Look first at each variable separately Then at relationships among the variables - PowerPoint PPT Presentation

TRANSCRIPT

MULTIPLE REGRESSION ANALYSIS (MRA)

Design requirements

Multiple regression model

R2

Comparing standardized regression coefficients

STEPS IN DATA ANALYSIS Look first at each variable separately Then at relationships among the variables

Examine the distribution of each variable to be used in multiple regression to determine if there are any unusual patterns that may be important in building our regression analysis.

DISTRIBUTION OF VARIABLES

CORRELATION ANALYSIS If interested only in determining whether a relationship exists, use

correlation analysis. Example: Student’s height and weight.

Plot of Height vs Weight

100 140 180 220 260

Weight

4.6

5

5.4

5.8

6.2

6.6

7

Hei

ght

Plot of Height vs Weight

100 140 180 220 260

Weight

5.3

5.6

5.9

6.2

6.5

6.8

Hei

ght

Plot of Height vs Weight

100 140 180 220 260

Weight

5.4

5.8

6.2

6.6

7

Hei

ght

Plot of Height vs Weight

100 140 180 220 260

Weight

5

5.4

5.8

6.2

6.6

Hei

ght

CORRELATION ANALYSIS

Correlation coefficient close to +1=strong positive relationship.

Correlation coefficient close to -1= strong negative relationship.

Correlation coefficient close to 0= no relationship.

EXAMPLE: SELF CONCEPT AND ACADEMIC ACHIEVEMENT (N=103)

CORRELATION

MULTIPLE REGRESSION ANALYSIS (MRA)

Method for studying the relationship between a dependent variable and two or more independent variables.

Purposes: Prediction Explanation Theory building

DESIGN REQUIREMENTS

One dependent variable (criterion)

Two or more independent variables (predictor variables).

Sample size: >= 50 (at least 10 times as many cases as independent variables)

ASSUMPTIONS

Independence: The scores of any particular subject are independent of the scores of all other subjects

Normality: In the population, the scores on the dependent variable are normally distributed for each of the possible combinations of the level of the X variables; each of the variables is normally distributed

ASSUMPTIONS Homoscedasticity: In the population, the

variances of the dependent variable for each of the possible combinations of the levels of the X variables are equal.

Linearity: In the population, the relation between the dependent variable and the independent variable is linear when all the other independent variables are held constant.

HOMOSCEDASTICITY(HOMOGENEITY OF VARIANCE)

LINEAR REGRESSION

In simple linear regression the relationship between one explanatory variable (IV) and one response variable (DV).

In multiple regression, several explanatory variables work together to explain the dependent variable.

MODELS

WHAT IS A MODEL?

Representation of Some PhenomenonRepresentation of Some Phenomenon

(Non-Math/Stats Model)(Non-Math/Stats Model)

WHAT IS A MATH/STATS MODEL?

Describe Relationship between Variables

Types- Deterministic Models

(no randomness)

- Probabilistic Models

(with randomness)

DETERMINISTIC MODELS

1. Hypothesize Exact Relationships

2. Suitable When Prediction Error is Negligible

3. Example: Body mass index (BMI) is measure of body fat based on this formula.

Non-metric Formula: BMI = Weight (pounds)x703 (Height in inches)2

PROBABILISTIC MODELS

1. Hypothesize 2 Components Deterministic Random Error

2. Example: Systolic blood pressure (SBP) of newborns is 6 Times the Age in days + Random Error

SBP = 6xage(d) + Random Error May Be Due to Factors

Other than age in days (e.g. Birth weight)

TYPES OF PROBABILISTIC MODELS

ProbabilisticModels

RegressionModels

CorrelationModels

OtherModels

ProbabilisticModels

RegressionModels

CorrelationModels

OtherModels

REGRESSION MODELS

TYPES OF PROBABILISTIC MODELS

ProbabilisticModels

RegressionModels

CorrelationModels

OtherModels

ProbabilisticModels

RegressionModels

CorrelationModels

OtherModels

REGRESSION MODELS Relationship between one dependent variable

and explanatory variable(s)

Use equation to set up relationship Numerical Dependent (Response) Variable 1 or More Numerical or Categorical

Independent (Explanatory) Variables

Used Mainly for Prediction & Estimation

REGRESSION MODELING STEPS

1. Hypothesize Deterministic Component Estimate Unknown Parameters

2. Specify Probability Distribution of Random Error Term

Estimate Standard Deviation of Error

3. Evaluate the fitted Model

4. Use Model for Prediction & Estimation

MULTIPLE REGRESSION

Very popular among social scientists.Most social phenomena have more than

one cause.

Very difficult to manipulate just one social variable through experimentation.

Social scientists must attempt to model complex social realities to explain them.

MULTIPLE REGRESSIONAllows us to:

Use several variables at once to explain the variation in a continuous dependent variable.

Isolate the unique effect of one variable on the continuous dependent variable while taking into consideration that other variables are affecting it too.

Write a mathematical equation that tells us the overall effects of several variables together and the unique effects of each on a continuous dependent variable.

Control for other variables to demonstrate whether bivariate relationships are spurious

*** MULTIPLE REGRESSION For example:

A researcher may be interested in the relationship between Education and Income and Number of Children in a family.

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

MULTIPLE REGRESSION For example:

Research Hypothesis: As education of respondents increases, the number of children in families will decline (negative relationship).

Research Hypothesis: As family income of respondents increases, the number of children in families will decline (negative relationship).

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

MULTIPLE REGRESSION For example:

Null Hypothesis: There is no relationship between education of respondents and the number of children in families.

Null Hypothesis: There is no relationship between family income and the number of children in families.

Independent Variables

Education

Family Income

Dependent Variable

Number of Children

MULTIPLE REGRESSION

Model Summary

.757a .573 .534 2.33785Model1

R R SquareAdjustedR Square

Std. Error ofthe Estimate

Predictors: (Constant), Income, Educationa. ANOVAb

161.518 2 80.759 14.776 .000a

120.242 22 5.466

281.760 24

Regression

Residual

Total

Model1

Sum ofSquares df Mean Square F Sig.

Predictors: (Constant), Income, Educationa.

Dependent Variable: Childrenb.

Coefficientsa

11.770 1.734 6.787 .000

-.364 .173 -.412 -2.105 .047

-.403 .194 -.408 -2.084 .049

(Constant)

Education

Income

Model1

B Std. Error

UnstandardizedCoefficients

Beta

StandardizedCoefficients

t Sig.

Dependent Variable: Childrena.

57% of the variation in number of children is explained by education and income!

Predictable variation by combination of independent variables

EXPLAINING VARIATION: HOW MUCH?

Total Variation in Y

UnpredictableVariation

PROPORTION OF PREDICTABLE AND UNPREDICTABLE VARIATION

X1

Y

(1-R2) = Unpredictable (unexplained) variation in Y

X2

Where:Y= # ChildrenX1 = EducationX2 = Income

R2 = Predictable (explained) variation in Y

MULTIPLE REGRESSIONNow… More Variables! The social world is very complex. What happens when you have even more variables?

For example:

A researcher may be interested in the effects of Education, Income, Sex, and Gender Attitudes on Number of Children in a family.

Independent Variables

Education

Family Income

Sex

Gender Attitudes

Dependent Variable

Number of Children

SIMPLE VS. MULTIPLE REGRESSION

One dependent variable Y predicted from one independent variable X

One regression coefficient

r2: proportion of variation in dependent variable Y predictable from X

One dependent variable Y predicted from a set of independent variables (X1, X2 ….Xk)

One regression coefficient for each independent variable

R2: proportion of variation in dependent variable Y predictable by set of independent variables (X’s)

DIFFERENT WAYS OF BUILDING REGRESSION MODELS

Simultaneous (Enter): All independent variables entered together

Stepwise: Independent variables entered according to some order (Determined by researcher) By size or correlation with dependent variable In order of significance (theory)

Hierarchical (Forward, Backward): Independent variables entered in stages

MULTIPLE REGRESSION:BLUE CRITERIA

Regression forces a best-fitting model onto data. If the model is appropriate for the data, regression should be used.

How do we know that our model is appropriate for the data?

Criteria for determining whether a regression model is appropriate for the data are nicknamed “BLUE” for best linear unbiased estimate.

MULTIPLE REGRESSION:BLUE CRITERIA

Violating the BLUE assumptions may result in biased estimates or incorrect significance tests. (However, OLS is robust to most violations.)

Data (constellation) should meet these criteria: The relationship between the dependent variable and its

predictors is linear No irrelevant variables are either omitted from or included in

the equation. (Good luck!) All variables are measured without error. (Good luck!)

top related