SESSION 49 - 52. Last Update 17th June 2011. Regression.


Page 1: SESSION 49 - 52

SESSION 49 - 52

Last Update: 17th June 2011

Regression

Page 2: SESSION 49 - 52

Lecturer: Florian Boehlandt
University: University of Stellenbosch Business School
Domain: http://www.hedge-fund-analysis.net/pages/vega.php

Page 3: SESSION 49 - 52

Learning Objectives

1. XY-Scatter Diagrams
2. Plotting the Regression Line
3. Coefficient Estimates
4. Pearson Coefficient of Correlation
5. Spearman Rank Correlation Coefficient

Page 4: SESSION 49 - 52

XY-Scatter Diagram

To draw a scatter diagram we need data for two variables. In applications where one variable depends to some degree on the other, the dependent variable is labeled Y and the other, the independent variable, is labeled X. Each observation is plotted as a single data point, using its X and Y values as coordinates.

Page 5: SESSION 49 - 52

Example Temperature - Truck

[Figure: XY-scatter of Trucks (y) against Temp (x)]

Obs    Temp (x)  Trucks (y)
1      11        2.5
2      14        6.5
3      20        8.5
4      21        10.5
5      23        11
6      24        12
7      26        13
8      28        13.5
9      30        15.5
10     34        19
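As a small illustration of how the observations become points on the scatter diagram, the sketch below pairs each temperature with its truckload count. The variable names `temps`, `trucks`, and `points` are chosen here for illustration and do not come from the slides.

```python
# Temperature (x) and truckloads (y) for the ten observations in the example.
temps = [11, 14, 20, 21, 23, 24, 26, 28, 30, 34]
trucks = [2.5, 6.5, 8.5, 10.5, 11, 12, 13, 13.5, 15.5, 19]

# Each observation becomes one (x, y) coordinate pair on the scatter diagram.
points = list(zip(temps, trucks))

for obs, (x, y) in enumerate(points, start=1):
    print(f"Obs {obs}: x = {x}, y = {y}")
```

These pairs are what a charting tool would plot, with temperature on the horizontal axis and truckloads on the vertical axis.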

Page 6: SESSION 49 - 52

Regression Analysis

Regression analysis is used to predict the value of one variable on the basis of one or more other variables. The first-order linear model describes the relationship between the dependent variable Y and the independent variable(s) X. The regression model with a as the y-intercept and m as the slope coefficient (denoted b in the estimates that follow) is of the form:

$$y = a + m x$$

Page 7: SESSION 49 - 52

Example Temperature - Truck


[Figure: XY-scatter of Trucks (y) against Temp (x) with fitted line f(x) = 0.654323775118537 x − 3.91487920523821]

The estimators of the intercept a and slope coefficient b are based on drawing a straight line through the sample data:

$$\hat{y} = a + b x$$

Page 8: SESSION 49 - 52

Intercept and Slope

The intercept a is the y-coordinate of the point where the linear function intersects the y-axis. The slope coefficient b is defined as the change in y for a unit change in x.

Page 9: SESSION 49 - 52

Fitted Line With Residuals

The line drawn through the points is called the regression line.

Page 10: SESSION 49 - 52

Residuals Squared

The regression line, or least-squares line, is the line that minimizes the sum of the squared vertical differences (residuals) between the points and the line.
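The least-squares idea can be checked numerically. The sketch below is illustrative only: it takes the trend-line coefficients shown on the example chart as given, and the helper name `sum_squared_residuals` and the two perturbed comparison lines are assumptions made for this example.

```python
temps = [11, 14, 20, 21, 23, 24, 26, 28, 30, 34]
trucks = [2.5, 6.5, 8.5, 10.5, 11, 12, 13, 13.5, 15.5, 19]


def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Coefficients of the trend line shown on the example chart.
a_fit, b_fit = -3.91487920523821, 0.654323775118537

sse_fit = sum_squared_residuals(a_fit, b_fit, temps, trucks)

# Hypothetical alternative lines: a slightly steeper slope, a shifted intercept.
sse_steeper = sum_squared_residuals(a_fit, b_fit + 0.05, temps, trucks)
sse_shifted = sum_squared_residuals(a_fit + 0.5, b_fit, temps, trucks)

print(sse_fit, sse_steeper, sse_shifted)
```

Both alternative lines give a larger sum of squared residuals than the fitted line, which is what the least-squares criterion requires.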

Page 11: SESSION 49 - 52

Calculating Coefficients

Raw Data (y-variable as dependent and x as independent variable):

Obs    Temp (x)  Trucks (y)
1      11        2.5
2      14        6.5
3      20        8.5
4      21        10.5
5      23        11
6      24        12
7      26        13
8      28        13.5
9      30        15.5
10     34        19

Page 12: SESSION 49 - 52

Solution

Obs    Temp (x)  Trucks (y)  xy      x^2
1      11        2.5         27.5    121
2      14        6.5         91      196
3      20        8.5         170     400
4      21        10.5        220.5   441
5      23        11          253     529
6      24        12          288     576
7      26        13          338     676
8      28        13.5        378     784
9      30        15.5        465     900
10     34        19          646     1156
Total  231       112         2877    5779

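Step 1: Calculate the gradient (beta):

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(2877) - (231)(112)}{10(5779) - (231)^2} = \frac{2898}{4429} \approx 0.6543$$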
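The gradient can be checked from the column totals alone. The sketch below assumes the usual computational form of the least-squares slope, (n Σxy − Σx Σy) / (n Σx² − (Σx)²); the variable names are illustrative.

```python
# Column totals from the solution table (n = 10 observations).
n = 10
sum_x, sum_y = 231, 112
sum_xy, sum_x2 = 2877, 5779

# Least-squares slope from the totals.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

print(round(b, 4))  # 0.6543
```

The result agrees with the slope of the trend line shown on the example chart.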

Page 13: SESSION 49 - 52


Step 2: Calculate the intercept (alpha):

$$a = \bar{y} - b\,\bar{x} = \frac{112}{10} - 0.654324 \cdot \frac{231}{10} \approx -3.9149$$
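The intercept follows from the slope and the two sample means. This sketch assumes the standard relation a = ȳ − b·x̄ and recomputes the slope from the totals so that it runs on its own; the variable names are illustrative.

```python
# Column totals from the solution table (n = 10 observations).
n = 10
sum_x, sum_y = 231, 112
sum_xy, sum_x2 = 2877, 5779

# Slope from the totals, then the intercept from the sample means.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
x_bar, y_bar = sum_x / n, sum_y / n
a = y_bar - b * x_bar

print(round(a, 4))  # -3.9149
```

The fitted line always passes through the point of means (x̄, ȳ) = (23.1, 11.2), which is why the intercept can be recovered this way.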

Page 14: SESSION 49 - 52

Interpreting the Coefficients

The slope coefficient b may be interpreted as the change in the dependent variable y for a one-unit change in x. In the previous example, a one-degree increase in temperature is associated with b = 0.654 additional truckloads of cool drinks sold.

The intercept a is the point at which the regression line and the y-axis intersect. If x = 0 lies far outside the range of sample values of x, the interpretation of the intercept is not straightforward. In the temperature-truck example, x = 0 lies well below the smallest sample value (x = 11). Interpreting the intercept literally would imply that at a temperature of x = 0, soft-drink sales fall to negative 3.915 truckloads!

Page 15: SESSION 49 - 52

Point Prediction

Upon obtaining the coefficient estimates, we can predict the outcome for values of x between the minimum and maximum sample observations (point prediction) using the regression function y = a + bx. For example:

x = 16 degrees? y = -3.9149 + 0.6543 * 16 = 6.554 ≈ 7 truckloads

x = 32 degrees? y = -3.9149 + 0.6543 * 32 = 17.023 ≈ 17 truckloads
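A point prediction can be wrapped in a small function that also refuses values outside the sampled temperature range. This is a sketch under stated assumptions: the function name `predict_truckloads` and the choice to raise an error outside the range 11 to 34 are illustrative, not part of the slides.

```python
# Fitted coefficients from the example (intercept A, slope B).
A, B = -3.91487920523821, 0.654323775118537

# Smallest and largest temperatures observed in the sample.
X_MIN, X_MAX = 11, 34


def predict_truckloads(temp):
    """Point prediction y = A + B * temp, restricted to the sampled range."""
    if not X_MIN <= temp <= X_MAX:
        raise ValueError(f"temp {temp} lies outside the sample range {X_MIN}..{X_MAX}")
    return A + B * temp


for temp in (16, 32):
    print(temp, round(predict_truckloads(temp), 3))
```

Restricting predictions to the sampled range avoids the extrapolation problem that makes the intercept hard to interpret at x = 0.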

Page 16: SESSION 49 - 52

Pearson Coefficient of Correlation

The Pearson coefficient of correlation R may be used to test whether a linear relationship exists between y and x. Variables may be positively or negatively correlated: R = 1 denotes perfect positive correlation, and R = -1 signifies perfect negative correlation. R is defined for:

$$-1 \le R \le +1$$

Page 17: SESSION 49 - 52

Type of Relationship

[Figure: scatter-plot panels of y against x illustrating each type of relationship]

DIRECT LINEAR RELATIONSHIP (small dispersion, wide dispersion): positive linear correlation exists, 0 < r < +1

INVERSE LINEAR RELATIONSHIP (small dispersion, wide dispersion): negative linear correlation exists, -1 < r < 0

NO LINEAR RELATIONSHIP: no correlation, r = 0

Page 18: SESSION 49 - 52

Coefficient of Determination

Squaring the Pearson coefficient of correlation gives the coefficient of determination R^2 in regression. It may be interpreted as the proportion of the variation in the dependent variable y that is explained by the variation in the explanatory variable x. R^2 is a measure of the strength of the linear relationship between y and x.
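The "proportion of variation explained" reading can be illustrated with the temperature-truck data. The sketch below assumes the standard decomposition R^2 = 1 − SSE/SST, where SST is the total variation of y around its mean and SSE is the variation left over after fitting the line; the variable names are illustrative.

```python
temps = [11, 14, 20, 21, 23, 24, 26, 28, 30, 34]
trucks = [2.5, 6.5, 8.5, 10.5, 11, 12, 13, 13.5, 15.5, 19]

# Fitted coefficients as shown on the example chart's trend line.
a, b = -3.91487920523821, 0.654323775118537

y_bar = sum(trucks) / len(trucks)

# Total variation of y around its mean.
sst = sum((y - y_bar) ** 2 for y in trucks)
# Variation left unexplained by the fitted line (squared residuals).
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(temps, trucks))

r_squared = 1 - sse / sst
print(round(r_squared, 4))
```

For this data the value comes out at roughly 0.977, so almost all of the variation in truckloads is accounted for by temperature.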

Page 19: SESSION 49 - 52

Solution

Step 3: Calculate R and R^2

Obs    Temp (x)  Trucks (y)  xy      x^2    y^2
1      11        2.5         27.5    121    6.25
2      14        6.5         91      196    42.25
3      20        8.5         170     400    72.25
4      21        10.5        220.5   441    110.25
5      23        11          253     529    121
6      24        12          288     576    144
7      26        13          338     676    169
8      28        13.5        378     784    182.25
9      30        15.5        465     900    240.25
10     34        19          646     1156   361
Total  231       112         2877    5779   1448.5
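R and R^2 can be computed directly from the column totals. The sketch below assumes the usual computational form of the Pearson coefficient, (n Σxy − Σx Σy) / sqrt((n Σx² − (Σx)²)(n Σy² − (Σy)²)); the variable names are illustrative.

```python
import math

# Column totals from the table (n = 10 observations).
n = 10
sum_x, sum_y = 231, 112
sum_xy, sum_x2, sum_y2 = 2877, 5779, 1448.5

# Pearson coefficient of correlation from the totals.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

# Coefficient of determination.
r_squared = r ** 2

print(round(r, 4), round(r_squared, 4))
```

For the temperature-truck data this gives R of about 0.988 and R^2 of about 0.977, indicating a strong positive linear relationship.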

Page 20: SESSION 49 - 52

Spearman Rank Correlation

The standard coefficient of correlation allows us to determine whether there is evidence of a linear relationship between two interval variables. When the variables are ordinal, or when both variables are interval but the normality requirement is not satisfied, a nonparametric statistic called the Spearman Rank Correlation Coefficient may be used instead.

Page 21: SESSION 49 - 52

Objective: Comparing 2 Variables

Analyzing the relationship between two variables

Data type?
    Nominal: Chi-Square test of a contingency table
    Ordinal: Spearman Rank Correlation
    Interval: Population Distribution?
        Error is normal, or x and y bivariate normal: Simple linear regression
        x and y not bivariate normal: Spearman Rank Correlation

Page 22: SESSION 49 - 52

Example

The organizational strengths listed below were ranked independently by management and by staff. The managing director wished to know how closely the two assessments were correlated:

Ranking

Business Aspect         Management  Staff
Brand Equity            1           1
Financial Controls      2           3
Customer Service        3           2
Planning Systems        4           6
Research & Development  5           4
Company Morale          6           7
Productivity            7           5

Page 23: SESSION 49 - 52

Calculating RS

Ranking

Business Aspect         Obs  Management  Staff  d    d^2
Brand Equity            1    1           1      0    0
Financial Controls      2    2           3      -1   1
Customer Service        3    3           2      1    1
Planning Systems        4    4           6      -2   4
Research & Development  5    5           4      1    1
Company Morale          6    6           7      -1   1
Productivity            7    7           5      2    4
Total                                                12
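With the squared rank differences in hand, the coefficient can be computed. The sketch below assumes the standard formula for untied ranks, r_s = 1 − 6 Σd² / (n(n² − 1)); the function name `spearman_rs` is illustrative, and the formula would need adjusting if ranks were tied.

```python
# Ranks assigned to the seven business aspects, in table order.
management = [1, 2, 3, 4, 5, 6, 7]
staff = [1, 3, 2, 6, 4, 7, 5]


def spearman_rs(rank_a, rank_b):
    """Spearman rank correlation for two rankings without ties."""
    n = len(rank_a)
    sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rs = spearman_rs(management, staff)
print(round(rs, 4))  # 0.7857
```

With Σd² = 12 and n = 7 this is 1 − 72/336, which suggests that management and staff rank the strengths fairly similarly.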