STAT 203 Elementary Statistical Methods

Upload: walter-hart

Post on 29-Dec-2015


Page 1: STAT 203 Elementary Statistical Methods. Review of Basic Concepts Population and Samples Variables and Data Data Representation (Frequency Distn Tables,

STAT 203

Elementary Statistical Methods


AA-K 2014/15 2

Review of Basic Concepts

• Population and Samples
• Variables and Data
• Data Representation (Frequency Distribution Tables, Graphs and Charts)
• Descriptive Measures
  - Measures of Central Tendency (mean, median)
  - Measures of Variation (standard deviation, range, etc.)
  - Five-Number Summaries


Examining Relationships of Two Numerical Variables

• In many applications, we are interested not only in understanding variables on their own, but also in examining the relationships among variables.

• Predictions based on historical or available data are often required in business, economics and the physical sciences.


Examples

• Education: final exam score, study time and class attendance

• Production: overhead cost, level of production, and the number of workers

• Real Estate: value of a home, size (square feet), area

• Economics: demand and supply

• Business: dividend yield and earnings per share


Terminology

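• Dependent Variable (response variable or y-variable): the variable whose value is to be explained or predicted

• Independent Variable (predictor variable or x-variable): the variable used to explain or predict the dependent variable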


Graphical

• Scatter plots (useful for 2 variables): a scatter plot is a graph of plotted data pairs (x, y).

• Matrix plot (useful for more than 2 variables): it presents the individual scatter plots in the form of a matrix.


Example

• Consider some historical data for a production plant:

Production Units (in 10,000s): 5 6 7 8 9 10 11
Overhead Costs (in $1000s): 12 11.5 14 15 15.4 15.3 17.5

Construct a scatter plot of y versus x, taking overhead costs as y and production units as x.


Example (cont)

[Figure: scatter plot of overhead costs versus production units]


Example of a matrix plot


Linear Correlation Coefficient (r)

• A numerical measure of the strength and direction of the linear relationship between 2 variables

• Pearson’s product moment correlation coefficient (PPMC)

• Spearman’s rank correlation coefficient

• Kendall’s correlation coefficient (τ)


Pearson’s product moment correlation coefficient (PPMC)

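• Computational Formula: $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$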
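As a rough illustration, the sketch below computes Pearson's r for the production plant example, assuming the usual raw-sums form r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²][nΣy² − (Σy)²]). The function name `pearson_r` is a hypothetical helper chosen for this sketch, not something taken from the slides.

```python
from math import sqrt

# Production plant example from the slides:
# x = production units (in 10,000s), y = overhead costs (in $1000s)
x = [5, 6, 7, 8, 9, 10, 11]
y = [12, 11.5, 14, 15, 15.4, 15.3, 17.5]


def pearson_r(x, y):
    """Pearson's r from raw sums (hypothetical helper for illustration)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 4))  # roughly 0.94: a strong positive linear relationship
```

A value this close to +1 agrees with the upward trend one would expect to see in the scatter plot of overhead costs against production units.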


Properties of the Correlation Coefficient

• $-1 \le r \le 1$

• When $r = \pm 1$, a perfect linear relationship exists between X and Y

• When $r = 0$, no linear relationship exists between X and Y

• When $r$ is positive but close to 0, there is a weak positive linear relationship between X and Y


Properties of the Correlation Coefficient (cont)

• When $r$ is negative but close to 0, there is a weak negative linear relationship between X and Y

• When $r$ is close to $+1$, there is a strong positive linear relationship between X and Y

• When $r$ is close to $-1$, there is a strong negative linear relationship between X and Y


Some scatter plots


NOTE

• The correlation coefficient only measures the “linear” relationship. Therefore, a correlation close to 0 indicates that the two variables have a very weak linear relationship. It does not rule out that the two variables are related in some other functional form (quadratic, cubic, S-shaped, etc.)


Example of a quadratic relationship between X and Y
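The point can also be checked numerically. The sketch below uses made-up data (not from the slides) in which y is exactly x squared over a range symmetric about 0; `pearson_r` is a hypothetical helper written for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from deviations about the means (illustrative helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Illustrative data (an assumption for this sketch): y depends on x exactly,
# but the dependence is quadratic rather than linear.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(r)  # zero linear correlation despite a perfect functional relationship
```

Here y is completely determined by x, yet r is zero because the positive and negative halves of the parabola cancel in the cross-product sum.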


Simple Linear Regression

• Linear regression is a statistical technique used to describe the relationship between variables

• Where the interest is to examine the relationship between 2 variables, it is referred to as Simple Linear regression (SLR)

• If the relationship is believed to be linear, then the equation expressing this relationship is the equation of the line: $y = \beta_0 + \beta_1 x$


Simple Linear Regression

• If a graph of all the (x, y) pairs is plotted and a line is determined to fit the data, then $\beta_0$ represents the y-intercept and $\beta_1$ represents the slope of the line


Exact (Deterministic) Relationship

• We do not require regression analysis to obtain the linear equation expressing an exact relation.

• Exact linear relationships are encountered in some business environments

For example:


Graph of an Exact Linear relationship

x 1 2 3 4 5 6

y 3 5 7 9 11 13
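For an exact relationship such as the one tabulated here, any two points determine the line and regression is unnecessary. The sketch below recovers the slope and intercept from the first two points and checks that every other point lies on the same line; the variable names are illustrative.

```python
# Data from the exact-relationship slide
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 7, 9, 11, 13]

# Two points are enough to pin down an exact line
slope = (y[1] - y[0]) / (x[1] - x[0])
intercept = y[0] - slope * x[0]

# Every residual is zero when the relationship is exact
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
is_exact = all(res == 0 for res in residuals)

print(f"y = {intercept} + {slope} x, exact: {is_exact}")
```

For this table the line works out to y = 1 + 2x, with no scatter about it.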


Non-exact relationship

• Data encountered in real life and many business applications do not have an exact relationship.

• Exact relationships are the exception rather than the rule

• Real life data are more likely to look like the graph below:


Graph of a Non-Exact Relationship

x 1 2 3 4 5 6

y 3 2 8 8 11 13

[Figure: Scatterplot of y vs x]


Assumptions for SLR

• There is a linear relationship between the 2 variables (as determined from the scatter plot)

• The values of Y are mutually independent.

• For each value of x, the corresponding Y-values are normally distributed

• The standard deviations of the Y-values for each value of x are the same (homoscedasticity)


Best-Fitting Line


Least-Square Criterion

• The criterion that selects, as the best-fitting line, the line minimizing the “sum of squared errors” is known as the Least-Squares Criterion

• The best-fitting line is denoted as $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the predicted value and $b_0$ and $b_1$ are the intercept and slope respectively, computed by the least squares method


Least-Square Criterion

• The difference between the actual y-value and the predicted value is called a residual and represents the error

• This error is denoted as $e_i = y_i - \hat{y}_i$

• The line that minimizes $\sum e_i^2 = \sum (y_i - \hat{y}_i)^2$ is referred to as the least-squares regression line


Least Square Criterion

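• The computational formulas for $b_0$ and $b_1$ that minimize $\sum e_i^2$ are given by $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$

• or, equivalently for the slope, $b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$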
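A minimal sketch of the least-squares computation, applied to the production plant example from earlier in the slides. It assumes the standard formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄; the function name `least_squares_fit` is hypothetical.

```python
# Production plant example: x = production units (10,000s),
# y = overhead costs ($1000s)
x = [5, 6, 7, 8, 9, 10, 11]
y = [12, 11.5, 14, 15, 15.4, 15.3, 17.5]


def least_squares_fit(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


b0, b1 = least_squares_fit(x, y)
print(f"y-hat = {b0:.4f} + {b1:.4f} x")
```

For these data the slope comes out near 0.91 and the intercept near 7.1, both in units of $1000s.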


Interpretation of $b_0$ and $b_1$

• The slope $b_1$ represents the average change in y for a unit change in x

• The intercept $b_0$ represents the average value of y when x = 0. (NB: this only has practical meaning when 0 is in the range of values of x)
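The caveat about x = 0 can be made concrete with the production plant example. The sketch below fits the line, predicts at a value inside the observed range, and flags predictions outside it as extrapolation. The helper names `predict` and `is_extrapolation` are hypothetical and chosen only for this illustration.

```python
# Production plant example: x = production units (10,000s),
# y = overhead costs ($1000s)
x = [5, 6, 7, 8, 9, 10, 11]
y = [12, 11.5, 14, 15, 15.4, 15.3, 17.5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x


def predict(x_new):
    """Predicted y on the fitted least-squares line."""
    return b0 + b1 * x_new


def is_extrapolation(x_new):
    """True when x_new lies outside the observed range of x."""
    return not (min(x) <= x_new <= max(x))


print(round(predict(8), 4), is_extrapolation(8))  # inside the data range
print(round(predict(0), 4), is_extrapolation(0))  # x = 0 is outside it
```

Since the observed production levels run from 5 to 11, the intercept describes a point where there is no data, so it should not be read as the overhead cost at zero production.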


Computation of Error Sum of Squares (SSE)

• Recall: the purpose of the least squares criterion is to minimize the error $\sum e_i^2$

• The line that minimizes $\sum e_i^2$ is the LS regression line.

• $\sum e_i^2$ is referred to as the error sum of squares.

• Denoted: $SSE = \sum (y_i - \hat{y}_i)^2$


Mean Square Error (MSE)

• SSE is the sum of squared deviations of each of the observed values from the fitted regression line

• SSE is often referred to as the unexplained sum of squares

• The Mean Square Error (MSE) is an estimate of the variance around the regression line
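As a sketch of how SSE and MSE might be computed, the code below refits the production plant example and sums the squared residuals. It assumes MSE = SSE / (n − 2) for simple linear regression, which matches the residual degrees of freedom n − p with p = 2 parameters.

```python
# Production plant example: x = production units (10,000s),
# y = overhead costs ($1000s)
x = [5, 6, 7, 8, 9, 10, 11]
y = [12, 11.5, 14, 15, 15.4, 15.3, 17.5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x

# Residual = observed y minus the value predicted by the fitted line
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sse = sum(e ** 2 for e in residuals)  # unexplained (error) sum of squares
mse = sse / (n - 2)                   # assumed divisor: n - 2 for SLR

print(f"SSE = {sse:.4f}, MSE = {mse:.4f}")
```

The residuals of a least-squares line with an intercept always sum to zero (up to rounding), which is a quick sanity check on the fit.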


Computation of Total Sum of Squares

• The total variation in the response variable (y) is referred to as the total sum of squares

• Denoted: $SST = \sum (y_i - \bar{y})^2$ or $SST = \sum y_i^2 - \frac{(\sum y_i)^2}{n}$


Computation of Regression Sum of Squares (SSR)

• The SSR measures the improvement in using the regression line rather than the sample mean to make predictions.

• Denoted: $SSR = \sum (\hat{y}_i - \bar{y})^2$ or $SSR = SST - SSE$


Regression Mean Square

• The SSR is referred to as the explained sum of squares; it measures the variation in y that is “explained” by the use of x.

• This variation is measured by the Mean Square Regression (MSR)


Coefficient of Determination ($R^2$)

• This is a value that measures the percentage of variation in y that is explained by the regression.

• $R^2 = \frac{SSR}{SST}$ or $R^2 = 1 - \frac{SSE}{SST}$

• If $R^2$ is computed to be 0.853, then it means 85.3% of the variation in y is explained by the regression of y on x.
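The sums of squares and the coefficient of determination can be sketched for the production plant example as follows, assuming the usual definitions SST = Σ(y − ȳ)², SSR = Σ(ŷ − ȳ)², SSE = Σ(y − ŷ)² and R² = SSR / SST. Variable names are illustrative.

```python
# Production plant example: x = production units (10,000s),
# y = overhead costs ($1000s)
x = [5, 6, 7, 8, 9, 10, 11]
y = [12, 11.5, 14, 15, 15.4, 15.3, 17.5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)                # total variation
ssr = sum((fi - mean_y) ** 2 for fi in fitted)           # explained variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation

r_squared = ssr / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"R^2 = {r_squared:.4f}")
```

For these data R² is about 0.89, so roughly 89% of the variation in overhead costs is explained by the regression on production units. In simple linear regression R² equals the square of Pearson's r.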


Analysis of Variance (ANOVA)

Definition

• Analysis of variance (ANOVA) refers to the partitioning of the total variation in the dependent variable (y) into the regression and error sums of squares.


Analysis of Variance for SLR

Source       Sum of Squares   Degrees of Freedom   Mean Squares        F
Regression   SSR              p-1                  MSR = SSR/(p-1)     MSR/MSE
Residual     SSE              n-p                  MSE = SSE/(n-p)
Total        SST              n-1

Here p is the number of regression parameters; for SLR, p = 2 (intercept and slope).
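A sketch of how the ANOVA table might be filled in for the production plant example, assuming p = 2 parameters (intercept and slope) so that the regression has p − 1 = 1 and the residual n − p = 5 degrees of freedom. The layout and the helper `fmt` are illustrative choices.

```python
# Production plant example: x = production units (10,000s),
# y = overhead costs ($1000s)
x = [5, 6, 7, 8, 9, 10, 11]
y = [12, 11.5, 14, 15, 15.4, 15.3, 17.5]

n = len(x)
p = 2  # parameters in SLR: intercept and slope

mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
b0 = mean_y - b1 * mean_x
fitted = [b0 + b1 * xi for xi in x]

ssr = sum((fi - mean_y) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - mean_y) ** 2 for yi in y)

msr = ssr / (p - 1)
mse = sse / (n - p)
f_stat = msr / mse

rows = [
    ("Regression", ssr, p - 1, msr, f_stat),
    ("Residual", sse, n - p, mse, None),
    ("Total", sst, n - 1, None, None),
]


def fmt(value):
    """Blank for cells the ANOVA table leaves empty."""
    return "" if value is None else f"{value:.4f}"


print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}{'F':>10}")
for source, ss, df, ms, f in rows:
    print(f"{source:<12}{ss:>10.4f}{df:>5d}{fmt(ms):>10}{fmt(f):>10}")
```

The regression and residual rows add up to the total row in both the sum-of-squares and degrees-of-freedom columns, which is the partitioning that ANOVA refers to.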


Homework 1

Q1. The following data are annual disposable income (in $1000) and the total annual consumption (in $1000) for 12 families selected at random from a large metropolitan area.

Income: 16 30 43 70 56 50 16 26 14 12 24 30

Consumption: 14 24.55 36.78 63.25 40.18 49.55 16 22.39 16.04 12 20.77 34.78


Homework 1 (cont)

i. Draw a scatter plot for the data and comment on the relationship between the 2 variables.
ii. Calculate the correlation coefficient and comment on the relationship between the variables.
iii. Fit a simple linear regression of consumption on income for the data.
iv. Interpret the least squares regression coefficient estimates in the context of the problem.


Homework 1 (cont)

v. Estimate the consumption of a family whose annual income is $17,000. Would you consider the prediction an extrapolation? Why?
vi. Draw an appropriate ANOVA table for the regression of Y on X.
vii. Compute the coefficient of determination R-square and interpret the value.


Homework 1 (cont)

Q2. Hanna Properties is a real estate company which specializes in custom-home resale in Phoenix, Arizona. The following is a sample of the size (in hundreds of square feet) and price (in thousands of dollars) data for nine custom homes currently listed for sale.

size 26 27 33 29 29 34 30 40 22

price 290 305 325 327 356 411 488 554 246


Homework 1 (cont)

a) Draw a scatter plot for the data and comment on the relationship between the 2 variables.
b) Calculate the correlation coefficient between the size and price of custom homes at Hanna Properties and comment on the relationship between the variables.
c) Fit a simple linear regression of price on size for the data.
d) Interpret the least squares regression coefficient estimates in the context of the problem.


Homework 1 (cont)

e) Estimate the price of a custom home from Hanna Properties if the size of the home is 3200 sq. ft. Would you consider the prediction an extrapolation? Why?
f) Draw an appropriate ANOVA table for the regression of Y on X.
g) Compute the coefficient of determination R-square and interpret the value.