*Regression Analysis
*When we studied bivariate data, we found a least-squares regression line that was the best model for the data we had
*We assumed that the data we had made up the entire population
*We calculated r to judge the strength and direction of the linear model
*But what if the data we used to find that particular line was really only a sample from a larger population?
*Does the pattern of our sample match the pattern of the whole population? Or did we pick a sample with 10 lucky points?
*Obviously, the pattern of the points in the sample does not match the pattern of the population.
*r, the correlation coefficient of the sample, doesn't equal ρ, the correlation coefficient of the population
*Question: If our sample is clustered in an ellipse and looks fairly linear, does it come from a population with a similar ellipse or not?
*Question: Is our r a good estimate of ρ?
*We could do some sort of inference procedure with r and ρ, but that is complicated, so we'll do the inference procedure on the slope instead.
*Remember $b = r \cdot \frac{s_y}{s_x}$: the slope and the correlation coefficient are tied together, so this will work
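The link between the slope and r can be checked numerically. The sketch below uses a small made-up data set (the x and y values are hypothetical, not from the slides) and only the Python standard library; it computes the least-squares slope directly and compares it with r times s_y / s_x.

```python
import math
import statistics

# Hypothetical data, invented only to illustrate the identity b = r * s_y / s_x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

x_bar, y_bar = statistics.mean(x), statistics.mean(y)

# Sums of squares and cross-products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx                  # least-squares slope of the sample
r = sxy / math.sqrt(sxx * syy) # sample correlation coefficient
s_x = statistics.stdev(x)      # sample standard deviation of x
s_y = statistics.stdev(y)      # sample standard deviation of y

print("slope b          :", round(b, 4))
print("r * s_y / s_x    :", round(r * s_y / s_x, 4))
```

Because s_y / s_x is always positive, b and r share the same sign, and b is zero exactly when r is zero. That is why a test on the slope also answers the question about correlation.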
Population Parameters
*Correlation coefficient of the population = ρ
*y-intercept of the population = α
*Slope of the population = β
Sample Statistics
*Correlation coefficient of the sample = r
*y-intercept of the sample = a
*Slope of the sample = b
*Conditions for Inference on Regression
*The true relationship between x and y is linear: check the scatter plot and residual plot
*For any value of x the values of y are independent: check for a random sample
*For any value of x the y-values are normally distributed: check a histogram or boxplot of the residuals
*The standard deviation of the y-values is constant: check the scatter plot and residual plot
*$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
*s is the standard error of the residuals, an estimate of the true standard deviation of the y-values of the population
*Interpretation of s: this is the measure of the variation of (y in context) for a given (x in context)
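As a rough illustration of how s comes out of the residuals, here is a minimal sketch on a hypothetical data set (the numbers are invented for the example). It fits the least-squares line, forms the residuals, and divides the sum of squared residuals by n - 2 before taking the square root.

```python
import math
import statistics

# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.2, 4.1, 6.8, 7.9, 9.7, 12.4, 13.1, 15.9]
n = len(x)

x_bar, y_bar = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx          # sample slope
a = y_bar - b * x_bar  # sample y-intercept

# Residual = observed y minus predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Two parameters (a and b) were estimated, so we divide by n - 2.
s = math.sqrt(sse / (n - 2))

print("s =", round(s, 4))
```

Dividing by n - 2 rather than n - 1 matches the degrees of freedom reported in regression output.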
*Return to Sampling Distributions
*How do we know, just from our sample, anything about the population?
*We have a picture of what the samples should look like from the sampling distribution
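*If we take sample after sample of the same size from this population and calculate the slope each time, the sample slopes are approximately normally distributed, with a mean equal to the true slope β and a standard deviation $\sigma_b = \frac{\sigma}{\sqrt{\sum (x_i - \bar{x})^2}}$, where σ is the true standard deviation of the y-values about the line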
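*$s_b = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$
*This is the standard error of the slope, our estimate of the standard deviation of the sample slopes
*Interpretation of $s_b$: this is how much we would expect the sample slopes of predicting (y in context) with (x in context) to vary from sample to sample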
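The "sample after sample" idea can be tried out with a small simulation. This is only a sketch under assumed values: the population intercept, slope, and standard deviation (`alpha`, `beta`, `sigma`) and the ten x-values are all hypothetical. It draws many samples from that population, computes the slope of each, and compares the spread of the slopes with σ divided by the square root of Σ(x - x̄)².

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population model (all values invented for illustration):
# y = alpha + beta * x + error, with error ~ Normal(0, sigma)
alpha, beta, sigma = 5.0, 2.0, 2.0
x = [float(v) for v in range(1, 11)]  # the same 10 x-values in every sample

x_bar = statistics.mean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)

def sample_slope():
    """Draw one sample of y-values and return its least-squares slope."""
    y = [alpha + beta * xi + random.gauss(0.0, sigma) for xi in x]
    y_bar = statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

slopes = [sample_slope() for _ in range(5000)]
sigma_b = sigma / math.sqrt(sxx)  # theoretical std. dev. of the sample slopes

print("mean of sample slopes      :", round(statistics.mean(slopes), 3))
print("std. dev. of sample slopes :", round(statistics.stdev(slopes), 3))
print("sigma / sqrt(sum (x-x_bar)^2):", round(sigma_b, 3))
```

The mean of the simulated slopes should land close to the assumed true slope, and their standard deviation close to the theoretical value. In practice σ is unknown, so s takes its place and we get the standard error of the slope.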
*Confidence Interval
*statistic ± (critical value)(standard deviation of statistic)
*b ± t*(standard error of the slope), where t* comes from the t distribution with n − 2 degrees of freedom
*1. In a study of the performance of a computer printer, the size (in kilobytes) and the printing time (in seconds) for each of 22 small text files were recorded. A regression line was a satisfactory description of the relationship between size and printing time. The results of the regression analysis are shown below.
Dependent variable: Printing Time
Source      Sum of Squares  df  Mean Square  F-ratio
Regression  53.3315         1   53.3315      140
Residual    7.62381         20  0.38115

Variable  Coefficient  s.e. of coeff  t-ratio  prob
Constant  11.6559      0.3153         37       <0.0001
Size      3.47812      0.294          11.8     <0.001

Rsquared = 87.5%   Rsquared(adjusted) = 86.9%
s = 0.6174 with 22 - 2 = 20 degrees of freedom
95% Confidence Interval
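One way to work the 95% confidence interval from the printer output is sketched below. The slope and its standard error are read from the "Size" row of the regression output. The critical value 2.086 is the usual t-table entry for 20 degrees of freedom and 95% confidence; it is typed in by hand here rather than computed.

```python
# Values read from the "Size" row of the regression output.
b = 3.47812          # sample slope (seconds per kilobyte)
se_b = 0.294         # standard error of the slope
df = 22 - 2          # n - 2 degrees of freedom

# Critical value from a t-table for df = 20, 95% confidence (entered by hand).
t_star = 2.086

margin = t_star * se_b
lower = b - margin
upper = b + margin

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

The interval runs from about 2.86 to about 4.09. In context: we are 95% confident that each additional kilobyte of file size adds between roughly 2.86 and 4.09 seconds to the printing time.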
*Hypothesis Test
*Null hypothesis: there is no linear relationship between x and y (no correlation)
*H0: β = 0
*Ha: β ≠ 0 or β > 0 or β < 0
*where β is the true slope of the relationship between (x in context) and (y in context)
*Return to the printer study in problem 1 and its regression output above: is there sufficient evidence of a linear relationship between file size and printing time?
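A sketch of the test on the printer output follows. The test statistic is the sample slope divided by its standard error, compared against a t distribution with n - 2 degrees of freedom. The significance level of 0.05 and the two-sided alternative are assumptions made for this example; the slides do not fix either one. The critical value 2.086 is again the t-table entry for 20 degrees of freedom.

```python
# Values read from the "Size" row of the regression output.
b = 3.47812      # sample slope
se_b = 0.294     # standard error of the slope
df = 22 - 2      # n - 2 degrees of freedom

# H0: beta = 0   versus   Ha: beta != 0  (two-sided; assumed for this example)
beta_0 = 0.0
t_stat = (b - beta_0) / se_b

# Assumed significance level 0.05; two-sided critical value for df = 20
# taken from a t-table (entered by hand).
t_crit = 2.086
reject_h0 = abs(t_stat) > t_crit

print("t statistic:", round(t_stat, 1))
print("reject H0 at the 0.05 level:", reject_h0)
```

The computed statistic agrees with the t-ratio of 11.8 in the output, far beyond the critical value, and the reported probability is below 0.001. So we reject H0: there is sufficient evidence of a linear relationship between file size and printing time.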