Baseball Statistics By Krishna Hajari Faraz Hyder William Walker

Upload: bernardo-brantingham

Post on 15-Dec-2015

Page 1: Baseball Statistics By Krishna Hajari Faraz Hyder William Walker

Baseball Statistics

By Krishna Hajari, Faraz Hyder, and William Walker

Objective

• Our goal is to find out whether, over the past 10 years, there is a consistent factor that affects the winning percentage of the 30 teams in Major League Baseball.

Explanatory Variables

• Team Batting Average
• Baseball Stadium Dimensions
• Team Payroll
• Average Game Attendance
• ERA (earned run average)

Explanation of Variables

• Team Batting Average
This is the statistic used to evaluate batters' performance. It is calculated as Hits / Official At Bats.

• Stadium Dimensions
Each stadium has a different field size, so we will test the distance from home plate to the left, center, and right field walls to see whether it has an impact on a team's performance.

Explanation of Variables

• Team Payroll
Each team's payroll in MLB is different. In 2004, the highest-paying team, the Yankees, had a payroll of $184 million, more than 6 times that of the lowest-paying team, the Devil Rays, at $27.5 million. We propose that higher-paying teams perform better. The average payroll for the 2004 season was approximately $69 million, with a standard deviation of $33 million.

• Average Game Attendance
The average game attendance for 2004 was approximately 30.3k people, with a standard deviation of 8.9k.
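To put the payroll figures above in perspective, the short sketch below works out the Yankees-to-Devil Rays ratio and expresses each payroll as a z-score against the league mean and standard deviation. It uses only the four rounded numbers quoted on the slide; the function name `z_score` is our own.

```python
# Figures quoted on the slide (2004 season, in millions of dollars).
YANKEES_PAYROLL = 184.0
DEVIL_RAYS_PAYROLL = 27.5
MEAN_PAYROLL = 69.0  # approximate league average
SD_PAYROLL = 33.0    # approximate standard deviation


def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


ratio = YANKEES_PAYROLL / DEVIL_RAYS_PAYROLL
yankees_z = z_score(YANKEES_PAYROLL, MEAN_PAYROLL, SD_PAYROLL)
devil_rays_z = z_score(DEVIL_RAYS_PAYROLL, MEAN_PAYROLL, SD_PAYROLL)

print(f"Yankees / Devil Rays payroll ratio: {ratio:.2f}")
print(f"Yankees z-score: {yankees_z:.2f}")
print(f"Devil Rays z-score: {devil_rays_z:.2f}")
```

With these rounded inputs the Yankees sit roughly three and a half standard deviations above the league mean, which is why a single team can pull so strongly on a payroll coefficient.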

Explanation of Variables

• Earned Run Average
This is the statistic used to evaluate a pitcher's performance. It is calculated using the following formula:

(Number of earned runs allowed × 9) / Innings Pitched
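A minimal sketch of the ERA calculation follows. The helper `innings_to_float` is our own addition: it assumes innings are recorded in the usual box-score notation, where the digit after the point counts outs rather than tenths. The pitching line at the bottom is hypothetical and not taken from the 2004 data.

```python
def innings_to_float(innings: str) -> float:
    """Convert box-score innings notation to a number of innings.

    The digit after the point counts outs, not tenths, so "6.1"
    means 6 innings plus 1 out, i.e. 6 + 1/3 innings.
    """
    whole, _, outs = innings.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """ERA = earned runs * 9 / innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return earned_runs * 9 / innings_pitched


# Hypothetical pitching line: 3 earned runs over 6 full innings.
era = earned_run_average(earned_runs=3, innings_pitched=innings_to_float("6.0"))
print(f"ERA: {era:.2f}")
```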

Response Variable

• Winning Percentage for Each Team
– Games won / Games played

2004 Data

• In this presentation we take one year, 2004, and show how we intend to analyze the data for each of the past 10 years.

Hypotheses

• H0: None of these variables has an effect on winning percentage

• Ha: At least one of the variables has an effect on winning percentage

Initial Summary

This is the summary of the most general linear model, with all five explanatory variables present. The p-value is very small, so we reject H0 and conclude that at least one of the variables is significant.
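The overall p-value in a regression summary comes from an F statistic that can be computed from R-squared alone. The sketch below shows that calculation for 30 teams and five explanatory variables. The R-squared value is a placeholder, since the transcript does not preserve the actual summary output, and the function name is our own.

```python
def overall_f_statistic(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """F statistic for H0: all slope coefficients are zero.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)),
    with k and n - k - 1 degrees of freedom.
    """
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    if df_resid <= 0:
        raise ValueError("need more observations than parameters")
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


N_TEAMS = 30       # from the objective slide
N_PREDICTORS = 5   # the five explanatory variables
R_SQUARED = 0.80   # placeholder, NOT the value from the actual summary

f_stat = overall_f_statistic(R_SQUARED, N_TEAMS, N_PREDICTORS)
print(f"F({N_PREDICTORS}, {N_TEAMS - N_PREDICTORS - 1}) = {f_stat:.2f}")
```

A large F statistic relative to the F distribution with (5, 24) degrees of freedom is what produces the very small p-value.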

ANOVA Table

This ANOVA table shows that at least three variables are significant because their p-values are less than 0.05.

Variance Inflation Factor

The VIF for each of the five explanatory variables is less than 10; therefore, we will not exclude any of them from the regression.
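For reference, the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below implements this in plain Python with a small least-squares solver. The team values are made up for illustration and only three of the five predictors are shown, so the printed VIFs are not the ones from the slide.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def r_squared(y, columns):
    """R^2 from regressing y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def variance_inflation_factors(predictors):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on all the others."""
    vifs = {}
    for name, values in predictors.items():
        others = [v for other, v in predictors.items() if other != name]
        vifs[name] = 1.0 / (1.0 - r_squared(values, others))
    return vifs


# Made-up values for eight hypothetical teams (not the 2004 data).
example_predictors = {
    "payroll": [150.0, 120.0, 95.0, 80.0, 70.0, 60.0, 45.0, 30.0],   # $ millions
    "attendance": [45.0, 38.0, 36.0, 28.0, 33.0, 25.0, 27.0, 18.0],  # thousands
    "era": [4.7, 4.2, 3.9, 4.5, 4.1, 4.9, 4.4, 4.8],
}

vifs = variance_inflation_factors(example_predictors)
for name, vif in vifs.items():
    print(f"{name}: VIF = {vif:.2f} ({'keep' if vif < 10 else 'review'})")
```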

Correlation Matrix

The correlation matrix shows a somewhat high correlation between attendance and payroll. However, this is to be expected, since teams with higher attendance generate more revenue and can therefore afford higher payrolls.
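Each entry of a correlation matrix is a pairwise Pearson correlation. The following sketch computes one from scratch. The numbers are invented for illustration, so the printed attendance-payroll correlation is not the value from the slide.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(variables):
    """Nested dict with the correlation of every pair of named variables."""
    return {
        a: {b: pearson_r(xa, xb) for b, xb in variables.items()}
        for a, xa in variables.items()
    }


# Made-up values for eight hypothetical teams (not the 2004 data).
example_variables = {
    "payroll": [150.0, 120.0, 95.0, 80.0, 70.0, 60.0, 45.0, 30.0],   # $ millions
    "attendance": [45.0, 38.0, 36.0, 28.0, 33.0, 25.0, 27.0, 18.0],  # thousands
}

matrix = correlation_matrix(example_variables)
print(f"corr(attendance, payroll) = {matrix['attendance']['payroll']:.3f}")
```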

All Possible Regressions

According to all of the goodness-of-fit criteria, the best model seems to be the one with ERA, Payroll, and Batting Average.
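An all-possible-regressions search fits every non-empty subset of the predictors and ranks the fits by a criterion. The sketch below uses adjusted R-squared as that criterion, which is an assumption on our part because the transcript does not say which criteria were compared. The data are invented, so the ranking it prints says nothing about the real 2004 season.

```python
from itertools import combinations


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def adjusted_r_squared(y, columns):
    """Adjusted R^2 of an OLS fit of y on the columns plus an intercept."""
    n, k = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = k + 1
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def all_possible_regressions(y, predictors):
    """Fit every non-empty predictor subset; return (score, subset), best first."""
    results = []
    names = list(predictors)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cols = [predictors[name] for name in subset]
            results.append((adjusted_r_squared(y, cols), subset))
    results.sort(reverse=True)
    return results


# Made-up values for eight hypothetical teams (not the 2004 data).
win_pct = [0.62, 0.57, 0.56, 0.52, 0.49, 0.45, 0.42, 0.36]
example_predictors = {
    "era": [3.8, 4.0, 4.1, 4.3, 4.4, 4.7, 4.9, 5.2],
    "batting_avg": [0.282, 0.270, 0.278, 0.268, 0.266, 0.270, 0.260, 0.255],
    "payroll": [150.0, 60.0, 95.0, 120.0, 45.0, 80.0, 70.0, 30.0],
}

ranking = all_possible_regressions(win_pct, example_predictors)
for score, subset in ranking:
    print(f"{score:.3f}  {', '.join(subset)}")
```

With five predictors the same loop would fit 31 models, which is still cheap enough to enumerate exhaustively.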

Summary of Stepwise Regression

The residuals seem to be distributed evenly above and below the 0 line. However, the residuals tend to be more negative where the predicted winning percentage falls below .45.

The Q-Q plot indicates that the model is not a perfect fit, but the points are still close to a straight line.
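The points behind a normal Q-Q plot can be computed without any plotting library: sort the standardized residuals and pair them with standard normal quantiles. The sketch below does this with Python's `statistics.NormalDist`. The residuals are invented, and the plotting positions (i - 0.5) / n are one common convention, not necessarily the one used for the slide.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs, sorted."""
    n = len(residuals)
    center = mean(residuals)
    spread = stdev(residuals)
    ordered = sorted((r - center) / spread for r in residuals)
    std_normal = NormalDist()
    # Positions (i - 0.5) / n keep the probabilities strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Made-up residuals for ten hypothetical teams (not the 2004 fit).
example_residuals = [0.031, -0.012, 0.008, -0.045, 0.020,
                     -0.003, 0.015, -0.027, 0.006, 0.007]

for theo, obs in normal_qq_points(example_residuals):
    print(f"{theo:6.2f}  {obs:6.2f}")
```

If the residuals are roughly normal, the second column tracks the first, which is the straight-line pattern the plot is checked for.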

Variance Test

The variance test shows that most of the variances are very close to each other. This supports the assumption that the variances are approximately equal.

The only influential outlier, observation 19, is the New York Yankees. This is understandable given their astronomical payroll.

The Box-Cox plot indicates that a Box-Cox transformation with p = 2 can be used to improve the model.
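As a reminder of what the transformation does to the response, here is a minimal sketch of the Box-Cox formula. It assumes the slide's p is the usual Box-Cox power parameter (often written lambda), and the winning percentages are hypothetical.

```python
import math


def box_cox(y: float, power: float) -> float:
    """Box-Cox transform: (y**power - 1) / power, or log(y) when power is 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive response")
    if power == 0:
        return math.log(y)
    return (y ** power - 1.0) / power


POWER = 2.0  # the value read off the Box-Cox plot, per the slide

# Hypothetical winning percentages, not the 2004 values.
example_win_pcts = [0.600, 0.500, 0.400]
transformed = [box_cox(w, POWER) for w in example_win_pcts]

for raw, new in zip(example_win_pcts, transformed):
    print(f"{raw:.3f} -> {new:.4f}")
```

The regression is then refit with the transformed winning percentage as the response, and predictions are mapped back with the inverse transform.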

The Box-Cox transformation has improved the model, as these graphs show. The residuals appear to be much more normally distributed, and the line is much closer to 0 when the outlier is removed.

The Q-Q plot is also closer to a straight line, indicating an improved model.

Summary of Final Model with Box-Cox Transformation

Conclusion

• The Box-Cox transformation improved the model.
• Unexpectedly, payroll was determined to play a comparatively minor role in the 2004 season. It also does not appear in the stepwise regression models for 5 of the past 10 years.
• The two explanatory variables that were consistent factors over the past 10 years were ERA and Batting Average.