mlb all-star game voting over three eras presentation

Sabermetrics in Practice: Examining Fan Voting for MLB All-Stars over Three Eras Allison R. Levin, MA, JD President, Social Network Advisors for Professional Sports [email protected] Twitter: @arl1102


Post on 18-Jul-2015



The All-Star Game

The Study

• The research seeks to understand which criteria fans valued most when selecting All-Stars and how those criteria have changed over time

• The author collected partial-year statistics for the All-Star year, as well as full-year statistics for the previous two years, for the top three vote-getters at each position for the 1994, 2004, and 2014 games

The Data

• Three classifications of statistics were examined to explain the percentage difference in votes

• Because there are many potential explanatory variables relative to the number of observations, several determinations were necessary

– What is the best regression model to use?

– When does adding variables to the model stop providing meaningful value?

– In what order should variables be entered for consistency over time?

The Regression Model

• To select independent variables, the best one-variable model is compared with the best two-variable model, and so on

• The best of all these models is the one that maximizes the adjusted R²
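The selection rule above can be sketched as an exhaustive best-subset search. This is a minimal illustration, not the author's code: the data are synthetic, and the variable names (`avg`, `hr`, `games_on_tv`, `war`) are hypothetical stand-ins for the study's statistics.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y):
    """Fit OLS with an intercept and return the adjusted R^2."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])          # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
    resid = y - A @ beta
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def best_subset(X, y, names):
    """Score every non-empty subset of columns; keep the highest adjusted R^2."""
    best_score, best_vars = -np.inf, ()
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), k):
            score = adjusted_r2(X[:, list(cols)], y)
            if score > best_score:
                best_score = score
                best_vars = tuple(names[c] for c in cols)
    return best_score, best_vars


# Synthetic data (assumption): only columns 0 and 2 actually drive the response.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=n)
names = ["avg", "hr", "games_on_tv", "war"]  # hypothetical labels

best_score, best_vars = best_subset(X, y, names)
print(best_vars, round(best_score, 3))
```

An exhaustive search is only practical for a small number of candidate variables, since the number of subsets doubles with each variable added.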

Overfitting

• John von Neumann explained overfitting

– "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk"

• The best-subset model was also used to help control for overfitting by estimating the best variables for each time period
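A small sketch of why adjusted R² is a useful guard here, on synthetic data only (nothing below comes from the study): adding pure-noise predictors to a small sample never lowers plain R², while adjusted R² applies a penalty for each extra variable.

```python
import numpy as np


def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit with an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj


rng = np.random.default_rng(1)
n = 12                                   # deliberately few observations
signal = rng.normal(size=n)              # the one predictor that matters
y = signal + rng.normal(scale=1.0, size=n)
noise = rng.normal(size=(n, 8))          # predictors unrelated to y

history = []
for k in range(0, 9):
    # Model k uses the real predictor plus the first k noise columns.
    X = np.column_stack([signal, noise[:, :k]])
    history.append(r2_and_adjusted(X, y))

for k, (r2, adj) in enumerate(history):
    print(f"noise vars={k}  R2={r2:.3f}  adjusted R2={adj:.3f}")
```

Plain R² can only rise or stay flat as columns are added, so on its own it would always favor the largest model.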

Best models

[Figures: best models for 1994, 2004, and 2014]

What order

• Due to the relationship between variables and to avoid researcher bias, it was important to first determine the order in which to enter variables

• How does one become a baseball fan?

Traditional Statistics

• Most people don’t remember when they became a baseball fan

– Instead, we tend to have initial memories surrounding favorite players

Favorite Players

Visibility

• The second set of variables entered was visibility

– 1994: 42% of Americans had pay television

    • Of those, only 8.1% paid for premium services

    • Approximately 32 games on local TV

    • ESPN showed 3 games a week

– 2004: 78% of Americans had cable television

    • Of those, 56.8% paid for a package that included sports programming

    • Approximately 85% of local market games

– 2014: 97% of Americans have some form of paid television

SABR

• Once diehard fans become interested in and knowledgeable about multiple teams and players, they seek out more information

• They have a group of players in mind and start thinking about how those players rate against one another
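The entry order described above (traditional statistics, then visibility, then SABR measures) can be sketched as a blockwise regression in which each block is credited with the R² it adds when entered. The slides do not say how the per-category percentages were computed, so the share calculation below is an assumption, and the data and column assignments are synthetic.

```python
import numpy as np


def r_squared(X, y):
    """Plain R^2 for an OLS fit with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())


# Synthetic data (assumption): six predictors, two per block.
rng = np.random.default_rng(2)
n = 36
X = rng.normal(size=(n, 6))
coefs = np.array([1.0, 0.5, 0.8, 0.0, 1.2, 0.6])
y = X @ coefs + rng.normal(scale=0.7, size=n)

# Fixed entry order: the same sequence is used for every era.
blocks = [("traditional", [0, 1]), ("visibility", [2, 3]), ("sabr", [4, 5])]

entered, previous_r2, increments = [], 0.0, {}
for name, cols in blocks:
    entered = entered + cols                  # add this block's columns
    current_r2 = r_squared(X[:, entered], y)  # refit with all blocks so far
    increments[name] = current_r2 - previous_r2
    previous_r2 = current_r2

# Each block's share of the total explained variance (hypothetical metric).
shares = {name: inc / previous_r2 for name, inc in increments.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Because predictors are usually correlated, a block's increment depends on what was entered before it, which is why fixing the order in advance matters for comparisons across years.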

Hypothesis 1

When information about players was limited, fans tended to vote on the visibility and popularity of the players

Partially supported

Results

1994

Traditional 28%

Visibility 29%

SABR 43%

2004

Traditional 28%

Visibility 0%

SABR 53%

Hypothesis 2

When fans had access to multiple games and nearly unlimited information about players on the Web, they tended to vote by comparing players

Not supported

Results

Traditional 39%

Visibility 13%

SABR 48%

Hypothesis 3

When Twitter is included, fans are influenced by team tweets

Partially supported; further research needed

Twitter Usage

• Team tweets from 2012-2014 asking for All-Star votes were examined for each team with a player in the top 3

• Tweets and retweets were examined

– Tweets: call to action

– Retweets: action

• The adjusted R² for the 2014 model showed that including retweets significantly increased the explanatory power of the regression
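The comparison in the last bullet can be sketched by fitting the same model with and without a retweet variable and comparing adjusted R². The data below are synthetic and built so that retweets matter; the variable names are hypothetical and the numbers say nothing about the study's actual results.

```python
import numpy as np


def adjusted_r2(X, y):
    """Adjusted R^2 for an OLS fit with an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Synthetic data (assumption): two baseline predictors plus a retweet count.
rng = np.random.default_rng(3)
n = 30
baseline = rng.normal(size=(n, 2))   # stand-ins for the existing 2014 variables
retweets = rng.normal(size=n)        # standardized retweet counts (hypothetical)
vote_share = (
    baseline @ np.array([1.0, 0.5])
    + 1.5 * retweets
    + rng.normal(scale=0.5, size=n)
)

adj_without = adjusted_r2(baseline, vote_share)
adj_with = adjusted_r2(np.column_stack([baseline, retweets]), vote_share)

print(f"adjusted R2 without retweets: {adj_without:.3f}")
print(f"adjusted R2 with retweets:    {adj_with:.3f}")
```

Because adjusted R² penalizes each added variable, an increase after adding retweets indicates the variable explains more than its cost in degrees of freedom.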

Twitter Trends

Increased Action