baseball stats
Baseball and Steroids
Rachel Monaco
April 27, 2014
MA-315-A
You mad, bro?
Steroid use has been a plague in the modern-day athletic world
"Roid rage" is a common term for athletes who show overly aggressive tendencies
However, steroid users say the drugs give them better moods, sharper cognitive function, more confidence, and many other seemingly positive side effects
Users also claim steroid use helps them be who they are
Case in point: if you're a jerk to begin with, you're probably just going to be a bigger jerk on steroids
Baseball Steroid Use
In 2004, MLB made all players subject to random steroid testing to help curb what seemed to be an epidemic in prior years
I wanted to compare the mean batting averages (for the American and National Leagues combined) before and after MLB's steroid testing began
I expected batting averages to drop after steroid testing began, as the super sluggers either slowed down or stopped using steroids
I also wanted to look at two MLB stars, Barry Bonds and Alex Rodriguez, and how their stats changed during this time
How the data was collected
Fortunately, baseball has an enormous amount of data collected over many years. I was able to use MLB's statistics website as well as the Baseball Almanac, so collecting the data was fairly easy. However, I wish I could have watched all of these games and collected the data that way.
Batting Averages before and after
To test for a difference in batting averages between the ten years before and the ten years after random steroid testing began, I ran a two-sample test comparing the means of the combined American League and National League batting averages.
My null hypothesis is that MLB's combined batting average before random steroid testing equals its combined batting average after random steroid testing
My alternative hypothesis is that MLB's combined batting averages before and after random steroid testing are not equal
$$H_0: \mu_{bt} = \mu_{at} \qquad H_a: \mu_{bt} \neq \mu_{at}$$
Here $H_0$ is the null hypothesis, $H_a$ is the alternative hypothesis, $\mu_{bt}$ is the mean batting average before random steroid testing, and $\mu_{at}$ is the mean batting average after random steroid testing
I chose a 0.10 level of significance
To pool or not to pool?
Before comparing the two means, we must use an F test to check whether the variances are equal. The result decides whether or not to pool the two data sets.
My null hypothesis is that the variances are equal
My alternative hypothesis is that the variances are not equal
I used the Data Analysis ToolPak add-in in Excel to run the F test. Before running it, however, I set my level of significance at 10%.
Our F value is 0.2961 and our F critical value is 0.4098. Because the F value is less than 1, the relevant critical value is in the lower tail. Our F value falls below that critical value, so we reject the null hypothesis that the variances are equal. We will assume unequal variances for our test comparing two means and will not pool the data sets.
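The ToolPak output gives only the critical value. The minimal sketch below shows where that number comes from. It uses only the Python standard library and integrates the F density numerically to find the lower-tail cutoff for 9 and 9 degrees of freedom (ten seasons per period). The helper names (`f_pdf`, `f_cdf`, `lower_critical`) are hypothetical and are not part of Excel or any library. In Excel itself, `F.INV(0.10, 9, 9)` should return the same left-tail value.

```python
import math

def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_coef = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
                + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_coef + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))

def f_cdf(x: float, d1: int, d2: int, steps: int = 2000) -> float:
    """P(F <= x), integrating the density with Simpson's rule (steps must be even)."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def lower_critical(alpha: float, d1: int, d2: int) -> float:
    """Find c with P(F <= c) = alpha by bisection on (0, 1).

    Assumes c < 1, which holds here: with d1 == d2 the median of F is exactly 1.
    """
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

f_value = 0.2961                            # F value reported by the ToolPak
df1 = df2 = 9                               # ten seasons in each period
one_tail = lower_critical(0.10, df1, df2)   # compare with the ToolPak's F Critical
two_tail = lower_critical(0.05, df1, df2)   # lower cutoff for a two-sided 10% test
print(f"one-tail 10% cutoff = {one_tail:.4f}, reject: {f_value < one_tail}")
print(f"two-tail 10% cutoff = {two_tail:.4f}, reject: {f_value < two_tail}")
```

The ToolPak reports a one-tail critical value. For a two-sided test at 10%, you would put 5% in each tail, which gives a lower cutoff of about 0.31. The F value of 0.2961 is still below it, so the decision not to pool holds either way.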
Comparing two means
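Because we are not pooling, we use the unpooled standard error
$$SE = \sqrt{\frac{s_{bt}^2}{n_{bt}} + \frac{s_{at}^2}{n_{at}}}$$
where $s_{bt}$ and $s_{at}$ are the sample standard deviations before and after testing, and $n_{bt} = n_{at} = 10$ seasons. Our SE is 0.002145.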
From here I get the test statistic through the equation
$$t = \frac{\bar{x}_{bt} - \bar{x}_{at}}{SE}$$
which gives $t = 2.592$
Next we use the Excel formula “=t.dist.2t(2.592,min(9,9))”. It takes the conservative degrees of freedom, the smaller of $n_{bt} - 1$ and $n_{at} - 1$, and returns a p-value of 0.029
Finally, I compare my p-value to my 0.10 level of significance. Because the p-value (0.029) is less than 0.10, we reject the null hypothesis that MLB's combined batting averages before and after random steroid testing are equal.
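As a cross-check of the Excel step, here is a minimal standard-library Python sketch. It computes the two-tailed p-value for $t = 2.592$ with the conservative degrees of freedom by numerically integrating Student's t density, which is what `T.DIST.2T` returns. The helper names are hypothetical. The implied gap in batting average (t × SE) assumes t was computed as the before mean minus the after mean.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|): integrate the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 <= T <= |t|)
    return 1 - 2 * central

def conservative_df(n1: int, n2: int) -> int:
    """Smaller of n1 - 1 and n2 - 1, matching MIN(9,9) in the Excel formula."""
    return min(n1 - 1, n2 - 1)

se = 0.002145            # unpooled standard error from the slides
t_stat = 2.592           # (before mean - after mean) / SE
df = conservative_df(10, 10)
p_value = two_tailed_p(t_stat, df)
gap = t_stat * se        # implied drop in mean batting average
print(f"df = {df}, p = {p_value:.3f}, reject at 0.10: {p_value < 0.10}, gap = {gap:.4f}")
```

The implied gap works out to roughly 0.0056, or about five and a half points of batting average. That size gives the "reject the null" result a concrete scale.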
What does that mean?!
Alex Rodriguez's Recent Scandal
Within the last year, one of the biggest steroid scandals in baseball has centered on Alex Rodriguez (A-Rod) of the New York Yankees. In 2013, A-Rod received the largest drug suspension in baseball history for his use of steroids, to be served during the 2014 season.
A-Rod's suspension was reduced from the original 211 games to 162 games
So I decided to look at this third baseman and ran a one-way ANOVA on his years with the Yankees (NYY).
One-Way ANOVA
I performed a one-way ANOVA on the different types of hits A-Rod had during his years with the Yankees, from 2004 to 2013. The null hypothesis for a one-way ANOVA is that all of the group means are equal. The alternative hypothesis is that at least one of the means is different.
After performing the test, I see that the average per season is 140.4 for hits (getting to first), 23.4 for doubles, 0.8 for triples, and 30.9 for home runs.
Between the groups, the p-value is 1.07E-14. This extremely small value means we reject the null hypothesis and conclude that the averages are not all the same.
Given the data, I would conclude that A-Rod is indeed the power hitter we have assumed him to be. As someone who doesn't like the Yankees, I hope the 2014 season without him will help others in the league make the strides they need
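The arithmetic behind a one-way ANOVA can be sketched in a few lines of Python. This is a minimal sketch, assuming each group is a list of per-season counts for one hit type; `one_way_anova_f` is a hypothetical helper, not the Excel ToolPak. The season-by-season numbers are not reproduced in this presentation, so the example uses made-up toy values chosen so the F statistic is easy to check by hand.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over a list of groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up toy data, NOT A-Rod's seasons: two groups of three values
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [4, 5, 6]])
print(f_stat, df_b, df_w)   # 13.5 1 4
```

For A-Rod's four hit types over ten seasons, the same formulas give 3 and 36 degrees of freedom. The tiny p-value reflects how far apart the group means (140.4, 23.4, 0.8, 30.9) are relative to their season-to-season spread.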
Barry Bonds
One of the greats
Bonds definitely would have made it to the Hall of Fame without steroids, much like Nixon would have won his second term even if he had not cheated
San Francisco Giants left fielder from 1993 to 2007
While A-Rod's years with the Yankees came after random steroid testing began, Bonds played before, during, and after its implementation
Confidence Interval
I wanted to construct a 95% confidence interval for Barry Bonds's home runs per season during his years with the Giants.
For a 95% confidence level, the critical z-scores are -1.96 and 1.96
I defined my population as Barry Bonds's home runs during his years playing for the Giants; therefore, I can compute the population standard deviation from my data
I ran descriptive statistics for Barry Bonds and found the mean of his home runs to be 39.067, the median 40, the mode 46, the standard deviation 14.52, the minimum 5, and the maximum 73.
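Because I know the standard deviation of my population, I can use the equation
$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
with $z_{\alpha/2} = 1.96$, $\sigma = 14.52$, and $n = 15$ seasons.
Using this equation, we get E ≈ 7.35, so our interval, $\bar{x} \pm E$, is (31.72, 46.41).
This means that if we repeated this procedure on many random samples, about 95% of the resulting intervals would contain the true mean of Bonds's home runs per season. The interval describes the mean rather than individual seasons, but seasons far outside it still stand out.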
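The interval can be reproduced from the summary statistics alone. This is a minimal Python sketch using the standard library's `statistics.NormalDist` for the critical value. The helper `z_interval` is hypothetical, and the inputs are the mean (39.067), standard deviation (14.52), and 15 Giants seasons (1993 through 2007) reported in the slides.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean_value: float, sd: float, n: int, confidence: float = 0.95):
    """Return (margin, (lower, upper)) for a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * sd / sqrt(n)                             # E = z * sigma / sqrt(n)
    return margin, (mean_value - margin, mean_value + margin)

# Summary statistics from the slides: 15 Giants seasons, 1993 through 2007
margin, (lower, upper) = z_interval(39.067, 14.52, 15)
print(f"E = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

The margin that reproduces the endpoints (31.72, 46.41) is about 7.35 home runs.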
With this in mind, I wanted to analyze the seasons that fall outside the interval
Before steroid testing, two seasons are far above the upper bound of my confidence interval. After steroid testing, two are significantly below the lower bound (excluding 2005, when Bonds played only 14 games and hit 5 home runs)
Looking at this, it seems that Bonds's best seasons were when he was using steroids
This makes sense given Bonds's involvement in the BALCO scandal of 2003 to 2004, which was one of the triggers for MLB's random steroid testing rule
Has it worked?
Based on my analyses, I believe random steroid testing has been effective in preventing some cheating within the sport
I would like to see how the Yankees do this coming season to further inform my analysis