Baseball and Steroids Rachel Monaco April 27, 2014 MA-315-A

Post on 08-Jul-2015

You mad, bro?

Steroid use has been a plague in our modern-day athletic world

“Roid rage” is a commonly used term for the overly aggressive behavior some steroid users show in the athletic world

However, steroid users say the drugs give them better moods, sharper cognitive function, more confidence, and many other seemingly positive side effects

Users also claim steroid use helps them be who they are

Case in point: if you’re a jerk to begin with, you’re probably just going to be a bigger jerk on steroids

Baseball Steroid Use

In 2004, the MLB began requiring all players to submit to random steroid testing to help cut down on what seemed to be an epidemic in prior years

I wanted to compare the mean batting averages (for the American and National Leagues combined) before and after the MLB’s steroid testing began

I expected batting averages to drop after steroid testing began, because the super sluggers would either fade a bit or stop using steroids

I also want to look at two specific MLB stars, Barry Bonds and Alex Rodriguez, and how their stats changed during this time

How the data was collected

Fortunately, baseball has an enormous amount of data that has been collected over an incredible number of years. I was able to use the MLB’s website, which is full of statistics, as well as the Baseball Almanac, so collecting the data was fairly easy. However, I wish I could have watched all of these games and collected the data that way.

Batting Averages before and after

To test the difference in batting averages between the ten years before and the ten years after random steroid testing was implemented, I ran a test comparing two means on the combined batting averages of the American League and the National League.

My null hypothesis is that the MLB’s mean combined batting averages before and after random steroid testing are equal

My alternative hypothesis is that the MLB’s mean combined batting averages before and after random steroid testing are not equal

In symbols, $H_n: \mu_{bt} = \mu_{at}$ and $H_a: \mu_{bt} \neq \mu_{at}$, where $H_n$ is the null hypothesis, $H_a$ is the alternative hypothesis, $\mu_{bt}$ is the mean batting average before random steroid testing, and $\mu_{at}$ is the mean batting average after random steroid testing

I chose a 0.10 level of significance

Pool or not to pool?

Before comparing the two means, we use an F test to check whether the variances are equal, which tells us whether or not to pool the two data sets.

My null hypothesis is that the variances are equal

My alternative hypothesis is that the variances are not equal

I used the Data Analysis ToolPak add-in for Excel to run the F test. Before running it, however, I decided that my level of significance would be 10%.

Our F value is 0.2961 and our F critical value is 0.4098. Since both values are below 1, 0.4098 is the lower-tail critical value, and an F value below it falls in the rejection region. So we reject our null hypothesis that the variances are equal, assume unequal variances for our test comparing two means, and do not pool the data sets.
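Excel's ToolPak looks up the F critical value internally. The sketch below is a standard-library stand-in and does not reproduce Excel's algorithm. It integrates the F density numerically with Simpson's rule, then inverts the result by bisection. It assumes 9 and 9 degrees of freedom, from ten seasons in each group. The ToolPak reports a one-tail cutoff, so the sketch also computes the α/2 = 0.05 lower-tail cutoff that a strictly two-sided test at the 0.10 level would use.

```python
import math


def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_norm = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
                - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_norm + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))


def f_cdf(x: float, d1: int, d2: int, steps: int = 1000) -> float:
    """P(F <= x) via composite Simpson's rule (steps must be even).

    Assumes d1 >= 3 so the density is finite and equal to 0 at x = 0.
    """
    if d1 < 3:
        raise ValueError("this sketch assumes d1 >= 3")
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2  # Simpson weights 4, 2, 4, ...
        total += weight * f_pdf(i * h, d1, d2)
    return total * h / 3


def f_lower_critical(alpha: float, d1: int, d2: int) -> float:
    """Find c with P(F <= c) = alpha by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < alpha:  # widen the bracket if needed
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


f_value = 0.2961  # F value reported on the slide
crit_one_tail = f_lower_critical(0.10, 9, 9)   # compare with the slide's 0.4098
crit_two_sided = f_lower_critical(0.05, 9, 9)  # lower cutoff for a two-sided 0.10 test
print(round(crit_one_tail, 4), round(crit_two_sided, 4))
print("reject equal variances:", f_value < crit_one_tail and f_value < crit_two_sided)
```

F(9, 9) and 1/F(9, 9) have the same distribution, so the median is exactly 1. That makes f_cdf(1.0, 9, 9) ≈ 0.5 a convenient sanity check on the numerical integration.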

Comparing two means

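Because we are not pooling, we use the unpooled standard error

$$SE = \sqrt{\frac{s_{bt}^2}{n_{bt}} + \frac{s_{at}^2}{n_{at}}}$$

where $s_{bt}$ and $s_{at}$ are the sample standard deviations of the batting averages before and after testing, and $n_{bt} = n_{at} = 10$ seasons. Our SE is 0.002145.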
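Here is a minimal sketch of the two standard-error formulas behind the pool-or-not decision. The presentation does not reproduce the batting-average data. The standard deviations below are therefore hypothetical placeholders, chosen only to show how the formulas behave.

```python
import math


def unpooled_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """SE of (mean1 - mean2) when the variances are not assumed equal."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def pooled_se(s1: float, n1: int, s2: float, n2: int) -> float:
    """SE of (mean1 - mean2) using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical standard deviations, for illustration only.
s_before, s_after = 0.005, 0.002

# Equal sample sizes (ten seasons each): the two formulas agree.
print(unpooled_se(s_before, 10, s_after, 10), pooled_se(s_before, 10, s_after, 10))

# Unequal sample sizes: the formulas diverge.
print(unpooled_se(s_before, 10, s_after, 20), pooled_se(s_before, 10, s_after, 20))
```

With ten seasons in each group, pooling leaves the standard error unchanged. The pooling choice then mainly affects the degrees of freedom used for the t distribution.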

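From here, I get our test statistic through the equation

$$t = \frac{\bar{x}_{bt} - \bar{x}_{at}}{SE}$$

where $\bar{x}_{bt}$ and $\bar{x}_{at}$ are the sample mean batting averages before and after testing. We get our t to be 2.592.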

From there, we use the Excel formula “=T.DIST.2T(2.592, MIN(9, 9))”. This conservatively uses the smaller of the two groups’ degrees of freedom (10 - 1 = 9 for each) and returns a p-value of 0.029.

I then compare my p-value to my level of significance of 0.10. Because the p-value (0.029) is less than 0.10, we reject the null hypothesis that the MLB’s combined batting averages before and after random steroid testing are equal.
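The p-value can be checked without Excel. For a t statistic with integer degrees of freedom, the two-tailed probability has a classical closed-form series in θ = arctan(|t|/√df). The sketch below implements that series as a substitute for Excel's T.DIST.2T, not a copy of its implementation. It applies the series to the slide's t = 2.592 with the conservative 9 degrees of freedom.

```python
import math


def t_two_tailed_p(t: float, df: int) -> float:
    """Two-tailed p-value P(|T| >= |t|) for Student's t with integer df.

    Closed-form series in theta = atan(|t| / sqrt(df)):
      odd df:  P(|T| < t) = (2/pi) * (theta + sin(theta) * S)
      even df: P(|T| < t) = sin(theta) * S
    where S sums coef * cos(theta)**k over k = 1, 3, ..., df - 2 (odd df)
    or k = 0, 2, ..., df - 2 (even df).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    theta = math.atan(abs(t) / math.sqrt(df))
    c, s = math.cos(theta), math.sin(theta)
    k = 1 if df % 2 == 1 else 0
    coef, series = 1.0, 0.0
    while k <= df - 2:
        series += coef * c**k
        coef *= (k + 1) / (k + 2)  # next coefficient in the series
        k += 2
    if df % 2 == 1:
        inside = (2 / math.pi) * (theta + s * series)
    else:
        inside = s * series
    return 1.0 - inside


n_before = n_after = 10               # ten seasons on each side of 2004
df = min(n_before - 1, n_after - 1)   # conservative df, as in MIN(9, 9)
p_value = t_two_tailed_p(2.592, df)
alpha = 0.10
print(round(p_value, 3), "reject H_n" if p_value < alpha else "fail to reject H_n")
```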

What does that mean?!

Alex Rodriguez Recent Scandal

Within the last year, one of the biggest steroid scandals in baseball has centered on Alex Rodriguez (A-Rod), who plays for the New York Yankees. In 2013, A-Rod received the biggest drug suspension in baseball history for his use of steroids, to be served during the 2014 season.

A-Rod will be suspended for 162 games, reduced from the original 211-game suspension

So I decided to look at this third baseman and ran a one-way ANOVA on his years with the NYY.

One-Way ANOVA

I performed a one-way ANOVA on the different types of hits A-Rod had during his years with the Yankees, from 2004 to 2013. The null hypothesis for a one-way ANOVA is that all of the group means are equal, whereas the alternative hypothesis is that at least one of the means is different.

After performing the test, I see that the per-season average for getting a hit and reaching first is 140.4, for doubles 23.4, for triples 0.8, and for home runs 30.9.

Between the groups, the p-value is 1.07E-14 (about 1.07 × 10⁻¹⁴). This is far below any reasonable significance level, so we reject the null hypothesis and conclude that at least one of the averages differs from the others.

Given the data, I would assume that A-Rod is indeed the power hitter we have assumed him to be. As someone who doesn’t like the Yankees, I am hopeful that a 2014 season without him will help others in the league make the strides they need
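The between-groups F statistic behind that p-value can be sketched from its definition. The presentation does not reproduce A-Rod’s season-by-season totals, so the groups below are hypothetical toy numbers chosen to keep the arithmetic easy to follow. The slide's setup has four hit types with one value per season from 2004 to 2013. Assuming no missing seasons, that would give 3 and 36 degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical data
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

Group means that are far apart relative to the spread inside each group produce a large F. Comparing that F against the F(df_between, df_within) distribution is what yields a tiny p-value like the one on the slide.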

Barry Bonds

One of the greats

Bonds definitely would have made it to the Hall of Fame without steroids, just as Nixon would have won his second term even if he had not cheated

San Francisco Giants left fielder from 1993 to 2007

While all of A-Rod’s years with the Yankees came after random steroid testing began, Bonds played before, during, and after its implementation

Confidence Interval

I wanted to construct a 95% confidence interval for Barry Bonds’ home runs per season during the years he played for the Giants.

For a 95% confidence level, the critical z-scores are -1.96 and 1.96

I defined my population as Barry Bonds’ home runs during his years playing for the Giants; therefore, I can compute the population standard deviation directly from my data

I ran descriptive statistics on Barry Bonds’ home runs and found the mean to be 39.067, the median 40, the mode 46, the standard deviation 14.52, the minimum 5, and the maximum 73.
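These summary statistics can be cross-checked with Python's statistics module. The season totals below are Bonds' Giants home run totals for 1993 to 2007 as I understand them from the public record. They include the 5 home runs in 2005 mentioned later in this presentation. Treat them as an assumption to verify against the Baseball Almanac. They give a mean of about 39.07, a median of 40, and a sample standard deviation of about 14.52.

```python
import statistics

# Assumed season totals (Giants, 1993-2007); verify against the original data source.
home_runs = {
    1993: 46, 1994: 37, 1995: 33, 1996: 42, 1997: 40,
    1998: 37, 1999: 34, 2000: 49, 2001: 73, 2002: 46,
    2003: 45, 2004: 45, 2005: 5, 2006: 26, 2007: 28,
}
hr = list(home_runs.values())

print("mean:", round(statistics.mean(hr), 3))
print("median:", statistics.median(hr))
# statistics.mode returns the first mode encountered (Python 3.8+);
# multimode shows that the mode is not unique for this data.
print("mode:", statistics.mode(hr), "all modes:", statistics.multimode(hr))
print("sample std dev:", round(statistics.stdev(hr), 2))  # n - 1 denominator
print("min:", min(hr), "max:", max(hr))
```

The slide reports a mode of 46. With this data, 37 and 45 also appear twice, so 46 is simply the first of three tied values.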

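Because I know the standard deviation of my population, I can use the z-based margin of error

$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

with the interval $\bar{x} \pm E$, where $z_{\alpha/2} = 1.96$, $\sigma = 14.52$, and $n = 15$ seasons.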

Using this equation, we get E = 1.96 × 14.52 / √15 ≈ 7.35. From there, we get our interval to be (31.72, 46.41).
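A quick arithmetic check of the margin of error follows. It assumes a mean of 39.067, which is the midpoint of the stated interval (31.72, 46.41). It also assumes the standard deviation of 14.52 and n = 15 seasons (1993 through 2007).

```python
import math

mean_hr = 39.067   # midpoint of the stated interval (rounded mean)
sd_hr = 14.52      # rounded standard deviation from the descriptive statistics
n_seasons = 15     # 1993 through 2007
z = 1.96           # critical value for 95% confidence

margin = z * sd_hr / math.sqrt(n_seasons)
lower, upper = mean_hr - margin, mean_hr + margin
print(f"E = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Because the inputs are rounded, the upper bound comes out near 46.415 and may display as 46.42. With the unrounded mean and standard deviation, it rounds to 46.41.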

Strictly speaking, this means that if we repeatedly took random samples and built an interval this way, about 95% of those intervals would contain the true mean number of home runs per season. It does not mean that 95% of individual seasons fall inside the interval.

Even so, I wanted to look at the individual seasons that fall outside this interval and analyze them

We see that before steroid testing, two seasons are above the upper bound of my confidence interval, and after steroid testing, two are well below the lower bound (excluding 2005, when Bonds played only 14 games and hit 5 home runs)
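The seasons outside the interval can be flagged programmatically. This sketch assumes the Giants season totals as I understand them from the public record, so verify them against the original source. It uses the interval bounds (31.72, 46.41) and treats 2004 as the first season of random testing. Like the slide, it drops 2005 as a partial season.

```python
# Assumed season totals (Giants, 1993-2007); verify against the original data source.
home_runs = {
    1993: 46, 1994: 37, 1995: 33, 1996: 42, 1997: 40,
    1998: 37, 1999: 34, 2000: 49, 2001: 73, 2002: 46,
    2003: 45, 2004: 45, 2005: 5, 2006: 26, 2007: 28,
}
lower, upper = 31.72, 46.41   # 95% interval from the slide
testing_start = 2004          # first season of random steroid testing
partial_seasons = {2005}      # only 14 games played

seasons = {y: hr for y, hr in home_runs.items() if y not in partial_seasons}
above = sorted((y, hr) for y, hr in seasons.items() if hr > upper)
below = sorted((y, hr) for y, hr in seasons.items() if hr < lower)

for label, rows in (("above", above), ("below", below)):
    for year, hr in rows:
        era = "before testing" if year < testing_start else "after testing"
        print(f"{year}: {hr} HR, {label} the interval, {era}")
```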

Looking at this, it seems that Bonds’ best seasons came when he was using steroids

This makes sense given Bonds’ involvement in the BALCO scandal of 2003 to 2004, which was one of the triggers for implementing the random steroid testing rule in the MLB

Has it worked?

Based on my analyses, I believe that the implementation of random steroid testing has been effective in preventing some cheating within the sport

I would like to see how the Yankees do this coming season to further inform my analysis