debating immortality: application of data envelopment analysis to voting for the baseball hall of...

12
Debating Immortality: Application of Data Envelopment Analysis to Voting for the Baseball Hall of Fame Thomas J. Miceli a, * and Brian D. Volz b a Department of Economics, University of Connecticut, Storrs, CT, USA b Department of Economics, Assumption College, Worcester, MA, USA This paper applies data envelopment analysis to voting for the Baseball Hall of Fame (HOF). The approach interprets a players career statistics as inputs and the percentage of votes that he received for the HOF as the output. A constructed frontier based on past voting denes the maximum number of votes that a player should receive based on his statistical prole. Our results suggest that about a third of the current members of the HOF (excluding Negro League players, managers, umpires, and executives) should be replaced by more deserving players. Our conclusions, however, do not account for those aspects of a players career (both positive and negative) not captured by statistics. Copyright © 2012 John Wiley & Sons, Ltd. 1. INTRODUCTION Who should and who should not be in the Baseball Hall of Fame (HOF)? This question has been a peren- nial source of debate among baseball fans, sports talk show hosts, and baseball writers. Everyone either has a favorite candidate who they feel has been unjustly excluded or points to a pretender who does not belong. Often, the discussion involves a statistical comparison of the following sort: player A, who is in the HOF, had a certain number of home runs or wins, so player B, who has as many or more home runs or wins, should also be in. Of course, selection for the HOF does not work that way. To be eligible, a player must simply have played in the major leagues for at least 10 years and have been retired for at least 5 years. Eligible players can then be selected in one of two ways. First, he can be voted in by receiving the vote of at least 75% of the baseball writers who are entitled to vote in a given year. 1 Second, if he was not elected by the writers during his 15 years of eligibility, he can be selected by the VeteransCommittee. In terms of the requirements for selection, there are no set guidelinesit is based purely on the subjective opinions of the baseball writers or members of the VeteransCommittee. To be sure, certain career milestones have virtually ensured selection in the pastfor example, 500 home runs, 3000 hits, or 300 winsbut these standards have arisen informally over time and are subject to change as styles of play evolve and people update their opinions about what is humanly achievable.(For example, in the era of steroids, it is no longer clear that 500 home runs will ensure selection.) For most candidates, the decision is based on an assessment of the totality of their career, not only centering on their achievements on the eld, but also including their off-the-eld activities (or antics), their personalities, and their overall contributionto the game. *Correspondence to: Department of Economics, University of Connecticut, Storrs, CT 06269-1063, USA. E-mail: Thomas. [email protected] Copyright © 2012 John Wiley & Sons, Ltd. MANAGERIAL AND DECISION ECONOMICS Manage. Decis. Econ. 33: 177188 (2012) Published online 17 February 2012 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/mde.2543

Upload: thomas-j-miceli

Post on 09-Aug-2016

217 views

Category:

Documents


3 download

TRANSCRIPT

MANAGERIAL AND DECISION ECONOMICS

Manage. Decis. Econ. 33: 177–188 (2012)

Published online 17 February 2012 in Wiley Online Library

Debating Immortality: Application of DataEnvelopment Analysis to Voting for the

Baseball Hall of FameThomas J. Micelia,* and Brian D. Volzb

aDepartment of Economics, University of Connecticut, Storrs, CT, USAbDepartment of Economics, Assumption College, Worcester, MA, USA

(wileyonlinelibrary.com) DOI: 10.1002/mde.2543

*CorrespondenConnecticut,Miceli@UCo

Copyright ©

This paper applies data envelopment analysis to voting for the Baseball Hall of Fame(HOF). The approach interprets a player’s career statistics as inputs and the percentageof votes that he received for the HOF as the output. A constructed frontier based on pastvoting defines the maximum number of votes that a player should receive based on hisstatistical profile. Our results suggest that about a third of the current members of theHOF (excluding Negro League players, managers, umpires, and executives) should bereplaced by more deserving players. Our conclusions, however, do not account for thoseaspects of a player’s career (both positive and negative) not captured by statistics.Copyright © 2012 John Wiley & Sons, Ltd.

1. INTRODUCTION

Who should and who should not be in the BaseballHall of Fame (HOF)? This question has been a peren-nial source of debate among baseball fans, sports talkshow hosts, and baseball writers. Everyone either hasa favorite candidate who they feel has been unjustlyexcluded or points to a pretender who does not belong.Often, the discussion involves a statistical comparisonof the following sort: player A, who is in the HOF, hada certain number of home runs or wins, so player B,who has as many or more home runs or wins, shouldalso be in.

Of course, selection for the HOF does not work thatway. To be eligible, a player must simply have playedin the major leagues for at least 10 years and have beenretired for at least 5 years. Eligible players can then beselected in one of two ways. First, he can be voted in

ce to: Department of Economics, University ofStorrs, CT 06269-1063, USA. E-mail: Thomas.nn.edu

2012 John Wiley & Sons, Ltd.

by receiving the vote of at least 75% of the baseballwriters who are entitled to vote in a given year.1

Second, if he was not elected by the writers duringhis 15 years of eligibility, he can be selected by theVeterans’ Committee. In terms of the requirementsfor selection, there are no set guidelines—it is basedpurely on the subjective opinions of the baseballwriters or members of the Veterans’ Committee. Tobe sure, certain career milestones have virtuallyensured selection in the past—for example, 500 homeruns, 3000 hits, or 300 wins—but these standards havearisen informally over time and are subject to changeas styles of play evolve and people update their opinionsabout what is humanly “achievable.” (For example, inthe era of steroids, it is no longer clear that 500 homeruns will ensure selection.) For most candidates, thedecision is based on an assessment of the totality of theircareer, not only centering on their achievements on thefield, but also including their off-the-field activities(or antics), their personalities, and their overall“contribution” to the game.

T. J. MICELI AND B. D. VOLZ178

Most of the previous economic literature on votingfor the Baseball HOF has focused on the question ofwhether there exists racial bias in the selection process.The first such study, by Findlay and Reid (1997), con-sidered all non-pitchers who were eligible for HOFselection between 1962 (the first year Jackie Robinsonbecame eligible) and 1995. Using a Heckman two-stage procedure to correct for sample-selection bias(given that some players receive zero HOF votes), theyfound evidence that black players had a higher proba-bility of receiving no votes, but that racial status didnot affect the percentage of votes cast for those playerswho received votes. However, when the authorsallowed voting standards to vary over time, they foundsome evidence of discrimination against Latin playersalthough this result tended to diminish as time passed.They also found evidence of favorable treatment ofblacks although this result was sensitive to themodel specification.

In a similar study, Desser, Monks, and Robinson(1999) found a small bias among voters for whiteplayers as opposed to both black and Latin players,but the effect was not large enough to substantiallyalter the composition of the HOF. Jewel, Brown,and Miles (2002) likewise found slight evidence ofbias against black players who were born in LatinAmerican countries, but no evidence for bias againstblack players born in the USA. They also found,however, that any discrimination against Latinblacks was limited to those who probably wouldnot have gotten into the HOF anyway. Finally, Jew-ell (2003) found no evidence that discriminationagainst blacks and Latins affected the length of timethat players had to wait for election to the HOF.

The only general study of HOF voting in theeconomics literature is by Findlay and Reid (2002).2

As in the discrimination studies, they limited theirsample to players whose eligibility began in 1962,and they also focused exclusively on non-pitchers.They estimated two models: one in which the depen-dent variable was the highest percentage of votes thata player received during his 15 years of eligibility,and one in which it was a dichotomous variable indi-cating whether or not he was ever elected. They alsoused two methods for measuring a player’s on-fieldperformance: separate individual measures of offen-sive productivity (e.g., number of hits, home runs,and stolen bases) and a composite index ofperformance. The resulting four models all per-formed well in forecasting election to the HOF, butthe authors found a slight advantage for the modelsthat used individual performance measures.

Copyright © 2012 John Wiley & Sons, Ltd.

Finally, it is worth mentioning the classic book onHOF selection by James (1995), which, among (many)other things, discusses various statistical measuresaimed at determining who should be in the HOF(see especially Chapter 7). These approaches are funda-mentally different from the existing economic models,however, in the sense that they are not designed toreflect the actual voting behavior of baseball writersbut rather to advance more consistent and objectivecriteria for determining who should be enshrined.James’s approach therefore reflects what we would terma normative analysis of HOF voting, as opposed to thecurrent paper (and the aforementioned literature), whichconstitutes a positive analysis.

Our paper extends the existing literature in severalways. First, we expand our sample to include all playerswho ever received HOF votes (through 2009). Althoughthis presents certain difficulties as a result of changingstandards of play over the history of the game (whichwe attempt to address), as well as possible changes invoting standards, it greatly expands the sample size.We also include starting pitchers as well as positionplayers although we treat them separately because ofthe difference in performance measures. Most signifi-cantly, we employ data envelopment analysis, or DEA,rather than traditional parametric techniques to analyzeHOF voting. DEA is a non-parametric technique oftenused in productivity analysis to estimate the efficiencyof a given production unit—that is, how efficiently doesthat unit or decision maker combine inputs to producean output (Ray, 2004)? Aside from its methodologicalattributes (which we discuss in the next section), theDEA approach allows us to ask one question notaddressed in previous studies—namely, whether thereare some players in the HOF who should not be basedon past voting patterns.

Although DEA has not been previously applied toHOF voting, it has been employed extensively in theanalysis of Major League Baseball (MLB). For exam-ple, it has been used by Volz (2009) to analyze MLBmanager performance, by Lewis, Sexton, and Lock(2007) in an examination of the competitiveness ofMLB teams, and by Einolf (2004) to analyze the effi-ciency of MLB and National Football League teams.

The application of DEA in the current context isbased on the idea that voting for the HOF involves animplicit production process by which writers, as agroup, use “inputs” (a player’s career statistics) toproduce an “output” (the number or percentage of affir-mative votes that the player receives for selection to theHOF). This approach uses data on past voting behaviorto construct a “frontier,” representing the maximum

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

APPLICATION OF DEA TO VOTING FOR THE BASEBALL HALL OF FAME 179

attainable percentage of votes (or output) for playersbased on their statistical profiles (or inputs). Theresulting “predicted vote total” can then be comparedwith the player’s actual vote total (for eligible players)to calculate his “efficiency score.” In this way, we can,for example, determine which players not in the HOFshould be based on past voting practices of the writers.The analysis can also be applied to not-yet-eligibleplayers to predict whether they will be voted in. Anextension of the basic DEA technique (to be describedin detail below) finally allows us to address the otherpart of the HOF debate: that is, which players are inthe HOF but should not be, again, based on pastvoting practices.

The remainder of the paper is organized as follows.Section 2 briefly describes the DEA methodology.Section 3 then describes the sample of players andthe statistical measures of performance that we willuse in the analysis. Section 4 reports the results ofthe basic analysis, which addresses the first part ofthe HOF debate, namely, which players not in theHOF (or not yet eligible) should be in, on the basisof past voting patterns? Section 5 turns to the extensionof the basic DEA, referred to as “super-efficiency,”which allows us to address the second part of thedebate: which players in the HOF should not be, againon the basis of past voting? Section 6 re-frames thequestion by using DEA to construct a HOF that isconstrained in size to the number of members currentlyenshrined. The predicted membership of this“hypothetical” HOF is then compared with the actualmembership to see which current members would beexcluded and which eligible non-members would beincluded. The total number that should be “replaced”turns out to be about one-third of the current member-ship. The analysis also yields the implied votethreshold for inclusion, which turns out to be above80% as compared with the actual threshold of 75%.Finally, Section 7 offers concluding comments andqualifications of the analysis.

Figure 1. Output-oriented technical efficiency model.

2. METHODOLOGY

The goal of our analysis is to identify players whoreceived a different percentage of votes than isexpected on the basis of observed voting patterns.DEA can be used to identify such players by calculat-ing the maximum percentage of votes that a playercould have received given his level of performance.This application differs little from the use of DEA inproductivity analysis of firms as presented by Banker,

Copyright © 2012 John Wiley & Sons, Ltd.

Charnes, and Cooper (1984). In productivity analysis,firms take inputs and transform them into outputs.These outputs are then compared with a constructedproduction possibilities frontier (PPF) in order to de-termine how much more output a firm could have pro-duced. The analysis presented here applies the samemethodology to the production of HOF votes, whereplayer performance statistics are treated as inputs intothe production of the output, HOF votes.

The constructed PPF is based on three assumptionsabout the underlying production technology. The firstassumption is that inputs are freely disposable. Thismeans that if a certain level of inputs can producesome level of output, then a level of inputs, which isgreater in at least one dimension, can also produce thatlevel of output. The second assumption is that outputis freely disposable. This implies that if a certain levelof inputs can produce some level of output, then thatsame level of inputs can also produce any lower levelof output. The third assumption is the convexity of theproduction possibilities set. Convexity implies thatany convex combination of two feasible input–outputbundles is also feasible.

The PPF can be shown graphically for the case of asingle-input, single-output technology. Figure 1 showsa hypothetical PPF where the input is, say, the numberof homeruns that a player hit during his career, and theoutput is the percentage of affirmative votes for inclu-sion in the HOF. The convexity assumption meansthat all points that are convex combinations ofobserved points are also feasible. Suppose, for exam-ple, that player A hit 200 career homeruns andreceived 80% of votes cast, whereas player B hit 400homeruns and received 100% votes cast. The convex-ity assumption implies that any convex combination,for example, 300 homeruns and 90% of votes cast, isalso feasible. Graphically, this is shown by connectingthe data points to create a convex hull. The assumption

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

T. J. MICELI AND B. D. VOLZ180

of free disposability of inputs implies that all points tothe right of the convex hull are feasible, whereas theassumption of free disposability of output implies thatall points below a feasible point are feasible. Whenthese assumptions are combined, the result is thePPF shown in Figure 1.

In order to determine whether players are receivingthe maximum votes possible given their input bundle,their actual output can be compared with themaximum feasible output (as determined by the PPF)given their inputs. This measure is referred to as theoutput-oriented technical efficiency. Formally, thisindex of technical efficiency for a player is calculatedby dividing the actual output by the maximum feasibleoutput for that level of inputs. In Figure 1, for exam-ple, player C’s technical efficiency is the distance Ydivided by the sum of distances Y and Z. Given thetechnical efficiency of any player, his maximum feasi-ble votes can be identified. An advantage of thismethodology over parametric models is that anyplayer who received zero votes will not lie on therelevant portion of the PPF and, therefore, will notaffect other players’ predicted percentages. Therefore,DEA does not require any special treatment of zerovote players as has been necessary in previousresearch on the subject.

Obviously, no single statistic determines thepercentage of votes that a player receives for theHOF. For the case of multiple inputs, the technicalefficiency can be found by solving the followingmaximization problem and taking the inverse of theresulting variable, θ.

General DEA Model:

Maximize θ

Subject to:

XliVi≥θV0 (1)

XliX1i≤X10 (2)

XliX2i≤X20 (3)

XliX3i≥X30 (4)

Xli ¼ 1 (5)

li≥0: (6)

The index θ can be interpreted as the multiple bywhich output can be increased using a convexcombination of observed input–output bundles. The

Copyright © 2012 John Wiley & Sons, Ltd.

subscript 0 identifies values for the player being ana-lyzed. The subscript i identifies the other players inthe sample. Vj represents the percentage of votes thatplayer j received. X1 and X2 represent positive inputs(e.g., home runs, hits), whereas X3 represents a nega-tive input (e.g., pitcher losses).3 (The specific inputsused for this analysis will be described in the nextsection.) Constraint (1) implies that the convex combi-nation of other observed vote percentages must begreater than or equal to the observed vote percentageof the player being evaluated. Constraints (2) and (3)imply that the convex combination of positive inputsmust be less than or equal to the inputs of the playerunder consideration. Constraint (4) states that the con-vex combination of the negative input must be at leastas great as the negative input of the player being eval-uated. Constraint (5) implies variable returns to scale.The assumption of variable returns to scale is appropri-ate as players cannot receive more than 100% of votes.Therefore, as inputs increase, eventually, their contribu-tion to output must decrease resulting in decreasingreturns to scale. Lastly, constraint (6) assures that the so-lution satisfies the convexity assumption.

3. VARIABLE AND SAMPLE SELECTION

The sample for our analysis includes all positionplayers and starting pitchers who played in the majorleagues from 1876 to 2009 and who are eligible forthe HOF.4 Players who played a significant portionof their careers in the Negro Leagues may havereceived votes based on their performance in thoseleagues. However, because we restrict analysis tomajor league performance statistics, which do notcapture the effects of achievements in other leagues,players who spent a significant portion of their careersin the Negro Leagues are excluded from the sample.Pitchers are included in the analysis so long as theypitched at least 100 games with at least half of thoseappearances as a starting pitcher.5 Position playersare included in the sample so long as they had at least1000 at bats. This results in samples consisting of1340 pitchers and 3358 position players.

As of the summer of 2010, there were 292members of the baseball HOF, including players,managers, executives, and pioneers. Table 1 breaksthe total down by method of selection. ExcludingNegro League players, these include 142 positionplayers and 56 starting pitchers.6 The number of theseplayers voted in by the baseball writers, whose votesprovide the data for the DEA, is 104.

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

Table 1. Membership of the Baseball Hall ofFame, as of Summer 2010, by Selection Method

Number voted in by the BBWAA 109Number selected by the Veterans’ Committee 157Number selected by special Negro League committees 26

_______Total membership 292

BBWAA, Baseball Writers Association of American.

APPLICATION OF DEA TO VOTING FOR THE BASEBALL HALL OF FAME 181

The output of interest, votes, is the maximumpercentage of votes for inclusion in the HOF that aplayer received during his 15 years of eligibility, upto and including the 2010 vote.7 Players voted intothe HOF primarily as managers are treated as notreceiving any votes. To ensure a solution to the maxi-mization problem, all players who received no votesor have not yet been voted on are treated as havingreceived .01% of the vote.

We ran the model for position players and pitchersseparately as the relevant statistics describing theirperformance are obviously different. The inputs thatwe used to capture the productivity of position playersare as follows (all based on career totals): hits (H),homeruns (HR), batting average (AVG), on-basepercentage (OBP), slugging average (SLG), runsscored (R), runs batted in (RBI), and stolen bases(SB). Although these choices are somewhat arbitrary,they represent the key statistics most often used toevaluate the offensive performance of Major LeagueBaseball players. More complicated and comprehen-sive performance measures have been developed inrecent years (see, e.g., James (1995)), but it is unlikelythat members of the Baseball Writers Association ofAmerican (BBWAA) had access to these morecomplicated statistics when they made past votingdecisions. Finally we also included a player’s careerat bats (AB) to reflect his longevity.8

Note that our analysis focuses solely on offensiveperformance for position players. It has been a stan-dard criticism of HOF selection that it slights playerswho are good defensively but not strong offensively.However, the only readily available defensive statis-tics, such as fielding percentage, assists, and errors,cannot be meaningfully compared across positionsand are often misleading (for example, the best fieldersoften make the most errors because they get to moreballs). We do, however, attempt to capture the roleof defense indirectly by categorizing position playersinto four groups: corner infielders (first and thirdbase), middle infielders (second base and shortstop),outfielders, and catchers.9 Historically, corner infielders

Copyright © 2012 John Wiley & Sons, Ltd.

and outfielders have been stronger offensive players,whereas middle infielders and catchers have been stron-ger defensive players. By breaking down our results onHOF voting into these categories, we will be able toevaluate the claim that there has been a bias amongvoters for players in traditionally offensive positions.

The inputs that we used for starting pitchers arewins (W), strikeouts (K), earned run average (ERA),winning percentage (WP), strikeouts per nine innings(K9), and walks plus hits per inning pitched (WHIP).These statistics have traditionally been used to evaluatethe performance of major league pitchers. Again, morecomprehensive and complicated statistics have beendeveloped but are unlikely to have been available toearlier BBWAA voters. Finally, we also included totalgames started (GS) as a measure of pitcher longevity.

4. RESULTS OF THE BASIC ANALYSIS

Using the results of the standard DEA, we calculatethe maximum possible votes that each player couldhave received given their performance statistics. Withthis efficient vote percentage (EVOTE%), we identifythose players who could have received enough votesto be inducted into the HOF (75% of votes cast). Wealso identify those players who are not yet eligiblefor the HOF (as of 2009) but who would receiveenough votes to be inducted. In our base analysis,the results for position players are derived using apooled sample of all players although we also reportthe results by the aforementioned position categories.A summary of the results is presented in Table 2.10

Among those players eligible for inclusion in theHOF (i.e., those who played at least 10 years and havebeen retired for at least 5 years), there are 177 players,including 130 position players and 47 starting pitch-ers, whose maximum possible vote percentages putthem above the 75% threshold for election to theHOF. As noted, excluding Negro League players andrelief pitchers, there are currently (as of 2009) 142position players and 56 starting pitchers in the HOF,or a total of 198 players (including those elected bythe BBWAA or selected by the Veterans’ Committee).Our results therefore suggest that if the writers hadvoted in a consistent manner throughout history, basedpurely on statistical measures of performance asdescribed earlier, then the population of the HOFwould be nearly double its current size, with 130additional position players and 48 additional startingpitchers being included. Among those players not yeteligible for the HOF (including active players and

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

Table 2. Number of PlayersWho Should Be in theHall of Fame (HOF) Based on Data EnvelopmentAnalysis (DEA) Analysis of Voting, (DEA UsingFull Sample)

Eligible forHOF

aNot yeteligible

a

Total

Positionplayers, Total

130 55 185

Cornerinfielders

44 11

Middleinfielders

15 11

Outfielders 69 32Catchers 2 1Startingpitchers

47 11 58

Totals 177 66 243

aAs of 2009.

Table 3. Number of Position Players Who ShouldBe in the Hall of Fame (HOF) by Position, (DataEnvelopment Analysis Restricting Sample toPlayers in the Same Position Category)

Eligible for HOF Not yet eligible

Corner infielder 13 6Middle infielder 8 9Outfielder 8 15Catcher 0 1

Total 29 31

T. J. MICELI AND B. D. VOLZ182

those who have been retired for less than 5 years as of2009), 66 would make the HOF based on past voting,including 55 position players and 11 starting pitchers.

Among position players, the break down byposition in Table 2 reveals the expected bias in favorof players in the traditionally offensive positions ofcorner infield and outfield. In particular, nearly 87%(or 113 out of 130) of eligible players, and 78%(or 43 out of 55) of not-yet-eligible players whoshould be in the HOF played those positions. In aneffort to further examine this bias, we re-calculatedthe efficient vote totals for position players using asthe relevant sample only those players in the sameposition category. This is accomplished by runningseparate models for each position category. Obvi-ously, this exercise substantially reduces the samplesize for each position, which means that there arefewer observations from which to construct the fron-tier. This means that for any player, finding a convexcombination of inputs and outputs that will increasehis percentage of votes is less likely. Therefore, assample size falls players’ efficiency will generallyincrease, causing the number of players with predictedvote percentages above 75% to decrease.

Given that caveat, Table 3 reports the results of therestricted analysis by position. As noted, the totalnumber of players who should be in the HOF is substan-tially lower because of the smaller sample size, fallingfrom 130 to 29 for eligible players and from 55 to 31for not-yet-eligible players. The results neverthelesscontinue to show a bias toward corner infielders andoutfielders although it is less pronounced than in the fullsample, falling from 87% to 72% for eligible playersand from 78% to 68% for not-yet-eligible players.

Copyright © 2012 John Wiley & Sons, Ltd.

4.1. Analysis by Era: the “Old-timer” Bias

The fact that our analysis suggests that the HOF isunderinclusive may reflect “errors” or lax standardsin the past, but it more likely reflects a learningprocess among voters concerning what constitutes“greatness.” As time goes by and the overall qualityof play improves, milestones once thought to beattainable only by the best players have become morecommonplace, and voting standards have adjustedaccordingly.11 Thus, players whose records wouldhave met the standards for selection to the HOF inearlier eras are now excluded because their achieve-ments are no longer seen as meriting enshrinement.For example, pitchers who reached 200 career winsand hitters who reached 500 home runs were oncealmost assured of induction, but they no longer are.

Thus, as in the case of bias by position, our analysisof HOF voting may also embody a bias towardold-time players. To investigate this possibility, wecategorized players according to the following(somewhat arbitrary) “eras”: (1) 1876–1919 (the“dead ball” era); (2) 1920–1946 (the inter-war era);(3) 1947–1989 (the era of integration); and (4)1990–2009 (the steroid era). If a player spanned twoeras, we included him in both.

Table 4 reports the number of eligible players onlywho should be in the HOF broken down by era. Theresults in the left-hand panel are based on the analysisusing the full sample. (Note that the total number ofeligible player who should be in the HOF exceedsthe corresponding number in Table 2. This is becauseplayers who spanned two eras are double counted.) Ininterpreting these results, note that there is an inherentbias against the most recent era (1990–2009) becausesome of these players are not yet eligible. Despite that,the overall data reveal a clear old-timer bias, given thatthe number of players who should be in the HOF, onthe basis of past voting, is clearly tilted to the recenteras. In other words, players from recent eras have

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

Table 4. Number of Players Who Should Be in the Hall of Fame, by Era

Full sample Restricted sample

Pos. player Pitcher Total Pos. player Pitcher Total

1876–1919 24 15 39 1 6 71920–1946 22 9 31 2 8 101947–1989 61 27 88 21 25 461990–2009 25 13 38 13 12 25

Note: Totals reflect double counting of players who spanned two eras.

APPLICATION OF DEA TO VOTING FOR THE BASEBALL HALL OF FAME 183

been more likely to receive lower vote totals than theirrecords warrant, indicating that voters have increasedthe standards for selection over time.

To further investigate this bias, we re-calculated theefficient vote totals for each player with the use of onlyplayers from the same position and era as the sample. Aswith the position analysis, this reduces the sample and,hence, the number of players meeting the 75% thresh-old. The results of this exercise are reported in theright-hand panel of Table 4. Note that the old-timer biaspersists; if anything, it becomes more pronounced. Thisfurther supports the argument that voters have “over-populated” the HOF with players from earlier eras.

Figure 2. Super-efficiency model.

5. SUPER-EFFICIENCY

The basic DEA method as described to this point canonly identify those players who are not in the HOF,or are not yet eligible, who should be included onthe basis of past voting patterns. This is true becauseDEA can only increase a player’s vote percentagecompared with the percentage that he actuallyreceived. Thus, if he actually received greater than75% and hence was elected to the HOF, DEA wouldnever be able to “downgrade” him. As a consequence,the analysis to this point cannot address the other halfof the HOF debate; namely, which players currentlyin the HOF should not have been selected (either byvote or by the Veterans’ Committee). Fortunately, itturns out that there is a DEA technique, referred toas “super-efficiency,” that allows us to address thisquestion.

To understand this technique, note that because V0

is a feasible level of output, the maximum value of θmust be greater than or equal to 1. Therefore, the tech-nical efficiency measure will always lie in the closedinterval from 0 to 1. Furthermore, all players who lieon the PPF will, by definition, have a technical effi-ciency equal to 1. This implies that their maximumfeasible output is equal to their observed output. Forthese players, their maximum feasible output is based

Copyright © 2012 John Wiley & Sons, Ltd.

on their own input–output bundle, not on the otherobservations in the sample. In order to eliminate thisproblem, the same model can be run with the addedconstraint that the l of the player under considerationbe set equal to zero. This results in the frontier beingbased on all observations other than the player underconsideration. This measure, which is referred to assuper-efficiency, was first proposed by Anderson andPetersen (1993).

The resulting super-efficiencymodel is shown graph-ically in Figure 2 for the single-input, single-output case.In this example, player A is located on the efficiencyfrontier and therefore, by definition, has a technicalefficiency equal to one. In order to calculate the super-efficiency measure for player A, we compare that playerto a frontier created using all players except for player A.This frontier is shown in Figure 2 by the solid line con-necting players B and D. Player A’s efficiency relativeto this frontier is calculated as the sum of distances Sand T divided by T. This value will be greater than onefor all players who lie on the original frontier. Forplayers who are “inefficient” (i.e., lie inside the frontier),the value of this measure will be the same as that of theoutput-oriented technical efficiency presented previously.

The advantage of a super-efficiency measure is thatthose players who are “efficient” (i.e., on the originalfrontier) will now have a measure of the maximumvotes that they could have received, based on the otherobservations in the sample rather than their own. In

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

able 6. Number of Players in the Hall of Fameho Should Have Received Less Than 75% of

he Vote, by Era (Super-efficiency Using Full Sample)

T. J. MICELI AND B. D. VOLZ184

this manner, it is possible to identify players who actu-ally should have received fewer votes on the basis ofthe observed voting behavior. In particular, we canidentify those players who are in the HOF but shouldnot have received the necessary threshold of 75% ofthe vote on the basis of this analysis. This groupincludes players on the PPF who, on the basis of theirsuper-efficiency measure, should have received fewervotes. It also includes players who were selected forthe HOF via means other than the BBWAA vote(specifically, by the Veterans’ Committee).

Table 5 summarizes the results of the super-efficiencymodel, categorized by position and by manner of selec-tion to the HOF. The results show that only 11 playersvoted in by the BBWAA would not have received thenecessary votes to be selected for the HOF, whereas 38players selected by the Veterans’ Committee would nothave received the necessary votes. This reflects the con-ventional wisdom that the Veterans’ Committee is a kindof “back door” into the HOF, which has perhaps dilutedthe standards for admission. In some sense, however, thismust be the case because the Veterans’ Committee onlyconsiders players whom the BBWAA did not select dur-ing their 15 years of eligibility, and because this commit-tee is comprised of former players, there is certainly thepossibility of bias and cronyism in its selection process.

On the other hand, many would argue that theVeterans’ Committee serves an essential purpose bycorrecting errors or oversights by the writers. Onesuch example is the bias toward offensive positionsidentified earlier. In fact, Table 5 shows that 19 outof 33 position players (58%) who “should not” be inthe HOF based purely on voting data but were selectedby the Veterans’ Committee are middle infielders orcatchers. (Note that no such bias appears among those

Table 5. Number of Players in the Hall of FameWho Should Have Received Less Than 75% ofthe Vote, by Position (Super-efficiency UsingFull Sample)

Voted in byBBWAA

Selected byVeterans’Com. Tota

Position players, Total 9 33 42Corner infielders 2 3Middle infielders 3 14Outfielders 2 11Catchers 2 5

Starting pitchers 2 5 7

Total 11 38 49

BBWAA, Baseball Writers Association of American.

Table 7. “Efficient” Hall of Fame (HOF)

Positionplayers

Startingpitchers Tota

HOF as of 2009a

142 56 198Number of these replaced inthe efficient HOF

55 15 70

Threshold % of votes neededfor inclusion

83.71 81.38

aExcluding Negro League players, relief pitchers, and non-players.

Voted in byBBWAA

Selected by Veterans’Com. Total

876–1919 1 16 17920–1946 5 18 23947–1989 6 5 11990–2009 1 0 1

Copyright © 2012 John Wiley & Sons, Ltd.

l

voted in by the writers.) This suggests some effort bythe Committee to reward defense standouts.

Table 6 shows the breakdown of the super-efficiencyanalysis by era. The results show a clear (and expected)bias towardold-timeplayersbytheVeterans’Committee.Of course, this reflects theCommittee’s intendedpurposeof re-evaluating players and selecting those who aredeserving of enshrinement but were overlooked by thewritersbecause theywerenot popular during their careersor because their accomplishments or importance to thegame were not fully appreciated in their time. Thus, theexistence of theVeterans’Committee builds an old-timerbias into theHOF selection process.

6. CONSTRUCTING AN “EFFICIENT” HALLOF FAME

Finally, in order to compare the current HOF to one inwhich all players received an efficient percentage of votes,we construct a “hypothetical” HOF on the basis of the ef-ficient vote percentages from this analysis. Again, exclud-ing Negro League players, relief pitchers, and non-playersfrom the analysis, there are currently 142 position playersand 56 starting pitchers in the HOF. A hypothetical HOFconstrained to have the same number of starting pitchersand position players will necessarily result in some playerscurrently in the HOF being excluded and other eligibleplayers who were not selected being added. The resultsof this analysis, using the full sample, are presented inTable 7.12

TWt

1111

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

l

APPLICATION OF DEA TO VOTING FOR THE BASEBALL HALL OF FAME 185

Our proposed “efficient” HOF results in 70current HOF members (35.4%), consisting of 55position players (38.7%) and 15 starting pitchers(26.8%), being replaced by other players.13 Thespecific players along with their replacements arelisted in Appendix A. As an illustration, considertwo players who have historically been the centerof much debate. According to our analysis, GilHodges, who is not currently in the HOF, shouldbe included. In particular, he should have received91.23% of the vote on the basis of his careerrecord, which is enough to merit inclusion in ourefficient HOF.

In contrast, Phil Rizzuto, who was selected by theVeterans’ committee in 1994, should not be included.According to our results, Rizzuto should only havereceived 58.70% of the vote based on his careerrecord, which is not enough to make the cut in ourefficient HOF. Of course, Rizzuto did not receiveenough actual votes to make the HOF either(suggesting that the baseball writers got it right), buthis many advocates have argued that his defensiveability, the fact that he missed several years duringhis prime to WWII and his intangible value to severalchampionship teams, were not captured by his careerstatistics, and thus, it was appropriate for the Veterans’Committee to rectify the error.

A similar argument obviously applies to JackieRobinson and Larry Doby, who both fail to make thecut for our efficient HOF based purely on their careerstatistics (Robinson’s efficient vote is 77.50%, and thatof Doby is 74.70%). However, in consideration of theiroverall contribution to game (and the fact that they weredeprived of several productive years by the ban onblack players), they clearly deserve enshrinement. Onthe other hand, Pete Rose easily makes the cut for ourHOF on the basis of his career statistics (which yieldan efficient vote of 93.34%), but his admitted gamblingon baseball has resulted in a lifetime ban from theHOF. The impact of admitted (or suspected) steroiduse among current and recently retired players castsa similar shadow on their HOF prospects, regardlessof their career statistics.

Finally, Table 7 shows the minimum percentageof votes that a player has to receive for inclusionin our efficient HOF. For position players, it isabout 84%, and for starting pitchers, it is about 81%,both significantly higher than the current thresholdof 75%. As discussed earlier, this reflects the con-sistent application of the higher standard for selec-tion that has emerged over time as the quality of playhas improved.

Copyright © 2012 John Wiley & Sons, Ltd.

7. CONCLUSION

This paper used DEA to examine the outcome ofvoting for the Baseball HOF. This approach interpretsa player’s career statistics as the inputs into aproduction process that yields the percentage of affir-mative votes for inclusion in the HOF as the output.A constructed frontier based on past voting defines aplayer’s “efficient vote” or the maximum percentageof votes that he should receive given his statisticalprofile. Using this frontier, we compared the actualmembership of the HOF to the membership that wouldhave emerged if the standards implicit in past votingbehavior had been efficiently applied, that is, if playershad received the maximum percentage of votes thattheir career records warranted. Our results indicatedthat approximately one-third of current members ofthe HOF should be replaced by other, more deservingplayers, holding total membership fixed.

Throughout the analysis, we noted several qualifi-cations of our approach. Most importantly, it ignoresthose aspects of a player’s career that are not wellreflected in his offensive or pitching statistics butwhich are legitimate factors in deciding on his admis-sibility. This deficiency is best epitomized, on onehand, by Jackie Robinson, whose record did not meetthe minimum standard based on our analysis, but whoclearly merits inclusion in the HOF on the basis of thetotality of his contribution to the game, and on theother hand, by Pete Rose, who easily made the cutbased on statistics alone, but whommany feel is justlybanned from inclusion because of his admittedgambling on baseball. Factors such as these suggestthat purely statistical measures will never fully reflecta player’s fitness for inclusion in, or exclusion from,the HOF. Thus, the debate will go on.

NOTES

1. To be entitled to vote, a writer must have been a mem-ber of the Baseball Writers Association of America(BBWAA) for at least 10 years. A player who has beenretired for at least 5 years remains on the ballot for15years. After that, he can still be selected by the Veterans’Committee after waiting 5more years. See James(1995, pp. 35–37) or the Hall of Fame website (http://baseballhall.org) for a complete list of selection criteria.

2. Other non-economic approaches that have recently beenused to address the HOF selection process includeartificial neural networks (Young et al., 2008), radialbasis function networks (Smith and Downey, 2009),and random forests (Freiman, 2010).

3. This manner of incorporating a negative input followsthe approach of Lewis and Sexton (2004).

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

T. J. MICELI AND B. D. VOLZ186

4. Although professional baseball existed before 1876,the National League, which was formed in 1876, istraditionally recognized as the first major league.

5. It is only in recent decades that relief pitchers (primarilyclosers) have emerged as specialists in professionalbaseball. Because the statistical criteria for evaluatingrelief pitchers is still unclear and because only a fewhave been inducted into the HOF, we chose to excludethem from the analysis.

6. Of the position players, 74 were elected by the BBWAAand 68 selected by the Veterans’ Committee. Of thestarting pitchers, 30 were elected by the BBWAA and26 chosen by the Veterans’ Committee.

7. Remember that players are eligible to receive votes for15 years. Thus, we use the maximum percentage that aplayer received either until he was elected (i.e., receivedat least 75% of votes cast), or was removed from eligibility.

8. See Findlay and Reid (2002, p. 101), who argue that thiscould be a positive input (valuable players have longercareers) or a negative input (a weaker player would takelonger to amass a given statistical portfolio).

APPENDIX

Changes to Hall of Fame Based on Efficient Votes

(DEA with full sample)In Out

osition player Seasons EVOTE% Position player Seasons EVOTE%

red McGriff 1986–2004 96.90 Rick Ferrell 1929–1947 34.03ndres Galarraga 1985–2004 96.54 Ernie Lombardi 1931–1947 37.23wight Evans 1972–1991 96.33 Lloyd Waner 1927–1945 42.60hili Davis 1981–1999 96.14 Ray Schalk 1912–1929 44.84oberto Alomar 1988–2004 95.90 Frank Chance 1898–1914 46.90ave Parker 1973–1991 95.80 Hughie Jennings 1891–1918 49.82ale Murphy 1976–1993 94.69 Nellie Fox 1947–1965 51.13im Raines 1979–2002 93.47 Roger Bresnahan 1897–1915 53.63ete Rose 1963–1986 93.34 Johnny Evers 1902–1929 54.03arold Baines 1980–2001 92.93 Ross Youngs 1917–1926 58.37usty Staub 1963–1985 92.88 Phil Rizzuto 1941–1956 58.70ill Dahlen 1891–1911 92.40 Dave Bancroft 1915–1930 59.12llis Burks 1987–2004 92.23 Billy Herman 1931–1947 60.92on Baylor 1970–1988 91.43 Joe Tinker 1902–1916 61.08ada Pinson 1958–1975 91.42 Bill Mazeroski 1956–1972 61.55ill Clark 1986–2000 91.29 Tommy McCarthy 1884–1896 62.09il Hodges 1943–1963 91.23 Richie Ashburn 1948–1962 62.56immy Ryan 1885–1903 91.04 Joe Sewell 1920–1933 63.98ob Johnson 1933–1945 90.89 Earle Combs 1924–1935 64.59aul O’Neill 1985–2001 90.89 Chick Hafey 1924–1937 66.96dgar Martinez 1987–2004 90.63 Buck Ewing 1880–1897 67.77arry Larkin 1986–2004 90.33 Elmer Flick 1898–1910 68.38teve Garvey 1969–1987 89.73 Red Schoendienst 1945–1963 70.84l Oliver 1968–1985 89.71 Freddie Lindstrom 1924–1936 70.90ickey Vernon 1939–1960 89.57 Billy Hamilton 1888–1901 71.24ose Canseco 1985–2001 89.50 Travis Jackson 1922–1936 72.24lan Trammell 1977–1996 89.39 Bobby Wallace 1894–1918 72.44eggie Smith 1966–1982 89.21 George Kelly 1915–1932 72.69en Boyer 1955–1969 89.20 Hack Wilson 1923–1934 72.99ou Whitaker 1977–1995 89.08 King Kelly 1878–1893 74.36ark Grace 1988–2003 88.69 Joe Gordon 1938–1950 74.54eorge Van Haltren 1887–1903 88.43 John Ward 1878–1894 74.64

P

FADCRDDTPHRBEDVWGJBPEBSAMJARKLMG

Copyright © 2012 John Wiley & Sons, Ltd.

9. Players were assigned to the position at which theyplayed the majority of games. Those (few) playerswho played a majority of their games at designatedhitter were assigned to the position category basedon where they played the next most games.

10. Tables including the detailed results of the analysis inthis and subsequent sections are available from theauthors on request.

11. On evolving standards of play, see Gould (2003, pp.151–172).

12. We also calculated the members of a hypothetical HOFwhen players were categorized by position and era.Because they are similar to the full sample case, we donot report them here. They are, however, available onrequest from the authors.

13. After the analysis of this paper was completed,Roberto Alomar and Bert Blyleven were elected tothe HOF. Note that Alomar is listed fifth in ourranking of position players who should be in, whereasBlyleven is listed first among pitchers who shouldbe in.

(Continues)

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

(Continued)

Fred Lynn 1974–1990 87.96 Larry Doby 1947–1959 74.70Dick Allen 1963–1977 87.85 Ralph Kiner 1946–1955 75.21Ron Santo 1960–1974 87.51 Frank Baker 1908–1922 75.32Keith Hernandez 1974–1990 86.95 Jimmy Collins 1895–1908 75.39Jose Cruz 1970–1988 86.84 George Kell 1943–1957 75.54Gary Matthews 1972–1987 86.65 Lou Boudreau 1938–1952 77.33Jack Clark 1975–1992 86.65 Gabby Hartnett 1922–1941 77.38Minnie Minoso 1949–1980 86.56 Jackie Robinson 1947–1956 77.50Cesar Cedeno 1970–1986 86.13 Luke Appling 1930–1950 78.80Joe Judge 1915–1934 85.89 Mickey Cochrane 1925–1937 79.50Joe Kuhel 1930–1947 85.67 Bill Dickey 1928–1946 80.16Bobby Bonilla 1986–2001 85.65 Sam Thompson 1885–1906 80.41Wally Joyner 1986–2001 85.59 Pie Traynor 1920–1937 80.93Cy Williams 1912–1930 85.53 Johnny Mize 1936–1953 82.09Amos Otis 1967–1984 85.32 Willie Stargell 1962–1982 82.24Sherry Magee 1904–1919 85.19 Joe Kelley 1891–1908 82.75George Foster 1969–1986 85.14 Arky Vaughan 1932–1948 82.79Bob Elliott 1939–1953 85.07 Pee Wee Reese 1940–1958 82.81Buddy Bell 1972–1989 85.05 Harmon Killebrew 1954–1975 82.92Bobby Bonds 1968–1981 84.98 Rabbit Maranville 1912–1935 82.94Cecil Cooper 1971–1987 84.79 Gary Carter 1974–1992 83.06Bobby Murcer 1965–1983 84.73 Tony Lazzeri 1926–1939 83.67Darrell Evans 1969–1989 83.71 Bill Terry 1923–1936 83.67

Pitcher Seasons EVOTE% Pitcher Seasons EVOTE%

Bert Blyleven 1970–1992 92.51 Addie Joss 1902–1910 63.26Jim Kaat 1959–1983 91.92 Jesse Haines 1918–1937 73.97Tommy John 1963–1989 91.57 Lefty Gomez 1930–1943 74.19Tony Mullane 1881–1894 89.19 Joe McGinnity 1899–1908 74.41Jim McCormick 1878–1887 88.18 Stan Coveleski 1912–1928 75.23Dennis Martinez 1976–1998 86.92 Rube Marquard 1908–1925 75.32Luis Tiant 1964–1982 86.59 Bob Lemon 1946–1958 78.61Charlie Buffinton 1882–1892 86.01 Rube Waddell 1897–1910 78.98Jack Morris 1977–1994 85.35 Dizzy Dean 1930–1947 79.17Bob Welch 1978–1994 83.97 Mordecai Brown 1903–1916 79.40Eddie Cicotte 1905–1920 82.30 Jack Chesbro 1899–1909 79.75Jack Quinn 1909–1933 82.23 Hal Newhouser 1939–1955 80.86Vida Blue 1969–1986 81.88 Don Drysdale 1956–1969 80.90Orel Hershiser 1983–2000 81.44 Herb Pennock 1912–1934 81.22Dave McNally 1962–1975 81.38 Dazzy Vance 1915–1935 81.35

In Out

Position player Seasons EVOTE% Position player Seasons EVOTE%

APPLICATION OF DEA TO VOTING FOR THE BASEBALL HALL OF FAME 187

REFERENCES

Anderson P, Peterson N. 1993. A procedure for rankingefficient units in data envelopment analysis. ManagementScience 39: 1261–1264.

Banker RD, Charnes A, Cooper WW. 1984. Some modelsfor estimating technical and scale inefficiencies in dataenvelopment analysis. Management Science 30: 1078–1092.

Desser A, Monks J, Robinson M. 1999. Baseball Hall ofFame voting: a test of the customer discrimination hypoth-esis. Social Science Quarterly 80: 591–603.

Einolf K. 2004. Is winning everything? A data envelop-ment analysis of major league baseball and the na-tional football league. Journal of Sports Economics5: 127–151.

Copyright © 2012 John Wiley & Sons, Ltd.

Findlay D, Reid C. 2002. A comparison of two voting mod-els to forecast election into the National Baseball Hall ofFame. Managerial and Decision Economics 23: 99–113.

Findlay D, Reid C. 1997. Voting behavior, discriminationand the National Baseball Hall of Fame. Economic Inquiry35: 562–578.

Freiman M. 2010. Using random forests and simulatedannealing to predict probabilities of election to the Base-ball Hall of Fame. Journal of Quantitative Analysis inSports 6, Article 12.

Gould S. 2003. Triumph and Tragedy in Mudville. NewYork: W. W. Norton & Co.

James B. 1995. Whatever Happened to the Hall of Fame?Baseball, Cooperstown, and the Politics of Glory. NewYork: Fireside.

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde

T. J. MICELI AND B. D. VOLZ188

Jewel R. 2003. Voting for the Baseball Hall of Fame: theeffect of race on election date. Industrial Relations 42:87–99.

Jewel R, Brown R, Miles S. 2002. Measuring discriminationin Major League Baseball: evidence from the BaseballHall of Fame. Applied Economics 34: 167–177.

Lewis H, Sexton T. 2004. Data envelopment analysis withreverse inputs and outputs. Journal of ProductivityAnalysis 21: 113–132.

Lewis H, Sexton T, Lock K. 2007. Player salaries,organizational efficiency, and competitiveness inmajor league baseball. Journal of Sports Economics8: 266–294.

Copyright © 2012 John Wiley & Sons, Ltd.

Ray S. 2004. Data Envelopment Analysis: Theory andTechniques for Economics and Operations Research.Cambridge, UK: Cambridge Univ. Press.

Smith L, Downey J. 2009. Predicting Baseball Hall of Famemembership using a radial basis function network. Journalof Quantitative Analysis in Sports 5, Article 6.

Volz B. 2009. Minority status and managerial survival inMajor League Baseball. Journal of Sports Economics10: 522–542.

Young W, Holland W, Weckman G. 2008 Determining Hallof Fame status for Major League Baseball using an artifi-cial neural network. Journal of Quantitative Analysis inSports 4, Article 4.

Manage. Decis. Econ. 33: 177–188 (2012)DOI: 10.1002/mde