How Nate Silver Did It | Michael Lieberman


Reprinted from the January 28, 2013 e-newsletter. © 2013 Quirk's Marketing Research Review (www.quirks.com). This document is for Web posting and electronic distribution only. Any editing or alteration is a violation of copyright.

To purchase paper reprints of this article, please contact Quirk's Editor Joe Rydholm at 651-379-6200 x204 or at [email protected].

How Nate Silver Did It | Michael Lieberman

During the 2012 election, Nate Silver drew fire for his projections. Joe Scarborough, the conservative host of Morning Joe on MSNBC, attacked Silver during the election and Politico.com called him a "one-term celebrity," saying, "For all the confidence Silver puts in his predictions, he often gives the impression of hedging." (Later, Silver replied that Politico covers politics like sports "but not in an intelligent way at all.")

But in the end, Silver beat them all. (And Scarborough eventually apologized, sort of, acknowledging that Silver did get it right.)

For those who don't know, Nate Silver writes the FiveThirtyEight blog in The New York Times and is the bestselling author of The Signal and the Noise. In the book, Silver examines the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data. It is about prediction, probability and why most predictions fail - most, but not all.

What had folks attacking Silver was this: He predicted elections far more accurately than most pollsters and pundits on Politico, The Drudge Report, MSNBC and others. In his book, Silver described his model as bringing Moneyball to politics. That is, producing statistically-driven results. Silver popularized the use of probabilistic models in predicting elections by producing the probability of a range of outcomes, rather than just who wins. When a candidate is at, say, a 90 percent chance of winning, Silver will call the race. What made Silver famous was his extremely accurate prediction of voter percentages - an area where pundits are almost always far off the mark. And loath as pollsters may be to admit it, polls are almost always wrong too. However, the average of polls is always more accurate. And a systematic probability model of the average of polls is almost always right. Think of it as political crowdsourcing.

One of the best models

Silver has built one of the best models out there. It's accurate, consistent and totally statistical. One advantage of being totally statistical is that his model can be replicated. This article will review the process and explain how Silver built his model, what his results mean and how to use them going forward.

The basics

To run Silver's model, you will need Microsoft Excel; a source of campaign finance and election data; and historical data to set polling weights.

The first step is to calculate the poll weight, which is a measure of how much an individual poll counts when averaged with other polls. The poll weight consists of three values:

Recency: The older a poll, the lower its accuracy. A recency factor is calculated using a relatively simple decay function. Think of a poll as having a shelf life - the longer on the shelf, the less potent the poll is.

Sample size: When shown on television, a poll might have a spread of +/- 4 percent. This spread is calculated using sample size. As a general rule, the larger the sample size, the more accurate the poll.

Pollster rating: Silver alludes to how he does this in a 2010 blog post. He does not, however, completely reveal his secret


sauce. Without going into too much statistical detail, Silver uses historical data and regression analysis to create an accuracy measure for pollsters. Better pollsters have positive ratings; worse have negative ratings.
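Silver publishes neither his decay constant, his sample-size formula nor his rating-to-weight mapping, so the three components can only be sketched. In the sketch below, the 30-day half-life, the margin-of-error proxy for sample size and the exponential rating map are all illustrative assumptions, not Silver's actual values:

```python
import math

def recency_weight(age_days, half_life_days=30.0):
    """Exponential decay: a poll loses half its weight every half_life_days.
    The 30-day half-life is an assumption for illustration."""
    return 0.5 ** (age_days / half_life_days)

def sample_size_weight(n):
    """Weight a poll by the inverse of its margin of error at 95% confidence.
    For a proportion near 50%, MoE is roughly 0.98 / sqrt(n), which gives
    about +/- 4 points for n around 600."""
    margin_of_error = 0.98 / math.sqrt(n)
    return 1.0 / margin_of_error

def pollster_weight(rating):
    """Map a pollster rating (positive = better than average, negative =
    worse) to a multiplicative weight; the mapping is an assumption."""
    return math.exp(rating)

# A 10-day-old poll of 600 respondents from an average pollster (rating 0):
w = recency_weight(10) * sample_size_weight(600) * pollster_weight(0.0)
```

Any monotone decay and any rating scale centered on zero would serve the same purpose; the point is only that older, smaller and lower-rated polls count for less.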

After that information is created, the next step is to create a weighted polling average. That is, take the weighted mean of the polls within a state using the three weights described above. For smaller races, like congressional or state races, polling data might be scarce, particularly in uncontested races. However, presidential contests, as we know, offer a deluge of data to be plugged in. Silver does not say exactly how he combines the weights. I multiply them and then weight the polls.
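Following that approach of multiplying the three weights, a weighted polling average might look like the sketch below; the poll numbers are made up for illustration:

```python
def weighted_polling_average(polls):
    """Each poll is (result_pct, recency_w, sample_w, pollster_w).
    The combined weight is the product of the three component weights,
    and the state average is the weighted mean of the poll results."""
    total_weight = sum(r * s * p for _, r, s, p in polls)
    weighted_sum = sum(x * r * s * p for x, r, s, p in polls)
    return weighted_sum / total_weight

# Three hypothetical polls for one candidate in one state:
polls = [(49.0, 1.0, 1.2, 1.0),
         (51.0, 0.7, 1.0, 1.1),
         (48.0, 0.5, 0.8, 0.9)]
avg = weighted_polling_average(polls)  # lands between 48 and 51
```

The fresher, larger, better-rated first poll pulls the average toward its 49 percent result.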

Error

A weighted polling average, like all averages, contains an error and a weighted mean. The weighted mean is the exact result - the one number that pops out of the calculation. Error is the average distance of each data point to the weighted mean. In creating a polling prediction, we utilize the error around the weighted mean. The smaller the average distance around the weighted mean - the error - the more accurate the poll.
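Under that definition, error can be computed as the weighted average distance of each poll from the weighted mean. This is only a sketch of the idea; Silver's exact error model is not published:

```python
def weighted_error(results, weights):
    """Return (weighted mean, weighted average absolute distance to it)."""
    total = sum(weights)
    mean = sum(x * w for x, w in zip(results, weights)) / total
    error = sum(abs(x - mean) * w for x, w in zip(results, weights)) / total
    return mean, error

# Scattered polls produce a larger error than polls that agree closely:
_, err_scattered = weighted_error([44.0, 52.0, 49.0], [1.0, 1.0, 1.0])
_, err_tight = weighted_error([49.0, 50.0, 49.5], [1.0, 1.0, 1.0])
```

When every poll says the same thing, the error collapses to zero; when polls disagree, the error widens, exactly as the checklist below describes.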

When examining what Silver considers important in interpreting error, we get a good snapshot of what makes a poll accurate and what makes a poll less accurate:

Error is higher in races with fewer polls.
Error is higher in races where the polls disagree with each other.
Error is higher in races with a large number of undecided voters.
Error is higher when the margin between the two candidates is lopsided.
Error is higher the more days prior to Election Day the poll is conducted.

The presidential simulation

Silver predicts a lot of races: U.S. House, U.S. Senate and state governorships. The mother of all elections is, of course, the presidential.

If I were going to construct Silver's model for the presidential election, I would set up 51 state worksheets in Excel. Each state worksheet would contain the polling data and weights for a state. I would configure the 51 worksheets so each poll has its result, its weight and its error. For one run of a simulation, each poll would have one value, producing one weighted average for the state. The winner would then be declared. Excel would assign the electoral votes for that state. The front worksheet of my Nate Silver model would show all 51 states, tally who gets more than 270 electoral votes and predict the winner.

However, if you run the simulation, say, 10 million times, each poll has results that bounce around within its error, spitting out 10 million possible outcomes. When arrayed in a cumulative chart, all possible results are shown.
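A minimal version of that simulation loop might look like the sketch below. The state averages, errors and electoral-vote counts are invented for illustration (three states instead of 51, far fewer than 10 million runs), and drawing each state's result from a normal distribution around its weighted average is an assumption about the "bounce around within its error" step:

```python
import random

# Hypothetical weighted averages (candidate A's share of the two-party
# vote), errors and electoral votes for three illustrative states.
states = {
    "Safe A":  {"avg": 58.0, "error": 2.0, "ev": 20},
    "Swing 1": {"avg": 50.5, "error": 2.5, "ev": 18},
    "Swing 2": {"avg": 49.5, "error": 3.0, "ev": 29},
}
TOTAL_EV = sum(s["ev"] for s in states.values())

def simulate_once(rng):
    """One run: each state's result bounces around within its error,
    and the winner of each state collects that state's electoral votes."""
    ev_for_a = 0
    for s in states.values():
        outcome = rng.gauss(s["avg"], s["error"])
        if outcome > 50.0:
            ev_for_a += s["ev"]
    return ev_for_a

def win_probability(runs=100_000, seed=42):
    """Fraction of runs in which candidate A wins an electoral majority."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(runs)
               if simulate_once(rng) > TOTAL_EV / 2)
    return wins / runs

p = win_probability()  # fraction of simulations candidate A wins
```

The safe state almost never flips, while the two swing states decide most runs; the reported probability is simply the share of simulated elections the candidate wins.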

Exactly what Silver meant

One week before the 2012 presidential election, Silver reported that President Obama had a 73 percent chance of being reelected. Of course, the prediction caused howls from Fox News. But while they bayed in protest, none explained exactly what Silver meant.

Silver ran his model eight days before the election. As I stated earlier, polls become more accurate closer to Election Day. Let's say that Silver ran his model 10 million times (with a new laptop this would take, oh, about four minutes). With states such as New York, California, Texas or Georgia, the outcome was never in doubt. But in swing states such as Virginia, Florida and particularly Ohio, the polls were too close to call. The winners may change for different iterations. If one runs all possible iterations and combinations (and I would say that 10 million would probably cover it), then one can say how many times each side triumphs.

When Silver ran his models with the most current polls, 7.3 million times President Obama came out with more than 270 electoral votes; Mitt Romney won 2.7 million times. Thus, pronounced Silver, President Obama had a 73 percent chance of winning because he won 73 percent of the 10 million simulations.

Predicting the actual vote percentages is a little more difficult. However, when one has as much data as Silver and the ability to run the simulations millions of times, the simulated vote count will converge to the real number, much like crowdsourced guesses often converge to the result.

Practical uses

Practical uses of Silver's model are abundant and not solely on a presidential level. For example, if someone is working for a campaign in which the candidate is leading in the polls by 48 percent to 46 percent - a margin that in reality is a statistical tie - a month or two before Election Day, how likely is that candidate to actually win? And if the candidate is behind by five points with one month to go, how much ground does the campaign really need to make up?

A prediction model can answer these questions. If one candidate is leading by five points one month prior to Election Day in that or similar districts, 80 percent of the time s/he wins. This can be arrived at by looking at historical data or by plugging in all the current polls and financial data and running the simulation 10 million times.

Opinions and predictions

Political pundits like Dick Morris, Rush Limbaugh and Matt Drudge are paid to fill air time and give their opinions. Their opinions and predictions are almost always wrong. By contrast, Silver scientifically boils down real data and makes accurate predictions. The coming of age of probabilistic models in mainstream political modeling was brought about by Nate Silver and it is here to stay. It's called math.

Michael Lieberman is founder and president of Multivariate Solutions, a New York research firm. He can be reached at 646-257-3794 or at [email protected]. This article appeared in the January 28, 2013, edition of Quirk's e-newsletter.