a winning strategy for lotto games?


The Canadian Journal of Statistics Vol. 18, No. 3, 1990, Pages 233-244 La Revue Canadienne de Statistique


A winning strategy for lotto games? Harry JOE

The University of British Columbia

Key words and phrases: lotto games, distribution of k-tuples, probabilistic modelling. AMS 1985 subject classifications: 60C05, 62P99, 62F99

ABSTRACT

In lotto games, the distribution of k-tuples chosen by participants is not uniform, but the chance of any k-tuple being the winner is the same. The winning categories consist of matching exactly k - i numbers from the winning k-tuple for i = 0, 1, . . . , m for some m. The total prize pool for a category is divided equally among all the winning tickets in the category. Therefore the strategy of buying a ticket with a k-tuple consisting of unpopular numbers will increase the expected amount of the prize if this k-tuple is a winner in some category, because the prize pool is shared among fewer tickets. By modelling the distribution of 6-tuples chosen by participants of Lotto 6/49 in Canada, the expected return and standard deviation of return can be computed. It is shown that the expected return can be more than the amount spent when the carryover is large, but the large standard deviation means that it would take tens of thousands of years to millions of years for the strategy to have a high probability of yielding a profit.

RÉSUMÉ

Dans les jeux de loterie populaires, la distribution des k-uples choisis par les participants n'est pas uniforme. Toutefois, tous les k-uples ont la même probabilité d'être gagnants. Les catégories gagnantes sont constituées des k-uples ayant exactement k - i nombres identiques aux éléments du k-uple gagnant, i = 0, 1, ..., m, pour un certain m. Le prix total alloué à une catégorie est divisé également parmi les billets gagnants dans cette catégorie. Il s'ensuit que de choisir un k-uple formé de nombres peu populaires va accroître le gain espéré si ce k-uple est gagnant dans une catégorie, car alors le lot sera partagé entre moins de billets. En modélisant la loi des 6-uples choisis par les participants à la lotto 6/49, on peut calculer l'espérance et l'écart-type du gain. On montre que le gain espéré peut être plus grand que la mise lorsque le lot reporté est grand. Toutefois, comme l'écart-type est grand, il pourrait prendre entre dix mille et des millions d'années avant que la stratégie consistant à miser sur des numéros impopulaires ait de bonnes chances de porter fruit.

1. INTRODUCTION.

Lotto games, where participants buy a ticket by choosing k numbers out of N and win prizes if they match at least k - m numbers from the winning k-tuple, have become increasingly popular over the past few years. They exist now in Canada, in many states of the U.S., and in European countries. For example, Canada’s Lotto 6/49 (with k = 6, N = 49, and m = 3) has existed since 1982. For the history and other aspects of lotto games, see Ziemba et al. (1986).

In this paper, a k-tuple is an unordered set of k numbers, that is, a combination of k numbers (from N). We let the expression (k - i)/k denote the event of matching exactly k - i numbers of the winning k-tuple, and let (k - i)+/k or (k - i)/k+ denote the event of matching exactly k - i numbers of the winning k-tuple plus the bonus number (which is different from the numbers in the winning k-tuple). In Canada, there is a weekly called Luck which includes (1) summaries of winning numbers and amounts of (k - i)/k prizes, i = 0, ..., m, in the past 52 weeks, (2) univariate histograms for the frequencies of the winning numbers from 1 to N since the beginning of the lotto games, and (3) univariate histograms of the frequencies of numbers chosen by participants during the week. There is no direct information on how combinations are chosen, but a little information about this can be obtained from the amounts of the prizes (or equivalently the numbers of winners in each category, since the prize pool for a category is shared among the winners in the category) as functions of the winning k-tuple. The univariate histograms for the participants maintain a similar pattern that has not changed much over time: 7 and a few other numbers are popular numbers, and multiples of 10 and the larger numbers are unpopular numbers. Goodness-of-fit tests show that deviations of the univariate histogram for the winning numbers from a uniform distribution can be explained by chance variation. Since one will assume that all k-tuples are equally likely to be the winning k-tuple unless there is strong evidence to the contrary, we assume throughout the paper that the probability of winning a category (k - i)/k is the same for any k-tuple. However, the expected amount of the prize is larger for a k-tuple which is much less frequently chosen by participants and is such that other k-tuples sharing many common numbers with it are also much less frequently chosen, since in this case the prize pool for each winning category is shared among fewer winners.

The main contribution of this paper is the technique for computing expected returns given a model, rather than the development of a statistical methodology. In Section 3, we compute expected returns and standard deviations of return for different choices of k-tuples, and in Section 4, we analyze the “picking unpopular numbers” strategy for Lotto 6/49 using the rules for Canada. Probabilistic models for distributions of k-tuples, where parameters are estimated from the univariate marginal frequencies in Canada, are used for the calculations, and a sensitivity analysis is performed. The probability distributions, defined in Section 2, are those which satisfy the univariate margin constraints and are closest under different distance measures to a uniform distribution on the 6-tuples [see Joe (1987) on derivation of these via the concept of majorization]. This should be all right as a first approximation, since one might expect the distribution of 6-tuples chosen by participants to be close to a uniform distribution. Despite not having complete data, the adequacy of the models can be partly assessed (see Section 3.1). A consequence of these models is that the most popular 6-tuple is that consisting of the 6 individually most popular numbers and the least popular 6-tuples include that consisting of the 6 individually least popular numbers. For improved models, which would need more data than that available to fit, the most unpopular 6-tuple may not consist of the 6 individually most unpopular numbers. However, because of the wide range of expected returns provided by the class of models studied here, the conclusions should extend qualitatively and to some extent quantitatively to more general models.

The expected return (see Table 3) for a dollar ticket of the “picking unpopular numbers” strategy can be more than a dollar, especially if the carryover of the jackpot for 6/6 is sufficiently large. The carryover becomes large if there is no 6/6 winner in several consecutive games. However, there are only a few especially advantageous games per year (based on expected return), and the standard deviation of the return is so large that it would take millions of years for one to have a high probability of coming out ahead using the “picking unpopular numbers” strategy. The time can go down to tens of thousands of years if thousands of unpopular 6-tuples are chosen in advantageous games (see Tables 4 and 5) .


2. MODELS

Let $S = \{s\}$ be the set of (unordered) k-tuples from the integers 1 to N inclusive. Let $p_s$, $s \in S$, be a probability distribution over the k-tuples. Let $S_i$ be the set of k-tuples that include i. The univariate margin is $m_i$, i = 1, ..., N, where $m_i = \sum_{s \in S_i} p_s$ (note that $\sum_i m_i = k$). Distance measures from the uniform distribution include those of the form $\sum_s \psi(p_s)$, where $\psi$ is strictly convex [see Joe (1987) for the relation to a majorization ordering]. A convenient class of convex functions which lead to a class of models is $\psi_\alpha(u) = (u^{1+\alpha} - u)/\alpha$, $\alpha \ge 0$, where the limit $\psi_0(u) = u \log u$ is obtained with $\alpha = 0$. If $\sum_s \psi_\alpha(p_s)$ is minimized subject to the constraints $\sum_{s \in S_i} p_s = m_i$, i = 1, ..., N, and $p_s \ge 0$ for all $s \in S$, then the model

$$p_s = \Bigl(\sum_{i \in s} \theta_i\Bigr)_+^{1/\alpha} \tag{2.1}$$

is obtained for $\alpha > 0$, and the maximum-entropy model

$$p_s = \prod_{i \in s} \theta_i \tag{2.2}$$

of Stern and Cover (1989) is obtained with $\alpha = 0$, where in (2.1) $(y)_+ = \max\{0, y\}$, and in (2.2) the $\theta_i$ are all positive. Note that $\alpha = 0$ leads to a multiplicative model and $\alpha = 1$ leads to an additive model. Using majorization theory, Joe (1987) showed that the $\theta_i$ in both (2.1) and (2.2) are ordered similarly to the $m_i$. Given data $m_i$, the estimates of the parameters $\theta_i$ can be obtained using the Newton-Raphson method. For $\alpha = 0$, this is generally faster than the iterative scaling method given in Stern and Cover (1989).
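To make the fitting step concrete, the following is a minimal sketch of a cyclic iterative-scaling fit of the multiplicative (maximum-entropy) model on a toy problem. It is not the Newton-Raphson procedure used in the paper, the function name `fit_max_entropy` and the margins for N = 6, k = 3 are hypothetical, and the brute-force listing of k-tuples is only practical for small N.

```python
from itertools import combinations
from math import prod

def fit_max_entropy(margins, k, sweeps=2000, tol=1e-12):
    """Fit p_s = prod_{i in s} theta_i to the given univariate margins.

    Toy cyclic iterative scaling (an assumption of this sketch, not the
    paper's Newton-Raphson fit); every k-tuple is listed explicitly.
    """
    n = len(margins)
    tuples = list(combinations(range(n), k))
    # Start from the uniform distribution: theta_i = (1 / #tuples)^(1/k).
    theta = [(1.0 / len(tuples)) ** (1.0 / k)] * n
    for _ in range(sweeps):
        worst = 0.0
        for i in range(n):
            current = sum(prod(theta[j] for j in s) for s in tuples if i in s)
            worst = max(worst, abs(current - margins[i]))
            # Margin i is linear in theta_i, so this rescaling makes it exact.
            theta[i] *= margins[i] / current
        if worst < tol:
            break
    probs = {s: prod(theta[j] for j in s) for s in tuples}
    return theta, probs

# Hypothetical margins for N = 6, k = 3; they must sum to k.
margins = [0.62, 0.55, 0.50, 0.48, 0.45, 0.40]
theta, probs = fit_max_entropy(margins, k=3)
fitted = [sum(p for s, p in probs.items() if i in s) for i in range(6)]
```

In this toy fit the estimated `theta` values come out in the same order as the margins, in line with the ordering property attributed to Joe (1987) above.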

For Lotto 6/49 in Canada, histograms of the frequencies $m_i$ are given weekly (two draws per week since May 1986) in Luck. The histograms have maintained a similar pattern over the past 6 years. In fitting the models (2.1) and (2.2), the frequencies for the 6 July 1985 game are used, since these are given in Ziemba et al. (1986). The frequencies and the estimated $\theta_i$ for $\alpha$ = 0, 0.5, 1 are given in Table 1. For the model with $\alpha = 1$, some $\theta_i$ are negative and some of the model probabilities are zero, so that it is not a suitable model for the distribution of 6-tuples over a long period of time. However, it is used here to cover a range of possibilities; as $\alpha$ increases from 0, the expected return of the 6-tuple with the 6 most unpopular numbers increases. In Section 3.1, the goodness of fit of the models is assessed with the data that are available in Luck and Ziemba et al. (1986).

3. EXPECTED NUMBER OF WINNERS, AMOUNTS OF PRIZES, AND RETURN

In this section, we check the adequacy of the models in Section 2 and give a procedure for calculating our expected return and standard deviation of return when we have bought one ticket (or more) and the distribution of k-tuples for other people is (2.1) or (2.2).

TABLE 1: Values of $\theta_i$ for the models (2.1) and (2.2).

  i   Frequency   10 θ_i     10^4 θ_i    10^7 θ_i
                  (α = 0)    (α = 0.5)   (α = 1)
  1    0.1133     0.5904     0.3301       0.0596
  2    0.1208     0.6339     0.4246       0.1085
  3    0.1403     0.7501     0.6586       0.2356
  4    0.1285     0.6793     0.5189       0.1587
  5    0.1410     0.7544     0.6667       0.2402
  6    0.1367     0.7283     0.6166       0.2121
  7    0.1713     0.9445     1.0019       0.4377
  8    0.1246     0.6562     0.4715       0.1333
  9    0.1310     0.6941     0.5490       0.1750
 10    0.1063     0.5503     0.2394       0.0139
 11    0.1373     0.7320     0.6236       0.2161
 12    0.1335     0.7091     0.5788       0.1913
 13    0.1346     0.7157     0.5918       0.1985
 14    0.1321     0.7007     0.5621       0.1822
 15    0.1246     0.6562     0.4715       0.1333
 16    0.1335     0.7091     0.5788       0.1913
 17    0.1296     0.6858     0.5322       0.1659
 18    0.1197     0.6275     0.4109       0.1013
 19    0.1172     0.6129     0.3796       0.0850
 20    0.1033     0.5333     0.1997      -0.0057
 21    0.1246     0.6562     0.4715       0.1333
 22    0.1260     0.6645     0.4886       0.1424
 23    0.1355     0.7211     0.6024       0.2043
 24    0.1398     0.7471     0.6528       0.2324
 25    0.1505     0.8127     0.7751       0.3021
 26    0.1271     0.6710     0.5020       0.1496
 27    0.1466     0.7886     0.7310       0.2767
 28    0.1095     0.5685     0.2812       0.0348
 29    0.1083     0.5617     0.2656       0.0270
 30    0.0970     0.4979     0.1146      -0.0468
 31    0.1285     0.6793     0.5189       0.1587
 32    0.1095     0.5685     0.2812       0.0348
 33    0.1240     0.6527     0.4641       0.1293
 34    0.1183     0.6193     0.3934       0.0922
 35    0.1346     0.7157     0.5918       0.1985
 36    0.1310     0.6941     0.5490       0.1750
 37    0.1165     0.6089     0.3708       0.0804
 38    0.1058     0.5474     0.2328       0.0106
 39    0.0938     0.4800     0.0704      -0.0677
 40    0.0931     0.4761     0.0607      -0.0722
 41    0.1020     0.5259     0.1823      -0.0142
 42    0.1027     0.5299     0.1917      -0.0096
 43    0.1246     0.6562     0.4715       0.1333
 44    0.1317     0.6983     0.5573       0.1795
 45    0.1108     0.5760     0.2980       0.0433
 46    0.1033     0.5333     0.1997      -0.0057
 47    0.1070     0.5543     0.2486       0.0185
 48    0.1076     0.5577     0.2564       0.0224
 49    0.1108     0.5760     0.2980       0.0433

For Lotto 6/49 in Canada, each ticket or 6-tuple chosen costs one dollar, and 45% of ticket sales go into the prize money. Of this, each 3/6 winner gets $10, so that the remaining pool is $0.45M - 10Y_3$, where M is the number of tickets sold in a game and $Y_3$ is the number of 3/6 winners. This remainder is divided into the 4/6, 5/6, 5/6+, and 6/6 categories, with percentages 25%, 13%, 17%, and 45% respectively if the 6/6 jackpot is less than $7 million, and 25%, 13%, 47%, and 15% respectively otherwise. The prize pool for each of these four categories is then divided evenly among the winners in the category, and the prize tends to be bigger if there are fewer winners in the category. The 5/6+ winning 6-tuple is one which has exactly five numbers from the winning 6-tuple and the sixth number matching a (seventh) bonus number. The carryover for a category is the amount which is carried over to the next game if there is no winner in the category. The 6/6 jackpot has carryovers (i.e., no winners) in about half the games, and the 5/6+ jackpot has an occasional carryover, but not 4/6 or 5/6. The average number of tickets sold per game has increased over the past few years; the number of tickets sold with no carryover is typically over 12 million and becomes much larger as the 6/6 carryover grows (see Table 3 for some typical values). Note that there are 13,983,816 distinct 6-tuples in Lotto 6/49.
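The division of the prize money described above can be sketched as follows. The function name `category_pools` and the boolean `large_jackpot` are hypothetical, carryovers are ignored, and the example figures are the ticket sales and 3/6 winners listed for case 1 of Table 2.

```python
from math import comb

# Number of distinct 6-tuples in Lotto 6/49, as quoted in the text.
N_TUPLES = comb(49, 6)

# Shares of the remaining pool for the 4/6, 5/6, 5/6+ and 6/6 categories,
# copied from the rules as stated in the text.
SHARES_SMALL_JACKPOT = {"4/6": 0.25, "5/6": 0.13, "5/6+": 0.17, "6/6": 0.45}
SHARES_LARGE_JACKPOT = {"4/6": 0.25, "5/6": 0.13, "5/6+": 0.47, "6/6": 0.15}

def category_pools(tickets_sold, winners_3of6, large_jackpot=False):
    """Split 45% of sales, after paying $10 per 3/6 winner, by category.

    Carryovers are ignored; `large_jackpot` is a hypothetical flag for the
    case where the 6/6 jackpot reaches $7 million.
    """
    remainder = 0.45 * tickets_sold - 10 * winners_3of6
    shares = SHARES_LARGE_JACKPOT if large_jackpot else SHARES_SMALL_JACKPOT
    return {cat: share * remainder for cat, share in shares.items()}

# Case 1 of Table 2: 13,864,879 tickets sold and 299,230 winners of 3/6.
pools = category_pools(13_864_879, 299_230)
```

Each pool is then shared equally among the winners in its category, which is why the prize per ticket grows when a winning 6-tuple is unpopular.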

Given a model $p_s$ for the distribution of 6-tuples, for each (winning) 6-tuple plus bonus number, one can compute in a straightforward manner the expected proportion of 3/6, 4/6, 5/6, 5/6+, and 6/6 winners from the theory in Sections 3.3 and 3.4.

3.1. Goodness of Fit of Models

Since only partial information on the distribution of 6-tuples is given for each game, the models (2.1) and (2.2) cannot be checked directly. However, we can still partly assess the models and, in particular, we can assess how good the models are for studying strategies for choosing numbers. These models will predict more winners in each category if the winning 6-tuple has more popular numbers and fewer winners if it has several unpopular numbers; this property is clear from the data published by Luck, since the prizes are bigger when there are more unpopular numbers in the winning 6-tuple [see Ziemba et al. (1986) for some analyses].

For making inferences and studying strategies for choosing numbers, the important thing is that the numbers of 3/6, 4/6, 5/6, 5/6+, and 6/6 winners should be predicted well. For the period June 1982 to January 1986, Ziemba et al. have listed the number of winners in each category and the amounts of the prizes. Using the rule for dividing the prizes, the number of tickets that were sold can be calculated. Hence from the expected proportions for a model, the expected number of winners for each category can be determined for each winning 6-tuple, and these can be compared with the actual number of winners. Table 2 lists, for 10 different games, the actual number of winners by category and the expected number of winners for the models in Section 2 with $\alpha = 0$ and $\alpha = 1$ and also with the uniform model. The games were chosen so that some had several of the more popular numbers and others had several of the more unpopular numbers. The games are inversely ordered by the average of the univariate proportions in Table 1 corresponding to the six numbers [see last column in Table 2(a)]. Note that the average of the 49 univariate proportions is 6/49 = 0.1224. Clearly the uniform model can do poorly, especially if the 6-tuple contains several popular numbers or several unpopular numbers; in the more popular cases, 1 to 4, the expected numbers of winners in the 3/6, 4/6, 5/6 categories are generally too small, sometimes by quite a bit, and in the more unpopular cases, 8 to 10, the expected numbers are generally too large. The expected numbers for the uniform model are much further away from the actual numbers of winners for cases 4, 9, and 10, and there is no case for which the $\alpha = 0$ and $\alpha = 1$ models do this badly. These other two models are comparable in predictions, with each being slightly better in about half of these games (the model $\alpha = 0$ is better for cases 3, 4, 5, 7, 9, the model $\alpha = 1$ is better for cases 1, 2, 6, 8, and case 10 is about a tie). A winning 6-tuple with all unpopular numbers would be needed to discriminate well between these models (see case 11 in Table 2).

TABLE 2

(a) Dates and winning numbers for some games

Case   Date     Winning numbers      Bonus   No. of tickets sold   Avg. prop.
  1    100585    4  6 22 27 31 33     45        13,864,879           0.1317
  2    110285   11 19 25 26 36 45      6        16,081,192           0.1290
  3    022385    1  2  7  9 20 43     33        16,967,328           0.1274
  4    031685    3  6 18 21 33 34     49        17,125,020           0.1273
  5    070685    5 24 26 36 41 45     35        17,049,302           0.1253
  6    042785    4 14 27 37 40 44     46        25,204,466           0.1248
  7    041385    7 20 24 46 48 49     41        15,983,226           0.1227
  8    111385    5  9 23 28 40 46     49         9,624,121           0.1189
  9    112085   31 32 36 38 39 44     20         9,406,291           0.1169
 10    012685    6 30 33 39 41 49     47        21,197,166           0.1107
 11    -        20 30 39 40 41 42     46        12,000,000           0.0987

(b) Actual and expected numbers of winners for games in (a)

Case   Model^a   3/6         4/6       5/6      5/6+    6/6
  1    a         299,230     16876     306      11      0
       0         301,200^b   18090^b   369^b    7.9     1.6
       1         298,100     17650^b   353^b    7.8     1.5
       u         244,700^b   13430^b   250^b    6.0     0.9
  2    a         299,917     16149     279      8       0
       0         328,100^b   19100^b   375^b    10.2    1.6
       1         327,500^b   19030^b   373^b    9.9     1.6
       u         283,800^b   15580^b   290      6.9     1.1
  3    a         315,339     17843     440      8       1
       0         331,600^b   18810^b   359^b    8.7     1.5
       1         334,300^b   19190^b   372^b    9.0     1.5
       u         299,500^b   16430^b   306^b    7.3     1.2
  4    a         341,765     20384     406      11      5
       0         336,900^b   19350^b   377      8.0     1.6^b
       1         336,600^b   19300^b   375      8.2     1.5^b
       u         302,300^b   16590^b   309^b    7.4     1.2^b
  5    a         314,971     17079     274      8       3
       0         319,500^b   17880^b   337^b    9.0     1.4
       1         321,000^b   18100^b   345^b    9.1     1.4
       u         300,900^b   16510^b   307      7.3     1.2
  6    a         401,639     26516     495      6       2
       0         465,700^b   25840^b   485      9.5     1.9
       1         469,000^b   26320     502      10.1    2.0
       u         444,900^b   24410^b   454      10.8    1.8
  7    a         248,499     11327     150      3       0
       0         278,800^b   14960^b   271^b    5.2     1.0
       1         283,700^b   15610^b   292^b    5.7     1.2
       u         282,100^b   15480^b   288^b    6.9     1.1
  8    a         162,011     8744      184      3       1
       0         154,500^b   8045^b    142^b    3.0     0.5
       1         155,700^b   8200^b    147^b    3.0     0.6
       u         169,900^b   9321^b    173      4.1     0.7
  9    a         128,432     6169      139      1       1
       0         143,500^b   7332^b    127      2.5     0.5
       1         143,600^b   7340^b    127      2.3     0.5
       u         166,000^b   9111^b    170^b    4.0     0.7
 10    a         280,480     12831     189      7       1
       0         277,200^b   13230^b   214      4.3     0.7
       1         270,900^b   12370^b   185      2.9^b   0.5
       u         374,100^b   20530^b   382^b    9.1     1.5
 11    a         ?           ?         ?        ?       ?
       0         112,900     4730      67.4     1.3     0.2
       1         93,300      2368      6.4      0.0     0.0

^a "a" stands for actual, "0" for the model with α = 0, "1" for the model with α = 1, and "u" for uniform.
^b The absolute difference between the expected and actual numbers of winners is more than 2 times the square root of the expected number. Twice the square root is from 800 to 1400 for 3/6, from 170 to 320 for 4/6, from 20 to 44 for 5/6.

Since the number of winners is the realization of a binomial random variable with small probability of success, the (standard) deviation from chance alone should be of the order of the square root of the number of winners. For the 3/6 and 4/6 categories and sometimes the 5/6 category, the deviation of the model expectations from the actual number of winners is more than the chance variation. The cases with the deviation more than twice the square root of the expected number are indicated with a footnote in Table 2(b). The models can definitely be improved on (with more data needed), but can be considered as a first approximation, since they are better than the uniform model. The model expectations are not systematically biased [they are sometimes too large and sometimes too small, even when stratified by the "popularity", as measured by the last column of Table 2(a)]. Therefore we shall proceed to use the models and to get a range of possibilities for expected returns; also this serves as a sensitivity analysis over a class of models.
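The uniform-model expectations and the footnote rule of Table 2 can be checked with a few lines of code. This is a sketch only: the function names are hypothetical, and the inputs are the ticket sales for case 1 and a few entries of Table 2(b).

```python
from math import comb, sqrt

TOTAL = comb(49, 6)

def uniform_expected(tickets_sold):
    """Expected winners per category if all 6-tuples were equally popular."""
    return {
        "3/6": tickets_sold * comb(6, 3) * comb(43, 3) / TOTAL,
        "4/6": tickets_sold * comb(6, 4) * comb(43, 2) / TOTAL,
        "5/6": tickets_sold * comb(6, 5) * 42 / TOTAL,  # sixth number is not the bonus
        "5/6+": tickets_sold * comb(6, 5) * 1 / TOTAL,  # sixth number is the bonus
        "6/6": tickets_sold / TOTAL,
    }

def flagged(actual, expected):
    """Footnote-b rule: |expected - actual| exceeds 2 * sqrt(expected)."""
    return abs(expected - actual) > 2 * sqrt(expected)

case1 = uniform_expected(13_864_879)  # tickets sold in case 1 of Table 2
```

For example, `flagged(406, 377)` is false while `flagged(406, 309)` is true, matching the footnote marks on the 5/6 entries for case 4.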

3.2. Expected Prize

Suppose M tickets are sold for a game among other people and we have bought one ticket. All random variables defined below depend on the winning 6-tuple w and the bonus number b. Let $Y_3, Y_4, Y_5, Y_{5+}, Y_6$ be the numbers of winners among the M tickets in the five categories, with the subscript indicating the category. Assuming that the vector of numbers of tickets sold for the 6-tuples is Multinomial$(M; p_s, s \in S)$, the variables $Y_3, Y_4, Y_5, Y_{5+}, Y_6$ have a joint multinomial distribution. Let j be 4, 5, 5+, or 6. If our single ticket is a j/6 winner, then our prize is

$$Z_j = \kappa_j \, \frac{0.45(M + 1) - 10 Y_3 + C_j}{1 + Y_j} = \frac{a_j - b_j Y_3}{1 + Y_j}, \tag{3.2.1}$$

where $\kappa_j$ is a fraction between 0 and 1, $C_j$ is the carryover, and $a_j$ and $b_j$ are appropriately defined.

Let $\mu_3 = E Y_3$ and $\mu_j = E Y_j$, where expectations are calculated under a given model. Also let $p_3 = \mu_3/M$, $p_j = \mu_j/M$, $\hat p_3 = Y_3/M$, $\hat p_j = Y_j/M$. If M is large, a Taylor series expansion of (3.2.1) in powers of $\hat p_3 - p_3$ and $\hat p_j - p_j$ gives (3.2.2). Approximations based on the delta method and asymptotic normality of the distribution of $(\hat p_3, \hat p_j)$ then yield the approximation (3.2.3) for $E Z_j$.

The term $M^2 (1 + \mu_j)^{-2} \{p_j(1 - p_j)/M\}$, which comes from the expected value of the $(\hat p_j - p_j)^2$ term in (3.2.2), is approximately $\mu_j^{-1}$. From the normal approximation, the expected value of the $(\hat p_j - p_j)^4$ term in (3.2.2) is asymptotically $3 (M/\mu_j)^4 \{p_j(1 - p_j)/M\}^2 \approx 3\mu_j^{-2}$ for $p_j$ small. We shall be working with M of the order $10^7$, and then for j = 4, $\mu_j = M p_j$ is of order $10^4$, and for j = 5, $\mu_j$ is usually of the order $10^2$, but can be smaller in some cases. Hence we will not use the term $3\mu_j^{-2}$ for $\mu_j \ge 10^2$. The term $b_j M p_3 p_j (1 + \mu_j)^{-2}$, which comes from the product term $(\hat p_3 - p_3)(\hat p_j - p_j)$ in (3.2.2), is $O(M^{-1})$. It will be negligible, as it is about $M^{-1}$ times the main term $(a_j - b_j \mu_3)/(1 + \mu_j)$; this is not surprising, as the correlation of $\hat p_j$ and $\hat p_3$ is about $-(p_j p_3)^{1/2}$. Similarly,

$$E Z_j^2 = \Bigl(\frac{a_j - b_j \mu_3}{1 + \mu_j}\Bigr)^2 \bigl\{1 + 3\mu_j (1 - p_j)(1 + \mu_j)^{-2} + o(\mu_j^{-1})\bigr\}. \tag{3.2.4}$$

If $\mu_j$ is small [it is of order 0 to 10 for j = 5+ and 6, and sometimes less than 100 for j = 5; see Table 2(b)], different approximations, (3.2.5) and (3.2.6), are used for $E Z_j$ and $E Z_j^2$. In these approximations, $Y_3$ and $Y_j$ are assumed independent (their correlation is negligible), $Y_3$ is binomial, and $Y_j$ is assumed to be Poisson with mean $\mu_j$ (Poisson approximation to binomial).
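As a numerical illustration of the two regimes, the sketch below compares the exact value of $E\{1/(1 + Y)\}$ for binomial Y with (i) a second-order delta-method form whose correction is the $\mu_j(1 - p_j)(1 + \mu_j)^{-2}$ term quoted above and (ii) the Poisson closed form $(1 - e^{-\mu})/\mu$. Both closed forms are assumptions chosen to match the description in the text rather than formulas copied from it, and the values of M and $\mu$ are illustrative.

```python
from math import exp, log1p

def exact_binomial(M, p):
    """Exact E[1/(1+Y)] for Y ~ Binomial(M, p)."""
    return (1.0 - exp((M + 1) * log1p(-p))) / ((M + 1) * p)

def delta_method(M, p):
    """Second-order expansion about mu = M p (assumed form), for large mu."""
    mu = M * p
    return (1.0 + mu * (1.0 - p) / (1.0 + mu) ** 2) / (1.0 + mu)

def poisson_form(mu):
    """E[1/(1+Y)] for Y ~ Poisson(mu), for small mu."""
    return (1.0 - exp(-mu)) / mu

M = 12_000_000  # illustrative number of tickets sold
# mu = 100 (typical of 5/6): the delta-method form is close to exact.
large_mu = (exact_binomial(M, 100 / M), delta_method(M, 100 / M))
# mu = 1.5 (typical of 6/6): the Poisson form is close, the expansion is not.
small_mu = (exact_binomial(M, 1.5 / M), poisson_form(1.5), delta_method(M, 1.5 / M))
```

With these inputs the expansion agrees with the exact value to within a relative error of order $\mu^{-2}$ when $\mu = 100$, but is off by several percent when $\mu = 1.5$, which is the situation where the Poisson treatment is needed.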

3.3. Expected Return

Let c be our chosen 6-tuple, let w be the winning 6-tuple, and let b be the bonus number; w is assumed to be equally likely to be any of the $\binom{49}{6}$ = 13,983,816 6-tuples, and b is equally likely to be one of the 43 numbers not in w. The probabilities that our ticket c will win the 3/6, 4/6, 5/6, 5/6+, and 6/6 prizes are respectively $q_3 = \binom{6}{3}\binom{43}{3}/\binom{49}{6}$, $q_4 = \binom{6}{4}\binom{43}{2}/\binom{49}{6}$, $q_5 = 42\binom{6}{5}\binom{43}{1}/\{43\binom{49}{6}\}$, $q_{5+} = \binom{6}{5}\binom{43}{1}/\{43\binom{49}{6}\}$, and $q_6 = 1/\binom{49}{6}$. Let R be our return. Then

$$E R = 10 q_3 + q_4 \xi_4 + q_5 \xi_5 + q_{5+} \xi_{5+} + q_6 \xi_6, \tag{3.3.1}$$

where $\xi_6 = E Z_6(c)$; $\xi_4 = \sum_w E Z_4(w) / \{\binom{6}{4}\binom{43}{2}\}$, where the sum is over the 6-tuples w that match c in exactly four numbers; $\xi_5 = \sum_w \sum_b E Z_5(w, b) / \{42 \binom{43}{1}\binom{6}{5}\}$, where the sums are over the 6-tuples w that match c in exactly five numbers and over the 42 numbers b not in $w \cup c$; and $\xi_{5+} = \sum_w E Z_{5+}(w, b(w)) / \{\binom{43}{1}\binom{6}{5}\}$, where the sum is over the 6-tuples w that match c in exactly five numbers and b(w) is the number in $c \setminus w$. Note that $\xi_j$ is our expected prize with the choice of c if we are a winner in the j/6 category. Similarly

$$E R^2 = 100 q_3 + q_4 \eta_4 + q_5 \eta_5 + q_{5+} \eta_{5+} + q_6 \eta_6, \tag{3.3.2}$$

where the $\eta$'s are defined like the $\xi$'s with $Z^2$ replacing Z everywhere. The expectations of Z and $Z^2$ are given in (3.2.3) to (3.2.6). The variance and standard deviation of R can now be obtained from (3.3.1) and (3.3.2). Note that the expected return and the standard deviation are functions of c (the indices in the summations defining $\xi_j$ and $\eta_j$ depend on c) and the model for the distribution of 6-tuples chosen by other people.

TABLE 3: Expected return and standard deviation from one ticket for three models.^a

                            α = 0           α = 0.5          α = 1
Ticket   M/10^6   C/10^6    Exp.    SD      Exp.    SD       Exp.    SD
  U        12      0.0      0.72    547     0.83    612      1.19    716
  U        20      0.0      0.81    823     0.97    952      1.47    1170
  U        19      1.7      0.91    1170    1.07    1310     1.56    1490
  U        20      4.0      1.06    1730    1.24    1910     1.75    2110
  U        27      6.5      1.42    2030    1.84    2480     3.07    3490
  U        37      6.5      1.43    2040    1.93    2650     3.71    4370
  U        47      6.5      1.43    2030    1.99    2760     4.32    5260
  U        68      4.5      1.33    1700    1.94    2560     5.40    6950
  A        12      0.0      0.38    275     0.38    275      0.38    275
  A        19      1.7      0.46    611     0.46    611      0.46    611
  A        20      4.0      0.54    956     0.54    955      0.54    956
  A        37      6.5      0.56    801     0.56    800      0.56    801
  P        12      0.0      0.25    78.3    0.26    102      0.26    120
  P        19      1.7      0.27    184     0.29    243      0.30    287
  P        20      4.0      0.30    314     0.33    410      0.35    482
  P        37      6.5      0.30    225     0.33    294      0.35    351

^a U is the 6-tuple 20 30 39 40 41 42, A is the 6-tuple 2 8 17 26 33 43, P is the 6-tuple 3 5 7 24 25 27, M is the number of tickets sold, and C is the 6/6 carryover.

Table 3 lists the expected return and standard deviation of return with $\alpha$ = 0, 0.5, 1 in the models in Section 2 for certain typical values of M and the carryover for the 6/6 jackpot; there are three chosen 6-tuples c: U consists of 6 unpopular numbers, A consists of 6 average numbers, and P consists of 6 popular numbers. The approximations (3.2.3) to (3.2.6) are accurate to the number of significant digits given in the table. The computations of the expected returns and the standard deviations of return as given in (3.3.1) and (3.3.2) were done on a Sun SPARCstation 330 computer. For $\alpha = 0$, the time for one case is about one hour, and for $\alpha \ne 0$, the time for one case is between 4 and 4.5 hours. The large amount of time is due to the summation for $\xi_4$. Since the expected prize does not vary much over the different w that match c in exactly four numbers (compared with the variation in the prizes), $\xi_4$ can be computed sufficiently accurately by taking a random sample from the possible w; this was done originally on a slower MicroVAX II computer, and the results were essentially the same as in Table 3. Algorithms for systematically enumerating k-tuples and for selecting random k-tuples are given in Nijenhuis and Wilf (1978).
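The random-sampling shortcut for the 4/6 sum can be sketched as below. This is an illustration under stated assumptions: it uses Python's `random.sample` rather than the algorithms of Nijenhuis and Wilf (1978), the function names are hypothetical, and the function averaged is a placeholder standing in for the model-based expected prize $E Z_4(w)$.

```python
import random

def sample_w_matching(c, matches, rng, n_numbers=49):
    """Draw a 6-tuple sharing exactly `matches` numbers with ticket c."""
    others = [x for x in range(1, n_numbers + 1) if x not in c]
    kept = rng.sample(sorted(c), matches)        # numbers shared with c
    new = rng.sample(others, len(c) - matches)   # numbers outside c
    return tuple(sorted(kept + new))

def monte_carlo_mean(f, c, matches, n_draws, seed=0):
    """Estimate the average of f(w) over all w matching c in `matches` numbers."""
    rng = random.Random(seed)
    total = sum(f(sample_w_matching(c, matches, rng)) for _ in range(n_draws))
    return total / n_draws

U = (20, 30, 39, 40, 41, 42)  # the unpopular ticket of Table 3
# Placeholder for E Z_4(w): here simply the average of the numbers in w.
estimate = monte_carlo_mean(lambda w: sum(w) / 6, U, matches=4, n_draws=2000)
```

Choosing the kept and new numbers uniformly makes every w that matches c in exactly four numbers equally likely, so the sample mean estimates the same average as the full sum over w.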


3.4. Expected Return with More than One Ticket

The ideas used in Sections 3.3 and 3.4 can be used for choosing several or many tickets. With the purchase of more than one ticket, it is possible for us to win prizes in one or more categories. If a 6-tuple w is such that we have $v_j$ winning j/6 tickets for j = 3, 4, 5, 5+, 6, then the total of our prizes is a sum over the categories 4, 5, 5+, 6, with the notation of Section 3.2. We can obtain $E Z$ and $E Z^2$ (as a function of w) in a similar way to Section 3.2, and $E R$ and $E R^2$ can be modified. A way of purchasing many tickets is to take all combinations of six numbers from a fixed set of numbers. In this case, simple combinatorics can be used to enumerate the possible values of $(v_6, v_{5+}, v_5, v_4, v_3)$ and obtain the replacements for the q's in (3.3.1). For example, for seven numbers the possible values of $(v_6, v_{5+}, v_5, v_4, v_3)$ are (0, 0, 0, 0, 4), (0, 0, 0, 3, 4), (0, 0, 2, 5, 0), (0, 1, 1, 5, 0), (1, 0, 6, 0, 0), and (1, 6, 0, 0, 0), with corresponding probabilities $q_3 = \binom{42}{3}\binom{7}{3}/\binom{49}{6}$, $q_4 = \binom{42}{2}\binom{7}{4}/\binom{49}{6}$, $q_5 = 41\binom{42}{1}\binom{7}{5}/\{43\binom{49}{6}\}$, $q_{5+} = 2\binom{42}{1}\binom{7}{5}/\{43\binom{49}{6}\}$, $q_6 = 42\binom{7}{6}/\{43\binom{49}{6}\}$, and $q_{6+} = \binom{7}{6}/\{43\binom{49}{6}\}$.

For certain values of K, some expected profits and standard deviations for $\binom{K}{6}$ tickets consisting of all combinations of size 6 from the K most unpopular numbers are given in Tables 4 and 5, corresponding respectively to the models with $\alpha = 0$ and $\alpha = 0.5$. For these tables, M is $37 \times 10^6$ and the 6/6 carryover is $C = 6.5 \times 10^6$. Also given in the two tables are the expected profit per dollar spent and the number of games that must be played, with M and C as above, in order that there be a high probability (0.975) of a gain. For the computations, random samples of 6-tuples from those matching 4, 5, or 6 of our K chosen numbers were used for determining the $\xi$'s and $\eta$'s which generalize those in Section 3.3. For each K, the computational times were of the order of an hour on a Sun SPARCstation 330 computer; for some of the larger K, the accuracy of the computed expected profit is only one significant digit because it is the difference of two large numbers.
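The seven-number example can be verified by brute force. The sketch below counts, for a given winning 6-tuple and bonus number, how many of the seven tickets fall in each category; the choice of the numbers 1 to 7 and the function name `win_pattern` are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import comb

def win_pattern(our_numbers, w, bonus):
    """Count our tickets in each category as (v6, v5+, v5, v4, v3).

    We hold every 6-subset of `our_numbers`; w is the winning 6-tuple.
    """
    v = {"6": 0, "5+": 0, "5": 0, "4": 0, "3": 0}
    for ticket in combinations(sorted(our_numbers), 6):
        hits = len(set(ticket) & set(w))
        if hits == 5:
            v["5+" if bonus in ticket else "5"] += 1
        elif hits in (3, 4, 6):
            v[str(hits)] += 1
    return v["6"], v["5+"], v["5"], v["4"], v["3"]

ours = range(1, 8)  # hypothetical choice of seven numbers, 1 to 7
patterns = {
    win_pattern(ours, (1, 2, 3, 10, 11, 12), 13),
    win_pattern(ours, (1, 2, 3, 4, 10, 11), 12),
    win_pattern(ours, (1, 2, 3, 4, 5, 10), 11),
    win_pattern(ours, (1, 2, 3, 4, 5, 10), 6),
    win_pattern(ours, (1, 2, 3, 4, 5, 6), 10),
    win_pattern(ours, (1, 2, 3, 4, 5, 6), 7),
}

# Probabilities of the six patterns, in the order listed in the text.
total = comb(49, 6)
probs = [
    comb(42, 3) * comb(7, 3) / total,
    comb(42, 2) * comb(7, 4) / total,
    41 * comb(42, 1) * comb(7, 5) / (43 * total),
    2 * comb(42, 1) * comb(7, 5) / (43 * total),
    42 * comb(7, 6) / (43 * total),
    comb(7, 6) / (43 * total),
]
```

The six probabilities add up to the chance that the winning 6-tuple contains at least three of our seven numbers, which is a useful consistency check on the combinatorics.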

4. DISCUSSION

Ziemba et al. (1986) and Stern and Cover (1989) have begun the study of expected return as a function of the numbers chosen. Ziemba et al. use a statistical model that incorporates some information on how combinations of numbers are chosen. Stern and Cover use simulations with the maximum-entropy model to get some expected returns. We go farther in that we have analytic calculations of expected return and standard deviation of return for a class of models.

Table 3 clearly shows that the expected return for a ticket (U) with unpopular numbers can be much larger than for a ticket (P) with popular numbers. This is because, conditional on winning, the expected prizes $\xi_4, \xi_5, \xi_{5+}, \xi_6$ are much larger for U than for P. For the maximum-entropy model ($\alpha = 0$), with $M = 12 \times 10^6$ tickets sold and no carryover, these expected prizes are respectively 150, 6460, 359,000, and 1,740,000 for U and respectively 32.2, 555, 23,800, and 245,000 for P. Table 3 also shows that for a set A of "average" numbers, the expected return is 0.38 for all three models when there is no carryover. This is less than 0.45, the average expected return over all k-tuples, because our ticket adds to the splitting of the prizes (if we win); if our ticket were not in the game, the expected return for the participants with A would be closer to 0.45.
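As a check on how the conditional expected prizes translate into the first-row entries of Table 3, the sketch below combines the quoted prizes with the single-ticket winning probabilities of Section 3.3, assuming the expected return is $10 q_3$ plus the probability-weighted sum of the conditional expected prizes. The dictionary and function names are hypothetical.

```python
from math import comb

TOTAL = comb(49, 6)
# Winning probabilities for a single ticket (Section 3.3).
q = {
    "3": comb(6, 3) * comb(43, 3) / TOTAL,
    "4": comb(6, 4) * comb(43, 2) / TOTAL,
    "5": 42 * comb(6, 5) * comb(43, 1) / (43 * TOTAL),
    "5+": comb(6, 5) * comb(43, 1) / (43 * TOTAL),
    "6": 1 / TOTAL,
}

def expected_return(xi):
    """E R = 10 q3 + sum_j q_j xi_j, with xi the conditional expected prizes."""
    return 10 * q["3"] + sum(q[j] * xi[j] for j in ("4", "5", "5+", "6"))

# Conditional expected prizes quoted in the text
# (alpha = 0, M = 12 million tickets, no carryover).
xi_U = {"4": 150, "5": 6460, "5+": 359_000, "6": 1_740_000}
xi_P = {"4": 32.2, "5": 555, "5+": 23_800, "6": 245_000}
```

Rounded to two decimals, `expected_return(xi_U)` and `expected_return(xi_P)` give 0.72 and 0.25, the values listed in Table 3 for U and P with M = 12 million and no carryover.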

Even though the three distributions \(p_\alpha\) are each closest for some distance measure to the uniform distribution subject to the univariate marginal constraints, the expected returns


TABLE 4: Expected profit and standard deviation from \(C_6^K\) tickets from the K most unpopular numbers for the model with α = 0; M = 37 × 10^6, carryover = 6.5 × 10^6.

| K | \(C_6^K\) | Exp. | SD | Exp./\(C_6^K\) | (2 SD/Exp.)^2 |
|---|---|---|---|---|---|
| 6 | 1 | 0.43 | 2.04 × 10^3 | 0.43 | 9.0 × 10^7 |
| 7 | 7 | 2.74 | 5.36 × 10^3 | 0.39 | 1.5 × 10^7 |
| 8 | 28 | 9.5 | 1.06 × 10^4 | 0.34 | 5.0 × 10^6 |
| 9 | 84 | 25 | 1.82 × 10^4 | 0.30 | 2.1 × 10^6 |
| 12 | 924 | 190 | 5.93 × 10^4 | 0.21 | 3.9 × 10^5 |
| 15 | 5,005 | 630 | 1.36 × 10^5 | 0.13 | 1.9 × 10^5 |
| 17 | 12,376 | 1000 | 2.13 × 10^5 | 0.08 | 1.8 × 10^5 |
| 19 | 27,132 | 700 | 3.10 × 10^5 | 0.03 | 8 × 10^5 |
| 20 | 38,760 | 100 | 3.69 × 10^5 | 0.00 | 5 × 10^7 |
| 21 | 54,264 | -1300 | 4.32 × 10^5 | -0.02 | - |

TABLE 5: Expected profit and standard deviation from \(C_6^K\) tickets from the K most unpopular numbers for the model with α = 0.5; M = 37 × 10^6, carryover = 6.5 × 10^6.

| K | \(C_6^K\) | Exp. | SD | Exp./\(C_6^K\) | (2 SD/Exp.)^2 |
|---|---|---|---|---|---|
| 6 | 1 | 0.93 | 2.65 × 10^3 | 0.93 | 3.2 × 10^7 |
| 7 | 7 | 5.90 | 6.83 × 10^3 | 0.84 | 5.4 × 10^6 |
| 8 | 28 | 20.1 | 1.31 × 10^4 | 0.72 | 1.7 × 10^6 |
| 9 | 84 | 53 | 2.22 × 10^4 | 0.63 | 7.0 × 10^5 |
| 12 | 924 | 410 | 7.00 × 10^4 | 0.44 | 1.2 × 10^5 |
| 15 | 5,005 | 1500 | 1.57 × 10^5 | 0.30 | 4.4 × 10^4 |
| 18 | 18,564 | 3300 | 2.92 × 10^5 | 0.18 | 3.1 × 10^4 |
| 21 | 54,264 | 3800 | 4.79 × 10^5 | 0.07 | 6.4 × 10^4 |
| 23 | 100,947 | 100 | 6.30 × 10^5 | 0.00 | 2 × 10^8 |
| 24 | 134,596 | -4100 | 7.17 × 10^5 | -0.03 | - |

(especially for U) differ. This is because as α increases from 0 to 1, the proportion of people choosing U decreases (and reaches zero for some α between 0.5 and 1). However, the expected returns for U for α = 1 are in the range mentioned in Ziemba et al. (1986). Ziemba et al. use a statistical model where a log expected return, computed from the actual amounts of the prizes in a game, is regressed on factors for each number 1 to 49 for 207 games. They then extrapolate the model to all \(C_6^{49}\) 6-tuples to conclude that the expected return can be as much as $1.50 without a carryover and up to $2.25 with a larger carryover. Besides the possibility of model error, this statistical approach to modelling cannot provide a standard deviation of return, which is needed to analyze the "picking unpopular numbers" strategy (see below).

The model with α = 1 is less realistic for the true distribution of 6-tuples chosen by participants over a period of time, because there are about 6900 6-tuples with zero probability. With this model, we would not be sharing the 6/6 or 5/6+ jackpot if we won with U, and the expected return is greater than $1 in all cases. We would consider the range of expected returns provided from α = 0 to α = 0.5 to be more realistic; the proportion of participants buying U ranges from 0.169 × … to 0.0676 × …. For α = 0, the expected return of about $1.43 for U is reached for a range of the number of tickets sold when the carryover is 6.5 million. Assuming the model remains valid over an indefinite period of time, the standard deviation of about $2040 means that, by the central limit theorem, it would take (2 × 2040/0.43)^2 = 9.0 × 10^7 games with this carryover to have a high probability (0.975) of a gain. A carryover of 6.5 million occurs about 4 times a year, so that for the strategy of choosing U, one needs about 23 million years to have a good chance of being ahead (assuming the model with α = 0). To be slightly more optimistic, assume α = 0.5. With an expected return of $1.93 and a standard deviation of $2650 for a carryover of 6.5 million, it would take (2 × 2650/0.93)^2 = 3.2 × 10^7 games, or about 8 million years, to have a good chance of coming out ahead. These figures for the number of years are of the order of those given in MacLean, Ziemba, and Blazenko (1988) for a Kelly wagering strategy with unpopular numbers.
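The step from the mean and standard deviation to the number of games can be written out as follows. This is a sketch under stated assumptions: the per-game profits are treated as independent and identically distributed, the profit is taken as the return minus the $1 ticket price, and the symbols X_i, S_n, μ, σ, and n are introduced here for illustration and are not the paper's notation.

```latex
% X_i: profit in game i, with mean \mu > 0 and standard deviation \sigma.
% S_n: total profit after n games (notation introduced for this sketch).
\[
  S_n = \sum_{i=1}^{n} X_i \;\approx\; N\!\left(n\mu,\; n\sigma^2\right),
  \qquad
  P(S_n > 0) \;\approx\; \Phi\!\left(\frac{\sqrt{n}\,\mu}{\sigma}\right).
\]
% Requiring a gain with probability at least 0.975 = \Phi(1.96):
\[
  \Phi\!\left(\frac{\sqrt{n}\,\mu}{\sigma}\right) \ge 0.975
  \;\Longleftrightarrow\;
  \frac{\sqrt{n}\,\mu}{\sigma} \ge 1.96
  \;\Longleftrightarrow\;
  n \ge \left(\frac{1.96\,\sigma}{\mu}\right)^{2}
  \approx \left(\frac{2\sigma}{\mu}\right)^{2}.
\]
% One ticket on U, carryover 6.5 million:
\[
  \alpha = 0:\quad \mu = 1.43 - 1 = 0.43,\ \sigma = 2040
  \;\Rightarrow\; n \approx (2 \times 2040 / 0.43)^2 \approx 9.0 \times 10^{7},
\]
\[
  \alpha = 0.5:\quad \mu = 1.93 - 1 = 0.93,\ \sigma = 2650
  \;\Rightarrow\; n \approx (2 \times 2650 / 0.93)^2 \approx 3.2 \times 10^{7}.
\]
```

Dividing by about 4 qualifying games per year gives roughly 23 million and 8 million years respectively.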

The figures in the preceding paragraph are based on buying one ticket with the numbers in U whenever the expected return is sufficiently greater than $1. Tables 4 and 5 show that there is no relative gain, in expected profit per dollar spent, from buying \(C_6^K\) tickets consisting of the K individually most unpopular numbers. However, with K about 17 or 18 (\(C_6^K\) between 10,000 and 20,000), the number of games that must be played in order to have a high chance (0.975) of winning can go down to 1.8 × 10^5 and 3.1 × 10^4 for α = 0 and α = 0.5 respectively. At a rate of 4 games per year, the figure of 23 million years above decreases to 45,000 years for α = 0 and the figure of 8 million years decreases to 8000 years for α = 0.5.
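The derived columns of Tables 4 and 5 can be recomputed from the expected profit and standard deviation alone. The sketch below hard-codes the (Exp., SD) pairs as read from the two tables (the row alignment is a reading of the printed tables, so treat it as an assumption), takes 4 qualifying games per year from the text, and uses hypothetical names (`TABLE4`, `TABLE5`, `summarize`, `best_k`) that are not from the paper.

```python
from math import comb

GAMES_PER_YEAR = 4  # carryovers of about 6.5 million occur roughly 4 times a year

# K -> (expected profit, standard deviation), as read from Tables 4 and 5
TABLE4 = {  # model with alpha = 0
    6: (0.43, 2.04e3), 7: (2.74, 5.36e3), 8: (9.5, 1.06e4), 9: (25, 1.82e4),
    12: (190, 5.93e4), 15: (630, 1.36e5), 17: (1000, 2.13e5),
    19: (700, 3.10e5), 20: (100, 3.69e5), 21: (-1300, 4.32e5),
}
TABLE5 = {  # model with alpha = 0.5
    6: (0.93, 2.65e3), 7: (5.90, 6.83e3), 8: (20.1, 1.31e4), 9: (53, 2.22e4),
    12: (410, 7.00e4), 15: (1500, 1.57e5), 18: (3300, 2.92e5),
    21: (3800, 4.79e5), 23: (100, 6.30e5), 24: (-4100, 7.17e5),
}


def summarize(table):
    """Return rows (K, tickets, profit per dollar, games needed or None)."""
    rows = []
    for k_numbers, (exp_profit, sd) in sorted(table.items()):
        tickets = comb(k_numbers, 6)  # all 6-subsets of the K numbers
        # (2 SD / Exp.)^2 is only meaningful when the expected profit is positive
        games = (2 * sd / exp_profit) ** 2 if exp_profit > 0 else None
        rows.append((k_numbers, tickets, exp_profit / tickets, games))
    return rows


def best_k(table):
    """Row with the fewest games needed for a 0.975 chance of a gain."""
    candidates = [row for row in summarize(table) if row[3] is not None]
    return min(candidates, key=lambda row: row[3])


if __name__ == "__main__":
    for name, table in [("alpha = 0", TABLE4), ("alpha = 0.5", TABLE5)]:
        k_numbers, tickets, per_dollar, games = best_k(table)
        print(f"{name}: K = {k_numbers}, {tickets} tickets, "
              f"{per_dollar:.2f} profit per dollar, {games:.2g} games, "
              f"about {games / GAMES_PER_YEAR:,.0f} years")
```

The profit per dollar falls steadily as K grows, while the number of games needed first falls and then rises again, which is why an intermediate K of about 17 or 18 minimizes the waiting time.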

In summary, Tables 3, 4, and 5 certainly show that the expected return with unpopular numbers can be more than the amount spent, but that the standard deviation is so large that one must wait tens of thousands of years to millions of years (depending on how much one can spend in a game) in order to have a good chance of being ahead. The expected return is most sensitive to the sizes of the smallest k-tuple probabilities, and the models used have a fairly wide range for these quantities. Hence the conclusion concerning the unpopular numbers should be qualitatively correct, and the ranges of expected returns from these models should include the expected returns from the “true distribution of k-tuples”. For the true distribution, there is no reason why the most unpopular k-tuple should consist of the k individually most unpopular numbers. Also, the conclusion should be qualitatively valid for lotto games in general, and of course, the (computational) techniques of this paper are applicable to other lotto games.

ACKNOWLEDGEMENT

This research has been supported by an NSERC grant. I am grateful to the referees, the Associate Editor, and the Editor for their comments.

REFERENCES

Joe, H. (1987). An ordering of dependence for distribution of k-tuples, with applications to lotto games. Canad. J. Statist., 15, 227-238.

Luck (1984-1990). British Columbia and Western Canada Lottery Corporations. Weekly publication.

MacLean, L.C., Ziemba, W.T., and Blazenko, G. (1988). Growth versus security in dynamic investment analysis. Technical Report, The University of British Columbia.

Nijenhuis, A., and Wilf, H.S. (1978). Combinatorial Algorithms. Second Edition. Academic Press, New York.

Stern, H., and Cover, T.M. (1989). Maximum entropy and the lottery. J. Amer. Statist. Assoc., 84, 980-985.

Ziemba, W.T., Brumelle, S.L., Gauthier, A., and Schwarz, S.L. (1986). Dr. Z’s Lotto 6/49 Guidebook. Dr. Z Investments, Vancouver.

Received 21 October 1988
Revised 15 December 1989
Accepted 31 January 1990

Department of Statistics
The University of British Columbia
2021 West Mall
Vancouver, B.C. V6T 1W5