Bachelor of Ecole Polytechnique
Computational Mathematics, year 2, semester 1
Lecturer: Lucas Gerin
Combinatorics 1: Combinatorics in real life
Table of contents
Champions League 2018-19
Coincidences in Probability
The birthday paradox
The lottery coincidence
Champions League 2018-19
# execute this part to modify the css style
from IPython.core.display import HTML

def css_styling():
    styles = open("./style/custom2.css").read()
    return HTML(styles)

css_styling()
## loading python libraries
# necessary to display plots inline:
%matplotlib inline
# load the libraries
import matplotlib.pyplot as plt  # 2D plotting library
import numpy as np  # package for scientific computing
from math import *  # package for mathematics (pi, arctan, sqrt, factorial ...)
The round of 16 in the UEFA Champions League (football) consists of 8 games involving the 16 teams which qualify as 1st and 2nd of each of the eight groups $0, 1, 2, \dots, 7$ in the group stage.
Here are the 16 teams involved in the 2018-19 round of 16, together with their groups and countries:

Group | 1st: team        | 1st: country | 2nd: team          | 2nd: country
0     | Dortmund         | Germany      | Atletico Madrid    | Spain
1     | Barcelona        | Spain        | Tottenham          | England
2     | Paris            | France       | Liverpool          | England
3     | Porto            | Portugal     | Schalke            | Germany
4     | Bayern           | Germany      | Ajax               | Nederland
5     | Manchester City  | England      | Lyon               | France
6     | Real Madrid      | Spain        | Roma               | Italy
7     | Juventus         | Italy        | Manchester United  | England
The mechanism for the round of 16 is as follows:
1. Every team which ended 1st plays against one team which ended 2nd.
2. Two teams of the same group cannot play against each other.
3. Two teams of the same country cannot play against each other.

The draw of the round of 16 is picked uniformly among all the configurations which satisfy the above constraints. The aim of this lab session is to recover the odds that can be found online:
(source: Twitter @2010MisterChip, Dec. 2018)
(https://twitter.com/2010MisterChip/status/1072973885014970368/photo/1)
The table must be interpreted in the following way: if we pick uniformly at random an admissible configuration for the round of 16, then Roma will play against Porto with probability 0.1359. (Observe that Porto will play against Schalke with probability 0, since they were both in Group #3.)
1. Preliminaries: Configurations without constraints
A configuration is a one-to-one correspondence between the two sets
$$\{\text{Teams which ended 1st}\} \to \{\text{Teams which ended 2nd}\}.$$
Later we will encode configurations by permutations of the set $\{0, 1, \dots, 7\}$. Therefore we need to be able to generate all the permutations of a given set.
Do it yourself. Write a function Permutations(List) which returns the list of all the permutations of the elements of List. For instance, one should have
Permutations([7,1,4])
[[7, 1, 4], [7, 4, 1], [1, 7, 4], [1, 4, 7], [4, 7, 1], [4, 1, 7]]
(You must use recursive programming.)
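You can check the output of a recursive implementation against the standard library's `itertools.permutations`. It emits permutations in the same order, ordered by the positions of the elements in the input. The sketch below is only a cross-check and does not replace the recursive exercise. `ReferencePermutations` is a hypothetical helper name.

```python
import itertools
import math

def ReferencePermutations(List):
    # Hypothetical helper: all permutations of List, as lists,
    # in the same order as the expected output of Permutations(List)
    return [list(p) for p in itertools.permutations(List)]

print(ReferencePermutations([7, 1, 4]))
# Without constraints there are 8! configurations of the round of 16
print(len(ReferencePermutations(range(8))), math.factorial(8))
```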
def Permutations(List):
    # returns the list of all the permutations of the elements of List
    if len(List) == 1:
        return [List]
    else:
        Output = []  # Temporary output
        for i in List:
            SubList = list(List)  # We copy 'List'
            SubList.remove(i)  # We remove 'i' from 'SubList'
            for SubPermutation in Permutations(SubList):
                Output.append([i] + SubPermutation)  # We add lists of the form [i, SubPermutation]
        return Output

# Test: all permutations of [7,1,4]
print(Permutations([7,1,4]))
[[7, 1, 4], [7, 4, 1], [1, 7, 4], [1, 4, 7], [4, 7, 1], [4, 1, 7]]

2. Checking the constraints
Let $\sigma = (\sigma(0), \sigma(1), \dots, \sigma(7))$ be a permutation of $\{0, 1, \dots, 7\}$. We associate to $\sigma$ a configuration of 8 games in the following way:
Team 1st of Group #0 plays against Team 2nd of Group #$\sigma(0)$,
Team 1st of Group #1 plays against Team 2nd of Group #$\sigma(1)$,
...
Team 1st of Group #7 plays against Team 2nd of Group #$\sigma(7)$.
For example, if $\sigma$ = [7,2,5,0,1,4,3,6] then
Dortmund (1st of Group #0) plays against Manchester Utd (2nd of Group #7),
Barcelona (1st of Group #1) plays against Liverpool (2nd of Group #2),
...
Juventus (1st of Group #7) plays against Roma (2nd of Group #6).
We see that this configuration does not satisfy the required constraints, since Juventus and Roma (both from Italy) are not allowed to play against each other.
Do it yourself. Create a function AdmissibleConfiguration(Config) which returns True if the input is an admissible configuration, and False if it is not. If possible, it should also return an explanation of why the configuration is not admissible.
To save you time, we have already created two lists Teams1st and Teams2d, ordered according to their groups.
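The solution code of this lab keeps the explanation only as commented-out print statements. The sketch below shows one way to meet the optional requirement. It assumes that returning a pair (bool, reason) is acceptable. The name AdmissibleConfigurationWithReason is hypothetical. The two lists are restated with the same content as Teams1st and Teams2d so that the cell runs on its own.

```python
# Same data as Teams1st / Teams2d: [name, country], indexed by group number
Teams1st = [['Dortmund', 'Germany'], ['Barcelona', 'Spain'], ['Paris SG', 'France'],
            ['Porto', 'Portugal'], ['Bayern Munchen', 'Germany'],
            ['Manchester City', 'England'], ['Real Madrid', 'Spain'], ['Juventus', 'Italy']]
Teams2d = [['Atletico Madrid', 'Spain'], ['Tottenham', 'England'], ['Liverpool', 'England'],
           ['Schalke', 'Germany'], ['Ajax', 'Nederland'], ['Lyon', 'France'],
           ['Roma', 'Italy'], ['Manchester United', 'England']]

def AdmissibleConfigurationWithReason(Config):
    # Hypothetical variant: returns (True, '') or (False, explanation of the first violation found)
    for i in range(8):
        first, second = Teams1st[i], Teams2d[Config[i]]
        if Config[i] == i:
            return False, 'same group: ' + first[0] + ' plays ' + second[0]
        if first[1] == second[1]:
            return False, 'same country: ' + first[0] + ' plays ' + second[0]
    return True, ''

print(AdmissibleConfigurationWithReason([7, 2, 5, 0, 1, 4, 3, 6]))
print(AdmissibleConfigurationWithReason([7, 2, 6, 0, 1, 4, 3, 5]))
```

The function scans the groups in order, so for [7,2,5,0,1,4,3,6] it reports Paris SG vs Lyon (both from France). It stops there, before reaching Juventus vs Roma.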
Teams1st = [['Dortmund','Germany'], ['Barcelona','Spain'], ['Paris SG','France'],
            ['Porto','Portugal'], ['Bayern Munchen','Germany'], ['Manchester City','England'],
            ['Real Madrid','Spain'], ['Juventus','Italy']]
Teams2d = [['Atletico Madrid','Spain'], ['Tottenham','England'], ['Liverpool','England'],
           ['Schalke','Germany'], ['Ajax','Nederland'], ['Lyon','France'],
           ['Roma','Italy'], ['Manchester United','England']]

# A few examples:
print('How to use the lists Teams1st and Teams2d:')
print('1st Team of Group #3 : ' + str(Teams1st[3]))
print('Country of 2d Team of Group #5 : ' + str(Teams2d[5][1]))
print('---------------')

def AdmissibleConfiguration(Config):
    # returns "True" if and only if Config is admissible
    for i in range(8):
        if Teams1st[i][1] == Teams2d[Config[i]][1]:
            # print('False (same country): ' + str(Teams1st[i][0]) + ' plays ' + str(Teams2d[Config[i]][0]))
            return False
        elif Config[i] == i:
            # print('False (same group): ' + str(Teams1st[i][0]) + ' plays ' + str(Teams2d[Config[i]][0]))
            return False
    return True

# A test:
print('Test of AdmissibleConfiguration:')
print('[0,6,2,7,1,4,3,5] is admissible? Answer: ' + str(AdmissibleConfiguration([0,6,2,7,1,4,3,5])))
print('[7,2,5,0,1,4,3,6] is admissible? Answer: ' + str(AdmissibleConfiguration([7,2,5,0,1,4,3,6])))
print('[7,2,6,0,1,4,3,5] is admissible? Answer: ' + str(AdmissibleConfiguration([7,2,6,0,1,4,3,5])))

How to use the lists Teams1st and Teams2d:
1st Team of Group #3 : ['Porto', 'Portugal']
Country of 2d Team of Group #5 : France
---------------
Test of AdmissibleConfiguration:
[0,6,2,7,1,4,3,5] is admissible? Answer: False
[7,2,5,0,1,4,3,6] is admissible? Answer: False
[7,2,6,0,1,4,3,5] is admissible? Answer: True

3. Computing probabilities
Do it yourself. Let $\sigma$ be a uniform permutation of $\{0, 1, \dots, 7\}$, and let $A$ denote the event "At the round of 16, Roma plays against Porto". Write $\mathbb{P}(A)$ as a conditional probability, using the random variable $\sigma$.
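Answers. Because of the assumption
"The draw of the round of 16 is picked uniformly among all the configurations which satisfy the above constraints"
we have that
$$
\begin{aligned}
\mathbb{P}(A) &= \frac{\text{Number of adm. config. } s \text{ such that } s(3) = 6}{\text{Number of adm. config.}}\\
&= \frac{\text{Number of adm. config. such that } s(3) = 6}{\text{Number of config.}} \times \frac{\text{Number of config.}}{\text{Number of adm. config.}}\\
&= \mathbb{P}(\{\sigma(3) = 6\} \cap \{\sigma \text{ is admissible}\}) \times \frac{1}{\mathbb{P}(\sigma \text{ is admissible})}\\
&= \mathbb{P}(\sigma(3) = 6 \mid \sigma \text{ is admissible}).
\end{aligned}
$$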
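The identity $\mathbb{P}(A) = \mathbb{P}(\sigma(3) = 6 \mid \sigma \text{ is admissible})$ also suggests a simulation, known as rejection sampling. Draw uniform permutations, discard the non-admissible ones, and record how often $\sigma(3) = 6$ among those kept. The sketch below is self-contained, so it restates the countries by group from the table above. The names `COUNTRY_1ST`, `COUNTRY_2ND`, `is_admissible` and `estimate_roma_porto` are illustrative and not part of the lab. The result only approximates the exact value obtained by enumeration, and it fluctuates with the seed and the number of samples.

```python
import random

# Countries of the 1st and 2nd teams, indexed by group number (from the table above)
COUNTRY_1ST = ['Germany', 'Spain', 'France', 'Portugal',
               'Germany', 'England', 'Spain', 'Italy']
COUNTRY_2ND = ['Spain', 'England', 'England', 'Germany',
               'Nederland', 'France', 'Italy', 'England']

def is_admissible(sigma):
    # No game between two teams of the same group or of the same country
    return all(sigma[i] != i and COUNTRY_1ST[i] != COUNTRY_2ND[sigma[i]]
               for i in range(8))

def estimate_roma_porto(num_samples, seed=0):
    rng = random.Random(seed)
    accepted = 0  # number of admissible permutations drawn
    hits = 0      # among them, how many have sigma(3) = 6 (Porto vs Roma)
    for _ in range(num_samples):
        sigma = rng.sample(range(8), 8)  # uniform permutation of {0,...,7}
        if is_admissible(sigma):
            accepted += 1
            if sigma[3] == 6:
                hits += 1
    return hits / accepted, accepted

estimate, accepted = estimate_roma_porto(200_000)
print('admissible samples:', accepted, ' estimated P(A):', round(estimate, 3))
```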
Do it yourself. Write a script which computes
1. the number of admissible configurations (to check your result: you should find 3694);
2. the number of admissible configurations such that Roma plays against Porto;
3. the probability that Roma plays Porto.
NumberOfAdmissibleConfig = 0
NumberOfAdmissibleConfigRomaVsPorto = 0
for Config in Permutations(list(range(8))):
    if AdmissibleConfiguration(Config):
        NumberOfAdmissibleConfig += 1
        if Config[3] == 6:  # Porto (1st of Group #3) plays Roma (2d of Group #6)
            NumberOfAdmissibleConfigRomaVsPorto += 1
print('Number of admissible config. = ' + str(NumberOfAdmissibleConfig))
print('Number of admissible config where Roma plays Porto = ' + str(NumberOfAdmissibleConfigRomaVsPorto))
Ratio = NumberOfAdmissibleConfigRomaVsPorto / (NumberOfAdmissibleConfig + 0.0)
print('Probability that Roma plays Porto = ' + str(np.round(Ratio, 4)))

Number of admissible config. = 3694
Number of admissible config where Roma plays Porto = 502
Probability that Roma plays Porto = 0.1359

Do it yourself. Write a script which computes the $8 \times 8$ table of all probabilities.

NumberOfAdmissibleConfig = 0
MatrixOfNumbers = np.zeros([8, 8])  # MatrixOfNumbers[a,b] will be the number of config. in which the 1st of Group #a plays the 2d of Group #b
for Config in Permutations(list(range(8))):
    if AdmissibleConfiguration(Config):
        NumberOfAdmissibleConfig += 1
        # If Config is admissible then the counters of all the 8 games of Config increase by one
        for i in range(8):
            MatrixOfNumbers[i, Config[i]] += 1
# print(NumberOfAdmissibleConfig)
MatrixOfProbabilities = np.round(MatrixOfNumbers / NumberOfAdmissibleConfig, 4)
print(MatrixOfProbabilities)

[[ 0. 0.176 0.176 0. 0.1375 0.176 0.1586 0.176 ]
 [ 0. 0. 0.1727 0.183 0.1381 0.1727 0.1608 0.1727]
 [ 0.1884 0.1684 0. 0.1825 0.1364 0. 0.1576 0.1668]
 [ 0.1586 0.1478 0.1467 0. 0.1175 0.1467 0.1359 0.1467]
 [ 0.1787 0.1678 0.1665 0. 0. 0.1665 0.1538 0.1668]
 [ 0.2891 0. 0. 0.2777 0.1998 0. 0.2334 0. ]
 [ 0. 0.1722 0.1719 0.177 0.1359 0.1719 0. 0.1711]
 [ 0.1852 0.1678 0.1662 0.1798 0.1348 0.1662 0. 0. ]]

4. Bad luck for Germany
Do it yourself. Football specialists considered that a bad draw for Germany would be
Bayern Munchen vs Atletico Madrid and Dortmund vs Liverpool.
1. Write a script which computes the probability that both events occur.
2. Are these two events independent?
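Answers. We consider the events
$$E = \{\text{Bayern Munchen plays Atletico Madrid}\}, \qquad F = \{\text{Dortmund plays Liverpool}\}.$$
1. The script below gives $\mathbb{P}(E \cap F) \approx 0.0338$.
2. If $E, F$ were independent, then one would have $\mathbb{P}(E \cap F) = \mathbb{P}(E)\mathbb{P}(F)$. However, the table of probabilities gives $\mathbb{P}(E)\mathbb{P}(F) \approx 0.1787 \times 0.176 \approx 0.0315$, whereas $\mathbb{P}(E \cap F) \approx 0.0338$. The two events are therefore not independent.
(NB: As the LHS $\mathbb{P}(E \cap F)$ is greater than the RHS $\mathbb{P}(E)\mathbb{P}(F)$, one says that these two events are positively correlated. This is really bad luck for Germany.)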
NumberOfAdmissibleConfig = 3694  # computed earlier
NumberOfAdmissibleConfigBadForGermany = 0
for Config in Permutations(list(range(8))):
    if AdmissibleConfiguration(Config):
        # Bayern Munchen (1st of Group #4) vs Atletico Madrid (2d of Group #0)
        # and Dortmund (1st of Group #0) vs Liverpool (2d of Group #2)
        if Config[4] == 0 and Config[0] == 2:
            NumberOfAdmissibleConfigBadForGermany += 1
print('Number of admissible config. = ' + str(NumberOfAdmissibleConfig))
print('Number of bad config for Germany = ' + str(NumberOfAdmissibleConfigBadForGermany))
Ratio = np.round(NumberOfAdmissibleConfigBadForGermany / (NumberOfAdmissibleConfig + 0.0), 4)
print('Probability of bad luck for Germany = ' + str(Ratio))

# If these events were independent, then we would have
# P(bad luck) = P(Bayern Munchen vs Atletico Madrid) x P(Dortmund vs Liverpool)
Product = np.round(MatrixOfProbabilities[4, 0] * MatrixOfProbabilities[0, 2], 4)
print('If independence: ' + str(Product))

Number of admissible config. = 3694
Number of bad config for Germany = 125
Probability of bad luck for Germany = 0.0338
If independence: 0.0315
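The product above multiplies entries of MatrixOfProbabilities, which have already been rounded to four decimals. The self-contained sketch below redoes the comparison with exact rational arithmetic (`fractions.Fraction`), enumerating the $8!$ permutations with `itertools.permutations` instead of the recursive function. The helper names are illustrative. The conclusion (positive correlation) should not depend on the rounding.

```python
import itertools
from fractions import Fraction

# Countries of the 1st and 2nd teams, indexed by group number (from the table above)
COUNTRY_1ST = ['Germany', 'Spain', 'France', 'Portugal',
               'Germany', 'England', 'Spain', 'Italy']
COUNTRY_2ND = ['Spain', 'England', 'England', 'Germany',
               'Nederland', 'France', 'Italy', 'England']

def is_admissible(sigma):
    # No game between two teams of the same group or of the same country
    return all(sigma[i] != i and COUNTRY_1ST[i] != COUNTRY_2ND[sigma[i]]
               for i in range(8))

admissible = [s for s in itertools.permutations(range(8)) if is_admissible(s)]
total = len(admissible)
count_E = sum(1 for s in admissible if s[4] == 0)  # Bayern Munchen vs Atletico Madrid
count_F = sum(1 for s in admissible if s[0] == 2)  # Dortmund vs Liverpool
count_EF = sum(1 for s in admissible if s[4] == 0 and s[0] == 2)

# Exact probabilities under the uniform law on admissible configurations
P_E, P_F, P_EF = (Fraction(c, total) for c in (count_E, count_F, count_EF))
print('P(E and F) =', float(P_EF))
print('P(E) P(F)  =', float(P_E * P_F))
print('Positively correlated:', P_EF > P_E * P_F)
```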
Coincidences in Probability
1. The birthday paradox
We consider the following problem. Consider a group of $n \geq 2$ people, whose birthdays $X_1, \dots, X_n$ we assume to be uniformly distributed and independent in $\{1, 2, \dots, k\}$, with $k = 365$. The birthday paradox asks for the probability of the event
$$E_{n,k} = \{\text{there exist } i \neq j,\ 1 \leq i, j \leq n;\ X_i = X_j\}.$$
Obviously $\mathbb{P}(E_{n,365}) = 1$ as soon as $n > 365$ (there are more people than possible birthdays). The so-called paradox is that a high probability is reached for quite small values of $n$.
Do it yourself. Let $F_{n,k}$ be the complementary event of $E_{n,k}$.
1. Compute $\mathbb{P}(F_{1,k})$ and $\mathbb{P}(F_{2,k})$. (Justify your answer for $F_{2,k}$ carefully.)
2. Compute $\mathbb{P}(F_{n,k} \mid F_{n-1,k})$, and deduce the formulas for $\mathbb{P}(F_{n,k})$ and $\mathbb{P}(E_{n,k})$.
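Answers.
1. We obviously have $\mathbb{P}(F_{1,k}) = 1$. One writes
$$
\begin{aligned}
\mathbb{P}(F_{2,k}) &= \mathbb{P}(X_1 \neq X_2)\\
&= \sum_{i=1}^{k} \mathbb{P}(X_1 = i,\ X_1 \neq X_2) && \text{(law of total probabilities)}\\
&= \sum_{i=1}^{k} \mathbb{P}(X_1 = i,\ X_2 \neq i)\\
&= \sum_{i=1}^{k} \mathbb{P}(X_1 = i)\,\mathbb{P}(X_2 \neq i) && \text{(independence)}\\
&= \sum_{i=1}^{k} \frac{1}{k} \cdot \frac{k-1}{k} = \frac{k-1}{k} && \text{(the summand does not depend on } i\text{)}.
\end{aligned}
$$
Therefore $\mathbb{P}(F_{2,k}) = (k-1)/k$.
2. If the event $F_{n-1,k}$ occurs, then all the $X_i$'s are distinct up to $i = n-1$. Then $F_{n,k}$ occurs if and only if $X_n$ takes one of the $k-(n-1)$ remaining values. Since $X_n$ is uniform and independent of $X_1, \dots, X_{n-1}$,
$$\mathbb{P}(F_{n,k} \mid F_{n-1,k}) = \frac{k-(n-1)}{k}.$$
By induction we easily obtain that
$$\mathbb{P}(F_{n,k}) = \frac{k}{k} \times \frac{k-1}{k} \times \frac{k-2}{k} \times \cdots \times \frac{k-(n-1)}{k}$$
and $\mathbb{P}(E_{n,k}) = 1 - \mathbb{P}(F_{n,k})$.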
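The product formula can be sanity-checked by simulation. The sketch below draws $n$ independent uniform birthdays in $\{1, \dots, k\}$ many times. It compares the empirical frequency of a repeated birthday with $1 - \prod_{i=0}^{n-1} (k-i)/k$. This is a numerical check, not a proof. The function names are illustrative.

```python
import random

def exact_probability(n, k):
    # P(E_{n,k}) = 1 - k/k * (k-1)/k * ... * (k-(n-1))/k
    product = 1.0
    for i in range(n):
        product *= (k - i) / k
    return 1 - product

def simulated_probability(n, k, trials, seed=0):
    # Empirical frequency of at least two identical birthdays among n people
    rng = random.Random(seed)
    repeats = 0
    for _ in range(trials):
        birthdays = [rng.randint(1, k) for _ in range(n)]
        if len(set(birthdays)) < n:  # some birthday appears twice
            repeats += 1
    return repeats / trials

n, k = 23, 365
print(exact_probability(n, k), simulated_probability(n, k, 20_000))
```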
Do it yourself. Write a function that takes $n, k$ as inputs and returns $\mathbb{P}(E_{n,k})$.

def TwoIdenticalBirthdays(n, k):
    # returns the probability P(E_{n,k})
    Vector = np.arange(k - n + 1, k + 1)  # computes [k-n+1,...,k]
    Quotient = Vector / (k + 0.0)  # '+0.0' forces the float division
    Product = np.prod(Quotient)
    return 1 - Product

# Test: (for n=8, k=365 this should return 0.0743...)
print('For n=8, two identical birthdays with probability ' + str(TwoIdenticalBirthdays(8, 365)))

For n=8, two identical birthdays with probability 0.0743352923517
Do it yourself.
1. Plot $n \mapsto \mathbb{P}(E_{n,365})$ for $n = 2$ to $n = 100$.
2. Find the smallest $n$ such that $\mathbb{P}(E_{n,365}) \geq 3/4$.
Answers. 2) There is more than a 75% chance as soon as $n \geq 32$. (The script below, for Question 2, uses the threshold 0.999 instead of 3/4. It shows that the probability exceeds 0.999 as soon as $n \geq 70$.)

# Question 1
BirthdayParadox = [TwoIdenticalBirthdays(n, 365) for n in range(2, 100, 3)]
plt.plot(range(2, 100, 3), BirthdayParadox, 'o-')
plt.xlabel('Size $n$ of the group')
plt.ylabel('Probability')
plt.title('Probability in the birthday paradox')
plt.show()

# Question 2
BirthdayParadox = [TwoIdenticalBirthdays(n, 365) for n in range(1, 365)]  # BirthdayParadox[i] = P(E_{i+1,365})
i = 1
while BirthdayParadox[i] < 0.999:
    i = i + 1
# Now P(E_{i,365}) < 0.999 <= P(E_{i+1,365})
print('-----------------')
print('Question 2')
print('For n = ' + str(i) + ', we have ' + str(TwoIdenticalBirthdays(i, 365)) + ' chance of 2 identical birthdays')
print('For n = ' + str(i + 1) + ', we have ' + str(TwoIdenticalBirthdays(i + 1, 365)) + ' chance of 2 identical birthdays')
print('-----------------')

-----------------
Question 2
For n = 69, we have 0.998963666308 chance of 2 identical birthdays
For n = 70, we have 0.999159575965 chance of 2 identical birthdays
-----------------
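The Question 2 script above looks for the threshold 0.999. The sketch below handles any threshold, including the $3/4$ asked in Question 2. It scans $n = 1, 2, \dots$ and stops at the first $n$ with $\mathbb{P}(E_{n,365}) \geq$ threshold, updating $\mathbb{P}(F_{n,k})$ with the recursion $\mathbb{P}(F_{n,k}) = \mathbb{P}(F_{n-1,k}) \cdot (k-(n-1))/k$. The function name `smallest_group_size` is illustrative.

```python
def smallest_group_size(threshold, k=365):
    # Smallest n such that P(E_{n,k}) >= threshold (assumes 0 <= threshold <= 1)
    n = 1
    prob_all_distinct = 1.0  # P(F_{1,k}) = 1
    while 1 - prob_all_distinct < threshold:
        n += 1
        prob_all_distinct *= (k - (n - 1)) / k  # P(F_{n,k}) = P(F_{n-1,k}) (k-(n-1))/k
    return n

print(smallest_group_size(0.75))   # threshold of Question 2
print(smallest_group_size(0.999))  # threshold used in the script above
```

The loop always terminates, because $\mathbb{P}(F_{k+1,k}) = 0$. The first call returns 32, in agreement with the answer above. The second returns 70, in agreement with the printed output.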
Bonus: 2. The lottery coincidence
(Inspired by The North-Carolina Lottery Coincidence, Leonard Stefanski, https://www4.stat.ncsu.edu/~stefanski/NC%20Lottery%20Coincidence.pdf.)
On July 9th, 2007, the North Carolina Cash 5 lottery numbers came up 4, 21, 23, 34, 39. Two days later (the lottery runs every day), the same five numbers came up again. This seems very unlikely; the aim of this exercise is to show that it is not that extraordinary.
The rules of Cash 5 are the following: every day five distinct numbers are picked uniformly (order does not matter) between 1 and 39. More formally, at each drawing we are given a random variable $X$ uniform in the set $C_5$, where $C_5$ is the set of all the $\binom{39}{5} = 575757$ combinations.
As a warm-up we will first estimate the probability that the same combination is picked twice in two days (instead of twice in three days).
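For the record, the number of combinations works out as follows:

$$\binom{39}{5} = \frac{39 \cdot 38 \cdot 37 \cdot 36 \cdot 35}{5!} = \frac{69\,090\,840}{120} = 575\,757.$$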
Do it yourself. Let $X_1, X_2, \dots$ be a sequence of independent random variables uniform in $C_5$. Put $k = \mathrm{card}(C_5) = \binom{39}{5}$.
Let $A_n$ denote the event
$$A_n = \{X_n \neq X_{n+1}\}.$$
1. Compute $\mathbb{P}(A_1)$.
2. Compute $\mathbb{P}(A_{n-1} \mid A_1 \cap A_2 \cap \cdots \cap A_{n-2})$.
3. Let $n \geq 2$; compute the probability $p_n$ that in $n$ days there are no two consecutive drawings which are identical.
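Answers.
1. This is the birthday paradox with $n = 2$: $\mathbb{P}(A_1) = \frac{k-1}{k}$.
2. Assume that the events $A_1, A_2, \dots, A_{n-2}$ occur. The same reasoning as before shows that, no matter the values of $X_1, \dots, X_{n-1}$, the event $A_{n-1} = \{X_{n-1} \neq X_n\}$ occurs with probability $(k-1)/k$:
$$\mathbb{P}(A_{n-1} \mid A_1 \cap A_2 \cap \cdots \cap A_{n-2}) = \frac{k-1}{k}.$$
3. We have that
$$p_n = \mathbb{P}(A_1 \text{ and } A_2 \text{ and } A_3 \text{ and } \dots \text{ and } A_{n-1}).$$
By induction,
$$p_n = \left(\frac{k-1}{k}\right)^{n-1}.$$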
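With $k = 575757$, a repetition is too rare to observe in a short simulation. The sketch below therefore checks $p_n = ((k-1)/k)^{n-1}$ on a hypothetical small lottery with $k = 10$ combinations, a value chosen only to make repetitions frequent. The drawings are modeled as uniform integers in $\{0, \dots, k-1\}$, and the function names are illustrative.

```python
import random

def exact_p(n, k):
    # p_n = ((k-1)/k)^(n-1): no two consecutive identical drawings in n days
    return ((k - 1) / k) ** (n - 1)

def simulated_p(n, k, trials, seed=0):
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        draws = [rng.randrange(k) for _ in range(n)]  # X_1, ..., X_n uniform
        if all(draws[i] != draws[i + 1] for i in range(n - 1)):  # A_1, ..., A_{n-1}
            successes += 1
    return successes / trials

n, k = 5, 10
print(exact_p(n, k), simulated_p(n, k, 50_000))
```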
Do it yourself. In the cell below, write a function NotTwoConsecutiveIdenticalDrawings(n) which takes $n$ as input and returns $p_n$.
k = 575757 + 0.0  # Number of combinations

def NotTwoConsecutiveIdenticalDrawings(n):
    # returns the probability p_n
    return (1 - 1 / k) ** (n - 1)

# Test: for n=10000 this should return 0.98278...
print(NotTwoConsecutiveIdenticalDrawings(10000))

0.9827832155669857

Fine: the probability of having two identical successive drawings is indeed very small as long as, say, $n \leq 10000$ (about 27 years of daily lotteries).
We now turn to the actual lottery problem: estimating the probability that the same drawing occurs twice in three days.
Do it yourself. Let $B_n$ denote the event
$$B_n = \{X_n, X_{n+1}, X_{n+2} \text{ are all distinct}\}.$$
1. Write $\mathbb{P}(B_1)$ in terms of $k = \binom{39}{5}$.
2. Compute $\mathbb{P}(B_{n-2} \mid B_1 \cap B_2 \cap \cdots \cap B_{n-3})$.
3. Deduce from the above the probability $q_n$ that, in $n \geq 3$ days, the same drawing never appears twice within three consecutive days.
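Answers.
1. This is just the birthday paradox for $n = 3$:
$$\mathbb{P}(B_1) = \frac{(k-1)(k-2)}{k^2}.$$
2. Assume that $B_1, B_2, \dots, B_{n-3}$ occur. In particular, $B_{n-3}$ occurs, i.e. $X_{n-3}, X_{n-2}, X_{n-1}$ are all distinct. Therefore $B_{n-2}$ occurs if and only if $X_n \neq X_{n-1}$ and $X_n \neq X_{n-2}$, which happens with probability $(k-2)/k$. Finally,
$$\mathbb{P}(B_{n-2} \mid B_1 \cap B_2 \cap \cdots \cap B_{n-3}) = \frac{k-2}{k}.$$
3. We have that $q_n = \mathbb{P}(B_1 \cap \cdots \cap B_{n-2})$. Let us prove by induction that for every $n \geq 3$,
$$q_n = \frac{(k-1)(k-2)}{k^2}\left(\frac{k-2}{k}\right)^{n-3}.$$
For $n = 3$ this is Question 1. For $n \geq 4$,
$$
\begin{aligned}
q_n &= \mathbb{P}(B_1 \cap \cdots \cap B_{n-2})\\
&= \mathbb{P}(B_1 \cap \cdots \cap B_{n-2} \mid B_1 \cap \cdots \cap B_{n-3})\,\mathbb{P}(B_1 \cap \cdots \cap B_{n-3})\\
&= \frac{k-2}{k}\, q_{n-1}\\
&= \frac{k-2}{k} \cdot \frac{(k-1)(k-2)}{k^2}\left(\frac{k-2}{k}\right)^{n-4} = \frac{(k-1)(k-2)^{n-2}}{k^{n-1}}.
\end{aligned}
$$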
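The closed form $q_n = (k-1)(k-2)^{n-2}/k^{n-1}$ can be checked by simulation on a hypothetical small lottery with $k = 10$ combinations. This value is assumed only so that repetitions within three days are frequent, whereas with $k = 575757$ they are too rare to observe. The sketch tests every window of three consecutive drawings, i.e. the events $B_1, \dots, B_{n-2}$. The function names are illustrative.

```python
import random

def exact_q(n, k):
    # q_n = (k-1)(k-2)^(n-2) / k^(n-1), valid for n >= 3
    return (k - 1) * (k - 2) ** (n - 2) / k ** (n - 1)

def simulated_q(n, k, trials, seed=0):
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        draws = [rng.randrange(k) for _ in range(n)]  # X_1, ..., X_n uniform
        # B_1, ..., B_{n-2}: each window of three consecutive drawings has distinct values
        if all(len({draws[i], draws[i + 1], draws[i + 2]}) == 3 for i in range(n - 2)):
            successes += 1
    return successes / trials

n, k = 5, 10
print(exact_q(n, k), simulated_q(n, k, 50_000))
```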
Do it yourself. Write a function NotTwoIdenticalDrawingsInThreeDays(n) which takes $n$ as input and returns $q_n$. To check your result:
np.round(NotTwoIdenticalDrawingsInThreeDays(10000),4)
0.9659

k = 575757 + 0.0  # Number of combinations

def NotTwoIdenticalDrawingsInThreeDays(n):
    # returns the probability q_n = (k-1)(k-2)^(n-2) / k^(n-1)
    return ((k - 1) / k) * ((k - 2) / k) ** (n - 2)

# Test: for n=10000 this should return 0.9659
print(np.round(NotTwoIdenticalDrawingsInThreeDays(10000), 4))
Do it yourself. Assume that lotteries similar to the Cash 5 Lottery take place 1200 times in five years, in 20 cities across the US.
Compute the probability that, during these five years, there is at least one lottery in which the same drawing occurs twice within three consecutive days. Write the answer in terms of NotTwoIdenticalDrawingsInThreeDays and compute its numerical value.
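Answers. We want to compute (the lotteries of the 20 cities being independent)
$$
\begin{aligned}
\mathbb{P}\Big(\bigcup_{c:\ \text{city}} \{\text{Two equal drawings in three consecutive days in } c \text{ in 1200 days}\}\Big)
&= 1 - \mathbb{P}\Big(\bigcap_{c:\ \text{city}} \{\text{No two equal drawings in three consecutive days in } c\}\Big)\\
&= 1 - \prod_{c:\ \text{city}} \mathbb{P}(\text{No two equal drawings in three consecutive days in } c)\\
&= 1 - (q_{1200})^{20}.
\end{aligned}
$$
According to the script below, the answer is almost 8%.

1-(NotTwoIdenticalDrawingsInThreeDays(1200))**20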