-
Chapter 5.
JOINT DISTRIBUTIONS
5.1 Introduction
In this chapter we look at the simultaneous (joint) behaviour of two or more rvs.
Example: We measure height and weight of randomly selected individuals.
[Scatter plot of height (ht, vertical axis) against weight (wt, horizontal axis): the points show a clear positive association.]
Clearly the two rvs representing height and weight are linked.
-
5.2 Discrete rvs: joint probability function
Let X have range x1, x2, . . . , xn, and let Y have range y1, y2, . . . , ym.
DEFINITION: The joint pf of X and Y is defined as the function
Pr(X = x ∩ Y = y),
for x = x1, x2, . . . , xn and y = y1, y2, . . . , ym. It is a function of x and y.
Notation: the joint pf is often written pX,Y(x, y).
SIMPLE EXAMPLE

              X
          1     2     3     4
     1   0.05  0.05  0.05  0.05
Y    2   0.05  0.10  0.15  0.20
     3   0.05  0.00  0.10  0.15

Entries in the body of the table give the joint pf of X and Y.
-
Marginal Distributions
The joint pf gives full information about the joint behaviour of X and Y. But X by itself is just a discrete random variable, so it has a pf Pr(X = x), x = 1, 2, 3, 4.
Example

              X
          1     2     3     4
     1   0.05  0.05  0.05  0.05  0.20
Y    2   0.05  0.10  0.15  0.20  0.50
     3   0.05  0.00  0.10  0.15  0.30
         0.15  0.15  0.30  0.40
To obtain the marginal pf of X, sum the joint pf over all values of Y, since

Pr(X = x) = ∑_{y=1}^{m} Pr(X = x ∩ Y = y).
This gives the marginal distribution of X.
The pf of Y is obtained similarly.
Exercise: Find E(X) and E(Y ).
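A quick numerical check of this exercise: the sketch below uses Python with numpy, with rows of the array indexing y = 1, 2, 3 and columns indexing x = 1, 2, 3, 4 (that layout is just a convenient choice). It sums the joint table to get the marginal pfs and then the two means.

```python
import numpy as np

# Joint pf from the table: rows are y = 1, 2, 3; columns are x = 1, 2, 3, 4.
p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])
x = np.array([1, 2, 3, 4])
y = np.array([1, 2, 3])

p_x = p.sum(axis=0)       # marginal pf of X: sum over y -> [0.15, 0.15, 0.30, 0.40]
p_y = p.sum(axis=1)       # marginal pf of Y: sum over x -> [0.20, 0.50, 0.30]

E_x = np.sum(x * p_x)     # E(X) = 2.95
E_y = np.sum(y * p_y)     # E(Y) = 2.10
print(p_x, p_y, E_x, E_y)
```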
-
Conditional Distributions
Consider now
Pr(X = x | Y = y) = Pr(X = x ∩ Y = y) / Pr(Y = y).
For fixed y, this function of x gives the
conditional distribution of X given Y = y .
Extract from the table above:
x                       1     2     3     4
Pr(X = x ∩ Y = 2)    0.05  0.10  0.15  0.20
Pr(X = x | Y = 2)     0.10  0.20  0.30  0.40
Conditional distributions share the properties
of probability distributions. Note in particular
that these conditional probabilities are in the
range [0, 1], and that they sum to 1 over the
range of X.
Exercise: Find E(X | Y = 2).
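The conditional case can be checked in the same way: divide the y = 2 row of the joint table by Pr(Y = 2) and take the weighted sum. A short numpy sketch (same array layout as before) follows.

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])   # joint pf, rows y = 1..3, cols x = 1..4
x = np.array([1, 2, 3, 4])

joint_row = p[1]                   # Pr(X = x ∩ Y = 2)
p_y2 = joint_row.sum()             # Pr(Y = 2) = 0.50
cond = joint_row / p_y2            # Pr(X = x | Y = 2) = [0.1, 0.2, 0.3, 0.4]
E_x_given_y2 = np.sum(x * cond)    # E(X | Y = 2) = 3.0
print(cond, E_x_given_y2)
```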
-
Conditional and marginal distributions
We have seen that
Pr(X = x) = ∑_y Pr(X = x ∩ Y = y).
Also, by definition,
Pr(X = x | Y = y) = Pr(X = x ∩ Y = y) / Pr(Y = y).
Together, these give the useful result
Pr(X = x) = ∑_y Pr(X = x | Y = y) Pr(Y = y).
These concepts extend to joint distributions:
• of more than 2 rvs,
• of continuous rvs.
We consider the case of continuous rvs later: we will need to use integration instead of summation.
-
Example: Poisson and binomial
Seeds of a particular plant species fall at
random, so that the number Y in a particular
area has a Poisson distribution with some
mean µ.
For each seed, independent of all others, the
probability of germinating is p.
Find the distribution of the number X of
seeds that germinate; that is, calculate
Pr(X = x) , for x = 0,1, . . ..
Information provided. We are told:
(i): Y ∼ Poisson(µ), so that
Pr(Y = y) = e^{−µ} µ^y / y!,   y = 0, 1, 2, . . .
(ii): Given that Y = y, X ∼ B(y, p) ; so that
Pr(X = x | Y = y) = (y choose x) p^x (1 − p)^{y−x},   x = 0, 1, . . . , y.
-
We need to calculate Pr(X = x), the marginal distribution of X. Noting that y ≥ x, and that, therefore,

Pr(X = x) = ∑_{y=x}^{∞} Pr(X = x | Y = y) Pr(Y = y),

we obtain

Pr(X = x) = ∑_{y=x}^{∞} (y choose x) p^x (1 − p)^{y−x} · e^{−µ} µ^y / y!

= (e^{−µ} p^x / x!) ∑_{y=x}^{∞} (1 − p)^{y−x} µ^y / (y − x)!

[substitute z = y − x]

= (e^{−µ} p^x µ^x / x!) ∑_{z=0}^{∞} (1 − p)^z µ^z / z!

= (e^{−µ} (pµ)^x / x!) e^{(1−p)µ} = e^{−pµ} (pµ)^x / x! .
Hence X ∼ Poisson(pµ) .
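This thinning result can also be illustrated by simulation; a minimal sketch in Python/numpy follows, where the values of µ and p are arbitrary illustrations.

```python
import numpy as np

# Monte Carlo check of the thinning result: Y ~ Poisson(mu), X | Y = y ~ B(y, p)
# should give X ~ Poisson(p * mu).
rng = np.random.default_rng(0)
mu, p, reps = 4.0, 0.3, 200_000

y = rng.poisson(mu, size=reps)     # number of seeds falling in the area
x = rng.binomial(y, p)             # number germinating, given y

print(x.mean(), x.var())           # both should be close to p * mu = 1.2
```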
-
5.3 Continuous rvs: joint pdf
Instead of a joint probability function (pf) Pr(X = x ∩ Y = y), we obtain a joint probability density function (pdf).
Recall: for a single random variable X, we view the pdf as giving the probability that X is close to some value x; that is,

Pr(x < X ≤ x + g) ≈ g fX(x).

We extend this idea here to the case of two variables X and Y. We define the joint pdf of X and Y informally as the function fX,Y(x, y) such that the probability of the event

(x < X ≤ x + g) ∩ (y < Y ≤ y + h)

is approximately g · h · fX,Y(x, y).
Notes
Note 1. Formally, the joint pdf is defined by differentiating a joint cdf Pr(X ≤ x ∩ Y ≤ y) partially with respect to x and y.
-
Note 2. Joint pdfs have behaviour analogous
to that of joint pfs. For example, we have the
following:
(A). ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1.
(B). Marginal distributions:
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy
(C). Conditional distributions:
fX|Y(x | y) = fX,Y(x, y) / fY(y)
In the discrete case, we noted the result
Pr(X = x) = ∑_y Pr(X = x | Y = y) Pr(Y = y).
For continuous rvs, (B) and (C) combine to
give the equivalent result
fX(x) = ∫_{−∞}^{∞} fX|Y(x | y) fY(y) dy.
-
5.4 Expectation
In Chapter 3, we used the pf to give a
complete description of the behaviour of a
discrete rv.
In Chapter 4, we used the pdf to give a
complete description of the behaviour of a
continuous rv.
To summarise the most important features
of the behaviour of any rv (whether discrete
or continuous), we used the mean and
variance. We defined:
mean (µ): E(X)
variance (σ²): E{(X − µ)²}
These are both expectations of functions of
X. We can also make use of the concept of
expectation to summarise the joint behaviour
of two rvs X and Y .
-
We first extend the definition of expectation
to cover a function g(X, Y ) of two rvs, not
just a function of a single rv.
Definition: If g(X, Y ) is a scalar function of X
and Y , then we define
E{g(X, Y)} = ∑_x ∑_y g(x, y) Pr(X = x ∩ Y = y)
if X and Y are discrete, and
E{g(X, Y)} = ∫∫ g(x, y) fX,Y(x, y) dx dy
if X and Y are continuous.
-
Example:

              X
          1     2     3     4
     1   0.05  0.05  0.05  0.05  0.20
Y    2   0.05  0.10  0.15  0.20  0.50
     3   0.05  0.00  0.10  0.15  0.30
         0.15  0.15  0.30  0.40
For g(X, Y) = 2X + 5Y, we obtain:

E(2X + 5Y) = (7 × 0.05) + (9 × 0.05) + · · · + (14 × 0.10) + · · · + (23 × 0.15) = 16.4.
Similarly, for g(X, Y) = XY, we obtain:

E(XY) = (1 × 0.05) + (2 × 0.05) + · · · + (4 × 0.10) + · · · + (12 × 0.15) = 6.35.
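Both expectations can be checked by summing g(x, y) against the joint table; a short Python/numpy sketch (grid layout as in the earlier checks) follows.

```python
import numpy as np

p = np.array([[0.05, 0.05, 0.05, 0.05],
              [0.05, 0.10, 0.15, 0.20],
              [0.05, 0.00, 0.10, 0.15]])   # joint pf, rows y = 1..3, cols x = 1..4
x = np.array([1, 2, 3, 4])
y = np.array([1, 2, 3])
X, Y = np.meshgrid(x, y)                   # grids of x and y values matching p

E_lin = np.sum((2 * X + 5 * Y) * p)        # E(2X + 5Y) = 16.4
E_xy = np.sum(X * Y * p)                   # E(XY) = 6.35
print(E_lin, E_xy)
```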
The concept of expectation is very powerful, especially when used in the context of the joint behaviour of two or more rvs.
-
Expectation is a linear operator
We now show that, for any function g(X) of
a discrete rv X,
E{ag(X) + b} = aE{g(X)}+ b .
For brevity, we write pX(x) for Pr(X = x) .
Now
E{ag(X) + b} = ∑ {ag(x) + b} pX(x)

= a ∑ g(x) pX(x) + b ∑ pX(x)

= aE{g(X)} + b.
A similar proof holds when X is continuous.
Hence, for scalar problems, expectation acts
as a linear operator.
It is also a linear operator when applied to functions of several rvs.
-
For any rvs X and Y and constants a and b,
E(aX + bY ) = aE(X) + bE(Y ).
Proof (discrete case): We again write Pr(X = x ∩ Y = y) = pX,Y(x, y); also Pr(X = x) = pX(x), Pr(Y = y) = pY(y). Now
E(aX + bY)

= ∑_x ∑_y (ax + by) pX,Y(x, y)

= ∑_x ∑_y ax pX,Y(x, y) + ∑_x ∑_y by pX,Y(x, y)

= a ∑_x x {∑_y pX,Y(x, y)} + b ∑_y y {∑_x pX,Y(x, y)}

= a ∑_x x pX(x) + b ∑_y y pY(y)

= aE(X) + bE(Y).
Note that the operator is again linear.
We return to this topic later.
-
5.5 Covariance
Now reconsider the plot of height (Y ) against
weight (X).
[Scatter plot of height y (vertical axis) against weight x (horizontal axis): the points show a clear positive association.]
When X is relatively large, so is Y. So if, for an individual, {X − E(X)} is positive, {Y − E(Y)} is also likely to be positive.
Similarly, when {X − E(X)} is negative, {Y − E(Y)} is also likely to be negative.
So the product {X − E(X)}{Y − E(Y)} is likely to be positive.
-
We can summarise the link between X and Y
by examining the product
{X − E(X)}{Y − E(Y )}
In particular, we consider the expectation of
this function.
DEFINITION: The covariance between rvs
X and Y , written as Cov(X, Y ) , is defined as
Cov(X, Y ) = E [{X − E(X)}{Y − E(Y )}] .
If Cov(X, Y ) = 0 , then X and Y are said to
be uncorrelated.
The correlation between X and Y is positive
if Cov(X, Y ) > 0. We then say that they are
positively correlated.
They are negatively correlated if
Cov(X, Y ) < 0.
-
Calculation of covariance
Recall that the variance Var(X) of a rv X is
defined as Var(X) = E[{X − E(X)}2] .
However, we proved in §3.3 that we can write Var(X) = E(X²) − {E(X)}².
There is, similarly, an easier way to calculate
a covariance.
Theorem:
Cov(X, Y ) = E(XY )− E(X)E(Y ).
Proof
Cov(X, Y) = E[{X − E(X)}{Y − E(Y)}]
= E[XY − E(X)Y − XE(Y) + E(X)E(Y)]
= E(XY) − E(X)E(Y) − E(X)E(Y) + E(X)E(Y)   (using linearity, since E(X) and E(Y) are constants)
= E(XY) − E(X)E(Y).
-
Example:

              X
          1     2     3     4
     1   0.05  0.05  0.05  0.05  0.20
Y    2   0.05  0.10  0.15  0.20  0.50
     3   0.05  0.00  0.10  0.15  0.30
         0.15  0.15  0.30  0.40
We can obtain E(X) and E(Y ) easily: for
example
E(Y ) = (1× 0.2)+ (2× 0.5)+ (3× 0.3) = 2.1 .
Similarly, we can show that E(X) = 2.95 .
We showed earlier that E(XY ) = 6.35 .
Hence Cov(X, Y ) = 6.35− 2.95× 2.1 = 0.155.
Link between the concepts of covariance and correlation: important in due course, but not covered in this module.
-
5.6 Independent Random Variables
Recall that events A and B are independent
if Pr(A ∩ B) = Pr(A) × Pr(B). We extend the concept in a natural way.
Definition: Random variables X and Y are
statistically independent if any event relating
to X alone is independent of any event
relating to Y alone.
For example, if X and Y are independent,
Pr{(X ≤ 3)∩(Y ≥ 6)} = Pr(X ≤ 3)×Pr(Y ≥ 6).
As a consequence, for discrete rvs
Pr(X = x ∩ Y = y) = Pr(X = x)Pr(Y = y)
and for continuous rvs
fX,Y (x, y) = fX(x)fY (y).
That is, the joint pf (discrete) or pdf
(continuous) factorises into terms involving x
and y separately.
-
Initial Example, revisited
Original - NOT INDEPENDENT

              X
          1     2     3     4
     1   0.05  0.05  0.05  0.05  0.20
Y    2   0.05  0.10  0.15  0.20  0.50
     3   0.05  0.00  0.10  0.15  0.30
         0.15  0.15  0.30  0.40
Revised - INDEPENDENT

              X
           1      2      3     4
     1   0.030  0.030  0.06  0.08  0.20
Y    2   0.075  0.075  0.15  0.20  0.50
     3   0.045  0.045  0.09  0.12  0.30
         0.15   0.15   0.30  0.40
For example, note that, in the second table,
Pr({X = 4} ∩ {Y = 1}) = 0.40× 0.20 = 0.08 .
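Indeed, the whole revised table is just the outer product of the two marginal pfs; a short numpy sketch (layout as in the earlier examples) illustrates this.

```python
import numpy as np

p_x = np.array([0.15, 0.15, 0.30, 0.40])   # marginal pf of X
p_y = np.array([0.20, 0.50, 0.30])         # marginal pf of Y

joint = np.outer(p_y, p_x)                 # Pr(X = x) * Pr(Y = y) for every cell
print(joint)                               # reproduces the revised table above
print(joint[0, 3])                         # Pr(X = 4 ∩ Y = 1) = 0.08
```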
-
Independence and covariance
If two random variables are independent, then
several aspects of their expectations and
variances simplify.
In particular, if X and Y are independent rvs,
then E(XY ) = E(X)E(Y ) .
Proof: (discrete case)
E(XY) = ∑_x ∑_y xy Pr(X = x ∩ Y = y)

= ∑_x ∑_y xy Pr(X = x) Pr(Y = y)

= (∑_x x Pr(X = x)) (∑_y y Pr(Y = y))

= E(X)E(Y).
Hence, if X and Y are independent,
Cov(X, Y ) = 0 . The rvs are also
uncorrelated.
-
5.7 Applications of Expectation
Reminders: For a rv X with pdf fX(x), the expectation E{g(X)} of the function g(X) is defined as

E{g(X)} = ∫_{−∞}^{∞} g(x) fX(x) dx; or

= ∑_x g(x) Pr(X = x) (for a discrete rv).
We have seen that
• Expectation is linear, so that E{ag(X) + b} = aE{g(X)} + b and E(aX + bY) = aE(X) + bE(Y).
• The covariance between X and Y can be calculated as Cov(X, Y) = E(XY) − E(X)E(Y).
• If X and Y are independent random variables, Cov(X, Y) = 0.
There are several natural extensions of these results.
-
The result
E(aX + bY ) = aE(X) + bE(Y )
extends to the weighted sum of any number
of rvs.
If X1, X2, . . . , Xm are rvs and a1, a2, . . . , am are constants, then

E(∑_{i=1}^{m} ai Xi) = ∑_{i=1}^{m} ai E(Xi).
In the result above, the coefficients a1, a2 etc
must be constants.
Note that, in general, E(XY) ≠ E(X)E(Y).
-
Results for variances
Recall that the expectation of a linear function aX + b of a random variable X is given by

E(aX + b) = aE(X) + b.

The equivalent result for the variance of a linear function is: If X is a rv, and a and b are constants, then Var(aX + b) = a²Var(X).
Proof: Let Z = aX + b; then

E(Z) = aE(X) + b.

Now, by definition, Var(Z) = E[{Z − E(Z)}²]. We also know that

Z − E(Z) = (aX + b) − (aE(X) + b) = a{X − E(X)}.

So Var(Z) = E[a²{X − E(X)}²] = a²E[{X − E(X)}²] = a²Var(X).
-
Variances of sums of rvs:
For any two rvs X and Y ,
Var(X + Y ) = Var(X)+Var(Y )+2Cov(X, Y ).
Proof: Write µx = E(X) and µy = E(Y ).
Var(X + Y) = E[{(X + Y) − (µx + µy)}²]

= E[{(X − µx) + (Y − µy)}²]

= E[(X − µx)² + (Y − µy)² + 2(X − µx)(Y − µy)]

= Var(X) + Var(Y) + 2Cov(X, Y).
Combining this with the previous result
Var(aX + b) = a2Var(X)
gives the following:
If X and Y are two rvs and a and b are constants, then

Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y).
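A simulation sketch can be used to check this formula. The choice of correlated Normal variables below is arbitrary; any pair of rvs with finite variances would do.

```python
import numpy as np

# Simulation check of Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y).
rng = np.random.default_rng(1)
a, b, reps = 2.0, -3.0, 500_000

z = rng.standard_normal((reps, 2))
x = z[:, 0]
y = 0.6 * z[:, 0] + 0.8 * z[:, 1]      # so Cov(X, Y) = 0.6 and Var(Y) = 1

lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y)[0, 1]
print(lhs, rhs)                        # the two agree up to sampling error
```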
-
Special case: difference between two rvs
Substituting a = 1, b = −1, we obtain
Var(X − Y ) = Var(X) + Var(Y )− 2Cov(X, Y )
[Compare:
Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ) .]
Further Extension: For any jointly distributed
rvs X1, X2, . . . , Xn:
Var(X1 + X2 + · · · + Xn)

= ∑_{i=1}^{n} Var(Xi) + 2 ∑_{i=1}^{n−1} ∑_{j=i+1}^{n} Cov(Xi, Xj).
That is: the variance of the sum of rvs is the
sum of their variances, plus twice the sum of
all the covariances.
-
Special case: Independent rvs
For independent rvs X1, X2, . . . , Xn:
Var(∑_{i=1}^{n} Xi) = ∑_{i=1}^{n} Var(Xi),

and

Var(∑_{i=1}^{n} ai Xi) = ∑_{i=1}^{n} ai² Var(Xi).
For independent rvs only:
Variance of a sum = sum of the variances.
For all rvs:
Expectation of a sum = sum of the
expectations.
-
5.8 Applications to Sampling Problems
In statistical work we often select a set of
observations, a random sample, from some
distribution.
This is used to make inferences about the
features of the distribution, e.g. its mean and
variance.
Suppose that X1, X2, . . . , Xn are independent
and identically distributed, each with mean µ
and variance σ2.
Typically µ and σ are unknown, and we will
wish to use the information in the sample to
estimate them.
-
Estimating the mean, µ
Suppose that just the first value, X1, is used, ignoring all the rest. It is clear that E(X1) = µ. On average, X1 is neither too high nor too low.
How close can X1 be expected to be to the correct value µ? What is the ‘error’ in this estimate?
On any particular occasion, we will not know the value of X1 − µ, but we can assess its likely size by finding Var(X1). (We have already denoted this by σ².)
We will compare X1 with another estimator of µ: the sample mean

Y = (X1 + X2 + · · · + Xn) / n.
It is clear that Y makes better use of the data than X1 does. But can we prove that Y is a better estimator of µ than X1 is?
-
Consider the mean and variance of Y .
E(Y) = (1/n)µ + (1/n)µ + · · · + (1/n)µ = nµ/n = µ.

On average, Y is neither too high nor too low. What is the ‘error’ in this estimate? For Var(Y) we use the result:
Var(∑ ai Xi) = ∑ ai² Var(Xi),

valid when the rvs X1, . . . , Xn are independent. But Y is defined as Y = (X1 + X2 + · · · + Xn)/n, so a1 = · · · = an = 1/n.
Hence Var(Y) = (1/n²)σ² + (1/n²)σ² + · · · + (1/n²)σ² = nσ²/n² = σ²/n.

But Var(X1) = σ². So, for n > 1,

Var(Y) = σ²/n < Var(X1).
On average Y will be closer to µ than X1 will: it is a better estimator of µ.
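A simulation sketch illustrates the comparison; the Normal distribution and the parameter values used below are arbitrary choices.

```python
import numpy as np

# Repeated samples of size n from a distribution with mean mu and variance sigma^2.
# Compare the spread of the single observation X1 with that of the sample mean.
rng = np.random.default_rng(2)
mu, sigma, n, reps = 10.0, 3.0, 25, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
x1 = samples[:, 0]               # estimator using only the first observation
ybar = samples.mean(axis=1)      # the sample mean

print(x1.var(), sigma**2)        # close to sigma^2 = 9
print(ybar.var(), sigma**2 / n)  # close to sigma^2 / n = 0.36
```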
-
Notes:
1. The sample mean is usually denoted by a bar over the symbol, e.g.,

X̄ = (1/n)(X1 + X2 + · · · + Xn).
2. We show in §5.9 that linear combinations of Normal rvs are themselves Normal. If X1, X2, . . . , Xn are mutually independent, and if they are all N(µ, σ²), then

X̄ ∼ N(µ, σ²/n).

This is an important result, and is used very frequently in statistical practice.
3. When estimating the mean µ from a random sample, it is not necessarily best to use X̄. Another possible estimator is the sample median M. If X1, X2, . . . , Xn are mutually independent, and if they are all N(µ, σ²), it can be shown that

M ∼ N(µ, (π/2)(σ²/n)), approximately.
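The quoted approximation for the median can also be checked by simulation (the parameter values below are illustrative).

```python
import numpy as np

# Check of the approximation M ~ N(mu, (pi/2) * sigma^2 / n) for Normal data.
rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 1.0, 101, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
med = np.median(samples, axis=1)

print(med.var())                     # observed variance of the sample median
print((np.pi / 2) * sigma**2 / n)    # approximation: about 0.0155
```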
-
Estimating the variance
The sample mean X is usually used to
estimate µ. What can we use to estimate σ2?
Since Var(X) is defined as E{(X − µ)²}, we consider the sample version of this quantity:

(1/n) ∑_{i=1}^{n} (Xi − µ)².
However, µ is typically unknown, so cannot be used. It is sensible to consider replacing it by X̄. We therefore examine the properties of a statistic T, defined as

T = (1/n) ∑_{i=1}^{n} (Xi − X̄)²,

as an estimator for σ².
Will it be too high or too low, on average?
We need to find E(T ).
-
We start by expanding the term (Xi − X̄)². This gives

T = (1/n) ∑_{i=1}^{n} (Xi − X̄)²

= (1/n) {∑_{i=1}^{n} Xi² − 2X̄ ∑_{i=1}^{n} Xi + nX̄²}

= (1/n) ∑_{i=1}^{n} Xi² − X̄².
Since expectation is a linear operator, we
obtain
E(T) = (1/n) ∑_{i=1}^{n} E(Xi²) − E(X̄²).

For any rv (W, say),

Var(W) = E(W²) − {E(W)}², so E(W²) = Var(W) + {E(W)}².

We know E(Xi), Var(Xi), E(X̄) and Var(X̄), so we can now obtain E(Xi²) and E(X̄²).
-
We obtain

E(T) = (1/n) ∑_{i=1}^{n} [{E(Xi)}² + Var(Xi)] − [{E(X̄)}² + Var(X̄)]

= (µ² + σ²) − (µ² + σ²/n)

= ((n − 1)/n) σ².
This shows that, on average, T is a little too
small. To compensate for this, we usually use
S² = (n/(n − 1)) T = (1/(n − 1)) ∑_{i=1}^{n} (Xi − X̄)²

to estimate σ².
Since E(S²) = E((n/(n − 1)) T) = σ², S² is said to be an unbiased estimator of σ².
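A simulation sketch of the bias calculation: with numpy, T and S² correspond to var(..., ddof=0) and var(..., ddof=1) respectively (Normal data are chosen only for illustration; any distribution with variance σ² would do).

```python
import numpy as np

# Check that E(T) = (n - 1)/n * sigma^2 while E(S^2) = sigma^2.
rng = np.random.default_rng(4)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
t = samples.var(axis=1, ddof=0)    # divide by n:     the statistic T
s2 = samples.var(axis=1, ddof=1)   # divide by n - 1: the statistic S^2

print(t.mean(), (n - 1) / n * sigma**2)   # both about 3.6
print(s2.mean(), sigma**2)                # both about 4.0
```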
-
5.9 Sums of random variables
If X1, X2, . . . , Xn are independent, and
Y = X1 + X2 + · · ·+ Xn
is their sum, we know that
E(Y) = ∑_{i=1}^{n} E(Xi)
and that
Var(Y) = ∑_{i=1}^{n} Var(Xi).
But what is the distribution of Y ?
This distribution can be very complicated,
and there are several methods which can be
used to find it (seen in later courses).
Here we look at the use of the
moment generating function (mgf)
-
Moment Generating Function
DEFINITION: The mgf MX(s) of a random
variable X is defined as MX(s) = E(esX) :
hence
MX(s) = E(e^{sX})

= ∑_x e^{sx} Pr(X = x) (discrete)

= ∫_{−∞}^{∞} e^{sx} fX(x) dx (continuous).
Examples:
(1) Binomial distribution B(n, p)
MX(s) = ∑_{x=0}^{n} e^{sx} (n choose x) p^x (1 − p)^{n−x}

= ∑_{x=0}^{n} (n choose x) (pe^s)^x (1 − p)^{n−x}

= (1 − p + pe^s)^n.
-
(2) Exponential distribution, parameter λ
MX(s) = ∫_{0}^{∞} e^{sx} fX(x) dx

= ∫_{0}^{∞} e^{sx} λe^{−λx} dx

= λ/(λ − s),   s < λ (exercise 82).
Note that this integral is valid for s < λ only.
(3) Normal distribution, N(µ, σ2)
MX(s) = ∫_{−∞}^{∞} e^{sx} (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} dx

= e^{µs + σ²s²/2}.
Evaluating this integral is tricky, and beyond the scope of this module.
The key point is that if one calculates the mgf for some rv W, and finds it to be of the form e^{as + bs²/2}, then the distribution of W must be Normal; the mean will be a, and the variance will be b.
-
Some properties of mgfs
(1) Generating moments
Definition: For any rv X, E(Xr) is known as
the rth moment of X.
So the mean, E(X), is the first moment of
X, E(X2) is the second moment, and so on.
Now e^{sX} = 1 + sX + (s²/2)X² + · · · ,

and therefore

E(e^{sX}) = 1 + sE(X) + (s²/2)E(X²) + · · · .
So, if one expands MX(s) in powers of s, the coefficient of s^r/r! will be E(X^r), the rth moment of X.
In particular, the coefficient of s will be the mean, and the coefficient of s²/2 will be E(X²), from which we can calculate the variance.
-
Example: For the exponential distribution
with parameter λ, the mgf is
MX(s) = λ/(λ − s) = (1 − s/λ)^{−1}

= 1 + s/λ + s²/λ² + · · ·

= 1 + (1/λ)s + (2/λ²)(s²/2) + · · ·

Hence E(X) = 1/λ and E(X²) = 2/λ².
These values agree with those we found in
§4.5; from them we can show that
Var(X) = E(X²) − {E(X)}² = 2/λ² − (1/λ)² = 1/λ².
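The same moments can be recovered symbolically, using the standard fact that E(X^r) is the rth derivative of the mgf at s = 0 (equivalent to reading off coefficients of the expansion). A sketch with sympy, assuming that library is available, follows.

```python
import sympy as sp

# Moments from the mgf by differentiating at s = 0, for the exponential mgf above.
s, lam = sp.symbols('s lambda', positive=True)
M = lam / (lam - s)

E1 = sp.diff(M, s).subs(s, 0)       # E(X)   = 1/lambda
E2 = sp.diff(M, s, 2).subs(s, 0)    # E(X^2) = 2/lambda^2
print(E1, E2, sp.simplify(E2 - E1**2))   # variance = 1/lambda^2
```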
-
(2) Sums of independent rvs:
Let X and Y be independent rvs, and let Z = X + Y. Then

MZ(s) = E(e^{sZ}) = E{e^{s(X+Y)}} = E(e^{sX} e^{sY})

= E(e^{sX}) E(e^{sY}) (by independence)

= MX(s) MY(s).
The mgf of the sum of two independent rvs is the product of their mgfs.
The result also holds for the sum of n independent rvs:

M_{X1+X2+···+Xn}(s) = M_{X1}(s) M_{X2}(s) · · · M_{Xn}(s).

This is an important result. It gives us a relatively easy way to find the distribution of sample means – which are just sums of rvs divided by a constant.
-
Example 1: Binomial distribution
Suppose that rvs X1 and X2 are known to be independent, and that X1 ∼ B(n1, p) and X2 ∼ B(n2, p). Find the distribution of Y = X1 + X2.
MY(s) = M_{X1}(s) M_{X2}(s)

= (1 − p + pe^s)^{n1} (1 − p + pe^s)^{n2}

= (1 − p + pe^s)^{n1+n2}.
This is the mgf of the B(n1 + n2, p)
distribution. We have therefore shown that
X1 + X2 ∼ B(n1 + n2, p).
Note that the result is not valid if the probabilities of success are different. If X1 ∼ B(n1, p1) and X2 ∼ B(n2, p2), with p1 ≠ p2, then the mgf of Y = X1 + X2 will not be of the binomial form.
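A simulation sketch comparing the observed distribution of X1 + X2 with the B(n1 + n2, p) pmf; scipy is assumed to be available for the binomial pmf, and the parameter values are illustrative.

```python
import numpy as np
from scipy import stats

# Check that X1 + X2 ~ B(n1 + n2, p) when X1 ~ B(n1, p) and X2 ~ B(n2, p)
# are independent with the SAME success probability p.
rng = np.random.default_rng(5)
n1, n2, p, reps = 6, 9, 0.3, 200_000

ysim = rng.binomial(n1, p, reps) + rng.binomial(n2, p, reps)

k = np.arange(n1 + n2 + 1)
observed = np.bincount(ysim, minlength=n1 + n2 + 1) / reps
expected = stats.binom.pmf(k, n1 + n2, p)
print(np.abs(observed - expected).max())   # small, so the distributions match
```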
-
Example 2: Exponential distribution
Let X1 and X2 be independent, and let them
both have an exponential distribution with
parameter λ.
The distribution of X1 + X2 has mgf {λ/(λ − s)}².
This is not of the same form as the mgf of an
exponential distribution, so the sum of
exponential rvs does not have an exponential
distribution.
-
Example 3: Normal distribution
Suppose that rvs X1, X2, . . . , Xn are all
independent, and that they are Normally
distributed with Xi ∼ N(µi, σi²).

Find the distribution of the sum Y = ∑_{i=1}^{n} Xi.
Now the mgf of Xi is e^{µi s + σi²s²/2}, and the mgf of Y is the product of all the mgfs of the Xs.

Hence MY(s) = ∏_{i=1}^{n} e^{µi s + σi²s²/2}

That is, MY(s) = e^{∑(µi s + σi²s²/2)} = e^{s∑µi + (s²/2)∑σi²},

where all sums are over the range i = 1, . . . , n.
We see that the mgf of Y is of the form e^{as + bs²/2}, which is the form of the mgf of a Normal distribution. Therefore, Y is Normally distributed.
-
Recall that, if X ∼ N(µ, σ²) then the mgf of X will be e^{µs + σ²s²/2}. The mgf of Y is clearly of this form, so

Y ∼ N(∑_{i=1}^{n} µi, ∑_{i=1}^{n} σi²).
Sums of Normal rvs are themselves
Normally distributed.
Very few distributions have this property. The
fact that the Normal does contributes a great
deal to its importance.
-
CHAPTER 5 SUMMARY
Joint distributions
• Description of the simultaneous behaviour of two or more random variables.
• Discrete, joint pf: pX,Y(x, y); continuous, joint pdf: fX,Y(x, y)
• joint cdf
• marginal and conditional distributions
• independent rvs:
  – E[g(X)h(Y)] = E[g(X)]E[h(Y)]
  – fX,Y(x, y) = fX(x)fY(y)
• Expectation as an operator
  – expectation of a function of 2 rvs
  – linear operator: E(aX + bY + c)
  – covariance
  – mean and variance of linear transformations, especially of independent rvs
  – variance of a sum of rvs
• sampling problems
• sums of independent rvs: mgfs
-
COURSE SUMMARY p1/2
[Chapters 1 and 2]
• Experiment, event, sample space
• Union, intersection, complement
• Exclusive and exhaustive events
• Probability: axioms
• Interpretations of probability:
– symmetry,
– limiting relative frequency,
– subjective probability
• Deductions from axioms
• Sampling problems, replacement
• Conditional probability
• Independence (pairwise, mutual)
• Important theorems
– law of total probability
– Bayes’ theorem
-
COURSE SUMMARY p2/2
[Chapters 3, 4 and 5]
• Discrete and continuous rvs
  – discrete: pf, cdf
  – continuous: pdf, cdf
• Expectation and variance
• Bernoulli trials
• Important discrete distributions
  – binomial, Poisson, geometric
  – Poisson approximation to binomial
• Important continuous distributions
  – uniform, exponential, Normal
• The Normal distribution
  – standardisation: N(0,1)
  – Use of tables
  – Normal approximation to the binomial and Poisson distributions
• Joint distributions
  – Marginal and conditional distributions
  – Independent random variables
  – Use of expectation
  – Sampling and estimation
  – Sum of rvs. Moment generating fn