applied statistics - parametric distributions

Applied Statistics for Economics
3. Parametric Probability Distributions, Random Sampling, and the Law of Large Numbers

SFC - [email protected]

Spring 2012

Outline

Parametric distributions

Random sampling and sampling distribution of $\bar{Y}$

Law of large numbers and central limit theorem




Topics

The topics for this chapter are:

1. The normal, chi-squared, F, and t distributions

2. Random sampling and the distribution of the sample average

3. Large-sample approximations and laws of large numbers





Parametric distributions

The most widely used distributions in econometrics are the following:

1. Normal: $N(\mu, \sigma^2)$

2. Chi-squared: $\chi^2_m$

3. Student t: $t_m$

4. F distribution: $F_{m,n}$





Normal distribution

The normal distribution has a bell-shaped probability density. The normal density with mean $\mu$ and variance $\sigma^2$ is symmetric around its mean. It has approximately 68% of its probability mass between $\mu - \sigma$ and $\mu + \sigma$; 95% between $\mu - 2\sigma$ and $\mu + 2\sigma$; and 99.7% between $\mu - 3\sigma$ and $\mu + 3\sigma$.

The normal with mean $\mu$ and variance $\sigma^2$ is denoted as $N(\mu, \sigma^2)$. The standard normal distribution is the normal distribution with mean $\mu = 0$ and variance $\sigma^2 = 1$. It’s denoted as $N(0, 1)$.
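The 68%, 95%, and 99.7% figures can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name `mass_within` and the values of `mu` and `sigma` are illustrative assumptions, not part of the course material.

```python
from statistics import NormalDist


def mass_within(k, mu=0.0, sigma=1.0):
    """Probability that a N(mu, sigma^2) variable falls within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# The answer does not depend on mu or sigma; mu = 10 and sigma = 2 are arbitrary choices.
for k in (1, 2, 3):
    print(k, round(mass_within(k, mu=10.0, sigma=2.0), 4))
# prints 0.6827, 0.9545, and 0.9973 for k = 1, 2, 3
```

Changing `mu` or `sigma` leaves the three probabilities unchanged, which is why the rule can be stated for any normal distribution.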





Normal distribution

Random variables with a standard normal distribution are denoted as $Z$. The standard normal cumulative distribution function is denoted by $\Phi$: $\Pr(Z \le c) = \Phi(c)$, where $c$ is a constant.

The textbook tables give you the values of the standard normal cumulative distribution function. So does Excel.

If you have a normally distributed r.v. $Y$ and want to find specific probabilities using the tables, standardize it first:

$$Z = \frac{Y - \mu}{\sigma}$$





Normal distribution

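Let $Y \sim N(\mu, \sigma^2)$. Then $Z = (Y - \mu)/\sigma$ is standard normal. Let $c_1$ and $c_2$ be two numbers such that $c_1 < c_2$, and let $d_1 = (c_1 - \mu)/\sigma$ and $d_2 = (c_2 - \mu)/\sigma$. Then:

$$\Pr(Y \le c_2) = \Pr(Z \le d_2) = \Phi(d_2)$$

$$\Pr(Y \ge c_1) = \Pr(Z \ge d_1) = 1 - \Phi(d_1)$$

$$\Pr(c_1 \le Y \le c_2) = \Pr(d_1 \le Z \le d_2) = \Phi(d_2) - \Phi(d_1)$$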
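The three formulas above can be tried out with a small sketch. The example distribution $Y \sim N(1, 4)$, the cutoffs 0 and 2, and the names `Phi` and `prob_between` are assumptions chosen for illustration only.

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal cumulative distribution function


def prob_between(c1, c2, mu, sigma):
    """Pr(c1 <= Y <= c2) for Y ~ N(mu, sigma^2), computed by standardizing first."""
    d1 = (c1 - mu) / sigma
    d2 = (c2 - mu) / sigma
    return Phi(d2) - Phi(d1)


# Hypothetical example: Y ~ N(1, 4), so mu = 1 and sigma = 2.
mu, sigma = 1.0, 2.0
p_below_2 = Phi((2.0 - mu) / sigma)             # Pr(Y <= 2) = Phi(0.5)
p_above_0 = 1.0 - Phi((0.0 - mu) / sigma)       # Pr(Y >= 0) = 1 - Phi(-0.5)
p_between = prob_between(0.0, 2.0, mu, sigma)   # Phi(0.5) - Phi(-0.5)
print(round(p_below_2, 4), round(p_above_0, 4), round(p_between, 4))
```

Here `Phi` plays the role of the textbook table: every probability about $Y$ is reduced to a lookup of the standard normal c.d.f.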





Multivariate normal distribution

The normal distribution generalized to many r.v.’s is called the multivariate normal. For two, $X$ and $Y$, it’s called the bivariate normal.

If $X$ and $Y$ have a bivariate normal distribution with covariance $\sigma_{XY}$, and $a$ and $b$ are constants, then

$$aX + bY \sim N\left(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\sigma_{XY}\right)$$

Similarly, if $n$ r.v.’s have a multivariate normal distribution, then:

1. any linear combination of these variables is normally distributed,

2. the marginal distribution of each of the variables is normal, and

3. if their covariances are zero, the r.v.’s are independent.¹

¹ We said before that if two r.v.’s are independent, then their covariance is zero. We also said the converse is not necessarily true. In the special case of a joint normal distribution, the converse is true.
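As a rough check of the mean and variance of $aX + bY$, here is a simulation sketch. All parameter values, the seed, and the helper names (`linear_combo_moments`, `draw_bivariate_normal`) are assumptions for illustration; the correlated pair is built from two independent standard normal draws, which is one standard construction among several.

```python
import math
import random


def linear_combo_moments(a, b, mu_x, mu_y, var_x, var_y, cov_xy):
    """Mean and variance of aX + bY from the formula in the text."""
    mean = a * mu_x + b * mu_y
    var = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy
    return mean, var


def draw_bivariate_normal(rng, mu_x, mu_y, sd_x, sd_y, rho):
    """One (X, Y) draw with correlation rho, built from two independent N(0, 1) draws."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    x = mu_x + sd_x * z1
    y = mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
    return x, y


# Hypothetical parameter values.
a, b = 2.0, -1.0
mu_x, mu_y, sd_x, sd_y, rho = 1.0, 3.0, 1.0, 2.0, 0.5
cov_xy = rho * sd_x * sd_y

theory_mean, theory_var = linear_combo_moments(a, b, mu_x, mu_y, sd_x**2, sd_y**2, cov_xy)

rng = random.Random(0)  # fixed seed so the run is reproducible
draws = []
for _ in range(100_000):
    x, y = draw_bivariate_normal(rng, mu_x, mu_y, sd_x, sd_y, rho)
    draws.append(a * x + b * y)

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
print(theory_mean, round(sample_mean, 3))
print(theory_var, round(sample_var, 3))
```

With these values the formula gives mean $-1$ and variance $4$; the simulated moments should land close to both.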





Chi-squared

The chi-squared distribution is the distribution of the sum of $m$ squared independent standard normal r.v.’s. This distribution depends on $m$ (the ‘degrees of freedom’ of the distribution).

Let $Z_1, Z_2, Z_3$ be three independent standard normal r.v.’s. Then $Z_1^2 + Z_2^2 + Z_3^2$ has a chi-squared distribution with 3 degrees of freedom. Formally and in general, for independent standard normal $Z_1, \dots, Z_m$:

$$Z_1^2 + \cdots + Z_m^2 \sim \chi^2_m$$
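The definition can be turned directly into a simulation. The sketch below builds chi-squared draws as sums of squared standard normals and compares their sample mean and variance with $m$ and $2m$, the known mean and variance of a $\chi^2_m$ r.v. (a standard fact not stated on the slide). The seed, the number of draws, and the name `chi_squared_draw` are illustrative assumptions.

```python
import random


def chi_squared_draw(rng, m):
    """One draw built from the definition: the sum of m squared independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))


rng = random.Random(1)  # fixed seed so the run is reproducible
m = 3
draws = [chi_squared_draw(rng, m) for _ in range(50_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
print(round(sample_mean, 2), round(sample_var, 2))  # should be near m and 2m
```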





Student t distribution

The Student t distribution with $m$ degrees of freedom is defined as the distribution of the ratio of a standard normal r.v. to the square root of an independently distributed chi-squared r.v. with $m$ degrees of freedom divided by $m$.

Let $Z$ be a standard normal r.v. and $W$ a r.v. with a chi-squared distribution with $m$ degrees of freedom, and let $Z$ and $W$ be independently distributed. Then

$$\frac{Z}{\sqrt{W/m}} \sim t_m$$

The t density function has a bell shape, similar to the normal. But when $m$ is small (20 or less) the tails are fatter. With $m > 30$, the t is approximated well by the standard normal, and as $m \to \infty$ the $t_m$ distribution converges to the standard normal.
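The fatter tails can be seen in a simulation that follows the definition literally. This is a sketch under assumed settings: the cutoff of 2, the degrees of freedom 5 and 50, the seed, and the helper name `t_draw` are all illustrative choices.

```python
import math
import random
from statistics import NormalDist


def t_draw(rng, m):
    """One draw of Z / sqrt(W / m), with Z ~ N(0, 1) and W an independent chi-squared(m) draw."""
    z = rng.gauss(0.0, 1.0)
    w = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))
    return z / math.sqrt(w / m)


rng = random.Random(2)  # fixed seed so the run is reproducible
reps = 20_000
cutoff = 2.0

# Two-sided tail mass of the standard normal beyond the cutoff, for comparison.
normal_tail = 2.0 * (1.0 - NormalDist().cdf(cutoff))

tail_share = {}
for m in (5, 50):
    draws = [t_draw(rng, m) for _ in range(reps)]
    tail_share[m] = sum(abs(t) > cutoff for t in draws) / reps

print(round(normal_tail, 4), tail_share)
```

With 5 degrees of freedom, noticeably more draws fall beyond the cutoff than the standard normal predicts; with 50 degrees of freedom the share is already close to the normal figure.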





The F distribution

The F distribution with $(m, n)$ d.f. is defined as the distribution of the ratio of a chi-squared r.v. with $m$ d.f., divided by $m$, to an independently distributed chi-squared r.v. with $n$ d.f., divided by $n$.

Let $W$ be a chi-squared r.v. with $m$ d.f. and $V$ a chi-squared r.v. with $n$ d.f., where $W$ and $V$ are independently distributed. Then

$$\frac{W/m}{V/n} \sim F_{m,n}$$

When the d.f. of the denominator ($n$) increase indefinitely, $V/n$ becomes the average of an infinite number of squared standard normal r.v.’s. That average is 1, because the mean of a squared standard normal r.v. is 1 (its variance). In other words, as $n \to \infty$ the $F_{m,n}$ distribution of $(W/m)/(V/n)$ converges to the distribution of $W/m$, that is, of a $\chi^2_m$ r.v. divided by $m$. This limit is denoted $F_{m,\infty}$.
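The claim that the denominator $V/n$ settles at 1 for large $n$ can be illustrated with a short simulation. This is only a sketch: the two values of $n$, the number of draws, the seed, and the helper names are assumptions.

```python
import random


def chi_squared_over_df(rng, n):
    """One draw of V / n, where V is the sum of n squared independent N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n


def mean_and_variance(values):
    """Sample mean and (population-style) variance of a list of numbers."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


rng = random.Random(3)  # fixed seed so the run is reproducible
summary = {}
for n in (5, 500):
    draws = [chi_squared_over_df(rng, n) for _ in range(1_000)]
    summary[n] = mean_and_variance(draws)

for n, (mean, var) in summary.items():
    print(n, round(mean, 3), round(var, 4))
```

For both values of $n$ the average of $V/n$ is near 1, but its spread shrinks sharply as $n$ grows, so for large $n$ the ratio $(W/m)/(V/n)$ behaves like $W/m$ alone.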





Random sampling

Virtually all the statistical and econometric procedures we’ll use involve averages of a sample of data. That’s why we need to characterize the distribution of sample averages.

Random sampling is randomly drawing a sample from a larger population. The average of a sample is, therefore, a r.v., because it depends on the particular sample used. Since it is a random variable, the sample average has a probability distribution (the sampling distribution).

But before we talk about the average of a random sample, let’s say more about random sampling in general.





Random sampling

To say it differently, random sampling is the selection at random of $n$ objects from a population such that each member of the population is equally likely to be included in the sample.

Example: Suppose you record the length of your commute to school and the weather on a sample of days picked randomly. The population from which you draw your sample is all your commuting days. If you draw your sample randomly, each day of commute will have an equal chance to be picked.

Since the choice of days is random, learning the commuting time (or the weather) on a given sampled day won’t tell you anything about the length of commute on any other sampled day. That is, the value of the commuting time on each sampled day is an independently distributed r.v.

Let the observations in the sample be $Y_1, \dots, Y_n$. Because the days are picked randomly, the value of the r.v. on day $i$, $Y_i$, is itself random. If you pick different days, you get different values of $Y$. Because of random sampling, you can treat $Y_i$ as a r.v.: before it is sampled, $Y_i$ can have many possible values; after it is sampled, $Y_i$ has a specific value.




i.i.d.

Since $Y_1, \dots, Y_n$ are drawn randomly from the same population (e.g., commuting days), the marginal distribution of $Y_i$ is the same for each $i = 1, \dots, n$. And this marginal distribution is the marginal distribution of the population variable $Y$ being sampled. When $Y_i$ has the same marginal distribution for $i = 1, \dots, n$, then $Y_1, \dots, Y_n$ are said to be identically distributed.

And when $Y_1, \dots, Y_n$ are drawn from the same distribution and are independently distributed, they are said to be i.i.d. (independently and identically distributed).

Formally: In a simple random sample, $n$ objects are drawn at random from a population and each object is equally likely to be drawn. The value of the r.v. $Y$ for the $i$th randomly drawn object is $Y_i$. Since each object is equally likely to be drawn and the distribution of $Y_i$ is the same for all $i$, the r.v.’s $Y_1, \dots, Y_n$ are i.i.d.; that is, the distribution of $Y_i$ is the same for all $i = 1, \dots, n$ and $Y_1$ is distributed independently of $Y_2, \dots, Y_n$, etc.





Sampling distribution of the sample average

The sample average, $\bar{Y}$, of the $n$ observations $Y_1, Y_2, \dots, Y_n$ is:

$$\bar{Y} = \frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n) = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

By drawing a random sample, we ensure that the sample average is a r.v. Since the sample is random, each $Y_i$ is random. Since the $n$ observations are random, their average is random. If we had drawn a different sample, the $Y$’s would have been different and their average would have been different. From sample to sample, the value of $\bar{Y}$ changes.

Since $\bar{Y}$ is a r.v., it has a probability distribution. It is called the sampling distribution of $\bar{Y}$: the probability of the possible values of $\bar{Y}$ that could be computed for different possible samples $Y_1, Y_2, \dots, Y_n$.

Sample averages and their sampling distributions play a key role in statistics.
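For a very small population, the sampling distribution of the sample average can be listed exactly by enumerating every possible sample. The sketch below assumes a hypothetical population with three equally likely values and i.i.d. samples of size 2; neither comes from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical population values, each equally likely
n = 2                   # sample size; draws are i.i.d. (with replacement)

# Every ordered sample of size n is equally likely, so count how often each average occurs.
counts = Counter(Fraction(sum(sample), n) for sample in product(population, repeat=n))
total = len(population) ** n
sampling_dist = {avg: Fraction(c, total) for avg, c in sorted(counts.items())}

for avg, prob in sampling_dist.items():
    print(float(avg), prob)
```

The nine possible samples produce only five distinct averages, and the middle value 2 is the most likely one. Exact enumeration of this kind stops being practical quickly as the population or the sample size grows.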





Mean of $\bar{Y}$

Let the observations $Y_1, Y_2, \dots, Y_n$ be i.i.d. and let $\mu_Y$ and $\sigma_Y^2$ be the mean and variance of $Y_i$. (All $Y_i$ have the same mean and variance since the observations are i.i.d. draws.)

If $n = 2$, then the mean of $Y_1 + Y_2$ is $E(Y_1 + Y_2) = \mu_Y + \mu_Y = 2\mu_Y$. Therefore, the mean of the sample average is $E[\frac{1}{2}(Y_1 + Y_2)] = \frac{1}{2} \cdot 2\mu_Y = \mu_Y$. In general,

$$E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \mu_Y$$

Question: What’s the variance of $aX + bY$?





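Variance of $\bar{Y}$

We learned before that $\mathrm{var}(aX + bY) = a^2\sigma_X^2 + 2ab\sigma_{XY} + b^2\sigma_Y^2$. With two i.i.d. draws ($n = 2$), $\mathrm{var}(Y_1 + Y_2) = 2\sigma_Y^2$. And $\mathrm{var}(\bar{Y}) = \frac{1}{2}\sigma_Y^2$.

Why does the covariance term drop out? For general $n$, since $Y_1, Y_2, \dots, Y_n$ are i.i.d., $Y_i$ and $Y_j$ are independently distributed for $i \ne j$, so $\mathrm{cov}(Y_i, Y_j) = 0$. Therefore,

$$\mathrm{var}(\bar{Y}) = \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(Y_i) = \frac{\sigma_Y^2}{n}$$

The standard deviation:

$$\mathrm{s.d.}(\bar{Y}) = \frac{\sigma_Y}{\sqrt{n}}$$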





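Mean, variance, and s.d. of $\bar{Y}$

Just to summarize these results:

$$E(\bar{Y}) = \mu_Y$$

$$\mathrm{var}(\bar{Y}) = \frac{\sigma_Y^2}{n}$$

$$\mathrm{s.d.}(\bar{Y}) = \frac{\sigma_Y}{\sqrt{n}}$$

Note: These results hold regardless of the distribution of $Y$. If, in addition, $Y_1, \dots, Y_n$ are i.i.d. draws from $Y \sim N(\mu_Y, \sigma_Y^2)$, then $\bar{Y}$ is itself normally distributed with $E(\bar{Y}) = \mu_Y$ and $\mathrm{var}(\bar{Y}) = \sigma_Y^2/n$. In other words, $\bar{Y} \sim N(\mu_Y, \sigma_Y^2/n)$.

Random sampling ensures that the observations are i.i.d. draws from the population r.v.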
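Because the formulas for the mean and variance of the sample average do not depend on normality, they can be checked by simulation with a non-normal population. The sketch below assumes a Uniform(0, 1) population (mean 1/2, variance 1/12), a sample size of 25, and a fixed seed; these choices and the helper name `sample_average` are illustrative.

```python
import random


def sample_average(rng, n):
    """Average of n i.i.d. Uniform(0, 1) draws (a deliberately non-normal population)."""
    return sum(rng.random() for _ in range(n)) / n


rng = random.Random(4)  # fixed seed so the run is reproducible
n = 25
reps = 20_000
mu_y, var_y = 0.5, 1.0 / 12.0  # mean and variance of the Uniform(0, 1) distribution

averages = [sample_average(rng, n) for _ in range(reps)]
mean_of_avg = sum(averages) / reps
var_of_avg = sum((a - mean_of_avg) ** 2 for a in averages) / reps

print(round(mean_of_avg, 4), mu_y)                # compare with mu_Y
print(round(var_of_avg, 5), round(var_y / n, 5))  # compare with sigma_Y^2 / n
```

Across the simulated samples, the average of the sample averages should sit near $\mu_Y$ and their variance near $\sigma_Y^2/n$.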





Law of large numbers

Sampling distributions are key in developing statistical and econometric procedures. That’s why it is important to understand, mathematically, the sampling distribution of $\bar{Y}$.

There are two approaches to characterizing the sampling distribution of $\bar{Y}$: (1) the ‘exact’ approach and (2) the ‘approximate’ approach.

The exact approach requires the mathematical derivation of a formula for the sampling distribution that holds for any value of $n$. The result is called the exact or finite-sample distribution of $\bar{Y}$. As we learned, if $Y$ is a normal r.v. and $Y_1, \dots, Y_n$ are i.i.d., then the exact distribution of $\bar{Y}$ is normal with mean $\mu_Y$ and variance $\sigma_Y^2/n$.






What if $Y$ is not a normal r.v.? Then, the derivation of the exact probability distribution of $\bar{Y}$ is very hard. That’s why we use the approximate or large-sample approach. The resulting sampling distribution is often called an asymptotic distribution (asymptotic means that the approximation becomes exact in the limit as $n \to \infty$).

The beauty of this is that the approximations can be very accurate once the sample size goes over, say, $n = 30$. If we use really large samples (thousands or tens of thousands of observations), then we can comfortably rely on asymptotic distributions since they become adequate approximations to the exact sampling distributions.






In deriving asymptotic sampling distributions, we will invoke two strong mathematical facts: (1) the law of large numbers and (2) the central limit theorem.

The law of large numbers says that if the observations in a sample $Y_i$, $i = 1, \dots, n$, are i.i.d. with $E(Y_i) = \mu_Y$ and if large outliers are unlikely (in other words, if the variance of $Y_i$ is finite: $\mathrm{var}(Y_i) = \sigma_Y^2 < \infty$), then $\bar{Y}$ converges in probability to $\mu_Y$.

The sample average $\bar{Y}$ converges in probability to (or “is consistent for”) $\mu_Y$ if the probability that $\bar{Y}$ is “close” to $\mu_Y$ becomes arbitrarily close to one as $n$ increases.

(Usually, when statisticians say that a given sample average is consistent, they mean that the sample average converges in probability to the population average. In other words, the larger $n$ is, the more likely it is that the sample average is close to the population average. This concept is key in estimating the population average from a sample.)
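Convergence in probability can be made concrete by estimating, for several sample sizes, how often the sample average lands close to the population mean. The sketch below assumes fair coin flips (so $\mu_Y = 0.5$), a closeness band of 0.05, and 400 simulated samples per sample size; all of these, and the name `share_close`, are illustrative assumptions.

```python
import random


def share_close(rng, n, mu=0.5, c=0.05, reps=400):
    """Share of simulated samples of size n whose average lies within c of mu."""
    close = 0
    for _ in range(reps):
        # Each draw is 1 with probability mu and 0 otherwise, so E(Y_i) = mu.
        y_bar = sum(rng.random() < mu for _ in range(n)) / n
        close += abs(y_bar - mu) <= c
    return close / reps


rng = random.Random(5)  # fixed seed so the run is reproducible
shares = {n: share_close(rng, n) for n in (10, 100, 1000)}
print(shares)
```

The estimated probability of being close to $\mu_Y$ rises toward one as $n$ grows, which is what consistency means.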





Central limit theorem

If the observations in a sample $Y_1, \dots, Y_n$ are i.i.d. with $E(Y_i) = \mu_Y$ and $\mathrm{var}(Y_i) = \sigma_Y^2$, where $0 < \sigma_Y^2 < \infty$, then, regardless of the distribution of $Y_i$, as $n$ increases indefinitely ($n \to \infty$) the distribution of $\bar{Y}$ becomes arbitrarily well approximated by a normal distribution with mean $E(\bar{Y}) = \mu_Y$ and variance $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$.

In other words, the distribution of $(\bar{Y} - \mu_Y)/\sigma_{\bar{Y}}$ (where $\sigma_{\bar{Y}}^2 = \sigma_Y^2/n$) becomes arbitrarily well approximated by the standard normal distribution.
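A simulation sketch can show the standardized sample average moving toward the standard normal as $n$ grows. It assumes a strongly skewed Exponential(1) population, for which $\mu_Y = \sigma_Y = 1$, and looks at the share of standardized averages below $-1.645$, which the standard normal puts at about 0.05. The sample sizes, the seed, and the helper name are illustrative choices.

```python
import math
import random
from statistics import NormalDist


def standardized_average(rng, n):
    """(Ybar - mu_Y) / (sigma_Y / sqrt(n)) for n i.i.d. Exponential(1) draws (mu_Y = sigma_Y = 1)."""
    y_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
    return (y_bar - 1.0) / (1.0 / math.sqrt(n))


rng = random.Random(6)  # fixed seed so the run is reproducible
cutoff = -1.645
normal_share = NormalDist().cdf(cutoff)  # about 0.05 under the standard normal
reps = 5_000

lower_tail = {}
for n in (2, 100):
    zs = [standardized_average(rng, n) for _ in range(reps)]
    lower_tail[n] = sum(z <= cutoff for z in zs) / reps

print(round(normal_share, 4), lower_tail)
```

With $n = 2$ the standardized average can never fall below $-\sqrt{2}$, because the draws are non-negative, so the lower-tail share is zero and the normal approximation is poor. With $n = 100$ the share is much closer to the normal value.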





Central limit theorem

How large should $n$ be for this approximation to normality to be good? It depends on the distribution of $Y_i$. If $Y_i$ is normal, then $\bar{Y}$ is normal for any $n$ (even if small). If $Y_i$ has a distribution very far from normal, then the approximation can require $n \ge 30$ or more. In most applications, when $n \ge 100$, the distribution of $\bar{Y}$ should look pretty normal.

Since the distribution of $\bar{Y}$ approaches the normal as $n$ grows large, $\bar{Y}$ is said to be asymptotically normally distributed.

We’re ready for statistics!

