
    Course 003: Basic Econometrics

Rohini Somanathan - Part I

Sunil Kanwar - Part II

    Delhi School of Economics, 2014-2015


Outline of Part I

    Main text: Morris H. DeGroot and Mark J. Schervish, Probability and Statistics, fourth edition.

    1. Probability Theory: Chapters 1-6

Probability basics: The definition of probability, combinatorial methods, independent events, conditional probability.

Random variables: Distribution functions, marginal and conditional distributions, distributions of functions of random variables, moments of a random variable, properties of expectations.

Some special distributions, laws of large numbers, central limit theorems.

2. Statistical Inference: Chapters 7-10

Estimation: definition of an estimator, maximum likelihood estimation, sufficient statistics, sampling distributions of estimators.

Hypothesis Testing: simple and composite hypotheses, tests for differences in means, test size and power, uniformly most powerful tests.

    Nonparametric Methods


    Administrative Information

Internal Assessment: 25% for Part I

1. Midterm: 20%

2. Lab assignments, tutorial attendance and class participation: 5%

Problem Sets: Do as many problems from the book as you can. All odd-numbered exercises have solutions, so focus on these.

Tutorials: Check the notice board in front of the lecture theatre for lists. Punctuality is critical - coming in late disturbs the rest of the class and me.


    Why is this course useful?

We (as economists, citizens, consumers, exam-takers) are often faced with situations in which we have to make decisions in the face of uncertainty. This may be caused by:

randomness in the world (a farmer making planting decisions does not know how much it will rain during the season; we do not know how many days we'll be sick next year, or what the chances are of an economic crisis or recovery)

incomplete information about the realized state of the world (Is a politician's promise sincere? Is a firm telling us the truth about a product? Has our opponent been dealt a better hand of cards? Is a prisoner guilty or innocent?)

By putting structure on this uncertainty, we can arrive at

decision rules: firms choose techniques, doctors choose drug regimes, electors choose politicians - these rules have to tell us how best to incorporate new information.

estimates: of empirical relationships (wages and education, drugs and health...)

tests: how likely is it that population parameters take particular values, based on the estimates we've obtained?

Probability theory puts structure on uncertain events and allows us to derive systematic decision rules. The field of statistics shows us how we can collect and use data to estimate empirical models and test hypotheses about the population based on our estimates.


    A motivating example: gender ratios

We are interested in whether the gender ratio in a population reflects discrimination, either before or after birth.

Suppose it is equally likely for a child of either sex to be conceived. We visit a small village with 10 children under the age of 1. If each birth is independent, we would get considerable variation in the sex ratio in the absence of discrimination. With X the number of boys:

P(X=0)=.001, P(X=1)=.01, P(X=2)=.044, P(X=3)=.12, P(X=4)=.21, P(X=5)=.25, ... (in Stata: display binomial(10, k, .5))

[Figure: the binomial(10, k, .5) probabilities plotted against k = 0, 1, ..., 10; the probability axis runs from 0 to .25.]

    When should we conclude that there is gender bias? Can we get an estimate of this bias?
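A quick way to reproduce these numbers is to evaluate the binomial probability function directly; a minimal Python sketch (the slide itself refers to Stata's binomial()):

    from math import comb

    n, p = 10, 0.5
    for k in range(n + 1):
        # P(k boys out of 10) = C(10, k) (1/2)^k (1/2)^(10-k)
        print(k, round(comb(n, k) * p**k * (1 - p)**(n - k), 4))

This prints .001, .0098, .0439, .1172, .2051, .2461, ..., matching (up to rounding) the probabilities listed above.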


Origins of probability theory

A probability is a number attached to some event which expresses the likelihood of the event occurring.

A theory of probability was first exposited by European mathematicians in the 16th century studying gambling problems.

How are probabilities assigned to events?

By thinking about all possible outcomes. If there are n of these, all equally likely, we can attach the number 1/n to each of them. If an event contains k of these outcomes, we attach a probability k/n to the event. This is the classical interpretation of probability.

Alternatively, imagine the event as a possible outcome of an experiment. Its probability is the fraction of times it occurs when the experiment is repeated a large number of times. This is the frequency interpretation of probability.

In many cases events cannot be thought of in terms of repeated experiments or equally likely outcomes. We could base likelihoods in this case on what we believe about the world: subjective probabilities. The subjective probability of an event A is a real number in the interval [0, 1] which reflects a subjective belief in the validity or occurrence of event A. Different people might attach different probabilities to the same events. Examples?

We formalize this subjective interpretation by imposing certain consistency conditions on combinations of events.


Definitions

An experiment is any process whose outcome is not known in advance with certainty. These outcomes may be random or non-random, but we should be able to specify all of them and attach probabilities to them.

    Experiment                     Event
    10 coin tosses                 4 heads
    select 10 LS MPs               one is female
    go to your bus-stop at 8       bus arrives within 5 min.

A sample space S is the collection of all possible outcomes of an experiment.

An event is a subset of possible outcomes in the space S.

The complement of an event A is the event that contains all outcomes in the sample space that do not belong to A. We denote this event by Ac.

The subsets A1, A2, A3, ... of a sample space S are called mutually disjoint sets if no two of these sets have an element in common. The corresponding events A1, A2, A3, ... are said to be mutually exclusive events.

If A1, A2, A3, ... are mutually exclusive events such that S = A1 ∪ A2 ∪ A3 ∪ ..., these are called exhaustive events.


Example: 3 tosses of a coin

The experiment has 2^3 possible outcomes and we can define the sample space S = {s1, ..., s8} where

s1 = HHH, s2 = HHT, s3 = HTH, s4 = HTT, s5 = THH, s6 = THT, s7 = TTH, s8 = TTT

Any subset of this sample space is an event. If we have a fair coin, each of the listed outcomes is equally likely and we attach probability 1/8 to each of them.

Let us define the event A as at least one head. Then A = {s1, ..., s7} and Ac = {s8}. A and Ac are exhaustive events.

The events exactly one head and exactly two heads are mutually exclusive events.

Notice that there are lots of different ways in which we can define a sample space, and the most useful way to do so depends on the event we are interested in (# heads; or, when picking from a deck of cards, we may be interested in the suit, the number or both).


The definition of probability

Definition: Let 𝒮 be the collection of all events in the sample space S. A probability distribution is a function P : 𝒮 → [0, 1] which satisfies the following axioms:

1. The probability of every event must be non-negative:

   P(A) ≥ 0 for all events A ∈ 𝒮

2. If an event is certain to occur, its probability is 1:

   P(S) = 1

3. For any sequence of disjoint events A1, A2, ...

   P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai)

Note:

We will typically write P(A) or Pr(A) for the probability of an event A. For finite sample spaces 𝒮 is straightforward to define. For any S which is a subset of the real line (and therefore infinite), let 𝒮 be the set of all intervals in S.


    Probability measures... some useful results

    We can use our three axioms to derive some useful results:

Result 1: For each A ∈ 𝒮, P(A) = 1 − P(Ac)
Proof: A ∪ Ac = S. By our second axiom, P(S) = 1, and by axiom 3, P(A ∪ Ac) = P(A) + P(Ac).

Result 2: P(∅) = 0
Proof: Let A = ∅ so Ac = S. Since P(S) = 1, P(∅) = 0 using the first result above.

Result 3: If A1 and A2 are subsets of S such that A1 ⊆ A2, then P(A1) ≤ P(A2)
Proof: Let's write A2 as A2 = A1 ∪ (A1^c ∩ A2). Since these are disjoint, we can use axiom 3 to get P(A2) = P(A1) + P(A1^c ∩ A2). The second term on the RHS is non-negative (by axiom 1), so P(A2) ≥ P(A1).

Result 4: For each A ∈ 𝒮, 0 ≤ P(A) ≤ 1
Proof: Since ∅ ⊆ A ⊆ S, we can directly apply the previous result to obtain P(∅) ≤ P(A) ≤ P(S), or 0 ≤ P(A) ≤ 1.


Some useful results...

Result 5: If A1 and A2 are subsets of S then P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)
Proof: As before, the trick is to write A1 ∪ A2 as a union of disjoint sets and then add the probabilities associated with them. Drawing a Venn diagram helps to do this.

A1 ∪ A2 = (A1 ∩ A2^c) ∪ (A1 ∩ A2) ∪ (A2 ∩ A1^c)    (1)

but A1 = (A1 ∩ A2^c) ∪ (A1 ∩ A2) and A2 = (A2 ∩ A1^c) ∪ (A1 ∩ A2), so

P(A1) + P(A2) = P(A1 ∩ A2^c) + P(A1 ∩ A2) + P(A2 ∩ A1^c) + P(A1 ∩ A2)

Subtracting P(A1 ∩ A2) from both sides and using (1) gives us the result.


    Examples using the probability axioms

1. Consider two events A and B such that Pr(A) = 1/3 and Pr(B) = 1/2. Determine the value of P(B ∩ Ac) for each of the following conditions: (a) A and B are disjoint (b) A ⊂ B (c) Pr(A ∩ B) = 1/8

2. Consider two events A and B, where P(A) = .4 and P(B) = .7. Determine the minimum and maximum values of Pr(A ∩ B) and the conditions under which they are obtained.

3. A point (x, y) is to be selected from the square S containing all points (x, y) such that 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. Suppose that the probability that the point will belong to any specified subset of S is equal to the area of that subset. Find the following probabilities:

(a) (x − 1/2)^2 + (y − 1/2)^2 ≥ 1/4
(b) 1/2 < x + y < 3/2

The law of total probability

Let the events A1, ..., Ak form a partition of the sample space. For any event B, if P(Ai) > 0 for all i, then using the multiplication rule derived above, this can be written as:

P(B) = Σ_{i=1}^k P(Ai)P(B|Ai)

This is known as the law of total probability.

Example: You're playing a game in which your score is equally likely to take any integer value between 1 and 50. If your score the first time you play is equal to X, and you play until you score Y ≥ X, what is the probability that Y = 50?

Solution: For each value xi, P(X = xi) = 1/50. We can compute the conditional probability of Y = 50 for each of these values. The event Ai is the event that X = xi, and the event B is getting a 50 to end the game. The probability of getting xi in the first round and 50 to end the game is given by the product P(B|Ai)P(Ai). The required probability is the sum of these products over all possible values of i:

P(Y = 50) = Σ_{x=1}^{50} (1/(51 − x)) · (1/50) = (1/50)(1 + 1/2 + 1/3 + ... + 1/50) ≈ .09
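This is easy to check by simulation; a Monte Carlo sketch in Python (the loop structure mirrors the rules of the game, not anything in the slides):

    import random

    trials, hits = 200_000, 0
    for _ in range(trials):
        x = random.randint(1, 50)          # first score X
        y = random.randint(1, 50)
        while y < x:                       # keep playing until a score Y >= X
            y = random.randint(1, 50)
        hits += (y == 50)
    print(hits / trials)                   # ~ .09

The final score Y is uniform on {x, ..., 50}, which is where the 1/(51 − x) term comes from.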


Bayes' Theorem

Bayes' Theorem (or Bayes' Rule): Let the events A1, A2, ..., Ak form a partition of S such that P(Aj) > 0 for all j = 1, 2, ..., k, and let B be any event such that P(B) > 0. Then for i = 1, ..., k,

P(Ai|B) = P(B|Ai)P(Ai) / Σ_{j=1}^k P(Aj)P(B|Aj)

Proof: By the definition of conditional probability,

P(Ai|B) = P(Ai ∩ B) / P(B)

The denominators in these expressions are the same by the law of total probability, and the numerators are the same using the multiplication rule.

In the case where the partition of S consists of only two events,

P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|Ac)P(Ac)]


Bayes' Rule... remarks

Bayes' rule provides us with a method of updating the probabilities of events in the partition based on the new information provided by the occurrence of the event B.

Since P(Aj) is the probability of event Aj prior to the occurrence of event B, it is referred to as the prior probability of event Aj.

P(Aj|B) is the updated probability of the same event after the occurrence of B and is called the posterior probability of event Aj.

Bayes' rule is very commonly used in game-theoretic models. For example, in political economy models a Bayes-Nash equilibrium is a standard equilibrium concept: players (say voters) start with beliefs about politicians and update these beliefs when politicians take actions. Beliefs are constrained to be updated based on Bayes' conditional probability formula.

In Bayesian estimation, prior distributions on population parameters are updated given information contained in a sample. This is in contrast to more standard procedures where only the sample information is used. The sample would now lead to different estimates, depending on the prior distribution of the parameter that is used.

A word about Bayes: he was a non-conformist clergyman (1702-1761), with no formal mathematics degree. He studied logic and theology at the University of Edinburgh.


Bayes' Rule... examples

C1, C2 and C3 are plants producing 10, 50 and 40 per cent of a company's output. The percentage of defective pieces produced by each of these is 1, 3 and 4 respectively. Given that a randomly selected piece is defective (the event D), what is the probability that it is from the first plant?

P(C1|D) = P(D|C1)P(C1) / P(D) = (.01)(.1) / [(.01)(.1) + (.03)(.5) + (.04)(.4)] = 1/32 ≈ .03

How do the prior and posterior probabilities of the event C1 compare? What does this tell you about the difference between the priors and posteriors for the other events?

Suppose that there is a new blood test to detect a virus. Only 1 in every thousand people in the population has the virus. The test is 98 per cent effective in detecting the disease in people who have it, and gives a false positive for one per cent of disease-free persons tested. What is the probability that a person actually has the disease given a positive test result?

P(Disease|Positive) = P(Positive|Disease)P(Disease) / P(Positive) = (.98)(.001) / [(.98)(.001) + (.01)(.999)] ≈ .089

So in spite of the test being very effective in catching the disease, we have a large number of false positives.
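Both computations are instances of the same partition formula, so a single helper covers them; a sketch in Python (the function name is ours, not from the text):

    def posterior(priors, likelihoods):
        # Bayes' rule over a partition: P(Ai|B) = P(B|Ai)P(Ai) / sum_j P(B|Aj)P(Aj)
        total = sum(p * l for p, l in zip(priors, likelihoods))
        return [p * l / total for p, l in zip(priors, likelihoods)]

    print(posterior([.1, .5, .4], [.01, .03, .04])[0])   # plant C1 given defective: 1/32
    print(posterior([.001, .999], [.98, .01])[0])        # disease given positive: ~.089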


Bayes' Rule... priors, posteriors and politics

To understand the relationship between prior and posterior probabilities a little better, consider the following example:

A politician, on entering parliament, has a reasonably good reputation. A citizen attaches a prior probability of 3/4 to his being honest (undertaking policies to maximize social welfare, rather than his bank balance).

At the end of his tenure, the citizen finds a very large number of potholes on roads in the politician's constituency. While these do not leave the citizen with a favorable impression of the incumbent, it is possible that the unusually heavy rainfall over these years was responsible.

Elections are coming up. How does the citizen use this information on road conditions to update his assessment of the moral standing of the politician? Let us compute the posterior probability of the politician's being honest, given the event that the roads are in bad condition:

Suppose that the probability of bad roads is 1/3 if the politician is honest and 2/3 if he/she is dishonest.

The posterior probability of the politician being honest is now given by

P(honest|bad roads) = P(bad roads|honest)P(honest) / P(bad roads) = (1/3)(3/4) / [(1/3)(3/4) + (2/3)(1/4)] = 3/5

What would the posterior be if the prior were equal to 1? What if the prior were zero? What if the probability of bad roads were equal to 1/2 for both types of politicians? When are differences between priors and posteriors going to be large?


    The Monty Hall problem

A game show host leads the contestant to a wall with three closed doors. Behind one of these is a fancy car, behind the other two a consolation prize (a bag of sweets).

The contestant must first choose a door without any prior knowledge of what is behind each door.

The host then opens one of the other doors hiding a bag of sweets.

The contestant is given an opportunity to switch doors and wins whatever is behind the door that is finally chosen by him.

Does he raise his chances of winning the car by switching?

Suppose that the contestant chooses door 1 and the host opens door 3. Denote by A1, A2 and A3 the events that the car is behind doors 1, 2 and 3 respectively. Let B be the event that the host opens door 3.

We'd like to compare P(A1|B) and P(A2|B).

By Bayes' rule, the denominator of both these expressions is P(B); we therefore need to compare P(B|A1)P(A1) and P(B|A2)P(A2).

The first expression is (1/2)(1/3), the second is 1/3 (because if the car is behind door 2 then door 3 will certainly be opened, so P(B|A2) = 1).

The contestant can therefore double his probability of being correct by switching: the posterior probability of A2 is 2/3 while that of A1 remains 1/3.
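A simulation makes the 1/3 versus 2/3 split easy to believe; a Python sketch (the door-numbering conventions are ours):

    import random

    def play(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            car = random.randint(1, 3)
            pick = 1                                   # contestant picks door 1
            # host opens a sweets door other than the contestant's pick
            opened = random.choice([d for d in (2, 3) if d != car])
            if switch:
                pick = next(d for d in (1, 2, 3) if d not in (1, opened))
            wins += (pick == car)
        return wins / trials

    print(play(False), play(True))                     # ~ .33 vs ~ .67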


Bayes' Rule: The Sally Clark case

Sally Clark was a British solicitor who became the victim of one of the great miscarriages of justice in modern British legal history.

Her first son died within a few weeks of his birth in 1996 and her second one died similarly in 1998, after which she was arrested and tried for their murder.

A well-known paediatrician, Professor Sir Roy Meadow, testified that the chance of two children from an affluent family suffering sudden infant death syndrome was 1 in 73 million, which was arrived at by squaring 1 in 8500, the likelihood of a cot death in similar circumstances.

Clark was convicted in November 1999. In 2001 the Royal Statistical Society issued a public statement expressing its concern at the misuse of statistics in the courts and arguing that there was no statistical basis for Meadow's claim.

In January 2003, she was released from prison having served more than three years of her sentence, after it emerged that the prosecutor's pathologist had failed to disclose microbiological reports that suggested one of her sons had died of natural causes.

RSS statement excerpts: "In the recent highly-publicised case of R v. Sally Clark, a medical expert witness drew on published studies to obtain a figure for the frequency of sudden infant death syndrome (SIDS, or cot death) in families having some of the characteristics of the defendant's family. He went on to square this figure to obtain a value of 1 in 73 million for the frequency of two cases of SIDS in such a family... This approach is, in general, statistically invalid. It would only be valid if SIDS cases arose independently within families... there are very strong a priori reasons for supposing that the assumption will be false. There may well be unknown genetic or environmental factors that predispose families to SIDS, so that a second case within the family becomes much more likely. The true frequency of families with two cases of SIDS may be very much less incriminating than the figure presented to the jury at trial."

Topic 2: Random Variables and Probability Distributions

    Sample spaces and random variables

The outcomes of some experiments inherently take the form of real numbers:

crop yields with the application of a new type of fertiliser

students' scores on an exam

miles per litre of an automobile

Other experiments have a sample space that is not inherently a subset of Euclidean space:

outcomes from a series of coin tosses

the character of a politician

the modes of transport taken by a city's population

the degree of satisfaction respondents report for a service provider - patients in a hospital may be asked whether they are very satisfied, satisfied or dissatisfied with the quality of treatment. Our sample space would consist of arrays of the form (VS, S, S, DS, ...)

the caste composition of elected politicians

the gender composition of children attending school

A random variable is a function that assigns a real number to each possible outcome s ∈ S.


Random variables

Definition: Let (S, 𝒮, P) be a probability space. If X : S → R is a real-valued function having as its domain the elements of S, then X is called a random variable.

A random variable is therefore a real-valued function defined on the space S. Typically x is used to denote its image value, i.e. x = X(s).

If the outcomes of an experiment are inherently real numbers, they are directly interpretable as values of a random variable, and we can think of X as the identity function, so X(s) = s.

We choose random variables based on what we are interested in getting out of the experiment. For example, we may be interested in the number of students passing an exam, and not the identities of those who pass. A random variable would assign each element in the sample space a number corresponding to the number of passes associated with that outcome.

We therefore begin with a probability space (S, 𝒮, P) and arrive at an induced probability space (R(X), B, PX(A)).

How exactly do we arrive at the function PX(.)? As long as every set A ⊆ R(X) is associated with an event in our original sample space S, PX(A) is just the probability assigned to that event by P.


Random variables... examples

1. Tossing a coin ten times.

The sample space consists of the 2^10 possible sequences of heads and tails.

There are many different random variables that could be associated with this experiment: X1 could be the number of heads, X2 the longest run of heads divided by the longest run of tails, X3 the number of times we get two heads immediately before a tail, etc.

For s = HTTHHHHTTH, what are the values of these random variables?

2. Choosing a point in a rectangle within a plane.

An experiment involves choosing a point s = (x, y) at random from the rectangle S = {(x, y) : 0 ≤ x ≤ 2, 0 ≤ y ≤ 1/2}.

The random variable X could be the x-coordinate of the point, and an event is X taking values in [1, 2].

Another random variable Z could be the distance of the point from the origin, Z(s) = √(x² + y²).

3. Heights, weights, distances, temperature, scores, incomes... In these cases, we can have X(s) = s since these are already expressed as real numbers.


Induced probability spaces... examples

Let's look at some examples of how we arrive at our probability measure PX(A).

A coin is tossed once and we're interested in the number of heads, X. The probability assigned to the set A = {1} in our new space is just the probability associated with one head in our original space. So Pr(X = x) = 1/2 for x ∈ {0, 1}.

With two tosses, the probability attached to the set A = {1} is the sum of the probabilities associated with the disjoint sets {(H, T)} and {(T, H)} whose union forms this event. In this case

Pr(X = x) = C(2, x)(1/2)^2, x ∈ {0, 1, 2}

Now consider a sequence of flips of an unbiased coin, where our random variable X is the number of flips needed for the first head. We now have

Pr(X = x) = f(x) = (1/2)^(x−1) (1/2) = (1/2)^x, x = 1, 2, 3, ...

Is this a valid probability measure?

How is the nature of the sample space in the first two coin-flipping examples different from the third?

In all these cases we have a discrete random variable.


The distribution function

Once we've assigned real numbers to all the subsets of our sample space S that are of interest, we can restrict our attention to the probabilities associated with the occurrence of sets of real numbers.

Consider the set A = (−∞, x]. Now P(A) = Pr(X ≤ x). F(x) is used to denote the probability Pr(X ≤ x) and is called the distribution function of X.

Definition: The distribution function F of a random variable X is a function defined for each real number x as follows:

F(x) = P(X ≤ x) for −∞ < x < ∞


    Discrete distributions

Definition: A random variable X has a discrete distribution if X can take only a finite number k of different values x1, x2, ..., xk or an infinite sequence of different values x1, x2, ....

The function f(x) = P(X = x) is the probability function of X. We define it to be f(x) for all values x in our sample space R(X) and zero elsewhere.

If X has a discrete distribution, the probability of any subset A of the real line is given by P(X ∈ A) = Σ_{xi ∈ A} f(xi).

Examples:

1. The discrete uniform distribution: picking one of the first k positive integers at random

   f(x) = 1/k for x = 1, 2, ..., k, and 0 otherwise

2. The binomial distribution: the probability of x successes in n trials

   f(x) = C(n, x) p^x q^(n−x) for x = 0, 1, 2, ..., n, and 0 otherwise

Derive the distribution functions for each of these.
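Since both supports are finite, the distribution functions are just running sums of the probability functions; a Python sketch (the n = 3 binomial ties back to the three-coin-toss example):

    from math import comb
    from itertools import accumulate

    k = 6
    print(list(accumulate([1 / k] * k)))               # uniform c.d.f.: F(x) = x/k

    n, p = 3, 0.5
    pf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    print(list(accumulate(pf)))                        # binomial c.d.f.: 1/8, 4/8, 7/8, 1

Between the mass points the c.d.f. is flat, so each of these is a step function.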


    Continuous distributions

The sample space associated with our random variable often has an infinite number of points.

Example: A point is randomly selected inside a circle of unit radius with origin (0, 0), where the probability assigned to being in a set A ⊆ S is P(A) = (area of A)/π, and X is the distance of the selected point from the origin. In this case F(x) = Pr(X ≤ x) = (area of circle with radius x)/π, so the distribution function of X is given by

F(x) = 0 for x < 0,  x² for 0 ≤ x < 1,  1 for 1 ≤ x

Definition: A random variable X has a continuous distribution if there exists a nonnegative function f defined on the real line, such that for any interval A,

P(X ∈ A) = ∫_A f(x) dx

The function f is called the probability density function or p.d.f. of X and must satisfy the conditions below:

1. f(x) ≥ 0

2. ∫_{−∞}^{∞} f(x) dx = 1

What is f(x) for the above example? How can you use this to compute P(1/4 < X ≤ 1/2)? How would you use F(x) instead?


Continuous distributions... examples

1. The uniform distribution on an interval: Suppose a and b are two real numbers with a < b. A point x is selected from the interval S = {x : a ≤ x ≤ b} and the probability that it belongs to any subinterval of S is proportional to the length of that subinterval. It follows that the p.d.f. must be constant on S and zero outside it:

   f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise

Notice that the value of the p.d.f. is the reciprocal of the length of the interval, so these values can be greater than one, and the assignment of probabilities does not depend on whether the distribution is defined over the closed interval [a, b] or the open interval (a, b).

2. Unbounded random variables: It is sometimes convenient to define a p.d.f. over unbounded sets, because such functions may be easier to work with and may approximate the actual distribution of a random variable quite well. An example is:

   f(x) = 0 for x ≤ 0, and 1/(1 + x)² for x > 0

3. Unbounded densities: The following function is unbounded around zero but still represents a valid density:

   f(x) = (2/3)x^(−1/3) for 0 < x < 1, and 0 otherwise


Mixed distributions

Often the process of collecting or recording data leads to censoring, and instead of obtaining a sample from a continuous distribution, we obtain one from a mixed distribution.

Examples:

The weight of an object is a continuous random variable, but our weighing scale only records weights up to a certain level.

Households with very high incomes often underreport their income; for incomes above a certain level (say $250,000), surveys often club all households together - this variable is therefore top-censored.

In each of these examples, we can derive the probability distribution for the new random variable, given the distribution for the continuous variable. Take the density we've just considered:

f(x) = 0 for x ≤ 0, and 1/(1 + x)² for x > 0

Suppose we record X = 3 for all values of X ≥ 3. The distribution of our new random variable Y is given by the same p.d.f. for values less than 3 and by the mass P(Y = 3) = ∫_3^∞ (1 + x)^(−2) dx = 1/4 at the point Y = 3.

Some variables, such as the number of hours worked per week, have a mixed distribution in the population, with mass points at 0 and 40.


    Properties of the distribution function

Recall that the distribution function or cumulative distribution function (c.d.f.) for a random variable X is defined as

F(x) = P(X ≤ x) for −∞ < x < ∞


    Examples of distribution functions

Consider the experiment of rolling a die or tossing a fair coin, with X in the first case being the number of dots and in the second case the number of heads. Graph the distribution function of X in each of these cases.

What about the experiment of picking a point in the unit interval [0, 1], with X as the distance from the origin?

What type of probability function corresponds to the following distribution function?

[Figure 3.6 of DeGroot and Schervish, Sec. 3.3: an example of a c.d.f. F(x) that jumps at x1 and x3, passes through the values z0 < z1 < z2 < z3, and reaches 1 at x4.]

(That Pr(X ≤ x) approaches 1 as x → ∞ follows from Exercise 12 in Sec. 1.10 of the text.) In Fig. 3.6 the value of F(x) actually becomes 1 at x = x4 and then remains 1 for x > x4. Hence Pr(X ≤ x4) = 1 and Pr(X > x4) = 0. On the other hand, F(x) approaches 0 as x → −∞ but does not actually become 0 at any finite point x. Therefore, for every finite value of x, no matter how small, Pr(X ≤ x) > 0.

A c.d.f. need not be continuous. In fact, the value of F(x) may jump at any finite or countable number of points. In Fig. 3.6, for instance, such jumps or points of discontinuity occur where x = x1 and x = x3. For each fixed value x, let F(x−) denote the limit of the values of F(y) as y approaches x from the left, that is, as y approaches x through values smaller than x. In symbols,

F(x−) = lim_{y→x, y<x} F(y)

If the c.d.f. is continuous at a given point x, then F(x−) = F(x+) = F(x) at that point.

Continuity from the right (Property 3.3.3 in the text): A c.d.f. is always continuous from the right; that is, F(x) = F(x+) at every point x.

Proof: Let y1 > y2 > ... be a decreasing sequence of numbers such that lim_{n→∞} yn = x. Then the event {X ≤ x} is the intersection of all the events {X ≤ yn} for n = 1, 2, .... Hence, by Exercise 13 of Sec. 1.10,

F(x) = Pr(X ≤ x) = lim_{n→∞} Pr(X ≤ yn) = F(x+)

It follows that at every point x at which a jump occurs, F(x+) = F(x) and F(x−) < F(x).


    The quantile function

The distribution function of X gives us the probability that X ≤ x for all real numbers x.

Suppose we are given a probability p and want to know the value of x corresponding to this value of the distribution function.

If F is a one-to-one function, then it has an inverse and the value we are looking for is given by F^(−1)(p).

Example: median income would be found by F^(−1)(1/2), where F is the distribution function of income.

Definition: When the distribution function of a random variable X is continuous and one-to-one over the whole set of possible values of X, we call the function F^(−1) the quantile function of X. The value of F^(−1)(p) is called the pth quantile of X or the 100p-th percentile of X, for each 0 < p < 1.

Example: If X has a uniform distribution over the interval [a, b], then F(x) = (x − a)/(b − a) over this interval, 0 for x ≤ a and 1 for x > b. Given a value p, we simply solve F(x) = p for the pth quantile: x = pb + (1 − p)a. Compute this for p = .5, .25, .9, ...
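For the uniform case the inversion can be written in one line; a Python sketch with illustrative endpoints a = 0 and b = 10:

    def uniform_quantile(p, a, b):
        # solve (x - a)/(b - a) = p for x
        return p * b + (1 - p) * a

    for p in (.5, .25, .9):
        print(p, uniform_quantile(p, 0, 10))    # 5.0, 2.5, 9.0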


    Examples: computing quantiles, etc.

1. The p.d.f. of a random variable is given by:

   f(x) = (1/8)x for 0 ≤ x ≤ 4, and 0 otherwise

Find the value of t such that

(a) P(X ≤ t) = 1/4
(b) P(X ≥ t) = 1/2

2. The p.d.f. of a random variable is given by:

   f(x) = cx² for 1 ≤ x ≤ 2, and 0 otherwise

Find the value of the constant c and Pr(X > 3/2).


    Bivariate distributions

Social scientists are typically interested in the manner in which multiple attributes of people and the societies they live in are related. The object of interest is a multivariate probability distribution (examples: education and earnings, days ill per month and age, sex ratios and areas under rice cultivation).

This involves dealing with the joint distribution of two or more random variables. Bivariate distributions attach probabilities to events that are defined by values taken by two random variables (say X and Y).

Values taken by these random variables are now ordered pairs (xi, yi), and an event A is a set of such values.

If both X and Y are discrete random variables, the probability function is f(x, y) = P(X = x and Y = y), and P((X, Y) ∈ A) = Σ_{(xi, yi) ∈ A} f(xi, yi).


    Representing a discrete bivariate distribution

If both X and Y are discrete, this function takes only a finite number of values.

If there are only a small number of these values, they can be usefully presented in a table. The table below could represent the probabilities of receiving different levels of education. X is the highest level of education and Y is gender:

    education            male    female
    none                 .05     .2
    primary              .25     .1
    middle               .15     .04
    high                 .1      .03
    senior secondary     .03     .02
    graduate and above   .02     .01

What are some features of a table like this one? In particular, how would we obtain probabilities associated with the following events:

receiving no education

becoming a female graduate

completing primary school

What else do you learn from the table about the population of interest?


    Continuous bivariate distributions

We can extend our definition of a continuous univariate distribution to the bivariate case:

Definition: Two random variables X and Y have a continuous joint distribution if there exists a nonnegative function f defined over the xy-plane such that for any subset A of the plane

P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy

f is now called the joint probability density function and must satisfy

1. f(x, y) ≥ 0 for −∞ < x < ∞ and −∞ < y < ∞

2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1


    Bivariate distribution functions

Definition: The joint distribution function of two random variables X and Y is defined as the function F such that for all values of x and y (−∞ < x < ∞ and −∞ < y < ∞),

F(x, y) = P(X ≤ x and Y ≤ y)


    Independent random variables

Definition: Two random variables X and Y are independent if, for any two sets A and B of real numbers,

P(X ∈ A and Y ∈ B) = P(X ∈ A)P(Y ∈ B)

In other words, if A is an event whose occurrence depends only on values taken by X and B's occurrence depends only on values taken by Y, then the random variables X and Y are independent only if the events A and B are independent, for all such events A and B.

The condition for independence can alternatively be stated in terms of the joint and marginal distribution functions of X and Y, by letting the sets A and B be the intervals (−∞, x] and (−∞, y] respectively:

F(x, y) = F1(x)F2(y)

For discrete distributions, we simply define the sets A and B as the points x and y and require f(x, y) = f1(x)f2(y).

In terms of the density functions, we say that X and Y are independent if it is possible to choose functions f1 and f2 such that the following factorization holds for −∞ < x < ∞ and −∞ < y < ∞:

f(x, y) = f1(x)f2(y)


    Dependent random variables..examples

Given the following densities, let's see why the variables X and Y are dependent:

1. f(x, y) = x + y for 0 < x < 1 and 0 < y < 1, and 0 otherwise

Notice that we cannot factorize the joint density as the product of a non-negative function of x and another non-negative function of y. Computing the marginals gives us

f1(x) = x + 1/2 for 0 < x < 1, and f2(y) = y + 1/2 for 0 < y < 1

so the product of the marginals is not equal to the joint density.

2. Suppose we have

f(x, y) = kx²y² for x² + y² ≤ 1, and 0 otherwise

In this case the possible values X can take depend on Y and therefore, even though the joint density can be factorized, the same factorization cannot work for all values of (x, y).

More generally, whenever the space of positive probability density of X and Y is bounded by a curve, rather than a rectangle, the two random variables are dependent.


    Dependent random variables..a result

Whenever the space of positive probability density of X and Y is bounded by a curve, rather than a rectangle, the two random variables are dependent. If, on the other hand, the support of f(x, y) is a rectangle and the joint density is of the form f(x, y) = kg(x)h(y), then X and Y are independent.

Proof: For the latter part of the result, suppose the support of f(x, y) is the rectangle given by a ≤ x ≤ b and c ≤ y ≤ d, where a < b and c < d. The joint density f(x, y) can be written as k1g(x) · k2h(y), where

k1 = 1 / ∫_a^b g(x) dx and k2 = 1 / ∫_c^d h(y) dy

The marginal densities are f1(x) = k1g(x) ∫_c^d k2h(y) dy = k1g(x) and f2(y) = k2h(y) ∫_a^b k1g(x) dx = k2h(y), whose product gives us the joint density.

Now to show that if the support is not a rectangle, the variables are dependent: start with a point (x, y) outside the domain where f(x, y) > 0. If X and Y were independent, we would have f(x, y) = f1(x)f2(y), so one of these must be zero. Now as we move due south and enter the set where f(x, y) > 0, our value of x has not changed, so it could not be that f1(x) was zero at the original point. Similarly, if we move due west and enter that set, y is unchanged, so it could not be that f2(y) was zero at the original point. So we have a contradiction.


    Conditional distributions

Definition: Consider two discrete random variables X and Y with a joint probability function f(x, y) and marginal probability functions f1(x) and f2(y). After the value Y = y has been observed, we can write the probability that X = x using our definition of conditional probability:

P(X = x|Y = y) = P(X = x and Y = y) / Pr(Y = y) = f(x, y) / f2(y)

g1(x|y) = f(x, y)/f2(y) is called the conditional probability function of X given that Y = y. Notice that:

1. for each fixed value of y, g1(x|y) is a probability function over all possible values of X, because it is non-negative and

   Σ_x g1(x|y) = (1/f2(y)) Σ_x f(x, y) = (1/f2(y)) f2(y) = 1

2. conditional probabilities are proportional to joint probabilities, because they just divide these by a constant.

We cannot use the definition of conditional probability to derive the conditional density for continuous random variables, because the probability that Y takes any particular value y is zero. We simply define the conditional probability density function of X given Y = y as

g1(x|y) = f(x, y) / f2(y) for −∞ < x < ∞, wherever f2(y) > 0


    Deriving conditional distributions... the discrete case

For the education-gender example, we can find the distribution of educational achievement conditional on being male, the distribution of gender conditional on completing college, or any other conditional distribution we are interested in:

    education            male    female    f(education|gender=male)
    none                 .05     .2        .08
    primary              .25     .1        .42
    middle               .15     .04       .25
    high                 .1      .03       .17
    senior secondary     .03     .02       .05
    graduate and above   .02     .01       .03

    f(gender|graduate)   .67     .33
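The conditional column is obtained by dividing the joint column by the marginal P(male) = .6; a Python sketch of the same arithmetic:

    joint = {"none": (.05, .2), "primary": (.25, .1), "middle": (.15, .04),
             "high": (.1, .03), "senior secondary": (.03, .02),
             "graduate and above": (.02, .01)}

    p_male = sum(m for m, f in joint.values())      # marginal: .6
    for level, (m, f) in joint.items():
        print(level, round(m / p_male, 2))          # g(education | male)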


    Deriving conditional distributions... the continuous case

For the continuous joint distribution we've looked at before,

f(x, y) = cx²y for x² ≤ y ≤ 1, and 0 otherwise (with c = 21/4)

the marginal distribution of X is given by

f1(x) = ∫_{x²}^1 (21/4)x²y dy = (21/8)x²(1 − x⁴) for −1 ≤ x ≤ 1

and the conditional distribution g2(y|x) = f(x, y)/f1(x) is:

g2(y|x) = 2y/(1 − x⁴) for x² ≤ y ≤ 1, and 0 otherwise

If X = 1/2, we can compute

P(Y ≥ 1/4 | X = 1/2) = 1 and P(Y ≥ 3/4 | X = 1/2) = ∫_{3/4}^1 g2(y|1/2) dy = (1 − 9/16)/(1 − 1/16) = 7/15


    Construction of the joint distribution

We can use conditional and marginal distributions to arrive at a joint distribution:

f(x, y) = g1(x|y)f2(y) = g2(y|x)f1(x)    (1)

Notice that the conditional distribution is not defined for a value y0 at which f2(y0) = 0, but this is irrelevant because at any such value f(x, y0) = 0.

Example: X is first chosen from a uniform distribution on (0, 1) and then Y is chosen from a uniform distribution on (x, 1). The marginal distribution of X is straightforward:

f1(x) = 1 for 0 < x < 1, and 0 otherwise

Given a value of X = x, the conditional distribution is

g2(y|x) = 1/(1 − x) for x < y < 1, and 0 otherwise

Using (1), the joint distribution is

f(x, y) = 1/(1 − x) for 0 < x < y < 1, and 0 otherwise

and the marginal distribution for Y can now be derived as:

f2(y) = ∫ f(x, y) dx = ∫_0^y 1/(1 − x) dx = −log(1 − y) for 0 < y < 1
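A quick Monte Carlo check of this construction (a sketch): drawing X uniformly on (0, 1) and then Y uniformly on (x, 1) should give E(Y) = E[(X + 1)/2] = 3/4, and it does:

    import random

    n = 200_000
    ys = []
    for _ in range(n):
        x = random.random()              # X ~ uniform(0, 1)
        ys.append(random.uniform(x, 1))  # Y | X = x ~ uniform(x, 1)
    print(sum(ys) / n)                   # ~ .75

The same draws can be used to check f2: for instance P(Y ≤ 1/2) = ∫_0^{1/2} −log(1 − y) dy ≈ .153.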


Multivariate distributions

Our definitions of joint, conditional and marginal distributions can be easily extended to an arbitrary finite number of random variables. Such a distribution is now called a multivariate distribution.

The joint distribution function is defined as the function F whose value at any point (x1, x2, ..., xn) is given by

F(x1, x2, ..., xn) = P(X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn)


    Independence for the multivariate case

Independence: The n random variables X1, ..., Xn are independent if, for any n sets A1, A2, ..., An of real numbers,

P(X1 ∈ A1, X2 ∈ A2, ..., Xn ∈ An) = P(X1 ∈ A1)P(X2 ∈ A2) ... P(Xn ∈ An)

If the joint distribution function of X1, ..., Xn is given by F and the marginal d.f. for each Xi by Fi, it follows that X1, ..., Xn are independent if and only if, for all points (x1, ..., xn),

F(x1, ..., xn) = F1(x1)F2(x2) ... Fn(xn)


    Distributions of functions of random variables

We'd like to derive the distribution of Y = X², knowing that X has a uniform distribution on (−1, 1):

the density f(x) of X over this interval is 1/2

we know further that Y takes values in [0, 1)

the distribution function of Y is therefore given by

G(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = ∫_{−√y}^{√y} f(x) dx = √y

The density is obtained by differentiating this:

g(y) = 1/(2√y) for 0 < y < 1, and 0 otherwise
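A simulation confirms that G(y) = √y; a Python sketch:

    import random

    ys = [random.uniform(-1, 1) ** 2 for _ in range(100_000)]
    for y in (.25, .5, .81):
        emp = sum(v <= y for v in ys) / len(ys)
        print(y, round(emp, 3), y ** 0.5)    # empirical G(y) vs sqrt(y)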


    The Probability Integral Transformation

RESULT: Let X be a continuous random variable with the distribution function F and let Y = F(X). Then Y must be uniformly distributed on [0, 1]. The transformation from X to Y is called the probability integral transformation.

We know that the distribution function must take values between 0 and 1. If we pick any of these values, y, the yth quantile of the distribution of X will be given by some number x, and

Pr(Y ≤ y) = Pr(X ≤ x) = F(x) = y

which is the distribution function of a uniform random variable.

This result helps us generate random numbers from various distributions, because it allows us to transform a sample from a uniform distribution into a sample from some other distribution, provided we can find F^(−1).

Example: Suppose we want a sample from an exponential distribution. The density is e^(−x), defined over all x > 0, and the distribution function is 1 − e^(−x). If we pick from a uniform between 0 and 1 and get (say) .3, we can invert the distribution function to get x = log(10/7) ≈ .36 as an observation of an exponential random variable.
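The same inversion turns a whole uniform sample into an exponential sample; a Python sketch:

    import math, random

    print(-math.log(1 - 0.3))                 # .357, the single observation above

    sample = [-math.log(1 - random.random()) for _ in range(100_000)]
    print(sum(sample) / len(sample))          # ~ 1, the mean of this exponential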


    Random number generators

Historically, tables of random digits were used to generate a sample from a uniform distribution. For example, consider the following series of digits:

553617280595580771997955130480651347088612

If we want 10 numbers between 1 and 9, we start at a random digit in the table and pick the next 10 numbers. What about numbers between 1 and 100?

Today, we would never do this, but would use a statistical package to generate these. In Stata, for example, runiform() returns uniformly distributed random variates on the interval [0, 1).

Many packages also allow us to draw directly from the distribution we are interested in: rnormal(m, s) returns normal(m, s) random variates, where m is the mean and s is the standard deviation.
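The Python standard library offers the same primitives, for comparison (a sketch; the parameter choices are illustrative):

    import random

    print(random.random())        # uniform on [0, 1), like runiform()
    print(random.gauss(5, 2))     # normal with mean 5, sd 2, like rnormal(5, 2)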

Topic 3: The Expectation and other Moments of a Random Variable

    Expectation of a discrete random variable

Definition: The expected value of a discrete random variable X, when it exists, is defined by

E(X) = Σ_{x ∈ R(X)} x f(x)

The expectation is simply a weighted average of possible outcomes, with the weights assigned by f(x).

In general E(X) need not belong to R(X). Consider the experiment of rolling a die, where the random variable is the number of dots on the die. The density function is given by f(x) = (1/6)I_{1,2,...,6}(x).

The expectation is given by Σ_{x=1}^6 (x/6) = 3.5

If X can take only a finite number of different values, this expectation always exists. If there is an infinite sequence of possible values of X, then this expectation exists if and only if

Σ_{x ∈ R(X)} |x| f(x) < ∞


    Expectation of a continuous random variable

Definition: The expected value of a continuous random variable X is defined by

E(X) = ∫_{−∞}^{∞} x f(x) dx, which exists iff ∫_{−∞}^{∞} |x| f(x) dx < ∞


Expectation of functions of a random variable

We may be interested in the expectation of a function Y = g(X) of a random variable X.

Examples:

Agricultural yields may be given by the random variable X; revenue, for any given value x, is given by the function p(x)x.

Our random variable might be food availability on a farm; child health would be a function of such availability.

Scores on an aptitude test may be the random variable, and performance in a course could be a function of these.

Suppose that the density function h(y) of Y were available to us. We could directly compute the expectation as E(Y) = ∫ y h(y) dy (if continuous). But we don't need this:

RESULT: Let X be a random variable having density function f(x). Then the expectation of Y = g(X) (in the discrete and continuous case respectively) is given by:

E g(X) = Σ_{x ∈ R(X)} g(x) f(x)

E g(X) = ∫_{−∞}^{∞} g(x) f(x) dx


Expectation of functions - examples

g(X) = √X, with f(x) = 2x for 0 < x < 1 and 0 otherwise:

E(√X) = ∫_0^1 x^(1/2) (2x) dx = 4/5

A point (X, Y) is chosen at random from the unit square: 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. The joint density over all points (x, y) in the square is 1, and

E(X² + Y²) = ∫_0^1 ∫_0^1 (x² + y²) dx dy = 2/3

(X1, X2) forms a random sample of size 2 from a uniform distribution on (0, 1) and Y = min(X1, X2). We'll show that

E(Y) = 2 ∫_0^1 ∫_0^{x2} x1 dx1 dx2 = ∫_0^1 x2² dx2 = 1/3

Suppose we are interested in the expectation of a random variable Y = g(X) defined over a set Ω. This is given by ∫_Ω y f(x) dx. If Ω1 and Ω2 form a partition of Ω, we can write this integral as

∫_Ω y f(x) dx = ∫_{Ω1} y f(x) dx + ∫_{Ω2} y f(x) dx

In this case, we either have X1 < X2 or X1 ≥ X2, and E(Y) is the sum of the two corresponding pieces of the integral. The first of these is given by integrating x1 (the minimum in this region) over the triangle above the 45 degree line:

∫_0^1 ∫_0^{x2} x1 dx1 dx2 = ∫_0^1 (x2²/2) dx2 = 1/6

We double this to account for the case where X1 ≥ X2.
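The E(Y) = 1/3 claim is also easy to verify by simulation; a Python sketch:

    import random

    n = 200_000
    est = sum(min(random.random(), random.random()) for _ in range(n)) / n
    print(est)    # ~ 1/3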


    Expectation properties

RESULT 1: If Y = aX + b, then E(Y) = aE(X) + b
Proof (for a continuous random variable X):

E(aX + b) = ∫(ax + b)f(x) dx = a∫x f(x) dx + b∫f(x) dx = aE(X) + b

Example: If E(X) = 5 then E(3X − 5) = 10.

RESULT 2: The expectation of a sum is the sum of the expectations:

E Σ_{i=1}^k ui(X) = Σ_{i=1}^k E ui(X)

Proof: E Σ_{i=1}^k ui(X) = ∫ (Σ_{i=1}^k ui(x)) f(x) dx = Σ_{i=1}^k ∫ ui(x)f(x) dx = Σ_{i=1}^k E ui(X)

RESULT 3: For independent random variables, the expectation of a product is the product of the expectations: if X1, ..., Xn are n independent random variables such that each expectation E(Xi) exists, then

E(Π_{i=1}^n Xi) = Π_{i=1}^n E(Xi)

Proof (for continuous random variables): Since the random variables are independent, their joint density is the product of the marginals, i.e. f(x1, ..., xn) = Π_{i=1}^n fi(xi), and

E(Π_{i=1}^n Xi) = ∫...∫ (Π_{i=1}^n xi) f(x1, ..., xn) dx1...dxn = ∫...∫ [Π_{i=1}^n xi fi(xi)] dx1...dxn = Π_{i=1}^n ∫ xi fi(xi) dxi = Π_{i=1}^n E(Xi)

(Notice that this third property applies only to independent random variables, whereas the second property holds for dependent variables as well.)


    Expectation properties...examples

Expected number of successes: n balls are selected from a box containing a fraction p of red balls. The random variable Xi takes the value 1 if the ith ball picked is red and zero otherwise. We're interested in the expected value of the number of red balls picked.

This is simply X = X1 + X2 + ... + Xn. The expectation of X (using our theorem) is equal to E(X1) + E(X2) + ... + E(Xn), where E(Xi) = p · 1 + (1 − p) · 0 = p. We therefore have E(X) = np.

Expected number of matches: If n letters are randomly placed in n envelopes, how many matches would we expect? Let Xi = 1 if the ith letter is placed in the correct envelope, and zero otherwise.

P(Xi = 1) = 1/n and P(Xi = 0) = 1 − 1/n

It is therefore the case that E(Xi) = 1/n for all i, and E(X) = 1/n + 1/n + ... + 1/n = 1.

Suppose the random variables X1, ..., Xn form a random sample of size n from a given continuous distribution on the real line for which the p.d.f. is f. Find the expectation of the number of observations in the sample that fall within a specified interval [a, b]. This is just like the first problem, except the probability of success is now ∫_a^b f(x) dx, so the answer is n∫_a^b f(x) dx.
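The matching result, E(X) = 1 regardless of n, is striking enough to be worth a simulation; a Python sketch:

    import random

    n, trials, total = 10, 50_000, 0
    for _ in range(trials):
        envelopes = list(range(n))
        random.shuffle(envelopes)                      # letter i lands in envelopes[i]
        total += sum(envelopes[i] == i for i in range(n))
    print(total / trials)                              # ~ 1 whatever n is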


    More examples

    The density function for X is given by f(x) = 2(1 x)I(0,1)E(X) =

    xf(x)dx = 2

    10(x x2)dx = 2

    [x22

    x33

    ]10= 2( 16 ) =

    13 and E(X

    2) = 210(x2 x3)dx = 2

    [x33

    x44

    ]10= 2( 13

    14 ) =

    16 . We

    can use these to compute E(6X+ 3X2) = 6( 13 )+ 3(16 ) =

    52 . We could have also computed this directly using the

    formula for the expectation of a function r(X).

    A horizontal line segment of length 5 is divided at a randomly selected point and X is thelength of the left-hand part. Let us find the expectation of the product of the lengths.

    We are picking a point from a uniform distribution on [0, 5] so the density f(x) = 15 I(0,5)(x). E(X) =52 and

    E(5X) = 52 (why?). The expected value of the product of the lengths is given by

    E [X(5X)] =50

    15 x(5 x)dx =

    256 6=

    (52

    )2= E(X)E(5X)

A bowl contains 5 chips, 3 marked $1 and 2 marked $4. A player draws 2 chips at random and is paid the sum of the values of the chips. If it costs $4.75 to play, is his expected gain positive?

Let the random variable X be the number of $1 chips. The probability function is f(x) = C(3, x) C(2, 2−x) / C(5, 2), x = 0, 1, 2 (a hypergeometric distribution). Compute f(0) = 1/10, f(1) = 6/10 and f(2) = 3/10. In this case u(x) = x + 4(2 − x) = 8 − 3x, so E[u(X)] = (1/10)·8 + (6/10)·5 + (3/10)·2 = 4.4. Alternatively, compute E(X) = 0·f(0) + 1·f(1) + 2·f(2) = 12/10 and find the desired expectation as 8 − 3E(X). Since 4.4 < 4.75, the expected gain is negative.
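The chip-game expectation can be verified numerically; a minimal sketch assuming SciPy is available (not part of the notes), where the hypergeometric p.m.f. mirrors f(x) above:

    from scipy.stats import hypergeom

    # X = number of $1 chips among 2 drawn from 3 ($1 chips) + 2 ($4 chips)
    X = hypergeom(M=5, n=3, N=2)       # population 5, 3 "successes", 2 draws
    expected_pay = sum(X.pmf(x) * (8 - 3 * x) for x in range(3))
    print(expected_pay)                # 4.4, so the expected gain is 4.4 - 4.75 < 0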


Variance of a random variable

Definition: If X is a random variable with E(X) = μ, the variance of X is defined as follows:

Var(X) = E[(X − μ)²]

Since (X − μ)² ≥ 0, the variance, if it exists, must be non-negative (as long as μ exists). The expectation E[(X − μ)²] will always exist if the values of X are bounded, but need not exist in general.

A small value of the variance indicates a distribution that is concentrated around μ. The variance is denoted by σ² and its non-negative square root is called the standard deviation and is denoted by σ.


    Variance properties

1. Var(X) = 0 if and only if there exists a constant c such that P(X = c) = 1.

2. For any constants a and b, Var(aX + b) = a²Var(X). It follows that Var(−X) = Var(X).
Proof: Var(aX + b) = E[(aX + b − aμ − b)²] = E[a²(X − μ)²] = a²E[(X − μ)²] = a²Var(X)

3. Var(X) = E(X²) − [E(X)]²
Proof: expand (X − μ)² in the definition and take expectations.

4. If X_1, . . . , X_n are independent random variables, then Var(X_1 + . . . + X_n) = Var(X_1) + . . . + Var(X_n).
Proof: For n = 2, E(X_1 + X_2) = μ_1 + μ_2 and therefore

Var(X_1 + X_2) = E[(X_1 + X_2 − μ_1 − μ_2)²] = E[(X_1 − μ_1)² + (X_2 − μ_2)² + 2(X_1 − μ_1)(X_2 − μ_2)]

Taking expectations term by term, we get

E[(X_1 − μ_1)² + (X_2 − μ_2)² + 2(X_1 − μ_1)(X_2 − μ_2)] = Var(X_1) + Var(X_2) + 2E[(X_1 − μ_1)(X_2 − μ_2)]

But since X_1 and X_2 are independent,

E[(X_1 − μ_1)(X_2 − μ_2)] = E(X_1 − μ_1) E(X_2 − μ_2) = (μ_1 − μ_1)(μ_2 − μ_2) = 0

It therefore follows that Var(X_1 + X_2) = Var(X_1) + Var(X_2). Using an induction argument, this can be established for any n.


    Moments of a random variable

Moments of a random variable are special types of expectations that capture characteristics of the distribution that we may be interested in (its shape and position). Moments are defined either around the origin or around the mean.

Definition: Let X be a random variable with density function f(x). Then the kth moment of X is the expectation E(X^k). This moment is denoted by μ′_k and is said to exist if and only if E(|X|^k) < ∞.


    Moment generating functions

Given a random variable X, consider for each real number t the following function, known as the moment generating function (MGF) of X:

ψ(t) = E(e^{tX})

If X is bounded, the above expectation exists for all values of t; if not, it may only exist for some values of t.

ψ(t) is always defined at t = 0 and ψ(0) = E(1) = 1. If the MGF exists for all values of t in an interval around t = 0, then the derivative of ψ(t) exists at t = 0 and

ψ′(0) = [d/dt E(e^{tX})]_{t=0} = E[(d/dt e^{tX})]_{t=0} = E[(X e^{tX})]_{t=0} = E(X)

The derivative of the MGF at t = 0 is the mean of X. More generally, the kth derivative evaluated at t = 0 gives us the kth moment of X.

Proof. The function e^x can be expressed as the sum of the series 1 + x + x²/2! + . . . , and so e^{tx} can be expressed as the sum 1 + tx + t²x²/2! + . . . , and the expectation is E(e^{tX}) = ∑_x (1 + tx + t²x²/2! + . . . ) f(x). If we differentiate this with respect to t and then set t = 0, we're left with only the second term in parentheses, so we have ∑_x x f(x), which is defined as the expectation of X. Similarly, if we differentiate twice, we're left with ∑_x x² f(x), which is the second moment. For continuous distributions, we replace the sum ∑_x with an integral ∫(. . . ) dx.


Moment generating functions ...an example

Suppose a random variable X has the density function f(x) = e^{−x} I_(0,∞)(x). We can use its MGF to compute the mean and the variance of X as follows:

ψ(t) = ∫_0^∞ e^{tx} e^{−x} dx = ∫_0^∞ e^{x(t−1)} dx = [e^{x(t−1)}/(t−1)]_0^∞ = 0 − 1/(t−1) = 1/(1−t) for t < 1

Taking the derivative of this function with respect to t, we get ψ′(t) = 1/(1−t)², and differentiating again, we get ψ″(t) = 2/(1−t)³.

Evaluating the first derivative at t = 0, we get μ = 1/(1−0)² = 1.

The variance σ² = μ′_2 − μ² = 2/(1−0)³ − 1 = 1.
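The two derivatives above can be checked symbolically; a minimal sketch assuming SymPy is available (not part of the notes):

    import sympy as sp

    t = sp.symbols('t')
    psi = 1 / (1 - t)                        # MGF derived above, valid for t < 1
    mean = sp.diff(psi, t).subs(t, 0)        # psi'(0) = 1
    second = sp.diff(psi, t, 2).subs(t, 0)   # psi''(0) = 2
    print(mean, second - mean**2)            # mean 1, variance 1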


    Properties of moment generating functions

RESULT 1: Let X be a random variable for which the MGF is ψ_1 and consider the random variable Y = aX + b, where a and b are given constants. Let the MGF of Y be denoted by ψ_2. Then for any value of t such that ψ_1(at) exists,

ψ_2(t) = e^{bt} ψ_1(at)

RESULT 2: Suppose that X_1, . . . , X_n are n independent random variables and that ψ_i is the MGF of X_i. Let Y = X_1 + . . . + X_n and let the MGF of Y be given by ψ. Then for any value of t such that ψ_i(t) exists for all i = 1, 2, . . . , n,

ψ(t) = ∏_{i=1}^n ψ_i(t)

RESULT 3: If the MGFs of two random variables X_1 and X_2 are identical for all values of t in an interval around the point t = 0, then the probability distributions of X_1 and X_2 must be identical.

Examples: If f(x) = e^{−x} I_(0,∞)(x) as in the above example, the MGF of the random variable Y = (X − 1) is e^{−t}/(1−t) for t < 1 (using the first result above, setting a = 1 and b = −1), and if Y = 3 − 2X, the MGF of Y is given by e^{3t}/(1+2t) for t > −1/2.


    An Illustration: the binomial distribution

Suppose that there is a probability p of a girl child being born, and this probability does not vary by the birth-order of the child.

A family has n children. The random variable X_i = 1 if the ith child is a girl and 0 otherwise. The total number of girls in the family, given by the random variable X = X_1 + . . . + X_n, follows a binomial distribution with parameters n and p.

We know from the properties of the variance that Var(X) = ∑_{i=1}^n Var(X_i). Now E(X_i) = 1·p + 0·(1−p) = p and E(X_i²) = 1²·p + 0²·(1−p) = p, so Var(X_i) = p − p² and Var(X) = np(1−p) (see the simulation check below).

We can get the same expression using the MGF for the binomial: The MGF for each of the X_i variables is given by e^t P(X_i = 1) + e^0 P(X_i = 0) = pe^t + q, where q = 1 − p. Using the product property of MGFs for sums of independent random variables, we get the MGF for X as

ψ(t) = (pe^t + q)^n

For two binomial random variables with parameters (n_1, p) and (n_2, p), the MGF of their sum is given by the product of the MGFs, (pe^t + q)^{n_1+n_2}, so the sum is binomial with parameters (n_1 + n_2, p).


    The median of a distribution

The mean gives us the centre of gravity of a distribution and is one way of summarizing it. A disadvantage in some contexts is that it is influenced by every observation.

An alternative measure of the centre of a distribution is the median.

Definition: For any random variable X, a median of the distribution of X is defined as a point m such that P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2.

RESULT: Let m be a median of the distribution of X and let d be any other number. Then

E(|X − m|) ≤ E(|X − d|)

Every distribution has at least one median and may have multiple medians, as seen in the following examples:

1. P(X = 1) = .1, P(X = 2) = .2, P(X = 3) = .3, P(X = 4) = .4

2. P(X = 1) = .1, P(X = 2) = .4, P(X = 3) = .3, P(X = 4) = .2

3. f(x) = 4x³ for 0 < x < 1, and 0 otherwise

4. f(x) = 1/2 for 0 ≤ x ≤ 1, 1 for 2.5 ≤ x ≤ 3, and 0 otherwise


    Covariance and correlation

Definition: Let X and Y be random variables with E(X) = μ_X and E(Y) = μ_Y and variances Var(X) = σ²_X and Var(Y) = σ²_Y.

The covariance of X and Y is defined as E[(X − μ_X)(Y − μ_Y)] and is denoted by σ_XY or Cov(X, Y).

The value of the covariance will be finite if each of the above variances is finite. It can be positive, negative or zero. It can conveniently be computed as E(XY) − E(X)E(Y) (just expand the expression above and take expectations).

Definition: If 0 < σ²_X < ∞ and 0 < σ²_Y < ∞, the correlation of X and Y is defined as ρ(X, Y) = σ_XY / (σ_X σ_Y).


    Properties of covariance and correlation

Result 1: Let X and Y be random variables with σ²_X < ∞ and σ²_Y < ∞. Then −1 ≤ ρ(X, Y) ≤ 1.

Topic 4: Some Special Distributions

    Parametric Families of Distributions

There are a few classes of functions that are frequently used as probability distributions, because they are easy to work with (they have a small number of parameters) and attach reasonable values to the types of uncertain events we are interested in analyzing.

The choice among these families depends on the question of interest:
- For modeling the distribution of income or consumption expenditure, we want a density which is skewed to the right (gamma, Weibull, lognormal, . . . )
- IQs, heights, weights and arm circumferences are quite symmetric around a mode (normal or truncated normal)
- number of successes in a given number of trials (binomial)
- the time to failure for a machine or person (gamma, exponential)

We refer to these probability density functions by f(x; θ), where θ refers to a parameter vector. A given choice of θ therefore leads to a given probability density function. Θ is used to denote the parameter space.


    Discrete Distributions: Uniform

Parameter: N

Probability function: f(x; N) = (1/N) I_{1,2,...,N}(x)

Moments (see the numerical check below):

μ = ∑ x f(x) = (1/N) · N(N+1)/2 = (N+1)/2

σ² = ∑ x² f(x) − μ² = (1/N) · N(N+1)(2N+1)/6 − ((N+1)/2)² = (N² − 1)/12

MGF: (1/N) ∑_{j=1}^N e^{jt}

Applications: experiments or situations in which each outcome is equally likely (dice, coins, . . . ). Can you think of applications in economics?
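A minimal numerical check of the two moment formulas for a fair die (N = 6), assuming NumPy is available:

    import numpy as np

    N = 6
    x = np.arange(1, N + 1)
    print(x.mean(), (N + 1) / 2)       # both 3.5
    print(x.var(), (N**2 - 1) / 12)    # both 35/12 = 2.9166...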


    Discrete Distributions: Bernoulli

Parameter: p, 0 ≤ p ≤ 1

Probability function: f(x; p) = p^x (1−p)^{1−x} I_{0,1}(x)

Moments:

μ = ∑ x f(x) = 1·p¹(1−p)⁰ + 0·p⁰(1−p)¹ = p

σ² = ∑ x² f(x) − μ² = p(1−p)

MGF: e^t p + e^0 (1−p) = pe^t + (1−p)

Applications: experiments or situations in which there are two possible outcomes: success or failure, defective or not defective, male or female, etc.


Discrete Distributions: Binomial

Parameters: (n, p), 0 ≤ p ≤ 1 and n a positive integer

Probability function: An observed sequence of n Bernoulli trials can be represented by an n-tuple of zeros and ones. The number of ways to achieve x ones is given by the binomial coefficient C(n, x) = n!/(x!(n−x)!). The probability of x successes in n trials is therefore:

f(x; n, p) = C(n, x) p^x (1−p)^{n−x} for x = 0, 1, 2, . . . , n, and 0 otherwise

Notice that since ∑_{x=0}^n C(n, x) a^x b^{n−x} = (a+b)^n, we have ∑_{x=0}^n f(x) = [p + (1−p)]^n = 1, so we have a valid density function.

MGF: The MGF is given by:

∑_x e^{tx} f(x) = ∑_{x=0}^n e^{tx} C(n, x) p^x (1−p)^{n−x} = ∑_{x=0}^n C(n, x) (pe^t)^x (1−p)^{n−x} = [(1−p) + pe^t]^n

Moments: The MGF can be used to derive μ = np and σ² = np(1−p).

Result: If X_1, . . . , X_k are independent random variables and each X_i has a binomial distribution with parameters n_i and p, then the sum X_1 + . . . + X_k has a binomial distribution with parameters n = n_1 + . . . + n_k and p.


Multinomial Distributions

Suppose there are a small number of different outcomes (methods of public transport, water purification, etc.). The multinomial distribution gives us the probability associated with a particular vector of these outcomes:

Parameters: (n, p_1, . . . , p_m), 0 ≤ p_i ≤ 1, ∑_i p_i = 1 and n a positive integer

Probability function:

f(x_1, . . . , x_m; n, p_1, . . . , p_m) = [n!/(∏_{i=1}^m x_i!)] ∏_{i=1}^m p_i^{x_i} for x_i = 0, 1, 2, . . . , n with ∑_{i=1}^m x_i = n, and 0 otherwise


    Geometric and Negative Binomial distributions

The Negative Binomial (or Pascal) distribution gives us the probability that x failures will occur before r successes are achieved. This means that the rth success occurs on the (x+r)th trial.

Parameters: (r, p), 0 ≤ p ≤ 1 and r a positive integer

Density: For the rth success to occur on the (x+r)th trial, we require (r−1) successes in the first (x+r−1) trials. We therefore obtain the density (writing q = 1 − p):

f(x; r, p) = C(r+x−1, x) p^r q^x, x = 0, 1, 2, 3, . . .

The geometric distribution is a special case of the negative binomial with r = 1. The density in this case takes the form f(x | 1, p) = p q^x over all natural numbers x, and the MGF is given by

E(e^{tX}) = p ∑_{x=0}^∞ (q e^t)^x = p/(1 − q e^t) for t < log(1/q)

We can use this function to get the mean and variance, μ = q/p and σ² = q/p².

The negative binomial is just a sum of r independent geometric variables, so its MGF is (p/(1 − q e^t))^r and the corresponding mean and variance are μ = rq/p and σ² = rq/p² (a numerical check follows this slide).

The geometric distribution is memory-less, so the conditional probability of k+t failures given at least k failures is the unconditional probability of t failures:

P(X = k+t | X ≥ k) = P(X = t)
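A minimal numerical check of the negative binomial mean and variance and of memorylessness, assuming SciPy is available; the parameters r = 3 and p = 0.4 are hypothetical. Note that scipy.stats.nbinom counts failures before the rth success, matching the convention used here:

    from scipy.stats import nbinom

    r, p = 3, 0.4                          # hypothetical parameters
    q = 1 - p
    print(nbinom.mean(r, p), r * q / p)    # both 4.5
    print(nbinom.var(r, p), r * q / p**2)  # both 11.25
    # Memorylessness of the geometric case (r = 1):
    k, t = 4, 2
    lhs = nbinom.pmf(k + t, 1, p) / nbinom.sf(k - 1, 1, p)  # P(X = k+t | X >= k)
    print(lhs, nbinom.pmf(t, 1, p))        # both equal p * q**t = 0.144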


Discrete Distributions: Poisson

Parameter: λ, λ > 0

Probability function:

f(x; λ) = e^{−λ} λ^x / x! for x = 0, 1, 2, . . . , and 0 otherwise

Using the result that the series 1 + λ + λ²/2! + λ³/3! + . . . converges to e^λ,

∑_x f(x) = ∑_{x=0}^∞ e^{−λ} λ^x / x! = e^{−λ} ∑_{x=0}^∞ λ^x / x! = e^{−λ} e^λ = 1, so we have a valid density.

Moments: μ = λ = σ²

MGF: E(e^{tX}) = ∑_{x=0}^∞ e^{tx} e^{−λ} λ^x / x! = e^{−λ} ∑_{x=0}^∞ (λe^t)^x / x! = e^{λ(e^t − 1)}

The MGF can be used to get the first and second moments about the origin, λ and λ² + λ, so the mean and the variance are both λ.

We can also use the product of k MGFs to show that the sum of k independently distributed Poisson variables has a Poisson distribution with mean λ_1 + . . . + λ_k.


A Poisson process

Suppose that the number of type A outcomes that occur over a fixed interval of time, [0, t], follows a process in which

1. The probability that precisely one type A outcome will occur in a small interval of time Δt is approximately proportional to the length of the interval:

g(1, Δt) = λΔt + o(Δt)

where o(Δt) denotes a function of Δt having the property that lim_{Δt→0} o(Δt)/Δt = 0.

2. The probability that two or more type A outcomes will occur in a small interval of time Δt is negligible:

∑_{x=2}^∞ g(x, Δt) = o(Δt)

3. The numbers of type A outcomes that occur in nonoverlapping time intervals are independent events.

These conditions imply a process which is stationary over the period of observation, i.e. the probability of an occurrence must be the same over the entire period, with neither busy nor quiet intervals.


Poisson densities representing Poisson processes

RESULT: Consider a Poisson process with rate λ per unit of time. The number of events in a time interval of length t has a Poisson density with mean λt.

Applications:
- the number of weaving defects in a yard of handloom cloth or stitching defects in a shirt
- the number of traffic accidents on a motorway in an hour
- the number of particles of a noxious substance that come out of a chimney in a given period of time
- the number of times a machine breaks down each week

Example:
- Let the probability of exactly one blemish in a foot of wire be 1/1000 and that of two or more blemishes be zero.
- We're interested in the number of blemishes in 3,000 feet of wire.
- If the numbers of blemishes in non-overlapping intervals are assumed to be independently distributed, then our random variable X follows a Poisson density with λt = (1/1000) · 3000 = 3 and

P(X = 5) = 3⁵ e^{−3} / 5!

- You can plug this into a computer (see the sketch below), or alternatively use tables, to compute f(5; 3) = .101.
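A minimal sketch of the suggested computation, in plain Python (no tables needed):

    from math import exp, factorial

    lam = 3.0                                  # lambda * t = (1/1000) * 3000
    print(lam**5 * exp(-lam) / factorial(5))   # approximately 0.1008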


The Poisson as a limiting distribution

We can show that a binomial distribution with large n and small p can be approximated by a Poisson (which is computationally easier).

Useful result: e^v = lim_{n→∞} (1 + v/n)^n

We can rewrite the binomial density for non-zero values as

f(x; n, p) = [∏_{i=1}^x (n − i + 1)] / x! · p^x (1−p)^{n−x}

If np = λ, we can substitute λ/n for p to get

lim_{n→∞} f(x; n, p) = lim_{n→∞} [∏_{i=1}^x (n − i + 1)] / x! · (λ/n)^x (1 − λ/n)^{n−x}

= lim_{n→∞} [∏_{i=1}^x (n − i + 1) / n^x] · (λ^x / x!) · (1 − λ/n)^n (1 − λ/n)^{−x}

= lim_{n→∞} [(n/n) · ((n−1)/n) · · · ((n−x+1)/n)] · (λ^x / x!) · (1 − λ/n)^n (1 − λ/n)^{−x}

= e^{−λ} λ^x / x!

(using the above result and the property that the limit of a product is the product of the limits)


    Poisson as a limiting distribution...example

We have a 300 page novel with 1,500 letters on each page. Typing errors are as likely to occur for one letter as for another, and the probability of such an error is given by p = 10⁻⁵.

The total number of letters is n = (300)(1500) = 450,000. Using λ = np = 4.5, the Poisson distribution function gives us the probability of the number of errors being less than or equal to 10 as (see the check below):

P(x ≤ 10) ≈ ∑_{x=0}^{10} e^{−4.5} (4.5)^x / x! = .9933

Rules of Thumb: the Poisson probabilities are close to binomial probabilities when n ≥ 20 and p ≤ .05, and the approximation is excellent when n ≥ 100 and np ≤ 10.
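A sketch comparing the exact binomial probability with the Poisson approximation for this example, assuming SciPy is available:

    from scipy.stats import binom, poisson

    n, p = 450_000, 1e-5
    print(binom.cdf(10, n, p))      # exact binomial, approximately 0.9933
    print(poisson.cdf(10, n * p))   # Poisson approximation, approximately 0.9933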


    Discrete distributions: Hypergeometric

Suppose, as in the case of the binomial, there are two possible outcomes and we're interested in the probability of x values of a particular outcome, but we are drawing randomly without replacement, so our trials are not independent.

In particular, suppose there are A + B objects from which we pick n; A of the total number available are of one type (red balls) and the rest are of the other (blue balls).

If the random variable is the total number of red balls selected, then, for appropriate values of x, we have

f(x; A, B, n) = C(A, x) C(B, n−x) / C(A+B, n)

Over what values of x is this defined? max{0, n − B} ≤ x ≤ min{n, A}

The multivariate extension is (for x_i ∈ {0, 1, 2, . . . , n}, ∑_{i=1}^m x_i = n and ∑_{i=1}^m K_i = M):

f(x_1, . . . , x_m; K_1, . . . , K_m, n) = [∏_{j=1}^m C(K_j, x_j)] / C(M, n)


    Continuous distributions: uniform or rectangular

Parameters: (a, b), −∞ < a < b < ∞

Density: f(x; a, b) = [1/(b−a)] I_[a,b](x)

Moments: μ = (a+b)/2, σ² = (b−a)²/12


The gamma function

The gamma function is a special mathematical function that is widely used in statistics. The gamma function of α is defined as

Γ(α) = ∫_0^∞ y^{α−1} e^{−y} dy     (1)

If α = 1, Γ(1) = ∫_0^∞ e^{−y} dy = [−e^{−y}]_0^∞ = 1

If α > 1, we can integrate (1) by parts, setting u = y^{α−1} and dv = e^{−y} dy and using the formula ∫_0^∞ u dv = [uv]_0^∞ − ∫_0^∞ v du to get:

[−y^{α−1} e^{−y}]_0^∞ + (α−1) ∫_0^∞ y^{α−2} e^{−y} dy

The first term in the above expression is zero because the exponential function goes to zero faster than any polynomial grows, and we obtain

Γ(α) = (α−1) Γ(α−1)

and for any integer α > 1, we have

Γ(α) = (α−1)(α−2)(α−3) . . . (3)(2)(1) Γ(1) = (α−1)!
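A quick check of Γ(α) = (α−1)! for integer α, using only Python's standard library:

    from math import gamma, factorial

    for alpha in range(2, 6):
        print(gamma(alpha), factorial(alpha - 1))   # identical: 1, 2, 6, 24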


    The gamma distribution

Define the variable x by y = x/β, where β > 0. Then dy = (1/β) dx and we can rewrite Γ(α) as

Γ(α) = ∫_0^∞ (x/β)^{α−1} e^{−x/β} (1/β) dx

or as

1 = ∫_0^∞ [1/(Γ(α) β^α)] x^{α−1} e^{−x/β} dx

This shows that for α, β > 0,

f(x; α, β) = [1/(Γ(α) β^α)] x^{α−1} e^{−x/β} I_(0,∞)(x)

is a valid density and is known as a gamma-type probability density function.


    Features of the gamma density

This is a valuable distribution because it can take a variety of shapes depending on the values of the parameters α and β:

- It is skewed to the right.
- It is strictly decreasing when α ≤ 1.
- If α = 1, we have the exponential density, which is memory-less.
- For α > 1 the density attains its maximum at x = β(α−1).

[Figure 5.7, DeGroot and Schervish: graphs of the p.d.f.s of several different gamma distributions with common mean of 1.]


    Moments of the gamma distribution

Parameters: (α, β), α > 0, β > 0

Moments: μ = αβ, σ² = αβ²

MGF: M_X(t) = (1 − βt)^{−α} for t < 1/β, which can be derived as follows:

M_X(t) = ∫_0^∞ e^{tx} [1/(Γ(α) β^α)] x^{α−1} e^{−x/β} dx

= ∫_0^∞ [1/(Γ(α) β^α)] x^{α−1} e^{−(1/β − t)x} dx

= [1/(Γ(α) β^α)] Γ(α) (1/β − t)^{−α}   (by setting y = (1/β − t)x in the expression for Γ(α))

= 1/[β^α (1/β − t)^α] = (1 − βt)^{−α}


    Gamma applications

Survival analysis: we can use the gamma to model the waiting time till the rth event/success. If X is the time that passes until the first success, then X could have a gamma distribution with α = 1 and β = 1/λ. This is known as an exponential distribution. If, instead, we are interested in the time taken for the rth success, this has a gamma density with α = r and β = 1/λ.

Related to the Poisson distribution: if the variable Y is the number of successes (deaths, for example) in a given time period of length t and has a Poisson density with parameter θ, the rate of success is given by λ = θ/t.

Example: A bottling plant breaks down, on average, twice every four weeks. We want the probability that the number of breakdowns X ≤ 3 in the next four weeks. We have θ = 2 and the breakdown rate λ = 1/2 per week, so (both probabilities in this example are computed below)

P(X ≤ 3) = ∑_{i=0}^3 e^{−2} 2^i / i! = .135 + .271 + .271 + .180 = .857

Suppose we wanted the probability that the machine does not break down in the next four weeks. The time x taken until the first breakdown must therefore be more than four weeks. This waiting time follows a gamma distribution with α = 1 and β = 1/λ = 2:

P(X ≥ 4) = ∫_4^∞ (1/2) e^{−x/2} dx = [−e^{−x/2}]_4^∞ = e^{−2} = .135

Income distributions that are uni-modal
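Both probabilities in the breakdown example can be computed in a few lines of plain Python:

    from math import exp, factorial

    theta = 2.0   # mean number of breakdowns in four weeks
    p_at_most_3 = sum(exp(-theta) * theta**i / factorial(i) for i in range(4))
    print(p_at_most_3)   # approximately 0.857
    print(exp(-2))       # P(no breakdown in four weeks), approximately 0.135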


    Gamma distributions: some useful properties

Gamma Additivity: Let X_1, . . . , X_n be independently distributed random variables with respective gamma densities Gamma(α_i, β). Then

Y = ∑_{i=1}^n X_i ∼ Gamma(∑_{i=1}^n α_i, β)

Scaling Gamma Random Variables: Let X be distributed with gamma density Gamma(α, β) and let c > 0. Then

Y = cX ∼ Gamma(α, cβ)

Both these can be easily proved using the gamma MGF and applying the MGF uniqueness theorem. In the first case the MGF of Y is the product of the individual MGFs, i.e.

M_Y(t) = ∏_{i=1}^n M_{X_i}(t) = ∏_{i=1}^n (1 − βt)^{−α_i} = (1 − βt)^{−∑_i α_i} for t < 1/β

which is the MGF of a Gamma(∑_i α_i, β) density. In the second case, M_Y(t) = M_X(ct) = (1 − βct)^{−α}, the MGF of a Gamma(α, cβ) density.

The Normal Distribution

Parameters: (μ, σ²), −∞ < μ < ∞, σ > 0

Density: f(x; μ, σ²) = [1/(√(2π) σ)] e^{−(1/2)((x−μ)/σ)²} I_(−∞,+∞)(x)

MGF: M_X(t) = e^{μt + σ²t²/2}

The MGF can be used to derive the moments: E(X) = μ and the variance is σ².

As can be seen from the p.d.f., the distribution is symmetric around μ, where it achieves its maximum value; μ is therefore also the median and the mode of the distribution.

The normal distribution with zero mean and unit variance is known as the standard normal distribution and is of the form: f(x; 0, 1) = [1/√(2π)] e^{−x²/2} I_(−∞,+∞)(x)

The tails of the distribution are thin: 68% of the total probability lies within one σ of the mean, 95.4% within 2σ and 99.7% within 3σ.


    The Normal distribution: deriving the MGF

By the definition of the MGF:

M(t) = ∫_{−∞}^{∞} e^{tx} [1/(√(2π) σ)] e^{−(x−μ)²/(2σ²)} dx

= ∫_{−∞}^{∞} [1/(√(2π) σ)] e^{[tx − (x−μ)²/(2σ²)]} dx

We can rewrite the term inside the square brackets (by completing the square) to obtain:

tx − (x−μ)²/(2σ²) = μt + (1/2)σ²t² − [x − (μ + σ²t)]²/(2σ²)

The MGF can now be written as:

M_X(t) = C e^{μt + (1/2)σ²t²}

where C = ∫_{−∞}^{∞} [1/(√(2π) σ)] e^{−[x−(μ+σ²t)]²/(2σ²)} dx = 1, because the integrand is a normal p.d.f. with the parameter μ replaced by (μ + σ²t).


    The Normal distribution: computing moments

First, taking derivatives of the MGF:

M(t) = e^{μt + σ²t²/2}

M′(t) = M(t)(μ + σ²t)

M″(t) = M(t)σ² + M(t)(μ + σ²t)²

(obtained by differentiating M′(t) with respect to t and substituting for M′(t))

Evaluating these at t = 0, we get M′(0) = μ and M″(0) = σ² + μ², so the variance M″(0) − [M′(0)]² = σ².


    Transformations of Normally Distributed Variables...1

RESULT 1: Let X ∼ N(μ, σ²). Then Z = (X − μ)/σ ∼ N(0, 1).

Proof: Z is of the form aX + b with a = 1/σ and b = −μ/σ. Therefore

M_Z(t) = e^{bt} M_X(at) = e^{−μt/σ} e^{μt/σ + σ²t²/(2σ²)} = e^{t²/2}

which is the MGF of a standard normal distribution.

An important implication of the above result is that if we are interested in any distribution in this class of normal distributions, we only need to be able to compute integrals for the standard normal; these are the tables you'll see at the back of most textbooks.

Example: The kilometres per litre of fuel achieved by a new Maruti model, X ∼ N(17, .25). What is the probability that a new car will achieve between 16 and 18 kilometres per litre?

Answer: P(16 ≤ X ≤ 18) = P((16−17)/.5 ≤ Z ≤ (18−17)/.5) = P(−2 ≤ Z ≤ 2) = 1 − 2(.0228) = .9544


    Transformations of Normals...2

RESULT 2: Let X ∼ N(μ, σ²) and Y = aX + b, where a and b are given constants and a ≠ 0. Then Y has a normal distribution with mean aμ + b and variance a²σ².

Proof: The MGF of Y can be expressed as M_Y(t) = e^{bt} e^{aμt + (1/2)σ²a²t²} = e^{(aμ+b)t + (1/2)(aσ)²t²}. This is simply the MGF for a normal distribution with mean aμ + b and variance a²σ².

RESULT 3: If X_1, . . . , X_k are independent and X_i has a normal distribution with mean μ_i and variance σ²_i, then Y = X_1 + . . . + X_k has a normal distribution with mean μ_1 + . . . + μ_k and variance σ²_1 + . . . + σ²_k.

Proof: Write the MGF of Y as the product of the MGFs of the X_i's and gather linear and squared terms separately to get the desired result.

We can combine these two results to derive the distribution of the sample mean:

RESULT 4: Suppose that the random variables X_1, . . . , X_n form a random sample from a normal distribution with mean μ and variance σ², and let X̄_n denote the sample mean. Then X̄_n has a normal distribution with mean μ and variance σ²/n.


Transformations of Normals to χ² distributions

RESULT 5: If X ∼ N(0, 1), then Y = X² has a χ² distribution with one degree of freedom.

Proof:

M_Y(t) = ∫_{−∞}^{∞} e^{x²t} [1/√(2π)] e^{−x²/2} dx

= ∫_{−∞}^{∞} [1/√(2π)] e^{−(1/2)x²(1−2t)} dx

= (1−2t)^{−1/2} ∫_{−∞}^{∞} [1/(√(2π)(1−2t)^{−1/2})] e^{−x²(1−2t)/2} dx

= (1−2t)^{−1/2} for t < 1/2

since the remaining integrand is a normal p.d.f. with mean 0 and variance (1−2t)^{−1}. This is the MGF of a Gamma(α = 1/2, β = 2) density, i.e. of a χ² distribution with one degree of freedom.

Chebyshev's Inequality

RESULT: Let X be a random variable with mean μ for which Var(X) = σ² exists. Then for every t > 0,

P(|X − μ| ≥ t) ≤ σ²/t²

or equivalently,

P(|X − μ| < t) ≥ 1 − σ²/t²

Proof. Use Markov's inequality with Y = (X − μ)² and use t² in place of the constant t. Then Y takes only non-negative values and E(Y) = Var(X) = σ².

In particular, this tells us that for any random variable, the probability that values taken by the variable will be more than 3 standard deviations away from the mean cannot exceed 1/9:

P(|X − μ| ≥ 3σ) ≤ 1/9

For most distributions, this upper bound is considerably higher than the actual probability of this event.


Probability bounds ...an example

Chebyshev's Inequality can, in principle, be used for computing bounds for the probabilities of certain events. In practice it is not often used, because the bounds it provides are quite different from actual probabilities, as seen in the following example:

Let the density function of X be given by f(x) = [1/(2√3)] I_(−√3,√3)(x). In this case μ = 0 and σ² = (b−a)²/12 = 1. If t = 3/2, then

Pr(|X − μ| ≥ 3/2) = Pr(|X| ≥ 3/2) = 1 − ∫_{−3/2}^{3/2} [1/(2√3)] dx = 1 − √3/2 ≈ .134

Chebyshev's inequality gives us σ²/t² = 4/9, which is much higher. If t = 2, the exact probability is 0, while our bound is 1/4.
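The exact probability and the Chebyshev bound for this example, in plain Python:

    from math import sqrt

    t = 1.5
    exact = 1 - t / sqrt(3)   # P(|X| >= 1.5) for Uniform(-sqrt(3), sqrt(3)), approx. 0.134
    bound = 1 / t**2          # Chebyshev bound sigma^2/t^2 = 4/9, approx. 0.444
    print(exact, bound)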


    The sample mean and its properties

Our estimate for the mean of a population is typically the sample mean.