
    Chapter (II)

    Definitions and Notation

This chapter is concerned with some important definitions and notation that will be used in this study. The first section reviews some different approaches to estimation, the second section is devoted to some topics in hypothesis testing, the third section focuses on measures of information, the fourth section deals with optimization subject to constraints via the Lagrange multiplier, and finally the fifth section explains in brief some important distributions.

2.1 Methods of Estimation

The problem of point estimation for the parameters of a distribution plays a vital role in the statistical literature; therefore many methods of estimation have been proposed. This section is concerned with three methods of estimation.

    1 Method of Moments

It is difficult to trace back who introduced the method of moments (MOM), but Johann Bernoulli (1667-1748) was the first who used the method in his work, see Gelder (1997). The idea of this method is that we express the unknown parameters in terms of the unobserved population moments, for instance the mean, variance, skewness, kurtosis and coefficient of variation, and then estimate the unobserved moments by the observed sampling moments. Typically, the different types of moments that can represent the observed sampling moments have the following forms:

1. The moments about zero (raw moments): $E(x^r)$

2. The central moments: $E(x - \bar{x})^r$

3. The standard moments: $E\left(\dfrac{x - \bar{x}}{\sigma}\right)^r$

where $r = 1, \ldots, k$, and $\bar{x}$, $\sigma$ and $k$ refer to the mean, the standard deviation and the number of estimated parameters of the distribution respectively. Hence the method works by simultaneously solving a system of k equations in k unknown parameters and k observed sample moments.
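As a brief illustration, the following sketch applies the method of moments to a simulated gamma sample; the gamma model, the simulated data and the variable names are illustrative assumptions, not taken from the text.

    import numpy as np

    # Method-of-moments sketch for a Gamma(shape k, scale theta) sample.
    # Population moments: mean = k*theta, variance = k*theta**2.  Equating
    # them to the observed sample moments and solving the two equations in
    # the two unknowns gives closed-form estimators.
    rng = np.random.default_rng(0)
    x = rng.gamma(shape=2.0, scale=3.0, size=5_000)   # simulated data

    m1 = x.mean()            # first raw sample moment
    m2 = x.var()             # second central sample moment

    theta_hat = m2 / m1      # from k*theta = m1 and k*theta**2 = m2
    k_hat = m1 / theta_hat
    print(k_hat, theta_hat)  # close to the true values 2.0 and 3.0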


    2 Method of Maximum Likelihood

It is difficult to trace who discovered this tool, but Bernoulli in 1700 was the first who reported on it, see Gelder (1997). The idea is that the specified sample should be given a high probability of being drawn, so one searches for the parameter values that maximize the likelihood function for the specified sample.

The likelihood function is the joint density function of a completely random sample, taking the following form:

L(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)

The method of maximum likelihood estimates $\theta$ by finding the value $\hat{\theta}$ that maximizes $L(x_1, \ldots, x_n; \theta)$; hence $\hat{\theta}$ is called the maximum likelihood estimator (MLE). Indeed, $\hat{\theta}$ is obtained, in many cases, by solving the following equation:

\frac{dL(x_1, \ldots, x_n; \theta)}{d\theta} = 0    (2.1.1)

The maximum likelihood method can also be used to estimate k unknown parameters, in which case a system of k equations in k unknown parameters is solved. It can be shown that the estimator defined by equation (2.1.1) cannot be obtained if the following conditions (often called regularity conditions) are not valid:

1. The first and second derivatives of the log-likelihood function must be defined.

2. The range of the Xs does not depend on the unknown parameter $\theta$.

Note: In many situations (2.1.1) cannot be solved easily; thus one can use a monotonic transformation that makes the calculation easier with no loss of information:

\frac{d\{\ln L(x_1, \ldots, x_n; \theta)\}}{d\theta} = \sum_{i=1}^{n} \frac{d \ln f(x_i; \theta)}{d\theta}
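As an illustration, the following sketch maximizes a log-likelihood numerically and compares the result with the closed-form MLE; the exponential model, the simulated data and the names used are illustrative assumptions, not taken from the text.

    import numpy as np
    from scipy.optimize import minimize_scalar

    # Maximum-likelihood sketch for an exponential(rate lam) sample.
    # The log-likelihood is n*ln(lam) - lam*sum(x); setting its derivative
    # to zero gives the closed-form MLE lam_hat = n / sum(x) = 1 / mean(x).
    rng = np.random.default_rng(1)
    x = rng.exponential(scale=1 / 0.7, size=2_000)    # true rate 0.7

    def neg_log_lik(lam):
        return -(len(x) * np.log(lam) - lam * x.sum())

    numeric = minimize_scalar(neg_log_lik, bounds=(1e-6, 10), method="bounded")
    print(numeric.x, 1 / x.mean())    # the numerical and closed-form MLEs agree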


3 Method of Least Squares

The method of least squares, or ordinary least squares (OLS), often has a vital role in statistical research, particularly regression analysis, and was proposed by Gauss, see Gelder (1997). Typically OLS is used to estimate the relation between two variables, known as the independent and dependent variables. Least squares problems fall into two categories, linear and non-linear. The linear least squares problem has a closed-form solution, but the non-linear problem does not and is usually solved by an iterative process. Furthermore, OLS can be applied with one or more independent variables; this study will focus on one independent variable.

Suppose $Y_1, Y_2, \ldots, Y_n$ are pairwise uncorrelated random variables representing the dependent variable and $X_1, X_2, \ldots, X_n$ represent the fixed independent variable, and suppose the relation between the Ys and Xs is expressed as:

Y_i = B_0 + B_1 X_i + U_i, \qquad i = 1, \ldots, n

where the Us refer to the residuals of the model. Thus OLS states that one should pick the values of the Bs which make the sum of squared residuals as small as possible:

\min \sum_{i=1}^{n} U_i^2 = E(Y, B, X) = \sum_{i=1}^{n} (y_i - B_0 - B_1 x_i)^2

Differentiating $E(Y, B, X)$ with respect to the Bs yields:

\frac{dE(Y, B, X)}{dB_0} = -2 \sum_{i=1}^{n} (y_i - B_0 - B_1 x_i) = 0

\frac{dE(Y, B, X)}{dB_1} = -2 \sum_{i=1}^{n} x_i (y_i - B_0 - B_1 x_i) = 0    (2.1.2)

It is easy to check that (2.1.2) gives a minimum; hence solving (2.1.2) yields:


b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}
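The closed-form solution above is easy to evaluate directly; the following sketch does so on simulated data and cross-checks the result with NumPy's polynomial fit. The data, the true coefficients and the variable names are illustrative assumptions.

    import numpy as np

    # Ordinary least squares with one regressor, using the closed-form
    # estimators b1 = (sum(x*y) - n*xbar*ybar) / (sum(x**2) - n*xbar**2)
    # and b0 = ybar - b1*xbar derived above.
    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, size=200)
    y = 1.5 + 0.8 * x + rng.normal(0, 1, size=200)    # B0 = 1.5, B1 = 0.8

    n = len(x)
    b1 = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
    b0 = y.mean() - b1 * x.mean()

    print(b0, b1)               # close to 1.5 and 0.8
    print(np.polyfit(x, y, 1))  # [slope, intercept] from NumPy, as a cross-check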

So far, it is not obvious which method is more efficient than the others; fortunately, to address this question some topics related to the properties of point estimators and confidence intervals should be discussed.

Definition (2.1.1): In statistics, point estimation refers to the use of sample data to calculate a single value, well known as a statistic, that is an observed function of the sample, where the function itself does not depend on the parameter, and which serves as a best guess for an unknown population parameter.

Definition (2.1.2) Unbiased Estimator: The first criterion by which estimators can be classified is unbiasedness. Suppose $\hat{\theta}$ is a statistic computed from an observed random sample and considered as a point estimator for $\theta$; we call $\hat{\theta}$ an unbiased estimator for $\theta$ iff $E(\hat{\theta}) = \theta$. If the previous condition holds only as the sample size becomes large, we call $\hat{\theta}$ an asymptotically unbiased estimator for $\theta$.

Definition (2.1.3) Relative Efficient Estimator: Suppose $\hat{\theta}_1$ and $\hat{\theta}_2$ are two estimators for $\theta$; $\hat{\theta}_1$ is said to be relatively more efficient than $\hat{\theta}_2$ iff

\frac{Var(\hat{\theta}_1)}{Var(\hat{\theta}_2)} \leq 1

where

\lambda = \frac{\prod_{i=1}^{n} f(x_i; \theta_0)}{\prod_{i=1}^{n} f(x_i; \theta_1)}

and k is a positive constant. The idea is that we calculate the ratio between the likelihood functions under $H_0$ and $H_1$; a high value of $\lambda$ indicates accepting $H_0$, otherwise it indicates rejecting $H_0$,


therefore this ratio is well known as the simple likelihood ratio, or the Neyman-Pearson lemma.

Definition (2.2.2): If it is required to test a simple hypothesis versus a composite alternative hypothesis, then among all tests of the given size or less, the test that is most powerful against all alternative hypotheses is called the Uniformly Most Powerful Test, and it takes the following form:

Reject $H_0$ if $\lambda^* < c$; accept $H_0$ if $\lambda^* > c$,

where

\lambda^* = \frac{\prod_{i=1}^{n} f_{\omega}(x_i)}{\prod_{i=1}^{n} f_{\Omega}(x_i)}

and c is a positive constant. The idea is that we calculate the ratio between the likelihood function under $H_0$ and under $H_1$, where $f_{\Omega}(x_i)$ refers to the likelihood taken over the whole parameter space of $\theta$; this ratio is typically called the Generalized Likelihood Ratio.

It is obvious that $\lambda$ is a special case of $\lambda^*$. The exact distribution of $\lambda^*$ corresponding to a particular null and alternative hypothesis follows from the sampling distribution of the test statistic, which in many cases is not easy to obtain; fortunately, it has been proved that for any particular null and alternative hypothesis $-2 \ln \lambda^*$ has approximately a $\chi^2$ distribution with degrees of freedom equal to the number of parameters tested in the null hypothesis.
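A small numerical sketch of this large-sample result, using an exponential model and simulated data as illustrative assumptions: $-2 \ln \lambda^*$ is referred to a chi-square distribution with one degree of freedom.

    import numpy as np
    from scipy.stats import chi2

    # Generalized likelihood ratio sketch for H0: lam = 1.0 in an
    # exponential(rate lam) model; -2*ln(lambda*) is compared with a
    # chi-square distribution with 1 degree of freedom.
    rng = np.random.default_rng(3)
    x = rng.exponential(scale=1.0, size=500)        # data generated under H0

    def log_lik(lam):
        return len(x) * np.log(lam) - lam * x.sum()

    lam_hat = 1 / x.mean()                          # unrestricted MLE
    stat = -2 * (log_lik(1.0) - log_lik(lam_hat))   # -2 ln(lambda*)
    print(stat, chi2.sf(stat, df=1))                # statistic and its chi-square(1) p-value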

2.3 Measures of Information

A great variety of information measures have been proposed in the literature recently, see Esteban (1995), and Shannon (1948) made a huge contribution to the development of information theory; thus this section deals with Shannon's entropy and some measures related to Shannon's (1948) entropy.

Definition (2.3.1): The origin of the entropy concept goes back to Ludwig Boltzmann (1877); it is a Greek term meaning transformation, and it was given a probabilistic interpretation in information theory by Shannon (1948). He considered entropy as an index of the uncertainty associated with a random variable, expressed in


nats, where a nat (sometimes nit or nepit) is a unit of information or entropy based on natural logarithms. Let there be n events with probabilities $p_1, p_2, \ldots, p_n$ adding up to 1; Shannon (1948) stated that the entropy corresponding to these events can take the following form:

H(X) = -\sum_{i=1}^{n} p(x_i) \ln p(x_i)    (2.3.1)

Hence, Shannon (1948) claimed that via (2.3.1) one can transform the information in the sample from an invisible form into a numerical, physical form, so that comparisons can easily be made and understood. Frenken (2003) mentioned that (2.3.1) can be regarded as the variance for qualitative data.
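A minimal sketch of (2.3.1) in code, with the convention $0 \ln 0 = 0$ used later in the text; the probability vectors are illustrative.

    import numpy as np

    # Shannon entropy (2.3.1), in nats, of a discrete distribution.
    def entropy(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]                  # events with zero probability drop out
        return -np.sum(p * np.log(p))

    print(entropy([1.0, 0.0, 0.0]))            # 0: a certain event, minimum uncertainty
    print(entropy([0.25, 0.25, 0.25, 0.25]))   # ln(4): maximum for four equally likely events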

To show how Shannon (1948) arrived at (2.3.1), let $n_1, n_2, \ldots, n_k$ be the number of times each category occurs in an experiment of length n, where:

\sum_{i=1}^{k} n_i = n \quad \text{and} \quad p_i = \frac{n_i}{n}

According to Golan (1996), Shannon (1948) noted that the number of all possible combinations that partition n into k categories of sizes $n_1, \ldots, n_k$ can be an indicator of the accuracy of any decision associated with this sample; one can express the number of all possible combinations as:

W = C^{n}_{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1! \, n_2! \cdots n_k!}    (2.3.2)

It is obvious that (2.3.2) is always greater than or equal to one; if (2.3.2) equals one, this indicates that the sample has only one category, which corresponds to the maximum accuracy and minimum uncertainty. For more simplicity Shannon (1948) preferred to deal with the logarithm of W as follows:

\ln(W) = \ln n! - \sum_{i=1}^{k} \ln n_i!


Using Stirling's approximation, which states that

\ln x! \approx x \ln x - x \quad \text{as } x \to \infty,

\ln(W) becomes:

\ln(W) \approx n \ln n - n - \sum_{i=1}^{k} (n_i \ln n_i - n_i)

= n \ln n - \sum_{i=1}^{k} n_i \ln n_i

= n \ln n - \sum_{i=1}^{k} n_i \ln(n p_i)

= n \ln n - \sum_{i=1}^{k} n_i \ln n - \sum_{i=1}^{k} n_i \ln p_i

= -n \sum_{i=1}^{k} p_i \ln p_i

so that

\frac{1}{n} \ln(W) \approx -\sum_{i=1}^{k} p_i \ln p_i = H(p)
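A quick numerical check of this limiting relation, a sketch with an arbitrary probability vector: $\ln(W)/n$ is computed exactly through the log-gamma function and compared with $H(p)$ as n grows.

    import numpy as np
    from math import lgamma

    # Check that ln(W)/n -> -sum(p_i * ln p_i) = H(p), where
    # W = n! / (n_1! ... n_k!) and lgamma(m + 1) equals ln(m!).
    p = np.array([0.2, 0.1, 0.3, 0.25, 0.15])
    H = -np.sum(p * np.log(p))

    for n in (100, 1_000, 100_000):
        counts = np.round(n * p).astype(int)
        counts[-1] = n - counts[:-1].sum()     # force the counts to add up to n
        ln_W = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
        print(n, ln_W / n, H)                  # ln(W)/n approaches H(p)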

Therefore Shannon's (1948) entropy can be regarded as a measure of the average accuracy associated with decisions based on the sample. Indeed, Shannon (1948) mentioned that (2.3.1) satisfies the following properties:


1. The quantity $H(X)$ reaches a minimum, equal to zero, when one of the events is a certainty, assuming $0 \ln(0) = 0$, and $H(X)$ reaches its maximum when all the probabilities are equal; hence $H(X)$ can be regarded as a concave function.

2. If some events have zero probability, they can just as well be left out of the entropy when we evaluate the uncertainty.

3. Entropy information must be symmetric; it does not depend on the order of the probabilities.

For a continuous distribution (2.3.1) takes the following form:

H(X) = -\int f(x, \theta) \ln f(x, \theta) \, dx

Definition (2.3.2): The joint entropy is a measure concerned with the uncertainty of two variables, taking the following form:

H(X, Y) = -\sum_{i=1}^{n} p(x_i, y_i) \ln p(x_i, y_i)

It is obvious that:

H(X, Y) \leq H(X) + H(Y)

According to Shannon (1948), the uncertainty of a joint event is less than or equal to the sum of the individual uncertainties, with equality only if the events are independent.

Definition (2.3.3): The mutual information measures the information that X and Y share, taking the following form:

M(X, Y) = \sum_{i=1}^{n} p(x_i, y_i) \ln \frac{p(x_i, y_i)}{p(x_i)\, p(y_i)}

It is obvious that $M(X, Y) = 0$ if the two variables are independent.


Definition (2.3.4): The conditional entropy $H(X/Y)$ is a measure of what Y does not say about X, meaning how much information is in X but not in Y, and takes the following form:

H(X/Y) = H(X, Y) - H(Y)

Remark: Definitions (2.3.2) to (2.3.4) can be extended to continuous variables if the summation symbol is replaced with the integration symbol.

If the two variables are independent, the conditional entropy $H(X/Y)$ will equal $H(X)$. One can see that there is a relation between the measures of information, as illustrated below:

Venn diagram: relation between the information measures
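The following sketch evaluates the three measures above for a small two-by-two joint distribution; the probability table is an illustrative assumption, not from the text.

    import numpy as np

    # Joint entropy, mutual information and conditional entropy (in nats)
    # for a 2x2 joint distribution p(x, y).
    pxy = np.array([[0.30, 0.20],
                    [0.10, 0.40]])
    px = pxy.sum(axis=1)              # marginal of X
    py = pxy.sum(axis=0)              # marginal of Y

    def H(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    H_xy = H(pxy)                                         # H(X, Y)
    M_xy = np.sum(pxy * np.log(pxy / np.outer(px, py)))   # M(X, Y)
    H_x_given_y = H_xy - H(py)                            # H(X/Y) = H(X,Y) - H(Y)

    print(H_xy, H(px) + H(py))    # H(X,Y) <= H(X) + H(Y)
    print(M_xy)                   # zero only when X and Y are independent
    print(H_x_given_y)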

Definition (2.3.5): Kullback and Leibler (1951) introduced the relative entropy, or information divergence, which measures the distance between two distributions of a random variable. This information measure, also known as the KL-entropy, takes the following form:

KL(X/Y) = \sum_{i=1}^{n} p(x_i) \ln \frac{p(x_i)}{q(y_i)}    (2.3.3)

Typically (2.3.3) is also regarded as the relative entropy for using Y instead of X, since (2.3.3) can be expressed in another form:

KL(X/Y) = \sum_{i=1}^{n} p(x_i) \ln p(x_i) - \sum_{i=1}^{n} p(x_i) \ln q(y_i)


= -H(X) - \sum_{i=1}^{n} p(x_i) \ln q(y_i)

For more simplicity, take the following example: suppose we have five events in the specified sample with the probabilities (.2, .1, .3, .25, .15), and assume that we want to know the divergence between these events and the probabilities of the uniform distribution. Substituting in (2.3.3) yields:

KL(X/Y) = \sum_{i=1}^{n} p(x_i) \ln \frac{p(x_i)}{q(y_i)}

= .2 \ln\frac{.2}{.2} + .1 \ln\frac{.1}{.2} + .3 \ln\frac{.3}{.2} + .25 \ln\frac{.25}{.2} + .15 \ln\frac{.15}{.2} = .065
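The same calculation in a few lines of code, reproducing the value of roughly .065 nats:

    import numpy as np

    # KL divergence (2.3.3), in nats, between the sample probabilities
    # and a uniform distribution over the same five events.
    p = np.array([0.2, 0.1, 0.3, 0.25, 0.15])
    q = np.full(5, 0.2)                 # uniform probabilities

    print(np.sum(p * np.log(p / q)))    # about 0.065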

Therefore, it can be concluded that if we replace the distribution of the sample with the uniform distribution, about .065 nats are lost; thus (2.3.3) can be considered a good tool for discrimination between two distributions, Gohale (1983). One assumes that whenever $q(y_i) = 0$, the corresponding $p(x_i) = 0$ and $0 \ln \frac{0}{0} = 0$, see Dukkipati (2006). Indeed the KL-entropy is not symmetric, that is:

KL(X/Y) \neq KL(Y/X)

Furthermore, $KL(X/Y)$ is a non-negative measure and it equals zero iff X and Y are identical:

KL(X/Y) \geq 0    (2.3.4)

According to Liu (2007), (2.3.4) can be established using the following identity:

x \ln\frac{x}{y} \geq x - y \quad \text{for } x, y > 0    (2.3.5)

Hence, using (2.3.5), (2.3.3) can be bounded as:

\sum_{i=1}^{n} p(x_i) \ln \frac{p(x_i)}{q(y_i)} \geq \sum_{i=1}^{n} p(x_i) - \sum_{i=1}^{n} q(y_i) \quad \text{for } p(x_i), q(y_i) > 0

and since \sum_{i=1}^{n} p(x_i) - \sum_{i=1}^{n} q(y_i) = 1 - 1 = 0, it follows that

KL(X/Y) \geq 0


Remark: The KL measure can also be applied when the variables are continuous, replacing the summation symbol with the integration symbol; furthermore all the above properties remain valid, see Dukkipati (2006).

2.4 Lagrange Multiplier

In mathematical optimization, the method of Lagrange multipliers provides a strategy for finding the maximum or minimum of an objective function subject to constraints. To see this point, consider the following example:

\min f(x, y) = 2x^2 + y^2    (2.4.1)

subject to

x + y = 1

To solve (2.4.1), one can insert the constraint into the objective function, transforming the restricted optimization into an unrestricted one, and then search for the extreme values as follows:

y = 1 - x    (2.4.2)

Hence, (2.4.1) can be written:

\min f(x, y) = 2x^2 + (1 - x)^2

So the minimum point in x and y can be obtained as follows:

\frac{df(x)}{dx} = 0

4x - 2(1 - x) = 0

6x - 2 = 0


x = \frac{1}{3}    (2.4.3)

It is obvious that (2.4.3) refers to the minimum point, since the second derivative is positive. To obtain the value of y, substitute (2.4.3) into (2.4.2), which yields:

y = \frac{2}{3}

Indeed, the values of x and y can be reached via another route, using the principle of the Lagrange multiplier as follows. To solve (2.4.1), one writes the Lagrangian function:

Lagr(x, y, \lambda) = 2x^2 + y^2 + \lambda(1 - x - y)

where the constant $\lambda$ refers to the Lagrange multiplier and Lagr refers to the Lagrangian function. The method works as follows:

\frac{dLagr(x, y, \lambda)}{dx} = 4x - \lambda = 0

\frac{dLagr(x, y, \lambda)}{dy} = 2y - \lambda = 0

\frac{dLagr(x, y, \lambda)}{d\lambda} = 1 - x - y = 0    (2.4.4)

Since (2.4.4) represents, in general, a (possibly nonlinear) system of equations in three variables, solving these equations yields the solution of (2.4.1) as follows:

x = \frac{1}{3}, \quad y = \frac{2}{3}, \quad \lambda = \frac{4}{3}    (2.4.5)

One can conclude that transforming (2.4.1) from a constrained optimization into an unconstrained optimization is equivalent to using the Lagrange multiplier principle. Indeed there is another approach, known as the dual problem, to solve (2.4.1): the constrained problem is replaced by an unconstrained problem by expressing all the


variables in the objective function in terms of the Lagrange multiplier. From (2.4.4) it can be concluded that:

x = \frac{\lambda}{4}, \quad y = \frac{\lambda}{2}    (2.4.6)

Substituting (2.4.6) into the Lagrangian of (2.4.1) yields an expression that contains only the Lagrange multiplier; therefore minimizing (2.4.1) with respect to x, y implies maximizing this expression with respect to $\lambda$. Since $\lambda$ enters with a negative sign, there is usually an opposite relation between the Lagrange multiplier and the objective function; hence (2.4.1) can be rewritten as the unrestricted problem:

\max_{\lambda} \; 2\left(\frac{\lambda}{4}\right)^2 + \left(\frac{\lambda}{2}\right)^2 + \lambda\left(1 - \frac{\lambda}{4} - \frac{\lambda}{2}\right) = \lambda - \frac{3\lambda^2}{8}    (2.4.7)

Taking the first derivative of (2.4.7) to obtain the extreme value:

\frac{d}{d\lambda}\left(\lambda - \frac{3\lambda^2}{8}\right) = 1 - \frac{3\lambda}{4} = 0 \quad \Rightarrow \quad \lambda = \frac{4}{3}    (2.4.8)

Substituting (2.4.8) into (2.4.6) yields the same solution as (2.4.5).
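As a cross-check, the following sketch solves example (2.4.1) numerically with SciPy's constrained optimizer; the solver choice and starting point are illustrative assumptions.

    from scipy.optimize import minimize

    # Minimize 2*x**2 + y**2 subject to x + y = 1, as in example (2.4.1).
    objective = lambda v: 2.0 * v[0]**2 + v[1]**2
    constraint = {"type": "eq", "fun": lambda v: v[0] + v[1] - 1.0}

    res = minimize(objective, x0=[0.0, 0.0], constraints=[constraint], method="SLSQP")
    print(res.x, objective(res.x))   # approximately [1/3, 2/3], minimum value 2/3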

According to (later), some remarks should be taken into consideration when searching for a solution using the Lagrange multiplier principle:

1. The number of constraints must be less than or equal to the number of variables.

2. The constraints in the optimization problem must be independent.

In statistical inference there is a well-known test related to the Lagrange multiplier for testing hypotheses concerning the parameters of a distribution, see Engle (1984). Aitchison and Silvey (1958) proposed the Lagrange multiplier test, which derives from a restricted maximum likelihood estimation using a Lagrange multiplier:


suppose it is required to maximize $L(x_1, \ldots, x_n; \theta)$ with respect to $\theta$ subject to the hypothesis that $\theta = \theta_0$. As mentioned above, the Lagrangian function can take the form:

Lagr(\theta, \lambda) = L(x_1, \ldots, x_n; \theta) - \lambda(\theta - \theta_0)

Differentiating $Lagr(\theta, \lambda)$ with respect to $\theta$ and $\lambda$ and setting the derivatives to zero yields:

\frac{dLagr(\theta, \lambda)}{d\theta} = \frac{dL(x_1, \ldots, x_n; \theta)}{d\theta} - \lambda = 0    (2.4.9)

\frac{dLagr(\theta, \lambda)}{d\lambda} = -(\theta - \theta_0) = 0    (2.4.10)

To solve (2.4.9) and (2.4.10) simultaneously, one can take the derivative of $L(x_1, \ldots, x_n; \theta)$ and then substitute (2.4.10) into that derivative, which gives:

\lambda = \frac{dL(x_1, \ldots, x_n; \theta_0)}{d\theta}    (2.4.11)

Typically (2.4.11) is known as the score function $S(\theta_0)$. Since $\theta$ is often unknown, it is estimated by the MLE, see Section (2.1); hence a small value of $S(\theta_0)$ indicates that $\theta_0$ is close to the MLE and the null hypothesis is accepted, otherwise it is rejected. Thus the score test measures the distance between the tested value and the MLE. It can be shown that zero and the Fisher information $I(\theta)$ represent the mean and the variance of $S(\theta_0)$ respectively; thus the Lagrange multiplier (LM) statistic can be written as:

LM = \frac{\left(S(\theta_0)\right)^2}{I(\theta_0)}

Under the null hypothesis, for a large sample LM has a Chi-square distribution with one degree of freedom, for more details see Judge et al. (1982). Indeed the LM test can be extended to test k parameters simultaneously as follows:

LM = S(\theta)^{t} \, I(\theta)^{-1} \, S(\theta)    (2.4.12)


where $S(\theta)$ refers to the score function of the vector $\theta$ and $I(\theta)^{-1}$ refers to the inverse of the information matrix of order k, taking the following forms respectively:

S(\theta) = \begin{pmatrix} \dfrac{dL(x_1, \ldots, x_n; \theta)}{d\theta_1} \\ \vdots \\ \dfrac{dL(x_1, \ldots, x_n; \theta)}{d\theta_k} \end{pmatrix}

I(\theta) = -\begin{pmatrix}
E\left(\dfrac{d^2 \ln L(x_1, \ldots, x_n; \theta)}{d\theta_1^2}\right) & E\left(\dfrac{d^2 \ln L(x_1, \ldots, x_n; \theta)}{d\theta_1 \, d\theta_2}\right) & \cdots & E\left(\dfrac{d^2 \ln L(x_1, \ldots, x_n; \theta)}{d\theta_1 \, d\theta_k}\right) \\
\vdots & \vdots & \ddots & \vdots \\
E\left(\dfrac{d^2 \ln L(x_1, \ldots, x_n; \theta)}{d\theta_k \, d\theta_1}\right) & E\left(\dfrac{d^2 \ln L(x_1, \ldots, x_n; \theta)}{d\theta_k \, d\theta_2}\right) & \cdots & E\left(\dfrac{d^2 \ln L(x_1, \ldots, x_n; \theta)}{d\theta_k^2}\right)
\end{pmatrix}

Note: (2.4.12) also has a Chi-square distribution, with k degrees of freedom. For more clarity, take the following example. Let $X_1, X_2, \ldots, X_n$ be random variables from a sample of size n following a Normal($\mu, \sigma^2$) distribution, see Section (2.5), and suppose it is required to test:

H_0: \mu = \mu_0, \; \sigma^2 = \sigma_0^2

Using the LM test, the logarithm of the normal distribution's likelihood function can be written:

\ln L(x_1, \ldots, x_n; \mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2

The score function will be:


S_{normal}(\mu, \sigma^2) = \begin{pmatrix} \dfrac{d \ln L(x_1, \ldots, x_n; \mu, \sigma^2)}{d\mu} \\ \dfrac{d \ln L(x_1, \ldots, x_n; \mu, \sigma^2)}{d\sigma^2} \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) \\ -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 \end{pmatrix}

Hence the score function under the null hypothesis is:

S_{normal}(\mu_0, \sigma_0^2) = \begin{pmatrix} \dfrac{1}{\sigma_0^2}\sum_{i=1}^{n}(x_i - \mu_0) \\ -\dfrac{n}{2\sigma_0^2} + \dfrac{1}{2\sigma_0^4}\sum_{i=1}^{n}(x_i - \mu_0)^2 \end{pmatrix}

The information matrix under the null hypothesis associated with the normal distribution, and its inverse, are:

I_{normal}(\mu_0, \sigma_0^2) = \begin{pmatrix} \dfrac{n}{\sigma_0^2} & 0 \\ 0 & \dfrac{n}{2\sigma_0^4} \end{pmatrix} \quad \text{and} \quad I_{normal}(\mu_0, \sigma_0^2)^{-1} = \begin{pmatrix} \dfrac{\sigma_0^2}{n} & 0 \\ 0 & \dfrac{2\sigma_0^4}{n} \end{pmatrix}

Hence, the LM test can take the following form:

LM_{normal} = S_{normal}(\theta)^{t} \, I_{normal}(\theta)^{-1} \, S_{normal}(\theta)

LM_{normal} = \frac{(a - n\mu_0)^2}{n\sigma_0^2} + \frac{\left(b - 2a\mu_0 + n\mu_0^2 - n\sigma_0^2\right)^2}{2n\sigma_0^4}

where:

a = \sum_{i=1}^{n} x_i, \qquad b = \sum_{i=1}^{n} x_i^2


Remark: as mentioned above, $LM_{normal}$ has a Chi-square distribution with 2 degrees of freedom. Suppose that instead of testing the mean and the variance of the normal simultaneously, it is required to test the mean only; then the only change will be in the score function, as follows:

S_{normal}(\mu_0, \sigma_0^2) = \begin{pmatrix} \dfrac{1}{\sigma_0^2}\sum_{i=1}^{n}(x_i - \mu_0) \\ 0 \end{pmatrix}

Therefore the LM test will be:

LM_{normal} = \frac{(a - n\mu_0)^2}{n\sigma_0^2} \sim \chi^2(1)
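A sketch of the two-parameter LM statistic above in code, using the score vector and information matrix just written out; the simulated data and the chosen null values are illustrative assumptions.

    import numpy as np
    from scipy.stats import chi2

    # LM (score) test of H0: mu = mu0, sigma^2 = s0 for a normal sample.
    rng = np.random.default_rng(4)
    x = rng.normal(loc=0.0, scale=1.0, size=400)     # data generated under H0
    n, mu0, s0 = len(x), 0.0, 1.0                    # s0 denotes sigma_0 squared

    score = np.array([
        np.sum(x - mu0) / s0,
        -n / (2 * s0) + np.sum((x - mu0) ** 2) / (2 * s0**2),
    ])
    info = np.array([[n / s0, 0.0],
                     [0.0, n / (2 * s0**2)]])

    LM = score @ np.linalg.inv(info) @ score
    print(LM, chi2.sf(LM, df=2))    # compare with the chi-square(2) tail probability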

    2.5 Some Important Distributions

In this section, some well-known distributions that will be used in this thesis are briefly presented.

    1 Normal Distribution:

The normal distribution, also called the Gaussian distribution, is an important family of continuous probability distributions, applicable in many fields. Each member of the family may be defined by two parameters, location and scale. The standard normal distribution is the normal distribution with a mean of zero and a variance of one. The importance of the normal distribution as a model of quantitative phenomena in the natural and behavioral sciences is due in part to the central limit theorem.

If X has a normal distribution with mean $\mu$ and variance $\sigma^2$, the density function takes the following form:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)


There are important properties of the normal distribution: the mean, median and mode are all equal, and the skewness and the excess kurtosis equal zero. In fact the normal distribution has the maximum entropy among all distributions with fixed variance, equal to $\ln(\sigma\sqrt{2\pi e})$, and its moment generating function is

M_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right).
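A quick Monte Carlo check of the stated entropy value, a sketch with arbitrary parameter choices: the sample average of $-\ln f(X)$ is compared with $\ln(\sigma\sqrt{2\pi e})$.

    import numpy as np

    # Entropy of N(mu, sigma^2): Monte Carlo estimate of E[-ln f(X)]
    # versus the closed-form value ln(sigma * sqrt(2*pi*e)).
    rng = np.random.default_rng(5)
    mu, sigma = 2.0, 1.5
    x = rng.normal(mu, sigma, size=200_000)

    log_f = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
    print(-log_f.mean(), np.log(sigma * np.sqrt(2 * np.pi * np.e)))   # the two agree closely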

2 Uniform Distribution

In probability theory and statistics, the continuous uniform distribution is a family of probability distributions such that, for each member of the family, all intervals of the same length on the distribution's support are equally probable. This distribution is defined by two parameters, a and b, which are its minimum and maximum values respectively. It has an important role in random number generation techniques. The distribution is often abbreviated U(a, b).

If X has a uniform distribution with minimum a and maximum b, the density function takes the following form:

f(x) = \frac{1}{b - a}, \qquad a \leq x \leq b


3 Exponential Distribution

The exponential distribution describes the time between events in a Poisson process; indeed the exponential distribution can be regarded as a special case of the Gamma distribution. It has wide applications in lifetime models, biology, mechanics, etc.

If X has an exponential distribution with rate parameter $\lambda > 0$, the density function takes the following form:

f(x) = \lambda e^{-\lambda x}, \qquad x \geq 0