Farag: Chapter 4: Basics of Statistical Modeling (Preprint 2011)



    Contents

Chapter 4: Basics of Statistical Modeling

4.1 Introduction
4.2 Statistical Experiments
4.3 Random Variables
4.3.1 Basic Concepts
4.3.2 Properties of the CDF and the PDF of a Random Variable
4.3.3 The Conditional Distribution
4.3.4 Statistical Expectation
4.3.5 Functions of a Random Variable
4.4 Two Random Variables
4.4.1 Statistical Expectation in Two Dimensions
4.4.2 Functions of Two Random Variables
4.4.3 Two Functions of Two Random Variables
4.5 Simulation of Random Variables
4.6 Summary
4.7 Computer Laboratory
4.8 Exercises
4.9 References


Fig. 4.1: Illustration of the random variable as a mapping function from the sample space of the sigma algebra into Borel sets on the real line.
Fig. 4.2: Illustration of the domain and range of a discrete-valued random variable.
Fig. 4.3: The CDF of a mixed random variable.
Fig. 4.4: Illustration of the measurability of functions of random variables.
Fig. 4.5: Illustration of the concept of two-dimensional random vectors for two random variables on the same sample space.
Fig. 4.6: Illustration of the region of integration for the marginal density.


    4.2 Statistical Experiments

We follow the tradition of describing a statistical experiment as a triple; i.e., E = {Ω, ℱ, P}.

Sample space Ω: The sample space enumerates the elementary outcomes of an experiment and, by construction, various combinations of them as well. For example, if ω₁, ω₂, …, ωₙ ∈ Ω are the elementary outcomes of E (i.e., its single-valued possibilities, or the entities of its alphabet), it is a natural requirement that combinations of these outcomes, which may also happen in E, should be included in Ω. However, events may be constructed arbitrarily, and some may never happen! Hence, it is the tradition to list only the elementary outcomes (the elements of the alphabet) in Ω, while the events (all possible combinations of the elementary outcomes, i.e., the dictionary of the events in the experiment) are given a special entity, denoted by a field or algebra ℱ.

Field (algebra) ℱ: As the events need to be scaled or measured, an enumeration procedure is needed to generate the dictionary of events, including those which may never happen; these will be given zero weight or measure. Therefore, a desirable characteristic of ℱ is that it be closed under all countable (possibly infinite in number) combinations of elementary events, such as unions. Specifically, we require for events A and A₁, A₂, … in ℱ that:

a) The empty set ∅ (the null event) and the sample space Ω (the sure event) must be in ℱ.
b) A ∈ ℱ implies that the complement Aᶜ ∈ ℱ.
c) If A₁, A₂, … ∈ ℱ, then ∪ᵢ Aᵢ ∈ ℱ.

The above requirements are all natural for the field ℱ to be an inclusive space that spans (includes) all measurable events.

    Probability measure P:


The probability measure P is a function that assigns weights to the events in ℱ. It is the tradition to have a positive scale normalized to the unit interval. It is natural that events may be related (dependent) or unrelated (independent) to each other. Events also may be joint (share some elementary outcomes) or disjoint (share no elementary outcomes) in their constructs. Therefore, a measure P satisfies the following requirements (axioms) [Ash]:

a) 0 ≤ P(A) ≤ 1 for all events A ∈ ℱ.
b) The empty set (or null event) has zero weight and the sample space (sure event) has a weight of 1; that is, P(∅) = 0 and P(Ω) = 1.
c) If the events A₁, A₂, … are mutually disjoint, then P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ).

As a consequence of axioms b and c, if Aᶜ is the complement event of A, then P(Aᶜ) = 1 − P(A).

Example 1: Consider a histology experiment conducted on cells from a tissue specimen taken during a procedure to determine the existence of cancer. In this experiment the tissue is treated and viewed under a microscope, with the viewed cells cancerous (C) or noncancerous (NC). As biopsies are performed after reasonable doubts to justify doing them, the weights of the C and NC events differ with the procedure. A possible statistical experiment may be described as follows: Ω = {C, NC}, ℱ = {∅, {C}, {NC}, Ω}, and P({C}) = p, P({NC}) = 1 − p for some 0 ≤ p ≤ 1.

Example 2: Consider the same experiment as in Example 1, in which viewing the cells under the microscope revealed a conclusion contradicting the agent used to prepare the tissue. The histologist's chart showed unspecified cells (US) under these circumstances. That is, the sample space became Ω = {C, NC, US}. As these events are mutually exclusive, the sigma algebra would be simply the collection of five possibilities: {∅, {C}, {NC}, {US}, Ω}, and the probability measure will be such that P(∅) = 0, P(Ω) = 1, and P({C}) + P({NC}) + P({US}) = 1.

Example 3: Consider the experiment of recording the life span of cells in a specimen under the effect of an agent. Suppose that the viewing domain of the microscope allows individual cell viewing and tracking, and that it is possible to tag each cell as it moves within the specimen until


its death. It is expected that a counting experiment will be a tedious one, especially with large degrees of motion of the cells, yet a histogram may be constructed indicating the number of cells that die with time.

Events containing similar elementary outcomes are called joint, while those with no common elementary outcomes are called disjoint. Probability measures on disjoint events are additive. Events with no bearing on each other are called independent. Probability measures on independent events are multiplicative. As cause and effect is a natural rule of physics, it is also logical to ask about the likelihood of related events with respect to each other. The conditional probability measure addresses this issue, providing the value of the probability of an event, given that a second event took place, as the ratio of their joint probability normalized by the probability of the second event. We formally define these concepts as follows.

Definition 1: Two events A and B on the same sample space Ω are said to be disjoint if their intersection is empty; i.e., A ∩ B = ∅.

Definition 2: Two events A and B, not necessarily on the same sample space, are said to be independent if their joint probability measure is multiplicative; i.e., P(A ∩ B) = P(A) P(B).

Definition 3: The conditional probability measure of event A given the occurrence of event B, where A and B are not necessarily on the same sample space, is denoted by P(A|B) and is defined by P(A|B) = P(A ∩ B)/P(B), provided that P(B) ≠ 0.

The expression for the conditional probability in Definition 3 is usually called Bayes' Theorem. As a direct corollary to the above definitions, independent events do not affect the value of the conditional probability measure. Likewise, disjoint events have a zero-valued joint probability measure and, subsequently, a zero-valued conditional probability measure.


Example 4: Consider events A and B such that A is a subset of B; symbolically, A ⊂ B. It follows that the conditional probability P(B|A) = 1 and that the joint probability measure P(A ∩ B) = P(A).

A stimulus may affect a set of events even though they may be disjoint. In this case, conditional probabilities enable measuring the overall effect of that stimulus. For example, suppose that an experiment produces three disjoint events A₁, A₂, A₃ such that A₁ ∪ A₂ ∪ A₃ = Ω. We can measure the effect of a stimulus B (i.e., another event) on the set of events by noting that B = (B ∩ A₁) ∪ (B ∩ A₂) ∪ (B ∩ A₃), and since the intersections are disjoint as well, the probability measure P(B) is additive with respect to the components B ∩ Aᵢ; i.e., P(B) = P(B ∩ A₁) + P(B ∩ A₂) + P(B ∩ A₃), which can be written in terms of conditional probabilities as P(B) = P(B|A₁)P(A₁) + P(B|A₂)P(A₂) + P(B|A₃)P(A₃). This can be generalized to a larger number of events in what is known as the total probability theorem, stated below.

Theorem 1 (Total probability): Consider a set of events A₁, A₂, …, Aₙ ∈ ℱ. Suppose the events are mutually disjoint and A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ = Ω; i.e., they span the sample space. Consider an event B ∈ ℱ. We can show that

P(B) = Σᵢ P(B|Aᵢ) P(Aᵢ).

The proof of the total probability theorem follows from the fact that the probability measure of disjoint events is additive [Ash]. A quick numeric check is sketched below.
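As a minimal illustration, the following Python sketch evaluates Theorem 1 for a hypothetical three-way partition; the priors and conditional probabilities are assumed values, not taken from the text.

```python
# A minimal numeric check of Theorem 1, assuming a hypothetical three-way
# partition A1, A2, A3 and made-up measures (not values from the text).
p_A = [0.5, 0.3, 0.2]            # prior measures P(A_i); they sum to 1
p_B_given_A = [0.9, 0.4, 0.1]    # conditional measures P(B | A_i)

# Total probability: P(B) = sum_i P(B | A_i) * P(A_i)
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))
print(p_B)  # 0.9*0.5 + 0.4*0.3 + 0.1*0.2 = 0.59
```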

In general, enlisting the events in Ω and ℱ will not be a simple task, if not an impossible one, for real-world problems. Likewise, obtaining the corresponding functional form for the probability measure will be a rough task to accomplish. A logical task that statisticians have faced has been to search for automatic procedures to generate events and their measures for particular experiments. Indeed, a general approach has evolved to accomplish this task through the concept of random functions, commonly referred to as random variables. This type of function maps events on the sample space and the sigma algebra, which may be quite abstract, into another representation on the Cartesian domain, on which assigning a scale (measure or weight) is possible using the common tools of calculus. We examine this concept next as a prelude to describing multivariable experiments and complicated stochastic phenomena.


    4.3 Random Variables

    4.3.1 Basic Concepts

A random variable X is a mapping from the sample space into the real line such that the event {ω : X(ω) ≤ x} ∈ ℱ for every x ∈ ℝ. As illustrated in Fig. 4.1, the random variable is a mapping from events in ℱ into other forms of events (sets) defined on the real line; these sets are called Borel sets. For a compact representation, we insist that the inverse mappings of the Borel sets are events on ℱ; i.e., the set X⁻¹(B) ∈ ℱ for all Borel sets B [Pfeiffer].

To avoid potential confusion about the difference between the sample space and the sigma algebra vis-à-vis the random variable, we use, as stated before, the alphabet of the experiment to represent Ω and its dictionary to represent ℱ; hence, X is technically a mapping from Ω into ℝ. Of course, it is understood that events in ℱ are formed of elementary events, i.e., they must subsequently be contained in Ω. Like any function, X has a domain, which will be Ω, and a range, which is the real line ℝ. Hence, a shorthand for this relationship would be X : Ω → ℝ. As domains and ranges may be continuous or discrete, we will consider cases where the mapping is between continuous entities, in which case X will be denoted a continuous random variable, and between discrete entities, in which case X will be denoted a discrete random variable. As the mapping is between events, it is natural that the weights (measures) at the two ends of the mapping be the same. That is, in the illustration shown in Fig. 4.1, we must have P(A) = P(X ∈ B) for the event A = X⁻¹(B).

Fig. 4.1: Illustration of the random variable as a mapping function from the sample space of the sigma algebra into Borel sets on the real line.


We shall provide a few illustrative examples to demonstrate the power of the random variable concept in describing a statistical experiment.

Example 5: Consider the experiment in Example 1 above. Suppose we have a function X which takes an arbitrary value of 1 for C and −1 for NC. It is evident that we can establish the events {X ≤ x} for every x. In particular, we have the following possibilities: {X ≤ x} = ∅ for x < −1; {X ≤ x} = {NC} for −1 ≤ x < 1; and {X ≤ x} = Ω for x ≥ 1. The above example illustrates a discrete-valued random variable, which has a finite range and domain. Fig. 4.2 illustrates this mapping.

Fig. 4.2: Illustration of the domain and range of a discrete-valued random variable.

Example 6: Suppose a statistical experiment with sample space Ω = {ω₁, ω₂, …, ω₈} consisting of eight elementary outcomes. Consider a function X which assigns a real value to each outcome. We may illustrate that X is a discrete-valued random variable on Ω by examining whether {X ≤ x} is an event. It is evident that each set {ω : X(ω) ≤ x} is a finite union of elementary outcomes, and hence belongs to ℱ.


Hence, X is a random variable. The above illustrates the value of the random variable concept as an event generator. Equally important is to see whether the random variable will also enable measurability of the generated events. In other words, can we evaluate P(X ≤ x)? The ability to evaluate this quantity achieves the measurability requirements for the events in ℱ, and thus completely describes the statistical experiment. Indeed, the quantity P(X ≤ x) has a special name in mathematical statistics: it is known as the cumulative probability distribution function (CDF). From its definition, the event {X ≤ x} shrinks as x decreases and expands as x increases; therefore, the CDF is non-decreasing. Indeed it is also monotone and continuous from the right; that is, F_X(x) = lim F_X(x + ε) as ε → 0⁺.

As the CDF is a fundamental concept in the theory of statistics, we shall examine it in further detail. For ease of notation, we connote the CDF P(X ≤ x) with the special symbol F_X(x). To make the notation consistent, we define the CDF formally as follows.

Definition 3: Given a random variable X defined for a statistical experiment E = {Ω, ℱ, P}, the cumulative probability distribution function (CDF) of X is the function on the real line which measures the event {ω : X(ω) ≤ x}; we connote this function as F_X(x) = P(X ≤ x) for every x ∈ ℝ.

The CDF, like any function, may be studied with the theory of calculus. Therefore, the description of statistical experiments, and indeed statistical theory, is transitioned from a seemingly guessing game into a well-founded mathematical theory through the concept of the random variable. The monotonicity, and hence right-continuity, of F_X(x) is evident from the definition. We can easily infer the following properties of the CDF.

Fact: The CDF has the following properties: (i) F_X(−∞) = 0 and F_X(∞) = 1; (ii) F_X(x) is differentiable almost everywhere over ℝ.


Definition 4: The probability density function (PDF) of a random variable X is the derivative of the CDF; i.e., f_X(x) = dF_X(x)/dx. We shall see later on that both F_X(x) and f_X(x) enable various studies related to the experiment under consideration.

The differentiability of the CDF is a functional property due to its continuity, and it does not imply that the random variable itself (the function that maps events to Borel sets on the real line) is continuous or discrete. The distinction between continuous and discrete random variables is in the form of the Borel sets that are generated from the mapping. A discrete random variable means that its domain may take only finitely many discrete values; i.e., the Borel sets are discrete, while a continuous random variable has a continuous domain; i.e., the Borel sets are continuous. As a consequence of these characteristics, we may be able to define the probability of a point event for discrete random variables; i.e., P(X = xᵢ) > 0. Point events for continuous random variables are empty sets; i.e., their measures will be zero. The implications of these characteristics and the differentiability of F_X(x) will be such that the following distinction is made between discrete and continuous random variables:

F_X(x) = Σ over xᵢ ≤ x of P(X = xᵢ) (discrete); F_X(x) = ∫ from −∞ to x of f_X(t) dt (continuous). (1)

For the integer-valued case, the set {xᵢ} will be integer valued and the upper limit of the summation in Eq. (1) is easy to determine. If we use the generalized function representation, we may express the PDF for discrete random variables in terms of the Dirac delta function; i.e., f_X(x) = Σᵢ P(X = xᵢ) δ(x − xᵢ); hence, a unified expression for the CDF may be written for both continuous and discrete random variables in terms of an integral only; i.e., F_X(x) = ∫ from −∞ to x of f_X(t) dt. In practice, however, a clear distinction will be implicit in the definition of the experiment and the values that events may take. Indeed, in principle, we may define a random variable to describe events that have both continuous and discrete domains. A small sketch of the discrete case is given below.
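As a minimal sketch of Eq. (1) in the discrete case, the following Python snippet builds the staircase CDF of a hypothetical fair six-sided die by accumulating the point masses; the example is ours, not the text's.

```python
import numpy as np

# Eq. (1), discrete case: the CDF is a staircase obtained by accumulating
# the point masses P(X = x_i). A fair six-sided die is assumed here.
values = np.arange(1, 7)               # support points x_i
pmf = np.full(6, 1.0 / 6.0)            # point masses P(X = x_i)
cdf = np.cumsum(pmf)                   # F_X(x_i) = sum of masses up to x_i

def F(x):
    """CDF at an arbitrary real x: total mass of support points <= x."""
    return pmf[values <= x].sum()

print(dict(zip(values.tolist(), np.round(cdf, 3))))
print(F(3.5))  # 0.5 : the CDF jumps only at the integer support points
```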


We provide below some illustrative examples of the CDF and PDF for various random variables.

Example 7: Consider a random variable with the CDF shown in Fig. 4.3. The CDF is continuous and differentiable everywhere except at a point x₀, where it has a jump (discontinuity) whose size equals the probability mass concentrated there. We can address the behavior of the CDF at the jump by noting that P(X = x₀) = F_X(x₀) − F_X(x₀⁻), the difference between the limits from the right and from the left at x₀.

Fig. 4.3: The CDF of a mixed random variable.

Hence, functional analysis enables us to address the various scenarios that the CDF may have.


    4.3.2 Properties of the CDF and the PDF of a random variable

As the CDF is a probability measure for the event {ω : X(ω) ≤ x}, the positivity and upper bound of the CDF are evident. Likewise, as the name implies, the CDF is non-decreasing and monotone; hence, the continuity from the right is evident. We can also infer the properties of the CDF for disjoint events as well as independent events using the basic definitions. We also note that the following identity follows straight from the definition of the CDF:

i) P(a < X ≤ b) = F_X(b) − F_X(a).

The PDF, being a derivative of the CDF, possesses the positivity and summability (integrability) constraints. Writing the PDF for a discrete random variable as f_X(x) = Σᵢ P(X = xᵢ) δ(x − xᵢ), the most important virtue of the CDF and the PDF is their ability to provide a computational mechanism (or algorithm) to evaluate the probability measure, using the probability mass of the Borel sets corresponding to the events on the sigma algebra, through summation (discrete random variables) or integration (continuous random variables). Some examples will illustrate these concepts.

Example 8: Consider a statistical experiment defined within the boundaries illustrated in Fig. 4.1. Suppose the mass function of the Borel sets has an exponential form: f_X(x) = λe^(−λx) for x ≥ 0, and 0 otherwise. We can evaluate the probability measure of the event A = {ω : X(ω) ≤ x} by the mass of its equivalent Borel set B = (−∞, x]. In particular, we can evaluate the probability of such events as:

P(X ≤ x) = ∫ from 0 to x of λe^(−λt) dt = 1 − e^(−λx), x ≥ 0.


Over the long history of statistics, various functional forms for f_X(x) have resulted from specific practical experiments. Some of the most common PDFs are listed in Table 1.

Table 1: Common PDF functional forms.

Uniform (continuous): f(x) = 1/(b − a), a ≤ x ≤ b
Uniform (discrete): P(X = xᵢ) = 1/n, i = 1, …, n
Exponential (continuous): f(x) = λ e^(−λx), x ≥ 0
Exponential (discrete-valued, geometric form): P(X = k) = (1 − ρ) ρᵏ, k = 0, 1, …
Gaussian: f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))
Cauchy: f(x) = α / (π(α² + x²))
Laplace: f(x) = (λ/2) e^(−λ|x|)
Rayleigh: f(x) = (x/σ²) e^(−x²/(2σ²)), x ≥ 0
Maxwell: f(x) = √(2/π) (x²/σ³) e^(−x²/(2σ²)), x ≥ 0
Gamma: f(x) = λ(λx)^(α−1) e^(−λx) / Γ(α), x ≥ 0
Beta: f(x) = x^(α−1) (1 − x)^(β−1) / B(α, β), 0 ≤ x ≤ 1


Binomial (discrete): P(X = k) = C(n, k) pᵏ (1 − p)^(n−k), k = 0, 1, …, n, where n is the number of independent trials
Poisson (discrete): P(X = k) = e^(−λ) λᵏ / k!, k = 0, 1, …

The parameters in the PDF forms in Table 1 represent the modes as well as resolution measures of the functions and will be addressed later on.

    It is important to remember the context of the definitions of the CDF and the PDF and the

    brilliance of the statisticians, physicists and mathematicians who, over the years, developed these

    tools that enable proper description and characterization of statistical experiments. The random

    variable concept has been the catalyst for mapping the statistical experiment from a guessing

    game into a functional representation that is governed by the rules of calculus.

4.3.3 The Conditional Distribution

The CDF and the PDF may be defined for conditional events. For example, we may study the behavior of an event A defined such that A = {ω : X(ω) ≤ x} with respect to another event B, which may or may not be defined on the same sample space Ω. We can consider the conditional CDF as a measure of the event {X ≤ x | B}, in the same way we defined the conditional probability before:

F_X(x|B) = P(X ≤ x | B) = P({X ≤ x} ∩ B) / P(B), provided that P(B) ≠ 0.

We point out that {X ≤ x | B} means {ω : X(ω) ≤ x, ω ∈ B}, and the event B may or may not be a subset of the same sample space Ω. We maintain that the conditional CDF possesses all the properties of the CDF as discussed before. In particular, it is straightforward to justify the


following properties: F_X(∞|B) = 1, F_X(−∞|B) = 0, and P(a < X ≤ b | B) = F_X(b|B) − F_X(a|B) for a < b. Indeed, the total probability theorem that we studied before may be expressed as follows: consider a partition A₁, A₂, …, Aₙ of a sample space with known a priori measures P(Aᵢ). Any event {X ≤ x} may be defined in terms of the partition such that {X ≤ x} = ∪ᵢ ({X ≤ x} ∩ Aᵢ), and the CDF for the random variable X will be defined as follows:

F_X(x) = Σᵢ F_X(x|Aᵢ) P(Aᵢ).

In the same fashion in which we studied the probability density function (PDF) before, we can define the conditional PDF as well; that is, for continuous random variables we have:

f_X(x|B) = dF_X(x|B)/dx.

Example 9: Consider the two events A = {X ≤ x} and B = {X ≤ a}. We can show that:

F_X(x|B) = F_X(x)/F_X(a) for x < a, and F_X(x|B) = 1 for x ≥ a.

By taking derivatives, we obtain the conditional PDF f_X(x|B) = f_X(x)/F_X(a) for x < a, and 0 otherwise.

Example 10: Consider the events A = {X ≤ x} and B = {a < X ≤ b}. We can show that:


F_X(x|B) = 0 for x ≤ a; F_X(x|B) = (F_X(x) − F_X(a)) / (F_X(b) − F_X(a)) for a < x ≤ b; and F_X(x|B) = 1 for x > b. The conditional PDF will be:

f_X(x|B) = f_X(x) / (F_X(b) − F_X(a)) for a < x ≤ b, and 0 otherwise.
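A small simulation sketch of Example 10, assuming an exponential X with unit rate and the conditioning event B = {0.5 < X ≤ 1.5} (numbers chosen here, not in the text), compares the empirical conditional density with f_X(x)/(F_X(b) − F_X(a)):

```python
import numpy as np

# Example 10 by simulation: samples of X restricted to the event B should
# follow the truncated density f_X(x) / P(B). X is exponential, rate 1.
rng = np.random.default_rng(1)
x = rng.exponential(size=1_000_000)
a, b = 0.5, 1.5
xb = x[(x > a) & (x <= b)]                 # realizations given the event B

p_B = np.exp(-a) - np.exp(-b)              # F_X(b) - F_X(a) for the exponential
hist, edges = np.histogram(xb, bins=10, range=(a, b), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
exact = np.exp(-centers) / p_B             # conditional PDF f_X(x|B)
print(np.max(np.abs(hist - exact)))        # small discrepancy
```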

The conditional distribution is an important mechanism to measure the effect of a stimulus on a certain behavior, or the degree of dependency of events on each other. Later on we will generalize the concept of the random variable to describe experiments that create indexed sets of random variables, where the index is defined on the set of real numbers or on a graph; we will exploit the dependency of the random variables to perform clustering of classes into meaningful descriptions, and we will be able to study various control mechanisms of experiments that can only be studied over a long period of observations.

4.3.4 Statistical Expectation

    The CDF and the PDF in Table 1 are described in terms of parameters that quantify the basic

    behavior of these functions, such as their mean value and the dispersion (variance) around that

    mean value. Over the years of statistical analysis, statisticians have developed metrics that can be

    computed from the CDF or the PDF which provide important clues about the behavior of a

statistical experiment. In fact, some of these metrics may provide an approximation or an upper bound on the probability that a certain event may take place, without actual calculation of the CDF and the PDF; thus a prediction may be possible for the behavior of events based on numerical calculations. Among these metrics is the statistical expectation, which is formally defined below.

Definition 5: Given a random variable X with PDF f_X(x), its expected value is defined by


E[X] = ∫ x f_X(x) dx (continuous); E[X] = Σᵢ xᵢ P(X = xᵢ) (discrete).

The expectation as defined above for a single random variable is a numerical indication of how symmetric the PDF is around the origin. Due to the positivity constraint on the PDF, a zero expected value connotes symmetry, while a positive value indicates that the PDF is tilted towards the positive (right-hand) side of the origin, and a negative value indicates that the PDF is tilted to the left of the origin. Hence, in reality the quantity E[X] provides the mean value of the PDF, or its locus of symmetry, on the x-axis; therefore, E[X] = μ_X is the mean value or the average of the random variable X. In the strict mathematical sense, we can define the effect of a general function of the random variable on the PDF. For example, we may define the expected value of a function as follows:

Definition 6: Given a random variable X with PDF f_X(x), suppose we define a mapping Y = g(X); i.e., the domain and range of g are similar to those of X. The expected value of g(X) is defined by

E[g(X)] = ∫ g(x) f_X(x) dx (continuous); E[g(X)] = Σᵢ g(xᵢ) P(X = xᵢ) (discrete).

In this case, the mean is a special case with g(X) = X. The expectation of g(X) measures the average of the function. Indeed, we shall study the fact that a function of a random variable is also a measurable function, which defines a random variable as well. Hence, we may calculate E[g(X)] as above, using the PDF of the random variable X, or first calculate the PDF of Y = g(X) and then calculate E[Y] as follows:


E[Y] = ∫ y f_Y(y) dy (continuous); E[Y] = Σⱼ yⱼ P(Y = yⱼ) (discrete).

The above result is very fascinating, even though we have not formally derived it, and indeed it has enormous practical implications in statistical analysis, due to the fact that calculating the PDF of Y = g(X) is not a simple matter, even for simple transformations. Therefore, the PDF of a random variable X serves a great deal for functions of X as well; a quick simulation sketch is given below.
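The following Python sketch computes E[g(X)] two ways, assuming g(x) = x² with X standard normal (our choice of example): once directly under the PDF of X, and once by integrating against the known chi-square (1 degree of freedom) density of Y = g(X).

```python
import numpy as np

# E[g(X)] two ways, for g(x) = x^2 and X standard normal. Route 1 needs
# only realizations of X; route 2 integrates y * f_Y(y) with the chi-square
# (1 dof) density f_Y(y) = exp(-y/2)/sqrt(2*pi*y). Both approach E[X^2] = 1.
rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)
e_route1 = np.mean(x ** 2)                       # E[g(X)] under f_X

y = np.linspace(1e-6, 50.0, 2_000_000)           # numeric grid for route 2
f_Y = np.exp(-y / 2.0) / np.sqrt(2.0 * np.pi * y)
e_route2 = np.trapz(y * f_Y, y)                  # E[Y] under f_Y

print(e_route1, e_route2)                        # both near 1
```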

We focus next on a few definitions which are special cases of g(X) and provide some specific clues about the PDF of X.

Definition 7: Given a random variable X with PDF f_X(x), the variance of X is defined as:

Var[X] = σ_X² = E[(X − μ_X)²] = ∫ (x − μ_X)² f_X(x) dx (continuous); Var[X] = Σᵢ (xᵢ − μ_X)² P(X = xᵢ) (discrete).

The variance measures the dispersion (width or concentration) of the PDF around the mean value. It serves to estimate the domain of the values that the random variable X may take. In fact, an upper bound has been derived, through the Chebyshev inequality, which estimates the upper bound of the probability of a normalized event.

Fact (The Chebyshev inequality [Papoulis]): Consider a random variable X with PDF f_X(x) having mean value μ and variance σ². Define the normalized event {|X − μ| ≥ kσ} for k > 0. We can show that:

P(|X − μ| ≥ kσ) ≤ 1/k².

The proof of the Chebyshev inequality follows directly from the definitions and the positivity of the density function. Its importance is in providing a measure for the concentration of the values of the random variable around its mean value.
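A short simulation sketch of the bound, using an exponential random variable as an assumed test case:

```python
import numpy as np

# Verifying the Chebyshev bound P(|X - mu| >= k*sigma) <= 1/k^2 by
# simulation, with an exponential random variable as the test case.
rng = np.random.default_rng(3)
x = rng.exponential(size=1_000_000)
mu, sigma = x.mean(), x.std()

for k in (1.5, 2.0, 3.0):
    tail = np.mean(np.abs(x - mu) >= k * sigma)     # empirical tail probability
    print(k, round(float(tail), 4), round(1.0 / k**2, 4))  # tail <= 1/k^2
```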

Definition 8: Given a random variable X with PDF f_X(x), the n-th moment of X around the origin is defined as:


mₙ = E[Xⁿ] = ∫ xⁿ f_X(x) dx (continuous); mₙ = Σᵢ xᵢⁿ P(X = xᵢ) (discrete).

The n-th central moment, E[(X − μ_X)ⁿ], is obtained in a similar fashion by the proper substitution in the above equation. We now provide a few examples to illustrate how the expectations may be calculated for some of the PDFs in Table 1.

Example 11: Evaluate the mean and the variance of the Poisson distribution. This distribution is that of a counting random variable which estimates the number of occurrences of a particular event, within a certain period, given its frequency of occurrence.

E[X] = Σ over k ≥ 0 of k e^(−λ) λᵏ/k! = λ e^(−λ) Σ over k ≥ 1 of λ^(k−1)/(k − 1)! = λ.

Similarly,

E[X(X − 1)] = Σ over k ≥ 0 of k(k − 1) e^(−λ) λᵏ/k! = λ² e^(−λ) Σ over k ≥ 2 of λ^(k−2)/(k − 2)! = λ².

On the other hand, we note that

Var[X] = E[X²] − (E[X])² = E[X(X − 1)] + E[X] − (E[X])² = λ² + λ − λ² = λ.

Hence, it is immediate that the mean and the variance are both equal to λ for the Poisson distribution.
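A one-line simulation check of this result, with an assumed rate λ = 3.0:

```python
import numpy as np

# Example 11 by simulation: for Poisson samples, both the sample mean and
# the sample variance should approach the rate lam (assumed lam = 3.0).
rng = np.random.default_rng(4)
lam = 3.0
k = rng.poisson(lam, size=1_000_000)
print(k.mean(), k.var())   # both near lam = 3.0
```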


Example 12: Evaluate the mean and the variance of the Cauchy distribution. The mean value is zero since f_X(x) = α/(π(α² + x²)) is an even symmetric function. The difficulty arrives with the variance, since the second-order expectation E[X²] is infinite. Hence, the variance is undefined. Yet, the parameter α measures the relationship between the maximum height and the span (width) of the PDF.

The knowledge of the PDF (or the CDF) enables computing the expectations. It is a

natural question to ask whether the knowledge of the expectations enables the determination of the functional form of the PDF. Unfortunately, it is not possible to infer the PDF from the expectations (moments), except in the case of the Gaussian PDF, which is explicitly defined in terms of its mean and variance. Yet, statisticians have studied mechanisms to generate the moments automatically through the so-called moment generating function, which is defined below.

Definition 9: Given a random variable X with PDF f_X(x), the moment generating function of X is defined as Φ_X(s) = E[e^(sX)]. The moments may be generated from Φ_X(s) as follows:

mₙ = E[Xⁿ] = dⁿΦ_X(s)/dsⁿ evaluated at s = 0.

Example 13: Evaluate the moment generating function for the standard uniform PDF defined over the interval [0, 1].

Φ_X(s) = ∫ from 0 to 1 of e^(sx) dx = (e^s − 1)/s.

It is evident that the mean value E[X] = 1/2, for example, may be obtained from the first derivative of Φ_X(s) by taking the limit as s approaches zero (note the need for using l'Hôpital's rule).
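A numeric shortcut around l'Hôpital's rule, sketched in Python: approximate Φ′(0) with a central difference on the closed form.

```python
import numpy as np

# Example 13 numerically: Phi(s) = (exp(s) - 1)/s for X uniform on [0, 1];
# the mean m1 = Phi'(0) = 1/2 is recovered by a central difference instead
# of taking the limit analytically.
def phi(s):
    return (np.exp(s) - 1.0) / s       # closed form; undefined at s = 0

h = 1e-5
m1 = (phi(h) - phi(-h)) / (2.0 * h)    # finite-difference derivative at 0
print(m1)                              # close to 0.5, the uniform mean
```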


One form of expectation that mimics the moment generating function, yet provides a greater functional ability to analyze, and indeed construct, the PDF of a random variable, is known as the characteristic function, which is defined as follows:

Definition 10: Given a random variable X with PDF f_X(x), the characteristic function of X is defined as

Φ_X(ω) = E[e^(jωX)] = ∫ e^(jωx) f_X(x) dx.

Φ_X(ω) is the conjugate of the Fourier transform of f_X(x), which is a unique mapping of the function; thus a direct link is established between the PDF and the characteristic function, which is very useful in a number of applications.

Example 14: Evaluate the characteristic function for the standard uniform PDF defined over the interval [0, 1].

Φ_X(ω) = ∫ from 0 to 1 of e^(jωx) dx = (e^(jω) − 1)/(jω).

4.3.5 Functions of a Random Variable

It is natural to ask how a transformation of a random variable (which is itself a function) would be considered, in the context of the statistical experiment in which the random variable was introduced, in order to characterize it in terms of the theory of calculus instead of enumeration and guessing. The key issue is the measurability of functions of random variables; i.e., whether the mapping is actually constrained by the original domain (and, of course, the range) of the original random variable. A random variable X is a mapping from the sigma algebra into the real line such that the event {ω : X(ω) ≤ x} ∈ ℱ, and we also insisted that the inverse mapping of a Borel set is an event as well, i.e., X⁻¹(B) ∈ ℱ. Now let us define a function Y = g(X). It is evident that the transformation


will generate events on ℝ that must be Borel sets. Likewise, the inverse mapping will generate events that must be in ℱ and Ω, by construction. Hence, the measurability of g(X) is quite evident and inherent. Fig. 4.4 illustrates this concept, where the mappings of X and g(X) onto the real line produce Borel sets B_X and B_Y, which may be remapped back to the event A ∈ ℱ.

Fig. 4.4: Illustration of the measurability of functions of random variables.

Theorem [Pfeiffer]: If X is a random variable and g(·) is a Borel function, then Y = g(X) is a random variable measurable by X; i.e., the sigma algebra generated by Y is a subset of the sigma algebra generated by X.

As we have done before with random variables, we characterize Y = g(X) by its cumulative probability distribution function or by its probability density function. Depending on the functional form of g(·), we may obtain the CDF F_Y(y) in terms of F_X(x). Table 2 lists a few examples of the CDFs and PDFs of functions of random variables [Papoulis].


Table 2: The CDF and PDF of functions of random variables for various functional forms.

g(X) = aX + b, a > 0: F_Y(y) = F_X((y − b)/a); f_Y(y) = (1/a) f_X((y − b)/a)
g monotone increasing: F_Y(y) = F_X(g⁻¹(y)); f_Y(y) = f_X(g⁻¹(y)) / g′(g⁻¹(y))
g monotone decreasing: F_Y(y) = 1 − F_X(g⁻¹(y)); f_Y(y) = −f_X(g⁻¹(y)) / g′(g⁻¹(y))
g continuous with a countable number of real roots xᵢ of y = g(x): f_Y(y) = Σᵢ f_X(xᵢ)/|g′(xᵢ)|; integrate f_Y to obtain the CDF

Derivations of the entries in Table 2 are straightforward and follow from the definition of the CDF, which should be obtained first; the derivative operator is then applied to obtain the PDF. A simulation sketch of the monotone case follows.
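The following Python sketch checks the monotone-increasing row of Table 2 by simulation, assuming Y = g(X) = exp(X) with X standard normal (our example), for which the formula gives f_Y(y) = f_X(ln y)/y, the lognormal density.

```python
import numpy as np

# Check of Table 2 (monotone increasing row) for Y = exp(X), X ~ N(0, 1):
# f_Y(y) = f_X(ln y) / y should match the histogram of simulated Y.
rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000)
y = np.exp(x)

counts, edges = np.histogram(y, bins=100, range=(0.0, 5.0))
width = edges[1] - edges[0]
dens = counts / (y.size * width)   # density estimate over all samples
centers = 0.5 * (edges[:-1] + edges[1:])
f_exact = np.exp(-np.log(centers) ** 2 / 2.0) / (centers * np.sqrt(2 * np.pi))
print(np.max(np.abs(dens - f_exact)))   # small: the histogram matches f_Y
```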

Functions of random variables are an important concept in mathematical statistics that enables statistical inference about one random variable (or, in general, a random process, as will be studied later) in terms of another. Indeed, simulations of random variables from different distributions may be obtained by simulating a random variable from a standard uniform distribution; such a form has been well studied, and various random number generators exist to generate samples from it. We will study simulations of random variables after we briefly cover the case of random experiments that may be described by two random variables, a step towards generalizations to sequences of unstructured random


variables, which will be followed by random variables defined on a graph or having a particular meaningful index. As our intention is to study random processes and random fields that have been useful for biomedical image analysis, we will not dwell on the details of random variables and statistical inference beyond the basic background needed for our subsequent coverage.

    4.4 Two Random Variables

Events in random experiments may take various forms, and may be better understood in terms of a higher-dimensional mapping instead of a one-dimensional one. In this case the measurability of the events in ℱ would be in terms of the mass of events (Borel sets) on multidimensional spaces. Consider two random variables X and Y; the events {X ≤ x} and {Y ≤ y} may be paired with respect to the two parameters x and y such that a new event {X ≤ x, Y ≤ y} may be defined on the plane ℝ², as shown in Fig. 4.5. We may express the pair as (X, Y) or in vector form; i.e., Z = [X Y]ᵀ. We will be interested mainly in random variables defined on the same sample space; hence, for simplicity, we will write X : Ω → ℝ and Y : Ω → ℝ.


Fig. 4.5: Illustration of the concept of two-dimensional random vectors for two random variables on the same sample space. For measurability, the sets B_X and B_Y are Borel sets.

In direct analogy to the single random variable case, we may determine the probability mass distribution on the Borel sets defined on the plane ℝ² by the assignment P(X ∈ B_X, Y ∈ B_Y); hence, when dealing with a pair of random variables (X, Y), we have three induced probability measures: P_XY, P_X, and P_Y. The mass distribution on ℝ² represented by P_XY consists of the unit mass distributed over the plane. Similarly, P_X represents a unit mass distributed along the x-axis, and P_Y represents a unit mass distributed along the y-axis. The measures are related as follows: P_X(B_X) = P_XY(B_X × ℝ) and P_Y(B_Y) = P_XY(ℝ × B_Y). Now if B_Y = ℝ, i.e., the entire real line, then {X ∈ B_X, Y ∈ ℝ} = {X ∈ B_X}, so that P_XY(B_X × ℝ) = P_X(B_X). Similarly, if B_X = ℝ, i.e., the entire real line, then P_XY(ℝ × B_Y) = P_Y(B_Y). We formally define these quantities below.


Definition 11: The probability measure P_XY defined on the Borel sets in the plane ℝ² is called the joint probability measure induced by the joint mapping (X, Y). The probability mass distribution is called the joint distribution. The probability measures P_X and P_Y are called the marginal probability measures induced by the random variables X and Y, respectively. The corresponding probability mass distributions are called the marginal probability distributions.

Definition 12: The joint probability distribution function, or the joint cumulative distribution function, for random variables X and Y, denoted by F_XY(x, y), is defined by F_XY(x, y) = P(X ≤ x, Y ≤ y) for all values x, y ∈ ℝ. The special cases F_X(x) = F_XY(x, ∞) and F_Y(y) = F_XY(∞, y) are called the marginal distribution functions for random variables X and Y, respectively.

We follow the same development as for the single random variable case and define the probability density function for two random variables.

Fact (Feller, Vol. 1): If the joint probability measure P_XY induced by (X, Y) is absolutely continuous, a function f_XY(x, y) exists such that F_XY(x, y) = ∫ from −∞ to x ∫ from −∞ to y of f_XY(u, v) dv du; f_XY(x, y) is called the joint probability density function of (X, Y). Conversely, given F_XY and f_XY that satisfy the previous properties, there exist two random variables X and Y defined on some sample space with joint probability distribution and density functions F_XY and f_XY, respectively.

We may enlist the main properties of F_XY and f_XY in the same manner as we did with single random variables; specifically, we note that:

i. F_XY(−∞, y) = F_XY(x, −∞) = 0 and F_XY(∞, ∞) = 1;
ii. f_XY(x, y) ≥ 0 and ∫∫ f_XY(x, y) dx dy = 1;
iii. f_XY(x, y) = ∂²F_XY(x, y)/∂x∂y;


iv. P((X, Y) ∈ D) = ∫∫ over D of f_XY(x, y) dx dy for a region D in the plane.

Example 15: Consider two jointly continuous and positive random variables X and Y with the following joint density function:

f_XY(x, y) = 2e^(−(x+y)) for 0 < x < y, and 0 otherwise.

Find f_X(x), f_Y(y), and F_XY(x, y).

Solution: The required quantities may be obtained by integrating f_XY(x, y) over the specific region; see Fig. 4.6.

f_X(x) = ∫ from x to ∞ of 2e^(−(x+y)) dy = 2e^(−2x), x > 0.

Fig. 4.6: Illustration of the region of integration for the marginal density f_X(x).

Also, f_Y(y) = ∫ from 0 to y of 2e^(−(x+y)) dx = 2e^(−y)(1 − e^(−y)), y > 0. The cumulative distribution function will be computed from the basic definition:


F_XY(x, y) = ∫ from 0 to x ∫ from u to y of 2e^(−(u+v)) dv du, which, given that 0 < x ≤ y, leads to

F_XY(x, y) = 1 − e^(−2x) − 2e^(−y)(1 − e^(−x)).

We may address the issue of dependence of random variables here due to its importance. We maintain, from Definition 2 and Example 9, that events are independent if their joint probabilities factor; that is,

f_XY(x, y) = f_X(x) f_Y(y).

Note that in the above example, f_XY(x, y) has a separable form, but the two random variables are not independent due to the restriction 0 < x < y. The independence concept will come in handy in various applications, including simulations of random variables.

    4.4.1 Statistical Expectation in Two Dimensions

Expectations in two dimensions follow the one-dimensional case in the same manner (see Definitions 5 and 6). In particular, the expected value of a function g(X, Y) of two random variables is defined as follows:

Definition 13: Given a pair of random variables (X, Y), suppose we define a mapping Z = g(X, Y); i.e., the domain and range of g are similar to those of (X, Y). The expected value of g(X, Y) is defined by


E[g(X, Y)] = ∫∫ g(x, y) f_XY(x, y) dx dy (continuous); E[g(X, Y)] = Σᵢ Σⱼ g(xᵢ, yⱼ) P(X = xᵢ, Y = yⱼ) (discrete),

where f_XY(x, y) is the joint probability density function for continuous random variables and P(X = xᵢ, Y = yⱼ) is the joint probability mass function for discrete random variables.

Definition 14: Given a pair of random variables (X, Y), the covariance is defined as:

Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)],

where μ_X and μ_Y are the marginal means as defined in Definition 5. That is, for any random variable we have:

μ_X = ∫ x f_X(x) dx (continuous); μ_X = Σᵢ xᵢ P(X = xᵢ) (discrete),

where f_X(x) is the probability density function for a continuous random variable X and P(X = xᵢ) is the probability mass function for a discrete random variable X. A special case of Definition 14 that does not include the marginal means is called the correlation; i.e., R_XY = E[XY]. We note that all the properties of the expectation operators in higher dimensions are similar to those in one dimension.


In two dimensions (multi-dimensions, in general), the random variables may possess various degrees of relatedness, which may span the entire scale from fully related to completely unrelated (independent). We define the relatedness in terms of the correlation coefficient ρ; i.e.,

ρ = Cov(X, Y)/(σ_X σ_Y), with −1 ≤ ρ ≤ 1,

where σ_X is the standard deviation of the random variable X. If the covariance is zero, then the two random variables are said to be uncorrelated. If the two random variables are independent, then their covariance becomes zero, and therefore, by definition, the two random variables become uncorrelated. We note that the converse is not true, except in the case of Gaussian random variables, which will be considered later. A numeric illustration follows.
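The following Python sketch computes the covariance and correlation coefficient for the classic cautionary pair, X standard normal and Y = X² (our example): Y is fully determined by X, yet the two are uncorrelated.

```python
import numpy as np

# Definition 14 and the correlation coefficient, plus the caution that
# uncorrelated does not imply independent: with X ~ N(0, 1) and Y = X^2,
# Y depends fully on X yet Cov(X, Y) = E[X^3] = 0.
rng = np.random.default_rng(6)
x = rng.standard_normal(1_000_000)
y = x ** 2

cov = np.mean((x - x.mean()) * (y - y.mean()))   # Cov(X, Y)
rho = cov / (x.std() * y.std())                  # correlation coefficient
print(round(float(cov), 4), round(float(rho), 4))  # both near 0
```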

4.4.2 Functions of Two Random Variables

It is natural to study the mappings (transformations) of two random variables as we did with single random variables. Here we consider the transformation Z = g(X, Y), where g is a general function. We use the boldface letter g to represent the functional form, which may be a vector. A given realization (a certain value or instant) of Z will be represented by the small boldface letter z. The key issue is the measurability of g(X, Y), which may be understood in a similar fashion as in the one-dimensional case; see Fig. 4.4. We maintain that under normal conditions, g(X, Y) is measurable and, therefore, we may define a complete functional description for it in terms of the probability distribution function F_Z(z). We consider a few cases below.

Example 16: Consider two jointly continuous random variables X and Y with probability density function f_XY(x, y). Define a summation function Z = X + Y. We maintain that Z is a measurable random variable whose CDF may be calculated as follows:

F_Z(z) = P(X + Y ≤ z) = ∫ from −∞ to ∞ ∫ from −∞ to z−x of f_XY(x, y) dy dx,

and the density function would be evaluated as follows:


f_Z(z) = dF_Z(z)/dz = ∫ from −∞ to ∞ of [∂/∂z ∫ from −∞ to z−x of f_XY(x, y) dy] dx. The inner differentiation may be evaluated using the Leibnitz rule; i.e., ∂/∂z ∫ from −∞ to z−x of f_XY(x, y) dy = f_XY(x, z − x). Hence,

f_Z(z) = ∫ from −∞ to ∞ of f_XY(x, z − x) dx.

As a special case, if X and Y are independent, then the joint density becomes separable, and f_Z(z) will have the following form:

f_Z(z) = ∫ from −∞ to ∞ of f_X(x) f_Y(z − x) dx,

which is the convolution integral. For this case, the CDF would be F_Z(z) = ∫ from −∞ to ∞ of F_X(z − y) f_Y(y) dy. For illustration, let us consider the simple case where both X and Y are independent and have the standard uniform distribution on [0, 1]. Let Z = X + Y. It is straightforward to show that

f_Z(z) = z for 0 ≤ z ≤ 1; f_Z(z) = 2 − z for 1 < z ≤ 2; and f_Z(z) = 0 otherwise.
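A quick simulation sketch of this triangular density:

```python
import numpy as np

# The special case closing Example 16: Z = X + Y for independent standard
# uniforms has the triangular density f_Z(z) = z on [0,1] and 2 - z on (1,2].
rng = np.random.default_rng(7)
n = 1_000_000
z = rng.uniform(size=n) + rng.uniform(size=n)

hist, edges = np.histogram(z, bins=40, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
exact = np.where(centers <= 1.0, centers, 2.0 - centers)
print(np.max(np.abs(hist - exact)))   # small: the histogram traces the triangle
```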

Table 3 lists a few examples of the distributions of functions of two random variables; all can be derived from the basic definitions. These functions map two random variables into a single random variable; i.e., there is a reduction in dimensionality involved in the transformation. We will consider the case where the functions of random variables are vectors next.


Table 3: Some one-dimensional functions of two random variables and the corresponding CDF and PDF.

Z = X + Y: f_Z(z) = ∫ f_XY(x, z − x) dx
Z = X − Y: f_Z(z) = ∫ f_XY(z + y, y) dy
Z = XY: f_Z(z) = ∫ (1/|x|) f_XY(x, z/x) dx
Z = X/Y: f_Z(z) = ∫ |y| f_XY(zy, y) dy
Z = max(X, Y): F_Z(z) = F_XY(z, z)
Z = min(X, Y): F_Z(z) = F_X(z) + F_Y(z) − F_XY(z, z)

4.4.3 Two Functions of Two Random Variables

As a step towards a generalized representation of functions of random variables, let us consider a vector representation of the random variables X and Y; i.e., Z = [X Y]ᵀ, and a functional form W = g(Z) for that vector representation. This will be quite straightforward in the case where g is a linear mapping of Z; i.e., W = AZ = [a₁₁X + a₁₂Y, a₂₁X + a₂₂Y]ᵀ, for some constants aᵢⱼ. If the transformation is invertible, A⁻¹ exists; hence, Z = A⁻¹W. The


Jacobian of the transformation, with components w₁ = g₁(x, y) and w₂ = g₂(x, y), is defined as J = det[∂w₁/∂x, ∂w₁/∂y; ∂w₂/∂x, ∂w₂/∂y]. Then the joint probability density function of (W₁, W₂) may be written in terms of the joint density of the random variables (X, Y) as f_W(w₁, w₂) = f_XY(x, y)/|J|, evaluated at the inverse image (x, y) of (w₁, w₂), where |·| stands for the absolute value of the determinant.

Example 17: Consider two jointly continuous random variables X and Y with probability density function f_XY(x, y). Consider the polar coordinates r = √(x² + y²) and θ = tan⁻¹(y/x), with x = r cos θ and y = r sin θ. We may define the Jacobian of the polar transformation (from (r, θ) back to (x, y)) as follows:

J = det[∂x/∂r, ∂x/∂θ; ∂y/∂r, ∂y/∂θ] = det[cos θ, −r sin θ; sin θ, r cos θ] = r.

Hence, f_RΘ(r, θ) = r f_XY(r cos θ, r sin θ).
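A simulation sketch of Example 17, assuming X and Y are independent standard normals (our choice): the polar density r · f_XY(r cos θ, r sin θ) factors into a Rayleigh radius and an angle uniform on (−π, π].

```python
import numpy as np

# Example 17 with X, Y independent standard normals: the radius should be
# Rayleigh(1) and the angle uniform on (-pi, pi].
rng = np.random.default_rng(8)
x = rng.standard_normal(500_000)
y = rng.standard_normal(500_000)
r = np.hypot(x, y)                    # radius
theta = np.arctan2(y, x)              # angle

print(r.mean(), np.sqrt(np.pi / 2.0))   # Rayleigh(1) mean = sqrt(pi/2)
print(np.mean(theta), np.std(theta), np.pi / np.sqrt(3.0))  # uniform angle
```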

    4.5 Simulation of Random Variables

In this section we discuss how to simulate one- and two-dimensional random variables of general distributions, starting from the standard uniform distribution. This will bring about the linkage between a distribution or density function and deterministic functional forms, which represent instants or realizations of the random variables. Generating such instants may be used in a reverse fashion to estimate the density function using histograms. Simulation is an art and a well-studied subject in statistical design; stochastic systems are studied based on simulations. We only scratch the surface here to bring about some feeling for the outputs of random experiments in the form of realizations of the random variables that map the sample space into Borel sets on the real line. The fundamental step in the simulation is the simple observation that since the distribution function is monotone and non-decreasing, an inverse may be possible; hence


mapping from the uniform distribution to any other form is possible. We illustrate this as follows: let X and U be two random variables, not necessarily on the same sample space or having the same form of distribution, and suppose the two random variables are related to each other through the monotone transformation X = g(U). The distribution function of X with respect to the distribution of U is as follows: F_X(x) = F_U(g⁻¹(x)) = g⁻¹(x) if U is uniformly distributed on [0, 1]. Hence, we may take g = F_X⁻¹. Therefore, if we generate a realization u from U, a realization of X is obtained by the inverse mapping x = F_X⁻¹(u).

    4.6 Summary

This chapter introduced the basic concepts of statistical modeling, which serve as building blocks for the material in the book. We defined the concept of the random variable from an operational point of view and illustrated its significance in describing statistical experiments. Characteristics of a single random variable have been studied, including the probability distribution and density functions and the concept of statistical expectation. Extensions to functions of a single random variable and the joint distribution of two random variables were considered. The chapter concluded with the basics of simulation of one- and two-dimensional random variables.

    4.7 Computer Laboratory

A random variable with a uniform distribution can be easily converted to any other distribution, such as the normal distribution. The methodology of generating random numbers has several forms; however, the one used in the great majority of cases is the linear congruential generator (LCG), whose recursive formula is given by (A4.1), where {xₖ} is a sequence of integer values ranging from 0 to m − 1, starting from a seed point x₀:

xₖ₊₁ = (a xₖ + c) mod m,  (A4.1)

where m is the modulus, a is the multiplier, c is the increment, and all are non-negative values. In order to obtain numbers in the interval [0, 1], the generated integer sequence should be divided by m to obtain uₖ = xₖ/m. There are many variations of LCGs; the form used in this project is the multiplicative LCG, where the addition of c is not needed, so the recursive formula of the generator can be given by

xₖ₊₁ = (a xₖ) mod m.  (A4.2)

We may convert the generated uniform streams to follow a Gaussian distribution of zero mean and unit variance using the Box-Muller approach, described as follows:


If u₁ and u₂ are independent random variates (realizations from a random variable) from U(0, 1), then Z₁ and Z₂ are independent standard normal deviates (realizations from N(0, 1)) obtained as follows:

Z₁ = √(−2 ln u₁) cos(2π u₂),  (A4.3a)
Z₂ = √(−2 ln u₁) sin(2π u₂).  (A4.3b)

The approach is to run the random generator to get a very large number of samples of a uniform random variable, then extract the even-ordered terms to be u₁ and the odd-ordered terms to be u₂, in order to guarantee their independence. A sketch of the full pipeline is given below.

Task 1:
(a) Generate three streams (1D, 2D, and 3D) of random numbers with 100,000 samples; you can use the Matlab function for random number generation.
(b) Visualize the generated samples. Hint: you can use a scatter plot.
(c) Compute the histograms of the three streams, and then normalize them to become probability density functions (pdfs).
(d) Visualize the pdfs of the three streams. Are the samples uniformly distributed? Do the pdfs represent a standard uniform distribution? Comment.

Task 2: Use the streams generated in Task 1 to do the following:
(a) Convert the generated streams to a standard normal distribution.
(b) Visualize the data samples. Are they normally distributed with mean 0 and unit variance?
(c) Compute the normalized histograms (pdfs) of the data samples from (a) and visualize the distributions; comment.

Task 3: Use the streams generated in Task 2 to do the following:
(a) Convert the 1D normal variate to a normal distribution with mean −2 and standard deviation 3.
(b) Convert the 2D normal variate to a normal distribution with mean vector M = [2, 1]ᵀ and covariance matrix [9 4; 4 4].
(c) Convert the 3D normal variate to a normal distribution with mean vector M = [5, 5, 6]ᵀ and covariance matrix [5 2 1; 2 5 0; 1 0 4].
(d) For (a), (b), and (c), estimate the mean and covariance matrix and compare the estimated values with the ground-truth ones; comment.


    (d)For (a), (b) and (c) estimate the mean and covariance matrix and compare the estimatedvalues with the ground truth ones, comment.

    4.8 Exercises

1. Given a probability space {Ω, ℱ, P}, let A and B be events such that … . Prove/disprove the following: (a) …; (b) …; (c) … .

2. Given a sample space Ω = … :
(a) Construct the smallest algebra that contains the two sets … and … .
(b) Construct the smallest algebra that contains the two sets … and … .
(c) What members of … are … measurable?

3. Let P be a nonnegative, finitely additive set function on the field ℱ. Show that … .

4. Let A₁, A₂, …, Aₙ be arbitrary subsets of a set Ω. Describe explicitly the smallest field containing A₁, A₂, …, Aₙ. How many sets are there in it? (I.e., give an upper bound that is attainable.)

5. Derive the entries in Table 2.

6. Derive the entries in Table 3.


4.9 References

1. Robert Ash, Real Analysis and Probability, Academic Press, New York, 1972.
2. Athanasios Papoulis, Probability, Random Variables and Stochastic Processes, 3rd Edition, McGraw-Hill, New York, 1991.
3. Paul E. Pfeiffer, Concepts of Probability Theory, 2nd Edition, Dover, New York, 1978.
4. Wilbur Davenport, Random Processes, McGraw-Hill, New York, 1970.
5. William Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, Wiley, New York, 1968.
6. Reuven Rubinstein, Simulation and the Monte Carlo Method, Wiley, New York, 1981.