farag - random variables
TRANSCRIPT
7/31/2019 Farag - Random Variables
1/38
Farag: Chapter 4: Basics of Statistical Modeling, Preprint 2011 | 1
Contents
Chapter 4: Basics of Statistical Modeling ............................ 3
4.1 Introduction ..................................................... 3
4.2 Statistical Experiments .......................................... 4
4.3 Random Variables ................................................. 8
4.3.1 Basic Concepts ................................................. 8
4.3.2 Properties of the CDF and the PDF of a random variable ........ 13
4.3.3 The Conditional Distribution .................................. 15
4.3.4 Statistical Expectation ....................................... 17
4.3.5 Functions of Random Variable .................................. 22
4.4 Two Random Variables ............................................ 25
4.4.1 Statistical Expectation in Two Dimensions ..................... 29
4.4.2 Functions of Two Random Variables ............................. 31
4.4.3 Two Functions of Two Random Variables ......................... 33
Fig. 4.1: Illustration of the random variable as a mapping function from the sample space of the sigma algebra into Borel sets on the real line. ...... 8
Fig. 4.2: Illustration of the domain and range of a discrete-valued random variable. ...... 9
Fig. 4.3: The CDF of a mixed random variable. ...... 12
Fig. 4.4: Illustration of the measurability of functions of random variables. ...... 23
Fig. 4.5: Illustration of the concept of two-dimensional random vectors for two random variables on the same sample space. ...... 26
Fig. 4.6: Illustration of the region of integration for the marginal density. ...... 28
4.2 Statistical Experiments
We follow the tradition of describing a statistical experiment as a triple $E = \{\Omega, \mathcal{F}, P\}$.
Sample space $\Omega$: The sample space enumerates the elementary outcomes of an experiment and, by construction, various combinations of them as well. For example, if $\omega_1, \omega_2, \dots$ are the elementary outcomes of $E$ (i.e., its single-valued possibilities, or the entities of its alphabet), it is a natural requirement that combinations of these outcomes, which may also happen in $E$, should be included in $\Omega$. However, events may be constructed arbitrarily, and some may never happen! Hence, it is the tradition to list only the elementary outcomes (elements of the alphabet) in $\Omega$, and the events (all possible combinations of the elementary outcomes, or the dictionary of the events in the experiment) are given a special entity, denoted by a field or algebra $\mathcal{F}$.
Field (algebra) $\mathcal{F}$: As the events need to be scaled or measured, an enumeration procedure is needed to generate the dictionary of events, including those that may not happen; these will be given zero weight or measure. Therefore, a desirable characteristic of $\mathcal{F}$ is that it be closed under all countable (possibly infinite in number) combinations of elementary events, such as unions. Specifically, we require for events $A \in \mathcal{F}$ and $A_1, A_2, \dots \in \mathcal{F}$ that:

a) The empty set $\emptyset$ (or null event) and the sample space $\Omega$ (sure event) must be in $\mathcal{F}$.
b) $A \in \mathcal{F}$ implies that $A^c \in \mathcal{F}$ for all $A$.
c) If $A_1, A_2, \dots \in \mathcal{F}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.

The above requirements are all natural for the field $\mathcal{F}$ to be an inclusive space that spans (includes) all measurable events.
Probability measure P:
The probability measure $P$ is a function that assigns weights to the events in $\mathcal{F}$. It is the tradition to have a positive scale normalized to the unit interval. It is natural that events may be related (dependent) or unrelated (independent) with respect to each other. Events also may be joint (share some elementary outcomes) or disjoint (share no elementary outcomes) in their construction. Therefore, a measure $P$ satisfies the following requirements (axioms) [Ash]:

a) $0 \le P(A) \le 1$ for all events $A \in \mathcal{F}$.
b) The empty set (or null event) has zero weight and the sample space (sure event) has a weight of 1; that is: $P(\emptyset) = 0$ and $P(\Omega) = 1$.
c) If the events $A_1, A_2, \dots$ are mutually disjoint, then $P\!\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.
As a consequence of axioms b and c, if $A^c$ is the complement event of $A$, then $P(A^c) = 1 - P(A)$.

Example 1: Consider a histology experiment conducted on cells from a tissue specimen taken during a procedure to determine the existence of cancer. In this experiment, tissue is treated and viewed under a microscope, with viewed cells labeled cancerous (C) or noncancerous (NC). As biopsies are performed only after reasonable doubts justify doing them, the weights of the C and NC events differ with the procedure. A possible statistical experiment may be described as follows: $\Omega = \{C, NC\}$, $\mathcal{F} = \{\emptyset, \{C\}, \{NC\}, \Omega\}$, and $P$ assigns weights with $P(\{C\}) + P(\{NC\}) = 1$.

Example 2: Consider the same experiment as in Example 1, in which viewing the cells under the microscope revealed a conclusion contradictory to the agent used to prepare the tissue. The histologist's chart showed unspecified cells (US) under these circumstances. That is, the sample space became $\Omega = \{C, NC, US\}$. As these events are disjoint, the sigma algebra would be simply the collection of five possibilities $\emptyset, \{C\}, \{NC\}, \{US\},$ and $\Omega$, and the probability measure will be such that $P(\{C\}) + P(\{NC\}) + P(\{US\}) = 1$.

Example 3: Consider the experiment of recording the life span of cells in a specimen under the effect of an agent. Suppose that the viewing domain of the microscope allows individual cell viewing and tracking, and that it is possible to tag each cell as it moves within the specimen until
its death. It is expected that a counting experiment will be a tedious one, especially with large degrees of motion of the cells; yet a histogram may be constructed indicating the number of cells that die over time.
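Such a counting experiment can be sketched in a few lines of code. The exponential lifespan law, its mean, and the bin width below are hypothetical choices made only for illustration, not part of the experiment described above:

```python
import random

def lifespan_histogram(n_cells=10000, mean_life=5.0, bin_width=1.0, n_bins=10, seed=42):
    """Simulate cell lifespans and count how many cells die within each time bin."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_cells):
        t = rng.expovariate(1.0 / mean_life)     # hypothetical exponential lifespan
        b = min(int(t / bin_width), n_bins - 1)  # the last bin collects the tail
        counts[b] += 1
    return counts

counts = lifespan_histogram()
```

Each entry of `counts` approximates the number of deaths in one time interval; under this model the histogram decays roughly exponentially.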
Events containing similar elementary outcomes are called joint, while those with no common elementary outcomes are called disjoint. Probability measures on disjoint events are additive. Events with no bearing on each other are called independent. Probability measures on independent events are multiplicative. As cause and effect is a natural rule of physics, it is also logical to ask about the likelihood of related events with respect to each other. The conditional probability measure addresses this issue, providing the probability of an event, given that a second event took place, as their joint probability normalized by the probability of the second event. We formally define these concepts as follows.
Definition 1: Two events $A$ and $B$ on the same sample space $\Omega$ are said to be disjoint if their intersection is empty; i.e., $A \cap B = \emptyset$.

Definition 2: Two events $A$ and $B$, not necessarily on the same sample space $\Omega$, are said to be independent if their joint probability measure is multiplicative; i.e., $P(A \cap B) = P(A)\,P(B)$.

Definition 3: The conditional probability measure of event $A$ given the occurrence of event $B$, where $A$ and $B$ are not necessarily on the same sample space $\Omega$, is denoted by $P(A \mid B)$ and is defined by:

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)},$ provided that $P(B) \neq 0$.

The expression for the conditional probability in Definition 3 is usually called Bayes' Theorem. As a direct corollary to the above definitions, independent events do not affect the value of the conditional probability measure. Likewise, disjoint events have a zero-valued joint probability measure and, subsequently, a zero-valued conditional probability measure.
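Definitions 1-3 can be illustrated on a small finite experiment; the six-outcome sample space and the uniform measure below are hypothetical choices made only for this sketch:

```python
from fractions import Fraction

# A uniform measure on a hypothetical six-outcome sample space.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event as the fraction of outcomes it contains."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}     # "even outcome"
B = {1, 2, 3, 4}  # "at most four"
C = {5}           # disjoint from A

def cond(X, Y):
    """P(X | Y) = P(X ∩ Y) / P(Y), provided P(Y) != 0 (Definition 3)."""
    return P(X & Y) / P(Y)
```

Here $A$ and $C$ are disjoint (zero joint measure), while $A$ and $B$ happen to be independent, so conditioning on $B$ leaves the measure of $A$ unchanged.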
Example 4: Consider events $A$ and $B$ such that $A$ is a subset of $B$; symbolically, $A \subseteq B$. It follows that the conditional probability $P(B \mid A) = 1$ and that the joint probability measure $P(A \cap B) = P(A)$.

A stimulus may affect a set of events even though they may be disjoint. In this case, conditional probabilities enable measuring the overall effect of that stimulus. For example, suppose that an experiment produces three disjoint events $A_1, A_2, A_3$ such that $A_1 \cup A_2 \cup A_3 = \Omega$. We can measure the effect of a stimulus (i.e., another event) $B$ on the set of events by noting that $B = (B \cap A_1) \cup (B \cap A_2) \cup (B \cap A_3)$, and since the intersections are disjoint as well, the probability measure is additive with respect to the components; i.e.,

$P(B) = P(B \cap A_1) + P(B \cap A_2) + P(B \cap A_3),$

which can be written in terms of conditional probabilities as:

$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + P(B \mid A_3)P(A_3).$

This can be generalized to a larger number of events in what is known as the total probability theorem, which is stated below.
Theorem 1: Consider a set of events $A_1, A_2, \dots, A_n \in \mathcal{F}$ with $\bigcup_{i=1}^{n} A_i = \Omega$. Suppose the events are mutually disjoint; i.e., they partition the sample space. Consider an event $B \in \mathcal{F}$. We can show that

$P(B) = \sum_{i=1}^{n} P(B \mid A_i)\, P(A_i).$

The proof of the total probability theorem follows from the fact that the probability measure of disjoint events is additive [Ash].

In general, listing the events in $\Omega$ and $\mathcal{F}$ will not be a simple task, if not an impossible one, in real-world problems. Likewise, obtaining the corresponding functional form for the probability measure will be a rough task to accomplish. A logical task that statisticians have faced has been to search for automatic procedures to generate events and their measures for particular experiments. Indeed, a general approach has evolved to accomplish this task through the concept of random functions, commonly referred to as random variables. This type of function maps events on the sample space and the sigma algebra, which may be quite abstract, into another representation on the Cartesian domain, on which assigning a scale (measure or weight) becomes possible using the common tools of calculus. We examine this concept next as a prelude to describing multivariable experiments and complicated stochastic phenomena.
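The total probability theorem can be checked directly on a toy partition; the prior and conditional values below are hypothetical numbers chosen for illustration:

```python
from fractions import Fraction

# A hypothetical partition A1, A2, A3 of the sample space with prior measures,
# and the conditional measures of an event B given each partition element.
prior  = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # P(A_i), sums to 1
cond_B = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 1)]  # P(B | A_i)

# Total probability theorem: P(B) = sum_i P(B | A_i) * P(A_i)
P_B = sum(p * c for p, c in zip(prior, cond_B))
```

With these numbers, $P(B) = \tfrac{1}{8} + \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{11}{24}$.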
4.3 Random Variables
4.3.1 Basic Concepts
A random variable $X$ is a mapping from the sigma algebra into the real line such that $\{X \le x\}$ is an event for every $x \in \mathbb{R}$. As illustrated in Fig. 1, the random variable is a mapping from events in $\mathcal{F}$ into other forms of events (sets) defined on the real line; these sets are called Borel sets. For a compact representation, we insist that the inverse mapping of the Borel sets yields events on $\mathcal{F}$; i.e., the set $X^{-1}(B) \in \mathcal{F}$ for all Borel sets $B$ [Peiper].

To avoid potential confusion about the difference between the sample space and the sigma algebra vis-à-vis the random variable, we use, as stated before, the alphabet of the experiment to represent $\Omega$ and its dictionary to represent $\mathcal{F}$; hence, $X$ is technically a mapping from $\mathcal{F}$ into $\mathbb{R}$. Of course, it is understood that events in $\mathcal{F}$ are formed of elementary events, i.e., must subsequently be contained in $\Omega$. Like any function, $X$ has a domain, which will be $\mathcal{F}$, and a range, which is the real line $\mathbb{R}$. Hence, a shorthand for this relationship would be $X: \mathcal{F} \to \mathbb{R}$. As domains and ranges may be continuous or discrete, we will be able to consider cases where the mapping is between continuous entities, in which case $X$ will be denoted a continuous random variable, and between discrete entities, in which case $X$ will be denoted a discrete random variable. As the mapping is between events, it is natural that the weights (measures) at the two ends of the mapping be the same. That is, in the illustration shown in Fig. 1, we must have $P(A) = P(X \in B)$, where $B$ is the Borel-set image of the event $A$.
Fig. 4.1: Illustration of the random variable as a mapping function from the sample space of the sigma algebra into Borel sets on the real line.
We shall provide a few illustrative examples to demonstrate the power of the random-variable concept in describing a statistical experiment.

Example 5: Consider the experiment in Example 1 above. Suppose we have a function $X(\cdot)$ which takes an arbitrary value of 1 for C and $-1$ for NC. It is evident that we can establish events $\{X \le x\}$. In particular, we have the following possibilities:

$\{X \le x\} = \emptyset$ for $x < -1$; $\{X \le x\} = \{NC\}$ for $-1 \le x < 1$; and $\{X \le x\} = \Omega$ for $x \ge 1$.

The above example illustrates a discrete-valued random variable, which has a finite range and domain. Fig. 4.2 illustrates this mapping.

Fig. 4.2: Illustration of the domain and range of a discrete-valued random variable.

Example 6: Suppose a statistical experiment with a sample space consisting of eight elementary outcomes. Consider a function defined on this sample space. We may illustrate that it is a discrete-valued random variable by examining whether $\{X \le x\}$ is an event for every $x$. It is evident that the corresponding sets are events in $\mathcal{F}$.
Hence, $X$ is a random variable.

The above illustrates the value of the random-variable concept as an event generator. Equally important is to see whether the random variable will also enable measurability of the generated events. In other words, can we evaluate $P(X \le x)$? The ability to evaluate this quantity achieves the measurability requirements for the events in $\mathcal{F}$, and thus completely describes the statistical experiment. Indeed, the quantity $P(X \le x)$ has a special name in mathematical statistics: it is known as the cumulative probability distribution function (CDF). From its definition, the event $\{X \le x\}$ shrinks as $x \to -\infty$ and expands as $x \to \infty$; therefore, the CDF is non-decreasing. It is also continuous from the right; that is:

$F_X(x) = \lim_{\epsilon \downarrow 0} F_X(x + \epsilon).$

As the CDF is a fundamental concept in the theory of statistics, we shall examine it in further detail. For ease of notation, we shall connote the CDF $P(X \le x)$ by the special symbol $F_X(x)$. To keep the notation consistent, we define the CDF formally as follows:

Definition 3: Given a random variable $X$ defined for a statistical experiment $E = \{\Omega, \mathcal{F}, P\}$, the cumulative probability distribution function (CDF) of $X$ is a function on the real line which measures the event $\{X \le x\}$. We connote this function as $F_X(x) = P(X \le x)$ for every Borel set $\{X \le x\}$, $x \in \mathbb{R}$.

The CDF, like any function, may be studied with the tools of calculus. Therefore, the description of statistical experiments, and indeed statistical theory itself, is transitioned from a seemingly guessing game into a well-founded mathematical theory through the concept of the random variable. The monotonicity, and hence the continuity behavior, of $F_X(x)$ is evident from the definition. We can easily infer the following properties of the CDF.

Fact: The CDF has the following properties: (i) $F_X(-\infty) = 0$ and $F_X(\infty) = 1$; (ii) $F_X(x)$ is differentiable over $\mathbb{R}$ (almost everywhere).
Definition 4: The probability density function (PDF) of a random variable $X$ is the derivative of the CDF; i.e., $f_X(x) = \dfrac{dF_X(x)}{dx}$. We shall see later on that both $F_X(x)$ and $f_X(x)$ enable various studies related to the experiment under consideration.

The differentiability of the CDF is a functional property due to its continuity, and it does not imply that the random variable itself (the function that maps events to Borel sets on the real line) is continuous or discrete. The distinction between continuous and discrete random variables is in the form of the Borel sets that are generated by the mapping. A discrete random variable means that its domain may take only finite but discrete values; i.e., the Borel sets are discrete, while a continuous random variable has a continuous domain; i.e., the Borel sets are continuous. As a consequence of these characteristics, we may be able to define the probability of a point event for discrete random variables; i.e., $P(X = x_i) > 0$. Point events for continuous random variables are empty sets; i.e., their measures will be zero. The implications of these characteristics and the differentiability of $F_X(x)$ will be such that the following distinctions are made between discrete and continuous random variables:

$F_X(x) = \begin{cases} \sum_{x_i \le x} P(X = x_i), & X \text{ discrete} \\ \int_{-\infty}^{x} f_X(u)\, du, & X \text{ continuous} \end{cases} \quad (1)$

For the integer-valued case, the set $\{x_i \le x\}$ will be integer valued and the upper limit of the summation in Eq. (1) would be easy to determine. If we use the generalized-function representation, we may express the PDF for discrete random variables in terms of the Dirac delta function; i.e., $f_X(x) = \sum_i P(X = x_i)\, \delta(x - x_i)$; hence, a unified expression for the CDF may be written for both continuous and discrete random variables in terms of an integral only; i.e., $F_X(x) = \int_{-\infty}^{x} f_X(u)\, du$. In practice, however, a clear distinction will be implicit in terms of the definition of the experiment and the values that events may take. Indeed, in principle, we
may define a random variable to describe events that have both continuous and discrete domains. We provide below some illustrative examples of the CDF and PDF for various random variables.
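The discrete branch of Eq. (1) is simply a running sum of point masses. A minimal sketch follows; the two point masses below are hypothetical, echoing the two-valued variable of Example 5:

```python
def discrete_cdf(masses, x):
    """CDF of a discrete random variable: F(x) = sum of P(X = xi) over xi <= x."""
    return sum(p for xi, p in masses.items() if xi <= x)

# Hypothetical point masses at -1 and 1 (cf. Example 5)
masses = {-1: 0.3, 1: 0.7}
```

The resulting CDF is a staircase: 0 below the smallest point, then jumping by each mass, and reaching 1 past the largest point.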
Example 7: Consider a random variable with the CDF shown in Fig. 4.3, which has a piecewise functional form with a jump. We note that the CDF in this example is not differentiable at the jump point, where the CDF has a discontinuity whose size equals the point probability mass at that location. We can address the behavior of the CDF at the jump by noting that the left and right limits differ by exactly that mass, which may be represented by a Dirac delta term in the PDF.

Fig. 4.3: The CDF of a mixed random variable.

Hence, functional analysis enables us to address the various scenarios that the CDF may have.
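The jump behavior can be made concrete with a small sketch; the particular CDF below (a jump of size 0.5 at the origin followed by a linear ramp) is a hypothetical stand-in for the one in Fig. 4.3:

```python
def mixed_cdf(x):
    """A hypothetical mixed CDF: jump of size 0.5 at x = 0, then a ramp to 1 at x = 1."""
    if x < 0.0:
        return 0.0
    if x < 1.0:
        return 0.5 + 0.5 * x
    return 1.0

# The size of the jump at 0 equals the point mass P(X = 0).
jump = mixed_cdf(0.0) - mixed_cdf(-1e-9)
```

The difference between the right and left limits at the discontinuity recovers the point probability mass.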
4.3.2 Properties of the CDF and the PDF of a random variable
As the CDF is a probability measure for the event $\{X \le x\}$, the positivity and the upper bound of the CDF are evident. Likewise, as the name implies, the CDF is non-decreasing and monotone; hence, the continuity from the right is evident. We can also infer the properties of the CDF for disjoint events as well as independent events using the basic definitions. We also note that the following identity follows straight from the definition of the CDF:

i) $P(a < X \le b) = F_X(b) - F_X(a)$.

The PDF, being a derivative of the CDF, possesses the positivity and summability (integrability) constraints. The PDF for a discrete random variable may be written as:

$f_X(x) = \sum_i P(X = x_i)\, \delta(x - x_i).$

The most important virtue of the CDF is its ability to provide a computational mechanism (or algorithm) to evaluate the probability measure, using the probability mass of the Borel sets corresponding to the events on the sigma algebra, through summation (discrete random variable) or integration (continuous random variable). Some examples will illustrate these concepts.
Example 8: Consider a statistical experiment defined within the boundaries illustrated in Figure 1. Suppose the mass function of the Borel sets has an exponential form: $f_X(x) = \lambda e^{-\lambda x}$, $x \ge 0$. We can evaluate the probability measure of an event by the mass of its equivalent Borel set. In particular, we can evaluate the probability of such events as:

$P(a < X \le b) = \int_a^b \lambda e^{-\lambda x}\, dx = e^{-\lambda a} - e^{-\lambda b}.$
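The computation in Example 8 amounts to a difference of CDF values; a sketch, with the exponential rate treated as a free parameter:

```python
import math

def expo_cdf(x, lam=1.0):
    """CDF of the exponential density f(x) = lam * exp(-lam * x), x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def prob_interval(a, b, lam=1.0):
    """P(a < X <= b) = F(b) - F(a): the mass of the Borel set (a, b]."""
    return expo_cdf(b, lam) - expo_cdf(a, lam)
```

For $\lambda = 1$, the mass of $(1, 2]$ is $e^{-1} - e^{-2}$, exactly as the closed form above predicts.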
Over the long history of statistics, various functional forms for $f_X(x)$ have resulted from specific practical experiments. Some of the most common PDFs are listed in Table 1.

Table 1: Common PDF functional forms.

Name        | Continuous PDF                                                                  | Discrete PDF
Uniform     | $f_X(x) = \frac{1}{b-a}$, $a \le x \le b$                                       | $P(X = x_i) = \frac{1}{n}$, $i = 1, \dots, n$
Exponential | $f_X(x) = \lambda e^{-\lambda x}$, $x \ge 0$                                    | $P(X = k) = (1 - e^{-\lambda})\, e^{-\lambda k}$, $k = 0, 1, \dots$ (discrete-valued)
Gaussian    | $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / 2\sigma^2}$             | N/A
Cauchy      | $f_X(x) = \frac{\alpha/\pi}{x^2 + \alpha^2}$                                    | N/A
Laplace     | $f_X(x) = \frac{\lambda}{2} e^{-\lambda |x|}$                                   | N/A
Rayleigh    | $f_X(x) = \frac{x}{\sigma^2} e^{-x^2 / 2\sigma^2}$, $x \ge 0$                   | N/A
Maxwell     | $f_X(x) = \sqrt{\frac{2}{\pi}}\, \frac{x^2}{\sigma^3} e^{-x^2/2\sigma^2}$, $x \ge 0$ | N/A
Gamma       | $f_X(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$, $x \ge 0$ | N/A
Beta        | $f_X(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$, $0 \le x \le 1$ | N/A
Binomial    | N/A | $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, \dots, n$; $n$: independent trials
Poisson     | N/A | $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$, $k = 0, 1, \dots$

The parameters in the PDF forms in Table 1 represent the modes as well as resolution measures of the functions and will be addressed later on.
It is important to remember the context of the definitions of the CDF and the PDF and the
brilliance of the statisticians, physicists and mathematicians who, over the years, developed these
tools that enable proper description and characterization of statistical experiments. The random
variable concept has been the catalyst for mapping the statistical experiment from a guessing
game into a functional representation that is governed by the rules of calculus.
4.3.3 The Conditional Distribution
The CDF and the PDF may be defined for conditional events. For example, we may study the behavior of an event $A$ defined such that $A = \{X \le x\}$ with respect to another event $B$, which may or may not be defined on the same sample space $\Omega$. We can consider the conditional CDF as a measure of the event $\{X \le x\} \cap B$, in the same way we defined the conditional probability before:

$F_X(x \mid B) = P(X \le x \mid B) = \dfrac{P(\{X \le x\} \cap B)}{P(B)}, \quad P(B) \neq 0.$

We point out that $\{X \le x\}$ means that $-\infty < X \le x$, and the event $B$ may or may not be a subset of the same sample space $\Omega$. We maintain that the conditional CDF possesses all the properties of the CDF as discussed before. In particular, it is straightforward to justify the
following properties: $F_X(-\infty \mid B) = 0$, $F_X(\infty \mid B) = 1$, and $P(a < X \le b \mid B) = F_X(b \mid B) - F_X(a \mid B)$ for $a < b$.

Indeed, the total probability theorem that we studied before may be expressed as follows: consider a partition $A_1, A_2, \dots, A_n$ of a sample space with known a priori measures $P(A_i)$. Any event may be defined in terms of the partition, and the CDF for the random variable $X$ will be defined as follows:

$F_X(x) = \sum_{i=1}^{n} F_X(x \mid A_i)\, P(A_i).$

In the same fashion as we studied the probability density function (PDF) before, we can define the conditional PDF as well; that is, for continuous random variables we have:

$f_X(x \mid B) = \dfrac{d F_X(x \mid B)}{dx}.$

Example 9: Consider two events $A$ and $B$ defined in terms of the random variable $X$. Applying the definition of the conditional CDF, we can compute $F_X(x \mid B)$; by taking derivatives, we obtain the conditional PDF $f_X(x \mid B)$.

Example 10: Consider the events $A$ and $B$. We can show that the conditional CDF takes a piecewise form.
The conditional PDF follows by differentiation.

The conditional distribution is an important mechanism to measure the effect of a stimulus on a certain behavior, or the degree of dependency of events on each other. Later on, we will generalize the concept of the random variable to describe experiments that create an indexed set of random variables, where the index is defined on the set of real numbers or on a graph; we will exploit the dependency of the random variables to perform clustering of classes into meaningful descriptions, and we will be able to study various control mechanisms of experiments that can only be studied over a long period of observations.
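As a concrete sketch of the conditional CDF, take the conditioning event $B = \{X \le a\}$; the definition then gives $F_X(x \mid B) = F_X(\min(x, a)) / F_X(a)$. The exponential CDF below is an arbitrary illustrative choice, not an example from the text:

```python
import math

def F(x, lam=1.0):
    """CDF of an exponential random variable (an arbitrary illustrative choice)."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def F_cond(x, a, lam=1.0):
    """Conditional CDF given B = {X <= a}: P(X <= x | X <= a) = F(min(x, a)) / F(a)."""
    return F(min(x, a), lam) / F(a, lam)
```

Note that conditioning on $\{X \le a\}$ concentrates all the mass below $a$, so $F_X(x \mid B) \ge F_X(x)$ there and $F_X(a \mid B) = 1$.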
4.3.4 Statistical Expectation
The CDF and the PDF in Table 1 are described in terms of parameters that quantify the basic
behavior of these functions, such as their mean value and the dispersion (variance) around that
mean value. Over the years of statistical analysis, statisticians have developed metrics that can be
computed from the CDF or the PDF which provide important clues about the behavior of a
statistical experiment. In fact, some of these metrics may provide an approximation or upper
bounds of the probabilities that a certain event may take without actual calculation of the CDF
and the PDF; thus a prediction may be possible for the behavior of events based on numerical
calculations. Among these metrics is the statistical expectation which will be formally defined
below.
Definition 5: Given a random variable $X$ with PDF $f_X(x)$, its expected value is defined by

$E[X] = \begin{cases} \int_{-\infty}^{\infty} x\, f_X(x)\, dx, & X \text{ continuous} \\ \sum_i x_i\, P(X = x_i), & X \text{ discrete.} \end{cases}$
The expectation as defined above for a single random variable is a numerical indication of how symmetric the PDF is around the origin. Due to the positivity constraint of the PDF, a zero expected value connotes symmetry, while a positive value indicates that the PDF is tilted towards the positive (right-hand) side of the origin, and a negative value indicates that the PDF is tilted to the left of the origin. Hence, in reality the quantity $E[X]$ provides the mean value of the PDF, or its locus of symmetry on the $x$-axis. Therefore, $E[X] = \mu$ is the mean value or the average of the random variable $X$.

In the strict mathematical sense, we can define the effect of a general function of the random variable on the PDF. For example, we may define the expected value of a function $g(X)$ as follows:

Definition 6: Given a random variable $X$ with PDF $f_X(x)$, suppose we define a mapping $Y = g(X)$, where $g: \mathbb{R} \to \mathbb{R}$; i.e., the domain and range of $g$ are similar to those of $X$. The expected value of $g(X)$ is defined by

$E[g(X)] = \begin{cases} \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx, & X \text{ continuous} \\ \sum_i g(x_i)\, P(X = x_i), & X \text{ discrete.} \end{cases}$
In this case, the mean is a special case of $E[g(X)]$ with $g(x) = x$. The expectation of $g(X)$ measures the average of the function. Indeed, we shall study the fact that a function of a random variable is also a measurable function, which defines a random variable as well. Hence, we may calculate $E[g(X)]$ as above, using the PDF of the random variable $X$, or first calculate the PDF of $Y = g(X)$ and then calculate $E[Y]$ as follows:

$E[Y] = \begin{cases} \int_{-\infty}^{\infty} y\, f_Y(y)\, dy, & Y \text{ continuous} \\ \sum_j y_j\, P(Y = y_j), & Y \text{ discrete.} \end{cases}$
The above result is very fascinating, even though we have not formally derived it, and indeed it has enormous practical implications in statistical analysis; this is due to the fact that calculating the PDF $f_Y(y)$ is not a simple matter, even for simple transformations. Therefore, the PDF of a random variable $X$ serves a great deal for functions of $X$ as well. We focus on a few definitions which are special cases of $E[g(X)]$ and provide some specific clues about the PDF of $X$.

Definition 7: Given a random variable $X$ with PDF $f_X(x)$, the variance of $X$ is defined as:

$\sigma^2 = E[(X - \mu)^2] = \begin{cases} \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\, dx, & X \text{ continuous} \\ \sum_i (x_i - \mu)^2 P(X = x_i), & X \text{ discrete.} \end{cases}$
The variance measures the dispersion (width or concentration) of the PDF around the mean value. It serves to estimate the domain of the values that the random variable $X$ may take. In fact, an upper bound has been derived, through the Chebyshev inequality, which estimates the upper bound of the probability of a normalized event.

Fact (the Chebyshev inequality [Papoulis]): Consider a random variable $X$ with PDF $f_X(x)$ having a mean value $\mu$ and variance $\sigma^2$. Define the normalized event $\{|X - \mu| \ge k\sigma\}$. We can show that:

$P(|X - \mu| \ge k\sigma) \le \dfrac{1}{k^2}.$

The proof of the Chebyshev inequality follows directly from the definitions and the positivity of the density function. Its importance is in providing a measure for the concentration of the values of the random variable around its mean value.
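The Chebyshev bound can be checked empirically; the Gaussian test distribution below is an arbitrary choice, and the empirical tail fraction should fall at or below $1/k^2$:

```python
import random

def chebyshev_check(k=2.0, n=100000, seed=1):
    """Empirical P(|X - mu| >= k * sigma) for a standard Gaussian sample
    (an arbitrary test distribution); Chebyshev bounds this by 1 / k**2."""
    rng = random.Random(seed)
    tail = sum(1 for _ in range(n) if abs(rng.gauss(0.0, 1.0)) >= k)
    return tail / n

p_tail = chebyshev_check()
```

For a Gaussian the true two-sided tail at $k = 2$ is far smaller than the bound $1/4$, which illustrates that Chebyshev is distribution-free but loose.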
Definition 8: Given a random variable $X$ with PDF $f_X(x)$, the $n$-th moment of $X$ around the origin is defined as:

$m_n = E[X^n] = \begin{cases} \int_{-\infty}^{\infty} x^n f_X(x)\, dx, & X \text{ continuous} \\ \sum_i x_i^n\, P(X = x_i), & X \text{ discrete.} \end{cases}$
The $n$-th central moment $\mu_n = E[(X - \mu)^n]$ is obtained in a similar fashion by the proper substitution in the above equation. We now provide a few examples to illustrate how the expectations may be calculated for some of the PDFs in Table 1.
Example 11: Evaluate the mean and the variance of the Poisson distribution. This distribution is that of a counting random variable which estimates the instances of occurrence of a particular event, within a certain period, given its frequency of occurrence.

$E[X] = \sum_{k=0}^{\infty} k\, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda.$

Similarly,

$E[X(X-1)] = \sum_{k=0}^{\infty} k(k-1)\, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda^2.$

On the other hand, we note that

$\sigma^2 = E[X^2] - (E[X])^2 = E[X(X-1)] + E[X] - \lambda^2.$

Hence, it is immediate that $\sigma^2 = \lambda$ for the Poisson distribution.

Example 12: Evaluate the mean and the variance of the Cauchy distribution.
The mean value is zero, since $f_X(x)$ is an even symmetric function. The difficulty arrives with the variance, since the second-order expectation $E[X^2]$ is infinite. Hence, the variance is undefined. Yet, the parameter $\alpha$ measures the relationship between the maximum height and the span (width) of the PDF.

The knowledge of the PDF (or the CDF) enables computing the expectations. It is a natural question to ask whether the knowledge of the expectations enables the determination of the functional form of the PDF. Unfortunately, it is not possible to infer the PDF from the expectations (moments), except in the case of the Gaussian PDF, which is explicitly defined in terms of its mean and variance. Yet, statisticians have studied mechanisms to generate the moments automatically through the so-called moment generating function, which is defined below.
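Before turning to the moment generating function, the Poisson result of Example 11 (mean = variance = $\lambda$) can be confirmed by simulation; the sampler uses Knuth's product-of-uniforms method, an implementation choice not taken from the text:

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's method: multiply uniforms until the product drops below exp(-lam)."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def poisson_moments(lam=4.0, n=100000, seed=7):
    """Sample mean and variance of n Poisson(lam) draws."""
    rng = random.Random(seed)
    xs = [poisson_sample(lam, rng) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

mean, var = poisson_moments()
```

Both sample statistics should land close to $\lambda = 4$, in agreement with the derivation above.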
Definition 9: Given a random variable $X$ with PDF $f_X(x)$, the moment generating function of $X$ is defined as

$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx.$

The moments may be generated from $M_X(t)$ as follows:

$m_n = E[X^n] = \left.\dfrac{d^n M_X(t)}{dt^n}\right|_{t=0}.$

Example 13: Evaluate the moment generating function for the standard uniform PDF defined over the interval $(0, 1)$:

$M_X(t) = \int_0^1 e^{tx}\, dx = \dfrac{e^t - 1}{t}.$

It is evident that the mean value, for example, may be obtained from the first derivative of $M_X(t)$ by taking the limit as $t$ approaches zero (note the need for l'Hôpital's rule).
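Example 13 can be verified numerically: the MGF of the standard uniform is $(e^t - 1)/t$, and the mean $E[X] = 1/2$ emerges as the first derivative at $t = 0$, here approximated by a central difference:

```python
import math

def mgf_uniform(t):
    """Moment generating function of the standard uniform on (0, 1): (e^t - 1) / t."""
    if abs(t) < 1e-12:
        return 1.0  # the limit as t -> 0
    return (math.exp(t) - 1.0) / t

# The mean is the first derivative of the MGF at t = 0 (central difference).
h = 1e-5
mean_estimate = (mgf_uniform(h) - mgf_uniform(-h)) / (2 * h)
```

The central difference sidesteps the removable singularity at $t = 0$ that otherwise calls for l'Hôpital's rule.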
One form of expectation that mimics the moment generating function, yet provides a greater functional ability to analyze, and indeed construct, the PDF of a random variable, is known as the characteristic function, defined as follows:

Definition 10: Given a random variable $X$ with PDF $f_X(x)$, the characteristic function of $X$ is defined as

$\Phi_X(\omega) = E[e^{j\omega X}] = \int_{-\infty}^{\infty} e^{j\omega x} f_X(x)\, dx.$

$\Phi_X(\omega)$ is the conjugate of the Fourier transform of $f_X(x)$, which is a unique mapping of the function; thus a direct link is established between the PDF and the characteristic function, which is very useful in a number of applications.
Example 14: Evaluate the characteristic function for the standard uniform PDF defined over the interval $(0, 1)$:

$\Phi_X(\omega) = \int_0^1 e^{j\omega x}\, dx = \dfrac{e^{j\omega} - 1}{j\omega}.$
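The closed form in Example 14 can be checked against the defining integral; the midpoint rule below is just a numerical sketch:

```python
import cmath

def charfn_uniform(w):
    """Characteristic function of the standard uniform on (0, 1): E[exp(j*w*X)]."""
    if abs(w) < 1e-12:
        return 1.0 + 0.0j
    return (cmath.exp(1j * w) - 1.0) / (1j * w)

def charfn_numeric(w, n=2000):
    """Midpoint-rule approximation of the defining integral over (0, 1)."""
    h = 1.0 / n
    return sum(cmath.exp(1j * w * (k + 0.5) * h) * h for k in range(n))
```

As for any characteristic function, $\Phi_X(0) = 1$ and $|\Phi_X(\omega)| \le 1$.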
4.3.5 Functions of Random Variable
It is natural to ask how a transformation of a random variable (which is itself a function) should be considered, in the context of the statistical experiment in which the random variable was introduced, in order to characterize it in terms of the theory of calculus instead of enumeration and guessing. The key issue is the measurability of functions of random variables; i.e., whether the mapping is actually constrained by the original domain (and, of course, the range) of the original random variable. A random variable $X$ is a mapping from the sigma algebra into the real line such that $\{X \le x\}$ is an event, and we also insisted that the inverse mapping is an event as well; i.e., $X^{-1}(B) \in \mathcal{F}$ for every Borel set $B$. Now let us define a function $Y = g(X)$. It is evident that the transformation
will generate events on the real line that must be Borel sets. Likewise, the inverse mapping will generate events that must be in $\mathcal{F}$ and $\Omega$, by construction. Hence, the measurability of $g(X)$ is quite evident and inherent. Fig. 4.4 illustrates this concept, where the mappings of $X$ and $g$ onto the real line produce Borel sets, which may be remapped back to events on the sigma algebra.

Fig. 4.4: Illustration of the measurability of functions of random variables.

Theorem [Piper]: If $X$ is a random variable and $g$ is a Borel function, then $Y = g(X)$ is a random variable measurable by $X$; i.e., the $\sigma$-algebra generated by $Y$ is a subset of the $\sigma$-algebra generated by $X$.

As we have done before with random variables, we characterize $Y = g(X)$ by its cumulative probability distribution function or by its probability density function. Depending on the functional form of $g$, we may obtain the CDF $F_Y(y)$ in terms of $F_X(x)$. Table 2 lists a few examples of the CDFs and PDFs of functions of random variables [Papoulis].
Table 2: The CDF and PDF of $Y = g(X)$ for various functional forms of $g$.

1. $g$ monotone increasing: $F_Y(y) = F_X(g^{-1}(y))$; $f_Y(y) = \dfrac{f_X(g^{-1}(y))}{g'(g^{-1}(y))}$.
2. $g$ monotone decreasing: $F_Y(y) = 1 - F_X(g^{-1}(y))$; $f_Y(y) = \dfrac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}$.
3. $g$ continuous with a countable number of real roots $x_n$ of the equation $y = g(x)$: $f_Y(y) = \sum_n \dfrac{f_X(x_n)}{|g'(x_n)|}$; the CDF follows by integrating $f_Y$.
Derivations of the entries in Table 2 are straightforward and follow from the definition of the CDF, which should be obtained first; the derivative operator is then applied to obtain the PDF.
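The monotone-increasing rule in Table 2 can be checked on a concrete case (our choice, not the text's): with $X \sim U(0,1)$ and $Y = g(X) = -\ln(1 - X)$, the inverse is $g^{-1}(y) = 1 - e^{-y}$, so $F_Y(y) = F_X(g^{-1}(y)) = 1 - e^{-y}$ and $f_Y(y) = e^{-y}$, i.e., $Y$ is exponential with rate 1.

```python
import math

# Table 2, monotone-increasing entry, applied to X ~ U(0,1), Y = -ln(1 - X).
# F_X(u) = u on [0, 1], g^{-1}(y) = 1 - exp(-y).  Function names are ours.

def cdf_Y(y):
    """F_Y(y) = F_X(g^{-1}(y)) = 1 - exp(-y)."""
    return 1.0 - math.exp(-y)

def pdf_Y(y):
    """f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}/dy| = 1 * exp(-y)."""
    return math.exp(-y)

# The PDF must integrate to 1 (integrate numerically out to y = 40) ...
total = sum(pdf_Y(k * 1e-3) * 1e-3 for k in range(40000))
assert abs(total - 1.0) < 1e-2
# ... and must be the derivative of the CDF (central difference at y = 1).
h = 1e-6
assert abs((cdf_Y(1.0 + h) - cdf_Y(1.0 - h)) / (2 * h) - pdf_Y(1.0)) < 1e-6
print("Table 2 monotone rule verified")
```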
Functions of random variables are an important concept in mathematical statistics that enables statistical inference about one random variable (or, in general, a random process, as will be studied later) in terms of another. Indeed, simulations of random variables from different distributions may be obtained by simulating a random variable from a standard uniform distribution $U(0, 1)$; this form has been studied extensively, and various random-number generators exist to generate samples from it. We will study simulation of random variables after we briefly cover the case of random experiments that may be described by two random variables, a step toward generalization to sequences of unstructured random
variables, which will be followed by random variables defined on a graph or having a particular meaningful index. As our intention is to study random processes and random fields that have been useful for biomedical image analysis, we will not dwell on the details of random variables and statistical inference beyond the basic background needed for our subsequent coverage.
4.4 Two Random Variables
Events in random experiments may take various forms and may be better understood in terms of higher-dimensional mappings instead of one-dimensional ones. In this case the measurability of the events would be in terms of the mass of events (Borel sets) on multidimensional spaces. Consider two random variables $X$ and $Y$; the events $\{X \le x\}$ and $\{Y \le y\}$ may be paired with respect to the two parameters $x$ and $y$ such that a new event $\{X \le x, Y \le y\}$ may be defined on the space $\mathbb{R}^2$, as shown in Fig. 4.5. We may express the pair with respect to an outcome $\omega$ as $(X(\omega), Y(\omega))$, or as a vector $\mathbf{X} = [X \; Y]^T$. We will be interested mainly in random variables defined on the same sample space; hence, for simplicity, we will write $X$ and $Y$.
Fig. 4.5: Illustration of the concept of two-dimensional random vectors for two random variables on the same sample space. For measurability, the sets involved are Borel sets.
In direct analogy to the single-random-variable case, we may determine the probability mass distribution on the Borel sets defined on the plane $\mathbb{R}^2$ by the assignment
$P_{XY}(B) = P(\{\omega : (X(\omega), Y(\omega)) \in B\})$;
hence, when dealing with a pair of random variables $(X, Y)$, we have three induced probability measures: $P_{XY}$, $P_X$, and $P_Y$. The mass distribution on $\mathbb{R}^2$ represented by $P_{XY}$ consists of the unit mass distributed over the plane. Similarly, $P_X$ represents a unit mass distributed along the $x$-axis, and $P_Y$ represents a unit mass distributed along the $y$-axis. The marginal measures are related to $P_{XY}$ as follows:
$P_X(A) = P_{XY}(A \times \mathbb{R}), \qquad P_Y(B) = P_{XY}(\mathbb{R} \times B).$
Now if $A = \mathbb{R}$, i.e., the entire real line, then $P_X(\mathbb{R}) = P_{XY}(\mathbb{R}^2)$, so that $P_X(\mathbb{R}) = 1$. Similarly, if $B = \mathbb{R}$, then $P_Y(\mathbb{R}) = P_{XY}(\mathbb{R}^2)$, so that $P_Y(\mathbb{R}) = 1$. We formally define these quantities below.
Definition 11: The probability measure $P_{XY}$ defined on the Borel sets in the plane is called the joint probability measure induced by the joint mapping $(X, Y)$. The probability mass distribution is called the joint distribution. The probability measures $P_X$ and $P_Y$ are called the marginal probability measures induced by the random variables $X$ and $Y$, respectively. The corresponding probability mass distributions are called the marginal probability distributions.
Definition 12: The joint probability distribution function, or the joint cumulative distribution function, for random variables $X$ and $Y$, denoted by $F_{XY}(x, y)$, is defined by
$F_{XY}(x, y) = P(X \le x, Y \le y)$
for all values $(x, y) \in \mathbb{R}^2$. The special cases $F_X(x) = F_{XY}(x, \infty)$ and $F_Y(y) = F_{XY}(\infty, y)$ are called the marginal distribution functions for $X$ and $Y$, respectively. We follow the same development as for the single-random-variable case and define the probability density function for two random variables.
Fact (Feller, Vol. 1): If the joint probability measure $P_{XY}$ induced by $(X, Y)$ is absolutely continuous, a function $f_{XY}(x, y)$ exists such that
$F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\, dv\, du;$
$f_{XY}$ is called the joint probability density function of $(X, Y)$. Conversely, given $F_{XY}$ and $f_{XY}$ that satisfy the previous properties, there exist two random variables defined on some sample space with joint probability distribution and density functions $F_{XY}$ and $f_{XY}$, respectively. We may list the main properties of $f_{XY}$ in the same manner as we did with single random variables; specifically, we note that:
i. $f_{XY}(x, y) \ge 0$;
ii. $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx\, dy = 1$;
iii. $P(x_1 < X \le x_2,\; y_1 < Y \le y_2) = \int_{x_1}^{x_2} \int_{y_1}^{y_2} f_{XY}(x, y)\, dy\, dx$;
iv. $f_{XY}(x, y) = \partial^2 F_{XY}(x, y) / \partial x\, \partial y$.
Example 15: Consider two jointly continuous and positive random variables $X$ and $Y$ with joint density function $f_{XY}(x, y)$. Find the marginal densities and the joint CDF.
Solution: The required quantities may be obtained by integrating $f_{XY}$ over the specific region; see Fig. 4.6.
Fig. 4.6: Illustration of the region of integration for the marginal density $f_X(x)$.
The cumulative distribution function will be computed from the basic definition:
which, given $F_{XY}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{XY}(u, v)\, dv\, du$, leads to the required distribution.
We may address the issue of dependence of random variables here due to its importance. We maintain, from Definition 2 and Example 9, that events are independent if their joint probabilities factor; that is, two random variables $X$ and $Y$ are independent if
$F_{XY}(x, y) = F_X(x)\, F_Y(y)$, or equivalently $f_{XY}(x, y) = f_X(x)\, f_Y(y)$.
Note that in the above example, $f_{XY}$ has a separable functional form, but the two random variables are not independent due to the restrictions on the support. The independence concept will come in handy in various applications, including simulations of random variables.
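The point that a separable functional form does not by itself imply independence can be illustrated numerically. The density below, $f(x, y) = 8xy$ on $0 < x < y < 1$, is our hypothetical stand-in (not the density of Example 15): its formula factors as a function of $x$ times a function of $y$, yet the triangular support makes $X$ and $Y$ dependent.

```python
# Hypothetical density with separable form but triangular support:
# f(x, y) = 8xy on 0 < x < y < 1.  Exact marginals: f_X(x) = 4x(1 - x^2),
# f_Y(y) = 4y^3.  The joint does not equal the product of the marginals.

def f_joint(x, y):
    return 8.0 * x * y if 0.0 < x < y < 1.0 else 0.0

def f_X(x):
    """Integral of 8xy over y in (x, 1)."""
    return 4.0 * x * (1.0 - x * x)

def f_Y(y):
    """Integral of 8xy over x in (0, y)."""
    return 4.0 * y ** 3

# Total joint mass is 1 (midpoint rule on a 400 x 400 grid; small error
# comes from cells straddling the diagonal of the support) ...
n = 400
h = 1.0 / n
mass = sum(f_joint((i + 0.5) * h, (j + 0.5) * h)
           for i in range(n) for j in range(n)) * h * h
assert abs(mass - 1.0) < 0.01
# ... but the joint does NOT factor into the product of the marginals:
x0, y0 = 0.5, 0.8
assert abs(f_joint(x0, y0) - f_X(x0) * f_Y(y0)) > 0.1   # 3.2 vs ~3.072
print("separable form, dependent support: verified")
```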
4.4.1 Statistical Expectation in Two Dimensions
Expectations in two dimensions follow in the same manner as the one-dimensional case (see Definitions 5 and 6). In particular, the expected value of a function $g(X, Y)$ of the random variables is defined as follows.
Definition 13: Given a pair of random variables $(X, Y)$, suppose we define a mapping $g: \mathbb{R}^2 \to \mathbb{R}$; i.e., the domain and range of $g$ are as in the one-dimensional case. The expected value of $g(X, Y)$ is defined by
$E[g(X, Y)] = \begin{cases} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f_{XY}(x, y)\, dx\, dy, & \text{continuous case} \\ \sum_i \sum_j g(x_i, y_j)\, p_{XY}(x_i, y_j), & \text{discrete case} \end{cases}$
where $f_{XY}$ is the joint probability density function for continuous random variables and $p_{XY}$ is the joint probability mass function for discrete random variables.
Definition 14: Given a pair of random variables $(X, Y)$, the covariance function is defined as:
$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
where $\mu_X$ and $\mu_Y$ are the marginal means as defined in Definition 5. That is, for any random variable $X$, we have:
$\mu_X = E[X] = \begin{cases} \int_{-\infty}^{\infty} x\, f_X(x)\, dx, & \text{continuous case} \\ \sum_i x_i\, p_X(x_i), & \text{discrete case} \end{cases}$
where $f_X$ is the probability density function for a continuous random variable and $p_X$ is the probability mass function for a discrete random variable $X$. A special case of Definition 14 that does not include the marginal means is called the correlation; i.e., $R_{XY} = E[XY]$. We note that all the properties of the expectation operator in higher dimensions are similar to those in one dimension.
In two dimensions (multi-dimensions, in general), the random variables may possess various degrees of relatedness, which may span the entire scale from fully related to completely unrelated, or independent. We define the relatedness in terms of the correlation coefficient; i.e.,
$\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}, \qquad -1 \le \rho_{XY} \le 1,$
where $\sigma_X$ is the standard deviation of random variable $X$. If the covariance is zero, then the two random variables are said to be uncorrelated. If the two random variables are independent, then their covariance becomes zero, and therefore, by definition, the two random variables become uncorrelated. We note that the converse is not true, except in the case of Gaussian random variables, which will be considered later.
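The classic counterexample to the converse can be checked numerically (the example $X$ uniform on $(-1, 1)$, $Y = X^2$ is ours, not the text's): $\mathrm{Cov}(X, Y) = E[X^3] - E[X]E[X^2] = 0$ by symmetry, yet $Y$ is a deterministic function of $X$.

```python
# Uncorrelated does not imply independent: X ~ U(-1, 1), Y = X^2.
# All expectations are computed deterministically on a midpoint grid.

n = 200001
xs = [-1.0 + 2.0 * (k + 0.5) / n for k in range(n)]   # midpoint grid on (-1, 1)

EX  = sum(xs) / n                          # E[X]  = 0 by symmetry
EY  = sum(x * x for x in xs) / n           # E[X^2] = 1/3
EXY = sum(x ** 3 for x in xs) / n          # E[X^3] = 0 by symmetry
cov = EXY - EX * EY

assert abs(cov) < 1e-9                     # uncorrelated
assert abs(EY - 1.0 / 3.0) < 1e-6
# Dependence: the joint event {X <= -1/2, Y <= 1/4} has probability ~0,
# while the product of the marginal probabilities is 0.25 * 0.5 = 0.125.
p_joint = sum(1 for x in xs if x <= -0.5 and x * x <= 0.25) / n
p_x = sum(1 for x in xs if x <= -0.5) / n
p_y = sum(1 for x in xs if x * x <= 0.25) / n
assert abs(p_joint - p_x * p_y) > 0.1      # joint != product of marginals
print("uncorrelated but dependent: verified")
```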
4.4.2 Functions of Two Random Variables
It is natural to study mappings (transformations) of two random variables as we did with single random variables. Here we consider the transformation $Z = \mathbf{g}(X, Y)$, where $\mathbf{g}$ is a general function. We use the boldface letter $\mathbf{g}$ to represent the functional form, which may be a vector. A given realization (a certain value, or instant) of $Z$ will be represented by the lowercase letter $z$. The key issue is the measurability of $Z$, which may be understood in a similar fashion as in the one-dimensional case; see Fig. 4.4. We maintain that under normal conditions $Z$ is measurable and, therefore, we may define a complete functional description for it in terms of the probability distribution function $F_Z(z)$. We consider a few cases below.
Example 16: Consider two jointly continuous random variables $X$ and $Y$ with probability density function $f_{XY}(x, y)$. Define the summation function $Z = X + Y$. We maintain that $Z$ is a measurable random variable whose CDF may be calculated as follows:
$F_Z(z) = P(X + Y \le z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z - x} f_{XY}(x, y)\, dy\, dx$
and the density function would be evaluated as follows:
$f_Z(z) = \dfrac{dF_Z(z)}{dz} = \dfrac{d}{dz} \int_{-\infty}^{\infty} \int_{-\infty}^{z - x} f_{XY}(x, y)\, dy\, dx.$
The inner differentiation may be evaluated using the Leibnitz rule; hence,
$f_Z(z) = \int_{-\infty}^{\infty} f_{XY}(x, z - x)\, dx.$
As a special case, if $X$ and $Y$ are independent, then the joint density becomes separable, and $f_Z(z)$ will have the following form:
$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z - x)\, dx,$
which is the convolution integral. For this case, the CDF would be $F_Z(z) = \int_{-\infty}^{z} f_Z(t)\, dt$. For illustration, let us consider the simple case where both $X$ and $Y$ are independent and have the standard uniform distribution. Let $Z = X + Y$. It is straightforward to show that
$f_Z(z) = \begin{cases} z, & 0 \le z \le 1 \\ 2 - z, & 1 < z \le 2 \\ 0, & \text{otherwise.} \end{cases}$
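The triangular form of this density can be confirmed by evaluating the convolution integral numerically (a minimal sketch; helper names are ours):

```python
# Convolution check for Z = X + Y with X, Y independent U(0, 1):
# f_Z(z) = z on [0, 1], 2 - z on (1, 2], 0 elsewhere (triangular density).

def f_U(t):
    """Standard uniform density."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def f_Z_conv(z, n=20000):
    """Midpoint-rule evaluation of the convolution integral over x in [0, 1]."""
    h = 1.0 / n
    return sum(f_U((k + 0.5) * h) * f_U(z - (k + 0.5) * h) for k in range(n)) * h

def f_Z_exact(z):
    if 0.0 <= z <= 1.0:
        return z
    if 1.0 < z <= 2.0:
        return 2.0 - z
    return 0.0

for z in (0.25, 0.5, 1.0, 1.5, 1.75):
    assert abs(f_Z_conv(z) - f_Z_exact(z)) < 1e-3
print("triangular density verified")
```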
Table 3 lists a few examples of the distributions of functions of two random variables; all can be derived from the basic definitions. These functions map two random variables into a single random variable; i.e., there is a reduction in dimensionality involved in the transformation. We will consider the case where the functions of random variables are vectors next.
Table 3: Some one-dimensional functions of two random variables and the corresponding CDF and PDF.
4.4.3 Two Functions of Two Random Variables
As a step toward a generalized representation of functions of random variables, let us consider vector representations of the random variables $X$ and $Y$; i.e., $\mathbf{X} = [X \; Y]^T$, and a functional form $\mathbf{Z} = \mathbf{g}(\mathbf{X})$. The treatment for this vector representation will be quite straightforward in the case where $\mathbf{g}$ is a linear mapping of $\mathbf{X}$; i.e., $\mathbf{Z} = A\mathbf{X}$ for some constant matrix $A$. If the transformation is invertible, $A^{-1}$ exists; hence, $\mathbf{X} = A^{-1}\mathbf{Z}$. The
Jacobian of the transformation is defined as the matrix of partial derivatives
$J = \dfrac{\partial(z_1, z_2)}{\partial(x_1, x_2)} = \begin{bmatrix} \partial z_1/\partial x_1 & \partial z_1/\partial x_2 \\ \partial z_2/\partial x_1 & \partial z_2/\partial x_2 \end{bmatrix}.$
Then the joint probability density function of $\mathbf{Z}$ may be written in terms of the joint density of the random variables $(X, Y)$ as
$f_{Z_1 Z_2}(z_1, z_2) = \dfrac{f_{XY}(x_1, x_2)}{|J|},$
evaluated at the inverse mapping, where $|\cdot|$ stands for the determinant.
Example 17: Consider two jointly continuous random variables $X$ and $Y$ with probability density function $f_{XY}(x, y)$. Consider the polar coordinates $r = \sqrt{x^2 + y^2}$, $\theta = \tan^{-1}(y/x)$, so that $x = r\cos\theta$ and $y = r\sin\theta$. We may define the Jacobian of the polar transformation as follows:
$J = \dfrac{\partial(r, \theta)}{\partial(x, y)} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta/r & \cos\theta/r \end{bmatrix}, \qquad |J| = \dfrac{1}{r}.$
Hence,
$f_{R\Theta}(r, \theta) = \dfrac{f_{XY}(r\cos\theta, r\sin\theta)}{|J|} = r\, f_{XY}(r\cos\theta, r\sin\theta).$
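The polar-coordinate result can be sanity-checked on a concrete joint density. Assuming two independent standard Gaussians (our choice of $f_{XY}$, not the text's), marginalizing the angle out of $f_{R\Theta}(r, \theta) = r\, f_{XY}(r\cos\theta, r\sin\theta)$ must produce the Rayleigh density $f_R(r) = r\, e^{-r^2/2}$.

```python
import math

# Jacobian formula applied to two independent N(0, 1) variables:
# f_{R,Theta}(r, t) = r * f_{XY}(r cos t, r sin t); integrating over the
# angle gives the Rayleigh density f_R(r) = r * exp(-r^2 / 2).

def f_xy(x, y):
    """Joint density of two independent standard Gaussians."""
    return math.exp(-(x * x + y * y) / 2.0) / (2.0 * math.pi)

def f_r_theta(r, t):
    """Transformed density in polar coordinates; |J|^{-1} = r."""
    return r * f_xy(r * math.cos(t), r * math.sin(t))

def f_r_numeric(r, n=2000):
    """Marginalize the angle over [0, 2*pi) with a midpoint sum."""
    h = 2.0 * math.pi / n
    return sum(f_r_theta(r, (k + 0.5) * h) for k in range(n)) * h

for r in (0.5, 1.0, 2.0):
    rayleigh = r * math.exp(-r * r / 2.0)
    assert abs(f_r_numeric(r) - rayleigh) < 1e-9
print("polar transformation verified")
```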
4.5 Simulation of Random Variables
In this section we discuss how to simulate one- and two-dimensional random variables of general distributions, starting from the standard uniform distribution. This will bring about the linkage between a distribution or density function and deterministic functional forms, which represent instants, or realizations, of the random variables. Generating such instants may be used in a reverse fashion to estimate the density function using histograms. Simulation is an art and a well-studied subject in statistical design; stochastic systems are studied based on simulations. We only scratch the surface here to give some feeling for the outputs of random experiments in the form of realizations of the random variables that map the sample space into Borel sets on the real line. The fundamental step in simulation is the simple observation that, since the distribution function is monotone and non-decreasing, an inverse may be possible; hence
a mapping from the uniform distribution to any other form is possible. We illustrate this as follows. Let $X$ and $U$ be two random variables, not necessarily on the same sample space or having the same form of distribution. Suppose the two random variables are related to each other through the monotone transformation $X = g(U)$. The distribution function of $X$ with respect to the distribution of $U$ is as follows:
$F_X(x) = P(X \le x) = P(g(U) \le x) = P(U \le g^{-1}(x)) = g^{-1}(x),$
if $U$ is uniformly distributed on $[0, 1]$. Hence, we may take $g^{-1} = F_X$; i.e., $X = F_X^{-1}(U)$. Therefore, if we generate a realization $u$ from $U$, a realization of $X$ is obtained by the inverse mapping $x = F_X^{-1}(u)$.
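The inverse-mapping recipe can be sketched for a concrete target distribution (the exponential is our choice): with $F_X(x) = 1 - e^{-\lambda x}$, the inverse is $F_X^{-1}(u) = -\ln(1 - u)/\lambda$, so feeding uniform realizations through $F_X^{-1}$ yields exponential realizations.

```python
import math
import random

# Inverse-transform sampling for the exponential distribution:
# F(x) = 1 - exp(-lambda * x)  =>  F^{-1}(u) = -ln(1 - u) / lambda.

def sample_exponential(lam, n, rng):
    """Map n uniform realizations through the inverse CDF."""
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

rng = random.Random(0)                 # fixed seed for reproducibility
lam = 2.0
xs = sample_exponential(lam, 200000, rng)

mean = sum(xs) / len(xs)
assert abs(mean - 1.0 / lam) < 0.01    # E[X] = 1/lambda = 0.5
# Empirical CDF at x = 0.5 should be close to F(0.5) = 1 - e^{-1}.
emp = sum(1 for x in xs if x <= 0.5) / len(xs)
assert abs(emp - (1.0 - math.exp(-1.0))) < 0.01
print("inverse-transform sampling verified")
```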
4.6 Summary
This chapter introduced the basic concepts of statistical modeling, which serve as building blocks for the material in the book. We defined the concept of a random variable from an operational point of view and illustrated its significance in describing statistical experiments. Characteristics of a single random variable have been studied, including the probability distribution and density functions and the concept of statistical expectation. Extensions to functions of a single random variable and the joint distribution of two random variables were considered. The chapter concluded with the basics of simulation of one- and two-dimensional random variables.
4.7 Computer Laboratory
A random variable with a standard uniform distribution $U(0, 1)$ can be easily converted to any other distribution, such as the normal distribution. The methodology of generating random numbers has several forms; however, the one used in the great majority of cases is the linear congruential generator (LCG), whose recursive formula is given by (A4.1), where $x_n$ is a sequence of integer values ranging from 0 to $m - 1$, starting from a seed point $x_0$:
$x_{n+1} = (a x_n + c) \bmod m \qquad \text{(A4.1)}$
where $m$ is the modulus, $a$ is the multiplier, $c$ is the increment, and all are non-negative values. In order to obtain numbers in the interval $(0, 1)$, the generated integer sequence should be divided by $m$ to obtain $u_n = x_n / m$. There are many variations of LCGs; the form used in this project is the multiplicative LCG, where the addition of $c$ is not needed, so the recursive formula of the generator is given by
$x_{n+1} = (a x_n) \bmod m. \qquad \text{(A4.2)}$
We may convert the generated uniform streams to follow a Gaussian distribution of zero mean and unit variance using the Box-Muller approach, described as follows:
If $u_1$ and $u_2$ are independent random variates (realizations from a random variable) from $U(0, 1)$, then $Z_1$ and $Z_2$ are independent standard normal deviates (realizations from $N(0, 1)$) obtained as follows:
$Z_1 = \sqrt{-2 \ln u_1}\, \cos(2\pi u_2) \qquad \text{(A4.3a)}$
$Z_2 = \sqrt{-2 \ln u_1}\, \sin(2\pi u_2) \qquad \text{(A4.3b)}$
The approach is to run the random generator to get a very large number of samples of a uniform random variable, then extract the even-ordered terms to be $u_1$ and the odd-ordered terms to be $u_2$, in order to guarantee their independence.
Task 1:
(a) Generate three streams (1D, 2D and 3D) of random numbers with 100,000 samples; you can use the Matlab function for random number generation.
(b) Visualize the generated samples. Hint: you can use a scatter plot.
(c) Compute the histogram of the three streams, and then normalize them to become a probability density function (pdf).
(d) Visualize the pdfs of the three streams. Are the samples uniformly distributed? Do the pdfs represent a standard uniform distribution? Comment.
Task 2: Use the streams generated in Task 1 to do the following:
(a) Convert the generated streams to a standard normal distribution.
(b) Visualize the data samples. Are they normally distributed with mean 0 and unit variance?
(c) Compute the normalized histograms (pdfs) of the data samples from (a) and visualize the distribution; comment.
Task 3: Use the streams generated in Task 2 to do the following:
(a) Convert the 1D normal variate to a normal distribution with mean -2 and standard deviation 3.
(b) Convert the 2D normal variate to a normal distribution with mean vector $M = [2 \;\; 1]^T$ and covariance matrix
[9 4
 4 4].
(c) Convert the 3D normal variate to a normal distribution with mean vector $M = [5 \;\; 5 \;\; 6]^T$ and covariance matrix
[5 2 1
 2 5 0
 1 0 4].
(d) For (a), (b) and (c), estimate the mean and covariance matrix and compare the estimated values with the ground-truth ones; comment.
4.8 Exercises
1. Given a probability space $(\Omega, \mathcal{F}, P)$, let $A$ and $B$ be events. Prove/disprove the following: (a) (b) (c)
2. Given a sample space $\Omega$:
(a) Construct the smallest $\sigma$-algebra that contains two given sets.
(b) Construct the smallest $\sigma$-algebra that contains two other given sets.
(c) What members of these $\sigma$-algebras make the given sets measurable?
3. Let $\mu$ be a nonnegative, finitely additive set function on the field $\mathcal{F}$. Show that
4. Let $A_1, \ldots, A_n$ be arbitrary subsets of a set $\Omega$. Describe explicitly the smallest $\sigma$-field containing $A_1, \ldots, A_n$. How many sets are there in it? (I.e., give an upper bound that is attainable.)
5. Derive the entries in Table 2.
6. Derive the entries in Table 3.
4.9 References
1. Robert Ash, Real Analysis and Probability, Academic Press, New York, 1972.
2. Athanasios Papoulis, Probability, Random Variables and Stochastic Processes, 3rd Edition, McGraw-Hill, New York, 1991.
3. Paul E. Pfeiffer, Concepts of Probability Theory, 2nd Edition, Dover, New York, 1978.
4. Wilbur Davenport, Random Processes, McGraw-Hill, New York, 1970.
5. William Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, Wiley, New York, 1968.
6. Reuven Rubinstein, Simulation and the Monte Carlo Method, Wiley, New York, 1981.