
Page 1: Topic covered

Probability: Fundamentals (discrete and continuous probability models)

Neha Barve
Lecturer, Bioinformatics
DAVV

Page 2: Topic covered

Topic covered

• Probability fundamentals
• Definitions
• Events
• Probability models (discrete and continuous)
• Expectation and variance
• Examples

Page 3: Topic covered

Probability
• Probability is the study of random experiments.
• It is a measure of how likely a particular event is to occur.
• The probability of an event is a measure of chance: a number between 0 and 1.
• If an event is certain to occur, its probability is 1 (100%).
• If an event cannot occur, its probability is 0 (0%).
• For an uncertain event, the probability lies between 0 and 1.
• The uses of probability
  – Began with gambling.
  – Now applied to analyze data in astronomy, mortality data, traffic flow, telephone exchanges, genetics, epidemics, investment...

Page 4: Topic covered

Probability Terms
• Random Experiment: A process leading to at least 2 possible outcomes, with uncertainty as to which will occur.
• Event: An event is a subset of all possible outcomes of an experiment.
  – Intersection of Events: Let A and B be two events. Then the intersection of the two events, denoted A ∩ B, is the event that both A and B occur.
  – Union of Events: The union of the two events, denoted A ∪ B, is the event that A or B (or both) occurs.
  – Complement: Let A be an event. The complement of A (denoted Aᶜ) is the event that A does not occur.
  – Mutually Exclusive Events: A and B are said to be mutually exclusive if at most one of the events A and B can occur (two events are mutually exclusive if they cannot occur at the same time; an example is tossing a coin once, which can result in either heads or tails, but not both).
• Basic Outcomes: The simple possible results of an experiment. One and exactly one of these outcomes must occur. The set of basic outcomes is mutually exclusive and collectively exhaustive.
• Sample Space: The totality of basic outcomes of an experiment.
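The event algebra above can be illustrated with a short Python sketch (an addition, not part of the original slides); the events "even" and "low" for a single die roll are illustrative names chosen here.

# Minimal sketch: event algebra for a single die roll using Python sets.
sample_space = {1, 2, 3, 4, 5, 6}   # basic outcomes of one roll

even = {2, 4, 6}                    # event A: roll is even
low  = {1, 2, 3}                    # event B: roll is 3 or less

intersection = even & low           # A ∩ B = {2}
union        = even | low           # A ∪ B = {1, 2, 3, 4, 6}
complement   = sample_space - even  # Aᶜ = {1, 3, 5}

print(intersection, union, complement)

# Mutually exclusive events have an empty intersection:
print(({1} & {4}) == set())         # True: outcomes 1 and 4 cannot both occur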

Page 5: Topic covered

Collectively exhaustive
• Means that at least one of the outcomes must happen, so these possibilities together exhaust all the possibilities. However, not all mutually exclusive events are collectively exhaustive. For example, the outcomes 1 and 4 of a single roll of a six-sided die are mutually exclusive (they cannot both happen) but not collectively exhaustive (there are other possible outcomes).

Page 6: Topic covered

Basic Probability Rules
1. For any event A, 0 ≤ P(A) ≤ 1.
2. If A and B can never both occur (they are mutually exclusive), then P(A and B) = P(A ∩ B) = 0.
3. P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
4. If A and B are mutually exclusive events, then P(A or B) = P(A ∪ B) = P(A) + P(B).
5. P(Aᶜ) = 1 − P(A).

Independent Events
• Two events A and B are said to be independent if the fact that A has occurred or not does not affect your assessment of the probability of B occurring. Conversely, the fact that B has occurred or not does not affect your assessment of the probability of A occurring.
6. If A and B are independent events, then P(A and B) = P(A ∩ B) = P(A) P(B).
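A small Python sketch (added here, not from the slides) checks rules 3 and 6 by enumerating the 36 equally likely outcomes of two fair dice; the events A, B, and C are illustrative.

# Minimal sketch: verifying rules 3 and 6 by enumeration of two fair dice.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))          # sample space, 36 outcomes

def prob(event):
    """P(event) = (favourable outcomes) / (total outcomes)."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 6          # first die shows 6
B = lambda o: o[0] + o[1] >= 10  # total is at least 10
C = lambda o: o[1] % 2 == 0      # second die is even

# Rule 3: P(A or B) = P(A) + P(B) - P(A and B)
lhs = prob(lambda o: A(o) or B(o))
rhs = prob(A) + prob(B) - prob(lambda o: A(o) and B(o))
print(lhs == rhs)                # True

# Rule 6: for independent events, P(A and C) = P(A) * P(C)
print(prob(lambda o: A(o) and C(o)) == prob(A) * prob(C))   # True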

Page 7: Topic covered

Probability models

• A probability model for coin tossing has two parts:
  – A list of possible outcomes.
  – A probability for each outcome.
• The sample space S of a random phenomenon is the set of all possible outcomes.
  – Example: S = {heads, tails} = {H, T}.
  – General analysis is possible.

Page 8: Topic covered

• What is the probability of "exactly 2 heads in four tosses of a coin"?

• What rules must any assignment of probabilities satisfy?

• An event is an outcome or a set of outcomes (i.e., it is a subset of the sample space).

• A = {HHTT, HTHT, HTTH, THHT, THTH, TTHH}
• In a probability model, events have probabilities that satisfy ...
• Two events A and B are independent if knowing that one occurs does not change the probability that the other occurs.
• If A and B are independent,
  P(A and B) = P(A)P(B)
  — the multiplication rule for independent events.
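As a quick illustration (an addition, not from the slides), the probability of exactly 2 heads in four tosses can be found by enumerating the 16 equally likely outcomes; the multiplication rule is checked at the end.

# Minimal sketch: P("exactly 2 heads in four tosses") by enumeration.
from fractions import Fraction
from itertools import product

tosses = ["".join(o) for o in product("HT", repeat=4)]      # 16 outcomes
event_A = [o for o in tosses if o.count("H") == 2]          # exactly 2 heads

print(event_A)                                              # the 6 outcomes listed above
print(Fraction(len(event_A), len(tosses)))                  # 6/16 = 3/8

# Multiplication rule check: P(first toss H and second toss H) = 1/2 * 1/2
both_heads = [o for o in tosses if o[0] == "H" and o[1] == "H"]
print(Fraction(len(both_heads), len(tosses)) == Fraction(1, 2) * Fraction(1, 2))  # True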

Page 9: Topic covered

Conditional Probability

• A conditional probability is the probability that event A occurs when the sample space is limited to event B.

• This is read "the probability of A, given B". It is commonly notated P(A | B); the two events are separated by a vertical line.

Page 10: Topic covered

• Example: One of the businesses that have grown out of the public's increased use of the internet has been providing internet service to individual customers; those who provide this service are called Internet Service Providers (ISPs).

– More recently, a number of ISPs have developed business models whereby they do not need to charge customers for internet service at all, by collecting fees from advertisers, and forcing the non-paying customers to view these advertisements.

– Jupiter Communications estimates that by the end of 2003, 20% of web users will have a free ISP. It is estimated that 6% of all web users will have both a free ISP and a paid ISP account.

• In 2003, what proportion of internet users is expected to:

a) subscribe to both a free ISP and a paid ISP?
b) subscribe only to a paid ISP?
c) subscribe only to a free ISP?

Page 11: Topic covered

P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)

• In these simple calculations, we are making use of the conditional probability formula:
  P(A|B) = P(A holds given that B holds) = P(A ∩ B) / P(B)
• This relationship is known as Bayes' Law, after the English clergyman Thomas Bayes (1702-1761), who first derived it. Bayes' Law was later generalized by the French mathematician Pierre-Simon Laplace (1749-1827).

(Images: Bayes and Laplace)
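A short sketch (added, not from the slides) applies the conditional probability formula to the ISP figures on the previous slide. Note that part (b) of that example would additionally need P(paid), which the transcript does not give.

# Minimal sketch: conditional probability with the ISP numbers (20% free, 6% both).
from fractions import Fraction

p_free = Fraction(20, 100)            # P(B): user has a free ISP
p_free_and_paid = Fraction(6, 100)    # P(A ∩ B): user has both a free and a paid ISP

# Conditional probability formula: P(A | B) = P(A ∩ B) / P(B)
p_paid_given_free = p_free_and_paid / p_free
print(p_paid_given_free)              # 3/10: 30% of free-ISP users also have a paid ISP

# Part (c) of the example: "only a free ISP" = P(free) - P(free and paid)
print(p_free - p_free_and_paid)       # 7/50 = 14%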

Page 12: Topic covered

Random Variables

• A random variable is a variable whose value is a numerical outcome of a random phenomenon.

• A random variable is a numerical description of the outcome of an experiment (e.g., the possible results of rolling two dice: (1, 1), (1, 2), etc.).

• Random variables can be classified as either discrete (a random variable that may assume either a finite number of values or an infinite sequence of values) or as continuous (a variable that may assume any numerical value in an interval or collection of intervals).

Page 13: Topic covered

Random Variable
• A random variable is called discrete if it has countably many possible values; otherwise, it is called continuous.
• The following quantities would typically be modeled as discrete random variables:
  – The number of defects in a batch of 20 items.
  – The number of people preferring one brand over another in a market research study.
  – The credit rating of a debt issue at some date in the future.
• The following would typically be modeled as continuous random variables:
  – The proportion of defects in a batch of 10,000 items.
  – The time between breakdowns of a machine.
• Sometimes, we approximate a discrete random variable with a continuous one if the possible values are very close together; e.g., stock prices are often treated as continuous random variables.

Page 14: Topic covered

Difference
• A continuous variable is one that can take any real numerical value. For example:
  – The length of a strip can be anything.
  – A person's height and age can take any real values, within reasonable limits.
• Discrete variables, in contrast, take only whole-number values. For example:
  – The number of people on a football team.
  – The number of major planets in the solar system.
  – No star could ever have 5.62 major planets.

Page 15: Topic covered

Distribution: discrete
• If X is a discrete random variable, then we denote its pmf by PX.
  – The rule that assigns specific probabilities to specific values of a discrete random variable is called its probability mass function, or pmf.
  – For any value x, PX(x) is the probability of the event that X = x; i.e., PX(x) = P(X = x) = probability that the value of X is x.
  – We always use capital letters for random variables. Lower-case letters like x and y stand for possible values (i.e., numbers) and are not random.
  – A pmf is graphed by drawing a vertical line of height PX(x) at each possible value x.
• It is similar to a histogram, except that the height of the line (or bar) gives the theoretical probability rather than the observed frequency.
• The pmf gives us one way to describe the distribution of a random variable. Another way is provided by the cumulative probability function, denoted by FX and defined by FX(x) = P(X ≤ x).
  – It is the probability that X is less than or equal to x.
  – Where the pmf gives the probability that the random variable lands on a particular value, the cpf gives the probability that it lands on or below a particular value. In particular, FX is always a non-decreasing function.

Page 16: Topic covered

Examples

• Three tosses of a fair coin:
• There are eight possible outcomes.
• These constitute the sample space S.
• Let the number of heads be the random variable X.

S = [ HHH HHT HTH THH HTT THT TTH TTT ]
X = [  x1  x2  x3  x4  x5  x6  x7  x8 ]
  = [   3   2   2   2   1   1   1   0 ]

Page 17: Topic covered

• Let X be a discrete random variable, and let x1, x2, x3, ... be the values that X can assume, in increasing order of magnitude.
• Let P(X = xi) = f(xi), i = 1, 2, 3, ... be the probability of xi, with
  Σ f(xi) = 1.
• f(x) is known as the probability function (pmf).
• Probability function for the three coin tosses:
• The probability of each of the 8 outcomes is 1/8.
  P(X = 0) = P(x8) = 1/8
  P(X = 1) = P(x5) + P(x6) + P(x7) = 1/8 + 1/8 + 1/8 = 3/8
  P(X = 2) = P(x2) + P(x3) + P(x4) = 1/8 + 1/8 + 1/8 = 3/8
  P(X = 3) = P(x1) = 1/8
• The probability function is:

  X     0     1     2     3
  f(x)  1/8   3/8   3/8   1/8
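The same pmf, together with the cumulative probability function, can be reproduced by enumeration; this Python sketch is an addition, not part of the slides.

# Minimal sketch: pmf and cpf for X = number of heads in three tosses.
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=3))            # the 8 equally likely outcomes

# pmf: P(X = k) for k = 0, 1, 2, 3 heads
pmf = {k: Fraction(sum(1 for o in outcomes if o.count("H") == k), len(outcomes))
       for k in range(4)}

# cpf: F(k) = P(X <= k), the running sum of the pmf
cpf = {k: sum(pmf[j] for j in range(k + 1)) for k in range(4)}

for k in range(4):
    print(k, pmf[k], cpf[k])
# 0 1/8 1/8
# 1 3/8 1/2
# 2 3/8 7/8
# 3 1/8 1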

Page 18: Topic covered

Histogram

[Bar chart of the pmf: height 1/8 at x = 0, 3/8 at x = 1, 3/8 at x = 2, and 1/8 at x = 3.]

Page 19: Topic covered

Distribution: continuous
• The distribution of a continuous random variable cannot be specified through a probability mass function, because if X is continuous, then P(X = x) = 0 for all x; i.e., the probability of any particular value is zero. Instead, we must look at probabilities of ranges of values.
  – The probabilities of ranges of values of a continuous random variable are determined by a density function, denoted by fX. The area under a density is always 1.
  – The probability that X falls between two points a and b is the area under fX between the points a and b. The familiar bell-shaped normal curve is an example of a density.
• The cumulative distribution function, or cdf, of a continuous random variable is obtained from the density in much the same way a cpf is obtained from the pmf of a discrete distribution.
  – The cdf of X, denoted by FX, is given by FX(x) = P(X ≤ x).
  – FX(x) is the area under the density fX to the left of x.

Page 20: Topic covered

• Let there be a function f(x) such that
  – f(x) ≥ 0, and
  – ∫_−∞^∞ f(x) dx = 1.
• Then f(x) is a pdf.
• The probability of X lying between a and b is defined by
  P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
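A numerical sketch of these conditions (added, not from the slides), using the illustrative density f(x) = 2x on [0, 1] and a simple midpoint-rule integral.

# Minimal sketch: checking the pdf conditions numerically for f(x) = 2x on [0, 1].
def f(x):
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def integrate(func, a, b, n=100000):
    """Simple midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

# Total area under the density should be 1.
print(round(integrate(f, -1.0, 2.0), 4))      # 1.0

# P(0.25 <= X <= 0.75) is the area under f between 0.25 and 0.75.
print(round(integrate(f, 0.25, 0.75), 4))     # 0.5 (exact value: 0.75^2 - 0.25^2)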

Page 21: Topic covered

Expectation and Variance

• In probability theory, the expected value (or expectation, or mathematical expectation, or mean, or the first moment) of a random variable is the weighted average of all possible values that this random variable can take on.

• The weights used in computing this average correspond to the probabilities in case of a discrete random variable, or densities in case of a continuous random variable.

Page 22: Topic covered

Discrete random variable, finite case

• Suppose random variable X can take value x1 with probability p1, value x2 with probability p2, and so on, up to value xk with probability pk. Then the expectation of this random variable X is defined as
  E(X) = x1 p1 + x2 p2 + ... + xk pk.
• Since all the probabilities pi add up to one (p1 + p2 + ... + pk = 1), the expected value can be viewed as the weighted average, with the pi's being the weights:
  E(X) = (x1 p1 + x2 p2 + ... + xk pk) / (p1 + p2 + ... + pk).

Page 23: Topic covered

Example

• Let X represent the outcome of a roll of a six-sided die.
• The possible values for X are 1, 2, 3, 4, 5, 6, all equally likely (each having probability 1/6). The expectation of X is
  E(X) = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 3.5.
• Hence the general formula for expectation is
  E(X) = Σ xi pi.

Page 24: Topic covered

Variance

• In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). In particular, the variance is one of the moments of a distribution: Var(X) = E[(X − E(X))²].

Page 25: Topic covered

Example

• If a coin is tossed twice, the number of heads is: 0 with probability 0.25, 1 with probability 0.5, and 2 with probability 0.25.
• Thus the mean of the number of heads is 0.25 × 0 + 0.5 × 1 + 0.25 × 2 = 1,
• and the variance is 0.25 × (0 − 1)² + 0.5 × (1 − 1)² + 0.25 × (2 − 1)² = 0.5.
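The two examples above can be checked with a short sketch (added, not from the slides) that treats expectation and variance as weighted averages.

# Minimal sketch: expectation and variance of a discrete random variable.
from fractions import Fraction

def expectation(values, probs):
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    mu = expectation(values, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

# Roll of a fair six-sided die
die_values = list(range(1, 7))
die_probs  = [Fraction(1, 6)] * 6
print(expectation(die_values, die_probs))      # 7/2 = 3.5

# Number of heads in two coin tosses
heads_values = [0, 1, 2]
heads_probs  = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(expectation(heads_values, heads_probs))  # 1
print(variance(heads_values, heads_probs))     # 1/2 = 0.5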

Page 26: Topic covered

Cumulative distribution function
• If a random variable can take values x1, x2, x3, ..., then the distribution function is given by
  F(x) = P(X ≤ x) = Σ f(xi), summed over all xi ≤ x.

Page 27: Topic covered

Distribution function of a random variable

F(x) remains the same or increases as x increases. Hence F(x) is said to be a monotonically increasing (non-decreasing) function.

Page 28: Topic covered

Continuous random variable
• A random variable that can take on an uncountably infinite number of values (any value in an interval) is known as a continuous random variable.
• Since there are infinitely many possible values of X, the probability that it takes on any particular value is 1/∞, i.e., 0.
• Hence a probability function cannot be defined in this case as it is in the discrete case.
• In the continuous case, the probability that X lies between two different values is non-zero.
• Examples:
• 1) If X represents the height of a person, then the probability that it is exactly 160 cm is zero, but the probability that it lies between 155 cm and 165 cm is non-zero.

Page 29: Topic covered

• 2) if one measures the width of an oak leaf, the result of 3½ cm is possible, however it has probability zero because there are uncountably many other potential values even between 3 cm and 4 cm. Each of these individual outcomes has probability zero, yet the probability that the outcome will fall into the interval (3 cm, 4 cm) is nonzero. (Formally, each value has an infinitesimally small probability, which statistically is equivalent to zero.)

Page 30: Topic covered

• Let there be a function f(x) such that
  – f(x) ≥ 0, and
  – ∫_−∞^∞ f(x) dx = 1.
• Then f(x) is a pdf.
• The probability of X lying between a and b is defined by
  P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
• For a continuous random variable, the probability of X being equal to any particular value is zero. Hence the < sign can be replaced by the ≤ sign, and thus
• P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b).

Page 31: Topic covered

Probability density function of continuous random variable

Page 32: Topic covered

Normal distribution
• Data can be "distributed" (spread out) in different ways: skewed left, skewed right, or jumbled.
• But there are many cases where the data tend to cluster around a central value with no bias left or right, getting close to a "Normal Distribution".

Page 33: Topic covered

• We say the data are "normally distributed" if they follow the Normal Distribution, which has:
  – mean = median = mode
  – symmetry about the center
  – 50% of values less than the mean and 50% greater than the mean

Page 34: Topic covered

Mathematical term
• The normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally the bell curve, given by

  f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))

where the parameter μ is the mean or expectation (the location of the peak) and σ² is the variance, the mean of the squared deviation (a "measure" of the width of the distribution). σ is the standard deviation. The distribution with μ = 0 and σ² = 1 is called the standard normal.
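A small sketch (added, not from the slides) that evaluates this density; the call using the travel-time mean 38.8 and standard deviation 11.4 from a later slide is only illustrative.

# Minimal sketch: evaluating the Gaussian density given above.
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = (1 / (sigma*sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

print(normal_pdf(0.0))                          # ~0.3989 = 1/sqrt(2*pi), standard normal peak
print(normal_pdf(38.8, mu=38.8, sigma=11.4))    # ~0.035, peak of the travel-time example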

Page 35: Topic covered

Standard Deviations
• When you calculate the standard deviation of your (normally distributed) data, you will find that:
  – 68% of values are within 1 standard deviation of the mean
  – 95% are within 2 standard deviations
  – 99.7% are within 3 standard deviations
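These percentages can be checked against the standard normal cdf, written with the error function from Python's math module (a sketch added here, not from the slides).

# Minimal sketch: the 68-95-99.7 rule from the standard normal cdf.
import math

def norm_cdf(z):
    """P(Z <= z) for a standard normal random variable Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for k in (1, 2, 3):
    within = norm_cdf(k) - norm_cdf(-k)        # P(-k <= Z <= k)
    print(f"within {k} standard deviation(s): {within:.4f}")
# within 1: 0.6827, within 2: 0.9545, within 3: 0.9973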

Page 36: Topic covered

• The number of standard deviations from the mean is also called the "Standard Score", "sigma", or "z-score":

  z = (x − μ) / σ

• z is the "z-score" (Standard Score)
• x is the value to be standardized
• μ is the mean
• σ is the standard deviation

Page 37: Topic covered

Example
• A survey of daily travel time had these results (in minutes):
  26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34
• The Mean is 38.8 minutes, and the Standard Deviation is 11.4 minutes.
• Convert the values to z-scores ("standard scores").
• To convert 26:
  – first subtract the mean: 26 − 38.8 = −12.8,
  – then divide by the Standard Deviation: −12.8 / 11.4 = −1.12
• So 26 is −1.12 Standard Deviations from the Mean.

Page 38: Topic covered

• Here are the first three conversions:

  Original Value   Calculation           Standard Score (z-score)
  26               (26 − 38.8) / 11.4    −1.12
  33               (33 − 38.8) / 11.4    −0.51
  65               (65 − 38.8) / 11.4    +2.30
  ...              ...                   ...
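The full set of z-scores can be computed directly (a sketch added here, not from the slides); the population standard deviation pstdev matches the slide's 11.4.

# Minimal sketch: z-scores for the travel-time survey.
from statistics import mean, pstdev

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mu = mean(times)          # 38.8
sigma = pstdev(times)     # ~11.4 (population standard deviation)

z_scores = [(x - mu) / sigma for x in times]
for x, z in list(zip(times, z_scores))[:3]:
    print(x, round(z, 2))
# 26 -1.12, 33 -0.51, 65 2.3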

 

Page 39: Topic covered

Central limit theorem
• The CLT indicates that the probability density of the sum of N independent random variables approaches a normal density as N increases.
• The mean and variance of this normal density are the sums of the means and variances of the N independent random variables.
• When you throw a die ten times, you rarely get only ones. The usual result is approximately the same amount of all numbers between one and six. Of course, sometimes you may get five sixes, for example, but certainly not often.
• The reason for this is that you can get the middle values in many more different ways than the extremes. Example: when throwing two dice, 1+6 = 2+5 = 3+4 = 7, but only 1+1 = 2 and only 6+6 = 12.

Page 40: Topic covered

• The formal representation of the central limit theorem looks like this:
• When X1, X2, ... are independent observations of a random variable X with mean μ and variance σ², then as n → ∞,

  (X1 + X2 + ... + Xn − nμ) / (σ√n) → N(0, 1)

  i.e., the standardized sum tends to the standard normal distribution.
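A simulation sketch (added, not from the slides) of the CLT with dice: as the number of dice per sum grows, the sample mean and variance of the sums track 3.5·N and (35/12)·N, and the distribution of sums becomes bell-shaped.

# Minimal sketch: simulating sums of die rolls to illustrate the CLT.
import random
from statistics import mean, pvariance

random.seed(0)

def sum_of_rolls(n_dice):
    return sum(random.randint(1, 6) for _ in range(n_dice))

for n_dice in (1, 2, 10):
    sums = [sum_of_rolls(n_dice) for _ in range(20000)]
    print(n_dice, round(mean(sums), 2), round(pvariance(sums), 2))
# Expected: mean ~3.5 * n_dice, variance ~2.92 * n_dice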

Page 41: Topic covered

Hypothesis testing
• Hypothesis testing is a way of systematically quantifying how certain you are of the result of a statistical experiment.
• Example: toss a coin 100 times and make a judgment about whether the coin is fair or not.
• Null Hypothesis: A hypothesis which states that there is no difference between the procedures; it is denoted by H0.
• Alternative Hypothesis: A hypothesis which states that there is a difference between the procedures; it is denoted by HA.
• Test Statistic: The random variable X whose value is tested to arrive at a decision.
• Conclusion: If the test statistic falls in the rejection/critical region, H0 is rejected; otherwise H0 is not rejected.

Page 42: Topic covered

• Table 1. Various types of H0 and HA

• Rejection Region: The part of the sample space (critical region) where the null hypothesis H0 is rejected. The size of this region is determined by the probability (α) of the sample point falling in the critical region when H0 is true. α is also known as the level of significance: the probability of the value of the random variable falling in the critical region. It should also be noted that the term "statistical significance" refers only to the rejection of a null hypothesis at some level α. It implies only that the observed difference between the sample statistic and the mean of the sampling distribution did not occur by chance alone.

Case   Null Hypothesis H0   Alternate Hypothesis HA
1      μ1 = μ2              μ1 ≠ μ2
2      μ1 < μ2              μ1 > μ2
3      μ1 > μ2              μ1 < μ2

Page 43: Topic covered

Example - Efficacy Test for New Drug
• A drug company has a new drug and wishes to compare it with the current standard treatment.
• Federal regulators tell the company that it must demonstrate that the new drug is better than the current treatment to receive approval.
• The firm runs a clinical trial where some patients receive the new drug and others receive the standard treatment.
• A numeric response of therapeutic effect is obtained (higher scores are better).
• Parameter of interest: μNew − μStd

Page 44: Topic covered

Example - Efficacy Test for New drug

• Null hypothesis - the new drug is no better than the standard treatment:
  H0: μNew − μStd ≤ 0

• Alternative hypothesis - the new drug is better than the standard treatment:
  HA: μNew − μStd > 0

• Experimental (sample) data:
  sample means ȳNew and ȳStd, sample standard deviations sNew and sStd, and sample sizes nNew and nStd.

Page 45: Topic covered

Sampling Distribution of Difference in Means

• In large samples, the difference in two sample means is approximately normally distributed:

  Ȳ1 − Ȳ2 ~ N( μ1 − μ2 , σ1²/n1 + σ2²/n2 )

• Under the null hypothesis, μ1 − μ2 = 0 and:

  Z = (Ȳ1 − Ȳ2) / √(σ1²/n1 + σ2²/n2) ~ N(0, 1)

• σ1² and σ2² are unknown and estimated by s1² and s2².
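A sketch (added, not from the slides) of this two-sample z statistic; the summary numbers for the two treatment groups are made up purely for illustration.

# Minimal sketch: two-sample z statistic with hypothetical trial summaries.
import math

def two_sample_z(y1, s1, n1, y2, s2, n2):
    """Z = (ybar1 - ybar2) / sqrt(s1^2/n1 + s2^2/n2)."""
    return (y1 - y2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical summaries: new drug vs standard treatment
z = two_sample_z(y1=23.4, s1=5.1, n1=100,    # new-drug group
                 y2=21.7, s2=5.3, n2=100)    # standard-treatment group
p_value = 1.0 - norm_cdf(z)                  # one-sided, since HA says mu_New - mu_Std > 0

print(round(z, 2), round(p_value, 4))        # e.g. z ~ 2.31, p ~ 0.010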

Page 46: Topic covered

Example - Efficacy Test for New drug

• Type I error - concluding that the new drug is better than the standard (HA) when in fact it is no better (H0). An ineffective drug is deemed better.
  – Traditionally α = P(Type I error) = 0.05
• Type II error - failing to conclude that the new drug is better (HA) when in fact it is. An effective drug is deemed to be no better.
  – Traditionally a clinically important difference (Δ) is assigned, and sample sizes are chosen so that:
    β = P(Type II error | μ1 − μ2 = Δ) ≤ 0.20

Page 47: Topic covered

Error

• When using probability to decide whether a statistical test provides evidence for or against our predictions, there is always a chance of drawing the wrong conclusions. Even when choosing a probability level of 95%, there is always a 5% chance that one rejects the null hypothesis when it was actually correct. This is called Type I error, represented by the Greek letter α.

• It is possible to err in the opposite way if one fails to reject the null hypothesis when it is, in fact, incorrect. This is called Type II error, represented by the Greek letter β.

Page 48: Topic covered

These two errors are represented in the following chart

Type of decision   H0 true                    H0 false
Reject H0          Type I error (α)           Correct decision (1 − β)
Accept H0          Correct decision (1 − α)   Type II error (β)

Page 49: Topic covered

Steps in Hypothesis Testing

1. Identify the null hypothesis H0 and the alternate hypothesis HA.

2. Choose α. The value should be small, usually less than 10%. It is important to consider the consequences of both types of errors.

3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic. Remember that a t statistic is usually appropriate for a small number of samples; for a larger number of samples, a z statistic can work well if the data are normally distributed.

Page 50: Topic covered

4. Compare the observed value of the statistic to the critical value obtained for the chosen α.

5. Make a decision.
   – If the test statistic falls in the critical region: reject H0 in favour of HA.
   – If the test statistic does not fall in the critical region: conclude that there is not enough evidence to reject H0.
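The five steps can be walked through for the coin example mentioned earlier (toss a coin 100 times and judge whether it is fair). This sketch is an addition to the slides: the observed count of heads is hypothetical, and a normal approximation to the binomial is used.

# Minimal sketch: z-test for a fair coin based on 100 tosses.
import math

n, observed_heads = 100, 61       # hypothetical data
alpha = 0.05

# Step 1: H0: p = 0.5 (fair coin), HA: p != 0.5 (two-sided)
p0 = 0.5

# Step 3: z statistic for a proportion under H0
p_hat = observed_heads / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Steps 4-5: compare with the critical value for the chosen alpha
z_critical = 1.96                 # two-sided critical value for alpha = 0.05
if abs(z) > z_critical:
    print(f"z = {z:.2f}: reject H0 at alpha = {alpha} (the coin looks unfair)")
else:
    print(f"z = {z:.2f}: not enough evidence to reject H0")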

Page 51: Topic covered

Numerics

Page 52: Topic covered

Thank you