The two-way frequency table. The χ² statistic. Techniques for examining dependence amongst two categorical variables

Upload: julian-walters

Post on 05-Jan-2016


TRANSCRIPT

The two-way frequency table

The χ² statistic

Techniques for examining dependence amongst two categorical variables

Situation

• We have two categorical variables R and C.

• The number of categories of R is r.

• The number of categories of C is c.

• We observe n subjects from the population and count $x_{ij}$ = the number of subjects for which R = i and C = j.

• R = rows, C = columns

Example

Both systolic blood pressure (C) and serum cholesterol (R) were measured for a sample of n = 1237 subjects.

The categories for blood pressure are:

<127 127-146 147-166 167+

The categories for cholesterol are:

<200 200-219 220-259 260+

Table: two-way frequency

                     Systolic Blood Pressure
Serum Cholesterol    <127   127-146   147-166   167+   Total
<200                  117       121        47     22     307
200-219                85        98        43     20     246
220-259               119       209        68     43     439
260+                   67        99        46     33     245
Total                 388       527       204    118    1237

[Figure: 3-dimensional bar graph]


Example

This comes from the drug use data.

The two variables are:

1. Age (C) and

2. Antidepressant Use (R)

measured for a sample of n = 33,957 subjects.

Two-way Frequency Table

Took anti-depressants - 12 mo * Age - (G) Crosstabulation

Count
                                Age - (G)
Took anti-depressants - 12 mo   20-29   30-39   40-49   50-59   60-69    70+   Total
YES                               322     523     570     522     265    249    2451
NO                               5007    6201    5822    4982    4114   5380   31506
Total                            5329    6724    6392    5504    4379   5629   33957

Percentage antidepressant use vs Age

Age - (G)   20-29   30-39   40-49   50-59   60-69    70+
% YES       6.04%   7.78%   8.92%   9.48%   6.05%  4.42%
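The percentages by age group can be reproduced from the counts with a few lines of Python. This is only a sketch: the names `age_groups`, `yes`, `no` and `percent_yes` are illustrative and not part of the original slides.

```python
# Counts copied from the crosstabulation (age groups 20-29 ... 70+).
age_groups = ["20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
yes = [322, 523, 570, 522, 265, 249]
no = [5007, 6201, 5822, 4982, 4114, 5380]


def percent_yes(yes_counts, no_counts):
    """Return the percentage of YES responses within each column."""
    return [100 * y / (y + n) for y, n in zip(yes_counts, no_counts)]


percentages = percent_yes(yes, no)
for group, pct in zip(age_groups, percentages):
    print(f"{group}: {pct:.2f}%")
```

Each percentage is a column proportion (YES count divided by the column total), which is the quantity compared across age groups.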

Antidepressant Use vs Age

[Figure: bar chart of antidepressant use (%) by age group, 20-29 through 70+; vertical axis 0.0% to 10.0%]

The χ² statistic for measuring dependence amongst two categorical variables

Define

$$R_i = \sum_{j=1}^{c} x_{ij} = i\text{th row total}$$

$$C_j = \sum_{i=1}^{r} x_{ij} = j\text{th column total}$$

$$E_{ij} = \frac{R_i C_j}{n}$$

= Expected frequency in the (i,j)th cell in the case of independence.

        Columns
        1     2     3     4     5     Total
1       x11   x12   x13   x14   x15   R1
2       x21   x22   x23   x24   x25   R2
3       x31   x32   x33   x34   x35   R3
4       x41   x42   x43   x44   x45   R4
Total   C1    C2    C3    C4    C5    n

$$R_i = \sum_{j=1}^{c} x_{ij} = i\text{th row total}$$

$$C_j = \sum_{i=1}^{r} x_{ij} = j\text{th column total}$$

        Columns
        1     2     3     4     5     Total
1       E11   E12   E13   E14   E15   R1
2       E21   E22   E23   E24   E25   R2
3       E31   E32   E33   E34   E35   R3
4       E41   E42   E43   E44   E45   R4
Total   C1    C2    C3    C4    C5    n

$$E_{ij} = \frac{R_i C_j}{n}$$

Justification

If $E_{ij} = \frac{R_i C_j}{n}$ then

$$\frac{E_{ij}}{R_i} = \frac{C_j}{n}$$

        1     2     3     4     5     Total
1       E11   E12   E13   E14   E15   R1
2       E21   E22   E23   E24   E25   R2
3       E31   E32   E33   E34   E35   R3
4       E41   E42   E43   E44   E45   R4
Total   C1    C2    C3    C4    C5    n

Proportion in column j for row i = overall proportion in column j

and if $E_{ij} = \frac{R_i C_j}{n}$ then

$$\frac{E_{ij}}{C_j} = \frac{R_i}{n}$$

        1     2     3     4     5     Total
1       E11   E12   E13   E14   E15   R1
2       E21   E22   E23   E24   E25   R2
3       E31   E32   E33   E34   E35   R3
4       E41   E42   E43   E44   E45   R4
Total   C1    C2    C3    C4    C5    n

Proportion in row i for column j = overall proportion in row i

The χ² statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}}$$

$E_{ij}$ = Expected frequency in the (i,j)th cell in the case of independence.

$x_{ij}$ = observed frequency in the (i,j)th cell


Example: studying the relationship between Systolic Blood pressure and Serum Cholesterol

In this example we are interested in whether Systolic Blood pressure and Serum Cholesterol are related or whether they are independent.

Both were measured for a sample of n = 1237 cases

Observed frequencies

                     Systolic Blood Pressure
Serum Cholesterol    <127   127-146   147-166   167+   Total
<200                  117       121        47     22     307
200-219                85        98        43     20     246
220-259               119       209        68     43     439
260+                   67        99        46     33     245
Total                 388       527       204    118    1237

Expected frequencies

                     Systolic Blood Pressure
Serum Cholesterol    <127     127-146   147-166   167+     Total
<200                  96.29    130.79     50.63   29.29      307
200-219               77.16    104.80     40.57   23.47      246
220-259              137.70    187.03     72.40   41.88      439
260+                  76.85    104.38     40.40   23.37      245
Total                388       527       204     118        1237

In the case of independence the distribution across a row is the same for each row. The distribution down a column is the same for each column.

Table: Expected frequencies, (Observed frequencies), Standardized Residuals

                     Systolic Blood Pressure
Serum Cholesterol    <127      127-146   147-166   167+     Total
<200                  96.29    130.79     50.63    29.29    307
                     (117)     (121)      (47)     (22)
                       2.11     -0.86     -0.51    -1.35
200-219               77.16    104.80     40.57    23.47    246
                      (85)      (98)      (43)     (20)
                       0.89     -0.66      0.38    -0.72
220-259              137.70    187.03     72.40    41.88    439
                     (119)     (209)      (68)     (43)
                      -1.59      1.61     -0.52     0.17
260+                  76.85    104.38     40.40    23.37    245
                      (67)      (99)      (46)     (33)
                      -1.12     -0.53      0.88     1.99
Total                388       527       204      118      1237

χ² = 20.85

$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

Standardized residuals

$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

The χ² statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i=1}^{r} \sum_{j=1}^{c} r_{ij}^2 = 20.85$$
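The calculation for the blood pressure and cholesterol example can be sketched in Python as follows. The function name `chi_square` and the list-of-rows layout are assumptions made for illustration. The observed counts are the parenthesised values from the residual table, whose rows sum to 307, 246, 439 and 245.

```python
from math import sqrt


def chi_square(observed):
    """Return (expected, residuals, chi2) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]          # R_i
    col_totals = [sum(col) for col in zip(*observed)]    # C_j
    n = sum(row_totals)
    # E_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # r_ij = (x_ij - E_ij) / sqrt(E_ij)
    residuals = [
        [(x - e) / sqrt(e) for x, e in zip(x_row, e_row)]
        for x_row, e_row in zip(observed, expected)
    ]
    # chi-square is the sum of the squared standardized residuals
    chi2 = sum(r * r for row in residuals for r in row)
    return expected, residuals, chi2


# Rows: serum cholesterol <200, 200-219, 220-259, 260+
# Columns: systolic blood pressure <127, 127-146, 147-166, 167+
observed = [
    [117, 121, 47, 22],
    [85, 98, 43, 20],
    [119, 209, 68, 43],
    [67, 99, 46, 33],
]
expected, residuals, chi2 = chi_square(observed)
print(f"chi-square = {chi2:.2f}")
```

Returning the residuals alongside the total makes it easy to see which cells contribute most to χ², here the <127 / <200 cell and the 167+ / 260+ cell.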

Properties of the χ² statistic

1. The χ² statistic is never negative. It equals zero only when every observed frequency equals its expected frequency.

2. Small values of χ² indicate that rows and columns are independent. In this case χ² will be in the range of (r – 1)(c – 1).

3. Large values of χ² indicate that rows and columns are not independent.

4. Later on we will discuss this in more detail (when we study Hypothesis Testing).


Example

This comes from the drug use data.

The two variables are:

1. Role (C) and

2. Antidepressant Use (R)

measured for a sample of n = 33,957 subjects.

Two-way Frequency Table

Took anti-depressants - 12 mo * role Crosstabulation (Count), with percentage antidepressant use vs Role

Role                        YES      NO    Total    % YES
parent, partner, worker     344    6268     6612    5.20%
parent, partner             101     967     1068    9.46%
parent, worker              201    1150     1351   14.88%
partner, worker             275    5150     5425    5.07%
worker only                 455    5249     5704    7.98%
parent only                  63     392      455   13.85%
partner only                224    3036     3260    6.87%
no roles                    414    2679     3093   13.39%
Total                      2077   24891    26968

Antidepressant Use vs Role

[Figure: bar chart of antidepressant use (%) by role; vertical axis 0.0% to 20.0%]

χ² = 381.961

Calculation of χ²

The Raw data

        1      2      3      4      5      6     7      8      Total
YES     344    101    201    275    455    63    224    414     2077
NO      6268   967    1150   5150   5249   392   3036   2679   24891
Total   6612   1068   1351   5425   5704   455   3260   3093   26968

Expected frequencies

              1         2        3         4         5         6        7         8         Total (R_i)
YES           509.24    82.25    104.05    417.82    439.31    35.04    251.08    238.21     2077
NO            6102.76   985.75   1246.95   5007.18   5264.69   419.96   3008.92   2854.79   24891
Total (C_j)   6612      1068     1351      5425      5704      455      3260      3093      26968

$$E_{ij} = \frac{R_i C_j}{n}$$

$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

The Residuals

$$r_{ij} = \frac{x_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

       1       2       3       4       5       6       7       8
YES    -7.32   2.07    9.50    -6.99   0.75    4.72    -1.71   11.39
NO     2.12    -0.60   -2.75   2.02    -0.22   -1.36   0.49    -3.29

The calculation of χ²

$$\chi^2 = \sum_{i} \sum_{j} r_{ij}^2 = \sum_{i} \sum_{j} \frac{(x_{ij} - E_{ij})^2}{E_{ij}} = 381.961$$
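As a check on the role example, the same three steps (expected frequencies, standardized residuals, sum of squares) can be run on the raw counts. This is a minimal sketch, and the variable names are illustrative only.

```python
from math import sqrt

# Raw counts for the 8 role categories (columns 1..8 of the raw data table).
yes = [344, 101, 201, 275, 455, 63, 224, 414]
no = [6268, 967, 1150, 5150, 5249, 392, 3036, 2679]

observed = [yes, no]
row_totals = [sum(row) for row in observed]         # R_i
col_totals = [y + n for y, n in zip(yes, no)]       # C_j
n_total = sum(row_totals)

chi2 = 0.0
residual_rows = []
for row, r_total in zip(observed, row_totals):
    residual_row = []
    for x, c_total in zip(row, col_totals):
        e = r_total * c_total / n_total             # E_ij = R_i C_j / n
        r_ij = (x - e) / sqrt(e)                    # standardized residual
        residual_row.append(r_ij)
        chi2 += r_ij ** 2
    residual_rows.append(residual_row)

print(f"chi-square = {chi2:.3f}")
```

The largest residuals in absolute value are in the YES row for "no roles" and "parent, worker", which is where observed use is furthest above what independence would predict.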


Probability Theory

Modelling random phenomena


Some counting formulae

Permutations

the number of ways that you can order n objects is:

n! = n(n-1)(n-2)(n-3)…(3)(2)(1)

Example:

the number of ways you can order the three letters A, B, and C is 3! = 3(2)(1) = 6

ABC ACB BAC BCA CAB CBA
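The six orderings can be listed mechanically. The sketch below uses the standard-library `itertools.permutations` and `math.factorial`; the variable name `orderings` is illustrative.

```python
from itertools import permutations
from math import factorial

letters = "ABC"
# Every ordering of the three letters, joined back into strings.
orderings = ["".join(p) for p in permutations(letters)]

print(orderings)
print(len(orderings), factorial(len(letters)))
```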


Definition

0! = 1

Reason

mathematical consistency.

In many of the formulae given later, this definition leads to consistency.

Permutations

the number of ways that you can choose k objects from n objects in a specific order:

$${}_nP_k = \frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)$$

Example:

the number of ways you choose two letters from the four letters A, B, C, D in a specific order is

$${}_4P_2 = \frac{4!}{(4-2)!} = \frac{4!}{2!} = (4)(3) = 12$$


AB BA AC CA AD DA

BC CB BD DB CD DC

Example:

Suppose that we have a committee of 10 people. We want to choose a chairman, a vice-chairman, and a treasurer for the committee. The chairman is chosen first, the vice-chairman second and the treasurer third. How many ways can this be done?

$${}_nP_k = \frac{n!}{(n-k)!} = n(n-1)\cdots(n-k+1)$$

$${}_{10}P_3 = \frac{10!}{(10-3)!} = \frac{10!}{7!} = (10)(9)(8) = 720$$
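Both permutation counts can be checked numerically. The sketch below assumes Python 3.8 or later, where `math.perm` is available, and confirms the counts by brute-force enumeration.

```python
from itertools import permutations
from math import perm

# Ordered choices of 2 letters from A, B, C, D.
ordered_pairs = ["".join(p) for p in permutations("ABCD", 2)]
print(len(ordered_pairs), perm(4, 2))

# Ordered choices of chairman, vice-chairman and treasurer from 10 people.
committee = range(10)
officer_slates = sum(1 for _ in permutations(committee, 3))
print(officer_slates, perm(10, 3))
```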

Example:

How many ways can we order n objects?

Answer: n!

or

Choose n objects from n objects in a specific order

$${}_nP_n = \frac{n!}{(n-n)!} = \frac{n!}{0!} = n! \quad \text{if } 0! = 1.$$

This is what is meant by the statement that the definition 0! = 1 leads to mathematical consistency

Combinations

the number of ways that you can choose k objects from n objects (order irrelevant) is:

$${}_nC_k = \binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots(1)}$$

Example:

the number of ways you choose two letters from the four letters A, B, C, D

{A,B} {A,C} {A,D} {B,C} {B,D} {C,D}

$${}_4C_2 = \binom{4}{2} = \frac{4!}{2!\,(4-2)!} = \frac{4!}{2!\,2!} = \frac{(4)(3)}{(2)(1)} = \frac{12}{2} = 6$$

Example:

Suppose we have a committee of 10 people and we want to choose a sub-committee of 3 people. How many ways can this be done?

$${}_{10}C_3 = \binom{10}{3} = \frac{10!}{3!\,7!} = \frac{(10)(9)(8)}{(3)(2)(1)} = 120$$
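The two combination counts can be checked by enumeration. The sketch below assumes Python 3.8 or later for `math.comb`; the variable names are illustrative.

```python
from itertools import combinations
from math import comb

# Unordered pairs of letters chosen from A, B, C, D.
pairs = ["".join(c) for c in combinations("ABCD", 2)]
print(pairs, comb(4, 2))

# Sub-committees of 3 chosen from a committee of 10.
sub_committees = sum(1 for _ in combinations(range(10), 3))
print(sub_committees, comb(10, 3))
```

Dividing the 720 ordered choices of three people by the 3! = 6 orderings of each group gives the same count of unordered sub-committees.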

Example: Random sampling

Suppose we have a club of N = 1000 persons and we want to choose a sample of k = 250 of these individuals to determine their opinion on a given issue. How many ways can this be performed?

$${}_{1000}C_{250} = \binom{1000}{250} = \frac{1000!}{250!\,750!} \approx 4.823 \times 10^{242}$$

The choice of the sample is called random sampling if all of the choices have the same probability of being selected.
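Python integers have arbitrary precision, so the number of possible samples can be computed exactly. The sketch below assumes Python 3.8 or later for `math.comb`. It also draws one simple random sample with `random.sample`, which selects without replacement so that every subset of the requested size is equally likely. The member labels 0 to 999 are an assumption made for illustration.

```python
import math
import random

# Exact number of ways to choose 250 members out of 1000.
n_samples = math.comb(1000, 250)
print(f"{float(n_samples):.3e}")   # scientific notation, for comparison with the slide
print(len(str(n_samples)))         # number of decimal digits

# Draw one simple random sample of 250 distinct members.
members = range(1000)              # hypothetical member labels
sample = random.sample(members, 250)
```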

Important Note:

0! is always defined to be 1.

Also

$${}_nC_k = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

are called Binomial Coefficients

Reason:

The Binomial Theorem

$$(x+y)^n = {}_nC_0\, x^0 y^n + {}_nC_1\, x^1 y^{n-1} + {}_nC_2\, x^2 y^{n-2} + \cdots + {}_nC_n\, x^n y^0$$

$$= \binom{n}{0} x^0 y^n + \binom{n}{1} x^1 y^{n-1} + \binom{n}{2} x^2 y^{n-2} + \cdots + \binom{n}{n} x^n y^0$$

Binomial Coefficients can also be calculated using Pascal’s triangle

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
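Pascal's triangle is built by starting each row and ending it with 1 and filling the interior with sums of adjacent entries from the row above. A minimal sketch, with an illustrative function name `pascal_rows`:

```python
from math import comb


def pascal_rows(n_rows):
    """Return the first n_rows rows of Pascal's triangle as lists of integers."""
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # Interior entries are sums of neighbouring entries in the previous row.
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows


triangle = pascal_rows(7)
for row in triangle:
    print(" ".join(str(v) for v in row))
```

Row n (counting from 0) lists the binomial coefficients for k = 0, 1, …, n, so the last row printed matches `comb(6, k)`.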


Random Variables

Probability distributions

Definition:

A random variable X is a number whose value is determined by the outcome of a random experiment (a random phenomenon).

Examples

1. A die is rolled and X = number of spots showing on the upper face.

2. Two dice are rolled and X = Total number of spots showing on the two upper faces.

3. A coin is tossed n = 100 times and X = number of times the coin toss resulted in a head.

4. A person is selected at random from a population and X = weight of that individual.

5. A sample of n = 100 individuals is selected at random from a population (i.e. all samples of n = 100 have the same probability of being selected).

X = the average weight of the 100 individuals.

In all of these examples X fits the definition of a random variable, namely:

– a number whose value is determined by the outcome of a random experiment (a random phenomenon)

Random variables are either

• Discrete
– Integer valued
– The set of possible values for X are integers

• Continuous
– The set of possible values for X are all real numbers
– Range over a continuum.


Examples

• Discrete

– A die is rolled and X = number of spots showing on the upper face.

– Two dice are rolled and X = Total number of spots showing on the two upper faces.

– A coin is tossed n = 100 times and X = number of times the coin toss resulted in a head.

Examples

• Continuous

– A person is selected at random from a population and X = weight of that individual.

– A sample of n = 100 individuals is selected at random from a population (i.e. all samples of n = 100 have the same probability of being selected). X = the average weight of the 100 individuals.


Probability distribution of a Random Variable

The probability distribution of a discrete random variable is described by its:

probability function p(x).

p(x) = the probability that X takes on the value x.

Examples

• Discrete

– A die is rolled and X = number of spots showing on the upper face.

x      1     2     3     4     5     6
p(x)   1/6   1/6   1/6   1/6   1/6   1/6

– Two dice are rolled and X = Total number of spots showing on the two upper faces.

x      2      3      4      5      6      7      8      9      10     11     12
p(x)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
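The probability function for the total on two dice can be derived by listing all 36 equally likely outcomes and counting how many give each total. This is a sketch using the standard library. Note that `Fraction` reduces values automatically, so 6/36 is displayed as 1/6.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die 1, die 2) outcomes give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# p(x) = (number of outcomes with total x) / 36
p = {total: Fraction(count, 36) for total, count in sorted(counts.items())}

for total, prob in p.items():
    print(total, prob)
```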

Graphs

To plot a graph of p(x), draw bars of height p(x) above each value of x.

[Figure: bar graph of p(x) for rolling a die, x = 1, …, 6]

[Figure: bar graph of p(x) for rolling two dice]

Note:

1. $0 \le p(x) \le 1$

2. $\sum_{x} p(x) = 1$

3. $P[a \le X \le b] = \sum_{x=a}^{b} p(x)$

The probability distribution of a continuous random variable is described by its:

probability density curve f(x).

i.e. a curve which has the following properties:

• 1. f(x) is never negative.

• 2. The total area under the curve f(x) is one.

• 3. The area under the curve f(x) between a and b is the probability that X lies between the two values.

[Figure: probability density curve f(x), plotted for x from 0 to 120]


An Important discrete distribution

The Binomial distribution

Suppose we have an experiment with two outcomes – Success(S) and Failure(F).

Let p denote the probability of S (Success).

In this case q=1-p denotes the probability of Failure(F).

Now suppose this experiment is repeated n times independently.

Let X denote the number of successes occurring in the n repetitions.

Then X is a random variable.

Its possible values are

0, 1, 2, 3, 4, … , (n – 2), (n – 1), n

and p(x) for any of the above values of x is given by:

$$p(x) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x} p^x q^{n-x}$$


X is said to have the Binomial distribution with parameters n and p.

Summary:

X is said to have the Binomial distribution with parameters n and p.

1. X is the number of successes occurring in the n repetitions of a Success-Failure Experiment.

2. The probability of success is p.

3. $p(x) = \binom{n}{x} p^x (1-p)^{n-x}$

Examples:

1. A coin is tossed n = 5 times. X is the number of heads occurring in the 5 tosses of the coin. In this case p = ½ and

$$p(x) = \binom{5}{x} \left(\tfrac{1}{2}\right)^{x} \left(\tfrac{1}{2}\right)^{5-x} = \binom{5}{x} \left(\tfrac{1}{2}\right)^{5} = \binom{5}{x} \frac{1}{32}$$

x      0      1      2       3       4      5
p(x)   1/32   5/32   10/32   10/32   5/32   1/32
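The binomial probability function translates directly into code. The sketch below (the function name `binomial_pmf` is illustrative) uses exact fractions so the coin-toss table can be compared entry by entry; `Fraction` prints values in lowest terms, so 10/32 appears as 5/16.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Coin tossed n = 5 times with p = 1/2.
coin = [binomial_pmf(x, 5, Fraction(1, 2)) for x in range(6)]
for x, prob in enumerate(coin):
    print(x, prob)
```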

Random Variables

Numerical quantities whose values are determined by the outcome of a random experiment

Discrete Random Variables

Discrete Random Variable: A random variable usually assuming an integer value.

• A discrete random variable assumes values that are isolated points along the real line. That is, neighbouring values are not “possible values” for a discrete random variable.

Note: Usually associated with counting

• The number of times a head occurs in 10 tosses of a coin

• The number of auto accidents occurring on a weekend

• The size of a family

Continuous Random Variables

Continuous Random Variable: A quantitative random variable that can vary over a continuum

• A continuous random variable can assume any value along a line interval, including every possible value between any two points on the line

Note: Usually associated with a measurement

• Blood Pressure

• Weight gain

• Height

Probability Distributions of a Discrete Random Variable

Probability Distribution & Function

Probability Distribution: A mathematical description of how probabilities are distributed with each of the possible values of a random variable.

Notes:

• The probability distribution allows one to determine probabilities of events related to the values of a random variable.

• The probability distribution may be presented in the form of a table, chart, or formula.

Probability Function: A rule that assigns probabilities to the values of the random variable

Example

In baseball the number of individuals, X, on base when a home run is hit ranges in value from 0 to 3. The probability distribution is known and is given below:

x      0      1      2      3
p(x)   6/14   4/14   3/14   1/14

Note: This chart implies the only values x takes on are 0, 1, 2, and 3. If the random variable X is observed repeatedly, the probability p(x) represents the proportion of times the value x appears in that sequence.

$$P(\text{the random variable } X \text{ equals } 2) = p(2) = \frac{3}{14}$$

$$P(\text{the random variable } X \text{ is at least } 2) = p(2) + p(3) = \frac{3}{14} + \frac{1}{14} = \frac{4}{14}$$

A Bar Graph

[Figure: bar graph "No. of persons on base when a home run is hit", p(x) against # on base: p(0) = 0.429, p(1) = 0.286, p(2) = 0.214, p(3) = 0.071]

Comments: Every probability function must satisfy:

1. The probability assigned to each value of the random variable must be between 0 and 1, inclusive:

$$0 \le p(x) \le 1$$

2. The sum of the probabilities assigned to all the values of the random variable must equal 1:

$$\sum_{x} p(x) = 1$$

3. $$P[a \le X \le b] = \sum_{x=a}^{b} p(x) = p(a) + p(a+1) + \cdots + p(b)$$

Mean and Variance of a Discrete Probability Distribution

• Describe the center and spread of a probability distribution

• The mean (denoted by the Greek letter μ (mu)) measures the centre of the distribution.

• The variance (σ²) and the standard deviation (σ) measure the spread of the distribution.

σ is the Greek letter for s.

Mean of a Discrete Random Variable

• The mean, μ, of a discrete random variable x is found by multiplying each possible value of x by its own probability and then adding all the products together:

$$\mu = \sum_{x} x\,p(x) = x_1 p(x_1) + x_2 p(x_2) + \cdots + x_k p(x_k)$$

Notes:

• The mean is a weighted average of the values of X.

• The mean is the long-run average value of the random variable.

• The mean is the centre of gravity of the probability distribution of the random variable.

[Figure: bar graph of a probability distribution on x = 1, …, 11]

Variance and Standard Deviation

Variance of a Discrete Random Variable: The variance, σ², of a discrete random variable x is found by multiplying each possible value of the squared deviation from the mean, (x − μ)², by its own probability and then adding all the products together:

$$\sigma^2 = \sum_{x} (x-\mu)^2 p(x) = \sum_{x} x^2 p(x) - \Big(\sum_{x} x\,p(x)\Big)^2 = \sum_{x} x^2 p(x) - \mu^2$$

Standard Deviation of a Discrete Random Variable: The positive square root of the variance:

$$\sigma = \sqrt{\sigma^2}$$

Example

The number of individuals, X, on base when a home run is hit ranges in value from 0 to 3.

x       p(x)    xp(x)   x²   x²p(x)
0       0.429   0.000   0    0.000
1       0.286   0.286   1    0.286
2       0.214   0.429   4    0.857
3       0.071   0.214   9    0.643
Total   1.000   0.929        1.786

The totals are Σ p(x), Σ x p(x) and Σ x² p(x).

• Computing the mean:

$$\mu = \sum_{x} x\,p(x) = 0.929$$

Note:

• 0.929 is the long-run average value of the random variable

• 0.929 is the centre of gravity of the probability distribution of the random variable

• Computing the variance:

$$\sigma^2 = \sum_{x} (x-\mu)^2 p(x) = \sum_{x} x^2 p(x) - \Big(\sum_{x} x\,p(x)\Big)^2 = 1.786 - (0.929)^2 = 0.923$$

• Computing the standard deviation:

$$\sigma = \sqrt{\sigma^2} = \sqrt{0.923} = 0.961$$
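The mean, variance and standard deviation of the on-base distribution can be computed from the probability table. This is a minimal sketch: the function name `mean_and_variance` and the dictionary layout are assumptions for illustration, and exact fractions are used so that rounding in the table does not affect the result.

```python
from fractions import Fraction
from math import sqrt


def mean_and_variance(dist):
    """Return (mean, variance) of a discrete distribution given as {x: p(x)}."""
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    # Shortcut formula: variance = sum of x^2 p(x) minus the squared mean.
    return mean, second_moment - mean**2


on_base = {0: Fraction(6, 14), 1: Fraction(4, 14), 2: Fraction(3, 14), 3: Fraction(1, 14)}
mu, var = mean_and_variance(on_base)
sigma = sqrt(var)

print(float(mu), float(var), sigma)
```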

The Binomial distribution

1. We have an experiment with two outcomes – Success(S) and Failure(F).

2. Let p denote the probability of S (Success).

3. In this case q=1-p denotes the probability of Failure(F).

4. This experiment is repeated n times independently.

5. X denotes the number of successes occurring in the n repetitions.

The possible values of X are

0, 1, 2, 3, 4, … , (n – 2), (n – 1), n

and p(x) for any of the above values of x is given by:

$$p(x) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x} p^x q^{n-x}$$

X is said to have the Binomial distribution with parameters n and p.

Summary:

X is said to have the Binomial distribution with parameters n and p.

1. X is the number of successes occurring in the n repetitions of a Success-Failure Experiment.

2. The probability of success is p.

3. The probability function

$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}$$

Example:

1. A coin is tossed n = 5 times. X is the number of heads occurring in the 5 tosses of the coin. In this case p = ½ and

$$p(x) = \binom{5}{x} \left(\tfrac{1}{2}\right)^{x} \left(\tfrac{1}{2}\right)^{5-x} = \binom{5}{x} \left(\tfrac{1}{2}\right)^{5} = \binom{5}{x} \frac{1}{32}$$

x      0      1      2       3       4      5
p(x)   1/32   5/32   10/32   10/32   5/32   1/32

[Figure: bar graph of p(x) against number of heads; vertical axis 0.0 to 0.4]

Computing the summary parameters for the distribution – μ, σ², σ

x       p(x)      xp(x)   x²   x²p(x)
0       0.03125   0.000   0    0.000
1       0.15625   0.156   1    0.156
2       0.31250   0.625   4    1.250
3       0.31250   0.938   9    2.813
4       0.15625   0.625   16   2.500
5       0.03125   0.156   25   0.781
Total   1.000     2.500        7.500

The totals are Σ p(x), Σ x p(x) and Σ x² p(x).

• Computing the mean:

$$\mu = \sum_{x} x\,p(x) = 2.5$$

• Computing the variance:

$$\sigma^2 = \sum_{x} (x-\mu)^2 p(x) = \sum_{x} x^2 p(x) - \Big(\sum_{x} x\,p(x)\Big)^2 = 7.5 - (2.5)^2 = 1.25$$

• Computing the standard deviation:

$$\sigma = \sqrt{\sigma^2} = \sqrt{1.25} = 1.118$$


Example:

• A surgeon performs a difficult operation n = 10 times.

• X is the number of times that the operation is a success.

• The success rate for the operation is 80%. In this case p = 0.80 and

• X has a Binomial distribution with n = 10 and p = 0.80.

$$p(x) = \binom{10}{x} (0.80)^{x} (0.20)^{10-x}$$

Computing p(x) for x = 0, 1, 2, … , 10

x      0        1        2        3        4        5
p(x)   0.0000   0.0000   0.0001   0.0008   0.0055   0.0264

x      6        7        8        9        10
p(x)   0.0881   0.2013   0.3020   0.2684   0.1074

The Graph

[Figure: bar graph of p(x) against number of successes, x = 0, 1, …, 10; vertical axis 0 to 0.4]

Computing the summary parameters for the distribution – μ, σ², σ

x       p(x)     xp(x)   x²    x²p(x)
0       0.0000   0.000   0     0.000
1       0.0000   0.000   1     0.000
2       0.0001   0.000   4     0.000
3       0.0008   0.002   9     0.007
4       0.0055   0.022   16    0.088
5       0.0264   0.132   25    0.661
6       0.0881   0.528   36    3.171
7       0.2013   1.409   49    9.865
8       0.3020   2.416   64    19.327
9       0.2684   2.416   81    21.743
10      0.1074   1.074   100   10.737
Total   1.000    8.000         65.600

The totals are Σ p(x), Σ x p(x) and Σ x² p(x).

• Computing the mean:

$$\mu = \sum_{x} x\,p(x) = 8.0$$

• Computing the variance:

$$\sigma^2 = \sum_{x} (x-\mu)^2 p(x) = \sum_{x} x^2 p(x) - \Big(\sum_{x} x\,p(x)\Big)^2 = 65.6 - (8.0)^2 = 1.60$$

• Computing the standard deviation:

$$\sigma = \sqrt{\sigma^2} = \sqrt{1.60} = 1.265$$
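The summary parameters for the surgeon example (n = 10, p = 0.80) can be recomputed from the probability function rather than from the rounded table. This is a sketch with illustrative names, and it uses floating-point arithmetic, so the results agree with the table only up to rounding.

```python
from math import comb, sqrt

n, p = 10, 0.80

# p(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

mu = sum(x * px for x, px in enumerate(pmf))                  # sum of x p(x)
second_moment = sum(x * x * px for x, px in enumerate(pmf))   # sum of x^2 p(x)
var = second_moment - mu**2
sigma = sqrt(var)

print(f"mean = {mu:.3f}, variance = {var:.3f}, sd = {sigma:.3f}")
```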