Epidemiology 9509 sampling distributions (more)
Epidemiology 9509: Principles of Biostatistics
Chapter 7: Sampling Distributions (continued)
John Koval
Department of Epidemiology and Biostatistics
University of Western Ontario
Next
We want to look at histograms of sample statistics (sample mean, median, sample variance, sample standard deviation)
to see what their distributions look like.
sample mean of Bernoullis
Consider a sample of 10 observations from a Bernoulli distribution, that is, the sample of 10 responses to the question
"Do you smoke?", where Yes is coded as 1
and No is coded as 0.
What are we interested in? The proportion, p, which is the sample mean of a set of 0's and 1's.
Random variables - some math
Let us call X1 the random variable that measures the response (0 or 1) of the first person,
X2 the response of the second person,
etc., up to X10, the response of the 10th person.
Let Y be the sum of the responses of all ten subjects.
Then P, the sample proportion, is the average (sample mean) of all ten responses,
that is,
$$P = \frac{Y}{n} = \frac{\sum_{i=1}^{10} X_i}{n} = \frac{0 + 1 + 1 + \cdots + 0}{10}$$
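A minimal Python sketch of this calculation. The ten responses below are made up for illustration; the point is only that the sample proportion is the sum Y of the 0/1 codes divided by n.

```python
# Hypothetical answers of 10 people to "Do you smoke?" (1 = Yes, 0 = No)
responses = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]

n = len(responses)   # sample size, here 10
y = sum(responses)   # Y = X1 + X2 + ... + X10, the number of smokers
p = y / n            # P = Y / n, the sample proportion (a sample mean)

print(f"Y = {y}, n = {n}, P = {p}")
```

With these made-up responses, Y = 3 and P = 0.3.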
Distribution of a sample mean of Bernoullis
Remember that Y is the sum of 10 Bernoulli random variables,
so what is the distribution of Y? (Y can be thought of as the number of "successes" in a sample of size 10.)
Binomial(10, 0.2), where π = 0.2 is the population proportion of smokers, or the probability of picking a smoker at random.
Hence the distribution of the sample proportion P = Y/10 is that of a multiple of the binomial distribution,
that is, it has the same "boxes" (bars) as the binomial, except that the x-axis is marked in proportions rather than integers.
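In symbols (writing n for the sample size, here n = 10, and π = 0.2), the rescaling moves each binomial probability from the integer k to the proportion k/n, and the mean and variance of P follow from those of Y:

```latex
\Pr\!\left(P = \frac{k}{n}\right) = \Pr(Y = k) = \binom{n}{k}\,\pi^{k}(1-\pi)^{n-k}, \qquad k = 0, 1, \dots, n

E(P) = \frac{E(Y)}{n} = \frac{n\pi}{n} = \pi,
\qquad
\operatorname{Var}(P) = \frac{\operatorname{Var}(Y)}{n^{2}} = \frac{n\pi(1-\pi)}{n^{2}} = \frac{\pi(1-\pi)}{n}
```

For n = 10 and π = 0.2 this gives E(P) = 0.2 and Var(P) = 0.016 (standard deviation about 0.126).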
Binomial Distribution B(10,0.2)
x     Pr(X=x)
0     0.10737
1     0.26844
2     0.30199
3     0.20133
4     0.08808
5     0.02642
6     0.00551
7     0.00079
8     0.00007
9     0.00000
10    0.00000
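The probabilities in this table can be reproduced directly from the binomial formula. A short Python sketch (the helper name binomial_pmf is our own):

```python
from math import comb

def binomial_pmf(n, pi):
    """Return [Pr(X = 0), ..., Pr(X = n)] for X ~ Binomial(n, pi)."""
    return [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]

pmf = binomial_pmf(10, 0.2)
for x, prob in enumerate(pmf):
    print(f"{x:2d}  {prob:.5f}")
```

The same list also gives the distribution of the proportion, since Pr(P = x/10) = Pr(X = x).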
[Figure: Bin(10,0.2); bar chart of Pr(X = x) for x = 0, 1, ..., 10; y-axis: Probability, 0.00 to 0.30]
Distribution of proportion
p     Pr(P=p)
0.0   0.10737
0.1   0.26844
0.2   0.30199
0.3   0.20133
0.4   0.08808
0.5   0.02642
0.6   0.00551
0.7   0.00079
0.8   0.00007
0.9   0.00000
1.0   0.00000
[Figure: proportion of 10 Bern(0.2)'s; bar chart of probability for proportions 0.0, 0.1, ..., 1.0; y-axis: Probability, 0.00 to 0.30]
distribution of proportions
If the proportion is the average of a number of Bernoulli random variables, its distribution is exactly a multiple of a Binomial (P = Y/n).
Hence we can always plot its distribution and calculate exact probabilities.
From a previous lecture, we know that, for a large sample size n and nπ > 5, the binomial distribution can be approximated by a Normal distribution.
Similarly, for a large sample size n and nπ > 5, the distribution of the proportion can be approximated by a multiple of a Normal distribution, that is, by a Normal distribution with mean π and variance π(1 - π)/n.
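A sketch of how close this approximation can be, using our own example of n = 100 and π = 0.2 (so nπ = 20 > 5). It compares the exact probability Pr(P ≤ 0.25) with the Normal approximation N(π, π(1 - π)/n). The continuity correction (shifting the cutoff by half a step of size 1/n) is a standard refinement that is our addition, not part of the slides.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard Normal cumulative distribution function, via math.erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, pi, cutoff = 100, 0.2, 0.25

# Exact: P <= 0.25 is the same event as Y <= 25, with Y ~ Binomial(100, 0.2)
k_max = round(cutoff * n)
exact = sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(k_max + 1))

# Normal approximation for the proportion: mean pi, SD sqrt(pi(1 - pi)/n);
# the continuity correction moves the cutoff up by 0.5/n
se = sqrt(pi * (1 - pi) / n)
approx = normal_cdf((cutoff + 0.5 / n - pi) / se)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

With nπ well above 5, the exact and approximate probabilities should agree closely.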
Sample means from other distributions
Easy stuff ends here.
If the data whose sample means we are calculating come from more complicated distributions,
we cannot get the distribution of the sample mean as easily as for the proportion.
However, for large samples, this distribution can be approximated.
Sampling from a Binomial
Consider taking a random sample of 10 people to whom you have administered the Stress Scale described earlier. We assume that the distribution of the Stress Scale score is Binomial(10, 0.2).
From what we have just done, we know that if we simulate taking such a sample many times, we can plot the resulting statistic and see its distribution; in this case, the distribution of the sample mean.
Distribution of sample mean - 1000 simulations
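title 'distribution of sample mean';
options ps=24 ls=64;
data samples;
   seed   = 25487;   /* seed for the random number generator       */
   nsim   = 1000;    /* number of simulated samples                */
   nsam   = 10;      /* sample size (rerun with 30 and 100)        */
   nquest = 10;      /* Binomial n (number of questions)           */
   pi     = 0.2;     /* Binomial probability                       */
   do nrun = 1 to nsim;
      sumx = 0;
      do i = 1 to nsam;
         x = ranbin(seed, nquest, pi);   /* one Binomial(10, 0.2) score */
         sumx = sumx + x;
      end;
      xbar = sumx / nsam;   /* sample mean of this simulated sample */
      output;
   end;
run;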
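For readers without SAS, a rough Python analogue of this data step (our own sketch, not part of the course code). It reuses the seed 25487 only for reproducibility; Python's generator differs from RANBIN, so the individual numbers will not match the SAS output. Each Binomial(10, 0.2) score is built as a sum of 10 Bernoulli(0.2) draws, which avoids relying on random.binomialvariate (only available in Python 3.12+).

```python
import random
import statistics

def simulate_sample_means(nsim=1000, nsam=10, nquest=10, pi=0.2, seed=25487):
    """Simulate nsim sample means, each of nsam Binomial(nquest, pi) scores."""
    rng = random.Random(seed)
    means = []
    for _ in range(nsim):
        sumx = 0
        for _ in range(nsam):
            # one Binomial(nquest, pi) score as a sum of nquest Bernoulli(pi) draws
            x = sum(rng.random() < pi for _ in range(nquest))
            sumx += x
        means.append(sumx / nsam)   # sample mean of this simulated sample
    return means

xbar = simulate_sample_means()
print(statistics.mean(xbar), statistics.stdev(xbar), min(xbar), max(xbar))
```

Calling simulate_sample_means(nsam=30) or simulate_sample_means(nsam=100) gives the other two sample sizes.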
Distribution of sample mean (continued)
These are the default plots (no range specified):
proc means;
   var xbar;
   title 'sampling distribution of sample means';
run;
proc chart;
   vbar xbar / type=pct space=0;
run;
proc gchart;
   vbar xbar / type=pct space=0;
run;
quit;
Statistics
Sample statistics
nsam Mean Std Dev Minimum Maximum
---------------------------------------------------
10 1.9980000 0.3982510 0.6000000 3.7000000
30 1.9983867 0.2340997 1.1666667 2.9000000
100 1.9984980 0.1279179 1.5600000 2.5600000
---------------------------------------------------
As the sample size increases:
1. the standard deviation gets smaller
2. the range gets smaller and more symmetric about the population mean (2.0)
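The shrinking standard deviation matches the theory: a Binomial(10, 0.2) score has variance 10 × 0.2 × 0.8 = 1.6, so the mean of nsam scores should have standard deviation √(1.6/nsam). A short Python check (our own addition, not part of the SAS run; the function name sd_of_mean is hypothetical):

```python
from math import sqrt

def sd_of_mean(nsam, nquest=10, pi=0.2):
    """Theoretical SD of the mean of nsam Binomial(nquest, pi) scores."""
    sigma2 = nquest * pi * (1 - pi)   # variance of one score: 10 x 0.2 x 0.8 = 1.6
    return sqrt(sigma2 / nsam)

for nsam in (10, 30, 100):
    print(f"nsam = {nsam:3d}: sigma/sqrt(nsam) = {sd_of_mean(nsam):.4f}")
```

The theoretical values 0.4000, 0.2309 and 0.1265 are close to the simulated 0.398, 0.234 and 0.128 in the table.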
CHART output for sample size 10
Graphical representation of changes with sample size
Percentage
10 | ***
| ****
8 | ******
| *******
6 | *********
| **********
4 | ***********
| *************
2 | ***************
| *******************
---------------------------
1.1 1.5 1.9 2.3 2.7 3.1
CHART output for sample size 30
Percentage
12 | **
| ****
10 | ****** **
| ****** **
8 | ******** ****
| ******** ****
6 | ** ****************
| ** ****************
4 | ************************
| ************************
2 | ******************************
| ************************************
-------------------------------------
1.5 1.7 1.9 2.1 2.3 2.5 2.7 2.9
CHART output for sample size 100
Percentage
10 | *
| ****
8 | ******
| *******
6 | *********
| **********
4 | ************
| **************
2 | *****************
| *********************
------------------------------------
1.7 1.9 2.1 2.3 2.5
sample size 10 - default plot
fancier graphs
sample size 30 - default plot
sample size 100 - default plot
Distribution of sample mean (continued again)
This is a plot with a defined range, so that we can compare the output for sample sizes 10, 30 and 100:
proc gchart;
   vbar xbar / type=pct space=0
               midpoints = 0.6 to 3.4 by 0.2;
run;
quit;
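Fixing the midpoints is what makes the three plots comparable: every sample size is binned into the same bars. A minimal Python sketch of that binning (the function name pct_by_midpoint and the nearest-midpoint rule are our own assumptions; how SAS treats values beyond the outer midpoints is not modelled here):

```python
def pct_by_midpoint(values, midpoints):
    """Percentage of values assigned to each midpoint.

    Assumption: each value goes to its nearest midpoint, so values beyond
    the outermost midpoints fall into the end bars.
    """
    counts = {m: 0 for m in midpoints}
    for v in values:
        nearest = min(midpoints, key=lambda m: abs(v - m))
        counts[nearest] += 1
    return {m: 100 * c / len(values) for m, c in counts.items()}

# Same fixed midpoints as the GCHART call: 0.6 to 3.4 by 0.2
midpoints = [round(0.6 + 0.2 * k, 1) for k in range(15)]
```

Applying pct_by_midpoint to the simulated sample means for each sample size gives bar heights on a common set of bars.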
sample size 10 - plot with defined range
sample size 30 - plot with defined range
sample size 100 - plot with defined range
We can see that the plots are centred around the population mean (2.0).
Conclusions
1. As the sample size gets larger, the variance decreases.
2. As the sample size gets larger, the curve looks more symmetric.
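The narrowing can be quantified with the Normal approximation. A sketch (our own illustration, using σ² = 1.6 for one Binomial(10, 0.2) score; the function name prob_within is hypothetical) of the probability that the sample mean lands within 0.2 of the population mean 2.0:

```python
from math import erf, sqrt

def prob_within(delta, nsam, sigma2=1.6):
    """Normal-approximation probability that the sample mean lies within
    delta of the population mean, when one score has variance sigma2."""
    se = sqrt(sigma2 / nsam)   # standard deviation of the sample mean
    z = delta / se
    return erf(z / sqrt(2))    # equals 2 * Phi(z) - 1 for the standard Normal CDF Phi

for nsam in (10, 30, 100):
    print(f"nsam = {nsam:3d}: Pr(|xbar - 2.0| < 0.2) = {prob_within(0.2, nsam):.4f}")
```

This gives roughly 0.38, 0.61 and 0.89 for nsam = 10, 30 and 100.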
Distribution of sample mean (more)
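Alternatively, use the HISTOGRAM statement of PROC UNIVARIATE to draw both the histogram and the approximating Normal curve:
proc univariate;
   var xbar;
   histogram / normal(mu = 2.0 sigma = 0.4);
run;
Here sigma = 0.4 = sqrt(1.6/10) is the theoretical standard deviation of the sample mean for nsam = 10 (each Binomial(10, 0.2) score has variance 10 × 0.2 × 0.8 = 1.6); use sigma = 0.2309 for nsam = 30 and sigma = 0.1265 for nsam = 100.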
sample size 10 - histogram and theoretical distribution
sample size 30 - histogram and theoretical distribution
sample size 100 - histogram and theoretical distribution
Conclusions
1. As the sample size gets larger, the curve looks more Normal.
Sampling from other distributions
1. Normal: perfect; the distribution of the sample mean is Normal regardless of sample size
2. Symmetric, e.g., Uniform: the distribution of the sample mean is symmetric (for the Uniform, the tails may be truncated); even for "smallish" samples, the distribution is approximately Normal
3. Asymmetric (a continuous counterpart of the Binomial): behaves like the Binomial
3.1 for large sample sizes, the distribution is approximately Normal
3.2 for small sample sizes, the approximation to the Normal is poor
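A simulation sketch contrasting cases 2 and 3 (our own illustration; the slides do not name a specific asymmetric distribution, so an Exponential(1) is assumed here as a convenient skewed continuous example). A skewness near 0 indicates a symmetric distribution of the sample mean.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed (population) SD."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, m)
    return statistics.fmean((v - m) ** 3 for v in values) / sd**3

def sample_means(draw, nsam, nsim=5000, seed=1):
    """Simulate nsim sample means, each of nsam values produced by draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(nsam)) for _ in range(nsim)]

def uniform(rng):
    return rng.random()           # symmetric: Uniform(0, 1)

def exponential(rng):
    return rng.expovariate(1.0)   # asymmetric: Exponential(1), skewness 2

for nsam in (2, 10, 50):
    print(f"nsam = {nsam:2d}: "
          f"Uniform {skewness(sample_means(uniform, nsam)):+.2f}, "
          f"Exponential {skewness(sample_means(exponential, nsam)):+.2f}")
```

The Uniform column should stay near 0 at every size, while the Exponential skewness should fall roughly like 2/√nsam (about 1.41, 0.63 and 0.28): the approximation to the Normal improves only as the sample grows.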
The Central Limit Theorem
◮ Take a sample of size nsam.
◮ For nsam large enough, the distribution of the sample mean will be approximately Normal.
The Central Limit Theorem (statistically)
◮ Sample nsam times from a distribution with mean µ and variance σ².
◮ For nsam large enough, $\bar{X} \sim N(\mu, \sigma^2/\mathrm{nsam})$ approximately.
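Equivalently (a standard restatement, not on the original slide), the standardized sample mean is approximately standard Normal. For the Stress Scale example, µ = 2.0 and σ² = 10 × 0.2 × 0.8 = 1.6, so with nsam = 100:

```latex
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{\mathrm{nsam}}} \;\approx\; N(0, 1),
\qquad
\bar{X} \approx N\!\left(2.0,\ \frac{1.6}{100}\right) = N(2.0,\ 0.016),
\quad \text{so } \mathrm{SD}(\bar{X}) \approx 0.1265
```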