Sampling
Why We Sample
At the beginning of the term, we talked about populations and samples
What are they? Why do we take samples?
Sampling
Generally, we want to know about the population
But studying/surveying the entire population is problematic!
▪ Too costly
▪ May be impossible!
Sampling
So, we typically study samples rather than entire populations
But we are not usually interested in the sample itself
We hope that the sample will give us insight into the population
Sampling
Starting here, we will look at the relationship between samples and populations
▪ What we can learn
▪ How precise/reliable the information is
Sample Mean vs. Population Mean
Suppose we were interested in knowing the average travel time for students coming to Seneca
We don’t want to ask every Seneca student
So, we take a sample
We hope that the sample mean will give us insight into the population mean
Sample Mean vs. Population Mean
Will the sample mean be exactly equal to the population mean?
No, because it depends on exactly who winds up in our sample
Sample Mean vs. Population Mean
Will the sample mean be the same for every sample?
No, because it depends on exactly who winds up in our sample
Let’s Try This!
Get into groups (samples) of two, and calculate your average travel time
Key Points
1. The sample mean is RANDOM
▪ Depends on exactly who winds up in the sample
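To see this randomness in a form you can rerun, here is a minimal Python sketch of the group activity. The population is a made-up list of 12 travel times (hypothetical numbers, not real Seneca data); the code draws several random "groups of two" and prints each group's mean next to the population mean.

```python
import random
import statistics

# Hypothetical population: one-way travel times (minutes) for 12 students.
# These numbers are invented for illustration only.
population = [10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 75, 90]

random.seed(1)  # fixed seed so the run is repeatable
sample_means = []
for group in range(5):
    sample = random.sample(population, 2)  # one "group of two", drawn without replacement
    m = statistics.mean(sample)
    sample_means.append(m)
    print(f"group {group + 1}: travel times {sample} -> sample mean {m}")

print("population mean:", statistics.mean(population))
```

Each group's mean depends on which two students were drawn, so the means differ from one another and from the population mean.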
Sample Mean vs. Population Mean
Do these samples give us reliable estimates of the population mean?
Not very: the samples are VERY SMALL -> subject to a great deal of randomness
Let’s Try It Again…
Groups of 3
Let’s Try It Again…
Groups of 5
Let’s Try It Again…
Groups of 10
Key Points
1. The sample mean is RANDOM
▪ Depends on exactly who winds up in the sample
2. The larger the sample, the more likely that the sample mean will be close to the population mean
▪ In larger samples, the randomness tends to ‘average out’, meaning less random fluctuation from sample to sample
▪ Larger samples give more reliable results
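The "groups of 2, 3, 5, 10" exercise can be repeated thousands of times by computer. This is a minimal sketch that assumes a hypothetical, right-skewed population of 500 travel times (invented data); for each group size it draws many samples and measures how much the sample mean fluctuates (its standard deviation across samples).

```python
import random
import statistics

# Hypothetical population: 500 right-skewed travel times in minutes (invented data).
random.seed(42)
population = [5 + random.expovariate(1 / 35) for _ in range(500)]

def sd_of_sample_means(n, repetitions=4000):
    """How much the sample mean fluctuates from sample to sample, for samples of size n."""
    means = [statistics.fmean(random.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(means)

spread = {n: sd_of_sample_means(n) for n in (2, 3, 5, 10)}
print(f"population mean: {statistics.fmean(population):.1f} minutes")
for n, s in spread.items():
    print(f"groups of {n:2d}: sample means vary with an SD of about {s:.1f} minutes")
```

The spread shrinks as the group size grows, which is the "randomness averages out" idea in numbers.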
Implications
Because the sample mean is random, we can describe it using a probability distribution
▪ I.e., each possible range of sample means has some probability
▪ And we can ask, ‘what is the probability that we get a sample mean in the range ______?’
This distribution is called the ‘sampling distribution’
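One way to make "the probability that we get a sample mean in the range ___" concrete is to approximate the sampling distribution by brute force. This is a minimal Monte Carlo sketch, assuming a hypothetical population of 500 travel times spread evenly between 5 and 90 minutes (invented data), and a range of 40 to 55 minutes chosen only for illustration.

```python
import random
import statistics

# Hypothetical population of 500 travel times (minutes); invented for illustration.
random.seed(7)
population = [random.uniform(5, 90) for _ in range(500)]

def prob_mean_in_range(n, low, high, repetitions=10_000):
    """Monte Carlo estimate of P(low <= sample mean <= high) for samples of size n."""
    hits = 0
    for _ in range(repetitions):
        m = statistics.fmean(random.sample(population, n))
        if low <= m <= high:
            hits += 1
    return hits / repetitions

p = prob_mean_in_range(10, 40, 55)
print(f"Estimated P(40 <= sample mean <= 55) for n = 10: {p:.3f}")
```

The fraction of simulated samples whose mean lands in the range estimates the probability; the sampling distribution lets us get that number without simulating.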
What Does the Sampling Distribution Look Like?
Depending on the actual raw data distribution, the distribution of the sample mean can have many different shapes
In the next slide, we look at three different data distributions, and what the distribution of the sample means looks like
▪ When sample size, n, = 2
What Does the Sampling Distribution Look Like?
Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition
[Figure: Raw data distributions (three shapes) and the distribution of the sample mean, n = 2]
What Does the Sampling Distribution Look Like?
Those distributions look strange!
But as sample size increases, wonderful things happen:
First, the sample mean gets more accurate
▪ The distribution gets narrower
▪ I.e., the probability of getting a sample mean far from the real population mean gets lower
Second, the distribution changes shape
What Does the Sampling Distribution Look Like?
Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition
[Figure: Raw data distributions (three shapes) and the distribution of the sample mean when n = 2, when n = 10, and when n = 30]
Central Limit Theorem
As we take larger samples, the distribution of the sample mean approaches the normal distribution!
▪ (Almost) regardless of the shape of the actual data!
Because of this, we can use what we have learned about the normal distribution to, e.g., judge how reliable/accurate our sample results are!
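A quick way to watch the Central Limit Theorem at work is to track skewness (lopsidedness) of the sample means, which is 0 for a normal distribution. This minimal sketch assumes hypothetical, strongly right-skewed raw data (exponential with mean 30, whose skewness is 2) and uses the same sample sizes as the figure (n = 2, 10, 30).

```python
import random
import statistics

def skewness(values):
    """Sample skewness: near 0 for a symmetric shape such as the normal, > 0 for a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

# Hypothetical raw data: strongly right-skewed (exponential, mean 30).
random.seed(3)

def draw():
    return random.expovariate(1 / 30)

skew_of_means = {}
for n in (2, 10, 30):
    means = [statistics.fmean(draw() for _ in range(n)) for _ in range(10_000)]
    skew_of_means[n] = skewness(means)
    print(f"n = {n:2d}: skewness of the sample means = {skew_of_means[n]:.2f}")
```

For exponential data the theoretical skewness of the sample mean is 2/sqrt(n), so the printed values should fall toward 0 as n grows, even though the raw data are far from normal.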
T-Distribution
As discussed, if the sample size is large, the sampling distribution approaches the normal distribution
But it’s not exactly equal to the normal distribution
▪ Especially if n is small!
For this reason, we use another, closely related distribution: the t-distribution
T-Distribution
The t-distribution takes sample size into account
It is wider and flatter than the normal distribution. The smaller the sample, the wider and flatter!
▪ Reflecting that the information is less reliable
▪ I.e., that we are more likely to get a result far from the real population mean
[Figure: Density curves of the Normal distribution, T (n = 2), and T (n = 4); horizontal axis from -3 to 3, vertical axis from 0 to 0.4]
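The "wider and flatter" shape can be checked numerically by comparing curve heights. This minimal sketch uses the standard textbook formula for the t density and Python's `statistics.NormalDist` for the standard normal; it labels curves by degrees of freedom (df), an assumption about how the figure's "n" maps to the curves, since df = n – 1 as the next slide explains.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)  # log-gamma avoids overflow for large df
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()  # standard normal: mean 0, SD 1
print(f"{'curve':>10} {'height at 0':>12} {'height at 3':>12}")
print(f"{'normal':>10} {normal.pdf(0):12.4f} {normal.pdf(3):12.4f}")
for df in (1, 3, 10, 30):
    print(f"{'t, df=' + str(df):>10} {t_pdf(0, df):12.4f} {t_pdf(3, df):12.4f}")
```

Every t curve is lower than the normal at the center (flatter) and higher out at 3 (fatter tails, so wider), and the gap closes as df grows.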
T-Distribution
To use the t-distribution, we need to provide the degrees of freedom
This is just n – 1
▪ (Sample size – 1)
Understanding Sample Mean
We can use the t-distribution to determine the probability of getting a mean in a given range, in the same way we used the normal distribution to find the probability of getting a value in a certain range
Solving Sample Questions
When using t, there is no built-in ‘one-step’ function like norm.dist
2-step process:
1. Convert the x-value(s) into t-scores
▪ Like z-scores!
2. Use the t-score(s) to look up the probability
▪ Using t.dist
▪ And the same structure: ‘Less than’ -> t.dist; ‘Greater than’ -> 1 – t.dist; ‘Between’ -> t.dist(big) – t.dist(small)
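The same three cases can be written as formulas. Here F_{df} is notation introduced for this sketch (not from the slides) for the cumulative t-distribution, i.e. what =t.dist(t-score, degrees of freedom, True) returns, and t_1, t_2 are the t-scores of the two cut-offs x_1 < x_2.

```latex
\begin{aligned}
t &= \frac{\bar{x} - \mu}{\mathrm{SD}/\sqrt{n}}, \qquad df = n - 1 \\
P(\bar{X} < x) &= F_{df}(t) \\
P(\bar{X} > x) &= 1 - F_{df}(t) \\
P(x_1 < \bar{X} < x_2) &= F_{df}(t_2) - F_{df}(t_1)
\end{aligned}
```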
Step 1: T-score
Recall: z = (value – mean)/SD
T-score: t = (value – mean)/(SD/sqrt(n))
Divide the standard deviation by the square root of the sample size
• The bigger the sample size, the bigger the number you divide SD by
• -> Smaller SD of the sample mean -> less spread out/more accurate!
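The t-score formula translates directly into code. This minimal sketch reuses the numbers from the heights example later in this section (sample mean 180, mean 176, SD 7.1) and varies only the sample size, to show SD/sqrt(n) shrinking and the same 4 cm gap becoming a larger t-score. The function name `t_score` is just an illustrative choice.

```python
import math

def t_score(sample_mean, population_mean, sd, n):
    """t = (value - mean) / (SD / sqrt(n))."""
    standard_error = sd / math.sqrt(n)  # SD of the sample mean
    return (sample_mean - population_mean) / standard_error

for n in (2, 5, 15, 30):
    se = 7.1 / math.sqrt(n)
    print(f"n = {n:2d}: SD / sqrt(n) = {se:.3f}, t = {t_score(180, 176, 7.1, n):.3f}")
```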
Step 2: Calculate Probability
=t.dist(t-score, degrees of freedom, True)
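Outside Excel, the cumulative t probability can be computed by integrating the t density. This is a minimal sketch of what =t.dist(x, df, True) returns, namely P(T ≤ x); it uses Simpson's rule plus the symmetry of the t-distribution around 0, which is an illustrative method and not Excel's internal algorithm.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_dist_cumulative(x, df, steps=2000):
    """Sketch of =t.dist(x, df, True): P(T <= x).

    Integrates the density from 0 to |x| with Simpson's rule (steps must be even),
    then uses symmetry: half of the area lies below 0.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

print(t_dist_cumulative(1.259756607, 4))  # about 0.862
```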
Business Problems
I will walk you through an example, but first, note that we cover this primarily so you will understand what comes later
Direct business applications (or at least, marketing applications) aren’t as common as for other techniques
‘Normal distribution’ Question
Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm. If you select an individual at random, what is the probability that he has a height greater than 180 cm?
=1 – norm.dist(180, 176, 7.1, true) ≈ 0.287
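For comparison, the same single-individual calculation in Python: a minimal sketch using the standard library's `statistics.NormalDist`, whose `cdf` plays the role of norm.dist(…, true).

```python
from statistics import NormalDist

heights = NormalDist(mu=176, sigma=7.1)  # mean 176 cm, SD 7.1 cm
p = 1 - heights.cdf(180)  # 'greater than' -> 1 - cumulative probability
print(round(p, 3))  # 0.287
```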
‘Sampling’ Question
Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm. If you select a random sample of size 5, what is the probability that the mean height is greater than 180 cm?
t = (180-176)/(7.1/sqrt(5)) = 1.259756607
prob = 1 – t.dist(1.259756607, 4, true) ≈ 0.138 (degrees of freedom = n – 1 = 5 – 1 = 4)
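The two steps for this question can be reproduced in Python as a check. This is a minimal sketch: `t_cdf` is a hand-written stand-in for t.dist(…, true) (Simpson's rule plus symmetry), not an Excel or library function, and the degrees of freedom are n – 1 = 4.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), a stand-in for t.dist(x, df, true); steps must be even."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

mean, sd, n, cutoff = 176, 7.1, 5, 180
t = (cutoff - mean) / (sd / math.sqrt(n))  # step 1: t-score
df = n - 1  # degrees of freedom
prob = 1 - t_cdf(t, df)  # step 2: 'greater than' -> 1 - t.dist
print(f"t = {t:.4f}, df = {df}, P(mean > {cutoff}) = {prob:.3f}")  # t = 1.2598, df = 4, P(mean > 180) = 0.138
```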
More practice
Repeat, with:
▪ Sample size of 15
▪ Sample size of 30
What happens to the probability? Why?
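To check your answers, here is a minimal Python sketch that runs the same two steps for n = 5, 15, and 30. As before, `t_cdf` is a hand-written stand-in for t.dist(…, true), and `prob_mean_greater` is a hypothetical helper name used only for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry; steps must be even."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def prob_mean_greater(cutoff, mean, sd, n):
    """P(sample mean > cutoff): t-score first, then 1 - t.dist with n - 1 degrees of freedom."""
    t = (cutoff - mean) / (sd / math.sqrt(n))
    return 1 - t_cdf(t, n - 1)

probs = {n: prob_mean_greater(180, 176, 7.1, n) for n in (5, 15, 30)}
for n, p in probs.items():
    print(f"n = {n:2d}: P(mean height > 180 cm) = {p:.4f}")
```

Because SD/sqrt(n) shrinks as n grows, the same 180 cm cut-off sits more standard errors above 176, so the probability falls.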