Sampling

Why We Sample

At the beginning of the term, we talked about populations and samples
▪ What are they?
▪ Why do we take samples?

Sampling

Generally, we want to know about the population
But studying/surveying the entire population is problematic!
▪ Too costly
▪ May be impossible!

Sampling

So, we typically study samples rather than entire populations
But we are not usually interested in the sample itself
We hope that the sample will give us insight into the population

Sampling

Starting here, we will look at the relationship between samples and populations
▪ What we can learn
▪ How precise/reliable the information is

Sample Mean vs. Population Mean

Suppose we were interested in knowing the average travel time for students coming to Seneca
We don’t want to ask every Seneca student
So, we take a sample
We hope that the sample mean will give us insight into the population mean


Will the sample mean be exactly equal to the population mean?
▪ No, because it depends on exactly who winds up in our sample


Will the sample mean be the same for every sample?
▪ No, because it depends on exactly who winds up in our sample

Let’s Try This!

Get into groups (samples) of two, and calculate your average travel time

Key Points

1. The sample mean is RANDOM
▪ Depends on exactly who winds up in the sample


Do these samples give us reliable estimates of the population mean?
▪ VERY SMALL -> Subject to a great deal of randomness

Let’s Try It Again…

Groups of 3


Groups of 5


Groups of 10

Key Points

1. The sample mean is RANDOM
▪ Depends on exactly who winds up in the sample

2. The larger the sample, the more likely that the sample mean will be close to the population mean
▪ In larger samples, the randomness tends to ‘average out’, meaning less random fluctuation from sample to sample
▪ Larger samples give more reliable results
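
The group exercise can also be imitated on a computer. The sketch below is only an illustration: the population of travel times is made up (uniform between 5 and 90 minutes), and the function name `spread_of_sample_means` is a hypothetical helper, not something from the course. It draws many random samples of size 2, 3, 5 and 10 and measures how much the sample mean moves around from sample to sample.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 2,000 made-up travel times in minutes.
population = [random.uniform(5, 90) for _ in range(2000)]

def spread_of_sample_means(n, draws=2000):
    """Standard deviation of the sample mean across many random samples of size n."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(draws)]
    return statistics.stdev(means)

# Same group sizes as the in-class exercise.
spreads = {n: spread_of_sample_means(n) for n in (2, 3, 5, 10)}
for n, s in spreads.items():
    print(f"n={n:2d}: sample means vary with a standard deviation of about {s:.1f} minutes")
```

Under these assumptions the spread should shrink as n grows, which is the 'averaging out' described in Key Point 2.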

Implications

Because the sample mean is random, we can describe it using a probability distribution
▪ I.e., for any given sample mean, there is some probability
▪ And we can talk about, ‘what is the probability that we get a sample mean in the range ______?’

This distribution is called the ‘sampling distribution’
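
One way to see what 'the probability that we get a sample mean in the range ______' means is to estimate it by brute force. The sketch below assumes a made-up population (roughly normal, mean 40 minutes, SD 15 minutes) and a hypothetical helper `prob_mean_in_range`; it simply counts how often the mean of a random sample lands between 35 and 45.

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

# Hypothetical population of travel times (minutes); not real Seneca data.
population = [random.gauss(40, 15) for _ in range(5000)]

def prob_mean_in_range(n, low, high, draws=5000):
    """Share of random samples of size n whose mean lands in [low, high]."""
    hits = 0
    for _ in range(draws):
        sample_mean = statistics.mean(random.sample(population, n))
        if low <= sample_mean <= high:
            hits += 1
    return hits / draws

p_small = prob_mean_in_range(n=2, low=35, high=45)
p_large = prob_mean_in_range(n=10, low=35, high=45)
print(f"P(35 <= mean <= 45) with n=2:  {p_small:.2f}")
print(f"P(35 <= mean <= 45) with n=10: {p_large:.2f}")
```

The collection of all those simulated sample means is an approximation of the sampling distribution; the two probabilities are read off from it.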

What Does the Sampling Distribution Look Like?

Depending on the actual raw data distribution, the distribution of the sample mean can have many different shapes
In the next slide, we look at three different data distributions, and what the distribution of the sample means looks like
▪ When sample size, n, = 2


Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition

[Figure panels: Raw Data; Distribution of Sample Mean, n=2]


Those distributions look strange!

But, as sample size increases, wonderful things happen:
First, the sample mean gets more accurate
▪ The distribution gets narrower
▪ I.e., the probability of getting a sample mean far from the real population mean is low
Second, the distribution changes shape


Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition

[Figure panels: Raw Data; Distribution of Sample Mean when n=2, when n=10, when n=30]

Central Limit Theorem

As we take larger samples, the distribution of the sample mean approaches the normal distribution!
▪ (Almost) regardless of the shape of the actual data!

Because of this, we can use what we have learned about the normal distribution to, e.g., judge how reliable/accurate our sample results are!
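
The Central Limit Theorem can be checked with a small simulation. The sketch below assumes a strongly right-skewed raw data distribution (exponential, chosen only for illustration) and uses skewness as a rough 'how lopsided is it' score, where 0 means symmetric like the normal distribution. The helper names are hypothetical.

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is repeatable

def skewness(values):
    """Skewness: 0 for a symmetric shape, positive for a long right tail."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, draws=4000):
    """Means of `draws` samples of size n from a right-skewed (exponential) population."""
    return [statistics.mean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(draws)]

# Same sample sizes as the figure: n = 2, 10, 30.
skews = {n: skewness(sample_means(n)) for n in (2, 10, 30)}
for n, sk in skews.items():
    print(f"n={n:2d}: skewness of the sample means = {sk:.2f}")
```

Under these assumptions the skewness should move toward 0 as n increases, i.e. the distribution of the sample mean looks more and more like the symmetric normal curve even though the raw data are lopsided.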

T-Distribution

As discussed, if the sample size is large, the sampling distribution approaches the normal distribution
But it’s not exactly equal to the normal distribution
▪ Especially if n is small!

For this reason, we have another distribution that we use, which is closely related

T-Distribution

T distribution takes sample size into account

T is wider and flatter than normal
The smaller the sample, the wider and flatter!
▪ Reflecting that the information is less reliable
▪ I.e., that we are more likely to get a result far from the real population mean

[Figure: density curves for Normal; T, n=2; T, n=4. Horizontal axis from -3 to 3, vertical axis from 0 to 0.4]
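
'Wider and flatter' can be put into numbers by comparing the height of each curve at the centre (0) and out in the tail (3). The sketch below writes the two density formulas by hand; the function names are hypothetical, and the t curve's shape parameter is taken to be n - 1 for a sample of size n.

```python
import math

def normal_pdf(x):
    """Height of the standard normal curve at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Height of the t curve at x, where df = n - 1 for a sample of size n."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

for x in (0, 3):
    print(f"x={x}: normal={normal_pdf(x):.4f}, "
          f"t (n=4)={t_pdf(x, 3):.4f}, t (n=2)={t_pdf(x, 1):.4f}")
```

At the centre the normal curve is the tallest and the n=2 curve the lowest (flatter); at x=3 the order reverses (wider tails), which matches the picture.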

T-Distribution

To use the t-distribution, we need to provide degrees of freedom
▪ This is just n – 1
▪ (Sample size – 1)

Understanding Sample Mean

We can use the t-distribution to determine the probability of getting a mean in a given range, in the same way we used the normal distribution to find the probability of getting a value in a certain range

Solving Sample Questions

When using t, there is no built-in ‘one-step’ function like norm.dist
2-step process
1. Convert the x-value(s) into t-scores
▪ Like z-scores!
2. Use the t-score(s) to look up the probability
▪ Using t.dist
▪ And the same structure: ‘Less than’ -> t.dist; ‘Greater than’ -> 1-t.dist; ‘Between’ -> t.dist(big) – t.dist(small)

Step 1: T-score

Recall: z = (value – mean)/SD

T-score: t = (value – mean)/(SD/sqrt(n))

Divide standard deviation by square root of sample size
• The bigger the sample size, the bigger number you divide SD by
• -> Smaller SD -> less spread out/more accurate!
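
A quick way to see the effect of dividing by sqrt(n) is to compute the t-score for the same question at several sample sizes. The numbers below (mean 50, SD 12, value 55) are made up for illustration, and `t_score` is a hypothetical helper that just encodes the formula above.

```python
import math

def t_score(value, mean, sd, n):
    """t = (value - mean) / (sd / sqrt(n))."""
    return (value - mean) / (sd / math.sqrt(n))

# Hypothetical numbers: mean 50, SD 12, and we ask about a sample mean of 55.
for n in (4, 16, 64):
    adjusted_sd = 12 / math.sqrt(n)          # SD after dividing by sqrt(n)
    t = t_score(55, 50, 12, n)
    print(f"n={n:2d}: SD/sqrt(n) = {adjusted_sd:.2f}, t-score = {t:.2f}")
```

Each time n is multiplied by 4, SD/sqrt(n) is cut in half and the t-score doubles, so the same value of 55 sits further out in the tail.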

Step 2: Calculate Probability

=t.dist(t-score, degrees of freedom, True)
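
For anyone working outside Excel, the same lookup can be sketched in Python. This is not how Excel computes t.dist internally; it is a plain numerical approximation (Simpson's rule applied to the t density), and `t_pdf` and `t_cdf` are hypothetical helper names. The example t-scores and the sample size of 10 are made up.

```python
import math

def t_pdf(x, df):
    """Height of the t curve at x with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): plays the role of t.dist(t, df, True). steps must be even."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the probability lies below 0; add the area between 0 and t.
    return 0.5 + total * h / 3

df = 10 - 1                                   # hypothetical sample of size 10
less_than = t_cdf(1.5, df)                    # 'Less than'    -> t.dist
greater_than = 1 - t_cdf(1.5, df)             # 'Greater than' -> 1 - t.dist
between = t_cdf(1.5, df) - t_cdf(-0.5, df)    # 'Between'      -> t.dist(big) - t.dist(small)
print(round(less_than, 3), round(greater_than, 3), round(between, 3))
```

The three lines at the bottom follow the same 'less than / greater than / between' structure used with norm.dist.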

Business Problems

I will walk you through an example, but first, we note that we cover this primarily so you will understand what comes later
▪ Direct business applications (or at least, marketing applications) aren’t as common as for other techniques

‘Normal distribution’ Question

Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm.
▪ If you select an individual at random, what is the probability that he has a height greater than 180 cm?

=1 – norm.dist(180, 176, 7.1, true) ≈ 0.287
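
The same calculation can be cross-checked outside Excel. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for norm.dist; the variable names are illustrative only.

```python
from statistics import NormalDist

# Heights: normal with mean 176 cm and standard deviation 7.1 cm.
heights = NormalDist(mu=176, sigma=7.1)

# 'Greater than' -> 1 minus the cumulative probability up to 180 cm.
p_taller = 1 - heights.cdf(180)
print(f"P(height > 180) = {p_taller:.3f}")
```

The result should agree with the Excel answer to three decimal places.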

‘Sampling’ Question

Heights for a particular segment are normally distributed, with an average of 176 cm, and a standard deviation of 7.1 cm.
▪ If you select a random sample of size 5, what is the probability that the mean height is greater than 180 cm?

t = (180-176)/(7.1/sqrt(5)) = 1.259756607

prob = 1 – t.dist(1.259756607, 4, true) ≈ 0.138
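
Both steps of this example can be reproduced in Python as a check. As before, this is a sketch under assumptions: `t_cdf` is a hand-written numerical approximation (Simpson's rule) standing in for Excel's t.dist, not Excel's own method, and the degrees of freedom are n - 1 = 4.

```python
import math

def t_pdf(x, df):
    """Height of the t curve at x with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) by Simpson's rule; stands in for t.dist(t, df, true)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

mean, sd, n, cutoff = 176, 7.1, 5, 180
t = (cutoff - mean) / (sd / math.sqrt(n))   # Step 1: t-score
prob = 1 - t_cdf(t, n - 1)                  # Step 2: 'greater than' -> 1 - t.dist, df = n - 1
print(f"t = {t:.4f}, P(mean > {cutoff}) = {prob:.3f}")
```

Changing `n` in the last block is enough to rerun the calculation for other sample sizes.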

More practice

Repeat, with:
▪ Sample size of 15
▪ Sample size of 30

What happens to the probability? Why?
