Sampling

Uploaded by posy-mcbride on 13-Dec-2015


Why We Sample

At the beginning of the term, we talked about populations and samples. What are they? Why do we take samples?


Sampling

Generally, we want to know about the population. But studying/surveying the entire population is problematic!
▪ Too costly
▪ May be impossible!


So, we typically study samples rather than entire populations. But we are not usually interested in the sample itself; we hope that the sample will give us insight into the population.


Starting here, we will look at the relationship between samples and populations:
▪ What we can learn
▪ How precise/reliable the information is


Sample Mean vs. Population Mean

Suppose we were interested in knowing the average travel time for students coming to Seneca. We don’t want to ask every Seneca student, so we take a sample. We hope that the sample mean will give us insight into the population mean.


Sample Mean vs. Population Mean

Will the sample mean be exactly equal to the population mean? No, because it depends on exactly who winds up in our sample.


Will the sample mean be the same for every sample? No, because it depends on exactly who winds up in our sample.
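To see this randomness without a classroom, here is a minimal Python simulation. It assumes a made-up population of 5,000 travel times spread evenly between 5 and 90 minutes; these numbers are hypothetical, not Seneca data. Each call to the illustrative helper `sample_mean` draws a fresh random sample of students and averages it, so different "groups of two" give different means.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 5,000 travel times in minutes (made-up data)
population = [rng.uniform(5, 90) for _ in range(5000)]
population_mean = statistics.fmean(population)


def sample_mean(n):
    """Mean travel time of one random sample of n different students."""
    return statistics.fmean(rng.sample(population, n))


# Five different "groups of two", each with its own sample mean
group_means = [sample_mean(2) for _ in range(5)]

print(f"population mean: {population_mean:.1f} min")
for i, m in enumerate(group_means, start=1):
    print(f"sample {i} mean:   {m:.1f} min")
```

Changing the seed changes which students land in each sample, and therefore every sample mean.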


Let’s Try This!

Get into groups (samples) of two and calculate your group’s average travel time



Sample Mean vs. Population Mean

Do these samples give us reliable estimates of the population mean? Not very: these samples are VERY SMALL, so they are subject to a great deal of randomness.


Let’s Try It Again…

Groups of 3


Groups of 5


Groups of 10


Key Points

1. The sample mean is RANDOM
▪ It depends on exactly who winds up in the sample

2. The larger the sample, the more likely it is that the sample mean will be close to the population mean
▪ In larger samples, the randomness tends to ‘average out’, meaning less random fluctuation from sample to sample
▪ Larger samples give more reliable results
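Key point 2 can be checked by simulation. The sketch below assumes the same kind of made-up travel-time population (5,000 values between 5 and 90 minutes) and measures "random fluctuation from sample to sample" as the standard deviation of many simulated sample means. The helper name `spread_of_sample_means` and the group sizes are illustrative choices matching the classroom exercise.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 5,000 travel times in minutes (made-up data)
population = [rng.uniform(5, 90) for _ in range(5000)]


def spread_of_sample_means(n, repetitions=4000):
    """Standard deviation of many sample means, each from a sample of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(means)


spreads = {n: spread_of_sample_means(n) for n in (2, 3, 5, 10)}
for n, s in spreads.items():
    print(f"groups of {n:2d}: sample means typically vary by about {s:.1f} min")
```

The printed spread shrinks as the group size grows, which is the 'averaging out' described above.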


Implications

Because the sample mean is random, we can describe it using a probability distribution
▪ I.e., each possible range of sample means has some probability
▪ And we can ask, ‘What is the probability that we get a sample mean in the range ______?’

This distribution is called the ‘sampling distribution’.
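One way to make the sampling distribution concrete is brute force: draw many samples, keep each sample mean, and count how often the mean lands in the range of interest. The sketch below does this for a made-up travel-time population; the helper name `estimate_probability`, the sample size of 10 and the range (within 5 minutes of the population mean) are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population: 5,000 travel times in minutes (made-up data)
population = [rng.uniform(5, 90) for _ in range(5000)]
mu = statistics.fmean(population)


def estimate_probability(n, low, high, repetitions=10000):
    """Fraction of simulated sample means (sample size n) that fall in [low, high]."""
    hits = 0
    for _ in range(repetitions):
        m = statistics.fmean(rng.sample(population, n))
        if low <= m <= high:
            hits += 1
    return hits / repetitions


p = estimate_probability(10, mu - 5, mu + 5)
print(f"P(sample mean within 5 min of the population mean) approx. {p:.2f}")
```

The slides that follow replace this simulation with a formula-based answer from a known distribution.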


What Does the Sampling Distribution Look Like?

Depending on the actual raw data distribution, the distribution of the sample mean can have many different shapes. In the next slide, we look at three different data distributions and what the distribution of the sample means looks like
▪ When the sample size is n = 2


Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition

[Figure: panels “Raw Data” and “Distribution of Sample Mean, n=2”]


Those distributions look strange!

But as sample size increases, wonderful things happen. First, the sample mean gets more accurate
▪ The distribution gets narrower
▪ I.e., the probability of getting a sample mean far from the real population mean is low

Second, the distribution changes shape


Source: Dawson B, Trapp RG: Basic & Clinical Biostatistics, 4th edition

[Figure: panels “Raw Data” and “Distribution of Sample Mean” when n=2, n=10, and n=30]


Central Limit Theorem

As we take larger samples, the distribution of the sample mean approaches the normal distribution!
▪ (Almost) regardless of the shape of the actual data!

Because of this, we can use what we have learned about the normal distribution to, e.g., judge how reliable/accurate our sample results are!
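The Central Limit Theorem can be watched in action with a short simulation. The sketch below assumes an exponential population with a mean of 20 (a long right tail, nothing like a bell curve), chosen only for illustration, and measures how lopsided the distribution of sample means is via its skewness, which is 0 for a normal distribution. For an exponential population the skewness of the sample mean shrinks like 2/√n, so the printed values should fall toward 0 as n grows. The helper names are illustrative.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable


def skewness(values):
    """Sample skewness: about 0 for a symmetric, bell-shaped distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def simulated_means(n, repetitions=8000):
    """Means of many samples of size n from a right-skewed (exponential) population."""
    return [
        statistics.fmean(rng.expovariate(1 / 20) for _ in range(n))
        for _ in range(repetitions)
    ]


skews = {n: skewness(simulated_means(n)) for n in (1, 2, 10, 30)}
for n, sk in skews.items():
    print(f"n = {n:2d}: skewness of the sample means = {sk:.2f}")
```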


T-Distribution

As discussed, if the sample size is large, the sampling distribution approaches the normal distribution. But it's not exactly equal to the normal distribution
▪ Especially if n is small!

For this reason, we use another, closely related distribution: the t-distribution.


The t-distribution takes sample size into account.

The t-distribution is wider and flatter than the normal distribution. The smaller the sample, the wider and flatter!
▪ Reflecting that the information is less reliable
▪ I.e., that we are more likely to get a result far from the real population mean

[Figure: density curves for the Normal distribution and for T with n=2 and n=4, plotted from -3 to 3 on the horizontal axis, density from 0 to 0.4 on the vertical axis]
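The 'wider and flatter' claim can be checked numerically. The sketch below uses the standard formula for the t density (written with `math.lgamma` to avoid overflow) and compares the height of each curve at the centre (0) and out in the tail (3). It assumes, as the next slide explains, that the degrees of freedom are n – 1, so the curves labelled n=2 and n=4 are taken to have 1 and 3 degrees of freedom; `t_pdf` is an illustrative name.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


normal = NormalDist()  # standard normal: mean 0, SD 1

print(f"{'curve':>10} {'height at 0':>12} {'height at 3':>12}")
print(f"{'normal':>10} {normal.pdf(0):12.4f} {normal.pdf(3):12.4f}")
for n in (2, 4, 30):
    df = n - 1  # degrees of freedom = sample size - 1
    label = f"t, n={n}"
    print(f"{label:>10} {t_pdf(0, df):12.4f} {t_pdf(3, df):12.4f}")
```

Smaller samples give a lower peak and a fatter tail; by n = 30 the t curve is already close to the normal one.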


To use the t-distribution, we need to provide the degrees of freedom. This is just n – 1 (sample size – 1).


Understanding Sample Mean

We can use the t-distribution to determine the probability of getting a mean in a given range, in the same way we used the normal distribution to find the probability of getting a value in a certain range


Solving Sample Questions

When using t, there is no built-in ‘one-step’ function like norm.dist. Instead, it is a 2-step process:

1. Convert the x-value(s) into t-scores
▪ Like z-scores!

2. Use the t-score(s) to look up the probability
▪ Using t.dist
▪ With the same structure as before: ‘Less than’ -> t.dist; ‘Greater than’ -> 1 – t.dist; ‘Between’ -> t.dist(big) – t.dist(small)
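Because the three cases always follow the same pattern, they can be written once and reused with any CDF. The Python sketch below is an illustration, not Excel: `t_cdf_df2` is the exact closed-form t CDF for 2 degrees of freedom only (a sample of n = 3), standing in for t.dist; for other sample sizes you would plug in a general t CDF instead. The function names and the question's numbers (mean 50, SD 10, n = 3) are hypothetical.

```python
import math


def t_cdf_df2(t):
    """Exact CDF of the t-distribution with 2 degrees of freedom (samples of n = 3)."""
    return 0.5 + t / (2 * math.sqrt(2 + t * t))


def probability(kind, cdf, t1, t2=None):
    """Turn t-score(s) into a probability using the three cases above."""
    if kind == "less":       # P(T < t1)                -> t.dist(t1)
        return cdf(t1)
    if kind == "greater":    # P(T > t1)                -> 1 - t.dist(t1)
        return 1 - cdf(t1)
    if kind == "between":    # P(t1 < T < t2), t1 < t2  -> t.dist(big) - t.dist(small)
        return cdf(t2) - cdf(t1)
    raise ValueError(f"unknown kind: {kind!r}")


# Hypothetical question: population mean 50, SD 10, sample size n = 3
se = 10 / math.sqrt(3)
t_small = (45 - 50) / se   # Step 1 for x = 45
t_big = (55 - 50) / se     # Step 1 for x = 55

p_less = probability("less", t_cdf_df2, t_big)
p_greater = probability("greater", t_cdf_df2, t_big)
p_between = probability("between", t_cdf_df2, t_small, t_big)
print(round(p_less, 3), round(p_greater, 3), round(p_between, 3))
```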


Step 1: T-score

Recall: z = (value – mean)/SD

T-score: t = (value – mean)/(SD/sqrt(n))

Divide the standard deviation by the square root of the sample size
• The bigger the sample size, the bigger the number you divide SD by
• -> Smaller SD -> less spread out/more accurate!
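In Python, the Step 1 formula is a one-liner. The sketch below (function names are illustrative) also prints the shrunken SD, SD/√n, for a few sample sizes, using the height numbers from the questions later in this deck (mean 176 cm, SD 7.1 cm, cutoff 180 cm). With n = 1 the t-score is just the z-score.

```python
import math


def standard_error(sd, n):
    """SD of the sample mean: the SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)


def t_score(value, mean, sd, n):
    """t = (value - mean) / (SD / sqrt(n))"""
    return (value - mean) / standard_error(sd, n)


for n in (1, 5, 15, 30):
    print(f"n = {n:2d}: SD / sqrt(n) = {standard_error(7.1, n):.3f} cm, "
          f"t-score for 180 cm = {t_score(180, 176, 7.1, n):.3f}")
```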


Step 2: Calculate Probability

=t.dist(t-score, degrees of freedom, True)
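Outside Excel, a minimal Python stand-in for =t.dist(t-score, degrees of freedom, TRUE) can integrate the t density numerically. This is our own sketch using Simpson's rule plus the symmetry of the t-distribution about 0, not how Excel computes it; `t_pdf` and `t_cdf` are illustrative names.

```python
import math


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] (steps must be even), then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# Check against the closed form for 1 degree of freedom: 0.5 + atan(1)/pi = 0.75
print(round(t_cdf(1.0, 1), 6))  # prints 0.75
```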


Business Problems

I will walk you through an example, but first, note that we cover this primarily so you will understand what comes later. Direct business applications (or at least, marketing applications) aren’t as common as for other techniques.


‘Normal distribution’ Question

Heights for a particular segment are normally distributed, with an average of 176 cm and a standard deviation of 7.1 cm. If you select an individual at random, what is the probability that he has a height greater than 180 cm?

=1 – norm.dist(180, 176, 7.1, true) ≈ 0.287
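For readers working outside Excel, Python's standard library offers the same one-step calculation through `statistics.NormalDist` (Python 3.8+). A minimal sketch of the 'greater than' case:

```python
from statistics import NormalDist

heights = NormalDist(mu=176, sigma=7.1)  # individual heights, in cm

# 'Greater than' -> 1 - CDF, the analogue of =1 - norm.dist(180, 176, 7.1, true)
p_taller = 1 - heights.cdf(180)
print(round(p_taller, 3))  # prints 0.287
```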


‘Sampling’ Question

Heights for a particular segment are normally distributed, with an average of 176 cm and a standard deviation of 7.1 cm. If you select a random sample of size 5, what is the probability that the mean height is greater than 180 cm?

t = (180 – 176)/(7.1/sqrt(5)) = 1.259756607

prob = 1 – t.dist(1.259756607, 4, true) ≈ 0.138, using degrees of freedom = n – 1 = 5 – 1 = 4
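As a cross-check, the t CDF with 4 degrees of freedom (n = 5) has a known closed form, so the answer can be reproduced in a few lines of Python without Excel. This sketch only works for 4 degrees of freedom; the name `t_cdf_df4` is ours.

```python
import math


def t_cdf_df4(t):
    """Exact t-distribution CDF for 4 degrees of freedom (a sample of n = 5)."""
    u = t / math.sqrt(1 + t * t / 4)
    return 0.5 + 0.375 * u * (1 - u * u / 12)


n, mean, sd, cutoff = 5, 176, 7.1, 180
t = (cutoff - mean) / (sd / math.sqrt(n))   # Step 1: t-score
prob = 1 - t_cdf_df4(t)                     # Step 2: 'greater than' -> 1 - CDF
print(round(t, 4), round(prob, 3))  # prints 1.2598 0.138
```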


More practice

Repeat, with:
▪ Sample size of 15
▪ Sample size of 30

What happens to the probability? Why?