Where Do Data Come From, and Why We Don’t (Always) Trust Statisticians

Upload: gideon-peres

Posted on 14-Dec-2015

Induction vs. Deduction: the gist of statistics

• Deduction: “What is true about the whole, must be true about a part.”

• Induction: “What is true about the part might be true about the whole.”

Population vs. Sample

• The population is the entire group of individuals about which we want information.

• A sample is the part of the population from which we actually collect information.

• We use samples because studying an entire population is often impossible or impractical.

Real-Life Example of a Bad Sample

• Ann Landers, a famous advice columnist, collected a sample of 10,000 people who wrote in to answer this question: “If you could do it all over again, would you have children?”

• 70% of the respondents said that they would not have children.

• The letter writers selected themselves (a voluntary response sample), so people with strong feelings, especially negative ones, were overrepresented.

• When a random sample was taken, 91% of the people said that they would have children.

Potential problems with sample surveys

• Undercoverage occurs when some groups in the population are left out of the process of choosing the sample.

• Nonresponse occurs when an individual chosen for the sample cannot be contacted or refuses to respond.

Another Real-Life Example of a Bad Sample

• In 1936, the Literary Digest mailed out 10,000,000 ballots asking recipients whom they planned to vote for in the presidential election: A. Landon or F. D. Roosevelt.

• 2,300,000 ballots were returned, predicting a strong win (57%) for Landon.

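Another Real-Life Example of a Bad Sample (continued)

• George Gallup surveyed 50,000 people chosen randomly.

• Comparison of forecasts (percentage of the vote for Roosevelt):

– Gallup’s prediction for Roosevelt: 56%

– Gallup’s prediction of the Digest’s result: 44%

– Digest’s prediction for Roosevelt: 43%

– Actual vote: 62%

• The Literary Digest drew its sample from its subscription list, telephone directories, lists of car owners, and club membership lists. During the Depression, people on these lists tended to be wealthier than the average voter, so many Roosevelt supporters were left out (undercoverage).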

Right and Wrong Ways to Sample

• A simple random sample (SRS) is a sample in which (1) each unit of the population has an equal chance of being chosen and (2) all units are chosen independently.

• A sample is biased if some group of individuals has a greater chance of being selected than others.
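A minimal Python sketch of drawing an SRS, assuming the population is available as a list. It uses the standard library's `random.sample`, which draws without replacement so that every group of n units is equally likely to be the sample. The function name `simple_random_sample` and the 10-unit population are illustrative only. Repeating the draw many times checks condition (1): each unit should be picked in about n/N of the draws.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng=random):
    """Draw an SRS of size n: sampling without replacement, so every
    subset of n units has the same chance of being selected."""
    return rng.sample(population, n)

# Hypothetical population of 10 labeled units.
population = list(range(10))
rng = random.Random(42)

# Repeat the draw many times and count how often each unit is chosen.
counts = Counter()
trials = 20_000
for _ in range(trials):
    counts.update(simple_random_sample(population, 3, rng))

# Each unit should appear in roughly n/N = 3/10 = 30% of the draws.
for unit in population:
    print(unit, round(counts[unit] / trials, 3))
```

A biased method would show some units appearing noticeably more often than 30% of the time.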

Example of a good sample

• You want to study the effect of computers on GPA, but you don’t have the resources to study all students.

• To select a sample of students for the study, you:

– Get a list of all students,

– Select students from the list at random,

– Collect information from the selected students,

– Compare those who have a computer with those who don’t.
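The four steps can be sketched in Python under hypothetical assumptions: the roster is simulated (5,000 students, about 60% with a computer, GPAs uniform between 2.0 and 4.0), and `roster` and `compare_groups` are illustrative names, not part of any real study. In this simulated data GPA does not depend on computer ownership, so any gap printed is just sampling variation.

```python
import random
from statistics import mean

rng = random.Random(7)

# Step 1 (hypothetical roster): (student_id, has_computer, gpa), simulated values.
roster = [(i, rng.random() < 0.6, round(rng.uniform(2.0, 4.0), 2))
          for i in range(5_000)]

# Step 2: select students from the list at random (an SRS of 200).
sample = rng.sample(roster, 200)

# Steps 3-4: collect the selected students' data and compare the two groups.
def compare_groups(students):
    with_pc = [gpa for _, has_pc, gpa in students if has_pc]
    without_pc = [gpa for _, has_pc, gpa in students if not has_pc]
    return len(with_pc), mean(with_pc), len(without_pc), mean(without_pc)

n_pc, gpa_pc, n_no, gpa_no = compare_groups(sample)
print(f"with computer:    n={n_pc}, mean GPA={gpa_pc:.2f}")
print(f"without computer: n={n_no}, mean GPA={gpa_no:.2f}")
```

Random selection protects against a biased sample, but because students are not assigned computers, the comparison in step 4 can still be affected by confounding.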

Example of a bad sample

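• You want to study the effect of computers on GPA, but you don’t have the resources to study all students.

• To select a sample of students for the study, you:

– Use your friends,

– Hang an ad in the computer lab,

– Post an online questionnaire on the WKU website.

• Each of these methods gives some groups a greater chance of selection: your friends tend to resemble you, and people who see an ad in a computer lab or answer an online questionnaire are more likely to use computers.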

Stratified Random Sample

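• When we know the proportion of each group (stratum) in the population, a stratified random sample is usually better than an SRS, because it guarantees that every group is represented in its correct proportion.

• In a stratified sample, the number of people chosen from each group is proportional to the size of that group in the population, and the individuals within each group are chosen at random.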
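A minimal Python sketch of a proportional stratified sample, assuming each unit's stratum can be read off with a function. The class-year proportions (40/30/20/10%) and the name `stratified_sample` are hypothetical. The proportions were chosen so that each stratum's share of n = 50 is a whole number; in general, rounding can make the stratum sizes add up to slightly more or less than n.

```python
import random
from collections import Counter

def stratified_sample(population, stratum_of, n, rng=random):
    """Proportional stratified sample: each stratum contributes a share of n
    equal to its share of the population, chosen by an SRS within the stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    total = len(population)
    sample = []
    for units in strata.values():
        # Rounding means the sizes may not add up to exactly n in general.
        k = round(n * len(units) / total)
        sample.extend(rng.sample(units, k))
    return sample

# Hypothetical student body: 40% freshmen, 30% sophomores, 20% juniors, 10% seniors.
years = ["freshman"] * 400 + ["sophomore"] * 300 + ["junior"] * 200 + ["senior"] * 100
students = list(enumerate(years))

sample = stratified_sample(students, lambda s: s[1], 50, random.Random(1))
print(Counter(year for _, year in sample))
# → Counter({'freshman': 20, 'sophomore': 15, 'junior': 10, 'senior': 5})
```

An SRS of 50 from the same students would match these counts only on average; the stratified design fixes them exactly.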

Confounding

• Two explanatory variables are confounded when their effects on the response variable cannot be distinguished from each other.

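• Confounding is often a problem in studies that collect data through sample surveys, even when the sampling is done correctly. For example, students who own computers may also differ from other students in family income, so the effect of owning a computer on GPA cannot be separated from the effect of income.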
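A small simulation can show how confounding misleads a survey comparison. This is a sketch under an entirely hypothetical model: family income is the lurking variable that raises both the chance of owning a computer and expected GPA, while the computer itself has no effect on GPA. The overall gap between owners and non-owners looks real, but it nearly vanishes once students are compared within the same income level.

```python
import random
from statistics import mean

rng = random.Random(3)

def simulate_student():
    # Hypothetical model: higher family income raises BOTH the chance of
    # owning a computer and expected GPA; owning a computer has NO effect on GPA.
    high_income = rng.random() < 0.5
    has_pc = rng.random() < (0.9 if high_income else 0.3)
    gpa = rng.gauss(3.2 if high_income else 2.8, 0.3)
    return high_income, has_pc, gpa

students = [simulate_student() for _ in range(20_000)]

def gpa_gap(group):
    """Mean GPA of computer owners minus mean GPA of non-owners."""
    return (mean(g for _, pc, g in group if pc)
            - mean(g for _, pc, g in group if not pc))

naive_gap = gpa_gap(students)
# Compare like with like: the gap within each income level separately.
gaps = {level: gpa_gap([s for s in students if s[0] == level])
        for level in (False, True)}

print(f"overall gap: {naive_gap:.2f}")
for level, gap in gaps.items():
    print(f"high_income={level}: gap {gap:.2f}")
```

In a real survey the lurking variable is usually unknown or unmeasured, which is why confounding is hard to rule out without an experiment.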

Observation vs. Experiment

• An observational study observes individuals and measures variables but does not attempt to influence the responses.

• An experiment deliberately imposes a treatment on individuals in order to observe their responses.

How to design an Experiment

• The purpose of an experiment is to find out how one variable (the response variable) changes in response to changes in another variable (the explanatory variable).

• Experiment: Subject → Treatment → Response

Placebo Effect

• Placebo effect: a change in subjects’ behavior or response caused simply by taking part in the experiment, not by the treatment itself.

• The placebo effect is a problem when an experiment does not have a control group (a basis for comparison).

• To avoid the problem, design a randomized comparative experiment.

How to design a Randomized Comparative Experiment

• Randomly split the subjects into two groups:

– Control group: receives no real treatment (often a placebo instead),

– Treatment group: receives the treatment.

• Compare the results.

• Both groups are affected equally by the placebo effect, so the difference between the groups shows whether the treatment works.
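The random split in the first step can be sketched in a few lines of Python. This assumes subjects are identified by labels in a list; `randomize` and the subject names are hypothetical. Shuffling and cutting the list in half gives every subject the same chance of landing in either group, so any other differences between subjects tend to balance out between the groups.

```python
import random

def randomize(subjects, rng=random):
    """Randomly split subjects into a control group and a treatment group
    of (nearly) equal size."""
    shuffled = list(subjects)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

subjects = [f"subject_{i}" for i in range(20)]
groups = randomize(subjects, random.Random(0))
print(len(groups["control"]), len(groups["treatment"]))  # → 10 10
```

With an odd number of subjects, the treatment group gets the extra person.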

How to Interpret the Results of an Experiment

• Observe the outcomes for the treatment and control groups.

• If the outcomes differ by so much that a difference this large would rarely occur by chance alone, we conclude that the difference is statistically significant.
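One common way to make “would rarely occur by chance” concrete is a permutation test, used here as an illustration rather than as the method the slides prescribe. The idea: if the treatment had no effect, the group labels would be arbitrary, so we reshuffle the labels many times and count how often the shuffled difference in means is at least as large as the observed one. The outcome values and the name `permutation_p_value` are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(treatment, control, n_resamples=10_000, rng=random):
    """Estimate the chance that randomly relabeling subjects produces a
    difference in group means at least as large as the one observed."""
    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    k = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign outcomes to groups purely by chance
        diff = abs(mean(pooled[:k]) - mean(pooled[k:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)

# Hypothetical outcomes (e.g., test scores), for illustration only.
treatment = [12, 15, 14, 16, 13, 17, 15, 14]
control = [10, 11, 12, 10, 13, 11, 12, 10]
p = permutation_p_value(treatment, control, rng=random.Random(0))
print(f"estimated p-value: {p:.4f}")
```

A small p-value means chance alone rarely produces a gap this large; a cutoff of 0.05 is a common, though arbitrary, convention for calling a result statistically significant.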

Population vs. Sample

• Population is the entire group of individuals about which we want information.

• A sample is the part of the population from which we actually collect information.

• Based on the sample, we draw conclusions about the whole population.

Parameter vs. Statistic

• A parameter is a number that describes the population; its value is usually unknown.

• A statistic is a number that describes a sample.

• We use statistics to estimate parameters.
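A short Python sketch contrasting the two, assuming a hypothetical population of 10,000 simulated GPAs (normal with mean 3.0 and standard deviation 0.5, clipped to the 0.0 to 4.0 scale). In this simulation we can compute the parameter directly; in a real study only the statistic would be available.

```python
import random
from statistics import mean

rng = random.Random(11)

# Hypothetical population: 10,000 simulated GPAs, clipped to the 0.0-4.0 scale.
population = [min(4.0, max(0.0, rng.gauss(3.0, 0.5))) for _ in range(10_000)]

parameter = mean(population)   # describes the population (unknown in practice)
sample = rng.sample(population, 100)
statistic = mean(sample)       # describes the sample; our estimate of the parameter

print(f"parameter (population mean GPA): {parameter:.3f}")
print(f"statistic (sample mean GPA):     {statistic:.3f}")
```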

Sampling Distribution

• The result of your study is a statistic, whose value can vary from sample to sample.

• The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

• Estimate = true parameter + sampling error
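Listing all possible samples is rarely feasible, so a common shortcut is to approximate the sampling distribution by simulation. This Python sketch assumes a hypothetical population of 5,000 integer scores from 0 to 100; `sampling_distribution` is an illustrative name. It draws 2,000 SRSs of size 50, records each sample mean, and splits each estimate into the true parameter plus a sampling error.

```python
import random
from statistics import mean, stdev

def sampling_distribution(population, n, reps, rng=random):
    """Approximate the sampling distribution of the sample mean by drawing
    `reps` SRSs of size n and recording the mean of each one."""
    return [mean(rng.sample(population, n)) for _ in range(reps)]

rng = random.Random(5)
population = [rng.randint(0, 100) for _ in range(5_000)]  # hypothetical scores
true_mean = mean(population)  # the parameter

estimates = sampling_distribution(population, n=50, reps=2_000, rng=rng)
errors = [est - true_mean for est in estimates]  # estimate = parameter + sampling error

print(f"true parameter:           {true_mean:.2f}")
print(f"mean of the estimates:    {mean(estimates):.2f}")
print(f"spread (sd) of estimates: {stdev(estimates):.2f}")
print(f"largest sampling error:   {max(errors, key=abs):+.2f}")
```

Individual estimates miss the parameter by a few points, but the errors average out close to zero, as expected for an SRS.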

Bias and variability

• A statistic is biased if the mean of the sampling distribution is not equal to the true value of the parameter being estimated.

• The variability of a statistic is the spread of its sampling distribution.

• Bias does not go away with larger samples.

• Variability shrinks as the sample size grows.
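A Python simulation illustrates both points, under hypothetical assumptions: a population of 10,000 people, 60% of whom support a candidate, and a biased method in which non-supporters are twice as likely to be picked (drawn with replacement via `random.choices`, for simplicity). For each sample size the sketch draws 500 samples and reports the center and spread of the sample proportions. The SRS stays centered near 0.6 while the biased method stays near 3/7 ≈ 0.43 at both sizes, yet the spread of both shrinks as n grows.

```python
import random
from itertools import accumulate
from statistics import mean, stdev

rng = random.Random(2024)

# Hypothetical population: 1 = supports a candidate, 0 = does not (60% support).
population = [1] * 6_000 + [0] * 4_000
true_p = mean(population)  # the parameter, 0.6

# Hypothetical biased method: non-supporters are twice as likely to be picked
# (drawn with replacement via rng.choices, for simplicity).
cum_weights = list(accumulate(1 if x == 1 else 2 for x in population))

def srs_proportion(n):
    return sum(rng.sample(population, n)) / n

def biased_proportion(n):
    return sum(rng.choices(population, cum_weights=cum_weights, k=n)) / n

results = {}
for n in (100, 2_500):
    srs = [srs_proportion(n) for _ in range(500)]
    biased = [biased_proportion(n) for _ in range(500)]
    results[n] = (mean(srs), stdev(srs), mean(biased), stdev(biased))
    print(f"n={n:>5}  SRS: center {results[n][0]:.3f}, spread {results[n][1]:.3f}  |  "
          f"biased: center {results[n][2]:.3f}, spread {results[n][3]:.3f}")
```

A larger sample makes the biased method more precise, but precisely wrong: its center stays far from the true proportion.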