Confidence Intervals I 2/1/12 • Correlation (continued) • Population parameter versus sample statistic • Uncertainty in estimates • Sampling distribution • Confidence interval Section 3.1 Professor Kari Lock Morgan Duke University

Upload: george-greer

Post on 28-Dec-2015


TRANSCRIPT

Page 1

Confidence Intervals I
2/1/12

• Correlation (continued)
• Population parameter versus sample statistic
• Uncertainty in estimates
• Sampling distribution
• Confidence interval

Section 3.1
Professor Kari Lock Morgan
Duke University

Page 2

Correlation Guessing Game
http://istics.net/gett/gcstart.php?group_id=duke

Highest scorer in the class gets one extra point on the first exam!

Page 3

Correlation

[Scatterplot: NFL Teams. x-axis: Malevolence Rating of Uniform; y-axis: z-score for Penalty Yards. r = 0.43]

Page 4

Correlation

[Scatterplot: same plot, but with Dolphins and Raiders (outliers) removed. x-axis: Malevolence Rating of Uniform; y-axis: z-score for Penalty Yards. r = 0.08]

Page 5

Human Cannonball

[Plot: Y vs. X]

What is the correlation between X and Y?
(a) r > 0
(b) r < 0
(c) r = 0

Are X and Y associated?
(a) Yes
(b) No

Page 6

Correlation Cautions

1. Correlation can be heavily affected by outliers. Always plot your data!

2. r = 0 means no linear association. The variables could still be otherwise associated. Always plot your data!

3. Correlation does not imply causation!
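The first two cautions can be checked numerically. The sketch below is only an illustration with made-up data (not the NFL data from the slides): it computes the correlation by hand, shows how one extreme point inflates r, and shows a perfectly related but nonlinear pair of variables with r = 0. The helper name `pearson_r` is an assumption introduced for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Caution 1: a single outlier can create a strong correlation.
# Made-up data with only a weak linear trend.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 3, 1, 2, 3]
r_without = pearson_r(x, y)
r_with = pearson_r(x + [20], y + [20])  # same data plus one extreme point

# Caution 2: Y is completely determined by X (Y = X squared),
# yet there is no linear association.
x_sym = [-3, -2, -1, 0, 1, 2, 3]
y_sq = [v ** 2 for v in x_sym]
r_parabola = pearson_r(x_sym, y_sq)

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
print(f"r for Y = X^2:     {r_parabola:.2f}")
```

Plotting each dataset would reveal both situations immediately, which is the point of "Always plot your data!"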

Page 7

Summary: Two Quantitative Variables

• Summary Statistics
– Correlation

• Visualization
– Scatterplot

Page 8

Variable(s) | Visualization | Summary Statistics
Categorical | bar chart, pie chart | frequency table, relative frequency table, proportion
Quantitative | dotplot, histogram, boxplot | mean, median, max, min, standard deviation, range, IQR, five number summary
Categorical vs Categorical | side-by-side bar chart, segmented bar chart, mosaic plot | two-way table, difference in proportions
Quantitative vs Categorical | side-by-side boxplots | statistics by group
Quantitative vs Quantitative | scatterplot | correlation

Page 9

The Big Picture

[Diagram: Population and Sample, linked by Sampling (population to sample) and Statistical Inference (sample to population)]

Page 10

Parameter vs Statistic

• A sample statistic is a number computed from sample data.

• A population parameter is a number that describes some aspect of a population.

• We usually have a sample statistic and want to make inferences about the population parameter.

Page 11

The Big Picture

[Diagram: Population (PARAMETERS) and Sample (STATISTICS), linked by Sampling and Statistical Inference]

Page 12

Parameter vs Statistic

[Notation table. Parameter symbols: μ (mu), σ (sigma), ρ (rho), β (beta)]

Page 13

Obama’s Approval Rating

• Gallup surveyed 1500 Americans between Jan 28-30, 2012, and 46% of these people approve of the job Barack Obama is doing as president

• What do you think is the true proportion of Americans who approve of the job Barack Obama is doing as president?

http://www.gallup.com/poll/113980/Gallup-Daily-Obama-Job-Approval.aspx

Statistic: $\hat{p} = 0.46$

Parameter: $p = ???$

Page 14

Point and Interval Estimates

• The sample statistic gives a point estimate of the population parameter (a single number)

• Usually, it is more useful to provide an interval estimate, which gives a range of plausible values for the population parameter:

$\text{statistic} \pm \text{margin of error}$

• How do we determine the margin of error???

Page 15

Obama

Page 16

Obama’s Approval Rating

Point Estimate: $\hat{p} = 0.46$

Interval Estimate: $\text{statistic} \pm \text{ME} = 0.46 \pm 0.03 = (0.43, 0.49)$

• Between 43% and 49% of Americans currently approve of the job Obama is doing as president

Page 17

IMPORTANT POINTS

• Sample statistics vary from sample to sample (they will not match the parameter exactly).

• KEY QUESTION: For a given sample statistic, what are plausible values for the population parameter? How much uncertainty surrounds the sample statistic?

• KEY ANSWER: It depends on how much the statistic varies from sample to sample!

Page 18

Reese’s Pieces

• What proportion of Reese’s pieces are orange?

• Take a random sample of 10 Reese’s pieces

• What is your sample proportion? (class dotplot)

• Give a range of plausible values for the population proportion

Page 19

Sampling Distribution

• A sampling distribution is the distribution of statistics computed for different samples of the same size taken from the same population

• The sampling distribution shows us how the statistic varies from sample to sample

• We can use the spread of the sampling distribution to determine the margin of error for a statistic
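The class activity can be mimicked in code. The following is a minimal sketch, not the course's own material: it assumes a hypothetical true proportion of orange pieces (`TRUE_P = 0.45`, a made-up value, since the true proportion is not given here) and draws many samples of size 10. Each simulated sample proportion plays the role of one dot on the class dotplot.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_P = 0.45       # ASSUMPTION: hypothetical proportion of orange pieces
N = 10              # sample size used in the class activity
NUM_SAMPLES = 1000  # number of simulated samples

def sample_proportion(p, n):
    """Draw n pieces and return the proportion that are orange."""
    orange = sum(1 for _ in range(n) if random.random() < p)
    return orange / n

# Each entry is one sample statistic, i.e. one dot on the dotplot.
p_hats = [sample_proportion(TRUE_P, N) for _ in range(NUM_SAMPLES)]

center = statistics.mean(p_hats)   # where the sampling distribution is centered
spread = statistics.stdev(p_hats)  # how much the statistic varies between samples

print(f"center of sampling distribution: {center:.3f}")
print(f"spread (standard deviation):     {spread:.3f}")
```

The spread printed at the end is the quantity that drives the margin of error.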

Page 20

Sampling Distribution

In the Reese’s pieces sampling distribution, what does each dot represent?

a) One Reese’s piece
b) One sample statistic

Page 21

Sampling Distribution

The higher the standard deviation of the sampling distribution, the ______ the margin of error.

(a) higher
(b) lower

Page 22

Sample Size

• For a larger sample size you get less variability in the statistics, so less uncertainty in your estimate

[Three dotplots of sample proportions: n = 10, n = 50, n = 100]

http://www.rossmanchance.com/applets/Reeses/ReesesPieces.html
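The effect of sample size can also be seen in a small simulation. This sketch again assumes a hypothetical true proportion (`TRUE_P = 0.45`, a made-up value) and compares the spread of the simulated sampling distribution for the three sample sizes shown above. The function name `sampling_sd` is introduced only for this illustration.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

TRUE_P = 0.45  # ASSUMPTION: hypothetical proportion of orange pieces

def sampling_sd(p, n, num_samples=2000):
    """Standard deviation of simulated sample proportions for sample size n."""
    p_hats = []
    for _ in range(num_samples):
        orange = sum(1 for _ in range(n) if random.random() < p)
        p_hats.append(orange / n)
    return statistics.stdev(p_hats)

# Spread of the sampling distribution for each sample size.
spreads = {n: sampling_sd(TRUE_P, n) for n in (10, 50, 100)}

for n, sd in spreads.items():
    print(f"n = {n:3d}: spread of sample proportions = {sd:.3f}")
```

The spread should shrink as n grows, which is why larger samples give less uncertainty in the estimate.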

Page 23

Sampling Distribution

• A sampling distribution is the distribution of statistics computed for different samples of the same size taken from the same population

• The sampling distribution shows us how the statistic varies from sample to sample

• This gives us an idea of the uncertainty surrounding the estimate of a parameter

Page 24

Random Samples

• If you take random samples, the sampling distribution will be centered around the true population parameter

• If sampling bias exists (if you do not take random samples), your sampling distribution may give you bad information about the true parameter
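A rough sketch of the difference between random and biased sampling follows. It uses a made-up population of word lengths (an assumption for illustration, not the actual text of any document) and compares simple random samples with samples in which longer words are more likely to be picked, which is one plausible way bias could arise when words are chosen by eye.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# ASSUMPTION: a made-up population of word lengths.
population = ([1] * 10 + [2] * 40 + [3] * 50 + [4] * 50 + [5] * 30
              + [6] * 20 + [7] * 15 + [9] * 10 + [11] * 5)
true_mean = statistics.mean(population)  # the population parameter

def mean_of_random_sample(n):
    """Mean word length in a simple random sample (without replacement)."""
    return statistics.mean(random.sample(population, n))

def mean_of_biased_sample(n):
    """Mean word length when longer words are more likely to be chosen.

    Selection probability is proportional to word length; random.choices
    samples with replacement.
    """
    return statistics.mean(random.choices(population, weights=population, k=n))

random_means = [mean_of_random_sample(10) for _ in range(2000)]
biased_means = [mean_of_biased_sample(10) for _ in range(2000)]

print(f"true mean word length:          {true_mean:.2f}")
print(f"center of random-sample means:  {statistics.mean(random_means):.2f}")
print(f"center of biased-sample means:  {statistics.mean(biased_means):.2f}")
```

Under these assumptions, the random-sample means center near the parameter while the biased-sample means center too high.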

Page 25

Lincoln’s Gettysburg Address

Page 26

Confidence Interval

• A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples

• The success rate (the proportion of all samples whose intervals contain the parameter) is known as the confidence level

• A 95% confidence interval will contain the true parameter for 95% of all samples

Page 27

Confidence Intervals
http://bcs.whfreeman.com/ips4e/cat_010/applets/confidenceinterval.html

• The parameter is fixed

• The statistic is random (depends on the sample)

• The interval is random (depends on the sample)

[Figure labels: Parameter; Sampling Distribution]

Page 28

Sampling Distribution

If you had access to the sampling distribution, how would you find the margin of error to ensure that intervals of the form

$\text{statistic} \pm \text{margin of error}$

would capture the parameter for 95% of all samples?

Page 29

Standard Error

• The standard error (SE) of a statistic is the standard deviation of the sample statistic

• A 95% confidence interval can be created by

$\text{statistic} \pm 2 \times \text{SE}$

http://bcs.whfreeman.com/ips4e/cat_010/applets/confidenceinterval.html
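One way to see why two standard errors gives roughly 95% confidence is to simulate it. The sketch below is an illustration under stated assumptions, not the applet's code: it assumes a hypothetical population proportion (`TRUE_P = 0.45`) and a sample size of 100, estimates the standard error from a simulated sampling distribution, and counts how often the interval statistic ± 2 × SE captures the parameter.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is reproducible

TRUE_P = 0.45       # ASSUMPTION: hypothetical population proportion
N = 100             # ASSUMPTION: size of each sample
NUM_SAMPLES = 2000  # number of simulated samples

def sample_proportion(p, n):
    """Return the sample proportion from one random sample of size n."""
    return sum(1 for _ in range(n) if random.random() < p) / n

# Simulated sampling distribution of the sample proportion.
p_hats = [sample_proportion(TRUE_P, N) for _ in range(NUM_SAMPLES)]

# Standard error: the standard deviation of the sample statistics.
se = statistics.stdev(p_hats)

# Build statistic +/- 2 * SE for every sample and count the captures.
captured = sum(1 for p_hat in p_hats
               if p_hat - 2 * se <= TRUE_P <= p_hat + 2 * se)
coverage = captured / NUM_SAMPLES

print(f"standard error: {se:.3f}")
print(f"proportion of intervals capturing the parameter: {coverage:.3f}")
```

The parameter stays fixed throughout; only the statistic and its interval change from sample to sample.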

Page 30

Economy

A recent survey of 1,502 Americans in January 2012 found that 86% consider the economy a “top priority” for the president and congress this year.

The standard error for this statistic is 0.01.

What is the 95% confidence interval for the true proportion of all Americans that consider the economy a “top priority” for the president and congress this year?

(a) (0.85, 0.87)
(b) (0.84, 0.88)
(c) (0.82, 0.90)

http://www.people-press.org/2012/01/23/public-priorities-deficit-rising-terrorism-slipping/
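The arithmetic for this question can be written as a small helper. This is a minimal sketch of the rule statistic ± 2 × SE applied to the survey numbers above; the function name `interval_95` is an assumption introduced for the example.

```python
def interval_95(statistic, standard_error):
    """Approximate 95% confidence interval: statistic +/- 2 * SE."""
    margin_of_error = 2 * standard_error
    return (statistic - margin_of_error, statistic + margin_of_error)

# Survey result: 86% consider the economy a top priority, SE = 0.01.
low, high = interval_95(0.86, 0.01)
print(f"({low:.2f}, {high:.2f})")
```

With a margin of error of 2 × 0.01 = 0.02, the interval runs from 0.84 to 0.88.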

Page 31

Summary

• To create a plausible range of values for a parameter:

• Take many random samples from the population, and compute the sample statistic for each sample

• Compute the standard error as the standard deviation of all these statistics

• Use $\text{statistic} \pm 2 \times \text{SE}$

• One small problem…

Page 32

Reality

… WE ONLY HAVE ONE SAMPLE!!!!

• How do we know how much sample statistics vary, if we only have one sample?!?

… to be continued

Page 33

Project 1

• Pose a question that you would like to investigate. If possible, choose something related to your major!

• Find or collect data that will help you answer this question (you may need to edit your question based on available data)
– If using existing data, you have to find your own (do not use a dataset already used in this class)
– If collecting data, wait until your proposal has been approved to collect the data

• You can choose either a single variable or a relationship between two variables

Page 34

Project 1

• The result will be a five-page paper including
– Description of the data collection method, and the implications this has for statistical inference
– Descriptive statistics (summary stats, visualization)
– Confidence intervals
– Hypothesis testing (following week)
– Distribution-based inference (after Exam 1)

• Proposal due 2/15
– Can submit earlier if you want feedback sooner
– Include data if you are using existing data
– If collecting your own data, proposal should include a detailed data collection plan

Page 35

To Do

• Homework 2 (due Monday)

• Idea and data for Project 1 (proposal due 2/15)

Page 36

FINDING DATA

http://library.duke.edu/data/

Joel Herndon