essentialsofinferential statistics 2016deannaal/statistics_textbook.pdf · essentials of...

Dianna Cichocki ESSENTIALS OF INFERENTIAL STATISTICS 2016



Dianna Cichocki

ESSENTIALS OF INFERENTIAL STATISTICS 2016


Dianna Cichocki
University at Buffalo

E-Assign LLC


Copyright © 2016 by E-Assign LLC

All rights reserved. This book or any portion thereof may not be reproduced or used in any manner whatsoever without the express written permission of the publisher except for the use of brief quotations in a book review.

Essentials of Inferential Statistics 2016 is an independent textbook.

Printed in the United States of America

First Printing, 2016

ISBN

E-Assign LLC
www.e-assign.com


This book is dedicated to four of the blessings in my life.
My supportive husband Jason
and my amazing children, Rachel, Maranda and Connor.


Table of Contents

Chapter 1 – Probability Distributions in MS Excel
1.1 - Binomial Probability Distributions
1.2 - Normal Probability Distributions
1.3 - Sampling and the Central Limit Theorem
1.4 - Sampling and the Finite Correction Factor

Chapter 2 – Estimation and Inference
2.1 - Confidence Interval Vocabulary
2.2 - Critical Values
2.3 - Estimate Population Mean with Sigma Known
2.4 - Estimate Population Mean with Sigma Unknown
2.5 - Estimate Population Proportion
2.6 - Determine Minimum Sample Size

Chapter 3 – Hypothesis Testing
3.1 - Hypothesis Testing (vocabulary, overview)
3.2 - P-Values
3.3 - Conducting Single Parameter Hypothesis Tests
3.4 - Testing Parameters from Two Independent Samples
3.5 - Testing the Difference of Means from a Paired Sample

Chapter 4 – Linear Regression
4.1 - Simple Linear Regression
4.2 - MS Excel and Single Variable Regression Analysis
4.3 - Multiple Variable Regression
4.4 - Qualitative Variables (Dummy Variables)

Appendix
A: Useful Excel Formulas
B: Confidence Interval & Hypothesis Testing Steps
C: Standard Normal Table & Student’s T-Distribution Table


What will your response be?

When I am asked what I do for a living, I respond by saying, “I am a Professor at the University at Buffalo.” This typically elicits a warm grin from the person asking the question. However, I realize what is coming next. The person always follows up by asking, “What do you teach?” There it is, the dreaded question. Of course, I respond by saying, “I teach Statistics.”

This is where the conversation goes one of two ways. Approximately 80% of the time, the person will cringe and go into an explanation of how they did their best to avoid Statistics, or how they passed Statistics by sheer luck. This is where I remain silent and simply listen as the person informs me of how awful their Statistics experience was. When they are finished, I do my best to divert the conversation to something positive. There is no point in going any further.

I am typically shocked when, the other 20% of the time, the person responds, “I had a great experience with Statistics!” or “I enjoyed Statistics!” Now I am eager to continue the conversation and I ask, “What did you like the best about Statistics?” The majority of those who enjoyed their experience in Statistics will state it was because they had a great instructor, or because their instructor did not focus on the mathematics of statistics but rather on its application, which they found very useful. Then there are the rare few, like me, who say, “I just enjoy math.”

This book is written for the majority. It is written so that you will learn the essentials of Inferential Statistics in a practical manner. You will not be asked to prove any formulas or derive equations. In fact, this text is written as a set of notes that will help guide you through the process of making sense of everyday data and making decisions based on probability rather than intuition.

This textbook is purposely written to accompany my lectures. To get the most out of this text, you must commit to viewing the lectures and following along in the text. This method allows you to take an active role in your learning. When you watch the lectures and follow along with this text you will be engaging three of your five senses (hearing, sight and touch). Using multiple senses allows for more cognitive connections, which can help information processing. How I wish I could figure out how to edit this text to trigger your senses of taste and smell! For now, I will settle for actively engaging 60% of your senses.

The title of this preface is, “What will your response be?” My hope is that someday, if you happen to run into a Professor of Statistics, your response will shock them as you say, “I had a great experience with Statistics!”

Let’s get started…


Chapter 1 Probability Distributions in MS Excel

Introduction
The material in this chapter is review. In previous coursework you have studied probability distributions. However, understanding probability distributions is an essential skill in Statistics. Therefore, in this chapter we will review probability distributions using MS Excel, rather than formulas and tables, to compute probabilities.

Objectives
1. Define the characteristics of a Binomial Probability Distribution
2. Compute the mean, variance and standard deviation of a Binomial Probability Distribution
3. Use MS Excel to compute Binomial Probabilities
4. Define the characteristics of Normal Probability Distributions
5. Compute the Z-score for a random variable
6. Use MS Excel to compute Normal Probabilities (Standard Normal, Normal, Inverse Normal)
7. Use MS Excel to compute probabilities based on Sampling (Central Limit Theorem)

1.1 Binomial Probability Distributions
The binomial probability distribution is the result of a binomial experiment. Each trial of a binomial experiment has two outcomes that are mutually exclusive (cannot overlap). These outcomes are often referred to as “success” and “failure”. Such a trial is also known as a Bernoulli trial (named after a Swiss mathematician). Binomial experiments have the following 4 characteristics:

1. The experiment consists of n repeated trials.
2. Each trial has only 2 possible outcomes: success and failure.
3. The probability of success, p, is the same for each trial. Thus, the probability of failure, q or (1-p), is also the same for each trial.
4. The trials are independent. Thus, the outcome of one trial has no influence on the outcome of subsequent trials.

Keep in mind that an experiment does not need to have only 2 possible results (as flipping a coin does) in order to have only 2 possible outcomes. Success and failure may be defined by the experiment. For example, success might be drawing an Ace from a deck of cards and failure might be drawing any other card. Success might be voting for candidate A and failure might be voting for any other candidate. Do not confuse outcomes with options: each trial of a binomial experiment has only 2 possible outcomes, but there may be several options.

MS Excel Formula for Binomial Probability Distributions


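Use =BINOMDIST when determining probabilities for discrete distributions that have the 4 characteristics above. The =BINOMDIST function in MS Excel depends on the number of trials, n, and the probability of success, p. Its arguments are =BINOMDIST(number of successes, n, p, cumulative): a cumulative argument of false returns the exact probability P(x = k), while true returns the cumulative probability P(x ≤ k).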
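To see what the cumulative argument does, here is a minimal Python sketch that should reproduce the arithmetic behind =BINOMDIST for the n = 10, p = 0.4 examples that follow. The function name `binomdist` is our own hypothetical stand-in; it is not part of Excel or any Python library.

```python
from math import comb


def binomdist(number_s: int, trials: int, probability_s: float, cumulative: bool) -> float:
    """Hypothetical stand-in for Excel's =BINOMDIST(number_s, trials, probability_s, cumulative).

    cumulative=False -> exact probability P(x = number_s)
    cumulative=True  -> cumulative probability P(x <= number_s)
    """

    def exact(k: int) -> float:
        # C(n, k) * p^k * q^(n - k)
        return comb(trials, k) * probability_s**k * (1 - probability_s) ** (trials - k)

    if cumulative:
        # Add up the exact probabilities for 0, 1, ..., number_s successes
        return sum(exact(k) for k in range(number_s + 1))
    return exact(number_s)


print(round(binomdist(5, 10, 0.4, False), 4))  # P(x = 5)  -> 0.2007
print(round(binomdist(5, 10, 0.4, True), 4))   # P(x <= 5) -> 0.8338
```

The only difference between the two calls is the last argument: false picks out a single bar of the distribution, true adds up every bar from 0 through k.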

Choose the correct MS Excel formula for each of the following probabilities:

Exact: Find P(x = 5) if n = 10, p = 0.4, and q = 0.6

o =BINOMDIST(5,10,0.4,true)
o =BINOMDIST(5,10,0.4,false)
o =BINOMDIST(5,10,0.6,true)
o =BINOMDIST(5,10,0.6,false)

Less than: Find P(x < 5) if n = 10, p = 0.4, and q = 0.6

o =BINOMDIST(5,10,0.4,true)
o =BINOMDIST(5,10,0.4,false)
o =BINOMDIST(4,10,0.4,true)
o =BINOMDIST(4,10,0.4,false)

At most: Find P(x ≤ 5) if n = 10, p = 0.4, and q = 0.6

o =BINOMDIST(5,10,0.4,true)
o =BINOMDIST(5,10,0.4,false)
o =BINOMDIST(5,10,0.6,true)
o =BINOMDIST(10,5,0.4,true)


Greater than/Exceed: Find P(x > 5) if n = 10, p = 0.4, and q = 0.6

o =1-BINOMDIST(5,10,0.4,true)
o =1-BINOMDIST(5,10,0.4,false)
o =1-BINOMDIST(6,10,0.4,true)
o =1-BINOMDIST(6,10,0.4,false)

At least: Find P(x ≥ 5) if n = 10, p = 0.4, and q = 0.6

o =1-BINOMDIST(5,10,0.4,true)
o =1-BINOMDIST(5,10,0.4,false)
o =1-BINOMDIST(6,10,0.4,true)
o =1-BINOMDIST(6,10,0.4,false)

Between: Find P(3 ≤ x ≤ 6) if n = 10, p = 0.4, and q = 0.6

o =BINOMDIST(2,10,0.4,true)-BINOMDIST(6,10,0.4,true)
o =BINOMDIST(3,10,0.4,false)-BINOMDIST(6,10,0.4,false)
o =BINOMDIST(6,10,0.4,true)-BINOMDIST(2,10,0.4,true)
o =BINOMDIST(6,10,0.4,true)-BINOMDIST(3,10,0.4,true)
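One way to check your choices is to translate each phrase into cumulative probabilities P(x ≤ k), which is what the true form of =BINOMDIST returns. The Python sketch below assumes the same n = 10 and p = 0.4; the helper name `at_most` is hypothetical and simply plays the role of =BINOMDIST(k,10,0.4,true).

```python
from math import comb

n, p = 10, 0.4


def at_most(k: int) -> float:
    """P(x <= k), the value =BINOMDIST(k, n, p, true) returns."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


exact_5 = at_most(5) - at_most(4)          # P(x = 5)
less_than_5 = at_most(4)                   # P(x < 5)  = P(x <= 4)
at_most_5 = at_most(5)                     # P(x <= 5)
greater_than_5 = 1 - at_most(5)            # P(x > 5)  = 1 - P(x <= 5)
at_least_5 = 1 - at_most(4)                # P(x >= 5) = 1 - P(x <= 4)
between_3_and_6 = at_most(6) - at_most(2)  # P(3 <= x <= 6) = P(x <= 6) - P(x <= 2)

for label, value in [
    ("P(x = 5)", exact_5),
    ("P(x < 5)", less_than_5),
    ("P(x <= 5)", at_most_5),
    ("P(x > 5)", greater_than_5),
    ("P(x >= 5)", at_least_5),
    ("P(3 <= x <= 6)", between_3_and_6),
]:
    print(f"{label:>15}: {value:.4f}")
```

The pattern to notice: “less than” and “at least” shift k down by one, and “between” subtracts the cumulative probability just below the lower bound.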


MEAN (Expected Value) & STANDARD DEVIATION of a BINOMIAL DISTRIBUTION
The expected value of a binomial distribution is its mean, μ = np. Recall that binomial distributions are discrete, so the most likely value (the highest point of the distribution) is a countable value; it lies within one unit of the mean, although the mean itself need not be a whole number. The standard deviation, σ = √(npq), describes the spread of the data. As the standard deviation increases, the spread of the data increases (the distribution becomes wider and flatter). As the standard deviation decreases, the data become less spread out (more centered around the mean; narrower and taller).
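The shortcut formulas μ = np and σ = √(npq) can be checked against the general definition of the mean and variance of a discrete distribution. A minimal Python sketch, using n = 25 and p = 0.2 (the newlywed setting from the Section 1.1 practice) purely as illustrative values:

```python
from math import comb, sqrt

n, p = 25, 0.2
q = 1 - p

# Shortcut formulas for a binomial distribution
mean = n * p                 # mu = np
sd = sqrt(n * p * q)         # sigma = sqrt(npq)

# The same quantities computed from the full probability distribution
probs = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]
mean_direct = sum(k * pk for k, pk in enumerate(probs))
var_direct = sum((k - mean_direct) ** 2 * pk for k, pk in enumerate(probs))

print(f"np = {mean:.4f}, sum of k*P(k) = {mean_direct:.4f}")
print(f"sqrt(npq) = {sd:.4f}, sqrt(sum of (k - mu)^2 * P(k)) = {sqrt(var_direct):.4f}")
```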

SHAPE of Binomial Distributions changes as p changes (n indicates the range of the distribution)
What do you notice about the shape of each of the following binomial distributions? What is the expected value? How are the distributions shaped? Could you generalize the shape of a distribution based on the probability of success (p)?
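If you want to generate comparable pictures yourself, the Python sketch below prints a rough text histogram for n = 10 with three illustrative values of p. It also reports the most likely value and the standard binomial skewness, (q − p)/√(npq): positive means a longer right tail, zero means symmetric, negative means a longer left tail.

```python
from math import comb, sqrt

n = 10
summary = {}
for p in (0.2, 0.5, 0.8):
    q = 1 - p
    probs = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]
    most_likely = max(range(n + 1), key=lambda k: probs[k])
    skewness = (q - p) / sqrt(n * p * q)   # > 0 right-skewed, 0 symmetric, < 0 left-skewed
    summary[p] = (most_likely, skewness)

    print(f"p = {p}: most likely x = {most_likely}, skewness = {skewness:+.3f}")
    for k in range(n + 1):
        print(f"  x = {k:2d} {'#' * round(100 * probs[k])}")  # one '#' per 1% of probability
```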


Section 1.1 Practice
1. Suppose you work for a travel agency and you have read that twenty percent of newlyweds pay for their honeymoon themselves. You randomly select 25 newlyweds and ask each if they paid for their honeymoon themselves. Find the probability that the number of newlyweds who say they paid for their honeymoon themselves is:

a. Exactly 15

b. At least 10

c. Less than 21

d. Greater than 15

e. Between 10 and 20, inclusive

2. About 30% of college students earn at least $400 per month. You randomly select 30 college students and ask each if he or she earns at least $400 per month.

a. What is the probability that more than 20 of them will answer yes?

b. What is the probability that at most 20 of them will answer no?

c. What do you notice about the results of part A and part B?

d. How many college students, out of the 30 students sampled, would you expect to earn at least $400 per month?


1.2 Normal Probability Distributions

The normal distribution (normal curve) is a family of continuous probability distributions (measurable; they take on any numerical value in an interval). The graph of the normal distribution depends on two factors: the mean and the standard deviation. The mean is the center of the distribution and the standard deviation represents the spread of the distribution. When the mean of a normal distribution increases, the curve shifts to the right. When the standard deviation of a normal distribution decreases, the spread decreases and thus the curve becomes taller and narrower. As the standard deviation increases, the spread increases and the curve becomes shorter and wider.

Here are the top 7 characteristics of a normal distribution:

1. Normal distributions are symmetric around the mean.
2. The mean, median and mode of a normal distribution are equal.
3. The total area under the normal curve is equal to 1 or 100%.
4. Normal distributions are denser in the center and less dense in the tails.
5. Normal distributions are defined by the mean (μ) and standard deviation (σ).
6. The probability that the random variable X equals any single value is zero. (Thus, there are no “exact” probabilities.)
7. The Empirical Rule applies to normal distributions: approximately 68% of the data falls within 1 standard deviation of the mean, approximately 95% falls within 2 standard deviations of the mean and approximately 99.7% falls within 3 standard deviations of the mean.
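The Empirical Rule percentages are areas under the standard normal curve, so they can be recovered directly. A minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+) in place of Excel:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area within k standard deviations of the mean: P(-k < z < k)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} SD of the mean: {area:.4f}")
```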


Normal Distribution versus Standard Normal Distribution

A normal distribution may have any mean and any positive standard deviation, so not every normal distribution has the same mean and/or standard deviation. In order to make comparisons between data sets with different means and standard deviations, we need to standardize the distributions; comparisons between data sets should not be made if the distributions are not set to the same scale. The standard normal distribution is a normal distribution with a mean of 0 (μ = 0) and a standard deviation of 1 (σ = 1). Converting a normal distribution into a standard normal distribution “standardizes” the data and allows for comparison between data sets.

*The z-score formula, z = (x − μ)/σ, allows us to standardize data.
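As a hypothetical illustration (the exam scores below are invented for the example), suppose one student scores 80 on an exam with μ = 70 and σ = 5, and another scores 85 on an exam with μ = 80 and σ = 10. Standardizing puts both scores on the same scale, and the sketch also confirms that standardizing does not change the area under the curve.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


z_a = z_score(80, mu=70, sigma=5)    # 2.0
z_b = z_score(85, mu=80, sigma=10)   # 0.5

# The area to the left of x under N(mu, sigma) equals the area to the left of z under N(0, 1)
left_a = NormalDist(70, 5).cdf(80)
left_a_std = NormalDist(0, 1).cdf(z_a)

print(z_a, z_b, round(left_a, 4), round(left_a_std, 4))
```

The first student is 2 standard deviations above the mean and the second only 0.5 above, so the first performance is relatively stronger even though the raw score is lower.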

MS Excel Formulas for Normal Probability Distributions

NORMS versus NORM and DIST versus INV

The letter S in the function name indicates a Standard normal distribution. Thus, use =NORMSDIST when determining probabilities for the standard normal distribution (given z-scores, that is, when μ = 0 and σ = 1). Use =NORMDIST when determining probabilities for any other normal distribution.

Use DIST (distribution; area under the curve) when determining the probability for a given value of the random variable. Use INV (inverse) when determining the value of the random variable from a given probability or percentile (the area to the left of that value).
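For readers who want to cross-check Excel outside a spreadsheet, the four functions line up with Python's `statistics.NormalDist` as sketched below. The correspondence is our own illustration; the argument orders in the comments are those of Excel's functions NORMSDIST(z), NORMDIST(x, mean, standard_dev, cumulative), NORMSINV(probability) and NORMINV(probability, mean, standard_dev). The μ = 500, σ = 100 values are hypothetical.

```python
from statistics import NormalDist

standard = NormalDist()            # mu = 0, sigma = 1
scores = NormalDist(500, 100)      # hypothetical mu = 500, sigma = 100

p_left_z = standard.cdf(1.96)      # like =NORMSDIST(1.96): area to the left of z
p_left_x = scores.cdf(650)         # like =NORMDIST(650,500,100,TRUE): area to the left of x
z_at_975 = standard.inv_cdf(0.975) # like =NORMSINV(0.975): z with 97.5% of the area to its left
x_at_90 = scores.inv_cdf(0.90)     # like =NORMINV(0.90,500,100): x with 90% of the area to its left

print(round(p_left_z, 4), round(p_left_x, 4), round(z_at_975, 2), round(x_at_90, 1))
```

DIST goes from a value to an area; INV goes from an area back to a value.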


Section 1.2 Practice

1. P(z < 1.60)

2. P(z > 1.60)

3. P(-1.25 < z < 2.75)

4. Suppose that μ = 100 and σ = 50 for a normal distribution. Find the x value at the top 10%.

5. Suppose income is normally distributed for a group of workers, with μ = $45,000 and σ = $5,000. Find the probability that a randomly selected worker from this group has an income between $38,000 and $48,000.


6. A survey was conducted to measure the height of U.S. males. In the survey, respondents were grouped by age. In the 20-29 age group, the heights were normally distributed, with a mean of 69.2 inches and a standard deviation of 2.9 inches. A study participant is randomly selected.

a. Find the probability that the participant’s height is less than 68.5 inches.

b. Find the probability that the participant’s height is between 67 and 73 inches.

c. Find the probability that the participant’s height is more than 71 inches.

d. What height represents the top 10% of heights in the distribution?

e. What height represents the third quartile of the distribution?

f. A doorway is to be constructed so that 95% of U.S. males, age 20-29, will be able to enter the doorway. What is the height of the door rounded up to the nearest inch?

g. Suppose there are 20 U.S. males, age 20-29, in a room. How many are expected to be between 67 and 73 inches tall?


1.3 Sampling and the Central Limit Theorem

If a sample is selected properly and the analysis is performed correctly, sample information can be used to make an accurate assessment of the entire population. This is the beginning of inferential statistics: using sample statistics to make inferences (estimates) about population parameters.

• The Central Limit Theorem states that the sample means of large samples will be approximately normally distributed regardless of the shape of their population distributions.

• The sampling distribution of the mean describes the pattern that sample averages tend to follow when randomly drawn from a population.

• According to the central limit theorem, sample means from samples of sufficient size, drawn from any population, will be approximately normally distributed.

– In most cases, sample sizes of 30 or larger will result in sample means being approximately normally distributed, regardless of the shape of the population distribution.

– If the population follows the normal probability distribution, the sample means will also be normally distributed, regardless of the sample size.
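A quick simulation shows the theorem at work. The Python sketch below assumes a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration), draws many random samples of size 30, and compares the sample means with what a normal distribution centered at μ with standard deviation σ/√n would predict.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)

POP_MEAN = POP_SD = 1.0   # exponential(rate = 1) population: mean 1, sd 1, heavily skewed
n = 30                    # sample size
num_samples = 10_000      # number of repeated samples

# Mean of each random sample of size n
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(num_samples)
]

standard_error = POP_SD / sqrt(n)
share_within_1_se = sum(abs(m - POP_MEAN) < standard_error for m in sample_means) / num_samples

print(f"mean of sample means: {mean(sample_means):.3f} (population mean {POP_MEAN})")
print(f"sd of sample means:   {pstdev(sample_means):.3f} (sigma/sqrt(n) = {standard_error:.3f})")
print(f"share within 1 SE:    {share_within_1_se:.3f} (normal curve: about 0.683)")
```

Even though individual values are far from normal, roughly 68% of the sample means land within one standard error of μ, as the Empirical Rule predicts for a normal distribution.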

Let’s use dice to explore the central limit theorem. If we have 1 single die, the possible values of the roll are the numbers 1 to 6. The theoretical probability of each value is 1/6. This is a uniform distribution.

If we have 2 dice, the possible sums of the dice are the numbers 2 to 12. The theoretical probability is no longer uniform: the sum of 7 is the most likely (probability 6/36) and is also the expected value. Thus, increasing the number of dice from 1 to 2 has altered the graph.

If we have 3 dice, the possible sums of the dice are the numbers 3 to 18. Now the sums of 10 and 11 have the highest probability of occurring (27/216 each), and the expected value, 10.5, lies between them. Notice how the graph is appearing more bell-shaped.

As we add dice to the experiment, it can be seen that the graph approaches the normal distribution.
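The dice experiment can also be computed exactly rather than graphed. This Python sketch builds the distribution of the sum for 1, 2 and 3 fair dice by adding one die at a time, using fractions so that every probability is exact.

```python
from collections import defaultdict
from fractions import Fraction


def dice_sum_distribution(num_dice: int) -> dict:
    """Exact probability of every possible sum when rolling num_dice fair dice."""
    dist = {0: Fraction(1)}                      # before any roll the sum is 0
    for _ in range(num_dice):
        next_dist = defaultdict(Fraction)        # sums not seen yet start at probability 0
        for total, prob in dist.items():
            for face in range(1, 7):             # each face has probability 1/6
                next_dist[total + face] += prob * Fraction(1, 6)
        dist = dict(next_dist)
    return dist


for num_dice in (1, 2, 3):
    dist = dice_sum_distribution(num_dice)
    top = max(dist.values())
    most_likely = [s for s, pr in sorted(dist.items()) if pr == top]
    expected = sum(s * pr for s, pr in dist.items())
    print(f"{num_dice} dice: sums {min(dist)}-{max(dist)}, "
          f"most likely {most_likely} (p = {top}), expected value {float(expected)}")
```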


Sampling Distribution of the Mean

Suppose we draw all possible samples of a certain size (n) from an infinite population. For every sample of size (n) we compute a sample mean. The resulting probability distribution of this statistic is called a sampling distribution of the mean. In order to compute probabilities for the distribution of the sample means, we must know the mean and standard deviation of the sampling distribution.

• The mean of the sample means, computed from all random samples of size n, is the expected value of all possible sample means, and it equals the population mean: μ_x̄ = μ.

• The standard deviation of the sample means, computed from all random samples of size n, is equal to the population standard deviation divided by the square root of the sample size: σ_x̄ = σ/√n. This is known as the standard error of the mean.
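Both facts can be verified by literally drawing every possible sample. Sampling with replacement from a small population behaves like sampling from an infinite one, so the sketch below uses a hypothetical population {2, 4, 6, 8}, lists all 16 ordered samples of size n = 2, and compares the results with μ and σ/√n.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]   # hypothetical population
n = 2                       # sample size

mu = mean(population)
sigma = pstdev(population)  # population standard deviation (divide by N)

# Every ordered sample of size n drawn with replacement
sample_means = [mean(sample) for sample in product(population, repeat=n)]

print(f"mean of sample means = {mean(sample_means)}, mu = {mu}")
print(f"sd of sample means = {pstdev(sample_means):.4f}, sigma/sqrt(n) = {sigma / sqrt(n):.4f}")
```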

The standard error of the mean is an important concept that you will need for the remainder of this course! In addition to the standard error of the mean, we will study the standard error of the proportion.

Sampling Distribution of the Proportion

Suppose the next time we draw all possible samples of size (n) from an infinite population, we compute a sample proportion instead of a sample mean. The resulting probability distribution of this statistic is called a sampling distribution of the proportion.

• The expected value of the sample proportions, computed from all random samples of size n, is the population proportion: μ_p̂ = p.

• The standard deviation of the sample proportion, computed from all random samples of size n, is equal to the square root of the product of the population proportion of success and the population proportion of failure, divided by the sample size: σ_p̂ = √(pq/n). This is known as the standard error of the proportion.
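A sample proportion is the binomial count x divided by n, so its exact mean and standard deviation can be computed from binomial probabilities and compared with p and √(pq/n). The values p = 0.3 and n = 50 in this Python sketch are hypothetical.

```python
from math import comb, sqrt

p, n = 0.3, 50
q = 1 - p

# Probability of each possible sample proportion x/n, from binomial probabilities
dist = {x / n: comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)}

expected = sum(phat * prob for phat, prob in dist.items())
sd = sqrt(sum((phat - expected) ** 2 * prob for phat, prob in dist.items()))

print(f"expected sample proportion = {expected:.4f} (p = {p})")
print(f"sd of sample proportion = {sd:.4f}, sqrt(pq/n) = {sqrt(p * q / n):.4f}")
```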


The Central Limit Theorem is applied when the distribution is made up of SAMPLES (for example, sample means or sample proportions). Students are often confused about when to apply the Central Limit Theorem. If you are determining the probability based on samples of the population, you should consider whether the central limit theorem applies.

If the population distribution is known to be normally distributed, the size of the sample does not matter. Any sample size will produce a normal sampling distribution.

If the population distribution is unknown (or known to be non-normal) and the size of the sample is at least 30, the distribution of the sample means will be approximately normally distributed.

If the population distribution is unknown (or known to be non-normal) and the size of the sample is less than 30, the distribution of the sample means might be normally distributed. This will depend on the distribution of the original population. If the skewness of the original population is unknown, assume that the normal distribution will not apply.

MS Excel Formula for Probability Distributions with CLT (Central Limit Theorem)
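In Excel, a probability for a sample mean is usually found by keeping =NORMDIST and replacing σ with the standard error σ/√n, for example =NORMDIST(x̄, μ, σ/SQRT(n), TRUE); treat that exact form as our illustration of the formula this heading refers to. The Python sketch below does the same calculation with `statistics.NormalDist` for a hypothetical population with μ = 50 and σ = 10, comparing one individual with the mean of a sample of 25.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 50, 10   # hypothetical population mean and standard deviation
n = 25               # sample size
x = 52               # value of interest

# One randomly selected individual: spread is sigma
p_individual = NormalDist(mu, sigma).cdf(x)              # like =NORMDIST(52,50,10,TRUE)

# Mean of a random sample of n: spread is the standard error sigma/sqrt(n)
p_sample_mean = NormalDist(mu, sigma / sqrt(n)).cdf(x)   # like =NORMDIST(52,50,10/SQRT(25),TRUE)

print(f"P(x < {x}) = {p_individual:.4f}")
print(f"P(sample mean < {x}) = {p_sample_mean:.4f}")
```

A value 2 above the mean is only 0.2 standard deviations out for one individual but a full standard error out for the mean of 25, which is why the two probabilities differ.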

Section 1.3 Practice
1. Suppose ‘Buffalonians’ (people born and raised in Buffalo, NY) drive an average of 12,000 miles per year with a standard deviation of 2,580 miles per year.

a. What is the probability that a randomly selected driver will drive more than 12,500 miles?

b. What is the probability that a randomly selected sample of 36 drivers will drive, on average, more than 12,500 miles?


2. Suppose American Express reports that credit card balances are normally distributed, with a mean of $2800 & a standard deviation of $500.

a. What is the probability that a randomly selected credit card holder has a credit card balance less than $2500?

b. You randomly select 25 credit card holders. What is the probability that their mean credit card balance is less than $2500?

3. Suppose a claim is made that the mean age of members of a local YMCA is 35 years. The population standard deviation is 5 years. Evaluate this claim if a random sample of 25 members has a mean age of 36.5 years.

4. Suppose a claim is made that the mean age of members of a local YMCA is at most 35 years. The population standard deviation is 5 years. Evaluate this claim if a random sample of 25 members has a mean age of 34.5 years.


1.4 Sampling and the Finite Correction Factor

Assuming the central limit theorem applies, an adjustment to the standard error calculation for both means and proportions may be needed for sample sizes that represent a significant portion of the overall population.

You will notice that in statistics “significance” often refers to a threshold of 5%. The reasoning is due to the Empirical Rule. Recall that the Empirical Rule states that approximately 95% of data will fall within 2 standard deviations of the mean. Thus, approximately 5% will fall outside 2 standard deviations. Data that falls outside of the 95% threshold is considered “significantly” different from the rest of the data.

For sample sizes (without replacement) that represent a significant portion of the population, a correction factor is needed to slightly reduce the standard error. Therefore, more precise estimates are calculated when the finite population correction factor is applied.

Small populations require an adjustment to the standard error of the mean calculation if:
– the proportion n/N is greater than 5%, and
– the sampling is without replacement.

Formula for the Standard Error of the Mean with Finite Correction Factor
σ_x̄ = (σ/√n) × √((N − n)/(N − 1))

Formula for the Standard Error of the Proportion with Finite Correction Factor
σ_p̂ = √(pq/n) × √((N − n)/(N − 1))

The finite correction factor is only applied when the population size is known (finite), the sampling is conducted without replacement and the ratio of the sample size to the population size is more than 5%. Students often tell me that their answer was only “off” by a very small decimal value. Most of the time, the reason is that the student neglected to apply the finite correction factor.
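The correction can be checked by brute force on a population small enough to enumerate. The Python sketch below uses a hypothetical population {2, 4, 6, 8, 10} (N = 5), lists every sample of n = 2 drawn without replacement, and compares the actual spread of the sample means with σ/√n alone and with σ/√n multiplied by the finite correction factor √((N − n)/(N − 1)). The function name `fpc` is our own.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def fpc(n: int, N: int) -> float:
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))


population = [2, 4, 6, 8, 10]   # hypothetical population
N, n = len(population), 2
sigma = pstdev(population)      # population standard deviation

print(f"n/N = {n / N:.0%} (above 5%, and we sample without replacement)")

# Every sample of size n drawn without replacement
sample_means = [mean(s) for s in combinations(population, n)]

actual_se = pstdev(sample_means)
plain_se = sigma / sqrt(n)
corrected_se = plain_se * fpc(n, N)

print(f"actual sd of sample means: {actual_se:.4f}")
print(f"sigma/sqrt(n):             {plain_se:.4f}")
print(f"with correction factor:    {corrected_se:.4f}")
```

Without the correction the standard error is overstated; with it, the formula matches the exact spread of all possible sample means.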


Section 1.4 Practice
1. The mean rent of an apartment in a professionally managed apartment building of 200 apartments is $800. You randomly select 16 apartments in this building. What is the probability that the average rent of the sample is less than $830? Assume that the rents are normally distributed, with a standard deviation of $100.

2. For a normal population, with a mean of 50 and a standard deviation of 10:

a. Determine the probability of observing a sample mean of 45 or more from a sample of 25.

b. Determine the probability of observing a sample mean of 45 or more from a sample of 25. Assume the size of the population is 400.


3. Assume that the probability of success for a population is 40%. If random samples of size 25 are drawn (without replacement):

a. Determine the probability of observing a sample proportion of 35% or more.

b. Determine the probability of observing a sample proportion of 35% or more if the population size is known to be 250.


Chapter 2 Confidence Intervals

Introduction
In Chapter 1, we reviewed how to determine probabilities of discrete and continuous distributions using MS Excel. For the remainder of this text, we will be working with samples of continuous distributions. Whether or not the sampling distributions are normal will depend on the distribution of the population and the sample size.

When the population is normally distributed, the distribution of the sample means and sample proportions will also be normally distributed.

When the population distribution is unknown, as long as the sample size is large enough (typically at least 30), the distribution of the sample means and sample proportions will be approximately normally distributed.

The remaining chapters in this text will focus on inferential statistics, or using sampling distributions to make inferences about the population. Confidence intervals, hypothesis testing and regression analysis all utilize sample data to make predictions concerning populations.

“We seldom know reality, what we know is the results of observing a sample of reality”

“Confidence Intervals, Hypothesis Testing and P-Values are ways of telling us how much of a mistake we are likely to make”

Objectives
1. Compute Critical Value(s) for Confidence Intervals (sigma known & unknown)
2. Use the Standard Normal Table and Student’s T-Distribution Table to compute Critical Value(s)
3. Compute Margins of Error for Confidence Intervals of Parameters (means & proportions)
4. Compute Confidence Intervals for parameters (means & proportions)
5. Adjust Margin of Error for Finite Correction Factor when appropriate
6. Determine the minimum sample size for Confidence Intervals

2.1 Confidence Interval Vocabulary

Confidence Intervals provide a RANGE of plausible values for a parameter based on sample statistics. Each interval runs from a lower confidence limit to an upper confidence limit, and under repeated sampling the specified level of confidence is the percentage of such intervals that would contain the parameter. These limits are determined by the point estimate plus or minus the margin of error.

Page 25: ESSENTIALSOFINFERENTIAL STATISTICS 2016deannaal/Statistics_Textbook.pdf · Essentials of Inferential Statistics 2016 is an independent textbook. Printed in the United States of America

18 Chapter 2

The point estimate is the sample statistic or middle point of the interval estimate.

The margin of error is the distance between the point estimate and the limits of the confidence interval.

The width of the confidence interval is the distance between the limits of the confidence interval.

The level of confidence (1 - α) is the probability (or percentage) of all possible samples that can be expected to produce an interval that includes the true population parameter; it determines the width of the confidence interval.

The level of significance (α) is the complement of the level of confidence. The level of significance is also the probability that an interval constructed this way does not contain the true parameter.
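These terms fit together with simple arithmetic. A minimal Python sketch using the numbers from the visual discussed in this section (point estimate 70%, margin of error 5%, 95% confidence):

```python
point_estimate = 0.70    # sample statistic: the center of the interval
margin_of_error = 0.05   # distance from the center to either limit
confidence = 0.95        # level of confidence, 1 - alpha

lower_limit = point_estimate - margin_of_error
upper_limit = point_estimate + margin_of_error
width = upper_limit - lower_limit   # always 2 * margin_of_error
alpha = 1 - confidence              # level of significance

print(f"interval: ({lower_limit:.2f}, {upper_limit:.2f}), width = {width:.2f}, alpha = {alpha:.2f}")
```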

Visual for Confidence Interval
In the diagram below, the true population proportion (parameter) is 70%. Below the horizontal axis we see 20 confidence intervals, each created from a different random sample of the population. We see that 19 of the 20 confidence intervals “capture” the true parameter of 70%. This represents the level of confidence: with repeated sampling, it is expected that 95% of the intervals will include the true parameter. Therefore, there is 1 interval that does not “capture” the true parameter (1/20 = 5%).

Thus, we state that we are 95% confident that the true proportion will be between 65% and 75% with a 5% margin of error.

This level of confidence is not the probability that a sample proportion will produce limits of 65% to 75%. The sample proportion is already known; it is the point estimate.

The level of confidence indicates the likelihood (probability) that intervals constructed this way will contain the population parameter upon repeated sampling.
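To make “95% of intervals” concrete, here is a minimal Python sketch that repeats the sampling process many times. It assumes a true proportion of 0.70 (as in the diagram), a hypothetical sample size of n = 323 (chosen so the 95% margin of error is roughly 5 percentage points), 1,000 repetitions, and the proportion interval from Section 2.5. The seed is arbitrary, so the exact rate depends on it, but it should land near 0.95.

```python
import random
from statistics import NormalDist


def coverage_rate(p_true=0.70, n=323, conf=0.95, reps=1000, seed=1):
    """Share of simulated confidence intervals that capture p_true (illustrative values)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # two-tailed z-critical value
    hits = 0
    for _ in range(reps):
        # Draw one random sample of n yes/no responses from the population
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n                               # point estimate
        margin = z * (p_hat * (1 - p_hat) / n) ** 0.5       # margin of error
        if p_hat - margin <= p_true <= p_hat + margin:      # did this interval capture p?
            hits += 1
    return hits / reps


print(coverage_rate())  # near 0.95; the exact value depends on the seed
```

Raising `conf` to 0.99 should push the capture rate toward 99%, at the cost of wider intervals.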

Confidence Intervals are ALWAYS two-tailed.


Confidence Intervals for Decision Making
Let’s assume 2 candidates are running for office. If we randomly poll individuals and discover that 48% of those surveyed plan to vote for candidate A and 53% plan to vote for candidate B, does that mean that candidate B will be the winner?

Remember, our results are from a random sample and do not represent the entire population. Therefore, we must allow for a margin of error. Our margin of error depends on our level of confidence, which produces a critical value (see section 2.2), and on the standard error of the distribution (see section 1.3).

Let’s assume our margin of error is 5% for a 90% level of confidence. This means that we are 90% confident that the true proportion of voters who will choose candidate A is between 43% and 53%. Meanwhile, we are also 90% confident that the true proportion of voters who will choose candidate B is between 48% and 58%. Thus, given repeated sampling of the population, we expect 90% of the intervals constructed this way to contain the true proportion.

As you can see, the intervals overlap and both intervals extend above 50%. It is possible that candidate A will receive only 48% of the votes and candidate B will prevail with 53% of the votes. However, it is also possible that candidate A will win 53% of the votes and candidate B will only gain 48% of the votes. Therefore, this is a tight race: we cannot confidently tell from our sample who will win.
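The overlap reasoning can be sketched in a few lines of Python. This is only the informal check used above, with the 5% margin of error taken as given; it is not a formal test of whether the two proportions differ (two-population tests are covered in Chapter 3). The function names are illustrative.

```python
def interval(point_estimate, margin_of_error):
    """Return (lower limit, upper limit) for a confidence interval."""
    return (point_estimate - margin_of_error, point_estimate + margin_of_error)


def intervals_overlap(a, b):
    """True when two (lower, upper) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


candidate_a = interval(0.48, 0.05)  # roughly (0.43, 0.53)
candidate_b = interval(0.53, 0.05)  # roughly (0.48, 0.58)

if intervals_overlap(candidate_a, candidate_b):
    print("Intervals overlap: too close to call from this sample.")
else:
    print("No overlap: one candidate appears to lead.")
```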

2.2 Critical Values

As was mentioned in the above scenario, the specified level of confidence produces a Critical Value. Critical Values are the boundary lines of the distribution: the number of standard errors away from the center of the distribution, as determined by the level of confidence.

As the level of confidence increases, the critical values increase (move further into the tails of the distribution). As the level of confidence decreases, the critical values decrease (move closer to the center of the distribution).


Standard Normal Distribution (z-critical values)
Use the Standard Normal Distribution to determine z-critical values for inference of:

– the population mean (μ) when the population standard deviation (σ) is known.
– the population proportion (p).

Summary of Common Z-Critical Values

Level of Confidence    0.90      0.95      0.98      0.99
One-Tail (α/2)         0.05      0.025     0.01      0.005
Two-Tail (α)           0.10      0.05      0.02      0.01
Z Critical Value       ±1.645    ±1.960    ±2.326    ±2.576


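Excel formulas to compute z-critical values (general, then with alpha = 0.10)

Left Tailed: =NORMSINV(alpha); with alpha = 0.10, =NORMSINV(0.10) ≈ -1.282

Right Tailed: =NORMSINV(1-alpha); with alpha = 0.10, =NORMSINV(0.90) ≈ 1.282

Two-tailed: =NORMSINV(alpha/2) and =NORMSINV(1-alpha/2); with alpha = 0.10, =NORMSINV(0.05) ≈ -1.645 and =NORMSINV(0.95) ≈ 1.645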
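If Excel is not at hand, Python's standard library offers the same inverse-normal calculation through `statistics.NormalDist().inv_cdf`, which plays the role of `=NORMSINV` in this sketch. The helper names `z_left`, `z_right` and `z_two` are hypothetical; the loop reproduces the common two-tailed z-critical values for 90%, 95%, 98% and 99% confidence.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_left(alpha):
    """Left-tailed critical value: area alpha to its left (like =NORMSINV(alpha))."""
    return std_normal.inv_cdf(alpha)


def z_right(alpha):
    """Right-tailed critical value: area alpha to its right (like =NORMSINV(1-alpha))."""
    return std_normal.inv_cdf(1 - alpha)


def z_two(alpha):
    """Positive two-tailed critical value; the boundaries are -z and +z."""
    return std_normal.inv_cdf(1 - alpha / 2)


for conf in (0.90, 0.95, 0.98, 0.99):
    print(f"{conf:.0%} confidence: z = ±{z_two(1 - conf):.3f}")
```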


Student’s T-Distribution (t-critical values)
Use the Student’s T-Distribution to determine t-critical values for inference of the population mean (μ) when the population standard deviation (σ) is unknown. Only the sample standard deviation is given or computed.

The Student’s t-Distribution is a family of distributions with properties similar to the Standard Normal Distribution:

– The t-distribution is bell-shaped and symmetrical around its mean of 0.
– The area under the curve is 1 or 100%.
– The t-distribution has a larger variance than the normal distribution. Thus, it is flatter and has wider tails than the normal distribution.
– The critical t-values will be greater than the critical z-values for the same level of confidence or level of significance.
– The t-distribution is a family of curves based on degrees of freedom. As the degrees of freedom (and thus the sample size) increase, the t-distribution approaches the Normal Distribution; by a sample size of about 30 the two are already close.

Degrees of Freedom

The degrees of freedom for the Student’s T-distribution are computed as follows: df = n − 1, where n is the sample size.

As the degrees of freedom increase, the t-distribution converges on the normal distribution. For sample sizes of 120 or more, the distributions are approximately identical. When the sample size is larger than 30, the two distributions are difficult to distinguish. Thus, for a sample size of 30 or more, many researchers will use the z distribution.
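Python's standard library has no Student's t functions, so the sketch below approximates the t cumulative probability by numerically integrating the t density with Simpson's rule (a statistics package or Excel would normally be used instead; `t_cdf` is a hypothetical helper). It illustrates the convergence described above: the area beyond ±1.96, the 95% z-critical value, shrinks toward 5% as the degrees of freedom grow. The df values are chosen only for illustration.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) for Student's t with df degrees of freedom."""
    # Normalizing constant of the t density, computed with log-gamma to avoid overflow
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # Simpson's rule needs an even number of steps
    total = pdf(0) + pdf(a)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area  # symmetry about 0


z_95 = NormalDist().inv_cdf(0.975)  # 95% two-tailed z-critical value, about 1.96
for df in (4, 9, 29, 119, 1000):
    tail_area = 2 * (1 - t_cdf(z_95, df))  # area beyond -1.96 and +1.96
    print(f"df = {df:4d}: area beyond ±1.96 = {tail_area:.4f}")
```

With few degrees of freedom the heavier t tails hold noticeably more than 5%, which is why the t-critical value must be larger than 1.96 for the same confidence.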

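In an effort to understand where the formula for the degrees of freedom comes from, let us remember the formula for the sample variance:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Because the n deviations from the sample mean must sum to zero, only n − 1 of them are free to vary; this is the denominator n − 1.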

Another way to wrap your mind around the formula for degrees of freedom is to consider the following question:

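If the average of 4 numbers is 80, what are the degrees of freedom?

_______ _______ _______ _______

Three of the numbers can be chosen freely, but the fourth is then forced, because all four must sum to 4 × 80 = 320. Thus df = 4 − 1 = 3.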
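The same idea in a short Python sketch: once three of the four numbers are chosen (the values below are hypothetical), the fourth is fixed by the mean. The standard library's `statistics.variance` divides by n − 1 (sample variance), while `statistics.pvariance` divides by n (population variance).

```python
import statistics

mean, n = 80, 4
free_choices = [70, 95, 85]                 # hypothetical: any three values we like
forced_last = mean * n - sum(free_choices)  # no freedom left: must be 320 - 250 = 70
values = free_choices + [forced_last]

print(forced_last, statistics.mean(values))

# Sample variance divides by n - 1 (the degrees of freedom); population variance by n
print(statistics.variance(values), statistics.pvariance(values))
```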


Student’s T-Distribution (t-critical values)
Use the Student’s T-Distribution to determine t-critical values for inference of:

– the population mean (μ) when the population standard deviation (σ) is unknown.

Because t-critical values depend on the degrees of freedom (and therefore on the sample size), we cannot generalize the critical values for certain levels of confidence. Thus, you will use either the T-Distribution Table or MS Excel to compute the t-critical values.

When using the T-Distribution Table, be mindful of the first 3 rows that designate the levels of confidence. The first row lists the level of confidence. The second row then indicates the probability in one of the tails based on the given level of confidence. For example, if the level of confidence is 0.90 or 90%, the area in ONE tail would be 0.05 or 5%, because confidence intervals are TWO-TAILED.

Finally, the third row lists the TOTAL area in both tails of the distribution. Therefore, for the 90% level of confidence we see that there is 5% in ONE tail and 10% in TWO tails. Each of the first three rows describes the same column of critical values!


Use T-Critical Values when ONLY the SAMPLE STANDARD deviation is given or computed.

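Excel formulas to compute t-critical values (general, then with alpha = 0.10, n = 20, so df = 19)

Left Tailed: =T.INV(alpha, df); =T.INV(0.10, 19) ≈ -1.328

Right Tailed: =T.INV(1-alpha, df); =T.INV(0.90, 19) ≈ 1.328

Two-tailed: =T.INV.2T(alpha, df); =T.INV.2T(0.10, 19) ≈ 1.729 (use ±1.729)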

HINT: If n > 30, use Excel formulas to compute the critical value (CV) rather than the table!
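For readers working outside Excel, the following Python sketch mirrors `=T.INV` and `=T.INV.2T` for the example above (alpha = 0.10, n = 20). Because the standard library has no t functions, it approximates the t CDF by numerical integration and inverts it by bisection. `t_cdf` and `t_ppf` are hypothetical helpers, not a library API, and their results should agree with table values to about three decimals.

```python
import math


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) for Student's t via Simpson's rule on the density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Approximate t value with cumulative area p to its left (0 < p < 1), by bisection."""
    if p < 0.5:
        return -t_ppf(1 - p, df)  # the t-distribution is symmetric about 0
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the search interval until it brackets p
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha, n = 0.10, 20
df = n - 1
left = t_ppf(alpha, df)         # like =T.INV(0.10, 19)
right = t_ppf(1 - alpha, df)    # like =T.INV(0.90, 19)
two = t_ppf(1 - alpha / 2, df)  # like =T.INV.2T(0.10, 19), used as ±
print(round(left, 3), round(right, 3), round(two, 3))
```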


2.3 Estimate population mean with sigma known
When you estimate a population mean using a confidence interval, one of the first things to consider is whether or not the population standard deviation (sigma) is known. In reality, if sigma were known there would be no need to create a confidence interval, as the population mean is needed to compute sigma!

However, there are times when a researcher may assume sigma for a current study based on previous data. In other words, the researcher does not really know the value of sigma, but is willing to assume it is the same as it was in a previous study.

When sigma is either given or assumed based on a previous study, the z-critical value will be used tocompute the margin of error to estimate a population mean.

Steps to creating a Confidence Interval to estimate μ with σ known
1. Compute/determine the sample mean, x̄. This is the point estimate.
2. Determine the level of confidence (typically given; if not, use 95%).
3. Use the level of confidence and the standard normal distribution or =NORMSINV(1-alpha/2) to determine the z-critical value.
4. Compute the standard error of the mean, σ/√n.
5. Use results from steps 3 & 4 to compute the margin of error (z-critical value × standard error).
6. To compute the Lower Limit (LL), subtract the margin of error from the sample mean.
7. To compute the Upper Limit (UL), add the margin of error to the sample mean.
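The seven steps translate directly into a small Python function. This is a minimal sketch using only the standard library; the function name and the inputs (sample mean 50, sigma 8, n = 64) are hypothetical and for illustration only.

```python
import math
from statistics import NormalDist


def z_interval_mean(x_bar, sigma, n, conf=0.95):
    """Confidence interval for mu when sigma is known or assumed from a previous study."""
    alpha = 1 - conf                              # step 2: level of confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # step 3: like =NORMSINV(1-alpha/2)
    se = sigma / math.sqrt(n)                     # step 4: standard error of the mean
    margin = z_crit * se                          # step 5: margin of error
    return x_bar - margin, x_bar + margin         # steps 6-7: LL and UL


# Hypothetical inputs: sample mean 50 (step 1), sigma 8, n = 64
ll, ul = z_interval_mean(x_bar=50, sigma=8, n=64, conf=0.95)
print(f"95% CI: ({ll:.2f}, {ul:.2f})")
```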

2.4 Estimate population mean with sigma unknown
As was mentioned in the last section, the population standard deviation is typically not known. It is more likely that the standard deviation is computed from the sample. If you are unsure whether the standard deviation is a parameter or a statistic, you should assume it is a statistic (sample standard deviation).

When sigma is unknown, the t-critical value will be used to compute the margin of error to estimate apopulation mean.


Steps to creating a Confidence Interval to estimate μ with σ unknown

1. Compute/determine the sample mean, x̄. This is the point estimate.
2. Determine the level of confidence (typically given; if not, use 95%).
3. Use the level of confidence and the Student’s T-Distribution or =T.INV.2T(alpha, df), with df = n − 1, to determine the t-critical value.
4. Compute the standard error of the mean, s/√n.
5. Use results from steps 3 & 4 to compute the margin of error (t-critical value × standard error).
6. To compute the Lower Limit (LL), subtract the margin of error from the sample mean.
7. To compute the Upper Limit (UL), add the margin of error to the sample mean.
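A minimal Python sketch of these steps, starting from raw data. Since the standard library has no t functions, the hypothetical helpers `t_cdf` and `t_ppf` approximate the t-critical value numerically (in practice, use the T-Distribution Table or `=T.INV.2T`). The sample values are hypothetical.

```python
import math
import statistics


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) for Student's t via Simpson's rule on the density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p, df):
    """Approximate t value with cumulative area p to its left, by bisection."""
    if p < 0.5:
        return -t_ppf(1 - p, df)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval_mean(data, conf=0.95):
    """Confidence interval for mu when sigma is unknown (sample standard deviation used)."""
    n = len(data)
    x_bar = statistics.mean(data)               # step 1: point estimate
    alpha = 1 - conf                            # step 2: level of confidence
    t_crit = t_ppf(1 - alpha / 2, n - 1)        # step 3: like =T.INV.2T(alpha, n - 1)
    se = statistics.stdev(data) / math.sqrt(n)  # step 4: s / sqrt(n)
    margin = t_crit * se                        # step 5: margin of error
    return x_bar - margin, x_bar + margin       # steps 6-7: LL and UL


sample = [12, 15, 11, 14, 13, 16, 12, 15]  # hypothetical data
ll, ul = t_interval_mean(sample, conf=0.95)
print(f"95% CI: ({ll:.2f}, {ul:.2f})")
```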

2.5 Estimate population proportion

When creating a confidence interval to estimate a population proportion, a standard deviation will not be directly given. The standard error of the proportion will need to be computed from the point estimate (sample proportion) and the sample size. If you are given neither a standard deviation nor data to compute one, check whether you are being asked to create a confidence interval for the population proportion.

When estimating a population proportion, the z-critical value will always be used to compute the margin of error. Remember that the standard error of the proportion will need to be computed as well.

Steps to creating a Confidence Interval to estimate p

1. Compute/determine the sample proportion, p̂. This is the point estimate.
2. Determine the level of confidence (typically given; if not, use 95%).
3. Use the level of confidence and the standard normal table or =NORMSINV(1-alpha/2) to determine the z-critical value.
4. Compute the standard error of the proportion, √(p̂(1 − p̂)/n).
5. Use results from steps 3 & 4 to compute the margin of error (z-critical value × standard error).
6. To compute the Lower Limit (LL), subtract the margin of error from the sample proportion.
7. To compute the Upper Limit (UL), add the margin of error to the sample proportion.
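The same pattern works for a proportion. In this minimal Python sketch the survey counts (120 “yes” answers out of 300) are hypothetical, and `proportion_interval` is an illustrative name.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, conf=0.95):
    """Confidence interval for a population proportion p."""
    p_hat = successes / n                         # step 1: point estimate
    alpha = 1 - conf                              # step 2: level of confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # step 3: like =NORMSINV(1-alpha/2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)       # step 4: standard error of the proportion
    margin = z_crit * se                          # step 5: margin of error
    return p_hat - margin, p_hat + margin         # steps 6-7: LL and UL


# Hypothetical survey: 120 "yes" answers out of 300
ll, ul = proportion_interval(120, 300, conf=0.95)
print(f"95% CI: ({ll:.1%}, {ul:.1%})")
```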


Section 2.3 - 2.5 Practice
*Hint: if the population size is known (N) and n/N > 5%, apply the FCF (Finite Correction Factor)
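The hint can be expressed as a small Python helper. It assumes the usual form of the Finite Correction Factor, FCF = √((N − n)/(N − 1)), which multiplies the standard error when n/N exceeds 5%. The helper names and inputs below are hypothetical; for a proportion, pass √(p̂(1 − p̂)) as the standard deviation.

```python
import math


def fcf(n, N):
    """Finite Correction Factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))


def standard_error(sd, n, N=None):
    """sd / sqrt(n), multiplied by the FCF when N is known and n/N > 5%."""
    se = sd / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= fcf(n, N)
    return se


# Hypothetical: sd = 10, n = 200 drawn from a population of N = 2,000 (n/N = 10%)
print(round(standard_error(10, 200), 4))        # population size unknown: no correction
print(round(standard_error(10, 200, 2000), 4))  # FCF applied, slightly smaller
```

The correction always shrinks the standard error, because sampling a large share of a finite population leaves less uncertainty about it.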

1. A random sample of 25 UB students finds an average commute time of 20 minutes with a sample standard deviation of 10 minutes. Assuming commute time follows a normal distribution, construct a 90% confidence interval for the average commute time for all UB North students. Round your confidence interval to the nearest tenth.

2. When 400 UB students were surveyed, 220 of them said they lived at home with their parents. Construct a 99% confidence interval for the proportion of all UB students who live at home with their parents. Round your confidence interval to the nearest whole percent. (There are approx. 20,000 undergraduate students, so only a small fraction of the population is sampled: n/N < 0.05.)


3. A random sample of 100 UB students finds an average commute time of 20 minutes with a sample standard deviation of 10 minutes. Assuming commute time follows a normal distribution, construct a 95% confidence interval for the average commute time for all UB North students. Round your confidence interval to the nearest tenth.

4. There are approximately 20,000 Undergraduate students at UB. A random sample of 1500 UB students finds an average commute time of 20 minutes with a sample standard deviation of 7 minutes. Assuming commute time follows a normal distribution, construct a 95% confidence interval for the average commute time for all UB North students. Round your confidence interval to the nearest tenth. (n/N = 1,500/20,000 = 0.075)


5. A random sample of 25 UB students finds an average commute time of 20 minutes with an assumed population standard deviation of 10 minutes. Assuming commute time follows a normal distribution, construct a 99% confidence interval for the average commute time for all UB North students. Round your confidence interval to the nearest tenth.

6. There are approximately 20,000 Undergraduate students at UB. When 1200 UB students were surveyed, 210 of them said they lived at home with their parents. Construct a 95% confidence interval for the proportion of all UB students who live at home with their parents. Round your confidence interval to the nearest whole percent. (n/N = 1,200/20,000 = 0.06; use the Finite Correction Factor)


2.6 Determine minimum sample size

Recall that confidence intervals are based on a point estimate, a sample statistic from the population. A sample is only part of the population. Thus, based on sampling, we can never be 100% confident in our estimates of population parameters. While we cannot attain 100% confidence, there are ways to adjust the width of our confidence intervals.

Ways to adjust the width of the Confidence Interval

1. Change the level of confidence

2. Change the Sample Size
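Both levers can be seen in a short Python sketch (σ = 10 and the sample sizes are hypothetical, and the z-interval for a mean is assumed): raising the level of confidence widens the interval, while quadrupling the sample size cuts the width in half, because the standard error shrinks with √n.

```python
import math
from statistics import NormalDist


def ci_width(sigma, n, conf):
    """Width (2 x margin of error) of a z-interval for the mean."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z_crit * sigma / math.sqrt(n)


sigma = 10  # hypothetical population standard deviation
for conf in (0.90, 0.95, 0.99):
    widths = [round(ci_width(sigma, n, conf), 2) for n in (25, 100, 400)]
    print(f"{conf:.0%} confidence, n = 25 / 100 / 400: widths {widths}")
```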

Determining Minimum Sample Size

The formulas for determining the minimum sample size are derived from the formulas for the margin of error (E). With a bit of algebra, solving each margin-of-error formula for n gives the minimum sample size to estimate the population mean (left) and the minimum sample size to estimate the population proportion (right):

$$n = \left(\frac{z\,\sigma}{E}\right)^2 \qquad\qquad n = \hat{p}\,(1 - \hat{p})\left(\frac{z}{E}\right)^2$$

How do we know which formula to use?

Are you given a standard deviation? Are you determining a minimum sample size to estimate a mean ora proportion?

Be mindful of rounding!

Always round minimum sample size UP to the nearest whole number. Thus, whether the computed minimum sample size is n = 24.892 or n = 24.132, the minimum sample size required will be n = 25.
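A minimal Python sketch of both formulas, with `math.ceil` doing the “always round UP” step. The targets below are hypothetical. Using p̂ = 0.5 when no pilot estimate exists is an assumption here (the common conservative choice, since it makes p̂(1 − p̂) as large as possible), not something stated in this section.

```python
import math
from statistics import NormalDist


def z_crit(conf):
    """Two-tailed z-critical value for a level of confidence."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def min_n_mean(sigma, margin, conf):
    """n = (z * sigma / E)^2, always rounded UP."""
    return math.ceil((z_crit(conf) * sigma / margin) ** 2)


def min_n_proportion(margin, conf, p_hat=0.5):
    """n = p(1 - p)(z / E)^2, always rounded UP.

    p_hat = 0.5 is the conservative default when no pilot estimate is available.
    """
    return math.ceil(p_hat * (1 - p_hat) * (z_crit(conf) / margin) ** 2)


# Hypothetical targets
print(min_n_mean(sigma=15, margin=3, conf=0.95))  # about 96.04, rounded UP to 97
print(min_n_proportion(margin=0.04, conf=0.95))   # about 600.23, rounded UP to 601
```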


Section 2.6 Practice
1. A manager of a local company is interested in estimating the average amount of money per week her employees spend on gasoline. How large a sample must she select if she desires to be 99% confident that the true mean is within $2 of the sample mean? The population standard deviation of the average amount of money per week spent on gasoline is known to be $10.

2. What sample size is needed to estimate with 95% confidence the population proportion of U.S. citizens with blue eyes within a margin of error of ± 5%? Assume a pilot sample of 100 people found 22 with blue eyes.

3. What sample size is needed to estimate with 90% confidence the population proportion of U.S. citizens with green eyes within a margin of error of ± 5%?


Basic Concepts – Review Questions Chapters 1 & 2
1. In your own words, why is it important to know the standard deviation of a data set when interpreting risk?

2. How would you explain the sampling distribution of the mean to someone who is just learning about the concept?

3. How does the standard error of the mean relate to the population from which it was taken?

4. How would you explain the central limit theorem to someone that does not have any statistical background?


5. Compare and contrast a point estimate and an interval estimate.

6. Compare and contrast the t-distribution and the standard normal distribution.

7. In your own words, explain what a level of confidence is.

8. How does changing the level of confidence affect a confidence interval? What about changing the sample size?


Chapter 3 Hypothesis Testing

Introduction
In Chapter 2, we estimated population parameters based on sample statistics. The purpose of confidence intervals is to help us estimate parameters. The interval estimates allow us to utilize sample data to make predictions concerning the population. Our resulting confidence interval, however, is limited. This is because the interval is created based on a sample of the population. Thus, we cannot be 100% confident that the population parameter is contained in our confidence interval. This is why we refer to confidence intervals as estimates. We use sample statistics to estimate the values of population parameters within our level of confidence.

Confidence intervals begin with sample statistics and attempt to estimate population parameters. In this chapter, we will reverse this process: we will begin with a claim concerning a population parameter and then use sample statistics to test the claim. This process is hypothesis testing.

The process of hypothesis testing begins with a hypothesis (claim) about a population parameter. The question is whether or not the claim about the parameter is warranted. To determine the merit of the claim, evidence (sample statistics: sample mean, sample proportion) is gathered. This evidence is then compared to the claimed parameter. Utilizing probabilities, we then determine how likely it would be to gather this evidence if the proposed claim about the parameter were true.

Objectives
1. Differentiate the basic concepts of hypothesis testing.
2. Compare the assumptions of each hypothesis testing procedure.
3. Design hypothesis tests to verify a claim about a population mean.
4. Design hypothesis tests to verify a claim about a population proportion.
5. Identify the pitfalls (Type I and Type II errors) involved in hypothesis testing.
6. Design hypothesis tests to verify a claim about the means of two independent populations.
7. Design hypothesis tests to verify a claim about the proportions of two independent populations.

3.1 Hypothesis Testing (vocabulary, overview)

Hypothesis testing begins with a hypothesis (claim) about a population parameter. When performing a hypothesis test, the FIRST step is to determine the claim. Essentially there are 6 options for a claim of a single population parameter (=, ≠, <, ≤, >, ≥). It is important to stress that the claim concerns a population parameter. Sample statistics are then gathered in an effort to determine whether or not the claim is valid.


The SECOND step is to define the HYPOTHESES. In hypothesis testing there will always be 2 hypotheses: the null hypothesis and the alternative hypothesis. A claim about a population parameter could be made with either of these hypotheses.

The null hypothesis (H0) is a statement that claims equality. In other words, the null hypothesis claims that a population parameter is either at least, at most or equal to a stated value (mean or proportion). The null hypothesis is ALWAYS ASSUMED TO BE TRUE!

The alternative hypothesis (H1 or Ha) is the counter-claim. The alternative hypothesis represents the opposing claim. Thus, the alternative hypothesis is a statement of inequality. In other words, the alternative hypothesis claims that a population parameter is either less than, greater than or different from (not equal to) a stated value (mean or proportion). The alternative hypothesis assumes the null hypothesis is false.

It is important to remember that we cannot test an inequality. An inequality is not defined by one single value; it is defined by an infinite set of values, so there is no single value that can serve as a starting point. For this reason, we do not test an alternative hypothesis.

However, we absolutely can test an equality. If we have statements such as “equal to”, “at most” or “at least”, we are able to define a single value. Even though the statements “at most” and “at least” extend to either infinity or negative infinity, they each have an absolute starting point. For this reason, we only test null hypotheses.

The THIRD step is to determine the NATURE OF THE TEST. The nature of the test will help you conduct your analysis. Recall from Chapter 2 that confidence intervals are ALWAYS two-tailed. The level of confidence represents the probability that the estimated interval contains the population parameter. Because this interval has a lower and an upper limit, the true parameter might fall outside the interval estimate on either side. The level of significance represents the probability that the estimated interval does not contain the population parameter. Thus, the NATURE OF CONFIDENCE INTERVALS is always two-tailed, as there are two “tails” or areas of potential error. The nature of the test defines the area of potential “error” in a test.


There are 3 possibilities for the NATURE OF HYPOTHESIS TESTS. The nature of the hypothesis test depends on the statement of the claim. The nature of the test signifies which tail of the distribution contains the potential “error” for the test. The LEVEL OF SIGNIFICANCE determines the area or probability for this potential error. (Error in a hypothesis test will be defined in the next section.)

1. One-tailed Right (Ha: greater than, increase, exceed; Ho: less than or equal, at most)
2. One-tailed Left (Ha: less than, decreased; Ho: greater than or equal, at least)
3. Two-tailed (Ha: not equal to, changed, different; Ho: equal to, the same, no change)

The FOURTH step is to gather evidence and compare it to the proposed parameter. The evidence that is gathered will provide a sample statistic. The question becomes: is the sample statistic “close enough” to the proposed parameter (as determined by the null hypothesis)?

If the sample statistic is “close enough” to the proposed parameter, then the statistic that was gathered did not provide enough evidence to refute the parameter, and the parameter continues to be assumed true (not guilty).

If the sample statistic is not “close enough” to the proposed parameter, then the statistic that was gathered did provide enough evidence to refute the parameter, and it is rejected (guilty).

Example:

You and a group of friends are going bowling. On the way to the bowling alley, one of your friends is bragging that her longstanding bowling average is 160. You feel a bit intimidated, as you were hoping for the bumper pads to be in place!

Scenario A: After 4 games of bowling, your friend’s average is 50. The question becomes, do you continue to believe your friend’s lofty claim that her bowling average is 160? Could it be that she was bragging without cause? Could it be that she is simply having a bad bowling night?


Scenario B: After 4 games of bowling, your friend’s average is 154. The question becomes, do you continue to believe your friend’s lofty claim that her bowling average is 160? Is 154 “close enough” that you will give her a break and assume she simply had an off night?

How do we define “close enough”? It is defined by the level of significance together with one of two methods for comparing the sample statistic to the proposed parameter:

Test-Statistic/Critical Value Method
Compare the test statistic (location of the sample evidence on the z or t distribution) to the critical value of the test (z or t score that defines the rejection zone) as determined by the level of significance.

P-Value Method
Compare the p-value (probability value) to the level of significance. The p-value is the probability of a result at least as extreme as the test statistic. It represents the likelihood of gathering evidence like ours, assuming the null hypothesis is valid.
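Here is a minimal Python sketch of both methods for a one-sample z-test of a mean (σ known), applied to scenario B of the bowling example. The two-tailed nature (H0: μ = 160) and α = 0.05 are assumptions for illustration, since no level of significance is specified here, and `z_test_mean` is a hypothetical name. The two methods are two views of the same cutoff, so they should always reach the same verdict.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_test_mean(x_bar, mu0, sigma, n, tail="two", alpha=0.05):
    """One-sample z-test for a mean with sigma known.

    tail: "left", "right" or "two" (the nature of the test).
    Returns (test statistic, critical value, p-value, reject by CV, reject by p-value).
    """
    z = (x_bar - mu0) / (sigma / math.sqrt(n))  # location of the evidence
    if tail == "left":
        cv = STD_NORMAL.inv_cdf(alpha)
        p_value = STD_NORMAL.cdf(z)
        reject_cv = z < cv
    elif tail == "right":
        cv = STD_NORMAL.inv_cdf(1 - alpha)
        p_value = 1 - STD_NORMAL.cdf(z)
        reject_cv = z > cv
    else:
        cv = STD_NORMAL.inv_cdf(1 - alpha / 2)  # rejection zone: |z| > cv
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
        reject_cv = abs(z) > cv
    return z, cv, p_value, reject_cv, p_value < alpha


# Scenario B: claimed mean 160, sigma 12, 4 games averaging 154
z, cv, p, reject_cv, reject_p = z_test_mean(154, 160, 12, 4, tail="two", alpha=0.05)
print(f"z = {z:.2f}, CV = ±{cv:.2f}, p-value = {p:.4f}, reject H0: {reject_cv} / {reject_p}")
```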

Remember that there are 3 possibilities for the NATURE OF THE TEST.


The FINAL step is to state a VERDICT and CONCLUSION. Remember, we cannot test an inequality. There is not one single value that defines an inequality. For this reason, we test null hypotheses and not alternative hypotheses. Thus, our verdict (“close enough”) is based on the null hypothesis.

Reject the Null Hypothesis: There IS enough evidence to reject the null hypothesis at a specified level of significance.

We reject the null hypothesis in favor of the alternative hypothesis. There is enough evidence to support that a change (increase, decrease, or either) has occurred.

Do not Reject the Null Hypothesis: There IS NOT enough evidence to reject the null hypothesis at a specified level of significance.

We continue to assume the null hypothesis is valid. This does NOT tell us the null hypothesis is true – only that there was not enough evidence to support the alternative.

You cannot conclude the Null Hypothesis is true. You will NEVER ACCEPT the Null Hypothesis.

In reality, it is almost impossible to prove anything true. Instead, all we can do is show that things are unlikely to be true. So instead of proving a hypothesis true, we ask whether the evidence is strong enough to show that the null hypothesis is implausible.

Let’s revisit our bowling example for an overview of each method of hypothesis testing (do not worry, we will spend more time on each method in later sections).

Assume that you have read that the distribution of bowling scores is normal and the population standard deviation for bowlers in America is 12. Conduct a hypothesis test for scenario A, where your friend bowled an average of 50 in 4 games after claiming her longstanding average is 160.

Test Statistic/Critical Value Method

P-Value Method


Now conduct a hypothesis test for scenario B, where your friend bowled an average of 154 although she claimed her longstanding average was 160. Again, assume that the distribution of bowling scores is normal and the population standard deviation for bowlers in America is 12.

One more thought: what if your friend bowled 36 games instead of 4? I know, that is a lot of bowling! Let’s just go with it. How would the results of each scenario (50 & 154) change?

Test Statistic/Critical Value Method

P-Value Method

Test Statistic/Critical Value P-Value


3.2 P-Values & Type I/Type II Errors

The p-value (probability value) is the probability, assuming the null hypothesis is true, of obtaining evidence at least as extreme as the evidence actually gathered. In other words, it measures how consistent the evidence is with the null hypothesis. Small p-values mean the evidence would be unlikely if the null were correct, thus we reject the null (in favor of the alternative). Larger p-values mean the evidence is reasonably likely if the null is correct, thus we do not reject the null (we keep assuming it to be true).

The p-value is not the probability that your null hypothesis is actually correct; it is computed by assuming the null is correct. To compute the p-value, you have to know what kind of distribution you were expecting.

Example:

My son claims he brushed his teeth: Null Hypothesis
I am suspicious that he did NOT brush: Alternative Hypothesis

Remember that the null hypothesis is assumed to be valid (innocent until proven guilty).

If I go upstairs and the toothbrush is wet, does that mean he brushed his teeth?
No, evidence does NOT prove a null hypothesis valid.
Our decisions cannot be perfect – there is always the risk of error due to sampling.

If I go upstairs and the toothbrush is wet, that evidence is quite likely if the null hypothesis is correct (if my son is telling the truth), so the p-value is likely to be high. Large p-values support the null hypothesis.

There is a sufficient probability that the toothbrush would be wet if he did in fact brush his teeth. Thus, I would conclude: DO NOT REJECT THE NULL HYPOTHESIS – there is not enough evidence to reject the null hypothesis in favor of the alternative (not enough evidence to support that my son did not brush his teeth). Thus, I continue to assume the null hypothesis is true and that my son brushed his teeth. My alternative suspicions could not be validated given the evidence that I found.


This does NOT prove that he brushed his teeth. There is always error involved. It could be that he simply ran his toothbrush under the faucet! If the little bugger played this trick on me, I would have made a Type II error (failed to reject a null hypothesis that was false). The probability of this error is known as the Beta risk (which we will study soon).

If I go upstairs and the toothbrush is dry, does that mean he did not brush his teeth?
No, evidence does NOT prove a null hypothesis false.
Remember, our decisions cannot be perfect – there is always the risk of error due to sampling.

If I go upstairs and the toothbrush is dry, that evidence would be unlikely if the null hypothesis were correct (if my son were telling the truth), so the p-value is likely to be low. Small p-values support the alternative hypothesis.

There is a small probability (insufficient probability) that the toothbrush would be dry if he did in fact brush his teeth. Thus, I would conclude… REJECT THE NULL HYPOTHESIS – there is enough evidence (dry toothbrush) to reject the null hypothesis in favor of the alternative hypothesis (enough evidence to support that my son did not brush his teeth). Thus, I assume the null hypothesis is false and that my son did not brush his teeth. My alternative suspicions are validated given the evidence that I found.

This does NOT prove that he did not brush his teeth. Again, there is always error involved. It could be that he used MY toothbrush (yuck!). If he indeed did brush his teeth and I concluded he did not, I would have made a Type I error (rejected a null hypothesis that was true). The probability of this error is known as the Alpha risk (Level of Significance = alpha risk).

The smaller the p-value, the more statistical evidence to support the alternative hypothesis (reject the Null).


TYPE I & TYPE II ERRORS

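Decision | Ho is TRUE | Ho is FALSE
Reject Ho | Type I error (α) | Correct decision (Power = 1 − β)
Do not Reject Ho | Correct decision (1 − α) | Type II error (β)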

The researcher sets the ALPHA risk (probability of a Type I error; incorrectly rejecting a valid null hypothesis). The ALPHA risk is also known as the level of significance and is computed on a distribution centered at the ASSUMED parameter. Type I errors are also called: Producer’s Risk, False Alarm, False Positive.

The Alpha risk then defines the following:

CONFIDENCE LEVEL (1 − α): Probability of correctly accepting (not rejecting) a valid null hypothesis. Computed as the complement of Alpha on a distribution centered at the assumed parameter.

BETA risk: Probability of a Type II error; incorrectly accepting a false null hypothesis. Computed on a distribution centered at the ACTUAL parameter and based on the Alpha risk. Type II errors are also called: Consumer’s Risk, Misdetection, False Negative.

POWER of the test: Probability of correctly rejecting a false null hypothesis. Computed as the complement of Beta.

Applet to visualize each: http://rpsychologist.com/d3/NHST/
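The definitions above can be checked by simulation. The following is a minimal Python sketch, not the book's method: it assumes a right-tailed z-test of Ho: μ ≤ μ0 with σ known, and the values μ0 = 100, σ = 15, n = 25 and a true mean of 106 are hypothetical. When Ho is true, the share of rejections estimates α; when Ho is false, it estimates the power, and 1 − power estimates β.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(mu0, mu_true, sigma, n, alpha, trials=10_000, seed=1):
    """Share of simulated samples in which a right-tailed z-test rejects Ho: mu <= mu0."""
    rng = random.Random(seed)                  # fixed seed for repeatable results
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value set by alpha
    se = sigma / sqrt(n)                       # standard error of the mean
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu_true, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se         # test statistic on the curve centered at mu0
        if z > z_crit:
            rejections += 1
    return rejections / trials

# Ho TRUE (mu_true == mu0): every rejection is a Type I error, so the rate is close to alpha.
type_i_rate = rejection_rate(mu0=100, mu_true=100, sigma=15, n=25, alpha=0.05)

# Ho FALSE (mu_true > mu0): the rejection rate estimates the power; beta = 1 - power.
power = rejection_rate(mu0=100, mu_true=106, sigma=15, n=25, alpha=0.05)
beta = 1 - power
print(f"Type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, beta ~ {beta:.3f}")
```

Raising n or moving the true mean further from μ0 raises the simulated power, while the Type I rate stays near the α the researcher chose.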


Steps to compute Beta and the Power of the Test
Beta (probability of a Type II error) and the Power of the test are based on a distribution centered at the TRUE parameter. Often we do not know the true parameter. Thus, Beta and Power help the researcher set the ALPHA risk.

To compute BETA:

1. Determine the claim.
2. Define the hypotheses.
3. Determine the nature of the test.
4. Compute the Critical Mean (or proportion). This is the data value that corresponds to the Critical Value (as determined by Alpha) on the distribution centered at the assumed parameter.
5. Find the z- or t-score of the Critical Mean (or proportion) in relation to the ACTUAL population mean (or proportion). *This is likely a different distribution than the one in step 4, so you may want to draw 2 curves to see this.
6. Use the nature of the test to compute the Power of the Test and Beta. *Beta will be in the opposite direction of Alpha.
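The six steps can be sketched in Python for a one-mean z-test with σ known. This is a minimal sketch under stated assumptions: the function name `beta_and_power` and its parameters are hypothetical, only one-tailed tests are handled, and the same standard error σ/√n is used on both curves (the assumed one and the actual one).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal curve

def beta_and_power(mu0, mu_actual, sigma, n, alpha, tail):
    """Return (critical mean, beta, power) for a one-tailed z-test of one mean.

    tail="right" means Ha: mu > mu0; tail="left" means Ha: mu < mu0.
    """
    se = sigma / sqrt(n)
    z_alpha = Z.inv_cdf(1 - alpha)
    if tail == "right":
        crit = mu0 + z_alpha * se      # Step 4: critical mean on the curve centered at mu0
        z = (crit - mu_actual) / se    # Step 5: z-score relative to the ACTUAL mean
        beta = Z.cdf(z)                # Step 6: alpha is in the right tail, beta to the left
    elif tail == "left":
        crit = mu0 - z_alpha * se
        z = (crit - mu_actual) / se
        beta = 1 - Z.cdf(z)            # alpha is in the left tail, beta to the right
    else:
        raise ValueError("tail must be 'right' or 'left'")
    return crit, beta, 1 - beta        # power is the complement of beta

# Hypothetical values: Ho: mu <= 100, sigma = 15, n = 25, alpha = 0.05, actual mean 106
crit, beta, power = beta_and_power(100, 106, 15, 25, 0.05, "right")
print(f"critical mean = {crit:.3f}, beta = {beta:.4f}, power = {power:.4f}")
```

If the actual mean equals the assumed mean, beta collapses to 1 − α, which is a quick sanity check on the steps.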

BETA EXAMPLE

In spring 2002, 66% of adults in the US aged 18 years or older had internet access. A Harris Interactive poll in February and May of 2005 surveyed 2022 adults and found that 1365 of the sample had internet access (p = 0.675). Is this enough evidence to make a claim that more adults (greater than 66%) had internet access in 2005 than in 2002 at α = 0.01?

If we assume the actual population proportion is 68%, what is the power of the test? What is the beta risk?
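A minimal Python sketch of this example, following the six steps for computing Beta. It is one reasonable way to work the problem, not necessarily the book's worked solution: in step 5 it assumes the standard error on the actual-proportion curve is computed from the actual proportion 0.68 (using 0.66 on both curves changes beta only slightly).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
p0, p_actual, n, alpha = 0.66, 0.68, 2022, 0.01
p_hat = 1365 / n                                   # sample proportion, about 0.675

# Steps 1-3: claim pi > 0.66, so Ho: pi <= 0.66 and Ha: pi > 0.66 (right-tailed)
se0 = sqrt(p0 * (1 - p0) / n)                      # standard error on the assumed curve
z_stat = (p_hat - p0) / se0
p_value = 1 - Z.cdf(z_stat)
reject = p_value < alpha

# Step 4: critical proportion on the curve centered at 0.66
p_crit = p0 + Z.inv_cdf(1 - alpha) * se0

# Step 5: z-score of the critical proportion on the curve centered at 0.68
se_actual = sqrt(p_actual * (1 - p_actual) / n)    # assumption: SE from the actual proportion
z_beta = (p_crit - p_actual) / se_actual

# Step 6: right-tailed test, so beta lies to the LEFT of the critical proportion
beta = Z.cdf(z_beta)
power = 1 - beta
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}, reject Ho: {reject}")
print(f"critical proportion = {p_crit:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

Under these assumptions the test does not reject Ho at α = 0.01, and beta is roughly two-thirds, so the test has low power to detect a true proportion of 68%.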


3.3 Conducting Single Parameter Hypothesis Tests

In this section we will simply practice what we have learned thus far.

1. A researcher claims that the average height of a woman aged 20 years or older is greater than the 1994 mean height of 63.7 inches. She obtains a sample of 45 women and finds the sample mean height to be 63.9 inches. Assume that the population standard deviation is 3.5 inches. Test the researcher’s belief at α = 0.05.

2. A researcher claims that the average height of a woman aged 20 years or older is greater than the 1994 mean height of 63.7 inches. She obtains a sample of 16 women and finds the sample mean height to be 65 inches and a sample standard deviation of 2.5 inches. Assuming heights of women aged 20 years or older follow a normal distribution, test the researcher’s belief at α = 0.05.
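One way to check problems 1 and 2 in Python, as a sketch using the critical-value approach: problem 1 is a right-tailed z-test (σ known) and problem 2 a right-tailed t-test (σ unknown, df = 15). The t critical value 1.753 is taken from a standard t table and should match Excel's =T.INV(0.95, 15).

```python
from math import sqrt
from statistics import NormalDist

# Problem 1: Ho: mu <= 63.7, Ha: mu > 63.7 (right-tailed), sigma known -> z-test
mu0, x_bar, sigma, n, alpha = 63.7, 63.9, 3.5, 45, 0.05
z_stat = (x_bar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha)    # Excel: =NORM.S.INV(0.95)
p_value = 1 - NormalDist().cdf(z_stat)      # right-tail p-value
reject_1 = z_stat > z_crit

# Problem 2: same hypotheses, sigma unknown, n = 16 -> t-test with df = 15
x_bar2, s, n2 = 65.0, 2.5, 16
t_stat = (x_bar2 - mu0) / (s / sqrt(n2))
t_crit = 1.753                              # t table: alpha = 0.05, right tail, df = 15
reject_2 = t_stat > t_crit

print(f"Problem 1: z = {z_stat:.3f}, p-value = {p_value:.3f}, reject Ho: {reject_1}")
print(f"Problem 2: t = {t_stat:.3f}, reject Ho: {reject_2}")
```

With these numbers, problem 1 does not reject Ho (z ≈ 0.38 is below 1.645) while problem 2 does (t = 2.08 is above 1.753).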


3. A local company claims that 15% of the goods shipped to them by the manufacturer are defective, and thus they are demanding reimbursement. The manufacturer randomly samples 36 shipments and finds 13% to be defective. Does the local company have a valid complaint at the 5% level of significance? What about at the 1% level of significance?
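A Python sketch for problem 3. The problem leaves the direction of the test open, so this sketch assumes a two-tailed test of Ho: π = 0.15 against Ha: π ≠ 0.15; it also checks that nπ and n(1 − π) are at least 5 so the normal approximation is reasonable.

```python
from math import sqrt
from statistics import NormalDist

pi0, p_hat, n = 0.15, 0.13, 36
assert n * pi0 >= 5 and n * (1 - pi0) >= 5     # normal approximation check: 5.4 and 30.6

se = sqrt(pi0 * (1 - pi0) / n)                  # standard error under Ho
z_stat = (p_hat - pi0) / se
p_value = 2 * NormalDist().cdf(-abs(z_stat))    # two-tailed p-value

for alpha in (0.05, 0.01):
    decision = "reject Ho" if p_value < alpha else "do not reject Ho"
    print(f"alpha = {alpha}: z = {z_stat:.3f}, p-value = {p_value:.3f} -> {decision}")
```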

4. A certain brand of apple juice states that it has 64 ounces of juice. Because the legal punishment for underfilling bottles is severe, the larger mean amount of 64.05 ounces is used. However, the filling machine is not precise, and the exact amount of juice varies from bottle to bottle.

A quality control manager wishes to verify the claim that the mean amount of juice in each bottle is 64.05 ounces so that she can be sure that the machine is not over- or underfilling. She randomly samples 22 bottles of juice and measures the content. She finds the average to be 64.007 ounces and the standard deviation of the sample to be 0.06 ounces.

Should the assembly line be shut down so that the machine can be recalibrated? In other words, test the hypothesis that the mean is equal to 64.05. What type of error could have been made? Do you know the probability of such an error?
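A Python sketch for problem 4, a two-tailed t-test of Ho: μ = 64.05 with df = 21. The problem does not state α, so α = 0.05 is an assumption here; 2.080 is the two-tailed t table value for df = 21 and should match Excel's =T.INV.2T(0.05, 21).

```python
from math import sqrt

mu0, x_bar, s, n = 64.05, 64.007, 0.06, 22
t_stat = (x_bar - mu0) / (s / sqrt(n))
t_crit = 2.080                     # t table: alpha = 0.05 (assumed), two tails, df = 21
shut_down = abs(t_stat) > t_crit   # rejecting Ho: mu = 64.05 means recalibrate the machine
print(f"t = {t_stat:.3f}, critical values = +/-{t_crit}, shut down the line: {shut_down}")
```

If Ho is rejected, the error that could have been made is a Type I error, and its probability is the α that was chosen (0.05 under this assumption).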


5. The production line for Glow toothpaste is designed to fill tubes of toothpaste with a mean weight of 6 ounces. Periodically, a sample of 30 tubes will be selected in order to check the filling process. The most recent sample of 30 tubes resulted in a mean weight of 6.1 ounces.

Quality assurance procedures call for the continuation of the filling process if the sample results are consistent with the assumption that the mean filling weight for the population of toothpaste tubes is 6 ounces; otherwise the filling process will be stopped and adjusted. Assume the standard deviation for the filling process is set at 0.2 ounces.
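A Python sketch for problem 5: σ is known, so this is a z-test, and the process stops if the mean is either too high or too low, so the test is two-tailed. The problem does not state α, so α = 0.05 is an assumption.

```python
from math import sqrt
from statistics import NormalDist

mu0, x_bar, sigma, n = 6.0, 6.1, 0.2, 30
alpha = 0.05                                    # assumption: alpha is not given in the problem
z_stat = (x_bar - mu0) / (sigma / sqrt(n))
p_value = 2 * NormalDist().cdf(-abs(z_stat))    # two-tailed p-value
stop_process = p_value < alpha                  # reject Ho: mu = 6 -> stop and adjust
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}, stop and adjust: {stop_process}")
```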

6. A State Highway Patrol periodically samples vehicle speeds at various locations on a particular roadway. The sample of vehicle speeds is used to test the hypothesis that speeds are in excess of 10 mph over the posted 55. The locations where Ho is rejected are deemed the best locations for radar traps.
At Location #2, a sample of 16 vehicles shows a mean speed of 68.2 mph with a standard deviation of 3.8 mph. Use α = .05 to determine if Location #2 is a good location for radar traps.
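A Python sketch for problem 6: "more than 10 mph over the posted 55" gives Ha: μ > 65, σ is unknown, so this is a right-tailed t-test with df = 15. The critical value 1.753 comes from a standard t table and should match Excel's =T.INV(0.95, 15).

```python
from math import sqrt

mu0 = 55 + 10                     # Ho: mu <= 65, Ha: mu > 65
x_bar, s, n = 68.2, 3.8, 16
t_stat = (x_bar - mu0) / (s / sqrt(n))
t_crit = 1.753                    # t table: alpha = 0.05, right tail, df = 15
good_location = t_stat > t_crit   # rejecting Ho marks a good radar-trap location
print(f"t = {t_stat:.3f}, critical value = {t_crit}, good location: {good_location}")
```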


3.4 Testing Parameters from Two Independent Samples

Independent samples occur when the data are collected from two different groups who may have come from the same population, but otherwise the groups do not consist of the same individuals. As with other situations, the samples may have known or unknown population standard deviations, resulting in either the z- or t-test.

Hypothesis Tests for 2 MEANS – SIGMA KNOWN

Hypothesis Tests for 2 MEANS – SIGMA UNKNOWN, UNEQUAL VARIANCES

Hypothesis Tests for 2 MEANS – SIGMA UNKNOWN, EQUAL VARIANCES

Hypothesis Tests for 2 PROPORTIONS

When testing two means where sigma is known, use the z-test. The hypothesis testing procedure is EXACTLY the same as for the z-test of one mean EXCEPT for the computation of the TEST STATISTIC. In testing the difference between 2 means, you will often assume that the populations are normally distributed, with equal variances. For situations in which the two populations have equal variances, the pooled-variance t-test is robust (or not sensitive) to moderate departures from the assumption of normality, provided the sample sizes are large.

*The F-Test can help determine if population variances are equal. The F-test is a RIGHT-TAILED TEST. The Null hypothesis assumes the variances are equal. If the F Test Statistic is GREATER than the F Critical Value: REJECT THE NULL HYPOTHESIS. There is enough evidence to support a difference in the variances (they are not equal).


Example 1:

Are the mean weekly sales the same when using the normal shelf location and when using an end-aisle display? There are two populations of interest. The first population is the set of all possible weekly sales of Cola if all the stores (in a specific chain) used the normal shelf location. The second population is the set of all possible weekly sales of Cola if all the stores used end-aisle displays. You collect the data from the population of 10 stores. Is there a difference in sales due to display location?

Example 2:

Are the mean weekly sales the same when using the normal shelf location and when using an end-aisle display? There are two populations of interest. The first population is the set of all possible weekly sales of Cola if all the stores (in a specific chain) used the normal shelf location. The second population is the set of all possible weekly sales of Cola if all the stores used end-aisle displays. You collect the data from a sample of 10 stores (in a specific chain) that have been assigned a normal shelf location and another sample of 10 stores that have been assigned an end-aisle display.

IF we assume EQUAL VARIANCES between the population sales of the 2 display locations:

DISPLAY LOCATION
NORMAL | END-AISLE
22 | 52
34 | 71
52 | 76
62 | 54
30 | 67
40 | 83
64 | 66
84 | 90
56 | 77
59 | 84

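A minimal Python sketch for Example 2 under the equal-variance assumption: it first runs the right-tailed F-test described earlier (larger sample variance in the numerator), then the pooled-variance t-test of Ho: μ1 − μ2 = 0. α = 0.05 is an assumption, and the critical values 3.179 (F, df = 9 and 9) and 2.101 (t, df = 18, two tails) come from standard tables and should match Excel's =F.INV.RT(0.05, 9, 9) and =T.INV.2T(0.05, 18).

```python
from math import sqrt
from statistics import fmean, variance

normal = [22, 34, 52, 62, 30, 40, 64, 84, 56, 59]      # weekly sales, normal shelf location
end_aisle = [52, 71, 76, 54, 67, 83, 66, 90, 77, 84]   # weekly sales, end-aisle display
n1, n2 = len(normal), len(end_aisle)
s2_1, s2_2 = variance(normal), variance(end_aisle)      # sample variances (n - 1 denominator)

# F-test for equal variances: larger variance on top, right-tailed (df = 9 and 9 here)
f_stat = max(s2_1, s2_2) / min(s2_1, s2_2)
f_crit = 3.179                    # F table: alpha = 0.05, df = (9, 9)
variances_differ = f_stat > f_crit

# Pooled-variance t-test: Ho: mu1 - mu2 = 0, Ha: mu1 - mu2 != 0
df = n1 + n2 - 2
s2_pooled = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / df
t_stat = (fmean(normal) - fmean(end_aisle)) / sqrt(s2_pooled * (1 / n1 + 1 / n2))
t_crit = 2.101                    # t table: alpha = 0.05, two tails, df = 18
reject = abs(t_stat) > t_crit
print(f"F = {f_stat:.4f} (variances differ: {variances_differ})")
print(f"t = {t_stat:.4f}, df = {df}, reject Ho: {reject}")
```

With this data the F-test does not reject equal variances (F ≈ 2.23), and the pooled t statistic is about −3.04, so Ho is rejected: mean sales differ between the two display locations.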


Example 3:

Using the same scenario from Example 2, if you cannot make the assumption that the two independent populations have equal variances, you cannot pool the two sample variances into a common estimate and thus cannot use the pooled-variance t-test. Instead, use the separate-variance t-test (unequal variances). This test requires the computation of the degrees of freedom based on the separate sample variances.

If we assume UNEQUAL VARIANCES between the population sales of the 2 display locations:
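The separate-variance (Welch) t-test for Example 3 can be sketched the same way. The degrees of freedom come from the Welch-Satterthwaite formula and are usually not a whole number; this sketch rounds them down to 15 for the table lookup (2.131, two tails, α = 0.05 assumed), which is a conservative choice.

```python
from math import sqrt
from statistics import fmean, variance

normal = [22, 34, 52, 62, 30, 40, 64, 84, 56, 59]
end_aisle = [52, 71, 76, 54, 67, 83, 66, 90, 77, 84]
n1, n2 = len(normal), len(end_aisle)
v1 = variance(normal) / n1          # squared standard error, normal shelf
v2 = variance(end_aisle) / n2       # squared standard error, end-aisle

t_stat = (fmean(normal) - fmean(end_aisle)) / sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
t_crit = 2.131                      # t table: alpha = 0.05, two tails, df rounded down to 15
reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.4f}, df = {df:.2f}, reject Ho: {reject}")
```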

Example 4:

An experiment was conducted to study the choices made in mutual fund selection. Undergraduate and MBA students were presented with different S&P 500 index funds that were identical except for fees. Suppose that 100 undergraduate students and 100 MBA students were selected with the following results. Is there a difference between undergraduate and MBA students in the proportion who selected the highest-cost fund?

STUDENT GROUP
FUND | Undergraduate | Graduate
Highest-cost Fund | 27 | 18
Not-highest-cost Fund | 73 | 82
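A minimal Python sketch of the two-proportion z-test for Example 4, using the pooled proportion under Ho: π1 = π2 and a two-tailed alternative. The example does not state α, so α = 0.05 is an assumption.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 27, 100          # undergraduates who chose the highest-cost fund
x2, n2 = 18, 100          # MBA students who chose the highest-cost fund
p1, p2 = x1 / n1, x2 / n2
p_bar = (x1 + x2) / (n1 + n2)                       # pooled proportion under Ho
se = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
z_stat = (p1 - p2) / se
p_value = 2 * NormalDist().cdf(-abs(z_stat))        # two-tailed p-value
alpha = 0.05                                        # assumption: alpha not stated
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}, reject Ho: {p_value < alpha}")
```

Under these assumptions z ≈ 1.52 and the p-value is about 0.13, so there is not enough evidence of a difference between the two groups.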


3.5 Testing the Difference of Means from a Paired Sample

A paired sample occurs when the data are collected from the same individual at two different points in time, or on two different tasks, or in some other fashion in which the values will be connected. The actual test is done on the differences as a paired t-test.

When you take repeated measurements on the same items or individuals, you assume that the same items or individuals will behave alike if treated alike. Your objective is to show any differences between two measurements of the same items or individuals due to different treatment conditions (results of the first population are not independent of the results of the second population).

Example 1:

You have just been hired as a consultant for AAA, and your objective is to determine whether there is any difference in the mean mileage between the real-life driving done by an AAA member and the driving done according to government standards. In other words, is there evidence that the mean mileage is different between the two types of driving?

Assume the differences are normally distributed; use the paired t-test, as sigma is not known.

Model | Members | Government
2005 Ford F-150 | 14.3 | 16.8
2005 Chevy Silverado | 15 | 17.8
2002 Honda Accord LX | 27.8 | 26.2
2002 Honda Civic | 27.9 | 33.2
2004 Honda Civic Hybrid | 48.8 | 47.6
2002 Ford Explorer | 16.8 | 18.3
2005 Toyota Camry | 23.7 | 28.5
2003 Toyota Corolla | 32.8 | 33.1
2005 Toyota Prius | 37.3 | 44
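A minimal Python sketch of the paired t-test for the AAA example: the test runs on the differences d = member − government with Ho: μD = 0 and a two-tailed alternative. α = 0.05 is an assumption, and 2.306 is the two-tailed t table value for df = 8 (Excel: =T.INV.2T(0.05, 8)).

```python
from math import sqrt
from statistics import fmean, stdev

members    = [14.3, 15.0, 27.8, 27.9, 48.8, 16.8, 23.7, 32.8, 37.3]
government = [16.8, 17.8, 26.2, 33.2, 47.6, 18.3, 28.5, 33.1, 44.0]

d = [m - g for m, g in zip(members, government)]   # paired differences
n = len(d)
t_stat = fmean(d) / (stdev(d) / sqrt(n))            # Ho: mu_D = 0
t_crit = 2.306                                      # t table: alpha = 0.05, two tails, df = 8
reject = abs(t_stat) > t_crit
print(f"mean difference = {fmean(d):.4f}, t = {t_stat:.4f}, reject Ho: {reject}")
```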


Example 2:

A local pizza restaurant situated across the street from your apartment advertises that it delivers to the dormitories faster than the local branch of a national pizza chain. In order to determine whether this advertisement is valid, you and some friends have decided to order 10 pizzas from the local pizza restaurant and 10 pizzas from the national chain. In fact, each time you ordered a pizza from the local pizza restaurant, at the same time, your friends ordered a pizza from the national pizza chain. Thus, you have matched samples. Is there enough evidence to determine that the mean delivery time for the local pizza restaurant is less than the mean delivery time for the national chain?

Time | LOCAL | CHAIN
1 | 16.8 | 22
2 | 11.7 | 15.2
3 | 15.6 | 18.7
4 | 16.7 | 15.6
5 | 17.5 | 20.8
6 | 18.1 | 19.5
7 | 14.1 | 17
8 | 21.8 | 19.5
9 | 13.9 | 16.5
10 | 20.8 | 24
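A minimal Python sketch of the paired t-test for the pizza example, with d = local − chain. The claim "local is faster" gives Ha: μD < 0, a left-tailed test; α = 0.05 is an assumption, and −1.833 is the left-tail t table value for df = 9 (Excel: =T.INV(0.05, 9)).

```python
from math import sqrt
from statistics import fmean, stdev

local = [16.8, 11.7, 15.6, 16.7, 17.5, 18.1, 14.1, 21.8, 13.9, 20.8]
chain = [22.0, 15.2, 18.7, 15.6, 20.8, 19.5, 17.0, 19.5, 16.5, 24.0]

d = [lo - ch for lo, ch in zip(local, chain)]       # paired differences (local - chain)
n = len(d)
t_stat = fmean(d) / (stdev(d) / sqrt(n))            # Ho: mu_D >= 0, Ha: mu_D < 0
t_crit = -1.833                                     # t table: alpha = 0.05, left tail, df = 9
reject = t_stat < t_crit
print(f"mean difference = {fmean(d):.2f}, t = {t_stat:.4f}, reject Ho: {reject}")
```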


[Figure: Types of regression techniques, organized by number of variables (single variable, multiple variable), type of data (quantitative vs. qualitative), and shape of the regression line: Linear, Logistic, Polynomial, Stepwise, Ridge, Lasso, ElasticNet.]

Chapter 4 Linear Regression

Introduction
I am confident that you have heard the phrase, “Smoking causes cancer.” While I do not advocate smoking, this is not truly an accurate statement. It would be better to say, “Smoking contributes to cancer,” or “There is a strong relationship between smoking and cancer.”

When stating a relationship between variables, we are wise to be cautious about proclaiming that one variable causes the other. Instead, we should state, at a certain level of significance, whether a relationship exists between variables. Regression analysis allows us to test the significance of relationships between variables and to test the strength of the impact of variables on one another.

Regression analysis is a form of predictive modeling that determines whether a relationship exists between two or more variables. Furthermore, if a relationship exists, the independent (x, input, explanatory) variables are used to predict the dependent (y, output, response) variable. Regression analysis is used for forecasting, time series modeling and examining cause-and-effect relationships (keeping in mind the caution in the smoking example above).

Regression analysis is a very important tool in data analytics. Various kinds of regression techniques are available, depending on the number of variables, the type of data, and the shape of the regression line. We will focus our study on Single Variable and Multiple Variable Linear Regression with both quantitative and qualitative data.

Objectives
1. Define Correlation.
2. Evaluate the Correlation Coefficient and summarize results.
3. Evaluate the Coefficient of Determination and summarize results.
4. Determine linear regression equations to make predictions.
5. Analyze the significance of coefficients and the overall regression using MS Excel.
6. Interpret the output of an Analysis of Variance (ANOVA).
7. Use regression analysis to create models of best fit.


4.1 Correlation & Regression (Single Variable Regression)
What is correlation? A correlation is a relationship between two variables. In single variable regression there is one independent variable and one dependent variable. The variable that you are trying to predict is the dependent variable because it “depends” on the independent variable.

The data can be represented by the ordered pairs (x, y), where x is the independent (explanatory) variable and y is the dependent (response) variable. Below are examples of 4 different data sets. A scatter plot can be used to determine whether a correlation exists between two variables and the best fit for the shape of the regression line.

Covariance
Covariance indicates how two variables are related. A positive covariance means the variables are positively (directly) related (as one increases, the other increases; they move in the same direction). A negative covariance means the variables are negatively (inversely) related (as one increases, the other decreases; they move in opposite directions).

Thus, the purpose of the covariance is to indicate the direction of a potential relationship between variables. However, it does not indicate the strength of the relationship or whether the relationship is meaningful.

[Figure: scatter plots of the four example data sets, panels 1 to 4]


Correlation Coefficient r (sample) ρ (population)

How well does a line of best fit (regression equation) truly represent the set of data? The correlation coefficient measures how closely two variables move in relation to one another.

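The correlation coefficient is a standardized measure of the covariance and provides the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient. The formula for r is

$$r = \frac{s_{xy}}{s_x s_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}$$

where $s_{xy}$ is the sample covariance and $s_x$, $s_y$ are the sample standard deviations.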

The range of the correlation coefficient is −1 to 1. If x and y have a strong positive linear correlation, r is close to 1. If x and y have a strong negative linear correlation, r is close to −1. If there is no linear correlation or a weak linear correlation, r is close to 0.
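A minimal Python sketch of covariance and r computed straight from their definitions. The data are hypothetical, and the helper names `covariance` and `correlation` are illustrative; Excel's =COVARIANCE.S and =CORREL should give the same values.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Sample correlation r: the covariance standardized by both standard deviations."""
    s_x = sqrt(covariance(x, x))   # covariance of x with itself is the variance of x
    s_y = sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)

x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
cov_xy = covariance(x, y)    # positive: the variables move in the same direction
r = correlation(x, y)        # strength and direction, always between -1 and 1
print(f"covariance = {cov_xy:.3f}, r = {r:.4f}")
```

Flipping the sign of every y value flips the sign of both the covariance and r, but leaves the strength |r| unchanged.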

This statistic is useful in many ways in finance. For example, it can be helpful in determining how well a mutual fund is behaving compared to its benchmark index.

Correlation does NOT imply causation….

Remember the smoking/lung cancer example? Take a look at some “spurious” (false/fake) correlations.

http://tylervigen.com/spurious-correlations


Coefficient of Determination
The coefficient of determination is a measure used in statistical analysis that examines how well a model explains and predicts future outcomes. It indicates the level of explained variability in the data set. The coefficient of determination (R-squared) is used as a guideline to measure the accuracy of the model.

The coefficient of determination is used to explain how much of the change in one variable is explained by another variable. More specifically, R-squared gives the percentage of the variation in the dependent (predicted) variable that is explained by the independent variable(s). Because the Coefficient of Determination is a proportion, it ranges from 0 to 1. In single variable regression, it is simply computed as the square of the correlation coefficient (R² = r²). This allows the coefficient of determination to describe the degree of linear relationship between variables.

The Coefficient of Determination is relied on heavily in trend analysis and is represented as a value between zero and one. The closer R-squared is to 1, the better the fit, or relationship, between the two factors.

The Coefficient of Determination is known as the "goodness of fit." A value of one indicates a perfect fit to the data. A value of zero, on the other hand, indicates that the model explains none of the variability in the dependent variable.

The goodness of fit, or the degree of linear correlation, measures the distance between a fitted line on a graph and all the data points that are scattered around the graph. A tightly clustered set of data will have a regression line that is very close to the points and a high level of fit, meaning that the distance between the line and the data is very small. A good fit has an R-squared that is close to one.

Do not assume, however, that values of R-squared close to 1 always produce a valid model. Keep in mind that R-squared is unable to determine whether the data points or predictions are biased. Additionally, an R-squared closer to 1 is not automatically better. Depending on the data set, a low R-squared is not necessarily bad. It is up to the analyst to make a decision based on the R-squared value.


Regression Equation
A regression equation is used to predict or forecast the dependent variable. In other words, in simple linear regression we can predict the dependent variable (y) for a specific value of the independent variable (x). When running simple linear regressions, the formula for the regression equation is:

$$\hat{y} = b_0 + b_1 x$$

where $b_0$ is the y-intercept and $b_1$ is the slope.
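A minimal Python sketch of how the least-squares slope and intercept can be computed and then used to predict. The data and the helper name `fit_line` are hypothetical; Excel's =SLOPE and =INTERCEPT (or the Regression tool) should give the same coefficients.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx            # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x-bar, y-bar)
    return b0, b1

x = [1, 2, 3, 4, 5]             # hypothetical data
y = [2, 4, 5, 4, 5]
b0, b1 = fit_line(x, y)
y_hat = b0 + b1 * 3.5           # prediction for x = 3.5 (inside the range of the data)
print(f"y-hat = {b0:.2f} + {b1:.2f}x; prediction at x = 3.5: {y_hat:.2f}")
```

The prediction uses an x-value inside the observed range; predicting far outside that range (extrapolation) is generally less reliable.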

Putting it all together


4.2 MS Excel and Single Variable Regression Analysis
1. Install the Data Analysis ToolPak.
Go to FILE - OPTIONS – ADD-INS. At the bottom of the window, in the Manage box, be sure Excel Add-ins is selected and click GO. Check Analysis ToolPak and then click OK. The Data Analysis ToolPak will be located in your DATA ribbon.
2. Choose Regression in the Data Analysis ToolPak.
3. When running regressions, be sure to choose your dependent variable (the variable you are trying to predict) for the Y Range and your independent variable for the X Range. If you choose column labels with your data, be sure to check “Labels”.
4. To choose a confidence interval other than 95%, enter the desired level of confidence in the Confidence Level box.


Does the Sample Correlation Coefficient r provide enough evidence to support the significance of the population Correlation Coefficient ρ (rho)? Hmmm… HYPOTHESIS TESTING!

The Null Hypothesis assumes NO relationship; the alternative is that a relationship exists. Is there enough evidence to support a relationship (thus, reject the null)?

A hypothesis test determines whether the sample correlation coefficient r provides enough evidence to conclude that the population correlation coefficient ρ is significant at a specified level of significance.

Significance of the COEFFICIENTS (ρ and β)
Typically a two-tailed test. Always a t-test.
Testing the significance of the coefficients.
The results of the t-test are found at the bottom of the Excel output.

The test statistic for the significance of the coefficients follows a t-distribution with n – k – 1 degrees of freedom, where k = the number of independent (x) variables. In single variable regression, k = 1. Thus, df = n – 2.
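A minimal Python sketch of the t-test for the slope on hypothetical data, computed two ways: from the slope and its standard error, t = (b1 − 0)/se(b1), and from the correlation, t = r·√((n − 2)/(1 − r²)). In single variable regression the two agree. The critical value 3.182 (two tails, α = 0.05 assumed, df = 3) comes from a standard t table (Excel: =T.INV.2T(0.05, 3)).

```python
from math import sqrt

x = [1, 2, 3, 4, 5]                  # hypothetical data
y = [2, 4, 5, 4, 5]
n, k = len(x), 1                     # k = number of x-variables
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
df = n - k - 1                       # 3 here
se_b1 = sqrt((sse / df) / s_xx)      # standard error of the slope
t_slope = (b1 - 0) / se_b1           # Ho: beta1 = 0

r = s_xy / sqrt(s_xx * s_yy)
t_r = r * sqrt((n - 2) / (1 - r ** 2))   # Ho: rho = 0; same value in single variable regression

t_crit = 3.182                       # t table: alpha = 0.05, two tails, df = 3
reject = abs(t_slope) > t_crit
print(f"t (slope) = {t_slope:.4f}, t (from r) = {t_r:.4f}, reject Ho: {reject}")
```

Here t ≈ 2.12 does not exceed 3.182, so with only five points there is not enough evidence of a relationship, even though r ≈ 0.77.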


You are testing the significance of the COEFFICIENT. The Null Hypothesis claims that the Coefficient is equal to zero. Why zero? A slope of zero means that a change in x produces no change in the predicted y; in other words, there is no linear relationship.

If we REJECT THE NULL HYPOTHESIS, we state that there is enough evidence, at a certain level of significance, to support a relationship between the variables.

If we DO NOT REJECT THE NULL HYPOTHESIS, we state that there is not enough evidence, at a certain level of significance, to support a relationship between the variables.

Significance of OVERALL REGRESSION (R²)
Is the coefficient of determination (R²) (% of variation of y that is influenced by x) significant? Keep in mind the null hypothesis assumes NO relationship. However, can this be two-tailed? Remember, a percent cannot be negative. Thus, Ho: ρ² = 0 (or ρ² ≤ 0) and Ha: ρ² > 0.

MUST be a right-tailed test, as a percent is never negative.

To test the significance of the overall regression, use the results found in the ANOVA (Analysis of Variance) section of the regression output. Here you will see the results for analyzing the variance of the regression.


dfR: regression degrees of freedom = number of explanatory variables (k)

dfE: residual (error) degrees of freedom, dfE = n – k – 1

dfT: total degrees of freedom, dfT = n – 1

SSR: Sum of Squares due to Regression

SSE: Sum of Squared Residuals (E)

SST: Total Sum of Squares

MSR: Mean Square Regression = SSR/dfR

MSE: Mean Square Residual (E) = SSE/dfE

F: F Test Statistic = MSR/MSE

Sig F: P-Value of the F Test Statistic

SSR/SST: Explained Variation (R²)

SSE/SST: Unexplained Variation (1 – R²)
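A minimal Python sketch that builds each ANOVA quantity in the list above for a single variable regression on hypothetical data. It stops at the F statistic; the Significance F p-value would come from Excel's =F.DIST.RT(F, k, n − k − 1), and the critical value 10.128 (α = 0.05 assumed, df = 1 and 3) is from a standard F table.

```python
from math import sqrt

x = [1, 2, 3, 4, 5]                  # hypothetical data
y = [2, 4, 5, 4, 5]
n, k = len(x), 1                     # k = number of x-variables

# Fit y-hat = b0 + b1*x by least squares
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # residual variation
sst = sum((yi - y_bar) ** 2 for yi in y)                 # total = SSR + SSE

df_r, df_e, df_t = k, n - k - 1, n - 1
msr, mse = ssr / df_r, sse / df_e
f_stat = msr / mse
r_squared = ssr / sst
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
std_error = sqrt(mse)                # "Standard Error" in the Regression Statistics
f_crit = 10.128                      # F table: alpha = 0.05, df = (1, 3)
print(f"SSR={ssr:.2f} SSE={sse:.2f} SST={sst:.2f} F={f_stat:.2f} "
      f"R2={r_squared:.3f} adjR2={adj_r_squared:.3f} SE={std_error:.4f}")
```

In single variable regression F equals the square of the slope's t statistic, so the overall F-test and the slope t-test reach the same conclusion.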


4.3 Multiple Linear Regression

Multiple linear regression (MLR) is a statistical technique that uses several independent variables to predict the outcome of a dependent variable. The goal of multiple linear regression is to model the relationship between the variables so that it best fits all of the data points. In other words, it produces a line of best fit that minimizes the sum of the squared residuals (the distances between the data points and the regression line).
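Excel's Regression tool does this computation for you. To show what "best fit" means, here is a minimal Python sketch (hypothetical helpers `solve` and `fit_mlr`, not necessarily how Excel computes it) that solves the least-squares normal equations (X'X)b = X'y by Gauss-Jordan elimination. The data are hypothetical and were generated exactly from y = 1 + 2x1 + 3x2, so the fitted coefficients should come back as 1, 2 and 3.

```python
def solve(a, b):
    """Solve the square system a * beta = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]    # augmented matrix [a | b]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]         # move the largest pivot up
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]                # scale pivot row so the pivot is 1
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def fit_mlr(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations (X'X)b = X'y."""
    rows = [[1.0] + list(r) for r in xs]                    # leading 1 gives the intercept b0
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term)
xs = [(1, 0), (2, 1), (3, 1), (4, 3), (5, 2), (6, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
b = fit_mlr(xs, y)
print("b0, b1, b2 =", [round(v, 6) for v in b])
```

With real data the points do not fall exactly on the fitted plane, and the leftover differences are the error term.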


Multiple regression is used to determine how several factors influence a particular dependent variable. Remember, when modeling there will be a margin of error. The ‘Error Term’ for a multiple regression represents the difference between the results of the model and the observed results.

The error term indicates that the model is not 100% accurate and the predicted results will not equal the actual data. The error term exists any time that an actual data value does not fall on the predicted line of best fit.

When running MULTIPLE VARIABLE Regression in EXCEL, the x-variables must be CONTIGUOUS.

Use the Excel Output to determine:

If the OVERALL REGRESSION is significant (Significance of ρ²)

Is there enough evidence to support Ha: ρ² > 0? This is a RIGHT-TAILED F-test.
The F test statistic is found in the ANOVA.
The P-Value for the significance of the overall regression is Significance F.


If each independent variable has significance within the regression (significance of the slope of each x-variable). Which variables should REMAIN in the model?

Is there enough evidence to support Ha: βi ≠ 0?
Remember, the evidence gathered to test Ho: βi = 0 is the individual slopes (bi).
The significance of each slope is found by analyzing either the P-value or the confidence interval at the chosen Level of Confidence (the slope is significant when the interval does not contain zero).


4.4 Qualitative Variables (Dummy Variables)
A Dummy variable or Indicator Variable is an artificial variable created to represent an attribute with two or more distinct categories/levels. Why is it used? Regression analysis treats all independent (X) variables in the analysis as numerical, so a qualitative variable must first be coded as numbers.

Use the =IF command in Excel to turn qualitative variables into “dummy variables” of 1 or 0.

When using dummy variables to run regressions be sure to know which variable is noted as 1 and whichis noted as 0.
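The Excel =IF coding can be mirrored in a short Python sketch. The helper names and the example columns are hypothetical. For an attribute with more than two categories, the sketch follows the usual convention of one 0/1 column per category except a baseline category, which is coded 0 in every column; the baseline is exactly the "which variable is noted as 0" choice the text warns about.

```python
def to_dummy(values, level_for_one):
    """Code one category as 1 and everything else as 0, like Excel's =IF(A2="Female",1,0)."""
    return [1 if v == level_for_one else 0 for v in values]

def to_dummies(values, baseline):
    """One 0/1 column per category except the baseline (coded 0 in every column)."""
    levels = sorted(set(values) - {baseline})
    return {level: to_dummy(values, level) for level in levels}

gender = ["Female", "Male", "Male", "Female"]          # hypothetical two-level column
female = to_dummy(gender, "Female")                    # 1 = Female, 0 = Male

region = ["East", "West", "South", "East", "West"]     # hypothetical three-level column
region_dummies = to_dummies(region, baseline="East")   # columns for South and West
print(female, region_dummies)
```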

Keep in mind that when running multiple regression analysis, the independent (x) variables must be contiguous (columns must be next to one another).

To build/improve a model:

Each independent variable is significant.

The overall regression is significant.

As independent variables are added/removed, the Adjusted R-squared percentage increases and the Standard Error of the overall regression decreases.


SUMMARY OUTPUT

Regression Statistics
Multiple R: = sqrt(R Square). Ranges from 0 to 1 (in single variable regression it equals |r|, and the sign of the slope gives the direction). Indicates the strength of the relationship between the variables.
R Square: R² = SSR/SST. Ranges from 0 to 1. Indicates the % of variation in the predicted value that is influenced by the variation in the x-variables.
Adjusted R Square: = 1 − [(1 − R²)(n − 1)/(n − k − 1)]. Adjusted for the number of explanatory/independent variables (x-variables).
Standard Error: = sqrt(MSE). Standard error for the predicted value.
Observations: n = df_total + 1

ANOVA (columns: df | SS | MS | F | Significance F)
Regression | k = # of x-variables | SSR = ∑(ŷ − ȳ)² | MSR = SSR/k | F Test Stat = MSR/MSE | P-value for the F test (see below)
Residual | df = n − k − 1 | SSE = ∑(y − ŷ)² | MSE = SSE/df (variance)
Total | k + (n − k − 1) = n − 1 | SST = ∑(y − ȳ)²

Coefficients (columns: Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%)
Intercept | y-int = tstat*se | se = coeff/tstat | t stat = coeff/se | = Prob beyond t Stat | = coeff − CV*(std error) | = coeff + CV*(std error)
BOOKS | slope X1 = tstat*se | | = (slope − 0)/se | | slope X1 − ME | slope X1 + ME
ATTEND | slope X2 = tstat*se | | | | slope X2 − ME | slope X2 + ME
In single variable regression, the t stat of the slope also equals r*sqrt((n − 2)/(1 − R²)).

Sample Statistic | Population Parameter
r | ρ
R² | ρ²
b | β

Null Hypothesis | Alternative Hypothesis | Found in output | Type and Nature of Test
H0: β1 = β2 = 0 | H1: At least one βi ≠ 0 | ANOVA | Right-tailed F test
H0: ρ² ≤ 0 | H1: ρ² > 0 | ANOVA | Right-tailed F test
H0: βi = 0 | H1: βi ≠ 0 | Individual coefficients | Two-tailed t test
H0: ρ = 0 | H1: ρ ≠ 0 | Individual coefficients | Two-tailed t test

F tests: Critical Value =F.INV.RT(alpha,k,df); P-Value =F.DIST.RT(Fteststat,k,df)

t tests: Critical Value =T.INV.2T(alpha,df); P-Value =T.DIST.2T(Tteststat,df), where Tteststat must be entered as a positive number (use its absolute value).


EXCEL FORMULA SHEET CHAPTER 1

Binomial Distributions

For an EXACT probability: =BINOMDIST(x, n, p, FALSE)

For less than or equal to x (at most x): =BINOMDIST(x, n, p, TRUE)

For less than x: =BINOMDIST(x-1, n, p, TRUE)

For greater than x: =1-BINOMDIST(x, n, p, TRUE)

For greater than or equal to x (at least x): =1-BINOMDIST(x-1, n, p, TRUE)
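All five binomial formulas come from one probability mass function and its running sum. The sketch below mirrors each =BINOMDIST call using Python's math.comb (Python 3.8+); the helper names binom_pmf and binom_cdf and the values n = 10, p = 0.3, x = 4 are hypothetical.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x), like =BINOMDIST(x, n, p, FALSE)."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x), like =BINOMDIST(x, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(0, min(x, n) + 1))


n, p, x = 10, 0.3, 4                    # hypothetical values
exactly = binom_pmf(x, n, p)            # =BINOMDIST(x,n,p,FALSE)
at_most = binom_cdf(x, n, p)            # =BINOMDIST(x,n,p,TRUE)
less_than = binom_cdf(x - 1, n, p)      # =BINOMDIST(x-1,n,p,TRUE)
more_than = 1 - binom_cdf(x, n, p)      # =1-BINOMDIST(x,n,p,TRUE)
at_least = 1 - binom_cdf(x - 1, n, p)   # =1-BINOMDIST(x-1,n,p,TRUE)
print(exactly, at_most, less_than, more_than, at_least)
```

The "less than" and "at least" versions shift x by one because a binomial count can only take whole-number values.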

Normal Distributions

To find the probability for a z-score (instead of using the Standard Normal Table):
=NORMSDIST(z1) (used for P(z < z1))
=1-NORMSDIST(z1) (used for P(z > z1))
=NORMSDIST(z1)-NORMSDIST(z2) (used for “between” probabilities, where z1 > z2)

For an EXACT probability: you will not be asked for “exact” probabilities, because these are continuous distributions and the probability of any single exact value is 0.

For probability less than or equal to x (at most x): =NORMDIST(x, mean, stddev, TRUE)

For probability less than x: same as above, since Normal Distributions are continuous.

For probability greater than x: =1-NORMDIST(x, mean, stddev, TRUE)

For probability greater than or equal to x (at least x): same as above, since Normal Distributions are continuous.

Given the percentile (probability), with the objective of finding the data value (x-value): =NORMINV(percentile, mean, stddev)

If SAMPLING, with the intent to determine a probability from the Sampling Distribution: =NORMDIST(sample MEAN, population MEAN, STDERROR, TRUE)

*where the STDERROR of the MEAN is σ/√n.

**IF given a FINITE population and n/N > 5%, apply the “Finite Correction Factor”: multiply σ/√n by √((N - n)/(N - 1)).

IF computing the STDERROR of a PROPORTION instead of the Mean: the standard error is √(π(1 - π)/n). Compute the z-score, z = (p - π)/√(π(1 - π)/n), and use =NORMSDIST(z) (π = Population %, p = sample %).
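Python's standard-library statistics.NormalDist (3.8+) can stand in for NORMSDIST, NORMDIST and NORMINV when checking these formulas away from Excel. The sketch below is an illustration under that assumption: every mean, standard deviation and sample size is hypothetical, and std_error_mean is a helper name introduced here.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal, plays the role of NORMSDIST

# Probabilities for a z-score
p_below = Z.cdf(1.96)                    # =NORMSDIST(1.96)
p_above = 1 - Z.cdf(1.96)                # =1-NORMSDIST(1.96)
p_between = Z.cdf(1.96) - Z.cdf(-1.96)   # =NORMSDIST(1.96)-NORMSDIST(-1.96)

# Probabilities for x from a normal population (hypothetical mean 100, std dev 15)
X = NormalDist(mu=100, sigma=15)
p_at_most_120 = X.cdf(120)               # =NORMDIST(120,100,15,TRUE)
p_more_than_120 = 1 - X.cdf(120)         # =1-NORMDIST(120,100,15,TRUE)
x_90th_percentile = X.inv_cdf(0.90)      # =NORMINV(0.90,100,15)


def std_error_mean(sigma, n, N=None):
    """sigma/sqrt(n), times the finite correction factor when n/N > 5%."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se


# Sampling distribution of the mean (hypothetical: mu = 50, sigma = 8, n = 36)
se = std_error_mean(8, 36)                       # 8/6
p_xbar_below_52 = NormalDist(50, se).cdf(52)     # =NORMDIST(52,50,se,TRUE)

# Sampling distribution of a proportion (hypothetical: pi = 0.40, n = 200, p = 0.45)
se_prop = math.sqrt(0.40 * (1 - 0.40) / 200)
z_prop = (0.45 - 0.40) / se_prop
p_prop_above = 1 - Z.cdf(z_prop)                 # =1-NORMSDIST(z)
```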


CONFIDENCE INTERVALS
Questions to ask yourself

1. Am I creating a Confidence Interval or determining a Minimum Sample Size?
   a. Confidence Interval –
      i. Am I trying to estimate the Population Mean or the Population Proportion?
         1. Population Mean –
            a. Will I use a Z or T Critical Value?
               i. If the POPULATION standard deviation is given, use the Standard Normal Table (z) to obtain the Critical Value. If ONLY the SAMPLE standard deviation is given, use the T-Distribution table (df = n - 1) to obtain the Critical Value (t).
            b. The product of the positive Critical Value and the Standard Error (σ/√n or s/√n) is the Margin of Error.
            c. Add/Subtract the Margin of Error to/from the Sample Mean.
         2. Population Proportion –
            a. Find the Z Critical Value.
            b. The product of the positive Critical Value and the Standard Error, √(p(1 - p)/n), is the Margin of Error.
            c. Add/Subtract the Margin of Error to/from the Sample Proportion.
   b. Minimum Sample Size –
      i. Am I determining the Minimum Sample Size for a Population Mean or a Population Proportion?
         1. Population Mean –
            a. Use the Sample Size formula for Population Means, n = (zσ/E)². Remember that the Margin of Error (E) is the amount of error that you are willing to accept, or the amount you want to be “within.” This will generally be given.
         2. Population Proportion –
            a. Use the Sample Size formula for Population Proportions, n = z²p(1 - p)/E². Remember that the Margin of Error (E) is the PERCENTAGE of error that you are willing to accept, or the amount you want to be “within.” IF no preliminary estimate (information from prior studies) is given, the proportion (p) is assumed to be 0.50 (50%).
      Always round a computed Minimum Sample Size UP to the next whole number.
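The confidence-interval and minimum-sample-size steps above reduce to a handful of formulas. A minimal Python sketch, assuming the standard forms x̄ ± CV·(sd/√n), p ± z·√(p(1 - p)/n), n = (zσ/E)² and n = z²p(1 - p)/E²; all function names and sample values are hypothetical, and the t critical value 2.262 (95% confidence, d.f. 9) is read from the t table at the end of this section.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Positive z critical value for a two-sided interval (about 1.960 for 95%)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def ci_mean(x_bar, sd, n, critical_value):
    """x_bar +/- CV*sd/sqrt(n): pass a z CV with sigma, or a t CV (df = n - 1) with s."""
    margin = critical_value * sd / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def ci_proportion(p, n, confidence):
    """p +/- z*sqrt(p(1 - p)/n)."""
    margin = z_critical(confidence) * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def min_n_mean(sigma, E, confidence):
    """n = (z*sigma/E)^2, rounded UP to the next whole number."""
    return math.ceil((z_critical(confidence) * sigma / E) ** 2)


def min_n_proportion(E, confidence, p=0.5):
    """n = z^2*p(1 - p)/E^2, with p = 0.50 when no preliminary estimate is given."""
    return math.ceil(z_critical(confidence) ** 2 * p * (1 - p) / E ** 2)


# Hypothetical sample: x_bar = 72.0, s = 6.0, n = 10 (df = 9), 95% t CV from the t table
t_interval = ci_mean(72.0, 6.0, 10, 2.262)
# Same numbers, but treating 6.0 as a known population sigma (z CV)
z_interval = ci_mean(72.0, 6.0, 10, z_critical(0.95))
p_interval = ci_proportion(0.45, 400, 0.95)

n_mean = min_n_mean(sigma=15, E=5, confidence=0.95)  # (1.96*15/5)^2 is about 34.6 -> 35
n_prop = min_n_proportion(E=0.03, confidence=0.95)   # about 1067.1 -> 1068
print(t_interval, z_interval, p_interval, n_mean, n_prop)
```

The t interval comes out wider than the z interval for the same numbers, reflecting the extra uncertainty of estimating σ with s.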


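Critical Values in EXCEL (consider the Nature of the Test):
=NORM.S.INV(prob) (prob = α for a left tail, 1-α for a right tail, 1-α/2 for two tails)
=T.INV(prob,df) (one tail: prob = α for left, 1-α for right; df = n - 1)
=T.INV.2T(prob,df) (two tails: prob = α; returns the positive Critical Value)

P-Values in EXCEL (consider the Nature of the Test):
=NORMDIST(sample mean,hypothesized mean,stderror,TRUE)
=T.DIST(tstat,df,TRUE)
=T.DIST.RT(tstat,df)
=T.DIST.2T(tstat,df)*

*For T.DIST.2T, tstat must not be negative; use ABS(tstat) if the test statistic is negative.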
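Excel's T.DIST family has no counterpart in Python's standard library, so the sketch below approximates the t-distribution CDF by integrating its density with Simpson's rule (a numerical technique chosen here to stay self-contained, not how Excel computes it). The helper names are hypothetical; statistics.NormalDist.inv_cdf plays the role of NORM.S.INV.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t): 0.5 plus/minus the area from 0 to |t| (Simpson's rule, even steps)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_dist(t, df):
    """Left-tail probability, like =T.DIST(t,df,TRUE)."""
    return t_cdf(t, df)


def t_dist_rt(t, df):
    """Right-tail probability, like =T.DIST.RT(t,df)."""
    return 1 - t_cdf(t, df)


def t_dist_2t(t, df):
    """Two-tail probability, like =T.DIST.2T(ABS(t),df)."""
    return 2 * (1 - t_cdf(abs(t), df))


# Critical values like =NORM.S.INV(prob) for alpha = 0.05
alpha = 0.05
z_left = NormalDist().inv_cdf(alpha)          # left tail, about -1.645
z_right = NormalDist().inv_cdf(1 - alpha)     # right tail, about 1.645
z_two = NormalDist().inv_cdf(1 - alpha / 2)   # two tails, about +/-1.960
```

As a sanity check, t_dist_2t(2.262, 9) lands very close to 0.05, matching the t table entry for d.f. 9 and TWO tails 0.05.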

Hypothesis Testing
Steps & Questions to Ask Yourself

I. Am I performing a Hypothesis Test of the Population Mean or the Population Proportion?
A. Population Mean –

1. State the claim mathematically, identifying the Null & Alternative Hypotheses (be sure to note which of these is the claim).

2. Determine the Nature of the Test (left tailed, right tailed, two tailed); this is based completely on the Alternative Hypothesis.

3. Specify the Level of Significance (if α is not given, assume α = 0.05).
4. SKETCH the distribution, noting the Hypothesized Population Mean and the Rejection Zone.
5. Am I using the P-Value Method or the Critical Value Method for my Hypothesis Test?

A. P-Value Method – The P-Value measures the evidence against the Null Hypothesis: the smaller the P-Value, the stronger the evidence. So, is there enough evidence against the Hypothesized Mean (Null Hypothesis)?

P-Value – find the P-Value by first computing the Test Statistic*. The P-Value is the probability, assuming the Null Hypothesis is true, of obtaining a Test Statistic at least as extreme as the one computed, in the direction given by the Nature of the Test (the “tail”). It is NOT the probability of falsely (erroneously) Rejecting the Null Hypothesis; the probability of rejecting a true Null Hypothesis is α, the Level of Significance.

Decision Rule:
If the P-Value is LESS THAN or EQUAL to the LEVEL of SIGNIFICANCE…REJECT THE NULL HYPOTHESIS. Either the Null Hypothesis is invalid or an unusual event has occurred. The difference between the sample mean and the hypothesized mean is “statistically significant,” that is, unlikely to have occurred by chance alone. Thus, there is enough evidence at level α to reject the Null Hypothesis.
If the P-Value is GREATER THAN the LEVEL of SIGNIFICANCE…FAIL to REJECT THE NULL HYPOTHESIS. This does not prove that the Null Hypothesis is true; the difference between the sample mean and the hypothesized mean is “not statistically significant.” Thus, there is not enough evidence at level α to move away from the hypothesized mean (Null).

B. Critical Value Method – compare the Critical Value(s) to the Test Statistic.
Critical Value(s) – These are z-scores or t-values based on the given level of significance AND the Nature of the Test. These value(s) “define” your Rejection Zone(s). Use your TABLES to determine the Critical Value(s).
Test Statistic* – This is a z-score or t-value computed from the SAMPLE DATA using one of the Test Statistic formulas. There are 2 for Hypothesis Testing of the Population Mean (see asterisk below).

Decision Rule:
If the Test Statistic falls within a Rejection Zone – REJECT the NULL HYPOTHESIS.
If the Test Statistic does NOT fall within a Rejection Zone – FAIL to REJECT the NULL HYPOTHESIS.

*Test Statistic – Which to use, z-stat or t-stat? If the POPULATION standard deviation is given, use z = (x̄ - μ)/(σ/√n). If the SAMPLE standard deviation is given, use t = (x̄ - μ)/(s/√n) with df = n - 1.


Critical Values in EXCEL (consider the Nature of the Test):
=NORM.S.INV(prob)

P-Values in EXCEL (consider the Nature of the Test):
=NORMDIST(sample proportion,hypothesized proportion,stderror,TRUE) (equivalently =NORMSDIST(zstat))

B. Population Proportion –
1. State the claim mathematically, identifying the Null & Alternative Hypotheses (be sure to note which of these is the claim).
2. Determine the Nature of the Test (left tailed, right tailed, two tailed); this is based completely on the Alternative Hypothesis.
3. Specify the Level of Significance (if α is not given, assume α = 0.05).
4. SKETCH the distribution, noting the Hypothesized Population Proportion and the Rejection Zone.
5. Am I using the P-Value Method or the Critical Value Method for my Hypothesis Test?

A. P-Value Method – The P-Value measures the evidence against the Null Hypothesis: the smaller the P-Value, the stronger the evidence. So, is there enough evidence against the Hypothesized Proportion (Null Hypothesis)?

P-Value – find the P-Value by first computing the Test Statistic*. The P-Value is the probability, assuming the Null Hypothesis is true, of obtaining a Test Statistic at least as extreme as the one computed, in the direction given by the Nature of the Test (the “tail”). It is NOT the probability of falsely (erroneously) Rejecting the Null Hypothesis; the probability of rejecting a true Null Hypothesis is α, the Level of Significance.

Decision Rule:
If the P-Value is LESS THAN or EQUAL to the LEVEL of SIGNIFICANCE…REJECT THE NULL HYPOTHESIS. Either the Null Hypothesis is invalid or an unusual event has occurred. The difference between the sample proportion and the hypothesized proportion is “statistically significant,” that is, unlikely to have occurred by chance alone. Thus, there is enough evidence at level α to reject the Null Hypothesis.
If the P-Value is GREATER THAN the LEVEL of SIGNIFICANCE…FAIL to REJECT THE NULL HYPOTHESIS. This does not prove that the Null Hypothesis is true; the difference between the sample proportion and the hypothesized proportion is “not statistically significant.” Thus, there is not enough evidence at level α to move away from the hypothesized proportion (Null).

B. Critical Value Method – compare the Critical Value(s) to the Test Statistic.
Critical Value(s) – ALWAYS z-scores for a Population Proportion. Again, these are based on the given level of significance AND the Nature of the Test. These value(s) “define” your Rejection Zone(s). Use your STANDARD NORMAL TABLE to determine the Critical Value(s).

Test Statistic* – ALWAYS a z-stat for a Population Proportion. Use the (one & only) Test Statistic formula for Hypothesis Testing of a Population Proportion: z = (sample proportion - π)/√(π(1 - π)/n).

*Both p and π are used to denote the hypothesized proportion (q = 1 - p or 1 - π).

Decision Rule:
If the Test Statistic falls within a Rejection Zone – REJECT the NULL HYPOTHESIS.
If the Test Statistic does NOT fall within a Rejection Zone – FAIL to REJECT the NULL HYPOTHESIS.
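Both hypothesis-testing procedures above (Population Mean and Population Proportion, by the P-Value Method or the Critical Value Method) can be traced in a short script. A minimal sketch using Python's statistics.NormalDist for the z-based steps; the t critical value 2.262 is taken from the t table (TWO tails 0.05, d.f. 9), and every hypothesis, sample value, and helper name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_p_value(z, tail):
    """P-Value of a z-stat for tail = 'left', 'right' or 'two'."""
    if tail == "left":
        return STD_NORMAL.cdf(z)
    if tail == "right":
        return 1 - STD_NORMAL.cdf(z)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def z_critical(alpha, tail):
    """Critical z value bounding the Rejection Zone (two tails: +/- this value)."""
    if tail == "left":
        return STD_NORMAL.inv_cdf(alpha)
    if tail == "right":
        return STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def in_rejection_zone(stat, crit, tail):
    """True if the Test Statistic falls within the Rejection Zone."""
    if tail == "left":
        return stat <= crit
    if tail == "right":
        return stat >= crit
    return abs(stat) >= crit


alpha = 0.05

# 1) Mean, population sigma known -> z-stat, P-Value Method.
#    Hypothetical: H0: mu = 50, H1: mu > 50, sigma = 8, n = 36, x_bar = 52.5
z_mean = (52.5 - 50) / (8 / math.sqrt(36))   # 1.875
p_mean = z_p_value(z_mean, "right")
reject_mean = p_mean <= alpha

# 2) Mean, only sample s known -> t-stat, Critical Value Method.
#    Hypothetical: H0: mu = 20, H1: mu != 20, s = 3, n = 10, x_bar = 22.4
t_stat = (22.4 - 20) / (3 / math.sqrt(10))   # df = n - 1 = 9
t_crit = 2.262                               # t table: TWO tails 0.05, d.f. 9
reject_t = in_rejection_zone(t_stat, t_crit, "two")

# 3) Proportion -> always a z-stat; both methods shown.
#    Hypothetical: H0: pi = 0.30, H1: pi < 0.30, 48 successes out of n = 200
p_hat, pi0, n = 48 / 200, 0.30, 200
z_prop = (p_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
p_prop = z_p_value(z_prop, "left")
reject_prop_cv = in_rejection_zone(z_prop, z_critical(alpha, "left"), "left")
reject_prop_p = p_prop <= alpha

print(f"mean z test: z={z_mean:.3f}, P-Value={p_mean:.4f}, reject={reject_mean}")
print(f"mean t test: t={t_stat:.3f}, CV=+/-{t_crit}, reject={reject_t}")
print(f"proportion:  z={z_prop:.3f}, P-Value={p_prop:.4f}, reject={reject_prop_cv}")
```

For the same data and α, the P-Value Method and the Critical Value Method always reach the same decision; the script shows both for the proportion case.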


   z  0.09  0.08  0.07  0.06  0.05  0.04  0.03  0.02  0.01  0.00
-3.4 .0002 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003
-3.3 .0003 .0004 .0004 .0004 .0004 .0004 .0004 .0005 .0005 .0005
-3.2 .0005 .0005 .0005 .0006 .0006 .0006 .0006 .0006 .0007 .0007
-3.1 .0007 .0007 .0008 .0008 .0008 .0008 .0009 .0009 .0009 .0010
-3.0 .0010 .0010 .0011 .0011 .0011 .0012 .0012 .0013 .0013 .0013
-2.9 .0014 .0014 .0015 .0015 .0016 .0016 .0017 .0018 .0018 .0019
-2.8 .0019 .0020 .0021 .0021 .0022 .0023 .0023 .0024 .0025 .0026
-2.7 .0026 .0027 .0028 .0029 .0030 .0031 .0032 .0033 .0034 .0035
-2.6 .0036 .0037 .0038 .0039 .0040 .0041 .0043 .0044 .0045 .0047
-2.5 .0048 .0049 .0051 .0052 .0054 .0055 .0057 .0059 .0060 .0062
-2.4 .0064 .0066 .0068 .0069 .0071 .0073 .0075 .0078 .0080 .0082
-2.3 .0084 .0087 .0089 .0091 .0094 .0096 .0099 .0102 .0104 .0107
-2.2 .0110 .0113 .0116 .0119 .0122 .0125 .0129 .0132 .0136 .0139
-2.1 .0143 .0146 .0150 .0154 .0158 .0162 .0166 .0170 .0174 .0179
-2.0 .0183 .0188 .0192 .0197 .0202 .0207 .0212 .0217 .0222 .0228
-1.9 .0233 .0239 .0244 .0250 .0256 .0262 .0268 .0274 .0281 .0287
-1.8 .0294 .0301 .0307 .0314 .0322 .0329 .0336 .0344 .0351 .0359
-1.7 .0367 .0375 .0384 .0392 .0401 .0409 .0418 .0427 .0436 .0446
-1.6 .0455 .0465 .0475 .0485 .0495 .0505 .0516 .0526 .0537 .0548
-1.5 .0559 .0571 .0582 .0594 .0606 .0618 .0630 .0643 .0655 .0668
-1.4 .0681 .0694 .0708 .0721 .0735 .0749 .0764 .0778 .0793 .0808
-1.3 .0823 .0838 .0853 .0869 .0885 .0901 .0918 .0934 .0951 .0968
-1.2 .0985 .1003 .1020 .1038 .1056 .1075 .1093 .1112 .1131 .1151
-1.1 .1170 .1190 .1210 .1230 .1251 .1271 .1292 .1314 .1335 .1357
-1.0 .1379 .1401 .1423 .1446 .1469 .1492 .1515 .1539 .1562 .1587
-0.9 .1611 .1635 .1660 .1685 .1711 .1736 .1762 .1788 .1814 .1841
-0.8 .1867 .1894 .1922 .1949 .1977 .2005 .2033 .2061 .2090 .2119
-0.7 .2148 .2177 .2206 .2236 .2266 .2296 .2327 .2358 .2389 .2420
-0.6 .2451 .2483 .2514 .2546 .2578 .2611 .2643 .2676 .2709 .2743
-0.5 .2776 .2810 .2843 .2877 .2912 .2946 .2981 .3015 .3050 .3085
-0.4 .3121 .3156 .3192 .3228 .3264 .3300 .3336 .3372 .3409 .3446
-0.3 .3483 .3520 .3557 .3594 .3632 .3669 .3707 .3745 .3783 .3821
-0.2 .3859 .3897 .3936 .3974 .4013 .4052 .4090 .4129 .4168 .4207
-0.1 .4247 .4286 .4325 .4364 .4404 .4443 .4483 .4522 .4562 .4602
-0.0 .4641 .4681 .4721 .4761 .4801 .4840 .4880 .4920 .4960 .5000

Table N1 - Standard Normal Distribution


   z  0.00  0.01  0.02  0.03  0.04  0.05  0.06  0.07  0.08  0.09
 0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
 0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
 0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
 0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
 0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
 0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
 0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
 0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
 0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
 0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
 1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
 1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
 1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
 1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
 1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
 1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
 1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545
 1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633
 1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706
 1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767
 2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
 2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
 2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
 2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916
 2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936
 2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952
 2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964
 2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974
 2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981
 2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986
 3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990
 3.1 .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993 .9993
 3.2 .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995 .9995
 3.3 .9995 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996 .9997
 3.4 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9998

Table N1 - Standard Normal Distribution (continued)
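Each entry in Table N1 is the cumulative area to the left of z, P(Z < z), rounded to four decimals, so individual entries can be regenerated or spot-checked in code. A minimal sketch with Python's statistics.NormalDist; the helper name table_n1_entry is hypothetical.

```python
from statistics import NormalDist


def table_n1_entry(z):
    """Cumulative area to the left of z, rounded to 4 places as in Table N1."""
    return round(NormalDist().cdf(z), 4)


print(table_n1_entry(-1.96))  # row -1.9, column 0.06; prints 0.025 (table: .0250)
print(table_n1_entry(1.00))   # row 1.0, column 0.00; prints 0.8413 (table: .8413)
```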


Level of c    0.50   0.80   0.90   0.95   0.98   0.99
ONE Tail      0.25   0.10   0.05  0.025   0.01  0.005
TWO tails     0.50   0.20   0.10   0.05   0.02   0.01

d.f.
1            1.000  3.078  6.314 12.706 31.821 63.657
2            0.816  1.886  2.920  4.303  6.965  9.925
3            0.765  1.638  2.353  3.182  4.541  5.841
4            0.741  1.533  2.132  2.776  3.747  4.604
5            0.727  1.476  2.015  2.571  3.365  4.032
6            0.718  1.440  1.943  2.447  3.143  3.707
7            0.711  1.415  1.895  2.365  2.998  3.499
8            0.706  1.397  1.860  2.306  2.896  3.355
9            0.703  1.383  1.833  2.262  2.821  3.250
10           0.700  1.372  1.812  2.228  2.764  3.169
11           0.697  1.363  1.796  2.201  2.718  3.106
12           0.695  1.356  1.782  2.179  2.681  3.055
13           0.694  1.350  1.771  2.160  2.650  3.012
14           0.692  1.345  1.761  2.145  2.624  2.977
15           0.691  1.341  1.753  2.131  2.602  2.947
16           0.690  1.337  1.746  2.120  2.583  2.921
17           0.689  1.333  1.740  2.110  2.567  2.898
18           0.688  1.330  1.734  2.101  2.552  2.878
19           0.688  1.328  1.729  2.093  2.539  2.861
20           0.687  1.325  1.725  2.086  2.528  2.845
21           0.686  1.323  1.721  2.080  2.518  2.831
22           0.686  1.321  1.717  2.074  2.508  2.819
23           0.685  1.319  1.714  2.069  2.500  2.807
24           0.685  1.318  1.711  2.064  2.492  2.797
25           0.684  1.316  1.708  2.060  2.485  2.787
26           0.684  1.315  1.706  2.056  2.479  2.779
27           0.684  1.314  1.703  2.052  2.473  2.771
28           0.683  1.313  1.701  2.048  2.467  2.763
29           0.683  1.311  1.699  2.045  2.462  2.756
infinity     0.674  1.282  1.645  1.960  2.326  2.576
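The bottom (infinity) row of the t table coincides with standard normal critical values, because the t distribution approaches the standard normal as d.f. grows. A short Python check, assuming each column's TWO tails value is the α passed to the inverse normal CDF as 1 - α/2:

```python
from statistics import NormalDist

# TWO tails alpha for each column (Level of c = 1 - two-tail alpha)
two_tail_alphas = [0.50, 0.20, 0.10, 0.05, 0.02, 0.01]
infinity_row = [round(NormalDist().inv_cdf(1 - a / 2), 3) for a in two_tail_alphas]
print(infinity_row)  # [0.674, 1.282, 1.645, 1.96, 2.326, 2.576]
```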