review of statistics

INTERVAL ESTIMATION and HYPOTHESIS TESTING

Upload: jessica-angelina

Post on 19-Oct-2015


  • INTERVAL ESTIMATION

    and

    HYPOTHESIS TESTING

  • Inferential Statistics

    Making statements about a population by examining sample results.

    Sample statistics (known) → Inference → Population parameters (unknown, but can be estimated from sample evidence)

  • Sample Statistics as Estimators of Population Parameters

    An estimator of a population parameter is a sample statistic used to estimate or predict the population parameter.

    An estimate of a parameter is a particular numerical value of a sample statistic obtained through sampling.

    A point estimate is a single value used as an estimate of a population parameter.

    A population parameter is a numerical measure of a summary characteristic of a population.

    A sample statistic is a numerical measure of a summary characteristic of a sample.

  • Inferential Statistics

    Drawing conclusions and/or making decisions concerning a population based on sample results.

    Estimation: e.g., estimate the population mean using the information derived from the sample.

    Hypothesis testing: e.g., use sample evidence to test hypotheses about the population mean.

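  • Point and Interval Estimates

    A point estimate is a single number.

    A confidence interval is a range of values around the point estimate that is expected, at a stated level of confidence, to contain the parameter.

    [Figure: an interval running from the lower confidence limit to the upper confidence limit, with the point estimate at its center; the distance between the limits is the width of the confidence interval.]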

  • Confidence Level, (1 − α)

    Suppose confidence level = 95%. Also written (1 − α) = 0.95.

    A relative frequency interpretation: in the long run, 95% of all the confidence intervals that can be constructed from repeated samples will contain the unknown true parameter.

  • Reducing the Margin of Error

    The margin of error can be reduced if

    the standard deviation is lower (σ↓)

    the sample size is increased (n↑)

    the confidence level is decreased, (1 − α)↓
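The three effects can be checked numerically. The sketch below assumes the usual margin-of-error formula for a mean with known σ, e = zα/2 · σ/√n, which the slide does not write out; `margin_of_error` is an illustrative helper name, and the input values are arbitrary.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) for a mean with known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * sigma / sqrt(n)


# Baseline, then change one input at a time (illustrative values only).
base = margin_of_error(sigma=6, n=100, confidence=0.95)
lower_sigma = margin_of_error(sigma=3, n=100, confidence=0.95)
larger_n = margin_of_error(sigma=6, n=400, confidence=0.95)
lower_conf = margin_of_error(sigma=6, n=100, confidence=0.90)

print(f"baseline:          {base:.3f}")
print(f"smaller sigma:     {lower_sigma:.3f}")
print(f"larger n:          {larger_n:.3f}")
print(f"lower confidence:  {lower_conf:.3f}")
```

Each change shrinks the margin relative to the baseline. Note that quadrupling n only halves the margin, because n enters through its square root.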

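  • Finding zα/2

    Consider a 95% confidence interval: 1 − α = .95, so α = .05 and α/2 = .025.

    Find z.025 = ±1.96 from the standard normal distribution table.

    [Figure: standard normal curve with the lower confidence limit at z = −1.96, the point estimate at z = 0, and the upper confidence limit at z = 1.96, shown in both Z units and X units.]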

  • Common Levels of Confidence

    Commonly used confidence levels are 90%, 95%, and 99%.

    Confidence Level   Confidence Coefficient, 1 − α   zα/2 value
    80%                .80                             1.28
    90%                .90                             1.645
    95%                .95                             1.96
    98%                .98                             2.33
    99%                .99                             2.58
    99.8%              .998                            3.09
    99.9%              .999                            3.29
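The zα/2 values need not come from a printed table. A minimal sketch using Python's standard-library `statistics.NormalDist` follows; `z_critical` is an illustrative name, and the computed values may differ from printed table entries in the last digit because of rounding.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence coefficient."""
    alpha = 1 - confidence
    # Area to the left of the upper critical value is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.3f}  {z_critical(level):.3f}")
```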

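  • Intervals and Level of Confidence

    Confidence intervals extend from x̄ − zα/2 · σ/√n to x̄ + zα/2 · σ/√n.

    100(1 − α)% of intervals constructed contain μ; 100(α)% do not.

    [Figure: sampling distribution of the mean, with confidence intervals drawn around individual sample means x̄1, x̄2, ...]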
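The claim that 100(1 − α)% of intervals contain μ can be checked by simulation. The sketch below assumes a normal population with known σ and intervals of the form x̄ ± zα/2 · σ/√n; the population values (μ = 50, σ = 10), the sample size, and the helper name `coverage` are all illustrative choices, not taken from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials


rate = coverage(mu=50, sigma=10, n=25, confidence=0.95, trials=2000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to 0.95, though not exactly on it, since the simulation itself is a sample.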

  • Z-value for Sampling Distribution of the Mean

    Z-value for the sampling distribution of x̄:

    z = (x̄ − μ) / (σ/√n)

    where:

    x̄ = sample mean

    μ = population mean

    σ = population standard deviation

    n = sample size

  • Example 1

    A large automotive-parts wholesaler needs an estimate of the mean life it can expect from windshield wiper blades under typical driving conditions.

    Already, management has determined that the standard deviation of the population life is 6 months.

    Suppose we select a simple random sample of 100 wiper blades, collect data on their useful lives, and obtain these results:

    σ = 6 months, n = 100, x̄ = 21 months

    Give a 95% confidence interval for the true average life expectancy of wiper blades.

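  • Example 1 (continued)

    x̄ ± zα/2 · σ/√n = 21 ± 1.96 (6/√100) = 21 ± 1.176, which gives an interval from 19.824 to 22.176 months.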

  • Interpretation

    We are 95% confident that the true mean life of the population of wiper blades is between 19.82 and 22.18 months.

    Although the true mean may or may not be in this particular interval, 95% of intervals formed in this manner will contain the true mean.
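The wiper-blade interval can be reproduced in a few lines. This is a sketch assuming the known-σ interval x̄ ± zα/2 · σ/√n; `z_confidence_interval` is an illustrative helper name, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) limits of x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Values from the wiper-blade example: sigma = 6, n = 100, x_bar = 21 months.
lower, upper = z_confidence_interval(x_bar=21, sigma=6, n=100)
print(f"{lower:.2f} to {upper:.2f} months")
```

Rounded to two decimals, the limits agree with the 19.82 and 22.18 months quoted in the interpretation.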

  • Example using Excel

    The display shows the Excel function that is used.

  • Example using Excel (continued)

    The display shows the margin of error, 5387.75. Add this value to the sample mean and subtract it from the sample mean to get the limits of the 95% confidence interval.

  • Central Limit Theorem

    As the sample size n gets large enough (n↑), the sampling distribution of the sample mean becomes almost normal, regardless of the shape of the population.

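  • If the Population is not Normal

    Sampling distribution properties:

    Central tendency: μx̄ = μ

    Variation: σx̄ = σ/√n

    [Figure: a non-normal population distribution and the distribution of sample means, which becomes normal as n increases; the larger sample size gives the narrower curve, the smaller sample size the wider one.]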
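A small simulation illustrates how sample means from a non-normal population behave. The sketch below assumes an exponential population with mean 1 and standard deviation 1, a choice made here only because it is strongly right-skewed; the helper name `sample_means` and the sample sizes are likewise illustrative.

```python
import random
from statistics import fmean, pstdev


def sample_means(n, trials, rng):
    """Means of `trials` samples of size n from a right-skewed exponential population."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]


rng = random.Random(42)
for n in (2, 30, 200):
    means = sample_means(n, trials=3000, rng=rng)
    # expovariate(1.0) has mean 1 and standard deviation 1, so the mean of the
    # sample means should be near 1 and their spread near 1 / sqrt(n).
    print(f"n={n:>3}  mean of means={fmean(means):.3f}  "
          f"sd of means={pstdev(means):.3f}  1/sqrt(n)={n ** -0.5:.3f}")
```

The center of the sample means stays near the population mean while their spread shrinks roughly like σ/√n. Plotting a histogram of `means` for each n would show the shape moving toward a bell curve.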

  • Student's t-distribution

    t-distributions are bell-shaped and symmetric, but have fatter tails than the normal.

    Note: t → Z as n increases.

    [Figure: curves for the standard normal (t with df = ∞), t (df = 13), and t (df = 5), all centered at 0.]

  • t Table

    Right Tail Area

    df     .10      .05      .025
    1      3.078    6.314    12.706
    2      1.886    2.920    4.303
    3      1.638    2.353    3.182

    The body of the table contains t values, not probabilities.

    Let: n = 3, df = n − 1 = 2, α = .10, α/2 = .05. Then tα/2 = 2.920.

  • t distribution values

    With comparison to the Z value

    Confidence Level   t (10 d.f.)   t (20 d.f.)   t (30 d.f.)   Z
    .80                1.372         1.325         1.310         1.282
    .90                1.812         1.725         1.697         1.645
    .95                2.228         2.086         2.042         1.960
    .99                3.169         2.845         2.750         2.576

    Note: t → Z as n increases

  • HYPOTHESIS TESTING

    The Null Hypothesis, H0

    At the beginning, assume that the null hypothesis is true (until evidence suggests otherwise).

    Similar to the notion of innocent until proven guilty.

    Refers to the status quo.

    Always contains the =, ≤, or ≥ sign.

    May or may not be rejected.

  • The Alternative Hypothesis, HA (also written H1)

    Is the opposite of the null hypothesis, e.g., the average number of TV sets in U.S. homes is not equal to 3 (H1: μ ≠ 3).

    The assertion of all situations not covered by H0.

    Challenges the status quo.

    Never contains the =, ≤, or ≥ sign.

    Is generally the researcher's theory.

    H0 and H1 are:

    Mutually exclusive: only one can be true.

    Exhaustive: together they cover all possibilities, so one or the other must be true.

  • Reason for Rejecting H0

    If it is unlikely that we would get a sample mean of this value (x̄ = 20) if in fact μ = 50 were the population mean, then we reject the null hypothesis that μ = 50.

    [Figure: sampling distribution of x̄, centered at μ = 50 if H0 is true, with the observed sample mean of 20 far out in the lower tail.]

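  • Level of Significance and the Rejection Region

    Required level of significance = α

    Lower-tail test: H0: μ ≥ 3, H1: μ < 3. The rejection region, with area α, is in the lower tail.

    Upper-tail test: H0: μ ≤ 3, H1: μ > 3. The rejection region, with area α, is in the upper tail.

    Two-tail test: H0: μ = 3, H1: μ ≠ 3. The rejection region is split between the two tails, with area α/2 in each.

    [Figure: three sampling distributions centered at 0 with the rejection region shaded; the boundary of each shaded region represents a critical value.]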

  • Level of Significance, α

    Defines the rejection region of the sampling distribution.

    Is designated by α (level of significance).

    Typical values are .01, .05, or .10.

    Is selected by the researcher at the beginning.

    Provides the critical value(s) of the test.

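  • Decision Rule

    For an upper-tail test of H0: μ ≤ μ0 against H1: μ > μ0:

    Reject H0 if z > zα; otherwise, do not reject H0.

    Alternate rule: state the critical value in the units of x̄ and reject H0 if the sample mean exceeds it.

    [Figure: standard normal curve centered at 0 with the critical value zα marking the boundary between the "do not reject H0" region and the shaded "reject H0" region of area α.]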
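The upper-tail decision rule can be written as a short function. This is a sketch assuming a z statistic with known σ, z = (x̄ − μ0)/(σ/√n); the function name `upper_tail_z_test` and the numbers in the usage line are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, reject?) for H0: mu <= mu0 vs H1: mu > mu0."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # all of alpha sits in the upper tail
    return z, z_alpha, z > z_alpha


# Hypothetical sample result, chosen only to illustrate the rule.
z, z_alpha, reject = upper_tail_z_test(x_bar=3.2, mu0=3.0, sigma=0.8, n=64)
print(f"z = {z:.2f}, critical value = {z_alpha:.3f}, reject H0: {reject}")
```

For a lower-tail test the comparison flips to z < −zα, and for a two-tail test α is split so that the critical values become ±zα/2.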

  • Errors in Making Decisions

    Type I Error: rejecting a true null hypothesis.

    The probability of a Type I error is α. It is called the level of significance of the test and is set by the researcher in advance.

    Type II Error: failing to reject a false null hypothesis.

    The probability of a Type II error is β.

  • Outcomes and Probabilities

  • Result Probabilities

    Jury Trial (H0: Innocent)

                       The Truth
    Verdict            Innocent            Guilty
    Innocent           Correct             Error
    Guilty             Error               Correct

    Hypothesis Test

                       The Truth
    Decision           H0 True             H0 False
    Do Not Reject H0   1 − α               Type II Error (β)
    Reject H0          Type I Error (α)    Power (1 − β)

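  • Two-Tail Tests

    In some settings, the alternative hypothesis does not specify a unique direction.

    H0: μ = 3, H1: μ ≠ 3

    There are two critical values, defining the two regions of rejection, each with area α/2.

    [Figure: sampling distribution centered at x̄ = 3 (z = 0); reject H0 below the lower critical value −zα/2, do not reject H0 between the critical values, and reject H0 above the upper critical value +zα/2.]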

  • p-Value Approach to Testing

    p-value: probability of obtaining a test statistic more extreme (≤ or ≥) than the observed sample value, given H0 is true.

    Also called the observed level of significance.

    Smallest value of α for which H0 can be rejected.

  • p-Value Approach to Testing (continued)

    Convert the sample result (e.g., x̄) to a test statistic (e.g., z statistic).

    Obtain the p-value. For an upper-tail test: p-value = P(z > observed z statistic, given that H0 is true).

    Decision rule: compare the p-value to α.

    If p-value < α, reject H0.

    If p-value ≥ α, do not reject H0.
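The p-value steps can be sketched for an upper-tail test as follows, assuming a z statistic with known σ. The helper name `upper_tail_p_value` and the sample numbers are hypothetical and are used only to show the comparison with α.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(x_bar, mu0, sigma, n):
    """p-value = P(Z > z) for the observed z statistic, assuming H0 is true."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)


alpha = 0.05
# Hypothetical sample result, not taken from the slides.
p_value = upper_tail_p_value(x_bar=3.2, mu0=3.0, sigma=0.8, n=64)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"p-value = {p_value:.4f}: {decision}")
```

Comparing the p-value with α always gives the same decision as comparing the test statistic with the critical value; the p-value additionally shows how far the result is from the boundary.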

  • The p-Value: Rules of Thumb

    When the p-value is smaller than 0.01, the result is called very significant.

    When the p-value is between 0.01 and 0.05, the result is called significant.

    When the p-value is between 0.05 and 0.10, the result is considered by some as marginally significant (and by most as not significant).

    When the p-value is greater than 0.10, the result is considered not significant.

  • Caution: Hypotheses are accepted, not proved

    Suppose your theory is μ > 3 (HA).

    Obtaining a sample mean greater than 3 is not sufficient to support your theory. It simply does not provide statistical evidence to reject it.

    Obtaining a sample mean significantly greater than 3 (the H0 value) supports your theory, but does not prove it.

    THEORIES CAN NEVER BE PROVED BY SAMPLE EVIDENCE.

  • Example 1

    A company that delivers packages within a large metropolitan area claims that it takes an average of 28 minutes for a package to be delivered from your door to the destination. Suppose that you want to carry out a hypothesis test of this claim.

    A random sample of 100 deliveries resulted in x̄ = 31.5 minutes and s = 5 minutes. Test the claim at the α = 0.05 level.

    H0: μ = 28, H1: μ ≠ 28

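  • Example 1 (continued)

    α = 0.05, n = 100

    σ is unknown, but n is large, so use a z statistic.

    Critical value: z.025 = ±1.96

    Test statistic: z = (x̄ − μ) / (s/√n) = (31.5 − 28) / (5/√100) = 7

    Since z = 7 > 1.96, reject H0: there is sufficient evidence that the true mean delivery time is different from 28 minutes.

    [Figure: standard normal curve with rejection regions of area α/2 = .025 below −1.96 and above 1.96; the test statistic z = 7 falls in the upper rejection region.]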
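The delivery-time test can be reproduced as a short script. This is a sketch that follows the slide in using s in place of the unknown σ with a z statistic because n is large; the two-tail p-value line is an addition for illustration and is not computed on the slide.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 31.5, 5, 100   # sample results from the delivery example
mu0, alpha = 28, 0.05        # hypothesized mean and level of significance

# sigma is unknown; with n = 100 the sample standard deviation s stands in for it.
z = (x_bar - mu0) / (s / sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tail test: alpha/2 in each tail
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tail p-value (illustrative addition)

reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, reject H0: {reject}")
print(f"p-value below alpha: {p_value < alpha}")
```

The test statistic of 7 lies far beyond ±1.96, so both the critical-value rule and the p-value rule lead to rejecting H0.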