Post on 19-Jan-2016

Copyright © 2011 Pearson Education, Inc.

Alternative Approaches to Inference

Chapter 17

17.1 A Confidence Interval for the Median

An auto insurance company is thinking about compensating agents by comparing the number of claims they produce to a standard. Annual claims average near $3,200 with a median claim of $2,000.

Claims are highly skewed.

Use nonparametric methods that don’t rely on a normal sampling distribution.

17.1 A Confidence Interval for the Median

Distribution of Sample of Claims (n = 42)

For this sample, the average claim is $3,632 with s = $4,254. The median claim is $2,456.

17.1 A Confidence Interval for the Median

Is Sample Mean Compatible with µ=$3,200?

To answer this question, construct a 95% confidence interval for µ

This interval is $3,632 ± 2.02 × $4,254/√42 = [$2,306 to $4,958].

17.1 A Confidence Interval for the Median

Is Sample Mean Compatible with µ=$3,200?

The national average of $3,200 lies within the 95% confidence t-interval for the mean.

BUT…the sample does not satisfy the sample size condition necessary to use the t-interval.

When the conditions are not met, the t-interval is unreliable: its actual coverage is unknown.
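The t-interval discussed above can be reproduced from the quoted summary statistics. This is a minimal sketch: the function name `t_interval` is hypothetical, and the critical value 2.02 is taken from the slides rather than computed.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Return (low, high) for mean ± t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics for the sample of n = 42 claims, as quoted in the slides.
low, high = t_interval(mean=3632, sd=4254, n=42, t_crit=2.02)
print(f"[${low:,.0f} to ${high:,.0f}]")

# The national average lies inside the interval, but this says nothing
# about whether the interval's nominal 95% coverage can be trusted.
print(low <= 3200 <= high)
```

The arithmetic only confirms the endpoints; it cannot check the sample size condition, which is the real concern for skewed data.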

17.1 A Confidence Interval for the Median

Nonparametric Statistics

Avoid making assumptions about the shape of the population.

Often rely on sorting the data.

Suited to parameters such as the population median θ (theta).

17.1 A Confidence Interval for the Median

Nonparametric Statistics

For the claims data that are highly skewed to the right, θ < µ.

If the population distribution is symmetric, then θ = µ.

17.1 A Confidence Interval for the Median

Nonparametric Confidence Interval

The first step in finding a confidence interval for θ is to sort the observed data in ascending order. The sorted values are known as order statistics.

Order statistics are denoted as X(1) < X(2) < … < X(n)

17.1 A Confidence Interval for the Median

Nonparametric Confidence Interval

If data are an SRS from a population with median θ, then we know

1. The probability that a random draw from the population is less than or equal to θ is ½,

2. The observations in the random sample are independent.

17.1 A Confidence Interval for the Median

Nonparametric Confidence Interval

Determine the probabilities that the population median lies between ordered observations using the binomial distribution.

To form the confidence interval for θ, combine several segments to achieve the desired coverage.

17.1 A Confidence Interval for the Median

Nonparametric Confidence Interval

In general, can’t construct a confidence interval for θ whose coverage is exactly 0.95.

The 94.6% confidence interval for the median claim is [$1,217 to $3,168].
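The reason exact 95% coverage is usually out of reach can be seen by listing the coverages that order statistics allow. The sketch below assumes a continuous population and uses a hypothetical sample size of n = 10 (not the claims data); the function name `median_ci_coverage` is also an assumption made for illustration.

```python
from math import comb

def median_ci_coverage(i, j, n):
    """P(X(i) <= theta <= X(j)) for an SRS of size n, with 1 <= i < j <= n.

    theta lies in the interval when between i and j - 1 observations fall
    at or below it, and that count is Binomial(n, 1/2).
    """
    return sum(comb(n, k) for k in range(i, j)) / 2**n

n = 10  # hypothetical sample size, chosen only to keep the list short
for i in range(1, n // 2 + 1):
    j = n + 1 - i  # symmetric choice of order statistics
    print(f"[X({i}), X({j})]: coverage = {median_ci_coverage(i, j, n):.4f}")
```

With only a handful of attainable values, the analyst picks the available coverage closest to the target rather than hitting 0.95 exactly.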

17.1 A Confidence Interval for the Median

Parametric versus Nonparametric

Limitations of nonparametric methods

1. Coverage is limited to certain values determined by sums of binomial probabilities (difficult to obtain exactly 95% coverage).

2. Median is not equal to the mean if the population distribution is skewed. This prohibits obtaining estimates for the total (total = nµ).

17.2 Transformations

Transform Data into Symmetric Distributions

Taking base 10 logs of the claims data results in a more symmetric distribution.

17.2 Transformations

Transform Data into Symmetric Distributions

Taking base 10 logs of the claims data results in data that could be from a normal distribution.

17.2 Transformations

Transform Data into Symmetric Distributions

If y = log10 x, then ȳ = 3.312 with s_y = 0.493.

The 95% confidence t-interval for µ_y is [3.16 to 3.47].

If we convert back to the original scale of dollars, this interval resembles that for the median rather than that for the mean.
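The conversion back to dollars can be checked with a few lines of arithmetic. This sketch reuses the summary statistics on the log scale and assumes the same critical value of 2.02 that the slides use for n = 42; the variable names are illustrative.

```python
import math

# Summary statistics of y = log10(claim), as quoted in the slides.
y_bar, s_y, n, t_crit = 3.312, 0.493, 42, 2.02

half_width = t_crit * s_y / math.sqrt(n)
log_low, log_high = y_bar - half_width, y_bar + half_width

# Undo the base 10 log to return to the dollar scale.
dollar_low, dollar_high = 10 ** log_low, 10 ** log_high

print(f"log scale:    [{log_low:.2f} to {log_high:.2f}]")
print(f"dollar scale: [${dollar_low:,.0f} to ${dollar_high:,.0f}]")
```

The back-transformed endpoints fall inside the nonparametric interval for the median, [$1,217 to $3,168], and well below the upper end of the t-interval for the mean, [$2,306 to $4,958].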

17.3 Prediction Intervals

Prediction Interval: an interval that holds a future draw from the population with chosen probability.

For the auto insurance example, a prediction interval anticipates the size of the next claim, allowing for the random variation associated with an individual.

17.3 Prediction Intervals

For a Normal Population

The 100(1 – α)% prediction interval for an independent draw from a normal population is

\[ \bar{x} \pm t_{\alpha/2,\,n-1}\, s \sqrt{1 + \frac{1}{n}} \]

where x̄ and s estimate µ and σ.
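As an illustration of the normal-population prediction interval, x̄ ± t s √(1 + 1/n), the sketch below applies it to the base 10 logs of the claims, which the slides describe as plausibly normal. The slides do not carry out this calculation; the function name and the reuse of the critical value 2.02 are assumptions.

```python
import math

def normal_prediction_interval(mean, sd, n, t_crit):
    """Return (low, high) for mean ± t_crit * sd * sqrt(1 + 1/n)."""
    half_width = t_crit * sd * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width

# Log-scale summary statistics from the claims example (n = 42).
low, high = normal_prediction_interval(mean=3.312, sd=0.493, n=42, t_crit=2.02)
print(f"log scale:    [{low:.2f} to {high:.2f}]")
print(f"dollar scale: [${10 ** low:,.0f} to ${10 ** high:,.0f}]")

# For comparison, the confidence interval for the mean uses sd / sqrt(n).
ci_half_width = 2.02 * 0.493 / math.sqrt(42)
print(f"prediction half-width {(high - low) / 2:.3f} "
      f"vs confidence half-width {ci_half_width:.3f}")
```

The prediction interval is far wider than the confidence interval because it must allow for the variation of a single new observation, not just the uncertainty in the mean.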

17.3 Prediction Intervals

Nonparametric Prediction Interval

Relies on the properties of order statistics:

P(X(i) ≤ X ≤ X(i+1)) = 1/(n + 1)

P(X ≤ X(1)) = 1/(n + 1)

P(X(n) ≤ X) = 1/(n + 1)

17.3 Prediction Intervals

Nonparametric Prediction Interval

Combine segments to get desired coverage.

P (X(2) ≤ X ≤ X(41)) = P ($255 ≤ X ≤ $17,305)

= (41 – 2)/43 ≈ 0.91

There is a 91% chance that the next claim is between $255 and $17,305.
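Combining segments amounts to counting them: each of the n + 1 gaps has probability 1/(n + 1), so the interval from X(i) to X(j) covers (j – i)/(n + 1). A minimal sketch follows; the function names and the five-value toy sample are hypothetical.

```python
def prediction_coverage(i, j, n):
    """Chance that a new draw lands between X(i) and X(j), given n observations."""
    return (j - i) / (n + 1)

def nonparametric_prediction_interval(data, i, j):
    """Return (X(i), X(j), coverage) using 1-based order statistics."""
    ordered = sorted(data)
    return ordered[i - 1], ordered[j - 1], prediction_coverage(i, j, len(ordered))

# Claims example: X(2) to X(41) with n = 42 observations.
claims_coverage = prediction_coverage(2, 41, 42)
print(f"{claims_coverage:.2f}")

# Toy data (hypothetical) showing how the endpoints are read off the sorted sample.
toy_interval = nonparametric_prediction_interval([5, 1, 4, 2, 3], 1, 5)
print(toy_interval)
```

Because coverage moves in steps of 1/(n + 1), small samples offer only coarse choices of prediction level.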

4M Example 17.1: EXECUTIVE SALARIES

Motivation

Fees earned by an executive placement service are 5% of the starting annual total compensation package. How much can the firm expect to earn by placing a current client as a CEO in the telecom industry?

4M Example 17.1: EXECUTIVE SALARIES

Method

Obtain data (n = 23 CEOs from telecom industry).

4M Example 17.1: EXECUTIVE SALARIES

Method

The distribution of total compensation for CEOs in the telecom industry is not normal. Construct a nonparametric prediction interval for the client’s anticipated total compensation package.

4M Example 17.1: EXECUTIVE SALARIES

Mechanics

Sort the data:

4M Example 17.1: EXECUTIVE SALARIES

Mechanics

The interval x(3) to x(21) is $743,801 to $29,863,393 and is a 75% prediction interval, since (21 – 3)/(23 + 1) = 0.75.

4M Example 17.1: EXECUTIVE SALARIES

Message

The compensation package of three out of four placements in this industry is predicted to be in the range from about $750,000 to $30,000,000. The implied fee ranges from $37,500 to $1,500,000.
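The numbers in this example can be checked directly. The sketch below uses the order statistics and the 5% fee rate quoted in the example; the variable names are illustrative.

```python
# Executive salaries: n = 23 CEOs, interval from x(3) to x(21).
n, i, j = 23, 3, 21
coverage = (j - i) / (n + 1)

low_pay, high_pay = 743_801, 29_863_393  # x(3) and x(21), in dollars
fee_rate = 0.05                           # fee is 5% of total compensation

fee_low, fee_high = fee_rate * low_pay, fee_rate * high_pay
print(f"coverage = {coverage:.2f}")
print(f"fee range: ${fee_low:,.0f} to ${fee_high:,.0f}")
```

The unrounded fees are slightly below the figures in the message, which apply 5% to the rounded endpoints of $750,000 and $30,000,000.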

17.4 Proportions Based on Small Samples

Wilson’s Interval for a Proportion

An adjustment that moves the sampling distribution of p̂ closer to ½ and away from the troublesome boundaries at 0 and 1.

Add four artificial cases (2 successes and 2 failures) to create an adjusted proportion p̃.

17.4 Proportions Based on Small Samples

Wilson’s Interval for a Proportion

Add 2 successes and 2 failures to the data and define p̃ = (# of successes + 2)/(n + 4), with ñ = n + 4.

The z-interval is

\[ \tilde{p} \pm z_{\alpha/2} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}} \]

4M Example 17.2: DRUG TESTING

Motivation

A company is developing a drug to prolong time before a relapse of cancer. The drug must cut the rate of relapse in half. To test this drug, the company first needs to know the current time to relapse.

4M Example 17.2: DRUG TESTING

Method

Data are collected for 19 patients who were observed for 24 months. Doctors found a relapse in 9 of the 19 patients. While the SRS condition is satisfied, the sample size condition is not. Use Wilson’s interval for a proportion.

4M Example 17.2: DRUG TESTING

Mechanics

By adding two successes and two failures, we have

\[ \tilde{p} = \frac{9 + 2}{19 + 4} = 0.478 \]

The interval is

\[ 0.478 \pm 1.96 \sqrt{\frac{0.478(1 - 0.478)}{19 + 4}} = [0.27 \text{ to } 0.68] \]

4M Example 17.2: DRUG TESTING

Message

We are 95% confident that the proportion of patients with this cancer that relapse within 24 months is between 27% and 68%. In order to cut this proportion in half, the drug will have to reduce this rate to somewhere between 13% and 34%.
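The adjusted interval and the halving step in this example can be sketched as follows. The function name `plus_four_interval` is hypothetical, and z = 1.96 is the 95% value used in the mechanics.

```python
import math

def plus_four_interval(successes, n, z=1.96):
    """Adjusted z-interval: add 2 successes and 2 failures before computing."""
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj, p_adj - half_width, p_adj + half_width

# Drug testing example: 9 relapses among 19 patients.
p_adj, low, high = plus_four_interval(successes=9, n=19)
print(f"adjusted proportion = {p_adj:.3f}")
print(f"95% interval = [{low:.2f} to {high:.2f}]")

# Cutting the relapse rate in half halves both endpoints.
print(f"target range = [{low / 2:.3f} to {high / 2:.3f}]")
```

The adjustment matters most when the observed proportion is near 0 or 1; here it moves the estimate only slightly, from 9/19 to 11/23.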

Best Practices

Check the assumptions carefully when dealing with small samples.

Consider a nonparametric alternative if you suspect non-normal data.

Use the adjustment procedure for proportions from small samples.

Verify that your data are an SRS.

Pitfalls

Avoid assuming that populations are normally distributed in order to use a t-interval for the mean.

Do not use confidence intervals based on normality just because they are narrower than a nonparametric interval.

Do not think that you can prove normality using a normal quantile plot.

Pitfalls (Continued)

Do not rely on software to know which procedure to use.

Do not use a confidence interval when you need a prediction interval.
