
Chapter 6

Introduction to Statistical Inference

Introduction

• Goal: Make statements regarding a population (or state of nature) based on a sample of measurements

• Probability statements used to substantiate claims

• Example: Clinical Trial for Pravachol (5-year follow-up)

– Of 3302 subjects receiving Pravachol, 174 had cardiac events

– Of 3293 subjects receiving placebo, 248 had cardiac events

$$\hat{p}_{\text{Pravachol}} = \frac{174}{3302} = 0.0527\ (5.27\%) \qquad\qquad \hat{p}_{\text{placebo}} = \frac{248}{3293} = 0.0753\ (7.53\%)$$

Probability that Pravachol would do this much better than placebo if it were not effective: .000088 (approximately one chance in 11363).
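The chapter presents the formula for this comparison later; as a preview, a minimal sketch (assuming Python with scipy available) of a pooled two-proportion z-test reproduces the quoted probability:

```python
from math import sqrt
from scipy.stats import norm

# Pooled two-proportion z-test (one-sided): is Pravachol's event rate lower than placebo's?
x_prav, n_prav = 174, 3302      # events / subjects, Pravachol arm
x_plac, n_plac = 248, 3293      # events / subjects, placebo arm

p_prav = x_prav / n_prav        # 0.0527
p_plac = x_plac / n_plac        # 0.0753
p_pool = (x_prav + x_plac) / (n_prav + n_plac)

se = sqrt(p_pool * (1 - p_pool) * (1 / n_prav + 1 / n_plac))
z = (p_plac - p_prav) / se      # about 3.75

p_value = norm.sf(z)            # upper-tail area, about 0.000088
print(f"z = {z:.2f}, one-sided P-value = {p_value:.6f}")
```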

Estimating with Confidence

• Goal: Estimate a population mean (proportion) based on sample mean (proportion)

• Unknown: Parameter (μ, p)

• Known: Approximate Sampling Distribution of Statistic

$$\bar{X} \sim N\!\left(\mu,\ \frac{\sigma}{\sqrt{n}}\right) \qquad\qquad \hat{p} \sim N\!\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right)$$

• Recall: For a random variable that is normally distributed, the probability that it will fall within 2 standard deviations of mean is approximately 0.95

$$P\!\left(\mu - 2\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 2\frac{\sigma}{\sqrt{n}}\right) \approx 0.95 \qquad\qquad P\!\left(p - 2\sqrt{\frac{p(1-p)}{n}} \le \hat{p} \le p + 2\sqrt{\frac{p(1-p)}{n}}\right) \approx 0.95$$

Estimating with Confidence

• Although the parameter is unknown, it’s highly likely that our sample mean or proportion (estimate) will lie within 2 standard deviations (aka standard errors) of the population mean or proportion (parameter)

• Margin of Error: Measure of the upper bound on sampling error at a fixed level of confidence (we will use 95%). That corresponds to 2 standard errors:

Confidence Interval: estimate ± margin of error

Mean: Margin of Error (95% Confidence): $2\dfrac{\sigma}{\sqrt{n}}$

Proportion: Margin of Error (95% Confidence): $2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
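A minimal sketch of these two margin-of-error formulas (the function names are illustrative, not from the slides):

```python
from math import sqrt

def ci_mean_95(xbar: float, sigma: float, n: int) -> tuple[float, float]:
    """95% CI for a mean using the '2 standard errors' rule: xbar +/- 2*sigma/sqrt(n)."""
    me = 2 * sigma / sqrt(n)
    return xbar - me, xbar + me

def ci_prop_95(p_hat: float, n: int) -> tuple[float, float]:
    """95% CI for a proportion: p_hat +/- 2*sqrt(p_hat*(1-p_hat)/n)."""
    me = 2 * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - me, p_hat + me

# Example: the Pravachol arm's event proportion, 174/3302
print(ci_prop_95(174 / 3302, 3302))   # roughly (0.045, 0.060)
```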

Confidence Interval for a Mean

• Confidence Coefficient (C): Probability (based on repeated samples and construction of intervals) that a confidence interval will contain the true mean

• Common choices of C and resulting intervals:

$$\text{C\% Confidence: } \bar{x} \pm z^{*}\frac{\sigma}{\sqrt{n}} \qquad 90\%: \bar{x} \pm 1.645\frac{\sigma}{\sqrt{n}} \qquad 95\%: \bar{x} \pm 1.960\frac{\sigma}{\sqrt{n}} \qquad 99\%: \bar{x} \pm 2.576\frac{\sigma}{\sqrt{n}}$$

  C      z*
  90%    1.645
  95%    1.960
  99%    2.576
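The z* values in the table cut off area (1 − C)/2 in each tail of the standard normal; assuming scipy is available, they can be reproduced directly:

```python
from scipy.stats import norm

# z* cuts off area (1 - C)/2 in the upper tail of the standard normal.
for C in (0.90, 0.95, 0.99):
    z_star = norm.ppf(1 - (1 - C) / 2)   # equivalently norm.isf((1 - C) / 2)
    print(f"C = {C:.0%}: z* = {z_star:.3f}")
# C = 90%: z* = 1.645,  C = 95%: z* = 1.960,  C = 99%: z* = 2.576
```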

[Figure: Normal distribution of x̄ and the standard normal distribution, showing the central area C between −z* and z* and area (1 − C)/2 in each tail.]

Philadelphia Monthly Rainfall (1825-1869)

[Histogram of Philadelphia monthly rainfall, 1825-1869: frequency (0 to 140) by monthly rainfall amount (1 to 14).]

Population parameters: μ = 3.68, σ = 1.92

Margin of error (n = 20, C = 95%): $1.96\dfrac{1.92}{\sqrt{20}} \approx 0.84$

4 Random Samples of Size n=20, 95% CI's

         Sample 1              Sample 2              Sample 3              Sample 4
Month  Rain   Ran#     Month  Rain   Ran#     Month  Rain   Ran#     Month  Rain   Ran#
 156   2.56  0.0028     349   2.33  0.0007     185   2.69  0.0005     171   1.50  0.0011
  51   2.87  0.0050     149   4.86  0.0013     527   5.28  0.0029     175   2.52  0.0048
 176   4.64  0.0052     227   4.15  0.0054     114   3.99  0.0048     130   1.22  0.0085
 364   2.05  0.0082     336   5.17  0.0073     312   4.51  0.0084     167   3.35  0.0094
 271   2.76  0.0142     124   4.33  0.0081      49   5.37  0.0085     101   5.88  0.0133
   7   2.06  0.0145     330   4.03  0.0101     398   2.29  0.0166      33   0.79  0.0148
 312   4.51  0.0153     468   4.63  0.0132     396   5.55  0.0187     299   2.60  0.0164
 219   4.41  0.0160     293   3.99  0.0145      99   2.22  0.0233     337   1.85  0.0191
  16   3.87  0.0171     511   2.39  0.0149     181   1.84  0.0235     447   3.55  0.0193
 484   2.83  0.0190     235   5.28  0.0172     364   2.05  0.0244      78   3.53  0.0213
 316   4.56  0.0202     314   3.11  0.0190     392   7.59  0.0253     117   3.57  0.0224
 318   3.44  0.0257     372   5.42  0.0260     477   7.16  0.0283     399   1.09  0.0227
 517   3.62  0.0272     164   2.78  0.0272     434   2.07  0.0290      52   4.99  0.0240
 249   2.16  0.0301      48   0.26  0.0281     229   4.05  0.0318     162   6.60  0.0261
 445   4.79  0.0320     236   2.40  0.0284     223   4.54  0.0320      95   2.59  0.0296
  13   1.11  0.0324      50   3.75  0.0319     279   2.76  0.0364     479   3.93  0.0296
 479   3.93  0.0325      39   3.35  0.0325     520   5.44  0.0374      51   2.87  0.0303
 370   4.11  0.0345     417   7.68  0.0333     245   1.60  0.0374     380   6.00  0.0311
 348   2.17  0.0374     503   1.76  0.0359     183   2.63  0.0391      61   1.63  0.0324
  89   5.40  0.0380     151   5.89  0.0361      41   3.49  0.0395     302   2.87  0.0339

Mean           3.39                  3.88                  3.86                  3.15
Mean − m.e.    2.55                  3.04                  3.02                  2.31
Mean + m.e.    4.23                  4.72                  4.70                  3.99

Margin of error (n = 20, C = 95%): $1.96\dfrac{1.92}{\sqrt{20}} \approx 0.84$ (each interval above is the sample mean ± 0.84)
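A small simulation sketch of what the confidence coefficient means here: repeatedly draw samples of n = 20 and count how often x̄ ± 0.84 covers μ. It assumes a normal population with the slide's μ = 3.68 and σ = 1.92 (the full rainfall series is not reproduced here), so the result is approximate:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 3.68, 1.92, 20          # population values quoted on the slide
margin = 1.96 * sigma / np.sqrt(n)     # about 0.84

# Draw many samples of size 20 and check how often xbar +/- margin covers mu.
# (Sketch: assumes a normal population; the real rainfall data are skewed,
#  so actual coverage would only be approximate.)
reps = 100_000
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
coverage = np.mean(np.abs(xbars - mu) <= margin)
print(f"Estimated coverage: {coverage:.3f}")   # close to 0.95
```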

Factors Affecting Confidence Interval Width

• Goal: Have precise (narrow) confidence intervals (see the sketch below)

– Confidence Level (C): Increasing C increases the probability that an interval contains the parameter, which widens the confidence interval. Reducing C shortens the interval (at a cost in confidence)

– Sample size (n): Increasing n decreases the standard error of the estimate, the margin of error, and the width of the interval (quadrupling n cuts the width in half)

– Standard Deviation (σ): The more variable the individual measurements, the wider the interval. Potential ways to reduce σ are to focus on a more precise target population or use a more precise measuring instrument. Often nothing can be done, as nature determines σ
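A quick numeric illustration of these factors (a sketch assuming scipy; the σ value is borrowed from the rainfall example):

```python
from math import sqrt
from scipy.stats import norm

def ci_width(C: float, sigma: float, n: int) -> float:
    """Full width of a C-level z-interval for a mean: 2 * z* * sigma / sqrt(n)."""
    z_star = norm.ppf(1 - (1 - C) / 2)
    return 2 * z_star * sigma / sqrt(n)

sigma = 1.92                                                  # illustrative value
print(ci_width(0.90, sigma, 20), ci_width(0.99, sigma, 20))   # higher C -> wider interval
print(ci_width(0.95, sigma, 20), ci_width(0.95, sigma, 80))   # 4x the n -> half the width
```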

Selecting the Sample Size

• Before collecting sample data, we usually have a goal for how large the margin of error should be in order to have a useful estimate of the unknown parameter (particularly when comparing two populations)

• Let m be the desired level of the margin of error and σ be the standard deviation of the population of measurements (σ will typically be unknown and must be estimated based on previous research or a pilot study)

• The sample size giving this margin of error is:

$$m = z^{*}\frac{\sigma}{\sqrt{n}} \qquad\Longrightarrow\qquad n = \left(\frac{z^{*}\sigma}{m}\right)^{2}$$
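A sketch of this sample-size formula (rounding up, since n must be an integer; the function name is illustrative):

```python
from math import ceil
from scipy.stats import norm

def sample_size_for_margin(m: float, sigma: float, C: float = 0.95) -> int:
    """Smallest n with z* * sigma / sqrt(n) <= m, i.e. n = (z* * sigma / m)^2 rounded up."""
    z_star = norm.ppf(1 - (1 - C) / 2)
    return ceil((z_star * sigma / m) ** 2)

# Example: rainfall-type data with sigma of about 1.92, desired margin of error 0.5 at 95% confidence
print(sample_size_for_margin(0.5, 1.92))   # 57
```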

Precautions

• Data should be a simple random sample from the population (or at least can be treated as independent observations)

• More complex sampling designs have adjustments made to the formulas (see texts such as Elementary Survey Sampling by Scheaffer, Mendenhall, and Ott)

• Biased sampling designs give meaningless results

• Small sample sizes from nonnormal distributions will have coverage probabilities (C) typically below the nominal level

• Typically σ is unknown. Replacing it with the sample standard deviation s works as a good approximation in large samples

Significance Tests

• Method of using sample (observed) data to challenge a hypothesis regarding a state of nature (represented as particular parameter value(s))

• Begin by stating a research hypothesis that challenges a statement of “status quo” (or equality of 2 populations)

• State the current state or “status quo” as a statement regarding population parameter(s)

• Obtain sample data and see to what extent it agrees/disagrees with the “status quo”

• Conclude that the “status quo” is not true if observed data are highly unlikely (low probability) if it were true

Pravachol and Olestra

• Pravachol vs placebo wrt heart disease/death

– Pravachol: 5.27% of 3302 patients suffer MI or death due to CHD

– Placebo: 7.53% of 3293 patients suffer MI or death due to CHD

– Probability of a difference this large in favor of Pravachol, if it is no more effective than placebo, is .000088 (will learn formula later)

• Olestra vs triglyceride chips wrt GI symptoms

– Olestra: 15.81% of 563 subjects report GI symptoms

– Triglyceride: 17.58% of 529 subjects report GI symptoms

– Probability of a difference this large in either direction (olestra better or worse) is .4354

• Strong evidence of a Pravachol effect vs placebo

• Weak to no evidence of an Olestra effect vs triglyceride

Elements of a Significance Test

• Null hypothesis (H0): Statement or theory being tested. Will be stated in terms of parameters and contain an equality. Test is set up under the assumption of its truth.

• Alternative Hypothesis (Ha): Statement contradicting H0. Will be stated in terms of parameters and contain an inequality. Will only be accepted if strong evidence refutes H0 based on sample data. May be 1-sided or 2-sided, depending on theory being tested.

• Test Statistic (TS): Quantity measuring discrepancy between sample statistic (estimate) and parameter value under H0

• P-value: Probability (assuming H0 true) that we would observe sample data (test statistic) this extreme or more extreme in favor of the alternative hypothesis (Ha)

Example: Interference Effect

• Does the way items are presented affect task time?

– Subjects shown list of color names in 2 colors: different/black

– Xi is the difference in times to read the lists for subject i: diff − blk

– H0: No interference effect: mean difference is 0 (μ = 0)

– Ha: Interference effect exists: mean difference > 0 (μ > 0)

– Assume the standard deviation of the differences is σ = 8 (unrealistic*)

– Experiment to be based on n = 70 subjects

Parameter value under $H_0$: $\mu_0 = 0$

Approximate distribution of the sample mean under $H_0$: $\bar{X} \sim N\!\left(0,\ \dfrac{8}{\sqrt{70}} = 0.96\right)$

Observed sample mean: $\bar{x} = 2.39$

How likely are we to observe a sample mean difference of 2.39 if μ = 0?

[Figure: Sampling distribution of x̄ under H0, centered at 0; the shaded area above 2.39 is the P-value.]

Computing the P-Value

• 2-sided tests: How likely is it to observe a sample mean as far or farther from the value of the parameter under the null hypothesis? ($H_0: \mu = \mu_0$   $H_a: \mu \ne \mu_0$)

$$\text{Under } H_0:\ \bar{X} \sim N\!\left(\mu_0,\ \frac{\sigma}{\sqrt{n}}\right) \quad\Longrightarrow\quad Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$$

After obtaining the sample data, compute the mean, convert it to a z-score ($z_{obs}$), and find the area above $|z_{obs}|$ and below $-|z_{obs}|$ from the standard normal (z) table

• 1-sided tests: Obtain the area above $z_{obs}$ for upper tail tests ($H_a: \mu > \mu_0$) or below $z_{obs}$ for lower tail tests ($H_a: \mu < \mu_0$)
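A minimal sketch of this computation, using scipy's standard normal in place of a z table (the function name is illustrative):

```python
from scipy.stats import norm

def z_test_p_value(xbar: float, mu0: float, sigma: float, n: int, tails: str = "two") -> float:
    """P-value for a one-sample z test of H0: mu = mu0."""
    z_obs = (xbar - mu0) / (sigma / n ** 0.5)
    if tails == "upper":                 # Ha: mu > mu0
        return norm.sf(z_obs)
    if tails == "lower":                 # Ha: mu < mu0
        return norm.cdf(z_obs)
    return 2 * norm.sf(abs(z_obs))       # Ha: mu != mu0
```

For the interference example that follows, `z_test_p_value(2.39, 0, 8, 70, "upper")` gives about 0.006, matching the slide's .0064 up to rounding of the standard error.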

Interference Effect (1-sided Test)

• Testing whether the population mean time to read the list of colors is higher when the color name is written in a different color

• Data: Xi: difference score for subject i (Different − Black)

• Null hypothesis (H0): No interference effect (μ = 0)

• Alternative hypothesis (Ha): Interference effect (μ > 0)

• "Known": n = 70, σ = 8 (this won't be known in practice but can be replaced by the sample s.d. for large samples)

Sample data: $\bar{x} = 2.39, \quad s = 7.81, \quad n = 70$

Test statistic (based on $\sigma = 8$): $z_{obs} = \dfrac{2.39 - 0}{8/\sqrt{70}} = \dfrac{2.39}{0.96} = 2.49$

Test statistic (based on $s = 7.81$): $z_{obs} = \dfrac{2.39 - 0}{7.81/\sqrt{70}} = \dfrac{2.39}{0.93} = 2.57$

P-value (based on $\sigma = 8$): $P(Z \ge 2.49) = 1 - .9936 = .0064$

P-value (based on $s = 7.81$): $P(Z \ge 2.57) = 1 - .9949 = .0051$
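These numbers can be reproduced (up to the slide's rounding of the standard error) with a short sketch, assuming scipy is available:

```python
from math import sqrt
from scipy.stats import norm

xbar, n = 2.39, 70
for label, sd in (("sigma = 8", 8.0), ("s = 7.81", 7.81)):
    z_obs = (xbar - 0) / (sd / sqrt(n))
    p_one_sided = norm.sf(z_obs)               # Ha: mu > 0
    print(f"{label}: z = {z_obs:.2f}, one-sided P = {p_one_sided:.4f}")
# sigma = 8:  z = 2.50, one-sided P = 0.0062  (slide rounds to 2.49 and .0064)
# s = 7.81:   z = 2.56, one-sided P = 0.0052  (slide rounds to 2.57 and .0051)
```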

Interference Effect (2-sided Test)

• Testing whether the population mean time to read the list of colors is affected (higher or lower) when the color name is written in a different color

• Data: Xi: difference score for subject i (Different − Black)

• Null hypothesis (H0): No interference effect (μ = 0)

• Alternative hypothesis (Ha): Interference effect (+ or −) (μ ≠ 0)

• "Known": n = 70, σ = 8 (this won't be known in practice but can be replaced by the sample s.d. for large samples)

Sample data: $\bar{x} = 2.39, \quad s = 7.81, \quad n = 70$

Test statistic (based on $\sigma = 8$): $z_{obs} = \dfrac{2.39 - 0}{8/\sqrt{70}} = \dfrac{2.39}{0.96} = 2.49$

Test statistic (based on $s = 7.81$): $z_{obs} = \dfrac{2.39 - 0}{7.81/\sqrt{70}} = \dfrac{2.39}{0.93} = 2.57$

P-value (based on $\sigma = 8$): $2P(Z \ge |2.49|) = 2(1 - .9936) = .0128$

P-value (based on $s = 7.81$): $2P(Z \ge |2.57|) = 2(1 - .9949) = .0102$

Equivalence of 2-sided Tests and CI's

• For α = 1 − C, a 2-sided test conducted at significance level α will give equivalent results to a C-level confidence interval:

– If the entire interval > μ0, then P-value < α and z_obs > 0 (conclude μ > μ0)

– If the entire interval < μ0, then P-value < α and z_obs < 0 (conclude μ < μ0)

– If the interval contains μ0, then P-value > α (don't conclude μ ≠ μ0)

• The confidence interval is the set of parameter values for which we would fail to reject the null hypothesis (based on a 2-sided test)

Decision Rules and Critical Values

• Once a significance level (α) has been chosen, a decision rule can be stated based on a critical value (see the sketch below):

• 2-sided tests: H0: μ = μ0   Ha: μ ≠ μ0

– If test statistic z_obs > z_α/2: Reject H0 and conclude μ > μ0

– If test statistic z_obs < −z_α/2: Reject H0 and conclude μ < μ0

– If −z_α/2 ≤ z_obs ≤ z_α/2: Do not reject H0: μ = μ0

• 1-sided tests (Upper Tail): H0: μ = μ0   Ha: μ > μ0

– If test statistic z_obs > z_α: Reject H0 and conclude μ > μ0

– If z_obs < z_α: Do not reject H0: μ = μ0

• 1-sided tests (Lower Tail): H0: μ = μ0   Ha: μ < μ0

– If test statistic z_obs < −z_α: Reject H0 and conclude μ < μ0

– If z_obs > −z_α: Do not reject H0: μ = μ0
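A sketch of these decision rules as a single function (illustrative, assuming scipy for the critical values):

```python
from scipy.stats import norm

def z_decision(z_obs: float, alpha: float = 0.05, tails: str = "two") -> str:
    """Apply the critical-value decision rule for a z test of H0: mu = mu0."""
    if tails == "two":
        z_crit = norm.ppf(1 - alpha / 2)          # z_{alpha/2}
        if z_obs > z_crit:
            return "Reject H0: conclude mu > mu0"
        if z_obs < -z_crit:
            return "Reject H0: conclude mu < mu0"
        return "Do not reject H0"
    z_crit = norm.ppf(1 - alpha)                   # z_alpha
    if tails == "upper":
        return "Reject H0: conclude mu > mu0" if z_obs > z_crit else "Do not reject H0"
    return "Reject H0: conclude mu < mu0" if z_obs < -z_crit else "Do not reject H0"

print(z_decision(2.49, 0.05, "upper"))   # Reject H0 (2.49 > 1.645)
```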

Potential for Abuse of Tests

• Should choose a significance level (α) in advance and report the test conclusion (significant/nonsignificant) as well as the P-value. A significance level of 0.05 is widely used in the academic literature

• Very large sample sizes can detect very small differences for a parameter value. A clinically meaningful effect should be determined, and confidence interval reported when possible

• A nonsignificant test result does not imply no effect (that H0 is true).

• Many studies test many variables simultaneously. This can increase overall type I error rates

Large-Sample Test of H0: μ1 − μ2 = 0 vs HA: μ1 − μ2 > 0

• H0: μ1 − μ2 = 0 (no difference in population means)

• HA: μ1 − μ2 > 0 (Population Mean 1 > Population Mean 2)

$$\text{T.S.: } z_{obs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad\qquad \text{R.R.: } z_{obs} \ge z_{\alpha} \qquad\qquad P\text{-value: } P(Z \ge z_{obs})$$

• Conclusion: Reject H0 if the test statistic falls in the rejection region, or equivalently if the P-value is ≤ α
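A sketch of this test statistic and its one-sided P-value as a reusable helper (the name two_sample_z is illustrative); it is applied to the Botox example on the next slides:

```python
from math import sqrt
from scipy.stats import norm

def two_sample_z(x1: float, s1: float, n1: int,
                 x2: float, s2: float, n2: int) -> tuple[float, float]:
    """Large-sample z test of H0: mu1 - mu2 = 0 vs HA: mu1 - mu2 > 0.
    Returns (z_obs, one-sided P-value)."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    z_obs = (x1 - x2) / se
    return z_obs, norm.sf(z_obs)
```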

Example - Botox for Cervical Dystonia

• Patients: individuals suffering from cervical dystonia

• Response: Tsui score of severity of cervical dystonia (higher scores are more severe) at week 8 of treatment

• Research (alternative) hypothesis: Botox A decreases the mean Tsui score more than placebo

• Groups: Placebo (Group 1) and Botox A (Group 2)

• Experimental (sample) results:

$$\bar{x}_1 = 10.1,\ s_1 = 3.6,\ n_1 = 33 \qquad\qquad \bar{x}_2 = 7.7,\ s_2 = 3.4,\ n_2 = 35$$

Source: Wissel, et al (2001)

Example - Botox for Cervical Dystonia

Test whether Botox A produces lower mean Tsui scores than placebo (α = 0.05)

$$H_0: \mu_1 - \mu_2 = 0 \qquad H_A: \mu_1 - \mu_2 > 0 \qquad \alpha = 0.05$$

$$\text{T.S.: } z_{obs} = \frac{10.1 - 7.7}{\sqrt{\dfrac{(3.6)^2}{33} + \dfrac{(3.4)^2}{35}}} = \frac{2.4}{0.85} = 2.82 \qquad \text{R.R.: } z_{obs} \ge z_{0.05} = 1.645 \qquad P\text{-val: } P(Z \ge 2.82) = .0024$$

Conclusion: Botox A produces lower mean Tsui scores than placebo (since 2.82 > 1.645 and P-value < 0.05)
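Using the two_sample_z sketch from above on the summary statistics reported by Wissel et al. reproduces the slide's numbers:

```python
z_obs, p_value = two_sample_z(10.1, 3.6, 33,    # Group 1: placebo
                              7.7, 3.4, 35)     # Group 2: Botox A
print(f"z = {z_obs:.2f}, one-sided P-value = {p_value:.4f}")
# z = 2.82, one-sided P-value = 0.0024
```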

2-Sided Tests

• Many studies don't assume a direction wrt the difference μ1 − μ2

• H0: μ1 − μ2 = 0   HA: μ1 − μ2 ≠ 0

• Test statistic is the same as before

• Decision Rule:

– Conclude μ1 − μ2 > 0 if z_obs ≥ z_α/2 (α = 0.05: z_α/2 = 1.96)

– Conclude μ1 − μ2 < 0 if z_obs ≤ −z_α/2 (α = 0.05: −z_α/2 = −1.96)

– Do not reject μ1 − μ2 = 0 if −z_α/2 ≤ z_obs ≤ z_α/2

• P-value: 2P(Z ≥ |z_obs|)

Power of a Test

• Power: probability a test rejects H0 (depends on μ1 − μ2)

– H0 true: Power = P(Type I error) = α

– H0 false: Power = 1 − P(Type II error) = 1 − β

• Example:

– H0: μ1 − μ2 = 0   HA: μ1 − μ2 > 0

– σ1 = σ2 = 5

– n1 = n2 = 25

• Decision Rule: Reject H0 (at α = 0.05 significance level) if:

$$z_{obs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \ge 1.645 \quad\Longleftrightarrow\quad \bar{x}_1 - \bar{x}_2 \ge 1.645\sqrt{\frac{25}{25} + \frac{25}{25}} = 1.645\sqrt{2} = 2.326$$

Power of a Test

• Now suppose that in reality μ1 − μ2 = 3.0 (HA is true)

• Power now refers to the probability we (correctly) reject the null hypothesis. Note that the sampling distribution of the difference in sample means is approximately normal, with mean 3.0 and standard deviation (standard error) 1.414.

• Decision Rule (from last slide): Conclude population means differ if the sample mean for group 1 is at least 2.326 higher than the sample mean for group 2

• Power for this case can be computed as:

$$\text{Power} = P\!\left(\bar{X}_1 - \bar{X}_2 \ge 2.326\right), \qquad \bar{X}_1 - \bar{X}_2 \sim N\!\left(3,\ \sqrt{2.0} = 1.414\right)$$

Power of a Test

$$\text{Power} = P\!\left(\bar{X}_1 - \bar{X}_2 \ge 2.326\right) = P\!\left(Z \ge \frac{2.326 - 3}{1.41}\right) = P(Z \ge -0.48) = .6844$$
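A sketch of this power calculation (assuming scipy; the function name is illustrative). It also shows the effect of sample size noted in the bullets that follow:

```python
from math import sqrt
from scipy.stats import norm

def power_two_sample(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of the one-sided two-sample z test (equal n, equal sigma) when mu1 - mu2 = delta."""
    se = sqrt(2 * sigma**2 / n)                 # SE of xbar1 - xbar2
    crit = norm.ppf(1 - alpha) * se             # reject when xbar1 - xbar2 >= crit
    return norm.sf((crit - delta) / se)         # P(xbar1 - xbar2 >= crit | HA)

print(power_two_sample(3.0, 5, 25))   # about 0.683, the slide's .6844 up to rounding
print(power_two_sample(3.0, 5, 50))   # larger n -> higher power (about 0.91)
```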

• All else being equal:

• As sample sizes increase, power increases

• As population variances decrease, power increases

• As the true mean difference increases, power increases

Power of a Test

[Figure: Sampling distributions of x̄1 − x̄2 under H0 and under HA; power is the area under the HA curve beyond the critical value 2.326.]

Power of a Test

Power curves for group sample sizes of 25, 50, 75, 100 and varying true values of μ1 − μ2, with σ1 = σ2 = 5.

• For a given μ1 − μ2, power increases with sample size

• For a given sample size, power increases with μ1 − μ2

Sample Size Calculations for Fixed Power

• Goal: Choose sample sizes to have a favorable chance of detecting a clinically meaningful difference

• Step 1: Define an important difference in means μ1 − μ2:

– Case 1: σ approximated from prior experience or a pilot study; the difference can be stated in units of the data

– Case 2: σ unknown; the difference must be stated in units of standard deviations of the data

• Step 2: Choose the desired power to detect the clinically meaningful difference (1 − β, typically at least .80). For a 2-sided test:

$$n_1 = n_2 = \frac{2\sigma^2\left(z_{\alpha/2} + z_{\beta}\right)^2}{\left(\mu_1 - \mu_2\right)^2}$$
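A sketch evaluating this formula, e.g. for the earlier power example (σ = 5) with a clinically meaningful difference of 3, α = 0.05 (2-sided), and power 0.80:

```python
from math import ceil
from scipy.stats import norm

def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """n1 = n2 = 2*sigma^2*(z_{alpha/2} + z_beta)^2 / delta^2, rounded up (2-sided test)."""
    z_a = norm.ppf(1 - alpha / 2)       # z_{alpha/2}
    z_b = norm.ppf(power)               # z_beta, since power = 1 - beta
    return ceil(2 * sigma**2 * (z_a + z_b) ** 2 / delta**2)

print(n_per_group(3.0, 5.0))   # 44 subjects per group
```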