x simulating the sampling name: example -statistic

10
Math 3070 § 1. Treibergs Simulating the Sampling Distribution of the t-Statistic Name: Example May 20, 2011 How well does a statistic behave when the assumptions about the underlying population fail to hold? We shall explore the t-statistic, for small and large samples. For small samples, the population is assumed to be approximately normal, so that the statistic itself follows a t- distribution and that hypothesis tests can me made using critical t-scores. We simulate random samples taken from exponential and Cauchy distributions and look at the empirical distributions of the t-statistic. This study follows the description in Verzani’s “SimpleR: Using R for Introductory Statistics,” from the Appendix “A sample R session: a simulation example.” R Session: R version 2.10.1 (2009-12-14) Copyright (C) 2009 The R Foundation for Statistical Computing ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type ’license()’ or ’licence()’ for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type ’contributors()’ for more information and ’citation()’ on how to cite R or R packages in publications. Type ’demo()’ for some demos, ’help()’ for on-line help, or ’help.start()’ for an HTML browser interface to help. Type ’q()’ to quit R. [R.app GUI 1.31 (5538) powerpc-apple-darwin8.11.1] [Workspace restored from /Users/andrejstreibergs/.RData] > ##################################################################### > ### SIMULATION TO SEE HOW ROBUST THE t-STATISTIC IS TO CHANGES OF ### > ### DISTRIBUTION ### > ##################################################################### > > # This study follows the description in Verzani’s "SimpleR: Using R for > # Introductory Statistics", from the Appendix "A sample R session: a > # simulation example." > > # We are interested in how much the distribution of the t-statistic > # is influenced by the shape of the population distribution. > # For various background distributions of mean mu, we take 500 > # random samples of size n. For each we compute the t-statistic and > # then analyze the distribution by plotting its histogram, boxplot > # QQ-normal plot. 1

Upload: others

Post on 28-Feb-2022

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: x Simulating the Sampling Name: Example -Statistic

Math 3070 § 1.Treibergs

Simulating the SamplingDistribution of the t-Statistic

Name: ExampleMay 20, 2011

How well does a statistic behave when the assumptions about the underlying populationfail to hold? We shall explore the t-statistic, for small and large samples. For small samples,the population is assumed to be approximately normal, so that the statistic itself follows a t-distribution and that hypothesis tests can me made using critical t-scores. We simulate randomsamples taken from exponential and Cauchy distributions and look at the empirical distributionsof the t-statistic.

This study follows the description in Verzani’s “SimpleR: Using R for Introductory Statistics,”from the Appendix “A sample R session: a simulation example.”

R Session:

R version 2.10.1 (2009-12-14)Copyright (C) 2009 The R Foundation for Statistical ComputingISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.You are welcome to redistribute it under certain conditions.Type ’license()’ or ’licence()’ for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.Type ’contributors()’ for more information and’citation()’ on how to cite R or R packages in publications.

Type ’demo()’ for some demos, ’help()’ for on-line help, or’help.start()’ for an HTML browser interface to help.Type ’q()’ to quit R.

[R.app GUI 1.31 (5538) powerpc-apple-darwin8.11.1]

[Workspace restored from /Users/andrejstreibergs/.RData]

> #####################################################################> ### SIMULATION TO SEE HOW ROBUST THE t-STATISTIC IS TO CHANGES OF ###> ### DISTRIBUTION ###> #####################################################################>> # This study follows the description in Verzani’s "SimpleR: Using R for> # Introductory Statistics", from the Appendix "A sample R session: a> # simulation example.">> # We are interested in how much the distribution of the t-statistic> # is influenced by the shape of the population distribution.> # For various background distributions of mean mu, we take 500> # random samples of size n. For each we compute the t-statistic and> # then analyze the distribution by plotting its histogram, boxplot> # QQ-normal plot.

1

Page 2: x Simulating the Sampling Name: Example -Statistic

> ######################### FUNCTION GIVES t-STATISTIC ################> make.t <- function(x,mu) (mean(x)-mu)/(sqrt(var(x)/length(x)))> mu <- 2; x <- rnorm(100,mu,1)> make.t(x,mu)[1] 1.365704> mu <- 16; x <- rexp(100,1/mu);make.t(x,mu)[1] 0.5150337> mu <- 16; x <- rexp(100,1/mu);make.t(x,mu)[1] -0.1985672> mu <- 16; x <- rexp(100,1/mu);make.t(x,mu)[1] 0.8135002>> ################## LOOP TO TAKE 500 SAMPLES OF SIZE n ################>> # Set up four plots per page. Save old graphics parameters. Restore at end.> opar <- par(mfrow=c(2,2))>> n<- 150;mu<-1> for(i in 1:500)re[i]<- make.t(rexp(n,1/mu),mu)> hist(re,main="Dist t-values for samples of",freq=F)> curve(dt(x,n-1),add=T,col=4)> boxplot(re,main=paste("Exp vars of size n =",n,", mu=",mu))> qqnorm(re);qqline(re,col=4)> curve(dexp(x,1/mu),xlim=c(0,4),main=paste("Exp dist with mu=",mu),col=4)> abline(h=0);abline(v=0)

2

Page 3: x Simulating the Sampling Name: Example -Statistic

Dist t-values for samples of

re

Density

-4 -2 0 2

0.0

0.1

0.2

0.3

0.4

-3-2

-10

12

3

Exp vars of size n = 150 , mu= 1

-3 -2 -1 0 1 2 3

-3-2

-10

12

3

Normal Q-Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

0 1 2 3 4

0.0

0.2

0.4

0.6

0.8

1.0

Exp dist with mu= 1

x

dexp

(x, 1

/mu)

> # With a large sample selected from exponential distribution, the> # t-statistic looke pretty nortmal: it is symmetric, and the QQ plot> # is pretty much normal. We have added the standard t-dist with n-1 df> # superimposed across the histogram.

3

Page 4: x Simulating the Sampling Name: Example -Statistic

> # The exponential distribution is asymmetric and one tail is light,> # the other s heavy. Do these properties show up in the simulation?> # Let’s try samples of size 15. The distribution of the statistic is> # skewed and no longer normal.>> n<- 15;mu<-10> for(i in 1:500)re[i]<- make.t(rexp(n,1/mu),mu)> for(i in 1:500)re[i]<- make.t(rexp(n,1/mu),mu)> hist(re,main="Dist t-values for samples of")> boxplot(re,main=paste("exponentials of size n =",n,", mu=",mu));qqnorm(re);qqline(re,col=2)> curve(dexp(x,1/mu),xlim=c(0,4),main=paste("exp dist with mu=",mu));abline(h=0);abline(v=0)

Dist t-values for samples of

re

Frequency

-8 -6 -4 -2 0 2

050

100

150

-8-6

-4-2

02

exponentials of size n = 15 , mu= 10

-3 -2 -1 0 1 2 3

-8-6

-4-2

02

Normal Q-Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

0 1 2 3 4

0.070

0.080

0.090

0.100

exp dist with mu= 10

x

dexp

(x, 1

/mu)

4

Page 5: x Simulating the Sampling Name: Example -Statistic

> # Let’s try samples of size 8 and change the parameter. The> # distribution of the statistic looks worse.>> n<- 8;mu<-4> for(i in 1:500)re[i]<- make.t(rexp(n,1/mu),mu)> hist(re,main="Dist t-values for samples of",freq=F);curve(dt(x,n-1),add=T,col=4)> legend(-18, .19,legend=paste("t-dist, df=",n-1),col=4,pch=19)> boxplot(re,main=paste("exponentials of size n =",n,", mu=",mu));qqnorm(re);qqline(re,col=4)> curve(dexp(x,1/mu),xlim=c(0,4),main=paste("exp dist with mu=",mu));abline(h=0);abline(v=0)

Dist t-values for samples of

re

Density

-20 -15 -10 -5 0

0.00

0.05

0.10

0.15

0.20

t-dist, df= 7

-15

-10

-50

exponentials of size n = 8 , mu= 4

-3 -2 -1 0 1 2 3

-15

-10

-50

Normal Q-Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

0 1 2 3 4

0.10

0.15

0.20

0.25

exp dist with mu= 4

x

dexp

(x, 1

/mu)

5

Page 6: x Simulating the Sampling Name: Example -Statistic

> # Let’s try samples of size 5 and change the parameter. The> # distribution of the statistic looks even worse.> n <- 5; mu <- 2> for(i in 1:500)re[i]<- make.t(rexp(n,1/mu),mu)> hist(re,main="Dist t-values for samples of",freq=F);curve(dt(x,n-1),add=T,col=6)> legend(-14, .19,legend=paste("t-dist, df=",n-1),col=6,pch=19)> boxplot(re,main=paste("exponentials of size n =",n,", mu=",mu));qqnorm(re);qqline(re,col=4)> curve(dexp(x,1/mu),xlim=c(0,4),main=paste("exp dist with mu=",mu));abline(h=0);abline(v=0)

Dist t-values for samples of

re

Density

-10 -5 0

0.00

0.05

0.10

0.15

0.20

t-dist, df= 4

-10

-50

exponentials of size n = 5 , mu= 2

-3 -2 -1 0 1 2 3

-10

-50

Normal Q-Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

0 1 2 3 4

0.1

0.2

0.3

0.4

0.5

exp dist with mu= 2

x

dexp

(x, 1

/mu)

6

Page 7: x Simulating the Sampling Name: Example -Statistic

> # Following Venables, we experiment with a symmetric, heavy-tailed> # distribution, the Cauchy distribution. Take a look at large sample> # first. Its t-statistic looks fairly normal, if not slightly short.>> n <- 100> mu <- 0;> for(i in 1:500)re[i]<- make.t(rcauchy(n),mu)> hist(re,main="Dist t-values for samples of",freq=F)> curve(dt(x,n-1),add=T,col=2)> legend(.5, .43,legend=paste("t-dist, df=",n-1),col=2,pch=19,bty="n")> boxplot(re,main=paste("Cauchy vars of size n =",n,", mu=",mu))> qqnorm(re)> qqline(re,col=2)> curve(dcauchy(x),xlim=c(-4,4),main=paste("Cauchy dist with mu=",mu),col=2)> abline(h=0)> abline(v=0)

7

Page 8: x Simulating the Sampling Name: Example -Statistic

Dist t-values for samples of

re

Density

-2 -1 0 1 2 3

0.0

0.1

0.2

0.3

0.4

t-dist, df= 99

-2-1

01

2

Cauchy vars of size n = 100 , mu= 0

-3 -2 -1 0 1 2 3

-2-1

01

2

Normal Q-Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

-4 -2 0 2 4

0.05

0.15

0.25cauchy dist with mu= 0

x

dcauchy(x)

8

Page 9: x Simulating the Sampling Name: Example -Statistic

> # Take a look at a small sample (n=7). Its t-statistic still looks fairly> # normal. Thus the t-statistic is fairly robust with respect to this> # kind of population.>> n <- 7; mu <- 0> for(i in 1:500)re[i]<- make.t(rcauchy(n),mu)> hist(re,main="Dist t-values for samples of",freq=F)> curve(dt(x,n-1),add=T,col=3)> legend(1.6, .28,legend=paste("t-dist, df=",n-1),col=3,pch=19,bty="n")> boxplot(re,main=paste("Cauchy vars of size n =",n,", mu=",mu))> qqnorm(re)> qqline(re,col=3)> curve(dcauchy(x),xlim=c(-4,4),main=paste("Cauchy dist with mu=",mu),col=3)> abline(h=0)> abline(v=0)>> # Return to previous settings.> par(opar)

9

Page 10: x Simulating the Sampling Name: Example -Statistic

Dist t-values for samples of

re

Density

-4 -2 0 2 4

0.00

0.10

0.20

0.30

t-dist, df= 6

-20

24

Cauchy vars of size n = 7 , mu= 0

-3 -2 -1 0 1 2 3

-20

24

Normal Q-Q Plot

Theoretical Quantiles

Sam

ple

Qua

ntile

s

-4 -2 0 2 4

0.05

0.15

0.25

Cauchy dist with mu= 0

x

dcauchy(x)

10