Lecture 1 (Chi-Square and t Tests)


Upload: kismet

Post on 12-Nov-2014


Statistical Analysis

Chi-square and t (Student’s) Test

Significance Test

• tells us how confidently we can generalize to a larger (unmeasured) population from a (measured) sample of that population.

• Importance: we cannot generalize from a sample to the population without first submitting the sample result to a statistical test of significance.

[Figure: POPULATION versus SAMPLE]

Chi-square Test

• a statistical test that can be used to determine whether observed frequencies are significantly different from expected frequencies

• any statistical hypothesis test in which the test statistic has a chi-square distribution if the null hypothesis is true.

• can be used to test independence as well as goodness of fit

Chi-square Test

• Non-parametric

• does not require the sample data to be normally distributed

• But assumes that the variable is normally distributed in the population from which the sample is drawn

Chi-square Test

• For any positive integer n, the chi-square distribution with n degrees of freedom is the probability distribution of the random variable

$$Y = Z_1^2 + \cdots + Z_n^2$$

• where the Z_i are independent standard normal random variables (zero expected value and unit variance). This distribution is usually written

$$Y \sim \chi^2_n$$
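The definition above can be tried out by simulation. The following is a minimal sketch, assuming Python's standard `random` module as the source of standard normal draws; the function name, seed, sample size, and choice of n = 5 are illustrative only. Because each Z_i has unit variance, E[Z_i^2] = 1, so the sample mean of Y should land close to n.

```python
import random
import statistics

def sample_chi_square(n, rng):
    """Draw one value of Y = Z_1^2 + ... + Z_n^2 with independent standard normal Z_i."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(12345)  # fixed, arbitrary seed so the run is repeatable
n = 5                       # degrees of freedom, chosen only for illustration
draws = [sample_chi_square(n, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
print(f"smallest draw = {min(draws):.4f} (Y is never negative)")
print(f"sample mean   = {sample_mean:.3f} (expected to be near n = {n})")
```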


Chi-square Test

• The chi-square probability density function is

$$f(y) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\; y^{(n/2)-1}\, e^{-y/2}, \qquad y > 0$$

where:

f(y) = 0 for y ≤ 0

Γ = Gamma function

• Y has a chi-square (χ²) distribution with n degrees of freedom
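The density can be evaluated directly with the standard library's `math.gamma`. This is a small sketch rather than a reference implementation: the helper names `chi2_pdf` and `trapezoid` are made up for this example, and the upper limit of 80 is an assumed truncation point beyond which the density is negligible for n = 4. As a sanity check, the area under the curve should be close to 1.

```python
import math

def chi2_pdf(y, n):
    """Chi-square density with n degrees of freedom; zero for y <= 0."""
    if y <= 0:
        return 0.0
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def trapezoid(f, a, b, steps):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h

n = 4
area = trapezoid(lambda y: chi2_pdf(y, n), 0.0, 80.0, 8000)
print(f"f(2) for n = 4: {chi2_pdf(2.0, n):.6f}")
print(f"area under the density on [0, 80]: {area:.6f}")
```

For n = 4 the formula reduces to f(y) = y e^{-y/2} / 4, which gives a convenient hand check of any single value.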

Chi-square Test

$$\Gamma(\alpha) = \int_0^\infty u^{\alpha-1} e^{-u}\, du \qquad \text{for } \alpha > 0$$

$$\Gamma(p) = (p-1)! = (p-1)(p-2)\cdots(2)(1)$$

From this we can derive the following relationships:

$$\Gamma\!\left(\frac{2p+1}{2}\right) = \frac{1\cdot 3\cdot 5\cdots(2p-1)}{2^{p}}\,\sqrt{\pi}$$

for any positive integer p (p=1,2,3,…)
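Both Gamma-function relationships can be checked numerically for the first few values of p. The sketch below relies on `math.gamma` and `math.factorial` from the Python standard library; `half_integer_gamma` is a hypothetical helper that evaluates the right-hand side of the half-integer formula.

```python
import math

def half_integer_gamma(p):
    """Right-hand side of the relationship: 1*3*5*...*(2p-1) / 2^p * sqrt(pi)."""
    odd_product = 1
    for k in range(1, 2 * p, 2):  # k runs over 1, 3, ..., 2p-1
        odd_product *= k
    return odd_product / 2 ** p * math.sqrt(math.pi)

for p in range(1, 6):
    # Gamma(p) = (p-1)! for positive integers p
    assert math.isclose(math.gamma(p), math.factorial(p - 1))
    # Gamma((2p+1)/2) matches the odd-product formula
    assert math.isclose(math.gamma((2 * p + 1) / 2), half_integer_gamma(p))
    print(p, math.gamma(p), math.gamma((2 * p + 1) / 2))
```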

[Figure: Chi-square Density Function]

Chi-square Test

The mean of Y is:

$$\mu_y = E[Y] = n$$

The variance of Y is:

$$\sigma_y^2 = E\left[(Y - \mu_y)^2\right] = 2n$$
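The mean n and variance 2n can be confirmed by integrating against the density numerically. This is only a sketch: `chi2_pdf` and `integrate` are names introduced for this example, n = 6 is arbitrary, and the upper limit of 120 is an assumed cut-off where the density is already negligible.

```python
import math

def chi2_pdf(y, n):
    """Chi-square density with n degrees of freedom; zero for y <= 0."""
    if y <= 0:
        return 0.0
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def integrate(f, a, b, steps=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

n = 6
upper = 120.0  # assumed truncation point for the infinite upper limit
mean = integrate(lambda y: y * chi2_pdf(y, n), 0.0, upper)
variance = integrate(lambda y: (y - mean) ** 2 * chi2_pdf(y, n), 0.0, upper)
print(f"mean     = {mean:.4f} (theory: n = {n})")
print(f"variance = {variance:.4f} (theory: 2n = {2 * n})")
```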

Chi-square Test

• If Y has a chi-square distribution with n degrees of freedom, then its distribution function is:

$$F(y) = P[Y \le y] = \int_0^y f(u)\, du$$

where:

f(u) = density function
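The distribution function can be approximated by integrating the density from 0 to y. The sketch below uses a plain midpoint rule; `chi2_cdf` and its `steps` parameter are illustrative names, not part of any library. For n = 2 the density is e^{-y/2} / 2, so F(y) = 1 - e^{-y/2} exactly, which gives a closed form to compare against.

```python
import math

def chi2_pdf(y, n):
    """Chi-square density with n degrees of freedom; zero for y <= 0."""
    if y <= 0:
        return 0.0
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi2_cdf(y, n, steps=4000):
    """F(y) = P[Y <= y], approximated by the midpoint rule on [0, y].

    Note: for n = 1 the density is unbounded near 0, so this simple rule
    converges slowly there; it is adequate for n >= 2.
    """
    if y <= 0:
        return 0.0
    h = y / steps
    return h * sum(chi2_pdf((k + 0.5) * h, n) for k in range(steps))

y = 3.0
numeric = chi2_cdf(y, 2)
exact = 1.0 - math.exp(-y / 2)  # closed form valid only for n = 2
print(f"numeric F(3) = {numeric:.8f}, closed form = {exact:.8f}")
```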

Chi-square Test

• F(y) is a probability

• If F(y) = p, where p is a constant, the corresponding value of y associated with p is

$$\chi^2_{p,n}$$

• this is known as the pth fractile or pth percentile of the chi-square distribution with n degrees of freedom
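Finding the pth percentile amounts to solving F(y) = p for y. One simple way, sketched here under the assumption that a numerically integrated F is accurate enough, is bisection, since F is increasing in y. The names `chi2_cdf` and `chi2_percentile`, the tolerance, and the step count are all choices made for this example.

```python
import math

def chi2_pdf(y, n):
    """Chi-square density with n degrees of freedom; zero for y <= 0."""
    if y <= 0:
        return 0.0
    return y ** (n / 2 - 1) * math.exp(-y / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi2_cdf(y, n, steps=2000):
    """F(y) approximated by the midpoint rule on [0, y]."""
    if y <= 0:
        return 0.0
    h = y / steps
    return h * sum(chi2_pdf((k + 0.5) * h, n) for k in range(steps))

def chi2_percentile(p, n, tol=1e-6):
    """Solve F(y) = p for y by bisection; requires 0 < p < 1."""
    lo, hi = 0.0, float(n)
    while chi2_cdf(hi, n) < p:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"95th percentile, n = 2:  {chi2_percentile(0.95, 2):.4f}")
print(f"95th percentile, n = 10: {chi2_percentile(0.95, 10):.4f}")
```

For n = 2 the result can be compared with the closed form y = -2 ln(1 - p); for other n, printed chi-square tables serve the same purpose.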


Chi-square Test

• The chi-square distribution has numerous applications in inferential statistics (chi-square tests and estimating variances).

• It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution.

T (Student’s) Test

• arises in the problem of estimating the mean of a normally distributed population when the sample size is small.

• Student's distribution arises when (as in nearly all practical statistical work) the population standard deviation is unknown and has to be estimated from the data.

T (Student’s) Test

• probability density function of the t-distribution resembles the bell shape of a normally distributed variable with mean 0 and variance 1, except that it is a bit lower and wider.

• as the number of degrees of freedom grows, the t-distribution approaches the normal distribution with mean 0 and variance 1.

• Symmetric about zero

[Figure legend: blue = normal distribution, red = t-distribution]

• If Z is a random variable with a standard normal distribution and Y is an independent random variable with a chi-square distribution with n degrees of freedom, then the random variable

$$T = \frac{Z}{\sqrt{Y/n}}$$

is said to have a t-distribution with n degrees of freedom
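The construction of T from Z and Y can be illustrated by simulation. This is a rough sketch using Python's standard `random` module; the seed, n = 10, and the sample size are arbitrary choices. If the construction is right, the draws should be centred on zero and somewhat more spread out than a standard normal, in line with the "lower and wider" description above.

```python
import math
import random
import statistics

def sample_t(n, rng):
    """Draw one value of T = Z / sqrt(Y / n), with Y a sum of n squared standard normals."""
    z = rng.gauss(0.0, 1.0)                               # numerator: standard normal
    y = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))   # independent chi-square, n d.o.f.
    return z / math.sqrt(y / n)

rng = random.Random(2024)  # fixed, arbitrary seed so the run is repeatable
n = 10
draws = [sample_t(n, rng) for _ in range(20000)]

t_mean = statistics.fmean(draws)
t_var = statistics.variance(draws)
print(f"sample mean     = {t_mean:.3f} (symmetric about zero)")
print(f"sample variance = {t_var:.3f} (larger than 1, the standard normal variance)")
```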

T (Student’s) Test

• The t (Student) probability density function is

$$f(t) = \frac{\Gamma[(n+1)/2]}{\sqrt{\pi n}\;\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} \qquad \text{for } -\infty < t < \infty$$

T (Student’s) Test

• If F(t) = p, where p is a constant, the corresponding value of t associated with p is

$$t_{p,n}$$

• this is known as the pth fractile or pth percentile of the t-distribution with n degrees of freedom
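The t density and its percentiles can be computed along the same lines as for the chi-square case. The sketch below assumes that F(t) denotes the distribution function P[T ≤ t], by analogy with F(y) earlier; `t_pdf`, `t_cdf`, and `t_percentile` are names invented for this example. The symmetry of the density about zero is used so that only the integral from 0 to |t| is needed.

```python
import math

def t_pdf(t, n):
    """Student's t density with n degrees of freedom."""
    coeff = math.gamma((n + 1) / 2) / (math.sqrt(math.pi * n) * math.gamma(n / 2))
    return coeff * (1 + t * t / n) ** (-(n + 1) / 2)

def t_cdf(t, n, steps=2000):
    """F(t) = 1/2 plus or minus the integral of the density over [0, |t|] (midpoint rule)."""
    h = abs(t) / steps
    area = h * sum(t_pdf((k + 0.5) * h, n) for k in range(steps))
    return 0.5 + area if t >= 0 else 0.5 - area

def t_percentile(p, n, tol=1e-6):
    """Solve F(t) = p for t by bisection; requires 0 < p < 1."""
    lo, hi = -1.0, 1.0
    while t_cdf(lo, n) > p:  # widen the bracket to the left if needed
        lo *= 2.0
    while t_cdf(hi, n) < p:  # widen the bracket to the right if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"f(0) for n = 10: {t_pdf(0.0, 10):.6f}")
print(f"97.5th percentile for n = 10: {t_percentile(0.975, 10):.4f}")
```

The percentile printed for n = 10 can be compared against a standard t table.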


T (Student’s) Test

$$\Gamma(\alpha) = \int_0^\infty u^{\alpha-1} e^{-u}\, du \qquad \text{for } \alpha > 0$$

$$\Gamma(p) = (p-1)! = (p-1)(p-2)\cdots(2)(1)$$

From this we can derive the following relationships:

$$\Gamma\!\left(\frac{2p+1}{2}\right) = \frac{1\cdot 3\cdot 5\cdots(2p-1)}{2^{p}}\,\sqrt{\pi}$$

for any positive integer p (p=1,2,3,…)

T (Student’s) Test

The mean of T is:

$$\mu_t = E[T] = 0 \qquad \text{for } n > 1$$

The variance of T is:

$$\sigma_t^2 = E\left[(T - \mu_t)^2\right] = \frac{n}{n-2} \qquad \text{for } n > 2$$
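The mean and variance of T can also be checked by numerical integration. This is a sketch under stated assumptions: n = 10 is arbitrary, the infinite range is truncated to [-100, 100] (where the tails contribute a negligible amount for n = 10), and `t_pdf` and `integrate` are names made up for the example.

```python
import math

def t_pdf(t, n):
    """Student's t density with n degrees of freedom."""
    coeff = math.gamma((n + 1) / 2) / (math.sqrt(math.pi * n) * math.gamma(n / 2))
    return coeff * (1 + t * t / n) ** (-(n + 1) / 2)

def integrate(f, a, b, steps=100000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

n = 10
limit = 100.0  # assumed truncation of the infinite integration range
mean = integrate(lambda t: t * t_pdf(t, n), -limit, limit)
variance = integrate(lambda t: (t - mean) ** 2 * t_pdf(t, n), -limit, limit)
print(f"mean     = {mean:.6f} (theory: 0)")
print(f"variance = {variance:.6f} (theory: n/(n-2) = {n / (n - 2):.6f})")
```

For small n the tails are much heavier, so a fixed truncation like this becomes less accurate; the variance does not exist at all for n ≤ 2.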

Homework:

1. The random variable Y has a chi-square distribution with twenty degrees of freedom. Calculate and plot the chi-square distribution of Y. Also determine the mean and standard deviation of Y. Show the table of values computed from the chi-square density function. Use up to 30 values for y, with 0.5 increments.

2. The random variable V is defined as V = Z / sqrt(Y/10), where Z is a standard normal variable and Y is the sum of squares of ten standard normal variables, all independent of one another and of Z, so that V has a t-distribution with ten degrees of freedom. Calculate and plot the probability density function of V. Evaluate the mean and standard deviation of V. Show the table of values computed from the t (Student’s) density function. Use up to 30 values for v, with 0.5 increments.