lecture 1 (chi-square and t tests)
Statistical Analysis
Chi-square and t (Student’s) Test
Significance Test
• tells us how confidently we can generalize from a (measured) sample to the larger (unmeasured) population from which it was drawn.
• Importance: we cannot generalize from a sample to the population without first subjecting the data to a statistical test of significance.
[Figure: population versus sample]
Chi-square Test
• a statistical test that can be used to determine whether observed frequencies are significantly different from expected frequencies
• any statistical hypothesis test in which the test statistic has a chi-square distribution if the null hypothesis is true
• can be used to test independence as well as goodness of fit
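As a minimal sketch of the goodness-of-fit use, assuming the standard Pearson form of the statistic (not written out on these slides): X² = Σ (O − E)²/E over the k categories, which under the null hypothesis approximately follows a chi-square distribution with k − 1 degrees of freedom when no parameters are estimated from the data. The die-roll counts below are invented for illustration, and the function name pearson_chi_square is hypothetical.

```python
def pearson_chi_square(observed, expected):
    """Pearson's statistic: the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

stat = pearson_chi_square(observed, expected)
df = len(observed) - 1  # k - 1 degrees of freedom for k categories
print(f"chi-square = {stat:.2f} with {df} degrees of freedom")
```

Here the statistic is 4.20 with 5 degrees of freedom. Standard tables give about 11.07 for the 0.95 fractile of the chi-square distribution with 5 degrees of freedom, so these made-up counts would not lead us to reject the fair-die hypothesis at the 5% level.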
Chi-square Test
• Non-parametric: does not require the sample data to be normally distributed
• but the chi-square test for a population variance does assume that the variable is normally distributed in the population from which the sample is drawn
Chi-square Test
• For any positive integer n, the chi-square distribution with n degrees of freedom is the probability distribution of the random variable

$$Y = Z_1^2 + Z_2^2 + \cdots + Z_n^2$$

where the Z_i are independent standard normal random variables (zero expected value and unit variance). This distribution is usually written

$$Y \sim \chi^2_n$$
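The definition can be checked by simulation. This minimal sketch uses only Python's standard random and statistics modules (the seed, the choice n = 5, and the number of draws are arbitrary) and builds each draw of Y directly as a sum of n squared standard normal variables. The sample mean and variance should come out close to n and 2n, the mean and variance of Y given later in this lecture.

```python
import random
import statistics


def sample_chi_square(n, rng):
    """One draw of Y = Z_1^2 + ... + Z_n^2, with the Z_i independent N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


rng = random.Random(12345)  # fixed seed so the run is reproducible
n = 5
draws = [sample_chi_square(n, rng) for _ in range(100_000)]

print("sample mean:    ", statistics.fmean(draws))     # close to n
print("sample variance:", statistics.variance(draws))  # close to 2n
```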
Chi-square Test
• The chi-square probability density function is

$$f(y) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, y^{n/2-1} e^{-y/2}, \qquad y > 0$$

where:
f(y) = 0 for y ≤ 0
Γ = Gamma function
• Y has a chi-square (χ²) distribution with n degrees of freedom
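A minimal Python sketch of this density, assuming only the standard math module; the function name chi2_pdf is illustrative. It is evaluated on the log scale with math.lgamma, because math.gamma(n / 2) overflows once n exceeds about 340.

```python
import math


def chi2_pdf(y, n):
    """Chi-square density with n degrees of freedom; f(y) = 0 for y <= 0."""
    if y <= 0:
        return 0.0
    # log f(y) = (n/2 - 1) ln y - y/2 - (n/2) ln 2 - ln Gamma(n/2)
    log_f = (n / 2 - 1) * math.log(y) - y / 2 - (n / 2) * math.log(2) - math.lgamma(n / 2)
    return math.exp(log_f)


# A short table of f(y) for n = 4 degrees of freedom.
for y in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"y = {y:4.1f}   f(y) = {chi2_pdf(y, 4):.5f}")
```

For n = 4 the formula reduces to f(y) = y e^{−y/2}/4, which makes hand checks easy.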
Chi-square Test
$$\Gamma(\alpha) = \int_0^\infty u^{\alpha-1} e^{-u}\,du, \qquad \alpha > 0$$

From this we can derive the following relationships:

$$\Gamma(p) = (p-1)! = (p-1)(p-2)\cdots(2)(1)$$

$$\Gamma\left(\frac{2p+1}{2}\right) = \frac{1 \cdot 3 \cdot 5 \cdots (2p-1)}{2^p}\,\sqrt{\pi}$$

for any positive integer p (p = 1, 2, 3, …)
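These identities are easy to spot-check numerically. The sketch below treats Python's math.gamma as the reference value, verifies both relationships for p = 1, …, 6, and approximates the defining integral with a simple midpoint rule; the helper names odd_product and gamma_by_integral are hypothetical.

```python
import math


def odd_product(p):
    """1 * 3 * 5 * ... * (2p - 1), the product of the first p odd numbers."""
    result = 1
    for k in range(1, 2 * p, 2):
        result *= k
    return result


def gamma_by_integral(alpha, upper=60.0, steps=200_000):
    """Midpoint-rule estimate of the Gamma integral, truncated at `upper`.
    Intended for moderate alpha >= 1; for alpha < 1 the integrand is unbounded at u = 0."""
    h = upper / steps
    return h * sum(
        ((i + 0.5) * h) ** (alpha - 1) * math.exp(-(i + 0.5) * h) for i in range(steps)
    )


for p in range(1, 7):
    # Gamma(p) = (p - 1)!
    assert math.isclose(math.gamma(p), math.factorial(p - 1))
    # Gamma((2p + 1) / 2) = [1 * 3 * 5 * ... * (2p - 1)] / 2^p * sqrt(pi)
    assert math.isclose(math.gamma((2 * p + 1) / 2),
                        odd_product(p) / 2 ** p * math.sqrt(math.pi))

print(gamma_by_integral(2.5), math.gamma(2.5))  # the two values should agree closely
```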
Chi-square Density Function
Chi-square Test
The mean of Y is:

$$\mu_y = E[Y] = n$$

The variance of Y is:

$$\sigma_y^2 = E\left[(Y-\mu_y)^2\right] = 2n$$
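Both results can be derived directly from the density. Substituting u = y/2 turns each moment into a Gamma integral; the last steps use the standard identity Γ(α + 1) = αΓ(α), which is not stated on the slides.

```latex
E[Y^k] = \int_0^\infty y^k \,\frac{y^{n/2-1} e^{-y/2}}{2^{n/2}\,\Gamma(n/2)}\,dy
       = \frac{2^{k}\,\Gamma(n/2 + k)}{\Gamma(n/2)}
       \qquad (u = y/2)

\mu_y = E[Y] = \frac{2\,\Gamma(n/2 + 1)}{\Gamma(n/2)} = 2 \cdot \frac{n}{2} = n

E[Y^2] = \frac{4\,\Gamma(n/2 + 2)}{\Gamma(n/2)} = 4\left(\frac{n}{2}+1\right)\frac{n}{2} = n(n+2)

\sigma_y^2 = E[Y^2] - \mu_y^2 = n(n+2) - n^2 = 2n
```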
Chi-square Test
• If Y has a chi-square distribution with n degrees of freedom, then its distribution function is:

$$F(y) = P[Y \le y] = \int_0^y f(u)\,du$$

where:
f = chi-square density function (u is the variable of integration)
Chi-square Test
• F(y) is a probability
• if F(y) = p, where p is a constant, the corresponding value of y associated with p is

$$\chi^2_{p,n}$$

• this is known as the pth fractile, or 100p-th percentile, of the chi-square distribution with n degrees of freedom
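Both the distribution function and its fractiles can be computed numerically when a table is not at hand. This is a sketch under simple assumptions (midpoint-rule integration, bisection, and n ≥ 2 so the density is finite at y = 0); the function names are illustrative, and in practice a statistics library or a printed table would normally be used.

```python
import math


def chi2_pdf(y, n):
    """Chi-square density with n degrees of freedom, computed on the log scale."""
    if y <= 0:
        return 0.0
    return math.exp((n / 2 - 1) * math.log(y) - y / 2
                    - (n / 2) * math.log(2) - math.lgamma(n / 2))


def chi2_cdf(y, n, steps=10_000):
    """F(y) = P[Y <= y]: midpoint-rule integral of the density from 0 to y (n >= 2)."""
    if y <= 0:
        return 0.0
    h = y / steps
    return h * sum(chi2_pdf((i + 0.5) * h, n) for i in range(steps))


def chi2_fractile(p, n, tol=1e-6):
    """Solve F(y) = p for y by bisection (0 < p < 1)."""
    lo, hi = 0.0, float(n)
    while chi2_cdf(hi, n) < p:  # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(chi2_fractile(0.95, 2))   # closed form: -2 ln(0.05)
print(chi2_fractile(0.95, 10))  # compare with a printed chi-square table
```

For n = 2 the distribution function has the closed form F(y) = 1 − e^{−y/2}, so χ²_{0.95,2} = −2 ln 0.05 ≈ 5.991 gives an exact check; for n = 10 the result should match the tabulated value of about 18.307.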
Chi-square Test
• The chi-square distribution has numerous applications in inferential statistics (chi-square tests and estimating variances).
• It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution.
T (Student’s) Test
• the t-distribution arises in the problem of estimating the mean of a normally distributed population when the sample size is small.
• Student's distribution arises when (as in nearly all practical statistical work) the population standard deviation is unknown and has to be estimated from the data.
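The one-sample t statistic is a standard way to put this into practice: the unknown population standard deviation is replaced by the sample standard deviation s, giving t = (x̄ − μ0)/(s/√m) with m − 1 degrees of freedom for a sample of size m. The sketch below assumes that standard form (it is not written out on these slides); the data, the hypothesized mean μ0 = 5.0, and the function name one_sample_t are invented for illustration.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return t = (x_bar - mu0) / (s / sqrt(m)) and its m - 1 degrees of freedom.
    The sample standard deviation s stands in for the unknown population sigma."""
    m = len(sample)
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # divides by m - 1
    return (x_bar - mu0) / (s / math.sqrt(m)), m - 1


# Hypothetical measurements; could the population mean be 5.0?
t, df = one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], mu0=5.0)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

Here t works out to √2 ≈ 1.414 with 4 degrees of freedom. Since that is smaller than t_{0.975,4} ≈ 2.776 from standard tables, a two-sided test at the 5% level would not reject μ = 5.0.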
T (Student’s) Test
• the probability density function of the t-distribution resembles the bell shape of a normally distributed variable with mean 0 and variance 1, except that its peak is lower and its tails are heavier.
• as the number of degrees of freedom grows, the t-distribution approaches the normal distribution with mean 0 and variance 1.
• symmetric about zero
[Figure: normal distribution (blue) versus t-distribution (red)]
• If Z is a random variable with a standard normal distribution and Y is an independent random variable with a chi-square distribution with n degrees of freedom, then the random variable

$$T = \frac{Z}{\sqrt{Y/n}}$$

is said to have a t-distribution with n degrees of freedom
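The definition can also be simulated. This sketch uses only the standard library (the seed, n = 5, the number of draws, and the cutoff 1.96 are arbitrary choices); it builds T from independent normal draws exactly as in the definition of T and counts how often |T| exceeds 1.96, which a standard normal variable does about 5% of the time.

```python
import math
import random


def sample_t(n, rng):
    """One draw of T = Z / sqrt(Y / n), with Z ~ N(0, 1) and Y ~ chi-square(n) independent."""
    z = rng.gauss(0.0, 1.0)
    y = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))
    return z / math.sqrt(y / n)


rng = random.Random(2024)  # fixed seed so the run is reproducible
n = 5
draws = [sample_t(n, rng) for _ in range(100_000)]

# Fraction of draws beyond +/-1.96; a standard normal gives about 0.05.
tail = sum(abs(t) > 1.96 for t in draws) / len(draws)
print(f"P(|T| > 1.96) is about {tail:.3f} for n = {n}")
```

With n = 5 the fraction comes out noticeably above 0.05, reflecting the heavier tails of the t-distribution compared with the normal.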
T (Student’s) Test
• The t (Student's) probability density function is

$$f(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\;\Gamma\left(\frac{n}{2}\right)} \left[1 + \frac{t^2}{n}\right]^{-(n+1)/2}, \qquad -\infty < t < \infty$$

• If F(t) = p, where p is a constant, the corresponding value of t associated with p is

$$t_{p,n}$$

• this is known as the pth fractile, or 100p-th percentile, of the t-distribution with n degrees of freedom
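The density and the fractile definition can be sketched numerically in Python: t_pdf evaluates the density formula on the log scale, t_cdf integrates it with the midpoint rule using the symmetry of f about zero, and t_fractile solves F(t) = p by bisection. These helper names and numerical methods are illustrative choices, not part of the lecture; in practice a library routine or a printed t table would normally be used.

```python
import math


def t_pdf(t, n):
    """Student's t density with n degrees of freedom."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c - (n + 1) / 2 * math.log1p(t * t / n))


def t_cdf(t, n, steps=10_000):
    """F(t) = P[T <= t], using symmetry: F(t) = 1/2 +/- integral of f from 0 to |t|."""
    if t == 0:
        return 0.5
    h = abs(t) / steps
    area = h * sum(t_pdf((i + 0.5) * h, n) for i in range(steps))
    return 0.5 + area if t > 0 else 0.5 - area


def t_fractile(p, n, tol=1e-6):
    """Solve F(t) = p for t by bisection (0 < p < 1)."""
    lo, hi = -1.0, 1.0
    while t_cdf(lo, n) > p:  # widen the bracket on each side as needed
        lo *= 2
    while t_cdf(hi, n) < p:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# The peak f(0) rises toward the standard normal peak 1/sqrt(2*pi) (about 0.3989) as n grows.
for n in (1, 5, 30, 1000):
    print(n, round(t_pdf(0.0, n), 4))
print(t_fractile(0.975, 10))  # compare with a printed t table
```

The first loop shows f(0) climbing from 1/π for n = 1 toward the standard normal value, and t_fractile(0.975, 10) should agree with the tabulated value of about 2.228.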
T (Student’s) Test
The mean of T is:

$$\mu_t = E[T] = 0 \qquad \text{for } n > 1$$

The variance of T is:

$$\sigma_t^2 = E\left[(T-\mu_t)^2\right] = \frac{n}{n-2} \qquad \text{for } n > 2$$
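Both results can be sketched from the definition T = Z/√(Y/n), using the independence of Z and Y to factor expectations and the chi-square density to evaluate E[1/Y]. The step Γ(n/2) = (n/2 − 1)Γ(n/2 − 1) is the standard identity Γ(α + 1) = αΓ(α), which is not stated on the slides. The conditions n > 1 and n > 2 appear naturally as the conditions for the integrals E[√(n/Y)] and E[1/Y] to converge at y = 0.

```latex
E[T] = E[Z]\;E\!\left[\sqrt{n/Y}\,\right] = 0,
  \qquad \text{since } E[Z] = 0 \text{ and } E\!\left[\sqrt{n/Y}\,\right] < \infty \text{ for } n > 1

E\!\left[\frac{1}{Y}\right]
  = \int_0^\infty \frac{1}{y}\,\frac{y^{n/2-1} e^{-y/2}}{2^{n/2}\,\Gamma(n/2)}\,dy
  = \frac{2^{n/2-1}\,\Gamma(n/2-1)}{2^{n/2}\,\Gamma(n/2)}
  = \frac{1}{2\,(n/2-1)}
  = \frac{1}{n-2}, \qquad n > 2

\sigma_t^2 = E[T^2] - \mu_t^2 = E[Z^2]\; n\, E\!\left[\frac{1}{Y}\right] - 0 = \frac{n}{n-2}
```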
Homework:
1. The random variable Y has a chi-square distribution with twenty degrees of freedom. Calculate and plot the chi-square distribution of Y. Also determine the mean and standard deviation of Y. Show the table of values computed from the chi-square density function. Use up to 30 values for y, with 0.5 increments.
2. The random variable V has a t-distribution with ten degrees of freedom. Calculate and plot the probability density function of V. Evaluate the mean and standard deviation of V. Show the table of values computed from the t (Student's) density function. Use up to 30 values of v, with 0.5 increments.