Statistical Methods I · catherineghaase.weebly.com/.../statisticaltests1.pdf

Statistical Methods I Dr. Catherine Haase, Montana State University

Statistical Models
• A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of some sample data and similar data from a larger population
• A statistical model is "a formal representation of a theory"
• A statistical model is usually specified by mathematical equations that relate random variables and possibly other non-random variables

Parametric vs. Non-parametric Models
• Parametric models assume that sample data come from a population that follows a probability distribution based on a fixed set of parameters
    • They use information about the mean and the deviation from the mean
• Assumptions:
    • Normally distributed (both samples and errors)
    • Homogeneity of variance
    • Interval or ratio data
    • Independence: no collinearity or autocorrelation

Test for Normality
• Graph the histogram of the data:
    • Skewness: lack of symmetry
    • Kurtosis: heavy tails
• Check the Q-Q plot
• Run the Shapiro-Wilk test
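
The three checks above can be sketched in base R. This is a minimal example on simulated data, not the course data: the vector `x` and its parameters are assumptions made for illustration.

```r
# Hypothetical sample: 100 draws from a normal distribution (not the course data)
set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)

# Histogram: look for asymmetry (skewness) or heavy tails (kurtosis)
hist(x, main = "Histogram of x", xlab = "x")

# Q-Q plot: points should fall close to the reference line if the data are normal
qqnorm(x)
qqline(x)

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed
sw <- shapiro.test(x)
print(sw)
```

A p-value below the chosen α would suggest a departure from normality; with small samples the plots are often more informative than the test alone.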

Test for Homogeneity of Variance
• Also called homoscedasticity
• Test with model residuals
• Residuals should fall evenly on both sides of the horizontal line at 0
    • If the pattern is uneven or skewed, homogeneity is not met
• If a pattern across the data emerges, consider the effect of another variable
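
A residual plot of this kind might be produced as follows. The sketch uses R's built-in `cars` data set as a stand-in for the course data, and the simple linear model is an assumption chosen only to have residuals to plot.

```r
# Built-in 'cars' data set used as a stand-in for the course data
fit <- lm(dist ~ speed, data = cars)

# Residuals against fitted values: the spread should look even around 0
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)  # reference line at 0
```

A funnel shape (spread growing or shrinking along the x-axis) would indicate that the variance is not homogeneous.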

Check for Interval or Ratio Data
• Interval data: we know not only the order, but also the exact differences between the values
    • Can calculate mean, median, mode, standard deviation
    • Don’t have a “true zero”
• Ratio data: we know the order and the exact differences between values, and there is an absolute zero
    • Can be subtracted, added, multiplied, divided
    • Can calculate mean, median, mode, standard deviation

Test for Multicollinearity
• Check correlation coefficients between variables in a correlation matrix
• Plot a correlogram with corr.plot()
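
The correlation-matrix check can be sketched with base R's `cor()`. The built-in `mtcars` data set, the chosen columns, and the 0.8 cutoff are all assumptions for illustration; the correlogram function named on the slide is not used here (the `corrplot` package is one option for that step).

```r
# Built-in 'mtcars' data used as a stand-in; columns chosen for illustration
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Pairwise Pearson correlation matrix
cm <- cor(vars)
print(round(cm, 2))

# List variable pairs whose absolute correlation exceeds an assumed cutoff
cutoff <- 0.8
high <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
)
print(high_pairs)
```

Pairs flagged this way are candidates for dropping one variable or combining them before fitting a model.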

Steps to Handle Violations of Assumptions
1. Select the correct parametric test and check assumptions
2. Change your model: problems with the distribution of residuals are sometimes related to the model not describing the data well
3. Transform data and repeat your original analysis: the dependent variable, or other variables, can be transformed so that model assumptions are met
4. Use nonparametric tests
5. Use robust methods: there are other statistical methods that are robust to violations of parametric assumptions or are nonparametric

How to Transform Data
• Common transformations: square root, cube root, and log transformations
• If the data include zeros, add a constant before a log transformation, because log(0) is undefined
    • Example: log(survey$weight + 1)
• Alternatively, use a generalized linear model with a distribution and link function that characterize your data
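
The transformations listed above might look like this in R. The `survey` data frame and its `weight` values are hypothetical, built here only so the example runs on its own.

```r
# Hypothetical data frame standing in for the 'survey' data on the slide
survey <- data.frame(weight = c(0, 1.2, 3.5, 10, 48))

# Log transformation with a constant added so that zeros stay finite
survey$log_weight <- log(survey$weight + 1)

# Square-root and cube-root transformations
survey$sqrt_weight <- sqrt(survey$weight)
survey$cbrt_weight <- survey$weight^(1 / 3)

print(survey)
```

After transforming, the assumption checks (histogram, Q-Q plot, residual plot) are repeated on the transformed variable.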

Parametric vs. Non-parametric Models
• Non-parametric models assume that the data distribution cannot be defined in terms of a finite set of parameters (such as mean and variance)
    • Distribution-free, or the parameters of the distribution are unspecified
    • Ordinal or nominal data
• Usually, non-normal data can be transformed to use parametric tests, but there are many non-parametric equivalents that do not require data transformation

Examples of Non-parametric Tests

Statistical Tests
• Test statistic: the numerical result of the statistical test
    • e.g., the t-value from a t-test, which is compared with a critical value
• Probability value (p-value): the probability that the observed or more extreme differences would be found if the null hypothesis were true
• A small p-value means that a difference this large would be unlikely to arise from random variation alone

Errors in Hypothesis Testing

P-values and Statistical Significance

Confidence Intervals
• Confidence interval: range of values we are fairly sure our true value lies in
• Factors affecting the width of the confidence interval include the size of the sample, the confidence level, and the variability in the sample
• Calculate the confidence interval:

$$\bar{X} \pm z \frac{s}{\sqrt{n}}$$

where z is the critical value for the chosen confidence level (1.96 for a 95% interval), s is the sample standard deviation, and n is the sample size
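
The interval above can be computed directly in R. This is a small sketch on made-up numbers: the vector `x` is hypothetical, and a 95% confidence level is assumed.

```r
# Hypothetical sample values (not from the course data)
x <- c(10, 12, 14, 16, 18)

n     <- length(x)   # sample size
x_bar <- mean(x)     # sample mean
s     <- sd(x)       # sample standard deviation

# Critical value for a 95% interval (about 1.96)
z <- qnorm(0.975)

# Mean plus or minus z * s / sqrt(n)
margin <- z * s / sqrt(n)
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
print(ci)
```

Increasing `n` or lowering the confidence level narrows the interval; a larger `s` widens it.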

Chi-Squared Statistic (χ²)
• A chi-squared statistic measures the relationship between two categorical variables
• The chi-squared statistic shows how much difference exists between the observed counts and the counts expected if there were no relationship at all in the population
• A low value means there is close agreement between the two sets of data (observed and expected counts)
• If the observed and expected values were equal (“no difference”), chi-squared would be zero, an event that is unlikely to happen in real life

Additional Resources: https://www.khanacademy.org/tag/chi-square-test

Chi-Squared Test
• Compare the calculated chi-squared statistic to the critical value pulled from a chi-squared table with the associated degrees of freedom (df)
    • df = (number of rows - 1) × (number of columns - 1)
• Or compare the p-value: if it is less than the selected α (0.05), the difference is significant
• Extremely sensitive to sample size: when the sample size is too large (~500), almost any small difference will appear statistically significant
• Also sensitive to the distribution within the cells (cells with < 5 cases)
    • Always use categorical variables with a limited number of categories
    • Or combine categories if necessary to produce a smaller table
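
Instead of a printed table, the critical value and p-value can be looked up in R. In this sketch the 2 × 2 table size and the statistic `chi_sq` are assumed values chosen for illustration.

```r
# Assumed table dimensions (2 rows, 2 columns)
n_rows <- 2
n_cols <- 2
dof <- (n_rows - 1) * (n_cols - 1)

alpha <- 0.05

# Critical value: what a chi-squared table would give for this df and alpha
critical <- qchisq(1 - alpha, dof)

# Hypothetical calculated statistic
chi_sq <- 5.2

# Upper-tail p-value for the statistic
p_value <- pchisq(chi_sq, dof, lower.tail = FALSE)

cat("critical value:", critical, "\n")
cat("p-value:", p_value, "\n")
cat("significant:", chi_sq > critical, "\n")
```

Both routes give the same decision: the statistic exceeds the critical value exactly when the p-value is below α.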

Chi-Squared by Hand: Soda Preference
1. Write hypothesis
2. Perform class survey (Gender, Coke/Pepsi preference)
3. Create observed table
4. Calculate expected table
5. Calculate chi-squared value
6. Compare to table given df = (number of rows - 1) × (number of columns - 1)
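
The hand calculation in steps 3 to 6 can be mirrored in R using the usual formula, the sum over cells of (observed - expected)² / expected. The counts below are invented for illustration; they are not the class survey results.

```r
# Step 3: hypothetical observed counts (not the actual class survey)
observed <- matrix(c(12,  8,
                      7, 13),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Female", "Male"),
                                   Soda   = c("Coke", "Pepsi")))

# Step 4: expected count = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Step 5: chi-squared = sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Step 6: compare with the critical value for df = (rows - 1) * (columns - 1)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
critical <- qchisq(0.95, dof)

print(expected)
cat("chi-squared:", chi_sq, " df:", dof, " critical:", critical, "\n")
```

The result matches `chisq.test(observed, correct = FALSE)`, which skips the continuity correction that R applies to 2 × 2 tables by default.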

Chi-Squared in Excel: Soda Preference
1. Write hypothesis
2. Perform class survey (Gender, Coke/Pepsi preference)
3. Create a table in Excel
4. Sum up total for preference by sex
5. Create expected data table
6. Use the “=CHISQ.TEST” function on the observed and expected ranges (it returns the p-value)
7. Compare χ² to table given df = (number of rows - 1) × (number of columns - 1)
8. Compare p-value to α
9. Reject or fail to reject the null hypothesis?

Chi-Squared in R
• Read in the data and perform a chi-squared test in R
• How do the results compare?
• Now manipulate the data: see how the test statistic changes if you multiply the data by 100, or reduce to <5 cases
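
One way to try the exercise, assuming the counts are typed in rather than read from a file: the table `soda` below is hypothetical. `correct = FALSE` turns off the continuity correction that `chisq.test()` applies to 2 × 2 tables by default, so the statistic matches a hand calculation.

```r
# Hypothetical 2 x 2 table of counts (not the class data)
soda <- matrix(c(12,  8,
                  7, 13),
               nrow = 2, byrow = TRUE,
               dimnames = list(Gender = c("Female", "Male"),
                               Soda   = c("Coke", "Pepsi")))

# Chi-squared test on the original counts
small <- chisq.test(soda, correct = FALSE)

# Same proportions, 100 times the sample size
big <- chisq.test(soda * 100, correct = FALSE)

print(c(small = unname(small$statistic), big = unname(big$statistic)))
print(c(small = small$p.value, big = big$p.value))
```

With the proportions unchanged, the statistic grows in step with the sample size, which is why very large samples make small differences look significant.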

Example Problem: by hand
• Determine if the sex of birds present between states is as expected
1. Organize data
    1. Subset to only presence data
    2. Summarize number of birds within each state between sexes using the table function: table(bird$Sex, bird$State)
2. Create table
3. Perform chi-squared test
4. Compare to chi-squared value, given df

Example Problem: in R or Excel
• Determine if the sex of birds present between states is as expected
1. Organize data
    1. Subset to only presence data
    2. Summarize number of birds within each state between sexes using the table function: table(bird$Sex, bird$State)
2. Perform chi-squared test
3. Compare to chi-squared value, given df
4. Compare p-value to α
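
The steps above might be sketched as follows. The `bird` data frame is built inline with invented rows, and the `Presence` column name and its 0/1 coding are assumptions; only `Sex` and `State` appear in the original `table()` call.

```r
# Hypothetical stand-in for the bird data; 'Presence' (1 = present) is an assumed column
present <- data.frame(
  Sex      = rep(c("F", "M"), times = c(30, 30)),
  State    = rep(c("FL", "GA", "FL", "GA"), times = c(18, 12, 10, 20)),
  Presence = 1
)
absent <- data.frame(Sex = c("F", "M"), State = c("GA", "FL"), Presence = 0)
bird <- rbind(present, absent)

# Step 1.1: subset to only presence data
seen <- subset(bird, Presence == 1)

# Step 1.2: counts of birds by sex and state
tab <- table(seen$Sex, seen$State)
print(tab)

# Step 2: chi-squared test of independence
res <- chisq.test(tab)

# Steps 3 and 4: compare with the critical value and with alpha
critical <- qchisq(0.95, res$parameter)
cat("chi-squared:", res$statistic, " critical:", critical,
    " p-value:", res$p.value, "\n")
```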

Example Problem
• Suppose you survey 26 rural and 19 semi-urban farmers about their use of their land. Of the semi-urban farmers, 53.3% use it for livestock, and 44.2% use it for cropping purposes. Of the rural farmers, 13.1% use it for livestock, and 84.4% use it for cropping.
• Try doing the math by hand first!
• Are these farmers using the land disproportionately, that is, more than expected for each type?

t-Test Statistic
• A t-test assesses whether the means of two groups are statistically different from each other
• The t-test statistic or t-score is the ratio of the difference between two groups to the variability within the groups
    • The larger the t-score, the greater the difference
    • The smaller the t-score, the more similar the groups
• There are three main types of t-tests:
    • An independent-samples t-test (Student’s t-test) compares the means for two groups
    • A paired-sample t-test compares means from the same group at different times
    • A one-sample t-test tests the mean of a single group against a known mean

t-Test
• Top: just the difference between the two means or averages
• Bottom: a measure of the variability or dispersion of the scores (the noise)
• Look up in a table of significance to test whether the ratio is large enough to say that the difference between the groups is not likely to have been a chance finding
    • Given α and df (n1 + n2 - 2)
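
The formula figure from this slide is not in the transcript. As a sketch, the pooled-variance form of Student's t is written out below, on the assumption that it is the intended one, since it is the form that goes with df = n1 + n2 - 2. The symbols (group means, variances, and sizes) are named here for illustration.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```

The numerator is the "top" (difference between the means) and the denominator is the "bottom" (variability), with s_1^2 and s_2^2 the sample variances of the two groups.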

t-Test by Hand: Height Differences
1. Write hypothesis
2. Perform class survey (Gender, Height)
3. Calculate mean of groups
4. Calculate variance of groups
5. Calculate t-value
6. Compare t-value to table
7. Reject or fail to reject the null hypothesis?
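
Steps 3 to 6 can be followed line by line in R. The heights below are invented, and the pooled-variance (Student's) form of the t-value is assumed.

```r
# Hypothetical heights in cm (not the actual class survey)
female <- c(160, 165, 158, 170, 162, 167)
male   <- c(175, 180, 172, 178, 169, 183)

# Step 3: group means
m1 <- mean(female)
m2 <- mean(male)

# Step 4: group variances and sample sizes
v1 <- var(female)
v2 <- var(male)
n1 <- length(female)
n2 <- length(male)

# Step 5: pooled variance and t-value
sp2    <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_stat <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Step 6: two-sided critical value at alpha = 0.05
dof      <- n1 + n2 - 2
critical <- qt(0.975, dof)

cat("t:", t_stat, " df:", dof, " critical:", critical, "\n")
cat("reject null:", abs(t_stat) > critical, "\n")
```

The same t-value comes out of `t.test(female, male, var.equal = TRUE)`.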

t-Test in Excel: Height Differences
1. Write hypothesis
2. Perform class survey (Gender, Height)
3. Enter the data per person in Excel
4. Calculate count, mean, sd, variance
5. Use the “=T.TEST” function on the two groups' data ranges (it returns the p-value)
6. Compare t-value to table
7. Reject or fail to reject the null hypothesis?

t-Test in R
• Perform a Student’s t-test in R with the same dataset

• How are these results different?

• Can you check the assumptions in Excel?

• Do you still reject/fail to reject the null hypothesis?

• Now bring in another dataset to practice a paired t-test

• Finally, bring in the third dataset to practice a non-parametric equivalent
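
The three analyses in this exercise might look like the following. All data here are simulated, so the variable names and parameters are assumptions; the Wilcoxon rank-sum test is used as one common non-parametric counterpart to the two-sample t-test.

```r
set.seed(1)

# Hypothetical two-group data (e.g. heights in cm)
group_a <- rnorm(20, mean = 170, sd = 8)
group_b <- rnorm(20, mean = 176, sd = 8)

# Student's t-test: var.equal = TRUE pools the variances
# (R's default, var.equal = FALSE, is Welch's t-test)
student <- t.test(group_a, group_b, var.equal = TRUE)

# Hypothetical paired data: the same 15 subjects measured twice
before <- rnorm(15, mean = 70, sd = 5)
after  <- before + rnorm(15, mean = 1.5, sd = 2)
paired <- t.test(before, after, paired = TRUE)

# Non-parametric alternative for two independent groups
rank_sum <- wilcox.test(group_a, group_b)

print(student)
print(paired)
print(rank_sum)
```

For paired data, `wilcox.test(before, after, paired = TRUE)` gives the signed-rank version.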

Example Problem
• Perform a t-test to compare elevation between Florida and Georgia from the “bird_data.txt” file

1. First, determine what sort of t-test you need to use

2. Check assumptions

3. Write your hypothesis

4. Perform test

5. Compare t-value to table

6. Compare p-value to α

Example Problem
• Perform a t-test to compare elevation in males of bird data to an expected value of 1.0 km

1. First, determine what sort of t-test you need to use

2. Check assumptions

3. Write your hypothesis

4. Perform test

5. Compare t-value to table

6. Compare p-value to α
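
A comparison against a fixed expected value is a one-sample t-test. The sketch below uses simulated elevations in km rather than the bird data, so `elev_km` and its parameters are assumptions.

```r
set.seed(7)

# Hypothetical elevations (km) for male birds; not read from the bird data file
elev_km <- rnorm(25, mean = 1.1, sd = 0.3)

# One-sample t-test against the expected value of 1.0 km
one <- t.test(elev_km, mu = 1.0)
print(one)

# Compare the t-value with the two-sided critical value at alpha = 0.05
critical <- qt(0.975, df = one$parameter)
cat("t:", one$statistic, " critical:", critical,
    " p-value:", one$p.value, "\n")
```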

Example Problem
• You want to compare the number of eggs per clutch between two villages at different altitudes:
    • Village 1: mean = 12.9, variance = 4.4, number of surveys = 81
    • Village 2: mean = 9.1, variance = 3.9, number of surveys = 92
• Determine which sort of t-test you should use to compare the samples

• Is there a difference in the clutch sizes between villages?
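
Because only summary statistics are given, the t-value has to be built from the means, variances, and sample sizes. The sketch below assumes an independent-samples Student's t-test with pooled variance, which is one reasonable reading of the problem since the two variances are similar.

```r
# Summary statistics from the problem statement
m1 <- 12.9; v1 <- 4.4; n1 <- 81   # Village 1
m2 <- 9.1;  v2 <- 3.9; n2 <- 92   # Village 2

# Pooled variance (assumes roughly equal variances in the two villages)
sp2 <- ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)

# t-value and degrees of freedom
t_stat <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
dof    <- n1 + n2 - 2

# Two-sided p-value and critical value at alpha = 0.05
p_value  <- 2 * pt(abs(t_stat), dof, lower.tail = FALSE)
critical <- qt(0.975, dof)

cat("t:", t_stat, " df:", dof, " critical:", critical,
    " p-value:", p_value, "\n")
```

If the variances were clearly unequal, Welch's version (separate variances, adjusted df) would be the safer choice.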

Example problem: your data?