basic statistical tests in r · famous hypotheses test example: the design of experiments (1935),...

Basic statistical tests in R

Anja Bråthen Kristoffersen Biomedical Research Group

Outline

• Example: Tea testing– Hypotese testing, type I and type II error

• More tests and how to find and read help files for different tests in R

•

2013.11.20 2

Famous hypotheses test example:

The Design of Experiments (1935), Sir Ronald A. Fisher

– A tea party in Cambridge in the 1920ties – A lady claims that she can taste whether milk is poured in cup before or after the tea

– All professors agree: impossible – Fisher: this is statistically interesting! He organized a test

2013.11.20 3

The lady tasting tea • Test with 8 trials, 2 cups in each trial

– In each trial: guess which cup had the milk poured in first

• Binomial experiment – Independent trials – Two possible outcomes, she guesses right cup (success), wrong cup (failure)

– Constant probability of success in each trial

• X = number of correct guesses in 8 trials, each with probability of success p – X is Binomially (8,p) distributed

2013.11.20 4

The lady tasting tea cont.• The null (conservative) hypothesis

– The one we initially believe in

• The alternative hypothesis – The new claim we wish to test

• She has no special ability to taste the difference

• She has a special ability to taste the difference

2013.11.20 5

0.5

0.5

How many right to be convincedWe expect maybe 3, 4 or 5 correct guesses if she has no special ability • Assume 7 correct guesses

– Is there enough evidence to claim that she has a special ability? If 8 correct guesses this would have been even more obvious!

• What if only 6 correct guesses? – Then it is not so easy to answer YES or NO

• Need a rule that says something about what it takes to be convinced.

2013.11.20 6

How many right to be convinced?

• Rule: We reject H0 if the observed data have a small probability under H0 (given H0 is true).

• Compute the p‐value. – The probability to obtain the observed value or something more extreme, given that is true

– NB! The p‐value is NOT the probability that is true

Small p‐value: reject the null hypothesis Large p‐value: keep the null hypothesis

2013.11.20 7

The lady tasting tea, cont. • Say: she identified 6 cups correctly • P‐value

– The probability to obtain the observed value or something more extreme, given that H0 is true

6 6 0.5 7 0.5 8 0.5

6, 8, 0.5 7, 8, 0.5 8, 8, 0.5sum dbinom 6:8,8,0.5 0.1445

• Is this enough to be convinced? • Need a limit.

– we must know about the types of errors we can make.

2013.11.20 8

Two types of error

• Type I error most serious – Wrongly reject the null hypothesis – Example:

• person is not guilty • person is guilty • To say a person is guilty when he is not is far more serious than to say he is not guilty when he is. 2013.11.20 9

H0 true H1 true

Accept H0 OK Type II error

Accept H1 Type I error OK

When to reject

• Decide on the hypothesis’ level of significance– Choose a level of significance α – This guarantees P(type I error) ≤ α – Example

• Level of significance at 0.05 gives 5 % probability to reject a true

• Reject H0 if P‐value is less than α

2013.11.20 10

Important parameters in hypothesis testing

• Null hypothesis • Alternative hypothesis • Level of significance

Must be decided upon before we know the results of the experiment

2013.11.20 11

The lady tasting tea, cont.

• Choose 5 % level of significance Conduct the experiment – Say: she identified 6 cups correctly – Is this evidence enough?

• P‐value – The probability to obtain the observed value or something more extreme, given that H0 is true

6 6: 8, 8, 0.50.1445

2013.11.20 12


• We obtained a p‐value of 0.1443 • The rejection rule says

– Reject H0 if p‐value is less than the level of significance α

– Since α = 0.05 we do NOT H0 reject

Small p‐value: reject the null hypothesis Large p‐value: keep the null hypothesis2013.11.20 13


• In the tea party in Cambridge: – The lady got every trial correct!

• Comment: – Why does it taste different?

• Pouring hot tea into cold milk makes the milk curdle, but not so pouring cold milk into hot tea*

2013.11.20 14*http://binomial.csuhayward.edu/applets/appletNullHyp.html Curdle = å skille seg

Area of rejection

• Reject H0 if p‐value ≤ α• Reject H0 if observed value (x) ≥ critical value (xc)• P(type I error) = P(reject H0 | H0 true) = P(X ≥ xc|p = 0.5)

– xc= 7 → sum(dbinom(7:8, 8, 0.5)) = 0.035 ≤ 0.05– xc= 6→ sum(dbinom(6:8, 8, 0.5)) = 0.145 > 0.05

Area of rejection: {x: x ≥ xc} → {x: x ≥ 7}

2013.11.20 15

Type II error

P(type I error) ≤ α P(type II error) = β• Want both errors as small as possible

– especially type I.

• β is not explicitly given, depends on H1

• There is one β for each possible value of p under H1

2013.11.20 16

H0 true H1 true

Accept H0 OK Type II error

Accept H1 Type I error OK

Example: type II error

• P(type II error) = P(not reject H0 | H1 true) – P = 0.7: P(not reject H0 | p = 0.7) = 1 ‐ P(reject H0 | p = 0.7) = 1 – P(X ≥ 7 | p = 0.7) = 1 ‐ (1 – P(X < 7 | p = 0.7) = P(X ≤ 6|p = 0.7) = sum(dbinom(1:6, 8, 0.7)) = 0.745

p = 0.7: H0 will wrongly be accept in 74.5% ofthe tests

2013.11.20 17

Power of the test

• The probability that a false H0 is rejectedP(reject H0 | H1 true) = 1 ‐ P(accept H0 | H1 true) = 1 ‐ β • A test with large power has:

– larger probability to draw the right conclusion– larger probability to reject a false null hypothesis

then a test with low power.• α and β is connected:

– Decreasing α will give an increased β which again will decrease the power of the test

2013.11.20 18

Example: power function

2013.11.20 19

p <- seq(0.6, 1, 0.01)antall <- length(p)beta8 <- rep(NA, antall)for(i in 1:antall){beta8[i] <- sum(dbinom(1:6, 8, p[i]))

}power8 <- 1 - beta8plot(p, power8, type = "l")

http://www.stanford.edu/~stephsus/basic‐stats‐tests.pdf

help(shapiro.test)

2013.11.20 21

2013.11.20 22

shapiro.test()

2013.11.20 23

chisq.test()

2013.11.20 24

Reject the null hypotheses and assume thatthere are differences between the groups

t.test()

2013.11.20 25

t.test()

2013.11.20 26

t.test(, paired = TRUE)

2013.11.20 27

wilcox.test()

• Same mean? But not normally distributed data.

2013.11.20 28

wilcox.test(, paired = TRUE)

2013.11.20 29

help(cor.test)

2013.11.20 30

2013.11.20 31

2013.11.20 32

2013.11.20 33

2013.11.20 34

Multiple hypothesis testing

• Tests are designed such that it has an expected proportion of incorrectly rejected null hypotheses, most often this level is 5%.

• When many tests are done the probability of rejecting a null hypotheses falsely increase, hence we can correct the probabilities according to how many tests that are done.

2013.11.20 35

Example 10000 genes• Q: is gene g, g = 1, …, 10 000, differentially expressed?• Gives 10 000 null hypothesis: ,

– : gene 1 not differentially expressed– …

• Assume: no genes differentially expressed– true for all g

• Significance level α ≤ 0.01– The probability to incorrectly conclude that one gene is differentially expressed is 0.01. e.g. 0.01 * 10000 = 100 expected wrong rejections of

2013.11.20 36

Need to control the risk of false positiveType I error

• Corrected p‐value:– The original p‐values do not tell the full story.– Instead of using the original p‐values for decision making, we should use corrected ones.

2013.11.20 37

Different correction methods

• Bonferroni (1935)– Just multiply all the p‐values by the number of tests– To conservative

• need very small p‐value to reject • giveverylittlepower

• Methods that control the family‐wise error rate (FWER).• Methods that control the false discovery rate (FDR).

2013.11.20 38

Family‐Wise Error Rate (FWER)

• Control type I errors at a level α– Bonferroni– Sidak– Bonferroni‐Holm– Westfall & Young

• Use one of these if you are most afraid of getting stuff on your significant list that should not have been there

2013.11.20 39

False Discovery Rate (FDR)

• Calculate the expected proportion of type I error among the rejected hypotheses

• Technique that applies to a set of p‐values – Benjamini & Hochberg– Different newer variants of Benjamini & Hochberg

• Use one of these if you are you most afraid of missing out on interesting stuff

2013.11.20 40

help(p.adjust)

2013.11.20 41

False discovery rate (fdr)

2013.11.20 42

regression

• Next lecture!

2013.11.20 43

basic statistical tests in r · famous hypotheses test example: the design of experiments (1935),...

Documents