Some statistical basics Marian Scott


Post on 28-Mar-2015


Some statistical basics

Marian Scott


Why bother with statistics?

We need statistical skills to:

- make sense of numerical information,
- summarise data,
- present results (graphically),
- test hypotheses,
- construct models.


Variables: number and type

Univariate: there is one variable of interest measured on the individuals in the sample. We may ask:

What is the distribution of results? This may be further resolved into questions concerning the mean (average value) of the variable and the scatter (variability) in the results.
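As a rough illustration of the univariate questions above, the sketch below summarises one variable by its location and its scatter. The measurements are made-up values chosen only for the example, and the choice of summaries (mean, median, standard deviation, range) is an assumption rather than something prescribed by the slides.

```python
import statistics

# Hypothetical measurements of a single variable (illustrative values only).
sample = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6, 4.4, 5.0]

# Location: where the results are centred.
mean = statistics.mean(sample)
median = statistics.median(sample)

# Scatter: how variable the results are.
stdev = statistics.stdev(sample)          # sample standard deviation (divides by n - 1)
value_range = max(sample) - min(sample)

print(f"mean={mean:.3f} median={median:.3f} stdev={stdev:.3f} range={value_range:.3f}")
```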


Bivariate

Bivariate: two variables of interest are measured on each member of the sample. We may ask:

- How are the two variables related?
- If one variable is time, how does the other variable change?
- How can we model the dependence of one variable on the other?


Multivariate

Multivariate: many variables of interest are measured on the individuals in the sample. We might ask:

- What relationships exist between the variables?
- Is it possible to reduce the number of variables, but still retain 'all' the information?
- Can we identify any grouping of the individuals on the basis of the variables?


Data types

Numerical: a variable may be either continuous or discrete.

For a discrete variable, the values taken are whole numbers (e.g. number of chromosome abnormalities, numbers of eggs).

For a continuous variable, values taken are real numbers (positive or negative and including fractional parts) (e.g. blood lead level, alkalinity, weight, temperature).


Categorical

Categorical: a limited number of categories or classes exist, and each member of the sample belongs to one and only one of the classes, e.g. sex is categorical.

Sex is a nominal categorical variable since the categories are unordered.

Dose of a drug or level of diluent (e.g. recorded as low, medium, high) would be an ordinal categorical variable since the different classes are ordered.


Inference and Statistical Significance

Inference: from the sample to the population.

Is the sample representative? Is the population homogeneous?

Since only a sample has been taken from the population, we cannot be 100% certain that what we see in the sample holds for the population.

Significance testing


Hypothesis Testing II

Null hypothesis: usually ‘no effect’

Alternative hypothesis: ‘effect’

Make a decision based on the evidence (the data)

There is a risk of getting it wrong!

Two types of error:

- Type I: reject null when we shouldn’t
- Type II: don’t reject null when we should


Significance Levels

We cannot reduce probabilities of both Type I and Type II errors to zero.

So we control the probability of a Type I error.

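This probability is referred to as the significance level. The p-value calculated from the data is compared against it.

Generally a significance level of 0.05 is considered a reasonable risk of a Type I error (beyond reasonable doubt), so the null hypothesis is rejected when the p-value is below 0.05.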
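To make the decision rule concrete, here is a minimal sketch of a two-sided z-test in which a p-value computed from data is compared with a chosen threshold of 0.05. The function name `two_sided_z_test`, the assumption of a known population standard deviation, and all numbers are illustrative assumptions, not part of the slides.

```python
import math
from statistics import NormalDist


def two_sided_z_test(sample_mean, null_mean, sigma, n):
    """Return (z, p_value) for a two-sided z-test, assuming sigma is known."""
    z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
    # Probability, under the null, of a statistic at least this extreme.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.05  # accepted risk of a Type I error
# Hypothetical summary values, for illustration only.
z, p_value = two_sided_z_test(sample_mean=10.5, null_mean=10.0, sigma=1.0, n=25)
reject_null = p_value < alpha

print(f"z={z:.2f} p-value={p_value:.4f} reject null: {reject_null}")
```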


Statistical Significance vs. Practical Importance

Statistical significance is concerned with the ability to discriminate between treatments given the background variation.

Practical importance relates to the scientific domain and is concerned with scientific discovery and explanation.


Power

Power is related to Type II error:

power = 1 - probability of making a Type II error

Aim: to keep power as high as possible


Sample size calculations

What is the objective of the experiment?

How much of a difference is it important to be able to detect (the effect size)?

At what significance level do you want to conduct the test? (Decreasing the significance level reduces power.)

What is the power of the experiment (what is the probability that you will detect such a difference when it actually exists)?

How variable is the population? Greater variation needs a larger sample size to achieve the same power.
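The questions above can be tied together in a small calculation. The sketch below assumes a two-sided comparison of two group means with a known, common standard deviation and uses the usual Normal approximation, n per group = 2 * ((z_alpha + z_power) * sigma / effect_size)^2. The function names and the numbers are hypothetical; a real design may call for a different formula.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, sigma, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / effect_size) ** 2
    return math.ceil(n)


def approximate_power(n_per_group, effect_size, sigma, alpha=0.05):
    """Approximate power; ignores the tiny chance of rejecting in the wrong direction."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sigma * math.sqrt(2 / n_per_group)
    return std_normal.cdf(effect_size / standard_error - z_alpha)


# Hypothetical design: detect a difference of 1 unit when sigma is 2 units.
n = sample_size_per_group(effect_size=1.0, sigma=2.0)
print("n per group:", n)
print("power at that n:", round(approximate_power(n, 1.0, 2.0), 3))
# Greater variation needs a larger sample; a stricter significance level lowers power.
print("n per group if sigma is 3:", sample_size_per_group(1.0, 3.0))
print("power at alpha = 0.01:", round(approximate_power(n, 1.0, 2.0, alpha=0.01), 3))
```

Evaluating `approximate_power` over a range of sample sizes or effect sizes gives the points of a power curve.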


Power Curves


Modelling continuous variables: checking Normality

- Normal density function and histogram
- Check for symmetry
- Other possibility: Normal probability plot

[Figure: Histogram of C1 with fitted Normal curve. x-axis: C1; y-axis: Frequency. Mean 0.1211, StDev 1.015, N 100.]


Modelling continuous variables: checking Normality

- Normal probability plot
- Should show a straight line
- p-value of test is also reported (null: data are Normally distributed)

[Figure: Normal probability plot of C1. x-axis: C1; y-axis: Percent. Mean 0.1211, StDev 1.015, N 100, AD 0.361, P-Value 0.439.]
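The "straight line" check can be sketched numerically. The code below builds the points of a Normal probability plot (ordered data against theoretical Normal quantiles) and measures their straightness with a correlation coefficient. This is not the test behind the AD statistic reported with the plot: it is a simpler stand-in, and the plotting position (i - 0.5)/n, the simulated data and the function names are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each ordered observation with its theoretical Normal quantile."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    # (i - 0.5) / n is one common choice of plotting position (an assumption here).
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return quantiles, ordered


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


rng = random.Random(1)  # fixed seed so the illustration is repeatable
normal_like = [rng.gauss(0.0, 1.0) for _ in range(100)]   # simulated Normal data
skewed = [rng.expovariate(1.0) for _ in range(100)]       # simulated skewed data

r_normal = pearson_r(*normal_plot_points(normal_like))
r_skewed = pearson_r(*normal_plot_points(skewed))
print(f"straightness, Normal data: {r_normal:.3f}")
print(f"straightness, skewed data: {r_skewed:.3f}")
```

A value close to 1 corresponds to a plot that looks like a straight line; skewed data bend away from the line and give a lower value.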


Statistical inference

- Hypothesis testing and the p-value
- Statistical significance vs real-world importance
- Confidence intervals


Confidence intervals: an alternative to hypothesis testing

A confidence interval is a range of credible values for the population parameter. The confidence coefficient is the percentage of times that the method will in the long run capture the true population parameter.  

A common form is: sample estimator ± 2 × estimated standard error
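As a minimal sketch of that common form, the code below computes an interval for a population mean as the sample mean plus or minus twice its estimated standard error. The multiplier 2 is close to 1.96, the 97.5% point of the standard Normal distribution, so this corresponds to roughly 95% confidence. The data values and the function name `approx_95_ci` are hypothetical.

```python
import math
import statistics


def approx_95_ci(sample):
    """Return (lower, upper): estimate +/- 2 * estimated standard error of the mean."""
    estimate = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return estimate - 2 * standard_error, estimate + 2 * standard_error


# Hypothetical measurements, for illustration only.
sample = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7, 12.2, 12.5]
lower, upper = approx_95_ci(sample)
print(f"mean={statistics.mean(sample):.2f} interval=({lower:.2f}, {upper:.2f})")
```

With small samples, a multiplier taken from the t distribution would be slightly larger than 2.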


Statistical models

Outcomes or Responses: these are the results of the practical work and are sometimes referred to as ‘dependent variables’.

Causes or Explanations: these are the conditions or environment within which the outcomes or responses have been observed and are sometimes referred to as ‘independent variables’, but more commonly known as covariates.


Statistical models

In experiments many of the covariates have been determined by the experimenter but some may be aspects that the experimenter has no control over but that are relevant to the outcomes or responses.

In observational studies, the covariates are usually not under the control of the experimenter but are recorded as possible explanations of the outcomes or responses.


Specifying a statistical model

Models specify the way in which outcomes and causes link together, e.g.

Metabolite = Temperature

The = sign does not indicate equality in a mathematical sense, and there should be an additional item on the right-hand side, giving the formula:

Metabolite = Temperature + Error


Statistical model interpretation

Metabolite = Temperature + Error

The outcome Metabolite is explained by Temperature and other things that we have not recorded which we call Error.

The task in data analysis is then to find out whether the effect of Temperature is ‘large’ in comparison to that of Error, so that we can say whether or not the Metabolite that we observe is explained by Temperature.


Correlations and linear relationships

- Strength of linear relationship
- Simple indicator lying between –1 and +1
- Check your plots for linearity


Gene correlations

[Figure: four scatter plots, each labelled with its correlation]

- RAG1Spl against mBadSpl: corr 0.9
- mBadSpl against mBcl2Sp: corr 0.5
- RAG1Spl against mBclxLN: corr 0.03
- RAG1Spl against mBadLN: corr -0.56


Interpreting correlations

The correlation coefficient is a measure of the strength of the linear association between two variables. If the relationship is non-linear, the coefficient can still be evaluated and may appear sensible, so beware: plot the data first.
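The warning about non-linear relationships can be seen in a small constructed example. In the sketch below, one response depends exactly linearly on x and another depends exactly on x squared. Both relationships are perfect, but only the linear one is picked up by the correlation coefficient. The data are artificial and the helper `pearson_r` is written out only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]   # exact straight-line dependence on x
y_curved = [v ** 2 for v in x]      # exact, but non-linear, dependence on x

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print(f"linear relationship: r = {r_linear:.2f}")
print(f"curved relationship: r = {r_curved:.2f}")
```

The curved case gives a coefficient of zero even though y is completely determined by x, which is why a plot should come before the coefficient.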


Simple regression model

The basic regression model assumes:

- The average value of the response x is linearly related to the explanatory variable t.
- The spread of the response x about the average is the SAME for all values of t.
- The VARIABILITY of the response x about the average follows a NORMAL distribution for each value of t.


Simple regression model

- Model is typically fitted using least squares
- Goodness of fit of the model is assessed using the residual sum of squares and R²
- Assumptions are checked using residual plots
- Inference about model parameters is carried out using hypothesis tests or confidence intervals
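The fitting steps listed above can be sketched for a straight line as follows: fit by least squares, then compute the residuals, the residual sum of squares and R². The symbols t (explanatory) and x (response) follow the notation of the regression slides. The data values and the function names `fit_line` and `goodness_of_fit` are hypothetical.

```python
def fit_line(t, x):
    """Least-squares intercept and slope for the line x = intercept + slope * t."""
    n = len(t)
    t_bar, x_bar = sum(t) / n, sum(x) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    slope = s_tx / s_tt
    intercept = x_bar - slope * t_bar
    return intercept, slope


def goodness_of_fit(t, x, intercept, slope):
    """Return (residuals, residual sum of squares, R squared)."""
    x_bar = sum(x) / len(x)
    residuals = [xi - (intercept + slope * ti) for ti, xi in zip(t, x)]
    rss = sum(r ** 2 for r in residuals)
    total_ss = sum((xi - x_bar) ** 2 for xi in x)
    return residuals, rss, 1 - rss / total_ss


# Hypothetical data, for illustration only.
t = [1, 2, 3, 4, 5, 6]
x = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

intercept, slope = fit_line(t, x)
residuals, rss, r_squared = goodness_of_fit(t, x, intercept, slope)
print(f"intercept={intercept:.3f} slope={slope:.3f} RSS={rss:.4f} R2={r_squared:.4f}")
```

Plotting `residuals` against `t` is the usual way to check the constant-spread assumption; a funnel shape or a curve in that plot would suggest the model assumptions do not hold.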


Statistical model interpretation

The traditional ‘statistical tests’ such as t-tests, ANOVA, ANCOVA and regression are each special cases of a more general type of model, each making a number of assumptions:

- t-tests work where there are two groups,
- ANOVA works with categorical explanatory variables,
- regression assumes that explanatory variables are continuous.

Our explanatory variables are not like this; they are mixtures of continuous and categorical, so we need a more flexible approach: the G(eneral) L(inear) M(odel).


General linear models

General Linear Models (GLMs) are a comprehensive set of techniques that cover a wide range of analyses. Problems that make use of a number of specific techniques may be specified as GLM problems using a unified specification called a Model Syntax. The form of the Model Syntax varies a little from statistics package to statistics package, but is essentially just a way of unambiguously specifying what the relationship is between variables (categorical or continuous).


Examples

Example: Comparing the effect of burning and clipping on bracken
Traditional test: Two-sample t-test
GLM word equation: SHOOTS = MANAGEMENT

Example: Comparing the effect of two different drugs with a placebo
Traditional test: One-way analysis of variance
GLM word equation: EFFECT = DRUG

Example: Comparing the yield between fertilisers, conducting the experiment in several fields
Traditional test: One-way analysis of variance with blocking
GLM word equation: YIELD = FIELD + FERTILISER

Example: Investigating the relationship between height and weight in people
Traditional test: Regression
GLM word equation: WEIGHT = HEIGHT

Example: Investigating the relationship between oxygen consumption and weight in scampi, taking level of activity into account
Traditional test: Analysis of covariance, with emphasis on regression
GLM word equation: OXYGEN = WEIGHT + ACTIVITY or, under different assumptions (an interaction between the terms), OXYGEN = WEIGHT | ACTIVITY
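To see why a two-group comparison such as SHOOTS = MANAGEMENT fits the same framework as regression, the sketch below codes the two management groups as 0 and 1 and fits a straight line by least squares. The fitted intercept is the mean of the first group and the fitted slope is the difference between the group means, which is the quantity a two-sample t-test examines. The shoot counts and the function name `fit_line` are hypothetical, and only the estimates are shown, not the test itself.

```python
import statistics


def fit_line(t, x):
    """Least-squares intercept and slope for the line x = intercept + slope * t."""
    n = len(t)
    t_bar, x_bar = sum(t) / n, sum(x) / n
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    slope = s_tx / s_tt
    return x_bar - slope * t_bar, slope


# Hypothetical shoot counts under two management treatments.
burned = [12, 15, 11, 14, 13]
clipped = [18, 17, 20, 16, 19]

# Categorical explanatory variable coded as an indicator: 0 = burned, 1 = clipped.
management = [0] * len(burned) + [1] * len(clipped)
shoots = burned + clipped

intercept, slope = fit_line(management, shoots)
print(f"intercept (mean of burned group): {intercept:.2f}")
print(f"slope (clipped mean minus burned mean): {slope:.2f}")
print(f"direct difference in means: {statistics.mean(clipped) - statistics.mean(burned):.2f}")
```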


Summary

- Hypothesis tests and confidence intervals are used to make inferences.
- We build statistical models to explore relationships and explain variation.
- The modelling framework is a general one: general linear models, generalised additive models.
- Assumptions should be checked.