introduction: why statistics? petter mostad 2005.08.29

Introduction: Why statistics?

Petter Mostad

2005.08.29

Statistics is…

• …a way to summarize and describe information: not very interesting in itself

• …an important tool for research in my field, and something I look forward to learning more about

• …an important tool for research in my field, but I only learn what I must learn about this

• …boring

What best describes your attitude towards statistics?

How much do you already know?

• Definition of mean value, median, standard deviation?

• Bayes formula?

• t-tests?

• p-values?

• Computing the probability of getting dealt a flush in a game of poker?

Why a course in statistics?

What is research?

• A distinguishing feature of scientific research is that its conclusions are reproducible by other scientists

• Thus, research must

– contain information about exactly what has been done

– somehow convince the reader that if she repeats what has been done, she will reach the same conclusions

A goal of science: To study causality

• Ultimately, much of science is concerned with establishing statements like ”If A happens, then B will follow”

• In other words, one wants to show that B is reproduced every time A happens.

Example: Studying causality through intervention

• Retrospective studies can show covariation between variables, but not causality.

• Intervention can be used to argue that changing a certain variable causes another variable to change.

• To study the effect of an intervention, a control group is needed.

Example: Reproducibility through randomization

Assume an experiment is done with two groups receiving different ”treatments”:

• Differences in the results could be caused by differences in the treatments, or by differences between the groups from the start.

• Randomising the division into groups makes it unlikely that the groups are systematically different from the start
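A minimal sketch of such a random division, assuming a hypothetical list of 20 participant IDs and a fixed seed (both are illustrative and not from the lecture):

```python
import random

def randomize_groups(participants, seed=None):
    """Randomly split participants into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy, so the input is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs 1..20; the seed only makes the sketch repeatable.
treatment, control = randomize_groups(range(1, 21), seed=2005)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who ends up where, no property of the participants (age, health, motivation) can systematically steer them into one group.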

Example: blind, or double-blind studies

• Differences between the two groups could be caused by people’s knowledge of which group they are in.

• Differences could also be caused by the experimenters’ (doctors’) knowledge of who is in which group.

• Removing the first knowledge gives a blind study, removing the second gives a double-blind study.

Quantitative and qualitative research

• Quantitative: Focus on things that can be measured or counted

• Qualitative: Focus on descriptions and examples.

• Two different scientific traditions. Health economics and administration has elements from both.

• Both have advantages and disadvantages (which?)

Quantitative research

• For quantitative research, we have many good tools to ensure reproducibility of conclusions

• Statistics is a very important such tool

• Statistics used in this way can be called inferential statistics

Example: Reproducibility through statistics

• If you repeat a quantitative investigation (a questionnaire, an observation of a social phenomenon, a measurement) you are unlikely to get exactly the same numbers.

• Statistics can help you to estimate how different results are likely to be.

• This can tell you which conclusions are likely to be reproducible in a potential repetition of the investigation.
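The idea can be illustrated with a small simulation. This is only a sketch: the true share of "yes" answers (30%), the sample size (100) and the number of repetitions (2000) are hypothetical values chosen for illustration.

```python
import random
import statistics

def run_survey(true_share, n, rng):
    """Ask n randomly chosen people a yes/no question; return the share answering yes."""
    return sum(rng.random() < true_share for _ in range(n)) / n

rng = random.Random(1)              # fixed seed, only for repeatability
true_share, n = 0.30, 100           # hypothetical values, not from the lecture
results = [run_survey(true_share, n, rng) for _ in range(2000)]

observed_spread = statistics.stdev(results)
# Standard error of a proportion: what theory predicts for the spread.
predicted_spread = (true_share * (1 - true_share) / n) ** 0.5

print(f"smallest and largest result: {min(results):.2f}, {max(results):.2f}")
print(f"observed spread:  {observed_spread:.3f}")
print(f"predicted spread: {predicted_spread:.3f}")
```

The repetitions give different numbers each time, but the size of the variation is predictable without actually repeating the investigation.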

Descriptive vs. inferential statistics

• Descriptive statistics: To sum up, present, and visualize data.

• Inferential statistics: A tool to handle uncertain information, and to draw (”infer”) reproducible conclusions from it.

Descriptive statistics

• Goal: To reduce the amount of data, while extracting the ”most important information”

• Can be done with single numbers (”summary statistics”), tables, or graphical figures.

• My next lecture will look at descriptive statistics

Can descriptive statistics be ”objective”?

• A person makes choices about:

– What to measure

– How to measure (for example what questions to ask or what scale to use)

– How to present the result

• Thus: A presentation or publication should always contain information about exactly how results have been obtained

Inferential statistics: Hypothesis test example

• You throw a die ten times, and get a 1 in seven of these ten throws. You conclude that the die is not fair. Is the conclusion reproducible?

• You need to compute what observations are to be expected if the die is fair.
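A sketch of that computation, reading the example as a 1 showing on seven of the ten throws and assuming the throws are independent: how likely is it that a fair die shows a 1 at least seven times in ten throws?

```python
from fractions import Fraction
from math import comb

def prob_at_least(k_min, n, p):
    """P(X >= k_min) when X counts successes in n independent trials with success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# A fair die shows a 1 with probability 1/6 on each throw.
p_value = prob_at_least(7, 10, Fraction(1, 6))
print(p_value, float(p_value))
```

The probability is below 0.0003, so a fair die would very rarely produce such a result, which supports the conclusion.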

Example: probability calculations

• The disease X has a 1% prevalence in the population. There is a test for X, and

– If you are sick, the test is positive in 90% of cases.

– If you are not sick, the test is positive in 10% of cases.

• You have a positive test: What is the probability that you are sick?
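One way to work this out is Bayes' formula. The sketch below uses only the three numbers from the example; the variable names are illustrative.

```python
from fractions import Fraction

prevalence = Fraction(1, 100)            # P(sick)
p_pos_given_sick = Fraction(90, 100)     # P(positive | sick)
p_pos_given_healthy = Fraction(10, 100)  # P(positive | not sick)

# Total probability of a positive test, then Bayes' formula.
p_pos = prevalence * p_pos_given_sick + (1 - prevalence) * p_pos_given_healthy
p_sick_given_pos = prevalence * p_pos_given_sick / p_pos

print(p_sick_given_pos, float(p_sick_given_pos))
```

The result is 1/12, or about 8%: most positive tests come from the large group of healthy people.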

Example: decisions based on uncertain information

• An oil company wants to produce the maximum amount from an oil field.

• Available information:

– Measurements (seismics) describing approximately the geometry of the rock layers

– Information from a couple of test drills

– Information from geologists

• Where should they place the wells, and how should they produce?

The concept of a MODEL

• What separates inferential statistics from descriptive statistics is the use of a model.

• A model is a (mathematical) description of the connections between the variables you are interested in.

• It is a simplification of reality, and so never ”correct” or ”wrong”, but it can be more or less useful.

Statistical (or stochastic) models

• In statistical models, the variables are predicted with some variation or uncertainty:

– The model for force moving a mass, F=ma, is exact.

– The model for what a fair die will show contains probabilities.

• We can use the observed data to choose between possible models.

• The word ”stochastic” is often used when we are focusing more on the model than on the data.

Example

• Assume a certain proportion of the population carries a specific gene, and you want to know how large that proportion is

• The model is simply the unknown proportion p

• You select and measure a number of individuals, and use the information to select the right model, i.e., the right p
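A minimal sketch of this selection, assuming hypothetical data (36 carriers among 200 sampled individuals) and using the common normal approximation to give a range of plausible values for p rather than a single number:

```python
from math import sqrt

def estimate_proportion(carriers, n, z=1.96):
    """Point estimate and approximate 95% interval for a proportion (normal approximation)."""
    p_hat = carriers / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Hypothetical data: 36 carriers among 200 sampled individuals.
p_hat, low, high = estimate_proportion(36, 200)
print(f"estimate {p_hat:.3f}, plausible range {low:.3f} to {high:.3f}")
```

The approximation is one of several possible choices and works best when the sample is reasonably large and p is not close to 0 or 1.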

Example

• You want to know the height distribution among 30-year-old Norwegian women.

• You assume, using experience, that a good model is a normal distribution with some expectation and some variance

• You use data from a number of women to select a model (i.e. an expectation and variance), or a range of likely such expectations and variances
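As a sketch of this step, the code below fits a normal model to ten made-up heights (hypothetical values in cm, not real survey data) and then uses the fitted model to answer a question about the population:

```python
import statistics

# Hypothetical heights in cm; not real survey data.
heights = [162.0, 168.5, 171.0, 165.5, 159.0, 174.5, 166.0, 170.0, 163.5, 167.0]

mean_hat = statistics.mean(heights)          # estimate of the expectation
var_hat = statistics.variance(heights)       # sample variance (divides by n - 1)
model = statistics.NormalDist(mean_hat, var_hat ** 0.5)

print(f"expectation {mean_hat:.1f} cm, variance {var_hat:.1f} cm^2")
print(f"model share taller than 175 cm: {1 - model.cdf(175):.2f}")
```

With only ten observations the estimates are uncertain, which is why a range of likely expectations and variances is often more honest than a single pair.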

Sampling

• Often, the model can be a simplifying description of the population we want to study.

• We investigate the model by sampling from the population.

• When each individual is selected independently and randomly from the population, we call it (simple) random sampling

• Simple random sampling makes it easier to compute what we can conclude about the model from the data
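A small sketch of simple random sampling, assuming a hypothetical population of 10,000 people of whom 25% have some trait. Note that `random.sample` draws without replacement; for a sample that is small relative to the population this is very close to independent draws.

```python
import random

rng = random.Random(42)                      # fixed seed, only for repeatability

# Hypothetical population of 10,000 people; True marks those with the trait (25% here).
population = [i < 2500 for i in range(10_000)]

# Every individual has the same chance of being selected.
sample = rng.sample(population, 400)
sample_share = sum(sample) / len(sample)

print(f"population share 0.250, sample share {sample_share:.3f}")
```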

Using the results

• Selecting some models over others means that you increase your understanding of each variable, and the relationships between variables

• Once a model has been selected, it can be used to forecast or predict the future

• Being able to predict the likely results of different decisions can be used to improve decision making

The goals of this course

• To enable you to understand, use, and criticise research results produced by others, and in particular to understand and view critically the statistical arguments

• To enable you to produce your own valid research results, using statistical tools.

Overview of statistics topics we will look at

• Descriptive statistics

• Probability theory

• Sampling and estimation

• Regression

• Non-parametrics

• Analysis of variance

• Decision theory

• Some more advanced topics

• Much information is, and will be, available on the course web page
