Introduction: Why statistics? Petter Mostad, 2005.08.29
Statistics is…
• …a way to summarize and describe information: not very interesting in itself
• …an important tool for research in my field, and something I look forward to learning more about
• …an important tool for research in my field, but I only learn what I must learn about this
• …boring
What best describes your attitude towards statistics?
How much do you already know?
• Definition of mean value, median, standard deviation?
• Bayes formula?
• t-tests?
• p-values?
• Computing the probability of getting dealt a flush in a game of poker?
What is research?
• A distinguishing feature of scientific research is that its conclusions are reproducible by other scientists
• Thus, research must
– contain information about exactly what has been done
– somehow convince the reader that if she repeats what has been done, she will reach the same conclusions
A goal of science: To study causality
• Ultimately, much of science is concerned with establishing statements like ”If A happens, then B will follow”
• In other words, one wants to show that B is reproduced every time A happens.
Example: Studying causality through intervention
• Retrospective studies can show covariation between variables, but not causality.
• Intervention can be used to argue that changing a certain variable causes another variable to change.
• To study the effect of an intervention, a control group is needed
Example: Reproducibility through randomization
Assume an experiment is done with two groups receiving different ”treatments”:
• Differences in the results could be caused by differences in the treatments, or by differences between the groups from the start.
• Randomising the division into groups makes it unlikely that the groups are systematically different from the start
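A minimal sketch of such a randomised division, assuming a hypothetical list of 20 participants and a helper name (`randomize_into_two_groups`) invented for illustration. The list is shuffled and cut in half, so group membership cannot depend on anything about the participants:

```python
import random

def randomize_into_two_groups(participants, seed=None):
    """Shuffle the participants and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)      # a seed makes the division repeatable
    shuffled = list(participants)  # copy, so the input list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participants, for illustration only
participants = [f"person_{i}" for i in range(1, 21)]
treatment, control = randomize_into_two_groups(participants, seed=1)
print("Treatment group:", treatment)
print("Control group:  ", control)
```

Recording the seed alongside the results lets another researcher reproduce exactly the same division.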
Example: blind, or double-blind studies
• Differences between the two groups could be caused by people’s knowledge of which group they are in.
• Differences could also be caused by the experimenters’ (doctors’) knowledge of who is in which group.
• Removing the first knowledge gives a blind study, removing the second gives a double-blind study.
Quantitative and qualitative research
• Quantitative: Focus on things that can be measured or counted
• Qualitative: Focus on descriptions and examples.
• Two different scientific traditions. Health economics and administration has elements from both.
• Both have advantages and disadvantages (which?)
Quantitative research
• For quantitative research, we have many good tools to ensure reproducibility of conclusions
• Statistics is a very important such tool
• Statistics used in this way can be called inferential statistics
Example: Reproducibility through statistics
• If you repeat a quantitative investigation (a questionnaire, an observation of a social phenomenon, a measurement) you are unlikely to get exactly the same numbers.
• Statistics can help you to estimate how much the results are likely to differ.
• This can tell you which conclusions are likely to be reproducible in a potential repetition of the investigation.
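The point above can be illustrated with a small simulation. This is only a sketch under invented assumptions: a questionnaire with a yes/no question, a true "yes" share of 40% in the population, and 200 respondents per repetition. The function name `repeat_survey` is hypothetical.

```python
import random
import statistics

def repeat_survey(true_share, sample_size, repetitions, seed=0):
    """Simulate repeating the same yes/no questionnaire many times.

    Returns the observed share of 'yes' answers in each repetition.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        yes_answers = sum(rng.random() < true_share for _ in range(sample_size))
        results.append(yes_answers / sample_size)
    return results

# Assumed values, for illustration only
shares = repeat_survey(true_share=0.40, sample_size=200, repetitions=1000)

# How much do the repetitions differ from each other?
spread = statistics.stdev(shares)

# What probability theory predicts for that spread: sqrt(p * (1 - p) / n)
theoretical = (0.40 * 0.60 / 200) ** 0.5

print(f"smallest and largest observed share: {min(shares):.3f}, {max(shares):.3f}")
print(f"simulated spread:   {spread:.4f}")
print(f"theoretical spread: {theoretical:.4f}")
```

The repetitions never give exactly the same number, but the size of the differences is predictable, and that is what allows a judgement about which conclusions would survive a repetition.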
Descriptive vs. inferential statistics
• Descriptive statistics: To sum up, present, and visualize data.
• Inferential statistics: A tool to handle, and to draw (”infer”) reproducible conclusions on the basis of, uncertain information.
Descriptive statistics
• Goal: To reduce the amount of data, while extracting the ”most important information”
• Can be done with single numbers (”summary statistics”), tables, or graphical figures.
• My next lecture will look at descriptive statistics
Can descriptive statistics be ”objective”?
• A person makes choices about:
– What to measure
– How to measure (for example what questions to ask or what scale to use)
– How to present the result
• Thus: A presentation or publication should always contain information about exactly how results have been obtained
Inferential statistics: Hypothesis test example
• You throw a die ten times and get a 1 on seven of the ten throws. You conclude that the die is not fair. Is the conclusion reproducible?
• You need to compute what observations are to be expected if the die is fair.
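One way to sketch that computation, assuming the number of 1s in ten throws of a fair die follows a binomial distribution with probability 1/6 per throw, and assuming the face 1 was singled out before the throws were made. The helper name `prob_at_least` is made up for this example.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials, each with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability that a fair die shows a 1 on seven or more of ten throws
p_value = prob_at_least(7, 10, 1 / 6)
print(f"P(at least seven 1s in ten throws | fair die) = {p_value:.6f}")
```

Under these assumptions the probability is roughly 0.0003, so a fair die would very rarely produce such a result. That is the kind of argument that makes the conclusion "not a fair die" likely to hold up if the experiment is repeated.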
Example: probability calculations
• The disease X has a 1% prevalence in the population. There is a test for X, and
– If you are sick, the test is positive in 90% of cases.
– If you are not sick, the test is positive in 10% of cases.
• You have a positive test: What is the probability that you are sick?
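The question can be answered with Bayes' formula. A short sketch using the numbers from the example; the function and parameter names are chosen here for illustration:

```python
def posterior_sick(prevalence, sensitivity, false_positive_rate):
    """P(sick | positive test), computed with Bayes' formula."""
    true_positives = prevalence * sensitivity                 # sick and tested positive
    false_positives = (1 - prevalence) * false_positive_rate  # not sick but tested positive
    return true_positives / (true_positives + false_positives)

p_sick = posterior_sick(prevalence=0.01, sensitivity=0.90, false_positive_rate=0.10)
print(f"P(sick | positive test) = {p_sick:.4f}")  # 0.009 / 0.108, about 0.0833
```

Even with a positive test, the probability of being sick is only about 8%, because the many healthy people produce far more false positives than the few sick people produce true positives.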
Example: decisions based on uncertain information
• An oil company wants to produce the maximum amount from an oil field.
• Available information:
– Measurements (seismics) describing approximately the geometry of the rock layers
– Information from a couple of test drills
– Information from geologists
• Where should they place the wells, and how should they produce?
The concept of a MODEL
• What separates inferential statistics from descriptive statistics is the use of a model.
• A model is a (mathematical) description of the connections between the variables you are interested in.
• It is a simplification of reality, and so never ”correct” or ”wrong”, but it can be more or less useful.
Statistical (or stochastic) models
• In statistical models, the variables are predicted with some variation or uncertainty:
– The model for force moving a mass: F=ma, is exact.
– The model for what a fair die will show contains probabilities
• We can use the observed data to choose between possible models.
• The word ”stochastic” is often used when we are focusing more on the model than on the data.
Example
• Assume a certain proportion of the population carries a specific gene, and you want to know how large that proportion is
• The model is simply the unknown proportion p
• You select and measure a number of individuals, and use the information to select the right model, i.e., the right p
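A sketch of how the data might be used to choose p, with invented numbers (30 carriers among 200 measured individuals). The interval uses the common normal approximation with z = 1.96, which is an assumption of this example and not something prescribed by the lecture:

```python
from math import sqrt

def estimate_proportion(carriers, sample_size, z=1.96):
    """Estimate p and give an approximate 95% range of plausible values for it."""
    p_hat = carriers / sample_size
    standard_error = sqrt(p_hat * (1 - p_hat) / sample_size)
    low = max(0.0, p_hat - z * standard_error)
    high = min(1.0, p_hat + z * standard_error)
    return p_hat, low, high

# Made-up data: 30 of 200 measured individuals carry the gene
p_hat, low, high = estimate_proportion(carriers=30, sample_size=200)
print(f"estimated p = {p_hat:.3f}, plausible range about {low:.3f} to {high:.3f}")
```

The data do not single out one "right" p with certainty; they narrow the choice down to a range of models that fit the observations well.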
Example
• You want to know the height distribution among 30-year-old Norwegian women.
• You assume, using experience, that a good model is a normal distribution with some expectation and some variance
• You use data from a number of women to select a model (i.e. an expectation and variance), or a range of likely such expectations and variances
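As a rough sketch of this step, the code below estimates the expectation and variance of a normal model from ten heights. The heights are made up for illustration, and the range for the expectation uses a simple "plus or minus two standard errors" rule, which is only an approximation for a sample this small.

```python
import statistics

# Made-up heights in cm, for illustration only
heights_cm = [162.0, 168.5, 171.0, 159.5, 166.0, 173.5, 164.0, 169.0, 167.5, 161.0]

n = len(heights_cm)
mean = statistics.mean(heights_cm)          # estimate of the expectation
variance = statistics.variance(heights_cm)  # sample variance (divides by n - 1)

# Rough range of likely expectations: mean plus or minus two standard errors
standard_error = (variance / n) ** 0.5
mean_range = (mean - 2 * standard_error, mean + 2 * standard_error)

print(f"estimated expectation: {mean:.1f} cm")
print(f"estimated variance:    {variance:.1f} cm^2")
print(f"likely expectations:   {mean_range[0]:.1f} to {mean_range[1]:.1f} cm")
```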
Sampling
• Often, the model can be a simplifying description of the population we want to study.
• We investigate the model by sampling from the population.
• When each individual is selected independently and randomly from the population, we call it (simple) random sampling
• Simple random sampling makes it easier to compute what we can conclude about the model from the data
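A small sketch of random sampling, assuming a hypothetical population of 1000 individuals identified by number. `random.sample` draws without replacement (nobody is picked twice), while `random.choices` draws each individual independently, so repeats are possible. For a sample that is small relative to the population the two behave almost identically.

```python
import random

# Hypothetical population: individuals numbered 1 to 1000
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the selection can be repeated

# Every individual has the same chance of being selected
sample_without_replacement = rng.sample(population, k=50)
sample_with_replacement = rng.choices(population, k=50)

print(sorted(sample_without_replacement))
print(sorted(sample_with_replacement))
```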
Using the results
• Selecting some models over others means that you increase your understanding of each variable, and the relationships between variables
• Once a model has been selected, it can be used to forecast or predict the future
• Being able to predict the likely results of different decisions can be used to improve decision making
The goals of this course
• To enable you to understand, use, and criticise research results produced by others, and in particular to understand and view critically the statistical arguments
• To enable you to produce your own valid research results, using statistical tools.