Research projects differ
• Some seek to explore issues
  – field studies vs laboratory studies
• Some seek to confirm or disprove hypotheses
  – field studies vs laboratory studies
• Some seek to critically evaluate an area
• Some seek to solve a problem and implement a solution
• Some design and implement new systems
• Some seek to develop new theory or algorithms
(Social) Sciences
Engineering & mathematics
Statistics
There are lies, damn lies, and overused quotations
A statistician is someone who wanted to be an accountant but did not have the charisma.
Anon
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
H.G. Wells
It is the function of the statistical method to emphasise that precise conclusions cannot be drawn from inadequate data.
E.S. Pearson and H.O. Hartley
A witty statesman once said that you might prove anything by figures (but) a judicious man looks at statistics not to get knowledge but to save himself from having ignorance foisted upon him.
Thomas Carlyle
He uses statistics as a drunk uses a street lamp, for support rather than illumination.
Andrew Lang
Statistics ...
• ... is the analytic heart of scientific research and inference
• It is not a numerical add-on; nor should it be seen as a hurdle to publication
So off to the Welsh Valleys!
Cynefin: a Welsh habitat
• Known: the realm of Scientific Knowledge. Cause and effect understood and predictable.
• Knowable: the realm of Scientific Inquiry. Cause and effect can be determined with sufficient data.
• Complex: the realm of Social Systems. Cause and effect may be determined after the event.
• Chaotic: cause and effect not discernible.
D. Snowden (2002). "Complex acts of knowing - paradox and descriptive self-awareness." Journal of Knowledge Management 6 pp. 100-11.
Cynefin:
• physical environment
• cultural environment
• social environment
• historical environment
• …..
Learning and knowledge
Knowledge Management and Nonaka’s SECI
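• Socialisation: tacit knowledge to tacit knowledge
• Externalisation: tacit knowledge to explicit knowledge
• Combination: explicit knowledge to explicit knowledge
• Internalisation: explicit knowledge to tacit knowledge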
The practice of Science and research
Sense-making and articulation is as important to Science and research
Cynefin and Knowledge Management
• Tacit Knowledge: judgement/expertise
• Explicit Knowledge: e.g. scientific models
Applications of Cynefin
• Emergency Management
• Categorisation of DSS and OR/DA techniques
• Human Reliability Analysis
• High Reliability Organisations
• Knowledge Management
• Sensemaking
• Research Methodology
S. French (2012) ‘Cynefin, Statistics and Decision Analysis’. Journal of the Operational Research Society. In press http://www.palgrave-journals.com/jors/journal/vaop/ncurrent/full/jors201223a.html
Cynefin: learning, repeatability
Repeatability and increasing familiarity
Cynefin and data collection
• Experiments and trials
• Case studies, interviews, and surveys
Cynefin and statistics
• Repeatable events
• Unique events
• Events?
• Estimation and confirmatory analysis
• Exploratory analyses
Actually, you need exploratory statistics even when you believe you are in the known or knowable space, to check that you really are there.
Exploratory analyses
• Look at the data
  – In any, repeat any, analysis, look at the data
  – It is too easy for data to pass from web questionnaire to Excel to SPSS to analysis without your looking at the data.
• Simple plots and tables
  – Tables: do not think them ‘simple’ to construct!
  – Histograms, boxplots, scatterplots, …
• Useful in presenting results too
• Generally easy to produce with Excel or SPSS
  – If you know what you are trying to achieve
  – Data mining and data visualisation
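As a rough illustration of "look at the data" outside Excel or SPSS, the sketch below prints a five-number summary and a crude text histogram. The function names and the `scores` values are hypothetical, made up for this example; the point is only that one glance shows a value that deserves a second look before any analysis.

```python
import statistics

def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]

def text_histogram(values, bin_width):
    """Return one line per non-empty bin, with one '*' per data point."""
    counts = {}
    for v in values:
        bin_start = (v // bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return [f"{start:>6} | {'*' * counts[start]}" for start in sorted(counts)]

# Hypothetical questionnaire scores; the 97 might be a keying error.
scores = [4, 5, 5, 6, 6, 6, 7, 7, 8, 97]
print(five_number_summary(scores))
print("\n".join(text_histogram(scores, bin_width=10)))
```

The histogram puts nine points in the first bin and one far away from the rest, which is exactly the kind of feature that is lost if the data go straight from questionnaire to analysis.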
Estimation and Confirmatory Analyses
• Based on statistical models
  – “If your experiment needs statistics, you should have done a better experiment.” WRONG!
• Estimation
  – Point estimates
  – Confidence intervals
• Hypothesis tests
Data collection protocol
• You need one!!!
• Formal theory of experimental design
  – How many and which data to collect
  – Mix of theoretical requirements for accuracy and pragmatism
• But wider than that, you need to plan in advance many things about how you will gather your data, be it qualitative or quantitative.
• It is vital that you record your planning and your reasoning.
  – You will not remember when you come to write your thesis/paper
• You also need a data storage protocol
  – Keep original data, not summaries, if you can
    • Sufficient statistics are for the theoreticians
  – Keep a geographically separate copy for security purposes
Check assumptions
• Independence. It is usual to assume that the data points are sampled independently so that
  – x1, x2, …, xn are independent and identically distributed (iid)
• Think about distributional assumptions
  – Parameters known?
  – Normal??? Maybe as an approximation, but check!
  – Do not make assumptions on the grounds that the textbook gives a statistical test for those assumptions
• Ideally repeat the analysis under different assumptions
  – Sensitivity analysis
• Outliers
  – Some recommend removing data that is ‘clearly an outlier’
  – My view: a bad scientist blames his data, so discard data at your peril
  – If you must remove outliers, document the reasons and make sure they are good.
• If you cannot see the result in the data (simple plots) and/or it does not make qualitative sense, question it!
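One simple form of sensitivity analysis is to run the same summary with and without a suspect point and report both, rather than silently discarding it. The sketch below assumes a single hypothetical suspect value; `summarise`, `sensitivity_to_point` and the measurements are illustrative names and numbers, not part of any particular package.

```python
import statistics

def summarise(values):
    """Return the mean, median and sample standard deviation of values."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }

def sensitivity_to_point(values, suspect):
    """Summarise the data with and without one suspect value."""
    without = list(values)      # copy, so the original data are never altered
    without.remove(suspect)     # drops a single occurrence only
    return {"all data": summarise(values), "without suspect": summarise(without)}

# Hypothetical measurements; 30.1 is the point someone might call an outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 30.1]
for label, summary in sensitivity_to_point(data, suspect=30.1).items():
    print(label, summary)
```

Here the median barely moves while the mean and standard deviation change a great deal, so any conclusion resting on the mean needs the documented reasoning the slide asks for.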
Value focused thinking
• “Values are what we care about. As such, values should be the driving force for our decision making. They should be the basis for the time and effort we spend thinking about decisions. But this is not the way it is. It is not even close to the way it is.” Keeney (1992)
• Define objectives, research questions, hypotheses at outset
  – (probably modify pragmatically as research progresses!)
  – More creative in research design
• Focuses attention on what matters
• Helps identify the ‘right’ research/problem solving methodology
Note: whether we talk of objectives, research questions, hypotheses depends on type of research project
Thank you
Back up Slides
Tables and Charts
• Clarify in titles and notes
  – What the data are and where they come from
  – Units
• 2 or 3 ideas can be shown/explored in a table or chart … no more
  – Do not make them over ‘busy’
    • x’s not dustbins for data on waste!
  – Do not introduce spurious features
    • E.g. number the data and accidentally introduce a ranking
• Watch for cognitive aspects
  – Appropriate scales
  – Appropriate number of significant figures
  – In tables: put important variation down the columns
  – Use of colour
    • red/green may be read as bad (‘stop’) and good (‘go’), or not be distinguished at all by colour-blind readers
Regression and Factor Analysis as exploratory analyses
• Often (usually!!!) data is multi-dimensional
• It is difficult to see the key trends and variations by eye
• Regression and factor analyses reduce dimensions to the ‘significant’ ones
Regression Analysis
[Figure: scatter plot of the data points]
Describe the cloud of data points
• 16 (x, y) points = 32 numbers
Regression Analysis
[Figure: scatter plot of the data points with a fitted regression line]
Describe the cloud of data points
• Regression line: y = mx + c
• Plus standard deviation
• 3 numbers …
• Trend, base case, and spread
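The reduction from 32 numbers to 3 can be sketched with an ordinary least-squares fit. This is a minimal illustration, not a substitute for a statistics package: `fit_line` is a hypothetical helper, and the 16 points are made-up values scattered about y = 2x + 1.

```python
import math

def fit_line(points):
    """Least-squares fit of y = m*x + c; returns (m, c, residual_sd)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    c = mean_y - m * mean_x
    residuals = [y - (m * x + c) for x, y in points]
    # n - 2 degrees of freedom because two parameters (m and c) were estimated.
    residual_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return m, c, residual_sd

# Hypothetical cloud of 16 (x, y) points scattered about y = 2x + 1.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4,
         -0.3, 0.2, -0.2, 0.1, 0.3, -0.1, 0.0, -0.3]
points = [(float(x), 2.0 * x + 1.0 + e) for x, e in enumerate(noise)]
m, c, sd = fit_line(points)
print(f"trend m = {m:.3f}, base case c = {c:.3f}, spread sd = {sd:.3f}")
```

The three printed numbers are the slide's trend (m), base case (c), and spread (standard deviation of the residuals).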
Factor Analysis
[Figure: scatter plot of the data points]
Describe the cloud of data points
• 16 (x, y) points = 32 numbers
Factor Analysis
[Figure: scatter plot with each data point projected onto the regression line]
Describe the cloud of data points
• Project each point onto regression line
• 16 numbers
• Keeps each item separate in summary
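Following the slide's simplified picture, the projection step can be sketched as below: each (x, y) point is replaced by one number, its position along a given line. The helper `project_onto_line`, the line parameters, and the points are all hypothetical; a real factor analysis chooses its own axes from the data rather than being handed a line.

```python
import math

def project_onto_line(points, m, c):
    """Return each point's signed position along the line y = m*x + c.

    Each point is projected orthogonally onto the line, and its position
    is measured from the intercept (0, c) in the direction of increasing x.
    """
    norm = math.sqrt(1.0 + m * m)
    return [(x + m * (y - c)) / norm for x, y in points]

# Hypothetical line and points; in practice m and c would come from the data.
m, c = 1.0, 0.0
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0), (3.0, 5.0)]
scores = project_onto_line(points, m, c)
print(scores)
```

There is still one number per point, so each item stays separate in the summary, but information is lost: the second and third points land at the same position on the line although they are different points in the plane.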
Regression and Factor Analysis
• Here we have reduced 16 points in 2 dimensions onto 1 dimension (a line)
• Generally reduce a lot of points in high dimension onto many fewer dimensions
• More general methods known as multivariate analysis
  – Regression analysis
  – ANOVA
  – Factor Analysis
  – Principal components
  – Multi-dimensional scaling
  – ….
Ordinal and Interval Data
• Sometimes data only contains ranking information
  – Such data is called ordinal data
• Other times the data is measured against a scale with an origin and a unit
  – Such data is called interval data (or cardinal)
• Most of the methods of multivariate analysis assume interval data
  – But they work with ordinal data if you take them with a pinch of salt! (and do not believe or quote significance levels, etc.)
  – Read the assumptions behind the methods when using SPSS or similar.
Estimation
Try to find a function of the data that is tightly distributed about the quantity of interest.
[Figure: the distribution of a single data point about the quantity of interest, compared with the tighter distribution of the data mean about the same quantity of interest]
Confidence intervals
• Intervals defined from the data
• 95% confidence intervals: calculate the interval for each of 100 data sets; about 95 will contain the quantity of interest.
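The "about 95 out of 100" reading can be checked by simulation. The sketch below assumes normally distributed data with a known standard deviation, so that the interval is simply mean ± 1.96 σ/√n; the function names, the seed, and the parameter values are illustrative assumptions.

```python
import random
import statistics

def mean_confidence_interval(sample, sigma, z=1.96):
    """95% interval for the mean when the population SD sigma is known."""
    centre = statistics.mean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return centre - half_width, centre + half_width

def coverage(true_mean, sigma, n, n_datasets, seed=0):
    """Fraction of simulated data sets whose interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = mean_confidence_interval(sample, sigma)
        hits += low <= true_mean <= high
    return hits / n_datasets

# Simulate 1000 data sets of 25 points each and count how many intervals
# contain the (here known) quantity of interest.
print(coverage(true_mean=10.0, sigma=2.0, n=25, n_datasets=1000))
```

The printed fraction should be close to 0.95. Note what varies from run to run: the intervals, which are computed from the data, not the quantity of interest.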
Hypothesis testing
Hypothesis test: general
– Compare a null hypothesis H0 and an alternative H1
– Type 1 error: reject H0 when H0 is true
– Type 2 error: do not reject H0 when H1 is true
  • Note: never say you accept a hypothesis! Best phrasing is “there is/is not significant evidence against H0”
– Significance level is the probability of a type 1 error
– Power is the probability of rejecting H0 when H1 is true, i.e. 1 minus the probability of a type 2 error
– Conventionally the significance level is set at 5% (significant) or 1% (highly significant)
– Define g(x) and a critical region such that the probability that g(x) lies in the region is less than the significance level if H0 is true
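As one concrete instance of g(x) and a critical region, the sketch below uses a two-sided z-test for a mean with known standard deviation. This test is chosen only because it is short to write down; the function names and the sample values are hypothetical. The `power` helper returns the probability of rejecting H0 when the true mean is `mu1`, that is, 1 minus the probability of a type 2 error.

```python
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with known population SD sigma."""
    g = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)  # test statistic g(x)
    critical = NormalDist().inv_cdf(1 - alpha / 2)           # critical region: |g| > critical
    p_value = 2 * (1 - NormalDist().cdf(abs(g)))
    return g, critical, p_value, abs(g) > critical

def power(mu1, mu0, sigma, n, alpha=0.05):
    """Probability of rejecting H0 when the true mean is mu1."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / (sigma / n ** 0.5)
    return (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)

# Hypothetical sample; H0 says the mean is 10, and sigma is assumed known.
sample = [10.4, 10.9, 9.8, 11.2, 10.6, 10.1, 10.8, 10.5]
g, critical, p_value, reject = z_test(sample, mu0=10.0, sigma=0.5)
verdict = "significant" if reject else "no significant"
print(f"g(x) = {g:.2f}, critical value = {critical:.2f}, p = {p_value:.4f}")
print(f"There is {verdict} evidence against H0 at the 5% level.")
print(f"Power against a true mean of 10.5: {power(10.5, 10.0, 0.5, n=8):.2f}")
```

When `mu1` equals `mu0` the power reduces to the significance level, which is a useful sanity check on the definitions.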
Hypothesis testing
• Note that a 5% significance level means that, when H0 is true, about 1 in 20 tests will result in a type 1 error and reject H0.
• Thus if you perform lots of tests in your research you will almost certainly make some mistakes!
• There are theories of multiple testing to help avoid misinterpretation in such cases
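The arithmetic behind this warning is short. Assuming the tests are independent and every null hypothesis is true, the chance of at least one type 1 error in k tests is 1 − (1 − α)^k. The Bonferroni correction, shown here as one well-known option among several, tests each hypothesis at level α/k instead. The function names are illustrative.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """P(at least one type 1 error) across n_tests independent tests of true nulls."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_level(n_tests, alpha=0.05):
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

for n_tests in (1, 5, 20, 100):
    uncorrected = familywise_error_rate(n_tests)
    corrected = familywise_error_rate(n_tests, bonferroni_level(n_tests))
    print(f"{n_tests:>3} tests: uncorrected {uncorrected:.3f}, Bonferroni {corrected:.3f}")
```

With 20 tests at the 5% level the chance of at least one false rejection is already about 64%, while the corrected rate stays just under 5%. The price of the correction is lower power for each individual test.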
Meta-Analysis
• Often there are several related studies in the literature
  – Datasets collected under ‘similar’ conditions
  – Analysis of ‘similar’ research questions
• How do we combine their results and conclusions?
• Key point: literature bias
  – Insignificant results not published
  – Some authors cited more often and easier to find than others
• Assumptions of analysis often not fully clear
  – Data collection procedure
  – Outliers? Raw data or outliers discarded?
Meta-Analysis: key points
• Plan it and define a protocol before beginning
  – Just as you would define any other data collection procedure
• Define criteria for inclusion of studies a priori and use these to guide a deep and detailed literature search.
• Plot the different data sets on the same scales and ‘eyeball’ them
  – Explore these data just as you would an experimental data set
• No ‘right’ method for combining analyses so try several if possible and look for common conclusions (or explain differences in terms of different assumptions)
• Check sensitivity and robustness of your combined conclusion as in any other analysis
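In the spirit of "try several and check robustness", the sketch below combines study estimates in two simple ways, an unweighted mean and an inverse-variance weighted mean (the usual fixed-effect pooling), and then repeats the weighted pooling with each study left out in turn. It assumes each study reports an estimate and a standard error on a common scale; the function names and the four studies are hypothetical, and neither method is offered as the ‘right’ one.

```python
def pooled_estimates(studies):
    """Combine (estimate, standard_error) pairs in two simple ways.

    Returns the unweighted mean, the inverse-variance weighted mean,
    and the standard error of the weighted mean.
    """
    estimates = [est for est, _ in studies]
    weights = [1.0 / se ** 2 for _, se in studies]
    unweighted = sum(estimates) / len(estimates)
    weighted = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    weighted_se = (1.0 / sum(weights)) ** 0.5
    return unweighted, weighted, weighted_se

def leave_one_out(studies):
    """Inverse-variance pooled estimate with each study dropped in turn."""
    return [pooled_estimates(studies[:i] + studies[i + 1:])[1]
            for i in range(len(studies))]

# Hypothetical effect estimates and standard errors from four studies.
studies = [(0.30, 0.10), (0.25, 0.20), (0.45, 0.15), (0.10, 0.30)]
unweighted, weighted, weighted_se = pooled_estimates(studies)
print(f"unweighted {unweighted:.3f}, weighted {weighted:.3f} (se {weighted_se:.3f})")
print([round(est, 3) for est in leave_one_out(studies)])
```

If the two pooled values, or the leave-one-out values, disagree materially, that disagreement is itself a finding to explain in terms of the studies' differing assumptions.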