
Evidence and Doubt, Science and Policy Interdisciplinary Seminar:

The Centre for Criminal Justice Economics and Psychology, York University

Paul Marchant, Leeds Metropolitan University, [email protected]

29 Apr 2008

Introduction

• A chance to be reflective… to think about science and statistics.

• The seminar is billed as 'Interdisciplinary'.

• Economics in the Centre's title… says what is worth doing… link with policy.

• 'Science' has had amazing success… but there are worries about a 'retreat from science'.

• There's a lot of 'social' in all science… human foibles… fascinating!

Scepticism

• Is a key aspect of scientific thinking

• Answers are provisional

• Questioning is fundamental

• The problem of the ‘unknown unknowns’

• However, scepticism is unwelcome in most circles… assertion is the way to 'sell' things.

Physics: lifetime of the neutron

• Lifetime = 1013(29) s (1963); = 885.7(0.8) s (2006). The later result is 4.4 old standard errors away from the earlier one.

• P-value = 10⁻⁵. (If the discrepancy were less than 2 SE, i.e. p > 0.05, we would say the new result is consistent with the old.)

• Imagine the response to someone at the earlier time suggesting the value was less than 900 s.

• Some systematic error was affecting the earlier measurement.

Other examples in physics… Physics has been full of surprises: the moving Earth, for example!
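The arithmetic behind the 4.4-standard-error discrepancy can be checked in a few lines (the combined standard error is dominated by the 1963 measurement):

```python
from math import erf, sqrt

old_val, old_se = 1013.0, 29.0    # neutron lifetime, 1963 (seconds)
new_val, new_se = 885.7, 0.8      # neutron lifetime, 2006 (seconds)

se = sqrt(old_se**2 + new_se**2)  # combined SE; dominated by the old result
z = (old_val - new_val) / se      # discrepancy in standard errors
p = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # two-sided normal p-value

print(round(z, 1))  # about 4.4
print(p)            # of order 1e-5
```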

• “We know good lighting can reduce night time traffic accidents by up to 30%. It can also reduce street crime by 20%.” Karen Buck, Transport Minister 8 December 2005

• Are these claims true?

• What indeed does the statement mean? …the words "can" and "up to" are problematic.

• In science, in contrast, we would say "X, in circumstances such and such, has an expected effect Y within a likely range (Y_lower, Y_upper)", i.e. within a confidence interval.

The claim about traffic accidents

• Based on data from the 1970s comparing the number of night to day accidents (PIAs) at 89 sites (2-lane major roads, 30 mph limit, dry conditions, illumination 0.5 to 2 cd m⁻²).

• A 1 cd m⁻² increase gives a 35% reduction in the number of night-time accidents, using a Poisson model: log(night count) regressed on illumination, with the day count as offset.

• But the CI is 12% to 51%… (Wide, and it was stated to be so in the paper.)

• The minister should say: "based on 89 sites of a certain type of road, the expected PIA reduction is likely to be between 12% and 51% for a 1 cd m⁻² increase." This allows for the counts being not strictly Poisson; some overdispersion is evident. Other count models would give slightly different results.

It seems the 30% figure has been used willy-nilly, e.g. applied to all roads, including junctions.
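The percentage figures come from back-transforming the Poisson regression coefficient. A minimal sketch; the coefficient and its standard error here are hypothetical values chosen only to reproduce the quoted 35% and 12% to 51% figures, not the paper's actual estimates:

```python
import math

beta, se = -0.43, 0.15  # hypothetical log rate ratio per cd/m^2, and its SE

reduction = 100 * (1 - math.exp(beta))        # point estimate, about 35%
low = 100 * (1 - math.exp(beta + 1.96 * se))  # lower end, about 12%
high = 100 * (1 - math.exp(beta - 1.96 * se)) # upper end, about 51%

print(reduction, low, high)
```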

The Highways Agency examination

• Note the Highways Agency (which looks after the strategic road network) says its revised estimate, based on examining its own records, is a PIA reduction of 9.6%. (I have not been able to get a CI out of them!)

• All credit to them that they checked

Chief Highways Engineer Memorandum 190/07

• “…requires the Road Safety Engineer to play a key role in the appraisal process which should no longer be solely undertaken by the lighting designer.”

• “Where existing lighting is being considered for replacement there may not always be an economic case for such action.”

-------------------------------------------------------------------

The UK lighting board is examining the case for other types of roads

The crime claim

• I have criticised research in this area. I think the 20% reduction claim is doubtful.

• The reaction to my criticism of the Cambridge work, by Farrington and others, has been interesting…

• I do want claims to be correct and science to be respected. "In science we want a realistic estimation of uncertainties and statistical biases."

• I'm not of the 'realistic evaluation' school, although I have used some of Nick Tilley's data. I disagree with NT's claim that RCTs "draw their authority from medicine" (in CJM62)… RCTs draw their authority from scientific logic!

Forest plot, reconstructed from the HORS 251 meta-analysis (odds-ratio axis from 0.156 to 6.40):

Study          Odds ratio (95% CI)   % weight
Birmingham     3.82 (2.28, 6.40)        0.7
Stoke          1.72 (1.17, 2.52)        1.8
Atlanta        1.39 (1.04, 1.86)        3.4
Dudley         1.44 (1.17, 1.77)        6.6
Fort Worth     1.38 (0.97, 1.97)        2.3
Milwaukee      1.37 (1.06, 1.77)        4.4
Bristol        1.35 (1.23, 1.47)       37.8
Kansas City    1.24 (0.95, 1.64)        4.2
Dover          1.14 (0.62, 2.08)        0.9
Harrisburg     1.02 (0.75, 1.40)        3.5
New Orleans    1.01 (0.89, 1.15)       21.2
Portland       0.94 (0.79, 1.12)       11.3
Indianapolis   0.75 (0.47, 1.18)        1.9
Overall        1.23 (1.17, 1.31)

• The CI was originally 1.17 to 1.31, but after my criticism the authors now say, taking overdispersion into account, 1.10 to 1.39. This still does not allow for 'Regression Towards the Mean', which investigation suggests could by itself nullify the effect.

• There are other things wrong, basically because of weak research…poor reporting

• Not RCTs. No trials register
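The overall figure can be approximately reproduced by fixed-effect inverse-variance pooling of the study odds ratios. Recomputed here from the rounded CIs above, so the result will differ slightly from the published 1.23 (1.17, 1.31):

```python
import math

# (odds ratio, CI lower, CI upper), as read from the reconstructed forest plot
studies = [
    (3.82, 2.28, 6.40), (1.72, 1.17, 2.52), (1.39, 1.04, 1.86),
    (1.44, 1.17, 1.77), (1.38, 0.97, 1.97), (1.37, 1.06, 1.77),
    (1.35, 1.23, 1.47), (1.24, 0.95, 1.64), (1.14, 0.62, 2.08),
    (1.02, 0.75, 1.40), (1.01, 0.89, 1.15), (0.94, 0.79, 1.12),
    (0.75, 0.47, 1.18),
]

num = den = 0.0
for or_, lo, hi in studies:
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE on the log-odds scale
    w = 1 / se**2                                    # inverse-variance weight
    num += w * math.log(or_)
    den += w

pooled = math.exp(num / den)
ci = (math.exp(num / den - 1.96 / math.sqrt(den)),
      math.exp(num / den + 1.96 / math.sqrt(den)))
print(round(pooled, 2), [round(x, 2) for x in ci])  # close to the published values
```

Note this fixed-effect pooling assumes a single common effect; allowing for overdispersion, as the revised CI does, widens the interval.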

Response to my criticism

• “How many evaluations of criminological could withstand such a determined and destructive statistical assault? This is surely not helpful in advancing knowledge about the effectiveness of crime reduction programmes”.

• I ponder the answer to the first question!

• To expose weaknesses is the way to advance knowledge!

• Need high standards of evidence whatever the subject of study.

Better refereeing of statistics

• Need better refereeing of statistical argument, e.g. in the BJC… Just because one can fit a mixture of Poissons to a set of counts does not imply that the variance equals the mean, when it did not in the original data. (Confusion about the sum of Poisson random variables, it would seem.)
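The point can be seen in a few lines of simulation: an equal mixture of Poisson(2) and Poisson(10) has mean 6 but variance 6 + 16 = 22, so data that are a mixture of Poissons are overdispersed relative to a single Poisson (the mixture rates here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
lam = rng.choice([2.0, 10.0], size=n)  # equal mixture of two Poisson rates
x = rng.poisson(lam)

# mean = E[lambda] = 6; variance = E[lambda] + Var(lambda) = 6 + 16 = 22
print(x.mean(), x.var())  # variance far exceeds the mean
```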

A 'pearl' from Richard Feynman

• "If you're doing an experiment you should report everything that you think might make it invalid - not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you've eliminated by some other experiment, and how they worked - make sure the other fellow can tell they have been eliminated. Details that could throw doubt on your interpretation must be given, if you know them ..." from Cargo Cult Science

Science v. PR

• A scientific answer is one which can give a surprising, unwelcome result. PR is not like that! E.g. making a paper sound more secure than it really is.

• ‘For a successful technology, reality must take precedence over public relations, for nature cannot be fooled’…. Feynman in response to the 1986 Space Shuttle disaster.

• ‘Policy’ can be inserted for ‘technology’

Good trials are needed

• Well designed and executed.

• Need a register of trials to reduce the effect of publication bias.

• Trials might need to be big. (An individual's own experience is never going to be that great!)

• The trial of screening for Abdominal Aortic Aneurysms in 68,000 older men, with cost information included, impressed me.

A note of optimism

• GOS, Science Review of Home Office and MoJ: Recommendation 11

1. More emphasis on independent review of completed science.

2. Across Govt. in evaluating effectiveness of interventions, RCTs should become the rule.

3. Consider establishing a ‘trials unit’.

----------------------------------------------------------

Other science reviews of other Govt. departments in the pipeline.

Need to observe during implementation

• Possibility of getting a 'Type I' error (seeing something which is not there, simply by chance).

• Trial circumstances might not be the same as in practice.

• There might be other effects, rare but important, which a trial would not see.

Stepped wedge implementation

• Programmes (of all sorts) may be phased in (e.g. for reasons of resource limitation)

• Opportunity to monitor effectiveness; esp. if randomised. (Brown and Lilford, 2006)

• However randomisation may not be built in…but would be best if it were.

(In general we want the opportunity for Popperian falsification of policy)

• Each unit has a time series of 'events', so one might see whether there is a 'shift' in level from the series before to the series after… but in any one unit there is a lot of noise. With lots of units getting the change at different times, however, an effect might be measured. (But note the units might be correlated.)

• It seems to me that such checking ought to be routinely done.
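A minimal synthetic sketch of the stepped-wedge idea (all numbers illustrative): units switch on at staggered times, and pooling across units recovers a level shift that would be invisible in any single noisy series. A real analysis would also model trends and between-unit correlation:

```python
import numpy as np

rng = np.random.default_rng(2)
units, months = 30, 24
switch = rng.integers(6, 18, size=units)           # staggered roll-out month per unit
t = np.arange(months)
after = (t[None, :] >= switch[:, None])            # True once the unit has switched
base = rng.uniform(3.0, 8.0, size=units)[:, None]  # unit-specific baseline rate
y = rng.poisson(base * np.where(after, 0.8, 1.0))  # true effect: a 20% drop

# Crude pooled before/after rate ratio across all unit-months
ratio = y[after].mean() / y[~after].mean()
print(ratio)  # roughly 0.8, i.e. the 20% drop is recovered
```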

PFIs for new lights for streets: effect on crime

• I spoke on how this might be done at RSS2007 Conference

• Crimes recorded to unit postcode level (e.g. LS6 3QS) and time of report by the police.

• Classified by a number of codes. (Six types are of particular importance, given in the Home Office's annual report; 38,000 of these for Leeds, Apr 05 to Mar 06.)

Approach

• Group crimes of each type of interest into months for each unit post code.

• Have a dummy variable 'After' to indicate when the new lighting is in place (might need another to indicate the period of disruption).

• Model as an overdispersed Poisson time series using the dummy ‘After’ to estimate effect of lighting.
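A sketch of such a model on synthetic monthly counts (all parameters illustrative). For simplicity this fits a plain Poisson log-linear model by iteratively reweighted least squares; a real analysis would also handle overdispersion and the multilevel postcode structure:

```python
import numpy as np

def poisson_irls(X, y, iters=30):
    """Fit a log-linear Poisson regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-9)  # safe starting point near the data scale
    for _ in range(iters):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu        # working response
        w = mu                         # working weights
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

rng = np.random.default_rng(3)
months = np.arange(60)
after = (months >= 36).astype(float)      # new lighting in place from month 36
X = np.column_stack([
    np.ones(60),
    months / 60.0,                        # polynomial (here linear) trend
    np.sin(2 * np.pi * months / 12),      # crude seasonality
    after,                                # the 'After' dummy
])
true_beta = np.array([3.0, 0.3, 0.2, -0.25])  # 'After': exp(-0.25), about a 22% drop
y = rng.poisson(np.exp(X @ true_beta))
beta = poisson_irls(X, y)
print(beta[-1])  # estimated 'After' coefficient (true value -0.25)
```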

Further details

• The crime types form a multivariate set.

• Incorporate seasonality into the time series.

• Use a polynomial in time for the underlying trend.

• Multilevel model based on postcode geography.

• Opportunity for higher-level explanatory variables, e.g. sector (LS6 3Q) (1,400 of these for LS), ACORN description.

More possibilities

• Note the 'After' dummy could be incorporated as an interaction with time as a polynomial, rather than as a simple shift (as the effect of the intervention might be time-variable… criminological folklore?)

• Correlations, i.e. the effect of neighbouring units, could be incorporated using Multiple Membership Multiple Classification (MMMC) in MLwiN. Estimate using MCMC. (Browne and Ng, 2003; Browne et al., 2001)

• Temporal correlations also… an autoregressive model?

• A detailed study would be a big job. (So far I have not got any data, even for a less detailed analysis.)

Previous implementation data useful…as with any study

• Useful to have the data from a city which has already had the new lighting installed for a retrospective study, in order to develop the modelling framework for a prospective study.

Old wisdom

• For as knowledges are now delivered, there is a kind of contract of error, between the deliverer and the receiver: for he that delivereth knowledge desireth to deliver it in such form as may be best believed, and not as may be best examined; and he that receiveth knowledge desireth rather present satisfaction than expectant inquiry; and so rather not to doubt than not to err: glory making the author not to lay open his weakness, and sloth making the disciple not to know his strength.

Francis Bacon (1605) The Advancement of Learning

Problems of the research system

• Claims need lots of independent checking… lots of eyes on research.

• Research organisations trade on their expertise… "but science is the belief in the ignorance of experts"… Feynman again! I.e. we just want good data and scientific logic.

• It becomes difficult to admit error, e.g. to make corrections to papers.

• Worry that we might end up with a ritualistic value-and-exchange system… dodgy papers in abundance being published… an anthropological system, e.g. Malinowski's work in the Trobriand Islands.

Final points

• I prefer the term 'Evidence based policy' to 'Evidence informed policy'.

• Important to check whether public money is being well spent. Costs of good research are likely to be small compared with policy implementation.

• Scientific evaluation of programmes should be done.

• Scientific thinking is essential. Mathematics is the language of the audit trail of reasoning. Questioning and scepticism are key parts of science.