
Techniques for Inferring Mileage from the Department for Transport’s MOT Data Set

R. Eddie Wilson
Jillian Anable (Aberdeen), Sally Cairns (TRL/UCL), Tim Chatterton (UWE), Oliver Turnbull (Bristol) and others

EPSRC grants EP/J004758/1 EP/K000438/1

Faculty of Engineering
University of Bristol

March 25, 2015


UK MOT (Ministry of Transport) test

- MOT: the UK’s annual safety inspection for all road vehicles older than 3 years.

- Since 2005, the results have been captured and stored digitally.

- Since November 2010, the DfT has published this data online, spanning back to 2005.

- Key interest: the odometer reading recorded at each test.

R.E. Wilson et al (UoB) Temporal Mileage Rates March 25, 2015 2 / 36


A sample of the published data

- But the tests are grouped by year and do not “link” the vehicles (a problem fixed in more recent releases, at my prompting!)


Here’s a trick . . .

- Concatenate all files and sort by the “mystery” identifier. You get lots of blocks like this: [figure: a block of test records sharing one identifier]

- We can follow individuals around and infer their mileage (rate) between consecutive test dates!

- For example, in the interval from 2008-08-11 to 2009-08-05 (359 days), I drove 132,299 - 123,259 = 9,040* miles, at an average rate of 25.18 miles per day.
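The concatenate-and-sort step can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout `(vehicle_id, test_date, odometer_miles)`, the identifiers `a91f` and `07c2`, and the second vehicle's readings are all made up; only the 123259 and 132299 readings come from the example above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical yearly files, each a list of (vehicle_id, test_date, odometer_miles).
# ISO-format date strings sort chronologically, so no date parsing is needed here.
rows_2008 = [("a91f", "2008-08-11", 123259), ("07c2", "2008-03-02", 40110)]
rows_2009 = [("a91f", "2009-08-05", 132299), ("07c2", "2009-03-01", 47950)]

def blocks_by_vehicle(*yearly_files):
    """Concatenate yearly record lists, sort by identifier then date, and group."""
    all_rows = [row for rows in yearly_files for row in rows]
    all_rows.sort(key=itemgetter(0, 1))
    return {
        vehicle_id: [(test_date, miles) for _, test_date, miles in group]
        for vehicle_id, group in groupby(all_rows, key=itemgetter(0))
    }

blocks = blocks_by_vehicle(rows_2009, rows_2008)
```

Each value in `blocks` is one vehicle's test history in date order, which is the "block" that the later steps work on.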


Basic analysis object: intervals and their attributes

- Re-arrange blocks of same-vehicle data into consecutive pairs of tests:

              First test                       Second test
  Interval    date t1      miles x1   place1   date t2      miles x2   place2
  1           2005-08-26   99777      BS       2006-08-18   105420     BS
  2           2006-08-18   105420     BS       2007-08-13   113709     BS
  3           2007-08-13   113709     BS       2008-08-11   123259     BS
  4           2008-08-11   123259     BS       2008-08-11   123259     BS
  5           2008-08-11   123259     BS       2009-08-05   132299     BS

- To which can be linked vehicle-specific attributes: VAUXHALL, ASTRA LS 8V, WHITE, P (fuel), 1598 (cc), 1999 (year)

- E.g. during interval 3, I drove at an average rate of (123259 − 113709)/364 = 26.24 miles per day, but we don’t know how my mileage was distributed during that period.

- These mileage rates are (more or less) complete across the vehicle population, even after cleaning.
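As a rough sketch of the interval construction, the following Python pairs consecutive tests and computes the average rate for each interval, using the odometer history from the table. Returning `None` for a zero-length interval (two tests on the same day) is an assumption made here for illustration; the function names are likewise hypothetical.

```python
from datetime import date

# Odometer history from the table: (test date, miles).
tests = [
    (date(2005, 8, 26), 99777),
    (date(2006, 8, 18), 105420),
    (date(2007, 8, 13), 113709),
    (date(2008, 8, 11), 123259),
    (date(2008, 8, 11), 123259),  # second test on the same day
    (date(2009, 8, 5), 132299),
]

def intervals(tests):
    """Pair consecutive tests into (t1, x1, t2, x2) intervals."""
    return [(t1, x1, t2, x2) for (t1, x1), (t2, x2) in zip(tests, tests[1:])]

def rate(interval):
    """Average miles per day over an interval, or None if it has zero length."""
    t1, x1, t2, x2 = interval
    days = (t2 - t1).days
    return None if days == 0 else (x2 - x1) / days

rates = [rate(iv) for iv in intervals(tests)]
```

Interval 3 gives about 26.24 miles per day and interval 5 about 25.18, matching the figures quoted in the slides.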


Population level statistics: straddling rate r(t)

[Figure: time axis t with two tests either side of an observation date t*; the quantity shown is the average mileage rate between the tests.]

- Select all N intervals that straddle a given observation date t*.

- Each interval yields an average (per-vehicle) rate r_i.

- The straddling rate r(t*) is then defined as the average of these averages:

  $$r(t^*) = \frac{1}{N} \sum_{i=1}^{N} r_i .$$

- It is fine for annual statistics: choose t* = 1/7/2007, 1/7/2008, 1/7/2009 etc.

- But r(t*) actually incorporates miles driven over the two-year span t* − 1 ≤ t < t* + 1.
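A minimal Python sketch of the straddling rate follows. It assumes "straddle" means t1 ≤ t* < t2 (the slides do not say how boundary cases are treated), and the second and third sample intervals are made-up numbers for illustration.

```python
from datetime import date

def straddling_rate(intervals, t_star):
    """Mean of the per-interval rates over intervals with t1 <= t_star < t2."""
    rates = [
        (x2 - x1) / (t2 - t1).days
        for t1, x1, t2, x2 in intervals
        if t1 <= t_star < t2
    ]
    return sum(rates) / len(rates) if rates else None

# One interval per vehicle: (t1, x1, t2, x2). Only the first row is from the slides.
sample = [
    (date(2006, 8, 18), 105420, date(2007, 8, 13), 113709),
    (date(2007, 1, 10), 20000, date(2008, 1, 10), 27300),
    (date(2007, 3, 1), 51000, date(2008, 2, 29), 54650),
]

r = straddling_rate(sample, date(2007, 7, 1))
```

All three sample intervals straddle 1 July 2007, so `r` is the plain mean of three per-vehicle rates, each of which averages over roughly a year of driving.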


Mileage distributions: new(ish) vehicles

[Figure: normalised frequency against daily mileage (0 to 100), West London vs Kirkcaldy, first registration 2004.]

West London (W): mean 18.2768, median 14.8481
Kirkcaldy (KY): mean 25.5864, median 22.6945


Mileage distributions: older vehicles

[Figure: normalised frequency against daily mileage (0 to 100), West London vs Kirkcaldy, first registration 2000.]


Mileage distributions: even older vehicles

[Figure: normalised frequency against daily mileage (0 to 100), West London vs Kirkcaldy, first registration 1996.]


Mileage distributions: old vehicles

[Figure: normalised frequency against daily mileage (0 to 100), West London vs Kirkcaldy, first registration 1992.]


From the Straddling Rate to the Census Date Rate

- Progression of a vehicle’s odometer with time, first alone and then with the tests marked. [Figures: odometer reading against time.]

- The tests do not allow you to distinguish the 2 trajectories.

- Distributions derived from the straddling rate suffer anomalous variance because some intervals are very short.

- The solution is to interpolate onto some given census dates . . .

- . . . and use the rates between the census dates. (This also neatly synchronises the data into calendar year comparisons.)
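One way to picture the census-date idea is the sketch below. It assumes linear interpolation of the odometer between neighbouring tests, which the slides do not specify, and it uses 1 July census dates purely as an example; the function names are hypothetical.

```python
from datetime import date

def odometer_at(tests, census):
    """Linearly interpolated odometer reading at a census date, or None if the
    date lies outside the vehicle's test record."""
    for (t1, x1), (t2, x2) in zip(tests, tests[1:]):
        if t1 <= census <= t2 and t2 > t1:
            frac = (census - t1).days / (t2 - t1).days
            return x1 + frac * (x2 - x1)
    return None

def census_rates(tests, census_dates):
    """Miles per day between consecutive census dates (None where not covered)."""
    readings = [odometer_at(tests, c) for c in census_dates]
    rates = []
    for i in range(len(census_dates) - 1):
        xa, xb = readings[i], readings[i + 1]
        days = (census_dates[i + 1] - census_dates[i]).days
        rates.append(None if xa is None or xb is None else (xb - xa) / days)
    return rates

tests = [
    (date(2006, 8, 18), 105420),
    (date(2007, 8, 13), 113709),
    (date(2008, 8, 11), 123259),
    (date(2009, 8, 5), 132299),
]
census = [date(2007, 7, 1), date(2008, 7, 1), date(2009, 7, 1)]
rates = census_rates(tests, census)
```

Every vehicle then contributes rates over the same year-long windows, so very short test intervals no longer produce extreme values on their own.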


Five digit odometer problem

[Figure: frequency (10 mile bins) against odometer reading, with a marked JUMP.]


Cleaning: How to Deal with Bad Odometers

Solution 1: don’t worry about it too much

- Compute rates as if all odometers are perfectly correct.

- Reject intervals (*) whose rates are outside a reasonable range:
    - below 0
    - above 150 miles per day (?)

- Scale population statistics up for the intervals of vehicles thus discarded.

(*) Nomenclature: will talk of intervals as Bad or Good.

Solution 2: try to identify which individual odometer entries are bad and remove them instead
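Solution 1 might look like the following Python sketch. The 150 miles per day cut-off is the tentative figure from the slide; scaling up by multiplying the mean Good rate by the total number of intervals is just one plausible reading of "scale up", assumed here for illustration.

```python
MAX_RATE = 150.0  # miles per day; marked as tentative "(?)" in the slides

def label(rate):
    """Return 'G' (Good) if 0 <= rate <= MAX_RATE, otherwise 'B' (Bad)."""
    return "G" if 0.0 <= rate <= MAX_RATE else "B"

def scaled_total(rates):
    """Estimate the summed rate of all intervals from the Good ones only.

    Assumption: discarded intervals behave like the Good average.
    """
    good = [r for r in rates if label(r) == "G"]
    if not good:
        return None
    mean_good = sum(good) / len(good)
    return mean_good * len(rates)

rates = [26.24, 25.18, -3.0, 900.0]  # last two values are made-up bad cases
labels = [label(r) for r in rates]
total = scaled_total(rates)
```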


When two (or more) Bads make a Good

[Figure: miles driven x against time t for three consecutive tests. The two intervals either side of the middle test are both Bad (B): one has negative mileage and the other a mileage rate that is too high. The spanning interval is Good (G).]

- The middle odometer entry is (probably) erroneous, perhaps due to a missing digit in the data entry?

- The spanning interval without the middle test is (probably) ok.


Syntactic games

- Represent each vehicle’s intervals as a sequence of B and G. For example BGGGBBGGBGG.

- Try to remove tests to end up with a sequence that is all G.

- Multiple consecutive Bs should be replaced with the spanning interval, which is either G (problem solved) or perhaps B.

- The only remaining problem is a singleton B: which end of the bad interval should be removed?

- Endpoint B: delete the end test (yes, you then need infill).

- Interior B: a messy mixture of clocking events; clock rollover; (mild) centrally bad cases etc.

- Look at removing either or both ends so as to generate G. Repeat.
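The first part of this procedure, replacing runs of consecutive Bs by their spanning interval, can be sketched as below. This is a hypothetical illustration rather than the authors' code: it reuses the tentative 0 to 150 miles per day range, treats zero-length intervals as B for simplicity, and does not attempt the singleton-B cases. The reading 11370 is a made-up "missing digit" version of 113709.

```python
from datetime import date

MAX_RATE = 150.0  # miles per day, tentative threshold from the cleaning slide

def label(a, b):
    """Label the interval between tests a and b as 'G' or 'B'."""
    (t1, x1), (t2, x2) = a, b
    days = (t2 - t1).days
    if days <= 0:
        return "B"  # assumption: same-day tests are handled elsewhere
    return "G" if 0.0 <= (x2 - x1) / days <= MAX_RATE else "B"

def labels(tests):
    """B/G string for a vehicle's consecutive intervals."""
    return "".join(label(a, b) for a, b in zip(tests, tests[1:]))

def merge_bad_runs(tests):
    """Drop the interior tests of every run of two or more consecutive Bs."""
    seq = labels(tests)
    keep = [True] * len(tests)
    i = 0
    while i < len(seq):
        if seq[i] != "B":
            i += 1
            continue
        j = i
        while j < len(seq) and seq[j] == "B":
            j += 1
        # Intervals i..j-1 span tests i..j; the interior tests are i+1..j-1.
        for k in range(i + 1, j):
            keep[k] = False
        i = j
    return [t for t, kept in zip(tests, keep) if kept]

tests = [
    (date(2005, 8, 26), 99777),
    (date(2006, 8, 18), 105420),
    (date(2007, 8, 13), 11370),   # made-up entry with a digit missing
    (date(2008, 8, 11), 123259),
    (date(2009, 8, 5), 132299),
]
cleaned = merge_bad_runs(tests)
```

Here the sequence goes from GBBG to GGG once the middle test is dropped. A spanning interval that is still B would need another pass, and singleton Bs need the endpoint and interior rules listed in the slide.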


How to deal with multiple tests on the same day (I)
(need to pare down to a single odometer reading per test day)

[Figure: miles driven x against time, with odometer readings at the test dates t1 and t2 and intervals labelled B.]

- We want to complete the previous syntactic procedure before deciding which test to select for each date.


How to deal with multiple tests on the same day (II)

- Compute 4 rates, from the odometer pairs

  $$(x_1^{\min}, x_2^{\min}), \quad (x_1^{\max}, x_2^{\max}), \quad (x_1^{\min}, x_2^{\max}), \quad (x_1^{\max}, x_2^{\min})$$

- We call the interval
    - Certainly Bad, if all 4 rates are Bad
    - Certainly Good, if all 4 rates are Good
    - Don’t know (D), if there is a mix

- The D are rare: no great loss in calling them B.

- Note, for Certainly Bad: there might be a good interval if there are 3 or more distinct tests at both t1 and t2. This is also rare.

- Proceed with the previous procedure using Certainly Bad and Certainly Good.

- Finally, decide which odometer reading at each t to use at the end. (For example: the median value.)
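The four-rate test could be sketched as follows. The function name `classify`, its arguments and the 0 to 150 miles per day range are assumptions for illustration, and `days` is assumed to be positive; the readings other than 123259 and 132299 are made up.

```python
from statistics import median

def classify(readings1, readings2, days, max_rate=150.0):
    """Classify an interval whose end dates each carry one or more readings.

    Returns 'G' (certainly Good), 'B' (certainly Bad) or 'D' (don't know).
    """
    lo1, hi1 = min(readings1), max(readings1)
    lo2, hi2 = min(readings2), max(readings2)
    pairs = [(lo1, lo2), (hi1, hi2), (lo1, hi2), (hi1, lo2)]
    good = [0.0 <= (x2 - x1) / days <= max_rate for x1, x2 in pairs]
    if all(good):
        return "G"
    if not any(good):
        return "B"
    return "D"

certainly_good = classify([123259, 123260], [132299], 359)
mixed = classify([123259, 12325], [132299], 359)
certainly_bad = classify([123259], [13229, 1322], 359)

# Once the B/G procedure is complete, pick one reading per test day, e.g. the median.
chosen = median([123259, 123260])
```

Following the slide, a `'D'` result can simply be treated as `'B'` before running the syntactic procedure.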


Central Question for Remainder of Talk

Recall that I cannot possibly say anything about an individual’s mileage on finer time scales than one year.

But can I derive something about population-level mileage over shorter time scales, e.g. a month?

Possible application: detect the sharp drop in driving in Autumn 2008 following the Lehman Brothers collapse.


How to compute temporal evolution of mileage rates?

- Erm, isn’t it obvious?

- Take a given sequence t_i, i = 1, 2, . . .

- Compute the corresponding r(t_i) using the straddling procedure.

- The pairs (t_i, r(t_i)) reconstruct r(t).

- Actually . . . this process is flawed . . . But just look what we can do with it!
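The naive procedure in this list can be written out as a short Python sketch. The choice of first-of-month observation dates, the half-open straddling test t1 ≤ t < t2 and all function names are assumptions; the data is the single-vehicle history from the earlier table.

```python
from datetime import date

def straddling_rate(intervals, t_star):
    """Mean per-interval rate over intervals with t1 <= t_star < t2."""
    rates = [
        (x2 - x1) / (t2 - t1).days
        for t1, x1, t2, x2 in intervals
        if t1 <= t_star < t2
    ]
    return sum(rates) / len(rates) if rates else None

def month_starts(year_from, year_to):
    """First day of every month from year_from to year_to inclusive."""
    return [date(y, m, 1) for y in range(year_from, year_to + 1) for m in range(1, 13)]

def rate_series(intervals, dates):
    """Pairs (t_i, r(t_i)) obtained by applying the straddling rate at each date."""
    return [(t, straddling_rate(intervals, t)) for t in dates]

intervals = [
    (date(2006, 8, 18), 105420, date(2007, 8, 13), 113709),
    (date(2007, 8, 13), 113709, date(2008, 8, 11), 123259),
    (date(2008, 8, 11), 123259, date(2009, 8, 5), 132299),
]
series = rate_series(intervals, month_starts(2007, 2008))
```

With a single vehicle the series is a step function that changes only at test dates. A population-level series averages many such steps, each of which spans about a year.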


Example of temporal evolution via straddling (WRONG)

[Figure: average mileage rate (axis ticks 12 to 28) against date, January 2007 to December 2008; legend entries 1991, 1993, 1995, 1997, 1999, 2001, 2003.]


Basic postulate: the population spot rate φ(t)

• Suppose there is a population-level spot rate φ(t) that modulates all vehicles' mileage (alternatively, restrict to a population segment).

• Then each vehicle i has an individual spot rate φ_i(t) with

$$\varphi_i(t) = c_i\,\varphi(t) + \text{noise}.$$

Here c_i is constant, ⟨c_i⟩ = 1 and ⟨noise⟩ = 0, so that φ = ⟨φ_i⟩.

• Let ψ_i(τ) denote the miles driven by vehicle i between tests at times τ − 1/2 and τ + 1/2. Then

$$\psi_i(\tau) = \int_{\tau-1/2}^{\tau+1/2} \bigl(c_i\,\varphi(s) + \text{noise}\bigr)\,ds = c_i \int_{\tau-1/2}^{\tau+1/2} \varphi(s)\,ds,$$

where the second equality drops the zero-mean noise, which averages out.


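From the spot rate to the straddling rate

• Thus, by averaging over tests that straddle t:

$$r(t) = \int_{t-1/2}^{t+1/2} \langle \psi_i(\tau) \rangle_i \, d\tau = \int_{t-1/2}^{t+1/2} \langle c_i \rangle \int_{\tau-1/2}^{\tau+1/2} \varphi(s)\, ds\, d\tau .$$

• Simplify the integral using ⟨c_i⟩ = 1 and reverse the order of integration:

$$r(t) = \int_{t-1}^{t+1} w(s;t)\,\varphi(s)\,ds.$$

[Figure: triangular kernel w(s; t) against s, rising from 0 at s = t − 1 to 1 at s = t and falling back to 0 at s = t + 1.]

• Thus φ(t) leads to r(t). But we want to derive φ(t) from r(t), which is derivable from data.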
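The change in the order of integration can be checked numerically. The sketch below assumes an illustrative spot rate (a trend plus a seasonal cosine, not fitted to any data) and writes the triangular kernel as w(s; t) = max(0, 1 − |s − t|), which is the shape shown in the figure. The helper names are hypothetical.

```python
import numpy as np

def phi(s):
    """Illustrative spot rate (miles/year): linear trend plus seasonal cycle."""
    return 8000.0 + 500.0 * s - 1000.0 * np.cos(2.0 * np.pi * s)

def midpoint_integral(f, a, b, n=400):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    mids = a + h * (np.arange(n) + 0.5)
    return h * np.sum(f(mids))

def r_double_integral(t):
    """r(t) as the integral over tau in [t-1/2, t+1/2] of one year's mileage."""
    inner = np.vectorize(
        lambda tau: midpoint_integral(phi, tau - 0.5, tau + 0.5))
    return midpoint_integral(inner, t - 0.5, t + 0.5)

def r_triangular(t):
    """r(t) as a single integral of phi against the triangular kernel."""
    def integrand(s):
        return np.maximum(0.0, 1.0 - np.abs(s - t)) * phi(s)
    return midpoint_integral(integrand, t - 1.0, t + 1.0)

r_a = r_double_integral(0.3)
r_b = r_triangular(0.3)
```

For this particular φ the two values agree with 8000 + 500t up to quadrature error: a cosine with a period of exactly one year integrates to zero against the two-year triangle, so the seasonal cycle does not appear in r(t) at all.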


From the straddling rate to the spot rate

• See TR-E 2013 for a whole bunch of mathematics! The upshot:

$$r''(t) = \varphi(t+1) - 2\varphi(t) + \varphi(t-1).$$

• Isolate φ(t + 1) to derive a time-stepping scheme that evolves φ(t) with a time-step Δt (= 1 month, say).

• Compute r(t) from data at a mesh of points t_i, and estimate r″(t) by the divided difference; a natural step size is Δt.

• In practice, r(t) is noisy, so the difference is applied to a smoothing least-squares spline fit.

• Unfortunately, 2 years of initial data for φ(t) are required, at the fine-scale resolution Δt.
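A minimal sketch of the time-stepping scheme, assuming a noise-free r(t) so that no spline smoothing is needed. The test spot rate `phi_true` and the closed form `r_exact` (its triangular-kernel average, worked out by hand for this particular φ) are assumptions made for illustration; with real data r(t) would come from the straddling averages.

```python
import numpy as np

N = 12  # mesh points per year, i.e. a time-step of one month

def phi_true(t):
    """Illustrative spot rate, used only to manufacture test inputs."""
    return (8000.0 + 500.0 * t - 200.0 * t**2
            - 1000.0 * np.cos(2.0 * np.pi * t))

def r_exact(t):
    """Triangular-kernel average of phi_true (hand-derived for this phi)."""
    return 8000.0 + 500.0 * t - 200.0 * (t**2 + 1.0 / 6.0)

def second_divided_difference(values, dt):
    """Central estimate of the second derivative; NaN at the two end points."""
    d2 = np.full(len(values), np.nan)
    d2[1:-1] = (values[2:] - 2.0 * values[1:-1] + values[:-2]) / dt**2
    return d2

def step_spot_rate(phi_init, r_dd, n):
    """Evolve phi with phi(t+1) = r''(t) + 2*phi(t) - phi(t-1).

    phi_init holds phi on the first two years of the mesh (2*n values);
    r_dd holds r'' at every mesh point; n is the number of steps per year.
    """
    phi = np.full(len(r_dd), np.nan)
    phi[:2 * n] = phi_init
    for k in range(n, len(r_dd) - n):
        phi[k + n] = r_dd[k] + 2.0 * phi[k] - phi[k - n]
    return phi

t = -1.0 + np.arange(5 * N) / N                  # five years of monthly mesh
r_dd = second_divided_difference(r_exact(t), 1.0 / N)
phi_hat = step_spot_rate(phi_true(t[:2 * N]), r_dd, N)
max_error = float(np.max(np.abs(phi_hat - phi_true(t))))
```

Note that the seasonal cycle in the reconstruction comes entirely from the two years of initial data: r(t) for this φ contains no cosine term, and the recursion simply carries the cycle forward one year at a time.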


Refinement of the straddling rate idea

[Figure: average mileage rate against t, with t* and t* + α marked on the time axis.]

• Select only the intervals that straddle t* and whose right-hand ends fall before t* + α, with α ≤ 1 year.

• Call the resulting average the average straddle rate r_α(t).

• Crank the handle to give:

$$r_\alpha''(t) = \frac{1}{\alpha}\bigl[\varphi(t+\alpha) - \varphi(t)\bigr] - \frac{1}{\alpha}\bigl[\varphi(t-1+\alpha) - \varphi(t-1)\bigr].$$

• This gives a time-stepping scheme, but only 1 + α years of initial data are required.

• So interest is in α → 0, which gives r″_α(t) ≃ φ′(t) − φ′(t − 1) (natural meaning).

• α → 0 means fewer and fewer intervals, which means a noisy r_α(t).
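Isolating φ(t + α) in the identity above gives φ(t + α) = α r″_α(t) + φ(t) + φ(t − 1 + α) − φ(t − 1). The sketch below steps this on a mesh with Δt = α, assuming 1/α is an integer. The inputs are manufactured: r″_α is computed from the identity itself for an illustrative φ, so this checks the indexing of the scheme rather than validating it on data. All names are hypothetical.

```python
import numpy as np

def phi_true(t):
    """Illustrative spot rate (miles/year); an assumption of this sketch."""
    return (8000.0 + 500.0 * t - 200.0 * t**2
            - 1000.0 * np.cos(2.0 * np.pi * t))

def r_alpha_dd(t, alpha):
    """r_alpha''(t), manufactured from the identity rather than from data."""
    return ((phi_true(t + alpha) - phi_true(t))
            - (phi_true(t - 1.0 + alpha) - phi_true(t - 1.0))) / alpha

def step_spot_rate_alpha(phi_init, r_dd, n):
    """Evolve phi on a mesh of step alpha = 1/n.

    phi_init holds phi at the first n + 1 mesh points (one year plus one
    step, end points included); r_dd holds r_alpha'' at every mesh point.
    """
    alpha = 1.0 / n
    phi = np.full(len(r_dd), np.nan)
    phi[:n + 1] = phi_init
    for k in range(n, len(r_dd) - 1):
        phi[k + 1] = (alpha * r_dd[k] + phi[k]
                      + phi[k - n + 1] - phi[k - n])
    return phi

n = 10                                   # alpha = 0.1 years
alpha = 1.0 / n
t = -1.0 + np.arange(5 * n + 1) * alpha  # mesh covering t in [-1, 4]
phi_hat = step_spot_rate_alpha(phi_true(t[:n + 1]), r_alpha_dd(t, alpha), n)
max_error = float(np.max(np.abs(phi_hat - phi_true(t))))
```

Compared with the α = 1 scheme, each step now advances φ by α rather than by a whole year, which is why the initial data shrinks from 2 years to 1 + α.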


Synthetic data set-up

• Choose the spot rate

$$\varphi(t) = 8000 + 500t - 1000\cos 2\pi t - 1000\,[t-2]_+\,(t-2)^2$$

• 10^6 vehicles with tests 1 year apart; test dates uniformly distributed through the calendar year.

• Vehicle i's daily mileage is drawn from a distribution modulated by φ(t) and a (random) c_i.

• Odometer readings on test dates are synthesised by adding up each vehicle's daily totals.

[Figure: miles per year (4000 to 11000) against time in years (−1 to 4), with a START marker; curves for φ and for r̄ with α = 1.0, 0.25 and 0.1.]

• The periodic component in the spot rate φ(t) is suppressed in the straddling rates r_α(t).
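A small-scale version of this set-up can be sketched as follows. It is an illustration under stated assumptions, not the simulation behind the figure: 2000 vehicles instead of 10^6, a simplified φ that omits the final term of the formula above, multiplicative uniform noise, uniform c_i with mean 1, and 365-day years. All names are hypothetical.

```python
import numpy as np

DAYS = 365            # days per year (leap years ignored)
T0, YEARS = -1.0, 4   # simulated window: t in [-1, 3) years

def phi(t):
    """Simplified spot rate: the slide's formula without its final term."""
    return 8000.0 + 500.0 * t - 1000.0 * np.cos(2.0 * np.pi * t)

def synthesise_odometers(n_vehicles, rng):
    """Return (odo, first_test); odo[i, j] = miles by vehicle i before day j."""
    t_day = T0 + (np.arange(YEARS * DAYS) + 0.5) / DAYS       # mid-day times
    c = rng.uniform(0.8, 1.2, size=(n_vehicles, 1))            # mean 1
    noise = rng.uniform(0.5, 1.5, size=(n_vehicles, YEARS * DAYS))
    daily = c * phi(t_day) / DAYS * noise                      # miles per day
    odo = np.concatenate(
        [np.zeros((n_vehicles, 1)), np.cumsum(daily, axis=1)], axis=1)
    first_test = rng.integers(0, DAYS, size=n_vehicles)        # uniform dates
    return odo, first_test

def straddling_rate(odo, first_test, day):
    """Mean annual mileage over the one-year test intervals containing day."""
    start = day - (day - first_test) % DAYS   # latest test on or before day
    rows = np.arange(odo.shape[0])
    return float(np.mean(odo[rows, start + DAYS] - odo[rows, start]))

rng = np.random.default_rng(0)
odo, first_test = synthesise_odometers(2000, rng)
day = int((1.0 - T0) * DAYS)                  # sample time t = 1.0
r_bar = straddling_rate(odo, first_test, day)
```

At t = 1 the spot rate is at a seasonal trough, φ(1) = 7500, while the straddling rate should sit near the trend value 8000 + 500t = 8500, up to sampling noise. That gap is the suppression of the periodic component noted above.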


Results with synthetic data: α = ∆t = 0.1 years

[Figure: numerical solution and exact value of φ(t) (vertical axis ticks 3000 to 10000) against time in years (−1 to 4).]

• The reconstructed φ(t) is almost indistinguishable from the ground truth.


Straddling rates rα(t) for real-world data

[Figure: alpha-weighted r̄ (miles per year, 7300 to 8200) against time, Jan 2007 to Jan 2010; curves for 13 weeks and 4 weeks.]

• The seasonal component shouldn't be there: the underlying assumptions of the theory are broken.


Implicit assumptions in the theory. . .

A1 We assume that tests (odometer readings) are exactly one year apart.

• OKish: the theory can be generalised.
• In fact, marginal failure of this assumption can be used to quantify seasonal variation.

A2 We assume that tests occur at the same frequency on average throughout the year.

• Not true, but the theory is easy to fix.

A3 We assume that a vehicle's mileage rate is independent of the time of year at which it is tested (and its odometer is read).

• Completely wrong. And very hard to fix.

On A3: it fails because of a pattern in new vehicle registrations throughout the year (in the UK).


Conclusions and Further Work (I)

• Incidental data is beautiful! (And useful and cheap.)

• (Inadvertently) the MOT set provides vehicle usage data, not intended by its release, which is not available elsewhere (at least in this quantity and detail).

• Other data sources might enable huge extensions:

1. Per-vehicle emissions data
2. Fine-scale data (month?) for point of first use
3. Fine-scale location data (LLSOA of registered keepers?)
4. Link vehicles with the same registered keeper / address


Conclusions and Further Work (II)

• Methods have been developed which extract the population-level spot rate mileage from widely spaced individual vehicle odometer readings. Success with synthetic data.

• UK MOT data set: some fixes/patches to the theory are needed.

• Please contact me if you know of other datasets (international) in which odometer readings are systematically collected.

• These methods have the potential to complement / replace existing survey-based / link-flow techniques for estimating population-level mileage.
