improving timeliness and accuracy of estimates from the uk

14
Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=tstf20 Statistical Theory and Related Fields ISSN: 2475-4269 (Print) 2475-4277 (Online) Journal homepage: https://www.tandfonline.com/loi/tstf20 Improving timeliness and accuracy of estimates from the UK labour force survey D. J. Elliott & P. Zong To cite this article: D. J. Elliott & P. Zong (2019) Improving timeliness and accuracy of estimates from the UK labour force survey, Statistical Theory and Related Fields, 3:2, 186-198, DOI: 10.1080/24754269.2019.1676034 To link to this article: https://doi.org/10.1080/24754269.2019.1676034 Published online: 23 Oct 2019. Submit your article to this journal Article views: 41 View related articles View Crossmark data

Upload: others

Post on 06-Jan-2022

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Improving timeliness and accuracy of estimates from the UK

Full Terms & Conditions of access and use can be found athttps://www.tandfonline.com/action/journalInformation?journalCode=tstf20

Statistical Theory and Related Fields

ISSN: 2475-4269 (Print) 2475-4277 (Online) Journal homepage: https://www.tandfonline.com/loi/tstf20

Improving timeliness and accuracy of estimatesfrom the UK labour force survey

D. J. Elliott & P. Zong

To cite this article: D. J. Elliott & P. Zong (2019) Improving timeliness and accuracy of estimatesfrom the UK labour force survey, Statistical Theory and Related Fields, 3:2, 186-198, DOI:10.1080/24754269.2019.1676034

To link to this article: https://doi.org/10.1080/24754269.2019.1676034

Published online: 23 Oct 2019.

Submit your article to this journal

Article views: 41

View related articles

View Crossmark data

Page 2: Improving timeliness and accuracy of estimates from the UK

STATISTICAL THEORY AND RELATED FIELDS2019, VOL. 3, NO. 2, 186–198https://doi.org/10.1080/24754269.2019.1676034

Improving timeliness and accuracy of estimates from the UK labour forcesurvey

D. J. Elliott and P. Zong

Office for National Statistics, Newport, United Kingdom

ABSTRACTEstimates of unemployment in the UK are based on data collected in the Labour Force Survey(LFS). The data is collected continuously and the survey design is structured in such a way asto provide quarterly estimates. These quarterly estimates are published each month as ‘rollingquarterly’ estimates. Currently the Office for National Statistics (ONS) publish rolling quarterlyestimates, and these have been assessed to be of sufficient quality to be badged as ‘NationalStatistics’. ONS also publish monthly estimates of a selection of labour force variables, but theseare designated ‘Experimental Statistics’ due to concerns over the quality of these data. Due to thesample design of the LFS,monthly estimates of change are volatile as there is no sample overlap.A state spacemodel can be used to develop improved estimates ofmonthly change, accountingfor aspects of the survey design. An additional source of information related to unemployment isadministrative data on the number of people claiming unemployment related benefits. This datais more timely than survey data collected in the LFS and can be used to provide early estimatesof monthly unemployment.

ARTICLE HISTORYReceived 31 January 2019Revised 30 September 2019Accepted 1 October 2019

KEYWORDSUnemployment; rotationgroup bias; state spacemodels

1. Introduction

The Office for National Statistics publish monthlystatistical bulletins on the UK labour market thatinclude estimates of unemployment (see for exampleONS, 2016b). For a statistical bulletin published inmonth t the latest available estimates of unemploy-ment relate to the quarter covered by months t−4 tot−2, and estimates of changes for these variables lookat the change from the quarter covered by monthst−7 to t−5. The aim of this paper is to developmethods to improve estimates of monthly change interms of two important quality measures, accuracy andtimeliness.

The current Labour Force Survey (LFS) is designedto provide accurate quarter on quarter estimates ofchange due to a rotation pattern where given no non-response there is an 80 per cent overlap in respon-dents. There is user demand formore timely and higherfrequency estimates of unemployment and possibleoptions are considered by Steele (1997). As a result ofthose investigations a decision was made to publishrolling quarterly estimates, as a full redesign of the sur-vey was considered to be too expensive to implement,as described by Werner (2006) who notes that the pro-posed monthly redesign would have cost an additionalseven to eight million pounds. Therefore, while there isa regular monthly publication, the main headline datais essentially quarterly data.

The LFS is a continuous survey as respondentsare interviewed each week of the year. Rather thanchanging the survey design an alternative option formonthly estimates is to account for the rotation pat-tern of the survey and derive monthly estimates froma state space model. State space models used for esti-mation in this paper are comprehensively discussedin texts such as Harvey (1990), Durbin and Koop-man (2001) and Commandeur and Koopman (2007).Statistics Netherlands and the Bureau of Labor Statis-tics in the US both use state space models for esti-mation of some labour force variables such as unem-ployment (see for example, Tiller, 1992; van den Brakel& Krieg, 2015). Other authors have explored the useof state space models using LFS data from the Aus-tralian Bureau of Statistics (ABS), such as Pfeffermann,Feder, and Signorelli (1998), for the Brazilian LFS, Silvaand Smith (2001) and also for the UK, such as Harveyand Chung (2000).

Pfeffermann et al. (1998) present a method for esti-mating the survey error autocorrelation from timeseries of panel estimates, that for the UK context arecalled wave-specific estimates. While the correlationof the survey errors could be estimated from the sur-vey data, as is suggested in Tiller (1992) the pseudo-survey error autocorrelationmethod described by Pfef-fermann et al. (1998) does not require this. The benefitof the pseudo-survey error autocorrelation method is

CONTACT D. J. Elliott [email protected]

© East China Normal University 2019

Page 3: Improving timeliness and accuracy of estimates from the UK

STATISTICAL THEORY AND RELATED FIELDS 187

the simplicity of the approach as the data required fordirect estimationmaynot be readily available.However,alternative methods could be used, including directestimation as part of a time series model.

Harvey and Chung (2000) develop bivariate statespace models for estimatingmonthly changes in unem-ployment for theUKusing LFS data and claimant countdata (an administrative data source) with a simpli-fied approximation for the correlation deriving it fromestimated variances of levels and change. The modelspresented in Harvey and Chung (2000) demonstratehow using the claimant count, which does not measurethe same thing as unemployment, along with surveyestimates of unemployment can improve estimates ofchange due to correlation in the error terms of themodelled population processes.

van den Brakel and Krieg (2009) following theapproach by Pfeffermann (1991) take a multivariateapproach to develop state space models for monthlyunemployment in the Dutch LFS, which has a verysimilar sample design to that of the UK LFS. Theirmodels are further developed in van den Brakeland Krieg (2016) to include auxiliary administrativedata related to unemployment, which as with Harveyand Chung (2000), they find improves the precisionof monthly estimates. In the multivariate time seriesmodel developed in van den Brakel and Krieg (2009),monthly wave-specific design-based estimates anddesign-based estimates of variance are used as inputsinto the model. The use of the design-based variancesare to account for the changing variance over time. Thisis important in the UK context also, as survey monthsare either four or fiveweeks long, and therefore the sam-ple sizes vary by month, and over time there has alsobeen an increase in non-response.

Section 2 provides relevant information on the LFSthat inform the structure of the models. Section 3describes the methods used for estimation of the inputsinto the model, and developments of the model from asimpler univariate model tomore complexmultivariatemodels that address particular issues in the data fromsurvey error autocorrelation, potential bias, and incor-porating administrative data to increase the timelinessas well as the accuracy of unemployment estimates.The models are evaluated in Section 4, while Section 5concludes.

2. Labour force survey

The Labour Force Survey provides among other thingsestimates of the number of people in different categoriesof labour market status (employed, unemployed andinactive) for the target population of people resident intheUK. The LFS has collected data since 1973, althoughthe survey has changed a number of times since thenin terms of the data collected and also the frequencyat which data is collected. Since 1992 the data collected

has been essentially quarterly, although fromApril 1998a monthly statistical release has been published whichincludes rolling-quarterly data for the variables stud-ied here (employment, unemployment and inactivity),which is made possible due to the continuous collec-tion. A history of the LFS can be found in ONS (2016a)and Werner (2006).

This section provides a brief overview of importantaspects of the Labour Force Survey (LFS) that are rel-evant to the development of the time series modelsdeveloped in later sections, in particular the LFS sampledesign and estimationmethodswhich are used to createthe input data (discussed further in Section 3). As oneof the aims of developing models for monthly estimatesis to improve their timeliness, the current publicationschedule of the main LFS variables are also reviewed.

2.1. Sample design

The LFS is a quarterly survey with a rotating paneldesign. Respondents in a panel remain in the survey forfive successive survey periods, where a survey periodis composed of thirteen weeks (labeled ‘stints’, one tothirteen) and respondents asked for their labour mar-ket status in the same stint at each survey period. Anyquarterly sample period covers the UK population; forsamples at frequencies less than quarterly there is aclustering effect due to the survey stints. Assumingfull response there should be an eighty per cent over-lap in the sample at any thirteen week lag (or onequarter lag) and a twenty per cent overlap in the sam-ple at a four quarter lag. One of the main reasonsfor this design is to use the sample overlap in orderto reduce the variance of estimates of quarterly andannual changes. While in theory it is possible to createweekly or monthly estimates (with a month composedof either four weeks or five weeks and not a strict calen-dar month), given the smaller sample sizes and also thelack of sample overlap at a weekly or monthly lag, esti-mates of change at a frequency higher than quarterly arevolatile.

As households are in the sample for five successivequarters, each quarter (month or week) it is possibleto identify the number of periods the household hasalready been in the sample for. Grouping the data in thisway wave 1 households are those that are selected forthe first time, wave 2 households are included for thesecond time and were initially selected thirteen weeksprior to that date and so on. Therefore households inwave 5 will have been selected at four previous peri-ods at thirteen week intervals. When drawing a samplefor any particular quarter, households for waves 2 to 5will have already been determined by the periods whenthey were selected to be in wave 1. The wave 1 sam-ple is drawn using a fixed interval of a geographicallyordered sampling frame. Further details of the sampledesign and selection can be found in ONS (2016a).

Page 4: Improving timeliness and accuracy of estimates from the UK

188 D. J. ELLIOTT AND P. ZONG

At wave 1 around 17,380 households are selectedfor the sample. Therefore, around either five and halfor six and half thousand households are selected persurvey month (either four or five weeks in length).Response rates in wave 1 tend to be slightly higher ataround fifty five per cent and slightly lower for sub-sequent waves at around forty to forty five per cent(ONS, 2016a, Table 3.3). Responses are requested for alleligible people in the household.

Prior to quarter three 2010, the main LFS was anequal probability sample. However, since then somemodifications to the design, such as rolling forward thewave 1 responses of households where all occupants areover the age of seventy five for all subsequent wavesand interviewing only one household from addresseswith multiple households means that this is no longerthe case. Typically responders in wave 1 are interviewedface-to-face with a computer assisted personal inter-view (CAPI), and in waves 2 to 5 responders are tele-phoned for a computer assisted telephone interview(CATI).

The important aspects of the sample design andpotential for possible forms of bias due to the modeof collection are used for the development of the timeseries models for survey error and observed systematicdifferences between wave-specific estimates.

2.2. Estimationmethods

Estimates of employment, unemployment and inactiv-ity use calibrated weights that, for the sample, sumto certain known population totals. The calibrationgroups used for the regular rolling-quarterly data arerelatively detailed due to reasonable sample size avail-able for groups. Three calibration groups are used forthe LFS including, geographical groups based on LocalAuthority Districts, an age by sex classification anda cross-classification using sex, age and GovernmentOffice Regions. Full details of the calibration groupsused can be found in ONS (2016a, Section 10). Thecalibration helps to deal with potential non-responsebias and can also reduce the standard errors of esti-mators. Population totals are derived from populationprojections.

Where there is non-response by an individual ina particular wave a response is imputed from theirresponse in the previous wave if a valid response wasprovided otherwise the case is ignored and weights areappropriately adjusted. This roll-forward imputationonly occurs once. If non-response continues no imputa-tion is done and the unit is ignored. Other than the cal-ibration there is no additional non-response weightingin the LFS.

Estimates of accuracy are produced for the mainLFS variables using a jacknife linearization methoddescribed by Skinner and Holmes (2011) rather thanthe methods described in ONS (2016a, Annex B). The

jackknife estimation method is preferred by Skinnerand Holmes (2011) as it may deal more effectively withincreases in variance due to varying survey weights.

2.3. Publications

Eachmonth a statistical bulletin on the UK labourmar-ket is published which provides estimates of employ-ment, unemployment, inactivity and other employmentrelated statistics, many of which are derived from theLFS, see for example ONS (2016b). The statistical bul-letin starts with a section on the main points for thelatest three months. The main points usually includethe number of people estimated to be in employmentand unemployment and how much more or less thelatest three month estimate is compared to the threemonth period from three months previously and also ayear ago. The figures presented are seasonally adjustedusing the seasonal adjustment software X-13ARIMA-SEATS, Time Series Research Staff (2017). The methodof seasonal adjustment used is the X-11 method.

Following the main points section a summary ofthe latest labour markets statistics is given, whichagain focuses on current level of variables, rates andalso changes on previous periods (essentially quarterlyand annual changes). Further on in the bulletin moredetailed analysis of the variables is provided, for exam-ple, plots of the seasonally adjusted estimates over timewith additional commentary.

Towards the end of the bulletin there is a sectionon accuracy of the estimates, which provides links todata sets on sampling variability with confidence inter-vals for estimates of levels, rates and changes. It shouldbe noted that the estimates of sampling variability arefor the non-seasonally adjusted series rather than theseasonally adjusted estimates that form the main head-line figures. Standard errors for seasonally adjustedestimates are not published.

A separate article is published eachmonth for exper-imental single month LFS estimates, for example seeONS (2017). The article stresses that due to the smallsample sizes and no sample overlap the estimates ofmonthly change are not reliable estimates of currenttrends in the labour market. The stated main pur-pose of the single month estimates is to help interpretmovements in the headline three-month average rollingquarterly data. As such the single month estimates arebenchmarked to the rolling quarterly data and chartsincluded in the article compare the single month withthe three-month moving average data.

3. Methods

Estimation of the underlying parameters of inter-est involves initial estimates of input data for themodel followed by the use of a time series model.The input data for the model include design-based

Page 5: Improving timeliness and accuracy of estimates from the UK

STATISTICAL THEORY AND RELATED FIELDS 189

monthly wave-specific estimates of unemployment andestimated variances and also estimates of the sam-ple error autocorrelation. Time series models use thewave-specific estimates and their associated variancesas input data for the models. The sample error auto-correlation can either be estimated as a hyperparame-ter in the model, or derived prior to the model usinga design-based estimate or the pseudo-survey errorautocorrelation approach of Pfeffermann et al. (1998).The models assume that the observed time seriesare comprised of unobserved components that areestimated with the Kalman filter in the state spaceframework.

3.1. Input data

Wave-specific time series are estimated using a sim-plified version of the calibration weighting used forthe published rolling-quarterly data.Wave-specific esti-mates have been estimated from January 2002 onwards,due to the ease of access of the sample data. Thewave-specific estimates can be averaged to providesingle month time series. It should be noted thatthe averaged monthly wave-specific estimates differslightly to the published single month estimates asthe method of estimation differs. Information on thecurrent methods for single month estimates can befound in ONS (2017). The sample at any time periodless than a quarter is clustered. The variance estima-tion for the single month wave-specific estimates isdone using the linearized jacknife estimator of Skin-ner and Holmes (2011) accounting for the clustering,stratification and calibration.

Due to the rotation pattern of the LFS, where intheory there should be an eighty per cent overlap inrespondents between a survey reference week τ andsurvey reference week τ − 13, there is expected to bea correlation structure to the sampling error. The sam-pling error autocorrelation of an estimator yτ from theLFS, is

Cor(yτ , yτ−k) ={

ρτ ,k for k ∈ {0, 13, 26, 39, 52}0 for k /∈ {0, 13, 26, 39, 52}

(1)

Monthly and quarterly estimates are averages of weeklyestimates. A quarterly estimate is the average of a thir-teen week period, while a monthly estimate is an aver-age of either a four or five week period with threemonthly estimates comprising a quarter following afour week plus four week plus five week pattern. If zτis an average of weeks τ to τ + l then if we assumethat the sampling errors follow a stationary processwith Var(yτ ) = σ 2 and Cov(yτ , yτ−k) = σk then formonthly or quarterly estimates as l ≤ 13 and onlyconsidering lags k > l (that is to say not considering

overlapping rolling periods),

Cor(zτ , zτ−k) =⎧⎨⎩

σk

σ 2 = ρk for k ∈ {0, 13, 26, 39, 52}0 for k /∈ {0, 13, 26, 39, 52}

(2)

The expected value of ρk given no non-response andwhere there is no change in employment status forthose respondents responding at τ and τ − k is givenby the proportion of sample overlap at k; ρ13 = 0.8. Inpractice there is non-response and the methods usedfor imputation as well as the probability of changingemployment status between any two periods will meanthat the estimated correlation is unlikely to be the the-oretical sample overlap. Nevertheless, we may expect tosee higher correlation in variables such as employmentand inactivity, compared to unemployment, dependingupon the labour market conditions. For example, thosedefined as unemployed should be actively seekingwork,and they may be more likely to change labour marketstatus than those who are employed, or those who areinactive (not actively seeking employment).

Pfeffermann et al. (1998) describe a method of esti-mating what they term pseudo-survey error autocorre-lations. Correlation is calculated not directly from thesample data but from the estimated panel time series,that in this paper are referred to as wave-specific esti-mates. While we can calculate the pseudo-survey errorautocorrelation for the series, ρk, at lags k it is alsopossible to estimate the cross correlation betweenwave-specific time series. For example, we may expect zerocorrelation at negative lags between the wave 1 sam-pling errors and other waves as this is the first time thatthose panel members are included in the sample. Wewould expect a correlation between wave 2 and wave1 sampling errors at lags of three months due to theexpected eighty per cent sample overlap, a slightly lowercorrelation between wave 3 and wave 1 sampling errorsat lags of six months due to the expected sixty per centover lap and so on. These pseudo-cross correlations willbe used in the development of the multivariate timeseries models.

The Pfeffermann et al. (1998) method of estimat-ing the pseudo survey error autocorrelation includesterms to address potential rotation group bias, wherebywave-specific time series may have a bias in the level,which could be due, for example, to the mode of datacollection.

3.2. Time seriesmodels

Three models are compared; a univariate basic struc-tural model of a single month estimate, accountingfor survey error autocorrelation and a changing vari-ance over time, a multivariate model using only wave-specific monthly data from the LFS, and an extension

Page 6: Improving timeliness and accuracy of estimates from the UK

190 D. J. ELLIOTT AND P. ZONG

to this multivariate model that incorporates an admin-istrative data source related to unemployment calledthe claimant count. A brief description of the models isgiven in this section with details onmodel specificationin state space form provided in Appendix. Common toall of the models explored, it is assumed that the popu-lation process,Yt , follows a basic structuralmodel givenby 4, that allows for stochastic trend, stochastic seasonaland noise. Models for the survey errors are developedto account for the changing variance over time and alsofor the survey error correlation. For all models a gen-eralised representation is given, but Section 4 assessesrestrictions on the models, to test whether certain vari-ables are required and whether a more parsimoniousmodel is sufficient.

All models have been fitted using the KFAS pack-age (Helske, 2017) in R (R Core Team, 2016). Non-stationary components have diffuse initialization ofthe Kalman filter, while the unobserved states for thesampling errors have a non-diffuse initialization withmean zero and variance of one. Hyperparameters forthe models are estimated using maximum likelihoodwith the quasi-Newton method BFGS from the optimfunction in R. Uncertainty arising from the replace-ment of the unknown true hyperparameters with esti-mates from maximum likelihood are not accountedfor in the standard errors of the estimated unobservedcomponents.

3.2.1. UnivariatemodelMonthly wave-specific estimates, y(i)

t , for i = 1, 2, . . . ,5, have independent samples at time t across waves i,so can be averaged to give a single month estimate,yt . It is assumed that this is comprised of an unob-served population process, Yt , and an autocorrelatedsurvey error process, et . The survey error is modelledas an AR([3]) process, which denotes an autoregres-sive process of order three where the coefficients onlag one and two are equal to 0. Design-based varianceestimates, k2t , are also used as input data for the modelto account for changing variance over time, in a simi-lar way to the multivariate model discussed in van denBrakel and Krieg (2009).

yt = Yt + et , (3)

where the population process follows a basic structuralmodel given by

Yt = Lt + St + It , (4)

with the trend component, Lt given by

Lt = Lt−1 + Rt−1 + wLt , wL

t ∼ N(0, σ 2L ), (5)

Rt = Rt−1 + wRt , wR

t ∼ N(0, σ 2R), (6)

the seasonal component by

St =j=6∑j=1

Sjt (7)

where for i = 1, . . . , 5

Sjt = Sj,t−1 cos λj + S∗j,t−1 sin λj + wS

jt ,

wSjt ∼ N(0, σ 2

Sj),

S∗jt = −Sj,t−1 sin λj + S∗

j,t−1 cos λj + wS∗jt ,

wS∗jt ∼ N(0, σ 2

Sj),

and for j = 6

Sjt = Sj,t−1 cos λj + wSjt , wS

jt ∼ N(0, σ 2Sj),

with wSjt and wS∗

jt uncorrelated, and unless otherwisespecified σ 2

Sj = σ 2S for all j.

The irregular component is given by

It = wIt , wI

t ∼ N(0, σ 2I ), (8)

and the survey error, assuming an AR([3]) process as

et = φ1et−3 + εt , εt ∼ N(0, (1 − φ2)σ 2ε k

2t ). (9)

For the purpose of estimation in the state space frame-work a similar approach to that introduced by Pfef-fermann (1991) and also described by van den Brakeland Krieg (2009) is used where et = ktet , which leadsto

et = φ1et−3 + εt , εt ∼ N(0, (1 − φ21)σ

2ε ) (10)

where an estimate of σ 2ε should be approximately equal

to 1.Alternative models for the error or population pro-

cesses could also be experimented with. Under theassumption that any rotation group bias is approxi-mately constant across time estimating the correlationwith a univariate model should not be a problem asany bias should only effect the average level of the timeseries estimates. Note that φ1 can either be estimatedas part of the model, or be estimated using the pseudo-survey error autocorrelation approach or directly fromthe survey data.

3.2.2. MultivariatemodelThe multivariate model using only LFS data assumesthat the five wave-specific monthly estimates, y(i)

t fori = 1, 2, . . . , 5, are comprised of the common stochasticpopulation process,Yt , andwave-specific survey errors,e(i)t . For this model the wave-specific survey error termsare assumed to be comprised of a rotation group biasterm, and sampling error which accounts for correla-tion in the error terms due to sample overlap. The rota-tion group bias terms are introduced as the observed

Page 7: Improving timeliness and accuracy of estimates from the UK

STATISTICAL THEORY AND RELATED FIELDS 191

wave-specific time series differ in level. Given no com-pelling evidence that the current level of variables arebiased the model for the bias terms assumes that thewave-specific bias terms sum to zero.

Wave-specific design-based variance estimates,(k(i)

t )2 for i = 1, 2, . . . , 5, are also used as input data forthe model to account for changing variance over time,as discussed in van den Brakel and Krieg (2009). Themultivariate model is

y(i)t = Yt + e(i)t (11)

where

e(i)t = b(i)t + ε

(i)t (12)

b(1)t = −

5∑i=2

b(i)t (13)

b(i)t = b(i)

t−1 + wb(i)t , wb(i)

t ∼ N(0, σ 2b(i)),

for i = 2, . . . , 5 (14)

ε(i)t = φ

(i)1 ε

(i−1)t−3 + ε

(i)t ,

ε(i)t ∼ N(0, (1 − φ2

i )σ2ε(i)(k

(i)t )2). (15)

Note that in the models σ 2ε(i) is estimated and should

be approximately equal to 1 and with φ1 = 0, thereforeε(1)t = ε

(1)t and Var(ε(1)

t ) = (k(1)t )2.

The model for the wave-specific error is comprisedof a wave specific bias term, b(i)

t , which in this casewe have assumed sums to zero over all waves, and awave-specific sampling error. Part of the reason for thisassumption is that this is the implicit assumption in thecurrent method of estimation for rolling-quarterly dataas no adjustment is made to data for particular wavesand the estimator is not considered to be biased. Thisdiffers from the assumption made by van den Brakeland Krieg (2009) where wave one is assumed to have nobias and all subsequent waves have a potential bias dueto systematic differences in the levels of wave-specificestimates. Alternative models for bias could be consid-ered as discussed in Elliott, Zong, Greenaway, Lacey,and Dixon (2015), which provides some limited evi-dence thatwave one estimates are not found to be biasedfor the UK. However, it is not possible to conclude fromthis that other waves are biased and so it is assumedthat the bias sums to zero across all waves. The wavespecific sampling error, ε

(i)t , follows an AR([3]) pro-

cess which leads to Var(ε(i)t ) = (1 − (φ

(i)1 )2)(k(i)

t )2. Forthe purpose of estimation in the state space frame-work accounting for the design-based variances, (k(i)

t )2,the same approach as described by van den Brakeland Krieg (2009) is used where ε

(i)t = k(i)

t ε(i)t , which

leads to

ε(i)t = φ

(i)1 ε

(i−1)t−3 + ε

(i)t ,

ε(i)t ∼ N(0, (1 − (φ

(i)1 )2)σ 2

ε(i)) (16)

where again estimates of σ 2ε(i) should be approximately

equal to 1 for all i.

3.2.3. Multivariatemodel with claimant countThe multivariate model can be developed further,by incorporating additional administrative data thatrelates to the variable of interest, which is a similarextension performed by Harvey and Chung (2000) indeveloping a bivariate model. For example, claimantcount data is a measure of those claiming out of workbenefits, and while it is not a measure of unemploy-ment based on ILO definitions there is the potential togain additional accuracy in monthly changes by bene-fiting from any correlation in unobserved componentsas shown by Harvey and Chung (2000) for the case ofa bivariate model. The multivariate model presentedabove can be extended to include observed claimantcount data as part of the vector of observations yt =(y(1)

t , y(2)t , . . . , y(5)

t , xt).If it is assumed that xt follows a basic structural

model (not the common population process for y(i)t )

then covariance terms for the errors in the trendand seasonal components can be incorporated to cap-ture similar movements in the series. To distinguishbetween population process parameters for unemploy-ment (Yt) and claimant count, superscript x is used.Hence, Equation (4) for claimant count is given by

xt = Lxt + Sxt + Ixt , ext ∼ N(0, σ 2Ix). (17)

Any relationship between the population processfor unemployment and claimant count is based onthe covariance in error terms, Cov(wL,wLx) = σLLx ,Cov(wR,wRx

) = σRRx , Cov(wI,wIx) = σIIx , and Cov(wS,wSx) = σSSx ,.

4. Results

4.1. Survey error autocorrelation

Figure 1 shows the estimated pseudo-survey errorautocorrelations and partial autocorrelations for eachvariable using UK data. Due to the periods of sam-ple overlap, in the following descriptions a lag ofl should be read as a lag of 3l months. The esti-mate of autocorrelation at lag 1 (three months) fromthe pseudo-survey error autocorrelation approach is0.46. Similar values are obtained by estimating it asa hyperparameter in a univariate state space model(0.4) and the approximation from quarterly design-based variance estimates (0.42). This approximationfrom design-based estimates uses the approxima-tion in Harvey and Chung (2000) where ρk ≈ 1 −var(yt − yt−k)/2var(yt). The patterns in the autocor-relation and partial autocorrelations are typical of anautoregressive process of order 1. It should be notedthat estimating φ in the univariate model, or using theapproximation from design-based estimates does not

Page 8: Improving timeliness and accuracy of estimates from the UK

192 D. J. ELLIOTT AND P. ZONG

AC

PAC

5202510150

0.0

0.2

0.4

0.0

0.2

0.4

Lag

Cor

rela

tion

Figure 1. Pseudo-survey error autocorrelation and partial autocorrelation for UK unemployment ages 16 and over. Note a lag of oneis three months as there is no sample overlap between month, so lag l, corresponds to a lag of 3lmonths.

Table 1. Pseudo survey error correlations for unemployment16 plus by wave. Correlation at lag l for wave i is Cor(eit , e

i−lt−3l).

Lag 1 Lag 2 Lag 3 Lag 4

Wave 2 0.57 – – –Wave 3 0.54 0.45 – –Wave 4 0.52 0.20 0.31 –Wave 5 0.63 0.32 0.13 0.24

account for any rotation group bias. For this reasonthe multi-variate models use the pseudo-survey errorautocorrelation. The alternatives methods in the uni-variate models are presented only to demonstrate thesimilarities in the estimates.

Table 1 shows estimates of wave-specific correlation.There is some evidence of correlation between wavestwo to five and wave one at the appropriate lag. Thecorrelation between waves is not as high as suggestedby sample overlap, which is not surprising as part of thedefinition of unemployment is actively seeking work. Ifsimilar calculations are done for employment or inac-tivity variables, the correlation tends to be greater. Notewhen the correlations are calculated on moving win-dows they are relatively stable over time.

4.2. Univariatemodels

Table 2 shows estimated hyperparameters for varia-tions on the univariate model, while Table 3 presentsstandard diagnostics for these models. Other varia-tions on model 1 were also tested but have not been

Table 2. Estimated hyperparameters for variations on model 1(univariate) for UK unemployment aged 16 and over.

Variables Model1a Model1b Model1c

σ 2L 2.213e−04

σ 2R 5.229e−05 6.426e−05 6.426e−05

σ 2S 4.208e−37 2.841e−37

φ 4.059e−01 3.716e−01 3.716e−01AIC −3.687e+02 −3.703e+02 −3.723e+02

Table 3. Diagnostics based on standardised one-step aheadforecast errors for variations of the univariate model of UKunemployment aged 16 and over including skewness (S), kur-tosis (K), Bowman and Shenton test for normality (N), testfor Heteroscedasticity (H(h), h indicates sample size), lags thatare found to be significant at the 5% level using a Ljung-BoxQ(k) test for serial correlation at lag k. See Durbin and Koop-man (2001, Section 2.12.1) for test details.

S K N H(62) lags Q(k) at 5%

Model 1a −0.041 3.150 0.227 1.037Model 1b −0.053 3.125 0.211 1.002Model 1c −0.053 3.125 0.211 1.002

included in the results (these included testing modelsthat included variance parameters σ 2

I and σ 2ε neither

of which gave improvements to the models in Table 3).There is no evidence of lack of normality, heteroscedas-ticity or autocorrelation in the standardised one-stepahead forecast errors. AIC suggests a model withoutvariance terms for the level or seasonal is preferred.Much of the volatility in the unemployment series is dueto the sampling error, and seaosnality is not particularlystrong in the series, although it appears to be stable.

Page 9: Improving timeliness and accuracy of estimates from the UK

STATISTICAL THEORY AND RELATED FIELDS 193

Table 4. Diagnostics based on standardised one-step aheadforecast errors for the univariate models of claimaint (withoutlevel shift variables, model 1d and with level shift variables,model 1e) including skewness (S), kurtosis (K), Bowman andShenton test for normality (N), test for Heteroscedasticity (H(h),h indicates sample size), lags that are found to be significant atthe 5% level using a Ljung-Box Q(k) test for serial correlation atlag k. See Durbin and Koopman (2001, Section 2.12.1) for testdetails.

Model S K N H(79) lags Q(k) at 5%

Model 1d 0.505 7.487 208.909 1.616 3 to 24Model 1e 0.022 3.273 0.758 1.286

A univariate model was also tested for the claimaintcount series to help test subsequent extensions to themultivariate model. The claimant count series followsa similar path to the unemployment data, but is sig-inficantly less volatile, and seasonality is more evident.Moreover, there is a steeper rise in claimant count thanunemployment going into the Great Recession in par-ticular theOctober 2008 to February 2009 periodwhichcauses significant problems for the model in termsof lack of normality, and residual autocorrelation. Asequential step-indicator saturation approach similar toHendry, Doornik, and Pretis (2013) is used to identifyshifts in the level of the series, which can be imple-mented as j impulse indicator variables (dj,t) in the stateequation for the trend component such that

Lxt = Lxt−1 + Rxt−1 +∑j

βjdj,t−1 + wLxt (18)

and coefficients βj are the magnitude of the level shifts.The variables dj,t are selected by testing the significanceof the estimated coefficients by saturating the first halfof the sample, followed by the second half of the sam-ple, and then a sequential selection retaining only thosevariables that are found to be statistically significant atthe 0.1% level. This step-indicator saturation approachindientified level shifts at October 2008 (of magnitude48,234), November 2008 (40,206), January 2009 (104,929) and February 2009 (31,944). Table 4 shows theimprovements in the model diagnostics from includingthe level shift variables.

The seasonality in the claimaint count series is notas stable as that found in the unemployment series.In order to capture the nature of the evolving sea-sonality sufficiently, frequency-specific seasonal vari-ances are required, as described in Hindrayanto, Aston,Koopman, and Ooms (2013). Based on model selec-tion using AIC, frequency-specific variances includeσ 2S1 , σ

2S2 , σ

2S3,4 , σ

2S5,6 where i and j, in σ 2

Si,j indicate theseasonal frequencies that have been grouped. Furtherinvestigations of alternative models for capturing sea-sonality in the claimant could would be interesting, asit is a stock recorded on the second Thursday of a cal-endar month, but are not pursued here as the focus

Table 5. Estimated hyperparameters for multivariate models:model 2 (independence between unemployment and claimantcount), model 3 (covariance terms for both level and slopeerrors) andmodel 4 (covariance between slope error terms only)of UK unemployment aged 16 and over and claimant count.

Model 2 Model 3 Model 4

σ 2I 3.079e−08

σ 2R 9.024e−05 4.254e−05 4.255e−05

σ 2Ix 3.995e−06 3.326e−06 3.268e−06

σ 2Rx 4.034e−05 3.803e−05 3.826e−05

σ 2Sx1

5.781e−07 3.136e−06 3.155e−06

σ 2Sx2

1.404e−06 1.332e−06 1.333e−06

σ 2Sx3,4

2.456e−07 2.417e−07 2.417e−07

σ 2Sx5,6

5.753e−08 5.961e−08 5.930e−08

Cor(wIt ,w

Ixt ) −1.114e−01

Cor(wRt ,w

Rxt ) 9.707e−01 9.708e−01

AIC −2.367e+03 −2.407e+03 −2.407e+03

is unemployment, and the model for our purposes issufficient.

4.3. Multivariatemodels

A similar model selection process to that used for theunivariate model was used to identify variance parame-ters to include in amultivariate model for wave-specificestimates of unemployment, with thewave-specific cor-relation parameters estimated using the pseudo-surveyerror autocorrelation approach. The model is then esti-mated including the claimant count model describedin the univariate results section and compared to alter-native models that account for possible correlationbetween the population process for unemployment andclaimant count. Three variations of the multivariatemodel are compared and the hyperparameters and AICfor each are given in Table 5.

Model 2 treats the multivariate model for unem-ployemnt and themodel for claimant count as indepen-dent, whereas model 3 and 4 include covariance terms.Note that in order to test for correlation between theerror terms in the irregular component (Cor(wI

t ,wIxt ))

the population irregular variance for the claimant count(σ 2

Ix) has been moved into the state equation. The addi-tion of covariance terms improves the model, but thereis not strong evidence for including covariance termsfor both the irregular and the slope. Similarly varianceand covariance terms for other components have beentested and are not required. The AIC for model 3 andmodel 4 are very similar. However, using a likelihoodratio test there is no evidence that inclusion of the addi-tional terms in model 3 is an improvement over model4, and therefore a covariance term for the slope errorterms only is included with an estimated correlationof 0.97.

Table 6 shows diagnostics for the standardised resid-uals of themultivariatemodel including claimant count(model 4) which suggests some slight evidence of het-eroscedasticity in waves 2 to 3, some evidence of lack

Page 10: Improving timeliness and accuracy of estimates from the UK

194 D. J. ELLIOTT AND P. ZONG

Table 6. Diagnostics for model 4 (multivariate) of UK unem-ployment aged 16 and over including claimant count for skew-ness (S), kurtosis (K), test for normality (N), test forHeteroscedas-ticity (H(h)), and lags k for which Ljung-Box Q(k) is significantat the 5% level from the first 24 lags. See Durbin and Koop-man (2001, Section 2.12.1) for test details.

Wave S K N H(62) lags Q(k) at 5%

1 0.012 3.006 0.005 0.972 18 to 242 −0.377 3.609 7.352 1.5473 −0.271 2.932 2.345 0.6454 −0.155 3.196 1.058 0.7095 0.021 2.557 1.550 1.005Claimant count 0.059 3.053 0.130 1.670 21 to 24

of normality for wave 2 and some residual autocorre-lation for lags 18 to 24 for wave 1 and lags 21 to 24in the claimant count. A plot of the cross-correlationof the residuals does not show any strong evidence ofcorrelation at any lags between the different waves andthe Doornik-Hansen test for multivariate normality onthe standardised residuals from model 4 has a value of13.61 (p-value of 0.33). Diagnostics for models 2 and 3are very similar and so are not included here.

Figure 2 shows thewave-specific estimates for unem-ployment along with claimant count. The wave-specificestimates are more volatile than published rolling-quarterly data due to the smaller sample sizes. Theclaimant count is significantly smoother, and the sea-sonality ismore evident. The level of the claimant countis lower than the estimates for unemployment (notall people who are defined as unemployed using theInternational Labour Organization definition are eligi-ble to claim benefits), but follows similar movements inthe trend. Figure 2 also shows the smoothed estimatesof the level of unemployment, Lt |T (note that T =October 2018, although survey data is only available to

September 2018) and claimant count (Lxt |T), frommul-tivariate model 4, including claimant count, which asshown by the prediction interval provides a more accu-rate estimate of the underlying level of unemploymentthan using wave specific estimates.

In a regular publication schedule for monthly data,there is often greater user interest in the latest esti-mates of change. With smoothed estimates, these willbe revised for historical data as new input data becomeavailable. Therefore, in order to give an indication ofthe accuracy of estimates of change from the perspec-tive of a regular monthly publication, the filtered esti-mates of change are examined. Figure 3 shows the fil-tered estimate of monthly change in unemploymentfrom the multivariate model 4 which includes claimantcount, and shows the gain in accuracy for estimatesof change. Using filtered estimates of change in thelevel of the population process, Rt | t , much of the timefor the period from 2008 onwards monthly change isnot significantly different from zero at the five per-cent level. However, by August 2008 the filtered esti-mate of the level shows three consecutive periods ofmonthly increases that are significantly different fromzero at the five per cent level. If seasonally adjustedsingle month estimates were used then there are noperiods since 2008 where there are three consecutivemonthly increases (or decreases) that are significantlydifferent from zero at the five percent level (using thenon-seasonally adjusted standard error as an approxi-mate standard error for the seasonally adjusted series).It is interesting to note that using the rolling-quarterlydata three consecutive periods of significant increasesare observed a month later in September 2008, suggest-ing that the modelled single month estimates may beuseful for more timely detection of turning points.

Time

Tho

usan

ds

2008 2010 2012 2014 2016 2018

1000

1500

2000

2500

3000

LFS wave 1−5

Claimant countSmootheded levelof unemploymentSmootheded levelof claimant count

Figure 2. LFSwave-specific estimates and smoothed estimate of the level of UKunemployment ages 16 andover fromamultivariatemodel (model 4) including claimant count (grey shaded area is a ninety five percent prediction interval), and level of claimant count.

Page 11: Improving timeliness and accuracy of estimates from the UK

STATISTICAL THEORY AND RELATED FIELDS 195

Time

Tho

usan

ds

2008 2010 2012 2014 2016 2018

−30

0−

200

−10

00

100

200

300 LFS single month

Claimant countFiltered estimateof unemployment

LFS single month

Claimant countFiltered estimateof change in unemployment

Figure 3. Month onmonth change for LFS singlemonth estimate (seasonally adjusted) and filtered estimate of change of UK unem-ployment ages 16 and over from amultivariate model including claimant count (grey shaded area is a ninety five percent predictioninterval), and monthly change in claimant count (seasonally adjusted).

Time

Tho

usan

ds

2008 2010 2012 2014 2016 2018

020

4060

8010

0

LFS single monthModel 1cModel 4

Figure 4. Standard errors of smoothed change in trend from univariate model (model 1c), multivariate model including claimantcount (model 4) and original single month, of UK unemployment aged 16 plus.

Figure 4 shows the gain in accuracy of the differentmodels compared to the single month estimates. Theinclusion of claimant count into themodel improves theestimated accuracy of the change in the level compo-nent for unemployment. This is due to the correlationin the error terms for the slope components of unem-ployment and the claimant count. In Figure 4 themulti-variate model provides an estimate one month ahead ofthe univariate model as survey data is not yet availableand therefore the estimate from the Kalman filter forthemultivariatemodel borrows some strength from theclaimant count which is available one period on fromcurrent single month LFS data.

5. Conclusions

The aim of using time series models is to improvethe accuracy and timeliness of single month estimatesof unemployment. A simple univariate model thataccounts for the survey error autocorrelation struc-ture makes an improvement in accuracy and increasesthe timeliness compared to the published rolling-quarterly data by one month. Further improvementsin accuracy can be gained by using a multivariatemodel that accounts for both the survey error auto-correlation as well as rotation group bias. An exten-sion to the multivariate model that includes claimant

Page 12: Improving timeliness and accuracy of estimates from the UK

196 D. J. ELLIOTT AND P. ZONG

count also improves the accuracy further and alsoenables an arguably improved one-step ahead fore-cast of unemployment by drawing on the relationshipbetween unemployment and the claimant count, whichwould provide an estimate of the level two monthsahead of the current rolling-quarter. The gain in accu-racy from all models, but particularly the multivari-ate models, improves the estimates of monthly changewhich can provide a more timely detection of turningpoints.

It is interesting to note that there appears to bevery little gain in accuracy when extending the univari-ate model to a multivariate model excluding claimantcount, with the larger gains in accuary occuring whenclaimant count is included. A more parsimoniousmodel, that still benefits from the inclusion of theclaimant count might therefore be a bivariate modelof unemployment and claimant count as presented inHarvey and Chung (2000). This has not been exploredin the current set of models as there is interest fromsome users and producers of the data in analysing thewave-specific series, and amultivariate model enables adecomposition of this data.

While the inclusion of claimant count appearspromising, further research is required on possible dis-continuities in the data that have not been pickedup in the indicator saturation approach or analysisof the model residuals, as there has been a changeto the administration of unemployment related bene-fits, which might explain the apparent closing of thegap between the average level of unemployment andclaimant count.

Common to many of the papers cited throughoutthis paper, is the assumption that the population pro-cess to be measured follows a basic structural modeland the sensitivity of this assumption and the impli-cations for estimation of the survey error would beinteresting to explore. Some of the papers attempt todeal with changing design-based variance, such as vanden Brakel and Krieg (2009), with the assumption thatthe stochastic nature of the population process itself isnot. Depending upon the periods over which the data isanalysed, again this may or may not be true and couldbe difficult to assess what is changes in variance in thesurvey error and what is changes in variance due to thepopulation process. As noted in the response to Harveyand Chung (2000) by Benoit Quenneville there is thedanger than too much of the total variance is assumedto belong to the survey error.

As with many of the papers we find that there isa gain in precision from using the time series model.However, these do not allow for the uncertainty inthe maximum likelihood estimates which if accountedfor could reduce the impact of such estimated gains.An alternative method that would account for thisuncertainty could be to use a Bayesian approach, ora replication technique for estimating the additional

uncertainty, as explored for example by Pfeffermannand Tiller (2005).

Acknowledgments

Thanks to two anonymous referees who provided detailedand constructive comments on an earlier submission. Thanksto Danny Pfeffermann for constructive conversations ondeveloping the models. Thanks also to Bethan Russ, Alas-tair Cameron, Greg Dixon, andMatthewGreenaway for theircontributions to providing the wave-specific monthly esti-mates and variances used as input data for all the modelling.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes on contributor

D. Elliott and P. Zong are methodologists at the Office forNational Statistics in the UK.

References

Commandeur, J., & Koopman, S. (2007). An introduction tostate space time series analysis. Oxford: OUP.

Durbin, J., & Koopman, S. (2001).Time series analysis by statespace methods. Oxford: Clarendon Press.

Elliott, D., Zong, P., Greenaway, M., Lacey, A., & Dixon, G.(2015). A state space model for labour force survey esti-mates: Agreeing the target and dealing with wave specificbias.

Harvey, A. (1990). Forecasting, structural time series modelsand the kalman filter. Cambridge: Cambridge UniversityPress.

Harvey, A., & Chung (2000). Producing monthly estimates ofunemployment and employment according to the interna-tional labour office definition (with discussion). Journal ofthe Royal Statistical Society Series A, 160, 5–46.

Helske, J. (2017). KFAS: Exponential family state space mod-els in R. Journal of Statistical Software, 78(10), 1–39.

Hendry, D., Doornik, J. A., & Pretis, F. (2013, June). Step-indicator saturation.

Hindrayanto, I., Aston, J. A., Koopman, S. J., & Ooms, M.(2013). Modelling trigonometric seasonal components formonthly economic time series. Applied Economics, 45(21),3024–3034. Retrieved from https://doi.org/10.1080/00036846.2012.690937

ONS (2016a). Labour force survey user guide: Volume 1 -lfs background and methodology 2016. Retrieved fromhttps://www.ons.gov.uk/file?uri= /employmentandlabourmarket/peopleinwork/employmentandemployeetypes/methodologies/labourforcesurveyuserguidance/volume12016.pdf

ONS (2016b). Uk labour market: Nov 2016. Retrieved fromhttps://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/bulletins/uklabourmarket/november2016/pdf

ONS (2017). Single month labour force survey estimates: Jan2017. Retrieved fromhttps://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/singlemonthlabourforcesurveyestimates/jan2017/pdf

Pfeffermann, D. (1991). Estimation and seasonal adjustmentof population means using data from repeated surveys.Journal of Business & Economic Statistics, 9, 163–175.

Page 13: Improving timeliness and accuracy of estimates from the UK

STATISTICAL THEORY AND RELATED FIELDS 197

Pfeffermann, D., Feder, M., & Signorelli, D. (1998). Estima-tion of autocorrelations of survey errors with applicationto trend estimation in small areas. Journal of Business &Economic Statistics, 16, 339–348.

Pfeffermann, D., & Tiller, R. (2005, November). Bootstrapapproximation to prediction mse for state-space modelswith estimated parameters. Journal of Time Series Analysis,26, 893–916.

R Core Team (2016). R: A language and environment for sta-tistical computing [Computer software manual]. Vienna,Austria. Retrieved from https://www.R-project.org/

Silva, D., & Smith, T. (2001). Modelling compositional timeseries from repeated surveys. Survey Methodology, 27(2),205–215. Retrieved from http://uos-app00353-si.soton.ac.uk/30037/

Skinner, H., & Holmes, D. (2011). Variance estimation forLabour Force Survey estimates of level and change: GSSMethodology Series 21. Retrieved from http://www.ons.gov.uk/ons/guide-method/method-quality/specific/gss-methodology-series/index.html

Steele, D. (1997). Producing monthly estimates of unem-ployment and employment according to the internationallabour office definition (with discussion). Journal of theRoyal Statistical Society Series A, 160, 5–46.

Tiller, R. (1992). Time series modeling of sample survey datafrom the US current population survey. Journal of OfficialStatistics, 8(2), 149–166.

Time Series Research Staff (2017). X-13arima-seats refer-ence manual [Computer software manual]. WashingtonDC, US. Retrieved from https://www.census.gov/ts/x13as/docX13AS.pdf

van den Brakel, J. A., & Krieg, S. (2009, February). Estima-tion of themonthly unemployment rate through structuraltime series modelling in a rotating panel design. SurveyMethodology, 35, 177–190.

van den Brakel, J. A., & Krieg, S. (2015, December). Dealingwith small sample sizes, rotation group bias and disconti-nuities in a rotating panel design. Survey Methodology, 41,267–296.

van den Brakel, J. A., & Krieg, S. (2016). Small area esti-mation with state space common factor models for rotat-ing panels. Journal of the Royal Statistical Society: SeriesA (Statistics in Society), 179(3), 763–791. Retrieved fromhttps://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssa.12158

Werner, B. (2006). Reflections on fifteen years of change inusing the labour force survey. Retrieved from http://www.ons.gov.uk/ons/rel/lms/labour-market-trends–discontinued-/volume-114–no–8/reflections-of-fifteen-years-of-change-in-using-the-labour-force-survey.pdf?format=hi-vis

Appendix. State spacemodels

This appendix provides details of the structure of the uni-variate andmultivariate state spacemodel. State spacemodelsenable the estimation of unobserved components of univari-ate and multivariate time series. The models developed inthis paper aim to estimate an underlying stochastic popu-lation process and survey error processes, all of which areunobserved. The models used are linear Guassian models.

A.1 State-space framework

The state space framework is comprised of an observationequation and transition equation. The observation equationshows how the unobserved components combine to formthe observed values, while the transition equation shows

how the unobserved components evolve over time follow-ing a first-order Markov process. Following the notation ofHarvey (1990) and simplifying slightly to only show aspectsthat are relevant to the current study the observation andtransition equation are respectively

yt = Ztαt + εt , (A1)

and

αt = Ttαt−1 + Rtηt , (A2)

where

E(εt) = 0, Var(εt) = Ht , (A3)

E(ηt) = 0, Var(ηt) = Qt . (A4)

The observations, yt , may be a multivariate time series withT elements. The state vector of unobserved components isgiven by αt with dimension m × 1. The observation matrixwhich combines the unobserved components,Zt is of dimen-sion N × m and the observation errors, εt , is an N × 1 vec-tor. In the transition equation, Tt is the m × m transitionmatrix which defines how that state vector changes over time,Rt is an m × q matrix determining which elements in thestate vector are stochastic, and the transition error, ηt is anq × 1 vector. Zt , Tt , Rt , Ht and Qt are system matrices thatare non-stochastic. However they may change over time,hence the time subscript and can contain estimated values,for example the hyperparameters. For details on the estima-tion of unobserved components using the Kalman filter andmethods for hyperparameter estimation see for exampleHar-vey (1990), Durbin and Koopman (2001) or Commandeurand Koopman (2007).

The models in this paper are estimated using the KFASpackage of Helske (2017) in the software R, R CoreTeam (2016). The optimiser used in R from the functionoptim provides various methods of optimization based onNelder-Mead, quasi-Newton and conjugate-gradient func-tions. In this paper BFGS (a quasi-Newton method) is used.

A.2 Univariatemodel

The basic structural model (Equation (4)) includes mod-els for the trend (Equation (5)) and seasonal component(Equation (7)), an irregular component as white noise(Equation (8)) and sampling error (Equation (9)).Theseequations can be expressed in state space form for monthlydata following a similar notation to van den Brakeland Krieg (2009) where rn×m is a matrix of dimension n × mwhere each element is equal to r, In is an n × n identitymatrix, Diag creates a diagonal matrix from the elements inparentheses, whileBlockDiag creates a block diagonalmatrixfrom the elements in parentheses. The model is composed ofa population process (θ) and sample error part (e), with thestate vector and system matrices given by

αt = (αθt ,α

et )

′, (A5)

Zt = (Zθt ,Z

et ), (A6)

Tt = BlockDiag(Tθt ,T

et ), (A7)

Rt = BlockDiag(Rθt ,R

et ), (A8)

Ht = σ 2ε , (A9)

Qt = Diag(Qθt ,Q

et ), (A10)

Page 14: Improving timeliness and accuracy of estimates from the UK

198 D. J. ELLIOTT AND P. ZONG

αθt = (Lt ,Rt , S1t , S∗

1t , . . . , S5t , S∗5t , S6t), (A11)

Zθt = (1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1), (A12)

Tθt = BlockDiag(TL

t ,TS1t , . . . ,TS5

t , cos λ6), (A13)

TLt =

(1 10 1

), (A14)

TSjt =

(cos λj sin λj

− sin λj cos λj

), (A15)

Rθt = I13, (A16)

Hθt = σ 2

ε , (A17)

Qθt = (σ 2

L , σ 2R , σ 2

S1 , σ 2S1 , σ

2S2 , σ 2

S2 , σ 2S3 , σ 2

S3 , σ 2S4 , σ 2

S4 ,

σ 2S5 , σ 2

S5 , σ 2S6). (A18)

Design-based variance estimates, k2t can be used to accountfor the changing variance over time as discussed by van denBrakel and Krieg (2009) and used to help obtain estimates ofthe sampling error.

αet = (et , et−1, et−2), (A19)

Zet = (kt , 0, 0) (A20)

Tet =

⎛⎝0 0 φ11 0 00 1 0

⎞⎠ (A21)

Ret = (1, 0, 0)′ (A22)

Qet = σ 2

ε . (A23)

A.3 Multivariatemodel

When the observed design-based estimates are monthlywave-specific estimates and assuming the wave-specific sur-vey error and bias terms are combined with the basic struc-tural model and combining with claimant count allowingpotentially for correlation, a multivariate state space modelcan be specified for yt = (y(1)

t , y(2)t , . . . , y(5)

t , xt)with the statevector and system matrices are given by

αt = (αθt ,α

εt ,α

bt ,α

θxt )′, (A24)

Zt = BlockDiag((15×1 ⊗ (Zθt )

′,Zεt ,Z

bt ),Z

θt ), (A25)

Tt = BlockDiag(Tθt ,T

εt ,T

bt ,T

θt ), (A26)

Rt = BlockDiag(Rθt ,R

εt ,R

bt ,R

θt ), (A27)

Ht = BlockDiag(05×5, σ 2ex), (A28)

Qt =⎛⎝ Diag(Qθ

t ,Qεt ,Q

bt ) C′

C Qθxt

⎞⎠ , (A29)

where

αεt = (ε

(1)t , ε(2)

t , . . . , ε(5)t , ε(1)

t−1, ε(2)t−1, . . . , ε

(5)t−1,

ε(1)t−2, ε

(2)t−2, . . . , ε

(5)t−2), (A30)

αbt = (b(2)

t , b(3)t , b(4)

t , b(5)t ), (A31)

Zεt = (Diag(k(1)

t , k(2)t , . . . , k(5)

t ), 05×10), (A32)

Zbt =

⎛⎜⎜⎜⎝

−1 −1 −1 −11 0 0 00 1 0 00 0 1 00 0 0 1

⎞⎟⎟⎟⎠ , (A33)

Tεt =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 φ

(2)1 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 φ(3)1 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 φ(4)1 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 φ(5)1 0

1 0 0 0 0 0 0 0 0 0 0 0 0 0 00 1 0 0 0 0 0 0 0 0 0 0 0 0 00 0 1 0 0 0 0 0 0 0 0 0 0 0 00 0 0 1 0 0 0 0 0 0 0 0 0 0 00 0 0 0 1 0 0 0 0 0 0 0 0 0 00 0 0 0 0 1 0 0 0 0 0 0 0 0 00 0 0 0 0 0 1 0 0 0 0 0 0 0 00 0 0 0 0 0 0 1 0 0 0 0 0 0 00 0 0 0 0 0 0 0 1 0 0 0 0 0 00 0 0 0 0 0 0 0 0 1 0 0 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

,

(A34)

Tbt = I4, (A35)

Rεt =

(I5

010×5

), (A36)

Rbt = I4, (A37)

Qεt = (1, (1 − (φ

(2)1 )2), (1 − (φ

(3)1 )2), (1 − (φ

(4)1 )2),

(1 − (φ(5)1 )2)), (A38)

Qbt = (σ 2

b2, σ2b3, σ

2b4, σ

2b5). (A39)

αθxt = (Lxt ,R

xt , S

xt , S

x1t , S

x∗1t , . . . , S

x5t , S

x∗5t , S

x6t), (A40)

Qθxt = Diag(σ 2

Lx , σ 2Rx , σ 2

Sx1, σ 2

Sx1, σ 2

Sx2, σ 2

Sx2, σ 2

Sx3, σ 2

Sx3, σ 2

Sx4, σ 2

Sx4,

σ 2Sx5, σ 2

Sx5, σ 2

Sx6), (A41)

C = BlockDiag(Cθ , 011×20), (A42)

Cθ =(

σLLx 00 σRRx

). (A43)

Note that the multivariate model includes the populationirregular term for the claimant count in the observation error.In the paper when correlation is tested between populationirregular components, then the systemmatrices are modifiedand the irregular moved to the state vector, and an irregu-lar term for the unemployment population process is alsoincluded in the state vector. Note also that no covarianceterms are included for the seasonal components of unem-ployment and claimant count in the above equations, as therewas no evidence that this was required, but this is also astraigtforward modification to the system matrices.