
Spatial modelling of origin-destination commuting flows with endogenous weight matrices

Thomas Schatzmann

Supervisors: Prof. Dr. K. Axhausen (ETH), Prof. G. Crawford (UZH), G. Sarlas (ETH)

Master thesis

Institute for Transport Planning and Systems

September 2017

Department of Economics


Spatial modelling of origin-destination commuting flows with endogenous weight matrices September 2017

Contents

Acknowledgement
1 Introduction
1.1 Objective and structure
2 Literature review
2.1 Spatial interaction models
2.1.1 Estimation
2.2 Spatial econometric models
2.2.1 Spatial linear regression models
2.2.2 Filtering
2.3 Modelling origin-destination (OD) flows
2.3.1 Estimation
2.3.2 Zero flows
2.3.3 Interpretation
2.3.4 Endogenous spatial weights and regressors
3 Methodology
3.1 Aspatial (gravity) models
3.2 Spatial autoregressive models
3.3 Endogeneity
4 A case study for Switzerland
4.1 Issues
4.2 Explanatory variables
4.3 Aspatial (gravity) model
4.3.1 Results
4.3.2 Spatial dependence in the residuals
4.4 Spatial autoregressive models
4.4.1 Results
4.5 Endogeneity
4.5.1 Results
4.5.2 Treating endogeneity in spatial autoregressive models
4.5.3 Further research
5 Conclusion
6 References


A Appendix
A.1 Software details

List of Figures

1 Swiss municipalities in 2000
2 Distributions of network distances and flows (in logs) before and after filtering
3 Map of filtered public transportation commuting flows within Switzerland in 2000
4 Gravity model diagnostics
5 Spatial dependence in the mean OLS residuals
6 Spatial dependence in the mean OLS residuals

List of Tables

1 Flow matrix
2 OD model structure
3 Variables in the data set
4 Summary statistics for the model variables
5 Gravity model
6 Lagrange multiplier diagnostics for spatial dependence in OLS residuals
7 Spatial autoregressive models with network distance weight matrix
8 Spatial autoregressive models with economic distance weight matrix
9 Instrumental variables (IV) in the data set
10 Instrumental variable model (IV)


Master thesis

Spatial modelling of origin-destination commuting flows with endogenous weight matrices

Thomas Schatzmann, IVT, ETH Zürich

phone: +41-79-586 12

September 2017

Abstract

This thesis presents a direct modelling approach for origin-destination commuting flows with possibly endogenous weight matrices. Methodologically, a gravity model and four spatial autoregressive models with different weighting schemes are examined to account for untreated spatial dependence. Furthermore, the gravity model is tested for endogeneity when mean income differences per municipality are used as a regressor and, together with weighted travel times, as a basis for the weights in spatial models. OLS, GMM and IV estimators are used to obtain unbiased and consistent parameter estimates. A case study of public transport commuting flows in Switzerland, based on flow data from the 2000 Census, is designed to illustrate the concept of OD flow modelling. It is found that the residuals of the aspatial model indeed exhibit remaining spatial dependence and thus justify the need for spatial models. SAR models relying on network and economic distance weights (min-max weighting), both origin- and destination-centric, show a positive influence of neighbouring communes on travel-to-work trips. Lastly, an IV regression framework with a set of valid instruments shows that income is endogenous, and a recipe for treating endogeneity in spatial autoregressive models is presented.

Keywords

commuting flows, spatial autoregressive regression models, origin-destination flow modelling, instrumental variable, endogenous weight matrix, endogeneity

Preferred citation style

Schatzmann, T. (2017) Spatial modelling of origin-destination commuting flows with endogenous weight matrices, Master thesis, Institute for Transport Planning and Systems, ETH Zurich, Zurich.


Acknowledgement

First, I would like to thank Prof. Dr. K. Axhausen for the great opportunity to write my master’s thesis at the Institute for Transport Planning and Systems (IVT) of ETH Zurich. His guidance and useful advice were very important to me in the process of researching and writing the thesis. Many thanks are also due to Prof. G. Crawford from the Chair of Applied Microeconomics of UZH Zurich. Without him, I would not have been able to write my thesis at an external institute. I would also like to thank Georgios Sarlas, PhD candidate and research assistant at IVT, for his never-ending support and great contributions. On the one hand, he was always available to answer my questions and help me out with the R statistical programming language. On the other hand, he pushed me to solve problems and find solutions myself. I also want to thank all the people affiliated with the IVT and my fellow students for sharing their valuable input and research experiences with me. Last but not least, special thanks go to my family and my girlfriend for their encouragement and patience.


1 Introduction

The mobility of society is an important topic worldwide because its patterns and dimensions keep changing over time. Switzerland is a case in point: even though the daily average travel distance has increased since 2000 (2015: 36.8 km, +4.5%), the daily average travel time has fallen over the same period (2015: 90.4 min, −3.1%) (BFS and ARE, 2017). Economic growth and urban sprawl lead to higher transportation demand and road traffic, which in turn pose great challenges to transportation infrastructure in industrialised and developing countries. People travel in order to satisfy a need such as work, leisure or health care by undertaking an activity at a particular location. During the last decade, leisure has been the most important reason to use public or private transportation in Switzerland, with trips to work ranking second (BFS and ARE, 2017).

The demand for transport is derived from the aforementioned activities of people. Ortuzar and Willumsen (2011) specify: "It is the distribution of activities over space which makes for transport demand." Thus, methods for treating distance and for allocating origins and destinations over space and time are essential in transport analysis. The spatial nature of demand makes it difficult to coordinate supply and demand and may therefore strongly affect the equilibrium between them. Furthermore, the concentration of travel in certain time slots shows the dynamic character of transport supply and demand; congestion in peak periods is a well-known outcome, for example. Finally, an important task of transport planning is the modelling and forecasting of equilibrium points of transport supply and demand in order to maximise social welfare.

From an economic perspective, commuting in the sense of travel-to-work trips can be seen as the consequence of a spatial discrepancy between work and living (Rouwendal and Van der Vlist, 2005). If the spatial distribution of employment differs from that of workers, commuting is the outcome of this mismatch. Commuting describes human interaction as a movement of people from a fixed origin to a destination that results from a previous decision (Fotheringham and O’Kelly, 1989). Distance decay and gravity forces are two important aspects explaining human interaction in terms of commuting.

Transport demand modelling aims at replicating actual travel flows and is based on conventional data such as population censuses and travel diary surveys. These surveys, which only cover a small sample of the actual population, are used to synthesise transport flows based on a representative population. In general, two approaches to transport demand forecasting can be distinguished: aggregate and agent-based models. In the first kind of model, travel demand is specified as aggregate transport flows between spatial units, whereas in models of the latter kind, travel demand is kept at the level of individuals. According to Ortuzar and Willumsen (2011), the classical four-step transport model originated in the 1950s; it generally tries to estimate the number of trips for different travel modes and routes taken between any pair of origin and destination zones in the study area. In the first step, the total number of trips generated and attracted by each zone is estimated (trip generation). The next step constitutes the spatial allocation of these trips to particular destinations (trip distribution). The third, mode choice step determines the travel mode used for each trip (modal split), while the fourth, assignment step predicts the routes used for each trip.

1.1 Objective and structure

The modelling of origin-destination (OD) commuting flows typically corresponds to the trip distribution step, which is most commonly addressed through the estimation of a gravity model by ordinary least squares (OLS). However, the estimation techniques for such models require the observations (flows) to be independent for the basic assumptions to hold; if this condition is violated, the estimates are biased and inconsistent. A common misconception when dealing with spatial data is that independence holds, so that no further testing is conducted in that direction. Previous research (e.g., LeSage and Pace, 2008) has acknowledged the implications of this and has suggested a modelling approach for addressing spatial autocorrelation in the context of OD flows.

However, an open research question remains: how to deal with potential endogeneity issues in the spatial autoregressive components of the model if they rely on economic characteristics rather than Euclidean or network distance (Kelejian and Piras, 2014). The purpose of this thesis is to develop a direct demand model in the context of commuting, which essentially combines the first two steps of the classical transport model mentioned above. The goal of the thesis is to apply spatial autoregressive regression models to public transport commuting flows in Switzerland, to test whether their spatial weight matrices and regressors are endogenous and, if so, to provide a solution based on the existing literature.

The remainder of the thesis is divided into four chapters. The literature review in Chapter 2 not only gives an overview of the existing literature but also summarises related and relevant estimation methods. The methodology chapter (Chapter 3) outlines how the thesis’ objective is reached in three steps. The main chapter (Chapter 4) contains a case study for Switzerland, which shows the application of spatial autoregressive regression models and their possible endogeneity problems once economic distance spatial weights are used. The last chapter (Chapter 5) concludes on the main findings of the previous chapter and points to further research goals in the context of the case study.


2 Literature review

This literature review summarises important aspects and problems concerning the modelling of (commuting) flows and provides an adequate, yet not complete, overview of the spatial models that are of interest for this thesis. Furthermore, it serves as guidance for the chosen methodology, which is presented in Section 3.

2.1 Spatial interaction models

The phenomenon of spatial interaction has been studied intensively in recent decades and has evolved into an important topic in the social sciences. Many different models have been used to examine spatial interaction in economics and geography. The well-known gravity model has probably emerged as the most frequently used model in migration, trade and transportation demand research.

According to Sen and Smith (1995), the term "spatial interaction models" has been used in the literature to label models that focus on flows between origins and destinations. These models incorporate explanatory variables that represent characteristics of both origin and destination zones, as well as a function of the distance between them. Wilson (1967) was the first to propose a gravity model based on a statistical equilibrium concept (Sen and Smith, 1995).

Wilson (1971) explains that the gravity model has been seen as a Newtonian analogy, whereas he considers a family of interaction models, built as extensions of that analogy, to be a better description. He claims that more fruitful analogies can be made and that the model’s name is kept for historical reasons. In the paper, a study area is divided into zones which interact with each other in some form. Typically, the interaction between two zones is characterized by the classic gravity model:

$$T_{ij} = K M_i M_j f(c_{ij}) \tag{1}$$

where $T_{ij}$ is a measure of interaction between zones $i$ and $j$, $M_i$ and $M_j$ are measures of the mass terms associated with zones $i$ and $j$, $f(c_{ij})$ is a decreasing function of the distance or generalized cost of travel between zones $i$ and $j$, and $K$ is a constant of proportionality. Table 1 shows the corresponding flow matrix for $n$ origin and $m$ destination zones, with row totals $O_i$, column totals $D_j$ and grand total $T$.

Table 1: Flow matrix

$$
T = \left[\begin{array}{ccccc|c}
T_{11} & \cdots & T_{1j} & \cdots & T_{1m} & O_1 \\
\vdots & & \vdots & & \vdots & \vdots \\
T_{i1} & \cdots & T_{ij} & \cdots & T_{im} & O_i \\
\vdots & & \vdots & \ddots & \vdots & \vdots \\
T_{n1} & \cdots & T_{nj} & \cdots & T_{nm} & O_n \\
\hline
D_1 & \cdots & D_j & \cdots & D_m & T
\end{array}\right]
$$

By incorporating additional knowledge, mainly total interaction flows, which should be built into the model to constrain the possible values of the interaction variable, four cases can be distinguished:

1. The unconstrained case (no additional knowledge)
2. The single-constrained case
   a) Production constrained (the total of flows originating at zone $i$ is known, $O_i = \sum_j T_{ij}$)
   b) Attraction constrained (the total of flows terminating at zone $j$ is known, $D_j = \sum_i T_{ij}$)
3. The double-constrained case: production-attraction constrained (both totals, $O_i$ and $D_j$, are known)
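The marginal totals in Table 1 are plain row and column sums. The minimal sketch below uses a hypothetical 3-by-3 flow matrix (the numbers are invented for illustration) and computes the origin totals $O_i$, the destination totals $D_j$ and the grand total $T$ that the constrained cases fix.

```python
def marginal_totals(flows):
    """Return (O, D, T) for an origin-by-destination flow matrix.

    flows[i][j] is the number of trips from origin i to destination j.
    """
    origin_totals = [sum(row) for row in flows]             # O_i = sum_j T_ij
    destination_totals = [sum(col) for col in zip(*flows)]  # D_j = sum_i T_ij
    grand_total = sum(origin_totals)                         # T = sum_ij T_ij
    return origin_totals, destination_totals, grand_total


# Hypothetical flows between three zones (rows: origins, columns: destinations).
flows = [
    [10, 4, 0],
    [3, 20, 7],
    [1, 5, 15],
]
O, D, T = marginal_totals(flows)
print(O, D, T)  # [14, 30, 21] [14, 29, 22] 65
```

In the production-constrained case the model must reproduce `O`, in the attraction-constrained case `D`, and in the double-constrained case both.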

The above models arise as modifications of the unconstrained Newtonian model. Wilson’s important contribution stems from the fact that he derives these models from a different analogy, based on general statistical mechanics. He derived the models using an entropy-maximizing approach, which seeks the most probable meso-state (the numbers of people travelling from each zone $i$ to each zone $j$) subject to a set of constraints defining a given macro-state (trip totals and total travel expenditure). By counting the number of micro-states associated with each meso-state that satisfies the constraints and then finding the meso-state with the greatest number of micro-states, the most probable meso-state can be found. This relation is summarized below:

$$W(\{T_{ij}\}) = \frac{T!}{\prod_{ij} T_{ij}!} \tag{2}$$

where $T$ is the total number of trips (Wilson, 1967). Maximizing $\ln W$ subject to the production and attraction constraints above, together with a constraint on total travel expenditure ($\sum_{ij} T_{ij} c_{ij} = C$), yields:

$$
T_{ij} = A_i B_j O_i D_j \exp(-\beta c_{ij}), \qquad
A_i = \frac{1}{\sum_j B_j D_j \exp(-\beta c_{ij})}, \qquad
B_j = \frac{1}{\sum_i A_i O_i \exp(-\beta c_{ij})} \tag{3}
$$


$A_i$ and $B_j$ denote balancing factors. Note that the general function $f(c_{ij})$ is replaced by the exponential function $\exp(-\beta c_{ij})$. In short, Wilson’s theorem implies that the frequency of trips by distance is negative exponentially distributed, so that the logarithmic weighting of distance required by the gravity model is derived for the first time (Curry, 1972). He was also the first to address the issue of individual versus aggregate outcomes, which previous modelling frameworks had been unable to accommodate.
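Because $A_i$ depends on the $B_j$ and vice versa, the balancing factors in Eq. (3) are typically obtained iteratively, alternating between the two definitions until they settle (the Furness method, a form of iterative proportional fitting). The sketch below is illustrative only: the totals, the cost matrix and $\beta = 0.5$ are hypothetical, and it assumes $\sum_i O_i = \sum_j D_j$, without which no consistent solution exists.

```python
import math


def doubly_constrained(origins, destinations, cost, beta, iterations=200):
    """T_ij = A_i B_j O_i D_j exp(-beta c_ij) with balancing factors from Eq. (3).

    Assumes sum(origins) == sum(destinations).
    """
    n, m = len(origins), len(destinations)
    deterrence = [[math.exp(-beta * cost[i][j]) for j in range(m)] for i in range(n)]
    a = [1.0] * n
    b = [1.0] * m
    for _ in range(iterations):
        # A_i = 1 / sum_j B_j D_j exp(-beta c_ij)
        a = [1.0 / sum(b[j] * destinations[j] * deterrence[i][j] for j in range(m))
             for i in range(n)]
        # B_j = 1 / sum_i A_i O_i exp(-beta c_ij)
        b = [1.0 / sum(a[i] * origins[i] * deterrence[i][j] for i in range(n))
             for j in range(m)]
    return [[a[i] * b[j] * origins[i] * destinations[j] * deterrence[i][j]
             for j in range(m)] for i in range(n)]


O = [100.0, 50.0, 50.0]   # hypothetical trips produced per zone
D = [80.0, 70.0, 50.0]    # hypothetical trips attracted per zone
c = [[1.0, 3.0, 4.0],     # hypothetical generalized travel costs
     [3.0, 1.0, 2.0],
     [4.0, 2.0, 1.0]]
T = doubly_constrained(O, D, c, beta=0.5)
print([round(sum(row), 6) for row in T])        # row totals, converge to O
print([round(sum(col), 6) for col in zip(*T)])  # column totals, match D
```

Since $B_j$ is updated last in each sweep, the column totals match $D_j$ exactly after every iteration, while the row totals converge to $O_i$.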

Conventional spatial interaction models use distance functions to capture spatial dependence in interaction flows. That notion has been challenged in recent years (LeSage and Pace, 2008). Griffith (2007) credits Curry (1972) as the first to deal with spatial dependence in flows, by claiming that spatial autocorrelation (local distance) effects are confounded with distance decay (global distance) effects during the estimation of gravity models. Griffith describes the fact that Eq. (3) overlooks the spatial autocorrelation contained in its origin and destination geographic distributions as a weak point of Wilson’s paper and, together with Jones in 1980, proposes to include a spatial linear operator applied to the flows as a specification that can account for that weakness:

$$(I - \rho_o W)\,T \quad \text{and} \quad T\,(I - \rho_d W) \tag{4}$$

where, in matrix notation, $I$ is an $n$-by-$n$ identity matrix, $T$ is the $n$-by-$n$ flow matrix, $W$ is a row-standardized geographic connectivity matrix¹, and $\rho_o$ and $\rho_d$ are the spatial autocorrelation parameters for the origin and destination geographic distributions. Griffith and Jones (1980) showed the existence of statistically significant, positive spatial autocorrelation in gravity model parameters. They explored the aforementioned relationship between spatial structure and spatial interaction at the intra-urban level by studying journey-to-work data for 24 Canadian cities. Furthermore, they stated that flows from an origin are "enhanced or diminished in accordance with the propensity of emissiveness of its neighbouring origin locations", while flows associated with a destination are "enhanced or diminished in accordance with the propensity of attractiveness of its neighbouring destination locations". Due to insufficient computing power in 1980, they could not estimate their model, which involves an $n^2$-by-$n^2$ matrix.

¹ Based upon a binary 0-1 matrix $C$ for which $c_{ij} = 1$ if areal units $i$ and $j$ are neighbours, and 0 otherwise.
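To show what the two filters in Eq. (4) do to a flow matrix, here is a minimal sketch that applies them literally. The three zones on a line, the flows and the values $\rho_o = 0.3$, $\rho_d = 0.2$ are hypothetical, and $W$ is the row-standardized version of the binary contiguity matrix $C$ of that line.

```python
def matmul(a, b):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def i_minus(coef, w):
    """Return the spatial filter I - coef * W."""
    n = len(w)
    return [[(1.0 if i == j else 0.0) - coef * w[i][j] for j in range(n)] for i in range(n)]


# Hypothetical zones 0-1-2 on a line: binary contiguity C, row-standardized into W.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
T = [[10.0, 4.0, 2.0],   # hypothetical flows, rows = origins, columns = destinations
     [3.0, 20.0, 7.0],
     [1.0, 5.0, 15.0]]
rho_o, rho_d = 0.3, 0.2  # hypothetical autocorrelation parameters

origin_filtered = matmul(i_minus(rho_o, W), T)       # (I - rho_o W) T
destination_filtered = matmul(T, i_minus(rho_d, W))  # T (I - rho_d W)
print(origin_filtered[0])
print(destination_filtered[0])
```

Because $W$ is row-standardized, row $i$ of $WT$ is the average flow of origin $i$'s neighbours to each destination, so $(I - \rho_o W)T$ subtracts $\rho_o$ times that average from each of origin $i$'s flows. The sketch keeps the destination filter exactly as written in Eq. (4); with an asymmetric $W$, whether $W$ or its transpose belongs there is worth checking against Griffith and Jones (1980).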


2.1.1 Estimation

In the past, it was common to estimate the parameters of spatial interaction models by linearising the equations in terms of their parameters (Fotheringham and O’Kelly, 1989). Taking the logarithm of both sides of a standard gravity model as in Sen and Smith (1995) yields Eq. (7). The parameters α, β and γ are then estimated by ordinary least squares (OLS) regression. According to Fotheringham and O’Kelly (1989), this approach suffers from several shortcomings, three of which are important for this thesis: first, the unrealistic assumption that flows are (log-)normally distributed; second, the violation of the assumption of homoscedastic error terms; and last, the issue of zero-valued interactions, since the logarithm of zero is undefined.
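A minimal sketch of the log-linearised approach, assuming a gravity specification of the form $\ln T_{ij} = \alpha + \beta_o \ln M_i + \beta_d \ln M_j + \gamma \ln d_{ij}$ (this exact regressor set is an illustrative choice). OLS is solved through the normal equations with a small Gaussian elimination routine. The data are hypothetical and noise-free so that the recovered parameters can be checked, and one flow is set to zero to show that such pairs must be dropped before taking logs.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_gravity(flows, mass, dist):
    """OLS of ln T_ij on [1, ln M_i, ln M_j, ln d_ij] via the normal equations.

    Intrazonal pairs are skipped; zero flows are dropped because ln(0) is undefined.
    Returns (alpha, beta_o, beta_d, gamma) and the number of dropped zero flows.
    """
    rows, y, dropped = [], [], 0
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i == j:
                continue
            if flows[i][j] == 0:
                dropped += 1
                continue
            rows.append([1.0, math.log(mass[i]), math.log(mass[j]), math.log(dist[i][j])])
            y.append(math.log(flows[i][j]))
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty), dropped


# Hypothetical, noise-free data generated from known parameters.
true_params = (0.5, 0.8, 0.6, -1.2)  # alpha, beta_o, beta_d, gamma
mass = [100.0, 200.0, 400.0, 800.0]
dist = [[0.0, 10.0, 25.0, 40.0],
        [10.0, 0.0, 12.0, 30.0],
        [25.0, 12.0, 0.0, 15.0],
        [40.0, 30.0, 15.0, 0.0]]
flows = [[0.0 if i == j else math.exp(true_params[0]
                                      + true_params[1] * math.log(mass[i])
                                      + true_params[2] * math.log(mass[j])
                                      + true_params[3] * math.log(dist[i][j]))
          for j in range(4)] for i in range(4)]
flows[3][0] = 0.0  # a hypothetical zero flow
estimates, dropped = fit_log_gravity(flows, mass, dist)
print([round(e, 3) for e in estimates], dropped)
```

With real, noisy counts the dropped zero pairs are exactly the observations this approach cannot use, which is one motivation for the count models discussed next.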

For modelling spatial interaction counts, Flowerdew and Aitkin (1982) proposed the Poisson distribution instead of the Normal distribution. This seems reasonable since, for example, the number of commuters must be non-negative and there is a constant probability of an individual travelling from zone $i$ to $j$. Indeed, the number of travellers can be considered the outcome of a Poisson process, which can be estimated by maximum likelihood (ML) in a Poisson regression model.
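A minimal sketch of Poisson ML estimation, assuming the simplest specification $E[T_{ij}] = \exp(a + g \ln d_{ij})$ with hypothetical data. It uses Newton-Raphson with step halving. Unlike the log-linear OLS approach, zero counts need no special treatment, because the log-likelihood never takes the logarithm of the observed count. The noise-free, non-integer "flows" serve only to check that the estimator recovers the parameters.

```python
import math


def fit_poisson(y, x, iterations=100, tol=1e-10):
    """ML estimates (a, g) for the Poisson model E[y] = exp(a + g * x).

    Newton-Raphson with step halving. Zero counts are fine because the
    log-likelihood sum(y * eta - exp(eta)) (constants dropped) never takes ln(y).
    """
    def loglik(a, g):
        return sum(yi * (a + g * xi) - math.exp(a + g * xi) for yi, xi in zip(y, x))

    a, g = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(a + g * xi) for xi in x]
        # Score (gradient of the log-likelihood).
        s_a = sum(yi - mi for yi, mi in zip(y, mu))
        s_g = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix (negative Hessian) for the log link.
        h_aa = sum(mu)
        h_ag = sum(mi * xi for mi, xi in zip(mu, x))
        h_gg = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h_aa * h_gg - h_ag * h_ag
        step_a = (h_gg * s_a - h_ag * s_g) / det
        step_g = (h_aa * s_g - h_ag * s_a) / det
        # Halve the step while it would lower the log-likelihood.
        t, current = 1.0, loglik(a, g)
        while loglik(a + t * step_a, g + t * step_g) < current and t > 1e-8:
            t /= 2
        a, g = a + t * step_a, g + t * step_g
        if abs(t * step_a) + abs(t * step_g) < tol:
            break
    return a, g


# Hypothetical distances, used through the log-distance regressor.
x = [math.log(d) for d in [1, 2, 3, 4, 5, 6]]
# Noise-free expected flows from a = 3.0, g = -1.5, to check parameter recovery.
y_exact = [math.exp(3.0 - 1.5 * xi) for xi in x]
a_hat, g_hat = fit_poisson(y_exact, x)
print(round(a_hat, 4), round(g_hat, 4))

# Hypothetical commuter counts including zero flows.
counts = [25, 6, 4, 0, 2, 0]
a_c, g_c = fit_poisson(counts, x)
print(round(a_c, 3), round(g_c, 3))
```

A useful check on the fit is that, with an intercept, the fitted means sum to the observed total, because the intercept's score equation is $\sum (y - \mu) = 0$.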

Even though Poisson regression models seem appropriate for modelling commuting flows, they cannot address over-dispersion because of their restrictive assumption of equidispersion (a conditional variance equal to the conditional mean). In practice, over-dispersion of count data arises when the conditional variance of the dependent variable $T_{ij}$ is larger than its conditional mean, potentially leading to downward-biased standard errors (Cameron and Trivedi, 2013). One way to correct for it is to assume a negative binomial distribution for $T_{ij}$, a generalisation of the Poisson distribution with an additional dispersion parameter that allows the conditional variance to exceed the conditional mean. Excess zero flows may cause over-dispersion, and in these cases it is important to separate excess zeros from regular over-dispersion, because the two forms of over-dispersion are likely generated by different underlying processes. Hurdle and zero-inflated models are the most commonly applied forms of zero-augmented models. Both assume that an additional unknown process generates excess zeros. However, in zero-inflated models zero flows can arise in two ways, whereas in the hurdle model only one process generates zero flows (Farmer, 2011).
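The difference between the two zero-augmented models is visible in the probability of a zero flow. The sketch below uses a Poisson count process in both. The mean $\mu = 2$, the inflation probability $\pi$ of the zero-inflated model and the zero probability $p_0$ of the hurdle model are illustrative values, not estimates.

```python
import math


def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def zip_pmf(k, mu, pi):
    """Zero-inflated Poisson: a zero arises either from the inflation process
    (probability pi) or from the Poisson count process itself."""
    if k == 0:
        return pi + (1 - pi) * poisson_pmf(0, mu)
    return (1 - pi) * poisson_pmf(k, mu)


def hurdle_pmf(k, mu, p0):
    """Poisson hurdle: zeros come from a single binary process (probability p0);
    positive counts follow a zero-truncated Poisson."""
    if k == 0:
        return p0
    return (1 - p0) * poisson_pmf(k, mu) / (1 - poisson_pmf(0, mu))


mu = 2.0
print(round(poisson_pmf(0, mu), 4))         # plain Poisson P(0) = exp(-2)
print(round(zip_pmf(0, mu, pi=0.3), 4))     # 0.3 + 0.7 * exp(-2)
print(round(hurdle_pmf(0, mu, p0=0.3), 4))  # exactly p0
```

In the zero-inflated model, the zero probability combines the two sources; in the hurdle model it is set entirely by the binary part.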

2.2 Spatial econometric models

Paelinck and Klaassen (1979) identify five types of issues that arise in spatial econometrics and distinguish it from non-spatial (standard) econometrics:

• Spatial interdependence in spatial models
• The asymmetry in spatial relations
• Important explanatory factors in other spaces
• Explicit modelling of space
• Differentiation between ex post and ex ante interaction

Even though the term "spatial econometrics" has found broad acceptance, the lines between spatial and standard econometrics are blurred, as analyses such as the estimation of spatial interaction models or the implementation of regional econometric models can be handled with standard econometric techniques. Anselin (1988) therefore suggests focusing on "the specific spatial aspects of data and models", which are essentially two spatial effects:

1. Spatial dependence or spatial autocorrelation
2. Spatial heterogeneity

The former is attributed to Cliff and Ord (1973) and generally addresses the spatial dependence of observations in cross-sectional data sets. It is determined by relative space or relative location, which emphasizes the effect of distance, and is considered the core of Tobler’s (1979) first law of geography: "everything is related to everything else, but near things are more related than distant things." Because dependence in space is multidirectional, whereas dependence in time runs in one direction only, econometric results from time series analysis do not carry over straightforwardly to spatial dependence in cross-sectional samples (Anselin, 1988). Spatial heterogeneity relates to the behavioral instability of data over space: functional forms and parameters vary with location and are thus heterogeneous throughout the data set. However, in contrast to spatial autocorrelation, heterogeneity issues can be solved with standard econometric techniques.

Summed up, spatial econometrics "consist of those methods and techniques that, based on a formal representation of the structure of spatial dependence and spatial heterogeneity, provide the means to carry out the proper specification, estimation, hypothesis testing, and prediction for models in regional science" (Anselin, 1988).


2.2.1 Spatial linear regression models

A general spatial regression model for cross-sectional data is given by Anselin (1988):

$$y = \rho W_1 y + X\beta + \varepsilon, \qquad \varepsilon = \lambda W_2 \varepsilon + \mu \tag{5}$$

with $\mu \sim N(0, \Omega)$, where the diagonal elements of the error covariance matrix $\Omega$ are $\Omega_{ii} = h_i(z\alpha)$, $h_i > 0$. Furthermore, $\beta$ is a parameter vector associated with the exogenous (non-lagged) explanatory variables $X$, $\rho$ is the coefficient of the spatially lagged dependent variable $W_1 y$, and $\lambda$ is the coefficient of the spatial autoregressive structure in the disturbance $\varepsilon$, whose innovation $\mu$ is normally distributed with a general diagonal covariance matrix $\Omega$. $W_1$ and $W_2$ are either standardized or unstandardized spatial weight matrices.
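Solving Eq. (5) for $y$ shows how both weight matrices enter the data-generating process. Assuming $I - \rho W_1$ and $I - \lambda W_2$ are invertible (for row-standardized, non-negative weights this holds whenever $|\rho| < 1$ and $|\lambda| < 1$), the reduced form is:

$$
\begin{aligned}
(I - \rho W_1)\,y &= X\beta + \varepsilon, \qquad (I - \lambda W_2)\,\varepsilon = \mu \\
\Rightarrow\quad y &= (I - \rho W_1)^{-1} X\beta + (I - \rho W_1)^{-1} (I - \lambda W_2)^{-1} \mu .
\end{aligned}
$$

With a row-standardized $W_1$ and $|\rho| < 1$, the expansion $(I - \rho W_1)^{-1} = I + \rho W_1 + \rho^2 W_1^2 + \dots$ shows that a change in one unit's regressors spills over to its neighbours, their neighbours, and so on, with geometrically declining weight.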

Weight matrices play an important role in how the structure of spatial dependence is formally incorporated in a model. The notions of neighbourhood and nearest neighbour were introduced to determine whether, and if so how strongly, other units in the spatial system influence the particular unit under consideration. A spatial contiguity matrix as a measure of spatial dependence is based on a binary coded structure of neighbours: if two spatial units have a common border of non-zero length, they are considered contiguous and a value of 1 is assigned, whereas a 0 indicates no neighbour relation (Moran, 1948). Another way to model contiguity stems from the notion of the shortest path on a network connecting points such as the spatial units’ centroids. $W_1$ and $W_2$ are often row-standardized such that the elements of each row sum to one, which according to Anselin (1988) facilitates the interpretation of the model coefficients in many cases. However, the interpretation need not always be economically meaningful, especially if the standardized matrices are asymmetric.
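A minimal sketch of the binary contiguity rule and row-standardization described above, using hypothetical shared border lengths for four units. It also illustrates the point about asymmetry: the binary matrix $C$ is symmetric, but the row-standardized $W$ is not, because units with fewer neighbours give each neighbour a larger weight.

```python
def contiguity_from_borders(shared_border):
    """Binary contiguity: c_ij = 1 if units i and j share a border of non-zero length."""
    n = len(shared_border)
    return [[1 if i != j and shared_border[i][j] > 0 else 0 for j in range(n)]
            for i in range(n)]


def row_standardize(c):
    """Scale each row to sum to one; rows without neighbours stay all zero."""
    out = []
    for row in c:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


# Hypothetical shared border lengths (km) between four units.
borders = [[0.0, 2.5, 0.0, 1.0],
           [2.5, 0.0, 3.0, 0.0],
           [0.0, 3.0, 0.0, 0.0],
           [1.0, 0.0, 0.0, 0.0]]
C = contiguity_from_borders(borders)
W = row_standardize(C)
print(W[0])              # unit 0 has two neighbours: [0.0, 0.5, 0.0, 0.5]
print(W[2][1], W[1][2])  # 1.0 0.5: C is symmetric, W is not
```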

Cliff and Ord (1973) extended the concept of binary contiguity to a general measure of the potential interaction between two spatial units, such as inverse distance or negative exponentials of distance, combined with the length of the shared border:

$$w_{ij} = \left[d_{ij}\right]^{-a} \left[\beta_{ij}\right]^{b} \tag{6}$$

where $d_{ij}$ is the distance between units $i$ and $j$, $\beta_{ij}$ is the proportion of the border of unit $i$ shared with unit $j$, and $a$ and $b$ are parameters. In general, Euclidean distance measured between centroids need not be the most accurate measure for weight matrices. For instance, in transportation networks such as public transport or roads, travel time (network distance) can be seen as an access variable and is more appropriate than simple Euclidean distance, due to increasing mobility and the different speeds of travel modes (Killer, 2014). In social networks, the weight matrix can instead be defined as a binary peer matrix, which measures peer effects (LeSage and Pace, 2009). Examples of economic distance as the underlying weights in matrices are given in Section 2.3.4.
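Eq. (6) translates directly into code. The sketch below is illustrative only: hypothetical travel times in minutes stand in for the network distance $d_{ij}$, and the shared-border proportions $\beta_{ij}$ and the exponents are also hypothetical. Setting $b = 0$ removes the border term, leaving a pure inverse-distance (here inverse squared travel time) weight.

```python
def cliff_ord_weights(dist, border_share, a=1.0, b=1.0):
    """Eq. (6): w_ij = d_ij^(-a) * beta_ij^b for i != j, with w_ii = 0.

    dist[i][j] must be positive for i != j; border_share[i][j] is the
    proportion of unit i's border shared with unit j.
    """
    n = len(dist)
    return [[0.0 if i == j else dist[i][j] ** (-a) * border_share[i][j] ** b
             for j in range(n)] for i in range(n)]


travel_time = [[0.0, 10.0, 20.0],   # hypothetical travel times (minutes)
               [12.0, 0.0, 15.0],
               [20.0, 15.0, 0.0]]
border_share = [[0.0, 0.6, 0.4],    # hypothetical shared-border proportions
                [0.7, 0.0, 0.3],
                [0.5, 0.5, 0.0]]
w = cliff_ord_weights(travel_time, border_share)                         # a = b = 1
w_dist_only = cliff_ord_weights(travel_time, border_share, a=2.0, b=0.0)  # 1 / t_ij^2
print(w[0])
print(w_dist_only[0])
```

Travel times need not be symmetric, so the resulting weights can be asymmetric even before any standardization.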

2.2.2 Filtering

Different spatial model structures result when parameters of the general model in Eq. (5) are restricted to zero, which is called filtering in the existing literature (Anselin, 1988)². Setting $\alpha = 0$ makes the errors homoskedastic.

• Classical linear regression model: $\rho = 0,\ \lambda = 0,\ \alpha = 0 \;\Rightarrow\; y = X\beta + \varepsilon$
• Spatial autoregressive model (SAR): $\lambda = 0,\ \alpha = 0 \;\Rightarrow\; y = \rho W_1 y + X\beta + \varepsilon$
• Spatial error model (SEM): $\rho = 0,\ \alpha = 0 \;\Rightarrow\; y = X\beta + (I - \lambda W_2)^{-1}\mu$
• Spatial autoregressive model with a spatial autoregressive error term (SAC): $\alpha = 0 \;\Rightarrow\; y = \rho W_1 y + X\beta + (I - \lambda W_2)^{-1}\mu$

Four more models arise when spatial heterogeneity is incorporated by restricting $h(z\alpha)$ to specific forms. The spatial Durbin model (SDM), in which $X$ is spatially lagged as well, is not listed here.
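The nested models can be generated from one function by switching $\rho$ and $\lambda$ on or off. The sketch below assumes $\alpha = 0$ and $W_1 = W_2 = W$ for simplicity. It computes $\varepsilon = (I - \lambda W)^{-1}\mu$ and then $y = (I - \rho W)^{-1}(X\beta + \varepsilon)$ by solving linear systems rather than forming inverses. The weights, $X\beta$ and the innovations $\mu$ are hypothetical numbers.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def i_minus(coef, w):
    """Return I - coef * W."""
    n = len(w)
    return [[(1.0 if i == j else 0.0) - coef * w[i][j] for j in range(n)]
            for i in range(n)]


def simulate(w, x_beta, mu, rho=0.0, lam=0.0):
    """Generate y from Eq. (5) with alpha = 0 and W1 = W2 = w.

    rho = lam = 0 gives the classical model, lam = 0 the SAR model,
    rho = 0 the SEM and both non-zero the SAC model.
    """
    eps = solve(i_minus(lam, w), mu)                # eps = (I - lam W)^-1 mu
    rhs = [xb + e for xb, e in zip(x_beta, eps)]
    return solve(i_minus(rho, w), rhs)              # y = (I - rho W)^-1 (X beta + eps)


W = [[0.0, 1.0, 0.0],   # hypothetical row-standardized weights
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
x_beta = [1.0, 2.0, 3.0]   # hypothetical X beta
mu = [0.1, -0.2, 0.05]     # hypothetical innovations
for name, rho, lam in [("classical", 0.0, 0.0), ("SAR", 0.4, 0.0),
                       ("SEM", 0.0, 0.4), ("SAC", 0.4, 0.4)]:
    print(name, [round(v, 3) for v in simulate(W, x_beta, mu, rho, lam)])
```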

2.3 Modelling origin-destination (OD) flows

As mentioned in Section 2.1, the gravity model in Eq. (3) has been used extensively to model origin-destination flows in many social sciences. However, since it treats individual flows as independent and cannot account for spatial dependence in the residuals, spatial regression models, which can incorporate spatial autocorrelation, have gained importance. LeSage and Pace (2008) have provided new insights into the modelling and estimation of OD flows that can arise in migration, trade, networks, communication & information or transportation. In two papers, spatial regression methods are applied to spatial interaction models (LeSage and Pace, 2008; LeSage and Thomas-Agnan, 2015). OD models involve $n^2$ OD pairs, in contrast to typical spatial econometric models that contain samples of $n$ regions, with each region being an observation. They aim at explaining variation in the levels of flows between the $n^2$ OD pairs. According to the authors, how to structure the connectivity of the $n^2$ OD pairs has remained a difficult question, which is why they propose a spatial weight structure in a way that is consistent with standard spatial autoregressive models. In addition, they focus on maximum likelihood and Bayesian estimation of the models.

² Note that the corresponding data generation processes are shown and that the model names follow LeSage and Pace (2009).

They start with a conventional least-squares gravity model that assumes independence between origin and destination flows (Eq. (7))³:

$$y = \alpha \iota_N + X_o \beta_o + X_d \beta_d + \gamma g + \varepsilon \tag{7}$$

In Eq. (7), Xo and Xd are explanatory variable matrices that represent origin (o) and destination (d) characteristics, and βo and βd are parameter vectors. The parameter γ reflects the effect of distance g. ε is a vector of disturbances assumed to be normally distributed: ε ∼ N(0, σ²I_N). Given the simplicity of this model, LeSage and Pace (2008) put forward two econometric motivations for using spatial regression models with spatial lags of the dependent variable to account for the spatial richness of OD flows:

1. Spatial dependence as a long-run equilibrium of an underlying spatiotemporal process
2. Omitted variables exhibiting spatial dependence, which result in models with spatial lags of the explanatory and dependent variables

Notably, they used a commuting flow example to justify spatial lags in OD models (LeSage and Pace, 2008, p. 948). They also showed econometrically that omitting spatial lags in conventional gravity models leads to biased coefficient estimates. In conclusion, they developed spatial model specifications for origin-destination flows based on a general spatial autoregressive model. This model accounts for origin-based (1), destination-based (2) and so-called origin-to-destination-based (3) dependence:

$$y = \underbrace{\rho_o W_o y}_{(1)} + \underbrace{\rho_d W_d y}_{(2)} + \underbrace{\rho_w W_w y}_{(3)} + \alpha \iota_N + X_o \beta_o + X_d \beta_d + \gamma g + \varepsilon \tag{8}$$
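The three weight matrices in Eq. (8) are usually derived from a single n-by-n regional weight matrix W. The sketch below assumes the origin-centric stacking of flows, with flow o → d at zero-based position o·n + d. It also assumes the Kronecker-product construction used in the OD literature. The helper name and the toy 3-region W are hypothetical. Dense Kronecker products are only practical for small n.

```python
import numpy as np

def od_weight_matrices(W):
    """Origin-, destination- and origin-to-destination weights for n^2 OD pairs.

    Assumes origin-centric stacking (flow o -> d at zero-based position o*n + d).
    W is an n-by-n (row-standardised) weight matrix between regions.
    """
    n = W.shape[0]
    I = np.eye(n)
    Wo = np.kron(W, I)  # o->d linked to o'->d for neighbouring origins o'
    Wd = np.kron(I, W)  # o->d linked to o->d' for neighbouring destinations d'
    Ww = np.kron(W, W)  # o->d linked to o'->d' (neighbours at both ends)
    return Wo, Wd, Ww

W = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
Wo, Wd, Ww = od_weight_matrices(W)
print(Wo.shape)  # (9, 9)
```

Because a Kronecker product of row-stochastic matrices is row-stochastic, all three matrices remain row-standardised.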

³ This model is presented in Sen and Smith (1995) and is equivalent to a log-transformed standard gravity model.


As in Section 2.2.2, restrictions on the parameters ρi, i = o, d, w, yield a range of models, from no spatial dependence up to full spatial dependence. LeSage and Pace (2008) call them the successive spatial filter model specifications.

Recent work has included spatial dependence in models across various fields. Porojan (2001) used 1995 sample data covering 15 EU member states and 7 OECD countries. He showed that predicted trade flows vary considerably when inherent spatial effects are taken into account. He stated that the traditional gravity model underestimates trade flows for countries that have trading neighbours. He concluded that the spatial formulation is "a clear improvement upon all previous" ones. Lee and Pace (2005) studied retail gravity models that explore people's shopping behaviour. They stated that under independence, the parameter of the distance function is underestimated by as much as two thirds.

LeSage and Pace (2008) illustrate the family of spatial econometric models described above using state-level population migration flows in the US. They used state-to-state flows from 1995-2000 and 1990 Census data on state characteristics. Overall, they conclude that models based on a single weight matrix (Wo, Wd or Ww) have much lower likelihoods than models based on all three matrices. Moreover, the likelihood-ratio test rejects the least-squares approach. This supports the importance of spatial dependence as such, and even more so the importance of both origin and destination connectivity information. Note that the coefficient magnitudes from least squares and from the spatial model cannot be compared directly. The least-squares coefficient for any explanatory variable x represents ∂y/∂x, whereas the coefficient from the spatial lag model does not, because the spatial weighting can induce feedback effects.
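To make the feedback explicit, consider a simplifying case relative to Eq. (8): a single row-standardised weight matrix W and |ρ| < 1. Solving the spatial lag model for y and differentiating with respect to the k-th regressor gives:

```latex
y = \rho W y + X\beta + \varepsilon
\;\Rightarrow\;
y = (I_N - \rho W)^{-1}(X\beta + \varepsilon),
\qquad
\frac{\partial y}{\partial x_k^{\top}} = (I_N - \rho W)^{-1}\beta_k
= \left(I_N + \rho W + \rho^2 W^2 + \cdots\right)\beta_k
```

With ρ = 0, the N-by-N matrix collapses to β_k I_N, which is the least-squares interpretation. For ρ ≠ 0, its off-diagonal elements spread a change in x_k at one observation to all other observations.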

Cushing and Poot (2003) provide an extensive overview of applications in migration research that support the need for spatial autoregressive models. Griffith (2007) applies eigenvalue-based approaches as another way to filter for spatial autocorrelation.

2.3.1 Estimation

Standard OLS estimation of Eq. (8) yields an inconsistent estimator, because the spatial lag of the dependent variable is typically correlated with the disturbance term. Other estimation methods therefore have to be considered. The log-likelihood function for the model in Eq. (8) is a good starting point for both maximum likelihood (ML) and Bayesian (BAYES) estimation (LeSage and Pace, 2008; LeSage and Thomas-Agnan, 2015)⁴:

$$\ln L(\rho_d, \rho_o, \rho_w) = C + \underbrace{\ln\left|I_N - \rho_d W_d - \rho_o W_o - \rho_w W_w\right|}_{\text{log-determinant}} - \frac{N}{2}\ln S(\rho_d, \rho_o, \rho_w) \tag{9}$$

where S(ρd, ρo, ρw) denotes the sum of squared errors as a function of the parameters ρi, i = d, o, w. In maximum likelihood and Bayesian estimation, standard algorithms can be applied to evaluate the log-likelihood function in Eq. (9). These algorithms become more demanding as the number of observations grows. Computing power can then become an issue, because both estimation methods require the log-determinants of n²-by-n² matrices, which demand large amounts of memory. LeSage and Pace built on earlier research⁵ and exploited the special structure of the weight matrices. They found that "by reducing the troublesome log-determinant calculation to one involving only traces of n by n matrices", potential computational problems have been eliminated.
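The gain from exploiting the weight structure can be seen in the single-matrix case. The sketch below illustrates the idea under stated assumptions; it is not LeSage and Pace's algorithm. It evaluates the concentrated log-likelihood of Eq. (9) using only the origin-based matrix Wo = W ⊗ I_n. The log-determinant of Wo reduces exactly to n·ln|I_n − ρW|. The sketch also includes a truncated trace series for ln|I_n − ρW|. Because tr(Wo^k) = n·tr(W^k), only n-by-n traces are needed. The mixed terms of the three-matrix case are not covered. Function names, the simulated data and the grid are assumptions.

```python
import numpy as np

def logdet_origin(rho, W):
    """ln|I_{n^2} - rho (W kron I_n)| from an n-by-n matrix: n * ln|I_n - rho W|."""
    n = W.shape[0]
    _, ld = np.linalg.slogdet(np.eye(n) - rho * W)
    return n * ld

def logdet_trace_series(rho, W, order=60):
    """Truncated series ln|I - rho W| = -sum_k rho^k tr(W^k) / k (|rho| < 1)."""
    total, Wk = 0.0, np.eye(W.shape[0])
    for k in range(1, order + 1):
        Wk = Wk @ W
        total -= rho ** k * np.trace(Wk) / k
    return total

def concentrated_loglik(rho, y, X, W_big, logdet):
    """Eq. (9) for a single weight matrix, up to the constant C."""
    N = y.size
    y_f = y - rho * (W_big @ y)                      # spatially filtered y
    beta, *_ = np.linalg.lstsq(X, y_f, rcond=None)   # beta(rho) by least squares
    e = y_f - X @ beta
    return logdet(rho) - N / 2 * np.log(e @ e)

# Usage with simulated data: n = 6 regions, N = 36 OD pairs.
rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)      # row-standardised regional weights
Wo = np.kron(W, np.eye(n))             # origin-based n^2-by-n^2 matrix
X = np.column_stack([np.ones(n * n), rng.normal(size=n * n)])
eps = rng.normal(size=n * n)
y = np.linalg.solve(np.eye(n * n) - 0.4 * Wo, X @ np.array([1.0, 2.0]) + eps)
grid = np.linspace(-0.9, 0.9, 181)
ll = [concentrated_loglik(r, y, X, Wo, lambda rho: logdet_origin(rho, W)) for r in grid]
rho_hat = grid[int(np.argmax(ll))]
```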

In contrast to LeSage, Pace and Barry, Kelejian and Prucha (1998) propose a generalised spatial two-stage least squares (GS2SLS) procedure to estimate the same model. The procedure yields a consistent and asymptotically normal estimator. It has the advantage of requiring fewer distributional assumptions than the ML estimator. A year later, the authors derived a generalised moments estimator in a similar setup. Both procedures were shown to be feasible with large samples (Kelejian and Prucha, 1999). In a 2010 follow-up paper, the same authors introduced a new class of GM estimators for the autoregressive parameter of a spatially autoregressive disturbance process. These estimators allow for innovations with unknown heteroskedasticity. There, Kelejian and Prucha again establish the consistency and asymptotic normality of IV estimators for the regression parameters in spatial autoregressive models with autoregressive disturbances.
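A minimal spatial 2SLS estimator in the spirit of Kelejian and Prucha (1998) instruments the spatial lag Wy with X, WX and W²X. This sketch assumes a single weight matrix and a constant in the first column of X. The constant is dropped from the spatially lagged instruments, because W times a constant is again a constant for row-standardised W. The GM step for a spatially autoregressive disturbance is omitted.

```python
import numpy as np

def spatial_2sls(y, X, W, n_powers=2):
    """Spatial 2SLS for y = rho*W y + X beta + e; returns (rho_hat, beta_hat).

    Instruments H = [X, W X_s, W^2 X_s, ...], where X_s excludes the constant,
    which is assumed to sit in column 0 of X.
    """
    Xs = X[:, 1:]                       # non-constant regressors
    blocks, WX = [X], Xs
    for _ in range(n_powers):
        WX = W @ WX
        blocks.append(WX)
    H = np.column_stack(blocks)
    Z = np.column_stack([W @ y, X])     # endogenous Wy plus exogenous X
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]   # first stage: P_H Z
    delta = np.linalg.lstsq(Z_hat, y, rcond=None)[0]   # second stage
    return delta[0], delta[1:]

# Usage: ring neighbours and noise-free data, so the estimates are exact.
rng = np.random.default_rng(42)
n = 40
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -1.0])
y = np.linalg.solve(np.eye(n) - 0.3 * W, X @ beta_true)
rho_hat, beta_hat = spatial_2sls(y, X, W)
```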

2.3.2 Zero flows

As already mentioned in Section 2.1.1, the problem of zero flows and excess zeros can be tackled in different ways. In the context of OD flow modelling, ML estimation is not appropriate when a large number of zero flows exist. ML estimation requires a normally distributed dependent variable, or at least one that can be transformed to achieve normality (LeSage and Pace, 2008; LeSage and Thomas-Agnan, 2015). Especially at fine spatial scales, excessive zero values appear in flow matrices. LeSage et al. (2007) and Lambert et al. (2010) use Poisson estimation for cross-sectional spatial autoregressive models to address this limitation of the ML method. Ranjan and Tobias (2007) treat zero flows with a Tobit model for censoring, which was later extended to spatial autoregressive interaction models. LeSage (2000) shows how to employ limited dependent variable models together with Gibbs sampling for their estimation. To date, no Poisson or negative binomial estimation procedure has been developed for OD flow models. Such a procedure would likely be required when excessive zero values exist.

⁴ The basic algorithms for Bayesian and maximum likelihood estimation can be found in LeSage (1997) and Pace and Barry (1997).

⁵ See Pace and LeSage (2004), Barry and Pace (1999) and Griffith (2004) for detailed information.

2.3.3 Interpretation

LeSage and Thomas-Agnan (2015) elaborate on the limited interpretability of parameter estimates in spatial regression models, as mentioned in Section 2.3. A change in the characteristics of one spatial unit, for example a region, may influence not only its n − 1 in- and outflows. It may also influence other flows in the network of flows or regions. The authors propose a scheme for calculating scalar summary measures of these impacts, derived from the partial derivative expressions that arise in this type of model. Their application examines commuting flows for 60 regions in Toulouse (FR), taken from the 1990 Census. They show that the non-spatial model underestimates the total impact of increasing residents or jobs on commuting flows. This underestimation arises from network effects, also known as spatial spillovers.
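For a spatial lag model with a single weight matrix, the scalar summary measures can be sketched as follows. The average direct impact is the mean diagonal element of (I − ρW)⁻¹β_k. The average total impact is the mean row sum, and the indirect impact (spillover) is their difference. This is an illustrative simplification. The OD-specific decomposition into origin and destination effects by LeSage and Thomas-Agnan (2015) is not reproduced, and the function name is hypothetical.

```python
import numpy as np

def sar_impacts(W, rho, beta_k):
    """Average direct, indirect and total impacts of one regressor in a SAR model."""
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W) * beta_k   # S = (I - rho W)^{-1} beta_k
    direct = np.trace(S) / n                          # mean own-unit effect
    total = S.sum() / n                               # mean row sum
    return direct, total - direct, total

# Usage: ring of 5 units; for row-standardised W the total equals beta_k / (1 - rho).
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
direct, indirect, total = sar_impacts(W, rho=0.5, beta_k=2.0)
```

With ρ = 0.5 and β_k = 2, the total impact is twice the coefficient. A non-spatial reading of β_k would therefore understate the total impact, which mirrors the finding reported above.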

2.3.4 Endogenous spatial weights and regressors

For all models explained above, one assumption has remained the same: "Weighting matrices are typically assumed to be exogenous" (Kelejian and Piras, 2014). However, Anselin (1988) already stressed a limitation of this assumption. When the spatial interaction phenomenon under consideration is determined by factors such as purely economic variables, spatial weights linked to the physical features of spatial units are less meaningful. Endogenous weighting matrices lead to inconsistent and biased parameter estimates in spatial econometric models, so they require appropriate estimation methods. As an important extension of earlier work by Kelejian and Prucha, Drukker et al. developed a two-step generalised method of moments and instrumental variable estimator (2IV/GMM). Besides a spatially lagged dependent variable, it accounts for endogenous regressors and heteroskedastic innovations.


In contrast to the approaches explained above, Kelejian and Piras (2014) specified a general spatial panel data model with a spatially lagged dependent variable. The lag is defined through an endogenous weight matrix that accounts for the influence of economic variables in neighbouring regions. In their empirical application, they study the dynamic demand for cigarettes using panel data from 46 US states over the period 1963-1992. Applying an IV estimation procedure, they show that cigarette buyers make purchases in neighbouring states when doing so offers a price advantage. In a transportation study, Zhou et al. (2016) applied a spatial autoregressive binary probit model to a firm relocation choice problem. There, the matrix weights rely on geographic and economic distance (see Eq. (10)), which potentially induces endogeneity. In Eq. (10), d_ij represents the Euclidean distance between the centroids of spatial units i and j. The second term denotes the absolute difference of their mean income levels. The study found that a higher level of autocorrelation and endogeneity led to better predictive accuracy of the model.

$$w_{ij} = d_{ij} \cdot \lvert z_i - z_j \rvert \tag{10}$$
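Eq. (10) can be computed for all pairs at once with broadcasting. The sketch assumes planar centroid coordinates and one economic variable z, for example mean income. The text does not state whether Zhou et al. (2016) further invert or row-standardise the resulting weights, so the sketch applies neither step. The function name is hypothetical.

```python
import numpy as np

def economic_distance_weights(coords, z):
    """Eq. (10): w_ij = d_ij * |z_i - z_j|, with d_ij the Euclidean centroid distance."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)  # pairwise d_ij
    return d * np.abs(z[:, None] - z[None, :])                           # times |z_i - z_j|

# Usage: two centroids 5 units apart whose incomes differ by 2 give w_01 = 10.
Wz = economic_distance_weights([(0.0, 0.0), (3.0, 4.0)], [10.0, 12.0])
```

Because the weights depend on z, any z that also enters the model makes the weight matrix a function of potentially endogenous data, which is the concern raised in this section.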

Qu and Lee used weights based on "purely" economic distance when estimating a spatial autoregressive model with three methods: two-stage instrumental variables (2SIV), quasi-maximum likelihood (QMLE) and the generalised method of moments (GMM). They established the consistency and asymptotic normality of these estimators in a theoretical framework.


3 Methodology

As mentioned in Section 1, and following the structure of the literature review, three crucial steps are foreseen. They examine spatial dependence in origin-destination (OD) commuting flows and the problem of endogenous weighting matrices when economic distance serves as the underlying impedance function:

1. Aspatial (gravity) model
2. Spatial autoregressive models
3. Endogenous weights and regressors in spatial autoregressive regression models

3.1 Aspatial (gravity) models

Commuting flow data typically belong to the class of count data, because they are non-negative integers. The data from the Nationales Personenverkehrsmodell (2005, see Section 4) contain a very large number of zero flows. A Poisson or negative binomial estimation procedure would therefore arguably be the best way to account for this problem. But since neither has been developed in combination with OD flow models, this thesis focuses on a combination of LeSage and Pace's and Kelejian and Prucha's approaches. OD flow modelling aims to explain variation in the levels of flows between the n² OD pairs, based on a sample containing n spatial units. An important difference from classic interaction modelling concerns how a flow matrix translates into an n²-by-1 vector of flows. This translation defines the OD model structure (see Table 2).

Table 2: OD model structure

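T | o1 | o2 | . . . | on
d1 | o1 → d1 | o2 → d1 | . . . | on → d1
d2 | o1 → d2 | o2 → d2 | . . . | on → d2
. . . | . . . | . . . | . . . | . . .
dn | o1 → dn | o2 → dn | . . . | on → dn

⇒

Index | Origin | Destination
1 | 1 | 1
. . . | . . . | . . .
n | 1 | n
. . . | . . . | . . .
n² − n + 1 | n | 1
. . . | . . . | . . .
n² | n | n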
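The stacking in Table 2 corresponds to a column-major vectorisation of T, because the columns hold origins. A small NumPy sketch with toy values and zero-based indices shows the ordering and the matching origin and destination index vectors:

```python
import numpy as np

n = 3
# Flow matrix T as in Table 2: rows are destinations, columns are origins,
# so T[d, o] holds the flow o -> d (toy values).
T = np.arange(n * n).reshape(n, n)
# Origin-centric stacking: column-major vec(T) lists all flows from origin 0 first.
y = T.flatten(order="F")
origin = np.repeat(np.arange(n), n)       # 0,0,0,1,1,1,2,2,2
destination = np.tile(np.arange(n), n)    # 0,1,2,0,1,2,0,1,2
assert all(y[k] == T[destination[k], origin[k]] for k in range(n * n))
```

The same `origin` and `destination` vectors can be used to expand region-level attributes into the n²-row matrices Xo and Xd.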

In this thesis, an origin-centric ordering is employed. The first n elements of the stacked flow vector reflect flows from origin 1 to all n destinations. The last n elements represent flows from origin n to destinations 1 to n (see the right part of Table 2). In accordance with LeSage and Pace (2008), and for the sake of comparability, it makes sense to start with an aspatial base model. This is the log-transformed standard gravity model introduced in Eq. (7) in Section 2.3, which can easily be extended with spatial weights:

$$y = \alpha \iota_N + X_o \beta_o + X_d \beta_d + \gamma g + \varepsilon \tag{11}$$

y is an n²-by-1 vector of commuting flows. Xo and Xd denote explanatory variable matrices (n²-by-k) representing origin (o) and destination (d) characteristics. βo and βd are parameter vectors, and γ reflects the effect of a distance function g. ε is a vector of disturbances assumed to be normally distributed: ε ∼ N(0, σ²I_N). The model is estimated by ordinary least squares (OLS), and its coefficients read as elasticities. An important assumption of OLS is the independence of observations. In a transportation network setting, this means that flows between spatial units are not influenced by surrounding neighbours on either the origin or the destination side. This in turn assumes that the distance function g accounts for all spatial processes affecting a given flow, which is unlikely to be the case. The independent and identically distributed (iid) assumption of OLS is then violated, which leads to an inconsistent estimator.
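As a sketch of this baseline, the function below fits Eq. (11) by least squares on logged data. It assumes strictly positive flows and logged regressors and distance, so that slopes read as elasticities. The synthetic, noise-free data and the variable names are purely illustrative.

```python
import numpy as np

def fit_gravity_ols(flows, x_orig, x_dest, dist):
    """OLS fit of Eq. (11) in logs; returns (alpha, beta_o, beta_d, gamma).

    Assumes strictly positive flows (zero flows removed beforehand);
    x_orig and x_dest are (N, k) arrays, dist is an (N,) array.
    """
    y = np.log(flows)
    k = x_orig.shape[1]
    Z = np.column_stack([np.ones_like(y), np.log(x_orig), np.log(x_dest), np.log(dist)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[0], coef[1:1 + k], coef[1 + k:1 + 2 * k], coef[-1]

# Usage on synthetic, noise-free data for n = 6 regions (n^2 = 36 pairs).
rng = np.random.default_rng(0)
n = 6
pop = rng.uniform(1e3, 1e5, size=n)            # one hypothetical regional attribute
origin = np.repeat(np.arange(n), n)            # origin-centric stacking
dest = np.tile(np.arange(n), n)
dist = rng.uniform(1.0, 50.0, size=n * n)      # hypothetical OD distances
x_o, x_d = pop[origin][:, None], pop[dest][:, None]
flows = np.exp(1.0 + 0.8 * np.log(x_o[:, 0]) + 0.5 * np.log(x_d[:, 0]) - 1.2 * np.log(dist))
alpha, b_o, b_d, gamma = fit_gravity_ols(flows, x_o, x_d, dist)
```

With real data, the residuals of this fit are the input for the spatial dependence diagnostics discussed next.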

Moran's I index quantifies the degree of autocorrelation in the residuals of a model. It can be used to show whether spatial dependence remains in the disturbances, which would justify spatial models that account for this problem. In origin-destination (OD) flow modelling, the Moran's I test reveals the existence and strength of spatially autocorrelated errors for both origins and destinations (Moran, 1948). Origins produce transport demand and destinations attract it. Weighting matrices for each of them, and for their combination, therefore capture three spatial effects in the spatial autoregressive model: origin-based, destination-based and so-called origin-to-destination-based dependence.
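For a residual vector e and weight matrix W, Moran's I is I = (N/S₀)·e′We/e′e, where S₀ is the sum of all weights. A minimal sketch follows, checked on a four-unit ring with perfectly alternating residuals (I = −1). In the OD setting, one would evaluate it separately with the origin-, destination- and origin-to-destination-based matrices. The function name is hypothetical.

```python
import numpy as np

def morans_i(e, W):
    """Moran's I of residuals e: (N / S0) * (e' W e) / (e' e), S0 = sum of all weights."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()          # OLS residuals with an intercept already have mean zero
    N = e.size
    S0 = W.sum()
    return (N / S0) * (e @ W @ e) / (e @ e)

# Usage: binary ring of 4 units (each unit has two neighbours).
W_ring = np.array([[0, 1, 0, 1],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)
I_alt = morans_i([1.0, -1.0, 1.0, -1.0], W_ring)   # perfect negative autocorrelation
```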

The spatial weight matrix specifies the neighbourhood of each location and can be based on different impedance functions. In this thesis, network and economic distance functions are considered and tested in the same spatial model, so that the predictive accuracy of each can be compared in the last step. For each matrix, the distance threshold up to which statistically significant autocorrelation exists is determined experimentally.
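One way to carry out the experimental threshold search is to rebuild a distance-band matrix for a sequence of cut-offs and to re-test for autocorrelation each time. The helper below is a hypothetical sketch. It defines binary neighbours within the cut-off, optionally row-standardised, and gives zero rows to units without any neighbour. D may hold network travel times or economic distances.

```python
import numpy as np

def distance_band_weights(D, threshold, row_standardise=True):
    """Binary weights: w_ij = 1 if 0 < D_ij <= threshold (i != j), else 0."""
    D = np.asarray(D, dtype=float)
    W = ((D > 0) & (D <= threshold)).astype(float)
    np.fill_diagonal(W, 0.0)
    if row_standardise:
        rs = W.sum(axis=1, keepdims=True)
        # Units without neighbours keep a zero row instead of dividing by zero.
        W = np.divide(W, rs, out=np.zeros_like(W), where=rs > 0)
    return W

# Usage: three units on a line; neighbour counts for a sequence of cut-offs.
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
counts = {t: distance_band_weights(D, t, row_standardise=False).sum(axis=1) for t in (0.5, 1.5, 2.5)}
W_15 = distance_band_weights(D, 1.5)
```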


3.2 Spatial autoregressive models

Once the need for spatial autoregressive models is established, Lagrange multiplier tests for the presence of a spatial error, a spatial lag or both indicate which model to apply: a spatial error model (SEM), a spatial autoregressive model (SAR) or a combination of both (SAC).

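$$\begin{aligned}
\text{SAR:}\quad & y = \alpha\iota_N + \rho_i W_i y + X_o\beta_o + X_d\beta_d + \varepsilon, && i = o, d, w\\
\text{SEM:}\quad & y = \alpha\iota_N + X_o\beta_o + X_d\beta_d + \varepsilon, \quad \varepsilon = \lambda_i W_i \varepsilon + \mu, && i = o, d, w\\
\text{SAC:}\quad & y = \alpha\iota_N + \rho_i W_i y + X_o\beta_o + X_d\beta_d + \varepsilon, \quad \varepsilon = \lambda_i W_i \varepsilon + \mu, && i = o, d, w
\end{aligned} \tag{12}$$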

where ρi is the spatial autoregressive coefficient, λi the spatial autocorrelation parameter, and Wi, with dimensions n²-by-n², the spatial weight matrix. The subscript i denotes origins (o), destinations (d) and/or a combination of both (w). In the SAR model, ε is an iid error term. In the SEM and SAC models, ε denotes a vector of disturbances and µ is the iid error vector (n²-by-1). Suppose spatial dependence takes the form of a weighted dependent variable and OLS is used. The estimator is then biased and inconsistent, because the conditional mean assumption is violated. With a weighted disturbance term, the OLS estimator is inefficient and its standard errors are biased, because the iid assumption is violated. In both cases, statistical tests become unreliable. Spatial autoregressive regressions allow a twofold treatment of these issues, assuming different underlying mechanisms that generate the spatial dependence (Sarlas and Axhausen, 2017). First, a model can inherently consider an omitted spatial variable and thus account for spatially correlated errors (SEM). Second, the response variable in neighbouring spatial units may have an indirect effect on the response at the location under consideration. Including a spatially lagged dependent variable can then mitigate spatial dependence issues and facilitate estimation of the explanatory variables' direct effects on the response variable (SAR). A combined version is also possible (SAC).
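The Lagrange multiplier tests mentioned at the start of this section can be sketched from OLS residuals for a single weight matrix. The sketch uses the classic (non-robust) LM-error and LM-lag statistics associated with Anselin, each compared with a χ²(1) critical value. Robust variants and the heteroskedasticity-robust setting of Kelejian and Prucha (2010) are not covered, and the function name is an assumption.

```python
import numpy as np

def lm_tests(y, X, W):
    """Classic LM-error and LM-lag statistics from OLS residuals; returns (lm_err, lm_lag)."""
    N = y.size
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / N                                   # ML estimate of sigma^2
    T = np.trace(W.T @ W + W @ W)
    lm_err = (e @ W @ e / s2) ** 2 / T
    WXb = W @ (X @ beta)
    M_WXb = WXb - X @ np.linalg.lstsq(X, WXb, rcond=None)[0]   # M = I - X(X'X)^{-1}X'
    lm_lag = (e @ W @ y / s2) ** 2 / (WXb @ M_WXb / s2 + T)
    return lm_err, lm_lag

# Usage: ring neighbours and data without spatial dependence.
rng = np.random.default_rng(3)
n = 30
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
lm_err, lm_lag = lm_tests(y, X, W)
```

A statistic above the 5% critical value of χ²(1) (about 3.84) would point towards the corresponding spatial specification.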

As already mentioned in Section 2.3.1, Kelejian and Prucha (2010) combine a generalised method of moments (GMM) estimator with an instrumental variable (IV) estimator. The combination applies to spatial autoregressive regression models with unknown heteroskedasticity in the innovations and is feasible for large samples. The proposed three-step procedure is explained in Section 3.3.


3.3 Endogeneity

Lastly, using economic distance weights to construct the spatial weight matrix Wi violates its exogeneity assumption. The spatially weighted dependent variable then becomes correlated with the error terms, an issue that must be addressed. If the same economic variable is also used as a regressor, endogenous regressors further complicate the situation. Drukker et al. (2013) proposed the following four-step procedure to estimate this kind of model:

1. The model is estimated by two-stage least squares (2SLS), using instruments for the endogenous weights, to obtain the model parameters (β's).
2. A GMM approach based on the first-step residuals yields estimates of ρ and/or λ.
3. The model⁶ from the first step is re-estimated by generalised two-stage least squares (G2SLS) to obtain new values of the β's along with the residuals.
4. The residuals from the previous step enter a GMM estimator that yields the final estimates of ρ and/or λ, imposing the same moment conditions as before.

The approach requires valid instruments, that is, instruments that satisfy the relevance and exogeneity conditions. An IV estimation of the aspatial model should therefore show whether endogeneity is indeed an issue and whether the approach of Drukker et al. (2013) can and should be used.
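Steps 1 and 3 of this procedure are two-stage least squares fits, and step 3 applies them to data after a spatial Cochrane-Orcutt-type transformation. The sketch below shows only these 2SLS pieces. It assumes that an estimate `lam_hat` from the GMM step is already available and that the instrument matrix H is valid. The GMM steps 2 and 4 and the specific instrument choice of Drukker et al. (2013) are not reproduced, and all names are hypothetical.

```python
import numpy as np

def two_sls(y, Z, H):
    """2SLS of y on regressors Z with instrument matrix H (at least as many columns as Z)."""
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]   # first stage: P_H Z
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]     # second stage

def spatial_co_transform(v, W, lam):
    """Spatial Cochrane-Orcutt-type transform v* = v - lam * W v."""
    return v - lam * (W @ v)

def g2sls(y, Z, W, H, lam_hat):
    """Step 3 sketch: 2SLS on transformed y and Z, given lam_hat from a GMM step.

    Z may contain endogenous columns (e.g. W y or an income regressor);
    H holds the instruments and is assumed valid.
    """
    y_s = spatial_co_transform(y, W, lam_hat)
    Z_s = spatial_co_transform(Z, W, lam_hat)
    return two_sls(y_s, Z_s, H)

# Usage: ring neighbours, SAR(1) disturbances with lam = 0.5.
rng = np.random.default_rng(7)
n = 20
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
mu = rng.normal(size=n)
u = np.linalg.solve(np.eye(n) - 0.5 * W, mu)   # u = 0.5 W u + mu
y = X @ beta + u
H = np.column_stack([X, W @ X[:, 1]])          # hypothetical instrument set
beta_hat = g2sls(y, X, W, H, lam_hat=0.5)
```

The transformation with the true λ turns the disturbance u back into the iid innovation µ, which is why step 3 uses transformed data.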

In the conclusion chapter, further research options and extensions are presented.

⁶ The model is subjected to a Cochrane-Orcutt-type transformation to account for the spatial (serial) correlation.


4 A case study for Switzerland

In this section, a case study of commuting flows in Switzerland illustrates the rather theoretical concept of OD flow modelling. The empirical analysis is based on travel-to-work trips from the Federal Census 2000. The data contain public transport commuting-to-work flows between and within 2896 Swiss municipalities, as well as shapefiles of Switzerland in 2000. The flows represent entries in the OD flow matrix T (see the left part of Table 2, where columns reflect origins and rows destinations)⁷. The data set has over 250,000 observations and therefore does not fill the whole flow matrix, which contains 2896² = 8,386,816 entries (flows). To obtain a fully defined matrix, zero flows are assumed for the remaining links. The mode "public transport" aggregates trips to work by train, tram, bus and combinations of these.

Figure 1: Swiss municipalities in 2000


The flow data and the objective of this thesis raise some important issues. These have to be addressed before proceeding as explained in Section 3.

⁷ Note that every municipality serves as both an origin and a destination.


4.1 Issues

The problem is threefold, comprising three different but crucial aspects to consider before moving on:

1. Choice of the appropriate model and its underlying distribution
2. The data set
3. Thesis objective

The first issue relates to choosing the right model. As pointed out in Section 2, commuting flow data belong to the category of count data, which may exhibit a large number of zero values. Choosing an appropriate model and its underlying distribution is therefore closely related to the second point, the data set. Normally, zero flows in the context of commuting have to be taken into account to obtain a consistent estimator. Applications like this one can treat them in different ways. Further, the sample size may restrict the application of certain models. Lastly, the first two points need to be in line with the thesis objective. The setup of the case study for Switzerland requires a balanced consideration of all three issues.

The purpose of this master thesis is to improve the gravity model for OD flows by applying a spatial autoregressive regression model and different spatial weighting schemes. Economic distance weights in the spatial models may cause endogeneity issues, because the weight matrix may be correlated with the error term. If the weight matrix is indeed endogenous and a valid instrument is found, a GMM/IV estimation approach according to Kelejian and Prucha (2010) should be applied. This yields consistent estimates and possibly better predictive accuracy. The initial flow matrix contains 98% zero flows, which means that spatial interaction is observed for only 2% of all OD pairs. This strongly suggests a Poisson or (zero-inflated) negative binomial interaction model to account for zero flows, as described in Section 2.3.2. In addition, the large sample size of 8,386,816 observations possibly demands too much memory. Applying the model to the given data may then be computationally infeasible. The most important restriction, however, is data availability. In this thesis, mean income and travel time reflect economic distance between municipalities. Since these data only cover 1595 municipalities, the original flow matrix has to be filtered accordingly. Moreover, as stated in the previous two sections, no implementation of a Poisson or negative binomial spatial autoregressive regression model for OD flows currently exists that could overcome the zero flow problem. Endogeneity issues in the spatial models are therefore to be addressed with Kelejian and Prucha's GMM/IV estimation method, which requires valid instruments to obtain a consistent estimator.


In conclusion, a filter method is applied to the given commuting flows. First, only municipalities with income data are considered. Keeping only interregional travel-to-work trips and removing all zero flows reduces the final sample to 46,659 observations. This ensures computational feasibility, especially for the 46,659-by-46,659 spatial weight matrix W. A caveat of this filtering approach is that it leads to biased and inefficient estimates, as shown in the existing literature (DeGroot and Linders, 2006). Against these findings, one could interpret the approach as the second stage of a two-step procedure. Its first step would be a censoring model, and only observations with spatial interaction, i.e. commuting activity, would be retained.
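The arithmetic behind the feasibility argument can be made explicit. Assume a dense matrix of 8-byte floating-point numbers. An n²-by-n² weight matrix for all 8,386,816 OD pairs would then need about 512 TiB, whereas the filtered 46,659-by-46,659 matrix needs about 16 GiB. Sparse storage reduces both figures considerably, so they are upper bounds. The helper name is hypothetical.

```python
def dense_gib(n_rows, bytes_per_entry=8):
    """Memory of a dense n_rows-by-n_rows float64 matrix, in GiB (2**30 bytes)."""
    return n_rows ** 2 * bytes_per_entry / 2 ** 30

full_pairs = 2896 ** 2     # all OD pairs, including zero flows
kept_pairs = 46_659        # observations after filtering

print(f"full W:     {dense_gib(full_pairs) / 1024:,.0f} TiB")
print(f"filtered W: {dense_gib(kept_pairs):,.1f} GiB")
```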

Fig. 3 presents the resulting commuting network. Higher flow values correspond to thicker linkages, and only flows above the median flow value are shown. The figure clearly shows dense linear features between the larger cities in Switzerland, which also hints at the monocentric nature of employment around big cities and towns. The influence of the Swiss workforce is evident and must be taken into account when analysing Swiss commuting-to-work patterns. Consider the travel-to-work distances within the public transport flow network. After filtering out zero flows, the distance distribution is heavily skewed towards shorter distances (compare Fig. 2(a) and Fig. 2(b)). The maximum distance considered also falls from 352 to 281 kilometres. The two boxplots in the second row of Fig. 2 show logged flows by distance decile. The filter evidently removes low flows from the initial data set, which raises the median flows in the first five deciles. This again shows the importance of the bigger cities in Switzerland. Note that for the left boxplot, all zero flows were transformed with y_new = y + 1.

4.2 Explanatory variables

Modelling commuting behaviour requires a set of relevant explanatory variables that describe the characteristics of origins and destinations. The dependent variable, interregional commuting-to-work flows, is regressed on several independent variables. These are obtained or derived from the 2000 Census, the Swiss national transport model ARE (2005) and the Institute for Transport Planning and Systems (IVT) of ETH Zurich. These basic variables are often used to explain public transport demand (LeSage and Thomas-Agnan, 2015; Farmer, 2011; Axhausen et al., 2015) (see Table 3).

Network distance is reported as travel time in minutes between municipalities. It essentially represents a generalised cost of travelling by public transport. Besides the raw travel time, it incorporates the waiting time at stations and the number of transfers on travel-to-work trips. Hence, network distance reflects the structure of public transportation in


Figure 2: Distributions of network distances and flows (in logs) before and after filtering

[Panels (a) Distance distribution before filtering and (b) Distance distribution after filtering: density over distance (km). Panels (c) Flows before filtering and (d) Flows after filtering: boxplots of log(flows) by distance decile (1-10).]

Table 3: Variables in the data set

Name | Description | Source
Commuting flows | Annual working-day average of commuting flows | Census 2000
Network distance | Generalised cost of travelling in minutes | ARE
Income | Mean income per municipality | IVT
Job accessibility | Number of jobs available by public transport in neighbouring municipalities | IVT
Population accessibility | Number of people reachable by public transport in neighbouring municipalities | IVT
Jobs | Jobs per municipality | Census 2000
Jobs3rd | Jobs in the 3rd sector (i.e. services) per municipality | Census 2000, IVT
Workers | Economically active population per municipality | Census 2000
Population | Population per municipality | Census 2000
Area | Area of municipality (km²) | Census 2000
Car | Cars per municipality | ARE


Figure 3: Map of filtered public transportation commuting flows within Switzerland in 2000. Flows emanate from the centroid of each municipality.

Map: Thomas Schatzmann

24

Page 29: Spatial modelling of origin-destination commuting … › content › dam › ethz › special-interest › baug › ...Spatial modelling of origin-destination commuting flows with

Spatial modelling of origin-destination commuting flows with endogenous weight matrices September 2017

a spatial grid and should influence the dependent variable negatively. Income is an importantvariable for transport demand as the difference of income between destinations and origins canbe seen as a reason to commute8. It is expected that income differences have positive influenceon flows. The big question about the influence of income is if it is a direct one or not. Tostart, it is assumed that income directly influences commuting flows, thus is exogenous withoutany other confounding effects. Job and population accessibility by public transportation aremeasures of available job positions and population in surrounding municipalities of origins anddestinations. They are constructed as follows (Sarlas and Axhausen, 2015)9:

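$$\text{Job accessibility}_i = \sum_{j \neq i} \text{Jobs}_j \, \exp\left(\beta \, \text{cost}_{ij}^{\alpha}\right)$$
$$\text{Pop. accessibility}_i = \sum_{j \neq i} \text{Population}_j \, \exp\left(\beta \, \text{cost}_{ij}^{\alpha}\right) \qquad (13)$$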
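A minimal sketch of Eq. (13) in Python with NumPy, assuming a dense matrix of generalised costs between municipalities and that the sum runs over all j ≠ i. The decay parameters `beta` and `alpha` below are placeholders: the values taken from Sarlas and Axhausen (2015) are not reproduced here, and the helper name `accessibility` is hypothetical.

```python
import numpy as np


def accessibility(mass, cost, beta, alpha):
    """Accessibility_i = sum_{j != i} mass_j * exp(beta * cost_ij ** alpha), cf. Eq. (13).

    mass  : (n,) jobs or population per municipality
    cost  : (n, n) generalised travel cost (minutes) from municipality i to j
    beta  : decay parameter (negative, so that distant masses count less)
    alpha : exponent applied to the cost
    """
    mass = np.asarray(mass, dtype=float)
    cost = np.asarray(cost, dtype=float)
    decay = np.exp(beta * cost ** alpha)  # distance-decay factor for every pair (i, j)
    np.fill_diagonal(decay, 0.0)          # drop j == i: only neighbouring municipalities count
    return decay @ mass                   # row i sums mass_j * decay_ij over all j


# Toy example with three municipalities and hypothetical decay parameters
jobs = [1000, 500, 2000]
cost = [[0, 10, 30],
        [10, 0, 20],
        [30, 20, 0]]
job_access = accessibility(jobs, cost, beta=-0.1, alpha=1.0)
```

Passing population counts instead of jobs as `mass` gives the population accessibility measure.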

Because they are meant to capture how municipalities compete with each other for available population and jobs, both measures are expected to have a negative impact on the flow under consideration, at either the origin or the destination. All jobs, jobs in the service sector, population and economically active population should be positively correlated with flows. The area variable is used to calculate job and population densities, whose influence on flows is not clear a priori. The number of cars per commune carries information about mode choice and should therefore lower any public transportation flow, as private and public transport are assumed to compete¹⁰.

⁸ Refer to Sarlas et al. (2015) for the derivation of the income per commune.
⁹ Note that the parameters of the distance decay functions are taken from Sarlas and Axhausen (2015).
¹⁰ Commuters using a mix of private and public transportation are not considered.

4.3 Aspatial (gravity) model

As pointed out in Section 3, the starting point is a log-linear least-squares gravity model for OD commuting flows, with a slight modification compared to Eq. (7) in order to incorporate income differences between destinations and origins:

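$$y = \iota_N^{\alpha} X_o^{\beta_o} X_d^{\beta_d} \exp\left(\frac{inc_d - inc_o}{inc_o}\right)^{\delta} g^{\gamma}$$
$$\log(y) = \alpha \log(\iota_N) + \beta_o \log(X_o) + \beta_d \log(X_d) + \delta \left(\frac{inc_d - inc_o}{inc_o}\right) + \gamma \log(g) + \varepsilon \qquad (14)$$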

where X_o and X_d are characteristics of origins and destinations, g denotes the network distance and (inc_d − inc_o)/inc_o reflects the relative difference in income between destination and origin municipalities. Relative rather than absolute differences are used because, inside the exponential function, changes in absolute differences would lead to disproportionately large reactions. The parameters associated with these variables are α, β_o, β_d, δ and γ respectively. The variables considered in the gravity model, included either directly or transformed, are summarised in Table 4.

Table 4: Summary statistics for the model variables

| Statistic | Definition | Mean | St. Dev. | Min | Max |
|---|---|---|---|---|---|
| Flow | average daily flows | 11.1 | 79.1 | 1 | 5,698 |
| Network dist. | minutes | 74.1 | 42.8 | 5.8 | 730.3 |
| Income diff. rel.¹ | CHF (in 1,000) | 0.068 | 0.235 | −0.643 | 1,744 |
| Population (o) | # inhabitants | 13,661.970 | 40,809.420 | 45 | 363,273 |
| Jobs (d) | # jobs | 19,918.660 | 55,560.380 | 11 | 341,213 |
| Pop. density (d) | # pop. / area (in km²) | 1,372.3 | 1,719.4 | 1.5 | 9,581.1 |
| Job density (o) | # jobs / area (in km²) | 657.9 | 1,856.8 | 1.000 | 67,561.3 |
| Pop. access. (d) | # access. pop. | 277,352.7 | 224,990.9 | 93.8 | 1,064,884 |
| Job access. (o) | # access. jobs | 120,298.3 | 107,246.1 | 35.3 | 567,509.4 |
| Car (o) | # cars / pop. | 0.5 | 0.085 | 0 | 1.177 |
| Car (d) | # cars / pop. | 0.5 | 0.089 | 0 | 1.177 |
| Jobs3rd (o) | # jobs3rd² / # jobs | 0.595 | 0.176 | 0.044 | 1.000 |
| Jobs3rd (d) | # jobs3rd² / # jobs | 0.652 | 0.174 | 0.044 | 0.990 |
| Workers (o) | # workers³ / pop. | 0.524 | 0.036 | 0.277 | 0.726 |
| Workers (d) | # workers³ / pop. | 0.524 | 0.036 | 0.277 | 0.726 |

Note: N = 46,659; (o), (d) = at origin, at destination municipalities. ¹ See Eq. (14). ² Share of 3rd sector jobs per municipality. ³ Share of workers per municipality.


4.3.1 Results

After identifying a set of explanatory variables, the next step is to estimate the model in Eq. (14) by OLS.
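The following sketch shows what estimating the log-linearised Eq. (14) by OLS amounts to. It uses synthetic data with a reduced, hypothetical set of regressors whose "true" parameters are chosen loosely after Table 5; it is not the R code or the data used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000  # synthetic OD pairs (the thesis sample has 46,659)

# Synthetic regressors, for illustration only (not the Swiss census data)
log_pop_o = rng.normal(8.0, 1.0, n)    # log(Population) at the origin
log_jobs_d = rng.normal(9.0, 1.2, n)   # log(Jobs) at the destination
log_dist = rng.normal(4.2, 0.5, n)     # log(Network distance in minutes)
rel_inc = rng.normal(0.07, 0.23, n)    # (inc_d - inc_o) / inc_o, enters in levels

# Hypothetical "true" parameters, loosely based on Table 5
true_coef = np.array([4.4, 0.44, 0.47, -1.54, 0.085])
X = np.column_stack([np.ones(n), log_pop_o, log_jobs_d, log_dist, rel_inc])
log_flow = X @ true_coef + rng.normal(0.0, 0.84, n)

# OLS on the log-linearised gravity model
coef, *_ = np.linalg.lstsq(X, log_flow, rcond=None)
resid = log_flow - X @ coef
k = X.shape[1]
adj_r2 = 1.0 - (resid @ resid / (n - k)) / np.var(log_flow, ddof=1)
```

Because the flows are logged, OD pairs with zero flows cannot enter this specification directly.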

Table 5: Gravity model

Dependent variable: log(Commuting flows); estimator: OLS

| Variable | Estimate | Std. error | VIF |
|---|---|---|---|
| (Intercept) | 4.443∗∗∗ | (0.133) | |
| log(Network distance) | −1.537∗∗∗ | (0.011) | 1.30 |
| Rel. income diff. | 0.085∗∗∗ | (0.019) | 1.42 |
| log(Jobs) (d) | 0.473∗∗∗ | (0.005) | 2.63 |
| log(Pop. density) (d) | 0.030∗∗∗ | (0.005) | 2.89 |
| log(Pop. access.) (d) | −0.176∗∗∗ | (0.007) | 2.58 |
| log(Jobs3rd) (d) | 0.102∗∗∗ | (0.014) | 1.51 |
| log(Car) (d) | −0.071∗∗∗ | (0.016) | 1.08 |
| log(Workers) (d) | 0.665∗∗∗ | (0.067) | 1.51 |
| log(Population) (o) | 0.440∗∗∗ | (0.006) | 2.71 |
| log(Job density) (o) | −0.043∗∗∗ | (0.005) | 2.87 |
| log(Job access.) (o) | −0.180∗∗∗ | (0.006) | 2.43 |
| log(Jobs3rd) (o) | −0.027∗∗ | (0.013) | 1.46 |
| log(Car) (o) | −0.023∗ | (0.014) | 1.04 |
| log(Workers) (o) | 0.365∗∗∗ | (0.069) | 1.52 |

HC robust std. errors: Yes; Observations: 46,659; Adjusted R²: 0.518; Residual std. error: 0.844 (df = 46,644)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

Table 5 shows parameter estimates, significance levels, standard errors (in brackets) and variance inflation factors for the gravity model. All parameters are significant at the 1% level except those for the share of 3rd sector jobs at origins and the share of cars per origin municipality, which are significant at the 5% and 10% levels respectively. The network distance decay parameter (−1.537) is within the expected range for commuting patterns and reads as follows: on average, a 1% increase in network distance (in minutes) between origin i and destination j is associated with a decrease in commuting flows (number of commuters) of about 1.537%.
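As a worked check of this reading, the snippet below compares the elasticity approximation with the exact change implied by the log-log form, and applies the same logic to the relative income difference, which enters Eq. (14) in levels and therefore acts as a semi-elasticity. The coefficients are those of Table 5; the helper names are hypothetical.

```python
import math

gamma = -1.537  # distance-decay estimate from Table 5
delta = 0.085   # relative income difference estimate from Table 5


def pct_change_log_log(coef, pct_change_x):
    """Exact % change in y for a given % change in x when log(y) = coef * log(x) + ..."""
    return ((1.0 + pct_change_x / 100.0) ** coef - 1.0) * 100.0


def pct_change_log_level(coef, change_x):
    """Exact % change in y for an absolute change in x when log(y) = coef * x + ..."""
    return (math.exp(coef * change_x) - 1.0) * 100.0


dist_1pct = pct_change_log_log(gamma, 1.0)      # about -1.52 %, close to the -1.537 % approximation
dist_10pct = pct_change_log_log(gamma, 10.0)    # about -13.6 %, noticeably less than 10 * 1.537 %
income_1pp = pct_change_log_level(delta, 0.01)  # +0.01 in (inc_d - inc_o) / inc_o: about +0.085 %
```

The elasticity reading is therefore accurate for small changes but overstates the effect of large ones.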


All other explanatory variables have a much weaker impact on the dependent variable, which is in line with the existing literature (LeSage and Thomas-Agnan, 2015; Farmer, 2011). Income differences between destinations and origins have a significant and positive effect on travel-to-work trips. Because the relative difference enters Eq. (14) in levels rather than in logs, its coefficient is strictly a semi-elasticity: raising the relative income difference by 0.01 increases flows by about 0.085%. This intuitively makes sense, as a higher income in another commune gives an incentive to commute. Among the destination characteristics, the share of workers (economically active population per municipality) and the number of jobs have the largest influence on travel-to-work trips (0.665 and 0.473), whereas the accessibility of population in neighbouring communes has the strongest negative influence (−0.176). A higher accessibility of population by public transport results in less transport demand for the destination of the OD flow under consideration and can thus be interpreted as a kind of competition variable. Regarding the origin-specific variables, population and the share of workers show the strongest positive impact on commuting flows (0.440 and 0.365). An increase in either variable raises travel demand, leading to higher flows away from origin communes. If more jobs in the neighbouring communes are accessible by train and bus, this has a negative effect on travel-to-work trips, again the largest among the origin variables (−0.180). Interestingly, a higher job density in the origin itself has a smaller effect on commuting flows (−0.043) than more available jobs outside of it. As expected, cars have a negative impact because they compete with public transportation for travel demand.

The adjusted R² of 51.8% reported in Table 5 shows that a bit more than half of the variation in commuting flows is explained by the OLS model. The residuals of the gravity model are almost normally distributed, though slightly right-skewed and with a higher kurtosis (see Fig. 4(a)). Heteroskedasticity-robust standard errors are reported to guard against non-constant residual variance. The variance inflation factors of all independent variables are below 3 and thus indicate no multicollinearity problem.
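The two diagnostics mentioned above can be sketched in a few lines of NumPy. The thesis does not state which heteroskedasticity-consistent variant was used, so HC1 (the White sandwich estimator with an n/(n − k) correction) is an assumption here, as are the function names and the synthetic data.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS estimates with HC1 heteroskedasticity-robust standard errors.

    X must already contain a column of ones for the intercept.
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    e = y - X @ beta
    meat = (X * (e ** 2)[:, None]).T @ X            # sum_i e_i^2 x_i x_i'
    cov = n / (n - k) * xtx_inv @ meat @ xtx_inv    # sandwich with small-sample factor n/(n-k)
    return beta, np.sqrt(np.diag(cov))


def vif(X):
    """Variance inflation factor of every non-intercept column (column 0 = intercept)."""
    factors = []
    for j in range(1, X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)  # remaining regressors, intercept included
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Synthetic check: two correlated regressors (corr = 0.6) and heteroskedastic noise
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + 0.8 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n) * (1.0 + np.abs(x1))

beta, se_hc1 = ols_hc1(X, y)
vifs = vif(X)  # both close to 1 / (1 - 0.6**2) = 1.5625
```

With only two regressors both VIFs coincide, since each equals 1/(1 − r²) for the same sample correlation r.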

4.3.2 Spatial dependence in the residuals

As emphasised in Section 3.1, OLS relies on independent observations. In the context of OD commuting flows, this amounts to assuming that the network distance variable removes the spatial dependence among the sample OD pairs. This is unlikely here: Griffith and Jones (1980, p. 190) state that "flows associated with a destination are enhanced or diminished in accordance with the propensity of attractiveness of its neighboring destination locations". The same holds for flows from origins. Residuals of gravity models therefore tend to indicate the presence of untreated spatial effects (Curry, 1972).


Figure 4: Gravity model diagnostics
[Panel (a): distribution of the model residuals (density against residuals). Panel (b): plot of the model residuals by observation decile (n = 46,659).]

In order to show that the residuals of the gravity model in Section 4.3.1 indeed exhibit spatial autocorrelation, Moran's I test (Moran, 1948) is applied to the mean OLS residuals grouped by origin and by destination municipality, using an origin-centric and a destination-centric spatial weight matrix respectively. Note that the final sample of 46,659 observations contains 1,595 unique origins and 1,537 unique destinations, which again hints at the monocentric nature of employment in Switzerland: there are fewer destination (working) communes than origin communes. Spatial weight matrices are constructed from network distance (traveltime in minutes) and from economic distance. The weights are inverse distances and hence give more distant origins or destinations less weight than closer ones. The spatial extent of autocorrelation in the OLS residuals is used to define the size of neighbourhoods.

Both plots on the left of Fig. 5 show that spatial autocorrelation is present when weighting with network distance, and that it is stronger at origins (0.097) than at destinations (0.066): the mean residuals of origins and of destinations are positively correlated with their spatially lagged values. For origins, the spatial autocorrelation is significant up to a radius of 120 minutes of traveltime, for destinations up to 100 minutes. Squares in the Moran scatterplots mark influential observations (communes) that affect the slope (global Moran's I) more than proportionally. The maps on the right of the figure show these influential municipalities; there seems to be no clear pattern or cluster overall, although the patterns for origins and destinations are quite similar. Using economic distance weights (defined in Eq. (15)) reduces the spatial autocorrelation at both origins and destinations (0.082 and 0.053), but the pattern of influential observations does not change across the two weighting schemes.
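To make the diagnostic concrete, here is a small NumPy sketch of the global Moran's I applied to mean residuals grouped by origin, with inverse-distance weights truncated at a cutoff (120 minutes for origins and 100 minutes for destinations in the thesis). The helper names and toy numbers are hypothetical, no standardisation of W and no significance test (e.g. by permutation) are included, and the R routines used in the thesis may differ in these details.

```python
import numpy as np


def mean_by_group(values, group_ids):
    """Mean of `values` per group; group_ids must be integer codes 0..G-1, each code present."""
    values = np.asarray(values, dtype=float)
    sums = np.bincount(group_ids, weights=values)
    counts = np.bincount(group_ids)
    return sums / counts


def inverse_distance_weights(dist, cutoff):
    """w_ij = 1 / d_ij for 0 < d_ij <= cutoff and 0 otherwise (also 0 on the diagonal)."""
    dist = np.asarray(dist, dtype=float)
    W = np.zeros_like(dist)
    inside = (dist > 0) & (dist <= cutoff)
    W[inside] = 1.0 / dist[inside]
    return W


def morans_i(x, W):
    """Global Moran's I = (n / S0) * z'Wz / z'z, with z the demeaned variable and S0 the sum of weights."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)


# Toy example: OD residuals of six origins placed on a line, 10 minutes apart
od_resid = [1.2, 0.8, 1.0, 1.0, 0.5, -0.5, -1.1, -0.9, -1.0, -1.0]
origin_ids = [0, 0, 1, 1, 2, 3, 4, 4, 5, 5]
mean_res = mean_by_group(od_resid, origin_ids)

positions = np.arange(6) * 10.0
dist = np.abs(positions[:, None] - positions[None, :])
W = inverse_distance_weights(dist, cutoff=15.0)  # only direct neighbours fall inside the cutoff
mi = morans_i(mean_res, W)                       # positive: similar residuals cluster together
```

Under the null of no spatial autocorrelation the expected value of Moran's I is −1/(n − 1), so clearly positive values such as the 0.097 and 0.066 reported above indicate positive dependence.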


Figure 5: Spatial dependence in the mean OLS residuals with a network distance based spatial weight matrix
[Panel (a): Moran's I plot for origins, MI: 0.097 (spatially lagged mean residuals against mean residuals; influential municipalities labelled by ID). Panel (b): map of influential origins (scale bar 50 km). Panel (c): Moran's I plot for destinations, MI: 0.066. Panel (d): map of influential destinations (scale bar 50 km).]


Figure 6: Spatial dependence in the mean OLS residuals with an economic distance based spatial weight matrix
[Panel (a): Moran's I plot for origins, MI: 0.082 (spatially lagged mean residuals against mean residuals; influential municipalities labelled by ID). Panel (b): map of influential origins (scale bar 50 km). Panel (c): Moran's I plot for destinations, MI: 0.053. Panel (d): map of influential destinations (scale bar 50 km).]


4.4 Spatial autoregressive models

The previous section showed that weighting the mean residuals (by origin and by destination) with network and economic distance reveals untreated patterns of spatial dependence. This justifies the use of models that are able to capture such leftover autocorrelation. In this section, eight spatial autoregressive models are applied to the OD commuting flows, again using both weighting schemes in an origin-centric and a destination-centric fashion, each with row- and minmax-normalisation. The weights are defined in Eq. (15).

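$$\text{Network distance weights:}\quad w_{ij} = \frac{1}{\text{traveltime}_{ij}}$$
$$\text{Economic distance weights:}\quad w_{ij} = \left(\frac{\text{traveltime}_{ij}}{\exp\left(\frac{inc_d - inc_o}{inc_o}\right)}\right)^{-1} \qquad (15)$$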

For economic distance weights, traveltimes are weighted with the exponential of the relative difference in communal incomes. For example, a positive difference resulting from a higher income at the destination than at the origin of a given OD dyad lowers the traveltime, which implies a higher weight because the inverse is taken. Note that, following Section 4.3.2, weights for network and economic distances above a threshold are set to zero, i.e. no spatial autocorrelation is assumed to remain beyond that distance from origins and/or destinations. The thresholds are 120 minutes of travel away from origins and 100 minutes away from destinations. Furthermore, besides classic row-standardisation, a minmax-standardisation routine is applied to all weights, essentially to account for the size of spatial units and to mitigate the modifiable areal unit problem (Kelejian and Prucha, 2010; Killer, 2014)¹¹. In an origin-centric weight matrix these weights are assigned to neighbouring origins, so that they weight the commuting flows from the neighbours of an origin to a specific destination. The same principle holds for destination-centric spatial weight matrices.
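The sketch below builds an economic distance weight matrix following Eq. (15) for a handful of municipalities, applies a cutoff, and compares row- and minmax-standardisation. Two points are assumptions for illustration: the orientation of the income difference inside a municipality-level matrix (row i treated as origin, column j as destination), and the minmax rule taken as division by min(max row sum, max column sum) following Kelejian and Prucha (2010). The expansion of these weights to the OD-pair level is not shown, and all names and numbers are hypothetical.

```python
import numpy as np


def economic_distance_weights(traveltime, income, cutoff):
    """Inverse economic distance, cf. Eq. (15), with weights beyond `cutoff` set to zero.

    Assumption: row i is the 'origin' and column j the 'destination', so
    rel_ij = (inc_j - inc_i) / inc_i.
    """
    traveltime = np.asarray(traveltime, dtype=float)
    income = np.asarray(income, dtype=float)
    rel = (income[None, :] - income[:, None]) / income[:, None]
    econ_dist = traveltime / np.exp(rel)              # higher destination income shortens the distance
    W = np.zeros_like(econ_dist)
    inside = (econ_dist > 0) & (econ_dist <= cutoff)  # diagonal (distance 0) stays 0
    W[inside] = 1.0 / econ_dist[inside]
    return W


def row_standardise(W):
    """Divide each row by its sum (rows without neighbours stay zero)."""
    rs = W.sum(axis=1, keepdims=True)
    return np.divide(W, rs, out=np.zeros_like(W), where=rs > 0)


def minmax_standardise(W):
    """Divide the whole matrix by min(max row sum, max column sum); weights are non-negative here."""
    return W / min(W.sum(axis=1).max(), W.sum(axis=0).max())


income = [50.0, 60.0, 40.0]    # hypothetical mean incomes (CHF 1,000)
traveltime = [[0, 30, 90],
              [30, 0, 60],
              [90, 60, 0]]      # minutes
W = economic_distance_weights(traveltime, income, cutoff=100)
W_row = row_standardise(W)
W_mm = minmax_standardise(W)
```

Note that the income adjustment makes the matrix asymmetric: in this toy example the pair (0, 2) falls beyond the cutoff while (2, 0) does not, because municipality 0 has the higher income.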

4.4.1 Results

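$$\text{SAR:}\quad y = \alpha\iota_N + \rho_i W_i y + X_o\beta_o + X_d\beta_d + \varepsilon, \quad i = o, d$$
$$\text{SEM:}\quad y = \alpha\iota_N + X_o\beta_o + X_d\beta_d + \varepsilon, \quad \varepsilon = \lambda_i W_i \varepsilon + \mu, \quad i = o, d$$
$$\text{SAC:}\quad y = \alpha\iota_N + \rho_i W_i y + X_o\beta_o + X_d\beta_d + \varepsilon, \quad \varepsilon = \lambda_i W_i \varepsilon + \mu, \quad i = o, d \qquad (16)$$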
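To make the three specifications in Eq. (16) concrete, the sketch below draws data from the SAC model via its reduced form; setting `lam = 0` gives the SAR model and `rho = 0` the SEM. It is a generic illustration with a single dense weight matrix on a hypothetical ring of units, not the OD-level setup with W_o and W_d used in the thesis. The reduced form y = (I − ρW)⁻¹(Xβ + ε) also shows why Wy is correlated with ε, so OLS is not appropriate for the SAR model.

```python
import numpy as np


def simulate_sac(X, beta, W, rho=0.0, lam=0.0, sigma=1.0, seed=None):
    """Draw y from y = rho W y + X beta + eps, eps = lam W eps + mu (cf. Eq. (16)).

    rho = 0 gives the SEM, lam = 0 the SAR model; X includes the intercept column.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    I = np.eye(n)
    mu = rng.normal(0.0, sigma, n)
    eps = np.linalg.solve(I - lam * W, mu)            # reduced form of the error process
    y = np.linalg.solve(I - rho * W, X @ beta + eps)  # reduced form of the spatial lag
    return y, eps, mu


# Ring of 50 units, each linked to its two neighbours, row-standardised
n = 50
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), np.random.default_rng(1).normal(size=n)])
beta = np.array([1.0, 0.5])
y, eps, mu = simulate_sac(X, beta, W, rho=0.4, lam=0.3, seed=2)
```

Dense solves like these are only feasible for small n; at the OD level the same operations have to rely on sparse matrices.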

¹¹ More details about row- and minmax-standardisation are given in Section 4.4.1.


Lagrange multiplier (LM) tests for spatial dependence, applied to the OLS gravity model residuals with all four weighting schemes in both row- and minmax-normalised form, favour SAC models, which contain spatial dependence in both the dependent variable and the disturbances: the SAC statistics are consistently the highest, especially for minmax-normalised weight matrices (see Table 6).

Table 6: Lagrange multiplier diagnostics for spatial dependence in OLS residuals

| Model | Normalisation | Robust SAR statistic | p-value | Robust SEM statistic | p-value | Robust SAC statistic | p-value |
|---|---|---|---|---|---|---|---|
| Network dist. (o) | row | 229.6 | < 2.2e-16 | 9,949.8 | < 2.2e-16 | 16,143 | < 2.2e-16 |
| Network dist. (d) | row | 588.6 | < 2.2e-16 | 1,640.3 | < 2.2e-16 | 1,743.2 | < 2.2e-16 |
| Econ. dist. (o) | row | 183.6 | < 2.2e-16 | 9,022.9 | < 2.2e-16 | 15,394 | < 2.2e-16 |
| Econ. dist. (d) | row | 621.7 | < 2.2e-16 | 1,518.7 | < 2.2e-16 | 1,572.9 | < 2.2e-16 |
| Network dist. (o) | minmax | 4,200.2 | < 2.2e-16 | 141,630 | < 2.2e-16 | 156,400 | < 2.2e-16 |
| Network dist. (d) | minmax | 1,315.1 | < 2.2e-16 | 3,718.1 | < 2.2e-16 | 6,976.9 | < 2.2e-16 |
| Econ. dist. (o) | minmax | 3,447 | < 2.2e-16 | 139,940 | < 2.2e-16 | 153,040 | < 2.2e-16 |
| Econ. dist. (d) | minmax | 793.83 | < 2.2e-16 | 4,000.9 | < 2.2e-16 | 6,379.1 | < 2.2e-16 |
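For readers who want to see what lies behind Table 6, the following NumPy sketch computes the standard LM diagnostics of Anselin et al. (1996) for a single weight matrix: LM-lag, LM-error, their robust versions and the joint SARMA statistic. How these map onto the "Robust SAR", "Robust SEM" and "Robust SAC" columns is an assumption, since the formulas behind Table 6 are not spelled out; R's spdep reports the same family of tests under the names used below. The data are synthetic.

```python
import numpy as np


def lm_tests(y, X, W):
    """LM diagnostics on OLS residuals (Anselin et al., 1996) for one weight matrix W.

    Under H0, LMlag, LMerr, RLMlag and RLMerr are chi2(1) and SARMA is chi2(2).
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / n
    T = np.sum(W * W) + np.sum(W * W.T)   # tr(W'W + WW)
    WXb = W @ (X @ beta)
    g, *_ = np.linalg.lstsq(X, WXb, rcond=None)
    M_WXb = WXb - X @ g                    # M W X b with M = I - X (X'X)^{-1} X'
    nJ = (WXb @ M_WXb + T * s2) / s2
    a = e @ (W @ y) / s2                   # score term for the spatial lag
    b = e @ (W @ e) / s2                   # score term for the spatial error
    return {
        "LMlag": a ** 2 / nJ,
        "LMerr": b ** 2 / T,
        "RLMlag": (a - b) ** 2 / (nJ - T),
        "RLMerr": (b - T * a / nJ) ** 2 / (T - T ** 2 / nJ),
        "SARMA": (a - b) ** 2 / (nJ - T) + b ** 2 / T,
    }


# Synthetic SAR data on a row-standardised ring of 400 units (rho = 0.5)
rng = np.random.default_rng(3)
n = 400
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 0.5]) + rng.normal(size=n))
stats = lm_tests(y, X, W)
```

A useful consistency check is that SARMA equals both LMerr + RLMlag and LMlag + RLMerr.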

Up to this point, computer memory has not been a constraint for any of the calculations. Due to the filtering approach, constructing and implementing the spatial weights in R was difficult and required highly customised code. However, when it comes to estimating SEM and SAC models for OD flows, computer memory is insufficient to run these calculations with the sphet package (Piras, 2010), and the spdep package was not even able to estimate the SAR model. Furthermore, the current version of sphet does not allow the three different weight matrices W_i, i = (o, d, w), described in Eq. (8) to be included simultaneously; it only allows one of these three specifications. Nevertheless, SAR models for all eight weighting variants mentioned above could be estimated. The results are presented in Table 7 and Table 8.
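A back-of-the-envelope sketch of why memory becomes binding: any step that materialises an n × n matrix over the 46,659 OD pairs, such as a dense weight matrix or the inverse (I − ρW)⁻¹, needs roughly 16 GiB per copy in double precision, whereas sparse (CSR) storage grows with the number of non-zero weights. The 500 neighbours per OD pair below is a hypothetical figure, and which internal step of sphet or spdep actually exhausted memory is not documented here.

```python
n_obs = 46_659        # OD pairs after filtering (Table 4)
bytes_per_float = 8   # one double-precision number


def dense_bytes(n):
    """Memory for a dense n x n matrix of doubles."""
    return n * n * bytes_per_float


def csr_bytes(n, nnz, index_bytes=4):
    """Approximate memory of a CSR sparse matrix: values + column indices + row pointers."""
    return nnz * (bytes_per_float + index_bytes) + (n + 1) * index_bytes


dense = dense_bytes(n_obs)               # 17,416,498,248 bytes, about 16.2 GiB per matrix
sparse = csr_bytes(n_obs, n_obs * 500)   # hypothetical 500 neighbours per OD pair: about 0.26 GiB
```

Estimators that invert or factorise I − ρW densely, or that hold several such matrices at once, multiply the dense figure accordingly.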

First of all, it has to be emphasised that parameter estimates of spatial autoregressive regression models cannot be interpreted as simple elasticities as in the gravity model, since spatial spillovers greatly complicate their interpretation. LeSage and Thomas-Agnan (2015, p. 207) propose "scalar summary measures of these impacts that average over changes applied to a single independent variable (regional characteristic) for all regions, analogous to how OLS regression model estimates are interpreted. The proposed approach allows separation of impacts by row/column and diagonal matrix elements, which we label: origin, destination, and intraregional effects". These summary measures are not calculated here, but the limited interpretability of the estimates has to be kept in mind. Showing the cumulative network effects as a scalar summary is even more complicated. Even though the parameter estimates are not comparable between models regarding their interpretation, they show only slight variation across the different models. Apart from the relative income difference in model 6 and the (insignificant) share of cars at origins in model 4, no estimate changes sign relative to the gravity model; rho is negative in models 3 and 7 and positive elsewhere.

Table 7: Spatial autoregressive models with network distance weight matrix

Dependent variable: log(Commuting flows)

| Variable | (1) SAR network distance (o) | (2) SAR network distance (o) | (3) SAR network distance (d) | (4) SAR network distance (d) |
|---|---|---|---|---|
| (Intercept) | 4.348∗∗∗ (0.134) | 5.190∗∗∗ (0.126) | 4.721∗∗∗ (0.137) | 5.041∗∗∗ (0.130) |
| log(Network distance) | −1.506∗∗∗ (0.013) | −1.322∗∗∗ (0.011) | −1.594∗∗∗ (0.012) | −1.378∗∗∗ (0.012) |
| Rel. income diff. | 0.083∗∗∗ (0.019) | 0.027 (0.018) | 0.091∗∗∗ (0.019) | 0.111∗∗∗ (0.019) |
| log(Jobs) (d) | 0.450∗∗∗ (0.007) | 0.298∗∗∗ (0.005) | 0.466∗∗∗ (0.005) | 0.485∗∗∗ (0.005) |
| log(Pop. density) (d) | 0.027∗∗∗ (0.005) | 0.038∗∗∗ (0.005) | 0.030∗∗∗ (0.005) | 0.025∗∗∗ (0.005) |
| log(Pop. access.) (d) | −0.173∗∗∗ (0.007) | −0.180∗∗∗ (0.006) | −0.189∗∗∗ (0.007) | −0.226∗∗∗ (0.007) |
| log(Jobs3rd) (d) | 0.100∗∗∗ (0.014) | 0.087∗∗∗ (0.013) | 0.100∗∗∗ (0.014) | 0.086∗∗∗ (0.014) |
| log(Car) (d) | −0.067∗∗∗ (0.016) | −0.001 (0.014) | −0.077∗∗∗ (0.016) | −0.087∗∗∗ (0.016) |
| log(Workers) (d) | 0.669∗∗∗ (0.066) | 0.473∗∗∗ (0.062) | 0.666∗∗∗ (0.067) | 0.501∗∗∗ (0.065) |
| log(Population) (o) | 0.438∗∗∗ (0.006) | 0.450∗∗∗ (0.006) | 0.493∗∗∗ (0.007) | 0.318∗∗∗ (0.006) |
| log(Job density) (o) | −0.040∗∗∗ (0.005) | −0.041∗∗∗ (0.004) | −0.046∗∗∗ (0.005) | −0.044∗∗∗ (0.005) |
| log(Job access.) (o) | −0.174∗∗∗ (0.006) | −0.239∗∗∗ (0.006) | −0.179∗∗∗ (0.006) | −0.182∗∗∗ (0.006) |
| log(Jobs3rd) (o) | −0.023∗ (0.013) | −0.029∗∗ (0.012) | −0.025∗ (0.013) | −0.021∗ (0.013) |
| log(Car) (o) | −0.019 (0.013) | −0.033∗∗∗ (0.012) | −0.027∗ (0.014) | 0.006 (0.013) |
| log(Workers) (o) | 0.348∗∗∗ (0.068) | 0.276∗∗∗ (0.063) | 0.418∗∗∗ (0.069) | 0.300∗∗∗ (0.067) |
| rho | 0.071∗∗∗ (0.017) | 1.824∗∗∗ (0.030) | −0.222∗∗∗ (0.019) | 1.596∗∗∗ (0.049) |
| HC robust std. errors | Yes | Yes | Yes | Yes |
| Pseudo adj. R² | 0.5238 | 0.5802 | 0.5149 | 0.5377 |
| Weighting | row | minmax | row | minmax |
| Moran's I | 0.06∗∗∗ | 0.11∗∗∗ | 0.20∗∗∗ | 0.36∗∗∗ |

Table 8: Spatial autoregressive models with economic distance weight matrix

Dependent variable: log(Commuting flows)

| Variable | (5) SAR economic distance (o) | (6) SAR economic distance (o) | (7) SAR economic distance (d) | (8) SAR economic distance (d) |
|---|---|---|---|---|
| (Intercept) | 4.331∗∗∗ (0.133) | 4.993∗∗∗ (0.128) | 4.655∗∗∗ (0.133) | 4.845∗∗∗ (0.131) |
| log(Network distance) | −1.492∗∗∗ (0.013) | −1.388∗∗∗ (0.011) | −1.579∗∗∗ (0.013) | −1.450∗∗∗ (0.012) |
| Rel. income diff. | 0.084∗∗∗ (0.019) | −0.180∗∗∗ (0.019) | 0.095∗∗∗ (0.019) | 0.187∗∗∗ (0.020) |
| log(Jobs) (d) | 0.435∗∗∗ (0.007) | 0.343∗∗∗ (0.006) | 0.466∗∗∗ (0.007) | 0.483∗∗∗ (0.005) |
| log(Pop. density) (d) | 0.025∗∗∗ (0.005) | 0.040∗∗∗ (0.005) | 0.035∗∗∗ (0.005) | 0.028∗∗∗ (0.005) |
| log(Pop. access.) (d) | −0.176∗∗∗ (0.007) | −0.180∗∗∗ (0.006) | −0.202∗∗∗ (0.007) | −0.205∗∗∗ (0.007) |
| log(Jobs3rd) (d) | 0.098∗∗∗ (0.014) | 0.091∗∗∗ (0.014) | 0.105∗∗∗ (0.014) | 0.091∗∗∗ (0.014) |
| log(Car) (d) | −0.065∗∗∗ (0.016) | −0.010 (0.014) | −0.074∗∗∗ (0.016) | −0.079∗∗∗ (0.016) |
| log(Workers) (d) | 0.662∗∗∗ (0.066) | 0.452∗∗∗ (0.063) | 0.672∗∗∗ (0.066) | 0.582∗∗∗ (0.066) |
| log(Population) (o) | 0.438∗∗∗ (0.006) | 0.445∗∗∗ (0.006) | 0.494∗∗∗ (0.006) | 0.364∗∗∗ (0.007) |
| log(Job density) (o) | −0.039∗∗∗ (0.005) | −0.040∗∗∗ (0.004) | −0.046∗∗∗ (0.005) | −0.042∗∗∗ (0.005) |
| log(Job access.) (o) | −0.168∗∗∗ (0.006) | −0.221∗∗∗ (0.006) | −0.166∗∗∗ (0.006) | −0.185∗∗∗ (0.006) |
| log(Jobs3rd) (o) | −0.023∗ (0.013) | −0.025∗∗ (0.012) | −0.023∗ (0.013) | −0.026∗∗ (0.013) |
| log(Car) (o) | −0.019 (0.013) | −0.027∗∗ (0.013) | −0.022 (0.013) | −0.005 (0.013) |
| log(Workers) (o) | 0.351∗∗∗ (0.067) | 0.310∗∗∗ (0.065) | 0.416∗∗∗ (0.067) | 0.318∗∗∗ (0.068) |
| rho | 0.119∗∗∗ (0.020) | 2.066∗∗∗ (0.063) | −0.228∗∗∗ (0.020) | 1.066∗∗∗ (0.053) |
| HC robust std. errors | Yes | Yes | Yes | Yes |
| Pseudo adj. R² | 0.5278 | 0.5594 | 0.5156 | 0.5274 |
| Weighting | row | minmax | row | minmax |
| Moran's I | 0.05∗∗∗ | 0.09∗∗∗ | 0.19∗∗∗ | 0.33∗∗∗ |
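The scalar summary measures of LeSage and Thomas-Agnan (2015) quoted above are not computed in the thesis. As a point of reference only, the sketch below shows the standard cross-sectional version for a SAR model with a single weight matrix: average direct, indirect and total impacts derived from S = (I − ρW)⁻¹β_k. The OD-specific decomposition into origin, destination, intraregional and network effects is more involved and is not reproduced; the ring-shaped W and the parameter values are hypothetical.

```python
import numpy as np


def sar_impacts(W, rho, beta_k):
    """Average direct, indirect and total impacts of regressor k in a single-W SAR model.

    S = (I - rho W)^{-1} * beta_k; direct = tr(S)/n, total = 1'S1/n, indirect = total - direct.
    """
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W) * beta_k
    direct = np.trace(S) / n
    total = S.sum() / n
    return direct, total - direct, total


n = 30
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5   # row-standardised ring
direct, indirect, total = sar_impacts(W, rho=0.4, beta_k=0.5)
```

With row-standardised weights the total impact reduces to β_k/(1 − ρ), here 0.5/0.6, which makes clear why a raw coefficient understates the overall effect when ρ > 0.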

Rho (the autoregressive parameter) is of particular interest. For all SAR models with minmax normalisation (models 2, 4, 6 & 8), its value is higher than 1, which is an artefact of using this approach instead of classic row normalisation when building W_i, i = (o, d) (Kelejian and Prucha, 2010). Because in applications Iₙ − ρWₙ is typically found to be singular for some values of ρ ∈ (−1, 1), researchers normalise each row of their spatial weights. Under row standardisation, where each row sums to unity, Iₙ − ρWₙ (the spatial linear operator from Eq. (4)) is non-singular for all ρ ∈ (−1, 1), which is needed to estimate the model. However, row standardisation uses a different normalisation factor for the elements of each row rather than a single factor for the whole matrix, as in the minmax approach, which according to Kelejian and Prucha (2010) in general leads to a misspecified model. Minmax normalisation instead considers both rows and columns. Taking W*ₙ = Wₙ/τₙ and selecting (−1, 1) as the parameter space for ρ*ₙ = ρₙτₙ is equivalent to choosing (−1/τₙ, 1/τₙ) as the parameter space for ρₙ, which keeps all eigenvalues of ρₙWₙ, namely ρₙv₁,ₙ, …, ρₙvₙ,ₙ, below one in absolute value (see the first equation in Eq. (17)). Since computing the eigenvalues of Wₙ is difficult for large samples, the second equation in Eq. (17) gives a bound that is simple to compute.

$$\tau_n = \max\{|v_{1,n}|, \ldots, |v_{n,n}|\}$$
$$\tau_n^{*} = \min\left\{\max_{1 \le i \le n} \sum_{j=1}^{n} |w_{ij,n}|,\; \max_{1 \le j \le n} \sum_{i=1}^{n} |w_{ij,n}|\right\} \qquad (17)$$

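Since τₙ ≤ τ*ₙ always holds (the spectral radius of Wₙ is bounded by both its maximum absolute row sum and its maximum absolute column sum), Iₙ − ρWₙ is non-singular for all values of ρ in the interval (−1/τ*ₙ, 1/τ*ₙ), and more precisely for all ρ in (−1/τₙ, 1/τₙ). After minmax normalisation τ*ₙ equals 1, but τₙ can be smaller than 1, which shows why the autoregressive parameter may be larger than 1. With row-standardised (non-negative) weights τₙ = 1, so this is not possible by construction.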
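The bounds in Eq. (17) can be checked numerically on a toy example. The sketch below uses a small hypothetical, unnormalised weight matrix whose spectral radius (√3) lies below its row/column bound (2). After minmax normalisation the admissible interval for ρ extends to about ±1.155, so a value such as 1.1 still leaves I − ρW non-singular, whereas row standardisation pins the spectral radius at exactly 1. Dense eigenvalue computation is only practical for small matrices, which is why τ*ₙ matters in practice.

```python
import numpy as np


def spectral_radius(W):
    """tau_n in Eq. (17): largest absolute eigenvalue of W."""
    return np.max(np.abs(np.linalg.eigvals(W)))


def tau_star(W):
    """tau*_n in Eq. (17): min of the largest absolute row sum and the largest absolute column sum."""
    A = np.abs(W)
    return min(A.sum(axis=1).max(), A.sum(axis=0).max())


W = np.array([[0.0, 2.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])          # hypothetical unnormalised weights
W_mm = W / tau_star(W)                    # minmax normalisation: tau*_n becomes 1
rho_max = 1.0 / spectral_radius(W_mm)     # about 1.155: admissible rho exceeds 1

W_row = W / W.sum(axis=1, keepdims=True)  # row standardisation for comparison
det_at_1_1 = np.linalg.det(np.eye(3) - 1.1 * W_mm)  # non-singular although rho = 1.1 > 1
```

Here the eigenvalues of W are 0 and ±√3, so those of W_mm are 0 and ±√3/2, and det(I − 1.1·W_mm) = 1 − 1.21 · 0.75 = 0.0925.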

The estimates of ρ are significant in all models. In six of the eight models they are positive, indicating that neighbours at origins or destinations influence the OD flow under consideration positively; the two row-normalised destination-centric models (3 and 7) yield negative values (−0.222 and −0.228). In the minmax setting, using economic distance weights lowers the autoregressive parameter for destinations and raises it for origins. The estimate for network distance as a variable is quite constant across all models, although its absolute value is slightly smaller in the minmax models of both groups (network and economic distance weight matrices); it is highly significant in all eight models. The relative income difference between destinations and origins does not show a clear pattern in the transition from network to economic distance weighting. Its estimates are among the smaller ones in all models and not considerably different from the gravity model. Two peculiar estimates relate to models 2 and 6: the relative income difference is very small and not statistically significant in model 2 (0.027), but becomes clearly negative and highly significant in model 6 (−0.180). This contradicts the earlier expectation that relative income differences influence travel-to-work trips positively. Furthermore, the estimate increases strongly from model 4 to model 8 (0.111 to 0.187), suggesting a considerably higher importance of income differences for destination-centric, minmax-weighted matrices. All other estimates have the expected signs and vary little over models 1 to 8. Since the Akaike information criterion cannot be calculated for GMM models, a pseudo adj. R² is reported. The spatial models mostly perform better than the gravity model (adj. R² of 0.518); only the row-normalised destination-centric models 3 and 7 score slightly lower. With respect to the pseudo adj. R², minmax weighting seems somewhat superior to row normalisation, as it yields a higher score in every pairwise comparison. Within the row-normalised models (1, 3, 5 & 7), those based on economic distance weights score higher, whereas network distance based spatial lag models perform better within the group of minmax-standardised matrices (2, 4, 6 & 8). Note that pseudo R² values must be treated with caution, as they are not equivalent to OLS-based R² measures.

To check whether autocorrelation remains, Moran's I statistics are shown again (last row of Tables 7 and 8). The residuals of models with origin-centric spatial weight matrices are cross-tested with destination-centric spatial weight matrices, and vice versa. The resulting Moran's I values are markedly higher when the residuals of the destination-centric models (3, 4, 7 & 8) are tested with origin-centric weights (0.19 to 0.36) than in the opposite case (0.05 to 0.11), so more autocorrelation remains at the origin side. According to theory, SAR models should yield consistent estimates once the leftover spatial dependence in the residuals of the gravity model is accounted for. However, neither origin- nor destination-centric spatial weights completely solve the problem of remaining autocorrelation. SEM and SAC models, which the Lagrange multiplier tests favour, would provide more valuable insights concerning the consistency of the estimates.

4.5 Endogeneity

Endogeneity, where present, is a severe problem for any model: it results in biased and inconsistent estimates and renders inference invalid. In this framework, the mean income is used to explain variation in commuting flows both as an economic characteristic of origins and destinations and as part of the spatial weight matrix. Initially, as explained in Section 4.2, income has been assumed to be exogenous in order to estimate a first model, the gravity model. Under this assumption, spatial models have been applied as a first step to account for untreated spatial dependence in the residuals. Because of the economic nature of income, however, there may well be an omitted variable that causes the disturbances to be correlated with the regressor in the gravity model. The situation is even worse in SAR models, where both the regressor and the spatially weighted dependent variable are correlated with the error terms. This violates the conditional mean assumption E(ε | X) = 0, which essentially means that the influences of the individual variables in the model cannot be fully distinguished from each other.

To account for endogeneity in the gravity model, an Instrumental Variable (IV) approach is used in order to obtain a consistent (but, in finite samples, biased) estimator that is less efficient than OLS. Generally, instruments provide a solution for threats to internal validity that cause a non-zero conditional expectation of the error term. Conceptually, the estimation is split into two stages: the first stage isolates the part of the explanatory variable(s) that is uncorrelated with the disturbances, and the second stage uses the predicted values from the first stage in the original causal relationship. Both stages use OLS, but despite the name, estimation is done in a single step in order to obtain correct standard errors. The most difficult part is finding valid instruments, which must satisfy two conditions: instrument relevance (the instruments are correlated with the endogenous regressor) and instrument exogeneity (the instruments are uncorrelated with the error term).
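The two-stage logic can be illustrated with a small simulation. This is a sketch under stated assumptions, not the estimation used in the thesis (which was carried out in R with many instruments and controls): there is one endogenous regressor x, one instrument z, and an unobserved variable u that enters both x and the outcome, so that OLS is inconsistent. All names and parameter values are hypothetical.

```python
"""Two-stage least squares (2SLS) with one endogenous regressor and one instrument.

Sketch on simulated data: an unobserved factor u enters both the regressor x
and the outcome y, so OLS is inconsistent, while the instrument z is
correlated with x but not with u.
"""
import random
from typing import Sequence, Tuple


def ols_simple(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def two_stage_least_squares(y, x, z) -> Tuple[float, float]:
    # Stage 1: regress the endogenous x on the instrument z, keep fitted values
    a0, a1 = ols_simple(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress y on the fitted values (point estimates only; the
    # naive second-stage standard errors would be wrong)
    return ols_simple(x_hat, y)


def simulate(n: int = 5000, seed: int = 1):
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        ui = rng.gauss(0.0, 1.0)  # omitted variable, part of the error term
        xi = zi + ui + rng.gauss(0.0, 1.0)
        z.append(zi)
        x.append(xi)
        y.append(1.0 + 2.0 * xi + ui)  # true slope is 2
    return y, x, z


if __name__ == "__main__":
    y, x, z = simulate()
    print("OLS slope :", ols_simple(x, y)[1])                  # near 2.33 (biased)
    print("2SLS slope:", two_stage_least_squares(y, x, z)[1])  # near 2.0
```

With a single instrument, the 2SLS slope equals cov(z, y)/cov(z, x). Running the second stage by hand gives correct point estimates but not correct standard errors, which is why IV routines estimate both stages jointly.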

As stated before, it is difficult to regard income as exogenous in the case of commuting. First and foremost, there may be other (omitted) variables that explain variation in travel-to-work trips and are correlated with income, for example municipal tax levels. Second, it is difficult to rule out interactions with other variables in the model. In general terms, because of the strong interrelations between transportation, human settlement, urban agglomeration and economic activities concentrated in cities, the gravity model should be tested for endogeneity whenever income is used as a variable. Family background, workforce variables or characteristics of job positions are usually used as instruments for income. Sarlas et al. (2015) found evidence for a positive impact of the latter on mean salaries. The variables chosen as instruments are listed in Table 9.

4.5.1 Results

Generally, two groups of instruments can be distinguished: instrumental variables 1-4 reflect sector-specific attributes of jobs, while the remaining ones relate to required skills. All IVs listed in Table 9 are included in the 2SLS regression framework, and the results are presented in Table 10. For a valid IV model according to existing theory, three tests are considered. The rejection of the first-stage F-test on the instruments indicates that the instruments are not weak, i.e. the first-stage relationship is sufficiently strong. The Wu-Hausman test checks the consistency of the OLS estimates under the assumption that IV is consistent. Since it is rejected, OLS is indeed inconsistent, suggesting that endogeneity is present. The last test, called the Sargan or J-test, tests instrument exogeneity using the overidentifying restrictions. Since it is not rejected, it can be concluded that the chosen instruments are valid.

Table 9: Instrumental variables (IV) in the data set

Name                              Description
Working (hotels, restaurants)     Positions in the hotel & restaurant sector
Working (manufacturing)           Positions in the manufacturing sector
Working (3rd sector)              Positions in the service sector
Working (other private sector)    Positions in the private sector
Tertiary education                Positions requiring tertiary education
Professional training             Positions requiring professional training
Vocational training               Positions requiring less than vocational training
Qualification 1                   Positions with highest qualification demands
Qualification 2                   Positions with professional skills
Management                        Positions with no managerial duties

Source: Census 2000

All remaining variables are highly significant and deviate only slightly from the OLS estimates, which are shown alongside for comparison. The estimate of the income difference has probably changed the most under IV, yielding a stronger, yet still small, positive impact on commuting flows. In terms of model fit, omitting two variables in the case of IV results in a slightly smaller R² (0.518).
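The three diagnostics reported in Table 10 can be sketched in their classical, non-robust textbook forms: a first-stage F-test on the excluded instruments, a regression-based (control-function) Wu-Hausman test, and the Sargan statistic n·R² from regressing the IV residuals on all instruments. The sketch assumes one endogenous regressor, two hypothetical simulated instruments, no further controls and homoskedastic errors, so it does not reproduce the HC-robust set-up used in the thesis; all function names are illustrative.

```python
"""First-stage F, Wu-Hausman and Sargan statistics for a 2SLS model.

Sketch on simulated data with one endogenous regressor, two hypothetical
instruments, no further controls and homoskedastic errors.
"""
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def solve(a: List[Vector], b: Vector) -> Vector:
    """Solve the small linear system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(cols: Sequence[Vector], y: Vector) -> Tuple[Vector, Vector]:
    """OLS of y on an intercept plus `cols`; returns (coefficients, residuals)."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(X, y)) for p in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    return beta, resid


def rss(e: Vector) -> float:
    return sum(v * v for v in e)


def iv_diagnostics(y: Vector, x: Vector, instruments: List[Vector]):
    n, q = len(y), len(instruments)
    # First stage: x on intercept + excluded instruments
    _, v_hat = ols(instruments, x)
    x_hat = [xi - vi for xi, vi in zip(x, v_hat)]
    _, v_const = ols([], x)  # restricted first stage: intercept only
    f_weak = ((rss(v_const) - rss(v_hat)) / q) / (rss(v_hat) / (n - q - 1))
    # Second stage: y on fitted x; the IV residuals use the observed x
    beta, _ = ols([x_hat], y)
    e_iv = [yi - beta[0] - beta[1] * xi for yi, xi in zip(y, x)]
    # Sargan: n * R^2 of the IV residuals on all instruments, df = q - 1
    _, u = ols(instruments, e_iv)
    mean_e = sum(e_iv) / n
    sargan = n * (1.0 - rss(u) / sum((v - mean_e) ** 2 for v in e_iv))
    # Wu-Hausman (control-function form): does adding v_hat to OLS help?
    _, e_r = ols([x], y)
    _, e_u = ols([x, v_hat], y)
    wu_hausman = (rss(e_r) - rss(e_u)) / (rss(e_u) / (n - 3))
    return beta, f_weak, wu_hausman, sargan


def simulate(n: int = 2000, seed: int = 7):
    rng = random.Random(seed)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # omitted variable
    x = [0.5 * a + 0.5 * b + ui + rng.gauss(0.0, 1.0) for a, b, ui in zip(z1, z2, u)]
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]
    return y, x, [z1, z2]


if __name__ == "__main__":
    y, x, zs = simulate()
    beta, f_weak, wu_hausman, sargan = iv_diagnostics(y, x, zs)
    print(f"2SLS slope {beta[1]:.3f}, F {f_weak:.1f}, "
          f"Wu-Hausman {wu_hausman:.1f}, Sargan {sargan:.2f}")
```

Under these assumptions the F-statistic has (q, n − q − 1) degrees of freedom, the Wu-Hausman statistic (1, n − 3), and the Sargan statistic is compared with a χ² distribution with q − 1 degrees of freedom, where q is the number of excluded instruments. A large F suggests the instruments are not weak, a large Wu-Hausman statistic points to endogeneity, and a small Sargan value means the overidentifying restrictions are not rejected.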

4.5.2 Treating endogeneity in spatial autoregressive models

In this section, a recipe is given for treating endogeneity in the more complicated case of SAR models. As is known, incorporating an endogenous regressor such as income results in biased and inconsistent coefficients. Thus, the models in Table 7 are no longer correct. The IV model presented above now acts as a base case for estimating spatial autoregressive models that account for endogeneity. Since valid instruments for the income difference between origin and destination municipalities have been found, it is now possible to use the predicted, and thus corrected, income values of the first stage (see the first equation in Eq. (18)) to construct the spatial weights.

Table 10: Instrumental variable model (IV)

Dependent variable: log(Commuting flows)

                            (1) OLS               (2) IV (2SLS)
(Intercept)                 4.443∗∗∗ (0.133)      4.452∗∗∗ (0.130)
log(Network distance)       −1.537∗∗∗ (0.011)     −1.537∗∗∗ (0.010)
Rel. income diff.           0.085∗∗∗ (0.019)      0.134∗∗∗ (0.029)
log(Jobs) (d)               0.473∗∗∗ (0.005)      0.471∗∗∗ (0.004)
log(Pop. density) (d)       0.030∗∗∗ (0.005)      0.028∗∗∗ (0.005)
log(Pop. access.) (d)       −0.176∗∗∗ (0.007)     −0.177∗∗∗ (0.006)
log(Jobs3rd) (d)            0.102∗∗∗ (0.014)      0.103∗∗∗ (0.014)
log(Car) (d)                −0.071∗∗∗ (0.016)     −0.071∗∗∗ (0.016)
log(Workers) (d)            0.665∗∗∗ (0.067)      0.669∗∗∗ (0.065)
log(Population) (o)         0.440∗∗∗ (0.006)      0.441∗∗∗ (0.005)
log(Job density) (o)        −0.043∗∗∗ (0.005)     −0.041∗∗∗ (0.006)
log(Job access.) (o)        −0.180∗∗∗ (0.006)     −0.178∗∗∗ (0.006)
log(Jobs3rd) (o)            −0.027∗∗ (0.013)      −0.028∗∗ (0.013)
log(Car) (o)                −0.023∗ (0.014)       −0.023∗ (0.013)
log(Workers) (o)            0.365∗∗∗ (0.069)      0.363∗∗∗ (0.067)

Used instruments: see Table 9

IV diagnostic tests         df 1     df 2      Stat.       Signif.
Weak instruments            13       46632     1286.218    ∗∗∗
Wu-Hausman                  1        46643     4.751       ∗
Sargan                      12       NA        18.293

HC robust std. errors       Yes                   Yes
Observations                46,659                46,659
Adjusted R²                 0.518                 0.518
Residual Std. Error         0.844 (df = 46644)    0.844 (df = 46644)

Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01

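$$
\begin{aligned}
\frac{inc_d}{inc_o} - 1 &= \text{all instruments} + \text{all exog. variables} + \varepsilon \\
\text{Eco. dist. } w_{ij} &= \left( \frac{traveltime_{ij}}{\exp\left(\widehat{\frac{inc_d}{inc_o} - 1}\right)} \right)^{-1}
\end{aligned}
\tag{18}
$$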

By directly including the predicted values of income in the construction of the spatial weight matrix, its previously endogenous elements can be accounted for. The further procedure has already been explained in Section 3.3: applying the four-step estimation method of Drukker et al. (2013) should address the remaining endogeneity issues. An implementation in R is also available and can be used to re-estimate the SAR models based on the formula of the IV regression.
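Eq. (18) can be read as a recipe: the observed income ratio is replaced by its first-stage fitted value before the weights are formed. The minimal Python sketch below assumes dense matrices indexed by units i and j, a zero diagonal, and hypothetical travel times and fitted values (written ĝ_ij for the fitted relative income difference); it stops before the row or min-max normalization that would normally follow, and the function name is illustrative only.

```python
"""Economic-distance weights of Eq. (18) built from first-stage fitted values.

Sketch: travel times and predicted relative income differences are
hypothetical inputs; how units i, j map to OD flows is left open.
"""
import math
from typing import List

Matrix = List[List[float]]


def economic_distance_weights(travel_time: Matrix, rel_inc_diff_hat: Matrix) -> Matrix:
    """w_ij = (traveltime_ij / exp(g_hat_ij))^-1 = exp(g_hat_ij) / traveltime_ij.

    rel_inc_diff_hat[i][j] is the first-stage fitted value of inc_d / inc_o - 1,
    used instead of the observed (endogenous) income ratio.
    """
    n = len(travel_time)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-neighbours on the diagonal
                w[i][j] = math.exp(rel_inc_diff_hat[i][j]) / travel_time[i][j]
    return w


if __name__ == "__main__":
    minutes = [[0.0, 10.0, 30.0],
               [10.0, 0.0, 20.0],
               [30.0, 20.0, 0.0]]
    # Hypothetical fitted relative income differences (destination vs. origin)
    g_hat = [[0.0, 0.2, -0.1],
             [-0.2, 0.0, 0.1],
             [0.1, -0.1, 0.0]]
    for row in economic_distance_weights(minutes, g_hat):
        print([round(v, 4) for v in row])
```

Because (traveltime_ij / exp(ĝ_ij))^-1 equals exp(ĝ_ij)/traveltime_ij, a higher predicted relative income difference raises the weight, while a longer travel time lowers it.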

4.5.3 Further research

This thesis only scratches the surface of OD flow modelling in the context of public transport commuting flows, so further research is needed. First, the procedure described above could be applied in order to compare the resulting models with those that do not take endogeneity into account. Second, the spatial weight matrices could be examined further. For instance, since there is still remaining autocorrelation in the residuals, the SAR model could incorporate a combined weight matrix (W_o and W_d summed up). Because the spatial regression models in this thesis only use one spatial weight matrix at a time, either origin- or destination-centric, an extended version could be implemented that sticks closer to LeSage and Pace (2008) and their filtering approach. This would also imply improving the sphet package, which currently can only incorporate one spatial weight matrix. Furthermore, once computer memory no longer limits the calculation, SEM and SAC models should also be estimated to better investigate the model estimates. In addition, the predictive accuracy of all models could be compared, both in- and out-of-sample, which should give more valuable information on model performance when it comes to further exploiting the models. Another extension of this thesis could be to model private transport flows in the same way, to see whether and how the two modes compete against each other.
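The combined matrix suggested above, and the multiple-matrix approach of LeSage and Pace (2008), rest on Kronecker products of a municipality-level matrix W: W_d = I_n ⊗ W links flows from the same origin to neighbouring destinations, and W_o = W ⊗ I_n links flows to the same destination from neighbouring origins. The sketch below assumes a complete n × n set of flows ordered origin by origin (flow index k = o·n + d). Since the filtered sample in this thesis contains only 46,659 flows, the rows and columns of dropped flows would have to be removed first, and for 2896 municipalities (about 8.4 million potential flows) sparse storage would be indispensable. Function names are illustrative only.

```python
"""Origin-, destination-based and combined weight matrices for OD flows.

Sketch following the Kronecker-product construction of LeSage and Pace
(2008) for a full n x n flow matrix; flows are ordered origin by origin,
i.e. flow index k = o * n + d.
"""
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two dense matrices."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def od_weight_matrices(w: Matrix):
    n = len(w)
    i_n = identity(n)
    w_d = kron(i_n, w)  # same origin, neighbouring destinations
    w_o = kron(w, i_n)  # same destination, neighbouring origins
    # Combined matrix; rescale (e.g. by 0.5 for row-stochastic w) before use
    w_od = [[a + b for a, b in zip(ro, rd)] for ro, rd in zip(w_o, w_d)]
    return w_o, w_d, w_od


if __name__ == "__main__":
    w = [[0.0, 1.0], [1.0, 0.0]]  # two neighbouring municipalities
    for name, m in zip(("W_o", "W_d", "W_o + W_d"), od_weight_matrices(w)):
        print(name, m)
```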

5 Conclusion

In this thesis, a direct transport demand model for OD commuting flows is presented. A case study for Switzerland, including 2896 municipalities, is designed to illustrate the underlying mechanisms. It is based on data from the 2000 Census, the Lohnstrukturerhebung (Swiss Earnings Structure Survey, 2000) and the National Transport Model (2000). Further variables are based on calculations by the Institute for Transport Planning and Systems of ETH Zurich.

In order to define the scope of this framework, a literature review summarises the most important aspects and problems concerning the modelling of (commuting) flows and provides an adequate, though not complete, overview of the spatial models that are of interest for this thesis. The review considers three classes of models related to the spatial modelling of OD commuting flows, together with their estimation methods: spatial interaction models, spatial econometric models, and a specific OD flow modelling approach based on LeSage and Pace (2008). Additionally, it highlights the importance of correctly specifying the models in order to obtain unbiased and consistent estimates for the variables of interest.

The methodology section shows the applied procedure: a three-step process to examine the problem of spatial dependence in origin-destination (OD) commuting flows and that of endogenous weighting matrices and regressors when economic distance is used as the underlying impedance function. The starting point is a simple gravity model relying on independent observations, which is then replaced by spatial autoregressive models based on two weighting schemes (network and economic distance) in order to account for untreated spatial dependence in the gravity model. Both weighting schemes are applied in an origin- and a destination-centric way, since in OD flow modelling municipalities act as either origins or destinations. The last step checks whether endogeneity is present in the gravity and the spatial models by applying an IV regression with valid instruments for the assumed endogenous variable: the mean income.

Because regional salary data are only available for 1595 communes and the initial data set contains a large number of zero-valued flows, a filter method is employed, which gives a final sample of 46,659 observations. A gravity model was applied in the first step, and its estimates were in line with the expectations of the existing literature concerning their statistical and economic importance. As a justification for the need for spatial autoregressive models, two Moran’s I tests with underlying network distance based weight matrices, one origin-wise and one destination-wise, showed that the gravity model’s residuals contain patterns of remaining autocorrelation up to a radius of 140 minutes of travel time. As a next step, four SAR models with four different weight matrices are estimated. LM tests suggested also estimating SEM and SAC models, but due to computer memory limits this was unfeasible. The weighting matrices are based on network and economic distance in order to check whether economic similarities between communes provide a more appropriate way to model travel-to-work. It was found that the SAR model estimates are indeed more intuitive and appropriate than those of the gravity model. The use of economic distance as spatial weights slightly lowered the estimates for the network distance variable, indicating that it is less important. The autoregressive parameters were all positive and significant (except one), meaning that surrounding communes positively influence the OD flows under consideration. The remaining explanatory variables were stable across all models in sign and magnitude. In the third and last step, it was shown by using valid instruments in an IV framework that mean income is indeed endogenous, resulting in biased and inconsistent estimates in the aspatial and spatial model(s) presented before. Furthermore, a recipe is explained for estimating SAR models that account for endogeneity in the spatial weights and endogenous regressors simultaneously. Lastly, further research options and extensions are presented to continue exploring the spatial modelling of OD commuting flows.

6 References

Anselin, L. (1988) Spatial Econometrics: Methods and Models, Springer Science+Business Media, B.V., Dordrecht.

ARE (2005) Nationales Personenverkehrsmodell des UVEK, Swiss National Transport Model 2000, Bern.

Axhausen, K., T. Bischof, R. Fuhrer, R. Neuenschwander, G. Sarlas and P. Walker (2015) Gesamtwirtschaftliche Effekte des öffentlichen Verkehrs mit besonderer Berücksichtigung der Verdichtungs- und Agglomerationseffekte, Schlussbericht, Arbeitsberichte Verkehrs- und Raumplanung, 1079, ETH Zurich, Zurich.

Barry, R. and R. Pace (1999) A Monte Carlo estimator of the log determinant of large sparse matrices, Linear Algebra and its Applications, 289 (1) 41–54.

BFS and ARE (2017) Verkehrsverhalten der Bevölkerung, Ergebnisse des Mikrozensus Mobilität und Verkehr 2015, Neuchatel and Bern.

Cameron, A. and P. Trivedi (2013) Regression Analysis of Count Data, Cambridge University Press, Cambridge.

Cliff, A. and J. Ord (1973) Spatial Autocorrelation, Pion Press, London.

Curry, L. (1972) A spatial analysis of gravity flows, Regional Studies: The Journal of the Regional Studies Association, 6 (2) 131–147.

Cushing, B. and J. Poot (2003) Crossing boundaries and borders: Regional science advances in migration modelling, Papers in Regional Science, 83 (1) 317–338.

DeGroot, H. and G.-J. Linders (2006) Estimation of the gravity equation in the presence of zero flows, Tinbergen Institute Discussion Papers, 72 (3).

Drukker, D. M., P. Egger and I. R. Prucha (2013) On two-step estimation of a spatial autoregressive model with autoregressive disturbances and endogenous regressors, Econometric Reviews, 32 (5-6) 686–733.

Farmer, C. (2011) Commuting flows & local labour markets: Spatial interaction modelling of travel-to-work, Ph.D. Thesis, National University of Ireland, Maynooth.

Flowerdew, R. and M. Aitkin (1982) A method of fitting the gravity model based on the Poisson distribution, Journal of Regional Science, 22 (2) 191–202.

Fotheringham, A. and M. O’Kelly (1989) Spatial Interaction Models: Formulations and Applications, Kluwer Academic, Dordrecht.

Griffith, D. (2004) Faster maximum likelihood estimation of very large spatial autoregressive models: An extension of the Smirnov-Anselin result, Journal of Statistical Computation and Simulation, 74 (1) 855–866.

Griffith, D. A. (2007) Spatial structure and spatial interaction: 25 years later, The Review of Regional Studies, 37 (1) 28–38.

Griffith, D. A. and K. G. Jones (1980) Explorations into the relationship between spatial structure and spatial interaction, Environment and Planning A, 12 (2) 187–201.

Kelejian, H. H. and G. Piras (2014) Estimation of spatial models with endogenous weighting matrices, and an application to a demand model for cigarettes, Regional Science and Urban Economics, 46 (1) 140–149.

Kelejian, H. H. and I. R. Prucha (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances, The Journal of Real Estate Finance and Economics, 17 (1) 99–121.

Kelejian, H. H. and I. R. Prucha (1999) A generalized moments estimator for the autoregressive parameter in a spatial model, International Economic Review, 40 (2) 509–533.

Kelejian, H. H. and I. R. Prucha (2010) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances, Journal of Econometrics, 157 (1) 53–67.

Killer, V. (2014) Understanding spatial interactions in models of commuting behaviour, Ph.D. Thesis, ETH Zurich, Zurich.

Lambert, D., J. Brown and R. Florax (2010) A two-step estimator for a spatial lag model of counts: Theory, small sample performance and an application, Regional Science and Urban Economics, 40 (4) 241–252.

Lee, M.-l. and R. K. Pace (2005) Spatial distribution of retail sales, The Journal of Real Estate Finance and Economics, 31 (1) 53–69.

LeSage, J. P. (1997) Bayesian estimation of spatial autoregressive models, International Regional Science Review, 20 (1 & 2) 113–129.

LeSage, J. P. (2000) Bayesian estimation of limited dependent variable spatial autoregressive models, Geographical Analysis, 32 (1) 19–35.

LeSage, J. P., M. M. Fischer and T. Scherngell (2007) Knowledge spillovers across Europe: Evidence from a Poisson spatial interaction model with spatial effects, Papers in Regional Science, 86 (3) 393–421.

LeSage, J. P. and R. K. Pace (2008) Spatial econometric modeling of origin-destination flows, Journal of Regional Science, 48 (5) 941–967.

LeSage, J. P. and R. K. Pace (2009) Introduction to Spatial Econometrics, CRC Press, New York.

LeSage, J. P. and C. Thomas-Agnan (2015) Interpreting spatial econometric origin-destination flow models, Journal of Regional Science, 55 (2) 188–208.

Moran, P. (1948) The interpretation of statistical maps, Journal of the Royal Statistical Society, 2 (10) 243–255.

Ortuzar, J. d. D. and L. G. Willumsen (2011) Modelling Transport, John Wiley & Sons, Ltd, Chichester.

Pace, R. and R. Barry (1997) Quick computation of spatial autoregressive estimators, Geographical Analysis, 29 (3) 232–247.

Pace, R. and J. LeSage (2004) Techniques for improved approximation of the determinant term in the spatial likelihood function, Computational Statistics and Data Analysis, 45 (1) 179–196.

Paelinck, J. and L. Klaassen (1979) Spatial Econometrics, Saxon House, Farnborough.

Piras, G. (2010) sphet: Spatial models with heteroskedastic innovations in R, Journal of Statistical Software, 35 (1).

Porojan, A. (2001) Trade flows and spatial effects: The gravity model revisited, Open Economies Review, 12 (3) 265–280.

Qu, X. and L.-f. Lee (2015) Estimating a spatial autoregressive model with an endogenous spatial weight matrix, Journal of Econometrics, 184 (2) 209–232.

Ranjan, P. and J. L. Tobias (2007) Bayesian inference for the gravity model, Journal of Applied Econometrics, 22 (1) 817–838.

Rouwendal, J. and A. Van der Vlist (2005) A dynamic model of commutes, Environment and Planning A, 37 (12) 2209–2232.

Sarlas, G. and K. Axhausen (2015) Prediction of AADT on a nationwide network based on an accessibility-weighted centrality measure, Arbeitsberichte Verkehrs- und Raumplanung, 1094, ETH Zurich, Zurich.

Sarlas, G. and K. Axhausen (2017) Mean speed prediction with endogenous volume and spatial autocorrelation: A Swiss case study, Arbeitsberichte Verkehrs- und Raumplanung, 1275, ETH Zurich, Zurich.

Sarlas, G., R. Fuhrer and K. Axhausen (2015) Quantifying the agglomeration effects of Swiss public transport between 2000 and 2010, 15th Swiss Transport Research Conference (STRC 2015), Ascona, Switzerland.

Sen, A. and T. E. Smith (1995) Gravity Models of Spatial Interaction Behavior, Springer-Verlag, Berlin.

Tobler, W. R. (1979) Cellular Geography, Philosophy in Geography, 20 (1) 379–386.

Wilson, A. (1967) A statistical theory of spatial distribution models, Transportation Research, 1 (3) 253–269.

Wilson, A. (1971) Family of spatial interaction models, and associated developments, Environment and Planning, 3 (1) 1–32.

Zhou, Y., X. Wang and J. Holguin-Veras (2016) Discrete choice with spatial correlation: A spatial autoregressive binary probit model with endogenous weight matrix (SARBP-EWM), Transportation Research Part B: Methodological, 94 (1) 440–455.

A Appendix

A.1 Software details

The thesis was written in TeXstudio (http://www.texstudio.org/). All statistical results were obtained using the R statistical programming language (http://www.r-project.org/, version 3.3.1 (2016-06-21)) and RStudio (https://www.rstudio.com/). The following packages were used: AER (1.2-5), dplyr (0.7.0), ggplot2 (2.2.1), maptools (0.9-2), Matrix (1.2-8), reshape2 (1.4.2), rgdal (1.2-6), rgeos (0.3-23), sp (1.4-4), spdep (0.6-11), stargazer (5.2), SwissCommunes (0.0-7), SwissHistMunData (0.0-2), as well as basic functions and custom R code.
