
Street Smarts: Measuring Intercity Road Quality Using Deep Learning on Satellite Imagery

ABSTRACT

High-quality roads are the scaffolding for prosperous and healthy societies, and accordingly garner huge investments from governments every year. However, current techniques to monitor those investments tend to be time-consuming, laborious, and expensive, placing them out of reach for many developing regions. In this work, we develop a model for monitoring the quality of road infrastructure using satellite imagery, enabling much larger scale and much lower costs than are achievable with current methods. For this task, we harness two trends: the increasing availability of high-resolution, often-updated satellite imagery, and substantial improvement in accuracy and performance of neural network-based methods for executing computer vision tasks. In this study, we train a model for intercity road quality prediction using a unique dataset of road quality measurement labels (57 roads with a total length of 7000 km) throughout the Republic of Kenya combined with corresponding 50 cm resolution satellite imagery. Using a variety of neural network architectures, we create and evaluate regression models for predicting road quality. Our results show a best-case $R^2$ value of 0.79 for the regression problem using a standard train-test split and an $R^2$ value of 0.35 for the substantially harder held-out regression problem, which has the added potential to generalize more readily to other contexts. In addition, we demonstrate the potential of our measurement technique with a case study that compares the quality of intercity roads entering 322 large towns spread throughout Kenya against the corresponding satellite nighttime illumination, showing a positive relationship between road quality and nighttime illumination, a common proxy measurement for local economic activity. We believe these results indicate the possibility to measure road quality at an unprecedented

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
COMPASS ’19, July 03–05, 2019, Accra, Ghana
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-9999-9/18/06. . . $15.00
https://doi.org/10.1145/1122445.1122456

scale, providing insight into the contribution of high-quality roads to many societal development indicators. Code for our study is available from [OMITTED FOR REVIEW].

ACM Reference Format:
2019. Street Smarts: Measuring Intercity Road Quality Using Deep Learning on Satellite Imagery. In ACM COMPASS ’19: ACM SIGCAS Conference on Computing and Sustainable Societies, July 03–05, 2019, Accra, Ghana. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION

High-quality roads are among the foremost infrastructure for hastening societal development. Roads enable goods, people, and ideas to travel easily, leading to better equity in service provision, faster economic development, and ultimately, better human outcomes. Though enormous sums are spent on roads (for example, in sub-Saharan Africa, 1.5% of total GDP is spent on roads [31]), funds for road maintenance consistently fall short, a problem arising from an inability to prioritize investments [5]. This is partially due to limited measurement of road quality, which requires large amounts of labor, time, and expensive equipment. As Lord Kelvin famously said: "If you cannot measure it, you cannot improve it."

In urban developing settings, where road usage is heavier and increasingly more people travel with sensor-laden smartphones, crowdsourcing data on road quality is possible [33]. However, rural settings are not as conducive to smartphone-based solutions. In this work, we present a viable alternative for measuring road quality in rural, resource-constrained settings: models for predicting road quality from remote sensing imagery. Our models leverage recent advances in two areas: satellite technology and computer vision. A proliferation of satellite companies has resulted in increasingly higher-resolution images collected more frequently; in developing regions, this imagery is as fine as 30-50 cm resolution and some urban areas are imaged on a near-daily basis. Meanwhile, advances in computer vision have produced techniques for creating and applying sophisticated neural network-based models with thousands to millions of training examples. Further, we can explore the potential for domain adaptation, where models trained in one setting are applied to another setting entirely, a capability that has huge potential benefits in cost-constrained contexts.


Our training dataset consists of road roughness measurements collected by specialized equipment over 7000 km of interurban roadways through diverse terrain in Kenya. We employ this unique dataset to train a regression engine to produce estimates of road quality based solely on observing satellite imagery. This learning task is well-suited to developing regions, where sensing approaches using fixed infrastructure, expensive mobile equipment, and even smartphone-based systems may be infeasible. Our models are built upon convolutional neural network architectures [13, 16, 28] with modifications to accommodate our regression task. We focus on the particular domain adaptation challenge of prediction for held-out roads, which have been explicitly excluded from the training set to evaluate whether our model can accurately predict road quality using imagery at places or times that it has never seen before. To exhibit the potential of our approach to road quality measurement, we present a case study on the positive correlation between road quality and town prosperity, as measured by satellite nighttime illumination.

2 BACKGROUND AND RELATED WORK

Road quality measurement: The road quality measure used throughout our study is called the International Roughness Index (IRI), developed by the World Bank in 1986 [25]. IRI measures cumulative vertical displacement of a vehicle along a stretch of road due to the roughness of the road surface, is typically provided in units of m/km, and is commonly collected using a specialized vehicle with a mounted laser. IRI values can be any positive real number, where higher IRI values imply worse road quality, and typical values fall between 0 and 30. While not explicitly a measurement of road quality, in practice IRI has been found to have very high correlation with user perception of road smoothness.

The equipment used for measuring IRI is high-precision, complex, and expensive. Governments in developing countries with oversubscribed infrastructure budgets can seldom afford to pay for this equipment and carry out this procedure on a regular basis [10]. The result is that developing countries either conduct road quality surveys as infrequently as once every few years or do not conduct full, accurate road quality measurements at all. Some researchers are leveraging the cheap accelerometers and gyroscopes fitted in mobile phones to measure road quality [8], but these cheap sensors are incapable of handling continuous acceleration and vibration intensity for more than a few minutes without losing calibration. In addition, smartphones typically use Assisted GPS, which relies on nearby cell towers for better GPS accuracy; because cellular networks in developing regions are poor, maintaining that accuracy is quite expensive in terms of data and battery consumption.

CNNs and LSTMs on satellite imagery. The increasing application of convolutional neural networks (CNNs) and long short-term memory/recurrent neural networks (LSTM-RNNs) to satellite imagery has been partly due to faster and cheaper computational power [26], efficient and less computationally intensive algorithms, increasing availability of satellite-gathered image datasets in the public domain, and transfer learning. As a result, these algorithms are being applied to satellite imagery for more complicated tasks such as large-scale damage detection after calamities [9], land use classification [3], and generating human-like descriptions of satellite images [27]. Transfer learning in particular means that highly accurate algorithms trained on traditional images can be used to evaluate satellite images.

Another related and popular area of work is road detection using satellite imagery, which has spawned substantial research [19, 30], competitions [6, 18], and even companies [1]. The road detection problem seeks to identify the locations of road infrastructure, and while most work focuses on industrialized regions, there are examples of work that target more challenging unpaved roads [4]. Nonetheless, while the two problems (road detection and road quality estimation) have some complementarity (for example, the latter problem can leverage outputs of road detection algorithms), in most cases systems solving these problems are independent and employ entirely different algorithms and metrics.

Traditionally, LSTMs have been applied to recognize patterns in sequential data such as speech, text, and video [24, 32, 34]. Previous work on using LSTMs on satellite imagery includes temporal vegetation modelling for crop identification [23]. However, we take a different approach by leveraging the spatial characteristics of individual image patches and the sequential nature of a single stretch of a road (road patches in a sequence) to harness the advantages of LSTMs. Essentially, by adapting sequences of patches to imitate a time-series of images, we create image frames in succession that act as inputs to an LSTM.

This paper builds upon previous work by the authors on the topic of measuring road quality using satellite imagery, recently published in a workshop [OMITTED FOR REVIEW]. While the input datasets for both pieces of work are the same, in this work, we explore many additional methods to improve prediction performance (recurrent neural networks and auto-encoders), perform regression analysis instead of classification, and conduct a novel and significant case study to demonstrate the unique value of our road quality measurement techniques for studying economic activity in a developing region.


Figure 1: Three different roads highlighting the challenging diversity of our dataset. Left: an urban environment along the A104 highway. A104 is a major highway in Kenya and the selected road tile is 'great' quality. Center: the C47 minor road. It passes through an arid environment and the road segment has 'poor' quality. Right: the C67 minor road. It passes through large forests and cropland and the road segment in the image has 'good' quality.

Figure 2: Roads with labeled quality data as collected by the Kenya National Highway Authority (KenHA). Indicated are the original dataset and a dataset that has been filtered to match availability of concurrent satellite imagery.

3 METHODOLOGY

Datasets

Our intent is ultimately to predict the quality of a road seen in satellite imagery. Towards this goal, we employ two main datasets: one set of road quality measurements and a corresponding set of satellite imagery, both for the country of Kenya.

The dataset of road quality measurements used to train our models consists of IRI measurements conducted at a resolution of 10 m along a diverse set of 57 roads throughout Kenya, resulting in samples over a total length of 7000 km [14]. A map of the dataset is available in Figure 2. This dataset was collected as the result of a partnership between the Kenya National Highway Authority (KenHA) and the Japan International Cooperation Agency (JICA). Each measurement is tagged with a latitude and longitude ("lat-lon") and a date of survey (during 2013-2015). The roads vary from tens to several hundred kilometers in length and, as we show in Figure 1, span a wide variety of road sizes, terrain types, and land usage. Additionally, IRI measurements are often bucketized into 5 road quality classes: great (0-7), good (7-12), fair (12-15), poor (15-20) and bad (20+). Figure 1 also shows examples of roads falling into these categories. Roads in our data can also be split into three administrative classes: Class A, linking centers of international importance; Class B, linking national centers within the country; and Class C, linking provincially-important centers. These roads comprise the fabric of Kenya’s road transport system, serving as the primary interlinkage between major towns throughout the country.

The satellite imagery we are using is the DigitalGlobe Basemap+Vivid product [7], and the coverage is the entirety of Kenya. We employ two iterations of this imagery product, each of which is a mosaic of roughly 6300 tiles that forms the illusion of a continuous map by stitching together several images collected at different points in time by multiple different satellites. The +Vivid product is post-processed to account for orthorectification, color correction, and cloud cover, though the latter is still a problem in some remote areas. The first mosaic, compiled in November 2014, consists of imagery from the QuickBird-02 and WorldView-02 satellites, and is composed of tiles with collection dates ranging from 2002 to 2014. The second mosaic, compiled in September 2017, consists of imagery from the QuickBird-02, GeoEye-1, WorldView-02, and WorldView-03 satellites, and is composed of tiles from 2002 to 2017. The typical resolution of the tiles is roughly 50 cm per pixel, and each of the two image mosaic datasets is 7-8 TB.

While the satellite imagery and road quality datasets are both impressively large, the wide range of dates covered by the tiles of each mosaic coupled with the range of dates of the IRI measurements creates a mismatch. This issue appears often when learning on satellite imagery [15], but is particularly acute in our scenario since road quality can experience sudden and potentially substantial changes (e.g., due to weather or construction), whereas only a serious emergency would similarly affect other attributes commonly predicted


Road   Length (km)   Bad (%)   Poor (%)   Fair (%)   Good (%)   Great (%)
A104      269.2         2.2       1.1        1.5        6.3        88.9
A109       86.54        0         0          1.9       10.5        86.6
A23        10.31       44.9      17.9       12.6       24.2         0.3
B8         47.56        3.4       9.8       18.3       50.5        17.9
B9         16.64       13.6      31.9       21.3       31.1         2
C31        39.33        0         0          0          1.5        98.5
C32        30.03       61.7      20.5        8.8        7.9         1.2
C33        44.79        0.1       1.0        2.4       19.3        76.2
C36        23.05       11.2       9.5        8.3       21.5        49.6
C42        40.88       31.9      13.5        6.7        9.9        38.0
C47       104.92       40.3      35.8       14.8        8.9         2.4
C51        40.89        2.1       3.8        4.9       16.3        72.9
C54        27.62        4.4       1          2.8       13.9        77.8
C67        37.86       90.3       4.4        3.1        2.3         0
C68        16.94       56.3      20         15.1        8.6         0
C69        95.07        0         0.1        1.7       26.1        72.0
C76        41.89       79.5      16          4.4        0           0
C77       110.40       22.2      16.6       21.8       36.1         3.2
C78        28.71       16        20         17         39.3         8
C83        21.38      100         0          0          0           0
C96        19.07        5        23         29         39           4
All      1153          19.2       9.5        7.6       16          47.7

Table 1: A summary of the diverse set of roads in our labeled and filtered data set recording both the length and distribution of road quality labels. For each road, the modal road quality class is in bold. The set ranges from first-class highways (e.g., A104) to rough dirt roads (e.g., C67) and includes roads with significant internal variation (e.g., C77).

via satellite imagery (like wealth). In an ideal data-collection scenario the maximum allowed gap between image and measurement would be a month or even a week, though given the reduced frequency of data collection in developing regions, this is untenable. Ultimately, to ensure that imagery reasonably matches the condition on the ground when the IRI sample was collected, we decided to restrict our label dataset to only those samples where the difference between the two dates was 12 months or less. Selecting any period shorter than 1 year for the maximal time discrepancy would have significantly decreased the amount of labeled data. This left us with a tradeoff between not having enough data to properly train a deep net and possibly having some incorrect labels. Our ideal scenario of using data no more than one month out of date would have left us with only 340 kilometers of road and 40% of the unique roads in our 1-year set. Given that anything more than three months already entails a possible shift between seasons, we decided that the label noise introduced by a maximal discrepancy of 1 year was a tolerable price for the extra data. This design decision results in a subset of the samples from the larger

Figure 3: An example of how a segment of road can be separated into tiles. The top segment shows a road divided into overlapping 64x64 squares; this generates the tiles shown in the middle segment. The tiles are always aligned in the direction of the road (red arrow). The bottom shows the same segment if it were divided into 224x224 tiles instead. Note that the road constitutes a much smaller proportion of each tile relative to the 64x64 case.

IRI dataset used for training; this subset consists of samples covering 1153 km over 21 roads, as detailed in Table 1, which also includes a breakdown of the classes of labels for each road. Additionally, a map of the filtered dataset is available in Figure 2. This set of roads includes paved and dirt roads, consistently high-quality and consistently low-quality roads, and roads with high variability in quality.
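The date-matching rule described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: the record layout (dicts with `road`, `iri`, `survey_date`, and `image_date` keys), the function names, and the use of 365 days for "12 months" are all assumptions.

```python
from datetime import date

MAX_GAP_DAYS = 365  # assumed day count for the 12-month tolerance in the text


def within_tolerance(survey_date, image_date, max_gap_days=MAX_GAP_DAYS):
    """True if the IRI survey and the image tile are close enough in time."""
    return abs((survey_date - image_date).days) <= max_gap_days


def filter_samples(samples, max_gap_days=MAX_GAP_DAYS):
    """Keep samples whose survey/imagery date gap is within the tolerance.

    `samples` is a hypothetical list of dicts with the keys
    'road', 'iri', 'survey_date', and 'image_date'.
    """
    return [
        s for s in samples
        if within_tolerance(s["survey_date"], s["image_date"], max_gap_days)
    ]


# Two made-up records: one with a recent tile, one with a tile years out of date.
samples = [
    {"road": "A104", "iri": 4.2,
     "survey_date": date(2014, 6, 1), "image_date": date(2014, 1, 15)},
    {"road": "C67", "iri": 23.0,
     "survey_date": date(2015, 3, 1), "image_date": date(2009, 8, 20)},
]
kept = filter_samples(samples)
```

Tightening `max_gap_days` to 30 mirrors the one-month scenario discussed above and shows directly how much labeled data a stricter tolerance would discard.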

One nuance of the data set is deciding what the fundamental unit of training data will be. We call this unit a patch and define it as a quadrilateral whose length is parallel to the course of the road and whose width is perpendicular to it, as seen in Figure 3. Our IRI measurements are at intervals of approximately 10 meters (20 pixels in our imagery data), so possibilities for length are bounded below by 20 pixels. Arguments for a smaller patch size include greater granularity and the road forming a greater proportion of the patch’s area. However, smaller patches may sometimes miss some or all of the road due to random noise in the latitude-longitude pairs associated with IRI values. We settled on a compromise of 64x64 pixels, which was robust enough to account for this noise and also neatly covers 3 IRI measurements per patch.
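The arithmetic behind the patch size, and one way to lay out a road-aligned patch, can be sketched as below. The constants come from the text (50 cm per pixel, 10 m between IRI samples, 64-pixel patches); the function names and the assumption that a pixel position and a road heading are already available for each sample are hypothetical.

```python
import math

RESOLUTION_M_PER_PX = 0.5  # typical imagery resolution from the text
IRI_SPACING_M = 10.0       # spacing between IRI measurements
PATCH_PX = 64              # patch side length chosen in the text


def measurements_per_patch(patch_px=PATCH_PX):
    """Number of whole IRI measurement intervals spanned by one patch side."""
    patch_m = patch_px * RESOLUTION_M_PER_PX
    return int(patch_m // IRI_SPACING_M)


def patch_corners(center, heading_rad, patch_px=PATCH_PX):
    """Corners (pixel coordinates) of a square patch aligned with the road.

    `center` is the (x, y) pixel position of an IRI sample and `heading_rad`
    is the road direction there; both are assumed to be computed upstream.
    """
    half = patch_px / 2.0
    ux, uy = math.cos(heading_rad), math.sin(heading_rad)  # along-road unit vector
    vx, vy = -uy, ux                                        # across-road unit vector
    cx, cy = center
    return [
        (cx + a * half * ux + b * half * vx, cy + a * half * uy + b * half * vy)
        for a, b in ((-1, -1), (1, -1), (1, 1), (-1, 1))
    ]
```

With these constants a 64-pixel patch spans 32 m, which is where the figure of 3 IRI measurements per patch comes from; a 20-pixel patch would span exactly one.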


Training and metrics

The IRI data set is sufficiently fine-grained to allow several different choices for what exactly to predict. IRI is a numeric measure (again, note that lower values imply higher quality) but is often broken down into the 5 road quality classes. One could choose either to predict the underlying IRI number of a length of road directly or instead to classify which of the five aforementioned classes it falls into. This work pursues the former approach because it is more informative and avoids corner cases with stretches of road falling near the threshold between two different labels.

As such, our work focuses on a regression problem and will record results in terms of the mean square error (MSE) and the $R^2$ coefficient. MSE gives us an absolute averaged error while $R^2$ measures how much of the total variance in the IRI is explained by our prediction. Instead of directly using the IRI values $y_i^*$, we establish a maximum threshold $T = 30$ and train on the labels

$$y_i = \frac{\min(y_i^*, T)}{T}.$$

We feel this is justified since anything above an IRI of 20 is already bad and the visual/practical difference between a tile of IRI 30 and another with IRI 40 is minimal. Note that since any predicted IRI value can easily be mapped to a prediction of one of the five road quality classes, we can also measure the accuracy of our predictions if they were used for a classification task as opposed to a regression. We report these accuracies throughout our results for comparative purposes even though we do not at any time train for classification accuracy.

A final consideration is how to define the train and test sets in this scenario. Since the training data has a sequential nature, randomly splitting the data into train and test sets (as is usually done) would result in cases where patches that appear next to each other in the satellite imagery might be in both the training and test set. In addition to the problem of test data contamination, this testing scenario would be very different from the use case that we are targeting, where one would seek to predict on an entirely unseen road. As such, we devise two more appropriate methods of generating training and test sets. The first is done by splitting the entire set into 1-kilometer long 'runs' which are then randomly assigned to the train or test set with proportion 70%-30%; we call this the standard method since it more closely resembles the random train-test split. The second method is to assign an entire road to the test set and the remaining 20 roads to the train set and average the result over the 21 possible splits (one with each road held out); we call this the held-out split procedure. Though this very closely approximates a real application, we note that this method breaks an often central assumption of machine learning methods: that the train and test sets are drawn from the same distribution. As the held-out problem is much harder to predict, results reported using the held-out methodology are significantly worse than those reported using the standard methodology. However, held-out predictions are potentially more impactful, as results can generalize to unseen contexts.
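The label normalization and the two splitting procedures can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the sample layout (dicts with `road` and `km` keys), the function names, and the choice to shuffle runs and cut at 70% (rather than, say, drawing each run independently) are assumptions.

```python
import random
from collections import defaultdict

T = 30.0  # IRI cap from the text


def normalize_label(iri, cap=T):
    """Clip a raw IRI value at `cap` and scale it to the range [0, 1]."""
    return min(iri, cap) / cap


def standard_split(samples, run_km=1.0, train_frac=0.7, seed=0):
    """Assign whole 1 km runs (not single patches) to train or test.

    Each sample is a hypothetical dict with 'road' and 'km'
    (distance along the road in kilometers).
    """
    runs = defaultdict(list)
    for s in samples:
        runs[(s["road"], int(s["km"] // run_km))].append(s)
    keys = sorted(runs)
    random.Random(seed).shuffle(keys)
    n_train = round(train_frac * len(keys))
    train = [s for k in keys[:n_train] for s in runs[k]]
    test = [s for k in keys[n_train:] for s in runs[k]]
    return train, test


def held_out_splits(samples):
    """Yield (road, train, test) with one entire road held out per split."""
    for road in sorted({s["road"] for s in samples}):
        train = [s for s in samples if s["road"] != road]
        test = [s for s in samples if s["road"] == road]
        yield road, train, test
```

Grouping by run before shuffling is what keeps neighbouring patches on the same side of the split; the held-out generator yields one split per road, so averaging a metric over its output corresponds to the 21-way average described above.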

Convolutional neural nets and auto-encoders

Convolutional Neural Networks (CNNs) are a class of machine learning models that have shown excellent promise in a number of visual processing tasks. Though initially focused on classifying images into many categories (as in ImageNet [22]), these models have also been applied successfully to satellite imagery. Sometimes, complex pre-trained models can be re-purposed for another task with less data through a process known as transfer learning. However, with enough training data, the structure of successful nets can be re-used while all the parameters are re-learned. We experimented with both approaches but went with the latter after noting that we had a sufficiently large data set to train networks from scratch. Thus, we began with Resnet, AlexNet and VGG-11 [11, 17, 29] as initial network structures and simply replaced the output nodes of the last fully connected layer with a single sigmoid output. We then trained using our 64x64 tiles scaled to 224x224 pixels.

While our labeled dataset is already fairly large, we discussed in Section 3 that this only represents 15% of the total of the roads in our data set; the remainder had to be discarded since the labels might be out of date. However, auto-encoders provide an alternative to supervised CNNs that allows us to leverage that large set without relying on the labels. Convolutional auto-encoders consist of two parts: an encoder, which compresses the images down to $k$ features, and a decoder, which attempts to reverse the encoding back to the original image. Training this network to recreate the original image as closely as possible should ideally lead to a $k$-dimensional representation of the image that preserves as much information as possible. We leverage this by training the auto-encoder over the larger, complete set of roads to learn an efficient representation of any given tile and then fitting an L2-regularized regression of the labels on these features using our training set. We perform this with a 2-convolution auto-encoder with $k = 1000$ alongside retraining the aforementioned CNNs.
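One way to write down the two-stage auto-encoder procedure is given below. The notation is assumed for illustration and does not appear in the original: $f_\theta$ is the encoder, $g_\phi$ the decoder, $\mathcal{X}_{\text{all}}$ the full set of tiles (labeled or not), and $\lambda$ the regularization weight. The squared-error reconstruction loss is likewise an assumption, since the text does not name the loss used.

```latex
% Stage 1: train the auto-encoder on every tile, ignoring labels
\min_{\theta,\phi} \; \sum_{x \in \mathcal{X}_{\text{all}}}
  \bigl\lVert x - g_\phi\bigl(f_\theta(x)\bigr) \bigr\rVert_2^2,
\qquad f_\theta(x) \in \mathbb{R}^{k}, \quad k = 1000

% Stage 2: with \theta frozen, fit an L2-regularized regression
% of the normalized labels y_i on the encoded features
\min_{w,\,b} \; \sum_{i \in \text{train}}
  \bigl( y_i - w^\top f_\theta(x_i) - b \bigr)^2 + \lambda \lVert w \rVert_2^2
```

Only the second stage touches the IRI labels, which is why the first stage can use the full set of roads rather than the date-filtered subset.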

Sequence learning via LSTMs

Another avenue for exploration is how much the sequential structure of roads can be leveraged to more accurately predict road quality. In the simplest sense we can keep the same fundamental aim of predicting $y_i$, but instead of using only the tile image $x_i$, one can use the last $s$ tiles $\{x_j \mid i - s < j \le i\}$. Slightly more complex would be the case where we use the same segments of satellite imagery but attempt to instead predict the average IRI of the entire segment,

$$\bar{y}_i = \frac{1}{s} \sum_{i - s < j \le i} y_j .$$


                5-class accuracy        Regression $R^2$
Base net        Standard   Held-out     Standard   Held-out
Resnet            0.69       0.44         0.79       0.24
VGG-11            0.71       0.47         0.78       0.26
AlexNet           0.73       0.49         0.66       0.21
Auto-encoder      0.65       0.41         0.78       0.31

Table 2: 5-class accuracy and regression R-squared results under standard train-test and held-out conditions for the single-tile regression problem.

We handle both of these cases by first using the auto-encoder to featurize all the roads and grouping them into contiguous sequences of length $s$. We use the same simple 1-layer LSTM with 500 internal nodes in both cases, changing only the objective being optimized. The LSTM is then trained with L2 regularization to prevent overfitting and we record the held-out results in Section 4.
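Preparing the LSTM inputs and the two kinds of target can be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, and the sliding (overlapping) windows are one possible reading of "contiguous sequences of length s"; non-overlapping chunks would be an equally plausible interpretation.

```python
def make_sequences(features, labels, s):
    """Build windows of the last `s` tiles along one road.

    `features[i]` is the auto-encoder feature vector of tile i and
    `labels[i]` its normalized IRI; both lists follow the road order.
    Returns a list of (window, last_target, mean_target) triples.
    """
    out = []
    for i in range(s - 1, len(features)):
        window = features[i - s + 1 : i + 1]        # tiles x_j with i - s < j <= i
        window_labels = labels[i - s + 1 : i + 1]
        last_target = labels[i]                     # "last" objective
        mean_target = sum(window_labels) / s        # "mean" objective
        out.append((window, last_target, mean_target))
    return out
```

With `s = 1` the two targets coincide and each window holds a single tile, which matches the single-tile setting used as the non-sequential reference point.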

4 RESULTS

Single tile regression

We first compare how different network structures and the auto-encoder regression perform under the aforementioned standard and held-out testing methodologies in Table 2. Resnet and AlexNet were retrained from scratch while VGG-11 was transfer learned with only its classifier component being retrained. All the CNNs were trained over 10 epochs of the data, augmented by random horizontal and vertical flips, and completed within a few hours when trained on a GPU cluster. We found that after around 10 epochs the training loss was roughly flat, and continuing to train would likely only result in overfitting. The auto-encoder was first trained overnight on the unlabelled data set for 20 epochs, and its features were then simply regressed with an L2 penalty. Though training the auto-encoder itself is time-consuming, this is only a one-time task and it can later be used to quickly featurize any road.

One immediate observation is that the more realistic held-out test case is significantly harder than the standard scenario. However, the results are encouraging given that achieving even the baseline result (accuracy of 0.20 and an $R^2$ of 0.0) is not guaranteed when the train and test sets are entirely different. This provides evidence that the problem is indeed approachable using standard machine learning techniques. The second observation we wish to highlight is that accuracy is fairly good for a 5-class problem even though we at no point directly optimize for accuracy. This follows because estimating the IRI value fairly accurately translates to a correct estimate of the road quality class, and it seems to bolster the idea that directly regressing the IRI and then transforming it to less granular measures as the application calls for is viable. We note that measuring accuracy does not distinguish between one error

                   5-class accuracy     Regression $R^2$
Sequence length    Last     Mean        Last     Mean
1                  0.41     0.41        0.31     0.31
10                 0.42     0.43        0.34     0.35
25                 0.43     0.43        0.35     0.32

Table 3: Results for LSTMs in the held-out test scenario as a function of the length of sequence trained on. Regressing on the final tile value (last) is compared to regressing on the average tile value (mean).

mistaking a "fair" tile for a "good" tile and another error mistaking a "poor" tile for a "great" tile, thus potentially understating the predictive quality of the CNNs.

We find that there is not much to separate the different CNN classes in terms of performance. Similar initial learning rates and dropout rates were used in all, though VGG-11 had more problems with overfitting compared to the other two and did not move beyond transfer learning. In terms of the key held-out regression metric, we found that the auto-encoder regression outperformed the CNN methods. This is likely due to its superior generalization performance on unseen roads. In this we find a notable advantage of being able to leverage the entire set of roads; the feature representation from the auto-encoder was much more stable in the held-out scenario than those of the CNNs. This was likely due to the fact that the auto-encoder could look at a much more diverse set of roads to determine how to featurize a patch, as opposed to the only 21 roads that a CNN could use. Figure 4 illustrates this by comparing the results on individual roads for AlexNet to those of the auto-encoder regression. In addition to better overall performance, the auto-encoder has fewer roads with very high error; this is important for real-world application, as we would like to be confident of some guarantee of predictive power.
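The metrics discussed in this section can be computed from regression outputs as sketched below. The class boundaries are those listed in Section 3, with the lower bound of each class treated as inclusive (an assumption, since the text does not say which side is closed). The `mean_class_distance` helper is not a metric reported in this work; it is included only to illustrate the point that plain accuracy ignores how far off a misclassification is.

```python
UPPER_BOUNDS = [7.0, 12.0, 15.0, 20.0]  # upper IRI limits of great, good, fair, poor


def iri_to_class(iri):
    """Class index for an IRI value in m/km (0 = great ... 4 = bad)."""
    for idx, upper in enumerate(UPPER_BOUNDS):
        if iri < upper:
            return idx
    return len(UPPER_BOUNDS)


def mse(y_true, y_pred):
    """Mean square error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by y_pred."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def class_accuracy(y_true, y_pred):
    """5-class accuracy obtained by bucketing both true and predicted IRI."""
    hits = sum(iri_to_class(t) == iri_to_class(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def mean_class_distance(y_true, y_pred):
    """Average number of classes between true and predicted class (illustrative)."""
    gaps = [abs(iri_to_class(t) - iri_to_class(p)) for t, p in zip(y_true, y_pred)]
    return sum(gaps) / len(gaps)


# Made-up IRI values in m/km, one per class, with two misclassified predictions.
y_true = [5.0, 10.0, 13.0, 18.0, 25.0]
y_pred = [6.0, 11.0, 16.0, 17.0, 5.0]
```

Predicting the mean of `y_true` everywhere gives an $R^2$ of 0.0, which is the regression baseline quoted above; for normalized network outputs, multiply by $T = 30$ before calling `iri_to_class`.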

Tile sequence regression

After the single-tile regression results, we wished to explore whether incorporating the sequence of tiles leading up to the final one would improve our predictive quality. As discussed in Section 3, we wished to explore this for both the average and last tile IRI. Based on the earlier results in Section 4, we focused our efforts only on the held-out test scenario since the standard method already had strong results for single-tile regression and the former is in any case a more representative approximation of real-world applications. We also decided to focus on the autoencoder features instead of using the CNN features since the two-step featurization/sequence training makes the strong generalization performance of the former an important asset.

Our results are summarized in Table 3. These show a modest improvement over the non-sequential LSTM though these


Figure 4: Distribution of mean square errors (y-axis, lower indicates better predictive power) across different roads using an AlexNet CNN (left) and auto-encoder regression (right). The x-axis is a measure of the heterogeneity of the road, the color provides the average road quality, and the circle size indicates the relative sizes of the roads. Comparisons to VGG-11 and Resnet yield similar results.

initial experiments do not seem to suggest much improvement when increasing the sequence length beyond 10. However, this is likely influenced by the well-known difficulties of training LSTMs on longer sequences and may not reflect the actual limit of this technique. Further investigation into both the structure of the LSTM and its training will be important.

5 CASE STUDY: CORRELATION BETWEEN ROAD QUALITY AND ECONOMIC ACTIVITY

We believe that the results in Section 4 characterize a new measurement tool with strong accuracy, high resolution, and applicability in a wide array of terrain types. In this section, we extend our evaluation of this technique beyond the confines of analyzing only road quality and examine the implications of road quality more broadly on economic development. Specifically, we attempt to test the relationship between high-quality roads and the prosperity of the towns that they connect, and we perform our analysis at the scale of an entire country.

For measuring economic conditions, we employ a dataset on nighttime light illumination collected in an imaging product called the Visible Infrared Imaging Radiometer Suite (VIIRS) Day-Night Band [20]. The data used to create this imaging product are collected via a daily fly-by of the NASA/NOAA Suomi National Polar-Orbiting Partnership satellite; results are post-processed to create monthly data on illumination level for every pixel of the earth’s surface, where each pixel is roughly 450 m x 450 m and data are available for each month since April 2012 [21]. The pixel data range in value from -1.4011 to 32641.72, with higher values representing more illumination.

A wide range of studies have employed these data for granular measurements in developing regions, including measuring total economic activity [12], the incidence of poverty [15], electrification [2], electricity infrastructure corridors [20], and many others. In our case, we use the nightlights data as a generic proxy for economic activity; through this simplified lens, we can estimate relative differences in economic activity both spatially and temporally.

For our analysis, we choose to compare the quality of intercity roads and the economic activity of major towns (as measured via nightlights) throughout Kenya. Using data from OpenStreetMap (OSM), we select all OSM-named "places" in Kenya with a population above 5000 within 2km (n = 924). Then, for each of these places, we select all roads incident upon each place with length greater than 5km; this targets intercity roads. For this analysis, we exclude any places in Nairobi County, which is nearly continuously urban. To consider only more highly-connected places and to expedite computation, we limit our road quality analysis to exclude places that have fewer than 4 segments entering town, and we restrict those segments to only the final 10km entering each town. Finally, we randomly select half of the remaining places, leaving us with a dataset of 322 places and 11,376km of associated roadway. Figure 5(a) shows a map of the town locations in Kenya, with each location marker sized according to its population.
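The selection steps above can be sketched as a small filtering pipeline. This is a minimal illustration under stated assumptions, not the code used in the study: the `Place` and `Road` records, their field names, the `"Nairobi"` county label, and the helpers `select_places` and `analyzed_length_km` are all hypothetical, and only the thresholds are taken from the text. The order in which the filters are applied is assumed to follow the order of the description.

```python
import random
from dataclasses import dataclass, field
from typing import List

# Thresholds quoted in the text; every name below is a hypothetical stand-in.
MIN_POPULATION = 5000     # population within 2 km of the place
MIN_ROAD_KM = 5.0         # roads longer than this are treated as intercity
MIN_SEGMENTS = 4          # minimum number of qualifying roads entering town
MAX_ANALYZED_KM = 10.0    # only the final stretch entering town is analyzed


@dataclass
class Road:
    length_km: float      # full length of a road incident upon the place


@dataclass
class Place:
    name: str
    county: str
    population_within_2km: int
    roads: List[Road] = field(default_factory=list)


def select_places(places: List[Place], seed: int = 0) -> List[Place]:
    """Apply the population, county, and connectivity filters, then keep a random half."""
    eligible = []
    for place in places:
        if place.population_within_2km <= MIN_POPULATION:
            continue                      # too small
        if place.county == "Nairobi":     # assumed label for Nairobi County
            continue                      # nearly continuously urban
        intercity = [r for r in place.roads if r.length_km > MIN_ROAD_KM]
        if len(intercity) < MIN_SEGMENTS:
            continue                      # not well connected enough
        eligible.append(
            Place(place.name, place.county, place.population_within_2km, intercity)
        )
    rng = random.Random(seed)             # seeded so the sample is repeatable
    return rng.sample(eligible, len(eligible) // 2)


def analyzed_length_km(place: Place) -> float:
    """Total roadway analyzed for a place, capping each road at the final 10 km."""
    return sum(min(r.length_km, MAX_ANALYZED_KM) for r in place.roads)
```

Seeding the random generator is a design choice made here for repeatability; the text does not say how the random half was drawn.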

Figure 5(b) shows a comparison of road quality and nighttime illumination for each of these towns. For each road, we estimate road quality by selecting 10 meter patches of imagery and applying an autoencoder and a fully-connected layer for making the predictions, according to the methods


(a) Towns in Kenya examined in our case study

(b) Comparison of Road Quality and Nighttime Illumination for each town (x-axis: Mean Nightlight DN; y-axis: Mean IRI)

Figure 5: Comparison of mean intercity road quality (measured by International Roughness Index) and mean recent 6-month nighttime illumination (measured as a "Digital Number") for a sample of towns in Kenya (n = 322). For (a), place marker size represents relative population size, and for (b), a line of best fit is plotted (R² = −0.25). Note that lower values of IRI correspond to better road quality.

described in Section 3. This provides estimates of IRI for each patch, which we aggregate to produce the mean and standard deviation of IRI values for each road segment. We combine these mean IRI values to quantify the mean road quality for each place, weighted by the length of each road (capped at 10 km). For nighttime illumination, we use the mean of the most recent six months of data (June to December 2018). From Figure 5(b), we can see a weak relationship between road quality and nighttime illumination (recall that lower values of IRI correspond to better road quality). While the correlation is weak (R² = −0.25), the relationship is robust to removal of places with low illumination (i.e., Nightlight DN < 1.0) or places with high illumination (DN > 10.0). While it is not at all expected that road quality would be the sole determinant of local economic activity, we believe that these results do show a level of correlation that bears further examination.
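The aggregation and comparison steps described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the implementation used in the study: all function names and inputs are hypothetical, patch-level IRI predictions are taken as given, the population standard deviation is used because the text does not say which variant was computed, and the Pearson coefficient stands in for the correlation measure, which the text does not specify.

```python
from math import sqrt
from statistics import mean, pstdev


def segment_stats(patch_iri):
    """Mean and (population) standard deviation of patch-level IRI for one road segment."""
    return mean(patch_iri), pstdev(patch_iri)


def place_mean_iri(segments, cap_km=10.0):
    """Length-weighted mean IRI for a place.

    `segments` is a list of (mean_iri, length_km) pairs; each length is
    capped at `cap_km` before being used as a weight.
    """
    weights = [min(length_km, cap_km) for _, length_km in segments]
    weighted_sum = sum(m * w for (m, _), w in zip(segments, weights))
    return weighted_sum / sum(weights)


def recent_mean_dn(monthly_dn, months=6):
    """Mean nightlight DN over the most recent `months` monthly values."""
    return mean(monthly_dn[-months:])


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_in_dn_range(dn, iri, lo=None, hi=None):
    """Correlation after dropping places with DN below `lo` or above `hi`."""
    kept = [
        (d, i)
        for d, i in zip(dn, iri)
        if (lo is None or d >= lo) and (hi is None or d <= hi)
    ]
    return pearson_r([d for d, _ in kept], [i for _, i in kept])
```

A robustness check in the spirit of the text would compare `correlation_in_dn_range(dn, iri)` against `correlation_in_dn_range(dn, iri, lo=1.0)` and `correlation_in_dn_range(dn, iri, hi=10.0)` and confirm that the sign and rough magnitude of the relationship persist.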

We acknowledge a number of other shortcomings of this approach. For one, nightlight illumination is only an approximate proxy for local economic activity, especially as many rural areas have very low levels of nightlights, satellite flyover times are often in the middle of the night, and an increasing number of locations are being electrified by more efficient and decentralized electricity sources that may produce less ambient and wasted light. Additionally, not considering the entirety of the entering roads may poorly reflect the actual ability for goods, services, and people to enter each town. Further, the mismatched timing between when images were taken and when nightlight values were recorded contributes some disparity. Last, we acknowledge the accuracy challenges that our models have for predicting quality for individual patches. While this is of concern, we believe this is somewhat mitigated by the consideration that most use cases we envision are focused on the quality of segments of roads rather than that of individual patches; for segments, our approach should provide suitable accuracy. On the other hand, for applications that require high patch-by-patch accuracy like pothole detection, our methods may be insufficient.

Despite these challenges, we believe that this case study shows that the measurement technique that we have developed enables examination of road quality throughout an entire country, an unprecedented scale. Beyond the valuable purposes of improving investments in construction and maintenance of roads, we believe that our work presents the opportunity to correlate road quality to many other societal development indicators. This includes access to other infrastructure or services (e.g., water resources, cellular towers, financial services, education and health facilities, markets, and cropland), household survey responses (e.g., education and health indicators, wealth measurements, and perspectives on governance), and other remote sensing data (e.g., weather and climate information). While our work represents only a beginning, it can enable a wide range of studies on the impact of road quality on different aspects of societal development.

6 FUTURE WORK AND CONCLUSIONS

In this work we describe a methodology to infer the quality of intercity roads in developing regions, with the primary goal of enabling useful and practical applications. To do this, we trained models using satellite data and road roughness data from Kenya and demonstrated that the models performed well in some cases in locations previously unseen, while remaining cognizant of outstanding challenges. We saw that while the normal train-test paradigm can be approached readily, achieving reliable results on the held-out case is significantly harder. We also demonstrated a novel use case of our road quality measurement at a larger scale than traditional methods would feasibly allow.

There are several machine learning aspects that could be explored further. Though we have shown that autoencoders are greatly beneficial in this scenario where there is a plethora of unlabelled data, understanding the ideal setup will take further investigation. What sort of convolutional structure to use, whether it should have any regularization, and how far to compress each tile will all need to be investigated with rigor. The same holds for the recurrent neural network structure. In this vein, further investigation of the LSTM structure is a possibility, as is the idea of using an entirely different type of RNN to model the continuous sequential nature of this data. This would be an interesting novelty compared to the discrete sequential nature of data such as words in a sentence or frames in a video.

More germane to potential use cases, we can further explore the effect of different modeling decisions and methods on the ability to generalize to different contexts, especially those further afield. We can also consider the effects of improved or degraded satellite imagery quality on results and measure performance improvements from including other features available from further remote sensing data (e.g., land use data or multispectral imagery).

In general, more accurate and granular measurement of road quality can lead to reduced road maintenance costs, allowing expensive rehabilitation efforts to be replaced by targeted repairs. Further, these capabilities can empower governments, donors, and policymakers to identify particularly hazardous roads and monitor the short- and long-term performance of construction firms and contractors, improving public safety and enabling more efficient public investments. Additionally, these models can enhance the work of economists and others researching public policy in a variety of domains, ideally leading to a clearer understanding of the levers of societal development in a diverse array of contexts.

REFERENCES

[1] [n. d.]. CrowdAI. https://crowdai.com.

[2] Adedamola Adepetu and Jay Taneja. 2016. Filling Spatial and Temporal Gaps in Development Surveys Using Night Lights. In UNESCO Chair Conference on Technologies for Development (Tech4Dev 2016).

[3] Adrian Albert, Jasleen Kaur, and Marta C. Gonzalez. 2017. Using Convolutional Networks and Satellite Imagery to Identify Patterns in Urban Environments at a Large Scale. In Knowledge Discovery and Data Mining.

[4] Jining Bao, Yunzhou Zhang, Xiaolin Su, and Rui Zheng. 2018. Unpaved road detection based on spatial fuzzy clustering algorithm. Journal on Image and Video Processing 26 (2018).

[5] Monica Beuran, Marie Castaing Gachassin, and Gael Raballand. 2013. Are There Myths on Road Impact and Transport in Sub-Saharan Africa? (2013).

[6] Ilke Demir, Krzysztof Koperski, David Lindenbaum, Guan Pang, Jing Huang, Saikat Basu, Forest Hughes, Devis Tuia, and Ramesh Raskar. 2018. DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images. In DeepGlobe Workshop at CVPR (DeepGlobe 2018).

[7] DigitalGlobe. [n. d.]. Basemap +Vivid Product. dg-cms-uploads-production.s3.amazonaws.com/uploads/document/file/2/DG_Basemap_Vivid_DS_1.pdf.

[8] Lars Forslöf and Hans Jones. 2015. Roadroid: Continuous road condition monitoring with smart phones. Journal of Civil Engineering and Architecture 9, 4 (2015), 485–496.

[9] Lionel Gueguen and Raffay Hamid. 2015. Large-Scale Damage Detection Using Satellite Imagery. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Ken Gwilliam and Zmarak Shalizi. 1999. Road Funds, User Charges, and Taxes. The World Bank Research Observer 14, 2 (1999), 159–186. https://dx.doi.org/10.1093/wbro/14.2.159

[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In IEEE conference on Computer Vision and Pattern Recognition (CVPR).

[12] J. Vernon Henderson, Adam Storeygard, and David N. Weil. 2012. Measuring Economic Growth from Outer Space. American Economic Review 102, 2 (2012), 994–1028.

[13] Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf, William J Dally, and Kurt Keutzer. 2016. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and < 0.5 mb model size. arXiv preprint arXiv:1602.07360 (2016).

[14] Japanese International Cooperation Agency. [n. d.]. Summary of Terminal Evaluation. www.jica.go.jp/english/our_work/evaluation/tech_and_grant/project/term/africa/c8h0vm000001rp75-att/kenya_2015_01.pdf.

[15] Neal Jean, Marshall Burke, Michael Xie, W Matthew Davis, David B Lobell, and Stefano Ermon. 2016. Combining satellite imagery and machine learning to predict poverty. Science 353, 6301 (2016), 790–794.

[16] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS).


[17] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (NIPS).

[18] David Lindenbaum. [n. d.]. Introducing the SpaceNet Road Detection and Routing Challenge and Dataset. https://medium.com/the-downlinq/introducing-the-spacenet-road-detection-and-routing-challenge-and-dataset-7604de39b779.

[19] Volodymyr Mnih and Geoffrey E. Hinton. 2010. Learning to Detect Roads in High-Resolution Aerial Images. In European Conference on Computer Vision (ECCV 2010).

[20] NASA. [n. d.]. Suomi NPP VIIRS Land. https://viirsland.gsfc.nasa.gov/index.html.

[21] NOAA EOG. [n. d.]. Version 1 VIIRS Day/Night Band Nighttime Lights. https://ngdc.noaa.gov/eog/viirs/download_dnb_composites.html.

[22] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision 115, 3 (2015), 211–252.

[23] Marc Rußwurm and Marco Körner. 2017. Temporal vegetation modelling using long short-term memory networks for crop identification from medium-resolution multi-spectral satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 11–19.

[24] Haşim Sak, Andrew Senior, Kanishka Rao, and Françoise Beaufays. 2015. Fast and accurate recurrent neural network acoustic models for speech recognition. arXiv preprint arXiv:1507.06947 (2015).

[25] M. W. Sayers. 1996. On the calculation of International Roughness Index from longitudinal road profile. Transportation Research Record 1501 (1996), 1–12.

[26] Robert R Schaller. 1997. Moore’s law: past, present and future. IEEE spectrum 34, 6 (1997), 52–59.

[27] Zhenwei Shi and Zhengxia Zou. 2017. Can a machine generate humanlike language descriptions for a remote sensing image? IEEE Transactions on Geoscience and Remote Sensing 55, 6 (2017), 3623–3634.

[28] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

[29] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).

[30] Mingjun Song and Daniel Civco. 2004. Road Extraction Using SVM and Image Segmentation. Photogrammetric Engineering & Remote Sensing 70, 12 (2004), 1365–1371.

[31] The World Bank. 2017. Africa’s Pulse. (2017).

[32] Amin Ullah, Jamil Ahmad, Khan Muhammad, Muhammad Sajjad, and Sung Wook Baik. 2018. Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access 6 (2018), 1155–1166.

[33] Aisha Walcott-Bryant, Reginald E Bryant, Michiaki Tatsubori, Daniel Emaasit, Samuel Osebe, John Wamburu, and Simone Fobi. 2017. The Living Roads Project: Giving a Voice to Roads in Developing Cities. Transportation Research Board – 96th Annual Meeting (2017).

[34] Yue Zhang, Qi Liu, and Linfeng Song. 2018. Sentence-state lstm for text representation. arXiv preprint arXiv:1805.02474 (2018).
