Masters Research Dissertation Programme Case Studies 2017 CONSUMER DATA RESEARCH CENTRE www.cdrc.ac.uk Edited by Guy Lansley 13 October 2017


Masters Research Dissertation Programme

Case Studies 2017

CONSUMER DATA RESEARCH CENTRE

www.cdrc.ac.uk

Edited by Guy Lansley, 13 October 2017

Foreword

The Consumer Data Research Centre’s Masters Research Dissertation Programme instigates several student-led research projects which seek to tackle topical problems put forward by industry. Each year we invite representatives from major retailers and organisations which handle consumer data to propose research ideas to Masters level students. Students from all UK institutions are eligible to apply to undertake the projects via the CDRC website. Successful students complete the research over the summer with joint supervision from their academic tutors and their industry sponsor.

This year we welcomed 16 students from a diverse range of academic disciplines. Between them they used a wide range of software packages and analytical techniques in their research.

A selection of short summaries of their research has been provided in this document.

If you are interested in becoming an industrial partner or are a student wishing to find out more please visit https://www.cdrc.ac.uk/retail-masters/ or contact Guy Lansley [email protected].

An exploratory analysis of the temporal fluctuations in footfall around Hammersmith town centre in relation to Business Improvement District events

Adjoa Adu-Poku1, Michael G. Epitropakis1 and Livia Caruso2

1Lancaster University, 2HammersmithLondon BID

Project Background

HammersmithLondon BID seeks, as part of its aims as a Business Improvement District, to promote the attractiveness of the Hammersmith area. One result of this promotion is an improved visitor count, which is understood to offer local businesses greater selling opportunities. One way it aims to increase the appeal of the area is through locally hosted events that occur mainly in the summer and winter periods. This project organises temporal fluctuations in footfall in such a way that HammersmithLondon may recognise how its events could be contributing to changes in footfall patterns. It is an exploratory analysis that applies a range of tools in order to make inferences from a set of time series data. The tools used have been applied effectively in the existing literature across a wide range of applications.

Data and Methods

The data consist of an hourly count of pedestrians, captured throughout one year, across 6 wi-fi sensors located across the central and western areas of the BID. The data can be presented as a time series, which shows fluctuations as well as observations that may be considered anomalies.

The 6 tools applied to the data are: 1. time series decomposition, 2. change point detection (via a binary segmentation algorithm), 3. anomaly detection, 4. point data mapping and animation, 5. point data interpolation and 6. principal component analysis. These tools were employed through packages in RStudio and ArcGIS.

The first 3 of these tools compose a time series analysis, which allowed positive anomalous footfall events to be compared with a calendar of HammersmithLondon events in order to look for correlations. This allowed the project to make inferences about which events, or types of events, contributed to high footfall counts. The final 3 tools made up the spatial analysis element. These 3 methods allowed us to see the relative importance of the sensor data in each location, and justified the omission of 3 sensors from the time series analysis.
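As a rough illustration of the anomaly detection step, the sketch below flags hourly counts that sit well above the typical value for the same hour of day. It is written in Python rather than the R packages used in the project, and the median/MAD rule, the `positive_anomalies` function, its threshold and the sample data are all assumptions made for illustration, not the project's actual method.

```python
from statistics import median


def positive_anomalies(counts, period=24, threshold=3.0):
    """Return indices of counts that are unusually high for their hour of day.

    counts    -- hourly footfall counts in time order (hypothetical input)
    period    -- length of the repeating cycle, 24 for an hour-of-day baseline
    threshold -- number of robust deviations above the baseline that counts as anomalous
    """
    flagged = []
    for phase in range(period):
        idx = range(phase, len(counts), period)
        values = [counts[i] for i in idx]
        if not values:
            continue
        baseline = median(values)
        # Median absolute deviation, scaled to be comparable with a standard deviation.
        mad = median(abs(v - baseline) for v in values)
        scale = 1.4826 * mad if mad > 0 else 1.0
        for i, v in zip(idx, values):
            if (v - baseline) / scale > threshold:
                flagged.append(i)
    return sorted(flagged)


# Hypothetical week of hourly counts with one event-driven spike on day 4 at 12:00.
week = [100 + (h % 24) for h in range(24 * 7)]
week[24 * 3 + 12] = 500
spikes = positive_anomalies(week)
```

The flagged indices can then be converted back to dates and times and joined to an events calendar, which is the comparison the time series analysis relies on.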

Key Findings

Figure 1 presents an example of the positive anomalies detected from the time series data of one sensor. By joining these results with the specific events falling within the anomalous dates and times, and by relating change point detection results to specific weeks and seasons in the year, the project was able to determine the most important events that HammersmithLondon hosts.

Figure 1. Anomaly detection graph for the ‘Lyric Square’ sensor

The empirical results indicate that the BID-hosted ‘Yoga in the Square’ and ‘Wimbledon in the Square’ events are most likely to cause abnormally high footfall rates. Results are also explained in the context of event categories. Of these categories, ‘Big Screen’ and ‘Fitness, Health and Wellbeing’ contained the events most significantly correlated with positive anomalous footfall results. ‘Virtual Reality Cabin’ and ‘Hammersmith Movie Hub’ events, in particular, were found to correlate the least with anomalous results. Change point detection in the Lyric Square trend level also indicated improvements in the footfall counts during the BID Summer Festival period.

Value of the Research

The results of this study have given the partner some evidence to suggest that certain events are drawing in more visitors than others, helping to justify spending in specific areas. HammersmithLondon can also justify investment in an increased number of footfall sensors in the area to improve the spatial analysis. The study also provides a positive case for the implementation of electronic footfall sensors for industries which are concerned with visitor counts as a response to promotional activities or events.

Customer pathway and in-store activity

Ulzhan Bissarinova1, Daniel Hulme1, Constantinos Gavriel2 and Emilie Lhomme2

1University College London, 2Sainsbury’s

Project Background

It is desirable for retailers to understand the pathways customers use to move around the store while purchasing their products. Pathway information gives insights into the space occupancy rate and might allow store plans to be reallocated more efficiently. The most promising state-of-the-art indoor tracking methods involve the use of closed-circuit television (CCTV) cameras, Wi-Fi, Bluetooth, ultra-wideband, Long-Term Evolution, inertial measurement units and near-field communication, whose working principle is based on electromagnetic fields. These methods come with a high cost of implementation, as well as inaccuracy problems related to the distortion of signals. Unlike the signal-based methods, the method proposed in this work does not require any additional hardware and relies purely on transaction and floor-plan data.

Data and Methods

The floor plan of one specific store, along with its transaction data recorded during a four-week period, was analysed for this research. The analysis included obtaining x, y coordinates of customers along their predicted paths, creating clusters of similarly visited sections and generating association rules for the products frequently bought together. The family of shortest path algorithms, including Dijkstra’s algorithm, was considered to estimate the potential path of customers. A K-means algorithm was implemented to form clusters based on the number of “views” and the sales volume of each section. The number of “views” indicates how frequently a particular section was passed by, and therefore viewed, by a visitor. Finally, the association rules were used to identify strongly related product pairs and to observe how far apart those products are located from each other.
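To make the path estimation step concrete, the following minimal sketch runs Dijkstra’s algorithm over a floor plan represented as a grid. The grid encoding (`#` for shelving, `.` for walkable floor), the unit step cost and the `shortest_path` function are assumptions for illustration; the project’s own floor-plan representation is not described here.

```python
import heapq


def shortest_path(grid, start, goal):
    """Dijkstra's algorithm on a 4-connected grid; '#' cells are blocked."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    prev = {}
    queue = [(0, start)]
    while queue:
        d, cell = heapq.heappop(queue)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                nd = d + 1  # every step costs one grid cell
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(queue, (nd, (nr, nc)))
    if goal not in dist:
        return None  # no walkable route
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical floor plan: a shelving unit sits between entrance and checkout.
FLOOR = [
    "....",
    ".##.",
    "....",
]
route = shortest_path(FLOOR, (0, 0), (2, 3))
```

Chaining such routes between the sections of consecutive items in a basket gives one plausible customer pathway, and counting how often each cell appears on those routes yields the “views” measure described above.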

Key Findings

Aggregated pathways show that, overall, mornings were less busy and the store was busiest during the lunchtime rush. A clear path that was followed by the majority of visitors during lunchtimes on weekdays included the sections “Snacking & Sharing”, “Fresh Pizza Bread & Pasta”, “Fresh Fish”, “Fresh Poultry” and “Bacon”, as well as “Yogurts” and “Cheese”. Figure 1 illustrates the strength of related products against the distance between them. Product pairs that have high confidence and high distance values could potentially be placed more closely together to improve their sales. The top 3 examples with the highest confidence and travelled distance values (distance > 164 m, confidence > 0.75) are the section pairs “Fresh Fish” – “Veg & Salad”, “Fish Counter” – “Veg & Salad” and “Potatoes” – “Veg & Salad”.

Figure 1. Confidence of associated products (>0.01) against distance between them
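The confidence of an association rule A → B is the share of baskets containing A that also contain B. A minimal sketch of that calculation, and of screening for pairs that are both strongly associated and far apart, might look as follows. The baskets and distances are invented for illustration; only the two thresholds (0.75 and 164 m) come from the text, and the function names are hypothetical.

```python
def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


def far_apart_strong_pairs(baskets, distances, min_conf=0.75, min_dist=164.0):
    """Pairs whose rule confidence and in-store distance both exceed the thresholds."""
    picked = []
    for (a, b), metres in distances.items():
        conf = confidence(baskets, a, b)
        if conf > min_conf and metres > min_dist:
            picked.append((a, b, conf, metres))
    return picked


# Hypothetical baskets and section-to-section walking distances in metres.
baskets = [
    {"Fresh Fish", "Veg & Salad"},
    {"Fresh Fish", "Veg & Salad", "Bacon"},
    {"Fresh Fish", "Veg & Salad"},
    {"Fresh Fish", "Veg & Salad"},
    {"Fresh Fish"},
    {"Bacon", "Cheese"},
]
distances = {("Fresh Fish", "Veg & Salad"): 170.0, ("Bacon", "Cheese"): 20.0}
candidates = far_apart_strong_pairs(baskets, distances)
```

With these made-up inputs only the “Fresh Fish” → “Veg & Salad” pair passes both thresholds, which is the kind of pair the findings suggest could be placed closer together.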

Value of the Research

This methodology has several benefits. First of all, it presents a tool to visually represent the customer pathways within a given store. This can be used to estimate how customers navigate a store and the occupancy rate of areas according to seasons, days of the week and times of day. This insight could be used to create recommendations on better store layouts and to understand the effectiveness of promotions. A better understanding of customer pathways may allow stores to increase the sales of complementary products and impulse purchases. In addition, it could be used to minimise congestion and improve customer satisfaction.

Generally, the methods discussed in this work are applicable to all types of retail stores where consumers browse items across the store floor. The core advantage of the approach is that retailers don’t need to invest in expensive (and potentially undesirable) tracking technologies and can instead take advantage of data which are routinely collected through transactions.

Customer value modelling – how sensitive is the model to change?

Aidon Blong1, Simon Maskell1, Nicola Dunford2 and Tony Birch2

1University of Liverpool, 2Shop Direct

Project Background

This dissertation explores how sensitive different implementations of a customer value model within the retail industry can be. The existing linear stepwise regression models that are currently in place are able to predict how valuable a segment of customers is going to be in the future, more specifically in terms of an aggregate monetary value. The purpose of the project is to improve on the existing models by lowering the overall error score of the predicted values.

Data and Methods

The project used the same data set that the partner’s existing customer value model uses, which ensured fairness when comparing different models. The data set consisted of many different categories of data such as purchase history, browsing history and other key statistics regarding customers. Hypothesis testing was used as a means of testing certain statements of belief about the project. The initial hypothesis was that experiment 3, which used a random forest version of the customer value model, would yield a lower overall error score than the original customer value model. Statistical significance testing was used to assess whether the difference between the customer value models was significant.

Key Findings

The initial hypothesis was supported: the random forest implementation of the customer value model was able to outperform the existing linear regression model in terms of RMSE (Root Mean Square Error) score. Other error scores such as MSE (Mean Squared Error) were also applied to the different algorithms, with no significant difference in terms of the clear winner.
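For reference, the two error scores can be sketched as below. Because RMSE is simply the square root of MSE, ranking two models by either score on the same data gives the same order, which is consistent with both scores pointing to the same winner. The values and model names are hypothetical and are not taken from the project’s data.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical customer values and two sets of model predictions.
actual = [120.0, 80.0, 100.0]
model_a = [110.0, 90.0, 100.0]
model_b = [100.0, 100.0, 100.0]

a_wins_on_mse = mse(actual, model_a) < mse(actual, model_b)
a_wins_on_rmse = rmse(actual, model_a) < rmse(actual, model_b)
```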

However, in terms of the accuracy of the models, linear regression outperformed random forest. Note that accuracy here is not the same as ‘classification accuracy’ and simply describes how close the predicted target value aggregate is to the actual target value. The difference between the accuracy and error results reflects what is commonly known as a bias-variance trade-off which, given more time to investigate, could explain why the two algorithms score differently for error and accuracy. Experiment section 2 detailed multiple attempts to select the ‘best’ variables from the data set, relative to the target variable. Some of the methods included Pearson/Spearman correlation, allowing for linear/non-linear relationships, as well as different entropy measures. Each of the experiments conducted is outlined in Figure 1.

Figure 1. The overall average scores per 10-fold cross-validation on the test data sets

Value of the Research

The research provided some valuable feedback for Shop Direct to review, and even grounds to consider changing their existing model. The project provided Shop Direct with an insight into how a random forest can outperform linear regression in terms of error score. However, there are several other methods, such as BART (Bayesian Additive Regression Trees), which have the potential to yield even better results.

Figure 2. Experiment 1: Stepwise linear regression

Estimate impact of slot availability on customer demand by using choice-based demand models

Yifan Cao1, Arne Strauss1 and Matthew Pratt2

1The University of Warwick, 2Sainsbury’s

Project Background

Increasing sales through online channels have made large supermarket retailers seek new strategies to meet this surging customer demand in online grocery shopping. However, no matter what new service retailers aim to provide on e-fulfilment, it is important to strike a balance between delivery efficiency and customer satisfaction. A suggested approach is to adopt revenue management techniques to help better understand customer choice behaviour in delivery time selection, considering slot availability and prices. This research sought to demonstrate that it is useful to analyse the impact of slot availability and price on customer demand for delivery services. It also presented methods to assist the planning of resource allocation and capacity management policies to improve the efficiency of services.

Data and Methods

Data used for this research were collected from Sainsbury’s nationwide online transaction records over a period of 26 days, from March 12th 2017 to April 6th 2017. They included information considered relevant for customer decisions, such as slot availability and prices. The provision of slot information for non-purchasing customers gave a better theoretical basis for the analysis since it avoided the need to approximate non-purchase behaviour.

Two independent multinomial logit (MNL) models were used to estimate the complete delivery slot selection process: one focused on the choice of delivery day whilst the other considered the slot selection. Although the models are independent, their results can be combined to calculate the likelihood of a customer choosing any slot throughout the week. The variables used in the delivery day selection MNL model were capacity ratio (% of available slots on a delivery day), average price and customer arrival day. The factors assumed to affect slot choices were slot availability and slot prices.

Two validation methods were used to assess the predictive power of the models. The first method compared the modelled results between the estimation data set and the validation data set, while the second compared the computed model results with real slot selection data.
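In an MNL model each option receives a utility, and the probability of choosing an option is its exponentiated utility divided by the sum over all options. The sketch below shows that calculation and one plausible way of combining the two models, by multiplying the probability of a day by the probability of a slot within that day. The utilities, the option labels, the treatment of "no purchase" as an outside option with utility 0 and the multiplication rule are all assumptions for illustration, not estimates or details from the study.

```python
import math


def mnl_probabilities(utilities, outside_utility=None):
    """Multinomial logit choice probabilities for a dict of option utilities.

    If `outside_utility` is given, a "no purchase" option is added to the choice set.
    """
    options = dict(utilities)
    if outside_utility is not None:
        options["no purchase"] = outside_utility
    peak = max(options.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(u - peak) for k, u in options.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical utilities: in practice they would be linear functions of
# capacity ratio, average price and arrival day (day model) or of slot
# availability and price (slot model).
day_utilities = {"day 1": 1.2, "day 2": 0.9, "day 3": 0.5, "day 4": -0.3}
slot_utilities = {"09:00-10:00": 0.4, "13:00-14:00": 0.8, "13:30-14:30": 0.1}

day_probs = mnl_probabilities(day_utilities, outside_utility=0.0)
slot_probs = mnl_probabilities(slot_utilities)

# Assumed combination rule: P(day and slot) = P(day) * P(slot given day).
p_day1_lunch = day_probs["day 1"] * slot_probs["13:00-14:00"]
```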

Key Findings

The results from both MNL models provided significant insights into customer choices. The results from the delivery day selection model indicate that when all slots are available, most demand (78%) falls on the first three days after customer arrival, and there is a 13.72% chance that a customer will not pick any slot. As the capacity ratio of each delivery day is lowered in turn, this probability rises but does not exceed 20%. However, the results indicate that a decrease in the capacity ratio of delivery days closer to customer arrival has a greater negative impact.

The delivery slot selection model results comprised two parts. Firstly, without considering price, the findings identified that customer behaviour towards delivery slots varies significantly between weekdays (Monday to Thursday) and weekends (Friday to Sunday). Morning slots were found to be more popular than the others during the weekend, while early afternoon slots were more popular on weekdays. After introducing differentiated prices as a lever to smooth demand, choice behaviour was found to be altered and more balanced. Another interesting finding was that, with and without the price impact, slots starting on the half hour (e.g. 13:30 – 14:30) were usually less popular than the closest slots starting on the hour (e.g. 13:00 – 14:00).

Figure 1. Estimated attractiveness of Monday Slots

Value of the Research

The impact of slot availability on demand can be inferred by combining results from the delivery day and delivery slot selection models with demand forecasts. These insights are valuable in daily delivery capacity management and for improving service delivery. For instance, the results can be used for an effective resource allocation policy focusing on customer satisfaction and delivery capability. The results also provide further evidence of the suitability of MNL models for estimating consumer behaviour, especially in the field of e-fulfilment where validation tests are rarely published.

How competitive is propensity score matching using ‘address embeddings’ with supervised classification for record linkage tasks?

Sam Comber1, Dani Arribas-Bel1 and Ronald Nyakairu2

1University of Liverpool, 2Local Data Company

Project Background

The growing availability of Open Data has opened the opportunity for businesses to enrich their existing insight from data sources. In particular, a competitive advantage exists in the analysis of integrated data compared to analysing data in isolation. Yet, in reality, most real-world databases are noisy, inconsistent or replete with missing values; these issues complicate the integration of data. The process of disambiguating record pairs that may represent the same entity into matches and non-matches is known as record linkage.

In this project, an innovative technique is compared with a supervised method for predicting the match status of address pairs.

Data and Methods

Our record linkage methods are tested on commercial address databases maintained by the Local Data Company (LDC) and Valuation Office Agency (VOA). Given that the Cartesian product of the 700,077 and 1,978,830 records in the LDC and VOA databases creates a comparison space of roughly 1.39 × 10^12 candidate record pairs, we require the addresses to be ‘blocked’ to keep the computation tractable. This means we restrict the comparisons for linkage to within mutually exclusive blocks defined by postcode value.
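The effect of blocking can be sketched as follows: rather than pairing every record in one database with every record in the other, candidate pairs are generated only between records that share a blocking key. The record layout, the `postcode` field name and the sample postcodes are hypothetical.

```python
from collections import defaultdict


def blocked_pairs(left, right, key):
    """Yield candidate pairs (l, r) only where both records share a blocking key."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[key(rec)].append(rec)
    for rec in left:
        for other in blocks.get(key(rec), []):
            yield rec, other


# Hypothetical records: two small address tables with made-up postcodes.
ldc = [
    {"id": "L1", "postcode": "AB1 1AA"},
    {"id": "L2", "postcode": "AB1 1AA"},
    {"id": "L3", "postcode": "CD2 2BB"},
]
voa = [
    {"id": "V1", "postcode": "AB1 1AA"},
    {"id": "V2", "postcode": "CD2 2BB"},
    {"id": "V3", "postcode": "EF3 3CC"},
]

pairs = list(blocked_pairs(ldc, voa, key=lambda r: r["postcode"]))
full_space = len(ldc) * len(voa)  # size of the unblocked Cartesian product
```

In this toy case blocking reduces 9 possible comparisons to 3. The trade-off is that true matches whose postcodes differ, for example through a typo, can never be compared.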

Once partitioned, two methods for linkage are employed. Our innovative technique uses propensity score matching on the vector representations of addresses learnt from a neural-based language model. Here a 100-dimension vector is learnt using the word2vec algorithm to create an ‘address embedding’ for every address. Propensity score matching on these vectors is then used to facilitate the linkage.

Our second method begins by using the libpostal address parser to segment each address string into feature columns – for example, business name, street number, street name. These features become the basis for generating ‘comparison vectors’ for each candidate record pair. Here, each element in the vector is a similarity score between the corresponding feature columns. This score might, for example, indicate how similar two street names are to one another. The comparison vectors are then classified as matched or non-matched using decision tree ensemble learners. These models are trained on address pairs with a known match status obtained from a previous round of matching between LDC and VOA addresses.
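A comparison vector of this kind can be sketched as below, assuming the addresses have already been parsed into feature columns. The field names, the sample records and the choice of `difflib.SequenceMatcher` as the similarity measure are assumptions for illustration; the similarity functions actually used in the project are not stated here.

```python
from difflib import SequenceMatcher

# Hypothetical feature columns, as might be produced by an address parser.
FIELDS = ("name", "number", "street")


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def comparison_vector(rec_a, rec_b, fields=FIELDS):
    """One similarity score per feature column for a candidate record pair."""
    return [similarity(rec_a.get(f, ""), rec_b.get(f, "")) for f in fields]


# Hypothetical, already-parsed address records.
ldc_record = {"name": "Corner Cafe", "number": "12", "street": "High Street"}
voa_record = {"name": "The Corner Cafe", "number": "12", "street": "High St"}

vector = comparison_vector(ldc_record, voa_record)
```

Each such vector, paired with a known match status, becomes one training row for the classifier.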

Key Findings

Precision, recall and F1 scores in Table 1 are interpreted as percentages. Our ideal outcome is to minimise misclassification error, i.e. the number of false positives and false negatives. For this reason particular attention is paid to the F1 score, which balances the trade-off between false negatives and false positives. As shown, the random forest and XGBoost models far outperform propensity score matching on address embeddings.

Table 1. Quality metrics for classifications

Value of the Research

The untidiness of Open Data complicates the process of linkage. Innovation in the methods employed for record linkage problems has the potential to improve the richness of knowledge discovery from data sources when these data can be successfully linked.

Our proposed method for learning the vector representations of addresses for linkage via propensity score matching was only able to match 18.1% of address pairs. By contrast, our supervised classification workflow was able to match up to 99.5% of addresses correctly. We attribute this to libpostal’s segmentation of addresses into accurate feature columns for comparison. In summary, the availability of open source libraries such as libpostal represents an opportunity for companies to achieve high quality match rates for record linkage tasks.

An evaluation of the relationships between construction developments and retail vacancy

Matthew Grove1, Les Dolega1, Ronald Nyakairu2, Adam Valentine3 and Steve Shelley3

1University of Liverpool, 2Local Data Company, 3Barbour ABI

Project Background

Vacancy rate is considered to be a key indicator when assessing retail centre performance, yet very little literature has been produced evaluating the relationship between construction developments and retail vacancy. Suggestions have been made that developments in surrounding areas may impact the performance of a retail centre, yet no previous studies have undertaken quantitative analysis of this. An increase in the proportion of vacant retail units is generally linked to economic decline within an area; establishing the factors that influence vacancy is therefore valuable. This study explores this novel concept and attempts to establish the strength of the relationships present between data provided by Barbour ABI and the Local Data Company.

Data and Methods

The relationships between construction developments and retail vacancy within the area of Greater Manchester were assessed using multiple linear regression and geographically weighted regression. Fixed effects were included within the linear model to account for spatial heterogeneity. Four contrasting drive distance catchments were computed using road networks and retail centres, and their performance was evaluated across four different time-span groupings. From the options evaluated, it was determined that variable drive distance catchments relative to retail centre size best represented the data over the course of a bi-yearly twelve-month timespan. Four categories of construction developments were then evaluated in relation to Comparison, Convenience, Leisure and Service vacancy, alongside a variable that accounts for retail vacancy collectively. Model fit for both linear regression and geographically weighted regression was assessed throughout, in order to establish the strength of the relationships present.

Key Findings

The results obtained within this study are complex, so generalisations can rarely be made across the variables and areas. Residential developments predominantly show a positive relationship with the reduction of retail vacancy, with Stockport being the only area to display a negative relationship. Results for human amenity, industry and transport service developments vary considerably, emphasising that the relationship between construction developments and retail vacancy is not uniform across time and space. The level of detail and complexity within the results of this study suggests that the relationship between construction developments and retail vacancy cannot be generalised to all areas, and should be considered on a case-by-case basis in future studies.

Figure 1. Construction developments between 2011 and 2017 within the area of Greater Manchester

Value of the Research

The exploration of this novel concept has given insight into an area that has received relatively little attention within previous literature, allowing for a better understanding of the factors that influence such a key element of retail health. Contrasting two datasets from inherently different sectors has allowed relationships to be established that were not achievable before. This not only allows for better insight into the relationship between construction developments and retail vacancy, but also allows a more pragmatic approach to be taken when assessing future developments.

Electric vehicle charge point placement optimisation

Jon Haycox1, Guanpeng Dong1, Jeeah Hwang2 and Katie Hendry2

1University of Liverpool, 2E.ON UK plc

Project Background

There is increasing demand for plug-in and hybrid vehicles in the UK as they are becoming more affordable due to decreased battery costs and lower manufacturing costs. For the growth in electric vehicles to continue, consumers need to be reassured that there is adequate charging provision, with a recent survey showing that almost half of UK consumers are worried that they would not be able to find an available, working or compatible charge point (CP). E.ON is an international, privately-owned energy supplier. As part of its sustainability programme, coupled with the increasing need to focus on the environment, E.ON is looking to assess the market for new public charging points across E.ON sites and the feasibility of incentivising its business customers to operate public charging stations.

Data and Methods

A regional-level optimisation model is considered for Great Britain, with a full analysis performed for the London and North East regions. Points of Interest (POI) data are exploited in the analysis of CPs because POIs are potential locations where charging is needed, i.e. where people are likely to drive to and spend time. The number of POIs within walking distance of existing CPs has been used to rank the POI categories in terms of importance. Workplace population and night-time population from the 2011 census, and traffic data and EV registration locations from the Department for Transport, were also considered as potential demand indicators for CPs. The data for these variables were resampled to a 1 km grid so they could be joined with the POI data to build a box factor measure representing demand for CPs in a cell. The maximum coverage location problem was solved for the demand function to obtain candidate cells that maximise the total demand coverage, such that the overall demand is covered in the most effective way. This was performed using the lpSolve package in R and displayed using ArcMap. A sensitivity analysis was then performed to determine which variables best matched existing CPs.
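The maximum coverage location problem asks which fixed number of candidate sites covers the most demand. The project solved it exactly as a linear programme with lpSolve in R; the sketch below instead uses a simple greedy heuristic in Python, which is a different and only approximate technique, shown purely to illustrate the structure of the problem. The cell names, demand weights and coverage sets are hypothetical.

```python
def greedy_max_coverage(demand, covers, n_sites):
    """Greedy heuristic for maximum coverage (approximate, not the exact LP solution).

    demand  -- {cell: demand weight}
    covers  -- {candidate site: set of cells it would serve}
    n_sites -- number of sites to open
    """
    chosen, covered = [], set()
    for _ in range(n_sites):
        best, best_gain = None, 0.0
        for site, cells in covers.items():
            if site in chosen:
                continue
            # Demand this site would add beyond what is already covered.
            gain = sum(demand[c] for c in cells - covered)
            if gain > best_gain:
                best, best_gain = site, gain
        if best is None:
            break  # no remaining site adds any coverage
        chosen.append(best)
        covered |= covers[best]
    return chosen, sum(demand[c] for c in covered)


# Hypothetical 1 km cells with demand scores, and the cells each candidate site serves.
demand = {"a": 5, "b": 3, "c": 2, "d": 1}
covers = {"s1": {"a", "b"}, "s2": {"b", "c"}, "s3": {"d"}}
sites, covered_demand = greedy_max_coverage(demand, covers, n_sites=2)
```

Here the heuristic opens `s1` first because it covers the most demand, then `s2` because it adds more new demand than `s3`.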

Key Findings

From our statistical models, a positive correlation was found between workplace population and CP intensity, which is consistent with the theory that people are likely to charge their cars in places where lots of people work, as well as at their own workplaces. The model combining workplace population and POI in the analysis, with weightings of two-thirds and one-third respectively, was shown to be quantitatively the best for the London region (Figure 1) and the North East region. In both areas, the model selecting candidate sites appears to match the actual CP locations best in the areas with the highest workplace values, i.e. in the dense urban areas. In the suburbs of London the pattern is very similar, but there is a poor match for the rural areas in the North East. A simple proximity analysis has shown that even in a dense, urban environment like London there are candidate cells that are nearly 3 km away from an existing public site.

Figure 1. Optimised CP locations (green cells) and existing CP locations (blue points) in London.

Value of the Research

This study has successfully produced a proof-of-concept method for optimising the location of CPs at a regional level and builds upon previous work by including new datasets, such as EV registrations, in order to generate better insight. The methods presented in this study provide a way for E.ON to identify candidate B2B sites that are suitable for further local- and street-level investigation. This research also supports a major energy company in improving the future suitability of the UK infrastructure to electric vehicles and, therefore, supports carbon footprint reduction initiatives.

Pricing optimisation - a key tool in successful retailing

Erin Knochenhauer1, Graham Clarke1 and Christian Setzkorn2

1University of Leeds, 2Shop Direct

Project Background

Few decisions impact the success of a retailer more than product pricing optimisation. It requires balancing a multitude of trade-offs which together determine a company’s success. The decision maker has the difficult task of selecting the right price when faced with multiple, competing objectives such as maximising profits and revenue whilst minimising the price distance between competitors and maintaining proper inventory levels. Optimal prices are essential to the marketing mix and to the business as a whole. An inaccurate price could lead to significant forfeited profit, inflated inventory management costs from excess stock or low consumer satisfaction due to stock shortages. Thus, the main objective for this project was to explore methodologies that produce sets of optimal price alternatives for a sample of products at a given time at the individual product level.

Data and Methods

Non-dominated Sorting Genetic Algorithms (NSGA) were used for this project for their efficient, flexible methodology and limited input data requirements. The project first required calibrating the initial demand forecasting models using daily historical sales data at the individual product level to better predict revenue and profits, with the ultimate goal of using their estimated parameters in revenue and profit maximising objective functions in the pricing optimisation. The competing objective entailed minimising the distance from the average competitor price. Sets of time-dependent optimal price solutions were produced for 13 electronic products within Shop Direct’s ‘Very’ fascia, utilising various functional forms to model revenue and profit. The pricing optimisation was completed using Liger, an open-source optimisation environment, and Python. The demand forecasting was completed with R. This study utilised both the NSGA-II and NSGA-II-PSA algorithms; the latter is a partition-based selection algorithm.
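At the heart of NSGA-style algorithms is the idea of non-domination: a candidate is kept if no other candidate is at least as good on every objective and strictly better on one. The sketch below shows only that filtering step for two of the objectives named above (maximise revenue, minimise distance from the competitor price). It is not a full genetic algorithm, and the candidate prices, the competitor price and the linear demand function are hypothetical.

```python
def dominates(a, b):
    """True if objective tuple `a` dominates `b` (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_prices(prices, competitor_price, demand):
    """Candidate prices that are non-dominated on (revenue, competitor distance)."""
    scored = [
        # Revenue is negated so that both objectives are minimised.
        (p, (-p * demand(p), abs(p - competitor_price)))
        for p in prices
    ]
    return [
        p for p, objectives in scored
        if not any(dominates(other, objectives) for _, other in scored)
    ]


def toy_demand(price):
    """Hypothetical linear demand curve; a fitted demand model would be used in practice."""
    return 200 - price


front = pareto_prices([80, 90, 100, 110], competitor_price=85, demand=toy_demand)
```

With these made-up inputs the front contains two prices: 90 (close to the competitor) and 100 (highest revenue). Neither is better on both objectives, so the choice between them is left to the decision maker.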

Key Findings

For the Electronics trading department as awhole, the linear-log demand model, includingprice, promotion, a price-promotion interactionterm and seasonal dummy variables aspredictors of demand, had the best out-of-sample prediction accuracy. The log-log

demand model consistently showed evidence ofoverfitting, and the additional terms ofcompetitor price ratio and substitutes price ratioas well as their measures of variance did notgreatly reduce out-of-sample prediction error.Linear models are not the most robust choice ofdemand modelling, as predictions errors areinflated, however, their low complexity is auseful choice as objective functions in pricingoptimisation for scalability. Depending onbusiness needs, prices could be updated daily,weekly or monthly by changing which historicaldata are used to forecast demand. Finally, therewas no significant difference between theperformance of the NSGA-II-PSA algorithmcompared to NSGA-II when deriving optimalprices for the electronic goods.

Figure 1. Actual (blue) vs. predicted (yellow) demand for a particular product

Value of the Research

This project served as a Proof of Concept utilising Genetic Algorithms for the multi-objective optimisation problem of setting optimal prices. This methodology produces a range of optimal solutions that the Decision Maker can choose from, promoting competitive advantage within the dynamic electronic market. Though a small sample of products was assessed, the results can be generalised to the greater electronics department. Further study is required to generalise to other product categories before scaling up to the entire product portfolio. Overall, this project was a valuable base on which to build Shop Direct’s in-house pricing optimisation workflow.

Exploring submarket trends in London: a constrained spatial interaction modelling analysis

Neelam Kundi1, Adam Dennett1 and Faisal Durrani2

1University College London, 2Cluttons

Project Background

This study investigates the characteristics of the London housing market and the effects of variables such as house price and accessibility on commuter distance through spatial interaction modelling. Observations from Cluttons data show that the housing market in London is characterised by high prices and turnover. Furthermore, closer inspection of the capital’s regions may reveal more desirable areas for people to live in and commute to work from. This is a complex market to delve into. A new geography is formed to provide an improved view of niches within London, offering another perspective on areas and allowing Cluttons and census data to be utilised in new ways to uncover and explore microtrends for a deeper understanding of a remarkable urban centre.

Data and Methods

The submarket categorisation utilised by Cluttons for real estate grouping and analysis forms the basis of the methodology. As this geography has not been represented before, a map has been created using Thiessen polygon analysis and overlaid with Census boundary data at the Middle Layer Super Output Area level.

To explore the relationship between house price and migration flows, constrained spatial interaction models are used. Variables for the interaction model are arranged to ensure the coefficients obtained from the model are informative. The variables determined to be most useful include an accessibility index, relative median house price and mean net weekly income (destination to origin ratio).

Attraction constrained models allow insight into the number of people that have left a given area, and aim to explore which variables explain people’s destination choice. Production constrained models allow insight into the number of people arriving at a destination, and aim to explore which variables explain their origin. The two constrained models form one set, and three sets of these models are run. The first explores the entire research area, which consists of Submarkets and Travel to Work Areas (TTWAs). The second explores only submarket areas using all variables. The third is also based on the submarket area, but uses Cluttons data on property prices.
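The mechanics of a singly constrained spatial interaction model can be sketched as follows. This is an illustrative assumption rather than the models fitted in the study: it fixes each origin’s total outflow and shares it across destinations by attractiveness and distance decay, with invented parameter values (alpha, beta) and invented inputs. The mirror-image model fixes destination inflows instead.

```python
import math


def origin_constrained_flows(outflows, attractiveness, distance, alpha=1.0, beta=1.0):
    """Allocate each origin's known outflow across destinations.

    T[i][j] = A[i] * O[i] * W[j]**alpha * exp(-beta * d[i][j]),
    where A[i] is the balancing factor that makes row i sum to O[i].
    """
    flows = []
    for i, o_i in enumerate(outflows):
        weights = [w ** alpha * math.exp(-beta * distance[i][j])
                   for j, w in enumerate(attractiveness)]
        balancing = 1.0 / sum(weights)  # A[i]
        flows.append([balancing * o_i * w for w in weights])
    return flows


# Hypothetical inputs: 2 origins, 3 destinations.
outflows = [1000.0, 400.0]
attractiveness = [2.0, 1.0, 1.0]
distance = [[1.0, 2.0, 3.0],
            [3.0, 1.0, 2.0]]

flows = origin_constrained_flows(outflows, attractiveness, distance, alpha=1.0, beta=0.5)
for row in flows:
    print([round(t, 1) for t in row])
```

In practice the parameters are estimated from observed flows rather than set by hand; the sketch only shows how the constraint guarantees that predicted flows add up to the known totals.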

Key Findings

The first set of models, looking at movement between submarkets and TTWAs, shows that although the most important variable in dictating flows for the production constrained model is the house price to net annual income ratio, people have an aspiration to live in more expensive areas rather than commute into London from surrounding TTWAs. The attraction constrained model shows that people living in submarkets have a desire to move to more affordable places such as Slough and Heathrow, both of which hold the highest estimates. In other words, living in London is an incentive to move out towards the TTWAs.

Similar trends are seen in the next two sets, where submarkets such as Hammersmith are an attractive destination for those looking to enter the London housing market. However, those already residing in these submarkets will be looking to move into a more attractive, and often higher priced, submarket such as Holland Park. These results link in with existing literature on property desirability, which suggests that what an individual finds desirable changes with their financial position.

The results suggest the presence of a property ladder within London whereby South Eastern and South Western submarkets are more affordable and are therefore more attractive as destinations for those who move from TTWAs or cheaper submarkets. Prime Central London appears to be the most favoured origin, with residents less likely to move away despite ever increasing property prices.

Value of the Research

The study allowed for a new geography to be created based on Cluttons property price data per submarket. From this, data visualisation could be carried out to provide an insight into the net flows of people between submarkets, and spatial interaction models could then be used to find the variables which best explain these flows. Furthermore, modelling origin emissiveness and destination attractiveness reveals valuable trends in the Submarkets and gives evidence of a clear housing ladder effect within the capital, which is becoming increasingly difficult to move up due to high property price to income ratios.

Where next for Parsons Bakery? A preliminary quantitative location assessment exploring the link between demographics, competition and other variables.

William Laing1, Richard Harris1 and Tom Curling2

1University of Bristol, 2Parsons Bakery

Project Background

Parsons Bakery competes in a dynamic market, and each year it needs to adapt to new trends, deal with new threats and connect with the local community to give itself the best chance of being successful. Store location assessment has become a critical element of retail strategy. Location and finding the right community are key to company profit. Parsons Bakery, a Bristol-based, family-run business of 46 bakeries, plans to expand its network and needed better spatial intelligence in order to make the best decisions.

The aim of the research was to determine the optimal sites for future expansion using location assessment analysis. The adoption of an optimal location strategy has the potential to provide retailers with a competitive advantage over their rivals. The research focused on South West England and South East Wales due to the coverage of Parsons’ store network.

Data and Methods

The initial part of developing a quantitative location assessment is determining what a company requires of its locations. Firstly, Geographic Information Systems (GIS) were used to identify the geodemographic groups geographically associated with Parsons’ best performing bakeries. This information was used to target new locations and produced 53 unique Lower Super Output Areas (LSOAs).

A catchment-cum-analogue approach was then developed in order to systematically identify the factors that are associated with high performance bakeries. Two catchment areas were created for each existing bakery. Using Parsons’ 2016 financial performance figures from existing bakeries combined with catchment area variables calculated from census statistics, six performance-forecast models were then designed. Overall, eight hotspots matching the identified geodemographic subgroup were revealed for further performance testing.

Key Findings

Using six performance models, a number of postcodes within Gloucester were revealed as the areas where high profitability and customer numbers could be expected. In this setting, a bakery based in Gloucester is calculated to produce about 35% more sales than the sales expected from the other locations within the study area investigated by performance modelling. Therefore, Gloucester was recommended as a prospective location for further investigation into the unique requirements of Parsons Bakery.

Figure 1. The geodemographic hotspots of LSOAs around Gloucester

Value of the Research

Parsons, like many other small and medium sized businesses, has not historically been of sufficient size to have the technical resources and capital investment required to carry out specialist location planning research. A comprehensive location research strategy requires many other factors to be taken into consideration.

Quantitative analysis provides a useful starting point. The analysis conducted in this research is a specific, measurable and reproducible approach to planning, which is affordable and reduces subjectivity. The catchment-cum-analogue models designed in this research avoided analytical bias as the independent variables are given uniform importance. Whilst their functionality was limited by the static nature of census data, the models are designed so that they can be updated when new data become available.

Other retail companies that currently do not conduct their own quantitative location assessment are recommended to do so. Whilst other retailers may follow a similar methodological process, the importance of matching retailer-specific location requirements to their overall corporate strategy and corporate goals cannot be overemphasised.

Propensity modelling – moving beyond standard regression models & explorations into online engagement

Mary E. Lennon1, Guanpeng Dong1 and Tony Birch2

1University of Liverpool, 2Shop Direct

Project Background

Online retailers (e-tailers) have not only cast aside brick-and-mortar locations but also established new methodologies for customer focused initiatives. In their place they are exploring Big Data and associated statistical techniques and machine learning algorithms to understand their customers’ engagement. Broadly, engagement is a customer’s spend (transactional) and emotional (non-transactional) commitment to a brand that can be quantified for customer acquisition and retention efforts. However, engagement tends to be ethereal and is difficult to define and leverage. This dissertation has consequently situated itself between theoretically defining engagement and using it for statistically driven customer focused initiatives. To achieve these goals, this study had two aims:

1. Explore random forest, a machine learning alternative to the traditional regression framework, for propensity modelling in the context of online engagement.

2. Improve current theoretical models of engagement through the addition of individual contextual information, specifically through the inclusion of demographic and geo-demographic variables.

Data and Methods

This study primarily used data from Shop Direct (SD), a prominent UK e-tailer, specifically data from their current engagement model. Each observation was a customer account active between 1st January and 1st April 2017 for their largest brand, totalling 2.2 million observations. This study also considered the age and gender of the customers from SD’s customer account file. The Internet User Classification (IUC) and Retail Attractiveness Scores supplemented SD’s internal data to test the effects of geographic contexts, such as proximity to retail centres, on customer engagement. The data were included in a derivation of the random forest algorithm entitled ‘Bag of Little Bootstraps’ (BLB). BLB was coded in R and employed to tackle the memory and computational issues that arose when analysing SD’s Big Data. The technique saves time and memory by running optimised bootstrapping across parallel cores on a computer while producing results that are as robust and reliable as the traditional random forest algorithm. To assure robustness, the results were cross validated.
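The resampling idea behind the Bag of Little Bootstraps can be sketched in a few lines. The study applied BLB in R with random forest; the sketch below is a deliberately simpler stand-in in Python that estimates the standard error of a mean, and its subset count, subset exponent (gamma), resample count and synthetic data are all assumptions chosen for illustration. Each little subset is processed independently, which is what makes the approach straightforward to spread across cores.

```python
import random
import statistics
from collections import Counter


def blb_standard_error(data, n_subsets=5, gamma=0.6, n_resamples=30, seed=0):
    """Estimate the standard error of the mean with the Bag of Little Bootstraps."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # each little subset holds about n**gamma points
    per_subset = []
    for _ in range(n_subsets):
        subset = rng.sample(data, b)  # drawn without replacement from the full data
        estimates = []
        for _ in range(n_resamples):
            # A size-n resample of the subset is stored as counts per subset point,
            # so memory use scales with b rather than n.
            counts = Counter(rng.choices(range(b), k=n))
            estimates.append(sum(subset[i] * c for i, c in counts.items()) / n)
        per_subset.append(statistics.stdev(estimates))
    # Average the per-subset quality measures into one estimate.
    return statistics.mean(per_subset)


gen = random.Random(42)
sample = [gen.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic data
se = blb_standard_error(sample)
print(round(se, 4))
```

For data with unit variance and 2,000 observations the theoretical standard error of the mean is about 0.022, which gives a rough check on the printed value.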

Key Findings

This study had three key findings. First, BLB random forest offered notable improvements on the logistic regression baseline, with an average increase in performance of 1%. Second, the results were robust: the model error rates fluctuated by +/- 0.15% between validation tests. Third, the top performing model was that which included all available demographic and geo-demographic variables, indicating that individual context matters. Beyond model performance, logistic regression’s coefficients and random forest’s variable importance shed light on the relationship of the variables to engagement. Key among these is that proximity to retail centres affects a customer’s propensity to engage with an e-tailer such as SD. Further, there are complicated relationships between IUC and the SD data. Customers in all IUC groups except for the most unlikely, elderly and unconnected, had a negative relationship with customer engagement. The results for IUC may be muddled due to the varying nature of the relationships caught by both the classification system and engagement model. For effective use of IUC, additional exploration of these relationships is necessary. Regardless of the nature of the relationship, IUC and primary retail catchments were important contributions to the top performing model.

Table 1. Comparing Model Performance

Logistic Regression v. Random Forest

Variables                                               Model No.   Training   Testing
SD Variables                                            M₁          0.2284     0.2273
SD Variables + Demographics                             M₂          0.2281     0.2273
SD Variables + IUC                                      M₃          0.2289     0.2278
SD Variables + Retail Catchments                        M₄          0.2289     0.2276
SD Variables + Demographics + IUC + Retail Catchments   M₅          0.2282     0.2270

Value of the Research

Online engagement is critical to many e-tailers. Despite this, theory on how to measure it is limited and non-theoretical examples of its implementation are currently unknown. This project offers a starting point for crucial discussions into defining and predicting engagement. Further, it has proven the potential of machine learning techniques as well as the benefits of demographic and geodemographic contextual information to engagement models. They provide notable improvements in model performance and increase insights into consumer behaviours.

An exploration of mobile data: towards proximity based passenger sensing on public transport

Christian Tonge1, Dani Arribas-Bel1 and Daniel Stockdale2

1University of Liverpool, 2Movement Strategies

Project Background

Urban public transport systems have traditionally been planned with limited understanding of their customers’ travel patterns. Ticket gates and fareboxes yield disaggregate counts of passenger activity for specific stations, vehicles and individual passengers. However, greater information about the origins, destinations and interchanges of individual passengers is typically acquired through much smaller samples from costly, and therefore infrequent, passenger surveys. Substantial research has been conducted on more intelligent systems for passenger monitoring, although little work has sought to capitalise on the wireless mobile sensors that transport customers carry. As such, this study aims to evaluate the potential that a large geolocated mobile device data set has for gaining a better understanding of urban transit users.

Data and Methods

A methodology is developed for proximity based passenger sensing on the London bus network using location data collected from the Global Positioning System and, to a lesser extent, Bluetooth. As the primary purpose of the data is to provide hyper-contextual adverts and not to estimate public transport patronage, a significant amount of pre-processing is required on the raw data.

The raw data are taken from 5 consecutive weekdays, Monday 3rd to Friday 7th July 2017, capturing a total of 66,179,367 data events from 393,360 distinct users. Initial investigation rendered Bluetooth and speed less useful determinants of ridership despite capturing daily means of 0.01% and 21.4% of total events as bus users respectively. Automated vehicle location data are used to enrich the GPS data, which are used to show 13,144 unique bus journeys on 3 London bus routes with origins and destinations for each.
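The core of proximity based sensing, pairing a device GPS event with a vehicle location record that is close in both space and time, can be sketched as below. This is an illustration under stated assumptions and not the study’s pipeline: the record layout, the 30 m distance threshold, the 20 s time window and the coordinates are all hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def match_to_bus(event, bus_positions, max_dist_m=30.0, max_dt_s=20.0):
    """Return the id of the nearest bus within both thresholds, or None."""
    best_id, best_d = None, max_dist_m
    for bus in bus_positions:
        if abs(bus["t"] - event["t"]) > max_dt_s:
            continue  # vehicle record is too far away in time
        d = haversine_m(event["lat"], event["lon"], bus["lat"], bus["lon"])
        if d <= best_d:
            best_id, best_d = bus["id"], d
    return best_id


# Hypothetical vehicle location records (t is seconds since midnight).
buses = [
    {"id": "A", "lat": 51.5155, "lon": -0.1411, "t": 105},
    {"id": "B", "lat": 51.5200, "lon": -0.1410, "t": 100},
    {"id": "C", "lat": 51.5154, "lon": -0.1410, "t": 500},
]
event = {"lat": 51.5154, "lon": -0.1410, "t": 100}
print(match_to_bus(event, buses))
```

A device that is matched to the same vehicle over a run of consecutive events could then be treated as a candidate passenger, with the first and last matched stops giving a boarding and an alighting point.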

Key Findings

This study has offered the data filtering and methodologies required to understand boarding and alighting points using passenger proximity on the TfL network, offered at aggregations higher than previously permitted using traditional automated data collection systems alone. Pre-processing techniques have highlighted the issues of horizontal accuracy and of users with multiple devices within the data.

100% of journeys using the combined data have origins and destinations, a figure higher than any previous GPS study in the literature. The busiest points of boarding and alighting are predominantly at stations, which implies polymodal travel within the network. The distance travelled by users to the Oxford Street region indicates that the area attracts users from over 15km away, highlighting high levels of retail centre attractiveness. The successful cleaning processes and methodology developed in this study will allow more conclusive results to be drawn on OD matrices and other important insights. Such insights include temporal variability, daily and seasonal ridership patterns, whole network comparisons and data penetration levels. However, due to the novelty of the methodology and the spatiotemporal resolution at user level, it is difficult to validate the success of the study against another form of data.

Figure 1. Origin Destination Matrix of Trajectories at Bus Stop Level

Value of the Research

As this is the first attempt at defining a methodology for this data, time constraints prevented more interesting and certain insights than OD ridership levels from being gained. While the novelty of this methodology makes it difficult to compare to other data sources, it has been shown to infer aspects of public transport activity that are not immediately available from automated data collection systems. Implementing this methodology over a whole network for a longer period would help to provide more useful insights for transport planners and marketers alike. Despite this data being London-centric, the pervasiveness and ubiquity of GPS and the relative ease of implementing SDK applications make this method reproducible wherever vehicle location and passenger GPS data are available.

Identifying Socio-Spatial Inequalities in Student Housing in London

Terje Trasberg1, Guy Lansley1 and Thomas Murphy2

1University College London, 2Knight Frank

Project Background

This project aimed to analyse existing spatial patterns in student accommodation and, based on students’ preferences, suggest the most suitable areas for students in London. Previous research on students’ residential decision making has shown that students often cluster into areas around the university, ignoring better options available on the market. This study therefore proposed computing a multi-criteria attractiveness index in order to select the most suitable areas for student housing. The results of the study should support students in making more informed residential decisions and serve as a guideline for real estate consultancies in targeting areas for purpose built student accommodation.

Data and Methods

Existing spatial patterns and criteria for the index were identified based on Student Accommodation Survey results provided by the University of London Housing Department. The dataset included 4,699 survey replies from students who rent from the private sector (shared flats, studio, and landlord’s house). The criteria for the attractiveness index were weighted using pairwise comparison after analysing the spatial inequalities of student preferences across London.

An open-source tool, MCDA4ArcMap, designed for spatial multi-criteria analysis in GIS software, was applied to assign an attractiveness score to all the areas in the dataset. The areas with the highest scores were seen as the most suitable locations.
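The weighting and scoring steps can be illustrated with a small sketch. It assumes the row geometric mean method for turning pairwise comparisons into weights and a simple weighted linear combination for the score; the study does not state which variants were used, and the comparison judgements and area scores below are invented.

```python
import math


def pairwise_weights(matrix):
    """Derive criterion weights from a reciprocal pairwise comparison matrix
    using the row geometric mean method; the weights sum to 1."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def attractiveness_score(scores, weights):
    """Weighted sum of criterion scores already scaled to 0-1 (1 = best)."""
    return sum(s * w for s, w in zip(scores, weights))


criteria = ["proximity", "cost", "safety", "area attractiveness"]
# Hypothetical judgements: entry [i][j] says how much more important
# criterion i is than criterion j.
comparisons = [
    [1.0, 2.0, 4.0, 4.0],
    [1 / 2, 1.0, 2.0, 2.0],
    [1 / 4, 1 / 2, 1.0, 1.0],
    [1 / 4, 1 / 2, 1.0, 1.0],
]
weights = pairwise_weights(comparisons)

# Hypothetical areas, scored on the four criteria in the order above.
areas = {
    "Area A": [0.9, 0.2, 0.5, 0.4],  # close to campus but expensive
    "Area B": [0.6, 0.7, 0.8, 0.7],  # further out, cheaper and safer
}
ranked = sorted(areas, key=lambda a: attractiveness_score(areas[a], weights), reverse=True)
print(dict(zip(criteria, weights)), ranked)
```

Even with proximity given the largest weight, an area further from campus can rank first once cost, safety and area quality are taken into account.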

Key Findings

The attractiveness index was built upon the following criteria: proximity to university, cost of the area, safety and attractiveness of the area (deprivation measure). Criteria were weighted based on the spatial analysis of student preferences. It was observed that there are significant spatial inequalities in the factors which students hold to be important about the location of their accommodation.

To accommodate the spatial differences, the attractiveness index was first calculated using the examples of University College London (UCL) in central London and St George’s, University of London, in south London. Those universities were selected due to their high volume of respondents who were geographically clustered near their campuses. For the UCL sample, proximity to university was ranked by students as more important than cost, so the most suitable areas were expected to be close to the university. Surprisingly, this was not the case: Paddington and Maida Vale, which are around 4km away from UCL, received the highest attractiveness scores. Those areas have lower weekly rents than areas with a high concentration of students (for example Camden). Similarly, areas in the immediate surroundings of St George’s, where student concentration is high, received lower suitability scores than other surrounding areas due to lower area attractiveness and higher crime rates.

The study concludes that, to obtain better living conditions, students should consider areas with low student concentrations rather than the immediate surroundings of the university. The most suitable areas are around 3-6 km away from the university.

Figure 1. Percent of students that ranked each of the factors as important

Value of the Research

This paper introduced a methodology to define the most suitable areas for student accommodation based on student surveys, using the example of two universities. Similar analysis could be conducted for all London universities to observe whether any areas stand out as generally the most suitable. This would be valuable input for real estate consultancies which are planning to establish purpose built student accommodation in London and are looking to find the most suitable locations and learn more about student preferences and expectations for the accommodation.

The impact of the night tube on Westminster’s Night Economy

Paula White1, Steven Gray1 and Dominic Baker2

1University College London, 2City of Westminster Council

Project Background

The City of Westminster has the largest evening and night time economy (ENTE) in the UK; it is larger than the ENTEs of Edinburgh, Birmingham and Manchester combined. The highest concentration of Westminster night time provisions is found in the West End, an area comprising Soho, Leicester Square, Piccadilly Circus and Covent Garden. It is a major tourist destination within the UK and internationally. In addition, with the introduction of the Night Tube between August 2016 and December 2016, there are now 8 stations operating in and around the vicinity of the West End for 24 hours on Friday and Saturday nights. There is widespread interest from businesses, residents and policy makers in understanding the ENTE in the London borough of Westminster, and the West End in particular.

Data and Methods

First, work was undertaken to understand the problem domain. Next, data were collected, cleaned and reformatted, after which exploratory data analysis was undertaken to give a solid foundation for detailed discovery steps. The analysis for this study covered three key areas: footfall and incidents, commercial premises applications and social media analysis. These were undertaken using a number of techniques from GIS, quantitative methods, data mining and social media analytics.

Service data and commercial premises application data were provided by Westminster City Council. The Consumer Data Research Centre provided footfall data from the Smart Street Sensor project, a partnership between UCL and the Local Data Company (LDC). A variety of social media data was sourced via IBM Watson Analytics, a data exploration and visualisation tool.

Key Findings

Overall, this study has shown that the West End is still undoubtedly a massive, highly dense hub of activity.

There were four geographical areas of analysis: the West End, Victoria and surrounding area, Marble Arch and surrounds, and Bayswater. Comparisons were made between a reference footfall before the Night Tube and the various changes through and after its deployment.

There was a significant increase in footfall on Fridays and Saturdays in parts of the West End, by 202% and 219% respectively. By contrast, Thursdays saw only a 164% increase. In Bayswater, Fridays and Saturdays saw an 80% increase in comparison to 50% on Thursdays. Note that these are proxy values only, so they are indicative of areas of change rather than showing complete accuracy.

Figure 1. Saturday Night Noise Complaints

As can be seen from Figure 1, it was also observed that street noise complaints are on the increase. Analysis showed that many of these complaints have a higher relative density in the West End, but there is also a relative increase in the Victoria area.

Since the Night Tube deployment, the data indicate some shifts in the locations of applications for commercial premises of hotels and pubs, which are now more dispersed across the borough. There was also a significant growth in casino applications, as well as a short-term trend toward applications for smaller capacity pubs.

London’s nightlife seems to have a positive sentiment on social media, as does the Night Tube, from a worldwide audience, which reflects London’s global status as a vibrant and exciting world class city.

Value of the Research

This dissertation found some answers to important questions about the direction the ENTE is taking in Westminster and starts to allow for better conversations and thinking about the city’s ENTE. It also provides some thoughts on future research data and approaches to further the understanding of this fascinating topic in greater detail.

Can harnessing the demographic characteristics of store catchments improve the planning of promotions and pricing strategies?

Daniel Winder1, Nik Lomax1 and Muhammad Adnan2

1University of Leeds, 2Marks and Spencer

Project Background

The overarching aim of the study was to examine the influence of demographics on price sensitivity and deal proneness in the UK grocery retail industry. If Marks and Spencer (M&S) fails to adapt to changing market pressures and consumer behaviour, this could lead to a decrease in market share. The company now places a strategic focus on its grocery element and on using its ‘Sparks’ loyalty card data to increase the efficiency and relevancy of promotions. It is within this strategic direction that the study resided: firstly, by attempting to show how demographics may affect the uptake of promotions and could therefore be used to improve promotional and pricing efficiency; and secondly, by evidencing the valuable insight that can be obtained from loyalty card data.

Data and Methods

The research was enriched by commercial data that is not openly available to academia. POS data for ten grocery products over the course of a year were provided from four stores across England. Expenditure on ‘Sparks’ loyalty cards at each store, attributed to a specific output area (aggregated from postcodes), was used to estimate the store catchment areas with a GIS. The estimated catchment areas were then joined with 2011 Census data to obtain the key demographic characteristics of the stores’ catchments that have been argued to be effective determinants of deal proneness and price sensitivity.

The effect of price on the sales of four products at the different stores was explored by examining sales data alongside price histories. Linear regression models were applied for two purposes: firstly, for hypothesis testing to establish whether price appeared to influence sales; secondly, to explore the extent of the price-effect, if present, by estimating the price elasticity of demand from the outputs of the regression models. The results were then collated with the demographic characteristics of the stores to infer potential demographic influence on price sensitivity and deal proneness.
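One common way to obtain a price elasticity from a linear regression of units sold on price is to evaluate the point elasticity at the sample means, that is, slope × mean price / mean quantity. The sketch below assumes that formula, which the summary does not confirm, and uses invented prices and sales.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def elasticity_at_mean(prices, units):
    """Point price elasticity of demand evaluated at the sample means."""
    slope, _ = ols_slope_intercept(prices, units)
    mean_price = sum(prices) / len(prices)
    mean_units = sum(units) / len(units)
    return slope * mean_price / mean_units


# Hypothetical weekly observations for one product at one store.
prices = [2.0, 2.5, 3.0, 3.5, 4.0]
units = [140.0, 125.0, 110.0, 95.0, 80.0]

slope, intercept = ols_slope_intercept(prices, units)
elasticity = elasticity_at_mean(prices, units)
print(slope, intercept, round(elasticity, 3))
```

An elasticity between 0 and -1 indicates relatively inelastic demand, while values below -1 indicate that sales respond more than proportionally to price changes; comparing such values across stores is what allows the demographic inference described above.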

Key Findings

The use of loyalty card data for catchment area estimation was shown to be an applicable method, but it did overestimate areas for stores where customers were likely to exhibit complex patronage behaviour. Most notably, the extent of the calculated catchment area of a convenience store located in a travel hub evidenced the intricacies associated with estimating a catchment area for a store of this type.

The findings displayed distinct differences in how price impacted sales at stores. Price and deals were argued to affect customers of the travel hub convenience store differently to those of the other, more traditional outlets. In tandem with the differing price-effects, the estimated catchment area demographics also varied (see Figure 1). Customers at Store A were argued to be the most price sensitive and thus deal prone. The comparatively high proportion of households where 5 or more people reside led to the inference that households of this size were potentially more deal prone. In addition, the findings were argued to contradict the common belief that elderly people are more price sensitive due to lower opportunity costs and that more educated people are less price sensitive because of higher opportunity costs.

Figure 1. Estimated composition of store catchment areas with key demographic variables from the 2011 Census

Value of the Research

The academic and commercial benefit for M&S resides firmly in the explicit explanation of the catchment area estimation technique, which is brought forward for the first time in the existing literature. Furthermore, the insight asserts that, strategically, M&S and other retailers should not offer homogeneous pricing and promotions across their network; these should be customised for increased effectiveness. The inferences in the findings show that catchment area demographics should be used to support the planning of these.

An application of statistical and spatial analysis techniques to engineer callout data from a UK telecommunications company

Patrick Wood1, Ruth Hamilton1 and Andy Simpson2

1University of Sheffield, 2CDRC industry expert

In collaboration with a national telecommunications provider

Project Background

This research aims to understand patterns in engineer callouts for a telecommunications & media company. The research analyses data concerned with the frequency and nature of engineer callouts during the year 2016 within Bristol, Bath, Bradford and Newcastle. It combines engineer callout data with geodemographic data in order to assess whether different social groups are more or less likely to experience issues with internet services. The research also presents an original enquiry into the effect of weather temperature on internet services as measured by engineer callouts. Analysis focuses on customer-caused spliced cable callouts, testing a hypothesis from an engineer interviewed during a company ‘ride-out’ day.

Data and Methods

To investigate geodemographic trends, a number of statistical tests were conducted. Chi-square test results indicate whether the distribution of engineer callouts by geodemographic group is random, and a binary logistic regression produces odds ratios quantifying how much more or less likely ACORN and Internet User Classification (IUC) groups are to experience engineer callouts than selected reference groups. Bivariate linear regression and Pearson’s r tests detail the strength and direction of trends between engineer callout data and explanatory variables such as ‘final mile’ distance and weather temperature.
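For a single group compared against a reference group, both statistics can be computed directly from a 2×2 table of counts; with one binary predictor, the exponentiated logistic regression coefficient equals this cross-product odds ratio. The counts below are invented for illustration and do not come from the study.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def odds_ratio(table):
    """Odds of a callout in the first row relative to the second (reference) row."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Hypothetical counts: rows are groups, columns are (callout, no callout).
table = [
    [120, 880],  # group of interest
    [80, 920],   # reference group
]
chi2 = chi_square_statistic(table)
ratio = odds_ratio(table)
print(round(chi2, 3), round(ratio, 3))
```

With one degree of freedom, a chi-square statistic above 3.841 is significant at the 5% level, and an odds ratio above 1 means the group of interest has higher odds of a callout than the reference group.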

Mapping and spatial analysis methods were carried out using ESRI ArcGIS software. Dasymetric mapping techniques visualise engineer callout data by shading building footprints, instead of administrative area polygons, to better reflect the geography of urban areas, customers and population density. Global and local spatial autocorrelation statistics analyse whether the data showed statistically significant geographic clustering of percentage change in customer-caused spliced cables between cold and hot days at the postcode sector level.
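The summary does not name the autocorrelation statistics used. A common global measure is Moran's I, so the sketch below assumes that statistic and implements its textbook formula in plain Python. It is not the ArcGIS implementation, and the four-area layout, the adjacency weights and the values are all hypothetical.

```python
# Illustrative sketch only: Moran's I is assumed here, and all data are invented.


def morans_i(values, weights):
    """Global Moran's I for a list of area values and a square weights matrix.

    weights[i][j] is the spatial weight between areas i and j (0 if unrelated).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    ss = sum(d * d for d in dev)  # sum of squared deviations
    return (n / s0) * (cross / ss)


# Four hypothetical postcode sectors in a row; each neighbours the next one.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Hypothetical percentage changes in callouts between cold and hot days.
smooth_trend = [1.0, 2.0, 3.0, 4.0]  # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]   # neighbours always differ

i_trend = morans_i(smooth_trend, adjacency)
i_alternating = morans_i(alternating, adjacency)
```

Positive values indicate that similar values cluster together in space and negative values indicate a dispersed pattern. Under no spatial autocorrelation the expected value is -1/(n - 1), which is close to zero for large n. Judging statistical significance additionally needs a permutation or normal-approximation test, which this sketch leaves out.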

Key Findings

‘Chargeable visit’ engineer callouts are coded when nobody is present at the property upon the engineer’s arrival or when the problem is caused by faulty customer equipment. These callouts are more likely amongst younger, more e-engaged neighbourhoods. ‘Clarification of service’ callouts, indicating a customer has falsely reported a fault due to a misunderstanding of their service, are more likely amongst older age groups. However, socio-economic status amongst older age groups is also influential, with higher socio-economic groups having proportionally fewer callouts.

This research identified a positive and statistically significant relationship between the average ‘final mile’ distance and engineer callouts, although the regression models indicated that it explains only roughly 4% of the variance. Mean daily weather temperature exhibited a similar influence on television-related and internet-related callouts. However, weather temperature was found to explain 22% of the variation in customer-caused spliced cables. Studying the percentage change between cold and hot days across space revealed that built-up semi-periphery areas have the greatest increases in callouts on hot days.
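To put the explained-variance figures in context: in a bivariate linear regression, the share of variance explained (R squared) equals the square of Pearson's r. The sketch below checks this on invented temperature and callout values, then converts the reported 4% and 22% into the size of the implied correlation. The data and function names are hypothetical and do not reproduce the study's models.

```python
# Illustrative sketch only: the temperature and callout values are invented.
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Share of variance in ys explained by a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical mean daily temperatures (deg C) and daily callout counts.
temperatures = [2, 5, 9, 14, 18, 23]
daily_callouts = [3, 4, 4, 6, 5, 8]

r = pearson_r(temperatures, daily_callouts)
r2 = r_squared(temperatures, daily_callouts)

# Size of the correlation implied by the explained-variance figures in the text.
implied_r_final_mile = math.sqrt(0.04)      # about 0.2
implied_r_spliced_cables = math.sqrt(0.22)  # about 0.47
```

Read this way, 4% of variance explained corresponds to a correlation of about 0.2 in magnitude, while 22% corresponds to a correlation of about 0.47.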

Figure 1. Percentage absolute change in customer-caused spliced cable callouts in Bristol (Red = increase, Green = decrease)

Value of the Research

While the population is becoming more e-engaged, this research has demonstrated that there are still variations in miscommunication and in understanding of telecommunications technology between different age groups and between different socio-economic groups. This has subsequently led to geographic trends in engineer callouts within cities. This insight can therefore support telecommunications providers and local governments by enabling them to communicate more effectively with consumers with particular geodemographic characteristics, to improve service delivery and reduce barriers to internet access.