
journal valori e valutazioni No. 24 - 2020

Agostino Valier *, Ezio Micelli **

Keywords: real estate, AVM, machine learning, artificial intelligence, neural networks

Automated models for value prediction: a critical review of the debate

Abstract

Mass appraisal techniques are used to value large groups of real estate assets. They rely on common real estate data, a single evaluation protocol and result verification tests. Given the vast amount of information to be processed, they are entrusted to automatic value prediction models. While these models were initially based on the theory of implicit marginal prices, identified through regression analysis, they can now take radically different forms thanks to the innovations brought by statistical self-learning algorithms. Automatic learning algorithms, known as machine learning models, autonomously learn the information contained in a dataset. They are able to capture the relations between the characteristics of the assets and their prices, even when these relations are far from the traditional linear form. Each model is first trained on the data of known cases and then tested on its ability to predict unknown values. The scientific literature has followed the evolution of machine learning models for value prediction, investigating them from several analytical perspectives. The most frequent research theme is the comparison, in terms of prediction accuracy, of several valuation models on the same dataset of real estate data.

This research provides a critical review of the debate across the publications in which the effectiveness of the new value prediction models has been empirically investigated. The models prove effective in their predictive capacity but less effective in their inferential capacity, i.e. in evaluating how the price depends on the causes described by the variables. The debate confirms that the new models predict more accurately than traditional regression analysis. However, the models cannot be ranked in order of accuracy, as the effectiveness of each depends on the data available to it. Against this undeniable advantage, the models are limited by their black-box nature: the valuer cannot know with certainty what values and forms the variables assume in the learning process. This makes the models ineffective for understanding how value is formed and varies in relation to the characteristics of the asset and to external agents.

1. INTRODUCTION

Real estate valuation services are undergoing a radical change. Current innovation affects the nature of valuations, operational procedures and the skills required of the professional sector (Rics, 2017). Valuation services are no longer necessarily identified with the natural person of the valuer: more and more often the valuation can be entrusted to an automatic calculation model.

The need for automated value prediction models becomes more pressing when the number of assets to be estimated is large. Large-scale real estate appraisal, commonly referred to as mass appraisal, is mainly used to identify values for tax purposes, to assess real estate portfolios or to update pre-existing values. The IAAO (International Association of Assessing Officers) defines mass appraisal as the process of valuing large groups of properties using common real estate data, a single evaluation protocol and result verification tests (Gloudemans, 1999; IAAO, 2013).

While the valuation of individual assets is often entrusted to an assessor, mass appraisal procedures always use automatic value prediction models, because it would be difficult to rely solely on the assessor’s judgment. Because that judgment is intrinsically subjective, the expert would find it difficult to maintain a single valuation approach and to ensure that all the cases analysed are treated in the same way. Moreover, the manual estimation of large quantities of real estate would be unsustainable in terms of the cost and time of the evaluation process (Bartke and Schwarze, 2015).

Automatic calculation models initially used traditional forms of inferential statistics, but recent developments in artificial intelligence have opened up new paradigms on which to base value prediction. Computational machines, through self-learning techniques, are increasingly able to replace activities traditionally performed by human intelligence, including valuation.

The interest in new and more effective automatic valuation models does not come only from the scientific world. The real estate sector increasingly demands reliable and objectively valid estimates. During the last financial crisis, credit institutions, companies and private individuals struggled to liquidate real estate assets without significant devaluations, which were also due to earlier overvaluation of the assets. A more effective valuation system would perhaps have avoided such losses in value (Canesi et al., 2016).

Frey and Osborne (2017), in an extensive survey that assigns to each profession its degree of possible computerisation (i.e. the possibility that work currently carried out by humans can be entirely replaced by the work of a machine), estimate the susceptibility to computerisation of the property valuer’s profession at 90%. The whole field of valuation therefore wonders what the future of estimates will be, what impact automatic value prediction models will have on professional practice, and what their potential and limits of use will be (Cook, 2015).

2. AUTOMATED MODELS FOR VALUE PREDICTION

Automated valuation models (henceforth Avm) arise from the need to estimate large quantities of real estate quickly and with uniform procedures. They are also referred to as computer-assisted mass appraisal.

The first proposals for automatic valuation models appeared in the 1970s in land valuation models (Holland, 1970). Initially, these models were proposed for estimating the taxable value of land and real estate, and the first experiments in the United States and Denmark were conceived with this aim. The first patents for automatic estimation models were filed in the 1990s (Jost et al., 1994; Hough, 1995). At the same time, academic research also began to deal with the phenomenon, first in the United States and then gradually expanding until it became a subject of global interest (Robson and Downie, 2008; Bidanset, 2014). However, for many years research suffered from the lack of a single definition of Avm. Only in 2003 did the IAAO fill the gap by drafting the “Standard on Automated Valuation Models”, which defines an Avm as: A mathematically based computer software program that market analysts use to produce an estimate of market value based on market analysis of location, market conditions, and real estate characteristics from information that was previously and separately collected (Moore, 2009; IAAO, 2018).

Almost all automated valuation models predict the market values of residential units. There may be several reasons for choosing this market segment: the residential sector has the highest number of transactions (and therefore of data), the assets have almost homogeneous characteristics, and purchases and sales are distributed more uniformly over the territory than in other sectors. As for the valuation approach, almost all Avm are based on the market comparison approach, which is the most effective when data on a large number of transactions are available, as in mass appraisal (French and Gabrielli, 2018). However, the literature also includes Avm with different purposes and approaches, for example the income approach for the valuation of hotels (O’Neill, 2004), or the comparison of monthly rents for estimating the value of commercial premises (Núñez-Tabales et al., 2016).

Automatic value prediction models therefore adopt the procedure of estimation by comparison. The parameters on which the comparison is based are not limited to sale prices (or asking prices) but include several characteristics, using datasets that are as broad and detailed as possible. They are therefore identified as multiparameter comparison procedures and rest on the fact that the value of an asset varies according to its real estate characteristics. These characteristics, which describe the asset itself and its context, must all be observable and quantifiable.

The comparison between assets on the basis of their descriptive parameters is a well-codified procedure in the valuation manuals. Among the best known in Italy is the procedure for typical values (Forte and De Rossi, 1974), or procedure for merit points (Realfonzo, 1994). In this procedure the asset being valued is compared with the asset that has recorded the highest unit price on the same market, assuming that the characteristics of the latter enjoy the optimal degree of appreciation by the market. The comparison is then broken down into the sum of the ratings assigned to each individual characteristic. Simonotti (2006) deserves credit for having introduced the Sales Comparison Approach in the national context; it subjects the values of the comparable assets to systematic adjustments, determined by comparing the individual parameters. However, these procedures can only be used when the number of assets and parameters for comparison is limited. For large datasets, such as those used in mass appraisal, statistical prediction models must be used.

The objective of a mass appraisal model is therefore to identify, on the basis of the comparison between various assets, the form of f, the function that links the price (Y) of an asset to its n characteristics (x1, x2, ..., xn). It is precisely on the identification of the form of f that the two types of models diverge, corresponding to two different conceptual and procedural approaches: on the one hand the traditional econometric models, on the other the self-learning or machine learning models. The former fix the form of the function a priori on the basis of economic theory and use the available data to quantify its coefficients. The latter allow the data itself to give shape to the model, identifying the function that best fits the information contained in the data.

Kauko and D’Amato (2008) use an effective terminology to name the two classes just described. On the one hand, the “orthodox” models, which use a hedonic approach, quantify the relationship between the price of the property and its characteristics. On the other hand, the “heretical” models, which adopt a statistical approach instead, read the patterns emerging from the distribution of the data. In this article we will use the subdivision between traditional models and machine learning models.

All models are evaluated according to two capacities: inferential capacity and predictive capacity. The first is the model’s ability to identify cause-effect relationships between the explained variable and the independent variables; the second is the model’s ability to produce outputs that correspond to the real data.

Traditionally, mass appraisal techniques employ multiple regression analysis. Such analysis expresses in numerical coefficients the degree of influence that each parameter exerts on the variation of the final value. It finds theoretical and methodological justification in the theory of hedonic prices (or of implicit marginal prices), according to which a demand function can be determined for each real estate characteristic of the asset, going so far as to quantify, through multivariate analysis techniques, prices that are not directly observable for each characteristic, that is, the implicit prices (Rosen, 1974; Ghosalkar, 2018).

Despite the advantage of being readily intelligible, this approach has some critical elements. From an operational point of view, the need to avoid multicollinearity limits the number of variables, making regression unsuitable for the vast volumes of information of Big Data. From an economic point of view, regression is more powerful in its inferential capacity, i.e. the determination of the causal effect of individual regressors, than in its predictive capacity to determine the final value (Schulz et al., 2014; Athey, 2019; Pérez-Rave et al., 2019).

The emergence and development of machine learning models is intrinsically linked to the development of computational techniques. However, the difference between these algorithms and the traditional models certainly does not lie in the use of the computer, which is also used in regression analysis. In the traditional models the human operator gives the machine the algorithm with which it has to work, then enters the data and obtains the results that the machine has computed following the supplied model. In the self-learning models, the human operator provides the machine with a set of input data and a corresponding set of output data. The machine then searches for the function that best fits the data, both autonomously, through a statistical self-learning process, and with the intervention of the operator, who selects and optimizes the parameters of the model. Machine learning models do not directly provide numerical price values as results, but rather produce algorithms with which new data can be processed to obtain the value of the price variable (Varian, 2014; Attigeri et al., 2015; James et al., 2013).
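The traditional approach described above, in which the form of f is fixed a priori as linear and only the coefficients are estimated from the data, can be illustrated with a minimal sketch. It is not taken from any of the reviewed papers: the function names (`solve`, `fit_hedonic`), the two characteristics (floor area and number of bathrooms) and the sample figures are all hypothetical, and the coefficients are obtained from the normal equations of ordinary least squares.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_hedonic(features, prices):
    """Least-squares coefficients (intercept first) from the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, prices)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical sample: (floor area in m2, number of bathrooms) -> price.
# Prices are generated from known implicit prices so the fit can be checked.
features = [(50, 1), (70, 1), (80, 2), (100, 2), (120, 3)]
prices = [20000 + 2000 * area + 15000 * baths for area, baths in features]

intercept, price_per_m2, price_per_bath = fit_hedonic(features, prices)
print(round(price_per_m2), round(price_per_bath))
```

Each estimated coefficient plays the role of an implicit marginal price, which is what makes this family of models easy to read in inferential terms.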

3. METHODOLOGY AND DATA

The interests of the professional world and those of scientific research therefore converge on the theme of the Avm. While the former is more oriented towards the operational evidence that these models may offer in valuation practice, the academic world investigates the phenomenon more broadly, including its theoretical aspects.

This research aims to describe the most significant contributions and the main positions that have emerged in the scientific debate about the use of statistical self-learning algorithms in real estate valuation. The review aims to highlight the potential and the limits of their use in estimation activity.

The review is organized by type of algorithm, and a brief description is provided for each type. Within each class of algorithms, the debate is organized both chronologically and by grouping together contributions that came to similar conclusions. The authors’ study of the different algorithms was based on a popular science book (Domingos, 2015), a volume on statistical methodology (Hastie et al., 2013) and an operating manual on the use of software for writing algorithms (Zinoviev, 2016).

There are many machine learning algorithms, as well as numerous proposals for classifying them. Some subdivisions are based on the statistical forms of the models (Dey, 2016), others on their operational purposes (Glumac and Des Roisiers, 2018). Only the supervised learning algorithms used to solve regression problems are dealt with here, as they are the only ones that can be used to predict the value of a continuous variable such as the price of a property. Therefore the algorithms usually known for classification problems (for example, the k-nearest neighbor), when mentioned in this article, are to be understood only in their regression form.

The next step was to identify the body of articles through which to investigate the evolution of the debate. It was divided into two phases: the identification of an initial dataset of articles providing the state of the art on the use of machine learning models in real estate valuation, and a subsequent refinement of the dataset to a smaller number of articles on which to base the critical review.

The first phase consisted of a search of the online database Scopus, chosen because it is one of the most complete and reliable in the field of valuation disciplines. A preliminary study of the literature on self-learning algorithms allowed the authors to identify the words that recur most frequently in abstracts and article titles, and then to use these terms in the Scopus search. A total of 36 search strings were used, resulting from the combination of one term delimiting the operational scope of real estate, 4 terms indicating valuation practice and 9 terms describing the self-learning models. These words were combined through the appropriate Boolean connectors, as shown in the table (Tab. 1).

The Scopus search, updated to July 2019, returned a total of 381 articles. Eliminating the numerous duplicates reduced the sample to 165 articles. Reading the 165 papers in full made it possible to identify the lines of research within the theme and to select the articles on which to focus in order to answer the research question.

Scientific production deals with self-learning models in mass appraisal from three analytical perspectives: theoretical, methodological and empirical. From a theoretical point of view, the economic theories on which the automatic value estimation procedures are based are investigated (Mooya, 2009, 2017). Methodological research proposes new valuation models or a classification of existing ones (Kauko and D’Amato, 2008; D’Amato and Kauko, 2017; Glumac and Des Roisiers, 2018). Finally, empirical research tests valuation models on datasets of real estate data (sale or asking prices), quantifying the forecasting capacity of the models in numerical terms (Shinde, 2018). Accuracy measurement is never an end in itself, but the starting point from which to reflect on estimation models and their possibilities of use.

It was decided to base the critical review of the debate on this last type of research, limiting the field of investigation to 66 articles, each applying one or more valuation models tested on the same dataset of real estate data. The choice is justified by the belief that accuracy tests are the meeting point between the knowledge of scientific research and the operational needs of real estate valuation. The dataset of 66 articles was subsequently expanded through bibliographic snowballing, i.e. the inclusion of new contributions found in the reference lists of other articles. Many of the articles read referred to papers not present among the texts initially identified. The most significant of these references were added, enriching the sample by 13 units, for a total of 79 articles.

Scope term:
“real estate”

AND valuation term (one of):
A. “valuation*”
B. “appraisal*”
C. “automated valuation model*” OR “avm”
D. “price* forecast*”

AND model term (one of):
1. “machine learning”
2. “artificial intelligence” OR “ai”
3. “decision tree*” OR “regression tree*”
4. “neural network*” OR “ann”
5. “backpropagation”
6. “genetic algorithm*”
7. “bayes*”
8. “nearest neighbour*” OR “knn”
9. “vector machine” OR “svm”

Table 1 - Search strings used in the Scopus database

4. THE CRITICAL REVIEW OF THE DEBATE

Before proceeding to the critical review of the debate, it is necessary to define what is meant by the measurement of accuracy, by far the most common method of investigation in the literature on Avm using machine learning models. The measurement of predictive ability always follows the same protocol. The dataset available to the authors is divided into two parts: the training set and the testing set. The first includes 70-80% of the total data and is used for the training phase of the model, in which the computer works with the input data (the real estate characteristics) and the output data (the final prices), identifying the function that best explains the dependent variable, value. Algorithms trained excessively on the training set risk overfitting: although they learn the internal relations of the sample very well, they do not develop the capacity to adapt to data outside the sample. The remaining part of the dataset (the testing set) is used to test the model obtained. The input data (X) of the testing set are processed by the algorithm fitted on the training set, and the output values provided by the model (Ŷ) are then compared with the output values of the testing set (Y). The smaller the difference between Ŷ and Y, the more effective the model can be declared in its ability to predict value.

There are numerous statistical indicators that measure the gap between the output values obtained and the target values. The choice of indicator is not marginal: in many studies (for example, Lasota et al., 2009) the ranking of the models by forecasting capacity varies according to the indicator considered. Finally, it is useful to remember that the evaluation protocol described above is not exclusive to self-learning models but applies to any type of forecasting model, and is therefore also used to evaluate multiple regression models.
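The protocol described above, and the remark that the ranking of models can depend on the indicator chosen, can be sketched as follows. This is an illustration only: the helper `train_test_split`, the three indicators shown (mean absolute error, root mean squared error, mean absolute percentage error) and the two sets of predictions are hypothetical choices, not the indicators or data used by the cited studies.

```python
import math
import random


def train_test_split(records, train_share=0.8, seed=0):
    """Shuffle the records and cut them into a training set and a testing set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error."""
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical testing-set prices (Y) and the predictions of two hypothetical models.
actual = [100, 200, 300, 400]
model_a = [110, 210, 310, 410]  # a small error on every asset
model_b = [100, 200, 300, 430]  # exact except for one large miss

for name, score in (("MAE", mae), ("RMSE", rmse), ("MAPE", mape)):
    print(name, round(score(actual, model_a), 2), round(score(actual, model_b), 2))
```

On this toy data, model B has the lower mean absolute error while model A has the lower root mean squared error, so the order of the two models reverses with the indicator.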

4.1 Regression trees and random forest

One of the biggest limitations of machine learning models lies in their black-box character, i.e. the forms and values of the internal steps of the model between the input phase and the output phase are unknown. This is not true of regression trees, for which it is possible to know the value assumed by the data at each step of the self-learning path. For this reason they are also called white-box models.

Like decision trees, regression trees organize the cognitive path into several branches made up of successive nodes ending in final leaves. Starting from the root node (the input data), the algorithm divides the dataset into two internal nodes, which in turn are divided into two subnodes and so on, following a process of successive binary partitions until it reaches the final nodes (called leaves), in which the data are much more homogeneous than in the starting dataset. The partition function (split) uses the explanatory variables entered in the model and divides each node at the split value that maximizes the decrease in impurity (Gini heterogeneity index) of the variable.
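A single binary partition of the kind described above can be sketched in a few lines. Note the assumption: the sketch measures impurity as the within-node sum of squared deviations of the price, a common choice when the target is continuous, rather than the Gini index named in the text. The function names and the six sample observations are hypothetical.

```python
def squared_deviation(values):
    """Sum of squared deviations from the mean: the impurity of one node."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, impurity decrease) of the best binary split on one variable."""
    parent = squared_deviation(ys)
    best = (None, 0.0)
    candidates = sorted(set(xs))
    # Try a threshold halfway between each pair of consecutive observed values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        decrease = parent - squared_deviation(left) - squared_deviation(right)
        if decrease > best[1]:
            best = (threshold, decrease)
    return best


# Hypothetical node: floor areas (m2) and prices of six properties.
areas = [40, 45, 50, 90, 95, 100]
prices = [100, 110, 105, 200, 210, 205]

threshold, decrease = best_split(areas, prices)
print(threshold, decrease)
```

A full tree would apply `best_split` recursively to each resulting subnode, over all explanatory variables, until a stopping rule is met. Because every threshold is visible, the path followed by each observation can be read directly, which is the white-box property mentioned above.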

Fan et al. (2006) use regression trees to study which characteristics contribute most to pricing in Singapore’s residential property market. However, these models risk specializing too much on the available data. When the predictive accuracy obtained on the training data far exceeds that obtained on the verification data, the model cannot be considered effective because it suffers from overfitting. To overcome this problem, training is limited or specific techniques such as pruning are used. Random forest models, ensemble learning models resulting from the aggregation of several regression trees, are more successful. Ensemble learning models combine several individual models (in this case, the regression trees) within a single metamodel (the random forest), which offers better performance than each model considered individually (Tang et al., 2018; Graczyk et al., 2010; Antipov and Pokryshevskaya, 2012).

In Mullainathan and Spiess’s (2017) research, decision trees show less predictive capacity than regression analysis, which is in turn outperformed by random forest models. In the estimate of 7,400 residential transactions in the city of Ljubljana, the coefficient of determination (R2) recorded by random forests is 34 percentage points higher than that obtained by the method of least squares (Ceh et al., 2018). Kok et al. (2017) use data from 36,000 single houses in California, Florida and Texas to test random forest models. They apply the model in three different cases: in the first two cases the model predicts the market value, in the third case it predicts the values of NOI (Net Operating Income). In the second case, moreover, the NOI values of the assets are included among the real estate data used to predict the market value. Random forest models proved more effective than the least squares method in the first and third cases. In the second case, where NOI is included in the input data, regression analysis proves to be the more effective model.

4.2 Artificial Neural Networks

Artificial neural networks (henceforth ANN) were born from the meeting of the disciplines of mathematics and neurophysiology in the 1940s (McCulloch and Pitts, 1943). They organize the cognitive path by reproducing what the human brain does when it learns new information: they arrange the information in several nodes, called neurons, distributed over several levels, called layers.

The first level contains the input data, while the final level contains the output data. Between these two levels there is a variable number of hidden layers, within which the actual learning takes place. The choice of the architecture of the model, that is, the number of hidden layers and the number of neurons contained in each, has important repercussions on the results obtained, as demonstrated by Cechin et al. (2000). Each neuron is connected to each of the neurons of the previous level and to each of the neurons of the next level; each connection is expressed by a function that regulates its weight. Each weight is directly proportional to the importance that the model assigns to the connection between the two specific neurons.

However, if the neural network were reduced only to what has just been described, an endless number of random attempts would be necessary before the model could demonstrate a sufficient degree of reliability. For this reason, neural networks are equipped with the backpropagation algorithm, which is fundamental to their training (Rumelhart et al., 1986). With backpropagation, the neural network first produces its initial results autonomously, then compares them with the known outputs of the training set and, on the basis of the difference between the two, follows a reverse path that corrects all the weights between the various neurons.
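The forward pass and the reverse, weight-correcting pass described above can be sketched with a very small network. Everything here is an illustrative assumption rather than the architecture of any reviewed study: one input, one hidden layer of three tanh neurons, a linear output, a fixed learning rate, and a hypothetical target (the square of the input) standing in for price data.

```python
import math
import random


def train_network(samples, hidden=3, epochs=2000, rate=0.1, seed=0):
    """Train a 1-input, 1-hidden-layer network by backpropagation.

    Returns the mean squared error recorded at each epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            # Forward pass: input -> hidden layer -> output.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            out = sum(w2[j] * h[j] for j in range(hidden)) + b2
            error = out - target
            total += error ** 2
            # Backward pass: push the output error back to every weight.
            for j in range(hidden):
                grad_hidden = error * w2[j] * (1 - h[j] ** 2)
                w2[j] -= rate * error * h[j]
                w1[j] -= rate * grad_hidden * x
                b1[j] -= rate * grad_hidden
            b2 -= rate * error
        losses.append(total / len(samples))
    return losses


# Hypothetical training set: a non-linear relation between one input and the output.
samples = [(i / 10, (i / 10) ** 2) for i in range(11)]
losses = train_network(samples)
print(losses[0], losses[-1])
```

The error recorded in the last epoch should be lower than in the first, which is the effect of the repeated weight corrections. The individual weights, however, carry no direct economic meaning, which anticipates the black-box criticism discussed in the literature.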

The first to quantify the capacity of artificial neural networks to supply reliable estimates was Borst (1991). Comparative research on several models applied to the same dataset was inaugurated by Do and Grudnitski (1992), who report the superior effectiveness of neural networks over multiple regression in predicting the price of 136 houses in San Diego. In many subsequent studies the performance of ANNs exceeds, sometimes significantly, that of traditional models (Amri and Tularam, 2012; Wang et al., 2014; Kutasi and Badics, 2016). The measurement of the forecasting capacity of ANNs was not limited to the estimation of values, but extended to the analysis of the real estate market (Hausler et al., 2018; Ling et al., 2014) and to the estimation of aggregated indicators (Han et al., 2018).

All the authors point out that these models have an undeniable predictive effectiveness and a critical element in their black-box character. It is not possible to observe the role that the single parameters play in the variation of value, or to express in numerical coefficients the causal relationships between prices and the characteristics of the assets (Yacim and Boshoff, 2018). Some authors, in order to study the importance of the variables, carry out a sensitivity analysis in which, for each variable, they calculate the difference between the mean squared deviation of the entire model and the mean squared deviation of the model without the analysed variable (Tajani et al., 2015).
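The sensitivity analysis described above, in which the error of the full model is compared with the error of the model deprived of one variable, can be sketched as follows. To keep the example short and self-contained, a leave-one-out 1-nearest-neighbour predictor is swapped in for the neural network; this substitution, the function names and the sample data (floor area and floor level) are all assumptions made for illustration.

```python
def loo_mse(rows, prices, columns):
    """Leave-one-out mean squared error of a 1-nearest-neighbour predictor.

    Only the variables listed in `columns` are used to measure distance.
    """
    total = 0.0
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(
            others,
            key=lambda j: sum((rows[j][c] - row[c]) ** 2 for c in columns),
        )
        total += (prices[nearest] - prices[i]) ** 2
    return total / len(rows)


def sensitivity(rows, prices):
    """Error increase caused by removing each variable from the model."""
    all_columns = list(range(len(rows[0])))
    full = loo_mse(rows, prices, all_columns)
    return {
        c: loo_mse(rows, prices, [k for k in all_columns if k != c]) - full
        for c in all_columns
    }


# Hypothetical data: (floor area in m2, floor level) -> price.
# By construction, price depends on area only.
rows = [(50, 3), (60, 1), (70, 2), (80, 3), (90, 1), (100, 2)]
prices = [100, 120, 140, 160, 180, 200]

result = sensitivity(rows, prices)
print(result)
```

Here removing the floor area increases the error sharply, while removing the floor level leaves it unchanged, so the analysis ranks the variables by importance without exposing any coefficient comparable to an implicit price.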

The common enthusiasm for the predictive capacity of neural networks came to a halt with the research of Worzala, Lenk and Silva (1995), who were the first to criticise the works of Borst (1991) and Do and Grudnitski (1992). The authors work on a set of 288 houses in Fort Collins, a larger sample than those of the two previous studies, and compare neural networks and multiple regression on three samples: the complete sample (case 1); the sample made up of cases falling within the price range analyzed by Do and Grudnitski (case 2); and finally, for comparison with Borst, who had used assets very similar to each other, a sample made up of houses belonging to the same postal code (case 3). In case 1 the accuracy values were almost identical for both models; in case 2 the performance ranking varied according to the software used; only in case 3, with a very homogeneous data sample, did the neural networks perform better than regression analysis. The authors therefore question the absolute superiority of neural networks over traditional models, tying such superiority to specific conditions of the dataset or of the software employed.

Similar conclusions have been reached by Lenk et al. (1997), McGreal et al. (1998) and McCluskey et al. (2013). Nguyen and Cripps (2001) deserve credit for having effectively studied the relationship between the performance of neural networks and the amount of data available to them. Using a dataset of 3906 observations and repeating the comparison 108 times with datasets of different sizes, they show that ANNs exceed the predictive capacity of multiple regression only when the sample is of medium-large size.

The limits of artificial neural networks are also highlighted in other types of evaluation such as, for example, prediction tests of energy market values (Gensler et al., 2016).

4.3 Genetic algorithms

Genetic algorithms are based on the same principle of adaptation described by Darwin for biological evolution, in which individuals develop by adapting to the stimuli of the surrounding environment. Genetic algorithms rework a candidate solution over several generations, adapting it to the available data through the functions of crossover and mutation. Each generation of the algorithm therefore fits the data better than the previous one. In mass appraisal these models are often combined with other algorithms: Sun (2019) uses them effectively to optimize a more traditional backpropagation algorithm, and Zhao and Jia (2011) apply them to predict the gross value added of the real estate sector. It is worth mentioning the contribution that the Italian academic world is making in testing genetic algorithms in real estate valuation. These models have proved effective in predicting the value of real estate in Naples (Del Giudice et al., 2017), Potenza (Manganelli et al., 2015) and Bari, Naples and Rome (Morano et al., 2018).
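The generational mechanism described above can be sketched with a toy genetic algorithm. The setting is hypothetical and deliberately simple: each individual is a pair of coefficients (slope, intercept) for a linear price function, fitness is the mean squared error on the data, the fitter half of each generation survives, and the population size, mutation rate and mutation scale are arbitrary illustrative values.

```python
import random


def evolve(data, generations=200, size=30, seed=0):
    """Evolve (slope, intercept) pairs; return the best pair and the error history."""
    rng = random.Random(seed)

    def error(individual):
        slope, intercept = individual
        return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)

    population = [(rng.uniform(0, 5), rng.uniform(0, 100)) for _ in range(size)]
    history = []
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        parents = population[: size // 2]  # selection: the fitter half survives
        children = []
        while len(parents) + len(children) < size:
            first, second = rng.sample(parents, 2)
            child = (first[0], second[1])  # crossover: one gene from each parent
            if rng.random() < 0.3:         # mutation: small random perturbation
                child = (child[0] + rng.gauss(0, 0.1), child[1] + rng.gauss(0, 2))
            children.append(child)
        population = parents + children
    return min(population, key=error), history


# Hypothetical data generated by price = 2 * area + 10.
data = [(x, 2 * x + 10) for x in range(10)]
best, history = evolve(data)
print(best, history[0], history[-1])
```

Because the best individuals are carried over unchanged, the best error in this sketch never increases from one generation to the next; other selection schemes would not guarantee this.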

4.4 Nearest neighbors

Analogy-based algorithms work by examining the behaviour of cases similar to the case under investigation in order to predict its behaviour. They do not therefore organize the cognitive path in successive phases, but observe the distance between the various cases, measuring it on the basis of the variables that describe them. They rest on the principle that phenomena in nature are almost never linear over the whole sample, but show linear relations in small local contexts limited to a few data points. Therefore, instead of looking for the coordinates of a line that can explain the entire phenomenon, they simply analyze the phenomenon in the vicinity of the case to be estimated.

They are defined as lazy algorithms because, unlike the other algorithms, they do not identify and model the internal relationships of the training set before knowing the query posed by the testing set. Once the terms of the submitted instance are known, they process only the data necessary to answer the question.

The best known are the k-nearest neighbors, where the investigated variable (in this case the price) is the average of the values that the variable takes in the k closest cases. Isakson (1988) uses the technique to predict the value of 143 properties in Dallas divided among apartment, industrial, office and retail: in all four types the nearest neighbors perform better than the least squares method. In the same contribution, however, the author notes that the technique is effective when the property to be valued has characteristics close to the average of the available data, while it proves less adequate when the property is a statistical outlier. In other research, the k-nearest neighbors record lower accuracy than other models (Borde et al., 2017; Oladunni and Sharma, 2016).
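The averaging rule can be illustrated with a short sketch. The comparables, the two features and the use of unscaled Euclidean distance are assumptions made for the example; a real application would normally standardize the variables first.

```python
import math

def knn_price(target, cases, k=3):
    """Predict a price as the mean price of the k cases closest to `target`."""
    by_distance = sorted(cases, key=lambda case: math.dist(case[0], target))
    nearest = by_distance[:k]
    return sum(price for _, price in nearest) / len(nearest)

# Hypothetical comparables: (floor area in m2, distance to centre in km) -> price.
CASES = [
    ((60, 2.0), 150),
    ((65, 2.5), 160),
    ((70, 2.0), 170),
    ((150, 10.0), 400),
    ((160, 12.0), 420),
]

typical = knn_price((66, 2.2), CASES, k=3)   # mean of 150, 160 and 170
outlier = knn_price((400, 30.0), CASES, k=3) # mean of 420, 400 and 170
```

For the typical property the three neighbours are genuinely similar. For the outlier the third "nearest" case is a 70 m2 apartment, so the estimate is dragged toward the sample, which is the weakness noted by Isakson.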

Although these models use complex geometries, better able to handle multidimensionality than traditional Euclidean geometry, they become highly problematic when the amount of data, and therefore the number of dimensions, increases (Cover and Hart, 1967). This has limited their use in mass appraisal models, although they prove effective in some phases of the evaluation: McCluskey and Anand (1999) use them to identify the most significant comparables within a hybrid model, then entrust the determination of the price to neural networks and genetic algorithms.

4.5 Support Vector Machines

The problems found with the nearest neighbors led to a gradual abandonment of techniques that work on hyperspaces, until the introduction of support vector machines (SVM) by Vapnik (Boser, Guyon and Vapnik, 1992). They reduce the great complexity of non-linear models by moving to vector spaces in which linear algebra models can be applied (Cristianini and Schölkopf, 2002). A problem posed in its original space often contains elements that are difficult to separate from each other in that space; the same task becomes easier in a higher-dimensional space, where a hyperplane can separate them.
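The idea of moving a problem into a space where a linear rule suffices can be shown on a toy case. The sketch below does not train a support vector machine: the sample, the mapping x -> (x, x²) and the hand-picked hyperplane are all assumptions chosen to make the geometry visible.

```python
def feature_map(x):
    """Map a one-dimensional point into two dimensions: x -> (x, x**2)."""
    return (x, x * x)

def linear_rule(point, weights, bias):
    """Classify with a hyperplane: sign of the dot product plus the bias."""
    score = sum(w * v for w, v in zip(weights, point)) + bias
    return 1 if score > 0 else -1

# Assumed toy sample: label +1 for extreme values, -1 for central ones.
SAMPLE = [(-3, 1), (-2.5, 1), (-1, -1), (0, -1), (1, -1), (2.5, 1), (3, 1)]

def separable_by_threshold(sample):
    """Check whether any single cut on the original axis separates the classes."""
    xs = sorted(x for x, _ in sample)
    cuts = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    for cut in cuts:
        for sign in (1, -1):
            if all((1 if sign * (x - cut) > 0 else -1) == y for x, y in sample):
                return True
    return False

# No single threshold works on the original axis...
in_original_space = separable_by_threshold(SAMPLE)
# ...but the hand-picked hyperplane x**2 - 4 = 0 works after the mapping.
in_mapped_space = all(
    linear_rule(feature_map(x), (0.0, 1.0), -4.0) == y for x, y in SAMPLE
)
```

An actual SVM would find the separating hyperplane itself, and would typically use a kernel function rather than computing the mapped coordinates explicitly.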

Their applications in real estate valuation models are many and generally show great effectiveness in predicting value. Kontrimas and Verikas (2011) identify Support Vector Regression (SVR) as the most effective predictor of value. Similar conclusions are reached by numerous other studies, mostly from Asia (Li et al., 2009; Lin and Chen, 2011; Zhang, 2012; Yeh, Hsieh and Wang, 2013; Mu, Wu and Zhang, 2014; Wang et al., 2014; Huang, 2019). However, support vector machines cannot be designated as the best algorithm in absolute terms: everything depends on the nature of the available data. For example, in comparisons with artificial neural networks, some authors find them more effective (Lam, Yu and Lam, 2009), while others find them less effective (Abidoye et al., 2019; Phan, 2018).

Despite the growing interest that research is showing in such algorithms, the literature has not yet devoted to support vector machines in real estate valuation a structured debate like the one that artificial neural networks have received. The reason for this gap is perhaps that the theme has been present in scientific production for too short a time.

4.5.1 Interpretation of results

The results reported in the individual studies would seem to confirm that self-learning models predict better than the traditional econometric approach. However, no research has yet empirically confirmed this hypothesis on a dataset of articles as broad and representative as possible, quantifying the results emerging from the literature.

With regard to the conditions that guide the choice of model, every evaluator preparing to define a mass appraisal model faces a crossroads: on the one hand the traditional econometric models, on the other the machine learning models. Although both share the same objective, estimating market value, their methods of use and their characteristics are very different. To borrow the expression of Mullainathan and Spiess (2017): the former estimate β, the latter estimate y.

Traditional regression models, linear or logarithmic, are inferential procedures: they explain the cause-effect relationship between the independent variables and the dependent variable. They are designed to identify relationships within a dataset; measuring their predictive capacity on an external dataset (testing set) can therefore be considered a stretch, because it assigns predictive capacity to a model whose primary purpose is not to predict unknown values. Their inferential power makes them valid tools on which to base inductive processes that describe generally valid behaviors in a given statistical population (Mangialardo et al., 2019).

The main objective of machine learning models is the prediction of the value of y; identifying the line (straight or curved) that explains the relationship between y and each variable takes second place. Inferential hypotheses cannot be derived because the numerical coefficients β are missing. Even if the variables were ranked by importance, by measuring the difference in accuracy between the full model and the model without the variable in question, important indicators such as the standard error would still be missing. The total absence of cause-effect hypotheses in the algorithm does not imply, however, that knowledge of economic theory is irrelevant to the success of the model. Economic and appraisal knowledge guides the operator in the choice of the model, in the treatment of the variables and in the setting of the parameters.
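The ranking of variables by the loss of accuracy that follows their removal can be sketched as follows. The data, the 1-nearest-neighbour model and the leave-one-out error measure are assumptions chosen to keep the example small; the variable names are hypothetical.

```python
import math

def loo_error(cases, keep):
    """Leave-one-out mean absolute error of a 1-nearest-neighbour model
    that only sees the feature positions listed in `keep`."""
    def project(features):
        return [features[j] for j in keep]

    total = 0.0
    for i, (x, y) in enumerate(cases):
        others = cases[:i] + cases[i + 1:]
        nearest = min(others, key=lambda c: math.dist(project(c[0]), project(x)))
        total += abs(nearest[1] - y)
    return total / len(cases)

# Assumed toy data: (floor area, door number) -> price; only the area matters.
CASES = [((50, 7), 100), ((60, 2), 120), ((70, 9), 140),
         ((80, 4), 160), ((90, 1), 180), ((100, 6), 200)]
NAMES = ("area", "door_number")

full = loo_error(CASES, keep=[0, 1])
# Importance of a variable = error without it minus error of the full model.
importance = {
    name: loo_error(CASES, keep=[j for j in range(len(NAMES)) if j != drop]) - full
    for drop, name in enumerate(NAMES)
}
```

The output is a ranking and nothing more: it carries no coefficient, no sign and no standard error, which is why it cannot replace the inferential reading offered by a regression.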

The main limit of machine learning is overfitting. Its considerable predictive power often risks being ineffective when faced with new data, different from the data on which it was trained. To overcome this problem, regularization and parameter tuning techniques are used. The former are data mapping techniques that reduce the phenomenon of high dimensionality, typical of Big Data. Large datasets may in fact contain insignificant information or anomalous patterns that make it difficult to identify the most significant statistical relationships. Parameter tuning, on the other hand, is a process of gradual increase in the accuracy of the model. It is called empirical tuning, not by chance, since it is based entirely on the values expressed by the data. Through successive cross-validation within the training set, it is possible to refine the parameters before they are tested on the testing set, thus reducing overfitting.
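Tuning by cross-validation within the training set can be sketched as below. The model (k-nearest neighbors), the parameter being tuned (k), the four folds and the synthetic training data are assumptions for illustration; the testing set is deliberately absent, since it should only be used after the parameter has been chosen.

```python
import math

def knn_predict(target, cases, k):
    """Mean price of the k training cases closest to `target`."""
    nearest = sorted(cases, key=lambda c: math.dist(c[0], target))[:k]
    return sum(price for _, price in nearest) / len(nearest)

def cv_error(train, k, folds=4):
    """Mean absolute error of k-NN estimated by cross-validation on `train`."""
    errors = []
    for i in range(folds):
        held_out = train[i::folds]  # every folds-th case, starting at i
        fitted_on = [c for j, c in enumerate(train) if j % folds != i]
        errors += [abs(knn_predict(x, fitted_on, k) - y) for x, y in held_out]
    return sum(errors) / len(errors)

# Assumed toy training set: price grows with floor area, plus a small disturbance.
TRAIN = [((float(a),), 2.0 * a + (5 if a % 20 == 0 else -5))
         for a in range(40, 200, 10)]

# Try several candidate values of k and keep the one with the lowest CV error.
scores = {k: cv_error(TRAIN, k) for k in (1, 3, 5, 7)}
best_k = min(scores, key=scores.get)
```

Every candidate value is judged on cases the model did not see while fitting, so a value that merely memorizes the training data is penalized before the testing set is ever touched.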

The phase of empirical tuning, made up of continuous corrections and choices about the model and its internal structure, gives the operator who defines the model a leading role. Writing and optimizing self-learning models amounts to genuine expertise, the sum of skills and experience in the field. Each model is also the product of the person who conceived and refined it, adapting it to the nature of the problem at hand. Numerous studies show that the predictive capacity of the models varies greatly with the parameters assigned to them (Baldominos et al., 2018). At the same time, each model is valid only for the data with which it was conceived: adding new data or modifying existing data will not necessarily preserve or increase its predictive capacity. It may, on the contrary, decrease it.

5. CONCLUSIONS

The review aimed to identify the evidence that emerges from the literature produced so far on automated valuation models. The readings revealed a certain predominance of machine learning models over traditional econometric models in the ability to predict real estate prices. Among the various types of models, however, it is not possible to draw up a ranking by predictive reliability: the effectiveness of each model depends directly on the nature of the data available to it.

Scientific production has not only evaluated these models in terms of performance, but has also produced evidence that has generated a debate on the areas in which their use is possible. From an operational point of view, the high performance obtained in forecasting property prices makes machine learning models attractive to all operators who evaluate, manage or trade real estate assets. Investors can use them to evaluate possible investments or transactions to which they are a party. Similarly, valuation service providers can use self-learning algorithms to offer reliable estimates to their clients. It should be borne in mind that building machine learning models will only be possible for those who hold a great wealth of information with which to train and optimize learning. Small independent evaluators are unlikely to have enough data and skills to create their own models, but they will be able to take advantage of the services sold by the largest players in the industry. Technological innovation will therefore bring radical changes to the current structure of the valuation profession (Abidoye and Chan, 2017).

Against the undeniable advantages in prediction accuracy, many authors have identified the black box character of machine learning models as their greatest limit in operational use. The new models are not suited to explaining market dynamics and, more generally, the mechanisms of value formation. They are difficult to use for public policies, where the evaluation process must guarantee fair treatment for all the cases concerned and maintain the same efficiency over time. Self-learning models cannot guarantee the same level of accuracy when new data arrive to be estimated. This could lead to complaints from individuals who feel harmed by their assessments.

However, it is worth drawing attention to a further element in favour of modern learning techniques. They can work with Big Data, not only in its vastness but also in its variety (Choi and Varian, 2012). This research dealt only with variables expressed through numerical values or categories, but machine learning models can also work with very different types of data, for example photographic images. Numerous studies investigate evaluation processes that use real estate photos (You et al., 2017; Poursaeed et al., 2018). New information sources such as images (including satellite images), the movements traced by the devices we use daily and the ratings assigned to businesses, to name just a few, may prove to be effective predictors of value. They may also partially make up for the lack of information sources that many in the real estate sector have traditionally complained of.

The use of machine learning models implies a paradigm shift in real estate valuation. Traditional models are based on economic theory, which assigns a marginal price to each parameter on which prices can be compared. The new models are based on the role that information, of whatever nature, plays in pricing. Self-learning models come from fields far from the economic sciences: they arise from the study of the creation and development of human thought (neurophysiology) or of the evolution of human behavior as a reaction to the stimuli posed by the surrounding environment (evolutionism in the anthropological sciences).

Traditionally, economic science was regarded as a natural science, with laws that explain the influence of some phenomena (the causes) on others (the effects). The innovation introduced by behavioural economics has instead considered the relationship between economic decisions and the principles of cognitive psychology. In this new approach, self-learning models, designed to replicate the role that information plays in guiding human choice, are particularly well suited.


* Agostino Valier, Department of Civil and Environmental Engineering, University of Padova

** Ezio Micelli, Department of Project Cultures, Dorsoduro 2196, Venezia


Bibliography

ABIDOYE R. B., CHAN A. P. C., Valuers’ receptiveness to the application of artificial intelligence in property valuation, Pacific Rim Property Research Journal, Vol. 23, No. 2, 2017, pp. 175-193.

ABIDOYE R. B., CHAN A. P. C., ABIDOYE F. A., OSHODI O. S., Predicting property price index using artificial intelligence techniques, International Journal of Housing Markets and Analysis, Vol. 12, No. 6, 2019, pp. 1072-1092.

AMRI S., TULARAM G. A., Performance of Multiple Linear Regression and Nonlinear Neural Networks and Fuzzy Logic Techniques in Modelling House Prices, Journal of Mathematics and Statistics, Vol. 8, No. 4, 2012, pp. 419-434.

ANTIPOV E. A., POKRYSHEVSKAYA E. B., Mass appraisal of residential apartments: an application of random forest for valuation and a CART-based approach for model diagnostics, Expert Systems with Applications, Vol. 39, No. 2, 2012, pp. 1772-1778.

ATHEY S., “The impact of Machine Learning on Economics”, in Agrawal A., Gans J., Goldfarb A. (eds.), The Economics of Artificial Intelligence. An Agenda, University of Chicago Press, Chicago, 2019, pp. 507-552.

ATTIGERI G. V., PAI M., PAI M. R., NAYAK A., Stock market prediction: A big data approach, TENCON 2015 - 2015 IEEE Region 10 Conference.

BALDOMINOS A., BLANCO I., MORENO A., ITURRARTE R., BERNARDEZ O., AFONSO C., Identifying Real Estate Opportunities Using Machine Learning, Applied Sciences, Vol. 8, No. 11, 2018, p. 2321.

BARTKE S., SCHWARZE R., The economic role of valuers in real property markets, UFZ Discussion Papers, No. 13, 2015.

BIDANSET P. E., Moving automated valuation models out of the box: the global geography of AVMs, Fair & Equitable, 2014.

BORDE S., RANE A., SHENDE G., SHETTY S., Real Estate Investment Advising Using Machine Learning, International Research Journal of Engineering and Technology, Vol. 4, No. 3, 2017, pp. 1821-1825.

BORST R. A., Artificial neural networks: the next modelling/calibration technology for the assessment community, Property Tax Journal, Vol. 10, No. 1, 1991, pp. 69-94.

BOSER B. E., GUYON I. M., VAPNIK V. N., A Training Algorithm for Optimal Margin Classifiers, COLT ’92 Proceedings of the fifth annual workshop on Computational learning theory, 1992.

CANESI R., D’ALPAOS C., MARELLA C., Forced Sale Values vs. Market Values in Italy, Journal of Real Estate Literature, Vol. 24, No. 2, 2016, pp. 377-401.

CECHIN A., SOUTO A., GONZALEZ M. A., Real estate value at Porto Alegre city using artificial neural networks, 6th Brazilian Symposium on Neural Networks (SBRN 2000), 22-25 November 2000, Rio de Janeiro, Brazil.

CEH M., KILIBARDA M., LISEC A., BAJAT B., Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments, ISPRS International Journal of Geo-Information, Vol. 7, No. 5, 2018, p. 168.

CHOI H., VARIAN H., Predicting the Present with Google Trends, Economic Record, No. 88, 2012, pp. 2-9.

COOK D., RICS futures: turning disruption from technology to opportunity, Journal of Property Investment & Finance, Vol. 33, No. 5, 2015, pp. 456-464.

COVER T. M., HART P. E., Nearest neighbor pattern classification, IEEE Transactions on Information Theory, Vol. 13, No. 1, 1967, pp. 21-27.

CRISTIANINI N., SCHÖLKOPF B., Support Vector Machines and Kernel Methods: The New Generation of Learning Machines, AI Magazine, Vol. 23, No. 3, 2002, pp. 31-42.

D’AMATO M., KAUKO T. (eds.), Advances in Automated Valuation Modeling, Springer International Publishing, 2017.

DEL GIUDICE V., DE PAOLA P., FORTE F., Using genetic algorithms for real estate appraisals, Buildings, Vol. 7, No. 4, 2017, p. 31.

DEY A., Machine Learning Algorithms: a Review, International Journal of Computer Science and Information Technologies, Vol. 7, No. 3, 2016, pp. 1174-1179.

DO A. Q., GRUDNITSKI G., A neural network approach to residential property appraisal, The Real Estate Appraiser, Vol. 58, No. 3, 1992, pp. 38-45.

DOMINGOS P., The master algorithm: how the quest for the ultimate learning machine will remake our world, Allen Lane, 2015.

FAN G. Z., ONG S. E., KOH H. C., Determinants of house price: A decision tree approach, Urban Studies, Vol. 43, No. 12, 2006, pp. 2301-2316.

FORTE C., DE ROSSI B., Principi di economia ed estimo, Etas, 1974.

FRENCH N., GABRIELLI L., Pricing to market. Property valuation revisited: the hierarchy of valuation approaches, methods and models, Journal of Property Investment & Finance, Vol. 36, No. 4, 2018, pp. 391-396.

FREY C. B., OSBORNE M. A., The future of employment: how susceptible are jobs to computerisation?, Technological Forecasting and Social Change, No. 114, 2017, pp. 254-280.

GENSLER A., HENZE J., SICK B., RAABE N., Deep Learning for solar power forecasting - An approach using AutoEncoder and LSTM Neural Networks, International Conference on Systems, Man, and Cybernetics (SMC), 2016.

GHOSALKAR N. N., Real Estate Value Prediction Using Linear Regression, 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA).

GLOUDEMANS R. J., Mass appraisal of real property, International Association of Assessing Officers, 1999.

GLUMAC B., DES ROSIERS F., Real estate and land property automated valuation systems: a taxonomy and conceptual model, LISER Working papers, No. 9, 2018.


GRACZYK M., LASOTA T., TRAWIŃSKI B., TRAWIŃSKI K., “Comparison of Bagging, Boosting and Stacking Ensembles Applied to Real Estate Appraisal”, in Nguyen N. T., Thanh M., Świątek L. (eds.), Intelligent Information and Database Systems, Springer, 2010, pp. 340-350.

HAN S., KO Y., KIM J., HONG T., Housing Market Trend Forecasts through Statistical Comparisons based on Big Data Analytic Methods, Journal of Management in Engineering, Vol. 34, No. 2, 2018.

HASTIE T., TIBSHIRANI R., FRIEDMAN J., The elements of statistical learning: data mining, inference, and prediction, Springer Verlag, 2013.

HAUSLER J., RUSCHEINSKY J., LANG M., News-based sentiment analysis in real estate: a machine learning approach, Journal of Property Research, Vol. 35, No. 4, 2018, pp. 344-371.

HOLLAND D. M., The assessment of land value, University of Wisconsin Press, 1970.

HOUGH J., System and method for computing a comparative value of real estate, patent No. 5.414.621, United States Patent, 1995.

HUANG Y., Predicting Home Value in California, United States via Machine Learning Modeling, Statistics, Optimization and Information Computing, Vol. 7, No. 1, 2019.

IAAO, Standard on Mass Appraisal of Real Property, International Association of Assessing Officers, 2013.

IAAO, Standard on Automated Valuation Models (AVMs), International Association of Assessing Officers, 2018.

ISAKSON H. R., Valuation analysis of commercial real estate using the nearest neighbors appraisal technique, Growth and Change, Vol. 19, No. 2, 1988, pp. 11-24.

JAMES G., WITTEN D., HASTIE T., TIBSHIRANI R., An Introduction to Statistical Learning: With Applications in R, Springer Verlag, 2013.

JOST A., NELSON J., GOPINATHAN K., SMITH C., Real estate appraisal using predictive modeling, patent No. 5.361.201, United States Patent, 1994.

KAUKO T., D’AMATO M. (eds.), Mass Appraisal Methods: An International Perspective for Property Valuers, Wiley-Blackwell, 2008.

KOK N., KOPONEN E. L., MARTÍNEZ-BARBOSA C. A., Big Data in Real Estate? From Manual Appraisal to Automated Valuation, The Journal of Portfolio Management, Vol. 43, No. 6, 2017, pp. 202-211.

KUTASI D., BADICS M. C., Valuation methods for the housing market: evidence from Budapest, Acta Oeconomica, Vol. 66, No. 3, 2016, pp. 527-546.

KONTRIMAS V., VERIKAS A., The mass appraisal of the real estate by computational intelligence, Applied Soft Computing, Vol. 11, No. 1, 2011, pp. 443-448.

LAM K. C., YU C. Y., LAM C. K., Support vector machine and entropy based decision support system for property valuation, Journal of Property Research, Vol. 26, No. 3, 2009, pp. 213-233.

LASOTA T., MAKOS M., TRAWINSKI B., “Comparative analysis of Neural Network models for premises valuation using SAS enterprise miner”, in Nguyen N. T., Katarzyniak R. P., Janiak A. (eds.), New Challenges in Computational Collective Intelligence, Springer Berlin Heidelberg, 2009, pp. 337-348.

LENK M. M., WORZALA E. M., SILVA A., High‐tech valuation: should artificial neural networks bypass the human valuer?, Journal of Property Valuation and Investment, Vol. 15, No. 1, 1997, pp. 8-26.

LI D. Y., XU W., ZHAO H., CHEN R. Q., A SVR based forecasting approach for real estate price prediction, Eighth International Conference on Machine Learning and Cybernetics, 2009.

LIN H. Y. U., CHEN K., Predicting Price of Taiwan Real Estates By Neural Networks and Support Vector Regression, 15th WSEAS international conference on Systems, 2011.

LING D. C., NARANJO A., SCHEICK B., Investor Sentiment, Limits to Arbitrage and Private Market Returns, Real Estate Economics, Vol. 42, No. 3, 2014, pp. 531-577.

MANGANELLI B., DE MARE G., NESTICÒ A., “Using genetic algorithms in the housing market analysis”, in Gervasi O., Murgante B., Misra S., Gavrilova M.L., Rocha A.M.A.C., Torre C.M., Taniar D., Apduhan B.O. (eds.), Computational Science and Its Applications - ICCSA 2015, Springer International Publishing, 2015, pp. 36-45.

MANGIALARDO A., MICELLI E., SACCANI F., Does Sustainability Affect Real Estate Market Values? Empirical Evidence from the Office Buildings Market in Milan (Italy), Sustainability, Vol. 11, No. 1, 2019.

MCCLUSKEY W. J., MCCORD M., DAVIS P. T., HARAN M., MCILHATTON D., Prediction accuracy in mass appraisal: A comparison of modern approaches, Journal of Property Research, Vol. 30, No. 4, 2013, pp. 239-265.

MCCLUSKEY W. J., ANAND S., The application of intelligent hybrid techniques for the mass appraisal of residential properties, Journal of Property Investment & Finance, Vol. 17, No. 3, 1999, pp. 218-239.

MCCULLOCH W. S., PITTS W. H., A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, 1943, pp. 115-133.

MCGREAL S., ADAIR A., MCBURNEY D., PATTERSON D., Neural networks: the prediction of residential values, Journal of Property Valuation and Investment, Vol. 16, No. 1, 1998, pp. 57-70.

MOORE W. J., A History of Appraisal Theory and Practice Looking Back from IAAO’s 75th Year, Journal of Property Tax Assessment and Administration, Vol. 6, No. 3, 2009, pp. 23-49.

MOOYA M. M., Market Value without a Market: Perspectives from Transaction Cost Theory, Urban Studies, Vol. 46, No. 3, 2009, pp. 687-701.

MOOYA M. M., “Automated Valuation Models and Economic Theory”, in D’Amato M., Kauko T. (eds.), Advances in Automated Valuation Modeling, Springer International Publishing, 2017, pp. 33-57.



MORANO P., TAJANI F., LOCURCIO M., Multicriteria analysis and genetic algorithms for mass appraisals in the Italian property market, International Journal of Housing Markets and Analysis, Vol. 11, No. 2, 2018, pp. 229-262.

MU J., WU F., ZHANG A., Housing Value Forecasting Based on Machine Learning Methods, Abstract and Applied Analysis, No. 4, 2014.

MULLAINATHAN S., SPIESS J., Machine learning: an applied econometric approach, Journal of Economic Perspectives, Vol. 31, No. 2, 2017, pp. 87-106.

NGUYEN N., CRIPPS A., Predicting Housing Value: A Comparison of Multiple Regression Analysis and Artificial Neural Networks, The Journal of Real Estate Research, Vol. 22, No. 3, 2001, pp. 313-336.

NÚÑEZ-TABALES J. M., REY-CARMONA F. J., CARIDAD Y OCERIN J. M., Commercial properties prices appraisal: alternative approach based on neural networks, Journal of Artificial Intelligence, Vol. 14, No. 1, 2016, pp. 53-70.

O’NEILL J. W., An Automated Valuation Model for Hotels, Cornell Hotel and Restaurant Administration Quarterly, Vol. 45, No. 3, 2004, pp. 260-268.

OLADUNNI T., SHARMA S., Hedonic housing theory - A machine learning investigation, 15th IEEE International Conference on Machine Learning and Applications, ICMLA 2016.

PÉREZ-RAVE J. I., CORREA-MORALES J. C., GONZÁLEZ-ECHAVARRÌA F., A machine learning approach to big data regression analysis of real estate prices for inferential and predictive purposes, Journal of Property Research, Vol. 36, No. 1, 2019, pp. 59-96.

PHAN T. D., Housing Price Prediction using Machine Learning Algorithms: The Case of Melbourne City, Australia, International Conference on Machine Learning and Data Engineering (iCMLDE), 2018.

POURSAEED O., MATERA T., BELONGIE S., Vision-based real estate price estimation, Machine Vision and Applications, Vol. 29, No. 4, 2018, pp. 667-676.

REALFONZO A., Teoria e metodo dell’Estimo Urbano, La Nuova Italia Scientifica, 1994.

RICS, The Future of Valuations, 2017 (available at: https://www.rics.org/globalassets/rics-website/media/knowledge/research/insights/future-of-valuations-insights-paper-rics.pdf, accessed 12 November 2019).

ROBSON G., DOWNIE M. L., Automated Valuation Models: an international perspective, RICS Automated Valuation Models Conference: AVMs Today and Tomorrow, 2008.

ROSEN S., Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition, The Journal of Political Economy, Vol. 82, No. 1, 1974, pp. 34-55.

RUMELHART D. E., HINTON G. E., WILLIAMS R. J., Learning representations by back-propagating errors, Nature, Vol. 323, No. 6088, 1986, pp. 533-536.

SCHULZ R., WERSING M., WERWATZ A., Automated valuation modelling: a specification exercise, Journal of Property Research, Vol. 31, No. 2, 2014, pp. 131-153.

SHINDE N., GAWANDE K., Survey on predicting property price, International Conference on Automation and Computational Engineering (ICACE), 2018.

SIMONOTTI M., Metodi di stima immobiliare. Applicazione degli standard internazionali, Flaccovio editore, 2006.

SUN Y., Real Estate Evaluation Model Based on Genetic Algorithm Optimized Neural Network, Data Science Journal, Vol. 18, No. 36, 2019, pp. 1-9.

TAJANI F., MORANO P., LOCURCIO M., D’ADDABBO, “Property valuations in times of crisis. Artificial Neural Networks and evolutionary algorithms in comparison”, in Gervasi O., Murgante B., Misra S., Gavrilova M.L., Rocha A.M.A.C., Torre C.M., Taniar D., Apduhan B.O. (eds.), Computational Science and Its Applications - ICCSA 2015, Springer International Publishing, 2015, pp. 194-209.

TANG Y., QIU S., GUI P., Predicting Housing Price Based on Ensemble Learning Algorithm, International Conference on Artificial Intelligence and Data Processing (IDAP), 2018.

VARIAN H. R., Big Data: New Tricks for Econometrics, Journal of Economic Perspectives, Vol. 28, No. 2, 2014, pp. 3-28.

WANG X., WEN J., ZHANG Y., WANG Y., Real estate price forecasting based on SVM optimized by PSO, Optik, Vol. 125, No. 3, 2014, pp. 1439-1443.

WORZALA E., LENK M., SILVA A., An Exploration of Neural Networks and Its Application to Real Estate Valuation, The Journal of Real Estate Research, Vol. 10, No. 2, 1995, pp. 185-201.

YACIM J. A., BOSHOFF D. G. B., Impact of Artificial Neural Networks training algorithms on accurate prediction of property values, Journal of Real Estate Research, Vol. 40, No. 3, 2018, pp. 375-418.

YEH H. C., HSIEH T. K., WANG T. S., Study on the Relationship between City and District Average Price by GAOT in Taipei, Applied Mechanics and Materials, Vol. 370, 2013, pp. 2043-2049.

YOU Q., PANG R., CAO L., LUO J., Image Based Appraisal of Real Estate Properties, IEEE Transactions on Multimedia, Vol. 19, No. 12, 2017, pp. 2751-2759.

ZHANG S. H., Application of Support Vector Machine in determination of real estate price, Advanced Materials Research, No. 461, 2012, pp. 818-821.

ZHAO Y., JIA S., The regression analysis and prediction of real estate added value based on genetic algorithm, International Conference on Management Science and Industrial Engineering (MSIE), 2011.

ZINOVIEV D., Data Science essentials in Python: collect, organize, explore, predict, value, Pragmatic Bookshelf, 2016.



Agostino Valier*, Ezio Micelli** keywords: real estate, AVM, machine learning, artificial intelligence, neural networks

Automated models for value prediction: a critical review of the debate

Mass appraisal techniques are used to value large groups of real estate assets. They rely on common real estate data, a single valuation protocol and tests that verify the results. Given the vast amount of information they must process, they are entrusted to automated value prediction models. While these models were initially based on the theory of implicit marginal prices, identified through regression analysis, they can now take radically different forms thanks to the innovations brought by statistical self-learning algorithms. Automatic learning algorithms, known as machine learning models, autonomously learn the information contained in a dataset. They can capture the relations between the characteristics of the assets and their prices, even when these take forms far removed from the more traditional correlation expressible through numerical coefficients. Each model is first trained on the data of known cases and then tested on its ability to predict unknown values. The scientific literature has followed the evolution of machine learning models for value prediction, investigating them from several analytical perspectives. The most frequent research theme is the comparison of several valuation models on the same real estate dataset, in terms of prediction accuracy.

The research offers a critical review of the debate across all publications in which the effectiveness of the new value prediction models has been empirically investigated. The models prove effective in their predictive capacity and less effective in their inferential capacity, that is, in evaluating how the price depends on the causes explained by the variables. The debate confirms that the new models predict more accurately than traditional regression analysis. However, it is not possible to rank the models by accuracy, since the effectiveness of each model depends on the data available to it. Against this undeniable advantage, these models are limited by their black box character: the valuer cannot know with certainty what values and forms the variables take in the learning processes. This makes the models ineffective for understanding the dynamics of value formation and variation in relation to the characteristics of the asset and to external agents.

Abstract


The entire valuation sector is therefore asking what the future of appraisals will be, what impact automated value prediction models will have on professional valuation practice, and what their potential and limits of use will be (Cook, 2015).

2. AUTOMATED MODELS FOR VALUE PREDICTION

Automated valuation models, hereafter referred to as AVMs, arise from the need to value large numbers of properties quickly and with uniform procedures. They are also referred to as Computer-assisted mass appraisal. The first proposals for automated valuation models appeared in the 1970s in models for land valuation (Holland, 1970). Initially these models were proposed to estimate the taxable value of land and real estate for tax purposes, and the first experiments in the United States and Denmark were designed with this objective. The first patents filed for automatic valuation models date back to the 1990s (Jost et al., 1994; Hough, 1995). At the same time academic research also began to address the phenomenon, first in the United States and then gradually expanding until the theme became of global interest (Robson and Downie, 2008; Bidanset, 2014). For many years, however, research suffered from the lack of a single definition of AVM. Only in 2003 did the IAAO fill the gap by drafting the “Standard on Automated Valuation Models”, which defines an AVM as: A mathematically based computer software program that market analysts use to produce an estimate of market value based on market analysis of location, market conditions, and real estate characteristics from information that was previously and separately collected (Moore, 2009; IAAO, 2018).

La quasi totalità dei modelli di valutazione automatica pre-dice i valori di mercato di unità residenziali. Le ragioni chemotivano la scelta di questo segmento di mercato possonoessere diverse: il settore residenziale è quello che registrail maggior numero di transazioni (e dunque di dati), i benihanno caratteristiche pressoché omogenee, la distribuzionedelle compravendite sul territorio è più uniforme rispettoad altri settori. In merito all’approccio valutativo impiegato,la quasi totalità degli AVM si basa sul Market comparisonapproach, il più efficace quando si dispongono i dati di unampio numero di transazioni come nel caso di stime massappraisal (French e Gabrielli, 2018). Tuttavia in letteraturatrovano spazio anche AVM con diverse finalità di impiegoe con differenti approcci, per esempio l’income approachper la valutazione di alberghi (O’Neill, 2004), o la compa-razione di affitti mensili per la stima del valore di localicommerciali (Núñez-Tabales et al., 2016).

I modelli automatici di predizione del valore adottano dun-que il procedimento di stima per comparazione. I parametrisui quali si basa la comparazione non si limitano ai soliprezzi di vendita – o di offerta – ma comprendono più carat-

1. INTRODUZIONE

I servizi di valutazione del settore immobiliare stanno cono-scendo un radicale mutamento. L’innovazione in atto incidesulla natura delle valutazioni, sui procedimenti operativi,sulle competenze richieste al settore professionale (Rics,2017).

I servizi estimativi non si identificano necessariamente conla persona fisica del soggetto valutatore, sempre più spessola valutazione può essere affidata a un modello di calcoloautomatico. La necessità di ricorrere ai modelli automatiz-zati di predizione del valore acquista maggiore forza neicasi in cui il numero di beni da stimare sia ampio. La stimasu larga scala di beni immobili – comunemente definita coltermine mass appraisal – occorre perlopiù nell’individua-zione di valori a fini fiscali, nella valutazione di portafogliimmobiliari o nell’aggiornamento di valori preesistenti. LaIAAO - International Association of Assessing Officers -definisce la tecnica di mass appraisal come il processo divalutazione di grandi insieme di immobili, svolto utilizzandodati immobiliari comuni, un protocollo valutativo unico etest di verifica dei risultati (Gloudemans, 1999; IAAO, 2013).

Se le stime di singoli asset sono spesso affidate a un sog-getto valutatore, i procedimenti mass appraisal impieganosempre modelli automatici di predizione del valore poichérisulterebbero difficilmente affidabili al solo giudizio delvalutatore. Il perito – per la sua natura intrinsecamente sog-gettiva – incontrerebbe difficoltà nel mantenere un approc-cio valutativo unico e garantire un trattamento omogeneoa tutti i casi analizzati. La stima manuale di grandi quantitàdi immobili, inoltre, si rileverebbe insostenibile in terminidi costi e tempi del processo valutativo (Bartke e Schwarze,2015).

Per i modelli automatici di calcolo sono state inizialmenteimpiegate forme tradizionali di statistica inferenziale, mai recenti sviluppi conseguiti nel campo dell’intelligenzaartificiale hanno permesso nuovi paradigmi sui quali basarela predizione del valore. Le macchine computazionali, attra-verso la tecnica dell’autoapprendimento, sono sempre piùin grado di sostituire attività tradizionalmente svolte dal-l’intelligenza umana tra le quali l’attività di valutazione.

L’interesse per nuovi e più efficaci modelli automatici divalutazione non proviene solamente dal mondo scientifico.Da parte del settore immobiliare vi è una richiesta semprepiù urgente di stime certe e oggettivamente valide. L’ultimacrisi finanziaria ha visto istituti di credito, imprese e privatifaticare nel liquidare beni immobili se non a fronte di signi-ficative svalutazioni, avvenute anche a causa degli erroridi sovravalutazione compiuti precedentemente sui beni.Un più efficace sistema di valutazione avrebbe forse evitatotali perdite di valore (Canesi et al., 2016).

Frey and Osborne (2017) in un’ampia indagine che assegnaad ogni professione il grado di possibile computerizzazio-ne – ovvero la possibilità che il lavoro attualmente svoltodall’uomo possa interamente essere sostituito dal lavorodi una macchina – stimano al 90% la suscettibilità di com-puterizzazione della professione dei valutatori immobiliari.


teristiche, impiegando dataset il più possibile ampi e det-tagliati. Si identificano come procedimenti di stima percomparazione pluriparametrica e assumono a fondamentoche il valore del bene vari in funzione delle sue caratteri-stiche immobiliari. Tali caratteristiche, che descrivono ilbene stesso e il suo contesto, devono essere tutte osser-vabili e quantificabili. La comparazione tra beni sulla base dei loro parametri didescrizione è un procedimento ben disciplinato all’internodella manualistica estimativa. Tra i più noti in ambito italiano,vi è il procedimento per valori tipici (Forte e De Rossi, 1974)o procedimento per punti di merito (Realfonzo, 1994). Taleprocedimento prevede che il bene oggetto di stima vengaconfrontato col bene che sul medesimo mercato ha regi-strato il maggior prezzo unitario, ipotizzando che le carat-teristiche di quest’ultimo presentino il grado ottimale diapprezzamento da parte del mercato. Il confronto vienequindi articolato nella somma dei giudizi assegnati per ognisingola caratteristica. A Simonotti (2006) il merito di avereintrodotto nel contesto nazionale il metodo del Sales Com-parison Approach, il quale sottopone i valori dei beni diconfronto ad aggiustamenti sistematici, determinati sullabase del confronto tra i singoli parametri. Tali procedimentipossono però essere impiegati solo quando il numero dibeni e di parametri per il confronto è limitato. Per grandiinsiemi di dati, quali quelli impiegati nelle stime massappraisal, occorre ricorrere a modelli di predizione perprobabilità statistica.

The aim of a mass appraisal model is therefore to identify, by comparing many assets, the form of f, the function that links the price (Y) of the asset to its n characteristics (x1, x2, …, xn). It is precisely on how the form of f is identified that the two types of model divide, corresponding to two different conceptual and procedural approaches: traditional econometric models on one side, self-learning or machine learning models on the other. The former set the form of the function a priori, on the basis of economic theory, and use the available data to quantify the coefficients. The latter let the data themselves shape the model, identifying the function that best fits the information contained in the data.

Kauko and D'Amato (2008) use an effective terminology to name the two classes just described. On one side are the "orthodox" models, which use a hedonic approach and quantify the relationship linking the price of the property to its characteristics. On the other are the "heretic" models, which adopt a statistical approach and read the patterns emerging from the distribution of the data. In this article we use the division between traditional models and machine learning models.

All models are assessed on two capacities: inferential and predictive. The first is the model's ability to identify cause-effect relationships between the explained (dependent) variable and the independent variables. The second is the model's ability to produce outputs that match the values of the real data.

Traditionally, mass appraisal techniques use multiple regression analysis. This analysis can express through numerical coefficients the degree of influence that each parameter exerts on the variation of the final value. It finds its theoretical and methodological justification in the theory of hedonic prices (or implicit marginal prices), according to which a true demand function can be determined for every single real estate characteristic of the asset, so that multivariate analysis techniques can quantify prices that are not directly observable for each characteristic, namely the implicit prices (Rosen, 1974; Ghosalkar, 2018).

The first regression models were linear, that is, they sought the straight line that minimised the sum of the squared distances between the line and the ordinates of the observations. Later developments of the technique also allowed for non-linear relationships in the data. Polynomial regression, for example, includes polynomial functions of the regressors in the regression model. The most important innovation within hedonic models has been the introduction of models that consider the spatial component of the data. These models estimate the existence of spatial autocorrelation among the variables examined and use a distance matrix that defines the intensity of the relationships between the observed cases.
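As a minimal sketch of the least-squares idea described above, the following Python fragment fits a one-regressor hedonic line, price = a + b × area, using the closed-form solution. The data, the variable names and the helper `fit_line` are illustrative assumptions and are not taken from any of the studies cited.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, using the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical sample: floor area (m2) and sale price (thousand euro).
areas = [50, 70, 90, 110, 130]
prices = [110, 150, 190, 230, 270]  # built as 10 + 2 * area

intercept, slope = fit_line(areas, prices)
predicted = intercept + slope * 100  # estimate for a 100 m2 unit
```

In this toy sample the slope plays the role of the implicit marginal price of one additional square metre, which is the kind of coefficient the hedonic approach is designed to expose.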

Against the advantage of being readily intelligible, this approach has some critical weaknesses. Operationally, the need to avoid multicollinearity imposes a small number of variables, which makes regression poorly suited to the vast information flows of Big Data. From an economic standpoint, regression is stronger in its inferential capacity, that is, in determining the causal effect of the individual regressors, than in its predictive capacity to determine the final value (Schulz et al., 2014; Athey, 2019; Pérez-Rave et al., 2019).

The emergence and development of machine learning models, also called self-learning models, is intrinsically linked to the development of computational techniques. The difference between these algorithms and the earlier traditional models certainly does not lie in the use of the computer, which is also used in regression analysis. In traditional models the human operator gives the machine the algorithm it must work with, then feeds in the data and obtains the results the machine has computed by following the model supplied. In self-learning models the operator gives the machine a set of input data and a corresponding set of output data. The machine then searches for the function that best fits the data, both autonomously, through a process of statistical self-learning, and with the intervention of the operator, who tunes and selects the parameters of the model. Machine learning valuation models do not directly output numerical price values; instead they in turn produce algorithms with which new data can be processed to obtain the value of the price variable (Varian, 2014; Attigeri et al., 2015; James et al., 2013).

On the subject of AVMs the interests of the professional world and of scientific research converge. While the former is more concerned with how these models actually perform in valuation practice, the academic world investigates the phenomenon more broadly, including theoretical aspects.

Statistical self-learning algorithms are a recent tool and to date few practitioners use them in mass valuations. At the same time they are a very rapidly evolving topic, in which researchers move without an established theoretical and methodological framework. With this research the authors aim to fill this gap by proposing a review of the conclusions that the scientific literature has reached so far. The review describes the most significant contributions and the main positions that have emerged in the scientific debate on the use of machine learning models for real estate value prediction. Particular attention is paid to their possible uses, highlighting their potential and limits in professional valuation practice. The research may serve as a starting point for future refinement of the models or for refuting the results achieved.

3. METHODOLOGY AND DATA

The review is divided according to the different types of algorithm, for each of which a brief description is provided. Within each class of algorithms, the debate is organised both chronologically and by grouping together contributions that reach similar conclusions. The authors' study of the algorithms is based on a scientific essay (Domingos, 2015), a volume on statistical methodology (Hastie et al., 2013) and an operating manual on using software to write algorithms (Zinoviev, 2016). To these readings were added numerous other sources consulted to understand the subject better.

There are many machine learning algorithms, and just as many proposed classifications. Some are based on the statistical form of the models (Dey, 2016), others on their operational purpose (Glumac and Des Rosiers, 2018). Only supervised learning algorithms used to solve regression problems are covered here, since they are the only ones that can predict the value of a continuous variable such as the price of a property. Hence algorithms usually known for classification problems, for example k-nearest neighbor, are to be understood in this article only in their regression form.

The next step is to identify the corpus of articles on which to trace the evolution of the debate. It has two phases: identifying an initial dataset of articles that gives the state of the art on the use of machine learning models in real estate valuation, and then refining the dataset to a smaller number of articles on which to base the critical review.

The first phase is a search of the online database Scopus, chosen because it is one of the most complete and reliable in the field of valuation disciplines. A preliminary study of the literature on self-learning algorithms allowed the authors to identify the words that recur most often in abstracts and titles, and then to use those terms in the Scopus search. A total of 36 search strings were used, resulting from the combinations of one term delimiting the operational field of real estate, 4 terms indicating valuation practice and 9 terms describing self-learning models. The words were combined using the appropriate Boolean connectives, as shown in Table 1.

"real estate"
AND
    A. "valuation*"
    B. "appraisal*"
    C. "automated valuation model*" OR "avm"
    D. "price* forecast*"
AND
    1. "machine learning"
    2. "artificial intelligence" OR "ai"
    3. "decision tree*" OR "regression tree*"
    4. "neural network*" OR "ann"
    5. "backpropagation"
    6. "genetic algorithm*"
    7. "bayes*"
    8. "nearest neighbour*" OR "knn"
    9. "vector machine" OR "svm"

Table 1 - Search strings used on the Scopus database

The Scopus search, updated to July 2019, finds a total of 381 articles. Removing the many duplicates reduces the sample to 165 articles. Reading the 165 papers in full makes it possible to identify the research strands within the topic and to select the articles on which to focus in order to answer the research question posed earlier.

The perspectives from which the scientific literature treats self-learning models in mass appraisal can be reduced to three: theoretical, methodological and empirical. Theoretical work investigates which economic theories underlie automated value estimation procedures (Mooya, 2009, 2017). Methodological research proposes new valuation models or a classification of existing ones (Kauko and D'Amato, 2008; D'Amato and Kauko, 2017; Glumac and Des Rosiers, 2018). Empirical research, finally, tests valuation models on real estate datasets, of sale or asking prices, quantifying numerically the predictive capacity of the models (Shinde, 2018). Measuring accuracy is never an end in itself, but the starting point for reflecting on the valuation models and their possible uses.
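The 36 search strings of Table 1 can be enumerated mechanically as the product 1 × 4 × 9. The sketch below shows one way to do so in Python. The parentheses around the OR groups and the variable names are assumptions added for illustration, and the exact syntax the authors submitted to Scopus is not stated in the text.

```python
from itertools import product

# Term groups transcribed from Table 1; OR groups are wrapped in parentheses
# so that each combined query stays unambiguous (an illustrative assumption).
field = ['"real estate"']
valuation_terms = [
    '"valuation*"',
    '"appraisal*"',
    '("automated valuation model*" OR "avm")',
    '"price* forecast*"',
]
learning_terms = [
    '"machine learning"',
    '("artificial intelligence" OR "ai")',
    '("decision tree*" OR "regression tree*")',
    '("neural network*" OR "ann")',
    '"backpropagation"',
    '"genetic algorithm*"',
    '"bayes*"',
    '("nearest neighbour*" OR "knn")',
    '("vector machine" OR "svm")',
]

# One query per combination of field, valuation term and learning term.
queries = [" AND ".join(combo)
           for combo in product(field, valuation_terms, learning_terms)]
```

Here `len(queries)` equals 36, matching the number of strings reported in the text.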

The critical review of the debate is based on this last type of research, limiting the field of inquiry to the 66 articles that apply one or more valuation models tested on the same real estate dataset. The choice rests on the conviction that accuracy tests are the meeting point between the knowledge of scientific research and the operational needs of real estate valuation. The dataset of 66 articles was then expanded by bibliographic snowballing, that is, by including new contributions found in the reference lists of other articles. Many of the articles read refer to other papers not present among the texts initially identified. The most significant references were therefore added, enlarging the sample by 13 units, for a total of 79 articles used.

4. CRITICAL REVIEW OF THE DEBATE

Before proceeding to the critical review of the debate, it is necessary to define what is meant by measuring accuracy, by far the most widespread method of investigation in the literature on AVMs that use machine learning models. The measurement of predictive capacity always follows the same protocol. The dataset available to the authors is divided into two parts: the training set and the testing set. The first comprises 70-80% of the total data and is used for training the model, in which the computer works with the input data (the property characteristics) and the output data (the final prices) to identify the function that best explains the dependent variable, value. Algorithms trained too closely on the training set risk overfitting: while they learn the relationships within the sample very well, they have not developed the ability to adapt to data outside the sample. The remaining part of the dataset (the testing set) is used to test the resulting model. The input data (X) of the testing set are processed by the algorithm formed on the training set, and the output values supplied by the model (Ŷ) are then compared with the output values of the testing set (Y). The smaller the gap between Ŷ and Y, the more the model can be declared effective in its capacity to predict value.
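The protocol above can be sketched in a few lines of Python. The data, the 80% share, the seed and the deliberately naive `mean_price_model` are all illustrative assumptions. The only point is that the error is measured on cases the model never saw during training.

```python
import random

def train_test_split(rows, train_share=0.8, seed=0):
    """Shuffle the rows and cut them into a training and a testing part."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

def mean_price_model(train):
    """A deliberately naive model: average price per m2 in the training set."""
    unit_price = sum(price / area for area, price in train) / len(train)
    return lambda area: unit_price * area

# Hypothetical (area, price) pairs.
data = [(40 + 5 * i, 2.0 * (40 + 5 * i) + (i % 3)) for i in range(20)]
train, test = train_test_split(data)
model = mean_price_model(train)

# Compare predictions (Y hat) with observed prices (Y) on unseen data only.
abs_errors = [abs(model(area) - price) for area, price in test]
mean_abs_error = sum(abs_errors) / len(abs_errors)
```

Any of the models discussed below could be substituted for `mean_price_model` without changing the rest of the protocol.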

Numerous statistical indicators measure the gap between the outputs obtained and the target values. The choice of indicator is not a marginal one: in many studies (for example, Lasota et al., 2009) the ranking of the models by predictive capacity changes depending on the indicator considered. Finally, it is worth recalling that the evaluation protocol just described is not exclusive to self-learning models but applies to any type of prediction model, and is therefore also used to evaluate multiple regression models.
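A small numerical illustration of why the choice of indicator matters: with the hypothetical predictions below, the mean absolute error (MAE) prefers one model while the root mean square error (RMSE) prefers the other. The two indicators and the figures are assumptions chosen for the example and are not the ones used by Lasota et al.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, which penalises large misses more heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100, 100, 100, 100]
predictions = {
    "model_a": [104, 96, 104, 96],    # four moderate errors of 4
    "model_b": [100, 100, 100, 110],  # three exact values, one error of 10
}

best_by_mae = min(predictions, key=lambda m: mae(actual, predictions[m]))
best_by_rmse = min(predictions, key=lambda m: rmse(actual, predictions[m]))
```

With these numbers model_b has the lower MAE (2.5 against 4) while model_a has the lower RMSE (4 against 5), so the ranking reverses.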

4.1 Regression trees and random forest

One of the greatest limits of machine learning models lies in their black-box character: the forms and values of the internal steps between the input phase and the output phase are unknown. This is not true of regression trees, for which the value taken by the data at each step of the learning path can be known. For this reason they are also called white-box models.

Like decision trees, regression trees organise the learning path into several paths made of successive nodes that end in final leaves. Starting from the root node (the input data), the algorithm divides the dataset into two internal nodes, each of which is in turn divided into two sub-nodes, and so on through a process of successive binary partitions until it reaches the final nodes (called leaves), in which the data are much more homogeneous than in the starting dataset. The partition function (split) uses the explanatory variables fed into the model and splits each node at the split value that maximises the decrease in impurity (Gini heterogeneity index) of the variable. Fan et al. (2006) use regression trees to study which characteristics contribute most to determining prices in Singapore's residential stock.
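The search for a single split can be sketched as follows. Note the assumption: because the target here is a continuous price, the sketch measures impurity as the within-node sum of squared deviations, a common choice for regression trees, rather than the Gini index named in the text. The data and function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean, used here as node impurity."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, impurity_decrease) of the best binary split on one variable."""
    parent = sse(ys)
    best = (None, 0.0)
    levels = sorted(set(xs))
    # Candidate thresholds are the midpoints between consecutive observed values.
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        decrease = parent - sse(left) - sse(right)
        if decrease > best[1]:
            best = (threshold, decrease)
    return best

# Hypothetical sample: floor area (m2) and price (thousand euro).
areas = [45, 50, 55, 100, 110, 120]
prices = [100, 105, 110, 250, 260, 270]
threshold, decrease = best_split(areas, prices)
```

A full tree would apply `best_split` recursively to each resulting node, over all explanatory variables, until a stopping rule is met. Because every threshold can be read directly, the white-box character described above follows naturally.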

These models, however, risk specialising too much on the available data. When the predictive accuracy obtained on the training data far exceeds that obtained on the test data, the model cannot be considered effective because it suffers from overfitting. To remedy this, training periods are limited or specific techniques such as pruning are adopted. Random forest models have been more successful: they are ensemble learning models resulting from the aggregation of several regression trees. Ensemble learning models combine several single models (here, regression trees) within a single meta-model (the random forest) that performs better than each model considered individually (Tang et al., 2018; Graczyk et al., 2010; Antipov and Pokryshevskaya, 2012).
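The aggregation step can be illustrated with a simplified sketch: many one-split trees (stumps), each fitted on a bootstrap resample, whose predictions are averaged. This is bagging on a single explanatory variable. A full random forest also draws a random subset of variables at each split, which is omitted here. All names and data are hypothetical.

```python
import random

def fit_stump(rows):
    """One-split regression tree on (x, y) rows; returns a predict function."""
    rows = sorted(rows)
    best = None
    for i in range(1, len(rows)):
        if rows[i][0] == rows[i - 1][0]:
            continue  # no threshold fits between identical x values
        left = [y for _, y in rows[:i]]
        right = [y for _, y in rows[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, (rows[i - 1][0] + rows[i][0]) / 2, ml, mr)
    if best is None:  # every x in this resample is the same
        mean = sum(y for _, y in rows) / len(rows)
        return lambda x: mean
    _, threshold, ml, mr = best
    return lambda x: ml if x <= threshold else mr

def fit_forest(rows, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap resample; predictions are averaged."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]
    return lambda x: sum(tree(x) for tree in trees) / len(trees)

data = [(a, 2.0 * a) for a in range(40, 140, 10)]  # hypothetical (area, price)
forest = fit_forest(data)
estimate = forest(85)
```

Each stump alone can return only two values, whereas the averaged ensemble produces a smoother response. That is the intuition behind the better performance attributed to the meta-model.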

In the study by Mullainathan and Spiess (2017), decision trees show lower predictive capacity than regression analysis, which is in turn outperformed by random forest models. In the valuation of 7,400 residential transactions in the city of Ljubljana, the coefficient of determination (R²) recorded by random forests exceeds that obtained by ordinary least squares by 34 percentage points (Ceh et al., 2018). Kok et al. (2017) use data on 36,000 individual homes in California, Florida and Texas to test random forest models. They apply the model in three cases: in the first two the model predicts market value, in the third it predicts NOI (Net Operating Income). In the second case, moreover, the NOI values of the assets were included among the property data used to predict market value. Random forest models proved more effective than ordinary least squares in the first and third cases. Hence, when NOI is included in the input data (the second case), regression analysis still proves more effective.

4.2 Artificial neural networks

Artificial neural networks, hereafter also ANNs, were born from the meeting of mathematics and neurophysiology in the 1940s (McCulloch and Pitts, 1943). They organise the learning path by reproducing the one followed by the human brain when it learns new information, and therefore arrange information in several nodes, called neurons, distributed over several levels, called layers.

The first layer contains the input data, the final layer the output data. Between these two lies a variable number of hidden layers, within which the actual learning takes place. The choice of the model's architecture, that is, the number of hidden layers and the number of neurons in each, has important repercussions on the results, as shown by Cechin et al. (2000). Each neuron is connected to every neuron of the previous layer and to every neuron of the next layer, and each connection is expressed by a function that governs its weight. Each weight is directly proportional to the importance the model assigns to the connection between the two specific neurons.

If the neural network were limited to what has just been described, however, an endless number of random attempts would be needed before the model reached a sufficient degree of reliability. For this reason neural networks have been equipped with the backpropagation algorithm, fundamental for training a neural network (Rumelhart et al., 1986). It ensures that the network, after autonomously producing its initial results, compares them with the known outputs of the training set and, on the basis of this comparison, starts a reverse pass that corrects all the weights between the neurons.
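A minimal sketch of the forward pass and of the backward correction of the weights, for a network with one input, two hidden neurons and one output. The architecture, the starting weights, the learning rate and the scaled data are all assumptions chosen to keep the example short. Real applications rely on dedicated libraries.

```python
from math import tanh

# Hypothetical scaled data: one input feature, one target value.
samples = [(0.0, 0.20), (0.25, 0.35), (0.5, 0.50), (0.75, 0.65), (1.0, 0.80)]

# One hidden layer with two tanh neurons and one linear output neuron.
w1, b1 = [0.5, -0.3], [0.0, 0.1]   # input -> hidden weights and biases
w2, b2 = [0.4, 0.2], 0.0           # hidden -> output weights and bias

def forward(x):
    """Forward pass: return the hidden activations and the network output."""
    hidden = [tanh(w1[j] * x + b1[j]) for j in range(2)]
    return hidden, sum(w2[j] * hidden[j] for j in range(2)) + b2

def loss():
    """Mean of the halved squared errors over all samples."""
    return sum(0.5 * (forward(x)[1] - y) ** 2 for x, y in samples) / len(samples)

initial_loss = loss()
rate = 0.1
for _ in range(500):
    for x, y in samples:
        hidden, out = forward(x)
        d_out = out - y  # error at the output neuron
        for j in range(2):
            # Error sent back to hidden neuron j through its outgoing weight.
            d_hidden = d_out * w2[j] * (1 - hidden[j] ** 2)
            w2[j] -= rate * d_out * hidden[j]
            w1[j] -= rate * d_hidden * x
            b1[j] -= rate * d_hidden
        b2 -= rate * d_out
final_loss = loss()
```

Comparing `final_loss` with `initial_loss` shows the effect of the repeated weight corrections. Without the backward pass the weights would have to be found by trial and error.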

The first to quantify the capacity of artificial neural networks to provide reliable estimates was Borst (1991). Comparative studies of several models on the same dataset were instead inaugurated by Do and Grudnitski (1992), who test the superior effectiveness of neural networks over multiple regression in predicting the price of 136 homes in San Diego. In many later studies the performance of ANNs exceeds that of traditional models, sometimes by a wide margin (Amri and Tularam, 2012; Wang et al., 2014; Kutasi and Badics, 2016). Measurement of the predictive capacity of ANNs has not been limited to value estimates, but has extended to analysis of the real estate market (Hausler et al., 2018; Ling et al., 2014) and to the estimation of aggregate indicators (Han et al., 2018). All the authors point out that, against undeniable predictive effectiveness, these models have a critical weakness in their black-box character. It is not possible to observe the role that individual parameters play in the variation of value, nor to express in numerical coefficients the causal relationships between prices and the characteristics of the assets (Yacim and Boshoff, 2018). To study the importance of the variables, some authors carry out a sensitivity analysis in which, for each variable, they compute the difference between the root mean square error of the full model and that of the model without the variable analysed (Tajani et al., 2015).

The shared enthusiasm for the predictive capacity of neural networks was checked by the study of Worzala, Lenk and Silva (1995), who were the first to criticise the work of Borst (1991) and of Do and Grudnitski (1992). The authors work on a set of 288 homes in Fort Collins, a larger sample than in the two earlier studies, and repeat the neural network versus multiple regression comparison on three samples: the full sample (case 1), the sample made up of cases falling within the price range analysed by Do and Grudnitski (case 2) and finally, to compare with Borst, who had used very similar assets, a sample of homes belonging to the same postal code (case 3). Case 1 yielded almost identical accuracy for the two models; in case 2 the performance ranking changed with the software used; only in case 3, with a very homogeneous sample, did neural networks perform better than regression analysis. The authors therefore question the absolute superiority of neural networks over traditional models, tying that superiority to specific conditions of the dataset or of the software used.

Similar conclusions were reached by Lenk et al. (1997), McGreal et al. (1998) and McCluskey et al. (2013). Nguyen and Cripps (2001) deserve credit for effectively studying the relationship between neural networks and the amount of data available to them. Using a dataset of 3,906 observations and repeating the comparison 108 times with datasets of different sizes, they show that ANNs exceed the predictive capacity of multiple regression only when the sample is medium to large.

The limits of artificial neural networks are also highlighted in other kinds of valuation, for example tests predicting energy market values (Gensler et al., 2016).

4.3 Evolutionary algorithms

Genetic algorithms are based on the same adaptation function defined by Darwin for biological evolution, in which individuals develop by adapting to the stimuli of the surrounding environment. Genetic algorithms iterate the algorithm many times, adapting it to the available data through the crossover and mutation functions. Each generation of the algorithm therefore fits the data better than the previous one. In mass appraisal these models are often applied in combination with other algorithms: Sun (2019) uses them effectively to optimise a more traditional backpropagation algorithm. Zhao and Jia (2011) apply them to predict the gross value added of the real estate sector. The contribution that Italian academia is making in testing genetic algorithms in real estate valuation deserves mention. These models have proved effective in predicting property values in Naples (Del Giudice et al., 2017), Potenza (Manganelli et al., 2015) and Bari, Naples and Rome (Morano et al., 2018).
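The roles of selection, crossover and mutation can be sketched with a toy genetic algorithm that searches for the two coefficients of a price line. The population size, mutation scale, number of generations and data are illustrative assumptions, and the sketch does not reproduce any of the models cited.

```python
import random

rng = random.Random(0)
data = [(50, 110), (70, 150), (90, 190), (110, 230)]  # hypothetical (area, price)

def error(individual):
    """Sum of squared errors of the line price = a + b * area."""
    a, b = individual
    return sum((a + b * x - y) ** 2 for x, y in data)

def crossover(p, q):
    """The child takes each gene from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(p, q))

def mutate(individual, scale=0.5):
    """Small random perturbation of every gene."""
    return tuple(gene + rng.gauss(0, scale) for gene in individual)

population = [(rng.uniform(-50, 50), rng.uniform(0, 5)) for _ in range(30)]
history = []  # best error recorded at the start of each generation
for _ in range(100):
    population.sort(key=error)
    history.append(error(population[0]))
    parents = population[:10]  # selection of the fittest
    children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                for _ in range(20)]
    population = parents + children  # elitism: the parents survive unchanged
population.sort(key=error)
best = population[0]
```

Because the best individuals are carried over unchanged, the best error in `history` can never increase from one generation to the next. This is one concrete reading of the statement that each generation fits the data at least as well as the previous one.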

4.4 Nearest neighbors

Analogy-based algorithms work by examining how cases similar to the case under investigation behave, in order to predict the behaviour of the latter. They do not organise the learning path in successive phases, but observe the distance between cases, measuring it on the variables that describe them. They rest on the principle that natural phenomena are almost never linear over the whole sample, but show linear relationships in small local contexts limited to a few data points. So, instead of seeking the coordinates of a line that explains the whole phenomenon, they analyse the phenomenon only in the vicinity of the case to be estimated.

They are called lazy learners (lazy algorithms) because, unlike all other algorithms, they do not identify and model the relationships within the training set before knowing the query posed by the testing set. Once the terms of the query are known, they process only the data needed to answer it.

The best known are k-nearest neighbors, in which the variable investigated, here the price, is the mean of the values the variable takes in the k closest cases. Isakson (1988) uses the technique to predict the value of 143 properties in Dallas divided into apartment, industrial, office and retail: in all four types nearest neighbors outperform ordinary least squares. In the same contribution, however, the author notes that the technique is effective when the property to be valued has characteristics close to the mean of the available data, and less adequate when it is a statistical outlier. In other studies k-nearest neighbors record lower accuracy than other models (Borde et al., 2017; Oladunni and Sharma, 2016).
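The k-nearest neighbors rule can be written in a few lines. The sketch below assumes Euclidean distance on unscaled variables and uses hypothetical data. In practice the variables would normally be standardised first, so that floor area does not dominate the distance.

```python
def knn_predict(train, query, k=3):
    """Mean price of the k training cases closest to the query (Euclidean distance)."""
    def distance(features):
        return sum((f - q) ** 2 for f, q in zip(features, query)) ** 0.5
    nearest = sorted(train, key=lambda row: distance(row[0]))[:k]
    return sum(price for _, price in nearest) / k

# Hypothetical cases: ((area in m2, rooms), price in thousand euro).
train = [
    ((50, 2), 120), ((55, 2), 130), ((60, 3), 140),
    ((100, 4), 260), ((110, 5), 280), ((120, 5), 300),
]

estimate = knn_predict(train, (58, 2), k=3)            # a typical small flat
outlier_estimate = knn_predict(train, (300, 10), k=3)  # far outside the sample
```

The second query illustrates the weakness noted by Isakson: for a property far outside the sample, the estimate (280) is simply the mean of the three largest cases and cannot extrapolate beyond them.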

Although these models use complex geometries, better able to handle multidimensionality than traditional Euclidean geometry, they show strong weaknesses when the amount of data, and hence the number of dimensions, grows (Cover and Hart, 1967). This has limited their use in mass appraisal models, although they prove effective in some phases of valuation: McCluskey and Anand (1999) use them to identify the most significant comparables within a hybrid model, leaving the determination of price to neural networks and genetic algorithms.

4.5 Support vector machines

The problems encountered with nearest neighbors led to a gradual abandonment of techniques working on hyperspaces, until the introduction of support vector machines (SVM) by Vapnik (Boser, Guyon and Vapnik, 1992). These are able to reduce the great complexity of non-linear models to vector spaces on which linear algebra models can be applied (Cristianini and Schölkopf, 2002). A problem presented in its original space often contains elements that are hard to separate from one another in that space; the same task becomes easier when the data are mapped into a higher-dimensional space, where a separating hyperplane can be found.
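The idea of moving to a richer space can be illustrated with a toy example. The sketch below is not a support vector machine: it uses a hand-picked mapping and a hand-picked hyperplane (both hypothetical) simply to show that points which no single threshold can separate on a line become linearly separable once a squared coordinate is added. A real SVM would learn the maximum-margin hyperplane and would normally use a kernel instead of an explicit mapping.

```python
def feature_map(x):
    """Map a 1-D point into 2-D space: x -> (x, x**2). Illustrative choice."""
    return (x, x * x)

def linear_classifier(z, w=(0.0, 1.0), b=-4.0):
    """Sign of w.z + b, i.e. a hyperplane (here a line) in the mapped space."""
    score = w[0] * z[0] + w[1] * z[1] + b
    return 1 if score > 0 else -1

def separable_by_threshold(xs, ys):
    """True if a single cut 'x > t', in either orientation, classifies all points."""
    ordered = sorted(xs)
    cuts = [ordered[0] - 1.0] + [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    for t in cuts:
        for sign in (1, -1):
            if all((sign if x > t else -sign) == y for x, y in zip(xs, ys)):
                return True
    return False

# Toy data: label +1 outside the interval [-2, 2], label -1 inside it.
points = [-3.0, -2.5, -1.0, 0.0, 1.5, 2.5, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

predictions = [linear_classifier(feature_map(x)) for x in points]
print(separable_by_threshold(points, labels))  # no single cut works in 1-D
print(predictions == labels)                   # the line x**2 = 4 works in 2-D
```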

Their applications in real estate valuation models are many and generally show great effectiveness in predicting value. Kontrimas and Verikas (2011) identify Support Vector Regression (SVR) as the most effective predictor of value. Numerous other studies, mostly from Asia, reach similar conclusions (Li et al., 2009; Lin and Chen, 2011; Zhang, 2012; Yeh, Hsieh and Wang, 2013; Mu, Wu and Zhang, 2014; Wang et al., 2014; Huang, 2019). It is not possible, however, to assign support vector machines the role of best valuation algorithm in absolute terms: everything depends on the nature of the available data. For example, in comparisons with artificial neural networks some authors find them more effective (Lam, Yu and Lam, 2009), others less effective (Abidoye et al., 2019; Phan, 2018).

Despite the growing interest that research is showing in these algorithms, the literature has not yet devoted to support vector machines in real estate valuation a structured debate like the one artificial neural networks have enjoyed. The reason for this gap perhaps lies in the fact that the topic has been present in the scientific literature for too short a time.

4.5.1 Traditional models and machine learning models: a comparison

A reading of the results reported in the individual studies would seem to confirm a greater forecasting ability of self-learning models compared with the traditional econometric approach. What is missing, however, is a study that can empirically confirm this hypothesis on a dataset of articles as large and representative as possible, quantifying the results emerging from the literature produced.


As for the conditions guiding the choice of model, every valuer preparing to define a mass appraisal model faces a fork in the road: on one side the traditional econometric models, on the other the machine learning models. Although both share the same objective, estimating market value, their modes of use and their characteristics are very different. To borrow the expression of Mullainathan and Spiess (2017): the former estimate β, the latter estimate y.

Traditional regression models are inferential procedures: they explain the cause-effect relationship that the independent variables exert on the dependent variable. They were created to identify relationships within a dataset; measuring their predictive ability on an external dataset (testing set) can therefore be considered something of a stretch, since predictive ability is being attributed to a model whose primary purpose is not to predict the value of unknown variables. Their inferential power makes them valid tools on which to base inductive processes for describing behaviours that are generally valid across a given statistical population (Mangialardo et al., 2019).

The main objective of machine learning models is to predict the value of y; identifying the line (straight or curved) that explains the relationship between y and each variable takes second place. No inferential hypotheses can be derived, because the numerical coefficients β are lacking. Even if an order of importance were assigned to the variables by measuring the difference in accuracy between the full model and the model without the variable in question, important indicators such as the standard error would still be missing. The total absence of cause-effect hypotheses in the algorithm must not, however, imply that knowledge of economic theory is irrelevant to the success of the model. Economic and appraisal knowledge guides the operator in choosing the model, treating the variables and tuning the parameters.
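The variable ranking mentioned above, obtained by comparing the accuracy of the full model with that of the model deprived of one variable, might be sketched as follows. Everything here is an assumption made for illustration: a 1-nearest-neighbor predictor, leave-one-out mean absolute error as the accuracy measure, and a hypothetical dataset in which price depends on floor area while a second variable is pure noise.

```python
import math

def loo_mae(cases, keep, k=1):
    """Leave-one-out mean absolute error of k-NN using only the feature indices in keep."""
    total = 0.0
    for i, (x, y) in enumerate(cases):
        others = cases[:i] + cases[i + 1:]
        xq = [x[j] for j in keep]
        ranked = sorted(others, key=lambda c: math.dist([c[0][j] for j in keep], xq))
        prediction = sum(price for _, price in ranked[:k]) / k
        total += abs(prediction - y)
    return total / len(cases)

# Hypothetical data: (floor area, unrelated number) -> price equal to 2 * area.
names = ["area", "noise"]
cases = [
    ((50, 3), 100.0), ((60, 9), 120.0), ((70, 1), 140.0),
    ((80, 7), 160.0), ((90, 4), 180.0), ((100, 8), 200.0),
]

full_error = loo_mae(cases, keep=[0, 1])
importance = {}
for j, name in enumerate(names):
    reduced = [i for i in range(len(names)) if i != j]
    # Importance = how much the error grows when the variable is removed.
    importance[name] = loo_mae(cases, keep=reduced) - full_error

print(importance)
```

The output is only a ranking: unlike a regression coefficient it carries no sign, no unit of measure and no standard error, which is the limitation the paragraph above points to.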

The main limitation of machine learning lies in overfitting. Its powerful predictive capacity often risks proving ineffective when it faces new data, different from those on which it was trained. To remedy this problem, regularisation and parameter tuning techniques are used. The former are data-mapping techniques that reduce the problem of high dimensionality typical of Big Data. Large datasets can in fact contain insignificant information or anomalous patterns that make it difficult to identify the most significant statistical relationships. Parameter tuning, on the other hand, is a process of gradually increasing the accuracy of the model. It is called empirical tuning for good reason, since it relies entirely on the values expressed by the data. Through successive cross-validations within the training set, the parameters can be refined before they are tested on the testing set, thus reducing overfitting.
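A minimal sketch of this tuning workflow is given below, assuming a one-variable k-nearest neighbors predictor whose only parameter is k. The synthetic data, the fold scheme and all function names are hypothetical. The point of the sketch is the order of operations: k is chosen by cross-validation inside the training set, and the testing set is used only once at the end.

```python
def knn_predict(train, query, k):
    """Mean price of the k training cases closest to query (one feature)."""
    nearest = sorted(train, key=lambda case: abs(case[0] - query))[:k]
    return sum(price for _, price in nearest) / len(nearest)

def mae(train, held_out, k):
    """Mean absolute error on held_out of a k-NN predictor built on train."""
    errors = [abs(knn_predict(train, x, k) - y) for x, y in held_out]
    return sum(errors) / len(errors)

def cross_validated_mae(train, k, folds=3):
    """Average validation error over folds cut from the training set only."""
    scores = []
    for f in range(folds):
        validation = train[f::folds]
        fitting = [c for i, c in enumerate(train) if i % folds != f]
        scores.append(mae(fitting, validation, k))
    return sum(scores) / folds

# Synthetic data: price = 2 * area plus a small deterministic disturbance.
data = [(float(a), 2.0 * a + ((i * 7) % 5 - 2) * 6.0)
        for i, a in enumerate(range(40, 140, 5))]
test_set = data[::4]                                   # kept aside until the end
train_set = [c for i, c in enumerate(data) if i % 4 != 0]

candidate_ks = [1, 2, 3, 5, 7]
cv_scores = {k: cross_validated_mae(train_set, k) for k in candidate_ks}
best_k = min(candidate_ks, key=lambda k: cv_scores[k])  # empirical tuning
test_error = mae(train_set, test_set, best_k)           # single final check
print(best_k, cv_scores[best_k], test_error)
```

Choosing k on the testing set instead would let information from the test cases leak into the model and would overstate its accuracy on new data.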

The empirical tuning step, made up of continuous corrections and of choices about the model and its internal structure, gives the operator who defines the model a leading role. Writing and optimising self-learning models amounts to a body of expertise in its own right, the result of accumulated skills and field experience. Each model is also the product of the person who designed and refined it, adapting it as well as possible to the nature of the problem at hand. Numerous studies show that the predictive ability of models varies widely as the parameters assigned to them vary (Baldominos et al., 2018). At the same time, each model is valid only for the data with which it was designed; adding new data or modifying existing data will not necessarily mean that the model maintains or increases its predictive ability. On the contrary, that ability could decrease.

5. CONCLUSIONS

The review aimed to identify what evidence emerges from the literature produced so far on automated valuation models. The readings revealed a clear predominance of machine learning models over traditional econometric models with regard to the ability to predict property prices. Among the various types of model, however, it is not possible to draw up a ranking by predictive reliability: the effectiveness of a model depends directly on the nature of the data available to it.

The scientific literature has not only assessed these models in terms of performance, but has brought out a body of evidence that has generated a debate on the areas in which they can be used. From an operational standpoint, the high performance achieved in forecasting property prices makes machine learning models attractive to all operators who value, manage or trade real estate. Investors can use them to assess possible investments or transactions to which they are a party. Similarly, those who offer valuation services can turn to self-learning algorithms to offer reliable estimates to their clients. It should be borne in mind that building machine learning models will be possible only for those who hold a large body of information with which to train and optimise learning. Small independent valuers are unlikely to have sufficient data and skills to build their own models; they will instead be able to use the services sold to them by the largest players in the sector. Technological innovation will therefore bring radical changes to the current structure of the valuation profession (Abidoye and Chan, 2017).

Against the undeniable advantages in terms of prediction accuracy, many authors have identified the black-box character of machine learning models as their greatest limitation in operational use. The new models are not suited to explaining market dynamics and, more generally, the mechanisms by which value is formed.



They are difficult to use for public policies, where the valuation process must guarantee equal treatment for all the cases concerned and maintain the same efficiency over time. Self-learning models cannot guarantee the same accuracy requirements when new data to be estimated arrive. This could give rise to disputes by parties who feel harmed by their valuations.

It is worth drawing attention, however, to a further element in favour of the new learning techniques. They are able to work with Big Data, understood not only in terms of volume but also in terms of variety (Choi and Varian, 2012). This research has dealt only with variables expressed as numerical values or categories, but machine learning models can also work with very different types of data: for example, photographic images. Numerous studies investigate valuation processes that use photos of properties (You et al., 2017; Poursaeed et al., 2018). New information sources such as images (including satellite images), movements tracked by the devices we use every day and ratings assigned to commercial establishments, to name only a few, may prove to be effective predictors of value. They may also partly make up for the shortage of information sources traditionally lamented by many in the real estate sector.


* Agostino Valier, Dipartimento di Ingegneria Civile e Ambientale, Università degli Studi di Padova

** Ezio Micelli, Dipartimento di Culture del Progetto, Dorsoduro 2196, Venezia

References

ABIDOYE R. B., CHAN A. P. C., Valuers’ receptiveness to the application of artificial intelligence in property valuation, Pacific Rim Property Research Journal, Vol. 23, No. 2, 2017, pp. 175-193.

ABIDOYE R. B., CHAN A. P. C., ABIDOYE F. A., OSHODI O. S., Predicting property price index using artificial intelligence techniques, International Journal of Housing Markets and Analysis, Vol. 12, No. 6, 2019, pp. 1072-1092.

AMRI S., TULARAM G. A., Performance of Multiple Linear Regression and Nonlinear Neural Networks and Fuzzy Logic Techniques in Modelling House Prices, Journal of Mathematics and Statistics, Vol. 8, No. 4, 2012, pp. 419-434.

ANTIPOV E. A., POKRYSHEVSKAYA E. B., Mass appraisal of residential apartments: an application of random forest for valuation and a CART-based approach for model diagnostics, Expert Systems with Applications, Vol. 39, No. 2, 2012, pp. 1772-1778.

ATHEY S., “The impact of Machine Learning on Economics”, in Agrawal A., Gans J., Goldfarb A. (eds.), The Economics of Artificial Intelligence. An Agenda, University of Chicago Press, Chicago, 2019, pp. 507-552.

ATTIGERI G. V., PAI M., PAI M. R., NAYAK A., Stock market prediction: A big data approach, TENCON 2015 - 2015 IEEE Region 10 Conference.

BALDOMINOS A., BLANCO I., MORENO A., ITURRARTE R., BERNARDEZ O., AFONSO C., Identifying Real Estate Opportunities Using Machine Learning, Applied Sciences, Vol. 8, No. 11, 2018, p. 2321.

BARTKE S., SCHWARZE R., The economic role of valuers in real property markets, UFZ Discussion Papers, No. 13, 2015.

BIDANSET P. E., Moving automated valuation models out of the box: the global geography of AVMs, Fair & Equitable, 2014.

BORDE S., RANE A., SHENDE G., SHETTY S., Real Estate Investment Advising Using Machine Learning, International Research Journal of Engineering and Technology, Vol. 4, No. 3, 2017, pp. 1821-1825.

BORST R. A., Artificial neural networks: the next modelling/calibration technology for the assessment community, Property Tax Journal, Vol. 10, No. 1, 1991, pp. 69-94.

BOSER B. E., GUYON I. M., VAPNIK V. N., A Training Algorithm for Optimal Margin Classifiers, COLT ’92 Proceedings of the fifth annual workshop on Computational learning theory, 1992.

CANESI R., D’ALPAOS C., MARELLA C., Forced Sale Values vs. Market Values in Italy, Journal of Real Estate Literature, Vol. 24, No. 2, 2016, pp. 377-401.

CECHIN A., SOUTO A., GONZALEZ M. A., Real estate value at Porto Alegre city using artificial neural networks, 6th Brazilian Symposium on Neural Networks (SBRN 2000), 22-25 November 2000, Rio de Janeiro, Brazil.

CEH M., KILIBARDA M., LISEC A., BAJAT B., Estimating the Performance of Random Forest versus Multiple Regression for Predicting Prices of the Apartments, ISPRS International Journal of Geo-Information, Vol. 7, No. 5, 2018, p. 168.

CHOI H., VARIAN H., Predicting the Present with Google Trends, Economic Record, No. 88, 2012, pp. 2-9.

COOK D., RICS futures: turning disruption from technology to opportunity, Journal of Property Investment & Finance, Vol. 33, No. 5, 2015, pp. 456-464.

COVER T. M., HART P. E., Nearest neighbor pattern classification, IEEE Transactions on Information Theory, Vol. 13, No. 1, 1967, pp. 21-27.

CRISTIANINI N., SCHÖLKOPF B., Support Vector Machines and Kernel Methods Learning Machines, AI Magazine, Vol. 23, No. 3, 2002, pp. 31-42.

D’AMATO M., KAUKO T. (eds.), Advances in Automated Valuation Modeling, Springer International Publishing, 2017.

DEL GIUDICE V., DE PAOLA P., FORTE F., Using genetic algorithms for real estate appraisals, Buildings, Vol. 7, No. 4, 2017, p. 31.

DEY A., Machine Learning Algorithms: a Review, International Journal of Computer Science and Information Technologies, Vol. 7, No. 3, 2016, pp. 1174-1179.

DO A. Q., GRUDNITSKI G., A neural network approach to residential property appraisal, The Real Estate Appraiser, Vol. 58, No. 3, 1992, pp. 38-45.

DOMINGOS P., The master algorithm: how the quest for the ultimate learning machine will remake our world, Allen Lane, 2015.

FAN G. Z., ONG S. E., KOH H. C., Determinants of house price: A decision tree approach, Urban Studies, Vol. 43, No. 12, 2006, pp. 2301-2316.

FORTE C., DE ROSSI B., Principi di economia ed estimo, Etas, 1974.

FRENCH N., GABRIELLI L., Pricing to market. Property valuation revisited: the hierarchy of valuation approaches, methods and models, Journal of Property Investment & Finance, Vol. 36, No. 4, 2018, pp. 391-396.

FREY C. B., OSBORNE M. A., The future of employment: how susceptible are jobs to computerisation?, Technological Forecasting and Social Change, No. 114, 2017, pp. 254-280.

GENSLER A., HENZE J., SICK B., RAABE N., Deep Learning for solar power forecasting - An approach using AutoEncoder and LSTM Neural Networks, International Conference on Systems, Man, and Cybernetics (SMC), 2016.

GHOSALKAR N. N., Real Estate Value Prediction Using Linear Regression, 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA).

GLOUDEMANS R. J., Mass appraisal of real property, International Association of Assessing Officers, 1999.

GLUMAC B., DES ROSIERS F., Real estate and land property automated valuation systems: a taxonomy and conceptual model, LISER Working papers, No. 9, 2018.

GRACZYK M., LASOTA T., TRAWIŃSKI B., TRAWIŃSKI K., “Comparison of Bagging, Boosting and Stacking Ensembles Applied to Real Estate Appraisal”, in Nguyen N. T., Thanh M., Świątek L. (eds.), Intelligent Information and Database Systems, Springer, 2010, pp. 340-350.

HAN S., KO Y., KIM J., HONG T., Housing Market Trend Forecasts through Statistical Comparisons based on Big Data Analytic Methods, Journal of Management in Engineering, Vol. 34, No. 2, 2018.

HASTIE T., TIBSHIRANI R., FRIEDMAN J., The elements of statistical learning: data mining, inference, and prediction, Springer Verlag, 2013.

HAUSLER J., RUSCHEINSKY J., LANG M., News-based sentiment analysis in real estate: a machine learning approach, Journal of Property Research, Vol. 35, No. 4, 2018, pp. 344-371.

HOLLAND D. M., The assessment of land value, University of Wisconsin Press, 1970.

HOUGH J., System and method for computing a comparative value of real estate, patent No. 5,414,621, United States Patent, 1995.

HUANG Y., Predicting Home Value in California, United States via Machine Learning Modeling, Statistics, Optimization and Information Computing, Vol. 7, No. 1, 2019.

IAAO, Standard on Mass Appraisal of Real Property, International Association of Assessing Officers, 2013.

IAAO, Standard on Automated Valuation Models (AVMs), International Association of Assessing Officers, 2018.

ISAKSON H. R., Valuation analysis of commercial real estate using the nearest neighbors appraisal technique, Growth and Change, Vol. 19, No. 2, 1988, pp. 11-24.

JAMES G., WITTEN D., HASTIE T., TIBSHIRANI R., An Introduction to Statistical Learning: With Applications in R, Springer Verlag, 2013.

JOST A., NELSON J., GOPINATHAN K., SMITH C., Real estate appraisal using predictive modeling, patent No. 5,361,201, United States Patent, 1994.

KAUKO T., D’AMATO M. (eds.), Mass Appraisal Methods: An International Perspective for Property Valuers, Wiley-Blackwell, 2008.

KOK N., KOPONEN E. L., MARTÍNEZ-BARBOSA C. A., Big Data in Real Estate? From Manual Appraisal to Automated Valuation, The Journal of Portfolio Management, Vol. 43, No. 6, 2017, pp. 202-211.

KUTASI D., BADICS M. C., Valuation methods for the housing market: evidence from Budapest, Acta Oeconomica, Vol. 66, No. 3, 2016, pp. 527-546.

KONTRIMAS V., VERIKAS A., The mass appraisal of the real estate by computational intelligence, Applied Soft Computing, Vol. 11, No. 1, 2011, pp. 443-448.

LAM K. C., YU C. Y., LAM C. K., Support vector machine and entropy based decision support system for property valuation, Journal of Property Research, Vol. 26, No. 3, 2009, pp. 213-233.

LASOTA T., MAKOS M., TRAWINSKI B., “Comparative analysis of Neural Network models for premises valuation using SAS enterprise miner”, in Nguyen N. T., Katarzyniak R. P., Janiak A. (eds.), New Challenges in Computational Collective Intelligence, Springer Berlin Heidelberg, 2009, pp. 337-348.

LENK M. M., WORZALA E. M., SILVA A., High-tech valuation: should artificial neural networks bypass the human valuer?, Journal of Property Valuation and Investment, Vol. 15, No. 1, 1997, pp. 8-26.

LI D. Y., XU W., ZHAO H., CHEN R. Q., A SVR based forecasting approach for real estate price prediction, Eighth International Conference on Machine Learning and Cybernetics, 2009.

LIN H. Y. U., CHEN K., Predicting Price of Taiwan Real Estates By Neural Networks and Support Vector Regression, 15th WSEAS international conference on Systems, 2011.

LING D. C., NARANJO A., SCHEICK B., Investor Sentiment, Limits to Arbitrage and Private Market Returns, Real Estate Economics, Vol. 42, No. 3, 2014, pp. 531-577.

MANGANELLI B., DE MARE G., NESTICÒ A., “Using genetic algorithms in the housing market analysis”, in Gervasi O., Murgante B., Misra S., Gavrilova M.L., Rocha A.M.A.C., Torre C.M., Taniar D., Apduhan B.O. (eds.), Computational Science and Its Applications - ICCSA 2015, Springer International Publishing, 2015, pp. 36-45.

MANGIALARDO A., MICELLI E., SACCANI F., Does Sustainability Affect Real Estate Market Values? Empirical Evidence from the Office Buildings Market in Milan (Italy), Sustainability, Vol. 11, No. 1, 2019.

MCCLUSKEY W. J., MCCORD M., DAVIS P. T., HARAN M., MCILHATTON D., Prediction accuracy in mass appraisal: A comparison of modern approaches, Journal of Property Research, Vol. 30, No. 4, 2013, pp. 239-265.

MCCLUSKEY W. J., ANAND S., The application of intelligent hybrid techniques for the mass appraisal of residential properties, Journal of Property Investment & Finance, Vol. 17, No. 3, 1999, pp. 218-239.

MCCULLOCH W. S., PITTS W. H., A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, Vol. 5, 1943, pp. 115-133.

MCGREAL S., ADAIR A., MCBURNEY D., PATTERSON D., Neural networks: the prediction of residential values, Journal of Property Valuation and Investment, Vol. 16, No. 1, 1998, pp. 57-70.

MOORE W. J., A History of Appraisal Theory and Practice Looking Back from IAAO’s 75th Year, Journal of Property Tax Assessment and Administration, Vol. 6, No. 3, 2009, pp. 23-49.

MOOYA M. M., Market Value without a Market: Perspectives from Transaction Cost Theory, Urban Studies, Vol. 46, No. 3, 2009, pp. 687-701.

MOOYA M. M., “Automated Valuation Models and Economic Theory”, in D’Amato M., Kauko T. (eds.), Advances in Automated Valuation Modeling, Springer International Publishing, 2017, pp. 33-57.

MORANO P., TAJANI F., LOCURCIO M., Multicriteria analysis and genetic algorithms for mass appraisals in the Italian property market, International Journal of Housing Markets and Analysis, Vol. 11, No. 2, 2018, pp. 229-262.

MU J., WU F., ZHANG A., Housing Value Forecasting Based on Machine Learning Methods, Abstract and Applied Analysis, No. 4, 2014.

MULLAINATHAN S., SPIESS J., Machine learning: an applied econometric approach, Journal of Economic Perspectives, Vol. 31, No. 2, 2017, pp. 87-106.

NGUYEN N., CRIPPS A., Predicting Housing Value: A Comparison of Multiple Regression Analysis and Artificial Neural Networks, The Journal of Real Estate Research, Vol. 22, No. 3, 2001, pp. 313-336.

NÚÑEZ-TABALES J. M., REY-CARMONA F. J., CARIDAD Y OCERIN J. M., Commercial properties prices appraisal: alternative approach based on neural networks, Journal of Artificial Intelligence, Vol. 14, No. 1, 2016, pp. 53-70.

O’NEILL J. W., An Automated Valuation Model for Hotels, Cornell Hotel and Restaurant Administration Quarterly, Vol. 45, No. 3, 2004, pp. 260-268.

OLADUNNI T., SHARMA S., Hedonic housing theory - A machine learning investigation, 15th IEEE International Conference on Machine Learning and Applications, ICMLA 2016.

PÉREZ-RAVE J. I., CORREA-MORALES J. C., GONZÁLEZ-ECHAVARRÍA F., A machine learning approach to big data regression analysis of real estate prices for inferential and predictive purposes, Journal of Property Research, Vol. 36, No. 1, 2019, pp. 59-96.

PHAN T. D., Housing Price Prediction using Machine Learning Algorithms: The Case of Melbourne City, Australia, International Conference on Machine Learning and Data Engineering (iCMLDE), 2018.

POURSAEED O., MATERA T., BELONGIE S., Vision-based real estate price estimation, Machine Vision and Applications, Vol. 29, No. 4, 2018, pp. 667-676.

REALFONZO A., Teoria e metodo dell’Estimo Urbano, La Nuova Italia Scientifica, 1994.

RICS, The Future of Valuations, 2017 (available at: https://www.rics.org/globalassets/rics-website/media/knowledge/research/insights/future-of-valuations-insights-paper-rics.pdf, accessed 12 November 2019).

ROBSON G., DOWNIE M. L., Automated Valuation Models: an international perspective, RICS Automated Valuation Models Conference: AVMs Today and Tomorrow, 2008.

ROSEN S., Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition, The Journal of Political Economy, Vol. 82, No. 1, 1974, pp. 34-55.

RUMELHART D. E., HINTON G. E., WILLIAMS R. J., Learning representations by back-propagating errors, Nature, Vol. 323, No. 6088, 1986, pp. 533-536.

SCHULZ R., WERSING M., WERWATZ A., Automated valuation modelling: a specification exercise, Journal of Property Research, Vol. 31, No. 2, 2014, pp. 131-153.

SHINDE N., GAWANDE K., Survey on predicting property price, International Conference on Automation and Computational Engineering (ICACE), 2018.

SIMONOTTI M., Metodi di stima immobiliare. Applicazione degli standard internazionali, Flaccovio editore, 2006.

SUN Y., Real Estate Evaluation Model Based on Genetic Algorithm Optimized Neural Network, Data Science Journal, Vol. 18, No. 36, 2019, pp. 1-9.

TAJANI F., MORANO P., LOCURCIO M., D’ADDABBO, “Property valuations in times of crisis. Artificial Neural Networks and evolutionary algorithms in comparison”, in Gervasi O., Murgante B., Misra S., Gavrilova M.L., Rocha A.M.A.C., Torre C.M., Taniar D., Apduhan B.O. (eds.), Computational Science and Its Applications - ICCSA 2015, Springer International Publishing, 2015, pp. 194-209.

TANG Y., QIU S., GUI P., Predicting Housing Price Based on Ensemble Learning Algorithm, International Conference on Artificial Intelligence and Data Processing (IDAP), 2018.

VARIAN H. R., Big Data: New Tricks for Econometrics, Journal of Economic Perspectives, Vol. 28, No. 2, 2014, pp. 3-28.

WANG X., WEN J., ZHANG Y., WANG Y., Real estate price forecasting based on SVM optimized by PSO, Optik, Vol. 125, No. 3, 2014, pp. 1439-1443.

WORZALA E., LENK M., SILVA A., An Exploration of Neural Networks and Its Application to Real Estate Valuation, The Journal of Real Estate Research, Vol. 10, No. 2, 1995, pp. 185-201.

YACIM J. A., BOSHOFF D. G. B., Impact of Artificial Neural Networks training algorithms on accurate prediction of property values, Journal of Real Estate Research, Vol. 40, No. 3, 2018, pp. 375-418.

YEH H. C., HSIEH T. K., WANG T. S., Study on the Relationship between City and District Average Price by GAOT in Taipei, Applied Mechanics and Materials, Vol. 370, 2013, pp. 2043-2049.

YOU Q., PANG R., CAO L., LUO J., Image Based Appraisal of Real Estate Properties, IEEE Transactions on Multimedia, Vol. 19, No. 12, 2017, pp. 2751-2759.

ZHANG S. H., Application of Support Vector Machine in determination of real estate price, Advanced Materials Research, No. 461, 2012, pp. 818-821.

ZHAO Y., JIA S., The regression analysis and prediction of real estate added value based on genetic algorithm, International Conference on Management Science and Industrial Engineering (MSIE), 2011.

ZINOVIEV D., Data Science essentials in Python: collect, organize, explore, predict, value, Pragmatic Bookshelf, 2016.

