International Journal of Applied Earth Observation and Geoinformation 28 (2014) 102–116

Journal homepage: www.elsevier.com/locate/jag

Modeling multiple land use changes using ANN, CART and MARS: Comparing tradeoffs in goodness of fit and explanatory power of data mining tools

Amin Tayyebi a,b,*, Bryan C. Pijanowski b

a University of Wisconsin-Madison, Wisconsin Energy Institute, 1552 University Avenue, Madison, WI 53726, USA
b Department of Forestry and Natural Resources, Purdue University, 195 Marsteller Street, West Lafayette, IN 47907, USA

* Corresponding author at: University of Wisconsin-Madison, Wisconsin Energy Institute, 1552 University Avenue, Madison, WI 53726, USA. Tel.: +1 765 412 1591. E-mail address: [email protected] (A. Tayyebi).

Article info

Article history: Received 13 September 2013; Accepted 12 November 2013

Keywords: Land Transformation Model – Multiple Classifications (LTM-MC); Classification And Regression Trees (CART); Multivariate Adaptive Regression Splines (MARS); Multiple Classifications (MC)

0303-2434/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.jag.2013.11.008

Abstract

Over half of the earth’s terrestrial surface has been modified by humans. This modification is called land use change and its pattern is known to occur in a non-linear way. The land use change modeling community can advance its models using data mining tools. Here, we present three data mining land use change models, one based on Artificial Neural Network (ANN), another on Classification And Regression Trees (CART) and a third on Multivariate Adaptive Regression Splines (MARS). We reconfigured the three data mining models to concurrently simulate multiple land use classes (e.g. agriculture, forest and urban) in South-Eastern Wisconsin (SEWI), USA (time interval 1990–2000) and in Muskegon River Watershed (MRW), Michigan, USA (time interval 1978–1998). We compared the results of the three data mining tools using relative operating characteristic (ROC) and percent correct match (PCM). We found that ANN provided the best accuracy in both areas for three land use classes (e.g. urban, agriculture and forest). In addition, in both regions, CART and MARS both showed that forest gain occurred in areas close to current forests and agriculture patches and away from roads. Urban increased in areas of high urban density, close to roads and in areas with few forests and wetlands. We also found that agriculture gain is more likely in areas closer to agriculture and forest patches. Elevation strongly influenced urbanization and forest gain in MRW while it had no effect in SEWI.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Land use change (LUC) is a complex process (Irwin and Geoghegan, 2001; Lambin and Geist, 2006) and modeling these systems is challenging. It is well known that the drivers of LUC operate across a variety of spatial-temporal scales in a non-linear way (Veldkamp and Lambin, 2001) and thus nonlinear tools are needed to simulate these dynamics. Non-linear approaches are used frequently in environmental modeling. Indeed, artificial neural networks (ANNs; Tayyebi et al., 2011a) have been used extensively by LUC modelers over the last two decades. Multivariate Adaptive Regression Splines (MARS) and Classification And Regression Trees (CART) are the other two most well-known models (Breiman et al., 1984; Friedman, 1991). CART calculates the likelihood of the outcomes using multiple spatial drivers to develop monotone outcomes. MARS, on the other hand, overcomes the restriction of the piecewise constant functions in CART by generating piecewise linear models using basis functions (Friedman, 1991). Many studies in the literature show that these three data mining models often fit data better than traditional models (Tayyebi et al., 2012).

One of the challenges in modeling LUC is that, within a given region, multiple land use changes occur. For example, it is quite common for some areas to be converted from agriculture to urban while nearby forests are converted to agriculture (Alexandridis et al., 2007; Washington et al., 2010). However, few researchers have considered multiple land use transitions in the same model, and the land use change process is thus oversimplified. In modeling, simulating more than one outcome often creates what is known as the Multiple Classifications (MC) problem (Ho, 2000). Several approaches have been developed during the last three decades to simulate LUC using numerous environmental variables. However, most of them limited their application to only a single land use transition (Pontius et al., 2001). For example, the Land Transformation Model (LTM), an ANN-based model, has been used in a variety of places around the world to simulate only a single LUC (Pijanowski et al., 2005, 2006, 2009, 2013). Here, we proposed different coding schema strategies that LUC modelers can use to modify their models for MC. We used these strategies to modify three data mining tools (e.g. ANN, CART and MARS) for MC LUC modeling.

Fig. 1. Coupling GIS with data mining tools for simulating multiple land use classes.

MC LUC models should assign each cell in a map to a unique land use class (Pontius and Connors, 2006); however, cells are often assigned to more than one land use class in MC (we call this an ambiguous prediction hereafter). Some models are able to deal with multiple land use classes, such as cellular automata models (Ballestores et al., 2012), logistic regression models (Verburg et al., 2002; Tayyebi et al., 2013a,b), agent-based models (Parker et al., 2003; Ralha et al., 2013) and artificial neural networks (ANNs; Li and Yeh, 2002a,b). Some of them use a simple hierarchical approach to deal with MC while others use a competitive approach in land allocation. These studies proposed different ways to handle MC problems but they failed to suggest an effective rule to eliminate conflict problems in MC LUC modeling. Another objective of this manuscript is to suggest a simple method that solves the conflict problems by mutually comparing simulated land use maps and eliminating ambiguous predictions for given cells.

Many LUC models use non-linear techniques (Clarke et al., 1997; Jenerette and Wu, 2001) but a comparison of several tools in different locations has been lacking (Pontius et al., 2008), which limits our understanding of how non-linear techniques can aptly simulate the drivers and the complexity of LUC patterns. MARS and CART hold promise as modeling tools for LUC; neither, to our knowledge, has been applied to simulating LUC and compared to popular nonlinear tools (e.g., ANN). Although data mining models have received a lot of attention during the last three decades from the LUC modeling community (Tayyebi et al., 2010), there are few studies that compare LUC models for an MC problem. Therefore, there is a need to compare and contrast nonlinear tools for modeling MC LUC, determine which MC methods work best, and examine how these perform under conditions where multiple land use transitions occur within a location. Here, we compared three well-known data mining approaches (e.g. ANN, CART and MARS) for MC LUC modeling to understand the land use patterns in Midwest USA.

This paper coupled three nonlinear data mining tools in GIS (Fig. 1) to simulate more than one class of LUC for two diverse areas of the Midwest USA. The aims of this paper are to (1) propose different coding schema strategies to reconfigure three data mining tools, already developed for single LUC, for multiple land use transitions; (2) develop an effective rule to overcome the ambiguous predictions in MC; (3) compare the multiple land use transitions of three data mining techniques with each other in terms of their potential for agriculture, forest and urban gain simulation in Muskegon River Watershed (MRW) and Southeast Wisconsin (SEWI) using relative operating characteristic (ROC) and percent correct match (PCM); and (4) explore the lessons that we learn about land use change patterns in Midwest USA.

2. Strategies to modify LUC models for Multiple Classifications

There are two ways to employ Multiple Classifications (MC). In the first, an MC problem is converted into numerous binary classifications that are solved using binary classifiers. Alternatively, binary classifications can be extended to the MC, which needs special formulations to perform the separation. We limited our study to MC modeling that uses one model to simulate multiple land use classes. This has the advantage that it minimizes the total number of models that need to be executed and takes into account the correlation between different output classes simultaneously. The advantages of using a single model for MC have been reported several times in the literature for CART (Blockeel et al., 1999), ANNs (Caruana, 1997) and MARS (Wang et al., 1999).

2.1. Decomposing Multiple Classifications into binary classes

The central objective of MC is to integrate data from all classes simultaneously. Several methods have been suggested to decompose the MC into numerous binary classifications, including (Fig. 2; Rifkin and Klautau, 2004):

One-Versus-All (OVA): The OVA approach, proposed by several researchers (Rifkin and Klautau, 2004; Dubchak et al., 1999; Fig. 2A), is the simplest of those employed: each modeling run discriminates one class from the other n − 1 classes (Rifkin and Klautau, 2004). This procedure is repeated for each of the n classes, leading to n binary classifiers. OVA has several shortcomings in the training run of machine learning because, in each run, one class usually has far fewer cells than the other classes combined (Tsoumakas et al., 2010).

All-Versus-All (AVA): AVA considers all possible pairwise binary classifiers between n classes, each ignoring the rest of the classes (Friedman, 1996; Hastie and Tibshirani, 1998; Fig. 2B). This method requires building $\binom{n}{2} = n(n-1)/2$ binary classifiers. For the testing run, each cell receives a vote from all possible binary classifiers and the land use class with the dominant vote is assigned to the cell. AVA is difficult to analyze due to the large number of binary classifiers.

Fig. 2. The idea of One-Versus-All (OVA) and All-Versus-All (AVA) procedures for Multiple Classifications. (A) OVA uses n classifiers; (B) AVA uses n(n − 1)/2 classifiers. In each training run, every class (Class 1, Class 2, ..., Class n) is coded as 1, coded as 0, or not presented to the model.

Error-Correcting Output-Coding (ECOC): ECOC runs n binary classifiers to discriminate between the k different classes (Dietterich and Bakiri, 1995). A code of length n is given to each class. ECOC employs a matrix M to represent the classes and the code length: each row of M corresponds to a class and each column to a position in the code. A separate binary classifier is trained for each column. During the testing run, the class whose code has the minimum distance from the output code of the n classifiers is taken as the class label.

Generalized Coding: This coding scheme is the general case of the three previous coding strategies (Allwein et al., 2000), where the matrix M is allowed to take values {−1, 0, +1}. A value of +1 or −1 in the entry M(k,n) means that cells belonging to class k are presented to classifier n as change or non-change, respectively; a value of 0 means that classifier n ignores that class. For the OVA approach, each column in matrix M contains only one +1 and the rest are filled with −1. For the AVA method, each column has only one +1 value and one −1 value and the others are set to zero. Lastly, ECOC fills the matrix M with a combination of +1 and −1 values.
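The generalized coding matrix described above can be made concrete with a short sketch. The helper names `ova_matrix` and `ava_matrix` are hypothetical and are not taken from the paper; rows are classes k and columns are binary classifiers, following the M(k,n) convention in the text.

```python
from itertools import combinations


def ova_matrix(n_classes):
    """One column per class: +1 for that class, -1 for every other class."""
    return [[1 if k == c else -1 for c in range(n_classes)]
            for k in range(n_classes)]


def ava_matrix(n_classes):
    """One column per class pair: +1 and -1 for the pair, 0 for ignored classes."""
    pairs = list(combinations(range(n_classes), 2))
    return [[1 if k == a else -1 if k == b else 0 for (a, b) in pairs]
            for k in range(n_classes)]


# Three classes, as in the agriculture / urban / forest example.
M_ova = ova_matrix(3)  # 3 rows x 3 columns (n classifiers)
M_ava = ava_matrix(3)  # 3 rows x 3 columns (n(n - 1)/2 classifiers)
```

With four classes the AVA matrix already has six columns, which illustrates why the number of pairwise classifiers grows quickly.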

2.2. Extensible data mining coding scheme for Multiple Classifications

The data mining coding scheme for binary applications can be extended for MC as well. The idea is to use numerous binary classifiers to solve multiple binary classification problems simultaneously. The model structure and output code scheme corresponding to each class can be chosen as follows (after Dietterich and Bakiri, 1995):

One-per-class coding: Instead of just having one binary output, we could have n binary outputs (Fig. 3; Supplement 1a). Each output is designated the task of identifying a given class. The code for that class should be 1 at one output and 0 for the others. Therefore, we will need n outputs, where n is the number of classes.

Distributed output coding: Each class receives a unique code that can range from 0 to n (Supplement 1b), where n is the number of outputs. During the testing run, the calculated code is compared to the codes of the n reference classes. The class whose code is closest to the calculated value, according to a distance measure (e.g., Hamming or Euclidean distance), becomes the winning class.
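As a minimal sketch of the two output coding schemes above, assume a transition code of 0 for persistence and 1..n for a gain of class k. The function names are hypothetical, and the one-dimensional absolute distance used for decoding is one simple choice of distance measure, not necessarily the one used in the paper.

```python
def one_per_class(transition, n_classes):
    """Encode a transition as n binary outputs (all zeros means persistence)."""
    return [1 if transition == k else 0 for k in range(1, n_classes + 1)]


def decode_distributed(value, n_classes):
    """Return the reference code in 0..n closest to a calculated value."""
    return min(range(n_classes + 1), key=lambda code: abs(code - value))


urban_gain = one_per_class(2, 3)           # class 2 of 3 changes
persistence = one_per_class(0, 3)          # no output is switched on
nearest_code = decode_distributed(2.25, 3)
```

One-per-class coding needs as many outputs as classes, while distributed coding needs a single output whose value is mapped back to a class code afterwards.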

2.3. Challenges of Multiple Classifications

Transitioning from binary to MC for LUC modeling presents several challenges. First, we need a way to objectively compare different LUC models. To accomplish this, we used the current version of a well-known LUC ANN-based model, called LTM (Tayyebi et al., 2011a), to guide the parameterization of new ANN, CART and MARS models. Similar parameterization (Fig. 1) means that differences cannot be attributed to variations in the spatial drivers or the values assigned to each cell (i.e., pixel in the map). The models differ in only one respect: the ANN uses an output layer with as many nodes as the number of desired outputs (Fig. 3; Supplement 1a), whereas MARS and CART use a single output for MC (Supplement 1b).

A second challenge is to determine how LUC models treat each land use class. ANN uses a “one-per-class coding strategy” for the output layer nodes, where codes 1 and 0 still represent change and no-change of land use classes (Fig. 3; Supplement 1a); however, MARS and CART use a distributed output coding strategy that treats multiple land use classes as a group in a single record (scale from 0 to n; Supplement 1). In such a case, 0 represents the persistence of land use classes and the other values (1, 2, ..., n; Supplement 1b) are used to code other land use changes.

Finally, ANN and CART develop a separate probability map for each output; however, MARS develops one suitability map that scales from 0 to n (Fig. 5b). Some cells may change to more than one class in CART and LTM-MC (Figs. 4 and 5c; called ambiguous predictions); however, each cell can belong to only one land use class in the future. Here, we create simple rules to eliminate ambiguous predictions between mutual land use simulations.

Fig. 3. Model structure and coding scheme of LTM-MC.

3. Modifying data mining models for Multiple Classifications

So that readers can compare the formulas of the three data mining models in the following section, we standardized the notation of the spatial inputs and outputs of the three models: i represents the index for spatial input variables, k represents the index for spatial output variables and j is the index used in intermediate calculations.

3.1. Land Transformation Model – Multiple Classifications (LTM-MC)

We changed the original structural version of LTM with respect to two aspects (Tayyebi et al., 2011a). First, we modified the original structure of the LTM for binary classification so that the number of nodes in the output layer is equal to the number of desired outputs (Fig. 3; Supplement 1a). Second, we used a ‘one-per-class coding strategy’ for the output layer, which uses a combination of k binary numbers to represent k-category attributes, each associated with one of the transitions. To show the state of transition for each land use class, only one of the k numbers in the output layer needs to be coded as one while the others stay zero. All the nodes in the output layer are coded zero if land use persists between two times (Fig. 3). LTM-MC enables a user to define I inputs, H hidden units and K output units.

3.2. CART and MARS for Multiple Classifications

CART and MARS overgrow at first to make sure that stopping rules do not prevent the model from extracting the correct patterns in data during the training run (i.e., preventing under-fitting). Subsequently, the model is pruned back by penalizing model complexity and removing the splits of the model that do not improve the accuracy significantly (i.e., preventing over-fitting; Steinberg and Colla, 1997).

3.2.1. Classification And Regression Trees

CART is a recursive partitioning procedure that classifies categorical (classification tree) or continuous (regression tree) data at each node (e.g., parent) using a set of if-then-else rules (Timofeev, 2004). CART begins with the root node at the top of the tree, which contains the entire data for the training run (Yap et al., 2011). A node in the CART model is either a terminal node (a node without children) or a non-terminal node (a node with children; Chen, 2011). The tree structure represents spatial drivers of LUC organized hierarchically (levels in the tree are representative of the level of significance of variables) and a series of splits for different predictors. CART seeks the split using search algorithms to classify the data into binary or multiple classes (Breiman et al., 1984) by checking all unique values across the range of data values of different predictors (Ayoubloo et al., 2011).

CART calculates the probability ($p_k$) of the land use classes in the root node of the tree using relative frequencies in the entire learning data ($p_k = N_k/N$; $k = 1, 2, \ldots, K$; where $N_k$ is the number of cells belonging to land use class k out of the entire data N; Loh, 2010). Afterward, $p(k,t)$ denotes the probability of land use class k (Eq. (1a)), which is estimated from the data within node t (where $N_k(t)$ is the number of cells in node t belonging to class k). $p(k|t)$ denotes the conditional probability that CART classifies the land use classes accurately (Eq. (1b); where $p(t) = \sum_k p(k,t)$):

$$p(k,t) = p_k \times \frac{N_k(t)}{N_k} \tag{1a}$$

$$p(k|t) = \frac{p(k,t)}{p(t)} \tag{1b}$$
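Eqs. (1a) and (1b) can be checked numerically with a small sketch. The function name and the toy label lists below are assumptions for illustration; the class counts in `all_cells` mirror the example counts N(0) = 1, N(1) = 4, N(2) = 3 and N(3) = 1 used later in the paper.

```python
from collections import Counter


def node_probabilities(all_labels, node_labels):
    """Return p(k,t), p(t) and p(k|t) for one node, following Eqs. (1a)-(1b)."""
    n_total = len(all_labels)
    n_k = Counter(all_labels)        # N_k: cells of class k in the whole data
    n_k_node = Counter(node_labels)  # N_k(t): cells of class k inside node t
    p_joint = {k: (n_k[k] / n_total) * (n_k_node.get(k, 0) / n_k[k])
               for k in n_k}         # Eq. (1a)
    p_node = sum(p_joint.values())   # p(t) = sum over k of p(k,t)
    p_cond = {k: p_joint[k] / p_node for k in p_joint}  # Eq. (1b)
    return p_joint, p_node, p_cond


all_cells = [0, 1, 1, 1, 1, 2, 2, 2, 3]  # class code of every training cell
node_cells = [1, 1, 2]                   # cells that fall into node t
p_joint, p_node, p_cond = node_probabilities(all_cells, node_cells)
```

Because the priors are relative frequencies, p(k|t) reduces to the share of class k among the cells in the node (here 2/3 for class 1).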

Fig. 4. Overview of the conflict removal between two simulated LUC maps. We show the different steps of this approach with an example in the next figure (see Fig. 5 for more information).

The Gini index is usually used as a node impurity function to define a splitting rule (Breiman et al., 1984): each unique value of the predictors is evaluated to find the best split to fragment the data (uniform cost, Eq. (2a); non-uniform cost, Eq. (2b)). C(j|k) represents the cost of misclassifying a cell that belongs to land use class k into land use class j, as follows:

$$d(t) = \sum_{k=1}^{K} \sum_{j=1}^{K-1} p(j|t)\,p(k|t) = \frac{1}{2}\left(1 - \sum_{k=1}^{K} p^2(k|t)\right) \tag{2a}$$

$$d(t) = \sum_{k=1}^{K} \sum_{j=1}^{K-1} p(j|t)\,p(k|t)\,\bigl(C(j|k) + C(k|j)\bigr) \tag{2b}$$

The best split in node t is the one that most reduces the node impurity function (d(t)) in the children of node t (Loh, 2010). The gain function (Eq. (3)) can be used to determine the goodness of a split (Paulsen et al., 2011; split s for node t). The gain function uses the distribution of data before and after splitting to make a more homogeneous subset than the previous node. A splitting value is adopted at node t that maximizes the reduction in diversity obtained by the split, where $p_L$ and $p_R$ are the proportions of cells going to nodes $t_L$ (left) and $t_R$ (right), respectively:

$$\Delta d(s,t) = d(t) - p_L\,d(t_L) - p_R\,d(t_R) \tag{3}$$
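To illustrate how the impurity of Eq. (2a) and the gain of Eq. (3) drive split selection, here is a minimal sketch for a single driver. The function names and the toy data are assumptions; the exhaustive scan over unique driver values follows the description of the CART search in the text, not any particular software.

```python
def gini(labels):
    """Uniform-cost node impurity, using the closed form of Eq. (2a)."""
    n = len(labels)
    shares = [labels.count(k) / n for k in set(labels)]
    return 0.5 * (1.0 - sum(p * p for p in shares))


def split_gain(parent, left, right):
    """Reduction in impurity obtained by a split, as in Eq. (3)."""
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return gini(parent) - p_left * gini(left) - p_right * gini(right)


def best_threshold(values, labels):
    """Scan every unique driver value and keep the split with the largest gain."""
    best = (None, 0.0)
    for s in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= s]
        right = [y for x, y in zip(values, labels) if x > s]
        gain = split_gain(labels, left, right)
        if gain > best[1]:
            best = (s, gain)
    return best


distance_to_road = [1, 2, 3, 4]  # hypothetical driver values
changed = [0, 0, 1, 1]           # 0 = no-change, 1 = change
threshold, gain = best_threshold(distance_to_road, changed)
```

In this toy case the split at a value of 2 separates the two classes perfectly, so both children have zero impurity and the gain equals the parent impurity.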

3.2.2. Multivariate Adaptive Regression Splines

MARS is an innovative mathematical model for function approximation (Li et al., 2011). In MARS, knots break the independent variables into subsets (Menéndeza et al., 2011). Any arbitrary function with an irregular shape can be approximated using a large number of knots (Grzesiak et al., 2011). The coefficients of MARS can change for different intervals as well as different predictors (Friedman, 1991). We have generalized MARS for incorporation into the multiple LUC classifications, allowing for multiple responses. We assume that $y = (y_1, y_2, \ldots, y_K)^T \in \mathbb{R}^K$ contains a K-dimensional output which depends on I-dimensional variables $x = (x_1, x_2, \ldots, x_I)^T \in \mathbb{R}^I$ (Gooijer and Ray, 2003; 0 represents no-change while the other integer numbers show different land use changes). The goal of MARS is to construct a data-driven procedure for simultaneous estimation of an unknown function f using Eq. (4). Specifically, each regression function is modeled as a linear combination of S > 0 basis functions $b_s(x_i)$, so that for a function f (where i and k can change from 0 to I and K, respectively):

$$y_k = \hat{f}(x_i) = \sum_{i=1}^{I} \sum_{s=1}^{S_j} \beta_i\, b_s(x_i) \tag{4}$$

Here $S_j$ denotes the number of knots for the corresponding predictors ($x_i$) and $\beta_i$ are regression parameters. To obtain a fast and easily interpretable MARS model, we limit the basis functions to linear terms (only $(x_i - t_{is})_+$ and $(t_{is} - x_i)_+$, where $t_{is}$ is the knot for driver i). The better pair of basis functions is defined as the one that more greatly reduces the sum of squared residuals. The best MARS model is chosen using generalized cross-validation (GCV). For GCV, those pairs of basis functions that contribute less to the goodness-of-fit are eliminated in a backward phase of modeling. GCV takes into account not only estimation errors but also the complexity of the model (Eq. (5); N is total number of basis functions). GCV is calculated as follows, where $\nu$ is the effective number of degrees of freedom, whereby the GCV adds a penalty for adding more input variables to the model (Gooijer and Ray, 2003):

$$\mathrm{GCV} = \frac{\frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - f(x_i)\bigr)^2}{\bigl(1 - (\nu/N)\bigr)^2} \tag{5}$$
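The linear basis functions and the GCV criterion of Eq. (5) can be sketched in a few lines. The function names and numbers are hypothetical, and `effective_dof` stands for the effective number of degrees of freedom in Eq. (5); how that quantity is obtained from a fitted model is not specified here.

```python
def hinge_pair(x, knot):
    """Return the two linear basis functions (x - t)+ and (t - x)+ at a knot t."""
    return max(0.0, x - knot), max(0.0, knot - x)


def gcv(observed, predicted, effective_dof):
    """Mean squared error inflated by the complexity penalty of Eq. (5)."""
    n = len(observed)
    mse = sum((y - f) ** 2 for y, f in zip(observed, predicted)) / n
    return mse / (1.0 - effective_dof / n) ** 2


right_of_knot = hinge_pair(5.0, 3.0)  # only (x - t)+ is active
left_of_knot = hinge_pair(1.0, 3.0)   # only (t - x)+ is active
score = gcv([1, 2, 3, 4], [1, 2, 3, 5], effective_dof=2)
```

A larger effective number of degrees of freedom shrinks the denominator, so a more complex model needs a clearly lower residual error to obtain a better (smaller) GCV.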

4. Adapted rules to remove conflicts in Multiple Classifications

A simple method is suggested to solve the conflict problems in MC (Fig. 4). This is achieved by mutually comparing simulated land use maps and eliminating ambiguous predictions for given cells. A routine was written in C# to eliminate cells that undergo ambiguous predictions. The example in Fig. 5 shows a case with three land use classes (e.g. agriculture, urban and forest); however, the suggested method is also valid for simulated maps with k land use classes. Fig. 4 shows the overview of the proposed algorithm for conflict removal. Let us assume that a 3 × 3 window represents the land use map of the study area (Fig. 5a–c) and that each code represents a different land use class (0, 1, 2 and 3 show other class, agriculture, urban and forest, respectively).

Fig. 5. Panels: (a) reference maps in time 1 and time 2 with the number of transitions for each land use class; (b) suitability map and simulated map for three outputs; (c) suitability maps and simulated maps for classes 1 and 2 (Steps 1 to 4). (a) Using a contingency table to quantify different land use changes between two times and give a rank to land use classes (Step 1 in Fig. 4). In N(i) = j, i and j show the code and the number of transitions for each land use class, respectively. Let us assume that codes 0, 1, 2 and 3 represent other class, agriculture, urban and forest, respectively. In the example, N(0) = 1, N(1) = 4, N(2) = 3 and N(3) = 1. Land use classes 1 and 2, with 4 and 3 observed land use transitions, receive rank 1 and 2, respectively (Step 2 in Fig. 4). (b) Stylized example showing the derivation of a simulated land use map from a suitability map calculated by MARS. The range of suitability values can change from 0 to 3. Conversion from suitability map to simulated map in MARS is not random or threshold based; it is based on suitability values and the number of land use transitions for each land use class (a).

This method requires several steps. First, a contingency table is constructed to summarize the number of different land use transitions (i.e., the number of cells undergoing change for each land use class) between two reference land use maps separated in time (Step 1 in Fig. 4). These values are used to determine the quantity of LUC to occur between two time periods during the final thresholding phase of modeling (moving from Step 1 to Step 2 and from Step 3 to Step 4 in Fig. 5c). In other words, thresholding is not determined by the value of the output or suitability map (e.g., cells with value >0.5 are assigned a 1, those <0.5 are assigned a value of 0); rather, the numbers of cells are predetermined by values in the contingency table. Such a modeling rule imposes a condition called “fixed quantity” (see Pontius et al., 2008; Tayyebi et al., 2013b), which thus does not create modeling errors in quantity (see Fig. 5a as an example). The land use classes receive a rank from high to low based on the number of land use transitions; a higher rank is assigned to a land use class with more reference land use transitions and a lower rank to one with fewer (Step 2 in Fig. 4; Fig. 5a). Thereafter, the suitability maps of each land use class are employed to predict LUC (Step 1 in Fig. 5c; Fig. 5b).
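The fixed-quantity rule for a single suitability map can be sketched as follows. This is one reading of the rule that reproduces the stylized MARS example of Fig. 5b, not the authors' C# routine; the function name is hypothetical and the suitability values are those of the figure, assumed here to be listed in row-major order.

```python
def allocate_fixed_quantity(suitability, quantities):
    """Assign class codes by suitability rank so each class gets a fixed cell count."""
    order = sorted(range(len(suitability)), key=lambda i: suitability[i])
    simulated = [None] * len(suitability)
    start = 0
    for code in sorted(quantities):  # the lowest suitabilities map to code 0
        for cell in order[start:start + quantities[code]]:
            simulated[cell] = code
        start += quantities[code]
    return simulated


# 3 x 3 suitability map (flattened) and transition counts N(0)..N(3) from Fig. 5a.
suitability = [2.64, 2.25, 1.05,
               0.52, 2.88, 0.15,
               1.35, 0.83, 1.82]
quantities = {0: 1, 1: 4, 2: 3, 3: 1}
simulated = allocate_fixed_quantity(suitability, quantities)
```

No fixed threshold such as 0.5 appears anywhere: the number of cells per class is taken from the contingency table, so the simulated map cannot contain a quantity error.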

Second, this sub-component counts and saves the number and location of conflicted cells between the simulated land use map with the highest rank (Step 3 in Fig. 4) and the other simulated land use maps with lower rank (e.g. land uses with code 1 and 2; Step 2 in Fig. 5c), which are mutually compared (k − 1 comparisons between k output classes). The highest rank prediction map in a first run is a final simulated map without any change, because changes occur in the land use prediction map with the lower rank. At each run (k − 1 comparisons), those ambiguous cells are removed from the lower rank suitability map (those cells are assigned to zero, which represents no-change in the final simulated map; Step 3 in Fig. 5c) and the lower rank suitability map is used to predict the land use class again. Thus, the new version of the simulated map for the lower rank land use class is free from cells with ambiguous predictions (Step 4 in Fig. 5c).

Finally, the entire process is repeated for the other k − 1 classes (e.g. between land use classes with code 1 and 3 this time, followed by land use classes with code 1 and 0). For the next run, we hold out the output land use class with the highest rank (e.g. land use with code 1) and the entire process runs with the other lower rank k − 1 classes (e.g. land use class with code 2 has the highest rank this time). This procedure continues sequentially to remove the conflicted cells between different layers in our mutually compared simulated LUC maps. For MC problems with k outputs, (k − 1)! comparisons are necessary, which is computationally intensive.
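The conflict-resolution steps above can be sketched in Python as follows. This is a simplified, greedy reading of the procedure rather than the authors' code: instead of comparing maps pair by pair, each class is allocated in rank order and is blocked from cells already claimed by a higher-ranked class, which is assumed here to give the same end state. All names and numbers are hypothetical.

```python
from typing import Dict, List, Set


def top_n(suitability: List[float], n: int, blocked: Set[int]) -> Set[int]:
    """Indices of the n most suitable cells that are not already blocked."""
    candidates = [i for i in range(len(suitability)) if i not in blocked]
    candidates.sort(key=lambda i: suitability[i], reverse=True)
    return set(candidates[:n])


def resolve_conflicts(suitability: Dict[int, List[float]],
                      quantity: Dict[int, int]) -> List[int]:
    """Build one simulated map (0 = no-change, otherwise the class code).

    Classes are processed from highest to lowest rank. A lower-ranked class
    cannot claim a cell taken by a higher-ranked class, so it is re-allocated
    to its next most suitable cells and still meets its fixed quantity.
    """
    ranking = sorted(quantity, key=lambda code: quantity[code], reverse=True)
    n_cells = len(next(iter(suitability.values())))
    final = [0] * n_cells
    taken: Set[int] = set()
    for code in ranking:
        chosen = top_n(suitability[code], quantity[code], taken)
        for i in chosen:
            final[i] = code
        taken |= chosen
    return final


# Hypothetical suitability maps for classes 1 and 2 over six cells.
suitability = {
    1: [0.90, 0.80, 0.30, 0.70, 0.20, 0.10],
    2: [0.85, 0.10, 0.60, 0.75, 0.50, 0.20],
}
quantity = {1: 3, 2: 2}  # class 1 has more transitions, so it receives rank 1
final_map = resolve_conflicts(suitability, quantity)
```

In this example class 2 would, on its own, claim cells 0 and 3; both are already assigned to class 1, so class 2 falls back to cells 2 and 4.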

5. Comparison of data mining approaches, study area and model building

5.1. Calibration and validation

Data are split into calibration and validation sets. These sets are used, respectively, for building the model and assessing the accuracy. For the calibration run, three models use a k-fold cross validation procedure to examine the model performance (Fig. 6; Refaeilzadeh et al., 2008). Calibration data (60% of data) are

Fig. 5 (continued). Thus, the cells with suitability values closer to 0 and 3 are more likely to map to 0 and 3, respectively. As the example shows, conflict is not possible in MARS because MARS generates only one suitability map. (c) Stylized example showing the derivation of a simulated land use map from a suitability map calculated by CART or LTM-MC. Overview of how conflicting information from different suitability maps is resolved for CART or LTM-MC between land use classes with code 1 and 2 (Steps 3 and 4 in Fig. 4). The number of transitions for land use classes and their rank are derived from (a). Here, we show the process for land use classes with code 1 and 2 because they have the highest rank among land use classes. The blue circle shows the conflict cell between two simulated land use maps (0 and 1 represent no-change and change, respectively). The red cross indicates the conflict cell that is removed from the suitability map with the lower rank (rank 2). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 6. k-fold cross validation.

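Table 1. A contingency table to compare simulated and reference land use maps (rows: simulated map; columns: reference map).

| Simulated map \ Reference map | 1 | 2 | 3 | ... | J | Sum of row |
|---|---|---|---|---|---|---|
| 1 | $n_{11}$ | $n_{12}$ | $n_{13}$ | ... | $n_{1J}$ | $S_1 = \sum_{j=1}^{J} n_{1j}$ |
| 2 | $n_{21}$ | $n_{22}$ | $n_{23}$ | ... | $n_{2J}$ | $S_2 = \sum_{j=1}^{J} n_{2j}$ |
| 3 | $n_{31}$ | $n_{32}$ | $n_{33}$ | ... | $n_{3J}$ | $S_3 = \sum_{j=1}^{J} n_{3j}$ |
| ... | ... | ... | ... | ... | ... | ... |
| J | $n_{J1}$ | $n_{J2}$ | $n_{J3}$ | ... | $n_{JJ}$ | $S_J = \sum_{j=1}^{J} n_{Jj}$ |
| Sum of column | $A_1 = \sum_{j=1}^{J} n_{j1}$ | $A_2 = \sum_{j=1}^{J} n_{j2}$ | $A_3 = \sum_{j=1}^{J} n_{j3}$ | ... | $A_J = \sum_{j=1}^{J} n_{jJ}$ | Total $= \sum_{j=1}^{J} A_j = \sum_{j=1}^{J} S_j$ |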

urban ranked second (24%); and forest was the least portion with 7% between these three land use classes. The remaining areas were shared by wetland (10%) and shrub/water/barren (7%). Between 1990 and 2000, the amount of urban in the watershed increased

Table 2. List, abbreviations and the date of spatial predictors in SEWI and MRW.

| No. | SEWI in 1990 | MRW in 1978 | Abbreviations |
|---|---|---|---|
| 1 | Elevation | Elevation | Elevation |
| 2 | Aspect | Aspect | Aspect |
| 3 | Distance to Urban | Distance to Urban | Urban |
| 4 | Density of Urban | Density of Urban | Durban |
| 5 | Distance to Forest | Distance to Forest | Forest |
| 6 | Density of Forest | Density of Forest | Dforest |
| 7 | Distance to Agriculture | Distance to Agriculture | Agriculture |
| 8 | Density of Agriculture | Density of Agriculture | Dagriculture |
| 9 | Distance to Shrub | Distance to Shrub | Shrub |
| 10 | Density of Shrub | Density of Shrub | Dshrub |
| 11 | Distance to Wetland | Distance to Wetland | Wetland |
| 12 | Density of Wetland | Density of Wetland | Dwetland |
| 13 | Distance to Park | Distance to Park | Park |


segmented into k equal sized fold data partitions (stratified random). For the calibration run at each iteration (k possible iterations), one of the k folds is used for the validation run and the remaining k − 1 fold data are used to find the pattern in data. Three models (cf. Steinberg and Golovnya, 2006) take the average of k iterations to provide an estimate of the model accuracy. We used the most common cross validation procedure (10-fold). Following the calibration run, the best LTM-MC, CART and MARS models derived from the calibration run are applied to the validation data (other 40% of data; Tayyebi et al., 2012).
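As an illustration of the stratified k-fold procedure described above, here is a minimal Python sketch using only the standard library. It is not the procedure built into the Salford tools or LTM; the function names, the toy labels and the majority-class "model" are hypothetical placeholders.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, keeping class proportions similar in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)  # deal indices round-robin across folds
    return folds


def cross_validate(labels: Sequence[int], k: int,
                   fit: Callable[[List[int]], int],
                   score: Callable[[int, List[int]], float]) -> float:
    """Hold out each fold once, fit on the other k - 1 folds, and average the k scores."""
    scores = []
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        scores.append(score(fit(train), held_out))
    return sum(scores) / k


# Toy data: 60 no-change cells (0) and 40 change cells (1).
labels = [0] * 60 + [1] * 40


def fit(train: List[int]) -> int:
    """Placeholder model: predict the most frequent class in the training folds."""
    return Counter(labels[i] for i in train).most_common(1)[0][0]


def score(model: int, test: List[int]) -> float:
    """Fraction of held-out cells whose class equals the predicted class."""
    return sum(labels[i] == model for i in test) / len(test)


mean_accuracy = cross_validate(labels, k=10, fit=fit, score=score)
```

With 10 folds, each fold holds 6 no-change and 4 change cells, so the placeholder model scores 0.6 on every fold and the averaged estimate is 0.6.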

The percent correct match (PCM) indicates the percentage of the cells for the land use class that were classified correctly by the spatial explicit models (Tayyebi et al., 2011b, 2012; Table 1). PCM can be used to calculate the proportion of cells that undergo change (PCM C) and no-change (PCM NC). The relative operating characteristic (ROC) curve can be used to evaluate the performance of binary (Pontius and Batchu, 2003) and MC (Hand and Till, 2001) problems. For MC, we followed Hand and Till (2001) that extended the ROC for MC by averaging pair-wise comparisons. The MC problem is decomposed into all possible binary problems and the area under the curve is calculated for each class pair. For a specific class, the maximum area under the curve is used as the ROC measure. Because CART and LTM-MC develop unique suitability maps for each of land use classes, the application of CART and LTM-MC for MC can be treated as a binary classification using the conventional ROC (Pijanowski et al., 2006). In contrast, due to one suitability map for all land use classes ranging from 0 to n (the maximum number of land use classes; Fig. 5b) resulting from the MARS model, we followed the adapted ROC for MC given by Hand and Till (2001).
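The two goodness-of-fit measures can be sketched in Python as below. The sketch makes several assumptions that the text does not settle: PCM is computed against the reference (column) total of Table 1, the area under the ROC curve is obtained from pairwise score comparisons, and the multi-class measure is the pairwise average of Hand and Till (2001) rather than the per-class maximum mentioned above. Function names and all numbers are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def pcm(table: Sequence[Sequence[int]], j: int) -> float:
    """Percent correct match for class j (rows: simulated map, columns: reference map)."""
    reference_total = sum(row[j] for row in table)  # A_j in Table 1
    return 100.0 * table[j][j] / reference_total


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a positive cell scores higher than a negative cell (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def hand_till_m(scores: Dict[int, List[float]], labels: Sequence[int]) -> float:
    """Average pairwise AUC; scores[c][i] is the suitability of cell i for class c."""
    pair_values = []
    for a, b in combinations(sorted(scores), 2):
        idx_a = [i for i, y in enumerate(labels) if y == a]
        idx_b = [i for i, y in enumerate(labels) if y == b]
        a_given_b = auc([scores[a][i] for i in idx_a], [scores[a][i] for i in idx_b])
        b_given_a = auc([scores[b][i] for i in idx_b], [scores[b][i] for i in idx_a])
        pair_values.append((a_given_b + b_given_a) / 2)
    return sum(pair_values) / len(pair_values)


# Hypothetical 3-class contingency table.
table = [[8, 1, 1],
         [1, 6, 0],
         [1, 3, 9]]
pcm_values = [pcm(table, j) for j in range(3)]

# Hypothetical per-class suitability scores for six cells with known reference classes.
labels = [0, 0, 1, 1, 2, 2]
scores = {0: [0.9, 0.8, 0.1, 0.2, 0.1, 0.3],
          1: [0.1, 0.2, 0.9, 0.7, 0.3, 0.2],
          2: [0.0, 0.1, 0.2, 0.1, 0.8, 0.9]}
m_value = hand_till_m(scores, labels)
```

Under the fixed-quantity rule the row and column totals of each class coincide, so the choice of denominator for PCM would not change the result in that setting.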

5.2. Study areas

The Muskegon River Watershed (MRW) region is located in the Lower Peninsula of Michigan, USA (Ray et al., 2012). MRW is currently dominated by forests in the north, agriculture in the central portion, and urban in the south (Fig. 7a). In 1978, the dominant land


use was forest, covering more than half of the watershed (53%); agriculture ranked second (23%); and urban was the smallest of these three land use classes at 4.2%. The remaining areas were shared by shrub (10%) and water/wetland/barren (9.7%). Between 1978 and 1998, the amount of urban in the watershed nearly doubled from 4% total coverage to over 7%. Approximately 5% of the agriculture in 1978 was lost by 1998, largely replaced by forest during the 20-year period (Fig. 7c).

The South-Eastern Wisconsin (SEWI) region is located in the south-east of Wisconsin, USA (Alexandridis and Pijanowski, 2013). SEWI is currently dominated by urban in the east and agriculture in the north and south (Fig. 7b). In 1990, the dominant land use was agriculture, covering more than half of the watershed (52%);

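Table 2 (Continued)

| No. | SEWI in 1990 | MRW in 1978 | Abbreviations |
|---|---|---|---|
| 14 | Distance to Stream | Distance to Stream | Stream |
| 15 | Distance to Road | Distance to Road | Road |
| 16 | Slope | Slope | Slope |
| 17 | – | Distance to Water | –, Water |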


Fig. 7. Study areas in USA and quantity of land use changes.


Table 3. Size of samples, time interval and resolution of data for SEWI and MRW.

| Study area | Time interval | Agriculture gain | Forest gain | Urban gain | Other-transitions | Total | Resolution |
|---|---|---|---|---|---|---|---|
| SEWI | 1990–2000 | 135,709 | 79,467 | 491,031 | 314,265 | 1,020,472 | 30 m × 30 m |
| MRW | 1978–1998 | 120,013 | 641,237 | 412,302 | 693,598 | 1,867,150 | 30 m × 30 m |

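Table 4. Variable, split and improvement for CART.

(a) SEWI

| Rank | Variable | Split | Improvement |
|---|---|---|---|
| Main | Forest | 63 | 0.09451 |
| 1 | Agriculture | 75 | 0.06582 |
| 2 | Urban | 51 | 0.03752 |
| 3 | DForest | 0.11346 | 0.02820 |
| 4 | Road | 121 | 0.01776 |
| 5 | Durban | 0.16735 | 0.01632 |
| 6 | Wetland | 51 | 0.01451 |
| 7 | DAgriculture | 0.43037 | 0.01371 |
| 8 | Slope | 3.14572 | 0.01317 |
| 9 | Shrub | 142 | 0.00712 |
| 10 | Park | 1499 | 0.00648 |
| 11 | DShrub | 0.05948 | 0.00610 |
| 12 | DWetland | 0.09972 | 0.00473 |
| 13 | Elevation | 265 | 0.00458 |
| 14 | Water | 101 | 0.00456 |
| 15 | Stream | 121 | 0.00418 |
| 16 | Aspect | 0.50 | 0.00349 |

(b) MRW

| Rank | Variable | Split | Improvement |
|---|---|---|---|
| Main | Agriculture | 345 | 0.06587 |
| 1 | Shrub | 51 | 0.06486 |
| 2 | Road | 114 | 0.03823 |
| 3 | Forest | 190 | 0.03040 |
| 4 | Durban | 0.03543 | 0.01517 |
| 5 | Urban | 270 | 0.01333 |
| 6 | DAgriculture | 0.14085 | 0.00980 |
| 7 | Elevation | 266 | 0.00939 |
| 8 | Wetland | 875 | 0.00910 |
| 9 | DShrub | 0.13359 | 0.00829 |
| 10 | DWetland | 0.04490 | 0.00374 |
| 11 | DForest | 0.37794 | 0.00346 |
| 12 | Stream | 915 | 0.00227 |
| 13 | Park | 2772 | 0.00110 |
| 14 | Slope | 1.14232 | 0.00089 |
| 15 | Aspect | 5.58117 | 0.00056 |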

Table 5. Coefficients, variables and knots in MARS.

(a) BFs in SEWI

| BF | Coefficient | Variable | Sign | Knot |
|---|---|---|---|---|
| 0 | −0.984680 | | | |
| 1 | −0.061248 | Agriculture | − | 67 |
| 2 | 0.035816 | Agriculture | + | 67 |
| 3 | −0.031057 | Urban | + | 42 |
| 4 | −0.015252 | Wetland | − | 161 |
| 5 | 0.014962 | Wetland | + | 161 |
| 6 | −0.001308 | Road | − | 161 |
| 7 | 0.003081 | Road | + | 161 |
| 8 | −0.022361 | Wetland | + | 60 |
| 9 | −0.000101 | Water | − | 90 |
| 10 | −0.006427 | Water | + | 90 |
| 11 | −0.004736 | Shrub | − | 366 |
| 12 | 0.004286 | Shrub | + | 366 |
| 13 | 0.000015 | Forest | − | 67 |
| 14 | −0.004473 | Forest | + | 67 |
| 15 | 0.012819 | Urban | − | 42 |
| 16 | −3.674577 | DUrban | + | 0.7726 |
| 17 | −1.079093 | DUrban | − | 0.7726 |
| 18 | −3.326859 | DAgriculture | + | 0.0790 |
| 19 | −2.891896 | DAgriculture | − | 0.0790 |
| 20 | −0.007839 | Shrub | + | 134 |
| 21 | 0.000535 | Aspect | + | 145 |
| 22 | 0.000145 | Aspect | − | 145 |
| 23 | −0.722788 | DShrub | + | 0.3728 |
| 24 | −1.050544 | DShrub | − | 0.3728 |
| 25 | 0.999261 | DWetland | + | 0.1020 |
| 26 | 0.056512 | DWetland | − | 0.1020 |
| 27 | 0.792730 | DForest | + | −0.0000 |
| 28 | −0.006119 | Slope | − | 2.8624 |
| 29 | −0.027395 | Slope | + | 2.8624 |
| 30 | 0.003083 | Shrub | − | 174 |
| 31 | −0.000693 | Elevation | + | 267 |
| 32 | −0.001118 | Elevation | − | 267 |
| 33 | −1.464719 | DWetland | + | 0.5492 |
| 34 | 0.002106 | Urban | + | 84 |
| 35 | −0.000475 | Stream | + | 240 |
| 36 | 0.000135 | Stream | − | 1218 |
| 37 | 0.000110 | Water | + | 684 |
| 38 | 4.434641 | DAgriculture | − | 0.1045 |
| 39 | 0.000002 | Park | + | 0.0001 |

(b) BFs in MRW

| BF | Coefficient | Variable | Sign | Knot |
|---|---|---|---|---|
| 0 | 1.916899 | | | |
| 1 | −0.023503 | Agriculture | − | 1214 |
| 2 | −0.000051 | Road | + | 189 |
| 3 | 0.002369 | Road | − | 189 |
| 4 | 0.000130 | Wetland | + | 84 |
| 5 | −1.712863 | DAgriculture | − | 0.0122 |
| 6 | 4.617346 | DAgriculture | + | 0.0122 |
| 7 | 0.000037 | Stream | − | 108 |
| 8 | −0.002620 | Stream | + | 108 |
| 9 | −0.005666 | Elevation | − | 339 |
| 10 | −0.004297 | Elevation | + | 339 |
| 11 | 0.019570 | Shrub | + | 984 |
| 12 | 0.016237 | Forest | + | 445 |
| 13 | −0.498543 | DWetland | − | 0.0602 |
| 14 | 0.611026 | DWetland | + | 0.0602 |
| 15 | 0.000564 | Agriculture | − | 1214 |
| 16 | −0.000770 | Shrub | + | 984 |
| 17 | 8.710874 | DForest | − | 0.9444 |
| 18 | −1.011644 | DUrban | + | 0.5038 |
| 19 | −0.500216 | DUrban | − | 0.5038 |
| 20 | −0.000134 | Park | + | 12,554 |
| 21 | −0.000334 | Aspect | − | 64 |
| 22 | −0.001198 | Aspect | + | 64 |
| 23 | −0.000096 | Urban | − | 2323 |
| 24 | 0.000019 | Urban | + | 2323 |
| 25 | 0.222987 | DShrub | − | 0.2228 |
| 26 | 0.452830 | DShrub | + | 0.2228 |
| 27 | 0.000119 | Park | − | 17,017 |
| 28 | 0.000030 | Park | + | 5964 |
| 29 | −0.000777 | Forest | − | 445 |
| 30 | 1.990626 | DAgriculture | + | 0.0755 |
| 31 | −0.014505 | Slope | − | 2.6350 |
| 32 | −0.023725 | Slope | + | 2.6350 |
| 33 | 0.011041 | Elevation | − | 216 |
| 34 | 0.083747 | Elevation | + | 239 |
| 35 | −0.037933 | Elevation | − | 254 |
| 36 | −0.066528 | Elevation | + | 228 |
| 37 | 0.007891 | Elevation | − | 298 |
| 38 | 0.005506 | Elevation | + | 356 |
| 39 | −0.000250 | Wetland | − | 1130 |
| 40 | 0.000162 | Wetland | + | 2165 |

Fig. 8. Visualization of the main splitters in CART. Cells that meet the condition within the nodes go to left side while other cells go to the right side. (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)

from 24% to over 28%. Approximately 5% of the agriculture in 1990 was lost by 2000, largely replaced by urban during the 10 years period (Fig. 7d).

In both study areas, three land use classes (e.g. agriculture, forest and urban) dominated more than 80% of landscape. We keep the description of both study areas short in this manuscript, and readers can find more details about them in our previous studies (See Pijanowski et al., 2002 and Ray et al., 2012 for MRW and Alexandridis and Pijanowski, 2013 for SEWI).

5.3. Model building

Since the LUC process is spatially-explicit, coupling nonlinear tools with GIS has become standard practice within the LUC community as preparing data input and analysis of model output. The land use maps were digitized from aerial photographs at Anderson Level 1 (7 land use classes; Fig. 7a and b; Anderson et al., 1976). We used Salford Systems Software which contained a trial version of CART and MARS (www.salford-systems.com); however, LTM is open source software which is available online (http://ltm.agriculture.purdue.edu). Three models use data or variables at the initial time as input (e.g. spatial drivers in 1990 for SEWI and in 1978 for MRW, respectively; Table 2) and the change and no-change between initial and subsequent time as output (Table 3).


For input data, elevation for both regions was obtained from the USGS’s Shuttle Radar Topography Mission. Slope was then calculated from the elevation using the ArcGIS 10 spatial analyst tool. For distance layers, Euclidean distances were calculated for each cell to urban, forest, wetland, shrub and agriculture in 1978 for MRW and in 1990 for SEWI (Table 2). For density calculation, a neighborhood function (focal function) was used to compute the value at each location based on the input cells in a neighborhood of the central cell (e.g. 1.2 km). We used a circle to define the neighborhood of the central cell and computed the mean of the neighboring cells. The variables in Table 2 are expected to affect urban, forest and agriculture gain in SEWI between 1990 and 2000 and in MRW between 1978 and 1998. For output data, the land use layers at the first and second time (1990 and 2000 for SEWI and 1978 and 1998 for MRW) were reclassified into agriculture, forest, urban and other classes in both regions to make a land use change map as output of each model.
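The density predictors described above can be illustrated with a small pure-Python focal mean over a circular neighborhood. This is only a sketch of the idea, not the ArcGIS focal function: the function names are hypothetical, the central cell is assumed to be part of its own neighborhood, and cells outside the grid are simply ignored. At 30 m resolution a 1.2 km radius would correspond to 40 cells; the toy example uses a radius of 1 cell.

```python
from typing import List, Tuple


def circular_offsets(radius: int) -> List[Tuple[int, int]]:
    """Row/column offsets of all cells whose centre lies within `radius` cells."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def focal_density(grid: List[List[int]], target: int, radius: int) -> List[List[float]]:
    """Share of cells of class `target` within a circular window around each cell."""
    rows, cols = len(grid), len(grid[0])
    offsets = circular_offsets(radius)
    density = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            hits = total = 0
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:  # ignore cells outside the map
                    total += 1
                    hits += int(grid[rr][cc] == target)
            density[r][c] = hits / total
    return density


# Toy land use grid: 2 = urban, 0 = other.
grid = [[2, 2, 0],
        [2, 0, 0],
        [0, 0, 0]]
urban_density = focal_density(grid, target=2, radius=1)
```

A production workflow would more likely use a raster library and a precomputed circular kernel, but the arithmetic is the same: the mean of a binary class indicator inside the window.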

Because of the large number of cells in SEWI and MRW, we used stratified random sampling to take 1,020,472 samples for SEWI and 1,867,150 samples for MRW (Table 3). The size of each simulation, type of land use change, time interval and the resolution of data for each region are given in Table 3. We used stratified random sampling again to split data into two mutually exclusive sets: training (approximately 60% of the data) and testing (the other 40% of data)

Fig. 9. Terminal node in CART (gray, red, yellow and green represent no-change, urban change, agriculture change and forest change, respectively). (For interpretation of the references to color in the text, the reader is referred to the web version of this article.)

for each region. Three models were developed using the exact same 16 variables in 1990 for SEWI and 17 variables in 1978 for MRW as inputs (Table 2) and agriculture, urban and forest change maps between two times (1990–2000 in SEWI and 1978–1998 in MRW) as outputs (Table 3). After training the three models, the best tree model of CART, the best network file of LTM-MC and the best MARS model (Supplement 2, 3 and 4) were applied to create the suitability map for the validation run. We used a contingency table to compare the reference land use maps between two times for each region to quantify land use change for each land use class (e.g. fix the quantity of land use gain between two times; Table 3). We then used the fixed quantity of land use change for agriculture, forest and urban in SEWI and MRW (Table 3) to simulate the land use change in the subsequent time, and finally compared these simulated results for each model to the reference land use map of the subsequent time.

6. Results

6.1. CART simulation

Red, green and yellow nodes in the tree include cells that encounter urban, forest and agriculture gain, respectively; however, gray nodes (Fig. 8a and b) include cells that encounter other land use transitions or stay in the same land use class. Distance

Fig. 10. ANOVA in MARS.

to forest and agriculture in SEWI and distance to agriculture, road and forest in MRW are the most significant drivers to simulate urban, forest and agriculture gain simultaneously (Fig. 8a and b). The improvement score measures the quality of the split (larger scores are better); this is where the variance reduction occurs due to the split (Table 4 in decreasing order). In SEWI, distance to agriculture splits at the value 75 m and yields an improvement of 0.065, much lower than the main splitter (distance to forest with 0.094 improvement). Similarly, in MRW, distance to shrub is the next best variable; it splits at the value 51 m and yields an improvement of 0.064, quite similar to the main splitter, distance to agriculture, with 0.065 improvement (Table 4).
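To make the idea of an improvement score concrete, the following Python sketch computes the decrease in node impurity produced by a single split. It assumes the Gini index as the impurity measure, which is a common choice for classification trees but is not stated in the text; the function names and the toy data are hypothetical, and the values are unrelated to those in Table 4.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_improvement(values: Sequence[float], labels: Sequence[str],
                      threshold: float) -> float:
    """Impurity of the parent node minus the size-weighted impurity of its two children.

    Cells that meet the condition (value <= threshold) go to the left child.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that sends every cell to one side improves nothing
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)


# Toy example: distance to agriculture (m) and the observed transition of six cells.
distance = [10, 40, 60, 90, 120, 200]
transition = ["agri", "agri", "agri", "other", "other", "other"]

improvement_at_75 = split_improvement(distance, transition, threshold=75)
improvement_at_50 = split_improvement(distance, transition, threshold=50)
```

In this toy case the split at 75 m separates the two classes perfectly and removes all impurity, while the split at 50 m leaves one agriculture cell on the wrong side and achieves only half the improvement.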

The cells located within the red, green and yellow nodes with larger suitability values have the greatest chance for urban, forest and agriculture gain, respectively (Fig. 8a and b). In SEWI, the cells at node 8 (distance to forest less than 80 m), node 20 (distance to wetland over 80 m) and node 18 (distance to agriculture less than 75 m) have the greatest suitability for forest, urban and agriculture gain, respectively (Fig. 8a). In MRW, the cells within node 17 (distance to shrub over 40 m), node 14 (density of urban greater than 0.04) and node 18 (distance to wetland over 60 m) have the greatest suitability for forest, urban and agriculture gain, respectively (Fig. 8b). This procedure continues from the node with greater suitability values to other nodes with lower suitability values until CART satisfies the total number of reference LUC for each land use class (Table 3). CART assigns the same suitability value to all cells within each node.

Fig. 9 enables LUC modelers to find the nodes with their potential interest by sorting the nodes from high to low concentrations for different land use classes. Fig. 9 provides a representation of the ability of the nodes in CART to capture the LUC pattern (see Fig. 8 to find the node number). For example, farmers, urban planners and natural resource managers are interested to know where more agriculture gain, urbanization and forest gain occur in the given region, respectively (e.g. nodes 3 and 18 in SEWI for agriculture gain; nodes 10 and 14 in MRW for urbanization and nodes 7 and 11

Fig. 11. The calculated PCM for non-change and agriculture, urban and forest gain in SEWI and MRW. ROC of three models for agriculture, urban and forest gain in SEWI and MRW.

in SEWI for forest gain). This figure also represents each land use class with its own natural color (other land use transitions, agriculture, forest and urban are gray, yellow, green and red, respectively). Nodes 3, 5, 7 and 14 have the highest concentration of agriculture, urban, forest gain and other land use transitions in SEWI, respectively (Fig. 9). Similarly, nodes 11, 10, 19 and 13 have the highest concentration of agriculture, urban, forest gain and other land use transitions in MRW, respectively (Fig. 9).

6.2. MARS simulation

The variable with the larger standard deviation (ANOVA) has the more explanatory power to describe the relationship between the inputs and outputs. Distance to agriculture with standard deviation 0.74 and 0.29 shows greater contributions to simulate land use transitions in SEWI and MRW, respectively (Fig. 10). Following those variables, distance to urban with standard deviation of 0.31 and distance to shrub with standard deviation 0.25 indicate greater contribution to simulate land use transitions in SEWI and MRW, respectively (Fig. 10). In addition, distance to shrub with 4 basis functions in SEWI and elevation with 8 basis functions in MRW include the highest number of basis functions in the MARS (Table 5; Supplement 5 and 6).
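To show how basis functions of the kind listed in Table 5 combine into a suitability value, the Python sketch below evaluates a MARS-style sum of hinge functions. It assumes the usual MARS convention that sign "+" denotes max(0, x − knot) and sign "−" denotes max(0, knot − x); Table 5 does not state this explicitly. Only the intercept and the first two SEWI basis functions are used, so the result is a partial contribution of distance to agriculture, not a full model prediction. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (coefficient, variable, sign, knot), following the columns of Table 5.
Term = Tuple[float, str, str, float]


def hinge(x: float, sign: str, knot: float) -> float:
    """Hinge basis function; the sign convention is an assumption (see lead-in)."""
    return max(0.0, x - knot) if sign == "+" else max(0.0, knot - x)


def mars_predict(intercept: float, terms: List[Term], cell: Dict[str, float]) -> float:
    """Intercept plus the weighted sum of hinge functions evaluated for one cell."""
    return intercept + sum(coef * hinge(cell[var], sign, knot)
                           for coef, var, sign, knot in terms)


# Intercept (BF 0) and BFs 1 and 2 of the SEWI model in Table 5.
intercept = -0.984680
terms: List[Term] = [
    (-0.061248, "Agriculture", "-", 67.0),
    (0.035816, "Agriculture", "+", 67.0),
]

at_knot = mars_predict(intercept, terms, {"Agriculture": 67.0})
beyond_knot = mars_predict(intercept, terms, {"Agriculture": 167.0})
before_knot = mars_predict(intercept, terms, {"Agriculture": 0.0})
```

Each pair of "+" and "−" terms sharing a knot gives the fitted response a different slope on either side of that knot, which is how MARS represents a non-linear relationship with piecewise linear segments.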

6.3. Comparison of data mining models

According to the PCM (Fig. 11a), LTM-MC and CART had similar accuracy and were more accurate than MARS in simulating urban, agriculture and forest gain in both regions. Only for forest gain in SEWI was the difference between LTM-MC and CART on the one hand and MARS on the other large. The difference is due to the fact that few cells experienced forest gain during the 10 years (around 7.7%). According to ROC (Fig. 11b), LTM-MC and CART outperformed MARS using validation data significantly and LTM-MC performed slightly better

than CART. ROC for the three models was similar in SEWI; however, ROC for LTM-MC and CART was greater than the ROC for MARS in MRW.

7. Conclusions and discussion

Classification algorithms help to understand the existing pattern in data and can be used to predict the land use class using suitable data mining techniques. Most of the LUC models developed to date have been designed for binary predictions (Pontius et al., 2001). This paper reconfigured three nonlinear data mining tools to simulate multiple LUC for two diverse areas of the Midwest USA (e.g. agriculture, forest and urban). Here, we modified LTM, CART and MARS, which were originally developed for binary LUC modeling, for MC. We found that the one-per-class coding strategy (e.g., LTM-MC) is more appropriate than a distributed output coding strategy (e.g., CART and MARS) for modeling MC LUC problems. Results of our work also support the conclusion that LTM-MC, CART and MARS can be used to model multiple LUC.

Model assessment for binary classification is straightforward where scientists usually use a contingency table to compare simulated and referenced LUC maps in two categories, change and no-change (Pijanowski et al., 2005). However, very few studies have focused on building a LUC model for MC (Li and Yeh, 2002a,b) or compared different LUC models with each other (Pontius et al., 2008) for multiple LUC modeling. Thus, this study adds to our ability to simulate multiple land use transitions using nonlinear tools. Results show that three data mining models performed reasonably well using a hard classification LUC model goodness of fit metric (e.g. PCM) and a soft classification (e.g. ROC) method. We also found that LTM-MC gave consistently a better goodness of fit than CART and MARS tools for each land use class. ANNs provided better accuracy but little explanatory power (Fig. 11). On the other hand, CART and MARS provided clearer explanations of what drivers are most important and values associated with each kind of land use transition (Tables 4 and 5).

Results also show that LUC is sequential in the Midwest of the USA (e.g. SEWI and MRW). The areas nearby forests are converted to agriculture and agriculture lands are then converted to urban (Ray et al., 2012). The former land use transition is normally faster than the latter one. The ambiguous predictions in MC occur due to the complexity of the LUC patterns, where data mining approaches cannot draw distinct boundaries between land use classes. The number of cells that experience ambiguous predictions depends on the ability of data mining procedures to discriminate between land use classes, strength of drivers, quantity of reference changes for each land use class and number of output land use classes. Here, we developed a simple method to solve the conflict problems in MC. This is achieved by eliminating ambiguous predictions for given cells in simulated land use maps mutually and is applicable for simulated land use maps with arbitrary land use classes.

More work, however, is needed to fully understand how to model multiple land use transitions. LUC models, especially for MC, generate both false positive (i.e., assigning a cell to an incorrect land use class) and false negative (i.e., not assigning a cell to a correct land use class) errors. Models with different algorithms not only have different classification performances, but also their misclassification rates (false positive and false negative) are variable across spatial and temporal scales. Thus, using only one model may not be ideal and models can complement each other. Using two or more powerful tools, such as CART and MARS together, in conjunction with the model that fits the best (LTM-MC), helps to reinforce explanation of the underlying complex process occurring in LUC.


Acknowledgements

Funding to complete this research was obtained through the USGS Climate Change Research Program, the Great Lakes Fishery Trust, the Wege Foundation, and the Department of Forestry and Natural Resources, Purdue University.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jag.2013.11.008.

References

Alexandridis, K., Pijanowski, B.C., Lei, Z., 2007. Assessing multi-agent parcelization performance in the MABEL simulation model using Monte Carlo replication experiments. Environment and Planning B 34, 223–244.

Alexandridis, K., Pijanowski, B.C., 2013. Spatially-explicit Bayesian information entropy metrics for calibrating landscape transformation models. Entropy 15 (7), 2480–2509.

Allwein, E., Schapire, R., Singer, Y., 2000. Reducing multiclass to binary: a unifying approach for margin classifiers. Journal of Machine Learning Research, 113–141.

Anderson, J.R., Hardy, E.E., Roach, J.T., Witmer, R.E., 1976. A Land Use and Land Cover Classification System for Use with Remote Sensor Data. US Geological Survey, Professional Paper 964: 28, Reston, VA.

Ayoubloo, M.K., Azamathulla, H.M., Jabbari, E., Zanganeh, M., 2011. Predictive model-based for the critical submergence of horizontal intakes in open channel flows with different clearance bottoms using CART ANN and linear regression approaches. Expert Systems with Applications 38, 10114–10123.

Ballestores Jr., F., Qiu, Z., Nedorezova, B.N., Nedorezov, L.V., Ferrarini, A., Ramathilaga, A., Ackah, M., 2012. An integrated parcel-based land use change model using cellular automata and decision tree. Proceedings of the International Academy of Ecology and Environmental Sciences 2 (2), 53–69.

Blockeel, H., Raedt, L.D., Jacobs, N., Demoen, B., 1999. Scaling up inductive logic programming by learning from interpretations. Data Mining and Knowledge Discovery 3 (1), 59–93.

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.

Caruana, R., 1997. Multi-task learning. Machine Learning 28 (1), 41–75.

Chen, M.Y., 2011. Predicting corporate financial distress based on integration of decision tree classification and logistic regression. Expert Systems with Applications 38 (2011), 11261–11272.

Clarke, K.C., Hoppen, S., Gaydos, L., 1997. A self-modifying cellular automaton model of historical urbanization in the San Francisco Bay area. Environment and Planning B: Planning and Design 24, 247–261.

Dietterich, T.G., Bakiri, G., 1995. Solving multiclass learning problems via error correcting output codes. Journal of Artificial Intelligence Research 39, 1–38.

Dubchak, I., Muchnik, I., Mayor, C., Dralyuk, I., Kim, S.H., 1999. Recognition of a protein fold in the context of the Structural Classification of Proteins (SCOP) classification. Proteins 35, 401–407.

Friedman, J.H., 1991. Multivariate Adaptive Regression Splines (with discussion). Annals of Statistics 19, 1.

Friedman, J.H., 1996. Another approach to polychotomous classification. Technical report. Stanford University.

Gooijer, D.J., Ray, G.B.K., 2003. Modeling vector nonlinear time series using POLYMARS. Computational Statistics & Data Analysis 42, 73–90.

Grzesiak, W., Zaborski, D., Sablik, P., Zukiewicz, A., Dybus, A., Szatkowsk, I., 2011. Detection of cows with insemination problems using selected classification models. Computers and Electronics in Agriculture 74 (2010), 265–273.

Hastie, T., Tibshirani, R., 1998. Classification by pairwise coupling. In: Jordan, M.I., Kearns, M.J., Solla, S.A. (Eds.), Advances in Neural Information Processing Systems, vol. 10. The MIT Press.

Hand, D.J., Till, R.J., 2001. A simple generalization of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171–186.

Ho, T.K., 2000. Complexity of classification problems and comparative advantages of combined classifiers. Lecture Notes in Computer Science 1857, 97–106.

Irwin, E., Geoghegan, J., 2001. Theory, data, methods: developing spatially-explicit economic models of land use change. Agriculture, Ecosystems & Environment 85, 7–24.

Jenerette, G.D., Wu, J., 2001. Analysis and simulation of land-use change in the central Arizona – Phoenix region, USA. Landscape Ecology 16, 611–626.

Lambin, E.F., Geist, H.J. (Eds.), 2006. Land-use and Land-Cover Change: Local Processes and Global Impacts. Springer, Berlin.

Li, X., Yeh, A.G.O., 2002a. Neural-network-based cellular automata for simulating multiple land use changes using GIS. International Journal of Geographical Information Science 16 (4), 323–343.

Li, X., Yeh, A.G.O., 2002b. Urban simulation using principal components analysis and cellular automata for land use planning. Photogrammetric Engineering and Remote Sensing 68 (4), 341–351.

Li, Y.F., Ng, S.H., Xie, M., Goh, T.N., 2011. A systematic comparison of metamodeling techniques for simulation optimization in Decision Support Systems. Applied Soft Computing 10 (2010), 1257–1273.


oh, W.-Y., 2010. Tree-structured classifiers. WIREs Computational Statistics 2,364–369.

enéndeza, L.A., de Cos Juez, F.J., Lasheras, F.S., Riesgo, J.A.A., 2011. Artificial neuralnetworks applied to cancer detection in a breast screening programme. Mathe-matical and Computer Modelling 52 (2010), 983–991.

arker, D.C., Manson, S.M., Janssen, M.A., Hoffmann, M.J., Deadman, P., 2003. Multi-agent systems for the simulation of land-use and land-cover change: a review.Annals of the Association of American Geographers 93 (2), 314–337.

Paulsen, P., Smulders, F.J.M., Tichy, A., Aydin, A., Höck, C., 2011. Application of Classification and Regression Tree (CART) analysis on the microflora of minced meat for classification according to Reg. (EC) 2073/2005. Meat Science 88 (2011), 531–534.

Pijanowski, B.C., Brown, D.G., Shellito, B.A., Manik, G.A., 2002. Using neural networks and GIS to forecast land use changes: a Land Transformation Model. Computers, Environment and Urban Systems 26, 553–575.

Pijanowski, B.C., Pithadia, S., Shellito, B.A., Alexandridis, K., 2005. Calibrating a neural network based urban change model for two metropolitan areas of Upper Midwest of the United States. International Journal of Geographical Information Science 19, 197–215.

Pijanowski, B.C., Alexandridis, K., Mueller, D., 2006. Modeling urbanization patterns in two diverse regions of the world. Journal of Land Use Science 1, 83–108.

Pijanowski, B.C., Tayyebi, A., Delavar, M.R., Yazdanpanah, M.J., 2009. Urban expansion simulation using geographic information systems and artificial neural networks. International Journal of Environmental Research 3 (4), 493–502.

Pijanowski, B.C., Tayyebi, A., Doucette, J., Pekin, B.K., Braun, D., Plourde, J., 2013. A big data urban growth simulation at a national scale: configuring the GIS and neural network based Land Transformation Model to run in a high performance computing environment. Environmental Modeling & Software 51, 250–268.

Pontius Jr., R.G., Cornell, J., Hall, C., 2001. Modeling the spatial pattern of land-use change with GEOMOD2: application and validation for Costa Rica. Agriculture, Ecosystems & Environment 85 (1–3), 191–203.

Pontius Jr., R.G., Batchu, K., 2003. Using the relative operating characteristic to quantify certainty in prediction of location of land cover change in India. Transactions in GIS 7 (4), 467–484.

Pontius Jr., R.G., Connors, J., 2006. Expanding the conceptual, mathematical, and practical methods for map comparison. In: Conference Proceedings of the Meeting of Spatial Accuracy, Lisbon, Portugal, 16 pp.

Pontius Jr., R.G., Boersma, W., Castella, J.C., Clarke, K., de Nijs, T., Dietzel, C., Duan, Z., Fotsing, E., Goldstein, N., Kok, K., Koomen, E., Lippitt, C.D., McConnell, W., Mohd Sood, A., Pijanowski, B., Pithadia, S., Sweeney, S., Trung, T.N., Veldkamp, A.T., Verburg, P.H., 2008. Comparing input, output, and validation maps for several models of land change. Annals of Regional Science 42, 11–47.

Refaeilzadeh, P., Tang, L., Liu, H., 2008. Cross-validation. http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf

Ralha, C.G., Abreu, C.G., Coelho, C.G., Zaghetto, A., Macchiavello, B., Machado, R.B., 2013. A multi-agent model system for land-use change simulation. Environmental Modeling & Software 42, 30–46.

Ray, D.K., Pijanowski, B.C., Kendall, A.D., Hyndman, D.W., 2012. Coupling land use and groundwater models to map land use legacies: assessment of model uncertainties relevant to land use planning. Applied Geography 34 (2012), 356–370.


Rifkin, R., Klautau, A., 2004. In defense of one-vs-all classification. Journal of Machine Learning Research 5, 101–141.

Steinberg, D., Colla, P., 1997. CART-Classification And Regression Trees: A Supplementary Manual for Windows. Salford Systems Inc., San Diego.

Steinberg, D., Golovnya, M., 2006. CART 6.0 User’s Manual. Salford Systems, San Diego, CA.

Tayyebi, A., Delavar, M.R., Yazdanpanah, M.J., Pijanowski, B.C., Saeedi, S., Tayyebi, A.H., 2010. A Spatial Logistic Regression Model for Simulating Land Use Patterns: A Case Study of the Shiraz Metropolitan Area of Iran. Advances in Earth Observation of Global Change. Springer, Netherlands, pp. 27–42.

Tayyebi, A., Pijanowski, B.C., Tayyebi, A.H., 2011a. An urban growth boundary model using neural networks, GIS and radial parameterization: an application to Tehran, Iran. Landscape and Urban Planning 100, 35–44.

Tayyebi, A., Pijanowski, B.C., Pekin, B., 2011b. Two rule-based urban growth boundary models applied to the Tehran metropolitan area, Iran. Applied Geography 31, 908–918.

Tayyebi, A., Pekin, B.K., Pijanowski, B.C., Plourde, J.D., Doucette, J., Braun, D., 2012. Hierarchical modeling of urban growth across the conterminous USA: developing meso-scale quantity drivers for the Land Transformation Model. Journal of Land Use Science, 1–12.

Tayyebi, A., Perry, P.C., Tayyebi, A.H., 2013a. Predicting the expansion of an urban boundary using spatial logistic regression and hybrid raster–vector routines with remote sensing and GIS. International Journal of Geographical Information Science, 1–21 (in press).

Tayyebi, A.H., Tayyebi, A., Khanna, N., 2013b. Assessing uncertainty dimensions in land use change models using swap and multiplicative error models by injecting attribute and positional errors in spatial data. International Journal of Remote Sensing 35 (1), 149–170.

Timofeev, R., 2004. Classification and regression trees (CART) theory and applications. Master’s thesis. Humboldt University Berlin.

Tsoumakas, G., Katakis, I., Vlahavas, I., 2010. Mining multi-label data. In: Maimon, O., Rokach, L. (Eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed. Springer, pp. 667–685 (Chapter 34).

Veldkamp, A., Lambin, E.F., 2001. Editorial: predicting land-use change. Agriculture, Ecosystems & Environment 85, 1–6.

Verburg, P.H., Soepboer, W., Veldkamp, A., Limpiada, R., Espaldon, V., Mastura, S., 2002. Modeling the spatial dynamics of regional land use: the CLUE-S model. Environmental Management 30 (2), 391–405.

Wang, K., Zhou, S., Liew, S., 1999. Building hierarchical classifiers using class proximities. In: Proceedings of the International Conference on Very Large Data Bases (VLDB).

Washington, C., Pijanowski, B., Campbell, D., Olson, J., Kinyamario, J., Irandu, E., Nganga, J., Gicheru, P., 2010. Using a role-playing game to inform the development of land-use models for the study of a complex socio-ecological system. Agricultural Systems, http://dx.doi.org/10.1016/j.agsy.2009.10.002.

Yap, B.W., Ong, S.H., Husain, N.H.M., 2011. Using data mining to improve assessment of credit worthiness via credit scoring models. Expert Systems with Applications 38, 13274–13283.