research article decision tree approach for soil...

9
Hindawi Publishing Corporation e Scientific World Journal Volume 2013, Article ID 346285, 8 pages http://dx.doi.org/10.1155/2013/346285 Research Article Decision Tree Approach for Soil Liquefaction Assessment Amir H. Gandomi, 1 Mark M. Fridline, 2 and David A. Roke 1 1 Department of Civil Engineering, e University of Akron, Akron, OH 44325, USA 2 Department of Statistics, e University of Akron, Akron, OH 44325, USA Correspondence should be addressed to David A. Roke; [email protected] Received 30 August 2013; Accepted 1 October 2013 Academic Editors: G.-C. Fang and S. Kazemian Copyright © 2013 Amir H. Gandomi et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. ree decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. e DT results are compared to the logistic regression (LR) model. e results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. e best DT models are interpreted and evaluated based on an engineering point of view. 1. Introduction Empirical classification techniques remain a highly researched topic, especially for real-world problems. In general, there are two types of classification techniques, statistical techniques and soſt computing-based techniques. e most well-known soſt computing techniques are artificial neural networks (ANNs), support vector machines (SVMs), and genetic programming (GP) [1]. ese techniques are heuristic and mostly have a random nature; therefore, several runs are required to achieve useful results, even for a constant parameter setting. In contrast, statistical classifiers do not have a random nature and they are mathematically proven. e most common statistical techniques are logistic regression (LR) and decision tree (DT). Although DT techniques have been successfully applied to several real- world problems, they have been rarely used in engineering, especially geotechnical engineering (e.g., [2]). Soil liquefaction is a process by which intergranular stresses vanish within the soil. Generally, loose and saturated sandy soils are susceptible to liquefaction. Seismic shaking, nonseismic vibration, or waved-induced shear stresses can cause dynamic liquefaction. e application of noncyclic shear stresses can cause static liquefaction in some loose sediments [3]. During the liquefaction process, a mass of saturated sandy soil tends to decrease in volume. If drainage conditions are not met or the drainage velocity is low, this decrease in volume will be accompanied with increased pore water pressure. ereaſter, the pore water pressure () will gradually equal the total stress () in the soil. en, the value of effective stress ( ) becomes zero; consequently, shear strength () approaches zero: = − , = tan . (1) In such a case, the saturated sandy soil functions as a liquid and cannot bear any shear strength, resulting in flow of the sand [4]. e potential failure of critical structures due to liquefac- tion is a major concern in geotechnical engineering. us, extensive research has been conducted to understand the liquefaction phenomenon. e results obtained from in situ tests such as the standard penetration test (SPT) and the cone penetration test (CPT) are widely used for the evaluation of the soil liquefaction potential (e.g., [5, 6]). As the most common in situ test, SPT provides an approximate measure of the dynamic soil resistance, as well as a disturbed drive sample. In order to perform this test, a hollow thick-walled tube is driven into the ground and the number of blows to advance the split-barrel sampler a vertical distance of 300 mm

Upload: others

Post on 10-Mar-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Hindawi Publishing CorporationThe Scientific World JournalVolume 2013 Article ID 346285 8 pageshttpdxdoiorg1011552013346285

Research ArticleDecision Tree Approach for Soil Liquefaction Assessment

Amir H Gandomi1 Mark M Fridline2 and David A Roke1

1 Department of Civil Engineering The University of Akron Akron OH 44325 USA2Department of Statistics The University of Akron Akron OH 44325 USA

Correspondence should be addressed to David A Roke rokeuakronedu

Received 30 August 2013 Accepted 1 October 2013

Academic Editors G-C Fang and S Kazemian

Copyright copy 2013 Amir H Gandomi et al This is an open access article distributed under the Creative Commons AttributionLicense which permits unrestricted use distribution and reproduction in any medium provided the original work is properlycited

In the current study the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefactionassessment A database containing 620 records of seismic parameters and soil properties is used in this study Three decision treetechniques are used here in two different ways considering statistical and engineering points of view to develop decision rulesTheDT results are compared to the logistic regression (LR) model The results of this study indicate that the DTs not only successfullypredict liquefaction but they can also outperform the LR model The best DT models are interpreted and evaluated based on anengineering point of view

1 Introduction

Empirical classification techniques remain a highlyresearched topic especially for real-world problems Ingeneral there are two types of classification techniquesstatistical techniques and soft computing-based techniquesThemost well-known soft computing techniques are artificialneural networks (ANNs) support vector machines (SVMs)and genetic programming (GP) [1] These techniques areheuristic and mostly have a random nature thereforeseveral runs are required to achieve useful results even for aconstant parameter setting In contrast statistical classifiersdo not have a random nature and they are mathematicallyproven The most common statistical techniques are logisticregression (LR) and decision tree (DT) Although DTtechniques have been successfully applied to several real-world problems they have been rarely used in engineeringespecially geotechnical engineering (eg [2])

Soil liquefaction is a process by which intergranularstresses vanish within the soil Generally loose and saturatedsandy soils are susceptible to liquefaction Seismic shakingnonseismic vibration or waved-induced shear stresses cancause dynamic liquefaction The application of noncyclicshear stresses can cause static liquefaction in some loosesediments [3] During the liquefaction process a mass ofsaturated sandy soil tends to decrease in volume If drainage

conditions are not met or the drainage velocity is low thisdecrease in volume will be accompanied with increased porewater pressure Thereafter the pore water pressure (119906) willgradually equal the total stress (120590) in the soil Then the valueof effective stress (1205901015840) becomes zero consequently shearstrength (120591) approaches zero

1205901015840= 120590 minus 119906

120591 = 1205901015840sdot tan 119906

(1)

In such a case the saturated sandy soil functions as a liquidand cannot bear any shear strength resulting in flow of thesand [4]

The potential failure of critical structures due to liquefac-tion is a major concern in geotechnical engineering Thusextensive research has been conducted to understand theliquefaction phenomenon The results obtained from in situtests such as the standard penetration test (SPT) and the conepenetration test (CPT) are widely used for the evaluationof the soil liquefaction potential (eg [5 6]) As the mostcommon in situ test SPT provides an approximate measureof the dynamic soil resistance as well as a disturbed drivesample In order to perform this test a hollow thick-walledtube is driven into the ground and the number of blows toadvance the split-barrel sampler a vertical distance of 300mm

2 The Scientific World Journal

is measured Using a drop weight system a 635 kg (140 lb)hammer is repeatedly dropped from 076m to achieve threesuccessive increments of 150mm [7] The 119873 value (ldquoblowcountrdquo) or SPT resistance is the sumof the number of blows toadvance the second and third increments SPT can be used forcharacterizing a wide range of soil types CPT is another fastand economical in situ test that provides continuous profilingof geostratigraphy and soil property evaluation [7] The testis performed by pushing a cylindrical steel probe into theground at a constant rate of 20mms and measuring theresistance to penetration The standard penetrometer has aconical tip with 60∘ angle apex 357mm diameter body and150 cm2 friction sleeve This test can be used in soils rangingfrom very soft clays to dense sands CPT provides moreaccurate and reliable data for liquefaction analysis comparedto more conventional soil tests such as cyclic triaxial andsimple shear tests Thus it can be considered as a very goodcomplement to SPT measurements

Classification techniques have beenwidely used for lique-faction modelling Table 1 shows the techniques and the soiltest used in each of them From the table it is clear that softcomputing techniques have been widely used for liquefactionmodeling However statistical methods such as LR and DThave been only used for liquefaction assessment based onCPT dataThis paper presents liquefaction assessments basedon SPT data using both LR and DT

Generally statistical and soft computing techniques arenot based on engineering fundamentals as they are empir-ically developed based on experimental data In the currentstudy we have forced the liquefaction assessment DTs touse SPT results as the first variable to corroborate with theengineering point of viewAnother commonproblemofmostclassification techniques is that they are not interpretableHowever DT does not have this issue and its built modelscan be easily interpreted based on the nature of the param-eters In this study three different DT methodologies areused including chi-squared automatic interaction detection(CHAID) exhaustive CHAID (E-CHAID) and classificationand regression tree (CART) algorithms The DT results arefurther compared with logistic regression (LR) analysis as aclassical benchmark

2 Predictive Modeling Techniques

21 Decision Tree Algorithms Decision trees are becominga more attractive predictive modeling procedure becauseof the easy interpretation by nonstatisticians One of theadvantages of DT analysis is that the relationship betweenthe binary dependent variable and the related independentvariables is clearly illustrated using a tree structure DTanalysis is especially useful when the independent variablesare expressed as both categories and continuous values Thisnonparametric modeling procedure makes no assumptionsabout the underlying data DT analysis determines howindependent variables best combine to explain the outcomeof a given binary dependent variable In simple terms DTsbreak down to ldquoyesrdquo or ldquonordquo statements according to ldquoif-thenrdquologic In DT algorithms the data set is partitioned into two

or more mutually exclusive subsets in each split The goal isto produce subsets of the data which are as homogeneous aspossible with respect to the target (dependent) variable

Kass [8] proposed the CHAID (chi-squared automaticinteraction detection) algorithm as the first tree-based classi-fication technique Continuous predictors are discretized intoseveral groups and changed to ordinal predictors The mainsteps of the algorithm are themerging splitting and stoppingsteps Merging is useful for any potential parent node that hasmore than two categories in this step potential parent nodesaremerged until all adjusted119875 values are less than the defined120572merge Splitting of potential parents is the process used to findthe best split using the lowest119875 value the splitting step selectswhich independent variable to be used to optimally split thenode The stopping step checks if the tree growing processshould be stopped according to certain stopping rules (egreaching the maximum tree depth level or minimum parentor child nodes)

Biggs et al [9] proposed a new CHAID algorithmcalled exhaustive CHAID (E-CHAID) In E-CHAID thebasic CHAID algorithm is changed to improve merging andtesting of dependent variables as a consequence E-CHAID ismore computationally intensive In the E-CHAID algorithmthere is no reference to any 120572merge value category mergingcontinues until only two categories remain Therefore E-CHAID may not suitable for large data sets with manycontinuous predictor variables The program then followsthe splitting and stopping steps as described for the CHAIDalgorithm

Breiman et al [10] proposed the CART (classification andregression trees) algorithmThe main difference between theCART and CHAID algorithms is that the CART procedurewill grow as a purely binary treeTherefore CART results areeasier to understand as parent nodes are always split into 2child nodes A complete binary tree algorithm partitions dataand produces homogeneous subsets In the CART procedurewe first develop the maximum tree and then prune it to avoidoverfitting

22 Logistic Regression Logistic regression models are usedwhen the dependent variable (119910) is binary (eg the depen-dent variable can take the value of 1 with probability ofsuccess 120587 or the value of 0 with probability of failure 1 minus 120587)and the independent variables (119909

119894) are either categorical or

continuous values The following is a representation of thebinary multiple LR model

ln[120587 (119909119894)

1 minus 120587 (119909119894)

] = ln[119875 (119910119894= 1119909

119894)

119875 (119910119894= 0119909

119894)

] = 1205730+

119901

sum

119894=1

120573119894119909119894 (2)

where 1205730is a constant and 120573

119894are the coefficients of the

independent variables in the model Equation (2) called thelikelihood function is used for estimating the LR coefficientsin the model The maximum likelihood estimation methoduses an iterative procedure to find the model coefficientsthat best match the pattern of observations in the sampledata Interpretation of the model comes from transformingthe LR coefficients for each independent variable Simplytake the exponential of the coefficients (119890120573119894) to determine

The Scientific World Journal 3

Table 1 Different liquefaction models proposed in the literature

Type of technique Reference Year Soil test Classification technique Number of records

Soft computing

[11] 1994 SPT ANN 85[12] 1996 CPT ANN 109[13] 1998 SPT ANN 105[14] 2002 CPT ANN 170[15] 2006 CPT and SPT SVM 109 and 85[16] 2007 CPT and SPT SVM 170 and 105[17] 2007 SPT ANN 620[18] 2007 CPT SVM 226[19] 2009 CPT ANN 226[20] 2011 CPT GP 170[21] 2011 SPT GP and ANN 569[22] 2012 CPT GP 170[23] 2013 CPT GP 170[24] 2013 CPT GP 170

Statistical [25] 2006 CPT LR 396[26] 2008 CPT DT 178

the influences of each independent variable on the dependentvariable in terms of the odds ratio To determine if eachmodelcoefficient is statistically significant the Wald test will beused

3 Liquefaction Modeling

31 Experimental Database and Data Preprocessing Adatabase [17] with 620 postearthquake observations and12 variables has been used in this study In addition to theseismic parameters the database includes soil properties andthe standard penetration test (SPT) results of the soil Thebinary dependent variable is liquefaction and we selected avalue of 0 for nonliquefied soils and a value of 1 for liquefiedsoils The database includes the values of the followingindependent variables

119885 soil specimen depth (m)(1198731)60 corrected PST number ()11986575 percent fines less than 75120583m ()119889119908 ground water table depth (m)120590vo total vertical stress (kPa)1205901015840

vo effective vertical stress (kPa)119886119905 threshold acceleration which contributes to thestrain-based procedure and along with strain-basedsafety factor indicates the exceedance of the thresholdstrain (Hanna et al 2007) (g)

120591av1205901015840

vo cyclic stress ratio which illustrates the seismicdemand on soil119881119904 shear wave velocity (ms)1205931015840 initial soil friction angle (∘)119872V earthquake magnitude (Richter)

PHA peak horizontal acceleration at ground surface (g)

The database was randomly divided into training andtesting subsets The training data were used for the modeldevelopment using the algorithms The testing data wereused to evaluate the generalization capability of the obtainedmodels on the unseen data In order to obtain a consistentdata division several combinations of the training and testingsets were considered Out of the 620 records 80 was usedas the training data and the remaining 20 of the data setswere taken for the testing of the generalization capability ofthe DTs and LR

The descriptive statistics of the data used in this studyare given in Table 2 From this table it can be seen thatthe database contains only high magnitude earthquakesThe records indicate that 256 sites were liquefied and theremaining 364 were not liquefied

The correlationmatrix is computed for all parameters andis presented in Table 3 As expected the total and effectivevertical stresses have strong correlation and specimen depthcorrelates strongly with both stresses From this table SPTresults have high correlation with initial soil friction angleThis is expected as the friction angle represents the shearcapacity of soil which is the most important parameter inSPT results

32 Model Construction Using DTs The procedure of theCART algorithm ranks the independent variables in terms oftheir predictive power the CART algorithm therefore servesas a powerful exploratory tool for understanding the underly-ing structure of the data [27]The relative importance of the 12independent variables is shown in Figure 1 Figure 1 indicatesthat the most important predictor variable is friction angle(1205931015840) Therefore the first split of the CART algorithm is basedon friction angle In the CHAID and E-CHAID algorithmsthe lowest adjusted 119875 value in each level determines the mostassociate independent variable After calculation the frictionangle has the lowest adjusted 119875 value at the beginning of

4 The Scientific World Journal

Table 2 Descriptive statistics of variables used in the model development

Predictor Minimum Mean Maximum Standard deviation119885 08 766 198 490(1198731)60 1 1448 75 1139119865 le 75 120583m 1 6299 100 3428119889119908

035 145 10 120120590vo 121 14460 4089 98201205901015840

vo 75 8248 2337 5284119886119905

0 007 085 007120591av1205901015840

vo 012 037 077 015119881119904

37 16698 500 67091205931015840 2346 3196 5208 485119872V 74 749 76 010PHA 018 038 067 015

Table 3 Correlation matrix

119885 (1198731)60 119865

75119889119908

120590vo 1205901015840

vo 119886119905120591av1205901015840

vo 119881119904

1205931015840119872

V PHA Liq119885 1 039 minus025 008 100 098 minus023 minus012 058 053 049 minus011 minus030(1198731)60 039 1 minus057 008 041 042 003 015 040 085 015 017 minus02711986575

minus025 minus057 1 minus018 minus027 minus030 minus012 minus007 minus030 minus054 minus028 minus016 minus006119889119908

008 008 minus018 1 007 021 014 minus027 016 013 021 minus002 minus013120590vo 100 041 minus027 007 1 099 minus022 minus009 059 055 050 minus007 minus0291205901015840

vo 098 042 minus030 021 099 1 minus019 minus010 061 056 055 minus003 minus028119886119905

minus023 003 minus012 014 minus022 minus019 1 minus006 044 minus002 000 011 minus008120591av1205901015840

vo minus012 015 minus007 minus027 minus009 minus010 minus006 1 minus003 009 minus020 090 025119881119904

058 040 minus030 016 059 061 044 minus003 1 047 032 004 minus0211205931015840 053 085 minus054 013 055 056 minus002 009 047 1 029 012 minus035119872V 049 015 minus028 021 050 055 000 minus020 032 029 1 minus011 minus015PHA minus011 017 minus016 minus002 minus007 minus003 011 090 004 012 minus011 1 019Liq minus030 minus027 minus006 minus013 minus029 minus028 minus008 025 minus021 minus035 minus015 019 1

0

10

20

30

40

50

60

70

80

90

100

Nor

mal

ized

impo

rtan

ce (

)

Independent variable

120593998400

F75

120590vo

120591av

120590998400 vo Z

120590998400 vo

(N1) 60

PHA

dW V

s a t

M

Figure 1 Importance of independent variables

the CHAID and E-CHAID algorithms and again serves asthe first split

Based on geotechnical engineering theory the SPT testresults ((119873

1)60) should be the most associated parameter and

should therefore be the first split in theDT algorithmsThere-fore for DT modeling each algorithm was implemented intwo ways

(1) let the DT algorithm determine the first split based onits own statistical method

(2) force the DT algorithm to use SPT test results as itsfirst split

The performance of a DT model mainly depends onthe architecture and parameter settings There are manyparameters that are involved in the DT algorithms For theCHAID and E-CHAID algorithms the choices of maximumtree depth and minimum number of cases in the parent andchild nodes play an important role in model constructionAfter evaluating different values for themaximum tree depthit is apparent that only training results (not the results forunseen data) improve for maximum depths greater than4 Therefore to avoid overfitting we have set the maxi-mum tree depth as 4 Products of ten are considered forthe minimum number of cases in the parent nodes theminimum number of cases in the child nodes is always set as

The Scientific World Journal 5

half of the parent nodesrsquo minimum For this study 30 and 15are set as the values for the parent and child node maximumnumber of cases Two chi-square statistics are commonlyused in the CHAID and E-CHAID algorithms Pearson andlikelihood ratio Pearson is faster than the likelihood ratioand is more suitable for very large databases The likelihoodratio is more robust than Pearson but its calculation is moretime consuming and it is therefore more suitable for a smalldatabase [28] Due to its robustness the likelihood ratio isused in this study as the chi-square statistic for both theCHAID and E-CHAID algorithms

For the CART algorithm the minimum number of casesin the parent and child nodes is set just like the other DTalgorithms The maximum tree depth is set as 7 and pruningis used to avoid overfitting (pruning at least one level in thisproblem) IBM SPSS software package version 21 [29] wasused to perform the analysis

33 Model Construction Using LR In the conventional clas-sification process logistic regression analysis is an importanttool for building a model In this study the LR algorithmwas performed to approximate the predictive power ofthe DT techniques in comparison with another statisticalclassification technique

The LR models cannot be developed using the sameinput variables and data divisions as DTs Before findingthe final LR model any multicollinearity problems betweenthe independent variables were solved by removing thehighly correlated independent variables from the modelSevere multicollinearity can cause instability in the modelcoefficients especially when highly correlated variables areincluded in the model

4 Results and Discussion

As described above and indicated in Table 4 six different DT-based models were obtained for the assessment of the soilliquefaction The DT models for which the SPT results werethe first variable are indicated with the suffix ldquoSPTrdquo Overallperformance of the DT and LR-based models in terms ofmisclassification rate (risk) on the training data and testingdata subsets and the full data set are summarized in Table 4

Two important characteristics of a classification modelare sensitivity and specificity Sensitivity quantifies the pro-portion of correctly identified positives (liquefied soils) andspecificity gives the proportion of correctly identified nega-tives (nonliquefied soils) Sensitivity and specificity of all DTand LR models are also presented in Table 5 Comparing theperformance of DTs and LRs as tabulated in Tables 4 and5 the DT results are clearly better than the LR results theDTs generally have lower risk values and higher sensitivityand specificity values on the training testing and completedatasets An interesting point from these tables is that whenthe DT algorithms are forced to used SPT results as thefirst variable the model performances on the testing datasets(unseen data) are always improved This suggests that thestrategy of using SPT as the first variable is very effective forfuture prediction of new datasets However as expected if

0102030405060708090100110

0 20 40 60 80 100

Gain

Percentile

CHAID-SPTCHAID

Figure 2 Gain plot of the CHAID and CHAID-SPT models

the DTs use their own criteria for choosing the first variableduring the training process performance is improved on thetraining datasets but not the testing datasets

As shown in Table 4 CHAID models (CHAID andCHAID-SPT) produce the lowest risks among all DTs andLRs CHAID-SPThas the best performance on unseen testingdata sets (risk equal to 0169) and regular CHAID has the bestperformance on the training dataset (risk equal to 0141) Aspredicting liquefied soil after earthquakes is very importantthe highest sensitivity value for each set is bolded in Table 5From this table it can be seen that CHAID-SPT has thehighest sensitivity on the testing datasets (sensitivity equal to0867) and CHAID has the highest sensitivity on the trainingdatasets (sensitivity equal to 0806) Therefore these twomodels have been chosen as the final models

Percentage of total records in the target category in eachnode is recognized as gain The gains plot is a line chartof cumulative percentile gains calculated using cumulativepercentile target over total target [28] The gains plot of theselected models (CHAID and CHAID-SPT) for the trainingresults is presented in Figure 2 In this figure the straight lineshows the random selection model and the area between itand the model curve shows the model benefit area Figure 2shows the effectiveness of the CHAID and CHAID-SPTmodels From this figure it is clear that the CHAID modelhas the better benefit on the training data

The final decision tree for the CHAID-SPT model ispresented in Figure 3 This tree has 22 nodes 14 of which areterminal nodes The first level of the tree was split into fourinitial branches corresponding to four ranges of SPT valuesAs seen in the first level of the tree the two nodes for thehighest SPT values are terminal nodes which are interestingand show the importance of the SPT variable For these twonodes (nodes 3 and 4) the chance of liquefaction is very lowwhich corresponds with geotechnical engineering theory

In the second level the percentage of fines is the bestpredictor for the soils with the lowest SPT values and itsthree splits show that liquefaction probability decreases with

6 The Scientific World Journal

Table 4 Risk estimation of the DT and LR models

Model Testing Training AllCHAID 0234 0141 0160E-CHAID 0234 0161 0176CART 0194 0163 0169CHAID-SPT 0169a 0188 0184E-CHAID-SPT 0177 0161 0165CART-SPT 0185 0165 0169LR 0258 0270 0268aBold sets are the lowest risk values for each set

Table 5 Sensitivity and specificity of the DT and LR models

Model Testing Training AllSensitivity Specificity Sensitivity Specificity Sensitivity Specificity

CHAID 0622 0848 0806 0898 0773 0887E-CHAID 0644 0835 0754 0902 0734 0887CART 0667 0886 0739 0909 0727 0904CHAID-SPT 0867a 0810 0763 0849 0781 0841E-CHAID-SPT 0711 0886 0768 0891 0758 0890CART-SPT 0800 0823 0796 0863 0797 0854LR 0822 0696 0673 0772 0699 0755aBold sets are the highest sensitivity values for each set

increasing percentage of fines For larger SPT values (14 ltSPT le 24) peak ground surface horizontal acceleration is thebest predictor In this range increasing peak horizontal accel-eration substantially raises the chance of liquefaction whichcorresponds with earthquake engineering fundamentals

The third level consists largely of terminal nodes and hasbranches based on four different variables earthquake mag-nitude cyclic stress ratio total vertical stress and percentageof fines In this level the liquefaction probability decreaseswith increasing total vertical stress and increasing cyclicstress ratio Like in the second level liquefaction probabilitydecreases with increasing percentage of fines

In the fourth level there are five terminal nodes onenode is split into three ranges of soil friction angle and onenode is split using two ranges of effective vertical stressIn this level the liquefaction probability decreases withincreasing effective vertical stress which corresponds to thetrend of total vertical stress in the third level and geotechnicalengineering fundamentals

The final decision tree for the CHAID model is shown inFigure 4This tree hasmore terminal nodes than theCHAID-SPT tree with 26 nodes (17 of which are terminal nodes)Thefirst level of the tree was split to six initial branches accordingto the initial soil friction angle which was indicated as thevariable most associated with liquefaction according to theCHAID algorithm As seen in the first level of the tree thenode for the highest angles (1205931015840 gt 3929∘) is a terminal nodethat indicates zero probability of liquefaction

In the second level nodes 1 and 2 related to the lowestinitial soil friction angle (1205931015840 le 2734∘) are split using thepercentage of fines This indicates that percentage of finesis most associated with liquefaction prediction for low soilfriction angles From the branches of the nodes 1 and 2 of the

CHAID tree (in the third level) the liquefaction probabilitydecreases with increasing percentage of fines which agreeswith the previous resultsThe soil with an initial friction anglebetween the 2849∘ and 335∘ the cyclic stress ratio is thebest predictor of liquefaction The probability of liquefactionincreases significantly with an increasing cyclic stress ratio

In the third level only one node is a parent node theother six nodes are terminal nodes In the fourth level thereare only two terminal nodes split using the percentage offines From the third and forth levels it can be noted thatliquefaction probability decreases with increasing percentageof fines

Froma comparison of theCHAID-SPT andCHAID treesit can be seen that percentage of fines is the best second-levelpredictor for the lowest SPT values ((119873

1)60le 14) and for the

lowest initial soil friction angle values (1205931015840 le 2734∘) Anothersimilarity of the developed CHAID and CHAID-SPT trees inthe second level is that peak horizontal acceleration is the bestpredictor for midrange values of SPT (14 lt (119873

1)60le 24) and

midrange values of soil friction angle (335∘ lt 1205931015840 le 3929∘)In the second level for the highest values of SPT and initialsoil friction angle there is zero liquefaction probability Fromthese similarities there appears to be a strong correlationbetween SPT and initial soil friction angle This generalconclusion can be also verifiedwith the correlation coefficientof these two parameters (presented in Table 3) From thesedata SPT and initial soil friction angle are high correlatedparameters

5 Summary and Conclusions

Six decision tree models were successfully developed in thisstudy to classify the liquefied and nonliquefied soil using

The Scientific World Journal 7

PHA le 18 g 18 g lt PHA

LiquefactionNo 575Yes 425 Root node

No 524Yes 476

Node 1

No 25Yes 75

Node 5

No 65Yes 935

Node 10

No 42Yes 58 Node 11

No 432Yes 568

Node 6

No 808Yes 192

Node 12

No 255Yes 745

Node 13

No 768Yes 232

Node 7

No 724Yes 276 Node 14

No 761Yes 239 Node 18

No 538Yes 462

Node 19

No 903Yes 97 Node 20

No 100Yes 0 Node 15

No 41Yes 59 Node 2

No 923Yes 77 Node 8

No 23Yes 77

Node 9

No 0Yes 100

Node 16

No 321Yes 679

Node 17

No 156Yes 844 Node 21

No 571Yes 429

Node 22

No 909Yes 91 Node 3

No 100Yes 0 Node 4

(N1)60 le 14 14 lt (N1)60 le 24 24 lt (N1)60 le 28 28 lt (N1)60

F75 le 74 74 lt F75 le 93 93 lt F75

Mw le 74 74lt Mw 028 lt 120590vo le 1741kPa 1741kPa lt 120590vo F75 le 19 19 lt F75

120593998400 le ltle 977kPa 977kPa2734∘ 298∘ lt 1205939984002734 lt 120593998400 le 298

120591av120590998400vo le 028 120591av120590998400vo

120590998400vo120590998400vo

Figure 3 Final CHAID-SPT tree

LiquefactionNo 575Yes 425

Root node

No 383Yes 617

Node 1

No 43Yes 957

Node 7

No 708Yes 292

Node 8

No 585Yes 415

Node 2

No 353Yes 647 Node 9

No 100Yes 0 Node 10

No 294Yes 706

Node 3

No 171Yes 829

Node 11

No 562Yes 438

Node 12

No 48Yes 52

Node 4

No 851Yes 149

Node 13

No 60Yes 40 Node 18

No 100Yes 0 Node 19

No 325Yes 675 Node 14

No 100Yes 0 Node 20

No 206Yes 794 Node 21

No 85Yes 915 Node 25

No 538Yes 462 Node 26

No 0Yes 100 Node 15

No 778Yes 222 Node 5

No 100Yes 0 Node 16

No 651Yes 349

Node 17

No 824Yes 176

Node 22

No 19Yes 81 Node 23

No 92Yes 8 Node 24

No 100Yes 0 Node 6

(N1)60 le 6 18 lt (N1)60 le 24(N1)60 le 186 lt (N1)60 24 lt (N1)60F75 le 74 74 lt F75

F75 le 93 93 lt F75

F75 le 93 93 lt F75 F75 le 97 97 lt F75 at le 007 g 007 g le at le 028

120593998400 le le 2734 le 2849

le 058

2734 lt 12059399840026 lt 12059399840026∘

PHA le 18 g 18 g lt PHA

335 lt 120593998400 le 39292849 lt 120593998400 le 335

058 lt

3929∘ lt 120593998400

028 lt 120591av120590998400vo 120591av120590998400vo120591av120590998400vo

Figure 4 Final CHAID tree

three different techniques Based on geotechnical engineeringtheory three of the decision trees were developed usingSPT as the first variable The results of the DT models werecompared with those of an LR model the results show thatDT models generally have better performance than the LRmodel for this problem

The two best models (CHAID and CHAID-SPT) werechosen based on the risk values and the sensitivity of theliquefaction prediction The CHAID-SPT tree has betterperformance on the unseen (testing) dataset indicating thatbasing DTmodels on engineering fundamentals can improvemodel performance The general interpretation of the treescorresponds well with engineering theory which indicatesthe reliability of themodels From the interpretation results ofthe trees the SPT values and initial soil friction angle values(used as first variables of the trees) have strong correlation

Some terminal nodes have small sample sizes whichmay lead to results that are anecdotal However theseanecdotal conclusions about soil liquefaction assessment arehighly informative from an exploratory point of view Asthe earthquake parameters do not have a wide range theirinterpretation may not be accurate enough Therefore for

future research the database used in this study should beexpanded to a wider range of earthquake parameters

References

[1] AHGandomi X S Yang S Talatahari andAHAlavi ldquoMeta-heuristics in modeling and optimizationrdquo in MetaheuristicApplications in Structures and Infrastructures A H GandomiX S Yang S Talatahari and A H Alavi Eds chapter 1 pp1ndash24 Elsevier New York NY USA 2013

[2] J Park D Ki K Kim S J Lee D H Kim and K J Oh ldquoUsingdecision tree to develop a soil ecological quality assessmentsystem for planning sustainable constructionrdquo Expert Systemswith Applications vol 38 no 5 pp 5463ndash5470 2011

[3] F Darve ldquoLiquefaction phenomenon of granular materials andconstitutive stabilityrdquo Engineering Computations vol 13 no 7pp 5ndash28 1996

[4] S S H Khozaghi and A J A Choobbasti ldquoPredicting ofliquefaction potential in soils using artificial neural networksrdquoElectronic Journal of Geotechnical Engineering vol 12 article0723 2007

8 The Scientific World Journal

[5] H B Seed and I M Idriss ldquoSimplified procedure for evaluatingsoil liquefaction potentialrdquo Journal of the Soil Mechanics andFoundations Division vol 97 no 9 pp 1249ndash1273 1971

[6] K Tokimatsu and Y Yoshimi ldquoEmpirical correlation of soilliquefaction based on SPT N-value and fines contentrdquo Soils andFoundations vol 23 no 4 pp 56ndash74 1983

[7] A Arman N Samtani R Castelli and G Munfakh ldquoGeotech-nical and foundation engineering module 1-subsurface investi-gationsrdquo FHWA-HI-97-021 1997

[8] G Kass ldquoAn exploratory technique for investigating largequantities of categorical datardquo Applied Statistics vol 29 no 2pp 119ndash127 1980

[9] D Biggs B de Ville and E Suen ldquoA method of choosing mul-tiway partitions for classification and decision treesrdquo Journal ofApplied Statistics vol 18 no 1 pp 49ndash62 1991

[10] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees ChapmanampHallCRC NewYork NY USA 1984

[11] A T C Goh ldquoSeismic liquefaction potential assessed by neuralnetworksrdquo Journal of Geotechnical Engineering vol 120 no 9pp 1467ndash1480 1994

[12] A T C Goh ldquoNeural-network modeling of CPT seismic liq-uefaction datardquo Journal of Geotechnical and GeoenvironmentalEngineering vol 122 no 1 pp 70ndash73 1996

[13] Y Najjar and H Ali ldquoOn the use of BPNN in liquefactionpotential assessment tasksrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligent and Mathematical Methods inPavement and Geomechanical Systems N O Attoh-Okine Edpp 55ndash63 1998

[14] A T C Goh ldquoProbabilistic neural network for evaluatingseismic liquefaction potentialrdquo Canadian Geotechnical Journalvol 39 no 1 pp 219ndash232 2002

[15] M Pal ldquoSupport vector machines-based modelling of seismicliquefaction potentialrdquo International Journal for Numerical andAnalytical Methods in Geomechanics vol 30 no 10 pp 983ndash996 2006

[16] H B Zhao Z L Ru and S Yin ldquoUpdated support vectormachine for seismic liquefaction evaluation based on thepenetration testsrdquoMarine Georesources and Geotechnology vol25 no 3-4 pp 209ndash220 2007

[17] AMHannaDUral andG Saygili ldquoNeural networkmodel forliquefaction potential in soil deposits using Turkey and Taiwanearthquake datardquo Soil Dynamics and Earthquake Engineeringvol 27 no 6 pp 521ndash540 2007

[18] A T C Goh and S H Goh ldquoSupport vector machines theiruse in geotechnical engineering as illustrated using seismicliquefaction datardquoComputers and Geotechnics vol 34 no 5 pp410ndash421 2007

[19] A Baykasoglu A Cevik L Ozbakir and S Kulluk ldquoGeneratingprediction rules for liquefaction through data miningrdquo ExpertSystems with Applications vol 36 no 10 pp 12491ndash12499 2009

[20] A H Gandomi and A H Alavi ldquoMulti-stage genetic pro-gramming a new strategy to nonlinear system modelingrdquoInformation Sciences vol 181 no 23 pp 5227ndash5239 2011

[21] C Kayadelen ldquoSoil liquefactionmodeling by genetic expressionprogramming and neuro-fuzzyrdquo Expert Systems with Applica-tions vol 38 no 4 pp 4080ndash4087 2011

[22] A H Gandomi and A H Alavi ldquoA new multi-gene geneticprogramming approach to non-linear system modelingmdashpartII geotechnical and earthquake engineering problemsrdquo NeuralComputing and Applications vol 21 no 1 pp 189ndash201 2012

[23] P M Muduli and S K Das ldquoCPT-based seismic liquefactionpotential evaluation using multi-gene genetic programmingapproachrdquo Indian Geotechnical Journal 2013

[24] A H Gandomi and A H Alavi ldquoHybridizing genetic pro-gramming with orthogonal least squares for modeling of soilliquefactionrdquo International Journal of Earthquake Engineeringand Hazard Mitigation vol 1 no 1 pp 2ndash8 2013

[25] S Y LaiW J Chang and P S Lin ldquoLogistic regression nodel forevaluating soil liquefaction probability using CPT datardquo Journalof Geotechnical and Geoenvironmental Engineering vol 132 no6 pp 694ndash704 2006

[26] G Livingston M Piantedosi P Kurup and T G SitharamldquoUsing decision-tree learning to assess liquefaction potentialfrom CPT and Vsrdquo in Proceedings of the 4th GeotechnicalEarthquake Engineering and Soil Dynamics Congress pp 1ndash10May 2008

[27] H Pham Ed Springer Handbook of Engineering StatisticsSpringer London UK 2006

[28] M J Noruesis IBM SPSS Decision Trees 20 IBM Corporation3rd edition 2011

[29] SPSS Inc ver 21 2013 httpwwwspsscom

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

2 The Scientific World Journal

is measured Using a drop weight system a 635 kg (140 lb)hammer is repeatedly dropped from 076m to achieve threesuccessive increments of 150mm [7] The 119873 value (ldquoblowcountrdquo) or SPT resistance is the sumof the number of blows toadvance the second and third increments SPT can be used forcharacterizing a wide range of soil types CPT is another fastand economical in situ test that provides continuous profilingof geostratigraphy and soil property evaluation [7] The testis performed by pushing a cylindrical steel probe into theground at a constant rate of 20mms and measuring theresistance to penetration The standard penetrometer has aconical tip with 60∘ angle apex 357mm diameter body and150 cm2 friction sleeve This test can be used in soils rangingfrom very soft clays to dense sands CPT provides moreaccurate and reliable data for liquefaction analysis comparedto more conventional soil tests such as cyclic triaxial andsimple shear tests Thus it can be considered as a very goodcomplement to SPT measurements

Classification techniques have beenwidely used for lique-faction modelling Table 1 shows the techniques and the soiltest used in each of them From the table it is clear that softcomputing techniques have been widely used for liquefactionmodeling However statistical methods such as LR and DThave been only used for liquefaction assessment based onCPT dataThis paper presents liquefaction assessments basedon SPT data using both LR and DT

Generally statistical and soft computing techniques arenot based on engineering fundamentals as they are empir-ically developed based on experimental data In the currentstudy we have forced the liquefaction assessment DTs touse SPT results as the first variable to corroborate with theengineering point of viewAnother commonproblemofmostclassification techniques is that they are not interpretableHowever DT does not have this issue and its built modelscan be easily interpreted based on the nature of the param-eters In this study three different DT methodologies areused including chi-squared automatic interaction detection(CHAID) exhaustive CHAID (E-CHAID) and classificationand regression tree (CART) algorithms The DT results arefurther compared with logistic regression (LR) analysis as aclassical benchmark

2 Predictive Modeling Techniques

21 Decision Tree Algorithms Decision trees are becominga more attractive predictive modeling procedure becauseof the easy interpretation by nonstatisticians One of theadvantages of DT analysis is that the relationship betweenthe binary dependent variable and the related independentvariables is clearly illustrated using a tree structure DTanalysis is especially useful when the independent variablesare expressed as both categories and continuous values Thisnonparametric modeling procedure makes no assumptionsabout the underlying data DT analysis determines howindependent variables best combine to explain the outcomeof a given binary dependent variable In simple terms DTsbreak down to ldquoyesrdquo or ldquonordquo statements according to ldquoif-thenrdquologic In DT algorithms the data set is partitioned into two

or more mutually exclusive subsets in each split The goal isto produce subsets of the data which are as homogeneous aspossible with respect to the target (dependent) variable

Kass [8] proposed the CHAID (chi-squared automaticinteraction detection) algorithm as the first tree-based classi-fication technique Continuous predictors are discretized intoseveral groups and changed to ordinal predictors The mainsteps of the algorithm are themerging splitting and stoppingsteps Merging is useful for any potential parent node that hasmore than two categories in this step potential parent nodesaremerged until all adjusted119875 values are less than the defined120572merge Splitting of potential parents is the process used to findthe best split using the lowest119875 value the splitting step selectswhich independent variable to be used to optimally split thenode The stopping step checks if the tree growing processshould be stopped according to certain stopping rules (egreaching the maximum tree depth level or minimum parentor child nodes)

Biggs et al [9] proposed a new CHAID algorithmcalled exhaustive CHAID (E-CHAID) In E-CHAID thebasic CHAID algorithm is changed to improve merging andtesting of dependent variables as a consequence E-CHAID ismore computationally intensive In the E-CHAID algorithmthere is no reference to any 120572merge value category mergingcontinues until only two categories remain Therefore E-CHAID may not suitable for large data sets with manycontinuous predictor variables The program then followsthe splitting and stopping steps as described for the CHAIDalgorithm

Breiman et al [10] proposed the CART (classification andregression trees) algorithmThe main difference between theCART and CHAID algorithms is that the CART procedurewill grow as a purely binary treeTherefore CART results areeasier to understand as parent nodes are always split into 2child nodes A complete binary tree algorithm partitions dataand produces homogeneous subsets In the CART procedurewe first develop the maximum tree and then prune it to avoidoverfitting

22 Logistic Regression Logistic regression models are usedwhen the dependent variable (119910) is binary (eg the depen-dent variable can take the value of 1 with probability ofsuccess 120587 or the value of 0 with probability of failure 1 minus 120587)and the independent variables (119909

119894) are either categorical or

continuous values The following is a representation of thebinary multiple LR model

ln[120587 (119909119894)

1 minus 120587 (119909119894)

] = ln[119875 (119910119894= 1119909

119894)

119875 (119910119894= 0119909

119894)

] = 1205730+

119901

sum

119894=1

120573119894119909119894 (2)

where 1205730is a constant and 120573

119894are the coefficients of the

independent variables in the model Equation (2) called thelikelihood function is used for estimating the LR coefficientsin the model The maximum likelihood estimation methoduses an iterative procedure to find the model coefficientsthat best match the pattern of observations in the sampledata Interpretation of the model comes from transformingthe LR coefficients for each independent variable Simplytake the exponential of the coefficients (119890120573119894) to determine

The Scientific World Journal 3

Table 1 Different liquefaction models proposed in the literature

Type of technique Reference Year Soil test Classification technique Number of records

Soft computing

[11] 1994 SPT ANN 85[12] 1996 CPT ANN 109[13] 1998 SPT ANN 105[14] 2002 CPT ANN 170[15] 2006 CPT and SPT SVM 109 and 85[16] 2007 CPT and SPT SVM 170 and 105[17] 2007 SPT ANN 620[18] 2007 CPT SVM 226[19] 2009 CPT ANN 226[20] 2011 CPT GP 170[21] 2011 SPT GP and ANN 569[22] 2012 CPT GP 170[23] 2013 CPT GP 170[24] 2013 CPT GP 170

Statistical [25] 2006 CPT LR 396[26] 2008 CPT DT 178

the influences of each independent variable on the dependentvariable in terms of the odds ratio To determine if eachmodelcoefficient is statistically significant the Wald test will beused

3 Liquefaction Modeling

31 Experimental Database and Data Preprocessing Adatabase [17] with 620 postearthquake observations and12 variables has been used in this study In addition to theseismic parameters the database includes soil properties andthe standard penetration test (SPT) results of the soil Thebinary dependent variable is liquefaction and we selected avalue of 0 for nonliquefied soils and a value of 1 for liquefiedsoils The database includes the values of the followingindependent variables

119885 soil specimen depth (m)(1198731)60 corrected PST number ()11986575 percent fines less than 75120583m ()119889119908 ground water table depth (m)120590vo total vertical stress (kPa)1205901015840

vo effective vertical stress (kPa)119886119905 threshold acceleration which contributes to thestrain-based procedure and along with strain-basedsafety factor indicates the exceedance of the thresholdstrain (Hanna et al 2007) (g)

120591av1205901015840

vo cyclic stress ratio which illustrates the seismicdemand on soil119881119904 shear wave velocity (ms)1205931015840 initial soil friction angle (∘)119872V earthquake magnitude (Richter)

PHA peak horizontal acceleration at ground surface (g)

The database was randomly divided into training andtesting subsets The training data were used for the modeldevelopment using the algorithms The testing data wereused to evaluate the generalization capability of the obtainedmodels on the unseen data In order to obtain a consistentdata division several combinations of the training and testingsets were considered Out of the 620 records 80 was usedas the training data and the remaining 20 of the data setswere taken for the testing of the generalization capability ofthe DTs and LR

The descriptive statistics of the data used in this studyare given in Table 2 From this table it can be seen thatthe database contains only high magnitude earthquakesThe records indicate that 256 sites were liquefied and theremaining 364 were not liquefied

The correlationmatrix is computed for all parameters andis presented in Table 3 As expected the total and effectivevertical stresses have strong correlation and specimen depthcorrelates strongly with both stresses From this table SPTresults have high correlation with initial soil friction angleThis is expected as the friction angle represents the shearcapacity of soil which is the most important parameter inSPT results

32 Model Construction Using DTs The procedure of theCART algorithm ranks the independent variables in terms oftheir predictive power the CART algorithm therefore servesas a powerful exploratory tool for understanding the underly-ing structure of the data [27]The relative importance of the 12independent variables is shown in Figure 1 Figure 1 indicatesthat the most important predictor variable is friction angle(1205931015840) Therefore the first split of the CART algorithm is basedon friction angle In the CHAID and E-CHAID algorithmsthe lowest adjusted 119875 value in each level determines the mostassociate independent variable After calculation the frictionangle has the lowest adjusted 119875 value at the beginning of

4 The Scientific World Journal

Table 2 Descriptive statistics of variables used in the model development

Predictor Minimum Mean Maximum Standard deviation119885 08 766 198 490(1198731)60 1 1448 75 1139119865 le 75 120583m 1 6299 100 3428119889119908

035 145 10 120120590vo 121 14460 4089 98201205901015840

vo 75 8248 2337 5284119886119905

0 007 085 007120591av1205901015840

vo 012 037 077 015119881119904

37 16698 500 67091205931015840 2346 3196 5208 485119872V 74 749 76 010PHA 018 038 067 015

Table 3 Correlation matrix

119885 (1198731)60 119865

75119889119908

120590vo 1205901015840

vo 119886119905120591av1205901015840

vo 119881119904

1205931015840119872

V PHA Liq119885 1 039 minus025 008 100 098 minus023 minus012 058 053 049 minus011 minus030(1198731)60 039 1 minus057 008 041 042 003 015 040 085 015 017 minus02711986575

minus025 minus057 1 minus018 minus027 minus030 minus012 minus007 minus030 minus054 minus028 minus016 minus006119889119908

008 008 minus018 1 007 021 014 minus027 016 013 021 minus002 minus013120590vo 100 041 minus027 007 1 099 minus022 minus009 059 055 050 minus007 minus0291205901015840

vo 098 042 minus030 021 099 1 minus019 minus010 061 056 055 minus003 minus028119886119905

minus023 003 minus012 014 minus022 minus019 1 minus006 044 minus002 000 011 minus008120591av1205901015840

vo minus012 015 minus007 minus027 minus009 minus010 minus006 1 minus003 009 minus020 090 025119881119904

058 040 minus030 016 059 061 044 minus003 1 047 032 004 minus0211205931015840 053 085 minus054 013 055 056 minus002 009 047 1 029 012 minus035119872V 049 015 minus028 021 050 055 000 minus020 032 029 1 minus011 minus015PHA minus011 017 minus016 minus002 minus007 minus003 011 090 004 012 minus011 1 019Liq minus030 minus027 minus006 minus013 minus029 minus028 minus008 025 minus021 minus035 minus015 019 1

0

10

20

30

40

50

60

70

80

90

100

Nor

mal

ized

impo

rtan

ce (

)

Independent variable

120593998400

F75

120590vo

120591av

120590998400 vo Z

120590998400 vo

(N1) 60

PHA

dW V

s a t

M

Figure 1 Importance of independent variables

the CHAID and E-CHAID algorithms and again serves asthe first split

Based on geotechnical engineering theory the SPT testresults ((119873

1)60) should be the most associated parameter and

should therefore be the first split in theDT algorithmsThere-fore for DT modeling each algorithm was implemented intwo ways

(1) let the DT algorithm determine the first split based onits own statistical method

(2) force the DT algorithm to use SPT test results as itsfirst split

The performance of a DT model mainly depends onthe architecture and parameter settings There are manyparameters that are involved in the DT algorithms For theCHAID and E-CHAID algorithms the choices of maximumtree depth and minimum number of cases in the parent andchild nodes play an important role in model constructionAfter evaluating different values for themaximum tree depthit is apparent that only training results (not the results forunseen data) improve for maximum depths greater than4 Therefore to avoid overfitting we have set the maxi-mum tree depth as 4 Products of ten are considered forthe minimum number of cases in the parent nodes theminimum number of cases in the child nodes is always set as

The Scientific World Journal 5

half of the parent nodesrsquo minimum For this study 30 and 15are set as the values for the parent and child node maximumnumber of cases Two chi-square statistics are commonlyused in the CHAID and E-CHAID algorithms Pearson andlikelihood ratio Pearson is faster than the likelihood ratioand is more suitable for very large databases The likelihoodratio is more robust than Pearson but its calculation is moretime consuming and it is therefore more suitable for a smalldatabase [28] Due to its robustness the likelihood ratio isused in this study as the chi-square statistic for both theCHAID and E-CHAID algorithms

For the CART algorithm the minimum number of casesin the parent and child nodes is set just like the other DTalgorithms The maximum tree depth is set as 7 and pruningis used to avoid overfitting (pruning at least one level in thisproblem) IBM SPSS software package version 21 [29] wasused to perform the analysis

33 Model Construction Using LR In the conventional clas-sification process logistic regression analysis is an importanttool for building a model In this study the LR algorithmwas performed to approximate the predictive power ofthe DT techniques in comparison with another statisticalclassification technique

The LR models cannot be developed using the sameinput variables and data divisions as DTs Before findingthe final LR model any multicollinearity problems betweenthe independent variables were solved by removing thehighly correlated independent variables from the modelSevere multicollinearity can cause instability in the modelcoefficients especially when highly correlated variables areincluded in the model

4 Results and Discussion

As described above and indicated in Table 4 six different DT-based models were obtained for the assessment of the soilliquefaction The DT models for which the SPT results werethe first variable are indicated with the suffix ldquoSPTrdquo Overallperformance of the DT and LR-based models in terms ofmisclassification rate (risk) on the training data and testingdata subsets and the full data set are summarized in Table 4

Two important characteristics of a classification modelare sensitivity and specificity Sensitivity quantifies the pro-portion of correctly identified positives (liquefied soils) andspecificity gives the proportion of correctly identified nega-tives (nonliquefied soils) Sensitivity and specificity of all DTand LR models are also presented in Table 5 Comparing theperformance of DTs and LRs as tabulated in Tables 4 and5 the DT results are clearly better than the LR results theDTs generally have lower risk values and higher sensitivityand specificity values on the training testing and completedatasets An interesting point from these tables is that whenthe DT algorithms are forced to used SPT results as thefirst variable the model performances on the testing datasets(unseen data) are always improved This suggests that thestrategy of using SPT as the first variable is very effective forfuture prediction of new datasets However as expected if

0102030405060708090100110

0 20 40 60 80 100

Gain

Percentile

CHAID-SPTCHAID

Figure 2 Gain plot of the CHAID and CHAID-SPT models

the DTs use their own criteria for choosing the first variableduring the training process performance is improved on thetraining datasets but not the testing datasets

As shown in Table 4 CHAID models (CHAID andCHAID-SPT) produce the lowest risks among all DTs andLRs CHAID-SPThas the best performance on unseen testingdata sets (risk equal to 0169) and regular CHAID has the bestperformance on the training dataset (risk equal to 0141) Aspredicting liquefied soil after earthquakes is very importantthe highest sensitivity value for each set is bolded in Table 5From this table it can be seen that CHAID-SPT has thehighest sensitivity on the testing datasets (sensitivity equal to0867) and CHAID has the highest sensitivity on the trainingdatasets (sensitivity equal to 0806) Therefore these twomodels have been chosen as the final models

Percentage of total records in the target category in eachnode is recognized as gain The gains plot is a line chartof cumulative percentile gains calculated using cumulativepercentile target over total target [28] The gains plot of theselected models (CHAID and CHAID-SPT) for the trainingresults is presented in Figure 2 In this figure the straight lineshows the random selection model and the area between itand the model curve shows the model benefit area Figure 2shows the effectiveness of the CHAID and CHAID-SPTmodels From this figure it is clear that the CHAID modelhas the better benefit on the training data

The final decision tree for the CHAID-SPT model ispresented in Figure 3 This tree has 22 nodes 14 of which areterminal nodes The first level of the tree was split into fourinitial branches corresponding to four ranges of SPT valuesAs seen in the first level of the tree the two nodes for thehighest SPT values are terminal nodes which are interestingand show the importance of the SPT variable For these twonodes (nodes 3 and 4) the chance of liquefaction is very lowwhich corresponds with geotechnical engineering theory

In the second level the percentage of fines is the bestpredictor for the soils with the lowest SPT values and itsthree splits show that liquefaction probability decreases with

6 The Scientific World Journal

Table 4 Risk estimation of the DT and LR models

Model Testing Training AllCHAID 0234 0141 0160E-CHAID 0234 0161 0176CART 0194 0163 0169CHAID-SPT 0169a 0188 0184E-CHAID-SPT 0177 0161 0165CART-SPT 0185 0165 0169LR 0258 0270 0268aBold sets are the lowest risk values for each set

Table 5 Sensitivity and specificity of the DT and LR models

Model Testing Training AllSensitivity Specificity Sensitivity Specificity Sensitivity Specificity

CHAID 0622 0848 0806 0898 0773 0887E-CHAID 0644 0835 0754 0902 0734 0887CART 0667 0886 0739 0909 0727 0904CHAID-SPT 0867a 0810 0763 0849 0781 0841E-CHAID-SPT 0711 0886 0768 0891 0758 0890CART-SPT 0800 0823 0796 0863 0797 0854LR 0822 0696 0673 0772 0699 0755aBold sets are the highest sensitivity values for each set

increasing percentage of fines For larger SPT values (14 ltSPT le 24) peak ground surface horizontal acceleration is thebest predictor In this range increasing peak horizontal accel-eration substantially raises the chance of liquefaction whichcorresponds with earthquake engineering fundamentals

The third level consists largely of terminal nodes and hasbranches based on four different variables earthquake mag-nitude cyclic stress ratio total vertical stress and percentageof fines In this level the liquefaction probability decreaseswith increasing total vertical stress and increasing cyclicstress ratio Like in the second level liquefaction probabilitydecreases with increasing percentage of fines

In the fourth level there are five terminal nodes onenode is split into three ranges of soil friction angle and onenode is split using two ranges of effective vertical stressIn this level the liquefaction probability decreases withincreasing effective vertical stress which corresponds to thetrend of total vertical stress in the third level and geotechnicalengineering fundamentals

The final decision tree for the CHAID model is shown inFigure 4This tree hasmore terminal nodes than theCHAID-SPT tree with 26 nodes (17 of which are terminal nodes)Thefirst level of the tree was split to six initial branches accordingto the initial soil friction angle which was indicated as thevariable most associated with liquefaction according to theCHAID algorithm As seen in the first level of the tree thenode for the highest angles (1205931015840 gt 3929∘) is a terminal nodethat indicates zero probability of liquefaction

In the second level nodes 1 and 2 related to the lowestinitial soil friction angle (1205931015840 le 2734∘) are split using thepercentage of fines This indicates that percentage of finesis most associated with liquefaction prediction for low soilfriction angles From the branches of the nodes 1 and 2 of the

CHAID tree (in the third level) the liquefaction probabilitydecreases with increasing percentage of fines which agreeswith the previous resultsThe soil with an initial friction anglebetween the 2849∘ and 335∘ the cyclic stress ratio is thebest predictor of liquefaction The probability of liquefactionincreases significantly with an increasing cyclic stress ratio

In the third level only one node is a parent node theother six nodes are terminal nodes In the fourth level thereare only two terminal nodes split using the percentage offines From the third and forth levels it can be noted thatliquefaction probability decreases with increasing percentageof fines

Froma comparison of theCHAID-SPT andCHAID treesit can be seen that percentage of fines is the best second-levelpredictor for the lowest SPT values ((119873

1)60le 14) and for the

lowest initial soil friction angle values (1205931015840 le 2734∘) Anothersimilarity of the developed CHAID and CHAID-SPT trees inthe second level is that peak horizontal acceleration is the bestpredictor for midrange values of SPT (14 lt (119873

1)60le 24) and

midrange values of soil friction angle (335∘ lt 1205931015840 le 3929∘)In the second level for the highest values of SPT and initialsoil friction angle there is zero liquefaction probability Fromthese similarities there appears to be a strong correlationbetween SPT and initial soil friction angle This generalconclusion can be also verifiedwith the correlation coefficientof these two parameters (presented in Table 3) From thesedata SPT and initial soil friction angle are high correlatedparameters

5 Summary and Conclusions

Six decision tree models were successfully developed in thisstudy to classify the liquefied and nonliquefied soil using

The Scientific World Journal 7

PHA le 18 g 18 g lt PHA

LiquefactionNo 575Yes 425 Root node

No 524Yes 476

Node 1

No 25Yes 75

Node 5

No 65Yes 935

Node 10

No 42Yes 58 Node 11

No 432Yes 568

Node 6

No 808Yes 192

Node 12

No 255Yes 745

Node 13

No 768Yes 232

Node 7

No 724Yes 276 Node 14

No 761Yes 239 Node 18

No 538Yes 462

Node 19

No 903Yes 97 Node 20

No 100Yes 0 Node 15

No 41Yes 59 Node 2

No 923Yes 77 Node 8

No 23Yes 77

Node 9

No 0Yes 100

Node 16

No 321Yes 679

Node 17

No 156Yes 844 Node 21

No 571Yes 429

Node 22

No 909Yes 91 Node 3

No 100Yes 0 Node 4

(N1)60 le 14 14 lt (N1)60 le 24 24 lt (N1)60 le 28 28 lt (N1)60

F75 le 74 74 lt F75 le 93 93 lt F75

Mw le 74 74lt Mw 028 lt 120590vo le 1741kPa 1741kPa lt 120590vo F75 le 19 19 lt F75

120593998400 le ltle 977kPa 977kPa2734∘ 298∘ lt 1205939984002734 lt 120593998400 le 298

120591av120590998400vo le 028 120591av120590998400vo

120590998400vo120590998400vo

Figure 3 Final CHAID-SPT tree

LiquefactionNo 575Yes 425

Root node

No 383Yes 617

Node 1

No 43Yes 957

Node 7

No 708Yes 292

Node 8

No 585Yes 415

Node 2

No 353Yes 647 Node 9

No 100Yes 0 Node 10

No 294Yes 706

Node 3

No 171Yes 829

Node 11

No 562Yes 438

Node 12

No 48Yes 52

Node 4

No 851Yes 149

Node 13

No 60Yes 40 Node 18

No 100Yes 0 Node 19

No 325Yes 675 Node 14

No 100Yes 0 Node 20

No 206Yes 794 Node 21

No 85Yes 915 Node 25

No 538Yes 462 Node 26

No 0Yes 100 Node 15

No 778Yes 222 Node 5

No 100Yes 0 Node 16

No 651Yes 349

Node 17

No 824Yes 176

Node 22

No 19Yes 81 Node 23

No 92Yes 8 Node 24

No 100Yes 0 Node 6

(N1)60 le 6 18 lt (N1)60 le 24(N1)60 le 186 lt (N1)60 24 lt (N1)60F75 le 74 74 lt F75

F75 le 93 93 lt F75

F75 le 93 93 lt F75 F75 le 97 97 lt F75 at le 007 g 007 g le at le 028

120593998400 le le 2734 le 2849

le 058

2734 lt 12059399840026 lt 12059399840026∘

PHA le 18 g 18 g lt PHA

335 lt 120593998400 le 39292849 lt 120593998400 le 335

058 lt

3929∘ lt 120593998400

028 lt 120591av120590998400vo 120591av120590998400vo120591av120590998400vo

Figure 4 Final CHAID tree

three different techniques Based on geotechnical engineeringtheory three of the decision trees were developed usingSPT as the first variable The results of the DT models werecompared with those of an LR model the results show thatDT models generally have better performance than the LRmodel for this problem

The two best models (CHAID and CHAID-SPT) werechosen based on the risk values and the sensitivity of theliquefaction prediction The CHAID-SPT tree has betterperformance on the unseen (testing) dataset indicating thatbasing DTmodels on engineering fundamentals can improvemodel performance The general interpretation of the treescorresponds well with engineering theory which indicatesthe reliability of themodels From the interpretation results ofthe trees the SPT values and initial soil friction angle values(used as first variables of the trees) have strong correlation

Some terminal nodes have small sample sizes whichmay lead to results that are anecdotal However theseanecdotal conclusions about soil liquefaction assessment arehighly informative from an exploratory point of view Asthe earthquake parameters do not have a wide range theirinterpretation may not be accurate enough Therefore for

future research the database used in this study should beexpanded to a wider range of earthquake parameters

References

[1] AHGandomi X S Yang S Talatahari andAHAlavi ldquoMeta-heuristics in modeling and optimizationrdquo in MetaheuristicApplications in Structures and Infrastructures A H GandomiX S Yang S Talatahari and A H Alavi Eds chapter 1 pp1ndash24 Elsevier New York NY USA 2013

[2] J Park D Ki K Kim S J Lee D H Kim and K J Oh ldquoUsingdecision tree to develop a soil ecological quality assessmentsystem for planning sustainable constructionrdquo Expert Systemswith Applications vol 38 no 5 pp 5463ndash5470 2011

[3] F Darve ldquoLiquefaction phenomenon of granular materials andconstitutive stabilityrdquo Engineering Computations vol 13 no 7pp 5ndash28 1996

[4] S S H Khozaghi and A J A Choobbasti ldquoPredicting ofliquefaction potential in soils using artificial neural networksrdquoElectronic Journal of Geotechnical Engineering vol 12 article0723 2007

8 The Scientific World Journal

[5] H B Seed and I M Idriss ldquoSimplified procedure for evaluatingsoil liquefaction potentialrdquo Journal of the Soil Mechanics andFoundations Division vol 97 no 9 pp 1249ndash1273 1971

[6] K Tokimatsu and Y Yoshimi ldquoEmpirical correlation of soilliquefaction based on SPT N-value and fines contentrdquo Soils andFoundations vol 23 no 4 pp 56ndash74 1983

[7] A Arman N Samtani R Castelli and G Munfakh ldquoGeotech-nical and foundation engineering module 1-subsurface investi-gationsrdquo FHWA-HI-97-021 1997

[8] G Kass ldquoAn exploratory technique for investigating largequantities of categorical datardquo Applied Statistics vol 29 no 2pp 119ndash127 1980

[9] D Biggs B de Ville and E Suen ldquoA method of choosing mul-tiway partitions for classification and decision treesrdquo Journal ofApplied Statistics vol 18 no 1 pp 49ndash62 1991

[10] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees ChapmanampHallCRC NewYork NY USA 1984

[11] A T C Goh ldquoSeismic liquefaction potential assessed by neuralnetworksrdquo Journal of Geotechnical Engineering vol 120 no 9pp 1467ndash1480 1994

[12] A T C Goh ldquoNeural-network modeling of CPT seismic liq-uefaction datardquo Journal of Geotechnical and GeoenvironmentalEngineering vol 122 no 1 pp 70ndash73 1996

[13] Y Najjar and H Ali ldquoOn the use of BPNN in liquefactionpotential assessment tasksrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligent and Mathematical Methods inPavement and Geomechanical Systems N O Attoh-Okine Edpp 55ndash63 1998

[14] A T C Goh ldquoProbabilistic neural network for evaluatingseismic liquefaction potentialrdquo Canadian Geotechnical Journalvol 39 no 1 pp 219ndash232 2002

[15] M Pal ldquoSupport vector machines-based modelling of seismicliquefaction potentialrdquo International Journal for Numerical andAnalytical Methods in Geomechanics vol 30 no 10 pp 983ndash996 2006

[16] H B Zhao Z L Ru and S Yin ldquoUpdated support vectormachine for seismic liquefaction evaluation based on thepenetration testsrdquoMarine Georesources and Geotechnology vol25 no 3-4 pp 209ndash220 2007

[17] AMHannaDUral andG Saygili ldquoNeural networkmodel forliquefaction potential in soil deposits using Turkey and Taiwanearthquake datardquo Soil Dynamics and Earthquake Engineeringvol 27 no 6 pp 521ndash540 2007

[18] A T C Goh and S H Goh ldquoSupport vector machines theiruse in geotechnical engineering as illustrated using seismicliquefaction datardquoComputers and Geotechnics vol 34 no 5 pp410ndash421 2007

[19] A Baykasoglu A Cevik L Ozbakir and S Kulluk ldquoGeneratingprediction rules for liquefaction through data miningrdquo ExpertSystems with Applications vol 36 no 10 pp 12491ndash12499 2009

[20] A H Gandomi and A H Alavi ldquoMulti-stage genetic pro-gramming a new strategy to nonlinear system modelingrdquoInformation Sciences vol 181 no 23 pp 5227ndash5239 2011

[21] C Kayadelen ldquoSoil liquefactionmodeling by genetic expressionprogramming and neuro-fuzzyrdquo Expert Systems with Applica-tions vol 38 no 4 pp 4080ndash4087 2011

[22] A H Gandomi and A H Alavi ldquoA new multi-gene geneticprogramming approach to non-linear system modelingmdashpartII geotechnical and earthquake engineering problemsrdquo NeuralComputing and Applications vol 21 no 1 pp 189ndash201 2012

[23] P M Muduli and S K Das ldquoCPT-based seismic liquefactionpotential evaluation using multi-gene genetic programmingapproachrdquo Indian Geotechnical Journal 2013

[24] A H Gandomi and A H Alavi ldquoHybridizing genetic pro-gramming with orthogonal least squares for modeling of soilliquefactionrdquo International Journal of Earthquake Engineeringand Hazard Mitigation vol 1 no 1 pp 2ndash8 2013

[25] S Y LaiW J Chang and P S Lin ldquoLogistic regression nodel forevaluating soil liquefaction probability using CPT datardquo Journalof Geotechnical and Geoenvironmental Engineering vol 132 no6 pp 694ndash704 2006

[26] G Livingston M Piantedosi P Kurup and T G SitharamldquoUsing decision-tree learning to assess liquefaction potentialfrom CPT and Vsrdquo in Proceedings of the 4th GeotechnicalEarthquake Engineering and Soil Dynamics Congress pp 1ndash10May 2008

[27] H Pham Ed Springer Handbook of Engineering StatisticsSpringer London UK 2006

[28] M J Noruesis IBM SPSS Decision Trees 20 IBM Corporation3rd edition 2011

[29] SPSS Inc ver 21 2013 httpwwwspsscom

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

The Scientific World Journal 3

Table 1 Different liquefaction models proposed in the literature

Type of technique Reference Year Soil test Classification technique Number of records

Soft computing

[11] 1994 SPT ANN 85[12] 1996 CPT ANN 109[13] 1998 SPT ANN 105[14] 2002 CPT ANN 170[15] 2006 CPT and SPT SVM 109 and 85[16] 2007 CPT and SPT SVM 170 and 105[17] 2007 SPT ANN 620[18] 2007 CPT SVM 226[19] 2009 CPT ANN 226[20] 2011 CPT GP 170[21] 2011 SPT GP and ANN 569[22] 2012 CPT GP 170[23] 2013 CPT GP 170[24] 2013 CPT GP 170

Statistical [25] 2006 CPT LR 396[26] 2008 CPT DT 178

the influences of each independent variable on the dependentvariable in terms of the odds ratio To determine if eachmodelcoefficient is statistically significant the Wald test will beused

3 Liquefaction Modeling

31 Experimental Database and Data Preprocessing Adatabase [17] with 620 postearthquake observations and12 variables has been used in this study In addition to theseismic parameters the database includes soil properties andthe standard penetration test (SPT) results of the soil Thebinary dependent variable is liquefaction and we selected avalue of 0 for nonliquefied soils and a value of 1 for liquefiedsoils The database includes the values of the followingindependent variables

119885 soil specimen depth (m)(1198731)60 corrected PST number ()11986575 percent fines less than 75120583m ()119889119908 ground water table depth (m)120590vo total vertical stress (kPa)1205901015840

vo effective vertical stress (kPa)119886119905 threshold acceleration which contributes to thestrain-based procedure and along with strain-basedsafety factor indicates the exceedance of the thresholdstrain (Hanna et al 2007) (g)

120591av1205901015840

vo cyclic stress ratio which illustrates the seismicdemand on soil119881119904 shear wave velocity (ms)1205931015840 initial soil friction angle (∘)119872V earthquake magnitude (Richter)

PHA peak horizontal acceleration at ground surface (g)

The database was randomly divided into training andtesting subsets The training data were used for the modeldevelopment using the algorithms The testing data wereused to evaluate the generalization capability of the obtainedmodels on the unseen data In order to obtain a consistentdata division several combinations of the training and testingsets were considered Out of the 620 records 80 was usedas the training data and the remaining 20 of the data setswere taken for the testing of the generalization capability ofthe DTs and LR

The descriptive statistics of the data used in this studyare given in Table 2 From this table it can be seen thatthe database contains only high magnitude earthquakesThe records indicate that 256 sites were liquefied and theremaining 364 were not liquefied

The correlationmatrix is computed for all parameters andis presented in Table 3 As expected the total and effectivevertical stresses have strong correlation and specimen depthcorrelates strongly with both stresses From this table SPTresults have high correlation with initial soil friction angleThis is expected as the friction angle represents the shearcapacity of soil which is the most important parameter inSPT results

32 Model Construction Using DTs The procedure of theCART algorithm ranks the independent variables in terms oftheir predictive power the CART algorithm therefore servesas a powerful exploratory tool for understanding the underly-ing structure of the data [27]The relative importance of the 12independent variables is shown in Figure 1 Figure 1 indicatesthat the most important predictor variable is friction angle(1205931015840) Therefore the first split of the CART algorithm is basedon friction angle In the CHAID and E-CHAID algorithmsthe lowest adjusted 119875 value in each level determines the mostassociate independent variable After calculation the frictionangle has the lowest adjusted 119875 value at the beginning of

4 The Scientific World Journal

Table 2 Descriptive statistics of variables used in the model development

Predictor Minimum Mean Maximum Standard deviation119885 08 766 198 490(1198731)60 1 1448 75 1139119865 le 75 120583m 1 6299 100 3428119889119908

035 145 10 120120590vo 121 14460 4089 98201205901015840

vo 75 8248 2337 5284119886119905

0 007 085 007120591av1205901015840

vo 012 037 077 015119881119904

37 16698 500 67091205931015840 2346 3196 5208 485119872V 74 749 76 010PHA 018 038 067 015

Table 3 Correlation matrix

119885 (1198731)60 119865

75119889119908

120590vo 1205901015840

vo 119886119905120591av1205901015840

vo 119881119904

1205931015840119872

V PHA Liq119885 1 039 minus025 008 100 098 minus023 minus012 058 053 049 minus011 minus030(1198731)60 039 1 minus057 008 041 042 003 015 040 085 015 017 minus02711986575

minus025 minus057 1 minus018 minus027 minus030 minus012 minus007 minus030 minus054 minus028 minus016 minus006119889119908

008 008 minus018 1 007 021 014 minus027 016 013 021 minus002 minus013120590vo 100 041 minus027 007 1 099 minus022 minus009 059 055 050 minus007 minus0291205901015840

vo 098 042 minus030 021 099 1 minus019 minus010 061 056 055 minus003 minus028119886119905

minus023 003 minus012 014 minus022 minus019 1 minus006 044 minus002 000 011 minus008120591av1205901015840

vo minus012 015 minus007 minus027 minus009 minus010 minus006 1 minus003 009 minus020 090 025119881119904

058 040 minus030 016 059 061 044 minus003 1 047 032 004 minus0211205931015840 053 085 minus054 013 055 056 minus002 009 047 1 029 012 minus035119872V 049 015 minus028 021 050 055 000 minus020 032 029 1 minus011 minus015PHA minus011 017 minus016 minus002 minus007 minus003 011 090 004 012 minus011 1 019Liq minus030 minus027 minus006 minus013 minus029 minus028 minus008 025 minus021 minus035 minus015 019 1

0

10

20

30

40

50

60

70

80

90

100

Nor

mal

ized

impo

rtan

ce (

)

Independent variable

120593998400

F75

120590vo

120591av

120590998400 vo Z

120590998400 vo

(N1) 60

PHA

dW V

s a t

M

Figure 1 Importance of independent variables

the CHAID and E-CHAID algorithms and again serves asthe first split

Based on geotechnical engineering theory the SPT testresults ((119873

1)60) should be the most associated parameter and

should therefore be the first split in theDT algorithmsThere-fore for DT modeling each algorithm was implemented intwo ways

(1) let the DT algorithm determine the first split based onits own statistical method

(2) force the DT algorithm to use SPT test results as itsfirst split

The performance of a DT model mainly depends onthe architecture and parameter settings There are manyparameters that are involved in the DT algorithms For theCHAID and E-CHAID algorithms the choices of maximumtree depth and minimum number of cases in the parent andchild nodes play an important role in model constructionAfter evaluating different values for themaximum tree depthit is apparent that only training results (not the results forunseen data) improve for maximum depths greater than4 Therefore to avoid overfitting we have set the maxi-mum tree depth as 4 Products of ten are considered forthe minimum number of cases in the parent nodes theminimum number of cases in the child nodes is always set as

The Scientific World Journal 5

half of the parent nodesrsquo minimum For this study 30 and 15are set as the values for the parent and child node maximumnumber of cases Two chi-square statistics are commonlyused in the CHAID and E-CHAID algorithms Pearson andlikelihood ratio Pearson is faster than the likelihood ratioand is more suitable for very large databases The likelihoodratio is more robust than Pearson but its calculation is moretime consuming and it is therefore more suitable for a smalldatabase [28] Due to its robustness the likelihood ratio isused in this study as the chi-square statistic for both theCHAID and E-CHAID algorithms

For the CART algorithm the minimum number of casesin the parent and child nodes is set just like the other DTalgorithms The maximum tree depth is set as 7 and pruningis used to avoid overfitting (pruning at least one level in thisproblem) IBM SPSS software package version 21 [29] wasused to perform the analysis

33 Model Construction Using LR In the conventional clas-sification process logistic regression analysis is an importanttool for building a model In this study the LR algorithmwas performed to approximate the predictive power ofthe DT techniques in comparison with another statisticalclassification technique

The LR models cannot be developed using the sameinput variables and data divisions as DTs Before findingthe final LR model any multicollinearity problems betweenthe independent variables were solved by removing thehighly correlated independent variables from the modelSevere multicollinearity can cause instability in the modelcoefficients especially when highly correlated variables areincluded in the model

4 Results and Discussion

As described above and indicated in Table 4 six different DT-based models were obtained for the assessment of the soilliquefaction The DT models for which the SPT results werethe first variable are indicated with the suffix ldquoSPTrdquo Overallperformance of the DT and LR-based models in terms ofmisclassification rate (risk) on the training data and testingdata subsets and the full data set are summarized in Table 4

Two important characteristics of a classification modelare sensitivity and specificity Sensitivity quantifies the pro-portion of correctly identified positives (liquefied soils) andspecificity gives the proportion of correctly identified nega-tives (nonliquefied soils) Sensitivity and specificity of all DTand LR models are also presented in Table 5 Comparing theperformance of DTs and LRs as tabulated in Tables 4 and5 the DT results are clearly better than the LR results theDTs generally have lower risk values and higher sensitivityand specificity values on the training testing and completedatasets An interesting point from these tables is that whenthe DT algorithms are forced to used SPT results as thefirst variable the model performances on the testing datasets(unseen data) are always improved This suggests that thestrategy of using SPT as the first variable is very effective forfuture prediction of new datasets However as expected if

0102030405060708090100110

0 20 40 60 80 100

Gain

Percentile

CHAID-SPTCHAID

Figure 2 Gain plot of the CHAID and CHAID-SPT models

the DTs use their own criteria for choosing the first variableduring the training process performance is improved on thetraining datasets but not the testing datasets

As shown in Table 4 CHAID models (CHAID andCHAID-SPT) produce the lowest risks among all DTs andLRs CHAID-SPThas the best performance on unseen testingdata sets (risk equal to 0169) and regular CHAID has the bestperformance on the training dataset (risk equal to 0141) Aspredicting liquefied soil after earthquakes is very importantthe highest sensitivity value for each set is bolded in Table 5From this table it can be seen that CHAID-SPT has thehighest sensitivity on the testing datasets (sensitivity equal to0867) and CHAID has the highest sensitivity on the trainingdatasets (sensitivity equal to 0806) Therefore these twomodels have been chosen as the final models

Percentage of total records in the target category in eachnode is recognized as gain The gains plot is a line chartof cumulative percentile gains calculated using cumulativepercentile target over total target [28] The gains plot of theselected models (CHAID and CHAID-SPT) for the trainingresults is presented in Figure 2 In this figure the straight lineshows the random selection model and the area between itand the model curve shows the model benefit area Figure 2shows the effectiveness of the CHAID and CHAID-SPTmodels From this figure it is clear that the CHAID modelhas the better benefit on the training data

The final decision tree for the CHAID-SPT model ispresented in Figure 3 This tree has 22 nodes 14 of which areterminal nodes The first level of the tree was split into fourinitial branches corresponding to four ranges of SPT valuesAs seen in the first level of the tree the two nodes for thehighest SPT values are terminal nodes which are interestingand show the importance of the SPT variable For these twonodes (nodes 3 and 4) the chance of liquefaction is very lowwhich corresponds with geotechnical engineering theory

In the second level the percentage of fines is the bestpredictor for the soils with the lowest SPT values and itsthree splits show that liquefaction probability decreases with

6 The Scientific World Journal

Table 4 Risk estimation of the DT and LR models

Model Testing Training AllCHAID 0234 0141 0160E-CHAID 0234 0161 0176CART 0194 0163 0169CHAID-SPT 0169a 0188 0184E-CHAID-SPT 0177 0161 0165CART-SPT 0185 0165 0169LR 0258 0270 0268aBold sets are the lowest risk values for each set

Table 5 Sensitivity and specificity of the DT and LR models

Model Testing Training AllSensitivity Specificity Sensitivity Specificity Sensitivity Specificity

CHAID 0622 0848 0806 0898 0773 0887E-CHAID 0644 0835 0754 0902 0734 0887CART 0667 0886 0739 0909 0727 0904CHAID-SPT 0867a 0810 0763 0849 0781 0841E-CHAID-SPT 0711 0886 0768 0891 0758 0890CART-SPT 0800 0823 0796 0863 0797 0854LR 0822 0696 0673 0772 0699 0755aBold sets are the highest sensitivity values for each set

increasing percentage of fines For larger SPT values (14 ltSPT le 24) peak ground surface horizontal acceleration is thebest predictor In this range increasing peak horizontal accel-eration substantially raises the chance of liquefaction whichcorresponds with earthquake engineering fundamentals

The third level consists largely of terminal nodes and hasbranches based on four different variables earthquake mag-nitude cyclic stress ratio total vertical stress and percentageof fines In this level the liquefaction probability decreaseswith increasing total vertical stress and increasing cyclicstress ratio Like in the second level liquefaction probabilitydecreases with increasing percentage of fines

In the fourth level there are five terminal nodes onenode is split into three ranges of soil friction angle and onenode is split using two ranges of effective vertical stressIn this level the liquefaction probability decreases withincreasing effective vertical stress which corresponds to thetrend of total vertical stress in the third level and geotechnicalengineering fundamentals

The final decision tree for the CHAID model is shown inFigure 4This tree hasmore terminal nodes than theCHAID-SPT tree with 26 nodes (17 of which are terminal nodes)Thefirst level of the tree was split to six initial branches accordingto the initial soil friction angle which was indicated as thevariable most associated with liquefaction according to theCHAID algorithm As seen in the first level of the tree thenode for the highest angles (1205931015840 gt 3929∘) is a terminal nodethat indicates zero probability of liquefaction

In the second level nodes 1 and 2 related to the lowestinitial soil friction angle (1205931015840 le 2734∘) are split using thepercentage of fines This indicates that percentage of finesis most associated with liquefaction prediction for low soilfriction angles From the branches of the nodes 1 and 2 of the

CHAID tree (in the third level) the liquefaction probabilitydecreases with increasing percentage of fines which agreeswith the previous resultsThe soil with an initial friction anglebetween the 2849∘ and 335∘ the cyclic stress ratio is thebest predictor of liquefaction The probability of liquefactionincreases significantly with an increasing cyclic stress ratio

In the third level only one node is a parent node theother six nodes are terminal nodes In the fourth level thereare only two terminal nodes split using the percentage offines From the third and forth levels it can be noted thatliquefaction probability decreases with increasing percentageof fines

Froma comparison of theCHAID-SPT andCHAID treesit can be seen that percentage of fines is the best second-levelpredictor for the lowest SPT values ((119873

1)60le 14) and for the

lowest initial soil friction angle values (1205931015840 le 2734∘) Anothersimilarity of the developed CHAID and CHAID-SPT trees inthe second level is that peak horizontal acceleration is the bestpredictor for midrange values of SPT (14 lt (119873

1)60le 24) and

midrange values of soil friction angle (335∘ lt 1205931015840 le 3929∘)In the second level for the highest values of SPT and initialsoil friction angle there is zero liquefaction probability Fromthese similarities there appears to be a strong correlationbetween SPT and initial soil friction angle This generalconclusion can be also verifiedwith the correlation coefficientof these two parameters (presented in Table 3) From thesedata SPT and initial soil friction angle are high correlatedparameters

5 Summary and Conclusions

Six decision tree models were successfully developed in thisstudy to classify the liquefied and nonliquefied soil using

The Scientific World Journal 7

PHA le 18 g 18 g lt PHA

LiquefactionNo 575Yes 425 Root node

No 524Yes 476

Node 1

No 25Yes 75

Node 5

No 65Yes 935

Node 10

No 42Yes 58 Node 11

No 432Yes 568

Node 6

No 808Yes 192

Node 12

No 255Yes 745

Node 13

No 768Yes 232

Node 7

No 724Yes 276 Node 14

No 761Yes 239 Node 18

No 538Yes 462

Node 19

No 903Yes 97 Node 20

No 100Yes 0 Node 15

No 41Yes 59 Node 2

No 923Yes 77 Node 8

No 23Yes 77

Node 9

No 0Yes 100

Node 16

No 321Yes 679

Node 17

No 156Yes 844 Node 21

No 571Yes 429

Node 22

No 909Yes 91 Node 3

No 100Yes 0 Node 4

(N1)60 le 14 14 lt (N1)60 le 24 24 lt (N1)60 le 28 28 lt (N1)60

F75 le 74 74 lt F75 le 93 93 lt F75

Mw le 74 74lt Mw 028 lt 120590vo le 1741kPa 1741kPa lt 120590vo F75 le 19 19 lt F75

120593998400 le ltle 977kPa 977kPa2734∘ 298∘ lt 1205939984002734 lt 120593998400 le 298

120591av120590998400vo le 028 120591av120590998400vo

120590998400vo120590998400vo

Figure 3 Final CHAID-SPT tree

LiquefactionNo 575Yes 425

Root node

No 383Yes 617

Node 1

No 43Yes 957

Node 7

No 708Yes 292

Node 8

No 585Yes 415

Node 2

No 353Yes 647 Node 9

No 100Yes 0 Node 10

No 294Yes 706

Node 3

No 171Yes 829

Node 11

No 562Yes 438

Node 12

No 48Yes 52

Node 4

No 851Yes 149

Node 13

No 60Yes 40 Node 18

No 100Yes 0 Node 19

No 325Yes 675 Node 14

No 100Yes 0 Node 20

No 206Yes 794 Node 21

No 85Yes 915 Node 25

No 538Yes 462 Node 26

No 0Yes 100 Node 15

No 778Yes 222 Node 5

No 100Yes 0 Node 16

No 651Yes 349

Node 17

No 824Yes 176

Node 22

No 19Yes 81 Node 23

No 92Yes 8 Node 24

No 100Yes 0 Node 6

(N1)60 le 6 18 lt (N1)60 le 24(N1)60 le 186 lt (N1)60 24 lt (N1)60F75 le 74 74 lt F75

F75 le 93 93 lt F75

F75 le 93 93 lt F75 F75 le 97 97 lt F75 at le 007 g 007 g le at le 028

120593998400 le le 2734 le 2849

le 058

2734 lt 12059399840026 lt 12059399840026∘

PHA le 18 g 18 g lt PHA

335 lt 120593998400 le 39292849 lt 120593998400 le 335

058 lt

3929∘ lt 120593998400

028 lt 120591av120590998400vo 120591av120590998400vo120591av120590998400vo

Figure 4 Final CHAID tree

three different techniques Based on geotechnical engineeringtheory three of the decision trees were developed usingSPT as the first variable The results of the DT models werecompared with those of an LR model the results show thatDT models generally have better performance than the LRmodel for this problem

The two best models (CHAID and CHAID-SPT) werechosen based on the risk values and the sensitivity of theliquefaction prediction The CHAID-SPT tree has betterperformance on the unseen (testing) dataset indicating thatbasing DTmodels on engineering fundamentals can improvemodel performance The general interpretation of the treescorresponds well with engineering theory which indicatesthe reliability of themodels From the interpretation results ofthe trees the SPT values and initial soil friction angle values(used as first variables of the trees) have strong correlation

Some terminal nodes have small sample sizes whichmay lead to results that are anecdotal However theseanecdotal conclusions about soil liquefaction assessment arehighly informative from an exploratory point of view Asthe earthquake parameters do not have a wide range theirinterpretation may not be accurate enough Therefore for

future research the database used in this study should beexpanded to a wider range of earthquake parameters

References

[1] AHGandomi X S Yang S Talatahari andAHAlavi ldquoMeta-heuristics in modeling and optimizationrdquo in MetaheuristicApplications in Structures and Infrastructures A H GandomiX S Yang S Talatahari and A H Alavi Eds chapter 1 pp1ndash24 Elsevier New York NY USA 2013

[2] J Park D Ki K Kim S J Lee D H Kim and K J Oh ldquoUsingdecision tree to develop a soil ecological quality assessmentsystem for planning sustainable constructionrdquo Expert Systemswith Applications vol 38 no 5 pp 5463ndash5470 2011

[3] F Darve ldquoLiquefaction phenomenon of granular materials andconstitutive stabilityrdquo Engineering Computations vol 13 no 7pp 5ndash28 1996

[4] S S H Khozaghi and A J A Choobbasti ldquoPredicting ofliquefaction potential in soils using artificial neural networksrdquoElectronic Journal of Geotechnical Engineering vol 12 article0723 2007

8 The Scientific World Journal

[5] H B Seed and I M Idriss ldquoSimplified procedure for evaluatingsoil liquefaction potentialrdquo Journal of the Soil Mechanics andFoundations Division vol 97 no 9 pp 1249ndash1273 1971

[6] K Tokimatsu and Y Yoshimi ldquoEmpirical correlation of soilliquefaction based on SPT N-value and fines contentrdquo Soils andFoundations vol 23 no 4 pp 56ndash74 1983

[7] A Arman N Samtani R Castelli and G Munfakh ldquoGeotech-nical and foundation engineering module 1-subsurface investi-gationsrdquo FHWA-HI-97-021 1997

[8] G Kass ldquoAn exploratory technique for investigating largequantities of categorical datardquo Applied Statistics vol 29 no 2pp 119ndash127 1980

[9] D Biggs B de Ville and E Suen ldquoA method of choosing mul-tiway partitions for classification and decision treesrdquo Journal ofApplied Statistics vol 18 no 1 pp 49ndash62 1991

[10] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees ChapmanampHallCRC NewYork NY USA 1984

[11] A T C Goh ldquoSeismic liquefaction potential assessed by neuralnetworksrdquo Journal of Geotechnical Engineering vol 120 no 9pp 1467ndash1480 1994

[12] A T C Goh ldquoNeural-network modeling of CPT seismic liq-uefaction datardquo Journal of Geotechnical and GeoenvironmentalEngineering vol 122 no 1 pp 70ndash73 1996

[13] Y Najjar and H Ali ldquoOn the use of BPNN in liquefactionpotential assessment tasksrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligent and Mathematical Methods inPavement and Geomechanical Systems N O Attoh-Okine Edpp 55ndash63 1998

[14] A T C Goh ldquoProbabilistic neural network for evaluatingseismic liquefaction potentialrdquo Canadian Geotechnical Journalvol 39 no 1 pp 219ndash232 2002

[15] M Pal ldquoSupport vector machines-based modelling of seismicliquefaction potentialrdquo International Journal for Numerical andAnalytical Methods in Geomechanics vol 30 no 10 pp 983ndash996 2006

[16] H B Zhao Z L Ru and S Yin ldquoUpdated support vectormachine for seismic liquefaction evaluation based on thepenetration testsrdquoMarine Georesources and Geotechnology vol25 no 3-4 pp 209ndash220 2007

[17] AMHannaDUral andG Saygili ldquoNeural networkmodel forliquefaction potential in soil deposits using Turkey and Taiwanearthquake datardquo Soil Dynamics and Earthquake Engineeringvol 27 no 6 pp 521ndash540 2007

[18] A T C Goh and S H Goh ldquoSupport vector machines theiruse in geotechnical engineering as illustrated using seismicliquefaction datardquoComputers and Geotechnics vol 34 no 5 pp410ndash421 2007

[19] A Baykasoglu A Cevik L Ozbakir and S Kulluk ldquoGeneratingprediction rules for liquefaction through data miningrdquo ExpertSystems with Applications vol 36 no 10 pp 12491ndash12499 2009

[20] A H Gandomi and A H Alavi ldquoMulti-stage genetic pro-gramming a new strategy to nonlinear system modelingrdquoInformation Sciences vol 181 no 23 pp 5227ndash5239 2011

[21] C Kayadelen ldquoSoil liquefactionmodeling by genetic expressionprogramming and neuro-fuzzyrdquo Expert Systems with Applica-tions vol 38 no 4 pp 4080ndash4087 2011

[22] A H Gandomi and A H Alavi ldquoA new multi-gene geneticprogramming approach to non-linear system modelingmdashpartII geotechnical and earthquake engineering problemsrdquo NeuralComputing and Applications vol 21 no 1 pp 189ndash201 2012

[23] P M Muduli and S K Das ldquoCPT-based seismic liquefactionpotential evaluation using multi-gene genetic programmingapproachrdquo Indian Geotechnical Journal 2013

[24] A H Gandomi and A H Alavi ldquoHybridizing genetic pro-gramming with orthogonal least squares for modeling of soilliquefactionrdquo International Journal of Earthquake Engineeringand Hazard Mitigation vol 1 no 1 pp 2ndash8 2013

[25] S Y LaiW J Chang and P S Lin ldquoLogistic regression nodel forevaluating soil liquefaction probability using CPT datardquo Journalof Geotechnical and Geoenvironmental Engineering vol 132 no6 pp 694ndash704 2006

[26] G Livingston M Piantedosi P Kurup and T G SitharamldquoUsing decision-tree learning to assess liquefaction potentialfrom CPT and Vsrdquo in Proceedings of the 4th GeotechnicalEarthquake Engineering and Soil Dynamics Congress pp 1ndash10May 2008

[27] H Pham Ed Springer Handbook of Engineering StatisticsSpringer London UK 2006

[28] M J Noruesis IBM SPSS Decision Trees 20 IBM Corporation3rd edition 2011

[29] SPSS Inc ver 21 2013 httpwwwspsscom

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

4 The Scientific World Journal

Table 2 Descriptive statistics of variables used in the model development

Predictor Minimum Mean Maximum Standard deviation119885 08 766 198 490(1198731)60 1 1448 75 1139119865 le 75 120583m 1 6299 100 3428119889119908

035 145 10 120120590vo 121 14460 4089 98201205901015840

vo 75 8248 2337 5284119886119905

0 007 085 007120591av1205901015840

vo 012 037 077 015119881119904

37 16698 500 67091205931015840 2346 3196 5208 485119872V 74 749 76 010PHA 018 038 067 015

Table 3 Correlation matrix

119885 (1198731)60 119865

75119889119908

120590vo 1205901015840

vo 119886119905120591av1205901015840

vo 119881119904

1205931015840119872

V PHA Liq119885 1 039 minus025 008 100 098 minus023 minus012 058 053 049 minus011 minus030(1198731)60 039 1 minus057 008 041 042 003 015 040 085 015 017 minus02711986575

minus025 minus057 1 minus018 minus027 minus030 minus012 minus007 minus030 minus054 minus028 minus016 minus006119889119908

008 008 minus018 1 007 021 014 minus027 016 013 021 minus002 minus013120590vo 100 041 minus027 007 1 099 minus022 minus009 059 055 050 minus007 minus0291205901015840

vo 098 042 minus030 021 099 1 minus019 minus010 061 056 055 minus003 minus028119886119905

minus023 003 minus012 014 minus022 minus019 1 minus006 044 minus002 000 011 minus008120591av1205901015840

vo minus012 015 minus007 minus027 minus009 minus010 minus006 1 minus003 009 minus020 090 025119881119904

058 040 minus030 016 059 061 044 minus003 1 047 032 004 minus0211205931015840 053 085 minus054 013 055 056 minus002 009 047 1 029 012 minus035119872V 049 015 minus028 021 050 055 000 minus020 032 029 1 minus011 minus015PHA minus011 017 minus016 minus002 minus007 minus003 011 090 004 012 minus011 1 019Liq minus030 minus027 minus006 minus013 minus029 minus028 minus008 025 minus021 minus035 minus015 019 1

0

10

20

30

40

50

60

70

80

90

100

Nor

mal

ized

impo

rtan

ce (

)

Independent variable

120593998400

F75

120590vo

120591av

120590998400 vo Z

120590998400 vo

(N1) 60

PHA

dW V

s a t

M

Figure 1 Importance of independent variables

the CHAID and E-CHAID algorithms and again serves asthe first split

Based on geotechnical engineering theory the SPT testresults ((119873

1)60) should be the most associated parameter and

should therefore be the first split in theDT algorithmsThere-fore for DT modeling each algorithm was implemented intwo ways

(1) let the DT algorithm determine the first split based onits own statistical method

(2) force the DT algorithm to use SPT test results as itsfirst split

The performance of a DT model mainly depends onthe architecture and parameter settings There are manyparameters that are involved in the DT algorithms For theCHAID and E-CHAID algorithms the choices of maximumtree depth and minimum number of cases in the parent andchild nodes play an important role in model constructionAfter evaluating different values for themaximum tree depthit is apparent that only training results (not the results forunseen data) improve for maximum depths greater than4 Therefore to avoid overfitting we have set the maxi-mum tree depth as 4 Products of ten are considered forthe minimum number of cases in the parent nodes theminimum number of cases in the child nodes is always set as

The Scientific World Journal 5

half of the parent nodesrsquo minimum For this study 30 and 15are set as the values for the parent and child node maximumnumber of cases Two chi-square statistics are commonlyused in the CHAID and E-CHAID algorithms Pearson andlikelihood ratio Pearson is faster than the likelihood ratioand is more suitable for very large databases The likelihoodratio is more robust than Pearson but its calculation is moretime consuming and it is therefore more suitable for a smalldatabase [28] Due to its robustness the likelihood ratio isused in this study as the chi-square statistic for both theCHAID and E-CHAID algorithms

For the CART algorithm the minimum number of casesin the parent and child nodes is set just like the other DTalgorithms The maximum tree depth is set as 7 and pruningis used to avoid overfitting (pruning at least one level in thisproblem) IBM SPSS software package version 21 [29] wasused to perform the analysis

33 Model Construction Using LR In the conventional clas-sification process logistic regression analysis is an importanttool for building a model In this study the LR algorithmwas performed to approximate the predictive power ofthe DT techniques in comparison with another statisticalclassification technique

The LR models cannot be developed using the sameinput variables and data divisions as DTs Before findingthe final LR model any multicollinearity problems betweenthe independent variables were solved by removing thehighly correlated independent variables from the modelSevere multicollinearity can cause instability in the modelcoefficients especially when highly correlated variables areincluded in the model

4 Results and Discussion

As described above and indicated in Table 4 six different DT-based models were obtained for the assessment of the soilliquefaction The DT models for which the SPT results werethe first variable are indicated with the suffix ldquoSPTrdquo Overallperformance of the DT and LR-based models in terms ofmisclassification rate (risk) on the training data and testingdata subsets and the full data set are summarized in Table 4

Two important characteristics of a classification modelare sensitivity and specificity Sensitivity quantifies the pro-portion of correctly identified positives (liquefied soils) andspecificity gives the proportion of correctly identified nega-tives (nonliquefied soils) Sensitivity and specificity of all DTand LR models are also presented in Table 5 Comparing theperformance of DTs and LRs as tabulated in Tables 4 and5 the DT results are clearly better than the LR results theDTs generally have lower risk values and higher sensitivityand specificity values on the training testing and completedatasets An interesting point from these tables is that whenthe DT algorithms are forced to used SPT results as thefirst variable the model performances on the testing datasets(unseen data) are always improved This suggests that thestrategy of using SPT as the first variable is very effective forfuture prediction of new datasets However as expected if

0102030405060708090100110

0 20 40 60 80 100

Gain

Percentile

CHAID-SPTCHAID

Figure 2 Gain plot of the CHAID and CHAID-SPT models

the DTs use their own criteria for choosing the first variableduring the training process performance is improved on thetraining datasets but not the testing datasets

As shown in Table 4 CHAID models (CHAID andCHAID-SPT) produce the lowest risks among all DTs andLRs CHAID-SPThas the best performance on unseen testingdata sets (risk equal to 0169) and regular CHAID has the bestperformance on the training dataset (risk equal to 0141) Aspredicting liquefied soil after earthquakes is very importantthe highest sensitivity value for each set is bolded in Table 5From this table it can be seen that CHAID-SPT has thehighest sensitivity on the testing datasets (sensitivity equal to0867) and CHAID has the highest sensitivity on the trainingdatasets (sensitivity equal to 0806) Therefore these twomodels have been chosen as the final models

Percentage of total records in the target category in eachnode is recognized as gain The gains plot is a line chartof cumulative percentile gains calculated using cumulativepercentile target over total target [28] The gains plot of theselected models (CHAID and CHAID-SPT) for the trainingresults is presented in Figure 2 In this figure the straight lineshows the random selection model and the area between itand the model curve shows the model benefit area Figure 2shows the effectiveness of the CHAID and CHAID-SPTmodels From this figure it is clear that the CHAID modelhas the better benefit on the training data

The final decision tree for the CHAID-SPT model ispresented in Figure 3 This tree has 22 nodes 14 of which areterminal nodes The first level of the tree was split into fourinitial branches corresponding to four ranges of SPT valuesAs seen in the first level of the tree the two nodes for thehighest SPT values are terminal nodes which are interestingand show the importance of the SPT variable For these twonodes (nodes 3 and 4) the chance of liquefaction is very lowwhich corresponds with geotechnical engineering theory

In the second level the percentage of fines is the bestpredictor for the soils with the lowest SPT values and itsthree splits show that liquefaction probability decreases with

6 The Scientific World Journal

Table 4 Risk estimation of the DT and LR models

Model Testing Training AllCHAID 0234 0141 0160E-CHAID 0234 0161 0176CART 0194 0163 0169CHAID-SPT 0169a 0188 0184E-CHAID-SPT 0177 0161 0165CART-SPT 0185 0165 0169LR 0258 0270 0268aBold sets are the lowest risk values for each set

Table 5 Sensitivity and specificity of the DT and LR models

Model Testing Training AllSensitivity Specificity Sensitivity Specificity Sensitivity Specificity

CHAID 0622 0848 0806 0898 0773 0887E-CHAID 0644 0835 0754 0902 0734 0887CART 0667 0886 0739 0909 0727 0904CHAID-SPT 0867a 0810 0763 0849 0781 0841E-CHAID-SPT 0711 0886 0768 0891 0758 0890CART-SPT 0800 0823 0796 0863 0797 0854LR 0822 0696 0673 0772 0699 0755aBold sets are the highest sensitivity values for each set

increasing percentage of fines For larger SPT values (14 ltSPT le 24) peak ground surface horizontal acceleration is thebest predictor In this range increasing peak horizontal accel-eration substantially raises the chance of liquefaction whichcorresponds with earthquake engineering fundamentals

The third level consists largely of terminal nodes and hasbranches based on four different variables earthquake mag-nitude cyclic stress ratio total vertical stress and percentageof fines In this level the liquefaction probability decreaseswith increasing total vertical stress and increasing cyclicstress ratio Like in the second level liquefaction probabilitydecreases with increasing percentage of fines

In the fourth level there are five terminal nodes onenode is split into three ranges of soil friction angle and onenode is split using two ranges of effective vertical stressIn this level the liquefaction probability decreases withincreasing effective vertical stress which corresponds to thetrend of total vertical stress in the third level and geotechnicalengineering fundamentals

The final decision tree for the CHAID model is shown inFigure 4This tree hasmore terminal nodes than theCHAID-SPT tree with 26 nodes (17 of which are terminal nodes)Thefirst level of the tree was split to six initial branches accordingto the initial soil friction angle which was indicated as thevariable most associated with liquefaction according to theCHAID algorithm As seen in the first level of the tree thenode for the highest angles (1205931015840 gt 3929∘) is a terminal nodethat indicates zero probability of liquefaction

In the second level nodes 1 and 2 related to the lowestinitial soil friction angle (1205931015840 le 2734∘) are split using thepercentage of fines This indicates that percentage of finesis most associated with liquefaction prediction for low soilfriction angles From the branches of the nodes 1 and 2 of the

CHAID tree (in the third level) the liquefaction probabilitydecreases with increasing percentage of fines which agreeswith the previous resultsThe soil with an initial friction anglebetween the 2849∘ and 335∘ the cyclic stress ratio is thebest predictor of liquefaction The probability of liquefactionincreases significantly with an increasing cyclic stress ratio

In the third level only one node is a parent node theother six nodes are terminal nodes In the fourth level thereare only two terminal nodes split using the percentage offines From the third and forth levels it can be noted thatliquefaction probability decreases with increasing percentageof fines

Froma comparison of theCHAID-SPT andCHAID treesit can be seen that percentage of fines is the best second-levelpredictor for the lowest SPT values ((119873

1)60le 14) and for the

lowest initial soil friction angle values (1205931015840 le 2734∘) Anothersimilarity of the developed CHAID and CHAID-SPT trees inthe second level is that peak horizontal acceleration is the bestpredictor for midrange values of SPT (14 lt (119873

1)60le 24) and

midrange values of soil friction angle (335∘ lt 1205931015840 le 3929∘)In the second level for the highest values of SPT and initialsoil friction angle there is zero liquefaction probability Fromthese similarities there appears to be a strong correlationbetween SPT and initial soil friction angle This generalconclusion can be also verifiedwith the correlation coefficientof these two parameters (presented in Table 3) From thesedata SPT and initial soil friction angle are high correlatedparameters

5 Summary and Conclusions

Six decision tree models were successfully developed in thisstudy to classify the liquefied and nonliquefied soil using

The Scientific World Journal 7

PHA le 18 g 18 g lt PHA

LiquefactionNo 575Yes 425 Root node

No 524Yes 476

Node 1

No 25Yes 75

Node 5

No 65Yes 935

Node 10

No 42Yes 58 Node 11

No 432Yes 568

Node 6

No 808Yes 192

Node 12

No 255Yes 745

Node 13

No 768Yes 232

Node 7

No 724Yes 276 Node 14

No 761Yes 239 Node 18

No 538Yes 462

Node 19

No 903Yes 97 Node 20

No 100Yes 0 Node 15

No 41Yes 59 Node 2

No 923Yes 77 Node 8

No 23Yes 77

Node 9

No 0Yes 100

Node 16

No 321Yes 679

Node 17

No 156Yes 844 Node 21

No 571Yes 429

Node 22

No 909Yes 91 Node 3

No 100Yes 0 Node 4

(N1)60 le 14 14 lt (N1)60 le 24 24 lt (N1)60 le 28 28 lt (N1)60

F75 le 74 74 lt F75 le 93 93 lt F75

Mw le 74 74lt Mw 028 lt 120590vo le 1741kPa 1741kPa lt 120590vo F75 le 19 19 lt F75

120593998400 le ltle 977kPa 977kPa2734∘ 298∘ lt 1205939984002734 lt 120593998400 le 298

120591av120590998400vo le 028 120591av120590998400vo

120590998400vo120590998400vo

Figure 3 Final CHAID-SPT tree

LiquefactionNo 575Yes 425

Root node

No 383Yes 617

Node 1

No 43Yes 957

Node 7

No 708Yes 292

Node 8

No 585Yes 415

Node 2

No 353Yes 647 Node 9

No 100Yes 0 Node 10

No 294Yes 706

Node 3

No 171Yes 829

Node 11

No 562Yes 438

Node 12

No 48Yes 52

Node 4

No 851Yes 149

Node 13

No 60Yes 40 Node 18

No 100Yes 0 Node 19

No 325Yes 675 Node 14

No 100Yes 0 Node 20

No 206Yes 794 Node 21

No 85Yes 915 Node 25

No 538Yes 462 Node 26

No 0Yes 100 Node 15

No 778Yes 222 Node 5

No 100Yes 0 Node 16

No 651Yes 349

Node 17

No 824Yes 176

Node 22

No 19Yes 81 Node 23

No 92Yes 8 Node 24

No 100Yes 0 Node 6

(N1)60 le 6 18 lt (N1)60 le 24(N1)60 le 186 lt (N1)60 24 lt (N1)60F75 le 74 74 lt F75

F75 le 93 93 lt F75

F75 le 93 93 lt F75 F75 le 97 97 lt F75 at le 007 g 007 g le at le 028

120593998400 le le 2734 le 2849

le 058

2734 lt 12059399840026 lt 12059399840026∘

PHA le 18 g 18 g lt PHA

335 lt 120593998400 le 39292849 lt 120593998400 le 335

058 lt

3929∘ lt 120593998400

028 lt 120591av120590998400vo 120591av120590998400vo120591av120590998400vo

Figure 4 Final CHAID tree

three different techniques Based on geotechnical engineeringtheory three of the decision trees were developed usingSPT as the first variable The results of the DT models werecompared with those of an LR model the results show thatDT models generally have better performance than the LRmodel for this problem

The two best models (CHAID and CHAID-SPT) werechosen based on the risk values and the sensitivity of theliquefaction prediction The CHAID-SPT tree has betterperformance on the unseen (testing) dataset indicating thatbasing DTmodels on engineering fundamentals can improvemodel performance The general interpretation of the treescorresponds well with engineering theory which indicatesthe reliability of themodels From the interpretation results ofthe trees the SPT values and initial soil friction angle values(used as first variables of the trees) have strong correlation

Some terminal nodes have small sample sizes whichmay lead to results that are anecdotal However theseanecdotal conclusions about soil liquefaction assessment arehighly informative from an exploratory point of view Asthe earthquake parameters do not have a wide range theirinterpretation may not be accurate enough Therefore for

future research the database used in this study should beexpanded to a wider range of earthquake parameters

References

[1] AHGandomi X S Yang S Talatahari andAHAlavi ldquoMeta-heuristics in modeling and optimizationrdquo in MetaheuristicApplications in Structures and Infrastructures A H GandomiX S Yang S Talatahari and A H Alavi Eds chapter 1 pp1ndash24 Elsevier New York NY USA 2013

[2] J Park D Ki K Kim S J Lee D H Kim and K J Oh ldquoUsingdecision tree to develop a soil ecological quality assessmentsystem for planning sustainable constructionrdquo Expert Systemswith Applications vol 38 no 5 pp 5463ndash5470 2011

[3] F Darve ldquoLiquefaction phenomenon of granular materials andconstitutive stabilityrdquo Engineering Computations vol 13 no 7pp 5ndash28 1996

[4] S S H Khozaghi and A J A Choobbasti ldquoPredicting ofliquefaction potential in soils using artificial neural networksrdquoElectronic Journal of Geotechnical Engineering vol 12 article0723 2007

8 The Scientific World Journal

[5] H B Seed and I M Idriss ldquoSimplified procedure for evaluatingsoil liquefaction potentialrdquo Journal of the Soil Mechanics andFoundations Division vol 97 no 9 pp 1249ndash1273 1971

[6] K Tokimatsu and Y Yoshimi ldquoEmpirical correlation of soilliquefaction based on SPT N-value and fines contentrdquo Soils andFoundations vol 23 no 4 pp 56ndash74 1983

[7] A Arman N Samtani R Castelli and G Munfakh ldquoGeotech-nical and foundation engineering module 1-subsurface investi-gationsrdquo FHWA-HI-97-021 1997

[8] G Kass ldquoAn exploratory technique for investigating largequantities of categorical datardquo Applied Statistics vol 29 no 2pp 119ndash127 1980

[9] D Biggs B de Ville and E Suen ldquoA method of choosing mul-tiway partitions for classification and decision treesrdquo Journal ofApplied Statistics vol 18 no 1 pp 49ndash62 1991

[10] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees ChapmanampHallCRC NewYork NY USA 1984

[11] A T C Goh ldquoSeismic liquefaction potential assessed by neuralnetworksrdquo Journal of Geotechnical Engineering vol 120 no 9pp 1467ndash1480 1994

[12] A T C Goh ldquoNeural-network modeling of CPT seismic liq-uefaction datardquo Journal of Geotechnical and GeoenvironmentalEngineering vol 122 no 1 pp 70ndash73 1996

[13] Y Najjar and H Ali ldquoOn the use of BPNN in liquefactionpotential assessment tasksrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligent and Mathematical Methods inPavement and Geomechanical Systems N O Attoh-Okine Edpp 55ndash63 1998

[14] A T C Goh ldquoProbabilistic neural network for evaluatingseismic liquefaction potentialrdquo Canadian Geotechnical Journalvol 39 no 1 pp 219ndash232 2002

[15] M Pal ldquoSupport vector machines-based modelling of seismicliquefaction potentialrdquo International Journal for Numerical andAnalytical Methods in Geomechanics vol 30 no 10 pp 983ndash996 2006

[16] H B Zhao Z L Ru and S Yin ldquoUpdated support vectormachine for seismic liquefaction evaluation based on thepenetration testsrdquoMarine Georesources and Geotechnology vol25 no 3-4 pp 209ndash220 2007

[17] AMHannaDUral andG Saygili ldquoNeural networkmodel forliquefaction potential in soil deposits using Turkey and Taiwanearthquake datardquo Soil Dynamics and Earthquake Engineeringvol 27 no 6 pp 521ndash540 2007

[18] A T C Goh and S H Goh ldquoSupport vector machines theiruse in geotechnical engineering as illustrated using seismicliquefaction datardquoComputers and Geotechnics vol 34 no 5 pp410ndash421 2007

[19] A Baykasoglu A Cevik L Ozbakir and S Kulluk ldquoGeneratingprediction rules for liquefaction through data miningrdquo ExpertSystems with Applications vol 36 no 10 pp 12491ndash12499 2009

[20] A H Gandomi and A H Alavi ldquoMulti-stage genetic pro-gramming a new strategy to nonlinear system modelingrdquoInformation Sciences vol 181 no 23 pp 5227ndash5239 2011

[21] C Kayadelen ldquoSoil liquefactionmodeling by genetic expressionprogramming and neuro-fuzzyrdquo Expert Systems with Applica-tions vol 38 no 4 pp 4080ndash4087 2011

[22] A H Gandomi and A H Alavi ldquoA new multi-gene geneticprogramming approach to non-linear system modelingmdashpartII geotechnical and earthquake engineering problemsrdquo NeuralComputing and Applications vol 21 no 1 pp 189ndash201 2012

[23] P M Muduli and S K Das ldquoCPT-based seismic liquefactionpotential evaluation using multi-gene genetic programmingapproachrdquo Indian Geotechnical Journal 2013

[24] A H Gandomi and A H Alavi ldquoHybridizing genetic pro-gramming with orthogonal least squares for modeling of soilliquefactionrdquo International Journal of Earthquake Engineeringand Hazard Mitigation vol 1 no 1 pp 2ndash8 2013

[25] S Y LaiW J Chang and P S Lin ldquoLogistic regression nodel forevaluating soil liquefaction probability using CPT datardquo Journalof Geotechnical and Geoenvironmental Engineering vol 132 no6 pp 694ndash704 2006

[26] G Livingston M Piantedosi P Kurup and T G SitharamldquoUsing decision-tree learning to assess liquefaction potentialfrom CPT and Vsrdquo in Proceedings of the 4th GeotechnicalEarthquake Engineering and Soil Dynamics Congress pp 1ndash10May 2008

[27] H Pham Ed Springer Handbook of Engineering StatisticsSpringer London UK 2006

[28] M J Noruesis IBM SPSS Decision Trees 20 IBM Corporation3rd edition 2011

[29] SPSS Inc ver 21 2013 httpwwwspsscom

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

The Scientific World Journal 5

half of the parent nodesrsquo minimum For this study 30 and 15are set as the values for the parent and child node maximumnumber of cases Two chi-square statistics are commonlyused in the CHAID and E-CHAID algorithms Pearson andlikelihood ratio Pearson is faster than the likelihood ratioand is more suitable for very large databases The likelihoodratio is more robust than Pearson but its calculation is moretime consuming and it is therefore more suitable for a smalldatabase [28] Due to its robustness the likelihood ratio isused in this study as the chi-square statistic for both theCHAID and E-CHAID algorithms

For the CART algorithm the minimum number of casesin the parent and child nodes is set just like the other DTalgorithms The maximum tree depth is set as 7 and pruningis used to avoid overfitting (pruning at least one level in thisproblem) IBM SPSS software package version 21 [29] wasused to perform the analysis

33 Model Construction Using LR In the conventional clas-sification process logistic regression analysis is an importanttool for building a model In this study the LR algorithmwas performed to approximate the predictive power ofthe DT techniques in comparison with another statisticalclassification technique

The LR models cannot be developed using the sameinput variables and data divisions as DTs Before findingthe final LR model any multicollinearity problems betweenthe independent variables were solved by removing thehighly correlated independent variables from the modelSevere multicollinearity can cause instability in the modelcoefficients especially when highly correlated variables areincluded in the model

4 Results and Discussion

As described above and indicated in Table 4 six different DT-based models were obtained for the assessment of the soilliquefaction The DT models for which the SPT results werethe first variable are indicated with the suffix ldquoSPTrdquo Overallperformance of the DT and LR-based models in terms ofmisclassification rate (risk) on the training data and testingdata subsets and the full data set are summarized in Table 4

Two important characteristics of a classification modelare sensitivity and specificity Sensitivity quantifies the pro-portion of correctly identified positives (liquefied soils) andspecificity gives the proportion of correctly identified nega-tives (nonliquefied soils) Sensitivity and specificity of all DTand LR models are also presented in Table 5 Comparing theperformance of DTs and LRs as tabulated in Tables 4 and5 the DT results are clearly better than the LR results theDTs generally have lower risk values and higher sensitivityand specificity values on the training testing and completedatasets An interesting point from these tables is that whenthe DT algorithms are forced to used SPT results as thefirst variable the model performances on the testing datasets(unseen data) are always improved This suggests that thestrategy of using SPT as the first variable is very effective forfuture prediction of new datasets However as expected if

0102030405060708090100110

0 20 40 60 80 100

Gain

Percentile

CHAID-SPTCHAID

Figure 2 Gain plot of the CHAID and CHAID-SPT models

the DTs use their own criteria for choosing the first variableduring the training process performance is improved on thetraining datasets but not the testing datasets

As shown in Table 4 CHAID models (CHAID andCHAID-SPT) produce the lowest risks among all DTs andLRs CHAID-SPThas the best performance on unseen testingdata sets (risk equal to 0169) and regular CHAID has the bestperformance on the training dataset (risk equal to 0141) Aspredicting liquefied soil after earthquakes is very importantthe highest sensitivity value for each set is bolded in Table 5From this table it can be seen that CHAID-SPT has thehighest sensitivity on the testing datasets (sensitivity equal to0867) and CHAID has the highest sensitivity on the trainingdatasets (sensitivity equal to 0806) Therefore these twomodels have been chosen as the final models

Percentage of total records in the target category in eachnode is recognized as gain The gains plot is a line chartof cumulative percentile gains calculated using cumulativepercentile target over total target [28] The gains plot of theselected models (CHAID and CHAID-SPT) for the trainingresults is presented in Figure 2 In this figure the straight lineshows the random selection model and the area between itand the model curve shows the model benefit area Figure 2shows the effectiveness of the CHAID and CHAID-SPTmodels From this figure it is clear that the CHAID modelhas the better benefit on the training data

The final decision tree for the CHAID-SPT model ispresented in Figure 3 This tree has 22 nodes 14 of which areterminal nodes The first level of the tree was split into fourinitial branches corresponding to four ranges of SPT valuesAs seen in the first level of the tree the two nodes for thehighest SPT values are terminal nodes which are interestingand show the importance of the SPT variable For these twonodes (nodes 3 and 4) the chance of liquefaction is very lowwhich corresponds with geotechnical engineering theory

In the second level the percentage of fines is the bestpredictor for the soils with the lowest SPT values and itsthree splits show that liquefaction probability decreases with

6 The Scientific World Journal

Table 4 Risk estimation of the DT and LR models

Model Testing Training AllCHAID 0234 0141 0160E-CHAID 0234 0161 0176CART 0194 0163 0169CHAID-SPT 0169a 0188 0184E-CHAID-SPT 0177 0161 0165CART-SPT 0185 0165 0169LR 0258 0270 0268aBold sets are the lowest risk values for each set

Table 5 Sensitivity and specificity of the DT and LR models

Model Testing Training AllSensitivity Specificity Sensitivity Specificity Sensitivity Specificity

CHAID 0622 0848 0806 0898 0773 0887E-CHAID 0644 0835 0754 0902 0734 0887CART 0667 0886 0739 0909 0727 0904CHAID-SPT 0867a 0810 0763 0849 0781 0841E-CHAID-SPT 0711 0886 0768 0891 0758 0890CART-SPT 0800 0823 0796 0863 0797 0854LR 0822 0696 0673 0772 0699 0755aBold sets are the highest sensitivity values for each set

increasing percentage of fines For larger SPT values (14 ltSPT le 24) peak ground surface horizontal acceleration is thebest predictor In this range increasing peak horizontal accel-eration substantially raises the chance of liquefaction whichcorresponds with earthquake engineering fundamentals

The third level consists largely of terminal nodes and hasbranches based on four different variables earthquake mag-nitude cyclic stress ratio total vertical stress and percentageof fines In this level the liquefaction probability decreaseswith increasing total vertical stress and increasing cyclicstress ratio Like in the second level liquefaction probabilitydecreases with increasing percentage of fines

In the fourth level there are five terminal nodes onenode is split into three ranges of soil friction angle and onenode is split using two ranges of effective vertical stressIn this level the liquefaction probability decreases withincreasing effective vertical stress which corresponds to thetrend of total vertical stress in the third level and geotechnicalengineering fundamentals

The final decision tree for the CHAID model is shown inFigure 4This tree hasmore terminal nodes than theCHAID-SPT tree with 26 nodes (17 of which are terminal nodes)Thefirst level of the tree was split to six initial branches accordingto the initial soil friction angle which was indicated as thevariable most associated with liquefaction according to theCHAID algorithm As seen in the first level of the tree thenode for the highest angles (1205931015840 gt 3929∘) is a terminal nodethat indicates zero probability of liquefaction

In the second level nodes 1 and 2 related to the lowestinitial soil friction angle (1205931015840 le 2734∘) are split using thepercentage of fines This indicates that percentage of finesis most associated with liquefaction prediction for low soilfriction angles From the branches of the nodes 1 and 2 of the

CHAID tree (in the third level) the liquefaction probabilitydecreases with increasing percentage of fines which agreeswith the previous resultsThe soil with an initial friction anglebetween the 2849∘ and 335∘ the cyclic stress ratio is thebest predictor of liquefaction The probability of liquefactionincreases significantly with an increasing cyclic stress ratio

In the third level only one node is a parent node theother six nodes are terminal nodes In the fourth level thereare only two terminal nodes split using the percentage offines From the third and forth levels it can be noted thatliquefaction probability decreases with increasing percentageof fines

Froma comparison of theCHAID-SPT andCHAID treesit can be seen that percentage of fines is the best second-levelpredictor for the lowest SPT values ((119873

1)60le 14) and for the

lowest initial soil friction angle values (1205931015840 le 2734∘) Anothersimilarity of the developed CHAID and CHAID-SPT trees inthe second level is that peak horizontal acceleration is the bestpredictor for midrange values of SPT (14 lt (119873

1)60le 24) and

midrange values of soil friction angle (335∘ lt 1205931015840 le 3929∘)In the second level for the highest values of SPT and initialsoil friction angle there is zero liquefaction probability Fromthese similarities there appears to be a strong correlationbetween SPT and initial soil friction angle This generalconclusion can be also verifiedwith the correlation coefficientof these two parameters (presented in Table 3) From thesedata SPT and initial soil friction angle are high correlatedparameters

5 Summary and Conclusions

Six decision tree models were successfully developed in thisstudy to classify the liquefied and nonliquefied soil using

The Scientific World Journal 7

PHA le 18 g 18 g lt PHA

LiquefactionNo 575Yes 425 Root node

No 524Yes 476

Node 1

No 25Yes 75

Node 5

No 65Yes 935

Node 10

No 42Yes 58 Node 11

No 432Yes 568

Node 6

No 808Yes 192

Node 12

No 255Yes 745

Node 13

No 768Yes 232

Node 7

No 724Yes 276 Node 14

No 761Yes 239 Node 18

No 538Yes 462

Node 19

No 903Yes 97 Node 20

No 100Yes 0 Node 15

No 41Yes 59 Node 2

No 923Yes 77 Node 8

No 23Yes 77

Node 9

No 0Yes 100

Node 16

No 321Yes 679

Node 17

No 156Yes 844 Node 21

No 571Yes 429

Node 22

No 909Yes 91 Node 3

No 100Yes 0 Node 4

(N1)60 le 14 14 lt (N1)60 le 24 24 lt (N1)60 le 28 28 lt (N1)60

F75 le 74 74 lt F75 le 93 93 lt F75

Mw le 74 74lt Mw 028 lt 120590vo le 1741kPa 1741kPa lt 120590vo F75 le 19 19 lt F75

120593998400 le ltle 977kPa 977kPa2734∘ 298∘ lt 1205939984002734 lt 120593998400 le 298

120591av120590998400vo le 028 120591av120590998400vo

120590998400vo120590998400vo

Figure 3 Final CHAID-SPT tree

LiquefactionNo 575Yes 425

Root node

No 383Yes 617

Node 1

No 43Yes 957

Node 7

No 708Yes 292

Node 8

No 585Yes 415

Node 2

No 353Yes 647 Node 9

No 100Yes 0 Node 10

No 294Yes 706

Node 3

No 171Yes 829

Node 11

No 562Yes 438

Node 12

No 48Yes 52

Node 4

No 851Yes 149

Node 13

No 60Yes 40 Node 18

No 100Yes 0 Node 19

No 325Yes 675 Node 14

No 100Yes 0 Node 20

No 206Yes 794 Node 21

No 85Yes 915 Node 25

No 538Yes 462 Node 26

No 0Yes 100 Node 15

No 778Yes 222 Node 5

No 100Yes 0 Node 16

No 651Yes 349

Node 17

No 824Yes 176

Node 22

No 19Yes 81 Node 23

No 92Yes 8 Node 24

No 100Yes 0 Node 6

(N1)60 le 6 18 lt (N1)60 le 24(N1)60 le 186 lt (N1)60 24 lt (N1)60F75 le 74 74 lt F75

F75 le 93 93 lt F75

F75 le 93 93 lt F75 F75 le 97 97 lt F75 at le 007 g 007 g le at le 028

120593998400 le le 2734 le 2849

le 058

2734 lt 12059399840026 lt 12059399840026∘

PHA le 18 g 18 g lt PHA

335 lt 120593998400 le 39292849 lt 120593998400 le 335

058 lt

3929∘ lt 120593998400

028 lt 120591av120590998400vo 120591av120590998400vo120591av120590998400vo

Figure 4 Final CHAID tree

three different techniques Based on geotechnical engineeringtheory three of the decision trees were developed usingSPT as the first variable The results of the DT models werecompared with those of an LR model the results show thatDT models generally have better performance than the LRmodel for this problem

The two best models (CHAID and CHAID-SPT) werechosen based on the risk values and the sensitivity of theliquefaction prediction The CHAID-SPT tree has betterperformance on the unseen (testing) dataset indicating thatbasing DTmodels on engineering fundamentals can improvemodel performance The general interpretation of the treescorresponds well with engineering theory which indicatesthe reliability of themodels From the interpretation results ofthe trees the SPT values and initial soil friction angle values(used as first variables of the trees) have strong correlation

Some terminal nodes have small sample sizes whichmay lead to results that are anecdotal However theseanecdotal conclusions about soil liquefaction assessment arehighly informative from an exploratory point of view Asthe earthquake parameters do not have a wide range theirinterpretation may not be accurate enough Therefore for

future research the database used in this study should beexpanded to a wider range of earthquake parameters

References

[1] AHGandomi X S Yang S Talatahari andAHAlavi ldquoMeta-heuristics in modeling and optimizationrdquo in MetaheuristicApplications in Structures and Infrastructures A H GandomiX S Yang S Talatahari and A H Alavi Eds chapter 1 pp1ndash24 Elsevier New York NY USA 2013

[2] J Park D Ki K Kim S J Lee D H Kim and K J Oh ldquoUsingdecision tree to develop a soil ecological quality assessmentsystem for planning sustainable constructionrdquo Expert Systemswith Applications vol 38 no 5 pp 5463ndash5470 2011

[3] F Darve ldquoLiquefaction phenomenon of granular materials andconstitutive stabilityrdquo Engineering Computations vol 13 no 7pp 5ndash28 1996

[4] S S H Khozaghi and A J A Choobbasti ldquoPredicting ofliquefaction potential in soils using artificial neural networksrdquoElectronic Journal of Geotechnical Engineering vol 12 article0723 2007

8 The Scientific World Journal

[5] H B Seed and I M Idriss ldquoSimplified procedure for evaluatingsoil liquefaction potentialrdquo Journal of the Soil Mechanics andFoundations Division vol 97 no 9 pp 1249ndash1273 1971

[6] K Tokimatsu and Y Yoshimi ldquoEmpirical correlation of soilliquefaction based on SPT N-value and fines contentrdquo Soils andFoundations vol 23 no 4 pp 56ndash74 1983

[7] A Arman N Samtani R Castelli and G Munfakh ldquoGeotech-nical and foundation engineering module 1-subsurface investi-gationsrdquo FHWA-HI-97-021 1997

[8] G Kass ldquoAn exploratory technique for investigating largequantities of categorical datardquo Applied Statistics vol 29 no 2pp 119ndash127 1980

[9] D Biggs B de Ville and E Suen ldquoA method of choosing mul-tiway partitions for classification and decision treesrdquo Journal ofApplied Statistics vol 18 no 1 pp 49ndash62 1991

[10] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees ChapmanampHallCRC NewYork NY USA 1984

[11] A T C Goh ldquoSeismic liquefaction potential assessed by neuralnetworksrdquo Journal of Geotechnical Engineering vol 120 no 9pp 1467ndash1480 1994

[12] A T C Goh ldquoNeural-network modeling of CPT seismic liq-uefaction datardquo Journal of Geotechnical and GeoenvironmentalEngineering vol 122 no 1 pp 70ndash73 1996

[13] Y Najjar and H Ali ldquoOn the use of BPNN in liquefactionpotential assessment tasksrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligent and Mathematical Methods inPavement and Geomechanical Systems N O Attoh-Okine Edpp 55ndash63 1998

[14] A T C Goh ldquoProbabilistic neural network for evaluatingseismic liquefaction potentialrdquo Canadian Geotechnical Journalvol 39 no 1 pp 219ndash232 2002

[15] M Pal ldquoSupport vector machines-based modelling of seismicliquefaction potentialrdquo International Journal for Numerical andAnalytical Methods in Geomechanics vol 30 no 10 pp 983ndash996 2006

[16] H B Zhao Z L Ru and S Yin ldquoUpdated support vectormachine for seismic liquefaction evaluation based on thepenetration testsrdquoMarine Georesources and Geotechnology vol25 no 3-4 pp 209ndash220 2007

[17] AMHannaDUral andG Saygili ldquoNeural networkmodel forliquefaction potential in soil deposits using Turkey and Taiwanearthquake datardquo Soil Dynamics and Earthquake Engineeringvol 27 no 6 pp 521ndash540 2007

[18] A T C Goh and S H Goh ldquoSupport vector machines theiruse in geotechnical engineering as illustrated using seismicliquefaction datardquoComputers and Geotechnics vol 34 no 5 pp410ndash421 2007

[19] A Baykasoglu A Cevik L Ozbakir and S Kulluk ldquoGeneratingprediction rules for liquefaction through data miningrdquo ExpertSystems with Applications vol 36 no 10 pp 12491ndash12499 2009

[20] A H Gandomi and A H Alavi ldquoMulti-stage genetic pro-gramming a new strategy to nonlinear system modelingrdquoInformation Sciences vol 181 no 23 pp 5227ndash5239 2011

[21] C Kayadelen ldquoSoil liquefactionmodeling by genetic expressionprogramming and neuro-fuzzyrdquo Expert Systems with Applica-tions vol 38 no 4 pp 4080ndash4087 2011

[22] A H Gandomi and A H Alavi ldquoA new multi-gene geneticprogramming approach to non-linear system modelingmdashpartII geotechnical and earthquake engineering problemsrdquo NeuralComputing and Applications vol 21 no 1 pp 189ndash201 2012

[23] P M Muduli and S K Das ldquoCPT-based seismic liquefactionpotential evaluation using multi-gene genetic programmingapproachrdquo Indian Geotechnical Journal 2013

[24] A H Gandomi and A H Alavi ldquoHybridizing genetic pro-gramming with orthogonal least squares for modeling of soilliquefactionrdquo International Journal of Earthquake Engineeringand Hazard Mitigation vol 1 no 1 pp 2ndash8 2013

[25] S Y LaiW J Chang and P S Lin ldquoLogistic regression nodel forevaluating soil liquefaction probability using CPT datardquo Journalof Geotechnical and Geoenvironmental Engineering vol 132 no6 pp 694ndash704 2006

[26] G Livingston M Piantedosi P Kurup and T G SitharamldquoUsing decision-tree learning to assess liquefaction potentialfrom CPT and Vsrdquo in Proceedings of the 4th GeotechnicalEarthquake Engineering and Soil Dynamics Congress pp 1ndash10May 2008

[27] H Pham Ed Springer Handbook of Engineering StatisticsSpringer London UK 2006

[28] M J Noruesis IBM SPSS Decision Trees 20 IBM Corporation3rd edition 2011

[29] SPSS Inc ver 21 2013 httpwwwspsscom

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

6 The Scientific World Journal

Table 4 Risk estimation of the DT and LR models

Model Testing Training AllCHAID 0234 0141 0160E-CHAID 0234 0161 0176CART 0194 0163 0169CHAID-SPT 0169a 0188 0184E-CHAID-SPT 0177 0161 0165CART-SPT 0185 0165 0169LR 0258 0270 0268aBold sets are the lowest risk values for each set

Table 5 Sensitivity and specificity of the DT and LR models

Model Testing Training AllSensitivity Specificity Sensitivity Specificity Sensitivity Specificity

CHAID 0622 0848 0806 0898 0773 0887E-CHAID 0644 0835 0754 0902 0734 0887CART 0667 0886 0739 0909 0727 0904CHAID-SPT 0867a 0810 0763 0849 0781 0841E-CHAID-SPT 0711 0886 0768 0891 0758 0890CART-SPT 0800 0823 0796 0863 0797 0854LR 0822 0696 0673 0772 0699 0755aBold sets are the highest sensitivity values for each set

increasing percentage of fines For larger SPT values (14 ltSPT le 24) peak ground surface horizontal acceleration is thebest predictor In this range increasing peak horizontal accel-eration substantially raises the chance of liquefaction whichcorresponds with earthquake engineering fundamentals

The third level consists largely of terminal nodes and hasbranches based on four different variables earthquake mag-nitude cyclic stress ratio total vertical stress and percentageof fines In this level the liquefaction probability decreaseswith increasing total vertical stress and increasing cyclicstress ratio Like in the second level liquefaction probabilitydecreases with increasing percentage of fines

In the fourth level there are five terminal nodes onenode is split into three ranges of soil friction angle and onenode is split using two ranges of effective vertical stressIn this level the liquefaction probability decreases withincreasing effective vertical stress which corresponds to thetrend of total vertical stress in the third level and geotechnicalengineering fundamentals

The final decision tree for the CHAID model is shown inFigure 4This tree hasmore terminal nodes than theCHAID-SPT tree with 26 nodes (17 of which are terminal nodes)Thefirst level of the tree was split to six initial branches accordingto the initial soil friction angle which was indicated as thevariable most associated with liquefaction according to theCHAID algorithm As seen in the first level of the tree thenode for the highest angles (1205931015840 gt 3929∘) is a terminal nodethat indicates zero probability of liquefaction

In the second level nodes 1 and 2 related to the lowestinitial soil friction angle (1205931015840 le 2734∘) are split using thepercentage of fines This indicates that percentage of finesis most associated with liquefaction prediction for low soilfriction angles From the branches of the nodes 1 and 2 of the

CHAID tree (in the third level) the liquefaction probabilitydecreases with increasing percentage of fines which agreeswith the previous resultsThe soil with an initial friction anglebetween the 2849∘ and 335∘ the cyclic stress ratio is thebest predictor of liquefaction The probability of liquefactionincreases significantly with an increasing cyclic stress ratio

In the third level only one node is a parent node theother six nodes are terminal nodes In the fourth level thereare only two terminal nodes split using the percentage offines From the third and forth levels it can be noted thatliquefaction probability decreases with increasing percentageof fines

Froma comparison of theCHAID-SPT andCHAID treesit can be seen that percentage of fines is the best second-levelpredictor for the lowest SPT values ((119873

1)60le 14) and for the

lowest initial soil friction angle values (1205931015840 le 2734∘) Anothersimilarity of the developed CHAID and CHAID-SPT trees inthe second level is that peak horizontal acceleration is the bestpredictor for midrange values of SPT (14 lt (119873

1)60le 24) and

midrange values of soil friction angle (335∘ lt 1205931015840 le 3929∘)In the second level for the highest values of SPT and initialsoil friction angle there is zero liquefaction probability Fromthese similarities there appears to be a strong correlationbetween SPT and initial soil friction angle This generalconclusion can be also verifiedwith the correlation coefficientof these two parameters (presented in Table 3) From thesedata SPT and initial soil friction angle are high correlatedparameters

5 Summary and Conclusions

Six decision tree models were successfully developed in thisstudy to classify the liquefied and nonliquefied soil using

The Scientific World Journal 7

PHA le 18 g 18 g lt PHA

LiquefactionNo 575Yes 425 Root node

No 524Yes 476

Node 1

No 25Yes 75

Node 5

No 65Yes 935

Node 10

No 42Yes 58 Node 11

No 432Yes 568

Node 6

No 808Yes 192

Node 12

No 255Yes 745

Node 13

No 768Yes 232

Node 7

No 724Yes 276 Node 14

No 761Yes 239 Node 18

No 538Yes 462

Node 19

No 903Yes 97 Node 20

No 100Yes 0 Node 15

No 41Yes 59 Node 2

No 923Yes 77 Node 8

No 23Yes 77

Node 9

No 0Yes 100

Node 16

No 321Yes 679

Node 17

No 156Yes 844 Node 21

No 571Yes 429

Node 22

No 909Yes 91 Node 3

No 100Yes 0 Node 4

(N1)60 le 14 14 lt (N1)60 le 24 24 lt (N1)60 le 28 28 lt (N1)60

F75 le 74 74 lt F75 le 93 93 lt F75

Mw le 74 74lt Mw 028 lt 120590vo le 1741kPa 1741kPa lt 120590vo F75 le 19 19 lt F75

120593998400 le ltle 977kPa 977kPa2734∘ 298∘ lt 1205939984002734 lt 120593998400 le 298

120591av120590998400vo le 028 120591av120590998400vo

120590998400vo120590998400vo

Figure 3 Final CHAID-SPT tree

LiquefactionNo 575Yes 425

Root node

No 383Yes 617

Node 1

No 43Yes 957

Node 7

No 708Yes 292

Node 8

No 585Yes 415

Node 2

No 353Yes 647 Node 9

No 100Yes 0 Node 10

No 294Yes 706

Node 3

No 171Yes 829

Node 11

No 562Yes 438

Node 12

No 48Yes 52

Node 4

No 851Yes 149

Node 13

No 60Yes 40 Node 18

No 100Yes 0 Node 19

No 325Yes 675 Node 14

No 100Yes 0 Node 20

No 206Yes 794 Node 21

No 85Yes 915 Node 25

No 538Yes 462 Node 26

No 0Yes 100 Node 15

No 778Yes 222 Node 5

No 100Yes 0 Node 16

No 651Yes 349

Node 17

No 824Yes 176

Node 22

No 19Yes 81 Node 23

No 92Yes 8 Node 24

No 100Yes 0 Node 6

(N1)60 le 6 18 lt (N1)60 le 24(N1)60 le 186 lt (N1)60 24 lt (N1)60F75 le 74 74 lt F75

F75 le 93 93 lt F75

F75 le 93 93 lt F75 F75 le 97 97 lt F75 at le 007 g 007 g le at le 028

120593998400 le le 2734 le 2849

le 058

2734 lt 12059399840026 lt 12059399840026∘

PHA le 18 g 18 g lt PHA

335 lt 120593998400 le 39292849 lt 120593998400 le 335

058 lt

3929∘ lt 120593998400

028 lt 120591av120590998400vo 120591av120590998400vo120591av120590998400vo

Figure 4 Final CHAID tree

three different techniques Based on geotechnical engineeringtheory three of the decision trees were developed usingSPT as the first variable The results of the DT models werecompared with those of an LR model the results show thatDT models generally have better performance than the LRmodel for this problem

The two best models (CHAID and CHAID-SPT) werechosen based on the risk values and the sensitivity of theliquefaction prediction The CHAID-SPT tree has betterperformance on the unseen (testing) dataset indicating thatbasing DTmodels on engineering fundamentals can improvemodel performance The general interpretation of the treescorresponds well with engineering theory which indicatesthe reliability of themodels From the interpretation results ofthe trees the SPT values and initial soil friction angle values(used as first variables of the trees) have strong correlation

Some terminal nodes have small sample sizes whichmay lead to results that are anecdotal However theseanecdotal conclusions about soil liquefaction assessment arehighly informative from an exploratory point of view Asthe earthquake parameters do not have a wide range theirinterpretation may not be accurate enough Therefore for

future research the database used in this study should beexpanded to a wider range of earthquake parameters

References

[1] AHGandomi X S Yang S Talatahari andAHAlavi ldquoMeta-heuristics in modeling and optimizationrdquo in MetaheuristicApplications in Structures and Infrastructures A H GandomiX S Yang S Talatahari and A H Alavi Eds chapter 1 pp1ndash24 Elsevier New York NY USA 2013

[2] J Park D Ki K Kim S J Lee D H Kim and K J Oh ldquoUsingdecision tree to develop a soil ecological quality assessmentsystem for planning sustainable constructionrdquo Expert Systemswith Applications vol 38 no 5 pp 5463ndash5470 2011

[3] F Darve ldquoLiquefaction phenomenon of granular materials andconstitutive stabilityrdquo Engineering Computations vol 13 no 7pp 5ndash28 1996

[4] S S H Khozaghi and A J A Choobbasti ldquoPredicting ofliquefaction potential in soils using artificial neural networksrdquoElectronic Journal of Geotechnical Engineering vol 12 article0723 2007

8 The Scientific World Journal

[5] H B Seed and I M Idriss ldquoSimplified procedure for evaluatingsoil liquefaction potentialrdquo Journal of the Soil Mechanics andFoundations Division vol 97 no 9 pp 1249ndash1273 1971

[6] K Tokimatsu and Y Yoshimi ldquoEmpirical correlation of soilliquefaction based on SPT N-value and fines contentrdquo Soils andFoundations vol 23 no 4 pp 56ndash74 1983

[7] A Arman N Samtani R Castelli and G Munfakh ldquoGeotech-nical and foundation engineering module 1-subsurface investi-gationsrdquo FHWA-HI-97-021 1997

[8] G Kass ldquoAn exploratory technique for investigating largequantities of categorical datardquo Applied Statistics vol 29 no 2pp 119ndash127 1980

[9] D Biggs B de Ville and E Suen ldquoA method of choosing mul-tiway partitions for classification and decision treesrdquo Journal ofApplied Statistics vol 18 no 1 pp 49ndash62 1991

[10] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees ChapmanampHallCRC NewYork NY USA 1984

[11] A T C Goh ldquoSeismic liquefaction potential assessed by neuralnetworksrdquo Journal of Geotechnical Engineering vol 120 no 9pp 1467ndash1480 1994

[12] A T C Goh ldquoNeural-network modeling of CPT seismic liq-uefaction datardquo Journal of Geotechnical and GeoenvironmentalEngineering vol 122 no 1 pp 70ndash73 1996

[13] Y Najjar and H Ali ldquoOn the use of BPNN in liquefactionpotential assessment tasksrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligent and Mathematical Methods inPavement and Geomechanical Systems N O Attoh-Okine Edpp 55ndash63 1998

[14] A T C Goh ldquoProbabilistic neural network for evaluatingseismic liquefaction potentialrdquo Canadian Geotechnical Journalvol 39 no 1 pp 219ndash232 2002

[15] M Pal ldquoSupport vector machines-based modelling of seismicliquefaction potentialrdquo International Journal for Numerical andAnalytical Methods in Geomechanics vol 30 no 10 pp 983ndash996 2006

[16] H B Zhao Z L Ru and S Yin ldquoUpdated support vectormachine for seismic liquefaction evaluation based on thepenetration testsrdquoMarine Georesources and Geotechnology vol25 no 3-4 pp 209ndash220 2007

[17] AMHannaDUral andG Saygili ldquoNeural networkmodel forliquefaction potential in soil deposits using Turkey and Taiwanearthquake datardquo Soil Dynamics and Earthquake Engineeringvol 27 no 6 pp 521ndash540 2007

[18] A T C Goh and S H Goh ldquoSupport vector machines theiruse in geotechnical engineering as illustrated using seismicliquefaction datardquoComputers and Geotechnics vol 34 no 5 pp410ndash421 2007

[19] A Baykasoglu A Cevik L Ozbakir and S Kulluk ldquoGeneratingprediction rules for liquefaction through data miningrdquo ExpertSystems with Applications vol 36 no 10 pp 12491ndash12499 2009

[20] A H Gandomi and A H Alavi ldquoMulti-stage genetic pro-gramming a new strategy to nonlinear system modelingrdquoInformation Sciences vol 181 no 23 pp 5227ndash5239 2011

[21] C Kayadelen ldquoSoil liquefactionmodeling by genetic expressionprogramming and neuro-fuzzyrdquo Expert Systems with Applica-tions vol 38 no 4 pp 4080ndash4087 2011

[22] A H Gandomi and A H Alavi ldquoA new multi-gene geneticprogramming approach to non-linear system modelingmdashpartII geotechnical and earthquake engineering problemsrdquo NeuralComputing and Applications vol 21 no 1 pp 189ndash201 2012

[23] P M Muduli and S K Das ldquoCPT-based seismic liquefactionpotential evaluation using multi-gene genetic programmingapproachrdquo Indian Geotechnical Journal 2013

[24] A H Gandomi and A H Alavi ldquoHybridizing genetic pro-gramming with orthogonal least squares for modeling of soilliquefactionrdquo International Journal of Earthquake Engineeringand Hazard Mitigation vol 1 no 1 pp 2ndash8 2013

[25] S Y LaiW J Chang and P S Lin ldquoLogistic regression nodel forevaluating soil liquefaction probability using CPT datardquo Journalof Geotechnical and Geoenvironmental Engineering vol 132 no6 pp 694ndash704 2006

[26] G Livingston M Piantedosi P Kurup and T G SitharamldquoUsing decision-tree learning to assess liquefaction potentialfrom CPT and Vsrdquo in Proceedings of the 4th GeotechnicalEarthquake Engineering and Soil Dynamics Congress pp 1ndash10May 2008

[27] H Pham Ed Springer Handbook of Engineering StatisticsSpringer London UK 2006

[28] M J Noruesis IBM SPSS Decision Trees 20 IBM Corporation3rd edition 2011

[29] SPSS Inc ver 21 2013 httpwwwspsscom

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

The Scientific World Journal 7

PHA le 18 g 18 g lt PHA

LiquefactionNo 575Yes 425 Root node

No 524Yes 476

Node 1

No 25Yes 75

Node 5

No 65Yes 935

Node 10

No 42Yes 58 Node 11

No 432Yes 568

Node 6

No 808Yes 192

Node 12

No 255Yes 745

Node 13

No 768Yes 232

Node 7

No 724Yes 276 Node 14

No 761Yes 239 Node 18

No 538Yes 462

Node 19

No 903Yes 97 Node 20

No 100Yes 0 Node 15

No 41Yes 59 Node 2

No 923Yes 77 Node 8

No 23Yes 77

Node 9

No 0Yes 100

Node 16

No 321Yes 679

Node 17

No 156Yes 844 Node 21

No 571Yes 429

Node 22

No 909Yes 91 Node 3

No 100Yes 0 Node 4

(N1)60 le 14 14 lt (N1)60 le 24 24 lt (N1)60 le 28 28 lt (N1)60

F75 le 74 74 lt F75 le 93 93 lt F75

Mw le 74 74lt Mw 028 lt 120590vo le 1741kPa 1741kPa lt 120590vo F75 le 19 19 lt F75

120593998400 le ltle 977kPa 977kPa2734∘ 298∘ lt 1205939984002734 lt 120593998400 le 298

120591av120590998400vo le 028 120591av120590998400vo

120590998400vo120590998400vo

Figure 3 Final CHAID-SPT tree

LiquefactionNo 575Yes 425

Root node

No 383Yes 617

Node 1

No 43Yes 957

Node 7

No 708Yes 292

Node 8

No 585Yes 415

Node 2

No 353Yes 647 Node 9

No 100Yes 0 Node 10

No 294Yes 706

Node 3

No 171Yes 829

Node 11

No 562Yes 438

Node 12

No 48Yes 52

Node 4

No 851Yes 149

Node 13

No 60Yes 40 Node 18

No 100Yes 0 Node 19

No 325Yes 675 Node 14

No 100Yes 0 Node 20

No 206Yes 794 Node 21

No 85Yes 915 Node 25

No 538Yes 462 Node 26

No 0Yes 100 Node 15

No 778Yes 222 Node 5

No 100Yes 0 Node 16

No 651Yes 349

Node 17

No 824Yes 176

Node 22

No 19Yes 81 Node 23

No 92Yes 8 Node 24

No 100Yes 0 Node 6

(N1)60 le 6 18 lt (N1)60 le 24(N1)60 le 186 lt (N1)60 24 lt (N1)60F75 le 74 74 lt F75

F75 le 93 93 lt F75

F75 le 93 93 lt F75 F75 le 97 97 lt F75 at le 007 g 007 g le at le 028

120593998400 le le 2734 le 2849

le 058

2734 lt 12059399840026 lt 12059399840026∘

PHA le 18 g 18 g lt PHA

335 lt 120593998400 le 39292849 lt 120593998400 le 335

058 lt

3929∘ lt 120593998400

028 lt 120591av120590998400vo 120591av120590998400vo120591av120590998400vo

Figure 4 Final CHAID tree

three different techniques Based on geotechnical engineeringtheory three of the decision trees were developed usingSPT as the first variable The results of the DT models werecompared with those of an LR model the results show thatDT models generally have better performance than the LRmodel for this problem

The two best models (CHAID and CHAID-SPT) werechosen based on the risk values and the sensitivity of theliquefaction prediction The CHAID-SPT tree has betterperformance on the unseen (testing) dataset indicating thatbasing DTmodels on engineering fundamentals can improvemodel performance The general interpretation of the treescorresponds well with engineering theory which indicatesthe reliability of themodels From the interpretation results ofthe trees the SPT values and initial soil friction angle values(used as first variables of the trees) have strong correlation

Some terminal nodes have small sample sizes whichmay lead to results that are anecdotal However theseanecdotal conclusions about soil liquefaction assessment arehighly informative from an exploratory point of view Asthe earthquake parameters do not have a wide range theirinterpretation may not be accurate enough Therefore for

future research the database used in this study should beexpanded to a wider range of earthquake parameters

References

[1] AHGandomi X S Yang S Talatahari andAHAlavi ldquoMeta-heuristics in modeling and optimizationrdquo in MetaheuristicApplications in Structures and Infrastructures A H GandomiX S Yang S Talatahari and A H Alavi Eds chapter 1 pp1ndash24 Elsevier New York NY USA 2013

[2] J Park D Ki K Kim S J Lee D H Kim and K J Oh ldquoUsingdecision tree to develop a soil ecological quality assessmentsystem for planning sustainable constructionrdquo Expert Systemswith Applications vol 38 no 5 pp 5463ndash5470 2011

[3] F Darve ldquoLiquefaction phenomenon of granular materials andconstitutive stabilityrdquo Engineering Computations vol 13 no 7pp 5ndash28 1996

[4] S S H Khozaghi and A J A Choobbasti ldquoPredicting ofliquefaction potential in soils using artificial neural networksrdquoElectronic Journal of Geotechnical Engineering vol 12 article0723 2007

8 The Scientific World Journal

[5] H B Seed and I M Idriss ldquoSimplified procedure for evaluatingsoil liquefaction potentialrdquo Journal of the Soil Mechanics andFoundations Division vol 97 no 9 pp 1249ndash1273 1971

[6] K Tokimatsu and Y Yoshimi ldquoEmpirical correlation of soilliquefaction based on SPT N-value and fines contentrdquo Soils andFoundations vol 23 no 4 pp 56ndash74 1983

[7] A Arman N Samtani R Castelli and G Munfakh ldquoGeotech-nical and foundation engineering module 1-subsurface investi-gationsrdquo FHWA-HI-97-021 1997

[8] G Kass ldquoAn exploratory technique for investigating largequantities of categorical datardquo Applied Statistics vol 29 no 2pp 119ndash127 1980

[9] D Biggs B de Ville and E Suen ldquoA method of choosing mul-tiway partitions for classification and decision treesrdquo Journal ofApplied Statistics vol 18 no 1 pp 49ndash62 1991

[10] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees ChapmanampHallCRC NewYork NY USA 1984

[11] A T C Goh ldquoSeismic liquefaction potential assessed by neuralnetworksrdquo Journal of Geotechnical Engineering vol 120 no 9pp 1467ndash1480 1994

[12] A T C Goh ldquoNeural-network modeling of CPT seismic liq-uefaction datardquo Journal of Geotechnical and GeoenvironmentalEngineering vol 122 no 1 pp 70ndash73 1996

[13] Y Najjar and H Ali ldquoOn the use of BPNN in liquefactionpotential assessment tasksrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligent and Mathematical Methods inPavement and Geomechanical Systems N O Attoh-Okine Edpp 55ndash63 1998

[14] A T C Goh ldquoProbabilistic neural network for evaluatingseismic liquefaction potentialrdquo Canadian Geotechnical Journalvol 39 no 1 pp 219ndash232 2002

[15] M Pal ldquoSupport vector machines-based modelling of seismicliquefaction potentialrdquo International Journal for Numerical andAnalytical Methods in Geomechanics vol 30 no 10 pp 983ndash996 2006

[16] H B Zhao Z L Ru and S Yin ldquoUpdated support vectormachine for seismic liquefaction evaluation based on thepenetration testsrdquoMarine Georesources and Geotechnology vol25 no 3-4 pp 209ndash220 2007

[17] AMHannaDUral andG Saygili ldquoNeural networkmodel forliquefaction potential in soil deposits using Turkey and Taiwanearthquake datardquo Soil Dynamics and Earthquake Engineeringvol 27 no 6 pp 521ndash540 2007

[18] A T C Goh and S H Goh ldquoSupport vector machines theiruse in geotechnical engineering as illustrated using seismicliquefaction datardquoComputers and Geotechnics vol 34 no 5 pp410ndash421 2007

[19] A Baykasoglu A Cevik L Ozbakir and S Kulluk ldquoGeneratingprediction rules for liquefaction through data miningrdquo ExpertSystems with Applications vol 36 no 10 pp 12491ndash12499 2009

[20] A H Gandomi and A H Alavi ldquoMulti-stage genetic pro-gramming a new strategy to nonlinear system modelingrdquoInformation Sciences vol 181 no 23 pp 5227ndash5239 2011

[21] C Kayadelen ldquoSoil liquefactionmodeling by genetic expressionprogramming and neuro-fuzzyrdquo Expert Systems with Applica-tions vol 38 no 4 pp 4080ndash4087 2011

[22] A H Gandomi and A H Alavi ldquoA new multi-gene geneticprogramming approach to non-linear system modelingmdashpartII geotechnical and earthquake engineering problemsrdquo NeuralComputing and Applications vol 21 no 1 pp 189ndash201 2012

[23] P M Muduli and S K Das ldquoCPT-based seismic liquefactionpotential evaluation using multi-gene genetic programmingapproachrdquo Indian Geotechnical Journal 2013

[24] A H Gandomi and A H Alavi ldquoHybridizing genetic pro-gramming with orthogonal least squares for modeling of soilliquefactionrdquo International Journal of Earthquake Engineeringand Hazard Mitigation vol 1 no 1 pp 2ndash8 2013

[25] S Y LaiW J Chang and P S Lin ldquoLogistic regression nodel forevaluating soil liquefaction probability using CPT datardquo Journalof Geotechnical and Geoenvironmental Engineering vol 132 no6 pp 694ndash704 2006

[26] G Livingston M Piantedosi P Kurup and T G SitharamldquoUsing decision-tree learning to assess liquefaction potentialfrom CPT and Vsrdquo in Proceedings of the 4th GeotechnicalEarthquake Engineering and Soil Dynamics Congress pp 1ndash10May 2008

[27] H Pham Ed Springer Handbook of Engineering StatisticsSpringer London UK 2006

[28] M J Noruesis IBM SPSS Decision Trees 20 IBM Corporation3rd edition 2011

[29] SPSS Inc ver 21 2013 httpwwwspsscom

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

8 The Scientific World Journal

[5] H B Seed and I M Idriss ldquoSimplified procedure for evaluatingsoil liquefaction potentialrdquo Journal of the Soil Mechanics andFoundations Division vol 97 no 9 pp 1249ndash1273 1971

[6] K Tokimatsu and Y Yoshimi ldquoEmpirical correlation of soilliquefaction based on SPT N-value and fines contentrdquo Soils andFoundations vol 23 no 4 pp 56ndash74 1983

[7] A Arman N Samtani R Castelli and G Munfakh ldquoGeotech-nical and foundation engineering module 1-subsurface investi-gationsrdquo FHWA-HI-97-021 1997

[8] G Kass ldquoAn exploratory technique for investigating largequantities of categorical datardquo Applied Statistics vol 29 no 2pp 119ndash127 1980

[9] D Biggs B de Ville and E Suen ldquoA method of choosing mul-tiway partitions for classification and decision treesrdquo Journal ofApplied Statistics vol 18 no 1 pp 49ndash62 1991

[10] L Breiman J H Friedman R A Olshen and C J StoneClassification and Regression Trees ChapmanampHallCRC NewYork NY USA 1984

[11] A T C Goh ldquoSeismic liquefaction potential assessed by neuralnetworksrdquo Journal of Geotechnical Engineering vol 120 no 9pp 1467ndash1480 1994

[12] A T C Goh ldquoNeural-network modeling of CPT seismic liq-uefaction datardquo Journal of Geotechnical and GeoenvironmentalEngineering vol 122 no 1 pp 70ndash73 1996

[13] Y Najjar and H Ali ldquoOn the use of BPNN in liquefactionpotential assessment tasksrdquo in Proceedings of the InternationalWorkshop on Artificial Intelligent and Mathematical Methods inPavement and Geomechanical Systems N O Attoh-Okine Edpp 55ndash63 1998

[14] A T C Goh ldquoProbabilistic neural network for evaluatingseismic liquefaction potentialrdquo Canadian Geotechnical Journalvol 39 no 1 pp 219ndash232 2002

[15] M Pal ldquoSupport vector machines-based modelling of seismicliquefaction potentialrdquo International Journal for Numerical andAnalytical Methods in Geomechanics vol 30 no 10 pp 983ndash996 2006

[16] H B Zhao Z L Ru and S Yin ldquoUpdated support vectormachine for seismic liquefaction evaluation based on thepenetration testsrdquoMarine Georesources and Geotechnology vol25 no 3-4 pp 209ndash220 2007

[17] AMHannaDUral andG Saygili ldquoNeural networkmodel forliquefaction potential in soil deposits using Turkey and Taiwanearthquake datardquo Soil Dynamics and Earthquake Engineeringvol 27 no 6 pp 521ndash540 2007

[18] A T C Goh and S H Goh ldquoSupport vector machines theiruse in geotechnical engineering as illustrated using seismicliquefaction datardquoComputers and Geotechnics vol 34 no 5 pp410ndash421 2007

[19] A Baykasoglu A Cevik L Ozbakir and S Kulluk ldquoGeneratingprediction rules for liquefaction through data miningrdquo ExpertSystems with Applications vol 36 no 10 pp 12491ndash12499 2009

[20] A H Gandomi and A H Alavi ldquoMulti-stage genetic pro-gramming a new strategy to nonlinear system modelingrdquoInformation Sciences vol 181 no 23 pp 5227ndash5239 2011

[21] C Kayadelen ldquoSoil liquefactionmodeling by genetic expressionprogramming and neuro-fuzzyrdquo Expert Systems with Applica-tions vol 38 no 4 pp 4080ndash4087 2011

[22] A H Gandomi and A H Alavi ldquoA new multi-gene geneticprogramming approach to non-linear system modelingmdashpartII geotechnical and earthquake engineering problemsrdquo NeuralComputing and Applications vol 21 no 1 pp 189ndash201 2012

[23] P M Muduli and S K Das ldquoCPT-based seismic liquefactionpotential evaluation using multi-gene genetic programmingapproachrdquo Indian Geotechnical Journal 2013

[24] A H Gandomi and A H Alavi ldquoHybridizing genetic pro-gramming with orthogonal least squares for modeling of soilliquefactionrdquo International Journal of Earthquake Engineeringand Hazard Mitigation vol 1 no 1 pp 2ndash8 2013

[25] S Y LaiW J Chang and P S Lin ldquoLogistic regression nodel forevaluating soil liquefaction probability using CPT datardquo Journalof Geotechnical and Geoenvironmental Engineering vol 132 no6 pp 694ndash704 2006

[26] G Livingston M Piantedosi P Kurup and T G SitharamldquoUsing decision-tree learning to assess liquefaction potentialfrom CPT and Vsrdquo in Proceedings of the 4th GeotechnicalEarthquake Engineering and Soil Dynamics Congress pp 1ndash10May 2008

[27] H Pham Ed Springer Handbook of Engineering StatisticsSpringer London UK 2006

[28] M J Noruesis IBM SPSS Decision Trees 20 IBM Corporation3rd edition 2011

[29] SPSS Inc ver 21 2013 httpwwwspsscom

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of