INTERUNIVERSITY PROGRAMME IN
PHYSICAL LAND RESOURCES
Ghent University Vrije Universiteit Brussel
Belgium
EVALUATING THE POTENTIAL OF DIGITAL SOIL MAPPING TO MAP SOIL TYPES IN VIETNAM
Promoter: Prof. Dr. Peter Finke
Master dissertation submitted in partial fulfillment of the requirements for the degree of Master of Science in Physical Land Resources by Doan Thanh Thuy
Academic Year 2012 - 2013
This is an unpublished M.Sc dissertation and is not prepared for further distribution. The author and the promoter give the permission to use this Master dissertation for consultation and to copy parts of it for personal use. Every other use is subject to the copyright laws, more specifically the source must be extensively specified when using results from this Master dissertation. Gent, The Promoter, The Author, Prof. Dr. Peter Finke Doan Thanh Thuy
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
SAMENVATTING
1. INTRODUCTION
2. OBJECTIVES AND HYPOTHESIS
  2.1. Objectives
  2.2. Hypothesis
3. LITERATURE REVIEW
  3.1. Map of soil types
    3.1.1. Soil classification
    3.1.2. World Reference Base for Soil Classification
    3.1.3. Major soil types in Vietnam
  3.2. Digital soil mapping
    3.2.1. Soil mapping
    3.2.2. Overview of Digital Soil Mapping
    3.2.3. Digital soil mapping methods for mapping soil types
4. MATERIALS AND METHODS
  4.1. Study area
  4.2. Data collection
    4.2.1. Soil point data
    4.2.2. Digital elevation model (DEM)
    4.2.3. Remote Sensing indices
    4.2.4. Land use map
  4.3. Multinomial logistic regression
    4.3.1. The multinomial logistic regression model
    4.3.2. Assessing model significance and contribution of predictors
  4.4. Artificial neural network
  4.5. Validation
  4.6. Soil diversity indices
  4.7. Combined Index practical management
5. RESULTS AND DISCUSSION
  5.1. The soil maps modeled by multinomial logistic regression
  5.2. The soil maps modeled by artificial neural network
  5.3. Comparison of predictive methods
    5.3.1. Soil map purity
    5.3.2. Soil diversity
6. CONCLUSION
BIBLIOGRAPHY
ACKNOWLEDGEMENTS
I am grateful to the many people who helped me, directly and indirectly, with this thesis and with my studies at Ghent University.
First and foremost, I would like to express my sincerest gratitude to my promoter, Prof. Dr. Peter Finke, for his continuous support of my research and for his patience, motivation, enthusiasm and immense knowledge. This thesis could not have been finished without his encouragement and support. Under his supervision I gained not only knowledge of digital soil mapping but also experience in organizing my work, which I highly appreciate.
I would like to thank my colleagues in the Department of Soil Genesis and Classification Research of the Soils and Fertilizers Research Institute, headed by Dr. Tran Minh Tien, for providing the soil observation database on which I built the models.
I would also like to thank VLIR for its financial support, as well as all the teachers and staff at Ghent University who made my wish to study a reality. My sincere thanks also go to the ITC-Gent staff, especially two wonderful coordinators, Hilde Luyckx and Anne-Marie Tanghe, who helped me a great deal in organizing my life and my studies in Belgium.
I owe deep gratitude to all my former teachers, who gave me the knowledge and encouragement to pursue higher education. I am grateful to Dr. Tran Quoc Vinh, Dr. Le Thi Giang and Assoc. Prof. Dr. Tran Duc Vien for their support and encouragement.
I wish to thank my family, especially my parents. They raised me, supported me, taught me and loved me. To them I dedicate this thesis. My most special thanks go to my best partner and friend, my husband, for his unconditional support and love.
Doan Thanh Thuy
August, 2013
LIST OF TABLES
Table 1. Diagnostic horizons, properties and materials for classification into World Reference Base
Table 2. Classification of Vietnam’s soil types in FAO - UNESCO
Table 3. Presence of soil profiles at the most detailed categorical level
Table 4. Presence of soil profiles at the intermediate categorical level
Table 5. Available ancillary data
Table 6. Confusion matrix
Table 7. The variable used to predict soil group and intermediate level of soil group in MLR
Table 8. The variable used to predict soil group and intermediate level of soil group in ANNs
Table 9. Distribution of soil classes predicted by MLR and ANNs
Table 10. Map purity, diversity indices and combined index of maps predicted by MLR and ANNs
LIST OF FIGURES
Figure 1: Generalized flowcharts for Digital Soil Mapping
Figure 2: Exemplified topology of a feed-forward multilayer ANNs
Figure 3: Location of study area in Vietnam
Figure 4: Map of Reference Soil Group predicted by Multinomial Logistic Regression
Figure 5: Map of intermediate level of Soil Group predicted by Multinomial Logistic Regression
Figure 6: Digital Elevation Model of Bac Ninh
Figure 7: Map of Reference Soil Group predicted by Artificial Neural Networks
Figure 8: Map of intermediate level of Soil Group predicted by Artificial Neural Networks
Figure 9: Variation of the purity, Shannon Index and the combined index for the map predicted by MLR at two level of soil class
Figure 10: Variation of the purity, Shannon Index and the combined index for the map predicted by ANNs at two level of soil class
ABSTRACT
The application of digital soil mapping (DSM) techniques has expanded considerably because, compared with conventional methods, they save much of the time and cost of collecting and analyzing soil point data. This research assesses the potential for mapping soil types in a northern region of Vietnam by comparing two DSM methods: Multinomial Logistic Regression (MLR) and Artificial Neural Networks (ANNs).
Eight predictive variables were derived from the ancillary data: land use, altitude, slope, NDVI, PVI, RVI, Topographic Wetness Index and SAGA Wetness Index. MLR and ANN models were constructed to predict soil classes at two levels: the WRB Reference Soil Group and an intermediate level between the Reference Soil Group and the full WRB soil name. Map quality was indicated by the soil map purity, estimated with an independent validation dataset. Diversity indices were calculated to assess the information content of the resulting maps. The best model was selected on the basis of soil map purity, Shannon’s entropy and a combined index.
At both taxonomic levels, MLR yields higher map purity than ANNs. When the taxonomic level changes from the Reference Soil Group to the intermediate level, map purity decreases while the diversity indices increase. Soil mapping with MLR is therefore more efficient when predicting Reference Soil Groups. At the intermediate level, however, the model predicts a more diverse soil map, and the informative value estimated by the combined index is therefore higher.
SAMENVATTING
The application of digital soil mapping (DSM) methods has increased strongly relative to more conventional methods because it saves time and cost in collecting and analyzing soil data at the point scale. This research assesses the potential of two DSM methods for mapping soil classes in a region in the north of Vietnam: Multinomial Logistic Regression (MLR) and Artificial Neural Networks (ANNs).
Eight predictive variables, derived from ancillary data, were used to map soil classes at the level of the WRB Reference Soil Group and at a level between the Reference Soil Group and the full WRB name. These predictive variables comprise land use, terrain elevation, slope, NDVI, PVI, RVI, the Topographic Wetness Index and the SAGA Wetness Index. Map quality was indicated by the map purity, which was estimated with an independent dataset. Diversity indices were calculated to estimate the information content of the resulting maps. The selection of the best model is based on the map purity, Shannon’s entropy and a combined index.
At both taxonomic levels, MLR gives a higher map purity than ANNs. When the taxonomic level changes from the Reference Soil Group to the intermediate level, the map purity decreases while the value of the diversity indices increases. Soil mapping using MLR to map Reference Soil Groups will therefore be more efficient. At the intermediate level, however, the model predicts a higher diversity in the soil map, and the information content as estimated by the combined index is therefore higher.
1. INTRODUCTION
Soil remains one of the most important, yet most abused, natural resources on the planet; responsible management of soil resources plays a critical role in the survival and prosperity of many nations around the world (White, 2005). Soil is limited in quantity and degradable in quality. It is an irreplaceable capital good in all human productive activities and plays a central role in the natural environment.
An understanding of soil properties and behavior strongly supports sustainable land management. During the last decade, increasing attention has been paid to the soil resource in order to understand the internal mechanisms that define its nature as well as its relationship with other environmental factors. One of the most useful tools in soil science is soil mapping. Many countries have mapped their soils to determine the range of soil types in their territory, where they occur and how they can be used efficiently. Soil mapping combines locating and identifying the different soil types by collecting information about their location, properties and potential use, and recording this information on maps and supporting documents.
Modern users of soil geo-information require maps at detailed scales. Technological and theoretical advances in the last 20 years have led to a number of methodological improvements in soil mapping. Most of these belong to the domain of an emerging discipline, pedometrics, which concerns the quantitative, (geo)statistical production of soil geo-information. Pedometrics is strongly focused on predictive or digital soil mapping (DSM). DSM embraces a set of quantitative mapping methods that have developed from more traditional soil mapping techniques. Various case studies have demonstrated the application of DSM methods to mapping soil properties and classes, updating soil attribute maps or mapping soil features (Carré and Girard, 2002; Jafari et al., 2012; Kempen et al., 2009; Yang et al., 2011). Because traditional methodologies are costly and time-consuming, the use of DSM methods has increased and has improved the soil survey and classification steps, while also allowing the results to be applied in other, similar landscapes (Resende, 2000).
Vietnam has a total land area of 32,924,064 ha (9,345,346 ha of agricultural land, 11,575,429 ha of forestry land, 1,532,843 ha of special-use land, 443,147 ha of residential land and 10,027,265 ha of unused land). Vietnam is a developing country in which agriculture plays a key role in the economy, alongside rapidly developing industry and services. Together these put great pressure on land resources. The government attaches great importance to the appropriate management and use of land to serve the needs of production and people’s lives on the basis of sustainable development.
In Vietnam, the national map of soil types at a scale of 1:1,000,000 was published by the Vietnamese Soil Science Association in 1996 using the FAO classification (FAO/ISRIC/CSIC, 1988). Although soil mapping has made some progress, conventional soil maps produced in past decades remain the major source of information on the spatial variation of soil. They are limited in both spatial detail and the accuracy of soil attributes, and they are costly and time-consuming to produce.
2. OBJECTIVES AND HYPOTHESIS
2.1. Objectives
The overall objective of this project is to propose a digital soil mapping method that can map the soil types of Vietnam in more detail and at lower soil survey cost. To this end, the following tasks were identified:
1. Apply two chosen methods, Multinomial Logistic Regression and Artificial Neural Networks, to map soil types in the test area at two levels: the WRB Reference Soil Group and an intermediate level of the Reference Soil Group that is relevant to soil management.
2. Validate the resultant maps using independent validation data.
3. Select the digital soil mapping method by comparing the two methods in the test area at the two classification levels, evaluating the quality (taxonomic purity) and the information content of the resulting maps.
2.2. Hypothesis
It is possible to sample a reference area that includes most of the soil types of a region. Based on this area, the soil distribution in other areas can be predicted if enough soil observations and ancillary data are available.
3. LITERATURE REVIEW
3.1. Map of soil types
3.1.1. Soil classification
Soil classification means finding common properties or behavior among individual soil profiles in order to form meaningful classes that help us organize our knowledge and simplify decision-making in soil management. Soil profiles are classified by grouping them into classes, for example soil series. These classes are in turn treated as objects and grouped into still more general classes, e.g. Reference Soil Groups. This hierarchical classification is common in soil science.
Why do we classify soil?
Classification is an essential part of the data reduction process, whereby complex sets of observations are made understandable. Another obvious reason for classifying is to save time and simplify description: if many soils have three or four properties in common, it is sensible to use one short name for them all, which makes them easier to remember and makes the relationships among them easier to define. Classification studies the number and composition of the groups in a set of data, which allows the human mind to recall information and relate entities and attributes to each other. One important function of soil classification is to accommodate the apparent individuals in the most satisfactory manner so as to permit the compilation of legible and meaningful soil maps. It also facilitates the prediction of unknown soil types, based on the observed property ranges and the factors that govern soil formation. Finally, soil classification enables a concise description of the spatial variation of soil as a three-dimensional multivariate system.
Soil classification and soil maps are important basic documents for soil survey, soil evaluation, land management, land-use planning and agricultural planning. Depending on the characteristics and nature of each soil type, managers can allocate land to appropriate uses economically and sustainably.
The many soil classification systems developed around the world reflect different views on the concepts of soil formation and differences of opinion about the criteria to be used for classification. The two most important international scientific soil classification systems still being developed and maintained are both diagnostic systems, and hence are based on the absence or presence of diagnostic properties:
* USDA (United States Department of Agriculture): Soil Taxonomy.
* FAO (Food and Agriculture Organization of the United Nations) and UNESCO: World Reference Base for Soil Resources.
In these systems, diagnostic properties of the soil are derived from the subdivision of the soil profile into horizons and the soil properties of each of these horizons. The (hierarchical) classification is done, as in other determination systems (flora, fauna), by means of determination keys (Finke, 2011).
In this project, the World Reference Base (WRB) was selected as the basis for correlating soil units in Vietnam. WRB is designed for this purpose, to serve “as an easy means of communication amongst scientists to identify, characterize and name major types of soils, to be a tool for better correlation between national systems, to act as common denominator through which national system can be compared” (WRB, 2006).
3.1.2. World Reference Base for Soil Classification.
WRB is the international standard soil classification system endorsed by the International Union of Soil Sciences (IUSS). It was developed through an international collaboration coordinated by the International Soil Reference and Information Centre (ISRIC) and sponsored by the IUSS and the FAO. It replaces the FAO Legend for the Soil Map of the World.
Classification principles:
Classification of a soil in WRB proceeds stepwise as follows:
1. Soil morphology and horizonation are described according to the FAO guidelines for soil description (FAO, 2006). Colors are recorded using the Munsell Soil Color Charts (KIC, 1990). Chemical and physical characteristics are determined in conformity with the Procedures for Soil Analysis (Van Reeuwijk and Houba, 1998).
2. Diagnostic horizons, properties and materials are inferred.
3. The classification itself is a 2-tier approach:
a. The first level is the classification into a Reference Soil Group (RSG) using the classification key. In a specified, mandatory order, each RSG is tested against the identified diagnostic horizons, properties and materials. The first RSG that gives a fully positive test result is the classified RSG.
b. At the second level, the RSG is further specified using prefix and suffix qualifiers. Prefix qualifiers, RSG name and suffix qualifiers together, in a prescribed order, make up the full taxonomic name. Each RSG has a unique set of eligible prefix and suffix qualifiers. The selection of a qualifier is again based on diagnostic horizons, properties and materials (Finke, 2011).
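The first tier of this procedure, testing each RSG in a fixed order and accepting the first one that passes, can be sketched in a few lines of Python. This is only an illustration of the key mechanism: the RSG names are real WRB groups, but the tests, their order, the diagnostic labels (such as "low base status") and the function name classify_rsg are simplified assumptions and do not reproduce the WRB (2006) key.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Diagnostics = FrozenSet[str]
KeyEntry = Tuple[str, Callable[[Diagnostics], bool]]

# Hypothetical, heavily simplified key. The tests and their order are
# placeholders for illustration, NOT the actual WRB (2006) criteria.
ILLUSTRATIVE_KEY: List[KeyEntry] = [
    ("Histosol", lambda d: "organic material" in d),
    ("Fluvisol", lambda d: "fluvic material" in d),
    ("Acrisol", lambda d: "argic" in d and "low base status" in d),
    ("Cambisol", lambda d: "cambic" in d),
]


def classify_rsg(
    diagnostics: Iterable[str],
    key: List[KeyEntry] = ILLUSTRATIVE_KEY,
    fallback: str = "Regosol",
) -> str:
    """Return the first RSG in the key whose test is fully positive.

    The key is walked in its mandatory order, so an earlier RSG always
    wins over a later one that would also match. The fallback class for
    soils that match nothing is an assumption of this sketch.
    """
    found = frozenset(diagnostics)
    for rsg_name, test in key:
        if test(found):
            return rsg_name
    return fallback


if __name__ == "__main__":
    # Both Histosol and Cambisol tests pass; the earlier entry is returned.
    print(classify_rsg({"organic material", "cambic"}))
```

Because the list is ordered, the position of an RSG in the key is part of its definition: moving an entry changes which soils it captures.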
Diagnostic horizons, properties and materials
Table 1 gives an overview of the diagnostic horizons, properties and materials used for classification (WRB, 2006). These diagnostics are inferred from the soil description and laboratory measurements by applying logical operators (AND, EITHER, OR) to diagnostic criteria evaluated on profile data and measurements.
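As an illustration of how such a logical combination might look in practice, the sketch below tests a pair of horizons for a made-up "clay-enriched subsoil" diagnostic. The horizon fields, the diagnostic itself and all threshold values are hypothetical placeholders chosen for the example; they are not the WRB (2006) definition of any diagnostic horizon.

```python
from dataclasses import dataclass


@dataclass
class Horizon:
    """Minimal horizon record; field names are assumptions of this sketch."""
    top_cm: float
    bottom_cm: float
    clay_pct: float

    @property
    def thickness_cm(self) -> float:
        return self.bottom_cm - self.top_cm


# Placeholder thresholds for illustration only (not WRB values).
MIN_THICKNESS_CM = 15.0
MIN_CLAY_INCREASE_PCT = 8.0
MIN_CLAY_RATIO = 1.2


def has_clay_enriched_subsoil(upper: Horizon, lower: Horizon) -> bool:
    """Combine criteria as: thickness AND (absolute increase OR ratio)."""
    thick_enough = lower.thickness_cm >= MIN_THICKNESS_CM
    absolute_increase = (lower.clay_pct - upper.clay_pct) >= MIN_CLAY_INCREASE_PCT
    relative_increase = (
        upper.clay_pct > 0 and lower.clay_pct / upper.clay_pct >= MIN_CLAY_RATIO
    )
    return thick_enough and (absolute_increase or relative_increase)


if __name__ == "__main__":
    topsoil = Horizon(top_cm=0, bottom_cm=20, clay_pct=10)
    subsoil = Horizon(top_cm=20, bottom_cm=50, clay_pct=20)
    print(has_clay_enriched_subsoil(topsoil, subsoil))
```

Each criterion is evaluated on measured profile data first, and only the resulting true/false values are combined, which keeps the rule readable and easy to check against the written definition.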
Table 1. Diagnostic horizons, properties and materials for classification into World Reference Base (WRB, 2006)
Diagnostic horizons: Albic, Anthraquic, Anthric, Argic, Calcic, Cambic, Cryic, Duric, Ferralic, Ferric, Folic, Fragic, Fulvic, Gypsic, Histic, Hortic, Hydragric, Irragric, Melanic, Mollic, Natric, Nitic, Petrocalcic, Petroduric, Petrogypsic, Petroplinthic, Pisoplinthic, Plaggic, Plinthic, Salic, Sombric, Spodic, Takyric, Terric, Thionic, Umbric, Vertic, Voronic, Yermic
Diagnostic properties: Abrupt textural change, Albeluvic tonguing, Andic properties, Aridic properties, Continuous rock, Ferralic properties, Geric properties, Gleyic colour pattern, Lithological discontinuity, Reducing conditions, Secondary carbonates, Stagnic colour pattern, Vertic properties, Vitric properties
Diagnostic materials: Artefacts, Calcaric material, Colluvic material, Fluvic material, Gypsiric material, Limnic material, Mineral material, Organic material, Ornithogenic material, Sulphidic material, Technic hard rock, Tephric material
Elements for lower level units
Soil subunits are identified in WRB at the second level of classification. At this level, so-called qualifiers are added to the RSG name. Each RSG has a unique list of qualifiers, each of which is selected or not depending on the presence of diagnostic horizons, properties or materials. There are two groups of qualifiers:
1. Prefix qualifiers: these qualifiers describe properties of the RSG and are either:
a. Typically associated with the RSG; or
b. Intergrades to other RSGs.
The Haplic prefix qualifier is used only when no typically associated or intergrade qualifier applies.
2. Suffix qualifiers: these qualifiers give additional information on the RSG and are related to either:
a. Diagnostic horizons, properties or materials,
b. Chemical properties,
c. Physical characteristics,
d. Mineralogical characteristics,
e. Surface characteristics,
f. Texture,
g. Colour,
h. Other characteristics.
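How the qualifiers combine with the RSG name can be sketched as a small string-building function. The sketch assumes the WRB (2006) convention of writing prefix qualifiers before the RSG name and suffix qualifiers in parentheses after it, and it assumes the caller already supplies the qualifiers in their prescribed order. The function name wrb_name and the qualifier combinations used in the example are illustrative only.

```python
from typing import Sequence


def wrb_name(
    rsg: str,
    prefixes: Sequence[str] = (),
    suffixes: Sequence[str] = (),
) -> str:
    """Compose a full soil name from prefix qualifiers, RSG and suffix qualifiers.

    If no prefix qualifier applies, "Haplic" is used, mirroring the rule
    that Haplic is reserved for soils with no typically associated or
    intergrade qualifier. Ordering of qualifiers is left to the caller.
    """
    prefix_part = " ".join(prefixes) if prefixes else "Haplic"
    name = f"{prefix_part} {rsg}"
    if suffixes:
        name += " (" + ", ".join(suffixes) + ")"
    return name


if __name__ == "__main__":
    # Illustrative qualifier combinations, not taken from the study data.
    print(wrb_name("Ferralsol", ["Lixic", "Vetic"], ["Ferric", "Rhodic"]))
    print(wrb_name("Acrisol"))
```

Keeping the qualifier lists separate from the RSG name also makes it straightforward to truncate a full name to a coarser level, such as the RSG alone or the RSG with its first prefix.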
3.1.3. Major soil types in Vietnam
Vietnam has a total land area of 32,924,064 ha and had a population of 90,549,390 in 2011, with a growth rate of 1.02%. Vietnam can be divided into four physiographic regions: the Annamese Cordillera, extending from north to south through west-central Vietnam; the Red River delta in the north; the Mekong River delta in the south; and the coastal plain in the east. The extremely rugged and densely forested Cordillera, a southward extension of the Yunnan Plateau, covers about two-thirds of the country. Parallel northwest-southeast ranges with several peaks rising to more than 1,800 meters dominate the northern half, and a series of heavily eroded longitudinal plateaus, with average elevations of 750 to 1,500 meters, extends into the southern half.
According to the reports of the national project “Mapping soil types of Vietnam using the classification of World Reference Base for Soil Resources of FAO”, Vietnam has 21 soil groups with 61 soil units (Table 2). For easier evaluation, however, these soils can be grouped into two broad combinations:
- Mountainous and hilly soils: Most are Acrisols, Ferralsols or Alisols. Under annual cropping without adequate improvement measures, the soil degrades rapidly. These soils should be reserved for forestation and for the cultivation of perennial and fruit crops with appropriate protection measures.
- Delta soils: The centers of food production are mainly the deltas of the Red River, the Mekong River and other rivers. These are regions of highly intensive cultivation and high cropping intensity. With irrigation, moisture is sufficient and the rate of soil degradation is low; alluvial deposits renew fertility annually, often supplemented by organic and mineral fertilizers (Mui, 2006).
Table 2. Classification of Vietnam’s soil types in FAO - UNESCO
No     Symbol  Name (FAO-UNESCO)
I      AR      Arenosols
1      ARl     Luvic Arenosols
2      ARr     Rhodic Arenosols
3      ARh     Haplic Arenosols
4      ARb     Cambic Arenosols
5      ARa     Albic Arenosols
6      ARg     Gleyic Arenosols
7      ARo     Ferralic Arenosols
II     SC      Solonchaks
8      SCg     Gleyic Solonchaks
9      SCh     Haplic Solonchaks
10     SCm     Mollic Solonchaks
III    FLt     Thionic Fluvisols
11     GLtp    Proto-Thionic Gleysols
12     FLto    Orthi-Thionic Fluvisols
IV     FL      Fluvisols
13     FLe     Eutric Fluvisols
14     FLd     Dystric Fluvisols
15     FLg     Gleyic Fluvisols
16     FLu     Umbric Fluvisols
17     FLb     Cambic Fluvisols
V      GL      Gleysol
18     GLe     Eutric Gleysol
19     GLd     Dystric Gleysol
20     GLu     Umbric Gleysol
VI     HS      Histosols
21     HSf     Fibric Histosol
22     HSt     Thionic Histosol
VII    SN      Solonetz
23     SNh     Haplic Solonetz
24     SNg     Gleyic Solonetz
VIII   CM      Cambisols
25     CMe     Eutric Cambisols
26     CMd     Dystric Cambisols
IX     AN      Andosols
27     ANh     Haplic Andosols
28     ANm     Mollic Andosols
X      LV      Luvisols
29     LVf     Ferric Luvisols
30     LVg     Gleyic Luvisols
31     LVk     Calcic Luvisols
32     LVx     Chromic Luvisols
33     LVq     Lithic Luvisols
XI     VR      Vertisols
34     VRe     Eutric Vertisols
35     VRd     Dystric Vertisols
XII    LX      Lixisols
36     LXh     Haplic Lixisols
37     LXx     Chromic Lixisols
38     LXh     Haplic Luvisols
XIII   CL      Calcisols
39     CLh     Haplic Calcisols
40     CLl     Luvic Calcisols
XIV    PT      Plinthosols
41     PTd     Dystric Plinthosols
42     PTa     Albic Plinthosols
43     PTu     Humic Plinthosols
XV     PD      Podzoluvisols
44     PDd     Dystric Podzoluvisols
45     PDg     Gleyic Podzoluvisols
XVI    AC      Acrisols
46     ACh     Haplic Acrisols
47     ACp     Plinthic Acrisols
48     ACg     Gleyic Acrisols
49     ACf     Ferric Acrisols
50     ACu     Humic Acrisols
XVII   NT      Nitisols
51     NTh     Haplic Nitisols
52     NTr     Rhodic Nitisols
XVIII  FR      Ferralsols
53     FRr     Rhodic Ferralsols
54     FRx     Xanthic Ferralsols
55     FRp     Plinthic Ferralsols
56     FRu     Humic Ferralsols
XIX    AL      Alisols
57     ALh     Humic Alisols
58     ALg     Gleyic Alisols
59     ALu     Histic Alisols
XX     LP      Leptosols
60     LPq     Lithic Leptosols
XXI    AT      Anthrosols
61     AT      Anthrosols
In recent years, the significant socio-economic development of Vietnam, along with the high growth rate of the population, has put intense pressure on soil resources. To manage this important resource sustainably and reasonably, a national soil map was needed to provide basic data on soil characteristics and properties, which are essential information for land use. In 1996, a national soil map of Vietnam at a scale of 1:1,000,000 was published by the Vietnamese Soil Science Association using the FAO classification (FAO, 1988; FAO, 1994). This map was made by conventional soil mapping, which generally relies on free survey: the soil surveyor employs a conceptual soil-landscape model to select observation locations at which the most useful information is likely to be obtained. The average area per observation was 1920 ha. The soil samples were then analyzed in the laboratory. Landscape features seen in the field and expert experience are also taken into account in describing the soil profile. The soil map is accompanied by a set of soil profile descriptions. Each map unit is characterized by one or more representative soil profiles of the soil types that make up the map unit. These profiles are used for the interpretation of the soil map.
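To relate the reported 1920 ha per observation to the 1:1,000,000 publication scale, the observation density can be re-expressed as observations per square centimetre of printed map, a unit often used in soil survey to compare inspection densities. The conversion below is a sketch added for illustration; the function names are hypothetical, and the result is derived only from the two figures quoted above.

```python
HA_PER_KM2 = 100.0
CM_PER_KM = 100_000.0


def map_cm2_to_ha(scale_denominator: float) -> float:
    """Ground area (ha) represented by 1 cm2 of map at scale 1:scale_denominator."""
    side_km = scale_denominator / CM_PER_KM  # ground length of 1 cm on the map
    return side_km ** 2 * HA_PER_KM2


def observations_per_map_cm2(ha_per_observation: float, scale_denominator: float) -> float:
    """Inspection density expressed per cm2 of map."""
    return map_cm2_to_ha(scale_denominator) / ha_per_observation


if __name__ == "__main__":
    ha_per_cm2 = map_cm2_to_ha(1_000_000)
    density = observations_per_map_cm2(1920, 1_000_000)
    print(f"1 cm2 of map covers {ha_per_cm2:.0f} ha")
    print(f"about {density:.1f} observations per cm2 of map")
```

At 1:1,000,000, one centimetre on the map corresponds to 10 km on the ground, so each square centimetre covers 10,000 ha and holds roughly five observations at the stated density.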
Subsequently, some regions and provinces have also produced soil maps at larger scales, for example the soil map of the Tay Nguyen region at a scale of 1:100,000 and the soil maps of Nam Dinh and Ninh Binh at a scale of 1:50,000. Nevertheless, because of the lack of government funds, many regions and areas in Vietnam still have no soil map, although such a map is essential for managing and using land efficiently. A new method that maps soil in more detail at lower cost than conventional soil mapping is therefore needed in Vietnam.
3.2. Digital soil mapping
3.2.1. Soil mapping
Soil mapping or soil survey is a process of determining the spatial distribution of physical, chemical
and descriptive soil properties and presenting it in understandable and interpretable form to various
users (Beckett, 1976; DentYoung, 1981). Traditional soil mapping consists of the following steps:
- Project planning;
- Preparation for fieldwork;
- Photo-interpretation and pre-processing of auxiliary data;
- Collecting field data and laboratory analysis;
- Data input and organization;
- Presentation and application of soil mapping products.
Project planning is an especially important step for the success of a soil survey project because it
includes the definition of a sampling plan, inspection density, classification system and data
organization system. Preparation for fieldwork typically includes a literature study and
reconnaissance surveys. The end product of a soil mapping project is a soil resource inventory, i.e. a
map showing the distribution of soils and their properties, accompanied by a soil survey report
(Avery, 1987).
Owing to the rapid development of informatics, soil resource inventory data are now organized into a
thematic type of geographic information system called a Soil Information System (SIS), of which the
major part is a Soil Geographical Database (SGDB) (Burrough, 1991). This is a combination of spatial
data (polygon and point maps) closely linked with attribute data for profile observations, soil mapping
units, soil classes and all other relevant data. A SIS is applied not only in soil science but also in a
wide range of civil applications such as planning, urban administration and environmental
management. It offers information not only on soils but also on their potential (and actual) use and
the environmental risks involved (e.g. erosion risk), and it predicts soil behavior under intended
management (Hengl, 2003).
Soil mapping projects vary in inspection intensity, purpose and the type of conceptual model used. In
terms of intensity, soil mapping projects range from small-scale surveys (1:100 K to 1:1 M) to
medium-scale (1:50 K) and large-scale surveys (1:25 K to 1:5 K or larger). In terms of purpose, a soil
mapping project can be classified as special purpose (commonly referred to as thematic) or general
purpose. The former is completely demand-driven and focuses on a single soil variable or a limited set
of soil variables; the latter is more holistic, but also more complex, thus more costly and often not
affordable at large scale. The conceptual models of soils reflect the purpose of the mapping project:
(i) special-purpose mapping projects commonly follow the continuous model of spatial variation, so
geostatistical techniques are used to make predictions; (ii) general-purpose mapping projects
commonly rely on photo-interpretation and profile descriptions, following the discrete model of
spatial variation (Hengl, 2003).
Coping with soil variation is difficult from the very start of a soil mapping project. Soil variables vary
not only horizontally but also with depth, and not only continuously but also abruptly. Soil mapping
requires much denser field inspections than vegetation or land use mapping. Furthermore, soil
horizons and soil types are often hard to distinguish or measure. In particular, the polygenetic nature
of soils has always been a main problem in the description and classification of soils (Jenny, 1941).
Many pioneering soil geographers have wondered whether they would ever be able to fully describe
the patterns of soil cover (Jenny, 1941). The quality and usefulness of polygon-type soil maps have
been a matter of debate for decades (Webster & Beckett, 1968). However, the technological and
theoretical progress of the last 30 years has clearly led to a dramatic improvement in soil mapping
methodology. Most of these advances belong to the newly emerging discipline of Digital Soil Mapping (DSM).
3.2.2. Overview of Digital Soil Mapping.
The great expansion in informatics has yielded huge amounts of data and tools in all fields of
application. Soil science is no exception, with the ongoing development of regional, national,
continental and worldwide databases. The challenge of understanding these large stores of data has
led to the development of new tools in the field of statistics and spawned new areas such as data
mining and machine learning (Hastie et al., 2001). In soil science, the development of GIS, GPS,
remote sensing and data sources such as digital elevation models (DEMs) is opening new ways
forward. These techniques provide a wide range of soil data and information for environmental
monitoring and modeling.
Worldwide, a growing number of researchers are investigating the potential of applying new
techniques from information technology and science to soil survey and soil mapping. The main principle
is soil assessment using GIS, for example producing digital soil property and class maps under the
constraint of limited fieldwork and laboratory analysis, both of which are very expensive. DSM is
regarded as the next great advancement in delivering soil survey information.
DSM is the creation of spatial soil information systems by numerical models that account for the
spatial and temporal variations of soil properties, based on soil information and related environmental
variables (Lagacherie & McBratney, 2007). Pedologists working with DSM technology deal with various
topics: the production and processing of covariates (soil forming factors derived from remote sensing,
digital terrain models, existing soil maps, et cetera), the collection of soil data, the development of
soil predictions based on numerical models, the evaluation of the quality of digital soil maps, and
their representation. The recent advances and open questions within each of these topics have
already been examined with some success.
The growing world population and the associated pressure on resources create an immediate need for
reliable soil information, both to make informed decisions about the soil resource and to make people
aware of existing and potential problems. There is not enough time or resources to canvass the earth
with soil surveys by traditional methods. DSM could deliver the needed information and may provide
better and more accurate information. DSM is a credible alternative for meeting the increasing
worldwide demand for spatial soil data, provided that it can (i) increase spatial resolution and enlarge
extents and (ii) convey relevant information. The first challenge requires developing a specific spatial
data infrastructure for DSM, implementing DSM in existing soil survey programs and building soil
spatial inference systems. The second challenge requires mapping soil functions and threats,
developing a framework for the accuracy assessment of DSM products and introducing the time
dimension (Lagacherie, 2006).
Figure 1: Generalized flowchart for Digital Soil Mapping
[Flowchart: soil observations and auxiliary data enter the soil spatial inference system, which
produces spatially predicted soil properties and features and spatially predicted soil classes for the
application domain.]
DSM is a response to the demand for quantitative soil information for environmental monitoring and
modeling. The environmental or so-called scorpan factors (scorpan is a mnemonic for the factors used
to predict soil attributes: soil, climate, organisms, relief, parent material, age and spatial position,
proposed by McBratney et al. (2003)), derived from digital elevation models (DEMs), remote sensing
images, existing soil maps and similar sources, are used to generate soil information in the form of a
database in which most of the information consists of predictions that are statistically optimal.
Figure 1 summarizes the process of digital soil mapping, where geo-referenced soil observations
coupled with environmental variables form the input data. In the soil spatial inference system, soil
properties over the whole area can be predicted and mapped using spatial soil prediction functions
(such as regression or kriging). This prediction is based on correlations between the environmental
variables and soil attributes, as well as on the spatial autocorrelation of the attributes themselves.
These spatially inferred soil properties can in turn be used to predict functional soil properties that are
more difficult to measure, for example field capacity and available water capacity, using pedotransfer
functions within the soil inference system. All of the predicted soil properties can be used to evaluate
soil functions.
Many case studies have demonstrated the application of DSM methods to mapping soil properties and
classes, updating soil attribute maps, mapping soil features and examining spatio-temporal changes in
land cover (Carré & Girard, 2002; Kempen et al., 2009; Turetta et al., 2006; Yang et al., 2011). In this
project, however, we are concerned with the application of DSM to mapping soil types by two
methods, Multinomial Logistic Regression and Artificial Neural Networks, because of their ability to
predict soil classes such as WRB classes. These methods are discussed in more detail in the next
section.
3.2.3. Digital soil mapping methods for mapping soil types.
3.2.3.1. Multinomial Logistic Regression.
DSM involves the quantitative prediction of soils and their properties using observed data and
auxiliary data on soil forming factors. The major part of the prediction is to quantitatively model the
relationship between the predictors and the dependent variable. Because building a non-linear model
is complicated, a model that linearizes the relationship is preferred. One of the most suitable models
is the logit model, which is built using logistic regression. The logit model relates the natural logarithm
of the odds (the ratio of the probability of existence to that of non-existence) of a categorical variable
to its predictor variables (Menard, 2002). The logit model is widely used in many other areas of
research for analyzing categorical variables, and it is less demanding in terms of data characteristics
such as normality and constant moments (Menard, 2002; Raimundo et al., 2006). Where the
dependent categorical variable has more than two categories, multinomial logistic regression (MLR)
is used; otherwise, the binomial version is used.
The logit (ℓ) is the natural logarithm of the ratio between the probability (P) that a pixel (i) is a
member of a class (j) and the probability that it is not (1 − P). Its value can be predicted directly from
the predictor values through a regression function, as adapted from Fagerland et al. (2008), Goeman
and Cessie (2006) and Menard (2002):
\[
\ell_{ij} = \ln\left(\frac{P_{ij}}{1 - P_{ij}}\right) = a_j + b_{1j} X_{1i} + b_{2j} X_{2i} + \dots + b_{nj} X_{ni} \qquad (1)
\]
Equation (1) shows how to calculate the logit (ℓ) of a category, e.g. soil group j, from the values of a
number of quantitative factors X1 … Xn (e.g. soil properties) at pixel i. The ‘aj’ is the intercept of the
regression curve for soil class j, and ‘b1j … bnj’ are the coefficients of the predictors ‘X1 … Xn’ for
the respective soil class j. The n stands for the total number of soil properties that significantly
correlate with the given soil group j. From equation (1), equation (2), which estimates the probability
Pij that a given soil group j is present at pixel i, can be derived as:
\[
P_{ij} = \frac{e^{\ell_{ij}}}{1 + \sum_{k=1}^{m-1} e^{\ell_{ik}}} \qquad (2)
\]
where m stands for the total number of dependent categories, and the Σ indicates the summation of
the exponentiated logits of all the soil groups (except the reference group) for the particular pixel i.
One of the categories, often the last in the list, is considered as the reference (r), and its probability
of presence is given as:
\[
P_{ir} = \frac{1}{1 + \sum_{k=1}^{m-1} e^{\ell_{ik}}} \qquad (3)
\]
The values of ‘a’ and ‘b’ have to be determined for each soil group from the empirical data. The
logit models are then converted to probabilities using Equation (2), and Equation (3) is used to predict
the probability of the reference category. The probabilities of the soil groups can then be used as
inputs in, for instance, the raster calculator of ArcGIS to produce a map showing the likelihood of
presence of each soil group at each pixel (Debella-Gilo & Etzelmuller, 2009).
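As an illustration of Equations (1) to (3), the following minimal Python sketch converts class-specific intercepts and coefficients into class probabilities for one pixel. The function name, the coefficient values and the predictor values are hypothetical and chosen only for demonstration; they are not results of this study.

```python
import math


def class_probabilities(intercepts, coefficients, x):
    """Return probabilities of the m-1 modelled classes, followed by the reference class.

    intercepts   -- list of a_j, one per non-reference class
    coefficients -- list of lists, b_1j ... b_nj for each non-reference class
    x            -- predictor values X_1 ... X_n of one pixel
    """
    # Logit of each non-reference class: a_j + sum_n b_nj * X_n (Equation 1).
    logits = [a + sum(b * xv for b, xv in zip(bs, x))
              for a, bs in zip(intercepts, coefficients)]
    exp_logits = [math.exp(value) for value in logits]
    denominator = 1.0 + sum(exp_logits)
    probs = [e / denominator for e in exp_logits]  # Equation 2
    probs.append(1.0 / denominator)                # Equation 3 (reference class)
    return probs


# Hypothetical coefficients: two non-reference soil groups, two predictors.
intercepts = [0.5, -1.0]
coefficients = [[0.8, -0.2], [0.1, 0.4]]
pixel = [1.2, 3.0]  # made-up predictor values for one pixel

probs = class_probabilities(intercepts, coefficients, pixel)
# Index of the most likely class; the last index is the reference class.
most_likely = max(range(len(probs)), key=probs.__getitem__)
```

Applying such a function to every pixel gives one probability layer per soil group, which is what the raster calculator step would combine into a map.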
A variety of studies have applied linear models to the prediction of soil classes: Gessler et al. (1995)
used generalized linear models to predict the presence or absence of a bleached A2 horizon from
digital terrain information; Campling et al. (2002) applied MLR to predict soil drainage classes using
terrain attributes and vegetation indices; and Debella-Gilo and Etzelmuller (2009) predicted the soil
classes in Vestfold County, Norway, using digital terrain analysis and MLR modeling integrated in
GIS.
3.2.3.2. Artificial Neural networks.
Artificial neural networks (ANNs) attempt to build a mathematical model that works in a way loosely
analogous to the human brain. The design and the basic concept have been adopted from data
processing in biological nervous systems, in which different groups of cells handle the reception,
forwarding, storage and outward release of information. Neural networks consist of many elements or
“neurons”, organized into layers and interconnected by communication channels or “connectors”,
which usually carry numeric data encoded by a variety of means (McBratney et al., 2003).
The application of an ANN consists of two stages. In the first stage, the network is trained to learn the
conditions under which a certain feature (e.g. a soil class) occurs. Each input unit (cell or neuron) of
the ANN represents a predictor variable (Figure 2): a terrain attribute (R1…Rn), a land-use unit
(L1…Ln), and/or a geological unit (G1…Gn). The output unit represents the target variable, i.e. the
desired output (the soil class).
Figure 2: Exemplified topology of a feed-forward multilayer ANN. Each cell or unit of the input layer
represents one terrain attribute (R1…Rn), one land-use unit (L1…Ln), or one geological unit
(G1…Gn), respectively. The input cells are connected to the cell of the output layer (S), representing
one soil unit, via hidden cells (H1…Hn). The knowledge of the relation between input and output is
stored in the weights (w), which are adjusted during the learning process. I = input unit (I = 1, …, n;
n = input units × hidden units), h = hidden unit (h = 1, …, n; n = hidden units × output units)
(Behrens et al., 2005).
The connections, shown as arrows, are expressed by the weights wi (wi1…win). The adjustment of
these weights, which are chosen randomly at the beginning, is the intrinsic learning process. As each
attribute combination (in terms of pixels of a grid map) is fed into the network in succession, the
weights are adjusted iteratively whenever the output (S) does not match the output of the training
data set.
The mean square error (MSE) of the network is used to test the performance of the ANN and is
calculated continuously during the learning process according to equation (4):
\[
MSE = \frac{1}{n} \sum (o - p)^2 \qquad (4)
\]
where o represents the observed output value for each of the n pixels and p is the predicted output.
The training has to be stopped when the average-error function and/or the gradient of the average-
error function for the training set becomes small (Sarle, 2002); otherwise, further iterations may cause
overfitting, which reduces the generalization ability of the network because noise has been learned
(Sarle, 2002).
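The error measure of equation (4) and a simple stopping rule can be sketched in Python as follows. This is only an illustration: the tolerance value is an assumption, the change in MSE between two iterations is used here as a crude stand-in for the gradient of the error function, and the function names are hypothetical.

```python
def mean_squared_error(observed, predicted):
    """MSE = (1/n) * sum((o - p)^2), as in equation (4)."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def stopping_iteration(errors, tolerance=1e-3):
    """Return the first iteration at which training would be stopped.

    errors    -- list of MSE values, one per training iteration
    tolerance -- illustrative threshold (an assumption, not a value from the study)
    """
    for t in range(1, len(errors)):
        improvement = errors[t - 1] - errors[t]
        # Stop when the error itself or its change between iterations is small.
        if errors[t] < tolerance or improvement < tolerance:
            return t
    return len(errors) - 1


# Made-up observed (0/1) and predicted outputs for four pixels.
mse = mean_squared_error([1, 0, 0, 1], [0.9, 0.2, 0.1, 0.7])
# Made-up error curve: improvement becomes negligible at iteration 3.
stop_at = stopping_iteration([0.5, 0.3, 0.2, 0.1995, 0.1994])
```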
During the second stage, the learned knowledge, in the form of the calibrated weights, can be applied
to prediction areas for which the same input parameters (e.g. terrain attributes, land use, and
geological units) are available but no soil map has been surveyed. The network then predicts the soil
units based on the learned weights (Behrens et al., 2005).
Neural networks are widely applied in the soil science literature, mainly for predicting soil attributes.
They can also be used to predict the probability of soil classes using a multi-logit transformation of the
output. Zhu (2000) used neural networks to predict soil classes from soil environmental factors.
Fidèncio et al. (2001) applied artificial neural networks to classify soils from Sao Paulo state by means
of near-infrared spectroscopy. Behrens et al. (2005) used artificial neural networks to spatially predict
soil units based on terrain data.
4. MATERIALS AND METHODS
4.1. Study area
The DSM methods are applied in Bac Ninh province in the northern part of Vietnam. Bac Ninh is
located at 21°05′ N latitude and 106°10′ E longitude and covers an area of about 82,300 ha. It lies in a
tropical monsoon region, with an average annual precipitation of 1500 mm and an average annual
temperature of 23 °C. The terrain is rather level and flat, sloping mainly from North to South and from
West to East. It is not much dissected: field areas lie 3-7 m above sea level, and hill and mountain
areas 300-400 m above sea level. The area was selected because most of the necessary data were
available and because it is representative of the deltaic region of Vietnam.
Figure 3: Location of study area in Vietnam
4.2. Data collection
4.2.1. Soil point data
The point dataset was collected during a soil survey project in 2010 and contains 537 observations.
The observation locations were chosen based on the topography, geomorphology and land use over
the 47,000 ha of agricultural area. At the selected locations, soil profiles were dug, described and
classified according to the WRB classification system. The soils were classified at two levels: the
Reference Soil Group (RSG), and the RSG combined with a set of uniquely defined qualifiers that
describe its properties in more detail (WRB, 2006).
There were five WRB Reference Soil Groups found in the surveyed area:
- Fluvisols (402 samples): Genetically young, azonal soils in alluvial deposits.
- Acrisols (58 samples): soils having higher clay content in the subsoil than in the topsoil as a
result of pedogenetic processes (especially clay migration) leading to an argic subsoil horizon.
Acrisols have at certain depths a low base saturation and low-activity clays.
- Arenosols (7 samples): sandy soils, including both soils developed in residual sands after in
situ weathering of usually quartz-rich sediments or rocks, and soils developed on recently
deposited sands such as dunes in desert and beach lands.
- Gleysols (15 samples): wetland soils which, unless drained, are saturated with ground water
for long enough periods to develop a characteristic gleyic color pattern.
- Plinthosols (55 samples): soils with plinthite, petroplinthite or pisoliths. Plinthite is an Fe-rich
(Mn-rich), humus-poor mixture of kaolinitic clay (and other products of strong weathering such
as gibbsite) with quartz and other constituents that changes irreversibly to a layer with hard
nodules, a hardpan or irregular aggregates on exposure to repeated wetting and drying.
Petroplinthite is a continuous, fractured or broken sheet of connected, strongly cemented to
indurated nodules or mottles. Pisoliths are discrete strongly cemented to indurated nodules.
Both petroplinthite and pisoliths develop from plinthite by hardening. (WRB, 2006)
The 537 soil profiles in the surveyed area were also classified using qualifiers in addition to the WRB
Reference Soil Group. This leads to 30 different soil categories, which was considered too many for
digital soil mapping because many of the categories contain very few samples. This is illustrated in
Table 3.
Table 3. Presence of soil profiles at the most detailed categorical level
Soil category                   Number of       Soil category                     Number of
                                soil profiles                                     soil profiles
Abrupti-Dystric Fluvisol        1               Areni-Plinthic Acrisol            10
Areni-Eutric Fluvisol           5               Areni-Hyperdystric Acrisol        3
Dystric-Gleyic Fluvisol         46              Endoferri-Hyperdystric Acrisol    2
Dystric-Cambic Fluvisol         54              Hyperdystri-Arenic Acrisol        4
Endogleyi-Cambic Fluvisol       8               Hyperdystri-Plinthic Acrisol      6
Gleyi-Dystric Fluvisol          52              Plinthi-Hyperdystric Acrisol      14
Plinthi-Dystric Fluvisol        33              Skeleti-Haplic Acrisol            2
Silti-Eutric Fluvisol           39              Veti-Hyperdystric Acrisol         2
Silti-Dystric Fluvisol          34              Dystri-Haplic Arenosol            5
Endoplinthi-Dystric Fluvisol    50              Fluvi-Dystric Arenosol            1
Epigleyi-Cambic Fluvisol        8               Veti-Dystric Plinthosol           22
Epiplinthi-Dystric Fluvisol     31              Areni-Dystric Plinthosol          22
Eutri-Cambic Fluvisol           2               Dystri-Albic Plinthosol           7
Albi-Hyperdystric Acrisol       8               Endocamni-Dystric Gleysol         4
Anthraqui-Arenic Acrisol        1               Fluvi-Dystric Gleysol             8
For this reason, the soil data points were reclassified at an intermediate level. This intermediate
classification was based on properties relevant for soil management: base saturation status as an
indicator of soil fertility, texture, and the presence of a hard layer in the soil profile. These properties
were assigned based on the qualifiers in the soil profile classifications: eutric = high base saturation;
dystric = low base saturation; plinthic = plinthite present that may be or may become a hardpan;
epigleyic = reducing conditions in the upper 50 cm; endogleyic = reducing conditions between 50 and
100 cm. As a result, 15 intermediate-level soil units were distinguished, as summarized in Table 4.
Table 4. Presence of soil profiles at the intermediate categorical level
No   Intermediate level classification   Number of soil profiles   Properties
1    Acrisol00000       4     Acrisols having no special property
2    Acrisol00001       11    Acrisols having a hard subsurface horizon (plinthic horizon), which makes it more difficult to work on this soil
3    Acrisol10000       21    Acrisols having a low base saturation (dystric qualifier), thus with a higher fertilizer need
4    Acrisol10001       22    Acrisols having both a hard subsurface horizon (plinthic horizon) and a low base saturation
5    Arenosol10000      7     Arenosols having a low base saturation
6    Fluvisol0001000    9     Very wet Fluvisols having reducing conditions within 50 cm of the soil surface
7    Fluvisol0010000    9     Wet Fluvisols having reducing conditions between 50 cm and 100 cm from the soil surface
8    Fluvisol0100010    42    Fluvisols having a high base saturation and a texture of silt, silt loam, silty clay loam or silty clay
9    Fluvisol1000000    171   Fluvisols having a low base saturation
10   Fluvisol1000010    38    Fluvisols having a low base saturation and a texture of silt, silt loam, silty clay loam or silty clay
11   Fluvisol1000100    126   Fluvisols having a low base saturation and a hard subsurface horizon
12   Fluvisol0100000    2     Fluvisols having a high base saturation
13   Fluvisol0100001    5     Fluvisols having a high base saturation and a texture of loamy fine sand or coarser
14   Gleysol10000       15    Gleysols having a low base saturation
15   Plinthosol10000    55    Plinthosols having a low base saturation
4.2.2. Digital elevation model (DEM)
Topography is one of the most important factors affecting soil formation, and thus it may determine
the soil types in an area. Landscape position causes localized changes in moisture and temperature.
Therefore, a DEM of the area with a grid resolution of 25 m was created by digitizing the topographic
map of the region. The DEM was used to derive four terrain attributes using SAGA GIS: altitude,
slope, topographic wetness index and SAGA wetness index (Olaya, 2004). These attributes may
reflect the soil forming conditions in the study area.
4.2.3. Remote Sensing indices
The SPOT image obtained from the Vietnam Space Technology Institute has a resolution of 20 m and
was used to compute remote sensing indices, namely the Normalized Difference Vegetation Index
(NDVI), the Ratio Vegetation Index (RVI) and the Perpendicular Vegetation Index (PVI), in ArcGIS. As
a result, three raster maps at a resolution of 20 m were derived: an NDVI map, an RVI map and a PVI
map. Subsequently, these maps were resampled in ArcGIS to a resolution of 25 m in order to obtain
the same map extent and grid size as the DEM-derived attribute maps.
Vegetation indices are numerical indicators that use the visible and near-infrared bands of the
electromagnetic spectrum to assess whether the target being observed contains live green vegetation
or not. These indices are widely applied in vegetation studies and are often directly related to ground
parameters such as percent ground cover, photosynthetic activity of the plants and surface water. The
NDVI algorithm subtracts the red reflectance values from the near-infrared (NIR) values and divides
the difference by the sum of the near-infrared and red values (Rouse et al., 1973):
NDVI = (NIR - RED) / (NIR + RED)
The RVI is formed by dividing the NIR radiance by the red radiance (Pearson & Miller, 1972):
RVI = NIR / RED
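The two index formulas above can be written as a short Python sketch for a single pixel. The band values used here are made up for illustration, and returning None for a zero denominator is an assumption about how undefined pixels might be handled; in the study itself the indices were computed in ArcGIS.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - RED) / (NIR + RED)."""
    total = nir + red
    if total == 0:
        return None  # index is undefined when both bands are zero
    return (nir - red) / total


def rvi(nir, red):
    """Ratio Vegetation Index: NIR / RED."""
    if red == 0:
        return None  # index is undefined when the red band is zero
    return nir / red


# Hypothetical reflectance values for one vegetated pixel.
pixel_ndvi = ndvi(0.6, 0.2)
pixel_rvi = rvi(0.6, 0.2)
```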
4.2.4. Land use map
A land use map of Bac Ninh province from 2010, at a scale of 1:25,000, served as a source of
ancillary data. The study area is located in the biggest deltaic region of Vietnam, and paddy rice is the
dominant crop. Because the observations were obtained only in the agricultural area, the following
three main land use types were encountered in the study area: rice cultivation with two crops per year
(LUC), rice cultivation with one crop per year (LUK) and annual crops (BHK). Annual crops include
maize, potatoes, sweet potatoes, vegetables and cassava.
Table 5: Available ancillary data
No Data set Predictor name Resolution / Scale
1 Digital Elevation model ALTITUDE 25 m
2 Map of slope SLOPE 25 m
3 Map of Saga Wetness Index SAGAWET 25 m
4 Map of Topographic wetness index WETNESS 25 m
5 Map of NDVI NDVI 25 m
6 Map of PVI PVI 25 m
7 Map of RVI RVI 25 m
8 Land use map LU 1:25,000
4.3. Multinomial logistic regression
4.3.1. The multinomial logistic regression model
Multinomial logistic regression was used to model the relationships between the Reference Soil
Groups or the intermediate-level soil groups (categorical dependent variables) and the terrain
attributes, remote sensing indices and land use types in the research area (predictors), using the
“nnet” package of R. This model belongs to the family of generalized linear models and is used when
the response variable is categorical. Suppose that we want to model the probability πij that
observation i belongs to the jth class of the m soil groups, j = 1 … m. In the model for predicting soil
groups, the Fluvisols (j = 1) are taken as the reference class because of their dominance in the soil
point data (402 of 537 samples). In the MLR model for the more detailed level, Fluvisol1000000 is the
reference class for the same reason (171 of 537 samples). Consequently, the base probability πi1 is
computed as the residual probability after the probabilities of the other classes, πi2 … πim, have been
modeled.
Thus the model has k + 1 coefficients for each of the m − 1 classes j = 2 … m (leaving out the
reference class): one intercept αj and one “slope” βlj for each predictor, where l = 1 … k indexes a
column in the model matrix. The fitted probabilities are then:
\[
\pi_{ij} = \frac{e^{\alpha_j + \beta_{1j} x_{i1} + \dots + \beta_{kj} x_{ik}}}{1 + \sum_{c=2}^{m} e^{\alpha_c + \beta_{1c} x_{i1} + \dots + \beta_{kc} x_{ik}}}, \qquad j = 2, \dots, m
\]
\[
\pi_{i1} = 1 - \sum_{j=2}^{m} \pi_{ij}
\]
where xi is a vector of explanatory variables. This set of equations is fitted by maximizing the
likelihood.
The fitted α and β can then be used to assess the log-odds of an observation being classified in each
soil class, relative to the base class. That is, they express how likely it is that the observation belongs
to another soil group instead of Fluvisols at the Soil Group level (or Fluvisol1000000 at the
intermediate level). The log-odds are computed as:
\[
\ln\left(\frac{\pi_{ij}}{\pi_{i1}}\right) = \alpha_j + \beta_{1j} x_{i1} + \dots + \beta_{kj} x_{ik}, \qquad j = 2, \dots, m
\]
So, once the model is fitted, we can predict the odds of each soil group (or intermediate-level soil
group) relative to the reference one. To recover the actual probabilities, the inverse logistic
transformation is used. In R, the predict function provides the probabilities of all the classes (which of
course sum to 1).
4.3.2. Assessing model significance and contribution of predictors
In order to find the best model, i.e. the one that provides the maximum fit with the fewest predictors,
it is important to select the predictor variables in the logistic regression model that contribute most to
the pattern in the categorical response variable. The criteria for assessing different models include the
deviance statistics and the Akaike Information Criterion (AIC) (Akaike, 1973). AIC is a measure of the
relative quality of a statistical model for a given data set. It deals with the trade-off between the
complexity of the model and its goodness of fit, and thus provides a means for model selection. AIC
adjusts the residual deviance for the number of estimated parameters:
AIC = 2K – 2ln(L)
where K is the number of estimated parameters included in the model and L is the maximized value
of the likelihood function for the estimated model, which is readily available in the statistical output
and reflects the overall fit of the model. In itself, the AIC value for a given data set has no meaning. It
becomes informative when it is compared with the AIC values of a series of models, the one with the
lowest AIC being the best model. If several models have similarly low AICs, the one with the fewest
predictor variables should be chosen.
In this research, the stepwise-forward method was used for model selection. The procedure starts with
no variables in the model. A model was fitted for each of the independent variables separately, the
AIC of each model was computed, and the models were compared. The most influential predictor
variable, i.e. the one with the lowest AIC, was included in the final model first; the other variables were
then added to the model one by one in order of increasing AIC. Variable selection stopped when the
AIC of the fitted model increased. The selected model is therefore the one with the lowest AIC and the
fewest independent variables.
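The selection logic can be sketched in Python as shown below. Note that this is a common variant of forward selection that re-scores all remaining candidates at every step, which may differ in detail from the ordering procedure described above. The function names and the table of AIC values are hypothetical; in practice `aic_of` would fit a model (for instance with the "nnet" package in R) and return its AIC.

```python
def aic(log_likelihood, n_params):
    """AIC = 2K - 2ln(L), with ln(L) passed in directly."""
    return 2 * n_params - 2 * log_likelihood


def forward_select(candidates, aic_of):
    """Greedy forward selection: add the variable that lowers AIC most, stop when none does.

    candidates -- list of predictor names
    aic_of     -- callable returning the AIC of a model fitted with a given list of predictors
    """
    selected = []
    best_aic = aic_of(selected)  # model without predictors
    remaining = list(candidates)
    while remaining:
        scored = [(aic_of(selected + [v]), v) for v in remaining]
        trial_aic, trial_var = min(scored)
        if trial_aic >= best_aic:
            break  # adding any further variable no longer lowers the AIC
        selected.append(trial_var)
        remaining.remove(trial_var)
        best_aic = trial_aic
    return selected, best_aic


# Made-up AIC values standing in for fitted models (not results of this study).
toy_aic = {
    frozenset(): 700.0,
    frozenset({"ALTITUDE"}): 650.0,
    frozenset({"SLOPE"}): 690.0,
    frozenset({"NDVI"}): 680.0,
    frozenset({"ALTITUDE", "SLOPE"}): 652.0,
    frozenset({"ALTITUDE", "NDVI"}): 640.0,
    frozenset({"ALTITUDE", "NDVI", "SLOPE"}): 641.5,
}

selected, final_aic = forward_select(
    ["ALTITUDE", "SLOPE", "NDVI"], lambda variables: toy_aic[frozenset(variables)]
)
```

With these toy values the sketch keeps ALTITUDE and NDVI and rejects SLOPE, because adding SLOPE raises the AIC from 640.0 to 641.5.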
4.4. Artificial neural network
ANNs are a standard technique in artificial intelligence and data mining. They are designed to learn
rules from examples. In R, the ANNs were run using the “neuralnet” package (Fritsch & Guenther,
2012). The package contains a very flexible function to train feed-forward neural networks. It was built
to train neural networks in the context of regression analysis and focuses on multi-layer perceptrons,
which are well suited to modeling functional relationships. For the “neuralnet” models, the predictors
were selected using the stepwise-forward method described in 4.3.2.
In this study, a back-propagation ANN model was developed to predict soil types at both the Soil
Group level and the intermediate level. The network was trained with a back-propagation technique,
which adjusts the weight and bias values along the negative gradient in an attempt to minimize the
mean squared error (MSE) between the predicted and the target output vectors of the training data
set (Sigillito & Hutton, 1990).
The application of an ANN consists of two stages. During the first stage, the network is trained,
meaning that it learns the conditions under which a certain soil group occurs using the calibration
data set. Each input unit (cell or neuron) of the ANN represents a predictor variable: terrain attributes,
remote sensing indices and land use units. The output unit represents the Soil Groups or the
intermediate-level soil groups. The connections between neurons are described by the weights wi
(wi1 … win). These weights are adjusted during the learning process: as each attribute combination
(in terms of pixels of a grid map) is fed into the network in succession, the weights are adjusted
iteratively whenever the predicted output does not match the output of the training data set. The other
network parameters, including the optimum number of iterations and learning rate, the number of
hidden layers and the transfer function, were tuned after this learning stage. During the second stage,
the learned knowledge, in the form of the calibrated weights, can be applied to the whole study area,
for which the same input parameters (terrain attributes, remote sensing indices and land use maps)
are available but no soil map has been surveyed. The network then predicts the soil units based on
the learned weights (Behrens et al., 2005).
4.5. Validation
The quality of a soil map can be determined by comparing the predictions at the calibration sites with
the observed values. However, the accuracy thus obtained, referred to as the internal accuracy, often
overestimates the actual accuracy (Chatfield, 1995). Therefore, in this project, an independent
validation data set of 53 observations was selected randomly from the data set. The predictions of the
models calibrated on the remaining observations were then compared with these independent validation
data, which were not used in the modeling.
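A random hold-out split of this kind can be sketched as follows. The pool size of 537 points is an assumption taken from the sample count mentioned in the results (the text does not state whether that count includes the 53 validation points), and the seed and function name are hypothetical.

```python
import random

def split_holdout(ids, n_validation, seed=42):
    """Randomly split observation ids into calibration and validation sets."""
    rng = random.Random(seed)
    validation = set(rng.sample(ids, n_validation))
    calibration = [i for i in ids if i not in validation]
    return calibration, sorted(validation)

ids = list(range(537))  # assumed pool of observation ids
calibration, validation = split_holdout(ids, 53)
```

Only the calibration ids would be used to fit the models; the validation ids are kept aside for the purity estimate.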
For assessing the quality of the predicted soil maps, the map purity was used based on the confusion
matrix (Brus et al., 2011). Table 6 shows an error matrix: the row margins (the area covered by the
map units) of the matrix are known, whereas the column margins (the areas covered by the true
classes) are unknown, and must be estimated from the samples.
Table 6: Confusion matrix
               Field
Map     1      2      ...    U      ∑
1       A11    A12    ...    A1U    A1+
2       A21    A22    ...    A2U    A2+
...     ...    ...    ...    ...    ...
U       AU1    AU2    ...    AUU    AU+
∑       A+1    A+2    ...    A+U    A
Aij = number of observations mapped as class Ci with observed soil class Cj
The overall purity is defined as the proportion of the mapped samples in which the predicted soil class,
which is the soil class as depicted on the map, equals the true soil class as determined on validation
points. In other words, it is the proportion correctly classified:
$$p = \frac{1}{A} \sum_{u=1}^{U} A_{uu}$$
where U denotes the number of classes, Auu denotes the number of correctly classified observations
of map unit u and A denotes the total number of observations in the study area. A good map has a
value for map purity close to 1 (Finke, 2011).
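The overall purity can be computed directly from a confusion matrix laid out as in Table 6. The sketch below uses an invented 3-class matrix with 53 points; the counts are hypothetical and are not the validation results of this study.

```python
def overall_purity(matrix):
    """Proportion of validation points whose mapped class equals the observed class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[u][u] for u in range(len(matrix)))
    return correct / total

# Hypothetical confusion matrix: rows = mapped class, columns = class observed in the field.
example = [
    [30, 4, 2],
    [3, 6, 1],
    [2, 1, 4],
]
purity = overall_purity(example)  # 40 correct out of 53 points
```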
4.6. Soil diversity indices
The diversity indices were calculated to assess the variation of the predicted soil maps. In this
research, three pedodiversity indices, namely Shannon’s entropy H’, richness S and evenness E,
were calculated for each predicted map.
• Richness (S): the number of soil classes that exist in an area.
• Shannon’s entropy (H’): the most commonly used measure of pedodiversity (Guo et al.,
2003; Ibáñez et al., 1998):
$$H' = -\sum_{i=1}^{S} p_i \times \ln p_i$$
where pi is the proportion of the total map area occupied by the i-th unit. When one class
covers the whole area, p = 1 and thus Hmin = 0. The closer the values of pi are to 1/S, the more
homogeneous the distribution of p and the more diverse the class composition. The maximum value of
H’ is Hmax = ln S; a value close to Hmax indicates an equal proportional contribution of all
classes (Martín and Rey, 2000).
• Evenness (E): the relative abundance of each soil class in the area. It can be defined as:
$$E = \frac{H'}{H_{max}} = \frac{H'}{\ln S}$$
If each soil class is equally abundant, the evenness is high; conversely, an area in which the
abundance of soil classes differs greatly has low evenness (McBratney and Minasny, 2007).
The diversity of a map indicates the amount of information depicted on the map: a high diversity
corresponds to a high information content.
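The three indices can be sketched in a few lines. The two distributions are invented to show the limiting cases Hmin = 0 and Hmax = ln S. Counting only classes with a non-zero area as richness, and returning an evenness of 0 when a single class is present (where ln S = 0), are assumptions of this sketch.

```python
import math

def richness(p):
    """Number of classes with a non-zero share of the map area."""
    return sum(1 for x in p if x > 0)

def shannon(p):
    """Shannon's entropy H' = -sum(p_i * ln p_i), skipping empty classes."""
    return -sum(x * math.log(x) for x in p if x > 0)

def evenness(p):
    """E = H' / ln S; set to 0 when only one class is present (assumed convention)."""
    s = richness(p)
    return shannon(p) / math.log(s) if s > 1 else 0.0

uniform = [0.2] * 5             # five equally abundant classes: H' = ln 5, E = 1
dominated = [1.0, 0, 0, 0, 0]   # one class covers the whole map: H' = 0
```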
4.7. Combined Index practical management
Map purity is an indicator of map quality, whereas soil diversity indicates the
information content of the map. Together, both aspects express how useful the map is. In
terms of management practices, the goal of soil mapping is to construct a map with high purity that
adequately represents soil diversity. Therefore, the combination of map purity and Shannon’s entropy
is an important index for assessing the performance of soil mapping. The combined index for accuracy and
depicted diversity was defined as the product of H’ and map purity.
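As a small illustration of how the combined index behaves, the sketch below compares two invented maps: one with high purity but little depicted diversity, and one with lower purity but a much higher entropy. The numbers are hypothetical and serve only to show the multiplication.

```python
def combined_index(purity, shannon_h):
    """Combined index = map purity multiplied by Shannon's entropy H'."""
    return purity * shannon_h

# Hypothetical maps (not results of this study).
pure_but_uniform = combined_index(0.75, 0.2)
diverse_but_less_pure = combined_index(0.40, 1.2)
```

Because H’ is not bounded by 1, the index can favour a more detailed map even when its purity is lower.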
5. RESULTS AND DISCUSSION
5.1. The soil maps modeled by multinomial logistic regression
In the MLR model for predicting the Reference Soil Group in Bac Ninh, the stepwise-forward method resulted
in the selection of five predictor variables: altitude, NDVI, slope, topographic wetness index
and SAGA wetness index (Table 7). Wetness indices are frequently used to simulate the soil
moisture conditions in a watershed quantitatively, and altitude and slope are very important terrain
attributes. The selection therefore suggests that the combination of relief and the distribution of water
over the area significantly affects soil formation at the higher level of classification. The effects of
terrain attributes on the distribution of soil groups were shown by Debella-Gilo and Etzelmuller (2009) using
multinomial logistic regression. In addition, Jafari et al. (2012) found that the degree of wetness plays a role
in the identification of soil types in a semi-arid area using the same method.
The MLR model for predicting the intermediate level of Soil Group consists of the same variables as the
model above (altitude, NDVI, slope, topographic wetness index, SAGA wetness index) plus land use
(Table 7). It is reasonable to expect that predicting soil classes in more detail requires more
predictor variables, because the relationship between the soil class and the covariates is more
complex at lower categorical levels. In addition, because the more detailed level was classified on the basis
of soil management properties, land use also has a considerable influence on the soil definition.
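The logic of stepwise-forward selection can be sketched independently of the regression itself. In the sketch below the scoring function is a stand-in: a real application would refit the multinomial model for every candidate subset and return its AIC. The additive "gains" are invented numbers chosen only to make the example run, and `forward_select` and `toy_aic` are hypothetical names.

```python
def forward_select(candidates, score):
    """Greedy forward selection: repeatedly add the variable that lowers the
    score most, and stop as soon as no remaining variable lowers it."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        trial = min(remaining, key=lambda v: score(selected + [v]))
        trial_score = score(selected + [trial])
        if trial_score >= best:
            break
        selected.append(trial)
        remaining.remove(trial)
        best = trial_score
    return selected, best

# Invented deviance reductions per covariate (illustration only).
GAIN = {"ALTITUDE": 40, "NDVI": 12, "SLOPE": 8, "PVI": 1}

def toy_aic(subset):
    # Penalty of 2 per added variable plus the deviance left after the gains.
    return 2 * len(subset) + 500 - sum(GAIN[v] for v in subset)

selected, best = forward_select(list(GAIN), toy_aic)
```

With these invented gains the last candidate is rejected because its gain (1) is smaller than the penalty (2) for adding it.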
Table 7: Variables used to predict the Reference Soil Group and the intermediate level of Soil Group in multinomial logistic regression (MLR).
Soil class                         Variables in the MLR model
Reference Soil Group               ALTITUDE+NDVI+SLOPE+WETNESSIN+SAGAWETNET
Intermediate level of Soil Group   LU+ALTITUDE+NDVI+SLOPE+WETNESSIN+SAGAWETNET
Multinomial logistic regression predicts the soil classes directly from the predictors. Figure 4 shows the
occurrence of Reference Soil Groups predicted by MLR. As can be seen from the map, Fluvisols are the
dominant class over the area. This can be explained by the fact that Bac Ninh is located in the Red River
delta, the biggest delta in the north of Vietnam. Fluvisols are genetically young soils in alluvial
deposits; thus, this soil group accounts for the largest part of the study area. The good natural
fertility of this soil group makes Bac Ninh one of the most productive paddy rice regions
in Vietnam.
Besides Fluvisols, the MLR method predicted Acrisols, Arenosols and Plinthosols in very limited
proportions. However, the model did not predict any Gleysols, even though the data set contains samples
belonging to this group. Looking back at the input observations, it is clear that Fluvisols account for more
than 70% and the four other soil groups together account for only about 25% of the total number of samples. This
explains the excessive occurrence of Fluvisols compared to the others and the absence of
Gleysols from the model output (Gleysols have only 15 samples out of the total of 537).
Figure 4: Map of Reference Soil Group predicted by Multinomial Logistic Regression
Acrisols occur in high landscape positions in the study area when compared with the topographic map
(Figure 6). This is a plausible prediction, because this soil group is often associated with hilly
or undulating topography in wet tropical climates (FAO, 2001).
Figure 5 illustrates the distribution of the intermediate level of Soil Group predicted by MLR. At this
level, the Reference Soil Groups were reclassified based on soil management properties to avoid
the predominance of one soil class in the input sample. As expected, the model predicted more
detailed soil classes: 11 soil classes appear in the resultant map. Nevertheless, Gleysols still do
not occur, so the same information is missing from this map as from the Reference Soil
Group prediction.
Figure 5: Map of intermediate level of Soil Group predicted by Multinomial Logistic Regression
Fluvisol1000000 (Fluvisols with low base saturation) occurs in most of Bac Ninh. Generally, this
is a fertile alluvial soil distributed over different types of terrain, but long exploitation for
cultivation without appropriate land treatment has reduced its fertility. The second dominant soil class
over the study area is Fluvisol1000100 (Fluvisols with low base saturation and a hard
subsurface horizon).
Fluvisols with high base saturation and fine texture (Fluvisol0100010) appear on both sides of
the Red River. This soil class has high fertility because the river annually deposits a certain amount of
sediment on the area around it.
The model also predicts Acrisol00000 over the hilly region, but over a more extensive
area than at the Reference Soil Group level. The other soil classes predicted by the MLR model
cover very small areas.
Figure 6: Digital Elevation Model of Bac Ninh
5.2. The soil maps modeled by artificial neural network
Artificial Neural Networks were used to estimate the probability of occurrence of each soil class at
the nodes of a 25 m raster covering Bac Ninh. Subsequently, the soil type with the largest probability at
each pixel was used to construct a prediction map. Accordingly, at Reference Soil Group level, 5 models
were constructed to predict the 5 Reference Soil Groups occurring in the study area. Similarly, there are
15 ANN models corresponding to the 15 classes at the intermediate level of Soil Groups.
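The step from per-class probabilities to a class map can be sketched as follows. The three pixels and their probabilities are invented. Because each class has its own model, the probabilities at a pixel need not sum to 1; only their ranking matters.

```python
def classify_pixels(prob_maps):
    """Assign to each pixel the class whose model gives the largest probability.

    prob_maps: dict mapping class name -> list of per-pixel probabilities
    (all lists have the same length and pixel order).
    """
    classes = list(prob_maps)
    n_pixels = len(prob_maps[classes[0]])
    return [max(classes, key=lambda c: prob_maps[c][i]) for i in range(n_pixels)]

# Hypothetical outputs of three separate per-class models for three pixels.
prob_maps = {
    "Fluvisols": [0.80, 0.40, 0.55],
    "Acrisols": [0.10, 0.50, 0.20],
    "Plinthosols": [0.05, 0.30, 0.60],
}
soil_map = classify_pixels(prob_maps)
```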
The parsimonious model for prediction was selected in a similar way to Multinomial Logistic
Regression, based on the smallest AIC and residual deviance. However, as shown in Table 8,
every model chosen for a soil class by ANNs has only one predictor variable. Surprisingly,
increasing the number of covariates led to an increase in AIC for all models, even though including more
variables in the model could be expected to describe the relationship between the target variable and the
covariates better.
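The behaviour of AIC under an added covariate follows from its definition, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximized likelihood. The log-likelihood values below are invented for illustration; they are not taken from the fitted networks.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: the extra parameter improves ln L by only 0.4.
one_covariate = aic(-120.0, 2)
two_covariates = aic(-119.6, 3)
```

An extra parameter lowers AIC only if it raises the log-likelihood by more than 1, so a small improvement in fit still yields a larger AIC.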
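Table 8: Variables used to predict the Reference Soil Group and the intermediate level of Soil Group in artificial neural networks.
Level                              Soil class         Variable in modeling
Reference Soil Group               Acrisols           SAGAWET
Reference Soil Group               Fluvisols          SAGAWET
Reference Soil Group               Arenosol           WETNESS
Reference Soil Group               Gleysol            WETNESS
Reference Soil Group               Plinthosol         NDVI
Intermediate level of Soil Group   Acrisol00000       ALTITUDE
Intermediate level of Soil Group   Acrisol00001       NDVI
Intermediate level of Soil Group   Acrisol10000       SAGAWET
Intermediate level of Soil Group   Acrisol10001       ALTITUDE
Intermediate level of Soil Group   Arenosol10000      NDVI
Intermediate level of Soil Group   Fluvisol0001000    WETNESS
Intermediate level of Soil Group   Fluvisol0010000    LU
Intermediate level of Soil Group   Fluvisol0100000    ALTITUDE
Intermediate level of Soil Group   Fluvisol0100001    LU
Intermediate level of Soil Group   Fluvisol0100010    LU
Intermediate level of Soil Group   Fluvisol1000000    PVI
Intermediate level of Soil Group   Fluvisol1000010    ALTITUDE
Intermediate level of Soil Group   Fluvisol1000100    PVI
Intermediate level of Soil Group   Gleysol10000       LU
Intermediate level of Soil Group   Plinthosol100000   NDVI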
Figure 7 shows the map of Reference Soil Groups constructed by the ANN model. Three out of the five
Reference Soil Groups were predicted by the model: Fluvisols, Acrisols and Plinthosols. ANNs
predicted Fluvisols in about 98% of the total area (Table 9). This is again attributed to the unequal
presence of the soil types in the observation data: more than 400 samples were Fluvisols in a data set of
537 points. Acrisols and Plinthosols, with 58 and 55 samples respectively, occur in the resultant
map in very limited proportions. Arenosols and Gleysols, which have the lowest numbers of
observations, are not present in the predicted map.
Figure 7: Map of Reference Soil Group predicted by Artificial Neural Networks
For the intermediate level of Soil Groups, the ANN model predicted six soil classes
belonging to the same Soil Groups as at the higher level: Acrisols, Fluvisols and Plinthosols. Similarly, the soil
classes belonging to Gleysols and Arenosols were not predicted by the model. This map shows
a pattern similar to the map produced by MLR: Fluvisols with low base saturation cover most of the
area (78.8%), Fluvisols with high base saturation and fine texture are located on both sides of the
Red River, and Acrisols are distributed in hilly regions.
Figure 8: Map of intermediate level of Soil Group predicted by Artificial Neural Networks
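Table 9: Distribution of soil classes predicted by Multinomial Logistic Regression and Artificial Neural Networks
Model                                       Soil class        Area (m2)    Proportion
MLR for Reference Soil Group                Fluvisol          533214375    0.961
MLR for Reference Soil Group                Acrisol           7025000      0.013
MLR for Reference Soil Group                Arenosol          11722500     0.021
MLR for Reference Soil Group                Plinthosol        3138125      0.006
MLR for intermediate level of Soil group    Acrisol00000      12662500     0.023
MLR for intermediate level of Soil group    Acrisol10000      6856250      0.012
MLR for intermediate level of Soil group    Arenosol10000     23313125     0.042
MLR for intermediate level of Soil group    Fluvisol0010000   4143750      0.007
MLR for intermediate level of Soil group    Fluvisol0100000   3651250      0.007
MLR for intermediate level of Soil group    Fluvisol0100001   938125       0.002
MLR for intermediate level of Soil group    Fluvisol0100010   25088125     0.045
MLR for intermediate level of Soil group    Fluvisol1000000   318666250    0.574
MLR for intermediate level of Soil group    Fluvisol1000010   36875        0.000
MLR for intermediate level of Soil group    Fluvisol1000100   154650625    0.279
MLR for intermediate level of Soil group    Plinthosol10000   5093125      0.009
ANNs for Reference Soil Group               Plinthosol        832500       0.001
ANNs for Reference Soil Group               Acrisol           6525625      0.012
ANNs for Reference Soil Group               Fluvisol          547741875    0.987
ANNs for intermediate level of Soil group   Acrisol00000      13628125     0.025
ANNs for intermediate level of Soil group   Acrisol10000      3309375      0.006
ANNs for intermediate level of Soil group   Fluvisol0100010   25390000     0.046
ANNs for intermediate level of Soil group   Fluvisol1000000   437148750    0.788
ANNs for intermediate level of Soil group   Fluvisol1000100   67043125     0.121
ANNs for intermediate level of Soil group   Plinthosol10000   8580625      0.015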
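The proportions in Table 9 follow from the areas by dividing each class area by the total mapped area. The sketch below reproduces this for the MLR map at Reference Soil Group level, using the areas from the table. The pixel counts are derived here under the assumption that each cell of the 25 m raster covers 625 m2; they are not reported in the table.

```python
# Areas (m2) of the MLR map at Reference Soil Group level, copied from Table 9.
areas_m2 = {
    "Fluvisol": 533214375,
    "Acrisol": 7025000,
    "Arenosol": 11722500,
    "Plinthosol": 3138125,
}

PIXEL_AREA_M2 = 25 * 25  # assumed cell size of the 25 m raster

total_area = sum(areas_m2.values())
proportions = {name: round(area / total_area, 3) for name, area in areas_m2.items()}
# Derived quantity: number of raster cells per class.
pixel_counts = {name: area // PIXEL_AREA_M2 for name, area in areas_m2.items()}
```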
5.3. Comparison of predictive methods
5.3.1. Soil map purity
The predicted soil maps were validated with independent data from 53 points selected by simple
random sampling from the dataset. The overall purity of the maps was calculated from the confusion
matrix. It has been used for many soil maps as a criterion to assess map quality. Many survey reports
state that the intention of the soil survey was to obtain a map purity of ca. 70%, which means that the
soils should be classified correctly on about 70% of the map (Finke, 2011).
Table 10 presents the estimated purity of the soil maps predicted by Multinomial Logistic Regression
and Artificial Neural Networks at both levels. Both methods reach the same map purity
(0.73) at the high level of soil class. This indicates a good performance of both methods in predicting
the Reference Soil Group.
As expected, at the lower level of soil class, the map purity drops dramatically to 0.39 and 0.37
for MLR and ANNs, respectively. The map predicted by MLR has a slightly higher purity than that predicted by ANNs.
Descending in the classification level introduces more properties that might be related to local conditions
and natural selection, which increases the complexity of the system (Toomanian et al., 2006).
Therefore, some properties might not be captured by the applied covariates, and the link
between soil classes and covariates weakens at the lower level. Digital soil mapping relies on the relationships
between soil samples and environmental factors of the target area. Weak relationships result in
weak predictions, as seen in the performance of both methods at the intermediate level of Soil groups.
Jafari et al. (2013) also found that soil map purity decreased toward the lower taxonomic category.
Another reason is that the number of different soil units at Reference Soil Group level is much smaller
than at the intermediate level (5 Reference Soil Groups compared to 15 intermediate-level classes). The soil
map purity decreases because the soil units at the lower level contrast less with each other. Olaniyan and
Ogunkunle (2007) reported that soil mapping units with high purity included very contrasting soil types.
Table 10: Map purity, diversity indices and combined index of maps predicted by MLR and ANNs
Method   Level                              Map purity   Richness   Shannon H’   Evenness E   Purity * Shannon
MLR      Reference Soil Group               0.73         5          0.20         0.12         0.15
MLR      Intermediate level of Soil Group   0.39         15         1.21         0.44         0.41
ANNs     Reference Soil Group               0.73         5          0.07         0.04         0.05
ANNs     Intermediate level of Soil Group   0.37         15         0.77         0.28         0.29
5.3.2. Soil diversity
Table 10 shows the richness, Shannon index and evenness of the resultant maps from the two
methods at both taxonomic levels of soil units. With the increasing number of soil
units from the Reference Soil Group to the intermediate level, the diversity and the evenness rise
sharply. The greater number of soil units corresponds to the higher diversity at the lower
taxonomic level.
At the same taxonomic level, MLR always yields a higher value of Shannon’s index than ANNs.
With the same richness, the higher values of H’ from MLR compared to those of ANNs indicate that
the maps produced by MLR depict a higher soil diversity. This is confirmed by Table 9: Fluvisols predicted
by MLR are less abundant than those predicted by the ANN model, even though both methods have a very low
diversity index at Reference Soil Group level (0.20 for MLR and 0.07 for ANNs). The lower level of
classification attains a higher value of Shannon’s index: 1.21 for MLR and 0.77 for ANNs. This can be
attributed to the larger number of soil map units at this level, which raises the diversity of the predicted
map. As at the Reference Soil Group level, the diversity is higher in maps made with MLR than with ANNs.
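The Shannon indices at Reference Soil Group level can be approximately recomputed from the proportions in Table 9. Because those proportions are rounded to three decimals, the sketch is only expected to agree with the reported values of 0.20 (MLR) and 0.07 (ANNs) to within rounding. The function name is hypothetical.

```python
import math

def shannon(proportions):
    """Shannon's entropy H' = -sum(p_i * ln p_i) over classes present on the map."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

# Proportions at Reference Soil Group level, copied from Table 9.
mlr_rsg = [0.961, 0.013, 0.021, 0.006]   # Fluvisol, Acrisol, Arenosol, Plinthosol
ann_rsg = [0.001, 0.012, 0.987]          # Plinthosol, Acrisol, Fluvisol

h_mlr = shannon(mlr_rsg)
h_ann = shannon(ann_rsg)
```

The stronger dominance of Fluvisols in the ANN map (0.987 against 0.961) is what lowers its entropy relative to the MLR map.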
In addition, Figure 9 and Figure 10 illustrate the relationship between the map purity, the Shannon
index and the combined index for the MLR and ANN models, respectively. The diversity index always
shows the opposite trend to the soil map purity: when the soil map purity decreases, the diversity
index increases. The number of different soil units (richness) at each classification level may explain
this. H’ is closely related to the number of soil units: if the number of different soil classes
increases, a greater number of fractions are summed in H’.
Figure 9: Variation of the purity, Shannon Index and the combined index for the map predicted by MLR at two levels of soil class
The diversity indices, including richness, Shannon’s index and evenness, represent the deterministic
soil complexity (Jafari et al., 2013). For that reason, the increase of entropy in the study area from
the Reference Soil Group to the lower level indicates a higher complexity of the soil system. Besides, an
increase in entropy associated with a larger number of different soil classes influences the prediction
ability of the model. When the system complexity increases, there are more different soil classes in the
area, so the model must be trained for a larger number of soil classes. This means that there are fewer
observations per class for training the model. This raises the uncertainty of the prediction for each
soil class, and the soil map purity decreases for the intermediate level of Soil groups. The soil diversity
is a reflection of the intricacy of soil maps and may therefore influence the soil map purity (Minasny et
al., 2010).
[Figure 9 chart. Categories: RSG - MLR, InterSG - MLR. Series: Purity, Shannon, Purity * Shannon.]
Figure 10: Variation of the purity, Shannon Index and the combined index for the map predicted by ANNs at two levels of soil class
The combined index, defined by multiplying Shannon’s entropy and map purity, increases from the
Reference Soil Group level to the intermediate level in both the MLR and ANN approaches. However, MLR
shows a higher value at both levels in comparison with ANNs, as shown in Table 10.
In terms of management practices, we need a soil map with high purity that adequately represents soil
diversity. The pedodiversity measurements are related to the density of the soil map or the presence of
various soil units (Jafari et al., 2013). A soil mapping method should attain a high map purity and should
also represent the real soil diversity. In this research, although the differences in map
purity between the two predictive methods are small, MLR shows a higher pedodiversity at both mapping levels
than ANNs do. Therefore, it seems that soil mapping is more efficient with Multinomial
Logistic Regression than with Artificial Neural Networks. With MLR, the map purity at Reference Soil
Group level is much higher than at the intermediate level of Soil groups. In terms of purity, the model
therefore performs much better in predicting Soil groups. However, at the lower level, the model depicts
more of the diversity of the soil map, and thus the informative value of the intermediate level maps,
as estimated by the combined index, is higher.
[Figure 10 chart. Categories: RSG - ANN, InterSG - ANN. Series: Purity, Shannon, Purity * Shannon.]
6. CONCLUSION
The main conclusions that can be drawn from the results of this study are:
1. Multinomial Logistic Regression could successfully be used to directly predict a soil type map.
2. The soil map purity shows an opposite trend to that of the mapped soil diversity: as the purity
decreases from Soil Groups to the intermediate level of Soil groups, the soil diversity increases.
3. Based on the map purity and the combined index, Multinomial Logistic Regression performed better
for predicting soil types than Artificial Neural Networks. Soil mapping at the level of Reference Soil
Group attains a high map purity and a low diversity.
4. To improve the model performance, more observations are needed for Acrisols, Plinthosols,
Arenosols and especially Gleysols to reduce the dominance of Fluvisols in the dataset.
BIBLIOGRAPHY
Akaike, H. (Ed.). (1973). Information theory and an extension of the maximum likelihood principle. Budapest.
Avery, B. W. (1987). Soil survey methods: a review. Technical Monograph 18.
Beckett, P. H. T. (1976). Agriculture Progress. Soil survey, 51, 33-49.
Behrens, T., Förster, H., Scholten, T., Steinrücken, U., Spies, E.-D., and Goldschmitt, M. (2005). Digital soil mapping using artificial neural networks. Journal of Plant Nutrition and Soil Science, 168(1), 21-33.
Brus, D. J., Kempen, B., and Heuvelink, G. B. M. (2011). Sampling for validation of digital soil maps. European Journal of Soil Science, 62, 394-407.
Campling, P., Gobin, A., and Feyen, J. (2002). Logistic modeling to spatially predict the probability of soil drainage classes. Soil Science Society of America Journal, 66, 1390-1401.
Carré, F., and Girard, M. C. (2002). Quantitative mapping of soil types based on regression kriging of taxonomic distances with landform and land cover attributes. Geoderma, 111, 241-263.
Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A, 419-466.
Debella-Gilo, M., and Etzelmuller, B. (2009). Spatial prediction of soil classes using digital terrain analysis and multinomial logistic regression modeling integrated in GIS: Examples from Vestfold County, Norway. Catena, 77, 8-18.
Dent, D., and Young, A. (1981). Soil survey and land evaluation. London: George Allen and Unwin.
Fagerland, M. W., Hosmer, D. W., and Bofin, A. M. (2008). Multinomial goodness-of-fit tests for logistic regression models. Statistics in Medicine, 27(21).
FAO. (2001). Lecture notes on the Major Soils of the World (Vol. 94).
FAO. (2006). Guidelines for soil description. Rome: FAO.
FAO/ISRIC/CSIC. (1988). Revised Legend of the Soil Map of the World. World Soil Resources Report. Rome.
Fidèncio, P. H., Ruisanchez, I., and Poppi, R. J. (2001). Application of artificial neural networks to the classification of soils from Sao Paulo state using near-infrared spectroscopy. Analyst, 126, 2194-2200.
Finke, P. (2011). Syllabus for the course Soil prospection and classification in the Physical Land Resources program.
Fritsch, S., and Guenther, F. (2012, 2012-09-19). Package "neuralnet". Training of neural network, (1.32).
Gessler, P. E., Moore, I. D., McKenzie, N. J., and Ryan, P. J. (1995). Soil-landscape modelling and spatial prediction of soil attributes. International Journal of Geographical Information Systems, 9, 421-432.
Goeman, J. J., and Cessie, S. L. (2006). A goodness-of-fit test for multinomial logistic regression. Biometrics, 62(4), 980-985.
Guo, Y., Gong, P., and Amundson, R. (2003). Pedodiversity in the United States of America. Geoderma, 117, 99-115.
Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of statistical learning: data mining, inference and prediction. Springer Series in Statistics. New York: Springer-Verlag.
Hengl, T. (2003). Pedometric mapping. Bridging the gaps between conventional and pedometric approach. (PhD), Wageningen University.
Ibáñez, J. J., De-Alba, S., Lobo, A., and Zucarello, V. (1998). Pedodiversity and global soil patterns at coarse scales (with discussion). Geoderma, 83, 171-192.
Jafari, A., Ayoubi, S., Khademi, H., Finke, P. A., and Toomanian, N. (2013). Selection of taxonomic level for soil mapping using diversity and map purity indices: A case study from an Iranian arid region. Geomorphology.
Jafari, A., Finke, P., Van de Wauw, J., Ayoubi, S., and Khademi, H. (2012). Spatial prediction of USDA-great group in arid Zarand region, Iran, comparing logistic regression approaches to predict diagnostic horizons and soil types. European Journal of Soil Science.
Jenny, H. (1941). Factors of soil formation - a system of quantitative pedology. New York: McGraw-Hill.
Kempen, B., Brus, D. J., Heuvelink, G. B. M., and Stoorvogel, J. J. (2009). Updating the 1:50,000 Dutch soil map using legacy soil data: A multinomial logistic regression approach. Geoderma, 151, 311-326.
KIC, K. I. C. (1990). Munsell Soil Colors Charts. USA: Baltimore.
Lagacherie, P. (2006). Chapter 1: Digital Soil Mapping: A state of the Art. In A. E. Hartemink, A. B. McBratney & M. L. Mendonça Santos (Eds.), Digital Soil Mapping with limited data (pp. 3-14): Springer.
Lagacherie, P., and McBratney, A. B. (2007). Chapter 1. Spatial soil information system and spatial soil inference systems: perspective for Digital Soil Mapping. In P. Lagacherie, A. B. McBratney & M. Voltz (Eds.), Digital Soil Mapping: An Introductory Perspective. (Vol. Development in Soil Science, pp. 3-24). Amsterdam: Elsevier.
Martín, M. A., and Rey, J. M. (2000). On the role of Shannon's entropy as a measure of heterogeneity. Geoderma, 98, 1-3.
McBratney, A. B., Mendonça Santos, M. L., and Minasny, B. (2003). On Digital Soil Mapping. Geoderma, 117, 3-52.
McBratney, A. B., and Minasny, B. (2007). On measuring pedodiversity. Geoderma, 141, 149-154.
Menard, S. S. (2002). Applied Logistic Regression Analysis. Quantitative Applications in the Social Sciences. Thousand Oaks: Sage Publications.
Minasny, B., McBratney, A. B., and Hartemink, A. E. (2010). Global pedodiversity, taxonomic distance, and the World Reference Base. Geoderma, 155, 132-139.
Mui, N. T. (2006). Vietnam Country Pasture/Forage Resource Profile. Rome: FAO.
Olaniyan, J. O., and Ogunkunle, A. O. (2007). An evaluation of the soil map of Nigeria: II. Purity of mapping unit. Journal of World Association of Soil and Water Conservation, J2, 97-108.
Olaya, V. F. (Ed.). (2004). A gentle introduction to Saga GIS. Gottingen, Germany.
Pearson, R. L., and Miller, L. D. (1972). Remote mapping of standing crop biomass for estimation of the productivity of the short-grass Prairie, Pawnee National Grasslands, Colorado. Proceedings of the Eighth International Symposium on Remote Sensing of Environment, 1357-1381.
Raimundo, R., Barbosa, A. M., and Vargas, J. M. (2006). Obtaining environmental favourability functions from logistic regression. Environmental and Ecological Statistics, 3(2).
Resende, R. J. T. P. (2000). Characterizations of the Physical Environment of Coffee Areas of the South of Minas Through SPRING. University of Lavras, UFLA, MG, Brazil.
Rouse, J. W., Haas, R. H., Schell, J. A., and Deering, D. W. (1973). Monitoring vegetation systems in the Great Plains with ERTS. Washington DC.
Sarle, W. (2002). The IEEE Transactions on Neural Networks. Neural Network FAQ, from ftp:/ftp.sas.com/pub/neural/FAQ.html
Sigillito, V. G., and Hutton, L. V. (1990). Case study II: radar signal processing: Academic Press.
Toomanian, N., Jalalian, A., Khademi, H., Eghbal, K., and M., P., A. (2006). Pedodiversity and pedogenesis in Zayandeh-rud Valley, Central Iran. Geomorphology, 81, 376-393.
Turetta, A. P. D., Mendonça Santos, M. L., Anjos, L. H. C., and Berbara, R. L. L. (2006). Chapter 22: Spatial-Temporal Changes in Land Cover, Soil Properties and Carbon Stocks in Rio de Janeiro. In A. E. Hartemink, A. B. McBratney & M. L. Mendonça Santos (Eds.), Digital Soil Mapping with Limited data: Springer.
Van Reeuwijk, L. P., and Houba, V. J. G. (1998). Guidelines for Quality Management in Soil and Plant Laboratories. Rome: FAO.
Webster, R., and Beckett, P. H. T. (1968). Quality and usefulness of soil maps. Nature, 219, 680-682.
White, R. E. (Ed.). (2005). Principles and Practice of Soil Science: The Soil as a Natural Resource: Wiley-Blackwell.
WRB, I. W. G. (2006). World Reference Base for soil resources 2006 (Vol. ). Rome: FAO.
Yang, L., Jiao, Y., Fahmy, S., Zhu, A.-X., Hann, S., Burt, J. E., and Qi, F. (2011). Updating Conventional Soil Maps through Digital Soil Mapping. Soil Science Society of America, 75(3), 1044-1053.
Zhu, A. X. (2000). Mapping soil landscape as spatial continua: the neural network approach. Water Resources Research, 36, 663-677.