
Page 1: EVALUATING THE POENTIAL OF DIGITAL SOIL MAPPING ...lib.ugent.be/fulltxt/RUG01/002/063/702/RUG01-002063702...Classification Research - Soils and Fertilizers Research Institute charged

INTERUNIVERSITY PROGRAMME IN

PHYSICAL LAND RESOURCES

Ghent University Vrije Universiteit Brussel

Belgium

EVALUATING THE POTENTIAL OF DIGITAL SOIL MAPPING TO MAP SOIL TYPES IN VIETNAM

Promoter: Prof. Dr. Peter Finke

Master dissertation submitted in partial fulfillment of the requirements for the degree of Master of Science in Physical Land Resources by Doan Thanh Thuy

Academic Year 2012 - 2013



This is an unpublished M.Sc. dissertation and is not prepared for further distribution. The author and the promoter give permission to use this Master dissertation for consultation and to copy parts of it for personal use. Every other use is subject to copyright law; more specifically, the source must be extensively specified when using results from this Master dissertation.

Gent,

The Promoter: Prof. Dr. Peter Finke

The Author: Doan Thanh Thuy


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
SAMENVATTING
1. INTRODUCTION
2. OBJECTIVES AND HYPOTHESIS
2.1. Objectives
2.2. Hypothesis
3. LITERATURE REVIEW
3.1. Map of soil types
3.1.1. Soil classification
3.1.2. World Reference Base for Soil Classification
3.1.3. Major soil types in Vietnam
3.2. Digital soil mapping
3.2.1. Soil mapping
3.2.2. Overview of Digital Soil Mapping
3.2.3. Digital soil mapping methods for mapping soil types
4. MATERIALS AND METHODS
4.1. Study area
4.2. Data collection
4.2.1. Soil point data
4.2.2. Digital elevation model (DEM)
4.2.3. Remote Sensing indices
4.2.4. Land use map
4.3. Multinomial logistic regression
4.3.1. The multinomial logistic regression model
4.3.2. Assessing model significance and contribution of predictors
4.4. Artificial neural network
4.5. Validation
4.6. Soil diversity indices
4.7. Combined Index practical management
5. RESULTS AND DISCUSSION
5.1. The soil maps modeled by multinomial logistic regression
5.2. The soil maps modeled by artificial neural network
5.3. Comparison of predictive methods
5.3.1. Soil map purity
5.3.2. Soil diversity
6. CONCLUSION
BIBLIOGRAPHY


ACKNOWLEDGEMENTS

I am grateful to many people for their help, both direct and indirect, with my thesis as well as my studies at Ghent University.

First and foremost, I would like to express my sincerest gratitude to my promoter, Prof. Dr. Peter Finke, for his continuous support of my research and for his patience, motivation, enthusiasm and immense knowledge. This thesis could not have been finished without his encouragement and support. Under his supervision, I gained not only much knowledge of digital soil mapping but also much experience in work organization, which I highly appreciate.

I would like to thank my colleagues in the Department of Soil Genesis and Classification Research of the Soils and Fertilizers Research Institute, headed by Dr. Tran Minh Tien, for providing the soil observation database on which I built the models.

I would also like to thank VLIR, which provided me with financial support, as well as all the teachers and staff at Ghent University who made my desire to learn a reality. My sincere thanks also go to the ITC-Gent staff, especially the two wonderful coordinators, Hilde Luyckx and Anne-Marie Tanghe, who helped me a great deal in organizing my life and my studies in Belgium.

I owe my deep gratitude to all my former teachers, who gave me the knowledge and encouragement to pursue higher education. I am grateful to Dr. Tran Quoc Vinh, Dr. Le Thi Giang and Assoc. Prof. Dr. Tran Duc Vien for their support and encouragement.

I wish to thank my family members, especially my parents. They raised me, supported me, taught me, and loved me. To them I dedicate this thesis. My most special thanks go to my best partner and friend, my husband, for giving me unconditional support and love.

Doan Thanh Thuy

August, 2013


LIST OF TABLES

Table 1. Diagnostic horizons, properties and materials for classification into World Reference Base
Table 2. Classification of Vietnam’s soil types in FAO - UNESCO
Table 3. Presence of soil profiles at the most detailed categorical level
Table 4. Presence of soil profiles at the intermediate categorical level
Table 5: Available ancillary data
Table 6: Confusion matrix
Table 7: The variable used to predict soil group and intermediate level of soil group in MLR
Table 8: The variable used to predict soil group and intermediate level of soil group in ANNs
Table 9: Distribution of soil classes predicted by MLR and ANNs
Table 10: Map purity, diversity indices and combined index of maps predicted by MLR and ANNs


LIST OF FIGURES

Figure 1: Generalized flowcharts for Digital Soil Mapping
Figure 2: Exemplified topology of a feed-forward multilayer ANNs
Figure 3: Location of study area in Vietnam
Figure 4: Map of Reference Soil Group predicted by Multinomial Logistic Regression
Figure 5: Map of intermediate level of Soil Group predicted by Multinomial Logistic Regression
Figure 6: Digital Elevation Model of Bac Ninh
Figure 7: Map of Reference Soil Group predicted by Artificial Neural Networks
Figure 8: Map of intermediate level of Soil Group predicted by Artificial Neural Networks
Figure 9: Variation of the purity, Shannon Index and the combined index for the map predicted by MLR at two levels of soil class
Figure 10: Variation of the purity, Shannon Index and the combined index for the map predicted by ANNs at two levels of soil class


ABSTRACT

The application of digital soil mapping (DSM) techniques has expanded considerably because they can save much of the time and cost of collecting and analyzing soil data points compared to conventional methods. This research assesses the potential for mapping soil types in a northern region of Vietnam by comparing two DSM methods: Multinomial Logistic Regression (MLR) and Artificial Neural Networks (ANNs).

Eight predictive variables were derived from the ancillary data: land use, altitude, slope, NDVI, PVI, RVI, the Topographic Wetness Index and the SAGA Wetness Index. MLR and ANN models were constructed to predict soil classes at two levels: the WRB Reference Soil Group and an intermediate level of Soil Group between the Reference Soil Group and the full WRB soil name. Map quality was indicated by the soil map purity, estimated with an independent validation dataset. Diversity indices were calculated to assess the information content of the resultant maps. Selection of the best model was based on the soil map purity, Shannon’s entropy and a combined index.

At both taxonomic levels, MLR yielded higher map purity than ANNs. When the taxonomic level changed from the Reference Soil Group level to the intermediate level, map purity decreased while the values of the diversity indices increased. Soil mapping using MLR is therefore more efficient for predicting Reference Soil Groups. At the intermediate level, however, the model predicts a more diverse soil map, and the informative value estimated by the combined index is thus higher.


SAMENVATTING

The application of digital soil mapping (DSM) methods has increased strongly relative to more conventional methods because they can save time and costs in the collection and analysis of soil data at the point scale. This research assesses the potential of two DSM methods for mapping soil classes in a region in the north of Vietnam: Multinomial Logistic Regression (MLR) and Artificial Neural Networks (ANNs).

Eight predictive variables, derived from ancillary data, were used to map soil classes at the level of the WRB Reference Soil Group and at a level between the Reference Soil Group and the full WRB name. These predictive variables comprise land use, terrain elevation, slope, NDVI, PVI, RVI, the Topographic Wetness Index and the SAGA Wetness Index. Map quality was expressed by the map purity, which was estimated with an independent dataset. Diversity indices were calculated to assess the information content of the resulting maps. Selection of the best model is based on the map purity, Shannon’s entropy and a combined index.

At both taxonomic levels, MLR gives a higher map purity than ANNs. When the taxonomic level changes from the Reference Soil Group to the intermediate level, map purity decreases while the value of the diversity indices increases. Soil mapping using MLR to map Reference Soil Groups will therefore be more efficient. At the intermediate level, however, the model predicts a higher diversity in the soil map, and the information content as estimated by the combined index is therefore higher.


1. INTRODUCTION

Soil remains one of the most important, yet most abused, natural resources on the planet; indeed, responsible management of soil resources plays a critical role in the survival and prosperity of many nations around the world (White, 2005). Soil is limited in quantity and degradable in quality. It is an irreplaceable capital good in all productive human activities and plays a central role in the natural environment.

An understanding of soil properties and behavior strongly supports sustainable land management. During the last decade, increasing attention has been paid to the soil resource in order to understand the internal mechanisms that define its nature as well as its relationship with other environmental factors. One of the most helpful and functional tools in soil science is soil mapping. Many countries have mapped their soils to determine the range of soil types in their territory, where they occur and how they can be used efficiently. Soil mapping combines locating and identifying the different soil types, collecting information about their location, properties and potential use, and recording this information on maps and in supporting documents.

Modern users of soil geo-information require maps at detailed scales. Technological and theoretical advances over the last 20 years have led to a number of methodological improvements in the field of soil mapping. Most of these belong to the domain of an emerging discipline, pedometrics, which concerns the quantitative, (geo)statistical production of soil geo-information. Pedometrics is strongly focused on predictive or digital soil mapping (DSM). DSM embraces a set of quantitative mapping methods that have developed from more traditional soil mapping techniques. Various case studies have demonstrated the application of DSM methods in mapping soil properties and classes, updating soil attribute maps or mapping soil features (Carré and Girard, 2002; Jafari et al., 2012; Kempen et al., 2009; Yang et al., 2011). Because traditional methodologies are costly and time-consuming, the use of DSM methods has increased and has improved the soil survey and classification steps, while also allowing the results to be applied in other, similar landscapes (Resende, 2000).

Vietnam has a total land area of 32,924,064 ha (9,345,346 ha of agricultural land, 11,575,429 ha of forestry land, 1,532,843 ha of special-use land, 443,147 ha of residential land and 10,027,265 ha of unused land). Vietnam is a developing country in which agriculture has played a key role in the economy, alongside the dramatic development of industry and services. Together these put great pressure on land resources. The government attaches great importance to the appropriate management and use of land to serve the needs of production and people’s lives, on the basis of sustainable development.

In Vietnam, the national map of soil types at a scale of 1:1,000,000 was published by the Vietnamese Soil Science Association in 1996 using the classification of FAO (FAO/ISRIC/CSIC, 1988). Although soil mapping has made some progress, conventional soil maps produced in past decades remain the major source of information on the spatial variation of soil. They are limited in both spatial detail and the accuracy of soil attributes, and they are costly and time-consuming to produce.


2. OBJECTIVES AND HYPOTHESIS

2.1. Objectives

The overall objective of this project is to propose a digital soil mapping method that can map the soil types of Vietnam in more detail and at lower soil survey cost. To do this, the following tasks were identified:

1. Apply the two chosen methods, Multinomial Logistic Regression and Artificial Neural Networks, to map soil types at two levels: the WRB Reference Soil Group and an intermediate level of Soil Group relevant to soil management in the test area.

2. Validate the resultant maps using independent validation data.

3. Select the Digital Soil Mapping method by comparing the two methods in the test area at the two classification levels, evaluating the quality (taxonomic purity) and the information content of the resulting maps.
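The two evaluation criteria named in task 3 can be sketched as follows. This is a minimal illustration, not the procedure used in this dissertation: it assumes that map purity is the share of validation points whose predicted class matches the observed class, and that information content is measured by Shannon's entropy (natural logarithm) over the proportions of predicted classes. The confusion matrix values and the function names `map_purity` and `shannon_entropy` are hypothetical.

```python
import math


def map_purity(confusion):
    """Fraction of validation points whose predicted class equals the observed class."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def shannon_entropy(class_counts):
    """Shannon entropy H = -sum(p * ln p) over classes with a non-zero share."""
    total = sum(class_counts)
    return -sum((c / total) * math.log(c / total) for c in class_counts if c > 0)


# Hypothetical 3-class confusion matrix (rows: observed, columns: predicted).
confusion = [
    [30, 5, 5],
    [4, 20, 6],
    [6, 4, 20],
]

purity = map_purity(confusion)
# Predicted class counts are the column sums of the confusion matrix.
predicted_counts = [sum(col) for col in zip(*confusion)]
entropy = shannon_entropy(predicted_counts)
print(f"purity={purity:.3f}, entropy={entropy:.3f}")
```

A map with more classes of similar extent has higher entropy, so purity and entropy tend to move in opposite directions when the taxonomic level becomes more detailed.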

2.2. Hypothesis

It is possible to sample a reference area that includes most of the soil types of a region. Based on this area, the soil distribution in other areas can be predicted, provided that sufficient soil observations and ancillary data are available.


3. LITERATURE REVIEW

3.1. Map of soil types

3.1.1. Soil classification

Soil classification seeks common properties or behavior among individual soil profiles in order to form meaningful classes that help us organize our knowledge and simplify decision-making in soil management. Soil profiles are classified by grouping them into classes, for example soil series. These classes can in turn be treated as objects and grouped into still more general classes, e.g. Reference Soil Groups. Such a hierarchical classification is common in soil science.

Why do we classify soil?

Classification is an essential part of the data reduction process, whereby complex sets of observations are made understandable. Another obvious reason for classifying is to save time and simplify description: if many soils have three or four properties in common, it is sensible to use one short name for them all, which is easier to remember and helps define the relationships among them. Classification examines the number and composition of the groups in a set of data, which allows the human mind to recall information and relate entities and attributes to each other. One important function of soil classification is to accommodate the apparent individuals in the most satisfactory manner, so as to permit the compilation of legible and meaningful soil maps. It also facilitates the prediction of unknown soil types, based on the observed property ranges and the factors that govern soil formation. Finally, soil classification enables a concise description of the spatial variation of soil as a three-dimensional multivariate system.

Soil classification and soil maps are important basic documents for soil survey, soil evaluation, land management, land-use planning and agricultural planning. Depending on the characteristics and nature of each soil type, managers can allocate appropriate land uses economically and sustainably.

The development of many soil classification systems all over the world reflects different views on the concepts of soil formation and mirrors differences of opinion about the criteria to be used for classification. The two most important international scientific soil classification systems that are still being developed and maintained are both diagnostic systems, and hence are based on the absence or presence of diagnostic properties:

* USDA (United States Department of Agriculture): Soil Taxonomy.

* FAO (Food and Agriculture Organization) of the United Nations and UNESCO: World Reference

Base for Soil Resources.

In these systems, diagnostic properties of the soil are derived from the subdivision of the soil profile into horizons and from the soil properties of each of these horizons. The (hierarchical) classification is done, as in other determination systems (flora, fauna), by means of determination keys (Finke, 2011).


In this project, the World Reference Base (WRB) was selected as the basis for correlating soil units in Vietnam. WRB is designed for such a purpose, to serve “as an easy means of communication amongst scientists to identify, characterize and name major types of soils, to be a tool for better correlation between national systems, to act as common denominator through which national systems can be compared” (WRB, 2006).

3.1.2. World Reference Base for Soil Classification.

WRB is the international standard soil classification system sanctioned by the International Union of

Soil Science. It was developed by an international collaboration coordinated by the International Soil

Reference and Information Center (ISRIC) and sponsored by the International Union of Soil Science

(IUSS) and the FAO. It replaces the FAO Legend for the Soil Map of the World.

Classification principles:

Stepwise, classification of a soil in WRB proceeds as follows:

1. Soil morphology and horizonation are described according to the FAO guidelines for soil description (FAO, 2006). Colors are recorded using the Munsell Soil Color Charts (KIC, 1990). Chemical and physical characteristics are determined according to the Procedures for Soil Analysis (Van Reeuwijk and Houba, 1998).

2. Diagnostic horizons, properties and materials are inferred.

3. The classification itself follows a two-tier approach:

a. The first level is the classification into the Reference Soil Group (RSG), using the classification key. In a specified, mandatory order, each RSG is tested against the identified diagnostic horizons, properties and materials. The first RSG that gives a fully positive test result is the assigned RSG.

b. In the second level, the RSG is further specified using prefix and suffix qualifiers. Prefix qualifiers, RSG name and suffix qualifiers together, in a prescribed order, make up the full taxonomic name. Each RSG has a unique set of eligible prefix and suffix qualifiers. The selection of a qualifier is again based on diagnostic horizons, properties and materials (Finke, 2011).
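The first-level step described above, testing each RSG in a fixed order and taking the first full match, can be sketched as an ordered key. This is only an illustration of the control flow: the key below is heavily abbreviated, and the tests and profile field names (`thick_organic_material`, `shallow_continuous_rock`, and so on) are toy placeholders, not the actual WRB criteria.

```python
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, bool]

# Ordered key: each entry pairs a Reference Soil Group with a test on the profile.
# The tests are toy placeholders, NOT the real WRB criteria.
TOY_KEY: List[Tuple[str, Callable[[Profile], bool]]] = [
    ("Histosols", lambda p: p.get("thick_organic_material", False)),
    ("Leptosols", lambda p: p.get("shallow_continuous_rock", False)),
    ("Fluvisols", lambda p: p.get("fluvic_material", False)),
    ("Gleysols", lambda p: p.get("gleyic_colour_pattern", False)
                           and p.get("reducing_conditions", False)),
    ("Regosols", lambda p: True),  # catch-all placed last in this toy key
]


def classify_rsg(profile: Profile, key=TOY_KEY) -> str:
    """Return the first RSG in the key whose test is fully satisfied."""
    for name, test in key:
        if test(profile):
            return name
    raise ValueError("no Reference Soil Group matched")


# A profile that satisfies both the Fluvisols and Gleysols tests is assigned
# to Fluvisols, because that group comes first in the key.
profile = {
    "fluvic_material": True,
    "gleyic_colour_pattern": True,
    "reducing_conditions": True,
}
print(classify_rsg(profile))
```

Because the order is mandatory, a profile can meet the criteria of several groups and still receive exactly one RSG.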

Diagnostic horizons, properties and materials

Table 1 gives an overview of the diagnostic horizons, properties and materials used for classification (WRB, 2006). These diagnostics are inferred from the soil description and laboratory measurements by applying logical operators (AND, EITHER, OR) to diagnostic criteria, using profile data and measurements.
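As an illustration of how logical operators combine diagnostic criteria, the sketch below evaluates one invented diagnostic on a single horizon record. The diagnostic, its thresholds and the field names are all hypothetical and chosen only to show the AND/OR structure; they are not taken from WRB.

```python
def is_toy_dark_topsoil(horizon):
    """Hypothetical diagnostic: every criterion must hold (AND), and the colour
    criterion is met by either of two alternatives (OR)."""
    thick_enough = horizon["thickness_cm"] >= 20          # placeholder threshold
    enough_carbon = horizon["organic_carbon_pct"] >= 0.6  # placeholder threshold
    dark_colour = (horizon["munsell_value_moist"] <= 3
                   or horizon["munsell_chroma_moist"] <= 3)
    return thick_enough and enough_carbon and dark_colour


# Hypothetical horizon record assembled from a profile description and lab data.
horizon = {
    "thickness_cm": 25,
    "organic_carbon_pct": 1.2,
    "munsell_value_moist": 4,
    "munsell_chroma_moist": 2,
}
print(is_toy_dark_topsoil(horizon))
```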


Table 1. Diagnostic horizons, properties and materials for classification into World Reference Base (WRB, 2006)

Diagnostic horizons: Albic, Anthraquic, Anthric, Argic, Calcic, Cambic, Cryic, Duric, Ferralic, Ferric, Folic, Fragic, Fulvic, Gypsic, Histic, Hortic, Hydragric, Irragric, Melanic, Mollic, Natric, Nitic, Petrocalcic, Petroduric, Petrogypsic, Petroplinthic, Pisoplinthic, Plaggic, Plinthic, Salic, Sombric, Spodic, Takyric, Terric, Thionic, Umbric, Vertic, Voronic, Yermic

Diagnostic properties: Abrupt textural change, Albeluvic tonguing, Andic properties, Aridic properties, Continuous rock, Ferralic properties, Geric properties, Gleyic colour pattern, Lithological discontinuity, Reducing conditions, Secondary carbonates, Stagnic colour pattern, Vertic properties, Vitric properties

Diagnostic materials: Artefacts, Calcaric material, Colluvic material, Fluvic material, Gypsiric material, Limnic material, Mineral material, Organic material, Ornithogenic material, Sulphidic material, Technic hard rock, Tephric material

Elements for lower level units

Soil subunits can be identified in WRB at the second level of classification. At this level, so-called qualifiers are added to the RSG name. Each RSG has a unique list of qualifiers, each of which may or may not apply depending on the presence of diagnostic horizons, properties or materials. There are two groups of qualifiers:

1. Prefix qualifiers: these qualifiers describe properties of the RSG that are either:

a. Typically associated with the RSG;

b. Intergrades to other RSGs;


c. The haplic prefix qualifier is used only when no typically associated or intergrade qualifiers apply.

2. Suffix qualifiers: these qualifiers give additional information on the RSG and are related to either:

a. Diagnostic horizons, properties or materials,

b. Chemical properties,

c. Physical characteristics,

d. Mineralogical characteristics,

e. Surface characteristics,

f. Texture,

g. Colour,

h. Other characteristics.
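How the two groups of qualifiers combine with the RSG name can be sketched as a small string-building helper. It assumes the WRB 2006 convention of writing prefix qualifiers before the RSG name and suffix qualifiers in brackets after it. The function name `wrb_full_name` is hypothetical, the example qualifiers are illustrative only, and their eligibility for the given RSG is not checked.

```python
def wrb_full_name(rsg, prefixes=(), suffixes=()):
    """Join prefix qualifiers, RSG name and bracketed suffix qualifiers."""
    # 'Haplic' is used only when no other prefix qualifier applies.
    prefix_part = list(prefixes) if prefixes else ["Haplic"]
    name = " ".join(prefix_part + [rsg])
    if suffixes:
        name += " (" + ", ".join(suffixes) + ")"
    return name


print(wrb_full_name("Acrisol", suffixes=["Humic"]))
# prints: Haplic Acrisol (Humic)
print(wrb_full_name("Fluvisol", prefixes=["Gleyic"], suffixes=["Eutric"]))
# prints: Gleyic Fluvisol (Eutric)
```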

3.1.3. Major soil types in Vietnam

Vietnam has a total land area of 32,924,064 ha and had a population of 90,549,390 in 2011, with a growth rate of 1.02%. Vietnam can be divided into four physiographic regions: the Annamese Cordillera, extending from north to south through west-central Vietnam; the Red River delta in the north; the Mekong River delta in the south; and the coastal plain in the east. The extremely rugged and densely forested Cordillera, a southward extension of the Yunnan Plateau, covers about two-thirds of the country. Parallel northwest-southeast ranges with several peaks rising to more than 1,800 meters dominate the northern half, and a series of heavily eroded longitudinal plateaus, with average elevations of 750 to 1,500 meters, extends into the southern half.

According to the reports of the national project “Mapping soil types of Vietnam using the classification of World Reference Base for Soil Resources of FAO”, Vietnam has 21 soil groups with 61 soil units (Table 2). For easier evaluation, however, these soils can be grouped into two broad categories:

- Mountainous and hilly soils: Most are Acrisols, Ferralsols or Alisols. Under annual cropping without reasonable improvement measures, the soil degrades rapidly. These soils should be reserved for afforestation and for the cultivation of perennial and fruit crops with appropriate protection measures.

- Delta soils: The centers of food production are mainly the deltas of the Red River, the Mekong River and other rivers. These regions have high levels of intensive cultivation and cropping intensity. With irrigation, moisture is sufficient and the rate of soil degradation is low; alluvial deposits renew fertility annually, often augmented by organic and mineral fertilizers (Mui, 2006).


Table 2. Classification of Vietnam’s soil types in FAO - UNESCO

In recent years, the significant socio-economic development of Vietnam, along with the high growth rate of the population, has put intense pressure on soil resources. To manage this important resource sustainably and reasonably, a national soil map was needed. Such a document should provide basic data on soil characteristics and properties, which are essential information for land use. In 1996, a national soil map of Vietnam at a scale of 1:1,000,000 was published by the Vietnamese Soil Science Association using the classification of FAO (FAO, 1988,

No Symbol Name FAO-UNESCO No Symbol Name

FAO-UNESCO I AR XI VR Vertisols

1 ARl LuvicArenosols 34 VRe Eutric Vertisols 2 ARr RhodicArenosols 35 VRd Dystric Vertisols 3 ARh HaplicArenosols XII LX Lixisols 4 ARb CambicArenosols 36 LXh HaplicLixisols 5 ARa AlbicArenosols 37 LXx ChromicLixisols 6 ARg GleyicArenosols 38 LXh HaplicLuvisols 7 ARo FerralicArenosols XIII CL Calcisols II SC Solonchaks 39 CLh HaplicCalcisols 8 SCg GleyicSolonchaks 40 CLl LuvicCalcisols 9 SCh HaplicSolonchaks XIV PT Plinthosols

10 SCm MollicSolonchaks 41 PTd DystricPlinthosols III FLt ThionicFluvisols 42 PTa AlbicPlinthosols 11 GLtp Proto-ThionicGleysols 43 PTu Humic Plinthosols 12 FLto Orthi-ThionicFluvisols XV PD Podzoluvisols IV FL Fluvisols 44 PDd DystricPodzoluvisols 13 FLe EutricFluvisols 45 PDg GleyicPodzoluvisols 14 FLd DystricFluvisols XVI AC Acrisols 15 FLg GleyFluvisols 46 ACh HaplicAcrisols 16 FLu UmbricFluvisols 47 ACp PlinthicAcrisols 17 FLb CambicFluvisols 48 ACg GleyicAcrisols V GL Gleysol 49 ACf FerricAcrisols

18 GLe EutricGleysol 50 ACu Humic Acrisols 19 GLd DystricGleysol XVII NT Nitisols 20 GLu UmbricGleysol 51 NTh HaplicNitisols VI HS Histosols 52 NTr RhodicNitisols 21 HSf FibricHistosol XVIII FR Ferralsols 22 HSt ThionicHistosol 53 FRr RhodicFerralsols VII SN Solonetz 54 FRx XanthicFerralsols 23 SNh HaplicSolonetz 55 FRp PlinthicFerralsols 24 SNg GleyicSolonetz 56 FRu Humic Ferralsols

VIII CM Cambisols XIX AL Alisols 25 CMe EutricCambisols 57 ALh Humic Alisols 26 CMd DystricCambisols 58 ALg GleyicAlisols IX AN Andosols 59 ALu HisticAlisols 27 ANh HaplicAndosols XX LP Leptosols 28 ANm MollicAndosols 60 LPq LithicLeptosols X LV Luvisols XXI AT Anthrosols

29 LVf FerricLuvisols 61 AT Anthrosols 30 LVg GleyicLuvisols 31 LVk CalcicLuvisols 32 LVx ChromicLuvisols 33 LVq LithicLuvisols


FAO, 1994). This soil map was produced by conventional soil mapping methods, which generally rely
on free survey: the soil surveyor employs a conceptual soil-landscape model to select observation
locations at which the most useful information is likely to be obtained. The average area
represented by each observation was 1,920 ha. The soil samples were then analyzed in the
laboratory. Landscape features observed in the field and expert experience were also taken into
account when describing the soil profiles. The soil map is accompanied by a set of soil profile
descriptions. Each map unit is characterized by one or more representative soil profiles of the
soil types that comprise the map unit, and these profiles are used for the interpretation of the
soil map.

Subsequently, soil maps at larger scales were produced for some regions and provinces, for example
the soil map of the Tay Nguyen region at a scale of 1:100,000 and the soil maps of Nam Dinh and
Ninh Binh at a scale of 1:50,000. Nevertheless, because of the lack of government funds, many
regions and areas in Vietnam still have no soil map, even though such a map is essential for
managing and using land efficiently. Therefore, a new method that can map soils in more detail at
a lower cost than conventional soil mapping is needed in Vietnam.

3.2. Digital soil mapping

3.2.1. Soil mapping

Soil mapping, or soil survey, is the process of determining the spatial distribution of physical,
chemical and descriptive soil properties and presenting it in a form that various users can
understand and interpret (Beckett, 1976; Dent and Young, 1981). Traditional soil mapping consists
of the following steps:

- Project planning;

- Preparation for fieldwork;

- Photo-interpretation and pre-processing of auxiliary data;

- Collecting field data and laboratory analysis;

- Data input and organization;

- Presentation and application of soil mapping products.

Project planning is an especially important step for the success of a soil survey project because
it includes the definition of a sampling plan, inspection density, classification system and data
organization system. Preparation for fieldwork typically includes a literature study and
reconnaissance surveys. The end product of a soil mapping project is a soil resource inventory,
i.e. a map showing the distribution of soils and their properties, accompanied by a soil survey
report (Avery, 1987).

Owing to the significant development of informatics, soil resource inventory data are now organized
into a thematic type of geographic information system called a Soil Information System (SIS), of
which the major part is a Soil Geographical Database (SGDB) (Burrough, 1991). This is a combination
of spatial data (polygon and point maps) closely linked with attribute data for profile
observations, soil mapping units, soil classes and all other relevant data. SIS is applied not only
in soil science but also in a wide range of civil applications such as planning, urban
administration and the environment. It offers not only


information on soils but also on their potential (and actual) use and the environmental risks
involved (e.g. erosion risk), and it predicts soil behavior under intended management (Hengl, 2003).

Soil mapping projects vary in inspection intensity, purpose and the type of conceptual model used.
In terms of intensity, soil mapping projects range from small-scale surveys (1:100 K to 1:1 M)
through medium-scale surveys (1:50 K) to large-scale surveys (1:25 K to 1:5 K or larger). In terms
of purpose, a soil mapping project can be classified as special-purpose (commonly referred to as
thematic) or general-purpose. The former is completely demand-driven and focuses on a single soil
variable or a limited set of soil variables; the latter is more holistic but also more complex,
thus more costly and often not affordable at large scale. The conceptual model of soils reflects
the purpose of the mapping project: (i) special-purpose mapping projects commonly follow the
continuous model of spatial variation, so geostatistical techniques are used to make predictions;
(ii) general-purpose mapping projects commonly rely on photo-interpretation and profile
descriptions, following the discrete model of spatial variation (Hengl, 2003).

Coping with soil variation has been difficult since the beginning of soil mapping. Soil variables
vary not only horizontally but also with depth, and not only continuously but also abruptly. Soil
mapping requires much denser field inspections than vegetation or land use mapping. Furthermore,
soil horizons and soil types are often hard to distinguish or measure. In particular, the
polygenetic nature of soils has always been a main problem in the description and classification
of soils (Jenny, 1941). Many pioneering soil geographers wondered whether they would ever be able
to fully describe the patterns of soil cover (Jenny, 1941). The quality and usefulness of
polygon-type soil maps have been debated for decades (Webster and Beckett, 1968). However,
technological and theoretical progress over the last 30 years has led to a dramatic improvement in
soil mapping methodology. Most of these advances belong to the newly emerging discipline of
Digital Soil Mapping (DSM).

3.2.2. Overview of Digital Soil Mapping

The great expansion in informatics has yielded huge amounts of data and tools in all fields of
application. Soil science is no exception, with the ongoing development of regional, national,
continental and worldwide databases. The challenge of understanding these large stores of data has
led to the development of new tools in the field of statistics and spawned new areas such as data
mining and machine learning (Hastie et al., 2001). In soil science, the development of GIS, GPS,
remote sensing and data sources such as digital elevation models (DEMs) is opening new ways
forward. These techniques provide a wide range of soil data and information for environmental
monitoring and modeling.

Worldwide, a growing number of researchers are investigating the potential of applying new
information technology and science to soil survey and soil mapping. The main principle is to assess
soils using GIS, for example by producing digital soil property and class maps while limiting
fieldwork and laboratory analysis, which are very expensive. DSM is the next great advancement in
delivering soil survey information.


DSM is the creation of spatial soil information systems by numerical models that account for the
spatial and temporal variations of soil properties, based on soil information and related
environmental variables (Lagacherie and McBratney, 2007). Pedologists working with DSM technology
deal with various topics: the production and processing of covariates (soil forming factors derived
from remote sensing, digital terrain models, existing soil maps, etc.), the collection of soil
data, the development of soil predictions based on numerical models, the evaluation of the quality
of digital soil maps and their representation. The recent advances and open questions within each
of these topics have already been examined with some success.

The growth of the world’s population and the associated pressure on resources create an immediate
need for reliable soil information, both to make informed decisions about the soil resource and to
make people aware of existing and potential problems. There is neither enough time nor enough
resources to survey the earth’s soils by traditional methods. DSM could deliver the needed
information and may provide better and more accurate information. DSM is a credible alternative for
meeting the increasing worldwide demand for spatial soil data because of its ability to (i)
increase spatial resolution and enlarge extents and (ii) convey relevant information. The first
challenge requires developing a specific spatial data infrastructure for DSM, implementing DSM in
existing soil survey programs and building soil spatial inference systems. The second challenge
requires mapping soil functions and threats, developing a framework for the accuracy assessment of
DSM products and introducing the time dimension (Lagacherie, 2006).

Figure 1: Generalized flowchart for Digital Soil Mapping. Flowchart elements: soil observations; auxiliary data; soil spatial inference system; spatially predicted soil properties and features; spatially predicted soil classes; application domain.


DSM is a response to the demand for quantitative soil information for environmental monitoring and
modeling. The environmental or so-called scorpan factors (scorpan is a mnemonic for the factors
used to predict soil attributes: soil, climate, organisms, relief, parent material, age and spatial
position, proposed by McBratney et al., 2003), derived from digital elevation models (DEMs), remote
sensing images, existing soil maps and other sources, are used to generate soil information in the
form of a database in which most of the information consists of statistically optimal predictions.
Figure 1 summarizes the process of digital soil mapping, in which geo-referenced soil observations
coupled with environmental variables form the input data. In the spatial soil inference system,
soil properties over the whole area can be predicted and mapped using spatial soil prediction
functions (such as regression or kriging). This prediction is based on correlations between the
environmental variables and soil attributes, as well as on the spatial autocorrelation of the
attributes themselves. These spatially inferred soil properties can then be used to predict
functional soil properties that are more difficult to measure, for example field capacity and
available water capacity, using pedotransfer functions within the soil inference system. All of the
predicted soil properties can be used to evaluate soil functions.

Many case studies have demonstrated the application of DSM methods to mapping soil properties and
classes, updating soil attribute maps, mapping soil features and examining spatio-temporal changes
in land cover (Carré and Girard, 2002; Kempen et al., 2009; Turetta et al., 2006; Yang et al.,
2011). This project, however, focuses on the application of DSM to mapping soil types by two
methods, Multinomial Logistic Regression and Artificial Neural Networks, chosen for their ability
to predict soil classes such as WRB classes. These methods are discussed in more detail in the next
section.

3.2.3. Digital soil mapping methods for mapping soil types

3.2.3.1. Multinomial logistic regression

DSM involves the quantitative prediction of soils and their properties using observed data and
auxiliary data on soil forming factors. The major part of the prediction is to model quantitatively
the relationship between the predictors and the dependent variable. Because building a non-linear
model is complicated, a model that linearizes the relationship is preferred. One of the most
suitable models is the logit model, which is built using logistic regression. The logit model
relates the natural logarithm of the odds (the ratio of the probability of presence to that of
absence) of a categorical variable to its predictor variables (Menard, 2002). The logit model is
widely used in many other areas of research for analyzing categorical variables, and it is less
demanding in terms of data characteristics such as normality and constant moments (Menard, 2002;
Raimundo et al., 2006). Where the dependent categorical variable has more than two categories,
multinomial logistic regression (MLR) is used; otherwise, the binomial version is used.

The logit (ℓ) is the natural logarithm of the ratio between the probability (P) that a pixel (i) is
a member of a class (j) and the probability that it is not (1−P). Its value can be directly
predicted from the


predictor values through a regression function, as adapted from Fagerland et al. (2008), Goeman and
Cessie (2006) and Menard (2002):

$$\ell_{ij} = \ln\left(\frac{P_{ij}}{1 - P_{ij}}\right) = a_j + b_{1j} X_{1i} + b_{2j} X_{2i} + \dots + b_{nj} X_{ni} \tag{1}$$

Equation (1) shows how to calculate the logit (ℓ) of a category, e.g. soil group j, predicted from
the values of a number of quantitative factors X1 … Xn, e.g. soil properties, of pixel i. Here aj
is the intercept of the regression curve for soil class j, and b1j … bnj are the coefficients of
the predictors X1 … Xn for soil class j. The n stands for the total number of soil properties that
correlate significantly with the given soil group j. From Equation (1), Equation (2), which
estimates the probability Pij that a given soil group j is present at pixel i, can be derived as:

$$P_{ij} = \frac{e^{\ell_{ij}}}{1 + \sum_{j=1}^{m-1} e^{\ell_{ij}}} \tag{2}$$

where m stands for the total number of dependent categories, and Σ indicates summation of the
exponentiated logits over all the soil groups (except the reference group) for the particular
pixel i. One of the categories, often the last in the list, is taken as the reference (r), and its
probability of presence is given as:

$$P_{r} = \frac{1}{1 + \sum_{j=1}^{m-1} e^{\ell_{ij}}} \tag{3}$$

The values of a and b have to be determined for each soil group from the empirical data. The logit
models are then converted into probabilities using Equation (2), and Equation (3) is used to
predict the probability of the reference category. The probabilities of the soil groups can then be
used as inputs in, for instance, the raster calculator of ArcGIS to produce a map showing the
likelihood of presence of each soil group at each pixel (Debella-Gilo and Etzelmuller, 2009).
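The chain from Equation (1) to Equations (2) and (3) can be illustrated with a minimal Python sketch for a single pixel. The function name, the coefficient values and the two predictors are hypothetical and chosen only for illustration; they are not fitted values from any study cited here.

```python
import math


def class_probabilities(intercepts, coefficients, predictors):
    """Return probabilities of the m-1 modelled classes followed by the reference class.

    intercepts[j] and coefficients[j] hold a_j and (b_1j ... b_nj) for class j;
    predictors holds (X_1i ... X_ni) for one pixel i.
    """
    # Equation (1): one logit per non-reference class
    logits = [
        a + sum(b * x for b, x in zip(bs, predictors))
        for a, bs in zip(intercepts, coefficients)
    ]
    denominator = 1.0 + sum(math.exp(logit) for logit in logits)
    # Equation (2): probability of each modelled class
    probs = [math.exp(logit) / denominator for logit in logits]
    # Equation (3): probability of the reference class
    probs.append(1.0 / denominator)
    return probs


# Hypothetical coefficients for two modelled classes and two predictors
intercepts = [0.5, -1.0]
coefficients = [[0.02, -0.3], [0.01, 0.4]]
pixel = [10.0, 1.5]  # made-up predictor values for one pixel

probs = class_probabilities(intercepts, coefficients, pixel)
# Index of the most probable class (the last index is the reference class)
most_likely = max(range(len(probs)), key=probs.__getitem__)
```

Because the reference class takes the remaining probability, the values returned for one pixel always sum to 1, which is what allows them to be mapped as class likelihoods.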

A variety of studies have applied linear models to predict soil classes. Gessler et al. (1995) used
generalized linear models to predict the presence or absence of a bleached A2 horizon from digital
terrain information. Campling et al. (2002) applied MLR to predict soil drainage classes using
terrain attributes and vegetation indices. Debella-Gilo and Etzelmuller (2009) predicted the soil
classes in Vestfold County, Norway, using digital terrain analysis and MLR modeling integrated in
GIS.

3.2.3.2. Artificial neural networks

Artificial neural networks (ANNs) attempt to build a mathematical model that supposedly works in an

analogous way to the human brain. The design and the basic concept have been adopted from data

processing in biological nervous systems, since there are different groups of cells for reception,


forwarding, storage and outward release of information. A neural network is a system of many
elements or “neurons”, organized into layers and interconnected by communication channels or
“connectors”, which usually carry numeric data encoded by a variety of means (McBratney et al.,
2003).

The application of an ANN consists of two stages. In the first stage, the network is trained to
learn the conditions under which a certain feature (e.g. a soil class) occurs. Each input unit
(cell or neuron) of the ANN represents a predictor variable (Figure 2): a terrain attribute
(R1…Rn), a land-use unit (L1…Ln), and/or a geological unit (G1…Gn). The output unit represents the
target variable, i.e. the desired output (the soil class).

Figure 2: Exemplified topology of a feed-forward multilayer ANN. Each cell or unit of the input
layer represents one terrain attribute (R1…Rn), one land-use unit (L1…Ln), or one geological unit
(G1…Gn). The input cells are connected to the cells of the output layer (S), representing one soil
unit, via hidden cells (H1…Hn). The knowledge of the relation between input and output is stored in
the weights (w), which are adjusted during the learning process. I = input unit (I = 1, …, n; n =
input units x hidden units), h = hidden unit (h = 1, ..., n; n = hidden units x output units)
(Behrens et al., 2005).

The connections, represented by the arrows, are expressed by the weights wi (wi1…win). The
adjustment of these weights, which are chosen randomly at the beginning, is the intrinsic learning
process. As each attribute combination (i.e. each pixel of a grid map) is fed into the network in
succession, the weights are adjusted iteratively whenever the output (S) does not match the output
given in the training data set.

The mean square error (MSE) of the network is used to test the performance of the ANN and is
calculated continuously during the learning process, as in Equation (4):

$$\mathrm{MSE} = \frac{1}{n} \sum (o - p)^2 \tag{4}$$

where o represents the observed output value for each of the n pixels and p is the predicted
output. Training has to be stopped when the average-error function and/or the gradient of the
average-error function for the training set becomes small (Sarle, 2002); otherwise, further
iterations may cause overfitting, associated with decreasing generalization ability due to learned
noise (Sarle, 2002).
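As a rough illustration of Equation (4) and of the stopping rule, the following Python sketch computes the MSE and checks two simple stopping conditions. The function names and both thresholds are hypothetical, and the change in MSE between two successive iterations is used here only as a simplified stand-in for the gradient of the error function.

```python
def mean_square_error(observed, predicted):
    """Equation (4): mean of the squared differences over n pixels."""
    n = len(observed)
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n


def should_stop(mse_history, min_error=0.01, min_change=1e-4):
    """Decide whether training should stop.

    Stops when the latest MSE is below min_error, or when the MSE changed by
    less than min_change since the previous iteration. Both thresholds are
    illustrative assumptions, not values taken from the literature.
    """
    if not mse_history:
        return False
    if mse_history[-1] < min_error:
        return True
    if len(mse_history) >= 2:
        return abs(mse_history[-2] - mse_history[-1]) < min_change
    return False


# Made-up observed and predicted outputs for four pixels
mse = mean_square_error([1.0, 0.0, 1.0, 0.0], [0.9, 0.2, 0.7, 0.1])
```

In a training loop, the MSE of each iteration would be appended to `mse_history` and the loop would end as soon as `should_stop` returns `True`.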

During the second stage, the learned knowledge in terms of the calibrated weights can be applied to

prediction areas, for which the same input parameters (e.g. terrain attributes, land use, and geological


units) are available, but no soil map has been surveyed. The network then predicts the soil units

based on the learned weights (Behrens et al., 2005).

Neural networks are widely applied in the soil science literature, mainly for predicting soil
attributes. They can also be used to predict the probability of soil classes using a multi-logit
transformation of the output. Zhu (2000) used neural networks to predict soil classes from soil
environmental factors. Fidèncio et al. (2001) applied artificial neural networks to classify soils
from Sao Paulo state by means of near-infrared spectroscopy. Behrens et al. (2005) used artificial
neural networks to spatially predict soil units based on terrain data.


4. MATERIALS AND METHODS

4.1. Study area

The DSM methods were applied in Bac Ninh province in the northern part of Vietnam. Bac Ninh is
located at 21°05′ N latitude and 106°10′ E longitude and covers an area of about 82,300 ha. It lies
in a tropical monsoon region, and the average annual precipitation and temperature are 1500 mm and
23 °C, respectively. The terrain is rather level and flat, sloping mainly from north to south and
from west to east. It is not much dissected: field areas lie 3-7 m above sea level, and hill and
mountain areas 300-400 m. The area was selected because most of the necessary data were available
and because it is representative of the deltaic region of Vietnam.

Figure 3: Location of study area in Vietnam

4.2. Data collection

4.2.1. Soil point data

The point dataset was collected during a soil survey project in 2010 and contains 537 observations.
The observation locations were chosen based on topography, geomorphology and land use over the
47,000 ha of agricultural area. At the selected locations, soil profiles were described and
classified according to the WRB classification system. The soils were classified at two levels: the
Reference Soil Group (RSG), and the RSG combined with a set of uniquely defined qualifiers that
describe its properties in more detail (WRB, 2006).

There were five WRB Reference Soil Groups found in the surveyed area:

- Fluvisols (402 samples): Genetically young, azonal soils in alluvial deposits.


- Acrisols (58 samples): soils having higher clay content in the subsoil than in the topsoil as a

result of pedogenetic processes (especially clay migration) leading to an argic subsoil horizon.

Acrisols have at certain depths a low base saturation and low-activity clays.

- Arenosols (7 samples): sandy soils, including both soils developed in residual sands after in

situ weathering of usually quartz-rich sediments or rocks, and soils developed on recently

deposited sands such as dunes in desert and beach lands.

- Gleysols (15 samples): wetland soils which, unless drained, are saturated with ground water

for long enough periods to develop a characteristic gleyic color pattern.

- Plinthosols (55 samples): soils with plinthite, petroplinthite or pisoliths. Plinthite is an Fe-rich

(Mn-rich), humus-poor mixture of kaolinitic clay (and other products of strong weathering such

as gibbsite) with quartz and other constituents that changes irreversibly to a layer with hard

nodules, a hardpan or irregular aggregates on exposure to repeated wetting and drying.

Petroplinthite is a continuous, fractured or broken sheet of connected, strongly cemented to

indurated nodules or mottles. Pisoliths are discrete strongly cemented to indurated nodules.

Both petroplinthite and pisoliths develop from plinthite by hardening. (WRB, 2006)

The 537 soil profiles in the surveyed area were also classified using qualifiers in addition to the
WRB Reference Soil Group. This leads to 30 different soil categories, which was considered too many
for digital soil mapping because many of the categories contain only a few samples. This is
illustrated in Table 3.

Table 3. Presence of soil profiles at the most detailed categorical level

Soil category                  Number of      Soil category                    Number of
                               soil profiles                                   soil profiles
Abrupti-Dystric Fluvisol       1              Areni-Plinthic Acrisol           10
Areni-Eutric Fluvisol          5              Areni-Hyperdystric Acrisol       3
Dystric-Gleyic Fluvisol        46             Endoferri-Hyperdystric Acrisol   2
Dystric-Cambic Fluvisol        54             Hyperdystri-Arenic Acrisol       4
Endogleyi-Cambic Fluvisol      8              Hyperdystri-Plinthic Acrisol     6
Gleyi-Dystric Fluvisol         52             Plinthi-Hyperdystric Acrisol     14
Plinthi-Dystric Fluvisol       33             Skeleti-Haplic Acrisol           2
Silti-Eutric Fluvisol          39             Veti-Hyperdystric Acrisol        2
Silti-Dystric Fluvisol         34             Dystri-Haplic Arenosol           5
Endoplinthi-Dystric Fluvisol   50             Fluvi-Dystric Arenosol           1
Epigleyi-Cambic Fluvisol       8              Veti-Dystric Plinthosol          22
Epiplinthi-Dystric Fluvisol    31             Areni-Dystric Plinthosol         22
Eutri-Cambic Fluvisol          2              Dystri-Albic Plinthosol          7
Albi-Hyperdystric Acrisol      8              Endocamni-Dystric Gleysol        4
Anthraqui-Arenic Acrisol       1              Fluvi-Dystric Gleysol            8


For this reason, the soil data points were classified at an intermediate level. This intermediate
classification was based on properties relevant for soil management: base saturation status as an
indicator of soil fertility, texture, and the presence of a hard layer in the soil profile. These
properties were assigned based on the qualifiers in the soil profile classifications: eutric = high
base saturation; dystric = low base saturation; plinthic = plinthite present that may be or become
a hardpan; epigleyic = reducing conditions in the upper 50 cm; endogleyic = reducing conditions
between 50 and 100 cm. As a result, 15 intermediate-level soil units were distinguished, as
summarized in Table 4.

Table 4. Presence of soil profiles at the intermediate categorical level

No  Intermediate level  Number of      Properties
    classification      soil profiles
1   Acrisol00000        4              Acrisols having no special property
2   Acrisol00001        11             Acrisols having a hard subsurface horizon (plinthic horizon), which makes the soil more difficult to work
3   Acrisol10000        21             Acrisols having a low base saturation (dystric qualifier), thus with a higher fertilizer need
4   Acrisol10001        22             Acrisols having both a hard subsurface horizon (plinthic horizon) and a low base saturation
5   Arenosol10000       7              Arenosols having a low base saturation
6   Fluvisol0001000     9              Very wet Fluvisols having reducing conditions within 50 cm of the soil surface
7   Fluvisol0010000     9              Wet Fluvisols having reducing conditions between 50 cm and 100 cm from the soil surface
8   Fluvisol0100010     42             Fluvisols having a high base saturation and a texture of silt, silt loam, silty clay loam or silty clay
9   Fluvisol1000000     171            Fluvisols having a low base saturation
10  Fluvisol1000010     38             Fluvisols having a low base saturation and a texture of silt, silt loam, silty clay loam or silty clay
11  Fluvisol1000100     126            Fluvisols having a low base saturation and a hard subsurface horizon
12  Fluvisol0100000     2              Fluvisols having a high base saturation
13  Fluvisol0100001     5              Fluvisols having a high base saturation and a texture of loamy fine sand or coarser
14  Gleysol10000        15             Gleysols having a low base saturation
15  Plinthosol10000     55             Plinthosols having a low base saturation


4.2.2. Digital elevation model (DEM)

Topography is one of the most important factors affecting soil formation, and thus it may determine
the soil types in an area. Landscape position causes localized changes in moisture and temperature.
Therefore, a DEM of the area with a grid resolution of 25 m was created by digitizing the
topographic map of the region. The DEM was used to derive four terrain attributes using SAGA GIS:
altitude, slope, topographic wetness index and SAGA wetness index (Olaya, 2004). These attributes
may reflect the soil forming conditions in the study area.
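To give a feel for what two of these terrain attributes represent, the sketch below computes slope from a small elevation grid by central differences and the topographic wetness index in its commonly used form TWI = ln(a / tan β), where a is the specific catchment area and β the slope. This is only an illustration under simple assumptions: it is not the algorithm implemented in SAGA GIS, the function names are hypothetical, and the specific catchment area is taken as a given input because deriving it requires a flow-routing step that is not shown.

```python
import math

CELL_SIZE = 25.0  # grid resolution in metres, as stated for the DEM


def slope_radians(dem, row, col, cell_size=CELL_SIZE):
    """Slope at an interior cell from central differences of elevation."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2.0 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2.0 * cell_size)
    return math.atan(math.hypot(dz_dx, dz_dy))


def topographic_wetness_index(specific_catchment_area, slope, min_slope=1e-3):
    """TWI = ln(a / tan(beta)); min_slope avoids division by zero on flat cells."""
    return math.log(specific_catchment_area / math.tan(max(slope, min_slope)))


# Made-up 3 x 3 elevation grid (metres) rising 1 m per cell towards the east
dem = [
    [3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0],
]
slope = slope_radians(dem, 1, 1)
# Hypothetical specific catchment area of 100 m for the centre cell
twi = topographic_wetness_index(100.0, slope)
```

Flat, low-lying cells with a large upslope area receive high index values, which is why such indices are plausible predictors of wet soils in a delta landscape.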

4.2.3. Remote Sensing indices

The SPOT image from the Vietnam Space Technology Institute has a resolution of 20 m and was used to
compute remote sensing indices, namely the Normalized Difference Vegetation Index (NDVI), the Ratio
Vegetation Index (RVI) and the Perpendicular Vegetation Index (PVI), using ArcGIS. As a result,
three raster maps at a resolution of 20 m were derived: an NDVI map, an RVI map and a PVI map.
Subsequently, these maps were resampled in ArcGIS to a resolution of 25 m in order to obtain the
same map extent and grid size as the DEM-derived attribute maps.

Vegetation indices are numerical indicators that use the visible and near-infrared bands of the
electromagnetic spectrum to assess whether the target being observed contains live green
vegetation. These indices are widely applied in vegetation studies and are often directly related
to ground parameters such as percent ground cover, photosynthetic activity of the plants and
surface water. The NDVI algorithm subtracts the red reflectance values from the near-infrared (NIR)
values and divides the difference by the sum of the near-infrared and red values (Rouse et al.,
1973):

NDVI= (NIR-RED) / (NIR+RED)

The RVI is formed by dividing the NIR radiance by the red radiance (Pearson and Miller, 1972):

RVI = NIR / RED
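The two formulas above can be written out per pixel as a short Python sketch. The function names and the band values are hypothetical, and returning `None` for a zero denominator is an assumption made here to keep the example safe, not a convention taken from ArcGIS.

```python
def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED); None when both bands are zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def rvi(nir, red):
    """RVI = NIR / RED; None when the red band is zero."""
    if red == 0:
        return None
    return nir / red


# Made-up reflectance values for a 2 x 2 image: (NIR, RED) per pixel
image = [
    [(0.4, 0.1), (0.3, 0.3)],
    [(0.5, 0.1), (0.2, 0.4)],
]
ndvi_map = [[ndvi(nir, red) for nir, red in row] for row in image]
rvi_map = [[rvi(nir, red) for nir, red in row] for row in image]
```

Both indices carry the same information, since NDVI = (RVI − 1) / (RVI + 1), but NDVI is bounded between −1 and 1 whereas RVI is unbounded above.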

4.2.4. Land use map

A land use map of Bac Ninh province at a scale of 1:25,000, produced in 2010, served as a source of
ancillary data. The study area is located in one of the largest deltaic regions of Vietnam, and
paddy rice is the dominant crop. Because the observations were obtained only in the agricultural
area, the following three main land use types were encountered in the study area: rice cultivation
with two crops per year (LUC), rice cultivation with one crop per year (LUK) and annual crops
(BHK). Annual crops include maize, potatoes, sweet potatoes, vegetables and cassava.


Table 5: Available ancillary data

No  Data set                          Predictor name  Resolution / Scale
1   Digital elevation model           ALTITUDE        25 m
2   Map of slope                      SLOPE           25 m
3   Map of SAGA wetness index         SAGAWET         25 m
4   Map of topographic wetness index  WETNESS         25 m
5   Map of NDVI                       NDVI            25 m
6   Map of PVI                        PVI             25 m
7   Map of RVI                        RVI             25 m
8   Land use map                      LU              1:25,000

4.3. Multinomial logistic regression

4.3.1. The multinomial logistic regression model

Multinomial logistic regression was used to model the relationships between the Reference Soil
Groups or the intermediate-level soil groups (categorical dependent variables) and the terrain
attributes, remote sensing indices and land use types in the research area (predictors), using the
“nnet” package of R. This model belongs to the family of generalized linear models and is used when
the response variable is categorical. Suppose that we want to model the probability πij that
observation i belongs to the jth of the m soil groups, j = 1 … m. In the model for predicting soil
groups, the Fluvisols (j = 1) are taken as the reference class because of their dominance in the
soil point data (402 of 537 samples). In the MLR model at the more detailed, intermediate level,
Fluvisol1000000 is the reference class for the same reason (171 of 537 samples). Consequently, the
base probability πi1 is computed as the residual probability after the other classes πi2 … πim have
been modeled.

Thus the model has k + 1 coefficients for each of the m – 1 classes j = 2 … m (leaving out the reference class): one intercept αj and one "slope" βlj for each predictor, where l = 1 … k indexes a column in the model matrix.

The fitted probabilities are then:

\[
\pi_{ij} = \frac{e^{\alpha_j + \beta_{1j} x_{i1} + \dots + \beta_{kj} x_{ik}}}{1 + \sum_{l=2}^{m} e^{\alpha_l + \beta_{1l} x_{i1} + \dots + \beta_{kl} x_{ik}}}, \quad j = 2, \dots, m
\]

\[
\pi_{i1} = 1 - \sum_{j=2}^{m} \pi_{ij}
\]

where xi is a vector of explanatory variables. This set of equations is fitted by maximizing the

likelihood.

The fitted α and β can then be used to assess the log-odds of an observation being classified in each soil class relative to the base class; that is, the chance that the observation belongs to another soil group instead of Fluvisols at the Soil Group level (or Fluvisol1000000 at the intermediate level). The log-odds are computed as:

\[
\ln\frac{\pi_{ij}}{\pi_{i1}} = \alpha_j + \beta_{1j} x_{i1} + \dots + \beta_{kj} x_{ik}, \quad j = 2, \dots, m
\]

So, once the model is fitted, we can predict the odds of each soil group (or intermediate level soil group) relative to the reference one. To recover the actual probabilities, the inverse logistic transformation is used. In R, the predict function provides the probabilities of all the classes (which of course sum to 1).
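As an illustration of how fitted coefficients translate into class probabilities and back into log-odds, the following Python sketch evaluates the model for a single observation. It is not the R code used in this study; the function name and all coefficient and predictor values are hypothetical.

```python
import math


def mlr_probabilities(x, alphas, betas):
    """Class probabilities of a multinomial logit model for one observation.

    x      : list of k predictor values
    alphas : intercepts of the m - 1 non-reference classes
    betas  : betas[j] holds the k slopes of the j-th non-reference class
    Returns m probabilities, the reference class first.
    """
    scores = [a + sum(b_l * x_l for b_l, x_l in zip(b, x))
              for a, b in zip(alphas, betas)]
    exp_scores = [math.exp(s) for s in scores]
    denominator = 1.0 + sum(exp_scores)
    others = [e / denominator for e in exp_scores]
    # The reference class receives the residual probability.
    reference = 1.0 - sum(others)
    return [reference] + others


# Hypothetical example: two predictors, three classes (reference + 2).
x = [1.0, 2.0]
alphas = [0.0, -1.0]
betas = [[0.5, -0.25], [0.1, 0.2]]

probs = mlr_probabilities(x, alphas, betas)
# Log-odds of each non-reference class relative to the reference class.
log_odds = [math.log(p / probs[0]) for p in probs[1:]]
print([round(p, 3) for p in probs], [round(v, 3) for v in log_odds])
```

The log-odds recovered from the probabilities equal the linear predictors, which is the relationship expressed by the log-odds equation above.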

4.3.2. Assessing model significance and contribution of predictors

In order to find the best model, one that provides the maximum fit with the fewest predictors, it is important to select the predictor variables in the logistic regression model that contribute most to the pattern in the categorical response variable. The criteria for assessing different models include the deviance statistic and the Akaike Information Criterion (AIC) (Akaike, 1973). AIC is a measure of the relative quality of a statistical model for a given data set. It deals with the trade-off between the complexity of the model and its goodness of fit, and thus provides a means for model selection. AIC adjusts the residual deviance for the number of estimated parameters:

AIC = 2K – 2ln(L)

where K is the number of estimated parameters in the model and L is the maximized value of the likelihood function for the estimated model, which is readily available in the statistical output and reflects the overall fit of the model. In itself, the AIC value for a given data set has no meaning. It becomes informative when it is compared with the AIC of a series of models, the one with the lowest AIC being the best model. If several models have similarly low AICs, the one with the fewest predictor variables should be chosen.
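A small worked example may help to show how the AIC trades fit against complexity. The sketch below is in Python; the three candidate models, their parameter counts and their log-likelihoods are invented for illustration and are not results of this study.

```python
def aic(n_params, log_likelihood):
    """Akaike Information Criterion: AIC = 2K - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidate models: name -> (K, maximized log-likelihood).
candidates = {
    "ALTITUDE": (8, -310.0),
    "ALTITUDE+NDVI": (12, -301.5),
    "ALTITUDE+NDVI+LU": (20, -299.0),
}

scores = {name: aic(k, ll) for name, (k, ll) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this invented case the third model fits slightly better than the second, but the gain in likelihood does not compensate for its eight additional parameters, so the second model has the lowest AIC.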

In this research, the stepwise-forward method was used for model selection. We begin with no variables in the model. A model is fitted for each independent variable separately, and the AIC of each of these models is computed and compared. The most influential predictor variable, the one giving the lowest AIC, enters the model first; the other variables are then added one by one in order of increasing AIC. The variable selection stops when the AIC of the fitted model increases. Finally, the selected model is the one with the lowest AIC and the fewest independent variables.
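The selection procedure described above can be sketched as follows in Python. The function `forward_select` and the scoring function `toy_aic` are hypothetical: in practice the scoring function would refit the model for the given set of variables and return its AIC, whereas here an invented lookup table stands in for the model fit.

```python
def forward_select(candidates, fit_aic):
    """Forward selection in the order of the single-variable AIC ranking.

    candidates : list of predictor names
    fit_aic    : function taking a list of predictor names and returning
                 the AIC of the model fitted with those predictors
    """
    # Rank the predictors by the AIC of their single-variable models.
    ranked = sorted(candidates, key=lambda v: fit_aic([v]))
    selected = [ranked[0]]
    best_aic = fit_aic(selected)
    for var in ranked[1:]:
        trial_aic = fit_aic(selected + [var])
        # Stop as soon as adding a variable no longer lowers the AIC.
        if trial_aic >= best_aic:
            break
        selected.append(var)
        best_aic = trial_aic
    return selected, best_aic


# Invented stand-in for a model fit: each variable lowers the deviance by
# a fixed amount and costs a penalty of 2 per added parameter.
GAIN = {"ALTITUDE": 30.0, "NDVI": 12.0, "SLOPE": 5.0, "PVI": 0.5}


def toy_aic(variables):
    return 700.0 - sum(GAIN[v] for v in variables) + 2 * len(variables)


selected, final_aic = forward_select(list(GAIN), toy_aic)
print(selected, final_aic)
```

With these invented gains, PVI is rejected because it improves the fit by less than its penalty, and the selection stops at three variables.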

4.4. Artificial neural network

ANNs are a standard technique in artificial intelligence and data mining. They are designed to learn rules from examples. In R, the ANNs were run using the "neuralnet" package (Fritsch and Guenther, 2012). The package contains a very flexible function to train feed-forward neural networks. It was built to train neural networks in the context of regression analysis and focuses on multi-layer perceptrons, which are well suited to modeling functional relationships. For the "neuralnet" models, the predictors were selected using the stepwise-forward method described in 4.3.2.

In this study, a back-propagation ANN model was developed to predict soil types at both the Soil Group level and the intermediate level. Back-propagation networks are trained with a back-propagation technique that adjusts the weight and bias values along the negative gradient in an attempt to minimize the mean squared error (MSE) between the network outputs and the target outputs of the training data set (Sigillito and Hutton, 1990).
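To make the training principle concrete, the following Python sketch trains a very small feed-forward network (one input, one hidden layer, one output) by back-propagation on the squared error. It is a didactic example only and does not reproduce the "neuralnet" implementation; the network size, learning rate, number of epochs, random seed and the six training points are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: returns the hidden activations and the output."""
    hidden = [sigmoid(w * x + b) for w, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def mse(data, params):
    """Mean squared error between network outputs and targets."""
    return sum((forward(x, *params)[1] - t) ** 2 for x, t in data) / len(data)


def train(data, n_hidden=3, rate=0.5, epochs=2000, seed=1):
    """Train by back-propagation; returns the MSE before and after."""
    rng = random.Random(seed)
    w_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = rng.uniform(-0.5, 0.5)
    initial = mse(data, (w_hidden, b_hidden, w_out, b_out))
    for _ in range(epochs):
        for x, t in data:
            hidden, y = forward(x, w_hidden, b_hidden, w_out, b_out)
            # Error signal at the output unit (derivative of 0.5*(y - t)^2).
            delta_out = (y - t) * y * (1.0 - y)
            for j in range(n_hidden):
                # Error propagated back to hidden unit j (old output weight).
                delta_h = delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                # Move each weight against its gradient.
                w_out[j] -= rate * delta_out * hidden[j]
                w_hidden[j] -= rate * delta_h * x
                b_hidden[j] -= rate * delta_h
            b_out -= rate * delta_out
    final = mse(data, (w_hidden, b_hidden, w_out, b_out))
    return initial, final


# Hypothetical training set: one rescaled covariate and the presence (1)
# or absence (0) of a soil class.
data = [(0.0, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (1.0, 1)]
initial_mse, final_mse = train(data)
print(initial_mse, final_mse)
```

Here the weights are updated after every training example, which corresponds to presenting the observations to the network in succession.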

The application of an ANN consists of two stages. During the first stage, the network is trained, meaning that it learns the conditions under which a certain soil group occurs, using the calibration data set. Each input unit (cell or neuron) of the ANN represents a predictor variable: terrain attributes, remote sensing indices and land use units. The output unit represents the Soil Groups or the intermediate level of Soil Groups. The connections between neurons are described by the weights wi (wi1 … win). The adjustment of these weights depends on the learning process. As each attribute combination (in terms of pixels of a grid map) is put into the network in succession, the weights are adjusted iteratively if the predicted output does not match the output of the training data set. The other network parameters, including the optimum iterations and learning rates, the number of hidden layers and the transfer function, were adjusted after the learning stage. During the second stage, the learned knowledge, in terms of the calibrated weights, is applied to the whole study area, for which the same input parameters (terrain attributes, remote sensing indices and land use maps) are available but no soil map has been surveyed. The network then predicts the soil units based on the learned weights (Behrens et al., 2005).

4.5. Validation

The quality of a soil map can be determined by comparing the predictions at the calibration sites with the observed values. However, the accuracy thus obtained, referred to as the internal accuracy, often overestimates the actual accuracy (Chatfield, 1995). Therefore, in this project, an independent validation data set of 53 observations was selected randomly from the data set. The predictions of the models fitted on the remaining observations were then compared with these validation data, which were not used in the modeling.
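The random hold-out described above can be sketched in Python as follows. The function name and the random seed are assumptions; only the sizes (53 validation points out of 537 observations) come from the text.

```python
import random


def split_validation(n_total, n_validation, seed=0):
    """Randomly hold out n_validation observation indices for validation."""
    rng = random.Random(seed)
    indices = list(range(n_total))
    validation = sorted(rng.sample(indices, n_validation))
    held_out = set(validation)
    calibration = [i for i in indices if i not in held_out]
    return calibration, validation


# 537 observations in total, 53 of which are kept apart for validation.
calibration_ids, validation_ids = split_validation(537, 53)
print(len(calibration_ids), len(validation_ids))
```

The models are then fitted on the calibration indices only, so that the validation points remain independent of the fitting.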

To assess the quality of the predicted soil maps, the map purity was computed from the confusion matrix (Brus et al., 2011). Table 6 shows an error matrix: the row margins of the matrix (the areas covered by the map units) are known, whereas the column margins (the areas covered by the true classes) are unknown and must be estimated from the samples.


Table 6: Confusion matrix (rows: map units; columns: classes observed in the field)

Map \ Field   1     2     ...   U     ∑
1             A11   A12   ...   A1U   A1+
2             A21   A22   ...   A2U   A2+
...           ...   ...   ...   ...   ...
U             AU1   AU2   ...   AUU   AU+
∑             A+1   A+2   ...   A+U   A

Aij = number of observations mapped as class Ci with observed soil class Cj

The overall purity is defined as the proportion of the mapped samples in which the predicted soil class,

which is the soil class as depicted on the map, equals the true soil class as determined on validation

points. In other words, it is the proportion correctly classified:

\[
p = \frac{\sum_{u=1}^{U} A_{uu}}{A}
\]

where U denotes the number of classes, Auu denotes the number of correctly classified observations of map unit u and A denotes the total number of observations. A good map has a map purity close to 1 (Finke, 2011).
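The purity calculation can be written in a few lines of Python. The confusion matrix below is hypothetical and serves only to show the arithmetic; it does not contain the validation results of this study.

```python
def map_purity(matrix):
    """Overall purity: the diagonal sum divided by the total count."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[u][u] for u in range(len(matrix)))
    return correct / total


# Hypothetical confusion matrix for 3 classes and 53 validation points
# (rows: mapped class, columns: class observed in the field).
confusion = [
    [30, 5, 2],
    [4, 6, 1],
    [3, 1, 1],
]

purity = map_purity(confusion)
print(round(purity, 3))
```

In this invented example 37 of the 53 points lie on the diagonal, which gives a purity of about 0.70.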

4.6. Soil diversity indices

The diversity indices were calculated to assess the variation of the predicted soil maps. In this research, three pedodiversity indices, namely Shannon's entropy H', richness S and evenness E, were calculated for each predicted map.

• Richness (S): is the number of soil classes that exists in an area.

• Shannon’s entropy: is the most commonly used measurement of pedodiversity (Guo et al.,

2003; Ibáñez et al., 1998)

\[
H' = -\sum_{i=1}^{S} p_i \ln p_i
\]

where pi is the proportion of the total map area occupied by the i-th unit. When one class covers the whole area, p = 1 and thus H'min = 0. The closer the values of pi are to 1/S, the more even the distribution of p and the more diverse the class composition. The maximum value of H' is Hmax = ln S; a value close to Hmax indicates an equal proportional contribution of all classes (Martín and Rey, 2000).

• Evenness (E) refers to the relative abundance of each soil class in the area. It can be defined as:

\[
E = \frac{H'}{H_{max}} = \frac{H'}{\ln S}
\]

If each soil class is equally abundant, the evenness is high; conversely, an area in which the abundances of the soil classes differ greatly has a low evenness (McBratney and Minasny, 2007).

The diversity of a map indicates the amount of information depicted on the map: a high diversity corresponds to a high information content.
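The behaviour of the three indices can be checked with a short Python sketch. The function names and the two sets of area proportions are hypothetical; the optional `richness` argument is an assumption that allows S to be fixed at the number of classes in the legend rather than the number of classes actually present on the map.

```python
import math


def shannon_entropy(proportions):
    """Shannon's entropy H' = -sum(p_i * ln p_i), skipping empty classes."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


def evenness(proportions, richness=None):
    """Evenness E = H' / ln S; S defaults to the number of classes present."""
    s = richness if richness is not None else sum(1 for p in proportions if p > 0)
    if s <= 1:
        # With a single class ln S = 0; the evenness is set to 0 here.
        return 0.0
    return shannon_entropy(proportions) / math.log(s)


# Hypothetical maps with four classes each.
even_map = [0.25, 0.25, 0.25, 0.25]
dominated_map = [0.97, 0.01, 0.01, 0.01]

h_even = shannon_entropy(even_map)
h_dominated = shannon_entropy(dominated_map)
e_even = evenness(even_map)
e_dominated = evenness(dominated_map)
print(round(h_even, 3), round(h_dominated, 3))
print(round(e_even, 3), round(e_dominated, 3))
```

The evenly divided map reaches Hmax = ln 4 and an evenness of 1, whereas the map dominated by one class has an entropy and evenness close to 0.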

4.7. Combined Index practical management

The map purity is an indicator of map quality, whereas soil diversity indicates the information content of the map. Thus, both aspects can be used to express how useful the map is. In terms of management practices, the goal of soil mapping is to construct a map with high purity that adequately represents soil diversity. Therefore, the combination of map purity and Shannon's entropy is an important index for assessing the performance of soil mapping. The combined index for accuracy and depicted diversity was defined as the product of H' and map purity.


5. RESULTS AND DISCUSSION

5.1. The soil maps modeled by multinomial logistic regression

In the MLR model for predicting the Reference Soil Group in Bac Ninh, the stepwise-forward method resulted in the selection of five predictor variables: altitude, NDVI, slope, topographic wetness index and SAGA wetness index (Table 7). Wetness indices are frequently used to simulate the soil moisture conditions in a watershed quantitatively. Altitude and slope are very important terrain attributes. The selection of these variables suggests that the combination of relief and the distribution of water over the area strongly affects soil formation at the higher level of classification. The effects of terrain attributes on the distribution of soil groups were shown by Debella-Gilo and Etzelmuller (2009) using Multinomial Logistic Regression. In addition, Jafari et al. (2012) found that the degree of wetness plays a role in the identification of soil types in a semi-arid area using the same method.

The MLR model for predicting the intermediate level of Soil Groups consists of the same variables as the model above (altitude, NDVI, slope, topographic wetness index, SAGA wetness index) plus land use (Table 7). It is reasonable to expect that predicting soil classes in more detail requires more predictor variables, because the relationship between the soil class and the covariates is more complex at lower categorical levels. In addition, because the more detailed level was classified on the basis of soil management properties, land use also has a considerable influence on the soil definition.

Table 7: Variables used to predict the soil group and the intermediate level of soil group in multinomial logistic regression (MLR)

Soil class                        Variables in the model
Reference Soil Group              ALTITUDE+NDVI+SLOPE+WETNESSIN+SAGAWETNET
Intermediate level of Soil Group  LU+ALTITUDE+NDVI+SLOPE+WETNESSIN+SAGAWETNET

Multinomial logistic regression predicts the soil classes directly from the predictors. Figure 4 shows the occurrence of the Reference Soil Groups predicted by MLR. As can be seen from the map, Fluvisols are the dominant class over the area. This can be explained by the fact that Bac Ninh is located in the Red River delta, the biggest delta in the north of Vietnam. Fluvisols are genetically young soils in alluvial deposits; thus this soil group accounts for the largest part of the study area. The good natural fertility of this soil group makes Bac Ninh one of the most productive paddy rice regions in Vietnam.

Besides Fluvisols, Acrisols, Arenosols and Plinthosols are predicted by the MLR method in very limited proportions. However, the model did not predict any Gleysols, even though some samples belong to this group. Looking back at the input observations, Fluvisols account for more than 70% of the samples and the four other soil groups together for only about 25%. This explains the excessive occurrence of Fluvisols compared with the other groups and the absence of Gleysols from the output of the model (only 15 of the 537 samples are Gleysols).


Figure 4: Map of Reference Soil Group predicted by Multinomial Logistic Regression

Acrisols occur in high landscape positions in the study area, as can be seen by comparison with the elevation map (Figure 6). This is a good prediction of the model, because this soil group is often associated with hilly or undulating topography in wet tropical climates (FAO, 2001).

Figure 5 illustrates the distribution of the intermediate level of soil groups predicted by MLR. At this level, the Reference Soil Groups were reclassified on the basis of soil management properties to avoid the predominance of one soil class in the input sample. As expected, the model predicted more detailed soil classes: 11 soil classes appear in the resulting map. Nevertheless, Gleysols still do not occur, so the same information is missing from this model as from the soil group prediction.


Figure 5: Map of intermediate level of Soil Group predicted by Multinomial Logistic Regression

Fluvisol1000000 (Fluvisols with low base saturation) occurs in most of the area of Bac Ninh. Generally this is a fertile alluvial soil distributed over different types of terrain, but long exploitation for cultivation without appropriate land treatment has reduced its fertility. The second dominant soil class over the study area is Fluvisol1000100 (Fluvisols with low base saturation and a hard subsurface horizon).

Fluvisols with high base saturation and fine texture (Fluvisol0100010) appear on both sides of the Red River. This soil class has high fertility because the river annually deposits a certain amount of sediment on the area around it.

The model also predicts Acrisol00000 over the hilly region, but over a more extensive area than the Acrisols at the Reference Soil Group level. The other soil classes predicted by the MLR model cover very small areas.


Figure 6: Digital Elevation Model of Bac Ninh

5.2. The soil maps modeled by artificial neural network

Artificial Neural Networks were used to estimate the probabilities of occurrence of each soil class at the nodes of a 25 m raster covering Bac Ninh. Subsequently, the soil type with the largest probability at each pixel was used to construct the prediction map. Therefore, at the Reference Soil Group level, 5 models were constructed to predict the 5 Reference Soil Groups occurring in the study area. Similarly, there are 15 ANN models corresponding to the 15 intermediate level Soil Groups.
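The step from per-class probabilities to a single predicted class per pixel can be sketched in Python as follows. The function name and the probability values are hypothetical; because each class is predicted by its own model, the probabilities at a pixel are not assumed to sum to 1.

```python
def most_probable_class(class_probabilities):
    """Return the class name with the largest predicted probability."""
    return max(class_probabilities, key=class_probabilities.get)


# Hypothetical outputs of the per-class models for three pixels.
pixels = [
    {"Fluvisols": 0.81, "Acrisols": 0.10, "Plinthosols": 0.07},
    {"Fluvisols": 0.35, "Acrisols": 0.52, "Plinthosols": 0.05},
    {"Fluvisols": 0.40, "Acrisols": 0.12, "Plinthosols": 0.44},
]

predicted = [most_probable_class(p) for p in pixels]
print(predicted)
```

A class with few observations rarely obtains the largest probability at any pixel, which is one way in which rare classes can disappear from the final map.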

The parsimonious model for prediction was selected in a similar way as for Multinomial Logistic Regression, based on the smallest AIC and residual deviance. However, as shown in Table 8, every model chosen by ANNs for a soil class has only one predictor variable. Surprisingly, increasing the number of covariates increased the AIC for all models, even though including more variables could be expected to describe the relationship between the target variable and the covariates better.


Table 8: Variables used to predict the soil group and the intermediate level of soil group in artificial neural networks

Level                             Soil class        Variable in the model
Reference Soil Group              Acrisols          SAGAWET
                                  Fluvisols         SAGAWET
                                  Arenosol          WETNESS
                                  Gleysol           WETNESS
                                  Plinthosol        NDVI
Intermediate level of Soil Group  Acrisol00000      ALTITUDE
                                  Acrisol00001      NDVI
                                  Acrisol10000      SAGAWET
                                  Acrisol10001      ALTITUDE
                                  Arenosol10000     NDVI
                                  Fluvisol0001000   WETNESS
                                  Fluvisol0010000   LU
                                  Fluvisol0100000   ALTITUDE
                                  Fluvisol0100001   LU
                                  Fluvisol0100010   LU
                                  Fluvisol1000000   PVI
                                  Fluvisol1000010   ALTITUDE
                                  Fluvisol1000100   PVI
                                  Gleysol10000      LU
                                  Plinthosol100000  NDVI

Figure 7 shows the map of Reference Soil Groups constructed by the ANN models. Three of the five Reference Soil Groups were predicted: Fluvisols, Acrisols and Plinthosols. ANNs predicted Fluvisols in about 98% of the total area (Table 9). This is again attributed to the unequal presence of the soil types in the observation data: more than 400 of the 537 samples were Fluvisols. Acrisols and Plinthosols, with 58 and 55 samples respectively, occur in the resulting map in very limited proportions. Arenosols and Gleysols, which have the lowest numbers of observations, are not present in the predicted map.


Figure 7: Map of Reference Soil Group predicted by Artificial Neural Networks

For the intermediate level of Soil Groups, the ANNs predicted six soil classes belonging to the same Soil Groups as at the higher level: Acrisols, Fluvisols and Plinthosols. Similarly, the soil classes belonging to Gleysols and Arenosols were not predicted by the model. This map shows a pattern similar to that of the map produced by MLR: Fluvisols with low base saturation cover most of the area (78.8%), Fluvisols with high base saturation and fine texture are located on both sides of the Red River, and Acrisols are distributed in the hilly regions.


Figure 8: Map of intermediate level of Soil Group predicted by Artificial Neural Networks


Table 9: Distribution of soil classes predicted by Multinomial Logistic Regression and Artificial Neural Networks

Model / Soil class                        Area (m2)    Proportion

MLR for Reference Soil Group
  Fluvisol                                533214375    0.961
  Acrisol                                 7025000      0.013
  Arenosol                                11722500     0.021
  Plinthosol                              3138125      0.006

MLR for intermediate level of Soil Group
  Acrisol00000                            12662500     0.023
  Acrisol10000                            6856250      0.012
  Arenosol10000                           23313125     0.042
  Fluvisol0010000                         4143750      0.007
  Fluvisol0100000                         3651250      0.007
  Fluvisol0100001                         938125       0.002
  Fluvisol0100010                         25088125     0.045
  Fluvisol1000000                         318666250    0.574
  Fluvisol1000010                         36875        0.000
  Fluvisol1000100                         154650625    0.279
  Plinthosol10000                         5093125      0.009

ANNs for Reference Soil Group
  Plinthosol                              832500       0.001
  Acrisol                                 6525625      0.012
  Fluvisol                                547741875    0.987

ANNs for intermediate level of Soil Group
  Acrisol00000                            13628125     0.025
  Acrisol10000                            3309375      0.006
  Fluvisol0100010                         25390000     0.046
  Fluvisol1000000                         437148750    0.788
  Fluvisol1000100                         67043125     0.121
  Plinthosol10000                         8580625      0.015

5.3. Comparison of predictive methods

5.3.1. Soil map purity

The predicted soil maps were validated with independent data from 53 points selected by simple random sampling from the dataset. The overall purity of the maps was calculated from the confusion matrix. Map purity has been used for many soil maps as a criterion to assess map quality. Many survey reports state that the intention of the soil survey was to obtain a map purity of ca. 70%, which means that the soils should be classified correctly on about 70% of the map (Finke, 2011).

Table 10 presents the estimated purity of the soil maps predicted by Multinomial Logistic Regression and Artificial Neural Networks at both levels. The two methods reach the same map purity (0.73) at the higher level of soil classification. This indicates a good performance of both methods in predicting the Reference Soil Group.

As expected, at the lower level of soil classification the map purity drops dramatically, to 0.39 and 0.37 for MLR and ANNs respectively. The map predicted by MLR has a slightly higher purity than that predicted by ANNs. Descending in the classification level introduces more properties that might be related to local conditions and natural selection, and can thus increase the complexity of the system (Toomanian et al., 2006). Therefore, some properties might not be captured by the applied covariates, and the connection between soil classes and covariates weakens at the lower level. Digital soil mapping relies on the relationships between soil samples and environmental factors of the target area. Weak relationships result in weak predictions, as seen in the performance of both methods at the intermediate level of Soil Groups. Jafari et al. (2013) also found that soil map purity decreased toward the lower taxonomic categories. Another reason is that the number of different soil units at the Reference Soil Group level is much smaller than at the intermediate level (5 Reference Soil Groups compared with 15 intermediate level classes). The soil map purity decreases because the soil units at the lower level are less contrasting. Olaniyan and Ogunkunle (2007) reported that soil mapping units with high purity included very contrasting soil types.

Table 10: Map purity, diversity indices and combined index of maps predicted by MLR and ANNs

Method  Level                             Map purity  Richness  Shannon H'  Evenness E  Purity * Shannon
MLR     Reference Soil Group              0.73        5         0.20        0.12        0.15
MLR     Intermediate level of Soil Group  0.39        15        1.21        0.44        0.41
ANNs    Reference Soil Group              0.73        5         0.07        0.04        0.05
ANNs    Intermediate level of Soil Group  0.37        15        0.77        0.28        0.29

5.3.2. Soil diversity

Table 10 shows the richness, the Shannon index and the evenness of the maps resulting from the two methods at both taxonomic levels of soil units. With the increasing number of soil units from the Reference Soil Group to the intermediate level, the diversity and the evenness rise sharply. The greater number of soil units corresponds to a higher diversity at the lower taxonomic level.

At the same taxonomic level, MLR always yields a higher value of Shannon's index than ANNs. With the same richness, the higher values of H' from MLR indicate that the maps predicted by MLR depict a higher soil diversity. This is confirmed by Table 9: Fluvisols predicted by MLR are less abundant than those predicted by the ANN model, even though both methods have a very low diversity index at the Reference Soil Group level (0.20 for MLR and 0.07 for ANNs). The lower level of classification has a higher value of Shannon's index: 1.21 for MLR and 0.77 for ANNs. This can be attributed to the larger number of soil map units at this level, which increases the diversity of the predicted map. As at the Reference Soil Group level, the diversity is higher in the maps made with MLR than in those made with ANNs.

In addition, Figure 9 and Figure 10 illustrate the relationship between the map purity, the Shannon index and the combined index for the MLR and ANN models respectively. The diversity index shows the opposite trend to the soil map purity: when the purity decreases, the diversity index increases. The number of different soil units (richness) at each classification level may explain this. H' is closely related to the number of soil units: if the number of different soil classes increases, a greater number of terms is summed in H'.

Figure 9: Variation of the purity, the Shannon index and the combined index for the maps predicted by MLR at the two levels of soil classification

The diversity indices, including richness, Shannon's index and evenness, represent the deterministic soil complexity (Jafari et al., 2013). For that reason, the increase of entropy in the study area from the Reference Soil Group to the lower level indicates a higher complexity of the soil system. Besides, an increase in entropy associated with a larger number of different soil classes influences the predictive ability of the model. When the system complexity increases, there are more soil classes in the area, so the model has to be trained for a larger number of classes. This means that fewer observations per class are available for training the model. This raises the uncertainty of the prediction for each soil class, and the soil map purity decreases for the intermediate level of Soil Groups. Soil diversity reflects the intricacy of soil maps and may therefore influence the soil map purity (Minasny et al., 2010).

[Figure 9 chart: Purity, Shannon and Purity * Shannon plotted for RSG - MLR and InterSG - MLR]


Figure 10: Variation of the purity, the Shannon index and the combined index for the maps predicted by ANNs at the two levels of soil classification

The combined index, defined by multiplying Shannon's entropy and map purity, increases from the Reference Soil Group level to the intermediate level for both the MLR and ANN approaches. However, MLR shows higher values at both levels than ANNs, as shown in Table 10.

In terms of management practices, we need a soil map with high purity that adequately represents soil diversity. The pedodiversity measurements are related to the density of the soil map or the presence of various soil units (Jafari et al., 2013). A soil mapping method should achieve a high map purity and should also represent the real soil diversity. In this research, although the differences in map purity between the two predictive methods are small, MLR shows a higher pedodiversity at both mapping levels than ANNs. Therefore, soil mapping seems to be more efficient with Multinomial Logistic Regression than with Artificial Neural Networks. With MLR, the map purity at the Reference Soil Group level is much higher than at the intermediate level of Soil Groups, so the model performs much better in predicting Soil Groups. At the lower level, however, the model depicts more of the soil diversity, and thus the informative value of the intermediate level maps, as estimated by the combined index, is higher.

[Figure 10 chart: Purity, Shannon and Purity * Shannon plotted for RSG - ANN and InterSG - ANN]


6. CONCLUSION

Some main conclusions can be drawn from the results of this study:

1. Multinomial Logistic Regression could successfully be used to directly predict maps of soil types.

2. The soil map purity shows a trend opposite to that of the mapped soil diversity: as the purity decreases from Soil Groups to the intermediate level of Soil Groups, the soil diversity increases.

3. Based on the map purity and the combined index, Multinomial Logistic Regression performed better in predicting soil types than Artificial Neural Networks. Soil mapping at the level of the Reference Soil Group yields a high map purity and a low diversity.

4. To improve the model performance, more observations are needed for Acrisols, Plinthosols, Arenosols and especially Gleysols, to reduce the predominance of Fluvisols in the dataset.


BIBLIOGRAPHY

Akaike, H. (Ed.). (1973). Information theory and an extension of the maximum likelihood principle. Budapest.

Avery, B. W. (1987). Soil survey methods: a review. Technical Monograph 18. Beckett, P. H. T. (1976). Agriculture Progress. Soil survey, 51, 33-49. Behrens, T., Forster, H., Scholten, T., Steinrucken, U., Spies, E. D., and Goldschmitt, M. (2005).

Digital soil mapping using artificial neural networks. Journal of Plant Nutrition and Soil Science, 168, 1-13.

Behrens., T., Förster, H., Scholten, T., Steinrücken, U., Spies, E.-D., and Goldschmitt, M. (2005). Digital soil mapping using artificial neural networks. Journal of Plant Nutrition and Soil Science, 168(1), 21-33.

Brus, D. J., Kempen, B., and Heuvelink, G. B. M. (2011). Sampling for validation of digital soil maps. European Journal of Soil Science, 62, 394–407.

Campling, P., Gobin, A., and Feyen, J. (2002). Logistic modeling to spatially predict the probability of soil drainage classes. Soil Science Society of America Journal, 66, 1390–1401.

Carré, F., and Girard, M. C. (2002). Quantitative mapping of soil types based on regression kriging of taxonomic distances with landform and land cover attributes. Geoderma, 111, 241-263.

Chatfield, C. (1995). Model uncertainty, data mining and statistical inference. Journal of the Royal Statistical Society, Series A, 419-466.

Debella-Gilo, M., and Etzelmüller, B. (2009). Spatial prediction of soil classes using digital terrain analysis and multinomial logistic regression modeling integrated in GIS: Examples from Vestfold County, Norway. Catena, 77, 8-18.

Dent, D., and Young, A. (1981). Soil survey and land evaluation. London: George Allen and Unwin.

Fagerland, M. W., Hosmer, D. W., and Bofin, A. M. (2008). Multinomial goodness-of-fit tests for logistic regression models. Statistics in Medicine, 27(21).

FAO. (2001). Lecture notes on the Major Soils of the World (Vol. 94).

FAO. (2006). Guidelines for soil description. Rome: FAO.

FAO/ISRIC/CSIC. (1988). Revised Legend of the Soil Map of the World. World Soil Resources Report. Rome.

Fidêncio, P. H., Ruisanchez, I., and Poppi, R. J. (2001). Application of artificial neural networks to the classification of soils from São Paulo state using near-infrared spectroscopy. Analyst, 126, 2194-2200.

Finke, P. (2011). Syllabus for the course Soil prospection and classification in the Physical Land Resources program.

Fritsch, S., and Guenther, F. (2012). neuralnet: Training of neural networks (R package version 1.32, released 2012-09-19).

Gessler, P. E., Moore, I. D., McKenzie, N. J., and Ryan, P. J. (1995). Soil–landscape modelling and spatial prediction of soil attributes. International Journal of Geographical Information Systems, 9, 421–432.

Goeman, J. J., and le Cessie, S. (2006). A goodness-of-fit test for multinomial logistic regression. Biometrics, 62(4), 980-985.

Guo, Y., Gong, P., and Amundson, R. (2003). Pedodiversity in the United States of America. Geoderma, 117, 99-115.

Hastie, T., Tibshirani, R., and Friedman, J. (2001). The elements of statistical learning: data mining, inference and prediction. Springer Series in Statistics. New York: Springer-Verlag.

Hengl, T. (2003). Pedometric mapping. Bridging the gaps between conventional and pedometric approach. (PhD), Wageningen University.

Ibáñez, J. J., De-Alba, S., Lobo, A., and Zucarello, V. (1998). Pedodiversity and global soil patterns at coarse scales (with discussion). Geoderma, 83, 171-192.

Jafari, A., Ayoubi, S., Khademi, H., Finke, P. A., and Toomanian, N. (2013). Selection of taxonomic level for soil mapping using diversity and map purity indices: A case study from an Iranian arid region. Geomorphology.

Jafari, A., Finke, P., Van de Wauw, J., Ayoubi, S., and Khademi, H. (2012). Spatial prediction of USDA-great group in arid Zarand region, Iran, comparing logistic regression approaches to predict diagnostic horizons and soil types. European Journal of Soil Science.

Jenny, H. (1941). Factors of soil formation - a system of quantitative pedology. New York: McGraw-Hill.

Kempen, B., Brus, D. J., Heuvelink, G. B. M., and Stoorvogel, J. J. (2009). Updating the 1:50,000 Dutch soil map using legacy soil data: A multinomial logistic regression approach. Geoderma, 151, 311-326.

KIC, K. I. C. (1990). Munsell Soil Color Charts. Baltimore, USA.

Lagacherie, P. (2006). Chapter 1: Digital Soil Mapping: A State of the Art. In A. E. Hartemink, A. B. McBratney & M. L. Mendonça Santos (Eds.), Digital Soil Mapping with Limited Data (pp. 3-14). Springer.

Lagacherie, P., and McBratney, A. B. (2007). Chapter 1. Spatial soil information systems and spatial soil inference systems: perspectives for Digital Soil Mapping. In P. Lagacherie, A. B. McBratney & M. Voltz (Eds.), Digital Soil Mapping: An Introductory Perspective (Developments in Soil Science, pp. 3-24). Amsterdam: Elsevier.

Martín, M. A., and Rey, J. M. (2000). On the role of Shannon's entropy as a measure of heterogeneity. Geoderma, 98, 1-3.

McBratney, A. B., Mendonça Santos, M. L., and Minasny, B. (2003). On Digital Soil Mapping. Geoderma, 117, 3-52.

McBratney, A. B., and Minasny, B. (2007). On measuring pedodiversity. Geoderma, 141, 149-154.

Menard, S. S. (2002). Applied Logistic Regression Analysis. Quantitative Applications in the Social Sciences. Thousand Oaks: Sage Publications.

Minasny, B., McBratney, A. B., and Hartemink, A. E. (2010). Global pedodiversity, taxonomic distance, and the World Reference Base. Geoderma, 155, 132-139.

Mui, N. T. (2006). Vietnam Country Pasture/Forage Resource Profile. Rome: FAO.

Olaniyan, J. O., and Ogunkunle, A. O. (2007). An evaluation of the soil map of Nigeria: II. Purity of mapping unit. Journal of World Association of Soil and Water Conservation, J2, 97-108.

Olaya, V. F. (Ed.). (2004). A gentle introduction to SAGA GIS. Göttingen, Germany.

Pearson, R. L., and Miller, L. D. (1972). Remote mapping of standing crop biomass for estimation of the productivity of the short-grass Prairie, Pawnee National Grasslands, Colorado. Proceedings of the Eighth International Symposium on Remote Sensing of Environment, 1357-1381.

Raimundo, R., Barbosa, A. M., and Vargas, J. M. (2006). Obtaining environmental favourability functions from logistic regression. Environmental and Ecological Statistics, 3(2).

Resende, R. J. T. P. (2000). Characterizations of the Physical Environment of Coffee Areas of the South of Minas Through SPRING. University of Lavras, UFLA, MG, Brazil.

Rouse, J. W., Haas, R. H., Schell, J. A., and Deering, D. W. (1973). Monitoring vegetation systems in the Great Plains with ERTS. Washington DC.

Sarle, W. (2002). The IEEE Transactions on Neural Networks. Neural Network FAQ. Retrieved from ftp://ftp.sas.com/pub/neural/FAQ.html

Sigillito, V. G., and Hutton, L. V. (1990). Case study II: radar signal processing. Academic Press.

Toomanian, N., Jalalian, A., Khademi, H., Eghbal, M. K., and Papritz, A. (2006). Pedodiversity and pedogenesis in Zayandeh-rud Valley, Central Iran. Geomorphology, 81, 376-393.

Turetta, A. P. D., Mendonça Santos, M. L., Anjos, L. H. C., and Berbara, R. L. L. (2006). Chapter 22: Spatial-Temporal Changes in Land Cover, Soil Properties and Carbon Stocks in Rio de Janeiro. In A. E. Hartemink, A. B. McBratney & M. L. Mendonça Santos (Eds.), Digital Soil Mapping with Limited Data. Springer.

Van Reeuwijk, L. P., and Houba, V. J. G. (1998). Guidelines for Quality Management in Soil and Plant Laboratories. Rome: FAO.

Webster, R., and Beckett, P. H. T. (1968). Quality and usefulness of soil maps. Nature, 219, 680-682.

White, R. E. (Ed.). (2005). Principles and Practice of Soil Science: The Soil as a Natural Resource. Wiley-Blackwell.

WRB, I. W. G. (2006). World Reference Base for soil resources 2006. Rome: FAO.

Yang, L., Jiao, Y., Fahmy, S., Zhu, A.-X., Hann, S., Burt, J. E., and Qi, F. (2011). Updating Conventional Soil Maps through Digital Soil Mapping. Soil Science Society of America Journal, 75(3), 1044-1053.

Zhu, A. X. (2000). Mapping soil landscape as spatial continua: the neural network approach. Water Resources Research, 36, 663-677.