soft computing-based decision support tools for spatial datasoft computing-based decision support...

16
Soft computing-based decision support tools for spatial data Serge Guillaume 1 Brigitte Charnomordic 2 , Bruno Tisseyre 3 , James Taylor 4 1 Irstea, UMR ITAP, 361 rue J.F.Breton - BP 5095, F-34196 Montpellier, France E-mail: [email protected] 2 INRA/SupAgro, UMR MISTEA 2 Place Viala, F-34060 Montpellier, France E-mail: [email protected] 3 SupAgro, UMR ITAP, 2 Place Viala,F-34060 Montpellier, France Email: [email protected] 4 CLEREL, Dept. of Horticulture, CALS, Cornell University, USA Email: [email protected] Abstract In many fields, due to the increasing number of automatic sensors and devices, there is an emerging need to integrate georeferenced and temporal data into decision support tools. Geographic Information Systems (GIS) and Geostatistics lack some functionalities for modelling and reasoning using georefer- enced data. Soft computing techniques and software suited to these needs may be useful to implement new functionalities and use them for modelling and decision making. This work presents an open source framework designed for that purpose. It is based upon open source toolboxes, and its design is inspired by the fuzzy software capabilities developed in FisPro for ordinary non-georeferenced data. Two real world applications in Agronomy are included, and some perspectives are given to meet the challenge of using soft computing for georeferenced data. Keywords: Fuzzy, zone, learning, geostatistics, FisPro, GeoFIS, agronomy, environment 1. Introduction Management of complex systems cannot only rely on a thorough mathematical modelling. Decision support systems are necessary to assist the deci- sion maker, and system design should benefit from all the available knowledge, including expert know- ledge and data. In many application fields, for instance in Agro- nomic and Environmental Sciences, the considered data are often georeferenced and temporal data. They come from measurements (satellite or aerial images, embedded sensors e.g. yield, harvest com- pounds, etc.), manual sampling (soil analyzes) or may be given by experts (flood-risk area). There is a need for aggregating heterotopic data of va- rious kinds (expert, measurements), from different sources, with various spatial resolutions, protocols and assessments. Imprecision, partial truth, and un- certainty are a recurring characteristic. Much effort has been made to design dedicated software for spatial data management, mainly Geo- graphic Information Systems (GIS) used to handle and display georeferenced data, and geostatistical methods for data processing and estimation. Ne- vertheless, there have been relatively few soft com-

Upload: others

Post on 01-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

Soft computing-based decision support tools for spatial data

Serge Guillaume 1 Brigitte Charnomordic 2, Bruno Tisseyre 3, James Taylor 4

1 Irstea, UMR ITAP, 361 rue J.F.Breton - BP 5095, F-34196 Montpellier, FranceE-mail: [email protected]

2 INRA/SupAgro, UMR MISTEA 2 Place Viala, F-34060 Montpellier, FranceE-mail: [email protected]

3 SupAgro, UMR ITAP, 2 Place Viala,F-34060 Montpellier, FranceEmail: [email protected]

4 CLEREL, Dept. of Horticulture, CALS, Cornell University, USAEmail: [email protected]

Abstract

In many fields, due to the increasing number of automatic sensors and devices, there is an emergingneed to integrate georeferenced and temporal data into decision support tools. Geographic InformationSystems (GIS) and Geostatistics lack some functionalities for modelling and reasoning using georefer-enced data. Soft computing techniques and software suited to these needs may be useful to implementnew functionalities and use them for modelling and decision making. This work presents an open sourceframework designed for that purpose. It is based upon open source toolboxes, and its design is inspired bythe fuzzy software capabilities developed in FisPro for ordinary non-georeferenced data. Two real worldapplications in Agronomy are included, and some perspectives are given to meet the challenge of usingsoft computing for georeferenced data.

Keywords: Fuzzy, zone, learning, geostatistics, FisPro, GeoFIS, agronomy, environment

1. Introduction

Management of complex systems cannot only relyon a thorough mathematical modelling. Decisionsupport systems are necessary to assist the deci-sion maker, and system design should benefit fromall the available knowledge, including expert know-ledge and data.

In many application fields, for instance in Agro-nomic and Environmental Sciences, the considereddata are often georeferenced and temporal data.They come from measurements (satellite or aerialimages, embedded sensors e.g. yield, harvest com-

pounds, etc.), manual sampling (soil analyzes) ormay be given by experts (flood-risk area). Thereis a need for aggregating heterotopic data of va-rious kinds (expert, measurements), from differentsources, with various spatial resolutions, protocolsand assessments. Imprecision, partial truth, and un-certainty are a recurring characteristic.

Much effort has been made to design dedicatedsoftware for spatial data management, mainly Geo-graphic Information Systems (GIS) used to handleand display georeferenced data, and geostatisticalmethods for data processing and estimation. Ne-vertheless, there have been relatively few soft com-

Page 2: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

S. Guillaume et al.

puting developments to address the specific charac-teristics of georeferenced data. Even if some GISpropose fuzzy methods, like the popular fuzzy clus-tering algorithm, fuzzy k-means, these methods arenot designed specifically for georeferenced data.

Soft computing techniques, especially fuzzylogic and fuzzy inference systems, proved to beefficient to cope with imprecise data and uncer-tainty attached to expert judgment and have al-ready been used in Agronomic and Environmen-tal Sciences1,2,3,4,5,6,7. Spatial data specificities arelikely to open novel research topics in soft comput-ing. For instance, the notion of zone is not clearlydefined in GIS, it is often mistaken for a projec-tion of a classification achieved in the attribute spacewithout considering geographic continuity. Thisconcept is central in spatial reasoning and essentialin decision making, particularly in these fields, Asin practice, decisions need to be applied to manage-ment zones, satisfying geographical contiguity andshape criteria. For realistic decision support, zonesmust be defined with respect to the imprecision anduncertainty of available data and knowledge.

This work presents the outline of a decisionsupport system framework for spatial data. It isfreely available and based upon open source tool-boxes as well as on the authors’ experience in softcomputing software, through the former develop-ment of FisProa, that offers a high level of seman-tics and human-machine interaction. It could bepart, as a spatial package, of a wider project likethe GNU Fuzzy one proposed in the 2007 Fuzz’IeeeConference8.

The paper organization is as follows. The nextsection presents a state of the art of the availablesoftware environments for spatial data. The pro-posed architecture, including FisPro and the Ge-oFISb framework, is introduced in Section 3. Sec-tion 4 presents a soft computing-based distance,available in FisPro and GeoFIS. The frameworkfunctionalities are illustrated with two real world ap-plications in Section 5. Finally, Section 6 summa-rizes the main conclusions and the open challenges.

2. State of the art and need for specializedsoftware

GIS are powerful systems designed to capture, store,manipulate, analyze, manage, and display geogra-phically referenced data. They are used in many ap-plication areas, archaeology, resource management,agriculture, etc.

The most popular GIS include commercial soft-ware such as ArcGIS, JMap, MapInfo, Small-World, or open source library and software, such asGeoServer, GRASS, gvSIG, GeoToolsc, OpenMap,Quantum GIS, Udig or SAGA.

GIS use digital data and a spatio-temporal(space-time) location as the key index variable forall information, allowing information from diffe-rent sources to be related by accurate spatial infor-mation. They include a vast range of spatial ana-lysis techniques, among them contour lines, topo-logical and hydrological modelling, map overlay,geocoding, geostatistics and classification. In a GIS,geographical features are often expressed as vectors,by considering those features as geometrical shapes:points, lines or polygons. A spatial data set witha given geometry constitutes a layer. Alternatively,a layer can also be constituted by a raster data set.Map overlay uses the combination of several of theselayers to create a new output, visually similar tostacking several maps of the same region. Elemen-tary operators are available, such as union, intersec-tion and symmetric difference.

Geostatistics relies on statistical models based onrandom variable theory to produce field estimationsfrom data points, by modelling the uncertainty asso-ciated with spatial estimation and simulation. It in-cludes interpolation methods to complete the inputdata collected at a number of sample points.

Despite these powerful tools, GIS lack somefunctionalities for modelling and reasoning usinggeoreferenced data. Geographic information is dis-played for informing decision making, but there isneither a clear definition nor handling of some con-cepts, for instance the zone concept, which is often

ahttp://www7.inra.fr/mia/M/fispro/bhttps://mulcyber.toulouse.inra.fr/projects/geofis/chttp://geotools.org/

Page 3: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

Soft computing-based decision support tools for spatial data

confused with the class concept. GIS focus on pro-viding tools for multi-criteria decision-making, forinstance for site selection and suitability. Howeverthe concept of learning from data is not explicit. Toour knowledge, zone learning, zone operators andfeatures for dynamic evolution of zones seem not tobe available.

Another notable point is the limited use of softcomputing techniques in GIS, though reasoningabout space often has to deal with some form of un-certainty or imprecision. Recent add-ons to ArcGISinclude fuzzy operators for map overlay and fuzzyclassification. The concept of a linguistic variableis used to model the inaccuracies in attributes andin the geometry of spatial data. Data are fuzzifiedthrough membership functions and overlay opera-tors are applied on membership values instead ofraw data. An add-on to GRASS provides fuzzymembership functions, fuzzy operators and fuzzyrules to implement fuzzy inference systems for clas-sification tasks.

Fuzzy k-means clustering may be used for min-ing GIS data. In recent work9 the authors proposean extended fuzzy k-means method for GIS, that al-lows cluster centers to be hyperspheres, and apply itto find fire-point event hotspots from georeferenceddata. Recent publications, for instance10 that usesa fuzzy GIS-based spatial multi-criteria frameworkfor irrigated agriculture, take place in the applicationfields of Agronomic and Environmental Sciences.

On a different note, several advanced packages(spatial, geoR, gstat. . . ), are available for the opensource R11 software. They provide multivariate geo-statistical functions for kriging, analysis and simu-lation, and often include GIS support (GRASS forgstat) for querying data and executing scripts. Theyare intended for researchers or engineers having agood background in Statistics. SAGA (System forAutomated Geoscientific Analyses)d offers an opensource comprehensive set of geoscientific methods.

The need for modelling using georeferenced datais increasing, in many application fields, but particu-larly so in Life Sciences. The large amount of avai-lable spatial data has begun to open new avenues ofscientific inquiry into behaviors and patterns of pre-

viously considered unrelated information. However,the software tools presented above, including GISand R, are complex and require lengthy training andspecialised skills to be taken over. This is a limi-ting factor for the practical use of spatial modellingin some domains, such as Agronomic and Environ-mental Sciences where the stakeholders are not spe-cialists of spatial data. Moreover, the available soft-ware products lack an easy way to introduce expertknowledge, and are poor in soft computing tools.

Zadeh proposed the concept of a linguisticvariable12 to implement approximate concepts andreasoning. A fuzzy partition carries semantics andknowledge about the variable behavior. Recentwork13 makes it possible to take advantage of thefuzzy set formalism to add a semi-supervised as-pect to distance-based statistical procedures such asclustering. The semi-supervision is done by usingavailable expert knowledge to superimpose linguis-tic concepts onto numerical data. This pretreatmentis followed by the design of a pseudo metric basedon the fuzzy partitions corresponding to the con-cepts. The procedure is thus an approach to combinenumerical data and imperfect data resulting from ahuman judgment, and its software implementationcan help to promote soft computing.

New software, designed to facilitate modellingusing expertise as well as georeferenced data, wouldbe most useful to stakeholders intervening at diffe-rent levels of decision. Ideally it should providesome of the basic viewing functionalities of GIS andinteraction with maps. Expertise and data are availa-ble, and Decision Support Systems (DSS) must inte-grate them. The software should be easy to use witha quick and progressive learning, and a friendly in-terface so that decisions can be made and updatedfrom map viewing, learning using expert knowledgeand data, and map evolution. The concept of mana-gement zones, not limited to classes, is required. Tolimit the necessary work, the DSS software must beopen, be based on existing GIS components throughavailable libraries, include elementary geostatisticaltechniques through calls to R.

It can then become an open platform for addingnew soft computing developments, adapted to spa-

dhttp://www.saga-gis.org/

Page 4: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

S. Guillaume et al.

tial data. Targeted users include researchers in mo-delling tasks, counselors in Agronomic and Environ-mental Sciences and also teachers in those fields.

3. Proposed architecture

The DSS architecture is shown in Figure 1.The figure is divided by a dashed line: the

left part includes the components involved in theGeoFIS design while the right one illustrates howthey are used.

Uncertainty

Georeferenced

Data

Expert knowledgeGeoFIS

DSS

Quantitative evaluationPerformance indices

Open source toolboxes

Geotools: GIS library

R: Statistics, geostatistics

FisPro: Semantics, 1D, dynamical behavior

CGAL: Geometric Library

Output

Design Use

Qualitative evaluationVisualisation, interpretation

Decision support

Fig. 1. GeoFIS architecture.

The data under consideration are georeferenceddata. Another characteristic of the data available forthe decision maker, especially in life sciences likeAgronomic and Environmental Sciences, is their un-certainty. This is due to biological variability butalso to the necessity of using poorly defined con-cepts, such as flood-risk area.

Expert knowledge is central in decision making.The DSS should be oriented towards the service ofthe decision maker, his/her knowledge being giventhe leading part.

In the proposed architecture, various open sourcetoolboxes and libraries are used for the coopera-tion between expert knowledge and data. Statis-tical and geostatistical functions are implementedin the R projecte and, among the available GISlibraries, GeoTools is chosen because it includes

all of the necessary concepts and is written inJava, a good language to design friendly inter-faces. CGAL (Computational Geometry AlgorithmsLibrary)f provides efficient and reliable geometricalgorithms in the form of a C++ library.

The FisPro environment offers a high level of in-teraction between expertise and data for designingand optimizing fuzzy inference systems. It is not de-signed to handle geographic data, but can be usefulfor instance to build composite variables by approxi-mate reasoning or to design fuzzy partitions in theattribute space. Available on the Web since 2002,it is widely used in different fields and for variouspurposes (education, research, commercial).

FisPro’s main functionalities, which are detailedbelow, inspired the GeoFIS framework. The goal isto provide the decision maker not only with usefulindices for a quantitative evaluation but with a user-friendly interface to make a qualitative evaluation ofthe whole model. Interactive modelling capabilitiesare a must. Specific tools needed for spatial data vi-sualization, spatial reasoning and to investigate thespatial system behavior are under development andintroduced in the GeoFIS section.

3.1. FisPro (Fuzzy Inference System Design andOptimization)

FisPro has a C++ core and a Java interface. Itallows Fuzzy Inference System (FIS) design fromexpert knowledge or data. Among the availablefuzzy software toolboxes, FisPro stands out for sys-tem interpretability, which is a necessary conditionfor cooperation between expert knowledge and data.

FIS can be completely, and automatically, de-signed from data14. In the latter case, semantics isguaranteed at each step. The necessary conditionsfor fuzzy partitions to be interpretable and to im-plement the linguistic variable concepts have beenstudied by several authors15. The main points aredistinguishability, a justifiable number of fuzzy sets,normalization, sufficient overlapping and coverage.These conditions are met by so-called strong fuzzy

ehttp://www.R-project.orgfhttp://www.cgal.org/

Page 5: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

Soft computing-based decision support tools for spatial data

partitions (SFPs), such as the one shown in Figure 2.A SFP described by f membership functions (MFs)on the universe U fulfills the following condition:

∀x ∈U,f

∑i=1

µi(x) = 1 (1)

where µi(x) is the membership degree of x in the ithMF.

Fig. 2. A strong fuzzy partition (Fispro snapshot).

The rules share the same linguistic terms andthe optimization module does not modify the FISstructure and semantics are preserved after param-eter tuning.

FisPro’s efficient approach in exploratory ana-lysis and system modelling has been used to dealwith agricultural applications3. Special attention hasbeen put on the dynamical behavior of a FIS fol-lowing user modifications. Each variable, rule ordata item can be activated/deactivated. The systemparameters (operators, partitions, rule description)can be edited. All changes are dynamically han-dled and all current windows are updated, includingthe inference result ones. Response surfaces are alsoavailable for an analysis of the system behavior.

To help the user to assess the rule representa-tiveness, an option that evaluates the links between

rules and examples is available. A detailed cross-summary is given for each rule, the samples that firethis rule above a given matching degree, and for eachsample, the rules that are fired.

Inference can be done manually or on the cur-rent data file, with evaluation criteria that take intoaccount the numerical accuracy as well as the signif-icance of data items regarding the FIS.

Fig. 3. FisPro Inference from the data table.

Figure 3 shows two distinct windows. The up-per one shows the data as a table: a row correspondsto a data item, a column to a variable. The outputvariable is in the last column. A double-click on agiven row opens the inference window with the cor-responding input values, as shown in the bottom partof the figure. Each row corresponds to a rule. Foreach rule, the four first columns correspond to theinput variables. The fuzzy set is shaded up to thecorresponding membership degree for the given in-put value. The second input variable is not involved

Page 6: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

S. Guillaume et al.

in any rule. The last column displays the rule out-puts. This being a Sugeno FIS, the rule conclusionis given in parenthesis below the rule matching de-gree for the current input data. The inferred outputvalue, which results from rule output aggregation,appears in the top right corner (5.249). Modifyingany FIS element would update this window.

Fuzzy inference systems are useful for buildingcomposite variables to be used in DSS. Fuzzy parti-tioning can be used to model uncertainties throughlinguistic variables, and an example will be given inSection 5.

3.2. GeoFIS (Geographic Fuzzy Reasoning)

GeoFIS provides a simple evolutive frame-work to visualize and analyze spatial data. Based onopen source libraries, it is written in Java and usesGeoTools to display existing data layers or generatethem from raw text files. It includes calls to R toprovide one-dimensional spatial analysis. It is rel-atively easy to implement more geostatistical tech-niques through calls to R spatial packages. GeoFISalso includes an elementary zone learning module,written in C++. Add-ons will allow to introduce newlearning methods into the framework, in particularsoft computing ones.

Fig. 4. The GeoFIS main window.

Figure 4 shows an example of a two layer map. Thefirst layer displays the data points while the secondone corresponds to their Voronoi tessellation. TheVoronoi tessellation for a set of points S in the planeis a partition of the plane into convex polygons, eachof which consists of all the area in the plane closerto one particular point of S than to any other.

3.2.1. One-dimensional statistical analysis

All these functionalities are implemented using theR software11 with the gstatg package. The R func-tions are used by a large research community andare well tested. The interface implemented here usesthe Rserverh developments, which allow the directtransfer of objects between R and Java.

Fig. 5. GeoFIS histogram window.

The histogram window shows the distributionof data values for the selected variable. The num-ber of classes and the class bounds can be cus-tomized. Different choices are possible, includingequally spaced containers, bins with an equal num-ber of elements, or Sturges16 algorithm for selectingthe best number of classes.

ghttp://www.gstat.org/hhttp://www.rforge.net/Rserve/

Page 7: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

Soft computing-based decision support tools for spatial data

Given the distribution, data can be automaticallyor manually filtered, to define a validity range, forinstance one that holds 95% of the data, or by se-lecting the bounds, and so remove outliers.

The histogram window and the map viewing oneare dynamically linked, so that the valid and re-moved data points are plotted in the latter windowin two distinct colors, and updated according to theuser edits in the former one.

In the case of spatial data, it is important tomodel the degree of spatial dependence. This isdone using a semi-variogram. In GeoFIS, the var-iogram window prepares for kriging, i.e. interpola-tion using a defined model. The variogram modeloften needs expert tuning to fit the model taking intoaccount the data set specificities (spatial resolution,shape and size of the area under study . . . ). All ofthe model parameters can be adjusted and the theo-retical model (exponential, Gaussian, linear with silland spherical), as well as the data fit, are updatedaccordingly.

The variogram model can be saved in standardformat (xml) for reuse on new data or exported toother software.

Fig. 6. GeoFIS variogram window.

3.2.2. Learning module

The zone learning module is based on a segmenta-tion algorithm, inspired from an image-processingregion merging algorithm. It allows the delineationof discrete contiguous management zones. Ma-nagement in agricultural systems is dependent onboth the magnitude of variation and how it ispartitioned17. Segmentation algorithms differ fromclassification algorithms in that they are object-oriented (note: the term object-oriented here is usedin its image analysis context, not a software engi-neering context). This focus leads to the productionof discrete zones rather than classes and the outputis spatially structured.

Fig. 7. GeoFIS zone learning parameters.

Page 8: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

S. Guillaume et al.

One of the disadvantages with many object-oriented segmentation algorithms is a reliance onregular grid data for determining segment morpho-logy. This is probably an artefact from their pri-mary application in image analysis and has re-stricted the use of these algorithms on irregular agro-environmental data sets. The zone learning algo-rithm implemented in GeoFIS is able to process lowor high resolution data, on a regular grid or not. Itis inspired from a region-merging algorithm and alldetails can be found in 18. A fundamental point isthe way the spatial coordinates are used here. Theyare not involved in any distance calculation, but areonly used to define point and zone neighbourhood.The algorithm works on two spaces simultaneously(attribute space and geographic space). The prox-imity criterion used for zone merging is based on adistance in the attribute space, and it is only calcu-lated within a given neighbourhood. Spatial interpo-lation of data is not necessary for the algorithm torun. This is an asset, as interpolation generates syn-thetic data, whose artificial nature is often forgottenin the interpretation of the results.

Figure 7 shows the main parameters of the zonelearning algorithm. It presently works on a singledimension in the attribute space, which is referred toby Attribute column number. Stop criteria includethe number of zones to generate and a zone spatialheterogeneity based criterion. Intermediate mapsmay be required to allow users to see the evolutionof the zone merging process. An auxiliary variablecan be specified to recursively re-run the algorithmon a zone, using that auxiliary feature to guide thenew zoning.

As for all segmentation or classification meth-ods, the algorithm is sensitive to the choice of thedistance in the attribute space. Options include theEuclidean distance, as well as a fuzzy partition baseddistance.

3.2.3. Implementation details

GeoFIS currently includes the two previously des-cribed modules, one-dimensional statistical analysisand zone learning. Both modules are implementedinto the Java-based interfaced general framework,and the second one is also available as a stand aloneC++ program. This makes it possible to use it inde-pendently for computationally intensive automatedlearning tasks. GeoFIS is protected by a CeCILLopen source licensei.

The programs are hosted on the collaborativedevelopment environment MULCYBERj, in binaryand source form.

GeoFIS handles various input and output for-mats: csv files, shapefiles, raster files.

4. Introduction of soft computing

In the present work, soft computing is introducedusing fuzzy partition-based distances in the segmen-tation algorithm, instead of the classical Euclideandistance. A fuzzy partition-based distance, alsocalled FP-based distance, allows the introductionof expert knowledge in the algorithm19. The FP-based distance combines numerical and symbolicelements. Its numerical part allows it to handle mul-tiple membership in transition zones, while the sym-bolic one takes into account the granularity of theconcepts associated with the fuzzy sets (see 19 fordetails).

The proposal applies to data in the unit intervalU = [0,1] and relies on Fuzzy Partitions (FPs). Thedistance is called dP. We only recall here the defi-nition and some properties. All details can be foundin 13.

4.1. Mono-dimensional FP-based distance

Although the pseudo-metric, dP, is defined for gen-eral FPs13, it expression becomes quite simple forStrong Fuzzy Partitions (SFPs), as used in this pa-per.

Let Si = [Si,Si] the ith MF support, defined by{x|µi(x)> 0}.

ihttp://www.cecill.info/licences/Licence CeCILL V2-en.htmljhttp://mulcyber.toulouse.inra.fr/projects/geofis/

Page 9: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

Soft computing-based decision support tools for spatial data

Let Ki = [Ki,Ki] the ith MF kernel, defined by{x|µi(x) = 1}.

Let Xi = [Ki,Ki+1[, for 0 6 i 6 f .We denote by I(x) the function such that:

∀i ∈ [0, f ],x ∈ Xi⇔ I(x) = i

Let us introduce the function P:

P(x) = I(x)−µI(x)(x) (2)

P is a positive non-decreasing function of x and isincreasing in overlapping zones.

dP(x,y) can then be written as:

dP(x,y) =|P(y)−P(x)|

f −1(3)

4.1.1. Rank inversion vs Euclidean distance

Figure 8 shows an example of rank inversion of thefuzzy partition based distance results compared withthe Euclidean distance ones. With the univariatefuzzy partition-based distance dP, y and z are furtherapart than x and y, while they would be closer than xand y, were the Euclidean distance used. This rankinversion is due to the fact that all elements within agiven fuzzy set kernel have a null distance.

0

x y z

10.6 0.70.4

PPd (x,y)=0.067 d (y,z)=0.133

Fig. 8. Example of FP-based distance (dP) behavior.

4.1.2. Particular case of regular SFPs

A regular SFP is composed of triangular member-ship functions, with equidistributed kernel centersK1 . . .K f , with Ki = Si−1 = Si+1.

For this particular case, it was shown in 13 thatthe proposed pseudo-metric is a metric and that it

yields the same result as the Euclidean distance, re-gardless of the number of terms in the partition. Thereason is that the distance proposed in Eq. (3) dis-torts the Euclidean distance according to the two fol-lowing points: the symbolic distance between con-cepts and the indistinguishability of the kernel ele-ments. In the case of a regular SFP, these two cha-racteristics disappear because all kernels are reducedto single points and are equidistant.

4.2. Multi-dimensional FP-based distance

A simple and efficient way to obtain a multidimen-sional pseudo-metric is to perform a Minkowski-likecombination of the univariate pseudo-metrics. Lettwo multidimensional points x = (x1, . . . ,xM) andy = (y1, . . . ,yM) with xi,yi ∈ [0,1], ∀i ∈ 1, . . . ,M.

We have the following definition for the multi-dimensional distance, which is also a pseudo-metric:

∀x,y d(x,y) =

[M

∑j=1

(d j(x j,y j))k

] 1k

(4)

where k is a scalar positive value, corresponding tothe Minkowski exponent. The advantage of this def-inition is that one can use different sub-distances inthe various dimensions, for instance a FP-based dis-tance in dimension a if expert knowledge is availablefor the corresponding feature, and on the contrary,the Euclidean one in dimension b.

5. Case studies

This section presents two real world agronomic ap-plications involving spatial data and expert know-ledge. The first one is monodimensional, and thesecond one is bidimensional.

5.1. Defining management zones in a winegrowing application

The georeferenced data are yield data20, comingfrom an embedded sensor on a grape-harvestingmachine. The 1.4 ha field is planted with theBourboulenc variety and was harvested in 2001 inProvence (France). The average sampling rate is

Page 10: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

S. Guillaume et al.

about 2400 measurements per ha. But, due to a dataacquisition problem, some records are missing.

The objective of the study is to find suitable ma-nagement zones from the information found in theyield data and the domain knowledge. Several oper-ations could then be adapted including, for example,fertilization, winter pruning and inter row manage-ment. In this case, the grower was considering theestablishment of grass in the rows located in zonesof high production to introduce a competition withthe vines and reduce their vigour and the resultingyield.

Let us discuss the different modelling steps madepossible by the software framework.

5.1.1. Viewing the spatial distribution

The first stage is to view the spatial distribution ofthe yield attribute, by splitting it into classes, andprojecting it into a two dimensional map. Variousmethods can be used: expert definition of classesor automatic definition from data. We present herethree different choices for clustering in the attributespace: a) crisp clustering using expert boundaries,b) automatic k-means with three groups and c) clus-tering into three equi populated groups. Figures 9,10 and 11 show the respective clustered maps.

Fig. 9. Clustering vine data - three expert groups.

Fig. 10. Clustering vine data - three K-means clusters.

Page 11: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

Soft computing-based decision support tools for spatial data

Fig. 11. Clustering vine data - three equipopulated groups.

All three maps are derived from interpolateddata. Interpolation is used to represent a continu-ous map, so even if the sampling is irregular and/orthere are gaps in the data (see Figure 9), it is possibleto visualize the main spatial patterns of the field.

Each of the different types of maps is importantfor operational data analysis.

The map displayed on Figure 9 provides expertclasses. It displays the response of the field in rela-tion to the technical goals of the grower. The centralclass corresponds to the yield target, the lower andupper classes are the yields for which the vineyardoperations (pruning, fertilization, etc.) are probablynot appropriate. Figure 9 shows a northern zone thatmatches the yield goal and a southern zone for whichthe vine management does not seem appropriate be-cause the yield is too high.

Other representations are necessary for opera-tional purposes. The k-means classification (Figure10) helps to identify whether there is a particular dis-tribution of data in the plot. Equiprobable classifica-tion (Figure 11) allows to visualize the data variabil-ity, showing for example that the northern zone con-

sists of medium and very low yields. This map maybe useful to highlight the effects of environmentalfactors (soil, elevation, etc..) which explain the ob-served spatial variability. In all examples, regardlessof the classification methods used, the maps showdiscontinuous spatial patterns. Although classifica-tion is interesting for analysis purposes, the resultingmaps can hardly be taken into account to proposesite-specific management of the field.

5.1.2. Zoning with a Euclidean distance

Fig. 12. Zoning vine data using the Euclidean distance.

The second stage consists in a spatial zoning ofthe yield data, using a Euclidean distance in the at-tribute space. The merging algorithm mentioned inSection 3.2 is used. It yields a series of maps witha decreasing number of zones. The six zone mapis presented in Figure 12, that highlights the useful-ness of zoning. It shows zones where site-specificmanagement may be considered. However, from apractical point of view, that map remains difficult touse.

Page 12: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

S. Guillaume et al.

This zoning method yields zones with complexborders and does not allow a simple view of the field.

5.1.3. Zoning with a FP-based distance

The third stage improves the spatial zoning ofthe yield data by incorporating expert know-ledge through a fuzzy partition-based distance (seeSection 3.2). The fuzzy set breakpoints are 7,9,11,which are related to the choice made previously forthe crisp classification. A FisPro snapshot is shownin Figure 13, that displays the fuzzy partition to-gether with the data distribution.

Fig. 13. Histogram and fuzzy partition for vine data.

The six zone map obtained by running the zo-ning algorithm, guided by the fuzzy partition baseddistance, is shown in Figure 14.

The introduction of fuzzy logic in the zoningmethod provides a map that simplifies the represen-tation of the field. Two main management zones arehighlighted, one corresponding to the northern lowyield area, the other one to the southern high yieldarea. Note that a few specific zones of small sizeare also identified. They correspond to i) a zone ofvery high yield in the center of the plot and ii) twolow yield zones located along the southern edge of

the field which are due to border effects (beginningof the rows). Depending on the goal and the ma-chinery of the grower, these small zones may notbe considered sensible for site-specific management.

Fig. 14. Zoning vine data using a FP-based distance.

5.2. Bivariate study of yield-protein interactionin cereal production

The interaction between yield and protein is key tounderstanding yield potential and Nitrogen (N) bud-gets in cereal production systems. With on-harvestercereal yield and protein sensors now commerciallyavailable it is possible to acquire high-density yieldand protein information. However, the applicationof agronomic decisions with co-joined yield andprotein data has not been well examined to date, inpart because effective measures of interrogating thedata for decision-making have not been proposed.

Here we propose a two-step process using anadapted version of the univariate zoning algorithmpresented in Section 3.2.2. The alternative would beto use a bidimensional distance, see Eq. (4). Theadvantage of the solution proposed here is the in-

Page 13: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

Soft computing-based decision support tools for spatial data

terpretability of the zoning results for the decisionmakers.

The yield and protein data were collected in 2004from a 80 ha field in north-west NSW, Australia, us-ing on-harvesters sensors connected to a DGPS uniton a commercial grain combine harvester. The datawere trimmed of outliers. The two data sets were atdifferent spatial resolutions; the protein data was ata density of 65 points/ha while the yield data was at725 points/ha.”

5.2.1. FP-based bivariate segmentation

The two-step process is as follows:

1. The trimmed (uninterpolated) yield and pro-tein data were independently run through thezoning algorithm to give two independent re-sults (y yield and p protein zones). Themembership functions used in the process areshown in Figure 15 and 16, together with thedata distribution. The Protein MFs are basedon the general rules for hard bread wheat21.

Zones are restricted to a minimum size, whichfor yield was set at 350 points and for proteinat 30 points. Both equate to approximately 0.5ha. Multiple outputs [5,10,15] are generatedfor each univariate analysis (p/y).

2. The two results (y-zones and p-zones) fromthe segmentation were intersected to makea polygon layer of z segmentation-derivedYield-Protein zones.

All possible combinations of the derived p-zonesand y-zones were considered. The zoning was la-beled as p05y05 for the intersection of the 5-zoneprotein and 5-zone yield maps; p05y10 for 5-zoneprotein and 10-zone yield maps etc.

Fig. 15. Histogram and fuzzy partition for cereal protein.

Fig. 16. Histogram and fuzzy partition for cereal yield.

Page 14: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

S. Guillaume et al.

5.2.2. Data clean up and results

In the segmentation approach, the univariate seg-mentation is constrained to ensure that the y yieldzones and the p protein zones are large enough to bemanaged, as explained in the previous case study.

However, when these are intersected to produceYield-Protein zones, some of the zones may be toosmall to be managed. Therefore the Yield-Proteinzone maps need to be cleaned to produce a mapthat is suitable for agronomic use (and decision-making). Polygons that were less than a desiredthreshold (in this example 0.5 ha) were identifiedand removed, creating a clean polygon map that con-tains holes. The 5-m field grid (that was used for theinterpolation) was intersected with the cleaned poly-gon maps. Grid points located outside a polygonwere re-associated with the nearest polygon using anearest neighbor interpolation. Grid points locatedinside a polygon retain the polygon features. Thegrid data was then transformed into a final polygonmap, which is now a continuous surface.

Fig. 17. Map of the cleaned 43-zone segmentation results(cereal yield-protein interaction).

5.3. Perspectives

These two case studies show the interest of incor-porating expert knowledge into zoning algorithmsto guide the zoning and generate more interpretablemaps. This is important for decision making inmany application fields. Even when dealing withmultidimensional cases, the expert knowledge oftenremains monodimensional, as interactions are com-plex and difficult to grasp. Therefore, the FP-baseddistance is a useful tool that must be associated withaggregation operators. These operators should usesoft computing to relax constraints on zone deli-neation and allow more flexibility in the aggregationprocess, while preserving the result interpretability.

6. Conclusion

Cooperation between knowledge and data is stillan open challenge in system modelling. Amongsoft computing methods, fuzzy logic provides ori-ginal efficient solutions. Its success stems from theability to express the system behavior in a linguis-tic, highly interpretable way. An emerging ambi-tious challenge is the development of methods andsoftware suitable for cooperation between domainknowledge and georeferenced data, also called spa-tial data, which are now becoming available in greatquantities.

In this paper, we proposed an open source frame-work, based on specialized toolboxes and software,to be used for modelling and decision support. Italso aims to answer some educational needs of stu-dents in these application domains, including ad-vanced programs for developing countries where theuse of open source software is an asset. We intro-duced a soft computing tool allowing the users to de-fine distances based on expert knowledge by meansof fuzzy partitions and to incorporate them in a seg-mentation algorithm.

The software functionalities are illustrated ontwo case studies in Agronomy, that show how theycan help practitioners.

This is only a first step. For instance, it is neces-sary to develop specific visualization tools, in orderto represent a fuzzy zone, with uncertainties in two

Page 15: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

Soft computing-based decision support tools for spatial data

different spaces, the geographical space and the at-tribute space.

The interpretability constraints which have beenimplemented in fuzzy software for ordinary data,such as FisPro, are not so easy to define for geo-referenced data. There is no trivial extension ofstrong fuzzy partitions to a two-dimensional space.The development of approximate map comparisontechniques and suitable aggregation operators, in or-der to monitor the temporal evolution of zones ona map, or to compare maps for different attributes,constitutes another topic of interest. Image analysistechniques have to be extended to include irregularlyspaced data, coming from manual measurements,and domain knowledge.

Applying fuzzy logic tools, or more generallysoft computing tools, to spatial data is an attrac-tive perspective that opens new research topics, bothmethodological and software related.

Acknowledgments

The authors are thankful to Mr. James Hassall andDr Brett Whelan for making the data (and theirknowledge) available for this work. The cereal datawas collected as part of the GRDC SIP09 project.This work was funded in part by the AgropolisFoundation.

References

1. F. Colin, S. Guillaume, B. Tisseyre, “Small catchmentagricultural management using decision variables de-fined at catchment scale and a fuzzy rule-based sys-tem: a mediterranean vineyard case study”, Water Re-sources Management, 25, 2649–2668 (2011).

2. M. El Hajj, A. Begue, S. Guillaume, “Integrating spot-5 time series, crop growth modeling and expert know-ledge for monitoring agricultural practices - the caseof sugarcane harvest on reunion island”, Remote Sens-ing of Environment, 113 (10), 2052–2061 (2009).

3. S. Guillaume, B. Charnomordic, “Interpretable fuzzyinference systems for cooperation of expert know-ledge and data in agricultural applications using fis-pro”, I. C. N. CFP10FUZ-DVD (Ed.), IEEE Inter-national Conference on Fuzzy Systems, Barcelona,Spain, 2019–2026 (2010).

4. S. Matreata, M. Matreata, “Application of fuzzylogic systems for the elaboration of an opera-

tional hydrological warning system in ungaugedbasins”, L. Pfister, L. Hoffmann (Eds.), Uncertain-ties in the ’monitoring-conceptualisation-modelling’sequence of catchment research. 11th Conference ofthe Euromediterranean Network of Experimental andRepresentative Basins (ERB), Luxembourg, 57–162(2006).

5. J.-N. Paoli, O. Strauss, B. Tisseyre, J.-M. Roger,S. Guillaume, “Spatial data fusion for qualitative es-timation of fuzzy request zones: Application on pre-cision viticulture”, Fuzzy Sets and Systems, 158 (5),535–554 (2007).

6. T. Rajaram, A. Das, “Modeling of interactions amongsustainability components of an agro-ecosystem usinglocal knowledge through cognitive mapping and fuzzyinference system”, Expert Systems with Applications,37 (2), 1734–1744 (2010).

7. N. Tremblay, M. Bouroubi, B. Panneton, S. Guil-laume, P. Vigneault, C. Belec, “Development and vali-dation of fuzzy logic inference to determine optimumrates of n for corn on the basis of field and crop fea-tures”, Precision Agriculture, 11, 621–635 (2010).

8. D. D. Nauck, “Gnu fuzzy”, IEEE International Con-ference on Fuzzy Systems, 1019–1024 (2007).

9. F. Di Martino, S. Sessa, “The extended fuzzy c-meansalgorithm for hotspots in spatio-temporal gis”, ExpertSystems with Applications, 38, 11829–11836 (2011).

10. Y. Chen, S. Khan, Z. Paydar, “To retire or expand?a fuzzy gis-based spatial multi-criteria evaluationframework for irrigated agriculture”, Irrigation andDrainage, 59 (2), 174–188 (2010).

11. R Development Core Team, “R: A Language and En-vironment for Statistical Computing”, R Foundationfor Statistical Computing, Vienna, Austria, ISBN 3-900051-07-0 (2008).

12. L. A. Zadeh, “The concept of linguistic variable andits application to approximate reasoning - parts i,ii and iii”, Information Sciences, 8-9, 199–249,301–357,43–80 (1975).

13. S. Guillaume, B. Charnomordic, P. Loisel, “Fuzzypartitions: a way to integrate expert knowledge intodistance calculation”, Information Sciences, in press(2013), doi:10.1016/j.ins.2012.07.045.

14. S. Guillaume, B. Charnomordic, “Learning inter-pretable fuzzy inference systems with fispro”, Infor-mation Sciences, 181, 4409–4427 (2011).

15. J. V. de Oliveira, “Semantic constraints for member-ship functions optimization”, IEEE Transactions onSystems, Man and Cybernetics. Part A, 29 (1), 128–138 (1999).

16. H. A. Sturges, “The Choice of a Class Interval”, Jour-nal of the American Statistical Association, 21 (153),65–66 (1926).

17. M. Pringle, A. McBratney, B. Whelan, J. Taylor, “Apreliminary approach to assessing the opportunity for

Page 16: Soft computing-based decision support tools for spatial dataSoft computing-based decision support tools for spatial data Serge Guillaume1 Brigitte Charnomordic2, Bruno Tisseyre3, James

S. Guillaume et al.

site-specific crop management in a field, using yieldmonitor data”, Agricultural Systems, 76 (1), (2003)273 – 292.

18. M. Pedroso, J. Taylor, B. Tisseyre, B. Charnomordic,S. Guillaume, “A segmentation algorithm for the de-lineation of management zones”, Computer and Elec-tronics in Agriculture, 70, 199–208 (2010).

19. S. Guillaume, B. Charnomordic, P. Loisel, “A numer-ical distance based on fuzzy partitions”, S. Galichet,J. Montero, G. Mauris (Eds.), Proceedings of the 7th

conference of the European Society for Fuzzy Logicand Technology (EUSFLAT-2011) and LFA-2011, An-necy, France, 1000–1006 (2011).

20. B. Tisseyre, A. B. McBratney, “A technical oppor-tunity index based on mathematical morphology forsite-specific management: an application to viticul-ture”, Precision Agriculture, 9, 101–113 (2008).

21. W. M. Strong, I. C. Holford, “Fertilisers and manures,Sustainable crop production in the subtropics”, ALClarke, PB Wylie, 214–234 (1997).