gis exercise - spring, 2011 - laboratorio di...


GIS Exercise - Spring, 2011

Maria Antonia Brovelli

Laura Carcano, Marco Minghini

ArcGIS exercise 3 - Global trend removal

Introduction:

Besides exploring the statistical characteristics and the distribution of the
data, we may want to know whether the data show a spatial trend. With respect
to that trend, are there any significant outliers (unusual data)? Checking for
outliers should be a routine part of any data analysis. Potential outliers
have to be examined: if they are wrong, they should be corrected if possible
and deleted otherwise.

Many tools give clues to the presence of outliers, for instance the
histogram, the semivariogram cloud, statistical tests and Voronoi polygons.

In this exercise we look at these questions in detail, through one example of
trend removal by polynomial interpolation.


I: Global trend removal: polynomial interpolation

Data:

Lidar1.dbf

• Add data -> select the file Lidar1.dbf

• Right click on the layer name Lidar1 -> Display XY Data: set X field = N1, Y field = N2,
Z field = N3; change the reference system by clicking on Edit… -> … -> Monte Mario
Italy1.prj

• Right click on the layer name -> Properties -> Quantities -> Quantity numbers:
Classify -> Sampling: enter a number higher than 38000 (e.g. 50000), so that all the
points of the data set are used for the classification; then colour the points by the
N3 field using a color ramp with, for example, 5 classes.

Tasks:

1. Subdivide the data set Lidar1 into two subsets.

Subdivide the lidar dataset into two subsets: a training set containing 90%
of the points (34207) and a test set containing the remaining 10% (3801).
Name the subsets “Lidar1_Events_training1” and “Lidar1_Events_test1”
respectively.

• Geostatistical Analyst -> Subset Features…: set Input features = Lidar1 Events,
Output training feature class = Lidar1_Events_training1, Output test feature class =
Lidar1_Events_test1; Size of training feature subset = 90; Subset size units =
PERCENTAGE_OF_INPUT

The tool assigns the points to the two subsets completely at random.
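The idea behind the random 90% / 10% split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_training_test`, the seeds and the use of plain point indices instead of the real Lidar1 records are hypothetical, and the sketch is not claimed to reproduce the internal algorithm of the Subset Features tool.

```python
import random

def split_training_test(records, training_percent=90, seed=None):
    """Randomly split records into a (training, test) pair of lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # random order, so the first part is a random sample
    n_training = round(len(shuffled) * training_percent / 100)
    return shuffled[:n_training], shuffled[n_training:]

# Stand-in for the 38008 points of the exercise: only point indices are used.
point_ids = range(38008)

# Two independent extractions (hypothetical seeds), as in tasks 1 and 2.
training1, test1 = split_training_test(point_ids, seed=1)
training2, test2 = split_training_test(point_ids, seed=2)

print(len(training1), len(test1))
print("same test subset twice?", set(test1) == set(test2))
```

Two extractions with different random draws have the same sizes but contain different points, which is why the subsets of task 2 are expected to differ from those of task 1.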


2. Repeat the same operation on the data set Lidar1 Events, dividing it
again into two subsets corresponding to 90% and 10% of the points and
naming them “Lidar1_Events_training2” and “Lidar1_Events_test2”.

Because the extraction is random, we expect the subsets training2 and test2
to differ from the two subsets created previously. To verify this, we have
to compare the corresponding subsets.


3. Compare the corresponding subsets (training with training, test with
test) using the General QQ Plot function.

• Geostatistical Analyst -> Explore Data -> General QQ Plot; Data source #1 =
Lidar1_Events_training1, Attribute = N3, Data source #2 = Lidar1_Events_training2,
Attribute = N3. Handling coincidental sample: choose Include all.

Results for the training sets: the result can be slightly different depending on
the way the program extracts the data.

The plot follows approximately a straight line, except in some parts (see
the highlighted rectangle).
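A general QQ plot pairs the quantiles of two samples taken at the same probability levels; if the two samples have the same distribution, the pairs lie close to the line y = x. The following minimal sketch shows the computation with the Python standard library. The synthetic heights, the function name `general_qq_pairs` and the number of quantiles are assumptions made for illustration; the Lidar1 data are not used, and the sketch is not the ArcGIS implementation.

```python
import random
import statistics

def general_qq_pairs(sample_a, sample_b, n_quantiles=20):
    """Pair the quantiles of two samples at the same probability levels."""
    qa = statistics.quantiles(sample_a, n=n_quantiles)
    qb = statistics.quantiles(sample_b, n=n_quantiles)
    return list(zip(qa, qb))

# Synthetic heights (hypothetical values), split at random into 90% and 10%.
rng = random.Random(0)
heights = [rng.gauss(250.0, 6.0) for _ in range(5000)]
rng.shuffle(heights)
subset_a, subset_b = heights[:4500], heights[4500:]

pairs = general_qq_pairs(subset_a, subset_b)

# Distance of each quantile pair from the line y = x.
largest_gap = max(abs(a - b) for a, b in pairs)
for a, b in pairs:
    print(f"{a:10.3f} {b:10.3f}")
print("largest distance from the line y = x:", round(largest_gap, 3))
```

Plotting the second column against the first gives the QQ plot; deviations from the straight line are usually largest in the tails, where few points determine each quantile.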


Results for the test sets: the result can be slightly different depending on the

way the program extracts the data.

What can you observe? What conclusion can you draw from the observation?

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

4. Now compare the two subsets (training and test) belonging to the
same extraction (for example the first one extracted from Lidar1 Events),
using the General QQ Plot function.

• Geostatistical Analyst -> Explore Data -> General QQ Plot; Data source #1 =

Lidar1_Events_training1, Attribute = N3, Data source #2 = Lidar1_Events_test1,

Attribute = N3. Handling coincidental sample: choose Include all.


Result: the result can be slightly different depending on the way the program

extracts the data.

What can you observe? What conclusion can you draw from the observation?

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

5. Interpolate “Lidar1_Events_training1” with a Global polynomial
interpolation using different polynomial degrees and validate the
model using the two subsets “Lidar1_Events_training1” and
“Lidar1_Events_test1”.

Use the data of the training set to create the model:


• Geostatistical Analyst -> Geostatistical Wizard -> Global Polynomial Interpolation;

Source dataset = Lidar1_Events_training1, Data field = N3. Handling coincidental

sample: choose Include all.

Validate the model using the training data and then the test data.

• Right click on the layer name “Global polynomial interpolation
[Lidar1_Events_training1]” -> Validation/Prediction -> Input geostatistical layer =
Global polynomial interpolation; Input point observation locations =
Lidar1_Events_training1; Field to validate on = N3; Output statistics at point
locations = Validation1_training1.shp.

• Right click on the layer name “Global polynomial interpolation
[Lidar1_Events_training1]” -> Validation/Prediction -> Input geostatistical layer =
Global polynomial interpolation; Input point observation locations =
Lidar1_Events_test1; Field to validate on = N3; Output statistics at point
locations = Validation1_test1.shp.

Record the Mean (M) and the Root-Mean-Square (RMS) of the prediction errors
for the training set and for the test set, repeating the interpolation and
the validation for each polynomial order (power) from 1 to 9.
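The whole procedure (fit a global polynomial surface on the training points, then compute Mean and RMS of the errors on the training and on the test points for each order) can be sketched in plain Python. Everything in this sketch is an assumption made for illustration: the synthetic surface, the noise level, the function names and the least-squares solution through the normal equations. It is not the algorithm used by the Geostatistical Wizard, and only orders 1 to 3 are tried because the normal equations become badly conditioned at high orders.

```python
import math
import random

def poly_terms(x, y, order):
    """Monomials x**i * y**j with i + j <= order (constant term first)."""
    return [x**i * y**j for i in range(order + 1) for j in range(order + 1 - i)]

def solve_linear(a, b):
    """Solve the square system a c = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    c = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * c[k] for k in range(r + 1, n))
        c[r] = (m[r][n] - s) / m[r][r]
    return c

def fit_trend(points, order):
    """Least-squares coefficients of the polynomial surface z = f(x, y)."""
    rows = [poly_terms(x, y, order) for x, y, _ in points]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atz = [sum(r[i] * p[2] for r, p in zip(rows, points)) for i in range(n)]
    return solve_linear(ata, atz)

def mean_and_rms(points, coeffs, order):
    """Mean and RMS of the errors (measured minus predicted)."""
    errors = [z - sum(c * t for c, t in zip(coeffs, poly_terms(x, y, order)))
              for x, y, z in points]
    mean = sum(errors) / len(errors)
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean, rms

# Synthetic points on a second-order surface plus noise (hypothetical data),
# with coordinates already scaled to the interval [0, 1].
rng = random.Random(42)
points = []
for _ in range(400):
    x, y = rng.random(), rng.random()
    z = 250.0 + 10.0 * x - 5.0 * y + 20.0 * x * y + rng.gauss(0.0, 0.5)
    points.append((x, y, z))
training, test = points[:360], points[360:]  # 90% and 10%

results = {}
for order in (1, 2, 3):
    coeffs = fit_trend(training, order)
    results[order] = mean_and_rms(training, coeffs, order) + mean_and_rms(test, coeffs, order)
    print(order, ["%.6f" % value for value in results[order]])
```

Each printed row contains, in order, the training Mean, training RMS, test Mean and test RMS. Because the surface is fitted by least squares and contains a constant term, the mean error on the training points is zero up to rounding, whereas the mean error on the test points is generally not.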


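For both the training set and the test set, save the validation file and
open the file validation.dbf with Excel; there you can compute the RMS given
by the formula:

\[
RMS = \sqrt{\frac{\sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2}{N}}
\]

where x_i is the measured value at point i, \hat{x}_i is the value predicted
by the model at the same point and N is the number of points.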
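As a small worked example of the RMS computation, the following Python sketch applies the definition (square root of the mean of the squared differences between measured and predicted values) to four made-up points. The numbers and the function name `mean_error_and_rms` are assumptions chosen only so that the arithmetic can be checked by hand; they are not taken from the validation files.

```python
import math

def mean_error_and_rms(measured, predicted):
    """Mean and root-mean-square of the errors (measured minus predicted)."""
    errors = [x - x_hat for x, x_hat in zip(measured, predicted)]
    n = len(errors)
    mean = sum(errors) / n
    rms = math.sqrt(sum(e * e for e in errors) / n)
    return mean, rms

# Hypothetical values: the errors are 1, -1, 0 and -2.
measured = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 6.0, 7.0, 11.0]

mean, rms = mean_error_and_rms(measured, predicted)
print(mean)  # (1 - 1 + 0 - 2) / 4 = -0.5
print(rms)   # sqrt((1 + 1 + 0 + 4) / 4) = sqrt(1.5)
```

Positive and negative errors cancel in the Mean but not in the RMS, so the two values give different information: the Mean reveals a systematic bias, the RMS the overall size of the errors.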

Here follows the example obtained in this case:

         Training                  Test
power    Mean        RMS           Mean         RMS
1        0,000000    6,209330      -0,162161    6,235039
2        0,000000    6,137045      -0,181887    6,176774
3        0,000000    4,423970      -0,011410    4,553183
4        0,000000    4,304828      -0,092253    4,436488
5        0,000000    4,102886      -0,095261    4,240710
6        0,000000    4,027354      -0,097905    4,171919
7        0,000000    3,995920      -0,104914    4,140018
8        0,000000    3,957234      -0,121594    4,106244
9        0,000000    3,882958      -0,121594    4,106245

Note: since the subsets have been extracted randomly, the interpolation
results vary slightly from one extraction to another.

Write down your own result table and analyze the Mean (M) and
Root-Mean-Square (RMS) prediction errors for the different orders. What is
your conclusion about the optimal polynomial interpolation order?

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------

6. Visualize the results in 3D

• Geostatistical Analyst -> Explore data -> Trend Analysis; in the window you have to

select as Layer = Lidar1 Events and as Attribute = N3.

Here follows the 3D visualization of the original data:


Here follows the 3D visualization of the predicted data:


Here follows the 3D visualization of the errors:
