gis exercise - spring, 2011 - laboratorio di...
GIS Exercise –Spring 2011 1
GIS Exercise - Spring, 2011
Maria Antonia Brovelli
Laura Carcano, Marco Minghini
ArcGIS exercise 3 - Global trend removal
Introduction:
Besides exploring statistical characteristics and distributional information,
we may want to know whether the data show a spatial trend. Relative to that
trend, are there any significant outliers (unusual data)? Checking for
outliers should be a routine part of any data analysis. Potential outliers
have to be examined: if they are wrong, they should be corrected where
possible and deleted otherwise.
Many tools can give clues to the presence of outliers, for instance the
histogram, the semivariogram cloud, statistical tests and Voronoi polygons.
In this exercise we will look at these questions in detail. We will work
through one example of trend removal by polynomial interpolation.
I: Global trend removal: polynomial interpolation
Data:
Lidar1.dbf
• Add data -> select the file Lidar1.dbf
• Right click on the name Lidar1 -> Display XY data: set X field = N1, Y field = N2,
Z field = N3; change the reference system by clicking Edit… -> … -> Monte Mario
Italy1.prj
• Right click on the layer name -> Properties -> Quantities -> Quantity numbers:
Classify -> Sampling: enter a number higher than 38000 (e.g. 50000); then color the
layer by the N3 field using a color ramp, with for example 5 classes.
Tasks:
1. Subdivide the data set Lidar1 into two subsets.
Subdivide the lidar dataset into two subsets: a training set containing 90%
of the points and a test set containing the remaining 10%. Name the subsets
“Lidar1_Events_training1” (90% of the population = 34207 points) and
“Lidar1_Events_test1” (10% of the population = 3801 points).
• Geostatistical Analyst -> Subset Features…: set Input features = Lidar1 Events,
Output training feature class = Lidar1_Events_training1, Output test feature class =
Lidar1_Events_test1; Size of training feature subset = 90; Subset size units =
PERCENTAGE_OF_INPUT
This process extracts data in a completely random way.
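The random 90%/10% split can be sketched outside ArcGIS as follows. This is only an illustration of the idea, not the algorithm used by Subset Features: the function name `split_training_test`, the seeds and the integer stand-ins for the points are all assumptions made for the example.

```python
import random

def split_training_test(records, training_percent=90, seed=None):
    """Randomly split records into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)          # copy, so the input is left untouched
    rng.shuffle(shuffled)
    n_training = round(len(shuffled) * training_percent / 100)
    return shuffled[:n_training], shuffled[n_training:]

# Hypothetical stand-in for the 38008 points of Lidar1 Events
points = list(range(38008))

training1, test1 = split_training_test(points, 90, seed=1)
training2, test2 = split_training_test(points, 90, seed=2)

print(len(training1), len(test1))     # sizes of the two subsets
print(training1 == training2)         # a second random draw gives a different subset
```

With 38008 input points the sizes come out as 34207 and 3801, the population figures quoted in the task.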
2. Repeat the same operation on the data set Lidar1 Events, dividing it
again into two subsets corresponding to 90% and 10% and
naming the two subsets “Lidar1_Events_training2” and
“Lidar1_Events_test2”.
Because the data extraction is random, we expect the subsets training2 and
test2 to differ from the two subsets created previously. To verify this, we
have to compare the datasets.
3. Compare the two datasets (respectively training and test), using the
General QQ plot function.
• Geostatistical Analyst -> Explore Data -> General QQ Plot; Data source #1 =
Lidar1_Events_training1, Attribute = N3, Data source #2 = Lidar1_Events_training2,
Attribute = N3. Handling coincidental sample: choose Include all.
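What a general QQ plot compares can be sketched in a few lines: the quantiles of one dataset are paired with the quantiles of the other at the same cumulative probabilities, and the pairs fall on the line y = x when the two distributions agree. The code below is a minimal sketch under assumptions: the quantile convention (linear interpolation), the function names and the synthetic values standing in for N3 are hypothetical, and ArcGIS may compute its quantiles differently.

```python
import random

def quantile(sorted_values, p):
    """Linearly interpolated quantile of pre-sorted values, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def general_qq_pairs(sample_a, sample_b, n_quantiles=99):
    """Pair the quantiles of two samples at the same cumulative probabilities."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    probabilities = [(k + 1) / (n_quantiles + 1) for k in range(n_quantiles)]
    return [(quantile(a, p), quantile(b, p)) for p in probabilities]

# Hypothetical stand-ins for the N3 values of two random subsets
rng = random.Random(0)
population = [rng.gauss(100.0, 6.0) for _ in range(5000)]
subset_a = rng.sample(population, 4500)
subset_b = rng.sample(population, 4500)

pairs = general_qq_pairs(subset_a, subset_b)
largest_gap = max(abs(qa - qb) for qa, qb in pairs)
print(f"largest quantile gap: {largest_gap:.3f}")
```

A small gap relative to the spread of the data means the points lie close to the straight line.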
Results for the training sets: the result can be slightly different depending on
the way the program extracts the data.
The plot follows approximately a straight line, except in some parts (see
the highlighted rectangle).
Results for the test sets: the result can be slightly different depending on the
way the program extracts the data.
What can you observe? What conclusion can you draw from the observation?
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
4. Now compare the two subsets (training and test) belonging to the
same extraction (for example the first one extracted from Lidar1 Events),
using the General QQ plot function.
• Geostatistical Analyst -> Explore Data -> General QQ Plot; Data source #1 =
Lidar1_Events_training1, Attribute = N3, Data source #2 = Lidar1_Events_test1,
Attribute = N3. Handling coincidental sample: choose Include all.
Result: the result can be slightly different depending on the way the program
extracts the data.
What can you observe? What conclusion can you draw from the observation?
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
5. Interpolate “Lidar1_Events_training1” with a Global polynomial
interpolation using different polynomial degrees, and validate the
model using the two subsets “Lidar1_Events_training1” and
“Lidar1_Events_test1”.
Use the data of the training set to create the model:
• Geostatistical Analyst -> Geostatistical Wizard -> Global Polynomial Interpolation;
Source dataset = Lidar1_Events_training1, Data field = N3. Handling coincidental
sample: choose Include all.
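As a rough illustration of what a global polynomial interpolation does, the sketch below fits one polynomial surface z = Σ c_ij · x^i · y^j (with i + j ≤ degree) to all points by least squares. It is an assumption-laden sketch, not the ArcGIS implementation: the function names are hypothetical, the data are synthetic, and the normal equations are solved directly, which becomes numerically fragile at high degrees or with large raw coordinates (centering and scaling X and Y first would be advisable).

```python
def monomial_exponents(degree):
    """All (i, j) with i + j <= degree, i.e. the terms x**i * y**j."""
    return [(i, total - i) for total in range(degree + 1) for i in range(total + 1)]

def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting; returns the solution vector."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def fit_global_polynomial(points, degree):
    """Least-squares fit of a polynomial surface to (x, y, z) points."""
    exponents = monomial_exponents(degree)
    m = len(exponents)
    normal = [[0.0] * m for _ in range(m)]   # accumulates A^T A
    rhs = [0.0] * m                          # accumulates A^T z
    for x, y, z in points:
        terms = [x ** i * y ** j for i, j in exponents]
        for a in range(m):
            rhs[a] += terms[a] * z
            for b in range(m):
                normal[a][b] += terms[a] * terms[b]
    return exponents, solve_linear_system(normal, rhs)

def predict(model, x, y):
    """Evaluate the fitted surface at location (x, y)."""
    exponents, coefficients = model
    return sum(c * x ** i * y ** j for c, (i, j) in zip(coefficients, exponents))

# Hypothetical training points lying on the plane z = 1 + 2x + 3y
samples = [(x, y, 1 + 2 * x + 3 * y) for x in range(5) for y in range(5)]
model = fit_global_polynomial(samples, degree=1)
print(round(predict(model, 0.5, 0.5), 6))
```

Raising `degree` adds more terms to the surface, which is what changing the power in the wizard does conceptually.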
Validate the model using the training data and then the test data.
• Right click on the layer name “Global polynomial interpolation
[Lidar1_Events_training1]” -> Validation/Prediction -> Input geostatistical layer =
Global polynomial interpolation; Input point observation locations =
Lidar1_Events_training1; Field to validate on = N3; Output statistics at point
locations = Validation1_training1.shp.
• Right click on the layer name “Global polynomial interpolation
[Lidar1_Events_training1]” -> Validation/Prediction -> Input geostatistical layer = Global
polynomial interpolation; Input point observation locations = Lidar1_Events_test1;
Field to validate on = N3; Output statistics at point locations = Validation1_test1.shp.
Record the Mean (M) and Root-Mean-Square (RMS) values for the training set
and the test set while changing the polynomial order (power) from 1 to 9.
For both the training and the test set, save the Validation file and open
the file validation.dbf with Excel; there you can compute the RMS given by
the formula:
$$\mathrm{RMS} = \sqrt{\frac{\sum_{i=1}^{N} \left(\hat{x}_i - x_i\right)^2}{N}}$$
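The same statistics can be checked outside Excel. The sketch below computes the Mean and the RMS of the prediction errors, taking the RMS as the square root of the mean squared difference between predicted and measured values. The function names and the four sample values are hypothetical, and the sign convention of the Mean (predicted minus measured) is an assumption.

```python
import math

def mean_error(predicted, measured):
    """Mean of the prediction errors (predicted minus measured)."""
    return sum(p - m for p, m in zip(predicted, measured)) / len(measured)

def rms_error(predicted, measured):
    """Root-mean-square of the prediction errors."""
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))

# Hypothetical values standing in for two columns of validation.dbf
predicted = [10.0, 12.0, 9.0, 11.0]
measured = [11.0, 12.0, 7.0, 11.0]

print(mean_error(predicted, measured))
print(rms_error(predicted, measured))
```

Here the errors are -1, 0, 2 and 0, so the Mean is 0.25 and the RMS is the square root of 1.25.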
Here follows the example obtained in this case:
         Training                 Test
power    Mean        RMS          Mean         RMS
1        0.000000    6.209330     -0.162161    6.235039
2        0.000000    6.137045     -0.181887    6.176774
3        0.000000    4.423970     -0.011410    4.553183
4        0.000000    4.304828     -0.092253    4.436488
5        0.000000    4.102886     -0.095261    4.240710
6        0.000000    4.027354     -0.097905    4.171919
7        0.000000    3.995920     -0.104914    4.140018
8        0.000000    3.957234     -0.121594    4.106244
9        0.000000    3.882958     -0.121594    4.106245
Note: since the subsets have been extracted randomly, the interpolation
results vary slightly from one run to another.
Write down your own result table and analyze the Mean (M) and
Root-Mean-Square (RMS) prediction errors. What is your conclusion about the
optimal polynomial interpolation order?
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
6. Visualize the results in 3D
• Geostatistical Analyst -> Explore data -> Trend Analysis; in the window, select
Layer = Lidar1 Events and Attribute = N3.
Here follows the 3D visualization of the original data:
Here follows the 3D visualization of the predicted data:
Here follows the 3D visualization of the error: