
The Photogrammetric Journal of Finland, Vol. 23, No. 2, 2013        Received 23.9.2013, Accepted 20.12.2013
doi:10.17690/013232.2

LAND COVER CLASSIFICATION OF FINNISH LAPLAND USING DECISION TREE CLASSIFICATION ALGORITHM

Markus Törmä

Finnish Environment Institute SYKE

Finland [email protected]

ABSTRACT

Land cover of Finnish Lapland was classified into 16 land cover classes using optical IRS LISS, Spot XS and MODIS satellite images, ancillary GIS data and a decision tree classifier. The aim of this study was to test the decision tree classifier for land cover classification and to study the effects of its parameters on the classification result. In the best case, the overall accuracy was about 68% for all 16 classes when individual images were classified. The overall accuracy was only about 45% when the whole mosaic was classified. The most problematic classes appear to be those that are vegetated but not forest.

1. INTRODUCTION

When planning their Land Cover Classification System (LCCS), the Food and Agriculture Organization of the United Nations defined land cover as follows (Di Gregorio and Jansen, 2000):

"Land cover is the observed (bio)physical cover on the earth's surface."

In other words, land cover should include directly observable vegetation and man-made structures, but bare rock, soil and water are quite often also included. The companion concept to land cover is land use, defined as the arrangements, activities and inputs people undertake in a certain land cover type to produce, change or maintain it (Di Gregorio and Jansen, 2000). A given land cover can have several different land uses; for example, a forested area has economic uses (e.g. forestry, hunting) and recreational uses (e.g. trekking). The main resource controlling primary productivity in terrestrial ecosystems can be defined in terms of land: the area of land available, land quality and soil moisture characteristics. Changes in land cover and land use either affect global systems (e.g. atmosphere, climate and sea level) or occur locally in enough places to add up to a significant total. Hence, land cover is a geographical feature that can form a reference base for applications ranging from forest and rangeland monitoring, production of statistics, planning, investment, biodiversity and climate change to desertification control. It is now recognized that knowing how land cover has changed over time is very important for assessing the changes to be expected in the future and the impact these changes will have on people's lives (Di Gregorio and Jansen, 2000).

Many land cover classifications have been made in the northern Boreal forest and Arctic zones, ranging from global classifications based on low-resolution satellite images (Walker et al., 2002) to regional classifications (Kumpula et al., 2006; Hagner et al., 2005; Reese and Nilsson, 2005) and local classifications based on very high resolution satellite or aerial images (Mikkola and Pellikka, 2002). An example of a land cover classification for a more specialized purpose is the Finnish reindeer pasture inventory (Kumpula et al., 2006). This classification has 17 classes describing vegetation types and non-vegetated areas of Northern Finland, which can be combined into three main classes of reindeer pastures. The classification is based on Landsat ETM+ and TM images, their semi-unsupervised classification, cluster labeling with plot measurements, and post-processing with ancillary GIS data. The classification accuracies of the main reindeer pasture classes were 82-92% depending on class.

Another example is the Swedish Corine Land Cover classification, which has seven forest classes differing in tree height, canopy closure and species composition. The classification was based on Landsat ETM+ or TM images, maximum likelihood classification and Swedish National Forest Inventory plots as ground truth. The classification accuracies of individual classes were 50-70% depending on class, increasing to about 80% when aggregated to four Corine 3rd-level forest classes (Hagner et al., 2005).

The Finnish Corine Land Cover 2000 classification was based on automated interpretation of Landsat-7 ETM+ satellite images and data integration with existing digital map data. Map data provided information on land use and soils, while satellite images provided information about land cover and were used to update the map data. Continuous land cover variables, such as tree height, crown cover and deciduous tree crown cover, as well as the proportions of vegetation cover and of grass- and heathlands, were transformed into discrete CLC classes by thresholding them according to the class descriptions in the CLC nomenclature. A national version (raster with 25 m pixel size) and a European version (vector with 25 ha minimum mapping unit) are available.
Unfortunately, the classification accuracy of the national version was quite poor in Northern Finland when compared to Finnish National Forest Inventory plots: the overall accuracy was a little over 50%. The reason is that the classes in this area can be quite similar and are not easily separable using only spectral information from satellite images (Törmä et al., 2004; Härmä et al., 2005).

The aim of this study was to test the decision tree classifier for land cover classification and to study the effects of its parameters on the classification result. More specifically, what are the effects of different classifier options, such as pruning or boosting, and which features are most important for classification? The features used in this study consisted of ordinary image channels and index images computed from them, but also of features computed from a digital elevation model, temporal information in the form of a MODIS NDVI feature, forest inventory variables and soil information. Another interesting question is whether images should be classified individually or whether it is better to classify an image mosaic of the whole area. On the practical side, the aim was to increase the thematic precision of the Finnish Corine Land Cover classification by creating more detailed 4th-level national classes, especially concerning tree species.

2. STUDY AREA

The study area belongs to the northern part of the Boreal forest zone and consists of zones 4c (Forest-Lapland, southern part of the study area) and 4d (Mountain-Lapland, northern part of the study area) of the Finnish forest vegetation zones (Härmä et al., 2005). The study area covers about 44 500 km²; its largest extent is 385 km in the east-west direction and 325 km in the south-north direction. The topographic height variations are the largest in Finland, from about 20 m in the River Teno valley to a little over 1300 m in the mountains close to the Norwegian border.
Elevation is typically between 200 and 400 m (about 67% of the area), and the median elevation is 285 m. High mountain areas (elevation over 600 m) are quite rare, covering less than 4% of the area.


The Forest-Lapland zone belongs to the northern part of the taiga. Forests are low and sparse; the main tree species are pine and birch, and spruce is quite rare. There are plenty of wetlands, mostly aapa mires. The Mountain-Lapland zone consists of scrubby pine and mountain birch forests, bushy vegetation above the forest boundary and bare mountain tops. Understory vegetation is a mixture of tundra and taiga vegetation, and bare mountain tops are characterized by tundra vegetation. There are plenty of wetlands, mostly aapa mires with some palsa mires (Linkola and Salminen, 1980). According to the Corine Land Cover 2000 classification, the most common land cover types are forests (44.6% of the area), other natural areas on mineral soil (32.7%), wetlands (14.7%) and water (7.7%), and the rarest are artificial surfaces (0.3%) and agricultural areas (0.04%).

3. SATELLITE IMAGES AND GIS DATA

The classification was based on optical satellite images, forest inventory variables estimated from satellite images, and GIS data such as an elevation model and soil information. Biotope maps produced by Metsähallitus were used as reference data.

3.1 IRS and Spot satellite images

Table 1 presents the satellite images used and their acquisition dates. The 9 IRS P6 LISS and 5 Spot-4 multispectral images used in this study are part of the Finnish IMAGE2006 coverage. The instruments of both satellites have very similar channels: green, red, near-infrared and middle-infrared. Geometric and radiometric corrections were made so that images acquired at different times, in different atmospheric conditions and with different imaging geometries could be used together in a common coordinate system. The orthorectification of the images was performed by Metria, Sweden. Images were resampled to 20 m pixels using cubic convolution interpolation. Geometric correction was quite successful: the mean of the average residuals of the individual images was 7.9 m in the X direction and 7.8 m in the Y direction (Hatunen et al., 2008).

Table 1.
The satellite images used, their acquisition dates, number of training samples, size of the constructed decision trees, number of classes (CL: classes in the classification result / classes in the reference data) and classification errors of the training data.

| Name | Satellite | Path/Row | Acquisition date | Training samples | Size of tree | CL | Error% |
|---|---|---|---|---|---|---|---|
| Halti (HL) | IRS | 22/16 | 1.7.2005 | 110466 | 6006 | 8/11 | 19.1 |
| Hammasjärvi (HJ) | Spot | 61/207 | 30.7.2006 | 102366 | 8835 | 15/16 | 19.7 |
| Inari-Itäraja (II) | Spot | 62/207 | 1.7.2006 | 10395 | 655 | 10/12 | 16.7 |
| Inarijärvi (IJä) | IRS | 29/16 | 12.7.2005 | 161556 | 10621 | 14/15 | 21.1 |
| Ivalojoki (IJo) | Spot | 58/208 | 9.8.2006 | 65831 | 5838 | 15/15 | 18.5 |
| Kaaresuvanto (KS) | IRS | 24/16 | 30.7.2006 | 175883 | 11002 | 16/16 | 19.7 |
| Lokka (LK) | Spot | 65/209 | 10.8.2006 | 3284 | 241 | 10/12 | 12.4 |
| Lompolo (LM) | IRS | 27/17 | 2.7.2005 | 109986 | 9007 | 15/16 | 19.1 |
| Muotkatunturi (MT) | IRS | 27/16 | 2.7.2005 | 309513 | 23552 | 16/16 | 21.2 |
| Päälaki (PL) | IRS | 27/15 | 2.7.2005 | 174120 | 8808 | 15/15 | 20.0 |
| Porttipahta (PP) | Spot | 61/208 | 30.7.2006 | 69657 | 5670 | 12/13 | 18.8 |
| Salla (SL) | IRS | 32/18 | 3.7.2005 | 308 | 24 | 7/8 | 14.3 |
| Savukoski (SK) | IRS | 32/17 | 3.7.2005 | 51744 | 4672 | 15/15 | 18.5 |
| Sevettijärvi (SJ) | IRS | 29/15 | 12.7.2005 | 127827 | 7555 | 15/15 | 21.2 |


Figure 1. Mosaic of IRS and Spot satellite images covering vegetation zones 4c and 4d. Unfortunately, there are some holes due to the lack of cloud-free satellite images.

Clouds and their shadows were detected visually and masked out. Atmospheric correction was done using ATCOR2 in Erdas Imagine. The aim of atmospheric correction was to remove the effects of atmospheric disturbances and noise, and to make the corrected images as similar as possible to the IMAGE2000 mosaic. Topographic correction was made using the statistical-empirical correction (Itten and Meyer, 1993): an illumination image is computed from the DEM, a regression line is fitted between each image channel and the illumination image, and the image is corrected by subtracting the product of the illumination image and the slope of the regression line from the original image. Shadow areas were also determined during topographic correction (Hatunen et al., 2008). Figure 1 presents the mosaic covering vegetation zones 4c and 4d. Histogram matching was used to fine-tune pixel values, because in some cases there were considerable differences in reflectance between overlapping images. Histogram matching was based on stable areas such as dense forests and other natural areas on mineral soil; areas such as agricultural areas, wetlands and water were omitted. Mosaics consisting of all channels were resampled to 25 m pixel size to make them easier to use with other GIS data.
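The two radiometric adjustments described above can be illustrated with a short Python sketch. It is a hypothetical illustration under stated assumptions, not the processing chain actually used (which relied on Erdas Imagine tools). The illumination image is assumed to be the cosine of the local solar incidence angle computed from slope and aspect, a common choice the text does not specify. The correction follows the text literally by subtracting slope × illumination; some formulations also add back a constant to preserve the image mean, and that step is omitted here. Histogram matching maps each value through the empirical cumulative distributions of the source and reference samples. All function names are illustrative.

```python
import math
from bisect import bisect_right


def illumination(slope_deg, aspect_deg, sun_zenith_deg, sun_azimuth_deg):
    """cos(i), the cosine of the local solar incidence angle (assumed illumination model)."""
    s, a = math.radians(slope_deg), math.radians(aspect_deg)
    z, az = math.radians(sun_zenith_deg), math.radians(sun_azimuth_deg)
    return math.cos(s) * math.cos(z) + math.sin(s) * math.sin(z) * math.cos(az - a)


def regression_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def topographic_correction(channel, illum):
    """Statistical-empirical correction as worded in the text:
    corrected = original - (regression slope) * illumination."""
    m = regression_slope(illum, channel)
    return [v - m * c for v, c in zip(channel, illum)]


def histogram_match(source, reference):
    """Map source values onto the reference distribution via empirical CDFs."""
    src_sorted, ref_sorted = sorted(source), sorted(reference)
    n_src, n_ref = len(src_sorted), len(ref_sorted)
    matched = []
    for v in source:
        q = bisect_right(src_sorted, v) / n_src  # empirical CDF of v, in (0, 1]
        idx = min(n_ref - 1, max(0, math.ceil(q * n_ref) - 1))  # inverse CDF of reference
        matched.append(ref_sorted[idx])
    return matched


if __name__ == "__main__":
    illum = [0.2, 0.4, 0.6, 0.8]
    channel = [10 + 50 * c for c in illum]  # brightness driven purely by illumination
    print(topographic_correction(channel, illum))  # roughly constant after correction
    print(histogram_match([4, 1, 3, 2], [40, 10, 30, 20]))
```

In practice the reference sample for histogram matching would be restricted to the stable areas mentioned above (dense forests, natural areas on mineral soil) rather than taken from the whole overlap.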


Figure 2. The number of weeks that the long-term MODIS NDVI is greater than 0.5. Black and red areas mean a short time, yellow and green a longer time.

3.2 MODIS satellite images

MODIS images from the Terra satellite with 250 m pixel size were used to compute Normalized Difference Vegetation Index (NDVI) mosaics, which were further processed into a feature indicating the length of the growing season and the fertility of the site. NDVI is a simple vegetation index related to photosynthesis (Sellers, 1985); it is computed by dividing the difference between the near-infrared and red channels by their sum, NDVI = (NIR - Red) / (NIR + Red). The daily MODIS images were received from the Sodankylä Receiving Station of the Finnish Meteorological Institute. Pixel values were transformed to reflectance and normalized to a nadir view with a sun zenith angle of 45°. Geometric correction was done using latitude and longitude files. Clouds were detected using their temperatures, and the resulting mask was visually checked and manually corrected if needed (Törmä et al., 2007). A time series describing phenological development was formed by computing NDVI images from daily MODIS images from early April to mid-October for the years 2001 (287 individual images), 2005 (304), 2006 (340) and 2007 (230). Weekly mosaics were constructed for each year by selecting the maximum NDVI value from all daily NDVI values within that week. The mean value for each week was then computed from the mosaics of the different years. Finally, the number of weeks in which the NDVI was greater than 0.5 was computed from the mean time series. Figure 2 presents the resulting image: black and red areas mean a short time, yellow and green a longer time.
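The growing-season feature can be sketched per pixel as follows. This is a minimal Python illustration that assumes cloud-masked daily observations have already been assigned a week index (the text does not say how weeks were delimited) and that a week missing from some years is averaged over the years that do have a value. Both are assumptions. The maximum-value weekly compositing, the averaging over years and the 0.5 threshold follow the text.

```python
from statistics import mean


def ndvi(nir, red):
    """NDVI = (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red)


def weekly_max_composite(observations):
    """observations: iterable of (week, ndvi) pairs for one pixel and one year.
    Returns {week: maximum NDVI within that week}."""
    composite = {}
    for week, value in observations:
        if week not in composite or value > composite[week]:
            composite[week] = value
    return composite


def weeks_above_threshold(yearly_observations, threshold=0.5):
    """yearly_observations: {year: [(week, ndvi), ...]} for one pixel.
    Builds a weekly maximum composite per year, averages each week over the
    years that have a value for it, and counts weeks whose mean exceeds the threshold."""
    composites = [weekly_max_composite(obs) for obs in yearly_observations.values()]
    all_weeks = set().union(*composites)  # union of the week keys of all years
    count = 0
    for week in sorted(all_weeks):
        values = [c[week] for c in composites if week in c]
        if mean(values) > threshold:
            count += 1
    return count


if __name__ == "__main__":
    pixel = {
        2001: [(20, ndvi(0.35, 0.06)), (20, 0.7), (21, 0.3)],
        2005: [(20, 0.6), (21, 0.45)],
    }
    print(weeks_above_threshold(pixel))  # week 20 passes, week 21 does not
```

Taking the weekly maximum before averaging is what suppresses residual cloud and haze, which depress NDVI; the count of weeks above the threshold then serves as a single per-pixel feature for the classifier.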


3.3 GIS data

The Digital Elevation Model (DEM) produced by the National Land Survey of Finland describes the topographic height above sea level. The DEM is interpolated into a 25 × 25 m grid using the contour lines and coastline elements of the 1:20 000 basic map. Contour lines are based on photogrammetric interpretation of aerial photographs (NLS, 2013a). Slope and aspect images were computed from the DEM, and the aspect image was further divided into 19 aspect classes: one for flat areas and the others in 20-degree intervals. Slope and aspect images were used in topographic correction, and the DEM, slope image and aspect classes in decision tree classification.

The Topographic Database produced by the National Land Survey is comparable to maps at scales 1:5 000 - 1:10 000 and covers the whole of Finland. The database is continuously updated on a regional basis using aerial images and stereo mapping. Information concerning soil was used to produce layers describing the proportions of bogs, open rock, boulders and sand per hectare (NLS, 2013b).

The forest inventory estimates produced by the Finnish Forest Research Institute Metla were also used to enhance the separability of forest classes. Metla interpreted the IMAGE2006 images for the Corine 2006 classification using k-NN estimation methods developed for the Finnish National Forest Inventory (Tomppo et al., 2008a), producing pixel-wise estimates of tree height, tree crown cover and deciduous tree crown cover. The reported errors at pixel and forest stand level are high for this kind of estimation method: the coefficient of variation typically ranges from 40% to 100% for many variables (Tomppo et al., 2008b). However, the absolute accuracy is not of interest here, because the forest inventory variables are not used directly to classify forested areas. The relative accuracy is more important, because the decision tree classifier determines the classification rules itself.
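A Python sketch of how slope, aspect and the 19 aspect classes might be derived from a north-up DEM grid. The finite-difference scheme (simple central differences), the flatness threshold (a hypothetical 2 degrees) and the class numbering (0 for flat, 1-18 for 20-degree sectors clockwise from north) are all assumptions. The text only states that there is one flat class and 18 classes in 20-degree intervals.

```python
import math


def slope_aspect(dem, row, col, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north, downslope
    direction) at an interior cell of a north-up DEM (row 0 = northern edge)."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)  # rise towards east
    dz_dy = (dem[row - 1][col] - dem[row + 1][col]) / (2 * cell_size)  # rise towards north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    # Downslope direction is the negative gradient; atan2(east, north) gives an azimuth.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


def aspect_class(slope_deg, aspect_deg, flat_threshold_deg=2.0):
    """0 = flat (hypothetical threshold); 1..18 = 20-degree sectors starting from north."""
    if slope_deg < flat_threshold_deg:
        return 0
    return min(int(aspect_deg // 20), 17) + 1  # guard against aspect rounding to 360


if __name__ == "__main__":
    rising_east = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
    s, a = slope_aspect(rising_east, 1, 1, cell_size=10)
    print(round(s, 1), round(a, 1), aspect_class(s, a))  # west-facing 45-degree slope
```

A production workflow would typically use a GIS tool for this step; the sketch only makes the class definition concrete.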
The forest boundary mask describes the area where tree growth is very low due to elevation and environmental conditions. No tree cutting takes place within this area (Härmä et al., 2005).

3.4 Reference data

The biotope maps produced by Metsähallitus were used as reference data. The aim of biotope mapping is to describe the nature of an area: its biotopes, state of nature and vegetation cover. Mapping was performed by interpreting 1:20 000 false-color aerial photographs with the aid of available forest and topographic maps, forest fire reports, interviews and ground surveys. The minimum mapping unit is 1 hectare, but there can be smaller units for cartographic and ecological reasons (Eeronheimo, 2000). The classes formed and their descriptions are presented in Table 2 and Figure 3.

Classes were determined for stands by thresholding the polygon attributes of the biotope maps. Polygons larger than 1 km² and polygons with a high variance in the near-infrared channel were discarded. The vector data were then rasterized and the border pixels of polygons removed. Stands were classified as forest based on tree height (> 5 m), crown cover (> 20%) and tree species information. Coniferous forests were classified by species (pine or spruce) if the proportion of that species was more than 75%. If the proportion of coniferous trees was more than 75% but neither pine nor spruce alone exceeded 75%, the stand was classified as coniferous forest. If the proportion of mountain birch was more than 75%, the stand was classified as mountain birch forest; if the proportion of mountain birch was lower but the proportion of all deciduous trees was more than 75%, the stand was classified as deciduous forest. Otherwise, forest stands were classified as mixed forest. If there were fewer trees, in other words if the height was less than 5 m or the crown cover less than 20%, the inventory class of the biotope map was used to define the class.
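The stand-labelling rules above can be written as a small decision function. This Python sketch uses hypothetical parameter names and assumes proportions are percentages of the stand. The fallback simply returns whatever class the biotope map's inventory attribute supplies, because the mapping from inventory classes to the codes of Table 2 is not spelled out here. The returned codes are those of Table 2.

```python
def reference_class(height_m, crown_cover_pct, pine_pct, spruce_pct,
                    coniferous_pct, mountain_birch_pct, deciduous_pct,
                    inventory_class):
    """Assign a reference class code to a biotope-map stand (hypothetical inputs)."""
    is_forest = height_m > 5 and crown_cover_pct > 20
    if not is_forest:
        # Too few trees: fall back to the inventory class of the biotope map.
        return inventory_class
    if pine_pct > 75:
        return 3122  # pine forest
    if spruce_pct > 75:
        return 3123  # spruce forest
    if coniferous_pct > 75:
        return 3121  # coniferous forest without a single dominant species
    if mountain_birch_pct > 75:
        return 3112  # mountain birch, over 5 m
    if deciduous_pct > 75:
        return 3111  # deciduous forest
    return 3130  # mixed forest


if __name__ == "__main__":
    # A stand of mixed pine and spruce that is clearly coniferous overall.
    print(reference_class(height_m=9, crown_cover_pct=35, pine_pct=55, spruce_pct=30,
                          coniferous_pct=85, mountain_birch_pct=5, deciduous_pct=15,
                          inventory_class=None))  # 3121
```

The order of the tests matters: species-level classes are checked before the broader coniferous and deciduous classes, so a stand is given the most specific class whose condition it meets.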


Figure 3. Coverage of reference data. Colors are explained in Table 2.

Table 2. Classes, their descriptions and their proportions of the whole reference data. CC: tree crown cover, H: tree height.

| Code | Description | Color | Prop. (%) |
|---|---|---|---|
| 3111 | Deciduous forest (CC > 20%, H > 5 m, prop. of deciduous > 75%) | Light green | 11.7 |
| 3112 | Mountain birch, over 5 m (CC > 20%, H > 5 m, prop. of mountain birch > 75%) | Light green | 1.8 |
| 3121 | Coniferous forest (CC > 20%, H > 5 m, prop. of coniferous > 75%) | Dark green | 7.1 |
| 3122 | Pine forest (CC > 20%, H > 5 m, prop. of pine > 75%) | Dark green | 10.6 |
| 3123 | Spruce forest (CC > 20%, H > 5 m, prop. of spruce > 75%) | Dark green | 0.6 |
| 3130 | Mixed forest (CC > 20%, H > 5 m) | Green | 0.4 |
| 3210 | Grassland | Brown | 1.2 |
| 3220 | Heathland | Reddish brown | 23.9 |
| 3241 | Transitional woodland, CC < 10% | Blue | 9.9 |
| 3242 | Transitional woodland, CC 10-30%, mineral soil | Cyan | 0.6 |
| 3243 | Transitional woodland, CC 10-30%, peat soil | Dark cyan | 0.1 |
| 3244 | Transitional woodland, CC 10-30%, rocky soil | Light cyan | 5.9 |
| 3245 | Mountain birch, H < 5 m | Pink | 2.2 |
| 3310 | Sand and dunes | Yellow | 0.1 |
| 3320 | Open rocks and boulders | Grey | 4.1 |
| 4120 | Open bog | Magenta | 20.0 |


4. CLASSIFICATION

Decision trees are classification systems that form classification rules using a top-down, divide-and-conquer strategy that partitions the given set of objects into smaller and smaller subsets as the tree grows (Friedl and Brodley, 1997). The decision tree classifier used in this study was See5 by RuleQuest (RuleQuest, 2013a), which constructs decision trees and rule sets automatically from training samples. One benefit of this kind of classifier is that variables can be continuous, like satellite images or estimation results, or categorical, like map layers or previous classification results. The following features were used in the classification experiments:

Image channels: Atmospherically and topographically corrected green, red, near-infrared and middle-infrared channels of IRS P6 LISS and Spot-4 XS images.

Normalized Difference Indices of image channels: Six NDIs computed as (ChA - ChB) / (ChA + ChB), where ChA and ChB are image channels A and B. For each channel pair, the channel with the longer wavelength was taken as A and the one with the shorter wavelength as B.

MODIS NDVI: The number of weeks during the growing season when the long-term MODIS NDVI is greater than 0.5.

DEM: Height above sea level, surface slope in degrees and aspect class. Aspect was divided into 19 classes: one for flat land and 18 classes for different directions.

Forest variable estimates: Tree height, crown cover and crown cover of deciduous trees estimated by the Finnish Forest Research Institute Metla.

Soil: Proportions of peat (i.e. open bogs), open rock, boulders and sand per hectare.

Forest boundary mask: The area where tree growth is very low due to elevation and environmental conditions (see Section 3.3).
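With four channels there are C(4, 2) = 6 channel pairs, which gives the six NDIs listed above. A minimal Python sketch, assuming per-pixel reflectances keyed by hypothetical channel names listed in order of increasing wavelength (green, red, near-infrared, middle-infrared). Returning 0.0 when both channels are zero is an added assumption to avoid division by zero.

```python
from itertools import combinations

# Channels in order of increasing wavelength (the IRS LISS / Spot-4 XS band set).
CHANNELS = ("green", "red", "nir", "mir")


def normalized_difference_indices(pixel):
    """Return the six NDIs (A - B) / (A + B), with A the longer-wavelength channel."""
    indices = {}
    for shorter, longer in combinations(CHANNELS, 2):  # pairs keep wavelength order
        a, b = pixel[longer], pixel[shorter]
        total = a + b
        indices[f"{longer}-{shorter}"] = (a - b) / total if total != 0 else 0.0
    return indices


if __name__ == "__main__":
    pixel = {"green": 0.08, "red": 0.05, "nir": 0.30, "mir": 0.15}
    for name, value in normalized_difference_indices(pixel).items():
        print(name, round(value, 3))
```

Note that the nir-red index produced this way is the familiar NDVI, computed here from the high-resolution images rather than from MODIS.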

The datasets for training the classifier and validating the classification result were created using systematic sampling, with different start positions and sampling frequencies. Table 1 presents the number of training samples for each image. The number of samples for validation was 618701. Table 1 also presents the size of the constructed decision trees, in other words the number of non-empty leaves in the tree, and the classification errors of the training data. It should be remembered that this accuracy estimate is optimistically biased and gives too favorable a view of the success of the classification. Classifications were made using all channels of the individual images and of the whole mosaic. Input files were prepared using the Erdas Imagine FIA Tools macros made by Earth Satellite Corporation for the U.S. Geological Survey (Brewer et al., 2005). Classification accuracies were estimated for the classification of the whole mosaic and for different ways of combining the classifications of individual images:

V1: The combination is based on a large number of classes in the individual classification (Table 1, column CL), a small difference between the number of classes in the classification result (Table 1, column CL, left number) and in the reference data (Table 1, column CL, right number), good accuracy on the training data (Table 1, column Error%) and the amount of training data (Table 1; an image with a large amount of training data is preferred).

V2: Good accuracy, large number of classes and amount of training data.

V3: Amount of training data.

V4: Use the confidence layers produced by the See5 classifier.
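Variant V4 can be sketched as a per-pixel choice among overlapping classifications. The text does not state the exact rule, so this Python sketch assumes the simplest one: where several individual-image classifications cover a pixel, the class with the highest See5 confidence value wins. Each classification is represented as a hypothetical dictionary from pixel coordinates to a (class, confidence) pair, and pixels covered by no image (for example cloud holes) stay unclassified.

```python
def combine_by_confidence(classifications):
    """classifications: list of {pixel: (class_code, confidence)} dicts, one per image.
    Returns {pixel: class_code}, keeping the most confident label for each pixel."""
    best = {}
    for classification in classifications:
        for pixel, (code, confidence) in classification.items():
            # Strict '>' means ties are resolved in favour of the earlier image.
            if pixel not in best or confidence > best[pixel][1]:
                best[pixel] = (code, confidence)
    return {pixel: code for pixel, (code, _) in best.items()}


if __name__ == "__main__":
    image_a = {(0, 0): (3122, 0.91), (0, 1): (3220, 0.55)}
    image_b = {(0, 1): (4120, 0.80), (0, 2): (3111, 0.60)}
    print(combine_by_confidence([image_a, image_b]))
```

Variants V1-V3 would instead rank whole images by the Table 1 criteria and let the higher-ranked image win everywhere in the overlap; the confidence-based rule decides pixel by pixel.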


The effect of the classifier parameters was studied using the Spot-4 image 61/207 acquired on 30.7.2006. These parameters were the pruning options of the decision tree and classifier boosting. Pruning means that a large tree is first grown to fit the data closely and is then pruned by removing parts that are predicted to have a relatively high error rate. The default option is global pruning; turning it off disables the pruning component and generally results in larger decision trees. The Pruning CF option affects the way error rates are estimated and hence the severity of pruning: values smaller than the default 25% cause more of the initial tree to be pruned, while larger values result in less pruning. The Minimum cases option constrains the degree to which the initial tree can fit the data: at each branch point in the decision tree, the stated minimum number of training cases must follow at least two of the branches. Values higher than the default (2 cases) can lead to an initial tree that fits the training data only approximately. In boosting, the idea is to generate several classifiers rather than just one. When a new case is to be classified, each classifier votes for its predicted class and the votes are counted to determine the final class (RuleQuest, 2013b). It is also possible to set differential misclassification costs for classes, giving a much higher penalty to certain types of mistakes; the constructed classifier then tries to avoid these mistakes (RuleQuest, 2013b).

5. RESULTS

The classification results were compared with the reference data using an error matrix, from which the overall accuracy and the producer's and user's accuracies of individual classes were computed (Lillesand and Kiefer, 1994).

5.1 Effect of parameters of decision tree classifier

The overall accuracy of the classification of the Spot-4 image 61/207 acquired on 30.7.2006 was 86.2% for the training set and 78.2% for the test set when default parameters were used. Different pruning options did not necessarily increase the classification accuracy and in some cases decreased it: the overall accuracy varied between 82.5% and 87.6% for the training set and between 77.5% and 78.7% for the test set. Boosting was tested using 10 decision trees. The overall accuracy of the training set increased considerably, to 94.5%, but the overall accuracy of the test set increased much less, to 80.3%. This indicates that the classifier fits the training data well but its ability to generalize, in other words to classify the test set, does not increase much. In the end, the increase in classification accuracy was judged so small that the whole area was classified using default parameters, in other words global pruning, no boosting and no weighting of classes.

5.2 Classification of whole area

The sample size for validation was 618701 pixels, systematically sampled from the classification result and the reference data. Table 3 presents the overall accuracies of the different alternatives for different class combinations: all 16 classes, 9 Corine level-3 classes or 4 Corine level-2 classes. The results indicate that the best way to classify the area is to classify individual images and then combine these classifications using the confidence layers produced by See5 (V4). Figure 4 presents this classification result. The differences between the other methods of combining individual classifications are very small. The worst results were obtained when the images were mosaicked and the whole mosaic was classified.
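The accuracy measures used above follow directly from the error matrix. A minimal Python sketch, assuming the reference and classified rasters are flattened into equal-length lists. Systematic sampling is shown as taking every k-th pixel from a start offset; the actual start position and step are not given in the text, so the values in the example are hypothetical. Producer's accuracy is the share of reference pixels of a class that were classified correctly, and user's accuracy is the share of pixels labelled as a class that truly belong to it.

```python
from collections import Counter


def systematic_sample(n_pixels, start, step):
    """Indices of a systematic sample: start, start + step, start + 2*step, ..."""
    return list(range(start, n_pixels, step))


def accuracy_measures(reference, classified):
    """Overall accuracy plus producer's and user's accuracy per class."""
    matrix = Counter(zip(reference, classified))  # (reference, classified) -> count
    classes = sorted(set(reference) | set(classified))
    overall = sum(matrix[(c, c)] for c in classes) / len(reference)
    producers, users = {}, {}
    for c in classes:
        reference_total = sum(matrix[(c, k)] for k in classes)   # row total
        classified_total = sum(matrix[(k, c)] for k in classes)  # column total
        producers[c] = matrix[(c, c)] / reference_total if reference_total else None
        users[c] = matrix[(c, c)] / classified_total if classified_total else None
    return overall, producers, users


if __name__ == "__main__":
    reference_raster = ["bog", "bog", "heath", "heath", "bog", "heath"]
    classified_raster = ["bog", "heath", "heath", "heath", "bog", "bog"]
    idx = systematic_sample(len(reference_raster), start=0, step=2)  # hypothetical step
    ref = [reference_raster[i] for i in idx]
    cls = [classified_raster[i] for i in idx]
    print(accuracy_measures(ref, cls))
```

Rows of the error matrix correspond to reference classes and columns to classified classes, which is why producer's accuracy divides by the row total and user's accuracy by the column total.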


Table 3. The overall accuracies (%) of the different alternatives for different class combinations. V1-V4 are the different ways of combining the classifications of individual images, and Mos is the classification of the whole mosaic. N = 618701.

| Class combination | V1 | V2 | V3 | V4 | Mos |
|---|---|---|---|---|---|
| All classes (16) | 66.3 | 66.4 | 66.2 | 67.6 | 44.7 |
| CLC level-3 (9) | 71.8 | 71.8 | 71.7 | 72.8 | 55.7 |
| CLC level-2 (4) | 81.5 | 81.5 | 81.5 | 82.3 | 73.4 |
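Table 3 reports accuracies at three levels of the class hierarchy. Assuming, as in the CLC nomenclature, that a level-3 code is given by the first three digits of the 4-digit national code and a level-2 code by the first two, aggregation reduces to integer division. The Python sketch below checks that the 16 classes of Table 2 collapse to the 9 and 4 classes used in Table 3, and recomputes overall accuracy after aggregation. The function names and the small example are illustrative only.

```python
# The 16 national 4th-level class codes of Table 2.
CLASS_CODES = [3111, 3112, 3121, 3122, 3123, 3130, 3210, 3220,
               3241, 3242, 3243, 3244, 3245, 3310, 3320, 4120]


def aggregate(code, level):
    """Truncate a 4-digit code to CLC level 3 (3 digits) or level 2 (2 digits)."""
    return code // 10 ** (4 - level)


def overall_accuracy(reference, classified, level=4):
    """Share of pixels whose (aggregated) classes agree."""
    pairs = list(zip(reference, classified))
    hits = sum(aggregate(r, level) == aggregate(c, level) for r, c in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    print(len({aggregate(c, 3) for c in CLASS_CODES}))  # 9 level-3 classes
    print(len({aggregate(c, 2) for c in CLASS_CODES}))  # 4 level-2 classes
    ref = [3121, 3122, 3220, 4120]
    cls = [3122, 3122, 3245, 4120]
    print(overall_accuracy(ref, cls), overall_accuracy(ref, cls, level=3))
```

The example shows why accuracy rises with aggregation: confusions inside a parent class, such as coniferous versus pine forest, stop counting as errors once both map to the same level-3 code.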

Figure 4. The best classification result. Water areas in black are taken

from Corine Land Cover 2000 classification. Figure 5 presents the class-wise classification accuracies for all 16 classes on the left and 9 Corine level-3 classes on the right, blue bar means user's accuracy and red bar producer's accuracy. The best classes are open bogs, open rocks and heathlands, in those cases the class-wise accuracies are about 80%. The worst classes are mountain birch (H<5m) and mixed forest. Mountain birch is mainly mixed with deciduous trees which are usually other kind of birches, and heathland. Coniferous forest and pine forest are mixed with each other quite heavily, but the mixing with spruce forest is quite small. This indicates that the coniferous forests are mixed forests dominated by pines or they are pine forests but the proper species information of stand is missing from reference data. Not surprisingly, mixed forest is mixed with other forest classes. Transitional woodland-classes are not mixed with each other that much, but forest classes,

26

Page 11: Land cover classification of Finnish Lapland using …...about 80% when aggregated to four Corine 3rd level forest classes (Hagner et al., 2005). The Finnish Corine Land Cover 2000

heathland and open bog are. Heathland is confused with many classes, such as deciduous forest, grassland, sand, open rock and open bog. Combining classes increases the accuracies of the forest classes and transitional woodland, which means that the different forest classes are mainly confused with each other. The same is true for transitional woodland.

The See5 decision tree classifier also outputs the importance of each feature, i.e. the percentage of training cases for whose classification the feature has been used. This indicates how useful or necessary a feature is for the classification. Usually, all features contribute at least a little to the classification of an image. The most notable exceptions were the images Salla, Lokka and Inari-Itäraja (see Table 1), for which more features were redundant. These were also the images with the least training data.

Figure 6 illustrates the importance of the different features for the classification: the horizontal axis represents the individual images and the vertical axis the importance of a feature, i.e. how often the feature has influenced the classification decision. For clarity, not all features have been plotted; instead, the features have been divided into groups (image channels, NDI index images, DEM features, tree features and soil features), and the maximum value of each group is plotted. The exceptions are the proportion of peat soil, the forest boundary mask and MODIS NDVI, which are plotted as individual features.

The importance of the features varies considerably between images. The most consistent observations are that the proportion of peat is used in all decisions in all images, and that the NDI index images are usually the least important features. DEM features and the forest boundary are usually very important, whereas soil features are less important. The importance of soil and tree features seems to increase northwards, while the importance of MODIS NDVI seems to decrease northwards.

6. CONCLUSIONS

Land cover of Lapland was classified using optical IRS and Spot satellite images, GIS data such as a digital elevation model and soil information, and a decision tree classification algorithm. A good feature of the decision tree classifier is that it can easily use continuous and categorical variables together, and since it is a non-parametric classifier, the user does not have to worry about the statistical distribution of the classes.
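
The ability to mix continuous and categorical variables, without assuming any class distribution, comes from how each candidate split is scored: only class counts on each side of a split are used. The following is a minimal Python sketch in the spirit of the publicly described C4.5 algorithm (See5's predecessor), not See5's own code: a categorical attribute gets one branch per value, a continuous attribute gets a binary threshold, and both are scored with plain information gain (C4.5 itself uses the gain ratio). The feature names, values and class labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_categorical(values, labels):
    """Information gain of a multiway split with one branch per category."""
    n = len(labels)
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - remainder


def gain_continuous(values, labels):
    """Best binary split 'value <= t'; returns (gain, threshold t)."""
    n = len(labels)
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if base - remainder > best_gain:
            best_gain, best_t = base - remainder, (lo + hi) / 2
    return best_gain, best_t


# Hypothetical training cases: one continuous and one categorical feature.
elevation = [120, 150, 180, 300, 320, 350]  # metres
soil = ["peat", "peat", "mineral", "mineral", "peat", "mineral"]
labels = ["bog", "bog", "forest", "heath", "heath", "heath"]

gain_e, t = gain_continuous(elevation, labels)
gain_s = gain_categorical(soil, labels)
print(f"elevation: gain {gain_e:.3f} at threshold {t}")  # elevation: gain 1.000 at threshold 240.0
print(f"soil type: gain {gain_s:.3f}")                    # soil type: gain 0.541
```

Because both attribute types are reduced to the same score, the tree simply picks whichever split is best at each node, here the elevation threshold.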

Figure 5. Classwise accuracies for all 16 classes (right) and for 9 Corine level-3 classes (left).
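
Aggregating classes raises accuracy because confusions between classes that are merged no longer count as errors. A minimal sketch with a hypothetical four-class confusion matrix (rows = reference, columns = predicted; all counts invented for illustration). Classwise accuracy is taken here as producer's accuracy, which is an assumption, since the text does not state which measure Figure 5 shows.

```python
def overall_accuracy(cm):
    """Share of all reference samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def producers_accuracy(cm):
    """Classwise accuracy: correct / reference total of each row."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def aggregate(cm, groups):
    """Merge classes; groups[k] is the coarse class index of fine class k."""
    m = max(groups) + 1
    out = [[0] * m for _ in range(m)]
    for i, row in enumerate(cm):
        for j, count in enumerate(row):
            out[groups[i]][groups[j]] += count
    return out


# Hypothetical counts; rows = reference class, columns = predicted class.
classes = ["pine", "spruce", "deciduous", "open bog"]
cm = [
    [30, 10, 5, 5],
    [8, 25, 7, 0],
    [6, 9, 20, 5],
    [2, 0, 3, 45],
]
groups = [0, 0, 0, 1]  # pine, spruce, deciduous -> forest; open bog -> open bog

for name, acc in zip(classes, producers_accuracy(cm)):
    print(f"{name}: {acc:.2f}")

coarse = aggregate(cm, groups)
print(f"fine:   OA {overall_accuracy(cm):.3f}")      # fine:   OA 0.667
print(f"coarse: OA {overall_accuracy(coarse):.3f}")  # coarse: OA 0.917
```

In this toy matrix most errors are confusions among the three forest classes, so merging them lifts the overall accuracy from about 0.67 to about 0.92, mirroring the pattern described for the forest classes above.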

Figure 6. The importance of different features. The images (see Table 1 for abbreviations) have been arranged in south-north order, with the southernmost images on the left. The importance means how often a feature has influenced the classification decision: 100 means that the feature has influenced all decisions and 0 none at all.
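
The importance values in Figure 6 can be sketched in outline as follows. This is a hedged illustration, not See5's code: it assumes importance is the percentage of training cases whose path from the root to a leaf tests the feature, ignores missing attribute values, uses a hypothetical hand-built tree with threshold splits only, and forms group values by taking the maximum over member features, as described for Figure 6. Feature names such as `peat_share`, `ndi_1` and `slope` are hypothetical.

```python
from collections import Counter


def features_on_path(node, case):
    """Set of features tested while routing one case from the root to a leaf."""
    used = set()
    while "label" not in node:
        feature = node["feature"]
        used.add(feature)
        node = node["left"] if case[feature] <= node["threshold"] else node["right"]
    return used


def attribute_usage(tree, cases):
    """Percentage (0-100) of cases whose decision path tests each feature."""
    counts = Counter()
    for case in cases:
        counts.update(features_on_path(tree, case))
    return {f: 100.0 * c / len(cases) for f, c in counts.items()}


def group_maximum(usage, groups):
    """Group importance = maximum usage among its member features (0 if unused)."""
    return {g: max((usage.get(f, 0.0) for f in members), default=0.0)
            for g, members in groups.items()}


# Hypothetical tree: peat share at the root, then elevation, then an NDI band.
tree = {"feature": "peat_share", "threshold": 0.5,
        "right": {"label": "bog"},
        "left": {"feature": "elevation", "threshold": 300,
                 "left": {"label": "forest"},
                 "right": {"feature": "ndi_1", "threshold": 0.2,
                           "left": {"label": "heath"},
                           "right": {"label": "birch"}}}}
cases = [
    {"peat_share": 0.8, "elevation": 150, "ndi_1": 0.1},
    {"peat_share": 0.1, "elevation": 200, "ndi_1": 0.3},
    {"peat_share": 0.2, "elevation": 400, "ndi_1": 0.1},
    {"peat_share": 0.3, "elevation": 450, "ndi_1": 0.4},
]
usage = attribute_usage(tree, cases)
groups = {"soil": ["peat_share"], "DEM": ["elevation", "slope"], "NDI": ["ndi_1", "ndi_2"]}
print(usage)                         # {'peat_share': 100.0, 'elevation': 75.0, 'ndi_1': 50.0}
print(group_maximum(usage, groups))  # {'soil': 100.0, 'DEM': 75.0, 'NDI': 50.0}
```

Under this definition a feature tested at the root always scores 100, which is one way a feature such as the proportion of peat can be used in all decisions of an image.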

The different classification options of the See5 decision tree classifier were tested. The different pruning options did not increase the classification accuracy much and in many cases decreased it. Boosting increased the classification accuracy on the training data, but its effect on the accuracy on the test data was much smaller or nonexistent. This indicates that the boosted classifier adapts to the training data and performs worse on other data. In the end, it was decided to perform the classification using the default options: global pruning, no boosting and equal weights for all classes.

Considering the most important features, i.e. the features used most often, it was noticed that usually all features are used for the classification of individual images. The exceptions were the images with a small amount of training data. The features derived from soil information and computed from the digital elevation model were the most important ones, and the satellite images were surprisingly unimportant. In particular, the normalized difference index images were used rather seldom. However, more classification experiments should be carried out to get a better idea of the importance of the different types of features.

Classifying individual images is preferable to classifying one large image mosaic of the whole area. The mosaicking of the classification results can be based on the confidence values produced by the decision tree classifier. In the best case, the overall accuracy was about 68% for all 16 classes when individual images were classified, but only about 45% when the whole mosaic was classified. The drawback of this option is that it requires more training data than the classification of a mosaic covering a larger area, especially if the individual images are small.

In the end, the more detailed 4th-level classification was not used in the production of the Finnish Corine Land Cover 2006 (Törmä et al., 2011). The more general 3rd-level classification, made by combining 4th-level classes, was used above the tree line in the northernmost parts of Lapland (CLC2006, 2009). It was thought that the accuracy of the 4th-level classification was not good enough, and that the pine and spruce forest classes would have had to cover the whole of Finland. In hindsight, a mountain birch class would have been useful, because that information is needed in ecological and biodiversity applications. Nowadays, tree species information covering the whole of Finland is available from the Finnish Forest Research Institute Metla (see http://kartta.metla.fi/). When the new Corine Land Cover 2012 classification is made, it should be studied whether the Metla information is sufficient or whether a new classification is needed. Another classification worth studying would be the reindeer pasture inventory (Kumpula et al., 2006) of the Finnish Game and Fisheries Research Institute.

The pixel size of the images used in this study was 20 m. Smaller pixel sizes are already a reality; there are many satellites with very high resolution (VHR) instruments. Although Corine Land Cover is based on images with 20 m pixel size, RapidEye images with 5 m pixel size and Spot images with 2.5 or 1.5 m pixel size are also available. Therefore, if a new classification of Lapland is needed for Corine, the chosen interpretation method will most likely be segmentation of VHR images and classification of the segments using a decision tree classifier, incorporating spectral information from VHR images, temporal information from medium resolution images, forest inventory data, and the available DEM and soil information.

7. ACKNOWLEDGMENT

I would like to thank the anonymous reviewers for their valuable comments, which made this article better, and the editors Dr. Petri Rönnholm and Prof. Dr. Henrik Haggrén for their patience and advice.

8. REFERENCES

Brewer, K., Ruefenacht, B., Finco, M., 2005. Development and production of a moderate resolution forest type map of the United States, In M. Marsden, M. Downing, and M. Riffe, eds., Workshop Proceedings: Quantitative Techniques for Deriving National Scale Data, July 26-28 2005, Westminster, Colorado, USA, USDA Forest Service Publication FHTET-05-12, http://www.fs.fed.us/foresthealth/technology/pdfs/Brewer.pdf, (19.12.2013).
CLC2006, 2009. CLC2006 Finland - Final technical report, Finnish Environment Institute, http://www.syke.fi/download/noname/%7BC7C849EB-3F4D-42AE-9A94-5B8069FFDFFB%7D/37641, (18.12.2013).
Di Gregorio, A., Jansen, L., 2000. Land Cover Classification System (LCCS): Classification Concepts and User Manual, Food and Agriculture Organization of the United Nations 2000, http://www.fao.org/docrep/003/x0596e/x0596e00.htm, (22.7.2013).
Eeronheimo, H., 2000. Ylä-Lapin luontokartoitus: Biotooppikuviointi ja LUOTI-tietojärjestelmän tiedot, 12.6.2000, Metsähallitus, Perä-Pohjolan luontopalvelut, Rovaniemi. (in Finnish)
Friedl, M., Brodley, C., 1997. Decision tree classification of land cover from remotely sensed data, Remote Sensing of Environment, Vol. 61, No. 3, pp. 399-409.
Hagner, O., Nilsson, M., Reese, H., Egberth, M., Olsson, H., 2005. Procedure for classification of forests for CORINE land cover in Sweden, 24th EARSeL Symposium on New Strategies for European Remote Sensing, Dubrovnik, Croatia, May 25-27, 2004, Oluic (ed.), pp. 523-529.
Hatunen, S., Härmä, P., Kallio, M., Törmä, M., 2008. Classification of Natural Areas in Northern Finland Using Remote Sensing Images and Ancillary Data, Remote Sensing for Environmental Monitoring, GIS Applications, and Geology VIII, Proceedings of SPIE Vol. 7110, 71100W.
Härmä, P., Teiniranta, R., Törmä, M., Repo, R., Järvenpää, E., Kallio, E., 2005. CLC2000 Finland: Final Report, Finnish Environment Institute, Geoinformatics and Land Use Division, May 2005, http://www.ymparisto.fi/download.asp?contentid=38725&lan=fi, (30.7.2013).
Itten, K., Meyer, P., 1993. Geometric and Radiometric Correction of TM Data of Mountainous Forested Areas, IEEE Transactions on Geoscience and Remote Sensing, Vol. 31, No. 4, pp. 764-770.
Kumpula, J., Colpaert, A., Tanskanen, A., Anttonen, M., Törmänen, H., Siitari, J., 2006. Porolaidunten inventoinnin kehittäminen - Keski-Lapin paliskuntien laiduninventointi vuosina 2005-2006, Finnish Game and Fisheries Research Institute, Research Report 397 (Kala- ja riistaraportteja No. 397), Helsinki, 72 p. (in Finnish)
Lillesand, T., Kiefer, R., 1994. Remote Sensing and Image Interpretation, 3rd ed., John Wiley & Sons, 750 p., ISBN 0-471-57783-9.
Linkola, M., Salminen, P., 1980. Suomen luonto ja maisema tuntureita Itämerelle, in P. Havas, ed., Suomen Luonto 1: Luonto toimii - Tunturit, Kirjayhtymä, Helsinki, ISBN 951-26-1747-1, pp. 11-64. (in Finnish)
Mikkola, J., Pellikka, P., 2002. Normalization of bidirectional effects in aerial CIR photographs to improve classification accuracy of boreal and subarctic vegetation for pollen-landscape calibration, International Journal of Remote Sensing, Vol. 23, No. 21, pp. 4719-4742.
NLS, 2013a. Elevation model 25 m, http://www.maanmittauslaitos.fi/en/digituotteet/elevation-model-25-m, (22.9.2013).
NLS, 2013b. The Topographic database, http://www.maanmittauslaitos.fi/en/digituotteet/topographic-database, (22.9.2013).
Reese, H., Nilsson, M., 2005. Classification of mountain vegetation using plot data from the new National Inventory of the Landscape in Sweden (NILS) and Landsat satellite data, 31st International Symposium on Remote Sensing of Environment, 20-24 May, 2005, Saint Petersburg, Russia, http://www.isprs.org/proceedings/2005/ISRSE/html/papers/657.pdf.
RuleQuest, 2013a. Data Mining Tools See5 and C5.0, http://www.rulequest.com/see5-info.html, (30.7.2013).
RuleQuest, 2013b. See5: An Informal Tutorial, RuleQuest, http://www.rulequest.com/see5-win.html, (22.9.2013).
Sellers, P. J., 1985. Canopy reflectance, photosynthesis and transpiration, International Journal of Remote Sensing, Vol. 6, pp. 1335-1372.
Tomppo, E., Haakana, M., Katila, M., Peräsaari, J., 2008a. Multi-Source National Forest Inventory: Methods and Applications, Springer-Verlag, 374 p. (Series: Managing Forest Ecosystems, Vol. 18), ISBN 978-1-4020-8712-7.
Tomppo, E., Olsson, H., Ståhl, G., Nilsson, M., Hagner, O., Katila, M., 2008b. Combining national forest inventory field plots and remote sensing data for forest databases, Remote Sensing of Environment, Vol. 112, pp. 1982-1999.
Törmä, M., Härmä, P., Teiniranta, R., Repo, R., Järvenpää, E., Kallio, E., 2004. The Production of Finnish Corine Land Cover 2000 Classification, In: Altan, O. (ed.), XXth ISPRS Congress 832 Technical Commission IV, ISPRS Archives Vol. XXXV Part B4, pp. 1330-1335.
Törmä, M., Härmä, P., Hatunen, S., Teiniranta, R., Kallio, M., Järvenpää, E., 2011. Change detection for Finnish CORINE land cover classification, Earth Resources and Environmental Remote Sensing/GIS Applications II, Proceedings of SPIE Vol. 8181, SPIE, Bellingham, WA 2011: 81810Q.
Törmä, M., Rankinen, K., Härmä, P., 2007. Using phenological information derived from MODIS-data to aid nutrient modeling, IGARSS 2007, IEEE International Geoscience and Remote Sensing Symposium, 23-28 July 2007, Barcelona, Spain, IEEE, pp. 2298-2301.
Walker, D., Gould, W., Maier, H., Raynolds, M., 2002. The Circumpolar Arctic Vegetation Map: AVHRR-derived base maps, environmental controls, and integrated mapping procedures, International Journal of Remote Sensing, Vol. 23, No. 21, pp. 4551-4570.