Mapping forest changes using multi-temporal remote sensing images: BITE for accurate

trajectory extraction and CBEST for efficient clustering

Yanlei Chen

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in

Environmental Science, Policy and Management

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Peng Gong, Chair

Professor Gregory Biging

Professor John Radke

Fall 2014

Abstract

Mapping forest changes using multi-temporal remote sensing images: BITE for accurate

trajectory extraction and CBEST for efficient clustering

Yanlei Chen

Doctor of Philosophy in Environmental Science, Policy and Management

University of California, Berkeley

Professor Peng Gong, Chair

We developed a semi-automatic algorithm named Berkeley Indices Trajectory Extractor

(BITE) to detect forest disturbances, especially slow-onset disturbances such as insect mortality,

from time series of Landsat 5 Thematic Mapper (TM) images. BITE is a streamlined process that

features trajectory extraction and interpretation of multiple spectral indices followed by an

integration of all indices. The algorithm was tested over Grand County in Colorado, located in

the Southern Rocky Mountains Ecoregion, where forests dominated by lodgepole pine have been

under mountain pine beetle attack since 2000. We produced a disturbance map using BITE with

an identification accuracy of 94.7% assessed from 602 validation sample pixels. The algorithm

is robust in deriving forest disturbance type and timing in the presence of varying

atmospheric conditions, noise, pixel misregistration and residual cloud/snow cover in

the imagery. Outputs of the BITE algorithm could be used in studies designed to increase

understanding of the mechanisms of mountain pine beetle dispersal and tree mortality, as well as

other types of forest disturbances.

Large remote sensing datasets, which either cover large areas or have high spatial resolution, are

often a burden for information mining in scientific studies. Here, we present an approach that

conducts clustering after gray-level vector reduction. In this manner, the speed of clustering can

be considerably improved. The approach features applying eigenspace transformation to the

dataset followed by compressing the data in the eigenspace and storing them in coded matrices

and vectors. The clustering process takes advantage of the reduced size of the compressed data

and thus reduces computational complexity. We name this approach Clustering Based on Eigen

Space Transformation (CBEST). In our experiment with a subscene of Landsat Thematic

Mapper (TM) imagery, CBEST was found to be able to improve speed considerably over

conventional K-means as the volume of data to be clustered increases. We assessed information

loss and several other factors. In addition, we evaluated the effectiveness of CBEST in mapping

land cover/use with the same image that was acquired over Guangzhou City, South China and an

AVIRIS hyperspectral image over Tippecanoe County, Indiana. Using reference data, we

assessed the accuracies of both CBEST and conventional K-means and found that

CBEST was not negatively affected by information loss during compression in practice. We then

applied CBEST to map forest change from 1986 to 2011 for the entire state of California,

USA with over 400 Landsat TM images. We discussed potential applications of the fast

clustering algorithm in dealing with large datasets in remote sensing studies.

We present an efficient approach to large-area mapping of forest changes from remote sensing

images, based on the Clustering Based on Eigen Space Transformation (CBEST) algorithm.

By analyzing 450 Landsat Thematic Mapper (TM) satellite images from 1986 to 2011 with a

five-year interval covering the entire state of California, USA, we derived a forest change type

map, a forest loss map and a forest gain map. Although California has a total land area of 99.6

million acres and the spatial resolution of Landsat TM is 30 m, the task took only 10 hours of

computing time on a computer with a 2.8 GHz Intel i5 CPU and 8 gigabytes of RAM. The overall

accuracy of the forest cover in year 2011 was reported as 92.9% ± 1.6%. We found that the

estimated forest area changed from 28.20 ± 1.98 million acres in 1986 to 28.05 ± 1.98 million acres

in 2011. In particular, our rough estimate indicates that each year California’s forest

lost 92 thousand acres and recovered 85 thousand acres, a net loss of seven

thousand acres per year. In addition, during 1986-2011, around 12% of the forestland

experienced change: about 4% each for deforestation, afforestation, and deforestation

followed by recovery. We concluded that the forestland in California had

been managed in a sustainable manner over the 25 years, since no significant directional

change was observed. Our approach yielded a tighter estimate of the true canopy coverage:

29% of the land in California is forestland, compared with the 33% and 40% reported

by previous studies that had lower spatial resolution and shorter temporal coverage.
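
The net annual change can be cross-checked in two ways with simple arithmetic. The sketch below uses only the figures quoted in this abstract; the variable names are illustrative and not part of the original study.

```python
# Figures quoted in the abstract, in thousand acres per year.
loss_per_year = 92
gain_per_year = 85
net_loss_from_rates = loss_per_year - gain_per_year

# Net loss implied by the two forest-area estimates
# (million acres converted to thousand acres, spread over the study period).
years = 2011 - 1986
net_loss_from_areas = (28.20 - 28.05) * 1000 / years

print(net_loss_from_rates)            # 7
print(round(net_loss_from_areas, 1))  # 6.0
```

Both routes give a net loss of roughly 6 to 7 thousand acres per year, which is small relative to the ± 1.98 million acre uncertainty on the area estimates.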

Table of Contents

LIST OF TABLE CAPTIONS
LIST OF FIGURE CAPTIONS
INTRODUCTION
ACKNOWLEDGEMENT

CHAPTER 1 BITE: AN ALGORITHM FOR MAPPING SLOW-ONSET FOREST DISTURBANCES CAUSED BY MOUNTAIN PINE BEETLES WITH LANDSAT IMAGE STACKS
    ABSTRACT
    1 INTRODUCTION
    2 METHODOLOGY
        2.1 STUDY AREA
        2.2 DISTURBANCE MAPPING PROCEDURE
            2.2.1 Data and Preprocessing
            2.2.2 Spectral Indices
            2.2.3 Trajectory Extraction
            2.2.4 Trajectory Interpretation
            2.2.5 Post-classification Process
    3 RESULTS AND DISCUSSION
        3.1 EVALUATION OF THE CLASSIFIERS AND THE INDICES
        3.2 ACCURACY ASSESSMENT
        3.3 THE DISTURBANCE MAP PRODUCT
    4 CONCLUSION AND PERSPECTIVES

CHAPTER 2 CLUSTERING BASED ON EIGEN SPACE TRANSFORMATION – CBEST FOR EFFICIENT CLASSIFICATION
    ABSTRACT
    1 INTRODUCTION
    2 BACKGROUND
        2.1 K-MEANS
        2.2 EIGEN-BASED GRAY-LEVEL VECTOR REDUCTION
    3 CLUSTERING BASED ON EIGEN SPACE TRANSFORMATION
        3.1 COMPRESSION
        3.2 CLUSTERING
        3.3 FURTHER IMPROVEMENT
            3.3.1 Mean vectors
            3.3.2 Vacant Eigenspace Partitions
            3.3.3 Boundary Optimization
    4 EXPERIMENTAL DESIGN
        4.1 EXPERIMENT DATA
        4.2 PREPROCESSING
        4.3 METHODS
    5 RESULTS AND ANALYSIS
        5.1 EFFICIENCY & PERFORMANCE TEST
        5.2 APPLICATION EXPERIMENTS
            5.2.1 Landsat TM Image
            5.2.2 AVIRIS Hyperspectral Image
    6 DISCUSSIONS

CHAPTER 3 APPLICATIONS OF CBEST IN EFFICIENTLY MAPPING FOREST CHANGES IN THE STATE OF CALIFORNIA FROM 1986-2011
    ABSTRACT
    1 INTRODUCTION
    2 METHODOLOGY
        2.1 STUDY AREA
        2.2 DATA
        2.3 PROCEDURE
            2.3.1 Data Preparation
            2.3.2 Initial Clustering
            2.3.3 Integrating Cluster Centers
            2.3.4 Probability Assigning
            2.3.5 Probability Trajectory Interpretation
            2.3.6 Post-processing
    3 RESULTS AND ANALYSIS
        3.1 INTERMEDIATE RESULTS
        3.2 FOREST CHANGE MAP AND ACCURACY ASSESSMENT
    4 DISCUSSIONS
    5 CONCLUSIONS

CHAPTER 4 CONCLUSIONS AND PERSPECTIVES
    1 SUMMARY OF THE RESULTS
    2 FUTURE PERSPECTIVES

REFERENCES

List of Table Captions

Table 1 Data acquisition dates and land percentage.
Table 2 The list of the spectral indices.
Table 3 Overall accuracies of the classification test results. ‘CV’ represents the cross-validation test on the training dataset. ‘Test’ represents the evaluation on the test dataset.
Table 4 Overall accuracies of the classification test of integration of multiple indices. The evaluation was done on the test dataset.
Table 5 Confusion matrix of the forest change type classification result. The evaluation was done on the test dataset.
Table 6 Conventional K-means Algorithm
Table 7 Eigen-Based Gray Level Vector Reduction
Table 8 CBEST Algorithm
Table 9 Description of the Indicators
Table 10 Classification System for Guangzhou
Table 11 Test Results w/respect to Data Size
Table 12 Test Results w/respect to k
Table 13 Test Results w/respect to N
Table 14 Assignment of Eigenspace Partitions for Eigen Axes
Table 15 Performance Test w/respect to the Max number of Iterations
Table 16 Confusion Matrices for Validation (Landsat)
Table 17 Summary of Classification Results (Landsat)
Table 18 Summary of Class Results (AVIRIS)
Table 19 Verification classes and corresponding probability weights
Table 20 Elapsed time for the clustering process. Each result was selected as the lowest within-cluster sum of squares from 5 runs. 1986N10 means the mosaicked image in 1986 with projection of UTM Zone 10 North.
Table 21 Means and standard deviations of forest probabilities calculated for the clusters
Table 22 The error matrix of samples with four classes from validation labels and two classes from the forest cover in 2011
Table 23 Error matrix of accuracies for the forest cover in 2011

List of Figure Captions

Figure 1 Study Area: Grand County, CO, USA.
Figure 2 Flowchart of processing steps used in the BITE algorithm.
Figure 3 Example of an NDVI time series for 1 disturbed pixel, and intermediate results of processing steps in the time series. These processing steps include 1) Inter-year value selection (a)-(b); 2) Noise removal (b)-(c); 3) Segmentation (c)-(d).
Figure 4 Example of the intermediate processing steps producing segments of the entire NDVI trajectory for 1 pixel (Segmentation Process).
Figure 5 Outputs of the BITE algorithm, including starting year of (left) slow-onset disturbances and (right) rapid-onset disturbances.
Figure 6 Area affected by different disturbance types for 2001-2009.
Figure 7 Enlarged view of the BITE output showing the starting year of slow-onset disturbances.
Figure 8 Illustrative Comparisons between CBEST and K-means
Figure 9 Test Area: Guangzhou, China
Figure 10 Experiment Flow Chart
Figure 11 Speed Comparison w/respect to Data Size. (a) Elapsed Time Comparison; (b) Elapsed Time ratio (how many times faster); (c) ETI Comparison; (d) ETI ratio
Figure 12 Efficiency w/respect to k. (a) Elapsed Time Comparison; (b) Elapsed Time ratio (how many times faster); (c) ETI Comparison; (d) ETI ratio; (e) Rescaled Within-Cluster Sum of Square average; (f) Rescaled Within-Cluster Sum of Square Best/Worst Case.
Figure 13 Efficiency w/respect to N. (a) Elapsed Time Comparison; (b) Elapsed Time ratio (how many times faster); (c) ETI Comparison; (d) ETI ratio; (e) Within-Cluster Sum of Squares Comparison; (f) Within-Cluster Sum of Squares Limited by various max numbers of Iterations.
Figure 14 Scatterplot of Ground Truth
Figure 15 Land Cover/Use Map derived by K-means and CBEST in Guangzhou
Figure 16 Validation Samples as Ground Reference in Guangzhou
Figure 17 Mapping Results in Tippecanoe County
Figure 18 California: Study Area and Landsat TM scenes. Since the study area is in the northern hemisphere, the UTM is of North Zone.
Figure 19 Flowchart of the Procedure to map forest changes in California
Figure 20 CBEST software interface. The initial clustering was implemented under the configuration in this figure.
Figure 21 Graphic demonstration of probability trajectory interpretation. (a) A typical forest loss pixel with elaborations on the rules for automatic determination of forest loss; (b) Non-forest, all points fall within the bounds; (c) Forest; (d) Forest Gain detected in 2006.
Figure 22 Post-clustering result in year 2011 and stratified samples
Figure 23 California Forest Change Maps 1986-2011. Left: Change Type Map; Upper right: Forest loss characterized by years; Lower right: Forest gain/recovery characterized by years.
Figure 24 Estimated Forest Area by Mapping Years
Figure 25 Proportions of forest change type in California for 1986-2011
Figure 26 Local views of some chosen places of the forest change map. The four images in the bottom of the figure demonstrate the changes detected with historical aerial photographs from 1988 and 1993 in comparison with a high resolution image acquired recently. Orange circles indicate a regenerated forest patch after early removal while red circles encompass a clearcutting area.
Figure 27 An example of how scale affects the classified area. Suppose each smallest cell unit is 30 m by 30 m in size; there are 20 cells or 18,000 m² of forest area. If using 120 m by 120 m cells, there are 3 cells or 43,200 m². If using the entire 240 m by 240 m scene, the area is classified as one forest patch, with an area of 57,600 m².

Introduction

Forestland is commonly defined as land that is at least one acre in area and has at least 10% of its area

stocked with trees of any size, or that previously had such tree cover and is not currently being

developed for non-forest use (Helms, 1998). The Resources Planning Act (RPA) Assessment (USDA

Forest Service, 2012) additionally requires a width of at least 120 feet (37 meters). It also includes

transition zones with 10% tree cover and excludes lands predominantly under agricultural and

urban land use. Forests, when properly managed, are known to be a major carbon sink that can

mitigate the process of climate change. In the United States, forest growth and afforestation

offset approximately 13 percent of the Nation’s fossil fuel CO2 production in 2012 (Vose et al.,

2012). Traditionally, forest is well recognized for its economic, social and ecological values.

Commercial forest (timberland) provides valuable wood products, while reserved forest is

preserved for recreation, aesthetics, wildlife, biodiversity, etc. The importance of sustainable

forest management, which aims to conserve forests for the benefit of future

generations, is increasingly acknowledged by the public. Therefore, it is crucial to

monitor forest changes and to estimate deforestation for tracking carbon stocks and fluxes

(Running, 2008), as well as to support decision making for better forest management for the

benefit of society. Moreover, monitoring these deforestation and regeneration events over

time is important because the natural and human-induced disturbances that cause deforestation are

becoming more frequent under climate change (Overpeck et al., 1990; Westerling et

al., 2006). Natural disturbances include hurricanes, earthquakes, wildfires, increased temperature,

drought, pathogens and insect attacks (Soja et al., 2007; Kurz et al., 2008; Westerling and

Bryant, 2008). Human-induced disturbances include logging, clear-cutting and prescribed fire.

The detection of these disturbances and land use changes provides evidence for scientists and

policy makers to study the implications of such changes and to project future trends. In

particular, slow-onset forest disturbances, which are commonly caused by insects and pathogens,

are a significant source of long-term carbon dioxide emissions to the atmosphere through

decomposition of dead organic matter, which contributes to climate warming (Metz, 2001; Kurz et al., 2008;

Maness et al., 2013). Currently, there are two major challenges for forest change mapping.

Firstly, there is a lack of reliable approaches for detecting slow-onset disturbances spatially and

temporally and for distinguishing them from rapid-onset disturbances. Secondly, there is a

lack of efficient algorithms for detecting forest changes over many years across a large area, such as a

large, forest-rich state like California, or even the entire United States.

To address the first challenge, we focused on accurately tracking

slow-onset disturbances with satellite images acquired in multiple years. For the second

challenge, we developed an efficient automatic algorithm based on K-means, a

widely used algorithm for data mining, and applied it to large-area

mapping over many years.

This dissertation consists of four chapters. In the first chapter, we developed a reliable semi-automatic

algorithm for distinguishing slow-onset from rapid-onset disturbances, based on Landsat

image stacks from 2001 to 2011 for Grand County in Colorado. The algorithm

was named Berkeley Indices Trajectory Extractor (BITE). Temporal trajectories of multiple

spectral indices were processed through inter-year value selection, noise removal and

segmentation, followed by interpretation and integration. An overall accuracy of 94.7% for the

classification of disturbance types was achieved. The BITE product effectively maps the spatial

and temporal dispersal of the mountain pine beetle outbreak that occurred during the time frame

in the study area, supporting a better understanding of the mechanisms behind insect attack

patterns. Furthermore, this algorithm should be suitable for detecting other disturbances that

result in canopy loss regardless of the speed of deforestation.

However, BITE had a high computational cost and was time consuming when executed on an

ordinary lab computer. In the second chapter, we proposed an efficient unsupervised algorithm

that greatly lowers the computational cost of the conventional K-means algorithm. The

algorithm was named Clustering Based on Eigen Space Transformation (CBEST). It

compresses the data before the iterative clustering calculations, so that the

computational cost depends on a chosen, fixed number of eigenspace partitions rather than on

the original data size. Although information is lost during compression, analysis and experiments on

test images suggest that the loss can be ignored in practice, while the

computing time improves greatly.
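
The compress-then-cluster idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CBEST implementation itself: the function names `compress_in_eigenspace` and `weighted_kmeans`, the parameter `bins_per_axis`, the equal-width partitioning of every eigen axis, and the use of a count-weighted K-means on the occupied cells are all hypothetical choices made for this example.

```python
import numpy as np


def compress_in_eigenspace(pixels, bins_per_axis):
    """Rotate pixels onto their principal axes and merge pixels sharing a grid cell.

    Returns the mean vector of each occupied cell, the pixel count of each
    cell, and the cell index of every input pixel.
    """
    centered = pixels - pixels.mean(axis=0)
    # Eigenvectors of the band covariance matrix define the eigenspace.
    _, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    scores = centered @ eigvecs
    # Assumed compression rule: equal-width partitions on every eigen axis.
    low, high = scores.min(axis=0), scores.max(axis=0)
    width = np.where(high > low, (high - low) / bins_per_axis, 1.0)
    codes = np.minimum(((scores - low) / width).astype(int), bins_per_axis - 1)
    # Only occupied cells are kept, so vacant partitions cost nothing.
    _, cell_of_pixel, counts = np.unique(
        codes, axis=0, return_inverse=True, return_counts=True)
    cell_of_pixel = cell_of_pixel.ravel()
    cell_means = np.zeros((counts.size, pixels.shape[1]))
    np.add.at(cell_means, cell_of_pixel, scores)
    cell_means /= counts[:, None]
    return cell_means, counts, cell_of_pixel


def weighted_kmeans(points, weights, k, n_iter=20, seed=0):
    """K-means on cell means, each weighted by the number of pixels it holds."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()

    def nearest(current):
        dist = ((points[:, None, :] - current[None, :, :]) ** 2).sum(axis=2)
        return dist.argmin(axis=1)

    for _ in range(n_iter):
        labels = nearest(centers)
        for j in range(k):
            members = labels == j
            if members.any():
                centers[j] = np.average(
                    points[members], axis=0, weights=weights[members])
    return centers, nearest(centers)
```

In this sketch, each K-means iteration loops over the occupied cells rather than over every pixel, which is where the saving comes from. Per-pixel labels are recovered afterwards as `cell_labels[cell_of_pixel]`, where `cell_labels` is the second value returned by `weighted_kmeans`.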

In the third chapter, the CBEST algorithm was applied to produce a forest change map for the

entire state of California from 1986 to 2011 at five-year intervals. With a total of 450 Landsat

Thematic Mapper images, the entire computation took approximately 10 hours on an ordinary

lab computer. The overall accuracy of the 2011 forest cover derived from the

map was assessed as 92.9% ± 1.6%. This efficient approach allowed us to produce the first California forest

change map with a spatial resolution of 30 meters and a temporal coverage of 25 years. The

map was used to derive statistics on California’s forestland. No significant directional

change was observed. The differences between the produced map and previous forest inventories

were discussed.

In the fourth chapter, the achievements from the first three chapters were summarized. The links

between these chapters were explored, and a further integration of BITE and CBEST was

envisioned to take fuller advantage of both algorithms. The ultimate goal was to

efficiently and reliably map forest changes for relatively large administrative areas or

ecoregions. The potential and benefits of the study were also discussed.

Acknowledgement

I am grateful to Congcong Li for sharing the processed TM image and validation data used in

this article. We also thank David Landgrebe from the Laboratory for Applications of Remote

Sensing, Purdue University for sharing the data online. This research has been partially

supported by USGS (grant number G12AC20085) and a national high technology program grant

from China (grant number 2009AA12200101).

Chapter 1 BITE: an algorithm for mapping slow-onset forest disturbances caused

by mountain pine beetles with Landsat image stacks