Near real-time monitoring of tropical dry forests in North and Central America

by

Vaughn Smith, B.S.

A Thesis

In

Wildlife, Aquatic, and Wildlands Science and Management

Submitted to the Graduate Faculty

of Texas Tech University in

Partial Fulfillment of

the Requirements for

the Degree of

MASTER OF SCIENCE

Approved

Dr. Carlos Portillo-Quintero

Chair of Committee

Dr. Guofeng Cao

Dr. Gad Perry

Mark Sheridan

Dean of the Graduate School

December, 2017

Copyright 2017, Vaughn Smith

Texas Tech University, Vaughn Smith, December 2017

ACKNOWLEDGMENTS

My time at Texas Tech University has been incredibly rewarding, primarily

thanks to Dr. Carlos Portillo-Quintero, my thesis committee chair and mentor, who

expertly guided me through the last two and a half years of scientific exploration. I have

always had an interest in the natural and technical sciences, but was admittedly hesitant

upon accepting the graduate research assistant position with Dr. Portillo-Quintero as I

thought my skills and knowledge may be lacking. However, with Dr. Portillo-

Quintero’s continued encouragement and tutelage I acquired the necessary geographic

information system and data science proficiencies to significantly contribute to an

emerging body of research. This study would not have been possible without support

from the Tropi-Dry Collaborative Research Network funded by the Inter-American

Institute of Global Change Research. I am quite pleased with the results of my research

and feel like I can truly call myself a ‘scientist,’ thanks to Dr. Carlos Portillo-Quintero.

I would also like to thank my other committee members, Dr. Guofeng Cao and

Dr. Gad Perry, for guiding me through my thesis. Their thoughtful and thorough

inquiries throughout the process helped me to create a more robust final product, on

which I am proud to put my name.

Additionally, I would like to thank Dr. Robert Cox, Dr. Terry McLendon, and

Dr. Katie Lewis, all brilliant and passionate professors whose courses I sincerely

appreciated. They, as well as all of the faculty, staff and students comprising the College

of Agricultural Sciences and Natural Resources, helped to provide all of the

supplementary expertise needed to fully round out and complete my graduate education.

Finally, I would like to thank my friends, old and newly made in Lubbock, and

family, especially my mother, Dr. Katherine A. Groves, who has provided unending

love, support, and guidance throughout my life; her strength and intellect have always

been and will continue to be an inspiration. I must also acknowledge my father, Chester

B. “Solo” Smith, who passed away in 2005 – your love, wisdom and overall grand

personality are missed and cherished.

TABLE OF CONTENTS

ABSTRACT

LIST OF FIGURES

INTRODUCTION

1.1 Objectives of the study

LITERATURE REVIEW

2.1 Defining tropical deforestation

2.2 Tropical dry forest deforestation trends in Latin and Central America

2.3 Mapping deforestation using remote sensing and potential for near real-time monitoring

2.4 BFAST algorithm family

2.5 The challenges of detecting and tracking deforestation in TDF

2.6 Other change detection algorithms

MATERIALS AND METHODS

3.1 Study area and project context

3.1.1 Yucatan, Mexico

3.1.1.1 Study Area 1 (Y1): East of Tekit

3.1.1.2 Study Area 2 (Y2): Tekik de Regil

3.1.2 Guanacaste, Costa Rica

3.1.2.1 Study Area 1 (G1): Cuajiniquil-Soley

3.1.2.2 Study Area 2 (G2): North of Bijagua

3.2 Data acquisition and preprocessing

3.3 Vegetation indices

3.4 Open-source and licensed software

3.5 System architecture

3.6 Design of a near-real-time monitoring system using BFAST

3.7 Validation

3.8 Near real-time validation

RESULTS AND DISCUSSION

4.1 Breakpoints and magnitudes

4.2 Accuracy assessment

4.3 Step towards a near real-time deforestation monitoring system in Central America

4.4 Sources of error and implications for BFAST implementation

4.5 Implications for biodiversity and conservation

CONCLUSION

LITERATURE CITED

APPENDICES

A. BFAST CODE IMPLEMENTATION IN RSTUDIO

B. ERROR MATRICES

ABSTRACT

Tropical Dry Forests (TDF) represent one of the most preferred habitats in the

tropics for human settlements and exploitation, and directly and indirectly provide vital

natural resources, such as food, water, wood products, minerals, medicines, etc., to

support the lives and livelihoods of approximately 90 million people in Latin America.

Unfortunately, the rate of TDF deforestation in Latin America, as well as globally, has

had an increasing trend for the last several decades. Deforestation monitoring, using

time-series of Landsat imagery, is becoming a reality with the advent of cloud

computing, open-source programming platforms, and near real-time distribution of

imagery data, but little has been done to implement these systems in TDF landscapes.

The general objective for my research was to evaluate the feasibility and efficiency of

automated time-series analysis tools (e.g. BFAST in the R statistical analysis

programming language) for detecting and monitoring deforestation in TDF landscapes

using satellite imagery. Results show that BFAST time-series analysis tools were

effective in accurately determining deforestation events. Vegetation indices that utilize

the shortwave infrared bands proved to be more sensitive to forest disturbance than other

indices using the red and near infrared bands. Moderate to extreme negative magnitude

values proved to be the determining products that indicated a deforestation event, with

value ranges varying widely between study sites/regions. However, the application of

BFAST for shorter time frames in near real-time (weeks to 3 months) will only be

possible through the use of combined, multi-sensor data to handle gaps due to poor

quality images and cloud cover, as well as external data to eliminate commission errors.

The methods discussed in this study could provide near real-time and eventually true

real-time capabilities that provide a better understanding of land-cover change

dynamics, which would assist in conservation efforts to help protect biodiversity around

the world.

LIST OF FIGURES

1. Example of first-order harmonic model fitted to real Landsat observations (pixel values) by DeVries et al., 2015 demonstrating sequential monitoring approach for detecting breaks (red line).

2. Example of a breakpoint flagged on a single pixel by BFAST.

3. Location of the study areas in Mesoamerica: A) Yucatan, Mexico and B) the Guanacaste Region, Costa Rica. Green coverage refers to Tropical Dry Forest extent mapped by Portillo-Quintero and Sanchez-Azofeifa (2010) for Mesoamerica.

4. Location of Y1 and Y2 sites in Yucatan, Mexico. Green coverage refers to Tropical Dry Forest extent mapped by Portillo-Quintero and Sanchez-Azofeifa (2010) for Mesoamerica.

5. Location of G1 and G2 sites in the Guanacaste Region, Costa Rica. Green coverage refers to Tropical Dry Forest extent estimated by the GlobCover2009 project for Mesoamerica.

6. Comparison of spectral bands for Landsat 7 (L7 ETM+) and Landsat 8 (OLI & TIRS).

7. Example of the electromagnetic signature of healthy green vegetation and associated absorption and reflectance features.

8. Folder architecture that needs to be created outside of R environment on computer.

9. Flowchart representing methods of this study.

10. Model visualization of how near real-time system functions.

11. Reference data locations for validation. One-hectare grid cells visually inspected via multi-temporal images available in Google Earth (total n=373). Y1 (‘D’=64; ‘S’=54), Y2 (‘D’=32; ‘S’=32), G1 (‘D’=21; ‘S’=50), G2 (‘D’=70; ‘S’=50).

12. Validation example. Deforested cells (‘D’) – areas visibly covered by TDF beginning 2013, and visibly non-forested by 2016, with soil exposure. Stable cells (‘S’) referred to areas of any land covers that remained the same during 2013-2016.

13. Magnitude outputs for NBR2 in Yucatan site 2. Monitoring period 07/11 to 07/15 (green to red) overlays monitoring period 01/12 to 12/15 (green to blue) so that new deforestation is highlighted in blue.

14. Maps of breakpoint magnitudes for all VIs for Y1 and pixel percentages.

15. Maps of breakpoint magnitudes for all VIs for Y2 and pixel percentages.

16. Maps of breakpoint magnitudes for all VIs for G1 and pixel percentages.

17. Maps of breakpoint magnitudes for all VIs for G2 and pixel percentages.

18. Overall accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.

19. Producer’s accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.

20. User’s accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.

21. Evaluation of near real-time accuracy of BFAST. Points represent new breakpoints detected in a 6-month window. Imagery shows ground truth data. The accuracy for this assessment in Y2 was estimated at 55%.

22. Example of false positive detected breaks not associated with deforestation.

23. Image A from 02/25/2005 and B from 10/11/2015 in G2 NDVI stack showing lack of data due to cloud mask (and due to Landsat 7 Scan Line Corrector error in Image A).

24. Example taken from Murillo-Sandoval et al. 2017. Three breakpoints (dashed red lines) and four segments (black lines) identified over time series (blue lines). The slope coefficients (β) are all significant (α = 0.05) and ρ represents p-values.

CHAPTER I

INTRODUCTION

Global deforestation rates have displayed an overall increasing trend throughout

the last several decades contributing significantly to the loss of biodiversity as well as

to the loss of the potential for carbon sequestration (Portillo-Quintero, et al. 2014).

Interestingly, while forest ecosystems across the world experience loss and regrowth

at different rates, the global trend in recent years suggests a reduction in forest loss

(Hansen, et al. 2013). For example, the forests of Brazil, which have historically

experienced large scale deforestation events, underwent a verified reduction in

deforestation from 2.8 million hectares in 2003 – 2004 to 1.3 million hectares in 2007

– 2008 due to a combination of enhanced conservation efforts as well as economic

decline (Butchart, et al. 2010). However, this is offset by increased deforestation in

Eurasian tropical rainforest, African tropical moist deciduous forest, South American

dry tropical forests, and Eurasian tropical moist deciduous and dry forests (Hansen, et

al. 2013). Additionally, regrowth does not necessarily translate to regained biodiversity,

as regrowth tends to result in secondary forest with altered successional species

composition that differs from mature primary forests (Read and Lawrence, 2003).

The importance of biodiversity cannot be overstated as it is what underlies the

delicate balance of various forest ecosystems around the world. Biodiversity has a

number of features, such as richness of species, ecosystem type rarity, abnormal

evolutionary or ecological occurrences, rarity of higher taxonomical groups, and status

of endemic species that can each individually contribute to overall biodiversity loss

if affected (Olson et al., 2001). Biodiversity is significant, not only in terms of the

simple aesthetic beauty of nature and availability of precious natural resources, but also

in terms of measures of productivity. Generally, the productivity of a forest is positively

correlated with species richness (Vila et al., 2007). It has also been found by Bohn and

Huth (2017) that forest structure as well as species richness have an impact on

productivity factors such as above-ground wood production, which significantly

impacts carbon sequestration. The loss of biodiversity in forest ecosystems around the

globe is a current major issue in biological conservation. Remnants of forest in high

biodiversity ecoregions need to be protected and continually monitored, ideally as close

to real-time as possible in order to produce actionable results to stem the tide of

biodiversity loss.

One of the national strategies implemented by governments around the world to

protect biodiversity and avoid further loss is the design and maintenance of protected

areas. Since the mid-twentieth century, national and international environmental

agencies have designed and expanded a system of protected areas (PA) across the globe

to preserve the currently fragmented natural ecosystems of the world. However, these

areas, once isolated from highly populated areas and distant from threats, are now

embedded in social-ecological land systems characterized by patches of preserved or

managed natural ecosystems in a ‘matrix’ of urban, agriculture and livestock ranching

land uses that has expanded rapidly, increasing land conflicts between stakeholders and

degradation outside and inside the PA system (Boillat et al. 2017). Patches of forest

survive in human-dominated landscapes that are highly variable in time and space,

where choices of rotational crops or land abandonment shape the dynamics of forests in

terms of their extent or ecological functionality. Such human-dominated landscapes are

common in the tropical forests of Mesoamerica, a region that is known for still harboring

some of the most biodiversity-rich forests in the world (Garcia-Frapolli, 2007).

Land use and land cover in the countries of Mesoamerica have undergone

change in different directions as a result of the complex history of the politics and

socioeconomic conditions of the region. Armed conflicts in the 1980s, post-conflict

pacification and recent changes in socioeconomic conditions have caused profound

fluctuations in land distribution, tenure and land use change. Different circumstances in

land quality, as well as access to credit and insurance, for small and large land owners

have shaped the distribution of land use in Central America. Poverty and migration have

also influenced decisions on land use that have led to expansion of pastures for cattle

ranching to the detriment of natural landscapes, while government-driven investment in

agricultural intensification has favored the expansion of high-yield crops, especially for

large land owners (Davis and Lopez-Carr, 2014). Furthermore, all countries differ in

their land use policies, land redistribution history, as well as biodiversity protection

policies and law enforcement capacity.

In 2003, natural vegetation, including secondary forests and selectively logged

forests, was estimated to cover 57% of Mesoamerica, with the remaining area being

used predominantly for crop (mostly corn, coffee, beans and sugar cane) and cattle

production (42%) and 1–2% in urban and other land covers (DeClerck et al. 2010). The

Central American Commission on Environment and Development (CCAD) calculates

that 400,000 ha of forest are being lost on an annual basis in the Mesoamerican region.

Deforestation rates continue to be high, although recent reports have also noted forest

regrowth in some areas of Central America. In any case, the dynamics of forest loss,

regrowth and disturbance, in addition to the land use change related to urban,

agricultural and cattle ranching expansion are complex in the region and mostly tied to

contextual and local factors.

The use of Geographic Information Systems (GIS) and remote sensing (satellite

imagery) has played a key role in understanding the past patterns and trends in

deforestation across the region. Each country in the region has established a monitoring

program that relies on the use of satellite imagery for mapping the extent and

distribution of terrestrial ecosystems. Countries like Costa Rica and Mexico have had a

long tradition in the use of remote sensing products for understanding ecosystem extent,

while other countries currently lack updated information on the conservation status of

their forests. However, even in the best of cases, land cover and land use maps are typically

generated every 5 to 10 years for a country. An example is the CCAD Central American

Land Cover Map for 1980, 1990, 2000, and 2010 developed by CATHALAC

(http://cathalac.org/) in 2011, which allows long-term deforestation patterns and

trends to be observed and their causes studied.

One of the reasons local institutions in Mesoamerican countries study deforestation

dynamics only at these intervals is the limited computing power and software licensing

available to them, which constrains storage and image processing capabilities.

However, the new era of cloud computing has enabled faster and more efficient

processing of satellite images that are globally available. Distribution archives from

NASA and the European Space Agency (ESA) are now capable of freely distributing

raw satellite imagery on the day it is collected, or ready-to-use processed data with only a

few days of delay. A recent dataset produced by Hansen et al. (2013) and distributed

through the Global Forest Watch website provides annual ‘tree cover’ loss and gain maps

for the whole world from 2000 to 2015. Although the ‘tree cover’ definition

includes not only forests, but also disturbed vegetation, plantations and other tree

dominated land covers, this dataset is helping researchers understand trends in

vegetation cover, especially in threatened ecoregions, while highlighting areas of rapid

change in recent years. The Hansen et al. (2013) product is updated every year, but still

only provides information with a lag of two years (2000-2015) and cannot provide sub-

annual information (monthly) and/or indicate the occurrence of deforestation in near

real-time.

Over the past several years there have been significant advances in the design of

continuous land cover change (CLCC) mapping algorithms that use the complete record

of Landsat data, taking advantage of the high-quality Landsat data archive that became

freely available in 2008 (Cohen et al. 2017). These unique techniques in remote sensing

allow the user to study the trend in pixel values across hundreds to thousands of images

and detect when a pixel value drastically changes, indicating a change in surface

reflectance, and thus, in land cover or land use. CLCC algorithms can produce outputs

that include the exact date when the abrupt change occurred. Some algorithms can

produce highly accurate land use and land cover maps at any given time for the satellite

image time series. Their application relies on the heavy use of programming languages

such as Python and MATLAB and the use of high performance computational

infrastructure. CLCC mapping algorithms can be iterated to register significant breaks

in pixel values of satellite imagery, as new data is acquired and processed. These

algorithms have opened up the possibility of establishing real-time or near real-time

detection of deforestation using satellite imagery.
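
The general idea described above can be illustrated with a minimal Python sketch. It is not BFAST or any published CLCC algorithm: it simply fits a first-order harmonic (seasonal) model to a stable history period and flags the first run of new observations that fall well below the model. The function names, the 3-sigma threshold, the run length of three observations, and the synthetic 16-day series are all assumptions made for illustration.

```python
import math

TWO_PI = 2.0 * math.pi

def design_row(t):
    """Regressors for a first-order harmonic model; t is time in decimal years."""
    return [1.0, math.cos(TWO_PI * t), math.sin(TWO_PI * t)]

def fit_harmonic(times, values):
    """Ordinary least squares via the 3x3 normal equations (Gauss-Jordan)."""
    rows = [design_row(t) for t in times]
    n = 3
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, values))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def predict(beta, t):
    """Model value at time t for coefficients beta."""
    return sum(b * x for b, x in zip(beta, design_row(t)))

def first_break(times, values, monitor_start, threshold=3.0, run=3):
    """Date of the first of `run` consecutive monitoring observations lying more
    than `threshold` residual standard deviations below the history model,
    or None if no such run exists."""
    history = [(t, v) for t, v in zip(times, values) if t < monitor_start]
    beta = fit_harmonic([t for t, _ in history], [v for _, v in history])
    residuals = [v - predict(beta, t) for t, v in history]
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 3))
    run_start, count = None, 0
    for t, v in zip(times, values):
        if t < monitor_start:
            continue
        if v - predict(beta, t) < -threshold * sigma:
            if count == 0:
                run_start = t
            count += 1
            if count >= run:
                return run_start
        else:
            run_start, count = None, 0
    return None

# Synthetic pixel (hypothetical): one observation every 16 days, a seasonal
# vegetation index with small noise, and a clearing in mid-2014.
times = [2000.0 + k * 16.0 / 365.25 for k in range(390)]
values = [0.6 + 0.2 * math.cos(TWO_PI * t) + 0.01 * math.sin(37 * k)
          - (0.4 if t >= 2014.5 else 0.0)
          for k, t in enumerate(times)]
break_date = first_break(times, values, monitor_start=2013.0)
```

Requiring several consecutive low observations is one simple way to avoid flagging a single cloudy or noisy acquisition; an operational system would also need to handle missing data and to re-run the check as each new image arrives.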

The potential for real-time or near real-time change detection has become more

of a reality in recent years as higher spatial, spectral and temporal resolution

datasets have become more available, along with better tools to analyze these data.

However, algorithms and methodologies have been mostly applied for research

purposes and are not yet operational on the ground. For biodiversity-rich Latin

American countries, the technology is still far from operational use. Ideally, such

a system will allow a director of a conservation effort or the manager of a national park

to receive a report with a map of potentially deforested areas every month, or every 3 to

6 months, allowing for actions to be taken in exact locations in the field as early as

possible to prevent further forest cover losses. For this system to operate as a supporting

tool in decision making in countries of the Mesoamerican region, it has to be credible.

High accuracies in the detection of deforestation need to be achieved. Because of this, it is

important to test its accuracy in different scenarios, ecoregions and socio-ecological

systems, using a variety of image-based products (vegetation indices) and algorithm

parameters.

1.1 Objectives of the study

The general objective for this research was to evaluate the feasibility and

efficiency of automated time-series analysis tools (e.g. BFAST; BFASTMonitor;

BFASTSpatial) for detecting and monitoring deforestation in tropical dry forest

landscapes using Landsat satellite imagery.

The specific objectives for this research were:

1. Evaluate the accuracy of automated time-series analysis tools (e.g.

BFAST; BFASTMonitor; BFASTSpatial) applied on Landsat imagery for

the detection of deforestation events in tropical dry forests.

2. Evaluate the capabilities of BFAST time-series analysis tools to

track changes in near real-time using Landsat imagery.

To fulfill specific objective 1, vegetation index data from Landsat satellite

imagery between 2000 and 2016 was used as input data for the ‘bfastSpatial’ algorithm.

This analysis produced outputs that showed breaks in a seasonal trend that correlated to

TDF canopy loss. An accuracy assessment was implemented to evaluate the algorithm’s

accuracy for each vegetation index. To fulfill specific objective 2, I tested the accuracy

of the ‘bfastSpatial’ algorithm in detecting recent deforestation when sets of new

observations were added to the time series. I then evaluated the sequential outputs to

determine temporal differences between the detected break and ground truth data.

CHAPTER II

LITERATURE REVIEW

2.1 Defining tropical deforestation

According to the Forestry Department of the F.A.O. in their Global Forest

Resources Assessment 2010, “Deforestation” is defined as the conversion of forest to

other land use or the long-term reduction of the tree canopy cover below the minimum

10 percent threshold. Deforestation implies the long-term or permanent loss of forest

cover and denotes transformation into another land use. Such a loss can only be caused

and maintained by a continued human-induced or natural perturbation. Deforestation

includes areas of forest converted to agriculture, pasture, water reservoirs and urban

areas. The term specifically excludes areas where the trees have been removed as a

result of harvesting or logging, and where the forest is expected to regenerate naturally

or with the aid of silvicultural measures. Unless logging is followed by the clearing of

the remaining logged-over forest for the introduction of alternative land uses, or the

maintenance of the clearings through continued disturbance, forests commonly

regenerate, although often to a different, secondary condition. In areas of shifting land

use, forest, fallow forest and agricultural lands appear in a dynamic pattern where

deforestation and the return of forest occur frequently in small patches. To simplify

reporting of such areas, the net change over a larger area is typically used by F.A.O.

methodologies. Deforestation also includes areas where, for example, the impact of

disturbance, overutilization or changing environmental conditions affects the forest to

an extent that it cannot sustain a tree cover above the 10 percent threshold.

However, other authors such as Sierra (2000) use a much simpler definition

in which deforestation is simply the total removal of forest canopy for any

reason (including logging). For the purposes of this research deforestation will be

defined as complete loss of forest canopy for any reason at any scale, even at sub-hectare

scales. This level of small-scale deforestation may not seem significant, but forest

conversion to other land uses is typically progressive, starting with small

clearings and then expanding to larger areas. Therefore, the method applied here

will be evaluated on its full potential to detect all forest

clearings down to the minimum mapping unit of one Landsat pixel (0.09 ha).
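
The 0.09 ha figure follows from the pixel dimensions. The short Python sketch below works through the arithmetic, assuming the nominal 30 m side length of a Landsat multispectral pixel (the text states only the resulting area); the function names are illustrative.

```python
LANDSAT_PIXEL_SIDE_M = 30.0          # assumption: nominal 30 m multispectral pixel
SQUARE_METRES_PER_HECTARE = 10_000.0

def pixel_area_ha(side_m=LANDSAT_PIXEL_SIDE_M):
    """Area of one square pixel in hectares."""
    return side_m * side_m / SQUARE_METRES_PER_HECTARE

def cleared_area_ha(n_pixels, side_m=LANDSAT_PIXEL_SIDE_M):
    """Total area covered by n_pixels flagged pixels, in hectares."""
    return n_pixels * pixel_area_ha(side_m)

def pixels_per_hectare(side_m=LANDSAT_PIXEL_SIDE_M):
    """Number of pixels needed to cover one hectare."""
    return 1.0 / pixel_area_ha(side_m)

minimum_mapping_unit_ha = pixel_area_ha()   # 30 m x 30 m = 900 m^2 = 0.09 ha
```

Under this assumption a one-hectare clearing spans roughly eleven pixels, so sub-hectare clearings remain detectable in principle.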

2.2 Tropical dry forest deforestation trends in Latin and Central America

This study is focused on Tropical Dry Forests (TDF), which represent one of the

most preferred habitats in the tropics for human settlements and exploitation (Murphy

& Lugo, 1986; Sánchez-Azofeifa et al., 2005). Tropical dry forests are defined by

various authors in various ways, but in general, TDF can be defined, as Sánchez-

Azofeifa et al. (2005) have described, as a tropical ecosystem where at least 50% of

trees present are drought deciduous (trees completely shed their leaves in the dry

season), the mean annual temperature is at least 25 °C, total annual precipitation ranges

between 70 and 200 cm, and there are three or more dry months every year (precipitation

less than 10 cm).
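
Read as a decision rule, the definition above combines four thresholds. The Python sketch below is an illustrative encoding only: the function name, the argument layout, and the example monthly rainfall values are hypothetical, and the 10 cm dry-month cutoff is interpreted here as a monthly total.

```python
def is_tropical_dry_forest(deciduous_fraction, mean_annual_temp_c, monthly_precip_cm):
    """Apply the four criteria attributed to Sanchez-Azofeifa et al. (2005).

    deciduous_fraction: share of trees that are drought deciduous (0 to 1)
    mean_annual_temp_c: mean annual temperature in degrees Celsius
    monthly_precip_cm:  twelve monthly precipitation totals in centimetres
    """
    annual_precip_cm = sum(monthly_precip_cm)
    dry_months = sum(1 for p in monthly_precip_cm if p < 10.0)
    return (
        deciduous_fraction >= 0.5
        and mean_annual_temp_c >= 25.0
        and 70.0 <= annual_precip_cm <= 200.0
        and dry_months >= 3
    )

# Hypothetical site with a pronounced dry season (values invented for illustration).
seasonal_site = [1, 1, 2, 5, 15, 20, 18, 20, 22, 18, 8, 3]
# Hypothetical wet site with no dry season.
wet_site = [30] * 12

seasonal_is_tdf = is_tropical_dry_forest(0.6, 26.0, seasonal_site)
wet_is_tdf = is_tropical_dry_forest(0.6, 26.0, wet_site)
```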

Tropical dry forest loss in Latin and Central America, as well as globally, has

had an increasing trend for the last several decades. Murphy and Lugo (1986) identified

that about 40% of the earth’s tropical and subtropical landmass is dominated by open or

closed forest, where 42% is dry forest. According to Miles et al. (2006) more than half

(54.2%) of the remaining dry forests are located within South America, and the

remaining area of dry forest is almost equally divided between North and Central

America (12.5%), Africa (13.3%), and Eurasia (16.4%), with a relatively small portion

in Australasia and Southeast Asia (3.8%). Miles et al. (2006) suggest that the total

estimated area of remaining TDF is approximately 1,048,700 km2. According to

Portillo-Quintero et al. (2010) the potential extent of TDF in North and Central America,

South America, and the Caribbean islands is approximately 1,520,659 km² while the

current extent is actually 519,597 km². Such findings indicate that the TDF has suffered

a loss of 66% of its historical potential cover.
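
The 66% figure follows directly from the two extents reported above. A minimal sketch of the arithmetic, using only the values cited from Portillo-Quintero et al. (2010); the variable names are illustrative:

```python
# Extents reported in the text (Portillo-Quintero et al., 2010), in km^2.
potential_extent_km2 = 1_520_659
current_extent_km2 = 519_597

# Area lost and its share of the potential (historical) extent.
lost_km2 = potential_extent_km2 - current_extent_km2
loss_fraction = lost_km2 / potential_extent_km2

print(f"Area lost: {lost_km2:,} km^2")
print(f"Share of potential cover lost: {loss_fraction:.1%}")
```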

Drivers of deforestation in TDF can be very different between and within

countries, but the main driver of deforestation is unequivocally intensive


anthropogenic disturbance. According to Portillo-Quintero et al. (2014) and Murphy &

Lugo (1986), the tendency for TDF to have relatively flat terrain, fertile soils with less

aggressive successional vegetation, seasonality in rainfall that allows for short-cycle

crop agriculture, climate more suitable for livestock and less suitable for mosquitoes

that spread diseases, and lower overall biomass that facilitates clearing are all the

primary reasons that human populations have an affinity for TDF. Many resources that

are useful to human populations, not only in rural areas, but urban as well, are found in

TDF: plants used for food, beverages, condiments, construction materials, firewood,

medicinal/herbal remedies; animals for hunting; and shade and fresh air for locals to

enjoy (Portillo-Quintero et al., 2014).

Tropical dry forest ecosystems directly and indirectly provide vital natural

resources, such as food, water, wood products, minerals, medicines, etc., to support the

lives and livelihoods of approximately 90 million people in Latin America. In addition

to providing life-supporting natural resources, tropical dry forest ecosystems have a

significant impact on global climate as they have at least half of the rainforest’s carbon

storage capacity. In the Americas, for example, TDF restoration could potentially add 8

Gt (gigatons) of carbon to the potential total ecosystem carbon stock (Portillo-Quintero

et al. 2014). Beyond these facts, TDF provide much of the planet’s biodiversity, which

is intrinsically beneficial – at the least in terms of simple aesthetics and the beauty of

nature, and more so to provide opportunities to study, research, learn from and gain a

deeper understanding of nature.

Understanding the patterns of tropical deforestation and having the ability to

measure monthly or annual rates of deforestation in an efficient and timely manner will

help to allocate resources effectively for TDF management, conservation, and

restoration efforts in critical areas of its distribution in Latin America. In Mesoamerica,

preventing further TDF losses is especially important in the context of current watershed

management within the “Corredor Seco Centroamericano” (or Central American Dry

Corridor), a region of the Pacific coast that has been recently subject to frequent

droughts, with detrimental consequences to local economies and vulnerable populations


dependent on subsistence agriculture. The region is so at risk that it has been a concern

for international humanitarian aid agencies.

2.3 Mapping deforestation using remote sensing and potential for near real-time monitoring

Early mapping techniques, such as those demonstrated by Trejo and Dirzo

(1999), used early potential vegetation maps with more recent current land use maps to

compare the potential versus the existing vegetation. Most of this work included the

manual digitization of forest cover extent over large areas using aerial photography and

satellite imagery as ground-truth information for forest cover. This approach demanded

a tremendous effort from digitizers and analysts and limited change detection to a few

time steps. However, technological developments in automated mapping tools

for satellite imagery since the 1970s have allowed scientists to map forests on a regular

basis for any particular area of the world to understand the temporal trends of

deforestation on an annual basis or across decades. As new satellites and data

distribution methods become available, the temporal resolution and the level of detail

and data (spatial and spectral resolution) of the datasets have increased, yielding much

better products that help to understand the dynamics of deforestation at any particular

site.

The field of remote sensing has been advancing rapidly over the last 10-20 years

and two sensors have been of high importance for mapping and monitoring

deforestation: the MODIS (Moderate Resolution Imaging Spectroradiometer) sensor

system aboard the Terra and Aqua NASA satellites, which have been in orbit since the

year 2000; and the Landsat series of satellites, which have been in orbit since the 1970s.

MODIS allows for surface multispectral data to be collected at 250-m, 500-m, and 1-

km resolution daily, every 8 days, every 16 days, or monthly, depending on the specific

data product such as surface reflectance, snow cover, or vegetation indices. Landsat

satellites also collect multispectral data every 16 days, but with much higher spatial

resolutions at 15-m, 30-m, and 100-m. Until recently, this data was difficult to collect,


process and analyze, but due to the improvements made in computing technologies over

the last several years, this flow of data is much more efficient and less costly. For

example, the United States Geological Survey opened their archive of Landsat scenes

to the public free of charge in 2008, which then spawned several related products such

as EarthExplorer and GloVis, which are browser-based viewing tools, as well as the Earth

Resources Observation and Science (EROS) Center Science Processing Architecture

(ESPA) ordering interface that allows for bulk ordering of customized, preprocessed

data. Additionally, imagery from other repositories that offer finer resolutions and

would normally carry a fee is offered free of charge or at a reduced rate through various

organizations and institutions, which helps to make remote sensing and its applications

much more accessible.

According to DeVries et al. (2015), to date, only a few remote sensing based

forest monitoring systems exist in tropical countries, the most advanced of which are

the PRODES and DETER systems of the Brazilian National Institute for Space Research (INPE), used for

annual deforestation mapping and near real-time deforestation monitoring, respectively.

However, with the opening of the U.S. Geological Survey (USGS) Landsat data

archive, large amounts of medium-resolution optical earth observation data have been

made freely available to the public, which combined with continued advances in the

field of cloud computing for geospatial data has allowed for high temporal resolution

forest change monitoring at unprecedented spatial scales. An example of

implementation of remote sensing technologies and cloud computing techniques can be

found in the work of Hansen et al. (2013), which currently provides deforestation

information for the Global Forest Watch organization on an annual basis. In this study,

loss and gain of global tree cover extent was mapped using Landsat 7 data from 2000 to

2012 at a 30-m resolution. Over 600,000 Landsat 7 images were compiled and analyzed

using Google Earth Engine, which applied a supervised learning algorithm to identify

per-pixel tree cover.

Many scientists and researchers globally have started to utilize the full temporal

resolution of MODIS and Landsat datasets to detect and track trends in vegetation index


products by automating time-series analysis of the satellite imagery. Vegetation Indices

(VIs) are combinations of surface reflectance at two or more wavelengths designed to

highlight a particular property of vegetation. One of the more commonly used

vegetation indices is the Normalized Difference Vegetation Index (NDVI), which serves

to quantify healthy, green vegetation (Daughtry et al. 2005). The NDVI is created using

a normalized ratio of reflectance in two wavelength regions: the red region (~600-700 nm) of the visible

spectrum and the near-infrared (NIR) region (~700-1300 nm). Recently, a team of

researchers from Wageningen University (Jan Verbesselt, Loic Dutrieux and Ben

DeVries) designed and implemented a package in the R statistical programming

language called ‘Breaks For Additive Season and Trend’, or BFAST, that allows for

the detection of breaks from a seasonal trend in a time-series of Vegetation Index values.

The BFAST algorithm in R includes a set of utilities and wrappers to perform change

detection on spatially gridded, time-series satellite data (Landsat and MODIS).

Essentially, they have used historical NDVI data over several years to create a trend line

based on the seasonality of the forest. Then, once a break from the trend is detected

using statistical methods in the algorithm, a magnitude is calculated for that break,

which is then used to determine if deforestation has occurred. In the next section, I

explain in detail the composition of the BFAST algorithm family.

2.4 BFAST algorithm family

The name of the BFAST package for the R statistical programming language stands for

Breaks For Additive Season and Trend; the package was developed by Verbesselt et al. (2010).

The function accepts a univariate time-series object as an input along with other

adjustable parameters. For each pixel in a Landsat scene, the time-series is used to

create a best-fit seasonal regression model with a trend component. Seasonal regression

models recommended by previous studies (DeVries et al., 2015) are first-order

harmonic, which allows better description of the trajectories of pixel values in natural

systems under seasonal changes in precipitation and phenology (Figure 1).


The first-order harmonic model is described by this equation:

$$ y_t = \alpha + \gamma \sin\left( \frac{2 \pi t}{f} + \delta \right) + \varepsilon_t $$

where yt and t are the response (dependent variable) and time (independent

variable), f is the temporal frequency, α is the intercept, γ and δ are the amplitude and

phase of the harmonic component, and εt is the residual (noise component).
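
A first-order harmonic model of this form can be fitted by ordinary least squares once the sine term is linearized as a·sin(2πt/f) + b·cos(2πt/f), with γ = √(a² + b²) and δ = atan2(b, a). The following is a minimal Python sketch of that idea and not the implementation used by BFAST. It assumes that time is measured in days and that f = 365.25, and it uses a synthetic, noise-free series; all function and variable names are hypothetical.

```python
import math

def solve_linear_system(a, b):
    """Solve a small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_first_order_harmonic(times, values, f):
    """OLS fit of y = alpha + gamma*sin(2*pi*t/f + delta); returns (alpha, gamma, delta)."""
    rows = [[1.0, math.sin(2 * math.pi * t / f), math.cos(2 * math.pi * t / f)]
            for t in times]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    alpha, a, b = solve_linear_system(xtx, xty)
    return alpha, math.hypot(a, b), math.atan2(b, a)

def predict(t, alpha, gamma, delta, f):
    """Expected value of the harmonic model at time t."""
    return alpha + gamma * math.sin(2 * math.pi * t / f + delta)

# Synthetic NDVI-like series: one observation every 16 days (assumed), about four years.
f = 365.25
times = [16.0 * i for i in range(92)]
values = [predict(t, 0.6, 0.25, 0.8, f) for t in times]

alpha, gamma, delta = fit_first_order_harmonic(times, values, f)
print(alpha, gamma, delta)
```

Because the synthetic series contains no noise, the fitted parameters should match the values used to generate it; with real Landsat observations the residuals εt would be non-zero and irregularly spaced in time.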

The algorithm then detects whether the observed data deviate significantly from

the model and, if so, flags a breakpoint with a magnitude of deviation from the trend.

Breakpoints are detected within a user-defined monitoring period by computing

ordinary least squares (OLS-based) moving sums (MOSUM) of residuals using

observations from a selected fraction of the history period (defined by the h value):

$$ MO_t = \frac{1}{\sigma \sqrt{n}} \sum_{s=t-h+1}^{t} \left( y_s - \hat{y}_s \right) $$

where y and ŷ are actual and expected observations, respectively, n is the number of

sample observations, h is the MOSUM bandwidth (fraction of the number of sample

observations), and 𝜎 is an estimator of the variance (DeVries et al., 2015). A breakpoint

is signaled when |MOt| deviates from zero beyond a 95% significance boundary (Figure

2).

Figure 1. Example of a first-order harmonic model fitted to real Landsat observations

(pixel values), from DeVries et al. (2015), demonstrating the sequential monitoring approach

for detecting breaks (red line).
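
The moving-sum idea described here can be illustrated with a short Python sketch. This is a simplified illustration and not the ‘bfastmonitor’ implementation. Several choices are assumptions made only for the example: the window width is taken as the integer part of h·n, σ is estimated as the residual standard deviation of the history period with n − 3 degrees of freedom (three parameters for a first-order harmonic model), and the boundary passed to `first_crossing` is a placeholder constant rather than the actual 95% critical boundary.

```python
import math

def mosum_process(history_residuals, monitoring_residuals, h=0.25, n_params=3):
    """Scaled moving sums of residuals for each observation in the monitoring period."""
    n = len(history_residuals)
    if n <= n_params:
        raise ValueError("history period is too short to estimate sigma")
    width = max(1, int(h * n))  # window width in observations (assumption)
    # Residual standard deviation from the history period (assumption).
    sigma = math.sqrt(sum(r * r for r in history_residuals) / (n - n_params))
    residuals = list(history_residuals) + list(monitoring_residuals)
    scale = sigma * math.sqrt(n)
    process = []
    for t in range(n, len(residuals)):
        window = residuals[t - width + 1 : t + 1]
        process.append(sum(window) / scale)
    return process

def first_crossing(process, boundary):
    """Index (within the monitoring period) where |MO_t| first exceeds the boundary."""
    for i, value in enumerate(process):
        if abs(value) > boundary:
            return i
    return None

# Toy residuals: small alternating noise, then a sustained negative deviation.
history = [0.02 if i % 2 == 0 else -0.02 for i in range(40)]
monitoring = [0.02, -0.02, 0.02, -0.02, -0.3, -0.3, -0.3, -0.3]

process = mosum_process(history, monitoring, h=0.25)
break_index = first_crossing(process, boundary=2.0)  # placeholder boundary
print(break_index)
```

In this toy series the moving sum stays near zero while the residuals only alternate around the model, and it crosses the placeholder boundary at the first observation of the sustained drop.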

In addition, BFAST allows the computation of change magnitude (M) for each

breakpoint detected by taking the median of residuals within the monitoring period, in

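which tn ≤ ti ≤ tN:

$$ M = \operatorname{median}\left\{ y_{t_i} - \hat{y}_{t_i} \right\}, \quad t_n \le t_i \le t_N $$

where yt and ŷt are actual and expected observations, respectively, based on the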

methods used by DeVries et al. (2015). BFAST registers the time when the breakpoint

was detected within the monitoring period.
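
As a small illustration of the magnitude calculation, the sketch below takes the median of the residuals (observed minus expected) over a monitoring period. The observed and expected values are invented for the example, and the function name is hypothetical.

```python
import statistics

def change_magnitude(observed, expected):
    """Median of residuals (observed - expected) over the monitoring period."""
    residuals = [y - y_hat for y, y_hat in zip(observed, expected)]
    return statistics.median(residuals)

# Invented NDVI-like values: the last three observations fall well below the model.
observed = [0.71, 0.69, 0.32, 0.30, 0.35]
expected = [0.70, 0.70, 0.68, 0.66, 0.65]

magnitude = change_magnitude(observed, expected)
print(round(magnitude, 2))
```

A strongly negative magnitude, as in this example, is the kind of signal that would be interpreted as vegetation loss, whereas values near zero suggest that a flagged break may be noise.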

There are several parameters that can be modified in BFAST, but the most

significant are:

▪ formula – regression model formula (harmonic and/or trend component)

▪ order – order of the harmonic term

▪ start – starting date of the monitoring period

▪ history – specification of the stable history period

▪ h – numeric scalar between 0 and 1 specifying bandwidth relative to the sample

size in the MOSUM monitoring process

Figure 2. Example of a breakpoint flagged on a single pixel by BFAST

The ‘bfastmonitor’ function in R was later adapted to run on spatial data, since

the time-series input for ‘bfastmonitor’ is univariate and it cannot accept data in a raster

format. The adapted version, ‘bfmSpatial’, accepts a raster brick as an input

and runs ‘bfastmonitor’ on every pixel of an image. A raster brick is an object class in

R that holds multiple layers in a single object; in this case the layers

correspond to Landsat images, with each layer from a specific date. The output of

‘bfmSpatial’ is a raster brick whose default layers are breakpoint, magnitude and

error. Supplementary layers (history, r.squared, adj.r.squared, and coefficients) are available for

further external statistical analysis, but were not used for the purposes of this research.
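
The per-pixel pattern described above (extract the time series at each pixel of a layered stack, then run a univariate routine on it) can be sketched in Python as follows. This is an illustration of the pattern only. The nested-list "brick" and the `toy_break_detector` function are hypothetical stand-ins and do not reproduce the raster brick class or ‘bfastmonitor’.

```python
def apply_per_pixel(brick, pixel_function):
    """Apply a univariate time-series function to every pixel of a layer stack.

    brick[layer][row][col] holds one value per date (layer) and pixel.
    Returns a 2D grid with one result per pixel.
    """
    n_rows = len(brick[0])
    n_cols = len(brick[0][0])
    result = []
    for row in range(n_rows):
        result_row = []
        for col in range(n_cols):
            series = [layer[row][col] for layer in brick]  # one pixel through time
            result_row.append(pixel_function(series))
        result.append(result_row)
    return result

def toy_break_detector(series, threshold=0.3):
    """Hypothetical stand-in: index of the first drop larger than threshold, else None."""
    for i in range(1, len(series)):
        if series[i - 1] - series[i] > threshold:
            return i
    return None

# Three dates (layers) of a 2 x 2 pixel image with invented vegetation index values.
brick = [
    [[0.80, 0.70], [0.75, 0.80]],
    [[0.80, 0.70], [0.30, 0.80]],
    [[0.30, 0.70], [0.30, 0.80]],
]

breaks = apply_per_pixel(brick, toy_break_detector)
print(breaks)
```

The output grid plays the role of a single output layer (here, the position of the break), which mirrors how one layer of the output brick stores one result per pixel.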

Other research conducted utilizing the BFAST family of algorithms has shown

promising results. Verbesselt et al. (2010) found that BFAST accurately detected

significant phenological changes, both abrupt and gradual, over long periods of time

with an ability to filter out noise, or false positive breaks (although the quality of data

was noted as an important factor in handling noise). However, later research conducted

by Schultz et al. (2016) found several error sources related to the BFAST algorithm

including topography, atmosphere, edge effects and data availability and variance. All

of these factors contribute to commission error, but data availability is particularly

important in that the number of observations in the monitoring period significantly

affects accuracy and omission errors. The density of the time-series is key: the

more data that are available, the better a model can be fit, and further advancements in

data availability (i.e., data repositories, cloud computing, and other Landsat-like sensors such

as Sentinel-1 and Sentinel-2 with higher temporal, spatial and spectral resolution) will allow for

increased data density to fill in any potential gaps (Schultz et al., 2016).

2.5 The challenges of detecting and tracking deforestation in TDF

Despite the many advances discussed, significant challenges still present

themselves when tracking deforestation, especially within TDF ecosystems. The

persistent small-scale changes in these landscapes (usually related to small-holder

agricultural expansion) and natural temporal patterns of leaf senescence are two major

constraints to the accurate mapping and accounting of deforestation (DeVries et al.,

2015). Patterns of senescence are prominent in tropical dry forests due to the

pronounced seasonality that is a feature of TDF. As water stress increases, senescence

increases as well, and the severity and length of dry periods will have an effect on the

level of senescence that TDF species experience. Senescence can present

complications in a remote sensing capacity, especially in regard to disturbance /

deforestation tracking, but when observed over long enough periods of time,

phenological patterns can be distinguished from disturbance or deforestation events.

Fitting the right regression model over the pixel values for tropical dry forest

ecosystems ensures that phenological patterns are taken into account when estimating a

break in the series.

2.6 Other change detection algorithms

There are other change detection algorithms that are also currently being

researched, which show promise in annual as well as near real-time change detection.

Two examples of these algorithms are the Continuous Change Detection and

Classification (CCDC) algorithm developed by Zhu and Woodcock (2014) and the

Landsat-based Detection of Trends in Disturbance and Recovery (LandTrendr)

algorithm developed by Kennedy et al. (2010). The CCDC algorithm utilizes all

spectral bands from Landsat within a different mathematical model over each individual

pixel. The continuous aspect of the algorithm implies near real-time functionality, as

the algorithm is intended to have the capacity to detect changes with each newly added

image. LandTrendr on the other hand recognizes the limiting factors of Landsat data,

which include the 16-day temporal cycle of Landsat, cloud cover issues, as well as data

collection gaps. Because of this, LandTrendr utilizes an annual temporal scale in year-


to-year change detection. LandTrendr is also a pixel based system like CCDC and

BFAST, but the LandTrendr algorithm allows for smoothing over longer periods

reducing spectral noise, as well as capture of more abrupt unsmoothed events, which

combines the trend-seeking and deviation-seeking approaches of previous studies.

BFAST was selected over other well-known disturbance monitoring

algorithms, such as CCDC and LandTrendr, because it has been shown to be more resistant

to noise and missing data (due to persistent cloud cover) and it produces monthly

information on breakpoints and trends.

CHAPTER III

MATERIALS AND METHODS

3.1 Study area and project context

For this study, I selected my study sites within two tropical dry forest ecoregions of

Mesoamerica: the dry forests of the Yucatan Peninsula, Mexico; and the dry forests of the

Guanacaste Conservation Area, Costa Rica (Figure 3). Both ecoregions have distinctive

and contrasting land use histories, landscape distribution, species composition,

management regimes and anthropogenic threats.

Figure 3. Location of the study areas in Mesoamerica: A) Yucatan, Mexico and B)

the Guanacaste Region, Costa Rica. Green coverage refers to Tropical Dry Forest

extent mapped by Portillo-Quintero and Sanchez-Azofeifa (2010) for Mesoamerica.

3.1.1 Yucatan, Mexico

The Yucatán Peninsula is located in southeastern Mexico and separates the

Caribbean Sea from the Gulf of Mexico. The tropical sub-humid climate becomes drier

moving towards the central portion of the region, with a pronounced dry season lending

to the deciduous nature of the forests, receiving less than 1200 mm/year of rainfall.

Additionally, the dry forests of the Yucatán are isolated from other dry forests by sea

and vast rainforests, which has created a region with a unique composition of

biodiversity.

The tropical dry forests of the Yucatan Peninsula are among the most threatened

ecosystems in America. Rotational crops (milpas) are a widespread practice in the

region. Forests are cleared for the establishment of crops (mainly corn), and then after

two to three years the land is abandoned and vegetation is allowed to regrow, while the

adjacent parcels with secondary vegetation are cleared for establishing another crop

(Garcia-Frapolli, 2007). This cycle is evident in the dynamics of land use and land cover

in the region. However, the expansion of agribusiness practices, tourism and

cattle ranching in the area has contributed to increased rates of forest

conversion in this region. Many square kilometers of dry forest have also been

replaced either by henequén plantations, or by secondary communities that arise from

intense cattle grazing (Valero et al., 2017). I selected two (2) sites to implement this

methodology in the Yucatan peninsula (Figure 4): a) East of Tekit, Yucatan, Mexico,

and b) Tekik de Regil, Yucatan, Mexico.

3.1.1.1 Study Area 1 (Y1): East of Tekit

The site east of Tekit will be referred to from now on as the ‘Y1’ site (Figure 4).

The study site is centered on the following coordinates (UTM 16N WGS84

261754.47 m E, 2281785.68 m N), covering 100 km² of tropical dry forest dominated

landscape, ten kilometers northeast of the rural town of Tekit, which has a population

of around 10,000 inhabitants.

3.1.1.2 Study Area 2 (Y2): Tekik de Regil

The site Tekik de Regil will be referred to from now on as the ‘Y2’ site (Figure 4).

The study site is centered on the following coordinates (UTM 16N WGS84

235655.17 m E, 2305423.92 m N), also covering 100 km² of tropical dry forest

dominated landscape, fifteen kilometers southeast of the city of Merida, capital of

Yucatan state, which has a population of around 800,000 inhabitants.

Figure 4. Location of Y1 and Y2 sites in the Yucatan Peninsula, Mexico. Green

coverage refers to Tropical Dry Forest extent mapped by Portillo-Quintero and

Sanchez-Azofeifa (2010) for Mesoamerica.

3.1.2 Guanacaste, Costa Rica

The Guanacaste Conservation Area is in the northwestern part of Costa Rica and

consists of two geographical zones: the Nicoya Peninsula and the Tempisque Northeast

Basin. It contains three national parks, as well as wildlife refuges and other nature

reserves that are managed by the Sistema Nacional de Areas de Conservacion (SINAC).

Because of this protected status, the area contains the largest amount of continuous,

undisturbed tropical dry forest from Mexico to Panama with approximately 120,000

terrestrial hectares. This area also experiences a pronounced dry period typical of

tropical dry forest, when at least 80% of the trees lose their leaves and stand leafless for

three to five months. The area receives between 800 and 2600 mm of rainfall, typically

between May and November. The Guanacaste area is characterized by a mix of tropical

dry and moist ecological zones with steep terrain and thin or infertile soils that are

mostly classified as unsuitable for agriculture (Calvo-Alvarado et al. 2009). I selected

two (2) sites to implement this methodology in Guanacaste (Figure 5): a) Cuajiniquil-Soley,

Guanacaste, Costa Rica, and b) North of Bijagua, Alajuela Province, Costa Rica.

3.1.2.1 Study Area 1 (G1): Cuajiniquil-Soley

The site east of Cuajiniquil-Soley will be referred to from now on as the ‘G1’ site

(Figure 5). The study site is centered on the following coordinates: UTM 16N

WGS84 645540.18 m E, 1213377.58 m N, covering 50 km² of tropical dry forest

dominated landscape, within the Guanacaste Conservation Area in the Tempisque

Northeast Basin.

3.1.2.2 Study Area 2 (G2): North of Bijagua

The site north of Bijagua will be referred to from now on as the ‘G2’ site (Figure 5).

The study site is centered on the following coordinates: UTM 16N WGS84

708810.64 m E, 1195708.74 m N, covering 80 km² of human-dominated and fragmented

tropical dry forest landscape, outside of protected areas, and in the proximity of the

Arenal Volcano. The area has a low-density population of rural cattle ranching farmers

and small towns.

These sites were selected because they represent areas where deforestation

has occurred annually (at low or high levels) since 2001, as verified by the

GFW tree loss dataset (http://www.globalforestwatch.org/map). Given that processing

larger volumes of data (one complete Landsat scene) takes several hours,

each study area covered between 50 and 100 square kilometers of land. This

size was identified as optimal for this study because it allowed us to repeat and iterate

‘bfastSpatial’ processing of Landsat time series multiple times using the power of a

desktop PC (16 GB RAM) for research purposes. The size also allowed us to purchase

complete coverage of high resolution imagery (GeoEye, WorldView) for the same

locations for inspection and initial validation purposes.

Figure 5. Location of G1 and G2 sites in the Guanacaste Region, Costa Rica. Green

coverage refers to Tropical Dry Forest extent estimated by the GlobCover2009

project for Mesoamerica.

3.2 Data acquisition and preprocessing

Satellite images used for this analysis consisted of multispectral images from the

Landsat 7 Enhanced Thematic Mapper + (ETM+) and Landsat 8 Operational Land

Imager (OLI) sensors.

Landsat 7 was launched in 1999 and utilizes a whisk broom scanning approach

that uses a single detector and mirror to acquire data one pixel at a time by scanning

back and forth. However, these scanners have more moving parts that are subject to

failure, as in 2003 when the scan line corrector failed, creating data gaps. While

approximately 75% of the data for each scene is collected, it still creates an issue for

time-series data analysis. Landsat 8 on the other hand uses a push broom scanner, which

has multiple detectors that scan a line of pixels all at once, thus being less susceptible

to the wear and tear of having more moving parts. The Landsat 8 satellite has been

operational without error since 2013. Landsat 7 and 8 are both similar in terms of spatial

and temporal resolution, with a 30-meter resolution (each pixel is 30 x 30 m) and 16-day

revisit time. However, they differ slightly in their spectral resolution, with Landsat 7

having 8 bands and Landsat 8 having 11 bands with some variation in their position and

range within the electromagnetic spectrum (see Figure 6 for comparison).

I used the USGS GloVis and EarthExplorer applications to retrieve Landsat scene

lists of all available scenes without any filters such as cloud cover, which were then

used as inputs in the USGS ESPA ordering system. The USGS ESPA ordering system

provides additional data output products such as calculated vegetation indices, product

metadata, surface reflectance, top of atmosphere reflectance, brightness temperature,

and a pixel QA band that is used to create a cloud mask in R via the processLandsatBatch

function. The USGS ESPA ordering system also allows for image preprocessing such

as reprojection, image extent modification and pixel resizing.
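
The idea of deriving a cloud mask from a bit-packed pixel QA band can be sketched as follows. In this study the masking was handled in R by the processLandsatBatch function; the Python sketch below is only an illustration of the principle. The bit positions (0 for fill, 3 for cloud shadow, 5 for cloud) and the example QA values are assumptions based on the Landsat Collection 1 pixel QA layout and should be checked against the USGS product guide before use.

```python
# Assumed bit positions in the pixel QA band (Collection 1 layout; verify before use).
FILL_BIT = 0
CLOUD_SHADOW_BIT = 3
CLOUD_BIT = 5

# Bit mask combining every flag that makes an observation unusable.
UNUSABLE_FLAGS = (1 << FILL_BIT) | (1 << CLOUD_SHADOW_BIT) | (1 << CLOUD_BIT)

def is_usable(qa_value):
    """True when none of the fill, cloud shadow, or cloud bits are set."""
    return (qa_value & UNUSABLE_FLAGS) == 0

def mask_index(vi_values, qa_values, nodata=None):
    """Replace vegetation index values with nodata wherever the QA band flags a problem."""
    return [v if is_usable(q) else nodata for v, q in zip(vi_values, qa_values)]

# Invented example: four pixels with their vegetation index and QA values.
vi_values = [0.80, 0.20, 0.40, 0.00]
qa_values = [66, 224, 72, 1]  # assumed: clear, cloud, cloud shadow, fill

masked = mask_index(vi_values, qa_values)
print(masked)
```

Masked observations simply become gaps in the per-pixel time series, which is why the density of cloud-free observations matters for the later model fitting.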

For the Y1 and Y2 sites, Landsat imagery from Path/Row 20/46 scenes was

acquired through ESPA. A total of 241 images available between 2000 – 2016 were

processed. For G1 and G2, Landsat imagery from Path/Row 16/52 and 16/53 scenes

was acquired through ESPA. A total of 224 images available between 2000 – 2016 were

processed.
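
To give a sense of the data volume involved, the short calculation below works out the area of one Landsat pixel, the number of pixels in a 100 km² study site, and an upper bound on the number of per-pixel observations for the 241 images of the Yucatan sites. The upper bound assumes that every image covers the whole site with no cloud or scan-line gaps, which will not hold in practice.

```python
pixel_side_m = 30

# Area of one pixel in hectares (1 ha = 10,000 m^2).
pixel_area_ha = pixel_side_m ** 2 / 10_000

# Number of 30 m pixels in a 100 km^2 study site (1 km^2 = 1,000,000 m^2).
site_area_km2 = 100
pixels_per_site = site_area_km2 * 1_000_000 / pixel_side_m ** 2

# Upper bound on observations per vegetation index for one site (assumes full coverage).
n_images = 241
max_observations = pixels_per_site * n_images

print(f"Pixel area: {pixel_area_ha} ha")
print(f"Pixels per site: {pixels_per_site:,.0f}")
print(f"Maximum observations per index: {max_observations:,.0f}")
```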

The Landsat imagery products requested through ESPA corresponded to

vegetation indices (VIs). The VIs used in this study included the normalized difference

vegetation index (NDVI), enhanced vegetation index (EVI), normalized burn ratio

(NBR), normalized burn ratio 2 (NBR2), modified soil-adjusted vegetation index

(MSAVI), and normalized difference moisture index (NDMI), which are explained in

the next section.

Figure 6. Comparison of spectral bands for Landsat 7 (L7 ETM+) and Landsat 8

(OLI & TIRS)

Unfortunately, the USGS ESPA ordering system made significant changes

during the course of this research, which presented issues that are currently in the

process of being resolved. The USGS ESPA ordering system changed its Landsat file

naming convention when it switched to processing only Landsat Collection 1

images as opposed to Landsat pre-collection images. This new naming convention

was not initially recognized by the ‘bfmSpatial’ algorithm, but in working with the

developers an initial workaround was put into place, with a more permanent solution

currently being established.

3.3 Vegetation indices

Vegetation indices are spectral comparison functions of two or more bands on

the electromagnetic spectrum intended to emphasize various properties of vegetation.

For example, the normalized difference vegetation index (NDVI) is a ratio comparing

near-infrared and red reflectance values, as healthy vegetation typically displays very

low reflectance in the red band with high reflectance in the near-infrared band (see

Figure 7).

Figure 7. Example of the electromagnetic signature of healthy green vegetation

and associated absorption and reflectance features.

The formulas to obtain all indices are as follows:

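▪ $\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$

▪ $\mathrm{EVI} = 2.5 \cdot \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + 6 \cdot \mathrm{Red} - 7.5 \cdot \mathrm{Blue} + 1}$

▪ $\mathrm{NBR} = \frac{\mathrm{NIR} - \mathrm{SWIR2}}{\mathrm{NIR} + \mathrm{SWIR2}}$

▪ $\mathrm{NBR2} = \frac{\mathrm{SWIR1} - \mathrm{SWIR2}}{\mathrm{SWIR1} + \mathrm{SWIR2}}$

▪ $\mathrm{MSAVI} = \frac{2 \cdot \mathrm{NIR} + 1 - \sqrt{(2 \cdot \mathrm{NIR} + 1)^2 - 8\,(\mathrm{NIR} - \mathrm{Red})}}{2}$

▪ $\mathrm{NDMI} = \frac{\mathrm{NIR} - \mathrm{SWIR1}}{\mathrm{NIR} + \mathrm{SWIR1}}$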

Where:

NIR – Near-infrared band (Band 4 in Landsat 7 and Band 5 in Landsat 8)

Red – Red band (Band 3 in Landsat 7 and Band 4 in Landsat 8)

2.5 – Gain factor for correction

6 & 7.5 – Coefficients of aerosol resistance term

Blue – Blue band (Band 1 in Landsat 7 and Band 2 in Landsat 8)

1 – Canopy background adjustment

SWIR1 – Short-wave infrared 1 (Band 5 in Landsat 7 and Band 6 in Landsat 8)

SWIR2 – Short-wave infrared 2 (Band 7 in Landsat 7 and 8)
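
The six formulas and the band assignments listed above can be expressed compactly in code. In this study the indices were delivered precomputed by ESPA, so the Python sketch below is only an illustration of the formulas. The reflectance values in the example are invented, the function and dictionary names are hypothetical, and no guard against zero denominators is included.

```python
import math

# Band numbers per sensor, as listed in the text.
BANDS = {
    "landsat7": {"blue": 1, "red": 3, "nir": 4, "swir1": 5, "swir2": 7},
    "landsat8": {"blue": 2, "red": 4, "nir": 5, "swir1": 6, "swir2": 7},
}

def vegetation_indices(blue, red, nir, swir1, swir2):
    """Compute the six indices from surface reflectance values (scaled 0 to 1)."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "NBR": (nir - swir2) / (nir + swir2),
        "NBR2": (swir1 - swir2) / (swir1 + swir2),
        "MSAVI": (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "NDMI": (nir - swir1) / (nir + swir1),
    }

# Invented reflectance values resembling healthy green vegetation.
indices = vegetation_indices(blue=0.03, red=0.05, nir=0.45, swir1=0.20, swir2=0.10)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Keeping the band numbers in a lookup table makes the sensor difference explicit: the same formula applies to both sensors, and only the band that supplies each reflectance value changes.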

NDVI is a commonly used vegetation index that measures green, healthy

vegetation as it utilizes the regions of the electromagnetic spectrum most associated

with high absorption by chlorophyll in the red band and high reflectance in the NIR band

by leaf mesophyll layers (Jensen 2016). EVI was developed as an improvement to

NDVI as it corrects potential NDVI saturation issues due to areas with a high leaf area


index (LAI), which is an estimate that characterizes foliage cover and plant canopies

(Exelis, 2017). NBR and NBR2 both utilize infrared bands that are most sensitive to

changes related to fire and are significant indicators of burn severity. NBR uses a

combination of band 5 (Near infrared) and band 7 (shortwave infrared) from Landsat 8,

while NBR2 uses both shortwave infrared bands 6 and 7 (Boer et al. 2008). MSAVI is

a modified or improved version of the Soil-Adjusted Vegetation Index (SAVI) that has

an adjustment factor to minimize soil noise that is usually picked up by NDVI. This

adjustment factor is iterated continuously in the MSAVI, which increases the dynamic

range of SAVI, optimizing soil adjustments (Qi et al. 1994). NDMI improves upon

NDVI in its ability to track water stress and plant biomass changes more closely, as the

bands used highly correlate with water content of canopies. NDMI is similar to NBR,

but it uses band 6 as the shortwave infrared information (Jensen 2016).

3.4 Open-source and licensed software

Many software tools were used to complete this research, primarily R, RStudio,

ArcGIS, and Google Earth. R is an open-source, object-oriented statistical

programming language which was used in combination with RStudio, an integrated

development environment that has robust features such as code editing, debugging and

various graphics and visualization tools. The R language is widely used amongst

researchers and data specialists for developing statistical software and analyzing data,

and has continued to rise in popularity since the release of the first stable version

in 2000 (R version 3.3.3 was used for this research). The open-source nature of R makes

it quite accessible: several packages with advanced functionality and thorough

documentation are built in, and more are continually developed and distributed

through repositories such as the Comprehensive R Archive Network

(CRAN). ArcGIS is a powerful mapping and analytics platform that was used to

analyze output data produced by ‘bfastSpatial’ in R. ArcGIS is a licensed software

product developed by ESRI that was initially released in 1999 and is currently in its 10th

version (ArcGIS 10.3.1 was used for this research). ArcGIS is used for a variety of

purposes and has many capabilities including spatial analytics, mapping and

visualization, 3D modeling and visualization, real-time GIS applications, remote

sensing imagery, and data collection and management.

Google Earth is another powerful mapping platform that has many built-in

datasets that were used for validating results. Google Earth is freely available and was

initially released in 2001 as Keyhole EarthViewer, before its acquisition by Google. It is also rather intuitive to use, making it a widely

popular platform. The program uses satellite imagery to create a 3D rendition of Earth,

which can be navigated very simply, and allows for the addition of layers from existing

Google datasets or custom, user-created layers. Google Earth was particularly useful in

this research for validation of the accuracy of ‘bfastSpatial’ outputs as well as for the

validation of near-real-time functionality as Google Earth has time-lapse data built in

that allows users to inspect high spatial resolution imagery at variable time frames.

3.5 System architecture

The general approach consists of first acquiring Landsat 7 and 8 images from 2000 to
2016, which are then used as inputs for the ‘bfastSpatial’ algorithm. The images are
first stacked into a single raster brick. Then ‘bfastSpatial’ analyzes each pixel
individually by fitting a seasonal trend model to the observed vegetation index data
over time. This model is the basis of comparison against observed data in a specified
monitoring period: if the observed data deviate from what the model predicts, a
breakpoint is flagged, together with a magnitude indicating how severe the departure
from the trend was.

As a first step, pre-processed data acquired from the USGS ESPA ordering system are
placed into a directory named “landsat” (see Figure 8 for details of the directory
architecture).

These directory paths also need to be defined within the R environment with a command
such as: landsatDir <- file.path(stepDir, 'landsat') (see Appendix A for the exact
code).
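The folder layout of Figure 8 can also be sketched directly in R. This is an illustration only: apart from landsatDir and stepDir, the variable names are assumptions, and a temporary folder stands in for the real project location. The thesis workflow itself creates the folders outside of R.

```r
# Hypothetical project root; a temporary folder keeps the sketch self-contained
path <- file.path(tempdir(), "bfast_project")

dataDir    <- file.path(path, "data")        # stores VI stacks
stepDir    <- file.path(path, "datastep")    # intermediate products
landsatDir <- file.path(stepDir, "landsat")  # ESPA downloads are placed here
ndviDir    <- file.path(stepDir, "ndvi")     # one folder per vegetation index
outDir     <- file.path(path, "out")         # stores outputs

# file.path() only builds the path string; dir.create() makes the folder on disk
for (d in c(dataDir, landsatDir, ndviDir, outDir)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

all(dir.exists(c(dataDir, stepDir, landsatDir, ndviDir, outDir)))
```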

The function ‘processLandsatBatch’, which is part of the ‘bfastSpatial’ package, is
then used to extract data for all of the vegetation indices and apply a cloud mask via
the ‘pixel_qa’ layer, a quality assessment band that flags cloud, cloud confidence,
cloud shadow, snow/ice, and water. This mask allows low-quality pixels to be excluded
from the analysis.
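To illustrate how a bit-packed quality band of this kind can be turned into a mask, the sketch below decodes individual flags in base R. The bit positions are an assumption based on the Landsat Collection 1 surface reflectance ‘pixel_qa’ layout and should be checked against the product guide for the data actually ordered; the function names are hypothetical and are not part of ‘bfastSpatial’.

```r
# Assumed bit positions in 'pixel_qa' (verify against the product guide):
# bit 1 = clear, bit 2 = water, bit 3 = cloud shadow, bit 4 = snow, bit 5 = cloud
qa_bit <- function(qa, bit) bitwAnd(qa, as.integer(2^bit)) > 0

# A pixel is kept when it is flagged clear and not cloud, shadow, or snow
is_usable <- function(qa) {
  qa_bit(qa, 1) & !qa_bit(qa, 3) & !qa_bit(qa, 4) & !qa_bit(qa, 5)
}

# Hypothetical QA values: two clear pixels, one cloud shadow, one cloud
qa <- c(66L, 322L, 72L, 224L)
is_usable(qa)  # TRUE TRUE FALSE FALSE
```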

The vegetation index data are stored in separate directories, which are then used to
create raster brick objects via ‘timeStack’, another ‘bfastSpatial’ function. Each
raster brick contains one layer per image, and each layer holds vegetation index data
with an associated date between 2000 and 2016. These layers stacked together into a
brick form the time series used as the input for ‘bfmSpatial’.

Figure 8. Folder architecture that needs to be created outside of the R environment on
the computer. Folders shown: data (stores VI stacks); datastep, containing ndvi (and
other VI folders) and landsat; out (stores outputs).

Once a vegetation index is bricked it can be run through ‘bfmSpatial’, which has

several parameters and inner workings to be considered.

First, the date vector object ‘dates’ is created by acquiring the scene information

provided in the Landsat scene ID. Next, sensor information is acquired for the

subsetting of data by sensor. Then the length of the coefficient vector is determined based

on the formula selected (trend and/or harmonic). At this point the system is set to run

the iterative function that runs ‘bfastmonitor’ on every pixel over the raster brick. The

brick is first subset by sensor, which can be used to limit the analysis by sensor (all

sensors were used in this study), and then converted to a BFAST time-series object by

the ‘bfastts’ function as ‘bfastmonitor’ does not accept raster class objects.
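The way acquisition dates and sensor codes can be read from a scene ID is sketched below in base R. The scene IDs are hypothetical, and the character positions assume the pre-collection Landsat scene ID layout (sensor, path, row, year, day of year); this illustrates the idea rather than the package's internal code.

```r
# Hypothetical pre-collection Landsat scene IDs
scene_ids <- c("LE70200462012010EDC00", "LC80200462015084LGN00")

# Characters 10-16 hold the acquisition year and day of year (YYYYDDD)
dates <- as.Date(substr(scene_ids, 10, 16), format = "%Y%j")

# Characters 1-3 identify the sensor (LE7 = ETM+, LC8 = OLI)
sensor <- substr(scene_ids, 1, 3)

data.frame(scene = scene_ids, sensor = sensor, date = dates)
```

The resulting date vector is what allows each layer of the brick to be placed on a time axis before the per-pixel series is built.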

After BFAST time-series creation, ‘bfastmonitor’ is run on every pixel with the

following parameters:

▪ data = time-series raster brick

▪ start = start of monitoring period (2013 in this study)

▪ formula = response ~ harmonic

▪ order = 1

▪ lag = NULL

▪ slag = NULL

▪ history = c(“all”)

▪ type = “OLS-MOSUM”

▪ h = 0.25

▪ end = 10

▪ level = 0.05

The most important of these parameters are ‘formula’, ‘order’ and the ‘h value’, as
they determine how the model from which breaks are detected is built.
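For a single pixel, a call with the parameters listed above might look like the following sketch. The time series is synthetic (a regular 16-day series with a simulated drop in 2015), so the numbers are assumptions for illustration only. Note that the ‘bfast’ package names the harmonic term ‘harmon’ in model formulas.

```r
library(bfast)

set.seed(1)
# Synthetic vegetation index series, 2000-2016, 23 observations per year
n  <- 23 * 17
tt <- seq(2000, by = 1 / 23, length.out = n)
vi <- 0.7 + 0.15 * sin(2 * pi * tt) + rnorm(n, sd = 0.03)
vi[tt >= 2015] <- vi[tt >= 2015] - 0.3   # simulated clearing in 2015

pixel <- ts(vi, start = c(2000, 1), frequency = 23)

# First-order harmonic model, full history, monitoring from 2013 onward
fit <- bfastmonitor(pixel, start = c(2013, 1),
                    formula = response ~ harmon, order = 1,
                    history = "all", type = "OLS-MOSUM",
                    h = 0.25, end = 10, level = 0.05)

fit$breakpoint  # time of the detected break, NA if none was found
fit$magnitude   # median residual within the monitoring period
```

Real Landsat series are irregular and contain gaps from masking, so results on actual pixels will be noisier than in this idealized case.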

The ‘formula’ parameter refers to the seasonal model to be fitted to the Landsat
observations, ‘order’ refers to the order of the harmonic terms in the model, and
‘h-value’ refers to the fraction of values from the history period (all data) that is
used to compute the OLS-MOSUM statistic. For the period 2000-2016, an ‘h-value’ of 0.25
(25%) therefore allows a 4-year window of data to be included in the OLS-MOSUM
computation. At this h-value, only one break can be detected every four years.
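The window arithmetic in the paragraph above can be checked with a few lines of R. The calendar span follows the text; the observation count is a hypothetical figure, and taking the window as floor(h * n) is an assumption about how the bandwidth is applied to the available observations.

```r
h <- 0.25

# Calendar view used in the text: 2000-2016 spans 16 years
span_years <- 2016 - 2000
h * span_years                 # 4 years

# Observation view: with an assumed number of usable observations,
# the moving-sum window covers floor(h * n) of them
n_history <- 120               # hypothetical count of clear observations
floor(h * n_history)           # 30 observations per window
```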

Previous research from Verbesselt et al. (2012) and DeVries et al. (2015), as well as
experimentation during the course of this research, has suggested that a first-order
seasonal harmonic model with an h-value of 0.25 produces the most accurate results. The
reason is that a first-order harmonic model better represents seasonal variability in
leaf phenology in tropical dry forests. In addition, because low-quality pixel values
are very frequent (up to 50% of the complete available data), an h-value of 0.25 (4
years) allows enough data to be included in the computation of the MOSUM statistic for
break detection.

The ‘monitoring period’ is set by the user and represents the window or time

frame where the user wants to visualize breakpoints. In this study, I selected the period

2013-2016 as the monitoring period.

Additionally, if the internals of ‘bfmSpatial’ are run manually, further parameters
must be set, because the internal components of ‘bfmSpatial’ require them in order to
function and to process ‘bfastmonitor’ outputs (see Appendix A for detailed code).
These parameters are:

▪ x = time-series raster brick

▪ dates = NULL (set internally within bfastSpatial)

▪ pptype = ’irregular’ (temporal resolution or time between images)

▪ monend = NULL (optional end of monitoring period)

▪ mc.cores = 1 (optional parameter in parallel processing)

▪ returnLayers = c(“breakpoint”, “magnitude”, “error”) – output brick layers to

include

▪ sensor = c(“ETM+ SLC-on”, “ETM+ SLC-off”, “OLI”) – sensors to include

Once ‘bfastmonitor’ has run over all pixels the output is a raster brick with the following

layers:

▪ breakpoint – timing of breakpoints detected for each pixel

▪ magnitude – the median of the residuals within the monitoring period

▪ error – a value of 1 for pixels where an error was encountered by the algorithm

and NA where the method was successfully run

▪ Additional layers not in default returnLayers parameter include history,

r.squared, adj.r.squared and coefficients, which can be used for additional

statistical analysis not covered in this research.

The output layers can be further manipulated by separating breakpoint timing by year
and month with the changeMonth function, which is part of the ‘bfastSpatial’ package,
as well as by creating a magnitude map of only the breakpoints (the magnitude layer by
default shows magnitude for all pixels). The various outputs can be verified via the
‘plot’ function, a standard R function that allows graphical viewing of data. The
output layers and manipulations can then be converted to GeoTIFF files via the
writeRaster function, which is part of the raster package for R. The breakpoint timing,
magnitude of breakpoints, and month of breakpoint separated by year (2013 – 2016) were
all used for the purposes of this research. Once outputs are obtained, a threshold can
be applied to the magnitude product so that only negative values remain, thus creating
a map of only potentially deforested areas (see Figure 9 for detailed workflow).
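The masking and thresholding steps described above can be sketched with the raster package as follows. The two small layers are hypothetical stand-ins for the ‘breakpoint’ and ‘magnitude’ outputs, the object names are assumptions, and the output file is written to a temporary folder.

```r
library(raster)

# Hypothetical 3 x 3 stand-ins for the 'breakpoint' and 'magnitude' layers
bkp  <- raster(matrix(c(2013.5, NA, 2015.2,
                        NA,     NA, 2014.1,
                        2016.0, NA, NA), nrow = 3, byrow = TRUE))
magn <- raster(matrix(c(-0.25,  0.02,  0.10,
                        -0.01, -0.03, -0.40,
                         0.05,  0.00, -0.02), nrow = 3, byrow = TRUE))

# Keep magnitude only where a breakpoint was flagged
magn_bkp <- magn
magn_bkp[is.na(bkp)] <- NA

# Threshold: retain negative magnitudes only (potential forest loss)
magn_neg <- magn_bkp
magn_neg[magn_neg >= 0] <- NA

# Export as GeoTIFF; the output location is an assumption
writeRaster(magn_neg, file.path(tempdir(), "magn_neg.tif"),
            format = "GTiff", overwrite = TRUE)
```

A stricter cut-off than zero could be substituted in the thresholding step if only moderate to extreme negative magnitudes are of interest.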

Figure 9. Flowchart representing methods of this study.

3.6 Design of a near-real-time monitoring system using BFAST

The nature of the ‘bfastSpatial’ algorithm is such that the monitoring period requires
enough data to accurately determine breakpoints. If a single year is used as the
monitoring period, there may not be enough data to do so, which is why I used a
four-year monitoring period. However, only one break can be detected per pixel during
this monitoring period, which negatively impacts the temporal accuracy of detected
breaks. Because the monitoring period is four years long, a break could be detected
prematurely due to variations in precipitation: for example, an area that was
drought-stricken in 2013 could be flagged as a breakpoint in that year when
deforestation actually occurred in 2015. Because of these temporal accuracy issues, a
new approach was devised. With the new approach, outputs were obtained from
‘bfastSpatial’ with a monitoring period from July 2011 to July 2015 and compared to
outputs with a monitoring period from January 2012 to December 2015 (which includes new
data from the most recent 6 months).

Previous research suggested that simply adding single images onto the monitoring period
as they become available would result in new breakpoint detection in that particular
image if deforestation had occurred. This ideal setup could be considered real-time.
However, it was found that one additional image was not enough data to ensure accurate
detection of a break, so I hypothesized that additional data (up to 6 months) would
provide better detection accuracy. Although six months is a long period of time
compared to the concept of real-time, I consider this time frame to be ‘near real-time’
in comparison to current global approaches based on annual or decadal information.

This additional 6 months of data provided enough observations for the algorithm to
accurately detect breaks within the new data (see Figure 10 for visualization).
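The comparison between the two runs can be sketched as a simple difference of break layers. The break dates below are made-up values for five pixels, expressed as decimal years; in practice the same logic would be applied to the two output rasters.

```r
# Hypothetical break dates for five pixels from the two runs (NA = no break)
run1 <- c(2013.4, NA, NA,     2014.9, NA)      # monitoring 07/2011 to 07/2015
run2 <- c(2013.4, NA, 2015.8, 2014.9, 2015.6)  # monitoring 01/2012 to 12/2015

# A pixel is newly detected if it breaks in the second run only
new_break <- is.na(run1) & !is.na(run2)

which(new_break)   # pixels 3 and 5
run2[new_break]    # their break dates fall in the added six months
```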

Figure 10. Model visualization of how near real-time system functions. Panel
annotations: MOSUM computed over 4-year data from 07/11 to 07/15; 4-year window shifted
to 01/12 to 12/15 to include 6 months of new data (blue box) with break detected (red
dotted line and circle). Plot title: Yucatan2 NBR2 Secondary Run - 6 Months Later
(vertical axis from 0.35 to 1.05).

3.7 Validation

For evaluating the accuracy of the BFAST method in detecting small scale and

short term TDF loss, I implemented an ‘error matrix’ or ‘confusion matrix’ accuracy

assessment approach based on Congalton (1991), which has been extensively used for

map validation in scientific studies. The error matrix compares map values generated

by automated processes (in my case, breakpoint magnitudes indicating deforestation)

with true ground information collected from reference data. Reference data can be

collected by field visits or by using very high spatial resolution imagery (VHR).

An error matrix allows quantification of the simplest measure of accuracy, the ‘Overall
accuracy’, which is computed by dividing the total number of correctly classified
pixels by the total number of pixels in the error matrix. In addition, the error matrix
allows calculation of the ‘Producer’s accuracy’, which is the number of correct pixels
in a category divided by the total number of pixels in that category based on the
reference data. This measure indicates the probability of a reference pixel being
correctly classified and is considered a measure of omission error, indicating how well
a certain area was classified. A third accuracy measure is the ‘User’s accuracy’, which
is the number of correct pixels in a category divided by the total number of pixels
classified in that category. It is a measure of commission error and indicates the
probability that a pixel classified on the map/image actually represents that category
on the ground.
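The three measures can be computed from an error matrix in a few lines of R. The counts below are hypothetical and are not results from this study; rows are taken to be the map (BFAST) class and columns the reference class.

```r
# Hypothetical error matrix: rows = map class, columns = reference class
em <- matrix(c(40, 10,
                8, 42), nrow = 2, byrow = TRUE,
             dimnames = list(map = c("D", "S"), reference = c("D", "S")))

overall   <- sum(diag(em)) / sum(em)   # correct cells / all cells
producers <- diag(em) / colSums(em)    # correct / reference total per class
users     <- diag(em) / rowSums(em)    # correct / mapped total per class

overall     # 0.82
producers   # D = 40/48, S = 42/52
users       # D = 40/50, S = 42/50
```

In this example the ‘D’ class has a producer's accuracy of about 83% (8 of 48 reference clearings were missed) and a user's accuracy of 80% (10 of 50 mapped clearings were false alarms).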

For calculating the accuracies of the BFAST products, I collected ground truth
information using multi-temporal very-high resolution (VHR) imagery acquired for the
project and available in the Google Earth platform (RapidEye/WorldView, < 5 m spatial
resolution), similarly to DeVries et al. (2015), Grogan et al. (2016), Murillo-Sandoval
et al. (2016) and Schultz et al. (2016).

For collecting ground truth data, I used all available VHR imagery in Google Earth from
all dates available between 01/2012 and 12/2015. A grid with 1-hectare cells (100 x 100
m) was created using ArcGIS covering the entire study area extent. However, for
practical purposes, I selected a subset covering approximately 20% of the whole study
area for performing the validation. This allowed a higher level of attention to be
dedicated to detecting deforestation using VHR imagery and thus produced a higher
quality reference dataset. This was performed for Y1, Y2 and G2 sites (see Figure 11).
For G1, I used the entire grid covering the study site in order to collect the highest
possible amount of reference information, since deforestation in this site occurred in
very low densities.

Using multi-temporal imagery in Google Earth, sites Y1, Y2 and G1 were inspected
thoroughly on screen. One-hectare grid cells were categorized as ‘D’ (indicating a
deforested location) or ‘S’ (stable land cover/land use). Deforested cells (‘D’)
referred to areas that were visibly covered by tropical dry forests in 2012 and the
beginning of 2013, and visibly non-forested by the beginning of 2016, with evident soil
exposure (Figure 12). Stable cells (‘S’) referred to areas of agriculture, forest,
pasture or other land covers that remained the same during the 2012-2016 period. Forest
regrowth areas, initially considered as a separate category, referred to deforested
areas that started to regain vegetation and accumulated sufficient biomass to be
visible in VHR imagery and potentially picked up by the BFAST algorithm. However,
although an important percentage of the original biomass was restored in 3 years, BFAST
predominantly mapped these areas as ‘D’ events. Therefore, I considered these areas as
already deforested sites that better fit into the ‘S’ category.

Multi-temporal VHR imagery for the G2 site was lacking in the Google Earth

platform. Therefore, I used the Hansen et al. (2015) dataset of annual tree cover loss for

the years 2013-2016 available on the Global Forest Watch website (www.gfw.org) as an

independent source of information.

Figure 11. Reference data locations for validation (panels: Y1, Y2, G1 and G2 sites;
legend: ‘D’ deforested, ‘S’ stable). One-hectare grid cells visually inspected via
multi-temporal images available in Google Earth (total n=373). Y1 (‘D’=64; ‘S’=54), Y2
(‘D’=32; ‘S’=32), G1 (‘D’=21; ‘S’=50), G2 (‘D’=70; ‘S’=50).

The reference dataset consisted of 373 locations across the four sites: 187 ‘D’
(deforested) and 186 ‘S’ (stable) cells (see Figure 11). The 373 locations were
compared with BFAST products (breakpoint and change magnitude maps). Breakpoints with
moderate to extreme negative change magnitudes were considered indicators of forest
clearings. Error matrices were constructed using BFAST outputs from each vegetation
index. I evaluated the accuracy of breakpoints detected for each time series of
vegetation indices (EVI, NBR/NBR2, NDMI, MSAVI and NDVI) and compared accuracy ranks
obtained among indices.

3.8 Near real-time validation

The near real-time results were validated by viewing both magnitude outputs (07/2011 to
07/2015 and 01/2012 to 12/2015) in ArcGIS and creating a point shapefile of all newly
detected breaks in the 6-month period (see Figure 13 for visualization). This shapefile
was then converted into a KMZ file and imported into Google Earth. The historical
imagery tool was then used to view imagery corresponding to the period between July
2015 and December 2015. To determine the accuracy of the near real-time method, every
point that corresponded to a new breakpoint was analyzed based on the Google Earth
image prior to 07/15, which was dated 03/25/2015, and the image between 07/15 and
12/15, which was dated 09/05/2015. Using these sample pixels from the BFAST output, I
evaluated the producer’s accuracy of BFAST at a local scale to detect new deforestation
in a period of 6 months.

Figure 12. Validation example. Deforested cells (‘D’) – areas visibly covered by TDF
beginning 2013, and visibly non-forested by 2016, with soil exposure. Stable cells
(‘S’) referred to areas of any land covers that remained the same during 2013-2016.

Figure 13. Magnitude outputs for NBR2 in Yucatan site 2. Monitoring period 07/11

to 07/15 (green to red) overlays monitoring period 01/12 to 12/15 (green to blue) so

that new deforestation is highlighted in blue.

CHAPTER IV

RESULTS AND DISCUSSION

In this research the overall efficacy and accuracy of the BFAST set of algorithms was
evaluated in the analysis of small-scale deforestation in tropical dry forest
environments in the Yucatan Peninsula region and Guanacaste Conservation Area. Data
availability was an immediately apparent issue. Landsat data has a 16-day temporal
resolution, and because of the scan line corrector failure in Landsat 7, as well as
poor data quality and high cloud cover in tropical areas, a significant number of
observations were removed by the image quality mask. For example, each year should have
approximately 23 corresponding Landsat scenes, but some years had 10 or fewer, thereby
impacting the creation of the model.

Overall, the ‘bfastSpatial’ algorithm performed well in terms of processing time and
accuracy of results. The majority of the Landsat 8 tar files were approximately 6 MB,
with the Landsat 7 files being approximately 3 MB. All of the processing was done on
either a MacBook Pro running OS X 10.11.6 with an Intel Core i7 3.1 GHz processor and
16 GB of 1887 MHz DDR3 RAM or a Dell OptiPlex 2010 running Windows 7 Enterprise with an
Intel Core i7 3.4 GHz processor and 16 GB of RAM, with both having similar results in
terms of processing time. Time-series raster brick creation was efficient, generally
taking 5 minutes or less and creating bricks approximately 45 MB in size. Running
‘bfmSpatial’ was the longest part of the process and generally took about 20-30 minutes
to produce the output bricks with breakpoints, magnitudes, error, and other
supplementary outputs, which were typically around 7 MB in size.

4.1 Breakpoints and magnitudes

Figures 14-17 (next pages) show the distribution of pixels where breakpoints were
detected for the 2013-2016 monitoring period for each site. Each breakpoint is labeled
with its corresponding magnitude value using a red>yellow>green color gradient scheme.
Green and yellow magnitude values correspond to slight to moderate positive
breakpoints, while reddish magnitude values correspond to slight to extreme negative
breakpoints.

Both areas in the Yucatan experienced significantly higher rates of breakpoint
detection and more overall deforestation than both areas of Guanacaste. Breakpoint
detection in the two Yucatan study areas ranged widely, from about 14.1% to 92.5% of
pixels being labeled as breaks from the model (see Figures 14 and 15), whereas the two
Guanacaste areas ranged from about 2.8% to 51.4% (see Figures 16 and 17). The most
notable in terms of high breakpoint detection percentage were the NDVI, MSAVI and NDMI
readings from Yucatan study site 2 (Figure 15) with 92.5%, 89.3%, and 81.3%,
respectively, of total pixels flagged as breakpoints. This could be due to several
factors such as better data quality or the study site’s proximity to the metropolitan
area of Merida. On the other hand, the most notable low breakpoint percentage was EVI
with 2.8% in Guanacaste study site 2 (Figure 17), although this area had low breakpoint
percentages among all indices. This area is more centrally located in the Guanacaste
region near the Arenal Volcano and a mountain range, as opposed to the first study
site, which is located near the coastline. I believe G2 is more likely to be obscured
by heavy cloud cover, which creates large gaps in the data and negatively impacts the
model created by BFAST for those pixels.
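The percentages quoted above are the ratio of breakpoint pixels to total pixels for each index. As a quick check, the sketch below recomputes them for Guanacaste site 2 from the pixel counts reported with Figure 17; the variable names are illustrative.

```r
# Breakpoint pixel counts reported for Guanacaste site 2 (Figure 17)
breaks <- c(NDVI = 7944, EVI = 3022, MSAVI = 7771,
            NDMI = 8355, NBR = 6052, NBR2 = 5657)
total_pixels <- 108240

# Share of pixels flagged as breakpoints, in percent
pct <- round(100 * breaks / total_pixels, 1)
pct
```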

Breakpoint magnitude was the most significant predictor of a deforestation event, with
the most negative values correlating most strongly with deforestation. The magnitude
values varied greatly but, interestingly, the variation followed sites (even sites not
in the same region) rather than vegetation indices. Yucatan site 1 and Guanacaste site
2 had similarly broad magnitude ranges amongst all vegetation indices, with the
extremes being -0.660399 (NDMI) and 0.477973 (NDVI) in Yucatan site 1, and -0.644136
(EVI) and 0.461431 (NDMI) in Guanacaste site 2. Conversely, Yucatan site 2 and
Guanacaste site 1 had comparably narrow magnitude ranges amongst all vegetation
indices, the extremes being -0.267992 (NBR) and 0.286369 (NBR) in Yucatan site 2, and
-0.29846 (NBR) and 0.277289 (NDVI) in Guanacaste site 1.

DeVries et al. (2015) determined that moderate to extreme negative magnitude

values are associated with declines in forest cover, which is expected since significant

negative breaks in vegetation index values can only occur with conversion of a

vegetated cover to other land use. Yellow to green values represent positive breaks

caused by a sudden increase in the value of the vegetation index. This could be explained

by increases of precipitation in the area, which could produce an excess of moisture in

the soil and vegetation and therefore, a slight increase in the values for indices such as

NBR and NDMI. For the purposes of this research, I only considered moderate to

extreme negative magnitude values as indicators of TDF loss.

Y1 site – Magnitude Values of All Detected Breakpoints

Vegetation Index     NDVI     EVI      MSAVI    NDMI     NBR      NBR2
Breakpoint Pixels    54,079   14,071   40,003   42,337   39,362   24,389
Total Pixels         99,855   99,855   99,855   99,855   99,855   99,855
Percentage           54.2%    14.1%    40.1%    42.4%    39.4%    24.4%

Figure 14. Maps of breakpoint magnitudes for all VIs for Y1 and pixel percentages (map
panels: MSAVI, NDVI, EVI, NDMI, NBR, NBR2).

Yucatan Area 2 – Magnitude Values of All Detected Breakpoints

Vegetation Index     NDVI     EVI      MSAVI    NDMI     NBR      NBR2
Breakpoint Pixels    92,334   46,317   89,121   81,144   68,658   33,895
Total Pixels         99,855   99,855   99,855   99,855   99,855   99,855
Percentage           92.5%    46.4%    89.3%    81.3%    68.8%    33.9%

Figure 15. Maps of breakpoint magnitudes for all VIs for Y2 and pixel percentages (map
panels: MSAVI, NDVI, EVI, NBR2, NDMI, NBR).

Guanacaste Area 1 – Magnitude Values of All Detected Breakpoints

Vegetation Index     NDVI     EVI      MSAVI    NDMI     NBR      NBR2
Breakpoint Pixels    12,152   9,867    11,695   23,401   22,774   12,752
Total Pixels         45,530   45,530   45,530   45,530   45,530   45,530
Percentage           26.7%    21.7%    25.7%    51.4%    50.0%    28.0%

Figure 16. Maps of breakpoint magnitudes for all VIs for G1 and pixel percentages (map
panels: NDVI, EVI, MSAVI, NDMI, NBR, NBR2).

Guanacaste Area 2 – Magnitude Values of All Detected Breakpoints

Vegetation Index     NDVI      EVI       MSAVI     NDMI      NBR       NBR2
Breakpoint Pixels    7,944     3,022     7,771     8,355     6,052     5,657
Total Pixels         108,240   108,240   108,240   108,240   108,240   108,240
Percentage           7.3%      2.8%      7.2%      7.7%      5.6%      5.2%

Figure 17. Maps of breakpoint magnitudes for all VIs for G2 and pixel percentages (map
panels: NDVI, EVI, MSAVI, NDMI, NBR, NBR2).

4.2 Accuracy assessment

The reference dataset consisted of 373 locations across the four sites (187 ‘D’ and 186
‘S’ cells). However, it is important to note that ‘ground-truth’ deforestation for G2
was estimated using the Hansen et al. 2013 dataset. The accuracy of this dataset at
landscape scales for TDF is unknown; therefore, I report accuracy measures both with
and without the G2 validation dataset.

In analyzing the overall accuracies of the vegetation indices across all sites it is

clear that NBR/NBR2 was the most accurate with an overall accuracy of 74%, which

increased to 83.4% if excluding poor G2 validation data (Figure 18). NDMI was also a

fairly accurate determiner of deforestation with an overall accuracy of 71.6% across all

sites and 81.4% excluding G2. NDVI, EVI and MSAVI were not as effective in

successfully detecting deforestation with overall accuracies of 64.6%, 63.5%, and

60.1%, respectively, across all sites and overall accuracies of 73.1%, 68%, and 64%,

respectively, if excluding G2.

The producer’s accuracy (which refers to the number of correct pixels in a

category divided by the total number of pixels in that category based on the reference

data) yields more optimistic results for NBR/NBR2 and NDMI indices across all sites.

The user’s accuracy (commission error) yields similar accuracies among indices (60-

80%) for all sites.

This is indicative of and validates previous research suggesting that vegetation

indices that exploit the water absorption features of the SWIR band in the

electromagnetic spectrum are more sensitive to forest change than the chlorophyll

absorption features of the red band. Details of the reasons behind this discrepancy will

be discussed in the next section.

Figure 18. Overall accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.
Panels: Overall accuracies (Y1 site); (Y2 site); (G1 site); (G2 site); (all sites);
(all except G2).

Figure 19. Producer’s accuracies for Y1, Y2, G1, G2, all sites, and all sites except
G2. Panels: Producer's accuracies for 'Deforested' class (Y1 site); (Y2 site); (G1
site); (G2 site); (all sites); (all sites) w/o G2.

Figure 20. User’s accuracies for Y1, Y2, G1, G2, all sites, and all sites except G2.
Panels: User's accuracies for 'Deforested' class (Y1 site); (Y2 site); (G1 site); (G2
site); (all sites); (all sites) w/o G2.

The varying accuracy of the different vegetation indices was interesting to

witness and has been explained in previous studies, and validated by this research.

DeVries et al. (2015) noted that vegetation indices like NBR/NBR2 and NDMI that

utilize near infrared and short wave infrared bands are particularly sensitive in detecting

canopy moisture content, thus making them highly accurate in detecting deforestation

as well as in differentiating age classes of forest (primary vs. secondary). Additionally,

these vegetation indices can better distinguish not only age classes of forests, but also

discriminate minimal vegetation in pastures or degraded areas from bare soil, which

reduces cropland false positives (Bewernick, 2015). NDMI has been found to be

particularly useful in previous studies of herbaceous biomass in savanna ecosystems in

order to determine fire risk, again due to the SWIR band’s responsiveness to plant tissue

water content (Verbesselt et al. 2006). In even earlier studies conducted by Wilson et

al. (2002) NDVI and NDMI were directly compared in forest harvest type detection,

also using Landsat imagery, with the older method of comparing 2 images from different

dates at 2, 3, and 6-year intervals. Their research showed NDMI significantly

outperforming NDVI over all intervals in instances of obvious clearcutting, but

especially in smaller scale partial harvests, suggesting increased precision and accuracy.

In this research it has been demonstrated that the water absorption associated with this

region of the electromagnetic spectrum (used in NBR, NBR2 and NDMI) is more

sensitive to change than the chlorophyll absorption associated with the red or NIR band

used in other indices (NDVI, EVI, MSAVI). This notion is corroborated by other

research such as that conducted by Sims and Gamon (2003) whereby direct comparisons

were made between vegetation indices based on water and chlorophyll absorption

features. They note that a remote sensor's ability to deeply penetrate a forest canopy to

acquire information is directly tied to the strength of the absorption of wavelengths.

This is why NDVI and other chlorophyll-absorption-based vegetation indices cannot

penetrate forest canopy deeply: chlorophyll absorbs much more strongly than water,

particularly in forests with high leaf area indices. Essentially, since the light is

being absorbed by chlorophyll in the leaves higher up in the canopy, this prevents better data from

being acquired as the wavelength being sensed is stopping short, whereas water

absorption features can be detected more thoroughly throughout the entirety of the

canopy (Sims and Gamon, 2003). For example, if forest canopy is removed and

replaced with pasture or agricultural land, these could have similar chlorophyll

absorption features, but pasture or agricultural land could not replicate the moisture

absorption levels of a forest canopy. These differences in wavelength absorption

features are why NBR, NBR2 and NDMI are more sensitive to changes in forest

structure than NDVI, EVI and MSAVI.
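
As a minimal sketch of how these indices differ arithmetically, the following R snippet computes NDVI, NDMI, NBR and NBR2 from their standard normalized-difference band combinations. The reflectance values are hypothetical placeholders chosen only to illustrate the calculation; they are not measurements from the study sites and should not be read as evidence. The helper name norm_diff is an assumption introduced here.

```r
# Illustrative surface reflectance values (hypothetical, not from this study).
# Bands: red, near infrared (NIR), shortwave infrared 1 and 2 (SWIR1, SWIR2).
refl <- data.frame(
  cover = c("forest", "pasture"),
  red   = c(0.03, 0.05),
  nir   = c(0.35, 0.33),
  swir1 = c(0.12, 0.25),
  swir2 = c(0.05, 0.14)
)

# Generic normalized difference of two bands (hypothetical helper)
norm_diff <- function(a, b) (a - b) / (a + b)

indices <- data.frame(
  cover = refl$cover,
  ndvi  = norm_diff(refl$nir,   refl$red),    # red/NIR based
  ndmi  = norm_diff(refl$nir,   refl$swir1),  # NIR/SWIR1 based
  nbr   = norm_diff(refl$nir,   refl$swir2),  # NIR/SWIR2 based
  nbr2  = norm_diff(refl$swir1, refl$swir2)   # SWIR1/SWIR2 based
)

# Change in each index when the forest row is replaced by the pasture row
delta <- unlist(indices[2, -1] - indices[1, -1])

print(round(indices[, -1], 3))
print(round(delta, 3))
```

With these made-up inputs the SWIR-based indices change more than NDVI, which is the kind of contrast the paragraph above describes; real reflectances would have to be substituted to draw any conclusion.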

The second study site in Guanacaste proved very difficult to validate in terms of

accuracy due to the lack of an adequate validation data set. This lack of data is why I

chose not to fully include the second site in evaluating the overall accuracy of the

method.

Regarding the ‘near real-time’ assessment, after analyzing all newly detected

breaks, 76 points out of 138 were determined to be actual deforestation, yielding 55%

accuracy, with false positives accounting for the remaining 45% of breaks detected (Figure 21). Many of the

false positive breaks seemed to be associated with disturbance events such as pasture

clearing (Figure 22). However, many points selected did not represent the most negative

magnitude values so accuracy could have been compromised by commission errors.

Figure 21. Evaluation of near real-time accuracy of BFAST. Points represent new

breakpoints detected in a 6-month window. Imagery shows ground truth data. The

accuracy for this assessment in Y2 was estimated at 55%.

Detection of deforestation events like this is significant considering they

happened between 03/25/2015 and 09/06/2015, but the new breakpoint data was from

07/2015 to 12/2015, which would suggest that the deforestation happened between

07/2015 and 09/06/2015. Some of the omission errors might be associated with the lack

of reference data beyond 09/2015.

4.3 Step towards a near real-time deforestation monitoring system in

Central America.

The high level of accuracy of the BFAST method and its potential for near real-

time application could have a significant impact on the way that land uses and forests

are managed in tropical dry forest areas. My results support the confidence values

reported by other authors in humid forests and other dry forest sites, and contribute to

the design of a near real-time system for the Guanacaste Conservation Area and Yucatan

area. With the accurate results produced by ‘bfastSpatial’ combined with the near real-

time method described in this research, actionable results can be attained. For example,

if this method were to be applied every 3 to 6 months a report could be produced for a

director or other conservation, land-use, or forest manager, which could then be used to

determine if action needed to be taken on the ground. This would be particularly useful

in areas like Guanacaste where deforestation is illegal. In areas like Yucatan this

method could also prove useful in allowing for better management of land for milpa

agricultural practices whereby cropland is rotated and allowed to lie fallow for periods

of time.

Figure 22. Example of false positive detected breaks not associated with deforestation.

Some aspects need to be taken into consideration for a monitoring system based

on BFAST to work. First, local authorities and local scientists need to be trained in the

use of R software and the application of the BFAST family of algorithms. Local capacity

building is an important aspect since users on the ground need to be aware of the

advantages and limitations of the method.

Furthermore, including more observations at the spatial

resolution of Landsat (30 m) or higher would greatly increase the accuracy of the method.

New methods are being developed to enable algorithms such as BFAST to incorporate

multiple data streams from Landsat, MODIS and Sentinel 1 & 2, to help fill gaps in the

time series. If, for example, the BFAST algorithm were applied to a time series

consisting of harmonized Landsat, Sentinel-2 and radar data, this would provide

up to 10 observations per month for every pixel. This would greatly increase the

probability of having non-contaminated or cloud-free pixels and allow subsequent

BFAST runs after acquiring just 1 month or 3 months of data.

Filling the gaps using more imagery could, however, also increase the

processing time. Using only Landsat, the processing time at an ecoregional scale could

reach about 12 hours using a single computer or 2 hours using parallel computing just

for one run. This could double or triple with the use of harmonized multi-sensor data.

Parallel computing or cloud computing through international collaborations could be

implemented to reduce processing time.

4.4 Sources of error and implications for BFAST implementation

There are various error issues related to the methodology described in this study,

with the primary sources of error being the lack of data caused by interference from

cloud cover and the need for additional metrics derived from BFAST outputs and from

external sources. Cloud cover is the most significant factor in analyzing time series data

with BFAST, with study site G2 displaying this issue most prominently (see Figure 23

for examples of clouded images within the stack).

The G2 stack contained 224 images for the period 2000-2016 with each image

containing 108,240 pixels for a total of 24,245,760 pixels. Of these 24,245,760 pixels,

21,294,806 pixels were flagged NA (Not Available) by the cloud mask file, meaning

that 87.8% of the data was unusable, with only 12.2% of the data being used by BFAST.

Furthermore, this 12.2% of usable data wasn’t necessarily consistent for each pixel over

time so a single pixel location may have a vegetation index value on one date, but could

be flagged NA on another, thus affecting the overall model for that pixel created by

BFAST.
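
The proportions reported above follow directly from the pixel counts, as the short R sketch below shows. The second half illustrates, on a toy matrix with made-up values (rows as pixels, columns as dates, not study data), how the number of usable observations can differ from pixel to pixel; the variable names are assumptions introduced for this illustration.

```r
# Figures reported for the G2 stack
n_images         <- 224
pixels_per_image <- 108240
n_flagged_na     <- 21294806

n_total   <- n_images * pixels_per_image
pct_na    <- 100 * n_flagged_na / n_total
pct_valid <- 100 - pct_na

cat(sprintf("Total pixels: %.0f, flagged NA: %.1f%%, usable: %.1f%%\n",
            n_total, pct_na, pct_valid))

# Toy stack (hypothetical values): 3 pixels observed on 5 dates, NA = masked
toy <- matrix(c(0.61, NA,   NA,   0.58, NA,
                NA,   NA,   0.55, NA,   NA,
                0.40, 0.42, NA,   NA,   0.39),
              nrow = 3, byrow = TRUE)

# Number of usable observations available to the model at each pixel
valid_per_pixel <- rowSums(!is.na(toy))
print(valid_per_pixel)
```

A per-pixel count of this kind could be used to flag pixels whose time series is too sparse for a reliable model fit.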

Additional sources of data that could be used to reduce sources of error in the

methodology are the slope (statistical) of segments between breakpoints, which is

derived from BFAST outputs, as well as elevation and slope (topographical) data, which

can be acquired from external sources. Murillo-Sandoval et al. (2017) utilized the slope

of each breakpoint segment to determine if actual deforestation had taken place. In their

study, the authors utilized BFAST without attention to a specific monitoring period so

multiple breaks could be detected, but a similar method could be applied for a specific

monitoring period with one breakpoint. For example, the user can calculate the slope

between the start of the monitoring period and the break, and then the slope between

the break and the end of the monitoring period. If negative slope coefficients are

significant (α = 0.05), this could be considered to be potential deforestation or browning

(see Figure 24 for the Murillo-Sandoval et al. slope graph).

Figure 23. Image A from 02/25/2005 and B from 10/11/2015 in G2 NDVI stack

showing lack of data due to cloud mask (and due to Landsat 7 Scan Line Corrector error

in Image A).
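
The two-segment slope check described in the preceding paragraph could be sketched in R as follows. This is an illustration under stated assumptions: the time series is synthetic, the break date is taken as given rather than estimated by BFAST, each segment is fitted with ordinary least squares via lm(), and segment_slope is a hypothetical helper name. It is not the implementation used by Murillo-Sandoval et al. (2017).

```r
set.seed(42)

# Synthetic monthly vegetation index series (hypothetical values, not study data)
yr  <- seq(2013, 2017, by = 1 / 12)   # monitoring period in decimal years
brk <- 2015.5                         # break date, assumed already detected
vi  <- ifelse(yr < brk,
              0.60 - 0.002 * (yr - 2013),    # near-stable signal before the break
              0.30 - 0.050 * (yr - brk)) +   # drop and continued decline after it
  rnorm(length(yr), sd = 0.01)

# Fit a straight line to one segment and report whether its slope is negative
# and significant (two-sided t-test on the slope, combined with a sign check)
segment_slope <- function(time, value, alpha = 0.05) {
  co <- summary(lm(value ~ time))$coefficients
  data.frame(slope        = co["time", "Estimate"],
             p_value      = co["time", "Pr(>|t|)"],
             sig_negative = co["time", "Estimate"] < 0 &&
                            co["time", "Pr(>|t|)"] < alpha)
}

before <- segment_slope(yr[yr < brk],  vi[yr < brk])
after  <- segment_slope(yr[yr >= brk], vi[yr >= brk])
print(rbind(before = before, after = after))
```

A pixel whose post-break segment is flagged sig_negative would be a candidate for deforestation or browning under this rule; the threshold α and the choice of a linear fit are both open to adjustment.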

In addition to statistical slope data, externally sourced elevation and

topographical slope data can also be used to enhance deforestation detection accuracy.

It was noted by Murillo-Sandoval et al. (2017) that areas of high elevation (> 2000 m

above sea level) not only experienced much higher levels of cloud cover (much like the

G2 site), but were also expected to have less anthropogenic disturbance due to forest

access difficulty. Singh et al. (2017) also expanded upon this idea by utilizing slope

and Shuttle Radar Topography Mission elevation data (as well as other approachability

factors such as settlements, major roads, distance to forest edge, and water body

locations) to model deforestation with the use of an artificial neural

network. With the use of cloud cover, statistical slope, and topographical data that

influence access to forests, such as slope and elevation, a confidence product can be created.

This confidence product can be used to determine if a particular area is more or less

susceptible to deforestation, thus providing enhanced potential to filter out spurious

breakpoints.

Figure 24. Example taken from Murillo-Sandoval et al. 2017. Three breakpoints

(dashed red lines) and four segments (black lines) identified over time series (blue

lines). The slope coefficients (β) are all significant (α = 0.05) and ρ represents

p-values.

4.5 Implications for biodiversity and conservation

When considering the implementation of new technologies and methods such as

those described in this study, it is important to consider the end user and how this

methodology could actually be effectively and efficiently utilized in the field, especially

by those unfamiliar with how to use this technology. Firstly, it is necessary to be able

to convey how exactly this research would contribute to the protection of biodiversity

and conservation efforts. For example, Simons-Legaard et al. (2016) utilized 16 total

Landsat images between 1973 and 2010 to create time-series forest disturbance maps

used for habitat monitoring and projections for the Canada lynx, a US federally

threatened species. The decline in vegetation density in the shared boreal and sub-boreal

forest habitat of the Canada lynx and the snowshoe hare (primary food source for the

Canada lynx) equated to a decline in snowshoe hare population thus negatively

impacting the population of Canada lynx. The researchers note that time-series data is

commonly used in mapping land-cover change, while regular non-time-series imagery

is used in developing species-habitat model predictors, but the two methods are rarely

combined. However, comparable studies can be improved upon by employing methods

like BFAST on time-series data of a higher temporal resolution created through utilizing

full Landsat archives. Not only could initial disturbance maps be improved with the use

of more data and change detection algorithms like BFAST, the wildlife habitat could be

monitored in near real-time, thus adding an additional level of depth to the research for

directly tracking changes at regular intervals over the course of the study. Determining

near real-time changes in habitat extent for a threatened species such as the Canada lynx

could further the potential for developing appropriate modeling parameters to more

accurately monitor and predict changes in habitat. Ultimately, understanding the

dynamic nature of habitat extent transformation can greatly assist in conservation efforts

to preserve unique and vulnerable biodiversity.

Although not directly addressed in this research, the potential for other types of

change detection could also be of value, more specifically, the magnitude of detected

breakpoints that were positive. Originally this was hypothesized to represent regrowth

(especially at the most extreme positive magnitude classes), but the nature of regrowth

dynamics is complex. For example, in this research, when very high positive magnitude

values were detected they didn’t necessarily correlate to forest regrowth as true forest

regrowth takes place over years, and wouldn’t register as an abrupt change in a

vegetation index. However, regrowth of some type was detected from bare soil to

grasslands or pastureland, which can be particularly useful in general near real-time

land-cover change mapping, especially with regard to grassland management for fire

suppression. Moreover, while initial results suggested positive magnitudes

did not correlate with forest regrowth specifically, perhaps decreasing data gaps,

changing model parameters and statistically analyzing supplementary BFAST outputs

combined with external data (as mentioned in the previous section) could yield accurate

forest regrowth results. This being said, more research needs to be conducted regarding

the analysis of positive magnitude value breaks as this was not the focus of this

particular study.

As the efficacy of this technology and methodology becomes more apparent, it

becomes increasingly important to understand how this can be practically used in the

field, particularly in areas where resources may be scarce. Regardless of the environment in

which the methodology described in this study is used, optimization of several of the

functions to make a simpler, more unified system is key. The downloading of imagery

via USGS ESPA (as well as other repositories) can be automated through a scripting

language such as R via a bulk ordering application programming interface

(API). The bulk ordering API would allow this more unified system to first download

the pre-processed images (reprojected, image extents cropped to study site, vegetation

indices, cloud mask, etc.), then stack and analyze time-series data to produce outputs.

With proper parameter selection and scripting to connect the disparate functions, inputs

can be acquired, then analyzed to produce outputs, which can also be further

manipulated within the same system to produce more “end-user” (i.e. conservation

director) based outputs such as the magnitude threshold map of probable deforestation

correlating to lowest magnitude class (additional error reduction techniques described

in the previous section could also be applied). With this map of probable deforestation,

areas of interest could be investigated directly or used to geotag waypoints for drone

monitoring. However, developing and operating this optimized system would require

significant resources including sufficient computing power, stable internet connection,

and at least one staff member (as well as a drone if trying to maximize efficiency). In

more financially distressed areas with fewer assets this system would be infeasible without

the collaborative effort of conservation organizations and universities around

the world.
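
The magnitude threshold step mentioned above, in which pixels in the lowest magnitude class are mapped as probable deforestation, can be illustrated with a minimal R sketch. The pixel values and the cut-off of -0.10 are hypothetical; as noted elsewhere in this study, suitable magnitude ranges vary between sites and indices, so the threshold would need to be chosen per study area.

```r
# Toy per-pixel outputs (hypothetical values, not study data)
breakpoint <- c(2015.3, NA, 2014.8, 2016.1, NA, 2015.9)  # decimal year, NA = no break
magnitude  <- c(-0.21, -0.02, 0.15, -0.09, -0.30, -0.12) # scaled magnitude

threshold <- -0.10  # assumed cut-off for the lowest magnitude class

# A pixel is flagged only if a break was detected AND its magnitude is strongly negative
probable_deforestation <- !is.na(breakpoint) & magnitude <= threshold

print(which(probable_deforestation))
```

On raster layers the same logical test could be applied cell by cell, and the flagged cells exported as points for field visits or drone waypoints.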

CHAPTER V

CONCLUSION

To conclude, this research is becoming increasingly necessary as

the rates of deforestation continue to climb and the status of tropical dry forests,

especially primary forests, and their biodiversity continue to be under threat. The

overall objectives of this study were to evaluate the accuracy of the BFAST set of

algorithms to validate the findings of previous research as well as to determine potential

near real-time capabilities of BFAST. These objectives were successfully met by

employing the BFAST parameters that yielded the most accurate results in previous

studies, most notably the first-order harmonic formula with an h value of 0.25 applied

over NDMI and NBR/NBR2 vegetation indices. Vegetation indices that utilize the

shortwave infrared bands proved to be more sensitive to forest disturbance than other

indices using the red and near infrared bands. Moderate to extreme negative magnitude

values were revealed to be the determining output products that indicated a deforestation

event, with value ranges varying widely between study sites and regions. Cloud cover

contributed to the low level of accuracy achieved for the G2 site, which contrasted with the

rest of the sites. This site had more variable topography, and potentially increased

probability for atmospheric contamination of Landsat observations. The near real-time

monitoring objective was met with some initial success in that the method was able to

detect new breakpoints within a 6-month period or less. Because of poor data

availability and spuriously detected breakpoints, the application of BFAST for shorter

time frames in near real-time (weeks) will only be possible through the use of multi-

sensor data and external data sources. Such a system will also improve detection

probability in mountainous areas. Overall, the

methodology of this study proved effective and accurate for detecting

deforestation at sub-annual temporal scales, and could be upscaled to ecoregional or

national scales using available Landsat data, making it a beneficial means of conserving

biodiversity in the field.

LITERATURE CITED

Bewernick, T. 2015. Mapping post deforestation land use in the Brazilian Amazon

using remote sensing time series. Wageningen University.

Boer, M.M., C. Macfarlane, J. Norris, R.J. Sadler, J. Wallace, and P.F. Grierson.

2008. Mapping burned areas and burn severity patterns in SW Australian

eucalypt forest using remotely-sensed changes in leaf area index. Remote

Sensing of Environment 112:4358-4369.

Bohn, F.J. and A. Huth. 2017. The importance of forest structure to biodiversity-

productivity relationships. Royal Society Open Science 4:160521.

Boillat, F.M. Scarpa, J.P. Robson, I. Gasparri, T.M. Aide, A.P. Dutra Aguiar, L.O.

Anderson, M. Batistella, M. Gesteira Fonseca, C. Futemma, H.R. Grau, S.-L.

Mathez-Stiefel, J.P. Metzger, J.P.H. Balbaud Ometto, M.A. Pedlowski, S.G.

Perz, V. Robiglio, L. Soler, I. Vieira, E.S. Brondizio. 2017. Land system

science in Latin America: challenges and perspectives. Curr. Opin. Environ.

Sustain. 26–27:37-46.

Butchart, S.H.M., et al. 2010. Global Biodiversity: Indicators of Recent Declines.

Science 328:1164-1168.

Cohen WB, Healey SP, Yang Z, Stehman SV, Brewer CK, N G, Huang C, Kennedy

RE et al. 2017. How Similar Are Forest Disturbance Maps Derived from

Different Landsat Time Series Algorithms? Forests. 8(4):98

Congalton, R.G. 1991. A Review of Assessing the Accuracy of Classifications of

Remotely Sensed Data. Remote Sensing of Environment 37:35-46.

Davis J, Lopez-Carr D. 2014. Migration, remittances and smallholder decision-

making: implications for land use and livelihood change in Central America.

Land Use Policy 38: 319-329

DeClerck, F.A.J., Chazdon, R., Holl, K.D., Milder, J.C., Finegan, B., Martinez-

Salinas, A., Imbach, P., Canet, L., Ramos, Z., 2010. Biodiversity conservation

in human-modified landscapes of Mesoamerica: Past, present, and future.

Biological Conservation 143, 2301–2313.

DeVries, B., M. Decuyper, J. Verbesselt, A. Zeileis, M. Herold, and S. Joseph. 2015.

Tracking disturbance-regrowth dynamics in tropical forests using structural

change detection and Landsat time series. Remote Sensing of Environment

169:320-334.

Exelis (2017) Exelis Documentation center: Vegetation Indices. Available at:

https://www.harrisgeospatial.com/docs/vegetationindices.html

García-Frapolli, E., Ayala-Orozco, B., Bonilla-Moheno, M., Espadas-Manrique, C.,

Ramos-Fernández, G., 2007. Biodiversity conservation, traditional agriculture

and ecotourism: land cover/land use change projections for a natural protected

area in the northeastern Yucatan Peninsula, Mexico. Landscape and Urban

Planning 83, 137–153.

Hansen, M.C., P.V. Potapov, R. Moore, M. Hancher, S.A. Turubanova, A. Tyukavina,

D. Thau, S.V. Stehman, S.J. Goetz, T.R. Loveland, A. Kommareddy, A.

Egorov, L. Chini, C.O. Justice, and J.R.G. Townshend. 2013. High-

Resolution Global Maps of 21st-Century Forest Cover Change. Science

342:850-853.

Jensen, J.R. 2016. Introductory digital image processing: A remote sensing

perspective (4th edition). Pearson series in geographic information science.

Kennedy, R.E., Z. Yang, and W.B. Cohen. 2010. Detecting trends in forest

disturbance and recovery using yearly Landsat time series: 1. LandTrendr –

Temporal segmentation algorithms. Remote Sensing of Environment

114:2897-2910.

Miles, L., Newton, A. C., DeFries, R. S., Ravilious, C., May, I., Blyth, S., Kapos, V.

and Gordon, J. E., A global overview of the conservation status of tropical dry

forests. Journal of Biogeography, 2006, 33, 491-505.

Murillo-Sandoval, P.J., J. Van Den Hoek, and T. Hilker. 2017. Leveraging Multi-

Sensor Time Series Datasets to Map Short- and Long-Term Tropical Forest

Disturbances in the Colombian Andes. Remote Sens 9:179

Olson, D.M., E. Dinerstein, E.D. Wikramanayake, N.D. Burgess, G.V.N. Powell, E.C.

Underwood, J.A. D’Amico, I. Itoua, H.E. Strand, J.C. Morrison, C.J. Loucks,

T.F. Allnutt, T.H. Ricketts, Y. Kura, J.F. Lamoreux, W.W. Wettengel, P.

Hedao, and K.R. Kassem. 2001. Terrestrial Ecoregions of the World: A New

Map of Life on Earth. BioScience 51(11):933-938.

Portillo-Quintero, C. A. and Sanchez-Azofeifa, G. A. 2010. Extent and conservation

of tropical dry forests in the Americas. Biological Conservation 143:

144-155.

Portillo-Quintero, C., A. Sanchez-Azofeifa, J. Calvo-Alvarado, M. Quesada, and

M.M. do Espirito Santo. 2014. The role of tropical dry forests for

biodiversity, carbon and water conservation in the neotropics: lessons learned

and opportunities for its sustainable management. Reg Environ Change

15:1039-1049.

Qi, J., A. Chehbouni, A.R. Huete, Y.H. Kerr, and S. Sorooshian. 1994. A modified

soil adjusted vegetation index. Remote Sensing of Environment 48:119-126.

Read, L. and D. Lawrence. 2003. Recovery of biomass following shifting cultivation

in dry tropical forests of the Yucatan. Ecological Applications 13(1):85-97.

Sanchez-Azofeifa, G. A., Quesada, M., Rodriguez, J. P., Nassar, J. M., Stoner, K. E.,

Castillo, A., Garvin, T., Zent, E. L., Calvo-Alvarado, J. C., Kalacska, M. E. R.,

Fajardo, L., Gamon, J. A. and Cuevas-Reyes, P., Research Priorities for

Neotropical Dry Forests. Biotropica, 2005, 37(4) 477-485.

Schultz, M., J. Verbesselt, V. Avitabile, C. Souza, and M. Herold. 2016. Error

Sources in Deforestation Detection Using BFAST Monitor on Landsat Time

Series Across Three Tropical Sites. Journal of Selected Topics in Applied

Earth Observations and Remote Sensing 9(8):3667-3679.

Simons-Legaard, E.M., D.J. Harrison, and K.R. Legaard. 2016. Habitat monitoring

and projections for Canada lynx: linking the Landsat archive with carnivore

occurrence and prey density. Journal of Applied Ecology 53:1260-1269.

Sims, D.A. and J.A. Gamon. 2003. Estimation of vegetation water content and

photosynthetic tissue area from spectral reflectance: a comparison of indices

based on liquid water and chlorophyll absorption features. Remote Sensing of

Environment 84:526-537.

Singh, S., C.S. Reddy, S.V. Pasha, K. Dutta, K.R.L. Saranya, and K.V. Satish. 2017.

Modeling the spatial dynamics of deforestation and fragmentation using Multi-

Layer Perceptron neural network and landscape fragmentation tool. Ecological

Engineering 99:543-551.

Valero, A; Schipper, J and Alnutt, T. (2017) Yucatán Dry Forests. World Wildlife

Fund. Available at https://www.worldwildlife.org/ecoregions/nt0235.

Accessed on September 2017.

Verbesselt, J., R. Hyndman, A. Zeileis, and D. Culvenor. 2010. Phenological change

detection while accounting for abrupt and gradual trends in satellite image time

series. Remote Sensing of Environment 114:2970-2980.

Verbesselt, J., B. Somers, J. van Aardt, I. Jonckheere, and P. Coppin. 2006.

Monitoring herbaceous biomass and water content with SPOT VEGETATION

time-series to improve fire risk assessment in savanna ecosystems. Remote

Sensing of Environment 101:399-414.

Vila, M., J. Vayreda, L. Comas, J. J. Ibáñez, T. Mata, and B. Obón. 2007. Species

richness and wood production: a positive association in Mediterranean forests.

Ecology Letters 10:241-250.

Wilson, E.H. and S.A. Sader. 2002. Detection of forest harvest type using multiple

dates of Landsat TM imagery. Remote Sensing of Environment 80:385-396.

Zhu, Z. and C.E. Woodcock. 2014. Continuous change detection and classification of

land cover using all available Landsat data. Remote Sensing of Environment

144:152-171.

APPENDICES

A. BFAST CODE IMPLEMENTATION IN RSTUDIO

**Note it is advised to visit https://github.com/loicdtx/bfastSpatial prior to

implementation to ensure correct versions, that there are no issues, and to become

familiar with the algorithm. For a step by step tutorial visit

http://changemonitor-wur.github.io/talks/bfastSpatial-2016/bfastSpatial_Peru.html#(1) **

# install the developer's version of bfastSpatial; if the main branch has been updated
# to accommodate the new Landsat Collection 1 data naming convention, there is no need
# for ref = 'develop'
devtools::install_github('loicdtx/bfastSpatial', ref = 'develop')

# set directory path
setwd('~/path_to_study_site_directory')

# set path for reading and saving files
path <- getwd()

# load bfastSpatial and set tmpdir
library(bfastSpatial)
tmpDir <- rasterOptions()$tmpdir

# set the path to the data directory
inDir <- file.path(path, 'data')

# stepDir is where intermediary outputs are stored
stepDir <- file.path(inDir, 'datastep')

# directory for Landsat data
landsatDir <- file.path(stepDir, 'landsat')

# where individual VI layers are stored prior to being stacked; ndviDir, eviDir, etc.
# are subdirectories of stepDir
ndviDir <- file.path(stepDir, 'ndvi')
eviDir <- file.path(stepDir, 'evi')
msaviDir <- file.path(stepDir, 'msavi')
ndmiDir <- file.path(stepDir, 'ndmi')
nbrDir <- file.path(stepDir, 'nbr')
nbr2Dir <- file.path(stepDir, 'nbr2')

# outDir is where outputs are stored
outDir <- file.path(inDir, 'out')

# processLandsatBatch settings vary due to the change in USGS ESPA file naming
# convention. If using the developer's version of bfastSpatial, use the following to
# apply the cloud mask: keep = c(322, 386) applies to Landsat 8 data. Change to
# keep = c(66, 130) for Landsat 5-7 data

# unzip Landsat files, apply cloud mask, and calculate VI if not available
if (!file.exists(file.path(inDir, 'ndvi_stack.grd'))) {
  # unzip individual files, use the cloud mask, create ndvi if not available
  processLandsatBatch(x = landsatDir, outdir = ndviDir,
                      delete = TRUE, overwrite = TRUE, mask = 'pixel_qa', vi = 'ndvi',
                      keep = c(322, 386))
  # make temporal ndvi stack
  ndviStack <- timeStack(x = ndviDir, pattern = glob2rx('*.grd'),
                         filename = file.path(inDir, 'ndvi_stack.grd'),
                         datatype = 'INT2S')
} else {
  ndviStack <- brick(file.path(inDir, 'ndvi_stack.grd'))
}

# set ndviStack to x to prepare to run through bfmSpatial
x <- ndviStack

# bfmSpatial with the same parameters used in this research set as defaults
bfmSpatial <- function(x, dates = NULL, pptype = 'irregular', start = 2013,
                       monend = NULL,
                       formula = response ~ harmon, order = 1, lag = NULL, slag = NULL,
                       history = c("all"), type = "OLS-MOSUM", h = 0.25, end = 10,
                       level = 0.05, mc.cores = 1,
                       returnLayers = c("breakpoint", "magnitude", "error",
                                        "history", "r.squared", "adj.r.squared",
                                        "coefficients"),
                       sensor = NULL, ...) {

  # populate date parameter with date data from Landsat scene info
  if (is.null(dates)) {
    if (is.null(getZ(x))) {
      if (!.isLandsatSceneID(x)) { # check if dates can be extracted from layer names
        stop('A date vector must be supplied, either via the date argument, the z ',
             'dimension of x or comprised in names(x)')
      } else {
        dates <- as.Date(getSceneinfo(names(x))$date)
      }
    } else {
      dates <- getZ(x)
    }
  }

  # optional: reformat sensor if needed
  # prepare for subsetting
  sensor <- c(sensor, "ETM+ SLC-on", "ETM+ SLC-off", "OLI")
  s <- getSceneinfo(names(x))
  s <- s[which(s$sensor %in% sensor), ]

  # determine length of coefficient vector
  # = intercept [+ trend] [+ harmoncos*order] [+ harmonsin*order]
  coef_len <- 1 # intercept
  modterms <- attr(terms(formula), "term.labels")
  if ("trend" %in% modterms)
    coef_len <- coef_len + 1
  if ("harmon" %in% modterms)
    coef_len <- coef_len + (order * 2) # sin and cos terms

  fun <- function(x) {
    # subset x by sensor
    if (!is.null(sensor))
      x <- x[which(s$sensor %in% sensor)]

    # convert to bfast ts
    ts <- bfastts(x, dates = dates, type = pptype)

    # optional: apply window() if monend is supplied
    if (!is.null(monend))
      ts <- window(ts, end = monend)

    # run bfastmonitor(), or assign NA if only NA's (i.e. if a mask has been applied)
    if (!all(is.na(ts))) {
      bfm <- try(bfastmonitor(data = ts, start = start,
                              formula = formula,
                              order = order, lag = lag, slag = slag,
                              history = history,
                              type = type, h = h,
                              end = end, level = level), silent = TRUE)

      # assign 1 to error and NA to all other fields if an error is encountered
      if (inherits(bfm, 'try-error')) {
        bkpt <- NA

        magn <- NA
        err <- 1
        history <- NA
        rsq <- NA
        adj_rsq <- NA
        coefficients <- rep(NA, coef_len)
      } else {
        bkpt <- bfm$breakpoint
        magn <- bfm$magnitude
        err <- NA
        history <- bfm$history[2] - bfm$history[1]
        rsq <- summary(bfm$model)$r.squared
        adj_rsq <- summary(bfm$model)$adj.r.squared
        coefficients <- coef(bfm$model)
      }
    } else {
      bkpt <- NA
      magn <- NA
      err <- NA
      history <- NA
      rsq <- NA
      adj_rsq <- NA
      coefficients <- rep(NA, coef_len)
    }

    res <- c(bkpt, magn, err, history, rsq, adj_rsq)
    names(res) <- c("breakpoint", "magnitude", "error", "history", "r.squared",
                    "adj.r.squared")
    res <- res[which(names(res) %in% returnLayers)]
    if ("coefficients" %in% returnLayers)
      res <- c(res, coefficients)
    return(res)
  }

  out <- mc.calc(x = x, fun = fun, mc.cores = mc.cores, ...)
  return(out)
}

# run bfmSpatial on x/ndviStack using the defaults defined above, then view the
# output brick
out <- bfmSpatial(x)
out

# extract change raster

change <- raster(out, 1)

# create month product

months <- changeMonth(change)

# set up labels and colormap for months

monthlabs <- c("jan", "feb", "mar", "apr", "may", "jun",

"jul", "aug", "sep", "oct", "nov", "dec")

cols <- rainbow(12)

# extract magnitude of the raster and scale values between 0 – 1.

magn <- raster(out, 2) / 10000

# make a version showing only breaking pixels

magn_bkp <- magn

magn_bkp [is.na(chang)] <- NA

opar <- par(mfrow=c(1, 2))

# Write breakpoint, yearly break month product, and breakpoint magnitude raster

layers to GeoTiff files as well as the raster brick to a .grd file.

writeRaster(out[[1]], filename = "Site1_NDVI_breaks.tif", format = "GTiff",

overwrite = TRUE)

writeRaster(months$changeMonth2013, filename = "Site1_NDVI_breaksmos13.tif",

format = "GTiff", overwrite = TRUE)

writeRaster(months$changeMonth2014, filename = "Site1_NDVI_breaksmos14.tif",

format = "GTiff", overwrite = TRUE)

writeRaster(months$changeMonth2015, filename = "Site1_NDVI_breaksmos15.tif",

format = "GTiff", overwrite = TRUE)

writeRaster(months$changeMonth2016, filename = "Site1_NDVI_breaksmos16.tif",

format = "GTiff", overwrite = TRUE)

writeRaster(magn_bkp, filename = "Site1_NDVI_magbreaks.tif", format = "GTiff",

overwrite = TRUE)

writeRaster(out, filename = "data/out/out_NDVI.grd", overwrite = TRUE)

# Test breakpoints

plot(ndviStack[[80]], col = grey.colors(255), legend = FALSE)

plot(out[[1]], add=TRUE)

# Test months product

plot(months, col=cols, breaks=c(1:12), legend=FALSE)

legend("bottomright", legend=monthlabs, cex=0.5, fill=cols, ncol=2)

# Test magnitudes

plot(magn_bkp, main="Magnitude of a breakpoint")

plot(magn, main="Magnitude: all pixels")

# restore the graphics settings saved in opar
par(opar)

B. ERROR MATRICES

UA = user's accuracy (%), PA = producer's accuracy (%), OA = overall accuracy (%).

Yucatan Area 1

NDMI
                 D            S        Total           UA
D               52            9           61        85.25
S               12           45           57        78.95
Total           64           54          118
PA           81.25  83.33333333           OA  82.20338983

NBR/NBR2
                 D            S        Total           UA
D               50            9           59        84.75
S               14           45           59        76.27
Total           64           54          118
PA          78.125  83.33333333           OA  80.50847458

EVI
                 D            S        Total           UA
D               28            2           30        93.33
S               36           52           88        59.09
Total           64           54          118
PA           43.75   96.2962963           OA  67.79661017

NDVI
                 D            S        Total           UA
D               35            4           39        89.74
S               29           50           79        63.29
Total           64           54          118
PA         54.6875  92.59259259           OA  72.03389831

MSAVI
                 D            S        Total           UA
D               19            2           21        90.48
S               45           52           97        53.61
Total           64           54          118
PA         29.6875   96.2962963           OA  60.16949153
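The accuracy figures in these matrices follow from the cell counts alone. As a minimal sketch, the R snippet below recomputes them for the NDMI matrix of Yucatan Area 1. It assumes that rows hold the mapped classes and columns the reference classes, which is consistent with UA being reported per row and PA per column. The helper name accuracy_metrics is hypothetical and is not part of the thesis scripts.

```r
# NDMI error matrix for Yucatan Area 1, copied from the table above.
# Assumption: rows = mapped class, columns = reference class.
cm <- matrix(c(52,  9,
               12, 45),
             nrow = 2, byrow = TRUE,
             dimnames = list(map = c("D", "S"), reference = c("D", "S")))

# Hypothetical helper: user's, producer's, and overall accuracy in percent.
accuracy_metrics <- function(cm) {
  list(
    UA = 100 * diag(cm) / rowSums(cm),   # correct / row total
    PA = 100 * diag(cm) / colSums(cm),   # correct / column total
    OA = 100 * sum(diag(cm)) / sum(cm)   # all correct / all samples
  )
}

acc <- accuracy_metrics(cm)
print(lapply(acc, round, 2))
```

Rounded to two decimals, the result is UA = 85.25 and 78.95, PA = 81.25 and 83.33, and OA = 82.20, matching the NDMI table for Yucatan Area 1.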

Yucatan Area 2

NDMI
                 D            S        Total           UA
D               23            4           27        85.19
S                9           28           37        75.68
Total           32           32           64
PA          71.875         87.5           OA      79.6875

NBR/NBR2
                 D            S        Total           UA
D               28            3           31        90.32
S                4           29           33        87.88
Total           32           32           64
PA            87.5       90.625           OA      89.0625

EVI
                 D            S        Total           UA
D                8            2           10        80.00
S               24           30           54        55.56
Total           32           32           64
PA              25        93.75           OA       59.375

NDVI
                 D            S        Total           UA
D               19            8           27        70.37
S               13           24           37        64.86
Total           32           32           64
PA          59.375           75           OA      67.1875

MSAVI
                 D            S        Total           UA
D               11            5           16        68.75
S               21           27           48        56.25
Total           32           32           64
PA          34.375       84.375           OA       59.375

Yucatan Overall

NDMI
                 D            S        Total           UA
D               75           13           88        85.23
S               21           73           94        77.66
Total           96           86          182
PA          78.125  84.88372093           OA  81.31868132

NBR/NBR2
                 D            S        Total           UA
D               78           12           90        86.67
S               18           74           92        80.43
Total           96           86          182
PA           81.25  86.04651163           OA  83.51648352

EVI
                 D            S        Total           UA
D               36            4           40        90.00
S               60           82          142        57.75
Total           96           86          182
PA            37.5  95.34883721           OA  64.83516484

NDVI
                 D            S        Total           UA
D               54           12           66        81.82
S               42           74          116        63.79
Total           96           86          182
PA           56.25  86.04651163           OA  70.32967033

MSAVI
                 D            S        Total           UA
D               30            7           37        81.08
S               66           79          145        54.48
Total           96           86          182
PA           31.25  91.86046512           OA  59.89010989
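The overall matrices appear to be the cell-by-cell sums of the per-area matrices. The sketch below checks this for NDMI by adding the Yucatan Area 1 and Area 2 counts listed in this appendix and recomputing overall accuracy from the pooled counts. The variable names are illustrative only, and the row/column orientation (rows = mapped class, columns = reference class) is an assumption.

```r
# NDMI error matrices for the two Yucatan areas, copied from this appendix.
# Assumption: rows = mapped class, columns = reference class.
classes <- list(map = c("D", "S"), reference = c("D", "S"))

area1 <- matrix(c(52,  9,
                  12, 45), nrow = 2, byrow = TRUE, dimnames = classes)
area2 <- matrix(c(23,  4,
                   9, 28), nrow = 2, byrow = TRUE, dimnames = classes)

# Pool the samples by adding the matrices element-wise.
overall <- area1 + area2
print(overall)

# Overall accuracy (%) computed from the pooled counts.
oa_overall <- 100 * sum(diag(overall)) / sum(overall)
print(round(oa_overall, 2))
```

The pooled matrix has rows D = (75, 13) and S = (21, 73), and the overall accuracy rounds to 81.32, in agreement with the Yucatan Overall NDMI table. Pooling counts weights each area by its number of samples, so the result differs from a simple average of the two per-area OA values.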

Guanacaste Area 1

NDMI
                 D            S        Total           UA
D               16            8           24        66.67
S                5           42           47        89.36
Total           21           50           71
PA     76.19047619           84           OA  81.69014085

NBR/NBR2
                 D            S        Total           UA
D               15            6           21        71.43
S                6           44           50        88.00
Total           21           50           71
PA     71.42857143           88           OA  83.09859155

EVI
                 D            S        Total           UA
D               15           11           26        57.69
S                6           39           45        86.67
Total           21           50           71
PA     71.42857143           78           OA  76.05633803

NDVI
                 D            S        Total           UA
D               15            8           23        65.22
S                6           42           48        87.50
Total           21           50           71
PA     71.42857143           84           OA  80.28169014

MSAVI
                 D            S        Total           UA
D               14           11           25        56.00
S                7           39           46        84.78
Total           21           50           71
PA     66.66666667           78           OA  74.64788732

Guanacaste Area 2

NDMI
                 D            S        Total           UA
D               19            8           27        70.37
S               51           42           93        45.16
Total           70           50          120
PA     27.14285714           84           OA  50.83333333

NBR/NBR2
                 D            S        Total           UA
D               19            4           23        82.61
S               51           46           97        47.42
Total           70           50          120
PA     27.14285714           92           OA  54.16666667

EVI
                 D            S        Total           UA
D               17            2           19        89.47
S               53           48          101        47.52
Total           70           50          120
PA     24.28571429           96           OA  54.16666667

NDVI
                 D            S        Total           UA
D                9            3           12        75.00
S               61           47          108        43.52
Total           70           50          120
PA     12.85714286           94           OA  46.66666667

MSAVI
                 D            S        Total           UA
D               14            2           16        87.50
S               56           48          104        46.15
Total           70           50          120
PA              20           96           OA  51.66666667

Guanacaste Overall

NDMI
                 D            S        Total           UA
D               35           16           51        68.63
S               56           84          140        60.00
Total           91          100          191
PA     38.46153846           84           OA  62.30366492

NBR/NBR2
                 D            S        Total           UA
D               34           10           44        77.27
S               57           90          147        61.22
Total           91          100          191
PA     37.36263736           90           OA  64.92146597

EVI
                 D            S        Total           UA
D               32           13           45        71.11
S               59           87          146        59.59
Total           91          100          191
PA     35.16483516           87           OA  62.30366492

NDVI
                 D            S        Total           UA
D               24           11           35        68.57
S               67           89          156        57.05
Total           91          100          191
PA     26.37362637           89           OA  59.16230366

MSAVI
                 D            S        Total           UA
D               28           13           41        68.29
S               63           87          150        58.00
Total           91          100          191
PA     30.76923077           87           OA  60.20942408

All sites overall

NDMI
                 D            S        Total           UA
D              110           29          139        79.14
S               77          157          234        67.09
Total          187          186          373
PA     58.82352941  84.40860215           OA  71.58176944

NBR/NBR2
                 D            S        Total           UA
D              112           22          134        83.58
S               75          164          239        68.62
Total          187          186          373
PA     59.89304813  88.17204301           OA  73.99463807

EVI
                 D            S        Total           UA
D               68           17           85        80.00
S              119          169          288        58.68
Total          187          186          373
PA     36.36363636  90.86021505           OA  63.53887399

NDVI
                 D            S        Total           UA
D               78           23          101        77.23
S              109          163          272        59.93
Total          187          186          373
PA     41.71122995   87.6344086           OA  64.61126005

MSAVI
                 D            S        Total           UA
D               58           20           78        74.36
S              129          166          295        56.27
Total          187          186          373
PA     31.01604278  89.24731183           OA   60.0536193

All sites without Guanacaste site 2

NDMI
                 D            S        Total           UA
D               91           21          112        81.25
S               26          115          141        81.56
Total          117          136          253
PA     77.77777778  84.55882353           OA   81.4229249

NBR/NBR2
                 D            S        Total           UA
D               93           18          111        83.78
S               24          118          142        83.10
Total          117          136          253
PA     79.48717949  86.76470588           OA  83.39920949

EVI
                 D            S        Total           UA
D               51           15           66        77.27
S               66          121          187        64.71
Total          117          136          253
PA     43.58974359  88.97058824           OA  67.98418972

NDVI
                 D            S        Total           UA
D               69           20           89        77.53
S               48          116          164        70.73
Total          117          136          253
PA     58.97435897  85.29411765           OA  73.12252964

MSAVI
                 D            S        Total           UA
D               44           18           62        70.97
S               73          118          191        61.78
Total          117          136          253
PA     37.60683761  86.76470588           OA  64.03162055
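The matrices that exclude Guanacaste site 2 can be reproduced by subtracting that area's counts from the all-sites counts. The sketch below does this for NDMI using the values listed in this appendix. The variable names and the helper overall_accuracy are hypothetical, and the row/column orientation (rows = mapped class, columns = reference class) is an assumption.

```r
# NDMI error matrices copied from this appendix.
# Assumption: rows = mapped class, columns = reference class.
classes <- list(map = c("D", "S"), reference = c("D", "S"))

all_sites   <- matrix(c(110,  29,
                         77, 157), nrow = 2, byrow = TRUE, dimnames = classes)
guanacaste2 <- matrix(c( 19,   8,
                         51,  42), nrow = 2, byrow = TRUE, dimnames = classes)

# Remove the Guanacaste Area 2 samples from the pooled matrix.
without_g2 <- all_sites - guanacaste2
print(without_g2)

# Hypothetical helper: overall accuracy in percent.
overall_accuracy <- function(cm) 100 * sum(diag(cm)) / sum(cm)

oa_all        <- overall_accuracy(all_sites)
oa_without_g2 <- overall_accuracy(without_g2)
print(round(c(all_sites = oa_all, without_g2 = oa_without_g2), 2))
```

For NDMI, the subtraction yields rows D = (91, 21) and S = (26, 115), and overall accuracy changes from 71.58 with all sites to 81.42 without Guanacaste site 2, matching the corresponding tables.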