Proceeding of the International Conference on Artificial Intelligence in Computer Science and ICT (AICS 2013), 25 -26 November 2013, Langkawi, MALAYSIA. (e-ISBN 978-967-11768-3-2). Organized by WorldConferences.net 305
CREATING FUNCTIONAL GROUPS FOR PALANAN FOREST PLOT DATA USING CLUSTERING AND LINEAR DISCRIMINANT
ANALYSIS
Jaymar Soriano1, Angelo Meneses1, Ryan Odylon Rondon1, Adrian Roy Valdez1, Melissa Cardenas2, and
Sandra Yap2
1Scientific Computing Laboratory, Department of Computer Science, University of the Philippines,
Diliman, Quezon City
2Institute of Biology, University of the Philippines, Diliman, Quezon City
Corresponding authors: [email protected], [email protected]
Abstract
Forest management is usually done for commercial and/or conservation purposes. It involves the
collection of data from individual forest trees, such as coordinates and diameter-at-breast height
(DBH), and an extensive analysis of these data. The development of growth and yield models is
affected by parameters derived from a tree-level or species-level analysis. Thus, the creation of
functional groups is usually sought as an initial step. The Palanan Forest Dynamics Plot under study
was established in 1994 in Palanan, Isabela by the Center for Tropical Forest Science of the
Smithsonian Tropical Research Institute. It has one of the highest biodiversity indices for forests
in the Philippines. Local and wide competition factors, recruitment, and recruitment factor are
computed from the DBH and coordinates, while the slopes are computed from the elevation. Growth rate
is also computed from the differences in the DBH values of the trees from 1998 to 2010. We perform
discriminant analysis using several combinations of these parameters. We find that elevation, slope,
local and wide competition factors, and growth rate yield the most distinctive functional groups. A
dedicated analysis of the Dipterocarpaceae family is also performed since it is the most dominant,
comprising almost 50% of the plot. Its species are found to be characterized by high elevation, low
competition factors, and high growth rates.
Keywords: Clustering, dipterocarps, discriminant analysis, forest management, soft computing
1. Introduction
The Philippines has long been identified as a biodiversity hotspot with its high degree of endemicity
per unit area coupled with high pressures on habitats (Myers et al. 2000). The Sierra Madre Range in
particular may be considered as the hottest spot in the Philippines (Antolin 2003) as it is one of the last
areas that have contiguous intact primary forests, in a country that has only 18-20% forest cover
(ESSC, 1999 and IBON, 2002) and only 2.7% primary forest cover in 1999, as compared to 70%
coverage in 1900 (ESSC, 1999). As of 2003, forest cover has been estimated at 24% but this includes
open forests and plantations (Forest Management Bureau, 2008) that may not be considered
biologically diverse. Given the increasing importance of reducing the effects of climate change, it
is notable that older primary forests that are multi-layered and multi-aged have been shown to have
higher carbon storage than other types of tree stands (Keith et al., 2009).
Meanwhile, dipterocarp trees dominate Southeast Asian forests; most species are found in the upper
canopy and emergent layers, but some species are understory trees (Hopea, Vatica, some Shorea)
(LaFrankie, 2010). These forests have a long history of being logged for their hardwood. Logging
intensified from the late 1800s to the mid-1900s and still continues today, leading to species being
classified as critically endangered. Non-timber forest products of dipterocarps include resins such
as dammar (LaFrankie, 2010). In the Philippines, noted non-timber products include palosapis and
balau resins and seed oils such as malayakal and gisok. Pardo de Tavera, in his Plantas Medicinales
de Filipinas (1892), notes that Dipterocarpus turbinatus, locally called mayapis, yields an oleoresin
that can be used as a diuretic and for other indications such as bronchial catarrh and ulcers.
Clearly, forest management is needed to formulate policies for stable commercial use and for the
preservation of species. Forest management decision makers develop growth and yield models to help
in this endeavor. This starts with creating functional groups to which the models will be applied,
which avoids the complexity of implementing a different model for each species of tree. Resource
management decisions for commercial and conservation purposes can be made from the state of the
forest implied by the species groups. Analysis of the discriminants that effectively identify
distinct functional groups will be useful for future researchers who seek to create new species
groupings in other areas. It will also guide future researchers on which data are worth recording.
In this paper, data from individual trees of the Palanan Forest Dynamics Plot were analyzed to come
up with functional groups, which can later be used for forest management decisions. Growth factors
such as recruitment and competition are computed from the diameter-at-breast height (DBH) data,
while slope is computed from the elevation data. We performed clustering and discriminant analysis
at both the tree level and the species level, and identified which combination of factors yields
significant discrimination of functional groups. Finally, based on the results of these analyses,
the trees under the identified groups are analyzed further and the groupings are verified. Since the
Dipterocarpaceae family of trees composes more than 50% of the forest plot and is popular for
commercial use, a dedicated analysis of this family is also performed.
Figure 1: Distribution of forest tree species with respect to rank (frequency versus rank on
logarithmic axes; series: basal area and population).
2. The Palanan Forest Dynamics Plot Census
The Palanan Forest Dynamics Plot (PFDP) was established in 1994 in Palanan, Isabela by the Center
for Tropical Forest Science (CTFS) of the Smithsonian Tropical Research Institute. It is located at
17° 02' 36" N, 122° 22' 58" E, in Isabela, Philippines. The forest profile of the PFDP is similar to
that of lowland mixed dipterocarp forests in other CTFS plots in Southeast Asia (CTFS, 2011);
however, it exhibited relatively poor recruitment. Despite this, basal area is high due to numerous
large trees, the largest of which was a Shorea negrosensis tree with a DBH of 203 cm. The project
plot in Palanan
is one of 40 permanent forest plots distributed in 21 countries (CTFS-STRI, 2011). The first census
was conducted in 1994 over the 8-ha area, which was extended to 16-ha (400 m x 400 m) in the 1998,
2004, and 2010 censuses (Co et al., 2006). The censuses contain the x and y coordinates, DBH, and
status of individual trees among 310 different species identified in the forest. Additionally, 2,530
more trees are unidentified. The population and total basal area of trees per species are shown in
Figure 1. The 50 most dominant species comprise about 72% of the plot, and it is also verified that
the Dipterocarpaceae family dominates the plot, occupying more than 50% of the total occupied area.
Separate elevation data for the forest plot are provided, from which the elevation (z-coordinate) of
each individual tree is interpolated and, consequently, the slopes are calculated. Figure 2 shows
the topography of the forest plot.
Figure 2: Topography of the Palanan Forest Dynamics Plot.
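The paper does not state how elevations were interpolated or how slopes were derived, so the following is only a minimal sketch under stated assumptions: the elevation data are taken to lie on a regular grid with a known spacing, elevation at a tree is obtained by bilinear interpolation, and slope is the gradient magnitude from central differences. The function names, the grid layout, and the step size `h` are all hypothetical.

```python
import math

def bilinear_elevation(grid, spacing, x, y):
    """Interpolate elevation at (x, y) from a regular grid (assumed layout).

    grid[r][c] holds the elevation at x = c * spacing, y = r * spacing.
    The grid must have at least two rows and two columns.
    """
    max_c = len(grid[0]) - 1
    max_r = len(grid) - 1
    # Clamp to the grid so trees on the plot boundary are still handled.
    gx = min(max(x / spacing, 0.0), float(max_c))
    gy = min(max(y / spacing, 0.0), float(max_r))
    c0 = min(int(gx), max_c - 1)
    r0 = min(int(gy), max_r - 1)
    tx, ty = gx - c0, gy - r0
    # Blend the four corner elevations of the enclosing grid cell.
    low = grid[r0][c0] * (1 - tx) + grid[r0][c0 + 1] * tx
    high = grid[r0 + 1][c0] * (1 - tx) + grid[r0 + 1][c0 + 1] * tx
    return low * (1 - ty) + high * ty

def slope_at(grid, spacing, x, y, h=1.0):
    """Slope (rise over run) as the gradient magnitude of the interpolated surface."""
    dzdx = (bilinear_elevation(grid, spacing, x + h, y)
            - bilinear_elevation(grid, spacing, x - h, y)) / (2 * h)
    dzdy = (bilinear_elevation(grid, spacing, x, y + h)
            - bilinear_elevation(grid, spacing, x, y - h)) / (2 * h)
    return math.hypot(dzdx, dzdy)
```

Because of the clamping, slopes evaluated within `h` of the plot boundary are biased toward zero; a one-sided difference would be a reasonable alternative there.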
3. Growth Factors
Aside from DBH, we computed other growth factors, which are later used for clustering and
discriminant analysis. The competition factor is a quantity that indicates the resources that are
not available to the reference tree due to competition with its neighbors (Phillips, 2000). This
value is taken as an alternative to resource data, such as luminance and soil nutrition maps, that
are not in the census data. Two competition factors are computed: the local competition factor
(within 5 meters) and the wide competition factor (within 30 meters). They are respectively
computed as follows:
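$$S_j = \sum_{i=1}^{N_{\mathrm{local}}} \frac{\mathrm{DBH}_i}{d_{ij}\,\mathrm{DBH}_j} \qquad (1)$$

$$S'_j = \sum_{i=1}^{N_{\mathrm{wide}}} \frac{\mathrm{DBH}_i}{d_{ij}\,\mathrm{DBH}_j} \qquad (2)$$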
where d_ij is the distance between a neighbor tree i within the competition area and the reference
tree j. The formulas imply that closer neighbor trees with large DBH contribute to increasing the
competition factor, while a large DBH of the reference tree lowers its own competition factor.
Recruitment is computed for each tree as the number of trees of the same species within a 5 m radius
with DBH less than 10 cm. To be able to count its recruits, the reference tree must also have a DBH
greater than or equal to 10 cm; otherwise the tree has no recruits. Recruitment factor is also
computed per tree. The recruitment factor for a sufficiently large tree (with DBH greater than or
equal to 10 cm) is the average of the wide-area competition factors of its recruits. It is a value
that indicates whether the reference tree's recruits grow successfully (Phillips, 2000). Finally,
the growth rate of a tree is computed as the difference of the DBHs of the same tree between the
1998 and 2010 censuses divided by 12, the number of years between the two censuses.
[Figure 2 plot area: elevation surface over X and Y (each 0 to 400), with a "Slope Matrix" heat map,
color key, and histogram.]
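The definitions in this section can be sketched in a few lines of Python. This is an illustrative reading of Equations (1) and (2) and of the recruitment rules, not the authors' code: the dictionary keys, the function names, the inclusive radius test, the skipping of neighbors at zero distance, and the value 0.0 for a tree without recruits are all assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two trees in the plot plane."""
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])

def competition_factor(trees, j, radius):
    """Sum of DBH_i / (d_ij * DBH_j) over neighbors i within `radius` of tree j.

    radius = 5 gives the local factor S, radius = 30 the wide factor S'.
    """
    ref = trees[j]
    total = 0.0
    for i, other in enumerate(trees):
        if i == j:
            continue
        d = distance(ref, other)
        # Assumption: neighbors at zero distance are skipped to avoid dividing by zero.
        if 0.0 < d <= radius:
            total += other["dbh"] / (d * ref["dbh"])
    return total

def recruit_indices(trees, j, radius=5.0, min_dbh=10.0):
    """Indices of same-species trees with DBH < min_dbh within `radius` of tree j."""
    ref = trees[j]
    if ref["dbh"] < min_dbh:
        return []  # a small reference tree has no recruits
    return [i for i, other in enumerate(trees)
            if i != j
            and other["species"] == ref["species"]
            and other["dbh"] < min_dbh
            and distance(ref, other) <= radius]

def recruitment_factor(trees, j, wide_radius=30.0):
    """Average wide competition factor of the recruits of tree j."""
    recruits = recruit_indices(trees, j)
    if not recruits:
        return 0.0  # assumption: the source does not define this case
    return sum(competition_factor(trees, i, wide_radius) for i in recruits) / len(recruits)

def growth_rate(dbh_1998, dbh_2010, years=12):
    """DBH increment per year between the 1998 and 2010 censuses."""
    return (dbh_2010 - dbh_1998) / years
```

Recruitment itself is then `len(recruit_indices(trees, j))`. The pairwise loop is quadratic in the number of trees, so a spatial index would be a sensible replacement for a full census.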
Figure 3: Spatial distribution of trees clustered by growth factor: DBH (upper left), recruitment factor
(upper right), competition factor (lower left), and growth rate (lower right).
4. Spatial Distribution
Clustering is performed using the k-means algorithm. Here, the number of clusters k is set
beforehand. The algorithm starts from randomly chosen centroids and iteratively attempts to minimize
the sum of squared Euclidean distances from each data point to the centroid of its cluster. The data
are partitioned into k distinct clusters, and each observation belongs to exactly one cluster.
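As an illustration of the procedure just described, here is a minimal sketch of k-means in the form of Lloyd's algorithm, written with the standard library only. The paper does not say which implementation or initialization was used; choosing the initial centroids by sampling k data points with a fixed seed is an assumption, as are the function and parameter names.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=100, seed=0):
    """Partition `points` (a list of tuples) into k clusters.

    Returns (labels, centroids), where labels[n] is the cluster index of points[n].
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = []
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            new_labels.append(nearest)
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
    return labels, centroids

# Hypothetical DBH values (cm): a group of small trees and a group of large trees.
dbh_points = [(1.0,), (2.0,), (1.5,), (50.0,), (52.0,), (51.0,)]
labels, centroids = kmeans(dbh_points, k=2)
```

k-means converges to a local minimum that depends on the initial centroids, so in practice several seeds are usually tried and the partition with the lowest within-cluster sum of squares is kept.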
Clustering was done using each of the growth factors as a preliminary analysis of the distribution
of the trees. By plotting the trees at their coordinates, labeled by cluster, we can analyze the
spatial distribution based on the growth factor used. The results are shown in Figure 3. It is
observed that the majority of the trees have small DBH values and are highly clustered spatially,
while trees having distinctly large DBH are scattered throughout the plot. A similar observation is
noted with recruitment and recruitment factor. This logically follows since their definitions are
derived from DBH values. On the other hand, the spatial distribution of the trees by competition
factors does not follow from the DBH values. We can see that trees of comparable competition factors
are not highly clustered spatially. Trees with low competition factors can also be identified in
certain regions of the plot. For clustering by growth rate, we found that a total of 39,239 trees
survived from the 1998 census to the 2010 census. Figure 3 shows that the majority of the trees grow
at a slower rate and are highly clustered spatially, similar to what is observed with clustering by
DBH.
5. Identification of Functional Groups
Linear discriminant analysis is a classification algorithm that achieves minimum error rate
classification for observations with normal densities and equal covariance matrices. This is done by
maximizing the ratio of the between-class variance to the within-class variance in any particular
data set, thus maximizing the separability of the classes. Geometrically, it does not change the
shape and location of the original data sets and only draws a hyperplane that separates the given
classes.
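To make the variance-ratio criterion concrete, the sketch below computes Fisher's linear discriminant for the simplest case of two classes and two features, where the direction maximizing the ratio is w = S_w^{-1}(m_b - m_a). It is a reduced illustration only: the paper applies discriminant analysis to several groups and five factors, and the sample values, names, and midpoint threshold (which assumes equal class priors) are hypothetical.

```python
def mean_vector(rows):
    """Component-wise mean of a list of 2-feature observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter_matrix(rows, mean):
    """2x2 scatter matrix: sum of outer products of deviations from the mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fisher_direction(class_a, class_b):
    """Return (w, threshold) for a two-class, two-feature Fisher discriminant."""
    m_a, m_b = mean_vector(class_a), mean_vector(class_b)
    s_a, s_b = scatter_matrix(class_a, m_a), scatter_matrix(class_b, m_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = [m_b[0] - m_a[0], m_b[1] - m_a[1]]
    # w = inverse(sw) * diff, using the closed-form 2x2 inverse.
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    # Midpoint of the projected class means (assumes equal priors).
    threshold = 0.5 * (dot(w, m_a) + dot(w, m_b))
    return w, threshold

# Hypothetical (elevation, growth rate) observations for two groups of trees.
low_group = [(80.0, 1.0), (82.0, 1.2), (81.0, 0.8), (83.0, 1.1)]
high_group = [(110.0, 1.1), (112.0, 0.9), (111.0, 1.3), (113.0, 1.0)]
w, threshold = fisher_direction(low_group, high_group)
```

An observation `x` is assigned to the second group when `dot(w, x)` exceeds `threshold`. With more than two groups, the discriminant directions are instead obtained as eigenvectors of S_w^{-1} S_b.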
Discriminant analysis, together with clustering analysis, was used to identify functional groups for
the forest plot. The complete set of factors that can be used for the analysis consists of DBH,
recruitment, recruitment factor, local competition factor, wide competition factor, growth rate,
elevation, and slope. Of these, we have identified five factors that effectively discriminate the
trees, identifying possible functional groups for the forest plot. These factors are local
competition factor (S), wide competition factor (S'), growth rate (G), elevation (z), and slope (m).
The elimination of DBH, recruitment, and recruitment factor can be explained by the dependence of
these factors on each other and on the wide competition factor. Thus, the wide competition factor
becomes the representative for these factors.
We first look at the classification at the tree level. Discriminant analysis shows that two
discriminant functions can account for 99.9% of the variation in the data. These two functions are
given by:
Function1 = 0.986 z - 0.044 m + 0.015 S + 0.147 S' + 0.008 G
Function2 = -0.146 z - 0.067 m + 0.080 S + 0.944 S' + 0.064 G (3)
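The two tree-level functions are linear combinations of the five factors, so projecting a tree onto the discriminant subspace is a pair of dot products. The sketch below uses the coefficients printed in Equation (3); the dictionary keys (with `S_wide` standing in for S') and the assumption that the inputs are already standardized are mine, since the paper does not say whether the coefficients are standardized.

```python
# Coefficients copied from the tree-level discriminant functions, Equation (3).
TREE_LEVEL_COEFFS = {
    "Function1": {"z": 0.986, "m": -0.044, "S": 0.015, "S_wide": 0.147, "G": 0.008},
    "Function2": {"z": -0.146, "m": -0.067, "S": 0.080, "S_wide": 0.944, "G": 0.064},
}

def discriminant_scores(features, coeffs=TREE_LEVEL_COEFFS):
    """Project one tree onto the discriminant functions.

    `features` maps z, m, S, S_wide, G to values. Assumption: the values are
    standardized the same way as the data used to fit the coefficients.
    """
    return {
        name: sum(weight * features[key] for key, weight in weights.items())
        for name, weights in coeffs.items()
    }

# Hypothetical standardized tree: one unit above the mean in elevation only.
scores = discriminant_scores({"z": 1.0, "m": 0.0, "S": 0.0, "S_wide": 0.0, "G": 0.0})
```

Plotting `Function1` against `Function2` for every tree gives the kind of two-dimensional projection in which separated groups can be inspected.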
The first discriminant function tends to discriminate by elevation, while the second discriminates
by wide competition factor. Projecting the data onto the subspace determined by these functions, we
see in Figure 4 that highly discriminated groups are identified. The same is achieved using
clustering analysis, juxtaposed in Figure 4. The clustering analysis verifies the qualitative
findings: one group of trees clusters along the ridge at high elevations, another group has high
wide competition factors, and a third has low wide competition factors.
We also performed classification at the species level. For this, each species is represented by the
average values of the growth factors used for classification. Discriminant analysis shows that two
discriminant functions can account for 94.5% of the variation in the data. These two functions are
given by:
Function1 = 0.684 z - 0.830 m - 0.859 S + 0.751 S' + 0.581 G
Function2 = 0.355 z - 0.455 m + 0.723 S + 0.160 S' + 0.034 G (4)
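The species-level representation amounts to grouping the tree records by species and averaging each factor. A minimal sketch, assuming each tree is a dictionary with a `species` key and the five factor keys (the key names, including `S_wide` for S', are hypothetical):

```python
from collections import defaultdict
from statistics import mean

FACTORS = ("z", "m", "S", "S_wide", "G")

def species_means(trees, factors=FACTORS):
    """Represent each species by the mean of each factor over its trees."""
    grouped = defaultdict(list)
    for tree in trees:
        grouped[tree["species"]].append(tree)
    return {
        species: {f: mean(t[f] for t in members) for f in factors}
        for species, members in grouped.items()
    }
```

The resulting one-row-per-species table is what would be passed to clustering and discriminant analysis in place of the tree-level table.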
Figure 4: Tree-level classification using clustering analysis (left) and discriminant analysis (right).
Figure 5: Species-level classification using clustering analysis (left) and discriminant analysis (right).
The first discriminant function tends to discriminate by the negative of the local competition
factor and by growth rate, while the second discriminates by the local competition factor.
Projecting the data onto the subspace determined by these functions, Figure 5 shows that the 310
different species are discriminated into distinct groups. The same is achieved using clustering
analysis, juxtaposed in Figure 5. Sufficiently distinguishable groups can also be recognized from
the result of the clustering analysis, which placed five of the ten Dipterocarpaceae species in one
cluster. This cluster is characterized by high growth rate and low local competition factor, which
verifies the classification by discriminant analysis. Four of the ten Dipterocarpaceae species,
together with 11 other species, are found in another cluster characterized by low growth rate and
high local competition factor.
Clustering and discriminant analysis on Dipterocarpaceae species alone reiterate the findings
observed in the species-level classification. That is, Dipterocarpaceae species can be divided into
two functional groups: one with high growth rate and low competition factors, and another with the
opposite characteristics.
6. Conclusion and Future Work
We performed clustering and discriminant analysis on the Palanan Forest Dynamics Plot censuses
using local competition factor, wide competition factor, growth rate, elevation, and slope.
Tree-level classification generated results that were highly influenced by the terrain data, while
species-level classification was driven by local competition factor and growth rate. The advantage
of the species-level classification is that it can group species that coexist frequently and can be
used to obtain species preferences useful for forest management. The clusters generated can then be
used for simulation in the Palanan Forest Dynamics Plot for growth and yield modeling. The
characteristics of each cluster can be used as inputs for predictive growth and yield functions.
We remark that in the species-level classification, the species are represented by the average
values of the growth factors of trees under the same species. Although significant classification
has been achieved, the standard deviation and other statistics will be investigated to determine
whether a more effective discrimination can be achieved.
References
[1] Co L., et al. Forest Trees of Palanan, Philippines: A Study in Population Ecology. CIDS UP
Diliman, Philippines. 2006.
[2] Phillips P.D., et al. Grouping tree species for analysis of forest data in Kalimantan (Indonesian
Borneo). 2002.
[3] Kohler P., Huth A. The effects of tree species grouping in tropical rain forest modelling. Center
for Environmental Systems Research, University of Kassel. December 1998.
[4] Myers N, Mittermeier RA, Mittermeier CG, da Fonseca GAB, Kent J. Biodiversity hotspots for
conservation priorities. Nature 403: 853-858. 2000.
[5] Environmental Science for Social Change. Decline of the Philippine forest. The Bookmark, Inc.
Makati. 1999.
[6] IBON Foundation, Inc. The state of the Philippine environment. IBON Foundation, Inc. Manila.
1997.
[7] Keith H, Mackey BG and Lindenmayer DB. Re-evaluation of forest biomass carbon stocks and
lessons from the world’s most carbon-dense forests. Proceedings of the National Academy of
Sciences 106 (28): 11635–11640. 2009.
[8] LaFrankie, JVJ. Trees of tropical Asia: an illustrated guide to diversity. Black Tree
Publications, Philippines. 2010.
[9] W. Hardle and L. Simar. Applied Multivariate Statistical Analysis. Springer. 2007.