creating functional groups for palanan forest plot data using...

Proceeding of the International Conference on Artificial Intelligence in Computer Science and ICT (AICS 2013), 25-26 November 2013, Langkawi, MALAYSIA. (e-ISBN 978-967-11768-3-2). Organized by WorldConferences.net

CREATING FUNCTIONAL GROUPS FOR PALANAN FOREST PLOT DATA USING CLUSTERING AND LINEAR DISCRIMINANT ANALYSIS

Jaymar Soriano1, Angelo Meneses1, Ryan Odylon Rondon1, Adrian Roy Valdez1, Melissa Cardenas2, and Sandra Yap2

1Scientific Computing Laboratory, Department of Computer Science, University of the Philippines,

Diliman, Quezon City

2Institute of Biology, University of the Philippines, Diliman, Quezon City

Corresponding authors: [email protected], [email protected]

Abstract

Forest management is usually done for commercial and/or conservation purposes. It involves

collection of data from individual forest trees, such as coordinates and diameter-at-breast height

(DBH), and extensive analysis of these data. Development of growth and

yield models is affected by parameters derived from an individual tree level or species level of

analysis. Thus, creation of functional groups is usually sought as an initial step. The Palanan Forest

Dynamics Plot under study was established in 1994 in Palanan, Isabela by the Center for Tropical

Forest Science of the Smithsonian Tropical Research Institute. It has one of the highest biodiversity

indices for forests in the Philippines. Local and wide competition factors, recruitment, and recruitment

factor are computed from the DBH and coordinates while the slopes are computed from the elevation.

Growth factor is also computed from the differences in the DBH values of the trees from 1998 to

2010. We perform discriminant analysis using several combinations of these parameters. We find that

elevation, slope, local and wide competition factors, and growth factor yield the most distinctive

functional groups. A dedicated analysis of the Dipterocarpaceae family is also performed since it is

the most dominant, comprising almost 50% of the plot. Its members are found to be characterized by high

elevation, low competition factors, and large growth factors.

Keywords: Clustering, dipterocarps, discriminant analysis, forest management, soft computing

1. Introduction

The Philippines has long been identified as a biodiversity hotspot with its high degree of endemicity

per unit area coupled with high pressures on habitats (Myers et al. 2000). The Sierra Madre Range in

particular may be considered as the hottest spot in the Philippines (Antolin 2003) as it is one of the last

areas that have contiguous intact primary forests, in a country that has only 18-20% forest cover

(ESSC, 1999 and IBON, 2002) and only 2.7% primary forest cover in 1999, as compared to 70%

coverage in 1900 (ESSC, 1999). As of 2003, forest cover has been estimated at 24% but this includes

open forests and plantations (Forest Management Bureau, 2008) that may not be considered

biologically diverse. With the growing importance of reducing the effects of climate change,

older primary forests that are multi-layered and multi-aged have been shown to have higher carbon

storage than other types of tree stands (Keith et al., 2009).

Meanwhile, Dipterocarp trees dominate Southeast Asian forests; most species are found in the upper

canopy and emergent layers, but some species are understory trees (Hopea, Vatica, some Shorea)

(LaFrankie, 2010). These forests have a long history of being logged for their hardwood. Logging

intensified from the late 1800s to the mid-1900s and continues today, leaving some species

classified as critically endangered. Non-timber forest products of dipterocarps include resins such as

dammar (LaFrankie, 2010). In the Philippines, noted non-timber products include palosapis and balau

resins and seed oils such as malayakal and gisok. Pardo de Tavera, in his Plantas Medicinales de Filipinas

(1892), notes that Dipterocarpus turbinatus, locally called mayapis, yields an oleoresin

that can be used as a diuretic and for other indications such as bronchial catarrh and ulcers.

Clearly, forest management is needed to arrive at policies for stable commercial use

and for the preservation of species. Forest management decision makers develop growth and yield

models to help in this endeavor. This starts by creating functional groups to which the models will be

applied, which avoids the complexity of implementing a different model for each species of tree.

Resource management decisions for commercial and conservation purposes can be made from the

state of the forest implied by species groups. Analysis of the discriminants that effectively identify

distinct functional groups will be useful for future researchers who seek to create new species

groupings in other areas. It will also guide future researchers on which data are worth

recording.

In this paper, data from individual trees of the Palanan Forest Dynamics Plot are analyzed to come

up with functional groups, which can later be used for forest management decisions. Growth factors

such as recruitment and competition are computed from the diameter-at-breast height (DBH) data

while slope is computed from the elevation data. We performed clustering and discriminant analysis

at both the tree level and the species level, and identified which combinations of factors yield significant

discrimination of functional groups. Finally, based on the results of these analyses, the trees

under the identified groups are analyzed further and the groupings verified. Since the Dipterocarpaceae family of trees

composes more than 50% of the forest plot and is popular for commercial use, a dedicated analysis

is performed on it.

Figure 1: Distribution of forest tree species with respect to rank.

[Figure 1 plot: frequency versus rank on logarithmic axes, with series for basal area and population.]

2. The Palanan Forest Dynamics Plot Census

The Palanan Forest Dynamics Plot (PFDP) was established in 1994 in Palanan, Isabela by the Center

for Tropical Forest Science (CTFS) of the Smithsonian Tropical Research Institute. It is located at 17° 02' 36" N,

122° 22' 58" E, in Isabela, Philippines. The forest profile of the PFDP is similar to that of lowland mixed

dipterocarp forests in other CTFS plots in Southeast Asia (CTFS, 2011); however, it exhibited

relatively poor recruitment. Despite this, basal area is high due to numerous large trees,

the largest of which was a Shorea negrosensis tree with a DBH of 203 cm. The project plot in Palanan

is one of 40 permanent forest plots distributed in 21 countries (CTFS-STRI, 2011). The first census

was conducted in 1994 over an 8-ha area, which was extended to 16 ha (400 m x 400 m) in the 1998,

2004, and 2010 censuses (Co et al., 2006). The censuses contain the x and y coordinates, DBH, and

status of individual trees among 310 different species identified in the forest. An additional 2,530

trees are unidentified. The population and total basal area of trees per species are shown in Figure 1.

The 50 most dominant species comprise about 72% of the plot, and it is also verified that the

Dipterocarpaceae family dominates the plot, occupying more than 50% of the total occupied area.

Separate elevation data for the forest plot are provided, from which the elevation (z-coordinate) of

each individual tree is interpolated and, consequently, the slopes are calculated. Figure 2 shows

the topography of the forest plot.

Figure 2: Topography of the Palanan Forest Dynamics Plot.
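The text does not state how the elevations were interpolated or how the slopes were derived. As a minimal sketch, assuming the elevation data lie on a regular square grid and that bilinear interpolation and finite differences are acceptable, the two steps could look as follows. The function names, the grid layout, and the 5 m spacing of the toy grid are illustrative assumptions, not details taken from the census.

```python
import math


def _cell_index(grid, spacing, x, y):
    """Return (row, col) of the grid cell containing (x, y), clamped to the grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    col = min(int(x // spacing), n_cols - 2)
    row = min(int(y // spacing), n_rows - 2)
    return row, col


def bilinear_elevation(grid, spacing, x, y):
    """Interpolate elevation at (x, y); grid[r][c] is the elevation at
    x = c * spacing, y = r * spacing (assumed layout)."""
    row, col = _cell_index(grid, spacing, x, y)
    tx = x / spacing - col  # fractional position inside the cell along x
    ty = y / spacing - row  # fractional position inside the cell along y
    near = grid[row][col] * (1 - tx) + grid[row][col + 1] * tx
    far = grid[row + 1][col] * (1 - tx) + grid[row + 1][col + 1] * tx
    return near * (1 - ty) + far * ty


def cell_slope(grid, spacing, x, y):
    """Approximate slope (rise over run) of the cell containing (x, y),
    taken as the magnitude of the averaged finite-difference gradient."""
    row, col = _cell_index(grid, spacing, x, y)
    dzdx = ((grid[row][col + 1] - grid[row][col])
            + (grid[row + 1][col + 1] - grid[row + 1][col])) / (2 * spacing)
    dzdy = ((grid[row + 1][col] - grid[row][col])
            + (grid[row + 1][col + 1] - grid[row][col + 1])) / (2 * spacing)
    return math.hypot(dzdx, dzdy)


# Toy 3 x 3 grid (made-up values): a plane rising 1 m for every 5 m in x.
toy_grid = [[80.0, 81.0, 82.0],
            [80.0, 81.0, 82.0],
            [80.0, 81.0, 82.0]]
z_tree = bilinear_elevation(toy_grid, 5.0, 2.5, 7.0)
m_tree = cell_slope(toy_grid, 5.0, 2.5, 7.0)
```

Other interpolation schemes (for example, splines or kriging) would fit the same interface; the choice here is only the simplest one that yields a z value and a slope per tree.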

3. Growth Factors

Aside from DBH, we computed other growth factors, which are later used for clustering and

discriminant analysis. The competition factor is a quantity that indicates the resources that are not

available to the reference tree due to competition with its neighbors (Phillips, 2000). This value is

taken as an alternative to resource data, such as luminance and soil nutrition maps, that are not in the

census data. Two competition factors are computed: the local competition factor (within 5

meters) and the wide competition factor (within 30 meters). They are respectively

computed as follows:

$$S_j = \sum_{i=1}^{N_{\mathrm{local}}} \frac{DBH_i}{d_{ij}\, DBH_j} \qquad (1)$$

$$S'_j = \sum_{i=1}^{N_{\mathrm{wide}}} \frac{DBH_i}{d_{ij}\, DBH_j} \qquad (2)$$

where d_ij is the distance between neighbor tree i within the competition area and the reference tree j. The

formulas imply that closer neighbor trees with large DBH increase the competition

factor, while a large DBH of the reference tree lowers its own competition factor. Recruitment is

computed for each tree as the number of trees of the same species within a 5 m radius with DBH less than

10 cm. To count its recruits, the reference tree must itself have a DBH greater than or equal

to 10 cm; otherwise the tree has no recruits. A recruitment factor is also computed per tree. The recruitment

factor for a sufficiently large tree (with DBH greater than or equal to 10 cm) is the average of the wide

area competition factors of its recruits. It is a value that indicates whether the reference tree’s recruits

grow successfully (Phillips, 2000). Finally, the growth rate of a tree is computed as the difference between

the DBHs of the same tree in the 1998 and 2010 censuses, divided by 12.

[Figure 2 panels: plot over X and Y (0 to 400) with a vertical axis from 80 to 120; slope matrix with color key and histogram.]
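The definitions above can be sketched in a few lines of Python. This is an illustrative reading of equations (1) and (2) and of the recruitment and growth-rate rules, not the code used for the study: the `Tree` record, the function names, the inclusive radius test, and the choice to skip neighbors at zero distance are all assumptions, and the toy trees are made-up values.

```python
import math
from typing import NamedTuple


class Tree(NamedTuple):
    x: float        # plot coordinate in metres
    y: float        # plot coordinate in metres
    dbh: float      # diameter at breast height in cm
    species: str    # hypothetical species label


def distance(a, b):
    return math.hypot(a.x - b.x, a.y - b.y)


def competition_factor(trees, j, radius):
    """Sum of DBH_i / (d_ij * DBH_j) over neighbours i within `radius` of tree j."""
    ref = trees[j]
    total = 0.0
    for i, other in enumerate(trees):
        if i == j:
            continue
        d_ij = distance(ref, other)
        # Assumption: coincident records (d_ij == 0) are skipped to avoid
        # dividing by zero, and the radius is treated as inclusive.
        if 0.0 < d_ij <= radius:
            total += other.dbh / (d_ij * ref.dbh)
    return total


def recruitment(trees, j, radius=5.0, dbh_limit=10.0):
    """Count same-species trees with DBH < dbh_limit within `radius` of tree j.
    A reference tree with DBH below dbh_limit has no recruits."""
    ref = trees[j]
    if ref.dbh < dbh_limit:
        return 0
    return sum(
        1
        for i, other in enumerate(trees)
        if i != j
        and other.species == ref.species
        and other.dbh < dbh_limit
        and distance(ref, other) <= radius
    )


def growth_rate(dbh_1998, dbh_2010, years=12):
    """DBH difference between the two censuses divided by the years elapsed."""
    return (dbh_2010 - dbh_1998) / years


# Made-up toy plot with four trees.
trees = [
    Tree(0.0, 0.0, 20.0, "A"),
    Tree(3.0, 4.0, 40.0, "B"),
    Tree(0.0, 20.0, 10.0, "A"),
    Tree(1.0, 0.0, 2.0, "A"),
]
local_S = competition_factor(trees, 0, 5.0)    # 5 m neighbourhood
wide_S = competition_factor(trees, 0, 30.0)    # 30 m neighbourhood
```

The loop is quadratic in the number of trees; for a full census a spatial index or grid binning would be the natural refinement, but that is outside what the text describes.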

Figure 3: Spatial distribution of trees clustered by growth factor: DBH (upper left), recruitment factor

(upper right), competition factor (lower left), and growth rate (lower right).

4. Spatial Distribution

Clustering is performed using the k-means algorithm. Here, the number of clusters k is set in advance. The

algorithm starts from randomly generated centroids and attempts to minimize the sum of squared Euclidean distances from

each data point to the centroid of its cluster. The data are partitioned into k distinct clusters, and each observation

belongs to exactly one cluster.
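A minimal sketch of this procedure is given below, assuming plain Lloyd-style iterations. Drawing the initial centroids from the data points is one common choice and is not necessarily what was used for the results reported here; the function name `kmeans`, its parameters, and the toy DBH values are hypothetical.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Partition `points` (a list of equal-length tuples) into k clusters."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up one-dimensional example: cluster trees by DBH alone.
dbh_values = [(1.0,), (1.2,), (0.8,), (10.0,), (10.5,), (9.5,)]
labels, centroids = kmeans(dbh_values, k=2)
```

Because the result depends on the random start, k-means is usually run several times and the partition with the smallest total squared distance is kept.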

Clustering was done using each of the growth factors as a preliminary analysis of the distribution of the

trees. By plotting the trees at their coordinates, labeled by cluster, we can analyze the

spatial distribution based on the growth factor used. The results are shown in Figure 3. It is observed

that the majority of the trees have small DBH values and are highly clustered spatially, while trees having

distinctly large DBH are scattered throughout the plot. A similar observation is noted with recruitment

and recruitment factor. This follows logically since their definitions are derived from DBH values. On

the other hand, the spatial distribution of the trees by competition factors does not follow from the

DBH values. We can see that trees of comparable competition factors are not highly clustered

spatially. Trees with low competition factors can also be identified in certain regions of the plot. For

clustering by growth rate, we found that a total of 39,239 trees survived from the 1998 census to the 2010 census.

Figure 3 shows that the majority of the trees grow at a slower rate and are highly clustered spatially,

similar to what is observed with clustering by DBH.

5. Identification of Functional Groups

Linear discriminant analysis is a classification algorithm that achieves minimum error rate

classification for observations with normal densities. This is done by maximizing the ratio of the between-class

variance to the within-class variance in a given data set, thus maximizing the separability of

classes. Geometrically, it does not change the shape or location of the original data sets; it only

draws a hyperplane that separates the given classes.
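For reference, the ratio being maximized can be written in the standard Fisher form. The symbols below (direction $w$, class means $\mu_k$, overall mean $\mu$, class sizes $n_k$, classes $C_k$, and scatter matrices $S_B$ and $S_W$) are generic textbook notation and are not taken from this paper.

```latex
J(w) = \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w},
\qquad
S_B = \sum_{k=1}^{K} n_k\,(\mu_k - \mu)(\mu_k - \mu)^{\top},
\qquad
S_W = \sum_{k=1}^{K} \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^{\top}
```

The maximizing directions are eigenvectors of $S_W^{-1} S_B$. With $K$ classes and $p$ factors there are at most $\min(K-1, p)$ of them with nonzero eigenvalue, and the share of variation attributed to a discriminant function is commonly reported as its eigenvalue divided by the sum of all eigenvalues.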

Discriminant analysis, together with clustering analysis, was used to identify functional groups for the

forest plot. From the complete set of factors that can be used for the analysis, namely DBH,

recruitment, recruitment factor, local competition factor, wide competition factor, growth rate,

elevation, and slope, we identified five factors that are able to effectively

discriminate the trees, identifying possible functional groups for the forest plot. These factors are local

competition factor (S), wide competition factor (S’), growth rate (G), elevation (z), and slope (m). The

elimination of DBH, recruitment, and recruitment factor can be explained by the dependence of

these factors on one another and on the wide competition factor. Thus, the wide competition factor

becomes the representative for these factors.
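One simple way to inspect such dependence, offered here only as an illustration and not as the procedure used in this study, is to compute pairwise Pearson correlations between the candidate factors. The helper name `pearson` and the toy values are assumptions; the toy wide competition factors merely mimic the inverse relation to the reference tree's DBH implied by equation (2).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values: larger reference trees have smaller wide competition factors.
dbh = [5.0, 10.0, 20.0, 40.0]
wide_competition = [8.0, 4.0, 2.0, 1.0]
r_dbh_wide = pearson(dbh, wide_competition)
```

A correlation close to -1 or +1 between two factors suggests that one of them adds little discriminating information once the other is included.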

We first look at the classification at the tree level. Discriminant analysis shows that two discriminant

functions can account for 99.9% of the variation in the data. These two functions are given by:

Function1 = 0.986 z - 0.044 m + 0.015 S + 0.147 S' + 0.008 G

Function2 = -0.146 z - 0.067 m + 0.080 S + 0.944 S' + 0.064 G (3)

The first discriminant function tends to discriminate by elevation, while the second discriminates by wide

competition factor. Projecting the data onto the subspace determined by these functions, we see in

Figure 4 that highly discriminated groups are identified. The same is achieved using the clustering

analysis juxtaposed in Figure 4. The clustering analysis verifies the qualitative findings: one

group of trees clusters along the ridge at high elevations, another group has high

wide competition factors, and a third has low wide competition factors.
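To illustrate how the tree-level functions in (3) are applied, the sketch below evaluates both functions for a single tree. It assumes the inputs are already on whatever scale (for example, standardized values) the coefficients were fitted on, which the text does not specify. The dictionary keys, the function name, and the example tree are hypothetical.

```python
# Coefficients copied from the tree-level discriminant functions in (3).
TREE_LEVEL_FUNCTIONS = {
    "Function1": {"z": 0.986, "m": -0.044, "S": 0.015, "S_wide": 0.147, "G": 0.008},
    "Function2": {"z": -0.146, "m": -0.067, "S": 0.080, "S_wide": 0.944, "G": 0.064},
}


def discriminant_scores(tree, functions=TREE_LEVEL_FUNCTIONS):
    """Return the value of each discriminant function for one tree.

    `tree` maps factor names (z, m, S, S_wide, G) to values that are assumed
    to be on the scale used when the coefficients were estimated.
    """
    return {
        name: sum(coef[factor] * tree[factor] for factor in coef)
        for name, coef in functions.items()
    }


# Made-up tree that is one unit above average in elevation and average otherwise.
example_tree = {"z": 1.0, "m": 0.0, "S": 0.0, "S_wide": 0.0, "G": 0.0}
scores = discriminant_scores(example_tree)
```

For this example the first score is dominated by the elevation coefficient, which matches the observation that the first function discriminates mainly by elevation.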

We also performed classification at the species level. For this, each species is represented by the

average values of the growth factors used for classification. Discriminant analysis shows that two

discriminant functions can account for 94.5% of the variation in the data. These two functions are given

by:

Function1 = 0.684 z - 0.830 m - 0.859 S + 0.751 S' + 0.581 G

Function2 = 0.355 z - 0.455 m + 0.723 S + 0.160 S' + 0.034 G (4)
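The species-level representation described above, in which each species is replaced by the averages of its trees' factors, can be sketched as follows. The record layout, the factor keys, the function name, and the toy values are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical keys for elevation, slope, local and wide competition, growth rate.
FACTORS = ("z", "m", "S", "S_wide", "G")


def species_means(records, factors=FACTORS):
    """Average each factor over all tree records of the same species."""
    sums = defaultdict(lambda: [0.0] * len(factors))
    counts = defaultdict(int)
    for rec in records:
        species = rec["species"]
        counts[species] += 1
        for idx, name in enumerate(factors):
            sums[species][idx] += rec[name]
    return {
        species: {
            name: total / counts[species]
            for name, total in zip(factors, sums[species])
        }
        for species in sums
    }


# Made-up tree records for two placeholder species.
records = [
    {"species": "sp_a", "z": 100.0, "m": 0.2, "S": 1.0, "S_wide": 4.0, "G": 0.2},
    {"species": "sp_a", "z": 110.0, "m": 0.4, "S": 3.0, "S_wide": 6.0, "G": 0.4},
    {"species": "sp_b", "z": 90.0, "m": 0.1, "S": 2.0, "S_wide": 5.0, "G": 0.1},
]
means = species_means(records)
```

Each resulting row is then treated as a single observation in the clustering and discriminant analysis, so a species with many trees carries the same weight as a rare one.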


Figure 4: Tree-level classification using clustering analysis (left) and discriminant analysis (right).

Figure 5: Species-level classification using clustering analysis (left) and discriminant analysis (right).

The first discriminant function tends to discriminate by the negative of the local competition factor

and by growth rate, while the second discriminates by the local competition factor. Projecting the data onto the subspace

determined by these functions, Figure 5 shows that the 310 different species are discriminated into

distinct groups. The same is achieved using the clustering analysis juxtaposed in Figure 5. Sufficiently

distinguishable groups can also be recognized from the result of the clustering analysis, which placed

five of the ten Dipterocarpaceae species in one cluster. This cluster is characterized by high growth

rate and low local competition factor, which verifies the classification by discriminant analysis. Four

of the ten Dipterocarpaceae species, together with 11 other species, are found in another cluster

characterized by low growth rate and high local competition factor.


Clustering and discriminant analysis on the Dipterocarpaceae species alone reiterate the findings observed

in the species-level classification. That is, Dipterocarpaceae species can be divided into two functional

groups – one with high growth rate and low competition factors, and another with the exact opposite.

6. Conclusion and Future Work

We performed clustering and discriminant analysis on the Palanan Forest Dynamics Plot censuses

using local competition factor, wide competition factor, growth rate, elevation, and slope. Tree-level

classification generated results that were highly influenced by the terrain data, while species-level

classification was driven by local competition factor and growth rate. The advantage of the species-level

classification is that it can group species that coexist frequently and can be used to obtain species

preferences useful for forest management. The clusters generated can then be used for simulation in

the Palanan Forest Dynamics Plot for growth and yield modeling. The characteristics of each cluster

can be used as inputs for predictive growth and yield functions.

We remark that in the species-level classification, the species are represented by the average values of

the growth factors of trees under the same species. Although significant classification has been

achieved, the standard deviation and other statistics will be investigated to determine whether a more effective

discrimination can be achieved.

References

[1] Co L., et al. Forest Trees of Palanan, Philippines: A Study in Population Ecology, CIDS UP

Diliman, Philippines, 2006

[2] Phillips P.D., et al. Grouping tree species for analysis of forest data in Kalimantan (Indonesian

Borneo), 2002

[3] Kohler P., Huth A. The effects of tree species grouping in tropical rain forest modelling, Center

for Environmental Systems Research, University of Kassel, December 1998

[4] Myers N, Mittermeier RA, Mittermeier CG, da Fonseca GAB, Kent J. Biodiversity hotspots for

conservation priorities. Nature 403: 853-858. 2000.

[5] Environmental Science for Social Change. Decline of the Philippine forest. The Bookmark, Inc.

Makati. 1999.

[6] IBON Foundation, Inc. The state of the Philippine environment. IBON Foundation, Inc. Manila.

1997.

[7] Keith H, Mackey BG and Lindenmayer DB. Re-evaluation of forest biomass carbon stocks and

lessons from the world’s most carbon-dense forests. Proceedings of the National Academy of

Sciences 106 (28): 11635–11640. 2009.

[8] LaFrankie, JVJ. Trees of tropical Asia: an illustrated guide to diversity. Black Tree

Publications, Philippines. 2010.

[9] W. Hardle and L. Simar. Applied Multivariate Statistical Analysis. Springer. 2007.
