

Autonomous Classification of Intra- and Interspecific Bee Species Using Acoustic Signals in Real Time

David Ireland
School of Information Technology and Electrical Engineering
University of Queensland
Brisbane, Australia 4072

Abstract—This paper pertains to the development of a real-time classification system for the discrimination of intraspecific and interspecific bee species using the K-nearest neighbor and probabilistic neural network classification algorithms. The intended applications of this system are autonomous surveillance of invasive bee species and monitoring tools for entomologists. The system was developed on a low-cost platform and showed at least 80% classification accuracy on two intraspecific bee colonies and 100% accuracy in the classification of four distinct bee species.

I. INTRODUCTION

With the rapid decline of insect pollinators there is an increasing demand for tools that provide autonomous tracking of the movements and activities of pollinating insects. This paper focuses on the initial development of a system for the detection and classification of bees: an essential insect in the production of the global food supply. In a cost-effective and portable platform, our system aims to:

1) Provide a tool for entomologists to study the behavior traits of foraging bees, for example, to determine which bee species favor pollinating particular agricultural crops.

2) Provide an autonomous surveillance system for invasive bee species that present a potential hazard to current ecosystems. Australia, for example, considers the Asian honey bee (Apis cerana) and the bumble bee (Bombus terrestris) invasive.

Given the enormous diversity of insects, autonomous detection is a widely researched field. Insect classification methods can usually be placed into two broad categories: acoustic and imaging methods. A perusal of the literature shows acoustic methods are mainly used for field measurements while imaging approaches are conducted in a laboratory environment, usually on dead specimens. Examples using acoustic methods include detection systems for insects in grain silos [1], [2], classification of mosquitoes in [3], [4] and of aphids in [5]. Identification of crickets based on their sounds can be found in [6] and [7]. Examples of insect classification using imaging systems can be found in [8] for the identification of aquatic insects and in [9] for the identification of aphids.

The method proposed in this paper relies solely on acoustic signals emitted by the insects during flight. The novelty of this paper is the discrimination of bee insects using acoustic signals, which is absent from the literature. Moreover, the emphasis on a cost-effective field detection/classification system operating in real time is a major feature of this paper.

II. ACOUSTIC INSECT DETECTION

Insect classification by acoustic signals emitted during flight is not a new technique. The method relies on the phenomenon that the acoustics emitted by an insect in flight have a fundamental frequency approximately equal to the wing-beat frequency of the insect [11]. Further spectrum analysis also reveals a harmonic series in which the dominant frequencies are often not the fundamental frequency [11]. Figures 2 and 3 give the spectrograms of two distinct bee species, Apis mellifera and Amegilla cingulata. Both waveforms have similar fundamental frequencies of approximately 220 Hz; however, in the latter example the fundamental frequency is not the dominant frequency, and more power resides in the higher harmonics than for the Apis mellifera species.

A statistical analysis by [10] has shown that the wing-beat frequency (and thus the produced fundamental frequency) is inversely proportional to the wing area of the insect. Given the extensive variation in insect anatomy, the wing-beat frequency and the associated harmonic series, a feature set can be extracted electronically. This work was inspired by Moore et al. in [3], [4], [5], who pioneered insect discrimination using harmonic sets.

Figure 1 provides a flowchart of our proposed field system. The system records continuously and, after some duration, the fundamental frequency f_o is determined. If f_o is determined to lie in a region of interest, a feature vector is constructed from the audio sample and subsequently classified and logged. Sections III and IV discuss the feature vector extraction and the classification algorithms used in this instance.

III. FEATURE VECTOR GENERATION

Given the harmonic nature of the signal emitted by insects in flight, the cepstrum method was used to determine the fundamental frequency. This method involves first finding the cepstrum power using:

C(q) = \left| \mathcal{F}\left\{ \log\left( |\mathcal{F}\{y(n)\}|^2 \right) \right\} \right|^2 \qquad (1)


[Flowchart: record audio sample → compute f_o → if f_o ∈ [f_o^min, f_o^max], extract feature vector → classify feature vector → log event; otherwise, discard audio sample.]

Fig. 1. An overview flowchart of the proposed detection/classification system.

Fig. 2. Spectrogram of an acoustic signal emitted by a European honey bee (Apis mellifera) during flight.

where y(n) is the sampled waveform, \mathcal{F}\{\cdot\} denotes the Fourier transform, and q is the independent variable of the cepstrum power, referred to as the quefrency. Subsequently f_o is found by:

f_o = \frac{f_s}{\operatorname{argmax}_q \, C(q)} \qquad (2)

where f_s is the sampling frequency. In order to scale f_o for the classification algorithm, we propose the following normalisation function:

f_o^* = \frac{f_o - f_o^{\min}}{f_o^{\max} - f_o^{\min}} \qquad (3)

where f_o^{\min} and f_o^{\max} are the minimum and maximum possible values of f_o.
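As a concrete illustration, the following minimal C++ sketch realises equations (1)-(3) with the FFTW library used on our platform. It is a sketch rather than the deployed code: the function name, the log-floor guard and the restriction of the quefrency search to the band of interest are assumptions made here.

#include <fftw3.h>
#include <cmath>
#include <vector>

// Cepstrum-based estimate of f_o, returned in the normalised form of
// equation (3), or -1.0 if f_o falls outside [fmin, fmax].
double estimate_f0_normalised(const std::vector<double>& y, double fs,
                              double fmin, double fmax) {
    const int n = static_cast<int>(y.size());
    std::vector<double> buf(y), logpow(n, 0.0);
    fftw_complex* spec =
        (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (n / 2 + 1));
    fftw_complex* ceps =
        (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (n / 2 + 1));

    fftw_plan p1 = fftw_plan_dft_r2c_1d(n, buf.data(), spec, FFTW_ESTIMATE);
    fftw_execute(p1);
    for (int k = 0; k <= n / 2; ++k) {          // inner term of eq. (1)
        double pw = spec[k][0] * spec[k][0] + spec[k][1] * spec[k][1];
        logpow[k] = std::log(pw + 1e-12);       // guard against log(0)
    }
    for (int k = 1; k < n / 2; ++k)             // power spectrum of a real
        logpow[n - k] = logpow[k];              // signal is even: mirror it

    fftw_plan p2 = fftw_plan_dft_r2c_1d(n, logpow.data(), ceps, FFTW_ESTIMATE);
    fftw_execute(p2);

    // eq. (2): f_o = fs / argmax_q C(q), searching only quefrencies that
    // correspond to the band of interest [fmin, fmax]
    int qlo = static_cast<int>(fs / fmax), qhi = static_cast<int>(fs / fmin);
    int qbest = qlo;
    double cbest = -1.0;
    for (int q = qlo; q <= qhi && q <= n / 2; ++q) {
        double c = ceps[q][0] * ceps[q][0] + ceps[q][1] * ceps[q][1];
        if (c > cbest) { cbest = c; qbest = q; }
    }
    fftw_destroy_plan(p1); fftw_destroy_plan(p2);
    fftw_free(spec); fftw_free(ceps);

    const double fo = fs / qbest;
    if (fo < fmin || fo > fmax) return -1.0;    // outside region of interest
    return (fo - fmin) / (fmax - fmin);         // eq. (3)
}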

The next step in creating the feature vector is to compute the relative power at multiples of the estimated f_o. This is achieved by summing the power spectral density G_y(f) over the harmonic regions of interest; in this instance we considered ±5% around each harmonic. Using a sampling frequency f_s of 44.1 kHz and a fast Fourier transform length of f_s, we have for each multiple n, after some simplification:

Fig. 3. Spectrogram of an acoustic signal emitted by an Australian blue banded bee (Amegilla cingulata) during flight.

h_n = \sum_{f=\lfloor 19nf_o/20 \rfloor}^{\lceil 21nf_o/20 \rceil} G_y(f), \quad \forall n = 1, \ldots, N_h \qquad (4)

where N_h is the number of multiples considered; this is an arbitrary value, dependent on the bee species to be detected. The functions \lfloor \cdot \rfloor and \lceil \cdot \rceil denote the floor and ceiling functions respectively. The h_n values are further normalised using:

h_n^* = \frac{h_n}{\max\{h_1, h_2, \ldots, h_{N_h}\}} \qquad (5)

Finally the feature vector is defined as:

x = \{f_o^*, h_1^*, h_2^*, \ldots, h_{N_h}^*\} \qquad (6)

For future reference we denote the length of x as N_x, where N_x = N_h + 1.
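Equations (4)-(6) translate almost directly into code. The sketch below assumes the one-sided PSD G_y is stored with 1 Hz bins (FFT length equal to f_s, as stated above); the function and variable names are illustrative only.

#include <algorithm>
#include <cmath>
#include <vector>

// Builds the feature vector x = {f_o^*, h_1^*, ..., h_Nh^*} of eq. (6)
// from the PSD Gy, the estimated fo (Hz) and its normalised value.
std::vector<double> make_feature_vector(const std::vector<double>& Gy,
                                        double fo, double fo_norm, int Nh) {
    std::vector<double> h(Nh, 0.0);
    for (int n = 1; n <= Nh; ++n) {             // eq. (4): +/-5% band
        int lo = static_cast<int>(std::floor(19.0 * n * fo / 20.0));
        int hi = static_cast<int>(std::ceil(21.0 * n * fo / 20.0));
        for (int f = lo; f <= hi && f < static_cast<int>(Gy.size()); ++f)
            h[n - 1] += Gy[f];
    }
    double hmax = *std::max_element(h.begin(), h.end());
    if (hmax <= 0.0) hmax = 1.0;                // avoid division by zero

    std::vector<double> x;
    x.reserve(Nh + 1);                          // Nx = Nh + 1
    x.push_back(fo_norm);                       // f_o^* from eq. (3)
    for (double hn : h) x.push_back(hn / hmax); // eq. (5)
    return x;
}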

IV. CLASSIFICATION ALGORITHMS

For convenience we define the classification of the feature vector x as the function:

D(x) \in \{1, 2, \ldots, N_{\mathrm{class}}\} \qquad (7)

where N_class is the number of classes (or bee species) considered.

A. K-Nearest Neighbor Method

The K-nearest neighbor method (kNN) is a widely used classification method. Given an unknown sample, the kNN method finds the K nearest objects (training data), typically using the Euclidean distance as the metric. Subsequently, the sample is classified based on a majority vote of the K objects. For example, if:

\{x_1, x_2, x_3, \ldots, x_K\} \qquad (8)


denote the K nearest feature vectors to the unknown feature vector, determined by some distance metric, then the newly assigned class is determined by:

k = M\{D(x_1), D(x_2), \ldots, D(x_K)\} \qquad (9)

where M(\cdot) computes the mode of the set of classes. The Euclidean distance metric was used in this paper for all uses of the kNN algorithm.
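A minimal sketch of this rule (equations (8) and (9)) is given below, assuming labelled training vectors of equal length; the Sample type and names are illustrative, and K is assumed not to exceed the training-set size.

#include <algorithm>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

struct Sample { std::vector<double> x; int label; };

int knn_classify(const std::vector<Sample>& train,
                 const std::vector<double>& x, int K) {
    // Euclidean distance of every training object to x
    std::vector<std::pair<double, int>> d;      // (distance, label)
    for (const Sample& s : train) {
        double acc = 0.0;
        for (size_t i = 0; i < x.size(); ++i) {
            const double diff = s.x[i] - x[i];
            acc += diff * diff;
        }
        d.push_back({std::sqrt(acc), s.label});
    }
    std::partial_sort(d.begin(), d.begin() + K, d.end());  // K nearest, eq. (8)

    std::map<int, int> votes;                   // majority vote (mode), eq. (9)
    for (int i = 0; i < K; ++i) ++votes[d[i].second];
    return std::max_element(votes.begin(), votes.end(),
                            [](const std::pair<const int, int>& a,
                               const std::pair<const int, int>& b) {
                                return a.second < b.second;
                            })->first;
}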

B. Probabilistic Neural Network

Probabilistic neural networks (pNNs) are a practical means of implementing Bayesian classification techniques. If an object is to be classified into one of two classes denoted i and j, then class i is chosen according to the Bayes optimal decision rule:

\ell_i h_i f_i(x) > \ell_j h_j f_j(x) \qquad (10)

where \ell_i denotes the loss associated with misclassifying x, h_i is the prior probability of occurrence of the ith class, and f_i(x) is the probability density function (PDF) for the ith class. In practice f_i(x) is usually not known and must be estimated using Parzen's method. This involves taking an average sum of a suitably chosen kernel centred on each observation in the training data [12].

The Gaussian function is a common choice for the kernel as it is well behaved and easily computed [12]. After some simplification, the estimated PDF for a particular class k with N_k training observations becomes:

f_k(x) = \frac{1}{N_k} \sum_{i=1}^{N_k} \exp\left( -\frac{\|x - x_{ki}\|^2}{\sigma^2} \right) \qquad (11)

where x_ki is the ith example of the training data for class k, and \sigma is a scaling parameter that controls the area of influence of the kernel. There is no rigorous mathematical method to determine an optimal \sigma; however, the author has found a simple first-order optimisation approach such as the gradient descent method [13] quite efficient in determining a suitable \sigma for the training set prior to the system being placed online.

Assuming the misclassification losses and prior probabilities of occurrence are constant, the class belonging to the feature vector is determined by:

D(x) = \operatorname{argmax}_n \{f_1(x), f_2(x), \ldots, f_{N_{\mathrm{class}}}(x)\} \qquad (12)
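The complete pNN decision of equations (11) and (12) then reduces to a few loops. The sketch below assumes the training data are grouped per class and that a suitable \sigma has already been found; it returns a zero-based class index, and the data layout is an assumption of this sketch.

#include <cmath>
#include <vector>

// train[k] holds the Nk training vectors x_ki of class k.
int pnn_classify(const std::vector<std::vector<std::vector<double>>>& train,
                 const std::vector<double>& x, double sigma) {
    int best = 0;
    double fbest = -1.0;
    for (size_t k = 0; k < train.size(); ++k) {
        double fk = 0.0;
        for (const std::vector<double>& xki : train[k]) {   // eq. (11)
            double d2 = 0.0;
            for (size_t i = 0; i < x.size(); ++i) {
                const double diff = x[i] - xki[i];
                d2 += diff * diff;
            }
            fk += std::exp(-d2 / (sigma * sigma));
        }
        fk /= static_cast<double>(train[k].size());
        if (fk > fbest) { fbest = fk; best = static_cast<int>(k); } // eq. (12)
    }
    return best;
}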

V. EXPERIMENT SETUP

A. Hardware

An algorithm performing the operations given in Figure 1 was programmed on a FriendlyARM mini2440 platform [14]. This platform features a 400 MHz Samsung ARM9 processor with on-board circuitry for sound recording and a USB interface for data storage. The platform is capable of running both Linux and Windows based operating systems. It was powered by a 12 V lead-acid battery. The cost of the platform is approximately $90 AUD.

The classification software was written in C++ and provided continuous recording using two threads and two alternating buffers. The two threads are referred to as the recording thread and the classification thread. The recording thread continuously places audio samples into an available buffer while the classification thread waits for a buffer to be full (1 second of recording time). Once a buffer is full, the recording thread redirects the audio samples into the second buffer while the classification thread computes the f_o of the waveform stored in the full buffer and subsequently classifies the waveform if the right condition is met, i.e. f_o ∈ [f_o^min, f_o^max]. Continuous recording was maintained provided the f_o computation and classification stages required no more than 1 second of computation time. The freely available FFTW subroutine library [15] was used to compute the PSD. This library is considered the most efficient freely available library for computing the fast Fourier transform; benchmarks performed on a variety of platforms show that FFTW's performance is typically superior to that of other publicly available FFT software, and is even competitive with vendor-tuned codes [15]. Figure 4 provides a photo of the classification system being tested on a colony of Apis mellifera honey bees.

Fig. 4. Photo of the classification system being tested on a colony of Apismellifera honey bees.
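A minimal sketch of this dual-thread, dual-buffer arrangement using standard C++ threads is shown below. The audio-capture call is platform specific and only indicated by a comment; the hand-off mechanism and names are assumptions of this sketch, not the deployed implementation.

#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

const int kFs = 44100;                          // 1 second per buffer

std::vector<double> buf[2] = {std::vector<double>(kFs),
                              std::vector<double>(kFs)};
std::mutex m;
std::condition_variable cv;
int full = -1;                                  // index of the full buffer

void recording_thread() {
    int w = 0;                                  // buffer currently written
    for (;;) {
        // record_into(buf[w]);                 // platform audio capture
        {
            std::lock_guard<std::mutex> lk(m);
            full = w;                           // hand the full buffer over
        }
        cv.notify_one();
        w = 1 - w;                              // switch to the other buffer
    }
}

void classification_thread() {
    for (;;) {
        int r;
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [] { return full >= 0; });
            r = full;
            full = -1;
        }
        // Compute f_o of buf[r]; if f_o lies in [f_o^min, f_o^max],
        // extract the feature vector, classify it and log the event.
        // As in the paper, this must finish within 1 second, otherwise
        // the recording thread hands over the next buffer first.
        (void)r;
    }
}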

B. Classification Performance Criteria

The performance of the algorithm was determined by the proportion of successful classifications that occurred during testing. We mathematically define the function:

g_i = \begin{cases} 1 & \text{if } D(x_i) = k \\ 0 & \text{if } D(x_i) \neq k \end{cases}


where x_i is the ith testing sample and k is the true class. The overall score, representing the fraction of successful classifications, is given as:

\varepsilon = \frac{1}{N_{\mathrm{test}}} \sum_{i=1}^{N_{\mathrm{test}}} g_i \qquad (13)

where N_test is the number of testing samples.
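In code, the indicator g_i and equation (13) amount to a single pass over the test results; the helper below (names assumed) makes the scoring explicit.

#include <vector>

// Fraction of correctly classified test samples, eq. (13).
double classification_score(const std::vector<int>& predicted,
                            const std::vector<int>& actual) {
    int correct = 0;                            // sum of the g_i indicators
    for (size_t i = 0; i < predicted.size(); ++i)
        if (predicted[i] == actual[i]) ++correct;
    return static_cast<double>(correct) / predicted.size();
}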

VI. EXPERIMENT 1: COLONY CLASSIFICATION

The first study presented in this paper is on the efficacy of the classification system in classifying between two intraspecific colonies of European honey bees (Apis mellifera) with an arbitrary size of training data. The system was given a total of N_train training samples with a 50% distribution of training samples per colony. Each training sample was audibly checked to ensure it contained an acoustic signal produced by a bee and had f_o ∈ [f_o^min, f_o^max], where f_o^min = 200 Hz and f_o^max = 250 Hz. The system was stopped after it had classified 100 bees. This was repeated 5 times, with the classification score defined in equation (13) evaluated at each instance. It has been observed a priori that the harmonics emitted by the Apis mellifera bee have negligible amplitude past the 3rd harmonic; therefore N_x was set to 4.

The results of this experiment are given in Table I, where ε_μ denotes the mean of the classification score over the five system runs. Evidently, with a minimal training size of 2, the system is able to obtain, on average, 61% and 54% accuracy using the pNN and kNN algorithms respectively. There was an observable increase in classification accuracy as the number of training samples increased: on average, 78% and 72% accuracy was obtained for the pNN and kNN algorithms respectively. The pNN algorithm is seen to be the more accurate algorithm.

TABLE I
Percentage of successful classifications, determined by equation (13), for the pNN and kNN algorithms as a function of training size N_train, with N_x = 4.

N_train | Algorithm   | ε_1 | ε_2 | ε_3 | ε_4 | ε_5 | ε_μ
      2 | pNN         | 67% | 38% | 59% | 73% | 69% | 61%
      2 | kNN (k = 1) | 45% | 24% | 59% | 72% | 69% | 54%
     10 | pNN         | 65% | 72% | 70% | 80% | 71% | 72%
     10 | kNN (k = 5) | 64% | 72% | 67% | 73% | 68% | 69%
     20 | pNN         | 79% | 75% | 76% | 86% | 73% | 76%
     20 | kNN (k = 5) | 68% | 77% | 72% | 73% | 63% | 71%
     40 | pNN         | 71% | 78% | 77% | 73% | 74% | 75%
     40 | kNN (k = 5) | 63% | 74% | 67% | 69% | 73% | 69%
    100 | pNN         | 79% | 78% | 78% | 76% | 80% | 78%
    100 | kNN (k = 5) | 68% | 77% | 76% | 70% | 73% | 73%

VII. EXPERIMENT 2: INTERSPECIFIC BEE CLASSIFICATION

The second experiment presented in this paper is on the efficacy of the classification system in classifying between four different bee species: the Asian honey bee (Apis cerana), the native Australian blue banded bee (Amegilla cingulata), the European honey bee (Apis mellifera) and the bumble bee (Bombus terrestris). In Australia, Apis cerana and Bombus terrestris are prohibited species, and audio recordings of these insects in flight are therefore very difficult to obtain. As such, the author obtained permission to use audio recordings taken by amateur entomologists in Japan for the Apis cerana bee and in South America for the Bombus terrestris bee. From these audio recordings and further recordings done locally, a training set was constructed which contained 5 one-second audio samples of the bee species under consideration. The centroids of the testing set for each species are given in Figure 5. There is evidently a large variation in wing-beat frequency and in the distribution of power across the harmonics. To provide some evidence of the veracity of this figure, the average recorded wing-beat frequency f_o for each species was compared to values cited in the literature, shown in Table II. Consistency with previously cited values is generally shown for all species, except Amegilla cingulata, for which no literature value could be found.

TABLE II
Wing-beat frequency estimations for four different bee species. Estimations from the present study are average values from the training set.

Species            | f_o    | Citation
Apis cerana        | 265 Hz | Present study
Apis cerana        | 306 Hz | [18]
Amegilla cingulata | 229 Hz | Present study
Apis mellifera     | 225 Hz | Present study
Apis mellifera     | 240 Hz | [16]
Apis mellifera     | 197 Hz | [17]
Bombus terrestris  | 175 Hz | Present study
Bombus terrestris  | 156 Hz | [16]
Bombus terrestris  | 130 Hz | [17]

Due to the small number of training and testing samples, the classification system was tested offline: a testing sample was removed from the training set and applied to the classification algorithm. Table III provides the results for N_x = 1, i.e. only the wing-beat frequency is used in the classification algorithms, and for N_x = 12. As seen, both algorithms performed the same, providing 88% accuracy when using only f_o as the classification feature. However, when given the complete feature vector (N_x = 12), both algorithms achieved 100% classification accuracy. It would also appear that the kNN in this instance preferred a low value of k. Given Figure 5, these results are not surprising, as there are significant differences between the feature vectors of the different bee species. It is also apparent that the wing-beat frequency alone can be a reasonable feature for interspecific bee discrimination.
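This offline test is leave-one-out evaluation. A self-contained sketch is given below; the Labelled type and the classifier callback are illustrative, and any of the classifiers sketched earlier could be passed in.

#include <functional>
#include <vector>

struct Labelled { std::vector<double> x; int label; };

// Leave-one-out: hold each sample out in turn, train on the rest and
// score the held-out sample; returns the fraction correct, eq. (13).
double leave_one_out(
    const std::vector<Labelled>& data,
    const std::function<int(const std::vector<Labelled>&,
                            const std::vector<double>&)>& classify) {
    int correct = 0;
    for (size_t i = 0; i < data.size(); ++i) {
        std::vector<Labelled> train;
        for (size_t j = 0; j < data.size(); ++j)
            if (j != i) train.push_back(data[j]);
        if (classify(train, data[i].x) == data[i].label) ++correct;
    }
    return static_cast<double>(correct) / data.size();
}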


[Figure: four panels plotting the normalised harmonic powers h_n, n = 1, …, 12, one panel per species.]

Fig. 5. Centroids of the training samples for the four bee species: Apis cerana, Amegilla cingulata, Apis mellifera and Bombus terrestris. f_o^min = 150 Hz and f_o^max = 300 Hz.

TABLE III
Percentage of successful classifications, determined by equation (13), for the pNN and kNN algorithms with N_x = 1 and N_x = 12.

Algorithm   | ε (N_x = 1) | ε (N_x = 12)
pNN         | 88%         | 100%
kNN (k = 1) | 88%         | 100%
kNN (k = 2) | 88%         | 100%
kNN (k = 3) | 71%         | 82%
kNN (k = 4) | 82%         | 88%
kNN (k = 5) | 65%         | 76%

VIII. CONCLUSION

This paper has presented a system for the surveillance and classification of bee insects in real time using the acoustic signals emitted by the insects during flight. The intended purposes of this system are the surveillance of invasive bee species and tools for tracking bee behavior in new entomology studies. Extraction of a feature vector from the sampled acoustic waveform was described, followed by two classification algorithms implemented on a low-cost prototype platform.

The first experiment pertained to the intraspecific classification of two colonies of Apis mellifera. An average classification accuracy of 79% was obtained using a probabilistic neural network. The second experiment pertained to the interspecific classification of four distinct bee species, where 100% classification accuracy was obtained using both the probabilistic neural network and k-nearest neighbor methods. This shows that intraspecific classification is possible and obtains reasonable accuracy with the proposed algorithms. The results of interspecific classification were very promising, albeit obtained with a limited training and testing set.

Future work on the proposed system includes the inclusion of more bee species in the training set and the addition of wireless connectivity for event notification. Subsequently the system is expected to be deployed over a wider area and operated for long periods of time.

ACKNOWLEDGMENT

The author would like to thank Yu’s apiaries for the use oftheir beehives and the amateur and professional entomologistswho donated their audio recordings of various insects. Theauthor acknowledges the technical assistance given by Dr.Konstanty Bialkowski of the University of Queensland.

REFERENCES

[1] K.M. Coggins and J. Principe, Detection and classification of insect sounds in a grain silo using a neural network, Neural Networks Proceedings, 1998 IEEE World Congress on Computational Intelligence, vol. 3, pp. 1760-1765, 4-9 May 1998.

[2] F. Fleurat-Lessard, B. Tomasini, L. Kostine and B. Fuzeau, Acoustic detection and automatic identification of insect stages activity in grain bulks by noise spectra processing through classification algorithms, Proceedings of the 9th International Working Conference on Stored Product Protection, 15-18 October 2006, Campinas, Sao Paulo, Brazil.

[3] A. Moore, J.R. Miller, B.E. Tabashnik and S.H. Gage, Automated identification of flying insects by analysis of wingbeat frequencies, J. Econ. Entomol. 79:1703-1706.

[4] A. Moore, Artificial neural network trained to identify mosquitoes in flight, Journal of Insect Behavior, vol. 4, no. 3, 1991.

[5] A. Moore and R.H. Miller, Automated identification of optically sensed aphid (Homoptera: Aphidae) wing waveforms, Annals of the Entomological Society of America, 95(1):1-8, 2002.

[6] I. Potamitis, T. Ganchev and N. Fakotakis, Automatic acoustic identification of insects inspired by the speaker recognition paradigm, INTERSPEECH 2006 - ICSLP, 9th International Conference on Spoken Language Processing, Pittsburgh, PA, USA, September 17-21, 2006.

[7] E.D. Chesmore, Application of time domain signal coding and artificial neural networks to passive acoustical identification of animals, Applied Acoustics 62 (2001) 1359-1374.

[8] M.J. Sarpola, R.K. Paasch, E.N. Mortensen, T.G. Dietterich, D.A. Lytle, A.R. Moldenke and L.G. Shapiro, An aquatic insect imaging system to automate insect classification, Transactions of the American Society of Agricultural and Biological Engineers, 51(6):2217-2225, 2008.

[9] R. Kumar, V. Martin and S. Moisan, Robust insect classification applied to real time greenhouse infestation monitoring, IEEE ICPR Workshop on Visual Observation and Analysis of Animal and Insect Behavior, Istanbul, 2010.

[10] M. Deakin, Formulae for insect wingbeat frequency, Journal of Insect Science, 10(96):1-9, 2010.

[11] R. Dudley, The Biomechanics of Insect Flight, Princeton University Press, Oxfordshire, United Kingdom.

[12] T. Masters, Practical Neural Network Recipes in C++, Academic Press, 1st edition, 1993.

[13] J.A. Snyman, Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms, Springer, 2005.

[14] FriendlyARM. [Online]. Available: http://www.friendlyarm.net [Accessed: April 12, 2011].

[15] FFTW. [Online]. Available: http://www.fftw.org/ [Accessed: April 12, 2011].

[16] O. Sotavalta, The essential factor regulating the wing stroke frequency of insects in wing mutilation and loading experiments and in experiments at subatmospheric pressure, Ann. Zool. Soc. 'Vanamo' 15:1-67.

[17] D.N. Byrne, Relationship between wing loading, wingbeat frequency and body mass in Homopterous insects, Journal of Experimental Biology, 135:9-23, 1988.

[18] N.P. Goyal and A.S. Atwal, Wingbeat frequency of A. indica indica F. and A. mellifera L., Journal of Apicultural Research, 16:47-48, 1977.