Predicting species identity of bumblebees through analysis of flight buzzing sounds

Abstract
We present a study of buzzing sounds of several common species of bumblebees, with the focus on automatic classification of bumblebee species and types. Such classification is useful for bumblebee monitoring, which is important for evaluating the quality of their living environment and protecting the biodiversity of these important pollinators. We analysed natural buzzing frequencies for queens and workers of 12 species. In addition, we analysed changes in the buzzing of a Bombus hypnorum worker for different types of behaviour. We developed a bumblebee classification application using machine learning algorithms. We extracted audio features from sound recordings using a large feature library. We used the best features to train a classification model, with Random Forest proving to be the best training algorithm on the testing set of samples. The web and mobile application also allows expert users to upload new recordings that can later be used to improve the classification model and expand it to include more species.


Introduction
Bumblebees (genus Bombus from the bee family Apidae) are social insects that play an important role in the ecosystem as pollinators of various plants. Bumblebees typically have bulkier bodies than honeybees, which allows them to be active at lower temperatures and in a wider variety of weather conditions. Furthermore, unlike domestic honeybees, bumblebees use a technique called buzz-pollination or sonication: by shaking the anthers, they extract pollen from flowers of certain plants that release pollen only through small openings in the anthers' tips (De Luca & Vallejo-Marín 2013). This makes bumblebees the key pollinators of plants such as clover or tomatoes. Cranberries, blueberries and kiwifruit also benefit from buzz pollination (Buchmann 1985). Shipping bumblebee families to greenhouses has even become a lucrative business. While only some species are of commercial interest, all bumblebee species are important in natural ecosystems, with certain plants depending on a single species for pollination. Worldwide, there are around 250 known bumblebee species (Williams & Osborne 2009; Grad et al. 2010). The highest diversity is found in the temperate regions of Asia where the genus originates. Bumblebees are also common in Europe, North Africa, North America, and the mountains of Central and South America. They have also been introduced to other regions, such as Australia, New Zealand and South Africa, for agricultural purposes (Grad et al. 2010). Central European countries show a relatively high diversity in Bombus species, including the subgenus Psithyrus (Rasmont & Iserbyt 2010-2013). Forty species have been identified in Germany and Switzerland, 45 in Austria (Schwarz et al. 1996), 39 in Poland and 35 in Slovenia (Grad et al. 2010). However, studies in the last decade (Williams & Osborne 2009) have demonstrated that bumblebee species are declining worldwide, with possible reasons being related to land-use change and agricultural practices.
The decline of pollinator numbers was also highlighted in the recent report of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services, pointing out that further declines would present serious risks in production of foods, such as seeds and fruits, which rely on these species (Gilbert 2016). Consequently, methods for quick and accurate automatic recognition of bumblebee species are becoming a matter of increased interest.
Classification of bumblebee species typically relies on visible morphological characteristics, since species differ in body size and structure or hair colour and pattern. Within the same species, queens are typically larger than workers, while males can differ either in colouration or other physical characteristics, such as the number of dorsal plates (tergites) in the abdomen or the length of the antennae. Males also lack pollen baskets on their hind legs. Several web or mobile applications are available to help with visual identification (Trilar 2014; Bumblebee Conservation Trust 2015). These classification techniques require active human involvement in the decision procedure, which makes them time-consuming, and they demand skilled observers with good eyesight, since several bumblebee species look alike to an untrained eye. Automatic classification based on visual features (image recognition) would be beneficial, but it is difficult due to complications arising from bumblebee orientation, image quality, lighting conditions or background. On the other hand, bumblebee buzzing sound is relatively easy to acquire and can in principle be collected remotely and continuously, which is more practical than traditional scientific surveys that collect individuals.
Automatic classification of animal sounds has been attempted before for several different animal groups, such as birds and frogs. Huang et al. (2009) used k-nearest neighbour (kNN) and support vector machine (SVM) classifiers to recognize frog species based on three extracted audio features, resulting in around 90% classification accuracy for six species. Cheng et al. (2010) attempted to distinguish between four different species of passerine birds using probabilistic models, especially the Gaussian mixture model (GMM), with Mel-Frequency Cepstrum Coefficients (MFCCs) as the audio features. They reached around 90% classification accuracy. A similar approach was used by Lee et al. (2008) on a larger set of birds, reporting an overall 84% classification accuracy on a set of 28 bird species. The classification reached 100% accuracy for several species, while it was significantly lower (even less than 10%) in a few cases. The authors acknowledged the limits of their method due to a limited amount of data and a lack of standard test data. Other groups compared the performance of different machine learning algorithms for classification of birds and frogs (Acevedo et al. 2009) and birds (Lopes et al. 2011). The SVM algorithm produced good results in both studies, with classification accuracies above 90%.
In the field of insect sound classification, substantial work has been done with two different goals: to monitor biodiversity and to detect pests (such as larvae in timber) for phytosanitary applications (Chesmore 2008). Different types of artificial neural networks, including multilayer perceptron (MLP), self-organizing map, and learning vector quantization, have been used for classification of cicada and grasshopper species based on their sound and for identification of beetle species based on sounds generated by their larvae biting on wood fibres. In these studies, the classification accuracy, based on 3 or 4 different species in each case, was typically above 80%. Chesmore and Nellenbach (2001) also demonstrated that it is possible to correctly identify 25 British Orthoptera species with up to 99% accuracy; however, the authors noted that they were performing the study using high-quality recordings with no interfering signals. Ganchev et al. (2007) used an approach that is similar to the one employed in human speech recognition. They used a series of linear frequency cepstral coefficients as feature vectors and various approaches, including probabilistic neural networks and GMM, to build classification models. On a set of 313 species of crickets, katydids and cicadas, they reported 86% classification accuracy on a species level, with the accuracy further increasing if the classification was performed on a genus or subfamily level.
Currently, various web and mobile applications for recognizing animal species from sounds exist; however, none of them has been available for bumblebees until now. One should also note that animal sound recognition is typically based on detecting structured sounds like bird songs or frog calls, whereas flying insects create sounds in the form of rather monotonic buzzing. Therefore, the methods applied for bird and flying insect recognition might differ due to the different input. Still, bumblebees can produce different buzzing sounds under different circumstances. We distinguish buzzing during flight, sonication and hissing. These sounds are produced by oscillations of the flight muscles inside the metathorax. We define the natural frequency as the frequency at which an undamped system will vibrate in the absence of an opposing force (King et al. 1996), which is the case during flight. In addition, bumblebees produce sounds with significantly higher frequencies during sonication (King et al. 1996) to release pollen from certain types of flowers. Bumblebees achieve vibrations by placing their thorax close to the anthers and contracting their flight muscles at a high frequency of about 400 Hz (King 1993; Goulson 2010). When disturbed, bumblebees hiss (Kirchner & Röschard 1999). Hissing was found to be a defence mechanism intended to chase away potential intruders, such as mice, from the nests. It was demonstrated that hissing can be triggered by vibrating the nest or increasing the CO2 concentration, both of which can be related to the presence of an intruder (Kirchner & Röschard 1999). Higher frequencies of both sonication and hissing sounds have been linked to the reduced inertia of the flight system when the wings are decoupled, i.e. when the flight muscles move without moving the wings (King et al. 1996; Kirchner & Röschard 1999; De Luca & Vallejo-Marín 2013; De Luca et al. 2014). De Luca et al. (2014) further studied pollination and defence buzzes in five bumblebee taxa in relation to body size. In addition to these characteristic sounds, we noticed that bumblebees produce audibly different sounds when cold, when ventilating the nest, or when trapped in a large closed space, such as a room.
In this paper, we focus on analysing buzzing sounds of several common bumblebee species of Central Europe. The first part deals with natural frequencies (from the buzzing sound produced during flight) of different bumblebee species and castes (queens and workers). In addition, we present a sonogram of a B. hypnorum worker, recorded in various situations (flight, buzz pollination, hissing and buzzing in a room).
The second part of the paper focuses on automatic classification of bumblebee species and castes based on the buzzing sound. Here, we only consider the sound produced during flight, as it is the easiest to obtain in the field, and represents the characteristic sound of undisturbed bumblebees. Different machine-learning algorithms were tested to build classification models, using several audio features calculated from original sound recordings. We discuss the accuracy of our classification approach and consider possible further improvements.

Data acquisition
Buzzing sounds were recorded using a Yamaha Pocketrak PR7 recorder. Samples were recorded at 24 bit/96 kHz and written in the .wav format. Sound recordings were obtained for 12 bumblebee species, both for queens and workers, except for B. argillaceus and B. terrestris, where recordings were obtained only for queens, and B. jonellus, only for workers. For all species, recordings were obtained for bumblebees during foraging (visiting flowers). The length of individual recordings ranged from several seconds to over a minute. Additionally, in order to analyse the buzzing sound in different scenarios, a B. hypnorum worker was recorded in various circumstances, e.g. while feeding on an Aquilegia vulgaris flower (sonication), trapped in a glass jar, and in a room. All recordings were obtained in the spring and summer months of 2014 and 2015 at various locations in Slovenia.

Sound processing
In the first step, original recordings were manually cut to segments up to 5 s long, and parts with no bumblebee sound were discarded. In addition, the segments where the background noise was significantly interfering with the buzzing were excluded as well. No additional pre-processing was used.
Sound recordings were analysed using the Audacity and Matlab software. Natural frequencies for bumblebees were obtained using the Fourier transform of the recordings.
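As an illustration of how a natural frequency can be read from a Fourier spectrum, the following minimal Python sketch scans a frequency band and returns the strongest spectral component. It is not the Audacity or Matlab procedure used in the study: the function names, the 100-400 Hz search band, the 1 Hz step and the synthetic 200 Hz test signal are all assumptions made for the example.

```python
import cmath
import math


def dft_magnitude(samples, sample_rate, freq_hz):
    """Magnitude of the discrete Fourier component of `samples` at freq_hz."""
    step = -2j * math.pi * freq_hz / sample_rate
    return abs(sum(x * cmath.exp(step * n) for n, x in enumerate(samples)))


def estimate_natural_frequency(samples, sample_rate,
                               f_min=100.0, f_max=400.0, f_step=1.0):
    """Return the frequency in [f_min, f_max] with the strongest component.

    The search band and step are illustrative assumptions, not study settings.
    """
    best_freq, best_mag = f_min, -1.0
    freq = f_min
    while freq <= f_max:
        mag = dft_magnitude(samples, sample_rate, freq)
        if mag > best_mag:
            best_freq, best_mag = freq, mag
        freq += f_step
    return best_freq


# Synthetic stand-in for a flight buzz: a 200 Hz fundamental with two
# weaker harmonics (hypothetical data, not a real recording).
SAMPLE_RATE = 4000
signal = [
    math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
    + 0.25 * math.sin(2 * math.pi * 600 * n / SAMPLE_RATE)
    for n in range(2000)
]
estimated = estimate_natural_frequency(signal, SAMPLE_RATE)
print(f"Estimated natural frequency: {estimated:.0f} Hz")
```

Restricting the search to a band around the expected wingbeat frequency keeps a strong harmonic from being mistaken for the fundamental; for real field recordings a fast Fourier transform from a numerical library would be the practical choice.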

Acoustic feature extraction
Audio feature extraction is applied to transform raw audio data into features that explicitly represent properties of the data that may be relevant for classification. The features were extracted using the openSMILE feature extraction tool (Eyben et al. 2010). The software takes a .wav file as input and then computes 1582 numerical features. These features include, among others, the MFCCs, which typically perform well in audio classification scenarios as seen in related work (Lee et al. 2008;Cheng et al. 2010).
It is a common approach in several machine learning communities to generate an abundant number of features, estimate their quality and construct a much smaller subset of the most relevant attributes for further use (Robnik-Šikonja & Kononenko 2003). The smaller set typically enables better performance in terms of quality, manageability and processing speed. For the bumblebees, the best 100 features were chosen based on the information gain (IG) as the feature quality measure, with 100 being a number that proved a reasonable choice in previous experiments in similar domains. The list of 100 chosen attributes is provided in Supplementary Information.
In information theory, the IG of a particular feature i describes the change in information entropy H after this feature is used to split the training data T into subsets:
IG(T, i) = H(T) − H(T|i).
Entropy H is a measure of unpredictability of information (about bumblebee species), so it is low in subsets that are pure with respect to the particular species and vice versa. If a feature i discriminates between the species well, H(T|i) is significantly lower than H(T), and IG(T, i) is therefore high.
By using the IG, the features with the highest potential to discriminate between the species in the training set were chosen. The values of the best extracted features were then computed for each recording and saved into a database for machine learning.
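The selection step can be sketched in a few lines of Python. This is a hedged illustration only: splitting each numeric feature at a single fixed threshold is an assumption made here for simplicity and is not necessarily how the IG was computed in the study, and the feature names, thresholds and four-sample data set are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """IG(T, i) = H(T) - H(T|i) for a split of feature i at `threshold`."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    # H(T|i): entropy of each subset, weighted by the subset's share of T.
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (low, high) if part)
    return entropy(labels) - conditional


def rank_features(table, labels, thresholds, k):
    """Return the names of the k features with the highest information gain."""
    gains = {name: information_gain(column, labels, thresholds[name])
             for name, column in table.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


# Invented toy data: two classes, one informative and one uninformative feature.
labels = ["A", "A", "B", "B"]
table = {
    "informative": [1.0, 2.0, 8.0, 9.0],
    "uninformative": [1.0, 8.0, 2.0, 9.0],
}
thresholds = {"informative": 5.0, "uninformative": 5.0}

gain_good = information_gain(table["informative"], labels, 5.0)
gain_bad = information_gain(table["uninformative"], labels, 5.0)
best = rank_features(table, labels, thresholds, k=1)
```

In this toy case the informative feature separates the two classes perfectly, so H(T|i) drops to zero and the gain equals H(T), while the uninformative feature leaves both subsets as mixed as the original set and gains nothing.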

Machine learning and classification
Machine learning algorithms have been used extensively to recognize patterns in large sets of data in various applications (Vidulin et al. 2014;Gjoreski et al. 2015). For bumblebee classification, the WEKA open source machine learning software (Hall et al. 2009) was used. Four different algorithms were used for training: J48 tree, Naïve Bayes, SVM, and Random Forest, to enable comparison of performance of different methods.
J48 builds decision trees in which internal nodes correspond to (audio) features, branches to different values of the features, and leaves to classes (bumblebee species). Classification of an example (bumblebee sound) starts at the root and proceeds along the branches corresponding to the feature values of the example, until a leaf is reached and its class assigned to the example. J48 decision trees are built in steps, always using the feature with the highest IG to create the split at a node, until the training data at a node are pure enough with respect to the class. Naïve Bayes computes the probability that an example belongs to a class based on the frequency of its features in the training data belonging to that class. It then classifies the example into the class with the highest probability. It is called naïve because it assumes that the value of each feature is independent of the value of any other feature. Although this assumption does not always hold in real cases, Naïve Bayes still often produces good classification results. The SVM approach is more elaborate. It places data in a multidimensional space, where each dimension corresponds to one feature. It then searches for a hyperplane splitting the space in two, so that each side contains examples belonging to one class. Random Forest is, as the name suggests, a set of multiple decision trees. Each tree is built on a randomly chosen subset of data using a randomly chosen subset of features, which reduces the correlation between the trees. To classify an example, it is classified by all the trees in the forest, which then vote for the final class (Hastie et al. 2009). Of these four methods, only J48 creates a comprehensible model, i.e. a single decision tree.
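The way a single decision tree classifies an example, and the way a forest of trees votes, can be illustrated with a short Python sketch. It shows only the prediction step, not tree construction, and it is not WEKA code: the nested-dictionary tree format, the feature names peak_hz and bandwidth_hz, the thresholds and the three hand-written trees are all hypothetical.

```python
from collections import Counter


def classify_with_tree(node, example):
    """Walk from the root to a leaf; a leaf is a plain class label (str)."""
    while isinstance(node, dict):
        goes_left = example[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node


def classify_with_forest(trees, example):
    """Let every tree classify the example and return the majority class."""
    votes = Counter(classify_with_tree(tree, example) for tree in trees)
    return votes.most_common(1)[0][0]


# Three hand-written toy trees with made-up features and thresholds.
tree_1 = {"feature": "peak_hz", "threshold": 180,
          "left": "queen", "right": "worker"}
tree_2 = {"feature": "bandwidth_hz", "threshold": 20,
          "left": "queen", "right": "worker"}
tree_3 = {"feature": "peak_hz", "threshold": 150,
          "left": "queen",
          "right": {"feature": "bandwidth_hz", "threshold": 30,
                    "left": "queen", "right": "worker"}}
forest = [tree_1, tree_2, tree_3]

example = {"peak_hz": 170, "bandwidth_hz": 25}
single_tree_votes = [classify_with_tree(tree, example) for tree in forest]
forest_class = classify_with_forest(forest, example)
```

Here two of the three trees predict the same class, so the forest returns it even though one tree disagrees, which is the sense in which voting smooths out the errors of individual trees.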
To evaluate the classification accuracy of each of the models, the data were split into a training set (80% of the samples) and a testing set (the remaining 20%). Since several audio samples were created by splitting longer recordings into shorter segments, special attention was paid to always allocating all samples from an individual recording either to the training or to the testing set. The models were constructed from the training set. A 5-fold cross-validation of the training set was performed to assess the quality of the set itself. In the cross-validation process, the training set was split into five smaller subsets. In each evaluation run, four subsets were used to train the algorithm and the remaining one was used for testing. In this procedure, parts of a single longer recording can be used both in training and testing sets. We consider this acceptable, since small variations in conditions during each long recording (the distance between the bumblebee and the microphone, differences in the environmental noise, etc.) nevertheless make each sample distinct from the others. Using the confusion matrix, we identified the bumblebee types that are most commonly misclassified as another type and used this knowledge on the testing set that consisted of independent recordings.
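A recording-level split of this kind can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure actually used: the tuple layout (recording_id, features, label), the function name, the fixed random seed and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_recording(samples, test_fraction=0.2, seed=0):
    """Split samples so that no recording contributes to both sets.

    Each sample is a tuple (recording_id, features, label). Whole recordings
    are assigned to the testing set until it holds about `test_fraction`
    of all samples; the remaining recordings go to the training set.
    """
    by_recording = defaultdict(list)
    for sample in samples:
        by_recording[sample[0]].append(sample)

    recording_ids = sorted(by_recording)
    random.Random(seed).shuffle(recording_ids)

    target = test_fraction * len(samples)
    train, test = [], []
    for recording_id in recording_ids:
        bucket = test if len(test) < target else train
        bucket.extend(by_recording[recording_id])
    return train, test


# Toy data: 10 recordings, each cut into 5 segments (features left empty).
samples = [(f"rec{r}", None, "some_class") for r in range(10) for _ in range(5)]
train, test = split_by_recording(samples)

train_ids = {s[0] for s in train}
test_ids = {s[0] for s in test}
```

Splitting on recording identifiers rather than on individual segments is what keeps near-duplicate segments of one recording from appearing on both sides of the evaluation.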
In the field of machine learning, a confusion matrix, also known as a contingency table or an error matrix, is a table that allows visualization of the performance of an algorithm. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabelling one as another).
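A confusion matrix with rows for actual classes and columns for predicted classes can be built and queried with the short Python sketch below. The helper names and the class labels are hypothetical, and the six example predictions are invented for illustration.

```python
from collections import Counter, defaultdict


def confusion_matrix(actual, predicted):
    """matrix[a][p] = number of samples of actual class a predicted as p."""
    matrix = defaultdict(Counter)
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def per_class_accuracy(matrix):
    """Diagonal element divided by the row total, for every actual class."""
    return {cls: row[cls] / sum(row.values()) for cls, row in matrix.items()}


def most_confused_with(matrix, cls, top_n=2):
    """The up to top_n wrong classes that cls is most often predicted as."""
    wrong = [p for p, _ in matrix[cls].most_common() if p != cls]
    return wrong[:top_n]


# Invented labels and predictions.
actual = ["type_a", "type_a", "type_a", "type_b", "type_b", "type_c"]
predicted = ["type_a", "type_a", "type_b", "type_b", "type_b", "type_a"]

matrix = confusion_matrix(actual, predicted)
accuracy = per_class_accuracy(matrix)
alternatives_for_a = most_confused_with(matrix, "type_a")
```

The off-diagonal entries of a row are exactly what is needed to list the classes a given type tends to be mistaken for.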

Frequency analysis
As noted previously, bumblebees can produce different types of buzzing, depending on the circumstances. Figure 1 shows a spectrogram of different types of buzzing for a B. hypnorum worker, chosen for a detailed analysis as one of the more common bumblebee species. Buzzing types are clearly distinct from one another. During flight, the spectrogram is time-independent and consists of a natural frequency of around 200 Hz, together with higher harmonics. Slight temporal variations of the amplitude are related to the sound intensity, since the distance of the bumblebee from the microphone varies. In the sonication section, two parts are clearly visible: the part when the bumblebee lands on the flower, with frequencies that are the same as in regular flight, and the sonication itself, where the fundamental frequency increases significantly (to around 300 Hz). A shift from the natural frequency is also prominent when the bumblebee is hissing. The strongest frequency component in this case is at around 700 Hz. When the bumblebee was flying in a room, the spectrogram was similar to the regular flight sound, only the fundamental frequency was lower (170 Hz). A slight temporal variation in the main frequency can be observed as well.
As demonstrated, bumblebee buzzing varies depending on the situation. A detailed analysis of the connection between different buzzing types and morphological structures responsible for each type is beyond the scope of this paper. However, we notice that buzzing during flying is roughly time-independent and can be used to approximately discriminate between different species and castes. In the rest of this paper, we deal only with bumblebees flying under normal circumstances, without distortions due to surroundings such as glass or a room. Capturing data only during normal flight is important for real-life testing, as demonstrated in Figure 1. For a detailed analysis of defensive and pollination buzzing sounds in several bumblebee species, see also De Luca et al. (2014). Table 1 lists the average natural frequencies of 12 species of bumblebees, in most cases including both queens and workers (in three cases, only one of the two types was found, so there are 21 bumblebee types in the table). In most cases, at least 5-10 different individuals were recorded, except for the species that were more difficult to find, where the number was lower.
It is interesting to observe that the natural frequencies of queens are always lower than those of workers. This is consistent with the fact that queens typically have a considerably larger body size, which is also shown in Figure 2, in a body length vs. natural frequency plot. Figure 3 shows a comparison of spectra for queens of two different species, B. sylvarum and B. lucorum. Again, the natural frequency of B. lucorum, which is a larger species, is much lower than that of B. sylvarum.
Figure 2 shows that the natural frequency alone is not sufficient to discriminate well between different species and castes: in our study, eight types of bumblebees have natural frequencies in a frequency window less than 15 Hz wide. Considering also the experimental error in frequency determination and different sizes of individuals, this makes classification based solely on a single parameter unreliable. On the other hand, bumblebee buzzing is more complex, since morphological characteristics of different bumblebee types result in different widths of spectral lines, small peaks at additional frequencies in the spectra, etc. Such subtle differences are better considered using a multitude of audio features together with a machine learning algorithm.

Machine learning on the training set
In the evaluation process, we only considered the classes where a reasonable number of samples was available for the training set in order to avoid overfitting, i.e. constructing a classifier that works well on the training data but generalizes poorly to other data. Therefore, the evaluation was carried out on 17 classes. The training set consisted of 1120 samples.
In the first classification evaluation (5-fold cross-validation) on the training set only, the best results were obtained using the Random Forest algorithm, with 82.7% of the samples classified correctly. The accuracies for the other models were significantly lower: 67.8% for J48, 52.2% for Naïve Bayes, and 74% for SVM. At this point, it is informative to have a look at the so-called confusion matrix (Table 2) for the Random Forest algorithm, which shows the actual and the predicted classification. The values on the diagonal represent correctly classified samples (correct species determined by the row name), while off-diagonal elements represent misclassifications as the wrong species (columns). In some classes, the classification accuracy is excellent. For example, B. pratorum workers are classified correctly in 94% of the cases and B. humilis workers in 96%. On the other hand, none of the samples for B. ruderarius workers and B. sylvarum queens were classified correctly due to the small number of samples in each of these classes. For a more reliable test, a larger number of samples is required.
The confusion matrix also provides information on which bumblebee types are commonly misclassified as particular other types (seen as the off-diagonal elements in the confusion matrix). For example, B. lucorum queens and B. hypnorum workers are sometimes misclassified as B. hortorum queens. In a similar manner, B. hypnorum workers and B. humilis queens are sometimes misclassified as B. pascuorum workers.

Internet application and evaluation on the testing set
The knowledge about classes that are often confused was used when creating a web and mobile application (animal-sounds.ijs.si) intended for general use. In an attempt to improve the classification accuracy when classifying a new recording, the application presents the most likely class as the main output, accompanied by one or two most likely alternatives (see Figure 4). Since the application is intended for field use, it also displays photographs of the corresponding bumblebee species and castes, which may assist the user with the final classification.
The application allows the user to record the bumblebee sound in the field with a mobile phone and immediately send it to the server (located at Jožef Stefan Institute) where the classification algorithm is running. A mobile or wireless data connection has to be available for data transfer. Alternatively, sound recordings can be stored on the smartphone and analysed later. The output of the classification algorithm is sent back to the smartphone and the results are displayed on the screen. The process from submitting the recording to retrieving the results typically takes around half a minute. The web-based application has the same functionality as the mobile one, except for the live recording option.
Table 2. Confusion matrix for 5-fold cross-validation of the training set, using the Random Forest classifier, built on 17 classes of bumblebees, including queens (q) and workers (w). The numbers in the matrix correspond to correctly (diagonal elements, bold) and incorrectly (off-diagonal elements) classified samples in the set.
To independently evaluate the performance of the application, it was tested on the testing set consisting of samples originating from different recordings than those used in the training set. If an individual sample was classified as one of up to two classes that are most often confused with the class in question, we still considered the classification to be accurate, since these options are presented as alternatives. Again, the Random Forest algorithm produced the best results, with 86% of the 260 samples classified correctly. In this test, several types were classified with a very high accuracy, such as 100% of B. hypnorum workers and 98% of B. sylvarum workers. On the other hand, again none of the samples of B. ruderarius workers and B. sylvarum queens were classified correctly, as a direct consequence of the small number of samples in the training set, which resulted in the model not being reasonably trained for those classes. The overall classification accuracy is comparable to those in other studies mentioned in the description of related work, where the accuracy ranged between 80 and 90%. One study (Chesmore & Nellenbach 2001) reported almost 99% accuracy, but in that case the recordings were made in a highly controlled environment with no interference, which is not comparable to the field recordings performed in our experiments.
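The relaxed scoring rule can be made concrete with a small Python sketch that compares strict accuracy with accuracy when accepted alternatives also count. It reflects one reading of the rule described above and is not the evaluation code of the study: the function name and the four example predictions are invented, and the alternatives listed for B. hypnorum workers merely follow the confusions mentioned in the text.

```python
def accuracy(actual, predicted, alternatives=None):
    """Fraction of samples counted as correct.

    A prediction counts if it equals the actual class or, when
    `alternatives` is given, if it is one of the classes listed as
    commonly confused with the actual class.
    """
    alternatives = alternatives or {}
    hits = sum(
        1 for a, p in zip(actual, predicted)
        if p == a or p in alternatives.get(a, ())
    )
    return hits / len(actual)


# Classes most often confused with each actual class (at most two each).
alternatives = {"hypnorum_w": ["pascuorum_w", "hortorum_q"]}

# Invented example predictions.
actual = ["hypnorum_w", "hypnorum_w", "hypnorum_w", "lucorum_q"]
predicted = ["hypnorum_w", "pascuorum_w", "lucorum_q", "lucorum_q"]

strict = accuracy(actual, predicted)
relaxed = accuracy(actual, predicted, alternatives)
```

Because the relaxed score can only equal or exceed the strict one, figures obtained under the two rules are not directly comparable.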

Discussion
We analysed buzzing sounds of several common species of Central European bumblebees, including queens and workers. Buzzing sounds depend both on morphological characteristics of different bumblebee types and on the situation a bumblebee finds itself in. In the case of a B. hypnorum worker, we analysed flying sound, sonication, hissing and the sound that a bumblebee produces when trapped in a room. The flying sound is well resolved and roughly time-independent. It consists of the natural frequency (the frequency at which the wings oscillate) and its higher harmonics. Both in the case of sonication and hissing, higher frequencies become more prominent. When trapped in a room, the bumblebee produces slightly lower frequencies than when it is flying outside.
Natural frequencies for 12 bumblebee species, including queens and workers, were determined. Consistent with the fact that queens are larger than workers, the natural frequencies of queens are lower than those of workers. The same effect can also be seen when comparing species of different sizes. Although the natural frequency clearly decreases with increasing body size, focusing solely on the natural frequency is not sufficient to reliably distinguish between different bumblebee species. A machine-learning approach, on the other hand, also considers additional, less obvious features in the buzzing sound.
Following the selection of audio features with the highest IG, machine learning algorithms were used on a set containing over 1000 samples in 17 classes. A 5-fold cross-validation of the training set produced over 82% classification accuracy using the Random Forest algorithm, which is considered a good result. Furthermore, the confusion matrix provided information on which types of bumblebees are commonly misclassified as particular other types. This knowledge was incorporated in the web application to display alternatives to the type considered most likely by the classifier. When testing this application on an independent testing set, the overall classification accuracy was 86%.
An advantage of the presented approach is that it is easy to expand with new bumblebee species as additional classes. Potential future work encompasses the inclusion of males and the addition of other types of bumblebee buzzing (sonication, hissing, etc.), since the present study focuses only on the sound produced during flight as a means of classification.
Our approach to bumblebee classification is in fact rather general. As opposed to previous similar studies where a small number of audio features were manually selected, we instead chose a larger number of features from an existing feature library, using the IG as the selection criterion. This means that the approach can be easily adapted to classify sounds produced by other animal groups, which we have already started working on.
There are some limitations to our approach. The output of the classification algorithm will always "recognize" some type of bumblebee, even if the recording does not represent a bumblebee buzzing. In principle, this can be fixed by a prior classifier that distinguishes bumblebee sounds from other recordings. Another issue may arise in the case of recordings made with significantly different equipment or different settings since it may affect the values of the features. To some degree, different settings can be corrected in post-processing.
For accurate classification, a large number of samples in the training set is required. In our study, some classes had only a few recordings available. Our application allows experts to upload additional recordings and add new bumblebee types. When a sufficient number of new recordings is uploaded, the models can be built again on the improved data set. This will lead to improvement and wider usability of the classification model in the future. As a side note, the samples used in the testing set have already been added to the new iteration of the algorithm.
In summary, our approach with machine learning based on sound, combined with pictures of the classified bumblebee species and types, enables fast and accurate classification in the field with the mobile application, demanding no prior knowledge of bumblebee species. The system is freely available (animal-sounds.ijs.si) and can be updated remotely, with the administrator verifying the samples before adding them to the databases. For uploading samples to the database, a registration (free of charge) is required, whereas the classification part itself is open. Other animal groups can be added to the database as well, since our ambition is to make the application available to a wider public.
One should note that the method presented here achieves reasonably good results in a different way than traditional methods such as morphological identification. Further studies should reveal the relation between the approach presented here and the currently dominant approaches for animal sound recognition. Now that our approach is available for practical use, it will also be possible to find out whether the traditional approach and ours complement each other and enable the best practical automatic classification of bumblebees.