Handling dolphin detections from C-PODs, with the development of acoustic parameters for verification and the exploration of species identification possibilities

Abstract
C-PODs are static passive acoustic monitoring devices used to detect odontocete vocalizations in the range of 20–160 kHz. However, falsely classified detections may be an issue, particularly with broadband species (i.e. many dolphin species) due to anthropogenic and other noise occurring at the same frequency. While porpoise detections are verified using species-specific acoustic parameters, the equivalent does not currently exist for verifying dolphin detections. Development of such parameters would increase the accuracy of dolphin detections and eliminate the need for additional monitoring techniques or devices, reducing the cost of monitoring programmes. Herein, we present parameters based on acoustic characteristics of bottlenose (n = 29), common (n = 19) and Risso’s (n = 99) dolphin click trains, sighted within 1 km of C-PODs during land-based surveys, for in-software verification. Overlap of click train parameters among dolphin species prevented robust species identification; therefore, parameters were devised for these dolphin species collectively using frequency, inter-click interval and click train duration. A data set of 4898 Detection Positive Hours was visually verified using these parameters. The temporal and spatial patterns in the visually verified data were similar to land-based observations, suggesting the parameters operate at an acceptable accuracy. However, 68% of high-, moderate- and low-quality KERNO detections were false-positive. Our results suggest that the accuracy of classifiers and quality class weightings are site-specific, and we highlight the importance of data exploration to make the most appropriate software choices based on the aims of a study.


Keywords: C-POD; dolphin; passive acoustic monitoring; acoustic characteristics; species identification

Introduction
C-PODs (Cetacean-PODs) are the most commonly used Passive Acoustic Monitoring (PAM) devices for detecting cetaceans in Europe, logging odontocete click trains across a range of 20-160 kHz (Dudzinski et al. 2011; Dähne et al. 2013). These devices have been used extensively to quantify porpoise (primarily harbour porpoise) but also dolphin occurrence. While porpoises are easily differentiated by their high-frequency, narrow-band echolocation signals, dolphins produce broadband signals (Villadsgaard et al. 2007; Lopez 2011), which make it more difficult to distinguish dolphin click trains from other broadband sources (e.g. boat traffic). As a result, the classification of dolphins might not be as accurate as that of porpoises, and can lead to incorrect (false-positive) and missed (false-negative) classifications. Accuracy of presence/absence data is vital for monitoring programmes to facilitate appropriate management (Forney 2000), so it is important to verify detections. Verification is key in acoustically active environments such as areas prone to anthropogenic noise, which can mask cetacean vocalizations of similar frequencies, possibly lasting for extended periods and consisting of high numbers of clicks (Nowacek et al. 2007; Clark et al. 2009), thus increasing the potential for incorrect detection rates.
Due to high variation between acoustic characteristics, dolphins are more prone to false-positive detections (Philpott et al. 2007). Consequently, some studies focussing on dolphin occurrence have relied on multi-method monitoring to verify the accuracy of C-POD detections (e.g. Read et al. 2012; Nuuttila, Thomas et al. 2013; Roberts and Read 2014). Multi-method verification can utilize additional acoustic devices or combine acoustic monitoring and sightings effort. This verification identifies false-positive and false-negative detections, which can lead to an over-estimation or under-estimation of occurrence, respectively (Roberts and Read 2014). While porpoise detections are sometimes verified through visual screening of C-POD files in accordance with parameters which meet the species' acoustic characteristics (Gallus et al. 2012), parameters do not currently exist for verifying dolphin detections, either collectively or to species level, as it is not thought to be possible to differentiate between species (Tregenza 2013). The development of such parameters for dolphin detections for C-PODs would eliminate the need for additional monitoring techniques or devices, resulting in increased accuracy of detections and reduced financial investment in monitoring programmes.
Two generic classifiers exist within the C-POD software: KERNO is the base classifier, which processes all imported data, separating detections into quality classes of high ("Hi"), moderate ("Mod"), low ("Lo") and doubtful ("?"). GENENC is an optional supplementary classifier that passes data through additional algorithms to assist in distinguishing dolphin echolocation from background noise and other species (Tregenza 2013). For dolphin detections, the manufacturer recommends using the base KERNO classifier with high- and moderate-quality detections, or the additional GENENC encounter classifier if the environment is "noisy" (Tregenza 2013; Tregenza, pers. comm. 2014; unreferenced).
Here, we determine whether C-PODs can be used to differentiate between three species, Risso's, common and bottlenose dolphins, and create a standardized methodology for visually verifying dolphin detections based on acoustic characteristics. Furthermore, we investigate whether classifier and quality choices are site-specific and if an optimal combination of classifier and quality exists for dolphin detection data. We use these findings to investigate suitable sample sizes for sub-sampling data sets in relation to the likelihood of obtaining the correct proportion of true-positive detections within the complete data set.

Study site and data collection
Located on the north-west coast of County Mayo, Ireland, Broadhaven Bay (N54°18′, W009°55′) is relatively shallow, with depths typically less than 50 m. C-PODs (version 0) were deployed in 2010 and 2011 at two locations: Site A, inside the bay in 17-m water depth and subject to high vessel traffic, and Site B, at the entrance to the bay in 40-m water depth in an area of regular tidal fronts. Each device was moored using concrete weights connected to the C-POD with 4 m of steel chain, held upright by a sub-surface buoy and attached to a surface buoy with rope.
A preset click limit of 4096 clicks a minute was used in 2010, which was increased to 65,536 clicks a minute in the summer of 2011 as part of a change in the data collection protocol of the monitoring project. This was done to prevent data loss as a result of "noise" from anthropogenic activity (e.g. boat sonar) and environmental sources (e.g. sediment transport). With lower click limits, detections can be missed when masking noise exceeds the click limit. This data loss and the resulting times of unknown dolphin occurrence may be reduced by adopting higher click limits.
The accompanying software (CPOD.exe version 2.043, Tregenza 2013) was used to download raw data (CP1 files) from C-PODs and process them into CP3 files with KERNO or GENENC, depending upon the analysis being undertaken. Click trains can be classified as narrowband high-frequency animals ("NBHF") including harbour porpoise, other cetaceans ("Other Cet") indicating other odontocete species (excluding sperm whales), sonar ("Sonar") or unclassed ("?"). Unless otherwise stated, detections were classified by KERNO including high-, moderate- and low-quality classes, and only the "Other Cet" class was selected, to extract the highest possible number of dolphin detections for verification and analysis.

Species differentiation
Land-based sightings were conducted from two cliff top sites overlooking the C-POD deployment areas to provide species identification corresponding to acoustic detections. Observers identified animals to species level, documenting the time of sightings and triangulating the animals' location from theodolite readings. Initial observations of dolphins' acoustic characteristics were based on click trains of dolphins recorded at the same time as individuals identified to species level from land within a 1 km radius of a C-POD (following Nuuttila, Thomas et al. 2013). Only single-species recordings in the absence of other noise were used to investigate inter- and intra-specific characteristics. To investigate whether classification to species level was possible, and to produce initial characteristics for defining parameters, frequency (kHz), sound pressure level (SPL), inter-click interval (ICI) and click duration (number of cycles) were investigated. These acoustic characteristics of click trains were considered for short-beaked common dolphins (Delphinus delphis), Atlantic bottlenose dolphins (Tursiops truncatus) and Risso's dolphins (Grampus griseus). Click train characteristics were used to create a similarity matrix based on Euclidean distance in PRIMER statistical software (PRIMER 6, Plymouth Marine Laboratory) and visualized using non-metric Multidimensional Scaling (nMDS) plots. This effectively gives an indication of how similar each acoustic detection is to all other detections. Groupings of acoustic detections by species were tested using permutational Multivariate Analysis of Variance (PERMANOVA) with unrestricted permutations of raw data (Anderson et al. 2008).
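As a rough illustration of the analysis described above, the following Python sketch builds a Euclidean distance matrix from click train characteristics and runs a one-way permutation test on the pseudo-F statistic. It is not the PRIMER implementation used in the study: the function names, the synthetic data and the number of permutations (999) are assumptions made here for illustration, and no standardization of variables is applied because the text does not state whether one was used. For a comparison of two groups, the t statistic reported by pairwise PERMANOVA tests corresponds to the square root of pseudo-F.

```python
import numpy as np

def euclidean_distance_matrix(features):
    """Pairwise Euclidean distances between rows (one row per click train)."""
    x = np.asarray(features, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a distance matrix and group labels."""
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n = len(labels)
    sq = np.asarray(dist, dtype=float) ** 2
    # Each sum of squares is the sum of squared distances over all pairs
    # of trains, divided by the number of trains involved.
    ss_total = np.triu(sq, k=1).sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        ss_within += np.triu(sq[np.ix_(idx, idx)], k=1).sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(features, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) using unrestricted label shuffles."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    dist = euclidean_distance_matrix(features)
    observed = pseudo_f(dist, labels)
    exceed = sum(
        pseudo_f(dist, rng.permutation(labels)) >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)

# Synthetic example only: two hypothetical groups described by two
# made-up characteristics (these are not values from the study).
demo_rng = np.random.default_rng(1)
trains_a = demo_rng.normal(loc=[60.0, 20.0], scale=5.0, size=(15, 2))
trains_b = demo_rng.normal(loc=[80.0, 35.0], scale=5.0, size=(15, 2))
features = np.vstack([trains_a, trains_b])
labels = ["species_a"] * 15 + ["species_b"] * 15
f_stat, p_value = permanova(features, labels)
print(f"pseudo-F = {f_stat:.2f}, pairwise t = {np.sqrt(f_stat):.2f}, p = {p_value:.3f}")
```

A significant result from such a test indicates only that group centroids or spreads differ; it does not by itself show that individual click trains can be assigned to species.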

Parameters
To create a standardized methodology for the visual verification of dolphin detections classified by the C-POD software, we devised parameters based on the acoustic characteristics of dolphin species. These parameters were devised and tested within an hourly temporal scale, with hours containing dolphin detections classified as a Detection Positive Hour (DPH). For each classified dolphin detection in the CP3 file, the surrounding minute was screened for verified dolphin presence within the CP1 file. Once one dolphin detection within an hour was visually verified as a true-positive detection (i.e. correctly classified by the software), then the entire hour was regarded as true-positive for the occurrence of dolphins, irrespective of whether or not false-positive detections occurred. In an effort to be conservative, hours with high numbers of clicks over extended periods (i.e. masking noise) were deemed to be negative for dolphin presence unless clear trains meeting parameters existed outside of the noise.
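The hourly rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `ScreenedDetection` record and its field names are hypothetical, and the masking-noise rule is simplified to a flag on each detection indicating whether the train lies inside a prolonged noise band.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ScreenedDetection:
    time: datetime                      # time of the classified detection (CP3 file)
    meets_parameters: bool              # outcome of visual screening (CP1 file)
    within_masking_noise: bool = False  # train lies inside prolonged masking noise

def verified_hours(detections):
    """Map each Detection Positive Hour to True (verified) or False.

    An hour is true-positive as soon as one detection in it meets the
    parameters and lies outside masking noise, regardless of how many
    false-positive detections the same hour contains.
    """
    hours = {}
    for d in detections:
        hour = d.time.replace(minute=0, second=0, microsecond=0)
        counts = d.meets_parameters and not d.within_masking_noise
        hours[hour] = hours.get(hour, False) or counts
    return hours

# Hypothetical screening results for three hours.
example = [
    ScreenedDetection(datetime(2011, 7, 1, 10, 5), False),
    ScreenedDetection(datetime(2011, 7, 1, 10, 40), True),
    ScreenedDetection(datetime(2011, 7, 1, 11, 10), False),
    ScreenedDetection(datetime(2011, 7, 1, 12, 20), True, within_masking_noise=True),
]
result = verified_hours(example)
```

In this example the 10:00 hour is verified by its second detection, whereas the 11:00 and 12:00 hours remain unverified.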

Sample sizes and the suitability of classifiers and filters
To ascertain how many detections would need to be visually validated in order to increase the likelihood of correctly identifying the percentage of true-positive detections in a data set, we randomly sampled 1000 iterations of 100, 250, 500 and 1000 dolphin detections. From these random samples, we identified what proportion were true-positives at each site, as defined by the visual verification parameters created herein.
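The resampling procedure can be sketched as below. The sketch assumes sampling without replacement and summarizes each sample size by the mean and the 2.5th and 97.5th percentiles of the simulated true-positive percentages; neither choice is stated in the text. The population used here is synthetic (3000 detections, 10% true-positive) and the function names are hypothetical.

```python
import numpy as np

def subsample_true_positive_rates(verified, sample_size, n_iter=1000, seed=0):
    """Percentage of true-positives in each of n_iter random subsamples."""
    rng = np.random.default_rng(seed)
    verified = np.asarray(verified, dtype=bool)
    rates = np.empty(n_iter)
    for i in range(n_iter):
        # Assumption: detections are drawn without replacement.
        sample = rng.choice(verified, size=sample_size, replace=False)
        rates[i] = 100.0 * sample.mean()
    return rates

def summarise(rates):
    """Mean and 95% percentile interval of the simulated percentages."""
    lower, upper = np.percentile(rates, [2.5, 97.5])
    return rates.mean(), lower, upper

# Synthetic population: True marks a visually verified (true-positive) detection.
population = np.zeros(3000, dtype=bool)
population[:300] = True

summaries = {}
for size in (100, 250, 500, 1000):
    summaries[size] = summarise(subsample_true_positive_rates(population, size))
    mean, lower, upper = summaries[size]
    print(f"n = {size}: mean = {mean:.1f}%, 95% interval = {lower:.1f}-{upper:.1f}%")
```

The mean stays close to the population value at every sample size, while the interval narrows as more detections are verified.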
The manufacturer recommends the omission of low-quality classifications due to the possibility that these are less likely to be of dolphin origin (for example, boat sonar or environmental noise producing false-positive detections). However, correctly classified detections may still be classified as low-quality detections. We investigated the proportion of true-positive detections within the low-quality classifications, and whether ignoring these data increased false-negative detections when using KERNO (high- and moderate-quality classes only). Lastly, we compared visually verified KERNO detections to detections extracted using the GENENC classifier within sites of different noise profiles, to determine which was the most suitable option for these data. A subset of 2962 h was run through the KERNO classifier and visually verified, recording the quality assigned to detections within each hour.

Results
A total of 104.45 days of acoustic data were collected at Site A, and 323.58 days at Site B, across a 730-day period throughout 2010 and 2011. Gaps in data coverage were due to either click limits being met, or adverse weather preventing C-POD download and redeployment.

Species differentiation
From land-based sightings, dolphins around the 1-km buffer of Site A were primarily bottlenose dolphin (n = 132 of 133), whereas those at Site B were of mixed species, dominated by common dolphins (n = 95 of 151). Detections of common (n = 19), bottlenose (n = 29) and Risso's dolphins (n = 99) were recorded by the C-PODs. Mean train characteristics of the three species showed that Risso's echolocate at a lower minimum frequency, with common dolphins having higher maximum frequencies (Table 1). However, despite these species-specific characteristics, mid-range frequencies were shared between all three species, and therefore frequency occupation cannot be used to differentiate between these species. Similarly, SPL, cycle number (duration of clicks) and ICI also showed similarities between species (Table 1).
nMDS plots showed a high degree of overlap between species in terms of acoustic characteristics (Figure 1). Nevertheless, multivariate analysis of click train characteristics using PERMANOVA found significant differences between species groups (Risso's v common, t = 2.391, p = 0.003; Risso's v bottlenose, t = 1.986, p = 0.017; common v bottlenose dolphins, t = 2.176, p = 0.01). Differences in mean SPL, maximum frequency and ICI were the main contributors to differentiation between Risso's and both common and bottlenose dolphins, while mean SPL, maximum frequency and the number of clicks per second contributed most to differentiation between common and bottlenose dolphins.
Despite the ability of PERMANOVA to detect species groupings within the data, high overlap of click train characteristics between dolphin species confirmed that predictive species differentiation is not currently possible with C-PODs alone. Therefore, verification parameters were devised for dolphin species collectively in consultation with the C-POD manufacturer (Nick Tregenza, Chelonia Ltd.). In order to form robust parameters, intra-specific acoustic characteristics were checked within multi-species recordings. These recordings included quiet periods, and localized and ambient noise to confirm that parameters were effective across different sound profiles with varied species presence.

Parameters for identifying dolphin detections
After visually analysing an extensive set of dolphin trains identified by the C-POD software, we developed parameters that fit dolphin detections confirmed by land-based sightings. To be positively identified as a dolphin detection, a click train should consist of a minimum of 9 clicks across broadband frequencies in the range of 30-150 kHz. Nine clicks were chosen to provide an accurate representation of trends over time with contour shape similarity. Click trains that were characteristic of dolphin echolocation but did not meet the minimum of 9 clicks per train were accepted if 9 clicks were present in two trains separated by no more than 0.5 s. Considerably segmented trains with multiple gaps of less than 0.5 s between 9 clicks were not accepted as being dolphin in origin as segmentation makes pattern recognition more difficult. The dolphin species in this study rarely vocalized in excess of 100 ms ICI; consequently, trains above this limit were rejected. Trains must not show entirely random characteristics, or unnatural tendencies (including no variation between clicks), and should not repeat exact patterns over an extended period of time as these may be produced by anthropogenic sources, or natural rhythms such as shifting substrate.
ICIs and SPLs show consistently similar trends; these were typically characterized by undulating patterns across the spectrogram. Clicks of higher frequencies are often produced at higher SPL. Of the verified click trains, 89.2% had maximum cycle numbers of 17; therefore, the duration of 90% of clicks within a train must be less than or equal to 17 cycles. Click trains with 10% or less of clicks with durations of 17-25 cycles were accepted, as several clicks over 100 kHz often exhibit longer durations (based on 73.3% of verified detections and further investigation of non-verified click trains). Figure 2 displays screenshots from the C-POD software of a correctly classified dolphin click train within the CP1 file that underwent visual verification using the parameters described above.
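The numeric parts of these parameters can be expressed as a simple rule check, sketched below in Python. This is an illustration only, not part of the C-POD software: the `Click` record, the function names and several interpretations are assumptions made here (each click's frequency is required to fall within 30-150 kHz, the 100 ms limit is applied to every ICI within a train, and clicks longer than 25 cycles are rejected). The pattern-based criteria (non-random, non-repeating, undulating trends) rely on visual judgement and are not encoded.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Click:
    time_s: float    # click time in seconds
    freq_khz: float  # click frequency in kHz
    cycles: int      # click duration in cycles

MIN_CLICKS = 9
FREQ_MIN_KHZ, FREQ_MAX_KHZ = 30.0, 150.0
MAX_ICI_S = 0.100            # 100 ms inter-click interval limit
MAX_TRAIN_GAP_S = 0.5        # maximum gap when combining two short trains
TYPICAL_MAX_CYCLES = 17
LONG_MAX_CYCLES = 25
MIN_TYPICAL_FRACTION = 0.90  # share of clicks that must be <= 17 cycles

def icis_within_limit(train: List[Click]) -> bool:
    """True if every inter-click interval in the train is at most 100 ms."""
    times = sorted(c.time_s for c in train)
    return all(b - a <= MAX_ICI_S for a, b in zip(times, times[1:]))

def passes_numeric_checks(train: List[Click],
                          neighbour: Optional[List[Click]] = None) -> bool:
    """Apply the click count, frequency, ICI and cycle criteria to a train.

    A train with fewer than 9 clicks may be combined with one following
    train if the gap between them is no more than 0.5 s.
    """
    trains = [train]
    if len(train) < MIN_CLICKS and neighbour:
        gap = min(c.time_s for c in neighbour) - max(c.time_s for c in train)
        if 0.0 <= gap <= MAX_TRAIN_GAP_S:
            trains.append(neighbour)
    clicks = [c for t in trains for c in t]
    if len(clicks) < MIN_CLICKS:
        return False
    if any(not FREQ_MIN_KHZ <= c.freq_khz <= FREQ_MAX_KHZ for c in clicks):
        return False
    if not all(icis_within_limit(t) for t in trains):
        return False
    if any(c.cycles > LONG_MAX_CYCLES for c in clicks):
        return False
    typical = sum(c.cycles <= TYPICAL_MAX_CYCLES for c in clicks)
    return typical / len(clicks) >= MIN_TYPICAL_FRACTION
```

A train that passes these checks would still need visual inspection against the pattern-based criteria before being accepted as a dolphin detection.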
The parameters were applied to 27,214 h of recorded C-POD data collected at the two sites, of which 4898 h were classified by the software as DPH for "Other Cet". Of these, 1586 were classified as true-positive dolphin detections after visual validation. In general, the random samples showed that for both sites and across all four sample sizes (100, 250, 500 and 1000), on average, the percentage of true-positive detections was correctly identified in the subsample; however, as expected, with lower sample sizes the 95% confidence intervals were typically broader, which would increase the possibility of under- or overestimating the percentage of true-positives. For example, taking a random sample of 100 detections at Site A gave a true-positive range between 0 and 29.41%, whereas a sample of 1000 gave a range of 6.01-18.50%, with the true value being 10.4% (Figure 3).
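The overall verification outcome follows directly from the two counts given above, as the short Python check below shows. The variable names are illustrative; the counts are those reported in the text.

```python
# Counts reported in the text.
software_dph = 4898   # hours classified by the software as DPH for "Other Cet"
verified_dph = 1586   # hours confirmed as true-positive by visual validation

false_positive_dph = software_dph - verified_dph
true_positive_pct = 100.0 * verified_dph / software_dph
false_positive_pct = 100.0 * false_positive_dph / software_dph

print(f"False-positive DPH: {false_positive_dph}")
print(f"True-positive: {true_positive_pct:.1f}%, false-positive: {false_positive_pct:.1f}%")
```

The false-positive share of about 67.6% is consistent with the 68% stated in the abstract.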

Software selections
Correctly classified hours included multiple qualities (n = 44 of 88). If low-quality detections were omitted, 46% of correctly classified hours would have been missed (n = 41 of 88). However, the inclusion of low-quality detections corresponded to a higher number of false-positive detections (n = 237 of 243). Temporal trends in dolphin occurrence were similar between classifier choices at Site B, with more detections occurring when using verified data or GENENC; however, higher variation between classifiers was present at Site A with verified data resulting in considerably more detections (Figure 4). Site A showed no clear trends in circadian or seasonal occurrence, whereas there were clear peaks in occurrence at night and during the winter months at Site B.

Discussion
Our findings support previous studies on bottlenose (e.g. ), common (e.g. Soldevilla et al. 2008) and Risso's dolphin acoustic characteristics (e.g. Madsen et al. 2004 and Soldevilla et al. 2008). While the PERMANOVA analysis was able to detect species groups, indicating that within-species variability in acoustic characteristics was less than the variability across the three species, the high overlap in acoustic characteristics means that species cannot be predicted based on click train characteristics alone.
Verified dolphin detections showed similar temporal and spatial trends to those observed from land observations (Anderwald et al. 2012). However, sightings data are limited to favourable weather and daylight hours, typically reducing data availability in winter months in temperate climates such as Ireland. C-POD data showed that there were peaks in occurrence during hours of darkness and winter months, which illustrates the importance of PAM in monitoring programmes. Site B, which was observed to be predominantly visited by common dolphins (a relatively pelagic species; Bearzi et al. 2003; Pusineri et al. 2007), showed similar temporal trends to those land-based observations from Broadhaven Bay (Anderwald et al. 2012). The majority of dolphins observed at Site A were identified as bottlenose, which supports well-documented evidence of this species in temperate waters favouring inshore areas, often close to or within narrow channels and river mouths (Wilson et al. 1997; Ingram and Rogan 2002; Culloch and Robinson 2008). Therefore, we have reasonable confidence in dolphin detections from Site A being of bottlenose dolphins, assuming that there is no shift in habitat use of delphinid species at night. We consider a nocturnal shift to be unlikely due to the often pelagic distribution of common dolphins and their observed occurrence at Site B, near the mouth of the bay.
The combination of the land-based and C-POD data suggests that the devised parameters are reliable for the verification of dolphin click trains. As parameters were devised from bottlenose, common and Risso's dolphin click train characteristics they are applicable in environments where these are the more commonly sighted species, such as around the UK and Ireland. To expand the number of dolphin species applicable for verification, click trains of additional dolphin species could be analysed using the methodology outlined herein.
Although the verification of detections is an important process, the choice between software algorithms and quality classes can, to a large extent, influence the amount of data available for verification. The similarity between verified data and automated classification at Site B, and the higher variation at Site A, suggest that these choices will be site- and study-specific. The difference in depth between the two sites is not expected to alter these results (Alonso and Nuuttila 2015), but optimum software choice is expected to vary according to ambient sound profiles and species' presence. There is a trade-off between the volume of data available and the degree of validation required. For example, the inclusion of low-quality click trains would result in more true-positive detections, but would also be more likely to increase the number of false-positive detections. This inclusion, combined with higher verification effort, may be suited to quiet environments with low densities of target species to yield the majority of true-positive detections. Consequently, we recommend that a subset of the data should be verified in an effort to determine whether or not the software selections applied are operating within acceptable error margins with respect to the aims of the study. The simulated random sample study showed, unsurprisingly, that visually verifying a larger sample of detections will reduce the confidence intervals, and thus increase the likelihood of obtaining the true proportion of true-positive DPH within the entire data set.

Conclusion
This study presents a new method to verify detections of dolphins within the C-POD software using parameters based on acoustic characteristics. To improve the reliability and accuracy of the outputs from acoustic monitoring of small cetaceans using C-PODs, we suggest that an appropriate level of verification (determined by data exploration) should be carried out in combination with suitable automated software choices. Appropriate selections and verification are especially important in areas of low density, where false-negative and false-positive detections are likely to substantially influence results, which could have significant implications for establishing risk to marine mammals and designing effective mitigation. This is especially true at study sites where noise is prevalent (e.g. environmental noise, such as sediment transport and tides, or anthropogenic noise, such as boat sonar). The method presented herein will assist with future data exploration so that researchers may make informed decisions about their study sites, and increase the reliability of future data sets without the requirement of additional monitoring devices. It is suggested that future studies using this method should report false-negative and false-positive results from their exploratory analysis for additional transparency and to determine how common these issues are.