Photographs, datasets and code supporting ‘An accurate and efficient semiautomated approach to counting birds: estimating Northern Gannet colony size in Canada’
ABSTRACT
Improving the efficiency of population monitoring and conservation programs is beneficial, so long as the accuracy of the information collected is not diminished. The need to expeditiously estimate the population size of seabird colonies is especially acute during mass mortality events when aerial surveys can provide information quickly on the extent of effects and total mortality. In 2022, the Highly Pathogenic Avian Influenza virus caused outbreaks at most Northern Gannet Morus bassanus colonies worldwide, killing tens of thousands of gannets in eastern Canada. In this study, we evaluated the accuracy and efficiency of a semiautomated method using the free software CountEm for counting Northern Gannet nests by reanalysing thirteen years of aerial photographs from past population surveys (2009–2020 and 2022). We developed a protocol that generated population estimates that are accurate enough to support population management objectives (i.e., within 2–5% of manual counts) and outline additional ways to improve CountEm accuracy. Additionally, using CountEm was 1100% more efficient than manually counting based on counting time. Since CountEm relies on human identification of objects to be counted, our methods, results, and conclusions are transferable to any taxa that form large aggregations and can be identified and counted in photographs.
About this repository
This repository contains photographs, datasets and code supporting ‘An accurate and efficient semiautomated approach to counting birds: estimating Northern Gannet colony size in Canada’, which is published in Ecosphere. The repository can be cited as follows:
Walker, Jacob, Trevor S. Avery, Francis St-Pierre, Jean-François Rail, Danielle E. A. Quinn, Matthew English, and Stephanie Avery-Gomm. 2024. “Photographs, datasets and code supporting ‘An accurate and efficient semiautomated approach to counting birds: estimating Northern Gannet colony size in Canada’.” Figshare. https://doi.org/10.6084/m9.figshare.25483174
Photographs
This repository contains data associated with 52 composite photographs of Northern Gannet colonies at Ile Bonaventure and Rochers aux Oiseaux taken between 2009 and 2022. See the manuscript above for details.
Datasets
The raw data are provided in alldata.csv. Results of repeated CountEm runs (n = 11 photographs) are found in multipleruns.csv. The provided variable key (VariableKey.xlsx) describes the variables in both data files. To reproduce the analyses performed in this study, use the code provided in reproducible_analysis.R.
Code
The code required to reproduce the double count analysis is in reproducible_analysis.R, using the input files data/rawdata.csv and data/multipleruns.csv. The code, including detailed comments, is organized into 8 sections:
- Load Packages: loads the required packages (see Software requirements, below)
- Import, Restructure, and Subset Data: five subsets of the available data are created to facilitate analyses in sections 3-8
  - df: requires data/rawdata.csv; results from the first CountEm run for each photo using 300 quadrats, only considering AOTs (52 rows, 22 columns)
  - df_500: requires data/rawdata.csv; CountEm results from a subset of 12 photos using 300 and 500 quadrats, only considering AOTs (12 rows, 10 columns)
  - df_dead: requires data/rawdata.csv; results from the first CountEm run for each photo using 300 quadrats, only considering dead birds (4 rows, 22 columns)
  - mdf: requires data/multipleruns.csv; results from ten CountEm runs for a subset of 11 photos, only considering AOTs (110 rows, 22 columns)
  - mdf_dead: requires data/multipleruns.csv; results from ten CountEm runs for a subset of 11 photos, only considering dead birds (30 rows, 22 columns)
Sections 3-8 are used to generate the results found in the corresponding Results subheaders of the text:
- Results: CountEm Accuracy: uses the data object df to assess the use of CountEm to estimate the number of AOTs; produces Figures 3 and 4
- Results: Increasing CountEm Quadrats: uses the data object df_500 to assess the impact of increasing the number of CountEm quadrats from 300 to 500
- Results: Estimating the Number of Dead Birds: uses the data object df_dead to assess the use of CountEm to estimate the number of dead birds
- Results: Accuracy of Multiple CountEm Runs: uses the data object mdf to run a resampling routine and assess the use of multiple CountEm runs to estimate the number of AOTs
- Results: CV to Inform Number of CountEm Runs: uses the results of the simulation in section 6 to determine if the coefficient of variation (CV) can be used to determine the number of CountEm runs that could be summarised to generate estimates within 5% of the manual count of AOTs
- Results: Efficiency: uses the data objects df and mdf to summarise the user time required to apply CountEm to generate estimates of the number of AOTs
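To give a feel for what sections 6 and 7 are assessing, the following is a minimal base-R sketch of a resampling routine over repeated CountEm runs. It is not the code in reproducible_analysis.R. The counts, the function names (percent_error, cv_percent, prop_within), and the choice to resample without replacement are all assumptions made for illustration; the script itself is the authority on the method actually used.

```r
# Illustrative sketch only: all numbers and function names are hypothetical.
set.seed(1)

# Hypothetical manual count and ten hypothetical CountEm estimates for one photo.
manual_count  <- 1000
run_estimates <- c(938, 1031, 987, 1010, 1061, 978, 995, 1022, 969, 1003)

# Signed percent error of an estimate relative to the manual count.
percent_error <- function(estimate, manual) {
  100 * (estimate - manual) / manual
}

# Coefficient of variation (%) of a set of estimates: sd divided by mean.
cv_percent <- function(x) {
  100 * sd(x) / mean(x)
}

# For a given number of runs k, repeatedly draw k runs (assumed here to be
# without replacement), average them, and return the proportion of averages
# that fall within `tol` percent of the manual count.
prop_within <- function(estimates, manual, k, tol = 5, n_resamples = 1000) {
  means <- replicate(n_resamples, mean(sample(estimates, size = k, replace = FALSE)))
  mean(abs(percent_error(means, manual)) <= tol)
}

# Proportion of resampled averages within 5% of the manual count, for k = 1..5 runs.
results <- data.frame(
  k = 1:5,
  prop_within_5 = sapply(1:5, function(k) prop_within(run_estimates, manual_count, k))
)
print(results)

# CV across all ten hypothetical runs.
print(cv_percent(run_estimates))
```

In this sketch, averaging more runs should pull the resampled estimate closer to the manual count, and the CV summarises how variable the individual runs are; section 7 of the script examines whether that CV can guide how many runs to summarise.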
Software requirements
Scripts are written for R v4.3.1. See scripts and manuscript for packages and software citations.
Required R packages can be installed in R with:
install.packages(c("tidyverse",
"readxl",
"BSDA",
"boot",
"ggdist",
"ggeffects",
"emmeans",
"marginaleffects"))