Data mining hive inspections: more frequently inspected honey bee colonies have higher over-winter survival rates

Abstract Honey bee colonies frequently suffer from high over-winter losses attributed to various factors, including management, poor nutrition, pests, pathogens, and parasites. Most beekeepers have only limited control over these factors. This study examines the number and timing of hive inspections, which beekeepers can largely control, in relation to over-winter losses. The impact of hive inspections on over-winter survival is usually difficult to measure in traditional studies due to confounding factors, including geography, sample size, and variability in practices. This study mines data collected, anonymized, and shared from an apiary management software system. The data cover 4,072 hives managed by 717 beekeepers across the continental United States (U.S.) over a five-year period (2013–2018) and comprise 60,920 inspections, which are used to identify the relationship between hive inspections and over-winter survival. Hives were grouped into nine climate regions defined by the National Oceanic and Atmospheric Administration (NOAA), and inspections were grouped by season for analysis. Results suggest that more frequent hive inspections are associated with higher over-winter survival rates across most U.S. regions. Unexpectedly, this also holds for relatively more inspections during the winter months in every region. Also surprisingly, only one of the nine climatic regions had a significantly different average over-winter survival rate, despite the substantial geographic and climatic differences across the continental U.S. This finding suggests that other factors, such as management actions, may be more important to over-winter survival rates than climate. Finally, this inspection analysis shows that the number of inspections performed by beekeepers is a relevant factor in predicting over-winter hive mortality.


Introduction
Following the unusually high winter losses in the U.S. in 2006-2007 for the commercial beekeeping sector, honey bee mortality has received much attention in both public discourse and scientific research and discussion. While much has been learned about the factors critical to honey bee health, high mortality in honey bee colonies continues to be reported worldwide without definitive identification of the causes (Benaets et al., 2017). Many reasons have been hypothesized and tested while researching the reasons for winter bee losses (Gray et al., 2020) and actions to minimize bee losses (Steinhauer et al., 2021). However, the main management action (inspection) has not been thoroughly tested across a wide geographic region. No study has stated how frequently inspections should be performed to keep honey bee colonies healthy and to increase the chance of their survival.
The objective of the present study is to discover how inspection patterns may drive or foreshadow winter loss rates of honey bee colonies. Data from an apiary management software system, official climate data, and other citizen science data sources have been mined.

Bee biology and inspection theory
The western honey bee Apis mellifera is found on every continent except Antarctica (Mortensen et al., 2013). The U.S. and North America do not have native honey bee species; the first species were introduced from Europe in the 1700s (Carpenter & Harpur, 2021). Consequently, only a few bee stocks dominate the population within the U.S., particularly in the hobbyist and backyard beekeeping demographics. These stocks come from the same root species, Apis mellifera, mainly the Italian Bee and Carniolan Bee stocks (Tarpy, 2016). This indicates lower genetic diversity in the backyard beekeeping realm, which makes it easier to generalize, at scale, about the bees in this study, the vast majority of which derive from Apis mellifera stock. This lack of genetic diversity in the U.S. arguably strengthens this paper by reducing the variance in the test population, contributing to more generalizable knowledge.
Since the continental U.S. has a large variety of climates, from the snowy north through the temperate midsection to the warm, sunny south, the researchers expected variations in over-winter survival rates. Indeed, since this data set consists primarily of stationary hobbyist hives, it is suitable for testing the effect of regional climatic variation on over-winter survival rates.
Hive inspections are one of the primary and most frequent management practices in beekeeping, and notably one of the few that beekeepers have a high degree of control over. It is difficult to accurately know what other management practices are needed without inspecting the hive, which renders inspections central and essential to any beekeeping operation. The frequency of inspections often varies depending on the season and location, as well as on beekeeper-specific factors. Generally, inspections are particularly frequent at the beginning of and during the growing season (spring and summer, respectively). During winter and under unfavorable weather conditions, the inspection frequency is usually reduced to a minimum.
The best practice guidelines for beekeeping recommend carrying out "thorough inspections" on a hive at the beginning and end of the beekeeping season, which was extrapolated to the spring and fall seasons (Pietropaoli et al., 2020). Additionally, the best practice guidelines also recommend reducing bee stress by "avoiding unnecessary winter inspections" (Pietropaoli et al., 2020). By using these best practice guidelines as a framework, the following six hypotheses were developed to test the significance of hive inspections and interactions during each of the beekeeping seasons:
H1: Over-winter survival rates will vary by climatic region.
H2: If beekeepers perform more inspections on their hives during the entire year, their hive is more likely to survive over winter.
H3: If beekeepers perform more inspections on their hives during the spring months, their hive is more likely to survive over winter.
H4: If beekeepers perform more inspections on their hives during the summer months, their hive is more likely to survive over winter.
H5: If beekeepers perform more inspections on their hives during the fall months, their hive is more likely to survive over winter.
H6: If beekeepers reduce the number of inspections on their hives during the winter months, their hive is more likely to survive over winter.
Due to geographic and climate differences as well as genetic variations in hives, these hypotheses need to be tested over a very large area with several climate zones (Figure 1) that are substantially similar and where the majority of hives are derived from the same rootstock. In doing so, the strengths and weaknesses of these recommendations can be assessed at scale while largely controlling for the dominant local (to the U.S.) species.

Data sourcing
As noted above, a large and representative data set is needed to adequately test the inspection hypotheses in this study. A wide range of geographic samples is needed to demonstrate significance in the inspection frequency for different geographic and climate zones (Figure 1), especially given how critical these factors are expected to be to over-winter survival rates. Additionally, to control for unprecedented weather events in the years leading up to the over-winter survival test, multi-year data is needed to prevent bias and to be truly representative of the regions' typical conditions. Apiary management software systems provide a natural source for a dataset that meets these criteria.
The apiary management software firm HiveTracks provided the anonymized data for this project as part of their goal to partner with the beekeeping community to better understand how data can be used to evaluate the factors affecting honey bee health. Within the application, beekeepers are able to record detailed inspection information such as queen status, hive strength, feedings, harvest, and treatments, as well as share data, view maps, and organize their hive management tasks. HiveTracks has been used in more than 150 countries by more than 40,000 beekeepers over the last decade, making it a valuable tool for research on beekeeper trends and hive health (Wilkes, 2021).
While there is great potential for using such a database to explore research questions, careful evaluation of the data set is required to assess the quality and quantity of data available. The data for this project were selected through a series of analytics techniques, starting with database querying and joining of information, followed by feature engineering of new variables that could test the hypotheses.

Hive inclusion
When testing the hypotheses, ensuring that the colonies or hives used were from a group of consistent and reliable users of the software was essential due to the retrospective nature of the research. Rather than designating a group of experimental colonies, setting a control group, and monitoring the changes in real time, this study looked to the past records of actual beekeepers in the field, making real decisions on their colonies. With this quasi-experiment, a longitudinal within-subject research method design was used to follow the same groups of colonies and measure the treatment variable over a long period of time (West et al., 2004). This basic post hoc data process of gathering, cleaning, and analysis is used in numerous industries. However, following the data management process with a preparatory stage, organization stage, and analysis and dissemination stage is essential for safe and reliable scientific data study (Tavakoli, 2006).
Because people come and go, distinguishing a user who stopped using the software from a colony that perished was a key outcome to be able to deduce from the data. Therefore, a variety of methods were tested in trying to predict when a user stopped using the software. Tracing activity across all hives, login activity, and inspection data were analyzed.
From the start of the study, only colonies from the continental U.S. were analyzed to limit the climatic variability of different regions, species of bees, trends in beekeeping practices, and differences in diseases present in those areas. To achieve a list of dependable hives, a multi-step process was developed for validating the colony as alive or dead (Figure 1). The first step was to determine whether the beekeeper was still active early in the year following the over-winter season. If the beekeeper was still active on the software, then the hives were included in the first round. From there, hives qualified for the study based on whether they were active in the fall season (09/01-11/30) before the "Over Winter" hypothesis, verifying that they were alive going into the test winter season. This ensured that no colonies that were already dead before the winter analysis skewed the final results.
Next, colonies were classified as alive (surviving the over-winter test) if there was more than one inspection on the hive in the following spring season (03/01-05/31). By having multiple inspections completed on the hive the following spring season, it could be assumed that the colony was alive and stable enough to warrant returning to or being sold and removed from the bee yard. Additionally, if there was only one hive inspection in this period, the assumption was made that the colony did not survive. This process was followed for each set of colonies for the beekeeping seasons from July 2013 to July 2017. The list of hives that met both beekeeper and hive requirements was added to the analysis and marked as alive or dead in a new "Over-Winter Survival" feature-engineered, dependent variable based on this workflow. By adding this variable to the dataset, a concrete framework provided clarity to colony mortality within the data for analysis on a variety of variables.
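The hive-level part of this workflow can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's actual implementation (which queried the HiveTracks database). The function name `classify_hive` is hypothetical, "active in the fall" is assumed to mean at least one recorded fall inspection, hives with no spring inspections are assumed to count as dead (the text only states the rule for one inspection), and the beekeeper-level activity check is assumed to have been applied beforehand.

```python
from datetime import date


def classify_hive(inspection_dates, fall_year):
    """Classify one hive for the winter that starts in `fall_year`.

    Returns "alive", "dead", or None when the hive does not qualify.
    Assumptions (not stated in the source): one fall inspection is enough
    to count as "active", and zero spring inspections means "dead".
    """
    fall_start, fall_end = date(fall_year, 9, 1), date(fall_year, 11, 30)
    spring_start, spring_end = date(fall_year + 1, 3, 1), date(fall_year + 1, 5, 31)

    # Step 1: the hive must be active in the fall before the test winter.
    fall_count = sum(fall_start <= d <= fall_end for d in inspection_dates)
    if fall_count == 0:
        return None

    # Step 2: more than one inspection in the following spring means alive.
    spring_count = sum(spring_start <= d <= spring_end for d in inspection_dates)
    return "alive" if spring_count > 1 else "dead"


# Hypothetical inspection history for a single hive.
history = [date(2015, 10, 3), date(2016, 3, 12), date(2016, 4, 20)]
print(classify_hive(history, 2015))
```

A hive with a fall inspection and two spring inspections is labelled alive; the same hive with a single spring inspection would be labelled dead.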

Data procedures
Following the detailed criteria on which hives were included in the study, a list of 717 beekeepers carefully monitoring their hives was developed. This version of HiveTracks is mainly a hobbyist platform, usually used by beekeepers with under 10-15 hives. The average number of hives owned by a beekeeper in the study was 5.7, with a median of two hives. In this study, there were 4,072 unique hives analyzed from the group of beekeepers. Of those hives, 2,189 were marked as still alive, and 1,883 were marked as dead following the winter. The calculated survival rate was 53%.
Beekeepers were selected to be included in the study due to the consistency and reliability of their data. The beekeepers had to be active within hives on their account both in the spring of the test year and the following spring after the over-winter trial to ensure they continued using the platform. This helped to prevent beekeepers with lower usage, or those dropping off the platform, from skewing the yearly survival trial. From there, merging a series of tables was necessary to produce a cohesive dataset for analysis to test over-winter survival rates against hive inspections for not only the colonies that died, but also for the ones that survived. This process is illustrated in Figure S1.
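The headline figures in this paragraph can be checked with simple arithmetic. The short sketch below uses only the counts given in the text; the variable names are illustrative. The unrounded survival rate is about 53.8%, so the reported 53% appears to be the truncated rather than the rounded value.

```python
# Counts taken directly from the text.
beekeepers = 717
hives_total = 4072
hives_alive = 2189
hives_dead = 1883

# The alive and dead counts should account for every hive.
assert hives_alive + hives_dead == hives_total

survival_rate = hives_alive / hives_total      # share of hives that survived
mean_hives = hives_total / beekeepers          # average hives per beekeeper

print(f"Over-winter survival rate: {survival_rate:.1%}")
print(f"Mean hives per beekeeper: {mean_hives:.1f}")
```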

Data analysis
Data analysis was conducted in SAS Viya version 3.4 through a series of statistical tests, including descriptive analysis and nonparametric tests. For example, to test the differences between the regions' mean rate of over-winter survival, a Kruskal-Wallis test was used because of the non-normal nature of the data.
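For readers unfamiliar with the Kruskal-Wallis test, the following pure-Python sketch computes its H statistic with the usual tie correction. It is not the SAS procedure used in the study; the function names and the toy survival rates are hypothetical. Under the null hypothesis, H is approximately chi-square distributed with (number of groups minus one) degrees of freedom.

```python
from collections import Counter


def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = midranks(pooled)

    # Sum of (rank sum squared / group size) over all groups.
    total, position = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[position:position + len(group)])
        total += rank_sum ** 2 / len(group)
        position += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_sum / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical per-beekeeper survival rates for three regions.
toy_regions = [[0.2, 0.5, 0.4], [0.6, 0.7, 0.5], [0.9, 0.8, 1.0]]
print(round(kruskal_wallis_h(toy_regions), 3))
```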
1. Analysis of regional climatic differences in overall survival rates. To further analyze the differences in over-winter survival between climatic regions, a Dwass, Steel, Critchlow-Fligner method comparison analysis was used (Table 2). This post-hoc test is an extension of the Kruskal-Wallis test.
2. Analysis of overall hive inspections by season. Summary statistics were used to get a general idea of the trends of the groups and regions, while the Mann-Whitney U-test was used to identify significant differences between the alive and dead colonies. The Mann-Whitney U-test was selected because the colonies included in the study were split into two independent samples depending on their over-winter survival, and the number of hive inspections was not normally distributed in each sample.
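As an illustration of the Mann-Whitney U-test, the sketch below computes the U statistic for the first sample and a tie-corrected normal approximation (without continuity correction) in pure Python. It is a teaching sketch rather than the SAS routine used in the study, and the inspection counts are invented for the example.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Return (U for sample x, z score) using midranks for ties."""
    pooled = sorted(list(x) + list(y))
    n1, n2 = len(x), len(y)
    n = n1 + n2

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1

    rank_sum_x = sum(rank_of[value] for value in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Normal approximation with the standard tie correction.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical")
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    return u_x, z


# Hypothetical seasonal inspection counts per hive.
survived = [4, 6, 5, 7, 3, 6]
died = [2, 3, 1, 4, 2, 3]
u, z = mann_whitney_u(survived, died)
print(u, round(z, 2))
```

A positive z indicates that the first sample (here, surviving hives) tends to have larger values than the second.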
The data were cleaned mainly through the collection process. However, a few straggling errors needed to be fixed before conducting the analysis. Because the data used for analysis was queried using MySQL Workbench, specific tables were created for the hypothesis testing. Each table was built on tidy data principles, where each row is a single observation with a unique key, each column has only one variable attached, and the data is clean and free of missing values. For the final analysis, the independent variable for each hypothesis was derived by counting the number of inspections on each hive within the given season.
The timeframes for each season were based on the rough month-to-month trends across the continental U.S. Notably, the winter season runs from December to the end of February, the spring season from March to the end of May, the summer season from June to the end of August, and the fall season from September to the end of November (Thomas & Koss, 1984). While climatic variability is present across the U.S., these timeframes are consistent with beekeeping seasons and practices across the country. Following the hive inspection count, each hive in the study had four independent predictive variables, namely the number of inspections performed during each beekeeping season, and the dependent over-winter survival variable examined throughout this paper. By including data from a five-year span, the impact of annual variations is lessened by the breadth of information collected over the multi-year period.
Hives were broken down into regions to further investigate the seasonal inspection hypotheses based on climate and assigned a grouping variable based on a region within the dataset. The state-by-state regional grouping in this study is based on NOAA's (NOAA National Centers for Environmental Information, 2021) U.S. climate regions (Figure 1) used for climate monitoring (Thomas & Koss, 1984).
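The seasonal counting step can be sketched as follows. The month-to-season mapping comes from the text; the function names, the record layout of `(hive_id, date)` pairs, and the sample records are assumptions made for illustration. The sketch also assumes the input has already been restricted to the study year of interest.

```python
from collections import defaultdict
from datetime import date

SEASONS = ("winter", "spring", "summer", "fall")


def season_of(d):
    """Map a date to its beekeeping season using the month ranges in the text."""
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "fall"


def count_inspections_by_season(inspections):
    """Turn (hive_id, date) records into one row of four seasonal counts per hive."""
    counts = defaultdict(lambda: dict.fromkeys(SEASONS, 0))
    for hive_id, inspected_on in inspections:
        counts[hive_id][season_of(inspected_on)] += 1
    return dict(counts)


# Hypothetical inspection records.
records = [
    ("hive-1", date(2015, 3, 14)),
    ("hive-1", date(2015, 5, 31)),
    ("hive-1", date(2015, 12, 20)),
    ("hive-2", date(2015, 9, 1)),
]
for hive_id, row in count_inspections_by_season(records).items():
    print(hive_id, row)
```

Each hive ends up with exactly four predictor values, with zeros for seasons in which no inspection was recorded.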
NOAA's climatically consistent regions try to remove "climatic anomalies" and contain "temporal and spatial temperature and precipitation distribution for groups of contiguous states," creating relative climate uniformity in the regions.
The Southeast region had the most beekeepers in the study with 197, followed by the Northeast with 147, the Ohio Valley with 111, and the Northwest with 91. The South had 59 beekeepers, the West 41, the Upper Midwest 37, the Southwest 28, and the Northern Rockies and Plains 6. The sample size for the Northern Rockies and Plains region is too small to test statistical significance; its hive data are therefore reported as descriptive information only.
To demonstrate the trends of the seasonal climate forces for each region, temperature and precipitation records were collected from the National Centers for Environmental Information, Climate at a Glance feature for the timeframe and averaged seasonally (Table S1). By summarizing these trends, the bees' conditions during the study time period and the expectation for beekeeper inspections can be better understood. Each region has different requirements based on the natural forces at play. For example, a beekeeper in the Southern region may responsibly inspect their hive year-round without disturbing the bees due to the warmer weather. In contrast, a beekeeper in the Northern Rockies region would find it very dangerous to add any more stress on their hives during the winter months due to intensely cold temperatures.
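A regional grouping variable of this kind can be built with a simple lookup table. The sketch below is illustrative: the beekeeper counts are those given in the text, while the state lists are reproduced from NOAA's published nine-region definition as best recalled here and should be verified against the NOAA source before use. All names in the code are hypothetical.

```python
# State membership of NOAA's nine U.S. climate regions (assumption: verify
# against the NOAA National Centers for Environmental Information listing).
REGION_STATES = {
    "Northeast": ["CT", "DE", "ME", "MD", "MA", "NH", "NJ", "NY", "PA", "RI", "VT"],
    "Southeast": ["AL", "FL", "GA", "NC", "SC", "VA"],
    "Ohio Valley": ["IL", "IN", "KY", "MO", "OH", "TN", "WV"],
    "Upper Midwest": ["IA", "MI", "MN", "WI"],
    "South": ["AR", "KS", "LA", "MS", "OK", "TX"],
    "Northern Rockies and Plains": ["MT", "NE", "ND", "SD", "WY"],
    "Southwest": ["AZ", "CO", "NM", "UT"],
    "Northwest": ["ID", "OR", "WA"],
    "West": ["CA", "NV"],
}

# Invert the table so each state code points at its region.
STATE_TO_REGION = {
    state: region for region, states in REGION_STATES.items() for state in states
}

# Beekeepers per region as reported in the text.
BEEKEEPERS_BY_REGION = {
    "Southeast": 197,
    "Northeast": 147,
    "Ohio Valley": 111,
    "Northwest": 91,
    "South": 59,
    "West": 41,
    "Upper Midwest": 37,
    "Southwest": 28,
    "Northern Rockies and Plains": 6,
}

print(STATE_TO_REGION["NC"])
print(sum(BEEKEEPERS_BY_REGION.values()))
```

The regional counts sum to the 717 beekeepers reported for the study, and the lookup covers the 48 contiguous states.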

Results
The Kruskal-Wallis test used to test the differences between the regions' mean rate of over-winter survival yielded a very high Chi-Square value of 53.9, significant at the 0.01 level. This non-parametric analogue of a one-way ANOVA indicated that over-winter survival was not equal across all regions.

Analysis of regional climatic differences in overall survival rates
The Dwass, Steel, Critchlow-Fligner method comparison analysis showed that the Northeast region had a mean over-winter survival rate with a significant difference from the rest, as illustrated by the number of significance asterisks (*; Table S2). However, contrary to the researchers' initial expectations, the key finding is that there was no statistically significant difference in the overall survival rate in eight of the nine climatic regions, even with a high degree of climatic variability over the five-year period analyzed and across the continental U.S.

Analysis of overall hive inspections by season
The results of the seasonal statistical analysis using the Mann-Whitney U test showed significant differences between the groups of colonies that survived and those that died (Table 1), with the surviving hives generally having more inspections.
For the general hypothesis analysis, very high statistical significance indicates that colonies that survived over winter had more inspections completed, regardless of the season. This held true across each beekeeping season, but with a more significant difference between the groups in the fall and winter seasons, where interacting with a hive an additional time before winter appears to have impacted survival rates.

Analysis of regional hive inspections by season
All regions of the U.S. reduce their hive inspections during the winter due to extreme temperatures and conditions. For most regions, and in every season including winter, hives that had more inspections had higher survival rates. To analyze these trends further and investigate the role climate could play in each of these seasons, the hives were broken down on a regional basis using the similar climate zones referenced above. Results are presented in Table 2.
For the spring hypothesis, there were five regions with a significant difference between the number of inspections performed on hives that survived and on those that died. The Northeast, Southeast, South, Southwest, and Northwest all had significance levels high enough to conclude that the number of spring inspections performed on colonies that lived and those that died in those areas is different. Additionally, every testable region had a higher mean spring inspection count on the colonies that lived (Table 3).
For the summer hypothesis, there were four regions with a statistically significant difference. The Northeast, Southeast, South, and Northwest regions all had a significant difference between the two groups of colonies in the number of inspections recorded during the summer (Table 3).
With the fall hypothesis, four of the eight testable regions had significant levels of difference in inspections, with those with more inspections more likely to survive in the Northeast, Southeast, and West. Overall, the results show a significant difference in the number of inspections and hive survival in spring, summer, and winter (Table 3).
For the winter season hypothesis, all eight of the testable regions had a significant difference in the number of inspections performed during the winter. In each instance, the mean number of inspections was greater for the colonies that survived (Table 3).

Discussion
Interestingly, the only region with a significant difference in over-winter survival was the Northeast. The [...] than those during the other three seasons, since the bees are clustered and require less management. Honey bees are cold-blooded, so the colony must generate and retain heat to stay alive. Honey bees accomplish this through the clustering process. Once temperatures fall below 50 °F (10 °C), honey bees form a winter cluster within the hive by crowding together tightly around the queen. In doing this, they are able to maintain a survivable temperature at the center of the cluster during the winter months, where it can reach 90-100 °F (32-37 °C). Additionally, the temperature outside the cluster varies around 50 °F (10 °C) (Hogeback, 2021). However, the data suggest that this common practice might need to be reevaluated or adjusted.
Note that in general, however, winter inspections of the survival group were much less frequent than during the rest of the year (1-3 times in winter versus 3-7 times during other seasons). The results show that those who inspected relatively more in the winter than others (1 -times), compared to those that did not inspect or did so just once (0-1 times), had better over-winter survival. As noted earlier, there are many reasons not to inspect in the winter, especially on a cold day. Therefore, it is important not to overdo it. Yet, results do suggest that those hives that were inspected at least once in the winter, presumably on a warmer day, were associated with higher over-winter survival rates. However, it is unlikely that this trend would increase survivability indefinitely. Too much hive exposure to severe climatic conditions would undoubtedly increase mortality, so finding the right balance between the two is essential to maximizing over-winter survival.
While there are several significant findings from this study, there are also ways that it can be improved upon for greater efficiency and accuracy. Understanding what was needed for analysis and what could be ignored within the database took considerable time and expertise. Working with this retrospective data imposed restrictions on what could be analyzed; therefore, it was essential to focus on only the data that was available and reliable.
All beekeepers in this study use software technology to aid their management practices, which differentiates them from other hobbyists who might be less inclined to perform routine record-keeping. Importantly, these hives were primarily stationary, so future research should explore how migratory hives are impacted by the frequency of hive inspection. One of the initial steps was to limit the hive pool to only the hives in the lower U.S. to simplify the changes that would come with seasonal differences. While this initial study used this limited focus, verifying the findings in other regions around the world is critical to establishing their validity.

Conclusions
Contrary to expectations, the locations of hives in different climate regions across the United States had little actual impact on over-winter survival rates in our sample. What was highly significant was the number and frequency of hive inspections. Namely, hives that were inspected more frequently than others in the same region were much more likely to survive the subsequent winter than those that were not. Especially notable was that inspecting during the winter was also associated with higher survival rates, regardless of the climate region where the hives are located.
In nearly every instance, the over-winter surviving colonies received significantly more frequent inspections during each of the previous spring, summer, fall, and winter seasons. Considering each environmental region separately, there was a significant impact of spring inspections in five of the eight testable regions on over-winter survival. This was particularly true in the South and Southeast regions, where milder springs lead to an earlier onset of the beekeeping season.
Recent studies on honey bee mortality have been carried out (Kulhanek et al., 2017; Gray et al., 2020) concerning the three main risk factors for annual hive mortality: "Varroa" (Genersch et al., 2010; Giacobino et al., 2015), "Queen failure" (Brodschneider et al., 2016; van Engelsdorp et al., 2013) and "Pesticides" (Dietemann et al., 2013; Traynor et al., 2016). Based on the results of the present study, at least one more relevant factor for hive mortality should be considered: the number of inspections performed by beekeepers.
[...] and research to benefit the beekeeping sector and to provide opportunities for citizen scientists to contribute to the global beekeeping effort. Additionally, all the HiveTracks data has been anonymized and aggregated to hide information that could pinpoint one specific beekeeper. This is in line with the HiveTracks Terms and Conditions and Privacy Policies for users of the software.
No potential conflict of interest was reported by the author(s).

Table 1.
Average number of inspections of hives by season.

Table 2.
Average number of inspections of hives by season for climatic regions. Significance levels: ** p < .01, *** p < .001. a Region not statistically testable due to small sample size; numbers presented as information only.