figshare

Global Roadkill Data: a dataset on terrestrial vertebrate mortality caused by collision with vehicles

Version 5 2025-04-03, 11:21
Version 4 2025-03-31, 09:47
Version 3 2024-09-20, 11:03
Version 2 2024-09-20, 08:59
Version 1 2024-09-04, 15:31
dataset
posted on 2025-04-03, 11:21 authored by Clara Grilo, Tomé Neves, Jennifer Bates, Aliza le Roux, Pablo Medrano‐Vizcaíno, Mattia Quaranta, Inês Silva, Kylie Soanes, Yun Wang, Sergio Damián Abate, Fernanda Delborgo Abra, Stuart Aldaz Cedeño, Pedro Rodrigues de Alencar, Mariana Fernada Peres de Almeida, Mario Henrique Alves, Paloma Alves, André Ambrozio de Assis, Rob Ament, Richard Andrášik, Edison Araguillin, Danielle Rodrigues de Araújo, Alexis Araujo-Quintero, Jesús Arca-Rubio, Morteza Arianejad, Carlos Armas, Erin Arnold, Fernando Ascensão, Badrul Azhar, Seung-Yun Baek
<p dir="ltr">We present the GLOBAL ROADKILL DATA, the largest worldwide compilation of roadkill data on terrestrial vertebrates. We outline the workflow (Fig. 1) to illustrate the sequential steps of the study, in which we merged local-scale survey datasets and opportunistic records into a single large roadkill dataset comprising 208,570 roadkill records. These records include 2283 species and subspecies from 54 countries across six continents, ranging from 1971 to 2024.</p><p dir="ltr">Large roadkill datasets help prevent the collection of redundant data and are valuable resources for both local and macro-scale analyses of roadkill rates, road and landscape features associated with roadkill risk, species more vulnerable to road traffic, and populations at risk due to additional mortality. Standardizing the data (such as scientific names, projection coordinates, and units) in a user-friendly format makes them readily accessible to a broader scientific and non-scientific community, including NGOs, consultants, public administration officials, and road managers. The open-access approach promotes collaboration among researchers and road practitioners, facilitating the replication of studies, validation of findings, and expansion of previous work. Moreover, researchers can use such datasets to develop new hypotheses, conduct meta-analyses, address pressing challenges more efficiently, and strengthen the robustness of road ecology research. Ensuring widespread access to roadkill data fosters a more diverse and inclusive research community. 
This not only provides researchers in emerging economies with more data for analysis, but also cultivates a diverse array of perspectives and insights, promoting the advance of infrastructure ecology.</p><h3>Methods</h3><p dir="ltr"><b>Information sources:</b> A core team from different continents performed a systematic literature search in Web of Science and Google Scholar for published peer-reviewed papers and dissertations. The search used the following terms: “roadkill*” OR “road-kill” OR “road mortality” AND (country) in English, Portuguese, Spanish, French and/or Mandarin. This initiative was also disseminated to the mailing lists associated with transport infrastructure: <a href="https://conservationcorridor.org/ccsg/working-groups/twg/" target="_blank">The CCSG Transport Working Group (TWG)</a>, Infrastructure & Ecology Network Europe (IENE) and Latin American & Caribbean Transport Working Group (LACTWG) (Fig. 1). The core team identified 750 scientific papers and dissertations with information on roadkill and contacted the first authors of the publications to request georeferenced locations of roadkill and offer co-authorship of this data paper. Of the 824 authors contacted, 145 agreed to share georeferenced roadkill locations, often involving additional colleagues who contributed to data collection. 
Since our main goal was to provide open access to data that had never been shared in this format before, data from citizen science projects (e.g., globalroakill.net) that are already available were not included.</p><p dir="ltr"><br></p><p dir="ltr"><b>Data compilation:</b> A total of 423 co-authors compiled the following information: continent, country, latitude and longitude of the roadkill in WGS 84 decimal degrees, coordinate uncertainty, class, order, family, scientific name of the roadkill, vernacular name, IUCN status, number of roadkill, year, month, and day of the record, identification of the road, type of road, survey type, references, and observers that recorded the roadkill (Supplementary Information Table S1 - description of the fields and Table S2 - reference list). When roadkill data were derived from systematic surveys, the dataset included additional information on the road length that was surveyed, latitude and longitude of the road (initial and final points of the road segment), survey period, start year of the survey, final year of the survey, first month of the year surveyed, last month of the year surveyed, and frequency of the survey. We consolidated 142 valid datasets into a single dataset. We complemented these data with OccurenceID (a UUID generated using Java code), basisOfRecord, countryCode, locality using OpenStreetMap’s API (https://www.openstreetmap.org), geodeticDatum, verbatimScientificName, Kingdom, phylum, genus, specificEpithet, infraspecificEpithet, acceptedNameUsage, scientific name authorship, matchType, taxonRank using Darwin Core Reference Guide (https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters) and link of the associatedReference (URL).</p><p><br></p><p dir="ltr"><b>Data standardization -</b> We conducted a clustering analysis on all text fields to identify similar entries with minor variations, such as typos, and corrected them using OpenRefine (http://openrefine.org). 
We also standardized all date values using OpenRefine. Coordinate uncertainties listed as 0 m were adjusted to either 30 m or 100 m, depending on whether they were recorded after or before 2000, respectively, following the recommendation in the Darwin Core Reference Guide (https://dwc.tdwg.org/terms/#dwc:coordinateUncertaintyInMeters).</p><p><br></p><p dir="ltr"><b>Taxonomy -</b> We cross-referenced all species names with the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy using Java and GBIF’s API (<a href="https://doi.org/10.15468/39omei" target="_blank">https://doi.org/10.15468/39omei</a>). This process aimed to rectify classification errors, include additional fields such as Kingdom, Phylum, and scientific authorship, and gather comprehensive taxonomic information to address any gaps within the datasets. For species not automatically matched (matchType - Table S1), we manually searched for correct synonyms when available.</p><p><br></p><p dir="ltr"><b>Species conservation status</b> - Using the species names, we retrieved their conservation status and vernacular names by cross-referencing with the database downloaded from the IUCN Red List of Threatened Species (<a href="https://www.iucnredlist.org/" target="_blank">https://www.iucnredlist.org</a>). Species without a match were categorized as "Not Evaluated".</p><p><br></p><h3>Data Records</h3><p><br></p><p dir="ltr">GLOBAL ROADKILL DATA is available at Figshare<sup>27</sup> <a href="https://doi.org/10.6084/m9.figshare.25714233" target="_blank">https://doi.org/10.6084/m9.figshare.25714233</a>. The dataset incorporates opportunistic data (collected incidentally without data collection efforts) and systematic data (collected through planned, structured, and controlled methods designed to ensure consistency and reliability). In total, it comprises 208,570 roadkill records across 177,428 different locations (Fig. 2). 
Data were collected from the road networks of 54 countries on six continents: Europe (n = 19), Asia (n = 16), South America (n = 7), North America (n = 4), Africa (n = 6) and Oceania (n = 2).</p><p><br></p><p dir="ltr">(Figure 2 goes here)</p><p><br></p><p dir="ltr">All data are georeferenced in WGS 84 decimal degrees with a maximum uncertainty of 5000 m. Approximately 92% of records have a location uncertainty of 30 m or less, with only 1138 records having location uncertainties ranging from 1000 to 5000 m. Mammals have the highest number of roadkill records (61%), followed by amphibians (21%), reptiles (10%) and birds (8%). The species with the highest number of records were roe deer (<i>Capreolus capreolus</i>, n = 44,268), pool frog (<i>Pelophylax lessonae</i>, n = 11,999) and European fallow deer (<i>Dama dama</i>, n = 7,426).</p><p><br></p><p dir="ltr">We collected information on 126 threatened species with a total of 4570 records. Among the threatened species, the giant anteater (<i>Myrmecophaga tridactyla</i>, VULNERABLE) has the highest number of records (n = 1199), followed by the common fire salamander (<i>Salamandra salamandra</i>, VULNERABLE, n = 1043), and European rabbit (<i>Oryctolagus cuniculus</i>, ENDANGERED, n = 440). Records ranged from 1971 to 2024, with 72% of the roadkill recorded since 2013. Over 46% of the records were obtained from systematic surveys, with road length and survey period averaging, respectively, 66 km (min-max: 0.09-855 km) and 780 days (min-max: 1-25,720 days).</p><h3><br></h3><h3>Technical Validation</h3><p><br></p><p dir="ltr">We employed the OpenStreetMap API through Java to detect location inaccuracies and validate whether the geographic coordinates aligned with the specified country. We calculated the distance of each occurrence to the nearest road using the GRIP global roads database<sup>28</sup>, ensuring that all records were within the defined coordinate uncertainty. 
We verified whether the survey duration matched the provided initial and final survey dates. We calculated the distance between the provided initial and final road coordinates and cross-checked it with the given road length. We identified and merged duplicate entries within the same dataset (same location, species, and date), aggregating the number of roadkills for each occurrence.</p><h3>Usage Notes</h3><p dir="ltr">The GLOBAL ROADKILL DATA is a compilation of roadkill records and was designed to serve as a valuable resource for a wide range of analyses. Nevertheless, to prevent the generation of meaningless results, users should be aware of the following limitations:</p><p><br></p><p dir="ltr">- <b>Geographic representation</b> - There is an evident bias in the distribution of records. Data originated predominantly from Europe (60% of records), South America (22%), and North America (12%). Conversely, there is a notable lack of records from Asia (5%), Oceania (1%) and Africa (0.3%). This dataset represents 36% of the initial contacts that provided geo-referenced records, which may not necessarily correspond to locations where high-impact roads are present.</p><p><br></p><p dir="ltr">- <b>Location accuracy</b> - Insufficient location accuracy (uncertainty ranging from 1000 to 5000 m) was observed for 1% of the data and was associated with various factors, such as survey methods, recording practices, or timing of the survey.</p><p><br></p><p dir="ltr">- <b>Sampling effort</b> - This dataset comprises both opportunistic data and records from systematic surveys, with a high variability in survey duration and frequency. 
As a result, combining opportunistic and systematic records may distort the relative abundance of roadkill, making it hard to draw sound comparisons among species or areas.</p><p><br></p><p dir="ltr">- <b>Detectability and carcass removal bias</b> - Although several studies had a high frequency of road surveys, the duration of carcass persistence on roads may vary with species size and environmental conditions, affecting detectability. Accordingly, several approaches account for survey frequency and target species to estimate more realistic roadkill rates<sup>29,30</sup>.</p><p><br></p><p dir="ltr">Acknowledging these limitations, it is important to highlight that this dataset is the largest available roadkill compilation on terrestrial vertebrate species worldwide. Records obtained from systematic surveys enable the estimation of roadkill rates, exploration of spatial and temporal patterns of roadkill, modelling of factors potentially explaining roadkill risk across diverse species, and analysis of the potential population impacts. Opportunistic data have the potential to generate extensive datasets across large areas, providing an overview of which species and regions are most exposed to and at risk from traffic. By integrating both systematic and opportunistic data, we can compile lists of species affected by collisions with vehicles, identify threatened species at risk, and assess local and landscape-level factors influencing mortality probabilities using presence-only approaches. Furthermore, this dataset facilitates the identification of geographic gaps in roadkill surveys, focusing scientists’ and road agencies’ efforts on data-deficient areas. 
Finally, beyond its applications in road ecology, this dataset contributes species occurrences for distribution modelling and broader macroecological and conservation studies.</p><p><br></p><h3>Code Availability</h3><p><br></p><p dir="ltr">The Java code used to process and validate the dataset is available at https://github.com/PORBIOTA/PORBIOTA-ICNF/tree/main/DwC_Creation_Helpers.</p><p><br></p>
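<p dir="ltr">As an illustration only, the following Java sketch shows how two of the checks described above might look: the replacement of 0 m coordinate uncertainties (Data standardization) and the cross-check of reported road length against the distance between the initial and final road coordinates (Technical Validation). It is not taken from the repository linked above. The class name, method names, and the tolerance parameter are hypothetical, the year 2000 itself is assumed to count as "after 2000" because the text does not specify it, and the haversine formula on a spherical Earth is an assumed stand-in for whatever distance method the authors used.</p>

```java
import java.util.Locale;

/**
 * Illustrative sketch only; NOT the authors' code.
 * All names here are hypothetical.
 */
final class RoadkillChecks {

    /** Mean Earth radius in km (spherical approximation, an assumption of this sketch). */
    static final double EARTH_RADIUS_KM = 6371.0088;

    /**
     * Replaces a coordinate uncertainty of 0 m with 30 m or 100 m by record year.
     * Assumption: the year 2000 itself is treated as "after 2000".
     */
    static double adjustUncertainty(double uncertaintyMeters, int year) {
        if (uncertaintyMeters != 0.0) {
            return uncertaintyMeters; // non-zero values are left untouched
        }
        return year >= 2000 ? 30.0 : 100.0;
    }

    /** Great-circle (haversine) distance in km between two points in decimal degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Returns false when the reported segment length is shorter than the straight-line
     * distance between its endpoints, which a real road cannot be.
     * toleranceKm is a hypothetical allowance for coordinate error.
     */
    static boolean lengthIsPlausible(double reportedKm, double lat1, double lon1,
                                     double lat2, double lon2, double toleranceKm) {
        return reportedKm + toleranceKm >= haversineKm(lat1, lon1, lat2, lon2);
    }

    public static void main(String[] args) {
        System.out.println(adjustUncertainty(0.0, 2015));
        System.out.println(adjustUncertainty(0.0, 1998));
        System.out.printf(Locale.ROOT, "%.1f km%n", haversineKm(38.0, -8.0, 38.5, -8.0));
        System.out.println(lengthIsPlausible(66.0, 38.0, -8.0, 38.5, -8.0, 0.5));
    }
}
```

<p dir="ltr">The length check is deliberately one-sided: a winding road can be much longer than the distance between its endpoints, so only segments reported as shorter than that distance are flagged.</p>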
