This Simpler_signatures_post-invasion_README.txt file was generated on 02 September 2020 by Grace Smith-Vidaurre
Updated 19 June 2021

-------------------
GENERAL INFORMATION
-------------------

Title of Dataset: Simpler_signatures_post-invasion

Associated Article: Individual vocal signatures show reduced complexity following invasion, Animal Behaviour, 2021

Co-authors: Smith-Vidaurre, Grace; Perez-Marrufo, Valeria; Wright, Timothy F.

Corresponding Author Information: Grace Smith-Vidaurre, gsvidaurre@gmail.com

Species: Monk parakeet (Myiopsitta monachus)

Dates of data collection: May - November 2017 in the native range. Invasive range calls were recorded in 2004, 2011, 2018, and 2019.

Geographic locations of data collection: Uruguay (native range) and the United States of America (invasive range). Geographic coordinates are available in the metadata provided here.

Previously published data: Native range contact calls were made publicly available with a 2020 publication in Behavioral Ecology.

Article citation: Smith-Vidaurre, G., Araya-Salas, M., & Wright, T.F. (2020). Individual signatures outweigh social group identity in contact calls of a communally nesting parrot. Behavioral Ecology, 31(2), 448–458. https://doi.org/10.1093/beheco/arz202

Data citation: Smith-Vidaurre, Grace; Araya-Salas, Marcelo; Wright, Timothy F. (2019), Individual signatures outweigh social group identity in contact calls of a communally nesting parrot, Dryad, Dataset, https://doi.org/10.5061/dryad.w6m905qkg

Invasive range calls recorded in 2004 were obtained from the authors of a previous publication: Buhrman-Deever, S.C., Rappaport, A.R., & Bradbury, J.W. (2007). Geographic variation in contact calls of feral North American populations of the monk parakeet. Condor, 109(2), 389–398.
https://doi.org/10.1525/boom.2013.3.4.67.B

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: See information for the figshare license (CC BY 4.0). We place no restrictions on the use of these data, although we would appreciate being contacted beforehand to discuss whether collaboration and co-authorship would be appropriate.

Recommended citation for the data: See the figshare citation associated with this data.

Links to publicly available and citable code for reproducing results: Code and knitted RMarkdown files were made available in a GitHub repository (gsvidaurre/simpler-individual-signatures) for reproducing results with the data provided.

--------------------
DATA & FILE OVERVIEW
--------------------

See below for more specific information about each file.

--------------------------
DATA-SPECIFIC INFORMATION
--------------------------

We provide metadata and measurements of pre-processed monk parakeet calls that were selected within longer recordings and taken through a quality control filtering pipeline. Nest and social group size estimates from fieldwork are also provided. These data can be used along with the code made available on GitHub to reproduce results reported in the associated publication.

Audio files of selected contact calls for the invasive range have not been made available, as these are currently being used for an independent manuscript in preparation. Native range calls were made available on Data Dryad with the publication of Smith-Vidaurre, Araya-Salas & Wright 2020, Behavioral Ecology (see above) as an extended selection table (.wav files and metadata in .RDS format). Audio files employed in native and invasive range comparisons will be made available as .wav files at a later date (not in .RDS format, as this R-specific file format eventually becomes incompatible with newer versions of R).
Until then, please contact the corresponding author with requests or questions related to native and invasive range audio files.

1. nat_inv_indiv_site_seltbl.csv: Selection table with metadata of calls across datasets and ranges in .csv format. This file has 1596 rows and 26 columns, in which each row represents a different call. This selection table is the data frame of an extended selection table employed in analyses. The columns are as follows:

- sound.files: Unique sound file names that contain the original recording name and the original selection ID within the full recording
- selec: Unique selection ID for each row. This will always be 1 because each sound file was cut from the original recording as a distinct audio file while making an extended selection table
- start: Start coordinate in seconds for the given call. Note this is in reference to the .wav file associated with the extended selection table, not the original recording
- end: End coordinate in seconds. This is also in reference to the .wav file associated with the extended selection table, not the original recording
- date: Date of the recording session in YYYY-MM-DD format. See appendix for more details
- range: Either native or invasive
- social_scale: Either individual or site scale. "Individual scale" refers to the dataset of repeatedly sampled individuals. "Site scale" refers to the larger dataset representing broader geographic sampling in each range, in which each call represents a unique individual
- site: Site code with 4 alphanumeric symbols
- Bird_ID: Unique alphanumeric codes of known repeatedly sampled marked or unmarked individuals
- lat: Latitude, decimal degrees
- lon: Longitude, decimal degrees
- Visual_quality_score: Low (L), medium (M) or high (H) quality scores obtained by visual inspection
- Overlapping_signal: Was there an overlapping signal for this call (of the same or another species)? Yes (Y) or No (N)
- Truncated: Was the signal truncated during manual selection?
Yes (Y) or No (N). A good number of 2004 calls were truncated on one or both ends
- tailored: Were temporal coordinates tailored using warbleR::seltailor? Yes (Y) or No (N)
- prev_SNR: Previous signal-to-noise ratio (SNR) calculation from native range analysis published in Behavioral Ecology
- SNR_margin_works: Did a margin of 0.01 seconds for measuring SNR work for the given call? Yes (Y) or No (N). Determined by visual inspection using warbleR::snrspecs
- SNR_before_only: If SNR_margin_works has a value of N, was SNR measured before the call only? For instance, SNR would be measured before a given call if there was another call produced shortly afterwards. This column contains values of Yes (Y) or No (N), or NA (if the column SNR_margin_works was set to Y)
- old.sound.file.name: Old sound file name prior to renaming .wav files in an extended selection table
- SNR: SNR measurement with the same version of warbleR for all calls. NAs indicate that SNR could not be measured (some truncated calls for instance)
- region: Region sampled in each country
- country: Country of sampling
- site_year: Site and year combined
- year: Year of sampling
- dept_state: Department (Uruguay) or state (United States) sampled
- invasive_city: City sampled in the United States. All native range samples have NAs in this column

Note that this dataset contains metadata for 4 calls of a repeatedly sampled unmarked bird (NAT-UM4) that had not been removed at the site scale for a single site in our previous work. However, including or removing these calls did not change previous results reported in the Behavioral Ecology article for native range populations. See the code in SimplerSignatures_AdditionalMaterials_02_AcousticStructure_SupervisedML.Rmd for information on the associated file names and how to remove metadata or measurements for these calls as needed.

2. NestEstimates.csv
A spreadsheet in .csv format containing nests estimated at a subsample of sites per range.
Each row is a different site-year. This spreadsheet has 37 rows and 12 columns. The columns are as follows:

- Range: Native or invasive
- Year: 2017 (native range), 2011, 2018, or 2019 (invasive range)
- Department_orCityState: Department in Uruguay (native range) or city and state (United States, invasive range)
- Site_Code: Site code with 4 alphanumeric symbols. This matches site codes in the site column in the contact call selection table
- Date: In DD-Month-YYYY format
- Estimated_Nests: Number of nests estimated at the given site
- Polygon_Traced: Was a polygon that encompassed the minimum area with nests traced in Google My Maps? Yes (Y) or No (N)
- WhyNoPolygon: For sites without polygons, why was a polygon not traced? This column contains NAs for sites with polygons, and a character string describing why polygons were not traced for a handful of sites. Polygons were traced to obtain nest densities per range
- Paces_Long: Length of the given cluster of nests in paces (GSV took paces unless otherwise specified, CH was Clara Hansen). Each pace is about a meter. This column contains NAs when site areas were not estimated in this way
- Paces_Wide: Width of the given cluster of nests in paces. This column contains NAs when site areas were not estimated in this way
- Nest_Substrate: Substrate in which nests were observed
- Notebook_Name: Name of notebook from which nest estimates were taken. GSV is Grace Smith-Vidaurre, VP is Valeria Perez-Marrufo, and TW is Tim Wright
- Notebook_Pages: Notebook page numbers from which nest estimates were taken
- Notes: Additional information

3. URY_US_Polygons.kml
Polygons of nesting site areas for 25 sites total between the native and invasive ranges. Polygons were manually traced on Google My Maps and saved in Keyhole Markup Language (KML) format to be compatible with R. These polygons were used to calculate site areas and estimated nest densities per range.

4.
SocialGroupSize_Observations.csv
A spreadsheet with 14 rows and 14 columns, in which each row is a unique observation of the largest social groups observed during fieldwork in each range. This spreadsheet contains the following columns:

- Row_ID: Unique numeric identifier per row
- Range: Native or invasive
- Country: Country of sampling
- Dept_State: Department in Uruguay (native range) or state (United States, invasive range)
- Estimated_Flock_Size: Flock size estimates from fieldwork. Note that all observations of 100 or more birds were set to 100 during analysis
- Behavioral_Context: Behavioral context in which flocks were observed
- Observer: Observer initials. Observations were performed by the 3 co-authors
- Year: 2017 (native range), 2011 or 2019 (invasive range)
- Date: In DD-Month-YYYY format
- Site_Code: Site code with 4 alphanumeric symbols. Some of these match site codes in the contact call selection table, but other sites were used only for observation (i.e. not for recording) and do not appear in the selection table
- Site_Name: Full site name
- Notebook: Name of field notebook from which observations were taken. GSV is Grace Smith-Vidaurre, VP is Valeria Perez-Marrufo, and TW is Tim Wright
- Notebook_page: Page of field notebook from which observations were taken
- Notes: Notes on the page number of scanned field notebook pages

5. sup_ML_fin.csv: A spreadsheet with 1561 rows and 206 columns, containing metadata from the selection table and predictors used for supervised machine learning. Each row represents a different contact call across social scales and ranges. The first 3 columns are metadata. The remaining 203 numeric predictors (15 original spectral acoustic measurements and 188 features) were pre-processed for collinearity. The columns are in the following order:

- sound.files: Unique sound file name from the extended selection table
- range: Native or invasive range
- set: Possible values are training, validation, or prediction.
This column was used to split calls into datasets for predictive modelling that classified calls back to the correct range
- 15 standard spectral acoustic parameters as described in the appendix
- 15 multidimensional scaling (MDS) features generated from spectrographic cross-correlation (SPCC) on spectrograms, prefix "SPCC_spec_MDS"
- 15 MDS features from SPCC on Mel-frequency cepstral coefficients, prefix "SPCC_ceps_MDS"
- 15 MDS features from dynamic time warping (DTW) on dominant frequency contours, prefix "DTW_domf_MDS"
- 15 MDS features from DTW on spectral entropy contours, prefix "DTW_spen_MDS"
- 15 MDS features from multivariate DTW on dominant frequency and spectral entropy contours, prefix "DTW_mult_MDS"
- 25 principal components analysis (PCA) features from a set of standard acoustic measurements, prefix "acs_params_PCA"
- 88 PCA features derived from Mel-frequency cepstral coefficients, prefix "cep_coeff_PCA"

6. freq_mod_est_m2h.csv
A spreadsheet in .csv format of manually traced second harmonic frequency contours for 233 calls. This was generated from an extended selection table obtained by subsampling the original extended selection table of all contact calls. This spreadsheet has 233 rows (each corresponds to a single call) and 128 columns. The first 26 columns were previously described in the selection table description above. An additional column called "question" delineates the question/comparison for which each call was used (but see notes on the 6 duplicates below). The remaining 100 columns (prefix "ffreq") contain manually traced frequency values at 100 timepoints across each call (in kHz).

7. freq_mod_df.csv
A spreadsheet in .csv format that contains a randomly selected sample of calls from the full selection table. This spreadsheet has 239 rows and 27 columns. Each row represents a call, but some calls are duplicated.
Six calls were selected for both comparisons of frequency modulation between ranges and temporal comparisons in the invasive range, and so appear twice. Most columns are the same as the selection table described above (although slightly out of order), but one additional column called "question" indicates whether the given call was assigned to the range comparison ("spatial"), the temporal comparison within the invasive range ("temporal"), or to be possibly used for Beecher's statistic calculations ("indiv_scale").

8. peaks_troughs_df.csv
A spreadsheet of peak and trough estimates from the second harmonic frequency contours in .csv format with 1309 rows and 15 columns. Peak and trough estimates were obtained from smoothed spline curves of frequency contours, using customized code provided on GitHub. Here, each row corresponds to a peak identified per call using the customized peak searching routine. The first 8 columns were previously described above for the selection table. The remaining columns are as follows:

- peak_number: Unique numeric identifier per peak identified per call
- peak_height: Height of each peak, in kHz
- peak_max_index: Index of the point along the smoothed spline curve that exhibited the maximum frequency value for the given peak
- peak_start_index: Index of the start of the peak
- peak_end_index: Index of the end of the peak
- peak_trgh_slope: The slope between the given peak and the trough following the peak (if identified). Here the slope was calculated as the change in frequency divided by the change in points along the smoothed spline curve
- trgh_index: Index of the trough identified. Corresponds to the maximum index point for the peak obtained when the smoothed spline curve was inverted (troughs were obtained by searching for inverted peaks)

9. Mel_freq_cepstral_coefficients.csv
A spreadsheet in .csv format with Mel-frequency cepstral coefficients used for supervised machine learning analyses as well as Beecher's statistic. The spreadsheet contains 1596 rows and 90 columns.
Each row corresponds to a call. The first two columns were described above for the first selection table (sound.files and selec), and the remaining 88 columns correspond to descriptive statistics of the cepstral coefficients (e.g. minimum, maximum, median, as described in the associated publication and the warbleR package documentation for the function mfcc_stats).

10. acoustic_parameters.csv
A spreadsheet in .csv format with standard spectral acoustic measurements used for supervised machine learning analyses as well as assessing the effect of range and habitat on contact call structure. The spreadsheet contains 1596 rows and 29 columns. Each row corresponds to a call. The first two columns were described above for the first selection table (sound.files and selec), and the remaining 27 columns correspond to acoustic measurements described in the associated publication and the warbleR package documentation for the function specan or spectro_analysis.

11. PCA_spectral_acoustic_measurements.csv
A spreadsheet in .csv format with metadata and principal components generated by performing Principal Components Analysis on the 27 spectral acoustic measurements provided above. These principal components were used for supervised machine learning analyses as well as assessing the effect of range and habitat on contact call structure. The spreadsheet contains 1596 rows and 39 columns. Each row corresponds to a call. The first 12 columns were described above for the first selection table, and the remaining 27 columns correspond to principal components.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

See the associated publication, appendix, and commented code with knitted RMarkdown output for more information on how data were collected, processed and analyzed. These materials contain information about the software and versions of software used, quality control processing, and those who contributed to this research.
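
The row and column counts documented for each .csv file can be checked after download. The following is a minimal sketch in Python (standard library only) and is not part of the authors' RMarkdown code on GitHub. It assumes that all files sit in one directory, are comma-delimited and UTF-8 encoded, and have a single header row. The function names and the data directory are hypothetical. The expected values are copied from this README, so a reported mismatch may reflect how a file was exported (for example an extra row-name column) rather than a problem with the data.

```python
import csv
from pathlib import Path

# (rows, columns) copied from the file descriptions in this README.
EXPECTED_DIMS = {
    "nat_inv_indiv_site_seltbl.csv": (1596, 26),
    "NestEstimates.csv": (37, 12),
    "SocialGroupSize_Observations.csv": (14, 14),
    "sup_ML_fin.csv": (1561, 206),
    "freq_mod_est_m2h.csv": (233, 128),
    "freq_mod_df.csv": (239, 27),
    "peaks_troughs_df.csv": (1309, 15),
    "Mel_freq_cepstral_coefficients.csv": (1596, 90),
    "acoustic_parameters.csv": (1596, 29),
    "PCA_spectral_acoustic_measurements.csv": (1596, 39),
}

# Documented number of feature columns per name prefix in sup_ML_fin.csv.
SUP_ML_PREFIX_COUNTS = {
    "SPCC_spec_MDS": 15,
    "SPCC_ceps_MDS": 15,
    "DTW_domf_MDS": 15,
    "DTW_spen_MDS": 15,
    "DTW_mult_MDS": 15,
    "acs_params_PCA": 25,
    "cep_coeff_PCA": 88,
}


def read_header_and_count(path):
    """Return (header, n_data_rows), assuming one header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        n_rows = sum(1 for row in reader if row)  # ignore blank lines
    return header, n_rows


def csv_dims(path):
    """Return (data rows, columns) for one .csv file."""
    header, n_rows = read_header_and_count(path)
    return n_rows, len(header)


def check_dims(data_dir, expected=EXPECTED_DIMS):
    """Compare documented and observed dimensions; return report lines."""
    report = []
    for name, dims in expected.items():
        path = Path(data_dir) / name
        if not path.exists():
            report.append(f"{name}: not found")
            continue
        found = csv_dims(path)
        status = "ok" if found == dims else f"expected {dims}, found {found}"
        report.append(f"{name}: {status}")
    return report


def count_by_prefix(header, prefixes):
    """Count the column names that start with each prefix."""
    return {p: sum(col.startswith(p) for col in header) for p in prefixes}


if __name__ == "__main__":
    data_dir = Path(".")  # hypothetical location of the downloaded files
    for line in check_dims(data_dir):
        print(line)
    sup_ml = data_dir / "sup_ML_fin.csv"
    if sup_ml.exists():
        header, _ = read_header_and_count(sup_ml)
        # Compare this output by eye against SUP_ML_PREFIX_COUNTS.
        print(count_by_prefix(header, SUP_ML_PREFIX_COUNTS))
```

The check reports differences instead of raising an error, so that any discrepancy can be inspected by hand before deciding whether it matters for an analysis.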

--------------------------
FUNDING INFORMATION
--------------------------

This research was supported by a Fulbright Study/Research grant to G.S.V., a New Mexico State University Honors College scholarship to Clara Hansen, an American Ornithologists’ Union Carnes Award to G.S.V., Experiment.com crowdfunding led by G.S.V. and Dr Kevin Burgio, a donation to G.S.V. from Michael and Susan Achey, a New Mexico State University Whaley Field Award to G.S.V., and MARC funding to V.P.M. (Biomedical Research Training for Honor Undergraduates supported by the U.S. National Institutes of Health/National Institute of General Medical Sciences (NIH/NIGMS 5T34GM007667)). G.S.V. was also supported by an NSF Postdoctoral Research Fellowship (grant number 2010982) while working on this research.