Barthe_et_al_Immunity_island_birds_Supp_Mat_final.zip (9.16 MB)

Supplementary materials for "Evolution of immune genes in island birds: reduction in population sizes can explain island syndrome"

modified on 2022-11-01, 16:38

Here is a brief description of the files present in each folder:

List of genes: 

  • List Immunity genes: Description and references for the immunity genes.
  • list_Sma3s: List of genes identified by the Sma3s program.
  • list_database: List of genes identified from the Immunome Knowledge Base, InnateDB and gene annotation.

Polymorphism analysis :

  • README: Instructions for running seq_stat_coding.
  • seq_stat_coding: Executable C++ file and source file (.cpp) to estimate synonymous nucleotide diversity (Ps) and non-synonymous nucleotide diversity (Pn).
  • removeStopCodon: Executable C++ file and source file (.cpp) to remove the stop codon from the alignments.
  • Clean_Alignment: Executable C++ file and source file (.cpp) to exclude sites with fewer than a defined number of individuals.
  • TLR7_Pmajor_XP_015492521.1: Input file example
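
The two alignment-cleaning steps listed above can be illustrated with a minimal Python sketch. This is not the authors' C++ code: the function names, the set of missing-data symbols, the dictionary input format, and the choice to count sequences rather than diploid individuals are all assumptions made for illustration. The sketch also filters single columns, whereas a tool for coding sequences may well work codon by codon to preserve the reading frame.

```python
# Hypothetical sketch; not the authors' removeStopCodon or Clean_Alignment code.
MISSING = set("Nn-?")                # assumed symbols for missing data
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear stop codons


def strip_terminal_stop(alignment):
    """Trim the final codon from every sequence if any sequence ends in a stop codon."""
    last_codons = {seq[-3:].upper() for seq in alignment.values()}
    if last_codons & STOP_CODONS:
        return {name: seq[:-3] for name, seq in alignment.items()}
    return dict(alignment)


def filter_sites(alignment, min_called):
    """Keep only columns where at least min_called sequences have a called base."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [
        i for i in range(length)
        if sum(alignment[n][i] not in MISSING for n in names) >= min_called
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


# Toy in-frame alignment (made-up sequences, equal length).
example = {"ind1": "ATGAAATAA", "ind2": "ATGNNATAA", "ind3": "ATG-NATAG"}
trimmed = strip_terminal_stop(example)
cleaned = filter_sites(trimmed, min_called=2)
print(cleaned)
```

In this toy case the terminal stop codon is trimmed first, then the two columns called in only one sequence are dropped.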

Scripts and dataset : 

Simulations with SLiM:

  • SliM__immunity.slim: Script used to simulate sequences under balancing selection using codon format and exon-intron format  

Formatting dataset:

  • out_Seq_stat_*: Output of seq_stat_coding for our different gene categories and selection pressures.
  • Formating_dataframe_script: Script to calculate Pn/Ps ratio and make a dataframe ready to plot.
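
As a rough illustration of the formatting step, the Pn/Ps ratio is the non-synonymous diversity divided by the synonymous diversity. The sketch below is written in Python and is not the authors' script: the function name and the list-of-dictionaries input are hypothetical, and only the column names (Species, PN, PS, PNPS) are taken from the description of Dataframe_ready_to_plot.csv. Leaving the ratio undefined when Ps is zero is also an assumption.

```python
# Hypothetical sketch; the real Formating_dataframe_script may differ.
def add_pnps(rows):
    """Return copies of the rows with a PNPS column equal to PN / PS."""
    out = []
    for row in rows:
        pn, ps = float(row["PN"]), float(row["PS"])
        new_row = dict(row)
        # Assumption: the ratio is left undefined (None) when PS is zero.
        new_row["PNPS"] = pn / ps if ps > 0 else None
        out.append(new_row)
    return out


# Made-up values, for illustration only.
rows = [
    {"Species": "species_A", "PN": "0.001", "PS": "0.004"},
    {"Species": "species_B", "PN": "0.002", "PS": "0"},
]
table = add_pnps(rows)
for row in table:
    print(row["Species"], row["PNPS"])
```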

Plots and models :

  • Dataframe_ready_to_plot.csv: Output from Formating_dataframe_script. Table containing : Species, PNPS, PN, PS, D_Taj, GC, S, category, Origin, family and selection regime.
  • Plot_and_models_script: Script to plot the results and fit the different models.
  • data_from_Leroy_etal_2021.csv: Information and statistics about the dataset from Leroy et al., 2021.
  • Mitochondrial_phylogeny.treefile: Phylogeny based on mitochondrial genes of species from the dataset reconstructed by maximum likelihood method (IQTREE model GTR+Gamma and ultrafast bootstrap). 

Simulation analysis

  • Tab_h_overdominance: Effect of the parameter h (dominance coefficient) on Pn/Ps for sequences simulated under overdominance.
  • Tab_m3_freq_dep: Effect of the parameter S (selection coefficient) on Pn/Ps for sequences simulated under frequency dependence.
  • Tab_Ne_overdominance: Effect of population size on Pn/Ps for sequences simulated under overdominance.
  • Tab_Ne_freq_dep: Effect of population size on Pn/Ps for sequences simulated under frequency dependence.
  • Plot_simulated_results: Script to plot the effect of parameters on Pn/Ps from simulations.

Supplementary table and figures : 

  • Figure S1 : Distribution of the percentage of contaminating contigs. The red line represents the 80% quantile.
  • Figure S2: PCA of Cyanistes species
  • Figure S3: PCA of Cyanomitra and Turdus species
  • Figure S4: PCA of Ploceus species
  • Figure S5: Fis ~ nucleotide diversity
  • Figure S6: Correlation between Pn/Ps (a) and Ps (b) calculated on the control genes in this study's dataset and those calculated by Leroy et al. (2021). 
  • Figure S7 - Missing_data_ps_Phylloscopus.pdf : Relationship between the maximum number of missing individuals allowed and synonymous nucleotide diversity (Ps) in Phylloscopus trochilus and Fringilla coelebs.
  • Figure S8: Effect of sub-sampling size on Pn/Ps
  • Figure S9: Pn/Ps according to Ps for sub-sampled control, TLR and BD genes.
  • Figure S10: Boxplot of Pn/Ps according to population size for sequences simulated under overdominance via SLiM 
  • Figure S11: Boxplot of Pn/Ps according to population size for simulated sequences under frequency dependence via SLiM 
  • Figure S12: Boxplot of Pn/Ps according to a) initial selection coefficient of the mutation under frequency dependence b) dominance coefficient (h) for simulated sequences under overdominance via SLiM 
  • Table S1 - Model comparison using reduce number of families : Model selection for all gene categories using a reduced number of families (we grouped Turdidae within Muscicapidae, Nectariniidae, and Estrildidae within Ploceidae and Fringillidae within Thraupidae). 
  • Table S2 - Lm & PGLS on dPn/Ps : Alternative models (Lm for linear models and PGLS for Phylogenetic Generalized Least Squares) using the difference between Pn/Ps of immunity genes and control genes (Pn/Ps) as the dependent variable, and species origin as the explanatory variable. 
  • Table S3 - Samples & sequencing information : Table with sampling and sequencing information for the samples newly sequenced in this study and those obtained from Leroy et al. 2021.
  • Table S4 - Quality of sequences per individual : Table with sequencing quality information (number of genes analysed, proportion of available positions, depth of coverage) for the samples newly sequenced in this study and those obtained from Leroy et al. 2021.
  • Table S5 to S14 : Model selection by AICc criterion and ANOVA test. Summary of the best models. 
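
The dependent variable described for Table S2, the difference between the Pn/Ps of immunity genes and that of control genes, could be computed per species along the following lines. This is a hedged Python sketch rather than the authors' code: the function name and the category labels ("TLR", "control") are hypothetical, and only the column names (Species, category, PNPS) come from the description of Dataframe_ready_to_plot.csv.

```python
# Hypothetical sketch; category labels and function name are assumptions.
def delta_pnps(rows, immune_category, control_category="control"):
    """Per species, return Pn/Ps of the immune category minus Pn/Ps of the control genes."""
    by_species = {}
    for row in rows:
        by_species.setdefault(row["Species"], {})[row["category"]] = float(row["PNPS"])
    # Species lacking either category are skipped.
    return {
        species: cats[immune_category] - cats[control_category]
        for species, cats in by_species.items()
        if immune_category in cats and control_category in cats
    }


# Made-up values, for illustration only.
rows = [
    {"Species": "species_A", "category": "TLR", "PNPS": "0.30"},
    {"Species": "species_A", "category": "control", "PNPS": "0.20"},
    {"Species": "species_B", "category": "TLR", "PNPS": "0.25"},
]
deltas = delta_pnps(rows, immune_category="TLR")
print(deltas)
```

Here species_B is omitted because it has no control-gene value in the toy data.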

Mitochondrial phylogeny:

  • AllSp_ultrafastaboot.treefile: The species phylogeny was estimated using mitochondrial genes and a maximum likelihood method implemented in IQTREE (model GTR+Gamma and ultrafast bootstrap).
  • AllSp_mito.fst : Alignment of the mitochondrial genes in fasta format.

Table and figure of the main text:

  • Figure 1: Phylogeny based on mitochondrial genes of species from the dataset 
  • Figure 2 : Conceptual diagram showing the expected results 
  • Figure 3 : Boxplot of Pn/Ps according to species origin for different gene categories under purifying selection. 
  • Figure 4 : Boxplot of Pn/Ps according to species origin for different gene categories under purifying selection. 
  • Figure 5 : Effect of mutation type on Pn/Ps according to Ne
  • Figure 6 : Boxplot of Pn/Ps according to species origin (mainland in green and insular in orange) for different gene categories under balancing selection. 
  • Table 1: List of species and sampling localities, along with the type of data obtained and the number of individuals (N). 
  • Table 2 : Statistical model explaining Pn/Ps variation of Toll-Like Receptors, Beta-Defensins genes, and control genes. The p-values of ANOVA test between simpler models are not reported if a more complex model explains a larger proportion of the variance.
  • Table 3 : Summary of the best statistical model, selected using AICc, explaining variation in Pn/Ps in control genes, Toll-Like Receptors and Beta-Defensins genes under purifying selection, with origin and gene category as parameters. * indicates significance: * < 0.05; ** < 0.01; *** < 0.001.
  • Table 4 : Statistical model explaining Pn/Ps variation of genes under balancing selection (i.e., MHC class I and II), and simulated sequences under frequency dependence.