Phylogenetic reconstruction and evolution of the Rab GTPase gene family in Amoebozoa

ABSTRACT Rab GTPase is a paralog-rich gene family that controls the maintenance of the eukaryotic cell compartmentalization system. Diverse eukaryotes have varying numbers of Rab paralogs. Currently, little is known about the evolutionary pattern of Rab GTPase in most major eukaryotic ‘supergroups’. Here, we present a comprehensive phylogenetic reconstruction of the Rab GTPase gene family in the eukaryotic ‘supergroup’ Amoebozoa, a diverse lineage represented by unicellular and multicellular organisms. We demonstrate that Amoebozoa conserved 20 of the 23 ancestral Rab GTPases predicted to be present in the last eukaryotic common ancestor and massively expanded several ‘novel’ in-paralogs. Due to these ‘novel’ in-paralogs, the Rab family composition dramatically varies between the members of Amoebozoa; as a consequence, ‘supergroup’-based studies may significantly change our current understanding of the evolution and diversity of this gene family. The high diversity of the Rab GTPase gene family in Amoebozoa makes this ‘supergroup’ a key lineage to study and advance our knowledge of the evolution of Rab in Eukaryotes.


Introduction
Cell compartmentalization is a crucial characteristic of Eukaryotes, and the Rab GTPase gene family is a central controller of these compartments [1,2]. Rab GTPases comprise a paralog-rich family that regulates all stages of membrane trafficking [1,3,4]. Rab proteins control processes ranging from vesicle budding, cargo sorting, and transport to vesicle tethering and fusion [2,3,5]. Through these processes, Rabs perform several cellular roles, such as maintaining the communication between the cell compartments and membrane, endocytic and exocytic pathways, and intraflagellar transport [1,2].
Amoebozoa are a very diverse eukaryotic 'supergroup' with varied cell forms, life cycles, and ecologies [25]. Currently, the Rab GTPase family has been annotated in members of two of the three major lineages of amoebozoans: Evosea, represented by Dictyostelium discoideum with around 56 annotated Rabs, Mastigamoeba balamuthi with around 25 Rabs, and Entamoeba histolytica with over 90 Rabs; and Discosea, represented by Acanthamoeba castellanii with 93 Rabs [23,26]. No Tubulinea has been sampled for Rabs. Thus, the investigation of the repertoire of the Rab GTPase gene family in Amoebozoa has been restricted to a few species. Recently, several deeply sequenced transcriptomes of amoebozoans have been generated [25,27], enabling a broader study of the Rab GTPases in this eukaryotic 'supergroup'.
Here we present a comprehensive phylogenetic study of the Rab GTPase family in Amoebozoa. We considered genomes and deeply sequenced transcriptomes of 44 Amoebozoa lineages and a comprehensive eukaryotic Rab GTPase dataset previously available [23]; we also included representatives of breviates and apusomonads, two lineages that with Opisthokonta represent Obazoa, the sister group of Amoebozoa. We focused on a broad perspective of amoebozoan diversity, aiming to identify the general pattern of evolution of robust Rab GTPase subfamilies in a eukaryotic 'supergroup', rather than a comprehensive identification and annotation of all Rabs in all Amoebozoa. Our phylogenetic reconstruction put in an evolutionary perspective the Rabs previously annotated in the genomes of some amoebozoans and the new paralogs identified in the available transcriptomic data, comparing with the paralogs present in diverse eukaryotic lineages. We demonstrate that the three major lineages of Amoebozoa conserve most of the ancestral paralogs present in the Last Eukaryotic Common Ancestor (LECA) and have undergone a massive expansion of the Rab GTPase gene family through the origin of 'novel' in-paralogs (i.e., new paralogs of a given Rab subfamily originated through gene duplication of ancestral paralogs). By sampling several flagellated amoebozoans, we identified one ancestral paralog that has not been previously found in Amoebozoa. Our study demonstrates that no single amoebozoan lineage represents the diversity of Rab GTPase in Amoebozoa and corroborates that sampling diverse eukaryotic lineages in a 'supergroup' perspective may significantly improve our knowledge of the Rab GTPase gene family diversity and evolution.

Results and discussion
We considered a dataset of 44 Amoebozoa lineages, Pygsuia biforma (breviates), and Thecamonas trahens (apusomonads) composed of genomes and deeply sequenced transcriptomes (Supplementary Table 1). The larger number of Evosea representatives is due to the availability of several genomes for this major group. We also included several flagellated evosean species in our analysis, since a Rab paralog involved with the flagellum (IFT27/RabL4) had not been previously identified in Amoebozoa. Additionally, we considered P. biforma and T. trahens, which belong to Obazoa, the sister group of Amoebozoa, and had not been sampled for Rabs.
We identified Rab sequences from the genomes and transcriptomes of amoebozoans, P. biforma, and T. trahens through similarity search (BLAST). For that, we compiled a comprehensive dataset of Rab sequences to serve as our query dataset. We initially considered as potential Rabs the sequences of amoebozoans, P. biforma, and T. trahens that were significantly similar to the sequences of the query Rab dataset (BlastP E-value ≤ 1e-4). The BLAST similarity searches did not enable us to easily assign several Rabs of Amoebozoa, P. biforma, and T. trahens to one of the ancestral Rab subfamilies predicted to be present in the LECA, or to identify sequences that represent other families of the Ras superfamily; this was expected given the high diversity and divergence of some Rab paralogs and the sequence similarity between Rab and other members of the Ras superfamily [28][29][30]. Thus, we further analysed the sequences identified by BLAST through phylogenetic reconstructions (not shown) and excluded non-Rab sequences (i.e., sequences representing other Ras subfamilies) to create a curated amoebozoan, breviate, and apusomonad Rab dataset.
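The initial similarity filter described above can be illustrated with a minimal sketch. It assumes the BlastP results were written in the BLAST+ tabular format (-outfmt 6) with the 12 default columns, which the text does not state; the function name and the sequence identifiers are hypothetical and used only for illustration.

```python
import csv
import io

EVALUE_CUTOFF = 1e-4  # significance threshold given in the text


def candidate_rab_hits(blast_tsv, cutoff=EVALUE_CUTOFF):
    """Return subject IDs with at least one hit at or below the E-value cutoff.

    Assumption: BLAST+ tabular output (-outfmt 6, default columns), where
    column 2 is the subject ID and column 11 is the E-value.
    """
    keep = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, evalue = row[1], float(row[10])
        if evalue <= cutoff:
            keep.add(subject_id)
    return keep


# Hypothetical hits: one strong match and one weak match.
example = "\n".join([
    "\t".join(["queryRab1", "amoebaSeqA", "62.5", "160", "60", "0",
               "1", "160", "5", "164", "3e-50", "180"]),
    "\t".join(["queryRab7", "amoebaSeqB", "28.0", "90", "65", "3",
               "10", "99", "20", "109", "0.5", "30.1"]),
])
print(sorted(candidate_rab_hits(example)))
```

Sequences that pass this filter are only candidates; as described above, non-Rab members of the Ras superfamily still have to be excluded by phylogenetic analysis.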
We performed multiple phylogenetic reconstructions to assign the amoebozoan, breviate, and apusomonad Rab sequences to the Rab GTPase subfamilies predicted to be present in LECA (Supplementary Figure 1A-B). First, we generated a master phylogenetic reconstruction considering the curated Rab dataset of the 44 Amoebozoa species, P. biforma, and T. trahens (Supplementary Table 2), together with the dataset curated by [23] (Supplementary Figure 1A and Supplementary Figure 2; all the sequences considered in the present study are available in FASTA format as Supplementary Material 1). Although the master phylogeny has several regions with low resolution, especially at deep branching levels, it enabled us to recover and identify highly supported clades (i.e., ultrafast bootstrap branch support ≥95%, as suggested by the IQ-TREE documentation; Supplementary Figure 1C) of most Rab subfamilies present in Amoebozoa (Supplementary Figure 2). Interestingly, six of the seven Rab subfamilies recovered in clades of lower support (ultrafast bootstrap branch support between 80% and 94%) are those that expanded in Amoebozoa and have several 'novel' in-paralogs, as shown below. Regions of low resolution have been consistently identified as a characteristic of the phylogenetic reconstruction of the Rab GTPase family, given the evolutionary complexity of this gene family [23,24]. To further analyse specific subfamilies and the 'novel' in-paralogs that compose the amoebozoan Rab repertoire, we generated multiple phylogenetic reconstructions considering subsets of our master reconstruction (Supplementary Figure 1B and Supplementary Figures 3-9).
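The support thresholds used above can be expressed as a small helper. This is only a sketch: the thresholds (≥95% and 80-94%) come from the text, while the function name, the label for values below 80%, and the example clade supports are assumptions made for illustration.

```python
def support_class(ufboot):
    """Classify an ultrafast bootstrap value (percent) with the thresholds in the text."""
    if ufboot >= 95:
        return "high"    # highly supported clade
    if ufboot >= 80:
        return "lower"   # clade of lower support (80-94%)
    return "below_80"    # label chosen here; the text does not name this class


# Hypothetical support values for three clades.
example_supports = {"cladeA": 99, "cladeB": 87, "cladeC": 54}
for clade, value in example_supports.items():
    print(clade, value, support_class(value))
```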
We applied the Rabifier automated annotation to cross-validate the assignments of Rab sequences to Rab subfamilies made based on the phylogenetic reconstructions (Supplementary Table 2), enabling us to unambiguously identify the Rab subfamilies that were conserved in the last amoebozoan common ancestor and the extant amoebozoans, as well as the 'novel' Rab in-paralogs that appeared during the evolution of Amoebozoa.

Amoebozoa conserves most of the Rab GTPases subfamilies present in LECA
Amoebozoa conserves 20 Rab subfamilies of the 23 predicted to be present in LECA (Figure 1). We identified these 20 subfamilies in all major groups of Amoebozoa, except for IFT27 (RabL4), RTW (RabL2), and Rab23, which we found exclusively in the major group Evosea, and Rab34, which was identified in only a few of the amoebozoans sampled (Figures 2 and 3). By sampling several flagellated amoebozoans, we identified for the first time the paralog IFT27 (RabL4) in Amoebozoa, a paralog known to be involved with intraflagellar transport in diverse Eukaryotes [31]. These findings demonstrate that most of the Rab GTPase paralogs present in LECA have been conserved in the last amoebozoan common ancestor (LACA) and are present in extant Amoebozoa lineages (Figure 3).
The 20 paralogs predicted to be present in LACA have different conservation and potential loss patterns throughout eukaryotic groups. Several Amorphea (Amoebozoa + Obazoa) have conserved most of these paralogs, except for Pygsuia biforma (breviate), Thecamonas trahens (apusomonads), and Fungi, where there are marked potential losses (Figure 3). Several of these paralogs have not been found in members of the other 'supergroups', such as Excavata, Archaeplastida, and SAR [23,24]. While Rabs 1, 2, 5, 6, 7, 8, 11, and 18 have been conserved in most eukaryotes examined for Rabs, Rabs 4, 14, 20, 21, 23, 24, 32, 34, 50, and Titan have been recurrently lost in diverse lineages of Fungi, SAR, Excavata, and Archaeplastida [24]. Thus, while Fungi, SAR, Excavata, and some Archaeplastida can be characterized by a pattern of massive reduction of these Rabs [23,24], Amorphea (except for Fungi, P. biforma, and T. trahens) show a pattern of conservation of most of the Rab paralogs present in the LECA, including consistent conservation in the three major lineages of Amoebozoa, as shown here (Figure 3).
Potentially three Rab paralogs (Rabs 20, 22, and 28) of the 23 predicted to be present in the LECA were absent in the LACA. These paralogs are absent from the genomes and transcriptomes of the 44 amoebozoans considered in this study. Although most amoebozoan transcriptomes and genomes are incomplete (Supplementary Table 1), and are thus not conclusive about the absence of a given Rab paralog, we currently have no evidence for the presence of Rabs 20, 22, and 28 in any of the three amoebozoan major groups. These three paralogs have been consistently lost in several eukaryotic groups. For example, most Fungi, Excavata, Archaeplastida, and SAR have none of these paralogs [23,24]. Conversely, choanoflagellates, Metazoa, and P. biforma have retained Rabs 20, 22, and 28 (Supplementary Figure 2). We identified Rab28 in the T. trahens genome (Supplementary Figure 2), a paralog also conserved in the kinetoplastids Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major [18,43,44].
[Figure legend, partial] Note that the subfamilies Rab1, 2, 4, and 32 were recovered in lower supported (ultrafast BS<95%) or paraphyletic clades. This observation is consistent with previous studies; Rab1 and Rab2 have been consistently recovered as paraphyletic or lower supported clades due to Rab8 and Rabs 4/14, respectively [23,24]. The Rab32 subfamily is recovered as a paraphyletic clade due to the branching pattern of the Entamoeba sequences classified as Rab32.
In light of our phylogenetic reconstructions and the previous discussion, we show that the evolution of the Rab family in Amoebozoa is characterized by the conservation of most of the ancestral Rab representatives predicted to be present in LECA. This pattern had been previously identified for the single amoebozoan lineages with genomes available at the time (D. discoideum, A. castellanii, and M. balamuthi); here we show that this conservation of ancestral Rab subfamilies is robustly observed throughout the three Amoebozoa major lineages. This pattern of ancestral Rab conservation in all Amoebozoa major lineages contrasts with that of some other eukaryotic lineages. For example, the red algae (rhodophytes; Archaeplastida 'supergroup') show a pattern of massive loss of ancestral paralogs, retaining only 6 of the 17 Rab paralogs presumably present in the ancestor of Archaeplastida [24]. Furthermore, most of the ancestral Rab subfamilies present in the last common ancestor of Amoebozoa have expanded over the evolution and diversification of Amoebozoa.

Rab GTPase family has expanded in all amoebozoan major lineages
The 20 Rab subfamilies predicted to be present in LACA represent the base for innovations of Rab GTPase in Amoebozoa. From the 'prototypical' sequences of these subfamilies, many 'novel' in-paralogs originated through gene duplication across the evolution of Amoebozoa (for the approach behind the identification of 'novel' in-paralogs, see Supplementary Figure 1D-E). We unambiguously identified 'novel' in-paralogs in seven of the 20 Rab subfamilies that we predicted to compose the last amoebozoan common ancestor; these are Rabs 1, 2, 5, 8, 7, 11, and 32A/B (Supplementary Figures 3-10).

Rab1 expansions
The 'novel' in-paralogs EvoRab1B, EvoRab1C, EvoRab1D, and EvoRabG1/G2, previously annotated in Dictyostelium discoideum, were also identified in other evoseans; while EvoRab1B, EvoRab1C, and EvoRabG1/G2 (duplicated in D. discoideum) are present in diverse Eumycetozoa, EvoRab1D was identified in members of the four groups of Evosea (Cutosea, Archamoebae, Eumycetozoa, and Variosea) (Supplementary Figures 3 and 10). EvoRab1E, a 'novel' in-paralog first identified here, was found exclusively in species of the Variosea (Evosea) group. The 'novel' in-paralogs DdiRab1E, DdiRabA, and DdiRabF1, previously annotated in D. discoideum, were not identified in the other amoebozoan species sampled in the present study (Supplementary Figures 3 and 10). Similarly, the 'novel' in-paralogs MbaRab1C and MbaRab1E were identified exclusively in Mastigamoeba, while EntRab1B was identified exclusively in species of the Entamoeba genus. 'Novel' in-paralogs of the Rab1 subfamily were also identified in Discosea. The in-paralogs DisRab1B, DisRab1D, and DisRab1E, previously annotated in Acanthamoeba castellanii, are also present in other Centramoebida (Discosea) (Supplementary Figures 3 and 10). DisRab1G, also previously known exclusively in A. castellanii, was identified in multiple members of Flabellinia and Centramoebia (Discosea). Conversely, we found the in-paralog AcaRab1F exclusively in A. castellanii (Supplementary Figures 3 and 10).

Rab5 expansions
Amoebozoa have at least eight 'novel' in-paralogs of Rab5 (TubRab5B, EvoRab5B, EvoRab5C, EvoRab5D, AcaRab5B, AcaRab5C, AcaRab5L, and AcaRab5L2; Supplementary Figures 5 and 10). TubRab5B, an in-paralog first identified here, was found exclusively in Tubulinea (Supplementary Figures 5 and 10). Three 'novel' Rab5 in-paralogs characterize the Evosea group: EvoRab5B was identified exclusively in species of the Variosea group, EvoRab5C was identified in members of Eumycetozoa and Variosea, and EvoRab5D was identified in members of Eumycetozoa and Archamoebae (Supplementary Figures 5 and 10). AcaRab5B, AcaRab5C, AcaRab5L, and AcaRab5L2 were found exclusively in A. castellanii. As an exception, AcaRab5L2 was not annotated as a Rab5 by Rabifier but as RabX (Supplementary Table 2), although it branches as a member of the Rab5 clade and may represent a divergent member of this subfamily.
Altogether, the expansions observed in the subfamilies Rab1, Rab2, Rab5, Rab8, Rab7, Rab11, and Rab32A/B account for a total of 83 'novel' Rab in-paralogs currently identified only in Amoebozoa and present in at least one of its major lineages (Supplementary Figures 3-10). Most of these in-paralogs, previously analysed in only a few amoebozoan lineages, are present in several species of Amoebozoa. Based on the pattern of presence observed for these 'novel' in-paralogs among the representatives of Amoebozoa sampled, we can infer in which ancestor each 'novel' in-paralog was already present (Figure 4). The current evidence indicates that independent Rab duplications leading to 'novel' in-paralogs may have occurred early in the evolution of each of the Amoebozoa major groups, for instance, TubRab2B (Tubulinea), EvoRab1D (Evosea), and DisRab1G (Discosea) (Figure 4). Some other in-paralogs may have appeared during the evolution of less inclusive groups, such as EvoRab1B (Eumycetozoa), EvoRab2C (Variosea), EvoRab7B (Dictyostelia: Eumycetozoa), and DisRab7D (Acanthamoebidae: Centramoebia), or even in a single genus, for example, EntRab1B (Entamoeba), DdiRab11B (Dictyostelium), and AcaRab8B (Acanthamoeba) (Figure 4).
[Figure legend, partial] ... [25, 40, 66, and 67]. We named the 'novel' Rab in-paralogs based on the lineages in which they were identified and the subfamily they compose. The Rab subfamilies are indicated by numbers, and the members of the same subfamily are differentiated by letters. TubRab represents in-paralogs identified in multiple Tubulinea lineages, EvoRab represents in-paralogs identified in multiple Evosea lineages, DisRab represents in-paralogs identified in multiple Discosea lineages, and EntRab represents in-paralogs identified in multiple Entamoeba lineages. AmoRab represents in-paralogs identified in at least one member of each of the three amoebozoan major lineages.
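The naming convention for 'novel' in-paralogs described above (lineage prefix, subfamily number, member letter) can be illustrated with a short parser. This is a sketch under stated assumptions: the regular expression and the function name are not part of the study, only the five lineage prefixes defined in the text are mapped, and names that do not follow the simple prefix-number-letter pattern (for example DdiRabA or AcaRab5L2) are deliberately returned as unparsed rather than guessed.

```python
import re

# Lineage prefixes as defined in the naming convention above.
LINEAGE_PREFIXES = {
    "Tub": "Tubulinea",
    "Evo": "Evosea",
    "Dis": "Discosea",
    "Ent": "Entamoeba",
    "Amo": "Amoebozoa (all three major lineages)",
}

# Assumed pattern: three-letter prefix + "Rab" + subfamily number + member letter(s).
NAME_PATTERN = re.compile(r"([A-Z][a-z]{2})Rab(\d+)([A-Z]+)")


def parse_inparalog(name):
    """Split an in-paralog name into (lineage, subfamily, member), or return None."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None  # name does not follow the simple pattern
    prefix, subfamily, member = match.groups()
    lineage = LINEAGE_PREFIXES.get(prefix)
    if lineage is None:
        return None  # prefix not covered by the convention above (e.g. species-level)
    return lineage, int(subfamily), member


for name in ["TubRab2B", "EvoRab1D", "DisRab1G", "AmoRab2AC", "DdiRabA"]:
    print(name, parse_inparalog(name))
```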
Interestingly, our analyses indicate that two 'novel' Rab in-paralogs (AmoRab2AC and AmoRab7B), previously identified in a few species, may have appeared early in the evolution of Amoebozoa and have been conserved in extant members of the three amoebozoan major groups (Figure 4). This finding indicates that LACA had at least 22 Rab paralogs: the 20 that were already present in LECA and 2 (AmoRab2AC and AmoRab7B) identified exclusively in Amoebozoa.
These results demonstrate that Rab GTPases have independently expanded in all amoebozoan major lineages. We highlight the massive expansion of robust subfamilies observed in Evosea and Discosea. However, it is worth noting that Tubulinea, the major group of Amoebozoa with the least evidence of Rab expansions, has no genome available to date. Interestingly, most of the 'novel' in-paralogs exclusive to Amoebozoa are assigned to Rab1, Rab7, Rab11, or Rab32. This finding corroborates the observation of recurrent duplications of specific paralogs in diverse lineages [11,24].
For example, diverse lineages of Archaeplastida have duplicated Rabs 1 and 11 multiple times [24], while several eukaryotic lineages have independently duplicated Rab5 [11]. Recurrent gene expansions in Amoebozoa are not restricted to the Rab GTPase gene family. The genome of D. discoideum is characterized by the presence of ~2770 genes that have originated through recent gene duplications [45]; E. histolytica has several expanded gene families, such as Arf, Rho GTPases, receptor Ser/Thr kinases, and cysteine proteases [46][47][48]; and M. balamuthi has expanded kinase, cathepsin, guanylate cyclase, and cGMP-dependent phosphodiesterase gene families [48]. The patterns and correlation of these gene family expansions are yet to be elucidated through sequencing and analysis of amoebozoan genomes. For instance, analyses of E. histolytica have demonstrated a link between the expansions of some families (e.g., Hsp70) and transposable elements, while tandem duplication, local inversion, and interchromosomal exchange account for most of the gene duplications identified in D. discoideum [45,47].
Among these 'novel' in-paralogs, several seem highly divergent from their 'prototypical' in-paralogs, based on their relatively longer branches and distribution pattern in our phylogenetic reconstructions. As proposed by other authors [e.g., 24], relatively more divergent Rab in-paralogs may suggest the occurrence of neofunctionalization. Accordingly, studies have demonstrated some roles of Rabs that are characteristic of Amoebozoa [49,50]. For instance, EntRab11B, a 'novel' in-paralog of the Rab11 subfamily identified in all Entamoeba species considered, is involved in the secretion of cysteine proteases in E. histolytica and has a role in the pathogenicity of this species [49]. Interestingly, even some 'prototypical' in-paralogs (i.e., conserved amoebozoan Rab in-paralogs that represent orthologs shared by diverse eukaryotic lineages) have unique cellular roles in Amoebozoa; for example, the 'prototypical' EntRab11A, another member of the Rab11 subfamily present in Entamoeba, may be involved in the encystation process of these organisms [51], while the 'prototypical' in-paralogs Rab7A and Rab5 of E. histolytica are involved with the function and biogenesis of the prephagosomal vacuole, a cellular structure characteristic of this species [52,53]. It is worth noting the massive expansion of the Rab7 subfamily in Amoebozoa, a subfamily involved with phagocytosis [53][54][55], which raises the question of whether this expansion can be linked to a diversification of specialized phagocytosis in Amoebozoa. Thus, the diversity of Rabs identified in Amoebozoa, given the conservation and expansion of many Rab subfamilies, may underlie functional innovations of this gene family in Amoebozoa that can be elucidated by further studies of Rab functions in these organisms.

'Orphan' in-paralogs
In addition to the 'novel' in-paralogs assigned to one of the Rab subfamilies, several Rab GTPases of Amoebozoa are highly divergent and cannot be assigned to a Rab subfamily. These in-paralogs do not consistently branch as members of one of the Rab clades analysed and, accordingly, are annotated as RabX by Rabifier (Supplementary Table 2); while some of the RabX identified in Entamoeba are shared between different species of this genus, most of the RabX paralogs identified in amoebozoans are exclusive to single species and are not present in the other amoebozoans and eukaryotes considered in this study. These observations corroborate the notion that the high diversity of the Rab GTPase gene family in Amoebozoa impairs the unambiguous assignment of various Rab paralogs to a given subfamily, or even the identification of the complete Rab repertoire of a given lineage, as noted by some previous studies [28,29]. Moreover, the vast repertoire of 'orphan' Rab in-paralogs present in Amoebozoa may reflect extensive functional innovation and pseudogene origination of Rabs in this diverse eukaryotic group.
The quantitative disparity between the RabX repertoires identified in lineages represented by sequenced genomes and those identified in lineages represented by transcriptomes demonstrates the relevance of genomes for comprehensively assessing the diversity of the Rab GTPase gene family. The abundance of divergent RabX is not exclusive to amoebozoans; some other groups deeply studied for Rab GTPases have diverse repertoires of divergent Rabs, for instance, Trichomonas vaginalis, which has at least 51 divergent Rabs, and Tetrahymena thermophila, which has at least 42 divergent RabX [13,22]. It is worth noting the diversity and divergence of the Rabs that compose the Rab repertoire of the parasitic amoeba Entamoeba histolytica (Supplementary Figure 2; [26,29]). Besides having a vast number of Rab in-paralogs, most of them currently assigned as RabX, even the in-paralogs successfully assigned to one of the known Rab subfamilies (e.g., E. histolytica Rabs 1 and 32A/B) seem to be highly divergent sequences, based on their relatively longer and divergent branches (Figure 1 and Supplementary Figure 2). Large Rab families have been reported for other Eukaryotes [28], including parasitic lineages [21,22]. Beyond underlying the diversification of eukaryotic cells, large and diverging Rab GTPase repertoires account for the potential of targeting Rabs to treat diseases caused by parasitic organisms, such as the parasitic amoeba E. histolytica [22,50,56].

Conclusions
Here, we present a comprehensive phylogenetic reconstruction and annotation of the Rab GTPase gene family in the 'supergroup' Amoebozoa. We demonstrate both the conservation of ancestral Rab paralogs in the extant representatives of Amoebozoa and the independent origin of 'novel' in-paralogs that occurred early in the evolution of Amoebozoa and in its three major lineages. From an amoebozoan ancestor with at least 22 Rab paralogs, each Amoebozoa major lineage diverged with different 'novel' in-paralogs. Several paralogs may even be restricted to less inclusive lineages (i.e., species, genus, or family). Our findings highlight that while key model organisms are useful as a starting point for understanding biological phenomena, taking phylogenetic diversity into account is crucial. We also identified a consistently higher diversity of Rabs in lineages represented by genomes, which suggests that the full repertoire of the Rab GTPase gene family will only be revealed once more genomes become available, not only in Amoebozoa but also in other eukaryotic groups. Thus, the diversity and evolution of the Rab GTPases are still underrepresented. The high diversity and evolutionary pattern of Rab in Amoebozoa provide a robust base for future studies aiming to reveal the structure, biochemistry, cellular role, and functional innovations of this gene family that may be responsible for part of the diversity of Amoebozoa. Furthermore, the diversity of the Rab repertoire identified in Amoebozoa highlights the potential to target Rabs in therapeutic interventions against parasitic amoebozoans. Finally, Amoebozoa represents a fruitful lineage to further advance the current understanding of the Rab GTPase gene family, taking advantage of the robust body of knowledge available about the diversity and evolution of this 'supergroup'.

Amoebozoa genomes and transcriptomes dataset
We considered a dataset of genomes and deeply sequenced transcriptomes of 44 Amoebozoa lineages (Supplementary Table 1). These lineages compose the nine subclades and the three major lineages of Amoebozoa (Supplementary Table 1; see [25] for a phylogenomic study of Amoebozoa), constituting a representative sampling of this group's diversity. This amoebozoan dataset is the compilation of deeply sequenced transcriptomes generated by two different studies, [25] (BioProject PRJNA380424) and [27] (BioProject PRJNA513164), as well as available genomes of Amoebozoa (Supplementary Table 1). We also included Pygsuia biforma and Thecamonas trahens (Supplementary Table 1) to sample a breviate and an apusomonad, respectively, that together with Opisthokonta form Obazoa, the sister group of Amoebozoa. We performed all the analyses of this study based on amino acid sequences; thus, we predicted the ORFs and obtained the amino acid sequences of all genomes and transcriptomes considered through TransDecoder (https://github.com/TransDecoder/TransDecoder). We assessed the completeness of the genomes and transcriptomes through the BUSCO tool suite, searching for 255 single-copy orthologs expected to be present in eukaryotes [57]; command used: busco -i input_genome_transcriptome.faa -o output_file -m protein -l eukaryota_odb10.
The sequences identified through BlastP composed our preliminary Amoebozoa, P. biforma, and T. trahens Rab GTPase dataset. We combined this dataset with the dataset curated by [23] (see [23], Table S1, for the Rab sequence IDs or accession numbers of 55 different eukaryotic lineages) to build our preliminary master dataset. To curate our preliminary dataset, we performed multiple sequence alignments and phylogenetic tree inference (not shown) using Ran sequences as our outgroup. This approach enabled us to select the sequences representing Rab GTPase members from the sequences identified through BlastP, since the similarity search identified both Rab sequences and sequences that represent members of the other families of the Ras superfamily [see [30] for a description of the Ras superfamily and its members' relationships]. We also analysed these phylogenetic trees to identify artifactual duplication patterns of the same protein in each lineage. The artifactual pattern consists of a single lineage presenting several slightly different sequences of a single protein (probably due to assembly artefacts or alternative RNA splicing). We manually curated our Amoebozoa, P. biforma, and T. trahens Rab GTPase dataset, excluding the artifactual duplications identified. Finally, we obtained the master Rab GTPase dataset used in the present study, composed of 2,998 sequences (Supplementary Material 1), including the 44 Amoebozoa species, P. biforma, T. trahens, and the dataset curated by [23]. To cross-validate our identification and assignments of Rab sequences, we classified the Rab sequences of Amoebozoa, P. biforma, and T. trahens with Rabifier2 [59].
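The artifactual duplication pattern described above (several slightly different sequences of a single protein within one lineage) was curated manually from the phylogenetic trees. As a complementary illustration only, the sketch below flags near-identical sequence pairs within each lineage using a simple similarity ratio from the Python standard library. This is a swapped-in heuristic, not the procedure used in the study; the 0.95 threshold, the function name, and the example sequences are all assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def flag_near_duplicates(sequences_by_lineage, threshold=0.95):
    """Return (lineage, id_a, id_b, ratio) for sequence pairs at or above the threshold.

    Only sequences from the same lineage are compared. The threshold is an
    arbitrary value chosen for illustration.
    """
    flagged = []
    for lineage, seqs in sequences_by_lineage.items():
        for id_a, id_b in combinations(sorted(seqs), 2):
            ratio = SequenceMatcher(None, seqs[id_a], seqs[id_b], autojunk=False).ratio()
            if ratio >= threshold:
                flagged.append((lineage, id_a, id_b, ratio))
    return flagged


# Hypothetical sequences: seq1 and seq2 differ by a single residue.
example = {
    "lineageA": {
        "seq1": "MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYIST",
        "seq2": "MNPEYDYLFKLLLIGDSGVAKSCLLLRFADDTYTESYIST",
        "seq3": "MSAPAAGKRVKLVVLGDSGVGKSSLILRYTQGKFNEDYQA",
    },
    "lineageB": {
        "seq4": "MNPEYDYLFKLLLIGDSGVGKSCLLLRFADDTYTESYIST",
    },
}
for hit in flag_near_duplicates(example):
    print(hit)
```

Pairs flagged this way would still need manual inspection, since genuine recent in-paralogs can also be highly similar.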

Phylogenetic reconstructions
We performed the phylogenetic reconstructions of the Rab GTPases based on multiple sequence alignments and phylogenetic tree inference by maximum likelihood. We performed the multiple sequence alignments with MAFFT [60]; command used: mafft input > output. We performed automated alignment trimming with trimAl [61] to remove poorly conserved N- and C-terminal regions and highly variable internal regions from the final alignments; command used: trimal -in input -out output -gt 0.75. We obtained trimmed Rab alignments composed of ~150 amino acids that we used for the phylogenetic analyses. We inferred all the maximum-likelihood trees using ModelFinder [62] and obtained node supports with the ultrafast bootstrap [63], both implemented in the IQ-TREE software [64]; command used: iqtree -s input -m TEST -bb 1000. We considered Ran as an outgroup for the phylogenetic reconstructions that included all Rab subfamilies identified in Amoebozoa. Ran is the Ras superfamily member closest to Rab GTPase and has been proposed as the outgroup for phylogenetic studies of Rab GTPase [23,30]. For each phylogenetic reconstruction focusing on specific Rab subfamilies, we considered the closest Rab members as the outgroup.
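The three commands above can be chained into a single workflow. The following is a minimal sketch: the programs and flags are those given in the text, whereas the function names and the output file naming scheme are hypothetical, and running the pipeline assumes that mafft, trimal, and iqtree are installed and available on the PATH.

```python
import subprocess
from pathlib import Path


def build_commands(fasta, workdir="."):
    """Return the alignment path and the three commands for one FASTA file.

    Output file names are illustrative and are not taken from the study.
    """
    workdir = Path(workdir)
    stem = Path(fasta).stem
    aligned = workdir / f"{stem}.aln.faa"
    trimmed = workdir / f"{stem}.trim.faa"
    mafft = ["mafft", str(fasta)]  # MAFFT writes the alignment to stdout
    trimal = ["trimal", "-in", str(aligned), "-out", str(trimmed), "-gt", "0.75"]
    iqtree = ["iqtree", "-s", str(trimmed), "-m", "TEST", "-bb", "1000"]
    return aligned, [mafft, trimal, iqtree]


def run_pipeline(fasta, workdir="."):
    """Align, trim, and infer a tree; requires the three tools to be installed."""
    aligned, (mafft, trimal, iqtree) = build_commands(fasta, workdir)
    with open(aligned, "w") as handle:
        subprocess.run(mafft, stdout=handle, check=True)  # equivalent to: mafft input > output
    subprocess.run(trimal, check=True)
    subprocess.run(iqtree, check=True)


# Inspect the commands without running them (hypothetical input file name).
_, commands = build_commands("rab.faa")
for command in commands:
    print(" ".join(command))
```

Keeping command construction separate from execution makes it possible to review the exact calls before any of the external tools are run.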

Disclosure statement
The authors declare no competing or financial interests.

Authors' contributions
ALPS conceived of the study, designed the study, carried out dataset curation, carried out the analyses, and drafted the manuscript; AT designed the study, helped carry out dataset curation and analyses, and critically revised the manuscript; MB designed the study, helped carry out dataset curation, and critically revised the manuscript; DL conceived of the study, designed the study, coordinated the study, participated in data analysis, and helped draft the manuscript. All authors gave final approval for publication and agree to be held accountable for the work performed therein.