Custom R scripts and a data table of recombining gene assignments.
This repository contains files that complement the work by Pfeifer and Rocha on the recombination and conversion of phages, plasmids and phage-plasmids (preprint available at https://doi.org/10.1101/2023.08.08.552325).
The following files are included:
A table (.zip) with the assignments of recombining genes between different types of mobile genetic elements. Recombining genes were defined as highly similar genes (>80% identity, >80% sequence coverage) shared between otherwise dissimilar elements (gene repertoire relatedness, wGRR < 0.1; no more than 25 highly related genes between the two elements). The table lists recombining genes (assignments: "From-To") among 3585 phages, 20274 plasmids, and 1416 phage-plasmids. The IDs of genes, proteins, and genomes were sourced from the NCBI database. Protein percentage identity (pident), E-value, and bitscore were computed using MMseqs2 (see Methods in the preprint), while alignment fractions (sequence coverage; coverage_from, coverage_to) were computed by dividing the alignment length by the sequence length.
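The selection criteria above can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from the repository's R scripts. The names pident, coverage_from and coverage_to follow the table description; genome_from, genome_to, the wgrr lookup and the function names are hypothetical. The sketch assumes that pident is given in percent, that coverage is a fraction that must exceed the threshold on both sequences, and that the limit of 25 genes applies to the number of highly similar hits per pair of elements. The input records are toy values, not data from the table.

```python
from collections import defaultdict


def coverage(alignment_length, sequence_length):
    """Alignment fraction: alignment length divided by sequence length."""
    return alignment_length / sequence_length


def recombining_hits(hits, wgrr, max_wgrr=0.1, min_pident=80.0,
                     min_cov=0.8, max_genes=25):
    """Keep highly similar gene hits between otherwise dissimilar elements.

    hits: list of dicts with the keys genome_from, genome_to, pident,
          coverage_from, coverage_to (genome_* keys are hypothetical).
    wgrr: dict mapping (genome_from, genome_to) to the wGRR of that pair.
    """
    # Step 1: keep hits with high identity and high coverage on both sides.
    similar = [
        h for h in hits
        if h["pident"] > min_pident
        and h["coverage_from"] > min_cov
        and h["coverage_to"] > min_cov
    ]

    # Step 2: group the remaining hits by pair of elements.
    per_pair = defaultdict(list)
    for h in similar:
        per_pair[(h["genome_from"], h["genome_to"])].append(h)

    # Step 3: keep only pairs that are dissimilar overall (low wGRR)
    # and share no more than max_genes highly similar genes.
    kept = []
    for pair, pair_hits in per_pair.items():
        if wgrr[pair] < max_wgrr and len(pair_hits) <= max_genes:
            kept.extend(pair_hits)
    return kept


# Toy input (illustrative values only).
hits = [
    {"genome_from": "phage_A", "genome_to": "plasmid_B",
     "pident": 95.0, "coverage_from": 0.97, "coverage_to": 0.92},
    {"genome_from": "phage_A", "genome_to": "plasmid_B",
     "pident": 60.0, "coverage_from": 0.99, "coverage_to": 0.99},
    {"genome_from": "phage_A", "genome_to": "phage_C",
     "pident": 99.0, "coverage_from": 1.0, "coverage_to": 1.0},
]
wgrr = {("phage_A", "plasmid_B"): 0.03, ("phage_A", "phage_C"): 0.85}

kept = recombining_hits(hits, wgrr)
for h in kept:
    print(h["genome_from"], "->", h["genome_to"], h["pident"])
```

In this toy input, the second hit fails the identity threshold and the third belongs to a pair of elements with a high wGRR, so only the first hit is retained.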
Custom R scripts for computing wGRR (which may be memory-intensive), clustering genes by single linkage, quantifying gene flow, and running enrichment tests (including Fisher tests).
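As a rough illustration of single-linkage gene clustering, the sketch below groups genes into clusters whenever they are connected by a chain of similar pairs, which is equivalent to finding the connected components of the similarity graph. It is a Python sketch using a union-find structure, not the repository's R implementation; the function name, gene IDs and input pairs are hypothetical, and the pairs are assumed to have already passed a similarity threshold.

```python
def single_linkage_clusters(genes, similar_pairs):
    """Cluster genes by single linkage.

    genes: iterable of gene IDs.
    similar_pairs: iterable of (gene_a, gene_b) pairs that are considered
                   similar (assumed to be filtered beforehand).
    Returns a sorted list of sorted clusters.
    """
    # Each gene starts as the root of its own cluster.
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the root, shortening the path on the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Merge the clusters of every similar pair.
    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the genes under their cluster root.
    clusters = {}
    for g in parent:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Toy input (illustrative IDs only).
genes = ["g1", "g2", "g3", "g4", "g5"]
similar_pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]

clusters = single_linkage_clusters(genes, similar_pairs)
print(clusters)
```

Here g1 and g3 end up in the same cluster although they are not directly paired, because both are linked to g2. This chaining behavior is the defining property of single linkage.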
Please note that a corrected version of the wGRR script is provided (wGRR_MGE_v2). The original version was missing a pipe (%>%) on line 57; this has been fixed.