rover_ERV
Data and Jupyter notebooks supporting the manuscript "An endogenous retroviral element co-opts an upstream regulatory sequence to achieve somatic expression and mobility", authored by Natalia Rubanova, Darshika Singh, Louis Barolle, Fabienne Chalvet, Sophie Netter, Mickaël Poidevin, Nicolas Servant, Allison J. Bardin and Katarzyna Siudeja.
The notebooks notebooks/rover_TE_genotyping_ONT.ipynb and notebooks/rover_TE_aging.ipynb contain Python code for post-processing and genotyping the TE calls detected in the ONT samples with tldr [Ewing et al. (2020) Molecular Cell](https://doi.org/10.1016/j.molcel.2020.10.024), and for normalizing raw singleton counts.
The files data/PGFP_refTE_dm6.csv and data/PGFP_Illumina.csv are required to run notebooks/rover_TE_genotyping_ONT.ipynb.
The file data/PGFP_refTE_dm6.csv contains the list of the full-length reference TE insertions in the D. melanogaster ProsGFP genetic background.
The file data/PGFP_Illumina.csv contains the list of the non-reference TE insertions detected in the Illumina DNA-seq samples from [Siudeja et al. (2021) EMBO J](https://doi.org/10.15252/embj.2020106388), using readtagger [Siudeja et al. (2021) EMBO J](https://doi.org/10.15252/embj.2020106388) and ngs_te_mapper2 [Han et al. (2021) Genetics](https://doi.org/10.1093/genetics/iyab113).
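Before opening the genotyping notebook, it can help to confirm that both input files are present. The snippet below is a minimal sketch and not part of the repository: the helper names `missing_inputs` and `preview_csv` are hypothetical, and it assumes the files are comma-delimited with a header row and that the script is run from the repository root. It uses only the Python standard library.

```python
import csv
from pathlib import Path

# Paths as listed in this README, relative to the repository root.
REQUIRED_INPUTS = ("data/PGFP_refTE_dm6.csv", "data/PGFP_Illumina.csv")


def missing_inputs(repo_root="."):
    """Return the required input files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_INPUTS if not (root / rel).is_file()]


def preview_csv(path, n_rows=3):
    """Return the first row (assumed to be a header) and up to n_rows data rows."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        for rel in REQUIRED_INPUTS:
            header, _ = preview_csv(rel)
            print(rel, "columns:", header)
```

The column names printed here are whatever the files actually contain; this sketch makes no assumption about them.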
The repository is also available on GitHub at https://github.com/nrubanova/rover_ERV.