gff3toembl-1.1.0.tar.gz (3.64 MB)
GFF3toEMBL: Preparing annotated assemblies for submission to EMBL
software
posted on 2016-10-06, 06:49; authored by Andrew Page, Sascha Steinbiss, Ben Taylor, Torsten Seemann, Jacqueline A. Keane
GFF3toEMBL has been published in JOSS (Journal of Open Source Software), and this is the version of the code used for the paper.
An essential part of open, reproducible research in genomics is the deposition of annotated de novo assembled genomes in public archives such as EMBL/GenBank. The interfaces provided by the major archives do not make it easy to submit data at scale unless the submitter already has substantial prior knowledge of the process. This has led to a situation where fewer than 15% of all sequenced bacteria have corresponding public assemblies. We address this by providing GFF3toEMBL, which takes the output of the most commonly used automatic annotation tool, Prokka, and converts it to a format suitable for submission to EMBL. Built on the GenomeTools annotation processing library, GFF3toEMBL is robust, fast, memory efficient and well tested, and has been used to submit more than 30% of all annotated genomes in EMBL/GenBank. It is a small but essential missing step in making genomic research more open and reproducible.
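As a rough illustration of the kind of conversion involved, the following Python sketch maps a single Prokka-style GFF3 feature line onto an EMBL feature-table entry. It is not GFF3toEMBL's code or API (the real tool builds on GenomeTools). The function names `parse_gff3_line` and `to_embl_feature` and the choice of qualifiers to keep are hypothetical. The sketch also deliberately omits wrapping at 80 columns, multi-part locations, and the sequence-level header lines that a real EMBL submission requires.

```python
from urllib.parse import unquote

# EMBL feature-table layout: feature key starts at column 6, location and
# qualifiers start at column 22 (so the qualifier prefix is 21 characters).
QUALIFIER_PREFIX = "FT" + " " * 19


def parse_gff3_line(line):
    """Split one GFF3 feature line into its nine tab-separated columns."""
    seqid, source, ftype, start, end, score, strand, phase, attrs = (
        line.rstrip("\n").split("\t")
    )
    attributes = {}
    for pair in attrs.split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attributes[key] = unquote(value)  # GFF3 values may be URL-escaped
    return {
        "seqid": seqid,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def embl_location(start, end, strand):
    """Build a simple EMBL location; minus-strand features use complement()."""
    loc = f"{start}..{end}"
    return f"complement({loc})" if strand == "-" else loc


def to_embl_feature(feature, keep=("locus_tag", "product", "gene")):
    """Render a parsed feature as EMBL 'FT' lines (hypothetical subset of qualifiers)."""
    location = embl_location(feature["start"], feature["end"], feature["strand"])
    lines = [f"FT   {feature['type']:<16}{location}"]
    for key in keep:
        if key in feature["attributes"]:
            # Double quotes inside a qualifier value are escaped by doubling them.
            value = feature["attributes"][key].replace('"', '""')
            lines.append(f'{QUALIFIER_PREFIX}/{key}="{value}"')
    return lines


line = "contig_1\tProkka\tCDS\t100\t400\t.\t-\t0\tID=ABC_00001;locus_tag=ABC_00001;product=hypothetical protein"
for out in to_embl_feature(parse_gff3_line(line)):
    print(out)
```

Keeping parsing and rendering in separate functions mirrors the general shape of such converters: read features from GFF3, then write them in the target layout. A production tool additionally has to validate feature types and qualifiers against the EMBL feature table definition.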