Annotation of Acrocephalus scirpaceus genome
Dataset posted on 15.09.2021, 13:50, authored by Camilla Lo Cascio Sætre, Fabrice Eroukhmanoff, Katja Rönkä, Edward Kluen, Rose Thorogood, James Torrance, Alan Tracey, William Chow, Sarah Pelan, Kerstin Howe, Kjetill S Jakobsen, Ole K. Tørresen
We used Mash v. 2.3 to estimate a distance matrix from the softmasked genome assembly of the reed warbler (Acrocephalus scirpaceus) and the softmasked genome assemblies of nine bird species downloaded from NCBI. The distance matrix was converted into a full matrix and used as input to RapidNJ v. 2.3.2 to create a guide tree with the neighbour-joining method. Cactus v. 1.3.0 was then run with the guide tree and the softmasked genome assemblies as input. We ran Comparative Annotation Toolkit (CAT) v. 2.2.1-36-gfc1623d on the hierarchical alignment format (HAL) file from Cactus, with chicken (Gallus gallus) as the reference genome, reed warbler as the target genome and the AUGUSTUS species parameter set to ‘chicken’. InterProScan v. 5.34-73 was run on the predicted proteins to find functional annotations, and DIAMOND v. 2.0.7 was used to compare the predicted proteins against UniProtKB/Swiss-Prot release 2021_03. AGAT v. 0.5.3 was used to generate statistics from the annotated GFF3 file and to add functional annotations from InterProScan and gene names from UniProtKB/Swiss-Prot. We predicted 14,645 protein-coding genes.
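The description does not say how the distance matrix was converted into a full matrix, so the following Python sketch is only one possible way to do that step. It assumes a lower-triangular, PHYLIP-like input: a first line with the number of genomes, then one line per genome holding its name followed by the distances to all previously listed genomes. The function names (lower_triangle_to_full, format_full_matrix) and the species labels and distances in the example are hypothetical, not taken from the dataset, and names are assumed to contain no whitespace.

```python
def lower_triangle_to_full(text):
    """Parse a lower-triangular distance matrix and return (names, full_matrix).

    Assumed layout (an assumption, not confirmed by the dataset description):
    line 1 holds the number of genomes; line i+1 holds a name followed by
    the i-1 distances to the genomes listed before it.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0].strip())
    if len(lines) - 1 < n:
        raise ValueError("fewer rows than the declared number of genomes")

    names = []
    full = [[0.0] * n for _ in range(n)]
    for i, line in enumerate(lines[1:n + 1]):
        fields = line.split()
        names.append(fields[0])
        distances = [float(x) for x in fields[1:]]
        if len(distances) != i:
            raise ValueError(f"row {i} should hold {i} distances")
        for j, d in enumerate(distances):
            # Mirror each value across the diagonal to make the matrix symmetric.
            full[i][j] = d
            full[j][i] = d
    return names, full


def format_full_matrix(names, full):
    """Write a square matrix in a PHYLIP-like layout: count, then name + row."""
    out = [str(len(names))]
    for name, row in zip(names, full):
        out.append(name + "\t" + "\t".join(f"{d:.6f}" for d in row))
    return "\n".join(out) + "\n"


# Illustrative input only: made-up labels and distances.
example = "3\nreed_warbler\nchicken\t0.25\nzebra_finch\t0.10\t0.24\n"
names, full = lower_triangle_to_full(example)
print(format_full_matrix(names, full), end="")
```

A square, symmetric matrix with a zero diagonal is the usual form expected by neighbour-joining tools that read distance matrices; whether a given tool accepts tabs or needs fixed-width names should be checked against its own documentation.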