Harvest and phylogenetic network analysis of SARS virus genomes (CoV-1 and CoV-2)
Dataset posted on 09.06.2020, 16:48 by Guido Grimm, David Morrison
Data and analyses relating to:
Grimm GW, Morrison D (2020). Trees and viruses: the SARS group. Genealogical World of Phylogenetic Networks, posted 30/3/2020. http://phylonetworks.blogspot.com/2020/03/trees-and-viruses-sars-group.html
Grimm GW (2020a). Using Median Networks to study SARS-CoV-2. Genealogical World of Phylogenetic Networks, posted 20/4/2020. https://phylonetworks.blogspot.com/2020/04/using-median-networks-to-study-sars-cov.html
The main 7z-archive [7z.org/Wikipedia] includes all data used (non-curated gene bank harvest, curated alignment) and the analysis files in standard phylogenetic data formats (FASTA, NEXUS, NEWICK, Splits-NEXUS). For labelling conventions and archive content, see ReadMe.txt.
New in Version 2: Fully annotated genotype spreadsheet, CoV2Genotyping (XLSX), of the CoV-2 subsample included in our original harvest, tabulating particular and general mutation patterns. The main archive has been updated.
New in Version 3: Archive Hack-and-Fish.7z including analysis files for the experiment described in this post: Grimm GW (2020b). Hack and fish ... for recombination in the SARS group. Genealogical World of Phylogenetic Networks, posted 8/6/2020. https://phylonetworks.blogspot.com/2020/06/hack-and-fish-for-recombination-in-sars.html
MLTreeStrictGrCons—Maximum likelihood tree based on strict group consensus sequences and branch support established via non-parametric bootstrapping
NNetCPlusRecomb—Uncorrected p-distance (Hamming) based planar phylogenetic network based on the (strict) group consensus data. Coloured lines refer to shared sequence patterns as visible from the alignment (likely recombination events)
NNetPlusSupport—Uncorrected p-distance (Hamming) based planar phylogenetic networks based on the non-consensed (original) data (in total, 291 near-complete virus genomes) used to define the major groups for the consensus approach. Bottom-right, bootstrap consensus network for the same data.
MutationPatterns1, ...2—Visualisation of mutation patterns that are either the consequence of homoplasy, i.e. convergent mutation from C to U in independent CoV-2 sublineages, or recombination. See 2nd post (Grimm, 2020a) for further details.
HnS.All.sumCNet, HnS.All.sumBSNet—Consensus networks of nine bit-wise ML trees (...CNet, strict) and the corresponding pooled ML bootstrap pseudoreplicate trees (...BSNet, only splits with a frequency ≥ 20%). See 5th post (Grimm, 2020b) for explanations.
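For readers unfamiliar with the distance measure named in the network descriptions above, the following is a minimal sketch of an uncorrected p-distance (normalised Hamming distance) between aligned sequences. It is not taken from the archive and is not the software used for the analyses. The function names, the toy alignment, and the choice to skip sites with a gap or ambiguity code in either sequence (pairwise deletion) are assumptions made for illustration only.

```python
def p_distance(seq_a, seq_b, skip="-N?"):
    """Uncorrected p-distance: differing sites divided by compared sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption: sites with a gap or missing/ambiguous base in either
        # sequence are ignored for this pair (pairwise deletion).
        if a in skip or b in skip:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(alignment):
    """Pairwise p-distances for a dict mapping sequence label to sequence."""
    names = list(alignment)
    return {
        (x, y): p_distance(alignment[x], alignment[y])
        for x in names
        for y in names
    }


# Toy alignment, invented for this example (not data from the archive).
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAACG-AT",
}
d = distance_matrix(toy)
for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(pair, round(d[pair], 4))
```

A matrix of this kind is the usual input for distance-based network methods; how gaps and ambiguity codes were actually treated in the archived analyses should be checked against ReadMe.txt and the linked posts.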
Referenced and further related posts are linked below.
Comments are welcomed; please use the comment option at our blog Genealogical World of Phylogenetic Networks.