Harvest and phylogenetic network analysis of SARS virus genomes (CoV-1 and CoV-2)

Version 3 2020-06-09, 16:48

Version 2 2020-04-17, 11:29

Version 1 2020-03-30, 12:58

dataset

posted on 2020-06-09, 16:48 authored by Guido Grimm, David Morrison

Data and analyses relating to:

Grimm GW, Morrison D (2020). Trees and viruses: the SARS group. Genealogical World of Phylogenetic Networks, posted 30/3/2020. http://phylonetworks.blogspot.com/2020/03/trees-and-viruses-sars-group.html

Grimm GW (2020a). Using Median Networks to study SARS-CoV-2. Genealogical World of Phylogenetic Networks, posted 20/4/2020. https://phylonetworks.blogspot.com/2020/04/using-median-networks-to-study-sars-cov.html

Comment to Forster et al. (2020), PNAS, doi: 10.1073/pnas.2004999117

The main 7z-archive [7z.org/Wikipedia] includes all data used (non-curated gene bank harvest, curated alignment) and the analysis files in standard phylogenetic data formats (FASTA, NEXUS, NEWICK, Splits-NEXUS). For labelling conventions and archive content, see ReadMe.txt.

New in Version 2: Fully annotated genotype spreadsheet, CoV2Genotyping (XLSX), of the CoV-2 subsample included in our original harvest, tabulating particular and general mutation patterns. The main archive has been updated.

New in Version 3: Archive Hack-and-Fish.7z including analysis files for the experiment described in this post:

Grimm GW (2020b). Hack and fish ... for recombination in the SARS group. Genealogical World of Phylogenetic Networks, posted 8/6/2020. https://phylonetworks.blogspot.com/2020/06/hack-and-fish-for-recombination-in-sars.html

Figures

MLTreeStrictGrCons—Maximum likelihood tree based on strict group consensus sequences and branch support established via non-parametric bootstrapping

NNetCPlusRecomb—Uncorrected p-distance (Hamming) based planar phylogenetic network based on the (strict) group consensus data. Coloured lines refer to shared sequence patterns as visible from the alignment (likely recombination events)
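
For readers unfamiliar with the distance measure named in the network captions, the following is a minimal sketch of an uncorrected p-distance (the proportion of differing sites) between two aligned sequences. The function name `p_distance` and the choice to skip sites with gaps or ambiguity codes (pairwise deletion) are assumptions made for illustration; the software used for the archived analyses may treat such sites differently.

```python
def p_distance(seq_a, seq_b, valid="ACGTU"):
    """Uncorrected p-distance: differing sites / compared sites.

    Assumption: sites where either sequence has a gap or an
    ambiguity code are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")

    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # skip gaps, N, and other ambiguity codes
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy example with made-up sequences (not taken from the archive).
print(p_distance("ACGTACGT", "ACGTACGA"))
```

A matrix of such pairwise values is the usual input for distance-based network methods; no model of substitution is applied, which is why the distance is called uncorrected.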

NNetPlusSupport—Uncorrected p-distance (Hamming) based planar phylogenetic networks based on the non-consensed (original) data (in total, 291 near-complete virus genomes) used to define the major groups for the consensus approach. Bottom-right, bootstrap consensus network for the same data.

MutationPatterns1, ...2—Visualisation of mutation patterns that are either the consequence of homoplasy, i.e. convergent mutation from C to U in independent CoV-2 sublineages, or recombination. See 2nd post (Grimm, 2020a) for further details.

HnS.All.sumCNet, HnS.All.sumBSNet—Consensus networks of nine bit-wise ML trees (...CNet, strict) and the corresponding pooled ML bootstrap pseudoreplicate trees (...BSNet, only splits with a frequency ≥ 20%). See 5th post (Grimm, 2020b) for explanations.
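
The frequency cut-off mentioned for the bootstrap consensus network can be illustrated with a small sketch. It assumes each tree has already been reduced to its set of splits (bipartitions), each written as a `frozenset` of the taxon labels on one side; the function names, the toy taxa, and this representation are hypothetical and are not taken from the archived analysis files.

```python
from collections import Counter


def split_frequencies(trees):
    """Return the fraction of trees in which each split occurs."""
    if not trees:
        return {}
    counts = Counter()
    for splits in trees:
        counts.update(set(splits))  # count each split once per tree
    return {split: n / len(trees) for split, n in counts.items()}


def filter_splits(trees, threshold=0.2):
    """Keep only splits whose frequency is at least `threshold`."""
    return {
        split: freq
        for split, freq in split_frequencies(trees).items()
        if freq >= threshold
    }


# Toy data: ten pseudoreplicate trees over made-up taxa A..F.
ab, cd, ef = frozenset("AB"), frozenset("CD"), frozenset("EF")
trees = [[ab] for _ in range(7)] + [[ab, cd], [ab, cd], [ab, ef]]

kept = filter_splits(trees, threshold=0.2)
```

In this toy sample the split `{A, B}` occurs in all ten trees and `{C, D}` in two of ten, so both pass the 20% threshold, while `{E, F}` (one of ten) is dropped. A consensus network then displays the retained splits, including any that conflict with each other.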

Referenced and further related posts are linked below.

Comments are welcomed; please use the comment option at our blog Genealogical World of Phylogenetic Networks.