Supporting data and files for, "Distinct evolutionary dynamics of horizontal gene transfer in drug resistant and virulent clones of Klebsiella pneumoniae."

Published on 2018-09-03T04:38:22Z (GMT) by KELLY WYRES
<div>This collection contains supporting data files and code for the comparative analysis of 28 distinct K. pneumoniae clones described in "Distinct evolutionary dynamics of horizontal gene transfer in drug resistant and virulent clones of Klebsiella pneumoniae."</div><div><br></div><div><b>CONTENTS:</b></div>
<div>1. Assemblies directory: 1092 genome assemblies in FASTA format for K. pneumoniae isolates included in the comparative analyses</div>
<div>2. Annotations directory: GFF-formatted annotation files for each of the 1092 genomes (can be used as input for Roary pan-genome analysis)</div>
<div>3. Reference_chromosomes directory: 28 completed chromosomal reference sequences, one for each of the clones</div>
<div>4. Gubbins recombination analysis files directory:</div>
<div>i) 28 pseudo-chromosomal alignments, one for each of the clones (can be used as input for recombination detection and/or phylogenetic analyses).</div>
<div>ii) 28 Gubbins output recombination predictions (.gff, one per clone).</div>
<div>iii) 28 Gubbins per-branch statistics files (one per clone).</div>
<div>iv) parseGubbins2counts.py - Python script for parsing the recombination prediction files and calculating mean numbers of recombination events per window of the chromosome.</div>
<div>5. Pan-genome directory:</div>
<div>i) pan_genome_gene_content_matrix.tsv - tab-delimited matrix with genes in rows and genomes in columns. 1 = gene present. 0 = gene absent.</div>
<div>ii) pan_genome_roary_presence_absence_output.csv - direct output from Roary pan-genome analysis</div>
<div>iii) pan_genome_PCA_coords.tsv - coordinates for the first 40 principal components in tab-delimited format, genomes in rows, coordinates in columns marked Axis1-40.</div>
<div>iv) pan_genome_PCA_coords_clone_centroids.tsv - coordinates for clone centroids in tab-delimited format as above</div>
<div>v) summarisePanGenomeDistancesFromCentroids.py - Python script for calculating individual Euclidean distances to clone centroids. Takes the genome coordinates and clone centroid coordinates files as inputs.</div>
<div>vi) accessory_gene_ancestry_by_clone.tsv - tab-delimited table with clones in rows and genera in columns. Values indicate the proportion of accessory genes from each clone assigned to each genus by Kraken.</div>
<div>6. Phage directory:</div>
<div>i) phage_gene_content_matrix.tsv - phage gene content matrix in tab-delimited format with genomes in rows and phage genes in columns. 1 = gene present. 0 = gene absent.</div>
<div>ii) phage_gene_reference_sequences.fasta - reference FASTA sequences for the phage genes reported in the gene content matrix above.</div>
<div>iii) phagePCA_coords.tsv - coordinates for the first 25 principal components in tab-delimited format, genomes in rows, coordinates in columns marked Axis1-25.</div>
<div>iv) phagePCA_coords_clone_centroids.tsv - coordinates for clone centroids in tab-delimited format as above</div>
<div>7. Defence mechanisms directory:</div>
<div>i) NTUH-K2044_cas_genes.fasta - strain NTUH-K2044 cas gene nucleotide sequences in FASTA format.</div>
<div>ii) INF256_cas_genes.fasta - strain INF256 cas gene nucleotide sequences in FASTA format.</div>
<div>iii) REase_reference_sequences.fasta - reference sequences for type I, II, III and IV REases in FASTA format.</div>
<div>8. CG15 subclades analyses directory:</div>
<div>i) CG15_KL2_subclade_pseudo_whole_genome_aln.fasta - pseudo-chromosomal alignment for CG15-KL2 subclade (input for Gubbins).</div>
<div>ii) CG15_other_subclade_pseudo_whole_genome_aln.fasta - pseudo-chromosomal alignment for CG15-other subclade (input for Gubbins).</div>
<div>iii) CG15_KL2_subclade.recombination_predictions.gff - Gubbins recombination predictions for CG15-KL2 subclade.</div>
<div>iv) CG15_other_subclade.recombination_predictions.gff - Gubbins recombination predictions for CG15-other subclade.</div>
<div>v) CG15_KL2_subclade.per_branch_statistics.csv - Gubbins per-branch statistics for CG15-KL2 subclade.</div>
<div>vi) CG15_other_subclade.per_branch_statistics.csv - Gubbins per-branch statistics for CG15-other subclade.</div>
<div>vii) CG15_subclades_PCA_coords.tsv - coordinates for the first 40 principal components in tab-delimited format, genomes in rows, coordinates in columns marked Axis1-40. CG15 split into CG15-KL2 and CG15-other.</div>
<div>viii) CG15_subclades_PCA_coords_clone_centroids.tsv - coordinates for clone centroids in tab-delimited format as above.</div>
<div>ix) CG15_KL2_dated_genomes_beast.xml - XML input file for BEAST2 analysis for CG15-KL2 subclade (note: only genomes with known isolate collection dates are included).</div><div><br></div><div><br></div><div><b>CITATION</b>:</div><div>Kelly L Wyres, Ryan R Wick, Louise M Judd, Roni Froumine, Alex Tokolyi, Claire Gorrie, Margaret M C Lam, Sebastián Duchêne, Adam Jenney and Kathryn E Holt. 2018. Distinct evolutionary dynamics of horizontal gene transfer in drug resistant and virulent clones of Klebsiella pneumoniae.</div>
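
The PCA coordinate files listed above (genomes or clone centroids in rows, Axis columns in tab-delimited format) lend themselves to a simple distance calculation. The following is a minimal sketch of computing each genome's Euclidean distance to its clone centroid. It is not the summarisePanGenomeDistancesFromCentroids.py script from this collection. The function names read_coords and distances_to_centroids, the genome_to_clone mapping, and the two-axis toy data are all hypothetical and invented for illustration, and the sketch assumes the first column of each file holds the row label.

```python
import csv
import io
import math


def read_coords(handle):
    """Read a tab-delimited table: first column is a row label,
    remaining columns are numeric axis coordinates (assumed layout)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    coords = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        coords[row[0]] = [float(value) for value in row[1:]]
    return header[1:], coords


def distances_to_centroids(genome_coords, centroid_coords, genome_to_clone):
    """Euclidean distance from each genome to the centroid of its clone.

    genome_to_clone is a hypothetical mapping supplied by the user;
    how the real files link genomes to clones is not stated here.
    """
    result = {}
    for genome, point in genome_coords.items():
        centroid = centroid_coords[genome_to_clone[genome]]
        result[genome] = math.dist(point, centroid)
    return result


# Toy two-axis data, invented for illustration (not taken from the collection).
toy_genomes = io.StringIO("genome\tAxis1\tAxis2\ng1\t0.0\t0.0\ng2\t3.0\t4.0\n")
toy_centroids = io.StringIO("clone\tAxis1\tAxis2\ncloneA\t0.0\t0.0\n")

_, genomes = read_coords(toy_genomes)
_, centroids = read_coords(toy_centroids)
toy_distances = distances_to_centroids(
    genomes, centroids, {"g1": "cloneA", "g2": "cloneA"}
)
print(toy_distances)
```

To try this on the real files, the in-memory toy tables would be replaced by open file handles, and the genome-to-clone assignment would need to come from the collection's own metadata.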

Cite this collection

Wyres, Kelly; Wick, Ryan; Judd, Louise; Froumine, Roni; Tokolyi, Alex; Gorrie, Claire; et al. (2018): Supporting data and files for, "Distinct evolutionary dynamics of horizontal gene transfer in drug resistant and virulent clones of Klebsiella pneumoniae." University of Melbourne. Collection.