Data for PhD Thesis on Next Generation Nematode Genomes

2012-09-29T15:04:45Z (GMT) by Sujai Kumar
<p>Data for PhD thesis on "Next-generation Nematode Genomes" by Sujai Kumar</p> <p>(Note: The thesis itself will be made publicly available after the viva/oral examination is complete).</p> <p>Update: Thesis available at http://hdl.handle.net/1842/7609 (https://www.era.lib.ed.ac.uk/handle/1842/7609) </p> <p>--------------------------------------------------------------------</p> <p>Species Abbreviations:</p> <p><em>Trichinella spiralis</em> (ts)</p> <p><em>Ascaris suum</em> (as)</p> <p><em>Dirofilaria immitis</em> (di)</p> <p><em>Brugia malayi</em> (bm)</p> <p><em>Litomosoides sigmodontis</em> (ls)</p> <p><em>Acanthocheilonema viteae</em> (av)</p> <p><em>Strongyloides ratti</em> (sr)</p> <p><em>Bursaphelenchus xylophilus</em> (bx)</p> <p><em>Meloidogyne hapla</em> (mh)</p> <p><em>Meloidogyne incognita</em> (mi)</p> <p><em>Meloidogyne floridensis</em> (mf)</p> <p><em>Pristionchus pacificus</em> (pp)</p> <p><em>Caenorhabditis angaria</em> (ca)</p> <p><em>Caenorhabditis japonica</em> (cj)</p> <p><em>Caenorhabditis elegans</em> (ce)</p> <p><em>Caenorhabditis brenneri</em> (cbn)</p> <p><em>Caenorhabditis sp. 11</em> (csp11)</p> <p><em>Caenorhabditis remanei</em> (cr)</p> <p><em>Caenorhabditis briggsae</em> (cbg)</p> <p><em>Caenorhabditis sp. 5</em> (csp5)</p> <p>--------------------------------------------------------------------</p> <p>File descriptions:</p> <p>--------------------------------------------------------------------</p> <p><strong>Chapter 3: Annotating nematode genomes</strong></p> <p>- 20_nematode_protein_files.tgz - This tgz file contains the 20 nematode protein fasta files used in Chapter 3 "Annotating nematode genomes". The original files were obtained from WormBase (WS230), http://nematod.es, and www.inra.fr/meloidogyne_incognita/genomic_resources . 
The fasta files have been cleaned up: a) all whitespace in sequence headers has been converted to spaces (otherwise NCBI's makeblastdb fails) b) multi-line sequences have been converted to single line c) sequence IDs have been prefixed with a species abbreviation.</p> <p>- 20_nematode_genome_files_part{1,2,3}.tgz - These three tgz files contain the nematode genome nucleotide fasta files. The original files were obtained from WormBase (WS230), http://nematod.es, and www.inra.fr/meloidogyne_incognita/genomic_resources . The fasta files have been cleaned up: a) multi-line sequences have been converted to single line b) sequence IDs have been prefixed with a species abbreviation.</p> <p>- 20_nematode_blast2go.annot.goslim.tgz - 20 Blast2GO annotation files, one per nematode proteome</p> <p>- 20_nematode_iprscan.tgz - 20 proteomes with InterProScan annotations</p> <p>- 20_nematode_tRNA_counts.xls - tRNA counts for 20 nematode genomes</p> <p>- 20_nematode_tRNAscan_gff.tgz - tRNA locations for 20 nematode genomes (GFF format)</p> <p>- 20_nematode_rfamscan_gff.tgz - Rfamscan output for 20 nematode genomes (GFF format)</p> <p>--------------------------------------------------------------------</p> <p><strong>Chapter 4: Lack of deeply conserved non-coding elements in nematodes</strong></p> <p>- tba.alignments.tar - Whole-genome multiple alignment files for specific nodes in the nematode phylogeny: Clade III, Onchocercidae, Clade IV, Meloidogyne, Clade V, Caenorhabditis, Elegans group</p> <p>- tba.alignments.CNEs.tar - CNE multiple alignment files for specific nodes in the nematode phylogeny (whole-genome multiple alignments with coding regions removed)</p> <p>- tba.alignments.CNEs.stats.tgz - Tab-delimited files with length and relative identity for each CNE</p> <p>- pairwise.megablast.tar - Pairwise MegaBLAST alignments for all 20 genomes</p> <p>- megablast.cluster.tgz - MegaBLAST-based clusters of CNEs</p> <p>--------------------------------------------------------------------</p> <p><strong>Chapter 5: The 
<em>Meloidogyne floridensis</em> genome reveals complex hybrid origins of the root-knot nematodes</strong></p> <p>- protein.faa.tgz - Protein sets used for <em>M. hapla</em>, <em>M. incognita</em>, and <em>M. floridensis</em> after truncating at stop codons and filtering short proteins (protein fasta files)</p> <p>- cds.fna.tgz - CDS transcript files corresponding to proteins in <em>M. hapla</em>, <em>M. incognita</em>, and <em>M. floridensis</em> (nucleotide fasta files)</p> <p>- mhmimf.98.self.id - Tab-delimited file with self-identity scores for each CDS in each species</p> <p>- InParanoid-mh-mi-mf.tgz - InParanoid results (pairwise clustering)</p> <p>- QuickParanoid-mh-mi-mf.tgz - QuickParanoid results (orthologous clusters across three species)</p> <p>- raxml-mh-mi-mf.tgz - Phylogenetic trees for each QuickParanoid cluster</p>
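<p>--------------------------------------------------------------------</p> <p>Illustration of the fasta cleanup described for the Chapter 3 files: the three steps (whitespace in headers converted to spaces, multi-line sequences joined into a single line, sequence IDs prefixed with a species abbreviation) could be sketched roughly as below. This is not the script that produced the deposited files. The function name <code>clean_fasta</code> and the underscore used to join the abbreviation to the ID are assumptions made for the example, so the exact header format in the archives may differ.</p>

```python
import re
from typing import Iterable, Iterator


def clean_fasta(lines: Iterable[str], prefix: str) -> Iterator[str]:
    """Yield cleaned FASTA lines: one header and one sequence line per record.

    `prefix` is a species abbreviation such as "ce". Joining it to the ID
    with "_" is an assumption for this sketch, not the documented format.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if header is not None:
                yield header
                yield "".join(chunks)
            # (a) convert every whitespace character in the header to a space
            text = re.sub(r"\s", " ", line[1:])
            # (c) prefix the sequence ID with the species abbreviation
            header = f">{prefix}_{text}"
            chunks = []
        elif line.strip():
            # (b) collect sequence lines so they can be joined into one line;
            # sequence text appearing before any header is never emitted
            chunks.append(line.strip())
    if header is not None:
        yield header
        yield "".join(chunks)


if __name__ == "__main__":
    example = [">seq1\tsome description\n", "ACGT\n", "TTGA\n", ">seq2\n", "MKV\n"]
    for cleaned in clean_fasta(example, "ce"):
        print(cleaned)
```

<p>The function works on any iterable of lines, so it can be applied to an open file handle without loading a whole genome into memory.</p>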