
Codon Usage of Secondary Metabolite Domains in Desert Soils and Marine Sponges

dataset
Posted on 2013-11-29, 17:11; authored by Ara Kooser

This file set contains the initial analysis of codon usage in secondary metabolite genes from desert soils (Reddy et al. 2012), a marine sponge microbiome (Trindade-Silva et al. 2012), and desert soil from New Mexico (Owen et al. 2013).

Why Does this Exist:

I was wondering about the similarities and differences in how microbial communities use secondary metabolite genes (genes that produce molecules that are sometimes useful to humans as antibiotics, among other things). My PhD work focuses on these genes across several different environments in New Mexico (caves, sides of cliffs, and springs). I thought that there might be a connection between environmental conditions and the types and use of these genes. In this data set I am looking at single domains of larger gene clusters.

Methods:

The Reddy et al. dataset is available as SRA and FASTQ files under the identifier SRR342214. The marine sponge KS domains are available under the accession numbers JX012425:JX012657. The New Mexico desert soil data set is available from eSNaPD (http://esnapd2.rockefeller.edu/). The SRR342214 data set was mined with a custom Python script to pull out all the barcoded KS domains; domains shorter than 150 nt were then removed. All duplicate sequences were removed from the New Mexico data set. CodonW was used to calculate all codon bias indices and dinucleotide frequencies. The data were visualized in RStudio using ggplot2, gridExtra, and FactoMineR.
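The two clean-up steps described above (dropping domains shorter than 150 nt, and removing duplicate sequences) can be sketched in a few lines of Python. This is not the custom script used for this data set, and it skips the barcode extraction entirely. The function names `parse_fasta`, `drop_short`, and `drop_duplicates` are made up for illustration, and "duplicate" is assumed to mean an exact match of the full sequence.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequences may wrap over several lines; upper-case for comparison.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records: Iterable[Record], min_len: int = 150) -> List[Record]:
    """Keep only records whose sequence is at least min_len nucleotides long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def drop_duplicates(records: Iterable[Record]) -> List[Record]:
    """Keep the first occurrence of each exact sequence (assumed definition)."""
    seen = set()
    kept: List[Record] = []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            kept.append((header, seq))
    return kept


if __name__ == "__main__":
    demo = [">a", "ACGT", "acgt", ">b", "AC", ">c", "ACGTACGT"]
    records = list(parse_fasta(demo))
    print(drop_short(records, min_len=3))
    print(drop_duplicates(records))
```

The filtered records would then be written back out as FASTA and passed to CodonW. In practice a library such as Biopython could replace the hand-written parser.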

Questions Raised:

Does this single domain reflect the gene cluster? Tentatively I say yes, based on some whole gene cluster analysis I did in CodonW.

How much is due to random GC mutation as opposed to other pressures acting on the natural product (NP) genes in the community?

A quick look at the Nc versus GC3 plot shows most points falling away from the expected curve. This suggests that something other than GC mutation is influencing the domain.
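For reference, the curve usually drawn on an Nc versus GC3 plot is Wright's expectation for Nc when third-position GC content alone shapes codon usage: Nc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3. The sketch below computes that curve and a simplified GC3. It is an illustration and not the CodonW calculation: `gc3` here counts every third codon position and assumes the sequence is already in reading frame, whereas CodonW reports GC3s over synonymous codons only. The function names are hypothetical.

```python
def gc3(seq: str) -> float:
    """Fraction of G or C at third codon positions.

    Simplification: counts all codons and assumes the sequence starts in frame.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    thirds = [seq[i] for i in range(2, usable, 3)]
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_nc(s: float) -> float:
    """Wright's expected Nc for a given GC3 fraction s (0 <= s <= 1)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("GC3 must be between 0 and 1")
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


if __name__ == "__main__":
    # Points for the reference curve on an Nc versus GC3 plot.
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"GC3={s:.2f}  expected Nc={expected_nc(s):.2f}")
```

A domain whose observed Nc sits well below `expected_nc(gc3(seq))` has more codon bias than its GC3 alone would predict, which is the pattern described above.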

Are there quantifiable differences between and within communities in their codon usage of the KS domains?

With this small dataset we can see a difference between marine and desert soils. Will this hold up for my larger dataset?

Does this tell us something useful about how the communities share their genes internally, and what pressures select for certain types of codon bias in the NP genes?

How are codons used across the different domains (AD, KS, and PKSa) within and across bacterial communities?

References:

CodonW, John Peden, Oxford University, available at http://bioweb.pasteur.fr/seqanal/interfaces/codonw.html

Natural product biosynthetic gene diversity in geographically distinct soil microbiomes.
Appl Environ Microbiol. 2012 May;78(10):3744-52. doi: 10.1128/AEM.00102-12. Epub 2012 Mar 16. Reddy BV, Kallifidas D, Kim JH, Charlop-Powers Z, Feng Z, Brady SF.

Polyketide synthase gene diversity within the microbiome of the sponge Arenosclera brasiliensis, endemic to the Southern Atlantic Ocean. Appl Environ Microbiol. 2013 Mar;79(5):1598-605. doi: 10.1128/AEM.03354-12. Epub 2012 Dec 28. Trindade-Silva AE, Rua CP, Andrade BG, Vicente AC, Silva GG, Berlinck RG, Thompson FL.

Mapping gene clusters within arrayed metagenomic libraries to expand the structural diversity of biomedically relevant natural products. Jeremy G. Owen, Boojala Vijay B. Reddy, Melinda A. Ternei, Zachary Charlop-Powers, Paula Y. Calle, Jeffrey H. Kim, and Sean F. Brady. PNAS 2013; published ahead of print July 3, 2013.

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York, 2009.

Francois Husson, Julie Josse, Sebastien Le and Jeremy Mazet (2013). FactoMineR: Multivariate Exploratory Data Analysis and Data Mining with R. R package version 1.25. http://CRAN.R-project.org/package=FactoMineR

 
