21 files

MAGs protein functional clusters data

dataset
posted on 07.05.2021, 12:31 by Emile Faure, Ayata Sakina-Dorothée, Lucie Bittner
Tab_PFCLevel_Faureetal2020_PR
Table with 233,756 lines corresponding to protein functional clusters (PFCs), and 67 columns: PFC ID | Size of the PFC | KEGG-based functional homogeneity | EggNOG-based functional homogeneity | Taxonomical homogeneity at Phylum level | Class level | Order level | Family level | Genus level | MAG level | Cross-validation R-squared | Test set predictions R-squared | Importance in RF models of each environmental variable (51 columns) | Position on CCA1 (when available) | Position on CCA2 (when available)

Tab_hlePFCLevel_Faureetal2020_PR
Same as Tab_PFCLevel_Faureetal2020 but only with the 14,585 lines corresponding to PFCs highly linked to the environment.

Tab_DM_PFCLevel_Faureetal2020_PR
Same as Tab_PFCLevel_Faureetal2020 but only with the 7,834 lines corresponding to dark PFCs (no functional annotation, no taxonomic annotation under the Phylum level).

Tab_S93_PFCLevel_Faureetal2020_PR
Same as Tab_PFCLevel_Faureetal2020 but only with the 2,836 lines corresponding to PFCs overabundant at station 93.

Tab_S93Pseudoalteromonas_PFCLevel_Faureetal2020_PR
Same as Tab_PFCLevel_Faureetal2020 but only with the 1,928 lines corresponding to PFCs overabundant at station 93 that contained at least one protein from a Pseudoalteromonas MAG.

Tab_SeqLevel_Faureetal2020_PR
Table with 757,457 lines corresponding to all protein sequences included in the sequence similarity network, and 12 columns: Associated PFC | Protein ID | MAG of origin | EggNOG annotation | KEGG annotation | Domain of the MAG of origin | Phylum | Class | Order | Family | Genus | Nucleotide sequence

Tab_hle_SeqLevel_Faureetal2020_PR
Same as Tab_SeqLevel_Faureetal2020_withnucl, but only with the 52,536 lines corresponding to proteins from PFCs highly linked to environmental gradients.

Tab_DM_SeqLevel_Faureetal2020_PR
Same as Tab_SeqLevel_Faureetal2020_withnucl, but only with the 20,552 lines corresponding to proteins from dark PFCs.

Tab_S93_SeqLevel_Faureetal2020_PR
Same as Tab_SeqLevel_Faureetal2020_withnucl, but only with the 8,364 lines corresponding to proteins from PFCs overabundant at station 93.

Tab_S93Pseudoalteromonas_SeqLevel_Faureetal2020_PR
Same as Tab_SeqLevel_Faureetal2020_withnucl, but only with the 6,450 lines corresponding to proteins from PFCs overabundant at station 93 that contained at least one protein from a Pseudoalteromonas MAG.
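The sequence-level and PFC-level tables describe the same clusters, so one quick consistency check is to count the protein rows per PFC in a sequence-level table and compare the counts with the "Size of the PFC" column. The sketch below is only an illustration: the file format (tab-delimited text with a header row), the header name `Associated PFC`, and the example identifiers are assumptions, not confirmed properties of the deposited files.

```python
import csv
import io
from collections import Counter


def pfc_sizes(seq_table, pfc_column="Associated PFC", delimiter="\t"):
    """Count how many protein rows belong to each PFC.

    seq_table  -- an open text file (or any iterable of lines) with a header row
    pfc_column -- assumed name of the column holding the PFC identifier
    """
    reader = csv.DictReader(seq_table, delimiter=delimiter)
    return Counter(row[pfc_column] for row in reader)


# Tiny made-up example standing in for a sequence-level table.
example = io.StringIO(
    "Associated PFC\tProtein ID\tMAG of origin\n"
    "PFC_1\tprot_a\tMAG_1\n"
    "PFC_1\tprot_b\tMAG_2\n"
    "PFC_2\tprot_c\tMAG_1\n"
)
sizes = pfc_sizes(example)
print(sizes["PFC_1"], sizes["PFC_2"], sum(sizes.values()))
```

On the full table, the number of distinct keys would be expected to match the 233,756 PFCs and the sum of the counts the 757,457 protein sequences, provided the assumptions above hold.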

TabAbund_CC_mean_CC2
Abundance table with 233,756 lines corresponding to all PFCs of size 2 or more in the sequence similarity network, and 93 columns corresponding to the different samples used in our study. Each value is the mean normalized abundance of a PFC; these values were used in the statistical analyses presented in the main text of the paper.

TabAbund_CC_NOsum_CC2
Abundance table with 757,457 lines corresponding to all proteins involved in PFCs of size 2 or more in the sequence similarity network, and 93 columns corresponding to the different samples used in our study.
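The two abundance tables differ in granularity: one row per PFC versus one row per protein. The sketch below shows one plausible way to go from the protein level to the PFC level, by averaging the per-sample abundances of the proteins in each PFC. That this is how the mean table was built is an assumption made for illustration; the exact normalization and aggregation are defined in the authors' R code, and all identifiers and values here are made up.

```python
from collections import defaultdict


def mean_abundance_per_pfc(protein_abund, protein_to_pfc):
    """Average per-sample abundances over the proteins of each PFC.

    protein_abund  -- dict: protein ID -> list of abundances (one per sample)
    protein_to_pfc -- dict: protein ID -> PFC ID
    """
    sums = {}
    counts = defaultdict(int)
    for protein, values in protein_abund.items():
        pfc = protein_to_pfc[protein]
        if pfc not in sums:
            sums[pfc] = [0.0] * len(values)
        for i, value in enumerate(values):
            sums[pfc][i] += value
        counts[pfc] += 1
    return {pfc: [s / counts[pfc] for s in total] for pfc, total in sums.items()}


# Made-up example with three proteins and two samples.
protein_abund = {
    "prot_a": [1.0, 3.0],
    "prot_b": [3.0, 5.0],
    "prot_c": [2.0, 0.0],
}
protein_to_pfc = {"prot_a": "PFC_1", "prot_b": "PFC_1", "prot_c": "PFC_2"}
means = mean_abundance_per_pfc(protein_abund, protein_to_pfc)
print(means)
```

With the real data, the protein-to-PFC mapping could be taken from the first two columns of the sequence-level table.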

Proka_nucl.faa
FASTA file of the nucleotide sequences of the 1,914,171 proteins detected in the 885 prokaryotic MAGs used in our study.

hlePFCs_nucl.faa
FASTA file of the nucleotide sequences of the 52,536 proteins detected in the 14,585 protein functional clusters highly linked to the environment.

DM_PFCs_nucl.faa
FASTA file of the nucleotide sequences of the 20,552 proteins detected in the 7,834 protein functional clusters associated with microbial dark matter.

S93_PFCs_nucl.faa
FASTA file of the nucleotide sequences of the 8,364 proteins detected in PFCs overabundant at station 93.

S93Pseudoalteromonas_PFC_nucl.faa
FASTA file of the nucleotide sequences of the 6,450 proteins from PFCs overabundant at station 93 that contained at least one protein from a Pseudoalteromonas MAG.

Proka_prot.faa
FASTA file of the protein sequences of the 1,914,171 proteins detected in the 885 prokaryotic MAGs used in our study.
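Since each FASTA file above is described with an expected number of sequences, a small reader can be used to confirm that a download is complete. The following is a minimal sketch using only the Python standard library; it assumes standard FASTA formatting (a `>` header line followed by one or more sequence lines), and the example records are made up.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up two-record example; with a real file, pass open(path) instead.
example = io.StringIO(">prot_a\nATGAAA\nTTTTAA\n>prot_b\nATGCCCTAA\n")
records = list(read_fasta(example))
print(len(records))
```

For instance, counting the records of hlePFCs_nucl.faa this way would be expected to give 52,536 if the file matches its description.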

Envi_context_Faure_et_al_2020
All environmental data used in random forest models.

Github_Data_Faure.tar.gz
Archive containing the raw files necessary to reproduce our results using the R code available at https://github.com/EmileFaure/MAGsProteinFunctionalClusters.
