figshare
Supplementary data for the manuscript "No Gene Left Behind: False and true positives in arthropod thermal adaptation candidate gene lists"

Dataset: 2 files, 1.17 MB in total (shared privately)
Modified on 2020-11-16, 01:50

Genome-wide studies are prone to false positives because of inherently low priors and low statistical power. As a result, only repeated discovery of the same candidate gene by independent studies has any predictive ability about the environmental role of individual genes or gene families. We show that, across 28 genome-wide studies that reported Drosophila genes with possible roles in thermal adaptation, the combined lists of candidate genes and of orthologous groups are rapidly approaching the total numbers of genes and of orthologous groups in the genome, respectively. Yet the majority of these spurious candidates were identified by only one or two studies, an outcome likely to occur by chance alone. In contrast, a noticeable minority of genes were identified by eight or more studies, and the probability of such repeated discovery occurring by chance alone is exceedingly small. Thus, for this subset of genes, different studies agree with each other despite differences in ecological settings, genomic tools, methodology, and reporting thresholds. Therefore, to identify the genes most likely to be involved in a specific biological response, one should focus on genes that appear in the candidate lists of multiple independent studies. To that end, we provide, as an example, a reference set of "confirmed" Drosophila candidate genes and orthologous groups involved in the response to changes in temperature, together with the Gene Ontology (GO) terms over- and under-represented among these genes and orthologous groups. Although this approach is undoubtedly prone to false negatives, the list of "confirmed" Drosophila genes with a thermal response includes many hundreds of genes, consistent with the "omnigenic" concept of the genetic architecture of complex traits (Boyle et al. 2017).
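The argument that one or two hits are likely by chance, while eight or more are not, can be illustrated with a simple null model. The sketch below is an assumption for illustration only, not necessarily the null model used in the manuscript: it treats each study's candidate list as an independent, uniformly random draw from the genome, so study i flags any given gene with probability p_i = (list size) / (number of genes). The number of studies that flag one gene then follows a Poisson-binomial distribution, computed exactly here by dynamic programming. The function names, the genome size and the list sizes in the usage block are all hypothetical.

```python
from typing import List, Sequence


def count_distribution(fractions: Sequence[float]) -> List[float]:
    """Poisson-binomial distribution of how many studies flag one gene.

    fractions[i] is the (assumed) chance that study i lists a given gene at
    random, i.e. its candidate-list size divided by the number of genes.
    Returns probs, where probs[k] = P(exactly k studies flag the gene).
    """
    probs = [1.0]  # before any study is considered: P(0 hits) = 1
    for p in fractions:
        nxt = [0.0] * (len(probs) + 1)
        for k, pk in enumerate(probs):
            nxt[k] += pk * (1.0 - p)  # this study misses the gene
            nxt[k + 1] += pk * p      # this study flags the gene
        probs = nxt
    return probs


def tail_probability(fractions: Sequence[float], k_min: int) -> float:
    """P(a given gene is flagged by at least k_min studies by chance alone)."""
    return sum(count_distribution(fractions)[k_min:])


def expected_genes_at_least(fractions: Sequence[float], k_min: int,
                            n_genes: int) -> float:
    """Expected number of genes reaching k_min hits under the random model."""
    return n_genes * tail_probability(fractions, k_min)


if __name__ == "__main__":
    n_genes = 14_000          # hypothetical genome size
    list_sizes = [700] * 28   # hypothetical: 28 studies, 700 candidates each
    fractions = [s / n_genes for s in list_sizes]
    print("P(flagged by >= 1 study):", tail_probability(fractions, 1))
    print("P(flagged by >= 8 studies):", tail_probability(fractions, 8))
    print("expected genes with >= 8 hits:",
          expected_genes_at_least(fractions, 8, n_genes))
```

With these made-up numbers, roughly three quarters of all genes would be flagged by at least one study, mirroring how the combined candidate list approaches the whole genome, while fewer than one gene would be expected to reach eight studies by chance. The exact dynamic-programming recursion is used instead of a plain binomial so that studies with different list sizes can be combined directly.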


This dataset provides the necessary data for the final stages of this analysis.