This dataset contains 45 821 orthogroup (.fal) files plus accompanying tree (.arb) files, clustered from the genomes/transcriptomes of 97 taxa spanning the eukaryote tree of life.
Two plaintext files accompany the dataset. Each lists the ID code of every taxon in the dataset together with the eukaryote group to which it belongs. The files differ in their eukaryote group definitions: one uses 15 distinct groups, while the other splits two of these (SAR and Haptista) into their respective subgroups, for a total of 18 groups.
Using the ID codes, the orthogroup files can be scanned programmatically to determine the number of orthogroups shared between different groups of eukaryotes.
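
As a minimal sketch of such a scan, the Python below tallies which eukaryote groups are represented in each orthogroup file. It rests on assumptions that are not confirmed by this description and should be checked against the actual files: that each line of the taxon file has the form "ID<TAB>group", that the .fal files are FASTA-style with sequence headers beginning with ">", and that each header starts with the taxon ID followed by an underscore. All function names and the `sep` parameter are hypothetical.

```python
from collections import Counter
from pathlib import Path


def load_taxon_groups(mapping_path):
    """Read the taxon file into {taxon_id: group}.

    ASSUMPTION: one 'taxon_id<TAB>group' pair per line; blank lines and
    lines starting with '#' are skipped.
    """
    groups = {}
    for line in Path(mapping_path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        taxon_id, group = line.split("\t", 1)
        groups[taxon_id.strip()] = group.strip()
    return groups


def groups_in_orthogroup(fal_path, taxon_groups, sep="_"):
    """Return the set of eukaryote groups with a sequence in one .fal file.

    ASSUMPTION: FASTA-style headers ('>') whose text begins with the taxon
    ID, followed by `sep`. Headers with unknown IDs are ignored.
    """
    present = set()
    with open(fal_path) as handle:
        for line in handle:
            if line.startswith(">"):
                taxon_id = line[1:].strip().split(sep, 1)[0]
                group = taxon_groups.get(taxon_id)
                if group is not None:
                    present.add(group)
    return present


def count_group_combinations(fal_dir, taxon_groups, sep="_"):
    """Count orthogroups per exact combination of groups present."""
    counts = Counter()
    for fal_path in sorted(Path(fal_dir).glob("*.fal")):
        present = groups_in_orthogroup(fal_path, taxon_groups, sep)
        if present:
            counts[frozenset(present)] += 1
    return counts


def shared_between(counts, wanted):
    """Number of orthogroups containing at least all groups in `wanted`."""
    wanted = frozenset(wanted)
    return sum(n for combo, n in counts.items() if wanted <= combo)
```

For example, `shared_between(counts, {"SAR", "Haptista"})` would give the number of orthogroups with at least one sequence from each of those two groups. Running the scan once with the 15-group file and once with the 18-group file gives the same tally at the two levels of resolution.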