This dataset contains nearly 5 million simulated rare disease patients, each represented by Human Phenotype Ontology (HPO) terms. For each of approximately 5,000 disease genes referenced in the HPO resources, 1,000 patients were simulated. Details on the simulation method are provided in the ICMLA 2024 paper "Automated Shared Phenotype Discovery in Undiagnosed Cohorts for Rare Disease Research" and at https://github.com/masino-lab/icmla-2024
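
As a rough illustration of the "patient as HPO terms, grouped by disease gene" structure described above, the following minimal Python sketch may help. The record layout (patient ID, gene symbol, HPO term IDs) and the helper names `patients_per_gene` and `shared_terms` are assumptions made for this example and may not match the dataset's actual file format. The three toy records are made up and are not drawn from the dataset.

```python
from collections import Counter

# Hypothetical record layout: (patient_id, disease_gene, HPO term IDs).
# These toy records are invented for illustration only; the dataset's
# real schema and contents may differ.
patients = [
    ("p0001", "SCN1A", {"HP:0001250", "HP:0001263"}),
    ("p0002", "SCN1A", {"HP:0001250", "HP:0000252"}),
    ("p0003", "MECP2", {"HP:0001263", "HP:0000252"}),
]


def patients_per_gene(records):
    """Count how many simulated patients belong to each disease gene."""
    return Counter(gene for _, gene, _ in records)


def shared_terms(records, gene):
    """Return the HPO terms present in every patient simulated for `gene`."""
    term_sets = [terms for _, g, terms in records if g == gene]
    return set.intersection(*term_sets) if term_sets else set()


counts = patients_per_gene(patients)
print(counts["SCN1A"])
print(sorted(shared_terms(patients, "SCN1A")))
```

On the full dataset, a per-gene count of this kind would be expected to come out at 1,000 for each gene, given the description above.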