The Health Gym v2.0 Synthetic Antiretroviral Therapy (ART) for HIV Dataset
###===###
This synthetic dataset on ART for HIV was generated with the model described in reference [1], which uses the WGAN-GP+G_EOT+VAE+Buffer techniques.
This dataset serves as a principal resource for the Centre for Big Data Research in Health (CBDRH) Datathon (see: CBDRH Health Data Science Datathon 2023 (cbdrh-hds-datathon-2023.github.io)). Its primary purpose is to advance the Health Data Analytics (HDAT) courses at the University of New South Wales (UNSW), providing students with exposure to synthetic yet realistic datasets that simulate real-world data.
The dataset contains 534,960 records across 15 columns and is stored as a 39.1 MB CSV file. It covers 8,916 synthetic patients over a period of 60 months, with data summarised on a monthly basis. The record count is the product of the number of synthetic patients and the number of months: 8,916 × 60 = 534,960.
The 15 columns comprise the 13 variables pertinent to ART for HIV described in reference [1], a unique patient identifier, and a variable indicating the time point.
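The shape described above can be checked after download. The following is a minimal sketch using only the Python standard library. It assumes the patient identifier column is named `PatientID`; that name, the other column names in the demo, and the file path in the comment are placeholders rather than the dataset's actual header or file name, so substitute the real ones before use.

```python
import csv
import io
from collections import Counter

# Figures stated in this description.
EXPECTED_PATIENTS = 8_916
EXPECTED_MONTHS = 60
EXPECTED_RECORDS = 534_960
EXPECTED_COLUMNS = 15


def summarise_panel(handle, id_column="PatientID"):
    """Count records, columns, patients, and the distinct per-patient record counts.

    `id_column` is an assumed name for the patient identifier column.
    """
    reader = csv.DictReader(handle)
    per_patient = Counter()
    n_records = 0
    for row in reader:
        per_patient[row[id_column]] += 1
        n_records += 1
    n_columns = len(reader.fieldnames or [])
    return n_records, n_columns, len(per_patient), set(per_patient.values())


# Tiny in-memory stand-in (3 patients x 4 months) so the sketch runs without the file.
demo = io.StringIO()
writer = csv.writer(demo)
writer.writerow(["PatientID", "Timestep", "Value"])  # placeholder header
for patient in range(3):
    for month in range(4):
        writer.writerow([patient, month, 0.0])
demo.seek(0)

n_records, n_columns, n_patients, lengths = summarise_panel(demo)
print(n_records, n_columns, n_patients, lengths)

# For the real file (the path is hypothetical):
# with open("path/to/dataset.csv", newline="") as f:
#     n_records, n_columns, n_patients, lengths = summarise_panel(f)
#     assert n_records == EXPECTED_RECORDS
#     assert n_columns == EXPECTED_COLUMNS
#     assert n_patients == EXPECTED_PATIENTS
#     assert lengths == {EXPECTED_MONTHS}
```

If the final check on `lengths` fails, some patients have more or fewer than 60 monthly records, which would contradict the layout described here.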
###===###
This dataset is part of an ongoing series of work that builds on reference [2]. For further details, please refer to our papers:
[1] Kuo, Nicholas I., Louisa Jorm, and Sebastiano Barbieri. "Generating Synthetic Clinical Data that Capture Class Imbalanced Distributions with Generative Adversarial Networks: Example using Antiretroviral Therapy for HIV." arXiv preprint arXiv:2208.08655 (2022).
[2] Kuo, Nicholas I-Hsien, et al. "The Health Gym: synthetic health-related datasets for the development of reinforcement learning algorithms." Scientific Data 9.1 (2022): 693.
###===###
Latest edit: 16th May 2023.