CVD Risk Prediction Synthetic Dataset

This synthetic dataset is designed to teach students how to use clinical and genetic covariates to predict cardiovascular risk. Although synthetic, the data are meant to be realistic.

For the workshop materials, please go here:


1) dataDictionary.pdf - PDF file describing all covariates in the synthetic dataset.

2) fullPatientData.csv - CSV file with multiple covariates for each patient.

3) genoData.csv - CSV file with additional SNP calls for a subset of the patients in fullPatientData.csv.
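Because genoData.csv covers only a subset of the patients in fullPatientData.csv, analyses that combine clinical covariates with SNP calls need to link the two files. The following is a minimal sketch using only the Python standard library. It assumes both files share a patient identifier column, called `patientID` here purely as a hypothetical placeholder; check dataDictionary.pdf for the actual column name. The helper names `load_rows` and `join_on_id` are illustrative, not part of the dataset.

```python
import csv
from typing import Dict, List

Row = Dict[str, str]


def load_rows(path: str) -> List[Row]:
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def join_on_id(patients: List[Row], genotypes: List[Row],
               id_column: str = "patientID") -> List[Row]:
    """Inner-join genotype rows onto patient rows by a shared ID column.

    id_column defaults to a HYPOTHETICAL name; replace it with the
    identifier column documented in dataDictionary.pdf.
    """
    # Index the clinical rows by patient ID for constant-time lookup.
    by_id = {row[id_column]: row for row in patients}
    merged: List[Row] = []
    for geno_row in genotypes:
        patient = by_id.get(geno_row[id_column])
        if patient is None:
            # Skip genotyped IDs that have no matching clinical row.
            continue
        # If a column appears in both files, the genotype value wins.
        merged.append({**patient, **geno_row})
    return merged
```

For example, `join_on_id(load_rows("fullPatientData.csv"), load_rows("genoData.csv"))` would return one merged row per genotyped patient, which keeps the analysis restricted to patients that have both clinical covariates and SNP calls.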