Columbia Open Health Data, a database of EHR prevalence and co-occurrence of conditions, drugs, and procedures
Posted on 2018-11-22 - 10:23
The Columbia Open Health Data (COHD) contains counts and prevalence rates of conditions, drug exposures, procedure occurrences, and patient demographics, as well as the co-occurrence frequencies between these concepts. Count and prevalence data were derived from the Columbia University Irving Medical Center's OHDSI database, which includes inpatient and outpatient data. A count is the number of patients associated with a concept, e.g., diagnosed with a condition, exposed to a drug, or who had a procedure. Prevalence is the fraction of patients in the electronic health records (EHR) associated with a concept, calculated as the number of patients associated with the concept divided by the total number of patients in the data set. To protect patient privacy, all concepts and pairs of concepts with a count of 10 or fewer were excluded, and counts were perturbed by Poisson randomization. The means and standard deviations of annual prevalence and co-occurrence rates are provided so that the temporal stability of each concept or concept pair can be assessed.
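The calculations described above can be illustrated with a minimal Python sketch. This is not the COHD pipeline itself; it rests on several assumptions that the description does not confirm: that small counts are filtered before perturbation, that each released count is a single draw from a Poisson distribution whose mean is the true count, that prevalence is computed from the perturbed count, and that the sample standard deviation is used for the annual statistics. The function names, concept IDs, and counts are hypothetical; only the threshold of 10 and the 5-year patient total come from this description.

```python
import math
import random
import statistics

SMALL_COUNT_THRESHOLD = 10  # from the description: counts <= 10 are excluded


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) value using Knuth's method.

    The mean is split into chunks of at most 500 so that exp(-chunk)
    does not underflow; a sum of independent Poisson draws is itself
    Poisson with the summed mean.
    """
    total = 0
    remaining = float(lam)
    while remaining > 0:
        chunk = min(remaining, 500.0)
        limit = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
        remaining -= chunk
    return total


def release_counts(true_counts, total_patients, rng):
    """Drop small counts, perturb the rest, and compute prevalence.

    Assumption: filtering happens on the true count, before perturbation.
    Assumption: prevalence is derived from the perturbed count.
    """
    released = {}
    for concept_id, count in true_counts.items():
        if count <= SMALL_COUNT_THRESHOLD:
            continue  # excluded to protect patient privacy
        noisy = poisson_sample(count, rng)
        released[concept_id] = {
            "count": noisy,
            "prevalence": noisy / total_patients,
        }
    return released


def annual_stability(annual_counts, annual_totals):
    """Mean and sample standard deviation of per-year prevalence.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    rates = [c / n for c, n in zip(annual_counts, annual_totals)]
    return statistics.mean(rates), statistics.stdev(rates)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    # Hypothetical concept IDs and counts, not real COHD values.
    toy_counts = {1: 8, 2: 10, 3: 2500}
    print(release_counts(toy_counts, total_patients=1_790_431, rng=rng))
    print(annual_stability([10, 20, 30], [100, 100, 100]))
```

In this sketch, concepts 1 and 2 are dropped because their counts do not exceed the threshold, and concept 3 is released with a perturbed count. A small standard deviation relative to the mean in `annual_stability` would indicate that a concept's prevalence is stable from year to year.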
Two data sets are available. The 5-year data set includes clinical data from calendar years 2013-2017 and 1,790,431 patients. The lifetime data set includes clinical data from all dates and 5,364,781 patients. While the lifetime data set captures a larger patient population and range of concepts, the 5-year data set has better underlying data consistency.
Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept ID in the OMOP Common Data Model.
COHD was developed at the Columbia University Department of Biomedical Informatics as a collaboration between the Weng Lab, Tatonetti Lab, and the NCATS Biomedical Data Translator program (Red Team). This work was supported in part by grants: NCATS OT3TR002027, NLM R01LM009886-08A1, and NIGMS R01GM107145.
Related resources:
OHDSI: https://www.ohdsi.org/
OMOP Common Data Model: https://github.com/OHDSI/CommonDataModel/wiki
OMOP vocabulary lookup: http://athena.ohdsi.org/
CITE THIS COLLECTION
Ta, Casey N.; Dumontier, Michel; Hripcsak, George; Tatonetti, Nicholas P.; Weng, Chunhua (2018). Columbia Open Health Data, a database of EHR prevalence and co-occurrence of conditions, drugs, and procedures. figshare. Collection. https://doi.org/10.6084/m9.figshare.c.4151252.v1