Poster SSBP 2019 Birmingham - History of rare diseases - A data-driven approach

Background: Rare diseases have been observed and documented since antiquity. Nevertheless, only since the development of molecular biology methods in the last century has it been possible to identify and investigate their underlying genetic causes. In this study we collected the first documented descriptions of rare diseases and the discoveries of their genetic causes, and used this information for further analysis.
Methods: Data on rare genetic diseases, their causative genes, and the literature reporting their first publication were collected from OMIM, Whonamedit, PubMed, and Google Scholar. The dataset was constructed and harmonised in a spreadsheet and as machine-readable RDF nanopublications. The data is available in a Figshare data collection. The acquired data identifiers were then used to harvest information from other resources such as Wikidata, DisGeNET, and Orphanet.
Results: According to the underlying data, the description of rare genetic diseases started in 1788 with osteogenesis imperfecta. The first discovery of a causative gene was in 1967, with the gene causing Lesch-Nyhan syndrome. The timeline shows that the discovery rate of genes is linked to developments in molecular biology techniques, while first descriptions of rare diseases follow the general trends in publication numbers. Analysis of citation scores reveals three groups: rare but highly researched diseases such as Rett syndrome and ALS, rare genetic causes of common diseases such as Alzheimer’s and Parkinson’s disease, and truly neglected diseases. Using the identifier mapping made available by DisGeNET, further information could be acquired, such as disease prevalence data from Orphanet, preferred publication journals from Wikidata, and disease super classes from DisGeNET.
Conclusion: The creation of this dataset is an example of how linking data can add value and allow new conclusions to be drawn, e.g. about the documentation of rare diseases and their causative genes. A crucial part is identifier and entity mapping, which allows data to be linked across different resources.
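The role of identifier mapping can be illustrated with a minimal Python sketch. It is not the pipeline used for this dataset: the function name `link_by_mapping`, the field names, and all identifiers and values (`SRC:…`, `EXT:…`) are hypothetical placeholders, chosen only to show how a mapping table lets records from one resource pick up annotations from another.

```python
from typing import Dict, List, Optional


def link_by_mapping(
    records: List[Dict[str, str]],
    id_mapping: Dict[str, str],
    external: Dict[str, str],
    source_key: str = "source_id",
) -> List[Dict[str, Optional[str]]]:
    """Attach an external annotation to each record via an identifier mapping.

    Records whose identifier has no mapping are kept, with None for the
    external fields, so that unmapped entries stay visible.
    """
    linked = []
    for record in records:
        # Translate the source identifier into the external resource's identifier.
        target_id = id_mapping.get(record[source_key])
        # Look up the annotation only if a mapping exists.
        value = external.get(target_id) if target_id is not None else None
        linked.append({**record, "target_id": target_id, "external_value": value})
    return linked


# Hypothetical placeholder data, not real records from any resource.
records = [
    {"source_id": "SRC:0001", "name": "disease A"},
    {"source_id": "SRC:0002", "name": "disease B"},
]
id_mapping = {"SRC:0001": "EXT:9001"}          # source ID -> external ID
external = {"EXT:9001": "prevalence class X"}  # external ID -> annotation

linked = link_by_mapping(records, id_mapping, external)
for row in linked:
    print(row)
```

Keeping unmapped records with empty fields, rather than dropping them, is a design choice in this sketch: it makes gaps in the identifier mapping easy to spot.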
Data available: https://figshare.com/collections/Gene-Rare_Disease-Provenance_dataset_collection/4400798