Mouse TB in vivo data over 70 years
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
Supplemental data for paper on mouse TB in vivo data
Selecting and translating in vitro leads for a disease into molecules with in vivo activity in an animal model of the disease is a challenge that takes considerable time and money. As an example, recent years have seen whole-cell phenotypic screens of millions of compounds yielding over 1500 inhibitors of Mycobacterium tuberculosis (Mtb). These must be prioritized for testing in the mouse in vivo assay for Mtb infection, a validated model utilized to select compounds for further testing. We demonstrate learning from in vivo active and inactive compounds using machine learning classification models (Bayesian, Support Vector Machines and recursive partitioning) consisting of 773 compounds. The Bayesian model predicted 8 out of 11 additional in vivo actives not included in the model as an external test set. Curation of seventy years of Mtb data can therefore provide statistically robust computational models to focus resources on in vivo active small molecule antituberculars. This highlights a cost effective predictor for in vivo testing elsewhere in other diseases.