Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma
Posted on 2023-09-01 - 19:11
Chemical-specific parameters are either measured in vitro or estimated using quantitative structure–activity relationship (QSAR) models. The existing body of QSAR work relies on extracting a set of descriptors or fingerprints, selecting a subset of them, and training a machine learning model. In this work, we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers (BERT), which allowed us to circumvent the need to calculate these chemical descriptors. In this approach, simplified molecular-input line-entry system (SMILES) strings were embedded in a high-dimensional space using a two-stage training approach. The model was first pre-trained on a masked SMILES token task and then fine-tuned on a QSAR prediction task. The pre-training task learned meaningful high-dimensional embeddings based on the relationships between the chemical tokens in the SMILES strings derived from the “in-stock” portion of the ZINC 15 dataset, a large dataset of commercially available chemicals.
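
The abstract does not say how SMILES strings are split into tokens or how tokens are masked during pre-training. A minimal sketch, assuming the regex tokenizer commonly used in SMILES language models and BERT's default masking scheme (15% of positions selected; of those, 80% replaced by a mask token, 10% by a random token, 10% left unchanged). `tokenize`, `mask_tokens`, and the `[MASK]` symbol are illustrative names, not the authors' code:

```python
import random
import re
from typing import List, Optional, Sequence, Tuple

# Regex tokenizer widely used for SMILES language models: bracket atoms, Br/Cl,
# aromatic atoms, bonds, branches, ring-closure digits and %NN labels.
# Assumption: the paper's own tokenizer may differ.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)
MASK = "[MASK]"  # hypothetical special-token name


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; fail loudly on characters the regex skips."""
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable SMILES: {smiles!r}")
    return tokens


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style masking: returns (model inputs, labels); None labels are ignored by the loss."""
    rng = rng or random.Random()
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(list(vocab))  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


if __name__ == "__main__":
    toks = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    masked, targets = mask_tokens(toks, sorted(set(toks)), rng=random.Random(0))
    print(masked)
    print(targets)
```

Labels are `None` wherever a position was not selected, so the pre-training loss would be computed only over the selected positions.
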
The fine-tuning task then perturbed the pre-trained embeddings to facilitate prediction of a specific QSAR endpoint of interest. The power of this model stems from the ability to reuse the pre-trained model for multiple fine-tuning tasks, reducing the computational burden of developing separate models for different endpoints. We used our framework to develop a predictive model for the fraction unbound in human plasma (fu,p). This approach is flexible, requires minimal domain expertise, and can be generalized to other parameters of interest for rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity.
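
Reusing one pre-trained model for several endpoints amounts to copying its weights, attaching a new task-specific output head, and updating both on each endpoint's labels. The toy sketch below assumes a mean-pooled embedding table in place of the full BERT encoder, plain SGD on squared error, and two made-up endpoints; the pre-trained table is random stand-in data, and `fine_tune`, `pool`, and both endpoints are hypothetical, not the authors' fu,p model:

```python
import copy
import random


def pool(emb, tokens):
    """Mean-pool token vectors into one molecule vector (toy stand-in for a BERT encoder)."""
    out = [0.0] * len(emb[tokens[0]])
    for t in tokens:
        for k, v in enumerate(emb[t]):
            out[k] += v / len(tokens)
    return out


def predict(model, tokens):
    emb, w, b = model
    return sum(wk * hk for wk, hk in zip(w, pool(emb, tokens))) + b


def mse(model, data):
    return sum((predict(model, toks) - y) ** 2 for toks, y in data) / len(data)


def fine_tune(pretrained, data, lr=0.05, epochs=200, seed=0):
    """Copy the pre-trained embeddings, attach a fresh linear head, and train both by SGD."""
    rng = random.Random(seed)
    emb = copy.deepcopy(pretrained)  # the shared pre-trained table is never modified
    dim = len(next(iter(emb.values())))
    w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]  # new task-specific head
    b = 0.0
    for _ in range(epochs):
        for tokens, y in data:
            h = pool(emb, tokens)
            err = sum(wk * hk for wk, hk in zip(w, h)) + b - y  # d(0.5 * err**2) / d(prediction)
            grad_h = [err * wk for wk in w]  # gradient w.r.t. the pooled vector, before w changes
            w = [wk - lr * err * hk for wk, hk in zip(w, h)]
            b -= lr * err
            for t in tokens:  # back-propagate through the mean pool: this perturbs the embeddings
                emb[t] = [e - lr * g / len(tokens) for e, g in zip(emb[t], grad_h)]
    return emb, w, b


# Hypothetical pre-trained table: random numbers standing in for embeddings that would
# come out of the masked-token pre-training stage.
rng = random.Random(42)
pretrained = {t: [rng.uniform(-1, 1) for _ in range(4)] for t in "CcNO()=1"}

# Single-character tokens only, so list() is enough to tokenize these toy SMILES.
molecules = [list(s) for s in ["CCO", "c1ccccc1", "CC(=O)N", "NCCO", "C=CC=C"]]

# Two made-up endpoints (not real ADMET data) sharing one pre-trained table.
task_a = [(m, sum(t in "NO" for t in m) / len(m)) for m in molecules]  # heteroatom fraction
task_b = [(m, m.count("c") / len(m)) for m in molecules]  # aromatic-carbon fraction

model_a = fine_tune(pretrained, task_a)
model_b = fine_tune(pretrained, task_b)
print(mse(model_a, task_a), mse(model_b, task_b))
```

Because each call works on a deep copy, `pretrained` is unchanged afterward and can seed further endpoints; only the small head and the perturbed embedding copy are task-specific.
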
CITE THIS COLLECTION
Riedl, Michael; Mukherjee, Sayak; Gauthier, Mitch (2023). Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma. ACS Publications. Collection. https://doi.org/10.1021/acs.molpharmaceut.3c00129