Posted on 2021-11-15, 19:33. Authored by Alexander Kensert, Robbin Bouwmeester, Kyriakos Efthymiadis, Peter Van Broeck, Gert Desmet, Deirdre Cabooter.
Machine learning is a popular technique to predict the retention
times of molecules based on descriptors. Descriptors and associated
labels (e.g., retention times) of a set of molecules can be used to
train a machine learning algorithm. However, descriptors are fixed
molecular features which are not necessarily optimized for the given
machine learning problem (e.g., to predict retention times). Recent
advances in molecular machine learning make use of so-called graph
convolutional networks (GCNs), which learn molecular representations from
atoms and their bonds to adjacent atoms and thereby optimize the molecular
representation for the given problem. In this study, two GCNs were
implemented to predict the retention times of molecules for three
different chromatographic data sets and compared to seven benchmarks
(including two state-of-the-art machine learning models). Additionally,
saliency maps were computed from trained GCNs to better interpret
the importance of certain molecular sub-structures in the data sets.
Based on the overall observations of this study, the GCNs performed
better than all benchmarks, either significantly outperforming them
(5–25% lower mean absolute error) or performing similarly to
them (<5% difference). Saliency maps revealed a significant difference
in molecular sub-structures that are important for predictions of
different chromatographic data sets (reversed-phase liquid chromatography
vs hydrophilic interaction liquid chromatography).
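
To illustrate how a GCN can build a molecular representation from atoms and their bonds, the following is a minimal sketch in NumPy. It is not the architecture used in this study: the symmetric degree normalization, the mean readout over atoms, the two-layer depth, the toy atom features, and all function names (`normalized_adjacency`, `gcn_layer`, `predict_retention_time`) are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np


def normalized_adjacency(adjacency):
    """Add self-loops and apply symmetric degree normalization (an assumed choice)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm, features, weights):
    """One graph convolution: mix each atom with its bonded neighbors, then ReLU."""
    return np.maximum(a_norm @ features @ weights, 0.0)


def predict_retention_time(adjacency, features, w1, w2, w_out):
    """Two graph convolutions, a mean readout over atoms, and a linear output."""
    a_norm = normalized_adjacency(adjacency)
    hidden = gcn_layer(a_norm, features, w1)
    hidden = gcn_layer(a_norm, hidden, w2)
    molecule_vector = hidden.mean(axis=0)  # one vector per molecule
    return float(molecule_vector @ w_out)


# Toy molecule: the heavy atoms of ethanol, C-C-O (hydrogens omitted).
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]], dtype=float)
# Hypothetical one-hot atom features: [is_carbon, is_oxygen].
features = np.array([[1, 0],
                     [1, 0],
                     [0, 1]], dtype=float)

# Untrained, random weights; in practice these are fitted to retention times.
rng = np.random.default_rng(0)
w1 = rng.normal(size=(2, 8))
w2 = rng.normal(size=(8, 8))
w_out = rng.normal(size=8)

prediction = predict_retention_time(adjacency, features, w1, w2, w_out)
```

Because the readout averages over atoms, the prediction does not depend on the order in which the atoms are listed. Training would adjust `w1`, `w2`, and `w_out` so that the learned representation suits the retention-time task, in contrast to fixed descriptors.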