Posted on 2024-02-29, 20:11. Authored by Son Gyo Jung, Guwon Jung, Jacqueline M. Cole.
Molecular design depends heavily on optical properties for applications
such as solar cells and polymer-based batteries. Accurate prediction
of these properties is essential, and multiple predictive methods
exist, from ab initio to data-driven techniques.
Although theoretical methods, such as time-dependent density functional
theory (TD-DFT) calculations, have well-established physical relevance
and are among the most popular methods in computational physics and
chemistry, they exhibit errors that are inherent in their approximate
nature. These high-throughput electronic structure calculations also
incur a substantial computational cost. With the emergence of big-data
initiatives, cost-effective, data-driven methods have gained traction,
although their usability is highly contingent on the degree of data
quality and sparsity. In this study, we present a workflow that employs
deep residual convolutional neural networks (DR-CNN) and gradient
boosting feature selection to predict peak optical absorption wavelengths
(λmax) exclusively from SMILES representations of
dye molecules and solvents; one would normally measure λmax using
UV–vis absorption spectroscopy. We use a multifidelity
modeling approach, integrating 34,893 DFT calculations and 26,395
experimentally derived λmax values, to deliver more
accurate predictions via a Bayesian-optimized gradient boosting machine.
Our approach is benchmarked against the state of the art that is reported
in the scientific literature; results demonstrate that representations learnt
via a DR-CNN workflow that is integrated with other machine learning
methods can accelerate the design of molecules for specific optical
characteristics.
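
As a toy illustration of the multifidelity idea, the sketch below fits a linear correction that maps calculated (low-fidelity) λmax values onto measured (high-fidelity) ones. It is not the workflow of this study, which uses DR-CNN-learnt representations and a Bayesian-optimized gradient boosting machine; the least-squares fit is a deliberately simpler stand-in, and the function names and all numbers are hypothetical placeholders.

```python
from statistics import mean


def fit_linear_correction(low_fidelity, high_fidelity):
    """Ordinary least-squares fit of high ~ slope * low + intercept."""
    x_bar = mean(low_fidelity)
    y_bar = mean(high_fidelity)
    sxx = sum((x - x_bar) ** 2 for x in low_fidelity)
    sxy = sum(
        (x - x_bar) * (y - y_bar)
        for x, y in zip(low_fidelity, high_fidelity)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def correct(low_value, slope, intercept):
    """Map a low-fidelity value onto the high-fidelity scale."""
    return slope * low_value + intercept


# Synthetic placeholder values in nm (assumption: NOT data from the study).
calculated_nm = [400.0, 450.0, 500.0, 550.0]
measured_nm = [420.0, 465.0, 510.0, 555.0]

slope, intercept = fit_linear_correction(calculated_nm, measured_nm)
corrected = correct(480.0, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.1f} nm, corrected={corrected:.1f} nm")
```

A model of this kind only needs paired calculated and measured values for a subset of molecules; a gradient boosting machine plays the analogous role when the relationship between the two fidelities is non-linear.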