Robust Machine Learning Inference from X‑ray Absorption Near Edge Spectra through Featurization

Chen, Yiming; Chen, Chi; Hwang, Inhui; Davis, Michael J.; Yang, Wanli; Sun, Chengjun; Lee, Gi-Hyeok; McReynolds, Dylan; Allan, Daniel; Marulanda Arias, Juan; Ong, Shyue Ping; Chan, Maria K. Y.

doi:10.1021/acs.chemmater.3c02584.s001

Posted on 2024-03-01, 19:05.

X-ray absorption spectroscopy (XAS) is a commonly employed technique for characterizing functional materials. In particular, X-ray absorption near edge spectra (XANES) encode local coordination and electronic information, and machine learning (ML) approaches to extract this information are of significant interest. To date, most ML approaches for XANES have used the raw spectral intensities as input, overlooking the potential benefits of incorporating spectral transformations and dimensionality reduction techniques into ML predictions. In this work, we systematically compared the impact of different featurization methods on the performance of ML models for XAS analysis. We evaluated the classification and regression capabilities of these models on computed data sets and validated their performance on previously unseen experimental data sets. Our analysis revealed that the cumulative distribution function (CDF) feature achieves both high prediction accuracy and exceptional transferability to experimental spectra. We attribute this robust performance to the CDF feature's tolerance of horizontal shifts in the spectra (shifts along the energy axis), which is crucial when validating models on experimental data. While this work focuses exclusively on XANES analysis, we anticipate that the methodology presented here will serve as a versatile tool for the broader spectroscopy community.
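The abstract credits the CDF feature's robustness to its tolerance of energy shifts. The sketch below illustrates that idea on a toy spectrum; it is not the authors' featurization code. It assumes the CDF is the trapezoidal running integral of the spectrum, normalized to end at 1. The function names `cdf_feature` and `toy_xanes`, the arctan-plus-Gaussian spectrum, the 0 to 50 eV grid, and the 1 eV shift are all hypothetical choices made for illustration.

```python
import numpy as np


def cdf_feature(energy: np.ndarray, mu: np.ndarray) -> np.ndarray:
    """Normalized cumulative integral of a spectrum (assumed CDF-style feature).

    Assumes a monotonically increasing energy grid. Negative intensities are
    clipped to zero so the result is non-decreasing from 0 to 1.
    """
    energy = np.asarray(energy, dtype=float)
    mu = np.clip(np.asarray(mu, dtype=float), 0.0, None)
    # Trapezoid-rule area of each interval between neighboring grid points.
    areas = 0.5 * (mu[1:] + mu[:-1]) * np.diff(energy)
    cdf = np.concatenate(([0.0], np.cumsum(areas)))
    return cdf / cdf[-1]


def toy_xanes(energy: np.ndarray) -> np.ndarray:
    """Hypothetical toy spectrum: arctan absorption edge plus a Gaussian white line."""
    edge = 0.5 + np.arctan(energy - 10.0) / np.pi
    white_line = 1.5 * np.exp(-0.5 * (energy - 12.0) ** 2)
    return edge + white_line


def relative_change(a: np.ndarray, b: np.ndarray) -> float:
    """L2 norm of the difference, relative to the norm of the reference a."""
    return float(np.linalg.norm(a - b) / np.linalg.norm(a))


energy = np.linspace(0.0, 50.0, 501)  # hypothetical relative energy grid, eV
shift = 1.0  # hypothetical computed-vs-experimental offset, eV

mu = toy_xanes(energy)
mu_shifted = toy_xanes(energy - shift)  # same spectrum moved 1 eV to higher energy

raw_change = relative_change(mu, mu_shifted)
cdf_change = relative_change(cdf_feature(energy, mu), cdf_feature(energy, mu_shifted))

print(f"raw intensities: {raw_change:.3f}")
print(f"CDF feature:     {cdf_change:.3f}")
```

On this toy spectrum the CDF changes much less than the raw intensities under the same shift. A shift moves the sharp white line, so the raw intensity changes by roughly the local slope times the shift. The CDF at a given energy changes only by about the spectral area inside the shifted window divided by the total area.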