Rapid Detection of Physicochemical Indicators of Tobacco Flavorings Using Fourier-Transform Near Infrared Spectroscopy with Chemometrics and Machine Learning

dataset
posted on 2025-05-06, 12:08 authored by Qinlin Xiao, Jian Zheng, Jing Wen, Fada Deng, Ruifang Gu, Li Li, Yong He, Juan Yang
Timely and rapid monitoring of the quality of tobacco flavorings is crucial for accurate quality management of cigarette products. In this study, FT-NIR spectroscopy combined with chemometrics and machine learning was used to detect physicochemical indicators of tobacco flavorings. FT-NIR spectra were collected from 1,608 flavoring samples, encompassing 145 categories and 90 production batches from actual industrial scenarios. Three physicochemical indicators were measured as reference values: acid value, relative density, and refractive index. The effects of different spectral preprocessing methods (standard normal variate transformation (SNV), multiplicative scatter correction (MSC), and normalization) were compared. Least angle regression (LAR), the successive projections algorithm (SPA), and random frog (RF) were used to select characteristic wavelengths. Partial least-squares regression (PLSR), decision tree (DT), least-squares support vector machine (LSSVM), and convolutional neural network regression (CNNR) were applied to establish detection models. For acid value, the normalization-SPA-LSSVM model achieved the best performance, with an R²p of 0.929, an RMSEP of 1.155, and an RPD of 3.741. For relative density, the MSC-LAR-LSSVM model performed best, with an R²p of 0.951, an RMSEP of 0.018, and an RPD of 4.481. For refractive index, the SNV-SPA-LSSVM model obtained satisfactory results, with an R²p of 0.955, an RMSEP of 0.004, and an RPD of 4.664. The results show that FT-NIR spectroscopy is an effective approach for detecting physicochemical indicators of large-scale industrial tobacco flavorings and holds promise for accurate quality assessment of tobacco flavoring products. The CNNR model was not consistently superior to the conventional models, especially when the number of features used for model building was relatively limited.
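
For readers working with the spectra, the two scatter-correction steps named in the abstract (SNV and MSC) and the three reported prediction metrics can be sketched as follows. This is a minimal illustration using common textbook definitions, not the authors' code: the function names, the row-per-sample array layout, the use of the mean spectrum as the MSC reference, and the RPD convention (sample standard deviation of the reference values divided by RMSEP) are all assumptions, and the demo data are synthetic rather than taken from the dataset.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: scale each spectrum (row) by its own mean and std."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Assumption: the reference defaults to the mean spectrum of the input set.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        ref = spectra.mean(axis=0)
    else:
        ref = np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row ~ intercept + slope * ref, then undo the offset and the scaling.
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected


def prediction_metrics(y_true, y_pred):
    """Return R2p, RMSEP and RPD for a prediction set.

    Assumption: RPD = sample std of the reference values / RMSEP.
    Other conventions exist (for example, using the bias-corrected SEP).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmsep = np.sqrt(np.mean(residuals ** 2))
    ss_total = np.sum((y_true - y_true.mean()) ** 2)
    r2p = 1.0 - np.sum(residuals ** 2) / ss_total
    rpd = y_true.std(ddof=1) / rmsep
    return {"R2p": r2p, "RMSEP": rmsep, "RPD": rpd}


# Synthetic demo: one base curve distorted by a per-sample offset and gain.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0.0, 3.0, 200)) + 1.5
offsets = rng.uniform(-0.2, 0.2, size=(5, 1))
gains = rng.uniform(0.8, 1.2, size=(5, 1))
demo_spectra = offsets + gains * base

demo_snv = snv(demo_spectra)
demo_msc = msc(demo_spectra)
demo_metrics = prediction_metrics([1.0, 2.0, 3.0, 4.0, 5.0],
                                  [1.1, 1.9, 3.2, 3.8, 5.0])
```

In the synthetic demo every spectrum is an exact offset-and-gain copy of one curve, so MSC maps all rows onto the mean spectrum. Real FT-NIR spectra also vary chemically, so the corrected rows will not coincide.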
