Efficient Mass Spectrometry Peak Detection by Combining Resolution Enhancement and Image Segmentation

Abstract Mass spectrometry data may be affected by random noise and baseline drift due to experimental instruments and conditions, posing significant challenges for detecting spectral peaks, particularly when identifying weak and separating overlapping peaks. To increase the sensitivity and enhance the resolution, we propose a mass spectral peak detection algorithm that integrates resolution enhancement and image segmentation. Initially, the extended Mexican hat wavelet is proposed by integrating the peak sharpening method with its wavelet. This approach accurately transforms mass spectra into wavelet space using the continuous wavelet transform. Subsequently, the triangular single-peak thresholding method, a more suitable threshold segmentation approach for spectral analysis, is introduced to identify ridges in the two-dimensional wavelet space. Compared to traditional Otsu and its improved variants, long-tailed single-peaked histograms are more effectively processed by this method with lower computational complexity, enabling faster identification of segmentation thresholds and image segmentation. Ultimately, peak positions are determined by utilizing ridge and valley lines in wavelet space along with the original spectrum. To evaluate the performance of the peak recognition algorithm, two metrics are introduced: the receiver operating characteristic (ROC) curve and the balanced F score (F1 score). When compared to multi-scale peak detection (MSPD), continuous wavelet transform and image segmentation (CWT-IS), the developed approach is more suitable for weak and highly overlapping peaks. The robustness and practicality of the method are verified through peak detection using matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectra.


Introduction
Mass spectrometry (MS) is a technique that identifies compounds through the preparation, separation, and detection of ions (Bauermeister et al. 2022).It is characterized by high specificity, high sensitivity, and rapid response (Zhou, Zhang, and Ouyang 2022).MS enables qualitative and quantitative analysis of substances by identifying the location, intensity, and peak area of the peaks.It has been extensively applied in life sciences (Cao et al. 2021;Greer et al. 2021), analytical chemistry (Marshall et al. 2021;Zhao et al. 2023), and environmental sciences (Bahureksa et al. 2021;Dibke, Fischer, and Scholz-B€ ottcher 2021).
Peak detection results reveal various biological and chemical information about the analyzed substance, significantly affecting subsequent analyses.Noise can result in a high false detection rate for direct peak detection, and the smoothing process may filter out weak and overlapping peaks, increasing the leakage rate.Additionally, baseline drift can affect peak detection outcomes.Thus, investigating highly accurate peak detection methods remains an essential research topic in mass spectrometry.
So far, various peak detection methods have been proposed, such as amplitude threshold (Modak, Taha, and Raheem 2021), first-or second-order derivative (Tsugawa et al. 2015), curve fitting (Sugimoto et al. 2012), and wavelet transform.Among these methods, wavelet transform has gained popularity in peak detection research in recent years due to its multi-resolution characteristic.Lange et al. (2006) proposed the use of continuous wavelet transform for peak detection and peak parameter estimation at a certain scale, but it is more difficult to choose a suitable section of continuous wavelet transform scale.Du, Kibbe, and Lin (2006) study developed a powerful pattern matching method, suggesting the use of the two-dimensional continuous wavelet transform coefficient matrix of the signal to identify characteristic peaks.This is achieved by visualizing the relationship between the image ridges and the peak positions in the mass spectrum, reducing the impact of signal noise and baseline variations.However, this method is still limited in detecting weak and overlapping peaks.Gregoire, Dale, and Dover (2011) used a weighted wavelet mother function to enhance the peak finding algorithm, improving its ability to identify overlapping peaks in X-ray spectra.However, the accuracy for weak peaks within overlapping peaks remained low.Nguyen et al. (2010) proposed peak detection using over-zero lines of wavelet coefficients with Gaussian derivative wavelets to accurately detect true peaks with a low false discovery rate.However, the detection of overlapping peaks was unsatisfactory.Zhang et al. (2015) introduced multiscale peak detection utilizing ridges, valleys, and over-zero points information in the continuous wavelet transform (CWT) coefficient matrix, improving peak detection accuracy and better identifying overlapping peaks.However, this method is insensitive to weak peak detection and has a high false detection rate.Yang et al. (2020) suggested an image segmentation method for removing false ridges, effectively reducing the number of false peaks.However, this segmentation method is computationally complex and using the number of image pixels as a criterion for distinguishing peaks proved inaccurate.
From the above, it is clear that how a peak detection method balances high sensitivity to weak peaks, high resolution of overlapping peak separation, and low false detection rate of false peaks is the key to improve the performance.In this study, a more efficient algorithm for mass spectral peak detection is proposed, called resolution enhancement and image segmentation peak detection (REISPD), which combines resolution enhancement and image segmentation.In this algorithm, the Gaussian function is sharpened and employed as a transform operator.A convolutional sliding transform, equivalent to performing the wavelet transform, is applied to the original spectral data.This approach aims to filter and enhance the separation of overlapping peaks, ultimately improving their detection.A triangular single-peak threshold segmentation method, better suited for spectrogram analysis, is introduced.This method allows for quick image segmentation due to its simple calculations, addressing the limitation of traditional Otsu and its improved variants in handling long-tailed single-peak histograms.Additionally, the method enhances weak peak identification while eliminating false peaks outside the peak region.

Peak sharpening
The peak sharpening algorithm proposed by Li et al. (2016) for weighting the signal and its negative second-order differential is expressed as follows: where R is the resolution enhancement signal, F is the original signal, F (2) is the second order derivative of the original signal, and k is the sharpening factor.
The essence of sharpening is the differentiation of the signal.However, due to the noise amplification during the differentiation operation, filtering is necessary before sharpening a signal that contains noise.

Gaussian sharpening
According to the properties of convolution, the nth order derivative of the signal F(t) can be expressed as: From the above equation, it becomes evident that the order of signal differentiation and smoothing can be interchanged without affecting the final calculation results.To obtain the differential of the smoothed signal, the differential of the smoothing operator can be computed first and then applied to determine the signal's differential.
By selecting the Gaussian derivative function as the smoothing operator, the Gaussian derivative filter can be derived.According to Bruce, Morgan, and Larsen (2001), derivatives of all orders of the Gaussian function meet the admissibility condition of the wavelet function, and these derivatives are considered wavelets.Consequently, the convolution process involving the signal and the Gaussian derivative as a smoothing operator can be executed using the wavelet transform.The expression of Gaussian function: where A, l and r denote the peak height, peak position and peak width of the Gaussian function, respectively.Bringing G(t) into Eq.(1) gives where g(t) is the Gaussian sharpening operator.

Mexican hat function combined with Gaussian sharpening
Since the Mexican hat function is the negative second order derivative of the Gaussian function, the result of the convolution of the sharpened signal with the Mexican hat function can be written in the following form: where M(t)¼G(t) Ã G(t), by the nature of the convolution of two Gaussian functions is still a Gaussian function, M(t) is a Gaussian function, ÀM (2) (t) is essentially a Mexican hat function, and kM (4) (t) ÀM (2) (t) is an extension of the Mexican hat function, the socalled extended Mexican hat wavelet.
Figure 1 shows the Mexican hat wavelet and extended Mexican hat wavelet of the same scale with normalization.The extended Mexican hat wavelet has smaller full width at half maxima (FWHM) and sharper main peaks compared to the Mexican hat wavelet.As a result, it can more effectively identify overlapping peaks in wavelet space, particularly weak peaks within those overlaps.
In this study, M (2) (t) and M (4) (t) are defined as follows: Figure 2 presents the wavelet transform results for the Mexican hat wavelet and the extended Mexican hat wavelet, both applied at the same scale to two strongly overlapping peaks.The extended Mexican hat wavelet exhibits significant values at both peak positions, enabling the detection of weak peaks for subsequent ridge identification.In contrast, the Mexican hat wavelet displays significant values solely at the strong peak positions within the overlapping peaks, causing the weak peaks to be overlooked.

Single-peak thresholding-based image segmentation
Traditional Otsu and improved Otsu threshold segmentation The fundamental concept of traditional Otsu's method involves treating the image to be segmented as comprising two classes: one for the background and another for the target.The method uses variance to gauge the difference between the target and the background, with the gray level exhibiting the maximum variance between these classes being deemed the optimal threshold.Its mathematical expression is described by: where p 0 is the proportion of target points to the total image, l 0 is its average gray value, p 1 is the proportion of background points to the image, l 1 is its average gray value, l T is the average gray value of the image, and L is the gray level.Equation ( 8) shows that when a clear double peak appears in the histogram, conventional Otsu uses the grayscale value corresponding to the valley between the double peaks as the threshold value to achieve the separation of background and target.However, if the histogram is single-peaked or near single-peaked, the separation of this method is reduced.The grayscale histogram derived from logical mapping of the wavelet coefficient matrix, obtained through wavelet transform, often approximates a single peak.Applying Otsu's method directly for threshold segmentation results in large grayscale thresholds, causing some weak peaks with grayscale values in the lower gray level range to be considered background noise and subsequently ignored.Due to the limitations of traditional Otsu's method in single-peak threshold segmentation, several improved versions have been proposed, such as Fuzzy Otsu (Yang et al. 2020), median-based Otsu (Deng et al. 2021), and intelligent algorithm-based Otsu (Zhou et al. 2022).Nonetheless, fuzzy Otsu relies on the grayscale threshold value derived from traditional Otsu, combined with the fuzzy C-means algorithm to determine the final threshold.However, this method is computationally demanding and illsuited for large-scale spectrometry analysis.In the case of median-based Otsu, the threshold value is occasionally too large, causing portions of the target to be misclassified as background.Intelligent algorithm-based Otsu necessitates setting specific parameter values according to the problem at hand.However, this approach is prone to local optima, which can compromise segmentation results.

Triangular single-peak threshold segmentation
The triangular single-peak threshold segmentation method (Rosin 2001), a shape-based approach, is particularly effective when the image histogram exhibits a distinct singlepeak feature, while the wavelet coefficient matrix of the spectrogram corresponds to a histogram that is often single-peaked.Moreover, its low computational complexity enables rapid image segmentation implementation.
The algorithm's core idea involves drawing a straight line from the histogram's peak, following the direction of increasing gray values, until reaching the non-empty bin immediately after the first empty bin.The threshold is subsequently selected as the gray value corresponding to the point in this interval that maximizes the distance between the histogram and the straight line, as shown in Figure 3. Figure 3A displays the spectral peaks with noise, while Figure 3B presents the corresponding grayscale images of CWT coefficients simulated using Gaussian functions.

REISPD procedure for detecting peaks
The algorithm proposed in this paper can be divided into four steps: (1) selecting the extended Mexican hat wavelet, (2) segmenting the image using the triangle algorithm, (3) extracting the ridges and valleys in the wavelet coefficient matrix, and (4) locating the peaks.The detailed steps are as follows: A. The extended Mexican hat wavelet is chosen as the mother wavelet for the transformation, with the sharpening factor k typically set to 1.After the continuous wavelet transform, a two-dimensional CWT coefficient matrix is obtained, and the matrix coefficients are mapped to the range of 0-255 gray levels using a logical mapping approach (defined in Equation (S5)).B. A triangle algorithm for single peaks is used to determine the threshold value and segment the image.In the mapped matrix coefficients, points with gray values greater than the threshold are deemed targets, while those with gray values less than the threshold are considered backgrounds.This separation distinguishes the background and target areas, corresponding to the noise and ridge regions, respectively.
C. The estimation of ridge and peak positions is closely related; thus, considering both ridges and valleys in the original spectrum allows for accurate estimation of spectral peak positions.As a result, separate extraction of ridges and valleys in wavelet space is necessary.Ridge extraction begins with a sliding window search method to identify local maxima (defined in Equation ( S3)) at each scale in the ridge region of the CWT coefficient matrix.Once the search is completed, a two-dimensional matrix of local maxima is obtained.Next, the smallest scale is chosen to be the initial scanning scale, and the local maxima's location at that scale serves as the ridge root.The local maxima's location at the next scale is compared with the last value's location in the scanned ridge.According to the closest distance principle, the local maxima's location that satisfies the condition is added to the ridge.By traversing all scales and adhering to the closest distance principle, the extraction of the ridge is completed.Valley extraction occurs at every scale of the CWT coefficient matrix.The same sliding window search method is employed to identify local minima (defined in Equation (S4)), which in turn help estimate the starting and ending points of peak locations.D. The ridges are not straight lines and shift with increasing scale, so they can only be considered as a rough estimate of the peak position.To approximate the peak position, the ridge coordinates with the highest frequency of occurrence in each ridge are selected.Next, the maximum of the original spectrum between the starting and ending points is searched within the interval defined by the valley and ridge coordinates.The corresponding position of this maximum value is the final determined peak position.

Experiment
In order to verify the advantage of the algorithm in dealing with strongly overlapping peaks, a mass spectrum containing Gaussian white noise was generated using Eq. ( 3), which contains a total of six peaks including two weak peaks and two overlapping peaks.The specific parameters of the peaks are shown in Table 1.Further, to illustrate the performance of the proposed peak detection algorithm, a virtual matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) spectral data set was used for validation.The dataset consists of spectra generated by a virtual MALDI-TOF instrument, based on the fundamental principles of the instrumentation as described by Morris et al. (2005).In order to show the advantages of the method, it is compared with two methods, multiscale peak detection (MSPD), and continuous wavelet transform and image segmentation (CWT-IS).
Finally, to more effectively evaluate the effectiveness and feasibility of the REISPD for peak detection in real spectra, a real mass spectrometer was used to examine the actual samples.The mass spectra were generated by QitVenture 1, a portable MS instrument employing an electron impact (EI) ion source.A standard styrene sample with a concentration of 1 ppb was used in the experiment: the analyte is a commonly detected substance in air pollutant.

Results and discussion
Performance on overlapping peaks Figure 4A and Figure 4B show the peak detection using the extended Mexican hat wavelet and Mexican hat wavelet as the mother wavelet, respectively.All weak peaks and overlapping peaks are identified, and the detected peak positions highly coincide with the true positions.However, when encountering strong overlapping peaks (6th l ¼ 750), as shown in Figure 4C and 4D, all peaks can still be identified using the extended Mexican hat wavelet.In contrast, the Mexican hat wavelet fails to detect weak peaks within the strong overlapping peaks.The fundamental reason is that the extended Mexican hat wavelet integrates a peak sharpening algorithm based on the Mexican hat wavelet, resulting in enhanced resolution.
Table 1.Peak height (A), peak position (l) and peak width (r 2 ) parameters of each peak in the simulated spectrum.

Performance against weak peaks
Figure 4E illustrates the threshold determination using Otsu, fuzzy Otsu, median-based Otsu, and triangle algorithms for the spectrum in Figure 4A.Otsu has the highest threshold among the four algorithms, causing many peaks with strong intensity to be considered noise and filtered out.This reflects Otsu's limitation in handling cases where the grayscale map has a single peak.While fuzzy Otsu and median-based Otsu address some of these shortcomings by reducing the threshold, they are still larger than those obtained by the triangle algorithm, leading to some weak peaks being missed.The triangle algorithm not only features a simple algorithm and low computational complexity but also effectively determines the threshold value of single-peak grayscale maps, enhancing weak peak detection.

Details of the identification and extraction of ridges
Figure 4F shows the heat map corresponding to the coefficient matrix of the simulated spectrum obtained by CWT, and the brighter region represents the larger wavelet coefficients in that region.The coefficient matrix is subsequently converted into a grayscale image using the logical mapping mode, as shown in Figure 4G, where the peak contours can be clearly seen.Figure 4H shows the local maxima identified by the sliding window search method for the wavelet coefficient matrix.Because the spectrum contains noise, there are many local maxima with small wavelet coefficients in the low-scale region.
After segmentation of the image using the triangular single-peak thresholding method, the ridge region is determined.The search for peak ridges is performed within the ridge region in the local maximum matrix, and the final ridges are shown in Figure 4I.This approach effectively removes the noise ridges and extracts the ridges corresponding to each peak.

Evaluation criteria
To evaluate the performance of REISPD, the widely used receiver operating characteristic (ROC) curve is used as a standard and compared to the performance of other methods.The ROC curve is based on the true positive rate (TPR) and the false discovery rate (FDR).TPR and FDR are defined in Equations (S6) and Eq.(S7), respectively.When two peak detection methods have the same TPR, the method with a lower FDR is considered more robust.Meanwhile, a method with a higher TPR demonstrates better performance at the same FDR.To balance the relationship between TPR and FDR, and to consider the impact of both metrics on the performance of the peak detection method, the balanced F score (F 1 score) is utilized as an alternative evaluation criterion.F 1 score is defined in Equation (S8).
The virtual MALDI-TOF dataset contains 25 sets of spectral data, of which each set has 100 spectral data.In order to verify the recognition ability of the peak detection method, 10 sets of spectral data are randomly selected, and 10 spectral data are randomly selected from each set.These 100 spectra are analyzed using MSPD, CWT-IS, and REISPD, respectively.The TPR and FDR of each spectrum can be obtained by different parameter settings.After obtaining all TPR and FDR for 100 spectra, the average values of FDR and TPR are calculated.The ROC curves of the three methods are plotted based on the average values of FDR and TPR under the same parameter settings.The parameters of CWT-IS are set between 20 and 300, and the threshold of MSPD is the same as REISPD, set between 0.01 and 1.

Performance comparison of different peak detection methods
The ROC curves of the three peak detection methods are shown in Figure 4J.As seen from the ROC curves, the FDR of MSPD varies significantly with the increase of TPR.This indicates that selecting an appropriate parameter is crucial for the algorithm's peak detection performance, which presents a considerable constraint on the practical application of the algorithm.In contrast, the TPR and FDR of CWT-IS are insensitive to the parameter settings, and the method has better robustness.However, when the FDR is greater than approximately 0.3, the TPR of MSPD is higher than for CWT-IS, and the performance of the former is significantly better than that of the latter because the pixel-based image segmentation method used by CWT-IS is not very good at removing noise, mistaking noise for true peaks.
According to Yang et al. (2020), when the FDR is greater than 0.4, the detection results become meaningless.It can be seen that, in the range of FDR less than 0.4, the TPR corresponding to each FDR of REISPD is higher than the other two methods, while maintaining a low FDR, thereby achieving the best performance among the three.The algorithm's performance is attributed to its use of extended Mexican hat wavelet and triangular single-peak threshold segmentation.This approach enhances the detection of overlapping peaks and weak peaks while simultaneously considering ridges, valleys, and original spectra in the wavelet space, effectively avoiding false peak misidentification.
Table 2 gives the optimal F 1 score and the corresponding TPR and FDR for each dataset processed by the three peak detection methods, and it can be seen that the F 1 metric values of REISPD are larger than those of MSPD and CWT-IS, which are more suitable for peak detection.
The REISPD, MSPD, and CWT-IS methods all use the continuous wavelet transform to convert the spectral information into wavelet space because the selected wavelet basis function has zero mean and symmetry, and the signal noise and baseline drift are automatically removed during the transform.In order to visualize the differences in the detection performance of these methods in terms of weak peaks, overlapping peaks and false peaks, the spectra selected from the MALDI-TOF spectral data set are analyzed using each of the three methods, as presented in Figure S1.As demonstrated in the The corresponding true positive rate (TPR) and the false discovery rate (FDR) are also provided.
magnified figure, MSPD and CWT-IS exhibit poor performance in detecting strongly overlapping peaks.They only detect the more prominent peaks, while the weaker ones are ignored.MSPD detects more spurious peaks, as illustrated by the red dashed ellipse in Figure S1B.CWT-IS causes some weak peaks to be missed, as depicted by the red dashed ellipse in Figure S1C.As seen in Figure S1A, REISPD successfully compensates for the shortcomings of these two methods, providing more sensitive detection of overlapping and weak peaks, while reducing the detection of spurious peaks.REISPD achieves this by incorporating peak sharpening, using the extended Mexican hat wavelet as the mother wavelet, and by employing the triangular single-peak thresholding segmentation method, which identifies a threshold that better separates the ridge region from the noise.
The results of REISPD for peak identification on a real spectrum of styrene, normalized by peak intensity, are presented in Figure 5.The figure demonstrates the algorithm's accurate identification of weak peaks in the spectra, while effectively avoiding misjudgment of false peaks.Based on the results, it can be concluded that REISPD is highly applicable for analyzing real mass spectra.

Conclusion
In this study, a more efficient mass spectrometry peak detection method combining resolution enhancement and image segmentation is proposed.By combining Gaussian sharpening and Mexican hat wavelets, the extended Mexican hat wavelet is introduced using the Gaussian convolution property.This enhances the separation and detection capabilities for overlapping peaks, particularly strong overlapping peaks, without increasing computational effort.Furthermore, the REISPD method employs a triangular algorithm that is better suited for handling histograms as single peaks compared to Otsu and its improved method.This results in better separation of background and target, improved detection of weak peaks, and reduced false positives.Also the triangle

Figure 1 .
Figure 1.Comparison of Mexican hat wavelet and extended Mexican hat wavelet on the same scale.

Figure 2 .
Figure 2. Mexican hat wavelet with two strong overlapping Gaussian peaks and extended Mexican hat wavelet transform: A (peak height), l (peak position), and r 2 (peak width).

Figure 3 .
Figure 3. (A) Spectral peaks with noise simulated by Gaussian function.(B) Grayscale histogram with triangle algorithm.

Figure 4 .
Figure 4. (A) Extended Mexican hat wavelet for peak identification of simulated spectra.(B) Mexican hat wavelet for peak identification of simulated spectra.(C) Extended Mexican hat wavelet for peak identification of simulated spectra containing strong overlapping peaks.(D) Mexican hat wavelet for peak identification of simulated spectra containing strong overlapping peaks.(E) Gray scale histogram and threshold.(F) CWT (continuous wavelet transform) coefficients.(G) Gray scale map.(H) Local maxima.(I) REISPD (resolution enhancement and image segmentation peak detection) extracted ridges.(J) ROC (receiver operating characteristic) curves of three peak detection methods.

Figure 5 .
Figure 5. Results for identifying the true MS peak of styrene by REISPD.

Table 2 .
F 1 score (balanced F score) obtained by three algorithms: MSPD (multiscale peak detection), CWT-IS (continuous wavelet transform and image segmentation method) and REISPD (resolution enhancement and image segmentation peak detection).