AdaD-FNN for Chest CT-Based COVID-19 Diagnosis

Coronavirus disease 2019 (COVID-19) has caused a global public health emergency since December 2019, inflicting huge economic losses. To help radiologists strengthen their recognition of COVID-19 cases, we developed a computer-aided diagnosis system based on deep learning to automatically classify chest computed tomography images of COVID-19, tuberculosis, and healthy control subjects. Our novel classification model, AdaD-FNN, sequentially transfers the trained knowledge of one FNN estimator to the next while updating the weights of the samples in the training set with a decaying learning rate. This model inhibits the network from memorizing noisy information and improves the learning of complex patterns in hard-to-identify samples. Moreover, we designed a novel image preprocessing model, F-U2MNet-C, which enhances image features using fuzzy stacking and eliminates interference factors using U2MNet segmentation. Extensive experiments were conducted on four publicly available datasets, namely TLDCA, UCSD-AI4H, SARS-CoV-2, and TCIA, yielding classification accuracies of 99.52%, 92.96%, 97.86%, and 91.97%, respectively. Our system delivers compelling performance for assisting COVID-19 detection when compared with 22 state-of-the-art methods. We hope to help link biomedical research and artificial intelligence, and to assist the diagnoses of doctors, radiologists, and inspectors at epidemic prevention sites in the real world.


I. INTRODUCTION
Coronavirus disease 2019 (COVID-19) is an ongoing pandemic disease caused by the SARS-CoV-2 virus. It rapidly swept worldwide, generating a global public health emergency within merely one month [1], [2]. According to statistical data from the World Health Organization, as of December 23, 2021, a total of 276,436,619 COVID-19 cases had been confirmed, resulting in 5,374,744 deaths. As the outbreak spread, many countries were affected, and a great deal of effort was devoted to anti-epidemic projects. However, many cases have arisen in communities with no history of travel to an outbreak area or contact with infected people. In such cases, health care workers need a highly sensitive COVID-19 diagnostic tool to ensure that no case is missed, especially among those with false-negative reverse transcriptase-polymerase chain reaction (RT-PCR) results [3].
Chest computed tomography (CCT) examination is regarded as a useful mainstream auxiliary technique for diagnosing viral pneumonia cases associated with COVID-19 and was included in the national diagnostic treatment protocol (7th edition) of China [4]. Compared to biomedical methods such as nucleic acid testing using real-time RT-PCR, CCT offers several outstanding advantages: (i) The normal nucleic acid testing process can take days to return results [5], while CCT approaches can report back in minutes. (ii) Although nucleic acid testing is considered the 'gold standard' for clinical diagnosis, false-negative results have been a persistent problem [6]. This limitation can be offset by the reported high sensitivity of CCT in diagnosing COVID-19 [7], [8].
(iii) From an environmental point of view, CCT approaches reduce the consumption of materials such as swabs, paper boxes, and plastic bags, which to some degree relieves possible pollution problems. In addition, CCT allows more precise visualization of extremely small nodules in the lung area compared with other chest imaging approaches such as chest X-ray and chest ultrasound [9].
In the epidemic situation, the labor intensity, work efficiency, and emotional fluctuation of radiologists are all significant factors that need consideration. A limitation of CCT is that COVID-19 may share certain imaging features with other categories of chest diseases, making it difficult for radiologists to differentiate them [8]. Besides, the presence of faint ground-glass opacities in early lung lesions may lead to missed diagnoses by radiologists [10], especially in follow-up COVID-19 detection work that requires reading a large number of slices. To help relieve this problem and strengthen radiologists' recognition of lesion features, we propose a diagnostic system based on deep learning, which has made remarkable progress in real-world automated diagnosis by automatically screening lesions, analyzing them, and generating reports [11]. Radiologists can utilize the rapidly generated information to make more credible judgments based on the overall screening, improving the detection rate of lesions and work efficiency while reducing the possibility of missed diagnoses [12].
Our study intends to improve the recognition performance of COVID-19 infection in CCT images by developing a novel deep neural network, the 'AdaBoosting-Decay Fractional max-pooling Neural Network (AdaD-FNN)', for classification and a novel image preprocessing model, 'Fuzzy stacking enhancement U2MNet for CCT images (F-U2MNet-C)'. Our contributions and findings entail the following five angles:
1) Inspired by the architecture of U2-Net, an advanced version, U2MNet, is proposed. Although the U2-Net architecture provides rich information when the training set is sufficient, a boundary ambiguity problem around the regions of interest may arise on a size-limited dataset. This paper demonstrates that incorporating maximum entropy threshold segmentation and an erosion-and-dilation refining approach can effectively minimize this loss of boundary information.
2) A novel CCT image preprocessing model, F-U2MNet-C, is designed by using the fuzzy stacking method for feature enhancement and the redesigned U2MNet for interference factor elimination. Compared with traditional preprocessing models, this model shows stronger universality by relieving the block effect at the boundaries of sub-images, the difference in enhancement levels between adjacent areas, and the boundary ambiguity problem brought by a size-limited dataset.
3) A novel AdaD-FNN model is proposed for multi-class classification. Its performance is evaluated on four public datasets against 22 state-of-the-art methods.
4) In the base estimator of AdaD-FNN, we use fractional max-pooling in place of max-pooling and average-pooling to further improve the network's recognition capability.
5) The proposed AdaD-FNN consists of multi-learning-mode base estimators. Instead of simply introducing a decaying learning rate in a single estimator, we incorporate an lrDecay module inside the AdaBoosting structure to further help optimization and generalization.
These five improvements can help enrich the performance of our model specially built for COVID-19 detection. Sections II-V will introduce the related work, methodology, discussions of experimental results, and the conclusions of our study.

II. RELATED WORK
Modern machine learning and deep learning technology have made a plethora of contributions to COVID-19 classification tasks based on analyzing CCT images. Among these studies, the research directions can be divided into single-model-based classification models, ensemble learning-based classification models, and image preprocessing models.

A. Single Model-Based Classification Models
Yu, et al. [13] made the first attempt to integrate a graph convolutional network (GCN) into COVID-19 detection. They constructed a graph based on the Euclidean distance between features extracted by the proposed ResNet101-C and then encoded the graphs with the features to output the final predictions. The Laplacian smoothing in graph convolution aggregates the features of nearest-neighbor nodes and makes the features of nodes from the same class similar, which is suitable for classification problems. However, on a size-limited dataset, deep GCNs are prone to over-smoothing, which makes the output features of nodes excessively smooth and difficult to distinguish. To make better use of the information in a size-limited dataset, Scarpiniti, et al. [14] trained a deep denoising convolutional autoencoder and created a robust statistical representation by evaluating the histogram of the hidden features. Transfer learning-based approaches [15], [16] are also good choices for small datasets.
The studies in [17]–[21] fine-tuned pre-trained models to extract distinct features from images and fed the resulting feature maps into appropriate classifiers. These rapid frameworks usually achieved promising results. But when the source and target tasks are not sufficiently correlated, or the transfer learning does not make good use of the relationship between them, a negative transfer phenomenon occurs, resulting in performance degradation. Thus, transfer learning-based works tend to perform poorly on heterogeneous datasets. To develop a network tailored for COVID-19 detection, Wang, et al. [22] presented COVID-Net by introducing a lightweight projection-expansion-projection-extension design, which enables enhanced representation capacity while reducing computational complexity. In its redesign study [23], a joint learning framework was proposed by conducting separate feature normalization and constructing a contrastive objective. However, previous works on other joint learning applications [24], [25] have observed that straightforward joint learning brings limited improvement with heterogeneous datasets and may underperform when trained on a single dataset.

B. Ensemble Learning-Based Classification Models
Ensemble learning is a training concept of constructing multiple basic classifiers and combining them into a more powerful classifier to make the final decision [26], [27], and it is widely used in classification tasks. One mode of ensemble learning is to generate the prediction functions in parallel. Abdar, et al. [28] proposed a two-branch ensemble learning model. The first branch has five convolutional blocks, and the second branch is a transfer learning network based on VGG16. A fusion layer then concatenates the outputs of the third, fourth, and fifth convolutional layers with the VGG16 output. Lu, et al. [29] employed ResNet-18 and ResNet-50 as backbone networks and fused their outputs by discriminant correlation analysis to obtain refined features. Three randomized neural networks were trained using these refined features, and their predictions were ensembled. Khan, et al. [30] extracted features from AlexNet and VGG16 models and implemented an entropy-controlled Firefly optimization algorithm for robust feature selection. Akram, et al. [31] first applied discrete wavelet transform and extended segmentation-based fractal texture analysis methods for feature extraction; an entropy-controlled genetic algorithm was then applied for feature selection. Another mode of ensembling is to generate the prediction functions in sequence, as each later model needs the weights from the previous model. Taherkhani et al. [32] utilized AdaBoosting to construct a strong classifier by combining a group of weak classifiers; the final prediction was obtained by multiplying the predicted values and weights of the weak classifiers. In all, ensemble learning-based models usually outperform single models when trained on a single dataset and are less prone to overfitting. But they have high computational costs, and redundant features may remain after feature selection and fusion.

C. Image Preprocessing Models
Many studies incorporate the image preprocessing session in their classification system to improve the follow-up feature learning and extraction process. Khan, et al. [30] proposed a new hybrid contrast enhancement approach by sequentially employing linear filters to improve the visual quality of images. The framework in [33] generates lung masks using maximum entropy segmentation to let the subsequent training of the model focus more on the lung regions.
Summarizing the previous studies, we draw two observations: (i) Ensemble learning models usually consume high computational costs. (ii) Most of these previous works are sensitive to noise and may suffer a performance drop in the absence of high-quality, abundant datasets. To overcome these limitations, (i) we adopted the strategy of training with multi-learning-mode base estimators, which reduces the number of iterations as much as possible while maintaining high performance, and (ii) the F-U2MNet-C preprocessing model was proposed to enhance the images' visual quality and eliminate interference factors. In the experimental section, the related work [34] and the state-of-the-art methods in the field of medical image multi-classification [35]–[37] are further explored and compared with our framework.

III. METHODOLOGY
Sections A, B, and C give the basics of the novel F-U2MNet-C. Sections D, E, and F give the basics of the novel AdaD-FNN. Section G describes the implementation and measurement indicators.

A. Preprocessing Using Proposed F-U2MNet-C
Preprocessing has already shown its success in many COVID-19 applications. Assume the raw dataset T_R containing n 2D slice images is set as T_R = {t_R(i)}, i = 1, 2, ..., n. First, the three-channel color CCT images were converted into grayscale images by retaining the luminance channel, yielding the grayscale dataset T_G = {t_G(i)}. Second, a fuzzy enhancement technique was employed on t_G(i) to enhance the features in the grayscale image, which is discussed in Section B, producing the fuzzy enhanced dataset T_F = {t_F(i)}. Third, to minimize the interference in t_F(i), we stacked the raw image t_R(i) on t_F(i) in terms of the enhanced correlation coefficient (ECC) criterion [38], giving the fuzzy stacked dataset T_ST = {t_ST(i)}. Fourth, to suit the input dimension of deep neural networks, each image in T_ST was resized to a smaller size of [a, b], obtaining a downsampled dataset T_D = {t_D(i)}. After the above preprocessing procedures, assuming an input image size of 1024 × 1024 × 3, each image costs only about 1.60% of its original storage according to the byte compression ratio (224 × 224 × 1)/(1024 × 1024 × 3) ≈ 0.01595. Fifth, to remove background interference factors such as the regions of the heart, ribs, and thoracic vertebrae, we implemented U2MNet segmentation on T_D. An element-wise multiplication was conducted between the obtained finishing segmentation mask set M_ED and T_D to output T_SE = M_ED ⊙ T_D, where ⊙ represents element-wise multiplication. Details are discussed in Section C.
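The pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration only: the fuzzy enhancement and ECC stacking steps are omitted for brevity, and the placeholder `segment_fn` stands in for the learned U2MNet mask generator, which is an assumption of this sketch.

```python
import numpy as np

def luminance_grayscale(rgb):
    """Convert an H x W x 3 color CCT slice to grayscale via the luminance channel."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def nearest_resize(img, out_h, out_w):
    """Nearest-neighbor downsampling to the network input size [a, b]."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[np.ix_(rows, cols)]

def preprocess_slice(t_R, segment_fn, a=224, b=224):
    """Pipeline sketch: grayscale -> (fuzzy enhancement and ECC stacking
    omitted) -> resize to [a, b] -> element-wise mask multiplication."""
    t_G = luminance_grayscale(t_R)      # step 1: retain the luminance channel
    t_D = nearest_resize(t_G, a, b)     # step 4: downsample to [a, b]
    m_ED = segment_fn(t_D)              # step 5: binary lung mask (placeholder)
    return t_D * m_ED                   # T_SE = M_ED (.) T_D

# toy usage: a synthetic slice and a dummy thresholding "segmenter"
rgb = np.random.rand(1024, 1024, 3)
t_SE = preprocess_slice(rgb, segment_fn=lambda x: (x > x.mean()).astype(float))
print(t_SE.shape)  # (224, 224)
```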

B. Improvement I: Using the Fuzzy Stacking Enhancement
A partition is often built to divide the image into statistically uniform sub-images for image enhancement. However, a classic partition often suffers a block effect at the borders of sub-images. To avoid this, we employed a fuzzy partition in the enhancement stage of F-U2MNet-C. The logic behind the fuzzy partition is: 1) Separate the input image into fuzzy windows. 2) Calculate the membership degree of each pixel according to the distance between the window and the pixel. 3) Obtain the fuzzy mean and the fuzzy variance within each window. 4) Combine the transformed sub-images of the fuzzy windows in a weighted way to generate the output image [39], [40]. Here, the weight values used are the membership degrees, which define the fuzzy partition.
Without loss of generality, the rectangle S = [x_0, x_1] × [y_0, y_1] can be considered the image support. Let P be a fuzzy partition of the image support S into fuzzy windows W_kl. The space of gray levels is set as E = (−1, 1). For each t_F(i) having the support S, the membership degree ω_kl of a point (x, y) ∈ S to the fuzzy window W_kl is defined accordingly, where (x, y) refers to the coordinates of a pixel within the support S, and δ works as a tuning parameter that controls the fuzzification-defuzzification degree of the partition P. For each window defined by (1), the fuzzy cardinality can be computed as f_card(W_kl) = Σ_{(x,y)∈S} ω_kl(x, y).
Further on, fuzzy statistics of a gray-level image t_G(i) are employed in relation to the fuzzy window W_kl. The fuzzy mean μ(t_G(i), W_kl) and the fuzzy variance σ²(t_G(i), W_kl) within the window W_kl are defined as
μ(t_G(i), W_kl) = (1/f_card(W_kl)) × Σ_{(x,y)∈S} ω_kl(x, y) × t_G(i)(x, y),
σ²(t_G(i), W_kl) = (1/f_card(W_kl)) × Σ_{(x,y)∈S} ω_kl(x, y) × ‖t_G(i)(x, y) − μ(t_G(i), W_kl)‖²_E,
where +, ×, and − denote the addition, scalar multiplication, and subtraction defined on the gray-level space E; ‖·‖_E is the corresponding norm; and f_card(W_kl) is the fuzzy cardinality of the window W_kl.
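As a concrete illustration, the fuzzy cardinality, mean, and variance can be computed per window as below, treating the operations on E as ordinary arithmetic. The cosine-shaped membership function is only an assumed stand-in for the paper's δ-parameterized partition.

```python
import numpy as np

def fuzzy_stats(img, omega):
    """Fuzzy cardinality, mean, and variance of the gray levels within one
    fuzzy window, given membership degrees omega(x, y) on the same grid."""
    f_card = omega.sum()                            # f_card(W_kl) = sum of omega
    mu = (omega * img).sum() / f_card               # fuzzy mean
    var = (omega * (img - mu) ** 2).sum() / f_card  # fuzzy variance
    return f_card, mu, var

def cosine_membership(h, w, cy, cx, radius):
    """Illustrative membership: decays with distance from the window center
    (the paper's exact form, tuned by delta, is not reproduced here)."""
    yy, xx = np.mgrid[0:h, 0:w]
    d = np.sqrt((yy - cy) ** 2 + (xx - cx) ** 2)
    return np.clip(np.cos(np.pi * d / (2 * radius)), 0.0, None)

img = np.random.rand(64, 64)
omega = cosine_membership(64, 64, 32, 32, 24)
f_card, mu, var = fuzzy_stats(img, omega)
```

With a uniform (all-ones) membership, the fuzzy mean and variance reduce to the ordinary window mean and variance, which is a quick sanity check on the definitions.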
On support fuzzification, the function that transforms the pixels belonging to the fuzzy window W_kl is defined accordingly, where T_kl represents the affine transform and σ_u² is set to 1/3 in the experiment design.
Finally, the transform T_enh is built as a sum of the affine transforms T_kl, weighted according to the membership degrees ω_kl, to produce the enhanced image t_F(i). Significantly, as the transform differs with the brightness and contrast of each fuzzy window, the enhancement level in adjacent areas might differ slightly. To further improve the fuzzy enhancement, inspired by the work of [41], we stacked the raw image t_R(i) on t_F(i) in terms of the ECC criterion, a similarity measure for estimating motion parameters, and successfully solved this problem. The reasons for using ECC stacking are: (i) Unlike traditional similarity measures, ECC is invariant to photometric distortions in brightness and contrast. (ii) ECC solves the optimization problem with a simple linear iterative strategy, which has a light computational cost. In all, compared with other enhancement methods, fuzzy stacking efficiently avoids the block effect and the adjacent-area difference problem by supplying fuzzy partitions and enhanced correlation coefficient stacking.
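The ECC criterion itself is simply the correlation of zero-mean, unit-norm images, which is what makes it insensitive to gain and bias changes. The sketch below computes the measure only; the full iterative alignment over warp parameters (as performed, e.g., by OpenCV's `findTransformECC`) is outside this illustration.

```python
import numpy as np

def ecc_similarity(ref, warped):
    """Enhanced Correlation Coefficient: correlation of zero-mean, unit-norm
    images, invariant to affine photometric distortions (gain and bias)."""
    a = ref - ref.mean()
    b = warped - warped.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

img = np.random.rand(128, 128)
# a brightness/contrast change (gain 2.5, bias 0.3) leaves the score at ~1.0
print(round(ecc_similarity(img, 2.5 * img + 0.3), 6))  # 1.0
```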

C. Improvement II: A Novel U2MNet
Due to the limited size of the dataset, the preliminary masks obtained from the traditional U2-Net will possibly encounter a boundary ambiguity problem, as revealed in Fig 1, which hinders the ideal segmentation effect of completely removing the interference factors. Inspired by the work of U2-Net [42], we present an advanced version named U2MNet that can deal with relatively small datasets by applying a maximum entropy threshold segmentation-based approach [43], [44] to the preliminary masks M_fuse.
Given an estimated probability density function ρ(g) of gray levels in the digital image, the entropy of the downsampled image t_D(i) can be defined as H = −Σ_g ρ(g) log ρ(g), where g is the grayscale level (with the abscissa and ordinate values of t_D(i) as u and v), and the cumulative histogram corresponds to the cumulative probability. Given a specific threshold θ ∈ [0, C − 1], where C stands for the limited range of pixel values (e.g., 256), the image is segmented into two regions Y_0 and Y_1, whose estimated probability density functions can be expressed as ρ(g)/P_0(θ) for g ≤ θ and ρ(g)/P_1(θ) for g > θ. Here, P_0(θ) and P_1(θ) represent the cumulative probabilities of the background and foreground pixels segmented by the threshold θ, respectively, and their sum is 1. The corresponding entropies of the background H_0(θ) and foreground H_1(θ), and the total entropy of the image t_D(i), are defined as
H_0(θ) = −Σ_{g=0}^{θ} (ρ(g)/P_0(θ)) log(ρ(g)/P_0(θ)),
H_1(θ) = −Σ_{g=θ+1}^{C−1} (ρ(g)/P_1(θ)) log(ρ(g)/P_1(θ)),
H(θ) = H_0(θ) + H_1(θ).
By calculating the total entropy of the image under all segmentation thresholds, we determine the final threshold as the one corresponding to the maximum entropy. The segmentation mask set M_METS is obtained by considering pixels whose gray values are larger than this threshold as foreground, and pixels whose gray values are smaller than the threshold as background.
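The maximum entropy thresholding step can be sketched as a direct NumPy implementation of the Kapur-style criterion described above; the toy bimodal image at the end is only for illustration.

```python
import numpy as np

def max_entropy_threshold(img, C=256):
    """Pick theta maximizing H0(theta) + H1(theta) over the background and
    foreground histograms (maximum entropy threshold segmentation)."""
    hist = np.bincount(img.ravel(), minlength=C).astype(float)
    rho = hist / hist.sum()              # estimated probability density rho(g)
    cum = np.cumsum(rho)                 # cumulative probability
    best_theta, best_H = 0, -np.inf
    for theta in range(C - 1):
        P0, P1 = cum[theta], 1.0 - cum[theta]
        if P0 <= 0 or P1 <= 0:
            continue                     # threshold splits off an empty region
        p0 = rho[: theta + 1] / P0       # background density
        p1 = rho[theta + 1 :] / P1       # foreground density
        H0 = -np.sum(p0[p0 > 0] * np.log(p0[p0 > 0]))
        H1 = -np.sum(p1[p1 > 0] * np.log(p1[p1 > 0]))
        if H0 + H1 > best_H:
            best_H, best_theta = H0 + H1, theta
    return best_theta

# bimodal toy image: dark background (40) and bright foreground (200)
img = np.concatenate([np.full(500, 40), np.full(500, 200)]).reshape(25, 40)
theta = max_entropy_threshold(img.astype(int))
mask = (img > theta).astype(np.uint8)   # foreground = gray value above theta
```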
To avoid the circumstance that some lesion regions are eliminated together with the interference factors, a refining module ED was added into U2MNet to create a finishing mask set M_ED by utilizing an erosion-and-dilation processing approach [45]–[47]. This approach depends solely on the relative ordering of pixel values, rather than their numerical magnitudes, and is thus especially suited to processing binary images.
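A minimal sketch of such erosion-and-dilation refining is given below; the 3 × 3 structuring element and the single opening pass are our assumptions, as the paper does not detail the ED module's exact configuration.

```python
import numpy as np

def _morph(mask, op):
    """3x3 erosion or dilation on a binary mask via shifted stacking."""
    h, w = mask.shape
    padded = np.pad(mask, 1)
    stack = np.stack([padded[i:i + h, j:j + w]
                      for i in range(3) for j in range(3)])
    return stack.min(0) if op == "erode" else stack.max(0)

def refine_mask(m_mets, n=1):
    """ED refining sketch: an opening (erosion then dilation) removes small
    spurious foreground specks while keeping larger regions intact."""
    m = m_mets
    for _ in range(n):
        m = _morph(m, "erode")
    for _ in range(n):
        m = _morph(m, "dilate")
    return m

mask = np.zeros((16, 16), dtype=np.uint8)
mask[4:12, 4:12] = 1       # large region survives the opening
mask[0, 15] = 1            # isolated speck is removed
m_ED = refine_mask(mask)
```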

D. Classification Using Proposed AdaD-FNN
An overview of our novel high-sensitivity deep learning framework AdaD-FNN for COVID-19 diagnosis is illustrated in part A of Fig 2. AdaD is an ensemble algorithm in which a sequence of base estimators E is trained with a data weight vector W = {w_i}, i = 1, 2, ..., n, to construct a strong classifier with higher classification ability. In our study, each estimator is a deep fractional max-pooling neural network (FNN). Suppose the training dataset is {(κ_1, e_1), ..., (κ_n, e_n)}, s.t. e_i ∈ {1, 2, ..., L}, where κ_i is an input vector, e_i is the corresponding output of κ_i, n is the number of training samples, and L is the total number of classes.
The first estimator E_1(κ), where k is the index of estimators, is trained on all the training samples with the same initialized weight w_i^1 = 1/n and the learning rate η_1. The output vector for an input sample κ_i is defined as OP_k(κ_i) = (op_k^1(κ_i), ..., op_k^L(κ_i)), l ∈ {1, 2, ..., L}, where OP stands for the probabilities that the applied input belongs to the L classes.
OP_k(κ_i) is then used to update the data weights W = {w_i}, where w_i^k is the weight of the i-th training sample utilized by the k-th estimator, OP_k(κ_i) refers to the output vector of the k-th estimator in response to the i-th training sample, η_k stands for the learning rate of the k-th estimator, and Z_i refers to the label vector corresponding to the i-th training sample.
This weight-updating concept helps training focus more on the hard-to-identify samples by increasing the weights of the misclassified samples in further learning. The construction of a strong classifier is completed when the expected error rate is reached. After the training of K base estimators, the output class is predicted through E(κ) = arg max_l Σ_{k=1}^{K} o_k^l(κ), where o_k^l(κ) stands for the l-th element of the output vector of the k-th estimator when κ is applied as its input.
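The boosting loop can be illustrated as below. The ensemble prediction sums the estimator outputs per class and takes the argmax, matching the where-clause in the text; the exact weight-update formula is not reproduced in this section, so the rule in `update_weights` (upweighting samples whose true-class probability is low, scaled by η_k) is an illustrative assumption rather than the paper's formula.

```python
import numpy as np

def update_weights(w, OP, Z, eta):
    """Illustrative AdaD-style weight update (assumed form): samples whose
    predicted probability for the true class is low get larger weight."""
    p_true = (OP * Z).sum(axis=1)             # probability assigned to true class
    w_new = w * np.exp(eta * (1.0 - p_true))  # harder samples -> bigger weight
    return w_new / w_new.sum()                # renormalize to a distribution

def predict_ensemble(OPs):
    """E(kappa) = argmax_l sum_k o_k^l(kappa): sum estimator outputs, argmax."""
    return np.argmax(np.sum(OPs, axis=0), axis=1)

# toy: n = 3 samples, L = 2 classes, K = 2 estimators
Z = np.eye(2)[[0, 1, 0]]                      # one-hot label vectors Z_i
OPs = np.array([[[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]],
                [[0.8, 0.2], [0.3, 0.7], [0.3, 0.7]]])
w = update_weights(np.full(3, 1 / 3), OPs[0], Z, eta=0.5)
pred = predict_ensemble(OPs)  # sample 3 is the hardest and gets the most weight
```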
The structure of the proposed 10-layer FNN is demonstrated in part B of Fig 2, within which 2D Conv Blocks (CBs), Fractional max-pooling Layers (FPs), and Dense Blocks (DBs) are the main building elements. In each CB, the convolutional layer (CL) outputs are fed into a batch normalization layer to avoid the covariate shift that would probably cause gradient divergence during backpropagation.

E. Improvement III: Using FP to Replace MP and AP
To avoid the rapid loss of surrounding information, Graham [48] formulated a fractional version of max-pooling (MP), named fractional max-pooling (FP), by allowing the multiplicative factor α to be a non-integer value. For instance, if N_in/N_out ≈ 2^(1/n), then the reduction rate of feature information will be n times slower. From another perspective, a non-integer α allows more pooling layers to be used in the backbone of the neural network. For instance, if each pooling step reduces the size of the feature map by a factor of √2, then twice as many pooling layers can be used compared with a factor of 2. Every pooling step can be considered an opportunity to view the input image at a different scale, and viewing the image at the 'correct' scale helps recognize the 'distinct' features that identify subjects belonging to a particular class. We therefore replaced MP and average pooling (AP) with FP to improve the recognition capability of our proposed network [48], [49].
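A sketch of disjoint fractional max-pooling with α = √2 is given below. The pseudorandom mix of size-1 and size-2 pooling increments follows the spirit of Graham's construction, but details such as the random sequence generation are simplified assumptions.

```python
import numpy as np

def fmp_indices(n_in, n_out, rng):
    """Pseudorandom disjoint region boundaries: a shuffled mix of increments
    of 1 and 2 whose total spans n_in in exactly n_out steps."""
    n_twos = n_in - n_out                       # number of size-2 increments
    inc = np.array([2] * n_twos + [1] * (n_out - n_twos))
    rng.shuffle(inc)
    return np.concatenate([[0], np.cumsum(inc)])

def fractional_max_pool(fm, alpha=2 ** 0.5, seed=0):
    """Disjoint FP with a non-integer reduction factor alpha (sketch)."""
    rng = np.random.default_rng(seed)
    h, w = fm.shape
    oh, ow = int(h / alpha), int(w / alpha)
    rb, cb = fmp_indices(h, oh, rng), fmp_indices(w, ow, rng)
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = fm[rb[i]:rb[i + 1], cb[j]:cb[j + 1]].max()
    return out

fm = np.random.rand(32, 32)
print(fractional_max_pool(fm).shape)  # (22, 22) -- shrinks by ~sqrt(2), not 2
```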

F. Improvement IV: Incorporating an lrDecay Module Inside the AdaD
In the traditional AdaBoosting (Ada) structure, when the prescribed number of iterations is reached, the training stops even without reaching the expected error rate. However, simply setting a large iteration number to search for the expected error rate would bring huge computational costs and make the whole system time-consuming. Therefore, we incorporate a learning rate decay (lrDecay) module into the structure to help every classifier find the global minimum of the neural network's error function and achieve better performance. This new version of the Ada structure substitutes the traditional base estimator with a multi-learning-mode base estimator and is named the AdaD structure. A toy illustration is shown in Fig 4, where E_K(κ) represents the K-th estimator when κ is applied as its input, and W = {w_i} represents the data weights.
In the lrDecay module, the learning rate η is scheduled to be reduced after the D_1-th, D_2-th, ..., D_h-th estimators according to η_{k+1} = γ · η_k for k ∈ {D_1, D_2, ..., D_h}, where k represents the index of the estimator, η_k represents the learning rate of the k-th estimator, · stands for multiplication, and γ is a drop factor belonging to (0, 1).
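The schedule amounts to multiplying η_1 by the drop factor once for every milestone already passed. In the helper below, η_1 = 0.05 and the drop factor 0.6 match the values reported in Section G, while the milestone positions are illustrative assumptions.

```python
def lr_schedule(k, eta1=0.05, gamma=0.6, drops=(3, 6)):
    """Learning rate of the k-th estimator (1-indexed): eta_1 multiplied by
    the drop factor gamma once per milestone D_h already passed. The milestone
    positions in `drops` are illustrative assumptions."""
    n_drops = sum(1 for d in drops if k > d)
    return eta1 * gamma ** n_drops

# eta stays at 0.05 through estimator 3, then decays at each milestone
rates = [lr_schedule(k) for k in range(1, 9)]
```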
By introducing this module, the initially large η inhibits the network from memorizing the noisy information of the dataset, while the subsequently decaying η gradually improves the learning of complex patterns in the image [50]. The final parameter setting in our study is η_1 = 0.05 with a drop factor of 0.6.

G. Implementation and Measures
The pseudocode for the implementation of our proposed F-U2MNet-C and AdaD-FNN is listed in Algorithm 1. The learning rate was initialized at 0.05, and the drop factor was set to 0.6. For our proposed method and all comparison methods, we trained for 80 epochs with a batch size of 10.
To make complete use of the dataset information, five-fold cross-validation was implemented, giving five runs in total. In each run, images were split into 80% for training and 20% for testing. F-U2MNet-C was applied to the training set of the five-fold cross-validation. Four performance metrics (accuracy, precision, sensitivity, and F1 score) are used to comprehensively evaluate our framework.
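The four metrics can be computed from a confusion matrix as below; macro-averaging across the classes is our assumption about how the per-class scores are aggregated in the multi-class setting.

```python
import numpy as np

def macro_metrics(y_true, y_pred, L):
    """Accuracy plus macro-averaged precision, sensitivity (recall), and F1
    from an L x L confusion matrix (macro averaging is an assumption)."""
    cm = np.zeros((L, L), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    prec = tp / np.maximum(cm.sum(axis=0), 1)   # per-class precision
    sens = tp / np.maximum(cm.sum(axis=1), 1)   # per-class sensitivity
    f1 = 2 * prec * sens / np.maximum(prec + sens, 1e-12)
    acc = tp.sum() / cm.sum()
    return acc, prec.mean(), sens.mean(), f1.mean()

# toy three-class example (e.g., COVID-19 / TB / HC labels as 0 / 1 / 2)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, p, s, f = macro_metrics(y_true, y_pred, L=3)
```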

IV. EXPERIMENTS, RESULTS, AND DISCUSSIONS
The experiments were implemented in Python 3.7.10 and MATLAB R2021b. The programs ran on NVIDIA TESLA P100 GPUs. The performances are reported over the test sets across five runs.

A. Datasets and Statistical Results
The proposed framework is evaluated on three public datasets with 2D slices of CT volumes, TLDCA [29], UCSD-AI4H [16], and SARS-CoV-2 [51], and one with entire CT volumes, TCIA [52]. Our experiments conducted five-fold cross-validation on each dataset. We report the results in the form of average and standard deviation in Table I, and the ROC curves over the four data sites in Fig 5. By analyzing the misclassification cases in 10 runs, we drew two conclusions: (i) Many of the misclassifications are abnormal samples whose patterns differ from those of same-category samples in the training set. They may obtain high weights in the AdaD iterations and affect the prediction accuracy of the final strong classifier. (ii) There is inconsistency in the data: images collected from different machines in hospitals may not have exactly the same resolution, and in a few cases foreign objects or texts appear in the images. These issues impacted the model and led to misclassifications.
To further explore the robustness of our framework, we artificially lowered the image quality by separately introducing different categories of noise. From the results in Table II, we concluded that all the drops are within 5%; thus, our framework is robust to different categories of noise.

B. Ablation Study
The experiments were conducted on site A to further explore the effectiveness of each component in the final results.
1) Effectiveness of F-U2MNet-C: Assume the datasets are preprocessed at four stages: 'Stage I: raw dataset', 'Stage II: Stage I + fuzzy enhancement', 'Stage III: Stage II + ECC stacking', and 'Stage IV: Stage III + U2MNet segmentation'. After training with AdaD-FNN, the results in Table III demonstrate that each stage outperforms its previous stage, indicating that the 'enhancement', 'stacking', and 'segmentation' components incorporated in F-U2MNet-C are all effective. The whole preprocessing model not only improves all the metrics by nearly 2% but also enhances the stability of the system. These improvements are consistent with our expectation that F-U2MNet-C can optimize the training process because it enhances the features and removes the interference factors in the grayscale image.
2) Effectiveness of AdaD: Denote 'FNN' as a single FNN without the prediction ensemble technique, 'Ada-FNN' as an FNN trained using the traditional AdaBoosting ensemble technique, and 'AdaD-FNN' as an FNN trained using the proposed AdaD ensemble technique. As can be observed in Table IV, AdaD-FNN outperforms FNN on all the evaluated metrics, by 3.96% in accuracy, 3.67% in precision, 3.96% in sensitivity, 4.09% in F1 score, and around 3% in stability. Compared with Ada-FNN, AdaD-FNN also shows nearly 2% improvements on all metrics. There are two reasons for this improvement. First, the weight-updating concept in the AdaD structure helps training focus more on the hard-to-identify samples by increasing the weights of the misclassified samples in further learning. Second, the lrDecay module inside the AdaD structure helps inhibit the network from memorizing the noisy information in the dataset and improves the learning of complex patterns in the hard-to-identify samples.
3) Drop Factor Comparison: We then tried different settings of the drop factor in the lrDecay module inside the AdaD structure to determine its optimal value. As displayed in Table V, our framework gives the best classification results with a drop factor of 0.6. Training with a drop factor of 1 gives the worst performance, further proving that incorporating the lrDecay module into the AdaD structure brings performance improvement. In all, the proposed lrDecay in the AdaD structure is effective. The optimal value of the drop factor may vary for other datasets.

4) Effectiveness of FNN:
We compared the results of the disjoint FNN, the overlapping FNN, a standard CNN using MP, and a standard CNN using AP under the training of the AdaD structure. The results in Table VI clearly reveal that the disjoint type of AdaD-FNN outperforms the other pooling methods, which validates the effectiveness of our proposed FNN. This is because FP avoids the rapid loss of surrounding information while allowing the input image to be viewed at more different scales compared with traditional MP and AP.

C. Comparison to the State-Of-The-Art Approaches
We compared our approach with state-of-the-art COVID-19 classification methods, as shown in Table VII. In conclusion, our framework outperforms most of the methods across the four public datasets, indicating its superiority in exploiting robust features from heterogeneous datasets. On Site C, our framework shows a lower precision than xDNN [34], but its sensitivity is 2.33% higher. In real life, a high-sensitivity diagnosis system offers significant help for the management of COVID-19 due to the false-negative problem of RT-PCR.
The reasons behind the good results are as follows. First, preprocessing using F-U2MNet-C helps AdaD-FNN better capture semantic representations and facilitates a smooth training process by eliminating the interference factors. Second, the weight-updating concept helps training focus more on the hard-to-identify samples. Third, with the incorporation of the lrDecay module, the proposed network is inhibited from memorizing noisy information, and its learning of complex patterns is improved. Fourth, FP layers provide more opportunities to view the input image at different scales, which improves the recognition capability of the network.

D. Explainability
To understand the behavior of the framework, Grad-CAM [54] was applied to some randomly chosen sample images. For each category, the generated attention heatmaps and the corresponding manual delineations are shown in Fig 6. The results demonstrate that for COVID-19 and TB images, our framework focuses more on lesion areas (circled in red) and less on non-lesion areas. For HC images, the model's attention is not focused on any specific area, since there are no lesion areas in the HC group. In all, these heatmaps show how our framework predicts COVID-19, TB, and HC images in a clear and understandable manner. The regions our framework attends to are highly consistent with standards already approved in the medical community, which adds confidence that it can assist the diagnoses of radiologists in the real world.

V. CONCLUSION
In this study, we proposed a novel image preprocessing model, F-U2MNet-C, and a novel classification model, AdaD-FNN, which entail five improvements: (i) the proposed F-U2MNet-C with fuzzy partitions and ECC stacking, (ii) the proposed U2MNet, (iii) the novel AdaD-FNN model, (iv) the use of fractional max-pooling in the proposed FNN, and (v) the novel lrDecay module. With the help of these improvements, we created an explainable system for COVID-19 detection that shows advantages in diagnostic rate and stability on the public datasets and in comparison with other state-of-the-art deep learning approaches. It achieved an accuracy of 99.52% on TLDCA, 92.96% on UCSD-AI4H, 97.86% on SARS-CoV-2, and 91.97% on TCIA. Although good performance is achieved, limitations of our framework remain: the network is sensitive to abnormal samples, which may obtain high weights during iteration and affect the prediction accuracy of the final strong classifier. Future work will mainly focus on incorporating more advanced, rapid deep learning techniques into our system. Besides, we are interested in pretraining our framework on larger-scale datasets containing more classes of chest diseases, such as community-acquired pneumonia.