A New Methodology for Assessing SAR Despeckling Filters

Deep-learning (DL) methods require immense amounts of labeled data to provide reasonable results. In computer vision applications, and more specifically in despeckling of synthetic aperture radar (SAR) images, no ground truth is available because of the speckle content. To test the performance of despeckling filters, the common protocol is to synthetically corrupt optical images with a suitable speckle model and, after filtering, to compute well-known metrics; the filters are then tested on actual SAR data. However, even the most elaborate speckle models are far from accounting for the complex mechanisms behind SAR images. In this letter, a methodology to design a realistic dataset is proposed: actual SAR images of the same scene are acquired with the same sensor on different dates, and then they are properly coregistered and averaged to obtain a ground-truth-like reference image with which the performance of a despeckling method can be objectively evaluated. To show the benefits of the proposed methodology, a DL approach trained on the designed dataset, which will be called the "SAR model," is compared with the standard protocol based on synthetically corrupted optical images, which will be the "Synthetic model." One last validation is performed by filtering the same images with FANS, a well-known despeckling filter, and comparing the results with those obtained with an autoencoder (AE). Results on actual SAR data not included in the training phase validate the proposed methodology, and from them, it is recommended to test filters on the proposed, more realistic dataset.

I. INTRODUCTION

The lack of a ground truth is a problem for assessing the performance of any filter [1]. To overcome this, the seminal work by Lee et al. [2] proposed a protocol: optical (natural) images are considered as noiseless data and are properly corrupted using a suitable speckle model (i.e., a Gamma distribution law); both the noiseless and the speckled images are therefore available and can be evaluated through well-established objective metrics, and also visually by an expert. Moschetti et al. [3] replaced the single synthetic image used in Lee's protocol with massive Monte Carlo runs, providing more statistical significance to the results. After being tuned on synthetic/modeled data, despeckling filters are often evaluated on actual SAR data. In [4], it is stated that approaches trained on synthetic datasets usually perform poorly in practice due to the unavailability of clean SAR images, and an unsupervised despeckling method combining online speckle generation and unpaired training is proposed. In [5], a varied and realistic dataset is built using a multicategory generalized-Gaussian coherent SAR simulator.
A radically different approach has emerged recently [6], [7], [8], in which an improved SAR dataset is obtained from the multitemporal average of SAR images to get a more realistic reference image. Averaging the images yields a result roughly equivalent to that of the common multilook technique. The benefits and drawbacks of training deep-learning (DL) networks on both synthetic data and the temporal multilooking approach are addressed in [9].
Our approach resembles [6] in the sense of employing a multitemporal average of SAR data. However, in our case, the dataset is used both for training and for the evaluation of filter performance (in our last model, the ground-truth images are not corrupted with simulated speckle). A comparison with the common approach that relies on data corrupted by a speckle model invites us to replace such a long-standing way of working with the new methodology. This new methodology, although touched upon in [9], where it is said, sic, "On the contrary, the approach based on simulation is quite risky if the simulated data are not really aligned with the test data," has not been soundly proposed as a new approach replacing the old ones based on Lee's protocol. Note that we propose this new methodology not only for assessing DL-based despeckling filters, but for all despeckling filters. Thus, designing a dataset rooted in actual SAR images is of significant importance for training DL models. In this context, the autoencoder (AE) will learn directly from the genuine speckle patterns generated by the satellite sensor and not from synthetically corrupted images. The resulting dataset comprises a sufficient number of image pairs (noisy and ground truth) to effectively train the DL model for the precise despeckling of incoming images. This letter is organized as follows. In Section II, we introduce the SAR speckle model and review different approaches: traditional (local and nonlocal) and DL-based despeckling filters. The new protocol is summarized in Section III. In Section IV, the proposed framework to design a labeled dataset that includes ground-truth and noisy images is explained. In Section V, a default AE is trained using the designed dataset to demonstrate the convergence and the good performance of the model. Finally, in Section VI, some conclusions and future work are presented.

1558-0571 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

II. SPECKLE MODEL
Speckle is considered a multiplicative noise. A SAR image Y can be expressed as Y = X · N, where X is the noise-free image and N is the speckle noise, which is distributed according to the following equation:

$$p(N) = \frac{L^{L} N^{L-1} e^{-LN}}{\Gamma(L)}, \quad N \ge 0 \qquad (1)$$

where L is the equivalent number of looks (ENL) of the SAR image and Γ(L) is the Gamma function evaluated at L. Despeckling has been an active area of research in the last decades, and the irruption of the DL paradigm has established a clear division of all methods into traditional despeckling filters and DL-based ones. Different approaches for despeckling filters have been presented. Local filters include the Lee and refined Lee filters [10] and Frost and its proposed variants [11], among others. Nonlocal filters replace the value of a pixel with the average of similar pixels that have no reason to be spatially close [12]; further examples are SRAD, based on anisotropic diffusion and exploiting the instantaneous coefficient of variation [13], a nonlocal SAR image denoising algorithm based on LLMMSE wavelet shrinkage (SAR-BM3D) [14], and its improvement FANS, which makes SAR-BM3D faster while keeping a similarly good performance [15].
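As a minimal sketch of the multiplicative model in (1), the snippet below corrupts a hypothetical noise-free intensity patch with Gamma-distributed speckle of a given ENL (function names and the flat test patch are illustrative, not from the letter):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_speckle(x, enl):
    """Corrupt a noise-free intensity image X with multiplicative Gamma
    speckle N of the given ENL: Y = X * N, where E[N] = 1 and
    Var[N] = 1/ENL, so homogeneous areas keep their mean intensity."""
    n = rng.gamma(shape=enl, scale=1.0 / enl, size=x.shape)
    return x * n

# Example: a flat 256 x 256 patch of intensity 100 corrupted with
# ENL = 16 speckle, matching the setting used for the "Synthetic model".
clean = np.full((256, 256), 100.0)
noisy = add_speckle(clean, enl=16)

# On a homogeneous patch, the measured ENL = mean^2 / variance should be
# close to the nominal value of 16.
measured_enl = noisy.mean() ** 2 / noisy.var()
```

Note the unit-mean parameterization (shape = ENL, scale = 1/ENL), which is what makes the speckle purely multiplicative around the clean intensity.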
Different techniques that use DL have also been proposed. Some of the most relevant ones described in the review are: generative adversarial networks (GANs) [16], SAR-CNN [8], DNN [17], and SAR-RDCP [18]. Neigh CNN [19] is a SAR speckle reduction proposal that combines three different losses, obtaining better results than Kuan, SAR-BM3D, SAR-DRN, and ID-CNN. Another proposal with a modified cost function, including the Kullback-Leibler divergence, is the multiobjective CNN-based filter (MONET) [6], which has shown improved SSIM, SNR, and MSE despeckling results over filters like NOLAND, ID-CNN, SAR-DRN, SAR-BM3D, and FANS. The cost function that will be used at the end of this letter is the binary cross-entropy (BCE)/log loss, according to the following equation:

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right] \qquad (2)$$

where y_i and ŷ_i are the ground-truth and predicted pixel values, respectively, and N is the number of pixels.

III. NEW PROTOCOL

The new protocol proposed in this work has been designed for assessing SAR despeckling filters on actual SAR data, whether standard filters or DL filters. For a noisy image Y and its filtered version F, the new protocol consists of the following steps.
1) Select a set of well-known SAR-specific metrics, such as the preservation of radiometric statistical properties measured on the ratio image (the pixelwise division of Y and F: Y/F), the ENL, the mean preservation, and the noise variance reduction after filtering. Such metrics do not require a ground truth.
2) Select a set of well-known referenced image quality indices to evaluate the preservation of edges/small features/bright scatterers and the statistical structure of the filtered data. These metrics require a ground truth to compare with. If not available, build a realistic SAR ground-truth dataset (see Section IV).
3) Do not evaluate on a single filtered result but on as many despeckled outcomes as possible. If that is not possible, the evaluation must be done on several selected patches within the filtered results.
4) Complement the numerical evaluation of the filtered and ratio images with a visual inspection by an expert.
5) Complement the evaluation (numerical and visual) with a comparison with state-of-the-art despeckling filters.
This new protocol differs from the standard one in two respects: first, it does not use a simulated speckle pattern; second, it promotes the use of both referenced and referenceless metrics. It may be used to evaluate the performance of a newly proposed filter, and it is also recommended for tuning a new filter: the design of the filter should be done on actual SAR data and not on synthetic SAR data. In the rest of the letter, the benefits of this new protocol are illustrated for the case of a DL filter and a well-known state-of-the-art filter.

IV. DATASET
In [6] and [14], classical noise-free optical images are used, and their speckled versions are obtained through a distribution such as the discussed Gamma law, to test the performance of despeckling filters. An analysis of three different approaches for building datasets is performed in [20].

A. Actual SAR Imagery Download
In this letter, we use ASF Datasearch Vertex [21]. Sentinel-1A imagery is available for download, and we chose a region with urban areas and man-made structures such as bridges, buildings, and highways. The images correspond to the region of Toronto in 2022. Images of the same location in the intensity-mode level "L1 Detected High-Res Dual-Pol (GRD-HD)," acquired from August 24 to December 22 with a revisiting period of 12 days, were downloaded. The images were obtained in the C band at 5.4 GHz with a resolution of 10 m and VV polarization, from an orbital altitude of 693 km above the equator.

B. Multitemporal Fusion
Multitemporal fusion (or temporal multilooking) is a mean operation performed over several images of the same location. In our case, a reference image acquired on September 5 is selected; the other four images, and then the other nine images, are coregistered and averaged with respect to it, as shown in Fig. 1. The ENL of the three images is calculated in two large enough regions of interest of 20 × 20 pixels selected in a homogeneous area (red squares). The corresponding ENLs of the images in the figure are (from left to right): 16.490, 50.015, and 70.605, respectively. As expected, including more images in the mean operation results in a less noisy image (higher ENL). A mean of ten samples will be used in this letter.
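The fusion and ENL measurement above can be sketched as follows; the stack here is simulated with independent Gamma speckle (ENL = 16 per acquisition, as measured on the downloaded data) rather than real coregistered Sentinel-1 images:

```python
import numpy as np

def temporal_multilook(stack):
    """Average a stack of coregistered intensity images of shape (T, H, W)."""
    return stack.mean(axis=0)

def enl(region):
    """ENL measured on a homogeneous region of interest: mean^2 / variance."""
    return float(region.mean() ** 2 / region.var())

# Hypothetical stand-in for the ten coregistered acquisitions: the same flat
# scene corrupted with independent unit-mean Gamma speckle each time.
rng = np.random.default_rng(1)
scene = np.full((64, 64), 50.0)
stack = np.stack([scene * rng.gamma(16, 1.0 / 16, scene.shape)
                  for _ in range(10)])

avg5 = temporal_multilook(stack[:5])   # five-image average
avg10 = temporal_multilook(stack)      # ten-image average
# With independent speckle, the ENL grows roughly linearly with the number
# of averaged images, which is why the ten-image mean is the least noisy.
```

This mirrors the ordering reported in Fig. 1: single image < five-image average < ten-image average in ENL.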

C. Image Clipping
This process can be performed through three main parameters: the desired width of the clipped images (W), the desired height of the clipped images (H), and the stride (S). Our recommended setting for these parameters is W = H = S = 512. With this setting, from the downloaded image of 26019 × 16732 pixels, it is possible to obtain 1600 clipped images of 512 × 512 pixels. A smaller stride will generate more images, but some pixels will be overlapped between images, which can be considered a data augmentation technique, a common practice in artificial intelligence and DL. The size of the image must be a power of 2 because of the dimension-reducing steps in DL, such as MaxPooling with a stride of 2. The code used in this letter is available at https://github.com/rubenchov/SAR_despeckling_dataset.
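A minimal sketch of the clipping step (the function name and border handling are assumptions; the repository linked above holds the actual code):

```python
import numpy as np

def clip_image(img, w=512, h=512, s=512):
    """Slide an h x w window over img with stride s and stack the clips;
    incomplete windows at the right/bottom borders are discarded."""
    rows, cols = img.shape
    clips = [img[top:top + h, left:left + w]
             for top in range(0, rows - h + 1, s)
             for left in range(0, cols - w + 1, s)]
    return np.stack(clips)

# With W = H = S = 512, the downloaded 26019 x 16732 image yields
# floor(26019/512) * floor(16732/512) = 50 * 32 = 1600 clips, as in the
# designed dataset.  A stride below 512 overlaps neighbouring clips and
# increases the count (the data augmentation mentioned above).
```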

V. EXPERIMENTAL RESULTS
The experiments carried out in this letter include the training of two DL models. The "SAR model" was trained using the dataset designed in Section IV from actual SAR images. The other, the "Synthetic model," was built using images corrupted according to (1) with ENL = 16, similar to the ENL measured in the actual SAR images. As a strategy to compare the results obtained with the AE, we also performed a despeckling process separately over the same images by applying the FANS filter, since it is one of the state-of-the-art filters that does not include artificial intelligence and is thus a traditional reference filter.

A. Structure of the Autoencoder
The structure of the DL models, in this case autoencoders (AEs), was adapted from [22] by modifying its input and output dimensions to match the size of the designed dataset (512 × 512). The AE is composed of five main parts, as shown in Fig. 2. The input is the noisy image, in this case a clip of 512 × 512 pixels. The image must be in grayscale, so the full size of the input is 512 × 512 × 1. The encoder is a layer of 32 2-D convolutional filters with ReLU activation, followed by a downsampling MaxPooling operation along the spatial dimensions (height and width) that takes the maximum value over an input window of size 2 × 2; another layer of 32 2-D convolutional filters follows, again with a 2 × 2 MaxPooling operation. The latent layer, or bottleneck, holds the transformed image with the smallest dimensions (128 × 128), one-fourth of the input size per side. This small dimension restricts the noise and keeps only the important information of the image, so the AE acts as a speckle filter. The decoder is composed of two layers of transposed convolution, also called "deconvolution," of 32 filters each, followed by one last 2-D convolutional filter of size 3 × 3. The output is a layer that delivers a new grayscale image resulting from the compression and decompression of the AE, a process in which the noise has been reduced. The output has the same size (512 × 512 × 1) as the input of the AE.
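The spatial bookkeeping of this encoder-bottleneck-decoder pipeline can be traced with the shape-only sketch below; the learned 2-D convolutions are omitted, and nearest-neighbour upsampling stands in for the learned transposed convolutions, so this is an assumption-laden illustration of the dimensions, not the trained model:

```python
import numpy as np

def maxpool2(x):
    """2 x 2 max pooling with stride 2: the downsampling step applied
    after each encoder convolution layer."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling, standing in here for the decoder's
    learned transposed convolutions ("deconvolutions")."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.random.default_rng(0).random((512, 512))  # grayscale input clip
e1 = maxpool2(x)    # 256 x 256 after the first conv + pool stage
z = maxpool2(e1)    # 128 x 128 bottleneck, one-fourth of the input per side
d1 = upsample2(z)   # 256 x 256 after the first decoder stage
y = upsample2(d1)   # 512 x 512 output, same spatial size as the input
```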
The model is trained using the Adam optimizer and the BCE loss function according to (2). Other loss functions were considered, but no improvements were obtained. The designed dataset is composed of 3200 images (1600 noisy and 1600 ground truths), but six images were removed from this dataset to build a separate validation set. Once the model is trained, when a new noisy image is fed in, the output is a filtered image.

B. Metrics for SAR Despeckling
The evaluation of despeckling filters can be divided into two categories. The first one (known as referenced assessment) requires a ground truth. The second one, known as referenceless assessment, is the one generally used for actual SAR data.
Among the metrics that require a ground truth, the most used are the mean squared error (MSE), the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM). For actual SAR data (no ground truth available), the most used is the ENL. The M estimator proposed in [23] is a referenceless metric that operates on the ratio image (the pixel-wise division of the original image by the filtered image) and measures both the preservation of the radiometric properties of the ratio image and its statistical properties. A perfect despeckling filter will produce M = 0 (the ratio image resembles pure speckle).
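A minimal sketch of these metrics follows; SSIM and the full M estimator of [23] are omitted for brevity, and the `peak` default is an assumption for 8-bit imagery:

```python
import numpy as np

def mse(ref, img):
    """Mean squared error (referenced metric)."""
    return float(np.mean((ref - img) ** 2))

def psnr(ref, img, peak=255.0):
    """Peak signal-to-noise ratio in dB (referenced metric)."""
    return float(10.0 * np.log10(peak ** 2 / mse(ref, img)))

def enl(region):
    """Referenceless metric: mean^2 / variance over a homogeneous area."""
    return float(region.mean() ** 2 / region.var())

def ratio_image(noisy, filtered, eps=1e-12):
    """Pixel-wise division Y / F.  For an ideal filter, the ratio image
    resembles pure speckle: unit mean and variance close to 1/ENL."""
    return noisy / (filtered + eps)
```

Checking the mean and variance of the ratio image is exactly the radiometric-preservation test of step 1 of the new protocol.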

C. Despeckling Results
Some of the results of training the two DL models and applying the FANS filter over the two datasets, including ratio images, are reported in Table I; the best outcomes for each sample with every type of noise are highlighted in boldface across all performance metrics. The FANS filter performed better than the AE on the synthetic type of noise, even though the results of the AE were very close, especially in PSNR and SSIM. The AE performed best on the actual SAR images in all cases for the metrics ENL, PSNR, MSE, and SSIM. M was lower with FANS over synthetic noise in only one of the three samples with respect to the AE.
To validate the effectiveness of the traditional Lee's protocol, three additional samples (namely 4, 5, and 6) were selected from the Land-Use Scene Classification dataset [24].These optical images were corrupted with synthetic speckle noise, where the ENL was set to 16.The denoising outcomes obtained using the FANS method are shown in Fig. 4.
The results of Table II in boldface demonstrate a significant improvement achieved by the FANS filter in the optical case compared with those obtained in Table I. It is evident that the FANS filter outperforms the AE in all cases involving synthetic noise. These findings indicate that the FANS filter is well-suited for synthetically corrupted images; however, it falls short in accurately modeling and denoising the speckle in actual SAR images. The AE trained with the "SAR dataset" learned from the actual SAR images, including the speckle, and effectively removed it. As a result, the AE significantly improves the metrics ENL, PSNR, MSE, and SSIM over actual SAR images, as was shown in Table I.

VI. CONCLUSION AND FUTURE WORK
The labeled "SAR dataset" was designed by creating two folders: one with noisy images and another with ground-truth images. An AE with default parameters was trained and its results were analyzed, finding that the evaluated metrics (ENL, PSNR, MSE, SSIM, and the M estimator) were significantly improved over SAR images compared with a filter using the traditional synthetic approach.
In general, the best performance over SAR images was obtained from the "SAR model," so we recommend stopping the use of the synthetic approach and starting the use of actual SAR images for all the filter validations whenever the multitemporal images are available.
This framework shows promising results and opens the possibility for more researchers to design and evaluate their own labeled datasets to train DL-based models for image despeckling and obtain much better results in SAR applications. The structure of the AE can be taken as a baseline for future DL-based despeckling models, for example, by optimizing its hyperparameters or improving its structure.

Fig. 1. Actual SAR image and averages of 5 and 10 images, respectively (from left to right), with a zoom of a 20 × 20 window in red rectangles of a homogeneous area.

Fig. 3. Comparison of validation images (top to bottom): actual SAR, generated ground truth, ground truth corrupted with synthetic noise (ENL = 16), synthetic image denoised with FANS and ratio images, and SAR image denoised by the AE trained with actual SAR images and ratio images.

TABLE II. ENL, PSNR, MSE, SSIM, AND M-ESTIMATOR OF THREE OPTICAL IMAGES DENOISED WITH THE FANS FILTER ACCORDING TO LEE'S PROTOCOL