Robust Attention Deraining Network for Synchronous Rain Streaks and Raindrops Removal

Synchronous Rain streaks and Raindrops Removal (SR^3) is a challenging task, since rain streaks and raindrops are two widely divergent real-world phenomena with different optical properties and mathematical distributions. As such, most existing data-driven deep Single Image Deraining (SID) methods only focus on one of them. Although a few SR^3 methods exist, they still suffer from blurred textures and unknown noise in real scenes due to weak robustness and generalization ability. In this paper, we propose a new and universal SID model with novel modules, termed Robust Attention Deraining Network (RadNet), whose strong robustness and generalization ability are reflected in two main aspects: (1) RadNet can restore different rain degenerations, including raindrops, rain streaks, or both; (2) RadNet can adapt to different data strategies, including single-type, superimposed-type, and blended-type. The generalization ability is also demonstrated by the performance on real rain images. Specifically, we first design a lightweight and robust attention module (RAM) with a universal attention mechanism for coarse rain removal, and then present a new deep refining module (DRM) with multi-scale blocks for precise rain removal. To handle the inconsistent labels of real scenario data, we also introduce a flow & warp module (FWM) into the network, which greatly improves the performance on real scenario data via optical flow prediction and alignment. The whole process is unified in one network to ensure sufficient robustness and strong generalization ability. We evaluate the performance of our method under a variety of data strategies, and extensive experiments demonstrate that RadNet outperforms other state-of-the-art SID methods.


Introduction
Rain removal is an important task in the low-level image restoration field, since rain can affect outdoor computer vision and multimedia computing tasks. In reality, rain captured by cameras, surveillance systems, and mobile devices has two major forms, i.e., rain streaks and raindrops. Due to their different optical properties and mathematical distributions, rain streak removal and raindrop removal are usually treated as two different tasks. Since rain streaks appear more frequently in reality, more deep SID methods have been proposed for rain streak removal. However, rain streak removal models cannot be well generalized to the raindrop removal task, and vice versa. To address the difference between the two tasks, researchers have recently proposed a new task called synchronous rain streaks and raindrops removal (SR^3) [14], which aims to remove both via a unified convolutional neural network (CNN). Next, we describe the development of these three tasks.

Rain streak removal
Rain streaks form a rain layer in front of the true scene. Heavy rain may additionally create a haze-like atmosphere due to light scattering, making images blurred and hazy. Separating the rain layer from the true image is an intuitive way to solve this task. The rain streak removal problem can be modeled as
$$O = B + S, \quad (1)$$
where O denotes a rain image that is decomposed into a rain streak component S and a clean background B. This model is widely used in current SID methods [16,19,21,24].
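To make the formulation concrete, the following minimal PyTorch sketch composes a rainy image under Eq. (1); the tensors are random placeholders rather than real data:

```python
import torch

# Eq. (1): a rainy image O is the sum of a clean background B and a
# rain streak layer S (all tensors in [0, 1], shape [C, H, W]).
B = torch.rand(3, 128, 128)          # clean background (placeholder)
S = torch.rand(3, 128, 128) * 0.3    # low-intensity streak layer (placeholder)
O = (B + S).clamp(0.0, 1.0)          # observed rainy image

# Under this model, deraining amounts to estimating S (or B directly):
B_hat = (O - S).clamp(0.0, 1.0)      # exact recovery when S is known
```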

Raindrop removal
Raindrop degradation occurs when a raindrop region is formed by rays of light reflected from a wider environment, so that it contains different imagery from the raindrop-free regions. In most cases, the camera is focused on the background scene, making the raindrops appear blurred. The raindrop removal process is modeled as
$$O = (1 - M) \odot B + D, \quad (2)$$
where M is a binary mask: M(x) = 1 means that pixel x is part of a raindrop region, and otherwise it is part of the background region; ⊙ denotes element-wise multiplication. D is the effect brought by the raindrops, representing the complex mixture of the background information and the light reflected by the environment that passes through the raindrops adhered to a lens or windscreen. This model is also frequently used in current raindrop removal methods, such as [13,15].
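A matching sketch of Eq. (2); since D is only defined inside raindrop regions, we localize it explicitly with the mask (an assumption of this illustration):

```python
import torch

# Eq. (2): raindrop regions (M == 1) show the raindrop effect D,
# the remaining pixels show the clean background B.
B = torch.rand(3, 128, 128)                   # clean background (placeholder)
D = torch.rand(3, 128, 128)                   # raindrop effect (placeholder)
M = (torch.rand(1, 128, 128) > 0.9).float()   # binary mask, ~10% raindrop cover
O = (1.0 - M) * B + M * D                     # observed raindrop image
```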

Synchronous rain streak and raindrop removal
When raindrops and rain streaks appear synchronously in the same image, they affect each other and make the restoration task more complicated, owing to the different optical properties of rain streaks and raindrops. For deep learning, it is difficult to treat such mixed data with different distributions in a unified network, which may make the network hard to converge or make it perform unevenly (i.e., better on either rain streaks or raindrops). The SR^3 task is modeled as
$$O = \alpha \left( (1 - M) \odot (B + S) \right) + D, \quad (3)$$
where α is a global atmospheric lighting coefficient. This model is used in the recent SR^3 method called complementary cascaded network (CCN) [14]. CCN tackled this hybrid task in a uniform network via a two-branch and two-stage strategy. However, the robustness and generalization ability of this divide-and-conquer strategy may be limited in real scenarios, due to the independent setup of the networks and the difficulty of task disentanglement.
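The sketch below combines the two degradations as in Eq. (3) (placeholder tensors again; the exact composition used by CCN may differ in detail):

```python
import torch

# Eq. (3): streaks S are added to the background B, raindrop regions
# are overwritten by D, and alpha globally scales atmospheric lighting.
B = torch.rand(3, 128, 128)
S = torch.rand(3, 128, 128) * 0.3
D = torch.rand(3, 128, 128)
M = (torch.rand(1, 128, 128) > 0.9).float()
alpha = 0.9                                   # global atmospheric lighting

O = (alpha * (1.0 - M) * (B + S) + M * D).clamp(0.0, 1.0)
```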

Our contributions
Due to the differences between rain streaks and raindrops in optical properties and mathematical modeling, it is difficult to remove them at the same time. However, viewing SR^3 as a simple combination of rain streak and raindrop removal is unsound, since removing the rain streaks first and then removing the raindrops cannot recover the image completely. Since the image degradation caused by rain streaks and raindrops is entangled, artificially forcing them apart leads to unpleasant results (see Figure 1). Overall, the main contributions of this paper are summarized as follows:
• To improve the robustness of the SR^3 task against different degenerations and data strategies, we propose a hybrid SID network, termed Robust Attention Deraining Network (RadNet), with robust attention. We first design a lightweight network with a universal attention mechanism for coarse rain removal, and then propose a deep neural network with multi-scale blocks for precise rain removal.
• We propose a new and robust attention module (RAM), which can not only restore different degenerations, including raindrops, rain streaks, or both, but can also adapt to different data strategies, including single-type, superimposed-type, and blended-type.
• In addition to strong robustness on benchmark data, RadNet generalizes well to real scenarios, achieving good rain removal while avoiding excessive blur and artificial noise.
• Extensive experiments on both benchmarks and real-world data demonstrate the effectiveness of our method. A reconstruction advantage of up to 1dB-4dB in terms of PSNR is obtained compared with the SOTA SR^3 method CCN.

Related Work
In this section, we review some representative SID methods on rain streak removal, raindrop removal, and SR^3.
Rain Streak Removal Methods. Most existing deep SID methods handle the task of rain streak removal. For example, JORDER [24] develops a multi-task deep learning architecture that learns the binary rain streak map, the appearance of rain streaks, and the clean background. PReNet [16] provides a better and simpler baseline deraining network by considering the network architecture, input, output, and loss functions. Recently, RCDNet [18] proposes a rain model and utilizes the proximal gradient descent technique to design an iterative algorithm containing only simple operators for solving the model.
Raindrop Removal Methods. To restore an image from a corrupted input, the model of [4] adopts adversarial training with generative and discriminative networks. A double attention mechanism is also introduced in [15], which concurrently guides the CNN by shape-driven attention and channel re-calibration.
Synchronous Removal Methods. Rain streaks and raindrops are two different phenomena with a large difference between their distributions. Hence, removing both of them through a unified CNN is challenging. Recently, CCN [14] was proposed to remove rain streaks and raindrops in a complementary fashion. CCN uses neural architecture search (NAS) to adaptively find an optimal architecture. The authors also construct a new rain dataset, RainDS, which includes rain images of different types and their corresponding rain-free ground truth, covering rain streaks only, raindrops only, and both. However, a two-stage, two-branch design is not a robust way to perform synchronous deraining. Besides, NAS can only optimize the results to a certain extent, and the overall rain removal performance is determined by the structure and properties of the network itself. Therefore, we propose a robust model to remove rain streaks and raindrops synchronously in a truly unified CNN framework for better generalization ability.

Proposed SR^3 Method
In this section, we introduce the proposed RadNet framework in detail (see Figure 2). As can be seen, there are two primary parts, i.e., the Robust Attention Module (RAM) and the Deep Refining Module (DRM). RAM attends to different rain degradation phenomena with a robust attentional mechanism and produces coarse rain removal results. DRM uses a multi-stream CNN to perform refined rain removal. A hybrid loss constrains the whole network.

Robust Attention Module (RAM)
LSTM [3] has been demonstrated to be very useful in many computer vision tasks, e.g., image deraining and dehazing. Following [13,16,26], we also use the LSTM unit and the residual block [6] to construct our robust attention module RAM, which can attend to different rain degradation phenomena under different data strategies. Examples of the attention maps extracted by RAM are shown in Figure 3. We see that RAM can focus on most regions with rain streaks, raindrops, or both. We subtract the attention map from the original rain image to obtain the coarse rain removal result, which is used as the input of DRM. RAM can thus reduce the abominable influence caused by the divergent distributions of the two degradations. The LSTM unit is formulated as
$$i_t = \sigma(W_i * [X_t, H_{t-1}] + b_i),$$
$$f_t = \sigma(W_f * [X_t, H_{t-1}] + b_f),$$
$$g_t = \varepsilon(W_g * [X_t, H_{t-1}] + b_g),$$
$$o_t = \sigma(W_o * [X_t, H_{t-1}] + b_o),$$
$$c_t = f_t \odot c_{t-1} + i_t \odot g_t,$$
$$H_t = o_t \odot \varepsilon(c_t), \quad (4)$$
where X_t denotes the streak maps obtained by the prepositive t-stage Conv + ReLU unit, c_t denotes the cell state that is fed to the LSTM unit in the next stage, H_t is the output of the current LSTM unit and is sent to the next Conv + ReLU unit, [·] is the concatenation operation, and σ and ε are the sigmoid and tanh functions, respectively.
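A minimal PyTorch sketch of such a recurrent attention unit is given below. It follows the standard convolutional LSTM form of Eq. (4); the module and parameter names are ours, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ConvLSTMAttention(nn.Module):
    """Recurrent attention in the spirit of RAM: a Conv+ReLU extractor
    followed by a convolutional LSTM (Eq. (4)) whose hidden state is
    projected to a single-channel rain attention map."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.channels = channels
        self.feat = nn.Sequential(nn.Conv2d(3 + 1, channels, 3, padding=1),
                                  nn.ReLU())
        # One convolution emits all four LSTM gates (i, f, g, o) at once.
        self.gates = nn.Conv2d(channels * 2, channels * 4, 3, padding=1)
        self.to_map = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, rain_img, n_steps: int = 4):
        b, _, h, w = rain_img.shape
        H = rain_img.new_zeros(b, self.channels, h, w)  # hidden state H_t
        c = rain_img.new_zeros(b, self.channels, h, w)  # cell state c_t
        attn = rain_img.new_zeros(b, 1, h, w)
        for _ in range(n_steps):
            x = self.feat(torch.cat([rain_img, attn], dim=1))      # X_t
            i, f, g, o = self.gates(torch.cat([x, H], dim=1)).chunk(4, dim=1)
            c = f.sigmoid() * c + i.sigmoid() * g.tanh()           # cell update
            H = o.sigmoid() * c.tanh()                             # H_t
            attn = torch.sigmoid(self.to_map(H))                   # attention map
        # Subtracting the map from the input gives the coarse result fed to DRM.
        return (rain_img - attn).clamp(0.0, 1.0), attn
```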

Deep Refining Module (DRM)
In our model, RAM can be regarded as a lightweight network that produces coarse rain removal results. Hence, we need a more sophisticated network (namely, DRM) with stronger learning ability to remove the degraded remnant content afterwards. Traditional deep networks use simple residual blocks [6] and dense blocks [7] to extract features, but for the complex SR^3 task, these blocks cannot extract appropriate features from complex data strategies. Recently, a new block called the Dual Path Residual Dense Block (DPRDB) was proposed in [23], which can jointly discover new features and reuse features, and has been proved to be very effective in image restoration tasks with complex distributions. We therefore use the DPRDB as our basic block to construct a three-branch network for in-depth deraining, since multi-scale features extracted with different kernel sizes can capture different receptive-field information hidden in the rain images.
The structure of DRM is illustrated in Figure 2, where each branch contains five DPRDBs. The kernel sizes of the three branches are set to 3, 5, and 7, respectively. To avoid information loss in the deep network, we use feature shortcuts between blocks. After that, we use a Conv + ReLU operation to obtain the final refined rain removal results. An example of the refined deraining result produced by DRM is shown in Figure 3. We see that DRM can remove most degraded regions and obtain the final clear output.
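The following simplified sketch shows the three-branch multi-scale layout. For brevity we substitute plain residual blocks for the DPRDB (whose internals are defined in [23]); all names are ours:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Stand-in for the DPRDB: a residual block with a configurable kernel."""
    def __init__(self, ch: int, k: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, k, padding=k // 2), nn.ReLU(),
            nn.Conv2d(ch, ch, k, padding=k // 2))

    def forward(self, x):
        return x + self.body(x)  # feature shortcut between blocks

class MultiScaleDRM(nn.Module):
    """Three parallel branches with kernel sizes 3, 5, 7 (five blocks each),
    fused by a final Conv+ReLU, following the DRM layout in Figure 2."""
    def __init__(self, ch: int = 32):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.branches = nn.ModuleList(
            nn.Sequential(*[ResBlock(ch, k) for _ in range(5)])
            for k in (3, 5, 7))
        self.tail = nn.Sequential(nn.Conv2d(ch * 3, 3, 3, padding=1), nn.ReLU())

    def forward(self, coarse):
        f = self.head(coarse)
        fused = torch.cat([branch(f) for branch in self.branches], dim=1)
        return self.tail(fused)  # refined rain removal result
```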

Loss Function
Due to the success of the perceptual loss [10] in image restoration [11,13,25], we also use it as a part of our loss function to maintain global features. Besides, since the goal is to make the output derained image approximate its corresponding ground truth, we directly use the SSIM loss as the other part to maintain pixel-level features. The total loss function can preserve the per-pixel similarity as well as the global structures:
$$\mathcal{L} = \mathcal{L}_p + \lambda \mathcal{L}_{ssim}, \quad (5)$$
where L_p denotes the perceptual loss that minimizes the difference between perceptual features, and L_ssim denotes the SSIM loss measuring the similarity between two images; λ is a trade-off parameter. In our method, we use the layers ReLU2_2 and ReLU3_3 of the VGG-16 model [17] as the perceptual feature extraction function.
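A sketch of this hybrid loss under the definitions above. We use torchvision's VGG-16 and the third-party pytorch_msssim package as stand-ins, take 1 − SSIM as the SSIM loss, and map relu2_2/relu3_3 to torchvision feature indices 8 and 15 (all of these are our assumptions, not details from the paper):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16
from pytorch_msssim import ssim  # third-party SSIM implementation

class HybridLoss(torch.nn.Module):
    """L = L_p + lambda * L_ssim, cf. Eq. (5)."""
    def __init__(self, lam: float = 1.0):
        super().__init__()
        self.lam = lam
        feats = vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in feats.parameters():
            p.requires_grad_(False)
        # feats[:9] ends at relu2_2, feats[:16] at relu3_3 (our mapping);
        # ImageNet input normalization is omitted for brevity.
        self.slices = torch.nn.ModuleList([feats[:9], feats[:16]])

    def forward(self, pred, gt):
        l_p = sum(F.mse_loss(s(pred), s(gt)) for s in self.slices)
        l_ssim = 1.0 - ssim(pred, gt, data_range=1.0)
        return l_p + self.lam * l_ssim
```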

Experiments
In this section, we compare our method with other state-of-the-art SID methods under different data strategies.

Experiment Setup
Baselines. We divide SID methods into three categories and test them under different data strategies:
• Rain streak removal methods, i.e., DetailNet, RESCAN, PReNet, and DRD-Net.
• Raindrop removal method, i.e., AttentGAN [13].
• SR^3 methods, i.e., Pix2pix and CCN [14].
Data strategies. Three data strategies are considered:
• Single-type data contain one type of rain degeneration in a single image, i.e., RS syn, RS real, RD syn, and RD real.
• Superimposed-type data contain two types of rain degeneration in a single image, i.e., RDS syn and RDS real.
• Blended-type data contain two types of rain degeneration both in the same and in different images, i.e., RD syn + RS syn + RDS syn, RD real + RS real + RDS real, and Rain200H + Rain200L + RainDrop.
Generalization evaluation. We choose two real scenario datasets (i.e., SIRR-Real [20] and Real200 [22]) as the benchmark data to examine the generalization ability: • SIRR-Real data contain 147 real scenario images.
• Real200 data contain 200 real scenario images.
Implementation Details. We use the PyTorch platform in a Python environment on an NVIDIA GeForce GTX 1080Ti GPU with 12GB memory. Adam is used as the optimizer with an initial learning rate of 1e-3, which is decayed by multiplying it by 0.2 every 30 epochs. The batch size is 16 and each image is randomly cropped to 128×128 pixels. We train for 100 epochs to make the network converge. The trade-off parameter λ in Eqn. (5) is set to 1. Two metrics are used for evaluation, i.e., the Peak Signal-to-Noise Ratio (PSNR) [8] and the Structural Similarity Index (SSIM) [1].
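A sketch of this optimization setup (the model, loss, and data loader below are hypothetical placeholders, e.g., the MultiScaleDRM and HybridLoss sketches from earlier sections):

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

model = MultiScaleDRM().cuda()           # placeholder network sketch
criterion = HybridLoss(lam=1.0).cuda()   # lambda = 1 as in Eqn. (5)

optimizer = Adam(model.parameters(), lr=1e-3)
scheduler = StepLR(optimizer, step_size=30, gamma=0.2)  # x0.2 every 30 epochs

for epoch in range(100):                 # 100 epochs until convergence
    for rainy, clean in loader:          # DataLoader of 16 random 128x128 crops
        loss = criterion(model(rainy.cuda()), clean.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```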

Robustness Evaluation Results
Single-type data results. The results are described in Table 1, from which we see that our method obtains better performance on both rain streak and raindrop data. Specifically, compared with the rain streak removal methods (i.e., DetailNet, RESCAN, PReNet, and DRD-Net), our method achieves the best evaluation results, especially on RS syn data. Furthermore, when handling the raindrop removal task, we also obtain the best performance compared with the related raindrop removal method (i.e., AttentGAN). For a fairer comparison, we compared our method with two SR^3 methods (i.e., Pix2pix and CCN). We find that our method obtains up to a 4dB PSNR advantage over CCN on RS syn and 2dB on RD syn. Finally, we also illustrate some results on RS real and RD real in Figures 4 and 5, from which one can see that our method removes more degradation and recovers a clearer background.
Superimposed-type data results. The restoration task under this data strategy is more difficult, as it requires processing rain streaks and raindrops synchronously. From the numerical results in Table 2, we see that our model obtains up to a 2dB PSNR advantage over CCN on RDS syn. However, all methods perform poorly on RDS real data. This is mainly because the image pairs of the RDS dataset collected from real scenarios do not correspond at the pixel level, which causes problems for all supervised networks. Even in this case, our approach still works effectively. Finally, we also illustrate some results from RDS real in Figure 6, from which we see that our method can remove most of the rain degradation under the superimposed-type data strategy.
Blended-type data results. This is the most difficult data strategy in the experiment, since the simultaneous mixing and stacking of data are disastrous for networks with weak generalization ability. However, thanks to RAM and DRM, our method can handle this situation and obtains the best performance. Specifically, our model obtains up to a 2-3dB PSNR advantage over PReNet and RESCAN. Unfortunately, due to the lack of publicly available CCN code, we cannot obtain the corresponding results in this part. Finally, we also illustrate some results from RD real + RS real + RDS real and Rain200H + Rain200L + RainDrop in Figures 7 and 8, which show that our method produces the clearest images.
Remarks. To intuitively observe the performance of all methods, we use Figure 9 to show the degradation of various methods under different data strategies. Specifically, we label the five tests in the superimposed-type and blended-type data strategies as Ti (i = 1, 2, 3, 4, 5). The results in the blue rectangle are the superimposed-type tests, and the results in the orange rectangle are the blended-type tests. We see from the results that our method consistently yields the best results. It is worth noting that all methods degrade on real data, mainly because the image pairs of RDS do not correspond at the pixel level. In the future, we will study alignment techniques to solve this problem.

Generalization Evaluation Results
Testing on real scenario images is an important way to measure the generalization of a model. Since there are no ground-truth images for comparison, we list some randomly selected results in Figure 10. It can be seen from the results that our method has the best rain removal effect: it not only removes more rain streaks, but also avoids the ambiguity caused by excessive rain removal and artificial noise.

Ablation Study
In this study, we mainly explore the impact of three factors, i.e., the network modules, the basic block, and the loss function, on the deraining results of our method. Specifically, three questions are considered:
Q1: We select RAM + DRM as the pipeline; is such a combination effective?
Q2: Why choose DPRDB as the basic block, rather than the residual block or the dense block?
Q3: Why adopt a mixture of the perceptual loss and the SSIM loss as the loss function?
To answer these questions, we use the superimposed-type data RDS syn in the ablation study. The results are described in Tables 4, 5, and 6, respectively. Specifically, A1: Our method with the RAM + DRM modules obtains the best performance, and the setting with only the RAM module obtains the worst performance, since the lightweight network focuses on coarse deraining and cannot learn a good mapping from complex data. A2: Our method with DPRDBs obtains the best performance. By combining the advantages of the residual block and the dense block, the DPRDB can reuse previous features and explore new features, leading to stronger learning power. A3: The best performance is obtained by using both L_p and L_ssim. This form of loss function preserves the per-pixel similarity as well as the global structures.

Conclusion
We have explored the robustness and generalization ability in the SR^3 task. Technically, we have proposed a robust and hybrid attention SID network. Compared to recent methods that attempt to solve single-type data, we present a new RAM + DRM pipeline that can handle various data paradigms, including single-type, superimposed-type, and blended-type data. Extensive experiments and comparisons are conducted on synthetic and real rain images to evaluate the deraining performance and generalization ability of our method. The results show that our model outperforms recent related methods. In the future, we will further investigate how to reduce the number of parameters of our method, so that it can be deployed on lightweight devices.