Nonlinear Beamforming Based on Group-Sparsities of Periodograms for Phased Array Weather Radar

—We propose nonlinear beamforming for phased ar- ray weather radars (PAWRs). Conventional beamforming is linear in the sense that a backscattered signal arriving from each elevation is reconstructed by a weighted sum of received signals, which can be seen as a linear transform for the received signals. For distributed targets such as raindrops, however, the number of scatterers is signiﬁcantly large, differently from the case of point targets that are standard targets in array signal processing. Thus, the spatial resolution of the conventional linear beamforming is limited. To improve the spatial resolution, we exploit two characteristics of a periodogram of each backscattered signal from the distributed targets. The periodogram is a series of the powers of the discrete Fourier transform (DFT) coefﬁcients of each back- scattered signal and utilized as a nonparametric estimate of the power spectral density . Since each power spectral density is pro- portional to the Doppler frequency distribution, (i) major components of the periodogram are concentrated in the vicinity of the mean Doppler frequency , and (ii) frequency indices of the major components are similar between adjacent elevations. These are expressed as group-sparsities of the DFT coefﬁcient matrix of the backscattered signals, and we propose to reconstruct the signals through convex optimization exploiting the group-sparsities. We consider two optimization problems which evaluate the group- sparsities in slightly different ways, and both problems are solved with the alternating direction method of multipliers including non- linear mappings. Simulations using real PAWR data show that the proposed method dramatically improves the spatial resolution.

Doppler weather radars [7], [8] transmit a pencil beam and receive backscattered signals within a narrow range of elevation angles. By repeatedly transmitting and receiving the signals at several elevation angles with the mechanical vertical scan, the classical Doppler radar can observe the weather in the sky. On the other hand, PAWRs transmit a fan beam and receive backscattered signals within a wide range of elevation angles by an array antenna. Then, the backscattered signals within the narrow ranges are reconstructed from the received signals of the array antenna by digital beamforming [9]- [20]. This is the key technology in the PAWR because it gets rid of the mechanical vertical scan and hence the temporal resolution is significantly improved in the weather observation. For example, the PAWR developed at Osaka University [5] can observe the weather of a radius 60 kilometers in 30 seconds, while the classical Doppler radar requires 5 to 10 minutes for a similar observation [8].
Major beamforming methods [9]- [19] reconstruct the signal arriving from each elevation as a complex weighted sum of the received signals. In particular, Capon beamforming [12]- [17], also known as the minimum variance distortionless response (MVDR) beamforming [18], is widely used since it can adaptively reduce the influence of so-called sidelobes if a sufficient number of beams (pulses) are transmitted. However, for quick weather observations, the number of pulses should be as small as possible. To deal with such a situation, the minimum mean square error (MMSE) beamforming [19] was proposed. In this method, differently from Capon's method, the covariance matrix of the received signals is iteratively updated, and hence the backscattered signal arriving from each elevation can be reconstructed robustly even if the number of the pulses is small.
Beamforming methods were developed originally for observations of point targets in array signal processing, while targets of the PAWR are distributed targets such as raindrops. In this case, the number of scatterers is significantly large, i.e., much greater than the number of antenna elements. As a result, the spatial resolution of the above linear methods [9]- [19] is limited, and fine fluctuations of the average signal power densities corresponding to precipitation profile have not been captured.
To overcome the limitation of the linear methodology, this paper proposes a nonlinear beamforming method. We formulate the beamforming as an ill-conditioned inverse problem. To solve it, we exploit properties of periodograms of the backscattered signals from the distributed targets. The periodogram is a series of the powers of the discrete Fourier transform (DFT) coefficients of each backscattered signal and often used as a nonparametric estimator for the power spectral density of the 0196-2892 c 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.htmlfor more information. backscattered signal. Since the power spectral density is proportional to the Doppler frequency distribution [7], (i) dominant components of each periodogram are concentrated in the vicinity of the mean Doppler frequency, and (ii) frequency indices of the dominant components are similar between the periodograms of adjacent elevation angles. We express these two properties as group-sparsities of the DFT coefficient matrix of the backscattered signals and propose to reconstruct the DFT coefficient matrix by minimizing a cost function that consists of a data-fidelity term and two group 1 -norms evaluating the group-sparsities. We consider two optimization problems that compute the group 1 -norms in different ways, and both problems are solved by the alternating direction method of multipliers (ADMM) [21]- [24]. Since nonlinear mappings called the proximity operators are used in the ADMM iterations, the proposed beamforming is a nonlinear transform for the received signals. Numerical simulations based on real PAWR data show that the proposed nonlinear beamforming greatly improves the spatial resolution in comparison with the linear beamforming. The rest of this paper is organized as follows. In Section II, after explaining mathematical notation, we introduce the basic principles of the classical Doppler weather radar, and then we formulate the observation model in the PAWR. In Section III, we describe three conventional linear beamforming [11], [12], [19]. Section IV presents the proposed nonlinear beamforming. First, we derive the data-fidelity term in the frequency domain. Second, we explain the group 1 -norms evaluating two groupsparsities of periodograms in Section IV-A and Section IV-B. Third, we propose two convex optimization problems including their ADMM iterations in Section IV-C and Section IV-D. Section V shows the effectivity of the proposed beamforming by simulations, and finally Section VI concludes this paper.

II. PRELIMINARIES
Let R, R + , and C be the sets of all real numbers, nonnegative real numbers, and complex numbers, respectively. We use j ∈ C to denote the imaginary unit, i.e., j := √ −1. For any complex number x ∈ C,x denotes its complex conjugate, and |x| := √ xx denotes its absolute value. We write vectors and matrices by boldface small and capital letters, respectively. We use I n ∈ {0, 1} n×n to denote the identity matrix of order n. The transpose and Hermitian transpose of a vector or a matrix are denoted by (·) T and (·) H , respectively. The 2 -norm (or the Euclidean norm) of a vector x := (x 1 , x 2 , . . . , x n ) T ∈ C n is defined by and a nonnegative vector ξ := (ξ 1 , ξ 2 , . . . , ξ n G ) T ∈ R n G + , by , and x G := (x i ) i∈G is the subvector 2 extracted from x with G. The Frobenius norm of a matrix X := (x i1,i2 ) ∈ C n1×n2 is defined by X F := n1 i1=1 n2 i2=1 |x i1,i2 | 2 . We use E[·] to denote the expected value of some random variable. 1 We also define a weighted group 1 -norm of a matrix X in a similar way. 2 If x G i ≈ 0 for many groups G i , x is called group-sparse. In particular, if all G i consist only of consecutive indices, x is also called block-sparse [25].

A. Basic Principles of Weather Observation by Doppler Radar
Classical parabolic Doppler weather radars transmit pulses for a specific elevation angle and receive backscattered signals generated by the incidence of the transmitted signals to targets (clouds and raindrops). For a fixed range r [m], the lth discretetime sample x l ∈ C of the received signal is expressed as c , L is the number of the pulses, x l ∈ C is the sample of the backscattered signal x(t) generated by the pulse transmitted at t = (l − 1)T , and ε l ∈ C is additive white Gaussian noise of variance E[|ε l | 2 ] = σ 2 ε . We consider x(t) to be one sample realization of a continuous-time random process X(t) which is wide-sense stationary, 3 ergodic 4 and zero-mean, i.e., µ x := E[X(t)] = 0, and whose autocorrelation function Strictly speaking, x(t) is expressed as the sum of individual backscattered signals in the scattering resolution volume [7]: where a s (t) > 0 is the scattering amplitude of the sth particle, f s [Hz] is called the Doppler frequency, which is independently generated from a common probability density function q(f ), and φ s [rad] is called the phase shift, which is uniformly distributed from [−π, π) under the assumption that precipitation profile is uniform within the scattering resolution volume. The power spectral density of the backscattered signal x(t) is given by the Fourier transform of the autocorrelation function R x (τ ): Suppose that the amount of rainfall is constant and the shapes of all raindrops are the same, i.e., ∀s ∀t a s (t) = a, during the observation. As a result, we have R x (τ ) = E[a 2 s e j2πfsτ ], and the power spectral density S(f ) in (2) can be expressed as In (3), δ(f ) is the delta function, P : is the carrier wavelength, and q v : R → R + is a common probability density function for each Doppler velocity v s = λfs 2 [m/s], where the direction of approach to the radar is defined as the positive direction of the Doppler velocity. From (3), we can see that the power spectral 3 Let X(t) be a continuous-time random process. X(t) is called wide-sense stationary or weak-sense stationary if its mean and autocorrelation function do not vary by shifts in the time t, i.e., ∀t E[X(t)] = C xq(x|t) dx = µx and ∀t ∀τ E[X(t + τ )X(t)] = C 2 x 1x2 q(x 1 , x 2 |t + τ, t) dx 1 dx 2 = Rx(τ ), where q(x|t) is the probability density function of x(t) and q(x 1 , x 2 |t+τ, t) is the joint probability density function of x 1 := x(t + τ ) and x 2 := x(t). 4 A wide-sense stationary process X(t) is called ergodic if any sample realization x(t) of the process X(t) satisfies lim T →∞ density S(f ) is proportional to the Doppler frequency distribution q(f ) [7]. On the Doppler frequency, we call its mean the mean Doppler frequency, and we call its standard deviation the Doppler frequency width. Similarly, we call µ v = λµ f 2 the mean Doppler velocity and call σ v = λσ f 2 the Doppler velocity width. The main purpose of the Doppler weather radars is to estimate P , µ f (or µ v ) and σ f (or σ v ) from the measurements x l in (1), since these parameters correspond to the amount of rainfall, the mean wind speed and the variation of wind speeds.
As shown in (3), S(f ) has information about P and q(f ). However, the backscattered signal x(t) is observed in discretetime, and we have to consider the power spectral density of x l that is given by the discrete-time Fourier transform of R x (lT ): (4) with finite samples x l (l = 1, 2, . . . , L), we suppose that the number L is even, and we compute the normalized DFT coefficients u k ∈ C of x l by x l e −j 2π(k−1−L/2)(l−1) L (k = 1, 2, . . . , L). (5) Then, we compute a periodogram as a series of the powers of the normalized DFT coefficients u k : x l e −j 2π(k−1−L/2)(l−1) L 2 (k = 1, 2, . . . , L).
(6) The expected value of the periodogram (|u k | 2 ) L k=1 is given by From (4), (7), and Therefore, the periodogram is often used as a nonparametric estimator for S d (f ). Moreover, the parameters P , µ f and σ f can be estimated from the periodogram (|u k | 2 ) L k=1 , e.g., by

B. Observation Model in Phased Array Weather Radar
First of all, we show the observation model for K point targets. Let a PAWR have an N -element uniform linear array with the inter-element spacing d [m]. A plane wave signal scattered from the κth point target hits on the array antenna at an unknown angle θ κ ∈ [− π 2 , π 2 ], where θ 1 < θ 2 < · · · < θ K [rad]. The lth time sample y l ∈ C N of the received signal is given by where x κ,l ∈ C is the lth time sample of the κth plane wave signal, a(θ κ ) ∈ C N is the so-called steering vector defined by and ε l ∈ C N is additive white Gaussian noise, independent of the signals x κ,l , of covariance matrix R ε := E[ε l ε H l ] = σ 2 ε I N . On the other hand, our targets such as raindrops are called distributed targets [19], that are supposed to exist continuously (strictly speaking, a sufficient number of raindrops exist within the antenna beamwidth). In this case, the number K of scatterers (raindrops) is too large, and hence the observation model in (9) is no longer appropriate. We observe the distributed targets by dividing the whole angular interval [θ min , θ max ] ⊂ [− π 2 , π 2 ] into M subintervals [θ m − ∆θ 2 , θ m + ∆θ 2 ] (m = 1, 2, . . . , M ), where ∆θ := θmax−θmin M and θ m := θ min + (m − 1 2 )∆θ. Thus, instead of (9), we use the following observation model x m,l a m + ε l = Ax l + ε l (l = 1, 2, . . . , L), (10) where x m,l ∈ C is the lth sample of the sum of backscattered plane wave signals in the mth subinterval [θ m − ∆θ 2 , θ m + ∆θ 2 ], x l := (x 1,l , x 2,l , . . . , x M,l ) T ∈ C M , a m := a(θ m ) ∈ C N , and A := [a 1 a 2 · · · a M ] ∈ C N ×M . Note that the signal x l in (1) corresponds to x m,l in (10) for a specific elevation angle θ m . In the PAWR system, θ m is the mth elevation angle, and the average signal power density vector p := (P 1 , P 2 , . . . , P M ) T := E |x 1,l | 2 , E |x 2,l | 2 , . . . , E |x M,l | 2 T ∈ R M + corresponds to precipitation profile along the elevation axis.

III. CONVENTIONAL METHODS: LINEAR BEAMFORMING
The beamforming is a reconstruction problem of x l in (10). Major beamforming methods [9]- [19] estimate x l by multiplying complex weights w m ∈ C N (m = 1, 2, . . . , M ) and y l as In this paper, we call the methodology based on (11) the linear beamforming. Note that the least squares (LS) method does not necessarily work well, even if N ≥ M , since A is illconditioned when ∆θ is smaller than the antenna beamwidth that is determined by the antenna size, where A † ∈ C M ×N is the Moore-Penrose pseudoinverse of A. In the following, we explain three linear methods, Fourier (FR) beamforming [11], Capon (CP) beamforming [12], and MMSE beamforming [19].

A. FR Beamforming
FR beamforming [11] is the most basic linear method and its complex weight vector is defined by independently of y l . The vector w FR,m is equal to a matched filter that maximizes the signal-to-noise ratio . If N ≥ M and there exists some positive integer n satisfying ∀m d λ (sin θ m+1 −sin θ m ) ≈ n N , then FR beamforming can reconstruct the signal x l in the manner of the inverse DFT, but this ideal condition is usually not met. In the typical case, the average signal power density P m is often overestimated [19]. To explain this fact, we derive the observation model as in (9) from (10) by redefining K (≤ M ) as the number of the subintervals [θ m − ∆θ 2 , θ m + ∆θ 2 ] in which scatterers (raindrops) exist, and θ κ as the center of such a subinterval. Then, from is overestimated by (8): for many elevation angles θ m due to the influence of sidelobes, i.e., the influence of

B. CP Beamforming
CP beamforming [12], that is also known as MVDR beamforming [18], is a data-dependent method, differently from FR beamforming. To avoid the overestimation, CP beamforming minimizes the signal power 1 denotes the sample covariance matrix of the zero-mean random vector y l . The weight vector w CP,m is defined as the solution to the following optimization problem and given by if L ≥ N (strictly speaking, if rank( R y ) = N ). In particular, if N ≥ K + 1 and L is sufficiently large, then we have for all m since ∀θ κ = θ m |w H CP,m a(θ κ )| ≈ 0 holds. However, if L is not large, each average signal power density is underestimated [19]. Moreover if L < N , R −1 y cannot be computed.

C. MMSE Beamforming
MMSE beamforming [19] was developed to robustly reconstruct the signal x l in case of small L. This method approximately solves the following optimization problem Then the exact solution to the optimization problem is given by where R y := E[y l y H l ] is the covariance matrix of y l . By using the covariance matrix R where denotes the Hadamard product. As a result, the weight vector w MMSE,m in (15) can be approximated, from the initial estimate x becomes sufficiently small. In this method, even for small L, R y and x l are stably estimated. However, if K is close to N or larger than N , then the estimation accuracy degrades, i.e., fine fluctuations of the average signal power densities cannot be captured, because each vector w m can only create at most N −1 null directions and the influence of the sidelobes occurs.

IV. NONLINEAR BEAMFORMING VIA CONVEX OPTIMIZATION BASED ON GROUP-SPARSITIES
In this section, we propose a nonlinear beamforming method based on convex optimization. First, we gather x l and y l into Then the beamforming can be translated into a reconstruction problem of X from Y , and the data-fidelity for the measurements y l in (10) is evaluated by For each discrete-time backscattered signal x m at θ m , define the normalized DFT coefficients u k,m ∈ C (k = 1, 2, . . . , L) as (5).
The reconstruction of X and that of U are essentially equivalent due to the unitarity of F , and the weather parameters can be computed from U with (8). Thus in this paper, we consider the beamforming as a reconstruction problem of U from Y . We have X T = F H U , and (17) can be transformed into from the unitarity of F . In the following, after describing two group-sparsities of U , we estimate U by solving convex optimization problems based on (18) and the two group-sparsities (see Footnote 2 for the meaning of the word "group-sparse").

A. Narrow Bandwidth of Each Power Spectral Density
As mentioned in Section II-A, the power spectral density of the backscattered signal x(t) is proportional to the Doppler frequency distribution. In addition, the Doppler frequency distribution and the Doppler velocity distribution are essentially the same (see (3)). Thus, unless the Doppler velocity width σ v is extremely large, the bandwidth of the power spectral density, similar to the Doppler frequency width σ f , is narrow compared to the sampling frequency 1 T [Hz]. In particular, when the influence of the atmospheric turbulence is large, 5 the power spectral density can be modeled by a Gaussian function [7], [26], [27]: In this model, the autocorrelation function is exactly given by Figure 1 illustrates the relation between S(f ) in (2), S d (f ) in (4), and the periodogram (|u k | 2 ) L k=1 in (6) using (19) and (20). As shown in Fig. 1, the frequency indices k of large E[|u k | 2 ] are gathered in the vicinity of the mean Doppler frequency µ f .
Since E[|u k | 2 ] ≈ 0 implies u k ≈ 0 and the above discussion holds for every elevation θ m , each DFT coefficient vector u m will be group-sparse by an appropriate non-overlapping group 5 See, e.g., [26] for a detailed relation between the atmospheric turbulence intensity and the Doppler velocity distribution at each elevation angle. partition, but the appropriate partition is unknown in advance. Instead, we divide u m into L overlapping groups of size b f : under the periodic boundary condition. Note that we connect u L,m and u 1,m in z Then, z 1 k,m ≈ 0 holds for many k and m. Therefore, the matrix Z 1 will be group-sparse by a non-overlapping partition This property can be evaluated by a weighted group 1 -norm where is a nonnegative weight matrix.

B. Similarity between the Adjacent Power Spectral Densities
In the previous section, we focused on the narrow bandwidth of the power spectral density for each elevation angle θ m . In this section, we explain the similarity between the power spectral densities at adjacent elevation angles θ m and θ m+1 . When ∆θ is relatively small, i.e., M is set to a relatively large value to acquire the high-resolution precipitation profile, the mean Doppler velocity µ v and the Doppler velocity width σ v at θ m and those at θ m+1 are expected to be similar. As a result, the power spectral density at θ m and that at θ m+1 are similar.
under the periodic boundary condition. By arranging all z Then z 2 m,k ≈ 0 holds for many m and k. Therefore, the matrix Z 2 will be group-sparse by a non-overlapping partition , and there exists a simple matrix B e ∈ {0, 1} beM ×M satisfying B e U T = Z 2 . This property can be evaluated by a weighted group 1 -norm where Ξ 2 := (ξ Remark 1 (Difference from Our Previous Work [20]): In our previous work [20], we considered the beamforming as a reconstruction problem of X, and used the same group 1 1,Ξ1 as in (21) for the narrow bandwidth of each power spectral density. On the other hand, we did not exploit the similarity of the adjacent power spectral densities but use the continuity of precipitation profile, i.e., we assumed that if x m ≈ 0, then it is highly possible that x m+1 ≈ 0. This was evaluated by a weighted group 1 -norm 1,ξ0 will automatically be small. Thus, we do not use our previous cost Z 2 G0 1,ξ0 in this paper.

C. The Proposed Nonlinear Beamforming (Formulation I)
Based on the data-fidelity term in (18) and the two weighted group 1 -norms (21) and (22), we reconstruct U from Y by solving a convex optimization problem with the use of ADMM [21]- [24] (see Appendix for details of ADMM). By defining two convex functions g and h in (40) as (23) is expressed as an ADMM-form: where T is the transpose operator, • denotes the composition of two mappings, and L : The ADMM iterations for (24) are given as follows. On the first line in (41), since the function g(Z 1 , Z 2 ) is divided into two parts Z 1 G1 1,Ξ1 and Z 2 G2 1,Ξ2 , Z 1 and Z 2 are updated by Z with the use of the proximity operators of the group 1 -norms in (42), where γ > 0 and V 1 ∈ C b f L×M and V 2 ∈ C beM ×L are dual variables. Then, on the second line in (41), we have On the third line in (41), since U is updated as the solution to a least squares problem, the solution U (i+1) has to satisfy with the adjoint operator L * : Moreover, the composite mapping L * • L can be computed by By substituting (28) into (27), we have and U (i+1) is obtained by applying (γA TĀ + (b f + b e )I M ) −1 to (29) from the right side. Finally, on the fourth line in (42), V 1 and V 2 are updated, with the use of ρ (i+1) ∈ [0, 2], by By repeating (25), (26), (29) and (30) until a convergence condition is met, the solution to (24) is obtained as an estimate U .

D. The Proposed Nonlinear Beamforming (Formulation II)
In addition to the optimization problem in (23), we propose another optimization problem in which the relation between U and (Z 1 , Z 2 ) is modified to further promote group-sparsities. To increase almost-zero vectors z 1 k,m ≈ 0 and z 2 m,k ≈ 0, i.e., to make Z 1 and Z 2 more group-sparse, we use the latent group lasso formulation [28]- [31]. Specifically, we propose to solve minimize U ,Z1,Z2 where the weight matrix Ξ 2 = (ξ  (31) are constructed by applying B T f and B T e to Z 1 and Z 2 , respectively. Such a latent group lasso formulation can provide more accurate evaluations of the group-sparsity for fixed-size overlapping groups [30].
To solve the problem in (31) with ADMM, we define five auxiliary variables Λ 1 ∈ C b f L×M , Λ 2 ∈ C beM ×L , and Γ n ∈ C L×M (n = 1, 2, 3). Then, define functions g and h in (40) as with the use of an indicator function ι s.t. ι(Γ 1 , Γ 2 , Γ 3 ) := 0 if Γ 1 = Γ 2 = Γ 3 and ι(Γ 1 , Γ 2 , Γ 3 ) := ∞ otherwise. Moreover, by defining an linear mapping L in (40) as the problem in (31) is expressed as an ADMM-form: Note that although the function h(U , Z 1 , Z 2 ) actually depends only on U , we added Z 1 and Z 2 as arguments to make it easier to understand that the problem in (32) as the ADMM-from. The ADMM iterations for (32) are given as follows. On the first line in (41), since g(Λ 1 , 1,Ξ2 , and ι(Γ 1 , Γ 2 , Γ 3 ), the variables Λ 1 and Λ 2 are updated, with the use of some γ > 0, by and Γ n (n = 1, 2, 3) are updated, with the use of the proximity operators of the indicator function in (43), by On the third line in (41), U , Z 1 , and Z 2 can be updated as the solutions to three different least squares problems since the linear mapping L does not mix U , Z 1 and Z 2 . Specifically, from U is updated by Z 1 and Z 2 are respectively updated by and (38) Note that, if we use the matrix inversion lemma [32], [33], then (37) can be computed by By repeating (33)-(39) until a convergence condition is met, the solution to (32) is obtained as an estimate U .

V. NUMERICAL SIMULATIONS A. Simulation Settings
To show the effectiveness of the proposed nonlinear beamforming, we conducted simulations based on the real reflection intensity, observed by the PAWR at Osaka University, in Fig. 2. We conducted the following simulations on the basis of [34]. At the range r = 7.5 [km], we extracted 55 real reflection intensity data between θ min = −15 [deg] and θ max = 30 [deg]. The target reflection intensity Z m [dBZ] at each θ m was generated by cubic spline interpolation of these 55 data followed by adding Gaussian random numbers of variance 9. Note that the random numbers were needed to create the high-resolution precipitation profile. Since the minimum value of the original data in Fig. 2 is 10 [dBZ] in the area where raindrops exist, we defined the average signal power density P m at each θ m by We  (19) as the power spectral density S m (f ) at each θ m , and we defined the mean Doppler frequency by where n m ∈ R is a Gaussian random number of variance 1. The Doppler frequency width at each θ m was simply fixed to σ fm = 4 λ ≈ 125. 7 [Hz] which corresponds to σ vm = 2 [m/s]. The exact autocorrelation function R xm (τ ) of the backscattered signal x m (t) at each θ m can be computed as in (20), and the covariance matrix of each random vector x m is given by Based on [7], [35], each target signal x m was generated from a circularly-symmetric complex Gaussian distribution 6 In (10), the variance of the noise ε l was set to σ 2 ε = 2.5. We set the number of antenna elements to N = 128. For M = 110 and M = 166, we compared the proposed nonlinear beamforming, NL-I in (23) and NL-II in (31), with LS in (12), 7 FR in (13), and MMSE beamforming in (16), in cases of L = 20 and L = 60. Note that CP beamforming in (14) cannot be used since L < N holds in both cases. When M = 110, the number of subintervals where signals exist 8 was K = 93 < N . On the other hand, when M = 166, that was K = 142 > N , and hence it is difficult to estimate X or U . Moreover, in both cases, the condition number of A H A, denoted by cn(A H A) in Table I We evaluated the estimate X or U = F X T of each method by the following normalized mean square error (NMSE).
Moreover, we evaluated the estimated periodograms (| u k,m | 2 ) by the following normalized mean absolute errors (NMAEs). 6 In i.e., the variance does not converge to zero when L approaches infinity [36]. 7 To avoid the numerical instability in the computation of the pseudoinverse A † , we truncated the singular values of A that are smaller than 0.005. 8 We defined K as the number of the indices m s.t. Zm ≥ 10.

B. Simulation Results
Table I summarizes the average, for each situation and each method, of the normalized errors in 100 trials, where the power spectral density S m (f ) at each θ m was fixed throughout all the 100 trials. From Table I, we can see that the proposed methods, NL-I and NL-II, reduced NMSE by 59-64% from LS, 76-80% from FR, and 38-47% from MMSE when M = 110, while by 17-22% from LS, 72-74% from FR, and 58-63% from MMSE when M = 166. Thus, the proposed methods greatly improved the estimation accuracy of U . In addition, we find that MMSE beamforming estimated X with relatively high accuracy when M = 110 while the estimation accuracy significantly degraded due to the occurrence of sidelobes when M = 166.
On the estimated periodograms (| u k,m | 2 ), we can see that the proposed methods reduced NMAE for |u k,m | 2 by 45-50% from LS, 63-66% from FR, and 20-29% from MMSE when M = 110, while by 10-13% from LS, 68-70% from FR, and 54-59% from MMSE when M = 166. Moreover, the proposed methods reduced NMAEs for E[|u k,m | 2 ] and S  the proposed nonlinear methods, NL-I and NL-II, obtained the most accurate periodograms as shown in Figs. 5(g), 5(h), 6(g), and 6(h). As in the case of M = 110, NL-I and NL-II obtained the high-resolution periodograms without the energy diffusion, while the conventional linear methods, LS, FR and MMSE, led to the blurring and the overestimation. Thus, the proposed nonlinear beamforming greatly improved the spatial resolution.

VI. CONCLUSION
In this paper, we proposed a nonlinear beamforming method for a PAWR. Differently from a standard radar which observes point targets, the PAWR receives a lot of backscattered signals from distributed targets, and thus the spatial resolution of the linear beamforming methods is limited. To acquire the highresolution precipitation profile, we used two characteristics of periodograms of the backscattered signals. One is the narrow bandwidth of each periodogram, and the other is the similarity between the adjacent periodograms. Both characteristics were expressed as group-sparsities of the DFT coefficient matrix of the backscattered signals. We proposed to reconstruct the DFT coefficient matrix by minimizing a convex cost function which consists of one data-fidelity term and two group 1 -norms with ADMM. Numerical simulations based on the real data showed that, compared to the linear methods, the proposed nonlinear beamforming obtains the high-resolution precipitation profile. APPENDIX ALTERNATING DIRECTION METHOD OF MULTIPLIERS Let us consider the following convex optimization problem minimize x∈X ,z∈Z where X and Z are finite-dimensional real Hilbert spaces with the standard inner products, L : X → Z is a linear mapping, and both functions g : Z → R ∪ {∞} and h : X → R ∪ {∞} are proper, lower semicontinuous, and convex. 9 The alternating  direction method of multipliers (ADMM) [21]- [24] solves the problem in (40), together with the corresponding dual problem, from any (x (0) , v (0) ) ∈ X × Z by iteratively computing for i ≥ 0. In (41), γ > 0, v ∈ Z and v ∈ Z are dual variables, · X is the Euclidean norm of X induced by the standard inner product, prox γg : Z → Z is the proximity operator defined by prox γg (ζ) := argmin z∈Z γg(z) + 1 2 ζ − z 2 Z , and ρ (i+1) ∈ [0, 2] satisfies ∞ i=0 ρ (i+1) (2 − ρ (i+1) ) = ∞. In particular, if ρ (i+1) = 1 for all i, which is the most commonly used setting, then we can replace v (i) with v (i+1) and remove the fourth line in (41). The sequence (x (i) , z (i) ) ∞ i=0 defined by the iterations in (41) converges to a solution to (40). Moreover, the sequences (v (i) /γ) ∞ i=0 and ( v (i) /γ) ∞ i=0 both converge to a solution to the dual problem. As described in [24], the overrelaxation setting, i.e., ρ (i+1) > 1 for all i, can accelerate the convergence, compared to the standard setting s.t. ρ (i+1) = 1.
(43) As can be seen from (43), the proximity operator of the indicator function is equal to the projection onto the consensus set and does not depend on the value of γ differently from (42).