MAP image reconstruction for landmine and IED detection using ground penetrating radar

ABSTRACT In this article, we develop an iterative maximum a posteriori (MAP) estimation algorithm for reconstructing sub-surface images from ground penetrating radar (GPR) data. The larger goal of our research is to use the resulting GPR images, along with appropriate detection algorithms, to identify landmines and improvised explosive devices. A typical scene-of-interest is expected to contain only a few electromagnetic wave scatterers. Taking advantage of this fact, we construct a novel sparsity-promoting probability density function, which we refer to as the Butterworth prior because it is based on the well-known Butterworth lowpass filter, and use it as the prior probability density function for the unknown reflection coefficients. The Butterworth prior can be used to approximate the uniform distribution but has the added benefit of mathematical tractability because its derivatives exist everywhere. Unlike available MAP-based approaches for reconstructing GPR images, we have developed a method for estimating the parameters of the Butterworth prior directly from the data. Therefore, at the cost of greater computational complexity, the main advantage of the proposed algorithm is that it does not require any user-defined parameters. We have tested the proposed algorithm, which we refer to as the MAP-Butterworth algorithm, on simulated and real data provided by the US Army Research Laboratory in Adelphi, MD. The results from the tests indicate that the MAP-Butterworth algorithm significantly suppresses sidelobes and background noise while retaining known scatterers.


Introduction
The threat of landmines and improvised explosive devices (IEDs) remains a major challenge in wars and post-conflict scenarios (Marlatt and Huygen 1998; Swart 2008). For example, more than half of the combat casualties suffered by the coalition forces during the Iraq and Afghanistan wars have been attributed to landmines and IEDs (Wilson 2006; Bird and Fairweather 2007). This fact has motivated the US Army to investigate new landmine and IED countermeasure strategies. Several approaches for detecting landmines exist. These include using metal detectors (Takahashi, Preetz, and Igel 2011), infrared sensing of land surfaces (Nelsen 1999), and above-ground sensing of chemical signature compounds that emanate from buried landmines (Cumming et al. 2001). A countermeasure technology that is currently of great interest to the US Army is forward-looking ground penetrating radar (FLGPR), which is a very promising approach for the imaging and detection of landmines and IEDs at combat-pace speed (Zorpette 2008). Figure 1 shows the prototype ultra-wide band (UWB) FLGPR system developed by the US Army Research Laboratory in Adelphi, MD (ARL), which is referred to as the synchronous impulse reconstruction (SIRE) FLGPR system. The viability of such a system as a component of a full-fledged IED detection system in combat environments hinges on the availability of FLGPR imaging algorithms that are effective and can run in real or near-real time.
We now briefly review the basic physical principles involved in GPR imaging. As a transmitted pulse travels through a scene-of-interest (SOI), reflections or echoes occur whenever the pulse encounters changes in the dielectric constant of the propagation medium. The strength of the reflected signal due to a patch of terrain is quantified by its reflection coefficient, which is proportional to the overall change in dielectric constant within the patch. A GPR imaging system processes the reflections from an SOI and generates two-dimensional or three-dimensional images that represent the spatial distribution of the reflection coefficients within the SOI. The use of GPR for detecting IEDs and landmines is motivated by the fact that these targets have significantly larger dielectric constants than surrounding material, such as soil and rocks. For high-frequency (i.e. greater than 3 MHz) GPR systems, the reflected signal due to a target can be adequately approximated as the sum of reflected signals from the target's individual elementary scatterers (Keller 1962). This fact leads to a straightforward mathematical model that can be used to develop GPR image reconstruction algorithms.
Although a number of image reconstruction algorithms for GPR imaging already exist, each has shortcomings that may preclude it from being sufficiently effective in large-scale real-time applications. The delay-and-sum (DAS) algorithm is widely used in radar applications because it is fast and easy to implement. The DAS algorithm generates an estimate of the reflection coefficient for a voxel (i.e. volume element) by coherently adding up, across some user-specified receiver aperture, all the data points that contain backscatter contributions due to that specific voxel. Unfortunately, images obtained by the DAS algorithm are characterized by large sidelobes and poor resolution (Nguyen 2009). The detection of targets with high accuracy from DAS images may be unlikely because targets with relatively small radar cross section (RCS) are likely to be obscured by the sidelobes of adjacent targets with larger RCS. The recursive sidelobe minimization (RSM) algorithm is an extension of the DAS algorithm that combines multiple DAS images formed using random subsets of the aperture (Nguyen 2009). Images reconstructed using the RSM algorithm generally exhibit considerable noise and sidelobe reduction. However, the RSM algorithm demonstrates only modest clutter reduction, and its performance may not be consistent because of the use of randomly chosen data samples. Some of the shortcomings of the DAS and RSM algorithms may be attributed to the fact that they do not take into consideration known prior information about SOIs. Since only a few scatterers are present in a typical SOI, better reflection coefficient estimates can be expected when the a priori sparsity assumption is incorporated into the model. The sparse learning via iterative minimization (SLIM) algorithm produces good results by incorporating the assumption that the SOIs are sparsely populated by scatterers (Tan et al. 2011). However, its high computational cost may make the SLIM algorithm impractical for large-scale real-time applications. The class of $\ell_1$-regularized least squares ($\ell_1$-LS) algorithms has been recommended for radar imaging where sparse solutions are expected (Burns, Subotic, and Pandelis 1997). This class of algorithms has been found to perform well in a number of applications, such as machine learning and neuroimaging. In offline applications, where training data are attainable, generalized cross-validation or similar techniques can be used to obtain an optimal or near-optimal regularization parameter. However, in real-time online applications such as GPR imaging, there is no straightforward way to choose an appropriate regularization parameter. It may therefore be difficult to effectively take advantage of $\ell_1$-LS algorithms in GPR imaging problems or other real-time applications. It should also be noted that recently several image reconstruction algorithms that are variants of regularized LS estimation have been proposed for synthetic aperture radar applications (Varshney et al. 2008; Çetin and Karl 2001; Yang et al. 2013). Although some adaptations of these algorithms to the GPR problem of interest may be feasible, it is not clear how they could be tailored to handle the large number of unknowns and associated large-scale datasets encountered in vehicle-mounted GPR systems. The iterative shrinkage-thresholding (Daubechies, Defrise, and De Mol 2004; Combettes and Wajs 2005; Wright, Nowak, and Figueiredo 2009) and alternating direction method of multipliers (Boyd et al. 2011) algorithms are popular approaches for solving inverse problems via $\ell_1$-LS estimation. However, analysis in Ndoye, Anderson, and Greene (2016) demonstrates that, for a standard real data scenario, these algorithms would require about 350 GB of memory to store the system matrix and would be more than 10,000 times slower than an MM-based algorithm for $\ell_1$-LS estimation that exploits the specific structure of the system matrix in GPR.
In this work, a major goal was to develop a GPR image reconstruction algorithm that does not need user-defined parameters. To this end, we first take advantage of the fact that typical SOIs contain few scatterers and construct a novel sparsity-promoting probability density function that is used as the prior probability density function for the unknown reflection coefficients. We refer to this prior as the Butterworth prior because it is based on the well-known Butterworth lowpass filter (Bianchi and Sorrentino 2007). The Butterworth prior can be viewed as an approximation to the uniform distribution but with the added benefit of mathematical tractability given that its derivatives exist everywhere. Next, we use the majorize-minimize (MM) technique to minimize the negative of the log of the maximum a posteriori (MAP) objective and arrive at an algorithm, which we call the MAP-Butterworth algorithm, that is straightforward to implement and readily amenable to parallelization. Additionally, we take advantage of the particular structure of the GPR data model and results from Ndoye, Anderson, and Greene (2016) to construct an implementation of the MAP-Butterworth algorithm that is more computationally and memory efficient than the direct implementation. Finally, we develop a method for estimating the parameters of the Butterworth prior directly from the data. Therefore, when compared to available GPR image reconstruction algorithms, such as the ones discussed above, the primary advantage of the MAP-Butterworth algorithm is that the user is not required to specify any parameter values. However, this advantage comes at the cost of an increase in computational complexity. Although, for concreteness, the MAP-Butterworth image reconstruction algorithm is based on ARL's SIRE UWB FLGPR system, it can be applied to other GPR systems with moving platforms.

Model
The SIRE FLGPR system, which is depicted in Figure 1, comprises two transmit antennas and $J$ receive antennas. The two transmit antennas alternately send UWB pulses and the $J$ receive antennas capture the backscatter. We define the transmit locations as the spatial positions where the UWB pulses are transmitted. The global positioning system (GPS) locations of the active transmit antenna and receive antennas are determined and stored by the system.
Consider a scatterer at spatial point $p_s$ within a voxel that is in the SOI, and a pair of transmit and receive antennas located at spatial points $p_t$ and $p_r$, respectively. If we ignore the noise contribution momentarily, the relationship between the transmitted signal, $p(t)$, and the received signal due to the scatterer, $r_s(t)$, is given by

$$r_s(t) = x_s \alpha_s \, p(t - \tau(p_s, p_t, p_r)) \qquad (1)$$

where $t$ is time, $x_s$ is the reflection coefficient associated with the scatterer, $\tau(p_s, p_t, p_r)$ is the time it takes for the pulse to travel from the transmit antenna to the scatterer and back to the receive antenna, and $\alpha_s$ is the attenuation the pulse undergoes during the round-trip travel. We now generalize the model in Equation (1): we divide the SOI into a grid of $L$ voxels and define $x_l$ to be the reflection coefficient of the $l$th voxel. Extending the model described in Equation (1) to the SIRE GPR system, the signal recorded by the $j$th receive antenna when the vehicle is at the $i$th transmit location is given by

$$s_{i,j}(t) = \sum_{l=1}^{L} x_l \alpha_{i,j,l} \, p(t - \tau_{i,j,l}) + w(t), \qquad i = 1, 2, \ldots, I, \quad j = 1, 2, \ldots, J \qquad (2)$$

where
• $\tau_{i,j,l}$ is the time it takes for the transmitted pulse to propagate from the active transmit antenna at the $i$th transmit location to the $l$th voxel and for the backscattered signal to propagate to the $j$th receiver.
• $\alpha_{i,j,l}$ is the propagation loss the transmitted pulse undergoes as it travels from the active transmit antenna at the $i$th transmit location to the $l$th voxel and then back to the $j$th receiver.
• $w(t)$ is the noise.
• $I$ is the number of transmit locations.
The signal $s_{i,j}(t)$ is a continuous-time signal, whereas, in practice, the SIRE FLGPR system records only samples of the return signals. Therefore, we define the discrete-time signals

$$y_{i,j}[n] \triangleq s_{i,j}(nT_s) \qquad (3)$$

$$e_{i,j}[n] \triangleq w(nT_s) \qquad (4)$$

where $n = 0, 1, \ldots, N-1$, $T_s$ is the sampling interval, and $N$ is the number of samples per radar return. Let $d_{i,l}$ denote the distance from the active transmit antenna at the $i$th transmit location to the $l$th voxel and $d_{i,j,l}$ denote the distance from the $l$th voxel to the $j$th receive antenna when the truck is at the $i$th transmit location. It follows that the delay is given by

$$\tau_{i,j,l} = \frac{d_{i,l} + d_{i,j,l}}{c} \qquad (5)$$

where $c$ is the speed of light. Also, for the attenuation values $\alpha_{i,j,l}$ in Equation (6), we adopt the mathematical model that was employed in Nguyen (2009) and Ojowu et al. (2013). From Equations (3) and (4), we can express $y_{i,j}[n]$ as

$$y_{i,j}[n] = \sum_{l=1}^{L} x_l \alpha_{i,j,l} \, p(nT_s - \tau_{i,j,l}) + e_{i,j}[n]. \qquad (7)$$

The system of equations that follows from Equation (7) for $n = 0, 1, \ldots, N-1$ can be written in matrix form as

$$y_{i,j} = A_{i,j} x + e_{i,j} \qquad (8)$$

where the $L \times 1$ vector $x$ and the $N \times 1$ vectors $y_{i,j}$ and $e_{i,j}$ are defined to be

$$x \triangleq [x_1, \ldots, x_L]^T, \quad y_{i,j} \triangleq [y_{i,j}[0], \ldots, y_{i,j}[N-1]]^T, \quad e_{i,j} \triangleq [e_{i,j}[0], \ldots, e_{i,j}[N-1]]^T. \qquad (9)$$

The matrix $A_{i,j}$ is an $N \times L$ matrix that contains shifted and scaled versions of the transmitted pulse. Specifically, the matrix $A_{i,j}$ can be written as $A_{i,j} = P_{i,j} D_{i,j}$, where

$$D_{i,j} \triangleq \mathrm{diag}(\alpha_{i,j,1}, \alpha_{i,j,2}, \ldots, \alpha_{i,j,L}) \qquad (10)$$

and the $(n,l)$th element of $P_{i,j}$ is

$$[P_{i,j}]_{n,l} = p(nT_s - \tau_{i,j,l}). \qquad (11)$$

We concatenate the sampled data vectors from all transmitter-receiver pairs, $\{y_{i,j}\}$, to obtain the $K \times 1$ data vector $y$:

$$y \triangleq [y_{1,1}^T, y_{1,2}^T, \ldots, y_{1,J}^T, \ldots, y_{I,1}^T, y_{I,2}^T, \ldots, y_{I,J}^T]^T \qquad (12)$$

where $K = IJN$. Extending Equation (8) to account for all transmitter-receiver pairs yields the desired model

$$y = Ax + e \qquad (13)$$

where the known $K \times L$ matrix $A$ is given by

$$A \triangleq [A_{1,1}^T, A_{1,2}^T, \ldots, A_{1,J}^T, \ldots, A_{I,1}^T, A_{I,2}^T, \ldots, A_{I,J}^T]^T \qquad (14)$$

and the $K \times 1$ noise vector $e$, which is assumed to be zero-mean Gaussian white noise with variance $\sigma^2$ (Stoica and Babu 2011; Webb, Havens, and Schulz 2018), is given by

$$e \triangleq [e_{1,1}^T, e_{1,2}^T, \ldots, e_{1,J}^T, \ldots, e_{I,1}^T, e_{I,2}^T, \ldots, e_{I,J}^T]^T. \qquad (15)$$

Given the transmitted pulse $p(t)$ and observed data $y$, the task at hand is to estimate the reflection coefficient vector $x$. We note that the geometry of this estimation problem is defined by the chosen SOI and the locations of the transmit and receive antennas. Also, the time delays $\{\tau_{i,j,l}\}$ and attenuation values $\{\alpha_{i,j,l}\}$ are computed using Equations (5) and (6), respectively.
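To make the data model concrete, the following small-scale sketch builds the stacked system matrix and simulates Equation (13). All sizes, the geometry, the pulse shape, and the attenuation values below are hypothetical stand-ins chosen for illustration; the real SIRE system has $K = IJN$ in the millions.

```python
import numpy as np

# Toy sketch of the data model y = Ax + e in Equation (13). The geometry,
# pulse, and attenuation values are hypothetical stand-ins, not SIRE's.
rng = np.random.default_rng(0)

c = 3e8            # speed of light (m/s)
Ts = 1e-10         # sampling interval (s)
N = 64             # samples per radar return
L = 10             # number of voxels
I, J = 2, 3        # transmit locations and receive antennas

def pulse(t, width=2e-10):
    # Generic UWB-like pulse (a Gaussian monocycle stand-in).
    return (t / width) * np.exp(-((t / width) ** 2))

# Hypothetical distances: transmitter-to-voxel and voxel-to-receiver.
d_tx = rng.uniform(0.3, 0.8, size=(I, L))       # d_{i,l}
d_rx = rng.uniform(0.3, 0.8, size=(I, J, L))    # d_{i,j,l}

tau = (d_tx[:, None, :] + d_rx) / c             # Equation (5): round-trip delay
alpha = 1.0 / (d_tx[:, None, :] * d_rx)         # placeholder attenuation values

# Stack the blocks A_{i,j} = P_{i,j} D_{i,j}, with [P_{i,j}]_{n,l} = p(n*Ts - tau_{i,j,l}).
n_grid = np.arange(N) * Ts
blocks = []
for i in range(I):
    for j in range(J):
        P = pulse(n_grid[:, None] - tau[i, j][None, :])  # shifted pulses
        blocks.append(P * alpha[i, j][None, :])
A = np.vstack(blocks)                           # K x L with K = I*J*N

x = np.zeros(L)
x[3], x[7] = 1.0, 0.5                           # a sparse reflection-coefficient vector
e = 0.01 * rng.standard_normal(I * J * N)
y = A @ x + e                                   # Equation (13)
print(A.shape, y.shape)
```

The block structure of $A$ is what the fast implementation strategies referenced later exploit; here it is simply assembled densely for clarity.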

MAP estimation and the Butterworth prior
Although many estimation methods could be used to estimate the reflection coefficients, we chose the MAP method because it allows for the incorporation of a priori information in a relatively straightforward manner.

MAP objective function
Let $Y$ and $X$ represent the random vectors underlying the data $y$ and reflection coefficient vector $x$, respectively. The a posteriori density function of $X$ given $Y = y$ can be expressed as

$$f_{X|Y}(x \mid y) = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{f_Y(y)} \qquad (16)$$

where $f_X$ and $f_Y$ are the joint density functions of the reflection coefficients and data, respectively. Therefore, a general expression for the MAP estimator is

$$\hat{x}_{\mathrm{MAP}} = \arg\max_x f_{Y|X}(y \mid x)\, f_X(x). \qquad (17)$$

With the white Gaussian noise assumption, it follows from Equation (13) that the conditional density function of $Y$ given that $X = x$ is a multivariate Gaussian density function with mean $E[Y] = Ax$ and covariance matrix $C = \sigma^2 I_K$, where $I_K$ is the $K \times K$ identity matrix:

$$f_{Y|X}(y \mid x) = (2\pi\sigma^2)^{-K/2} \exp\!\left( -\frac{1}{2\sigma^2} \|y - Ax\|_2^2 \right). \qquad (18)$$

In certain related radar applications, namely through-the-wall radar imaging and synthetic aperture radar, Bayesian methods have been developed under the assumption that the reflection coefficients are independent and identically distributed (Tang et al. 2013; Vu et al. 2013). Under this assumption, the MAP estimator for the reflection coefficients can be written as

$$\hat{x}_{\mathrm{MAP}} = \arg\min_x \left[ \frac{1}{2\sigma^2} \|y - Ax\|_2^2 - \sum_{l=1}^{L} \ln f_X(x_l) \right]. \qquad (19)$$

Typical SOIs are sparsely populated by scatterers (Gurbuz, McClellan, and Scott 2007); thus, most of the reflection coefficient values are expected to be near zero. We present a novel density function, referred to as the Butterworth density function, that is used to account for the sparsity assumption. The Butterworth density function is defined to be

$$f_B(a; \varepsilon, n) \triangleq \frac{k_n(\varepsilon)}{1 + (a/\varepsilon)^{2n}}$$

where $a$ is the function's variable, $k_n(\varepsilon)$ is a normalization factor, and $\varepsilon$ and $n$ are parameters that control the width and decay rate of the density function, respectively. Figure 2 is a plot of the Butterworth density function for $\varepsilon = 1$ and $n = 1, 2, 3, \ldots, 7$. From this plot, we observe that for $n$ sufficiently large, the Butterworth density function can be viewed as an approximation to the uniform distribution.
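This limiting behaviour can be checked numerically: as $n$ grows, the normalizing constant approaches $1/(2\varepsilon)$, i.e. the density approaches a uniform density on $[-\varepsilon, \varepsilon]$. The sketch below uses an arbitrary truncation and grid for the integral.

```python
import numpy as np

# Numerical check: the Butterworth normalizing constant k_n(eps) approaches
# 1/(2*eps) as n grows. Truncation and grid size are arbitrary choices.
def k_n(eps, n, half_width=50.0, m=400_001):
    # k_n(eps) = 1 / integral over the real line of 1 / (1 + (a/eps)^(2n)),
    # approximated by a Riemann sum on a truncated grid.
    a = np.linspace(-half_width * eps, half_width * eps, m)
    vals = 1.0 / (1.0 + np.abs(a / eps) ** (2 * n))
    da = a[1] - a[0]
    return 1.0 / (np.sum(vals) * da)

eps = 1.0
for n in (1, 3, 7):
    print(n, k_n(eps, n))   # increases toward 1/(2*eps) = 0.5 as n grows
```

For $n = 7$ the constant is already within about 1% of $1/(2\varepsilon)$, which is why only small values of $n$ are considered later in the article.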

MAP-Butterworth algorithm: known prior parameters and noise variance
In this section, we present an algorithm that estimates the reflection coefficients by minimizing the MAP objective function in Equation (20) for the case of the Butterworth prior,

$$\phi(x; \sigma^2, \varepsilon, n) \triangleq \phi_1(x; \sigma^2) + \phi_2(x; \varepsilon, n) \qquad (20)$$

where $\phi_1(x; \sigma^2) \triangleq \frac{1}{2\sigma^2} \|y - Ax\|_2^2$ and the negative natural log of the prior, $\phi_2$, is given by

$$\phi_2(x; \varepsilon, n) \triangleq -\sum_{l=1}^{L} \ln f_B(x_l; \varepsilon, n).$$

Figure 2. A plot of the Butterworth prior for $n = 1, 2, 3, \ldots, 7$ and $\varepsilon = 1.0$.
The majorize-minimize optimization technique minimizes an objective function by iteratively minimizing a properly chosen majorizing function (Hunter and Lange 2004).
A function $q(\cdot; y)$ is said to majorize $f$ at the point $y$ if $q(x; y) \geq f(x)$ for all $x$ and $q(y; y) = f(y)$. If we succeed in obtaining a majorizing function for the function $f$, then the associated MM algorithm for minimizing $f$ is given by

$$x^{(m+1)} = \arg\min_x q(x; x^{(m)})$$

where $x^{(m)}$ is the current iterate. From the two defining properties of a majorizing function, it follows that

$$f(x^{(m+1)}) \leq q(x^{(m+1)}; x^{(m)}) \leq q(x^{(m)}; x^{(m)}) = f(x^{(m)}). \qquad (29)$$

The above result implies that the iterates monotonically decrease the function $f$ as the number of iterations increases (Yen 2011; Sriperumbudur, Torres, and Lanckriet 2011; Figueiredo, Bioucas-Dias, and Nowak 2007). However, the fact that an algorithm possesses the monotonicity property in Equation (29) does not imply that the algorithm converges to a minimum. Figure 3 provides a one-dimensional illustration of the MM technique. In the discussion below, we use the MM technique to minimize the MAP objective function, $\phi$.
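A minimal one-dimensional example (a toy function unrelated to the GPR objective) makes the descent chain in Equation (29) concrete, using the quadratic majorizer built from an upper bound on the second derivative:

```python
import math

# Toy illustration of the MM technique: minimize f(x) = ln(1 + x^2) via the
# quadratic majorizer q(x; y) = f(y) + f'(y)(x - y) + (C/2)(x - y)^2, with
# C >= sup f'' = 2. Minimizing q(.; x_m) guarantees monotone descent of f.
def f(x):
    return math.log(1.0 + x * x)

def fprime(x):
    return 2.0 * x / (1.0 + x * x)

C = 2.0
x = 3.0                        # initial iterate
values = [f(x)]
for _ in range(50):
    x -= fprime(x) / C         # closed-form minimizer of the quadratic majorizer
    values.append(f(x))

# Monotonicity: f(x_{m+1}) <= q(x_{m+1}; x_m) <= q(x_m; x_m) = f(x_m)
assert all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
print(x)   # approaches the minimizer x* = 0
```

This same construction, with a bound on the second derivative of the negative log-prior, is used below to majorize $\phi_2$.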

MM-based MAP-Butterworth algorithm for known prior parameters and noise variance
Initially, we will assume that the noise variance, $\sigma^2$, and Butterworth parameters, $\varepsilon$ and $n$, are known. Suppose the functions $q_1(\cdot; \bar{x})$ and $q_2(\cdot; \bar{x}; \varepsilon, n)$ are majorizing functions for $\phi_1$ and $\phi_2$ at the point $\bar{x} \in \mathbb{R}^L$, respectively. It would then follow that

$$q(x; \bar{x}; \sigma^2, \varepsilon, n) \triangleq q_1(x; \bar{x}; \sigma^2) + q_2(x; \bar{x}; \varepsilon, n) \qquad (30)$$

is a majorizing function for $\phi$ at the point $\bar{x}$. Moreover, the associated MM algorithm would be given by

$$x^{(m+1)} = \arg\min_x q(x; x^{(m)}; \sigma^2, \varepsilon, n) \qquad (31)$$

where $x^{(m)}$ is the current estimate of the reflection coefficient vector. We now determine suitable majorizing functions for $\phi_1$ and $\phi_2$. De Pierro (1995) constructed a majorizing function for linear least squares objective functions. Later, Ndoye, Anderson, and Greene (2016) used De Pierro's result to obtain a majorizing function for $\phi_1$, which is a linear least squares objective function. For completeness, we outline De Pierro's construction of a majorizing function for $\phi_1$.
First, we rewrite the least squares objective function as

$$\phi_1(x; \sigma^2) = \frac{1}{2\sigma^2} \sum_{k=1}^{K} \left( y_k - [Ax]_k \right)^2 \qquad (32)$$

where $y_k$ is the $k$th component of $y$. We can obtain a majorizing function for $\phi_1$ by first finding a majorizing function for $([Ax]_k)^2$, where $[Ax]_k$ is the $k$th element of the vector $Ax$. The term $([Ax]_k)^2$ can be written as

$$([Ax]_k)^2 = \left( \sum_{l=1}^{L} \lambda_{k,l} \left[ \frac{a_{k,l}(x_l - x_l^{(m)})}{\lambda_{k,l}} + [Ax^{(m)}]_k \right] \right)^2$$

where the weights $\lambda_{k,l}$ satisfy $\sum_{l=1}^{L} \lambda_{k,l} = 1$ for all $k$ and $\lambda_{k,l} \geq 0$ for all $k, l$. It follows from the convexity of the square function that

$$([Ax]_k)^2 \leq r_k(x; x^{(m)}) \triangleq \sum_{l=1}^{L} \lambda_{k,l} \left[ \frac{a_{k,l}(x_l - x_l^{(m)})}{\lambda_{k,l}} + [Ax^{(m)}]_k \right]^2.$$

It is readily seen that $r_k(x^{(m)}; x^{(m)}) = ([Ax^{(m)}]_k)^2$. Thus, $r_k(\cdot; x^{(m)})$ is a majorizing function for $([Ax]_k)^2$ at the point $x^{(m)}$. By replacing $([Ax]_k)^2$ by $r_k(x; x^{(m)})$ in Equation (32), we obtain the desired majorizing function for the least squares objective function:

$$q_1(x; x^{(m)}; \sigma^2) \triangleq \frac{1}{2\sigma^2} \sum_{k=1}^{K} \left[ y_k^2 - 2 y_k [Ax]_k + r_k(x; x^{(m)}) \right]. \qquad (40)$$

The advantage of the majorizing function $q_1$ over the least squares objective function $\phi_1$ is that the former decouples the updates for the individual pixels. We turn next to the problem of finding a majorizing function for $\phi_2$, which can be written as

$$\phi_2(x; \varepsilon, n) = \sum_{l=1}^{L} g(x_l; \varepsilon, n) \qquad (41)$$

where

$$g(a; \varepsilon, n) \triangleq -\ln f_B(a; \varepsilon, n). \qquad (42)$$

de Leeuw and Lange (2009) showed that, for a one-dimensional function $w$, the following function on $\mathbb{R}^2$,

$$h(x, y) \triangleq w(y) + w'(y)(x - y) + \frac{C}{2}(x - y)^2,$$

majorizes $w$ at $y$, provided $w$ is twice differentiable and $C > 0$ is an upper bound for its second derivative $w''$. We use this 'Taylor series' expansion method to obtain a majorizing function for $\phi_2$. Replacing the function $g$ in Equation (41) by its Taylor series-based majorizing function results in the following majorizing function for $\phi_2$ at $x^{(m)}$:

$$q_2(x; x^{(m)}; \varepsilon, n) \triangleq \sum_{l=1}^{L} \left[ g(x_l^{(m)}; \varepsilon, n) + g'(x_l^{(m)}; \varepsilon, n)(x_l - x_l^{(m)}) + \frac{B_{\varepsilon,n}}{2}(x_l - x_l^{(m)})^2 \right] \qquad (45)$$

where $B_{\varepsilon,n}$ is the least upper bound of the second derivative of $g(\cdot; \varepsilon, n)$. This choice for $B_{\varepsilon,n}$ guarantees the optimality of the majorizing function $q_2$ (de Leeuw and Lange 2009). Since $g''(a; \varepsilon, n) = \frac{1}{\varepsilon^2} g''(a/\varepsilon; 1, n)$, we find it convenient to express the upper bound as $B_{\varepsilon,n} = \frac{1}{\varepsilon^2} B_{1,n}$, where $B_{1,n}$ can be determined numerically. Note, it is straightforward to see that $q_2(x^{(m)}; x^{(m)}; \varepsilon, n) = \phi_2(x^{(m)}; \varepsilon, n)$.
From Equations (30), (40), and (45), we have the following majorizing function for the MAP objective function:

$$q(x; x^{(m)}; \sigma^2, \varepsilon, n) = \frac{1}{2\sigma^2} \sum_{k=1}^{K} \left[ y_k^2 - 2 y_k [Ax]_k + r_k(x; x^{(m)}) \right] + q_2(x; x^{(m)}; \varepsilon, n). \qquad (46)$$

Recall that we can obtain an algorithm for minimizing the MAP objective function using Equation (31). Taking the derivative of $q(x; x^{(m)}; \sigma^2, \varepsilon, n)$ with respect to $x_l$ and setting the result to zero produces the desired MAP algorithm. The derivative of $q(x; x^{(m)}; \sigma^2, \varepsilon, n)$ with respect to $x_l$ equals

$$\frac{\partial q(x; x^{(m)}; \sigma^2, \varepsilon, n)}{\partial x_l} = \frac{1}{\sigma^2} \left[ -G_l^{(m)} + H_l \left( x_l - x_l^{(m)} \right) \right] + \frac{\partial q_2(x; x^{(m)}; \varepsilon, n)}{\partial x_l}$$

where

$$\frac{\partial q_2(x; x^{(m)}; \varepsilon, n)}{\partial x_l} = g'(x_l^{(m)}; \varepsilon, n) + B_{\varepsilon,n} \left( x_l - x_l^{(m)} \right).$$

Setting this derivative to zero yields the following MAP algorithm for reconstructing images from GPR data:

$$x_l^{(m+1)} = x_l^{(m)} + \frac{G_l^{(m)}/\sigma^2 - g'(x_l^{(m)}; \varepsilon, n)}{H_l/\sigma^2 + B_{\varepsilon,n}} \qquad (49)$$

where

$$H_l \triangleq \sum_{k=1}^{K} \frac{a_{k,l}^2}{\lambda_{k,l}}, \qquad G_l^{(m)} \triangleq \sum_{k=1}^{K} a_{k,l} \left( y_k - [Ax^{(m)}]_k \right).$$

Using the particular structure of the GPR data model, the quantities $H_l$ and $G_l^{(m)}$ can be computed efficiently using the fast implementation strategies developed in Ndoye, Anderson, and Greene (2016). We provide a summary of the fast implementation in the Supplemental Data.
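The coordinate-wise MM update derived above can be sketched at toy scale. Everything below is a hypothetical small instance: the weights use the common choice $\lambda_{k,l} = |a_{k,l}| / \sum_j |a_{k,j}|$, and the curvature bound is a loose (not least) upper bound rather than a numerically optimized $B_{1,n}$.

```python
import numpy as np

# Toy sketch of the MM update: x_l <- x_l + (G_l/sigma2 - g'(x_l)) / (H_l/sigma2 + B),
# with lambda_{k,l} = |a_{k,l}| / sum_j |a_{k,j}| in De Pierro's decomposition.
# Sizes and parameter values are hypothetical; sigma2, eps, n are assumed known.
rng = np.random.default_rng(1)
K, L = 60, 8
A = rng.standard_normal((K, L))
x_true = np.zeros(L)
x_true[2] = 2.0
y = A @ x_true + 0.05 * rng.standard_normal(K)

sigma2, eps, n = 0.05 ** 2, 0.1, 2

def g_prime(a, eps, n):
    # Derivative of g(a) = ln(1 + (a/eps)^(2n)) + const (negative log-prior).
    if a == 0:
        return 0.0
    u = (a / eps) ** (2 * n)
    return (2 * n / a) * u / (1.0 + u)

# H_l = sum_k a_{k,l}^2 / lambda_{k,l} = sum_k |a_{k,l}| * sum_j |a_{k,j}|
row_abs = np.abs(A).sum(axis=1)
H = (np.abs(A) * row_abs[:, None]).sum(axis=0)

B = 8.0 / eps ** 2    # a loose upper bound on g'' for n = 2 (assumed, not the least bound)

x = np.zeros(L) + 1e-3
for m in range(200):
    G = A.T @ (y - A @ x)                           # G_l^{(m)}
    gp = np.array([g_prime(a, eps, n) for a in x])
    x = x + (G / sigma2 - gp) / (H / sigma2 + B)    # decoupled MM update

print(np.round(x, 2))   # large coefficient retained, others shrunk toward zero
```

Because every $x_l$ is updated independently from shared quantities, the update parallelizes trivially, which is the property the article highlights.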

MAP-Butterworth algorithm: unknown prior parameters and noise variance
The MM-based algorithm developed in Section 2.3.2 assumes that the noise variance, $\sigma^2$, and Butterworth parameters, $\varepsilon$ and $n$, are known. However, these quantities are typically unknown in practice and must be estimated. In this section, we develop an algorithm that jointly estimates the reflection coefficients, noise variance, and Butterworth parameters. First, however, we examine whether or not the MAP objective function has a minimum for the case of the Butterworth prior and the Laplacian prior, the latter being a popular choice [see Figueiredo (2003) and related references therein].

Analysis of MAP objective functions for Laplacian and Butterworth priors
First, we consider the MAP objective function for the Butterworth prior, which we rewrite as

$$\phi(x; \sigma^2, \varepsilon, n) = \frac{K}{2} \ln(2\pi\sigma^2) + \frac{1}{2\sigma^2} \|y - Ax\|_2^2 - L \ln k_n(\varepsilon) + \sum_{l=1}^{L} \ln\!\left[ 1 + \left( \frac{x_l}{\varepsilon} \right)^{2n} \right]. \qquad (54)$$

Then, we investigate the MAP objective function when the Laplacian distribution is used as the prior:

$$\phi_L(x; \sigma^2, \lambda) = \frac{K}{2} \ln(2\pi\sigma^2) + \frac{1}{2\sigma^2} \|y - Ax\|_2^2 - L \ln\!\left( \frac{\lambda}{2} \right) + \lambda \sum_{l=1}^{L} |x_l|. \qquad (55)$$

For the moment, we will assume the parameters $\varepsilon$, $n$, and $\lambda$ are known and, without loss of generality, set $\sigma^2 = 1$ when we consider the behaviour of $\phi$ and $\phi_L$ as functions of $x$.
From Equation (54), it can be observed that $\phi$, which is not convex, goes to infinity as the $\ell_1$ norm of $x$, given by $\|x\|_1 \triangleq \sum_{l=1}^{L} |x_l|$, approaches infinity, and remains finite when $\|x\|_1$ is finite. Likewise, $\phi_L$, which is convex, goes to infinity as $\|x\|_1$ approaches infinity and remains finite when $\|x\|_1$ is finite. Consequently, for fixed $\sigma^2$, $\varepsilon$, $n$, and $\lambda$, there exist $\bar{x}_\lambda$ and $\bar{x}_{\varepsilon,n}$ such that

$$\bar{x}_\lambda = \arg\min_x \phi_L(x; \sigma^2, \lambda), \qquad \bar{x}_{\varepsilon,n} = \arg\min_x \phi(x; \sigma^2, \varepsilon, n).$$

Since $\phi_L$ is convex, it follows that $\bar{x}_\lambda$ is unique. On the other hand, $\bar{x}_{\varepsilon,n}$ may not be unique because $\phi$ is not convex, and algorithms used to minimize $\phi$ may converge to a local minimum.
In practice, the parameters $\varepsilon$, $n$, and $\lambda$ are unknown, so a natural idea is to (1) minimize $\phi_L$ jointly with respect to $x$ and $\lambda$, and (2) fix $n$ and minimize $\phi$ jointly with respect to $x$ and $\varepsilon$. To see whether these two approaches are possible, we examine $\phi_L$ first. When $(x, \lambda) \to (0, \infty)$ in such a way that $\lambda |x_l| \to 0$, it can be seen from Equation (55) that $\phi_L(x; \lambda) \to -\infty$, because the term $-L \ln(\lambda/2)$ diverges to $-\infty$. This implies that $\phi_L$ does not have a minimum. Therefore, we cannot jointly estimate the reflection coefficients and Laplacian parameter, $\lambda$, by minimizing $\phi_L$.
The following result will be useful in our analysis of the MAP objective function. Our starting point is the definition of the normalizing constant $k_n(\varepsilon)$:

$$k_n(\varepsilon) = \left[ \int_{-\infty}^{\infty} \frac{da}{1 + (a/\varepsilon)^{2n}} \right]^{-1}.$$

We would like to determine upper and lower bounds for $k_n(\varepsilon)$. For an upper bound on the integral, note that the integrand is at most $1$ for $|a| \leq \varepsilon$ and at most $(|a|/\varepsilon)^{-2n}$ for $|a| > \varepsilon$, so that

$$\int_{-\infty}^{\infty} \frac{da}{1 + (a/\varepsilon)^{2n}} \leq 2\varepsilon + 2 \int_{\varepsilon}^{\infty} \left( \frac{a}{\varepsilon} \right)^{-2n} da = 2\varepsilon + \frac{2\varepsilon}{2n-1} = \frac{4n}{2n-1}\,\varepsilon.$$

For a lower bound, the integrand is at least $1/2$ for $|a| \leq \varepsilon$, so the integral is at least $\varepsilon$. With these upper and lower bounds, it follows that

$$\frac{2n-1}{4n} \cdot \frac{1}{\varepsilon} \;\leq\; k_n(\varepsilon) \;\leq\; \frac{1}{\varepsilon}. \qquad (59)$$

From Equations (54) and (59), we get the inequality

$$\phi(x; \sigma^2, \varepsilon, n) \;\geq\; \frac{K}{2}\ln(2\pi\sigma^2) + \frac{1}{2\sigma^2}\|y - Ax\|_2^2 + L \ln \varepsilon + \sum_{l=1}^{L} \ln\!\left[ 1 + \left( \frac{x_l}{\varepsilon} \right)^{2n} \right]. \qquad (67)$$

Now, suppose $(x, \varepsilon) \to (0, 0)$ in such a way that $|x_l|/\varepsilon \to c_l$, where each constant $c_l$ is finite. Each term $\ln[1 + (x_l/\varepsilon)^{2n}]$ then remains bounded, while $-L \ln k_n(\varepsilon)$ behaves like $L \ln \varepsilon \to -\infty$; hence $\phi \to -\infty$, and $\phi$ does not have a minimum with respect to $x$ and $\varepsilon$ when $n$ is fixed. On the other hand, from the lower bound in Equation (67), it is clear that the MAP objective function, $\phi$, is bounded below regardless of how $(x, n) \to (0, \infty)$, provided $\varepsilon$ is fixed. Hence we develop our method based on the simultaneous minimization of $\phi$ with respect to $x$ and $n$.

MAP-Butterworth algorithm for unknown prior parameters and noise variance
To develop an algorithm for the case where the Butterworth parameters, $\varepsilon$ and $n$, and the noise variance, $\sigma^2$, are unknown, we first assume that a set of candidate pairs of Butterworth parameters is available. Suppose $n_i$ is the $i$th candidate for the Butterworth 'decay parameter' $n$ out of $N_n$ chosen candidate values for $n$. Let $(\varepsilon_{i,1}, n_i), (\varepsilon_{i,2}, n_i), \ldots, (\varepsilon_{i,N_\varepsilon}, n_i)$ be the associated candidate pairs for $(\varepsilon, n_i)$, where $N_\varepsilon$ is the number of values for the parameter $\varepsilon$ that are considered for each candidate value of $n$. Given a set of suitable candidate pairs of Butterworth parameters, the cyclic optimization method given below could, in principle, be used to estimate the reflection coefficients while jointly determining estimates for the Butterworth parameters and noise variance:

Algorithm 1
• Initialization: Get initial estimates $x^{(0)}$ and $\sigma^{2(0)}$
• Iteration:
for $u = 1, 2, \ldots, N_n$ do
  for $v = 1, 2, \ldots, N_\varepsilon$ do
    for $m = 0, 1, \ldots, M-1$ do
      $x^{(m+1)} = \arg\min_x \phi(x; \sigma^{2(m)}, \varepsilon_{u,v}, n_u)$  (68)
      $\sigma^{2(m+1)} = \arg\min_{\sigma^2} \phi(x^{(m+1)}; \sigma^2, \varepsilon_{u,v}, n_u)$  (69)
    end for
    Save $x^{(M)}$, $\sigma^{2(M)}$, $\varepsilon_{u,v}$, and $n_u$ if they produce the current minimum $\phi$ value
  end for
end for

Note, $M$ is the chosen number of iterations and $\{n_1, n_2, \ldots, n_{N_n}\}$ are the candidate values for the Butterworth parameter $n$. Instead of solving the minimization problem in Equation (68), it is convenient to find an iterate $x^{(m+1)}$ that decreases the MAP objective function $\phi$ in the following sense:

$$\phi(x^{(m+1)}; \sigma^{2(m)}, \varepsilon_{u,v}, n_u) \leq \phi(x^{(m)}; \sigma^{2(m)}, \varepsilon_{u,v}, n_u). \qquad (70)$$

Taking into account the discussion in Section 2.3.2, an iterate that satisfies Equation (70) is the minimizer of the majorizing function $q(x; x^{(m)}; \sigma^{2(m)}, \varepsilon_{u,v}, n_u)$. Thus, a solution to Equation (70) is

$$x^{(m+1)} = \arg\min_x q(x; x^{(m)}; \sigma^{2(m)}, \varepsilon_{u,v}, n_u). \qquad (72)$$

We estimate the noise power by obtaining a closed-form expression for the minimizer of $\phi(x^{(m+1)}; \sigma^2, \varepsilon_{u,v}, n_u)$ with respect to $\sigma^2$. Putting the results together, the following is an alternative to the algorithm defined by Equations (68) and (69):

Algorithm 2
• Initialization: Get initial estimates $x^{(0)}$ and $\sigma^{2(0)}$
• Iteration:
for $u = 1, 2, \ldots, N_n$ do
  for $v = 1, 2, \ldots, N_\varepsilon$ do
    for $m = 0, 1, \ldots, M-1$ do
      $x^{(m+1)} = \arg\min_x q(x; x^{(m)}; \sigma^{2(m)}, \varepsilon_{u,v}, n_u)$  (72)
      $\sigma^{2(m+1)} = \arg\min_{\sigma^2} \phi(x^{(m+1)}; \sigma^2, \varepsilon_{u,v}, n_u)$  (73)
    end for
    Save $x^{(M)}$, $\sigma^{2(M)}$, $\varepsilon_{u,v}$, and $n_u$ if they produce the current minimum $\phi$ value
  end for
end for

It should be noted that Equation (73) is a convenient expression for Equation (69).
We now show that the cyclic minimization algorithm defined by Equations (72) and (73) is guaranteed to monotonically decrease the MAP objective function $\phi$. As previously discussed, Equation (72) implies that Equation (70) holds. Also, due to the equivalence of Equations (69) and (73), it follows that

$$\phi(x^{(m+1)}; \sigma^{2(m+1)}, \varepsilon_{u,v}, n_u) \leq \phi(x^{(m+1)}; \sigma^{2(m)}, \varepsilon_{u,v}, n_u) \qquad (74)$$

and therefore we get the desired result

$$\phi(x^{(m+1)}; \sigma^{2(m+1)}, \varepsilon_{u,v}, n_u) \leq \phi(x^{(m)}; \sigma^{2(m)}, \varepsilon_{u,v}, n_u). \qquad (75)$$

The solution to Equation (72) was discussed in Section 2.3.2 and is given by Equation (49) with $\sigma^2$ replaced by $\sigma^{2(m)}$. To obtain the estimate for the noise power, we compute the derivative of the objective function on the right-hand side of Equation (73) with respect to $\sigma^2$ and set the result to zero. Straightforward calculations provide the resulting noise power estimate

$$\sigma^{2(m+1)} = \frac{1}{K} \left\| y - A x^{(m+1)} \right\|_2^2. \qquad (77)$$

For unknown noise variance and Butterworth parameters, the final method, Algorithm 3, combines Algorithm 2 with the parameter-initialization procedure described next. From Figure 2, we believe it is reasonable to conclude that the Butterworth priors for $n \geq 1$ are sufficiently close to one another that it is adequate to consider only a small set of values for $n$ in Algorithm 3. Given $n_i$, which as stated previously is the $i$th candidate for the Butterworth parameter $n$, we first determine the corresponding maximum likelihood estimate of $\varepsilon$ when the observations are the initial estimates of the reflection coefficients $x_1^{(0)}, x_2^{(0)}, \ldots, x_L^{(0)}$:

$$\hat{\varepsilon}_{n_i}^{(0)} = \arg\max_\varepsilon \sum_{l=1}^{L} \ln f_B(x_l^{(0)}; \varepsilon, n_i). \qquad (78)$$
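The alternation between the image update and the closed-form noise-power update can be sketched as follows. This is a hypothetical toy instance: the image step is abbreviated to a plain least-squares MM step (no prior term) so that the role of the noise-power update stands out.

```python
import numpy as np

# Toy sketch of the cyclic update: alternate an MM image step with the
# closed-form noise-power update sigma2 = ||y - A x||^2 / K (the minimizer of
# the Gaussian negative log-likelihood in sigma2). Sizes are hypothetical and
# the image step is a plain least-squares MM stand-in for the full update.
rng = np.random.default_rng(2)
K, L = 50, 5
A = rng.standard_normal((K, L))
x_true = np.array([0.0, 1.5, 0.0, 0.0, -0.8])
y = A @ x_true + 0.1 * rng.standard_normal(K)

# Per-coordinate curvature bound (upper bound on De Pierro's H_l).
denom = np.abs(A).sum(axis=0) * np.abs(A).sum(axis=1).max()

x = np.zeros(L)
sigma2 = 1.0                                  # initial noise-power estimate
for m in range(100):
    x = x + A.T @ (y - A @ x) / denom         # image update (least-squares MM step)
    sigma2 = np.sum((y - A @ x) ** 2) / K     # closed-form noise-power update

print(float(sigma2))   # close to the true noise power 0.1**2
```

Because the noise-power step is an exact minimization over $\sigma^2$, it can only decrease the objective, which is what gives the monotonicity chain above.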
Then, we propose the associated set of candidate epsilon values

$$\varepsilon_{i,v} = \hat{\varepsilon}_{n_i}^{(0)} / 2^v, \qquad v = 1, 2, \ldots, N_\varepsilon. \qquad (79)$$

The motivation behind dividing $\hat{\varepsilon}_{n_i}^{(0)}$ by powers of two is to account for the fact that the initial image, for instance an image reconstructed using the DAS algorithm, is not sufficiently sparse. Therefore, the use of candidate values that are less than $\hat{\varepsilon}_{n_i}^{(0)}$ is expected to produce images with greater sparsity than the initial image.
We solve the optimization problem in Equation (78) by first replacing $k_{n_i}(\varepsilon)$ with the approximation $\hat{k}_{n_i}(\varepsilon) = (2\varepsilon)^{-1}$, and then using a one-dimensional line search, such as the golden section method. For the lower and upper bounds of the search, we used a small positive number (e.g. $10^{-6}$) and $\max\{|x_1^{(0)}|, |x_2^{(0)}|, \ldots, |x_L^{(0)}|\}$, respectively. An alternative to the approach just described would be to use the MM technique to develop an algorithm that monotonically decreases the objective function in Equation (78).
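The initialization step in Equation (78) can be sketched as below. The initial coefficients `x0` are arbitrary toy values standing in for a DAS image, and the search bounds follow the choices stated above.

```python
import math

# Sketch of the parameter initialization: maximize the approximate
# log-likelihood of the initial coefficients under the Butterworth prior,
# with k_n(eps) replaced by (2*eps)^(-1), via a golden-section line search.
def neg_loglik(eps, x0, n):
    # -sum_l ln[ (2*eps)^(-1) / (1 + (x_l/eps)^(2n)) ]
    return sum(math.log(2.0 * eps) + math.log1p(abs(a / eps) ** (2 * n)) for a in x0)

def golden_section(fun, lo, hi, tol=1e-8):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if fun(c) < fun(d):
            b, d = d, c                      # shrink to [a, d]; old c becomes new d
            c = b - invphi * (b - a)
        else:
            a, c = c, d                      # shrink to [c, b]; old d becomes new c
            d = a + invphi * (b - a)
    return 0.5 * (a + b)

x0 = [0.01, -0.02, 0.005, 1.2, 0.0, -0.015, 0.8, 0.01]   # toy stand-in for a DAS image
n = 2
eps_hat = golden_section(lambda e: neg_loglik(e, x0, n), 1e-6, max(abs(a) for a in x0))
candidates = [eps_hat / 2 ** v for v in (1, 2)]          # Equation (79) with N_eps = 2
print(eps_hat, candidates)
```

Note that the negative log-likelihood diverges as $\varepsilon \to 0$, so the line search is well posed on the stated interval.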
Simulated GPR data
Figure 4(a) shows the ground-truth image, which provides the true values of the reflection coefficients of the voxels within the SOI. The cross-track and down-track sizes of the voxels were 0.1 and 0.02 m, respectively, so the dimension of the reconstructed GPR image was 300 × 400 voxels. The radar returns were corrupted with additive white Gaussian noise at a signal-to-noise ratio of 25 dB.
The number of transmit locations and receivers was $I = 121$ and $J = 16$, respectively, and the number of samples per radar return was $N = 2700$. We applied the DAS and LMM algorithms, and Algorithm 3, to the simulated data. In our implementation of Algorithm 3, we chose $N_\varepsilon = 2$, $N_n = 3$, $n_1 = 1$, $n_2 = 2$, and $n_3 = 3$. The run times for the DAS algorithm, the LMM algorithm, and Algorithm 3 were 0.45, 23, and 69.35 min, respectively, using MATLAB on a 2012 Apple Mac Pro. In Figure 4(b), we observe that the standard DAS algorithm is not able to satisfactorily remove the noise and sidelobes. By contrast, as evident from Figures 4(c) and (d), both the LMM algorithm and Algorithm 3 retain all the scatterers while significantly suppressing the sidelobes and noise. It can be observed that the scatterers in the LMM image are slightly more focused than the ones in the Algorithm 3 image. The mean-square errors for the DAS algorithm, the LMM algorithm, and Algorithm 3 were computed to be $1.0883 \times 10^{14}$, $1.7187 \times 10^6$, and $1.7367 \times 10^6$, respectively.

Real GPR data
We now evaluate the performance of Algorithm 3 using data obtained from ARL's FLGPR system. The test data correspond to measurements taken from $I = 274$ consecutive transmit locations using $J = 16$ receive antennas. The number of samples per radar return is $N = 1350$. We chose a 25 m × 64 m SOI and divided it into a grid of 250 voxels in the cross-track direction and 1600 voxels in the down-track direction. Therefore, the cross-track and down-track voxel sizes are 0.1 and 0.04 m, respectively. Due to restrictions imposed by the research sponsor, we are not allowed to provide any description of the targets.
As a result of the large size of the SOI, we do not reconstruct an image by estimating the reflection coefficients of all the voxels at once. Such an image would have cross-track resolution that varies from the near-track to the far-track voxels; the voxels in the near-track would have higher resolution than those in the far-track (Nguyen et al. 1998, 2003, 2007). To create GPR images with consistent resolution across the SOI, we use the mosaicking approach discussed in Nguyen (2009), which is depicted in Figure 5. Specifically, the image space associated with the SOI is divided into 32 sub-images of size 25 m × 2 m. Each sub-image has 250 voxels in the cross-track direction and 50 voxels in the down-track direction (i.e. L = 12,500 for each sub-image). The aperture is divided into 32 sub-apertures, where adjacent ones overlap by approximately 2 m. In Figure 5, the locations S1 and E1, and S2 and E2 are the starting and ending points of the first and second sub-apertures, respectively. Each sub-aperture has 43 transmit locations and is approximately of size 12 m × 2 m. The radar return data and GPS positioning measurements of the transmit and receive antenna locations are used to estimate the reflection coefficients for the corresponding sub-image. The reconstructed sub-images are merged together to obtain the complete image of the SOI. It should be noted that there is a separate system matrix for each sub-aperture/sub-image pair; in this sense, the model for real GPR data is time-varying.
Figures 6(a)-(c) show the GPR images that were generated using the DAS and LMM algorithms, and Algorithm 3, respectively. It is evident from this figure that, as in the simulated data case, the DAS image has significant sidelobes and shadows, and clearly visible background noise. The images obtained using the LMM algorithm and Algorithm 3, shown, respectively, in Figures 6(b) and (c), are both sparser than the DAS image and suppress both sidelobes and background noise. Furthermore, the LMM and Algorithm 3 images clearly show all the targets that are known to be present in the SOI. The targets that served as the ground-truth are encircled in red ellipses in Figure 6. These results show that the performance of Algorithm 3 and the LMM algorithm is comparable, but, as in the simulated data scenario, the hand-tuned LMM image was sparser than the Algorithm 3 image. However, Algorithm 3 can automatically determine the parameters of the prior probability density function. The disadvantage of Algorithm 3 as compared to the LMM algorithm is its higher computational cost. The run times for the DAS and LMM algorithms, and Algorithm 3 were 4 min, 1.5 h, and 4.9 h, respectively, using MATLAB on a 2012 Apple Mac Pro. It should be emphasized that our fast implementation of Algorithm 3 [using acceleration techniques we presented in Ndoye, Anderson, and Greene (2016)] is approximately 200 times faster than the direct implementation and uses about 900 times less memory.
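The mosaicking bookkeeping described above can be sketched as follows. This is our reading of the scheme, not the authors' code: we assume the 32 sub-images tile the 64 m down-track extent without overlap (only the sub-apertures overlap), and the function and variable names are ours.

```python
# Bookkeeping for the mosaicking scheme: the 64 m down-track extent is
# covered by 32 contiguous sub-images, each 25 m x 2 m, with 250 x 50
# voxels per sub-image.
n_subimages = 32
subimage_down_m = 2.0
n_cross, n_down = 250, 50    # voxels per sub-image

def subimage_extent(k):
    """Down-track interval [start, end) in metres of the k-th sub-image,
    k = 0, ..., 31."""
    return (k * subimage_down_m, (k + 1) * subimage_down_m)

L = n_cross * n_down                          # unknowns per sub-image: 12,500
total_down_m = n_subimages * subimage_down_m  # 64 m, the full SOI extent
```

Solving 32 problems of L = 12,500 unknowns each, one per sub-aperture/sub-image pair, is what keeps the per-problem system matrix at a tractable size.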

Discussion
In this article, we presented a MAP image reconstruction algorithm for ultra-wideband GPR imaging that exploits the known sparsity of scatterers through a novel prior, which we call the Butterworth density function. It is important to point out that the Butterworth density function is a mathematically tractable approximation to the uniform density function and can be used in other applications. The proposed algorithm, which we refer to as the MAP-Butterworth algorithm, iteratively minimizes the negative log of the a posteriori density function via the MM optimization technique. The use of the MM technique guarantees that the iterates of the MAP-Butterworth algorithm produce objective function values that monotonically decrease. By using certain implementation techniques from Ndoye, Anderson, and Greene (2016), we developed a memory- and computationally efficient implementation of the MAP-Butterworth algorithm. Also, because it can be implemented in parallel, we believe that a real-time implementation of the MAP-Butterworth algorithm is possible via the aforementioned implementation techniques and high-performance computing architectures, such as graphics processing units and field-programmable gate arrays.
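The monotone-decrease guarantee of MM can be illustrated on a toy one-dimensional problem. The objective and the quadratic majorizer of |x| below are illustrative stand-ins, not the paper's negative log-posterior or its Butterworth-based majorizer.

```python
# Toy illustration of the MM monotone-decrease guarantee.
# Objective: f(x) = 0.5*(x - y)**2 + lam*|x|.  The nonsmooth term |x| is
# majorized at x_n by the quadratic x**2/(2|x_n|) + |x_n|/2, which lies
# above |x| everywhere and touches it at x = +/- x_n.

def f(x, y=3.0, lam=1.0):
    return 0.5 * (x - y) ** 2 + lam * abs(x)

def mm_step(x_n, y=3.0, lam=1.0):
    # Closed-form minimizer of the surrogate
    # 0.5*(x - y)**2 + lam*x**2/(2*|x_n|).
    return y / (1.0 + lam / abs(x_n))

x = 10.0
values = [f(x)]
for _ in range(50):
    x = mm_step(x)
    values.append(f(x))

# MM guarantees the objective values never increase from iterate to iterate.
assert all(a >= b - 1e-12 for a, b in zip(values, values[1:]))
```

For this toy problem the iterates converge to x = 2, the soft-thresholded minimizer, with the objective decreasing at every step, which is the same descent behaviour the MAP-Butterworth algorithm inherits from the MM framework.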
In studies using simulated and real GPR data, the MAP-Butterworth algorithm produced sparse GPR images that retained all the scatterers while reducing sidelobes and noise as compared to the popular DAS algorithm. When compared to the LMM algorithm (Ndoye, Anderson, and Greene 2016), the performance of the MAP-Butterworth algorithm was comparable but, at the cost of a greater computational burden, it had the advantage of not requiring any user-defined parameters.

Figure 3. Illustration of an objective function f being minimized by an MM algorithm, where the function g(·; x^(n)) is a majorizing function for f about the point x^(n).


Figure 5. Illustration of the mosaic approach used for the real data. Adjacent sub-apertures overlap by 2 m. The kth sub-aperture is used to reconstruct the kth sub-image. For the first sub-image to be formed, the radar platform travels for 12 m (20 to 8 m from the sub-image). After the first sub-image is formed, each subsequent sub-image (2 m in the along-track direction) is generated every 2 m of platform travel.

Figure 6. A comparison of the images generated by applying the DAS and LMM algorithms, and Algorithm 3 to real data collected by ARL using their UWB SIRE GPR system. The image in (a) was reconstructed using the DAS algorithm. The image in (b) was generated by the LMM algorithm using 300 iterations and λ = 5000. The image in (c) was obtained using Algorithm 3 with 300 iterations, N = 1, N_n = 3, n_1 = 4, n_2 = 5, and n_3 = 6.
2.3.1. The majorize-minimize technique
Let f be a function to be minimized within a domain D ⊆ R^L. A real-valued function g is said to be a majorizing function for f at the point x^(n) ∈ D if g(x; x^(n)) ≥ f(x) for all x ∈ D, with equality at x = x^(n), i.e. g(x^(n); x^(n)) = f(x^(n)).
    Save x^(M), σ^2(M), u, v, and n_u if they produce the current minimum φ value
  end for
end for
Until this point, it has been assumed that a suitable candidate set of Butterworth parameter pairs is available. We now discuss our choice for the Butterworth parameter pairs.
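The parameter search implied by the pseudocode fragment above amounts to a keep-the-best loop over candidate Butterworth parameters. In the sketch below, `reconstruct` and `phi` are placeholders for the paper's MAP-Butterworth iteration and its objective function, neither of which is defined in this excerpt, so this shows the control flow only.

```python
# Hedged sketch of the parameter-selection loop: run the reconstruction
# for each candidate Butterworth parameter tuple and keep the result
# that yields the smallest objective value phi.

def select_parameters(candidates, reconstruct, phi):
    """candidates: iterable of (u, v, n_u) Butterworth parameter tuples.
    reconstruct: callable (u, v, n_u) -> (x_hat, sigma2_hat).
    phi: callable scoring a reconstruction; smaller is better."""
    best = None
    best_phi = float("inf")
    for u, v, n_u in candidates:
        x_hat, sigma2_hat = reconstruct(u, v, n_u)
        value = phi(x_hat, sigma2_hat, u, v, n_u)
        if value < best_phi:  # save the current minimizer, as in the text
            best_phi = value
            best = (x_hat, sigma2_hat, u, v, n_u)
    return best
```

Because each candidate requires a full run of the inner reconstruction, this search is the main source of the higher computational cost of Algorithm 3 relative to the hand-tuned LMM algorithm.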