Compound truncated Poisson gamma distribution for understanding multimodal SAR intensities

In recent years, many works have addressed the proposal of new probability models by theoretic and applied reasons. Specifically, mixture models have been indicated to describe phenomena whose resulting data impose high flexibility. One drawback of these tools is the high number of parameters involved, which implies hard inference procedures. To outperform this gap, we propose a new model that is able to describe multimodal behaviors with only three parameters, called compound truncated Poisson gamma (CTrPGa) distribution. Some properties of the CTrPGa law are derived and discussed: hazard, characteristic and cumulative functions and ordinary moments. Beyond, moment estimation, maximum likelihood estimation (via the expectation maximization algorithm) and empirical characteristic function methods for CTrPGa parameters are furnished. The first of them may be reduced to solve one nonlinear equation, which facilitates its use. We perform a simulation analysis to compare the performance of the three estimation methods studied. Moreover, since the gamma distribution and its mixture versions are commonly used to characterize synthetic aperture radar (SAR) intensities, we perform some real experiments with SAR imagery. The results present evidence that our model is a reasonable assumption that can be taken into account in the pre-processing step of such images.


Introduction
Several phenomena produce multimodal data in practice Cobb [5]; for instance, in image processing El-Zaart and Ziou [11] and in voice recognition Povey et al. [31]. Mixture models have been employed for describing these data McLachlan and Peel [23]. Bouguila and Ziou [2] proposed a new finite mixture model based on the generalized Dirichlet distribution. They also applied their proposal to image object recognition. Yang and Liu [37] have presented a new mixture density model based on the concept of group. Mignotte et al. [24] have proposed an unsupervised segmentation based on a hierarchical Markov random field model and Yang and Krishnan [36] provided an image segmentation method based on mixture and spatial information. However, such models impose difficult to inference and optimization (crucial steps in their adjustment) procedures because the presence of a high number of parameters. To solve this issue, we propose a three-parameter probability distribution which may describe data having multiple modes.
The practical focus of this paper is to study features in the remote sensing context. Synthetic aperture radar (SAR) images have been employed as an efficient source for solving remote sensing issues. Each pixel of a SAR image is associated with a complex element whose squared norm is known as intensity. To detail the last statement, as discussed by Delignon and Pieczynski [8], the back-scattered field defined on SAR illuminated areas is determined as where F k is the kth scatter in a resolution cell and N is the number of scatters in one resolution cell that is addressed in two senses: (i) if N is large and constant, F follows the circular Gaussian distribution and the associated intensity I = ||F|| 2 has exponential law [16]; (ii) N can be understood as a random variable following the Poisson(λ) distribution. We aim to advance in (ii). Note that E(N) = λ governs the number of parcels in (1). The intensity of F is where || · || is the Euclidean norm, E(·) is the expected value and (·) * is the conjugate of a complex argument. Here, we assume that the second parcel in (2) is null [3, p. 47]. The intensity of the SAR signal plays an important role because it carries physical properties of the under-inspection surface. Several imagery post-processing methods -like boundary detection Nascimento et al. [26], classification Ferreira et al. [12] and segmentation Ferreira and Nascimento [7] -are based on the intensity probability distribution. In the processing of such images, two important issues are: (i) SAR returns receive the interference of the speckle noise what makes their processing difficult and, when working with any image, (ii) it is expected some scenarios involve several texture kinds (or objects) and, therefore, the associated empirical distributions are multimodal. Here, we suggest as a pre-processing probabilistic assumption to describe SAR intensities by the sum of a random number N, following the truncated Poisson (TrP) model, of gamma (Ga) distributed independent random variables, called compound truncated Poisson gamma (CTrPGa) distribution. One way to generate probability distributions is by means of the compounding N method proposed by Karlis and Xekalaki [20]. In this method, a phenomenon is modeled by the random sum: where both N and {X i ; i = 1, . . . , N} are independent random variables. In what follows, we assume that S = S N . Equation (3) has been widely used by its analytic form arising from several natural phenomena. Revfeim [32] derived the compound Poisson exponential for describing the total precipitation per day, where the number of days with precipitation is Poisson distributed and the precipitation amount follows the exponential distribution.
Panger [28] discussed that the compound Poisson and the negative binomial models are extensively used in risk economic theory. Thompson [33] performed an application of the compound Poisson distribution to model the total amount of monthly rainfall. Özel and İnal [18,19] derived a procedure to obtain the compound Poisson process probability function and applied it to describe an aftershock sequence in Turkey and the number of fatalities in Groningen. Recently, Nascimento et al. [27] and Vasconcelos et al. [35] have also modeled SAR imagery using the compound truncated Poisson normal (CTrPN) and compound truncated Poisson Cauchy (CTrPCa) distributions, respectively. In this paper, we address three aims. First, we introduce the CTrPGa model to fit intensities extracted from SAR images. It is worth to mention the Ga distribution is the 'mother' law in multiplication modeling to describe intensities Frery et al. [14], being the Ga more adherent to SAR imagery physical formation than Gaussian and Cauchy distributions that were studied by Nascimento et al. [27] and Vasconcelos et al. [35], respectively, by proposing the CTrPN and CTrPCa distributions. Our model has three parameters: one shape parameter, one of scale and one that indicates the number of parcels in (3) as latent information to observed intensity. The CTrPGa distribution is able to describe amodal, unimodal, and multimodal phenomena. Some of its properties are derived and discussed: characteristic and cumulant generator functions, ordinary moments and skewness and kurtosis measures. Second, we propose three estimation methods for CTrPGa parameters: moment method (MM), maximum likelihood estimation (MLE) via the expectation maximization (EM) algorithm and empirical characteristic function (ECF) methods. We rewrite the first of them such that the associated estimation system can be reduced to one nonlinear equation in terms of one parameter. Moreover, we compare these three estimation methods through a simulation analysis. Third, applications to real SAR data are made. The performance of our model is quantified and compared with other fits due to the generalized beta-generated Lindley (GBGL) (Lima et al. [21]) and Weibull-Poisson (WP) (Lu and Shi [22]) distributions (recently proposed to describe lifetime data) and other five models, G 0 (Frery et al. [14]), K (Blacknell [1]), beta generalized normal (BGN) (Cintra et al. [4]), gamma generalized normal (GaGN) (Cordeiro et al. [6]) and CTrPN (Nascimento et al. [27]) distributions, which are commonly used in the SAR literature. This paper is outlined as follows. In Section 2, we define the CTrPGa model and present some of its properties. Three estimation methods are discussed in Section 3. In Section 4, a simulation study is made to compare these estimation methods. Section 5 addresses two applications to real data. Conclusions are presented in Section 6.

Gamma-based stochastic sum
Nascimento et al. [27] have proposed the compound N family, defined as S = N j=1 X j and having cumulative distribution function (cdf) where F S k is cdf of the sum of a k-points random sample, S k = k i=1 X k . Consider the problem of deriving the random sum of gamma distributions with the number of terms following the TrP law. Thus, assuming N ∼ TrP(λ) and X i ∼ Ga(α, β) as inputs to (4), we get that S k ∼ Ga(kα, β) and where λ, α, β > 0, α and β are the scale and shape parameters, respectively, γ (δ, z) = [ (δ)] −1 z 0 t δ−1 e −t dt denotes the lower incomplete gamma function and (·) is the gamma function.
This case is denoted by S ∼ CTrPGa(λ, α, β). By differentiating (5), one has that the CTrPGa probability density function (pdf) is given by From (5) and (6), the CTrPGa hazard rate function (hrf) is given by . Figures 1 and 2 present, respectively, some cases of the CTrPGa pdf and hrf. It is noticeable that the proposed model is able to describe multimodal, asymmetric and symmetric density behaviors ( Figure 1). Moreover, CTrPGa hrf ( Figure 2) may assume three standard patterns (increasing, decreasing and upside-down bathtub shapes). Based on Theorem 2.1 and on Equations (6) and (7) in [27], the following kind moment quantities hold: If S ∼ CTrPGa(λ, α, β), its characteristic function (CF), moment generate function (mgf) and cumulative generate function are given by, respectively, The following proposition may be verified from the CTrPGa mgf. and

respectively.
It is possible to check easily that the expression for the variance is positive, since e x > 1 + x for all x > 0.
Asymmetry and kurtosis measures of the CTrPGa distribution are given by, respectively, where κ 2 = Var(S) and κ 3 and κ 4 are expressed in (7) and (8), respectively, and Details of these results are presented in Appendix 1.

Estimation procedures for CTrPGa parameters
This section presents three estimation procedures for the CTrPGa parameters.
In this paper, we use the nonparametric boostrap method [10] to estimate standard errors associated with the proposed moment estimators. A routine to find λ, α and β for a given data set is detailed in Algorithm 1.
Algorithm 1: Estimation procedure using the moment method.

Maximum likelihood estimation via the EM algorithm
Dempster et al. [9] pioneered a unified algorithm, termed expectation maximization (EM), which aims to obtain maximum likelihood estimates from incomplete data. As follows, we present a summary of the Dempster, Laird, and Rubin's theory. Let x and y be two data sets. The first is complete and understood as a random sample (independent, identically distributed) of a population variable with pdf f (x; θ). On the other hand, the second is an incomplete random sample from a variable equipped with density g(x; θ ). Let X and Y be sample spaces due to x and y, respectively. Thus, the pdf of y can be defined as where Y(ẏ) is the subsample space of X determined byẏ = y(ẋ) andẏ andẋ are possible outcomes of y and x, respectively. Setting L c (θ) log f (x; θ ) and L(θ) log g(x; θ ), the EM algorithm consists in obtaining the maximum of L(θ) by means of an interactive process in terms of L c (θ ). As L c (θ ) is not observed, it is replaced by the following conditional expectation and temporary parametric values:θ Therefore, the EM algorithm can be understood in two steps: E-step To obtain the conditional expectation ].

M-step To findθ
Both steps above must be repeated until they converge in a specified sense. In what follows, we develop the EM algorithm for CTrPGa parameters.
Let S ∼ CTrPGa(λ, α, β) andṡ = (s 1 , . . . , s n ) be the observed sample, a possible outcome from a random sample s = (S 1 , . . . , S n ) drawn of S. Note that a non-observed sample,k = (k 1 , . . . , k n ) , is embedded in the observable one. Assume also that the last sample is an outcome of Specifically, one has that Proposition 3.1 summarizes the EM algorithm for the CTrPGa parameters. Details of these results are presented in Appendix 2.  = ( λ, α, β) , which maximize Q(θ; θ 0 ) in Equation (12) are defined by the following set of nonlinear equations: and α is a solution of the nonlinear equation Algorithm 2 describes the steps for this estimation.

Algorithm 2:
Estimation procedure using the MLE via EM.
as vector of estimates. Otherwise, make k = k + 1 and go back to step 2.

Minimum distance between empirical and theoretical characteristic functions method
Several works have addressed estimation methods based on the ECF. Parzen [29] introduced the ECF-based inference methods. The application of this method for independent data has been made by Paulson et al. [30], Feuerverger and McDunnough [13], Tran [34], Yu [38], among others. The idea behind this process is minimizing a distance measure between the ECF and the theoretical CF. Let F(x; θ ) be the cdf of a random variable X with support X . The CF of X is defined as (for i = √ −1) and the ECF is where i = √ −1 and {X j } n j=1 represents an independent and identically distributed sample and F n (x) is the empirical cdf. The support of r may be utilized under two approaches: First the continuous support such that the next distance is used: where g(r) is a continuous weight function. Second the discrete support such the following distance is used: and Im[·] are real and imaginary parts of a complex number and r 1 , r 2 , . . . , and r q is a set of points defined in the support of r. In this work, we use the discrete approach. According to Cojocaru and Doray [17], given a set of points from the support of r: r 1 , r 2 , . . . , r q , the efficient estimator for θ ,θ, is given byθ = arg min θ∈ [D 2 (θ ; x)], where is the parametric space and, for our case, Q(θ) = I. Algorithm 3 describes the steps for this estimation.
Algorithm 3: Estimation procedure using the ECF.
1: Choose the initial point for θ = (λ, α, β); 2: Choose the set of points r 1 , r 2 , . . . , r q ; 3: Determine the imaginary and the real parts of CF and ECF; 4: Calculate the values for V n and V θ ; 5: Find the argument which minimizes the distance

Simulation study
Now we consider a Monte Carlo simulation study having one thousand replicas. This simulation aims to compare the three estimation methods discussed in the previous section. We use the MM estimates as initial points for the EM and ECF estimation algorithms. To that end, we use (λ, α, β) = (1,1,2), (1,3,2), (1,1,4), (1,2,4) and (2,1,4) and sample sizes n = 50, 100 and 150. As figures of merit, we employ the mean of estimates, the mean square error (MSE) and the fit mean absolute error (FMAE).
In order to choose the best r sequence in the EFC method, several studies were carried out searching for the best set of points. As a conclusion of this study, we note that for the majority of cases, the best results were obtained in interval (0.001, 30). In what follows, Algorithm 4 describes the steps of the simulation study. Table 1 exhibits values of the figures of merit for the three estimation methods. In general, results point out the use of the EM algorithm outperforms the moment and ECF methods. For example, for n = 100 and (λ, α, β) = (1, 1, 2), the MSE(α) in method EM with MM is equal to 0.0236, while in the MM and ECF with MM methods, 0.0331 and 0.0821, respectively. However, the difference between the MM and MLE via EM procedures tend to be inexpressive when the sample size increases. In fact, for n = 150 and (λ, α, β) = (1, 1, 4) = (λ, α, β) , x 2 , . . . , x 150 ) . From that create other two vectors of sizes N 1 = 50 and N 2 = 100, given by y 1 = (x 1 , x 2 , . . . , x 50 ) and y 2 = (x 1 , x 2 , . . . , x 100 ) ; 3: Using y 1 , y 2 and y 3 , obtain the estimativesθ 1 ,θ 2 andθ 3 ; 4: If no valid estimate is obtained, discard the sample and return to Step 2; 5: Register the following vector of information: and 0.0266, respectively. Thus, due to its simplicity and since the sample size is large, we employ the MM in the applied study in the following section.

Application to real data
Now, we are in a position of applying the CTrPGa model to real data obtained from SAR images. The SAR system has been successfully used to solve remote sensing issues [25]. Its search can be justified by its ability at working under several weather conditions and at furnishing high spatial-resolution images. SAR data arise from the next dynamic: To send polarized (in horizontal, 'H', and, vertical, 'V', directions) pulses towards an understudy area and, after, capturing them (also in H and V directions). Each entry of produced images is associated with a complex vector, where each component represents one among three polarization channels: horizontal-horizontal, 'HH', vertical-horizontal (by physical reasons, similar to the horizontal-vertical channel), 'VH', and vertical-vertical, 'VV'. In this paper, we wish to describe an HH channel feature, denoted as where |Z HH | 2 = Z HH Z * HH represents the HH intensity -with (·) * being the conjugate operator -and φ HH is the intensity phase. In what follows, we want to model SAR intensities by means of S with cdf in (5).
A Foulum (Denmark) image [15] -captured on 1998 by the EMISAR sensor under the number of looks eight -is used. Figure 3(a) exhibits the used SAR image at the HH channel and a selected region for analysis. A brief descriptive analysis is given in Table 2. Values of coefficient of variation, mean and median indicate an asymmetric empirical distribution   Figure 3(b) exhibits the SAR image used at the VV channel and a selected region for analysis. A brief descriptive analysis is given in Table 2. Values of coefficient of variation, mean and median indicate a quasi symmetric empirical distribution having high-variability; while, the sample skewness and kurtosis are near to the ones due to the normal law.
We aim to analyze Foulum SAR intensities within the selected rectangle according to (at least) two regions, representing wheat crops. Further, we wish to describe forest regions combined with urban targets in the Los Angeles SAR image. For this case, the CTrPGa will also be compared with the BGN(α, β, μ, σ , s) [4], G 0 (α, γ , L) [14], K(α, λ, L) [1], GBGL(λ, a, b, c) [21] CTrPN(λ, μ, σ ) [27] WP(α, β, λ) [22] and GaGN(a, μ, σ , s) [6] laws having densities, respectively, given by where φ s (·) and s (·) are functions defined in [4], B(a, b) is the beta function, K a (b) represents the second kind modified Bessel function with argument b at order a and φ(·) is   the standard normal density. We do not use the CTrPCa distribution proposed in [35] in this analysis because it does not furnish reasonable estimates and fit for these scenarios. As observed in Section 4, MM estimates for the CTrPGa model have a similar performance when compared to more sophisticated estimation methods and can be obtained resolving one nonlinear equation in terms of a single parameter. Thus, since it is a method that can be more easily used by practitioners, we compared its performance in this application with the ML estimates of the other seven models. Fits in Figure 4(a-d) suggest a qualitative comparison favorable to the CTrPGa model, which presented a better fitting when compared to the BGN, G 0 , K, GBGL, CTrPN, WP and GaGN distributions.
In a quantitative analysis, parameter estimates involved in both applications are presented in Tables 3 and 5. It is noticeable that all fits were acceptable in respect to point estimates.
Tables 4 and 6 present values for six goodness-of-fit measures: Kolmogorov-Smirnov (d KS and its p − value KS ), Anderson-Darling (A * ), Cramer-Von-Mises (W * ) statistics, Akaike information criterion (AIC) and Bayesian information criterion (BIC). Based on the KS test, the CTrPGa distribution was the only adequate alternative, adopting the nominal level of 6% for Foulum data. With respect to A * and W * measures for Foulum data, the CTrPGa law outperforms all other distributions; while the CTrPN yields the best performance in terms of AIC and BIC, followed very closely by the CTrPGa. For Los Angeles data, all considered models were acceptable taking a nominal level of 10% according the KS test, being the smallest KS statistics value furnished by the CTrGa fit. The smallest values for A * and W * statistics were provided by the BGN and K fits, respectively. The CTrPGa fit yielded the best performance in terms of AIC and BIC.

Conclusions
In this paper, we have introduced a new three-parameter model named compound truncated Poisson gamma distribution. Some of its mathematical properties have been also derived: cumulative and characteristic functions, ordinary moments and skewness and kurtosis measures. A simple moment-based estimation procedure which involves finding the root of only one nonlinear equation is proposed as well. Further, we propose two other estimation methods based on the MLE via EM and on the ECF. It is worth emphasizing that the MM estimation procedure is computationally easier for CTrPGa users who may, for example, need to process images from neighborhoods of every image pixel. The MM estimation method was observed to have similar performance to the other two more sophisticated estimation methods. The CTrPGa model has been employed successfully as a descriptor of SAR intensities, outperforming seven other models.

Disclosure statement
No potential conflict of interest was reported by the author(s).