Log-mean distribution: applications to medical data, survival regression, Bayesian and non-Bayesian discussion with MCMC algorithm

We introduce a new family via the log mean of an underlying distribution and as baseline the proportional hazards model and derive some important properties. A special model is proposed by taking the Weibull for the baseline. We derive several properties of the sub-model such as moments, order statistics, hazard function, survival regression and certain characterization results. We estimate the parameters using frequentist and Bayesian approaches. Further, Bayes estimators, posterior risks, credible intervals and highest posterior density intervals are obtained under different symmetric and asymmetric loss functions. A Monte Carlo simulation study examines the biases and mean square errors of the maximum likelihood estimators. For the illustrative purposes, we consider heart transplant and bladder cancer data sets and investigate the efficiency of proposed model.


Introduction
In recent years, extensive efforts have been directed toward presenting new models in the area of distribution theory and related statistical applications. These studies are mainly concentrated to the modeling of various data sources and finding out the probabilistic structure. In connection with the development of new models, it is worthwhile to note that these new models should have the capability for analyzing a wide range of real observations. Undoubtedly, this is also the most basic concern in the development of new models from the past to the future. Mixture models are kinds of statistical models that are quite versatile and thus have been frequently used in reliability, information theory, economic, engineering, agriculture, among others. For example, Mendenhall and Hader [20], while referring to the practical situations encountered by engineers, pointed out that the failure of a system may be due to different types of causes.
Three well-known mixture distributions are reviewed here. The usual arithmetic mixture distribution is just a weighted mean of two probability density functions (PDFs): The geometric mixture is the normalized geometric mean of two PDFs, namely where k is finite on the support of h. A more general power mean mixture [25] has the form where The normalizing factor always exists for α > 0 and The α-mixture family (3) contains as special cases (1) (α = 1) and (2) (α → 0). Asadi et al. [3] obtained some interesting informational properties for the above mixture models. They play important roles in information theory and have shown to contain optimal information. Motivated by the optimal information properties of these mixture distributions, we first propose the log-mean-G (LM-G) family based on the log-mean of two survival functions.
The new family provides more flexibility for fitting real data compared to other well-known models.
There are several types of mean in mathematics, especially in statistics. Mixed statistical distributions have been widely used in various topics of statistics, especially reliability. These mixture models have mainly been defined based on known means in mathematics such as arithmetic mean, geometric mean, harmonic mean and mean power. From this perspective, various mixed distributions have been presented based on the average of two density functions or the average of two survival (distribution) functions in the statistical and reliability literature [24].
Another well-known mean in mathematics is logarithmic mean (log-mean). This mean is a function of two non-negative numbers which is equal to their difference divided by the logarithm of their quotient. This calculation is applicable in engineering problems and most notably in heat exchangers. Presenting mixture distributions based on log-mean has not been introduced in the literature. Therefore, this family is completely new and will share more applicability and potential in area of statistical distribution and information theories.
Second, log-mean-Weibull distribution and its corresponding survival regression are considered as special cases of the general LM-G model. Bayesian and non-Bayesian estimation approaches are implement to estimate the parameters of these sub-classes model. In regard with Bayesian approach in estimation of the parameters of a statistical distribution, one may refer to Eliwa et. al. [9], Rafique et al. [23], Yousaf et al. [27], Aslam et al. [4] as recently published research.
The remainder of this paper is structured as follows. In Section 2, we define the LM-G family and obtain some properties. In Section 3, we define the LM-Weibull (LM-W) distribution and address some of its properties. Some characterization results are discussed in Section 4. In Section 5, we estimate the parameters via maximum likelihood, bootstrap and Bayesian methods. Bayesian 95% credible and highest posterior density (HPD) intervals [5] are provided for each parameter of the new distribution. Section 6 is devoted to the LM-W regression. In Section 7, the superiority of the new models to some competitors is shown by means of two medical data sets. Finally, the paper is concluded in Section 8.

LM-G family
Let X be a continuous random variable with survival and hazard functionsḠ(x) and r(x), respectively. The hazard rate function (HRF) r Ph = αr(x) (for α > 0) defines the proportional hazards (PH) model popularized by Cox [7] and widely used in many fields. We propose a new model based on the log mean of two survival functionsḠ(x) andḠ α (x) and provide some of its properties with survival function (for α ≥ 0) From Equation (4) if α tends to one, then by using L'Hospital's rule, we have lim α→1H (x) =Ḡ(x). The associated CDF and PDF are and respectively.
The following theorem provides an interesting representation for the LM-G distribution. Theorem 2.1: Let X be a non-negative random variable with survival functionḠ(x) and suppose that the parameter λ has a prior distribution on (0, 1). Then, the Bayesian estimation for the expected lifetime of a series system consisting of two independent components with survival functionsḠ λ (x) andḠ α (1−λ) dx.

Proof:
The survival function of the lifetime variable T λ isF λ (x) =Ḡ λ (x)Ḡ a(1−λ) (x). Then, So, the Bayesian estimation of E(T λ ) under the quadratic loss function with respect to the uniform prior distribution in (0, 1) for the mixing parameter λ is where the third equality is satisfied by Fubini's theorem for interchanging integrations. The HRF of the LM-G distribution is In Section 4, we provide some characterization results based on the hazard rate (7).

Properties
We require three power series to obtain a useful linear representation for (5). For |z| < 1 and any real α = 0, the following power series holds: where a n = a n (α) = (−1) n α n .
where b n = a n (1 − α), c 0 = log(b 0 ) and (for n ≥ 1) Letting z = G(x) and inserting the previous power series in H(x), Equation (5) can be expressed as 1 and d n = −a n (for n ≥ 2).
The ratio of these two power series can be reduced to where e 0 = d 0 /c 0 and the coefficients e n 's (for n ≥ 1) can follow from the recurrence formula c r e n−r .
The exponentiated-G (exp-G) distribution with power parameter a > 0 for a given cdf G(x) has density π a (x) = a G(x) a−1 g(x). By differentiating (8), the density of X in Equation (6) has a linear representation where π n+1 (x) = (n + 1) G(x) n g(x) is the exp-G density with power parameter n + 1 and f n+1 = −e n+1 for n ≥ 0. Some properties of the LM-G distribution can be determined from (9) and those properties for more than 35 exp-G models reported by Tahir and Nadarajah [25]. The infinity limit in (9) can be substituted by a large positive integer such as 20 for most practical purposes.

The LM-W distribution
The LM-W(α, β, λ) density is where all parameters are positive. The corresponding HRF is Plots of the PDF and HRF are displayed in Figure 1.

Characterization results
Jeon and Kim [13] proposed a credibility theory based on the truncation of the loss data to estimate conditional mean loss for a given risk function. It should also be mentioned that characterization results are mathematically challenging and elegant. We present three characterizations of the LM-G distribution based on (i) two truncated moments; (ii) hazard function and (iii) conditional expectation of a function of a random variable.

Characterization based on two truncated moments
Then, X has PDF (5) if and only if the function ξ defined in Theorem 2.1 has the form Proof: Suppose X has PDF (5), then (for x ∈ R) and and then We also have Conversely, if ξ is of the above form, and Now, in view of Theorem 2.1, X has PDF (5). .

Corollary 4.2: The general solution of the last equation is
where D is a constant.

Characterization based on hazard function
For a twice differentiable distribution function H(x), we can write The following proposition establishes a non-trivial characterization of the LM-G distribution based on the hazard function.

Proposition 4.2: The random variable X has density (5) if and only if its hazard function r H (x) satisfies:
Proof. It is straightforward and hence omitted.

Characterizations based on a conditional expectation
Hamedani [11] established the following proposition which is used to characterize the LM-G distribution.
if and only if and δ = 1 2 , then Proposition 4.3 presents a characterization of the LM-G distribution.

Inference
In this section, we consider estimation of the unknown parameters of the LMW distribution via three methods: maximum likelihood method, bootstrap estimation and Bayesian procedure.

Maximum likelihood estimation
Let x 1 , . . . , x n be a random sample from the LM-W distribution and = (α, β, λ) be the vector of parameters. The log-likelihood function is The maximum likelihood estimate (MLE) of can be found by maximizing L( ) numerically using some routines in R (optim function) and Ox (sub-routine MaxBFGS).

Bootstrap estimation
Two parametric and nonparametric bootstrap procedures are described as below (see Efron and Tibshirani [8]): Parametric bootstrap procedure: • Estimate ψ, sayψ, from the previous method.
• Generate a bootstrap sample {Y * 1 , . . . , Y * k } usingψ and obtain the bootstrap estimate of ψ, say ψ * , from the bootstrap sample from the MLE method.

Nonparametric bootstrap procedure
• Generate a bootstrap sample {y * 1 , . . . , y * k } with replacement from the original data set. • Calculate the bootstrap estimate of ψ from the MLE method, say ψ * , using the bootstrap sample. • Repeat step 2, N times.

Bayesian inference
We consider different types of symmetric and asymmetric loss functions such as squared error loss function (SELF), weighted squared error loss function (WSELF), modified squared error loss function (MSELF), precautionary loss function (PLF) and K-loss function (KLF). These functions associated with Bayesian estimators and posterior risks are reported in Table 1.
For more details about these loss functions, see Kharazmi et al. [14] and references therein. Next, we provide a Bayesian discussion for estimating the parameters of the LM-W distribution from a complete sample.

Joint posterior and marginal posterior distributions
Assume that the parameters of the LM-W distribution have independent prior distributions, namely where all hyper-parameters are positive. Consequently, the joint prior density has the form For simplicity, we define the function ζ as The joint posterior distribution defined from Equation (11) and the likelihood function Therefore, the joint posterior PDF can be expressed as where = (α, β, λ) and K follows from Moreover, the marginal posterior density of α,β and λ (assuming that = ( 1 , 2 , 3 ) = (α, β, λ)) can be expressed as where i, j, k = 1, 2, 3, i = j = k and i is the ith member of the vector .

Bayesian point estimation
Under the marginal posterior PDF (14) and the loss functions listed in Table 1, the Bayesian point estimation for the parameter vector = ( 1 , 2 , 3 ) = (α, β, λ) is obtained via minimizing the expectation of loss function under the marginal posterior PDF as follows: arg min However, in practice, because of intractable integral in (15), we use the well-known Gibbs sampler [10] or Metropolis Hastings algorithms [12,21] to generate posterior samples. We will argue this issue more precisely in Section 5.8.

Credibility interval
In the Bayesian framework, interval estimation is done via credibility interval conception. Consider the parameter vector = ( 1 , 2 , 3 ) = (α, β, λ), which is associated with the LM-W distribution and π( j |x) that denote the marginal posterior PDF of the parameter j (j = 1, 2, 3), as in (14). For a given value of η ∈ (0, 1), the (1 − η)100% credibility interval CI(L j , U j ) is defined as By considering the relation (16), it is very difficult to obtain the marginal PDF from the joint posterior PDF. We use the Gibbs sampler to generate posterior samples.
) be a posterior random sample of size k, which is extracted from the joint posterior PDF as in (12). Using these generated posterior samples, the marginal posteriors PDFs of j given x can be given by where i −j represents the vector of posterior samples when the jth component is removed. Inserting (17) in (16), one can be able to compute the credibility intervals for j (j = 1, 2, 3) as follows:

Highest posterior density interval
The highest posterior density (HPD) interval is a kind of credibility interval under a specific restriction. A (1 − η)100% (i = 1, . . . , p) HPD interval for j (j = 1, 2, 3) is the simultaneous solution of the integral equations

Generating posterior samples
It is clear from Equations (12) and (14) that there are no explicit expressions for the Bayesian point estimators under the loss functions in Table 1. Because of intractable integrals associated with joint posterior and marginal posterior distributions, we require numerical software to solve the integral equations numerically via MCMC methods such as the Metropolis-Hastings algorithm and the Gibbs sampling. Suppose that the general model f (x|ψ) is associated with the parameter vector ψ = (ψ 1 , ψ 2 , . . . , ψ p ) and observed data x. Thus, the joint posterior distribution is π(ψ 1 , ψ 2 , . . . , ψ p |x). We also assume that is the initial vector to start the Gibbs sampler. The steps for any iteration, say iteration k, are as follows: • Starting with an initial estimate (ψ ; and so on down to. • Draw ψ k p from π(ψ p |ψ k 1 , ψ k 2 , . . . , ψ k p−1 , x).
In the case of the LM-W distribution, by considering the parameter vector = (α, β, λ) and initial parameter vector 0 = c(α 0 , β 0 , λ 0 ), the posterior samples are extracted by above Gibbs sampler where the full conditional distributions are and where Υ ( In practice, simulations related to Gibbs sampling can be conducted through a special software WinBUGS. This software was developed in 1997 to simulate data of complex posterior distributions, where analytical or numerical integration techniques cannot be applied. Moreover, Gibbs sampling processes can be carried out via OpenBUGS software, which is an open source version of WinBUGS. Here, since there are not any prior information about hyper-parameters in (11), We implement the idea of Congdon [6] and the hyper-parameters values are set as α 0 = α 1 = β 0 = β 1 = λ 0 = λ 1 = 0.0001. So, we can use the MCMC procedure to extract posterior samples of (12) by means of the Gibbs sampling process in OpenBUGS software.

Monte Carlo simulations
We investigate the performance of the MLEs based on a simulation study using the Monte Carlo method. First, we generate samples of size n from (10) from the inversion method which implies to find the root of where u ∼ U(0, 1) is a uniform variate on the unit interval. We compute the mean square errors (MSEs) and biases of the MLEs of the parameters based on N = 1000 simulations.
The results are summarized in Table 2 for selected parameters and values of n. The numbers in Table 2 indicate that the MSEs and biases of the MLEs decrease when the sample size n increases. So, the MLEs of the parameters are consistent.

A new regression
Let W be a Weibull random variable having PDF and CDF respectively. Inserting these expressions in Equation (2), we have Henceforth, consider that the random variable X has density (23). The density of Y = log(X) under the re-parametrization σ = 1/β and μ = −log(λ) is where μ ∈ is a location, σ > 0 is a scale, and α > 0 is a shape parameter. We refer to Equation (24) as the log-LM- Let Z = (Y − μ)/σ be the standardized random variable having PDF e −e z +z 1 + e z − αe z + 1 e −(α−1)e z .
The LLM-W regression is defined by and v i is the explanatory variable vector and z i is the random error with density h(z; α, σ ). The regression parameters are represented by τ = (τ 1 , . . . , τ p ) . Let y i be a dependent variable defined as y i = min{log(x i ), log(c i )}, where log(x i ) follows (24) and log(c i ) represent the log-lifetime and log-censoring times. Let F and C be the sets of individuals with log-lifetime and log-censoring, respectively. Under the definitions of sets F and C, the log-likelihood function for the parameter vector ψ = (α, σ , τ ) is given by where h(y i ; α, μ, σ ) and S(y i ; α, μ, σ ) defined as in (24) and (25), respectively. Now, consider u i = exp(z i ), z i = (y i − v i τ )/σ and let r be the number of uncensored observations. Consequently, from (27), we obtain The MLE of ψ can be obtained by maximizing (28) using the optim function of R software.

Residual analysis
The martingale residuals for the LLM-W regression are The modified deviance residuals for the LLM-W regression are The modified deviance residuals are normally distributed with zero men and unit variance once the fitted regression model is suitable and accurate for the current data.

Application
The goal here is to show two applications of the LM-W model under three methods (maximum likelihood, Bayesian and bootstrap) discussed in Section 5. In order to achieve this goal, first we consider the remission times for 128 bladder cancer patients and the parameter estimations are done by means of these three methods discussed in Section 5. Second, we show the performance of LLM-W survival regression via maximum likelihood by analyzing the heart transplant data set, which is associated with some covariates. For more details about these data sets, see Alizadeh et al. [2] and Marinho et al. [19].

Univariate data modeling
The bladder cancer data have been formerly studied by Lee and Wang [18]. Visualized measure: The total time test (TTT) plot in view of Aarset [1] is a vital graphical way to clarify whether a given data set can be adequately modeled through a specific   distribution or not. The TTT plot presented in Figure 2 indicates that the experimental failure rate function for the current data is upside-down bathtub shaped. Hence, the LM-W distribution is potentially suitable to fit these data.

Bootstrap inference
Here, we consider the point and 95% confidence interval (CI) estimation for the parameters of the LM-W distribution via parametric and non-parametric bootstrap methods. The numerical results of the bootstrap estimation for the bladder cancer data are provided in Table 3. In order to see the potential correlation structure that might exist between each pair of the parameters, we provide the scatter plot of the joint distribution of the bootstrapped values in Figure 3.

Analysis of the cancer data
We model the bladder cancer data by the LM-W distribution through maximum likelihood, Bayesian and bootstrap methods and compare the results with the Weibull, exponentiated Weibull (EW) by Nadarajah et al. [22], two-sided generalized exponential (TSGE) by Korkmaz and Genç [16] and transmuted two-sided generalized exponential (TTSGE) by Kharazmi and Zargar [11] with corresponding densities  Table 4 provides a summary of MLE discussion for the bladder cancer data. In order to examine the fitting validity of the LM-W distribution to the current data and its superiority to the other competitor models, we obtain Cramér-von Mises (W * ), Anderson-Darling (A * ) and Kolmogorov-Smirnov (KS) statistics with the corresponding P-value (P) measures. The LM-W distribution gives the best fit for these data as it shows the lowest A * , W * and KS than the other models. The histogram, fitted LM-W and TTSGE PDFs, and the plots of empirical and fitted LM-W and TTSGE CDFs are plotted in Figure 4. The P-P   plots and Q-Q plots for the LM-W and TTSGE are displayed in Figure 5. In terms of visual inspection, these plots support the LM-W validation as well as the results of Table 4. As mentioned formerly in Section 5, there are no closed-form expressions for the MLEs of the parameters β, α, θ and λ in the LM-W model. The log-likelihood surface is plotted for each combination of two parameters in Figure 6. Next, we provide Bayesian estimation results. It is evident from Equation (14) that there are no closed-form expressions for Bayesian estimators, which are extracted based on the loss functions in Table 1. Therefore, an MCMC procedure via the Gibbs sampler process is designed using Equations (19), (20) and (21), with 10,000 replicates to obtain the Bayesian estimators. In Table 5, we provide the corresponding point and posterior risk estimations. Further, 95% credible and HPD intervals are provided in Table 5. In order to provide a visual inspection, we provide posterior summary plots in Figures 7-9. These plots verify that the convergence of the Gibbs sampling process has occurred Next, for the evaluation of the MCMC procedure in Bayesian analysis, we report some diagnostics measures such as Gelman-Rubin (GR), Geweke (G) and Raftery-Lewis (RL) for checking the convergence of the Gibbs algorithm in Table 7. For more details about these indexes see, Lee et al. [17]. The GR diagnostic for parameters α, β and λ is equal to 1. Hence, based on the GR diagnostic measure, the chains are acceptable. Figure 10 shows that the estimates come from a state spaces of the corresponding parameters. From Table 7, Geweke's test statistics for parameters α, β and λ, are 1.53, 0.821 and 0.191, respectively. Hence, the G diagnostic measure also confirms the acceptance of chains as shown in Figures 11 and 12. Moreover, the reported diagnostic statistics for parameters α, β and λ based on the RL method do not show significant degree of dependence between estimates.

Data modeling with covariate variables: heart transplant data
Here, we asses the performance of the LLM-W regression by application to real heart transplant data. The current data are available in the survival package of R software. The proposed survival regression is where y i is distributed as the LHM-W distribution and the covariate random variables are described as • v i1 = age; • v i2 = previous surgery (surgery coded as 1 = yes; 0 = no); • v i3 = transplant (coded as 1 = yes; 0 = no). Table 8 provides a summary of MLE discussion for the heart transplant data. We fit the LLM-W regression model to this data set and compare the results with the log-Weibull  distribution. The estimated parameters, standard errors (SEs), AIC measure as well as corresponding P-values are reported in Table 8. We conclude that the estimated regression parameters are statistically significant at 5% level.

Results of residual analysis
The suitability of the fitted LLM-W regression model is evaluated by residual analysis. The plot of the modified deviance residuals and its Q-Q plot are displayed in Figure 13 which reveal that the fitted LLM-W regression provides good fit to the used data.

Conclusion
A new family of distributions is established via log-mean and some of its main characteristics are derived. A special sub-model of this family is proposed by considering the Weibull as the baseline distribution. We provide a comprehensive discussion about Bayesian estimation of the parameters under several loss functions for a special sub-model. We also provide a survival regression model associated with this sub-model in order to analyse real data with covariate variables. Numerical analysis of fitting an univariate real data set is provided via three inference approaches and the corresponding plots are given to evaluate these results. The data analysis proves empirically that the proposed distribution gives a better fit than other competing distributions for the current data. Finally, the performance of the survival regression sub-model is examined by considering a real example of observations with covariate variables. As a future work, one may be interested in incorporating other well-known distributions instead of G to produce new flexible models to analyze real environmental data sets in the different branch of sciences such as economic, engineering, reliability, medicine and and other disciplines. Further, as another future work, it would be interesting to extend log mean idea for bivariate probability distributions.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by Vali-e-Asr University of Rafsanjan .