The Modified-Half-Normal distribution: Properties and an efficient sampling scheme

Abstract We introduce a new family of probability distributions that we call the Modified-Half-Normal distributions. It is supported on the positive part of the real line, with probability density proportional to the function x^{a-1} exp(-b x^2 + c x) for x > 0. We explore a number of its properties, including the fact that the normalizing constant and moments of the distribution can be represented in terms of the Fox-Wright function. We demonstrate its relevance by showing its connection to a number of Bayesian statistical methods appearing in multiple areas of research, such as Bayesian binary regression, the analysis of directional data, and Bayesian graphical models. The availability of an efficient sampling scheme is important to the success of such Bayesian procedures. Therefore, a major focus of this article is the development of methods for generating random samples from the Modified-Half-Normal distribution. To ensure efficiency, we prove that the constructed accept-reject algorithms are "uniformly efficient", with a high acceptance probability irrespective of the choice of the parameter specifications. Finally, we provide a basic inference procedure for its parameters when analyzing data under the Modified-Half-Normal probability model.


Introduction
The objective of this manuscript is to introduce the Modified-Half-Normal (MHN) distribution, a novel family of continuous probability distributions supported on the positive part of the real line. The corresponding probability density function is specified as

f(x | a, b, c) = (2 b^{a/2} / Ψ(a/2, c/√b)) x^{a-1} exp(-b x^2 + c x) I(x > 0),   (1)

where a > 0, b > 0, c ∈ R, I(x > 0) denotes the indicator function, and Ψ(a/2, z) := ₁Ψ₁[(a/2, 1/2); (1, 0); z] refers to a specific case of the Fox-Wright Psi function (Fox 1928; Wright 1935). We establish the connection between the normalizing constant and the Fox-Wright function in Lemma 1. For brevity of expression, we use the notation Ψ(a/2, z) instead of the full expression for the rest of the manuscript. Additional details, along with a number of properties of the function, are included in the Supplementary Section 1. The name of the distribution is inspired by the similarities of the probability density function in Equation (1) with that of the Normal distribution restricted to the positive part of the real line.
The introduction of the MHN distribution is primarily motivated by its connection to a number of Markov chain Monte Carlo (MCMC) based Bayesian procedures. Although new distributions frequently crop up in Bayesian analysis in the form of conditional posterior distributions, many such probability distributions are too problem-specific to be relevant in a broader context. Additionally, such distributions often lack tractable distributional attributes, such as a known functional form of the normalizing constant. On the contrary, the MHN distribution occurs in diverse areas of research, signifying its relevance to contemporary statistical modeling and computation. In Section 2, we include a few examples that involve the MHN(a, b, c) distribution as one of their posterior conditionals to sample from. In particular, the distribution plays an important role in the development of data augmentation algorithms for directional data (see Section 2.3). Apart from the examples provided in Section 2, the MHN(a, b, c) may transpire in many other statistical models due to the similarity of its density with that of the Normal distribution. Moreover, the distribution has interesting properties, including the fact that its normalizing constant can be represented in terms of Fox-Wright functions. From the perspective of its applicability to MCMC algorithms, it is important to develop procedures to generate random samples from the distribution. Therefore, we devote considerable effort to designing efficient sampling schemes for the distribution. Specifically, we employ accept-reject sampling techniques (Ahrens and Dieter 1982; Casella 1985; Devroye 2006) for the purpose. It is known that, if not designed carefully, a rejection sampling algorithm may suffer from a high rate of rejection, especially when the rejection rate is not controlled uniformly over all possible parameter values.
Hence, we devise "uniformly efficient" accept-reject algorithms that ensure high performance for all possible choices of the parameter values. Algorithms developed in Devroye (2014) and Hörmann and Leydold (2014) are examples of such "uniformly efficient" accept-reject algorithms. We therefore provide a reliable sampling algorithm that guarantees a sample from MHN(a, b, c) irrespective of the parameter specification. Establishing the uniform efficiency of the developed accept-reject algorithms has been a nontrivial challenge, and a major theoretical component of the article is devoted to proving such results. On a related note, we point out that the MHN(a, b, c) distribution is log-concave when a ≥ 1. Therefore, generic sampling techniques applicable to log-concave densities (Gilks and Wild 1992; Martino and Míguez 2011) are also relevant in that case. On the other hand, the sampling algorithms developed in this article are optimally designed to achieve high efficiency. As a particular case, note that the class of Truncated Normal distributions whose support is restricted to the positive part of the real line is a special case of the MHN family of distributions.
In addition to the sampling algorithms, we explore a number of properties of the MHN distribution in Section 3. We recognize its normalizing constant to be a Fox-Wright Psi function (Fox 1928; Wright 1935; Mehrez and Sitnik 2019). The moments, and moment-based statistics such as the variance and skewness of the distribution, can be represented in terms of Fox-Wright Psi functions. Besides, we identify a recursive relation between three consecutive moments of the distribution that is not only helpful in developing a sharp double inequality (lower as well as upper bound) for the mean of the distribution, but also plays an important role in estimating the parameters of MHN(a, b, c), which we discuss in Section 5. Note that the family of MHN distributions can be viewed as a generalization of multiple families, including the Half-Normal, Truncated Normal, square root of a Gamma, and Gamma distributions (see Section 3). Therefore, it possesses considerable flexibility for capturing the attributes of positive real-valued data. Although it is not the primary objective, for the sake of completeness we briefly discuss inferential techniques for analyzing data using the Modified-Half-Normal probability model.
As an additional component of this manuscript, we develop a few properties of the Fox-Wright function that are required for studying the MHN distribution. Depending on the parameter choices, we design appropriate computation strategies for the Fox-Wright functions that are relevant to the current context. Reliable computation of the function is not only important for evaluating basic statistics such as the mean, variance, and skewness of the Modified-Half-Normal distribution, but is also useful for measuring the efficiency of the sampling algorithms that we design in Section 4. However, the implementation of the sampling algorithms does not require its computation. Additionally, a few nontrivial inequalities for the Fox-Wright function are derived, as they are required to establish the uniform efficiency of the developed accept-reject algorithms. We include a detailed discussion of the topic in the Supplementary Section 1.
The remainder of the manuscript is organized as follows. In Section 2, we provide a few examples of statistical models that involve the MHN(a, b, c) distribution. We study and develop a few properties of the MHN(a, b, c) distribution in Section 3. The development of the sampling algorithms and the appraisal of their efficiency are presented in Section 4. A basic inference procedure for analyzing data with the MHN(a, b, c) distribution as a probability model is discussed in Section 5. The proofs of all theorems and lemmas are included in the Supplementary material.

A few applications of the Modified-Half-Normal distribution
In this section, we consider a few examples to demonstrate the usage of the MHN(a, b, c) distribution in various statistical models. In the following Bayesian procedures, the MHN(a, b, c) appears as one of the posterior conditional distributions that constitute the associated Markov chain Monte Carlo algorithms. The examples aggregated here are inspired by unrelated statistical problems, thus showing how spontaneously the MHN(a, b, c) occurs in the literature.

Sandwich MCMC algorithm in Bayesian binary regression
Let y_{R,1}, y_{R,2}, ..., y_{R,n} be independent Bernoulli random variables such that P(y_{R,i} = 1) = H_R(x_{R,i}^T β_R), where x_{R,i} ∈ R^p denotes the corresponding covariates for i = 1, ..., n, β_R ∈ R^p is a vector of unknown regression coefficients, and H_R(·) is the cumulative distribution function of the t-random variable with R degrees of freedom. For an MCMC-based posterior inference of the parameter β_R, it is required to sample from the posterior density

π(β_R | y_R) ∝ Π_{i=1}^n [H_R(x_{R,i}^T β_R)]^{y_{R,i}} [1 - H_R(x_{R,i}^T β_R)]^{1 - y_{R,i}},   (2)

where an improper flat prior is used for β_R. In Equation (2), we use the notation y_R = (y_{R,1}, y_{R,2}, ..., y_{R,n}). A sufficient condition for the propriety of the posterior in Equation (2) can be found in Pal, Khare, and Hobert (2015). Albert and Chib (1993) developed a Data Augmentation (DA) algorithm for exploring the above posterior (Equation (2)). The technique is based on introducing sets of latent variables Λ_R := (λ_{R,1}, ..., λ_{R,n})^T ∈ R_+^n and Z_R := (z_{R,1}, ..., z_{R,n})^T ∈ R^n with appropriate probability distributions, so that the full conditional posteriors of β_R, Z_R, and Λ_R reduce to Normal, Truncated Normal, and Gamma distributions, respectively. It is well known that the DA algorithm in Albert and Chib (1993) can be slow to converge and often leads to MCMC samples with high autocorrelation (Van Dyk and Meng 2001; Roy 2012). To improve the performance of data augmentation algorithms, sandwich data augmentation algorithms (Marchev and Hobert 2004) are designed, where the idea is to gain efficiency by introducing only an additional draw from a scalar random variable so that the stationary distribution of the Markov chain remains the same.
Pal, Khare, and Hobert (2015) proposed a suitable version of the sandwich algorithm where the probability density function of the additional scalar random variable that is required to be sampled (denoted by G) transpires to be

π_G(g) ∝ g^{n-1} exp( -(1/2) Σ_{i=1}^n λ_{R,i} z_{R,i}^2 g^2 + Σ_{i=1}^n λ_{R,i} z_{R,i} x_{R,i}^T β_R g ) I(g > 0).   (3)

From Equation (1), it follows that the distribution corresponding to Equation (3) is the MHN with parameters a = n, b = (1/2) Σ_{i=1}^n λ_{R,i} z_{R,i}^2, and c = Σ_{i=1}^n λ_{R,i} z_{R,i} x_{R,i}^T β_R. Therefore, an efficient sampling algorithm for the MHN can contribute to the performance of the above sandwich algorithm.
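As a concrete illustration, the MHN parameters of the sandwich draw in Equation (3) can be assembled directly from the current latent draws. The sketch below uses hypothetical names `lam`, `z`, and `xtb` for the vectors (λ_{R,1}, ..., λ_{R,n}), (z_{R,1}, ..., z_{R,n}), and the linear predictors (x_{R,1}^T β_R, ..., x_{R,n}^T β_R).

```python
def sandwich_mhn_params(lam, z, xtb):
    """Map latent draws (lambda_i, z_i) and linear predictors x_i' beta
    to the MHN(a, b, c) parameters of the sandwich draw in Equation (3)."""
    a = len(z)                                                   # a = n
    b = 0.5 * sum(l * zi * zi for l, zi in zip(lam, z))          # (1/2) sum lam_i z_i^2
    c = sum(l * zi * xb for l, zi, xb in zip(lam, z, xtb))       # sum lam_i z_i x_i' beta
    return a, b, c
```

The resulting triple (a, b, c) is then passed to any of the MHN samplers of Section 4.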

Inference with the projected normal distribution
The Projected Normal distribution is a popular choice for modeling directional observations y_{PN,i} ∈ S^{p-1} for i = 1, ..., n, where S^{p-1} = {y ∈ R^p : y^T y = 1}. As a modeling technique, it is assumed that there are latent variables x_{PN,i} ∈ R^p such that x_{PN,i} = r_{PN,i} y_{PN,i}, where r_{PN,i} > 0 represents the norm of the vector x_{PN,i}. y_{PN,i} follows a Projected Normal distribution if the corresponding latent variable x_{PN,i} follows a Normal distribution with mean μ_PN ∈ R^p and variance Σ_PN, a p × p positive definite matrix. For the identifiability of the model parameters, it is customary to fix the last diagonal element of Σ_PN at unity. In the context of MCMC-based posterior inference, as discussed in Hernandez-Stumpfhauser, Breidt, and van der Woerd (2017) and Chakraborty and Khare (2017), the probability density of the full conditional posterior of r_{PN,i} is

p_{PN,i}(r) ∝ r^{p-1} exp( -(1/2) y_{PN,i}^T Σ_PN^{-1} y_{PN,i} r^2 + y_{PN,i}^T Σ_PN^{-1} μ_PN r ) I(r > 0).   (4)

From the definition of the MHN distribution, it follows that the distribution in (4) corresponds to the MHN with parameters a = p, b = (1/2) y_{PN,i}^T Σ_PN^{-1} y_{PN,i}, and c = y_{PN,i}^T Σ_PN^{-1} μ_PN. We refer to Hernandez-Stumpfhauser, Breidt, and van der Woerd (2017) for more details on the model, parameter specifications, and the posterior sampling distributions.

Data augmentation algorithm for the vonMises-Fisher distribution
Let y_{VF,1}, ..., y_{VF,n} ∈ S^{p-1} be the observed directional data in the p-dimensional Euclidean space. The vonMises-Fisher distribution, with probability density function (with respect to the appropriate uniform measure on S^{p-1}) proportional to exp(κ_VF μ_VF^T y), is one of the most popular distributions for modeling such data. Its normalizing constant involves κ_VF^ν / I_ν(κ_VF), where ν = p/2 - 1 and I_ν(·) denotes the modified Bessel function of the first kind. The parameter μ_VF represents the modal direction, while κ_VF controls the concentration of probability around the mode of the distribution. A recently developed data augmentation algorithm for a fully Bayesian analysis of such data augments i.i.d. random variables T_{VF,1}, ..., T_{VF,n}, distributed according to a probability density function involving the Bessel function of the first kind, to deal with the intractable normalizing constant. Here J_{ν+1}(·) denotes the Bessel function of the first kind, while {j_{ν,k}}_{k≥1} represents the positive zeros of J_ν(·). If an independent Gamma(a_VF, b_VF) distribution is used as a prior for the parameter κ_VF, then the conditional posterior distribution of κ_VF has the density

π(κ | ·) ∝ κ^{a_VF - 1} exp( -n T̄_VF κ^2 + (n μ_VF^T Ȳ_VF - b_VF) κ ) I(κ > 0),   (5)

where T̄_VF = Σ_{i=1}^n T_{VF,i}/n and Ȳ_VF = Σ_{i=1}^n y_{VF,i}/n. The full conditional distribution of κ_VF given the other variables therefore follows an MHN distribution with parameters a = a_VF, b = n T̄_VF, and c = n μ_VF^T Ȳ_VF - b_VF (see Equations (5) and (1)). For the efficiency of the above data augmentation algorithm, it is imperative to have an efficient sampling scheme for the MHN distribution.

Application in robust Bayesian graphical model
Consider the robust Bayesian graphical model outlined in Finegold et al. (2014). The objective of this model is to learn the covariance structure Ψ_G of a set of multivariate observations, assuming that the corresponding concentration matrix is likely to be a sparse positive definite matrix. To describe the model, let y_{G,1}, y_{G,2}, ..., y_{G,n} ∈ R^p denote a random sample from the alternative-t distribution (Finegold et al. 2014). To facilitate parameter estimation in a tractable manner, Finegold et al. (2014) introduce a set of latent vectors s_{G,1}, ..., s_{G,n} ∈ R_+^p, where s_{G,i} = (s_{G,i,1}, ..., s_{G,i,p})^T for i = 1, ..., n. The model can then be represented hierarchically through the conditional distribution of y_{G,i} given s_{G,i}, for i = 1, ..., n and j = 1, ..., p, where D_{G,s,i} denotes the diagonal matrix with diagonal elements s_{G,i}. For the Bayesian formulation of the model, the prior distribution for the parameter Ψ_G is assumed to be a Hyper Inverse Wishart distribution (Roverato 2002; Finegold et al. 2014), whereas μ_G is considered to be Normally distributed. A Gibbs sampling algorithm is employed for posterior inference, where the sampling density of the parameter s_{G,i,j} is proportional to the function s ↦ s^{a_G - 1} e^{-b_G s^2 + c_G s} for appropriate values of a_G, b_G, and c_G (see Finegold et al. (2014) for details). Hence, the posterior distributions of the variables s_{G,i,j} are MHN distributions. We point out that the MHN distribution has a connection with the extended gamma distribution used in Finegold et al. (2014). However, we explore a number of properties of MHN(a, b, c) that are unique to this manuscript, and there is significant novelty in the design of the sampling schemes. The proof of the uniform efficiency of the developed algorithms is highly nontrivial and a significant contribution of this manuscript as well.
In this section, we have discussed a few examples pertaining to covariance estimation, the analysis of directional data, and Bayesian binary regression. These examples show the diversity of applications where the MHN distribution can be utilized. As mentioned earlier, we believe that the spontaneous occurrence of the MHN is due to the similarity of its probability density with that of the Normal distribution, and that the distribution will be useful on many other occasions in the future.

Properties of the Modified-Half-Normal distribution
In this section, we survey a few distributional properties of the Modified-Half-Normal distribution. We begin with the observation that MHN(a, b, c) forms an exponential family of distributions. Therefore, the properties of exponential families are automatically applicable to the MHN(a, b, c) distribution. Beyond those, we include a few specific characteristics of the distribution in the next few lemmas. By definition, the density function of the MHN(a, b, c) distribution is proportional to the function x ↦ x^{a-1} exp(-b x^2 + c x) I(x > 0), and we identify the corresponding normalizing constant and a form of its cumulative distribution function in the result below.
Lemma 1. Let f_MHN(x | a, b, c) and F_MHN(· | a, b, c) denote the probability density function and the cumulative distribution function of the MHN(a, b, c) distribution for a > 0, b > 0, c ∈ R.
a. The corresponding normalizing constant is given by

∫_0^∞ x^{a-1} exp(-b x^2 + c x) dx = Ψ(a/2, c/√b) / (2 b^{a/2}),

so that f_MHN(x | a, b, c) = 2 b^{a/2} [Ψ(a/2, c/√b)]^{-1} x^{a-1} exp(-b x^2 + c x) I(x > 0).
b. The cumulative distribution function can be expressed as

F_MHN(x | a, b, c) = [Σ_{k=0}^∞ (c/√b)^k γ((a+k)/2, b x^2) / k!] / Ψ(a/2, c/√b),

where γ(s, y) = ∫_0^y t^{s-1} e^{-t} dt denotes the lower incomplete gamma function and Ψ(a/2, c/√b) denotes the Fox-Wright function of the appropriate order (see Supplementary Section 1).

Figure 1 displays a few densities of the MHN(a, b, c) distribution for different choices of a ≥ 1. The density becomes more similar in appearance to a Normal probability density when the parameter c is large.
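The series definition of the Fox-Wright factor, Ψ(a/2, z) = Σ_{k≥0} Γ((a+k)/2) z^k / k!, makes Lemma 1(a) straightforward to check numerically. The sketch below (plain Python, with helper names of our own choosing) truncates the series and compares the resulting normalizing constant against a brute-force Riemann sum of the density kernel.

```python
import math

def fox_wright_psi(a, z, terms=200):
    """Psi(a/2, z) = sum_{k>=0} Gamma((a+k)/2) z^k / k!  (truncated series)."""
    total, zk_over_kfact = 0.0, 1.0
    for k in range(terms):
        total += math.gamma((a + k) / 2.0) * zk_over_kfact
        zk_over_kfact *= z / (k + 1)
    return total

def mhn_normalizer(a, b, c):
    """Integral of x^{a-1} exp(-b x^2 + c x) over (0, inf), via Lemma 1(a)."""
    return fox_wright_psi(a, c / math.sqrt(b)) / (2.0 * b ** (a / 2.0))

# Brute-force check of Lemma 1(a) on one parameter choice.
a, b, c = 3.0, 1.0, 0.5
h, riemann = 1e-4, 0.0
for i in range(1, 200000):           # Riemann sum up to x = 20; tail negligible
    x = i * h
    riemann += x ** (a - 1) * math.exp(-b * x * x + c * x) * h
```

For c = 0 the series collapses to its first term, recovering the familiar Gamma-integral value Γ(a/2)/(2 b^{a/2}).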
In Figure 1a, we used a fixed positive b; similar plots are obtained for other values of b > 0 and a > 1. For the case a < 1, c > 0, the distribution appears not to be unimodal. Before studying the modal characteristics of the distribution in Lemma 3, we include the following properties related to its moments.
Lemma 2. Let X ~ MHN(a, b, c). Then, for k ≥ 0:
a. Assuming a + k to be a positive real number, the kth raw moment of the distribution is E(X^k) = b^{-k/2} Ψ((a+k)/2, c/√b) / Ψ(a/2, c/√b), and the moment generating function of the distribution is given by M_X(t) = Ψ(a/2, (c+t)/√b) / Ψ(a/2, c/√b).
b. The moments satisfy the recursion 2b E(X^{k+2}) = (a + k) E(X^k) + c E(X^{k+1}).
c. In particular, 2b E(X^2) = a + c E(X).

It is evident from Lemma 2 that all positive moments of the distribution are finite and can be represented as ratios of Fox-Wright functions. Regarding their computation, we refer to the procedures developed in Supplementary Section 1. The moments have an uncomplicated representation (see part (b) of Lemma 9 in the Supplementary) when the parameter a is a positive integer. Part (b) of Lemma 2 provides a nontrivial recursive relation between the moments of the distribution. This property plays a nontrivial role in estimating the parameters of the MHN distribution, as we discuss in Section 5. The variance of the distribution can be represented in terms of Fox-Wright functions using part (a), whereas an implication of part (c) is that the variance of the distribution is bounded above by a/(2b) if c < 0. However, Hernandez-Stumpfhauser (2012) provides a stronger bound for the variance when a = 2. Taking insight from that result, we show in part (c) of Lemma 4 that the variance of the distribution is bounded above by 1/(2b) for all c ∈ R when a ≥ 1, b > 0. Though it is an attribute inherited from the exponential family of distributions, we would like to point out a resemblance between the form of its moments and that of the Generalized Inverse Gaussian (GIG) distribution (Kotz, Balakrishnan, and Johnson 1994).
The moments of the GIG distribution can be represented as ratios of modified Bessel functions of the second kind (Abramowitz and Stegun 1972), which also appear in the normalizing constant of the GIG distribution. Likewise, for the MHN(a, b, c) distribution, the moments are ratios of the Fox-Wright functions that constitute its normalizing constant. Overall, Lemma 2 characterizes a few moment-based attributes, while we study the shape of the density in Lemma 3; in particular, the density function is gradually decreasing on R_+ and the mode of the distribution does not exist if either c > 0 and 0 < a < 1 - c^2/(8b), or c < 0 and a ≤ 1. The derivations of this result are provided in the Supplementary Section 2.7. From part (c), as well as Figure 1b, it can be seen that the density function is not unimodal when a < 1 and c > 0. In terms of the shape of the density, this case lacks parity with the other cases of the parameters a, c. Although we keep this case in the family for the sake of completeness, it does not occur in the applications included in Section 2. The difference between the upper and lower bounds on E(X) provided in part (a) of Lemma 4 approaches zero as a grows larger; therefore, part (a) of the lemma also provides a high-precision approximation of E(X) when a is large. Regarding part (b), the lower bound does not follow from the lower bound in part (a), and the condition a ≥ 4 is a sufficient condition for its validity. On the contrary, the upper bounds in parts (a) and (b) hold for all a > 0. An implication of the fact E(X) ≥ X_mode is that the distribution is positively skewed. Both parts of the lemma play a crucial role in appraising the efficiency of the accept-reject algorithms that we develop in Section 4.
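The recursion in Lemma 2(b) is easy to verify numerically from the series form of the moments. The sketch below (plain Python; helper names are ours) computes E(X^k) as a ratio of truncated Fox-Wright series and checks 2b E(X^{k+2}) = (a+k) E(X^k) + c E(X^{k+1}).

```python
import math

def psi(alpha2, z, terms=200):
    """Truncated series for Psi(alpha2, z) = sum_{k>=0} Gamma(alpha2 + k/2) z^k / k!."""
    total, w = 0.0, 1.0
    for k in range(terms):
        total += math.gamma(alpha2 + k / 2.0) * w
        w *= z / (k + 1)
    return total

def mhn_moment(k, a, b, c):
    """E[X^k] for X ~ MHN(a, b, c), via the ratio of Fox-Wright functions (Lemma 2(a))."""
    z = c / math.sqrt(b)
    return b ** (-k / 2.0) * psi((a + k) / 2.0, z) / psi(a / 2.0, z)

# Check the three-term recursion of Lemma 2(b) for one parameter choice.
a, b, c, k = 2.5, 1.3, 0.7, 1
lhs = 2 * b * mhn_moment(k + 2, a, b, c)
rhs = (a + k) * mhn_moment(k, a, b, c) + c * mhn_moment(k + 1, a, b, c)
```

The recursion itself follows by integrating the derivative of x^{a+k} e^{-b x^2 + c x} over (0, ∞), where the boundary terms vanish.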
The conditional distribution of the random variable X given U is the square root of a Gamma random variable with shape parameter a/2 and a rate parameter determined by b, c, and U. The above result can be utilized to design hierarchical models that introduce additional variables U, V to bypass the need for sampling from the MHN(a, b, c) distribution directly. However, the strategy of introducing additional latent variables may slow the mixing of the corresponding Markov chains. We conclude this section by identifying a few known families of densities as special cases of the MHN(a, b, c) distribution.
Lemma 6. Let X ~ MHN(a, b, c).
a. If c = 0, then X follows the square root of a Gamma distribution, i.e., X^2 ~ Gamma(a/2, b).
b. If a = 1, then X ~ Truncated-Normal(c/(2b), 1/√(2b), 0, ∞) on the support (0, ∞).
c. If a = 1 and c = 0, then X ~ Half-Normal(1/√(2b)).
Furthermore, if we allow the parameter b to take the value zero, then the case c < 0 corresponds to the Gamma distribution with shape parameter a and rate parameter -c.
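A quick Monte Carlo sanity check of Lemma 6(a), assuming nothing beyond the standard library: when c = 0, X can be drawn as the square root of a Gamma(a/2, rate b) variate, and E(X^2) = a/(2b).

```python
import math
import random

def sample_mhn_c0(a, b, rng):
    """Lemma 6(a): for c = 0, X ~ MHN(a, b, 0) is the square root of a
    Gamma(a/2, rate=b) random variable."""
    g = rng.gammavariate(a / 2.0, 1.0 / b)   # gammavariate takes (shape, scale)
    return math.sqrt(g)

rng = random.Random(7)
a, b = 3.0, 2.0
n = 100000
mean_sq = sum(sample_mhn_c0(a, b, rng) ** 2 for _ in range(n)) / n
# E(X^2) = E[Gamma(a/2, rate=b)] = a/(2b)
```

The same transformation underlies the square-root-of-Gamma proposals used in Section 4.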

Algorithms for generating random samples from the Modified-Half-Normal distribution
We devote this section to developing algorithms to sample from the MHN distribution. The ability to sample efficiently from the distribution is crucial for the success of the Bayesian statistical models that bring about the MHN(a, b, c) distribution as one of the posterior conditionals. In terms of technique, we employ the rejection sampling strategy (Ahrens and Dieter 1982; Casella 1985; Devroye 2006) for the purpose. The technique is widely used and the associated notation is general; therefore, we include a brief overview of the generic procedure along with the specific terminology used in the subsequent sections. The rejection sampling algorithm is a technique for generating a random sample from a distribution that may not be amenable to a more direct approach. It is also commonly referred to as the "Accept-Reject Algorithm" (Wells, Casella, and Robert 2004). Let us assume that we need to sample from a distribution with probability density function (or probability mass function) f(x). It is then required to find a density g(x) such that f(x) ≤ M g(x) for all x in the support of the target density f(x), where M is a suitable positive constant, and such that it is easy to sample from the distribution with probability density function g(x) (Wells, Casella, and Robert 2004). Often, this notion is implemented by finding a density kernel (or probability mass kernel) g_kernel(x) such that f(x) ≤ g_kernel(x) for all x and ∫ g_kernel(x) μ(dx) < ∞; we then obtain a proposal density g(x) proportional to g_kernel(x). In the context of the current manuscript, μ(·) denotes the Lebesgue measure or the appropriate counting measure depending on the specific context. Later in this article, we refer to the function g_kernel(x) as the proposal kernel and to the associated density function as the proposal density.
Once an appropriate proposal density g(x) is found, the steps required to sample from the distribution corresponding to f(x) are as follows: sample X ~ g(x) and U ~ Uniform(0, 1); if U ≤ f(X)/(M g(X)), then accept X as a valid random sample from f(x); otherwise, reject it and repeat the procedure from the beginning until a valid sample is obtained. The efficiency of the rejection sampling algorithm is measured by the acceptance probability, which transpires to be the reciprocal of the constant M. The acceptance probability can also be expressed as (∫ g_kernel(x) μ(dx))^{-1}. A major component of designing an efficient accept-reject algorithm is to find an appropriate g_kernel(x) so that the corresponding acceptance probability remains large for all possible choices of the parameters of f(x).
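The steps above can be sketched generically. The function below is a minimal sketch (with helper names of our own choosing) that takes an unnormalized target kernel, a proposal sampler, and a dominating proposal kernel; the toy usage samples the Half-Normal kernel exp(-x^2) I(x > 0) under an Exponential(1) proposal, using the bound exp(-x^2) ≤ e^{1/4} exp(-x), which holds because (x - 1/2)^2 ≥ 0.

```python
import math
import random

def rejection_sample(target_kernel, propose, proposal_kernel, rng):
    """Generic accept-reject step: requires target_kernel(x) <= proposal_kernel(x)
    for all x in the support.  Repeats until a draw is accepted."""
    while True:
        x = propose(rng)
        if rng.random() <= target_kernel(x) / proposal_kernel(x):
            return x

# Toy usage: Half-Normal kernel exp(-x^2) on x > 0, Exponential(1) proposal.
rng = random.Random(1)
target = lambda x: math.exp(-x * x)
dominating = lambda x: math.exp(0.25) * math.exp(-x)   # e^{1/4} e^{-x} >= e^{-x^2}
draw_expo = lambda r: r.expovariate(1.0)
xs = [rejection_sample(target, draw_expo, dominating, rng) for _ in range(50000)]
```

Here the accepted draws follow |N(0, 1/2)|, whose mean is 1/√π, which the sample mean can be checked against.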
In the specific case of the MHN(a, b, c) distribution, we use different strategies for constructing the proposal kernels depending on the sign of the parameter c. The particularities associated with each algorithm are elaborated in the remaining parts of this section.

Sampling from the MHN(a, b, c) when c > 0
In this section, we develop algorithms for generating random samples from the MHN(a, b, c) distribution when c > 0. From Lemma 3, we know that the MHN(a, b, c) distribution is unimodal if a ≥ 1 and c > 0, while its density appears to be idiosyncratic when a < 1 and c > 0. We first focus on the former case and develop the accept-reject algorithm in Section 4.1.1, whereas in Section 4.1.2 we construct an alternative procedure that is applicable for a > 0, c > 0 and works effectively for the case c > 0 and 0 < a < 1 in particular.
4.1.1. Sampling from the MHN(a, b, c) when c > 0, a > 1

In this part of the manuscript, we discuss the accept-reject algorithms to generate random samples from the MHN(a, b, c) distribution when a > 1, c > 0. We utilize either the square root of a Gamma distribution or a Normal distribution as the proposal distribution. The following result provides the foundation for the development of the algorithm.
Theorem 1. Let f_MHN(x | a, b, c) be the probability density function of the MHN(a, b, c) distribution, and let f_Normal and f_√Gam denote the probability density functions of a suitably specified Normal distribution and square root of a Gamma distribution, respectively. If c ≥ 0, then for any constants μ ∈ (c/(2b), ∞) and δ ∈ (0, b), the density f_MHN is dominated by K_1(μ, a, b, c) f_Normal and by K_2(δ, a, b, c) f_√Gam, and the indicator I = I(K_1(μ, a, b, c) ≤ K_2(δ, a, b, c)) identifies the better of the two proposals. The optimal choices μ_opt and δ_opt of the constants are available in closed form, and the resulting constants K_1(μ_opt, a, b, c) and K_2(δ_opt, a, b, c) depend only on a and D := c/√b; we therefore write K_1(a, D) := K_1(μ_opt, a, b, c) and K_2(a, D) := K_2(δ_opt, a, b, c).

Part (a) of the theorem provides the explicit form of the proposal kernels, where the variable I identifies the more efficient strategy between the Normal and the square-root-of-Gamma proposals. Although the constants K_1, K_2 involve the Fox-Wright function in their expressions, the evaluation of the variable I does not require its computation. One implication of part (c) is that a scale change of the random variable does not alter the efficiency of the algorithm; therefore, without loss of generality, we may take b = 1 when assessing the performance of the algorithm. The details of the sampling method are included in Algorithm 1.
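The square-root-of-Gamma dominating kernel of Theorem 1 can be obtained from the elementary bound cx ≤ δx^2 + c^2/(4δ) for δ ∈ (0, b), which gives x^{a-1} e^{-bx^2+cx} ≤ e^{c^2/(4δ)} x^{a-1} e^{-(b-δ)x^2}. The sketch below implements a rejection sampler from this bound alone (our own Python; δ = b/2 is a heuristic choice rather than the optimized δ_opt of the paper, so it is valid but not the uniformly efficient version of Algorithm 1, which additionally switches to a Normal proposal when that is better).

```python
import math
import random

def sample_mhn_sqrtgamma(a, b, c, rng, delta=None):
    """Rejection sampler for the MHN(a, b, c) kernel x^{a-1} exp(-b x^2 + c x),
    using c x <= delta x^2 + c^2/(4 delta) for delta in (0, b):
        x^{a-1} e^{-b x^2 + c x} <= e^{c^2/(4 delta)} x^{a-1} e^{-(b - delta) x^2},
    a square-root-of-Gamma(a/2, rate b - delta) dominating kernel.
    delta = b/2 is a heuristic choice, not the paper's optimal one."""
    if delta is None:
        delta = b / 2.0
    rate = b - delta
    while True:
        x = math.sqrt(rng.gammavariate(a / 2.0, 1.0 / rate))
        # log acceptance ratio: -delta x^2 + c x - c^2/(4 delta) <= 0
        if math.log(rng.random()) <= -delta * x * x + c * x - c * c / (4 * delta):
            return x

rng = random.Random(42)
a, b, c = 3.0, 1.0, 1.0
xs = [sample_mhn_sqrtgamma(a, b, c, rng) for _ in range(50000)]
```

By Lemma 2(c), the draws should satisfy 2b E(X^2) ≈ a + c E(X), which provides a convenient check.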
Theorem 2. The acceptance probability of Algorithm 1 for generating a random sample from MHN(a, b, c) with a ≥ 1, b > 0, c > 0 satisfies the following, with D = c/√b:
a. A_pos(a, D) = max{1/K_1(a, D), 1/K_2(a, D)}.
b. For a fixed a > 1, the function D ↦ K_1(a, D) is decreasing, while D ↦ K_2(a, D) is increasing.
c. K_1(a, √a) ≥ K_2(a, √a) for a ≥ 1.
d. a ↦ K_1(a, √a) is a decreasing function for a ≥ 4.
e. A_pos(a, c/√b) ≥ 0.8 for all a ≥ 4, b > 0, and c > 0.

The proof of the theorem is involved and necessitates part (b) of Lemma 4 as a key component. A major impediment is the unavailability of an analytic expression for the point D_crit such that K_1(a, D_crit) = K_2(a, D_crit) for a given a. We contrive a unique approach in which the following attribute plays a crucial role: for a fixed a ≥ 1, there is a unique D_crit ≥ √a such that K_1(a, D_crit) = K_2(a, D_crit), and D_crit/√a → 1 as a → ∞. In consonance with Theorem 2, Figure 2 also displays this specific attribute of D_crit. Note that, in Figure 2, we plot A(a, c/√b), the acceptance probability of Algorithm 1, versus C = c/√(ba). The curves with different colors correspond to different values of a ≥ 1. As C increases, the acceptance rate of the algorithm using the Normal proposal kernel increases, while it decreases for the other case. The algorithm uses the square-root-of-Gamma proposal when the value of C is close to zero; for larger magnitudes, the Normal proposal kernel is selected by the indicator function. Therefore, in Figure 2, we see that each curve representing the function A(a, c/√b) decreases, reaches an optimum, and then increases once the Normal kernel outperforms the square-root-of-Gamma kernel. Part (e) of Theorem 2 ensures that the acceptance probability of Algorithm 1 is always greater than 0.8 for a ≥ 4, as can also be observed in Figure 2.

4.1.2. A general sampling algorithm for the MHN(a, b, c) distribution when a > 0 and c > 0

We develop an alternative procedure for generating random samples from the MHN(a, b, c) distribution when c > 0 and a > 0. The algorithm is applicable, in particular, to the case 0 < a < 1 and c > 0. To discuss the different components of the algorithm, let us start with the following lemma.
Lemma 7. For c > 0, the MHN(a, b, c) density admits the discrete mixture representation f_MHN(x | a, b, c) = Σ_{i=0}^∞ p_i f_√Gam(x | (a+i)/2, b), where p_i ∝ Γ((a+i)/2) (c/√b)^i / i! and f_√Gam(· | α, b) denotes the probability density function of the square root of a Gamma(α, b) random variable.

A straightforward approach to designing a sampling scheme for MHN(a, b, c) using Lemma 7 is to construct a discrete random variable W supported on the nonnegative integers with probability mass function P(W = i) = p_i for i = 0, 1, ..., where the p_i are as described in Lemma 7. It follows from the definition of the Fox-Wright function that Σ_{i=0}^∞ p_i = 1. In the Supplementary Section 1, we have discussed methods to compute the Fox-Wright function. However, computation of the Fox-Wright function can be computationally demanding. Therefore, we design the following procedure that does not require computation of the Fox-Wright function.
We can now construct a discrete probability distribution with support {0, 1, ...} whose unnormalized probabilities {q̃_i}_{i≥0}, defined via the bounds in parts (a) and (b) of Lemma 8, satisfy q_i ≤ q̃_i for all i ≥ 0, where q_i denotes the unnormalized version of p_i. In order to generate a sample from the discrete distribution with probabilities {p_i}_{i≥0}, a rejection sampling algorithm can be developed by constructing the discrete proposal distribution supported on the nonnegative integers with probabilities {p̃_i}_{i≥0} proportional to {q̃_i}_{i≥0}, and accepting a draw i from the proposal as a valid sample with probability q_i/q̃_i. Based on the above discussion, we now assume that we can sample from the discrete probability distribution with support {0, 1, 2, ...} and probabilities {p_i}_{i≥0} defined in Lemma 7. The following algorithm provides the details of the steps required to sample from the MHN(a, b, c), a > 0, c > 0 distribution.

Algorithm 2. Sampling strategy for c > 0.
Input: a > 0, b > 0, c > 0. Output: Y, a sample from MHN(a, b, c).
1: Generate W ~ Discrete({0, 1, 2, ...}, {p_i}_{i≥0}).
2: Generate Y* | W ~ Gamma(shape = (a + W)/2, rate = b) and set Y = √(Y*).

If either a or c^2/b is large, the computational time of the algorithm is significant. On the other hand, for small or moderate values of max{a, c^2/b}, the algorithm is efficient. Unfortunately, the technique of the above algorithm is inapplicable when c < 0, because the odd-order terms in the sequence {p_i}_{i≥0} (defined in Lemma 7) would then be negative, invalidating the discrete mixture representation provided in Lemma 7. An entirely different approach is considered to tackle the case c < 0, which we discuss next.
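Ignoring the refinement that avoids the Fox-Wright function, Lemma 7 translates directly into a sketch of Algorithm 2: compute the mixture weights p_i ∝ Γ((a+i)/2)(c/√b)^i / i! by truncating the series, draw W by inversion, then draw Y as the square root of a Gamma((a+W)/2, rate b) variate. (The truncation point and helper names below are our own.)

```python
import math
import random

def sample_mhn_mixture(a, b, c, rng, terms=200):
    """Sketch of Algorithm 2 for c > 0: MHN(a, b, c) as a discrete mixture of
    square-root-of-Gamma distributions (Lemma 7), with truncated weights."""
    z = c / math.sqrt(b)
    # Unnormalized weights q_i = Gamma((a+i)/2) z^i / i!
    w, q = 1.0, []
    for i in range(terms):
        q.append(math.gamma((a + i) / 2.0) * w)
        w *= z / (i + 1)
    total = sum(q)                        # truncated Fox-Wright series
    u, cum, W = rng.random() * total, 0.0, terms - 1
    for i, qi in enumerate(q):            # inversion sampling of W
        cum += qi
        if u <= cum:
            W = i
            break
    g = rng.gammavariate((a + W) / 2.0, 1.0 / b)   # gammavariate takes (shape, scale)
    return math.sqrt(g)

rng = random.Random(3)
a, b, c = 2.0, 1.5, 1.0
xs = [sample_mhn_mixture(a, b, c, rng) for _ in range(40000)]
```

As before, the draws can be checked against the moment recursion 2b E(X^2) = a + c E(X) of Lemma 2.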

4.2. Sampling from MHN(a, b, c) when c ≤ 0

If c ≤ 0, then the MHN(a, b, c) density is proportional to $x^{a-1} \exp(-bx^2 - |c|x)\, I(x > 0)$. Using the generalized version of the AM-GM inequality (Steele 2004) in the exponent of the target density, we can construct a proposal kernel via

$$x^{a-1} \exp(-bx^2 - |c|x) \le x^{a-1} \exp\!\left(-(b + |c|)\, x^{\frac{2b + |c|}{b + |c|}}\right) \quad (7)$$

(see Theorem 3 with the choice m = 1), which can be normalized to make a proper proposal density. The functional value of the above proposal kernel matches that of the target density (i.e., equality in Equation (7)) at the point x = 1, irrespective of the values of the parameters a, b, c. It follows from the characteristics of the AM-GM inequality (Steele 2004) that the proposal kernel approximates the target density better around the point x = 1 than elsewhere. In a quest to develop a more general class of proposal kernels, we introduce the notion of a "matching point," the point where the functional value of the proposal kernel becomes equal to that of the target density. For example, in Figure 3a, the matching point for the target and the proposal kernel is located at x = 1 irrespective of the choices of the parameters a, b, c. Furthermore, Figure 3a depicts the functional inequality specified in Equation (7) and indicates that the inequality may not be optimal when the matching point is always at x = 1.
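Since the factor $x^{a-1}$ is common to both sides, the inequality in Equation (7) is equivalent to $bx^2 + |c|x \ge (b + |c|)\,x^{(2b+|c|)/(b+|c|)}$, with equality at x = 1. A quick numerical check of this exponent gap (illustrative only; the function name is ours):

```python
import math

def eq7_gap(x, b, g):
    """Exponent gap of Equation (7): (b x^2 + |c| x) minus the AM-GM bound
    (b + |c|) x^((2b+|c|)/(b+|c|)), with g = |c|.  Nonnegative for x > 0,
    and exactly zero at the matching point x = 1."""
    p = (2.0 * b + g) / (b + g)
    return b * x * x + g * x - (b + g) * x ** p
```

The weighted AM-GM inequality with weights $b/(b+|c|)$ and $|c|/(b+|c|)$ applied to $x^2$ and $x$ gives the bound directly, which is why the gap vanishes exactly where the two weighted terms coincide.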
Therefore, we design a class of proposal kernels in which flexibility is introduced by allowing the matching point to vary while the crucial functional inequality between the target density and the proposal kernel remains unaffected. The following theorem provides a way to construct such a class of proposal kernels. The proof of Theorem 3 relies on the generalized version of the AM-GM inequality. The parameter m, which appears in the proof as an abstract algebraic device, actually has a tangible interpretation: it is the matching point discussed earlier.

Figure 3. In subfigure (a), the target density and the proposal kernel (Equation (7)) are plotted for four different cases of parameter values; the marked point is the "matching point," which is at x = 1 irrespective of the values of a > 0, b > 0, c < 0. In subfigure (b), the MHN(5, 1, −1) density and the proposal kernel in Theorem 3 are plotted for four different choices of m, the "matching point." The difference between the areas under the target density and the proposal kernel gradually decreases, reaches an optimum, and then increases as m increases from zero; the optimal value of m appears to be larger than the mode of the target density.

The proposal kernel in Theorem 3 can be normalized to a probability density that corresponds to the $\left(\frac{bm + |c|}{2bm + |c|}\right)$th power of a Gamma distribution. In particular, if we consider
$T \sim \mathrm{Gamma}\!\left(\frac{a(bm + |c|)}{2bm + |c|},\; m(bm + |c|)\right)$ and set $Y = m\, T^{\frac{bm + |c|}{2bm + |c|}}$, then the probability density function of the random variable Y is

$$f_Y(y) \propto y^{a-1} \exp\!\left(-m(bm + |c|) \left(\frac{y}{m}\right)^{\frac{2bm + |c|}{bm + |c|}}\right),$$

which is identical (ignoring the constants) to the proposal kernel in Theorem 3. As a consequence, a random sample from the corresponding proposal density is simply an appropriately transformed random number generated from a suitable Gamma random variable. The specific details of the corresponding accept-reject algorithm are provided in Algorithm 3. We would like to point out that the standard implementation of the 'rgamma' function in "R" for generating Gamma random variables may not execute with the desired accuracy when the shape parameter is small (Liu, Martin, and Syring 2017). We use a procedure discussed in Liu, Martin, and Syring (2017) to tackle that specific challenge.
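A hedged sketch of the resulting accept-reject step (our code, not the authors'; it uses Python's `random.gammavariate` rather than the corrected small-shape Gamma generator the paper recommends). The proposal draw is transformed from a Gamma variate as described above, and the draw is accepted with probability equal to the ratio of the target kernel to the proposal kernel at the proposed point:

```python
import math, random

def mhn_sample_neg_c(a, b, c, m, rng):
    """Accept-reject sampler for MHN(a, b, c) with c < 0.  The proposal is
    a scaled power of a Gamma variate whose kernel matches the target at
    x = m; the AM-GM inequality guarantees the log-ratio is <= 0."""
    assert a > 0 and b > 0 and c < 0 and m > 0
    g = abs(c)
    p = (2.0 * b * m + g) / (b * m + g)   # power in the proposal kernel
    shape = a / p                          # = a(bm+|c|)/(2bm+|c|)
    rate = m * (b * m + g)
    while True:
        t = rng.gammavariate(shape, 1.0 / rate)  # shape, scale = 1/rate
        y = m * t ** (1.0 / p)
        # log acceptance probability: target kernel over proposal kernel
        log_acc = -b * y * y - g * y + rate * (y / m) ** p
        if math.log(1.0 - rng.random()) <= log_acc:
            return y
```

The `x**(a-1)` factor cancels in the acceptance ratio, so only the exponent gap matters; at y = m the log-ratio is exactly zero, reflecting the matching-point construction.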
Algorithm 3 is valid for all positive values of the parameter m, while a judicious choice of m can yield significant gains in efficiency. To illustrate the role of the parameter m in Theorem 3, we refer to Figure 3b, where the MHN(5, 1, −1) density is considered as an example. We took four different choices of the matching point, namely 0.8, 1.4, 2.15, and the mode of the distribution, and plotted the corresponding kernels obtained using Theorem 3.
It is seen that the difference between the areas under the MHN(5, 1, −1) density and the proposal kernels changes as we vary the matching point m. Specifically, the difference of the areas gradually decreases, reaches an optimum, and then increases as the magnitude of m grows from zero. Thus, by selecting the optimal value of m, we can gain in efficiency. From an intuitive standpoint, matching the proposal kernel at the mode of the target density seems a reasonable strategy. In actuality, however, the optimal value of m turns out to be larger than the mode, seemingly because the target density, as well as the proposal kernel, is right skewed. We reach a consistent conclusion from Figure 3b as well. Consolidating the heuristics discussed so far in this paragraph, we state the following lemma, which establishes the uniqueness of the optimal matching point.

b. For any a > 1, b > 0, and c < 0, the function $m \mapsto A_{\mathrm{neg}}(m, a, b, c)$ has a unique maximum at a point $m_{\mathrm{opt}}$, where $m_{\mathrm{opt}} > X_{\mathrm{mode}}$, the mode of the distribution.
c. For all a > 1, b > 0, and c ≤ 0, the acceptance probability is bounded below uniformly in the parameters.

Part (c) of the lemma implies the uniform efficiency of Algorithm 3. The proof of part (c) is highly nontrivial and requires several properties of the Gamma and digamma functions, including Ramanujan's double inequality for the Gamma function (Alzer 1997, 2003; Batir 2005, 2008). The acceptance probability $A_{\mathrm{neg}}(m, a, b, c)$ becomes unity when the parameter c is set to zero, irrespective of the choice of m > 0. Part (b) of the lemma ensures the existence of the optimal matching point, whereas the optimization of $m \mapsto A_{\mathrm{neg}}(m, a, b, c)$ to find $m_{\mathrm{opt}}$ is nontrivial. As seen in part (a) of Lemma 4, the expression of $A_{\mathrm{neg}}(m_{\mathrm{opt}}, a, b, c)$ does not involve the Fox-Wright function; therefore, the optimization does not require evaluating the Fox-Wright function.
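The behavior described in part (b) can be explored numerically: the acceptance probability of an accept-reject scheme equals the ratio of the area under the target kernel to the area under the dominating proposal kernel. The sketch below (our exploratory code, with our own function name and quadrature choices) computes that ratio by trapezoidal quadrature:

```python
import math

def a_neg(m, a, b, c, hi=10.0, n=4000):
    """Acceptance probability of the Theorem-3 accept-reject scheme,
    computed as (area under target kernel) / (area under proposal kernel).
    The common grid spacing h cancels in the ratio."""
    g = abs(c)
    p = (2.0 * b * m + g) / (b * m + g)
    num = den = 0.0
    for i in range(1, n + 1):   # x = 0 contributes nothing for a > 1
        x = i * (hi / n)
        w = 0.5 if i == n else 1.0
        num += w * x ** (a - 1.0) * math.exp(-b * x * x - g * x)
        den += w * x ** (a - 1.0) * math.exp(-m * (b * m + g) * (x / m) ** p)
    return num / den
```

Scanning m over a grid for MHN(5, 1, −1) exhibits the single interior maximum described in part (b), located to the right of the mode.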
In Supplementary Section 2.17, we provide an iterative procedure to find $m_{\mathrm{opt}}$. We anticipate that the strategy of running an iterative procedure to find $m_{\mathrm{opt}}$ and thereafter using it for sampling from the corresponding Modified-Half-Normal may be computationally demanding and time consuming. Therefore, we utilize the properties of the MHN density and the intuitive understanding of the position of $m_{\mathrm{opt}}$ to construct a non-iterative procedure yielding $m_{\mathrm{init}}$, a conducive value close to $m_{\mathrm{opt}}$. In the following paragraph, we discuss the formulation for computing $m_{\mathrm{init}}$. Let $X_{\mathrm{mode}}$ and $X_{\mathrm{inflex}}$ denote the mode and the rightmost inflection point of the Modified-Half-Normal distribution. We refer to Lemma 3 for the exact form of $X_{\mathrm{mode}}$, while $X_{\mathrm{inflex}}$ is obtained by finding the largest real root of a quartic polynomial (Jenkins and Traub 1972; Venables, Hornik, and Maechler 2019). Then we define the candidate $m_{\mathrm{init}}$, designed to lie between the mode and the rightmost inflection point of the distribution. Depending on the skewness of the distribution, the fraction k carefully manages its proximity to the point $X_{\mathrm{mode}}$. For the numerical evaluation of k, there is no need to compute the Fox-Wright functions, as its expression involves $f_{\mathrm{MHN}}(\cdot)$ in both the numerator and the denominator. Although the development of $m_{\mathrm{init}}$ is based on heuristics, it serves the needs of Algorithm 3.
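The two landmarks used to build $m_{\mathrm{init}}$ can be computed without the Fox-Wright function. For a > 1, $X_{\mathrm{mode}}$ solves $2bx^2 - cx - (a-1) = 0$, and a short calculation (setting the second derivative of the kernel $x^{a-1}e^{-bx^2+cx}$ to zero) shows the inflection points solve the quartic $\left((a-1) + cx - 2bx^2\right)^2 = (a-1) + 2bx^2$. A sketch follows (our code; a production version would use a quartic root finder as cited above, whereas this bisection assumes a single crossing beyond the mode):

```python
import math

def mhn_mode(a, b, c):
    """Mode of MHN(a, b, c) for a > 1: positive root of 2b x^2 - c x - (a-1) = 0."""
    return (c + math.sqrt(c * c + 8.0 * b * (a - 1.0))) / (4.0 * b)

def mhn_inflex(a, b, c):
    """Rightmost inflection point of MHN(a, b, c) for a > 1, found as a root
    of q(x) = ((a-1) + c x - 2b x^2)^2 - (a-1) - 2b x^2 beyond the mode."""
    def q(x):
        u = (a - 1.0) + c * x - 2.0 * b * x * x
        return u * u - (a - 1.0) - 2.0 * b * x * x
    lo = mhn_mode(a, b, c)    # q(lo) = -(a-1) - 2b*lo^2 < 0 at the mode
    hi = lo + 1.0
    while q(hi) < 0.0:        # expand until the quartic turns positive
        hi += 1.0
    for _ in range(80):       # bisection to machine precision
        mid = 0.5 * (lo + hi)
        if q(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At the mode the squared term in q vanishes, so q is strictly negative there; this guarantees a sign change to the right and gives the bisection its bracket.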
In Table 1, we include the values of $A_{\mathrm{neg}}(m_{\mathrm{init}}, a, b, c)$, the acceptance probabilities of the corresponding algorithms when the matching point is set to $m_{\mathrm{init}}$. Table 1 also displays $A_{\mathrm{neg}}(m_{\mathrm{opt}}, a, b, c)$, the maximal value of the acceptance probability, and the number of iterations required to obtain $m_{\mathrm{opt}}$ when the iterative procedure is started from $m_{\mathrm{init}}$. In Figure 4a, we plot the acceptance probabilities for different values of a and c when b is set to 1. The values of c are placed on the X axis, while the different colors of the curves pertain to different choices of a. It appears from the figure, as well as from Table 1, that the value of $A_{\mathrm{neg}}(m_{\mathrm{init}}, a, b, c)$ is larger than 0.9. In Figure 4c, we plot the acceptance probability values when $m_{\mathrm{opt}}$ is used in the accept-reject algorithm. Comparing Figure 4a and c, there is an increment in acceptance probability, but the improvement from employing the iterative procedure is not substantial: from Table 1, the improvement is less than 0.2% when comparing the acceptance probabilities corresponding to $m_{\mathrm{init}}$ and $m_{\mathrm{opt}}$. In the fifth column of Table 1, we list the number of steps required to reach $m_{\mathrm{opt}}$ when the iterative algorithm is started from $m_{\mathrm{init}}$; the number of iterative steps required for convergence is very small. Therefore, $m_{\mathrm{init}}$ appears to be in the vicinity of $m_{\mathrm{opt}}$, and $m_{\mathrm{init}}$ can effectively replace the need for an iterative algorithm. In particular, as a conservative practice, we suggest executing a single step of the iterative algorithm starting from $m_{\mathrm{init}}$ to obtain $m_{\mathrm{recommend}}$, the recommended value of m. Although the current context is completely different, this strategy is motivated by one-step estimators (Jurečková and Sen 1990).
It is seen from Table 1 (compare the seventh and eighth columns) that the acceptance probabilities of the corresponding accept-reject algorithms differ little even when the computationally inexpensive recommended value $m_{\mathrm{recommend}}$ is used instead of $m_{\mathrm{opt}}$ (compare Figure 4b and c).
In the rest of the section, we discuss the efficiency of Algorithm 3 when the optimal value of the parameter m is used. In Figure 4, we plot the acceptance probability against the quantity $C = c/\sqrt{a}$ on the X axis, to keep parity with Figure 2. The different colored curves pertain to different specifications of the parameter a ≥ 1. The sixth, seventh, and eighth columns of Table 1 show the acceptance probability when $m_{\mathrm{init}}$, $m_{\mathrm{recommend}}$, and $m_{\mathrm{opt}}$, respectively, are used as the matching point; the fifth column provides the number of iterative steps required to obtain $m_{\mathrm{opt}}$ when the algorithm is started from $m_{\mathrm{init}}$. The efficiency of the algorithm increases with the magnitude of a. It is clear from the plot that the worst possible acceptance probability is larger than 0.90.

Inference for the parameters when data are generated from MHN(a, b, c)
Although it is not the primary objective of this manuscript, we consider estimation of the parameters of the MHN(a, b, c) distribution. We see from Lemma 6 that the truncated Normal, square root of Gamma, half-Normal, and Gamma distributions are specific cases of the MHN(a, b, c) distribution. Therefore, it is a reasonably flexible family of distributions for modeling continuous data supported on the positive part of the real line. In particular, we discuss the method of moments and the maximum likelihood estimation procedures, both of which involve certain nontrivial aspects in the current context. Let $X_1, \ldots, X_n$ be independent observations from the MHN(a, b, c) distribution. To denote the kth sample moment of the data, we introduce the notation $m_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$ for k > 0. Following standard practice, the method of moments estimators of the parameters are obtained by equating the sample moments with the corresponding population moments expressed as functions of the unknown parameters. In the current context, the standard method of moments estimation reduces to solving for a, b, c from the equations $m_k = E(X^k)$ for k = 1, 2, 3 (see part (a) of Lemma 2). Finding solutions to this set of equations is complicated, as it involves the Fox-Wright functions. To mitigate the difficulty, we utilize the recursion relation in part (b) of Lemma 2 instead of the formula in part (a). In particular, we solve for a, b, c from the following set of equations: $m_{k+2} = \frac{a+k}{2b}\, m_k + \frac{c}{2b}\, m_{k+1}$ for k = 0, 1, 2, where $m_0 = 1$. The exact form of the solutions is provided in Supplementary Section 2.18. The estimators obtained by the above procedure are statistically consistent and asymptotically Normal; we provide the asymptotic variance in Supplementary Section 2.18.
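The recursion-based moment equations become linear after a reparameterization, which suggests why closed-form solutions exist (the exact forms are in the supplement; the solver below is our hedged reconstruction, not the authors' code). Writing $u = a/(2b)$, $v = c/(2b)$, and $w = 1/(2b)$, the three equations for k = 0, 1, 2 form a 3×3 linear system in (u, v, w):

```python
def mhn_mom(m1, m2, m3, m4):
    """Method-of-moments estimates of (a, b, c) from the first four sample
    moments, via m_{k+2} = ((a+k)/(2b)) m_k + (c/(2b)) m_{k+1}, k = 0,1,2.
    With u = a/(2b), v = c/(2b), w = 1/(2b) the system is linear:
        u        + v m1           = m2
        u m1     + v m2 + w m1    = m3
        u m2     + v m3 + w 2 m2  = m4
    """
    A = [[1.0, m1, 0.0],
         [m1, m2, m1],
         [m2, m3, 2.0 * m2]]
    y = [m2, m3, m4]

    def det(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
                - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
                + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

    D = det(A)

    def rep(j):  # Cramer's rule: replace column j with the right-hand side
        M = [row[:] for row in A]
        for i in range(3):
            M[i][j] = y[i]
        return det(M)

    u, v, w = rep(0) / D, rep(1) / D, rep(2) / D
    return u / w, 1.0 / (2.0 * w), v / w   # a = u/w, b = 1/(2w), c = v/w
```

A convenient check uses the c = 0 special case, where X is the square root of a Gamma(a/2, b) variate and the population moments are available in closed form.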
To obtain the maximum likelihood estimate in the current context, consider the log-likelihood function implied by Equation (1). To employ the Newton-Raphson procedure, we require the first- and second-order derivatives of the Fox-Wright function. The computation is nontrivial, especially for the derivatives involving the parameter a. The theoretical expressions of the associated gradient vector and Hessian matrix are provided in Supplementary Section 2.19. We use the "Nonsmooth Optimization by Mesh Adaptive Direct Search" (NOMAD) procedure (Audet and Dennis 2006; Abramson et al. 2011; Audet and Hare 2017) to minimize the negative of the log-likelihood function. Specifically, we use the "R" interface via the function "snomadr" implemented in the "R" package "crs" (Nie and Racine 2012). The optimization procedure requires a set of initial values, which we obtain using the method of moments procedure discussed earlier. The performance of the estimation procedures is assessed via simulations, which we discuss next. The objective of the simulation study is to assess the performance of the proposed estimation procedures. We fix a set of true parameter values $a_{\mathrm{true}}$, $b_{\mathrm{true}}$, and $c_{\mathrm{true}}$ and generate 1000 different data sets, each containing i.i.d. samples from MHN($a_{\mathrm{true}}$, $b_{\mathrm{true}}$, $c_{\mathrm{true}}$). We perform the moment-based estimation procedure to obtain the corresponding estimated values, namely $\hat{a}$, $\hat{b}$, and $\hat{c}$. We calculate the mean squared error (MSE), averaging the squared differences between the true and estimated parameter values over all 1000 data sets. We replicate the whole procedure three times, with the sample size of each data set fixed at n = 5000, 20,000, and 40,000, respectively. The results are summarized in Table 2a. We observe that the MSE decreases as the sample size increases.
Also, from Table 2b, we observe that the maximum likelihood procedure performs somewhat better than the method of moments procedure. Table 2a shows that the MSE of the moment-based estimators decreases as the sample size increases, while Table 2b compares the estimates based on the method of moments with those of the NOMAD optimization procedure.
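As a check on the likelihood machinery, the negative log-likelihood can be prototyped without the Fox-Wright function by computing the normalizing constant through quadrature (our sketch, not the NOMAD/snomadr setup used in the paper; it assumes a > 1 so the kernel is bounded near zero). For data simulated from the square-root-of-Gamma special case MHN(3, 2, 0), the negative log-likelihood at the truth should be smaller than at perturbed parameter values:

```python
import math, random

def mhn_negloglik(a, b, c, xs, hi=12.0, n=12000):
    """Negative log-likelihood of an MHN(a, b, c) sample; the normalizing
    constant is obtained by trapezoidal quadrature on [0, hi] instead of
    the Fox-Wright representation."""
    h = hi / n
    z = 0.0
    for i in range(1, n + 1):   # x = 0 contributes nothing for a > 1
        x = i * h
        w = 0.5 if i == n else 1.0
        z += w * math.exp((a - 1.0) * math.log(x) - b * x * x + c * x)
    z *= h
    s_log = sum(math.log(x) for x in xs)
    s_sq = sum(x * x for x in xs)
    s_lin = sum(xs)
    return len(xs) * math.log(z) - (a - 1.0) * s_log + b * s_sq - c * s_lin
```

Because the quadrature grid is shared across parameter values, discretization error largely cancels when comparing likelihoods, which is all a direct-search optimizer such as NOMAD needs.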