Using the Lambert Function to Estimate Shared Frailty Models with a Normally Distributed Random Intercept

Abstract Shared frailty models, that is, hazard regression models for censored data including random effects acting multiplicatively on the hazard, are commonly used to analyze time-to-event data possessing a hierarchical structure. When the random effects are assumed to be normally distributed, the cluster-specific marginal likelihood has no closed-form expression. A powerful method for approximating such integrals is the adaptive Gauss-Hermite quadrature (AGHQ). However, this method requires the estimation of the mode of the integrand in the expression defining the cluster-specific marginal likelihood: it is generally obtained through a nested optimization at the cluster level for each evaluation of the likelihood function. In this work, we show that in the case of a parametric shared frailty model including a normal random intercept, the cluster-specific modes can be determined analytically by using the principal branch of the Lambert W function, W_0. Besides removing the need for the nested optimization procedure, it provides closed-form formulas for the gradient and Hessian of the approximated likelihood, making its maximization by Newton-type algorithms convenient and efficient. The Lambert-based AGHQ (LAGHQ) might be applied to other problems involving similar integrals, such as the normally distributed random intercept Poisson model and the computation of probabilities from a Poisson lognormal distribution.


Introduction
Time-to-event data pervade applied research, and can be found in domains as diverse as the life sciences, economics, or engineering. In particular, medical and epidemiological studies are frequently concerned with the analysis of time-to-event data through hazard regression models for censored data. In many situations, the data possess a hierarchical structure, for example, individuals coming from the same geographical area or treated in the same hospital. This induces a correlation between observations coming from the same cluster: for example, patients treated in the same hospital might have more similar outcomes than patients treated for the same disease in another hospital because they benefit from somewhat homogeneous treatment procedures. The resulting between-cluster heterogeneity can be taken into account in the modeling of time-to-event data through the introduction of a shared frailty parameter at the cluster level acting multiplicatively on the hazard rate, which corresponds to adding a random intercept to the logarithm of the hazard rate. The terminology comes from the fact that individuals from a given cluster are assumed to share a common frailty toward the occurrence of the event of interest.
There is an extensive literature on shared frailty models (see, e.g., Duchateau and Janssen 2008; Wienke 2010; Balan and Putter 2020) and many software solutions for fitting such models exist, in the context of semiparametric proportional hazard regression models as well as parametric or flexible models (Rondeau et al. 2012; Liu et al. 2017; Balan and Putter 2019; Crowther 2019; Charvat and Belot 2021; Therneau 2022). The gamma distribution is popular for fitting shared frailty models partly because of mathematical convenience: in that case, the marginal likelihood for a cluster, obtained by integrating the corresponding conditional likelihood over the random effect distribution, has a closed-form expression. Nonetheless, several other random effect distributions are commonly used and, as in many other mixed effect models, the normal distribution, corresponding to the assumption of lognormally distributed frailties, retains its popularity. [CONTACT: Hadrien Charvat, h.charvat.ef@juntendo.ac.jp, Faculty of International Liberal Arts, Juntendo University, Tokyo, Japan. Supplementary materials for this article are available online at www.tandfonline.com/r/TAS.]
One of the issues associated with the use of lognormally shared frailties is the fact that there is no closed-form expression for the marginal likelihood. Numerical integration procedures are thus required to compute an approximation of the integral defining the marginal likelihood. In particular, an improvement of standard Gauss-Hermite quadrature, described by Liu and Pierce (1994) and Pinheiro and Bates (1995), and known as adaptive Gauss-Hermite quadrature (AGHQ), provides a very powerful tool to compute approximations to such integrals. However, one of the difficulties associated with the implementation of AGHQ is that it requires the estimation of the mode of the integrand appearing in the definition of the cluster-specific marginal likelihood, which basically involves solving as many optimization problems as there are clusters for each evaluation of the likelihood of the model during the optimization process, although it is possible in practice to overcome this requirement by updating the modes only every few iterations as suggested in Tuerlinckx et al. (2006).
In the following, after restating the usual procedure to compute an approximation of the full likelihood of a flexible parametric hazard regression model with a random intercept based on AGHQ, we show how the introduction of the Lambert W function (Corless et al. 1996) allows a simplification in the formulation of the cluster-specific marginal likelihood, by providing an analytical formula for the mode of the integrand. Although it does not provide an analytical formula for the likelihood but only for its approximation, this Lambert-based AGHQ (LAGHQ) procedure proves useful in designing efficient optimization procedures based on exact formulas for the gradient and Hessian of the AGHQ-approximated likelihood. To conclude, we show how LAGHQ can be used to derive numerical approximations of integrals appearing in closely related problems, such as the computation of the marginal hazard and marginal survival from a normal random intercept hazard regression model, or to compute the marginal likelihood of a Poisson model with a normally distributed random intercept.

Flexible Parametric Shared Frailty Model with a Random Intercept
The problem of interest in this work is the analysis of time-to-event data, that is, data pertaining to the occurrence of an event along time. We assume that, for each individual of the study population, information on the follow-up time and the status toward the event of interest at the end of the follow-up is available. Individual-level information on covariates is also available and the study population is further assumed to be partitioned into clusters. Under this setting, a common research question is to describe the hazard rate of the event of interest, that is, the instantaneous rate of occurrence of the event, as a function of time and of the covariates while accounting for the hierarchical nature of the data.
With T a nonnegative random variable representing the time of event occurrence, the hazard rate h at time t is formally defined as

h(t) = lim_{Δt→0} P(t ≤ T < t + Δt | T ≥ t) / Δt

and is related to the survival function S(t) = P(T > t) by the relationship

S(t) = exp(−∫_0^t h(u) du).

The general formulation of the flexible parametric shared frailty model under discussion here specifies the hazard rate, h, as a parametric function of time t, a vector of covariates x, and a random intercept w ∼ N(0, σ²) defined at the cluster level:

h(t, x, w; β) = h(t, x; β) exp(w),

that is, log h(t, x, w; β) = log h(t, x; β) + w. The vector of model parameters β contains parameters required to model the baseline hazard as well as the effects, potentially time-dependent, of the covariates.
In order to give the reader an idea of the kind of models included in this general formulation, we present two examples:
• a Weibull model with scale parameter λ = exp(β01) and shape parameter ρ = exp(β02) and with two covariates x1, x2 assumed to have a proportional effect on the baseline hazard:

h(t, x; β) = λρ t^(ρ−1) exp(β11 x1 + β12 x2);

• a model in which the logarithm of the baseline hazard is modeled by a quadratic polynomial of time and including a variable x1 with a linear time-dependent effect:

h(t, x; β) = exp(β01 + β02 t + β03 t² + β11 x1 + β12 x1 t).
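For concreteness, these two hazards translate directly into code; the following Python sketch (the function names are illustrative and not part of any package, and parameter values are to be supplied by the user) evaluates them:

```python
import numpy as np

def weibull_hazard(t, x1, x2, b01, b02, b11, b12):
    """Weibull hazard with scale exp(b01) and shape exp(b02), and two
    covariates acting proportionally on the baseline hazard."""
    lam, rho = np.exp(b01), np.exp(b02)
    return lam * rho * t**(rho - 1.0) * np.exp(b11 * x1 + b12 * x2)

def quadratic_loghazard(t, x1, b01, b02, b03, b11, b12):
    """Hazard whose logarithm is quadratic in time, with a linear
    time-dependent effect of the covariate x1."""
    return np.exp(b01 + b02 * t + b03 * t**2 + b11 * x1 + b12 * x1 * t)
```

Both functions accept NumPy arrays for t, so a whole follow-up grid can be evaluated in one call.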
In the following, it is assumed that h is an at least twice differentiable function of β but the particular parametric function of time and covariates used to model the hazard rate has no impact on the subsequent developments. It should be noted that the semiparametric Cox hazard regression model, in which the hazard rate is left unspecified, does not fall under the scope of the present work. Indeed, the estimation of the parameters of this model is based on the partial likelihood, which is structurally different from the likelihood of fully parametric hazard regression models considered here. The fitting of semiparametric hazard models including random effects is also based on specific methods such as the maximization of the penalized partial likelihood (Ripatti and Palmgren 2000;Duchateau and Janssen 2008).

Likelihood Formulation
For each individual j, j = 1, ..., d_i, from cluster i, i = 1, ..., D, let t_0ij denote the time of entry into the study, t_ij the observed failure time and δ_ij an indicator variable taking the value 1 in the case of an event and 0 in the case of right censoring. Under the assumption of noninformative right censoring, and in the case of non-left-truncated data, the likelihood for a single observation j from cluster i conditional on the value of the random effect w_i is

L_ij(β | w_i) = [h(t_ij, x_ij; β) exp(w_i)]^δ_ij exp[−exp(w_i) H(t_ij, x_ij; β)],

where x_ij is the vector of covariates of individual j from cluster i and H(t, x; β) = ∫_0^t h(u, x; β) du is the cumulative hazard. The conditional likelihood for cluster i is

L_i^c(β | w_i) = ∏_{j=1}^{d_i} L_ij(β | w_i),    (1)

and the likelihood of the model, as a function of the vector of parameters β and the variance of the normally distributed random intercept, σ², is then obtained by integrating the product of the cluster-specific conditional likelihoods over the distribution of the random effects w = (w_1, ..., w_D). Because the individual observations are grouped into independent clusters, the covariance matrix of the random effects is Σ_w = σ² I_D, with I_D the identity matrix of size D, and the likelihood can be written as a product of unidimensional integrals:

L(β, σ²) = ∏_{i=1}^{D} ∫_{−∞}^{+∞} L_i^c(β | w) (1/(σ√(2π))) exp(−w²/(2σ²)) dw = ∏_{i=1}^{D} L_i(β, σ²).

In the last line of the previous equation, we introduced the marginal likelihood for cluster i,

L_i(β, σ²) = ∫_{−∞}^{+∞} L_i^c(β | w) (1/(σ√(2π))) exp(−w²/(2σ²)) dw.    (2)

Note that the subscript i attached to w was removed, as the latter is just a dummy integration variable. The maximum likelihood value of the vector of parameters η = (β, σ) is eventually obtained as the solution of the optimization problem

η̂ = argmax_η log L(η).

The cluster-specific marginal likelihood L_i defined in Equation (2) does not in general have a closed form and it is thus necessary to use numerical methods to approximate its value: this will be the subject of the next two sections.

Adaptive Gauss-Hermite Quadrature
The Gauss-Hermite quadrature is a numerical integration technique (Davis and Rabinowitz 1983) used to approximate integrals of the form

J = ∫_{−∞}^{+∞} f(x) exp(−x²) dx

by computing a weighted sum of the function f evaluated at particular points called the quadrature nodes. Without going into too much detail, and departing from the usual way of introducing it, let us notice, as in Charvat and Belot (2021), that Gauss-Hermite quadrature might equivalently be defined as a weighted sum of the integrand g(x) = f(x) exp(−x²) evaluated at a certain set of quadrature nodes. This is the formulation we use here, as it is analogous to the definition of nodes and weights in the case of the adaptive Gauss-Hermite quadrature to be presented subsequently. We thus write the n-point Gauss-Hermite quadrature approximation of J as

J ≈ ∑_{k=1}^{n} ρ_k g(x_k).

For a given number n of quadrature nodes, the nodes x_k and weights ρ_k are computed by the formulas

x_k = x^H_k,   ρ_k = ρ^H_k exp[(x^H_k)²],

where the x^H_k's are the zeros of the nth order Hermite polynomial H_n, and the ρ^H_k's are given by

ρ^H_k = 2^(n−1) n! √π / (n² [H_{n−1}(x^H_k)]²).

In practice, standard statistical and mathematical software have procedures to calculate the x^H_k's and ρ^H_k's efficiently and accurately. The n-point quadrature formula gives the exact value of the integral whenever f is a polynomial of degree at most 2n − 1.
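In practice the nodes and weights are one library call away. A small Python sketch of the integrand formulation described above (the test integrand cos(x)·exp(−x²), whose integral has the known closed form √π·exp(−1/4), is purely illustrative):

```python
import numpy as np

# n-point Gauss-Hermite nodes and weights from NumPy; in the "integrand"
# formulation, J = ∫ g(x) dx with g(x) = f(x)*exp(-x^2) is approximated by
# sum_k rho_k * g(x_k), where rho_k = rho_k^H * exp((x_k^H)^2).
n = 10
xH, rhoH = np.polynomial.hermite.hermgauss(n)  # zeros of H_n and GH weights
rho = rhoH * np.exp(xH**2)

def g(x):
    # example integrand: cos(x) * exp(-x^2)
    return np.cos(x) * np.exp(-x**2)

approx = float(np.sum(rho * g(xH)))
exact = float(np.sqrt(np.pi) * np.exp(-0.25))  # closed form of the integral
```

With 10 nodes the approximation already agrees with the exact value to many digits, because cos(x) is extremely well approximated by a degree-19 polynomial on the region where the nodes lie.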
The cluster-specific marginal likelihood L_i of Equation (2) can then be approximated by Gauss-Hermite quadrature as

L_i ≈ ∑_{k=1}^{n} ρ_k L_i^c(β | x_k) (1/(σ√(2π))) exp(−x_k²/(2σ²)).

It might be noticed that the Gauss-Hermite quadrature nodes and weights do not depend on the function appearing in the integrand. This means that the positions of the nodes might not adequately cover the region where the integrand varies, resulting in (a) a poor approximation of the integral and (b) the necessity of using a large number of nodes to try to improve that approximation. Concerning the last point, the procedure rapidly becomes inefficient from a computational point of view because, when increasing the number of quadrature points, an increasing proportion of the corresponding weights falls below machine precision (Trefethen 2022).
When the integrand g(x) = f(x) exp(−x²) is a strictly positive unimodal function, which is the case for example when f is a likelihood function, Liu and Pierce (1994) and Pinheiro and Bates (1995) suggested the use of the so-called adaptive Gauss-Hermite quadrature: the idea is to transform the integrand in order to obtain a new quadrature formula in which the nodes and corresponding weights depend on the function f and thus also on g: the nodes are translated and rescaled so that they cover the region where the integrand g varies most, that is, around its mode. Specifically, the nodes are shifted so that they are centered on the mode, μ_g, of the integrand and rescaled by a dispersion factor σ_g, defined as the inverse of the square root of the negative of the second derivative of the logarithm of the integrand evaluated at μ_g:

σ_g = [−(log g)''(μ_g)]^(−1/2).

These values are then used to transform the Gauss-Hermite quadrature nodes and weights according to the relationships

x*_k = μ_g + √2 σ_g x^H_k,   ρ*_k = √2 σ_g ρ^H_k exp[(x^H_k)²].    (3)

We can then obtain the n-point AGHQ-based approximation, L^n_i, of the cluster-specific marginal likelihood defined in Equation (2) by

L^n_i = ∑_{k=1}^{n} ρ*_{i,k} L_i^c(β | x*_{i,k}) (1/(σ√(2π))) exp[−(x*_{i,k})²/(2σ²)],

where the x*_{i,k}'s and ρ*_{i,k}'s are defined as in Equation (3), with respect to the mode μ_i and dispersion factor σ_i of the ith cluster-specific integrand.
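The adaptive shift and rescale can be sketched as follows in Python, with the mode found numerically, as standard AGHQ requires; the integrand corresponds to the frailty-type form considered in this paper, and the values of b, c, and σ² are arbitrary illustrations:

```python
import numpy as np
from scipy.optimize import brentq

# AGHQ sketch for J = ∫ g(w) dw with
# log g(w) = b*w - c*exp(w) - w^2/(2*s2) - log(sqrt(2*pi*s2)).
b, c, s2 = 3.0, 2.0, 1.0
logg = lambda w: b*w - c*np.exp(w) - w**2/(2.0*s2) - 0.5*np.log(2.0*np.pi*s2)
dlogg = lambda w: b - c*np.exp(w) - w/s2          # first derivative of log g

mu = brentq(dlogg, -20.0, 20.0)                   # mode, found numerically
sig = 1.0/np.sqrt(c*np.exp(mu) + 1.0/s2)          # [-(log g)''(mu)]^(-1/2)

xH, rhoH = np.polynomial.hermite.hermgauss(10)
nodes = mu + np.sqrt(2.0)*sig*xH                  # shifted and rescaled nodes
weights = np.sqrt(2.0)*sig*rhoH*np.exp(xH**2)     # corresponding weights
J_aghq = float(np.sum(weights * np.exp(logg(nodes))))
```

The root search carried out by brentq at every call is exactly the nested optimization step that the Lambert-based approach of the next section eliminates.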
Computing the modes and the dispersion parameters corresponds to solving cluster-specific optimization problems: finding the zero of the first derivative of the logarithm of the integrand, which typically requires the use of a numerical optimization routine, and computing the second derivative at the estimated mode. However, the gain in precision in the approximation of the integral is better than what can be hoped for by increasing the number of quadrature points in a nonadaptive Gauss-Hermite quadrature procedure, as has been verified empirically for various types of mixed effect models (see, e.g., Pinheiro and Bates 1995; Lesaffre and Spiessens 2001; Rabe-Hesketh et al. 2005). In the next section, we show how the particular structure of L_i (the cluster-specific marginal likelihood of the flexible shared frailty model with a normal random intercept) allows us to obtain an exact expression for μ_i and σ_i in terms of the model parameters and the observed data.

Lambert-based Adaptive Gauss-Hermite Quadrature
Equation (2) for the cluster-specific marginal likelihood can be written as

L_i(β, σ²) = ∫_{−∞}^{+∞} exp[g_i(w)] dw.    (4)

Let us now focus on the logarithm of the integrand appearing in Equation (4), defined as

g_i(w) = a_i + b_i w − c_i exp(w) − w²/(2σ²) − log(σ√(2π)),

with a_i = ∑_j δ_ij log h(t_ij, x_ij; β), b_i = ∑_j δ_ij, and c_i = ∑_j H(t_ij, x_ij; β). The mode μ_i of the integrand (which is also the mode of its logarithm g_i) can be determined as the solution of the equation g_i'(w) = b_i − c_i exp(w) − w/σ² = 0, which, setting ν = σ² b_i − w, can be rewritten as

ν exp(ν) = σ² c_i exp(σ² b_i),    (5)

where the right-hand side of the equation is a positive number. The solution of this equation can be obtained by using the Lambert W function, also known as the product logarithm (Corless et al. 1996; Lehtonen 2016). First appearing in a paper by the mathematician Johann Heinrich Lambert (1758) and derived in its current form by Leonhard Euler (1783), the Lambert function is defined as the multivalued inverse of the complex function y ↦ y exp(y). In particular, when restricted to real numbers x ≥ 0, the principal branch of the Lambert function, W_0, is characterized by the property that W_0(x) is the unique real number such that

W_0(x) exp[W_0(x)] = x.

Moreover, the derivative of the Lambert function is given by the elementary relationship

W_0'(x) = W_0(x) / (x [1 + W_0(x)]).
Available in most standard mathematical and statistical computing environments, for example, the lamW package in R (Adler 2022), the Lambert function is used in the solution of numerous applied problems, notably in physics and in the biological sciences. As mentioned in Lehtonen (2016), it appears frequently when dealing with equations featuring the exponential or logarithmic function together with its argument, which is precisely the case of Equation (5). Going back to our problem, we can now see that an explicit solution for ν_i can be obtained in terms of b_i and c_i as

ν_i = W_0(σ² c_i exp(σ² b_i)),   so that   μ_i = σ² b_i − ν_i.

The curvature of g_i at its mode, κ_i, is then obtained as the value of the second derivative of g_i evaluated at μ_i:

κ_i = g_i''(μ_i) = −c_i exp(μ_i) − 1/σ² = −(1 + ν_i)/σ²,

so that the dispersion factor is σ_i = (−κ_i)^(−1/2) = σ/√(1 + ν_i). From the results of the previous section, we can obtain the n-point AGHQ-based approximation, L^n_i, of L_i defined in Equation (2) by

L^n_i = ∑_{k=1}^{n} ρ*_{i,k} exp[g_i(x*_{i,k})].    (6)

Replacing μ_i and σ_i by their values expressed in terms of ν_i in the formulas for the nodes and weights of AGHQ in Equation (3), Equation (6) can be simplified to

L^n_i = √2 σ_i exp[g_i(μ_i)] ∑_{k=1}^{n} ρ^H_k exp[(ν_i/(1+ν_i))(x^H_k)² + (ν_i/σ²)(1 + δ_{i,k} − exp(δ_{i,k}))],  with δ_{i,k} = √2 σ_i x^H_k.    (7)

Notice that the 1-point AGHQ-based approximation, which is equivalent to the first-order Laplace approximation of the marginal likelihood for cluster i, is simply given by

L^1_i = √(2π) σ_i exp[g_i(μ_i)].

Finally, the n-point AGHQ-based approximation of the cluster-specific marginal log-likelihood can be written as ℓ^n_i = log L^n_i, and the log-likelihood of the model can thus be approximated by

ℓ^n = ∑_{i=1}^{D} ℓ^n_i.

The use of the Lambert function to obtain an explicit expression of ν_i in terms of b_i, c_i and σ (and thus, ultimately, in terms of the vector of model parameters (β, σ)) conveniently removes the need for a numerical optimization procedure to determine μ_i in each cluster at each evaluation of the likelihood function. It is thus rather easy to obtain analytical formulas for the gradient and Hessian of the n-point approximation of the likelihood and of its logarithm as a function of the model parameters (see Appendix I, supplementary materials), which makes likelihood optimization by Newton-based algorithms particularly convenient.
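A minimal Python check of the closed-form mode, using SciPy's implementation of W_0 (b, c, and σ² are arbitrary illustrative values standing in for b_i, c_i, and the frailty variance):

```python
import numpy as np
from scipy.special import lambertw

# Closed-form mode and dispersion for
# log g(w) = b*w - c*exp(w) - w^2/(2*s2) (up to constants):
#   nu = W0(s2*c*exp(s2*b)),  mu = s2*b - nu,  sig = sqrt(s2/(1 + nu)).
b, c, s2 = 3.0, 2.0, 1.0
nu = float(np.real(lambertw(s2 * c * np.exp(s2 * b))))
mu = s2*b - nu
sig = np.sqrt(s2 / (1.0 + nu))

# the stationarity condition b - c*exp(mu) - mu/s2 = 0 holds (to fp precision)
resid = b - c*np.exp(mu) - mu/s2
```

These two closed-form lines replace the inner numerical root search that standard AGHQ would otherwise perform for every cluster at every evaluation of the likelihood.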
A further benefit of this procedure is that, contrary to the case of standard AGHQ, there is no need to compute the value of the cluster-specific conditional likelihood as defined in Equation (1), at each quadrature node. Indeed, this product involving potentially many terms can rapidly lead to under-or overflow issues depending on the current value of the model parameters, and usually requires that care be taken in performing its calculation. As a simple example, if the individual likelihood contributions of a cluster of size 500 all have a value smaller than 0.1 for a given vector of parameters, the product of these individual contributions will generally erroneously evaluate to 0 because the correct result is lower than the smallest positive number representable in the computer memory; this value depends on the computing environment and is around 5 · 10 −324 in R, for example.
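This underflow is easy to reproduce; a small Python illustration with a hypothetical cluster of 500 contributions each equal to 0.05:

```python
import numpy as np

# A product of 500 likelihood contributions, each below 0.1, underflows to
# exactly 0.0 in double precision: the true value, 0.05^500 ~ 3e-651, is far
# below the smallest representable positive double (about 5e-324).  The sum
# of the individual log-contributions, however, is computed without trouble.
contribs = np.full(500, 0.05)
naive_product = float(np.prod(contribs))           # underflows to 0.0
log_likelihood = float(np.sum(np.log(contribs)))   # about -1497.9
```

Working with log-contributions from the start, as LAGHQ allows, sidesteps the problem entirely.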
Finally, after convergence of the optimization procedure, the covariance matrix of the model parameters can be easily obtained as the inverse of the negative of the analytical Hessian of the approximated log-likelihood evaluated at the maximum likelihood value of the parameters (β̂, σ̂), and the cluster-specific random effects, commonly referred to as the empirical Bayes or shrinkage estimates, can be computed as

ŵ_i = μ̂_i = σ̂² b_i − W_0(σ̂² c_i exp(σ̂² b_i)).

To close this section, it should be noted that the principle of approximating L_i by LAGHQ detailed here applies more generally to integrals of the form

∫_{−∞}^{+∞} exp[a + b w − c exp(w)] (1/(σ√(2π))) exp(−w²/(2σ²)) dw,    (8)

with a, b, c real numbers, and c ≥ 0; the mode of the integrand is then σ²b − W_0(σ²c exp(σ²b)). We will give several examples of such integrals in the following sections.

Random Intercept Frailty Model for Left-Truncated Data
In the presence of left truncation, the formulation of the likelihood must take account of the fact that the delayed entry times are conditional on the value of the frailty defined at t = 0, as described for example in van den Berg and Drepper (2016). Indeed, individuals from clusters with a lower value of the frailty are more likely to live longer, so that the distribution of the frailty in a population with delayed entry times differs from the distribution of the frailty in the same population at t = 0. Following Charvat and Belot (2021), the marginal likelihood for cluster i becomes

L_i(β, σ²) = [∫_{−∞}^{+∞} exp[g_i(w)] dw] / [∫_{−∞}^{+∞} exp[−c_0i exp(w)] (1/(σ√(2π))) exp(−w²/(2σ²)) dw],    (9)

where t_0ij is the delayed entry time for individual j from cluster i and c_0i = ∑_j H(t_0ij, x_ij; β). The integral appearing in the denominator can also be approximated by LAGHQ as it is of the form of Equation (8) with a = b = 0 and c = c_0i. By defining ν_0i = W_0(σ² c_0i), the n-point LAGHQ-based approximation of the cluster-specific marginal likelihood can then be expressed as the ratio of the LAGHQ-based approximations of the numerator and denominator of Equation (9). Maximum likelihood optimization will then usually be based on the sum of the approximated cluster-specific log-likelihood contributions

ℓ^n = ∑_{i=1}^{D} ℓ^n_i.

Finally, the gradient and Hessian of ℓ^n_i can be obtained analytically by applying the results of Appendix I, supplementary materials, to the logarithm of the LAGHQ-based approximation of both the numerator and denominator of Equation (9).

Marginal Hazard and Marginal Survival
Using the estimated parameters (β̂, σ̂), obtained by maximizing the approximation of the likelihood defined before, it is then possible to predict the marginal survival function at a given time t and for a particular value of the covariates x as

S_Marg(t, x) = ∫_{−∞}^{+∞} exp[−exp(w) H(t, x; β̂)] (1/(σ̂√(2π))) exp(−w²/(2σ̂²)) dw.

Once again, we notice that this can be expressed in the form of Equation (8) with a = b = 0 and c = H(t, x; β̂), and we can thus obtain the n-point LAGHQ-based approximation of S_Marg by Equation (7), with ν_S = W_0(σ̂² c). We can also express the time derivative of the marginal survival as

dS_Marg/dt(t, x) = −h(t, x; β̂) ∫_{−∞}^{+∞} exp[w − exp(w) H(t, x; β̂)] (1/(σ̂√(2π))) exp(−w²/(2σ̂²)) dw.

This has the form of Equation (8) with b = 1 and c = H(t, x; β̂), and we can thus obtain its n-point LAGHQ-based approximation by Equation (7), with ν_S' = W_0(σ̂² c exp(σ̂²)). The marginal hazard computed from the model parameters, h_Marg, is expressed as

h_Marg(t, x) = −(dS_Marg/dt)(t, x) / S_Marg(t, x).

Its n-point LAGHQ-based approximation is then given by the ratio of the n-point LAGHQ-based approximations of the numerator and denominator of this expression.
From these formulas, it is possible to express analytically the gradient of the approximations of the marginal hazard and of the marginal survival considered as functions of the model parameters using once again the results of Appendix I, supplementary materials. It is then easy to compute their delta method-based variance.
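Putting these pieces together, a Python sketch of the LAGHQ computation of the marginal survival follows (H stands for the conditional cumulative hazard H(t, x; β̂) and s2 for σ̂²; both are passed as plain numbers here for illustration):

```python
import numpy as np
from scipy.special import lambertw

def marginal_survival(H, s2, n=10):
    """LAGHQ sketch of S_marg = ∫ exp(-exp(w)*H) * N(w; 0, s2) dw.
    Here b = 0 and c = H, so nu = W0(s2*H) and the mode is -nu."""
    nu = float(np.real(lambertw(s2 * H)))
    mu, sig = -nu, np.sqrt(s2 / (1.0 + nu))
    xH, rhoH = np.polynomial.hermite.hermgauss(n)
    w = mu + np.sqrt(2.0) * sig * xH               # adapted nodes
    rho = np.sqrt(2.0) * sig * rhoH * np.exp(xH**2)  # adapted weights
    dens = np.exp(-w**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    return float(np.sum(rho * np.exp(-np.exp(w) * H) * dens))
```

By construction the function returns exactly 1 at H = 0 and decreases as H grows, as a survival probability should.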

Small-Scale Simulation
In this section, we provide a very succinct illustration of the computational benefit in using LAGHQ instead of AGHQ. The mexhaz R package (Charvat and Belot 2022) can be used to fit flexible parametric hazard regression models including a normally distributed random intercept: for such models, the computation of the likelihood was originally based on standard AGHQ, as described in Charvat and Belot (2021), but thanks to the developments presented in this work, LAGHQ has become the default method since version 2.0. Standard AGHQ can still be used by setting the argument exactGradHess to FALSE. By default, the number of quadrature nodes is set to 10 and optimization of the log-likelihood is based on the nlm function (see Schnabel et al. 1985): it implements a Newton-based algorithm supplemented by a step selection strategy based on a line search. First- and second-order derivatives can be defined analytically (option used for the LAGHQ-based approach) or be based on finite difference evaluations (option used for the standard AGHQ-based approach). Clustered time-to-event data were simulated with the simsurv R package (Crowther and Lambert 2010; Brilleman and Gasparini 2021). Briefly, we simulated event times based on the Weibull hazard

h(t, X1, X2 | w) = 1.5 · 0.02 · t^0.5 · exp(0.5 X1 + 0.05 X2 + w),

with X1 ∼ Bern(p = 0.45), X2 ∼ N(μ = 44, σ = 8.5) and w ∼ N(μ = 0, σ = 1). Because the properties of the model have already been studied in detail in Charvat et al. (2016), we just present results related to a few computational characteristics of LAGHQ and standard AGHQ when used on relatively large datasets. Two scenarios were considered: in the first one, a population of 100,000 individuals was partitioned into 5000 clusters of size 20; in the second one, it was partitioned into 20 clusters of size 5000. For each scenario, 100 datasets were simulated.
Results shown in Table 1 are given in terms of median as well as first and third quartiles of the empirical distribution of the quantities of interest obtained on the simulated datasets.
In the first scenario, both approaches gave very similar results, with convergence for all datasets and a maximum difference in absolute value between coefficient estimates of 6.9·10^−5 over the 100 simulations (median of 1.9·10^−6). However, the median time to convergence for the LAGHQ-based approach was 4.6 sec compared with 146.1 sec for the AGHQ-based approach. This difference was mainly explained by the difference in the number of function evaluations required for the finite difference-based approximation of the gradient and Hessian in the AGHQ-based approach. Also, although the gradient tolerance was set to the same value of 10^−8, the LAGHQ-based approach provided gradients that were closer to zero than when using standard AGHQ. Results from the second scenario were qualitatively similar. Because the number of clusters was very small, fitting the model required more iterations, which further increased the time to convergence of the standard AGHQ-based approach, with the model even failing to converge in two simulations. The computational time of the LAGHQ-based approach mainly depended on the number of clusters, which explains the very short times to convergence in this scenario with only 20 clusters. It should be noted that these results, obtained with a computer running an Intel Core i5 CPU with 16GB of RAM, are just given for reference. In particular, because the steps involved in the computations based on the two approaches were not optimized in a uniform way, the times to convergence are not strictly comparable. For example, the standard AGHQ-based approach can also be used to fit shared frailty excess hazard regression models for which there is no LAGHQ-based equivalent method, and thus contains some extra steps. Nonetheless, LAGHQ seems to provide an efficient alternative to standard AGHQ.
Appendix II, supplementary materials summarizes the performance of both approaches in terms of bias, mean squared error (MSE), and coverage probability of the estimated model parameters. As expected, results based on the AGHQ and the LAGHQ approaches were almost identical, and qualitatively similar to those obtained in the more detailed study of the AGHQ approach described in Charvat et al. (2016). In particular, there existed some bias in the estimation of the scale parameter of the Weibull hazard and the variance of the random effect, as well as a coverage probability lower than the nominal value of 95%, in the scenario with datasets including only 20 clusters.

Real Data Example
We briefly present an example from veterinary research with the analysis of the time to first insemination in dairy cows. Some of the analyses realized on the original dataset are described in Duchateau and Janssen (2008) and, for the purpose of this illustration, we used the insem dataset from the parfm R package (Rotolo et al. 2018), corresponding to a simulated dataset reproducing the structure of the original dataset which is not publicly available. The dataset contains information on the time to first insemination of 10,513 dairy cows coming from 181 herds. Cows were followed up from the thirtieth day after their previous calving and for a maximum duration of 300 days, with culling representing the censoring event. We used a Weibull hazard with scale parameter λ and shape parameter ρ, and the parity of the cows, dichotomized into primiparous (heifer) and multiparous, was considered as an explanatory variable for the time to first insemination. The original analysis assumed a gamma distributed frailty at the herd level, but we present here results obtained when assuming a log-normally distributed frailty. Table 2 summarizes the estimated parameters with their standard error obtained with the AGHQ-and LAGHQ-based maximum likelihood approaches implemented in mexhaz, and also with the parfm function that allows the estimation of fully parametric frailty models using various frailty distributions and based on the approximation of the likelihood by Laplace transforms, and the saddlepoint approximation in the case of the lognormally distributed frailty (Munda et al. 2012). In mexhaz the parameters of the Weibull hazard as well as the standard deviation of the random effect distribution are expressed on the logarithmic scale in order to allow unconstrained optimization: they were thus transformed, with appropriate rescaling of their standard error based on the delta method, to allow easy comparison with the parameters of the model estimated with parfm. 
Results showed that the three models led to almost identical parameter estimates and it was once again observed that the LAGHQ-based approach provided fast convergence: 0.2 sec compared with 7.8 sec for standard AGHQ and 52.9 sec with parfm.
The standard deviation of the random effect distribution was estimated at around 0.73, and predicted herd-specific empirical Bayes estimates ranged from −2.28 to 1.64. The left side of Figure 1 shows the predicted herd-specific, or conditional, hazard rates for heifers from a random selection of 20 herds (light gray lines), as well as the conditional hazard rate for a hypothetical herd for which the value of the random effect would be 0 (black line). All these conditional hazard rates are increasing functions of time, illustrating the fact that, at the herd level, the rate of first insemination increases with time, that is, the instantaneous probability of getting inseminated conditional on the event not having occurred yet increases with time. The right side of Figure 1 shows the population-average, or marginal, hazard rate, obtained through the computations detailed in Section 2.5. We can observe that the shape of the marginal hazard rate is different from that of the conditional rates: after an initial steep increase, the marginal hazard rate tends to plateau and even slightly decrease for longer follow-up times. This illustrates a selection effect at the population level (Duchateau and Janssen 2008;Vaupel and Yashin 1985): cows from herds with a high value of the random effect, that is, with a higher instantaneous rate of first insemination, present the event earlier so that the population of cows gradually contains a higher proportion of cows from herds with a lower value of the random effect, and thus a lower instantaneous rate of first insemination.

Concluding Remarks
Although restricted to flexible parametric hazard rate regression models with a normally distributed random intercept, the results presented here are of interest in that they provide an elegant solution to the problem of determining the mode of the integrand appearing in the definition of the cluster-specific marginal likelihood. The theoretical benefit is certainly limited as the proposed approach does not provide a closed-form solution to the marginal likelihood but only to its AGHQ-based approximation, and as there is no direct extension to models including multiple random coefficients. Be that as it may, by providing exact formulas for the gradient and Hessian of this approximated likelihood, LAGHQ provides a considerable improvement compared to standard, optimization-based, AGHQ, and the illustration showed on a few examples how it could drastically reduce the number of function evaluations, resulting in a substantial reduction in computational time. Besides, removing the need for the internal optimization procedure eliminates all the potential related problems, such as poor approximation of the modes or even nonconvergence, that might occur for each cluster at each evaluation of the likelihood, and result in failed convergence of the model. In this work, we have focused exclusively on the parametric shared frailty model with a normal random intercept but LAGHQ might be applied to other problems in which numerical approximation of integrals of the form given in Equation (8) appears. For example, as noted by Holford (1980), the likelihood of the Poisson log-linear model has a very similar structure to the likelihood of hazard regression models. Without entering into too much detail, this model is used for counts or rates of occurrence of an event of interest. The number of events Y is assumed to follow a Poisson distribution with mean λ, the latter depending log-linearly on a vector of covariates x or, more generally, on a function of these covariates.
The general expression for λ in the context of clustered data modeled with a random intercept w is (Austin et al. 2018)

λ_ij = o_ij exp(η(x_ij) + w_i).

In that equation, o designates an offset term taking the value 1 in the case of a model for counts and representing the observed population at risk in the case of a model for the rate of occurrence of the event of interest. The conditional likelihood for the observed number of events y_ij in unit j from cluster i is proportional to

exp(y_ij (η(x_ij) + w_i) − o_ij exp(η(x_ij) + w_i)),

and the cluster-specific marginal likelihood can be written in the form of Equation (8) with φ_i = Σ_j y_ij, C_i = Σ_j o_ij exp(η(x_ij)), and θ = 1. The same approach to approximating the model log-likelihood with LAGHQ developed in Section 2.3, as well as to obtaining the analytical gradient and Hessian of the resulting approximation, can thus be applied.

From a general point of view, there are several methods for approximating the full likelihood of mixed effect models with normally distributed random effects. Here, we have focused on the AGHQ as first described by Liu and Pierce (1994). It should be mentioned that there exists a second version of AGHQ, described by Naylor and Smith (1982) in a Bayesian context: the shifting and scaling of the Gauss-Hermite quadrature nodes and weights are in that case based on the mean and variance of the posterior density of the random effects given the observations. This version of AGHQ is interesting in that it extends without difficulty to hierarchical models with more than two levels (Rabe-Hesketh et al. 2005). However, it requires the use of an iterative algorithm to determine the cluster-specific posterior mean and variance at each iteration of the optimization procedure. In the case of the random intercept models considered in this work, it would thus suffer from the same kind of efficiency issues that LAGHQ addresses.
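Returning to the Poisson log-linear model with a random intercept described above, the cluster-specific marginal likelihood can be approximated by AGHQ centered at the Lambert W mode, with no numerical search. The sketch below uses our own notation, λ_ij = o_ij exp(η_ij + w_i), and is an illustration of the idea, not the paper's implementation (which is in mexhaz); it checks the result against brute-force numerical integration:

```python
import numpy as np
from scipy.special import lambertw, gammaln
from scipy.integrate import quad

def poisson_cluster_aghq(y, o, eta, sigma2, n_nodes=10):
    """AGHQ marginal likelihood of one cluster of a Poisson model with a
    N(0, sigma2) random intercept; the quadrature is centered at the
    closed-form Lambert W mode, with phi = sum(y), C = sum(o*exp(eta))."""
    phi, C = np.sum(y), np.sum(o * np.exp(eta))
    # Mode of h(w) = phi*w - C*exp(w) - w**2/(2*sigma2) via W0
    w_hat = sigma2 * phi - lambertw(C * sigma2 * np.exp(sigma2 * phi)).real
    scale = 1.0 / np.sqrt(C * np.exp(w_hat) + 1.0 / sigma2)  # 1/sqrt(-h'')
    x, wq = np.polynomial.hermite.hermgauss(n_nodes)
    nodes = w_hat + np.sqrt(2.0) * scale * x
    h = phi * nodes - C * np.exp(nodes) - nodes**2 / (2 * sigma2)
    integral = np.sqrt(2.0) * scale * np.sum(wq * np.exp(x**2 + h))
    const = np.sum(y * (eta + np.log(o)) - gammaln(y + 1))
    return np.exp(const) * integral / np.sqrt(2 * np.pi * sigma2)

y, o = np.array([2.0, 0.0, 1.0]), np.ones(3)
eta, sigma2 = np.array([0.1, -0.2, 0.0]), 0.4
approx = poisson_cluster_aghq(y, o, eta, sigma2)

def integrand(w):  # product of Poisson terms times the N(0, sigma2) density
    lam = o * np.exp(eta + w)
    return (np.exp(np.sum(y * np.log(lam) - lam - gammaln(y + 1)))
            * np.exp(-w**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2))

exact, _ = quad(integrand, -10.0, 10.0)
print(approx, exact)
```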
Another approach to approximating the likelihood of mixed effect models with normally distributed random effects is the use of the Laplace approximation: instead of approximating the integral defining the cluster-specific marginal likelihood, the integrand is approximated by a function that leads to a tractable integral (Tuerlinckx et al. 2006). More specifically, the Laplace approximation is based on a Taylor series expansion of the logarithm of the integrand around its mode. As noted previously, the first-order Laplace approximation, based on a second-order Taylor series expansion, is equivalent to the 1-point AGHQ; higher-order approximations can be obtained by including more terms from the Taylor series expansion (Raudenbush et al. 2000). As Laplace approximations are based on the evaluation of a particular function at the mode of the integrand of the integral to be approximated, they require, as in the case of AGHQ, that this mode be estimated. Consequently, the Lambert function-based determination of the mode described in this work can also be used to obtain Laplace-based approximations of integrals of the form of Equation (8). Laplace approximation and AGHQ are both commonly used in software for mixed effect models as they are considered to provide good accuracy while being computationally efficient (Skrondal and Rabe-Hesketh 2004). Recent work has estimated the asymptotic approximation error of the AGHQ estimator to be of order O(max(D^(−1/2), d^(−(n+2)/3))), where D is the number of clusters, d is the cluster size, and n is the number of quadrature nodes, so that AGHQ with 4-6 nodes might achieve the same level of accuracy as the second-order Laplace approximation (Jin and Andersson 2020).
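As a concrete illustration of this point, a first-order Laplace approximation of an Equation (8)-type integral needs only the Lambert W mode and the curvature there. The sketch below again assumes the log-integrand h(w) = φw − C exp(θw) − w²/(2σ²) (our reading; names are ours) and compares the approximation with numerical integration:

```python
import numpy as np
from scipy.special import lambertw
from scipy.integrate import quad

def laplace_eq8(phi, C, theta, sigma2):
    """First-order Laplace approximation (equivalent to 1-node AGHQ) of
    I = (2*pi*sigma2)**(-0.5) * integral of exp(h(w)) dw, where
    h(w) = phi*w - C*exp(theta*w) - w**2/(2*sigma2)."""
    z = C * theta**2 * sigma2 * np.exp(theta * sigma2 * phi)
    w_hat = sigma2 * phi - lambertw(z).real / theta       # Lambert W mode
    h = phi * w_hat - C * np.exp(theta * w_hat) - w_hat**2 / (2 * sigma2)
    neg_h2 = C * theta**2 * np.exp(theta * w_hat) + 1.0 / sigma2  # -h''(w_hat)
    return np.exp(h) * np.sqrt(2 * np.pi / neg_h2) / np.sqrt(2 * np.pi * sigma2)

phi, C, theta, sigma2 = 4.0, 1.5, 1.0, 0.3
approx = laplace_eq8(phi, C, theta, sigma2)
exact, _ = quad(
    lambda w: np.exp(phi * w - C * np.exp(theta * w) - w**2 / (2 * sigma2)),
    -10.0, 10.0)
exact /= np.sqrt(2 * np.pi * sigma2)
print(approx, exact)  # close, but only approximately equal (first order)
```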
Instead of approximating the cluster-specific marginal likelihood contributions, the log-likelihood of mixed effect models can also be directly approximated by the so-called h-likelihood, which corresponds to the sum of the logarithms of the integrands appearing in the integrals defining the cluster-specific marginal likelihood contributions (Lee and Nelder 1996). Estimation of the model parameters then proceeds by alternating between the estimation, based on an iterative algorithm, of the fixed and cluster-specific random effects given the parameters of the density function of the random effects, and the estimation of these density function parameters based on a profile h-likelihood, itself computed by a Laplace approximation. The h-likelihood-based mixed effect model estimation is a general method that can prove interesting, in particular when the distribution of random effects is not assumed to be normal, and it has been applied to the analysis of time-to-event data by semiparametric as well as parametric hazard regression models (see, e.g., Ha et al. 2001; Ha and Lee 2003). In the specific context of the models considered in this work, the Lambert function might once again be used to obtain the cluster-specific random effects given the density function parameters, as the score equations to be solved are similar to Equation (5). Giving a comprehensive overview of available approaches for random effect model estimation is outside the scope of this work: more general discussions can be found, for example, in Tuerlinckx et al. (2006) or Skrondal and Rabe-Hesketh (2004).
Besides its use for approximating the likelihood of a restricted subclass of random intercept models, LAGHQ might be useful to approximate the probabilities of the Poisson lognormal distribution (see Grundy 1951; Bulmer 1974). Similar to the negative binomial distribution, which can be construed as a Poisson-gamma mixture (that is, the parameter of the Poisson distribution is itself considered as a gamma-distributed random variable), the Poisson lognormal distribution corresponds to a Poisson distribution with a lognormally distributed parameter. It can be used to account for over- or under-dispersion in count data and has been used to model species abundance. The probability of observing r events from a Poisson lognormally distributed variable with parameters μ and σ is given by the formula

P(r; μ, σ) = 1/(r! σ √(2π)) ∫ exp(rw − exp(w) − (w − μ)²/(2σ²)) dw.
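A sketch of this computation (the function name and node count are ours): the mode of the integrand solves r − exp(w) − (w − μ)/σ² = 0, which the substitution t = μ + σ²r − w reduces to t exp(t) = σ² exp(μ + σ²r), so each probability can be evaluated by AGHQ without any numerical search:

```python
import numpy as np
from scipy.special import lambertw, gammaln

def dpoislnorm(r, mu, sigma, n_nodes=10):
    """Poisson lognormal probability P(r; mu, sigma), by AGHQ centered at
    the closed-form Lambert W mode of the integrand."""
    s2 = sigma**2
    # t*exp(t) = s2*exp(mu + s2*r), with t = mu + s2*r - w at the mode
    w_hat = mu + s2 * r - lambertw(s2 * np.exp(mu + s2 * r)).real
    scale = 1.0 / np.sqrt(np.exp(w_hat) + 1.0 / s2)      # 1/sqrt(-h''(w_hat))
    x, wq = np.polynomial.hermite.hermgauss(n_nodes)
    nodes = w_hat + np.sqrt(2.0) * scale * x
    h = r * nodes - np.exp(nodes) - (nodes - mu)**2 / (2 * s2)
    integral = np.sqrt(2.0) * scale * np.sum(wq * np.exp(x**2 + h))
    return integral / (np.exp(gammaln(r + 1)) * sigma * np.sqrt(2 * np.pi))

# Sanity check: the probabilities should sum to 1 (up to truncation at r = 60)
probs = [dpoislnorm(r, mu=1.0, sigma=0.5) for r in range(60)]
print(sum(probs))
```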
Once again, we find that this can be written in the form of Equation (8), with φ = r, θ = 1, and C = exp(μ) after centering the integration variable at μ. Further work would be needed to assess the usefulness of LAGHQ in this context. Finally, we would like to mention an extension of LAGHQ for multivariate integrals of the form

I = (1 / (|Σ|^(1/2) (2π)^(r/2))) ∫ exp(φᵀw − C exp(θᵀw) − (1/2) wᵀΣ⁻¹w) dw,   (10)

with C a positive number, φ and θ vectors in R^r, and Σ an r × r positive definite matrix. The details of this multivariate extension, based on standard multivariate AGHQ (see, e.g., Tuerlinckx et al. 2006), are given in Appendix III, supplementary materials. It should be noted that this formula cannot replace standard AGHQ for approximating the likelihood of shared frailty or Poisson models with multiple random effects because the cluster-specific marginal likelihood of such models is not of the form of Equation (10). However, we have identified several related problems in which this formula might be applied. For example, it can be used to compute the predicted marginal hazard rate and survival based on multivariate random effect models, using a multivariate extension of the results presented in Section 2.5. Moreover, the cluster-specific likelihood of certain types of joint models for longitudinal and time-to-event data might also be expressed in the form of Equation (10). Once again, further work is needed to investigate these approaches based on the multivariate LAGHQ-based approximation and compare them to existing methods.
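It is worth noting (a sketch based on our own derivation, not on the contents of Appendix III; all names are ours) that the mode needed for an integral of the multivariate form above again reduces to a scalar Lambert equation, because the stationarity condition Σ⁻¹w = φ − Cθ exp(θᵀw) forces the mode to lie on a line through Σφ in the direction Σθ:

```python
import numpy as np
from scipy.special import lambertw

def mv_mode(phi, theta, C, Sigma):
    """Mode of g(w) = phi'w - C*exp(theta'w) - 0.5*w'inv(Sigma)w.

    Stationarity gives w = Sigma @ (phi - C*exp(s)*theta) with s = theta'w;
    substituting back, s solves s = b - C*a*exp(s), where a = theta'Sigma theta
    and b = theta'Sigma phi, i.e., s = b - W0(C*a*exp(b)).
    """
    a = theta @ Sigma @ theta
    b = theta @ Sigma @ phi
    s = b - lambertw(C * a * np.exp(b)).real
    return Sigma @ (phi - C * np.exp(s) * theta)

phi = np.array([1.0, -0.5, 2.0])
theta = np.array([0.3, 1.0, 0.2])
C = 1.7
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.5, 0.2],
                  [0.1, 0.2, 1.0]])   # positive definite
w = mv_mode(phi, theta, C, Sigma)
grad = phi - C * theta * np.exp(theta @ w) - np.linalg.solve(Sigma, w)
print(np.max(np.abs(grad)))  # numerically zero: w is a stationary point
```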
To conclude, this work adds to the literature on uses of the Lambert function to solve practical problems in the life sciences. We concur with Lehtonen (2016) in believing the Lambert function might become part of the standard toolbox of applied mathematicians and statisticians.

Supplementary Materials
Appendices.pdf File containing appendices to the main text: (a) Expressions for the first- and second-order derivatives of the LAGHQ-based integral approximation; (b) Supplementary results related to the small-scale simulation of Section 3.1; (c) Multivariate LAGHQ. Illustration.R R script allowing the reproduction of the results shown in Section 3. It requires the installation of four R packages available on CRAN, namely, simsurv to generate the simulated time-to-event datasets, rsimsum to analyze the results of the simulations, paramsurv to access and analyze the insem dataset, and mexhaz to run random intercept flexible parametric hazard regression analyses with the AGHQ and LAGHQ methods.