Shrinkage Estimation Methods for Subgroup Analyses

Abstract Subgroup analyses are becoming increasingly important in pharmaceutical investigations. Conventional approaches to treatment effect estimation are controversial because of multiplicity and small sample sizes within the subsets. Hence, we consider shrinkage estimators, which combine the overall effect estimate with the estimate within a given subgroup using a Bayesian framework. This article gives a short introduction to two shrinkage estimation approaches proposed by Dixon and Simon and by Simon. Our key contribution is to present methodical extensions that enlarge their applicability and to provide solutions for some computational issues. Besides an application to a real dataset, we perform an extensive simulation study in which the conventional and the shrinkage approaches are compared under different models and scenarios of a typical clinical phase III design. The simulation results clearly show that the shrinkage approaches provide much better estimates than the conventional approaches in terms of mean square error and interval width under nearly all considered investigation cases. The advantages are particularly pronounced in the case of small sample sizes and low interaction effects. Some issues arise regarding the width and coverage probability of the credibility intervals for particular variants of the shrinkage estimators.


Introduction
Subgroup analyses are commonly and increasingly performed in confirmatory clinical trials, where the treatment effect is estimated in subsets of the overall trial population defined by certain patient characteristics. The appropriateness of subgroup-specific treatment effects, which essentially comprise only information from the patients belonging to the respective subset, is controversial because of multiplicity and small sample sizes within the subsets (e.g., Yusuf et al. 1991; Sleight 2000; Kleist 2007; Wang et al. 2007). A useful alternative to the conventional approaches are estimators of subgroup effects that take the overall effect estimate into account. Shrinkage estimators belong to this class because they combine the overall effect estimate with the estimate within a given subgroup. Thus, they offer a compromise between the assumption that there is no difference across the subgroups and the assumption that the subgroup effects are completely unrelated. By using information from the whole trial population rather than only from the respective subset, shrinkage estimators typically exhibit a higher bias but a lower variance than conventional subgroup-specific estimators. A general introduction to the concept and the advantages of shrinkage is provided by Lipsky et al. (2010). The potential usefulness of shrinkage estimators in bringing extreme chance findings into perspective is also noted in the European Medicines Agency (EMA) guideline on the investigation of subgroups in confirmatory clinical trials (2019). Moreover, in the EMA guideline on general principles
for planning and design of multi-regional clinical trials (2017), the application of shrinkage methods for the estimation of regional treatment effects is brought up, particularly for the case of small regional sample sizes and overly influential outlying values.
The implementation of shrinkage is usually carried out with the help of a Bayesian framework. In doing so, some form of prior distribution for the treatment-by-covariate interactions is assumed. Approaches by Dixon and Simon (1991) and Simon (2002) can be mentioned as examples of Bayesian shrinkage estimation.
An overview of various Bayesian procedures that can be used to examine heterogeneity in treatment effects is provided by Jones et al. (2011). The procedures outlined there were implemented in the R package beanz (Henderson et al. 2016; Wang et al. 2018). Comparing this package with the research presented in this article and its underlying implementations, two main differences emerge. 1. Differences in procedures: For example, the procedure of Simon (2002), which is very illustrative from an interpretive point of view and has been methodologically extended in our work, is not considered in the beanz package. 2. Differences in parameter settings: In our approach, parameters have to be set only in the procedure of Simon (2002), and these parameters are easy to interpret. With the shrinkage methods from the package beanz, on the other hand, parameter settings have to be made whose appropriate determination is difficult, especially for users with little experience with Bayesian methods, and which strongly influence the obtained results. One such parameter arises, for example, from the fact that in the beanz implementation of the procedure of Dixon and Simon (1991) the a priori standard deviation of the interaction parameters is modeled using a half-normal distribution, which requires an extra parameter. We avoid this by describing the a priori variance of the interaction parameters through a modified Jeffreys' prior. Also in the calculation of the shrinkage estimators, which in beanz, unlike our preference, is not performed analytically but by Markov chain Monte Carlo (MCMC) procedures, various settings have to be made (e.g., choice of the number of iterations and chains) that can negatively affect the quality of the resulting estimators, especially if the users lack experience.
We conducted extensive simulation studies. This allows us to draw conclusions about the scenarios under which the shrinkage methods we considered are superior to the classical subgroup estimators.
Beyond the approaches presented in Jones et al. (2011), there are further methods, for example, by Conti and Spezzaferri (2002), Liu et al. (2017), and Im and Tan (2021), which also make use of the benefits of Bayesian modeling in the context of subgroup analysis. Moreover, Nugent et al. (2019) review Bayesian and Bayesian decision-theoretic approaches to subgroup analysis with applications to clinical trials.
In the course of this article, we first present and extend the above-mentioned techniques by Dixon and Simon (1991) and Simon (2002), which differ in the prior variances of the interaction parameters. The former approach incorporates a noninformative second-level prior, concretely a modification of the Jeffreys' prior (Jeffreys 1946). In contrast, the latter uses an informative prior variance, which is determined through the probability of a medically important qualitative treatment effect. Both methods were originally defined for subgroup factors with two categories. We extend them to be applicable to factors with more than two categories with the aid of reparameterized modeling. Especially in the case of the Simon approach, some methodical adjustments are necessary for this purpose. Moreover, we overcome some computational issues concerning the implementation of the Dixon-Simon approach within the R package DSBayes by Varadhan and Yao (2014). As mentioned, we conduct a comprehensive simulation study in which the conventional and the shrinkage approaches are compared under different models and scenarios. Furthermore, to show the functionality of the shrinkage methods, we apply them to a real dataset in pulmonary arterial hypertension.
In Section 2, we first deal with the model set-up and present formulas for treatment effect estimation concerning univariate and multivariate subgroups. Following this, we introduce two conventional approaches and the mentioned shrinkage estimation methods including our extensions. Section 3 contains construction details and the results of our simulation study. Afterward, we continue with the introduction of the real dataset and the application of the methods in Section 4. Finally, we discuss our findings in Section 5.

Model Set-Up
We consider a two-armed clinical study with a continuous response variable Y which can be described by the following linear regression model:

$$Y = \mu + \tau T + \sum_{i=1}^{k} \alpha_i Z_i + \sum_{i=1}^{l} \beta_i X_i + \sum_{i=1}^{l} \gamma_i T X_i + e, \qquad e \sim N(0, \sigma^2). \quad (1)$$

In this context, the binary variable T indicates whether a patient belongs to the treatment (T = 1) or the placebo group (T = 0). Continuous covariates are denoted by $Z_i$ (i = 1, ..., k). Binary covariates, which are used for defining the subgroups, are represented by $X_i$ (i = 1, ..., l). Conveniently, all parameters are combined into a common vector $\theta = (\mu, \tau, \alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_l, \gamma_1, \ldots, \gamma_l)^T$. Particular importance is ascribed to the treatment-by-covariate interactions $\gamma_1, \ldots, \gamma_l$ because they are essential to the determination of subgroup effects.
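As a point of reference for the later sections, the following R sketch (our own illustration with hypothetical data and parameter values) fits a model of form (1) with one continuous covariate and two binary subgroup factors via lm(); the variable `treat` plays the role of the treatment indicator T.

```r
## Minimal sketch (hypothetical data): fitting a model of form (1) with lm().
## Z1 is a continuous covariate, X1 and X2 are binary subgroup factors;
## the treat:X terms correspond to the interaction parameters gamma.
set.seed(1)
n     <- 500
treat <- rbinom(n, 1, 0.5)
Z1    <- rnorm(n, 50, 10)
X1    <- rbinom(n, 1, 0.5)
X2    <- rbinom(n, 1, 0.5)
Y     <- 10 + 20 * treat + 0.1 * Z1 + 5 * X1 + 5 * X2 +
         15 * treat * X1 + rnorm(n, 0, 50)
dat   <- data.frame(Y, treat, Z1, X1, X2)
fit   <- lm(Y ~ treat + Z1 + X1 + X2 + treat:X1 + treat:X2, data = dat)
coef(fit)   # estimates of mu, tau, alpha1, beta1, beta2, gamma1, gamma2
```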

Univariate and Multivariate Subgroups
First, we have to deal with some terminology. As a univariate subgroup, we denote a subset of patients specified by only one single covariate (e.g., all women). In contrast, a multivariate subgroup is defined by the combination of two or more covariates (e.g., all women under 60 years without relevant preexisting conditions). The number of the considered covariates indicates the degree of the subgroup. In case the degree is equal to the number of binary covariates used to fit the regression model, we call the respective subset a subgroup of maximum degree.
In general, the treatment effect of a particular subgroup is calculated as the difference between the mean response of the treatment and the placebo group. Note that the covariates which are incorporated into the regression model but not used for defining the respective subgroup also have to be considered in this context. For a subgroup of degree n (n = 0, 1, ..., l), which can be described by $(X_i = m_i \; \forall i \in I)$ with $m_i \in \{0, 1\}$, $I \subseteq \{1, \ldots, l\}$ and $|I| = n$, the treatment effect is representable by

$$\tau + \sum_{i \in I} \gamma_i m_i + \sum_{i \notin I} \gamma_i \, P(X_i = 1 \mid X_j = m_j \; \forall j \in I). \quad (2)$$

Based on this, simplified formulas for special cases are easily deducible:

subgroups of maximum degree (i.e., n = l): $\tau + \sum_{i=1}^{l} \gamma_i m_i$, (3)

univariate subgroups (i.e., n = 1): $\tau + \gamma_i m_i + \sum_{j \neq i} \gamma_j \, P(X_j = 1 \mid X_i = m_i)$, (4)

whole study population (i.e., n = 0): $\tau + \sum_{i=1}^{l} \gamma_i \, P(X_i = 1)$. (5)
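The following hedged R sketch (the helper name and the column naming X1, X2, ... are our own assumptions) shows how the treatment effect of a univariate subgroup in the spirit of formula (4) could be computed from estimated coefficients, with the conditional prevalences estimated from the data.

```r
## Sketch (hypothetical helper): treatment effect of the univariate subgroup
## X_i = m_i according to formula (4). `tau` and `gamma` are the estimated
## treatment effect and interaction coefficients; the conditional prevalences
## of the remaining subgroup factors are estimated from the data.
subgroup_effect_uni <- function(tau, gamma, i, m_i, data) {
  others <- setdiff(seq_along(gamma), i)
  sub    <- data[data[[paste0("X", i)]] == m_i, , drop = FALSE]
  prev   <- vapply(others, function(j) mean(sub[[paste0("X", j)]]), numeric(1))
  tau + gamma[i] * m_i + sum(gamma[others] * prev)
}
```

For instance, with the hypothetical fit from above, `subgroup_effect_uni(coef(fit)["treat"], coef(fit)[c("treat:X1", "treat:X2")], i = 1, m_i = 1, dat)` would give the estimated effect in the subgroup X1 = 1.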

Conventional Estimators
In the following investigations, we consider two intuitive and commonly used estimators, which we call conventional estimators. Both approaches are based on the least squares method, according to which, for a regression model of the form $Y = R\theta + e$ with target variable Y, regressor matrix R, and error term e, the parameter vector θ can be estimated by $\hat{\theta} = (R^T R)^{-1} R^T Y$. Hence, the difference between the two conventional estimators does not lie in the estimation formula but in the set of observations entering the regression model. Whereas the so-called least squares estimator uses all patients to fit the regression model, the naïve estimator uses only the patients in the respective subgroup. In most situations, both techniques lead to identical or similar results. However, there are cases (e.g., multivariate subgroups of high degree) in which strong differences occur (see Section 3.3).
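To make the distinction concrete, here is a hedged R sketch (reusing the hypothetical `dat` from above and omitting Z1 for brevity) of the two conventional estimators for the subgroup X1 = 1.

```r
## Sketch: least squares vs. naive estimator for the subgroup X1 = 1.
## Both rely on ordinary least squares; they differ only in which
## observations enter the fit.
ls_fit    <- lm(Y ~ treat + X1 + X2 + treat:X1 + treat:X2, data = dat)
naive_fit <- lm(Y ~ treat + X2 + treat:X2, data = subset(dat, X1 == 1))

## Subgroup treatment effect for X1 = 1 in the spirit of formula (4):
p2        <- mean(subset(dat, X1 == 1)$X2)            # P(X2 = 1 | X1 = 1)
eff_ls    <- coef(ls_fit)["treat"] + coef(ls_fit)["treat:X1"] +
             coef(ls_fit)["treat:X2"] * p2
eff_naive <- coef(naive_fit)["treat"] + coef(naive_fit)["treat:X2"] * p2
```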

Methodical Introduction
The shrinkage estimation methods presented in this section are based on the fact that the least squares estimator $\hat{\theta}$, which results from fitting the regression model (1), is approximately normal:

$$\hat{\theta} \sim N(\theta, C),$$

where C is assumed to be a known and positive definite matrix. Unless very small subgroups or strong outliers occur, the normal distribution approximation can be considered uncritical.
For the parameter vector θ, a normal prior distribution is chosen with a zero mean vector (this means that we assume no interaction effects a priori) and a covariance matrix D. To populate D, flat priors are used for the regression parameters through the choice of variances which grow without limit. Moreover, for the moment, only independent priors are assumed for the interaction parameters. As a result, the limit of $D^{-1}$ is a diagonal matrix with main diagonal equal to $(0_{k+l+2}, 1/\xi_1^2, \ldots, 1/\xi_l^2)^T$, where $\xi_i^2$ denotes the prior variance of the interaction parameter $\gamma_i$. The specification of $\xi_i^2$ makes all the difference between the two shrinkage estimation methods of Simon (2002) and Dixon and Simon (1991).

Shrinkage Estimation by Simon

Simon (2002) considers the probability $\pi_i$ of a medically important qualitative interaction effect for the covariate $X_i$. Thus, $\pi_i$ is equal to the probability that the treatment effect of the subgroup $(X_i = 1, X_j = 0 \; \forall j \neq i$ with $j = 1, \ldots, l)$ is greater than δ given that the treatment effect of the subgroup $(X_j = 0 \; \forall j = 1, \ldots, l)$ is zero. Here, δ is chosen as a medically important treatment effect, for example, the value used for the sample size determination. It is worth noting that $\pi_i$ is a design parameter that needs to be specified based on previous knowledge. It follows that $\pi_i = P(\gamma_i \geq \delta)$.

Because of the normality and independence assumptions, $\xi_i^2$ is easily computable using the following term:

$$\xi_i^2 = \frac{\delta^2}{\left[\Phi^{-1}(1 - \pi_i)\right]^2}, \quad (6)$$

where Φ denotes the distribution function of the standard normal distribution. Note that formula (6) also arises through the comparison of the univariate subgroups $(X_i = 1)$ and $(X_i = 0)$, that is, without setting all other $X_j$ to zero, if we assume that

$$P(X_j = 1 \mid X_i = 1) = P(X_j = 1 \mid X_i = 0) \quad \text{for all } j \neq i. \quad (7)$$

This approach is more intuitive since we would like to consider a medically important qualitative interaction effect for the covariate $X_i$. Generally, we use the same value of $\pi_i = \pi$ and, therefore, the same prior variance $\xi_i^2 = \xi^2$ ($i = 1, \ldots, l$) for all binary covariates, but this is not essential. Apart from the determination of $\xi^2$, formula (6) is instrumental in understanding the influence of π on the amount of shrinkage. A small value of π leads to a large squared denominator and subsequently to a low variance $\xi^2$. This gives a high weight to the zero prior parameter location. Therefore, it is expected that for a small value of π the shrinkage estimator is close to the overall estimate.
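To make the roles of π and δ concrete, a one-line R sketch of formula (6) follows; the function name is ours.

```r
## Prior variance of an interaction parameter according to formula (6):
## a small pi yields a small xi^2 and hence strong shrinkage toward the
## overall estimate.
xi2_simon <- function(pi, delta) (delta / qnorm(1 - pi))^2

xi2_simon(pi = 0.05, delta = 20)   # prior variance for pi = 0.05, delta = 20
```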
Because of the chosen prior distribution, the posterior distribution of θ is also normal (Lindley and Smith 1972):

$$\theta \mid \hat{\theta} \sim N(Bb, B),$$

where $B^{-1} = C^{-1} + D^{-1}$ and $b = C^{-1}\hat{\theta}$. Note that the inverse of B, which is essential for the mean vector and the covariance matrix, is a combination of the (inverse) data variance C and the (inverse) prior variance D, which underlines the idea of shrinkage.
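A compact R sketch of this posterior computation follows; C and theta_hat are assumed to come from the fitted model, and, since the flat priors correspond to infinite prior variances, it is convenient to pass the limiting inverse D^{-1} directly (zeros for the regression parameters and 1/ξ_i² for the interactions). All names are ours.

```r
## Posterior of theta given the approximate normality of theta_hat:
## B^{-1} = C^{-1} + D^{-1}, b = C^{-1} theta_hat, posterior N(Bb, B).
posterior_theta <- function(theta_hat, C, D_inv) {
  B <- solve(solve(C) + D_inv)         # posterior covariance
  m <- B %*% solve(C) %*% theta_hat    # posterior mean (shrunken estimate)
  list(mean = drop(m), cov = B)
}
```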
Since we have to determine weighted sums of the parameters according to Section 2.2, we are especially interested in the posterior distribution of a linear combination $\eta = l^T \theta$:

$$\eta \mid \hat{\theta} \sim N(l^T B b, \; l^T B l).$$

To assess whether there is an extensive difference in the treatment effect of the complementary subgroups determined by $X_i$, we expand the Simon approach by introducing the measure $p_{X_i}$, which is defined via $F^{X_i}_{\mathrm{diff}}$, the posterior distribution function of the difference between the treatment effects of the subgroups $X_i = 1$ and $X_i = 0$. The linear combination required to determine $F^{X_i}_{\mathrm{diff}}$ equals the difference of the individual linear combinations of both subgroups.
Since $p_{X_i}$, despite its apparent similarity to a p-value, does not formally result from a statistical test, the determination of this quantity is not accompanied by an explicit test decision for or against a particular hypothesis. Accordingly, $p_{X_i}$ should be understood as an exploratory measure that suggests differences or not, but without claiming statistical significance.

Solution of Computational Issues
In the following, the distribution of $\hat{\theta} \mid \xi^2$ is a matter of particular interest because Dixon and Simon prove that it is sufficient to work with the dimensionally reduced distribution of $\hat{\gamma} \mid \xi^2$ to decrease the computational effort. Using the R package DSBayes, in which this reduction is implemented, we noticed that the resulting posterior density becomes improper in the case of small-sized models, that is, models with only one or two subgroup-defining covariates. We detected that the core of the problem is that the R base function integrate() cannot handle the suggested reduction. For that reason, we established a correction term c that adjusts for the reduced number of parameters and the smaller determinant, where V stands for the prior variances of the regression parameters, which grow without limit. We suggest using the product of c and $g(\hat{\gamma} \mid \xi^2)$ instead of $g(\hat{\theta} \mid \xi^2)$ to achieve proper results as well as a reduction in computing time. A short derivation of the correction term c is provided in the Appendix, supplementary materials.
A further measure to decrease the computational effort recommended by Dixon and Simon consists in the application of eigenvalue-eigenvector decompositions to avoid computations of inverses and determinants in the course of integration.

Extension to Non-binary Categorical Covariates
Both presented shrinkage methods are based on regression model (1), which only allows the usage of binary subgroup factors. Certainly, it is desirable to also be able to include subgroup factors with three or more categories. In the following, we start from the premise that there is one categorical covariate X with levels $x_1, \ldots, x_l$ (and no additional subgroup factors). A common approach to the binarization of such a covariate is to set a reference level and to introduce dummy variables for the other levels (e.g., Toutenburg 2003, p. 71). As will be illustrated with a corresponding example in Section 4, this practice, which we call ordinary modeling, leads to results that depend on the specification of the reference level. Therefore, it is unusable for practical applications, especially if there is, naturally, no clear reference level. Essential steps of a formal proof concerning the Simon approach in the case of a trichotomous covariate are presented in the Appendix, supplementary materials.
An alternative consists in the application of a reparameterization, as it is occasionally used for analysis-of-variance models (e.g., Toutenburg 2003, pp. 197 f., 256). We propose this modeling strategy in the present case because the necessity to choose a reference level is removed through the introduction of one dummy variable for each level. Given a categorical variable X with l levels $x_1, \ldots, x_l$, l binary dummy variables $D_1, \ldots, D_l$ are introduced, defined such that $D_i$ equals one when $X = x_i$ and zero otherwise. Let $Y = R\theta + e$ denote the matrix notation of the linear regression model. In comparison to the ordinary modeling, two additional parameters occur within the parameter vector θ, with the result that the matrix of regressors R does not have full column rank. To counteract the overparameterization we introduce the restrictions

$$\sum_{i=1}^{l} \beta_i = 0 \quad \text{and} \quad \sum_{i=1}^{l} \gamma_i = 0,$$

which are represented by the matrix notation $H\theta = h$ with $h = (0, 0)^T$. Then, the parameter vector θ can be estimated by restricted least squares under the constraint $H\theta = h$ (Caspary and Wichmann 1994, p. 257). In this case the formulation of the treatment effects is clearly simplified:

univariate subgroup ($X = x_j$): $\tau + \gamma_j$,

whole study population: $\tau + \sum_{j=1}^{l} \gamma_j \, P(X = x_j)$.

Reparameterized models are also applicable in the case of several binary subgroup factors even though there is no necessity for their use. Provided that the binary covariate $X_i$ is represented through the dummy variables $D_{2i-1}$ (for $X_i = 1$) and $D_{2i}$ (for $X_i = 0$) for $i = 1, \ldots, l$, we require the restrictions

$$\beta_{2i-1} + \beta_{2i} = 0 \quad \text{and} \quad \gamma_{2i-1} + \gamma_{2i} = 0 \quad (i = 1, \ldots, l).$$

The computation of the treatment effects takes place employing formulas (2)-(5), which have to be adjusted regarding the occurrence of the dummy variables and the increased number of parameters.
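To make the reparameterization concrete, the following R sketch (our own illustration, with hypothetical data) builds the l dummy variables and the restriction matrix H for a trichotomous factor; the parameter ordering (μ, τ, β₁, β₂, β₃, γ₁, γ₂, γ₃) is an assumption of the sketch.

```r
## Sketch: reparameterized design for one trichotomous covariate X
## (one dummy variable per level) and the restriction matrix H encoding
## the sum-to-zero constraints on the betas and gammas.
X     <- factor(sample(c("x1", "x2", "x3"), 300, replace = TRUE))
treat <- rbinom(300, 1, 0.5)
D     <- sapply(levels(X), function(lev) as.numeric(X == lev))   # D1, D2, D3

R <- cbind(1, treat, D, treat * D)        # regressor matrix (rank deficient)

H <- rbind(c(0, 0, 1, 1, 1, 0, 0, 0),     # sum of betas  = 0
           c(0, 0, 0, 0, 0, 1, 1, 1))     # sum of gammas = 0
h <- c(0, 0)
```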
To apply reparameterized models to the Simon approach, some modifications are necessary. Apart from the use of the generalized inverse instead of the ordinary inverse, we have to use a revised term for the calculation of the (uniform) prior variance $\xi^2$ (cf. formula (6)).
In the following, we present the procedure to determine $\xi^2$ in the context of one categorical subgroup factor with l levels which is modeled in the reparameterized way. In the first step, we have to define the meaning of a qualitative interaction effect in the case of non-binary subgroup factors. For reasons of mathematical simplification, we only choose the condition that there is no treatment effect in another certain subgroup. The probability $\pi_i$ can then be described by

$$\pi_i = P\left(\tau + \gamma_i \geq \delta \mid \tau + \gamma_j = 0 \text{ for at least one } j \neq i\right).$$

Moreover, it is important to notice that the prior distribution of θ comprises certain dependencies between the parameters due to the specified restrictions. Concretely, to determine the prior correlations between the interaction parameters, we have to solve two systems of linear equations with l equations and l(l − 1)/2 unknowns (see the derivation in the Appendix, supplementary materials). In the case of l = 2 (all correlations equal to −1) and l = 3 (all correlations equal to −0.5) the systems have unique solutions.
Under the assumption that all correlations are identical, we can also solve the systems of equations for l ≥ 4 unequivocally:

$$c(l) = -\frac{1}{l-1}.$$

Therefore, the prior distribution of θ is representable by a normal distribution with zero mean vector whose covariance matrix assigns the variance M to the regression parameters and the block $\xi^2 C$ to the interaction parameters, where M stands for the prior variance of the regression parameters, which grows without limit, and the l×l dimensional matrix $C = (c_{ij})_{i,j=1,\ldots,l}$ is defined by $c_{ij} = 1$ for $i = j$ and $c_{ij} = c(l)$ for $i \neq j$.
Due to the properties of the normal distribution, the joint distribution and the conditional distribution of the treatment effects are also normal. As a result the (uniform) prior variance $\xi^2$ is computable as

$$\xi^2 = \frac{l-1}{2l} \cdot \frac{\delta^2}{\left[\Phi^{-1}(1 - \pi)\right]^2} \quad (10)$$

and hence dependent on the number of levels of the categorical covariate. Compared to ordinary modeling (see formula (6)), $\xi^2$ is reduced by reparameterization. For example, if we consider the case l = 2, in ordinary modeling there is one free interaction parameter, while in reparameterized modeling there are two free interaction parameters whose sum must equal zero. That these two parameters then have lower variability due to the restriction seems reasonable. Specifically, the variance $\xi^2$ from (10) is reduced by a factor of 4 compared to (6), or ξ as a standard deviation by a factor of 2. This seems plausible if one parameter is split into two.
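A small R sketch of the reparameterized prior variance follows, based on formula (10) as reconstructed above (the factor (l − 1)/(2l) reproduces the stated factor-of-4 reduction at l = 2); the function name is ours.

```r
## Prior variance under reparameterized modeling, formula (10):
## for l = 2 this equals one quarter of the value from formula (6).
xi2_simon_repara <- function(pi, delta, l) {
  (l - 1) / (2 * l) * (delta / qnorm(1 - pi))^2
}

xi2_simon_repara(pi = 0.05, delta = 20, l = 2)   # = xi2_simon(0.05, 20) / 4
xi2_simon_repara(pi = 0.05, delta = 20, l = 3)
```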
As already mentioned, reparameterized models are also applicable in the case of several subgroup factors with two levels. Under the assumptions regarding the conditional prevalences of the covariates (see (7)), $\xi^2$ can be determined by using formula (10) with l = 2 (due to the binary nature of the covariates). It should be explicitly pointed out that the number of covariates affects the number of assumptions, though it does not enter the calculation of $\xi^2$.
As a last point regarding the Simon approach, it should be mentioned that the measure $p_{X_i}$ is only defined for binary subgroup factors. In the case of a non-binary categorical covariate X, we consider all pairwise combinations of levels and call the respective measure $p_{X_{i,j}}$ (comparison of $x_i$ and $x_j$). From a methodical point of view, it is, in principle, possible to apply reparameterized models also to the Dixon-Simon approach, provided that the generalized inverse is used again. However, the problem here is that all subgroup estimates get very close to the overall estimate due to an excessive shrinkage effect. As a consequence, it is hardly possible to differentiate between the subgroups even if there is a setting where the conventional estimates deviate widely from each other. Thus, it can be stated that there is no substantial benefit of using reparameterized models in the context of the Dixon-Simon approach. To get suitable results also in the case of a non-binary categorical covariate, we suggest running the method once with each reference level using the ordinary modeling strategy and averaging afterward.

Simulation Study
To explore the presented estimation methods in detail we conduct an extensive simulation study, which is partially based on a recent simulation study by Varadhan and Wang (2016). The main objective is to compare conventional and shrinkage estimators under a setting that is geared to a typical phase III study with a continuous endpoint. The implementation of the methods and all calculations referred to in Sections 3 and 4 are performed with the aid of R 3.3.1 (R Core Team 2016).

Data Generation
We choose for the simulations a two-arm study with a continuous response variable and generate data from the following two models:

$$Y = \mu + \tau T + \sum_{i=1}^{3} \beta_i X_i + \sum_{i=1}^{3} \gamma_i T X_i + e \quad \text{(model type 1)},$$

$$Y = \mu + \tau T + \sum_{i=1}^{3} \beta_i D_i + \sum_{i=1}^{3} \gamma_i T D_i + e \quad \text{(model type 2)},$$

with $e \sim N(0, \sigma^2)$.
Type 1 includes three binary subgroup factors, which allows us to look at univariate and multivariate subgroups and to investigate the influence of sparse modeling (the model contains only subgroup factors) and extensive modeling (several covariates which are not used as subgroup factors are incorporated into the model). In contrast, type 2 considers one trichotomous covariate X which is modeled through the dummy variables $D_i$ with $D_i = 1_{\{X = x_i\}}$ (for i = 1, 2, 3). In both cases we set the parameters to the following values: μ = 10, $\beta_i$ = 5 (for i = 1, 2, 3), and σ = 50. The remaining parameters τ and $\gamma_i$ (for i = 1, 2, 3) are fixed depending on the respective scenario. Beyond that, we vary the overall sample size (N) of the generated dataset and, concerning the subgroup factors, the prevalences of the levels and the correlations. All data generating scenarios of model type 1 are represented in Table 1. Starting from a base case scenario (scenario 1) with a medium overall sample size (88% power for a two-sided test based on an overall effect of 20 and a standard deviation of 50), mid-level prevalences, no correlations, a medium value for τ, and differing interactions, we initially change the value of one component (variation scenarios). Afterward, we aim at finding joint effects and interdependencies through the simultaneous modification of two components (combination scenarios).
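As an illustration, the following R sketch (our own, with assumed prevalences of 0.5 and assumed values for τ and the interactions) generates one dataset under model type 1 with independent binary subgroup factors; correlated covariates would require a joint binary distribution and are omitted here.

```r
## Sketch: one simulated dataset under model type 1 (independent binary
## subgroup factors; mu, beta, sigma set to the values used in the study).
simulate_type1 <- function(N, tau, gamma, prev = c(0.5, 0.5, 0.5),
                           mu = 10, beta = c(5, 5, 5), sigma = 50) {
  treat <- rbinom(N, 1, 0.5)
  X     <- sapply(prev, function(p) rbinom(N, 1, p))      # N x 3 matrix
  e     <- rnorm(N, 0, sigma)
  Y     <- mu + tau * treat + X %*% beta + (treat * X) %*% gamma + e
  data.frame(Y = as.vector(Y), treat = treat,
             X1 = X[, 1], X2 = X[, 2], X3 = X[, 3])
}

dat1 <- simulate_type1(N = 500, tau = 20, gamma = c(20, 10, 0))
```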
Concerning the prevalences and correlations, it should be noted that the following condition has to be met (see the derivation in the Appendix, supplementary materials):

$$\operatorname{corr}(X_i, X_j) \leq \frac{\min\{p_i, p_j\} - p_i p_j}{\sqrt{p_i (1 - p_i) \, p_j (1 - p_j)}}$$

with $p_i = P(X_i = 1)$ and $p_j = P(X_j = 1)$.
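A small helper illustrating this bound (the function name is ours):

```r
## Upper bound on the correlation of two binary covariates with
## prevalences p_i and p_j.
max_corr_binary <- function(p_i, p_j) {
  (min(p_i, p_j) - p_i * p_j) / sqrt(p_i * (1 - p_i) * p_j * (1 - p_j))
}

max_corr_binary(0.5, 0.5)   # = 1:   any correlation up to 1 is feasible
max_corr_binary(0.2, 0.5)   # = 0.5: a correlation of 0.6 would be infeasible
```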
Table 1. Overview of all data generating scenarios regarding model type 1.
In the case of model type 2, similar scenarios are considered. Because the sum of the prevalences $(P(X = x_1), P(X = x_2), P(X = x_3))$ has to be equal to 1, we divide each prevalence by the sum. Moreover, it is impossible to incorporate correlations under model type 2, so it is adequate to examine scenarios 1-4 and 6-11 (see Table 1).

Evaluation Models and Estimators
To explore the impact of sparse and extensive modeling regarding model type 1, we look at seven different models for the evaluation of the generated data, which we call evaluation models. Three of them contain only one binary covariate, a further three contain two binary covariates, and the seventh contains all three covariates (full evaluation model). Please note the difference in meaning between the evaluation models and the data generating models described in Section 3.1. For each evaluation model, we determine estimates for all univariate subgroups and multivariate subgroups of maximum degree. For model type 2 we solely regard the full evaluation model (all three dummy variables). Since the dummy variables are based on the same covariate, it is only possible to look at univariate subgroups in this case.
For each mentioned subgroup we determine estimates using the naïve, the least squares, the Dixon-Simon, and the Simon approach. A special focus lies on the latter as a consequence of the high impact of the parameter π. Specifically, we compute Simon estimators with π = 0.01, π = 0.05, and π = 0.20 (in combination with δ = 20). Hence, for each subgroup, we consider nine estimates (naïve estimate, least squares estimate, Dixon-Simon estimate, three ordinarily modeled Simon estimates, three reparameterized modeled Simon estimates) in the case of model type 1 and six estimates (naïve estimate, least squares estimate, Dixon-Simon estimate, three reparameterized modeled Simon estimates) in the case of model type 2. Due to the several evaluation models, we look in general at 396 (18) subgroup estimates per scenario of model type 1 (2). In addition to this, we compute a confidence interval (naïve, least squares approach) or a credibility interval (Simon, Dixon-Simon approach) for each point estimate at a level of 0.95. To compare the point and interval estimates with respect to their goodness, we consider the mean square error (MSE), the interval width, and the coverage probability. Moreover, we always look at the measures $p_{X_i}$ or $p_{X_{i,j}}$ in the course of the Simon approach. Motivated by p-values, we determine whether the computed values are less than or equal to 0.05.
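For concreteness, here is a hedged R sketch of how the evaluation criteria could be aggregated over simulation replications; the input layout (one row per replication with point estimate and interval limits) is our own assumption.

```r
## Sketch: MSE, mean interval width, and coverage probability for one
## estimator/subgroup combination, given the true effect and a data frame
## with columns est, lower, upper (one row per simulated dataset).
evaluate_estimator <- function(res, true_effect) {
  c(mse      = mean((res$est - true_effect)^2),
    width    = mean(res$upper - res$lower),
    coverage = mean(res$lower <= true_effect & true_effect <= res$upper))
}
```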
We generate 1000 datasets per scenario, which ensures high reliability. To conduct the simulation study, we use a Windows 2008 Server with 40 CPUs (80 logical processors through Hyper-Threading) and 32 GB of working memory. Through parallel computing (40 cores), the required time per scenario of model type 1 (2) is about eight (two) hours.

Results
In the following, we present the most important results of our simulation study. First, it should be mentioned that the shrinkage methods provide much better results than the conventional approaches regarding the mean square error and the interval width under nearly all considered investigation cases. The shrinkage estimators usually have a higher bias, but also a lower variance, compared to the conventional estimators. Since the variance advantage outweighs the bias disadvantage in the context of the considered scenarios, the shrinkage estimators outperform the conventional estimators.
To illustrate the superiority of the shrinkage estimators we exemplarily use Figure 1, which refers to the base case scenario (scenario 1) of model type 1. As indicated in Table 1, we have chosen high interaction effects for $X_1$, medium interaction effects for $X_2$, and no interaction effects for $X_3$ under this scenario. It becomes obvious that the shrinkage methods have particular advantages for subgroups with a low interaction effect. This seems natural because of the construction idea and the mathematical design of the shrinkage estimators.
Figure 1. Simulation-based comparison of the conventional and the shrinkage approaches regarding the MSE under the base case scenario (scenario 1, model type 1); a special focus lies on the consideration of different univariate subgroups which differ in the applied interaction effect; all subgroup estimates are calculated with the aid of the evaluation model which only contains the respective subgroup factor; the attached short forms "ordina" and "repara" stand for ordinary and reparameterized modeling, respectively.

Figure 2. Simulation-based comparison of the conventional and the shrinkage approaches regarding the MSE under different overall sample sizes (N) according to scenarios 1 to 3 (model type 1); a special focus lies on the contrast between the univariate subgroups $X_1 = 1$ and $X_3 = 1$, which distinctly differ in the applied interaction effect; all subgroup estimates are calculated with the aid of the evaluation model which only contains the respective subgroup factor; the attached short forms "ordina" and "repara" stand for ordinary and reparameterized modeling, respectively.

An important point regarding the question of whether it is better to use a conventional or a shrinkage estimator is the sample size (scenarios 2 and 3). Figure 2 shows that the performances of all estimators, but especially of the conventional estimators, are highly dependent on the number of patients within the study. It is intuitively clear that the conventional estimators lead to the best results in the case of very large sample sizes because the variances become negligible. However, the boundary from which the conventional estimates outdo the shrinkage estimators is not yet reached under a typical phase III setting with 500 patients. It should be noted that the sample size of particular subgroups also depends on the chosen prevalences (scenario 4) and correlations (scenario 5).
As already mentioned, we regard seven different evaluation models per scenario of model type 1. In most cases, the impact of the included covariates which do not function as subgroup factors is rather slight. However, there is an exception related to scenario 5, where we incorporate a relatively high correlation between $X_1$ and $X_2$ into the data-generating model. Figure 3 indicates for the shrinkage approaches that it is worthwhile to include covariates that are correlated with the respective subgroup factor. Such an effect is not identifiable for the conventional estimators.
The variations of the interactions (scenarios 6 and 7) have the expected impact on the concerned subgroups such that a reduction of the applied interactions favors the shrinkage estimators in comparison to the conventional estimators. By way of contrast, there is no appreciable impact of the chosen value of τ (scenarios 8 and 9). The simultaneous variations of two components (combination scenarios 10-14) basically result in cumulative effects, that is, noteworthy interdependencies between the components do not occur.
Figure 3. Simulation-based comparison of the different evaluation models (the included covariates are indicated on the horizontal axis) concerning the shrinkage estimates for the univariate subgroup $X_2 = 1$; the illustration is based on scenario 5 of model type 1, which especially means that there is a correlation of 0.4 between $X_1$ and $X_2$; the attached short forms "ordina" and "repara" stand for ordinary and reparameterized modeling, respectively.

Figure 4. Simulation-based comparison of the conventional and the shrinkage approaches regarding the MSE concerning multivariate subgroups under the base case scenario (scenario 1, model type 1); a special intention is to show the difference between the naïve and the least squares approach; the underlying evaluation model inevitably contains all three subgroup factors; the attached short forms "ordina" and "repara" stand for ordinary and reparameterized modeling, respectively.

In the course of the simulation study, we have seen that both conventional estimates are usually similar or identical. An exception is constituted by multivariate subgroups. Considering Figure 4, it becomes apparent that the least squares estimator is much more suitable for multivariate subgroups than the naïve estimator. The graphic also underlines the assertion that the shrinkage estimators reach even better results in the case of low interaction effects (see subgroups $(X_1 = 0, X_2 = 1, X_3 = 1)$ and $(X_1 = 1, X_2 = 0, X_3 = 1)$) and small sample sizes, as they inevitably occur within multivariate subgroups.
The strength of the shrinkage approaches does not only become apparent regarding the mean square error but also through the consideration of the interval estimates, which are noticeably narrower in most cases (see Tables 2 and 3 in the Appendix, supplementary materials). Under the base case scenario (scenario 1, model type 1), the interval reduction of the Dixon-Simon and Simon estimates (with suitable π) compared to the least squares estimates (naïve estimates) is about 20% (20%) for univariate subgroups and more than 30% (50%) for multivariate subgroups of the third degree.
Despite the advantages of the shrinkage estimators, there are also some problems that become evident considering Tables 2 and 3 (Appendix, supplementary materials). First, it should be noted that there are some issues with the coverage probability in the case of large interaction effects concerning the Dixon-Simon approach and the Simon approach using a small value of π. Under the base case scenario (scenario 1, model type 1), the coverage probability is, even for the critical interval estimates, not less than 0.87 (univariate subgroups) and 0.81 (multivariate subgroups), respectively. The lowest coverage probabilities (slightly above 0.50) occur for multivariate subgroups under scenario 7 due to the combination of high interaction effects for $X_1$, $X_2$, and $X_3$. Beyond the amount of interaction, the coverage probability of the Dixon-Simon interval is negatively influenced by large sample sizes and interactions in the positive direction (e.g., the coverage probability regarding $X_1 = 1$ is almost always lower than regarding $X_1 = 0$). Furthermore, the reparameterized version of the Simon approach using a large value of π leads to immoderately wide credibility intervals because of the overestimation of the variance. Concretely, the Simon intervals (reparameterized) for π = 0.20 are 75% to 95% wider than the naïve intervals for univariate subgroups; for multivariate subgroups both are of the same order of magnitude.
In addition to this, there is the problem that the credibility interval of the Dixon-Simon approach cannot be determined in a few cases (nearly 2% of all applications). However, this seems to be an implementation rather than a methodical issue.
All mentioned results, except for the assertions regarding the correlations, also pertain to model type 2.
In view of $p_{X_i}$ and $p_{X_{i,j}}$, the simulation study shows that these measures basically do what they are supposed to do, although it is extremely hard to detect differences between (complementary) subgroups. For this purpose, large sample sizes are necessary.

Application to a Real Dataset
To get a better impression of the functionality of the presented shrinkage approaches we apply them to a real dataset.

Dataset
The dataset comes from the phase III, double-blind study PATENT-1 (Ghofrani et al. 2013), which was funded by Bayer HealthCare.It was designed to examine the efficacy and safety of the drug riociguat in patients with pulmonary arterial hypertension (PAH).
The dataset includes 380 patients, of whom 254 received riociguat with a maximum dose of 2.5 mg and 126 received a placebo. The primary efficacy endpoint is exercise capacity, which is measured by the change from baseline to week 12 in the six-minute walking distance (6MWD). The 6MWD baseline performance is always included in the regression model as a continuous covariate. Furthermore, we consider sex, the WHO functional class (a means of classifying disease severity in PAH), the binarized baseline performance (with cutpoint 320 meters), and the pulmonary arterial hypertension classification (based on etiology) as potential subgroup factors.

Some Basic Applications
In a first step, we show through a simple model, where the WHO functional class is the sole binary covariate, to what extent the shrinkage estimators differ from the conventional estimators (see Figure 5). It becomes apparent that the Dixon-Simon estimates lie roughly in the middle between the conventional estimates and the overall estimate. The Simon estimates are also located within this range but are highly dependent on the value of π. As we expected based on formula (6), for a small π the Simon estimates are close to the overall estimate and for a large π they approach the conventional estimates. Moreover, the conventional estimates are very close together. An important aspect that is not evident from Figure 5 is that the generated credibility intervals of the Simon approach are much narrower than the confidence intervals of the conventional procedures. This finding results from the lower variances.
The measure $p_{X_i}$ supplies a value of 0.068 for the illustrated example, which suggests that there might be a difference between the two complementary subgroups defined by the WHO functional class, whereas the least squares approach detects a highly significant interaction (p = 0.005).
Besides the WHO functional class, it is possible to incorporate several binary covariates into the regression model without using them as subgroup factors. The impact on the estimates, however, is relatively low, no matter whether we introduce a correlated covariate (binarized baseline performance), an uncorrelated covariate (sex), or both.
Taking several covariates into account, there is the possibility to look at multivariate subgroups. In this context, it becomes apparent that in some cases the conventional approaches, especially the naïve approach, provide very extreme estimates with large variances driven by a few patients. By way of contrast, the shrinkage methods lead to much more conservative results because they do not rely only on the patients belonging to the respective subset, but borrow strength from the whole study population. Therefore, the estimates are much closer to the overall estimate, have noticeably lower variances, and prevent overly rash conclusions. Figure 6 underlines these assertions with the aid of two exemplary multivariate subgroups of the third degree.

As mentioned in Section 2.6, the results of the shrinkage approaches depend on the specification of the reference level in the case of one categorical subgroup factor with more than two categories which is modeled ordinarily. Figure 7 provides an example based on the pulmonary arterial hypertension classification. If the results were independent of the reference level, all lines would be parallel to the horizontal axis. Not only is this not the case; the lines even intersect at some points, which means that we get not only different estimates but also different rankings. Consequently, we need the alternative modeling strategy presented in Section 2.6. Regarding the Simon approach, we consider reparameterized models, which lead to slightly intensified shrinkage effects, whereby the estimates shift a bit closer to the overall estimate. This applies to models with one non-binary and several binary subgroup factors as well.
For the Dixon-Simon approach, the reparameterized modeling results in estimates which are so close together that any differentiation between the subgroups is nearly impossible. As an example, we refer to Figure 5, where the difference between the subgroups amounts to nearly 50 meters for the conventional estimators and more than 20 meters for the Dixon-Simon estimator (ordinary modeling). Using a reparameterized model, the Dixon-Simon approach yields a difference of fewer than two meters. Hence, in the case of one non-binary categorical covariate, we suggest running the ordinary modeling procedure once with each reference level and averaging afterward.

Discussion
In this article, we have clearly shown the advantages of two shrinkage approaches, which incorporate information from the overall population to compensate for issues caused by small sample sizes and multiplicity. Such a "borrowing strength" strategy is also of particular importance in the context of post-hoc analyses and subgroup factors whose consideration is not based on biological or clinical rationale. It should be pointed out once again that the choice of the best estimation method depends on the respective setting. Thus, the quality of the Simon estimator depends on a suitable choice of the design parameter π, which has to be specified as well as possible using prior information. It is intuitive that the higher the expected subgroup-specific effect, the higher the chosen value of π should ordinarily be. According to our simulation study, the Dixon-Simon approach works particularly well for subgroups with a non-extreme interaction effect.
If there are, in fact, subgroup-specific treatment effects, there is a boundary concerning the overall and subgroup-specific sample sizes (e.g., dependent on the dispersion within the data) from which the conventional approaches should be preferred.Under the scenarios of our simulation study, which has been oriented toward a typical phase III study with a continuous outcome, this boundary has not yet been reached.
To provide some assistance regarding practical usage, we briefly discuss which choice of π is suitable as the default setting for the Simon approach. In the course of the simulation study, it has been shown that a value of π = 0.05 provides (very) good results under all considered investigation cases and does not run into the above-mentioned issues regarding the coverage probability and the interval width. Therefore, in case a user does not have a clear idea of the specification of π due to a lack of prior information, a value of π = 0.05 is appropriate. Such a recommendation is very useful, especially for confirmatory clinical studies where details have to be specified in advance.
In this article, we proposed the usage of reparameterized models in the case of a non-binary subgroup factor. However, provided that the choice of the reference level is unambiguous, it might be worth considering the ordinary modeling strategy despite knowing that the results depend on the reference level.
Furthermore, due to the complexity of the sum-to-zero constraint, it would also be conceivable to use a simpler form of reparameterization based on contrasts, as applied in the R function contr.sum. Here a categorical variable X with the realizations $x_1$, $x_2$, and $x_3$ would be modeled by two dummy variables $D_1$ and $D_2$ with $D_1 = 1, D_2 = 0$ for $X = x_1$; $D_1 = 0, D_2 = 1$ for $X = x_2$; and $D_1 = D_2 = -1$ for $X = x_3$. Even if no reference level in the classical sense would have to be defined with this form of modeling, there would nevertheless be a level (in the above case $x_3$) which takes the value −1 on all dummy variables and thus has a kind of special status in comparison to the other levels. Whether the choice of this level influences the later results in the same way as was observed for the choice of the reference level in ordinary modeling would have to be investigated in more detail.
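For reference, the sum-contrast coding referred to here can be inspected directly in R; the model call in the last line is only a sketch with a hypothetical data frame.

```r
## Sum contrasts for a three-level factor: each column is one dummy
## variable; the last level (here the third) is coded -1 in every column.
contr.sum(3)
#   [,1] [,2]
# 1    1    0
# 2    0    1
# 3   -1   -1

## Using it in a model formula (hypothetical data frame `dat2` with factor X):
# fit <- lm(Y ~ treat * X, data = dat2, contrasts = list(X = "contr.sum"))
```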
Beyond the comprehensive investigation of the shrinkage estimators, our simulation study has also shown an interesting result for proponents of frequentist analysis: It has become apparent that the so-called least squares approach is preferable to the naïve approach because of the obvious advantages concerning multivariate subgroups.
Furthermore, the simulation study has brought up some issues which will be investigated further. Regarding the wide credibility intervals provided by the reparameterized version of the Simon approach in the case of a large value of π, it might be an alternative to use the same modeling strategy as for the Dixon-Simon approach (ordinary modeling procedure once with each reference level, afterward averaging). Another aspect that should be considered as part of future investigations concerns the questionable assumptions regarding the conditional prevalences of the covariates (see (7)).
Finally, we mention some options for model-related extensions. Instead of a linear regression model, a logistic regression model or a Cox model can also serve as the basis of the presented methods. In addition, it would be interesting to examine models with several non-binary subgroup factors and to incorporate interactions of higher dimensions, which come into effect through the interplay of several covariates. While the focus of our work so far has been on the investigation of categorical subgroup variables, it would be worthwhile in the future to also consider relations between continuous variables and the treatment variable by integrating corresponding interactions. Concerning the reparameterized models, it might be reasonable to incorporate the subgroup-specific sample sizes within the linear restrictions, that is, $\sum_{i=1}^{l} n_i \beta_i = 0$ and $\sum_{i=1}^{l} n_i \gamma_i = 0$ (Toutenburg 2003, pp. 197 f.), especially in the case of unbalanced data.

Figure 5. Introductory comparison of the conventional and the shrinkage approaches based on the real dataset (Section 4); the illustrated estimates refer to the binary subgroup factor WHO functional class; a special focus lies on the influence of π on the amount of shrinkage concerning the Simon approach; the horizontal gray line represents the overall estimate.

Figure 6. Conventional and shrinkage estimates for two multivariate subgroups of the third degree based on the real dataset (Section 4); the subgroups are defined by the WHO functional class, sex, and the binarized baseline performance; the box size is proportional to the precision of the corresponding estimate; beyond each point estimate a confidence (naïve, least squares approach) or a credibility interval (Simon, Dixon-Simon approach) is plotted; the vertical gray line represents the overall estimate.

Figure 7. Impact of the chosen reference level on the shrinkage estimates based on the real dataset (Section 4); the pulmonary arterial hypertension classification is used as a trichotomous subgroup factor; different line types represent the three subgroups whereas the reference level is illustrated on the horizontal axis; "CTD" stands for connective tissue disease.