A Bayesian model for combining standardized mean differences and odds ratios in the same meta-analysis

ABSTRACT In meta-analysis practice, researchers frequently face studies that report the same outcome differently, such as a continuous variable (e.g., scores for rating depression) or a binary variable (e.g., counts of patients with depression dichotomized by certain latent and unreported depression scores). For combining these two types of studies in the same analysis, a simple conversion method has been widely used to handle standardized mean differences (SMDs) and odds ratios (ORs). This conventional method uses a linear function connecting the SMD and log OR; it assumes logistic distributions for (latent) continuous measures. However, the normality assumption is more commonly used for continuous measures, and the conventional method may be inaccurate when effect sizes are large or cutoff values for dichotomizing binary events are extreme (leading to rare events). This article proposes a Bayesian hierarchical model to synthesize SMDs and ORs without using the conventional conversion method. This model assumes exact likelihoods for continuous and binary outcome measures, which account for full uncertainties in the synthesized results. We performed simulation studies to compare the performance of the conventional and Bayesian methods in various settings. The Bayesian method generally produced less biased results with smaller mean squared errors and higher coverage probabilities than the conventional method in most cases. Nevertheless, this superior performance depended on the normality assumption for continuous measures; the Bayesian method could lead to nonignorable biases for non-normal data. In addition, we used two case studies to illustrate the proposed Bayesian method in real-world settings.


Introduction
Meta-analysis is a widely used statistical tool to combine findings from multiple existing studies that address common research questions in a systematic review (Borenstein et al. 2009; Gurevitch et al. 2018; Murad et al. 2014). It provides more precise estimates than individual studies, and these estimates usually warrant higher certainty (Hultcrantz et al. 2017). To avoid potential bias, the selection of studies in a meta-analysis is based on pre-specified inclusion and exclusion criteria (e.g., relevance to the research question, study quality, etc.) that do not depend on study results (e.g., effect measures reported by the studies) (Moher et al. 2009; Seidler et al. 2019). Conventional statistical methods for meta-analysis typically assume that a specific effect measure is consistently estimated across all studies; each study is supposed to report a point estimate of the effect measure and its sample variance (DerSimonian and Laird 1986, 2015). However, these assumptions are not always true in practice.
A common scenario of inconsistent reporting is that, for the same disease outcome, some studies report results as continuous measures, but other studies report results as binary measures, which are obtained by dichotomizing certain latent, unreported continuous measures. In the first type of studies, the reported continuous outcomes could be measured on different scales. For example, in a meta-analysis investigating the efficacy of antidepressants (detailed in Section 4.1) (Cipriani et al. 2016), the collected randomized controlled trials (RCTs) reporting continuous outcome measures use various scoring tools to evaluate major depressive disorders. These scoring tools include the Children's Depression Rating Scale Revised, Beck Depression Inventory, and Children's Depression Inventory, among others.
In the second type of studies reporting binary outcomes, the measures are typically counts of events, which are defined by applying certain cutoff values to some latent continuous measures. The choice of cutoff values could greatly affect the binary diagnosis results (Rotenstein et al. 2016). For example, in studies on depression, a subject may be diagnosed with depression if the Beck Depression Inventory score is at least 10. An intervention may be considered effective for successfully treating a patient with depression if the depression rating score is reduced by at least 50% during the follow-up period.
For analyzing the continuous outcome measures possibly on different scales in the first type of studies above, meta-analysts commonly choose the standardized mean difference (SMD) as the effect measure (Lin and Aloe 2021; Mayer 2019; Murad et al. 2019; Takeshima et al. 2014). The SMD is calculated by dividing the mean difference between the continuous outcome measures in the treatment and control groups by a common standard deviation (SD). On the other hand, for the second type of studies, one may only directly obtain the effect measures for binary outcomes, such as the odds ratio (OR), relative risk, and risk difference. Not combining the two types of studies in the same meta-analysis leads to potentially imprecise estimates that do not represent the totality of evidence.
For combining the continuous and binary estimates in the same analysis, the results from all individual studies need to be converted to a common effect measure (Higgins et al. 2019). Several methods have been proposed for the conversion between the SMD and OR (Chinn 2000; Cox and Snell 1989; Furukawa et al. 2005; Hasselblad and Hedges 1995; Suissa 1991). The most popular method is perhaps the one initially suggested by Hasselblad and Hedges (1995), later promoted by Chinn (2000), and recommended in the latest version of the Cochrane Handbook for Systematic Reviews of Interventions (Chapter 10.6) (Higgins et al. 2019). It has been widely used in meta-analysis applications (Allotey et al. 2020; Belbasis et al. 2016; Gandhi et al. 2008; Murad et al. 2010; Theodoratou et al. 2014). This conventional method converts the log OR to the SMD by multiplying by a constant coefficient of 0.55; inversely, the SMD is converted to the log OR by multiplying by 1.81. Due to its simplicity, it can be readily implemented by practitioners. This constant coefficient is derived under the assumption that the continuous measures in both treatment and control groups follow logistic distributions. However, it is more common to assume that the continuous measures follow normal distributions in meta-analysis practice, so this conversion method could produce biases in the converted results. The assumption of logistic distributions is intended solely to yield a constant conversion coefficient, and it may not be appropriate in some applications. Section 2 will further illustrate this problem. Several other methods make the more commonly used normality assumption (Suissa 1991; Whitehead et al. 1999), but the conversion depends on certain cutoff values for dichotomizing the continuous measures to define binary events.
Such cutoff values may not be reported in the original studies, and they could differ dramatically across studies in a meta-analysis, particularly if the studies use different scales for which different cutpoint values are meaningful. Therefore, it may be challenging to select an appropriate cutoff value to produce a meta-result.
This article proposes a new method to synthesize SMDs and ORs under the Bayesian framework. This new method assumes normal (rather than logistic) distributions for continuous outcome measures. It models the study-specific cutoff values as nuisance parameters in the Bayesian hierarchical model. In comparative effectiveness research, a meta-analysis is typically interested in comparisons between treatments (e.g., measured by the SMD and OR) instead of baseline risks (reflected by the cutoff values for continuous measures). In addition, Bayesian methods have some advantages over the conventional conversion methods that are performed under the frequentist framework (Al Amer et al. 2021; Higgins et al. 2009; McGlothlin and Viele 2018; Schmid 2001; Sutton and Abrams 2001). For example, Bayesian methods account for full uncertainties in treatment effect estimates, while conventional frequentist methods typically treat within-study sample variances as fixed, known values, which may be problematic in cases such as small sample sizes or low event rates (Hamman et al. 2018; Lin 2018). In addition, researchers can incorporate some informative priors in parameter estimation via Bayesian methods, which could improve the precision of meta-results (Turner et al. 2012).
This article is organized as follows. Section 2 reviews the conventional conversion method for the OR and SMD and introduces the Bayesian method to synthesize SMDs and ORs. Section 3 presents simulation studies that compare the performance of these methods, and Section 4 gives two case studies to illustrate the use of the proposed method in real-world settings. Finally, Section 5 concludes this article with brief discussions.

Notation
Suppose a meta-analysis contains $N$ and $M$ studies that report continuous and binary outcome measures, respectively. Let $i = 1, \ldots, N$ index the studies with continuous outcomes and $i = N + 1, \ldots, N + M$ index those with binary outcomes. Moreover, $j = 0$ and $1$ denote the control and treatment groups, respectively. The sample size of group $j$ in study $i$ is denoted by $n_{ij}$, which is reported by all studies.
We assume that the binary outcomes in the $M$ studies are produced by dichotomizing some latent continuous measures at certain cutoff values. For all $N + M$ studies, suppose $y_{ijk}$ is the continuous measure of subject $k$ from group $j$ in study $i$ for $k = 1, \ldots, n_{ij}$; it is assumed to follow a normal distribution with mean $\mu_{ij}$ and variance $\sigma_i^2$. Here, $\mu_{ij}$ represents the true mean measure of group $j$, and $\sigma_i^2$ is a common variance for both groups 0 and 1 in study $i$. The homoscedasticity assumption is generally valid for most applications. Of note, such continuous measures for individual subjects are typically unobserved for both the $N$ studies with continuous outcomes and the $M$ studies with binary outcomes, unless individual patient data are available.
For the $N$ studies with continuous outcomes, the commonly reported group-specific results include sample means and corresponding sample variances, denoted by $\bar{y}_{ij}$ and $s_{ij}^2$, respectively, in study $i$'s group $j$. The sample mean $\bar{y}_{ij}$ has a normal distribution: $\bar{y}_{ij} \sim N(\mu_{ij}, \sigma_i^2 / n_{ij})$. After multiplying by a coefficient, the sample variance $s_{ij}^2$ follows a chi-square distribution: $(n_{ij} - 1) s_{ij}^2 / \sigma_i^2 \sim \chi^2_{n_{ij} - 1}$; it is statistically independent of the sample mean $\bar{y}_{ij}$ (Casella and Berger 2001).
For the $M$ studies with binary outcomes, the continuous measures of individual subjects $y_{ijk}$ are latent variables. In each study $i = N + 1, \ldots, N + M$, we suppose that binary events are defined based on such latent variables at a study-specific cutoff value $c_i$. These cutoff values may vary across studies because different studies may use different criteria for defining events. In the following methodological materials, we tentatively assume that a subject has the event if its latent continuous variable is greater than $c_i$. This is the case, e.g., when using the scoring tools to diagnose depression. In other cases, the event may occur if the latent continuous variable is less than $c_i$. For example, a patient may be considered recovering from depression if the change of the rating scores during a follow-up period is less than some negative value. The information about the cutoff values $c_i$ may not be reported in the studies with binary outcomes. Instead, such studies typically report event counts, denoted by $r_{ij}$ for group $j$ in study $i$. They follow binomial distributions: $r_{ij} \sim \mathrm{Bin}(n_{ij}, p_{ij})$, where the $p_{ij}$ represent true event probabilities. The event counts and sample sizes in both groups form a $2 \times 2$ table for each study.

Conventional conversion method
Among the various existing methods for the conversion between the SMD and OR, this article focuses on the one described by Hasselblad and Hedges (1995) and Chinn (2000). This method has been widely used in the current practice of meta-analyses, and it has been shown to perform adequately well compared with other alternatives (Anzures-Cabrera et al. 2011; da Costa et al. 2012; Mayer 2019).
This method assumes logistic distributions for the continuous measures $y_{ijk}$ in both groups in each study, with a common scale parameter (reflecting variance) but different location parameters (reflecting group-specific means). Specifically, the logistic distributions have cumulative distribution functions (CDFs) $1 / [1 + e^{-(x - \mu_0)/\beta}]$ and $1 / [1 + e^{-(x - \mu_1)/\beta}]$ for the control and treatment groups, respectively. Here, $\beta$ is the scale parameter; $\mu_0$ and $\mu_1$ are the location parameters, and they are the means in the two groups. For both logistic distributions, the variance is $\pi^2 \beta^2 / 3$, which equals $\sigma^2$; therefore, $\beta = \frac{\sqrt{3}}{\pi} \sigma$. Let $c$ be the cutoff value for defining the binary event. Based on the CDFs of the logistic distributions, the log odds for the control and treatment groups are $-\frac{c - \mu_0}{\beta}$ and $-\frac{c - \mu_1}{\beta}$, respectively. The log OR is subsequently computed as
$$\log \mathrm{OR} = \frac{\mu_1 - \mu_0}{\beta} = \frac{\pi}{\sqrt{3}} \cdot \frac{\mu_1 - \mu_0}{\sigma} = \frac{\pi}{\sqrt{3}} \theta \approx 1.81\,\theta, \tag{1}$$
where $\theta = (\mu_1 - \mu_0)/\sigma$ is the SMD. Inversely, the equation for converting the log OR to the SMD is
$$\theta = \frac{\sqrt{3}}{\pi} \log \mathrm{OR} \approx 0.55 \log \mathrm{OR}.$$
The cutoff value $c$ is canceled out during the derivation process, so it is unnecessary to know this value when performing the conversion under the assumption of logistic distributions.
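As a minimal sketch of this conversion, the snippet below applies the constant factor $\pi/\sqrt{3}$ (about 1.81) and its reciprocal (about 0.55) implied by the logistic assumption. The function names are hypothetical and chosen for illustration only.

```python
import math

# Constant implied by the logistic assumption: pi / sqrt(3), roughly 1.81.
LOGISTIC_FACTOR = math.pi / math.sqrt(3)


def smd_to_log_or(smd: float) -> float:
    """Convert an SMD to a log OR under the logistic assumption."""
    return LOGISTIC_FACTOR * smd


def log_or_to_smd(log_or: float) -> float:
    """Convert a log OR to an SMD (inverse mapping, factor roughly 0.55)."""
    return log_or / LOGISTIC_FACTOR


# Example: the two directions are exact inverses of each other.
print(round(smd_to_log_or(0.5), 3), round(log_or_to_smd(smd_to_log_or(0.5)), 3))
```

Because the cutoff value does not enter either function, the conversion gives the same answer regardless of how events were dichotomized, which is exactly the property examined in the limitation discussion.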

Conventional estimation method
The parameters in the above derivations represent true values; they need to be estimated in practice.
For studies $i = 1, \ldots, N$ with continuous outcomes, the SMD can be estimated as
$$\hat{\theta}_i = \frac{\bar{y}_{i1} - \bar{y}_{i0}}{s_{i,\mathrm{pool}}},$$
where $s_{i,\mathrm{pool}}$ is the pooled sample SD:
$$s_{i,\mathrm{pool}} = \sqrt{\frac{(n_{i0} - 1) s_{i0}^2 + (n_{i1} - 1) s_{i1}^2}{n_{i0} + n_{i1} - 2}}.$$
The variance of $\hat{\theta}_i$ can be estimated as (Hedges and Olkin 1985)
$$\widehat{\mathrm{Var}}(\hat{\theta}_i) = \frac{n_{i0} + n_{i1}}{n_{i0} n_{i1}} + \frac{\hat{\theta}_i^2}{2 (n_{i0} + n_{i1})}.$$
Of note, the estimator $\hat{\theta}_i$ here is often referred to as Cohen's d (Cohen 1988). Several variants are available for both the point and variance estimators of the SMD (Lin and Aloe 2021). For example, an alternative estimator is Hedges' g; it is unbiased within studies, while Cohen's d is subject to bias when sample sizes are small. This article uses the above equations for Cohen's d instead of the many variants to estimate the SMD because we are interested primarily in combining SMDs and ORs instead of comparing the different SMD estimators. Also, although Hedges' g is unbiased within individual studies, such unbiasedness is not guaranteed in the meta-result when using conventional meta-analysis methods (Doncaster et al. 2018; Hamman et al. 2018; Lin and Aloe 2021; Lin 2018).
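The following sketch computes Cohen's d and an approximate variance from group-level summaries. It assumes the widely used large-sample variance form $(n_0 + n_1)/(n_0 n_1) + d^2 / [2(n_0 + n_1)]$; the function name and argument order are illustrative assumptions, not part of the original article.

```python
import math


def cohens_d(mean1, mean0, var1, var0, n1, n0):
    """Return (d, var_d) for treatment (1) versus control (0).

    var1 and var0 are sample variances; n1 and n0 are group sizes.
    """
    # Pooled SD weights each group's sample variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n0 - 1) * var0 + (n1 - 1) * var1) / (n0 + n1 - 2))
    d = (mean1 - mean0) / pooled_sd
    # Assumed large-sample variance approximation for Cohen's d.
    var_d = (n0 + n1) / (n0 * n1) + d ** 2 / (2 * (n0 + n1))
    return d, var_d


# Example with equal group variances of 4 (SD = 2) and 50 subjects per group.
d, var_d = cohens_d(mean1=12.0, mean0=10.0, var1=4.0, var0=4.0, n1=50, n0=50)
print(d, var_d)
```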
For studies $i = N + 1, \ldots, N + M$ with binary outcomes, the OR is estimated as $\widehat{\mathrm{OR}}_i = \frac{r_{i1} (n_{i0} - r_{i0})}{r_{i0} (n_{i1} - r_{i1})}$ from the four data cells in the $2 \times 2$ table of each study. It is usually analyzed on the logarithmic scale, i.e., $\widehat{\log \mathrm{OR}}_i = \log \frac{r_{i1}}{n_{i1} - r_{i1}} - \log \frac{r_{i0}}{n_{i0} - r_{i0}}$. In the presence of a zero count in one group, a continuity correction of 0.5 is conventionally applied to all four data cells in the corresponding study to make the (log) ORs calculable (Walter and Cook 1991). If both groups report zero counts in a study, conventional meta-analysis methods discard this study because its OR is not estimable. However, such practice could lose important information and bias meta-results (Xie et al. 2018; Xu et al. 2020). To combine both the binary and continuous data, we may convert these ORs to SMDs using the method reviewed in Section 2.2. Specifically, the converted SMD, denoted by $\tilde{\theta}_i$, is computed as
$$\tilde{\theta}_i = \frac{\sqrt{3}}{\pi} \widehat{\log \mathrm{OR}}_i,$$
and its variance is approximated as
$$\widehat{\mathrm{Var}}(\tilde{\theta}_i) = \frac{3}{\pi^2} \left( \frac{1}{r_{i1}} + \frac{1}{n_{i1} - r_{i1}} + \frac{1}{r_{i0}} + \frac{1}{n_{i0} - r_{i0}} \right).$$
After the conversion, both $\hat{\theta}_i$ for $i = 1, \ldots, N$ and $\tilde{\theta}_i$ for $i = N + 1, \ldots, N + M$ are estimates of SMDs. They can be subsequently combined using conventional meta-analysis methods (Borenstein et al. 2010; DerSimonian and Laird 1986).
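A minimal sketch of this step is given below: it estimates the log OR from a $2 \times 2$ table, adds 0.5 to every cell when any cell is zero, and converts the result to the SMD scale with the factor $\sqrt{3}/\pi$. Returning `None` for studies with zero events in both groups mirrors the conventional practice of discarding them; the function names are hypothetical.

```python
import math


def log_or_with_variance(r1, n1, r0, n0):
    """Return (log OR, variance), or None if both groups have zero events."""
    if r1 == 0 and r0 == 0:
        return None  # OR not estimable; conventionally discarded
    a, b, c, d = r1, n1 - r1, r0, n0 - r0
    if min(a, b, c, d) == 0:
        # Continuity correction: add 0.5 to all four cells.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log(a / b) - math.log(c / d)
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def converted_smd(r1, n1, r0, n0):
    """Convert a study's log OR to the SMD scale under the logistic assumption."""
    result = log_or_with_variance(r1, n1, r0, n0)
    if result is None:
        return None
    log_or, variance = result
    k = math.sqrt(3) / math.pi  # roughly 0.55
    return k * log_or, k ** 2 * variance


print(converted_smd(r1=20, n1=50, r0=10, n0=50))
```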

Limitation of the conventional method
Although the conventional method for the conversion between the SMD and OR is easy to use, it assumes logistic distributions for the continuous measures in both groups. This assumption is arguably ad hoc; it is only intended to obtain a constant coefficient for the conversion. Under the more commonly used normality assumption, the cutoff value $c$ could affect results. Specifically, unlike the convenient formula of the log OR in Equation (1), the event probabilities based on dichotomizing normally distributed continuous measures at $c$ are $p_0 = \Phi\left(-\frac{c - \mu_0}{\sigma}\right)$ and $p_1 = \Phi\left(-\frac{c - \mu_1}{\sigma}\right)$ in the control and treatment groups, respectively. Here, $\Phi(\cdot)$ is the CDF of the standard normal distribution. We may call $-\frac{c - \mu_0}{\sigma}$ the negative standardized cutoff value and denote it by $\alpha$. Note that $-\frac{c - \mu_1}{\sigma} = \alpha + \theta$, where $\theta = (\mu_1 - \mu_0)/\sigma$ is the SMD. Therefore, given a specific value of $\theta$, the resulting log OR is a function of $\alpha$:
$$\log \mathrm{OR} = \log \frac{\Phi(\alpha + \theta)}{1 - \Phi(\alpha + \theta)} - \log \frac{\Phi(\alpha)}{1 - \Phi(\alpha)}. \tag{2}$$
Figure 1 shows the relationship between the log OR and (negative standardized) cutoff value at several SMD values. Using the conventional method, the converted log OR does not depend on the cutoff value, so it would appear as a horizontal straight line corresponding to each SMD, depicted by the cross points in the plot. Under the normality assumption, the function of the log OR is no longer a horizontal straight line, except when the SMD is 0. When the SMD is close to 0, the log OR generally changes slightly as the cutoff value changes. However, when the SMD is large in absolute magnitude, the cutoff value could greatly affect the log OR. For example, for the SMD $= \pm 2$, the log OR changes by over 2 (i.e., over sevenfold change for the OR) as the (negative standardized) cutoff value changes from $-2$ to $2$. The log OR under the normality assumption deviates from that obtained from the conventional method, particularly when the (negative standardized) cutoff value is far from 0. This case corresponds to rare events (or non-events).
In general, when applying the conventional method for the conversion between the SMD and OR, extra attention is needed for large effect sizes or extreme cutoff values (which likely lead to small counts in $2 \times 2$ tables). This limitation of the conventional method motivates us to develop a Bayesian method to synthesize SMDs and ORs that properly accounts for cutoff values.
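To make the dependence on the cutoff concrete, the sketch below evaluates the log OR under the normality assumption, i.e., $\mathrm{logit}\,\Phi(\alpha + \theta) - \mathrm{logit}\,\Phi(\alpha)$ with $\alpha$ the negative standardized cutoff, and prints it next to the constant value $(\pi/\sqrt{3})\theta$ from the conventional method. The function names are illustrative assumptions; only the Python standard library is used.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def logit(p):
    return math.log(p / (1 - p))


def log_or_normal(smd, alpha):
    """Log OR implied by normal latent measures at negative standardized cutoff alpha."""
    p0 = STD_NORMAL.cdf(alpha)        # control-group event probability
    p1 = STD_NORMAL.cdf(alpha + smd)  # treatment-group event probability
    return logit(p1) - logit(p0)


def log_or_logistic(smd):
    """Log OR from the conventional conversion; it ignores the cutoff."""
    return math.pi / math.sqrt(3) * smd


# For a large SMD, the normal-based log OR varies markedly with alpha.
for alpha in (-2, -1, 0, 1, 2):
    print(alpha, round(log_or_normal(2.0, alpha), 2), round(log_or_logistic(2.0), 2))
```

Running the loop with a small SMD such as 0.2 instead of 2.0 shows a much flatter curve, which matches the qualitative pattern described for Figure 1.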

Proposed Bayesian method
This subsection introduces the Bayesian approach to combining SMDs and ORs, where the (latent) continuous measures for individual subjects in all studies are assumed to follow normal distributions. For studies $i = 1, \ldots, N$ with continuous outcomes, recall that $\bar{y}_{ij} \sim N(\mu_{ij}, \sigma_i^2 / n_{ij})$ is the mean measure of $n_{ij}$ subjects in study $i$'s group $j$ and $s_{ij}^2$ is the sample variance. The conventional method reviewed above treats the within-study variances as known, fixed values. In fact, however, $s_{ij}^2$ is a random variable, and $(n_{ij} - 1) s_{ij}^2 / \sigma_i^2 \sim \chi^2_{n_{ij} - 1}$. Because the chi-square distribution is a special case of the gamma distribution, it can be shown that $s_{ij}^2$ follows the gamma distribution with shape parameter $\frac{n_{ij} - 1}{2}$ and scale parameter $\frac{2 \sigma_i^2}{n_{ij} - 1}$ (equivalently, rate parameter $\frac{n_{ij} - 1}{2 \sigma_i^2}$). In the following, we will denote it by $\mathrm{Gamma}\left(\frac{n_{ij} - 1}{2}, \frac{n_{ij} - 1}{2 \sigma_i^2}\right)$, using the shape and rate parameterization.
For studies $i = N + 1, \ldots, N + M$ with binary outcomes, recall that the number of events is $r_{ij}$, following the binomial distribution with the event probability $p_{ij}$. As illustrated earlier, the event probabilities in the two groups are $p_{i0} = \Phi(\alpha_i)$ and $p_{i1} = \Phi(\alpha_i + \theta_i)$, where $\alpha_i = -\frac{c_i - \mu_{i0}}{\sigma_i}$; this is the negative standardized cutoff value of study $i$. Consequently,
$$\Phi^{-1}(p_{i0}) = \alpha_i \quad \text{and} \quad \Phi^{-1}(p_{i1}) = \alpha_i + \theta_i,$$
where $\Phi^{-1}(\cdot)$ is the inverse function of $\Phi(\cdot)$, i.e., the probit link function. The latent SMD of study $i$ is $\theta_i$; it is essentially the difference between the event probabilities in the two groups after the probit transformation:
$$\theta_i = \Phi^{-1}(p_{i1}) - \Phi^{-1}(p_{i0}). \tag{3}$$
The foregoing observations show that the SMD parameters in the studies with continuous outcomes are determined by $\mu_{ij}$ and $\sigma_i$, which are informed by the data $\bar{y}_{ij}$, $s_{ij}$, and $n_{ij}$. On the other hand, the SMD parameters in the studies with binary outcomes are determined by the event probabilities $p_{ij}$, which are informed by the data $r_{ij}$ and $n_{ij}$. These relationships can be naturally encoded as hierarchies in a Bayesian model. To establish the connection between the SMDs from the two types of studies, we assume that the SMD parameters $\theta_i$ are random effects, following the normal distribution with mean $d$ and between-study variance $\tau^2$. The random-effects setting, which is commonly used in practice, captures the heterogeneity between studies (Higgins 2008). The parameter of primary interest in the final inference is $d$, which represents the overall SMD across all $N + M$ studies. Due to the random-effects assumption, $d$ incorporates the information from both the $N$ studies with continuous outcomes and the $M$ studies with binary outcomes.
Figure 1. Relationship of the log odds ratio and standardized cutoff value under the normality assumption when the standardized mean difference varies from −2 to 2 by 0.5. The cross points represent the converted log odds ratios based on the conventional method assuming the logistic distribution.
In summary, the full Bayesian hierarchical model is as follows: for studies $i = 1, \ldots, N$ with continuous outcomes,
$$\bar{y}_{ij} \sim N\left(\mu_{ij}, \sigma_i^2 / n_{ij}\right), \quad j = 0, 1;$$
$$s_{ij}^2 \sim \mathrm{Gamma}\left(\frac{n_{ij} - 1}{2}, \frac{n_{ij} - 1}{2 \sigma_i^2}\right), \quad j = 0, 1;$$
$$\mu_{i1} = \mu_{i0} + \theta_i \sigma_i;$$
for studies $i = N + 1, \ldots, N + M$ with binary outcomes,
$$r_{ij} \sim \mathrm{Bin}(n_{ij}, p_{ij}), \quad j = 0, 1;$$
$$\Phi^{-1}(p_{i0}) = \alpha_i;$$
$$\Phi^{-1}(p_{i1}) = \alpha_i + \theta_i;$$
and for all studies $i = 1, \ldots, N + M$,
$$\theta_i \sim N(d, \tau^2).$$
This model consists of three parts. The first part is for the studies with continuous outcomes, where the first two lines present the likelihoods contributed by the sample means and sample variances, and the third line bridges the group-specific means, common SD, and SMD. The second part is for the studies with binary outcomes, where the first line presents the likelihoods contributed by the event counts, and the second and third lines present the probit links that bridge the group-specific event probabilities, (negative standardized) cutoff value, and SMD. The third part uses the random effects for study-specific SMDs to connect all studies. Figure 2 visualizes the proposed Bayesian hierarchical model. The large box on the left represents the $N$ studies with continuous outcomes, and the one on the right represents the $M$ studies with binary outcomes. The rectangles represent the reported data. The double-edged rectangles include the sample sizes $n_{i0}$ and $n_{i1}$, which are fixed variables when designing studies. The single-edged rectangles denote observed results (realizations of random variables), including sample means $\bar{y}_{ij}$ and sample variances $s_{ij}^2$ for continuous outcomes and event counts $r_{ij}$ for binary outcomes. The circles represent unobserved variables to be estimated. The solid and hollow arrows going from parent nodes to descendant nodes indicate the stochastic dependence and deterministic relationship, respectively. Priors are specified for the parameters without parent nodes, i.e., $\mu_{i0}$, $\sigma_i$, $\alpha_i$, $d$, and $\tau$.
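The three parts of the model can be sketched as log-likelihood contributions, as below. This is only an illustration of the model structure under the shape-and-rate gamma parameterization; it is not the authors' implementation, all function names are hypothetical, and priors and MCMC sampling are deliberately left out. The binomial term assumes event probabilities strictly between 0 and 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gamma_logpdf(x, shape, rate):
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)


def binom_logpmf(r, n, p):
    log_choose = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return log_choose + r * math.log(p) + (n - r) * math.log(1 - p)


def continuous_study_loglik(ybar, s2, n, mu0, sigma, theta):
    """Part 1: ybar, s2, n are (control, treatment) pairs for one study."""
    mu = (mu0, mu0 + theta * sigma)  # deterministic link for the treatment mean
    total = 0.0
    for j in (0, 1):
        total += normal_logpdf(ybar[j], mu[j], sigma ** 2 / n[j])
        total += gamma_logpdf(s2[j], (n[j] - 1) / 2, (n[j] - 1) / (2 * sigma ** 2))
    return total


def binary_study_loglik(r, n, alpha, theta):
    """Part 2: probit links map alpha and theta to event probabilities."""
    p = (STD_NORMAL.cdf(alpha), STD_NORMAL.cdf(alpha + theta))
    return sum(binom_logpmf(r[j], n[j], p[j]) for j in (0, 1))


def random_effect_logpdf(theta, d, tau):
    """Part 3: study-specific SMDs are drawn from N(d, tau^2)."""
    return normal_logpdf(theta, d, tau ** 2)


print(binary_study_loglik(r=(5, 5), n=(10, 10), alpha=0.0, theta=0.0))
```

Summing these terms over all studies, and adding log-prior terms for the parameters without parent nodes, would give an unnormalized log posterior that a general-purpose sampler could target.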

Remarks
In the proposed model, the specification in each part of the studies with continuous and binary outcomes exists in the literature; however, to the best of our knowledge, this article makes the first attempt to connect these two parts under the Bayesian framework. For example, the model specification for continuous outcomes is also described in Stevens (2011) and Zhang et al. (2015). Many articles also discuss models for binary data, but most focus on the logit link; see, e.g., Smith et al. (1995) and Sutton and Abrams (2001).
The proposed Bayesian hierarchical model can be implemented via the Markov chain Monte Carlo (MCMC) algorithm. We may assign the vague normal prior $N(0, 100^2)$ to $\mu_{i0}$, $\alpha_i$, and $d$. Here, both $\mu_{i0}$ and $\alpha_i$ are treated as study-specific nuisance parameters; $\mu_{i0}$ models the baseline mean response for the studies with continuous outcomes, and $\alpha_i$ models the baseline risks (on the probit scale) for the studies with binary outcomes. The uniform prior $U(0, 1)$ may be used for the between-study SD $\tau$; this range is generally reasonable for the scale of the SMD. The inverse-gamma prior can be alternatively considered for $\tau^2$. The magnitudes of the continuous measures' variances $\sigma_i^2$ may vary greatly across studies because they relate to the scales of measures. Therefore, instead of assigning uniform priors to $\sigma_i$, whose bounds are difficult to determine on a study-by-study basis, we use the inverse-gamma prior for $\sigma_i^2$ with small values 0.001 of both shape and rate parameters.
The model specified in Section 2.5 assumes that the binary event occurs when the latent continuous variable is greater than a cutoff value. If the dichotomization of the binary event has an inverse direction, then $p_{i0} = \Phi(\alpha_i)$ and $p_{i1} = \Phi(\alpha_i - \theta_i)$, where $\alpha_i = \frac{c_i - \mu_{i0}}{\sigma_i}$ is the standardized cutoff value. In the Bayesian hierarchical model, the part of link functions in studies with binary outcomes becomes $\Phi^{-1}(p_{i0}) = \alpha_i$ and $\Phi^{-1}(p_{i1}) = \alpha_i - \theta_i$; the remaining parts of the model remain unchanged.
With such a direction of dichotomization, similar modifications are needed for the conventional method. The log odds for the control and treatment groups become $\frac{c - \mu_0}{\beta}$ and $\frac{c - \mu_1}{\beta}$, respectively. Thus, the log OR is
$$\log \mathrm{OR} = -\frac{\mu_1 - \mu_0}{\beta} = -\frac{\pi}{\sqrt{3}} \theta.$$
The conversion coefficient is multiplied by $-1$ compared with that in Equation (1).
In addition, the proposed Bayesian model generates estimates of the overall SMD. If researchers are interested in the overall log OR, the conversion in Equation (2) can be used to produce its estimates based on a range of cutoff values. Specifically, the posterior median estimate of the overall SMD leads to a curve of the point estimates of the overall log OR, and the lower and upper bounds of the credible interval (CrI) of the overall SMD lead to a pointwise credible band of the overall log OR. Moreover, as discussed in the above paragraph, Equation (2) is applicable to the case that the binary event occurs when the latent continuous variable is greater than a cutoff value, and $\alpha$ in Equation (2) represents the negative standardized cutoff value. If the dichotomization of the binary event has an inverse direction, Equation (2) is replaced with
$$\log \mathrm{OR} = \log \frac{\Phi(\alpha - \theta)}{1 - \Phi(\alpha - \theta)} - \log \frac{\Phi(\alpha)}{1 - \Phi(\alpha)},$$
where $\alpha$ represents the standardized cutoff value.
Finally, based on the observations for binary outcomes in Section 2.5, Equation (3) implies a possible frequentist probit-based conversion method, which is similar to the conventional logit-based conversion method in Section 2.2. Specifically, we may use the estimates of event probabilities $\hat{p}_{ij} = r_{ij} / n_{ij}$ to estimate the SMD as
$$\tilde{\theta}_i = \Phi^{-1}(\hat{p}_{i1}) - \Phi^{-1}(\hat{p}_{i0}).$$
By the delta method, its variance can be estimated as
$$\widehat{\mathrm{Var}}(\tilde{\theta}_i) = \frac{\hat{p}_{i1} (1 - \hat{p}_{i1})}{n_{i1} \left[\phi\left(\Phi^{-1}(\hat{p}_{i1})\right)\right]^2} + \frac{\hat{p}_{i0} (1 - \hat{p}_{i0})}{n_{i0} \left[\phi\left(\Phi^{-1}(\hat{p}_{i0})\right)\right]^2},$$
where $\phi(\cdot)$ is the probability density function of the standard normal distribution. Thus, the converted SMDs can be combined with the SMDs from studies with continuous outcomes, as in Section 2.3. As a secondary analysis, we will also investigate the performance of this probit-based conversion method via simulations.
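The probit-based conversion can be sketched as follows, assuming events are defined by the latent measure exceeding the cutoff. The variance uses the delta-method form $\hat{p}(1 - \hat{p}) / \{n\,[\phi(\Phi^{-1}(\hat{p}))]^2\}$ summed over the two groups. The function name is hypothetical, and the sketch does not handle observed proportions of exactly 0 or 1, for which `inv_cdf` raises an error and some continuity correction would be needed.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def probit_smd(r1, n1, r0, n0):
    """Return (SMD estimate, delta-method variance) from a 2x2 table."""
    p1, p0 = r1 / n1, r0 / n0
    z1, z0 = STD_NORMAL.inv_cdf(p1), STD_NORMAL.inv_cdf(p0)  # probit transforms
    smd = z1 - z0
    variance = (p1 * (1 - p1) / (n1 * STD_NORMAL.pdf(z1) ** 2)
                + p0 * (1 - p0) / (n0 * STD_NORMAL.pdf(z0) ** 2))
    return smd, variance


print(probit_smd(r1=84, n1=100, r0=50, n0=100))
```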

Designs
We performed simulation studies to compare the performance of the frequentist method and proposed Bayesian method for combining SMDs and ORs. Recall that both the effect size and cutoff value could substantially affect the conversion between SMDs and ORs (Figure 1); therefore, we considered several candidate values for both factors. Specifically, the true overall SMD $d$ in a simulated meta-analysis was set to 0.2 (small), 0.5 (medium), 0.8 (large), 1, and 2 (huge), which describe different magnitudes (Cohen 1988). The (negative standardized) cutoff values $\alpha_i$ were set to $-1.5$, $0$, and $1.5$, which were assumed to be the same for studies in a simulated meta-analysis. As a secondary analysis, we also considered the cases when the cutoff values were different across studies and generated from normal distributions. For each meta-analysis, we generated $N$ studies with continuous outcomes and $M$ studies with binary outcomes. We considered three sets of values $\{10, 10\}$, $\{30, 30\}$, and $\{10, 50\}$ for $\{N, M\}$, representing relatively small and large numbers of studies and unbalanced numbers of studies with continuous and binary outcomes. The total sample sizes $n_i$ within the studies were set to $\{10, 20, \ldots, 100\}$ (relatively small sample sizes) or $\{100, 200, \ldots, 1000\}$ (relatively large sample sizes). For $N$ or $M$ = 30 or 50, the set of values was accordingly repeated 3 or 5 times. The ratio of sample sizes in the control and treatment groups was 1:1, so each group had $n_i / 2$ subjects. The between-study SD $\tau$ was set to 0.3 for the first set of (relatively small) sample sizes and 0.1 for the second set of (relatively large) sample sizes; these values produced relatively reasonable ranges of the heterogeneity measure $I^2$, given that larger sample sizes generally led to smaller within-study variances (Higgins and Thompson 2002). For each simulation setting, we generated 1000 replicates of meta-analyses.
For the total $N + M$ studies in each meta-analysis, we sampled the study-specific underlying true SMDs $\theta_i$ from $N(d, \tau^2)$. The variances of the latent continuous measures $\sigma_i^2$ were set to the values from 1.0 to 1.9 by 0.1; these 10 values were accordingly repeated 3 or 5 times if $N$ or $M$ was 30 or 50. Without loss of generality, the mean of the latent continuous measures of the subjects in the control group $\mu_{i0}$ was set to 0, and thus the mean in the treatment group was $\mu_{i1} = \mu_{i0} + \theta_i \sigma_i$. We generated the subjects' continuous measures $y_{ijk}$ from $N(\mu_{ij}, \sigma_i^2)$ for $j = 0, 1$. Other distributions were also considered for the continuous measures $y_{ijk}$ as a secondary analysis. They included the logistic distribution (which would favor the conventional conversion method) and skewed gamma distribution. By using appropriate location-scale transformations, they had the same mean $\mu_{ij}$ and variance $\sigma_i^2$ as the normal distribution. The Supplementary Material presents the details. For the $N$ studies that finally reported continuous outcomes, we calculated the sample means $\bar{y}_{ij}$ and sample variances $s_{ij}^2$. For the remaining $M$ studies that finally reported binary outcomes, we obtained the number of events $r_{ij}$ based on $y_{ijk}$ and $\alpha_i$ (i.e., the counts of subjects with $y_{ijk} > \mu_{i0} - \alpha_i \sigma_i$).
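The data-generating steps for the primary scenario (normal latent measures and a common negative standardized cutoff) can be sketched as below. This is an illustrative reconstruction using only the Python standard library, not the authors' simulation code; the function name, argument names, and the dictionary layout of the output are assumptions.

```python
import math
import random
from statistics import mean, variance


def simulate_meta_analysis(N, M, d, tau, alpha, sizes, sigma2_values, seed=None):
    """Simulate N continuous-outcome and M binary-outcome studies."""
    rng = random.Random(seed)
    continuous, binary = [], []
    for i in range(N + M):
        k = i if i < N else i - N          # position within its own study type
        theta = rng.gauss(d, tau)          # study-specific true SMD
        sigma = math.sqrt(sigma2_values[k % len(sigma2_values)])
        n_group = sizes[k % len(sizes)] // 2   # 1:1 allocation
        mu = (0.0, theta * sigma)          # control mean fixed at 0
        y = [[rng.gauss(mu[j], sigma) for _ in range(n_group)] for j in (0, 1)]
        if i < N:
            continuous.append({
                "n": (n_group, n_group),
                "ybar": (mean(y[0]), mean(y[1])),
                "s2": (variance(y[0]), variance(y[1])),
            })
        else:
            cutoff = mu[0] - alpha * sigma  # c_i = mu_i0 - alpha_i * sigma_i
            binary.append({
                "n": (n_group, n_group),
                "r": tuple(sum(v > cutoff for v in y[j]) for j in (0, 1)),
            })
    return continuous, binary


cont_studies, bin_studies = simulate_meta_analysis(
    N=10, M=50, d=0.5, tau=0.3, alpha=0.0,
    sizes=list(range(10, 101, 10)),
    sigma2_values=[1.0 + 0.1 * k for k in range(10)],
    seed=1,
)
print(len(cont_studies), len(bin_studies))
```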
We applied the conventional method to the simulated meta-analyses by converting the ORs to SMDs. For the $N$ studies with continuous outcomes, the estimated SMDs $\hat{\theta}_i$ and corresponding sample variances were calculated based on $\bar{y}_{ij}$, $s_{ij}^2$, and $n_{ij}$. For the $M$ studies with binary outcomes, the estimated ORs and corresponding sample variances were calculated based on $r_{ij}$ and $n_{ij}$. In the presence of a zero count, we added 0.5 to each cell of the corresponding $2 \times 2$ table. The conventional method was subsequently applied to calculate the converted SMDs $\tilde{\theta}_i$ from the ORs (as well as their variances). Recall that the conventional method is logit-based. We also examined the performance of the probit-based conversion method discussed in Section 2.6 as a secondary analysis. Finally, we used the random-effects model to synthesize the observed SMDs $\hat{\theta}_i$ from the $N$ studies with continuous outcomes and the converted SMDs $\tilde{\theta}_i$ from the $M$ studies with binary outcomes. This model was implemented using the function rma() in the R package "metafor" with the restricted maximum-likelihood (REML) estimator for heterogeneity variance (Viechtbauer 2010, 2021). We obtained the estimated overall SMDs and their 95% confidence intervals (CIs). Of note, when using rma(), computational errors possibly occurred when the REML algorithm failed to converge. We discarded such cases and continued to simulate meta-analyses until we obtained 1000 replicates that did not yield errors. We recorded the error counts and calculated their rates.
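For readers who want a dependency-free illustration of the random-effects pooling step, the sketch below uses the DerSimonian-Laird moment estimator of the heterogeneity variance. This is a deliberately simpler substitute and not the REML estimator from metafor's rma() that was used in the simulations, so results may differ somewhat. The function name and the Wald-type interval with the 1.96 normal quantile are assumptions.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model.

    Returns (pooled estimate, standard error, tau^2, 95% CI).
    """
    w = [1 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # truncated at zero
    w_star = [1 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2, (pooled - 1.96 * se, pooled + 1.96 * se)


# The inputs can mix SMDs from continuous studies and converted SMDs.
print(dersimonian_laird([0.30, 0.55, 0.42, 0.10], [0.04, 0.06, 0.05, 0.03]))
```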
The proposed Bayesian approach was implemented using the R package "rjags" via the MCMC algorithm (Plummer 2021). For each simulated meta-analysis, we used three Markov chains and generated 10,000 posterior samples after a 5000-run burn-in period. The thinning rate was 2. We obtained the posterior medians of the overall SMDs d and between-study standard deviations τ as their point estimates and the 2.5% and 97.5% posterior quantiles as the lower and upper bounds of their 95% CrIs.
The bias, root mean squared error (RMSE), and CI/CrI coverage probability were used to measure the performance of the two methods. Their Monte Carlo standard errors were computed to quantify their uncertainties. In each simulation setting, the bias was calculated as $B^{-1} \sum_{b=1}^{B} \hat{d}_b - d$, where $B = 1000$ was the number of simulated meta-analyses, $\hat{d}_b$ was the estimated overall SMD in the $b$th simulation replicate, and $d$ was the true SMD. The RMSE was $\sqrt{B^{-1} \sum_{b=1}^{B} (\hat{d}_b - d)^2}$, and the coverage probability was $B^{-1} \sum_{b=1}^{B} I(d \in \mathrm{CI}_b)$, where $\mathrm{CI}_b$ was the 95% CI or CrI in the $b$th replicate and $I(\cdot)$ was the indicator function.
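These three performance measures can be computed from the replicate-level results as in the sketch below, which assumes the standard definitions of bias, RMSE, and interval coverage. The function and argument names are hypothetical.

```python
import math


def performance(estimates, lowers, uppers, true_value):
    """Return (bias, RMSE, coverage) across simulation replicates."""
    B = len(estimates)
    bias = sum(estimates) / B - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / B)
    # Coverage: share of replicates whose interval contains the true value.
    coverage = sum(lo <= true_value <= hi for lo, hi in zip(lowers, uppers)) / B
    return bias, rmse, coverage


# Toy example with two replicates and a true SMD of 0.5.
print(performance([0.4, 0.6], [0.3, 0.55], [0.55, 0.7], true_value=0.5))
```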

Results
Across all settings, the Monte Carlo standard errors of biases of both methods were less than 0.005, those of RMSEs were less than 0.004, and those of coverage probabilities were less than 1.6%. The following interpretations about biases are in absolute magnitude. Table 1 summarizes the results when the true overall SMD was close to 0 (d = 0.2). The Bayesian method generally produced smaller biases than the conventional method in most settings, although their differences were tiny. When the (negative standardized) cutoff value α i was -1.5 and sample sizes n i were small, the Bayesian method produced slightly larger biases than the conventional method. The results by the Bayesian method were noticeably less biased than those by the conventional method for α i = 1.5 and 0. RMSEs produced by the Bayesian method were smaller than those by the conventional method when the sample size was large, but they were slightly larger when the sample size was small, except for α i = 1.5, N = 10, and M = 50. In nearly all settings, the Bayesian method produced CrIs with noticeably higher coverage probabilities than the conventional method. The coverage probabilities of the Bayesian method ranged from 93.3% to 95.6%, while the coverage probabilities of the conventional method could be as low as 80.5%.
Table 2 presents the results when the true overall SMD d was increased to 0.5. The Bayesian method had smaller biases than the conventional method in most cases. The differences between the two methods in biases became larger than those in Table 1. Only when α i = -1.5 and sample sizes n i were within 10-100 were the biases of the conventional method smaller than those of the Bayesian method. RMSEs of the Bayesian method were also generally smaller than those of the conventional method. The Bayesian method had higher coverage probabilities (mostly greater than 93%) than the conventional method except for one setting. When α i = -1.5, N = 10, M = 50, and sample sizes were within 10-100, the coverage probability of the Bayesian method dropped to 89.8%. The coverage probabilities of the conventional method could decrease to 29.6% when α i = 0, N = 10, M = 50, and sample sizes were within 100-1000.
As d further increased to 1, the results produced by the two methods had more dramatic differences, as shown in Table 3. When α i was 1.5 or 0, the Bayesian method had much smaller biases than the conventional method, while the conventional method continued to have slightly smaller biases when α i was -1.5. The RMSEs of the Bayesian method ranged from 0.020 to 0.133; those of the conventional method were slightly larger in most settings, ranging from 0.021 to 0.175. Their differences were mostly less than 0.05. The coverage probabilities of the Bayesian method continued to be close to 95%, but they could drop to 85.6% when α i = -1.5, N = 10, M = 50, and n i was within 10-100. The conventional method had poor coverage probabilities in many settings; they could even drop to 9.5% when α i = 0. It performed slightly better than the Bayesian method in terms of coverage probabilities only when α i = -1.5, N = 10, and M = 50. A noticeable number of computing errors occurred during the REML estimation when implementing the conventional method in the setting where α i = 1.5, N = 10, M = 50, and n i was within 10-100.
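One way to see why the conversion error depends on the cutoff is to apply the logit-based conversion to exact event probabilities generated under normality. The sketch below is a population-level illustration only, under assumptions that are ours rather than quoted from the model specification: the control-group event probability is Φ(α), the treatment-group probability is Φ(α + d), the conversion factor is √3/π, and between-study heterogeneity and sampling error are ignored. The function names are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf                  # standard normal CDF
LOGIT_TO_SMD = math.sqrt(3) / math.pi   # assumed logit-based conversion factor


def logit(p):
    return math.log(p / (1 - p))


def converted_smd(d, alpha):
    """SMD recovered by the logit-based conversion when data are truly normal.

    Assumes event probabilities PHI(alpha) (control) and PHI(alpha + d)
    (treatment); this is an illustrative assumption, not the paper's code.
    """
    log_or = logit(PHI(alpha + d)) - logit(PHI(alpha))
    return log_or * LOGIT_TO_SMD


# True SMD d = 1 at the three cutoff values used in the simulations.
for alpha in (-1.5, 0.0, 1.5):
    print(alpha, round(converted_smd(1.0, alpha), 3))
```

Under these assumptions the converted value stays close to the true d = 1 at α = -1.5, falls below it at α = 0, and overshoots it at α = 1.5, which is broadly in line with the bias pattern described for Table 3.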
Additional simulation results of secondary analyses and their interpretations are presented in the Supplementary Material. Specifically, Tables S1 and S2 give the results of the Bayesian and conventional methods when d = 0.8 and 2, respectively. Tables S3 and S4 show the results of simulation studies when continuous measures followed the logistic and skewed distributions, respectively. Table S5 summarizes the results with different study-specific cutoffs α i . Table S6 investigates the probit-based conversion method's performance. Table S7 presents the results of biases, RMSEs, and coverage probabilities for the between-study standard deviation τ when d = 0.5.

Antidepressant data
In addition to the simulation studies, we applied the conventional and Bayesian methods to a dataset collected in a systematic review by Cipriani et al. (2016) to illustrate the use of these two methods in real-world settings. This dataset contained a total of 34 RCTs to compare the efficacy and tolerability of antidepressants for major depressive disorders in children and adolescents. Our case study focused on the efficacy. The authors originally performed a network meta-analysis to compare placebo and the following 14 antidepressants: amitriptyline, citalopram, clomipramine, desipramine, duloxetine, escitalopram, fluoxetine, imipramine, mirtazapine, nefazodone, nortriptyline, paroxetine, sertraline, and venlafaxine. The continuous outcomes were measured by the mean overall changes in depressive symptoms from baseline to endpoint. For the binary outcomes, events were defined as whether the depression rating scores of patients were reduced by at least certain cutoff values. Among the 34 RCTs, some reported continuous outcomes, some reported binary outcomes, and some reported both. In the following, we use the first author's surname with publication year to denote a study in the review. Our case study considered the placebo-controlled trials; three RCTs (i.e., Attari 2006, Braconnier 2003, and Hongfen 2009) did not include placebo, so they were removed from our analyses. We finally considered 31 RCTs, whose reported measures are shown in Table 4. Of note, three RCTs (i.e., Atkinson 2014, Emslie 2014, and Keller 2001) had 3 arms with 2 antidepressants compared with placebo. In Table 4, we collapsed the two groups of antidepressants in these three RCTs. In addition, the sample sizes reported for the continuous and binary outcomes within some RCTs might slightly differ because the corresponding measures of a few individual patients might be missing.
Table 1. Biases, root mean squared errors (RMSEs), and coverage probabilities with their Monte Carlo standard errors (in parentheses) of the estimated overall standardized mean differences by the Bayesian method and the conventional method when the true overall standardized mean difference d = 0.2.
Our cleaned dataset included 6 RCTs only reporting continuous outcomes, 1 RCT only reporting binary outcomes, and 24 RCTs reporting both outcomes. Among the 24 RCTs reporting both outcomes, we were able to obtain the observed SMDs from the continuous outcomes and the observed ORs from the binary outcomes and thus evaluate the performance of the conventional method. The RCTs used different rating tools to measure depressive symptoms. For all tools, lower scores indicated milder symptoms. Because the binary events were patients whose scores at the endpoint were reduced by certain amounts compared with their scores at the baseline, the direction of dichotomization was the inverse of that primarily considered in the proposed model. Section 2.6 briefly discussed this situation.
We considered nine scenarios of meta-analyses, denoted by scenarios i-ix, with different settings regarding the number of studies, sample size, and effect magnitude. These scenarios were visualized using forest plots in Figures 3 and 4 and Figures S1-S3 in the Supplementary Material. In each forest plot, the results of the Bayesian method were the posterior medians and CrIs of the study-specific underlying SMDs θ i in the hierarchical model. They were depicted by squares over solid lines. The converted SMDs θ i from ORs using the conventional method were depicted by circles over dashed lines. The observed SMDs θ i in the studies that reported continuous outcomes were depicted by triangles over dotted lines. For studies with continuous outcomes, we displayed the converted SMDs θ i as the observed SMDs θ i in the forest plots, so that the conventional meta-analyses synthesized the data in the column of converted SMDs. The synthesized SMDs were displayed as diamonds at the bottom of each forest plot. The conventional and Bayesian methods were implemented similarly as in the simulation studies, except that we increased the number of posterior samples to 50,000 after a 20,000-run burn-in period.
Scenarios i and ii contained all 31 RCTs; they are illustrated in Figure 3. Among the 31 RCTs, 6 reported only continuous outcome measures, 1 reported only binary outcome measures, and the remaining 24 reported both types of measures (Table 4). We treated all antidepressants as a single group of active treatments, compared with the control group of placebo. In scenario i, we used the continuous outcome measures from the N = 6 RCTs that only reported such measures and the binary outcomes from the remaining M = 25 RCTs. In scenario ii, we considered an approximately balanced setting of N = 15 RCTs with continuous outcomes and M = 16 studies with binary outcomes. Specifically, the 15 RCTs with continuous outcomes included the 6 RCTs that only reported continuous outcome measures. Among the 24 RCTs that reported both types of outcomes, we used the continuous outcome measures from 9 randomly selected RCTs and the binary outcome measures from the remaining RCTs. In both scenarios, the interval estimates of the overall SMDs by the conventional and Bayesian methods did not cover 0, supporting the efficacy of the antidepressants; their differences were tiny. The CrI of the overall SMD by the Bayesian method was slightly wider than the CI by the conventional method, likely because the Bayesian method accounted for full uncertainties.
Table 4. Summary data of the systematic review of antidepressants vs. placebo by Cipriani et al. (2016). Note: RCT, randomized controlled trial (ordered by reported outcome types, publication years, and author/company names); SD, standard deviation; n, sample size; NA, not available.
Scenarios iii and iv in Figure 4 were two subgroup analyses restricted to fluoxetine and paroxetine, respectively. The subgroup of fluoxetine contained one RCT with continuous outcomes and five RCTs with binary outcomes. Both the overall SMD estimates by the conventional and Bayesian methods had interval estimates not covering 0, while the Bayesian method produced a much wider CrI than the CI by the conventional method (Figure 4a). Scenario iv only contained one RCT with continuous outcomes and two RCTs with binary outcomes. The overall SMD estimates by both methods had interval estimates covering 0. The overall SMD based on the Bayesian method had a noticeably different point estimate and a dramatically wider interval estimate than the conventional method (Figure 4b).
Scenarios v and vi in Figure S1 in the Supplementary Material were restricted to the 12 RCTs with relatively small total sample sizes (i.e., at most 100), among which 1 RCT only reported continuous outcomes, 1 only reported binary outcomes, and the remaining 10 reported both outcomes. Scenario v used continuous outcomes from six RCTs and binary outcomes from the remaining six RCTs. The estimated overall SMD's interval estimate by the conventional method did not cover 0, but that by the Bayesian method covered 0 (Figure S1A). Scenario vi used continuous outcomes from 2 RCTs and binary outcomes from the remaining 10 RCTs. The estimated overall SMDs by both the conventional and Bayesian methods had interval estimates covering 0 (Figure S1B). Similarly, scenarios vii and viii in Figure S2 were restricted to the 19 RCTs with relatively large sample sizes (i.e., greater than 100). Scenario vii contained 9 studies with continuous outcomes and 10 studies with binary outcomes (Figure S2A), and scenario viii contained 5 studies with continuous outcomes and 14 studies with binary outcomes (Figure S2B). Both scenarios led to overall SMDs' interval estimates not covering 0, and the conventional and Bayesian methods had slight differences in the point and interval estimates. In addition, as a secondary analysis, scenario ix presented the results of the probit-based conversion method in Figure S3; the Supplementary Material gives its detailed interpretations.
Figure 5a illustrates converting the estimated overall SMD -0.21 with 95% CrI (-0.29, -0.13) by the Bayesian model to the overall log OR in scenario i. It visualizes the relationship between the standardized cutoff value and the overall log OR. When the standardized cutoff values were away from 0 on both sides, the overall log OR estimate tended to be larger with a wider 95% CrI. The results for other scenarios are presented in Figures S4-S7 in the Supplementary Material.

Self-directed learning data
In addition, we applied the proposed Bayesian method to the dataset from Murad et al. (2010), in which the direction of dichotomizing binary events was different from the antidepressant data but was consistent with the tentative assumption used in Section 2.5. This systematic review included 48 different studies to compare the effectiveness of self-directed learning (SDL) with traditional learning methods (e.g., lectures) in education among health professionals. The event was defined as whether an improvement in the quantitative results (e.g., the examination or test scores) was achieved. Therefore, higher scores indicated better effectiveness of SDL.
There were three outcome domains: knowledge, skills, and attitudes; we considered a meta-analysis for each domain. Several studies reported incomplete results (e.g., p-values only) that required further imputation; we removed them from our analyses. Moreover, some studies (e.g., Bhat 2007) contained multiple subgroups. Instead of merging these subgroups within studies, we treated them as separate data entries in our meta-analyses. Some studies reported more than one outcome domain. Our cleaned dataset contained 40 studies in the knowledge domain, 9 studies in the skills domain, and 5 studies in the attitudes domain.
Figures S8-S10 in the Supplementary Material show the forest plots and results in the domains of knowledge, skills, and attitudes, respectively. The knowledge domain included 36 studies with continuous outcomes and 4 studies with binary outcomes (Figure S8). The overall SMDs estimated by both the conventional and Bayesian methods had interval estimates not covering 0. The Bayesian method produced a slightly larger estimate of the overall SMD with a wider 95% CrI. In the skills domain, eight studies reported continuous outcomes, and one study reported binary outcomes (Figure S9). The conventional and Bayesian methods had the same overall SMD estimates, and both had interval estimates covering 0. The Bayesian method produced a noticeably wider 95% CrI. In the attitudes domain, all five studies reported continuous outcomes (Figure S10). The Bayesian method produced a slightly smaller point estimate and a wider 95% CrI than the conventional method. Both methods produced overall SMDs' interval estimates not covering 0.
Converting from the overall SMD estimate of 0.41 with 95% CrI (0.14, 0.67) by the Bayesian method in the knowledge domain, Figure 5b displays the relationship between the overall log OR and (negative standardized) cutoff value. Figures S11 and S12 in the Supplementary Material present the overall log ORs for the skills and attitudes domains, respectively.
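As a hedged illustration of the kind of relationship such a plot displays, suppose (our assumption for this sketch, not a formula quoted from the model section) that an event occurs when a latent normal score exceeds the cutoff, so that the event probability is $\Phi(\alpha)$ in the control group and $\Phi(\alpha + d)$ in the treatment group, where $\alpha$ is the (negative standardized) cutoff value and $d$ is the overall SMD. The overall log OR at a given cutoff could then be written as:

```latex
\log \mathrm{OR}(\alpha)
  = \operatorname{logit}\{\Phi(\alpha + d)\} - \operatorname{logit}\{\Phi(\alpha)\},
\qquad
\operatorname{logit}(p) = \log\frac{p}{1 - p}.
```

Under this assumption, evaluating the expression at the posterior median and at the two CrI bounds of $d$ over a grid of $\alpha$ values would trace out a curve with an interval band, and the log OR grows in magnitude as $\alpha$ moves away from 0 in either direction.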

Discussion
This article has proposed a Bayesian approach to combining continuous and binary outcome measures. This approach uses exact likelihoods for both continuous and binary outcome measures, thus incorporating full uncertainties in the synthesized results. We used simulation studies and case studies to compare the proposed approach with the conventional method. The conventional method uses a linear conversion between the SMD and log OR, which could be easily implemented in practice. Despite its simplicity, our simulation studies showed that this method could produce substantial biases in some situations, such as large effect sizes, small sample sizes, and extreme cutoff values (leading to rare binary events). The Bayesian method generally outperformed the conventional method, as it produced smaller biases, smaller RMSEs, and higher interval coverage probabilities. In the case study of RCTs of antidepressants, the overall SMD estimates were mostly about -0.2, which were not large effect sizes. The conventional and Bayesian methods did not produce substantially different results. Nevertheless, when the number of studies was small (in subgroup analyses), the Bayesian method could produce noticeably different overall SMD estimates from the conventional method, and its interval estimates were wider. As indicated by the simulation studies, the CIs produced by the conventional method could have very low coverage probabilities; the CrIs by the Bayesian method may be more reliable, particularly in some extreme settings.
The proposed Bayesian method has some limitations. First, this article focused on synthesizing SMDs and ORs, while other effect measures (e.g., mean difference, relative risk) may be of interest in some meta-analysis practices (Zhao et al. 2022). The current approach may not be directly applied to other effect measures. Second, the proposed method depends on the normality assumption for individual-level continuous measures, but this may not strictly hold in some cases. The secondary analyses in our simulations illustrated that the proposed method might not perform well for non-normal continuous measures (e.g., following the logistic or gamma distributions). Also, the data in depression scales might be skewed in the first case study. Nevertheless, in the original article by Cipriani et al. (2016), the authors also made the normality assumption in their analyses. We believe that this case study under the normality assumption is sufficient for illustrating the proposed method, while more work is needed to extend the new method to model non-normal data in the future. Third, we used vague priors in the Bayesian analyses for illustrative purposes. On a case-by-case basis, informative priors may be considered to further improve the performance of Bayesian meta-analyses (Rhodes et al. 2015; Turner et al. 2012). Fourth, both the conventional and Bayesian methods considered in this article assumed a common variance of (latent) continuous measures in both control and treatment groups within each study. This assumption might be violated in some cases. Alternative measures are available to replace the pooled estimator of the common variance with other estimators for standardization without the homoscedasticity assumption (Glass 1976). Similar ideas may also be used in the proposed Bayesian model. They would increase the model complexity, and one may use the deviance information criterion for selecting an appropriate model (Spiegelhalter et al. 2002). Fifth, although the Bayesian approach generally performs better than the conventional method, its implementation may be challenging for practitioners without much statistical training. The Supplementary Material includes the R code for implementing the case study. We will also develop user-friendly software to perform the proposed approach.
In summary, we recommend the Bayesian method in cases of large effect sizes, small sample sizes, or extreme cutoff values for dichotomizing events if the (latent) continuous outcome measures can be considered approximately normally distributed. The new method also has merits when informative priors are available (e.g., from clinical perspectives for a specific disease outcome). If the outcome measures' distribution is closer to the logistic distribution, then the simple conventional conversion between the SMD and OR generally has satisfactory performance. When the outcome measures' distribution is far away from the normal or logistic distribution, both the new Bayesian method and the conventional method could perform poorly. Methods to deal with such data need further investigation in the future.
Some extensions based on the proposed Bayesian hierarchical model may be considered in future studies. For example, this article primarily considered the meta-analysis model in a contrast-based setting (Dias and Ades 2016;Hong et al. 2016); the study-specific baseline parameters (i.e., μ i0 for continuous outcomes and α i for binary outcomes) were treated as nuisances. Alternatively, one may consider an arm-based setting to synthesize these baseline parameters. For example, the (negative) standardized cutoff values may be modeled as random effects, and the overall cutoff values across studies could be obtained from this setting. As clinicians may find ORs more interpretable than SMDs (da Costa et al. 2012), this overall cutoff value could facilitate converting the synthesized SMD to an overall OR.