Power for balanced linear mixed models with complex missing data processes

Abstract When designing repeated measures studies, both the amount and the pattern of missing outcome data can affect power. The chance that an observation is missing may vary across measurements, and missingness may be correlated across measurements. For example, in a physiotherapy study of patients with Parkinson’s disease, increasing intermittent dropout over time yielded missing measurements of physical function. In this example, we assume data are missing completely at random, since the chance that a data point was missing appears to be unrelated to either outcomes or covariates. For data missing completely at random, we propose noncentral F power approximations for the Wald test for balanced linear mixed models with Gaussian responses. The power approximations are based on moments of missing data summary statistics. The moments were derived assuming a conditional linear missingness process. The approach provides approximate power for both complete-case analyses, which include independent sampling units where all measurements are present, and observed-case analyses, which include all independent sampling units with at least one measurement. Monte Carlo simulations demonstrate the accuracy of the method in small samples. We illustrate the utility of the method by computing power for proposed replications of the Parkinson’s study.


Introduction
In repeated measures studies, researchers observe multiple responses on a set of independent sampling units. The repeated measurements may be longitudinal (across time), spatial (across location), or multivariate (on different scales). Outcome data are missing when a measurement is not observed at a particular planned occasion. The chance that an observation is missing may vary across measurements, and missingness may be correlated across measurements. Both the amount and the pattern of missing outcome data can affect power in repeated measures studies. Such complex missing data patterns occur frequently in biomedical studies. For example, in a randomized controlled clinical trial of physiotherapy in patients with Parkinson's disease (ClinicalTrials.gov Identifier: NCT01257945) (Schenkman et al. 2012), increasing dropout over a 16 month follow-up period produced missing measurements of physical function. In this study, the chance that a data point was missing appeared to be unrelated to either outcomes or covariates.
Published power approximations use assumptions that do not match the complex missing data patterns like the one in the Parkinson's disease study. Muller et al. (1992Muller et al. ( , 2007 assumed no missing data. Li and McKeague (2013) use local alternative distributions which rely on asymptotic assumptions for accuracy. Further compounding the issues with local alternatives, the use of GEE estimates has been shown to suffer from inflated Type I error rates in small samples, weakening its otherwise general appeal (Stiger et al. 1998). Tu et al. (2007) accounts for missing data in their power approximation, but they too assume local alternative distributions. Ringham et al. (2016) accounts for missing data and uses a noncentral F-distribution proposed by Muller et al. (1992Muller et al. ( , 2007, but they assume that every response was independently and equally likely to be missing. To counteract these challenges, we provide power approximations that accommodate complex missing data processes characterized by a conditional linear model (Qaqish 2003). We modify a noncentral F power approximation for a class of mixed models with no missing data described by Muller et al. (1992Muller et al. ( , 2007 and Edwards et al. (2008). The modifications incorporate the expected value of one of several missing data summary statistics (Ringham et al. 2016; Barton and Cramer 1989;Catellier and Muller 2000). The choice of which missing data summary statistic to use depends on whether scientists plan complete-or observed-case data analytic approaches. The complete-case approach uses only independent sampling units with all measurements present. The observed-case approach uses any independent sampling unit with at least one measurement. There are two missing data summary statistics in particular that we favor for these objective analyses.
The work applies to a broad and useful subset of missing processes, models and hypothesis tests. The data are assumed to be missing completely at random (MCAR) (Little and Rubin 2002). Under the MCAR assumption, the distribution of the missing responses does not depend on any of the observed outcomes, unobserved outcomes or explanatory variables. The models are assumed to be a class of general linear mixed models (Laird and Ware 1982) referred to as balanced linear mixed models (Muller et al. 2007;Ringham et al. 2016). A general linear mixed model is a balanced linear mixed model if it can be expressed as an equivalent general linear multivariate model when there is no missing data. We restrict attention to the Wald test and the Kenward-Roger reference distribution due to its equivalence with a Hotelling-Lawley trace test (McKeon 1974). We use a covariance estimator that assumes an unstructured correlation pattern between responses which has been shown to be quite favorable in clinical trial settings with random dropout (Gosho et al. 2017). Balanced linear mixed models fulfill the following criteria. For a particular independent sampling unit, the predictors have the same value for all outcomes. In addition, each independent sampling unit has the same number of planned predictors and the same error covariance matrix. In balanced linear mixed models with no missing data, each independent sampling unit has the same number and type of planned outcome measurements. In this manuscript, we relax the last assumption, and allow for missing data.
The paper has six sections. Section 2 contains general notation, definitions, and model assumptions, while Section 3 contains the power approximations. In Section 4, Monte Carlo simulations are used to demonstrate the accuracy of the method in small samples. Section 5 contains example power analyses for a proposed clinical trial. The implications and limitations of the results are discussed in Section 6.
2. Notation, definitions, and assumptions 2.1. General notation For an ðm Â oÞ matrix A and an ðo Â nÞ matrix B, AB is the matrix product. Throughout vecðAÞ is the matrix function that stacks the columns of an ðm Â nÞ matrix A into an ðmn Â 1Þ vector. For an ðm Â nÞ matrix A and a ðp Â qÞ matrix B, the Kronecker product A B generates the ðmp Â qnÞ matrix C ¼ a ij B f g : The trace of a square matrix A is trðAÞ ¼ P N i¼1 a ii : A square and full rank matrix A has inverse A À1 : Also, for any matrix A, the Moore-Penrose inverse is A þ and A 0 is the transpose. An ða Â aÞ identity matrix is I a , an ða Â 1Þ vector with every value set to 1 is 1 a and an ða Â 1Þ vector with every value set to 0 is 0 a : The minimum and maximum values of a set S are denoted by minfSg and maxfSg, respectively.
We indicate the expected value and variance of y as EðyÞ and VðyÞ, respectively. Similarly, for scalar y, we indicate the expected value and variance as EðyÞ and VðyÞ: Scalar random variables y i and y j have covariance Cðy i , y j Þ: For positive integer n, we write y $ N n ðl, NÞ to indicate the vector normal probability distribution (Arnold 1981) with l 2 R n an ðn Â 1Þ vector of means and N 2 ðR n Â R n Þ a positive definite symmetric ðn Â nÞ matrix of covariances. Similarly, with N and p positive integers, N > p, we write Y $ N N, p ðM, X, NÞ to indicate the matrix normal (Arnold 1981), with M 2 ðR N Â R p Þ an ðN Â pÞ matrix of means, X 2 ðR p Â R p Þ a positive definite symmetric ðp Â pÞ matrix of column covariances and N 2 ðR N Â R N Þ a positive definite symmetric ðN Â NÞ matrix of row covariances. Consequently, Y $ N N, p ðM, I N , NÞ implies that vecðYÞ $ N Np vecðMÞ, N I N ½ :

Balanced general linear mixed model and general linear hypothesis
Under the assumptions in Section 1, we define a balanced linear mixed model (Muller et al. 2007;Ringham et al. 2016). We use the subscript m to indicate mixed model components. Independent sampling unit i 2 1, 2, :::, N f g at measurement j 2 1, 2, :::, p f g has response y i, j , with y i ¼ y i, 1 y i, 2 ::: y i, p Â Ã 0 the ðp Â 1Þ vector of responses for independent sampling unit i, and y ¼ y 0 1 y 0 2 ::: y 0 N Â Ã 0 the ðNp Â 1Þ vector of responses for all independent sampling units. The fixed and known ðN Â qÞ design matrix for a single outcome is X, and X m ¼ X I p is the corresponding ðNp Â qpÞ design matrix for all outcomes. The ðqp Â 1Þ vector b contains the unknown and constant model coefficients. In turn, R is a positive definite symmetric ðp Â pÞ covariance matrix and is the same for all independent sampling units. We define W ¼ ðI N RÞ: Throughout, e $ N Np ð0, WÞ is the ðNp Â 1Þ vector of errors, which implies statistical independence of e i and e i 0 for i, i 0 2 1, 2, :::, N f g , i 6 ¼ i 0 : The balanced general linear mixed model is with y $ N Np ðX m b, WÞ and y i independent of y i 0 for i, i 0 2 1, 2, :::, N f g , i 6 ¼ i 0 : The ða Â qÞ matrix C contains the between-independent-sampling-unit contrasts and the ðp Â bÞ matrix U contains the within-independent-sampling-unit contrasts. The ðab Â pqÞ contrast L ¼ C U 0 defines the secondary parameter h ¼ Lb and the corresponding null hypothesis The Wald parameter is defined as with X m the design matrix speculated to exist were the study to be observed.

Vector of non-missing data indicators
The balanced linear mixed model described in Equation 1 has no missing response data. Often, in biomedical studies, responses at planned measurements are not observed, resulting in missing response data, and a sample size that may be smaller than the planned sample size, N. We define notation for the probability that y i, j is missing or non-missing. The indicator variable d i, j has realization d Ã i, j : If y i, j is missing then d Ã i, j ¼ 0: With j 6 ¼ j 0 and j, j 0 2 1, 2, :::, p f g , the joint probability mass function is defined by An implicit requirement is that p jj 0 minðp j , p j 0 Þ: It is important to recognize that implicit in Equations 4 and 5 is a homogeneity assumption: the probability that a response is non-missing is the same for all i.
The ðp Â 1Þ vector of non-missing data indicators for independent-sampling-unit i is

Conditional linear missing data process
In the previous section, we defined notation to describe whether a response was present or missing. We turn now to examining the underlying probability processes which dictate the presence or absence of the responses. For the sake of clarity, we use the word "process" to describe missingness. This choice is in contrast to the word "model" used in Section 2 to describe the association between outcomes and predictors. For data analysis, Qaqish (2003) described binary responses using a class of conditional linear models with constrained covariance parameters. We can adapt his model to characterize a missing data process, referred to hereafter as the conditional linear process (CLP). The conditional linear process has two parameters. For j 2 1, 2:::, p f g, we define the ðp Â 1Þ vector of the marginal probabilities that a response at a given repeated measurement j is non-missing as p ¼ p 1 p 2 ::: p p Â Ã 0 : Further, we write p jÀ1 to indicate the ðj À 1Þ vector p 1 p 2 ::: p jÀ1 Â Ã 0 : We use c jj 0 to denote the cor- Qaqish (2003) showed that bounding yields a positive semidefinite covariance matrix while strict inequalities produce positive definite covariance matrices. For the conditional linear process, we write the ðp Â 1Þ vector of expected values Eðd i Þ ¼ p and the ðp Â pÞ covariance Vðd i Þ ¼ U: Both p and U are defined by a conditional model, which gives the result for measurement j, conditioned on the previous ðj À 1Þ measurements.
To describe the missing states for the first j measurements on independent sampling unit i, we define the ðj Â 1Þ vector d Similarly, we define the ðj Â jÞ matrix Vðd i, j Þ ¼ U j to contain the variance and covariance parameters for the first j measurements on independent sampling unit i. With Cðd i, k , d i, j Þ ¼ s k, j , k 2 1, 2, :::, j À 1 f g , and the ðj À 1Þ Â 1 Â Ã vector of covariances s jÀ1 ¼ s 1, j s 2, j ::: s jÀ1, j Â Ã 0 , we obtain Variance elements in U fall in 0, 0:25 ½ : Covariance elements are a function of the chance that both elements are non-missing. Using the result in Equation 7, we write 2.5. Missing data summary statistics Barton and Cramer (1989) and Catellier and Muller (2000) suggested using missing data summary statistics to summarize an observed missing data pattern. However, when designing a study and calculating power, the pattern of missingness has not yet been observed. Instead, one must consider the expected value of the missing data summary statistic with respect to the missing data process. The expected value is a weighted average over all possible realizations of the missing data process. For independent sampling unit i, we define the number of missing responses r i 2 0, 1, :::, p f gas A complete case is indicated by For j 2 1, 2, :::, p f g , the number of independent sampling units who have measurement j present is given by for N jj 2 f0, 1, :::, Ng: The missing data summary statistics N k , k 2 c, m, p f g are defined in Table 1. For the formulae presented in Table 1, we define

Power for balanced linear mixed models with missing data
We provide power approximations for balanced linear mixed models with no missing data, and for the complete-and observed-case analyses. We restrict attention to the Wald test with the Kenward-Roger Roger 1997, 2009) reference distribution. The new power approximation is based on a sequence of previous published works. Muller et al. (2007) and Edwards et al. (2008) demonstrated that a multivariate model can be converted to a balanced linear mixed model, and that the Hotelling-Lawley trace statistic can be converted to the Wald test statistic. Under the alternative hypothesis, H A : h 6 ¼ h 0 , for balanced linear mixed models with no missing data, a function of the Wald statistic has an approximate noncentral F distribution (Edwards et al. 2008). Ringham et al. (2016) presented a power approximation for balanced linear mixed models using a multivariate noncentrality parameter. In this work, we demonstrate that the multivariate noncentrality parameter (Ringham et al. 2016) can be transformed into the noncentrality parameter for balanced linear mixed models, given below in Equation 13. The transformation is detailed in Section A.1 of the Appendix.
In the remainder of Section 3 we outline the approximation method. The error degrees of freedom are e ¼ N À rankðXÞ: With missing data, with EðN k Þ as given in Table 1, we write the adjusted error degrees of freedom as k ¼ EðN k Þ À rankðXÞ: The noncentrality parameter is given by As defined previously, a and b are the ranks of the between-and within-independentsampling-unit contrast matrices and that f W ðh, WÞ is the Wald parameter (Equation 3).
Let F À1 f indicate the quantile function for a noncentral F random variable. Following the approach in Muller et al. (1992Muller et al. ( , 2007 and Ringham et al. (2016), we use the McKeon (1974) approximation for the Hotelling-Lawley trace to define the degrees of freedom. With k replacing e in the forms for no missing data, the numerator and denominator degrees of freedom are 1 ¼ ab and respectively. For f 0 ð k Þ % F À1 f 1 À a; ab, 2 ð k Þ ½ , we approximate power by We use the notation 2 ð k Þ and xð k Þ to emphasize the dependence of the parameters on k .
Equation 15 gives approximate power for two missing data analysis approaches. To calculate the approximate power for the complete-case analysis, one uses Equation 15 evaluated with k ¼ c ¼ EðN c Þ À rankðXÞ: In this setting EðN c Þ represents the expected number of complete cases. To calculate the approximate power for the observed-case analysis, one uses Equation 15 evaluated with k ¼ m ¼ EðN m Þ À rankðXÞ: We substitute EðN m Þ for N in the observed-case as it characterizes the expected number of observations at each repeated measurement.
For no missing data, the results of Equation 15 reduces to the power approximation suggested by Edwards et al. (2008). The results are equivalent because k ¼ p ¼ N p À rankðXÞ ¼ N À rankðXÞ ¼ e , xð k Þ ¼ ab Á f W and thus, Powerð k Þ ¼ Powerð e Þ: In an even more restrictive case, with no missing data and s ¼ min a, b f g ¼ 1, the noncentral distribution of the test statistic is known and the power results are exact, rather than approximate.

Simulation methods
We evaluated the accuracy of the power approximations via simulation. To compute empirical power, we defined a, the Type I error rate, X, C, U, b, h 0 and W: We computed X m ¼ X I p , L ¼ C U 0 and h ¼ Lb: We generated realizations of e and computed realizations of y as in Equation 1. We randomly generated realizations of d i , the vector of non-missing data indicators, using different missing data processes. We then created y Ã by setting each corresponding value in y to present or missing, as indicated by d i : The process was repeated for 10,000 replications.
For each replication, we attempted to fit a general linear mixed model, with y Ã as the response vector. Mixed models were fit in SAS 9.4 (SAS Institute Inc., 2017) using the PROC GLIMMIX procedure with double dogleg optimization. The modeling approach used an unstructured covariance, the REPEATED statement and the KR2 (Kenward Roger) option for the denominator degrees of freedom.
We evaluated the accuracy of the approximations separately for the complete-and observed-case approaches. For the complete-case approach, we analyzed only the independent sampling units with all measurements present. For the observed-case approach, we analyzed all independent sampling units with at least one measurement present.
For some replicates, we were unable to fit the mixed model. The parameters b and hence h are estimable only if there is at least one data point for each within-by-between cross-classification. In some cases, there were sufficient data so that b was technically estimable, but the model failed to converge, usually because the Hessian matrix was not positive definite. We recorded the number of replicates for which b was estimable, and the number of replicates for which the model converged. Convergence rate was calculated as the number of replicates for which the model converged divided by the number of replicates for which b was estimable.
For each model that converged, we calculated the Wald statistic and computed a pvalue using the Kenward-Roger (Kenward and Roger 2009) degrees of freedom. Empirical power was computed as the number of p-values less than or equal to a, divided by the number of replicates for which the model converged. The simulation study used 10, 000 replications so that the error in the estimation of the empirical power would occur in the second decimal place.
The analytic power approximations for the complete-case and observed-case analyses were computed as in Equation 15. The absolute deviation was calculated as the absolute value of the difference between the approximate power and the empirical power. The maximum absolute deviation was computed as the maximum of the absolute deviations across a specified range of experimental conditions.
We examined the accuracy of the power approximations for five of the nine scenarios previously considered by Johnson et al. (2009) and then by Kreidler et al. (2013). The subset was chosen to ensure that the Wald statistic and the mixed model were appropriate. Below, we provide results for two of the five scenarios: Scenario 4 and Scenario 5. Scenario 4 involves a test of a time-by-treatment interaction in a design with four exposure groups and three repeated measurements over time. Scenario 5 also involves a test of a time-by-treatment interaction, but for a design with two exposure groups, and five repeated measurements over time. Detailed descriptions of the study designs for Scenarios 4 and 5 appear in Section A.5 in the Appendix. The online supplemental material contains study designs for the remaining scenarios.
Scenarios 4 and 5 were chosen to allow evaluation of the accuracy of the power approximations in two important cases. In Section 3, we defined s ¼ min a, b f g to allow us to distinguish between single and multiple degree of freedom hypothesis tests. For the design in Scenario 4, s > 1, and the power results would be approximate even with complete data. For the design in Scenario 5, s ¼ 1. This means that the analytic power results would be exact with no missing data, but are approximate when some measurements are missing.
For each experimental scenario, we varied the analysis approach (complete-or observed-case), the planned sample size, the missing data process, the parameter inputs for the missing data process (p and U) and the scaling parameters for the error variance and mean difference (d r and d b , respectively).

Simulation results
Over all experimental conditions, the minimum number of replications for which b was estimable was 9, 431 out of the planned 10, 000 replicates. The convergence rates across most experimental conditions were greater than 0.99. The exception were conditions for Scenario 5 with a planned sample size of 20, which gave the minimum convergence rate of 0.78.
A subset of the results for Scenario 4 appears in Table 2. A subset of the results for Scenario 5 appears in Table 3. The online supplemental material contains results for the remaining scenarios and experimental conditions. We report EðN c Þ for the completecase approach, EðN m Þ for the observed-case approach, the approximate analytic power, the empirical power and the absolute deviation.
Across both experimental scenarios, and all experimental conditions, the maximum absolute deviation between the approximate and analytic power was less than 0.056 (data not shown). Most scientists design studies so that they have power between 0.8 and 0.95. For all scenarios where the analytic power was greater than or equal to 0.8, the maximum absolute deviation was less than 0.052. The accuracy of the approximations was similar across different sample sizes, values of variance, mean differences, types of missing data processes, inputs for the missing data processes and choice of analysis approach. Results for the other five scenarios shown in the online supplemental material were similar. Schenkman et al. (2012) conducted a randomized controlled clinical trial of three different forms of exercise therapy: flexibility/balance/function, standard aerobic exercises and a home-based exercise regimen. The outcome was the Continuous Scale Physical Function test at baseline, 4, 10, and 16 months after randomization. Investigators propose a similar follow-up study with a different sample from the same population. For the follow-up study, investigators will fit a general linear mixed model with indicator variables for the three treatments as predictors and the four repeated measurements of the physical function test as outcomes. The goal will be to evaluate the time-by-treatment interaction, using the mixed model Wald test with Kenward-Roger degrees of freedom and an a level of 0.05.

Randomized controlled clinical trial in parkinson's disease
We provide a power analysis for the follow-up study. Model and hypothesis parameters are given in Section A.6 in the Appendix. We assumed a conditional linear missing data process. The missing data process parameter estimates are taken from the preliminary data analysis, with the covariance matrix of the missingness given by We consider three patterns for p: Pattern A corresponds to complete data, with p ¼ 1 1 1 1 Â Ã 0 : Pattern B corresponds to the missing data pattern observed in the preliminary data analysis, with p ¼ 1 0:868 0:818 0:793 Â Ã 0 : Pattern C corresponds to half the missing data observed in the preliminary analysis, with p ¼ 1 0:736 Â 0:636 0:587 0 : Power curves for the three patterns are shown in Figure 1. Power is shown on the y axis. The x axis is in units of the largest element in the interaction coefficient matrix

Discussion
Missing data occurs everywhere in biomedical research, from longitudinal epidemiologic observational studies to randomized controlled clinical trials. Missing data can strongly affect the power of biomedical research studies and thus the choice of sample size. If the sample size is larger than needed, too many participants are exposed to the potential risk of research. If the sample size is smaller than needed, resources, investigator and participant time are wasted on a study that has insufficient statistical power to answer the question of interest. Accurate power and sample size calculations must take into account the chance of missing data. The methods proposed in the current work allow power calculations for some important classes of models and missing data processes. We consider balanced linear mixed models and the Wald test with Kenward-Rogers degrees of freedom. We also used a flexible model of the missing data process to allow us to examine how different missing data patterns affect the results of the power analysis.
The work extends previous power approximations for data without missing values (Muller et al. 2007;Edwards et al. 2008) and for data with values missing completely at random with equal probability (Ringham et al. 2016). The new methods accommodate complex missing data processes (Qaqish 2003) to better mirror processes likely to occur in real repeated measures studies. The missing data processes we consider allow a different chance of missingness for each measurement in a repeated measures study. In addition, the missing data processes allow for the chance of missingness for one measurement to be correlated with the chance of missingness of another measurement. We provide different power approximations for the complete-and observed-case data analytic approaches, both of which are commonly used in biomedical research. The approximation has adequate accuracy for the scenarios scientists care about the most, with true power between 0.8 and 0.95. The error in the power approximation is in the second decimal place.
A version of the approximations described here will be included in Version 3 of GLIMMPSE (Kreidler et al. 2013), available at SampleSizeShop.org. The GLIMMPSE power and sample size software for multilevel and longitudinal studies is user-friendly, point-and-click and available to run without cost from a web browser. Users who prefer R code to implement the methods may request a copy from the authors.
To approximate power for a study with no missing data, a scientist needs to specify the Type I error rate, predictor matrix, error variance, between-and within-independent sampling unit contrasts and sample size. To conduct an accurate power calculation with outcome data missing completely at random, a scientist must, in addition, choose an appropriate missing data process. The choice should depend on the scientist's assumptions about the missing data process. The conditional linear missing data process can accommodate several assumptions, including autoregressive and unstructured correlation structures. Estimates for the chance that the response is missing at each measurement, and the correlation between missingness at each measurement can be obtained from literature reports, from previous unpublished data, or from an understanding of the rationale for missing data in a certain study design, disease state, or treatment regimen.
While the power approximations cover a useful set of models and missing data processes, there are limitations to the work. First, the assumption that the data are missing completely at random (MCAR) may not hold. The MCAR assumption means that the chance that a response is missing does not depend on observed data, including covariates. Secondly, the manuscript only proposes power analyses for balanced linear mixed models with Gaussian outcomes. This means that the results may not hold for studies with mistimed data, time-dependent covariates or non-Gaussian outcomes.
In future work, we hope to develop a maximum likelihood approach to estimate marginal missingness, and correlation between missingness from completed studies. It would be useful to develop a corresponding likelihood ratio test to aid scientists in characterizing a missing data process. We also plan to relax the assumption that the outcome data are missing completely at random. We hope to consider a larger class of models and outcomes, including non-linear mixed models, and binary and Poisson distributed outcomes. It is important to consider how power calculations are affected by the use of observed (and hence random) percentages of missingness at each measurement drawn from previous experiments. An additional open question is how to compute confidence intervals for power for studies with missing data. For ethical reasons, some investigators may prefer to consider quantiles of power, rather than power calculated for an expected amount of missingness.

Disclaimer
This manuscript was submitted to the Department of Biostatistics and Informatics in the Colorado School of Public Health, University of Colorado Denver, in partial fulfillment of the requirements for the degree of Master of Science in Biostatistics for KPJ. The content of this paper is solely the responsibility of the authors, and does not necessarily represent the official views of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, the National Center for Advancing Translational Sciences, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of General Medical Sciences, the National Institute of Neurological Disorders and Stroke, the National Library of Medicine, the Office of the Director, the National Institutes of Health, the American Heart Association nor the Parkinson's Disease Foundation. The authors have no conflicts of interest to disclose. The second moment is yielding Vðd i, j Þ ¼ p j ð1 À p j Þ: For d i, j d i, j 0 2 f0, 1g, Eðd i, j d i, j 0 Þ ¼ p jj 0 since d i, j d i, j 0 has a Bernoulli distribution. This gives Cðd i, j , d i, j 0 Þ ¼ p jj 0 À p j p j 0 :

A.3. Moments for N c
The moments can be derived by a process of induction. To begin, we show that Eð Q p j¼1 d i, j Þ ¼ p 1 Q p j¼2 ½p j þ s 0 jÀ1 U þ jÀ1 ð1 jÀ1 À p jÀ1 Þ: The initial step gives E d i, 1 ð Þ ¼ p 1 : For the base case, allow p ¼ 2. Then For the induction step, suppose that The complete-case indicator variable has a Bernoulli distribution such that Since we assume that for all i, i 0 2 1, 2, :::, N f g , i 6 ¼ i 0 and j, j 0 2 1, 2, :::, p f g , Cðd i, j , d i 0 , j 0 Þ ¼ 0, the sum of complete-case indicator variables has a binomial distribution with