Optimal and Efficient Crossover Designs for Test-Control Study When Subject Effects Are Random

We study crossover designs based on the criteria of A-optimality and MV-optimality under the model with random subject effects, for the purpose of comparing several test treatments with a standard control treatment. Optimal and efficient designs are proposed, and their efficiencies are evaluated. A family of totally balanced test-control incomplete crossover designs based on a function of the ratio of the subject effect variance to the error variance is shown to be highly efficient and robust. The results have interesting connections with those in Hedayat and Yang (2005) and Hedayat, Stufken, and Yang (2006). The proofs omitted from the article are included in the online supplemental material.


INTRODUCTION
To compare the effects of various treatments we could, if feasible, conduct a crossover design where each subject in the study receives each treatment in succession. Besides the budget consideration, one primary advantage of adopting crossover designs is that the treatment effects are no longer confounded with the subject effects, and hence the systematic bias in estimating the treatment effects could be reduced substantially. However, the period effects and carryover effects could come into the picture at the same time, which makes the design problem more complex. Our interest here is to find optimal or efficient designs for the purpose of estimating direct treatment effects rather than carryover effects.
As in Hedayat, Stufken, and Yang (2006), we assume the subject effects to be random, a model that can be regarded as intermediate between the model with fixed subject effects and the model with no subject effects, in terms of the information matrix for the direct treatment effects. They studied universal optimality, for which the totally balanced designs proposed therein exhibit symmetry among all treatments. In this article, we focus on A-optimality and MV-optimality, which are suited to comparing multiple test treatments with a control treatment. We investigate the high efficiency of totally balanced test-control incomplete (TBTCI) crossover designs, which exhibit symmetry among test treatments. Interestingly, the class of TBTCI designs covers the totally balanced designs as a special case when the control treatment has the same replication as the test treatments. The gain in generality for designs is due to our relaxation of the criterion from universal optimality to one particular criterion, either A-optimality or MV-optimality. Note that θ, the ratio between the variance of the subject effects and the variance of the error term, is still critical for the choice of designs.
Sam A. Hedayat is Professor (E-mail: Hedayat@uic.edu) and Wei Zheng is Ph.D. Candidate, Department of Mathematics, Statistics & Computer Science, University of Illinois at Chicago, Chicago, IL 60607. Research was primarily sponsored by the National Science Foundation grants DMS-0603761 and DMS-0904125, and the NIH grant P50-AT00155 (jointly supported by the National Center for Complementary and Alternative Medicine, the Office of Dietary Supplements, the Office for Research on Women's Health, and the National Institute of General Medicine). The contents are solely the responsibility of the authors and do not necessarily represent the official views of NIH.
We are very grateful to Professor Min Yang, Department of Statistics, University of Missouri at Columbia, for his many valuable and insightful comments and suggestions while this article was under preparation. We also acknowledge with thanks the suggestions for improving the article that we received from the Editor, the Associate Editor, and the two referees.
The TBTCI designs were first proposed by Hedayat and Yang (2005) in their study of A-optimality and MV-optimality for the model with fixed subject effects, that is, θ = ∞ in our setup. They proved that a certain type of TBTCI design is optimal in Λ, a mildly restricted subclass of competing designs, and conjectured that Λ does not exclude too many good designs, so that the optimal designs in Λ are still highly efficient in Ω, the unrestricted class of designs. How efficient these designs are, however, was not investigated. In this article, we generalize their results and find optimal designs in Λ for general models with random subject effects, that is, for general θ ≥ 0. We also characterize the optimal designs in Ω when θ = 0, which corresponds to the model with no subject effects. Moreover, we give an explicit way to evaluate the efficiency of any design in Ω, for any θ, without finding the optimal designs.
The article is organized as follows. Section 2 introduces the model on which the study of designs is based and provides notation used in the subsequent sections. Section 3 introduces the optimality criteria and the corresponding optimal and efficient designs. Section 4 describes the method of deriving a lower bound for the A-efficiency of any design, which is used to evaluate the efficiencies of the designs proposed in Section 3. Section 5 summarizes the results of this article by comparing them with relevant results in the existing literature and proposes some future directions of research. Section 6 gives the proofs of Theorems 1 and 2 in Section 3 through several lemmas, while further support for the derivation of these lemmas is deferred to the supplemental material.

MODEL ASSUMPTIONS AND NOTATIONS
We denote by $\Omega_{t+1,n,p}$ the set of all designs in which $n$ subjects are used in $p \ge 3$ occasions, called periods, for the purpose of evaluating and studying one control treatment and $t \ge 2$ test treatments. Hereafter, we designate the $t$ test treatments by $1, 2, \ldots, t$ and the control treatment by $0$. For a continuous response $Y$, a plausible and useful linear model with random subject effects can be written as
$$Y_{dku} = \mu + \pi_k + \tau_{d(k,u)} + \rho_{d(k-1,u)} + \varsigma_u + \varepsilon_{ku}, \tag{1}$$
where we assume the subject effect $\varsigma_u \overset{iid}{\sim} (0, \sigma_\varsigma^2)$, the error term $\varepsilon_{ku} \overset{iid}{\sim} (0, \sigma^2)$, and $\{\varsigma_u\} \perp\!\!\!\perp \{\varepsilon_{ku}\}$. The identical-distribution condition in the notation iid is not essential here and could be removed. Here, $Y_{dku}$ denotes the response from subject $u$ in period $k$, to which treatment $d(k,u) \in \{0, 1, 2, \ldots, t\}$ was assigned by design $d \in \Omega_{t+1,n,p}$, $k = 1, \ldots, p$, and $u = 1, \ldots, n$. Furthermore, $\mu$ is the general mean, $\pi_k$ is the $k$th period effect, $\tau_{d(k,u)}$ is the (direct) treatment effect of treatment $d(k,u)$, and $\rho_{d(k-1,u)}$ is the (first-order) carryover or residual effect of treatment $d(k-1,u)$ that subject $u$ received in the previous period (by convention $\rho_{d(0,u)} = 0$). Finally, $\sigma_\varsigma^2$ is the subject effects variance, and $\sigma^2$ is the error variance.
Writing the $np \times 1$ response vector as $Y_d = (Y_{d11}, Y_{d21}, \ldots, Y_{dp1}, Y_{d12}, \ldots, Y_{dpn})'$, model (1) can be expressed in matrix form as
$$E(Y_d) = 1_{np}\mu + P\pi + T_d\tau + F_d\rho, \qquad \mathrm{Var}(Y_d) = \sigma^2 V, \tag{2}$$
where $V = I_n \otimes (I_p + \theta J_p)$ and $\theta = \sigma_\varsigma^2/\sigma^2$. Here $\pi = (\pi_1, \ldots, \pi_p)'$, $\tau = (\tau_0, \ldots, \tau_t)'$, $\rho = (\rho_0, \ldots, \rho_t)'$, $P = 1_n \otimes I_p$, and $T_d$ and $F_d$ denote the treatment/subject and carryover/subject incidence matrices. The notation $'$ represents the transpose operator; $\otimes$ represents the Kronecker product; $1_s$ represents the column vector of length $s$ with all its entries equal to 1; $I_s$ represents the $s \times s$ identity matrix; and $J_s = 1_s 1_s'$. The information matrix $C_d$ for $\tau$ under model (1) can now be expressed as
$$C_d = T_d' V^{-1/2}\, \mathrm{pr}^{\perp}\!\left(V^{-1/2}[1_{np} \mid P \mid F_d]\right) V^{-1/2} T_d, \tag{3}$$
where $\mathrm{pr}^{\perp}$ is a projection operator such that $\mathrm{pr}^{\perp} A = I - A(A'A)^{-}A'$ for any matrix $A$.
Throughout the article, for each design $d$, we adopt the notation $n_{diu}$, $\tilde n_{diu}$, $l_{dik}$, $m_{dij}$, $r_{di}$, $\tilde r_{di}$ to denote, respectively, the number of times that treatment $i$ is assigned to subject $u$, the number of times this happens in the first $p - 1$ periods associated with subject $u$, the number of times treatment $i$ is assigned to period $k$, the number of times treatment $i$ is immediately preceded by treatment $j$, the total replication of treatment $i$ in the $n$ experimental subjects, and the total replication of treatment $i$ limited to the first $p - 1$ periods in the design. Also, we define the subclass of designs
$$\Lambda_{t+1,n,p} = \{ d \in \Omega_{t+1,n,p} \mid l_{d0k} = r_{d0}/p,\ k = 1, \ldots, p, \text{ and } m_{dii} = 0,\ i = 0, 1, \ldots, t \}.$$
Therefore, for any design in $\Lambda_{t+1,n,p}$, the control treatment appears equally often in all periods and no treatment is allowed to be preceded by itself. Additionally, we adopt the following conventions: for any two square matrices $A$ and $B$ of the same size, the inequality $A \le B$ represents the Loewner ordering of the matrices; $\mathrm{Tr}(A)$ represents the trace of the matrix $A$; $[x]$ represents the greatest integer not exceeding $x$; $B_s = \mathrm{pr}^{\perp} 1_s = I_s - J_s/s$; $\{S_i, i = 1, 2, \ldots, t!\}$ is the set of all $t \times t$ permutation matrices and
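The two conditions that define the restricted subclass can be checked mechanically from a design array. The following is a minimal sketch, assuming a design is stored as a list of periods, each a list of treatment labels indexed by subject; the function names and the two toy designs are hypothetical and serve only as an illustration. Since $r_{d0} = \sum_k l_{d0k}$, the condition $l_{d0k} = r_{d0}/p$ for all $k$ is the same as all $l_{d0k}$ being equal.

```python
def control_counts_by_period(design):
    """Return [l_{d0k}] for k = 1..p; design[k][u] is the treatment
    given to subject u+1 in period k+1 (0 denotes the control)."""
    return [period.count(0) for period in design]


def self_precedence_counts(design, t):
    """Return [m_{dii}] for i = 0..t: how often treatment i is
    immediately preceded by itself on the same subject."""
    counts = [0] * (t + 1)
    for prev, curr in zip(design, design[1:]):
        for a, b in zip(prev, curr):
            if a == b:
                counts[a] += 1
    return counts


def in_restricted_subclass(design, t):
    """True if control is equally replicated across periods and no
    treatment is preceded by itself (the two defining conditions)."""
    equal_control = len(set(control_counts_by_period(design))) == 1
    no_self = all(m == 0 for m in self_precedence_counts(design, t))
    return equal_control and no_self


# Hypothetical toy designs with t = 2 test treatments, p = 3, n = 4.
toy_ok = [[0, 1, 2, 1],
          [1, 0, 1, 2],
          [2, 1, 0, 1]]
toy_bad = [[0, 1, 2, 1],
           [1, 0, 1, 2],
           [1, 1, 0, 1]]  # subject 1 receives treatment 1 twice in a row

print(in_restricted_subclass(toy_ok, 2), in_restricted_subclass(toy_bad, 2))
```

In the second toy design the control is still equally replicated across periods, so only the self-precedence condition fails.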

OPTIMAL AND EFFICIENT DESIGNS
In comparing two or more test treatments with a control, the most frequently used optimality criteria are A-optimality and MV-optimality.
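Definition 1. (i) In a class of competing designs, a design is said to be A-optimal if it minimizes $\sum_{i=1}^{t} \mathrm{Var}_d(\hat\tau_0 - \hat\tau_i)$. (ii) In a class of competing designs, a design is said to be MV-optimal if it minimizes $\max_{1 \le i \le t} \mathrm{Var}_d(\hat\tau_0 - \hat\tau_i)$.
Since the row and column sums of $C_d$ are both 0, we can express $C_d$ as
$$C_d = \begin{pmatrix} 1_t' M_d 1_t & -1_t' M_d \\ -M_d 1_t & M_d \end{pmatrix}. \tag{4}$$
The $t \times t$ submatrix $M_d$ is closely related to A-optimality and MV-optimality. For completeness and convenience, we cite a well-known result as Lemma 1 below, as was done by Hedayat and Yang (2005).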
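For reference, the standard form of this well-known result can be sketched in the notation above. The display below assumes that $M_d$ is nonsingular and that $\sigma^2$ has been factored out of the information matrix; the exact wording of Lemma 1 may differ.

```latex
% Sketch under the stated assumptions: the covariance matrix of the
% test-control contrasts (\hat\tau_0 - \hat\tau_i), i = 1, ..., t,
% is \sigma^2 M_d^{-1}, so that
\[
  \sum_{i=1}^{t} \mathrm{Var}_d(\hat\tau_0 - \hat\tau_i)
    = \sigma^2\, \mathrm{Tr}\!\left(M_d^{-1}\right),
  \qquad
  \max_{1 \le i \le t} \mathrm{Var}_d(\hat\tau_0 - \hat\tau_i)
    = \sigma^2 \max_{1 \le i \le t} \left(M_d^{-1}\right)_{ii}.
\]
% Hence a design is A-optimal if it minimizes Tr(M_d^{-1}) and
% MV-optimal if it minimizes the largest diagonal entry of M_d^{-1}.
```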
Lemma 1 indicates that $M_d$ contains all the information needed to evaluate the A-optimality and MV-optimality of a design. Since $M_d$ is in turn the submatrix of $C_d$ obtained by deleting its first row and first column, designs with the same $C_d$ are equivalent in the A-sense and MV-sense. On the other hand, the matrix $C_d$ is a function of the unknown parameter θ, hence the determination of optimal designs depends on the value of θ. As pointed out by Hedayat, Stufken, and Yang (2006), two extreme cases are worth mentioning. The case θ = 0 corresponds to the situation of no subject effect. It is easily seen that
$$C_d = T_d'\, \mathrm{pr}^{\perp}([1_{np} \mid P \mid F_d])\, T_d,$$
which would indeed be precisely the information matrix for $\tau$ if we were to ignore the subject effect. Under this special case, Theorem 1 below gives the optimal designs in Ω. The other extreme case corresponds to θ = ∞, and we have
$$\lim_{\theta \to \infty} C_d = T_d'\, \mathrm{pr}^{\perp}([1_{np} \mid P \mid U \mid F_d])\, T_d,$$
where $U = I_n \otimes 1_p$. This limit is precisely the information matrix that we would have obtained had we treated the subject effects as fixed. Under this special case, Hedayat and Yang (2005) gave the optimal designs in the subclass Λ, and they conjectured that optimal designs in Λ are still highly efficient in Ω. In this article, we derive optimal designs in Λ for any value of θ, which covers their result as a special case. We also give an explicit way of evaluating the efficiency of any design for any value of θ.
Theorem 1. If θ = 0, then for any n, t, p, a design d is simultaneously A-optimal and MV-optimal in $\Omega_{t+1,n,p}$ if it satisfies: when the latter is an integer.
We would like to give an intuitive explanation for the conditions imposed in Theorem 1. Conditions 2-4 impose some structure of symmetry among test treatments and periods. Condition 5 directly determines the replication of each treatment in conjunction with Condition 4. Finally, Condition 1 requires an exact relationship between two types of variables, which is too strong in general. When n = 18, t = 4, p = 3, we need $r_{d0} = np/(1 + \sqrt{t}) = 18$. In this situation, the design $d_1$ below, constructed by Yang and Stufken (2008), is optimal in $\Omega_{5,18,3}$ when θ = 0 (rows are periods, columns are subjects):
    1 1 1 2 2 2 3 3 3 4 4 4 0 0 0 0 0 0
    1 2 3 2 3 4 3 4 0 4 0 0 0 0 1 0 1 2
    4 0 0 0 1 2 2 1 2 3 4 0 3 3 0 4 0 1
For general θ, it is hard to compare all designs in Ω. However, we succeeded in finding optimal designs in the subclass Λ. In order to introduce the result, it is necessary to give the definition of a class of designs proposed by Hedayat and Yang (2005).
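The arithmetic behind the n = 18, t = 4, p = 3 example can be checked directly. The sketch below reads the 3 × 18 array given above, with rows taken as periods and columns as subjects, and computes the replication of each treatment and the control count in each period; the variable names are ours and are not part of the original construction.

```python
import math

# The design array from the example: one string per period,
# one entry per subject, with 0 denoting the control treatment.
rows = [
    "1 1 1 2 2 2 3 3 3 4 4 4 0 0 0 0 0 0",
    "1 2 3 2 3 4 3 4 0 4 0 0 0 0 1 0 1 2",
    "4 0 0 0 1 2 2 1 2 3 4 0 3 3 0 4 0 1",
]
d1 = [[int(x) for x in row.split()] for row in rows]

p, n, t = len(d1), len(d1[0]), 4

# r_{di}: total replication of treatment i, for i = 0..t.
replication = [sum(period.count(i) for period in d1) for i in range(t + 1)]

# l_{d0k}: number of subjects receiving the control in period k.
control_by_period = [period.count(0) for period in d1]

# Control replication required in the example: np / (1 + sqrt(t)).
target = n * p / (1 + math.sqrt(t))

print(replication, control_by_period, target)
```

The control is replicated 18 times, 6 times in each period, and each test treatment is replicated 9 times, which agrees with the required value $np/(1+\sqrt{t}) = 18$.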
, then for any value of θ a design d is simultaneously A-optimal and MV-optimal in t+1,n,p if d is a TBTCI design and where (ii) Suppose p = 3 and t = 2, then the conclusion in (i) is still valid if we change the class of competing designs from t+1,n,p to {d ∈ t+1,n,p |r d * 0 /n ≥ 0.6306}.
Theorem 2 indicates that θ influences the determination of optimal designs in Λ by deciding the value of $r_{d0}$. Based on this $r_{d0}$, we could try to find a TBTCI design. Also note that Equation (5) is analogous to Condition 5 of Theorem 1. Now let us compare the seven conditions in Definition 2 with Conditions 1-4 in Theorem 1. Conditions 2-4 in Definition 2 are identical to Conditions 2-4 in Theorem 1. Conditions 5-7 in Definition 2 concern symmetry among subjects and their association with treatments. Theorem 1 does not impose these conditions since θ = 0 corresponds to the model with no subject effects. Observe that Condition 1 in Definition 2 directly contradicts Condition 1 in Theorem 1, which indicates that TBTCI designs cannot be optimal when θ = 0. However, TBTCI designs satisfying Equation (5) are robust and highly efficient for all values of θ, as illustrated in the next section. Moreover, TBTCI designs exist more commonly than the designs in Theorem 1. Hedayat and Zheng (2010) gave different methods of constructing TBTCI designs.

EVALUATION OF THE EFFICIENCIES
For any θ, let us denote by $d(\theta)$ the corresponding optimal design; we can then define the A-efficiency of a design $d$ at this θ to be $AE(d, \theta) = \mathrm{Tr}(M_{d(\theta)}^{-1}) / \mathrm{Tr}(M_d^{-1})$. We are interested in deriving $AE(d, \cdot)$ for any design $d$, which requires $\mathrm{Tr}(M_{d(\cdot)}^{-1})$. However, $d(\cdot)$ is generally not known except when θ = 0. In this article, we instead derive a lower-bound curve $\ell(\cdot) \le \mathrm{Tr}(M_{d(\cdot)}^{-1})$; then $LB(d, \cdot) = \ell(\cdot) / \mathrm{Tr}(M_d^{-1})$ serves as a lower bound of $AE(d, \cdot)$. In the following, we will simply call $LB(d, \cdot)$ the efficiency of design $d$. Section 3 gives optimal designs in Ω when θ = 0, and optimal designs in Λ for general θ. We would like to evaluate the efficiencies of these designs.
For each design, we have where C (i) As in (4), we have the representation Further, we have the relationship For any $t \times t$ matrix $M$, define the functions $\varphi(M) = 1_t' M 1_t$ and $\phi(M) = \mathrm{Tr}(M) - 1_t' M 1_t / t$. By (6), (7), and (8), we have where $q_d = \min_{(x,y,z) \in R^3} Q_d(x, y, z)$, and By (46) in Supplemental Materials, we have since $Q_d(x, y, z)$ is a linear function of $M_d^{(i)}$, $i = 1, 2, 3$. By direct calculation, we have Let $T_d^u$ (resp., $F_d^u$) be the portion of $T_d$ (resp., $F_d$) corresponding to the $u$th subject, C (1u) d+ , i = 1, 2, 3, by ignoring the first row and the first column of the latter. If we further denote by $Q_{d+}^u(x, y, z)$ the analogues of $Q_{d+}(x, y, z)$ with its components M (i) (11) − η (r d0 − n(p − 1)/(t + 1))z + r d0 2 x 2 /t. (14)
Even though the notation $H_d^u(\cdot)$ has both the superscript $u$ and the subscript $d$, this function actually depends only on the sequence of treatments taken by the $u$th subject in design $d$. Hence the design influences the summation in (14) only through the choice of $n$ sequences, with replacement, from all $(t + 1)^p$ possible sequences.
For example, the four sequences 0234, 0194, 0267, and 0854 are TC-equivalent since all of them start with the control treatment 0, which is then followed by three distinct test treatments. Observe that two TC-equivalent sequences should result in the same function $H(\cdot)$. By classifying all the $(t + 1)^p$ sequences into, say, $J$ classes according to TC-equivalence and denoting by $H_j(\cdot)$ the function for the $j$th class, $j = 1, 2, \ldots, J$, we can write (14) in the form of where $w_{dj}$ is the proportion of sequences in design $d$ that belong to class $j$, and hence $\sum_{j=1}^{J} w_{dj} = 1$. Now we have By (9) and (16), we have:
Theorem 3. For any n, t, p, and θ, the reciprocal of the right-hand side of (16) is a lower bound of $\mathrm{Tr}(M_d^{-1})$ for any $d \in \Omega_{t+1,n,p}$.
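The classification of sequences into TC-equivalence classes can be sketched as follows. The code assumes, consistently with the example of 0234, 0194, 0267, and 0854, that two sequences are TC-equivalent when one is obtained from the other by relabeling the test treatments while keeping the control 0 fixed; this reading and the function names are our assumptions. Each sequence is mapped to a canonical representative by renumbering test treatments in order of first appearance.

```python
from collections import Counter
from itertools import product


def tc_canonical(seq):
    """Canonical representative of a sequence under relabeling of test
    treatments (control 0 is kept fixed). Assumed notion of equivalence."""
    relabel = {}
    out = []
    for trt in seq:
        if trt == 0:
            out.append(0)
        else:
            if trt not in relabel:
                relabel[trt] = len(relabel) + 1
            out.append(relabel[trt])
    return tuple(out)


def tc_classes(t, p):
    """Map each canonical representative to the number of sequences
    (out of (t + 1)**p) that fall into its class."""
    return Counter(tc_canonical(s) for s in product(range(t + 1), repeat=p))


examples = [(0, 2, 3, 4), (0, 1, 9, 4), (0, 2, 6, 7), (0, 8, 5, 4)]
print({tc_canonical(s) for s in examples})

classes = tc_classes(4, 3)
print(len(classes), sum(classes.values()))
```

Under this reading, the $5^3 = 125$ sequences for t = 4 and p = 3 collapse into 15 classes, which is what makes the maximization over class proportions tractable.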
For level (I) maximization, the problem could be reduced to the linear programming problem:
$$\text{maximize } \sum_{j=1}^{J} w_j H_j \quad \text{subject to } n \sum_{j=1}^{J} w_j a_j = h_1;\ \ n \sum_{j=1}^{J} w_j b_j = h_2;\ \ \sum_{j=1}^{J} w_j = 1;\ \ w_j \ge 0,\ j = 1, 2, \ldots, J.$$
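A linear program of this shape has three equality constraints, so an optimal solution can be found at a vertex with at most three nonzero weights. The sketch below solves a small instance by brute-force vertex enumeration with exact rational arithmetic. It is only an illustration: the values of $H_j$, $a_j$, $b_j$, $n$, $h_1$, and $h_2$ are hypothetical, the function names are ours, and a general LP solver would be used in practice. The enumeration assumes the constraint matrix has rank 3.

```python
from fractions import Fraction
from itertools import combinations


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve_level_one(H, a, b, n, h1, h2):
    """Maximize sum_j w_j H_j subject to n*sum w_j a_j = h1,
    n*sum w_j b_j = h2, sum w_j = 1, w_j >= 0, by enumerating
    basic feasible solutions (assumes a rank-3 constraint matrix)."""
    J = len(H)
    rhs = [Fraction(h1, n), Fraction(h2, n), Fraction(1)]
    best_value, best_w = None, None
    for cols in combinations(range(J), 3):
        A = [[Fraction(a[j]) for j in cols],
             [Fraction(b[j]) for j in cols],
             [Fraction(1) for _ in cols]]
        d = det3(A)
        if d == 0:
            continue  # columns are linearly dependent: not a basis
        w_basic = []
        for k in range(3):  # Cramer's rule for the 3 x 3 system
            Ak = [row[:] for row in A]
            for r in range(3):
                Ak[r][k] = rhs[r]
            w_basic.append(det3(Ak) / d)
        if any(x < 0 for x in w_basic):
            continue  # basic solution violates w_j >= 0
        value = sum(x * H[j] for x, j in zip(w_basic, cols))
        if best_value is None or value > best_value:
            w = [Fraction(0)] * J
            for x, j in zip(w_basic, cols):
                w[j] = x
            best_value, best_w = value, w
    return best_value, best_w


# Hypothetical data for J = 4 classes.
value, weights = solve_level_one(H=[0, 3, 4, 5], a=[0, 1, 2, 1],
                                 b=[0, 0, 1, 1], n=4, h1=4, h2=2)
print(value, weights)
```

For this toy instance the feasible set is a line segment, and the maximum is attained by splitting the weight equally between the second and fourth classes.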
and χ 1 is defined as in Theorem 2 in Section 3.
Proof. By ignoring $F_d$ in $C_d$, we have By the $\tilde S_i$ argument as in Lemma 4 in Section 6, we can obtain where In deriving (17), we used the fact $\sum_{i=1}^{t} (l_{dik} - r_{di}/p)^2 \ge (l_{d0k} - r_{d0}/p)^2 / t$, $k = 1, 2, \ldots, p$. The lemma is now concluded by noting $\sum_{u=1}^{n} n_{d0u}^2 \ge \chi_1$ and $\sum_{k=1}^{p} (l_{d0k} - r_{d0}/p)^2 \ge \chi_4$.
For level (III), we could use a Newton-type algorithm. When n = 36, p = 3, t = 4, a TBTCI design $d_2$ with $r_{d0} = 36$ can be constructed through the method in Hedayat and Zheng (2010); also, $d_3 = (1, 1) \otimes d_1$ is the optimal design when θ = 0. The corresponding lower bound $\ell(\cdot)$ specified by Theorem 3 can be calculated by the methods mentioned above. Note that the screening procedure for level (II) maximization reduced the computing time by 77.89%. The total time needed to calculate $\ell(\theta)$ at one value of θ is 386.50 seconds (CPU: Intel 2 Duo 1.80 GHz; Software: R), while the direct search for optimal designs would require $1.11 \times 10^{26}$ years.
1 1 2 2 2  1 1 1 2 2 2 3 3 3 4 4 4 0 0 0 0 0 0  2 3 4 1 3 4 1 2 4 1 2 3 2 3 4 1 3 4   3 3 3 4 4 4 1 1 1 2 2 2 3 3 3 4 4 4  0 0 0 0 0 0 2 3 4 1 3 4 1 2 4 1 2 3  1 2 4 1 2 3 0 0
Figure 1 shows the performance of $d_2$ and $d_3$ with respect to $\ell(\cdot)$. It is no surprise that $d_3$ is more efficient than $d_2$ when θ is small ($d_3$ is optimal when θ = 0, and $C_d$ is continuous in θ). However, $d_3$ becomes very inefficient when θ becomes large. In contrast, the TBTCI design $d_2$, even though not optimal, is always highly efficient for each θ. Also observe that the lower bound here is very tight, since it is so close to the A-criterion of an existing design when θ = 0 or ∞. Hence it is appropriate to use this lower bound to calculate the efficiency of designs. Figure 2 shows their efficiencies, calculated as $\ell(\theta) / \mathrm{Tr}(M_{d_i}^{-1})$, $i = 2, 3$, $\theta \in [0, \infty]$.
By Lemma 4 in Section 6, the A-criterion of a TBTCI design $d$ has the form $\mathrm{Tr}(M_d^{-1}) = n^{-1} g_{p,t}(r_{d0}/n, \theta)$ for some function $g$ which depends on the values of $p$ and $t$. Hence the A-criterion of a TBTCI design is proportional to $n^{-1}$ as long as the ratio $r_{d0}/n$ and the other parameters are fixed. The lower bound $\ell(\cdot)$ has a similar property. To see this, let $\ell^*(\cdot)$ be the reciprocal of the right-hand side of (16) when we allow $h_1$ and $h_2$ to be real numbers instead of integers in the maximization. Then $\ell(\cdot)$ should be very close to $\ell^*(\cdot)$, especially when $n$ is large, and we have $\ell^*(\theta) \le \ell(\theta)$ for any θ in general. Also, we have $\ell^*(\theta) = n^{-1} \tilde g_{p,t}(\theta)$, hence $\ell^*(\theta) / \mathrm{Tr}(M_d^{-1})$ depends only on the ratio $r_{d0}/n$. Figure 3 gives these more conservative efficiencies of TBTCI designs with the ratio being 1, 0.95, 0.9 (if they exist) in both Λ and Ω when p = 3 and t = 4. This figure gives guidance regarding what kind of TBTCI design should be chosen. Similar work could be carried out for other configurations of p and t. We postpone the discussion of possible improvements on current results to Section 5.
To evaluate the MV-efficiencies of TBTCI designs and of the type of designs proposed in Theorem 1 in Section 3, it is useful to note that the matrix $M_d$ is completely symmetric for them. For any of these designs, we have $MV_d = A_d / t$, where $A_d$ and $MV_d$ are the values of the A-criterion and the MV-criterion for the design $d$. Let $d_a$ and $d_{mv}$ be the A-optimal and MV-optimal designs. Since the largest of the $t$ variances is at least their average, $MV_{d_{mv}} \ge A_{d_{mv}} / t$, and therefore
$$\frac{MV_{d_{mv}}}{MV_d} \ge \frac{A_{d_{mv}}/t}{A_d/t} \ge \frac{A_{d_a}}{A_d}.$$
Hence we have:
Corollary 1. For any design proposed by Theorems 1 and 2 in Section 3, its MV-efficiency is at least as large as its A-efficiency regardless of the competing class of designs.

DISCUSSION AND CLOSING REMARKS
In a mixed linear model with random subject effects, the ratio (θ) of the variance of the subject effect to the variance of the error plays a crucial role in deciding which design should be used. Note that in estimating the treatment effects, θ = 0 corresponds to the model without subject effects, while θ = ∞ corresponds to the model with fixed subject effects. For the latter model, Hedayat and Yang (2005) found optimal designs in the subclass Λ and conjectured that Λ does not exclude too many good designs, so that optimal designs in this subclass are still highly efficient or even optimal in Ω. They did not investigate how efficient these designs are.
In this article, we dealt with general θ ≥ 0 and found optimal designs in Λ, which naturally cover the result in Hedayat and Yang (2005) as the special case θ = ∞. Moreover, we found optimal designs in Ω when θ = 0. Further, we gave an algorithm to calculate a lower bound of $\mathrm{Tr}(M_d^{-1})$, which turns out to be very close to the value attained by optimal designs. Hence it is appropriate to use this lower bound to evaluate the efficiencies of the designs proposed. Specifically, we evaluated the behavior of two types of designs. One is the optimal designs when θ = 0, whose efficiency decreases dramatically as θ increases. The other is the TBTCI designs with a properly chosen value of $r_{d0}$, which are shown to be robust across different values of θ. Since we usually do not know the value of θ in applications, this robustness is critical. Also, it is gratifying that TBTCI designs commonly exist (Hedayat and Zheng 2010), while the optimal designs for θ = 0 exist less commonly because of the first condition of Theorem 1 in Section 3.
In studying universal optimality, Hedayat, Stufken, and Yang (2006) proved the high efficiency of the totally balanced designs, which are essentially a special type of TBTCI designs with the property $r_{d0} = r_{d1}$. In particular, they proved the universal optimality of the totally balanced designs in $\Lambda_2 = \{d \in \Omega \mid l_{dip} = n/(t + 1),\ m_{dii} = 0,\ i = 0, 1, \ldots, t\}$, which certainly also holds for A-optimality and MV-optimality. However, as pointed out by one of the referees, optimality frameworks based on functions of $M_d$ place overriding weight on the control and hence lead to higher control replication, as shown in Section 4. In fact, the optimality of the totally balanced designs in $\Lambda_2$ established by Hedayat, Stufken, and Yang (2006) can be explained by the fact that the class $\Lambda_2$ rules out any designs with unequal replications of treatments. Also, the high efficiency of the totally balanced designs in Ω established therein by the trace approach cannot be carried over to the A-criterion and MV-criterion.
At the same time, we observe from Figure 3 that the gap between optimal designs in Λ and Ω is more pronounced for small values of θ. We conjecture that this is due to the restriction $m_{dii} = 0$, $i = 0, 1, \ldots, t$, in view of Condition 1 in Theorem 1 in Section 3. When θ = ∞,  and Yang and Park (2007) extended the class of competing designs from Λ to $\Lambda_1 = \{d \in \Omega \mid l_{d0k} = r_{d0}/p,\ k = 1, 2, \ldots, p\}$ when either of the following conditions is satisfied: (i) p = 3 and 3 ≤ t ≤ 20; or (ii) p ≥ 4, (p − 3)(p − 2) + 2 ≤ t ≤ (p − 2)(p − 1) + 1 and n ≥ p(p − 1)/2. They proposed designs that allow part of the subjects in the study to have identical treatments in the last two periods, and these designs are A-better than TBTCI designs. However, Conditions (i) and (ii) imposed on n, t, p are quite restrictive; it would therefore be valuable to find optimal designs in $\Lambda_1$, or even Ω, for wider ranges of the parameters n, t, p. Another possible direction for future research is to study the case p > t + 1, for which we have not yet seen any work. We believe some designs with structures similar to those of TBTCI designs would be highly efficient in this case.

MAIN PROOFS
Lemma 1 in Section 3 gives an explicit relationship between the optimality criteria and the information matrix $C_d$. In order to find optimal designs, we first need to find a function $\ell_0(\theta)$ such that
$$\mathrm{Tr}(M_d(\theta)^{-1}) \ge \ell_0(\theta) \tag{18}$$
for any $(d, \theta)$. If at the same time we have $\mathrm{Tr}(M_{d^*}(\theta^*)^{-1}) = \ell_0(\theta^*)$, then the design $d^*$ would be optimal when $\theta = \theta^*$. To establish (18), we can start with maximizing $C_d$ in the Loewner sense.
Lemma 3. For any design d, we have The equality in (19) holds for any design d in which Proof. It is sufficient to prove Lemma 4. (i) For any design d ∈ t+1,n,p , we have where (1 + θ p)pn .
The equality in (20) still holds under the same conditions.
The proof of Lemma 4 is similar to that of Lemma 4 of Hedayat and Yang (2005), and we postpone it to the Supplemental Materials. The merit of Lemma 4 is that it lets us restrict attention to a very special subclass of designs whose A-criterion is easy to calculate directly, while the optimal design is guaranteed to be included in this subclass. For the special case of θ = 0, we are ready to prove Theorem 1 in Section 3.
For general values of θ, we first need the following four preliminary lemmas:
Lemma 5. For any $d \in \Lambda_{t+1,n,p}$, we have $r_{d0} \le p[n/2]$, $\tilde r_{d0} \le (p - 1)[n/2]$, and $l_{d0k} \le [n/2]$.
Proof. If $l_{d0k} > [n/2]$, it will conflict with the condition that $m_{d00} = 0$. The other two inequalities follow immediately by noting that $l_{d0k} = l_{d0k'}$ for any $1 \le k \ne k' \le p$.
Proof. To maximize $\sum_{u=1}^{n} n_{d0u} \tilde n_{d0u}$, it is necessary for $r_{d0}$ to attain its maximum, which is $p[n/2]$ by Lemma 5. (i) When $n$ is an even number, $r_{d0} = np/2$ and the distribution of the control treatment in the design falls into one type: $n/2$ of the subjects take the control treatment at all even periods, and the remaining half of the subjects take the control treatment at all odd periods. Then the inequality trivially holds. (ii) When $n$ is an odd number, the maximum of $\sum_{u=1}^{n} n_{d0u} \tilde n_{d0u}$ is attained when one of the subjects does not take the control treatment while the remaining $n - 1$ subjects take the control treatment in the same way as in (i). To see this, suppose Subject 1 has the smallest value of $\tilde n_{d0u}$. If $\tilde n_{d01} > 0$, without loss of generality let us suppose Subject 1 takes the control treatment at the second period. There always exists a subject (say, 2) who takes test treatments in the second period as well as in the neighboring periods, that is, periods 1 and 3, since $n$ is odd. Then we can exchange the treatments between these two subjects at the second period so that $m_{d00}$ is still 0. By this exchange, the decrement of $n_{d01} \tilde n_{d01}$ is at most $2\tilde n_{d01}$, while the corresponding increment of $n_{d02} \tilde n_{d02}$ is at least $2\tilde n_{d02} + 1$. Since $\tilde n_{d02} \ge \tilde n_{d01}$, $\sum_{u=1}^{n} n_{d0u} \tilde n_{d0u}$ is increased. Hence we have, by the argument in (i), that $\sum_{u=1}^{n} n_{d0u} \tilde n_{d0u} \le \frac{1}{4}(n - 1)p(p - 1) \le \frac{1}{4} np(p - 1)$.
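The bound at the end of this proof can be checked by brute force for small n and p. The sketch below is an illustration under stated assumptions, not part of the proof: it enumerates only the placement of the control treatment, requires that no subject takes the control in two consecutive periods (so that $m_{d00} = 0$) and that the control appears equally often in every period, and then maximizes $\sum_u n_{d0u} \tilde n_{d0u}$. The function names are ours.

```python
from itertools import combinations, product


def control_patterns(p):
    """All sets of periods (1-based) in which one subject may take the
    control, with no two consecutive periods, so that m_{d00} = 0."""
    patterns = []
    for size in range(p + 1):
        for periods in combinations(range(1, p + 1), size):
            if all(b - a > 1 for a, b in zip(periods, periods[1:])):
                patterns.append(periods)
    return patterns


def max_control_product_sum(n, p):
    """Maximum of sum_u n_{d0u} * ntilde_{d0u} over control placements
    with equal control replication in every period."""
    best = 0
    for assignment in product(control_patterns(p), repeat=n):
        per_period = [sum(k in pat for pat in assignment)
                      for k in range(1, p + 1)]
        if len(set(per_period)) != 1:
            continue  # control not equally replicated across periods
        total = sum(len(pat) * sum(1 for k in pat if k < p)
                    for pat in assignment)
        best = max(best, total)
    return best


def proof_bound(n, p):
    """n*p*(p-1)/4 for even n and (n-1)*p*(p-1)/4 for odd n."""
    return (n // 2) * p * (p - 1) // 2


for n, p in [(4, 3), (3, 3), (5, 3), (2, 4)]:
    print(n, p, max_control_product_sum(n, p), proof_bound(n, p))
```

In each of these small cases the enumerated maximum coincides with the bound, attained by the alternating odd-period and even-period control pattern described in part (i).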