Equivalence Testing With Particle Size Distribution Data: Methods and Applications in the Development of Inhalative Drugs

ABSTRACT Key criteria of the quality of inhalative drugs are assessed in experiments generating so-called particle size distributions as data. Many experiments of this kind are carried out to demonstrate that necessary modifications to some part of the manufacturing process do not substantially change basic characteristics of an inhalable drug product. The equivalence testing procedures we derive for that purpose rely on different models accommodating the specific structure of such data and on different ways of specifying the region of nonrelevant differences. For each hypothesis formulation, three different tests are derived (two parametric procedures and one asymptotically distribution-free procedure) and compared in terms of level and power. The results support the conclusion that the asymptotically distribution-free procedure exhibits surprisingly favorable properties. Supplementary materials for this article are available online.


Introduction
Chronic respiratory diseases are among the leading causes of death, accounting for approximately 7% of all deaths worldwide. Therefore, it is a highly important challenge to develop effective drugs to improve and prolong the lives of people affected by such a disease. Most of these drugs are administered by inhalation, using aerosol sprays to transport the active ingredient to a patient's respiratory tract as a colloid of particles. Whether or not such a colloidal particle reaches a specific site within the respiratory system depends mainly on its size, and it must be ensured that a significant fraction of the particles of an aerosol exhibit a size enabling their deposition in the lung. Indeed, studying the aerodynamic particle size distribution (APSD) as a whole is one of the major tools of quality control for inhaled drugs.
According to the pertinent guidelines (USP 34, 2011; EP 7.5, 2012), an APSD is recorded by means of a so-called cascade impactor. Using this device, the airstream carrying the particles is driven through a sequence of stages numbered 0 to s. Figure 1 shows the stages, the corresponding particle sizes, and the area in the respiratory tract where these particles are expected to be deposited for an impactor with s = 7.
After the airstream has passed through the impactor, the weight of all particles of the active ingredients deposited at each stage is determined by means of High Performance Liquid Chromatography (HPLC). The impactor with the architecture shown in Figure 1 comprises three additional parts (mouth adapter, perpendicular sample induction port collecting particles to be deposited in the throat region, and filter), for which the amount of deposited particles is also determined. Burnell demonstrated that in the test laboratory the analytical method under consideration works essentially as well as in the reference laboratory.
In Section 2, we introduce different measures of dissimilarity and corresponding statistical hypotheses formulations for equivalence testing with multivariate populations consisting of vectors with nonnegative components summing up to unity. In Section 3, testing procedures for the hypotheses of interest are derived under the assumption that the observations are Dirichlet distributed. In Section 4, analogs of these tests are constructed under another parametric model named after R.L. Obenchain. Asymptotically distribution-free tests for the same hypotheses are presented in Section 5. Major results of an extensive simulation study of level and power of all tests under consideration are the topic of Section 6. The use of the testing procedure in assessing real data is illustrated in Section 7 with measurements obtained from a method transfer study.

Measures of Dissimilarity and Formulation of Hypotheses
Throughout, we consider a two-sample setting with individual observations X_r = (X_r1, ..., X_rp), r = 1, ..., R, and Y_t = (Y_t1, ..., Y_tp), t = 1, ..., T. Both the X_r and the Y_t are assumed to be iid with nonnegative components satisfying Σ_{k=1}^p X_rk ≤ 1 and Σ_{k=1}^p Y_tk ≤ 1, respectively. Defining in addition X_r0 = 1 − Σ_{k=1}^p X_rk and Y_t0 = 1 − Σ_{k=1}^p Y_tk, it is clear that the associated (p + 1)-dimensional random vectors (X_r0, X_r1, ..., X_rp) and (Y_t0, Y_t1, ..., Y_tp) all have precisely the structure envisaged in Section 1. The X_r are assumed to be obtained under conditions forming a reference against which an experimental or "test" condition used in taking the measurements Y_t has to be compared. According to standard practice of evaluating APSD data, this comparison has to be made with the objective of demonstrating that the corresponding vectors π_1 = (π_10, ..., π_1p) = E(X_r0, ..., X_rp) and π_2 = (π_20, ..., π_2p) = E(Y_t0, ..., Y_tp) of expected values are sufficiently similar. In the special case p = 1, the problem of choosing a suitable measure of dissimilarity of π_1 and π_2 is the same as in the binomial two-sample setting. In the binomial case, convincing arguments can be made for requiring of equivalent distributions that the log-odds ratio ln[(π_21/π_20)/(π_11/π_10)] must lie in a sufficiently narrow neighborhood around zero (see Wellek 2010, sec. 1.6 and 1.7). We adopt this view, using throughout as measures of dissimilarity of π_1 and π_2 parametric functions that depend on the expected proportions π_νk only through the odds ratios

Θ_k = (π_2k π_10)/(π_20 π_1k), k = 1, ..., p,   (3)

formed between each of the particle size categories k = 1, ..., p and the largest particle size category, labeled 0, which serves as reference. For defining equivalence in terms of these odds ratios (or their logs), two basic options are available, namely, a componentwise and an aggregate approach.
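To make the notation concrete, the componentwise log pseudo odds ratios ln Θ_k can be computed from two vectors of expected proportions as in the following minimal sketch; the numerical APSD vectors are hypothetical, chosen only for illustration:

```python
import numpy as np

def log_pseudo_odds_ratios(pi1, pi2):
    """Componentwise log pseudo odds ratios ln Theta_k, k = 1..p,
    with category 0 serving as the reference category."""
    pi1, pi2 = np.asarray(pi1, float), np.asarray(pi2, float)
    # Theta_k = (pi_2k * pi_10) / (pi_20 * pi_1k)
    return np.log(pi2[1:] * pi1[0]) - np.log(pi1[1:] * pi2[0])

# Hypothetical expected APSDs over p + 1 = 4 categories (components sum to 1)
pi1 = [0.40, 0.30, 0.20, 0.10]
pi2 = [0.35, 0.30, 0.22, 0.13]
print(log_pseudo_odds_ratios(pi1, pi2))
```

Identical distributions yield a zero vector, in line with Θ_k = 1 for all k.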
In the componentwise approach, one specifies for each k = 1, ..., p a pair (ε_1^(k), ε_2^(k)) of (tendentially small) positive numbers to be used as equivalence margins, requiring of an equivalent pair (π_1, π_2) of expected proportions that the true value of ln Θ_k must not fall outside the interval (−ε_1^(k), ε_2^(k)) around zero, that is, for each fixed k,

H_0^(k): ln Θ_k ≤ −ε_1^(k) or ln Θ_k ≥ ε_2^(k) versus H_1^(k): −ε_1^(k) < ln Θ_k < ε_2^(k).   (4)

Combining the p pairs of hypotheses leads to the intersection-union testing problem

H_0: H_0^(k) holds for at least one k versus H_1: H_1^(k) holds for all k = 1, ..., p.   (5)

In this formulation, H_1 corresponds to a region of rectangular shape in the parameter space of the ln Θ_k, and rejection of H_0 means that equivalence of the expected particle size distributions under comparison could be established simultaneously for all categories.
In the aggregate approach, the p measures of dissimilarity of the proportions expected in the individual categories are combined into a single function of (π_1, π_2), which takes nonnegative values only and vanishes if and only if the traditional two-sided null hypothesis π_1 = π_2 holds true. A natural choice of this parametric function is the squared Euclidean distance of the vector (ln Θ_1, ..., ln Θ_p) from the origin, as given by

δ² = Σ_{k=1}^p [ln π_2k + ln π_10 − ln π_20 − ln π_1k]².   (6)
The corresponding equivalence testing problem reads

H_0: δ² ≥ δ_0² versus H_1: δ² < δ_0²,   (7)

where δ_0² is another prespecified positive constant. Equivalence in the sense of (7) means that the componentwise log-odds ratios make up a point that falls in a sufficiently small neighborhood of zero of spherical shape. The radius δ_0 of this sphere takes here the place of the componentwise equivalence margins (ε_1^(k), ε_2^(k)) appearing in (4). A proposal for choosing numerical values of these margins will be made in Section 7.

Maximum Likelihood Estimation for the Dirichlet Family of Distributions
The Dirichlet family of distributions provides a natural model for the data structure we are dealing with, insofar as the sample space of any member of this family is precisely that in which any observed APSD with a given number p + 1 of different categories takes its values. By definition, each such distribution is given by a density function of the form

f(x_0, ..., x_p) = [Γ(A) / ∏_{j=0}^p Γ(α_j)] ∏_{j=0}^p x_j^{α_j − 1},   (8)

with α_j > 0 for all j = 0, ..., p and A = Σ_{l=0}^p α_l. For Dirichlet distributed data, the expected values π_j upon which we focused in the previous section admit the representation π_j = α_j/A. The equivalence tests to be derived in this section will be Wald-type asymptotic procedures based on the maximum likelihood estimators for the Dirichlet parameters α_j obtained from both samples. Solving the likelihood equations corresponding to a random sample X_1, ..., X_n from (8) is technically rather demanding, because these equations involve the digamma function ψ(·), that is, the log-derivative of Γ(·). A detailed description of the algorithm to be applied to find the ML estimator, denoted α̂_n in the sequel, is postponed to Appendix A1 of the SM (online supplementary materials). As is shown there, α̂_n is uniquely determined and asymptotically normal, in the sense that √n(α̂_n − α) converges in distribution to N(0, I^{-1}(α)) for every value α of the estimand. Here, I^{-1}(α) stands for the inverse of the expected information matrix, whose entries are given by I(α)_jk = ψ'(α_j)δ_jk − ψ'(A), j, k = 0, ..., p.
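A fixed-point scheme of the kind alluded to can be sketched as follows. This is the Minka-style digamma-inversion iteration for Dirichlet ML estimation, offered as a sketch only; it is not necessarily the exact algorithm of SM/Appendix A1:

```python
import numpy as np
from scipy.special import digamma, polygamma

def inv_digamma(y, iters=20):
    """Invert the digamma function by Newton's method."""
    # Piecewise initialization (exp(y) + 0.5 for large y, -1/(y + gamma) otherwise)
    x = np.where(y >= -2.22, np.exp(y) + 0.5, -1.0 / (y - digamma(1.0)))
    for _ in range(iters):
        x -= (digamma(x) - y) / polygamma(1, x)
    return x

def dirichlet_mle(X, iters=200):
    """Fixed-point ML estimation of Dirichlet parameters; X holds one
    observation per row, each row with positive entries summing to one."""
    logx_bar = np.log(X).mean(axis=0)
    alpha = X.mean(axis=0)  # crude starting value
    for _ in range(iters):
        # Likelihood equations: psi(alpha_j) = psi(sum alpha) + mean(log x_j)
        alpha = inv_digamma(digamma(alpha.sum()) + logx_bar)
    return alpha

rng = np.random.default_rng(1)
X = rng.dirichlet([4.0, 2.0, 1.5, 0.8], size=1000)
print(dirichlet_mle(X))
```

With samples of moderate size, the iteration recovers the generating parameters to within the usual sampling error.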

Componentwise Intersection-Union Tests
Let us now assume that the data are given by two independent samples X = (X_1, ..., X_R) and Y = (Y_1, ..., Y_T) from (p + 1)-dimensional Dirichlet distributions with possibly different parameter vectors α_1 and α_2. Applying the main result stated in Section 3.1 to both samples yields the joint asymptotic normality

√N(α̂^(1)_{R(N)} − α_1, α̂^(2)_{T(N)} − α_2) →_d N(0, diag(c^{-1} I^{-1}(α_1), (1 − c)^{-1} I^{-1}(α_2))).   (11)

In (11), N stands for the total sample size R(N) + T(N), and c denotes the limiting value of the relative size R(N)/N of Sample 1, which is assumed to be neither 0 nor 1, so that the same holds true for the limit of the relative size of Sample 2. Under the Dirichlet model, the odds ratios of (3) can be written as

Θ_k = (α_2k α_10)/(α_20 α_1k), k = 1, ..., p.   (12)

Denoting for brevity the components of the estimated parameter vectors α̂^(1)_{R(N)} and α̂^(2)_{T(N)} by α̂_1k and α̂_2k, respectively, the ML estimator of the odds ratio for the kth component is given by Θ̂_k = (α̂_2k α̂_10)/(α̂_20 α̂_1k). Accordingly, the numerator of a Wald-type statistic for testing the null hypothesis of (4) is

T_N^(k) = ln Θ̂_k = ln α̂_2k + ln α̂_10 − ln α̂_20 − ln α̂_1k.   (13)

Making use of the delta method (see, e.g., Rao 1973, sec. 6a.2), it can be inferred from (11) that √N(T_N^(k) − θ̃^(k)) →_d N(0, v^(k)) for any parameter configuration, with θ̃^(k) as the logarithm of the true value of (12). The asymptotic variance is given by

v^(k) = c^{-1} [I^{-1}(α_1)_{00}/α_10² − 2 I^{-1}(α_1)_{0k}/(α_10 α_1k) + I^{-1}(α_1)_{kk}/α_1k²] + (1 − c)^{-1} [I^{-1}(α_2)_{00}/α_20² − 2 I^{-1}(α_2)_{0k}/(α_20 α_2k) + I^{-1}(α_2)_{kk}/α_2k²].

(Full details of the derivation of these formulas are provided in SM/Appendix A2.) Replacing in this expression all parameters α_νk with their ML estimates α̂_νk yields a consistent estimator v̂^(k), say, of v^(k), which implies that the asymptotic normality of T_N^(k) is also guaranteed when it is standardized by means of v̂^(k) rather than v^(k).
In view of these facts, asymptotically valid (1 − α)-confidence bounds for θ̃^(k) = ln Θ_k are obtained as

θ̃_l^(k) = T_N^(k) − u_{1−α} √(v̂^(k)/N), θ̃_u^(k) = T_N^(k) + u_{1−α} √(v̂^(k)/N),   (17)

with u_{1−α} denoting the (1 − α)-quantile of the standard normal distribution. Now, the principle of confidence interval inclusion (see Wellek 2010, sec. 3.1) implies that an asymptotically valid level-α test for the elementary equivalence problem (4) is given by the following decision rule:

Reject the kth elementary null hypothesis if and only if (θ̃_l^(k), θ̃_u^(k)) ⊂ (−ε_1^(k), ε_2^(k)).   (18)

Finally, the intersection-union principle (see Berger 1982; Wellek 2010, sec. 7.1) allows us to conclude that an asymptotically valid test for the combined problem (5) is obtained by applying the rule:

Reject H_0 if and only if all p elementary tests (18) reject their respective null hypotheses.   (19)
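The interval-inclusion rule can be sketched generically; the point estimates, standard errors, and margins passed in below are placeholders to be supplied by whichever model is in force:

```python
import numpy as np
from scipy.stats import norm

def iu_equivalence_test(log_or_hat, se, eps_lo, eps_hi, alpha=0.05):
    """Intersection-union test via confidence-interval inclusion:
    reject the global null iff, for every k, the interval
    log_or_hat_k -/+ z_{1-alpha} * se_k lies inside (-eps_lo_k, eps_hi_k)."""
    z = norm.ppf(1 - alpha)
    lower = np.asarray(log_or_hat) - z * np.asarray(se)
    upper = np.asarray(log_or_hat) + z * np.asarray(se)
    elementary = (lower > -np.asarray(eps_lo)) & (upper < np.asarray(eps_hi))
    return bool(elementary.all()), elementary
```

Note that each bound is one-sided at level 1 − α, so the inclusion rule keeps the asymptotic level at α without any multiplicity adjustment, which is the point of the intersection-union principle.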

Aggregate Test for Equivalence of Two Dirichlet Distributions
In the aggregate counterpart of the intersection-union test derived in Section 3.2, the major building block of the test statistic is the ML estimator, say δ̂²_N, of the squared Euclidean distance δ² (recall Equation (6)) written as a function of (α_1, α_2). Plugging in the ML estimators for the individual components of the basic parameters yields

δ̂²_N = Σ_{k=1}^p [ln α̂_2k + ln α̂_10 − ln α̂_20 − ln α̂_1k]².

From (11) it follows that √N(δ̂²_N − δ²) is asymptotically centered normal under any parameter configuration for which δ² is the true value of Σ_{k=1}^p [ln α_2k + ln α_10 − ln α_20 − ln α_1k]². Deriving an explicit expression for the asymptotic variance can again be done by means of the delta method. Leaving the technical details to SM/Appendix A3, one obtains another surprisingly neat formula, written in terms of the asymptotic covariance matrices arising in the componentwise case and gradient vectors ζ_ν, ν = 1, 2, whose components are likewise given in SM/Appendix A3. As before, the asymptotic variance of the test statistic can be consistently estimated by replacing the α_νk with their ML estimators α̂_νk, implying that the following decision rule defines an asymptotically valid level-α test for aggregate equivalence in the sense of (7):

Reject H_0 if and only if √N(δ̂²_N − δ_0²)/√v̂_δ < −u_{1−α},

with v̂_δ denoting the estimated asymptotic variance of √N δ̂²_N.
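The resulting decision rule has the generic Wald form sketched below; the estimated distance, its estimated asymptotic variance, and the total sample size are placeholders to be supplied by the model-specific computations:

```python
import numpy as np
from scipy.stats import norm

def aggregate_equivalence_test(delta2_hat, var_hat, n_total, delta0_sq,
                               alpha=0.05):
    """Wald-type aggregate test: reject H0: delta^2 >= delta0^2 in favor
    of equivalence iff sqrt(N)*(delta2_hat - delta0_sq)/sqrt(var_hat)
    falls below the alpha-quantile -z_{1-alpha} of N(0, 1)."""
    stat = np.sqrt(n_total) * (delta2_hat - delta0_sq) / np.sqrt(var_hat)
    return stat < -norm.ppf(1 - alpha), stat
```

The one-sided critical region reflects that small values of the estimated distance constitute evidence for equivalence.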

Description of the Model
Since the existing literature contains several equivalence testing procedures for multivariate normal data (see, e.g., Chervoneva, Hyslop, and Hauck 2007; Wellek 2010, sec. 8; Hoffelder, Gössl, and Wellek 2015), it would be tempting to start from the assumption that the X_r = (X_r0, ..., X_rp) and Y_t = (Y_t0, ..., Y_tp) are iid N(μ_1, Σ_1) and N(μ_2, Σ_2), respectively, with unknown (p + 1)-vectors μ_ν of expected values and covariance matrices Σ_ν of order (p + 1) × (p + 1), which are likewise unknown and possibly different. Unfortunately, this model assumption would be at variance with the basic property of APSD data according to which Σ_{k=0}^p X_rk = 1 for all r = 1, ..., R and Σ_{k=0}^p Y_tk = 1 for all t = 1, ..., T. For the same reason, replacing the multivariate Gaussian with the multivariate log-normal model for the basic observations would be incompatible with the data structure. What one can do instead is to construct a transformation g, say, which allows one to represent each observed APSD in a 1:1 manner by a vector with p rather than p + 1 components, and to assume multivariate normality for the transformed observations g(X_r) and g(Y_t). The specific transformation that we will use is that proposed by Obenchain in an unpublished oral communication cited by Kotz, Johnson, and Balakrishnan (2000, sec. 44.5). The definition of Obenchain's transformation reads

g(x_0, x_1, ..., x_p) = (ln(x_1/x_0), ..., ln(x_p/x_0)).   (24)

As is shown in SM/Appendix A4, on the sample space of an observed APSD, that is, on the set {(x_0, ..., x_p): x_j > 0 for all j, Σ_{j=0}^p x_j = 1}, g is 1:1, and its inverse is given by

g^{-1}(z_1, ..., z_p) = (1 + Σ_{l=1}^p e^{z_l})^{-1} (1, e^{z_1}, ..., e^{z_p}).   (25)

Denoting the transformed sample observations by X̆_r = g(X_r), r = 1, ..., R, and Y̆_t = g(Y_t), t = 1, ..., T, all considerations to follow in this section rely on the assumption that

X̆_r ~ N_p(μ̆_1, Σ̆_1), r = 1, ..., R,   (27)
Y̆_t ~ N_p(μ̆_2, Σ̆_2), t = 1, ..., T,   (28)

where μ̆_ν and Σ̆_ν denote a vector with p components and a positive definite p × p matrix, respectively.
Whenever (27) and (28) hold true, we say that we are given independent samples of size R and T from Obenchain distributions with parameters (μ̆_1, Σ̆_1) and (μ̆_2, Σ̆_2), and write for brevity X_r ~ Ob(μ̆_1, Σ̆_1) and Y_t ~ Ob(μ̆_2, Σ̆_2). One of the implications of this model is that the relative size of the kth category with respect to the category labeled 0 as reference has expected value exp{μ̆_1k + σ̆_kk^(1)/2} and exp{μ̆_2k + σ̆_kk^(2)/2} for an observation of Sample 1 and 2, respectively. Accordingly, the approximate equations

ln Θ_k ≈ (μ̆_2k + σ̆_kk^(2)/2) − (μ̆_1k + σ̆_kk^(1)/2), k = 1, ..., p,   (30)

have to be considered as direct analogs of the logarithms of the odds ratios (3) for the Obenchain model. Although the approximation behind (30) is obtained heuristically by neglecting the dependence between X_rk and X_r0 and applying Jensen's inequality (see, e.g., Rao 1973, p. 58) to E(1/X_r0), its accuracy was checked by simulation and found satisfactory. Making the additional assumption that the multivariate normal variables into which the primary observations are mapped by means of the Obenchain transformation (24) are homoscedastic, so that we have Σ̆_1 = Σ̆_2 = Σ̆ for some positive definite Σ̆ of order p × p, the parameters to be assessed in testing for equivalence of two Obenchain distributions are simply the differences δ̆_k = μ̆_2k − μ̆_1k of the expected values of the transformed observations Y̆_t and X̆_r.
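A sketch of the transformation pair, assuming (consistently with the lognormal moment formula quoted above) that g maps an APSD to the vector of log ratios with respect to category 0:

```python
import numpy as np

def obenchain_g(x):
    """Map a (p + 1)-part APSD to p log ratios relative to category 0
    (sketch; the log-ratio form is inferred from the moment formula
    exp(mu_k + sigma_kk/2) quoted in the text)."""
    x = np.asarray(x, float)
    return np.log(x[1:] / x[0])

def obenchain_g_inv(z):
    """Inverse map back onto the open probability simplex."""
    e = np.exp(np.asarray(z, float))
    x0 = 1.0 / (1.0 + e.sum())
    return np.concatenate(([x0], x0 * e))
```

The round trip g^{-1}(g(x)) = x holds for every vector of positive components summing to one, confirming the 1:1 property on the relevant sample space.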

Tests for Equivalence of Two Obenchain Distributions
Under the assumptions and conventions of Section 4.1, (1 − α)-confidence intervals for the nonstandardized differences δ̆_k = μ̆_2k − μ̆_1k of the expected values, estimated by the differences of the kth components of the sample mean vectors of the transformed observations, are obtained by computing classical central-t-based CIs with the kth components of the Obenchain-transformed vectors as data. Furthermore, under homoscedasticity of the transformed observations, (30) implies that the Obenchain version of the aggregate equivalence testing problem (7) reads

H_0: Σ_{k=1}^p δ̆_k² ≥ δ_0² versus H_1: Σ_{k=1}^p δ̆_k² < δ_0².

An asymptotically valid solution to this problem is obtained by making use of results established in Hoffelder, Gössl, and Wellek (2015). Full details of both the componentwise and the aggregate tests for equivalence of two Obenchain distributions from which independent samples are taken are given in SM/Appendix B.
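One standard reading of these componentwise central-t bounds is the pooled-variance two-sample form, matching the homoscedasticity assumption; each bound is one-sided at level 1 − α, as required by the interval-inclusion approach. A sketch:

```python
import numpy as np
from scipy.stats import t

def two_sample_t_ci(z1, z2, alpha=0.05):
    """Pooled-variance central-t confidence bounds for the componentwise
    differences of means mu2_k - mu1_k of two transformed samples
    (rows = observations, columns = components)."""
    z1, z2 = np.asarray(z1, float), np.asarray(z2, float)
    r, s = len(z1), len(z2)
    diff = z2.mean(axis=0) - z1.mean(axis=0)
    # Pooled variance estimate with r + s - 2 degrees of freedom
    sp2 = ((r - 1) * z1.var(axis=0, ddof=1)
           + (s - 1) * z2.var(axis=0, ddof=1)) / (r + s - 2)
    half = t.ppf(1 - alpha, r + s - 2) * np.sqrt(sp2 * (1 / r + 1 / s))
    return diff - half, diff + half
```

The componentwise equivalence decision then follows by checking whether each interval lies inside the corresponding margin interval.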

Regularity Conditions
Let us now merely assume that both underlying distributions belong to some multivariate location-scale family satisfying a few basic regularity conditions. Precisely speaking, an arbitrary element V(μ, Σ), say, of that family of distributions is assumed to have the following properties:

(i) its expected value equals μ;
(ii) its covariance matrix equals Σ, where Σ is positive semidefinite;
(iii) sample means of iid observations from V(μ, Σ) are asymptotically normal with asymptotic covariance matrix determined by Σ.
As before, the samples are denoted by X_1, ..., X_R and Y_1, ..., Y_T, where the X_r and Y_t are iid random variables with X_r ~ V(μ_1, Σ_1) and Y_t ~ V(μ_2, Σ_2). The expected values μ_1 and μ_2 will be estimated by the vectors X̄ and Ȳ of sample means. According to (iii), we have

√R(X̄ − μ_1) →_d N(0, Σ_1), √T(Ȳ − μ_2) →_d N(0, Σ_2).   (33)

We again set N = R + T and, as in Section 3, we assume that the relative size R/N of Sample 1 converges (as N → ∞) to some limit c with 0 < c < 1. From (33) it follows that

√N(X̄ − μ_1) →_d N(0, c^{-1} Σ_1), √N(Ȳ − μ_2) →_d N(0, (1 − c)^{-1} Σ_2).   (34)

As the X_r are assumed to be independent of the Y_t, (34) implies that

√N((X̄, Ȳ) − (μ_1, μ_2)) →_d N(0, diag(c^{-1} Σ_1, (1 − c)^{-1} Σ_2)).   (35)

As usual, the covariance matrices Σ_1 and Σ_2 are consistently estimated by the empirical covariance matrices S_1 and S_2.

Asymptotically Distribution-Free Intersection-Union Test for Equivalence
Under the semiparametric model introduced in Section 5.1, the odds ratios Θ_k of (3) can be consistently estimated by

Θ̂_k = (Ȳ_k X̄_0)/(Ȳ_0 X̄_k), k = 1, ..., p.   (36)

By means of the delta method, the asymptotic distribution of √N(ln Θ̂_k − ln Θ_k) can be shown to be normal with mean zero and variance

w^(k) = c^{-1} [σ^(1)_00/μ_10² − 2 σ^(1)_0k/(μ_10 μ_1k) + σ^(1)_kk/μ_1k²] + (1 − c)^{-1} [σ^(2)_00/μ_20² − 2 σ^(2)_0k/(μ_20 μ_2k) + σ^(2)_kk/μ_2k²],   (37)

where σ^(ν)_jk denotes the (j, k)-entry of Σ_ν for ν = 1, 2. (The derivation of these formulas involves essentially the same steps as in the Dirichlet case, so that the details need not be given here.) To consistently estimate w^(k), one simply has to plug in the sample means and (co-)variances for the μ_νj and σ^(ν)_jk, respectively, and the actual relative size R/N of Sample 1 for its limiting value c, which yields the expression

ŵ^(k) = (N/R) [S_1,00/X̄_0² − 2 S_1,0k/(X̄_0 X̄_k) + S_1,kk/X̄_k²] + (N/T) [S_2,00/Ȳ_0² − 2 S_2,0k/(Ȳ_0 Ȳ_k) + S_2,kk/Ȳ_k²].   (39)

Altogether, these facts ensure that √N(ln Θ̂_k − ln Θ_k)/√ŵ^(k) →_d N(0, 1). Thus, the asymptotically distribution-free analog of Equation (17) reads

θ̃_l^(k) = ln Θ̂_k − u_{1−α} √(ŵ^(k)/N), θ̃_u^(k) = ln Θ̂_k + u_{1−α} √(ŵ^(k)/N),   (41)

and with the confidence bounds obtained in this way, we can again make use of the decision rules defined in (18) and (19).
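A plug-in sketch of this estimator together with delta-method standard errors computed from the sample moments; the variance is written directly in per-sample form rather than the √N-standardized form, which amounts to the same substitutions:

```python
import numpy as np

def log_or_and_se(X, Y):
    """Nonparametric estimates of ln Theta_k = ln(Ybar_k Xbar_0/(Ybar_0 Xbar_k)),
    k = 1..p, with delta-method standard errors from empirical moments."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    R, T = len(X), len(Y)
    xb, yb = X.mean(axis=0), Y.mean(axis=0)
    S1 = np.cov(X, rowvar=False, ddof=1)
    S2 = np.cov(Y, rowvar=False, ddof=1)
    log_or = np.log(yb[1:] * xb[0]) - np.log(yb[0] * xb[1:])
    k = np.arange(1, X.shape[1])
    # Var(ln Xbar_0 - ln Xbar_k) ~ (S_00/xb0^2 - 2 S_0k/(xb0 xbk) + S_kk/xbk^2)/R
    w1 = (S1[0, 0] / xb[0]**2 - 2 * S1[0, k] / (xb[0] * xb[k])
          + S1[k, k] / xb[k]**2) / R
    w2 = (S2[0, 0] / yb[0]**2 - 2 * S2[0, k] / (yb[0] * yb[k])
          + S2[k, k] / yb[k]**2) / T
    return log_or, np.sqrt(w1 + w2)
```

Feeding the same sample in as both X and Y returns log-odds ratios of exactly zero, which provides a quick sanity check of the plug-in formula.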

Asymptotically Distribution-Free Aggregate Test for Equivalence
To indicate that under the present model the aggregate distance measure of (6) is a function of both vectors of expected values, we slightly change notation, setting

δ²(μ_1, μ_2) = Σ_{k=1}^p [ln μ_2k + ln μ_10 − ln μ_20 − ln μ_1k]².   (43)

Again, it is natural to estimate the target parametric function by replacing the expected values with the empirical means observed in the two samples. Denoting the resulting estimator by δ²_N(X̄, Ȳ), we have

δ²_N(X̄, Ȳ) = Σ_{k=1}^p [ln Ȳ_k + ln X̄_0 − ln Ȳ_0 − ln X̄_k]².   (44)

For the asymptotic distribution of δ²_N(X̄, Ȳ), we obtain by means of the delta method

√N(δ²_N(X̄, Ȳ) − δ²(μ_1, μ_2)) →_d N(0, w),   (45)

where the asymptotic variance w is shown in SM/Appendix A5 to admit a closed form. Denoting by ∇_1 and ∇_2 the (p + 1)-vectors of all partial derivatives of δ²(μ_1, μ_2) with respect to the components of μ_1 and μ_2, respectively, this closed form can be simplified to

w = c^{-1} ∇_1' Σ_1 ∇_1 + (1 − c)^{-1} ∇_2' Σ_2 ∇_2.

The steps to be taken to obtain a consistent estimator ŵ of the asymptotic variance of √N δ²_N(X̄, Ȳ) are analogous to those leading from (37) to (39). Precisely, the required estimator is given by ŵ = (N/R) ∇̂_1' S_1 ∇̂_1 + (N/T) ∇̂_2' S_2 ∇̂_2, where, for ν = 1, 2, the vector ∇̂_ν has as components the partial derivatives of (43) evaluated at the sample means. Finally, an asymptotically distribution-free test for the problem H_0: δ²(μ_1, μ_2) ≥ δ_0² versus H_1: δ²(μ_1, μ_2) < δ_0² is obtained by applying the decision rule:

Reject H_0 if and only if √N(δ²_N(X̄, Ȳ) − δ_0²)/√ŵ < −u_{1−α}.

Simulation Results on Level and Power
The small-sample behavior of the tests derived above was investigated in an extensive simulation study analyzing datasets generated from both the Dirichlet and the Obenchain distribution.
In studying the level properties of the tests, four different simulation scenarios were constructed for each type of distribution. In a first step of this process, the distribution of the X_r, playing the role of a reference against which the distribution of the Y_t has to be compared, was specified by emulating an APSD obtained in a real application. The sample means X̄_k and (co-)variances S_jk calculated from this dataset (for the raw values see SM/Table F1a) are shown in Table 2. Under the Dirichlet model, the reference distribution was assumed to have as true parameter values α_1k the maximum likelihood estimates obtained from the underlying raw data. To specify a reference distribution of the Obenchain type, the data underlying Table 2 were first transformed by means of (25). Calculation of sample means X̆̄_k and (co-)variances S̆_jk from these transformed observations gave the values shown in Table 3, and these were used in the simulations as Obenchain parameters μ̆_1, Σ̆_1 for generating the first sample X_1, ..., X_R. To create configurations falling on the boundary of the null hypothesis of inequivalence, the parameters for the distribution of the Y_1, ..., Y_T had to be chosen to differ from those of the reference population, subject to the condition that the respective measure of distance must take the value specified as the equivalence margin. Specifically, we constructed four different points in the parameter space of a Dirichlet distribution satisfying that condition with equivalence margin δ_0² = ln²(7/3) = 0.8473² = 0.7179. The precise specifications made under the corresponding scenarios can be seen from Table 4. The proposed choice of the equivalence margin can be motivated as follows: In the special case p = 1, the testing problem (7) reduces to that of testing for equivalence of two responder rates in the binomial two-sample setting.
If the true value of the reference responder rate is 50%, then it can be argued (see Wellek 2010, sec. 1.7) that tolerating a deviation of the experimental rate of up to 20 percentage points from this value has to be rated as a fairly unrestrictive or "liberal" choice of the margin. Translating the odds ratio corresponding to rates of 0.70 and 0.50 to the log scale and choosing the equivalence interval symmetric about zero on that scale amounts to requiring ln² Θ_1 < 0.7179 under the alternative hypothesis. Applying the same bound to a sum of an arbitrary number p ≥ 1 of squared log-odds ratios seems reasonable, then, since the corresponding sphere is the largest one contained in the cube with edges of length 2√0.7179 = 2 × 0.8473. The values of the parameters μ̆_1 and μ̆_2 of the four scenarios used in the simulations with Obenchain distributed data are listed in SM/Table D1.
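The margin arithmetic can be verified directly:

```python
import numpy as np

# Responder rates 50% (reference) vs 70% (largest tolerated deviation)
# give odds 1.0 and 7/3, hence a log odds ratio of ln(7/3).
log_or_margin = np.log(7 / 3)   # ~ 0.8473
delta0_sq = log_or_margin ** 2  # ~ 0.7179
# The sphere of radius delta0 = 0.8473 is the largest ball inside the
# cube with edge length 2 * 0.8473, so the same bound is applied to the
# sum of p squared log-odds ratios.
print(round(log_or_margin, 4), round(delta0_sq, 4))
```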
In studying the level properties of the intersection-union test for componentwise equivalence, the same parameter configurations were used as in investigating the behavior of the test for aggregate equivalence under the null hypothesis. This required shrinking the equivalence margins ε_1^(k), ε_2^(k) to be specified for the individual components of ln Θ away from δ_0 = 0.8473 toward zero for some (in Scenario 4 with Obenchain data, for all) components, as far as necessary to make the true vector of log-odds ratios assumed under the respective scenario a point lying on the boundary of the corresponding p-dimensional rectangle. The detailed lists of componentwise equivalence intervals are shown in SM/Tables D2 and D3.
The power attained by each of the proposed tests was studied under two different scenarios, having in common that the distributions underlying the samples under comparison coincide. Under the first power simulation scenario (POWSCEN 1), the distribution from which the Y_t were generated was the same as the one that, in the first, level-related part of the simulation study, had been used under the respective model for generating the X_r only. In the second scenario (POWSCEN 2), the characteristics of both distributions of APSDs were determined from the data of another empirical example, yielding the Dirichlet and Obenchain parameters shown as entries in the two right-hand columns of Table 5. Furthermore, the covariance matrix of SM/Table D5 was used as the theoretical covariance matrix for generating the Obenchain-transformed observations under POWSCEN 2.
The results of the simulation study of the level properties of our tests are shown in Tables 6 and 7.
The entries in Tables 8 and 9 are the values obtained by simulation for the power of the tests against the selected alternatives under different models.
For a detailed discussion of the numerical material contained in Tables 6-9, the reader is referred to SM/Appendix E. In summary, our simulation results admit the following major conclusions:

(i) In none of the tests is the level exceeded by a practically unacceptable amount, as long as the test is applied in a setting satisfying the model for which its asymptotic validity is guaranteed. However, it can well happen that, in a scenario for which convergence of the maximum rejection probability under the null hypothesis toward the prespecified value of α has been proven, the size of the test turns out to be considerably smaller than the nominal level.

(ii) Regarding their level properties, those tests that have been constructed by exploiting properties of a specific parametric family of distributions are quite sensitive to violations of these assumptions. The deviations between size and nominal level occurring in the parametric procedures under misspecifications of the model can be considerable and go in both directions. In contrast, both versions of the semiparametric testing procedure turned out (in additional simulation experiments presented in Hoffelder 2012) to have surprisingly good level properties even when carried out with data vectors whose components fail to be numbers between 0 and 1 adding up to unity, as required for an APSD.

(iii) When applied under the correct model, the parametric tests are more powerful than their semiparametric analogs, as was to be expected. However, this gain in power has an order of magnitude relevant for practice only under the Dirichlet model. When the data are taken from Obenchain distributions, the asymptotically distribution-free testing procedures turn out to be practically equivalent in power to the respective parametric procedures.
In other words, it can be concluded that the Obenchain method is based on a sensible model, but its analysis requires no specific techniques leading beyond the asymptotically distribution-free inferential procedures.


Illustrating Example
Figure 2 visualizes the raw APSD data obtained from a method transfer experiment. In fully precise numerical form, the individual vectors obtained in that experiment are listed in SM/Tables F1a and F1b. All tests were performed at the usual nominal level α = 0.05, and the equivalence margins were chosen to be ε_1^(k) = ε_2^(k) = 0.8473 uniformly in k for all IU tests and δ_0² = 0.8473² = 0.7179 for all aggregate tests. As a first step of the analysis, the basic parameters involved in the different approaches were estimated from both samples, yielding the values shown in Table 10.

The entries in the empirical covariance matrices S_1 and S_2, which are needed to apply the asymptotically distribution-free tests, are listed in Table 2 and SM/Table F2. The elements of the corresponding covariance matrices S̆_1 and S̆_2 for the Obenchain-transformed samples are those shown in Table 3 and SM/Table F3.
The results of componentwise equivalence testing under the Dirichlet model are summarized in Table 11(a). Since each of these elementary tests can reject its null hypothesis, the intersection-union test leads to deciding in favor of componentwise equivalence of the underlying distributions of APSDs. From the results shown in Tables 11(b) and 11(c), it can be concluded that the other two versions of the intersection-union tests confirm this finding.

Actually, the results presented by this group of authors do not contain evidence that the chi-square ratio test recommended in the 1999 FDA draft guidance for bioavailability and bioequivalence studies of nasal aerosols and nasal sprays for local action is a valid procedure in the sense of controlling the Type I error risk for a testing problem with a precisely defined equivalence hypothesis as the alternative of interest. Altogether, we derived six testing procedures and thoroughly investigated their level and power properties in finite samples. By construction, all procedures are asymptotically valid with respect to the significance level. To assess their small-sample behavior, an extensive simulation study has been carried out. Summing up the key theoretical facts and our numerical results, we feel justified in stating that each of the tests is a possible candidate for performing confirmatory equivalence assessment in studies generating APSD data. However, in view of the limited improvements in efficiency obtained by replacing the asymptotically distribution-free procedures with parametric tests, and the high sensitivity of the latter to model misspecifications, the tests considered in Section 5 seem to be the best choice. This recommendation is in accordance with the fact that independence from distributional assumptions is one of the key properties appearing on the consensus list of requirements for an "ideal" test for the evaluation of APSD profiles compiled by Adams et al. (2007).
Provided that prior knowledge about the processes generating the data under analysis is sufficiently rich to justify parametric model assumptions, the pros and cons of the Dirichlet as compared with the Obenchain model seem largely balanced. The Dirichlet distribution has a long tradition in Bayesian analysis as a conjugate prior for the multinomial distribution (see Agresti and Hitchcock 2005), and the parameter space of the multinomial family has exactly the same structure as the sample space of a random variable taking APSDs as values. Furthermore, the parameter space of the Dirichlet family has comparatively low dimension, allowing estimation procedures with reasonable properties even when the sample sizes are small. A crucial feature of the Dirichlet distribution, which limits its usefulness as a model for analyzing APSD data, is that it allows only for pairwise correlations of negative sign, a restriction that has no counterpart under the Obenchain model. Furthermore, constructing equivalence tests is technically much easier under the Obenchain model since, after a simple transformation of the data, testing procedures for multivariate Gaussian distributions can be applied.
The differences in power we found in our simulations in favor of the intersection-union tests are mainly because with the equivalence margins chosen as proposed in Section 6, the aggregate equivalence regions are considerably smaller than the rectangular regions specified in the componentwise tests. Thus, establishing successfully the alternative hypothesis in an aggregate test leads to a more precise statement about the parameters under assessment, and higher statistical precision requires larger sample sizes or, with given sample sizes, reduces the power of an appropriate test.
Throughout, we measured the dissimilarity between both populations in terms of logarithmized (pseudo) odds ratios.
In contrast to differences between the proportions of particle size categories or to untransformed odds ratios, the range of these parametric functions is unbounded. This allows one to avoid the logical difficulty that defining equivalence in terms of raw differences yields regions that fail to be proper subsets of the parameter space, as has been explained in detail in Wellek (2010, sec. 1.7) for the case of binary data. The arguments put forward there could be used as points of orientation for determining suitable equivalence margins in the present context.
Strictly speaking, the parameters k which we introduced in Equation (3) and whose logarithms were used for defining both the componentwise and aggregate equivalence regions are pseudo odds ratios rather than odds ratios in the strict sense. In fact, the proportion of each individual particle size category is divided by that of some fixed reference category instead of the sum of the proportions assigned to all remaining categories. This makes sense as long as the whole set of categories admits some natural ordering. In the present context, that ordering is obvious from the structure of the Andersen Cascade Impactor (recall Figure 1), according to which the category chosen as reference represents the adapter of the system. In applications where the question of how to choose the reference category is not as easy to answer, it might be preferable to replace the pseudo with ordinary odds ratios. As shown in Hoffelder (2012), all steps of constructing equivalence testing procedures carried out in Sections 3-5 can be adapted to such a modified definition.

Supplementary Materials
The materials provided as supplements to this article are contained in a separate pdf document subdivided into six different sections. The first of these appendices contains proofs of some of the more technical mathematical results. Furthermore, the results of the simulation study are discussed in more detail, and the raw data for the method transfer study analyzed as an illustrating example are presented.