Foundations for Envelope Models and Methods

Envelopes were recently proposed by Cook, Li and Chiaromonte as a method for reducing estimative and predictive variations in multivariate linear regression. We extend their formulation, proposing a general definition of an envelope and a general framework for adapting envelope methods to any estimation procedure. We apply the new envelope methods to weighted least squares, generalized linear models and Cox regression. Simulations and illustrative data analysis show the potential for envelope methods to significantly improve standard methods in linear discriminant analysis, logistic regression and Poisson regression. Supplementary materials for this article are available online.


INTRODUCTION
The overarching goal of envelope models and methods is to increase efficiency in multivariate parameter estimation and prediction. The development has so far been restricted to the multivariate linear model Y_i = α + βX_i + ε_i, i = 1, …, n, (1.1) where ε_i ∈ R^r is a normal error vector that has mean 0, constant covariance Σ_{Y|X} > 0 and is independent of X, α ∈ R^r, and β ∈ R^{r×p} is the regression coefficient matrix in which we are primarily interested. Efficiency gains in the estimation of β are achieved by reparameterizing this model in terms of special projections of the response vector Y ∈ R^r or the predictor vector X ∈ R^p. In this article we propose extensions of envelope methodology beyond the linear model to quite general multivariate contexts. Since envelopes represent nascent methodology that is likely unfamiliar to many, we outline in Section 1.1 response envelopes as developed by Cook, Li, and Chiaromonte (2010). An example is given in Section 1.2 to illustrate their potential advantages in application. A review of the literature on envelopes is provided in Section 1.3 and the specific goals of this article are outlined in Section 1.4.

Response Envelopes
Response envelopes for model (1.1) gain efficiency in the estimation of β by incorporating the projection P_E Y onto the smallest subspace E ⊆ R^r with the properties (1) the distribution of Q_E Y | X does not depend on the value of the nonstochastic predictor X, where Q_E = I_r − P_E, and (2) P_E Y is independent of Q_E Y given X. These conditions imply that the distribution of Q_E Y is not affected by X marginally or through an association with P_E Y. Consequently, changes in the predictor affect the distribution of Y only via P_E Y. We refer to P_E Y informally as the material part of Y and to Q_E Y as the immaterial part of Y.
The notion of an envelope arises in the formal construction of the response projection P E as guided by the following two definitions, which are not restricted to the linear model context. Let R m×n be the set of all real m × n matrices and let S k×k be the set of all real and symmetric k × k matrices. If A ∈ R m×n , then span(A) ⊆ R m is the subspace spanned by columns of A.
Definition 1. A subspace R ⊆ R^p is said to be a reducing subspace of M ∈ R^{p×p} if R decomposes M as M = P_R M P_R + Q_R M Q_R. If R is a reducing subspace of M, we say that R reduces M.
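As a quick numerical check of Definition 1, the following sketch (assuming NumPy; the matrix and the candidate subspaces are made-up toy examples) tests whether a given subspace reduces a symmetric matrix:

```python
import numpy as np

def is_reducing(R_basis, M, tol=1e-10):
    """Definition 1: R = span(R_basis) reduces M iff M = P_R M P_R + Q_R M Q_R."""
    Q_mat, _ = np.linalg.qr(R_basis)          # orthonormal basis for R
    P = Q_mat @ Q_mat.T                       # orthogonal projection onto R
    Q = np.eye(M.shape[0]) - P                # projection onto the complement
    return np.allclose(M, P @ M @ P + Q @ M @ Q, atol=tol)

# Toy example: M is block diagonal, so the span of the first two coordinates
# reduces M, while a subspace that mixes the blocks does not.
M = np.diag([3.0, 3.0, 1.0, 1.0])
coord = np.eye(4)[:, :2]                                   # span(e1, e2)
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
print(is_reducing(coord, M))   # True
print(is_reducing(mixed, M))   # False
```

A subspace spanned by eigenvectors of M always passes this test, which anticipates the eigenspace characterization of envelopes discussed in Section 2.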
This definition of a reducing subspace is equivalent to that given by Cook, Li, and Chiaromonte (2010). It is common in the literature on invariant subspaces and functional analysis (Conway 1990), although the underlying notion of "reduction" differs from the usual understanding in statistics.
Definition 2 (Cook et al. 2010). Let M ∈ S^{p×p} and let B ⊆ span(M). Then the M-envelope of B, denoted by E_M(B), is the intersection of all reducing subspaces of M that contain B.
Definition 2 is the formal definition of an envelope, which is central to our developments. We will often identify the subspace B as the span of a specified matrix U: B = span(U). To avoid a proliferation of notation in such cases, we will also use the matrix as the argument to E_M: E_M(U) := E_M(span(U)).
The response projection P_E is then defined formally as the projection onto E_{Σ_{Y|X}}(β), which by construction is the smallest reducing subspace of Σ_{Y|X} that contains span(β) (or "envelopes" span(β), hence the name "envelope"). Model (1.1) can be parameterized in terms of P_E by using a basis. Let u = dim(E_{Σ_{Y|X}}(β)) and let (Γ, Γ₀) ∈ R^{r×r} be an orthogonal matrix with Γ ∈ R^{r×u} and span(Γ) = E_{Σ_{Y|X}}(β). This leads directly to the envelope version of model (1.1), Y_i = α + ΓηX_i + ε_i, Σ_{Y|X} = ΓΩΓ^T + Γ₀Ω₀Γ₀^T, i = 1, …, n, (1.2) where β = Γη, η ∈ R^{u×p} gives the coordinates of β relative to the basis Γ, and Ω ∈ R^{u×u} and Ω₀ ∈ R^{(r−u)×(r−u)} are positive definite matrices. While η, Ω and Ω₀ depend on the basis Γ selected to represent E_{Σ_{Y|X}}(β), the parameters of interest β and Σ_{Y|X} depend only on E_{Σ_{Y|X}}(β) and not on the basis. All parameters in (1.2) can be estimated by maximizing the likelihood from (1.2), with the envelope dimension u determined by standard methods like likelihood ratio testing, information criteria, cross-validation or a hold-out sample, as described by Cook et al. (2010). In particular, the envelope estimator β̂_env of β is just the projection of the maximum likelihood estimator β̂ onto the estimated envelope, β̂_env = P_Γ̂ β̂, and √n(vec(β̂_env) − vec(β)) is asymptotically normal with mean 0 and covariance matrix given by Cook et al. (2010), where vec is the vectorization operator that stacks the columns of a matrix and u is assumed to be known.
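The projection step β̂_env = P_Γ̂ β̂ can be sketched in a few lines. The estimated basis Γ̂ and standard estimator β̂ below are made-up values (r = 4, u = 1, p = 1), chosen only to show the mechanics:

```python
import numpy as np

# Hypothetical estimated envelope basis (r = 4, u = 1), semiorthogonal.
Gamma_hat = np.array([[0.6], [0.8], [0.0], [0.0]])
# Hypothetical standard MLE of beta (p = 1): noisy in all four coordinates.
beta_hat = np.array([[1.1], [1.9], [0.3], [-0.2]])

# Envelope estimator: project beta_hat onto span(Gamma_hat).
P_Gamma = Gamma_hat @ Gamma_hat.T
beta_env = P_Gamma @ beta_hat

# Coordinates outside the envelope (the immaterial part) are zeroed out.
print(beta_env.ravel())
```

In a real analysis Γ̂ comes from maximizing the likelihood of (1.2); here it is fixed to illustrate that enveloping simply discards the estimated immaterial variation in β̂.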

Cattle Data and Response Envelopes
The data for this illustration resulted from an experiment to compare two treatments for the control of an intestinal parasite in cattle: 30 animals were randomly assigned to each of the two treatments and their weights (in kilograms) were recorded at weeks 2, 4, . . . , 18 and 19 after treatment (Kenward 1987). Because of the nature of a cow's digestive system, the treatments were not expected to have an immediate measurable effect on weight. The objectives of the study were to determine whether the treatments had differential effects on weight and, if so, approximately when those effects were first manifested.
We begin by considering the multivariate linear model (1.1), where Y_i ∈ R^10 is the vector of cattle weights from week 2 to week 19, and the binary predictor X_i is either 0 or 1, indicating the two treatments. Then α = E(Y|X = 0) is the mean profile for one treatment and β = E(Y|X = 1) − E(Y|X = 0) is the mean profile difference between treatments. Fitting by maximum likelihood yields the profile plot of the fitted response vectors given in the top panel of Figure 1. The maximum absolute ratio of an element in β̂ to its bootstrap standard error is only about 1.3, suggesting that there were no inter-treatment differences at any time. However, the likelihood ratio test statistic for the hypothesis β = 0 is about 27 with 10 degrees of freedom, which indicates that the two treatments did have different effects on cattle weights. Further analysis is necessary to gain a better understanding of the treatment effects. The literature on longitudinal data is rich with ways of modeling mean profiles as a function of time and structuring Σ_{Y|X} to reflect dependence over time. Although not designed specifically for longitudinal data, an envelope analysis offers a viable alternative that does not require prior selection of models for the mean profile or the covariance structure.
Turning to a fit of the envelope model (1.2), likelihood ratio testing and the Bayesian information criterion (BIC) both give a strong indication that u = 1, so only a single linear combination of the elements of Y is relevant to comparing treatments. Details for determining an envelope dimension can be found in Cook et al. (2010), and we illustrate dimension selection in the real data applications of Section 6. The corresponding fitted weight profiles are given in the bottom plot of Figure 1. The two profile plots in Figure 1 are quite similar, except that the envelope profiles are notably closer in the early weeks, supporting the notion that there is a lag between treatment application and effect. The absolute ratios of elements in β̂_env to their bootstrap standard errors were all smaller than 2.5 before week 10 and all larger than 3.4 from week 10 onward. This finding answers the original question: the two treatments have different effects on cattle weight growth starting no later than week 10.
To illustrate the working mechanism of the envelope estimator β̂_env = P_Γ̂ β̂ graphically, as shown in Figure 2, we use only the weights at weeks 12 and 14 as the bivariate response Y = (Y_1, Y_2)^T. The maximum likelihood estimator of the first element β_1 of β under model (1.1) is just the difference between the treatment means at week 12. In terms of Figure 2 this corresponds to projecting each data point onto the horizontal axis and comparing the means of the resulting univariate samples, as represented schematically by the two relatively flat densities for Y_1|(X = 0) and Y_1|(X = 1) along the horizontal axis of Figure 2. These densities are close, indicating that it may take a very large sample size to detect the difference between the marginal means that is evident in the figure. In contrast, the envelope estimator for β_1 first projects the data onto the estimated envelope span(Γ̂) and then onto the horizontal axis, as illustrated in Figure 2. The result is represented schematically by two peaked densities on the horizontal axis. These densities are relatively well separated, reflecting the efficiency gains provided by an envelope analysis. The standard estimator is β̂ = (5.5, −4.8)^T with bootstrap standard errors (4.2, 4.4)^T, while the envelope estimator is β̂_env = (5.4, −5.1)^T with bootstrap standard errors (1.12, 1.07)^T.

Overview of Available Envelope Methods
Envelopes were introduced by Cook, Li, and Chiaromonte (2010) for response reduction in the multivariate linear model with normal errors. They proved that the maximum likelihood estimator β̂_env of β from model (1.2) is β̂_env = P_Γ̂ β̂ and that the asymptotic variance of vec(β̂_env) is never larger than that for vec(β̂). The reduction in variation achieved by the envelope estimator can be substantial when the immaterial variation Ω₀ = var(Γ₀^T Y) is large relative to the material variation Ω = var(Γ^T Y). For instance, using the Frobenius norm, we typically see substantial gains when ||Ω₀||_F ≫ ||Ω||_F, as happens in Figure 2.
When some predictors are of special interest, Su and Cook (2011) proposed the partial envelope model, with the goal of improving efficiency of the estimated coefficients corresponding to those particular predictors. They used the Σ_{Y|X}-envelope of span(β₁), E_{Σ_{Y|X}}(β₁), to develop a partial envelope estimator of β₁ in the partitioned multivariate linear regression Y_i = α + β₁X_{1i} + β₂X_{2i} + ε_i, i = 1, …, n, (1.3) where β₁ ∈ R^r is the parameter vector of interest, X = (X₁, X₂^T)^T and the remaining terms are as defined for model (1.1). The partial envelope estimator β̂_{1,env} = P_Γ̂ β̂₁ has the potential to yield efficiency gains beyond those for the full envelope, particularly when E_{Σ_{Y|X}}(β) = R^r so that the full envelope offers no gain. Cook, Helland, and Su (2013) used envelopes to study predictor reduction in the multivariate linear regression (1.1), where the predictors are stochastic with var(X) = Σ_X. Their reasoning led them to parameterize the linear model in terms of E_{Σ_X}(β^T), which again achieved substantial efficiency gains in the estimation of β and in prediction. They also showed that the SIMPLS algorithm (de Jong 1993; see also ter Braak and de Jong 1998) for partial least squares provides a √n-consistent estimator of E_{Σ_X}(β^T) and demonstrated that the envelope estimator β̂_env = β̂ P^T_{Γ̂(S_X)} typically outperforms the SIMPLS estimator in practice, where we use P_{A(V)} := A(A^T V A)^{-1} A^T V to denote the projection onto A = span(A) in the V inner product, and Q_{A(V)} = I − P_{A(V)}. Given the dimension of the envelope, the envelope estimators in these three articles are all maximum likelihood estimators based on normality assumptions, but are also √n-consistent with only finite fourth moments. It can be shown that the partially maximized log-likelihood functions L_n(Γ) = −(n/2)J(Γ) for estimation of a basis Γ of E_{Σ_{Y|X}}(β), E_{Σ_{Y|X}}(β₁) or E_{Σ_X}(β^T) all have the same form, J(Γ) = log|Γ^T M̂ Γ| + log|Γ^T (M̂ + Û)^{-1} Γ|, (1.4) where the positive definite matrix M̂ and the positive semidefinite matrix Û depend on context.
The estimated basis is Γ̂ = arg min_Γ J(Γ), where the minimization is carried out over a set of semiorthogonal matrices whose dimensions depend on the envelope being estimated. Estimates of the remaining parameters are then simple functions of Γ̂. We represent sample covariance matrices as S_(·), defined with the divisor n; for instance, S_X = Σᵢ(Xᵢ − X̄)(Xᵢ − X̄)^T/n, and S_{Y|X} denotes the covariance matrix of the residuals from the linear fit of Y on X. To determine the estimators of the response envelope E_{Σ_{Y|X}}(β), the predictor envelope E_{Σ_X}(β^T) and the partial envelope E_{Σ_{Y|X}}(β₁), we take {M̂, M̂ + Û} = {S_{Y|X}, S_Y}, {S_{X|Y}, S_X} and {S_{Y|X}, S_{Y|X₂}}, respectively. For instance, a basis of the response envelope for the cattle data was estimated by minimizing J(Γ) with M̂ = S_{Y|X} and M̂ + Û = S_Y. Still in the context of multivariate linear regression, Schott (2013) used saddle point approximations to improve a likelihood ratio test for the envelope dimension. Su and Cook (2013) adapted envelopes for the estimation of multivariate means with heteroscedastic errors, and Su and Cook (2012) introduced a different type of envelope construction, called inner envelopes, that can produce efficiency gains when envelopes offer no gains. Cook and Zhang (2015) introduced envelopes for simultaneously reducing the predictors and responses and showed synergetic effects of simultaneous envelopes in improving both estimation efficiency and prediction. Cook and Zhang (2014) proposed a fast and stable 1D algorithm for envelope estimation.
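The generic objective can be sketched numerically. The following sketch assumes the standard form J(Γ) = log|Γ^T M̂ Γ| + log|Γ^T (M̂ + Û)^{-1} Γ| used throughout the envelope literature; the input matrices are toy values (not the cattle data), with M̂ playing the role of S_{Y|X} and M̂ + Û the role of S_Y:

```python
import numpy as np

def J(Gamma, M_hat, MU_hat):
    """J(Gamma) = log|Gamma' M Gamma| + log|Gamma' (M+U)^{-1} Gamma|."""
    t1 = np.linalg.slogdet(Gamma.T @ M_hat @ Gamma)[1]
    t2 = np.linalg.slogdet(Gamma.T @ np.linalg.inv(MU_hat) @ Gamma)[1]
    return t1 + t2

# Toy inputs: large immaterial variation along e2, material direction e1.
M = np.diag([1.0, 25.0])
U = np.outer([1.0, 0.0], [1.0, 0.0])

# With u = 1 and r = 2 the minimization is over unit vectors; a grid suffices.
angles = np.linspace(0.0, np.pi / 2, 901)
vals = [J(np.array([[np.cos(a)], [np.sin(a)]]), M, M + U) for a in angles]
best = angles[int(np.argmin(vals))]
print(np.round([np.cos(best), np.sin(best)], 3))   # minimizer spans e1
```

In practice the minimization is over semiorthogonal matrices on a Grassmannian rather than a one-dimensional grid; the grid is used here only to make the shape of the criterion visible.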

Organization and Notation
The previous studies on envelope models and methods are all limited to multivariate linear regression. While envelope constructions seem natural and intuitive in that setting, nothing is available to guide the construction of envelopes in other contexts like generalized linear models. In this article, we introduce the constructive principle that an asymptotically normal estimator φ̂ of a parameter vector φ may be improved by enveloping span(φ) with respect to the asymptotic covariance matrix of φ̂. This principle recovers past estimators and allows for extensions to many other contexts. While the past envelope applications are rooted in normal likelihood-based linear regression, the new constructive principle not only broadens the existing methods beyond linear regression but also allows generic moment-based estimation alternatives.
The rest of this article is organized as follows. In Section 2 we give a general constructive envelope definition. Based on this definition, we study envelope regression in Section 3 for generalized linear models and other applications. In Section 4, we extend envelope estimation beyond the scope of regression applications and lay general estimation foundations. We propose moment-based and objective-function-based estimation procedures in Section 4.1 and turn to envelopes in likelihood-based estimation in Section 4.2. Simulation results are given in Section 5, and illustrative analyses are given in Section 6.
Although our focus in this article is on vector-valued parameters, we describe in a supplement to this article how envelopes for matrix parameters can be constructed generally. Due to space limitations, we focus our new applications in generalized linear models. The supplement also contains applications to weighted least squares and Cox regression, along with additional simulation results, proofs and other technical details.
The following additional notations and definitions will be used in our exposition. The Grassmannian consisting of the set of all u-dimensional subspaces of R^r, u ≤ r, is denoted as G_{u,r}. If √n(θ̂ − θ) converges to a normal random vector with mean 0 and covariance matrix V, we write the asymptotic covariance matrix as avar(√n θ̂) = V. We will use the operators vec : R^{a×b} → R^{ab}, which vectorizes an arbitrary matrix by stacking its columns, and vech : R^{a×a} → R^{a(a+1)/2}, which vectorizes a symmetric matrix by stacking the unique elements of its columns. Let A ⊗ B denote the Kronecker product of two matrices A and B. Envelope estimators will be written with the subscript env, unless the estimator is uniquely associated with the envelope construction itself, in which case the subscript designation is unnecessary. We use θ̂_α to denote an estimator of a parameter θ given another parameter α. For instance, Σ̂_{X,Γ} is an estimator of Σ_X given Γ.
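The vec and vech operators can be illustrated in a few lines (a NumPy sketch; the matrices are arbitrary examples):

```python
import numpy as np

def vec(A):
    """Stack the columns of A into a single vector (Fortran order)."""
    return A.reshape(-1, order="F")

def vech(S):
    """Stack the on-and-below-diagonal elements of symmetric S, column by column."""
    i, j = np.tril_indices(S.shape[0])
    order = np.lexsort((i, j))          # sort by column, then row
    return S[i[order], j[order]]

A = np.array([[1, 2], [3, 4]])
S = np.array([[1, 2], [2, 5]])
print(vec(A))    # [1 3 2 4]
print(vech(S))   # [1 2 5]
```

Note that vech keeps only the a(a + 1)/2 unique elements of a symmetric matrix, which is why it appears in parameter counts for covariance matrices.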

A CONSTRUCTIVE DEFINITION OF ENVELOPES
The following proposition summarizes some key algebraic properties of envelopes. For a matrix M ∈ S^{p×p}, let λᵢ and Pᵢ, i = 1, …, q, be its distinct eigenvalues and the corresponding eigen-projections, so that M = Σᵢ₌₁^q λᵢPᵢ. Proposition 1 (Cook et al. 2010).
From the first part of this proposition, we see that the M-envelope of B is the sum of the projections of B onto the eigenspaces of M; the second part of this proposition gives a variety of substitutes for B that leave the envelope unchanged; and part 3 implies that we may consider replacing M by f(M) for a monotonic function f without affecting the envelope.
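The first part of the proposition suggests a direct, if naive, way to compute an envelope numerically: project a spanning set of B onto each distinct eigenspace of M and collect the nonzero pieces. A sketch (toy matrices; the tolerance handling is for illustration only):

```python
import numpy as np

def envelope_basis(M, U, tol=1e-8):
    """E_M(span(U)) via eigen-projections: sum the projections of span(U)
    onto the distinct eigenspaces of symmetric M."""
    vals, vecs = np.linalg.eigh(M)
    pieces, start = [], 0
    for k in range(1, len(vals) + 1):
        if k == len(vals) or abs(vals[k] - vals[start]) > tol:
            Vk = vecs[:, start:k]          # basis of one eigenspace
            PU = Vk @ Vk.T @ U             # projection of span(U) onto it
            if np.linalg.norm(PU) > tol:
                pieces.append(PU)
            start = k
    B = np.hstack(pieces)
    Q, _ = np.linalg.qr(B)
    return Q[:, : np.linalg.matrix_rank(B, tol=tol)]

# Toy example: eigenvalue 2 has a two-dimensional eigenspace, and span(U)
# touches the eigenspaces of 1 and 2, so the envelope has dimension 2.
M = np.diag([1.0, 2.0, 2.0])
U = np.array([[1.0], [1.0], [0.0]])
G = envelope_basis(M, U)
print(G.shape[1])   # 2
```

Here the envelope is span(e1, e2): strictly larger than span(U) (dimension 1) but smaller than R^3, which is exactly the "smallest reducing subspace containing B" of Definition 2.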
Envelopes arose in the studies reviewed in Section 1.3 as natural consequences of postulating reductions in Y or X. However, they provide no guidance on how to employ parallel reasoning in more complex settings like generalized linear models, or in settings without a clear regression structure. In terms of Definition 2, the previous studies offer no guidance on how to choose the matrix M and the subspace B for use in general multivariate problems, particularly since there are many ways to represent the same envelope, as indicated in parts 2 and 3 of Proposition 1. We next propose a broad criterion to guide these selections.
Let θ̂ denote an estimator of a parameter vector θ ∈ Θ ⊆ R^m based on a sample of size n. Let θ_t denote the true value of θ and assume, as is often the case, that √n(θ̂ − θ_t) converges in distribution to a normal random vector with mean 0 and covariance matrix V(θ_t) > 0 as n → ∞. To accommodate the presence of nuisance parameters we decompose θ as θ = (ψ^T, φ^T)^T, where φ ∈ R^p, p ≤ m, is the parameter vector of interest and ψ ∈ R^{m−p} is the nuisance parameter vector. The asymptotic covariance matrix of φ̂ is represented as V_{φφ}(θ_t), which is the p × p lower right block of V(θ_t). Then we construct an envelope for improving φ̂ as follows. Definition 3. The envelope for the parameter φ is defined as E_{V_{φφ}(θ_t)}(span(φ_t)) ⊆ R^p.
This definition of an envelope expands the previous approaches reviewed in Section 1.3 in a variety of ways. First, it links the envelope to a particular prespecified method of estimation through the covariance matrix V_{φφ}(θ_t), whereas normal-theory maximum likelihood is the only method of estimation allowed by the previous approaches. The goal of an envelope is to improve that prespecified estimator, perhaps a maximum likelihood, least squares or robust estimator. Second, the matrix to be reduced, here V_{φφ}(θ_t), is dictated by the method of estimation. Third, the matrix to be reduced can now depend on the parameter being estimated, in addition to perhaps other parameters.
Definition 3 reproduces the partial envelopes for β₁ reviewed in Section 1.3 and the envelopes for β when it is a vector; that is, when r = 1 and p > 1 or when r > 1 and p = 1. It also reproduces the partially maximized log-likelihood function (1.4) by setting M = V_{φφ}(θ_t) and U = φ_t φ_t^T. To apply Definition 3 for the partial envelope of β₁ based on model (1.3), the asymptotic covariance matrix of the maximum likelihood estimator of β₁ is V_{β₁β₁} = (Σ_X^{-1})₁₁ Σ_{Y|X}, where (Σ_X^{-1})₁₁ is the (1, 1) element of the inverse of Σ_X = lim_{n→∞} n^{-1} Σᵢ₌₁ⁿ XᵢXᵢ^T. Consequently, by Proposition 1, E_{V_{β₁β₁}}(β₁) = E_{Σ_{Y|X}}(β₁), and thus Definition 3 recovers the partial envelopes of Su and Cook (2011). To construct the partially maximized log-likelihood (1.4) we set M = V_{β₁β₁} and U = β₁β₁^T; using sample versions of M and U then gives the partially maximized log-likelihood of Su and Cook (2011). It is important to note that, although E_{Σ_{Y|X}}(β₁) = E_{V_{β₁β₁}}(β₁), Definition 3 requires that we use V_{β₁β₁} = (Σ_X^{-1})₁₁ Σ_{Y|X} and not Σ_{Y|X} alone. The response envelope of Cook et al. (2010) can be seen as a special case of the partial envelope model in the absence of X₂. This implies that Definition 3 can also reproduce the response envelope objective function, which is obtained by replacing S_{Y|X₂} with S_Y in the objective function J(Γ) of (1.4).
As another illustration, consider X reduction in model (1.1) with r = 1 and p > 1. To emphasize the scalar response, let σ²_{Y|X} = var(ε) with sample residual variance s²_{Y|X}. The ordinary least squares estimator of β has asymptotic variance V_ββ = σ²_{Y|X} Σ_X^{-1}, so Definition 3 leads to the envelope E_{V_ββ}(β). However, it follows from Proposition 1 that this envelope is equal to E_{Σ_X}(β), which is the envelope used by Cook et al. (2013) when establishing connections with partial least squares. To construct the corresponding version of (1.4), let M = V_ββ and U = ββ^T. Then substituting sample quantities recovers the objective function of Cook et al. (2013).
Definition 3 in combination with J(Γ) can also be used to derive envelope estimators for new problems. For example, consider enveloping for a multivariate mean μ. Substituting sample versions of M and U leads to the same objective function J(Γ) as that obtained when deriving the envelope estimator from scratch.

ENVELOPE REGRESSION
In this section we study envelope estimators in a general regression setting. We first discuss the likelihood-based approach to envelope regression in Section 3.1 and then discuss specific applications including generalized linear models (GLM) in Sections 3.2-3.4.

Conditional and Unconditional Inference in Regression
Let Y ∈ R^1 and X ∈ R^p have a joint distribution with parameters θ := (α^T, β^T, ψ^T)^T ∈ R^{q+p+s}, so that the joint density or mass function can be written as f(Y, X|θ) = g(Y|α, β^TX)h(X|ψ), and the observed data are {Yᵢ, Xᵢ}, i = 1, …, n. We take β to be the parameter vector of interest and, prior to the introduction of envelopes, we restrict α, β and ψ to a product space. Let C_n(α, β) = Σᵢ₌₁ⁿ log g(Yᵢ|α, β^TXᵢ) be the conditional log-likelihood and let M_n(ψ) = Σᵢ₌₁ⁿ log h(Xᵢ|ψ) be the marginal log-likelihood for ψ. Then we can decompose L_n(θ) = C_n(α, β) + M_n(ψ). Since our primary interest lies in β and X is ancillary, estimators are typically obtained as (α̂, β̂) = arg max_{α,β} C_n(α, β). (3.1) Our goal here is to improve the prespecified estimator β̂ by introducing the envelope E_{V_ββ}(β_t) with semiorthogonal basis matrix Γ and writing β = Γη, where ψ₁ represents any parameters remaining after incorporating Γ. Since both C_n and M_n depend on Γ, the predictors are no longer ancillary after incorporating the envelope structure, and estimation must be carried out by maximizing {C_n(α, η, Γ) + M_n(ψ₁, Γ)}.
The relationship between X and E_{V_ββ}(β_t) that is embodied in M_n(ψ₁, Γ) could be complicated, depending on the distribution of X. However, as described in the following proposition, it simplifies considerably when E(X|Γ^TX) is a linear function of Γ^TX. This is well known as the linearity condition in the sufficient dimension reduction literature, where Γ there denotes a basis for the central subspace (Cook 1996). Background on the linearity condition, which is widely regarded as rather mild, is available from Cook (1998), Li and Wang (2007) and many other articles in sufficient dimension reduction. For instance, if X follows an elliptically contoured distribution, the linearity condition is guaranteed for any Γ (Eaton 1986).
An implication of Proposition 2 is that, for some positive definite matrices Ω ∈ R^{u×u} and Ω₀ ∈ R^{(p−u)×(p−u)}, Σ_X = ΓΩΓ^T + Γ₀Ω₀Γ₀^T, and thus M_n must depend on Γ through the marginal covariance Σ_X. Consequently, we can write M_n(Σ_X, ψ₂) = M_n(Γ, Ω, Ω₀, ψ₂), where ψ₂ represents any remaining parameters in the marginal function. If X is normal with mean μ_X and variance Σ_X, then ψ₂ = μ_X and M_n(Γ, Ω, Ω₀, ψ₂) = M_n(Γ, Ω, Ω₀, μ_X) is the marginal normal log-likelihood. In this case, it is possible to maximize M_n over all its parameters except Γ, as stated in the following proposition.
Proposition 3. Assume that X ∈ R^p is multivariate normal N(μ_X, Σ_X) and that Γ ∈ R^{p×u} is a semiorthogonal basis matrix for E_{Σ_X}(β_t). Then μ̂_{X,Γ} = X̄ and Σ̂_{X,Γ} = P_Γ S_X P_Γ + Q_Γ S_X Q_Γ, where (Γ, Γ₀) ∈ R^{p×p} is an orthogonal basis for R^p. Moreover, the global maximum of the partially maximized marginal log-likelihood M_n(Γ) is attained at all subsets of u eigenvectors of S_X.
This proposition indicates that if X is marginally normal, the envelope estimators are (α̂_env, η̂, Γ̂) = arg max{C_n(α, η, Γ) + M_n(Γ)}. (3.4) In particular, the envelope estimator of β is β̂_env = Γ̂η̂ and, from Proposition 3, Σ̂_{X,env} = P_Γ̂ S_X P_Γ̂ + Q_Γ̂ S_X Q_Γ̂. It follows also from Proposition 3 that one role of M_n is to pull span(Γ̂) toward the reducing subspaces of S_X, although it will not necessarily coincide with any such subspace.
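The covariance estimator in Proposition 3 is easy to compute, and the role of the eigenstructure of S_X can be checked numerically. In this sketch (toy matrix, not real data), Σ̂_{X,Γ} recovers S_X exactly when Γ spans an eigenvector of S_X, and not otherwise:

```python
import numpy as np

def sigma_x_gamma(S_X, Gamma):
    """Proposition 3: Sigma_hat_{X,Gamma} = P_G S_X P_G + Q_G S_X Q_G,
    with Gamma semiorthogonal."""
    P = Gamma @ Gamma.T
    Q = np.eye(S_X.shape[0]) - P
    return P @ S_X @ P + Q @ S_X @ Q

# Toy covariance: eigenvectors are (1,1,0)/sqrt(2), (1,-1,0)/sqrt(2), e3.
S_X = np.array([[2.0, 1.0, 0.0],
                [1.0, 2.0, 0.0],
                [0.0, 0.0, 1.0]])

G_eig = np.linalg.eigh(S_X)[1][:, :1]   # Gamma spanning an eigenvector of S_X
G_coord = np.eye(3)[:, :1]              # e1 is not an eigenvector of S_X

print(np.allclose(sigma_x_gamma(S_X, G_eig), S_X))     # True
print(np.allclose(sigma_x_gamma(S_X, G_coord), S_X))   # False
```

This is the sense in which M_n pulls span(Γ̂) toward reducing subspaces of S_X: only there does the constrained covariance estimator lose nothing relative to S_X.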

Generalized Linear Models With Canonical Link
In the generalized linear model setting (Agresti 2002), Y belongs to an exponential family with probability mass or density function f(Yᵢ|ϑᵢ, ϕ) = exp{[Yᵢϑᵢ − b(ϑᵢ)]/a(ϕ) + c(Yᵢ, ϕ)}, i = 1, …, n, where ϑ is the natural parameter and ϕ > 0 is the dispersion parameter. We consider the canonical link, under which ϑ(α, β) = α + β^TX is a monotonic differentiable function of E(Y|X, ϑ, ϕ). We also restrict discussion to one-parameter families, so that the dispersion parameter ϕ is not needed.
The conditional log-likelihood takes the form log f(y|ϑ) = yϑ − b(ϑ) + c(y) := C(ϑ), where ϑ = α + β^TX is the canonical parameter, and the Fisher information matrix for (α, β) evaluated at the true parameters follows from the derivatives of C(ϑ). To construct the asymptotic covariance of β̂ from the Fisher information matrix and to introduce notation, we transform (α, β) into orthogonal parameters (Cox and Reid 1987). Specifically, we transform (α, β) to (a, β) so that a and β have asymptotically independent maximum likelihood estimators. Define weights W(ϑ) = C''(ϑ)/E{C''(ϑ)} so that E(W) = 1, and let a denote the resulting orthogonalized intercept. We then have orthogonal parameters (a, β), and E_{V_ββ(θ_t)}(β_t) is the corresponding envelope. The parameterizations (a, β) and (α, β) lead to equivalent implementations since we are interested only in β. Therefore, we use (α, β) in the estimation procedure that follows and use the orthogonal parameters (a, β) to derive asymptotic properties in Section 3.3. The conditional log-likelihood, which varies across exponential family distributions of Y|X, can be written as C_n(α, β) = Σᵢ₌₁ⁿ C(ϑᵢ), where ϑᵢ = α + β^TXᵢ; the functions C(ϑ) are summarized in Table 1. We next briefly review Fisher scoring, the standard iterative method for maximizing C_n(α, β), as background for the alternating envelope algorithm (Algorithm 1). At each iteration of the Fisher scoring method, the update step for β can be written as the weighted least squares (WLS) estimator β̃ = S_{X(W̃)}^{-1} S_{XṼ(W̃)}, where W̃ is the weight at the current iteration, Ṽ = ϑ̃ + {Y − μ(ϑ̃)}/W̃ is a pseudo-response variable at the current iteration, the weighted covariance S_{X(W̃)} is the sample estimator of Σ_{X(W)}, and the sample weighted cross-covariance S_{XṼ(W̃)} is defined similarly. Upon convergence of the Fisher scoring process, the estimator β̂ = S_{X(W̃)}^{-1} S_{XṼ(W̃)} is a function of the estimators α̂, ϑ̂, W̃, and Ṽ at the final iteration.
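As a concrete instance of this weighted least squares form of Fisher scoring, the following sketch fits a logistic regression by iteratively reweighted least squares on simulated data. It illustrates the standard algorithm only (not the envelope version), with no convergence checks or safeguards against separation:

```python
import numpy as np

def fisher_scoring_logistic(X, y, iters=25):
    """Fisher scoring for a logistic GLM with canonical link, written as
    repeated weighted least squares on a pseudo-response (a sketch)."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])      # include the intercept alpha
    coef = np.zeros(p + 1)
    for _ in range(iters):
        theta = Z @ coef                      # canonical parameter alpha + beta'X
        mu = 1.0 / (1.0 + np.exp(-theta))     # mean function
        w = mu * (1.0 - mu)                   # Fisher weights
        v = theta + (y - mu) / w              # pseudo-response
        WZ = Z * w[:, None]
        coef = np.linalg.solve(Z.T @ WZ, WZ.T @ v)   # WLS of v on (1, X)
    return coef[0], coef[1:]

# Simulated data with known coefficients (illustration only).
rng = np.random.default_rng(2)
X = rng.standard_normal((2000, 2))
true_alpha, true_beta = 0.3, np.array([1.0, -0.5])
prob = 1.0 / (1.0 + np.exp(-(true_alpha + X @ true_beta)))
y = (rng.random(2000) < prob).astype(float)

alpha_hat, beta_hat = fisher_scoring_logistic(X, y)
print(np.round(beta_hat, 2))   # close to (1.0, -0.5)
```

The final weights and pseudo-response from this loop are exactly the W̃ and Ṽ that enter the envelope construction in the next section.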
Assuming normal predictors, it follows from Section 3.1 that the full log-likelihood can be written as L_n(α, Γ, η) = C_n(α, β) + M_n(Γ), where β = Γη is the coefficient vector of interest and M_n(Γ) is given in Proposition 3. For fixed Γ ∈ R^{p×u}, the Fisher scoring method of fitting the GLM of Y on Γ^TX leads to η̂_Γ = (Γ^T S_{X(W̃)} Γ)^{-1} Γ^T S_{XṼ(W̃)}. Therefore, the partially maximized log-likelihood L_n(Γ) based on the Fisher scoring algorithm is obtained by substituting α̂_Γ and η̂_Γ into L_n(α, Γ, η), where the optimization over Γ treats η̂ as a function of Γ instead of as fixed. Since we are able to compute the analytical form of the matrix derivative ∂L_n(Γ)/∂Γ, which is summarized in Lemma 1 of Supplement Section C, the alternating update of Γ based on η̂ is more efficient than updating Γ with fixed η. We summarize this alternating algorithm for fitting GLM envelope estimators in Algorithm 1, where the alternation between steps (2a) and (2b) typically converges in only a few rounds.

Asymptotic Properties With Normal Predictors
In this section we describe the asymptotic properties of envelope estimators in regression when α (i.e., the orthogonal parameter a in GLMs) and β are orthogonal parameter vectors and X is normally distributed. We also contrast the asymptotic behavior of the envelope estimator β̂_env with that of the estimator β̂ from C_n(α, β), and comment on other settings at the end of the section. The results in this section apply to any likelihood-based envelope regression, including the envelope estimators in GLMs.
Table 1. A summary of the mean functions, the conditional log-likelihoods and their derivatives for various exponential family distributions.

The parameters involved in the coordinate representation of the envelope model are α, η, Γ, Ω₀, and Ω. Since the parameters η, Ω, and Ω₀ depend on the basis Γ and the objective function is invariant under orthogonal transformations of Γ, the estimators of these parameters are not unique. Hence, we consider only the asymptotic properties of the estimable functions α, β = Γη and Σ_X = ΓΩΓ^T + Γ₀Ω₀Γ₀^T, which are invariant under the choice of basis and thus have unique maximizers. Under the normality assumption, X ∼ N(μ_X, Σ_X), we neglect the mean vector μ_X since it is orthogonal to all of the other parameters. We define the corresponding parameters ξ and estimable functions h. Since the number of free parameters in h is q + p + p(p + 1)/2 and the number of free parameters in ξ is q + u + (p − u)u + u(u + 1)/2 + (p − u)(p − u + 1)/2 = q + u + p(p + 1)/2, the envelope model reduces the total number of parameters by p − u.
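The parameter count can be verified directly (the q intercept-type parameters cancel in the difference):

```python
# Check that the envelope coordinate form saves exactly p - u free parameters
# relative to the unconstrained (beta, Sigma_X) parameterization.
for p in range(1, 8):
    for u in range(0, p + 1):
        full = p + p * (p + 1) // 2
        envelope = u + (p - u) * u + u * (u + 1) // 2 + (p - u) * (p - u + 1) // 2
        assert full - envelope == p - u
print("envelope model saves exactly p - u parameters")
```

The saving comes entirely from constraining Σ_X to the block form ΓΩΓ^T + Γ₀Ω₀Γ₀^T while confining β to the u-dimensional envelope.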
Proposition 4. Assume that for i = 1, …, n, the predictors Xᵢ are independent copies of a normal random vector X with mean μ_X and variance Σ_X > 0, and that the data (Yᵢ, Xᵢ) are independent copies of (Y, X) with finite fourth moments. Assume also that α and β are orthogonal parameter vectors. Then, as n → ∞, √n(β̂_env − β_t) converges to a normal vector with mean 0 and covariance matrix avar(√n β̂_env). The conditional log-likelihood is reflected in avar(√n β̂_env) primarily through the asymptotic variance V_ββ, while Ω and Ω₀ stem from the normal marginal likelihood of X. The span of Γ reduces both V_ββ and Σ_X, so the envelope serves as a link between the conditional and marginal likelihoods in the asymptotic variance. The first addend of avar(√n β̂_env) is the asymptotic covariance of the estimator given the envelope. The second addend reflects the asymptotic cost of estimating the envelope, and this term is orthogonal to the first. Moreover, the envelope estimator is always at least as efficient asymptotically as the usual estimator β̂.
A vector SE(β̂_env) of asymptotic standard errors for the elements of β̂_env can be constructed by first using the plug-in method to obtain an estimate âvar(√n β̂_env) of avar(√n β̂_env), and then setting SE(β̂_env) = diag^{1/2}{âvar(√n β̂_env)/n}.
An important special case of Proposition 4 is given in the following corollary.
Corollary 1 tells us that, if we have normal predictors with isotropic covariance, then the envelope estimator is asymptotically equivalent to the standard estimator: enveloping offers no gain, but there is no loss either. This implies that there must be some degree of collinearity among the predictors before envelopes can offer gains. We illustrate this conclusion in the simulations of Section 5.2 for logistic and Poisson regressions, and in Supplement Section B for least squares estimators.
Experience has shown that (3.4) provides a useful envelope estimator when the predictors satisfy the linearity condition but are not multivariate normal. In this case there is a connection between the desired envelope E_{V_ββ}(β_t) and the marginal distribution of X, as shown in Proposition 2, and β̂_env is still a √n-consistent estimator of β. If the linearity condition is substantially violated, we can still use the objective function {C_n(α, β) + M_n(Γ)} to estimate β within E_{Σ_X}(β_t), but this envelope may no longer equal E_{V_ββ}(β_t). Nevertheless, as demonstrated later in Corollary 2, this still has the potential to yield an estimator of β with smaller asymptotic variance than β̂, although further work is required to characterize the gains in this setting. Alternatively, the general estimation procedure proposed in Section 4.1 yields √n-consistent estimators without requiring the linearity condition.

Other Regression Applications
Based on our general framework, envelope models can be adapted to other types of regression. In Supplement Sections A.1 and A.2 we give derivation and implementation details for envelopes in WLS regression, and in Supplement Section A.3 we include details for envelopes in Cox regression. The theoretical results in this section also apply to the WLS and Cox regression models, as discussed in Supplement Section A.
The envelope methods proposed here are based on sample estimators of asymptotic covariance matrices, which may be sensitive to outliers. However, the idea of envelopes can be extended to robust estimation procedures (see, e.g., Yohai et al. 1991).

ENVELOPE ESTIMATION BEYOND REGRESSION
Having seen that Definition 3 recovers past envelopes and having applied it in the context of GLMs, we next turn to its general use in estimation. We propose three generic estimation procedures for envelope models and methods: one based on moments, one based on a generic objective function, and one based on a likelihood. The estimation frameworks in this section include the previous sections as special cases, and the algorithms in this section can be applied in any regression context.

Moment-Based and Objective-Function-Based Estimation
Definition 3 combined with the 1D algorithm (Cook and Zhang 2014), which is restated as Algorithm 2 in its population version, gives us a generic moment-based estimator of the envelope E_M(U), requiring only the matrices M and U. Setting U = φ̂ φ̂^T ∈ S^{q×q} and M equal to a √n-consistent estimator of V_φφ(θ_t) ∈ S^{q×q}, it follows from Cook and Zhang (2014, Proposition 6) that the 1D algorithm provides a √n-consistent estimator span(Ĝ_u) of the envelope, where Ĝ_u is obtained from the sample version of the 1D algorithm and u = dim(E_{V_φφ(θ_t)}(φ_t)) is assumed to be known. The moment-based envelope estimator φ̂_env = P_{Ĝ_u} φ̂ is then a √n-consistent estimator of φ_t. For example, the moment-based estimator for GLMs is obtained by letting U = β̂ β̂^T and M = Σ̂_X(W) in the 1D algorithm. Consequently, a distributional assumption, such as normality, is not a requirement for estimators based on the 1D algorithm to be useful, a conclusion that is supported by previous work and by our experience. However, the weight W(ϑ) depends on the parameter β, so iterative updates of the weights and estimators could be used to refine the final estimator, as we discuss at the end of Section 4.2.
1. Set initial value g_0 = G_0 = 0.
2. For k = 0, . . . , u − 1, the (k + 1)th direction g_{k+1} ∈ R^q in the envelope E_M(U) is obtained as follows:
(a) Let G_k = (g_1, . . . , g_k) if k ≥ 1 and let (G_k, G_0k) be an orthogonal basis for R^q.
(b) Define the stepwise objective function J_k(w) = log(w^T M_k w) + log{w^T (M_k + U_k)^{-1} w}, where M_k = G_0k^T M G_0k, U_k = G_0k^T U G_0k and w ∈ R^{q−k}.
(c) Solve w_{k+1} = arg min_w J_k(w) subject to the length constraint w^T w = 1.
(d) Define g_{k+1} = G_0k w_{k+1} to be the unit-length (k + 1)th stepwise direction.
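A minimal numerical sketch of the sample 1D algorithm (our own illustration, not the authors' implementation; the unit-length constraint on w is handled by normalizing inside the stepwise objective):

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

def oned_envelope(M, U, u, n_starts=5, seed=0):
    """Accumulate u stepwise directions g_1, ..., g_u spanning an estimate of E_M(U)."""
    q = M.shape[0]
    rng = np.random.default_rng(seed)
    G = np.zeros((q, 0))
    for k in range(u):
        # orthogonal complement G0 of span(G); G0 = I when no directions yet
        G0 = np.eye(q) if k == 0 else null_space(G.T)
        Mk = G0.T @ M @ G0
        Bk = np.linalg.inv(Mk + G0.T @ U @ G0)      # (M_k + U_k)^{-1}

        def J(w):                                   # stepwise objective J_k
            w = w / np.linalg.norm(w)               # enforce w'w = 1 by scaling
            return np.log(w @ Mk @ w) + np.log(w @ Bk @ w)

        # a few random restarts to reduce the risk of local minima
        best = min((minimize(J, rng.standard_normal(q - k)) for _ in range(n_starts)),
                   key=lambda r: r.fun)
        w = best.x / np.linalg.norm(best.x)
        G = np.hstack([G, (G0 @ w)[:, None]])       # g_{k+1} = G0 w_{k+1}
    return G
```

In the toy case M = diag(1, 2, 3) and U = 10 e_1 e_1^T, the envelope is span(e_1) and the algorithm recovers it with u = 1.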
The 1D algorithm can also play the role of finding starting values for other optimizations. For instance, in envelope GLMs, we use the 1D algorithm to get starting values for maximizing (3.7). Moreover, the likelihood-based objective function in (3.7) has the property that L_n(Γ) = L_n(ΓO) for any orthogonal matrix O ∈ R^{u×u}. Maximization of L_n is thus over the Grassmannian G_{u,p}. Since u(p − u) real numbers are required to specify an element of G_{u,p} uniquely, optimization of L_n is essentially over u(p − u) real dimensions, and can be time-consuming and sensitive to starting values when this dimension is large. The 1D algorithm mitigates these computational issues: in our experience, with the 1D estimator as the starting value for Step 2(a) in Algorithm 1, the iterative Grassmann manifold optimization of L_n(Γ) in (3.7) typically converges after only a few iterations.
To gain intuition about the potential advantages of the moment-based envelope estimator, assume that a basis Γ for E_{V_φφ(θ_t)}(φ_t) is known and write the envelope estimator as P_Γ φ̂. Then, since the envelope reduces V_φφ(θ_t), we have V_φφ(θ_t) = P_Γ V_φφ(θ_t) P_Γ + Q_Γ V_φφ(θ_t) Q_Γ. These relationships allow some straightforward intuition by writing √n(φ̂ − φ_t) = √n(P_Γ φ̂ − φ_t) + √n Q_Γ φ̂. The second term √n Q_Γ φ̂ is asymptotically normal with mean 0 and variance Q_Γ V_φφ(θ_t) Q_Γ, and is asymptotically independent of √n(P_Γ φ̂ − φ_t). Consequently, we think of Q_Γ φ̂ as the immaterial information in φ̂. The envelope estimator then achieves efficiency gains by essentially eliminating the immaterial variation Q_Γ V_φφ(θ_t) Q_Γ, the greatest gains being achieved when Q_Γ V_φφ(θ_t) Q_Γ is large relative to P_Γ V_φφ(θ_t) P_Γ. Of course, we will typically need to estimate Γ in practice, which will mitigate the asymptotic advantages available when Γ is known. But when the immaterial variation is large compared to the cost of estimating the envelope, substantial gains will still be achieved. Although we do not have an expression for the asymptotic variance of the moment-based estimator φ̂_env = P_{Ĝ_u} φ̂, the bootstrap can be used to estimate it. Depending on context, cross-validation, an information criterion such as AIC or BIC, or sequential hypothesis testing can be used to aid selection of u, as in Cook et al. (2010, 2013) and Su and Cook (2011).
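The reducing property invoked here, V = P_Γ V P_Γ + Q_Γ V Q_Γ when span(Γ) reduces V, can be verified numerically; a minimal sketch with a synthetic covariance (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
q, u = 5, 2

# orthonormal basis Gamma of a u-dimensional subspace and its complement Gamma0
Q_full, _ = np.linalg.qr(rng.standard_normal((q, q)))
Gamma, Gamma0 = Q_full[:, :u], Q_full[:, u:]

def spd(d):
    """Random symmetric positive definite matrix."""
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

# V is reduced by span(Gamma): V = Gamma Omega Gamma' + Gamma0 Omega0 Gamma0'
V = Gamma @ spd(u) @ Gamma.T + Gamma0 @ spd(q - u) @ Gamma0.T

P = Gamma @ Gamma.T          # projection onto the envelope
Qp = np.eye(q) - P           # projection onto its orthogonal complement
assert np.allclose(V, P @ V @ P + Qp @ V @ Qp)   # V = PVP + QVQ
```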
If φ̂ is obtained by minimizing an objective function, φ̂ = arg min_{φ∈R^q} F_n(φ), then, as an alternative to the moment-based estimator φ̂_env = P_{Ĝ_u} φ̂, an envelope estimator can be constructed as φ̂_env = Ĝ_u η̂, where η̂ = arg min_{η∈R^u} F_n(Ĝ_u η). We refer to this as the objective-function-based estimator. For some objective functions the two approaches are identical; this is the case for response envelopes (Cook et al. 2010) and partial envelopes (Su and Cook 2011). In Section 4.2 we specialize the objective-function approach to maximum likelihood estimators.
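As a toy illustration (entirely our own, with a least-squares F_n and a pretend basis estimate `G_u`), the two constructions are:

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, u = 200, 4, 1

# toy data: the true phi lies in span(G_true)
G_true = np.array([[1.0], [1.0], [0.0], [0.0]]) / np.sqrt(2)
phi_t = 2.0 * G_true[:, 0]
X = rng.standard_normal((n, q))
y = X @ phi_t + rng.standard_normal(n)

F = lambda phi: np.sum((y - X @ phi) ** 2)          # objective F_n

phi_hat = np.linalg.lstsq(X, y, rcond=None)[0]      # arg min F_n(phi)

G_u = G_true                                        # pretend this basis was estimated
# moment-based: project the unconstrained estimator onto span(G_u)
phi_env_proj = G_u @ (G_u.T @ phi_hat)
# objective-function-based: re-minimize F_n over eta with phi = G_u eta
eta_hat = np.linalg.lstsq(X @ G_u, y, rcond=None)[0]
phi_env_obj = G_u @ eta_hat
# re-minimization can only decrease F_n within span(G_u): F(obj) <= F(proj)
```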

Envelopes for Maximum Likelihood Estimators
In this section, we broaden the envelope estimators in regression to generic likelihood-based estimators. Consider estimating θ as θ̂ = arg max_{θ∈Θ} L_n(θ), where L_n(θ) is a log-likelihood that is twice continuously differentiable in an open neighborhood of θ_t. Then, under standard conditions, √n(θ̂ − θ_t) is asymptotically normal with mean 0 and covariance matrix V(θ_t) = F^{-1}(θ_t), where F is the Fisher information matrix for θ. The asymptotic covariance matrix of φ̂, V_φφ(θ_t), is the lower-right block of V(θ_t).
Recall that for envelope models we can write the log-likelihood as L_n(θ) = L_n(ψ, φ) = L_n(ψ, Γη). To obtain the MLE, we need to maximize L_n(ψ, Γη) over ψ, η and Γ ∈ R^{p×u}, where Γ is a semi-orthogonal basis for E_{V_φφ(θ_t)}(φ_t) and u = dim(E_{V_φφ(θ_t)}(φ_t)). We first discuss the potential advantages of envelopes by considering the maximization of L_n(ψ, Γη) over ψ and η with known Γ. Since φ_t ∈ E_{V_φφ(θ_t)}(φ_t), we have that φ_t = Γη_t for some η_t ∈ R^u. Consequently, for known Γ, the envelope estimators become (ψ̂_Γ, η̂_Γ) = arg max_{ψ,η} L_n(ψ, Γη) and φ̂_Γ = Γη̂_Γ. (4.3) Any basis for E_{V_φφ(θ_t)}(φ_t) will give the same solution φ̂_Γ: for an orthogonal matrix O ∈ R^{u×u}, ΓO is also a basis, and the reparameterization η → O^T η leaves Γη unchanged. The estimator φ̂_Γ given in (4.3) is in general different from the moment-based estimator P_Γ φ̂ discussed near the end of Section 4.1. However, as implied by the following proposition, these two estimators have the same asymptotic distribution, which provides some support for the simple projection estimator of Section 4.1.
The following corollary to Proposition 5 characterizes the asymptotic variance when an arbitrary envelope is used.

Corollary 2. If Γ is a basis for an arbitrary envelope E_M(φ_t), where M is a symmetric positive definite matrix, then avar(√n φ̂_Γ) = Γ{Γ^T V_φφ^{-1}(θ_t) Γ}^{-1} Γ^T. (4.6)
The expression in (4.6) shows that Γ is intertwined with V_φφ(θ_t) in the asymptotic covariance of the envelope estimator φ̂_Γ, while the envelope of Definition 3 makes avar(√n φ̂_Γ) more interpretable because the material and the immaterial variations are separable.
As formulated, this likelihood context allows us to construct the envelope estimator φ̂_Γ when a basis Γ for E_{V_φφ(θ_t)}(φ_t) is known, but it does not by itself provide a basis estimator Γ̂. However, a basis can be estimated by using the 1D algorithm (Algorithm 2 in Section 4.1), setting M = V_φφ(θ) and U = φφ^T and plugging in the prespecified estimator θ̂ = arg max_{θ∈Θ} L_n(θ) to get √n-consistent estimators of M and U. The envelope estimator is then φ̂_env = φ̂_Γ̂, where Γ̂ = Ĝ_u. We then have the following proposition.
Proposition 6. If the estimated basis Γ̂ for E_{V_φφ(θ_t)}(φ_t) is obtained by the 1D algorithm (Algorithm 2), then the envelope estimator φ̂_env = φ̂_Γ̂ is a √n-consistent estimator of φ_t.
The envelope estimator φ̂_env depends on the prespecified estimator θ̂ through M = V_φφ(θ̂) and U = φ̂φ̂^T. Although it is √n-consistent, we have found empirically that, when M depends nontrivially on θ_t, it can often be improved by iterating, so that the current estimate of θ is used to construct estimates of M and U. The iteration can be implemented as Algorithm 3.

Algorithm 3
The iterative envelope algorithm for MLE.
1. Initialize θ̂_0 = θ̂, and let M̂_k and Û_k be the estimators of M and U based on the kth envelope estimator θ̂_k of θ, so that M̂_0 and Û_0 are based only on the prespecified estimator.
2. For k = 0, 1, . . . , iterate as follows until a measure of the change between φ̂_{k+1} and φ̂_k is sufficiently small.
(a) Using the 1D algorithm with inputs M̂_k and Û_k, construct an estimated basis Γ̂_{k+1} for the envelope and the corresponding envelope estimator φ̂_{k+1} (and hence θ̂_{k+1}).
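A toy schematic of the iteration (entirely our own illustration). The real Algorithm 3 calls the 1D algorithm at Step 2(a); here `oned_basis` is a crude rank-one stand-in: with U_k = φ_k φ_k^T, the leading eigenvector of M_k^{-1} U_k M_k^{-1} is proportional to M_k^{-1} φ_k, and `estimate_M` is a hypothetical mapping from the current estimate to M_k.

```python
import numpy as np

phi_hat = np.array([1.0, 1.0, 0.1])        # prespecified sqrt(n)-consistent estimator

def estimate_M(phi):
    """Hypothetical M_k built from the kth estimate (stands in for V_phiphi)."""
    return np.diag(1.0 + phi ** 2)

def oned_basis(M, U):
    """Rank-one stand-in for the 1D algorithm: leading direction of M^{-1} U M^{-1}."""
    Minv = np.linalg.inv(M)
    _, V = np.linalg.eigh(Minv @ U @ Minv)  # symmetric here since M is diagonal
    return V[:, [-1]]                       # q x 1 basis

phi = phi_hat.copy()
for _ in range(100):                        # Step 2: iterate until phi stabilizes
    M_k = estimate_M(phi)                   # M_k, U_k from the kth estimate
    U_k = np.outer(phi, phi)
    G = oned_basis(M_k, U_k)                # Step 2(a): estimated basis
    phi_new = (G @ G.T) @ phi_hat           # envelope estimator given the basis
    if np.linalg.norm(phi_new - phi) < 1e-10:
        break
    phi = phi_new
```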

SIMULATIONS
In this section we present a few simulations to support and illustrate the foundations discussed in previous sections. In Section 5.1 we simulate two datasets to illustrate the workings of envelope estimation in logistic regression. In Sections 5.2 and 5.3, we simulated 100 datasets for each of three sample sizes: n = 50, 200, and 800. The true dimension of the envelope was always used in estimation. We focus mainly on the performance of Algorithm 1 in this section, since the predictors were simulated from normal distributions.

Envelope in Logistic Regression
Following Definition 3, we derived in Section 3.2 that the envelope in logistic regression is E_{Σ_X(W)}(β), where Σ_X(W) = cov(√W X) is the covariance of the weighted predictors √W X and W = exp(β^T X)/{1 + exp(β^T X)}². We now use a small simulation study, with p = 2 and u = 1, to illustrate graphically the advantages of envelopes.
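The weight W and a sample version of Σ_X(W) are easy to compute; a minimal sketch (our own illustration; β and the data-generating choices are placeholders, and we center the weighted predictors when forming the sample covariance):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 2
beta = np.array([0.25, 0.25])
X = rng.standard_normal((n, p))

# logistic weights W_i = exp(beta'X_i) / {1 + exp(beta'X_i)}^2
eta = X @ beta
W = np.exp(eta) / (1.0 + np.exp(eta)) ** 2

# sample covariance of the weighted predictors sqrt(W_i) X_i
Z = np.sqrt(W)[:, None] * X
Zc = Z - Z.mean(axis=0)
Sigma_XW = Zc.T @ Zc / n
```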
We generated 150 independent observations as follows: Y_i | X_i ∼ Bernoulli{logit^{-1}(β^T X_i)}, β = (β_1, β_2)^T = (0.25, 0.25)^T and X_i ∼ N(0, Σ_X). We let span(β) be the principal eigenvector of Σ_X with eigenvalue 10 and let the other eigenvalue be 0.1, which led to collinearity and made the estimation challenging. We plotted the data in Figure 3(a), in which we used two lines to indicate the directions of the two estimators: the standard estimator β̂ = (−0.047, 0.692)^T with asymptotic standard errors SE(β̂) = (0.418, 0.429)^T, and the envelope estimator β̂_env = (0.321, 0.319)^T (based on Algorithm 1) with asymptotic standard errors SE(β̂_env) = (0.058, 0.057)^T. The components of β̂_env are large relative to their standard errors, while the components of β̂ are not. The alternative moment-based envelope estimators discussed in Sections 4.1 and 4.2, β̂_env,2 = (0.323, 0.317)^T (based on Algorithm 2, as indicated by the added subscript) and β̂_env,3 = (0.320, 0.320)^T (based on Algorithm 3), were both very close to the likelihood-based estimator β̂_env. We also plotted the weighted predictors √W X in Figure 3(b), from which we can see that the envelope is the principal eigenvector of Σ_X(W).
To further demonstrate that the standard logistic regression estimator is highly variable in the presence of collinearity, we simulated another dataset according to the same model. Again, the envelope estimator stayed close to the truth: β̂_env = (0.208, 0.209)^T with asymptotic standard errors SE(β̂_env) = (0.047, 0.047)^T. And again the moment-based envelope estimators were both very similar to the likelihood-based estimator: β̂_env,2 = (0.207, 0.208)^T and β̂_env,3 = (0.210, 0.208)^T. But the standard estimator varied substantially: β̂ = (0.665, −0.248)^T with asymptotic standard errors SE(β̂) = (0.401, 0.399)^T.
We use the Australian Institute of Sport (AIS) data to illustrate this phenomenon with real data. This dataset, originally from Cook and Weisberg (1994, p. 98), contains various measurements on 102 male and 100 female athletes collected at the Australian Institute of Sport. We take height (cm) to be X_1, weight (kg) to be X_2, and Y to be a binary indicator for male or female athletes. A plot of the data (not shown) is similar to that in Figure 3(a), and the envelope dimension was selected as u = 1 by five-fold cross-validation, so we expected some gain from envelopes. The standard estimator β̂ = (0.123, 0.062)^T has asymptotic standard errors SE(β̂) = (0.034, 0.022)^T, while the likelihood-based envelope estimator β̂_env = (0.085, 0.086)^T, obtained by Algorithm 1, has asymptotic standard errors SE(β̂_env) = (0.014, 0.014)^T. Compared to the standard estimator, the envelope estimator had 40% to 60% smaller standard errors for the two coefficients in β.

Generalized Linear Models
In this section we report the results of numerical studies comparing three estimators in the contexts of logistic and Poisson regression: (1) the standard GLM estimators obtained by the Fisher scoring method, (2) iteratively reweighted partial least squares estimators for GLMs (IRPLS; Marx 1996), and (3) envelope estimators based on Algorithm 1. The 1D algorithm (Algorithm 2) was used to provide √n-consistent starting values for the Grassmannian optimizations in Algorithm 1. We used the Matlab function glmfit to obtain the standard GLM estimators. The IRPLS algorithm is a widely recognized extension of PLS algorithms to GLMs. It embeds a weighted partial least squares algorithm in the Fisher scoring method and iteratively alternates between the reduced predictors Γ̂^T X and the other parameters (α̂, β̂, ϑ̂, V̂, Ŵ), similar to our envelope method described in Section 3.2. The original IRPLS algorithm by Marx (1996) is based on the NIPALS algorithm (Wold 1966) for PLS, but our simulations all used the SIMPLS algorithm (de Jong 1993) within IRPLS, because SIMPLS is more widely used and is implemented in the plsregress function in Matlab. In all these simulations, the envelope dimension was equal to 2, p = 10, and X ∼ N(0, Σ_X), where Σ_X = ΓΩΓ^T + Γ_0 Ω_0 Γ_0^T. Here Γ ∈ R^{p×2} was generated by filling it with independent uniform (0, 1) variates and then orthonormalizing so that (Γ, Γ_0) is an orthogonal matrix. The positive definite symmetric matrices Ω ∈ R^{2×2} and Ω_0 ∈ R^{8×8} were generated as ODO^T, where O is an orthogonal matrix and D is a diagonal matrix of positive eigenvalues. We chose the eigenvalues of Ω to be 0.1 and 0.5, while the eigenvalues of Ω_0 ranged from 0.002 to 20. The true parameter vector was β_t = Γη with η = (1, 1)^T. The intercepts were α = 2 for Poisson regression and α = 1 for logistic regression.
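The construction of Σ_X described above can be sketched as follows (our own illustration; the exact spacing of the eigenvalues of Ω_0 between 0.002 and 20 is not specified in the text, so the geometric spacing below is an assumption):

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
p, u = 10, 2

# Gamma: orthonormalized p x 2 matrix of independent uniform(0,1) variates;
# Gamma0: its orthogonal complement, so (Gamma, Gamma0) is orthogonal
Gamma, _ = np.linalg.qr(rng.uniform(size=(p, u)))
Gamma0 = null_space(Gamma.T)

def rand_spd(eigs, rng):
    """O D O^T with O a random orthogonal matrix and D = diag(eigs)."""
    d = len(eigs)
    O, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return O @ np.diag(eigs) @ O.T

Omega = rand_spd([0.1, 0.5], rng)
Omega0 = rand_spd(np.geomspace(0.002, 20, p - u), rng)  # spacing is our assumption

Sigma_X = Gamma @ Omega @ Gamma.T + Gamma0 @ Omega0 @ Gamma0.T
beta_t = Gamma @ np.array([1.0, 1.0])                   # beta_t = Gamma eta
X = rng.multivariate_normal(np.zeros(p), Sigma_X, size=200)
```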
The averaged angles and length ratios are summarized in Table 2. From this table, we found that the envelope estimator was the best across all sample sizes; its advantages over the competitors were clear in both logistic and Poisson regressions. The improvements of the envelope estimators over the standard estimators became more substantial with increased sample size. At the same time, the standard estimator always did better than the IRPLS estimator in terms of angle. Recall that, according to Corollary 1, we need some degree of collinearity among the predictors before envelopes can offer gains. To illustrate this conclusion, we repeated the sets of simulations using Σ_X = I_p for Poisson regression and Σ_X = 4I_p for logistic regression. From the results summarized in Table 3, β̂ and β̂_env had similar performance, as expected.

Cox Regression
In the study of a survival time T on p-dimensional covariates Z ∈ R^p, Cox's proportional hazards model (Cox 1972, 1975) assumes that the hazard function h(t|Z) of a subject with covariates Z has the form h(t|Z) = h_0(t) exp(β^T Z), where h_0(t) is an unspecified baseline hazard function and β is the unknown vector of regression coefficients in which we are primarily interested. Let X_i and C_i be the failure time and the censoring time of the ith subject, i = 1, . . . , n. Define δ_i = I(X_i ≤ C_i) and T_i = min(X_i, C_i). The data then consist of (T_i, δ_i, Z_i), i = 1, . . . , n. The failure time and the censoring time are assumed to be independent given the covariates, and we assume that there are no ties among the observed times.
In our setting, Z ∈ R^10 followed the multivariate normal distribution N(0, Σ_Z) with Σ_Z = ΓΩΓ^T + 0.2 Γ_0 Γ_0^T, where Γ, Γ_0 and Ω were generated in the same way as in Section 5.2, except that the eigenvalues of Ω were 1 and 5. The true parameter vector was β_t = Γη with η = (1, 1)^T. We then followed the simulation model in Nygard and Borgan (2008), where the survival time followed a Weibull distribution with scale parameter exp(β^T Z/5) and shape parameter 5, that is, a Weibull distribution with hazard rate h(Y|Z) = 5Y^4 exp(β^T Z). The censoring indicator δ was generated as the integer part of a uniform(0, 2) random variable, which gave censoring rates of approximately 50%.
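Under the hazard written above, h(t|Z) = 5t^4 exp(β^T Z), the cumulative hazard is H(t|Z) = t^5 exp(β^T Z), so survival times can be generated by inverting H at a unit-exponential draw. A sketch (our own illustration; β, the covariates, and the sample size are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
beta = 0.1 * rng.standard_normal(p)       # placeholder coefficient vector
Z = rng.standard_normal((n, p))           # placeholder covariates

# H(t|Z) = t^5 exp(beta'Z); setting H(T|Z) = E with E ~ Exp(1) and solving
# for T gives T = {E exp(-beta'Z)}^{1/5}
E = rng.exponential(size=n)
T = (E * np.exp(-Z @ beta)) ** 0.2

# censoring indicator: integer part of a uniform(0, 2) draw (0 or 1, each
# with probability 1/2), giving roughly 50% censoring as in the text
delta = np.floor(rng.uniform(0, 2, size=n)).astype(int)
```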
We considered three different estimators: the standard estimator without envelope structure, the likelihood-based envelope estimator (see Supplement Section A.3 for implementation details), and partial least squares for Cox regression (Nygard and Borgan 2008). From Table 4, we found that the envelope estimator was always the best, while the PLS estimator also improved over the standard estimator.

ILLUSTRATIVE DATA ANALYSES
We present three small examples in this section to illustrate selected aspects of the proposed foundations.

Logistic Regression: Colon Cancer Diagnosis
We use these data to illustrate the classification power of the envelope estimator as well as its estimation efficiency. The objective of this study was to distinguish normal tissues from adenomas (precancerous tissues) found during colonoscopy. When normal and adenomatous tissues are illuminated with ultraviolet light, they fluoresce at different wavelengths. The p = 22 predictors, which are laser-induced fluorescence (LIF) spectra measured at 8 nm spacing from 375 to 550 nm, reflect this phenomenon. The dataset consists of n = 285 tissue samples, comprising 180 normal tissues and 105 adenomas. Studies on a similar dataset can be found in Hawkins and Maboudou-Tchao (2013).
The envelope dimension was chosen to be 2 by five-fold cross-validation. Compared to standard logistic regression, the bootstrap standard errors of the 22 elements in β̂ were 2.2 to 12.5 times larger than the bootstrap standard errors of the elements in β̂_env, which was obtained from Algorithm 1. Moreover, the five-fold cross-validation classification error rate for the standard logistic regression estimator was 21%, while it was 17.5% for the envelope estimator with u = 2 or u = 3. The error rate for the envelope estimator with any dimension between 1 and 10 was no larger than that of the standard estimator.
To gain some idea about the level of intrinsic (nonestimative) variation in the envelope error rate, we took β env to be the true β, sampled the predictors 1000 times and predicted the response using β t = β env . The resulting classification error was 16.7%, which is not far from the observed error rate of 17.5%.

Linear Discriminant Analysis: Wheat Protein Data
In this dataset (Cook et al. 2010), we have a binary response variable Y indicating high or low protein content and six highly correlated predictors, the logarithms of near-infrared reflectance at six wavelengths across the range 1680-2310 nm. The correlations between these predictors range from 0.9118 to 0.9991. We standardized the predictors marginally so that each had mean 0 and variance 1. We illustrate the possibility of using envelopes to improve Fisher's linear discriminant analysis (LDA). It has been shown (Ye 2007) that two-class LDA is equivalent to a least squares problem: no matter how we code Y for the two classes, the least squares fitted coefficient vector β̂_OLS is always proportional to the LDA direction. Hence, we used β̂_OLS^T X as the discriminant direction and fitted the LDA classification rule based on it. To improve on OLS, we similarly replaced the OLS coefficient vector with the coefficient vector β̂_PLS estimated by the SIMPLS algorithm, and with the coefficient vector β̂_env estimated by the 1D algorithm (Algorithm 2) applied to the objective function (2.1). We estimated the error rates for these three classifiers as the average error rate over 100 replications of five-fold cross-validation on the n = 50 samples. The average error rates are shown in Table 5. Clearly the 1D algorithm had the best classification performance.
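The Ye (2007) equivalence can be checked numerically: the OLS coefficient from regressing any two-value coding of Y on X is proportional to the Fisher LDA direction S_pooled^{-1}(x̄_1 − x̄_0). A self-contained check on synthetic data (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)
X = rng.standard_normal((n, 3)) + 0.8 * y[:, None]   # two shifted classes

# OLS coefficient with an intercept, using the 0/1 coding of y
A = np.hstack([np.ones((n, 1)), X])
b_ols = np.linalg.lstsq(A, y.astype(float), rcond=None)[0][1:]

# Fisher LDA direction: pooled within-class covariance^{-1} (mean1 - mean0)
X0, X1 = X[y == 0], X[y == 1]
Sp = ((X0 - X0.mean(0)).T @ (X0 - X0.mean(0)) +
      (X1 - X1.mean(0)).T @ (X1 - X1.mean(0))) / (n - 2)
b_lda = np.linalg.solve(Sp, X1.mean(0) - X0.mean(0))

# the two directions agree up to scale: |cos angle| = 1 up to numerical error
cos = abs(b_ols @ b_lda) / (np.linalg.norm(b_ols) * np.linalg.norm(b_lda))
```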

Poisson Regression: Horseshoe Crab Data
This is a textbook Poisson regression dataset about nesting horseshoe crabs (see Agresti 2002, sec. 4.3). The response is the number of satellite male crabs residing near a female crab. Explanatory variables include the female crab's color, spine condition, weight, and carapace width. Since color is a factor with four levels, we used three indicator variables to represent color; spine condition is a factor with three levels, so we used two indicator variables for it, giving a total of seven predictors. The sample size is n = 173.
Five-fold cross-validation was used to determine the dimension of the envelope, giving u = 1, which was also the answer given by BIC. To assess the gain from envelope reduction with u = 1, we repeated the five-fold cross-validation procedure 100 times and obtained the averaged Pearson's χ² statistic, Σ_{i=1}^n (Y_i − Ŷ_i)²/Ŷ_i, and its standard error, shown in Table 6. Clearly, the likelihood-based envelope estimator based on Algorithm 1 had a significant improvement over the standard method and a slight edge over IRPLS.
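Pearson's χ² statistic used above is straightforward to compute from held-out responses and fitted means; a minimal sketch (our own illustration):

```python
import numpy as np

def pearson_chi2(y, y_hat):
    """Pearson's chi-square statistic: sum_i (y_i - yhat_i)^2 / yhat_i."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sum((y - y_hat) ** 2 / y_hat)

# toy check against a hand computation: 1 + 1 + 0.25 = 2.25
stat = pearson_chi2([2, 0, 3], [1.0, 1.0, 4.0])
```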

DISCUSSION
With our general definition and algorithms, we have laid foundations for envelope models and methods. Our general framework subsumes previous envelope models in multivariate linear regression and provides guidance for future studies. Building upon this framework, we extended the scope of envelope methods to GLMs, Cox regression, and weighted least squares. Although we focused on vector-valued parameters, the extension in the Supplement for matrix-valued parameters suggests that our general construction can be straightforwardly extended to matrix-valued or even higher-order tensor-valued regressions (see, e.g., Zhou and Li 2014; Zhou, Li, and Zhu 2013). We proposed three estimation procedures ranging from specific to general. Estimation based on Algorithm 1 is proposed for normal predictors and is also preferable in practice when the linearity condition holds. Algorithm 3 outlines an iterative procedure that is more flexible and that provides √n-consistent estimators even without linearity or normality; we can use it as a substitute for Algorithm 1 when the distribution of X is far from elliptical. Finally, the most generic envelope estimator for an arbitrary prespecified estimator β̂ is proposed as β̂_env = P_{Ĝ_u} β̂, or as β̂_env = Ĝ_u η̂ if β̂ is obtained from an objective function, where Ĝ_u is estimated from the 1D algorithm (Algorithm 2). These estimators can deal with a variety of problems, ranging from likelihood-based estimators to problems that involve heuristic nonparametric or semiparametric estimation.
Inner envelopes (Su and Cook 2012) are not covered by these foundations because they correspond to a different type of envelope construction. Extensions to inner envelopes are possible, but the analysis and methods involved are different than those for envelopes and so are outside the scope of this article.
Another future research direction is to extend the likelihood-based and moment-based envelope estimation to high-dimensional settings. One possible approach is to develop a penalized likelihood estimation procedure based on the likelihood-based estimation in Section 4.2. Another, more straightforwardly implemented, way is to use the ridge-type covariance estimators proposed by Ledoit and Wolf (2004) for M and U in our algorithms. The properties of such extensions are yet to be studied.

SUPPLEMENTARY MATERIALS
The supplementary materials contain additional proofs and technical details. [Received June 2013. Revised September 2014.]