Technological Heterogeneity and Corporate Investment

We propose an importance-sampling procedure to improve the computational performance of the simulated method of moments (SMM) for the estimation of structural models with fixed parameter heterogeneity. The main advantage of the procedure is that it does not require simulating new observations every time the structural parameters change during the minimization of the SMM criterion function. We illustrate the use of our method by estimating a neoclassical model of investment for a sample of US manufacturing companies, allowing the technological parameters to vary across firms.


Introduction
There is a growing body of research that estimates structural models of corporate investment and financing choices.¹ The aim of many papers in this literature is to infer from the data the value of economic variables that are not directly observable by researchers, such as the costs of external financing (Hennessy and Whited, 2007), capital adjustment costs (Cooper and Haltiwanger, 2006), and the magnitude of agency conflicts between managers and shareholders (Nikolov and Whited, 2014). The approach followed by most papers in this literature is to parameterize an intertemporal model of firm-level investment and financing decisions, and estimate the parameters by matching a set of simulated moments from the model to their empirical counterparts. However, by assuming that all firms are described by the same set of parameters, this approach cannot account for the persistent differences in firm policies that have been documented in the empirical literature.² Our paper's main contribution is to develop a tractable and robust estimation methodology, based on importance sampling, that accounts for parameter heterogeneity in the cross-section of firms. We demonstrate our approach in the context of a dynamic model of investment, allowing the parameters of the profit function and the capital adjustment costs to differ across firms. We estimate the cross-sectional distribution of the model's parameters and quantify their value for each firm in a sample of US manufacturing companies.
Our methodology is based on the simulated method of moments (SMM). The standard SMM procedure numerically solves the firm's intertemporal optimization problem, simulates a sample of firms, and computes a set of simulated moments. Minimizing the distance between simulated and empirical moments requires searching over the parameter space, which, in turn, requires numerically solving the firm's problem for many different parameter combinations. In the presence of heterogeneity across firms, an additional computational challenge makes the estimation of the cross-sectional distribution of parameters with the standard SMM approach intractable: at every step in the minimization, one needs to solve a separate intertemporal problem for each parameter combination that occurs in the simulated cross-section of firms. To alleviate this computational burden, we propose an algorithm based on adaptive importance sampling that iteratively estimates restricted local minima by varying only the distribution function of the parameters, holding the simulated sample of firms constant, until no further improvement in the estimation criterion is possible. The output of the importance-sampling algorithm is then used as an initial guess for the standard SMM estimator.
¹ See Strebulaev and Whited (2012) for a survey of this literature.
² For example, Lemmon, Roberts, and Zender (2008) show that firm fixed effects account for the majority of the explained variation in leverage regressions. Fuller, Netter, and Stegemoller (2002) provide evidence of differences in acquisition policies across firms. Studies of the determinants of investment typically include firm fixed effects to account for unobserved time-invariant heterogeneity across firms (see Fazzari, Hubbard, and Petersen, 1988), and Campello, Galvao, and Juhl (2013) show that there is considerable variation in investment-cash-flow and investment-Q sensitivities across firms.
To illustrate our methodology, we employ a neoclassical model of investment that has become standard in the literature (see, for example, Adda and Cooper, 2003). The model is in discrete time, and the horizon is infinite. Firms are risk neutral and maximize the present value of future cash flows. In each period, cash flows are determined by operating profits, which are affected by persistent firm-specific profit shocks, by the value of investment, and by capital adjustment costs. The latter are incurred whenever a firm acquires or sells capital, which depreciates over time. The specification of the adjustment cost function accounts for asymmetric costs for investment and disinvestment, in the spirit of Abel and Eberly (1994) and Zhang (2005).
The key extension over the existing structural models in the literature is that we allow the technological parameters to vary across firms. In the model, firms are heterogeneous with respect to six technological parameters: the depreciation rate of capital, the persistence and volatility of the profit shocks, a profit curvature parameter that affects the marginal returns to capital, and two parameters that measure adjustment costs for positive and negative investment, respectively. Thus, in contrast to existing structural models, differences in profits and investment across firms are not necessarily due to different firm-specific shock realizations; they can also be attributed to different firm-specific technological parameters.
We parameterize and estimate the cross-sectional distribution of technologies for a sample of 906 US manufacturing companies in Compustat for the period 1972 to 2012. The estimation results show that a considerable amount of technological heterogeneity exists across firms. Combining the distribution of technologies with firm-specific moments of investment and profits, we are also able to obtain parameter estimates at the firm level.
When we test the empirical performance of the model, we find that most simulated moments closely follow their empirical counterparts, although the remaining discrepancies are statistically significant.
This exercise shows both the potential and the limitations of neoclassical investment models in explaining persistent heterogeneity in firm profitability and corporate investment policies.
The use of importance-sampling techniques for Monte Carlo simulation has been studied extensively in the literature (see, for example, Robert and Casella, 2005). Recent applications of importance sampling to simulation-based estimation include Richard and Zhang (2007), who study an efficient importance-sampling procedure in the context of maximum likelihood estimation, and Ackerberg (2009), who proposes the application of importance sampling in method-of-moments estimators. Our paper contributes to this literature by formulating an importance-sampling algorithm that alleviates the computational burden of SMM estimation in the context of a neoclassical model of investment.
Two recent papers estimate structural models of corporate decisions that account for heterogeneity in firm parameters. Morellec, Nikolov, and Schurhoff (2012) estimate a dynamic capital structure model with agency costs; in their setting, closed-form solutions for optimal policies make estimation by simulated maximum likelihood feasible at a reasonable computational cost. Glover (2016) accounts for heterogeneity by estimating a trade-off model of capital structure firm by firm, to quantify the cross-sectional distribution of financial-distress costs.
Whereas these papers focus on financing policies, ours is, to the best of our knowledge, the first paper that estimates a neoclassical model of investment allowing for parameter heterogeneity across firms. Furthermore, our approach requires neither closed-form solutions for the firm's policy functions nor separate estimation of the model for each firm in the data. The methodology employed in this paper can benefit researchers in macroeconomics, finance, or industrial organization who seek to incorporate parameter heterogeneity in the estimation of their structural models.
The paper is structured as follows. Section 2 discusses the estimation of models with fixed parameter heterogeneity using SMM. Section 3 presents the importance sampling algorithm. Section 4 applies the importance sampling algorithm to a dynamic model of firm investment. Section 5 concludes.

SMM for models with fixed parameter heterogeneity
In this section, we describe how the standard SMM procedure of McFadden (1989) and Pakes and Pollard (1989) can be used to estimate non-linear structural models with fixed parameter heterogeneity.³ Consider a dataset $\{y_{jt}\}$ of $K$ variables observed for periods $t = 1, \dots, T$ and for firms $j = 1, \dots, N_f$.⁴ At the firm level, the time series of observations $y_j = (y_{j1}, \dots, y_{jT}) \in \mathcal{Y}$ is distributed according to the joint density $f_y(y_j; \theta_j)$, where $\theta_j$ is an $N_\theta \times 1$ firm-specific parameter vector belonging to the parameter space $\Theta$. The firm's choices are summarized by a vector of firm-level moments
$$h_j \equiv h(\theta_j) = \int_{\mathcal{Y}} u(y)\, f_y(y; \theta_j)\, dy, \qquad (1)$$
where $u$ is an $N_m \times 1$-valued function. We assume that estimates of the firm-specific moments $h_j$ can be computed in the data.
Firms are heterogeneous with respect to the parameters $\theta_j$ driving the distribution of the observables $y_j$. These parameters are time-invariant, and independently and identically distributed across firms with density $f_\theta(\theta; \lambda)$ for $j = 1, \dots, N_f$. The objective is to estimate, and perform inference on, the economy-wide parameter vector $\lambda \in \Lambda$ that characterizes the cross-sectional density of firm-specific parameters.
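To that end, let $\eta_i(\lambda)$ be a $P \times 1$ vector of economy-wide moments describing the cross-section of the $i$-th firm-specific moment, $i = 1, \dots, N_m$. A necessary condition for identification is that the total number of moments, $N_m \times P$, be at least as large as the number of parameters in $\lambda$. We assume that the $p$-th element of $\eta_i(\lambda)$, $p = 1, \dots, P$, obtains as the minimizer of a criterion function:
$$\eta_{i,p}(\lambda) = \arg\min_{\eta} \int_{\Theta} \psi_p\big(h_i(\theta), \eta\big)\, f_\theta(\theta; \lambda)\, d\theta. \qquad (2)$$
For example, if $\eta_{i,p}(\lambda)$ is the $q$-th percentile of the cross-sectional distribution of $h_i(\theta)$, then
$$\psi_p(h, \eta) = (h - \eta)\Big(\frac{q}{100} - \mathbb{I}\{h < \eta\}\Big), \qquad (3)$$
and if $\eta_{i,p}(\lambda)$ is the $q$-th central moment of $h_i(\theta)$, then
$$\psi_p(h, \eta) = \Big[\big(h - \mu_i(\lambda)\big)^q - \eta\Big]^2, \qquad \mu_i(\lambda) = \int_{\Theta} h_i(\theta)\, f_\theta(\theta; \lambda)\, d\theta. \qquad (4)$$
Let $\eta(\lambda) = \big(\eta_1(\lambda)', \dots, \eta_{N_m}(\lambda)'\big)'$ be the $P N_m \times 1$ vector of all economy-wide moments.
³ See also Gouriéroux and Monfort (1997) for a discussion of SMM.
⁴ Throughout the paper we refer to the cross-sectional units as firms to be consistent with the corporate investment application of Section 4.
Furthermore, let $\eta_{N_f}$ be a consistent estimate of $\eta(\lambda)$ in the actual data. If analytical expressions for $\eta(\lambda)$ exist, estimation can be performed using a GMM approach (see Hansen, 1982). The need for simulation-assisted estimation arises when closed-form expressions for $f_y(y_j; \theta_j)$, $h(\theta_j)$, and therefore $\eta(\lambda)$ do not exist. In this case, the researcher can simulate a cross-section of firm-specific parameters $\theta_s \sim f_\theta(\theta; \lambda)$, $s = 1, \dots, S$. If, in addition, the researcher can simulate data $y(\theta_s)$ given the parameters drawn, then it is possible to obtain simulations of the firm-specific moments $h(\theta_s)$ and of the economy-wide moments:
$$\hat\eta_{i,p}(\lambda) = \arg\min_{\eta} \frac{1}{S}\sum_{s=1}^{S} \psi_p\big(h_i(\theta_s), \eta\big). \qquad (5)$$
For example, for the $q$-th percentile of $h_i$,
$$\hat\eta_{i,p}(\lambda) = \arg\min_{\eta} \frac{1}{S}\sum_{s=1}^{S} \big(h_i(\theta_s) - \eta\big)\Big(\frac{q}{100} - \mathbb{I}\{h_i(\theta_s) < \eta\}\Big), \qquad (6)$$
and for the $q$-th central moment of $h_i$,
$$\hat\eta_{i,p}(\lambda) = \frac{1}{S}\sum_{s=1}^{S} \big(h_i(\theta_s) - \bar h_i\big)^q, \qquad \bar h_i = \frac{1}{S}\sum_{s=1}^{S} h_i(\theta_s). \qquad (7)$$
Equations 6 and 7 represent the simulation estimators of Equation 3 and Equation 4, respectively.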
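To make Equations 6 and 7 concrete, here is a minimal Python sketch that computes a simulated percentile by minimizing the average check loss over the simulated values, and a simulated central moment directly. The array `h_sim` is hypothetical normal draws standing in for $h_i(\theta_s)$; when $qS/100$ is not an integer, the check-loss minimizer coincides with NumPy's `inverted_cdf` quantile, which the example uses as a cross-check.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile ("check") loss u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))

def sim_percentile(h, q):
    """Simulated q-th percentile (sketch of Equation 6): minimise the average
    check loss, searching over the simulated values themselves."""
    tau = q / 100.0
    candidates = np.sort(h)
    # rows: candidate eta; columns: simulated firms s = 1, ..., S
    losses = check_loss(h[None, :] - candidates[:, None], tau).mean(axis=1)
    return candidates[np.argmin(losses)]

def sim_central_moment(h, q):
    """Simulated q-th central moment (sketch of Equation 7)."""
    return np.mean((h - h.mean()) ** q)

rng = np.random.default_rng(0)
h_sim = rng.normal(0.12, 0.05, size=1001)       # hypothetical h_i(theta_s), s = 1, ..., S
print(sim_percentile(h_sim, 30), np.quantile(h_sim, 0.30, method="inverted_cdf"))
print(sim_central_moment(h_sim, 2), h_sim.var())
```

Searching only over the simulated values is enough because the average check loss is piecewise linear and convex in η, so a minimizer is always attained at one of the $h_i(\theta_s)$.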
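The SMM estimator is given by
$$\hat\lambda_{SMM} = \arg\min_{\lambda \in \Lambda} \big(\eta_{N_f} - \hat\eta(\lambda)\big)'\, W\, \big(\eta_{N_f} - \hat\eta(\lambda)\big), \qquad (8)$$
where $\hat\eta(\lambda)$ is the vector of simulated economy-wide moments and $W$ is a symmetric positive definite weighting matrix. To obtain efficiency, the weighting matrix should be a consistent estimate of $\Omega^{-1}$, with $\Omega$ defined below. The asymptotic distribution of the (optimal) SMM estimator is
$$\sqrt{N_f}\,\big(\hat\lambda_{SMM} - \lambda\big) \xrightarrow{d} \mathcal{N}\Big(0,\ \Big(1 + \frac{N_f}{S}\Big)\big(D'\,\Omega^{-1} D\big)^{-1}\Big), \qquad (9)$$
where
$$D = \frac{\partial \eta(\lambda)}{\partial \lambda'} \qquad \text{and} \qquad \Omega = \lim_{N_f \to \infty} \mathrm{Var}\Big(\sqrt{N_f}\,\big(\eta_{N_f} - \eta(\lambda)\big)\Big). \qquad (10)$$
Solving the minimization problem in Equation 8 can be computationally very cumbersome. For example, the application that we examine in Section 4 requires solving numerically, for a given value of the structural parameter vector $\lambda$, an intertemporal optimization problem for each draw $s = 1, \dots, S$ of the firm-specific parameter vector $\theta_s$ to obtain the firm-specific moments $h(\theta_s)$. This procedure must be repeated for each value of $\lambda$ visited in the minimization of the SMM criterion: if the minimization of Equation 8 requires $M$ iterations, the dynamic programming problem must be solved numerically $M \times S$ times in total. Hence, this estimation procedure can be extremely time-consuming even for relatively simple structural models. In the next section, we propose an algorithm based on importance sampling that can drastically reduce the computational burden of SMM when estimating structural models with fixed heterogeneity.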
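The following Python sketch evaluates the quadratic-form criterion of Equation 8, approximates $D$ by forward differences, and forms the covariance matrix of Equation 9, assuming the usual $(1 + N_f/S)$ correction for a simulated cross-section of $S$ firms. The linear moment function $\eta(\lambda) = A\lambda$, the identity $\Omega$, and $S = 9060$ are hypothetical choices made only so the result can be checked by hand: with $D = A$ and $\Omega = I$, the covariance equals $1.1\,(A'A)^{-1}$.

```python
import numpy as np

def smm_criterion(lam, eta_data, eta_sim, W):
    """Quadratic-form SMM objective of Equation 8."""
    d = eta_data - eta_sim(lam)
    return float(d @ W @ d)

def jacobian(fun, lam, eps=1e-6):
    """Forward-difference approximation of D = d eta(lambda) / d lambda'."""
    f0 = fun(lam)
    D = np.empty((f0.size, lam.size))
    for k in range(lam.size):
        step = np.zeros_like(lam)
        step[k] = eps
        D[:, k] = (fun(lam + step) - f0) / eps
    return D

def smm_avar(D, Omega, N_f, S):
    """(1 + N_f/S) (D' Omega^{-1} D)^{-1}: covariance of sqrt(N_f)(lambda_hat - lambda)."""
    return (1.0 + N_f / S) * np.linalg.inv(D.T @ np.linalg.solve(Omega, D))

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # hypothetical linear moments eta = A lambda

def eta_fun(lam):
    return A @ lam

lam0 = np.array([0.5, 2.0])
N_f, S = 906, 9060
V = smm_avar(jacobian(eta_fun, lam0), np.eye(3), N_f, S)
se = np.sqrt(np.diag(V) / N_f)                        # standard errors of lambda_hat
print(smm_criterion(lam0, eta_fun(lam0), eta_fun, np.eye(3)), V, se)
```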

Importance sampling algorithm
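In this section, we discuss a procedure that alleviates the computational burden of SMM estimation by replacing the standard Monte Carlo simulation estimator $\hat\eta(\lambda)$ with one based on importance sampling, which we denote by $\tilde\eta(\lambda)$.
Let $g(\theta)$ be an alternative distribution to $f_\theta(\theta; \lambda)$, with the same domain but invariant to $\lambda$. The importance-sampling estimator for the $p$-th element of $\eta_i$ is
$$\tilde\eta_{i,p}(\lambda) = \arg\min_{\eta} \frac{1}{S}\sum_{s=1}^{S} \psi_p\big(h_i(\theta_s), \eta\big)\,\frac{f_\theta(\theta_s; \lambda)}{g(\theta_s)}, \qquad (11)$$
where $\psi_p$ is the criterion function of Equation 2, $\theta_s \sim g(\theta)$, and $S$ is the number of random draws from $g(\theta)$. This estimator computes firm-specific moments on parameters $\theta_s$ drawn from a distribution that is potentially different from the true distribution $f_\theta(\theta; \lambda)$, and corrects for the difference through the weights $f_\theta(\theta_s; \lambda)/g(\theta_s)$. Unlike the draws in the standard Monte Carlo approach described in Section 2, the draws $\theta_s$ in Equation 11 do not change with $\lambda$. Therefore, the firm-specific moments $h(\theta_s)$ need not be recomputed as $\lambda$ changes. This is a major advantage of the importance-sampling procedure over the standard Monte Carlo approach, and it results in very large savings in computation time.
It is important to note that the variance of the simulation estimator in Equation 11 increases the more dissimilar the distributions $f_\theta(\theta; \lambda)$ and $g(\theta)$ are (see Robert and Casella, 2005). For this reason, we follow an adaptive procedure that periodically updates the importance-sampling distribution $g$. Accordingly, we initialize this distribution at $g_0(\theta) = f_\theta(\theta; \lambda^0)$ for an arbitrary $\lambda^0 \in \Lambda$. We then draw firm-specific parameters $\theta^0_s$, for $s = 1, \dots, S$, and compute the corresponding moments $h(\theta^0_s)$. Allowing the structural parameters $\lambda$ to vary in a narrow neighborhood $\Lambda^0$ of $\lambda^0$, we minimize the SMM criterion to obtain an update of the parameter vector:
$$\lambda^1 = \arg\min_{\lambda \in \Lambda^0} \big(\eta_{N_f} - \tilde\eta^0(\lambda)\big)'\, W\, \big(\eta_{N_f} - \tilde\eta^0(\lambda)\big), \qquad (12)$$
where $\tilde\eta^0(\lambda)$ is the estimator in Equation 11 with $g = g_0$ and draws $\theta^0_s$. Notice that $g_0(\theta)$ and $h(\theta^0_s)$ are invariant to the argument of optimization, $\lambda$. At the new value $\lambda^1$, we draw new firm-specific parameters $\theta^1_s \sim g_1(\theta) = f_\theta(\theta; \lambda^1)$, $s = 1, \dots, S$. Rather than discarding the firm-specific moments from the previous step, however, we use the simulation estimator
$$\tilde\eta^1_{i,p}(\lambda) = \arg\min_{\eta} \frac{1}{2S}\sum_{l=0}^{1}\sum_{s=1}^{S} \psi_p\big(h_i(\theta^l_s), \eta\big)\,\frac{f_\theta(\theta^l_s; \lambda)}{g_l(\theta^l_s)}, \qquad (13)$$
which uses all the simulated moments in iterations 0 and 1 and attaches different importance-sampling weights depending on the distribution from which the simulated parameters are drawn. This procedure is iterated so that, at iteration $L$,
$$\tilde\eta^L_{i,p}(\lambda) = \arg\min_{\eta} \frac{1}{(L+1)S}\sum_{l=0}^{L}\sum_{s=1}^{S} \psi_p\big(h_i(\theta^l_s), \eta\big)\,\frac{f_\theta(\theta^l_s; \lambda)}{g_l(\theta^l_s)}, \qquad g_l(\theta) = f_\theta(\theta; \lambda^l), \qquad (14)$$
and $\lambda^{L+1}$ is obtained by minimizing the SMM criterion
$$\lambda^{L+1} = \arg\min_{\lambda \in \Lambda^L} \big(\eta_{N_f} - \tilde\eta^L(\lambda)\big)'\, W\, \big(\eta_{N_f} - \tilde\eta^L(\lambda)\big). \qquad (15)$$
To limit the simulation variance of $\tilde\eta^L(\lambda)$, it is important to restrict the range over which $\lambda$ varies in each iteration $L$. A natural choice is to require that $\Lambda^L$ be such that each element of $\lambda$ can vary by at most a fraction $\delta_\lambda$ from the corresponding value in $\lambda^L$.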
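The reweighting idea behind Equation 11 can be seen in a minimal Python sketch. It assumes a scalar parameter θ with a beta density $f_\theta(\theta; \lambda)$, $\lambda = (a, b)$, a fixed importance density $g = \text{Beta}(2, 2)$, and a hypothetical firm-level moment $h(\theta) = \theta$ standing in for the solution of the firm's dynamic problem, so the exact percentile is available for comparison. For the check loss, the weighted minimizer in Equation 11 is a weighted quantile, which the code computes by sorting. The counter `n_solves` shows that the "firm problem" is evaluated once, however many values of λ are tried.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
S = 20_000
n_solves = 0                                  # counts calls to the (hypothetical) firm solver

def firm_moment(theta):
    """Hypothetical stand-in for solving the firm's problem and computing h(theta)."""
    global n_solves
    n_solves += theta.size
    return theta.copy()                       # toy choice: h(theta) = theta

g = stats.beta(2.0, 2.0)                      # importance density g(theta), invariant to lambda
theta_s = g.rvs(size=S, random_state=rng)
h_s = firm_moment(theta_s)                    # computed once and reused for every lambda

def weighted_percentile(h, w, q):
    """Minimiser of the weighted check loss: smallest h whose cumulative weight share >= q/100."""
    order = np.argsort(h)
    share = np.cumsum(w[order]) / w.sum()
    return h[order][np.searchsorted(share, q / 100.0)]

def is_percentile(lam, q):
    """Importance-sampling estimate of the q-th percentile of h under f_theta(theta; lam)."""
    w = stats.beta.pdf(theta_s, *lam) / g.pdf(theta_s)    # weights f_theta / g
    return weighted_percentile(h_s, w, q)

for lam in [(2.0, 2.0), (2.5, 2.0), (2.0, 3.0)]:
    print(lam, is_percentile(lam, 50), stats.beta.ppf(0.5, *lam))
print("firm problems solved:", n_solves)      # stays at S
```

Only the weights change with λ; the draws and the moments stay fixed, which is exactly the source of the savings described in the text.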
Crucially, the optimization of the right side of Equation 15 does not require new solutions to the firm's intertemporal optimization problem as λ changes. New solutions are only needed once the parameter vector λ L+1 is obtained. Moreover, this importance-sampling approach makes use of the entire history of firm-specific moments, calculated at all iterations, and therefore its accuracy improves as new iterations take place.
Below we outline an algorithm that describes the steps needed to implement this adaptive importance-sampling simulated method of moments (IS-SMM) procedure.
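Algorithm 1 (adaptive IS-SMM)
1. Choose an initial parameter vector $\lambda^0 \in \Lambda$, the number of draws per iteration $S$, the maximal relative step $\delta_\lambda$, and the weighting matrix $W$.
2. Set $g_0(\theta) = f_\theta(\theta; \lambda^0)$.
3. In iteration $L \ge 0$:
(a) Update the parameter space: $\Lambda^L = \{\lambda \in \Lambda : |\lambda_k - \lambda^L_k| \le \delta_\lambda |\lambda^L_k| \text{ for every element } k \text{ of } \lambda\}$.
(b) Draw firm-specific parameters $\theta^L_s \sim g_L(\theta) = f_\theta(\theta; \lambda^L)$, $s = 1, \dots, S$.
(c) For each $s = 1, \dots, S$, simulate a $K \times T$ matrix of data $y(\theta^L_s)$ according to the postulated structural model.
(d) Compute the firm-specific moments $h(\theta^L_s)$, $s = 1, \dots, S$.
(e) Form the importance-sampling estimator $\tilde\eta^L(\lambda)$ of Equation 14, which pools the draws and moments of iterations $0, \dots, L$.
(f) Obtain $\lambda^{L+1}$ by minimizing the SMM criterion over $\Lambda^L$. No new firm problems are solved in this step, because the stored moments $h(\theta^l_s)$, $l = 0, \dots, L$, do not depend on $\lambda$ (Equation 15) and, therefore, minimization is very fast.
4. Stop when no further improvement in the SMM criterion is possible, and set $\lambda_{IS\text{-}SMM}$ equal to the last iterate; otherwise, set $L \leftarrow L + 1$ and return to step 3.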
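The loop below is a compact Python sketch of the adaptive procedure on a toy problem, not the paper's implementation. Assumptions: θ is scalar and beta-distributed, the hypothetical `firm_moment` returns $h(\theta) = \theta$ instead of solving a dynamic program, the economy-wide moments are the mean and variance of $h$ (self-normalized weighted versions, which are the minimizers of the weighted quadratic criteria), $W$ is the identity applied to relative deviations, and the stopping rule (no element of λ moves by more than 2%) is an illustrative choice. Each draw is weighted by $f_\theta(\theta; \lambda)/g_l(\theta)$ for the distribution $g_l$ it came from, as in Equation 14, and the box $\Lambda^L$ lets each element of λ move by at most $\delta_\lambda = 25\%$.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(42)
S, delta_lam, max_iter = 1_000, 0.25, 20
n_solves = 0

def firm_moment(theta):
    """Hypothetical stand-in for steps (c)-(d): here simply h(theta) = theta."""
    global n_solves
    n_solves += theta.size
    return theta.copy()

def weighted_moments(h, w):
    """Self-normalised weighted mean and variance: minimisers of weighted quadratic criteria."""
    mu = np.sum(w * h) / np.sum(w)
    return np.array([mu, np.sum(w * (h - mu) ** 2) / np.sum(w)])

a0, b0 = 3.0, 5.0                              # "true" lambda used to create the target moments
eta_data = np.array([a0 / (a0 + b0), a0 * b0 / ((a0 + b0) ** 2 * (a0 + b0 + 1))])

lam = np.array([2.0, 2.0])                     # arbitrary starting point lambda^0
thetas, hs, gdens = [], [], []                 # history: draws, moments, g_l(theta) at the draws
for L in range(max_iter):
    # (b)-(d): draw from g_L = f_theta(.; lambda^L) and compute firm moments once
    th = stats.beta.rvs(lam[0], lam[1], size=S, random_state=rng)
    thetas.append(th)
    hs.append(firm_moment(th))
    gdens.append(stats.beta.pdf(th, lam[0], lam[1]))
    theta_all, h_all, g_all = (np.concatenate(x) for x in (thetas, hs, gdens))

    def criterion(x):
        # (e): pooled estimator, each draw weighted by f_theta(theta; x) / g_l(theta)
        w = stats.beta.pdf(theta_all, x[0], x[1]) / g_all
        d = (eta_data - weighted_moments(h_all, w)) / eta_data
        return d @ d

    # (a), (f): minimise over the box Lambda^L without solving any new firm problem
    bounds = list(zip(lam * (1 - delta_lam), lam * (1 + delta_lam)))
    res = minimize(criterion, lam, method="Nelder-Mead", bounds=bounds,
                   options={"xatol": 1e-6, "fatol": 1e-12})
    step = np.max(np.abs(res.x - lam) / lam)
    lam = res.x
    if step < 0.02:                            # illustrative stopping rule
        break

print("lambda_IS-SMM:", lam, "outer iterations:", L + 1, "firm problems solved:", n_solves)
```

Only $S$ firm problems are solved per outer iteration, while each inner Nelder-Mead evaluation only recomputes density ratios; in the standard approach, every one of those evaluations would require $S$ new solutions.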
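Once Algorithm 1 converges, the final parameter vector $\lambda_{IS\text{-}SMM}$ is used as an initial guess to minimize the SMM criterion in Equation 8 using the standard Monte Carlo approach and obtain $\hat\lambda_{SMM}$, for which inference can be performed using the asymptotic distribution in Equation 9.
For firm $j$ with vector of moments $h_j$ estimated from the actual data, the expected firm-specific parameter vector is
$$E[\theta \mid h_j] = \frac{\int_{\Theta} \theta\, \xi(\theta, h_j)\, d\theta}{\int_{\Theta} \xi(\theta, h_j)\, d\theta}, \qquad (16)$$
where $\xi(\theta, h)$ is the joint density of firm-specific parameters and moments. To obtain firm-specific estimates of the structural parameters, we use the following procedure. Let $\theta_1, \dots, \theta_S$ be random draws from $f_\theta(\theta; \hat\lambda_{SMM})$. If $\xi$ is given by a known closed-form expression, we can estimate $E[\theta \mid h_j]$ by Monte Carlo integration:
$$\hat E[\theta \mid h_j] = \frac{\sum_{s=1}^{S} \theta_s\, \xi(\theta_s, h_j) / f_\theta(\theta_s; \hat\lambda_{SMM})}{\sum_{s=1}^{S} \xi(\theta_s, h_j) / f_\theta(\theta_s; \hat\lambda_{SMM})}. \qquad (17)$$
If such an expression is unavailable, we can obtain an estimate $\hat\xi(\theta, h)$ of the joint density of parameters and moments by fitting a Gaussian copula on $\big(\theta_1, h(\theta_1)\big), \dots, \big(\theta_S, h(\theta_S)\big)$.
Then, the simulation estimator of the firm-specific parameter vector becomes
$$\hat E[\theta \mid h_j] = \frac{\sum_{s=1}^{S} \theta_s\, \hat\xi(\theta_s, h_j) / f_\theta(\theta_s; \hat\lambda_{SMM})}{\sum_{s=1}^{S} \hat\xi(\theta_s, h_j) / f_\theta(\theta_s; \hat\lambda_{SMM})}. \qquad (18)$$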
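As an illustration of the Monte Carlo integration in Equation 17, here is a Python sketch under assumptions chosen so that the answer is known in closed form: a hypothetical $N(0, 1)$ density plays the role of $f_\theta(\theta; \hat\lambda_{SMM})$, and the firm-level moment satisfies $h \mid \theta \sim N(\theta, 0.5^2)$, so $\xi(\theta, h)$ is a product of normal densities and $E[\theta \mid h] = h/(1 + 0.5^2)$. The Gaussian-copula step for an unknown ξ is not shown.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
S = 200_000
s_h = 0.5                                     # hypothetical sampling noise of the firm-level moment
f_theta = stats.norm(0.0, 1.0)                # hypothetical stand-in for f_theta(theta; lambda_SMM)

def xi(theta, h):
    """Joint density xi(theta, h) = f_theta(theta) * f(h | theta), known in closed form here."""
    return f_theta.pdf(theta) * stats.norm.pdf(h, loc=theta, scale=s_h)

theta_s = f_theta.rvs(size=S, random_state=rng)   # theta_1, ..., theta_S drawn from f_theta

def firm_estimate(h_j):
    """Monte Carlo estimate of E[theta | h_j] following Equation 17."""
    w = xi(theta_s, h_j) / f_theta.pdf(theta_s)
    return np.sum(w * theta_s) / np.sum(w)

h_j = 1.2
print(firm_estimate(h_j), h_j / (1 + s_h ** 2))   # simulation vs closed-form posterior mean
```

Dividing ξ by $f_\theta$ turns the draws from $f_\theta$ into draws weighted by the likelihood of the observed moments, which is why the ratio is self-normalized.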

Application: corporate investment with heterogeneous technologies
In this section, we present a concrete application of our methodology, using a neoclassical corporate investment model with parameter heterogeneity. First, we describe the model, discuss the choice of the moments to match, and provide a step-by-step analysis of the estimation algorithm. Then, we describe the data sample and present the results of the estimation. Finally, we discuss the cross section of firm-level technology parameter estimates.

Model
We study a standard neoclassical dynamic model of investment (see Adda and Cooper, 2003). The economy is populated by risk-neutral firms that discount future cash flows at rate $r$. Time is discrete, and the horizon is infinite. Firm $j$'s operating profits in period $t$ are
$$\pi(k_{j,t}, z_{j,t}) = z_{j,t}\, k_{j,t}^{\alpha_j}, \qquad (19)$$
where $k_{j,t}$ is the capital stock, $\alpha_j \in (0, 1)$ is the firm-specific curvature parameter of the profit function, and $z_{j,t}$ is a random profit shock that follows a first-order Markov chain on $[\underline{z}_j, \bar{z}_j]$, which approximates the process
$$\log z_{j,t+1} = \rho_j \log z_{j,t} + \sigma_j\, \varepsilon_{j,t+1}, \qquad (20)$$
where $\rho_j \in (0, 1)$, $\sigma_j \ge 0$, and $\varepsilon_{j,t}$ follows a standard normal distribution.⁵ The dynamics of capital are determined by the firm's investment choices,
$$k_{j,t+1} = (1 - \delta_j)\, k_{j,t} + i_{j,t}, \qquad (21)$$
where $\delta_j \in (0, 1)$ denotes the depreciation rate of capital. Investment results in adjustment costs,
$$\Phi(i_{j,t}, k_{j,t}) = \frac{1}{2}\Big(\phi^+_j\, \mathbb{I}\{i_{j,t} \ge 0\} + \phi^-_j\, \mathbb{I}\{i_{j,t} < 0\}\Big)\Big(\frac{i_{j,t}}{k_{j,t}}\Big)^2 k_{j,t}, \qquad (22)$$
where $\mathbb{I}\{\cdot\}$ is an indicator function, $\phi^-_j \ge 0$, and $\phi^+_j \ge 0$. The specification of adjustment costs in Equation 22 accounts for costly reversibility of investment decisions, and it allows for different levels of adjustment costs for investment and disinvestment (see Zhang, 2005).⁶ We define the firm's technology by the parameter vector $\theta_j = (\alpha_j, \rho_j, \sigma_j, \phi^-_j, \phi^+_j, \delta_j)'$. We assume that $\theta_j$ is time-invariant. Therefore, unlike in most prior papers in the literature, firm heterogeneity in the model results not only from transitory profit shocks but also from permanent technological differences, which generate persistent differences in investment policies across firms.
Using Bellman's principle of optimality, we characterize firm $j$'s dynamic problem as the solution to
$$V(k_{j,t}, z_{j,t}; \theta_j) = \max_{k_{j,t+1}} \Big\{ z_{j,t}\, k_{j,t}^{\alpha_j} - i_{j,t} - \Phi(i_{j,t}, k_{j,t}) + \frac{1}{1+r} \int V(k_{j,t+1}, z'; \theta_j)\, dF(z' \mid z_{j,t}; \rho_j, \sigma_j) \Big\}, \qquad (23)$$
where $F(\cdot \mid z_{j,t}; \rho_j, \sigma_j)$ is the profit shock's transition c.d.f. implied by Equation 20, subject to the capital accumulation constraint in Equation 21.⁷
⁵ The period profit function in Equation 19 is commonly used in the literature (e.g., Hennessy and Whited, 2005). Cooper and Ejarque (2003) show that this specification can be derived in a setting in which a firm with market power faces a demand function with constant elasticity, has a production function with constant returns to scale over capital and a flexible input, and is affected by shocks to profit, demand, and input prices. See also Abel and Eberly (2011).
⁶ Cooper and Haltiwanger (2006) use a specification that assumes firms pay a fixed adjustment cost every time they invest or disinvest assets. Such fixed costs give rise in equilibrium to lumpy investment behavior: periods of high investment are followed by periods of inactivity. Whereas such inactivity is present at the plant level, it is rarely observed for large firms. For this reason, we model investment frictions by means of convex adjustment costs.

Technological heterogeneity and optimal investment
In this subsection, we derive the optimality condition for investment and describe the effects of technological heterogeneity. Using the first-order condition of Equation 23 and the envelope condition with respect to capital, the optimality condition for investment is
$$1 + \Phi_i(i_{j,t}, k_{j,t}) = \frac{1}{1+r}\, E_t\Big[\alpha_j z_{j,t+1} k_{j,t+1}^{\alpha_j - 1} - \Phi_k(i_{j,t+1}, k_{j,t+1}) + (1 - \delta_j)\big(1 + \Phi_i(i_{j,t+1}, k_{j,t+1})\big)\Big], \qquad (24)$$
where $\Phi_i$ and $\Phi_k$ denote the partial derivatives of the adjustment cost function $\Phi(i, k)$ in Equation 22 with respect to investment and capital, respectively. This investment Euler equation has the following interpretation: the marginal cost of investing in capital today (left-hand side) is balanced against its marginal benefit (right-hand side). The marginal cost is due to the direct cost of installing one additional unit of capital and the associated adjustment costs. The marginal benefit results from increases in future cash flows, from the availability of future capital, which will be deployed for productive use, and from the reduction in future adjustment costs.
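For intuition, the derivatives in the Euler equation can be written out under an assumed quadratic, asymmetric specification of adjustment costs, $\Phi(i, k) = \tfrac{1}{2}\phi(i)\, i^2/k$ with $\phi(i) = \phi^+$ for $i \ge 0$ and $\phi^-$ for $i < 0$. This functional form is an illustrative assumption; firm subscripts are dropped.

```latex
\begin{aligned}
\Phi_i(i,k) &= \phi(i)\,\frac{i}{k}, \qquad
\Phi_k(i,k) = -\tfrac{1}{2}\,\phi(i)\Big(\frac{i}{k}\Big)^{2}, \\
1 + \phi(i_t)\,\frac{i_t}{k_t}
  &= \frac{1}{1+r}\,E_t\Big[\alpha z_{t+1}k_{t+1}^{\alpha-1}
   + \tfrac{1}{2}\,\phi(i_{t+1})\Big(\frac{i_{t+1}}{k_{t+1}}\Big)^{2}
   + (1-\delta)\Big(1 + \phi(i_{t+1})\,\frac{i_{t+1}}{k_{t+1}}\Big)\Big].
\end{aligned}
```

Under this form, the middle term on the right-hand side is the reduction in next period's adjustment costs that a larger capital stock delivers. Both one-sided derivatives of Φ with respect to $i$ are zero at $i = 0$, so the condition is well defined at zero investment even though $\phi^+ \neq \phi^-$.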
The fact that technology, as described by the parameter vector $(\alpha_j, \rho_j, \sigma_j, \phi^-_j, \phi^+_j, \delta_j)$, is heterogeneous across firms means that, in the model, profits (Equation 19) and investment (Equation 24) exhibit persistent cross-sectional variation. The aim of the empirical analysis in the next sections is to estimate the structural parameters using the cross-sectional variation in investment and profitability across firms in the data. The structural parameters capture important determinants of firms' technology and competitive environment. For example, a firm's degree of market power, which affects its profitability, will be reflected in the curvature of the profit function, α.⁸ Moreover, a firm may invest more aggressively than others because it has, for technological or competitive reasons, a shorter product life cycle (larger δ), a more loyal clientele (higher ρ), it faces less volatile demand and input cost conditions (lower σ), it has lower fire-sales pressures when disinvesting (lower φ⁻), or because it can assimilate the technology of new assets with lower cost (lower φ⁺).

Selection of moments
We now discuss the moments chosen for estimation. We set the real risk-free rate $r$ to 1%, the average annualized difference between the three-month US Treasury bill rate and the quarterly growth rate of the Consumer Price Index over our sample period. We select moments on the basis of their informativeness about the structural parameters. For each firm, we compute a vector of six moments: the mean and standard deviation of operating profits scaled by assets, defined in the model as $zk^{\alpha}/k$; and the mean, standard deviation, skewness, and serial correlation of the investment rate, $i/k$. We then aggregate these firm-specific moments and choose as moments to match the deciles of their cross-sectional distributions. We leave out the first and ninth deciles to reduce the impact of outliers, which can be large given the short time series available for each firm in the data.
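The Python sketch below computes the six firm-level moments and stacks deciles 2 through 8 of each across firms into 42 economy-wide targets. The panel is hypothetical random data (906 firms with 21 years each), and the use of sample standard deviations (`ddof=1`), SciPy's default (biased) skewness, and a lag-one Pearson correlation for serial correlation are assumptions about details the text does not specify.

```python
import numpy as np
from scipy import stats

def firm_moments(op_ratio, inv_rate):
    """Six firm-level moments: mean and std of operating profits over assets, and
    mean, std, skewness, and first-order serial correlation of the investment rate."""
    return np.array([
        op_ratio.mean(), op_ratio.std(ddof=1),
        inv_rate.mean(), inv_rate.std(ddof=1), stats.skew(inv_rate),
        np.corrcoef(inv_rate[:-1], inv_rate[1:])[0, 1],
    ])

def economy_wide_moments(h):
    """Deciles 2 to 8 of each firm-level moment across firms (first and ninth dropped).
    h has shape (n_firms, 6); the result stacks 7 deciles per moment into 42 targets."""
    return np.percentile(h, np.arange(20, 90, 10), axis=0).ravel(order="F")

rng = np.random.default_rng(3)
# Hypothetical panel: 906 firms, 21 years each, of operating-profit and investment rates
panel = [(rng.normal(0.15, 0.05, 21), rng.gamma(2.0, 0.05, 21)) for _ in range(906)]
h = np.array([firm_moments(op, inv) for op, inv in panel])
eta = economy_wide_moments(h)
print(h.shape, eta.shape)
```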
Each component of the parameter vector θ affects essentially all firm-level moments, often in a nonlinear way. However, to gain intuition about what guides our choice of moments to match for estimation, we highlight some key relations between firm-level parameters and moments. We do so by means of a comparative statics analysis, plotting firm-level simulated moments as a function of firm-specific parameters in Figure 1.
A higher curvature parameter, α, for the profit function leads to lower average operating profits as a fraction of capital. The autocorrelation, ρ, and the volatility, σ, of the profit shock are positively related to the autocorrelation of the investment rate and to the standard deviation of operating profits, respectively. The depreciation parameter, δ, positively affects the average investment rate. Finally, the standard deviation of the investment rate decreases in both adjustment cost parameters. However, φ⁺ reduces the skewness of investment, whereas φ⁻ increases it.
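One way to see why a higher α lowers average profitability is to evaluate $zk^\alpha/k$ at the deterministic steady state used later in the text, $k^* = (\alpha/(r+\delta))^{1/(1-\alpha)}$ with $z = 1$. There, $k^{*\,\alpha-1} = (r+\delta)/\alpha$, so operating profits over capital equal $(r+\delta)/\alpha$, which falls with α and rises with δ. This is a heuristic that ignores shocks and adjustment costs; the short Python check below confirms the algebra for $r = 1\%$.

```python
import numpy as np

def steady_state_capital(alpha, delta, r=0.01):
    """k* solving alpha * k**(alpha - 1) = r + delta (z = 1, no adjustment)."""
    return (alpha / (r + delta)) ** (1.0 / (1.0 - alpha))

def steady_state_profit_ratio(alpha, delta, r=0.01):
    """Operating profits over capital, z k^alpha / k, evaluated at k* with z = 1."""
    k = steady_state_capital(alpha, delta, r)
    return k ** alpha / k

for alpha in (0.5, 0.7, 0.9):
    print(alpha, steady_state_profit_ratio(alpha, 0.10), (0.01 + 0.10) / alpha)
```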

Computational advantages of the estimation procedure
Before describing the implementation of the importance-sampling algorithm, it is worth highlighting its computational advantages, compared to the standard SMM approach, in estimating our model with fixed firm heterogeneity.
Consider, first, the case in which there is no parameter heterogeneity, that is, all firms are characterized by the same parameter vector θ ≡ (α, ρ, σ, φ − , φ + , δ). Under this assumption, which is common in the literature, investment and profits vary across firms only because of different profit-shock realizations. In this case, the object to be estimated is the parameter vector θ. Instead, in our model, each firm j is characterized by its own parameter vector θ j , and what needs to be estimated is λ, the economy-wide parameter vector that determines the cross-sectional distribution f θ (θ; λ) of the firm-specific parameters. For example, one element of the vector θ is the parameter α, which determines the curvature of the profit function. If all firms had the same technology, we would need to estimate one number for α.
However, with parameter heterogeneity, firms have different α's, and we need to estimate the parameters that describe the distribution of α's across firms.
When firms are heterogeneous in parameters, the standard SMM approach described in Section 2 requires solving the firm's optimization problem for new draws of θ each time a particular value of λ is considered in the minimization of the SMM criterion (Equation 8).
When λ changes, the cross-sectional distribution of θ changes and, as a consequence, new values of θ need to be drawn and the corresponding firm optimization problems need to be solved again.
The main computational advantage of the importance-sampling algorithm is that it allows λ, and hence $f_\theta(\theta; \lambda)$, to vary locally while the values of θ for the sample of simulated firms are kept fixed. To see this, notice that the importance-sampling estimator in Equation 11 has three parts: (i) the moments of the simulated firm sample given a draw of θ; (ii) the auxiliary distribution $g(\theta)$; and (iii) the density $f_\theta(\theta; \lambda)$. The importance-sampling algorithm varies only (iii), keeping (i) and (ii) fixed. This implies considerable savings in computation time, as one does not need to solve the firm's optimization problem and compute the corresponding simulated moments for new draws of θ.

Estimation algorithm
In this section, we describe the algorithm that we use to estimate the structural model of corporate investment. We assume that each element $m \in \{\alpha, \rho, \sigma, \phi^-, \phi^+, \delta\}$ of the parameter vector θ is independently distributed across firms according to a beta distribution with parameters $\lambda_m = (\beta_m, \gamma_m)$ over the domain $[\underline{m}, \bar{m}]$. The beta distribution offers a high degree of shape flexibility: depending on parameter values, it can be symmetric, positively or negatively skewed, or U-shaped (bimodal). Moreover, it is a natural choice for random variables with compact support, such as the autocorrelation coefficient of the profitability shock, ρ, the depreciation rate, δ, and the profit curvature parameter, α. We assume that the support of the parameters is bounded below by $\underline{\theta}$ and above by $\bar{\theta}$.⁹ We solve Equation 23 by value function iteration. To that end, we discretize the state space for both the profit shock and the level of capital. Since firms' technologies are heterogeneous, the ergodic sets for $z$ and $k$ differ depending on the parameters. To compute expectations over future profit shocks, we follow Tauchen (1986). For each simulated parameter vector $s = 1, \dots, S$, we construct an equally spaced grid $z^G_s$ with $n_z = 10$ points for $\log(z)$ that spans eight standard deviations of the ergodic distribution; that is,
$$\log(z^G_s) \in \Big[-\frac{4\sigma_s}{\sqrt{1-\rho_s^2}},\ \frac{4\sigma_s}{\sqrt{1-\rho_s^2}}\Big]. \qquad (25)$$
Capital lies on a grid $k^G_s$ (Equation 26) with $n_k = 401$ points, whose construction is based on $k^*_s = \big(\alpha_s/(r+\delta_s)\big)^{1/(1-\alpha_s)}$, the firm's steady-state level of capital assuming that $z_s = 1$ permanently (see Nikolov and Whited, 2014).
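The Python sketch below shows one way to draw the six technology parameters from independent beta distributions rescaled to $[\underline{m}, \bar{m}]$, and to build the equally spaced $\log(z)$ grid of Equation 25. The upper bounds are borrowed, as an assumption, from the bracketed vector [0.98, 0.98, 2, 5, 5, 0.8] that survives in a table note; the lower bounds and the $(\beta_m, \gamma_m)$ values are hypothetical placeholders, not the paper's estimates. `loc` and `scale` are SciPy's standard arguments for shifting and stretching the unit-interval beta.

```python
import numpy as np
from scipy import stats

# Order follows theta = (alpha, rho, sigma, phi_minus, phi_plus, delta).
lower = np.array([0.30, 0.00, 0.01, 0.0, 0.0, 0.01])   # hypothetical lower bounds
upper = np.array([0.98, 0.98, 2.00, 5.0, 5.0, 0.80])   # assumed upper bounds
shape = np.array([[5.0, 2.0], [4.0, 2.0], [2.0, 6.0],
                  [2.0, 5.0], [2.0, 5.0], [2.0, 8.0]])   # hypothetical (beta_m, gamma_m)

def draw_theta(S, rng):
    """Draw S parameter vectors; element m is Beta(beta_m, gamma_m) rescaled to [lower_m, upper_m]."""
    cols = [stats.beta.rvs(b, c, loc=lo, scale=hi - lo, size=S, random_state=rng)
            for (b, c), lo, hi in zip(shape, lower, upper)]
    return np.column_stack(cols)

def log_z_grid(rho, sigma, n_z=10):
    """Equally spaced log(z) grid spanning +/- 4 ergodic standard deviations (Equation 25)."""
    sd = sigma / np.sqrt(1.0 - rho ** 2)
    return np.linspace(-4.0 * sd, 4.0 * sd, n_z)

rng = np.random.default_rng(0)
theta = draw_theta(5, rng)
for alpha, rho, sigma, *_ in theta:
    grid = log_z_grid(rho, sigma)
    print(f"rho={rho:.2f} sigma={sigma:.2f} log z in [{grid[0]:.2f}, {grid[-1]:.2f}]")
```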
We now provide a detailed analysis of the computational algorithm based on the importance-sampling acceleration method discussed in Section 3. We use this algorithm to obtain a starting point $\lambda_{IS\text{-}SMM}$ for the standard SMM estimation. For each step, we indicate the input needed by the procedure and the output it generates.
⁹ Our estimation results in Subsection 4.8 show that these constraints do not bind, that is, the density at the boundaries is zero (see Figure 4).

In iteration L ≥ 0:
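(a) Update the parameter space by setting $\Lambda^L = \{\lambda : |\lambda_k - \lambda^L_k| \le \delta_\lambda |\lambda^L_k| \text{ for every element } k \text{ of } \lambda\}$.
For each simulated parameter vector $\theta^L_s$, $s = 1, \dots, S$, the firm's problem is solved as follows:
i. Construct a grid for productivity $z^G_s$ and capital $k^G_s$ according to Equation 25 and Equation 26, respectively.
ii. Construct the $n_z \times n_z$ transition matrix $P_s$ for the grid $\log(z^G_s)$ given parameters $\rho_s$ and $\sigma_s$ according to Tauchen (1986).
iii. Compute the period payoff matrix $U_s$ of $n_z n_k$ states by $n_k$ controls. To that end, let $1_n$ denote an $n \times 1$ vector of ones, and set $Z_s = 1_{n_k} \otimes z^G_s$ and $K_s = k^G_s \otimes 1_{n_z}$. Each row of $U_s$ corresponds to a state $(z, k)$ in $(Z_s, K_s)$ and each column to a choice $k'$ on $k^G_s$; the entry is the period cash flow $z k^{\alpha_s} - i - \Phi(i, k)$, with $i = k' - (1 - \delta_s) k$ and adjustment costs $\Phi$ as in Equation 22.
iv. Initialize the value function $V^{(0)}_s$ as an $n_z n_k \times 1$ vector of ones.
v. At iteration $n \ge 0$, update the value function according to
$$V^{(n+1)}_s = \max\Big\{U_s + \frac{1}{1+r}\big(1_{n_k} \otimes P_s V^{(n)}_{1s}\big)\Big\},$$
where the maximization is performed over the columns (controls) for every row (state), and $V^{(n)}_{1s}$ is $V^{(n)}_s$ reshaped into an $n_z \times n_k$ matrix. Iterate until convergence, that is, until the change in the value function between iterations falls below a small tolerance.
Because the firm-specific moments computed in these steps are stored and reweighted, the importance-sampling estimator allows us to make fast progress in matching the economy-wide moments $\eta_{N_f}$ without re-solving for optimal firm policies for every value $\lambda \in \Lambda^L$ used in step 3(f) of iteration L.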
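The following Python sketch implements steps ii to v for a single parameter vector. Assumptions: hypothetical parameter values, a smaller capital grid ($n_k = 61$) for speed, a hypothetical linear capital grid between $0.2k^*$ and $3k^*$ in place of Equation 26, quadratic asymmetric adjustment costs $\tfrac{1}{2}\phi(i)\, i^2/k$, and a convergence tolerance of $10^{-6}$. States are ordered with capital outer and productivity inner, matching $Z_s = 1_{n_k} \otimes z^G_s$ and $K_s = k^G_s \otimes 1_{n_z}$, so stacking $P_s V_{1s}$ $n_k$ times lines up each state with its expected continuation value for every choice $k'$.

```python
import numpy as np
from scipy import stats

def tauchen(log_z, rho, sigma):
    """Tauchen (1986) transition matrix for log z' = rho log z + sigma eps on an equally spaced grid."""
    half = (log_z[1] - log_z[0]) / 2.0
    mean = rho * log_z[:, None]                       # conditional mean of log z', one row per z
    upper = stats.norm.cdf((log_z[None, :] + half - mean) / sigma)
    lower = stats.norm.cdf((log_z[None, :] - half - mean) / sigma)
    P = upper - lower
    P[:, 0] = upper[:, 0]                             # lowest node absorbs the left tail
    P[:, -1] = 1.0 - lower[:, -1]                     # highest node absorbs the right tail
    return P

# Hypothetical technology; n_k is smaller than in the text to keep the example fast
alpha, rho, sigma, phi_minus, phi_plus, delta, r = 0.6, 0.7, 0.2, 2.0, 0.5, 0.1, 0.01
n_z, n_k = 10, 61

# i. Grids: log(z) as in Equation 25; the capital grid is a hypothetical linear grid around k*
sd = sigma / np.sqrt(1.0 - rho ** 2)
log_z = np.linspace(-4.0 * sd, 4.0 * sd, n_z)
k_star = (alpha / (r + delta)) ** (1.0 / (1.0 - alpha))
k_grid = np.linspace(0.2 * k_star, 3.0 * k_star, n_k)

# ii. Transition matrix
P = tauchen(log_z, rho, sigma)

# iii. Payoff matrix: n_z*n_k states (capital outer, productivity inner) by n_k controls k'
Z = np.kron(np.ones(n_k), np.exp(log_z))
K = np.kron(k_grid, np.ones(n_z))
inv = k_grid[None, :] - (1.0 - delta) * K[:, None]            # i = k' - (1 - delta) k
phi = np.where(inv >= 0, phi_plus, phi_minus)
U = (Z * K ** alpha)[:, None] - inv - 0.5 * phi * inv ** 2 / K[:, None]

# iv.-v. Value function iteration
V = np.ones(n_z * n_k)
for n in range(10_000):
    V1 = V.reshape(n_k, n_z).T                                # n_z x n_k matrix, V1[z, k]
    cont = np.tile(P @ V1, (n_k, 1))                          # 1_{n_k} kron (P V1)
    V_new = np.max(U + cont / (1.0 + r), axis=1)              # maximise over controls, row by row
    converged = np.max(np.abs(V_new - V)) < 1e-6
    V = V_new
    if converged:
        break
print("VFI iterations:", n + 1)
```

With $r = 1\%$ the contraction factor is $1/1.01$, so plain value function iteration needs a couple of thousand sweeps; this is cheap here because each sweep is a single vectorized maximization.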

Comparison with other estimation methods
In this subsection, we compare SMM estimation to alternative methods, namely, Generalized Method of Moments (GMM) estimation of the investment Euler equation (Equation 24) and maximum simulated likelihood (MSL). A GMM approach based on the Euler equation would need to integrate out, for each firm, the parameter heterogeneity, conditional on the firm's observable quantities (e.g., the investment rate).¹¹ Because there is no closed-form solution for the conditional density of parameters given firm-specific observables, one would need to estimate this density by means of simulation. In turn, this requires solving for the firm-specific investment policy using the Bellman equation, which renders the Euler-equation approach unhelpful for alleviating the computational burden. A second alternative would be to perform MSL estimation using investment-rate data for the firms in our sample. Such a method would effectively use all moments of the distribution of investment in the estimation. Our approach, however, offers the advantage of matching moments that are not based on investment, such as those describing firm profitability. Moreover, from a computational perspective, the SMM approach is more attractive, because in our model the investment policy does not admit a closed-form solution unless the capital adjustment-cost parameters are set to zero. MSL, therefore, would require numerically finding how the observed pairs of investment rate and capital map to profit shocks for each simulated firm-specific parameter vector, which would substantially slow down the estimation process.¹²
Table note: [0.98, 0.98, 2, 5, 5, 0.8]. α is the curvature of the profit function. ρ is the autocorrelation coefficient of the log-profitability shock. σ is the volatility of the innovation to the log-profitability shock. φ⁺ and φ⁻ are adjustment-cost parameters for investment and disinvestment, respectively. δ is the depreciation rate of capital. Estimation is performed using a sample of 906 publicly traded manufacturing firms between 1972 and 2012 from the Compustat Industrial Annual database. The details of the sample construction can be found in Subsection 4.7, the definitions of the variables in Table 1, and the moments in Table 2.

Data
Our source of data is the Compustat Industrial Annual database for the 1972-2012 period. We start from the full sample of US manufacturing firms (primary Standard Industrial Classification [SIC] code between 2000 and 3999). We delete firm-year observations with missing values, as well as observations with negative sales (Compustat item SALE) or negative gross property, plant, and equipment (PPEGT). 13 We also drop firms with negative average operating income, as the model does not account for negative operating profits; firms with total assets (AT) of less than 10 million real 2009 dollars; 14 and firms with growth in real sales or total assets greater than 100% in a given year. The latter two filters are common in the empirical literature on corporate investment (see, for example, Almeida and Campello, 2007) and are applied to eliminate firms that are very small or likely to be involved in large mergers or reorganizations. We then choose, for each firm, the longest consecutive time series of data and drop firms with fewer than 10 observations. Our final sample consists of 19,326 yearly observations for 906 firms, with an average of about 21 observations per firm. Variable definitions are reported in Table 1. Finally, to reduce the effect of outliers, we winsorize the variables at the top and bottom 1%.
The variables used for estimation are investment and operating income, scaled by beginning-of-period assets. An analysis of variance shows that between-firm variation accounts for 35% of total variation in investment and 38% in operating income. Moreover, the hypothesis that the means of these variables are equal across firms is rejected at the 1% level. 15 This evidence rejects the notion that the observed variation in profits and investment is due only to transitory firm-specific shocks. Hence, there is significant cross-sectional heterogeneity in investment and profits among firms in our sample.
11 In our setting, firm-specific heterogeneity cannot be removed by simple transformations of the data, such as first-differencing, which is commonplace in GMM estimation of linear dynamic panel data (see Arellano and Bond, 1991).
12 Notice the contrast with the model of leverage policy with firm heterogeneity in Morellec, Nikolov, and Schurhoff (2012), for which closed-form solutions exist. There, MSL can be employed at a lower computational cost.
13 We replace missing values of sale of property (SPPE) with zeros. According to Frank and Goyal (2003), Compustat records this variable "as missing when a firm does not report a particular item or combines it with other data items."
14 Variables are deflated to constant 2009 dollars using the GDP deflator. Source: Bureau of Economic Analysis (www.bea.gov), NIPA Table 1.1.9.
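The last steps of the sample construction described in this section (keeping each firm's longest consecutive time series, dropping firms with fewer than 10 observations, and winsorizing at the top and bottom 1%) can be sketched in plain Python as below. This is an illustration under our own assumptions: the data layout (a dict mapping a hypothetical firm id to `{year: value}`), winsorizing the values pooled across firm-years, and the simple nearest-rank percentile rule. The Compustat-specific filters on SALE, PPEGT, AT and growth rates are omitted.

```python
def longest_consecutive_run(years):
    """Longest run of consecutive years in a sorted list of distinct years."""
    best_start, best_len, start = 0, 0, 0
    for i in range(1, len(years) + 1):
        # A run ends at the end of the list or where a year is skipped
        if i == len(years) or years[i] != years[i - 1] + 1:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i
    return years[best_start:best_start + best_len]


def build_sample(panel, min_obs=10):
    """panel: firm id -> {year: value}. Keep each firm's longest consecutive
    run and drop firms with fewer than `min_obs` observations in that run."""
    sample = {}
    for firm, obs in panel.items():
        run = longest_consecutive_run(sorted(obs))
        if len(run) >= min_obs:
            sample[firm] = {year: obs[year] for year in run}
    return sample


def winsorize(values, lower=0.01, upper=0.99):
    """Clip pooled values at nearest-rank lower/upper percentiles."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[round(lower * last)]
    hi = ordered[round(upper * last)]
    return [min(max(v, lo), hi) for v in values]
```

For example, a firm observed in 1990-1992 and again in 1995-1996 keeps only the 1990-1992 spell, and is then dropped because that spell is shorter than 10 years.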
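The between-firm share of variation and the test of equal means reported above correspond to a standard one-way analysis of variance with firms as groups. A minimal sketch follows, assuming each firm's series is stored as a list in a dict keyed by a hypothetical firm id; the exact test used in the paper is not described in this section. The F statistic is compared with an F distribution with k − 1 and N − k degrees of freedom (k firms, N firm-years), whose p-value is left to a statistics library.

```python
def between_firm_share(panel):
    """One-way ANOVA with firms as groups.

    Returns the share of the total sum of squares explained by differences
    in firm means, and the F statistic for equal means across firms.
    """
    obs = [x for series in panel.values() for x in series]
    n, k = len(obs), len(panel)
    grand_mean = sum(obs) / n
    total_ss = sum((x - grand_mean) ** 2 for x in obs)
    # Between-firm sum of squares: firm means around the grand mean
    between_ss = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2
                     for s in panel.values())
    within_ss = total_ss - between_ss
    f_stat = (between_ss / (k - 1)) / (within_ss / (n - k))
    return between_ss / total_ss, f_stat


share, f_stat = between_firm_share({"a": [0.0, 2.0], "b": [2.0, 4.0]})
```

In this toy panel, half of the total variation comes from the difference between the two firm means.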
To further illustrate the degree of firm heterogeneity, we provide histograms of the firm-specific empirical moments of investment and operating income in Figure 3, and the deciles of the moments' distributions in Panel A of Table 2.
All moments exhibit substantial cross-sectional dispersion.
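The firm-specific moments behind these histograms and deciles can be sketched as follows. We assume standard sample definitions (population standard deviation, first-order autocorrelation around the firm's own mean, and moment-based skewness); the definitions actually used in the paper are those in Table 2 and may differ. The panel layout (a dict from a hypothetical firm id to a time-ordered list) is also an assumption.

```python
import statistics


def firm_moments(series):
    """Sample moments of one firm's time-ordered series (e.g. investment/assets)."""
    n = len(series)
    mean = statistics.mean(series)
    sd = statistics.pstdev(series, mu=mean)
    if sd == 0:
        raise ValueError("constant series: moments are undefined")
    dev = [x - mean for x in series]
    # First-order autocorrelation around the firm's own mean
    autocorr = sum(a * b for a, b in zip(dev[:-1], dev[1:])) / sum(d * d for d in dev)
    skewness = sum(d ** 3 for d in dev) / n / sd ** 3
    return {"mean": mean, "sd": sd, "autocorr": autocorr, "skewness": skewness}


def moment_deciles(panel, name):
    """Deciles (9 cut points) of one firm-specific moment across firms."""
    values = [firm_moments(series)[name] for series in panel.values()]
    return statistics.quantiles(values, n=10)
```

Applying `moment_deciles` to each moment of investment and operating income yields a cross-sectional summary of the kind shown in Panel A of Table 2.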
It is important to point out that the persistent heterogeneity in firm characteristics found in the data cannot be explained by structural models that assume a unique value of parameters across firms. The typical procedure when estimating such homogeneous-parameter models is to filter out heterogeneity in the data by removing firm fixed effects in the variables of interest (see discussion in Strebulaev and Whited, 2012). Instead, in this paper, we aim precisely to understand the drivers of persistent variation in firm policies. We do so by allowing firms to have heterogeneous technological parameters, which we estimate in the following section.
[Caption fragment] … Table 1, and the summary statistics of the cross-sectional distribution of the moments in Panel A of Table 2. [Panel title] Average operating income.
[Caption fragment] … Table 3. The definition of each variable according to the model is reported in parentheses. Estimation follows the SMM procedure described in Subsection 4.5. Panel C presents t-statistics for the difference between actual and simulated moments.

Parameter estimates
The estimates of the structural parameters $\lambda_m = (\beta_m, \gamma_m)$, for $m \in \{\alpha, \rho, \sigma, \phi^-, \phi^+, \delta\}$, and their associated standard errors are shown in Panel A of Table 3. In Panel B of Table 3, we test whether firms are homogeneous with respect to each parameter, that is, $\mathrm{Var}(m) = 0$, using the null hypothesis $H_0: \beta_m \gamma_m = 0$. We perform this test with a t-statistic constructed by the delta method (see Greene, 2003, p. 914). The assumption of homogeneity is rejected at the 5% level or better for all technological parameters except the disinvestment adjustment-cost parameter, φ−. However, the hypothesis $E(\phi^-) = E(\phi^+)$ is rejected at the 1% level against the alternative $E(\phi^-) > E(\phi^+)$. 16 Hence, we conclude that disinvestment adjustment costs exist, that their average value is higher than that of positive-investment adjustment costs, and that they display a lower degree of heterogeneity than the other technological parameters.
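The homogeneity test above can be illustrated with a minimal delta-method sketch. It assumes only that point estimates of $\beta_m$ and $\gamma_m$ and their 2×2 covariance matrix are available; the function name and the inputs in the usage line are hypothetical and are not the estimates in Table 3. For $g(\beta, \gamma) = \beta\gamma$, the gradient is $(\gamma, \beta)$, so the delta-method variance is $\nabla g^\top \Sigma \, \nabla g$.

```python
import math


def product_homogeneity_test(beta, gamma, cov):
    """Delta-method t-statistic and two-sided normal p-value for H0: beta * gamma = 0.

    cov is the 2x2 covariance matrix of the (beta, gamma) estimates:
    [[var_beta, cov_beta_gamma], [cov_beta_gamma, var_gamma]].
    """
    g = beta * gamma
    grad = (gamma, beta)  # gradient of g(beta, gamma) = beta * gamma
    var_g = sum(grad[i] * cov[i][j] * grad[j] for i in range(2) for j in range(2))
    if var_g <= 0:
        raise ValueError("delta-method variance is not positive")
    t = g / math.sqrt(var_g)
    # P(|Z| > |t|) for a standard normal Z
    p_two_sided = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_two_sided


# Hypothetical inputs for illustration only
t, p = product_homogeneity_test(2.0, 0.5, [[0.01, 0.0], [0.0, 0.04]])
```

One caveat of this construction is that the delta-method standard error degenerates when both estimates are close to zero, because the gradient $(\gamma, \beta)$ then vanishes.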
It is informative to examine how well the model matches the cross-sectional moments of investment and operating income. The goal here is not to test a new model of investment but to provide a novel empirical test of a standard investment model that accounts for firm-specific heterogeneity. In other words, we examine how well the neoclassical model fits the investment behavior of a heterogeneous cross section of firms. In the first two panels of Table 2, we show the empirical moments used for estimation and the corresponding simulated moments. Statistically, most simulated moments differ significantly from their empirical counterparts (Panel C of Table 2), and the χ² test of the overidentifying restrictions (reported in Table 3) allows us to reject the hypothesis that the empirical moments are matched.
Perhaps, though, this is not surprising in our case, given the large difference between the number of moments we target (42) and the number of structural parameters (12). A closer look at the moment comparison reveals that discrepancies between empirical and simulated moments may be statistically significant but are often economically small. For example, the discrepancy in average operating income is less than 1.5% across most deciles. The same holds for the standard deviation of operating income, average investment, and the standard deviation of investment: most simulated moments closely follow their empirical counterparts, even though the remaining discrepancies are statistically significant. From an economic perspective, we therefore argue that the model is informative about the heterogeneity in firm profitability and investment. The largest deviations emerge in the skewness of investment and in the lower deciles of the autocorrelation of investment. We attribute these differences to the relatively lower weight that these higher-order moments receive in the SMM objective function.
16 Formally, we test the null hypothesis by means of $H_0$: […]
[Caption fragment] … 98, 0.98, 2, 5, 5, 0.8]. α is the curvature of the profit function. ρ is the autocorrelation coefficient of the log-profitability shock. σ is the volatility of the innovation to the log-profitability shock. φ+ and φ− are adjustment-cost parameters for investment and disinvestment, respectively. δ is the depreciation rate of capital. Estimation follows the SMM procedure described in Subsection 4.5 and is performed using a sample of 906 publicly traded manufacturing firms between 1972 and 2012 from the Compustat Industrial Annual database. The details of the sample construction can be found in Subsection 4.7, and the moments in Table 2. The last row reports the χ² statistic for the test of the overidentifying restrictions. Panel B reports the t-statistics, constructed by the delta method (see Greene, 2003), for the test of parameter homogeneity using the null hypothesis $H_0: \beta_m \gamma_m = 0$.
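With 42 targeted moments and 12 structural parameters, the test of overidentifying restrictions has 42 − 12 = 30 degrees of freedom, assuming the standard asymptotic χ² distribution of the SMM criterion at its minimum. The sketch below computes the corresponding p-value without external libraries, using the closed-form survival function available for an even number of degrees of freedom $d$: $P(X > x) = e^{-x/2} \sum_{k=0}^{d/2-1} (x/2)^k / k!$. Function names are ours, and no statistic from Table 3 is used.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square with an even number of degrees of freedom,
    via exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! recursively
        total += term
    return math.exp(-half) * total


N_MOMENTS, N_PARAMS = 42, 12
DF = N_MOMENTS - N_PARAMS  # 30 overidentifying restrictions


def overid_p_value(j_stat):
    """p-value of the overidentification statistic under chi-square(DF)."""
    return chi2_sf_even_df(j_stat, DF)
```

With 30 degrees of freedom, statistics above roughly 43.8 and 50.9 reject at the 5% and 1% levels, respectively.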
We obtain firm-specific estimates of the technological parameter vector $\theta_j$ using Equation 18. For a given firm j, this requires the firm-specific moments $h_j$ to lie in a relatively tight neighborhood of the set of moments simulated under λ. This condition holds for 822 of the 906 firms in our sample. Table 4 presents summary statistics of the technological parameter estimates for these firms. All firm-specific estimates display considerable cross-sectional dispersion. This is all the more remarkable given that our sample consists of a relatively homogeneous set of manufacturing firms that are in the sample for at least 11 years.
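Equation 18 is not reproduced in this section, so the sketch below is not the paper's estimator. It is a hypothetical stand-in that conveys the neighborhood requirement: a firm's parameters are estimated as a weighted average of simulated parameter draws whose simulated moments lie within a chosen radius of the firm's empirical moments $h_j$ (a uniform-kernel, nearest-neighbor average, optionally with importance weights). When no simulated draw falls inside the radius, no estimate is returned, analogous to a firm whose moments lie outside the neighborhood of the simulated moments. The function name, `radius`, and `weights` are illustrative.

```python
import math


def neighborhood_estimate(h_j, sim_thetas, sim_moments, radius, weights=None):
    """Hypothetical firm-level estimate: weighted average of the simulated
    parameter vectors whose simulated moments lie within `radius` of h_j.
    Returns None when no simulated moment vector is close enough."""
    if weights is None:
        weights = [1.0] * len(sim_thetas)
    num = [0.0] * len(sim_thetas[0])
    den = 0.0
    for theta, h_s, w in zip(sim_thetas, sim_moments, weights):
        if math.dist(h_j, h_s) <= radius:  # Euclidean distance in moment space
            den += w
            for i, value in enumerate(theta):
                num[i] += w * value
    if den == 0.0:
        return None  # h_j lies outside the simulated neighborhood
    return [x / den for x in num]
```

In practice, moments measured in different units would need to be rescaled (for example, by their standard deviations) before distances are computed.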
Overall, the results in this section show that there is a large degree of persistent heterogeneity in corporate investment policies and firm profitability. Our methodology makes it possible to test empirically, at a feasible computational cost, whether dynamic models of investment can explain this heterogeneity.
Table 4: Summary statistics of firm-specific parameters. This table reports descriptive statistics of the firm-level parameter estimates, $\theta_j \equiv (\alpha_j, \rho_j, \sigma_j, \phi^-_j, \phi^+_j, \delta_j)$. These are obtained, for each firm j, using Equation 18, the cross-sectional parameter estimates in Panel A of Table 3, and the firm-specific empirical moments. α is the curvature of the profit function. ρ is the autocorrelation coefficient of the log-profitability shock. σ is the volatility of the innovation to the log-profitability shock. φ+ and φ− are adjustment-cost parameters for investment and disinvestment, respectively. δ is the depreciation rate of capital. The sample consists of 822 US manufacturing firms in Compustat for the 1972-2012 period, for which we obtain estimates of firm-specific technological parameters. Details of sample construction are in Subsection 4.7, and variable definitions are provided in Table 1.

Conclusion
In this paper, we propose an adaptive importance-sampling method to improve the computational performance of SMM estimation in structural models with fixed cross-sectional parameter heterogeneity. To illustrate our method, we estimate a standard neoclassical model of corporate investment with firm-specific technological heterogeneity for a sample of US manufacturing companies. We find that there is a high degree of dispersion across firms in parameters describing the returns to scale of the profit function, the persistence and volatility of shocks to profits, the rate of economic depreciation, and capital adjustment costs.
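The computational gain from importance sampling in this setting comes from not re-simulating observations each time the structural parameters change. The sketch below illustrates that idea under assumptions of our own: a single parameter θ whose cross-sectional distribution is normal, with its mean and standard deviation playing the role of the structural parameters; a fixed normal proposal density g; and a placeholder `simulate_moment` standing in for solving and simulating the investment model for one firm. Draws and simulations are done once; changing the structural parameters only changes the weights f(θ; mean, sd) / g(θ). The adaptive step of the paper's procedure is omitted, and this is not the authors' implementation.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_once(n_draws, prop_mean, prop_sd, simulate_moment, seed=0):
    """Draw parameters from the proposal g and simulate each firm ONCE."""
    rng = random.Random(seed)
    thetas = [rng.gauss(prop_mean, prop_sd) for _ in range(n_draws)]
    moments = [simulate_moment(t) for t in thetas]  # expensive step, done once
    return thetas, moments


def cross_sectional_moment(thetas, moments, mean, sd, prop_mean, prop_sd):
    """Self-normalized importance-sampling estimate of E[moment]
    when theta ~ N(mean, sd^2) across firms."""
    weights = [normal_pdf(t, mean, sd) / normal_pdf(t, prop_mean, prop_sd)
               for t in thetas]
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, moments)) / total


# Usage: a cheap placeholder (theta squared) stands in for the model simulation.
thetas, moments = simulate_once(20000, 0.0, 2.0, simulate_moment=lambda t: t ** 2)
for mean in (0.0, 0.5, 1.0):
    # Only the importance weights change; no new simulation is run.
    print(mean, cross_sectional_moment(thetas, moments, mean, 1.0, 0.0, 2.0))
```

With this placeholder, the exact cross-sectional moment is mean² + 1, so the printed values can be checked against 1, 1.25, and 2. A proposal wider than the target distribution keeps the importance weights bounded.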
The importance-sampling approach we employ shows how the daunting challenges of structural estimation of models with parameter heterogeneity and no analytical solutions can be overcome. The neoclassical model of investment that we consider is a building block on which further dynamic theories of firms' decisions have been developed. Therefore, additional topics can be explored on the basis of the methodology used in this paper.
Two particularly interesting areas are the empirical analysis of heterogeneity in costs of external financing, and the effects of technological heterogeneity on the aggregate price of risk. Although such models are characterized by more state variables than the model in this paper, recent developments in numerical integration methods (Judd, Maliar, and Maliar, 2011) and the use of parallel computing (Aldrich, Fernandez-Villaverde, Gallant,