Identifying and Testing Models of Managerial Compensation

This article analyses the identification and empirical content of the pure moral hazard (PMH) and the hybrid moral hazard (HMH) principal–agent models. The PMH model has hidden actions, while the HMH model has hidden information in addition to hidden actions. In both models, agents are risk averse and principals are risk neutral. The article derives the equilibrium restrictions from the optimal contract and uses them to show that the models have empirical content. For any given risk-aversion parameter, the models' other parameters are non-parametrically point identified. The risk-aversion parameter, and hence the model, is however only partially identified. Management's ability to manipulate accounting reports arises endogenously within HMH models, but not in all versions of PMH models. We use our framework to investigate whether shareholders contract with management recognizing that accounting reports are susceptible to manipulation and, therefore, endogenous to the incentives offered to management. The data reject all models in which accounting reports are verifiable. Furthermore, the version of the PMH model in which accounting reports can be manipulated is rejected if expected compensation is restricted to be positive.


Introduction
To a large extent the theory of optimal contracting in the presence of asymmetric information revolves around principal agent models. The first half of this paper analyzes the identification of a canonical class of such models with a risk neutral principal and a risk averse agent. We show that the models are point identified up to the coefficient of absolute risk aversion but that, in the absence of further exclusion restrictions, the risk aversion parameter is only partially identified from the conditions for profit (or value) maximization, given data on profits or returns and on compensation. The second half applies our identification results to executive compensation, using models of optimal contracting under moral hazard and hidden information. We use data on executive compensation, financial returns and accounting returns to estimate nonparametric bounds on the social costs of moral hazard in executive management and on other measures of asymmetric information. Our estimation technique is based on our identification results: for any given risk aversion parameter, the other parameters of the model are obtained as mappings of the underlying data generating process, so they can be concentrated out. This leaves only one parameter to estimate, the coefficient of absolute risk aversion.
Our framework incorporates two forms of asymmetric information: a pure moral hazard model, where only the actions of the agent are hidden, and a hybrid model, where there are both hidden actions and information about the environment that is privy to the agent but not to the principal. In the pure moral hazard model, the expected utility of a manager does not depend on the state of the firm at the beginning of the period, but only on his outside employment option. In a hybrid model, managers are provided incentives to divulge the true state of the firm at the beginning of the period; their expected utility from truth telling typically differs across states, and the optimal contract equates the expected utility of revealing the true state of the firm with the expected utility of lying about it. We fully characterize the restrictions that the optimal contracts of both models place on data measuring information provided by the agent about the state of the environment, gross profits (or more generally gross financial returns), and compensation. We show that the pure and hybrid models are not nested, and that the restrictions the optimal contract places on the equilibrium data generating process are testable. Our identification analysis establishes sharp and tight bounds on the identified set for both types of models, and our tests and estimators are based on these bounds.
The empirical work in the second half of this paper applies the principal agent models to a large panel data set with information about the financial and accounting returns of firms, the compensation of their managers, background descriptors, and aggregate economic conditions. In the hybrid model we treat accounting information as discretionary, in other words as information that shareholders cannot verify and that the manager can, in principle, withhold through discretionary reporting. We find that the pure moral hazard model does explain the correlation between financial returns and compensation, but can only jointly reconcile financial returns, accounting returns and managerial compensation by resorting to a taste based explanation characterized by heterogeneity in managerial preferences. In contrast, the hybrid model with stable homogeneous preferences is not rejected by the data. Our nonparametric bounds show that, overall, the total cost of moral hazard and the loss from ignoring the incentive problem entirely are not very sensitive to whether accounting returns and hidden information are explicitly incorporated into the estimation framework. Confirming previous parametric estimates, the cost of the latter dwarfs the cost of the former.
In the next section we develop our approach using a prototype static model of pure moral hazard. First we lay out the model and solve for the optimal contract. Then we show how much of it can be identified from a cross section of identical firms with data on their profits and managerial compensation. This leads directly to procedures for testing and estimating the structural parameters of the model. We conclude the section with a discussion of how exclusion restrictions can be imposed to further narrow the identified set of parameters. In Section 3 we relax the assumption that shareholders and managers are equally well informed about the state of the firm, and analyze a static model in which the agent has hidden information and also takes hidden actions. We set up the Lagrangian that yields the optimal contract, showing how the constraints of the optimization problem and the first order conditions are affected by the addition of hidden information. These changes carry through to the identification analysis of the hybrid model, where we compare the identified sets of the two models.
Our empirical application is described in Section 4. We explain how the data were compiled and summarize their main features. The panel of roughly 27,000 observations covering the period 1993 to 2005 contains data on compensation to about 4,700 chief executive officers of publicly traded companies, compiled from Standard and Poor's ExecuComp database; financial and accounting information on the 2,600 publicly traded firms they manage, taken from the Center for Research in Security Prices (CRSP); and background characteristics on the sector and size of the firms. We then discuss how our model was modified to accommodate heterogeneity in firms, measurement error in compensation, and consumption smoothing by executives, which gives the model dynamics. Our test results and estimates are reported in Section 5.
This work draws from, and is related to, several literatures. Our least cost approach to optimal contracting in moral hazard models with hidden information, or hybrid models, extends the two step procedure for solving pure moral hazard models pioneered by Grossman and Hart (1980). The closest papers to our work on nonparametric identification are the independent analyses of Perrigne and Vuong (2011), who also exploit predictions from principal agent theory to analyze nonparametric identification in models of moral hazard and adverse selection, and Huang, Perrigne and Vuong (2007), who identify and estimate a nonlinear pricing model of advertising (footnote 1). The identified sets of parameters in our models are defined by nonlinear inequalities that involve moments of the population from which the data are drawn. Other applications of moment inequalities in the parametric estimation of structural models include Andrews Levine (2007) and Pakes, Porter, Ho, and Ishii (2006). To test the specification we appeal to results in Chernozhukov, Hong and Tamer (2007) and Romano and Shaikh (2006). Murphy (1999), Prendergast (1999) and Chiappori and Salanie (2000) survey the large body of empirical work that investigates how well managerial compensation can be rationalized by moral hazard. A much smaller number of recently published papers seek to quantify the economic significance of incentives in the labor market. They can be divided into those that exploit data on the institutional structure of incentives and compensation to investigate how people respond to incentives, and those that estimate the relationship between firm returns and compensation to investigate the role of asymmetric information in contracting problems. The first group includes Ferrall and Shearer (1999), Shearer (2004), Dubois and Vukina (2009), Duflo, Hanna and Ryan (2011) and Todd and Wolpin (2010); the second includes Haubrich (1994), Margiotta and Miller (2000), and Miller (2009a, 2009b).
We contribute to the latter body of work by showing how chief executives use accounting records to keep shareholders abreast of the state of their firm, relative to their competitors, for the purpose of setting appropriate incentives.
Finally, our identification, estimation, and empirical results add to the literature on estimating risk preferences. Previous studies have incorporated observed or unobserved heterogeneity, using data from laboratory experiments (e.g. Laury (2002, 2005), Harrison, Johnson, McInnes, and Rutstrom (2005), and Harrison, List, and Towe (2007)), field experiments (e.g. Harrison, List, and Towe (2007), Anderson, Harrison, Lau, and Rutstrom (2008), and Dohmen, Falk, Huffman, and Sunde (2010)) and individual behavior in actual markets (e.g. Chetty (2006) and Cohen and Einav (2007)). Our results show that whether heterogeneity in risk preferences is necessary to rationalize observed behavior may depend on the information structure of the model and on the degree of heterogeneity in the other parts of the model specification.

Pure Moral Hazard
To explain our approach to identification, estimation and testing, we first analyze a simple principal agent model. We set up a static model of pure moral hazard and derive the principal's cost minimizing contract for each of the agent's effort levels, working diligently or shirking. Comparing the expected net revenue of the two contracts yields the profit maximizing contract. Then we derive the set of structural parameters that are observationally equivalent given data on compensation and profits from identical profit maximizing principals. This leads to a procedure for testing whether or not the data generating process comes from the class of pure moral hazard models we analyze, and for constructing a confidence region for the identified set of parameters. Last, we show how exclusion restrictions can be imposed to further restrict the identified set of parameters in the presence of observed heterogeneity that affects some but not all of the parameters.

A Static Model
At the beginning of the period, a risk neutral principal proposes a compensation plan to a risk averse agent, which depends on the future realization of the gross revenue to the principal. The plan may be an explicit contract or an implicit agreement. The agent decides whether to accept or reject the principal's (implicit) offer. If he rejects the offer, he receives a fixed utility from an outside option. If he accepts the offer, the agent chooses between pursuing the principal's objective of value maximization, called working diligently, and accepting employment with the principal but following the objectives he would pursue if he were paid a fixed wage, called shirking. The principal observes the decision to accept or reject the offer, but not the agent's work routine. After revenue is realized at the end of the period, the agent receives compensation according to the explicit contract or implicit agreement, and the remainder is profit to the principal. We introduce the notation, write down the model, and then solve for the cost minimizing contracts that elicit diligence and shirking.

Notation
We denote the employment decision of the agent by an indicator $l_0 \in \{0,1\}$, where $l_0 = 1$ means the agent rejects the principal's offer. We denote the effort level choices by $l_j \in \{0,1\}$ for $j \in \{1,2\}$, where work is defined by setting $l_2 = 1$, and shirking by setting $l_1 = 1$. Since taking the outside option, working and shirking are mutually exclusive activities, $l_0 + l_1 + l_2 = 1$. Gross revenue to the principal is denoted by $x$, a random variable drawn from a probability distribution that is determined by the agent's work routine. After $x$ is revealed to both the principal and the agent at the end of the period, the agent receives compensation according to the contract or implicit agreement. To reflect its potential dependence on (or measurability with respect to) $x$, we denote compensation by $w(x)$. The principal's profit is revenue less compensation, $x - w(x)$.
Denote by $f(x)$ the probability density function for revenue conditional on the agent working, and let $f(x)g(x)$ denote the probability density function for revenue when the agent shirks. Throughout, $E[\cdot]$ denotes expectation with respect to $f(x)$. We assume:

$$E[x\,g(x)] < E[x].$$

The inequality reflects the principal's preference for diligent work over shirking. Since $f(x)$ and $f(x)g(x)$ are densities, $g(x)$, the ratio of the two densities, is a likelihood ratio. That is, $g(x)$ is nonnegative for all $x$ and:

$$E[g(x)] = \int g(x) f(x)\,dx = 1.$$

We assume there is an upper range of revenue that might be achieved from working, but is extremely unlikely to occur if the agent shirks. Formally:

$$\lim_{x \to \infty} g(x) = 0. \tag{1}$$

Intuitively, this assumption states that a truly extraordinary revenue outcome can only be attained if the agent works. We assume that $g(x)$ is bounded, an assumption that rules out the possibility, first noted by Mirrlees (1975), of setting a contract arbitrarily close to the first best resource allocation by severely punishing the agent when $g(x)$ takes an extremely high value. We assume the agent is an expected utility maximizer and that utility is exponential in compensation, taking the form:

$$-l_0 - \left(\alpha_1 l_1 + \alpha_2 l_2\right)\exp(-\rho w), \tag{2}$$

where without further loss of generality we normalize the utility of the outside option to negative one. Thus $\rho$ is the coefficient of absolute risk aversion, and $\alpha_j$ is a utility parameter with consumption equivalent $\rho^{-1}\log(\alpha_j)$ that measures the distaste from working at level $j \in \{1,2\}$. We assume $\alpha_2 > \alpha_1$, meaning that shirking gives the agent more utility than being diligent. A conflict of interest thus arises between the principal and the agent: the agent prefers shirking, because $\alpha_1 < \alpha_2$, yet the principal prefers the agent to work, because $E[x] > E[x\,g(x)]$.
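As a minimal illustration, the assumptions above can be checked on a discrete revenue grid, where the integrals become weighted sums. The helper name `check_primitives` and the discrete setting are assumptions made for illustration only. In particular, a finite grid cannot express the limit in (1), so the sketch instead requires $g$ to vanish at the highest revenue point.

```python
import numpy as np

def check_primitives(x, f, g, alpha1, alpha2, rho):
    """Check the pure moral hazard assumptions on a discrete revenue grid.

    Hypothetical helper: x is a sorted revenue grid, f holds the probability of
    each grid point under diligence, and f * g the probabilities under shirking.
    """
    x, f, g = (np.asarray(a, dtype=float) for a in (x, f, g))
    return {
        "f is a distribution": bool(np.all(f >= 0) and np.isclose(f.sum(), 1.0)),
        "g is nonnegative": bool(np.all(g >= 0)),
        "E[g] = 1": bool(np.isclose(np.sum(f * g), 1.0)),
        # principal prefers work: E[x g(x)] < E[x]
        "E[x g] < E[x]": bool(np.sum(f * g * x) < np.sum(f * x)),
        # discrete stand-in for the limit condition (1)
        "g vanishes at the top": bool(np.isclose(g[-1], 0.0)),
        # agent prefers shirking: alpha2 > alpha1 > 0
        "alpha2 > alpha1 > 0": alpha2 > alpha1 > 0,
        "rho > 0": rho > 0,
    }

checks = check_primitives(x=[0, 1, 2, 3], f=[0.25] * 4, g=[2.0, 1.2, 0.8, 0.0],
                          alpha1=0.8, alpha2=1.0, rho=0.5)
print(all(checks.values()))
```

The example values are hypothetical; they are chosen so that every check passes.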

Optimal contracting
To induce the agent to accept the principal's oer and engage in his preferred activity, shirking, it suces to propose a contract that gives the agent an expected utility of at least minus one. In this case we require w (x) to satisfy the inequality: To elicit work from the agent, the principal must oer a contract that gives the agent a higher expected utility than the outside option provides, and a higher expected utility than shirking provides. In this case we require: and: To attain expected revenue of E [x] at minimal expected cost, the principal choose a schedule w (x) to minimize expected compensation, denoted by E [w (x)] , subject to Inequalities (4) and (5). Alternatively, the more limited expected revenue target of E [xg (x)] can be reached by minimizing E [w (x)] subject to Inequality (3) . In the proof of Lemma 2.1 we show both problems have a Kuhn Tucker formulation that yields the following characterization of the solution to the two cost minimizing contracts.
Lemma 2.1 The minimal cost of employing an agent to shirk is $\rho^{-1}\log(\alpha_1)$. To minimize the cost of inducing the agent to accept employment and work diligently, the principal offers the contract:

$$w^o(x) = \rho^{-1}\log\left[\alpha_2 + \eta\left(\alpha_2 - \alpha_1 g(x)\right)\right], \tag{6}$$

where $\eta$ is the unique positive solution to the equation:

$$E\left[\frac{\alpha_2}{\alpha_2 + \eta\left(\alpha_2 - \alpha_1 g(x)\right)}\right] = 1. \tag{7}$$

In the proof we show that the participation constraint is met with equality in both cases, pinning down the certainty equivalent wage. There is no point exposing the manager to uncertainty in a shirking contract by tying compensation to revenue. Hence an agent paid to shirk is offered a fixed wage, $\rho^{-1}\log\alpha_1$, that just offsets his nonpecuniary benefits. The certainty equivalent of the cost minimizing contract that induces diligent work is $\rho^{-1}\log\alpha_2$, which exceeds the optimal shirking wage because $\alpha_2 > \alpha_1$: it compensates the agent for the lower nonpecuniary benefits of working. Moreover, the agent is paid a positive risk premium of $E[w^o(x)] - \rho^{-1}\log\alpha_2$ (see footnote 2). In this model of pure moral hazard, these two factors, that diligence is less enjoyable than shirking and that more certainty in compensation is preferable, explain why compensating an agent to align his interests with the principal's is more expensive than merely paying him enough to accept employment.
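A minimal numerical sketch of Lemma 2.1 follows, assuming a discrete revenue distribution so that expectations are weighted sums; the function name and example values are hypothetical. Subtracting one from both sides of (7) and dividing by $\eta$ gives the equivalent condition $E[(\alpha_1 g(x) - \alpha_2)/(\alpha_2 + \eta(\alpha_2 - \alpha_1 g(x)))] = 0$, whose left side increases in $\eta$ and is negative near zero, so bisection finds the positive root. A positive root requires $\alpha_1 \max_x g(x) > \alpha_2$, and the upper bracket keeps every argument of the logarithm in (6) positive.

```python
import numpy as np

def optimal_work_contract(f, g, alpha1, alpha2, rho, tol=1e-12):
    """Cost minimizing contract (6) that elicits diligence on a discrete grid.

    Returns eta, the positive root of (7), and the wage w^o at each grid point.
    """
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    slack = alpha2 - alpha1 * g                     # alpha2 - alpha1 g(x)
    if alpha1 * g.max() <= alpha2:
        raise ValueError("no positive root of (7): need alpha1 * max g > alpha2")
    eta_max = alpha2 / (alpha1 * g.max() - alpha2)  # keeps alpha2 + eta*slack > 0

    def scaled_condition(eta):
        # (7) minus one, divided by eta: increasing in eta, negative near zero
        return np.sum(f * (-slack) / (alpha2 + eta * slack))

    lo, hi = 0.0, eta_max
    while hi - lo > tol * eta_max:
        mid = 0.5 * (lo + hi)
        if scaled_condition(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    eta = 0.5 * (lo + hi)
    wage = np.log(alpha2 + eta * slack) / rho        # equation (6)
    return eta, wage

f, g = np.full(4, 0.25), np.array([2.0, 1.2, 0.8, 0.0])
eta, w = optimal_work_contract(f, g, alpha1=0.8, alpha2=1.0, rho=0.5)
print(eta, w)
```

With these inputs both the participation constraint (4) and the incentive compatibility constraint (5) hold with equality at the returned wages, and the expected wage exceeds $\rho^{-1}\log\alpha_2$, the positive risk premium described above.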
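Profit maximization by the principal determines which cost minimizing contract the principal should offer the agent. The expected profits from inducing the agent to work diligently are $E[x - w^o(x)]$, while the expected profits from employing the agent to shirk are $E[x\,g(x)] - \rho^{-1}\log(\alpha_1)$. Thus work is preferred by the principal if and only if:

$$E\left[x - w^o(x)\right] \ge \max\left\{0,\; E[x\,g(x)] - \rho^{-1}\log(\alpha_1)\right\}, \tag{8}$$

while a shirking contract is offered if and only if:

$$E[x\,g(x)] - \rho^{-1}\log(\alpha_1) > \max\left\{0,\; E\left[x - w^o(x)\right]\right\}. \tag{9}$$

Otherwise no contract is offered.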
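The choice between (8) and (9) reduces to comparing two expected profits with zero and with each other. A minimal sketch, assuming the expectations have already been computed; the function name and example numbers are hypothetical:

```python
import math

def choose_contract(expected_x, expected_xg, expected_w_work, alpha1, rho):
    """Return 'work', 'shirk' or 'none' according to conditions (8) and (9)."""
    profit_work = expected_x - expected_w_work            # E[x - w^o(x)]
    profit_shirk = expected_xg - math.log(alpha1) / rho   # E[x g(x)] - log(alpha1)/rho
    if profit_work >= max(0.0, profit_shirk):
        return "work"
    if profit_shirk > max(0.0, profit_work):
        return "shirk"
    return "none"

print(choose_contract(1.5, 0.7, 0.12, alpha1=0.8, rho=0.5))  # work
```

Note that when $\alpha_1 < 1$ the shirking wage $\rho^{-1}\log\alpha_1$ is negative, which raises the profit from shirking contracts in this normalization.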

Identification in the Pure Moral Hazard Model
The parameters of the model are $f(x)$ and $g(x)$, which together define the probability density functions of gross profits; $(\alpha_1, \alpha_2)$, the preference parameters for shirking and diligent work (relative to the normalized utility from taking the outside option); and the risk aversion parameter $\rho$. For the purposes of this introductory example, we assume the data comprise independent draws of profits and compensation, $(x_n, w_n)$, for a sample of $N$ observations generated in equilibrium. When the principal induces shirking and (9) holds, the density $f(x)g(x)$ can be estimated from observations on profits and the wage is constant at $w_n = \rho^{-1}\log(\alpha_1)$ for all $n$, but nothing more can be gleaned from the data about the structure of the model. Our analysis therefore focuses on cases where work is induced, (8) holds, and compensation $w_n$ depends nontrivially on revenue $x_n$. Hence $f(x)$ is identified, along with $N$ points on the compensation schedule, $w_n = w^o(x_n)$. Under the assumptions of the model, $f(x)$ can be estimated with a nonparametric density estimator. From the compensation equation (6), the regularity condition (1) on $g(x)$, and the fact that $g(x)$ is nonnegative, the maximum compensation the agent can receive is:

$$\bar{w} \equiv \lim_{x \to \infty} w^o(x) = \rho^{-1}\log\left[(1+\eta)\alpha_2\right]. \tag{10}$$

Thus $\bar{w}$ is identified, and consistently estimated by the maximum compensation observed in the data. This essentially leaves $\rho$, $\alpha_1$, $\alpha_2$, and $g(x)$ to identify from $f(x)$, $w^o(x)$, and $\bar{w}$. Our analysis proceeds in three steps. First, we show that if $\rho$ is known, then $\alpha_1$, $\alpha_2$, and $g(x)$ are identified from the cost minimization problem. This means that the set of observationally equivalent parameters can be indexed by the positive real number $\rho$, the risk aversion parameter.

Footnote 2: To prove that $E[w^o(x)]$ is greater than its certainty equivalent, $\rho^{-1}\log\alpha_2$, we note that in the cost minimizing contract inducing diligence, (4) is met with equality. This implies $E[\exp(-\rho w^o(x))] = 1/\alpha_2$; since the exponential function is convex and $w^o(x)$ is nondegenerate, Jensen's inequality gives $\exp(-\rho E[w^o(x)]) < 1/\alpha_2$, that is, $E[w^o(x)] > \rho^{-1}\log\alpha_2$.
Second, we show that the firm's preference for working over shirking provides an additional inequality that helps delineate the set of observationally equivalent values of $\rho$. Third, we prove that the set of restrictions derived in the first two steps fully characterizes the identified set.

Restrictions from cost minimization
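Suppose $\rho$ is known, and define the mappings $g(x,\rho)$, $\alpha_1(\rho)$, and $\alpha_2(\rho)$ as:

$$g(x,\rho) \equiv \frac{1 - \exp\left(\rho\left[w^o(x) - \bar{w}\right]\right)}{E\left\{1 - \exp\left(\rho\left[w^o(x) - \bar{w}\right]\right)\right\}}, \tag{11}$$

$$\alpha_1(\rho) \equiv \frac{E\left\{1 - \exp\left(\rho\left[w^o(x) - \bar{w}\right]\right)\right\}}{E\left[\exp(-\rho w^o(x)) - \exp(-\rho \bar{w})\right]}, \tag{12}$$

$$\alpha_2(\rho) \equiv \frac{1}{E\left[\exp(-\rho w^o(x))\right]}. \tag{13}$$

All three mappings inherit the basic structure of the model for any positive value of $\rho$. Since $w^o(x) \le \bar{w}$, the numerator of $g(x,\rho)$ is nonnegative and its denominator is positive, so $g(x,\rho) \ge 0$, and $E[g(x,\rho)] = 1$ by construction. Furthermore, as $x \to \infty$, (10) shows that $w^o(x) \to \bar{w}$, and hence $g(x,\rho) \to 0$, as stipulated by the regularity condition. This proves that $g(x,\rho)$ can be interpreted as a likelihood ratio satisfying (1) for any $\rho > 0$.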
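Next consider $\alpha_1(\rho)$ and $\alpha_2(\rho)$. Since $\alpha_2(\rho)$ is the reciprocal of the expectation of a positive random variable, it is positive. Similarly, the numerator and denominator of the equation for $\alpha_1(\rho)$ are both positive for all $\rho > 0$, because $w^o(x) \le \bar{w}$ and compensation is nondegenerate, so $\alpha_1(\rho)$ is also positive. Rearranging the expression for the ratio of the two taste parameters we obtain:

$$\frac{\alpha_1(\rho)}{\alpha_2(\rho)} = \frac{\exp(\rho\bar{w})\,E\left[\exp(-\rho w^o(x))\right] - E\left[\exp(\rho w^o(x))\right]E\left[\exp(-\rho w^o(x))\right]}{\exp(\rho\bar{w})\,E\left[\exp(-\rho w^o(x))\right] - 1}.$$

Since the inverse function is convex, Jensen's inequality implies $E[\exp(-\rho w^o(x))] > 1/E[\exp(\rho w^o(x))]$, so the numerator is smaller than the denominator, which is positive; consequently $\alpha_1(\rho) < \alpha_2(\rho)$ for all positive $\rho$. To summarize, this discussion shows that, given a probability density $f(x)$ for $x$ and a compensation schedule $w^o(x)$ satisfying $w^o(x) \to \bar{w}$ as $x \to \infty$, both identified and estimated from observations $(x_n, w_n)$, we can construct, for any positive $\rho$, a likelihood ratio $g(x,\rho)$ and taste parameters $\alpha_1(\rho)$ and $\alpha_2(\rho)$ that serve as primitives for a principal agent model of the type studied in the previous subsection, where the principal minimizes expected costs to elicit participation and work from the agent. Theorem 2.1 is a stronger result: if the risk parameter is known, then the other primitives of the model are identified off data on compensation and returns using Equations (11), (12) and (13).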
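The mappings (11) to (13) translate directly into sample analogues once expectations are replaced by sample means and $\bar{w}$ by the largest observed wage. The sketch below is illustrative: it assumes every observation comes from the diligent-work contract and that wages are not all equal, and the function name is hypothetical. The example wages come from a hypothetical discrete model with $\rho = 0.5$, $\alpha_1 = 0.8$, $\alpha_2 = 1$ and $g = (2, 1.2, 0.8, 0)$ on four equally likely revenue levels, rounded to four decimals; evaluated at the true $\rho$, the mappings recover those primitives up to rounding error.

```python
import numpy as np

def concentrate_out(w, rho):
    """Sample analogues of (11)-(13) for a trial risk aversion rho.

    w: wages from diligent-work contracts (not all equal).
    Returns g at each observation, alpha1(rho) and alpha2(rho).
    """
    w = np.asarray(w, dtype=float)
    w_bar = w.max()                                   # estimate of w-bar in (10)
    num = 1.0 - np.exp(rho * (w - w_bar))             # >= 0 because w <= w_bar
    g = num / num.mean()                              # (11): mean of g is one
    alpha1 = num.mean() / np.mean(np.exp(-rho * w) - np.exp(-rho * w_bar))  # (12)
    alpha2 = 1.0 / np.mean(np.exp(-rho * w))          # (13): participation binds
    return g, alpha1, alpha2

w = [-0.9238, 0.0487, 0.4009, 0.9606]
g, alpha1, alpha2 = concentrate_out(w, rho=0.5)
print(np.round(g, 3), round(alpha1, 3), round(alpha2, 3))
```

Because the mappings are defined for every positive $\rho$, the same function can be evaluated on a grid of trial values; only the profit inequality discussed next distinguishes among them.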
Theorem 2.1 Suppose the data on $x_n$ and $w_n$ are generated by a parameterization of a pure moral hazard model with risk aversion $\rho^0$. Then:

$$g(x) = g(x,\rho^0), \qquad \alpha_1 = \alpha_1(\rho^0), \qquad \alpha_2 = \alpha_2(\rho^0).$$

The basic ideas for the proof of this theorem are straightforward. Making $g(x)$ the subject of the compensation equation (6) and differentiating with respect to $x$ yields:

$$\frac{\partial g(x)}{\partial x} = -\frac{\rho^0}{\eta\,\alpha_1}\exp\left(\rho^0 w^o(x)\right)\frac{\partial w^o(x)}{\partial x}. \tag{14}$$

From this equation it is evident that the slope is defined up to one normalization; a second normalization determines the level of $g(x)$. In our setup the regularity condition (1) provides one normalization, and the fact that $E[g(x)] = 1$ provides another. The proof of Lemma 2.1 shows that the participation constraint (4) is met with equality, and that gives the formula for $\alpha_2(\rho^0)$. The incentive compatibility constraint is also met with equality, so:

$$\alpha_1 E\left[g(x)\exp\left(-\rho^0 w^o(x)\right)\right] = \alpha_2 E\left[\exp\left(-\rho^0 w^o(x)\right)\right] = 1. \tag{15}$$

Substituting for $g(x)$ from (11) on the left side and rearranging to make $\alpha_1$ the subject of the equation yields (12) evaluated at $\rho^0$.
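The two normalizations can be made explicit. The derivation below assumes the form of the contract in (6) and uses only (1) and $E[g(x)] = 1$ to show how (11) follows at $\rho = \rho^0$:

```latex
\begin{aligned}
\exp\left(\rho^0 w^o(x)\right) &= (1+\eta)\alpha_2 - \eta\alpha_1 g(x)
  && \text{rearranging (6)} \\
\exp\left(\rho^0 \bar{w}\right) &= (1+\eta)\alpha_2
  && \text{letting } x \to \infty \text{ and using (1)} \\
\eta\alpha_1 g(x) &= \exp\left(\rho^0 \bar{w}\right) - \exp\left(\rho^0 w^o(x)\right)
  && \text{subtracting the first line from the second} \\
\eta\alpha_1 &= \exp\left(\rho^0 \bar{w}\right) - E\left[\exp\left(\rho^0 w^o(x)\right)\right]
  && \text{taking expectations, since } E[g(x)] = 1 \\
g(x) &= \frac{1 - \exp\left(\rho^0\left[w^o(x) - \bar{w}\right]\right)}
            {E\left\{1 - \exp\left(\rho^0\left[w^o(x) - \bar{w}\right]\right)\right\}}
  && \text{dividing, then multiplying through by } \exp(-\rho^0 \bar{w})
\end{aligned}
```

The limit condition pins down the level of $\exp(\rho^0 w^o(x))$ at the top of the schedule, and the unit mean of the likelihood ratio pins down the scale $\eta\alpha_1$.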

Restrictions from the firm's choice of contract
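The restrictions from cost minimization tie down all the parameters up to preferences for risk, but place no restrictions at all on $\rho$. Imposing profit maximization, as opposed to cost minimization only, does limit the set of admissible $\rho$. Since expected profits from paying the agent $w^o(x)$ are at least as high as those from paying him $\rho^{-1}\log(\alpha_1)$ to shirk, it follows from (8) that:

$$E\left[\left(1 - g(x)\right)x\right] \ge E\left[w^o(x)\right] - \rho^{-1}\log(\alpha_1). \tag{16}$$

Substituting for $g(x)$ and $\alpha_1$ from (11) and (12) in (16), define:

$$Q_0(\rho) \equiv E\left[\left(1 - g(x,\rho)\right)x\right] - E\left[w^o(x)\right] + \rho^{-1}\log\alpha_1(\rho). \tag{17}$$

From Theorem 2.1, $Q_0(\rho^0) \ge 0$. This inequality restricts the set of $\rho$ that are admissible for the data generating process (footnote 3).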
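A minimal sketch of (17) and (18) follows. It assumes a discrete distribution with probability weights $p$ (uniform weights reproduce sample moments) and approximates $\Psi$ on a finite grid of trial values. The function names, the grid and the example data are illustrative assumptions: the wages are generated, and rounded to four decimals, from a hypothetical model with $\rho^0 = 0.5$, $\alpha_1 = 0.8$, $\alpha_2 = 1$ and $g = (2, 1.2, 0.8, 0)$ on four equally likely revenue levels $0, 1, 2, 3$. Because that model induces work, the profit inequality holds at the true value, and $\rho^0$ appears in the grid approximation of $\Psi$.

```python
import numpy as np

def q0(x, w, rho, p=None):
    """Q_0(rho) in (17); p are probability weights (None means sample moments)."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    p = np.full(x.size, 1.0 / x.size) if p is None else np.asarray(p, dtype=float)

    def expect(v):
        return np.sum(p * v)

    w_bar = w[p > 0].max()                            # top of the schedule, (10)
    num = 1.0 - np.exp(rho * (w - w_bar))
    g = num / expect(num)                             # likelihood ratio, (11)
    alpha1 = expect(num) / expect(np.exp(-rho * w) - np.exp(-rho * w_bar))  # (12)
    return expect((1.0 - g) * x) - expect(w) + np.log(alpha1) / rho

def identified_set(x, w, rhos, p=None):
    """Grid approximation to Psi in (18): trial values of rho with Q_0(rho) >= 0."""
    return [rho for rho in rhos if q0(x, w, rho, p) >= 0.0]

x = [0.0, 1.0, 2.0, 3.0]
w = [-0.9238, 0.0487, 0.4009, 0.9606]
print(q0(x, w, 0.5))
print(identified_set(x, w, rhos=[0.25, 0.5, 1.0, 2.0]))
```

A finer grid gives a finer approximation to $\Psi$; the grid search itself makes no assumption about whether $\Psi$ is an interval.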

Sharp and tight bounds
Our analysis so far only exploits the first order conditions, the equalities for the participation and incentive compatibility constraints established in the proof of Lemma 2.1, and an inequality implied by the optimization problem. The second order conditions of the cost minimization problem are satisfied for all $\rho > 0$. Are there any other restrictions? The short answer is no. We now establish that, given the underlying data generating process, every positive $\rho$ satisfying $Q_0(\rho) \ge 0$ in (17) is admissible. Thus $\Psi$, a Borel set of risk aversion parameters defined as:

$$\Psi \equiv \left\{\rho > 0 : Q_0(\rho) \ge 0\right\}, \tag{18}$$

indexes all parameterizations that are observationally equivalent to the true model. Theorem 2.2 provides sharp and tight bounds for the set of identified parameters of the pure moral hazard model.

Theorem 2.2
Consider any data generating process $x_n$ and compensation schedule $w(x)$ satisfying $w_n = w(x_n)$ for all $n$. Define $\Psi$ from the $(x_n, w_n)$ process using (18). If $\Psi$ is not empty, then $(x_n, w_n)$ is observationally equivalent to every data process generated by the pure moral hazard model parameterized by each $\rho \in \Psi$. If $\Psi$ is empty, then $(x_n, w_n)$ is not generated by such a pure moral hazard model.
Does there always exist, for each joint probability distribution of $(x, w)$, a positive $\rho$ that rationalizes this model of pure moral hazard? No: the next lemma shows that the model has empirical content and would be rejected by some data generating processes. It proves that the profit inequality embodies overidentifying restrictions, which imply that some joint distributions of profit and compensation are incompatible with any positive risk aversion parameter.
Intuitively, our results on identification so far can be summarized as a four step process. Suppose that the probability density of gross profits conditional on working and the compensation schedule as a function of gross profits are both known. First, up to a normalization reflecting the outside option, the certainty equivalent of the contract, and hence the taste (or ability) parameter for working, is recovered through the participation constraint from the overall level of compensation, after adjusting for its variation using a given risk aversion parameter. Second, given the regularity condition and a risk aversion parameter, the variation in compensation as a function of profits yields, through the first order condition, the likelihood ratio of the densities for shirking versus working. Third, since cost minimization also implies that the incentive compatibility constraint is met with equality, and the distribution of output conditional on shirking is recovered in the previous step, the preference for shirking is recovered as another level effect from that constraint, again for a known risk aversion parameter that adjusts for the variation in compensation arising when the gross profit distribution is conditioned on shirking. Fourth, only profit maximization is left to partially identify the risk aversion parameter, through a single inequality reflecting the principal's preference for the working contract over a fixed wage shirking contract. Expected net profits from the working contract are calculated directly from the distribution of gross profits and expected compensation, while those from the shirking contract are calculated using the second and third steps for any given risk aversion parameter.

Empirical Approach
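The identified set of risk parameters $\Psi$ has a simple empirical analogue. Suppose we have $N$ cross sectional observations $(x_n, w_n)$ on identical firms and their managers. To estimate $Q_0(\rho)$, we replace $\bar{w}$ with $\bar{w}^{(N)} \equiv \max\{w_1, \ldots, w_N\}$ in (17), and substitute sample moments for their corresponding population expectations, to obtain upon rearrangement:

$$Q_0^{(N)}(\rho) = \frac{1}{N}\sum_{n=1}^{N} x_n - \frac{\sum_{n=1}^{N}\left[1 - \exp\left(\rho\left[w_n - \bar{w}^{(N)}\right]\right)\right]x_n}{\sum_{n=1}^{N}\left[1 - \exp\left(\rho\left[w_n - \bar{w}^{(N)}\right]\right)\right]} - \frac{1}{N}\sum_{n=1}^{N} w_n + \rho^{-1}\log\frac{\sum_{n=1}^{N}\left[1 - \exp\left(\rho\left[w_n - \bar{w}^{(N)}\right]\right)\right]}{\sum_{n=1}^{N}\left[\exp(-\rho w_n) - \exp\left(-\rho\bar{w}^{(N)}\right)\right]}.$$

Our tests are based on the fact that if $\rho \in \Psi$, then sampling error is the only explanation for why $Q_0^{(N)}(\rho)$ might be negative. Clearly $Q_0^{(N)}(\rho)$ converges at the rate of its slowest converging component. For simplicity, suppose there exists some $\bar{x} < \infty$ such that $g(x) = 0$ for all $x > \bar{x}$. In words, there is a revenue threshold that shirking cannot achieve. Thus compensation is flat at $\bar{w}$ for all profit levels above $\bar{x}$, and $\bar{w}^{(N)}$ converges to $\bar{w}$ at a faster rate than $\sqrt{N}$. Since all the other components of $Q_0^{(N)}(\rho)$ are sample moments, we conclude that $\sqrt{N}\left[Q_0^{(N)}(\rho) - Q_0(\rho)\right]$ is asymptotically normal. We then construct $\Psi^{(N)}$, the set of risk aversion parameters that asymptotically covers the observationally equivalent set of $\rho > 0$ with probability $1 - \kappa$. For the critical value $c_\kappa$ associated with test size $\kappa$, this set is defined as:

$$\Psi^{(N)} \equiv \left\{\rho > 0 : \sqrt{N}\,Q_0^{(N)}(\rho) \ge -c_\kappa\right\}.$$

A consistent estimate of $c_\kappa$ can be determined numerically by following the subsampling procedures used in our empirical application and described in the appendix. Intuitively, if $\sqrt{N}Q_0^{(N)}(\rho)$ is negative and large in absolute value for all $\rho > 0$, we reject the null hypothesis that the pure moral hazard model generated the data. On the other hand, if $\sqrt{N}Q_0^{(N)}(\rho^0)$ is small in absolute value, or positive, we do not reject the null hypothesis that $\rho^0$ belongs to the identified set.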
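The sketch below illustrates one way to compute an estimated confidence region. It is a simplified pointwise variant, not the procedure described in the appendix: for each trial $\rho$ on a grid, the lower $\kappa$ quantile of $\sqrt{b}\,[Q_0^{(b)}(\rho) - Q_0^{(N)}(\rho)]$ over random subsamples of size $b$, drawn without replacement, stands in for $-c_\kappa$. The subsample size, the number of subsamples, the grid and the function names are hypothetical choices, and the data in the usage example are simulated.

```python
import numpy as np

def q0_hat(x, w, rho):
    """Sample analogue Q_0^(N)(rho), with w-bar estimated by the sample maximum."""
    w_bar = w.max()
    num = 1.0 - np.exp(rho * (w - w_bar))
    alpha1 = num.mean() / np.mean(np.exp(-rho * w) - np.exp(-rho * w_bar))
    return x.mean() - np.mean(num * x) / num.mean() - w.mean() + np.log(alpha1) / rho

def confidence_set(x, w, rhos, b, kappa=0.05, n_sub=200, seed=0):
    """Keep each rho unless sqrt(N) Q_0^(N)(rho) falls below a subsampled cutoff."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    n = x.size
    kept = []
    for rho in rhos:
        q_full = q0_hat(x, w, rho)
        draws = np.empty(n_sub)
        for k in range(n_sub):
            idx = rng.choice(n, size=b, replace=False)   # subsample without replacement
            draws[k] = np.sqrt(b) * (q0_hat(x[idx], w[idx], rho) - q_full)
        lower = np.quantile(draws, kappa)                # plays the role of -c_kappa
        if np.sqrt(n) * q_full >= lower:
            kept.append(rho)
    return kept

# Simulated illustration: wages rise with revenue and are flat at the top.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=500)
w = 0.5 * np.minimum(x, 2.0)
print(confidence_set(x, w, rhos=[0.5, 1.0, 2.0], b=60, n_sub=100))
```

Capping simulated wages at a fixed level mimics the flat top of the schedule above $\bar{x}$, so the sample maximum estimates $\bar{w}$ well even in subsamples.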

Multiple States and Exclusion Restrictions
When there is heterogeneity in the probability distribution of revenue, additional restrictions on the set of observationally equivalent parameters can be imposed if preferences are invariant across states. Suppose there are two states denoted by $s \in \{1,2\}$. We denote the probability density function of revenue from working diligently in state $s$ by $f_s(x)$, and similarly express the corresponding likelihood ratio in state $s$ as $g_s(x)$. We assume $f_1(x) \ne f_2(x)$ and $g_1(x) \ne g_2(x)$. The optimal contract, which is state dependent but solved the same way as in the one state model, is denoted by $w^o_s(x)$. We also write $\bar{w}_s$ for the limiting wage as $x \to \infty$ in state $s$. If the heterogeneity is observed, the data record the state $s_n \in \{1,2\}$, revenue $x_n \in \mathbb{R}$ and compensation $w_n \in \mathbb{R}$ for each observation $n \in \{1, \ldots, N\}$.
Suppose the risk aversion parameter of the agent does not vary across states, for example because the same type of agent works in both states. The solution to the cost minimization problem of inducing diligence, $w^o_s(x)$, is derived the same way as (6). For each state $s \in \{1,2\}$ we define $Q_s(\rho)$ analogously to $Q_0(\rho)$, by substituting $w^o_s(x)$ and $\bar{w}_s$ for $w^o(x)$ and $\bar{w}$ respectively, and conditioning the expectations operator on the state, substituting $E_s[\cdot]$ for $E[\cdot]$ in (17) (footnote 4). Following the same reasoning as the derivation of (17), $Q_s(\rho^0) \ge 0$ for $s \in \{1,2\}$. More generally, increasing the number of states while maintaining the hypothesis that the risk aversion parameter is invariant across states increases the number of inequalities from profit maximization by the same number. Now suppose that, in addition, the nonpecuniary benefits from diligently working, $\alpha_2$, do not vary by state. Although there might only be one participation constraint ensuring that the unconditional expected utility of the agent is at least as high as what the outside alternative offers, it is straightforward to show that the participation condition (4) holds with equality for each state $s \in \{1,2\}$ in the optimal contract, implying $\alpha_2 E_s\left[\exp\left(-\rho^0 w^o_s(x)\right)\right] = 1$. Defining:

$$\Delta_2(\rho) \equiv E_1\left[\exp\left(-\rho w^o_1(x)\right)\right] - E_2\left[\exp\left(-\rho w^o_2(x)\right)\right],$$

it follows that $\Delta_2(\rho^0) = 0$ (footnote 5). Intuitively, a person's risk preferences cannot be identified from playing a single lottery if there are unobserved components to the reward from entering the lottery. When offered the chance to play two lotteries with different risk characteristics but the same unobserved nonpecuniary components, his risk preferences are partially revealed by the pecuniary compensating differential between them, which equalizes his expected utility from playing one versus the other. Another potential restriction is that the nonpecuniary benefits from shirking, $\alpha_1$, do not vary by state.
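The lottery intuition can be made concrete. Suppose, hypothetically, that the state 1 contract pays 1 for sure while the state 2 contract pays 0 or 2.5 with equal probability. Because $\Delta_2(\rho) = 0$ trivially at $\rho = 0$, the sketch below searches for a root of $\Delta_2$ on a bracket bounded away from zero; the root is the risk aversion at which the certain payment and the lottery deliver the same expected utility. The function names and numbers are illustrative.

```python
import numpy as np

def delta2(w1, w2, rho):
    """Delta_2(rho): gap between the two states' participation moments."""
    return (np.mean(np.exp(-rho * np.asarray(w1, dtype=float)))
            - np.mean(np.exp(-rho * np.asarray(w2, dtype=float))))

def solve_delta2(w1, w2, lo, hi, tol=1e-10):
    """Bisection for a root of Delta_2 on [lo, hi]; assumes a sign change there."""
    f_lo = delta2(w1, w2, lo)
    if f_lo * delta2(w1, w2, hi) > 0:
        raise ValueError("Delta_2 does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = delta2(w1, w2, mid)
        if f_lo * f_mid <= 0:
            hi = mid                  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid     # root lies in [mid, hi]
    return 0.5 * (lo + hi)

rho_star = solve_delta2(w1=[1.0, 1.0], w2=[0.0, 2.5], lo=0.05, hi=5.0)
print(rho_star)
```

A less risk averse agent would demand a smaller premium for the lottery, so only one value of $\rho$ equates the two expected utilities in this example.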
Since the incentive compatibility constraint (5) also holds with equality in each state, Theorem 2.1 implies:

$$\alpha_1 = \alpha_{1s}(\rho^0) \equiv \frac{E_s\left\{1 - \exp\left(\rho^0\left[w^o_s(x) - \bar{w}_s\right]\right)\right\}}{E_s\left[\exp\left(-\rho^0 w^o_s(x)\right) - \exp\left(-\rho^0 \bar{w}_s\right)\right]} \quad \text{for } s \in \{1,2\}. \tag{20}$$

In this case the restriction is based on two hypothetical lotteries, compensation from shirking in the different states. To incorporate these restrictions into the testing and estimation framework we define:

$$\Delta_1(\rho) \equiv \alpha_{11}(\rho) - \alpha_{12}(\rho). \tag{21}$$

From (20) and (21), Theorem 2.1 implies $\Delta_1(\rho^0) = 0$ if $\alpha_1$ does not vary across states. A joint test of these restrictions can be based on the criterion function:

$$\mathcal{Q}(\rho) \equiv \sum_{s=1}^{2}\left[\min\left\{Q_s(\rho), 0\right\}\right]^2 + \Delta_1(\rho)^2 + \Delta_2(\rho)^2,$$

which attains a minimum of zero at all risk aversion parameter values that are observationally equivalent to $\rho^0$. To summarize, we provide an intuitive explanation of how the extension of the static pure moral hazard model to two states affects identification. Consider as a baseline a framework with maximal heterogeneity, where the taste parameters for working and shirking, as well as the risk parameter, vary by state. With maximal heterogeneity we obtain just two inequalities from the profit maximization condition, and the two risk parameter sets are determined separately, state by state. If a single risk parameter satisfies both profit inequalities, then it must belong to the intersection of the individually determined sets. In this way we can derive the set of risk parameters that are common across states, without imposing homogeneity on the other preference parameters. To impose homogeneity on the taste parameter for work as well, we would extract from this intersection those risk parameters that induce the same taste parameter for work in both states. Alternatively, imagine permitting heterogeneity in the risk parameter across states, but imposing instead homogeneity on the taste parameter for work. We would then seek to equalize the taste parameter for work across states using two different risk parameters that individually satisfy the profit inequalities for their respective states.
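A minimal sketch of a joint criterion of this kind follows. It assumes, hypothetically, that each state's data arrive as separate arrays of revenues and diligent-work wages, and it replaces expectations by within-state sample means. The sketch adds the squared violations of the two profit inequalities to the squared gaps $\Delta_1$ and $\Delta_2$. The function names and the example data are illustrative only.

```python
import numpy as np

def state_moments(x, w, rho):
    """Within-state sample analogues: Q_s(rho), alpha_1s(rho), E_s[exp(-rho w)]."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    w_bar = w.max()
    num = 1.0 - np.exp(rho * (w - w_bar))
    alpha1 = num.mean() / np.mean(np.exp(-rho * w) - np.exp(-rho * w_bar))
    q = x.mean() - np.mean(num * x) / num.mean() - w.mean() + np.log(alpha1) / rho
    return q, alpha1, np.mean(np.exp(-rho * w))

def joint_criterion(x1, w1, x2, w2, rho):
    """Zero when both profit inequalities hold and alpha_1, alpha_2 match across states."""
    q1, a11, m1 = state_moments(x1, w1, rho)
    q2, a12, m2 = state_moments(x2, w2, rho)
    violations = min(q1, 0.0) ** 2 + min(q2, 0.0) ** 2   # profit inequalities
    delta1, delta2 = a11 - a12, m1 - m2                   # Delta_1 and Delta_2
    return violations + delta1 ** 2 + delta2 ** 2

x1, w1 = [0.0, 1.0, 2.0, 3.0], [-0.9238, 0.0487, 0.4009, 0.9606]
x2, w2 = [0.0, 1.5, 3.0], [-0.5, 0.4, 1.1]
grid = [0.25, 0.5, 1.0, 2.0]
print([joint_criterion(x1, w1, x2, w2, r) for r in grid])
```

Minimizing this criterion over a grid, rather than intersecting sets state by state, imposes all the homogeneity restrictions at once; dropping the $\Delta$ terms recovers the maximal heterogeneity case described above.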

Hybrid Moral Hazard
When multiple states in the principal agent model arise from the resolution of uncertainty about the revenue generation process, it is reasonable to entertain the possibility that the agent might be more informed than the principal, especially in applications of the model to shareholders and managers. In hybrid models of moral hazard the agent is subject to moral hazard, but also has more information than the principal about the probability distribution from which revenues are drawn. This section presents a hybrid model that differs from the two state pure moral hazard model discussed in the previous section only because the agent is more informed. We explain the differences between the hybrid model and the pure model of moral hazard, and then show how these differences translate to identification.

Modifying the Pure Moral Hazard Model
In contrast to its pure moral hazard counterpart, in the two state hybrid model the agent has full information about the state but the principal does not. We model this below, imposing throughout the restrictions made in Section 2.4 that none of the preference parameters depend on the state. Then we explain how the information asymmetry is captured by inequality constraints that restrict the range of feasible contracts (implicit agreements). This leads to a statement of the optimization problem the principal solves, and its associated first order conditions. The identification analysis is based on the feasibility constraints of the contract and the first order conditions. Throughout we emphasize how and why the pure moral hazard analysis must be modified to account for the information asymmetry.

Hidden information about the firm
As before, at the beginning of the period, the principal proposes a contract. In the hybrid model, however, the agent's compensation is determined by what he discloses about the probability distribution of gross revenue, denoted by $r \in \{1, 2\}$, and by the firm's subsequent performance, $x$, revealed to both parties at the end of the period. We denote this mapping by $w_r(x)$. If the agent accepts employment with the principal over his outside option, the probability distribution of revenue is then fully revealed to the agent but remains partially hidden from the principal. There are two states, $s \in \{1, 2\}$, which are independently and identically distributed across periods, state $s$ occurring with probability $\pi_s \in (0, 1)$. As in the multistate pure moral hazard model, if the state is $s$, revenue is drawn from the probability density function $f_s(x)$ if the agent is diligent and from $g_s(x) f_s(x)$ if the agent shirks. We assume that the agent privately observes the true state $s$, reports the state $r \in \{1, 2\}$ to the principal, and makes his effort choice. If the agent reports the second state, meaning $r = 2$, then the principal can independently confirm or refute it. (For example, imagine that principals can review geological surveys of new oil fields, but that agents exercise some discretion about when to disclose them.) This prevents the agent from lying when the first state occurs, and models the idea that legal considerations induce agents not to overstate revenue prospects, but that incentives must be provided to dissuade agents from understating them.[6] If $s = 2$, the agent then either truthfully declares or lies about the firm's prospects by announcing $r \in \{1, 2\}$, effectively selecting one of the two schedules $w_1(x)$ or $w_2(x)$; if $s = 1$, he reports $r = 1$. By way of contrast, in the analogous pure moral hazard model the true state $s \in \{1, 2\}$ is observed by the principal, thereby eliminating any role for reporting. This is the only difference between the pure and hybrid frameworks.[7]

If $h(x) \equiv \pi_2 f_2(x) / [\pi_1 f_1(x)]$, the weighted likelihood ratio of the second state occurring relative to the first given an observed value of excess returns $x \in \mathbb{R}$, is unbounded at some value $x^*$ (because $f_2(x^*) > 0$ and $f_1(x^*) = 0$), then truth telling about the second state can be enforced without cost.[8] To rule out this possibility, we assume throughout that $h(x)$ is bounded. The assumption implies the principal cannot be sure that the second state occurred simply by observing profits. To capture the idea that the second state is weakly more desirable than the first, we also assume that the second state is most likely to have occurred, relative to the first state, in the limit $x \to \infty$.[9] Summarizing our assumptions about $h(x)$ mathematically:

$$\bar{h} \equiv \lim_{x \to \infty} h(x) = \sup_{x \in \mathbb{R}} h(x) < \infty.$$
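A concrete pair of densities helps show what the boundedness assumption on $h(x)$ allows. The sketch below is illustrative only: it takes $f_1$ to be a standard normal and $f_2$ the skew-normal density $2\phi(x)\Phi(x)$, with $\pi_1 = \pi_2 = 1/2$; these densities and the function names are hypothetical choices, not estimates from the paper.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f1(x):
    """Hypothetical state-1 density of excess returns: standard normal."""
    return normal_pdf(x)


def f2(x):
    """Hypothetical state-2 density: skew-normal 2*phi(x)*Phi(x), tilted toward high returns."""
    return 2.0 * normal_pdf(x) * normal_cdf(x)


def h(x, pi1=0.5, pi2=0.5):
    """Weighted likelihood ratio pi2*f2(x) / (pi1*f1(x))."""
    return (pi2 * f2(x)) / (pi1 * f1(x))


for x in (-3.0, 0.0, 3.0, 10.0):
    print(x, round(h(x), 4))
```

In this example $h(x) = 2\Phi(x)$, which is increasing and bounded, with $\bar{h} = 2$ attained in the limit $x \to \infty$, so both assumptions hold. By contrast, two normal densities with equal variances and different means give a ratio that grows exponentially in $x$, which would violate boundedness.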

Truth telling and sincerity constraints
Contracts between the principal and the agent that induce honest reporting in the second state and working in both states must satisfy a participation constraint plus two incentive compatibility constraints (one for each state), identical to those derived for the pure moral hazard model with two states, and two additional conditions that distinguish the hybrid from the pure moral hazard model by inducing the agent to truthfully reveal his private information.
Define $v_s(x) \equiv \exp[-\rho w_s(x)]$ as the multiplicative utility value from the payoff $w_s(x)$. We rewrite the incentive compatibility constraint for each state, and the participation constraint for working, in terms of $v_s(x)$. These three constraints correspond exactly to the three constraints in the two state pure moral hazard model. In the hybrid model we append two further constraints. Comparing the expected utility from lying about the second state and working with the expected utility from reporting honestly in the second state and working, the principal can induce the agent to tell the truth by restricting himself to contracts that satisfy the truth telling constraint. An optimal contract also induces the agent not to understate and shirk in the second state, behavior we describe as sincere. Comparing the agent's expected utility from lying and shirking with the utility from reporting honestly and working, the sincerity condition reduces to a restriction in which $\alpha_1 v_1(x)$ is the utility obtained from shirking and announcing the first state, and $f_2(x) g_2(x)$ is the probability density function associated with shirking when the second state occurs.

[6] ... information disclosure in their theoretical analysis of the severity of moral hazard.

[7] Our model should be distinguished from the mixed models reviewed, for example, in Chapter 7 of Laffont and Martimort (2002) and Chapter 6 of Bolton and Dewatripont (2005), where there is adverse selection of agents. In our framework this would occur if managers had different abilities or types, unobserved by shareholders, and competed with each other for a position with the firm after their type is revealed.

[8] The principal could promise to severely punish the agent if the first state is reported but $x^*$ is subsequently drawn as the revenue outcome.

[9] This assumption is implied, for example, by first order dominance.

Optimal contracting in the hybrid model
In the pure moral hazard framework the cost minimization problem is additively separable across states. In the hybrid model the information structure is not separable, and this feature complicates the solution. Minimizing the expected compensation of inducing work in both states subject to the five constraints is tantamount to choosing $v_s(x)$ for each $(s, x)$ to maximize $\sum_{s=1}^{2} \pi_s E_s[\log v_s(x)]$ subject to the same five constraints. To induce work and truth telling in both states, the principal maximizes a Lagrangian in which $\eta_0$ through $\eta_4$ are the shadow values assigned to the linear constraints. Since each constraint defines a convex set, their intersection is convex too. Also, $\log v$ is strictly concave and increasing in $v$, and the expectations operator preserves concavity, so the objective function is concave in $v_s(x)$ for each $x$. Hence the Kuhn Tucker theorem guarantees that there is a unique positive solution to the equation system formed from the first order conditions augmented by the complementary slackness conditions. The differences between the cost minimization problems for the pure and hybrid moral hazard models are evident from (29): in the pure moral hazard model $\eta_3 \equiv \eta_4 \equiv 0$, because the truth telling and sincerity constraints do not figure in the formulation of the problem. Lemma 3.1, which characterizes the Lagrange multipliers, is helpful for interpreting the first order conditions. From the second equality in Lemma 3.1 we infer that if, as in the pure moral hazard model, $\eta_3 = \eta_4 = 0$, then $E_1[v_1(x)] = E_2[v_2(x)]$. In words, if neither the truth telling nor the sincerity constraint binds, or if the state is directly observed by the principal, then the pure moral hazard case applies, and expected utility is equalized across states.
Otherwise $\eta_3 + \eta_4$ is strictly positive, implying that the expected utility from the pure moral hazard case straddles the expected utilities attained in the two states of the hybrid model. When the agent has private information, he is rewarded for announcing the principal's good prospects and penalized for bad ones; in other words, the optimal contract pays him for luck.
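For intuition about the first order conditions, consider the benchmark in which $\eta_3 = \eta_4 = 0$, so that the problem separates by state. The sketch below solves one state under assumptions we state explicitly: utility $-\alpha_j e^{-\rho w}$ with $j = 1$ for shirking and $j = 2$ for working, participation $\alpha_2 E[v] \leq 1$, incentive compatibility $\alpha_2 E[v] \leq \alpha_1 E[g v]$, and a discrete outcome space. Maximizing $E[\log v]$ subject to these gives $1/v(x) = \eta_0 \alpha_2 + \eta_1[\alpha_2 - \alpha_1 g(x)]$, that is $w(x) = \rho^{-1} \ln\{\eta_0 \alpha_2 + \eta_1[\alpha_2 - \alpha_1 g(x)]\}$. The function `pmh_contract` and all numbers are hypothetical; this is not the hybrid solution, which also involves $\eta_3$ and $\eta_4$.

```python
import math


def pmh_contract(p, g, alpha1, alpha2, rho, iterations=200):
    """Cost-minimizing wages inducing work in a one-state pure moral hazard model.

    p[i]   : probability of outcome i under diligence (sums to one)
    g[i]   : likelihood ratio of shirking to working at outcome i (E[g] = 1)
    alpha1 : nonpecuniary factor when shirking; alpha2 : when working (alpha2 > alpha1)
    Writing u = alpha2 - alpha1*g and t = eta1/(eta0*alpha2), the first order
    condition gives exp(rho*w) = eta0*alpha2*(1 + t*u).
    """
    u = [alpha2 - alpha1 * gi for gi in g]
    u_min = min(u)
    if u_min >= 0:
        raise ValueError("work cannot be induced: need alpha1 * max(g) > alpha2")

    def ic_gap(t):
        # Proportional to alpha2*E[v] - alpha1*E[g*v]; strictly decreasing in t.
        return sum(pi * ui / (1.0 + t * ui) for pi, ui in zip(p, u))

    lo, hi = 0.0, (-1.0 / u_min) * (1.0 - 1e-12)  # keeps 1 + t*u > 0
    for _ in range(iterations):  # bisection for the binding incentive constraint
        mid = 0.5 * (lo + hi)
        if ic_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    eta0 = sum(pi / (1.0 + t * ui) for pi, ui in zip(p, u))  # binding participation
    return [math.log(eta0 * alpha2 * (1.0 + t * ui)) / rho for ui in u]


p = [0.2, 0.5, 0.3]
g = [2.0, 1.0, 1.0 / 3.0]  # E[g] = 0.4 + 0.5 + 0.1 = 1
w = pmh_contract(p, g, alpha1=1.0, alpha2=1.2, rho=0.5)
print([round(wi, 4) for wi in w])
```

The schedule rises with $x$ because $g$ falls: outcomes that are relatively likely under shirking are paid less. In this sketch $\rho$ only rescales $w$; the shape of the schedule is pinned down by $\alpha_1$, $\alpha_2$ and $g$.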
As in the two state pure moral hazard model, there are three other contracts the principal might design, all of which involve the agent shirking in at least one state. If the principal requires only participation but not diligence in either state, there is no reason to distinguish between the two states. Thus the cost minimizing contract for achieving $(l_1, l_2) = (0, 0)$ is found by setting $\eta_1 = \eta_2 = 0$, yielding a solution identical to that of the pure moral hazard model: in both states the agent is paid $\rho^{-1} \ln(\alpha_1)$, equalizing expected utilities across the two states. However, both of the remaining two cases yield results that are counterintuitive relative to the pure moral hazard problem. Suppose the principal opts for $(l_1, l_2) = (1, 0)$, diligence in the first state but shirking in the second. Then $\eta_2 = 0$, but from the second part of Lemma 3.1 the expected utility in the second state, where there is shirking, is higher than in the first state, where diligence is called for! This is because the agent must be paid to reveal the second state, so that the principal can identify the first state in order to install incentives. Finally, consider the cost minimizing way of achieving $(l_1, l_2) = (0, 1)$. In this case at least one of the multipliers, $\eta_3$ or $\eta_4$, is strictly positive, and from the first order condition for compensation in the first state this immediately implies that $v_1(x)$, and therefore compensation in the first state where the agent shirks, depends on profits through $h(x)$ and possibly $g_2(x)$ too. Rather than load all the risk premium into the second state, the optimal contract also makes compensation depend on profits in the first state, not to induce work in that state, but only to induce sincerity and truth telling, which indirectly encourages diligence in the other (second) state.
Finally, the type of contract chosen by the principal in a two state hybrid model of moral hazard is determined the same way as in the two state pure moral hazard model: by comparing net profits from the solutions to the four cost minimization problems, and offering the agent the contract that is most profitable for the principal.
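The selection step is a simple comparison across the four effort profiles $(l_1, l_2)$. The sketch below uses hypothetical expected revenues and expected compensation costs; in the model these would come from the four cost minimization problems, which are not solved here.

```python
def select_contract(expected_revenue, expected_pay):
    """Return the effort profile (l1, l2) with the highest expected net profit, plus all net profits."""
    net = {profile: expected_revenue[profile] - expected_pay[profile] for profile in expected_revenue}
    best = max(net, key=net.get)
    return best, net


# Hypothetical values for the four effort profiles (l1, l2).
revenue = {(1, 1): 10.0, (1, 0): 8.0, (0, 1): 8.5, (0, 0): 6.0}
pay = {(1, 1): 3.0, (1, 0): 2.5, (0, 1): 2.8, (0, 0): 1.0}
best, net = select_contract(revenue, pay)
print(best)  # (1, 1)
```

When diligence in both states is observed, the differences between `net[(1, 1)]` and the net profit of each other profile must be nonnegative at the true parameters; these are the profit inequalities used in identification.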

Identification in the Hybrid Model
The parameters of the hybrid model are the same as for the two state pure moral hazard model, namely $f_s(x)$ and $g_s(x)$ for each state $s \in \{1, 2\}$, which together define the probability density functions of abnormal returns from working and shirking in each state; the probability distribution of the states, $(\pi_1, \pi_2)$; the preference parameters for leaving the firm versus shirking and working within the firm; and the risk aversion parameter $\rho$.
Instead of receiving data on $(s, x, w)$, we assume repeated cross sectional data are available on $(r, x, w)$, where $r \in \{1, 2\}$ is a report by the agent on the firm's financial state. This difference reflects the fact that the data only record what the agent reports, rather than independently verifying the actual state. However, we assume that the data are generated by our model, in which agents truthfully reveal the state in equilibrium. We directly apply the implication of the theory that $s = r(s)$, by setting $r = s$. Hence, as in the two state pure moral hazard model, $f_s(x)$ is identified from observations on abnormal returns. This also implies that the probabilities $\pi_1$ and $\pi_2 = 1 - \pi_1$ are identified, along with $h(x)$ (including $\bar{h}$). Finally, $w_s(x) = w_r(x)$ is identified from observations on $(r, x, w)$. As in the pure moral hazard model, we assume the regularity condition on the schedule $w_r(x)$ that, for all $r$, compensation is bounded and the bound is approached at extraordinarily high returns, and we also apply the regularity condition on $h(x)$.
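Setting $r = s$ turns identification of $\pi_s$ and $h(x)$ into standard estimation problems. The sketch below uses report shares for $\pi_r$ and Gaussian kernel densities for $f_r(x)$. The kernel estimator, the bandwidth of 0.5, the sample, and the function names are all illustrative assumptions, not the estimator used in the paper.

```python
import math


def report_shares(reports):
    """Sample shares of r = 1 and r = 2; under truthful reporting they estimate pi_1 and pi_2."""
    n = len(reports)
    return {r: sum(1 for ri in reports if ri == r) / n for r in (1, 2)}


def gaussian_kde(sample, x, bandwidth):
    """Gaussian kernel density estimate of the sample's density at x."""
    scale = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)


def h_hat(x, reports, returns, bandwidth=0.5):
    """Plug-in estimate of h(x) = pi_2 f_2(x) / (pi_1 f_1(x)), treating the report r as the state s."""
    shares = report_shares(reports)
    x1 = [xi for ri, xi in zip(reports, returns) if ri == 1]
    x2 = [xi for ri, xi in zip(reports, returns) if ri == 2]
    return (shares[2] * gaussian_kde(x2, x, bandwidth)) / (shares[1] * gaussian_kde(x1, x, bandwidth))


# Hypothetical (report, abnormal return) pairs.
reports = [1, 1, 1, 2, 2, 1, 2, 1]
returns = [-0.8, -0.2, 0.1, 0.4, 0.9, -0.5, 0.6, 0.0]
print(report_shares(reports))  # {1: 0.625, 2: 0.375}
print(h_hat(0.5, reports, returns) > h_hat(-0.5, reports, returns))  # True
```

Because each density estimate is multiplied by its group's share, the group sizes cancel and `h_hat` is a ratio of kernel sums over the two report groups. Where one group has few observations, as in the tails, the ratio can be erratic, so estimates there deserve caution.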
Although the methods for establishing the identified set are similar to those for the pure moral hazard model, there are some notable differences in what can be identified from the different information structures embodied in the hybrid model. In the pure moral hazard model the participation constraint is binding for each individual state, whereas in the hybrid model the expected utility determining participation is integrated over both states. This reduces the number of restrictions in a two state model by one. Instead, the hybrid model imposes two other constraints: the truth telling and sincerity constraints. Since these constraints shape the optimal contract as a function of the parameters, they provide several restrictions on the population moments that do not hold in the pure moral hazard model. From an econometric standpoint the models are nonnested.
Again we focus on the empirical content of the equilibrium where diligence is induced in both states. First we represent the model parameters as functions of $\rho$ and quantities directly computed from the data generating process. Then we display another overidentifying restriction, which arises if the tastes for shirking do not vary by state, because the incentive compatibility constraint is met with equality in both states. This restriction corresponds to Equation (21) in the pure moral hazard case. By Lemma 3.1, expected utilities are not equalized across states in the hybrid model, ruling out a restriction analogous to Equation (20). Instead we derive restrictions that arise from the truth telling and sincerity constraints, a difference that implies the pure and hybrid models are nonnested. A final set of restrictions comes from the inequalities implied by contract selection, similar to those obtained in the pure moral hazard case.

Solving for preferences and the likelihood ratio
The likelihood ratio $\tilde{g}_2(x, \rho)$ and the preference parameters $\tilde{\alpha}_1(\rho)$ and $\tilde{\alpha}_2(\rho)$ are defined as in Equations (11) through (13), which apply to the one state pure moral hazard model. However, in the hybrid model we see from the first order conditions that $E_1[v_1(x)]$ may differ from $E_2[v_2(x)]$ even if tastes for work are not state dependent, whereas in the two state pure moral hazard model such a finding can only be reconciled with state dependent tastes for work. The formula for $\tilde{g}_2(x, \rho)$ closely resembles that for $\tilde{g}(x, \rho)$ in the one state model. The striking resemblance arises because, as in the pure moral hazard model, the principal can independently verify the agent's declaration that the second state has occurred, obviating the need to impose a truth telling constraint within the compensation contract. Turning now to the parameter for shirking, we define the function $\tilde{\alpha}_1(\rho)$; if the second state occurred with certainty, this formula would also match its analogue defined in (13) for the one state pure moral hazard model. The essential differences between the two models arise from the role of the shadow prices on truth telling and sincerity in shaping (or distorting) the optimal contract in the first state, and hence statistical inferences from $w_1(x)$ about the likelihood ratio $g_1(x)$. Recall that the principal cannot independently verify whether the first state has occurred if the agent declares it. Accordingly, we define $\tilde{g}_1(x, \rho)$ using the Kuhn Tucker multiplier representations $\eta_3(\rho)$ and $\eta_4(\rho)$. It is straightforward to check that if $\eta_3(\rho) = \eta_4(\rho) = 0$, as in the pure moral hazard case, then the expression for $\tilde{g}_1(x, \rho)$ simplifies to its pure moral hazard counterpart. We are now ready to prove that if $\rho_0$ is known, then both preference parameters, $\alpha_1$ and $\alpha_2$, plus the likelihood ratios $g_1(x)$ and $g_2(x)$, are identified from data on abnormal returns, agent reports and compensation, $(x_n, r_n, w_n)$.
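Theorem 3.1 If the data are generated by a hybrid model of moral hazard with positive risk aversion parameter $\rho_0$, then $\alpha_1 = \tilde{\alpha}_1(\rho_0)$, $\alpha_2 = \tilde{\alpha}_2(\rho_0)$, and $g_s(x) = \tilde{g}_s(x, \rho_0)$ for $s \in \{1, 2\}$.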
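The logic of recovering a likelihood ratio and preference parameters from compensation once $\rho$ is fixed can be sketched for a single state. This is a sketch derived under explicit assumptions, not a transcription of Equations (11) through (13). We assume the first order conditions make $e^{\rho w(x)}$ affine and decreasing in $g(x)$, that $E[g] = 1$, and that $g(x) \to 0$ where compensation attains its upper bound $\bar{w}$, which we proxy by the largest observed wage. Then $\tilde{g}(x, \rho) = [e^{\rho \bar{w}} - e^{\rho w(x)}] / [e^{\rho \bar{w}} - E(e^{\rho w})]$. The binding participation constraint gives $\tilde{\alpha}_2(\rho) = 1 / E[e^{-\rho w}]$, and the binding incentive constraint gives $\tilde{\alpha}_1(\rho) = 1 / E[\tilde{g} e^{-\rho w}]$. The function name and the round-trip numbers are hypothetical.

```python
import math


def back_out_preferences(w, p, rho):
    """For a given rho, recover (alpha2, alpha1, g) from a one-state wage schedule.

    w[i] : compensation at outcome i; p[i] : probability of outcome i under diligence.
    Assumes exp(rho*w) = A - B*g with B > 0, E[g] = 1, and g = 0 at the top wage.
    """
    e_pos = [math.exp(rho * wi) for wi in w]
    mean_pos = sum(pi * ei for pi, ei in zip(p, e_pos))
    top = math.exp(rho * max(w))  # proxy for exp(rho * w_bar)
    if top <= mean_pos:
        raise ValueError("compensation must vary across outcomes")
    g = [(top - ei) / (top - mean_pos) for ei in e_pos]
    alpha2 = 1.0 / sum(pi / ei for pi, ei in zip(p, e_pos))              # participation binds
    alpha1 = 1.0 / sum(pi * gi / ei for pi, gi, ei in zip(p, g, e_pos))  # incentive constraint binds
    return alpha2, alpha1, g


# Round trip: build wages from a known g with E[g] = 1 and min(g) = 0.
rho = 0.5
p = [0.25, 0.25, 0.25, 0.25]
g_true = [2.0, 1.5, 0.5, 0.0]
w = [math.log(3.0 - gi) / rho for gi in g_true]  # exp(rho*w) = 3 - g
alpha2, alpha1, g_hat = back_out_preferences(w, p, rho)
print(round(alpha2, 4), round(alpha1, 4))  # 1.6667 1.25
```

The round trip recovers `g_true` exactly because the wages were built from the assumed affine form. Repeating the calculation over a grid of candidate values of $\rho$ traces out the mappings $\tilde{\alpha}_2(\rho)$, $\tilde{\alpha}_1(\rho)$ and $\tilde{g}(x, \rho)$.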

The identified set of parameters
The (restricted) hybrid model also yields a restriction from the same value of $\alpha_1$ appearing in the incentive compatibility conditions for both states. Defining $\Gamma_1(\rho)$ accordingly, we prove in the appendix that this restriction can be stated as $\Gamma_1(\rho_0) = 0$ in the hybrid model. Setting $\eta_3 = \eta_4 = 0$, the functional form of $\Gamma_1(\rho)$ simplifies to yield the identical restriction embodied in Equation (21) of the two state pure moral hazard case where tastes for shirking are not state dependent. Appealing to Lemma 3.1, the participation equations of the pure moral hazard model do not hold state by state in the hybrid model, because the sum of the shadow values for truth telling and sincerity is strictly positive. This induces an increase (offsetting decrease) in the expected utility received in the second (first) state relative to the expected utility when the participation decision is made, and so removes a restriction. Offsetting this one restriction are two extra equalities plus seven extra inequalities in the hybrid model that either do not apply or are automatically satisfied in the pure moral hazard model.
We define $\Gamma_2(\rho)$ through $\Gamma_4(\rho)$ from the remaining restrictions of the hybrid model. By Theorem 2.2, the truth telling constraint guarantees that at least one of the constraints holds strictly. Both sets of complementary slackness conditions, for truth telling and for sincerity, must also be satisfied. Since $g_1(x)$ is a likelihood ratio, in the hybrid model we ensure that $\tilde{g}_1(x, \rho_0) \geq 0$ with unit mass by imposing the restriction $\Gamma_2(\rho_0) \geq 0$. Finally, three inequalities ensure that the Kuhn Tucker multipliers $\eta_1(\rho_0)$, $\eta_3(\rho_0)$ and $\eta_4(\rho_0)$ are positive. Although the hybrid model imposes a greater number of inequality and equality restrictions than the pure moral hazard model, we are not asserting that hybrid moral hazard is more restrictive: the two models are nonnested. Turning now to the effort level induced by the principal in the hybrid model, we first remark that if shirking is demanded in both states, that is $(l_1, l_2) = (0, 0)$, then compensation is determined by Lemma 2.1 for the one state pure moral hazard model. Since this is suboptimal, the resulting loss in value to the principal, $\Delta_1(\rho)$, is positive at $\rho_0$. Since $(l^o_1, l^o_2) = (1, 1)$, the expected value to the firm would have been lower if either $(l_1, l_2) = (1, 0)$ or $(l_1, l_2) = (0, 1)$ had been chosen, and this observation yields two extra restrictions on $\rho_0$ to be utilized in identification. For any $\rho \in \mathbb{R}_+$, we denote the cost minimizing compensation schedules for these alternatives using the notation defined in the previous section. Given the parameterization indexed by $\rho$, the difference in value to the principal from selecting $(l_1, l_2) = (1, 1)$ versus $(l_1, l_2) = (1, 0)$ is $\Delta_2(\rho)$. In similar fashion we define $\Delta_3(\rho)$, and note that $\Delta_3(\rho_0) \geq 0$ because setting $(l_1, l_2) = (1, 1)$ is more profitable than setting $(l_1, l_2) = (0, 1)$.
Consolidating the restrictions directly applied to the hybrid model, we define $\Psi$, a Borel set of risk aversion parameters, in (42) as the set of values of $\rho$ satisfying all of them. Our last theorem establishes a result analogous to Theorem 2.2: Theorem 3.2 shows that the bounds we have constructed for the hybrid model are sharp.
Theorem 3.2 Consider any data generating process $(r_n, x_n)$ and compensation schedule $w_r(x)$ satisfying $w_n = w_{r_n}(x_n)$ for all $n$. Upon setting $r_n = s_n$, define $\Psi$ from the $(r_n, x_n)$ process using (42). If $\Psi$ is not empty, then $(r_n, x_n, w_n)$ is observationally equivalent to every data process generated by the hybrid moral hazard model parameterized by each $\rho \in \Psi$. If $\Psi$ is empty, then $(r_n, x_n, w_n)$ is not generated by such a hybrid moral hazard model.
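Computationally, a set such as $\Psi$, or the zero set of the criterion function in the two state pure moral hazard model, can be approximated on a grid of candidate values of $\rho$. In the sketch below, equality restrictions enter as squared residuals and inequality restrictions $q(\rho) \geq 0$ enter as squared violations, so the criterion is zero exactly where every restriction holds. The restriction functions are hypothetical placeholders, not the $\Gamma$ or $\Delta$ functions of the model, and the grid resolution is an arbitrary choice.

```python
def criterion(rho, equalities, inequalities):
    """Sum of squared equality residuals and squared violations of q(rho) >= 0."""
    eq_part = sum(gamma(rho) ** 2 for gamma in equalities)
    ineq_part = sum(min(q(rho), 0.0) ** 2 for q in inequalities)
    return eq_part + ineq_part


def identified_set(grid, equalities, inequalities, tol=1e-12):
    """Grid points at which the criterion is numerically zero."""
    return [rho for rho in grid if criterion(rho, equalities, inequalities) <= tol]


def q_state1(rho):
    """Hypothetical profit inequality for state 1: admits rho >= 0.5."""
    return rho - 0.5


def q_state2(rho):
    """Hypothetical profit inequality for state 2: admits rho <= 2.0."""
    return 2.0 - rho


grid = [i / 100 for i in range(1, 501)]
psi = identified_set(grid, equalities=[], inequalities=[q_state1, q_state2])
print(min(psi), max(psi))  # 0.5 2.0
```

The inequality part implements the intersection logic described for the two state pure moral hazard model: a value of $\rho$ survives only if every state's profit inequality admits it. With equality restrictions present, exact zeros rarely fall on a grid, so in practice the tolerance must reflect grid spacing and sampling error.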
Summarizing the essential differences between the two state pure and hybrid moral hazard models from the perspective of identification: they are clearly nonnested. There are three profit inequalities in the hybrid model but only two in the pure model, whereas the hybrid model has only one participation constraint while the pure model has two. Finally, both have two incentive compatibility constraints, but the hybrid model also has truth telling and sincerity constraints.

An Empirical Application
As an empirical matter, managerial compensation varies significantly with abnormal financial returns.[10] The theory of pure moral hazard postulates that risk averse managers should receive compensation that fluctuates with signals risk neutral shareholders observe about decisions their managers make, most notably abnormal returns. This aligns the incentives of managers whose nonpecuniary goals differ from maximizing shareholder wealth when the actions and decisions of management are not monitored. Although it is the dominant paradigm, this explanation for executive compensation has been challenged on several fronts. First, as we show below, managerial compensation depends not only on the financial returns of the firm, but also on its accounting returns. In models of pure moral hazard, shareholders might use signals other than financial returns to determine optimal compensation, but the reporting of accounting income is subject to considerable discretion by the manager. Our hybrid model of moral hazard provides a framework for analyzing unverifiable claims by management that are credible because the financial incentives embedded in management's compensation induce it to be truthful and sincere. Second, managers are paid for luck, that is, for risk factors beyond executive control that increase the volatility of their income,[11] which is inconsistent with the notion of mitigating uncertainty in the compensation of risk averse agents. The optimal contract in our hybrid model rewards managers for announcing good news they observe privately, so that they are incentivized to work in favorable states. Third, several empirical studies find that trading by corporate insiders appears profitable.[12] In models of pure moral hazard, managers do not have private information about the firm's future prospects; by way of contrast, a prediction of the hybrid model is that managers take advantage of the private information they have.
In our empirical study we investigate alternative ways of rationalizing these three anomalies: allowing for sufficient heterogeneity of preferences and abilities within a dynamic version of the pure moral hazard model, versus resorting to the richer information structure that characterizes a model of hybrid moral hazard.
Our approach to estimation and testing can be conducted with data pairing profits, $x_n$, either with contractual arrangements, $w_s(x)$, or with actual payouts, $w_n$. If, as in Section 2, the contractual arrangements comprehensively define the payouts in every contingency $s$, and compensation is measured without error, the two approaches are equivalent because $w_n = w(x_n)$. Findings in Hayes and Schaefer (2000) and Gillen, Hartzell and Parrino (2009) suggest that implicit understandings between the board and management are an important factor in managerial compensation policy. The details of the institutional process determining compensation only add plausibility to their results: typically executive compensation committees convene several times a year, perhaps in conference calls, using spreadsheets and benchmarking other comparably placed executives against firm performance, to determine a value for each component of compensation, that is, how much cash and bonus to pay, how many stock options of a given type to grant, and so forth. Reliable information on how the manager would have been paid if firm profits had deviated from the actual outcome does not exist. For this reason all empirical work on CEO compensation is based on actual payouts, $w_n$, rather than on an incomplete understanding of the contractual arrangement, $w_s(x)$.[13] We follow Smith (1985, 1986), Hall and Liebman (1998), Margiotta and Miller (2000) and Miller (2009a, 2009b) to measure total executive compensation. It is the sum of salary and bonus, the value of restricted stock and options granted, the value of retirement and long term compensation schemes, plus changes in wealth from holding firm options, and changes in wealth from holding firm stock relative to holding a well diversified market portfolio instead.[14]
However, we do not assume that compensation is measured without error: on the one hand, the data source, the Securities and Exchange Commission, imposes harsh penalties on fraudulent reporting; on the other hand, some income that managers receive might not be recognized as such, and consequently might not be subject to the same reporting requirements.
The remainder of this section describes the longitudinal data set used in our empirical study. To account for cross sectional differences between firms in any given year we use firm level data on sector, employment, assets, and debt to equity ratio. Financial returns on the market portfolio and bond prices index the aggregate variation over time. The accounting return is treated as a signal the manager sends stockholders about the state of the firm and its future profitability prospects relative to firms with similar characteristics. In our empirical framework managerial compensation is explained by these variables.

Data
Our primary data source is Standard & Poor's ExecuComp database. We extracted compensation data on the current chief executive officer (CEO) of 2,610 firms in the S&P 500, Midcap, and Smallcap indices spanning the years 1992 to 2005. We supplemented these data with firm level data obtained from the S&P COMPUSTAT North America database and monthly stock price data from the Center for Research in Security Prices (CRSP) database. The sample was partitioned into three industrial sectors by GICS code. Sector 1, called primary, includes firms in energy (GICS: 1010), materials (1510), industrials (2010, 2020, 2030), and utilities (5510). Sector 2, consumer goods, comprises firms from consumer discretionary (2510, 2520, 2530, 2540, 2550) and consumer staples (3010, 3020, 3030). Firms in health care (3510, 3520), financial services (4010, 4020, 4030, 4040), and information technology and telecommunication services (4510, 4520, 4530, 5010) comprise Sector 3, which we call services. Table 1 summarizes the cross sectional features of our data. There are almost twice as many firms in services as in consumer goods, with the primary sector accounting for about half the observations. Average firm size by total assets is highest in the services sector and lowest in the consumer sector. The debt equity ratio follows the same ordering, the sector with the largest firms by assets also being the most highly leveraged, but the ordering is reversed when employment is used to measure firm size instead. For this reason we used both total assets and employment as measures of size, and included the debt equity ratio as a factor that might affect the distribution of abnormal returns, and hence managerial compensation.[15] In this study we assume that the firm's sector, total assets, number of employees, and debt equity ratio are public information.
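The sector partition amounts to a lookup on four-digit GICS industry group codes. The sketch below is illustrative: the set and function names are hypothetical, and it assumes that the information technology and telecommunication groups are 4510, 4520, 4530 and 5010, and that longer GICS codes can be truncated to their first four digits, which is how the GICS hierarchy nests.

```python
PRIMARY = {1010, 1510, 2010, 2020, 2030, 5510}
CONSUMER = {2510, 2520, 2530, 2540, 2550, 3010, 3020, 3030}
SERVICES = {3510, 3520, 4010, 4020, 4030, 4040, 4510, 4520, 4530, 5010}


def sector_of(gics_code):
    """Map a GICS code (industry group or a longer nested code) to one of the three sectors."""
    group = int(str(gics_code)[:4])  # leading four digits identify the industry group
    if group in PRIMARY:
        return "primary"
    if group in CONSUMER:
        return "consumer"
    if group in SERVICES:
        return "services"
    raise ValueError(f"GICS industry group {group} is not covered by the classification")


print(sector_of(1010), sector_of(25201010), sector_of(4510))
```

Raising an error for uncovered groups, rather than assigning a default sector, makes any firm outside the three sectors visible during data preparation.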
From Table 1 we note that the average accounting return in the services sector is higher than in the other two, but more remarkable is the fact that its standard deviation is much higher. This could be attributable to many factors, but we note that the services sector includes many firms that are intertwined with technological change in a rapidly changing product space, and for that reason alone they might rank amongst the hardest firms to value. Table 2 summarizes the longitudinal features of our data. There are roughly the same number of observations per year, apart from 2005, where we only include data on firms whose financial year ended before December.[16] In the sample period, financial returns from the stock market to diversified shareholders ranged from a yield of 45 percent in one year to a loss of 14 percent in another. Far greater is the variation of individual firms' returns around the market return. The actions of an individual manager are too inconsequential to appreciably affect the stock index. For this reason we take, as our measure of the component of profit that managers can affect through their actions, financial returns to the firm net of the share market index return. This latter variation in abnormal returns, rather than variability due to aggregate factors, is critical to explaining managerial compensation.
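The abnormal return described above is a period-by-period difference between the firm's financial return and the market index return. The sketch below uses hypothetical annual returns expressed as fractions and assumes both series are measured over the same periods.

```python
def abnormal_returns(firm_returns, market_returns):
    """Firm financial return net of the market index return, period by period."""
    if len(firm_returns) != len(market_returns):
        raise ValueError("series must have the same length")
    return [rf - rm for rf, rm in zip(firm_returns, market_returns)]


# Hypothetical annual returns for one firm and the index.
firm = [0.30, -0.05, 0.12]
market = [0.25, -0.14, 0.12]
print(abnormal_returns(firm, market))
```

In the second year the firm loses 5 percent while the index loses 14 percent, so the abnormal return is positive (about 0.09), which is the component attributed to the manager.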
The collective signal managers send about business, average accounting returns, is highly correlated with financial returns, the two almost without exception rising and falling together. Note, though, that accounting returns have a considerably higher standard deviation, attributable in part to fixed effects across firms, but also to higher idiosyncratic variability over time.
The term structure of interest rates underlying the bond price series was constructed from data on Treasury bills of varying maturities, and the prices were derived using methods described in Gayle and Miller (2009b). Table 2 shows that over this period, year to year bond price fluctuations are of the order of 5 to 10 percent, but there is no discernible trend in this aggregate variable.
Total assets vary a great deal by firm within and across years, growing by a factor of almost 3 over the period, with year to year standard deviations that are more than twice the mean; thus the cross sectional distribution of firm assets is skewed toward the upper tail.
The cross sectional distribution of employees is similarly skewed but, in contrast to assets, firm employment grows on average by less than a quarter. More remarkable than changes in the annual average debt equity ratio, which ranges between 2.41 and 4.69, is its standard deviation, which varies between 5 and 105.
From Table 2 we see that the mean compensation of managers fluctuates much more than real wages for professional employees: the trough of 1.7 million dollars over the 12 years occurs only 2 years after the peak of 4.7 million dollars and just one year before the second highest, 4.6 million dollars. Variation in CEO compensation between firms within years is greater than the average variation over the 12 years, with a standard deviation of approximately 3 to 10 times the mean, depending on the year, although this feature of the data is partly due to individual variation, reflected in the sectoral differences evident in Table 4 discussed below.
To the extent compensation depends on the firm's abnormal return, year to year fluctuations in individual CEO income are, of course, unpredictable.

Firm Characteristics and Signals
In our empirical analysis we allow for heterogeneity between firms by classifying firms within each of the three sectors on the basis of three indicators: total assets at the beginning of the period (or the end of the previous period), total employment, and the debt to equity ratio. More specifically, we classify each firm by whether its total assets were less than or greater than median total assets for firms in the sector, whether its total employment was less than or greater than median employment for firms in the sector, and whether its debt to equity ratio was less than or greater than the median debt to equity ratio for firms in the sector. We write the three indicators in the order (A, W, D): assets, workforce (employment), and debt to equity ratio. To notate these size indicators, let (S, S, L) mean lower total assets and employment than the median firm in the sector, but a higher debt to equity ratio than the median for firms in the sector. Similarly, let (L, S, L) mean lower employment than the median firm in the sector but greater than the median in the other two indicators.
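The median split can be sketched with the Python standard library. The dictionary keys and the function name are hypothetical. Because the text only distinguishes "less than" from "greater than", the sketch assumes that a value exactly equal to the sector median is labelled S.

```python
from statistics import median


def size_labels(firms):
    """Label each firm (A, W, D) as 'S' or 'L' relative to the medians of its sector.

    firms: list of dicts with keys 'sector', 'assets', 'employees', 'debt_equity'.
    A value equal to the sector median is labelled 'S' (a tie-breaking assumption).
    """
    keys = ("assets", "employees", "debt_equity")
    medians = {}
    for sector in {f["sector"] for f in firms}:
        members = [f for f in firms if f["sector"] == sector]
        medians[sector] = {k: median([f[k] for f in members]) for k in keys}
    return [
        tuple("L" if f[k] > medians[f["sector"]][k] else "S" for k in keys)
        for f in firms
    ]


firms = [
    {"sector": "primary", "assets": 1.0, "employees": 30, "debt_equity": 0.5},
    {"sector": "primary", "assets": 2.0, "employees": 20, "debt_equity": 1.5},
    {"sector": "primary", "assets": 3.0, "employees": 10, "debt_equity": 1.0},
]
print(size_labels(firms))  # [('S', 'L', 'S'), ('S', 'S', 'L'), ('L', 'S', 'S')]
```

Computing medians within each sector, rather than over the pooled sample, matches the text's comparison with "firms in the sector". Whether assets are taken at the beginning of the period is a data preparation choice made before calling the function.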
Managers release information about the state of the firm through accounting statements, and exercise considerable discretion over the values that are reported. They have many ways of directly affecting the firm's balance sheets, choosing, for example, among different valuation methods for credits and liabilities, and using discretionary timing when writing off nonperforming assets. Exercising such liberties provides a mechanism for managers to signal the state of the firm to shareholders.
A commonly used accounting measure of the manager's accomplishments and the firm's success is comprehensive income: the change in assets less the change in liabilities, plus dividends. Let A_nt denote total assets reported at the end of the t th period and Debt_nt the level of debt reported at the end of the period. Thus (A_nt − Debt_nt) denotes net assets as reported in the annual report of the n th firm up to the end of period t.¹⁷ Normalizing comprehensive income, we define the accounting return π_nt for the firm in period t as:
These variables are used to form our measures of r_nt. For a given firm type, let E[π_nt] denote the expected accounting return of firm n at the beginning of period t, before the manager announces total assets A_nt. We define the manager's report about the hidden state, r_nt ∈ {1, 2}, as an indicator variable telling whether the firm's accounting return is higher or lower than the expected accounting return in period t. We note that shareholders receive Dividend_{n,t−1} during the course of period t − 1, and (A_{n,t−1} − Debt_{n,t−1}) has been tabled by the end of period t − 1, leaving π_nt to be determined by the manager's current report of (A_nt − Debt_nt). This definition is internally consistent with the timing in our model, because the dual effects of a payoff-relevant announcement by the manager during the current period are evident both in a capital gain or loss through a change in the stock price immediately following, and in the balance sheet tabled at the end of the period. In this fashion our model captures the cumulative impact of announcements made throughout the period by their net effect on the balance sheet and shareholder equity value at the end of the period. Table 3 displays the number of observations in each sector and size category, and the probability that the report is good.
For the most part, the probability of being in the bad state is higher, implying that the median accounting return π_nt is less than its mean. However, there are exceptions, such as (A, W, D) = (S, S, L) in the primary and consumer sectors. The latter columns of Table 3 provide a cross-sectional summary of average abnormal returns conditional on each size category, by sector and report. The sample means for returns and compensation are without exception higher when a favorable report indicating the good state is released.
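A minimal sketch of the report indicator, assuming accounting returns have already been computed for each firm and period. The text does not say how the expected accounting return is estimated, so the sketch proxies it, purely for illustration, by the sample mean within each (firm type, period) cell; the coding 2 = good report, 1 = bad report is likewise a hypothetical choice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical coding of the report r in {1, 2}.
GOOD, BAD = 2, 1


def accounting_reports(records):
    """Map (firm_type, period, accounting_return) records to GOOD/BAD reports.

    The expected accounting return is proxied by the mean return within each
    (firm_type, period) cell, an illustrative assumption.
    """
    cells = defaultdict(list)
    for firm_type, period, ret in records:
        cells[(firm_type, period)].append(ret)
    expected = {cell: mean(values) for cell, values in cells.items()}
    return [
        GOOD if ret > expected[(firm_type, period)] else BAD
        for firm_type, period, ret in records
    ]
```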

Bond Prices and Dynamic Considerations
The managers in our data set are about 55 years old and on average last less than 10 years as CEO before retiring.¹⁸ They spend the compensation earned over that period throughout the remainder of their lives, taking account of future accruals from compensation and returns on wealth. It follows that their consumption and savings decisions, and the value of their compensation packages, are complicated by interest rate fluctuations. Thus variation in economic conditions provides a source of identification in models of moral hazard, and a source of restrictions in estimation. To account for the fact that the value of compensation, and also the compensating differential of nonpecuniary benefits, partly depends on the interest rate, we allow (α_1, α_2, ρ) to depend on bond prices by setting:

$$ \rho_t = \frac{\rho^*}{b_{t+1}}, \qquad \alpha_{jt} = \left(\alpha_j^*\right)^{1/(b_t - 1)}, \quad j \in \{1, 2\}, \tag{44} $$

where (α*_1, α*_2, ρ*) become the primitive preference parameters to be estimated as functions of z, the characteristics of the firm. Equation (2), the agent's preferences, becomes:

Comparing (2) with (45), instead of managers with risk aversion parameter ρ receiving w(x), they have risk aversion parameter ρ* and receive only the interest on the bonds purchased with the compensation, namely w_{t+1}(x)/b_{t+1}. Similarly, instead of receiving the cash certainty equivalent of nonpecuniary benefit α_j, which is ρ^{−1} log α_j, the manager receives the one-period deferred cash certainty equivalent of nonpecuniary benefit α*_j, which is b_{t+1} log α*_j / [ρ*(b_t − 1)].¹⁹ In other words, in our empirical application we use the annuity value of compensation, and the annuity value of the nonpecuniary benefits, to reflect the notion that the managers in our sample spread expenditure from their income over their life cycle to smooth their consumption.
This way of modeling bond prices yields a precise dynamic interpretation of our model: managers sequentially choose their consumption and work choices each period, and the contract we derived for the static model is the long-term optimal contract shareholders would offer. More precisely, suppose preferences take the form: where (l_0t, l_1t, l_2t, c_t) are the choice variables for each period t, and β ∈ (0, 1) is the manager's subjective discount factor. Mirroring the static model, l_0t ∈ {0, 1} indicates whether the manager takes the outside option rather than participating in the firm in the t th period, l_1t ∈ {0, 1} indicates whether the manager shirks in that period, l_2t ∈ {0, 1} indicates whether the manager is diligent, c_t is his consumption in period t, and l_0t + l_1t + l_2t = 1 for each period. We now let x_t denote abnormal profits of the firm received at the end of period t, f(x_t) the density of abnormal profits in period t under diligence, and f(x_t)g(x_t) the density under shirking. Reinterpreted within this dynamic setting, the participation constraint in period t, (4), is Equation (16) of Margiotta and Miller (2000), the incentive constraint (5) is their Equation (18), and the optimal compensation plan (6) is their Equation (21). They also prove that the long-term contract for the pure moral hazard model in this dynamic framework decentralizes to a sequence of short-term contracts that mimic the contract described in Section 2. In Appendix B we show that the contract derived in Section 3 for the hybrid model has the same dynamic interpretation. Regarding estimation, Equation (44) treats bond prices as observed variables entering preferences in a restrictive way, thus providing a further source of identification. In the pure moral hazard model, we can substitute from (44) for α_1, α_2 and ρ into (12) and (13) to prove that Theorem 2.1 implies:
for each period t.
Raising both sides of both equations to the power of b_t − 1 to make α*_1 and α*_2 the subject of (47) and (48), and then first differencing over t, yields T − 1 further restrictions that aid identification of ρ*, and hence of the other structural parameters. An equivalent set of restrictions, derived in Appendix C, applies to the hybrid model. Intuitively, these restrictions mean that in the dynamic interpretation of our framework tastes for working, shirking and risk do not vary over time, implying that economic conditions supply the only reason in our models for contracts to vary over time. We view these restrictions as an appealing null hypothesis to test.
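The first-difference restrictions can be illustrated with a short sketch. It takes as given, for one candidate value of the risk aversion parameter ρ*, the period-by-period values of the right-hand side of (47) or (48) (a hypothetical input list here), raises each to the power b_t − 1 to recover an implied α*_j, and returns the T − 1 first differences, which should all be zero under the null hypothesis. In an application one would evaluate these differences over a grid of candidate ρ* values rather than at a single point.

```python
def implied_alpha_star(rhs_by_period, bond_prices):
    """Raise each period's value to the power b_t - 1 to make alpha*_j the subject."""
    return [value ** (b - 1.0) for value, b in zip(rhs_by_period, bond_prices)]


def first_difference_restrictions(rhs_by_period, bond_prices):
    """Return the T - 1 first differences of the implied alpha*_j.

    Under the null that tastes do not vary over time, every difference is zero.
    """
    alpha_star = implied_alpha_star(rhs_by_period, bond_prices)
    return [alpha_star[t] - alpha_star[t - 1] for t in range(1, len(alpha_star))]


if __name__ == "__main__":
    # Illustrative inputs constructed to satisfy the null with alpha*_j = 2.
    b = [0.90, 0.95, 0.80]
    rhs = [2.0 ** (1.0 / (bt - 1.0)) for bt in b]
    print(first_difference_restrictions(rhs, b))  # zero up to rounding error
```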

Measurement Error
Abnormal returns to the firm are defined as the residual component of returns that cannot be priced by aggregate factors the manager does not control. More specifically, let V_nt denote the equity value of firm n at time t on the stock market, and let x̃_nt, net abnormal returns, denote the financial return on its stock net of the financial return on the market portfolio in period t. Gross abnormal returns for the n th firm in period t attributable to the manager's actions are defined as net abnormal returns plus compensation as a ratio of firm equity:

$$ x_{nt} \equiv \tilde{x}_{nt} + \frac{w_{nt}}{V_{n,t-1}} $$

In an optimal contract, compensation depends on x_nt, not x̃_nt. If w_nt were observed without error, then we could compute x_nt directly from (x̃_nt, w_nt, V_{n,t−1}), apply the estimator to obtain w_nt for each z_nt and, ignoring dynamic concerns, compute the test statistics described towards the end of Section 2.
However, the series we construct on executive compensation, w_nt, is assumed to be measured with error, rendering the estimator described in Section 2 inconsistent. Measured compensation, denoted w̃_nt, is the sum of true compensation w_nt and an independently distributed disturbance term ε_nt, assumed orthogonal to the other variables of interest:

$$ \tilde{w}_{nt} = w_{nt} + \varepsilon_{nt} $$

Although (w̃_nt, x̃_nt) rather than (w_nt, x_nt) is observed for each (n, t), we can nevertheless construct consistent estimates of (w_nt, x_nt) from (w̃_nt, x̃_nt) by exploiting the model's premise that the manager is risk averse, under a mild regularity condition: net abnormal returns to shareholders increase with gross abnormal returns. In other words, the manager does not appropriate all of the increase in firm value.
This lemma implies that the compensation schedule is the conditional expectation of measured compensation given net abnormal returns and lagged firm size. Pointwise consistent estimates of compensation w_nt can be obtained for each observation with kernel estimators on successive cross sections. From our estimates of w_nt we then construct a consistent estimator of the gross abnormal return, which we denote by:

$$ \hat{x}_{nt} \equiv \tilde{x}_{nt} + \frac{\hat{w}_{nt}}{V_{n,t-1}} $$

We also require an estimate of w̄_st to form estimates of v_st(ρ*) ≡ exp[−ρ* w̄_st/b_{t+1}]. We use the fact that although w̄_st is unknown, w_st(x) is a locally non-decreasing function of x in the limit as x → ∞. Following Brunk (1958), given firm type, for each state s ∈ {1, 2} and period t ∈ {1, ..., T}, we rank the observations on returns in decreasing order as x_st^(1), x_st^(2), and so on, denoting by w_st^(1), w_st^(2), ... the corresponding (estimated) compensations, and estimate w̄_st with:

Finally, we require estimates of g_s(x), which we denote by ĝ_s(x). Ranking the excess returns realized in the first state at the end of any period t ∈ {1, ..., T}, we obtain the decreasing sequence x^(1), x^(2), .... Again following Brunk (1958), we estimate h with:

Table 4 provides a cross-sectional summary of CEO compensation conditional on the accounting report r_nt, based on the manager's hidden information, for each firm type. On average, compensation is higher when the good state is announced. There is a great deal of dispersion about the sample means. From the numbers of observations in each accounting state r_nt provided in Table 3, we infer that many of these differences are significant. By way of contrast, there are no systematic differences between sample mean returns that depend on the publicly observed states. Compensation tends to be higher in companies that are larger on any of the three dimensions we have measured, and also higher in the service sector.
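The estimation steps described in this subsection might be sketched as follows: (i) estimate true compensation as the conditional expectation of measured compensation given net abnormal returns and lagged firm value, (ii) form gross abnormal returns as net abnormal returns plus estimated compensation over lagged equity value, and (iii) estimate the compensation level at the top of the return distribution in the spirit of Brunk (1958), then form v_st(ρ*) = exp(−ρ* w̄_st / b_{t+1}). This is an illustration under assumptions, not the authors' estimator: the Gaussian product kernel and user-supplied bandwidths are illustrative choices, the top-end estimate is the standard max-min formula for a non-decreasing regression evaluated at the largest observed return, and compensation and lagged equity value must be in the same units for the ratio to be a return.

```python
import math


def nadaraya_watson(regressors, outcomes, evaluation_points, bandwidths):
    """Kernel regression estimate of E[outcome | regressors] at each point.

    Uses a product Gaussian kernel (an illustrative choice).
    """
    fitted = []
    for point in evaluation_points:
        numerator = denominator = 0.0
        for obs, y in zip(regressors, outcomes):
            u_sq = sum(((o - p) / h) ** 2 for o, p, h in zip(obs, point, bandwidths))
            weight = math.exp(-0.5 * u_sq)
            numerator += weight * y
            denominator += weight
        fitted.append(numerator / denominator)
    return fitted


def gross_abnormal_returns(net_returns, compensation, lagged_value):
    """Net abnormal return plus compensation as a ratio of lagged equity value."""
    return [x + w / v for x, w, v in zip(net_returns, compensation, lagged_value)]


def brunk_upper_estimate(returns, compensation):
    """Non-decreasing (isotonic) regression fit at the largest return.

    With observations ranked by return in decreasing order, the max-min
    formula at the top point is the largest running mean of the
    compensations attached to the k highest returns, k = 1, 2, ...
    """
    order = sorted(range(len(returns)), key=lambda i: returns[i], reverse=True)
    best, running_total = -math.inf, 0.0
    for k, i in enumerate(order, start=1):
        running_total += compensation[i]
        best = max(best, running_total / k)
    return best


def v_estimate(rho_star, w_bar, b_next):
    """v_st(rho*) = exp(-rho* * w_bar_st / b_{t+1})."""
    return math.exp(-rho_star * w_bar / b_next)


if __name__ == "__main__":
    # One illustrative cross section: (net abnormal return, lagged value in $ millions).
    X = [(0.01, 500.0), (0.05, 800.0), (-0.02, 300.0)]
    w_measured = [2.1, 3.4, 1.2]  # $ millions, made-up values
    w_hat = nadaraya_watson(X, w_measured, X, bandwidths=(0.05, 300.0))
    x_gross = gross_abnormal_returns([x for x, _ in X], w_hat, [v for _, v in X])
    print(x_gross, brunk_upper_estimate(x_gross, w_hat))
```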
Figure 1 depicts the estimated probability density functions for abnormal returns, and the compensation schedules,²⁰ in each sector for two of the eight observed states, (A, W, D) = (S, S, S) and (L, L, L), and both unobserved states. As Table 3 shows, between 1,686 and 3,483 observations are used to construct each graph. The probability density functions for the good state exhibit first order stochastic dominance over those for the bad state. This suggests that accounting measures do anticipate financial performance. Hence a manager conditions on these measures when making her effort choice. It follows that these accounting variables are relevant for analyzing empirical models of moral hazard.
Our model does not predict a monotone increasing compensation schedule, nor that compensation is uniformly higher in the good state than the bad, nor that compensation under the good state is tilted to punish poor performance and reward strong results, plausible as these hypotheses might sound. Thus we should not reject the theory because the compensation schedules illustrated in Figure 1, while for the most part upward sloping, are not monotone increasing and cross each other more than once. The nature of these data highlights the advantages of a nonparametric approach that directly confronts the theory, effectively eliminating the possibility of spuriously rejecting auxiliary assumptions imposed to accommodate a tightly parametrized formulation of the empirical specification.
²⁰ A kernel regression of measured compensation, w̃_nt, on gross abnormal returns, x_nt, accounting returns, our measures of observed heterogeneity, and the bond price shows that true compensation, w_nt, explains about 75% of the variation in measured compensation.
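A descriptive check of first order stochastic dominance compares empirical CDFs, as in the sketch below. It is an illustration only: it makes no allowance for sampling error, and the conclusion in the text is drawn from the estimated densities plotted in Figure 1, not from this check.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return F(x) = share of observations less than or equal to x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def dominates_first_order(good, bad):
    """True if the empirical CDF of `good` lies weakly below that of `bad` everywhere.

    Both CDFs are step functions that jump only at observed values, so it is
    enough to compare them at the pooled sample points.
    """
    F_good, F_bad = empirical_cdf(good), empirical_cdf(bad)
    grid = sorted(set(good) | set(bad))
    return all(F_good(x) <= F_bad(x) for x in grid)
```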

Estimation and Testing
The equilibrium restrictions imply there is only one primitive parameter in the econometric formulation of the pure and hybrid models, the constant coefficient of absolute risk aversion, denoted by ρ* in the dynamic version of the model. Our estimation and testing procedures directly exploit restrictions on the risk aversion parameter embodied in the equilibrium conditions, modified to account for heterogeneity in firm type, manager type, and interest rate fluctuations that affect the optimal contract, and extended to allow for a panel of firms and managers containing several time periods. As foreshadowed in Section 4.2, we allow ρ* to vary by the three industrial sectors, the two measures of firm size, and the leverage indicator for the debt to equity ratio. The basis for allowing ρ* to vary by type of firm is that heterogeneous firms might be matched with heterogeneous managers with respect to their risk attitudes. We do not impose restrictions across the 24 (3 × 2 × 2 × 2) sector/size/leverage combinations.
This section adapts Section 2.3 to allow for measurement error in compensation and for taste parameters that vary over time (for reasons explained in Section 4.3), and reports our empirical findings. We describe the procedures used to derive our econometric estimators and analyze their properties. Then we discuss the confidence regions we found for several variations of the moral hazard model and for the most restrictive version of our hybrid moral hazard model. Finally, we infer the economic implications of our estimates, in terms of the potential losses that optimal contracting mitigates, along with the costs of private information and moral hazard incurred to avoid those losses.

Constructing a confidence region
In the text we explain the estimation and testing of the unrestricted pure moral hazard model that only imposes profit maximization. First we characterize the equilibrium restrictions on ρ* and define a population analogue to the criterion function used in estimation and testing. Then we derive the critical values that determine the confidence region, accounting for the pre-estimation of several parameters that are determined by the probability density functions characterizing excess returns under shirking and working in the two states. Finally, we review the subsampling procedures used to compute consistent estimates of the confidence regions. The more restricted versions of the pure moral hazard and hybrid moral hazard models are complicated only by additional or different inequalities and equalities that must be accounted for in estimation. However, since they are treated in exactly the same way, we have relegated to Appendix C the extra detail describing the estimation and testing of those models.
For each state s ∈ {1, 2} and time t ∈ {1, 2, ..., T}, appealing to (44) we define Q_st(ρ*) analogously to Q_0(ρ), given by (17), as:
Interpreted either as a static model with time-varying risk preferences, or as a dynamic model of the sort described in Appendix B, expected value maximization by the firm implies a restriction on the sign of Q_st(ρ*) for s ∈ {1, 2} and t ∈ {1, ..., T}. Extending our definition of Θ given in (18) yields:
Our empirical analysis is based on observing N firms over T periods, and the asymptotic properties described here are for large N.
To test the null hypothesis for the least restricted pure moral hazard model, we form a sample analogue to Q_st(ρ*), denoted by Q^(N)_st(ρ*), and estimate a confidence region for ρ* with:
where N^a is the asymptotic rate of convergence of Q^(N)(ρ*), and c_α is the α critical value of the test statistic. We reject the pure moral hazard model at level α if Θ^(N) is empty. Under standard regularity conditions, Q^(N)_st(ρ*) converges in probability to Q_st(ρ*). In the paragraphs below we explain how the rate of convergence, N^a, is determined in our application, and then describe the subsampling procedure used to obtain a consistent estimate of c_α.
The regularity condition about the upper bound x̄_st plays a role in determining the rate of convergence N^a. Because w_st(x) is estimated nonparametrically in the first stage and converges pointwise at a slower rate than N^{1/2}, appealing to results in Newey and McFadden (1994) establishes:
for a given ρ* > 0 and some covariance matrix Σ_st(ρ*). Alternatively, we can relax the assumption about the existence of a finite x̄_st and assume, less restrictively, that lim ..., where ... ln N and ..., a strictly positive sequence, converges to zero at a rate faster than N^a. For each subset j ∈ {1, ..., B_N} of size N_b, define:
and denote by ĉ_α the resulting estimate of c_α. Appealing to (57), the estimated confidence region for ρ* is determined by selecting those values for which N^a Q^(N)(ρ*) is less than ĉ_α.
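The following sketch illustrates a subsampling confidence set for a partially identified parameter, in the spirit of the procedure outlined above, computed on a grid of candidate values of the risk aversion parameter. Several ingredients are assumptions rather than this paper's specifics: the restrictions are taken to have the form E[m_k(ρ*)] ≥ 0 for a user-supplied hypothetical moment function `moments`, the criterion is the sum of squared violations, the first-step set uses a slack of ln N, the subsample size defaults to N^0.7, and a = 1, which suits sample means converging at rate N^{1/2}.

```python
import math
import random


def criterion(rho, data, moments):
    """Sum of squared violations of the assumed restrictions E[m_k(rho)] >= 0."""
    values = [moments(rho, obs) for obs in data]
    n = len(values)
    means = [sum(v[k] for v in values) / n for k in range(len(values[0]))]
    return sum(min(m, 0.0) ** 2 for m in means)


def subsampling_confidence_set(data, moments, grid, alpha=0.05, a=1.0,
                               subsample_size=None, n_subsamples=200, seed=0):
    """Grid points rho with N^a * Q_N(rho) <= the subsampling critical value."""
    rng = random.Random(seed)
    n = len(data)
    b = subsample_size or max(2, int(n ** 0.7))
    q_full = {rho: criterion(rho, data, moments) for rho in grid}

    # First-step estimate of the identified set, using an ln(N) slack.
    initial = [rho for rho in grid if n ** a * q_full[rho] <= math.log(n)]
    if not initial:
        return []  # no grid point survives: the estimated set is empty

    # Subsampling distribution of the sup statistic over the first-step set.
    stats = []
    for _ in range(n_subsamples):
        sub = rng.sample(data, b)
        stats.append(max(b ** a * criterion(rho, sub, moments) for rho in initial))
    stats.sort()
    c_alpha = stats[max(0, math.ceil((1 - alpha) * n_subsamples) - 1)]

    return [rho for rho in grid if n ** a * q_full[rho] <= c_alpha]


if __name__ == "__main__":
    # Toy example: the restriction E[X] - rho >= 0 identifies rho <= E[X] = 1.
    data = [i / 100 for i in range(1, 200)]
    grid = [0.0, 0.5, 1.0, 1.5, 2.0]
    print(subsampling_confidence_set(data, lambda rho, x: [x - rho], grid))
```

An empty returned list plays the role of an empty estimated confidence region, which in the text corresponds to rejecting the model.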

Risk Aversion Parameter
Given the joint probability distribution of the data, all the equilibrium restrictions from cost minimization and profit maximization in our structural models of moral hazard can be expressed as a mapping of ρ*, the coefficient of absolute risk aversion. We apply the estimation and testing methods described in Section 5.1 to the data described in Section 4. Table 4 shows that compensation varies across sector and firm type. Nonpecuniary benefits for both working and shirking might differ across firm type and sector; executives with different abilities, who therefore command different outside options, might select into different firm types and sectors. For these reasons all the models we estimate allow for a full set of interactions between firm and sector type. We also incorporate the aggregate effect of bond prices on the structural parameters through (44). Our empirical analysis investigates whether this degree of (observed) heterogeneity suffices to reconcile either or both models to the data. For example, is it necessary to allow tastes for working and shirking to vary with the accounting state of the firm, or with calendar time? Is selection by executives into firm and sector type driven by their attitude towards risk? Tables 5 through 7 present the 95 percent confidence intervals for the risk aversion parameter, conditional on firm type and sector, under several variations of the moral hazard models considered.
In the pure moral hazard specification reported in Table 5, the risk aversion parameter is allowed to vary over firm type and sector, but not over time or by accounting return; the coefficients for working and shirking are permitted to vary with firm type and sector, by accounting return, and also across periods in an unrestricted way. Imposing the restriction that all the variation in the taste parameters for working and shirking over calendar time comes from changes in the bond price has no effect on the confidence regions for ρ*. We do not reject either version of the pure moral hazard model at the five percent level. While the lower bound of the estimated confidence interval is either 0.01 or 0.02, the upper bound varies considerably across firm and sector type, ranging from 0.21 to 20.1; a wide range of risk attitudes is compatible with the data given this amount of heterogeneity. Since the intersection of the estimated intervals, (0.02, 0.21), is nonempty, we cannot reject the hypothesis that a common risk aversion parameter ρ* applies to all firm types within all sectors.
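Testing for a common risk aversion parameter amounts to intersecting the cell-by-cell confidence regions: the largest lower bound is compared with the smallest upper bound. The sketch below generalizes this slightly so that a cell's region may be a union of disjoint intervals, as happens for some cells in Tables 6 and 7. Apart from the extreme bounds quoted in the text, the interval endpoints in the example are made up for illustration.

```python
def intersect_two(region_a, region_b):
    """Intersect two regions, each a list of open intervals (lower, upper)."""
    out = []
    for lo1, hi1 in region_a:
        for lo2, hi2 in region_b:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)


def common_region(cells):
    """Intersect the regions of all cells; an empty list means no common value."""
    region = cells[0]
    for cell in cells[1:]:
        region = intersect_two(region, cell)
        if not region:
            break
    return region


if __name__ == "__main__":
    cells = [[(0.01, 20.1)], [(0.02, 0.21)], [(0.02, 5.0)]]
    print(common_region(cells))  # [(0.02, 0.21)]
```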
In Table 6 we report our results from imposing additional exclusion restrictions that eliminate dependence of the nonpecuniary costs and benefits of working and shirking on the two accounting return states (good and bad). The left panel of Table 6 shows what happens to the estimated identified set of the risk aversion parameters when we impose the restriction that nonpecuniary benefits from shirking are not affected by accounting returns. Provided the risk aversion parameter is permitted to vary with firm and sector type, the pure moral hazard model is not rejected. However, there is no common region of overlap for the risk aversion parameter across all 24 firm and sector types, and the model is rejected when we impose the further restriction that the risk aversion parameter is common across firm and sector types.²¹ The right panel presents our results from imposing the restriction that the nonpecuniary cost of diligent work is equal across the two accounting return states. Here again we cannot reject the model if the risk aversion parameter is allowed to vary across firm and sector type, but it is rejected if we maintain the assumption that the risk aversion of managers does not depend on the type of firm or sector they select into.
Saturating the model with the risk aversion parameter, by allowing it to depend on each of the 24 firm and sector combinations, does not give the pure moral hazard model enough flexibility to accept the hypothesis that abilities and tastes are independent of accounting returns, because the estimated identified set of risk parameters is empty for three of the firm and sector specific cells.²² Inspecting the columns for the primary and service sectors in the left panel, we reject the hypothesis that a sector specific risk preference parameter can reconcile the data when we impose the additional restriction that the common shirking parameter does not depend on accounting returns. Similarly, there is no overlap in each of the right three columns, so we reject the model in which the working parameter does not depend on accounting returns and the risk parameter is sector specific. Performing a similar exercise on each row reveals that, for a given firm type, the model rejects a common risk parameter across sectors if we impose the additional joint restrictions that the nonpecuniary benefits of working and shirking are common within a sector. In other words, imposing restrictions on the working and shirking parameters across sectors or across firm types is inconsistent with a common risk aversion parameter in the selected types. Summarizing Table 6, pure moral hazard models that do not permit tastes and abilities to vary with the accounting state are rejected.
²¹ We reach this conclusion by comparing the maximum of the lower bounds in each cell with the minimum of the upper bounds in the respective cells, while accounting for the three instances of cells where the set of admissible risk aversion parameters is disjoint.
²² The three firm and sector combinations in which the risk parameter regions do not intersect in their corresponding left and right panels are (L, S, L) in the primary sector, (S, L, L) in the consumer sector and (S, L, S) in the service sector.
Table 7 presents the 95 percent confidence interval of the risk aversion parameter set for a restricted hybrid model. Here we assume the preferences for working and shirking do not depend on the accounting return state or calendar time, and we also impose a common risk aversion parameter across firm and sector type. We cannot reject that model at the 5 percent significance level in any sector. In both the primary and consumer goods sectors the confidence regions for the identified set of risk aversion parameters consist of two intervals, whereas in the services sector there is only one. The bands are quite wide, especially in the primary and consumer goods sectors, evidence that a wide range of risk aversion parameters reconcile the restricted hybrid model with the data. Moreover, since the regions for all three sectors overlap for ρ* ∈ (0.37, 0.42), there is no evidence from applying this model to the data that managers with different attitudes towards risk sort into different sectors.
Summarizing the results of Tables 5 and 6 for the pure moral hazard model, we conclude that a common risk aversion parameter can generate the observations only if preferences for working and shirking are permitted to depend on the accounting state. After adjusting for bond prices to reflect the structure of the dynamics and accommodate aggregate shocks in a parsimonious way, imposing restrictions that preferences do not shift over calendar time is completely innocuous. However, imposing restrictions that preferences for working and shirking are not affected by the accounting state shrinks the confidence regions for the risk parameter set; under such restrictions the pure moral hazard model is rejected unless we allow the risk parameter to vary with firm and sector type. These results contrast vividly with those given in Table 7. The hybrid model does not reject the joint restrictions of homogeneous risk preferences and the independence of working and shirking parameters with respect to the accounting state.
Neither the fully unrestricted pure moral hazard model nor the fully restricted hybrid moral hazard model is rejected, which implies they are observationally equivalent. For comparison purposes, suppose managers had a common risk aversion parameter. The upper bound of (0.02, 0.21), the intersection of the risk parameter sets for the firm and sector types in the pure moral hazard model given in Table 5, is less than the lower bound of (0.37, 0.42), the intersection of the sector types for the hybrid model in Table 7. Compensation is measured in millions of dollars. Thus a manager with a risk aversion parameter between 0.02 and 0.21 would be willing to pay between $8,849 and $92,390 to avoid a gamble with an equal probability of losing or winning one million dollars, whereas a manager with a risk aversion parameter between 0.37 and 0.42 would be willing to pay between $160,870 and $181,710 to avoid the same gamble.
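The willingness-to-pay calculation rests on the constant absolute risk aversion (CARA) premium for a fair gamble. Under utility −exp(−ρw), a 50/50 gamble of ±s has expected utility −cosh(ρs), so the amount an agent would pay to avoid it is ln(cosh(ρs))/ρ. The sketch below applies ρ directly to compensation measured in millions of dollars, without the bond-price adjustment in (44), so its output need not coincide exactly with the dollar figures reported above.

```python
import math


def cara_premium(rho, stake=1.0):
    """Amount a CARA agent pays to avoid a 50/50 gamble of +/- stake.

    Expected utility of the gamble is -cosh(rho * stake), so the certainty
    equivalent is -ln(cosh(rho * stake)) / rho; the premium is its negative.
    """
    return math.log(math.cosh(rho * stake)) / rho


if __name__ == "__main__":
    for rho in (0.02, 0.21, 0.37, 0.42):
        # Stake of $1 million; premium reported in dollars.
        print(rho, round(1e6 * cara_premium(rho)))
```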

Certainty Equivalent Wages
For the purposes of identification and estimation, the parameters of both models can be concentrated to just one primitive, ρ*, but from an economic standpoint the estimates of the preferences for working and shirking, α*_2 and α*_1 (induced by ρ* in estimation), are also informative about the plausibility of the models. Along with ρ* and the bond price b_t, these parameters determine the nonpecuniary costs of effort that offset pecuniary compensation. In particular, the manager's reservation wage to shirk (a certainty equivalent) is given by the expression b_{t+1} ln(α*_1)/[ρ*(b_t − 1)], which is the dynamic extension of the formula for the shirking wage given in Lemma 2.1. Similarly, b_{t+1} ln(α*_2)/[ρ*(b_t − 1)] is the manager's reservation wage to work (derived in an analogous manner); this is the equilibrium wage the firm would pay the manager to work if his actions were observed. In both cases a negative reservation wage means that in equilibrium the manager would pay shareholders for the privilege of holding the job, presumably because managers enjoy wielding power and the other perquisites of executive life. Finally, the difference between the two is the manager's compensating differential from shirking versus working.
We computed 95 percent confidence regions of the identified sets for the unrestricted pure moral hazard model with a common risk aversion parameter and for the restricted hybrid model with a common risk aversion parameter. These regions are determined by substituting the corresponding regions for the risk aversion parameters into the formulae for α*_1 and α*_2. For example, using (13) and (44) we calculate estimates of:
for all ρ* ∈ (0.02, 0.21) by firm and sector type to determine the estimated reservation wage to work in the pure moral hazard model. Table 8 presents the estimated identified set of the manager's reservation wages, to shirk and to work, for both the unrestricted pure and the restricted hybrid models at the median bond price. Because the unrestricted model has a reservation wage for each accounting state, but nonpecuniary benefits do not vary by state in the restricted models, there are twice as many regions to report for the pure moral hazard model as for the hybrid.
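The reservation-wage formulas can be written out as a small sketch. The inputs (the preference parameters α*_1 and α*_2, the risk aversion parameter ρ*, and the bond prices b_t and b_{t+1}) are hypothetical arguments supplied by the user, and the sign convention for the compensating differential (working wage minus shirking wage) is an illustrative choice, since the text does not fix it.

```python
import math


def reservation_wage(alpha_star, rho_star, b_t, b_next):
    """b_{t+1} ln(alpha*) / (rho* (b_t - 1)).

    Pass alpha*_1 for the reservation wage to shirk and alpha*_2 for the
    reservation wage to work.
    """
    return b_next * math.log(alpha_star) / (rho_star * (b_t - 1.0))


def wage_summary(alpha1_star, alpha2_star, rho_star, b_t, b_next):
    """Reservation wages to shirk and to work, and their difference."""
    shirk = reservation_wage(alpha1_star, rho_star, b_t, b_next)
    work = reservation_wage(alpha2_star, rho_star, b_t, b_next)
    # Sign convention (work minus shirk) is an illustrative assumption.
    return {"shirk": shirk, "work": work, "differential": work - shirk}
```

A negative value returned by `reservation_wage` corresponds to the case, discussed in the text, in which the manager would pay shareholders for the job.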
The top panel of Table 8 presents the estimated identified set of the manager's reservation wage to shirk for both models. The confidence interval is quite small in all the cells; the difference between the upper and lower bound is usually less than $20,000. This is because, given the probability distribution of our data, the formula for the shirking wage is not very sensitive to the risk aversion parameter. In the pure moral hazard model the shirking wage is always higher in the good state than in the bad. The differences between the states are invariably more than a million dollars, another reminder that the only way to reconcile our data with a pure moral hazard model sporting a common risk aversion parameter is to assume reservation wages differ by accounting state. In 18 out of 24 firm and sector types the hybrid shirking wage lies between the two pure moral hazard shirking wages, while in the remaining 6 the hybrid shirking wage is less than the region for the shirking wage in the bad state of the pure moral hazard model.²³ In the pure moral hazard model the shirking wage is negative for more than half of the firm types in the bad state, but in the good state the manager would demand positive compensation to shirk in 22 out of 24 firm and sector types. About half the shirking reservation wages in the hybrid model are positive, and because the estimated regions typically lie between the pure moral hazard shirking wages for the two states, the magnitudes for the hybrid are lower and, in our opinion, more plausible.
The bottom panel of Table 8 presents the identified set of the (certainty equivalent) reservation wages for working. In the pure moral hazard model the certainty equivalent compensation is negative for 9 out of 24 firm types in the bad state. Striking as it is, this finding is not surprising. As shown in Table 4, the same 9 firm and sector types have negative average compensation. Since the difference between expected compensation and its certainty equivalent is the risk premium, the former must be higher than the latter for risk averse agents; negative average compensation therefore forces the certainty equivalent to be negative as well. One way of rationalizing why a manager would be willing to pay the firm to work in bad accounting states is to argue that in such states firms diversify from their core competencies to make themselves attractive to managers as a revenue source. We are skeptical that this explanation applies to our firm population, and prefer to interpret this finding as evidence against the pure moral hazard model. There is no need to resort to this implausible explanation to rationalize the hybrid model. Although 4 out of 24 of the firm types have a negative lower bound, the confidence region under the hybrid model always includes an interval consisting only of positive values.

Agency Costs
To gauge the importance of agency in our data we use two measures. The first is the expected gross loss shareholders would incur from the manager shirking. The second is the risk premium, which measures the expected extra compensation paid to managers because of the agency problem. Table 9 depicts estimates of both measures for the pure and hybrid moral hazard models. The first measure, denoted by  1 , is the expected gross output loss to the firm switching from the distribution of abnormal returns for diligent work to the distribution for shirking, that is the dierence between the expected output to the plant from the manager pursuing the firm's goals versus his or her own, before netting out expected managerial compensation given the state (note that in hybrid it is given the state is reported truthfully). It is a function of the likelihood ratio of abnormal returns from shirking versus working. In symbols: where the expectation is over (x, s) . This was computed by numerically integrating over x where appropriate after substituting the 95 percent confidence region for the risk aversion parameters into the appropriate formulae for  g s (x). The top panel of Table 9 presents our estimated set of gross losses for both the pure and hybrid model hazard model. The estimated confidence intervals of gross losses are very tight irrespective of the model specifications; for example the average length of the confidence interval range from 0.86 percent to 1.73 percent for the pure moral hazard model and 0.68 percent to 1.04 percent for the hybrid model. The dierences between the two model specifications are small relatively to relative the variation over firm type using any reasonable of distance. For example the median minimum distance between the confidence intervals for the pure and hybrid moral hazard model is 0.32 percent in the primary sector, 1.74 percent in the service sector, and 1.49 percent in the service sector. 
Meanwhile the variation of the confidence intervals across firm types is much larger: for the pure moral hazard model it ranges between 7.84 percent and 14.89 percent, and for the hybrid model between 17.35 percent and 24.57 percent, depending on the sector. Therefore heterogeneity over firm type matters much more for the estimates of the gross losses than which model specification is used. It is worth noting, however, that the variation across firm types is higher under the hybrid specification than under the pure moral hazard specification. This is further illustrated by applying our measures of the market value of the firms to our estimates of the expected gross loss after integrating out firm type: we take the average over firm types of the bounds of the confidence intervals and then multiply these average bounds by the average market value in Table 2. For the pure moral hazard model the average loss to the firm per year varies from $545 million to $601 million in the primary sector, $918 million to $1.00 billion in the consumer goods sector, and $1.46 billion to $1.66 billion in the services sector. For the hybrid model it varies from $580 million to $648 million in the primary sector, $1.00 billion to $1.07 billion in the consumer goods sector, and $1.42 billion to $1.49 billion in the services sector. Clearly the difference in the estimated dollar value of gross losses between the two specifications is minor compared with their overall magnitudes. Another way of making this point is to note that the average stock market return over this period was roughly 10 percent per annum, so for more than half the firm and sector types in both specifications the expected gross return would have been negative if shareholders had ignored the moral hazard.
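The two calculations behind the gross-loss figures can be sketched as follows, under assumptions that are ours rather than the paper's. First, the sketch takes the expected gross loss to be E[x(1 - g_s(x))], the gap between expected abnormal returns under diligent work and under shirking, and evaluates it by the trapezoid rule for hypothetical normal return densities with a common standard deviation in each state; the result should then be close to the probability-weighted gap in means, 0.13. Second, it converts averaged interval bounds to dollars by multiplying by an average market value; the intervals and the 5 billion dollar market value are hypothetical, not the values in Tables 2 or 9.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def make_state(mean_work, mean_shirk, sd):
    """Hypothetical normal return densities for one state.

    Returns the diligent-work density f_s and the likelihood ratio
    g_s(x) = shirking density / working density, written in closed form
    (equal variances) so it never divides by an underflowed density.
    """
    def f_work(x):
        return normal_pdf(x, mean_work, sd)

    def g(x):
        return math.exp(((x - mean_work) ** 2 - (x - mean_shirk) ** 2) / (2.0 * sd * sd))

    return f_work, g

def expected_gross_loss(states, lo=-5.0, hi=5.0, n=20000):
    """Trapezoid approximation of E[x * (1 - g_s(x))] over (x, s).

    `states` is a list of (probability, f_work, g) triples.
    """
    h = (hi - lo) / n
    total = 0.0
    for prob, f_work, g in states:
        integral = 0.0
        for i in range(n + 1):
            x = lo + i * h
            weight = 0.5 if i in (0, n) else 1.0
            integral += weight * x * (1.0 - g(x)) * f_work(x)
        total += prob * integral * h
    return total

def dollar_bounds(intervals, average_market_value):
    """Average the interval bounds over firm types, then scale by average market value."""
    lows = [low for low, _ in intervals]
    highs = [high for _, high in intervals]
    return (sum(lows) / len(lows) * average_market_value,
            sum(highs) / len(highs) * average_market_value)

# Step 1: first measure for one hypothetical firm type with two states
states = [
    (0.4, *make_state(0.08, -0.02, 0.3)),
    (0.6, *make_state(0.03, -0.12, 0.3)),
]
loss = expected_gross_loss(states)

# Step 2: dollar value from hypothetical intervals (fractions of value) for three firm types
intervals = [(0.08, 0.09), (0.10, 0.11), (0.12, 0.13)]
low, high = dollar_bounds(intervals, 5.0e9)
print(round(loss, 4), f"{low / 1e6:,.0f} to {high / 1e6:,.0f} million dollars")
```

Writing the likelihood ratio in closed form, rather than as a ratio of two evaluated densities, keeps the integrand finite in the tails.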
The second measure is the risk premium. It shows how much the firm would be willing to pay to eliminate the moral hazard problem, and it is a function of the risk aversion parameter, the nonpecuniary utility loss from working, and bond prices. Under a perfect monitoring scheme shareholders would pay the manager the fixed wage $b_{t+1}\ln(\alpha_2)/[\rho(b_t - 1)]$, where $\rho$ is the coefficient of absolute risk aversion and $\alpha_2$ is the taste parameter for working. Hence the expected value of a perfect monitor to shareholders is the difference between expected compensation under the current optimal scheme and its certainty equivalent, which we average over the time periods in the sample. The bottom panel of Table 9 presents our estimates of the identified set for the risk premium. We find that the risk premium is higher in the hybrid than in the pure moral hazard model for every firm type, by several hundred thousand dollars. Evidently the direct effect of the higher estimates of risk aversion for the hybrid model on (59) outweighs its indirect effects transmitted through work preferences in (13) and (31). Despite the quantitative differences between the pure and hybrid models, most of the qualitative comparisons between firm types and industry sectors match up. For both the hybrid and pure moral hazard specifications, after conditioning on firm type, the risk premium is lower in the primary than in the consumer sector, with just one exception, (L, S, L). In both specifications the risk premium for the consumer sector is generally lower than for the service sector. Controlling for assets and employment, all firm types with a higher debt-to-equity ratio have a lower risk premium than their counterparts with a lower debt-to-equity ratio, in both the pure and hybrid models. Thus managers are more uncertain about their compensation, which our framework attributes to moral hazard and hidden information, when the population distribution of stakeholder claims to the firm's assets is tilted towards those who are most affected by firm performance.
As a rule, the CEO of a firm employing more workers is paid a higher risk premium, given total assets and the debt-to-equity category. Not only does this hold for both the pure and hybrid specifications; the two exceptions to this rule, which occur in the primary sector, (L, L, L) versus (L, S, L) and (L, L, S) versus (L, S, S), occur for both specifications as well. The relationship between firm assets and the risk premium is somewhat weaker, but generally speaking, higher firm assets are associated with a higher risk premium.
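The risk premium arithmetic can be sketched as follows. The sketch assumes, as the passage on perfect monitoring suggests, that the certainty equivalent of the optimal scheme in period t equals the perfect-monitoring fixed wage b_{t+1} ln(alpha2) / [rho (b_t - 1)], where rho is the coefficient of absolute risk aversion and alpha2 the taste parameter for working. The function names and every number below are hypothetical illustrations, not estimates.

```python
import math

def perfect_monitoring_wage(alpha2, rho, b_t, b_next):
    """Fixed wage b_{t+1} ln(alpha2) / [rho (b_t - 1)] paid under perfect monitoring."""
    if b_t <= 1.0:
        raise ValueError("bond price b_t must exceed 1")
    return b_next * math.log(alpha2) / (rho * (b_t - 1.0))

def average_risk_premium(expected_wages, bond_prices, alpha2, rho):
    """Average over periods of expected compensation minus its certainty equivalent.

    bond_prices holds b_1, ..., b_{T+1}: one more entry than expected_wages,
    because the period-t wage also needs next period's bond price b_{t+1}.
    """
    premia = [
        ew - perfect_monitoring_wage(alpha2, rho, bond_prices[t], bond_prices[t + 1])
        for t, ew in enumerate(expected_wages)
    ]
    return sum(premia) / len(premia)

# Hypothetical inputs: bond price near 21 (roughly a 5 percent interest rate)
expected_wages = [3.0, 3.0]
bond_prices = [21.0, 21.0, 21.0]
print(average_risk_premium(expected_wages, bond_prices, alpha2=math.e, rho=0.5))
```

With alpha2 = e the log term is one, so each period's certainty equivalent is 21 / (0.5 * 20) = 2.1 and the average premium is 0.9.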
Finally, our findings for the risk aversion parameter set for the hybrid model, the risk premium, and the losses that would be incurred by the firms if they ignored moral hazard are quite close to those found for the pure moral hazard model by Margiotta and Miller (2000), Gayle and Miller (2009) and Gayle, Golan, and Miller (2011), although these papers use different estimation methods and data from different industrial sectors, executive ranks, and time periods. Thus if accounting returns are treated as hidden information in a hybrid model of moral hazard, or if accounting information is ignored by integrating out those states, estimates obtained from structural models of moral hazard applied to executive compensation seem robust to a variety of econometric techniques and data sources. But if accounting data are used in the estimation, the estimates of the social cost of moral hazard are quite sensitive to assumptions about hidden information. Our results favor the hybrid model, which treats accounting data as unverifiable information that shareholders value because the optimal contract gives the manager incentives to reveal his knowledge.

Conclusion
If every piece of information a manager knows about his or her firm were codified and independently verifiable in a court of law, managers could be compelled to reveal all their private information through the firm's accounting records. In that case a multistate pure moral hazard model would apply, different states being distinguished by distinct records. Within the current legal system, however, managers exercise considerable discretion about how much information they release describing the state of their own firms. If the penal code for accounting protocol were augmented by incentives embedded in managerial compensation designed to elicit truthful revelation, a hybrid model of moral hazard would apply.
Our empirical investigation is based on a large panel data set measuring the compensation of chief executive officers, financial and accounting returns, and the size and sector characteristics of the publicly traded firms they manage. In the pure moral hazard models we estimate and test, managers have no discretion over how they report accounting returns. In the hybrid model, we interpret data on accounting returns as information reported by the CEO that cannot be fully corroborated by shareholders. Thus our empirical study compares and contrasts the role of these alternative information assumptions about accounting returns within competing models of moral hazard.
We derive the equilibrium restrictions from optimal contracting to predict the shape of the compensation schedule when there are only hidden actions (pure moral hazard), and when there is hidden information as well (hybrid). These restrictions fully characterize the empirical content of our models. We establish sharp and tight bounds for the risk aversion parameter, and show that all the other parameters can be expressed as mappings of the risk aversion parameter and the probability distribution of the data generating process, for which we have sample analogues. Our estimation and testing procedures are based on inferring the bounds of the risk aversion parameter. The benchmark static model of moral hazard is only partially identified, because every risk parameter that satisfies a single inequality derived from profit maximization generates an observationally equivalent model. The identified set of risk aversion parameters shrinks as we add constraints that impose exclusion restrictions limiting the scope of heterogeneity in multistate models of moral hazard. However, point identification is by no means assured, either in theory or in the confidence intervals we estimate for the identified sets of the risk aversion parameter.
The pure moral hazard model with homogeneous preferences is rejected, but the hybrid moral hazard model with homogeneous preferences is not. The latter is observationally equivalent to a pure moral hazard model with heterogeneous preferences. Yet the hybrid model provides a more satisfying economic explanation of the data than the heterogeneous pure moral hazard model.
The data show that expected compensation for the next period increases with current accounting returns, and that the gradient of compensation in financial returns is higher when the accounting return has been greater. The hybrid model predicts that the expected utility of the agent is higher in the firm's good state than in its bad state. Moreover, to induce truth telling and the reporting of higher earnings when the firm's prospects are good, the principal lowers and flattens the schedule when the agent reports the bad state, reducing expected compensation and making realized compensation less dependent on the outcome. In our application this permits financial and accounting returns data to play bigger roles in explaining compensation. Relatively high estimated values of the risk parameter, which are consistent with previous work on pure moral hazard models that do not exploit the accounting data, reduce the certainty equivalent of compensation in the good accounting state. These features reconcile the hybrid model with the data even when tastes for working and risk attitudes are not allowed to vary with the firm's accounting state.
In contrast to the hybrid model, the pure moral hazard model equalizes expected utility across states. The heterogeneous pure moral hazard model mitigates the effects of curvature differences in compensation schedules across states by making the managers appear almost risk neutral, while attributing the differences in expected compensation across accounting states to nonpecuniary benefits. The risk parameter in the heterogeneous pure moral hazard model is considerably lower than previous findings for pure moral hazard models that do not exploit differences in accounting states. The nonpecuniary benefits from working for the firm in the bad accounting state are so high that the estimated certainty equivalent compensation is negative. But unless work preferences or risk attitudes differ across accounting states, the pure moral hazard framework lacks the degrees of freedom necessary to fit the differently shaped compensation schedules.

A Proofs of Theorems and Lemmas
Proof of Lemma 2.1.
We define $v(x) \equiv \exp[-\rho w(x)]$, where $\rho$ is the coefficient of absolute risk aversion, and note that the participation constraint can be expressed as (60). Similarly, the incentive compatibility constraint for work can be expressed as (61). To minimize expected compensation subject to (60) and (61), we choose $v(x)$ to maximize the Lagrangian (62). Multiplying its first order condition through by $v(x)$, taking expectations, and noting that the complementary slackness condition for incentive compatibility eliminates the incentive term, expresses the expectation of $v(x)$ in terms of the participation multiplier. Substituting this expression into the complementary slackness condition for participation proves that the participation multiplier equals 1, and consequently (64) holds. Thus the first order condition simplifies to (65), where a composite multiplier combines the incentive compatibility multiplier with the ratio of the taste parameters for shirking and working. Substituting $v(x) \equiv \exp[-\rho w(x)]$ and taking logarithms then yields (6), the optimal compensation equation for work. A contradiction argument establishes that the incentive compatibility constraint holds with equality too. Substituting Equation (65) into the incentive compatibility condition and imposing equality gives the solution for the composite multiplier, namely (7). Finally, the optimal contract for shirking is found by setting the incentive compatibility multiplier to zero, substituting the taste parameter for shirking for the taste parameter for working in (62), and solving the first order condition, which gives the fixed wage $\rho^{-1}\log(\alpha_1)$, where $\alpha_1$ is the taste parameter for shirking.

Proof of Theorem 2.1. Upon substituting the true risk-aversion parameter for $\rho$, the expression for the taste parameter for working given in Equation (13) follows directly from (64). Rearranging Equation (10) yields (66). Subtracting Equation (65) from (66) we obtain (67). Appealing to Equation (65) gives (68), and subtracting Equation (68) from (66) we obtain (69). Substituting from (69) into (67) and making $g(x)$ the subject of the equation yields the expression for $g(x)$ evaluated at the true risk-aversion parameter, given in (11). Substituting from (69), and for the taste parameter for working using (64), in Equation (68) yields, upon rearrangement, the expression for the taste parameter for shirking evaluated at the true risk-aversion parameter, given in (12).
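The Lagrangian argument in the proof of Lemma 2.1 can be sketched in a stylized static form. The notation is ours and is an assumption, not necessarily the paper's: $\alpha_1$ and $\alpha_2$ are the taste parameters for shirking and working (the outside option is normalized to utility $-1$), $g(x)$ is the likelihood ratio of shirking to working, $\eta_0$ and $\eta_1$ are the multipliers on participation and incentive compatibility, and expectations are taken under diligent work. Under these assumptions the constraints read $\alpha_2 E[v(x)] \le 1$ and $\alpha_2 E[v(x)] \le \alpha_1 E[v(x) g(x)]$, and minimizing $E[w(x)] = -\rho^{-1}E[\ln v(x)]$ is equivalent to maximizing $E[\ln v(x)]$.

```latex
\begin{aligned}
\mathcal{L} &= E[\ln v(x)] + \eta_0\{1 - \alpha_2 E[v(x)]\}
             + \eta_1\{\alpha_1 E[v(x) g(x)] - \alpha_2 E[v(x)]\}, \\
\frac{1}{v(x)} &= \eta_0 \alpha_2 + \eta_1[\alpha_2 - \alpha_1 g(x)]
  \qquad \text{(pointwise first order condition)}, \\
1 &= \eta_0 \alpha_2 E[v(x)] - \eta_1\{\alpha_1 E[v(x) g(x)] - \alpha_2 E[v(x)]\}
   = \eta_0 \alpha_2 E[v(x)], \\
\alpha_2 E[v(x)] &= 1 \;\Rightarrow\; \eta_0 = 1, \\
w(x) &= \rho^{-1} \ln\{\alpha_2 + \eta_1[\alpha_2 - \alpha_1 g(x)]\}.
\end{aligned}
```

The third line uses complementary slackness for incentive compatibility; because it rules out a zero participation multiplier, participation binds and that multiplier equals one. Setting $\eta_1 = 0$ and replacing $\alpha_2$ with $\alpha_1$ gives the shirking contract, a fixed wage of $\rho^{-1}\ln\alpha_1$.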
Proof of Theorem 2.2. Since the objective function in (62) is strictly concave and the constraints are linear, the first order and complementary slackness conditions in this Kuhn-Tucker formulation uniquely determine the solution to the optimal contract. We prove the theorem by showing that $v(x, \rho)$ satisfies the first order conditions for the Lagrangian (62), and that the complementary slackness conditions are satisfied when the Kuhn-Tucker multipliers on participation and incentive compatibility, both functions of $\rho$, are defined as in (70) and (71), respectively. From their respective definitions both multipliers are strictly positive.
2. From the definition of the taste parameter for working in (13), the participation constraint in (60) is met with equality. From (70), the participation multiplier is positive. Therefore the complementary slackness condition for participation is satisfied. Noting from (71) that the incentive compatibility multiplier is positive, it follows from the definitions of the two taste parameters and of $g(x, \rho)$ given in (11) through (13) that the incentive compatibility constraint holds with equality. Therefore the complementary slackness condition for incentive compatibility is satisfied.
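A numerical sketch of the contract characterized in Lemma 2.1 and Theorem 2.2, on a finite outcome space. The formulation is a stylized assumption of ours: outcomes have probabilities `f` under diligent work, `g` is the likelihood ratio of shirking to working (averaging to one under `f`), `alpha1 < alpha2` are hypothetical taste parameters with the outside option normalized to utility -1, participation reads alpha2 * E[v] <= 1, incentive compatibility reads alpha2 * E[v] <= alpha1 * E[v * g], and v = exp(-rho * w). Because the participation multiplier equals one, the first order condition leaves one unknown multiplier `eta1`, which the code finds by bisection until incentive compatibility binds; participation then binds automatically.

```python
import math

def solve_static_contract(f, g, alpha1, alpha2, rho, iters=200):
    """Stylized static pure moral hazard contract on a finite support (sketch).

    Returns (eta1, v, w), where v[i] = exp(-rho * w[i]).
    Assumes alpha1 < alpha2 and sum(f) == 1.
    """
    def v_of(eta1):
        # First order condition with the participation multiplier set to one:
        # 1 / v(x) = alpha2 + eta1 * (alpha2 - alpha1 * g(x))
        return [1.0 / (alpha2 + eta1 * (alpha2 - alpha1 * gi)) for gi in g]

    def ic_slack(eta1):
        # alpha1 * E[v g] - alpha2 * E[v]; nonnegative when IC holds
        v = v_of(eta1)
        return sum(fi * vi * (alpha1 * gi - alpha2) for fi, vi, gi in zip(f, v, g))

    worst = max(alpha1 * gi - alpha2 for gi in g)
    if worst <= 0:
        raise ValueError("diligence cannot be induced: alpha1 * g(x) <= alpha2 everywhere")
    # Largest eta1 that keeps every 1/v(x) positive, shrunk slightly
    lo, hi = 0.0, (alpha2 / worst) * (1.0 - 1e-9)
    if ic_slack(lo) >= 0 or ic_slack(hi) <= 0:
        raise ValueError("no sign change for the bisection")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ic_slack(mid) < 0:
            lo = mid
        else:
            hi = mid
    eta1 = 0.5 * (lo + hi)
    v = v_of(eta1)
    w = [-math.log(vi) / rho for vi in v]
    return eta1, v, w

# Hypothetical three-outcome example: shirking makes the low outcome likelier
f = [0.25, 0.5, 0.25]
g = [2.0, 0.8, 0.4]
eta1, v, w = solve_static_contract(f, g, alpha1=1.2, alpha2=1.5, rho=0.5)
print(eta1, w)
```

Compensation rises on outcomes with a low likelihood ratio, and both constraints hold with equality up to rounding, mirroring the complementary slackness argument in the proof.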
Proof of Lemma 2.2. There are three steps. First we show that inequality (73) holds for all $\rho > 0$. Then we show that if $\operatorname{cov}[x, e^{-\rho w^o(x)}] < 0$, then (74) holds. Finally we construct a joint distribution for $(x, w)$ in which the covariance is negative. Upon combining the inequalities, the lemma follows from the definition of $Q_0(\rho)$ given in (17).
1. Since $e^{-\rho w}$ is convex in $w$, Jensen's inequality implies $E[e^{-\rho w}] \ge e^{-\rho E[w]}$. Taking logarithms of both sides, dividing through by $\rho$ and rearranging yields (75). But from (13) and the discussion following (14), (76) holds. Combining Inequalities (75) and (76), we obtain (73) upon substituting the expression for the taste parameter for shirking given by (12).
for all positive , it now follows that: 3. Suppose the probability density function for x is symmetric, let E [x] = 0 and w o (x) = x. Then: Proof of Lemma 3.1. Multiplying each first order equation in the text by  s v s (x)f s (x) , then summing and integrating over x yields: where we make use of the complementary slackness conditions. Substituting for  0 = E [v s (x)] 1 into the complementary slackness condition for participation then gives the first numbered item in the lemma. Multiplying the first order conditions for the second state by v 2 (x), after solving for  0 we obtain: Taking the expectation with respect to x conditional on the second state occurring, and noting the incentive compatibility constraint is satisfied with equality in both states, yields: 1. Since the participation constraint is met with equality in the optimal contract: 2. Substituting the solution for  0 into the first order condition for the second state yields: Taking expectations we obtain: Dierencing the second two equations: Upon rearrangement, we appeal to the result in Item 2, that  2 =  2 (  ) to obtain: we substitute the solution for  2 above into the first order condition for the second state evaluated at the limit x   to obtain: or, upon appealing to Lemma 3.1: Making  1 the subject of the equation: 5. To prove  4 =  4 (  ) we first multiply the first order conditions for the first state by v 1 (x), after solving for  0 ()  E [v s (x, )] 1 to obtain: Conditioning on the first state and taking expectations with respect to x yields: since the incentive compatibility condition drops out. Substituting out the solution for: we obtained from Lemma 3.1 reduces this expression to: Upon collecting terms: so solving for  4 we now have: 6. 
Proving  3 =  3 (  ) follows directly from Lemma 3.1, which implies: , rewrite the first order condition for the first state as: At the limit x   we have: Making  1 the subject of the equation now demonstrates  1 =  1 (  ).
8. Dierencing the first order condition for the first state and its limit as x   gives: Dividing both sides by  1 we thus establish g 1 (x) = g 1 (x,   ).
Proof of Theorem 3.2. The proof follows the same steps as the proof of Theorem 2.2. First we define candidate values for the Kuhn-Tucker multipliers as functions of $\rho$ and establish that they are positive. Then we show that if $\rho$ belongs to the set defined in the theorem, the first order conditions for the optimization problem in (29) are satisfied in both states. Finally we demonstrate that the complementary slackness conditions are also satisfied. Since the objective function of the underlying maximization problem is strictly concave and the constraints are linear, the first order and complementary slackness conditions in the Kuhn-Tucker formulation uniquely determine the solution to the optimal contracting problem, thus proving the theorem.
2. From the definitions of the taste parameters for shirking and working, the likelihood ratio $g_2(x, \rho)$ and the parameter indexed 2, all functions of $\rho$, it follows that: From the definition of the parameter indexed 3 we have: Subtracting the first equation from the second and substituting, we obtain the first order condition for the second state in the hybrid model, given by the second line of (30). Turning to the first state, the definition of $g_1(x, \rho)$ implies: From the definition of the parameter indexed 1: Substituting out the term involving the parameter indexed 3 in the expression above for the parameter indexed 1 times $g_1(x, \rho)$, and using the fact that the participation multiplier equals $E[v_s(x, \rho)]^{-1}$, now yields the first line of (30) upon rearrangement, which is the first order condition for the first state.

By construction of the candidate solution, the participation constraint is met with equality, and hence the complementary slackness condition for participation is satisfied. The complementary slackness conditions for the truth telling and sincerity constraints are imposed directly by the requirement that $\rho$ belong to the set defined in the theorem. We now show that the remaining two complementary slackness conditions are satisfied. In the second state, we again appeal to the fact that the definitions of the taste parameters, $g_2(x, \rho)$ and the parameter indexed 2 are identical to their counterparts in the pure moral hazard model, which implies from Item 2 in the moral hazard case that: Multiplying this equation by $v_2(x, \rho)$ and taking expectations conditional on the second state yields: proving from (24) that the complementary slackness condition for incentive compatibility in the second state holds.
Multiplying the first line of (30), the first order condition for the first state, by $v_1(x, \rho)$, using the identity that the participation multiplier equals $E[v_s(x, \rho)]^{-1}$, and taking the expectation conditional on the first state yields: Successively substituting the definitions of the parameters indexed 3 and 4 into the right side of this equation proves that both sides of the equation are zero. Comparing the left side of the equation with (24), it now follows that the complementary slackness condition for incentive compatibility in the first state also holds.
Proof of Lemma 4.1. For notational convenience, and without loss of generality, we suppress the dependence of compensation $w_{nt}$ on $(s_{nt}, b_t, b_{t+1})$. Let $\tilde{x}$ denote net excess returns, $x$ gross excess returns, $w(x)$ the compensation schedule as a mapping from gross excess returns, and $V$ the value of the firm at the beginning of the period. By our definition of net and gross excess returns: Suppose that for some $(\tilde{x}_0, V_0)$ there exist two distinct values of gross excess returns, denoted $x_1 \in \mathbb{R}$ and $x_2 \in \mathbb{R}$, satisfying Equation (78). Then: But this possibility is ruled out in the premise of the lemma. Therefore a unique solution to the relation defined by Equation (78) exists for each pair $(\tilde{x}, V)$, and we can denote the solution mapping by: The lemma now follows because the measurement error on compensation is assumed to be independent of (

B A Dynamic Hybrid Model
This appendix develops the notation for a dynamic version of the hybrid model of moral hazard, writes down the feasibility constraints for the optimization problem, and then shows that the optimal contract mimics the optimal contract for a static model under the parameter transformation given in the text.

B.1 Assumptions and notation
At the beginning of period $t$ the manager is paid compensation, denoted by $w_t$, for his work in the previous period, denominated in period-$t$ consumption units. He makes his consumption choice, a positive real number denoted by $c_t$, and the board proposes a new contract. The board announces how managerial compensation will be determined as a function of what he will disclose about the firm's prospects, denoted by $r_t \in \{1, 2\}$, and of its subsequent performance, measured by abnormal returns $x_{t+1}$ revealed at the beginning of the next period. We denote this mapping by $w_{rt}(x)$, the subscript $t$ indicating that the optimal compensation schedule may depend on current economic conditions, such as bond prices. Then the manager chooses whether to be engaged by the firm or outside it, either with another firm or in retirement. Denote this decision by the indicator $l_{t0} \in \{0, 1\}$, where $l_{t0} = 1$ if the manager chooses to be engaged outside the firm and $l_{t0} = 0$ if he chooses to be engaged inside the firm. If the manager accepts employment with the firm, so that $l_{t0} = 0$, the prospects of the firm are fully revealed to the manager but partially hidden from the shareholders. There are two states, drawn independently over time, and the first state occurs with probability $\pi_1 \in (0, 1)$. For convenience we denote the probability of the second state by $\pi_2 \equiv 1 - \pi_1$. We assume that the manager privately observes the true state $s_t \in \{1, 2\}$ in period $t$, information that affects the distribution of the firm's abnormal returns next period, and reports the state $r_t \in \{1, 2\}$ to the board. If the manager discloses the second state, meaning $r_t = 2$, the board can independently confirm or refute it; thus if $s_t = 1$ he reports $r_t = 1$. If $s_t = 2$ the manager either truthfully declares or lies about the firm's prospects by announcing $r_t \in \{1, 2\}$, effectively selecting one of the two schedules $w_{1t}(x)$ or $w_{2t}(x)$.
The manager then makes his unobserved labor effort choice for period $t$, denoted by $l_{stj} \in \{0, 1\}$ for $j \in \{1, 2\}$, which may depend on his private information about the state. There are two possibilities: to diligently pursue the shareholders' objective of value maximization by working, thus setting $l_{st2} = 1$, or to accept employment with the firm but follow the objectives he would pursue if he were paid a fixed wage, by setting $l_{st1} = 1$, called shirking. Let $l_{st} \equiv (l_{t0}, l_{st1}, l_{st2})$. Since leaving the firm, working and shirking are mutually exclusive activities, $l_{t0} + l_{st1} + l_{st2} = 1$.
At the beginning of period $t+1$ abnormal returns for the firm, $x_{t+1}$, are drawn from a probability distribution that depends on the true state $s_t$ in period $t$ and on the manager's action then, $l_{st}$. We denote the probability density function for abnormal returns when the manager works diligently and the state is $s$ by $f_s(x)$. Similarly, let $f_s(x) g_s(x)$ denote the probability density function for abnormal returns when the manager shirks. Thus for both states $s_t \in \{1, 2\}$: the inequality reflecting the preference of shareholders for diligent work over shirking. Since $f_s(x) g_s(x)$ is a density, $g_s(x)$ is positive, and integrating $f_s(x) g_s(x)$ with respect to $x$ demonstrates that $E_s[g_s(x)] = 1$, where $E_s$ denotes the expectation with respect to $f_s(x)$. As in the text we assume: We make similar assumptions about the weighted likelihood ratio of the second state occurring relative to the first, given any observed value of excess returns $x \in \mathbb{R}$, by assuming: The manager's wealth is endogenously determined by his consumption and compensation. We assume there is a complete set of markets for all publicly disclosed events, effectively attributing all deviations from the law of one price to the particular market imperfections under consideration. Let $b_t$ denote the price of a bond that pays a unit of consumption each period from period $t$ onwards, relative to the price of a unit of consumption in period $t$; to simplify the exposition we assume $b_{t+1}$ is known in period $t$. Preferences over consumption and work are parameterized by a utility function exhibiting constant absolute risk aversion that is additively separable over periods and multiplicatively separable with respect to consumption and work activity within periods. In the model we estimate, lifetime utility can be expressed as
$$-\sum_{t} \beta^{t} \left(\alpha_0 l_{t0} + \alpha_1 l_{st1} + \alpha_2 l_{st2}\right) \exp(-\rho c_t),$$
where $\beta$ is the constant subjective discount factor, $\rho$ is the constant coefficient of absolute risk aversion, and $\alpha_j$ is a utility parameter that measures the distaste for working at level $j \in \{0, 1, 2\}$.
As in the text we assume $\alpha_2 > \alpha_1$ and normalize $\alpha_0 = 1$.
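Because $f_s(x) g_s(x)$ is itself a density, the likelihood ratio must average to one under $f_s(x)$. A toy check on a hypothetical three-point support (the numbers are ours, chosen only for illustration) confirms this identity, and also the ordering of mean abnormal returns that reflects shareholders' preference for diligence:

```python
# Hypothetical discrete support for abnormal returns in one state s
xs = [-0.2, 0.0, 0.2]
f_work = [0.2, 0.5, 0.3]      # density under diligent work, f_s(x)
f_shirk = [0.4, 0.45, 0.15]   # density under shirking, f_s(x) * g_s(x)

# Likelihood ratio g_s(x) = shirking density / working density
g = [fs / fw for fs, fw in zip(f_shirk, f_work)]

mean_g = sum(fw * gi for fw, gi in zip(f_work, g))               # E_s[g_s(x)]
mean_x_work = sum(fw * x for fw, x in zip(f_work, xs))           # mean return, working
mean_x_shirk = sum(fw * gi * x for fw, gi, x in zip(f_work, g, xs))  # mean return, shirking
print(mean_g, mean_x_work, mean_x_shirk)
```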

B.2 Feasibility constraints
The cornerstone of the constraint formulation that circumscribes the minimization problem shareholders solve is the indirect utility function for a manager choosing between immediate retirement and retirement one period hence. Lemma B.1 states this indirect utility function in terms of the utility he would receive from retiring immediately. To state the lemma, let $r_t(s)$ denote the manager's disclosure rule about the state when the true state is $s \in \{1, 2\}$.
Lemma B.1 If the manager, offered a contract of $w_{rt}(x)$ for announcing $r$, retires in period $t$ or period $t+1$ by setting $(1 - l_{t0})(1 - l_{t+1,0}) = 0$, then upon observing the state $s$ and reporting $r_t(s)$ he optimally chooses $l_{st} \equiv (l_{t0}, l_{st1}, l_{st2})$ to minimize:

Had he truthfully disclosed the true state $s_t$ in period $t$, the manager would receive $w_{st}(x)$ as compensation if abnormal returns $x$ are realized at the end of period $t+1$. Suppressing the bond price $b_{t+1}$ for expositional convenience, and recalling our assumption that $b_{t+1}$ is known in period $t$, we now let $v_{st}(x)$ measure how (the negative of) utility is scaled up by $w_{st}(x)$: To induce an honest, diligent manager to participate, his expected utility from employment must be at least as high as the utility he would obtain from retirement. Setting $(l_{t2}, r_t) = (1, s_t)$ in (81) and substituting in $v_{st}(x)$, the participation constraint is thus: Given his decision to stay with the firm one more period and truthfully reveal the state, the incentive compatibility constraint induces the manager to prefer working diligently to shirking. Substituting the definition of $v_{st}(x)$ into (81) and comparing the expected utility obtained from setting $l_{t1} = 1$ with the expected utility obtained from setting $l_{t2} = 1$ for any given state, we obtain the incentive compatibility constraint for diligence as: In the hybrid model, information hidden from shareholders further restricts the set of contracts that can be implemented. Comparing the expected utility from lying about the second state and working diligently with the expected utility from reporting honestly in the second state and working diligently, we obtain the truth telling constraint: An optimal contract also induces the manager not to understate the state and shirk in the second state, behavior we describe as sincere.
Comparing the manager's expected utility from lying and shirking with the utility from reporting honestly and working diligently, the sincerity condition reduces to: where $(\alpha_1/\alpha_2)^{1/(b_t - 1)} v_{1t}(x)$ is proportional to the utility obtained from shirking and announcing the first state, and $f_2(x) g_2(x)$ is the probability density function associated with shirking when the second state occurs.

B.3 Optimal contracting
We first prove that the short-term optimal contract for the dynamic model has a static analogue of the form we describe in the text, and then show that the long-term contract decomposes into a sequence of short-term contracts. As in the static model, deriving $w_{st}(x)$ to minimize the expected compensation of inducing diligent work in both states subject to the five constraints is equivalent to choosing $v_{st}(x)$ to maximize: subject to the same five constraints. To achieve diligent work and truth telling, shareholders maximize: where the multipliers indexed 0 through 4 are the shadow values assigned to the linear constraints. Setting: establishes by inspection that the solution to the static model solves the transformed problem, as claimed in the text. In this framework there are no gains from a long-term arrangement between shareholders and the manager. Lemma B.2 verifies that the assumptions of Fudenberg, Holmstrom and Milgrom (1990) are met, thus establishing that the long-term optimal contract decentralizes to a sequence of short-term contracts solved by the problem above.

Lemma B.2 Denote by $\tau$ the manager's date of retirement. The optimal long-term contract can be implemented by a $\tau$-period replication of the optimal short-term contract.
Proof of Lemma B.1.
Let $\psi_r$ be the date-$t$ price of a contingent claim on a consumption unit at date $r$, implying that the bond price is defined as: and let $q_t$ denote the date-$t$ price of a security that pays off the random quantity: From Equation (15) on page 680 of Margiotta and Miller (2000), the value to a manager with current wealth endowment $e_{nt}$ from announcing state $r_t(s)$ in period $t$ when the true state is $s$, choosing effort level $l_{st2}$ in anticipation of compensation $w_{r_t(s)t}(x)$ at the beginning of period $t+1$, when he retires one period later, is: The corresponding value from choosing effort level $l_{st1}$ is: whereas from their Equation (8) on page 678, the value from retiring immediately is: Dividing each expression through by the retirement utility, it immediately follows that the manager chooses $l_{st} \equiv (l_{t0}, l_{st1}, l_{st2})$ to minimize the negative of expected utility: Since $l_{t0} \in \{0, 1\}$ and $b_t > 1$, the solution to this optimization problem also solves: Multiplying through by the factor $(\alpha_1 l_{st1} + \alpha_2 l_{st2})^{1/(b_t - 1)}$ and summing over the two states $s \in \{1, 2\}$ yields the minimand in Lemma B.1.

Proof of Lemma B.2. In our model the proof of Proposition 5 in Margiotta and Miller (2000) can be simply adapted to show that Theorem 3 of Fudenberg, Holmstrom and Milgrom (1990) applies, thus demonstrating that the long-term optimal contract can be sequentially implemented. An induction completes the proof, by establishing that the sequential contract implementing the optimal long-term contract for a manager who will retire in $\tau$ periods replicates the one-period optimal contract. In the optimal short-term contract, the participation constraint is satisfied with strict equality, which implies that at the beginning of period $\tau - 1$ the expected lifetime utility of the manager is determined by setting $t = \tau - 1$ in the equation: Suppose that, for some $k < \tau - 1$, at the beginning of all periods $t \in \{k + 1, k + 2, \ldots, \tau - 1\}$ the expected lifetime utility of the manager is given by Equation (90). We first show that the expected lifetime utility of the manager at $k$ is also given by Equation (90). From Lemma 2.1 the problem shareholders solve at $k$ is identical to the short-term optimization problem solved in the text. In the solution to each cost minimization subproblem for the four $(L_{1t}, L_{2t})$ choices, the manager's participation constraint is met with equality. Consequently the expected lifetime utility of the manager at $k$ is given by Equation (90), as claimed. Therefore the problem of participating at time $k$ and possibly continuing with the firm for more than one period reduces to the problem of participating at time $k$ for at most one period, solved in Lemma B.1. The induction step now follows.
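The property $b_t > 1$ used in the proof of Lemma B.1 follows from the bond's definition: the first payment, in period $t$ itself, is worth exactly one unit of period-$t$ consumption, and every later payment adds a positive amount. A sketch, assuming the bond price reduces to the sum of contingent-claim prices from $t$ onwards divided by the date-$t$ price, truncated at a finite horizon, with a hypothetical flat 5 percent interest rate (for which the untruncated price is 1.05 / 0.05 = 21):

```python
def bond_price(claim_prices, t=0):
    """Date-t price of a bond paying one consumption unit in every period r >= t,
    relative to a unit of date-t consumption: sum over r >= t of psi_r / psi_t.

    `claim_prices` is a truncated sequence of contingent-claim prices psi_r.
    """
    base = claim_prices[t]
    return sum(p / base for p in claim_prices[t:])

# Hypothetical flat 5 percent interest rate, truncated at 2,000 periods
rate = 0.05
psi = [(1.0 + rate) ** (-r) for r in range(2000)]
b0 = bond_price(psi)
print(b0, (1.0 + rate) / rate)  # truncated sum versus perpetuity closed form
```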

C Implementing the Restrictions of the Pure Moral Hazard and Hybrid Models
This appendix extends the discussion in Section 5.1 of estimating and testing the unrestricted pure moral hazard model to the other models analyzed in our empirical investigation. First we show that the set of admissible risk-aversion parameters shrinks when we impose the restrictions that tastes for working or shirking do not change with the state and vary only with the bond price, as specified in the text. Then we characterize the set of restrictions on the risk-aversion parameter implied by the fully restricted hybrid moral hazard model as defined in the text.

C.1 Restrictions on the pure moral hazard model
In the unrestricted pure moral hazard model, it follows from Section 5.1 that: With reference to the corresponding equations in the text, we now define the taste parameters for the dynamic version of the pure moral hazard model as: We investigated how the confidence region for the risk-aversion parameter shrinks when we impose the restrictions that the taste parameters for shirking and working do not change with the state $s \in \{1, 2\}$ or with time $t \in \{1, \ldots, T\}$.
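Each of these restrictions can be written as a deviation function that equals zero exactly when the restriction holds. A minimal Python sketch, assuming the taste parameters implied by one candidate value of the risk-aversion parameter are stored in a hypothetical array `alpha[j, s, t]` (the array layout and function names are illustrative, not the paper's notation); time deviations are measured relative to the first period:

```python
import numpy as np

# Hypothetical layout: alpha[j, s, t] is the taste parameter for one candidate value of
# the risk-aversion parameter, with j = 0 (shirking) or 1 (working), s = 0 or 1 (state),
# and t = 0, ..., T - 1 (period).


def state_deviations(alpha: np.ndarray) -> np.ndarray:
    """alpha[j, 0, t] - alpha[j, 1, t]; identically zero if tastes do not vary by state."""
    return alpha[:, 0, :] - alpha[:, 1, :]


def time_deviations(alpha: np.ndarray) -> np.ndarray:
    """alpha[j, s, t] - alpha[j, s, 0]; identically zero if tastes do not vary over time."""
    return alpha - alpha[:, :, :1]


alpha = np.empty((2, 2, 4))
alpha[0] = 1.5  # shirking taste, constant across states and periods
alpha[1] = 0.8  # working taste, constant across states and periods
print(np.abs(state_deviations(alpha)).max(), np.abs(time_deviations(alpha)).max())  # 0.0 0.0
```

The tastes for shirking and working may differ from each other; the restrictions only require each to be constant across states or periods.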
To impose the restriction that the taste parameter for shirking does not vary by state, we define real-valued functions of the risk-aversion parameter, one for each period $t$, as: Thus, to find a confidence region for the risk parameter under the null hypothesis that the tastes for shirking and working ($j \in \{1, 2\}$) do not vary by state, we augment (91) and find those values of the risk parameter that bring a sample analogue of the following criterion close to its lower bound of zero: The results from separately imposing these two sets of restrictions are reported in Table 6 of the text. Essentially the same procedure can be used to constrain the tastes for shirking or working to remain constant over time. Defining: it immediately follows that the taste for shirking in state $s$ at date $t$ equals its date-1 value when the function just defined equals zero. Similarly, the taste for working at date $t$ equals its date-1 value when the analogous function, defined as: equals zero. This restriction implies that the function for working equals zero for all $t \in \{1, 2, \ldots, T\}$ and $s \in \{1, 2\}$. Thus the confidence region for the risk parameter under the null hypothesis that the tastes do not vary over time can be found by constructing a sample analogue of: and, using the methods we describe below, selecting those values of the risk parameter that bring the criterion function close to zero. Sample analogues are formed in the same way as in the unrestricted model, and the testing procedures follow the same steps. As reported in the text, the intersection of the regions across the firm types is empty, implying that under pure moral hazard tastes vary either over time or across states. For the sake of completeness, Table A1 displays the regions that …

C.2 Restrictions on the hybrid moral hazard model
The restrictions in the hybrid model are imposed in a similar way. Throughout our analysis of the hybrid model we maintain the null hypothesis that the taste parameters for working and shirking, both mappings of the risk-aversion parameter, do not vary by state or time. These restrictions are maintained because, under the null hypothesis, the intersection of the estimated confidence intervals for the risk-aversion parameter across the 24 sectors is not empty.
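The selection step shared by C.1 and C.2 can be sketched as a grid search. This minimal Python illustration rests on stated assumptions: the criterion functions, the scaling factor `scale` (standing in for the derived rate of convergence) and `critical_value` are hypothetical placeholders, and restrictions are imposed by adding squared deviation functions, which is one simple way to augment a non-negative criterion and not necessarily the form used in the text. Because the added term is non-negative, the restricted region can only be smaller. Intersecting regions across groups (firm types or sectors) then gives the region under the null that the parameter is common to all groups, and an empty intersection rejects that null.

```python
import numpy as np


def confidence_region(grid, criterion, scale, critical_value):
    """Grid points whose scaled sample criterion does not exceed the critical value."""
    values = np.array([criterion(rho) for rho in grid])
    return grid[scale * values <= critical_value]


def augment(criterion, deviations):
    """Add the sum of squared restriction deviations to a base criterion."""
    return lambda rho: criterion(rho) + float(np.sum(np.square(deviations(rho))))


def intersect_regions(regions):
    """Intersect regions drawn from a common grid; an empty result rejects the null."""
    common = regions[0]
    for region in regions[1:]:
        common = np.intersect1d(common, region)
    return common


# Illustrative criteria for two hypothetical groups (firm types or sectors)
grid = np.linspace(0.0, 2.0, 201)


def crit_a(rho):
    return (rho - 0.6) ** 2


def crit_b(rho):
    return (rho - 0.8) ** 2


region_a = confidence_region(grid, crit_a, scale=100, critical_value=4.0)
region_b = confidence_region(grid, crit_b, scale=100, critical_value=4.0)
# Hypothetical restriction whose deviation vanishes at rho = 0.7
restricted_a = confidence_region(grid, augment(crit_a, lambda rho: np.array([rho - 0.7])), 100, 4.0)
common = intersect_regions([region_a, region_b])
```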
To develop notation for an econometric framework that accommodates a panel in which bond prices vary over time, as opposed to a cross section or a steady-state economy with constant interest rates, we extend our notation as follows. Appealing to Equations (47) and (48), define taste parameters that are independent of the states: Similarly, the likelihood ratio for the second state is defined as: We then define the four date-$t$ Lagrange multipliers, as functions of the risk-aversion parameter, by substituting the risk-aversion parameter divided by $b_{t+1}$ for the risk-aversion parameter, the date-$t$ taste parameters for their time-invariant counterparts, and $g_{2t}(x, \cdot)$ for $g_2(x, \cdot)$ into the corresponding equations, and hence define: We are now in a position to define the date-$t$ set by substituting the date-$t$ multipliers, and the date-$t$ versions of the remaining functions indexed by $i$ and $k$, for their time-invariant counterparts in its definition. To impose the restriction that none of the parameters vary over time, we take the intersection over $t$: In the hybrid model, the components of $Q(\cdot)$ are formed from the probability density functions characterizing abnormal returns conditional on the firm's characteristics and the manager's report, that is $f_s(x)$, and from the nonlinear regression function of compensation on abnormal returns and the same set of variables, denoted by $w_{st}(x)$. In the previous sections we described our estimates of the compensation scheme, $w_{st}(x)$. We constructed a confidence region for the hybrid model in a similar way to the pure moral hazard model. That is, substituting the sample analogue $Q^{(N)}(\cdot)$ into the corresponding equation in the text, we derived the rate of convergence and numerically computed the critical value using the same procedures laid out in the text.
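A minimal Python sketch of the two steps just described, under simplifying assumptions. It supposes that the substitution of the risk-aversion parameter divided by $b_{t+1}$ enters only through the argument of a hypothetical per-period criterion, and that a candidate is kept in period $t$ when that criterion is within a tolerance `tol` of zero. Time invariance is then imposed by intersecting the per-period regions over $t$. The criteria, bond prices and function names are illustrative, not data or the paper's code.

```python
import numpy as np


def period_mask(grid, period_criterion, next_bond_price, tol):
    """Boolean mask: the period-t criterion, evaluated at rho / b_{t+1}, is within tol of zero."""
    values = np.array([period_criterion(rho / next_bond_price) for rho in grid])
    return values <= tol


def region_over_time(grid, period_criteria, bond_prices, tol):
    """Intersect per-period masks; bond_prices[t + 1] plays the role of b_{t+1} for period t."""
    mask = np.ones(grid.shape, dtype=bool)
    for t, criterion in enumerate(period_criteria):
        mask &= period_mask(grid, criterion, bond_prices[t + 1], tol)
    return grid[mask]


grid = np.linspace(0.0, 2.0, 201)
criteria = [lambda r: (r - 0.3) ** 2] * 2  # hypothetical, identical in both periods
stable = region_over_time(grid, criteria, bond_prices=[2.0, 2.0, 2.0], tol=0.0025)
shifted = region_over_time(grid, criteria, bond_prices=[2.0, 2.0, 4.0], tol=0.0025)
```

With constant bond prices the per-period regions coincide; when $b_{t+1}$ changes, the same scaled criterion picks out different values in each period, and the intersection can become empty.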
Computing the second and third date-$t$ restriction terms requires us to solve for $w^{(1,0)}_s$ and $w^{(0,1)}_s$, evaluated at $x$ and at the risk-aversion parameter divided by $b_{t+1}$, for each candidate value of the risk-aversion parameter; this is a nonlinear problem involving two Lagrange multipliers. If the states $s \in \{1, 2\}$ and the effort level $(l_1, l_2)$ were observed by shareholders, they would optimally offer compensation proportional to $b_{t+1}/(b_t - 1)$ times the logarithm of the date-$t$ taste parameter for shirking when the manager shirks, and the analogous amount based on the taste parameter for working when he works diligently. The profits from this hypothetical arrangement are therefore: from shirking in the second state and working diligently in the first, and: from shirking in the first state and working diligently in the second. Since neither of these cost minimization problems imposes the truth-telling, sincerity and one incentive compatibility constraint, but each has the same objective function as its constrained counterpart, it now follows that each hypothetical profit is at least as large as the corresponding constrained profit. Consider the set of values of the risk-aversion parameter formed by excluding the second and third restrictions for all $t$. By construction: Now consider the set formed by intersecting this set with the corresponding restrictions based on the hypothetical profits for all $t \in \{1, 2, \ldots, T\}$. Since each hypothetical profit is at least as large as the corresponding constrained profit, it immediately follows that: In our empirical application we found that the confidence region obtained by excluding the second and third restrictions coincided with the confidence region obtained by additionally imposing the restrictions based on the hypothetical profits. In other words, imposing those restrictions for all $t \in \{1, 2, \ldots, T\}$ did not shrink the region that excludes the second and third restrictions, implying from (96) that: In this way we computed the confidence region for the set that imposes all the restrictions without solving for $w^{(1,0)}_s$ and $w^{(0,1)}_s$.
Compensation is in thousands of 2000 US dollars.
† Firm type is measured by the triple (A, W, D), where A is assets, W is the number of workers, and D is the debt-to-equity ratio; each element records whether that variable is above (L) or below (S) its industry average.
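The argument that computes the confidence region without solving for $w^{(1,0)}_s$ and $w^{(0,1)}_s$ is a sandwich. If imposing the bounds built from the hypothetical profits leaves the region unchanged, the region that excludes the two restrictions must equal the fully restricted one. A minimal Python sketch on boolean masks over a grid of candidate values follows. It assumes, as the argument requires, that the bound-based restrictions are at least as demanding as the restrictions they replace, so the intersected set is contained in the true set, which is in turn contained in the set that excludes the two restrictions. `pin_down` and its inputs are hypothetical.

```python
import numpy as np


def pin_down(outer_mask, bound_mask):
    """Return the true admissible mask when the sandwich bounds coincide, else None.

    outer_mask: candidates satisfying every restriction except the two that require
    solving the nonlinear compensation problems.
    bound_mask: candidates satisfying the restrictions built from the hypothetical
    (less constrained) profits, assumed at least as demanding as the ones replaced.
    Under that assumption, (outer & bound) <= true <= outer, as sets.
    """
    inner_mask = outer_mask & bound_mask
    if np.array_equal(inner_mask, outer_mask):
        return outer_mask  # bounds coincide, so the true set is pinned down
    return None  # bounds differ; the two restrictions would have to be computed directly


outer = np.array([False, True, True, True, False])
print(pin_down(outer, np.array([True, True, True, True, False])))
```

When the bounds do not coincide the sketch returns `None`, reflecting that the sandwich alone is then uninformative about which candidates in the gap belong to the true set.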