Experimentation and Job Choice

In this article, we examine optimal job choices when jobs differ in the rate at which they reveal information about workers’ skills. We then analyze how the optimal level of experimentation changes over a worker’s career and characterize job transitions and wage growth over the life cycle. Using the Dictionary of Occupational Titles merged with the National Longitudinal Survey of Youth 1979, we then construct an index of how much information different occupations reveal about workers’ skills and document patterns of occupational choice and wage growth that are consistent with a trade-off between information and wages.

We analyze how the optimal level of experimentation changes over a worker's career and how experimentation affects job transitions and wage growth over the life cycle. Then, using data from the Dictionary of Occupational Titles (DOT) merged with the National Longitudinal Survey of Youth 1979 (NLSY79), we construct an index of how much information different occupations reveal about workers' skills and document patterns of occupational choice and wage growth that are consistent with experimentation.
The theoretical and empirical literature on uncertainty in the labor market primarily focuses on models of matching (see, e.g., Jovanovic 1979; Miller 1984) and models of learning (see, e.g., Farber and Gibbons 1996; Gibbons and Waldman 1999; Neal 1999; Gibbons et al. 2005). Our model differs from standard matching models because, in our model, workers' productivity is not match specific. In addition, our model differs from standard learning models in that we allow different jobs to convey different amounts of information about workers' skills. Two recent papers, Papageorgiou (2010) and Pastorino (2010), estimate models of experimentation. These papers do not use an explicit measure of the extent to which different occupations reveal information about workers' skills. Thus, we are able to more directly establish the link between occupational choice, learning, and wage dynamics. Several recent papers also use data from the DOT to characterize occupations (see, e.g., Autor, Levy, and Murnane 2003; Ingram and Neumann 2006; Poletaev and Robinson 2008; Bacolod and Blum 2010; Yamaguchi 2010). These papers, however, do not consider whether employers can observe whether workers possess the skills needed in a given occupation. In contrast, our primary goal is to capture the extent to which different occupations are likely to reveal information about workers' skills.
In our model, workers choose a job in every period to maximize the expected present discounted value of lifetime income. In each job, the more output depends on the unobserved skill, the more information the job reveals about that skill. For example, this might correspond to the case in which workers learn more about their ability as a manager in jobs in which output depends on managerial ability. Workers value information because it increases the probability that they will be assigned to the job at which they are the most productive. Thus, workers experiment, forgoing expected current-period output in order to learn about their skill. We find that workers are more likely to experiment at the beginning of their career, when there is considerable uncertainty about skill. The optimal level of experimentation, however, is initially small, increases as workers gain experience, and then declines as workers become increasingly certain about their skill. The decline in experimentation at the end of a worker's career is intuitive: as uncertainty about workers' skills falls, so too does the value of experimentation. The increase in experimentation in the early stages of a worker's career is driven by the fact that when there is a lot of dispersion in workers' prior beliefs about their skill, marginal increases in information do little to increase the probability that workers are correctly assigned to jobs in the future. 1

The trade-off between information and current output is similar to that in matching models (such as Jovanovic [1979] and Miller [1984]). In Miller's study, the mean and the variance of the prior distribution of match quality differ across occupations. Since workers learn about match quality more quickly in high-variance occupations than in low-variance occupations, workers may be willing to enter into occupations in which expected match quality is low as long as the prior variance of match quality is high enough. This form of experimentation primarily takes place early in a worker's career. Our model differs because workers do not face a binary decision about whether or not to learn about a specific match (workers either enter an occupation or do not); instead, in our model, workers choose both whether to learn and how much to learn.
In addition to characterizing patterns of experimentation over the life cycle, we also examine our model's implications for wage dynamics. We show that, unlike most standard models of wage growth in which wages increase because of either human capital accumulation or improvements in match quality, in our model, wage growth is also partially driven by the eventual decline in experimentation. 2 Further, we show that random productivity shocks can have long-lasting effects on both wages and wage growth. In particular, workers who receive negative productivity shocks may be reassigned to jobs that reveal little about their skill and in which wage growth is slow. As a result, luck may lead to different career trajectories, even in the long run. Further, since new information has the largest effect on prior beliefs when workers are young, productivity shocks have the largest impact early in workers' careers.
To test our model's predictions, we use data from the DOT to create an index that ranks occupations by the degree to which output depends on unobserved skill, and we merge this index with occupational work histories from the NLSY79. We show that our model's predictions are consistent with observed wage and job mobility patterns in the NLSY79. We find that workers do not start in jobs that are likely to reveal a great deal of information about their skill. In addition, although, on average, workers move into jobs that depend more on skill, a substantial fraction transition into jobs that depend less on skill and, consistent with experimentation, still experience wage increases. In addition, workers who experiment more at the beginning of their careers have faster wage growth and greater wage dispersion than workers who experiment less but earn higher wages, again suggesting a trade-off between information and wages.

1 The fact that patterns of experimentation are often nonmonotonic and complex is a manifestation of the famous Radner and Stiglitz (1984) result that the value of information is nonconcave. This result is generalized by Chade and Schlee (2002).

2 One exception is the model of Harris and Holmstrom (1982), who examine an environment in which firms provide risk-averse workers with partial insurance against negative productivity shocks. In their model, wages rise over workers' careers because as uncertainty about worker ability falls, so does the cost of insuring the worker against future wage cuts.
A handful of other papers also consider experimentation in the labor market. Ortega (2001) builds a two-period model of job rotation and shows that expected productivity is higher when firms learn about workers through job rotation rather than through fixed job assignments. Ortega does not, however, fully characterize the optimal job rotation policy. Felli and Harris (1996, 2006) and Pastorino (2009) characterize wages and firm turnover in models of experimentation in a strategic framework in which learning is inefficient because of competition over scarce talent. In contrast, we characterize the optimal level of experimentation in a nonstrategic framework in order to focus on the trade-off between current-period productivity and information. 3

While few papers consider experimentation in the labor market, there is a large theoretical literature on optimal experimentation in other contexts. These papers, however, use different payoff functions and information acquisition processes, and thus their results are not directly applicable to our setting.
The article is organized as follows. Section II describes the labor market and the structure of the information in the model. Section III characterizes experimentation and job assignments in a two-period model in order to highlight the trade-off between information and current-period output. Then, to characterize job transitions and wage dynamics over the life cycle, Section IV presents the solution to the infinite-horizon problem. Section V describes the data. Section VI presents empirical regularities consistent with our model's predictions, and Section VII presents conclusions.

II. Model
We consider an economy with infinitely lived, risk-neutral workers and firms with a common discount factor \(\delta\). Workers differ in the set of skills they possess. In principle, this skill set may be multidimensional and include skills such as creativity, diligence, adaptability, and so forth. To focus on essentials, however, we examine a simple scenario in which each worker has only two skills: a known skill, \(k\), and an unknown skill, \(\theta\), both of which are time invariant. For simplicity we assume that each firm offers one job. Each job differs in the extent to which output depends on \(k\) and \(\theta\). There are \(N\) different types of jobs, each completely characterized by a given value of \(\alpha\), where \(\alpha\) denotes the degree to which output depends on \(\theta\), relative to \(k\). Thus, choosing a job in period \(t\) is equivalent to choosing a value of \(\alpha\). Given this choice, we assume that output in period \(t\) is given by

\[
y_t = \alpha_t \theta + (1 - \alpha_t) k + \varepsilon_t, \tag{1}
\]

where \(\varepsilon_t\) is an independent and identically distributed (i.i.d.) productivity shock, \(\alpha_t\) denotes the value of \(\alpha\) chosen by the worker at time \(t\), and \(\alpha_t \in \{\alpha_1, \ldots, \alpha_N\}\), where \(\alpha_1 = 0\), \(\alpha_N = 1\), and \(\alpha_j < \alpha_{j+1}\) for all \(j\). Thus, there is one job in which output is sensitive only to \(\theta\) (\(\alpha_N = 1\)) and one job in which output is sensitive only to \(k\) (\(\alpha_1 = 0\)). For the rest of the \(N - 2\) jobs, the higher \(\alpha_j\), the more output depends on \(\theta\).

Information in the model is symmetric; firms and workers have common priors on \(\theta\), \(k\) is known to everyone, and output is commonly observed. Workers and firms acquire additional information about a worker's unknown skill through successive observations of output. Thus, having observed output, workers and firms calculate

\[
x_t = \frac{y_t - (1 - \alpha_t) k}{\alpha_t} = \theta + \frac{\varepsilon_t}{\alpha_t}, \tag{2}
\]

where \(x_t\) serves as a signal of the worker's unobserved skill, \(\theta\). The noise in \(x_t\) is not independent of a worker's job choice. In particular, the higher \(\alpha_t\), the higher the signal-to-noise ratio and the more information about \(\theta\) the market is able to extract from \(x_t\). Under the assumption that the prior distribution of \(\theta\) at time \(t\) is normal with mean \(\mu_t\) and variance \(\sigma_t^2\), and with \(\varepsilon_t\) normal with mean zero and variance \(\sigma_\varepsilon^2\), the posterior distribution of \(\theta\) is normal with mean and variance

\[
\mu_{t+1} = \frac{\sigma_\varepsilon^2 \mu_t + \alpha_t^2 \sigma_t^2 x_t}{\alpha_t^2 \sigma_t^2 + \sigma_\varepsilon^2}, \tag{3}
\]

\[
\sigma_{t+1}^2 = \frac{\sigma_t^2 \sigma_\varepsilon^2}{\alpha_t^2 \sigma_t^2 + \sigma_\varepsilon^2}. \tag{4}
\]

Viewed from period \(t\), before \(x_t\) is observed, the posterior mean is distributed as

\[
\mu_{t+1} \sim N(\mu_t, s_{t+1}^2), \tag{5}
\]

\[
s_{t+1}^2 = \frac{\alpha_t^2 \sigma_t^4}{\alpha_t^2 \sigma_t^2 + \sigma_\varepsilon^2}. \tag{6}
\]

Thus, the posterior mean of \(\theta\) follows a martingale, and the more information \(x_t\) reveals about \(\theta\) (the higher \(\alpha\) is), the higher the variance of the posterior mean.

3 Our information structure is similar to that of Felli and Harris (1996, 2006) in that jobs that depend more on skill are more informative. Pastorino (2009) explains the patterns of promotions of managers found in Baker, Gibbs, and Holmstrom (1994a, 1994b), considering the case in which high-level jobs are less informative about ability than low-level jobs.
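The belief updating described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a normal prior and normal shocks with variance `var_eps`; the function names are hypothetical and are not taken from the article.

```python
def update_beliefs(mu, var, alpha, x, var_eps=1.0):
    """Posterior mean and variance of the unknown skill after observing signal x.

    Assumes a normal prior N(mu, var) and x = skill + eps / alpha,
    with eps ~ N(0, var_eps). Names are illustrative only.
    """
    if alpha == 0:
        # Output does not depend on the unknown skill, so nothing is learned.
        return mu, var
    denom = alpha**2 * var + var_eps
    mu_next = (var_eps * mu + alpha**2 * var * x) / denom
    var_next = var * var_eps / denom
    return mu_next, var_next


def spread_of_posterior_mean(var, alpha, var_eps=1.0):
    """Variance of next period's posterior mean, viewed before the signal arrives."""
    return alpha**2 * var**2 / (alpha**2 * var + var_eps)


# Example: prior mean 6, prior variance 16, job alpha = 1, observed signal 8.
mu_next, var_next = update_beliefs(6.0, 16.0, 1.0, 8.0)
print(mu_next, var_next, spread_of_posterior_mean(16.0, 1.0))
```

A useful check on the sketch is that the spread of the posterior mean equals the reduction in posterior variance, so a more informative job moves beliefs more on average precisely because it resolves more uncertainty.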
Timing in the model is as follows: at the beginning of each period, workers announce a job choice, firms make take-it-or-leave-it wage offers, and each worker accepts an offer. We assume competitive markets and free entry into the labor market.
Given their prior beliefs about \(\theta\) and given a worker's job choice, \(\alpha_t\), firms pick a wage policy to maximize the present discounted value of future profit. Workers' current-period utility is given by \(U_t = w_t\), so workers choose \(\alpha_t\) to maximize the expected present discounted value of lifetime wages. We assume spot contracts. Thus, given free entry and symmetric information, wages will equal a worker's expected productivity in each period. In addition, it is straightforward to show that if firms (instead of workers) determined job assignments, the equilibrium outcome would be the same (proof available on request).

III. Optimal Job Choice in a Two-Period Model
Consider a worker who works for two periods and then retires. Assume that there is a continuum of jobs \(\alpha \in [0, 1]\). For simplicity and without loss of generality, assume that \(\varepsilon \sim N(0, 1)\). Thus, the worker's problem can be written as

\[
\max_{\alpha_t \in [0, 1]} \; \alpha_1 \mu_1 + (1 - \alpha_1) k + \delta\, E\left[\alpha_2 \mu_2 + (1 - \alpha_2) k \mid \mu_1, \sigma_1, \alpha_1\right],
\]

which we can solve recursively beginning from the second period.

Proposition 1. The second-period optimal choice of job is given by

\[
\alpha_2^{*} =
\begin{cases}
1 & \text{if } \mu_2 \ge k, \\
0 & \text{if } \mu_2 < k.
\end{cases}
\]

The second-period job assignment is the solution to a static problem in which the worker maximizes his or her expected wage. Since workers are paid their expected productivity, they choose the job in which their expected productivity is the highest.
Next, we solve for the optimal assignment in period 1. Note that expected productivity in period 2 depends on the second-period belief, \(\mu_2\), which in turn depends on \(\alpha_1\) through \(x_1\) (see eqq. [2] and [3]). We therefore rewrite the first-period problem as

\[
\max_{\alpha_1 \in [0, 1]} \; \alpha_1 \mu_1 + (1 - \alpha_1) k + \delta\, E\left[\max\{\mu_2, k\} \mid \mu_1, \sigma_1, \alpha_1\right].
\]

If \(\mu_1 < k\) and the worker chooses \(\alpha_1 > 0\), expected current-period output will be less than \(k\). Thus, when \(\mu_1 < k\), workers must weigh the benefit of increasing \(\alpha_1\) in terms of expected second-period output against the cost in terms of expected current-period output.

To see why expected second-period output is increasing in \(\alpha_1\), note that the expected value of any left-truncated normal random variable is increasing in the variance of that random variable. Thus, since \(s_2^2\) is increasing in \(\alpha_1\), we know that expected second-period output must also be increasing in \(\alpha_1\). Intuitively, information is valuable because workers can insure themselves against the arrival of negative information about \(\theta\) by selecting future jobs with \(\alpha = 0\) but can take advantage of the arrival of positive information about \(\theta\) by selecting jobs with \(\alpha = 1\).

If a worker chooses to forgo expected current-period output in order to gain information about \(\theta\), then we say that the worker experiments. Proposition 2 establishes that for any \(\mu_1 < k\), experimentation is beneficial if there is sufficient uncertainty about a worker's skill.
Proposition 2. A worker experiments if \(\mu_1 < k\), but the worker chooses \(\alpha_1 > 0\). For every \(-\infty < \mu_1 < k\) and any \(\alpha_1 > 0\), there exists a large enough \(\sigma_1\) such that the value of experimenting is greater than the value of not experimenting. That is,

\[
\alpha_1 \mu_1 + (1 - \alpha_1) k + \delta\, E\left[\max\{\mu_2, k\} \mid \mu_1, \sigma_1, \alpha_1\right] > (1 + \delta) k.
\]

The intuition is that even when \(\theta\) is believed to be very low, if there is sufficient uncertainty about \(\theta\), then the probability that \(\theta > k\) is high enough that it is worth forgoing current-period output to gain additional information about \(\theta\)'s true value.
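To make the comparison concrete, the following sketch evaluates the two-period value of a given first-period job using the closed form for the expected maximum of a normal variable and a constant. It assumes a shock variance of one and purely illustrative parameter values (prior mean 6, known skill 7, discount factor 0.9); the function names are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_second_period_output(mu1, var1, alpha1, k):
    """E[max(mu2, k)] when mu2 ~ N(mu1, s2) and the shock variance is 1."""
    s = math.sqrt(alpha1**2 * var1**2 / (alpha1**2 * var1 + 1.0))
    if s == 0.0:
        return max(mu1, k)  # no learning: beliefs stay at mu1
    z = (mu1 - k) / s
    return k + (mu1 - k) * norm_cdf(z) + s * norm_pdf(z)


def two_period_value(mu1, var1, alpha1, k, delta):
    """Expected first-period output plus discounted expected second-period output."""
    current = alpha1 * mu1 + (1.0 - alpha1) * k
    return current + delta * expected_second_period_output(mu1, var1, alpha1, k)


# Illustrative values: with a diffuse prior, experimenting beats the safe job;
# with a tight prior, it does not.
for var1 in (0.25, 16.0):
    safe = two_period_value(6.0, var1, 0.0, 7.0, 0.9)
    risky = two_period_value(6.0, var1, 0.5, 7.0, 0.9)
    print(var1, round(safe, 3), round(risky, 3))
```

With the prior mean below the known skill, the job with a positive weight on the unknown skill only wins when the prior variance is large, which is the pattern the proposition describes.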
Next, we characterize the optimal solution. The first-order necessary condition for an interior solution is

\[
\delta\, \phi\!\left(\frac{\mu_1 - k}{s_2}\right) \frac{\partial s_2}{\partial \alpha_1} = k - \mu_1, \tag{10}
\]

where \(\phi(\cdot)\) is the standard normal pdf. Thus, in an interior solution, the marginal benefit of experimentation in terms of second-period output is equal to the marginal cost in terms of first-period output. Note that when \(\mu_1 > k\), the cost of increasing \(\alpha_1\) is negative (the right-hand side of eq. [10] is negative). Thus, when \(\mu_1 > k\), both first-period and second-period expected output are increasing in \(\alpha_1\), and there will be a corner solution at \(\alpha_1 = 1\).

Figures 1 and 2 illustrate the optimal job choice as a function of the state variables, \(\sigma_1\) and \(\mu_1\). When \(\sigma_1\) is held constant, the higher the prior mean of \(\theta\), the higher the optimal choice of \(\alpha_1\). The optimal choice of \(\alpha_1\), however, is not always increasing in the prior variance of \(\theta\). 4 To understand why there is a nonmonotonic relationship between \(\alpha_1\) and \(\sigma_1\), recall that the current-period expected output does not depend on \(\sigma_1^2\); this implies that the nonmonotonic relationship between \(\sigma_1^2\) and the optimal choice of \(\alpha_1\) must depend solely on how increases in \(\alpha_1\) affect expected future output. In particular, the effect of increasing \(\alpha_1\) on expected future output must be low both when \(\sigma_1^2\) is small and when \(\sigma_1^2\) is large. When \(\sigma_1^2\) is small, the option value of new information is low because new information on \(\theta\) is unlikely to have a large impact on the posterior mean of \(\theta\). Moreover, the expected loss of output due to incorrect future job assignments is small because the likelihood that \(\theta\) is much different from \(\mu_1\) is small. To see why the benefit of increasing \(\alpha_1\) is also small when \(\sigma_1^2\) is large, recall that an increase in \(\alpha_1\) increases expected future output through its effect on \(s_2\), the spread of \(\mu_2\). As is clear from the expression for \(s_2^2\), when there is considerable uncertainty about a worker's skill, the spread of \(\mu_2\) is large, and experimentation has little value on the margin since increased information has little effect on the optimal job assignment in the second period. Figure 3 describes the marginal effect of an increase in \(\alpha_1\) on second-period output as a function of \(\sigma_1^2\), and proposition 3 establishes this formally.

Proposition 3. The marginal value of increasing \(\alpha_1\) in terms of second-period output shrinks to zero both as \(\sigma_1\) becomes arbitrarily small and as \(\sigma_1\) becomes arbitrarily large. That is,

\[
\lim_{\sigma_1 \to 0} \frac{\partial E\left[\max\{\mu_2, k\}\right]}{\partial \alpha_1} = 0
\quad \text{and} \quad
\lim_{\sigma_1 \to \infty} \frac{\partial E\left[\max\{\mu_2, k\}\right]}{\partial \alpha_1} = 0.
\]
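The two limits can be checked numerically. The sketch below assumes a shock variance of one and a strictly positive first-period job choice, and uses a marginal value of the form "standard normal density at the standardized gap, times the derivative of the spread of the posterior mean"; this expression is derived here from the normal updating formulas and is not quoted from the article. The parameter values are illustrative.

```python
import math


def marginal_value_of_information(mu1, sigma1, alpha1, k):
    """Derivative of E[max(mu2, k)] with respect to alpha1 (alpha1 > 0, shock variance 1)."""
    var1 = sigma1**2
    # Spread of next period's posterior mean and its derivative in alpha1.
    s = alpha1 * var1 / math.sqrt(alpha1**2 * var1 + 1.0)
    ds_dalpha = var1 / (alpha1**2 * var1 + 1.0) ** 1.5
    z = (mu1 - k) / s
    density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return density * ds_dalpha


# Illustrative values: prior mean 6, known skill 7, alpha1 = 0.5.
for sigma1 in (0.1, 2.0, 100.0, 10000.0):
    print(sigma1, marginal_value_of_information(6.0, sigma1, 0.5, 7.0))
```

The marginal value is hump shaped in the prior standard deviation: it is negligible for a very tight prior, peaks at intermediate uncertainty, and falls back toward zero as the prior becomes very diffuse.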

IV. Optimal Job Choice in an Infinite-Period Model
We now extend the model above to incorporate an infinite time horizon so that we can fully characterize the evolution of wages and job assignments over the life cycle. We solve the model numerically and assume that there are a finite number of job "types" (i.e., \(\alpha_t\) is discrete). Since the worker's decision problem in period \(t\) is the same as his problem in period 1, except that he updates his prior beliefs about \(\theta\) on the basis of the history of productivity signals, \(\{x_{t-1}, \ldots, x_1\}\), according to equations (3) and (4), the worker's problem is stationary. Thus, we can write the value function, \(V(\cdot, \cdot)\), as the solution to a Bellman equation in which the control variable is \(\alpha\) and in which the state variables are \(\mu\) and \(\sigma^2\):

\[
V(\mu_t, \sigma_t^2) = \max_{\alpha_t} \left\{ \alpha_t \mu_t + (1 - \alpha_t) k + \delta \int V(\mu_{t+1}, \sigma_{t+1}^2)\, f(\mu_{t+1})\, d\mu_{t+1} \right\}, \tag{11}
\]

where \(f\) denotes the normal pdf with mean \(\mu_t\) and variance \(s_{t+1}^2\); the dependence of \(\sigma_{t+1}^2\) and \(s_{t+1}^2\) on \(\alpha_t\) and the state variables is given in equations (4) and (6). The first two terms on the right-hand side of (11) represent expected current-period output, and the last term represents the continuation value, which incorporates the value of information obtained from observing \(x_t\).

As in the two-period model, there is a trade-off between current-period output and information. In the context of equation (11), the benefit of increasing \(\alpha_t\) is reflected in the continuation value and the fact that \(s_{t+1}^2\), the variance of \(\mu_{t+1}\), is increasing in \(\alpha_t\). In contrast to the two-period model, however, \(\alpha_t\) also affects the continuation value through its effect on \(\sigma_{t+1}^2\). That is, information affects future patterns of experimentation.

In this subsection, we characterize optimal job assignments and verify that the basic qualitative properties of the two-period model hold. We refer to \(\alpha = 0\) as the low-level job and \(\alpha = 1\) as the high-level job. In addition, we refer to all jobs with \(\alpha \in (0, 1)\) as intermediate-level jobs. A key feature of our model is that the productivity signal, \(x_t\), provides information about a worker's productivity at many jobs. Thus, we cannot appeal to solution techniques developed in the literature on independent multi-armed bandit problems (because the "arms" in our problem are dependent). 5 Instead, we solve our problem numerically. 6

Figure 4 illustrates optimal job choices in the \(N\)-job model for a given set of parameter values. As the figure reveals, the key qualitative properties of the two-period model carry over. First, there is a trade-off between current wages and information. Second, as in the two-period model, with \(\sigma_t^2\) held constant, the higher the prior mean of \(\theta\), the higher the optimal choice of \(\alpha_t\). Finally, \(\alpha_t\) may increase as the prior variance of \(\theta\) falls. To see this, note that the frontier along which workers are indifferent between choosing \(\alpha_t = 0.5\) and \(\alpha_t = 1\) is positively sloped when the prior variance of \(\theta\) is relatively high. 7 Thus, if young workers are very uncertain about their skills, then with the prior mean fixed to be about 6.25, the optimal level of experimentation will be low at the early stages of workers' careers (\(\alpha_t = 0.5\)) but will increase as uncertainty begins to fall. Eventually, as uncertainty falls further, experimentation will again decrease (\(\alpha_t = 0\) when \(\sigma_t\) approaches zero). Thus, as in the two-period model, the relationship between \(\alpha_t\) and \(\sigma_t\) is nonmonotonic.

In standard matching models, for example, Miller (1984), in which there is an occupation-specific skill and productivity across occupations is uncorrelated, inexperienced workers are also more likely to experiment early on, choosing "risky" occupations in which they learn quickly about their skill but receive low wages. Similarly, in our model, inexperienced workers (those with high prior variance) are more likely to experiment; for example, in figure 4, the frontier between \(\alpha = 0\) and \(\alpha = 0.5\) is negatively sloped, implying that the likelihood of experimentation falls as \(\sigma_t^2\) falls. However, as discussed above, in our model, the optimal level of experimentation is initially low, increases in the early stages of workers' careers, and eventually falls as uncertainty about workers' skills disappears.

5 For further discussion, see Gittins and Jones (1974).

6 By the contraction mapping theorem, the value function in (11) is unique and can be obtained by iteration of T. The problem is solved using standard numerical methods.
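One way to solve a problem of this kind is value function iteration on a grid. The sketch below is a rough illustration under assumptions that are not taken from the article: three jobs (0, 0.5, and 1), a shock variance of one, a known skill of 7, a discount factor of 0.9, Gauss-Hermite quadrature for the expectation, linear interpolation in the prior mean, and a cap on how precise beliefs can become. It is not the authors' numerical procedure, and the function name is hypothetical.

```python
import numpy as np


def solve_job_choice(k=7.0, delta=0.9, var0=16.0, n_prec=40, tol=1e-6, max_iter=2000):
    """Value function iteration over (belief precision, prior mean).

    Assumes shock variance 1, so precision 1/var rises by alpha**2 each period.
    With alpha in {0, 0.5, 1} that increment is a multiple of 0.25, which lets
    the precision grid be hit exactly (an assumption made for convenience).
    """
    alphas = np.array([0.0, 0.5, 1.0])
    step = 0.25
    jumps = np.rint(alphas**2 / step).astype(int)   # rows moved on the precision grid
    prec = 1.0 / var0 + step * np.arange(n_prec)
    var = 1.0 / prec
    mu_grid = np.linspace(-10.0, 30.0, 201)
    nodes, weights = np.polynomial.hermite.hermgauss(15)
    weights = weights / np.sqrt(np.pi)              # weights now sum to one

    # Initial guess: value of picking the myopically best job forever.
    V = np.tile(np.maximum(mu_grid, k) / (1.0 - delta), (n_prec, 1))
    for _ in range(max_iter):
        Q = np.empty((len(alphas), n_prec, len(mu_grid)))
        for a_idx, (a, jump) in enumerate(zip(alphas, jumps)):
            flow = a * mu_grid + (1.0 - a) * k      # expected current-period output
            for i in range(n_prec):
                s = np.sqrt(a**2 * var[i] ** 2 / (a**2 * var[i] + 1.0))
                j = min(i + jump, n_prec - 1)       # cap precision at the last row
                mu_next = mu_grid[:, None] + np.sqrt(2.0) * s * nodes[None, :]
                v_next = np.interp(mu_next.ravel(), mu_grid, V[j]).reshape(mu_next.shape)
                Q[a_idx, i] = flow + delta * (v_next @ weights)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = alphas[Q.argmax(axis=0)]
    return mu_grid, var, V, policy


mu_grid, var, V, policy = solve_job_choice()
col = int(np.argmin(np.abs(mu_grid - 6.0)))
print("prior variance -> chosen job at prior mean 6:")
for row in (0, 4, len(var) - 1):
    print(round(float(var[row]), 3), policy[row, col])
```

Tracking precision rather than variance keeps the second state variable on the grid after every update, so interpolation is needed only in the prior mean. The grid bounds and the precision cap are approximations, and the results near those edges should be read with caution.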
To illustrate job transitions over the life cycle, we simulate job choices over time for workers with \(\mu_0 = 3\) and \(\sigma_0 = 6\) in a model in which \(\delta = 0.9\), \(k = 7\), \(\sigma_\varepsilon^2 = 1\), and \(\alpha \in \{0, 0.2, 0.5, 0.7, 1\}\). As figure 5 reveals, given this initial mean and variance, all workers start out with \(\alpha = 0.5\). Note, however, that this is a transitory job. As workers gain experience and become more certain about whether \(\theta\) is greater than or less than \(k\), they increasingly sort into jobs in which \(\alpha = 1\) or \(\alpha = 0\). In addition, notice that some workers who select \(\alpha = 1\) do so because they wish to experiment, whereas others do so because \(\alpha = 1\) maximizes their expected current-period output. Workers who are assigned to \(\alpha = 1\) but experiment (\(\mu < k\)) are marked by (E).

Proposition 4. At the beginning of the life cycle, workers may work in jobs in which \(0 \le \alpha \le 1\). As they accumulate experience, they sort into jobs that depend more heavily on one of the skills, and in the limit, workers choose either \(\alpha = 1\) or \(\alpha = 0\).

The proof is in the appendix. The intuition for this proposition is clear. The first part follows directly from the optimal solution (see, e.g., fig. 4).

Wage growth in our model occurs as workers learn about \(\theta\) and sort into the job at which their expected productivity is the highest. Moreover, since experimentation involves a loss in expected current-period wages, wage growth is also driven by the eventual decline in experimentation.

7 Notice that the frontier between \(\alpha_t = 0\) and \(\alpha_t = 0.5\) is an "indifference" curve in which the value function is the same along the curve. However, along the frontier between \(\alpha_t = 0.5\) and \(\alpha_t = 1\), the value function differs at different points on the frontier. In particular, the higher the prior mean and prior variance, the higher the value function.

8 These patterns hold for a larger number of jobs with \(0 < \alpha < 1\) as well.
To illustrate this, we simulate the wage distribution when \(k = 7\), \(\mu_0 = 6\), \(\sigma_0 = 4\), \(\delta = 0.9\), and \(\alpha \in \{0, 0.2, 0.5, 0.7, 1\}\), so that the optimal job assignment is initially \(\alpha = 0.7\). Figure 6 shows the resulting percentiles of the wage distribution for 10 periods into the future. Notice that experimentation initially leads some workers to earn less than they would earn if they were assigned to \(\alpha = 0\). For example, the wage at the 5th percentile is below \(k = 7\) for the first four periods, but this bottom tail (any wage less than \(k\)) disappears as workers stop experimenting. In addition, as in most learning models, the figure shows increasing cohort wage dispersion.
To illustrate the trade-off between current wage and future earnings, in figure 7 we repeat the simulation but set \(\mu_0 = 4.3\). Relative to workers with \(\mu_0 = 6\), these workers will start with a lower initial \(\alpha\) (\(\alpha_0 = 0.2\) instead of \(\alpha_0 = 0.7\)) and will have higher initial wages (\(w_0\) = $6.50 instead of \(w_0\) = $6.30). Over time, however, they will have lower wage growth and wage dispersion.
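Because wages equal expected productivity, the initial wages in this comparison can be checked with simple arithmetic. The sketch below assumes the wage is the weighted average of the prior mean of the unknown skill and the known skill, with the job's weight on the unknown skill; the function name is illustrative.

```python
def expected_wage(alpha, mu, k):
    """Expected productivity in a job with weight alpha on the unknown skill."""
    return alpha * mu + (1.0 - alpha) * k


# Known skill 7 in both cases.
print(round(expected_wage(0.7, 6.0, 7.0), 2))  # more experimentation, prior mean 6
print(round(expected_wage(0.2, 4.3, 7.0), 2))  # less experimentation, prior mean 4.3
```

Under this formula the first wage is 6.30 and the second is about 6.46, close to the $6.50 quoted in the text, which presumably reflects rounding. The worker with the lower prior mean earns more at the start because he or she places less weight on a skill that is believed to be below the known skill.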
Our model also suggests that i.i.d. productivity shocks, especially those early in a worker's career, have a persistent effect on earnings. It is a common feature of all learning models that past output realizations affect current beliefs about workers' skills, and on average, workers who receive positive productivity shocks (\(\varepsilon_t > 0\)) will have higher wages than those who receive negative productivity shocks (\(\varepsilon_t < 0\)), at least for any finite time horizon.
In contrast with previous literature, however, in our model, negative productivity shocks have longer-lasting effects than positive productivity shocks. Workers who receive negative productivity shocks are more likely to choose jobs with a low \(\alpha\) and hence will acquire less information about \(\theta\) and will be slower to sort into the job at which they are the most productive. As a result, workers who receive negative productivity shocks will have not only lower wages but also slower wage growth. In addition, since no information is revealed when \(\alpha = 0\), there will always exist a subset of workers for whom \(\theta\) is never fully learned, even in the limit. 9 Thus, experimentation serves as a propagation mechanism. Further, the effect of luck is especially pronounced early in a worker's career since new information has the largest effect on beliefs when there is considerable uncertainty about \(\theta\).
To see the effect of luck on wages, we assign all workers \(\theta = 8\) and simulate the wage distribution when \(k = 7\), \(\mu_0 = 6\), \(\sigma_0 = 4\), \(\delta = 0.9\), and \(\alpha \in \{0, 0.2, 0.5, 0.7, 1\}\). Given that \(\theta > k\), if \(\theta\) were known, all workers would be assigned to \(\alpha = 1\) and would earn a wage of $8. Uncertainty about \(\theta\), however, leads to departures from this full-information outcome, and figure 8 shows the percentiles of the wage distribution for 10 periods into the future. Given the parameter values, all workers initially are assigned to \(\alpha = 0.7\) and earn a wage of $6.50.

Figure 8 demonstrates several points. First, the exceptionally high wages captured by the 85th and 95th percentiles result from "good luck" (high realizations of \(\varepsilon\)) and the fact that productivity signals are highly influential early in a worker's career. Continued learning, however, leads wages for these individuals to converge to $8, and in the limit, all workers not assigned to \(\alpha = 0\) earn $8. Second, convergence to a wage of $8 is slower at the bottom than at the top of the wage distribution because learning is slower for those who initially experience bad luck and are assigned to relatively low-\(\alpha\) jobs. Third, since by period 7 the median worker earns a wage close to the true productivity, the long-run per-period wage loss of workers incorrectly assigned to \(\alpha = 0\) is roughly \(\theta - k = 1\). Finally, note that the percentages along the line of \(k = 7\) represent the cumulative probability of incorrect assignment and that the marginal increase in this probability decreases with experience. Thus, the probability of incorrect assignment is greatest early in a worker's career.
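The claim that learning is slower in low-weight jobs can be illustrated by tracking how quickly uncertainty shrinks when a worker stays in one job. The sketch below assumes normal updating with a shock variance of one and an initial prior variance of 16 (reading the initial value of 4 in the simulation as a standard deviation); the function name is hypothetical.

```python
def posterior_variance(var0, alpha, periods, var_eps=1.0):
    """Variance of beliefs about the unknown skill after `periods` signals from one job.

    Each signal adds alpha**2 / var_eps to the precision of beliefs.
    """
    precision = 1.0 / var0 + periods * alpha**2 / var_eps
    return 1.0 / precision


for alpha in (0.0, 0.2, 0.7, 1.0):
    print(alpha, round(posterior_variance(16.0, alpha, 10), 3))
```

After ten periods, a worker in the job with weight 0.2 still faces far more uncertainty than a worker in the job with weight 1, and a worker in the job with weight 0 has learned nothing, which is why an early run of bad luck can leave beliefs stuck.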

V. Data and Empirical Implementation
To construct the data used in our empirical analysis, we create an index that ranks occupations by the degree to which output depends on unobserved skills. We then merge this measure of \(\alpha\) with occupational work histories from the NLSY79 in order to construct life cycle patterns of \(\alpha\) and wages.

A. The Dictionary of Occupational Titles
To construct our measure of \(\alpha\), we rely on information in the Dictionary of Occupational Titles. The DOT provides information on the primary tasks performed in a given occupation and the worker characteristics necessary for successful job performance. The occupational characteristics given in the DOT are linked to the 1970 census three-digit occupation codes in an augmented version of the April 1971 Current Population Survey (CPS) compiled by the Committee on Occupational Classification and Analysis at the National Academy of Sciences. This augmented data file contains occupation codes from the fourth edition of the DOT, which we update with the 1991 revised fourth edition of the DOT. 10 The data in the DOT are both comprehensive and detailed, describing over 12,000 occupations along 44 dimensions.
From the DOT, we assemble a list of job characteristics that capture the importance of hard-to-observe skill to job performance. There are several key features that characterize hard-to-observe skill in our model. First, there must be uncertainty about the skill prior to a worker's entry into the labor market. Second, observing output only gradually reveals a worker's skill, and the more important the unobservable skill to successful job performance, the more quickly the skill is revealed. In order to identify occupations in which hard-to-observe skill is important to job performance, we select occupational characteristics that indicate the importance of complex tasks. We define complex tasks as those for which it is hard to write down an explicit algorithm for successful completion. This is similar to the definition of "nonroutine" tasks in Autor et al. (2003) and to the definition of "unanalyzable" in Perrow (1967).
Our reasoning is that if a task can be broken into an ordered list of well-defined actions, then a worker's ability can be quickly learned by observing his or her performance at each separate action. In contrast, if it is difficult to explicitly describe how to successfully complete a task, then it will be difficult to determine a worker's skill without observing his or her on-the-job performance. For example, we classify the DOT variable "Data" as complex since occupations that score high on this variable involve activities such as "conducts research to discover new uses for chemical by-products" and "creates satirical cartoons based on current news events," activities for which it would be difficult to write down step-by-step instructions. In contrast, we classify the DOT variable "Things" as noncomplex since even occupations that score high on this variable involve activities for which it is relatively easy to give detailed instructions, such as "prepares machines for operation," "verifies the dimensions of parts for adherence to specifications," and "verifies the accuracy of machine functions." Further examples are presented in table 1, and a full list of the DOT variables we classify as complex is included in the note to the table.
Using this list of variables, we construct a single summary measure of a using principal component analysis. Since the DOT variables do not have a natural scale, we follow Autor et al. (2003) and first transform each DOT variable into a percentile value corresponding to its ranking in the 1970 distribution of that job attribute in the population. We then calculate the first principal component and predict the first component score for each occupation. To ease comparison with our theoretical model, we normalize this predicted first component score by calculating its percentile ranking and dividing by 100. This normalized predicted score naturally takes on a value between zero and one, and higher values indicate a higher level of required skill. We then use this normalized predicted score as our measure of a. 11 At every stage in constructing our measure of a, we use the sampling weights given in the CPS. Thus, our measure of a best captures the variation in the occupational characteristics from the DOT for a nationally representative sample of men in the United States. 12
To verify that a captures the importance of unobservable as opposed to observable skill, we create a measure of the importance of observable skill using variables from the DOT that we classify as noncomplex and easy to observe (and so were not used to create our measure of a). Just as we did when constructing a, we use principal component analysis to create a single measure of the importance of observable skill. As it turns out, the ranking of the occupations in terms of the importance of observable skill is very different from the ranking of occupations in terms of unobservable skill. To illustrate this point, table 2 compares occupations that have similar observable skill requirements but different measures of a. For example, while both legal secretaries and bank tellers have similar observable skill requirements, a is higher for legal secretaries than it is for bank tellers (0.69 vs. 0.52), suggesting that it is harder to observe the skills needed to be a legal secretary compared to a bank teller.
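The index construction described above (percentile transform, first principal component, percentile normalization to the unit interval) can be sketched as follows. This is only an illustration under simplifying assumptions: it is unweighted, whereas the text applies CPS sampling weights and the 1970 population distribution; it extracts the first component by power iteration; and the function names and the five-occupation data are hypothetical.

```python
import math

def percentile_ranks(values):
    """Percentile rank of each value in (0, 100]; ties share the mean rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank within the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = 100.0 * mean_rank / n
        i = j + 1
    return ranks

def first_component_scores(columns, iterations=500):
    """Score each observation on the first principal component of the columns."""
    n = len(columns[0])
    centered = [[x - sum(col) / n for x in col] for col in columns]
    p = len(centered)
    cov = [[sum(u * v for u, v in zip(centered[r], centered[c])) / (n - 1)
            for c in range(p)] for r in range(p)]
    vec = [1.0] * p
    for _ in range(iterations):  # power iteration on the covariance matrix
        nxt = [sum(cov[r][c] * vec[c] for c in range(p)) for r in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    if sum(vec) < 0:  # sign convention: higher attributes raise the score
        vec = [-x for x in vec]
    return [sum(vec[c] * centered[c][i] for c in range(p)) for i in range(n)]

def occupation_index(raw_columns):
    """Percentile-transform each variable, take PC1, renormalize to (0, 1]."""
    pct = [percentile_ranks(col) for col in raw_columns]
    scores = first_component_scores(pct)
    return [r / 100.0 for r in percentile_ranks(scores)]

# Hypothetical scores of five occupations on three "complex" DOT variables.
dot_scores = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 4],
]
a_index = occupation_index(dot_scores)
```

Because the final step is a percentile ranking, the resulting index is ordinal: it preserves the ordering of occupations by first-component score but not the distances between them.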

B. The National Longitudinal Survey of Youth 1979
The model we develop focuses on the evolution of a_t over the life cycle. In order to construct this occupational work history, we use the NLSY79, which follows individuals born between 1957 and 1964. We focus our empirical analysis on males in the cross-sectional sample. Although the NLSY79 contains information on individuals' labor force activities for each week from 1978 through the most recent year in which a respondent was interviewed, we rely on labor market data only from 1978-2000 because of a switch in occupational coding that occurred after 2000. If a respondent is not interviewed in a given year (or years), then at the next interview date, the respondent is asked to go back and retrospectively report his or her labor force activities. As a result, the NLSY79 allows us to construct relatively complete work histories. The work history data include information on each of up to five jobs a respondent may have held in a given week, and we define an individual's occupation in a given week to be his occupation in the job at which he worked the most hours.
We follow individuals' occupational histories starting with their first transition to full-time work after the completion of their highest degree. In particular, following the completion of their degree, we identify the first week in which individuals are working at least 10 hours a week and in which they will continue to work at least 10 hours a week for at least 39 of the next 52 weeks. We then keep a running tab of their actual labor market experience and their occupation in each week in which they work. 13 In our empirical analysis, we focus on the first 350 weeks (about 6.7 years) of an individual's actual experience in the labor force because attrition from the sample makes it difficult to construct complete work histories for longer horizons.
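The entry rule described above (the first week with at least 10 hours of work that is followed by at least 10 hours of work in at least 39 of the next 52 weeks) can be sketched as a simple scan over a weekly hours record. The function name, the list representation of the record, and the reading of "the next 52 weeks" as the 52 weeks after the candidate week are assumptions made for illustration.

```python
def first_attached_week(hours, min_hours=10, window=52, required=39):
    """Index of the first week that starts a sustained work spell, or None.

    hours[t] is hours worked in week t after degree completion (assumed layout).
    """
    for t in range(len(hours) - window):
        if hours[t] < min_hours:
            continue
        upcoming = hours[t + 1 : t + 1 + window]  # the 52 weeks after week t
        if sum(h >= min_hours for h in upcoming) >= required:
            return t
    return None

# Hypothetical record: 4 idle weeks, 1 stray work week, 14 idle weeks, steady work.
weekly_hours = [0] * 4 + [20] + [0] * 14 + [40] * 60
start = first_attached_week(weekly_hours)
```

In this example the stray work week at index 4 is rejected because only 38 of the following 52 weeks meet the hours threshold, so the spell is dated from index 19.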
We lose 693 respondents because we cannot identify either their highest degree or the date at which they received their highest degree. We additionally drop 350 respondents who completed their highest degree prior to the start date of the work history record and 254 respondents who completed their highest degree relatively late in life because we worry that these workers may already have accumulated substantial labor market experience that could influence employers' beliefs about skills. We also drop 239 respondents whose occupational history is relatively incomplete. In particular, we drop individuals who have more than 150 weeks in which they either are not working or have missing occupation information during the first 500 weeks following their transition to full-time work. In other words, we give individuals 500 weeks in which to accumulate 350 weeks of valid occupation information; otherwise we drop them from the sample. We additionally drop 67 individuals who ever report an hourly wage of either over $100 or under $2. After these restrictions have been made, we are left with 1,360 individuals. Compared with the initial sample, these individuals have a stronger attachment to the labor market and are younger. Table 3 presents basic summary statistics for our sample.
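The completeness restriction above (500 weeks in which to accumulate 350 weeks of valid occupation information) amounts to a simple count. The sketch below assumes a hypothetical boolean record per individual and treats weeks missing from a short record as invalid, which is an assumption rather than something the text specifies.

```python
def has_complete_history(valid_week, horizon=500, max_invalid=150):
    """True if at most max_invalid of the first `horizon` weeks are invalid.

    valid_week[i] is True when the individual worked and reported an
    occupation in week i after the transition to full-time work.
    """
    valid_count = sum(1 for ok in valid_week[:horizon] if ok)
    return horizon - valid_count <= max_invalid

kept = has_complete_history([True] * 350 + [False] * 150)
dropped = has_complete_history([True] * 349 + [False] * 151)
```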

VI. Empirical Findings
In this section, we document patterns in wage growth and job assignments over the life cycle, discuss the extent to which our model's predictions are supported by the data, and relate our findings to existing models of wage dynamics.
Wage and job assignment profiles.-We begin by describing the changes in a and wages over the life cycle. Figure 9 shows the average value of a by weeks of actual experience. For college graduates the average value of a in the first week is 0.65, rising to roughly 0.75 in week 350. Similarly, a increases from about 0.32 to 0.42 for high school graduates. Figures 10 and 11 present the average hourly wage and the standard deviation of hourly wages for high school graduates and college graduates. As in previous studies, we find that both wages and wage dispersion increase over the life cycle. The above patterns in a and wages are consistent with our model. First, the optimal level of a is relatively low in the early stages of workers' careers when there is considerable uncertainty about workers' skills, and over time, as workers sort into jobs in which they are more productive, both wages and wage dispersion grow. These findings, however, are also consistent with theories of on-the-job training in which workers learn how to perform tasks and accordingly move up the job ladder, causing wages to increase over time (e.g., Jovanovic and Nyarko 1997).
As discussed in Section IV.B, workers who experiment will have relatively low initial wages and high initial a but will have higher wage growth and wage dispersion than workers with high initial wages and low initial a (see, e.g., the simulations in figs. 6 and 7). To look for evidence of this, we regress a worker's hourly wage in week t on a worker's initial job assignment (a_0), the initial wage (w_0), experience in week t, and the interaction between experience and both a_0 and w_0. Columns 1 and 2 of table 4 show that for both high school graduates and college graduates, the coefficient on the interaction between a_0 and experience is positive and statistically significant whereas the coefficient on the interaction between w_0 and experience is negative, suggesting that wage growth is higher for the group of workers who begin their careers in jobs with higher a and lower wages.
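One way to write the wage regression just described is given below. The notation is ours and is only a sketch: x_{it} denotes weeks of actual experience, the coefficient labels are hypothetical, and any additional controls used in table 4 are omitted.

```latex
\begin{align*}
w_{it} &= \beta_0 + \beta_1 a_{i0} + \beta_2 w_{i0} + \beta_3 x_{it}
          + \beta_4 \,(a_{i0} \times x_{it}) + \beta_5 \,(w_{i0} \times x_{it})
          + \varepsilon_{it}, \\
\frac{\partial\, \mathrm{E}[w_{it}]}{\partial x_{it}}
       &= \beta_3 + \beta_4\, a_{i0} + \beta_5\, w_{i0}.
\end{align*}
```

Under this specification, wage growth per week of experience rises with the initial job assignment when \beta_4 > 0 and falls with the initial wage when \beta_5 < 0, which is the sign pattern reported for both education groups.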
To look for evidence of whether the increase in wage dispersion is higher for workers with high initial a, we determine the quartile of the distribution of a and the quartile of the distribution of wages into which each worker falls in week 1. Thus, for each education category, there are 16 possible bins into which a worker can fall (four a quartiles and four wage quartiles). We want to know how the change over time in the spread of the wage distribution within each bin depends on workers' initial job assignments. Thus, within each bin, we calculate the difference between the wage at the 90th and the 10th percentiles for every week of actual experience. We then regress this measure of wage dispersion on the quartile of the initial wage distribution, the quartile of the initial a distribution, experience, the interaction between experience and the quartile of the initial wage distribution, and the interaction between experience and the quartile of the initial a distribution. Table 5 reports the results of this regression. The coefficient on the interaction between the quartile of the initial a distribution and experience is positive and statistically significant, suggesting that the spread in the distribution of wages grows more quickly for those initially in occupations with higher a. Note that if jobs with higher a provide more training than jobs with lower a, then the above patterns also could be consistent with investment in human capital as opposed to investment in information (see Ben-Porath 1967).
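The dispersion measure used above can be illustrated with a short sketch: assign workers to quartiles of the week 1 distribution, then compute the 90th minus 10th percentile wage gap within a bin for a given week. The helper names, the linear-interpolation percentile rule, and all numbers are assumptions for illustration and are not taken from the data.

```python
def quartile_of(value, sample):
    """Quartile (1 to 4) of value, by the share of the sample at or below it."""
    at_or_below = sum(1 for s in sample if s <= value)
    n = len(sample)
    return (4 * at_or_below + n - 1) // n  # integer ceiling of 4 * share

def percentile(values, q):
    """Linear-interpolation percentile of values, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def spread_90_10(wages):
    """Difference between the 90th and 10th percentile wage in one bin."""
    return percentile(wages, 90) - percentile(wages, 10)

# Hypothetical initial values of a for eight workers and their quartiles.
initial_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
quartiles = [quartile_of(a, initial_a) for a in initial_a]

# Hypothetical hourly wages within a single bin at two experience levels.
bin_wages_week_1 = list(range(5, 16))       # 5, 6, ..., 15
bin_wages_week_350 = list(range(6, 27, 2))  # 6, 8, ..., 26
growth = spread_90_10(bin_wages_week_350) - spread_90_10(bin_wages_week_1)
```

Repeating the spread calculation for every bin and every week of experience yields the dependent variable for a regression like the one reported in table 5.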
Job transitions.-A key difference, however, between our model and human capital models is that workers in our model should also transition into jobs that depend less on the unobserved skill, which we analyze next. In our model, as uncertainty is resolved and workers experiment less, a fraction of workers should move to jobs with lower a. Table 6 presents a transition matrix for a in which the rows show the decile of a before the transition and the columns show the decile of a after the transition. Thus, the sum of the entries in each row adds up to 100%. The entries below the diagonal capture transitions into lower deciles and the entries above the diagonal capture transitions into higher deciles. Clearly a large fraction of all occupational changes involve transitions into jobs with a lower a.
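A transition matrix of this kind can be built by counting moves between deciles and normalizing each row to 100%. The sketch below uses hypothetical decile pairs and function names; entries with a destination decile below the origin decile correspond to moves into lower-a jobs.

```python
def transition_matrix(moves, n_bins=10):
    """Row-normalized transition percentages.

    moves: iterable of (decile_before, decile_after), each in 1..n_bins.
    Row i gives the percentage of moves out of decile i + 1 into each decile.
    """
    counts = [[0] * n_bins for _ in range(n_bins)]
    for before, after in moves:
        counts[before - 1][after - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix

def share_downward(moves):
    """Fraction of all moves that end in a strictly lower decile."""
    moves = list(moves)
    return sum(1 for before, after in moves if after < before) / len(moves)

# Hypothetical occupational changes, recorded as (decile before, decile after).
moves = [(5, 3), (5, 7), (5, 5), (5, 2), (2, 1), (2, 6)]
tm = transition_matrix(moves)
down = share_downward(moves)
```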
A key feature of our model is the trade-off between current wages and information and the fact that a decline in experimentation can lead to wage increases even for those who move to occupations with a lower a. Thus, transitions to jobs in which output depends less on hard-to-observe skills may entail wage increases. Table 7 summarizes the mean change in a and wages for workers who move to higher- and lower-a jobs. First, note that the number of job changes is larger in the first 200 weeks after a worker's entry into the labor market than it is in weeks 201-350, suggesting a decline in uncertainty and experimentation. Consistent with our model, we also find that even among those who transition to lower-a jobs, wages increase on average.
The magnitude of the changes in a and wages in table 7 is large. The absolute value of the mean change in a for those who change jobs is above 0.2, which is substantial given that in week 1 the mean of a is 0.43 and the standard deviation is 0.29. In addition, the mean increase in a associated with a move up is more than twice as large as the entire increase in the mean of a from week 1 to week 350, suggesting that a substantial fraction of workers transition to jobs with a lower a. Indeed, we find that between week 1 and week 350, a declines for roughly 30% of the workers in our sample. 14 Furthermore, the wage increase for those who move to lower-a jobs is approximately $1.18, which is large compared to the wage increase for those who do not change jobs and compared to the overall wage increase of $4.80 for those with a_350 < a_0. 15
Existing theories of learning and sorting (e.g., Gibbons et al. 2005; Gibbons and Waldman 2006) that incorporate on-the-job training can explain both why a increases over time on average (because experience augments the unobserved skill) and why a may decline for some individuals (some will receive negative information about their skill). But in those models, wages and a should move in the same direction when workers change jobs because workers who receive positive information about their skill will move to jobs in which output depends more on skill and earn more, whereas those who receive negative information will move to jobs in which output depends less on skill and earn less. The results in table 7, however, show that a and wages do not always move in the same direction.
Life cycle patterns of experimentation and sorting.-Our model also predicts that for workers who experiment (those for whom a_t > 0 but m_t < k), the optimal level of experimentation is initially low, increases over time, and then eventually declines. This nonmonotonic relationship is difficult to test for because the nonmonotonicity holds conditional only on m_t and for m_t < k and because adequate measures of m_t are not readily available as wages reflect both expected skill and job assignments. Nonetheless, while our analysis is only suggestive, we look for a nonmonotonic relationship between experience and experimentation using workers' starting wages and the value of a associated with their first occupation to proxy for m_0 and by focusing on individuals whose wages fall below a certain threshold to try to isolate individuals for whom m_t < k. In particular, for each week of experience, we keep only individuals whose wage is less than a threshold wage, where the threshold is defined to be the average starting wage of ...
Columns 3 and 4 of table 4 present the regression results. For college graduates, the coefficient on experience is positive and the coefficient on experience squared is negative and statistically different from zero. On the basis of these estimates, figure 12 shows the predicted value of a by experience for the median individual with a college degree. It shows that the overall increase in a between the first week and week 350 is small relative to the increase between the first week and week 200, when the predicted value of a reaches its peak. We also find weak evidence of nonmonotonicity for high school graduates. In column 3, the coefficient on experience squared is still negative but no longer statistically different from zero (see also fig. 13). 17
Finally, while we also find evidence that workers sort over time into jobs that depend either more or less on the unobserved skill, we do not find evidence that they sort into occupations in which a is close to either zero or one, as predicted in proposition 4. There are several extensions to our model that could potentially account for this. For example, if there are switching costs that depend positively on the difference between the value of a in a worker's current job and the value of a in the next job, then this will limit sorting. In addition, if the production function involves complementarities between the observed and the unobserved skills, then even under full information, workers will not sort into jobs that depend on one skill.
16 We chose this specification to minimize the mean squared error. The inclusion of higher-order terms of experience does not substantially change the mean squared error, and the coefficient on higher-order terms is statistically indistinguishable from zero. In addition, adding higher-order terms does not change any of our qualitative findings.
17 As discussed above, our model predicts that we should not find any evidence of a nonmonotonic relationship between experience and experimentation for workers with expected skill m_t > k. When we repeat the analysis in table 4 for high-wage workers, we find that a is strictly increasing with experience.
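The quadratic relationship between a and experience reported in columns 3 and 4 of table 4 implies a turning point that can be read directly from the coefficients. The expression below is a sketch in our own notation: x_{it} is weeks of actual experience, the \gamma labels are hypothetical, and z_i stands in for whatever proxies and controls enter the estimated specification.

```latex
\begin{align*}
a_{it} &= \gamma_0 + \gamma_1 x_{it} + \gamma_2 x_{it}^2 + z_i'\delta + u_{it}, \\
x^{*}  &= -\frac{\gamma_1}{2\gamma_2}
          \qquad \text{(an interior maximum when } \gamma_1 > 0 > \gamma_2\text{)}.
\end{align*}
```

A predicted peak near week 200, as in figure 12 for college graduates, would correspond to estimates satisfying roughly \gamma_1 \approx -400\,\gamma_2.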

VII. Conclusion
This article develops a life cycle model of occupational choice and wage dynamics when jobs differ in the amount of information they provide about workers' skills. In this setting, we show that workers experiment, trading off current-period wages for information. Our model predicts that the optimal level of experimentation is relatively low at the beginning of workers' careers, increases as workers gain experience, and then declines as workers become increasingly certain about their skill. This eventual decline in experimentation partially drives wage growth in our model. In addition, experimentation can lead random productivity shocks, especially when workers are young, to have lasting effects on workers' career trajectories.
We then use data from the Dictionary of Occupational Titles to construct a measure of how much information different occupations reveal about workers' skills and match this measure to data from the NLSY79. In particular, our measure captures the degree to which different occupations involve complex tasks, conjecturing that there will be uncertainty about workers' skill in these tasks. We then document patterns of occupational choice and wage dynamics in the NLSY79. Consistent with our model, we find that workers tend to start their careers in jobs that reveal relatively little about their skill. In addition, the more information a worker's initial job reveals about the worker's skill, the faster the worker's wage growth will be. We also find that a large fraction of workers transition into occupations with lower skill requirements and that these transitions are often accompanied by wage increases, a fact that is hard to reconcile with existing models of wage dynamics.
We believe our results suggest that experimentation may be an important feature of the labor market. Nonetheless, we acknowledge that without estimating the fundamental parameters of a richer model of wage dynamics, we cannot parse out the importance of experimentation relative to other factors that may explain the wage growth and job transition patterns in our data. For example, an obvious extension to our model would be to allow workers to accumulate job-specific human capital and to quantify the importance of experimentation relative to learning by doing and search frictions. We leave these identification issues and extensions to future work.