The Effect of Incarceration on Re-Offending: Evidence from a Natural Experiment in Pennsylvania

This paper uses a sample of convicted offenders from Pennsylvania to estimate the effect of incarceration on post-release criminality. To do so, we capitalize on a feature of the criminal justice system in Pennsylvania: the county-level randomization of cases to judges. We begin by identifying five counties in which there is substantial variation across judges in the use of incarceration, but no evidence indicating that the randomization process had failed. The estimated effect of incarceration on rearrest is based on a comparison of the rearrest rates of the caseloads of judges with different proclivities for the use of incarceration. Using judge as an instrumental variable, we estimate a series of confidence intervals for the effect of incarceration on one-year, two-year, five-year, and ten-year rearrest rates. On the whole, there is little evidence in our data that incarceration impacts rearrest.

whereas general deterrence refers to the response to the threat of punishment in the public writ large. This distinction is important because the experience of punishment may be an ineffective deterrent or possibly criminogenic even as the threat of punishment is an effective deterrent to the public writ large (Nagin 2013a, b). The principal aim of the research reported here is to estimate the specific deterrent effect on reoffending of a custodial sanction involving imprisonment as compared to a noncustodial sanction. We hereafter refer to this response as the effect of the experience of punishment on reoffending rather than as specific deterrence because, as will be discussed, there are also sound reasons for theorizing that the experience of imprisonment might exacerbate rather than reduce recidivism.
A recent review of the empirical literature on the effect of the experience of imprisonment compared to noncustodial sanctions concludes that on balance the evidence points to a null or criminogenic effect rather than a preventive effect. However, this review also concludes that the evidentiary base for this conclusion is weak. Nagin et al. (2009) identify a long list of deficiencies in the research on the effect of imprisonment on reoffending. Included among these deficiencies are insufficient controls for age, prior record of offending, and offense severity, all of which may bias the imprisonment effect estimate.
Possible exceptions to this criticism include Nieuwbeerta et al. (2009), Bales and Piquero (2012), and Wermink et al. (2010), who use various forms of matching to account for the potential forms of bias from measured covariates that are known to be correlated with recidivism. All conclude that the experience of a custodial sanction is associated with an increase in recidivism. Snodgrass et al. (2011), using similar methods, examine whether time served in prison is associated with recidivism and find no evidence of such an association, either positive or negative. Each of these studies, however, is vulnerable to the criticism that even with extensive controls for measured differences between individuals who do and do not receive custodial sanctions, unmeasured differences that are systematically related to recidivism probability may be biasing the estimate of the effect of imprisonment on reoffending.
In this paper, to overcome this problem of bias from unmeasured differences, we capitalize on the random assignment of cases to judges in the criminal courts of Pennsylvania. Because of random assignment, there will be no systematic difference in case characteristics, either measured or not, across judges. Cross-judge variation in punitiveness, which we demonstrate exists in Pennsylvania, is used as the basis for inferring the effect of incarceration compared to a noncustodial sanction on reoffending. This same approach to avoiding bias from unmeasured covariates is used in Green and Winik (2010) and Loeffler (2013). We discuss these studies in greater detail below.

Background
The logic of specific deterrence is grounded in the idea that the experience of imprisonment will deter reoffending, perhaps because the experience is more adverse than anticipated. Moreover, because the criminal law commonly prescribes more severe penalties for recidivists, 1 the structure of the law itself may also cause previously convicted individuals to revise upward their estimates of the likelihood and/or severity of punishment for future law breaking. 2 The experience of punishment may also affect the likelihood of future crime by decreasing the attractiveness of crime itself or by expanding alternatives to crime. While imprisoned the individual may benefit from educational or vocational training that increases post-release non-criminal income earning opportunities (Layton MacKenzie 2002). Other types of rehabilitation are designed to increase the capacity for self-restraint when faced with situations, like a confrontation, that might provoke a criminal act such as violence (Cullen 2002).
On the other hand, there are many reasons for theorizing that the experience of punishment might increase an individual's future proclivity for crime. While some individuals might conclude that imprisonment is not an experience to be repeated, others might conclude that the experience was not as adverse as anticipated and as a result be more, not less, crime prone. Prisons might be 'schools for crime' where inmates learn new crime skills even as their non-crime human capital depreciates. Associating with other more experienced inmates could lead new inmates to adopt the older inmates' deviant value systems or enable them to learn 'the tricks of the trade' (Hawkins 1976; Steffensmeier and Ulmer 2005). Being punished could also elevate the offender's feelings of resentment against society (Sherman 1992) or strengthen the offender's deviant identity (Matsueda 1992).
The experience of imprisonment may also increase future criminality by stigmatizing the individual socially and economically. There is much evidence showing that an important part of the deterrent effect of legal sanctions stems from the expected societal reactions set off by the imposition of legal sanctions (Williams and Hawkins 1986; Nagin and Pogarsky 2003; Nagin and Paternoster 1994). Prior research has found that individuals who have higher stakes in conformity are more reluctant to offend when they risk being publicly exposed (Klepper and Nagin 1989). While the fear of arrest and stigmatization may deter potential offenders from breaking the law, those who have suffered legal sanctions may find that conventional developmental routes are blocked. In their work on the 500 Boston delinquents initially studied by Glueck and Glueck (1950), Sampson and Laub (1997) have called attention to the role of legal sanctions in what they call the process of cumulative disadvantage. Official labeling through legal sanctions may cause the offender to become marginalized from conventional opportunities and non-criminal social networks, which in turn increases the likelihood of their subsequent offending (Bernburg and Krohn 2003). Sampson and Laub (1997) propose that legal sanctions may amplify a snowball effect that increasingly mortgages the offender's future by reducing conventional opportunities. Several empirical studies support the theory that legal sanctions downgrade conventional attainment (Freeman 1995; Waldfogel 1995, 1998; Sampson and Laub 1993; Waldfogel 1994; Western 2002; Western et al. 2001).
Although space does not permit an extended discussion of the evidence on the effect of imprisonment on reoffending, there are a few observations that deserve mention. First, in terms of numbers, the great majority of studies based on non-experimental data point to a criminogenic effect of custodial sanctions compared to non-custodial sanctions. As already indicated, much of this research is vulnerable to the criticism that persons sent to prison are more crime prone in unmeasured ways and as a result, the seeming criminogenic effect of imprisonment is entirely or at least substantially attributable to selection bias.
Second, there have been a small number of experimental or quasi-experimental studies comparing custodial versus non-custodial sanctions. Nagin et al. (2009) conclude that, taken as a whole, the experimental studies also point towards a criminogenic effect of custodial sanctions. The evidence for this conclusion, however, is weak because it is based on a small number of studies and many of the point estimates are not statistically significant. Further, several features of the samples used in these studies also limit their usefulness for understanding the effects of imprisonment on reoffending in the contemporary context of imprisonment in America. Of the five experimental examinations, two involve juveniles and all but one (Killias et al. 2000) utilize data that is more than 20 years old. Among the four studies involving adults, only Bergman (1976) is based on a population that might be characterized as serious adult offenders. 3
Two other studies are also notable for our proposed research: Green and Winik (2010) and Loeffler (2013). Similar to this work, each of these studies uses the random assignment of cases to judges to overcome the selection problem. Neither of these studies found evidence that the experience of imprisonment affected reoffending. Our proposed research moves beyond these valuable efforts in several ways. First, as elaborated upon in the "Methods" section, random case assignment guarantees that any difference across judges in the recidivism rates of their caseloads is attributable to a "judge" effect. Thus, an important first step in the analysis is establishing that "judge" treatment effects are present and large, something that was only done in the Loeffler analyses. Second, we extensively check balance across judges in observed covariates.
Third, instead of relying on the output from an instrumental variable regression, our analysis takes a different approach developed by Rosenbaum and colleagues (Imbens and Rosenbaum 2005; Rosenbaum 1996, 1999, 2002a). While not necessarily better than classic econometric techniques for estimating the impact of treatment using instrumental variables, our approach adds value in two important ways. First, our approach develops an individual-level model of the response to incarceration. Rather than relying on stochastic disturbances in a regression framework, this approach clearly develops a counterfactual argument and then relates our inference back to the counterfactual model. Second, and more importantly, our approach generates statistically valid confidence intervals even when the instrument is uninformative or weak. The problem of weak instruments is well known and well documented (Bound et al. 1995; Nelson and Startz 1990; Maddala and Jeong 1992). The classic Two-Stage Least Squares (TSLS) approach to the estimation of treatment effects using instrumental variables relies on asymptotic properties. However, the finite sample and asymptotic properties on which TSLS relies are highly questionable when the instrument is weak or uninformative (Nelson and Startz 1990; Maddala and Jeong 1992). Rather than obfuscating the limitations associated with a weak or uninformative instrument, our approach continues to yield valid confidence intervals even when the instrument is weak. Specifically, rather than providing a point estimate and associated standard error that are driven by incorrect asymptotics, our approach yields confidence intervals that grow in length as the information contained in the data degrades. Put differently, the approach taken in this paper will inform how, rather than assume that, the data is informative concerning the treatment effect.

Data
To estimate the impact of incarceration on subsequent criminality, this study uses a sample of 6,515 offenders convicted of a criminal offense in the Court of Common Pleas in the state of Pennsylvania during 1999 who had their sentencing information forwarded to the Pennsylvania Sentencing Commission. As discussed in greater detail below, for each of these 6,515 offenders we observe basic demographic characteristics, extent and severity of offending history, seriousness of current offense, the type of punitive sanction (e.g., release to the community, sentence to county jail, sentence to state prison), and the duration of the incarceration administered by the judge. To measure future offending, we observe any arrest that occurred in the state of Pennsylvania.
We use six Pennsylvania counties to estimate the impact of incarceration on rearrest. To be clear, in Pennsylvania, randomization occurs at the level of the county, the geographic unit to which Common Pleas judges are elected. Among the duties of Common Pleas judges is the adjudication and sentencing of criminal cases. Pennsylvania is composed of 67 counties. The number of judges elected in a county depends upon its population. This analysis began by identifying Pennsylvania counties that satisfied three conditions. First, pre-sentence covariates were examined to identify counties in which the 1999 randomization process achieved balance on observable covariates. 4 Second, sentencing outcomes were examined across time to identify counties in which the judiciary demonstrated stable sentencing practices. Third, conditional on satisfying the previous two requirements, counties in which there was statistically significant variation across judges in the use of confinement, whether in the form of jail or prison, were selected. 5 This process led to the identification of six counties that satisfied the selection criteria: Centre, Crawford, Cumberland, Dauphin, Erie, and Mercer counties. 6 Table 1 presents a brief comparison of the counties used in this analysis with Pennsylvania and the United States as a whole. As compared to the national average, the counties used in this analysis tend to be more White, more educated, have a lower prevalence of foreign born citizens and families below the poverty line, and tend to be more rural. When compared to the Pennsylvania average, the six counties used in this analysis are generally less diverse, more educated, have a lower prevalence of foreign born citizens, and their residents are more likely to speak English in the home. It should also be noted that the six counties used in this analysis vary rather substantially along the characteristics shown in Table 1.
For example, the proportion of White residents ranges from a low of 0.771 in Dauphin county to a high of 0.97 in Crawford county. Similarly, the proportion of residents living in a rural area ranges from a low of 0.147 in Dauphin to a high of 0.656 in Crawford county.
As mentioned above, in these six counties 6,515 offenders were convicted of a criminal offense in the Court of Common Pleas during 1999 and had their sentencing information forwarded to the Pennsylvania Sentencing Commission. 7 The randomization of cases occurs when the case is docketed, which is prior to conviction. Thus, we do not observe cases that were randomly assigned to judges but did not result in conviction. In the six counties used in this analysis, not every judge presided over at least 100 cases that resulted in conviction. To ensure that the observed sentencing outcomes reflect the judge's true underlying tendency to mete out incarceration, from the 6,515 offenders convicted in these six counties, we set aside data from 110 (1.7 %) offenders who were sentenced by one of the judges who did not sentence at least 100 offenders in 1999. We use the information contained in these 6,405 offenders' cases to verify that the 1999 randomization achieved the desired level of balance in the pre-sentence and case disposition covariates. The pre-sentence covariates and case disposition measures used in this analysis were supplied by the Pennsylvania Sentencing Commission (PASC).
4 Balance on observables is shown in the "Results" section.
5 Variation in the use of confinement is shown in the "Results" section. We also selected one county in which there was no variation in the use of confinement as a control.
6 Pennsylvania's two largest counties, Allegheny and Philadelphia, assign a subset of judges to hear specific types of cases usually involving less serious charges. Our data do not identify judges who were so assigned. As a consequence, it was not possible to test for balance and differences in judge punitiveness in these two counties. For this reason they were not candidate counties for inclusion in the analysis. From the remaining 65 counties we removed 59 because there was insufficient variation in the use of confinement to merit inclusion in the analysis.
PASC data allows the tracking of cases from sentencing through release or entry into the correctional system, depending on sentencing outcome. At the case level, PASC data documents the county of adjudication, the judge of record, and the number of charges in each case. PASC data also records basic demographic information including age, sex, and race. Additionally, the data permits observation of each offender's prior record score, 8 and the number of prior adjudications and convictions for 25 separate offense categories (e.g., number of prior burglary offenses, number of prior rape offenses, number of prior felony drug convictions). At the charge level, PASC data allows observation of the offense classification for each charge and any mitigating or aggravating circumstances.
For the purposes of estimating the effect of incarceration, we then restrict our sample to the 6,127 offenders for whom we could locate valid correctional and arrest data. This restriction resulted in the removal of 282 (4.6 %) offenders for whom either no rap sheet data could be located or for whom the correctional outcome was inconsistent with the sentencing data. 9 In this work, we measure reoffending by rearrest rate in 1, 2, 5, and 10 years after sentencing. To generate these rates, we use rearrest in Pennsylvania as measured by Pennsylvania State Police rap sheet data. This rap sheet data allows us to observe any arrest that occurred in Pennsylvania between the date of sentencing and April 30, 2010. 10
7 "Not all sentences are reported to the Commission. (1) Philadelphia Municipal Court sentences are not reported to the Commission. These may include DUI (driving under the influence) offenses as well as other misdemeanor offenses. (2) Offenses sentenced by district magistrates are not reported to the Commission. These typically include DUI offenses or other misdemeanor offenses. (3) Murder 1 and Murder 2 offenses, which are subject to life or death mandatory sentences, do not fall under the sentencing guidelines and are not required to be reported to the Commission. The Commission encourages reporting of the Murder 1 and Murder 2 offenses; many are reported and are included in the data collection" (Pennsylvania Sentencing Commission 1999).
8 Prior record score is a numeric variable calculated by PASC which aims to encapsulate the seriousness of the offender's entire prior criminal history.
9 For example, sentencing data indicated a period of confinement in state prison but no period of confinement could be located in the Department of Corrections files.
10 Our data only measure arrests that occur in Pennsylvania. We, thus, do not measure arrests that occurred outside of Pennsylvania.
While we do not have data that permits us to speak to the degree to which the offenders used in this analysis were arrested outside of Pennsylvania, there is literature which measures displacement across state lines. Langan and Levin (2002) found that 7.6 % of offenders released from confinement were rearrested out of state. Similarly, Orsagh (1992) found that roughly ten percent of offenders released from eleven state prisons in 1983 would experience an out of state arrest in the 3 years following their release. Nakamura (2010) found that slightly less than 23 % of offenders arrested in New York in 1980 were not rearrested in New York, but were arrested in another state. These studies suggest that the recidivism measures used in this analysis only modestly underestimate actual re-arrest rates.
With respect to the calculation of our outcome measure, one point merits further discussion. Studies of the effect of imprisonment on reoffending, including analyses conducted by the authors (Nieuwbeerta et al. 2009; Snodgrass et al. 2011), routinely correct for exposure time (time not incarcerated) in calculating rearrest rates or time to rearrest. The rationale for the exposure time correction is to avert contamination of the behavioral effect of incarceration on reoffending with incapacitation effects.
In this analysis we do not correct for exposure time. Our changed stance on correcting for exposure time is reflective of several considerations. Because incarceration follows randomization, incarceration should be viewed as a consequence of treatment and as such should not be statistically controlled. Suppose, for example, incarceration exacerbates criminality. As a consequence, individuals who are initially incarcerated, on average, will commit more crimes and thereby will be more vulnerable to further stints of incarceration. Because their greater vulnerability to incarceration is a result of their treatment status, it should not be statistically controlled. Adjustments for exposure time, including for the initial "treatment-status" incarceration, also create potentially large imbalances in age. This is a very serious potential threat to identifying the treatment effect because recidivism is highly age dependent, with older adults offending at substantially lower rates than younger adults. By correcting for exposure time, incarcerated offenders are older and, hence, less likely to offend than those who are not incarcerated. This relationship between incarceration and offending, however, is a result of the aging that takes place during incarceration rather than "the effect" of incarceration. 11 Finally, not correcting for exposure time produces a treatment effect estimate that is more relevant from a policy perspective because it measures how many additional (fewer) offenses are incurred by society in the next t years due to the use of imprisonment.

Methods
We use the instrumental variables approach advanced by Rosenbaum and colleagues (Imbens and Rosenbaum 2005; Rosenbaum 1996, 1999, 2002a) to estimate the effect of imprisonment on 1, 2, 5, and 10 year rearrest rates. We begin with a general overview of the methodological approach and conclude with a more technical discussion. We rely on the randomization of cases to judges within county as the basis for using judge as an instrument to identify the effect of incarceration. Like all instrumental variables techniques, this approach uses the variation in treatment induced by the instrument to identify the effect of treatment. In our application, this requires that the judge to whom the individual is randomized must impact the likelihood that an offender is incarcerated, net of the impact of other factors. The use of judge as an instrument also requires that the judge to whom an individual is randomized impacts the likelihood of rearrest only through his/her effect on the likelihood of incarceration. This second requirement is sometimes referred to as an exclusion restriction. These two requirements play a central role in both the classical approaches to instrumental variables found in the econometric literature (e.g., Angrist et al. 1996) and the approach used here.
The first requirement, that the instrument induces variation in treatment net of the impact of other factors, can be resolved empirically. In this application, we demonstrate that this requirement is satisfied by examining differences across judges in their tendency to use incarceration as a punitive sanction. We demonstrate that such variation exists across judges in the "Results" section. This approach suffices to demonstrate treatment variation across judges due to the properties of randomization. In particular, randomization guarantees that case and offender characteristics, whether measured or unmeasured, are equivalent across judges in a county. Hence, any variation in the use of incarceration must be attributable to a judge effect.
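The first-stage check described above can be illustrated with a small sketch. The text does not say which test is used to compare incarceration rates across judges, so the Pearson chi-square statistic below is an assumption chosen for illustration, and the caseload counts are hypothetical rather than drawn from the Pennsylvania data.

```python
def incarceration_chi_square(counts):
    """Pearson chi-square statistic for equal incarceration rates across judges.

    `counts` maps a judge label to (n_incarcerated, n_not_incarcerated).
    Returns (statistic, degrees_of_freedom, incarceration_rate_by_judge).
    """
    total_in = sum(n_in for n_in, _ in counts.values())
    total_out = sum(n_out for _, n_out in counts.values())
    total = total_in + total_out
    statistic = 0.0
    rates = {}
    for judge, (n_in, n_out) in counts.items():
        caseload = n_in + n_out
        rates[judge] = n_in / caseload
        # Expected counts if every judge incarcerated at the pooled county rate.
        expected_in = caseload * total_in / total
        expected_out = caseload * total_out / total
        statistic += (n_in - expected_in) ** 2 / expected_in
        statistic += (n_out - expected_out) ** 2 / expected_out
    return statistic, len(counts) - 1, rates


# Hypothetical caseloads for two judges in one county (not the paper's data).
example = {"judge_A": (80, 20), "judge_B": (20, 80)}
statistic, dof, rates = incarceration_chi_square(example)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom would indicate that judges differ in their use of incarceration, which is what the instrument requires.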
The second requirement, the exclusion restriction, is an assumption that can be argued, but not empirically verified. In our application, the exclusion restriction requires that judges have no impact on the likelihood of recidivism beyond their impact on the likelihood of incarceration. This assumption is quite reasonable given the very limited interaction between judge and offender in most circumstances. However, it is possible that this condition could be violated. For example, a stern admonishment from the bench may deter (or exacerbate) future criminality in a subset of offenders. Alternatively, an informal request by the judge that local law enforcement more closely watch a given offender, thereby increasing the likelihood that s/he is observed engaging in criminal activity, would constitute a violation of the exclusion restriction. While these scenarios are possibilities, conversations with criminal justice practitioners in Pennsylvania indicate that they rarely occur.
Our identification approach also rests on an important additional assumption: the effect of incarceration is homogeneous 12 and additive. These two assumptions play a critical role in the estimation strategy described below. To see why, consider the extreme case where one judge incarcerates his/her entire caseload and another judge incarcerates none of his/her caseload. If the treatment effect is additive and homogeneous, this implies that in expectation the difference in the rearrest rate of their respective caseloads will be $\beta$, where $\beta$ measures the difference in the recidivism rate in a sanction regime in which all individuals are incarcerated versus a sanction regime in which none are incarcerated. Because the treatment effect is assumed homogeneous, this implies that in the less extreme case where the difference in the probability of imprisonment between the harsh and lenient judge is $\Delta$, in expectation the difference in the rearrest rate of their caseloads will be $\Delta\beta$.
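The scaling argument can be written out in one line. As a sketch, write the treatment effect as $\beta$ and the gap in incarceration probability as $\Delta$, and let $p_H$ and $p_L$ be the incarceration probabilities of the harsh and lenient judge, and $\bar{r}_C$ the mean rearrest rate the caseload would have if released. These three symbols are introduced here for illustration only. Randomization makes $\bar{r}_C$ the same for both judges, so it cancels:

```latex
\begin{aligned}
E[R \mid \text{harsh}] - E[R \mid \text{lenient}]
  &= (\bar{r}_C + p_H \beta) - (\bar{r}_C + p_L \beta) \\
  &= (p_H - p_L)\,\beta \;=\; \Delta\beta .
\end{aligned}
```

With purely hypothetical values $\Delta = 0.20$ and $\beta = 0.10$, the expected gap in caseload rearrest rates would be $0.02$. Read in reverse, the same identity implies that an observed gap in rearrest rates divided by $\Delta$ recovers $\beta$ under the homogeneity and additivity assumptions.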
The approach taken here uses the following line search to generate a $100(1-\alpha)\%$ confidence set for the parameter $\beta$. The analyst first proposes a minimum value for $\beta$. For those who were incarcerated, this proposed value is subtracted from their observed rearrest rate; call this the adjusted rearrest rate. 13 Next, a test for the equality of mean adjusted rearrest rate is conducted across all judges in the county. If the test concludes that there is statistically significant variation across judges in mean adjusted rearrest rate, then this proposed value of $\beta$ is rejected. The test is then repeated for a larger value of $\beta$ until a value of $\beta$ is found that generates mean equivalence in adjusted rearrest rate across judges. This value of $\beta$ forms the lower bound of the $100(1-\alpha)\%$ confidence set of $\beta$. This testing process continues for successively larger values of $\beta$ until the largest $\beta$ that generates mean equivalence in adjusted rearrest rate is found. This largest $\beta$ forms the upper bound of the confidence set. If the set contains zero, then this implies that $\beta$ cannot be signed.
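The line search above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the text does not specify which test of mean equality is used, so the permutation test on the between-judge sum of squares is an assumption, and the function names, grid, and defaults are all hypothetical.

```python
import random


def between_judge_ss(values, judges):
    """Between-judge sum of squares of `values` grouped by judge label."""
    groups = {}
    for value, judge in zip(values, judges):
        groups.setdefault(judge, []).append(value)
    grand_mean = sum(values) / len(values)
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())


def permutation_pvalue(values, judges, n_perm=200, seed=0):
    """P-value for equal means across judges, obtained by shuffling judge labels."""
    rng = random.Random(seed)
    observed = between_judge_ss(values, judges)
    shuffled = list(judges)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point noise near zero.
        if between_judge_ss(values, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def confidence_set(rearrest, incarcerated, judges, grid, alpha=0.05, n_perm=200):
    """Return (lower, upper) over the proposed effects in `grid` that are not rejected."""
    retained = []
    for beta0 in grid:
        # Subtract the proposed effect from the outcome of incarcerated offenders only.
        adjusted = [r - beta0 * d for r, d in zip(rearrest, incarcerated)]
        if permutation_pvalue(adjusted, judges, n_perm=n_perm) > alpha:
            retained.append(beta0)
    return (min(retained), max(retained)) if retained else None
```

Scanning a grid and keeping every value that is not rejected yields the same bounds as the sequential search described in the text, provided the grid is fine enough. When the judges differ little in their use of incarceration, few proposed values are rejected and the resulting interval is wide.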
To formalize the previous discussion, the $i$-th offender is randomly assigned to the $j$-th judge in county $k$. As previously discussed, we use six counties in this analysis, so $k \in \{\text{Centre}, \text{Crawford}, \text{Cumberland}, \text{Dauphin}, \text{Erie}, \text{Mercer}\}$. Henceforth, we suppress the notation of $k$. $Z_i$ is the judge to which the $i$-th offender is randomly assigned. We assume that the number of judges in a given county is fixed at $z$, so $Z_i \in \{1, 2, \ldots, z\}$. The $i$-th offender will be sentenced by the randomly assigned judge to either incarceration, $d_i = 1$, or released to the community, $d_i = 0$. This individual has a fixed potential response to treatment: the individual would be rearrested at rate $r_{I,i}$ if sentenced to prison (i.e., $d_i = 1$) and rate $r_{C,i}$ if released to the community (i.e., $d_i = 0$). We model individual response to incarceration as

$$r_{I,i} = r_{C,i} + \beta.$$

However, since $d_i = 1$ or $d_i = 0$, but never both, we observe only one element of the pair $(r_{I,i}, r_{C,i})$, namely the realized rearrest rate

$$R_i = d_i\, r_{I,i} + (1 - d_i)\, r_{C,i} = r_{C,i} + \beta d_i.$$

If $\beta$ were known, the adjusted rearrest rate $R_i - \beta d_i = r_{C,i}$ would not depend on whether the individual was incarcerated. Hence, in expectation, the adjusted rearrest rate would be invariant across judges who make differential use of incarceration. We do not, however, know $\beta$. To estimate $\beta$, the line search algorithm described above is used. To see why this approach is valid, suppose the proposal value is $\beta_0 \neq \beta$. Then

$$R_i - \beta_0 d_i = r_{C,i} + (\beta - \beta_0)\, d_i.$$

Thus, the adjusted rearrest rate would continue to depend on whether the individual was incarcerated. Consequently, the adjusted rearrest rate would vary across judges who make differential use of incarceration. 14 If the variation across judges in adjusted rearrest rate is statistically significant, then $\beta_0$ is not a plausible estimate of $\beta$.
In order to generate our confidence intervals we combine the observed judge effect with a model. Put differently, the way that we interpret the observed judge effect is driven by our model, and our model may be incorrect. Our model assumes a constant, additive treatment effect. If the effect of incarceration varies across offenders, this may pose a significant problem for the approach taken here. How consequential such a violation would be depends on whether judges can discern the distribution of individual-level treatment effects. If judges cannot distinguish the offender-level effect of incarceration, then the approach outlined above estimates the mean of the distribution of incarceration effects. However, if the effect of incarceration varies across offenders and judges can discern the offender-level return from incarceration and this information is used to guide the sentencing decision, then our interpretation of the observed judge effect as the average treatment effect no longer holds. See Manski and Nagin (1998) for a demonstration of how such judge discernment capability can substantially affect the bound on the treatment effect of incarceration. The effect crucially depends upon how the judge incorporates this information into the sentencing decision.
It should also be noted that this analysis considers only the effect of the in/out decision; it does not consider the impact of time served. That is, our model assumes that the dose-response relationship between time served and rearrest rate is invariant with respect to time served. If the effect of incarceration varies as a function of time served in a meaningful way, then our estimate represents the dose-response function integrated out with respect to the density of time served. Our model also does not consider the context of incarceration. It assumes the effect of incarceration in a state prison is identical to the effect of incarceration in a county jail, and it further assumes that the conditions of confinement (e.g., security level of the facility, distance from friends and family) do not influence the impact of incarceration. Although the current work ignores these considerations for the sake of clarity and tractability, they are important considerations.

Randomization
Our methodological approach relies heavily on the randomization of cases to judges. It is randomization which guarantees that the judge to whom a case is assigned is unrelated to either the characteristics of the offender or the offense. For this reason we carefully check for differences in observed case characteristics across judges to test whether, at least based on measured covariates, randomization appeared to have successfully achieved balance. On the whole, there is little evidence against the randomization hypothesis in the six counties used in this analysis.
However, even with flawless adherence to a valid randomization procedure, substantively important covariate imbalance may persist after a single randomization. We observe 42 different measurable characteristics related to offender demographics, current offense severity, and extent of prior criminal offending. Many of these characteristics have been repeatedly found to be predictive of the sentencing decision and of recidivism. To test for the successful implementation of random assignment across judges, we tested for within-county mean equality across judges for each of these characteristics. The tests were conducted at $\alpha = 0.1$. We thus expect to find imbalance in about 4 of the 42 measured covariates simply by chance. Table 2 reports the number of tests showing a significant difference for each county. Table 3 reports the covariates for which such significant differences were found. In three counties (Centre, Cumberland, and Mercer), four or fewer covariates were found to be out of balance. In three others (Crawford, Dauphin, and Erie), five or six covariates were found to be out of balance, which is one or two more than expected. A full discussion of each out-of-balance covariate in each county would be tedious, but in our judgment, none are substantively important. To illustrate this conclusion we discuss the six out-of-balance covariates detected in Crawford county. Table 4 shows the 42 covariates used in this study, the mean level of each covariate for each judge in Crawford county, and whether there existed a statistically significant difference across judges. Statistically significant inter-judge variation was found for six covariates: the previous number of burglary adjudications, the previous number of aggravated assault convictions, the previous number of rape convictions, the maximum offense gravity score for the charges contained in the case, the average offense gravity score for the charges contained in the case, and the minimum offense gravity score for the charges contained in the case. This is two more covariates out of balance than we would expect to observe by chance.
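The "expected by chance" benchmark can be sketched numerically. The snippet below takes the 42 covariates and the 0.1 significance level from the text and, as a simplifying assumption that does not hold exactly here (the three offense gravity scores, for example, are correlated), treats the 42 tests as independent. The function name and variable names are illustrative.

```python
import math

N_COVARIATES = 42   # covariates tested per county (from the text)
ALPHA = 0.1         # significance level of each test (from the text)

def prob_at_least(k, n=N_COVARIATES, p=ALPHA):
    """P(X >= k) for X ~ Binomial(n, p); ASSUMES the n tests are independent."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Expected number of covariates flagged as imbalanced purely by chance.
expected_false_positives = N_COVARIATES * ALPHA

# Chance of seeing six or more flagged covariates under valid randomization.
p_six_or_more = prob_at_least(6)

print(f"expected out-of-balance covariates by chance: {expected_false_positives:.1f}")
print(f"P(6 or more by chance, assuming independent tests): {p_six_or_more:.3f}")
```

Under the independence assumption, observing six flagged covariates in a county is not an unusual outcome, which is in line with the judgment that the imbalances are unremarkable.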
On closer inspection, although statistically significant, the magnitudes of the imbalances observed in Crawford county are small and likely of little substantive import. In 1999, two judges sentenced offenders in Crawford county, Judge M and Judge V. Judge M sentenced 300 cases while Judge V sentenced 269. Of the 300 cases sentenced by Judge M, two contained an offender with a prior burglary adjudication. Of the 269 cases sentenced by Judge V, seven contained an offender with a prior burglary adjudication. Similarly, Judge M sentenced no offenders with either a prior rape conviction or a prior aggravated assault conviction. Judge V sentenced three offenders with a prior rape conviction and three offenders with a prior aggravated assault conviction. Thus, while we were able to detect a mean difference, the size of the difference is rather small. The final three measures that differed across the two judges in Crawford county were the seriousness of the least severe charge in a case, the average seriousness of the charges in the case, and the seriousness of the most severe charge in a case. For all three seriousness covariates, the judge-specific distributions have the same 1st quartile, the same median, and the same 3rd quartile. Put differently, even though statistical tests detected significant mean differences, the judge-specific distributions were, overall, quite similar.

Variation in Incarceration
As discussed in the "Methods" section, in order to use judge as an instrument to estimate the effect of incarceration, we must first demonstrate that there exists substantial variation across judges in their willingness to use incarceration as a punitive sanction. Since randomization is conducted at the county level, all analyses are conducted within county. The grey bars in Figs. 1, 2, 3, 4, 5, 6 show the proportion of offenders sentenced to a period of incarceration by each judge. As shown in Table 5, there is statistically significant evidence of variation in the use of incarceration in five of the six counties used in this analysis.
Centre was the lone county in which we could not detect differences in the use of confinement across judges. The harshest judge in Centre county incarcerated 39.4 % of offenders s/he sentenced, while the most lenient judge incarcerated 37.5 % of offenders s/he sentenced. The difference in incarceration rates observed across the three judges was not statistically significant (p value = 0.932). In the remaining five counties, there were large, statistically significant differences across judges in their use of incarceration. In Crawford county, Judge M incarcerated 59.4 % of those sentenced, while Judge V incarcerated 70.9 % of those sentenced. This difference was statistically significant at traditional levels (p value = 0.008). In Cumberland county, the most lenient judge, Judge B, incarcerated 46.6 % of those s/he sentenced, while the most punitive judge, Judge O, incarcerated 64.5 % of those sentenced. The difference observed across the 5 judges was statistically significant (p value < 0.001). Variation in the use of incarceration was more dramatic in Dauphin county, where the most lenient judge incarcerated one-fifth of offenders while the harshest judge incarcerated nearly two-thirds of those s/he sentenced (65.3 %). The variation across the seven judges in Dauphin county was statistically significant at any traditional level (p value < 0.001). In Erie county, the difference across judges in the use of confinement was again statistically significant (p value < 0.001), with the harshest judge incarcerating 68.8 % of those sentenced and the most lenient judge incarcerating 43.0 % of those sentenced. Finally, in Mercer county, the observed difference across judges in the use of confinement was again statistically significant (p value < 0.001). The least punitive judge, Judge W, incarcerated 37.6 % of those s/he sentenced, while the harshest judge, Judge F, incarcerated over three-quarters (76.5 %) of those s/he sentenced.
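A test of equal incarceration rates across two judges can be sketched as follows. This is a minimal illustration, not the computation behind Table 5: the counts are ASSUMED by applying the Crawford percentages (59.4 % and 70.9 %) to the caseloads of 300 and 269 reported for the randomization checks, and the test is a Pearson chi-square without continuity correction. The result should therefore not be expected to match the reported p value exactly.

```python
import math

def chi_square_two_judges(inc_a, n_a, inc_b, n_b):
    """Pearson chi-square test (1 df, no continuity correction) that two
    judges incarcerate the same proportion of their caseloads."""
    a, b = inc_a, n_a - inc_a      # judge A: incarcerated, not incarcerated
    c, d = inc_b, n_b - inc_b      # judge B: incarcerated, not incarcerated
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variate with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# ASSUMED counts: 59.4 % of 300 and 70.9 % of 269, rounded to whole cases.
chi2, p_value = chi_square_two_judges(178, 300, 191, 269)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

With more than two judges the same statistic generalizes to a contingency table with one row per judge, at which point a statistics library is the more practical choice.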

Estimated Effect of Incarceration
Having demonstrated both covariate balance that is consistent with randomization and substantial inter-judge variation in the use of confinement, we now examine evidence of the effect of incarceration on 1, 2, 5, and 10 year rearrest rates. To do so, we apply the model developed in section "Results" to each of the six counties used in this work for each of the four outcomes. Again, the main result from this approach is a 95 % confidence interval for the estimated effect of incarceration. Hence, we estimate twenty-four 95 % CIs. These are shown in Table 6. If the interval falls wholly below zero, then incarceration reduces subsequent criminality. If the interval falls entirely above zero, then incarceration exacerbates subsequent criminality. If the interval contains zero, then we cannot sign the effect of incarceration.
No significant variation across judges in average rearrest rate implies that 0 will be contained in our confidence interval. Put differently, if we are unable to detect variation across judges in average rearrest rate, then our data do not sign the effect of incarceration. To see why, observe that no variation across judges in rearrest rate despite large differences in the use of confinement means $\Delta\beta = 0$ despite observing $\Delta \neq 0$. This clearly implies that there is insufficient evidence in the data to conclude that $\beta \neq 0$.
In Centre county, there was no statistically significant evidence of variation across judges in their willingness to use incarceration as a punitive sanction (p values reflect a Fisher's exact or Chi square test of the equality of the proportion confined across all judges in the county). Put differently, the instrument is very weak in Centre county. Consequently, our ability to detect an effect in Centre county is seriously compromised. An adequate model should indicate this, and ours does. As shown in Table 6, our confidence interval for the effect of incarceration in Centre county is simply $(-\infty, \infty)$. In essence, no variation in the use of confinement means $\Delta = 0$, so $\Delta\beta = 0$ for any value of $\beta$. Stated differently, any value of $\beta$ is consistent with the data.
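The reasoning about weak and strong instruments can be made explicit for two judges. The sketch below assumes the constant, additive model $R_i = r_{C,i} + \beta d_i$ used in this paper and reads $\Delta$ as the inter-judge difference in incarceration rates, which is one reading of the notation. Because randomization balances $r_{C,i}$ across judges, the difference in mean rearrest rates is

```latex
E[R_i \mid Z = 2] - E[R_i \mid Z = 1]
  = \beta \,\bigl( E[d_i \mid Z = 2] - E[d_i \mid Z = 1] \bigr)
  = \beta \Delta
```

When $\Delta \neq 0$ and the left-hand side is statistically indistinguishable from zero, values of $\beta$ near zero cannot be rejected. When $\Delta = 0$, the right-hand side is zero for every $\beta$, so no value of $\beta$ can be rejected and the interval is unbounded.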
In the remaining five counties, there was a clear judge effect. Therefore, our instrument should aid in the estimation of the effect of incarceration. In Crawford county, despite differences in the willingness to use incarceration, there is little variation across judges in the average rearrest rate of their caseloads. Not surprisingly then, all confidence intervals contain zero. There is no evidence that incarceration impacts the rate at which offenders would be rearrested in the next year, the next 2 years, the next 5 years, or the next 10 years. In Crawford county, our results indicate that, with high probability, exposure to incarceration could increase the rate at which offenders are arrested in the next 10 years by as much as 0.18 arrests per year or could decrease the rate at which offenders are rearrested in the next 10 years by up to 0.48 arrests per year.
Similarly, in Cumberland county there was again statistically significant evidence of a judge effect, but relatively little variation across judges in average rearrest rates. Based on the point estimates shown in Fig. 3, average rearrest rates in the first year are slightly lower for judges who incarcerated a greater share of the offenders they sentenced. This pattern, however, is no longer evident by the tenth year after sentencing. Thus, in Cumberland county, point estimates are consistent with a mild incapacitation effect. However, all confidence intervals again contain zero. Consistent with the point estimates, our confidence interval for the effect on 1 year rearrest rates, (-1.48, 0.18), is more heavily weighted toward a suppressing effect of incarceration. However, even in the first year after sentencing we are unable to distinguish the effect from 0. For the 10 year window, ambiguity concerning the sign of the effect persists, with the 95 % CI covering the interval -0.42 to 0.23.
In Dauphin county, there was large and statistically significant variation across judges in the use of incarceration. Despite this, there is very little variation across judges in the rearrest rates of their caseloads. Not surprisingly then, all confidence intervals for the estimated effect of incarceration include zero. For the 1 year window after sentencing, at the upper bound, incarceration is estimated to increase rearrest by as much as 0.22 arrests. However, at the lower bound, incarceration is estimated to decrease rearrest by 0.28 arrests. Similarly, for the 10 year window the 95 % CI, ranging between -0.05 and 0.10, provides no evidence for signing the effect of incarceration on rearrest rates.
There is little evidence that incarceration impacts rates of rearrest in Erie county. In Fig. 6, there is no clear association between rates of confinement and rates of rearrest. This holds independent of the duration over which rearrest is measured. As would be expected given this, all confidence intervals contain zero. The estimated impact of exposure to incarceration on 1 year rearrest rates ranges between a decrease of 0.30 arrests and an increase of 0.18 arrests. Similarly, when examining the impact of incarceration on 10 year rearrest rates, estimates range from a decrease of 0.10 arrests per year to an increase of 0.17 arrests per year. Finally, in Mercer county, we again were able to distinguish a strong judge effect, but were unable to identify statistically significant variation across judges in rates of rearrest. Similar to what was observed in Cumberland county, based on point estimates there is weak evidence of a suppressing effect on rates of rearrest. Judges who incarcerated a greater share of offenders had slightly lower average rearrest rates over the observation periods considered here. Despite point estimates indicating a very mild suppressing effect, all confidence intervals for the effect of incarceration contain zero.

County-Pooled Estimate of the Effect of Incarceration
The confidence intervals thus far presented have implicitly allowed the effect of incarceration to vary freely across county. Although the notation has been suppressed, we have allowed $\beta$ to be indexed on $k$. This approach allows for a great degree of flexibility since each county's confidence interval is generated without appealing to data from other counties. While quite flexible, this estimation strategy suffers from an important limitation: degradation of statistical power.
Statistical power has important scientific consequences for this analysis. When statistical power is low, even an otherwise well-designed study may fail to detect a large effect. In this study, concerns of statistical power can be reframed as concerns about the width of the previously presented confidence intervals. As was noted above, each of the county-specific confidence intervals contains zero. That is, they do not allow us to either sign the effect of incarceration on recidivism or to determine if the effect is small whatever its sign. However, it could be the case that if the number of offenders sentenced in each county in 1999 were increased, the confidence intervals would narrow sufficiently to exclude zero, thereby signing the effect of incarceration.
Although increasing the number of offenders in 1999 is not feasible, we can increase power by carefully pooling data across counties. To do so, we make an additional assumption: the effect of incarceration is the same across all six counties used in this analysis. To see why such an assumption allows us to pool data, we briefly revisit the model developed above.
Henceforth, we no longer suppress the notation on county. At the true, but unknown, value of $\beta_k$, the adjusted rate of rearrest of individual $i$ sentenced in county $k$ is $R_{i,k} - \beta_k d_{i,k} = r_{C,i,k}$. In other words, the adjusted rearrest rate is simply the rate at which the offender would have been rearrested if not exposed to incarceration. This rate is a fixed property of the individual, and hence, is balanced across judges by randomization. Put differently, $E[R_{i,k} - \beta_k d_{i,k} \mid K, Z] = E[r_{C,i,k} \mid K, Z]$ is balanced across $Z$ within $K$ at the true value of $\beta_k$. To generate our county-specific confidence intervals we use the previously defined line search to check if $E[R_{i,k} - \beta_{0,k} d_{i,k} \mid K, Z]$ is balanced across $Z$ within $K$. The confidence interval is the set of $\beta_{0,k}$ that generate balance.
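The line search can be illustrated with a small, self-contained sketch. Several simplifications are assumed and none of them is the authors' implementation: there are only two judges, balance is tested with a large-sample Welch z test rather than ANOVA, the proposal values come from a fixed grid, and the caseloads are deterministic toy data built so that the no-incarceration rearrest rate is exactly balanced across judges. All function and variable names are hypothetical.

```python
import math
import statistics

def balance_p_value(adj_a, adj_b):
    """Two-sided large-sample (Welch z) test that two judges' caseloads
    have the same mean adjusted rearrest rate."""
    diff = statistics.fmean(adj_a) - statistics.fmean(adj_b)
    se = math.sqrt(statistics.variance(adj_a) / len(adj_a)
                   + statistics.variance(adj_b) / len(adj_b))
    if se == 0.0:
        return 1.0 if diff == 0.0 else 0.0
    return math.erfc(abs(diff) / se / math.sqrt(2.0))

def line_search_interval(judges, grid, alpha=0.05):
    """Return (low, high) over the grid values beta0 at which balance is
    not rejected, or None if every grid value is rejected.

    judges: two lists of (R, d) pairs, one list per judge.
    """
    kept = []
    for beta0 in grid:
        # Adjusted outcome R - beta0 * d for every case of every judge.
        adjusted = [[r - beta0 * d for r, d in cases] for cases in judges]
        if balance_p_value(adjusted[0], adjusted[1]) > alpha:
            kept.append(beta0)
    return (min(kept), max(kept)) if kept else None

def toy_caseload(n, n_incarcerated, beta):
    """Deterministic toy caseload: every fourth offender has r_C = 1,
    and the first n_incarcerated offenders are confined (d = 1)."""
    cases = []
    for i in range(n):
        r_c = 1.0 if i % 4 == 0 else 0.0
        d = 1.0 if i < n_incarcerated else 0.0
        cases.append((r_c + beta * d, d))
    return cases

TRUE_BETA = 0.1  # HYPOTHETICAL effect used to build the toy data
grid = [i / 100 for i in range(-100, 101)]

# Judges differ in their use of incarceration (40 % vs 70 %).
strong = [toy_caseload(200, 80, TRUE_BETA), toy_caseload(200, 140, TRUE_BETA)]
strong_interval = line_search_interval(strong, grid)

# Judges incarcerate at the same rate: no grid value can be rejected.
weak = [toy_caseload(200, 80, TRUE_BETA), toy_caseload(200, 80, TRUE_BETA)]
weak_interval = line_search_interval(weak, grid)

print("strong instrument:", strong_interval)
print("weak instrument:  ", weak_interval)
```

In the first case the retained values form a bounded interval around the value used to build the data. In the second case every grid value is retained, which is the grid-limited analogue of an unbounded interval under a weak instrument.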
We can re-express this in a conceptually equivalent way through the use of a fixed effect linear model. Suppose we have a fixed effect linear model of the adjusted rearrest rate with an intercept, $\alpha$, and a fixed effect, $\gamma_Z$, for each judge. Without loss of generality, assume that we omit judge 1 (i.e., $Z = 1$) in county $k$. In this linear model, at the true value of $\beta_k$, $\alpha$ captures the average adjusted rearrest rate for offenders in county $k$, which is simply the average rearrest rate for offenders in county $k$ if none had been exposed to incarceration. The coefficient $\gamma_Z$ measures how the average adjusted rearrest rate differs between the $Z$-th judge and the omitted judge. Randomization implies that this is balanced at the true value of $\beta_k$. Hence, $\gamma_Z = 0$ for $Z = 2, \ldots, z$ in $K$ at the true value of $\beta_k$. The fixed effect model can then be used to generate a confidence set by conducting the line search previously defined but substituting a test of the null $\gamma_Z = 0$ for $Z = 2, \ldots, z$ in $K$ in the fixed effect model for a test of mean equality using ANOVA. The two approaches will generate virtually identical confidence intervals.
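A specification consistent with the definitions in this passage can be sketched as follows. It is an illustration rather than the authors' displayed equation: the indicator notation $\mathbf{1}\{\cdot\}$ and the error term $\varepsilon_{i,k}$ are assumptions introduced here.

```latex
R_{i,k} - \beta_{0,k}\, d_{i,k}
  = \alpha + \sum_{Z=2}^{z} \gamma_Z \,\mathbf{1}\{Z_i = Z\} + \varepsilon_{i,k}
```

Under this sketch, with judge 1 omitted, $\gamma_Z$ is the difference between the mean adjusted rearrest rate of judge $Z$ and that of judge 1, and a proposal value $\beta_{0,k}$ is retained when the joint null $\gamma_2 = \cdots = \gamma_z = 0$ is not rejected.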
The added value of the fixed effects conceptualization is that it can be immediately extended to accommodate multiple counties, thereby allowing the data to be pooled. Again, we layer on the additional assumption that the effect of incarceration does not vary across county. When including multiple counties, the model contains a separate intercept $\alpha$ for each county and a fixed effect $\gamma_{Z,k}$ for each non-omitted judge in each county. As before, at the true value of $\beta$, each county-specific $\alpha$ measures the average rearrest rate of the offenders in the county had they not been exposed to incarceration. Given that there are likely idiosyncratic county-level factors that inform the tendency to recidivate (e.g., differing access to social services, differing labor markets and employment opportunities, differing distributions of socio-economic status), there is good reason to believe that the county-specific $\alpha$s will not all be equal. This is not problematic. Also as before, the $\gamma_{Z,k}$ measure the difference in the average adjusted rearrest rate between judge $Z$ and the omitted judge. At the true value of $\beta$, $\gamma_{Z,k} = 0$ for $Z = 2, \ldots, z$ and all $K$. To generate our joint confidence intervals, which make use of all data, we use the line search previously outlined but test the null $\gamma_{Z,k} = 0$ for $Z = 2, \ldots, z$ and all $K$ in the fixed effects linear model. If we fail to reject this null hypothesis, then the proposal value is retained in the confidence interval.
Table 7 presents the cross-county pooled confidence intervals for each of the four post-release observation windows. As would be expected given our gain in statistical power and the nature of the estimation strategy, the cross-county confidence intervals are substantially tighter than those generated using only information from a single county. Similar to the findings from the county-specific confidence intervals, zero is contained in the 1, 2, and 10 year pooled confidence intervals. Put differently, when considering the effect of incarceration on recidivism there is again no evidence that incarceration impacts the rate at which offenders will be rearrested in the next 1, 2, or 10 years. With high probability, exposure to incarceration reduces the number of rearrests by no more than one-quarter of an arrest and increases the number of rearrests by no more than one-eighth of an arrest in the year after sentencing. Similarly, our pooled confidence interval indicates that in the 2 years after sentencing, incarceration reduces the number of yearly rearrests by no more than one-sixth of an arrest and increases the number of yearly arrests by no more than one-eighth of an arrest. When looking at the confidence interval employing the longest observation window, exposure to incarceration changes the yearly rate of rearrest by less than one-tenth of an arrest.
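The pooled specification described in this section can be sketched in the same spirit. As with any reconstruction, this is an illustration under stated assumptions rather than the authors' displayed equation: the indicator notation, the county index $k'$ in the sums, and the error term $\varepsilon_{i,k}$ are introduced here.

```latex
R_{i,k} - \beta_0\, d_{i,k}
  = \sum_{k'} \alpha_{k'} \,\mathbf{1}\{K_i = k'\}
  + \sum_{k'} \sum_{Z=2}^{z} \gamma_{Z,k'} \,\mathbf{1}\{K_i = k',\, Z_i = Z\}
  + \varepsilon_{i,k}
```

The single proposal value $\beta_0$, shared by all counties, is what encodes the common-effect assumption. The county intercepts absorb differences in baseline recidivism, so only within-county differences across judges bear on whether $\beta_0$ is retained.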
While these intervals are substantially narrower than those found in the county-specific analysis, they are still relatively wide when compared to the average rearrest rate. The average rearrest rate, across all six counties, in the first year after sentencing was 0.21 arrests, while our confidence interval was 0.36 arrests wide. In the first 2 years after sentencing, offenders sentenced in the six counties considered here average 0.19 rearrests per year, while our confidence interval for the estimated effect of incarceration is 0.27 arrests per year wide. Similarly, in the first 10 years after sentencing, the average rearrest rate observed in the six counties used in this analysis was 0.17 arrests per year, while our confidence intervals had a width of 0.15 rearrests per year. Put differently, pooling data substantially narrowed our confidence intervals, but the effect is still imprecisely estimated when measured against the tendency to recidivate.
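The comparison between interval width and the base rate is simple arithmetic, sketched below with the figures quoted in this paragraph. The variable names are illustrative.

```python
# Values quoted in the text: (window, mean yearly rearrest rate, pooled CI width).
pooled = [
    ("1 year", 0.21, 0.36),
    ("2 years", 0.19, 0.27),
    ("10 years", 0.17, 0.15),
]

# Width of the interval expressed as a multiple of the mean rearrest rate.
relative_width = {window: width / mean for window, mean, width in pooled}

for window, ratio in relative_width.items():
    print(f"{window}: CI width is {ratio:.2f} times the mean rearrest rate")
```

For the 1 and 2 year windows the interval is wider than the mean rearrest rate itself, and for the 10 year window it is only slightly narrower.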
In the case of the confidence interval for the 5 year recidivism window, there were no values of $\beta$ that were consistent with the data. Thus, the confidence set for the effect of incarceration in the 5 years after sentencing is empty. There is more than one interesting explanation for this. First note that, given our estimation strategy, the pooled confidence intervals are essentially the intersection of the county-specific confidence intervals. When looking at the county-specific confidence intervals for the effect of incarceration on 5 year recidivism, the intersection of the county-specific confidence intervals is quite small (0.00, 0.02). Given this small intersection and the improved statistical power, it is not surprising that no value consistent with the data could be found.
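The intersection logic can be sketched with the county-specific intervals quoted earlier in the text. The 5 year intervals are not quoted individually, so the example uses the 10 year intervals instead, and it omits Mercer because its interval is not quoted either. The second call uses HYPOTHETICAL intervals to show how an empty set arises. The function name is illustrative.

```python
import math

def intersect(intervals):
    """Intersect closed intervals given as (low, high); return None if empty."""
    low = max(lo for lo, _ in intervals)
    high = min(hi for _, hi in intervals)
    return (low, high) if low <= high else None

# 10 year county-specific intervals as quoted in the text.
# Mercer is omitted because its interval is not quoted in the text.
ten_year = {
    "Centre": (-math.inf, math.inf),
    "Crawford": (-0.48, 0.18),
    "Cumberland": (-0.42, 0.23),
    "Dauphin": (-0.05, 0.10),
    "Erie": (-0.10, 0.17),
}
common = intersect(list(ten_year.values()))
print("overlap of quoted 10 year intervals:", common)

# HYPOTHETICAL intervals that do not overlap, giving an empty set.
empty = intersect([(-0.30, -0.05), (0.02, 0.20)])
print("non-overlapping example:", empty)
```

The overlap is driven entirely by the narrowest county interval at each end, which is why a single tightly estimated county can make the common set very small or empty.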
Alternatively, the empty confidence interval could indicate that the assumption of a common treatment effect across counties is untenable. Although the evidence is quite fragile, this finding may indicate that the effect of incarceration depends on the community from which the offender is drawn. Just as it would not be surprising if the tendency to offend varied by county, it might not be surprising to find that the effect of incarceration varies across county.

Discussion and Conclusions
On the whole, the results provide no indication of whether the experience of incarceration increases or decreases the reoffending rate. This holds whether we observe rearrest during a very short window (1 year) or a long window (10 years). The result holds across all six counties and persists even after pooling data to increase power. This result is consistent with an emerging body of work that uses randomization of cases to judges as the basis for concluding that incarceration has no clear impact on recidivism. The findings of Green and Winik (2010) and Loeffler (2013) mirror our own; they too find no evidence of an effect of the experience of imprisonment on reoffending. Similarly, using the same data analyzed here, Anwar and Stephens (2011) find no evidence that duration of confinement impacts criminality. Further, compared to the base rearrest rate, the confidence intervals on the effect size in our analysis are generally sufficiently wide that they are not informative about whether the effect size is small, whether positive or negative. We earlier argued why we believe this analysis advances the earlier work based on the randomization of cases to judges. In particular, rather than relying on standard IV regressions, we develop an individual-level model of the effect of incarceration and show how this model allows us to trace out the impact of incarceration. Importantly, the approach applied in this paper makes it clear when the instrument is weak and provides results that are reflective of the fact that with a weak instrument it will be difficult to precisely identify a treatment effect.
Still, our study suffers from several important limitations. First, like all IV analyses, it is model based. Thus, the validity of our results depends on the tenability of our model. As was discussed in detail in section "Results", our model assumes we have a valid instrument and that the effect of incarceration is constant and additive. We further assume that, at the individual level, the decision to incarcerate is unrelated to the effect of incarceration on offending. While we conclude that the assumptions needed to use judge as an instrument are likely met in this application, we also believe that the assumption of an additive and constant treatment effect is more fragile. It would thus be valuable to examine the sensitivity of our conclusions to alternative formulations such as those posed in Manski and Nagin (1998) that assume that judges can discern to some degree individual-level response to incarceration and act upon that knowledge in sentencing decisions.
One way that our model did incorporate the possibility of effect heterogeneity was through the impact of county. Because we estimated the effect of incarceration independently in each of the six counties, the effect could vary freely across county. Despite this flexibility, we were unable to differentiate the effect from zero in any county. If there is variation in the way that communities respond to offenders, there is likely to be variation in the effect of incarceration. If the goal is the development of sound public policy, then understanding the dynamic that exists between communities and those released from prison is an interesting and important avenue for future work.
It should also be noted that in this study we observe convictions, not cases. Randomization takes place when a case is docketed. Thus, judges have the opportunity to filter cases prior to our ability to observe them. Our concerns about this limitation are assuaged for two primary reasons. First, based on measurable characteristics, there is little difference across judges in the types of cases that progress to conviction. Second, based on conversations with court officials, rates of conviction in these counties are quite high, often above 90 %. Thus, we fail to observe only a relatively small proportion of cases.
On the whole, the literature on the effect of incarceration is developing rapidly both in size and sophistication. This work aids in both of these respects, while also echoing the conclusions of the modern literature. There is little persuasive evidence that incarceration reduces future criminality.

Acknowledgments
This work was generously supported by National Science Foundation Grants SES-102459 and SES-0647576. We are also grateful to Mark Bergstrom for coordinating the data collection with the PSC, PA DOC, and the PA State Police.

Appendix
See Tables 8, 9, 10, 11, 12.
For continuous covariates, p values reflect a One-Way ANOVA test for equality of means. For binary covariates, p values reflect a Chi square test of the equality of proportions. $n_{random}$ is the number of offenders sentenced by the judge who were used in the randomization checks. $n_{analysis}$ is the number of offenders sentenced by the judge who were used in the analysis estimating the effect of incarceration.
# 0.05 < p value < 0.1; * 0.01 < p value < 0.05; ** 0.001 < p value < 0.01; *** p value < 0.0001