The Effect of Incarceration on Re-Offending: Evidence from a Natural Experiment in Pennsylvania

Objectives: This paper uses a sample of convicted offenders from Pennsylvania to estimate the effect of incarceration on post-release criminality.
Methods: To do so, we capitalize on a feature of the criminal justice system in Pennsylvania, the county-level randomization of cases to judges. We begin by identifying five counties in which there is substantial variation across judges in the use of incarceration, but no evidence indicating that the randomization process had failed. The estimated effect of incarceration on rearrest is based on a comparison of the rearrest rates of the caseloads of judges with different proclivities for the use of incarceration.
Results: Using judge as an instrumental variable, we estimate a series of confidence intervals for the effect of incarceration on one-year, two-year, five-year, and ten-year rearrest rates.
Conclusions: On the whole, there is little evidence in our data that incarceration impacts rearrest.


INTRODUCTION
The principal aim of this research is to estimate the effect on reoffending of a custodial sanction involving imprisonment as compared to a noncustodial sanction. In considering the crime prevention effects of imprisonment, it is important to differentiate between the potential preventive effect of the threat of imprisonment and the potential preventive effect of the experience of imprisonment. In criminology, the former is referred to as general deterrence. The experience of imprisonment may affect reoffending either by altering an individual's opportunity to reoffend or by altering an individual's behavior when released. The first of these is referred to in the criminological literature as incapacitation, the second as specific deterrence. The logic of incapacitation and the direction of its effect are clear. With respect to specific deterrence, however, there are many sound theoretical arguments for predicting that the experience of imprisonment will be criminogenic, not preventive.
A recent review of the empirical literature on the effect of the experience of imprisonment compared to noncustodial sanctions concludes that on balance the evidence points to a null or criminogenic effect rather than a preventive effect (Nagin, Cullen, & Jonson, 2009). However, this review also concludes that the evidentiary base for this conclusion is weak. Nagin et al. (2009) identify a long list of deficiencies in the research on the effect of imprisonment on reoffending. Included among these deficiencies are insufficient controls for age, prior record of offending, and offense severity, all of which may bias the imprisonment effect estimate. Perhaps the most important limitation of existing research is the selection problem: even in studies with extensive controls for measured differences between individuals who do and do not receive custodial sanctions, unmeasured differences that are systematically related to recidivism probability may bias the estimate of the effect of imprisonment on reoffending. To overcome the selection problem, we capitalize on the random assignment of cases to judges in the criminal courts of Pennsylvania.
Because of random assignment, there will be no systematic difference in case characteristics across judges. Cross-judge variation in punitiveness, which we demonstrate exists in Pennsylvania, is used as the basis for inferring the effect on reoffending of incarceration compared to a noncustodial sanction.

BACKGROUND
As stated above, the rationale for incapacitation is clear-cut. By incarcerating an individual, his or her opportunity to offend against the community is all but removed [3]. Through incapacitation, society averts, on average, λ crimes per year that the individual would have committed had s/he been free, where λ is the individual's expected rate of offending. While as a logical matter we know λ is non-negative [4], estimates of the size of the effect vary considerably. For example, Sweeten and Apel (2007) use matching and find that incapacitation averts approximately 6 crimes per year in their sample of 18-19 year old offenders. In contrast, Owens (2009), exploiting a change in policy, finds that incapacitation averts just under 3 crimes per year.
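The incapacitation arithmetic above is simple enough to state as code. The sketch below, a hypothetical helper rather than anything used in the study, multiplies the expected offending rate λ by years confined and, following the replacement caveat in note 4, optionally subtracts offending by a hypothetical replacement offender. The 6 and "just under 3" per-year figures are the estimates cited above; 2.9 and the replacement rate of 4.0 are illustrative inputs only.

```python
def crimes_averted(lam, years_confined, replacement_lam=0.0):
    """Expected crimes averted by confining one offender for `years_confined` years.

    lam: the offender's expected offending rate while free (crimes per year).
    replacement_lam: offending rate of a hypothetical offender who takes the
        confined offender's place in the community (0.0 means no replacement).
    """
    if min(lam, years_confined, replacement_lam) < 0:
        raise ValueError("rates and durations must be non-negative")
    individual = lam * years_confined  # individual-level incapacitation, never negative
    # The societal (net) figure can turn negative if the replacement offends more.
    return individual - replacement_lam * years_confined


print(crimes_averted(6.0, 1.0))                       # roughly the Sweeten and Apel (2007) figure
print(crimes_averted(2.9, 1.0))                       # "just under 3" (Owens, 2009); 2.9 is illustrative
print(crimes_averted(2.9, 1.0, replacement_lam=4.0))  # hypothetical replacement with a larger rate
```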
The logic of specific deterrence is grounded in the idea that the experience of imprisonment will deter reoffending, perhaps because the experience is more adverse than anticipated. Moreover, because the criminal law commonly prescribes more severe penalties for recidivists [5], the structure of the law itself may also cause previously convicted individuals to revise upward their estimates of the likelihood and/or severity of punishment for future law breaking. The experience of punishment may also affect the likelihood of future crime by decreasing the attractiveness of crime itself or by expanding alternatives to crime. While imprisoned, the individual may benefit from educational or vocational training that increases post-release non-criminal income earning opportunities (Layton MacKenzie, 2002). Other types of rehabilitation are designed to increase the capacity for self-restraint when faced with situations, like a confrontation, that might provoke a criminal act such as violence (Cullen, 2002).
[3] The offender could still engage in crimes that do not require physical proximity to the victim (e.g., fraud, extortion, identity theft). The offender may also still engage in criminal behavior directed toward fellow inmates, correctional employees, or the correctional institution.
[4] Ignoring the rare offense committed while imprisoned, this is true at the individual level. However, the number of crimes averted at the societal level need not be non-negative. If the incapacitated individual is replaced by an individual with a larger λ, then overall levels of crime may rise.
[5] For example, sentencing guidelines routinely dictate longer prison sentences for individuals with prior convictions. Similarly, it may also be the case that prosecutors are more likely to prosecute individuals with criminal histories.
On the other hand, there are many reasons for theorizing that the experience of punishment might increase an individual's future proclivity for crime. While some individuals might conclude that imprisonment is not an experience to be repeated, others might conclude that the experience was not as adverse as anticipated and as a result be more, not less, crime prone. Prisons might be 'schools for crime' where inmates learn new crime skills even as their non-crime human capital depreciates.
Associating with other, more experienced inmates could lead new inmates to adopt the older inmates' deviant value systems or enable them to learn 'the tricks of the trade' (Adams, 1996; Hawkins, 1976; Steffensmeier & Ulmer, 2005). Being punished could also elevate the offender's feelings of resentment against society (Sherman, 1992) or strengthen the offender's deviant identity (Matsueda, 1992).
The experience of imprisonment may also increase future criminality by stigmatizing the individual socially and economically. There is much evidence showing that an important part of the deterrent effect of legal sanctions stems from the expected societal reactions set off by the imposition of legal sanctions (Williams and Hawkins, 1986; Nagin and Pogarsky, 2003; Nagin and Paternoster, 1994). Prior research has found that individuals who have higher stakes in conformity are more reluctant to offend when they risk being publicly exposed (Klepper and Nagin, 1989).
While the fear of arrest and stigmatization may deter potential offenders from breaking the law, those who have suffered legal sanctions may find that conventional developmental routes are blocked. In their work on the 500 Boston delinquents initially studied by Glueck and Glueck (1950), Sampson and Laub (1997) have called attention to the role of legal sanctions in what they call the process of cumulative disadvantage. Official labeling through legal sanctions may cause the offender to become marginalized from conventional opportunities and non-criminal social networks, which in turn increases the likelihood of subsequent offending (Bernburg and Krohn, 2003). Sampson and Laub (1997) propose that legal sanctions may amplify a 'snowball' effect that increasingly 'mortgages' the offender's future by reducing conventional opportunities. Several empirical studies support the theory that legal sanctions downgrade conventional attainment (Freeman, 1996; Waldfogel, 1995, 1998; Sampson and Laub, 1993; Waldfogel, 1994; Western, 2002; Western, Kling & Weiman, 2001) and increase future offending (Bernburg and Krohn, 2003; Hagan and Palloni, 1990).
Although space does not permit an extended discussion of the evidence on the effect of imprisonment on reoffending, there are a few observations that deserve mention. First, in terms of numbers, the great majority of studies based on non-experimental data point to a criminogenic effect of custodial sanctions compared to non-custodial sanctions (Nagin et al., 2009). As already indicated, much of this research is vulnerable to the criticism that persons sent to prison are more crime prone in unmeasured ways and as a result, the seeming criminogenic effect of imprisonment is entirely or at least substantially attributable to "selection" bias.
Second, there have been a small number of experimental or quasi-experimental studies comparing custodial v. non-custodial sanctions. Nagin et al. conclude that, taken as a whole, the experimental studies also point towards a criminogenic effect of custodial sanctions. The evidence for this conclusion, however, is weak because it is based on only a small number of studies and many of the point estimates are not statistically significant. Further, several features of the samples used in these studies also limit their usefulness for understanding the effects of imprisonment on reoffending in the contemporary context of imprisonment in America. Of the five experimental examinations, two involve juveniles and all but one (Killias, Aebi, & Ribeaud, 2000) use data that are more than 20 years old. Among the four studies involving adults, only Bergman (1976) is based on a population that might be characterized as serious adult offenders [6].
Three other studies are also notable for our proposed research: Berube and Green (2007), Green and Winik (2010), and Loeffler (2011). Like this work, each of these studies uses the random assignment of cases to judges to overcome the selection problem. None of these studies found evidence that the experience of imprisonment affected reoffending. Our proposed research moves beyond these valuable efforts in several ways. First, as elaborated upon in Section 3, random case assignment guarantees that any difference across judges in the recidivism rates of their caseloads is attributable to a "judge" effect. Thus, an important first step in the analysis is establishing that "judge" treatment effects are present and large, something that was done only in the Loeffler analyses. Second, we extensively check balance across judges in observed covariates.
Third, instead of relying on the output from an instrumental variable regression, our analysis takes a different approach developed by Rosenbaum and colleagues (Imbens and Rosenbaum, 2005; Rosenbaum, 1996, 1999, 2002a, 2002b). While not necessarily better than classic econometric techniques for estimating the impact of treatment using instrumental variables, our approach adds value in two important ways. First, it develops an individual-level model of the response to incarceration. Rather than relying on stochastic disturbances in a regression framework, this approach explicitly develops a counterfactual argument and then relates our inference back to the counterfactual model. Second, and more importantly, our approach generates statistically valid confidence intervals even when the instrument is uninformative or weak. The problem of weak instruments is well known and well documented (Bound, Jaeger, and Baker, 1995; Nelson and Startz, 1990; Maddala and Jeong, 1992). The classic Two-Stage Least Squares (TSLS) approach to estimating treatment effects with instrumental variables relies on asymptotic properties. However, the finite-sample and asymptotic properties on which TSLS relies are highly questionable when the instrument is weak or uninformative (Nelson and Startz, 1990; Maddala and Jeong, 1992). Rather than obscuring the limitations associated with weak or uninformative instruments, our approach continues to yield valid confidence intervals even when the instrument is weak. Specifically, rather than providing a point estimate and associated standard error driven by incorrect asymptotics, our approach yields confidence intervals that grow in length as the information contained in the data degrades. Put differently, the approach taken in this paper will reveal how informative the data are about the treatment effect, rather than assume that they are informative.

DATA
To estimate the impact of incarceration on subsequent criminality, this study uses a sample of 6,515 offenders convicted of a criminal offense in the Court of Common Pleas in the state of Pennsylvania during 1999 who had their sentencing information forwarded to the Pennsylvania Sentencing Commission. As discussed in greater detail below, for each of these 6,515 offenders we observe basic demographic characteristics, extent and severity of offending history, seriousness of current offense, the type of punitive sanction (e.g., release to the community, sentence to county jail, sentence to state prison), and the duration of the incarceration administered by the judge. To measure future offending, we observe any arrest that occurred in the state of Pennsylvania.
We use six Pennsylvania counties to estimate the impact of incarceration on rearrest. To be clear, in Pennsylvania, randomization occurs at the level of the county, the geographic unit to which Common Pleas judges are elected. Among the duties of Common Pleas judges is the adjudication and sentencing of criminal cases. Pennsylvania is composed of sixty-seven counties.
The number of judges elected in a county depends on its population. This analysis began by identifying Pennsylvania counties that satisfied three conditions. First, pre-sentence covariates were examined to identify counties in which the 1999 randomization process achieved balance on observable covariates [7]. Second, sentencing outcomes were examined across time to identify counties in which the judiciary demonstrated stable sentencing practices. Third, conditional on satisfying the previous two requirements, counties in which there was statistically significant variation across judges in the use of confinement, whether in the form of jail or prison, were selected [8]. This process led to the identification of six counties that satisfied the selection criteria: Centre, Crawford, Cumberland, Dauphin, Erie, and Mercer Counties [9]. Figure 1 displays a county map of Pennsylvania with these six counties highlighted in red.
Figure 1. Map of Pennsylvania with Selected Counties
[7] Balance on observables is shown in Section 5.1.
As mentioned above, in these six counties 6,515 offenders were convicted of a criminal offense in the Court of Common Pleas during 1999 and had their sentencing information forwarded to the Pennsylvania Sentencing Commission [10]. The randomization of cases occurs when the case is docketed [11], which is prior to conviction. Thus, we do not observe cases that were randomly assigned to judges but did not result in conviction. From the 6,515 offenders convicted in these six counties, we set aside data from 110 (1.7%) offenders who were sentenced by a judge who sentenced fewer than 100 offenders in 1999. This restriction was made to ensure that the observed sentencing outcomes reflect each judge's true underlying tendency to mete out incarceration.
We use the information contained in these 6,405 offenders' cases to verify that the 1999 randomization achieved the desired level of balance in the pre-sentence and case disposition covariates.
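The caseload restriction described above can be sketched as a simple count-and-filter step. In this sketch, which is an illustration and not the authors' code, judges are keyed by (county, judge) so that they are counted within county; the function name and the toy county are hypothetical, while the threshold of 100 and the counts 110 and 6,515 come from the text.

```python
from collections import Counter

MIN_SENTENCES_PER_JUDGE = 100  # threshold stated in the text


def restrict_to_high_volume_judges(judge_keys, min_sentences=MIN_SENTENCES_PER_JUDGE):
    """Return indices of offenders whose judge sentenced at least `min_sentences` offenders.

    judge_keys: one (county, judge) tuple per convicted offender, so that
    judges are counted within county.
    """
    counts = Counter(judge_keys)
    return [i for i, key in enumerate(judge_keys) if counts[key] >= min_sentences]


# Hypothetical toy county: two high-volume judges and one low-volume judge.
keys = ([("county_a", "judge_1")] * 150 + [("county_a", "judge_2")] * 120
        + [("county_a", "judge_3")] * 30)
kept = restrict_to_high_volume_judges(keys)
print(len(kept), len(keys) - len(kept))

# Share of offenders set aside in the actual data, from the counts in the text.
print(f"{110 / 6515:.1%}")
```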
The pre-sentence covariates and case disposition measures used in this analysis were supplied by the Pennsylvania Sentencing Commission (PASC). PASC data allow cases to be tracked from sentencing through release or entry into the correctional system, depending on sentencing outcome. At the case level, PASC data document the county of adjudication, the judge of record, and the number of charges in each case. PASC data also record basic demographic information including age, sex, and race. Additionally, the data permit observation of each offender's prior record score [12] and the number of prior adjudications and convictions for 25 separate offense categories (e.g., number of prior burglary offenses, number of prior rape offenses, number of prior felony drug convictions). At the charge level, PASC data allow observation of the offense classification for each charge and any mitigating or aggravating circumstances.
[10] "Not all sentences are reported to the Commission. 1) Philadelphia Municipal Court sentences are not reported to the Commission. These may include DUI (driving under the influence) offenses as well as other misdemeanor offenses. 2) Offenses sentenced by district magistrates are not reported to the Commission. These typically include DUI offenses or other misdemeanor offenses. 3) Murder 1 and Murder 2 offenses, which are subject to life or death mandatory sentences, do not fall under the sentencing guidelines and are not required to be reported to the Commission. The Commission encourages reporting of the Murder 1 and Murder 2 offenses; many are reported and are included in the data collection" (Pennsylvania Sentencing Commission, 1999)
[11] We are currently in the process of determining the precise randomization mechanism used in each of the six counties.
[12] Prior record score is a numeric variable calculated by PASC which aims to encapsulate the offender's entire prior criminal history.
For the purposes of estimating the effect of incarceration, we then restrict our sample to the 6,127 offenders for whom we could locate valid correctional and arrest data. This restriction resulted in the removal of 282 (4.6%) offenders for whom either no rap sheet data could be located or for whom the correctional outcome was inconsistent with the sentencing data [13].
In this work, we measure reoffending by rearrest rate in 1, 2, 5, and 10 years after sentencing.
To generate these rates, we use rearrest in the State of Pennsylvania as measured by Pennsylvania State Police rap sheet data. This rap sheet data allows us to observe any arrest that occurred in the state of Pennsylvania between the date of sentencing and April 30, 2010.
With respect to the calculation of our outcome measure, one point merits further discussion.
Studies of the effect of imprisonment on reoffending, including analyses conducted by the authors (Nieuwbeerta, Nagin, and Blokland, 2009; Snodgrass, et al., in press), routinely correct for exposure time (time not incarcerated) in calculating rearrest rates or time to rearrest. The rationale for the exposure time correction is to avoid contaminating the behavioral effect of incarceration on reoffending with incapacitation effects.
In this analysis we do not correct for exposure time. Our changed stance reflects several considerations. Because incarceration follows randomization, incarceration should be viewed as a consequence of treatment and as such should not be statistically controlled. Suppose, for example, incarceration exacerbates criminality. Then individuals who are initially incarcerated will, on average, commit more crimes and thereby be more vulnerable to further stints of incarceration. Because their greater vulnerability to incarceration is a result of their treatment status, it should not be statistically controlled. Post-treatment adjustments for exposure time, including for the initial "treatment-status" incarceration, also create imbalances in age across treatment status. This is a very serious potential threat to identifying the treatment effect because recidivism is highly age dependent (Nagin, et al., 2009), with older adults offending at substantially lower rates than younger adults. When exposure time is corrected for, incarcerated offenders are older during their measured time at risk and, hence, less likely to offend than those who are not incarcerated. This relationship between incarceration and offending, however, is a result of the aging that takes place during incarceration rather than "the effect" of incarceration [14]. Finally, not correcting for exposure time produces a treatment effect estimate that is more relevant from a policy perspective because it measures how many additional (or fewer) offenses are incurred by society in the next t years due to the use of imprisonment.
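To make the no-exposure-correction outcome concrete, the sketch below flags whether an offender was arrested within t calendar years of sentencing, counting any time spent incarcerated as part of the window. It assumes, as an interpretation rather than a detail stated in the text, that a "rearrest rate" is the share of offenders with at least one arrest in the window; the function names and toy records are hypothetical, and only the April 30, 2010 end of rap sheet coverage comes from the text.

```python
from datetime import date

FOLLOW_UP_END = date(2010, 4, 30)  # end of rap sheet coverage stated in the text


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def rearrested_within(sentence_date, arrest_dates, years, follow_up_end=FOLLOW_UP_END):
    """True if any arrest falls in (sentence_date, sentence_date + years].

    The window runs in calendar time from sentencing; time incarcerated is NOT
    subtracted (no exposure-time correction). Returns None if the window
    extends past the end of follow-up.
    """
    window_end = add_years(sentence_date, years)
    if window_end > follow_up_end:
        return None
    return any(sentence_date < a <= window_end for a in arrest_dates)


def rearrest_rate(offenders, years):
    """Share of offenders with an observed window who were rearrested within it."""
    flags = [rearrested_within(s, arrests, years) for s, arrests in offenders]
    observed = [f for f in flags if f is not None]
    return sum(observed) / len(observed) if observed else None


# Hypothetical records: (sentencing date, list of later arrest dates).
offenders = [
    (date(1999, 6, 1), [date(2001, 3, 15)]),
    (date(1999, 9, 10), []),
    (date(1999, 2, 1), [date(1999, 11, 30), date(2004, 1, 5)]),
]
for t in (1, 2, 5, 10):
    print(t, rearrest_rate(offenders, t))
```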

METHODS
We use the instrumental variables approach advanced by Rosenbaum and colleagues (Imbens and Rosenbaum, 2005; Rosenbaum, 1996, 1999, 2002a, 2002b) to estimate the effect of imprisonment on 1-year, 2-year, 5-year, and 10-year rearrest rates. We begin with a general overview of the methodological approach and conclude with a more technical discussion. We rely on the randomization of cases to judges within county as the basis for using judge as an instrument to identify the effect of incarceration. Like all instrumental variables techniques, this approach uses the variation in treatment induced by the instrument to identify the effect of treatment. In our application, this requires that the judge to whom the individual is randomized affect the likelihood that an offender is incarcerated, net of the impact of other factors. The use of judge as an instrument also requires that the judge to whom an individual is randomized affect the likelihood of rearrest only through his/her effect on the likelihood of incarceration. These requirements are sometimes referred to as exclusion restrictions. They play a central role both in the classical approaches to instrumental variables found in the econometric literature (e.g., Angrist, Imbens, and Rubin, 1996) and in the approach used here.
The first requirement, that the instrument induces variation in treatment net of the impact of other factors, can be resolved empirically. In this application, we demonstrate that this requirement is satisfied by examining differences across judges in their tendency to use incarceration as a punitive sanction. We demonstrate that such variation exists across judges in section 5.2. This approach suffices to demonstrate treatment variation across judges due to the properties of randomization. In particular, randomization guarantees that case and offender characteristics, whether measured or unmeasured, are equivalent across judges in a county.
Hence, any variation in the use of incarceration must be attributable to a judge effect.
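The text does not say which test is used to establish variation across judges in the use of incarceration. One option in the spirit of the randomization argument above is a permutation test: under the null of no judge effect, random assignment makes offenders exchangeable across judges, so shuffling judge labels within a county gives a reference distribution for a Pearson chi-square statistic on the judge-by-incarceration table. The function names, number of permutations, and toy caseloads below are hypothetical.

```python
import random
from collections import defaultdict


def chi_square_stat(judges, d):
    """Pearson chi-square statistic for the judge-by-incarceration (z x 2) table."""
    p_all = sum(d) / len(d)  # pooled incarceration rate in the county
    cases = defaultdict(int)
    jailed = defaultdict(int)
    for j, di in zip(judges, d):
        cases[j] += 1
        jailed[j] += di
    stat = 0.0
    for j, n_j in cases.items():
        cells = ((jailed[j], p_all), (n_j - jailed[j], 1.0 - p_all))
        for observed, share in cells:
            expected = n_j * share
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def permutation_pvalue(judges, d, n_perm=2000, seed=0):
    """Share of judge-label shuffles whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = chi_square_stat(judges, d)
    shuffled = list(judges)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_stat(shuffled, d) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy county with two judges and 200 cases each.
judges = ["A"] * 200 + ["B"] * 200
d_varied = [1] * 120 + [0] * 80 + [1] * 60 + [0] * 140  # 60% vs 30% incarcerated
d_flat = [1] * 80 + [0] * 120 + [1] * 80 + [0] * 120    # 40% for both judges
print(permutation_pvalue(judges, d_varied), permutation_pvalue(judges, d_flat))
```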
The second requirement, the exclusion restriction, is an assumption that can be argued but not empirically verified. In our application, it requires that judges have no impact on the likelihood of recidivism beyond their impact on the likelihood of incarceration.
This assumption is quite reasonable given the very limited interaction between judge and offender in most circumstances. However, it is possible that this condition could be violated. For example, a stern admonishment from the bench may deter (or exacerbate) future criminality in a subset of offenders [15]. Alternatively, an informal request by the judge that local law enforcement more closely watch a given offender, thereby increasing the likelihood that s/he is observed engaging in criminal activity, would constitute a violation of the exclusion restriction. While these scenarios are possible, conversations with criminal justice practitioners in Pennsylvania indicate that they rarely occur.
As discussed above, this work uses the judge to whom a case is randomized as an instrument to identify the effect of incarceration. Our identification approach also rests on two important additional assumptions: the effect of incarceration is homogeneous [16] and additive.
These two assumptions play a critical role in the estimation strategy described below. To see why, consider the extreme case where one judge incarcerates his/her entire caseload and another judge incarcerates none of his/her caseload. If the treatment effect is additive and homogeneous, then in expectation the difference in the rearrest rates of their respective caseloads will be β, where β measures the difference in the recidivism rate in a sanction regime in which all individuals are incarcerated versus one in which none are incarcerated [17]. Because the treatment effect is assumed homogeneous, in the less extreme case where the difference in the probability of imprisonment between the harsh and lenient judge is Δ, the expected difference in the rearrest rates of their caseloads will be Δβ.
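Inverting the Δβ relationship gives the classic Wald ratio: the gap in caseload rearrest rates divided by the gap in incarceration rates. The sketch below is only an illustration of that logic with invented caseload summaries; the paper deliberately does not rely on such point estimates. The second call shows why a small Δ (a weak instrument) is a problem: the same 3-point rearrest gap implies a tenfold larger β, so sampling noise in the numerator is amplified by 1/Δ.

```python
def implied_beta(rearrest_harsh, rearrest_lenient, incarc_harsh, incarc_lenient):
    """Beta implied by two judges' caseloads under a homogeneous, additive effect.

    Under the model, E[rearrest_harsh - rearrest_lenient] = delta * beta,
    where delta = incarc_harsh - incarc_lenient.
    """
    delta = incarc_harsh - incarc_lenient
    if delta == 0:
        raise ValueError("judges do not differ in use of incarceration; beta is not identified")
    return (rearrest_harsh - rearrest_lenient) / delta


# Hypothetical caseload summaries (not study data).
print(implied_beta(0.43, 0.40, 0.60, 0.30))  # delta = 0.30
print(implied_beta(0.43, 0.40, 0.33, 0.30))  # delta = 0.03: same gap, tenfold larger beta
```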
The approach taken here uses the following line search to generate a confidence set for the parameter β. The analyst first proposes a minimum value for β. For those who were incarcerated, this proposed value is subtracted from their observed rearrest rate; call this the adjusted rearrest rate [18]. Next, a test for the equality of mean adjusted rearrest rates is conducted across all judges in the county. If the test concludes that there is statistically significant variation across judges in mean adjusted rearrest rate, then this proposed value of β is rejected. The test is then repeated for larger values of β until a value is found for which equality of mean adjusted rearrest rates across judges is not rejected. This value of β forms the lower bound of the confidence set for β. The testing process continues for successively larger values of β until the largest β for which equality is not rejected is found. This largest β forms the upper bound of the confidence set. If the set contains zero, then β cannot be signed.
[15] However, respect for authority is not traditionally a hallmark of the criminally involved.
[16] This is a strong assumption, but one that can be relaxed. If there exists treatment effect heterogeneity, but judges do not sort offenders into prison based on the offender's return from prison, then this approach estimates the mean of the treatment effect distribution. See, for example, Heckman, Schmierer, and Urzua (2010).
[17] Under the assumptions used here, β can also be interpreted as the individual-level effect of incarceration.
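The line search can be sketched as a grid search that inverts a randomization test. The text does not specify the test used at each step, so this sketch assumes a permutation test of the between-judge sum of squares in adjusted rearrest. Under the hypothesis β = β0 the adjusted outcome no longer depends on incarceration, so with complete random assignment within a county (an assumption, since note 11 says the exact mechanism is still being determined) judge labels can be shuffled to obtain its null distribution. The grid, α, number of permutations, function names, and toy county are all hypothetical choices.

```python
import random
from collections import defaultdict


def between_judge_ss(judges, y):
    """Between-judge sum of squares: sum over judges of n_j * (mean_j - grand mean)^2."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for j, yi in zip(judges, y):
        sums[j] += yi
        counts[j] += 1
    grand = sum(y) / len(y)
    return sum(counts[j] * (sums[j] / counts[j] - grand) ** 2 for j in counts)


def confidence_set(judges, d, r, grid, alpha=0.05, n_perm=200, seed=0):
    """Return the proposed beta0 values NOT rejected by a permutation test.

    judges: judge label per offender (one county); d: 1 if incarcerated else 0;
    r: observed rearrest outcome; grid: candidate beta0 values in increasing order.
    """
    rng = random.Random(seed)
    # Reuse one set of label permutations for every beta0 (keeps the sketch fast).
    perms = []
    for _ in range(n_perm):
        shuffled = list(judges)
        rng.shuffle(shuffled)
        perms.append(shuffled)
    accepted = []
    for beta0 in grid:
        # Adjusted rearrest: subtract beta0 from the incarcerated only.
        adjusted = [ri - beta0 * di for ri, di in zip(r, d)]
        observed = between_judge_ss(judges, adjusted)
        hits = sum(between_judge_ss(p, adjusted) >= observed - 1e-12 for p in perms)
        if (hits + 1) / (n_perm + 1) > alpha:  # equal judge means not rejected
            accepted.append(beta0)
    return accepted


def bounds(accepted):
    """Lower and upper ends of the confidence set, as in the line search."""
    return (min(accepted), max(accepted)) if accepted else None


# Hypothetical toy county (not study data): judge "A" incarcerates 150 of 200,
# judge "B" incarcerates 50 of 200, and 40% are rearrested under either sanction.
judges = ["A"] * 200 + ["B"] * 200
d = [1] * 150 + [0] * 50 + [1] * 50 + [0] * 150
r = ([1] * 60 + [0] * 90) + ([1] * 20 + [0] * 30) + ([1] * 20 + [0] * 30) + ([1] * 60 + [0] * 90)
grid = [round(-1.0 + 0.1 * k, 10) for k in range(21)]
accepted = confidence_set(judges, d, r, grid)
lo, hi = bounds(accepted)
print(lo, hi, "contains zero:", lo <= 0.0 <= hi)
```

Taking the minimum and maximum accepted values mirrors the lower and upper bounds described in the text; in general the set of accepted values need not be contiguous, so inspecting `accepted` directly is the more careful check.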
To formalize the previous discussion, suppose the $i$-th offender is randomly assigned to the $j$-th judge in county $k$. As previously discussed, we use six counties in this analysis, so $k \in \{1, \dots, 6\}$. Henceforth we suppress the notation of $k$.
Let $Z_i$ denote the judge to whom the $i$-th offender is randomly assigned. We assume that the number of judges in a given county is fixed at $z$, so $Z_i \in \{1, \dots, z\}$. The $i$-th offender will be sentenced by the randomly assigned judge either to incarceration, $d_i = 1$, or to release to the community, $d_i = 0$. This individual has a fixed potential response to treatment: the individual would be rearrested at rate $r_{Ti}$ if sentenced to prison (i.e., $d_i = 1$) and at rate $r_{Ci}$ if released to the community (i.e., $d_i = 0$). We model the individual response to incarceration as $r_{Ti} = r_{Ci} + \beta$. However, since either $d_i = 1$ or $d_i = 0$, but never both, we observe only one element of the pair $(r_{Ti}, r_{Ci})$, namely the realized rearrest rate, $R_i = d_i r_{Ti} + (1 - d_i) r_{Ci} = r_{Ci} + \beta d_i$.
Because $\beta$ is assumed to be constant across $i$, if its value were known the adjusted rearrest rate would be $R_i - \beta d_i = r_{Ci}$. The adjusted rearrest rate would not depend on whether the individual was incarcerated. Hence, in expectation, the adjusted rearrest rate would be invariant across judges who make differential use of incarceration. We do not, however, know $\beta$. To estimate $\beta$, the line search algorithm described above is used. To see why this approach is valid, suppose the proposed value is $\beta_0$, where $\beta_0 \neq \beta$. Then the adjusted rearrest rate is $R_i - \beta_0 d_i = r_{Ci} + (\beta - \beta_0) d_i$.
Thus, the adjusted rearrest rate would continue to depend on whether the individual was incarcerated. Consequently, the adjusted rearrest rate would vary across judges who make differential use of incarceration [19]. If the variation across judges in adjusted rearrest rate is statistically significant, then $\beta_0$ is not a plausible value of $\beta$.
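The step from the individual-level equation to judge-level means can be written out explicitly. Writing $\mu_C$ for the county mean of $r_{Ci}$ and $p_j$ for judge $j$'s probability of incarceration (notation introduced here for illustration), random assignment implies $\mathbb{E}[r_{Ci} \mid Z_i = j] = \mu_C$ for every judge, so:

```latex
\begin{aligned}
\mathbb{E}\left[R_i - \beta_0 d_i \mid Z_i = j\right]
  &= \mathbb{E}\left[r_{Ci} \mid Z_i = j\right] + (\beta - \beta_0)\,\Pr(d_i = 1 \mid Z_i = j) \\
  &= \mu_C + (\beta - \beta_0)\,p_j, \\
\mathbb{E}\left[R_i - \beta_0 d_i \mid Z_i = j\right] - \mathbb{E}\left[R_i - \beta_0 d_i \mid Z_i = j'\right]
  &= (\beta - \beta_0)\,(p_j - p_{j'}).
\end{aligned}
```

For any pair of judges with $p_j \neq p_{j'}$, the expected adjusted rates coincide only when $\beta_0 = \beta$. Setting $\beta_0 = 0$ and $p_j - p_{j'} = \Delta$ recovers the $\Delta\beta$ difference in raw rearrest rates discussed earlier.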
As noted above, in order to generate our confidence intervals we combine the observed judge effect with a model. Put differently, the way that we interpret the observed judge effect is driven by our model, and our model may be incorrect. Our model assumes a constant, additive treatment effect. If the effect of incarceration varies across offenders, this may pose a significant problem for the approach taken here. How consequential such a violation would be depends on whether judges can discern the distribution of individual-level treatment effects. If judges cannot distinguish the offender-level effect of incarceration, then the approach outlined above estimates the mean of the distribution of incarceration effects. However, if the effect of incarceration varies across offenders and judges can discern the offender-level return from incarceration and this information is used to guide the sentencing decision, then our interpretation of the observed judge effect as the average treatment effect no longer holds. See Manski and Nagin (1998) for a demonstration of how such judge discernment capability can substantially affect the bound on the treatment effect of incarceration. The effect crucially depends upon how the judge incorporates this information into the sentencing decision.
It should also be noted that this analysis considers only the effect of the in/out decision; it does not consider the impact of time served. That is, our model assumes that the effect of incarceration on rearrest does not vary with time served. If the effect of incarceration varies as a function of time served in a meaningful way, then our estimate represents the dose-response function integrated with respect to the density of time served. Our model also does not consider the context of incarceration. It assumes the effect of incarceration in a state prison is identical to the effect of incarceration in a county jail, and it further assumes that the conditions of confinement (e.g., security level of the facility, distance from friends and family) do not influence the impact of incarceration. Although the current work ignores these considerations for the sake of clarity and tractability, they are important.
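The "integrated out" statement can be illustrated with a small sketch: if a dose-response function β(s) gives the effect of serving s months, the in/out estimate corresponds, under the other assumptions of this section, to the average of β(s) over the distribution of time served among those incarcerated. Both the dose-response function and the time-served values below are invented for illustration; the empirical average stands in for the integral over the density of time served.

```python
from statistics import fmean


def integrated_effect(dose_response, time_served):
    """Average of dose_response(s) over the empirical distribution of time served.

    Discrete analogue of integrating beta(s) against the density f(s) of time served.
    """
    if not time_served:
        raise ValueError("need at least one time-served value")
    return fmean(dose_response(s) for s in time_served)


def beta_of_s(months):
    """Hypothetical dose-response: effect grows by 0.005 per month served."""
    return 0.005 * months


months_served = [1, 1, 3, 6, 12, 24]  # hypothetical time-served sample
print(integrated_effect(beta_of_s, months_served))
```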

RANDOMIZATION
As stated above, our methodological approach relies heavily on the randomization of cases to judges. It is randomization that guarantees that the judge to whom a case is assigned is unrelated to the characteristics of either the offender or the offense. Without this property, the methodological approach taken in this work would not suffice to identify any causal effect. To see why, suppose two judges violated randomization by employing a trading scheme in which one judge exchanged a small number of serious cases for a large number of less serious cases with the second judge. If the tendency to recidivate is greater for those accused of serious offenses, then we would expect the judge receiving the more serious offenses to have greater average recidivism. However, this difference in recidivism would be attributable to differences in the caseload characteristics of the judges rather than the effect of incarceration. If the characteristics defining the swapped cases were measured, such a trading scheme would be detectable by comparing the distribution of offense characteristics between the two judges. For this reason we carefully check for differences in observed case characteristics across judges to test whether, at least based on measured covariates, randomization appears to have successfully achieved balance. On the whole, there is little evidence against the randomization hypothesis in the six counties used in this analysis.
However, even with flawless adherence to a valid randomization procedure, substantively important covariate imbalance may persist after a single randomization. Moreover, a hypothesis test at the significance level α will incorrectly reject the null hypothesis of balance with probability α.
We observe 42 measurable characteristics related to offender demographics, current offense severity, and the extent of prior criminal offending²⁰. Many of these characteristics have repeatedly been found in the criminological literature to predict both the sentencing decision and recidivism. Testing each covariate at the 0.1 level, we thus expect about 4 of the 42 covariates (42 × 0.1 ≈ 4.2) to appear out of balance simply by chance. In half of our counties, there is more balance than we would expect by chance. In the remainder, we observe one or two more covariates out of balance than expected. Tables 1 through 6 show the 42 covariates used in this study, the mean level of each covariate by judge, and whether there was a statistically significant difference across judges. Table 7 shows the number of covariates out of balance in each county.
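To make the "about 4 by chance" benchmark concrete, the sketch below treats each of the 42 balance tests as an independent trial that falsely rejects with probability 0.1 (the level stated in the note to Table 7). Independence is our simplifying assumption, not the authors': correlated covariates make the true null distribution more dispersed than a binomial, so the printed tail probabilities are only a rough guide.

```python
from math import comb


def expected_imbalanced(n_covariates: int, alpha: float) -> float:
    """Expected number of covariates flagged at level alpha when balance truly holds."""
    return n_covariates * alpha


def prob_at_least(k: int, n_covariates: int, alpha: float) -> float:
    """P(X >= k) for X ~ Binomial(n_covariates, alpha).

    Treats the covariate tests as independent, which is only an approximation:
    demographic and criminal-history covariates are correlated in practice.
    """
    return sum(
        comb(n_covariates, j) * alpha**j * (1 - alpha) ** (n_covariates - j)
        for j in range(k, n_covariates + 1)
    )


# Counts of out-of-balance covariates reported in Table 7.
table7 = {"Centre": 3, "Crawford": 4, "Cumberland": 6, "Dauphin": 5, "Erie": 6, "Mercer": 4}

print(f"expected by chance: {expected_imbalanced(42, 0.10):.1f}")
for county, k in table7.items():
    print(f"{county:<11} P(X >= {k}) = {prob_at_least(k, 42, 0.10):.2f}")
```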

VARIATION IN INCARCERATION
²¹ Here, seriousness is defined by the Offense Gravity Score (OGS).
As discussed in the methods section, in order to use judge as an instrument to estimate the effect of incarceration, we must first demonstrate that there exists substantial variation across judges in their willingness to use incarceration as a punitive sanction. Since randomization is conducted at the county level, all analyses are conducted within county. The blue bars in Figures 5 through 10 show the proportion of offenders sentenced to a period of incarceration by each judge. As shown in Table 2, there is statistically significant evidence of variation in the use of incarceration in five of the six counties used in this analysis.
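The source does not say which test it used to detect variation in incarceration rates. As one hedged possibility, the sketch below runs a permutation test: it shuffles judge labels within a county and asks how often a between-judge spread in incarceration rates at least as large as the observed one arises by chance. A chi-square test of homogeneity of proportions would be the textbook alternative. The function names and caseloads are invented for illustration.

```python
import random


def between_judge_ss(values, judges):
    """Sum over judges of n_j * (judge mean - overall mean)^2."""
    grand = sum(values) / len(values)
    groups = {}
    for value, judge in zip(values, judges):
        groups.setdefault(judge, []).append(value)
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())


def judge_variation_pvalue(incarcerated, judges, n_perm=2000, seed=1):
    """Permutation p-value for H0: every judge incarcerates at the same rate.

    `incarcerated` holds 0/1 sentencing outcomes; `judges` holds the judge label
    of each case within a single county.
    """
    rng = random.Random(seed)
    observed = between_judge_ss(incarcerated, judges)
    shuffled = list(judges)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_judge_ss(incarcerated, shuffled) >= observed - 1e-12:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical county: three judges with 60 cases each.
judges = ["A"] * 60 + ["B"] * 60 + ["C"] * 60
incarcerated = ([1] * 30 + [0] * 30) + ([1] * 15 + [0] * 45) + ([1] * 45 + [0] * 15)
print(judge_variation_pvalue(incarcerated, judges))
```

The small p-value for this toy county reflects judges who incarcerate 50%, 25%, and 75% of their caseloads; a county like Centre, where rates barely differ, would instead produce a large p-value.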

ESTIMATED EFFECT OF INCARCERATION
Having demonstrated both covariate balance that is consistent with randomization and substantial inter-judge variation in the use of confinement, we now examine evidence of the effect of incarceration on one-, two-, five-, and ten-year rearrest rates. To do so, we apply the model developed in Section 4 to each of the six counties used in this work for each of the four outcomes.
Again, the main result of this approach is a 95% confidence interval for the effect of incarceration. Hence, we estimate twenty-four 95% confidence intervals (six counties × four outcomes). These are shown in Table 9. If the interval falls wholly below zero, then incarceration reduces subsequent criminality. If the interval falls wholly above zero, then incarceration exacerbates subsequent criminality. If the interval contains zero, then we cannot sign the effect of incarceration. The absence of significant variation across judges in the average rearrest rate implies that 0 will be contained in our confidence interval. Put differently, if we are unable to detect variation across judges in the average rearrest rate, then our data do not sign the effect of incarceration. To see why, observe that when the average rearrest rate is the same for every judge despite large differences across judges in the use of confinement, the unadjusted rearrest rate (the adjusted rate at β = 0) is already balanced across judges, so β = 0 is not rejected. This clearly implies that there is insufficient evidence in the data to conclude that β ≠ 0.
************************************ Insert Table 9 County-Specific C.I. Estimates About Here ************************************
In Centre county, there was no statistically significant evidence of variation across judges in their willingness to use incarceration as a punitive sanction. Put differently, the instrument is very weak in Centre county. Consequently, our ability to detect an effect in Centre county is seriously compromised. An adequate model should indicate this, and ours does. As shown in Table 9, our confidence interval for the effect of incarceration in Centre county is simply the entire real line. In essence, no variation in the use of confinement means that subtracting β times the incarceration indicator shifts every judge's average rearrest rate by the same amount, so the adjusted rearrest rates remain balanced across judges for any value of β. Stated differently, any value of β is consistent with the data.
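Both cases in this passage can be seen in the simplest setting of two judges. Writing Ȳ_z and D̄_z for the average rearrest rate and the share incarcerated in judge z's caseload (our notation, chosen for this sketch), balance of the adjusted rate across the two judges requires:

```latex
\bar{Y}_1 - \beta\,\bar{D}_1 = \bar{Y}_2 - \beta\,\bar{D}_2
\quad\Longleftrightarrow\quad
\beta\,\bigl(\bar{D}_1 - \bar{D}_2\bigr) = \bar{Y}_1 - \bar{Y}_2 .
```

If Ȳ_1 = Ȳ_2 while D̄_1 ≠ D̄_2, the equation holds at β = 0, which is why undetectable variation in rearrest keeps 0 inside the interval. If D̄_1 = D̄_2 as well, the left-hand side is zero for every β, which mirrors the Centre county result. In real data these equalities hold only up to sampling noise, so the test-based interval retains every β for which the discrepancy is within what chance would produce, rather than a single point.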
In the remaining five counties, there was a clear judge effect, so our instrument should aid in estimating the effect of incarceration. In Crawford county, despite differences in the willingness to use incarceration, there is little variation across judges in the average rearrest rate of their caseloads. Not surprisingly, then, all confidence intervals contain zero. There is no evidence that incarceration impacts the rate at which offenders are rearrested in the next year, the next two years, the next five years, or the next ten years. In Crawford county, our 95% confidence interval for the ten-year window indicates that exposure to incarceration could increase the rate at which offenders are rearrested by as much as 0.18 arrests per year or decrease it by as much as 0.48 arrests per year.
Similarly, in Cumberland county there was again statistically significant evidence of a judge effect but relatively little variation across judges in average rearrest rates. Based on the point estimates shown in Figure 7, average rearrest rates in the first year are slightly lower for judges who incarcerated a greater share of the offenders they sentenced. This pattern, however, is no longer evident by the tenth year after sentencing. Thus, in Cumberland county, the point estimates are consistent with a mild incapacitation effect. However, all confidence intervals again contain zero.
Consistent with the point estimates, our confidence interval for the effect on one-year rearrest rates, (-1.48, 0.18), lies mostly on the side of a suppressive effect of incarceration. However, even in the first year after sentencing, we are unable to distinguish the effect from zero. For the ten-year window, the sign of the effect remains ambiguous, with the 95% CI spanning -0.42 to 0.23.
In Dauphin county, there was large and statistically significant variation across judges in the use of incarceration. Despite this, there is very little variation across judges in the rearrest rates of their caseloads. Not surprisingly, then, all confidence intervals for the effect of incarceration include zero. For the one-year window after sentencing, at the upper bound, incarceration is estimated to increase rearrest by as much as 0.22 arrests; at the lower bound, it is estimated to decrease rearrest by 0.28 arrests. Similarly, for the ten-year window, the 95% confidence interval, ranging from -0.05 to 0.10, provides no basis for signing the effect of incarceration on rearrest rates.
There is little evidence that incarceration impacts rates of rearrest in Erie county. In Figure 9, there is no clear association between rates of confinement and rates of rearrest. This holds regardless of the window over which rearrest is measured. As would be expected given this, all

COUNTY-POOLED ESTIMATE OF THE EFFECT OF INCARCERATION
The confidence intervals presented thus far have implicitly allowed the effect of incarceration to vary freely across counties. Although the notation has been suppressed, we have allowed β to be indexed by county k. This approach allows a great deal of flexibility, since each county's confidence interval is generated without appealing to data from other counties. While quite flexible, this estimation strategy suffers from an important limitation: reduced statistical power.
Statistical power has important scientific consequences for this analysis. When statistical power is low, even an otherwise well-designed study may fail to detect a large effect. In this study, concerns about statistical power can be reframed as concerns about the width of the confidence intervals presented above. As noted above, each of the county-specific confidence intervals contains zero. That is, they allow us neither to sign the effect of incarceration on recidivism nor to determine whether the effect is small, whatever its sign. However, if the number of offenders sentenced in each county in 1999 were larger, the confidence intervals might narrow enough to exclude zero, thereby signing the effect of incarceration.
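A back-of-the-envelope calculation shows why both sample size and the strength of the judge effect govern the width of these intervals. The sketch uses the two-judge Wald ratio estimator, a standard IV estimator that stands in here for the authors' test-inversion procedure. n_z, Ȳ_z, D̄_z, and σ²_U (the within-caseload variance of the adjusted rearrest rate U = Y - βD) are our notation, and the variance treats the incarceration gap in the denominator as fixed.

```latex
\hat{\beta} = \frac{\bar{Y}_1 - \bar{Y}_2}{\bar{D}_1 - \bar{D}_2},
\qquad
\hat{\beta} - \beta = \frac{\bar{U}_1 - \bar{U}_2}{\bar{D}_1 - \bar{D}_2},
\qquad
\operatorname{Var}(\hat{\beta}) \approx
\frac{\sigma_U^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}{\bigl(\bar{D}_1 - \bar{D}_2\bigr)^2}.
```

Under this approximation the interval width scales like 1/(|D̄_1 - D̄_2|·√n): quadrupling the caseloads halves the width, halving the gap in incarceration rates doubles it, and the width grows without bound as the gap goes to zero.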
Although increasing the number of offenders sentenced in 1999 is not feasible, we can increase power by carefully pooling data across counties. To do so, we make an additional assumption: the effect of incarceration is the same in all six counties used in this analysis. To see why this assumption allows us to pool data, we briefly revisit the model developed above.
Henceforth, we no longer suppress the county index. Let $Y_{ik}$ denote the rearrest rate of individual i sentenced in county k, and let $D_{ik}$ equal 1 if that individual was incarcerated and 0 otherwise. At the true, but unknown, value of β, the adjusted rearrest rate of individual i sentenced in county k is $Y_{ik} - \beta D_{ik}$. In other words, the adjusted rearrest rate is simply the rate at which the offender would have been rearrested if not exposed to incarceration. This rate is a fixed property of the individual and hence is balanced across judges by randomization. Put differently, $Y_{ik} - \beta D_{ik}$ is balanced across Z within K at the true value of β. To generate our county-specific confidence intervals, we use the previously defined line search to check, for each candidate value of β, whether $Y_{ik} - \beta D_{ik}$ is balanced across Z within K. The confidence interval is the set of values of β that generate balance.
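The line search described above can be sketched as follows. The paper's own procedure (Section 4) tests balance with ANOVA; this sketch swaps in a permutation version of the same F-test so that it needs only the Python standard library. The function names, the grid of candidate values, and the toy data are hypothetical.

```python
import random


def between_judge_ss(values, judges):
    """Sum over judges of n_j * (judge mean - overall mean)^2.

    With the values held fixed, the one-way ANOVA F statistic is an increasing
    function of this quantity, so a permutation test on it is a permutation F-test.
    """
    grand = sum(values) / len(values)
    groups = {}
    for value, judge in zip(values, judges):
        groups.setdefault(judge, []).append(value)
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())


def balance_pvalue(values, judges, n_perm=500, seed=0):
    """Permutation p-value for H0: `values` have the same mean for every judge."""
    rng = random.Random(seed)
    observed = between_judge_ss(values, judges)
    shuffled = list(judges)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_judge_ss(values, shuffled) >= observed - 1e-12:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


def line_search_ci(rearrest, incarcerated, judges, grid, alpha=0.05):
    """Return every candidate beta in `grid` whose adjusted rate
    rearrest - beta * incarcerated is not rejected as balanced across judges."""
    retained = []
    for beta in grid:
        adjusted = [y - beta * d for y, d in zip(rearrest, incarcerated)]
        if balance_pvalue(adjusted, judges) > alpha:
            retained.append(beta)
    return retained


# Hypothetical county: judge A incarcerates 40 of 50 offenders, judge B 10 of 50,
# and incarceration adds exactly 0.5 arrests per year to a baseline of 1.0.
judges = ["A"] * 50 + ["B"] * 50
incarcerated = [1] * 40 + [0] * 10 + [1] * 10 + [0] * 40
rearrest = [1.0 + 0.5 * d for d in incarcerated]
print(line_search_ci(rearrest, incarcerated, judges, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

In this noise-free toy county only β = 0.5 survives. If both judges instead incarcerated 25 of their 50 offenders, every candidate on the grid would be retained, which is the weak-instrument behavior described for Centre county.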
We can re-express this in a conceptually equivalent way through a fixed effect linear model. Suppose that, within county k, we regress the adjusted rearrest rate $Y_{ik} - \beta D_{ik}$ on a set of judge indicators:
$$Y_{ik} - \beta D_{ik} = \alpha_k + \sum_{z=2}^{J_k} \gamma_{zk}\,\mathbf{1}\{Z_{ik} = z\} + \varepsilon_{ik},$$
where $J_k$ is the number of judges in county k.
Without loss of generality, assume that we omit judge 1 (i.e., Z = 1) in county k. In this linear model, $\alpha_k = E[Y_{ik} - \beta D_{ik} \mid Z_{ik} = 1]$. In other words, at the true value of β, $\alpha_k$ captures the average adjusted rearrest rate for offenders in county k, which is simply the average rearrest rate for offenders in county k had none been exposed to incarceration. Also note that $\gamma_{zk} = E[Y_{ik} - \beta D_{ik} \mid Z_{ik} = z] - E[Y_{ik} - \beta D_{ik} \mid Z_{ik} = 1]$. Stated less formally, $\gamma_{zk}$ measures how the average adjusted rearrest rate differs between the z-th judge and the omitted judge. Randomization implies that the adjusted rearrest rate is balanced across judges at the true value of β. Hence, $\gamma_{2k} = \cdots = \gamma_{J_k k} = 0$ at the true value of β. The fixed effect model can then be used to generate a confidence set by conducting the line search previously defined, but substituting a test of this null hypothesis in the fixed effect model for the ANOVA test of mean equality. The two approaches will generate virtually identical confidence intervals.
The added value of the fixed effects formulation is that it can be immediately extended to accommodate multiple counties, thereby allowing the data to be pooled. Again, we layer on the additional assumption that the effect of incarceration does not vary across counties.
When multiple counties are included, the model for the adjusted rearrest rate is estimated jointly over all counties with a single, common β:
$$Y_{ik} - \beta D_{ik} = \alpha_k + \sum_{z=2}^{J_k} \gamma_{zk}\,\mathbf{1}\{Z_{ik} = z\} + \varepsilon_{ik}, \qquad \text{for every county } k,$$
so that each county has its own intercept $\alpha_k$ and its own judge effects $\gamma_{zk}$. As before, at the true value of β, each county-specific $\alpha_k$ measures the average rearrest rate of the offenders in county k had they not been exposed to incarceration. Given that there are likely idiosyncratic county-level factors that shape the tendency to recidivate (e.g., differing access to social services, differing labor markets and employment opportunities, differing distributions of socio-economic status), there is good reason to believe that the $\alpha_k$ will not all be equal. This is not problematic. Also as before, $\gamma_{zk}$ measures the difference in the average adjusted rearrest rate between judge z and the omitted judge in county k, and at the true value of β, $\gamma_{zk} = 0$ for every judge and county. To generate our joint confidence intervals, which make use of all the data, we use the line search previously outlined but test this joint null hypothesis in the fixed effects linear model. If we fail to reject this null hypothesis, then the proposed value of β is retained in the confidence interval.
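The pooled version can be sketched the same way. Because judges are nested within counties, the F-test that all judge effects are zero in a county fixed-effects model depends on the data only through the between-judge sum of squares computed within each county (the residual sum of squares of the county-intercept model does not change when judge labels are reshuffled within counties). The sketch therefore uses that quantity with a permutation test that reshuffles labels within counties, mirroring county-level randomization. It is our dependency-free stand-in for the regression F-test, not the authors' code; all names and data are hypothetical.

```python
import random


def judge_within_county_ss(values, counties, judges):
    """Drop in residual sum of squares from adding judge-within-county dummies to a
    model with county fixed effects: sum over (county, judge) cells of
    n_cell * (cell mean - county mean)^2. County intercepts absorb level differences."""
    cells, by_county = {}, {}
    for value, county, judge in zip(values, counties, judges):
        cells.setdefault((county, judge), []).append(value)
        by_county.setdefault(county, []).append(value)
    county_mean = {k: sum(v) / len(v) for k, v in by_county.items()}
    return sum(
        len(v) * (sum(v) / len(v) - county_mean[county]) ** 2
        for (county, _), v in cells.items()
    )


def shuffle_within_county(judges, counties, rng):
    """Permute judge labels separately inside each county (randomization is county-level)."""
    positions = {}
    for i, county in enumerate(counties):
        positions.setdefault(county, []).append(i)
    shuffled = list(judges)
    for idx in positions.values():
        labels = [judges[i] for i in idx]
        rng.shuffle(labels)
        for i, label in zip(idx, labels):
            shuffled[i] = label
    return shuffled


def pooled_ci(rearrest, incarcerated, counties, judges, grid, alpha=0.05, n_perm=500, seed=0):
    """Keep each common beta for which 'no judge effect in any county' is not rejected."""
    retained = []
    for beta in grid:
        adjusted = [y - beta * d for y, d in zip(rearrest, incarcerated)]
        observed = judge_within_county_ss(adjusted, counties, judges)
        rng = random.Random(seed)
        hits = sum(
            judge_within_county_ss(adjusted, counties, shuffle_within_county(judges, counties, rng))
            >= observed - 1e-12
            for _ in range(n_perm)
        )
        if (hits + 1) / (n_perm + 1) > alpha:
            retained.append(beta)
    return retained


# Hypothetical data: two counties with different baseline rates but a common effect of 0.5.
counties, judges, incarcerated, rearrest = [], [], [], []
for county, baseline, cases in [("X", 1.0, [("A", 40), ("B", 10)]), ("Y", 3.0, [("C", 35), ("D", 15)])]:
    for judge, n_jailed in cases:
        for i in range(50):
            d = 1 if i < n_jailed else 0
            counties.append(county)
            judges.append(judge)
            incarcerated.append(d)
            rearrest.append(baseline + 0.5 * d)

print(pooled_ci(rearrest, incarcerated, counties, judges, [0.0, 0.5, 1.0]))
```

The different county baselines (1.0 and 3.0) do not disturb the result, because the statistic only compares judges with other judges in the same county, which is the role the county-specific intercepts play in the regression.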
************************************ Insert Table 10 Pooled C.I. Estimates About Here ************************************
Table 10 presents the cross-county pooled confidence intervals for each of the four post-release observation windows. As would be expected given our gain in statistical power and the nature of the estimation strategy, the cross-county confidence intervals are substantially tighter than those generated using only information from a single county. As with the county-specific confidence intervals, zero is contained in the one-year, two-year, and ten-year pooled confidence intervals. Put differently, there is again no evidence that incarceration impacts the rate at which offenders will be rearrested in the next one, two, or ten years. At the 95% confidence level, exposure to incarceration reduces the number of rearrests in the year after sentencing by no more than one-quarter of an arrest and increases it by no more than one-eighth of an arrest.
Similarly, our pooled confidence interval indicates that in the two years after sentencing, incarceration reduces the yearly number of rearrests by no more than one-sixth of an arrest and increases it by no more than one-eighth of an arrest. For the longest observation window (ten years), exposure to incarceration changes the yearly rate of rearrest by less than one-tenth of an arrest.
While these intervals are substantially narrower than those found in the county-specific analysis, they are still relatively wide when compared to the average rearrest rate. The average rearrest rate, across all six counties, in the first year after sentencing was 0.21 arrests, while our confidence interval was 0.36 arrests wide. In the first two years after sentencing, offenders sentenced in the six counties considered here average 0.19 rearrests per year, while our confidence interval for the estimated effect of incarceration is 0.27 arrests per year wide.
Similarly, in the first ten years after sentencing, the average rearrest rate observed in the six counties used in this analysis was 0.17 arrests per year, while our confidence interval had a width of 0.15 rearrests per year. Put differently, pooling the data substantially narrowed our confidence intervals, but the effect remains imprecisely estimated relative to the underlying tendency to recidivate.
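The comparison in these two paragraphs reduces to a ratio of interval width to average rearrest rate. The few lines below simply recompute it from the figures reported in the text; the helper name is ours.

```python
def relative_width(ci_width: float, base_rate: float) -> float:
    """Width of a confidence interval expressed as a multiple of the average rearrest rate."""
    return ci_width / base_rate


# (pooled 95% CI width, average rearrest rate), both in arrests per year, as reported above.
reported = {"1 year": (0.36, 0.21), "2 years": (0.27, 0.19), "10 years": (0.15, 0.17)}
for window, (width, base) in reported.items():
    print(f"{window:>8}: CI width = {relative_width(width, base):.2f} x average rearrest rate")
```

The intervals come out at roughly 1.7, 1.4, and 0.9 times the average rate, so even the ten-year interval spans most of the baseline tendency to recidivate.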
The confidence interval for the effect of incarceration on recidivism in the five years after release merits further discussion. We could find no real value of β consistent with our data, producing an empty confidence set for the effect of incarceration in the five years after sentencing. There is more than one possible explanation for this. First, note that, given our estimation strategy, we should not be surprised that most of our pooled confidence intervals are essentially the intersection of the county-specific confidence intervals. For five-year recidivism, the intersection of the county-specific confidence intervals is quite small, (0.00, 0.02). Given this small intersection and the improved statistical power, it is not at all surprising that no value consistent with the data could be found.
Alternatively, this could be an indication that our assumption of a common treatment effect across counties is untenable. Although the evidence is quite fragile, this finding may indicate that the effect of incarceration depends on the community from which the offender is drawn. Just as it would not be surprising if the tendency to offend varied by county, it might not be surprising to find that the effect of incarceration varies across counties. Again, however, the empirical support in this analysis for such a conjecture is very weak.
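The first explanation rests on the observation that the pooled interval behaves roughly like the intersection of the county-specific intervals. The hypothetical helper below computes that intersection for made-up intervals. It is only a heuristic: the pooled test combines all counties' data, so it can reject values that lie inside every county interval, which is how a small overlap such as (0.00, 0.02) can shrink to an empty set.

```python
def intersect(intervals):
    """Intersection of closed intervals given as (lower, upper); None if it is empty."""
    lower = max(lo for lo, _ in intervals)
    upper = min(hi for _, hi in intervals)
    return (lower, upper) if lower <= upper else None


# Made-up county intervals for illustration only; they are not the Table 9 estimates.
county_cis = [(-0.30, 0.02), (0.00, 0.45), (-0.12, 0.05)]
print(intersect(county_cis))                    # a narrow overlap
print(intersect([(0.00, 0.10), (0.20, 0.30)]))  # no common value at all
```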

DISCUSSION AND CONCLUSIONS
On the whole, the results provide no indication of whether the experience of incarceration increases or decreases reoffending. This holds whether we observe rearrest over a very short window (one year) or a long window (ten years). The result holds across all six counties and persists even after pooling data to increase power. This result is quite consistent with an emerging body of work that uses randomization as the basis for concluding that incarceration has no clear impact on recidivism. Our results are very similar to those of Green and Winick (2010), Berube and Green (2007), and Loeffler (2011), who exploit the randomization of cases to judges to determine the effect of incarceration on future offending. Similarly, using the same data analyzed here, Anwar and Stephens (2011) find no evidence that the duration of confinement impacts criminality²². All arrive at the same conclusion: the data are not informative about even the sign of the effect. Further, compared to the base rearrest rate, the confidence intervals on the effect size in our analysis are generally wide enough that they are not particularly informative about whether the effect is small, whether positive or negative.
We argued earlier that this analysis advances prior work based on the randomization of cases to judges. In particular, rather than relying on standard IV regressions, we develop an individual-level model of the effect of incarceration and show how this model allows us to trace out the impact of incarceration. Importantly, the approach applied in this paper makes it clear when the instrument is weak and yields results that reflect how difficult it is to precisely identify a treatment effect with a weak instrument.
Still, our study suffers from several important limitations. First, like all IV analyses, it is model based. Thus, the validity of our results depends on the tenability of our model. As discussed in detail in Section 4, our model assumes that we have a valid instrument and that the effect of incarceration is constant and additive. We further assume that, at the individual level, the decision to incarcerate is unrelated to the effect of incarceration on offending²³. While we conclude that the assumptions needed to use judge as an instrument are likely met in this application, we also believe that the assumption of an additive and constant treatment effect is more fragile. It would thus be valuable to examine the sensitivity of our conclusions to alternative formulations, such as those posed in Manski and Nagin (1998), which assume that judges can discern, to some degree, individual-level responses to incarceration and act upon that knowledge in sentencing decisions.
One way that our model did incorporate the possibility of effect heterogeneity was through county. Because we estimated the effect of incarceration independently in each of the six counties, the effect could vary freely across counties. Despite this flexibility, we were unable to distinguish the effect from zero in any county. Nonetheless, many arguments concerning the effect of incarceration are built on the interaction between offender and community²⁴. If communities vary in how they respond to offenders, the effect of incarceration is likely to vary as well. If the goal is the development of sound public policy, then understanding the dynamic between communities and those released from prison is an interesting and important avenue for future work.
While allowing the effect to vary across counties afforded flexibility, it also reduced power.
As in all empirical work, power plays a central role in our ability to detect the effect of incarceration. This work used a sample of 6,127 offenders. However, when analyses were conducted at the county level, sample sizes ranged from 501 in Centre county to 1,988 in Dauphin county. These cases were randomized within county to between two judges (in Crawford county) and seven judges (in Dauphin county). To address this power issue, we pooled data to estimate cross-county confidence intervals. Despite the improved power, we remained unable to uncover a relationship between incarceration and the tendency to recidivate.
It should also be noted that in this study we observe convictions, not cases. Randomization takes place when a case is docketed. Thus, judges have the opportunity to filter cases prior to our ability to observe them. Our concerns about this limitation are assuaged for two primary reasons.
First, based on measurable characteristics, there is little difference across judges in the types of cases that progress to conviction. Second, based on conversations with court officials, rates of conviction in these counties are quite high, often above 90%. Thus, we fail to observe only a relatively small proportion of cases.
On the whole, the literature on the effect of incarceration is developing rapidly in both size and sophistication. This work contributes on both fronts while echoing the conclusions of the modern literature: there is little persuasive evidence that incarceration reduces future criminality.

Note to Tables 1 through 6: for each judge, the tables report the number of offenders sentenced by that judge who were used in the randomization checks and the number who were used in the analysis estimating the effect of incarceration.

Table 7. Number of covariates out of balance, by county

County        Covariates out of balance
Centre        3
Crawford      4
Cumberland    6
Dauphin       5
Erie          6
Mercer        4

Note: Balance was examined across 42 covariates. At the 0.1 level, we would expect approximately 4 covariates to be out of balance by chance.