Evaluating Education Programs That Have Lotteried Admission and Selective Attrition

We study the effectiveness of magnet programs in an urban district that ration excess demand by admission lotteries. Differential attrition arises since students who lose the lottery are more likely to pursue options outside the school district than students who win the lottery. When students leave the district, important outcome variables are often not observed. The treatment effects are not point-identified. We exploit known quantiles of the outcome distribution to construct informative bounds on treatment effects. We find that magnet programs improve behavioral outcomes but have no significant effect on achievement.

We study the effectiveness of oversubscribed magnet programs in an urban district. While debates surrounding the effectiveness of other school choice options such as charter schools and educational vouchers have attracted much attention from researchers and policy makers, magnet programs have received less attention despite the fact that they are much more prevalent than charter schools or educational voucher programs. Most urban school districts operate a variety of magnet programs that are popular and, therefore, oversubscribed.
Many school districts use lotteries to determine access to oversubscribed educational programs. Lottery winners are accepted into the program, with the ultimate choice of attendance left to the student and his family. Lottery losers do not have the option to participate in the program but have many different outside options. As a consequence, lottery losers often decide to pursue options outside of the traditional public school system and attend charter or private schools. Educational outcomes are often not observed for students who leave the school system, which creates a missing data problem. If attrition rates differ by lottery status, the randomization inherent in the lottery assignment is not necessarily sufficient to identify meaningful treatment effects. However, we can still identify and estimate informative bounds on treatment effects under fairly weak assumptions about the nature of the attrition problem.[1]
Lottery-based admission can be viewed as an experimental design with multiple sources of noncompliance that arise from parental or student decisions. We focus on two of the most important outside options: parents can send their children to a nonmagnet program within the district or they can leave the school district and send their children to a private school or a public school in a different district. We model this behavior as noncompliance with the intended treatment using latent household types. Our approach builds on the work of Angrist, Imbens, and Rubin (1996) and adds two additional latent types to the framework to deal with multiple sources of noncompliance. Differential attrition arises in our framework if there exists a household type that complies with the lottery and participates in the program if it wins the lottery but leaves the district if it loses the lottery. We denote these households as "at risk" since they are at risk of leaving the district.
Our findings show that approximately 25% of applicants to magnet programs that serve elementary school students are at risk. That suggests that magnet programs help to attract and retain students. These households come from neighborhoods that have higher incomes and a higher fraction of more educated households than neighborhoods preferred by households that stay in the district regardless of the outcome of the lottery. The at-risk households have many options outside the public school system, but apparently they view the existing magnet programs as desirable programs for their children. The market for elementary school education is more competitive than the market for middle and high school education; that is, the fraction of households at risk declines with the age of the students.
[1] Selective attrition may also arise when lottery winners who initially participate in the program drop out because they experience unfavorable outcomes.
... Pathak, Chris Taber, Ken Wolpin, Jeff Wooldridge, Tiemen Woutersen, and participants at numerous conferences and seminars. We would also like to thank the "midsized urban school district" for sharing its data. Financial support for this research is provided by the Institute of Education Sciences (IES R305A070117 and R305D090016). Contact the corresponding author, Holger Sieg, at holgers@econ.upenn.edu. Data are available as supplementary material online.
Since the fraction of at-risk households is significantly larger than zero, we face a missing data problem. We do not observe educational outcomes when these households leave the district after they do not win the lottery. As a consequence, we cannot point-identify the causal effect of magnet programs on achievement or other behavioral outcomes.[2] We therefore focus the remainder of the analysis on estimating informative bounds on these treatment effects. One prominent approach that is often used in the partial identification literature relies on "worst-case" scenarios to construct bounds for treatment effects (Manski 1990). Horowitz and Manski (2000) provide a framework that exploits the assumption that the support of the outcome variable is bounded to deal with nonrandom attrition. We follow this approach but use known quantiles of the outcome distribution to create worst-case scenarios. In particular, we use the distribution of state test scores and the districtwide distribution of offenses, suspension days, tardies, and absences.[3] To our knowledge, we are the first to use these ideas to bound treatment effects of educational programs.
We find that our bounds analysis is informative and demonstrates that magnet programs offered by the district improve behavioral outcomes such as offenses, attendance, and timeliness. Our findings for achievement effects are mixed. While the point estimates of the upper and lower bounds point to positive treatment effects, sample sizes are still too small to provide precise estimates. This is largely the case because standardized achievement tests were conducted only in grades 5, 8, and 11 during most of our sample period.
The rest of the article is organized as follows. Section II provides a brief review of the literature. Section III discusses identification and estimation of treatment effects when program participation is partially determined by lotteries and selective attrition cannot be ignored. Section IV provides some institutional background for our application and discusses our main data sources. Section V reports the empirical findings of our article. Finally, we offer some conclusions and discuss the policy implications of our work in Section VI.
[2] If there are two different types of compliers, the instrumental variable estimator does not identify a local average treatment effect. In a related paper, Heckman, Urzua, and Vytlacil (2006) also consider multiple unordered treatments with an instrument shifting agents into one of the treatments.
[3] We also implement Lee's (2009) approach, which uses sample trimming rules to construct informative bounds. Note that this approach is closely related to that of Zhang and Rubin (2003).

II. Literature Review
Our article is related to a growing literature that evaluates educational programs using lottery-based estimators.[4] Lotteries were used by Rouse (1998) to study the impact of the Milwaukee voucher program. Angrist et al. (2002) also study the effects of vouchers when there is randomization in selection of recipients from the pool of applicants using data from Colombia. Hoxby and Rockoff (2005) use lotteries to study Chicago charter schools. Cullen, Jacob, and Levitt (2006) have analyzed open-enrollment programs in the Chicago Public Schools. Hastings, Kane, and Staiger (2010) estimate a model of school choice based on stated preferences for schools in Charlotte, North Carolina. Since school attendance was partially the outcome of a lottery, they use the lottery outcomes as instruments to estimate the impact of attending the first-choice school. Abdulkadiroglu et al. (2009) and Hoxby and Murarka (2009) study charter schools in Boston and New York, respectively, and find strong achievement effects. Dobbie and Fryer (2009) study a social experiment in Harlem and show that high-quality schools or high-quality schools coupled with community investments generate the highest achievement gains. All of these papers focus on applications in which selective attrition is not present and therefore do not explicitly deal with the key selective attrition problem discussed in this article.[5]
A number of approaches have been proposed in the econometric literature to deal with selection and attrition problems. Heckman (1974, 1979) explicitly models an outcome and a selection equation.[6] In some applications, there exists an exogenous variable that affects the selection but not the outcome equation. These types of exclusion restrictions can be used in both parametric and semiparametric estimation techniques.[7] In our application, there are no obvious exclusion restrictions.
When exclusion restrictions are not available, one can often construct bounds for the treatment effects. Horowitz and Manski (2000) provide a general framework for dealing with nonrandom attrition that exploits the assumption that the support of the outcome variable is bounded. As we discussed in detail in the introduction, we follow this approach but use known quantiles of the outcome distribution to create bounds on the treatment effect.[8] An alternative approach is based on the principal stratification method that is popular in the statistics literature (Frangakis and Rubin 2002). This approach classifies individuals in latent groups according to the joint values of the potential outcome variables. For example, in a standard selection model, there are four latent groups that are implicitly defined by the potential outcomes of the employment decision in the treated and untreated state.[9] Our approach also relies on latent types and builds on Angrist et al. (1996) to account for the multiple sources of noncompliance.
[4] Angrist (1990) introduced the use of lotteries to study the impact of military service on earnings.
[5] Angrist et al. (2002) encounter a related issue of selective test participation since students in private schools are more likely to take college entrance exams than public school students.
[6] An alternative approach follows Rubin (1976) and assumes that data are missing at random, after conditioning on a set of observed variables.
[7] Some well-known examples are Heckman (1990), Ahn and Powell (1993), and Das, Newey, and Vella (2003).

A. The Research Design
We consider a research design that arises when randomization determines eligibility to participate in an educational program. A parent has to decide whether or not to enroll a student in a magnet program offered by a school district.[10] We consider only households that participate in a lottery that determines access to an oversubscribed magnet program. Let W denote a discrete random variable that is equal to one if the student wins the lottery and zero if he loses. Let w denote the fraction of households that win the lottery.
We assume that a student who wins the lottery has three options: participate in the magnet program; participate in a different, nonmagnet program offered by the same school district; or leave the district and pursue educational opportunities outside the district. A student who loses and is not an always-taker has only the last two options. Let M be one if a student attends the magnet program and zero otherwise. Finally, let A denote a random variable that is one if a student attends a school in the district and zero otherwise.
To model compliance with the intended treatment, we define five latent types to classify households into compliers and noncompliers.[11]
[8] Blundell et al. (2007) develop bounds for the quantiles of the treatment distribution rather than using an extreme quantile of the outcome distribution to bound the average treatment effect.
[9] In a recent application, Barnard et al. (2003) study the effect of school choice on test scores, and Zhang, Rubin, and Mealli (2009) evaluate the Job Corps training program.
[10] We use the terms "parent" or "household" to describe the decision maker and "student" to describe the person who participates in the program.
[11] Appendix B discusses the assumptions needed to derive these five types from the 16 possible types.
DEFINITION 1.
1. Let $s_m$ denote the fraction of "complying stayers." These households will remain in the district when they lose the lottery. If they win the lottery, they comply with the intended treatment and attend the magnet school.
2. Let $s_n$ denote the fraction of "noncomplying stayers." These households will remain in the district when they lose the lottery. If they win the lottery, they will not comply with the intended treatment and instead will attend a nonmagnet program in the district.
3. Let l denote the fraction of "leavers." These are households that will leave the district regardless of whether they are admitted to the magnet program.[12]
4. Let r denote the fraction that is "at risk." These households will remain in the district and attend the magnet program if admitted to the magnet program, and they will leave the district otherwise.
5. Let $a_t$ denote the fraction of "always-takers." They will attend the magnet school regardless of the outcome of the lottery.
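As a minimal sketch of Definition 1, the five types can be written as a lookup from lottery outcome to observed behavior (M, A), which then implies the probability of each observable (W, M, A) cell. The dictionary keys, the function name `cell_probabilities`, and the numerical shares are illustrative assumptions and do not come from the data.

```python
# Behavior (M, A) of each latent type in Definition 1, by lottery outcome W.
# M = 1 if the student attends the magnet program, A = 1 if in the district.
BEHAVIOR = {
    "complying_stayer":    {1: (1, 1), 0: (0, 1)},
    "noncomplying_stayer": {1: (0, 1), 0: (0, 1)},
    "leaver":              {1: (0, 0), 0: (0, 0)},
    "at_risk":             {1: (1, 1), 0: (0, 0)},
    "always_taker":        {1: (1, 1), 0: (1, 1)},
}

def cell_probabilities(w, shares):
    """Return Pr(W, M, A) for every feasible cell implied by the type shares."""
    probs = {}
    for name, share in shares.items():
        for W, p_w in ((1, w), (0, 1.0 - w)):
            M, A = BEHAVIOR[name][W]
            probs[(W, M, A)] = probs.get((W, M, A), 0.0) + p_w * share
    return probs

# Hypothetical shares, chosen only for illustration.
shares = {"complying_stayer": 0.50, "noncomplying_stayer": 0.10,
          "leaver": 0.05, "at_risk": 0.25, "always_taker": 0.10}
cells = cell_probabilities(0.4, shares)
```

Only six cells appear in `cells`, because no type attends the magnet program while outside the district.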
Since the household type is latent, one key empirical problem is identifying and estimating the proportions of each type in the underlying population. These parameters are informative about the effectiveness of magnet programs in attracting and retaining households that participate in the lottery. Moreover, we will show that households at risk cause the selective attrition problem.
The latent types of households are likely to differ in important characteristics, and we need to characterize these differences. If households at risk differ in observed characteristics from the other latent types, they may also differ in unobserved characteristics. As a consequence, ignoring the selective attrition problem will be problematic. By characterizing the observed characteristics of all latent types, we can thus gain some important insights into the potential importance of the selective attrition problem.
To formalize these ideas, consider a random vector X that measures observed household characteristics such as income or socioeconomic status. Appealing to our decomposition, let $m_r$, $m_{s_m}$, $m_{s_n}$, $m_l$, and $m_{a_t}$ denote the means of the random vector X conditional on belonging to group r, $s_m$, $s_n$, l, and $a_t$, respectively. Below we discuss how to identify and estimate the parameters $(w, r, s_n, s_m, l, a_t, m_r, m_{s_n}, m_{s_m}, m_l, m_{a_t})$.
[12] Parents have incomplete information and need to gather information to learn about the features of different programs. Parents have to sign up for lotteries months in advance. At that point, they have not accumulated all relevant information. Once they have accumulated all relevant information, they may decide to opt out of the public school system if their preferred choice dominates the program offered by the district. In addition, household circumstances may change. For example, parents may obtain a job that requires moving to a different metropolitan area. Note that there are typically no penalties for participating in the lottery and declining to participate in the program.
Let T be an outcome measure of interest, for example, the score on a standardized achievement test. Following Neyman (1923) and Fisher (1935), we adopt standard notation in the program evaluation literature and consider a model with three potential outcomes:
$$T = A M T_1 + A (1 - M) T_0 + (1 - A) T_2,$$
where $T_1$ denotes the outcome if the student attends the magnet school, $T_0$ if he attends a different program in the district, and $T_2$ if he attends a school outside of the district.[13] We will later assume that T is not observed for students who do not attend a public school within the district; that is, $T_2$ is not observed. This assumption is plausible since researchers typically have access to data from only one school district. Private schools rarely provide access to their confidential data and often do not administer the same standardized tests as public schools. Attention, therefore, focuses on the individual treatment effect $D = T_1 - T_0$. Note that D is unobserved for all students. Conceptually, we can define five different average treatment effects, one for each latent group:[14]
$$\mathrm{ATE}_{Type} = E[T_1 - T_0 \mid Type = 1], \quad Type \in \{S_n, S_m, R, L, A_t\}.$$
The key research question is then whether we can identify and estimate these types of treatment effects when selective attrition is important. To answer this question, we first discuss how to characterize the extent of the selective attrition problem. We then derive bounds estimators for the relevant treatment effects.
Finally, it is useful to compare our approach to the one developed in Angrist et al. (1996). Note that we have two types of "never-takers" that we denote by "noncomplying stayers" and "leavers." Similarly, we have two types of "compliers" that we denote by "complying stayers" and "at-risk" households. The main difference arises because individuals have more than one outside option and outcomes are not observed for at-risk households that leave the district when they lose the lottery.
[13] This approach shares many similarities with the "switching regression" model introduced into economics by Quandt (1972), Heckman (1978, 1979), and Lee (1979). Heckman and Robb (1985) and Bjorklund and Moffitt (1987) treated heterogeneity in treatment as a random coefficients model. It is also known in the statistical literature as the Rubin model developed in Rubin (1974, 1978). See also Heckman and Vytlacil (2007) for an overview of the program evaluation literature.
[14] There are other effects that may also be of interest such as treatment effect on the treated or the marginal treatment effect. For a discussion, see, among others, Heckman and Vytlacil (2005) and Moffitt (2008).

B. Identification of the Fraction of Latent Types
First we need to establish the information set of the researcher. We observe probabilities and conditional means for the feasible outcomes shown in table 1. Note that only six of the eight outcomes listed in table 1 are possible since a student attending a magnet program ($M = 1$) must also attend a public school ($A = 1$).
Identification can be established sequentially. First, we discuss identification of the probabilities that characterize the shares of the latent types. We have the following result.
PROPOSITION 1. The parameters $(w, r, s_n, s_m, l, a_t)$ are identified by the six nondegenerate probabilities in table 1.
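Proof. Throughout, cells are written as (W, M, A). Parameter w is the fraction that wins the lottery:
$$w = \Pr(W = 1).$$
Given w, $s_n$ is identified from (1, 0, 1):
$$\Pr(W = 1, M = 0, A = 1) = w\, s_n.$$
l is identified from (1, 0, 0):
$$\Pr(W = 1, M = 0, A = 0) = w\, l.$$
$a_t$ is identified from (0, 1, 1):
$$\Pr(W = 0, M = 1, A = 1) = (1 - w)\, a_t.$$
Given w and $s_n$, $s_m$ is identified from (0, 0, 1):
$$\Pr(W = 0, M = 0, A = 1) = (1 - w)(s_n + s_m).$$
Given $a_t$, l, $s_n$, and $s_m$, r is identified from the identity
$$s_m + s_n + l + r + a_t = 1.$$
QED
Note that there is no overidentification at this stage since the six probabilities in table 1 add up to one, and the last three nondegenerate probabilities add up to $1 - w$.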
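The constructive argument in the proof can be traced numerically. The sketch below inverts a set of cell probabilities, keyed by (W, M, A), into the type shares in the same order as the proof. The function name `identify_shares` and the probabilities are hypothetical, chosen so that the implied at-risk share is 25%.

```python
def identify_shares(p):
    """Recover (w, s_n, l, a_t, s_m, r) from cell probabilities p[(W, M, A)]."""
    w = p[(1, 1, 1)] + p[(1, 0, 1)] + p[(1, 0, 0)]   # Pr(W = 1)
    s_n = p[(1, 0, 1)] / w                  # winners who stay but decline
    l = p[(1, 0, 0)] / w                    # winners who leave the district
    a_t = p[(0, 1, 1)] / (1.0 - w)          # losers who attend the magnet anyway
    s_m = p[(0, 0, 1)] / (1.0 - w) - s_n    # losing stayers, net of s_n
    r = 1.0 - s_m - s_n - l - a_t           # shares add up to one
    return {"w": w, "s_n": s_n, "l": l, "a_t": a_t, "s_m": s_m, "r": r}

# Hypothetical cell probabilities (they sum to one).
p = {(1, 1, 1): 0.34, (1, 0, 1): 0.04, (1, 0, 0): 0.02,
     (0, 1, 1): 0.06, (0, 0, 1): 0.36, (0, 0, 0): 0.18}
est = identify_shares(p)
```

Replacing the entries of `p` with sample cell frequencies would give the corresponding plug-in estimates.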
Next we discuss identification of the five conditional means of household characteristics. We have the following result.
PROPOSITION 2. Given $(w, r, s_n, s_m, l, a_t)$, the parameters $(m_r, m_{s_m}, m_{s_n}, m_l, m_{a_t})$ are identified by the observed conditional expectations in table 1.
Proof. Parameter $m_l$ is identified from (1, 0, 0):
$$E[X \mid W = 1, M = 0, A = 0] = m_l.$$
Similarly, $m_{s_n}$ is identified from (1, 0, 1):
$$E[X \mid W = 1, M = 0, A = 1] = m_{s_n},$$
and $m_{a_t}$ is identified from (0, 1, 1):
$$E[X \mid W = 0, M = 1, A = 1] = m_{a_t}.$$
Given $m_{s_n}$, $m_{s_m}$ is identified from (0, 0, 1):
$$E[X \mid W = 0, M = 0, A = 1] = \frac{s_n m_{s_n} + s_m m_{s_m}}{s_n + s_m}.$$
Given $m_{s_m}$ and $m_{a_t}$, $m_r$ is identified from (1, 1, 1):
$$E[X \mid W = 1, M = 1, A = 1] = \frac{s_m m_{s_m} + r\, m_r + a_t m_{a_t}}{s_m + r + a_t}.$$
QED
There is one overidentifying condition for each characteristic at this stage. Propositions 1 and 2 then imply that the parameters $(w, r, s_n, s_m, l, a_t, m_r, m_{s_n}, m_{s_m}, m_l, m_{a_t})$ are identified. We can thus study the effectiveness of magnet programs to attract and retain students. Moreover, the fraction of households that are at risk is the key parameter that measures the selective attrition between lottery winners and losers.
Analyzing at-risk households is also important for the district and policy makers. Many urban districts have struggled in the past to retain students from higher-socioeconomic backgrounds. Magnet programs are perceived to be one possible solution to this problem. It is therefore important to quantify the impact of magnet programs on household retention in the district.
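The same sequential logic applies to the mean characteristics. The following sketch takes the type shares as given and backs out the five type-specific means from observed cell means of a single characteristic X. The function name `identify_means` and all numbers are illustrative assumptions; the cell means are generated from hypothetical "true" type means so that the inversion can be checked.

```python
def identify_means(sh, xbar):
    """sh: type shares; xbar[(W, M, A)]: observed cell means of X."""
    m_l = xbar[(1, 0, 0)]       # winners who leave are all leavers
    m_sn = xbar[(1, 0, 1)]      # winners who stay but decline
    m_at = xbar[(0, 1, 1)]      # losers in the magnet program
    # Losing stayers mix noncomplying and complying stayers.
    m_sm = ((sh["s_n"] + sh["s_m"]) * xbar[(0, 0, 1)]
            - sh["s_n"] * m_sn) / sh["s_m"]
    # Winning magnet students mix complying stayers, at-risk, always-takers.
    mass = sh["s_m"] + sh["r"] + sh["a_t"]
    m_r = (mass * xbar[(1, 1, 1)] - sh["s_m"] * m_sm
           - sh["a_t"] * m_at) / sh["r"]
    return {"m_l": m_l, "m_sn": m_sn, "m_at": m_at, "m_sm": m_sm, "m_r": m_r}

# Hypothetical shares and type means (e.g., neighborhood income in $1,000s).
sh = {"s_m": 0.50, "s_n": 0.10, "l": 0.05, "r": 0.25, "a_t": 0.10}
truth = {"m_sm": 50.0, "m_sn": 45.0, "m_l": 60.0, "m_r": 70.0, "m_at": 55.0}
xbar = {
    (1, 0, 0): truth["m_l"],
    (1, 0, 1): truth["m_sn"],
    (0, 1, 1): truth["m_at"],
    (0, 0, 1): (0.10 * 45.0 + 0.50 * 50.0) / 0.60,
    (1, 1, 1): (0.50 * 50.0 + 0.25 * 70.0 + 0.10 * 55.0) / 0.85,
}
means = identify_means(sh, xbar)
```

The unused cell (0, 0, 0), which mixes leavers and at-risk households, supplies the overidentifying condition mentioned in the text.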

C. Identification of Treatment Effects
We now turn to the analysis of identification of causal treatment effects of magnet programs on educational and behavioral outcomes. We assume that the researcher observes outcomes, T, only for students who remain in the school district; that is, we do not observe outcomes for leavers and at-risk households that lose the lottery. As a consequence, we face a missing-data problem in the analysis.
It is useful to assume initially that we observe the latent household type. Table 2 provides a summary of the relevant conditional expectations.[15] Conditioning on lottery outcomes, there are 10 conditional expectations. Three of these pertain to outcomes that are not observed since students in these latent groups leave the school district ($T_2$). The remaining seven conditional expectations relate to household types that remain in the district.
From table 2, it is evident that even if we observed the latent types, there is little hope of identifying $\mathrm{ATE}_{S_n}$, $\mathrm{ATE}_R$, $\mathrm{ATE}_L$, or $\mathrm{ATE}_{A_t}$. For stayers who never attend the magnet program, we cannot identify $E[T_1 \mid S_n = 1]$. For students at risk, we cannot identify $E[T_0 \mid R = 1]$. For leavers, we cannot identify either $E[T_1 \mid L = 1]$ or $E[T_0 \mid L = 1]$. For always-takers, we never observe $E[T_0 \mid A_t = 1]$. Without imposing additional assumptions on the selection of students into latent groups, $\mathrm{ATE}_{S_n}$, $\mathrm{ATE}_R$, $\mathrm{ATE}_L$, and $\mathrm{ATE}_{A_t}$ are not identified. Attention, therefore, focuses on identification of $\mathrm{ATE}_{S_m}$.
This treatment effect is of interest to policy makers since the complying stayers account for the majority of students who attend magnet schools at any point in time. Our estimates suggest that 60%-70% of all the students in our sample of applicants to magnet programs and approximately 70%-80% of all attending students fall into that category. The school district and policy makers are obviously interested in finding out whether the magnet programs improve outcomes for the majority of students who are attending the program.
[15] Note that we are implicitly assuming that the mean performance of stayers who would decline lottery admission is the same whether they win or lose the lottery, i.e., $E[T_0 \mid S_n = 1, W = 1] = E[T_0 \mid S_n = 1, W = 0] = E[T_0 \mid S_n = 1]$.
Note that $\mathrm{ATE}_{S_m}$ would be identified if types were not latent. Of course, household types are not observed, and as a consequence, identification of $\mathrm{ATE}_{S_m}$ is not straightforward. One key result of this article is that the local average treatment effect for compliers is not point-identified if there is selective attrition.
Consider the case in which there is selective attrition ($r \neq 0$). We only observe mean outcomes for the students conditional on W, M, and A. For students who win the lottery and attend the magnet school, we observe
$$E[T \mid W = 1, M = 1, A = 1] = \frac{s_m E[T_1 \mid S_m = 1] + r E[T_1 \mid R = 1] + a_t E[T_1 \mid A_t = 1]}{s_m + r + a_t}. \tag{14}$$
For students who lose the lottery and attend the magnet school, we observe
$$E[T \mid W = 0, M = 1, A = 1] = E[T_1 \mid A_t = 1]. \tag{15}$$
We also observe mean performance of stayers who lose the lottery:
$$E[T \mid W = 0, M = 0, A = 1] = \frac{s_m E[T_0 \mid S_m = 1] + s_n E[T_0 \mid S_n = 1]}{s_m + s_n}. \tag{16}$$
Finally, we also observe the mean performance of stayers who win the lottery and decline to enroll in the magnet program:
$$E[T \mid W = 1, M = 0, A = 1] = E[T_0 \mid S_n = 1]. \tag{17}$$
Equations (16) and (17) imply that we can identify $E[T_0 \mid S_m = 1]$ and $E[T_0 \mid S_n = 1]$ since $s_n$ and $s_m$ have been identified before. Equation (15) implies that we can identify $E[T_1 \mid A_t = 1]$. However, equation (14) then implies that we cannot separately identify $E[T_1 \mid S_m = 1]$ and $E[T_1 \mid R = 1]$.

This result illustrates that attrition per se is not the problem. If the fraction of at-risk households is negligible (i.e., $r = 0$), identification is achieved even if the fraction of leavers is large.[16] The lack of point identification arises from the at-risk households, which cause the selective attrition problem. Selective attrition is a problem only if at-risk households have different mean outcomes than compliers.[17] Since point identification is no longer feasible when selective attrition is not negligible, attention focuses on set identification and the construction of bounds.
PROPOSITION 3.
i. Suppose we have an upper bound, denoted by $T_1^u$, for $E[T_1 \mid R = 1]$; that is, $T_1^u$ satisfies $E[T_1 \mid R = 1] \leq T_1^u$. We can then construct a lower bound for $E[T_1 \mid S_m = 1]$ and $\mathrm{ATE}_{S_m}$.
ii. Suppose we have a lower bound, denoted by $T_1^l$, for $E[T_1 \mid R = 1]$; that is, $T_1^l$ satisfies $E[T_1 \mid R = 1] \geq T_1^l$. We can then construct an upper bound for $E[T_1 \mid S_m = 1]$ and $\mathrm{ATE}_{S_m}$.
Proof. Consider the first part of the statement. Equation (14) then implies that
$$\begin{aligned} E[T_1 \mid S_m = 1] &= \frac{s_m + r + a_t}{s_m} E[T \mid W = 1, M = 1, A = 1] - \frac{r E[T_1 \mid R = 1] + a_t E[T_1 \mid A_t = 1]}{s_m} \\ &\geq \frac{s_m + r + a_t}{s_m} E[T \mid W = 1, M = 1, A = 1] - \frac{r\, T_1^u + a_t E[T_1 \mid A_t = 1]}{s_m}, \end{aligned} \tag{18}$$
where the last inequality follows from $E[T_1 \mid R = 1] \leq T_1^u$. Since all terms in the last inequality are identified, we conclude that we can construct a lower bound. We therefore define our lower bound as the parameter value for which equation (18) holds with equality. Replacing $T_1^u$ with $T_1^l$ and reversing the inequality yields the upper bound. QED
We thus obtain a lower and an upper bound for $E[T_1 \mid S_m = 1]$. These bounds are sharp in the sense that without additional information we cannot improve on them. It is easy to see that for any value between the lower and upper bounds, there is a data-generating process such that $T_1^l \leq E[T_1 \mid R = 1] \leq T_1^u$ that is consistent with the observed means in equation (18). This follows essentially from the fact that equation (18) is linear in the observed means.
[16] Recall that if $r = l = 0$, our research design simplifies to the one considered in Angrist et al. (1996).
[17] We can generalize this result by assuming that $E[T_1 \mid S_m = 1, X] \neq E[T_1 \mid R = 1, X]$, i.e., by conditioning on some observables X. If controlling for selection on observables is sufficient to deal with the selection problem, a matching approach can be justified. For a discussion of matching estimators, see, among others, Rosenbaum and Rubin (1983), Heckman, Ichimura, and Todd (1997), and Abadie and Imbens (2006).
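To make the bounds concrete, the sketch below computes the lower and upper bounds on $E[T_1 \mid S_m = 1]$ and on $\mathrm{ATE}_{S_m}$ from the four observed cell means in equations (14)-(17), given assumed numerical bounds `t_low` and `t_high` on $E[T_1 \mid R = 1]$ (for instance, known percentiles of the outcome distribution). The function name, argument names, and all numbers are hypothetical and serve only as an illustration, not as the estimator used in the empirical analysis.

```python
def bounds_complying_stayers(s_m, s_n, r, a_t, t_win_magnet, t_lose_magnet,
                             t_lose_stay, t_win_decline, t_low, t_high):
    """Bounds on E[T1 | Sm = 1] and ATE_Sm implied by equations (14)-(18).

    t_low and t_high are assumed lower and upper bounds on E[T1 | R = 1].
    """
    e1_at = t_lose_magnet                                     # eq. (15)
    e0_sn = t_win_decline                                     # eq. (17)
    e0_sm = ((s_m + s_n) * t_lose_stay - s_n * e0_sn) / s_m   # eq. (16)
    total = (s_m + r + a_t) * t_win_magnet                    # eq. (14), numerator
    e1_lo = (total - r * t_high - a_t * e1_at) / s_m          # eq. (18) at equality
    e1_hi = (total - r * t_low - a_t * e1_at) / s_m
    return {"E1_Sm": (e1_lo, e1_hi),
            "ATE_Sm": (e1_lo - e0_sm, e1_hi - e0_sm)}

# Hypothetical inputs: outcomes in standard-deviation units.
out = bounds_complying_stayers(
    s_m=0.50, s_n=0.10, r=0.25, a_t=0.10,
    t_win_magnet=0.30, t_lose_magnet=0.30,
    t_lose_stay=0.05 / 0.60, t_win_decline=0.0,
    t_low=-1.0, t_high=1.5)
```

The width of the interval for $E[T_1 \mid S_m = 1]$ equals $(r / s_m)(T_1^u - T_1^l)$, so the bounds tighten as the at-risk share shrinks or as the assumed range for at-risk outcomes narrows.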
There are different ways of constructing lower and upper bounds depending on the outcome variable. A plausible assumption for the construction of an upper bound of the mean treatment effect is that the at-risk households are at least as good as the compliers, $T_1^l = E[T_1 \mid S_m = 1] \leq E[T_1 \mid R = 1]$. A better approach that we explore in this article is to bound outcomes using known percentiles of the outcome distribution. These types of aggregate distributions are often available in applications in education at the state level, as we discuss in detail in the next section.
Alternatively, we use a trimming approach as suggested by Lee (2009). This approach is applied in our context by first ordering magnet students from lowest to highest performance on the outcome variable being studied. Then treatment observations are dropped from the sample on the basis of both the proportions of missing data in the control and treatment groups and the distribution of the outcome variable being bounded.
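For intuition, a generic textbook version of trimming bounds can be sketched as follows. This is an assumption-laden illustration rather than the exact procedure applied in this article: it compares a treatment and a control group, assumes attrition is lower in the treatment group, and trims the observed treatment outcomes by the relative difference in response rates. The function name `lee_bounds` and the data are hypothetical.

```python
def lee_bounds(y_treat, n_treat, y_ctrl, n_ctrl):
    """Trimming bounds on a mean difference under differential attrition.

    y_treat, y_ctrl: observed (nonmissing) outcomes in each group.
    n_treat, n_ctrl: group sizes including observations with missing outcomes.
    """
    q_t = len(y_treat) / n_treat          # share observed, treatment group
    q_c = len(y_ctrl) / n_ctrl            # share observed, control group
    if q_t < q_c:
        raise ValueError("sketch assumes less attrition in the treatment group")
    share = (q_t - q_c) / q_t             # fraction of treated outcomes to trim
    k = int(round(share * len(y_treat)))  # number of observations to trim
    ys = sorted(y_treat)

    def mean(values):
        return sum(values) / len(values)

    ctrl_mean = mean(y_ctrl)
    lower = mean(ys[:len(ys) - k]) - ctrl_mean   # drop the k highest outcomes
    upper = mean(ys[k:]) - ctrl_mean             # drop the k lowest outcomes
    return lower, upper

# Hypothetical data: 8 of 10 treated and 6 of 10 controls are observed.
lower, upper = lee_bounds([1, 2, 3, 4, 5, 6, 7, 8], 10, [2, 3, 4, 5, 6, 7], 10)
```

Here one quarter of the observed treated outcomes is trimmed from the top for the lower bound and from the bottom for the upper bound.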
We have seen that selective attrition implies that we have to focus on the construction of bounds since point identification is not feasible. It is therefore important to have a simple test to determine whether r is zero. If $r = 0$, treatment effects are point-identified and can be estimated using standard linear instrumental variable (IV) estimators. A simple way to estimate r is to regress $A_i$ on $W_i$. The slope coefficient in that regression is equal to r. At minimum, researchers who work with lottery data in educational applications should run this regression and test whether one of the key identifying assumptions of the IV estimator is valid. If we reject the null that r is equal to zero, the bounds analysis suggested in this article is more appropriate than IV estimation.
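The regression of $A_i$ on $W_i$ is simple enough to write out by hand. The sketch below computes the OLS slope, which with a binary regressor equals the difference between the retention rates of lottery winners and losers. The function name `ols_slope` and the eight observations are hypothetical; a formal test of $r = 0$ would additionally require a standard error, which is omitted here.

```python
def ols_slope(w, a):
    """OLS slope from regressing a on w (with an intercept)."""
    n = len(w)
    w_bar = sum(w) / n
    a_bar = sum(a) / n
    cov = sum((wi - w_bar) * (ai - a_bar) for wi, ai in zip(w, a))
    var = sum((wi - w_bar) ** 2 for wi in w)
    return cov / var

# Hypothetical applicants: all 4 winners stay, 3 of 4 losers stay.
W = [1, 1, 1, 1, 0, 0, 0, 0]
A = [1, 1, 1, 1, 1, 1, 1, 0]
r_hat = ols_slope(W, A)   # difference in retention rates: 1.00 - 0.75
```

The slope equals r because winners stay with probability $1 - l$ and losers with probability $1 - l - r$ under Definition 1.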

D. A Generalized Method of Moments (GMM) Estimator
Let v denote the parameter vector that includes the fraction of the latent types, the means of the characteristics, and the lower and upper bounds of the various treatment effects. Suppose that we observe a random sample of N applicants to an education program, indexed by i. We view these as N independent draws from the underlying population of all applicants to this program. Let $W_i$, $M_i$, $A_i$, and $X_i$ now denote the random variables that correspond to observation i. The proofs of identification are constructive.
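Replacing population means by sample means thus yields consistent estimators for the parameters of interest. It is useful to place the estimation problem within a well-defined GMM framework. This allows us to estimate simultaneously all parameters and compute asymptotic standard errors. We can estimate the fractions of each latent type on the basis of moment conditions derived from the choice probabilities in table 1. At the true parameter value $v_0$, these conditions are
$$E\begin{bmatrix} W_i - w \\ W_i (1 - M_i) A_i - w\, s_n \\ W_i (1 - M_i)(1 - A_i) - w\, l \\ (1 - W_i) M_i A_i - (1 - w)\, a_t \\ (1 - W_i)(1 - M_i) A_i - (1 - w)(s_n + s_m) \end{bmatrix} = 0,$$
with $r = 1 - s_m - s_n - l - a_t$. Similarly, we can estimate the mean characteristics of each type from
$$E\begin{bmatrix} X_i W_i (1 - M_i)(1 - A_i) - w\, l\, m_l \\ X_i W_i (1 - M_i) A_i - w\, s_n m_{s_n} \\ X_i (1 - W_i) M_i A_i - (1 - w)\, a_t m_{a_t} \\ X_i (1 - W_i)(1 - M_i) A_i - (1 - w)(s_n m_{s_n} + s_m m_{s_m}) \\ X_i W_i M_i A_i - w (s_m m_{s_m} + r\, m_r + a_t m_{a_t}) \\ X_i (1 - W_i)(1 - M_i)(1 - A_i) - (1 - w)(l\, m_l + r\, m_r) \end{bmatrix} = 0.$$
Finally, we can construct additional orthogonality conditions to construct both upper and lower bounds. Consider first the case of estimating an upper bound for compliers, denoted by $E[T_1^u \mid S_m = 1]$, by setting the lower bound $T_1^l$ on $E[T_1 \mid R = 1]$ for at-risk types equal to the value for compliers, $E[T_1 \mid S_m = 1]$. The corresponding conditions are
$$E\begin{bmatrix} T_i W_i M_i A_i - w \left( (s_m + r) E[T_1^u \mid S_m = 1] + a_t E[T_1 \mid A_t = 1] \right) \\ T_i (1 - W_i) M_i A_i - (1 - w)\, a_t E[T_1 \mid A_t = 1] \\ T_i (1 - W_i)(1 - M_i) A_i - (1 - w) \left( s_m E[T_0 \mid S_m = 1] + s_n E[T_0 \mid S_n = 1] \right) \\ T_i W_i (1 - M_i) A_i - w\, s_n E[T_0 \mid S_n = 1] \end{bmatrix} = 0.$$
Similarly, we can construct an orthogonality condition for the lower bound if we use the 95th percentile outcome for $T_1^u$. This value comes from state-level data for test scores and from our sample of nonmissing data at the district level for all other outcomes. Combining all orthogonality conditions, we can estimate the parameters of the model using a GMM estimator (Hansen 1982).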
The main advantage of the GMM framework is that we can estimate all parameters jointly by imposing all relevant orthogonality conditions. Moreover, it is straightforward to obtain standard errors for the upper and lower bounds using a GMM framework. For each household characteristic we add, we obtain one overidentifying restriction. We implement our GMM estimator using a standard two-step approach, where the optimal weighting matrix is estimated on the basis of a consistent first-step estimator. Standard errors can be computed using the standard result for optimally weighted GMM estimators.[18]
There are different approaches to construct confidence intervals for partially identified models. One approach constructs a confidence set that includes each element in the identified set with fixed probability. The second approach constructs a confidence set that contains the entire identified set with fixed probability.[19] In our application, the identified set can be described by a set of closed intervals. As a consequence, we report only point estimates for the end points of these intervals and estimate standard errors for these estimates.
[18] Many of the parameters of the model, especially all parameters that characterize the fraction of latent types, can be estimated using linear estimators. An appendix is available on request that shows exactly how to set up the linear estimators.
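As a rough illustration of how such an estimator is organized, the sketch below evaluates sample moments for the type shares, derived here from the cell probabilities implied by the latent-type definitions, and an identity-weighted quadratic objective of the kind used in a first step. It is not the two-step estimator itself: the optimal weighting matrix, the moments for characteristics and outcomes, and the standard errors are omitted, and the function names and data are hypothetical.

```python
def share_moments(obs, v):
    """Sample averages of five moment functions for the type shares.

    obs: list of (W, M, A) tuples; v: dict with keys w, s_n, s_m, l, a_t.
    """
    rows = [
        lambda W, M, A: W - v["w"],
        lambda W, M, A: W * (1 - M) * A - v["w"] * v["s_n"],
        lambda W, M, A: W * (1 - M) * (1 - A) - v["w"] * v["l"],
        lambda W, M, A: (1 - W) * M * A - (1 - v["w"]) * v["a_t"],
        lambda W, M, A: ((1 - W) * (1 - M) * A
                         - (1 - v["w"]) * (v["s_n"] + v["s_m"])),
    ]
    n = len(obs)
    return [sum(f(*o) for o in obs) / n for f in rows]

def gmm_objective(obs, v):
    """Identity-weighted quadratic form of the sample moments."""
    return sum(g * g for g in share_moments(obs, v))

# Hypothetical sample of 100 applicants, listed by (W, M, A) cell.
obs = ([(1, 1, 1)] * 34 + [(1, 0, 1)] * 4 + [(1, 0, 0)] * 2
       + [(0, 1, 1)] * 6 + [(0, 0, 1)] * 36 + [(0, 0, 0)] * 18)
v_hat = {"w": 0.4, "s_n": 0.10, "s_m": 0.50, "l": 0.05, "a_t": 0.10}
v_bad = dict(v_hat, s_m=0.30)
obj_hat = gmm_objective(obs, v_hat)   # approximately zero
obj_bad = gmm_objective(obs, v_bad)   # strictly positive
```

Because the share parameters are exactly identified, the objective can be driven to zero by the plug-in estimates; overidentifying restrictions enter only once moments for household characteristics are added.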
Thus far we have considered the problem of estimating causal effects using data from one lottery. In practice, researchers often need to pool data from multiple lotteries to obtain large enough sample sizes. We discuss in detail in appendix A of this article the problems that are encountered when aggregating across lotteries. Using a suitable weighting procedure, we show that we can estimate weighted averages of the underlying parameters of the model. Weights can be chosen in accordance with the objectives of the policy or decision maker.

IV. Data
We focus on magnet programs that are operated by a midsized urban school district that prefers not to be identified. Magnet schools emerged in the United States in the 1960s. Magnet schools are designed to draw students from across normal attendance zones. In contrast, a feeder school typically admits only students who live inside the attendance zone. As a consequence, the composition of feeder schools reflects residential choices of parents and is largely driven by the composition of local neighborhoods.
Magnet schools were initially used as a way to reduce racial segregation in public schools. More recently, magnet programs have been viewed as attractive options to increase school choice, to retain students with higher socioeconomic backgrounds in public schools, and to increase student achievement. In some cases, magnet programs are housed in separate schools. But they can also be a program within a more comprehensive school. Magnet programs offer specialized courses or curricula. There are magnet programs for all grade levels in our district. We consider only magnet programs that are academically oriented. These magnet programs typically provide specialized education in mathematics, the sciences, languages, or humanities. Other magnet programs have a broader focus on topics such as international studies or performing arts.
Every academic year, interested students submit applications for one magnet program of their choice. Some magnet programs in the district have a competitive entrance process, requiring an entrance examination, interview, or audition. We do not include these magnet programs in this study since the admission procedure does not use randomization. Instead we focus on magnet programs that do not have competitive entrance procedures. If the number of applications submitted during registration for any magnet program exceeds the number of available spaces, the district holds a lottery to determine the order in which applicants will be accepted.
Many of the magnet schools in our district are vocational in nature. These programs are always undersubscribed and are not included in our study. Nearly every academically oriented magnet school held at least one binding lottery over the course of the study and is included in our sample.
In the case of oversubscription, a computerized random selection determines each student's lottery number. The lottery is binding in the sense that students with lower numbers are accepted and higher-numbered students are rejected. There is a clear cutoff number that separates the groups. 20 We do not observe students attending magnet schools who lose the lottery; that is, there are no always-takers in our sample.
To preserve racial balance in the magnet programs, separate lotteries are held for black students and other students. Some programs also have preferences for students with siblings already attending the magnet programs or for students who live close to the school. Separate lotteries are held for those students with an acceptable preference category for each magnet program. All in all, each lottery is held for a given program, in a given academic year, separately by race, and, finally, separately by preference code.
Lottery winners ðlotteried-inÞ have the option to participate in the magnet program, with the ultimate choice of participation left to the student and his family. Lottery losers ðlotteried-outÞ do not have this option and thus must make their schooling choice without the availability of the magnet option. With a fair and balanced lottery, the winners and losers will be determined by chance, thus creating two groups that are similar to each other on both observable and unobservable characteristics.
The district granted us access to its longitudinal student database. We use data from the 1999-2000 school year through 2005-6. In addition to demographic data, the database contains detailed information about educational outcomes. This information is linked to each student by a unique identification number. The demographic characteristics for the students include race, gender, free/reduced-lunch eligibility, and addresses. To be eligible for free lunch, households must have income below 130% of the poverty line. Reduced-lunch eligibility requires income below 185% of the poverty line. 21 Using the addresses, we can assign census tract-level variables to each student. We use two community characteristics that measure the socioeconomic composition of the neighborhoods in which students reside. Poverty is the percentage of adults in the student's census tract with an income level below the poverty line. Education is the percentage of adults in the student's census tract with at least a college degree.

20 Strictly speaking, the win probabilities depend on the ordering of students on the wait list. However, these effects are probably small. As a consequence, the literature ignores these issues.

21 The race variable is one if a student is African American and zero otherwise. The gender variable is one for girls and zero for boys.
With respect to student educational outcomes, the database includes the school of attendance in each year and standardized scores for the state assessment tests. In addition, we observe a variety of behavioral outcome measures such as offenses, suspensions, and absences. The district has a code of student conduct that classifies two types of offenses, labeled level 1 infractions and level 2 infractions. Level 1 infractions are those of a less serious nature that do not necessarily pose a threat to the health, safety, or property of any person. These include truancy and class cuts, minor class disruption, teasing, refusal to participate in class, refusal to comply with staff directives, inappropriate language, and littering. Staff handle and correct level 1 offenses on their own without informing higher-level administrators. Level 2 infractions are those of a serious nature that may pose a threat to the health, safety, or property of any person. These include disruption of school, damage of school property, assault of a school employee or another student, weapons or drug possession, sexual harassment, academic dishonesty, bullying, and fighting. Staff are required to notify an administrator when a level 2 offense occurs. The administrator is then charged with the completion of an investigation and subsequent determination of consequences for the offender. Disciplinary action can include in-school suspensions, out-of-school suspensions, alternative education placement, and expulsion. These requirements hold for every school in the district at all levels, regardless of magnet designation.
In our data, the number of suspension days due to each offense is listed, and a very small minority reveal zero suspension days for the incident. Over 60% of the observations show 1 suspension day and 96% show 3 days or less of suspension. There is not any explicit description of the infraction, nor is there any clarification as to whether the offenses are level 1 or level 2. However, in light of the writing and required notification policy explained in the code of student conduct and the fact that nearly all of these incidents result in at least 1 day of suspension, we believe that the events we call offenses in the data files are level 2 infractions. In other words, these are highly disruptive and definitely nonconducive to the educational environment. They are relatively extreme behavioral problems that would be properly identified, in the same way, in every school.
The database also contains the outcomes of the magnet lotteries. We do not observe test scores or behavioral outcome measures for students outside of the district. Table 3 shows descriptive statistics for the entire sample used in this study as well as three important subsamples that we also consider in estimation. 22 We consider only binding lotteries in this research. In total, over the time frame of the data, there are 173 binding lotteries with 1,269 students lotteried-in and 785 students lotteried-out.
Before we implement the estimators, we check whether the lotteries are balanced on student observables. While assignment within lotteries may be random, participation in a lottery is not. To make use of within-lottery randomness and not the between-lottery nonrandomness, we perform a check for balance by running a lottery fixed-effect regression for each observable characteristic as a dependent variable with acceptance as the only independent variable other than the fixed effects. Separate lotteries are held by race, so race is left out of the balance analysis. We test every other observable student characteristic in the data set.
Following Cullen et al. (2006), we use equation (19) to determine whether the lottery is balanced. In that equation, $X_i$ is the observable characteristic of interest, $W_i$ is a dummy equal to one if student $i$ wins lottery $j$, $I_{ij}$ is an indicator variable equal to one if student $i$ participated in lottery $j$, and $v_i$ is the error term. We estimate a separate regression for each observable. The coefficient $b_1$ determines the fairness of the lottery system. If we cannot reject the null hypothesis that it is equal to zero, then acceptance into a magnet is not determined by the value of that particular student observable, $X_i$. The first column of table 4 shows the results when all students in all binding lotteries are included in the regressions. The coefficient $b_1$ is not significant for any tested variable at 10%. The second through fourth columns contain the other subsamples of interest. We find that the estimates of $b_1$ are not significantly different from zero.
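The balance check can be sketched as a regression of an observable on the win indicator with lottery fixed effects. The code below is an illustration under stated assumptions rather than the specification behind table 4: it assumes the regression is linear in the win dummy, absorbs the lottery indicators by demeaning within lottery, and uses a homoskedastic standard error. The function name is hypothetical.

```python
import numpy as np

def balance_test(x, w, lottery):
    """Within-lottery regression of observable x on win indicator w.

    Returns the slope on w, its (homoskedastic) standard error, and the t-ratio.
    Lottery fixed effects are absorbed by demeaning x and w within each lottery.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    lottery = np.asarray(lottery)
    x_t, w_t = x.copy(), w.copy()
    ids = np.unique(lottery)
    for j in ids:
        mask = lottery == j
        x_t[mask] -= x[mask].mean()
        w_t[mask] -= w[mask].mean()
    slope = (w_t @ x_t) / (w_t @ w_t)
    resid = x_t - slope * w_t
    # Degrees of freedom: observations minus lottery effects minus the slope.
    dof = len(x) - len(ids) - 1
    se = np.sqrt((resid @ resid) / dof / (w_t @ w_t))
    return slope, se, slope / se
```

A small t-ratio for every observable is what one would expect from fair lotteries. Only within-lottery variation in winning is used, so differences between lotteries in who applies do not enter the comparison.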
In addition to the tests reported in table 4, we have also implemented joint tests using seemingly unrelated regressions. The p-values of the corresponding F-tests are .933 for the full sample, .704 for the elementary schools, .816 for the middle schools, and .522 for the high school subsample. The joint tests thus fail to reject the null hypothesis that all coefficients are zero. We conclude that the lotteries are fair, creating winner and loser groups that are similar in observed characteristics. Any differences between winners and losers are small and statistically insignificant. This holds for the overall population in binding lotteries and for the smaller subsamples that were tested.
The design of the preference codes in the admission process implies that there is no variation of race within lotteries. The lottery fixed effects will capture the effects of race. We therefore cannot conduct the standard balance analysis for race. To get some additional insights into differences among racial groups, we computed win percentages by race and report them in table 5.
The district is approximately 55% black. In addition, nonblack students live in areas with better neighborhood schools, even within the same district, somewhat mitigating their interest in magnet schools. As a consequence, there are many more black applicants in our study. Black applicants have lower overall win percentages.

A. Attraction, Retention, and Selective Attrition
To study the importance of selective attrition in our sample, we implement a number of different estimators. First, we use a GMM estimator that imposes only the orthogonality conditions that identify the fraction of latent household types. Then we add the orthogonality conditions that can be used to estimate the mean characteristics of the types. The characteristics include race, gender, free or reduced lunch, poverty, and college education. Recall that the last two measures are based on neighborhood characteristics as reported by the US Census. We report estimates for three samples, which include all students who applied to an oversubscribed magnet program that is associated with an elementary school, middle school, and high school, respectively. We pool across all lotteries in each sample and therefore use the weighted estimator discussed in appendix A. Tables 6 and 7 report the point estimates and estimated standard errors for each of the three samples.
Comparing the estimates in the upper and lower panels of table 6 allows us to evaluate whether there are efficiency gains from using a GMM estimator. 23 We find that there are significant efficiency gains in the estimates of two key parameters, the fraction of compliers and the fraction at risk. Estimated standard errors are up to 50% larger when one ignores the additional orthogonality conditions. Our framework does not generate many overidentifying restrictions. For each household characteristic we add, we obtain one overidentifying restriction. This is the case because we observe the conditional expectation of X for six observed types. But these conditional expectations are functions of the means of the five latent types. Hence, we have little reason to believe that we are suffering from small-sample problems that can arise when the number of orthogonality conditions is too large. We thus conclude that our approach of jointly estimating the model using GMM is preferable to simpler methods.

Table 6 reveals some interesting new insights into the importance of selective attrition in our application. Recall that the fraction of households at risk is the key parameter that captures selective attrition. We find that selective attrition is substantial: the fraction at risk ranges between 12% and 25% across our three samples. We also find that the majority of students will stay in the district regardless of the outcome of the lottery. The majority, 61%-71%, will attend the magnet program if they win the lottery. The fraction of households that will leave the district regardless of the outcome of the lottery ranges between 4% and 8%. Overall, these results suggest that most households consider the magnet programs desirable. We conclude that magnet programs are effective tools for attracting and retaining households and students.
Equally interesting are the observed mean characteristics of the latent types of households reported in table 7. These and the ones reported in the lower part of table 6 are the results from the first and second set of orthogonality conditions ($f_1$ and $f_2$). For each characteristic, the differences across household types (at risk, leavers, stayers) are statistically significant. We find that at-risk households are, on average, less likely to be African American and to be on free- or reduced-lunch programs than households that are stayers. Moreover, they come from better-educated neighborhoods. 24 These differences are more pronounced at the elementary school level, where the fraction of at-risk households is the greatest. We thus conclude that magnet programs are effective devices for the school district to retain more affluent households. Not surprisingly, the leavers are the most affluent group and come from neighborhoods with the highest levels of education. These households may just apply to the magnet programs as a backup option in case their students should unexpectedly not be admitted to an independent, charter, or parochial school. 25

The demographic differences, summarized above, between at-risk students and stayers drive our assumptions on the bounds. Poor minority students are known to perform poorly in school compared to wealthier majority peers (Dobbie and Fryer 2009). Therefore, our upper-bound estimation assumes that the performance of at-risk students is only as good as that of the stayers, while the lower-bound estimation assumes that the at-risk students are in the 95th percentile of the outcome distribution.
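The two bounding assumptions above can be made concrete with a small calculation. The sketch below is an illustration, not the article's GMM implementation: it takes the latent-type shares and three observed group means as given, assumes no always-takers, assumes that noncomplying stayers have the same mean outcome whether they win or lose, and is written for outcomes where higher values are better (such as test scores). All function and argument names are hypothetical.

```python
def complier_effect_bounds(mean_win_magnet, mean_lose_stay, mean_win_nonmagnet,
                           s_m, s_n, r, favorable_percentile_value):
    """Bounds on the treatment effect for complying stayers.

    mean_win_magnet:     mean outcome of winners who attend the magnet
                         (a mix of complying stayers and at-risk types).
    mean_lose_stay:      mean outcome of losers who stay in the district
                         (a mix of complying and noncomplying stayers).
    mean_win_nonmagnet:  mean outcome of winners in non-magnet district schools
                         (noncomplying stayers).
    """
    # Untreated mean of complying stayers, backed out from the losers' mixture.
    t0_sm = ((s_m + s_n) * mean_lose_stay - s_n * mean_win_nonmagnet) / s_m
    # Upper bound: at-risk types do only as well as complying stayers, so the
    # winners' magnet mean is itself the treated mean of complying stayers.
    t1_upper = mean_win_magnet
    # Lower bound: at-risk types are placed at the favorable percentile, which
    # pushes the implied treated mean of complying stayers down.
    t1_lower = ((s_m + r) * mean_win_magnet - r * favorable_percentile_value) / s_m
    return t1_lower - t0_sm, t1_upper - t0_sm
```

For example, with shares `s_m=0.5`, `s_n=0.25`, `r=0.25`, group means 10, 8, and 6, and a 95th percentile value of 14, the implied untreated mean is 9 and the interval for the effect runs from -1 to 1. The width of the interval grows with the at-risk share `r`, which is why selective attrition matters for inference.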
Tables 3 and 7 permit interesting comparisons across grade levels. From table 3, we see that elementary and middle school lotteries are somewhat more competitive than high school lotteries. The former have average win rates of 52% and 53%, respectively, while the latter have an average win rate of 77%. Table 7 provides information about types by grade levels. We see that elementary programs attract a clientele from more highly educated neighborhoods. The fraction of African American families is also lower among applicants to elementary school lotteries. Not surprisingly, we find that the fraction of at-risk families and the fraction of leavers are also higher among elementary school students. These findings highlight the fact that, among the magnet school applicants, the market for elementary school education is more competitive than the market for high school education.

24 Note that the differences in household characteristics are statistically significant from zero at all conventional levels.

25 It could also be that these households left the district because of job transfers or other issues unrelated to schools.

B. Treatment Effects
We have seen in the previous section that the fraction of at-risk households is large and significantly different from zero in our application in all three samples. Moreover, households that are at risk of leaving the district have more favorable socioeconomic characteristics than other types except for leavers. As a consequence, we conclude that selective attrition cannot be ignored in this application. Since treatment effects are set-identified only when selective attrition matters, we implement our bounds estimators. For comparison purposes, we also report the IV estimates that ignore selective attrition.
We start our analysis by focusing on achievement effects. The main problem encountered in this part of the analysis arises as a result of missing data. This is largely the case because standardized achievement tests were conducted only in grades 5, 8, and 11 during most of our sample period. For our middle school sample, there are only 155 observations for which we have prior test scores. For the high school sample, the reduction is of similar magnitude. 26 Ultimately, we have test score outcomes for 213 middle school students (eighth-grade exam) and 203 high school students (eleventh-grade exam). Table 8 summarizes our main findings using standardized test scores in reading and mathematics as outcome variables.
We find that the point estimates of the upper and lower bounds point to positive treatment effects, but sample sizes are too small to provide precise estimates. While few researchers would advocate the use of the simple IV estimator in the presence of selective attrition, it is useful to compare the results of our bounds analysis with the IV approach. One surprising finding is that the simple IV estimates suggest statistically significant positive treatment effects. Our bounds analysis reveals that this inference is not correct.
We next turn our attention to behavioral outcomes measured a year after the lotteries were conducted. 27 The main advantage of studying these outcomes is that we do not face the data limitations that we encounter with test scores. Comprehensive records of four important behavioral measures are available: suspensions, offenses, absences, and tardies. Table 9 summarizes our main findings. Note that a negative treatment effect is a reduction in undesirable behavior and thus a good outcome. For elementary students, we find that magnet programs significantly reduce offenses and suspensions. There are no measurable effects on tardies and absences. We find that there are few significant treatment effects at the middle school level. The estimates themselves suggest that middle school magnet programs have a negative effect on offenses, no effect on suspensions, and possibly an increase in absences and tardies. Again, however, these estimates at the middle school level are generally not significant. For the high school sample, we find strong evidence that the magnet schools reduce absences and tardies while having no significant effects on offenses or suspensions. Comparing the IV estimates with the bounds, we find that the IV estimates are often of similar magnitude to our upper-bound estimates and have smaller estimated standard errors than the bound estimates.
We can derive the asymptotic limit of the IV estimator under our model specification. Setting the fraction of always-takers equal to zero (as is true for this application), we obtain

$$b_{IV} \to \frac{s_m + s_n + r}{s_m + r}\,(A - B).$$

If $r \neq 0$, then there is no reason to believe that the IV estimator will always be within the bounds provided in this article. We thus conclude that our bounds analysis is informative and demonstrates that magnet programs offered by the district improve behavioral outcomes. In particular, we find that offenses are significantly lower for elementary school students, while high school students have significantly better attendance and timeliness records. It is also important to note that the 95th percentile of all behavioral outcomes is zero except for high school absences (where it is one). Thus our lower-bound estimates for nearly all behavioral outcomes are the most pessimistic possible since they attribute flawless behavior to all who leave the district.
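A numerical sketch may help to show why an IV estimator computed on in-district students need not recover the complier effect when the at-risk share is positive. The code below is an assumption-laden illustration, not the article's derivation: it treats the IV estimator as the difference in mean observed outcomes between winners and losers divided by magnet take-up among observed winners, assumes no always-takers, and assumes noncomplying stayers have the same mean outcome under either lottery result. All names are hypothetical.

```python
def iv_probability_limit(s_m, s_n, r, t1_sm, t1_r, t0_sm, t0_sn):
    """Large-sample value of a simple IV (Wald) estimator on observed students.

    s_m, s_n, r: shares of complying stayers, noncomplying stayers, at-risk types.
    t1_sm, t1_r: mean treated outcomes of complying stayers and at-risk types.
    t0_sm, t0_sn: mean untreated outcomes of complying and noncomplying stayers.
    """
    observed_winners = s_m + s_n + r   # leavers exit the district
    observed_losers = s_m + s_n        # at-risk types and leavers exit
    mean_win = (s_m * t1_sm + r * t1_r + s_n * t0_sn) / observed_winners
    mean_lose = (s_m * t0_sm + s_n * t0_sn) / observed_losers
    take_up = (s_m + r) / observed_winners   # no loser attends the magnet
    return (mean_win - mean_lose) / take_up
```

With `r = 0` the function returns `t1_sm - t0_sm`, the effect for complying stayers. With `r > 0` the winners' mean mixes in at-risk students whose counterparts among the losers are unobserved, so the result can fall on either side of the true effect, depending on how at-risk students perform.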

C. Sensitivity Analysis
First, we investigate whether our results are sensitive to the choice of the percentile used to construct the lower bound. We can use the 90th or 99th statewide percentile on math and reading exams instead of the 95th percentile. The results are reported in table 10. We find that the results are qualitatively the same. We cannot reject the null hypothesis that the treatment effect is zero.
Next we consider the behavioral outcomes. The lower-bound estimates for the behavioral outcomes are exactly the same whether the 90th, 95th, or 99th percentile of the districtwide measures is used for tardies, offenses, and suspensions. In all cases, at all levels, the at-risk students who lose and leave are assumed to have no instances of any of these outcomes at all three percentiles. The lower-bound estimates for absences remain largely the same under all three percentile choices even though the assumptions for the at-risk types who lose and leave change a bit, moving up to three for the high school 90th percentile.
We also conduct two additional robustness checks. First, we find that our results are similar when we drop the five overidentifying conditions and implement an exactly identified estimator. For example, from the middle school sample, our estimate with (without) overidentifying conditions for the upper bound on offenses is -0.62 (-0.64), on suspension days -0.22 (-0.23), on absences 1.98 (1.61), and on tardies 3.04 (3.02). We also implemented the overidentified estimator for nonoptimal weighting matrices such as the identity matrix and again found insignificant differences. We thus conclude that small-sample problems are not a concern in our application.

Second, we explore heterogeneity in treatment. For middle school, there are 40 lotteries. Sixteen have estimated win rates above or equal to 0.5, and 24 have win rates less than 0.5. We can split the sample and determine separate treatment effects. From the full sample, our estimate for the upper bound on offenses is -0.62 with an associated standard error of 0.36. Using the subsample with low win rates, we obtain an estimate of -0.49 (0.29). For high win rates, the estimate is -0.13 (0.25). The results are statistically similar for both subsamples. This result generally holds for all school levels and all outcomes.
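The comparison of the two subsample estimates can be checked with a simple difference test. This is a back-of-the-envelope sketch, not the test used in the article: it assumes the two estimates are independent, which is plausible because the subsamples contain disjoint lotteries, and it relies on a normal approximation. The function name is hypothetical.

```python
import math

def difference_z(est_a, se_a, est_b, se_b):
    """z-statistic for the difference of two independent estimates."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, diff / se_diff

# Upper bound on middle school offenses: low-win-rate vs. high-win-rate lotteries.
diff, se_diff, z = difference_z(-0.49, 0.29, -0.13, 0.25)
```

Here the difference is -0.36 with a standard error of roughly 0.38, so the z-statistic is below one in absolute value, in line with the statement that the subsample results are statistically similar.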
Finally, we consider Lee's (2009) approach. Recall that one of the nice features of his estimator is that it does not require additional information. Instead it relies on trimming to construct an estimator for the lower and upper bounds of the treatment effect. It is therefore useful to implement this approach using the data from our application. Table 11 compares our estimates with those obtained from Lee's trimming method. 28 As we detail in appendix A, weighting is appropriate when estimating bounds using data from multiple lotteries. In implementing Lee's estimator, we do not weight lotteries by number of applicants. 29 Hence, the comparison in table 11 reflects both a difference in the approach to bounding and a difference in weighting, potentially confounding the two effects. For the outcomes considered in table 11, we have confirmed that the results from our weighted estimator are similar to those when we do not weight by lotteries. This is not always the case, however. For example, for middle school reading, weighting by lotteries proves to be quite important. 30 Hence, it would be desirable in future work to extend the Lee estimator to weight lotteries. The two methods could then be compared on a common footing in applications with multiple lotteries.

Table 11 suggests that the empirical results are similar, but there is at least one noteworthy difference. We find that our estimator provides tighter bounds estimates for the magnet treatment effects than the one proposed in Lee (2009) in this application. Recall that our approach requires both additional data and additional assumptions to be valid. As a consequence, it is not that surprising that the approach proposed in this article sometimes yields tighter bounds. Table 11 also reports the trimming proportions $p$ for Lee's estimator for all outcomes. Note that $p$ is the trimming proportion and is defined just as in Lee's paper. We find that the trimming rates are much greater in our application than in Lee's application, where $p = 0.068$. The reason is that our proportion of nonmissing data between the control and treatment groups differs significantly since we never observe outcomes for those who leave the district. These students are exclusively contained in the control group since nobody can be in a magnet program yet outside of the district. The other main difference between our application and Lee's application is sample size. Lee reports over 3,000 observations in the treatment group before and after trimming. These samples are much larger than the ones in our application.

28 The results are similar for other outcomes analyzed in this article. The four outcomes were chosen for the following reason. We have a large sample for elementary school offenses. Our point estimates suggest that the magnet schools may reduce offenses. For tardies, our estimates suggest no effect. The sample size for high school math is small, and our estimates suggest no significant treatment effect. Finally, the sample for middle school math is also small, but our estimates suggest that there may be a positive treatment effect.

29 Lee's estimator has not yet been extended to estimate bounds when combining data from multiple lotteries, though it is surely possible to do so.
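The trimming idea behind Lee's bounds can be sketched for a single unweighted sample. The code below is a simplified illustration and not the implementation behind table 11: it assumes missing outcomes are coded as NaN, that the treatment group has the higher share of observed outcomes (as is the case when only losers leave the district), and it trims by observation count rather than by an estimated quantile. The function name is hypothetical.

```python
import numpy as np

def lee_trimming_bounds(y_treat, y_control):
    """Trimming bounds on the treatment effect with differential attrition.

    y_treat, y_control: outcome arrays with NaN where the outcome is missing.
    Returns (lower bound, upper bound, trimming proportion p).
    """
    y_treat = np.asarray(y_treat, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    q_treat = np.mean(~np.isnan(y_treat))      # share of treated with outcomes
    q_control = np.mean(~np.isnan(y_control))  # share of controls with outcomes
    if q_treat < q_control:
        raise ValueError("this sketch assumes less attrition among the treated")
    p = (q_treat - q_control) / q_treat        # share of treated outcomes to trim
    observed = np.sort(y_treat[~np.isnan(y_treat)])
    k = int(round(p * observed.size))          # number of observations trimmed
    control_mean = np.nanmean(y_control)
    lower = observed[: observed.size - k].mean() - control_mean  # drop the top
    upper = observed[k:].mean() - control_mean                   # drop the bottom
    return lower, upper, p
```

When the gap in observation rates between groups is large, `p` is large and the two trimmed means move far apart, which is consistent with the wide trimming-based intervals discussed above.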

VI. Conclusions
We have studied the effectiveness of magnet programs in a midsized urban district. Our empirical results suggest that selective attrition cannot be ignored in our application. We find that magnet programs are useful tools that help the district to attract and retain students from middle-class backgrounds. Finally, we have also studied the impact of magnet programs on achievement and a variety of behavioral outcomes. Our findings for achievement effects are mixed. While the point estimates of the bounds point to positive treatment effects, sample sizes are too small to provide precise estimates. For a variety of behavioral outcomes, we do not face these data limitations. Our evidence suggests that magnet programs often improve behavioral outcomes.
We acknowledge that our results may not be broadly applicable to all magnet programs. This is a general drawback of many policy papers, where a clean experimental design can ensure internal validity but lacks external validity. Nearly every academically oriented magnet school held at least one binding lottery over the course of the study. If magnet schools are better than regular feeder schools, students have strong incentives to get into any magnet. In that case, some students apply to lower-quality magnet schools in the hope that they face less stiff competition in the lottery process. This gives rise to selection across magnet programs that may affect peer qualities and other endogenous features of the school.
In some studies, it has been possible to link district-level data with state-level data. For example, it is possible to track students from the Boston school district, even if they leave the district, as long as they stay in Massachusetts. Moreover, researchers have been able to use statewide achievement tests as outcome measures. As a consequence, they have access to the same outcome measure for students who decided to stay in the district and who decided to leave. These data sets contain outcome measures for the at-risk types as well as the leavers if they stayed in the state. Unfortunately, the district that has provided us with the data is situated in a state that does not allow us to track students when they leave the district. That is quite common for other districts that have cooperated with researchers as well. Limited access to private school data is an even more pervasive problem.
As a consequence, we think that our framework applies to the vast majority of the potential applications. As better data-sharing arrangements become available, we will be able to expand our analysis to incorporate additional observables. Our framework permits estimation of the proportion of lottery applicants who remain in the district as a result of magnet admission. These students are denoted at risk in our framework. Hence, while we do not observe outcomes for students who leave the district, we are able to provide information about the retention effect of the magnet program. This is explicitly something that the district hopes the magnet programs achieve.

Appendix A Aggregation
For a given magnet program, a separate lottery is conducted for each grade, and within a grade, separate lotteries may be conducted for different groups of applicants (e.g., by race). In such cases, sample sizes for individual lotteries may be relatively small, yielding lottery-specific estimates with low power. While outcomes for a particular lottery may be of interest, a district will typically be more concerned with evaluation at the program level rather than at the lottery level. Here we extend our analysis to permit investigation at the program level.
Suppose that there are $j = 1, \ldots, J$ lotteries governing access to a magnet program. A program may be a magnet school (or perhaps set of magnet schools) serving a particular range of school grades. Let $w_j$ be the probability of winning lottery $j$, and, analogously to our previous notation, let $a_{t,j}$, $\ell_j$, $r_j$, $s_{m,j}$, and $s_{n,j}$ be the proportions of latent types in lottery $j$.
Let $N_j$ be the number of applicants to lottery $j$ and $N = \sum_j N_j$. The share of lottery $j$ is then $n_j = N_j / N$. When we extend our previous notation, $W_{ij}$ equals one if applicant $i$ to lottery $j$ wins and zero otherwise, $A_{ij}$ equals one if applicant $i$ to lottery $j$ attends a school in the district and zero otherwise, and $M_{ij}$ equals one if applicant $i$ to lottery $j$ attends magnet school $j$ and zero otherwise.
Let $w = \sum_j n_j w_j$, $a_t = \sum_j n_j a_{t,j}$, $\ell = \sum_j n_j \ell_j$, $s_m = \sum_j n_j s_{m,j}$, $s_n = \sum_j n_j s_{n,j}$, and $r = \sum_j n_j r_j$. Thus, $a_t$, $\ell$, $r$, $s_m$, and $s_n$ are parameters denoting the share of each of the latent types at the program level, and $w$ is the program-level win probability. Our previous analysis applies to each lottery, establishing identification of $a_{t,j}$, $\ell_j$, $r_j$, $s_{m,j}$, and $s_{n,j}$ for all $j$. The $n_j$ are known and nonrandom. Hence, $w$, $a_t$, $\ell$, $r$, $s_m$, and $s_n$ are identified. We therefore focus on estimation and inference at the program level. Proceeding analogously for each latent type, we obtain orthogonality conditions for estimating program-level parameters, the last of which is

$$\frac{(1 - W_{ij})(1 - M_{ij})A_{ij}}{1 - w_j} - (s_n + s_m).$$

Next, consider achievement. Let $E[T_{1,j} \mid S_{m,j} = 1]$ denote the expected test score of a student who wins the lottery for program $j$ and is a complying stayer. For simplicity let $a_t = 0$. Note that

$$\frac{1}{N_j} \sum_{i=1}^{N_j} T_i W_{ij} A_{ij} M_{ij} \to w_j \left\{ r_j E[T_{1,j} \mid R_j = 1] + s_{m,j} E[T_{1,j} \mid S_{m,j} = 1] \right\}. \tag{A3}$$

Using the same logic as above and pooling over lotteries implies that the pooled moment converges to

$$\sum_{j=1}^{J} n_j \left\{ r_j E[T_{1,j} \mid R_j = 1] + s_{m,j} E[T_{1,j} \mid S_{m,j} = 1] \right\}.$$

Now suppose that we have an upper bound $U$ for $E[T_{1,j} \mid R_j = 1]$ for all $j$, that is, $U \geq E[T_{1,j} \mid R_j = 1]$ for all $j$. Hence

$$\sum_{j=1}^{J} n_j r_j E[T_{1,j} \mid R_j = 1] \leq \sum_{j=1}^{J} n_j r_j U = rU.$$

Hence we have constructed a lower bound for the weighted average of the treatment effect. Note that the weights depend on $n_j$ and $s_{m,j}$. Using a lower bound $L$ such that $L \leq E[T_{1,j} \mid R_j = 1]$ for all $j$ yields an upper bound for the weighted treatment effect.
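The pooling step can be illustrated for the stayer share. The sketch below solves the sample version of the orthogonality condition in which losers who remain in the district are scaled by the lottery-specific probability of losing. It is an illustration under assumptions: the win probability of each lottery is replaced by its sample win rate, each lottery has at least one winner and one loser, and the function name is hypothetical.

```python
import numpy as np

def pooled_stayer_share(W, M, A, lottery):
    """Program-level share of stayers (s_m + s_n) pooled over lotteries.

    W, M, A: win, magnet-attendance, and in-district indicators per applicant.
    lottery: lottery identifier per applicant.
    """
    W, M, A = (np.asarray(x, dtype=float) for x in (W, M, A))
    lottery = np.asarray(lottery)
    total = 0.0
    for j in np.unique(lottery):
        mask = lottery == j
        w_j = W[mask].mean()  # sample win rate, assumed strictly between 0 and 1
        # Losers who stay in the district, reweighted by the chance of losing.
        total += np.sum((1 - W[mask]) * (1 - M[mask]) * A[mask]) / (1 - w_j)
    return total / len(W)
```

The result equals the average of the lottery-level shares of losers who stay, weighted by each lottery's share of applicants. Dividing by the probability of losing keeps lotteries with very different win rates from being over- or underrepresented in the pooled estimate.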
The weighting scheme we use is based on the win rates for each lottery. The win rates are themselves a function of the available seats and the number of applicants. This is similar to results in Frolich and Lechner (2010).

Table B1 Potential Latent Types 1*