Class Size and Learning: Has India Spent Too Much on Reducing Class Size?
Sandip Datta and Geeta Gandhi Kingdon

This paper examines the efficacy of class-size reductions as a strategy to improve pupils’ learning outcomes in India. It uses a credible identification strategy to address the endogeneity of class-size, by relating the difference in a student’s achievement score across subjects to the difference in his/her class size across subjects. Pupil fixed effects estimation shows a relationship between class size and student achievement which is roughly flat or non-decreasing for a large range of class sizes from 27 to 51, with a negative effect on learning outcomes occurring only after class size increases beyond 51 pupils. The class-size effect varies by gender and by subject-stream. The fact that up to a class-size of roughly 40 in science subjects and roughly 50 in non-science subjects, there is no reduction in pupil learning as class size increases, implies that there is no learning gain from reducing class size below 40 in science and below 50 in non-science. This has important policy implications for pupil teacher ratios (PTRs) and thus for teacher appointments in India, based on considerations of cost-effectiveness. When generalised, our findings suggest that India experienced a value-subtraction from spending on reducing class-sizes, and that the US$3.6 billion it spent in 2017-18 on the salaries of 0.4 million new teachers appointed between 2010 and 2017 was wasteful spending rather than an investment in improving learning. We show that India could save US$ 19.4 billion (Rupees 1,45,000 crore in Indian currency) per annum by increasing PTR from its current 22.8 to 40, without any reduction in pupil learning.


I. Introduction
Reducing class size has been a popular reform across countries in their search for improved quality of education, and many countries have legislated an official maximum class size. In India, at the secondary school level, official policy supports a class size of 30 i and the Right to Education (RTE) Act 2009 also stipulated a maximum class size of 30 in elementary schools, policies which necessitated the appointment of a large number of new teachers.
Between 2010 and 2017, the total number of elementary teachers rose from 4,047,070 to 4,451,953, with a corresponding increase in the total teacher salary bill: these 0.4 million extra teachers added approximately US$3.6 billion per annum in 2017-18 ii . As elsewhere, in India too class size is a vexed issue. Inadequate teacher numbers and unfilled teacher vacancies are bemoaned by NGOs and in official documents, and frequently identified as the factor behind low student learning levels iii . India's draft New Education Policy (MHRD 2019, p.115) noted that "according to government data, the country faces over 10 lakh [one million] teacher vacancies". It resolved that "teacher vacancies will be urgently filled" (para P2.14, page 56) and recommended increasing the education budget for filling the vacancies (page 414, Appendix A1.4.4) iv . However, contrary to the widely held belief of an acute teacher shortage, mean pupil teacher ratio (PTR) in public schools is much lower than the RTE-Act-mandated maximum of 30 in elementary schools (grades

Against this background of increased public expenditure to reduce class-size vi , it is important to ask whether class size reduction improves student learning outcomes, i.e. whether the expenditure to reduce class size was an investment in better quality education or merely unproductive spending of scarce taxpayer money. It is known from the Annual Status of Education Report (ASER) (various years) and also from the National Council of Educational Research and Training (NCERT, 2015) that between 2010 and 2015, pupils' learning achievement levels fell, and that over the same period, PTR and class-sizes were also reduced, suggesting simplistically a perverse positive temporal relationship between class-size and pupil achievement, rather than the expected negative one. However, to our knowledge, there is no study that estimates the causal effect of class size on student achievement in India using micro, i.e. individual-pupil level, data.
Whether reducing class size improves student outcomes remains a contentious question in the literature. Proponents argue that class-size reductions lead to more individual attention, higher quality instruction, a broader scope for student-centred innovation and teaching, increased teacher morale, less student misconduct and more ease of involvement of students in academic activities such as group work. An extensive literature has sought to measure the causal impact of class size on student learning using a variety of estimation methodologies.
While a meta-analysis by Hanushek (2003) collating findings from 376 educational production functions found no consistent relationship between class-size and student achievement, and Hattie's meta-analysis (2005) demonstrated a typical effect-size that was considered "tiny" or "small" relative to other educational interventions, meta-analyses are questioned on the ground that they mix studies that have credible identification strategies with studies that are not capable of yielding a causal inference.
A number of individual studies have used techniques to examine the causal effect of class size in different contexts. Krueger (1999) used the Randomised Control Trial (RCT) method in the STAR experiment in Tennessee; Angrist and Lavy (1999) used the 'Maimonides Rule' to estimate the effect of class size on student achievement in Israel, finding an exogenous source of variation in class size that is uncorrelated with student unobservables; Case and Deaton (1999) used an instrumental variables approach on South African data; Woessmann and West (2006) used TIMSS data on student performance in 11 countries, combining school fixed effects and instrumental variables to identify random class-size variation between two adjacent grades within individual schools; Altinok and Kingdon (2012) used a pupil fixed effects approach to examine the impact of class size in 47 countries using TIMSS data; Shen and Konstantopoulos (2019) used predicted class size based on Maimonides' Rule as an instrument, to measure the class size effect in four Eastern European countries. The typical finding in these studies is of a non-existent or small beneficial effect from reducing class size vii .
However, we would expect the impact of class size to be heterogeneous depending on grade level (e.g. primary versus secondary grades) and on the range of class-sizes. For example, in the Tennessee STAR experiment, reduction of class-size from 22-25 to a very low class-size of 14-17 students per class, and reduction in class size in early grades, and for disadvantaged students, produced short to medium run learning gains, but this says nothing about the impact of reducing class size from say 40 or 45 to 30 (the situation in most developing countries), or at higher (secondary) grades, or in the science and non-science subject streams.
Most of the studies on the impact of class size (that have a credible identification strategy) use data from developed countries where the range of class sizes is much smaller than the typical class sizes in most developing countries.
There are only a few studies on developing countries, where low student achievement is a growing concern. Altinok and Kingdon's (2012) study divided TIMSS data on 47 countries into three groups: developed countries, transition countries, and developing countries. They found a statistically significant negative relationship between class size and pupil achievement only in the developing country group, though the effect size was small: a 1 SD increase in class size in developing countries (by a large 10.9 pupils per class from a mean class size of 37.2) lowered student achievement by only 0.03 SD, but this effect was fairly precisely estimated, with a t-value of 2.7. In a study on Bangladesh, Asadullah (2005) used instrumental variable (IV) estimation to find that class size in secondary grades had a perverse sign: the coefficient on class size was positive and statistically significant, i.e. reducing class size in secondary grades reduced pupil achievement, and would not be an efficient policy. Finally, a study by Banerjee et al. (2006) in 175 government primary schools in two cities of India using an RCT found that reducing class size had no impact on test scores, which they say is "consistent with the previous literature suggesting that inputs alone are ineffective".

II. The shape of the relationship between class size and pupil learning
It is useful to consider a priori the possible shape of the relationship between class-size and pupil learning, illustrated in Figure 1 viii . It is generally accepted that very young children are mostly best cared for in near one-on-one to very-few-to-one-adult caregiver situations, and mostly not for 'cognitive' learning but for socio-emotional development and socialization that requires intimacy. For such children the graph can turn strongly negative, as the experience of abusive situations (e.g. in some Romanian orphanages in the communist era) can lead to massive, lasting damage. Across all ages, the willingness to pay for one-on-one tutoring suggests that it is regarded as giving the highest learning gain per unit of child time, leading to a downward slope in the relationship between class size and learning, as shown in Figure 1 near the low class sizes. This depicts that schooling as we know it is a technologically inferior but cheaper option (compared to one-to-one or small-group tutoring), i.e. a pragmatic compromise. The large approximately 'flat spot' in class size in Figure 1 depicts the idea that once homogeneous instruction becomes the dominant mode (with lectures, readings and student homework), it may not be important whether there are 30 or 50 children in the class. The "stair step" functional form then has another fall when the class size is so large that either class discipline can no longer be maintained or all feedback is effectively lost, e.g. the teacher can no longer assess and make corrections to student work. The debate then is around where that "flat spot" starts, how flat or inverted it is, how wide it is (e.g. 20 to 45, or 30 to 50 etc.) and whether the location and width of that flat spot vary by grade. It can be expected that the flat spot is narrower (the upper end is lower) the lower the grade; for example, it could be in the low 30s (or lower) for grade 1, and in the low 40s (or higher) for grade 12.
Implicit in the stair step functional form depicted in Figure 1 is the tacit notion that learning is gained exclusively or primarily from a teacher, which may be more true for some subjects than others. For example, it may be more true for subjects where explanation of concepts is important for understanding, e.g. perhaps maths and science. It may be less true for languages and descriptive subjects where learning from peer-interactions may be helpful, and where consequently more students in a class (up to a point) may actually improve learning, leading to a functional form that is concave with respect to the horizontal axis, i.e. learning would first increase with class size and then fall.
Indeed, such a concavity may be found in the teaching of any (including science) subject if teachers use the lecture method without explaining, and students take recourse to and benefit from peer-learning. Figure 2 suggests that if most of the actual class-sizes are in the low range (near the first 'step'), the relationship between class size and pupil learning will be convex; if most of the actual class-sizes are in the high range (near the second kink or 'step'), the relationship will be concave; if most of the actual class-sizes are in the flat range, there will be no relationship between class size and pupil learning. We will test the hypotheses of Figures 1 and 2.

III. Identifying the causal effect of class size on pupil learning
Identifying the causal effect of class size on student achievement is challenging because of the potential non-random matching of pupils to schools and, within schools, the non-random matching of students to particular classes. If more able or more motivated students manage to sort themselves into the smaller classes then any expected negative effect of class-size on student achievement will be under-estimated, i.e. there would be a smaller negative (or even a positive) coefficient on class-size than the true negative relationship. Conversely, if schools deliberately put less able children into smaller classes, then any expected negative effect of class size would be over-estimated (the negative coefficient on class size would be a bigger negative than the true relationship), since the small classes here contain the children of below-average ability. Any systematic correlation of the unobservables in the error term with the included class-size variable undermines the simple production function's ability to produce causal estimates.
While randomized experiments can in principle be used to fix the problem of non-random matching, in practice there are many problems, as has been noted of the STAR experiment study by Krueger (1999). Participants may behave differently if they know they are part of an experiment, especially if the outcomes of the study might have implications for future school funding (Hoxby, 2000). Attrition into and out of small and large class assignments over time in the STAR experiment may have undermined the random allocation: Hanushek (1997a, 1997b, 1999 and 2003) pointed out that only half the participants remained in the study until the end of the third grade (Year 4).
Experiment-based studies are costly and other experimental studies are required to check the robustness of the findings of the STAR project (Todd and Wolpin, 2003). While findings from experiments are synthesised in Kremer (2003), truly natural experiments are few (Rosenzweig and Wolpin, 2003).
[Figure: learning gain (vertical axis) plotted against class size (horizontal axis). Annotations: data at low class-sizes would produce a convex quadratic; if most data is within the flat spot, then no quadratic; with actual observations spanning the upper kink, the quadratic will be concave.]

Since true natural experiments are costly and rare, some researchers have addressed the endogeneity issue by using some valid instrumental variables for class-size, i.e. a class-size predicted by some exogenous variation. Angrist and Lavy (1999) used the hypothetical class-size predicted by (Maimonides') maximum class size rule ix as an instrument for class-size with Israeli schools data, and they obtained a significant effect of class size on student achievement.
However, in developing countries including India, even though the maximum class-size rule (of 30 students per teacher) exists, it is not closely followed, so generating a valid instrument for the developing countries is difficult.
In a similar kind of IV analysis, Woessmann and West (2006) estimated the effect of class size on student achievement in 11 countries by combining the school fixed effect and IV techniques. They found no effect of class size in nine countries and a large and significant effect only in Greece and Iceland.
In the current paper, we follow a pupil fixed effects approach to estimate the causal effect of class size on student achievement, using data on students of secondary grade 12 from ten different schools of a private school chain in Uttar Pradesh. While using the traditional achievement production function, we allow for pupil fixed effects in cross-section data, as used in Altinok and Kingdon (2012), using across-subject differencing rather than across-time differencing. This methodology is possible as we have data on each student's marks (at one time point) in different subjects, and this enables us to control for all subject-invariant student unobservables. Thus cross-section data allow us to investigate whether the within-pupil variation in class size is associated with within-pupil variation in learning achievement. Students face different class-sizes for different subjects and this permits us to ask whether the class-size in different subjects is correlated with students' marks in the different subjects within the grade in the school. The idea is identical to the panel data estimate of the achievement production function: we estimate the within-pupil across-subject equation of the achievement production function rather than the within-pupil across-time equation. The estimation technique is explained in the next section. A similar estimation technique is used in Dee (2005).

IV. Estimation Approach
We have adopted the pupil fixed effects approach described in Altinok and Kingdon (2012). The standard achievement production function is specified as follows:
$$A_{ik} = \alpha + \beta X_{ik} + \delta S_k + (\mu_i + \eta_k)$$
where the achievement level ($A_{ik}$) of student i of school k is determined by the vector of his/her personal characteristics (X) and by school-specific characteristics (S). $\mu_i$ and $\eta_k$ capture the student- and school-specific unobservables. $S_k$ captures the class size variable.
In such an OLS equation, the estimated coefficient of the class size variable will suffer from endogeneity bias if student ability is correlated with class size. Removing from our sample students who are deliberately placed in ability-setted classes would reduce the endogeneity problem but not necessarily eliminate it. In order to credibly address the issue, a pupil fixed effects approach is feasible where data exist on both achievement scores and class size by subject for each student, and where thus, for each student, there are as many rows of data as there are subjects.
In such a setup, students are allowed to face different class sizes for different subjects within the school. This subject-wise variation in class size 'within a student' is what allows us to incorporate class size (along with teacher characteristics that also vary across subjects) as an explanatory variable in a pupil fixed effects (PFE) equation. This is the approach we follow. We estimate the following simple PFE achievement equation.
$$A_{ijk} = \alpha + \beta X_{ik} + \gamma C_{jk} + \psi T_{jk} + (\delta S_k + \mu_{ij} + \eta_{jk} + \varepsilon_{jk})$$
where $A_{ijk}$ is the achievement of student i in subject j and in school k. X is the vector of characteristics of student i, C is the class size of subject j, T is a vector of teacher characteristics of subject j, and S is the vector of school-specific characteristics of school k. The composite error term consists of $\mu_{ij}$, $\eta_{jk}$ and $\varepsilon_{jk}$, which denote the unobserved characteristics of students, school and subject respectively. A simplified PFE model for the two-subject case (subject 1 and subject 2) is as follows:
$$A_{i2k} - A_{i1k} = \gamma (C_{2k} - C_{1k}) + \psi (T_{2k} - T_{1k}) + \{(\mu_{i2} - \mu_{i1}) + (\eta_{2k} - \eta_{1k}) + (\varepsilon_{2k} - \varepsilon_{1k})\}$$
PFE is self-evidently a within-school phenomenon since a student studies in a single school. If school unobservables are not subject specific (i.e. η does not have a j subscript) and pupils' unobservables are not subject specific (i.e. μ does not have a j subscript), then the within-school PFE model is as follows:
$$A_{i2} - A_{i1} = \gamma (C_2 - C_1) + \psi (T_2 - T_1) + (\varepsilon_2 - \varepsilon_1)$$
Regressing the difference in a pupil's test score across subjects on the difference in class size across subjects nets out the effect of all subject-invariant unobserved student characteristics. However, if student ability varies by subject, that is not netted out and $(\mu_{i2} - \mu_{i1})$ remains in the error term. Although it remains in the error term, it will not create a problem in our estimation unless it is correlated with $(C_2 - C_1)$. For this correlation to exist, students should be able to match to specific classes of a subject within their grade in the school, e.g. pupils who are bright in a subject systematically match to the smaller (or the larger) classes of that subject within their grade. To avoid this, we present achievement equations using that sub-sample of classes which are not set by ability, which we call the 'reduced sample'. In these equations, subject-specific class size will not be systematically matched with students' subject-specific ability.
We present results using both the full sample of school classes and also the reduced sample of school classes that are not setted by ability. In the reduced sample estimations, it will not be the case that a student is put in a smaller (or bigger) class for the subject in which she is able, and in a bigger (or smaller) class in a subject in which she is less able. Thus, the presence of subject-varying pupil ability is not expected to be a source of bias in our approach. However, subject-specific school unobservables $(\eta_{2k} - \eta_{1k})$ remain in the error term and may in principle be correlated with $(C_2 - C_1)$. While some subject-varying aspects of school should be captured in class-size (e.g. if a school emphasises a particular subject, it is often reflected in, or is because of, small class-size in that subject), not all subject-varying aspects will be captured in class-size, and they remain a potential source of endogeneity.
For consistent estimation of the effect of class-size, it is also required that class level (i.e. subject-specific) unobserved characteristics (such as class-resources, teacher quality etc.) be unrelated to the included class size variable. For example, if more skilled teachers are assigned to teach larger classes, a class level unobservable (teacher skill level) will be correlated with both class-size and with pupil scores. Since omitted class-level variables in $\varepsilon_1, \varepsilon_2$ may be correlated with both class-size $(C_1, C_2)$ and with pupil achievement $A_1, A_2$, we cannot say that PFE estimation permits us to interpret the class-size effect as causal. We do include a number of teacher quality characteristics in the PFE achievement equation (the subject teacher's qualifications, training and experience), which should reduce this source of endogeneity, but it may not necessarily eliminate it. While across-subject PFE estimation resolves one source of endogeneity (i.e. correlation between μ and C), it does not solve this potential source of endogeneity (the possible correlation between ε and C). This is analogous to the standard panel data estimation where class unobservables remain in the error term.
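The across-subject differencing described in this section can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the estimation code used in the paper: the data are simulated, the function names (`simulate`, `pooled_slope`, `within_slope`) and all parameter values are hypothetical, class size is treated as continuous, and teacher controls are omitted. Subject-invariant ability is made to correlate positively with class size, so a pooled regression is biased, while demeaning within pupil should recover the assumed class-size coefficient.

```python
import random


def simulate(n_pupils=2000, n_subjects=4, gamma=-0.02, seed=0):
    """Simulated rows (pupil, class_size, score); all values are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for pupil in range(n_pupils):
        ability = rng.gauss(0, 1)  # subject-invariant unobservable (mu_i)
        for _ in range(n_subjects):
            # Abler pupils are placed in larger classes: the source of OLS bias.
            size = 40 + 5 * ability + rng.gauss(0, 5)
            score = gamma * size + ability + rng.gauss(0, 0.5)
            rows.append((pupil, size, score))
    return rows


def pooled_slope(rows):
    """Simple regression slope of score on class size, ignoring pupil effects."""
    n = len(rows)
    mean_c = sum(r[1] for r in rows) / n
    mean_a = sum(r[2] for r in rows) / n
    num = sum((r[1] - mean_c) * (r[2] - mean_a) for r in rows)
    den = sum((r[1] - mean_c) ** 2 for r in rows)
    return num / den


def within_slope(rows):
    """Pupil fixed-effects slope: demean both variables within pupil, then regress."""
    by_pupil = {}
    for pupil, size, score in rows:
        by_pupil.setdefault(pupil, []).append((size, score))
    num = den = 0.0
    for obs in by_pupil.values():
        mean_c = sum(c for c, _ in obs) / len(obs)
        mean_a = sum(a for _, a in obs) / len(obs)
        for c, a in obs:
            num += (c - mean_c) * (a - mean_a)
            den += (c - mean_c) ** 2
    return num / den


rows = simulate()
gamma_ols = pooled_slope(rows)   # contaminated by the ability-size correlation
gamma_pfe = within_slope(rows)   # should lie close to the assumed gamma of -0.02
print(f"pooled: {gamma_ols:.3f}  pupil fixed effects: {gamma_pfe:.3f}")
```

In this sketch the pooled slope comes out positive even though the assumed effect is negative, which mirrors the sorting problem discussed in Section III. The sketch does not address the remaining concern raised above, namely class-level unobservables correlated with class size.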

V. Data
The estimation strategy presented above requires a specific type of database. First, students' test scores across different subjects are needed. Second, there has to be enough variation in class size between subjects. We collected subject-wise test scores of each student of grade 12, from ten different schools of a private school chain in Uttar Pradesh. To pass grade 12, students take six subjects from a pool of 16 subjects, where the compulsory and optional subjects are specified within each of two major streams: science and commerce x . English is examined in two different papers, Language and Literature, and the score division is 50-50 for a 100 marks exam. The mark we obtain for English is the consolidated mark of English-Language and Literature. Therefore, in our analysis, we have given the same mark to both subjects. For example, if a student scored 78 per cent in English, we have given 78 for English-Language and 78 for English-Literature, as the two subjects are taught by different teachers in most of the campuses. We have also restricted our sample by removing the scores of Physical-Education (PEd) from our analysis xi .
Grade 12 students are typically aged 17 at the start of the school-year, which generally begins around 1st April each year. In the sample school chain, a typical grade 12 student takes three compulsory internal examinations (before facing the external Board exam the following March): the First Comparative exam, the Second Comparative and the Pre-Board exam. The First Comparative exam happens in late June, by when only one-third of the syllabus is covered. The Second Comparative examination, also called the Half-yearly exam, takes place in September, by when two-thirds of the syllabus is covered. By the Pre-Board exam in mid-December, all of the syllabus is covered.
From mid-December to February is revision/review time. Finally the class 12 external exam set by the exam board is typically spread over the month of March.
For the analysis, one should ideally use students' Board exam marks as the external exam answer sheets are anonymously evaluated by Board-appointed examiners, usually in another city. However, the distribution of marks in the Board exam is highly non-Gaussian. On the other hand, the distribution of the school's internal Pre-board exam marks is more Normal. Figure 3 shows that the board exam marks' distribution is always to the right of the internal Pre-board exam marks' distribution, which need not in itself be a problem. What is problematic is that the marks distribution is distinctively (rightward) skewed rather than Normal: the most extreme case is illustrated in the Computer Science marks, where the 'moderation' policy adopted in the Board exam leads to a distribution where no candidate has received marks less than 46, and the vast bulk of students have marks between 85 and 100. The Maths and Economics marks distributions are bimodal, with a lot of students given grace marks that take them just above the pass mark of 35, and there are an unduly large number of students getting marks between 90 and 100 per cent. The board marks' distribution is also generally narrower than the internal exam marks' distribution, e.g. see the kernel density distributions for Computer Science and English.
Concern has been expressed about the 'grace marks' and moderation practices of the various exam boards in India (Sanghi, 2013; Bhattacharji, 2015; Times of India, 2018; Kingdon, 2019; see Appendix B for details). Board exam results in India are also not trusted for entrance to prestigious universities such as the Indian Institutes of Technology (IIT) and for medicine and engineering courses at other colleges. Thus, instead of using Board Exam results for our analysis, we have used marks in the school's internal 'Pre-board' exam since, by the time of the Pre-board, the entire syllabus is covered, and since students from all the schools in the sample school-chain appear for the same exam, on the same date and with the (same) question papers prepared by an independent authority xii .
The distribution of Pre-board exam marks differs across subjects, e.g., the distribution of internal Pre-board marks in physics, chemistry and maths in the left panel of Figure 4 shows that marks in physics and chemistry are lower and less dispersed than marks in maths. In order to render them comparable and to use student achievement in different subjects as the dependent variable, it is thus necessary to standardize the marks. We standardize each score within its subject, that is, we use the z-scores of achievement. The z-score is the score of the pupil in a given subject minus the overall average score in that subject, divided by the standard deviation of the overall score in that subject. Therefore, by construction, the z-score of each subject has a mean of 0 and a standard deviation of 1. The right panel of Figure 4 presents the z-scores of the three different subjects. As expected, the distribution of standardized scores is much more similar across subjects than the distribution of the raw scores.
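The standardization described above can be sketched in a few lines. This is an illustrative example only: the marks are made up, the function name `standardize_by_subject` is hypothetical, and the population standard deviation (`pstdev`) is an assumption, since the text does not say whether the population or sample formula was used.

```python
from statistics import mean, pstdev


def standardize_by_subject(marks):
    """Map {subject: [raw marks]} to {subject: [z-scores]}, computed within subject."""
    z_scores = {}
    for subject, values in marks.items():
        subject_mean = mean(values)
        subject_sd = pstdev(values)  # population SD; an assumption, see lead-in
        z_scores[subject] = [(v - subject_mean) / subject_sd for v in values]
    return z_scores


# Hypothetical raw marks: physics lower and less dispersed than maths.
raw = {
    "physics": [40, 55, 60, 70, 45],
    "maths": [30, 95, 60, 80, 50],
}
z = standardize_by_subject(raw)
for subject, values in z.items():
    print(subject, [round(v, 2) for v in values])
```

After this transformation each subject has mean 0 and standard deviation 1, so a one-unit difference means one subject-specific standard deviation regardless of how hard or dispersed the raw paper was.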

VI. Results
This section presents the results of our regression analysis and also robustness checks. Results are presented in Tables 1-7. To prevent the analysis from being unduly affected by outliers, we removed the bottom and top two percent of observations of class-size, which led to removing class sizes below 18 and above 59. Mean class size is 43.64, though the whole-school pupil teacher ratio is lower, at 28.3 due to music, dance, sports, and art teachers, class-coordinators, librarians, lab-technicians, swimming coaches, psychologists, career counsellors, etc.
Since different teachers teach different subjects to the same pupil, it is possible to include teacher variables in not only the OLS but also in the pupil fixed effects equation of the achievement production function. While adding school fixed effects reduces the problem of the endogeneity of class size, it does not necessarily eliminate it since student ability may be correlated with class-size within the school, e.g. if more able or more motivated students systematically get selected into small or large classes (of their grade) within the school. xiii We try several different ways of dealing with the within-school endogeneity bias. Firstly, we control for students' subject-specific ability, measured by the average mark of the student in the subject in the previous two internal, i.e. within-school, exams called the First and the Second Comparative exams (see para 2 of section V on data, above). We control for subject-specific ability by adding this variable in our school fixed effects achievement equation.
Secondly, we estimate a pupil fixed effects (PFE) equation. While this controls fully for students' subject-invariant ability, ability may differ across subjects and that would remain a source of endogeneity if students who are particularly able in a given subject are deliberately put into smaller or larger class sizes in that subject. To take this into account, we control for pupils' subject-specific ability even within the PFE achievement equation, which the richness of the data available to us allows. In case this does not fully control for subject-specific ability, we go further and estimate a PFE equation with subject-specific ability separately for two subject-groups (the science-subjects group and non-science subjects group xiv ), since subject-specific ability would be more similar for subjects within such a grouping than for subjects across groupings, and we continue to control for subject-specific ability too, within the subject group. Estimating the equation separately for the two subject groups also allows us to see whether the shape of the relationship between class size and pupil achievement varies by subject. Table 2 controls for subject-specific ability. This has a large and statistically significant coefficient, and its inclusion strongly increases the adjusted R-square. Table 2 shows a statistically significant quartic relationship between class size and student achievement. However, it still relies on across-student differences in achievement, and does not control for pupils' subject-invariant ability.
This is a useful juncture at which to examine the coefficients on the control variables in the most stringent achievement production function estimated so far, namely the last column of Table 2, with school fixed effects and with subject-specific student ability. The results show perversely that teachers with a bachelor's degree are more effective than those with higher qualifications (Masters, M.Phil). Teacher training has no relationship with pupils' marks, as the coefficients on B.Ed and M.Ed qualifications are not significantly different from the base category (teachers with no professional training) xv . Male teachers are more effective than female ones, but this seems a selectivity effect since the sample (co-educational) secondary school has a preference for female teachers and only a few male teachers are recruited/retained who are judged to be exceptionally effective. Teacher experience is uncorrelated with student learning. Turning to child characteristics in the achievement equation, there is no significant difference in performance between male and female students, though there is a small and weakly significant coefficient on student's religion.
Next we estimate the within-pupil relationship, i.e. a pupil fixed effects (PFE) equation in Table 3. This provides a stringent test for a class-size effect since it nets out the influence of all subject-invariant pupil and school unobservables. We have used the same specification as in Table 2 but student variables and school fixed effects drop out of the PFE estimator. As before, we allow flexibly for functional form by including linear, quadratic, cubic and quartic specifications, and present results with and without a subject-specific ability control. The PFE results suggest a statistically significant quartic relationship between class size and student achievement. This relationship is "horse-shaped", as seen in the top panel of Figure  For the remaining more than 86% of all observations, the relationship is concave. As class size increases from 35 to 51, achievement mark increases (gently) by 0.10 SD, equivalent to a rise in absolute mark by 1.8 percentage points, which is a modest increase for a very large increase (of about 2 SD) in class size. Finally as class size increases further (by 1 SD) from 51 to 59, achievement declines by 0.18 SD, which is equivalent to a reduction in absolute mark by 3.2 percentage points.
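As a quick consistency check on the conversions quoted above, and assuming (the text does not state this) that both conversions use one pooled standard deviation of raw marks and one standard deviation of class size, the implied values are:

$$\sigma_{\text{marks}} \approx \frac{1.8}{0.10} = 18 \text{ percentage points}, \qquad 0.18 \times 18 = 3.24 \approx 3.2 \text{ percentage points}$$

$$\sigma_{\text{class size}} \approx \frac{51 - 35}{2} = 8 \text{ pupils}, \qquad 59 - 51 = 8 \text{ pupils} \approx 1\ \text{SD}$$

These inferred standard deviations are back-of-envelope figures derived from the quoted effect sizes, not values reported in the tables.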
So far, we imposed a quartic relationship between class size and student achievement. To explore the functional form in more detail, we introduce splines by creating dummy variables of class size. The results reported in column 9 of Table 3 suggest that as class size increases, student performance initially increases, then remains flat for a significant range of class sizes (the coefficients of class size of '27 to 34' to '53 to 56' are not statistically different from each other) and starts declining as class size crosses the category of '53 to 56'. This substantiates our quartic results. The combination of quartic PFE estimates and estimates with splines suggests that the relationship between class size and student achievement is "table-shaped/inverted U shaped" as shown in the bottom panel of Figure 5.
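The class-size category dummies (what the text calls splines) can be sketched as follows. Only the categories '27 to 34' and '53 to 56' are named in the text; the other bin edges below, and the function name, are hypothetical and chosen purely for illustration.

```python
# Illustrative bins (inclusive). Only (27, 34) and (53, 56) come from the text;
# the remaining edges are assumptions made for this sketch.
BINS = [(18, 26), (27, 34), (35, 52), (53, 56), (57, 60)]

def size_category_dummies(class_size, bins=BINS):
    """Return one 0/1 indicator per class-size category."""
    return [1 if low <= class_size <= high else 0 for low, high in bins]

print(size_category_dummies(30))  # [0, 1, 0, 0, 0]
print(size_category_dummies(55))  # [0, 0, 0, 1, 0]
```

In a regression one category is omitted as the base, and each remaining coefficient is read relative to it. Coefficients that do not differ across adjacent categories indicate a flat segment.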
Beyond class-size 35, student performance is invariant with respect to class size till class-size reaches 51. The implications of this shape of relationship are discussed at the end of the current section.
Next we examine whether the class-size effect differs across boys and girls. To do this, we estimated our most stringent achievement production function (the pupil fixed effects equation with control for subject-specific ability) separately for girls and boys. The results in Table 4 suggest that among boys, the relationship between class size and student achievement is quartic, whereas among girls it is linearly positive. The top panel of Figure 6 shows the shape of the relationship. For boys, there is no large change in achievement as class size increases from 24 to 50 xviii. After class size 51, boys' achievement level declines rapidly. Thus, apart from the very low end (class-sizes 18 to 23, where only 1.36% of observations lie), the relationship between class size and pupil achievement for boys conforms broadly to the shape described in Section 2 of the paper on a priori considerations: it is reasonably flat for class sizes between 24 and 50, but then declines sharply as class size increases beyond 50. For girls, the absence of a negatively sloped part suggests that there are no disciplinary issues with increased class sizes until the high 50s. We return to the positively sloped part later.
We also investigated whether the class-size effect differs between the science and non-science xix subject streams. Table 5 shows a concave relationship between class size and student achievement in the science-stream classes but a quartic relationship in the non-science stream. However, Figure 6 (bottom panel) shows graphically that even in the non-science stream, the dominant part of the relationship (where about 86 per cent of the observations lie, i.e. above class size of 34) is again concave, as for the science subjects. It is very clear that in non-science subjects, the optimal/maximum point i.e. the turning point after which the relationship becomes negative, occurs at a higher class-size than in the science subjects.
The optimal class size for the study of the sciences is 41, but in the non-science subjects it is around 52. Up to class-size 34, achievement in non-science stream subjects first modestly increases and then modestly falls with class size. Beyond class size 34, the relationship is strongly concave: achievement increases with class size up to a class size of around 52 and then falls xx (see Figure 6, bottom panel). The fact that achievement starts declining from a class size of 41 upwards in science subjects, but only after a class size of 52 in the non-science subjects (for the bulk of the observations), suggests that science students require more individual attention than non-science students.
In summary, when we do not bifurcate by gender or subject-stream, our analysis suggests a "table shaped" or "inverted U shaped" relationship between class size and student achievement. Student performance initially increases as class size increases till about 27, flattens between class-sizes of 27 and about 50 xxi , and then declines as class size increases beyond 50. We also observe that the effect of class size on student performance differs between boys and girls, and between science and non-science subject streams. In general, girls' learning flourishes in larger classes, and performance in non-science subjects flourishes up to a larger class size i.e. only starts declining after a class size of roughly 50, compared to science performance which starts declining beyond a class size of roughly 40.

Explanation of the shape of the relationship
Contrary to our prior beliefs, the data show that as class-size increases from 18 to 27, children's learning level rises with class size, rather than falling. Moreover, from a class size of 35 to about 50, there is again a gentle increase in learning achievement with class-size (Figure 5). When we look separately by subject-stream, we find that in both science and non-science subjects, the dominant part of the relationship is again concave, i.e. learning first rises and then falls with class size. While the negative slope beyond a class size of roughly 50 is understandable (beyond a certain class-size, it may be difficult to maintain discipline, though for girls the linear relationship suggests an absence of disciplinary issues), the positively sloped part of the relationship is intriguing and demands an explanation.
We explore some peer-group effects to examine whether the positively sloped part reflects children learning from each other, on the presumption that larger class-sizes permit more learning from peers. We first constructed three peer-group variables: 1. "Mean achievement of class peers" (mean mark of all the class peers, i.e. all pupils in the class, excluding the index student); 2. "Mean achievement of ability peers" (mean mark of the peer group in the achievement decile of the student within the class, again excluding the index student); 3. "Variation in pupil ability within the class" (measured by the within-class standard deviation of achievement in the prior 'Comparative exam' in the subject, which captures the heterogeneity of the ability distribution in the class).
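The three peer-group variables can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' construction code: the function names are hypothetical, ability-group labels (e.g. within-class deciles) are taken as given rather than computed, and the sample standard deviation is assumed because the text does not say which variant was used.

```python
from statistics import mean, stdev

def class_peer_mean(marks, i):
    """Mean mark of all classmates, excluding the index student i."""
    others = marks[:i] + marks[i + 1:]
    return mean(others)

def ability_peer_mean(marks, groups, i):
    """Mean mark of classmates in the same ability group as student i,
    excluding student i. Returns None if the student has no such peers."""
    peers = [m for j, (m, g) in enumerate(zip(marks, groups))
             if j != i and g == groups[i]]
    return mean(peers) if peers else None

def ability_variation(prior_marks):
    """Within-class SD of prior exam marks (sample SD is an assumption)."""
    return stdev(prior_marks)

# Toy class of four pupils (hypothetical marks and group labels)
marks = [50, 60, 70, 80]
groups = ["low", "low", "high", "high"]

print(class_peer_mean(marks, 0))            # 70
print(ability_peer_mean(marks, groups, 0))  # 60
```

Excluding the index student matters: otherwise a pupil's own mark would appear on both sides of the achievement equation.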
A student may learn not only from the teacher, but also from her/his peers, i.e. from students in the whole class group (class peers), or from others in her ability group within the class (ability peers), and weaker students may learn from bright students, i.e. the greater the ability distribution in a class, the greater may be such learning by the weak from the able. The extent to which such peer learning happens may differ by subject. For example, it is often said that maths and science require more explanation and attention by a teacher but that language learning can benefit from peer interaction as it is not so dependent on a teacher's explanations or personal attention. If this is so, we would expect less peer learning in science than in the non-science stream. Students may learn from class peers or ability peers either through watching their work (demonstration effect) or from getting direct help from them.
Finally, if science subjects require the attention of a teacher rather than being self-learnt or learnable from peers, then a high variation in ability level across children in a class would deter science learning because some of the teacher's attention will be given to the weaker students. But by the same token, if non-science lends itself to learning from peers, then weaker students will benefit from interaction with smarter peers and the smarter peers may learn themselves too, by teaching their less able peers.
Our peer-group results in Table 6 show that the achievement level of both 'class peers' and of 'ability peers' statistically significantly benefits a student's attainment in non-science subjects, but not in the science subjects. This gives credence to the maintained view that science is learnt mostly from a teacher, but that in the non-science subjects, one can learn from one's peers. The size of the peer-learning effect is also large: ceteris paribus, a one SD increase in 'class peer' mean achievement xxii raises the index student's achievement by 0.17 SD, and additionally a one SD increase in the mean achievement level of 'ability peers' raises the index student's achievement by 0.07 SD.
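To put these effect sizes in mark terms, the SD units can be converted using the figure in endnote xxii (1 SD of achievement is 18.38 per cent mark). The short Python sketch below does only this arithmetic; the function name is hypothetical.

```python
SD_MARK = 18.38  # per cent mark per 1 SD of achievement (endnote xxii)

def sd_to_mark_points(effect_sd, sd_mark=SD_MARK):
    """Convert an effect expressed in SD units into percentage points of mark."""
    return effect_sd * sd_mark

print(round(sd_to_mark_points(0.17), 1))  # 3.1 (class peers)
print(round(sd_to_mark_points(0.07), 1))  # 1.3 (ability peers)
```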
Since a student can learn from others in non-science subjects, a larger class size is beneficial for learning rather than a constraint on it, as there are more peers to learn from. However, the peer-learning effect is increasingly tempered by the manageability of the class, since discipline becomes a bigger issue as class size increases. This may explain the diminishing returns to class size, which ultimately turn negative after the optimal class size of 41 in science and 52 in non-science. Table 6 also shows that while variation in class ability benefits learning in the non-science subjects, it harms learning in the science subjects. This may be because in science, a teacher needs to give individual attention to each student: the greater the variability of ability in a class, the more the teacher's attention is divided, as there is greater need for differentiated teaching for pupils of different levels of ability, and there is less individual attention.

VII. Cost-Benefit Analysis
The fact that, up to a class-size of roughly 40 in science subjects and roughly 50 in non-science subjects, there is no reduction in pupil learning as class size increases implies that there is no learning gain to be had from reducing class size below 40 in science and below 50 in non-science. This has important policy implications for optimal pupil teacher ratios (PTRs) and thus for teacher appointment decisions in India, based on considerations of cost-effectiveness and economic efficiency. In this section, we compare the fiscal cost of existing class-size policies with the cost of hypothetical policies based on the pedagogically optimal class-sizes suggested in our findings.
Recent education policies in India reflect the tacit belief that to improve student performance, class size must be reduced by recruiting more teachers. The Right to Education (RTE) Act mandates a maximum PTR of 30 for elementary schools, and the Ministry of Human Resource Development (MHRD) guidelines for the secondary education program RMSA (Rashtriya Madhyamik Shiksha Abhiyan) mandate a minimum of 5 teachers for up to 160 students (implying a PTR of 32) and then a further teacher for each 30 students thereafter. As mentioned in Section 1, the draft National Education Policy (NEP, 2019) also identifies pupil teacher ratios above 30 as a major cause of lack of learning (page 63, section 2.14). It states that the country faces over one million teacher vacancies (page 115), and suggests that the government's education budget should increase by 1.05 percentage points, for filling teacher vacancies and better teacher resourcing (page 417, Table A1.4). This additional recruitment of teachers would create a permanent fiscal liability for government.
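The RMSA staffing norm described above can be written as a small Python function. This is a sketch of the rule as stated in the text, not an official formula: the function name is hypothetical, and rounding a partial block of 30 students up to a full extra teacher is an assumption.

```python
import math

def rmsa_min_teachers(enrolment):
    """Minimum teachers under the RMSA norm as described in the text:
    5 teachers for up to 160 students, then one more per 30 students.
    Assumption: a partial block of 30 counts as a full extra teacher."""
    if enrolment <= 160:
        return 5
    return 5 + math.ceil((enrolment - 160) / 30)

print(rmsa_min_teachers(160))        # 5
print(160 / rmsa_min_teachers(160))  # 32.0, the implied PTR at 160 students
print(rmsa_min_teachers(190))        # 6
```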
In reality, data show that public schools are operating at much lower levels of PTR than the mandated 30 pupils per teacher. Appendix Table A2 shows that in 2017-18, nationally, PTR at the elementary school level was 22.8 and that in 8 out of 20 major Indian states it was below 16. At the secondary and higher secondary levels the PTR was 27.9 in 2016-17 xxiii (27.3 for 20 major states).
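The prima facie PTR is simply reported enrolment divided by the number of appointed teachers. As a quick check, the Python lines below reproduce the secondary and higher secondary figure from the 2016-17 totals quoted in endnote xxiii; the function name is hypothetical.

```python
def prima_facie_ptr(enrolment, teachers):
    """Reported pupil enrolment divided by appointed teachers."""
    return enrolment / teachers

# Secondary and higher secondary totals for 2016-17 (endnote xxiii)
ptr_secondary = prima_facie_ptr(26276072, 941725)
print(round(ptr_secondary, 1))  # 27.9
```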
However, it is important to highlight that the elementary and secondary PTRs of 22.8 and 27.9 are prima facie PTRs, being the total reported pupil enrolment divided by the total number of appointed teachers. These use uncritically what are known to be inflated enrolment numbers, based on fake/ghost names entered by the school to show a higher than actual enrolment. This happens because grains for mid-day meals, bags, shoes, sweaters, cloth for school uniforms, other freebies and ultimately even teacher appointments are all based on the schools' self-reported enrolments. The Comptroller and Auditor General of India found 20% inflation in DISE pupil enrolment data at the elementary school level in Uttar Pradesh (CAG, 2017), and the Mid Day Meal Authority also reports overstated enrolment in public schools (Times of India, 2015). Given that real enrolment is lower than the reported (inflated) numbers, the real PTR is even lower than the reported prima facie PTRs. The high student absence rates of 31% (EdCil, 2008) and 28% (ASER, 2018) shown in Table A2 capture both fake enrolments and actual absence among genuine enrolees. Datta and Kingdon (2021) show that the 'cost-conscious' PTR, which adjusts for student absence rates, was 15.8, and that the 'effective PTR', which adjusts for both student and teacher absence rates, was 20.8.
However, for our cost-benefit analysis, we use the prima facie PTR of 22.8 (see Table A2) xxiv even though it is known to be higher than the true PTR.
Our analysis based on the impact of changes in class-size on learning levels, together with our analysis of the costs of teacher salaries presented in Tables 7(a, b) suggests that major cost-efficiencies can be achieved by maintaining larger than currently mandated class-sizes without compromising on student performance. Tables

VIII. Conclusion
Our paper used a pupil fixed effects method to examine the causal effect of class-size on pupils' learning outcomes.
While earlier studies using pupil fixed effects (e.g. Altinok and Kingdon, 2012) could not control for subject-specific ability, the current study does so by including prior achievement of the student in various subjects. Our more refined estimates of the causal effect of class size on student achievement are thus an advance on previous studies.
We found a robust and statistically significant 'table shaped' relationship between class size and student achievement.
We observed that over a wide range of class-sizes from 27 to about 51, student performance does not fall with class-size, suggesting that in the type of pedagogy that is practiced, children's learning benefits from having a larger number of peers, i.e. children learn from peers too, and not only from the teacher, a hypothesis for which we found empirical support. It is likely that beyond 51, discipline issues are greater, leading to a decline in learning. Our estimates suggest that in secondary education, reducing class-size to 30 or below cannot be good policy, as it lowers learning and raises costs. When the maximum PTR policy was made, policy makers' tacit assumption of a negative relationship between class-size and learning gain was not empirically grounded, and seems to have been based on an incorrect understanding of the causal relationship between inputs and outcomes in education. If our estimates are correct, India spent a very substantial amount of money on reducing class size below the learning-maximizing level, which was wasteful expenditure on rents to teachers and not investment that would benefit children. It is important to make empirically grounded decisions to ensure that any proposed public spending on education would represent an investment.
Our cost-benefit analysis results showed that if India increased the PTR from its current 22.8 in elementary and 27.9 in secondary schools, to 40 pupils per teacher in both, it would save USD 19.4 billion per annum in government's teacher salary expense, without any reduction in learning outcomes. We showed how this money would allow investment in other educational items, such as providing internet-enabled computers and computer teachers to all public elementary schools; providing a smartphone to all 41.2 million below-poverty-line (BPL) children of the country; or providing a school voucher or Direct Benefit Transfer (DBT) for schooling, equal to nearly Rs. 2943 per month per BPL child.
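Two pieces of arithmetic behind these figures can be sketched in Python. The sketch is illustrative and not the authors' calculation from Tables 7(a, b): it assumes enrolment and average teacher salary are held fixed (so the salary bill is proportional to 1/PTR), the function name is hypothetical, and the exchange rate of Rs. 75 per US dollar is an assumption chosen because it is consistent with the rupee figures quoted in the paper.

```python
def salary_saving_share(ptr_now, ptr_target):
    """Share of the teacher salary bill saved when PTR rises from ptr_now
    to ptr_target, holding enrolment and average salary fixed."""
    return 1 - ptr_now / ptr_target

elementary_share = salary_saving_share(22.8, 40)  # about 0.43
secondary_share = salary_saving_share(27.9, 40)   # about 0.30

# Monthly transfer per below-poverty-line child financed by the saving
SAVING_USD = 19.4e9    # annual saving reported in the text
BPL_CHILDREN = 41.2e6  # BPL children reported in the text
RUPEES_PER_USD = 75    # assumption, not stated in this section

monthly_transfer_rs = SAVING_USD / BPL_CHILDREN / 12 * RUPEES_PER_USD
print(round(monthly_transfer_rs))  # 2943
```

Under these assumptions the transfer works out to about Rs. 2943 per month per BPL child, matching the figure in the text.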
Fulfilling the resolve of the draft National Education Policy (2019) to urgently appoint 1 million new teachers, would further greatly reduce the already low PTR without any increase in learning outcomes. This suggests the need to recalibrate policy on teacher appointments and to re-evaluate the definition of 'teacher vacancy', since what is needed is substantial redeployment of teachers from urban (teacher surplus) areas to rural (teacher-deficit) areas, rather than the appointment of new teachers.
Since we used data on secondary-age students, the question could arise whether the impact of class size could be different for a lower (e.g. elementary school) age group. Secondly, since our analysis is based on data from private schools, the question arises whether individual attention by a teacher could matter more to children in public schools, who have less learning support at home and typically less educated parents. However, Banerjee et al. (2007), in their RCT study using data on 175 public primary schools of Mumbai and Vadodara, found that reduction in class size xxix had no effect on learning in government elementary schools in India. While this finding supports our conclusions, more research would strengthen the evidence base for the implied class-size policy direction for India.

Note: Robust standard errors in parentheses (*** p<0.01, ** p<0.05, * p<0.1).
Note: Teacher characteristics included but not shown.
Note: Robust standard errors in parentheses (*** p<0.01, ** p<0.05, * p<0.1). Teacher characteristics included, and student's subject-specific ability included, but not shown.
Note: Teacher variables included but not shown.
Table 4 in Kingdon (2020) who estimates them as follows.
1. Monthly salary data at the India level is obtained by taking the simple average of state-wise salary of primary school teachers with 15 years' experience from Ramachandran (2015) at the National University of Educational Planning and Administration (NUEPA), who had salary data from six major states of India.
2. In the above calculation, the average salary of upper primary level teachers is ignored even though one-third of all public elementary schools are upper primary schools, whose teachers receive a significantly higher salary rate. If we were to include upper primary school teachers' salary, it would come to a weighted average monthly elementary school teacher salary of Rs. 53996, in which case the estimated actual cost would be USD 38.76 billion instead of USD 37.25 billion.

Kingdon (2020) which is based on averaging of salary data of secondary school teachers (15 years' experience) from across six major Indian states in Ramachandran (2015), and this data for 2014 is extrapolated to 2016-17 using the Consumer Price Index.
*-Actual Cost.
Note: Experience = (Actual Age - 24). We have assumed that everybody joined the job market at the age of 24.

Physical Education is a non-academic subject, an outlier (with all students obtaining a very high mark in PEd) and scoring well in PEd does not need much effort, special training or ability, nor is it a high stakes subject since performance in PEd does not get counted for admission to university
xii For the Pre-board exam, the sample school chain centrally prepares a grade 12 internal examination paper in each subject, which is taken by the students at all ten schools in the chain. Scripts are marked by the teachers in each school, but on a pre-agreed mark-scheme. The pre-board exam takes place in December when the syllabus is complete. Prior to that, grade 12 students take the First Comparative
xiii In each school, there are several sections (classes) of grade 12.
xiv The science subjects' group consists of physics, chemistry, biology, biotechnology and maths.
xv This is not surprising. The mandate that teachers must have a training certificate, together with the fact that the trained teacher salary grade is very significantly higher than the untrained grade in private schools, has led to many low-quality teacher training colleges being established/accredited to provide such certificates. These colleges sell certificates with perfunctory courses, without genuinely taking teachers through a proper training course. This is much rued in the media, and the accusations of corruption and low quality of training colleges have led the National Council for Teacher Education (NCTE) to suspend giving any further colleges accreditation from 2018 onwards.

xvi Including the 2% observations in the smallest classes that have been excluded in the regressions.
xvii If these few observations were disregarded, there is hardly a positively sloped portion: as class size rises from 24 to 27, pupil achievement increases minimally by 0.02 SD. Only 0.77% of observations lie in class-sizes 12-17; and 2.91% lie between class-sizes of 24 to 27. Thus, 5.02% of observations lie in class sizes 12 to 27.
xviii (though achievement dips by 0.05 SD as class size increases from 24 to 37, it rises by 0.07 SD as class size increases from 37 to 50, i.e. the quartic form imposes an inflexion point)
xix We consider a student as studying in the science stream if his/her core subject combination has Physics, Chemistry and Mathematics (PCM) or Physics, Chemistry and Biology (PCB) out of 6 subjects. Students without PCM or PCB combinations are categorized as non-science students.
xx We checked the robustness of the results of the non-science stream with splines. The results suggest that the initial bump in achievement of non-science students, as seen in figure 5, is actually flat, as the coefficients of class size 27 to 39 and 40 are not statistically different from the base category.

xxi In the equation that imposes a quadratic, performance gently increases between class-sizes 35 to about 50. In the equation that allows splines (dummy variables) for different class-size categories, the relationship is roughly flat.
xxii 1 SD of achievement in the school's (pre-board) exam is 18.38 per cent mark. This is the mark used throughout the analysis.
xxiii Enrolment and total number of teachers at secondary and higher secondary level are 26276072 and 941725 respectively in 2016-17. Source: http://udise.schooleduinfo.in/
xxiv We are unable to compute the effective pupil teacher ratio (EPTR) at secondary and higher secondary school levels due to lack of any survey data on pupil and teacher absence rates for these levels.
https://nrega.nic.in/netnrega/writereaddata/Circulars/2058Notification_wage_rate_2017-2018.pdf
xxix For the non-remedial children who were left behind in the class when the academically weaker children were taken out for a remedial class by the teacher-aide (Balsakhi).