Optimism Bias and World Bank Project Performance

Abstract This paper examines the correlates of optimism bias and its impact on World Bank project performance. We measure optimism bias in different ways using estimated Economic Rates of Return (ERR) of projects at approval and closure. We examine over 2,800 World Bank projects that were appraised between 1960 and 2019. We find that approximately 60% of projects in the sample were prone to optimism bias. Correlates of optimism bias include both project and country characteristics. Findings also indicate that the incidence of optimism bias reduces the chance of a satisfactory project performance rate at the time of evaluation by 17–20%. Recommendations include embracing complexity and uncertainty in considering projects for approval, providing organizational incentives for ensuring projects are successful rather than ERRs being accurate, shifting some resources from appraisal to implementation, and changing the nature of project supervision.


Introduction
Project underperformance remains an everlasting puzzle in development theory and practice (Andrews, 2018; Hirschman, 1967; Ika, 2012). Consequently, project performance has been the subject of research, with three broad areas of interest (Denizer, Kaufmann, & Kraay, 2013; Feeny & Vuong, 2017; Ika, 2018). The first stream, which dates back to the 1970s, focuses on Cost-Benefit Analyses (CBA) and typically assesses the Economic Rates of Return (ERRs) of projects, prospectively at project appraisal/approval and retrospectively at project completion/closure (e.g. Del Bo & Florio, 2010). The second and relatively recent stream focuses on rigorous impact evaluation through randomized control trials (e.g. Banerjee et al., 2015). In addition to these two streams, which are in the micro-economic tradition, a third looks into the inner workings of the project management 'black box' and explores how activities and processes are actually carried out to fill the void in practical insight on how projects really get done (e.g. Ika, 2015). [...] examine country-level macro characteristics such as average economic growth rates, income levels, political rights, and civil liberties that prevailed during project implementation (Bulman, Kolkma, & Kraay, 2017; Denizer et al., 2013). We also control for the country or region of implementation. While many of these correlates have been identified in the literature (World Bank, 2010), this study is the first to test their significance, not only in a larger dataset but also in terms of both the incidence and intensity of optimism bias.
More precisely, we find that the ERR at approval is positively and significantly associated with optimism bias, which tended to increase up to the mid-1980s to the 1990s and to decrease thereafter. This contrasts with Flyvbjerg (2016), who suggests there has been no sign of amelioration or learning throughout the years. Further, going beyond World Bank (2010), Flyvbjerg (2016) and Love et al. (2022), we quantify the impact of optimism bias on project performance. We show that the incidence of optimism bias, or the percentage of World Bank projects that are deemed overoptimistic, is at best 60%, as opposed to 57% in social infrastructure or 80% in economic infrastructure projects (Flyvbjerg, 2016). On average, the ERR at project closure is 22.9% compared to 25.4% at approval, demonstrating a mean ERR gap of -2.5%. Finally, optimism bias can reduce the chances of a satisfactory project performance rating by 17%-20%.
The remainder of this paper is organized as follows. Section 2 describes the practice of CBA at the World Bank. Section 3 reviews the empirical evidence not only on the ERR gap but also on optimism bias and its relationship to performance in World Bank projects. Section 4 provides details on the methodological background of the research while the findings from the estimation of the empirical models are reported in Section 5. Finally, Section 6 concludes with the practical implications of the research, presents its limitations, and offers suggestions for further research.

The practice of cost-benefit analysis at the World Bank
Since the 1970s, CBA has featured prominently as a powerful tool for project investment justification, now mandated by Operational Policy OP 10.04 of the World Bank. Under this guidance, the World Bank should conduct CBAs for all its funded projects, unless some costs and benefits cannot be measured in monetary terms and, therefore, a cost-effectiveness analysis, which includes a comparison of the costs of alternative ways of reaching a given goal, should be conducted (World Bank, 2010). While CBA at approval is generally assessed at the project appraisal stage and finalized at the negotiations/approval stage, CBA at closure is calculated at the completion/evaluation stage of the World Bank project cycle, which consists of six stages: identification, preparation, appraisal, negotiations/approval, implementation/support, and completion/evaluation. This project cycle is provided in Figure S1 in the paper's supplementary material.
Typically, CBA assesses whether the present value of benefits exceeds that of costs for a given project to be undertaken. CBA consists of a cash-flow analysis that quantifies over time the major benefit and cost flows and the economic rate of return (ERR) of a project, that is, the discount rate at which its net present value (NPV) is equal to zero (World Bank, 2010). With few exceptions, the World Bank usually applies a cut-off ERR of at least 10% at approval (in constant prices), reassesses the ERR after completion or start-up of normal operations using data from completion reports, and measures the ERR gap (Del Bo & Florio, 2010; Pohl & Mihaljek, 1992). Notably, the ERRs at closure are rarely real ex-post ERRs, which may be calculated only after 10+ years when evaluation reports are prepared (Little & Mirrlees, 1990).
Yet, the ERR is not the sole criterion for project performance. While ERRs measure efficiency, (non-ERR) performance ratings measure not only efficiency but also effectiveness, relevance, impact and sustainability (World Bank, 2010, p. 58). Such ratings have been applied since 1973. The World Bank initially had a binary (satisfactory and unsatisfactory) scale, which it changed to the current six-point scale in 1993 (highly satisfactory, satisfactory, moderately satisfactory, moderately unsatisfactory, unsatisfactory, and highly unsatisfactory) (Kilby & Michaelowa, 2019).
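The definition of the ERR as the discount rate at which the NPV equals zero can be illustrated with a minimal sketch. The cash flows below are hypothetical and the bisection search is only one simple way of solving for the rate; it is not presented as the World Bank's own calculation procedure.

```python
def npv(rate, flows):
    """Net present value of yearly net flows (benefits minus costs), year 0 first."""
    return sum(f / (1.0 + rate) ** t for t, f in enumerate(flows))


def err(flows, lo=-0.99, hi=10.0, tol=1e-9):
    """Discount rate at which the NPV is zero, found by bisection.

    Assumes the NPV changes sign exactly once between lo and hi,
    which holds for a conventional project (costs first, benefits later).
    """
    f_lo, f_hi = npv(lo, flows), npv(hi, flows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, flows)
        if f_lo * f_mid <= 0:
            hi = mid                # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid   # root lies in [mid, hi]
    return (lo + hi) / 2.0


# Hypothetical project: 100 invested in year 0, net benefits of 30 for five years.
flows = [-100.0, 30.0, 30.0, 30.0, 30.0, 30.0]
rate = err(flows)
print(f"ERR = {rate:.1%}; clears the 10% cut-off: {rate >= 0.10}")
```

Running the same function on the flows expected at approval and on those re-estimated at closure would give the two ERRs whose difference forms the ERR gap.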

The empirical evidence on cost-benefit analysis, optimism bias and project performance at the World Bank
While the theory of CBA may be conceptually rigorous, it remains difficult for practitioners to get key parameter estimates right (Pohl & Mihaljek, 1992). Even Little and Mirrlees (1990), pioneers of the CBA method, in their assessment of the use of CBA 'twenty years on', noted that, though it was intended to be "practical" (p. 353), it has been "hard to implement," "unsound," "self-defeating," and "too complex" (p. 365), especially in the face of complexity and uncertainty of both the project and its setting, which can diminish its usefulness and "value" (p. 356). They conclude that changing organizational contexts meant that, in the 1970s, ERRs "became more and more optimistic, without justification," a phenomenon they dubbed the "McNamara effect" as the then president of the World Bank was advocating for a push in the volume of investments (p. 364).
More concretely, a few studies have sought to measure the accuracy of CBA in World Bank projects. In their study of a sample of 1,015 infrastructure projects in the agriculture, transport, energy, industry, and urban development sectors, Pohl and Mihaljek (1992) analyzed the differences between ERRs at approval and closure when the projects were completed about 5-10 years after appraisal (between 1974 and 1977). Their regression analysis revealed a large ERR gap and thus a high level of uncertainty in project appraisal, which tends to be biased and optimistic. Their findings suggest a high degree of optimism bias in the average project returns in the 1970s, with a rise in ERRs at approval from 16% in the mid-1960s to 20-25% in the mid- and late 1970s, and a downtrend in the ERRs at closure in the 1970s.
Del Bo and Florio (2010) found no statistical differences between ERRs at approval and closure for 84 projects approved in fiscal years 1988-1997, but this was not the case for 259 projects completed in years 1990-1997. They showed that the ERR gap for the more recent years was diminishing but was the highest for industry projects and the lowest for road projects. The ERRs at closure, when compared against the 'industry' sector benchmark, revealed higher returns for roads and highways, ports and airports, energy distribution and telecommunications infrastructure. The findings also suggest 'constant forecasting optimism' of around 30% in the ERRs at approval for industry, water, and railways projects, whereas it is less for road projects.
World Bank (2010) looked into the ERR gap of a total of 1,938 investment projects, including 1,772 projects from 11 different sectors over two periods of closing (1975-1979 and 2003-2007) and 166 projects in 2008. The presence of an ERR estimate served as an indicator of whether CBA was performed. It was found that the percentage of projects with a CBA dropped from 70% to 25% between 1970 and 2008. A time series between 1972 and 2008 using median values showed that the ERR gap had "virtually disappeared" around 2000 perhaps essentially due to "a rise in the upward bias in returns, improvement in overall economic conditions as measured by growth, and a rise in the degree of market orientation of the economic regime" (p. 40).
The ERR gap may be due to a 'positive' or 'upward' bias of different forms including the planning myth of "Everything Goes According to Plan"; overestimation of long-term benefits; a tendency not to report CBA for many projects, especially those with negative returns; and recalculation bias in that many projects, especially the poorly performing ones, do not have ERRs at closure. Further, both the ERRs and the performance ratings are correlated: "the returns are observed by the evaluator doing the rating" yet "the empirical overlap between them is not huge" (World Bank, 2010, p. 39). These findings that optimism bias is at work in World Bank CBA trigger this study, which investigates the extent of optimism bias, its correlates and its impact on project performance.

Data
This research utilizes the World Bank's Project Performance Rating database, a publicly available dataset administered by the Independent Evaluation Group (IEG), which evaluates World Bank projects, programs and activities by examining their achievements against stated objectives. It includes some 10,500 completed projects spanning the period 1964-2019. While World Bank staff prepare Implementation Completion Reports (ICRs), which include initial project performance ratings, IEG staff carry out a review of all ICRs together with the lending/grant agreement, project appraisal documentation and the country assistance strategy. The dataset provides the estimated ERRs of projects at their approval and at their closure. While it remains difficult to measure optimism bias (e.g. Love et al., 2022), a proxy is to calculate a binary variable equal to one if the ERR of a project at appraisal is higher than at closure. If this is true for most projects, then there is evidence of optimism bias.
However, this absolute proxy, by construction, implies that projects with high ERRs at approval will be positively associated with a high likelihood of optimism bias. Following Pohl and Mihaljek (1992), we resort to a relative proxy, the percentage difference between a project's ERR at closure relative to its ERR at approval: (ERR at closure - ERR at approval)/ERR at approval (multiplied by 100). Though this approach does not remove the measurement problem altogether, lower values of the variable imply higher expectations of the ERR at project approval than closure. Thus, the intensity rather than incidence of bias is modelled. Given its continuous nature, the variable does not distinguish between optimism and pessimism bias, the latter potentially arising when, on average, ERRs at approval are systematically lower than at closure. We therefore also include results pertaining to models with a binary variable in our examination of optimism bias, noting that a positive association with ERRs at approval is unsurprising and that robustness tests are warranted.
It is World Bank staff that calculate the ERRs at approval and closure although the IEG will use this information in their project evaluations (Kilby & Michaelowa, 2019). The IEG data only report ERRs if CBA has been applied to at least 20% of a project commitment (World Bank, 2015). Our analysis is thus restricted to the IEG ratings on those projects where CBA applies, implying that they are different in nature from others in the database. Further, as the World Bank typically doesn't fund projects with low ERRs at approval (i.e. less than 10%), we exclude them from subsequent regression analysis. We also exclude projects that are deemed a complete failure or arbitrarily assigned an ERR at closure of -5% (e.g. Pohl & Mihaljek, 1992). We restrict the values of the continuous optimism bias variable to between -300 and +300 to account for 16 outliers (although they are kept for the binary variable).
A histogram providing the distribution of values for the continuous optimism bias variable is provided in Figure S2 of the supplementary material.
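The construction of the two optimism bias variables and the sample filters described above can be sketched as follows. The project records are hypothetical (only the first uses the sample means reported in this paper), the function names are illustrative, and treating out-of-range values of the continuous variable as missing is one reading of the +/-300 restriction.

```python
def optimism_bias_measures(err_approval, err_closure, bound=300.0):
    """Return (incidence, intensity) for one project.

    incidence: 1 if the ERR at approval exceeds the ERR at closure, else 0.
    intensity: (closure - approval) / approval * 100. Lower values mean
               higher expectations at approval than at closure. Values
               outside [-bound, +bound] are returned as None (assumption).
    """
    incidence = 1 if err_approval > err_closure else 0
    intensity = (err_closure - err_approval) / err_approval * 100.0
    if not -bound <= intensity <= bound:
        intensity = None  # outlier: kept for the binary measure only
    return incidence, intensity


def keep_for_regression(err_approval, err_closure):
    """Drop projects with an ERR at approval below 10% or an ERR at closure of -5%."""
    return err_approval >= 10.0 and err_closure != -5.0


# Hypothetical records; ERRs are expressed in percent.
projects = [
    {"id": "A", "approval": 25.4, "closure": 22.9},  # sample means from the text
    {"id": "B", "approval": 12.0, "closure": 18.0},  # closure above approval
    {"id": "C", "approval": 8.0, "closure": 6.0},    # below the 10% cut-off
    {"id": "D", "approval": 20.0, "closure": -5.0},  # assigned -5% at closure
    {"id": "E", "approval": 10.0, "closure": 45.0},  # gap of +350: outlier
]

results = {
    p["id"]: optimism_bias_measures(p["approval"], p["closure"])
    for p in projects
    if keep_for_regression(p["approval"], p["closure"])
}
for project_id, (incidence, intensity) in results.items():
    print(project_id, incidence, intensity)
```

Project A illustrates why the relative proxy differs from the raw gap: a gap of 2.5 points on an approval ERR of 25.4% corresponds to a relative shortfall of just under 10%.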
Projects in some sectors are much more likely to have ERRs at approval and closure. 'Agriculture and Rural Development', 'Energy and Mining', 'Global Information/Communications Technology', 'Transport', 'Urban Development' and 'Water' have between 25% and 67% of projects subject to CBA (see Table 1). These six can be referred to as 'high-CBA' sectors and the other ten as 'low-CBA' sectors. Consistent with World Bank (2010), about 30% of projects are subject to CBA (3,098 projects out of a total of 10,459).
The proportion of projects subject to CBA by exit year points to a downward trend since 1980, due to a gradual shift away from high-CBA sectors and a decline notably in low-CBA sectors (World Bank, 2010) (see Figure S3 in the supplementary material). Notably, the number of projects with an exit year just prior to 2020 is low, which explains why the proportion of projects subject to CBA in these years falls to zero. Table 2 provides the incidence of optimism (and pessimism, e.g. Love et al., 2022) bias and shows that, over the sample period, at best 60% of projects with an ERR at approval and closure were subject to optimism bias. Figure 1 provides the average ERR gap in percentage terms by exit year, with lower values implying higher expectations of returns at approval than closure. Evidence of optimism bias is provided by most values falling below zero. Moreover, there was a tendency for greater optimism bias from the mid-1970s to the mid-1980s. Figure 2 provides the percentage of projects subject to CBA that encountered optimism bias by the year of closure. There is a clear non-linear relationship, with the incidence of optimism bias tending to increase during the period 1960-1985 and falling thereafter. The so-called 'McNamara effect', along with the push for disbursement and the reorganizations the World Bank underwent in the 1980s leading to lower quality control over projects, may perhaps explain this increase (Little & Mirrlees, 1990). The fall, however, is likely attributable to a tighter focus on project planning and the move toward results-based management (RBM), which gained momentum in the early 1990s (World Bank, 2017).
Out of the sample of 3,801 projects with an ERR at approval, 691 never received an estimated ERR at closure, making it impossible to calculate optimism bias and indicating the presence of sample attrition or selection bias (see Table S1 in the supplementary material). While this proportion is not great (18%), unless the omission is random, it may affect the research findings. For example, the ERRs for poorly performing projects at approval might be less likely to be re-estimated at project closure, highlighting a persistent recalculation bias in favour of high-performing projects. Common reasons for not reporting an ERR at project closure include staff belief that an ERR was not applicable or benefits could not be quantified (World Bank, 2010). Figure S3 in the supplementary material demonstrates that the proportion of projects with an ERR at both approval and closure fell during the period 1960-1990. Since 1990, no discernible pattern appears in the data. We discuss how we measure and correct for sample selection bias in Section 5.
This paper also empirically examines whether optimism bias impacts project outcomes. Again, the World Bank changed its project performance ratings from a binary (satisfactory/not satisfactory) measure to a six-point Likert scale in 1993. So, to maximize our sample size, we reclassify the projects from 1993 using the previously employed binary classification (e.g. Bulman et al., 2017). Consistent with Ika (2018) and World Bank (2017), almost 82% of the projects were deemed satisfactory at the time of their evaluation (see Table S2 in the supplementary material). [...] reveal a downward trend between 1980 and 1985 but an upward trend since then. World Bank (2010) finds that project performance has improved since 1993. Bulman et al. (2017) show that average project success rates fluctuated between 70% and 80% between 1995 and 2015.
The Project Performance Rating database is augmented with country-level data from the World Bank's World Development Indicators database (World Bank, 2020b) and from Freedom House (2020). Our choice of these variables is driven by the literature (e.g. Bulman et al., 2017; Isham & Kaufmann, 1999). Variable definitions and sources are provided in Table S3 and summary statistics for key variables are provided in Table S4 (see supplementary material). The projects in the sample were approved between 1960 and 2011 and completed between 1964 and 2015. On average, they took about seven years to complete after approval. The ERRs were on average 22.9% at closure versus 25.4% at approval, suggesting an ERR gap of -2.5% for the whole sample. Extant literature suggests an average ERR of 16% at closure as opposed to 22% (Pohl & Mihaljek, 1992) or 25% (Del Bo & Florio, 2010) at approval.

Methodology
In order to ascertain the correlates of optimism bias, the following model is specified and estimated: The dependent variable, B_i, is either a continuous gap measure (of the intensity of optimism bias) or a binary variable (measuring the incidence of optimism bias).
P is a vector of project-level characteristics for project i including the: (i) ERR at approval; (ii) sector; (iii) lending agreement between the World Bank and the borrower (e.g. funding sources include the International Bank of Reconstruction and Development, IBRD; the International Development Association, IDA); and (iv) type of investment loans (e.g. Financial Intermediary, Specific Investment, and Sector Investment and Maintenance Loans). It also includes the length of the project implementation period, measured in years. Given endogeneity concerns over the ERR at approval, the model is also re-estimated with this variable excluded.
C is a vector of country-level variables including Gross Domestic Product (GDP) per capita, inflation (logged) (World Bank, 2020b), and extent of freedom measured by political rights and civil liberties (Freedom House, 2020).
Z is a vector of control variables and includes geographic region and approval year. Note that all country-level variables are averaged during the project implementation period. The model is estimated using Ordinary Least Squares (OLS). To test for sample selection bias, we estimate a probit model using binary attrition as a dependent variable which takes the value of one for projects that received an ERR at approval but not closure and zero otherwise.
After examining the correlates of optimism bias, we assess whether it impacts the project's evaluated outcome. The following model is estimated: O_i is a binary project outcome variable taking the value of one if the project is evaluated as satisfactory and zero if non-satisfactory. B_i measures optimism bias for project i as discussed above. The vectors P_i, C_i and Z_i provide the same project characteristics, country and control variables as in Equation 1. Robust standard errors are used in all specifications. While results are unlikely to suffer from simultaneity bias, endogeneity might arise from omitted variables being correlated with both dependent and independent variables. Thus, Oster's (2019) unobservable selection and coefficient stability test was applied to examine the potential role of unobserved variables.
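The two specifications described in this section can be written compactly as below. This is a sketch under stated assumptions: only B_i, O_i and the vectors P_i, C_i and Z_i are named in the text, so the coefficient symbols and error terms are notation introduced here, and the second equation is shown in linear form.

```latex
\begin{align}
B_i &= \alpha_0 + \boldsymbol{\alpha}_1' P_i + \boldsymbol{\alpha}_2' C_i + \boldsymbol{\alpha}_3' Z_i + \varepsilon_i \tag{1} \\
O_i &= \beta_0 + \beta_1 B_i + \boldsymbol{\beta}_2' P_i + \boldsymbol{\beta}_3' C_i + \boldsymbol{\beta}_4' Z_i + u_i \tag{2}
\end{align}
```

In this notation, the coefficient of interest in the second equation is \beta_1, the association between optimism bias and the probability of a satisfactory rating.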

ERR gap measure of intensity of optimism bias
Results examining the intensity of optimism bias from the estimation of Equation 1 using OLS are provided in Table 3. In the first column, region-fixed effects are included and standard errors are clustered at the country level. In the second column, country-fixed effects are included (without region-fixed effects) and the model is estimated with robust standard errors. In the third column, the model clusters standard errors by country. Macroeconomic variables are also included.
Results suggest that the ERR at approval is associated with the intensity of optimism bias. Lower levels of the dependent variable represent higher levels of optimism bias. A project with an ERR at approval 10% higher than the mean is associated with a 3%-5% higher level of optimism bias. While the coefficients attached to region dummy variables are not statistically significant in column 1, results from column 3 indicate that, relative to projects implemented in other regions, projects implemented in Latin America and the Caribbean, LAC (18% of the sample) have a higher level of optimism bias, potentially due to a higher level of regional economic volatility and political instability. ERRs may be higher for projects implemented in less distorted economic environments (Isham & Kaufmann, 1999).
The project sector appears important. The level of optimism bias is higher in the 'Agriculture and Rural Development' and 'Water' sectors relative to other sectors. Pohl and Mihaljek (1992) found that the ERR gap is smaller for transport compared to agriculture projects. This might reflect projects in the latter sector being reliant on weather and climate and therefore subject to a higher degree of volatility and uncertainty. Column 2 suggests that projects funded through IDA experience lower levels of optimism bias (relative to IBRD-funded projects). In their assessment of bias, Kilby and Michaelowa (2019) obtain mixed findings between IDA and non-IDA projects. The type of loan is also important in explaining optimism bias. Relative to 'Financial Intermediary Loans' (FIL), 'Sector Investment Loans' and 'Sector Investment and Maintenance Loans' show higher levels of optimism bias, suggesting that project appraisers more effectively account for downside risks associated with FIL loans.
To capture the non-linear relationship between project approval and the intensity of optimism bias (see Figure 1), the approval year and its squared term are included in the model. An inverted U-shaped relationship prevails where the degree of optimism bias first increases with the approval year before falling. There is no significant association between the number of years a project takes to implement and the level of optimism bias. Indeed, though optimism bias and [...] (Kahneman, 2011), project appraisers generally make their entry ERR assessments at appraisal, not during execution. Column 3 describes a model with a smaller sample size and includes country-level variables, which add very little explanatory power to the model (cf. R-squared). Projects implemented during periods of higher growth are associated with lower levels of optimism bias. This could reflect projects performing better than expected in countries experiencing high growth rates and favorable macroeconomic conditions leading to high ERRs. The average inflation rate during implementation is also associated with lower levels of optimism bias. There is no discernible relationship between political rights/civil liberties and optimism bias intensity.
Given endogeneity concerns, the models are re-estimated without the ERR at approval variable (results are available from the authors on request). Interestingly, the R-squared numbers fall by a value of just 0.02, implying that the ERR at approval explains only about 2% of optimism bias variation, which is negligible in our view. Results relating to the coefficient on other variables are nearly identical in terms of statistical significance.

The incidence of optimism bias
The models are also estimated with the binary optimism bias as the dependent variable using a linear probability model. Results are very similar (see Table 4). Considering a mechanical link between the incidence of optimism bias and the ERR at approval, the latter variable was removed as a covariate (results are available from the authors on request). All results with respect to region, sector, agreement type, lending instrument, approval year fixed effects, inflation and GDP growth are consistent. The only difference is in the magnitude of the coefficient estimates and the greater statistical significance of the variables. Again, the inclusion of the ERR at approval variable led to an increase in the R-squared numbers of around just 1%. Thus, this variable, though statistically significant, does little in explaining optimism bias variation. These results are also consistent with the estimation of a probit model. The impact of attrition and sample selection bias is examined in Insert S1 (and Tables S5-S7) and Insert S2 (and Table S8) of the supplementary material to this paper. The impact of including the ERR at appraisal and the issue of mathematical coupling are also discussed in Insert S3 of the supplementary material.

Optimism bias and project outcomes
Optimism bias matters for non-ERR project performance for two reasons. Firstly, lower ERRs at closure than approval might influence evaluators' overall assessment of the project (Kilby & Michaelowa, 2019;World Bank, 2010). Secondly, project appraisers might be influenced by organizational incentives to identify the best-performing projects, which might lead them to be over-optimistic (or deliberately misleading) with respect to ERRs (Little & Mirrlees, 1990). Tables 5 and 6 provide results examining the impact of our two measures of optimism bias on project outcomes. Table 5 estimates the model using the intensity of optimism bias (ERR gap measure), showing that, at the mean, a 10 unit increase in the ERR gap of a project is associated with a fall in the probability of project outcomes being satisfactory by 2%-3%. Relative to Sub-Saharan Africa, evidence from two of the models suggests projects undertaken in East Asia and the Pacific and South Asia are more likely to be evaluated as satisfactory. This is also the case in Europe and Central Asia (column 1) and LAC (column 3). However, the Middle East and North Africa results are mixed. These findings are supported by the literature. About 50% of African investment projects funded by the World Bank's private arm (IFC) are rated satisfactory as compared to 60% elsewhere in the world due to higher risk, business and political climate, below-average project environment, and social compliance (Dugger, 2007). Further, projects in the Global ICT and Transport sectors are more likely to be evaluated as satisfactory relative to other sectors. Column 2 indicates that IDA-funded projects have better outcomes than IBRD ones, reflecting perhaps the use of a performance-based allocation model for IDA whereby more aid is provided to lower-income countries which deliver development benefits more effectively. 
The coefficients attached to the approval year variable in columns 1 and 2 suggest that projects have become less likely to be evaluated as satisfactory over time, although the coefficients are very small. The approval year squared variable was dropped from this specification due to collinearity. Other findings indicate that projects implemented over longer periods are less likely to be satisfactory. This might reflect such projects being larger, more complex, or having experienced more difficulties during the implementation cycle. This is supported by Bulman et al. (2017) who found that the larger the project, the longer it takes time to complete, and the less likely it is to succeed. Finally, macroeconomic variables also appear to be important for project performance, as attested by Denizer et al. (2013) and Bulman et al. (2017). Column 3 indicates that projects implemented in countries with higher levels of civil liberties, income and economic growth rates are more likely to be deemed satisfactory while less likely for those implemented during periods of higher inflation. Interestingly, projects are more likely to be deemed satisfactory in countries with lower levels of political rights. Table 6 also suggests an important role of optimism bias in project performance. Its incidence is associated with a 17%-18% lower probability of the project being assessed as satisfactory at the time of evaluation. Findings with respect to the other variables are broadly consistent with those in Table 5 examining the intensity of optimism bias. Given the binary nature of the dependent variable, probit models are also estimated (see Table S9 in the supplementary material). Again, results are consistent although the impact of optimism bias is even greater. 
At the mean, a 10 unit increase in the ERR gap of a project is associated with a fall in the probability of project outcomes being successful by 3%, while the incidence of optimism bias reduces the chance of a satisfactory project rating by 19%-20%. Interestingly, when the optimism bias variables are removed from the three models, the R-squared numbers fall to 0.03, 0.10 and 0.05, respectively. This suggests that optimism bias is important in explaining a relatively large amount of variation in project outcomes.

Further robustness tests
Our core result that optimism bias reduces the likelihood that a project will be deemed successful might be biased due to the role of unobserved variables. To examine this, we adopt Oster's (2019) coefficient stability approach, which assumes that bias arising from observed controls is more informative than that from unobserved factors. The approach allows us to calculate the bias-adjusted impact of optimism bias by specifying: (1) a value for the relative degree of selection on observed and unobserved variables (d); and (2) a value for R max which is the highest expected R-squared from a hypothetical regression of project outcomes on optimism bias and both observed and unobserved controls (see Insert S4 and Table S10 in the supplementary material). The bias-adjusted coefficients are very similar in magnitude to those from the controlled and uncontrolled regressions, suggesting that unobserved variables do not play an important role in our findings. Finally, we undertake randomization inference which provides an alternative approach to calculating p-values and testing hypotheses. This exercise allows us to test whether the impact of optimism bias is likely to be observed by chance. It establishes what we would find if optimism bias was randomly assigned across the projects and therefore whether our finding from regression analysis is unusual.
The approach neither requires large sample sizes nor makes any distributional assumptions. It allows us to identify whether optimism bias is important by repeatedly and randomly re-sampling the assignment of project bias. For each random assignment, the test statistic is calculated as the difference in average performance ratings for projects with and without optimism bias. Repeating this exercise provides a reference distribution for the null hypothesis of optimism bias not having an effect on project performance. We perform 10,000 permutations using the methodology of Young (2019), stratifying at the country level to determine whether or not there is a statistically significant impact of optimism bias on project performance. The null hypothesis is that there is no effect and the p-value is calculated as the proportion of times the placebo optimism bias effect is larger than the estimated effect. Results (available upon request) revealed that, for the two measures of optimism bias, there were no instances where the re-sampled test statistic was more extreme than the test statistic for the data. This implies that the impact of optimism bias on project performance ratings is highly unlikely to be due to chance.
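The permutation procedure described above can be sketched as follows. This is a minimal two-sided difference-in-means version that shuffles the optimism bias indicator within each country; it is not a reproduction of Young's (2019) procedure, and the function and argument names are hypothetical.

```python
import random
from collections import defaultdict


def mean_difference(biased, rating):
    """Average rating of biased projects minus that of unbiased projects."""
    treated = [r for b, r in zip(biased, rating) if b]
    control = [r for b, r in zip(biased, rating) if not b]
    return sum(treated) / len(treated) - sum(control) / len(control)


def stratified_permutation_pvalue(biased, rating, country,
                                  n_perm=10000, seed=0):
    """Randomization-inference p-value with shuffling within countries.

    biased: 0/1 optimism bias indicator per project.
    rating: 0/1 satisfactory performance rating per project.
    country: stratum label per project.
    Returns the observed difference and the share of permutations whose
    placebo difference is at least as extreme in absolute value.
    """
    rng = random.Random(seed)
    observed = mean_difference(biased, rating)

    # Group project indices by country so labels move only within a stratum.
    strata = defaultdict(list)
    for i, c in enumerate(country):
        strata[c].append(i)

    extreme = 0
    for _ in range(n_perm):
        shuffled = list(biased)
        for idx in strata.values():
            labels = [biased[i] for i in idx]
            rng.shuffle(labels)
            for i, label in zip(idx, labels):
                shuffled[i] = label
        if abs(mean_difference(shuffled, rating)) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm
```

Shuffling within countries keeps each country's number of biased projects fixed, so the placebo assignments respect the same country composition as the observed data.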

Conclusion
This paper examines the correlates of optimism bias and its impact on World Bank project performance. The paper analyzes both the intensity (magnitude) and incidence (prevalence) of optimism bias for over 2,800 World Bank projects that were appraised between 1960 and 2019. Our main findings are worth reiterating for their contributions.

Research contributions, limitations and outlook
First, we demonstrate that the incidence of optimism bias is at best 60%, which is close to the comparable figure for social infrastructure projects undertaken in the developed world (57% in Love et al., 2022). Such projects share with development projects, though to varying degrees, features such as the intangibility of their goals and the socio-political complexity of their settings (Ika, 2018). However, in terms of the intensity of optimism bias, on average, the ERR at approval appears to be about 2.5% higher than at closure, as opposed to 6% in a Pohl and Mihaljek (1992) study of World Bank projects, probably reflecting a fall in the level of optimism bias over time. These findings offer new empirical evidence not only on the incidence but, more importantly, on the intensity of optimism bias in projects (Flyvbjerg, 2016;Love et al., 2022). Second, consistent with existing literature, we find that the ERRs at approval tend to be significantly biased, partly due to optimism bias. Further, we find that the incidence and the intensity of optimism bias have decreased over the last three decades, perhaps due to a tighter focus on project appraisal, planning and RBM. This suggests, as noted by World Bank (2010) and in contrast to Flyvbjerg's (2016) results, that learning takes place over time. Similar to the work of Kilby and Michaelowa (2019) on evaluation bias, we also find that project and country-level characteristics are significant correlates of optimism bias.
Third and finally, we reveal that optimism bias can reduce the chances of a satisfactory project performance rating by 17%-20%. This finding indicates that, in contrast to the theory that optimism bias is the main reason why projects experience cost overruns and benefit shortfalls (Flyvbjerg, 2016), additional influences beyond optimism bias might explain project underperformance. While the World Bank dataset does not include measures of competing explanations such as scope changes, complexity and uncertainty (Bulman et al., 2017;Denizer et al., 2013;Ika, 2018;Love et al., 2022), the literature suggests the quality of project appraisal and supervision processes also matters (Ika, 2015;Kilby, 2000). Indeed, experience seems to suggest that there are organizational incentives and practices that induce cost underestimations and/or benefit overestimations, which may not be due to optimism bias but to the way organizations operate. Those producing "acceptable" benefit-cost ratios might play according to the rules of the game. And this does not necessarily mean 'cheating', because there are several degrees of freedom when making ex ante estimates in conditions of high uncertainty. This observation indicates the need for more research on what explains project (under)performance in general and the role of optimism bias in particular. "Indeed, a perennial concern for World Bank management has been the frequency of projects exhibiting 'disconnect' – projects that were rated as satisfactory throughout the implementation process but were then ultimately rated as unsatisfactory upon completion" (Denizer et al., 2013, p. 295).

Policy recommendations
Our policy recommendations are fourfold and go beyond World Bank project settings. First, we do not call for a CBA 'revolution' in the sense of a sweeping and broad application (Sunstein, 2018) but for an evolution to embrace complexity and uncertainty in project appraisal. The World Bank should thus distinguish between risk and uncertainty settings (see Feinstein, 2020;Hirschman, 1967). In risk settings or the realm of known unknowns, where events and their probabilities can be known, the World Bank should apply the CBA policy, particularly in high-CBA sectors. But the World Bank should expect some inaccuracies, whether they are due to bias (e.g. optimism) or error (e.g. mistakes). Using Kahneman's (2011) 'outside view' (instead of the 'inside view' of project appraisers), the World Bank might de-bias CBA overestimates by acquiring objective past ERR data. To this end, the World Bank may implement the risk management technique of Reference Class Forecasting (RCF) advocated by Flyvbjerg (2013), which looks at the cost and benefit outcomes of a reference class of past projects similar to the one under consideration for a CBA. The World Bank may also use scenario building, a framework that weighs different scenarios with different probabilities (Feinstein, 2020).
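As a simplified illustration of the 'outside view' behind RCF, the sketch below adjusts a new project's appraisal ERR by a chosen percentile of the ERR gaps (approval minus closure) observed in a reference class of past projects. It is a stylized example under stated assumptions, not Flyvbjerg's full procedure; the function name and all figures are hypothetical.

```python
import statistics


def reference_class_adjusted_err(past_approval_err, past_closure_err,
                                 new_approval_err, percentile=50):
    """Subtract a percentile of past ERR gaps from a new appraisal ERR.

    past_approval_err, past_closure_err: ERRs (in %) of comparable past
        projects at approval and at closure.
    new_approval_err: the appraisal ERR (in %) of the project under review.
    percentile: integer from 1 to 99; a higher value gives a more
        conservative (larger) downward adjustment.
    """
    gaps = [a - c for a, c in zip(past_approval_err, past_closure_err)]
    # quantiles(..., n=100) returns the 99 percentile cut points.
    cut_points = statistics.quantiles(gaps, n=100, method="inclusive")
    return new_approval_err - cut_points[percentile - 1]


# Hypothetical reference class of five past projects.
approval = [18, 15, 22, 25, 30]
closure = [20, 15, 19, 20, 21]
median_adjusted = reference_class_adjusted_err(approval, closure, 20)
cautious_adjusted = reference_class_adjusted_err(approval, closure, 20,
                                                 percentile=80)
```

The choice of percentile reflects how much risk of overestimation the decision maker is willing to accept, and the usefulness of the adjustment depends on how similar the reference class is to the project at hand.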
However, under (Knightian) uncertainty, the realm of unknown unknowns, where events and their probabilities remain unknown or in instances where "we simply do not know" as Keynes would say, the reductionist CBA and RCF would fall short, as calculation can neither substitute for informed and considered judgment nor cater to socio-political complexities or the human element. So, the World Bank might apply CBA, whenever relevant, along with other rules of thumb for project appraisal such as collective deliberation. But the World Bank should expect to adapt to changing circumstances, whether they are welcome or unwelcome surprises, that is, to revisit assumptions, adjust plans, and make changes, particularly when projects veer from assigned paths and take complex out-turns (Hirschman, 1967;Ika et al., 2022).
Second, organizational incentives matter and project appraisers and supervisors understand their importance for their own career paths (Little & Mirrlees, 1990). While others have proposed firing or even "suing the forecaster" (Flyvbjerg, 2013, pp. 771-772), we believe error happens and forecasters may simply get their forecast wrong, especially when they face complexity and uncertainty, which are common features of development projects (Feinstein, 2020;Ika, 2018). The World Bank should instead put in place incentives not for the accuracy of CBA, which remains a means to an end, but for delivering ultimate success in projects. In other words, the World Bank should reward project appraisers and supervisors (the Task Team Leaders) who, in the face of unforeseen circumstances, reappraise and adjust their projects, and eventually 'stumble into success' (Hirschman, 1967).
Finally, getting the right incentives in place is not sufficient on its own to make projects successful. Resources matter for CBA practice (Little & Mirrlees, 1990). We know project appraisal contributes to development impact and project supervision matters for implementation success or the delivery of specific objectives under time and cost constraints (Ika, 2015;Kilby, 2000). But resources are limited. So, to fully embrace complexity and uncertainty, the World Bank needs to shift some resources from project appraisal to implementation. As Feinstein (2020) notes, this shift means more opportunities for team leaders to reappraise projects and facilitate adjustments in the implementation phase and thus overcome a detrimental 'project arrow' or linear appraisal-implementation process. This move suggests a further complexification of supervision so that it allows for monitoring and evaluation during implementation as a real-time learning instrument that complements and refines appraisal in the face of changing circumstances. This entails giving project appraisers and supervisors the flexibility or the latitude (Hirschman, 1967) to 'learn by implementing' and try different best-fit modalities of implementation in different contexts, for example, in high versus low CBA sectors, higher versus lower-income countries, and in countries with varying degrees of civil liberties or political rights (Feinstein, 2020).