Targeting Social Safety Nets: Evidence from Nine Programs in the Sahel

Abstract This paper analyzes household data from nine programs in the Sahel region using a harmonized approach to compare Proxy-Means Testing (PMT) and Community-Based Targeting (CBT) as conducted in practice, once geographical targeting has been applied. Results show that the targeting performance measured depends critically on the definition of the targeting objectives, share of beneficiaries selected, and indices used to evaluate targeting. While PMT performs better in reaching the poorest households based on per capita consumption, it differs little from CBT, random or universal selection when distribution-sensitive measures are employed, or when food security is used as the welfare metric. Administrative costs associated with targeting represent only a small share of budgets. Results emphasize the importance of studying programs as implemented in practice instead of relying on simulations of targeting performance. They also suggest that PMT and CBT contribute little to poverty or food insecurity reduction efforts in poor and homogeneous settings.


Introduction
How to identify beneficiaries, or in other words, how to target, is a central question for social assistance programs. In recent years, the COVID-19 pandemic and other shocks have led to an unprecedented expansion of social assistance, and to a need to deliver transfers rapidly at scale (Aiken, Bellue, Karlan, Udry, & Blumenstock, 2022; Gentilini et al., 2022). Yet, despite the vast literature on the subject, there is no consensus on how we should select beneficiaries. 1 Indeed, cross-country comparisons of targeting methods emphasize that there is a wide variation in targeting efficiency for each method (Coady, Grosh, & Hoddinott, 2004; Devereux et al., 2017). Within-country comparisons of different targeting methods show that targeting errors are often large and that tradeoffs exist across targeting objectives (Alatas, Banerjee, Hanna, Olken, & Tobias, 2012; Schnitzer, 2019; Stoeffler, Mills, & Del Ninno, 2016). However, comparing targeting performance across these studies is difficult, given the differences in assessment methodologies employed. The few studies that compare targeting performance across several programs are based either on simulated scenarios (such as in Brown, Ravallion, & van de Walle, 2018) or on actual programs, but without fully harmonized features (such as in Coady et al., 2004 and Devereux et al., 2017).
Relying on harmonized household-level data from nine programs implemented in the Sahel region, this article compares targeting performance of Proxy-Means Testing (PMT) and Community-Based Targeting (CBT), two targeting methods that are widely used globally. We study targeting methods that were implemented between 2014 and 2018 in six countries (Burkina Faso, Cameroon, Chad, Mali, Niger, and Senegal). We contribute to the academic and policy discussion on targeting by (i) providing comparable evidence on the performance of targeting methods implemented in practice; and (ii) showing the role of different measurement choices on targeting performance. Our analysis allows us to understand the contributions of PMT and CBT as applied in practice, once geographical targeting has been applied, and compare the two methods within and across countries.
In addition, we explore the role of measurement choices across multiple dimensions, including three key ones. First, we consider both per-capita consumption and food security as wellbeing metrics, the latter being a major policy objective in humanitarian interventions. Second, we explore the role of program coverage (or the share of beneficiaries selected) in the performance of targeting. Third, we rely on multiple measures used in the literature to evaluate targeting, including distribution-sensitive indices (i.e., measures that consider distances to poverty lines). To provide additional insights, we conduct simple regressions to identify whether distinct types of households are selected or excluded by different targeting methods. Finally, we present administrative cost data, an area where evidence is particularly limited.
Several results emerge from this study. We show that measurement choices play a decisive role in the performance of targeting. At first sight, PMT seems to perform well in reaching the poorest households. The median PMT-targeted program provides 19% more resources to the poorest than would random allocations. This is only 5% for CBT, which performs systematically worse than PMT based on per capita consumption. Nonetheless, when considering distribution-sensitive measures of performance, none of the methods perform significantly differently from each other and from random or budget-neutral universal delivery of benefits. This result may be driven by the high poverty rates and the significant homogeneity present in the program areas studied. When relying on food insecurity as a well-being metric, there are no systematic differences between PMT, CBT, and random selection. While differences in performance between targeting methods are relatively small, program coverage plays a crucial role. Finally, targeting administrative costs are also similar across methods and represent a small share of the total amount of funds distributed. In sum, the results suggest that after geographical targeting is applied, the household-level targeting method employed is not likely to make a major difference in reducing poverty or food insecurity in low-income settings.
Our study contributes in several ways to the literature. This is the first cross-country comparison of targeting performance based on a primary data analysis from actual programs. As such, it differs from cross-country analyses based on simulations (e.g., Brown et al., 2018), which focus on PMT performance but cannot assess CBT or measure actual performance in practice. Our study and our results also differ from cross-country reviews that are based on actual programs, but not based on primary, micro-level data analysis (Coady et al., 2004; Devereux et al., 2017). The use of primary data allows us to rely on multiple measures of performance that are comparable, and to benchmark our results against potential alternatives (random, geographic or universal targeting). Finally, we provide data on targeting administrative costs, and discuss policy and methodological implications for the design and study of targeted programs in the future.
Our findings have several implications. From a methodological perspective, they imply that comparisons of targeting performance that do not account for coverage rates can be misleading, and the adjustments that we make in our primary data analysis by harmonizing program coverage rates are not trivial. They also highlight how targeting studies based on simulations can yield results that differ from those obtained in real settings. Finally, our results emphasize the critical role of clearly defining objectives (e.g., selecting the poorest, or the food-insecure) when discussing targeting performance.
From a policy perspective, our findings question the widespread use of household-level targeting, as other studies have done, compared to 'universal' alternatives (potentially after some geographic targeting is applied at the regional or village level, or delivering lower amounts to all households). Moreover, given that targeting measures do not make a significant difference in terms of administrative cost or in reaching the intended population, greater attention should be paid to other dimensions of performance such as satisfaction and legitimacy, final development impacts, spillover effects on non-beneficiaries, and non-administrative costs (including private, social and political). 2 Additional evidence on these aspects would be needed to inform policy decisions.

Context and targeting in the Sahel
This study focuses on the Sahel region, one of the poorest in the world, which has suffered recurrent economic shocks and is being adversely affected by climate change (Mbaye & Signé, 2022). Four of the countries studied have among the 10 worst human development outcomes in the world and this situation has been aggravated by the COVID-19 pandemic (World Bank, 2020). Given tremendous needs, unless budgets were to increase substantially, choices need to be made on how to identify beneficiaries, even among the poor. For example, coverage rates of cash transfer programs for Chad, Burkina Faso, Niger, and Mauritania ranged between 0.4% and 1.6%. These coverage rates contrast with the high poverty rates in these countries, ranging from 38% to 45%, and with the high frequency of food crises. Besides, Sahelian countries are understudied (Briggs, 2017; Porteous, 2022), resulting in a scarcity of evidence despite the large humanitarian and development needs. 3
In the last 10 years, social safety nets have emerged as key elements of the poverty-alleviation strategy in the Sahel region (Beegle, Coudouel, & Monsalve, 2018). These programs have been shown to generate various improvements in household consumption levels, food security, human capital, productive activities and resilience (Kandpal, Schnitzer, & Daye, 2023; Premand & Stoeffler, 2022; Stoeffler, Mills, & Premand, 2020). However, there is disagreement on the well-being metric that should be used for targeting households (food security or per-capita consumption) and a heated policy debate on the best targeting method to use in a Sahelian environment.
The PMT method relies on predicting household income, consumption, or poverty status from a limited set of observable household characteristics (Brown et al., 2018; Del Ninno & Mills, 2015; Grosh & Baker, 1995). In contexts in which the means-testing of benefits is not an administratively feasible option, PMT provides the advantage of relying on information that can be measured relatively quickly and transparently. On the other hand, CBT relies on communities to identify beneficiaries (Beaugé, Koulidiati, Ridde, Robyn, & De Allegri, 2018; Conning & Kevane, 2002). CBT has the advantage of leveraging community knowledge and involvement for targeting, which has the potential to improve both accuracy and legitimacy. However, CBT in some contexts is plagued by elite capture (Alatas et al., 2019; Basurto, Dupas, & Robinson, 2020; Pan & Christiaensen, 2012; Stoeffler, Fontshi, & Lungela, 2020). In the Sahel region, the Household Economy Analysis (HEA) represents a widely used type of CBT, especially by humanitarian agencies responding to food crises (Schnitzer, 2019). Three of the CBT schemes that we assess rely on such a type of CBT.

P. Schnitzer and Q. Stoeffler
A few studies have compared PMT and CBT efficiency directly and found that PMT is better in selecting households with low per-capita consumption in Indonesia, Cameroon, and Niger (Alatas et al., 2012; Premand & Schnitzer, 2020; Stoeffler et al., 2016). However, in Indonesia, CBT can generate greater community satisfaction, while the opposite result was found in Niger. Two review studies have measured the performance of PMT, CBT and other targeting methods worldwide (Coady et al., 2004; Devereux et al., 2017). Both found important variations within methods and hypothesize that this stems from important variations in implementation. These studies understate the role of widely diverse levels of coverage rates in explaining targeting performance.

Data
We analyze nine datasets from six countries, presented in Table 1. 4 These datasets were collected in the context of targeting studies (Burkina Faso 2, Mali, Senegal 1, Senegal 2) or monitoring efforts (Chad), as baseline for impact evaluations (Cameroon, Niger 2), or with several objectives (Burkina Faso 1, Niger 1). In all cases, PMT formulas were calibrated based on nationally representative surveys, different from the datasets that we used.
Three main criteria were used in choosing these datasets. First, we looked for datasets from Sahelian environments, to compare across poor, homogeneous settings. Second, the data had to include information on implemented targeting schemes of social safety nets programs. Third, data had to have been collected prior to the intervention. For every household in each dataset, we had information on whether the household had been identified as an eligible beneficiary by PMT, CBT, or both. Later on, those identified as eligible beneficiaries were ultimately selected to benefit from a program based on one targeting method only (such as in Burkina Faso 1 and 2, Chad, Niger 1 and 2, and Senegal 1), or based on a combination of PMT and CBT (such as in Cameroon, Mali, and Senegal 2). Program benefits were provided after the data that we use were collected, and as such, are not relevant for our analysis, which evaluates the ability of targeting methods to identify eligible beneficiaries.
The reports from the original studies for which the data were collected 5 indicate several issues in the implementation of the targeting operations (e.g., CBT thresholds deviating from the original target in Cameroon, and PMT and CBT implementation issues in Niger 1). These implementation gaps are inherent to targeting and to CBT selection in particular (Olivier de Sardan & Piccoli, 2018). Since we study actual beneficiary identification outcomes, the results account for these implementation issues.
In all datasets, information was collected on households in the areas of intervention, after a geographical targeting process. This geographic selection process varied across datasets: it was very narrow in Cameroon (15 villages in the poorest commune of the country) but not as focused on the poorest areas in Senegal 2 and Burkina Faso 2. Most datasets include information on per-capita consumption and on the food consumption score (FCS) (Table 1), as well as different household characteristics. 6 CBT selection thresholds vary from 21% to 68%, reflecting the wide variation in coverage rates within program areas.
Most datasets have remarkably high levels of poverty, and poverty indices much higher than national poverty rates (Supplementary Materials Table S2). This is likely, in part, a result of geographical targeting, although comparisons based on different consumption aggregates may be difficult. The datasets are also relatively homogeneous in terms of consumption and food insecurity levels (Supplementary Materials Table S3 and Supplementary Materials Figure S1). Senegal 2 households have the highest consumption levels, followed by Burkina 2 households. In these two datasets, geographical targeting is not as narrow as in other datasets, which focus on smaller, poorer geographic areas. 7 Food security is the highest in our Senegal 2 and Chad samples, followed by Niger 2. Cameroon is the sample with the lowest consumption, but food security levels are even lower in Mali.

Framework
Analyzing targeting outcomes is challenging because of the different conceptions of what targeting should achieve and the different ways to assess its performance. The rationale for targeting in low-income environments is most frequently the limited budget available for social or humanitarian programs (Verme & Gigliarano, 2019). For instance, in the six low-income countries that we focus on, between 44% and 50% of the population lives in extreme poverty ($1.9 PPP), and almost the entire population lives with less than $5.5 PPP per day. While this context makes universal transfers compelling (Ellis, 2012), delivering transfers to the full population would be a significant financial and administrative challenge for countries with limited resources (Banerjee, Niehaus, & Suri, 2019; Hanna & Olken, 2018; Ikegami, Carter, Barrett, & Janzen, 2017). Nevertheless, if we assume a fixed budget constraint for the allocation of social protection resources, the debate on 'universalism' points to important tradeoffs for targeting transfers. The first tradeoff regards transfer amounts: including more people (potentially everyone) with lower levels of benefits vs. targeting fewer people with higher benefits. The second tradeoff regards the geographic areas: including more people (or everyone) in fewer geographic areas (e.g., villages) vs. including fewer people in more geographic areas. We consider these tradeoffs in our analyses and discuss the critical importance of program coverage. More generally, we consider the counterfactual of a given targeting scheme by comparing it to the absence of program, random selection, or universal delivery of benefits. We also collected program targeting costs.
Another issue that hinders our understanding of targeting performance is the lack of clear targeting objectives, which are often not defined ex-ante in social programs (Schnitzer, 2019; Stoeffler et al., 2020). Given that cash transfers address a wide variety of objectives, targeting studies have been testing performance along several welfare metrics, including food security, per-capita consumption, an asset index, poverty self-assessment, legitimacy, or efficiency (see for example Pan & Christiaensen, 2012; Schnitzer, 2019; Stoeffler et al., 2016). However, the lack of clearly defined objectives makes it difficult to derive policy insights from studies assessing targeting performance using different indicators in different contexts. We address these issues by conducting a systematic comparison of targeting performance across countries using two policy objectives that are widely acknowledged in shock-prone, low-income environments: per capita consumption and food security.
Importantly, measures of targeting efficiency used in the literature have also generated confusion in the discussion of targeting performance. These measures have largely relied on binary classifications of whether someone is poor or not (for example, exclusion and inclusion errors, CGH index, Targeting Differential; targeting indicator construction is described in Appendix A). While somewhat easier to interpret, these measures have important limitations that can bias the policy discussion. Classifying households as 'poor' or 'non-poor' in a binary distinction provides limited information, which has long been acknowledged (Foster, Greer, & Thorbecke, 1984). This binary classification would tend to overstate the actual level of targeting errors: a household just above the poverty threshold would be counted as wrongly included if it receives benefits, even though its consumption is still very low. 8 It is important for targeting measures to consider the full distribution of the well-being metric of interest, rather than classifying households in two categories.
Finally, the question of eligibility thresholds has been largely neglected in the literature. Yet, it is likely to affect targeting performance critically. To compare two targeting schemes, or to assess the performance of one targeting scheme across various indicators, the same threshold needs to be used for eligibility (selected or not) and for the wellbeing benchmark (being poor or not). 9 In addition, results from using selection thresholds different from actual program coverage can also be misleading. In practice, analyses can present results using various eligibility thresholds to show potential tradeoffs and allow policy makers to decide based on key objectives. This would also make comparisons across studies possible.
There are important limitations to our study. We are unable to consider three other important aspects of targeting performance: legitimacy (the satisfaction generated by a targeting scheme), spillover effects (how the targeting method and related program affect non-beneficiaries), and impacts (how a targeting method affects program impacts). Only a few within-country studies have been able to measure how targeting methods affect legitimacy (Alatas et al., 2012) and impacts (Premand & Schnitzer, 2020). Spillovers of cash transfer programs have been studied (Della Guardia, Lake, & Schnitzer, 2022; Filmer, Friedman, Kandpal, & Onishi, 2023) but not in relationship with targeting. These are important areas for future research.

Methodology: measures and metrics
Informed by our framework, the methodology employed focuses on making relevant comparisons across countries and on targeting methods along several well-being metrics.
The first well-being metric we focus on is per-capita consumption, which is widely employed for analyzing targeting performance. As consumption is continuous, we can fully adjust the well-being threshold (or poverty line) according to the targeting threshold used. Our second well-being metric is the Food Consumption Score (FCS), a measure of food security that incorporates both the quantity and the variety of food items consumed (Vaitla, Coates, & Maxwell, 2015). In a Sahelian context characterized by chronic and recurring acute food insecurity, this indicator is widely used by humanitarian organizations. FCS values are quasi-continuous in our sample, which allows for adjusting the FCS threshold to various selection thresholds. In the two datasets where the FCS could not be constructed, we used the Household Diet Diversity Score (HDDS), which has only a few discrete values (from 1 to 12).
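To make the quasi-continuous nature of the FCS concrete, the following is a minimal sketch in Python. It assumes the standard World Food Programme food-group weights and a 7-day recall; the text above does not list the weights used in the nine datasets, so the group names and weights here are illustrative assumptions, and `food_consumption_score` is a hypothetical helper name.

```python
# Assumed standard WFP food-group weights (not taken from the paper).
FCS_WEIGHTS = {
    "staples": 2.0,
    "pulses": 3.0,
    "vegetables": 1.0,
    "fruit": 1.0,
    "meat_fish": 4.0,
    "milk": 4.0,
    "sugar": 0.5,
    "oil": 0.5,
}


def food_consumption_score(days_eaten):
    """Weighted sum of the number of days (0 to 7) each food group was eaten.

    `days_eaten` maps a food-group name to the number of days in the
    last 7 days on which the household consumed that group.
    """
    score = 0.0
    for group, weight in FCS_WEIGHTS.items():
        # Clamp to the 0-7 day recall window; missing groups count as 0 days.
        days = min(max(days_eaten.get(group, 0), 0), 7)
        score += weight * days
    return score


# Hypothetical household: staples and oil daily, vegetables on 3 days.
example = food_consumption_score({"staples": 7, "vegetables": 3, "oil": 7})
```

Under these assumed weights the score moves in steps of 0.5 between 0 and 112, which is why a threshold on it can be adjusted to almost any selection rate.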
While most targeting schemes rely on different selection rates, we can adjust selection rates for PMT schemes, given that PMT scores provide a full welfare ranking of households. When comparing PMT and CBT selections, we adjust PMT selection rates and eligibility thresholds for each dataset to match CBT selection rates (see Table 1). This means that if CBT selects 20% of the households in a given dataset, we adjust the PMT threshold to select 20% of households. In addition, we consider the eligibility target as being 20% of the households and thus define 20% of the households as 'poor' or 'food insecure'. 10
While the above-mentioned approach allows us to make meaningful comparisons between PMT and CBT within datasets, comparisons across datasets are confounded by varying selection thresholds. For this reason, we also present harmonized selection thresholds for PMT schemes and set them all equal to 35%. 11 Thus, we can compare the performance of PMT schemes between datasets, net of threshold effects. In addition, to better explore the role of selection thresholds in targeting performance, we compute targeting measures for each selection rate between 5 and 100% for PMT selection (with either per capita consumption or FCS as a welfare metric). This means that we can compare PMT across datasets for a range of different thresholds.
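The threshold matching described above can be sketched as follows. This is a minimal illustration, not the code used for the paper: it assumes that a lower PMT score means lower predicted welfare, that ties are broken by household order, and that the function names (`select_lowest`, `match_pmt_to_cbt`) are hypothetical.

```python
def select_lowest(values, k):
    """Flag the k households with the lowest values (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    chosen = set(order[:k])
    return [i in chosen for i in range(len(values))]


def match_pmt_to_cbt(pmt_scores, cbt_selected, welfare):
    """Align PMT selection and the 'poor' benchmark with the CBT selection rate.

    Returns the CBT selection rate, the PMT selection flags, and the flags
    for households counted as 'poor' (lowest welfare), all with the same
    number of households as CBT selected.
    """
    k = sum(cbt_selected)                    # households selected by CBT
    rate = k / len(cbt_selected)             # CBT selection rate
    pmt_selected = select_lowest(pmt_scores, k)
    poor = select_lowest(welfare, k)         # eligibility target = same share
    return rate, pmt_selected, poor


# Hypothetical five-household example in which CBT selected 2 of 5 (40%).
rate, pmt_selected, poor = match_pmt_to_cbt(
    pmt_scores=[5, 1, 3, 2, 4],
    cbt_selected=[True, False, False, True, False],
    welfare=[10, 20, 30, 40, 50],
)
```

Because the same count is used for CBT selection, PMT selection, and the 'poor' group, every wrongly excluded household is offset by one wrongly included household.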
We focus on three indicators of performance. First, we compute inclusion/exclusion error rates, which are an imperfect but simple measure widely used in the literature. Given the threshold adjustment that we impose, exclusion and inclusion error rates are equal (one excluded 'poor' household means one included 'non-poor' household; see below), and we call these targeting error rates (as in Brown et al., 2018). Second, we present CGH indices, which show the contributions of targeting schemes relative to a random allocation of benefits. 12 Third, we look at measures that are sensitive to the distribution of the well-being metric. We show the simulated FGT poverty-rates reduction, focusing on the gap and severity. As discussed in Section 3.1, this third measure goes beyond classifying households in binary categories (poor or non-poor), and accounts for how poor people may be (see Appendix A for a discussion of these measures). 13
Our simulations of poverty reduction are based on transfers of approximately 15,000 CFA per capita per year ($0.2 PPP per day), which represents about 15% of the median consumption level in our sample. 14 The simulations consist of adding the transfer amount to the per capita consumption of each household selected by a given method. Similarly, our simulations of food insecurity reduction are based on an increase in household FCS of 7 points after transfers, which is also about 15% of the median FCS. While the simulated impacts on consumption and food insecurity are likely to differ from the actual impacts, the objective of these simulations is to assess targeting performance in a distribution-sensitive manner, not to realistically predict cash transfer impacts.
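The three indicators can be illustrated with a short Python sketch. It is written under stated assumptions rather than taken from Appendix A: the CGH index is computed as the share of beneficiaries who are 'poor' divided by the population share of the 'poor' (uniform benefits), the FGT index uses the usual normalized-gap form, and the poverty line `line` and the function names are hypothetical.

```python
def targeting_error_rate(selected, poor):
    """Share of 'poor' households that are not selected (exclusion error)."""
    excluded = sum(1 for s, p in zip(selected, poor) if p and not s)
    return excluded / sum(poor)


def cgh_index(selected, poor):
    """Share of beneficiaries who are 'poor', relative to the 'poor' population share.

    A value of 1 corresponds to what a random allocation delivers in expectation.
    """
    share_to_poor = sum(1 for s, p in zip(selected, poor) if s and p) / sum(selected)
    poor_share = sum(poor) / len(poor)
    return share_to_poor / poor_share


def fgt(welfare, line, alpha):
    """FGT index: alpha=0 headcount, alpha=1 gap, alpha=2 severity."""
    gaps = [((line - w) / line) ** alpha for w in welfare if w < line]
    return sum(gaps) / len(welfare)


def fgt_reduction(welfare, selected, line, transfer, alpha):
    """Simulated fall in the FGT index after adding `transfer` to selected households."""
    after = [w + transfer if s else w for w, s in zip(welfare, selected)]
    return fgt(welfare, line, alpha) - fgt(after, line, alpha)


# Hypothetical data: two 'poor' households, one of which is selected.
welfare = [10, 20, 30, 40, 50]
poor = [True, True, False, False, False]
selected = [True, False, True, False, False]
error = targeting_error_rate(selected, poor)
cgh = cgh_index(selected, poor)
gap_cut = fgt_reduction(welfare, selected, line=30, transfer=5, alpha=1)
```

The binary measures treat the wrongly included household (welfare 30) the same as a household at 50, while the FGT-based measure only credits transfers in proportion to the shortfall they close.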
Finally, we simulate alternative selection mechanisms: random and universal selection. Random selection consists of randomly selecting X% of the households (using the same threshold as PMT and/or CBT). Universal selection includes all households, but each household receives a smaller amount of benefit compared to PMT or CBT, keeping the overall budget constant.
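A budget-neutral comparison of these benchmarks can be sketched as below. This is an illustrative sketch only: it draws a single random selection (averaging over many seeds would approximate an expectation), it assumes the universal transfer is the targeted transfer scaled by the selection rate, and the poverty line and all function names are hypothetical.

```python
import random


def fgt(welfare, line, alpha):
    """FGT index: alpha=1 poverty gap, alpha=2 poverty severity."""
    return sum(((line - w) / line) ** alpha for w in welfare if w < line) / len(welfare)


def random_selection(n_households, n_selected, seed=0):
    """Select n_selected households uniformly at random (one draw)."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(n_households), n_selected))
    return [i in chosen for i in range(n_households)]


def universal_transfer(transfer, n_selected, n_households):
    """Per-household amount when the same budget is spread over everyone."""
    return transfer * n_selected / n_households


def compare_benchmarks(welfare, targeted, line, transfer, alpha, seed=0):
    """FGT reduction under targeted, random, and universal delivery, same budget."""
    n, k = len(welfare), sum(targeted)
    base = fgt(welfare, line, alpha)
    rand = random_selection(n, k, seed)
    small = universal_transfer(transfer, k, n)

    def reduction(amounts):
        after = [w + a for w, a in zip(welfare, amounts)]
        return base - fgt(after, line, alpha)

    return {
        "targeted": reduction([transfer if s else 0.0 for s in targeted]),
        "random": reduction([transfer if s else 0.0 for s in rand]),
        "universal": reduction([small] * n),
    }


# Hypothetical example: 2 of 5 households targeted, transfer of 5 units each.
results = compare_benchmarks(
    welfare=[10, 20, 30, 40, 50],
    targeted=[True, True, False, False, False],
    line=30, transfer=5, alpha=1,
)
```

Holding the budget fixed is what makes the universal benchmark informative: it reaches every poor household, but with a smaller amount.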

Targeting performance
4.1.1. Performance in reaching the poorest households. We start by presenting the performance of PMT and CBT in reaching the poorest households, using CBT selection thresholds in each dataset. For example, if a CBT targeting scheme selected 20% of the sample in one dataset, we adjust the PMT threshold so that 20% of the sample is selected by PMT in this dataset, and we assess the performance of these CBT and PMT in reaching the poorest 20% of the sample. This approach enables us to make meaningful comparisons, net of threshold effects, between PMT and CBT within datasets.
When we rely on targeting error rates to assess targeting performance, PMT systematically selects individuals with lower per-capita consumption than CBT does, for all targeting schemes (Table 2). 15 This is consistent with the literature and with the design of PMT formulas, but particularly striking across our nine datasets. The median targeting error for CBT is 50%, while that of PMT is 39%. PMT still delivers a significant improvement over a random allocation: the median PMT targeting scheme provides 19% more resources to the poor than a random allocation would, versus 5% for the median CBT targeting scheme, based on the CGH index.
While the median difference in the performance between PMT and CBT is large, important variations exist across datasets with some patterns emerging mechanically: as more people are selected, targeting error rates decrease, and targeting methods tend to contribute less to redistribution towards the poorest (based on the CGH index). For example, the difference in the CGH index between PMT and CBT is lowest in Mali (0.07) where the selection threshold is among the highest (66%). On the other hand, the difference in the CGH index between PMT and CBT is highest in Burkina Faso 2 (0.71), where the selection threshold is lowest (21%). At the same time, targeting errors are the lowest in countries with the higher thresholds.
When relying on distribution-sensitive measures for performance based on poverty, the differences between CBT and PMT are relatively small (Table 3). The CBT simulated poverty-gap reduction of the median targeting scheme is 3.3 versus 3.7 for PMT. The small difference in poverty-gap reductions is partly mechanical, given that an important share of households is well below the poverty line in most datasets. Nevertheless, results are similar when considering poverty severity, which values more transfers made to the poorest of the poor: the median poverty severity reduction generated by CBT is 3.3 versus 3.6 for PMT.
Also striking is how close the FGT poverty-reduction targeting measures are when random or universal selection is compared to CBT and PMT. When looking at the poverty gap, CBT and universal targeting are virtually identical. But when looking at the poverty severity, universal targeting outperforms CBT in Cameroon and Niger 2 and is almost identical in other countries except the two datasets with the lowest selection rates. This is because all the poorest receive transfers with universal targeting, while some are excluded with other methods. PMT is only slightly above universal targeting in terms of poverty-severity reduction, by 0.23-0.73 points. 16
4.1.2. Performance in reaching the most food-insecure households. When considering food insecurity as a well-being metric using targeting error rates, the median CBT and PMT targeting schemes perform similarly to a random allocation of benefits (Table 2). The median CBT and PMT programs provide, respectively, only 3% and 1% more resources to the most food insecure households than would a random allocation. Confidence intervals overlap for CBT and PMT, except in Burkina Faso 2. There are two notable exceptions where both PMT and CBT targeting schemes make a significant difference compared to random allocation. This higher contribution of targeting happens again in the two datasets with the lowest selection rates. In Burkina Faso 2, the CBT and PMT programs provide 77% and 36% more resources to the food insecure, respectively. In Senegal 1, PMT outperforms CBT by providing 33% more resources to the food insecure, compared to 17% for CBT.
Turning to distribution-sensitive measures based on the FCS, the difference between CBT and PMT appears negligible (Table S6). PMT targeting schemes outperform CBT schemes in all cases but one, according to the simulated reduction in the gap and the severity of food insecurity, but magnitudes are very small. In a majority of cases, a universal allocation of benefits performs better than alternative targeting schemes based on the gap and the severity of food insecurity. This shows that PMT misses a large part of the most food insecure individuals, who receive benefits under universal allocation.

PMT performance and program coverage.
In this sub-section, we assess the performance of PMT based on the same selection thresholds (or program coverage) across datasets with two objectives in mind. The first objective is to compare the performance across PMT schemes, net of threshold effects. The second objective is to explore in a systematic way the role of thresholds in the performance of targeting. To do so, we compare the performance of targeting methods based on a range of harmonized selection thresholds. A downside of this harmonization exercise is that it cannot be applied to CBT, given the lack of a welfare-based ranking.
The performance of PMT under a harmonized threshold (35%) tells a different story compared to the analysis based on different thresholds (i.e., using CBT selection) (Table 4). The median PMT scheme delivers 38% more resources to the poorest households than a random allocation would (Table 4), compared to 19% for the median PMT scheme under CBT selection rates (as seen in Table 2). On the other hand, the median targeting error rate based on per-capita consumption also increases, from 38.8% to 51.5%. Both results are a mechanical effect of the fact that the threshold used in Table 4 (35%) is lower on average compared to the CBT thresholds used in Table 2.
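The mechanical link between the two results can be checked with a small piece of arithmetic, sketched here in Python. The sketch assumes that benefits are uniform and that the selection and eligibility shares are equal, in which case the share of beneficiaries who are 'poor' is one minus the targeting error rate and the CGH index reduces to (1 − error rate) / selection rate; `cgh_from_error` is a hypothetical helper, not a formula quoted from Appendix A.

```python
def cgh_from_error(error_rate, selection_rate):
    """CGH index implied by a targeting error rate when the share selected
    equals the share defined as 'poor' and all beneficiaries get the same amount."""
    return (1.0 - error_rate) / selection_rate


# Median error rate of 51.5% at the harmonized 35% threshold.
implied_cgh = cgh_from_error(0.515, 0.35)

# A random allocation has an expected error rate of 1 - selection rate,
# which gives an implied index of 1 at any threshold.
random_cgh = cgh_from_error(1.0 - 0.35, 0.35)
```

With these inputs the implied index is about 1.39, in line with the 38% gain over random allocation reported for the median PMT scheme; a higher error rate and a higher CGH index can therefore coexist when the threshold is lowered.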
Figure 1, which computes consumption-based targeting error rates for each selection threshold between 5% and 100% of the population, reflects this mechanical effect and shows the decisive role played by selection thresholds in targeting performance. The figure illustrates that the performance of targeting varies more across selection thresholds than across PMT targeting schemes. Targeting errors decrease at a quasi-constant rate as the percentage selected increases in all datasets. At the same time, beyond a selection threshold of around 50%, the larger the selection threshold, the smaller the contribution targeting can make relative to a random allocation (as illustrated by the 45-degree line). A similar result is obtained when measuring PMT performance based on the FCS, although in most datasets targeting errors are indistinguishable from errors obtained from random selection (Figure 2). 17 On the other hand, Figure 3 and Supplementary Materials Figure S3 show that the poverty gap and poverty severity decrease at a quasi-constant rate with the selection threshold for most countries. For datasets with lower initial poverty levels, PMT performs slightly better than random or universal transfers at addressing poverty severity, but for the poorest datasets (e.g. Cameroon), universal and random selection perform just as well as PMT.
In sum, using a harmonized selection threshold of 35% confirms that differences in performance across PMT methods exist, but these are substantially smaller than the observed differences under actual (unharmonized, CBT-based) selection thresholds. Performance varies depending on whether methods were applied in areas undergoing narrower or broader geographical selection of poor areas. Where geographical targeting was relatively broader (in Senegal 1, Senegal 2, and Burkina Faso 2), targeting errors are lower and the CGH indices higher, ranging from 1.58 to 1.69. On the other hand, where geographical targeting was narrower (in Niger 1, Niger 2, Mali, and Cameroon), targeting errors are higher and the CGH indices lower, ranging from 1.21 to 1.33.
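The threshold sweep underlying Figure 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the data are synthetic (log-normal consumption and a noisy proxy score), and `error_rate` and `error_curve` are hypothetical helper names. The random-selection benchmark is the expectation 1 − selection share.

```python
import random


def error_rate(consumption, score, share):
    """Share of the poorest `share` of households missed when selecting the lowest `share` of scores."""
    n = len(consumption)
    k = round(share * n)
    poorest = set(sorted(range(n), key=lambda i: consumption[i])[:k])
    selected = set(sorted(range(n), key=lambda i: score[i])[:k])
    return 1 - len(poorest & selected) / k


def error_curve(consumption, score, shares):
    """Return (share, observed error, expected error under random selection) triples."""
    return [(s, error_rate(consumption, score, s), 1 - s) for s in shares]


# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
consumption = [rng.lognormvariate(0.0, 0.5) for _ in range(1000)]
score = [c * rng.lognormvariate(0.0, 0.5) for c in consumption]

shares = [i / 20 for i in range(1, 21)]  # 5%, 10%, ..., 100%
curve = error_curve(consumption, score, shares)
for share, observed, random_expectation in curve:
    print(f"{share:.2f}  proxy score: {observed:.3f}  random: {random_expectation:.3f}")
```

At a 100% threshold both the observed error and the random benchmark are zero by construction, which is the mechanical convergence the figure displays.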
Our results contrast with previous multi-country studies on targeting, which have found large variations in targeting performance across programs employing the same method (Coady et al., 2004; Devereux et al., 2017). Our results may differ from these studies because those studies included a larger and more diverse set of countries, and because the programs they considered employ a wide range of selection thresholds.
4.1.4. Insights from studying targeting methods in practice. Program managers often simulate the performance of PMT models to decide how to select program beneficiaries. The academic literature also uses extensive simulations of PMT methods drawn from nationally representative surveys to assess targeting performance. Nonetheless, there are several reasons why simulation results may differ from those obtained in practice when programs are implemented.
In addition to implementation flaws and lags, other factors such as geographical targeting and the arbitrary choice of selection thresholds have been largely ignored. 18 In contexts such as the Sahel, PMT or CBT methods are usually applied after geographical targeting. Because of this, PMT targeting is implemented in places that differ significantly from those used to simulate its performance (national or regional samples). If geographical targeting selects poorer and more homogeneous areas, then simulations will likely overstate the contributions that any household-level selection method can make. Finally, simulations often rely on selection thresholds that differ from those used in practice, which, as we have shown, affects the assessment of targeting performance.
As an illustration, we compute targeting errors for selection thresholds set at 20% and 40% and compare them with results from Brown et al. (2018), who use these thresholds in Burkina Faso, Mali and Niger in a national sample, ignoring geographical targeting (Table S4 and Table S5 in Supplementary Materials). As expected, the level of targeting errors is sizably larger in our study (approximately 50% larger, based on the same selection thresholds in the same countries). This suggests that the role of geographical targeting and other implementation issues needs to be carefully considered, and that results obtained from program data differ substantially from those obtained from nationally representative surveys.

Mechanisms: determinants of PMT and CBT selection
To understand what drives the differences between CBT and PMT results, we explore the determinants of household selection using probit models where the dependent variable is selection by PMT or by CBT in each dataset. We include independent variables that appear in most datasets. These probit regressions are conducted for the seven datasets where we have information on both PMT and CBT selection, and PMT selection thresholds are adjusted to match CBT selection rates.
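One way to write down the specification described above is given below. The notation is an assumption introduced here for illustration and does not appear in the original: $S^{m}_{hd}$ equals 1 when household $h$ in dataset $d$ is selected by method $m$, $X_{hd}$ collects the household characteristics available in most datasets (for instance the sex, age and schooling of the household head, household size and roof quality), and $\Phi$ is the standard normal cumulative distribution function. A separate coefficient vector is estimated for each dataset and each method.

```latex
\Pr\left(S^{m}_{hd} = 1 \mid X_{hd}\right) = \Phi\left(X_{hd}'\,\beta^{m}_{d}\right),
\qquad m \in \{\mathrm{PMT}, \mathrm{CBT}\}
```

Comparing the sign and significance of the elements of $\beta^{\mathrm{PMT}}_{d}$ and $\beta^{\mathrm{CBT}}_{d}$ for the same covariate is what allows the two methods' implicit selection criteria to be contrasted.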
Results show some clear patterns suggesting that definitions of poverty by communities across countries may have similarities in this setting (Table 5). Households with female heads are significantly more likely to be selected by CBT in six cases. This is not generally true for PMT. The age of the household head is also positively correlated with CBT selection, which is again not the case for PMT. Regarding household size, CBT is less consistent, but PMT always favors the inclusion of larger households. Indeed, household size usually has a high weight in the PMT formula, as it correlates strongly with low per-capita consumption. Schooling is not significant for CBT selection (except in Senegal 1), whereas it is significant and negative for PMT selection in all datasets. Finally, CBT targeting is not consistent across datasets in the weight it gives to low-quality roofs, whereas this variable enters PMT formulas in most cases and drives PMT selection in six datasets. 19 Overall, the results suggest that CBT tends to select smaller, vulnerable households, putting more weight on these determinants than on per-capita consumption (compared to PMT).

Targeting cost
The cost of targeting is often one of the main criticisms leveled against targeting and one of the arguments for promoting universal programs. However, the literature offers little guidance regarding whether targeting represents a large share of program budgets, and whether a better use of these funds could be, for instance, to redistribute them directly to all households in a universal manner (after geographic targeting, for example). Although CBT is often discussed as a cheaper alternative to PMT, CBT was found to generate approximately the same cost as PMT in Indonesia (Alatas et al., 2012). CBT costs were also found to be relatively large in Burkina Faso (Beaugé et al., 2018). 20 We collected the available administrative data on costs for all the programs studied as well as two other programs in Chad and Burkina Faso that relied on a combination of self-targeting and PMT. 21 Table 6 shows the targeting costs per screened household (e.g. for each interview, for PMT, regardless of whether the household will be selected). It also shows the targeting cost per beneficiary household, which is higher because not all households screened will become beneficiaries. Mechanically, programs with a lower selection rate (i.e. a smaller share of total screened households) will have higher targeting costs per beneficiary. Finally, we show the targeting costs as a share of total transfers made to households by the program.
Table 5 notes: T statistics in parentheses. Probit regressions of the determinants of selection by CBT and PMT in each dataset. For each dataset, the dependent variable is a dummy variable equal to 1 when the household is selected. *p < 0.10, **p < 0.05, ***p < 0.01. Source: Data from nine social programs (see Table 1).
The results from each program must be taken with caution since costs are not recorded systematically and in a consistent manner. Besides, targeting costs are not always easily separable from other program costs (e.g. staff time). Finally, targeting operations often serve other purposes, such as registration of households for future payments (which needs to be conducted even for non-targeted transfers). This means that eliminating targeting is unlikely to eliminate the full amount indicated in Table 6. In other words, these amounts are likely to represent an upper-bound estimate.
The cost to screen each household is similar across PMT and CBT methods: in most cases, costs are around $5-7 per household. The exceptions are Chad ($9.5) and Senegal 2 ($3.2). Given that selection rates vary greatly across programs, these costs result in a cost per beneficiary household ranging from $13.5 to $38.8. However, these costs remain minor compared to the total amount transferred to households: they represent 0.4% to 5.5% of total transfers. These numbers are consistent with those from the studies reviewed by Devereux et al. (2017). In sum, targeting costs do not seem to substantively affect the amounts delivered to households. Compared to the issue of targeting performance, targeting costs remain, at best, a minor argument against targeting households.
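The mechanical relationship between screening costs, selection rates and cost shares can be sketched as follows. The input values are hypothetical and chosen only to fall within the orders of magnitude discussed above; they are not taken from Table 6, and the function name is an assumption made for illustration.

```python
def targeting_cost_summary(cost_per_screened, selection_rate, transfer_per_beneficiary):
    """Return (cost per beneficiary household, targeting cost as a share of transfers)."""
    # Every screened household is paid for, but only a share becomes beneficiaries.
    cost_per_beneficiary = cost_per_screened / selection_rate
    share_of_transfers = cost_per_beneficiary / transfer_per_beneficiary
    return cost_per_beneficiary, share_of_transfers


# Hypothetical program: $6 per screened household, 35% of screened households
# selected, and $500 transferred to each beneficiary household in total.
cost_per_beneficiary, share_of_transfers = targeting_cost_summary(6.0, 0.35, 500.0)
print(f"cost per beneficiary: ${cost_per_beneficiary:.2f}")
print(f"share of transfers: {share_of_transfers:.1%}")
```

Halving the selection rate doubles the cost per beneficiary for a given screening cost, which is why programs with similar screening costs can report very different costs per beneficiary household.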

Discussion and conclusion
This article contributes to the discussion on targeting by focusing on the Sahel, one of the poorest regions globally. By conducting a harmonized analysis across nine datasets from six countries, we generate insights for the design and study of targeting mechanisms. By relying on multiple performance indicators, we show that the definition of the well-being metric, the share of beneficiaries selected, and the indices used to evaluate targeting play a decisive role in targeting performance. Our analysis suggests that much greater attention needs to be paid to defining these parameters in a policy-relevant manner.
Even though PMT is more successful in selecting households with the lowest per capita consumption based on a binary classification of households as 'poor' or 'non-poor', PMT does not generate results that differ greatly from budget-neutral alternatives such as CBT, or random or universal delivery of benefits, when considering distribution-sensitive measures of performance. Different factors could explain the poor performance that we observe. While we cannot test for the relative contributions of different factors, results are likely driven by the high levels of poverty and the low levels of inequality, in part due to geographical targeting being applied before household-level targeting. Other factors affecting the performance of targeting such as manipulation, imperfect information, and implementation challenges may also be present, but we believe that these play a relatively minor role compared to the distribution of welfare. In fact, in our context, even a method that could perfectly identify the poorest households would result in a performance that is not very different from PMT or CBT based on distribution-sensitive measures. 22
Our analysis also shows that while simulations of PMT schemes are widely used by practitioners and academics, ignoring geographical targeting and arbitrarily defining selection thresholds could potentially lead to large biases in the simulated targeting performance. Finally, our results do not suggest that PMT and CBT perform well in terms of reaching the most food-insecure households in this environment, an important policy objective in the Sahel region. While again, our analysis does not allow us to distinguish between potential explanations, this result may stem from the fact that food insecurity largely depends on geography rather than household characteristics. In addition, PMT is not designed to reach food-insecure households, and per-capita consumption often does not correlate well with food insecurity, which is more variable over time and more difficult to measure accurately (Brown, Ravallion, & van de Walle, 2019; Schnitzer, 2019). It also appears that CBT does not weigh food insecurity as a main determinant of selection, as opposed to other factors.
Taken together, our results suggest that while there may be a need to select households because of budget constraints, in poor and homogeneous settings, household-level targeting employing PMT or CBT contributes little to poverty or food-insecurity reduction efforts. Indeed, in the areas where PMT and CBT are applied (after geographical targeting), it is unlikely that PMT and CBT will select a large proportion of households who are not in need. These results may not be generalizable to other contexts where inequality levels are higher and where administrative capacities make it easier to distinguish between poorer and wealthier households. However, household-level targeting performance has been found to be low in middle-income countries as well. More research in different settings, and on other dimensions of performance (such as legitimacy and impacts), would help inform social program design.
There are several caveats to our analysis, due to data limitations. We do not consider the legitimacy of different targeting methods, or how they affect program impacts on household outcomes. Also, we do not consider the political economy and other costs related to targeting (Devereux, 2016; Duchoslav, Kenamu, & Thunde, 2021; Sen, 1992). All these areas call for further research.
A second series of indicators relies on distribution-sensitive measures. Despite their advantages over binary indicators (see discussion in main text), these measures pose other types of challenges.
(i) FGT poverty indices simulations: as a measure of efficient targeting, one can use the simulated impact of monetary transfers on FGT poverty indices (Foster et al., 1984). For a given targeting measure, $FGT_{\alpha}$ reduction $= FGT_{\alpha,\,\text{post-transfer}} - FGT_{\alpha,\,\text{pre-transfer}}$. This measure is especially useful with $\alpha = 2$, as transfers to the poorest households will have a larger effect on poverty severity. FGT simulations have several advantages, as they can be used as a way to measure whether transfers reach the poorest households; they provide estimates of the magnitude of the difference between various alternatives, including non-targeted approaches; they are directly related to the policy objective and to the impact of targeting; and they are relatively easy to understand. For these reasons, these simulations are widely used in the targeting literature.
(ii) Utility calculations: an alternative to FGT simulations is utility simulations, where the utility of a population is assessed before and after transfers (Hanna & Olken, 2018; Jensen et al., 2019). This measure is useful to include the full population (not only the poor), and thus to discuss political economy issues. However, it relies on assumptions that are not necessarily realistic and are not transparent.
(iii) Horizontal equity measure: this measure is used by Hanna and Olken (2018) and focuses on the fair treatment of equal individuals. However, this measure has undesirable mechanical features (e.g. it indicates perfect targeting when everyone is included, or excluded) and does not necessarily correspond to policy and potential beneficiary objectives.
(iv) $TD_{\alpha}$: this measure was suggested by Stoeffler et al. (2016) as a version of the Targeting Differential that takes into account the distance to the poverty threshold, similar to the FGT measure. However, given the weights put on inclusion errors, it does not have the desirable properties of the FGT measure.
(v) Distribution Characteristic Index (DCI): this measure was used by Coady and Skoufias (2004) and more recently by Grosh, Leite, and Wai-Poi (2022) to capture the progressivity of a targeting scheme regardless of the size of the program and without assuming a poverty line. However, the measure is very sensitive to extreme values; is not transparent for policy makers and practitioners; and does not provide a meaningful comparison with non-targeting alternatives.
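The FGT simulation in item (i) can be illustrated with a short Python sketch. It is not the authors' code: the consumption vector is hypothetical (expressed in PPP dollars per day), the function names are assumptions, and the targeted scenario uses perfect targeting on true consumption purely for simplicity. Following the sign convention in the text (post-transfer minus pre-transfer), a more negative value indicates a larger reduction. The universal transfer is scaled by the selection share to keep the budget constant.

```python
def fgt(consumption, poverty_line, alpha):
    """Foster-Greer-Thorbecke index: mean of ((z - y) / z) ** alpha, counting only the poor."""
    gaps = [((poverty_line - y) / poverty_line) ** alpha
            for y in consumption if y < poverty_line]
    return sum(gaps) / len(consumption)


def fgt_reduction(consumption, transfers, poverty_line, alpha):
    """FGT after transfers minus FGT before transfers (negative means poverty falls)."""
    post = [y + t for y, t in zip(consumption, transfers)]
    return fgt(post, poverty_line, alpha) - fgt(consumption, poverty_line, alpha)


# Hypothetical daily per-capita consumption for ten households.
consumption = [0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 3.0, 4.0]
poverty_line = 1.9
transfer = 0.2   # per selected person per day
share = 0.4      # selection threshold

# Targeted scenario: the poorest 40% each receive the full transfer.
n_selected = round(share * len(consumption))
ranked = sorted(range(len(consumption)), key=lambda i: consumption[i])
selected = set(ranked[:n_selected])
targeted = [transfer if i in selected else 0.0 for i in range(len(consumption))]

# Budget-neutral universal scenario: everyone receives transfer * share.
universal = [transfer * share] * len(consumption)

targeted_reduction = fgt_reduction(consumption, targeted, poverty_line, alpha=2)
universal_reduction = fgt_reduction(consumption, universal, poverty_line, alpha=2)
print(f"severity change, targeted:  {targeted_reduction:.4f}")
print(f"severity change, universal: {universal_reduction:.4f}")
```

In this stylized example the targeted scenario lowers poverty severity more than the universal one; how large that gap is in practice depends on how unequal consumption is among the households considered, which is the point made in the main text.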
Some recent studies have also attempted to measure satisfaction and legitimacy among beneficiary populations. However, measuring satisfaction and legitimacy from targeting mechanisms is difficult without actually implementing counter-factual targeting methods. Two randomized evaluations have measured satisfaction and legitimacy among groups selected by PMT or CBT (Alatas et al., 2012; Premand & Schnitzer, 2020). Potential measures include:
(i) Satisfaction rates: this measure focuses on whether households declare themselves satisfied or not with the whole targeting mechanism. However, it is difficult to disentangle satisfaction with the program, with the targeting process, with the selection result, etc.
(ii) Future usage: this measure focuses on the desire of local populations to reproduce, in the future, the targeting method that was employed. However, a limitation of this metric is that answers may be affected by the alternative methods that they know or can imagine (e.g. PMT-selected households may be able to imagine a CBT mechanism, but CBT-selected households may not be able to know whether they would prefer PMT selection).
(iii) Negative impacts: this measure focuses on the negative effects of targeting, such as the existence of social conflict (Cameron & Shah, 2013) or other negative effects. However, this is a limited assessment that focuses only on the negative aspects of a targeting scheme.

Figure 1. Targeting error rates by selection threshold, per-capita consumption metric, PMT. Source: Data from nine social programs (see Table 1). Notes: Targeting error rates computed for each selection threshold ranging from 5% to 100% selection. Selection and eligibility thresholds are set equal in each dataset. Random selection results are expectations (1 − selection rates).

Figure 2. Targeting error rates by selection threshold, food consumption score, PMT. Source: Data from nine social programs (see Table 1). Notes: Targeting error rates computed for each selection threshold ranging from 5% to 100% selection. Selection and eligibility thresholds are set equal in each dataset. Random selection results are expectations (1 − selection rates).

Figure 3. Poverty-gap and severity reduction, post-transfer, PMT. Panel A: Poverty gap. Panel B: Poverty severity. Source: Data from nine social programs (see Table 1). Notes: Simulations of poverty gap and poverty severity decrease under each PMT targeting scheme based on transfers of approximately 15,000 CFA per capita per year ($0.2 PPP per day). A $1.9 PPP poverty line is used. Simulations are computed for each selection threshold ranging from 5% to 100% selection. Selection and eligibility thresholds are set equal in each dataset.

Table 1. List of datasets used

Table 2. PMT and CBT exclusion-error rates for consumption and food insecurity (thresholds: selection
Notes: CGH corresponds to the Coady, Grosh and Hoddinott index. It indicates the share of additional resources that go to the poor relative to a random allocation of benefits. Selection and eligibility thresholds are set equal in each dataset based on the CBT selection rate. FCS is used as a food insecurity metric, except for Cameroon and Burkina Faso 2, where HDDS is used. 90% confidence intervals, based on bootstrapping, are indicated in brackets. a Chad and Niger 1 do not have CBT selection; the PMT selection rate was set to 45%.
Source: Data from nine social programs (see Table 1).

Table 3. PMT and CBT distribution-sensitive targeting measures for consumption (thresholds: selection
Notes: Simulations of poverty gap and poverty severity decrease under each targeting scheme (CBT, PMT, random) based on transfers of approximately 15,000 CFA per capita per year ($0.2 PPP per day). A $1.9 PPP poverty line is used. Universal transfers are adjusted to keep budgets constant (15,000 × selection rate). Selection and eligibility thresholds are set equal in each dataset based on the CBT selection rate. a As Niger 1 and Chad do not have CBT selection, the PMT selection rate was set to 45%.

Table 4. PMT exclusion-error rates and the CGH index for consumption and food insecurity using a common selection threshold for all datasets (selection = 35%, welfare = 35%)
Source: Data from nine social programs (see Table 1). Notes: CGH corresponds to the Coady, Grosh and Hoddinott index. It indicates the share of additional resources that go to the poor, relative to a random allocation of benefits. All food insecurity measures are based on FCS except Burkina Faso 2 and Cameroon, for which they are based on HDDS. Selection and eligibility thresholds are set equal to 35%. Random selection results are expectations (1 − selection rates). 90% confidence intervals, based on bootstrapping, are indicated in brackets.

Table 5. CBT and PMT selection, probit models

Table 6. Targeting costs (US dollars)
The estimates include variable costs associated with collecting the information necessary to identify beneficiaries and exclude fixed costs that are linked to multiple program aspects other than targeting, such as government administrative costs. Conversions are based on an exchange rate of 582 FCFA per USD. a This represents a lower-bound estimate as it does not account for the household listing that was taken from the registry census.