Quality standards, implementation autonomy, and citizen satisfaction with public services: cross-national evidence

ABSTRACT This article investigates whether citizens’ evaluations of service performance are related to archival measures of performance, and how institutional context, specifically three dimensions of administrative autonomy (standards, human resources, and financial autonomy), shapes this relationship. Using cross-national education data, this study finds that student performance is positively associated with parental evaluations of schools. Perceptions are more closely aligned with performance when agencies have greater autonomy in managing employees and when national-level bureaucracies set performance standards. This research advances our understanding of the role of administrative autonomy in citizen satisfaction and offers implications for institutional designs that can improve performance assessment.


Introduction
How citizens view government performance is fundamental for democratic accountability.
Citizens' opinions of public services can provide government officials with vital feedback on the job they are doing by alerting them to changes in service priorities, in the clientele they serve, or in the need to reallocate scarce resources. Given the importance of citizen input for effective democratic governance, governments around the world have gathered citizens' opinions about government performance (Bouckaert, Van de Walle, and Kampen 2005; Van Ryzin 2015) and compared these with administrative performance data. To the extent that citizen evaluations of performance match administrative performance indicators, there is the potential for consensus on what the government should do, for the informed judgments necessary for governmental accountability, and for increased citizen support for overall policy (Hill and Hinton-Andersson 1995; Monroe 1998).
Under the auspices of New Public Management's (NPM) emphasis on performance assessment (Overman 2016, 2017), citizen satisfaction with public services has frequently been used as vital performance information. An implicit assumption underlying the use of citizen satisfaction is that the actual quality of public service matters (Van Ryzin 2015; Van Ryzin et al. 2008), so that there is a strong relationship between citizen satisfaction and government performance. The empirical findings on this relationship, however, are inconclusive (e.g., Favero and Meier 2013; Kelly 2003).
A second assumption is that governments have incentives to be responsive to citizen preferences; therefore, the commonality between citizen evaluations and actual performance exists regardless of institutional context. This assumption persists partly because most studies have been conducted in decentralized, democratic systems where citizens can signal their preferences to governments by exercising choice or exit options (Schneider, Teske, and Marschall 2002; Tiebout 1956; but see Brinkerhoff, Wetterberg, and Wibbels 2018; Song and Meier 2018). Under decentralized systems with local control, governments are incentivized to prioritize performance criteria that match citizens' preferences (Tiebout 1956; but see Overman 2017). This assumption, however, does not always hold. Countries with centralized systems or autocratic regimes may have fewer incentives to respond to citizen preferences. In addition, the immature policy environments of developing countries often make it difficult for governments to respond to citizens and to manage performance appraisal systems (Nõmm and Randma-Liiv 2012). These conditions raise important questions: Can the association between citizen satisfaction and archival measures of performance be generalized across countries? If so, under what institutional contexts are the two most closely aligned?
This article aims to answer these questions and to add to the existing literature on citizen satisfaction and public management in three ways. First, it incorporates variation in institutional contexts across the world and thus increases the generalizability of existing empirical findings. Generalizability is important not only because it is a necessary condition for good theory, but also because it helps provide policy implications for performance appraisal and management to practitioners who work in different institutional contexts. Although recent studies have broadened the research context by examining citizen satisfaction in non-US settings (e.g., Song and Meier 2018; Walker et al. 2018), they are mostly single-country studies and thus have a limited ability to consider cross-national variation (but see Brinkerhoff et al. 2018). Using a cross-national database, this study advances our understanding of how a country's institutional arrangements make a difference in linking citizen satisfaction to program outputs.
Second, among the various institutional factors, this study brings a theoretically and practically important institutional context into research on citizen satisfaction: administrative autonomy, the level of decision-making discretion that administrative agencies have in the policy process. The delegation of authority over public service delivery to local governments is a global trend that grants more managerial autonomy (Overman 2016; Verhoest et al. 2012); however, we know little about how autonomy affects the link between citizen satisfaction and actual program performance. Examining the role of autonomy therefore makes a significant contribution to the literature.
Third, this research considers both the degree of autonomy (how much to delegate) and the dimensions of autonomy (what to delegate), and investigates whether citizens' perceptions of service quality are more closely aligned with archival measures of performance when local agencies have greater autonomy in setting quality standards, managing employees, and allocating financial resources. By doing so, this study provides meaningful information about which dimensions of autonomy should or should not be delegated to agencies. In addition to providing practical implications, this research offers theoretical insights into the multidimensionality of autonomy. As far as we know, our study is among the first to examine how three different dimensions of autonomy affect the relationship between objective and subjective assessments of public services in a cross-national setting.
The empirical context of this research is education, an ideal case for studying citizen satisfaction and performance assessment. Quality education, with its links to economic mobility, is a highly salient issue in most countries, thus providing incentives for parents to be informed about school quality and for public administrators to seek improved performance.
Parents are consumers who benefit directly from the service;1 in some cases, parents are permitted some choices in their children's education, further increasing the incentives to learn about school performance (Djellal and Gallouj 2009). Unlike other government services, where citizens might not use the service and thus be unaware of its overall quality, education studies focus on parents with children in school.
1 There are, of course, other service areas in which citizens are the direct consumers, such as utilities, waste removal, public transportation, and fire protection.
We test our research questions using a cross-national education dataset with more than 62,000 individual respondents in 16 countries.The analysis combines archival data on students' academic performance, parents' perceptual judgments of the quality of schools, and administrators' perceptions of autonomy.By doing so, this study investigates under what institutional context parents' perceptual evaluations and archival indicators of school performance are more closely related to each other.

Citizen Evaluations and Government Performance
In response to a growing emphasis on government performance and heightened expectations for quality services, governments seek to measure and evaluate performance accurately. Many government service outcomes, however, are not quantifiable, or there is no consensus among multiple stakeholders on what good performance is. Citizens' perceptual evaluations are often used to measure government performance at both the local and national levels (Bouckaert et al. 2005), on the assumption that citizens' assessments are aligned with actual government performance.2 Research has sought to investigate whether stakeholders' perceptual judgments and archival performance indicators share some common ground (e.g., Campbell and Fiske 1959; Favero and Meier 2013; Van Ryzin, Immerwahr, and Altman 2008). Archival performance data are quantifiable and observable data on performance (e.g., administrative records of performance), while perceptual data represent stakeholders' perceptual judgments of performance (e.g., citizen satisfaction with service quality) (Andrews, Boyne, and Walker 2006).3
Some scholars argue that citizens' perceptual evaluations may not reflect actual service quality because it is unclear how much accurate information citizens have and what criteria they use to evaluate performance (e.g., Kelly 2003; Stipak 1979). Recent experimental studies also question the validity of citizens' evaluations. They show that citizens unconsciously associate public agencies with inefficiency and inflexibility, which biases their evaluations (Marvel 2016), and that citizen satisfaction does not systematically reflect changes in performance as the theory suggests (Andersen and Hjortskov 2016).
Citizens' satisfaction ratings also can be shaped by information cues about government performance (James 2011) and can depend on whether performance is described with a positive or a negative label (Olsen 2015). These framing effects raise questions about whether citizens' perceptions of performance accurately represent service quality.
Despite concerns about the validity of citizens' perceptual evaluations, many empirical studies have demonstrated that citizen satisfaction does correlate with measures of service quality. In the context of education, Favero and Meier (2013) show that parents' evaluations of New York City schools are positively associated with test scores, objective progress reports, official quality reviews, and student attendance, and negatively associated with school violence (see also Charbonneau and Van Ryzin 2012). In addition to parents' evaluations, research finds that other stakeholders' (students' and teachers') perceptual judgments are also significantly related to archival school performance indicators (Song and Meier 2018).
3 Both measures have their own pros and cons. Archival performance indicators have been regarded as desirable because they are independent of perceptual judgments (Andrews et al. 2006); however, they often focus only on service aspects that can be easily quantified and may not represent actual service quality. Perceptual performance measures, in contrast, can capture non-quantifiable but important elements of performance that matter to citizens, but they are limited by citizens' knowledge about the service.
Goldring and Silve (2011), linking longitudinal survey data to administrative records on student achievement in England, find a strong relationship between parent satisfaction and academic performance measures. The positive relationship between objective indicators and subjective judgments is not limited to education. Evidence from street cleanliness services also shows a positive correlation between citizen evaluations of the service and a cleanliness scorecard (Van Ryzin et al. 2008). Even in healthcare, a complex policy area with extensive information asymmetry, patients' perceptions of service quality are significantly related to objective hospital performance indicators such as clinical process-of-care scores and 30-day readmission rates (Cheon et al. 2019).
While these studies have made a meaningful contribution, most research linking citizen evaluations to performance indicators has been drawn from Western democracies with substantial local autonomy, such as the United States, the United Kingdom, and Denmark (Walker et al. 2018). This heavy reliance on a few governance contexts raises concerns about generalizability, because how citizens perceive government performance can be influenced by institutional context (Lyons, Lowery, and DeHoog 1992). The theory rests on the assumptions that local services vary, that citizens can select jurisdictions that match their tax and service preferences, and that citizens' opinions can therefore inform priorities for service provision and influence government funding decisions; these assumptions, however, are not met in many countries (Overman 2017).
As an effort to increase the generalizability of the theory, recent studies have extended it to new institutional contexts. Song and Meier (2018) examine how multiple stakeholders' perceptions of school quality are related to archival school performance indicators in South Korea, which has a centralized education system. Contrary to theoretical work suggesting that the association might be weak in centralized regimes, their study demonstrates a common ground for performance assessment in Korean schools. Walker et al. (2018) also highlight the importance of institutional context and test how perceptual judgments of performance are shaped by performance information in the context of education and solid waste in Hong Kong. In Mexico, Petrovsky, Mok, and León-Cázares (2017) examine whether the theory holds in a developing country with limited accountability and limited democratic competition (see also Brinkerhoff et al. 2018). Even in this context, they find that citizens are satisfied when service quality exceeds their expectations, supporting the basic argument of the theory.4
Based on these findings, we expect that the positive relationship between citizens' judgments of performance and archival performance indicators will hold in a cross-national context. We test this hypothesis at both the individual level and the organizational level and examine whether each level of performance influences citizen satisfaction separately or jointly. Citizens might evaluate services solely on the basis of how they benefit personally, or they might take a broader view and respond to the benefits to all users. This distinction also has implications for theoretical work arguing that individual benefits matter but collective ones do not (see Tiebout 1956). We expect that parents are more likely to be satisfied with schools both when their own children achieve higher performance and when the schools perform better.

H1. Parents will give more favorable evaluations to schools when their children perform well on academic achievement tests.
H2. Parents will give more favorable evaluations to schools when the schools perform well on academic achievement tests.

Administrative Autonomy and Performance Evaluations
A trade-off between agency autonomy and democratic accountability remains at the heart of the study of public administration (Kirkhaug and Mikalsen 2009). On the one hand, greater autonomy allows bureaucrats to use their expertise and improve performance (Carpenter 2001). On the other hand, autonomy can be misused to pursue policy goals that diverge from what citizens want (Kogan 2017). Despite this tension, the devolution of policy responsibilities from higher-level to lower-level governments has been a long-term trend worldwide (Verhoest et al. 2012).
In the present era of delegation, an extensive literature explores the role of public sector agency autonomy (for a review, see Overman 2016). In particular, Overman (2017) explains how agency autonomy shapes citizen satisfaction through three theoretical routes. First, according to responsiveness theory, delegation to administrative agencies facilitates managerial discretion in policy implementation, thus permitting greater responsiveness to citizens and ultimately greater citizen satisfaction with public services (Lyons, Lowery and DeHoog 1990; Van Thiel 2001). Second, building on the blame-shifting argument, delegation can be a strategy for politicians and the central government to hide behind agencies when things go wrong (Hood 2002). Autonomous agencies often "serve as a shield to deflect blame for bad service outcomes," which could increase citizens' satisfaction with politicians and the central government (Overman 2017, p.215). Third, based on credibility theory, the delegation of task implementation can be seen as a commitment to independent decision making, and such depoliticization can improve the impartiality of policy implementation (Knott and Miller 2006) as well as continuity in service delivery (Overman 2017). Both effects should lead to higher citizen satisfaction.
Although Overman (2017) focuses on citizens' satisfaction with the central government rather than with service agencies, the logic should apply to our study. First, delegation to schools facilitates a principal's discretion over school policy and the educational curriculum and allows greater responsiveness to parents and students. When schools offer more tailored education, parents should be more satisfied with the schools. Second, when schools have greater autonomy, parents should be more able to blame schools for bad performance than when the central government has complete control over education policy (Koppell 2005).5 Third, administrative autonomy helps protect schools from politically driven education policies (see Hammond et al. 2018) and allows schools to continue to offer stable and sustainable education programs. This can lead to higher parent satisfaction with schools.
In practice, administrative autonomy varies significantly both across and within countries. This variation should affect not only parent satisfaction but also the relationship between parent satisfaction and archival school performance indicators. Some countries, such as the United States, have decentralized education systems that grant schools significant autonomy. Parents in such systems can send preference signals to schools by exercising school choice options or participating in school decision making (Schneider, Teske, and Marschall 2002). In this context, parents are more likely to be satisfied with schools, and administrative school performance indicators should be more closely related to parents' satisfaction with their children's schools (see Favero and Meier 2013). Other countries have centralized systems that limit school autonomy and parents' school choice. In this context, it is difficult for parents to express their preferences and needs; therefore, their satisfaction would not be closely linked to objective performance indicators set by the central government. Little empirical evidence, however, exists on this question.
Another important gap in the literature concerns the multidimensionality of autonomy. Previous literature has focused on the degree of autonomy and paid less attention to how different dimensions of autonomy affect public service delivery. Recognizing the multidimensionality of autonomy (Krause and Van Thiel 2019; Verhoest et al. 2012; Wynen et al. 2014), we argue that 'what (not) to delegate' matters as much as 'how much to delegate' in shaping the relationship between subjective and objective performance assessments. In particular, this study considers three distinct aspects of autonomy: (1) standards, (2) human resources, and (3) financial autonomy. Standards autonomy reflects devolving to administrative agencies the authority to set the quality standards actually used in evaluation, whereas human resources and financial autonomy capture delegating to agencies the responsibility for getting the job done in the policy implementation process. Based on the literature suggesting that the degree and type of autonomy allowed to administrative agencies vary within and between countries (Overman 2017; Verhoest et al. 2012), we expect that each type of autonomy can have a unique moderating effect on the relationship between archival performance indicators and citizens' perceptual judgments of service quality.

Standards autonomy
Standards autonomy is defined as the degree of decision-making authority in setting quality standards for public programs. The delegation of authority to set standards relates directly to an autonomy-accountability dilemma: if agencies have significant autonomy in the early stages of policy goal setting (or quality standards setting), it is more difficult for elected officials to hold them accountable (Nielsen 2014). Standards autonomy also involves the inherent risk of goal displacement, with individual agencies defining their own policy objectives. To maximize their performance ratings, public agencies may have incentives to focus on tangible policy outputs rather than more meaningful policy outcomes (Merton 1940) and to engage in "effort substitution (reducing effort on nonmeasured performance dimensions) or gaming (making performance on the measured performance dimension appear better, when in fact it is not)" (Kelman and Friedman 2009, p.918).
Greater standards autonomy may lead agencies to seek lower standards, because it is easier to exceed expectations and earn high performance ratings when standards are low. High standards that are difficult to meet may not be favorable to the agency, even though the pursuit of higher service quality is good for citizens. In education, permitting states to set performance standards under the US No Child Left Behind Act allowed low-performing states to set significantly lower standards (see Manna 2006). In addition, individual schools may try to manipulate student performance on standardized exams by purposely excluding low achievers from testing (see Bohte and Meier 2000). As Bohte and Meier (2000, p.180) note, "cheating is likely to occur in organizations in which the day-to-day activities of bureaucrats are not heavily monitored (for example, highly decentralized bureaucracies)." Based on this discussion, we expect that greater local autonomy in setting quality standards can weaken the link between archival performance indicators and citizen evaluations. When schools have greater autonomy in setting assessment policies and educational curricula, they might set low standards (e.g., adopting lenient performance standards or lenient grading, or offering easier courses) to maximize their performance ratings.
In this case, the performance indicators that schools use may not capture what parents value, thus widening the gap between archival performance measures and parents' evaluations of school quality. Under standards set by national governments, in contrast, schools have less room to manipulate quality standards; therefore, administrative performance indicators may more closely reflect citizens' judgments of school performance.
When national quality standards exist, administrative performance can be compared across schools; knowing how their children's schools perform relative to other schools may help parents make more accurate judgments about school quality. Drawing on the notion of social aspirations (Cyert and March 1963), whereby organizations or individuals compare themselves to others, Barrows et al. (2016) argue that parents' evaluations of schools can be influenced by how their child's school ranks relative to other schools at the state, national, or international level. Decentralized and fragmented systems with different quality standards therefore make it more difficult for parents to evaluate the quality of their school relative to other schools (see also Olsen 2017).
H3a. Student performance will be less closely aligned to parents' evaluations in schools with greater standards autonomy.

Human resources autonomy
Human resources autonomy is defined as the degree of decision-making authority in managing people. It concerns the devolution to administrative agencies of the responsibility to hire, fire, compensate, train, and motivate employees. The authority to manage human resources is especially important in labor-intensive policy areas because effective personnel management relates directly to better service quality and increased citizen satisfaction (Favero et al. 2014). Greater human resources autonomy can help agencies attract and hire employees who fit the job or bring local cultural skills, thus allowing the organization to provide better service (Nielsen 2014).
Responsiveness theory presumes that the delegation of autonomy facilitates interaction between bureaucrats and citizens, allowing for more locally tailored service (Van Thiel 2001) that leads to increased citizen satisfaction (Overman 2017).6 Greater human resources autonomy allows schools to hire teachers and staff who can provide more customized service to students (e.g., teachers specialized in teaching nonnative speakers). In a similar vein, the decentralization literature argues that the devolution of power from upper-level to lower-level governments can make bureaucracies more responsive and accountable to citizens (Escobar-Lemmon and Ross 2014). In theory, fragmented and decentralized local governments can provide a greater range of public services responding to citizens' preferences, whereas centralized national governments often implement one-size-fits-all policies (Li, Wang, and Zheng 2017).7 Human resources autonomy can also encourage co-production, thereby helping to reduce the gap between citizens' perceptions of services and administrative performance indicators (Jakobsen 2012).
The education literature has also suggested that school autonomy encourages parental involvement and co-production (Bifulco and Ladd 2005). Increased parental involvement can not only improve school quality or student performance (e.g., Marschall 2006) but also reduce the gap between parents' perceptions of school quality and archival performance indicators. This is possible because the more parents participate, the more likely their opinions are to be reflected in how students are educated, and the more they know about what the school is actually doing.
H3b. Student performance will be more closely aligned to parents' evaluations in schools with greater human resources autonomy.

Financial autonomy
Financial autonomy is defined as the degree of decision-making authority over financial transactions (that is, authority in determining revenues and expenses). Like human resources autonomy, financial autonomy can help agencies provide more customized services to citizens and encourage citizen engagement by allowing agencies to allocate financial resources according to citizens' preferences and needs.
While the literature on fiscal decentralization supports the argument for providing more localized services, the participatory budgeting scholarship suggests two competing hypotheses about the effect of financial autonomy on citizen participation. The first proposes that managers in an agency with greater financial autonomy rely on their own expertise and knowledge to make budget decisions rather than seeking citizens' opinions. The alternative holds that greater autonomy in budgeting brings greater accountability, so managers are more likely to engage citizens to increase the legitimacy of agency actions (Neshkova 2014). Empirical findings are also mixed. Neshkova (2014), for instance, finds that in US transportation and environmental policy, greater autonomy in allotment processes and own-source revenue is positively related to citizen participation, whereas greater autonomy in developing spending forecasts is negatively correlated with citizen participation.
When a school has greater autonomy in budget formulation and fund allocation, parents might feel that they can exert greater influence over educational programs and policies and gain more from participating; therefore, they may be more willing to devote time and effort to participation (Bifulco and Ladd 2005, p.556; Parrado et al. 2013). Put differently, greater financial autonomy can make involvement worthwhile by giving parents material incentives to participate in collective efforts to influence school programs and policy. When parents engage with their children's schools, their evaluations of school quality are more likely to match administrative school performance indicators because they understand the production process.
H3c. Student performance will be more closely aligned to parents' evaluations in schools with greater financial autonomy.
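One conventional way to formalize H1 through H3c is a linear model with an interaction between student performance and an autonomy dimension. This is an illustrative sketch, not necessarily the exact specification estimated in this study, and the notation is introduced here for illustration: S_isc is the evaluation given by the parent of student i in school s and country c, P_isc is the student's test performance, P̄_sc is the school's mean performance, A_sc is one autonomy dimension, and γ_c is a country fixed effect.

```latex
\begin{aligned}
S_{isc} &= \beta_0 + \beta_1 P_{isc} + \beta_2 \bar{P}_{sc} + \beta_3 A_{sc}
          + \beta_4 \left( P_{isc} \times A_{sc} \right) + \gamma_c + \varepsilon_{isc} \\
\frac{\partial S_{isc}}{\partial P_{isc}} &= \beta_1 + \beta_4 A_{sc}
\end{aligned}
```

Under this sketch, H1 and H2 correspond to β1 > 0 and β2 > 0. Reading "more closely aligned" as a steeper slope linking a child's performance to parental evaluation (an interpretive assumption), that slope is β1 + β4 A_sc, so H3a expects β4 < 0 when A_sc is standards autonomy, while H3b and H3c expect β4 > 0 for human resources and financial autonomy.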

Data and Methods
Testing these hypotheses requires cross-nationally comparable data on parents' perceptions of school quality, an archival measure of school performance, and an autonomy index. We use the Organization for Economic Cooperation and Development's (OECD 2015) Program for International Student Assessment (PISA) database. PISA provides measures of 15-year-old students' academic performance in math, science, and reading, and these measures allow countries to compare student learning outcomes. In 2015, more than 70 countries and a few education systems that are not countries (e.g., Hong Kong and Macao) participated in the PISA assessment.
Within participating countries, samples were drawn via a multistage stratified random sampling process designed to generate a fully representative sample for each country (OECD 2015).8 While 72 countries and education systems participated in the academic performance assessment, only 16 countries participated in the parent survey.9 Our sample includes Chile, Croatia, the Dominican Republic, France, Georgia, Germany, Hong Kong, Ireland, Italy, Korea, Luxembourg, Macao, Malta, Mexico, Portugal, and Spain. While these countries are clearly not representative of all nations of the world, they span a wide range of countries and thus allow a general test of the hypotheses.10
8 For more details, see OECD (2015).
9 A small but unrepresentative sample of parents in the United Kingdom also participated in the parent survey and was excluded from the analysis. The substantive results in this paper remain the same whether or not we include the UK data.
Because our data include both individual- and organization-level variables, ignoring the multilevel data structure can bias standard errors downward. Because we pool countries in estimating our results, country-level characteristics must also be considered in the modeling strategy. One way to address this issue is to employ three-level multilevel models with country-level control variables. A challenge with multilevel modeling, however, is that the number of groups must be large enough; otherwise, estimates are inefficient (see the discussion in Maas and Hox 2005). The small number of degrees of freedom at the country level also limits our ability to account for various country characteristics. For these reasons, rather than using multilevel models, we adopt a combined strategy.11 We estimate ordinary least squares (OLS) regression models with standard errors clustered by school to account for the first two levels. In addition, to control for unobserved country-specific effects on parents' satisfaction with schools, all models include country fixed effects. Lastly, we apply the sampling weights provided by PISA for each country, which include an adjustment for grade non-response.
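The estimation strategy described above (weighted OLS with country fixed effects and standard errors clustered by school) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the code used for this study: the function names are hypothetical, the weights stand in for the PISA sampling weights, and the cluster-robust variance applies the common CR1 small-sample correction, which may differ from the correction used by the authors' software.

```python
import numpy as np


def country_fe_design(covariates, countries):
    """Stack an intercept, the covariates, and country dummies (first country omitted)."""
    countries = np.asarray(countries)
    levels = np.unique(countries)
    # One dummy per country except the first sorted level, which is the reference
    dummies = (countries[:, None] == levels[None, 1:]).astype(float)
    n = countries.shape[0]
    return np.column_stack([np.ones(n), *covariates, dummies])


def wls_cluster(y, X, weights, clusters):
    """Weighted least squares with CR1 cluster-robust standard errors.

    y: (n,) outcome; X: (n, k) design matrix; weights: (n,) sampling weights;
    clusters: (n,) cluster labels (hypothetical school IDs).
    Returns (coefficients, standard errors).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    n, k = X.shape
    XtW = X.T * w                      # X'W without building an n-by-n diagonal matrix
    bread = np.linalg.inv(XtW @ X)     # (X'WX)^{-1}
    beta = bread @ (XtW @ y)
    resid = y - X @ beta
    # Sum each cluster's weighted score vector X_g' W_g e_g
    labels, idx = np.unique(np.asarray(clusters), return_inverse=True)
    g = len(labels)
    scores = np.zeros((g, k))
    np.add.at(scores, idx, X * (w * resid)[:, None])
    meat = scores.T @ scores
    # CR1 small-sample correction for clustered errors
    correction = (g / (g - 1)) * ((n - 1) / (n - k))
    vcov = correction * bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))
```

Because the country dummies absorb between-country differences in average satisfaction, the performance coefficients in this sketch are identified from within-country variation, while clustering by school allows errors to be correlated among students in the same school.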

Dependent variable: Perceived school quality
The dependent variable of theoretical interest is perceived school performance as measured by parents' perceptual judgments of the schools. The PISA parent survey asks about various aspects of school quality, such as teacher quality, education standards, teaching methods, disciplinary atmosphere, and the quality of education, on a four-point scale (from strongly disagree to strongly agree). Factor analysis demonstrates that all survey items load onto a single factor with an eigenvalue of 4.00 and a Cronbach's alpha of 0.87. Table A1 in the Appendix shows the details of the survey questions and the factor analytic results.
10 The Chow test results support pooling the data rather than estimating the results separately for each country (results available upon request).
11 As a robustness check, we also estimate our models using a three-level multilevel modeling approach and find that our results remain largely the same (results available upon request).
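The scale statistics reported for the parent survey items can be illustrated with a short NumPy sketch. Cronbach's alpha follows its standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of the summed scale). For the eigenvalue, the sketch takes the largest eigenvalue of the item correlation matrix, a principal-components shortcut that is an assumption here; a common-factor extraction, which the authors may have used, generally yields a somewhat smaller value. The function names are hypothetical.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()  # sum of the individual item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of the summed scale score
    return (k / (k - 1)) * (1.0 - sum_item_var / total_var)


def largest_eigenvalue(items):
    """Largest eigenvalue of the item correlation matrix (principal-components shortcut)."""
    corr = np.corrcoef(np.asarray(items, dtype=float), rowvar=False)
    return np.linalg.eigvalsh(corr)[-1]             # eigvalsh returns eigenvalues in ascending order
```

With k items that all measure the same construct, the largest eigenvalue approaches k and alpha approaches 1; with unrelated items, the eigenvalue stays near 1 and alpha near 0.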

Independent variables: Archival performance indicators
This study adopts archival performance indicators at two levels as independent variables: individual student test scores and school mean scores. All sampled students take tests in math, reading, and science, and we use these scores to create two standardized performance indexes, one for individual students and one for schools. This permits us to determine whether parents evaluate schools based on their own child's performance or on the performance of the entire school. Although PISA scores are not an official performance indicator in any of the countries examined, they are highly correlated with other standardized exams that are used in official assessments (Rindermann 2007).
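The article does not spell out how the two indexes are constructed, so the following is only a plausible sketch under explicit assumptions: each student's raw score is the unweighted mean of the three subject scores, the individual index standardizes that raw score across the pooled sample, and the school index standardizes each student's school mean. The dictionary keys are hypothetical, and work with actual PISA data would also need plausible values and sampling weights.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / sd for v in values]


def performance_indexes(students):
    """Return (individual_index, school_index), one entry per student.

    Assumptions: `students` is a list of dicts with hypothetical keys
    'school', 'math', 'reading', 'science'; the raw score is the unweighted
    mean of the three subjects; both indexes are standardized on the pooled sample.
    """
    raw = [mean((s["math"], s["reading"], s["science"])) for s in students]
    individual = zscores(raw)

    # School mean of the raw score, attached back to each of its students.
    by_school = defaultdict(list)
    for s, r in zip(students, raw):
        by_school[s["school"]].append(r)
    school_mean = {k: mean(v) for k, v in by_school.items()}
    school = zscores([school_mean[s["school"]] for s in students])
    return individual, school
```

Every student in a school shares the same school index, so the two indexes separate "my child did well" from "my child's school did well".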

Standards, human resources, and financial autonomy
The article incorporates three dimensions of autonomy: standards, human resources, and financial autonomy. We measure each dimension using questions about who is responsible for educational standards, staffing, and budgeting. The PISA school survey includes a set of questions asking, "regarding your school, who has considerable responsibility for the following tasks?" School administrators report whether (1) teachers, (2) the principal, (3) the school governing board, (4) the regional or local education authority, or (5) the national education authority has responsibility for tasks such as hiring and firing teachers, determining teacher salaries, formulating and allocating school budgets, establishing disciplinary and assessment policies, and determining course content and textbooks.
For each item, we assign a value of four if actors inside the school (the principal or teachers) have the responsibility, three for the school governing board, two for the regional/local education authority, and one for the national education authority.¹² The underlying logic is that the more upper-level agencies (education authorities) delegate authority for managing schools to school-level actors (principals and teachers), the more autonomy schools have.
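The coding rule just described maps each responsible actor to a score from 1 to 4. A minimal sketch follows. The actor labels are hypothetical stand-ins for the PISA response options, and because the text above does not say how an item with several responsible actors is scored, the rule of keeping the most school-level code is purely our assumption.

```python
# Scores from the coding rule in the text: higher = authority held closer to the school.
# The actor labels are hypothetical stand-ins for the PISA response options.
AUTONOMY_CODE = {
    "teachers": 4,
    "principal": 4,
    "school_governing_board": 3,
    "regional_local_authority": 2,
    "national_authority": 1,
}


def code_item(responsible_actors):
    """Autonomy score for one task, given the set of actors marked responsible.

    Assumption (not from the article): if several actors are marked, keep the
    most school-level (highest) code. Returns None when nothing is marked.
    """
    if not responsible_actors:
        return None
    return max(AUTONOMY_CODE[actor] for actor in responsible_actors)
```

The item scores produced this way are the inputs to the factor analysis that yields the three autonomy dimensions.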
The factor analysis of the school autonomy indicators produces a three-factor solution (see Table A2 in the Appendix).¹³ The first factor taps school autonomy over assessment policies and educational curricula, reflecting who has decision-making power in setting educational standards; we use this factor to measure standards autonomy. The second factor mainly captures school autonomy in managing human resources (teachers), and we use it to measure human resources autonomy. The third factor captures autonomy in formulating and allocating the school budget, and we use this factor to measure financial autonomy.

Control variables
The citizen satisfaction literature suggests that socioeconomic and demographic factors are significant predictors of citizens' preferences and satisfaction (Brown and Reed Benedict 2002). We control for the parents' educational attainment (coded as a six-category variable from no formal education to tertiary education and advanced research programs) and the students' immigrant status (native = 0; second generation = 1; first generation = 2) at the government-sanctioned measure of education performance in any of these countries.
Countries may have other official test indicators or no indicators at all. Similarly, countries such as Germany do not report any test scores to parents, and parents would not know their child's PISA scores. Although PISA scores are positively correlated with other standardized tests, including those used for national assessments (Rindermann 2007), the correlation between perceptions of school performance and actual test scores relies on a series of judgments on the part of parents in an environment that may not be information-rich.
Despite these difficulties, we find a relationship between actual test scores and parents' perceptions of performance.Because this relationship varies greatly across the nations in our study, the next step is to determine what factors can enhance the ability of clients (parents) to evaluate the quality of services (education).
[Table 1 here]
Table 2 reports pooled regression models that test the generality of the theory. Building on theory and on evidence from previous single-country studies that report a positive association between parents' evaluations and student performance, we first hypothesize that this positive link generalizes across countries. Consistent with our theoretical expectations, Model 1 in Table 2 shows that both individual student performance (test scores) and overall school performance are positively and significantly associated with parents' evaluations of school quality (H1 and H2 supported). Parents are more satisfied with their child's school when the school achieves higher academic performance, as well as when their own children do well on tests. The positive relationships hold when we add the three autonomy measures (Model 2). The interaction of student and school performance, however, is unrelated to parent satisfaction (Model 3).
[Table 2 here]
The primary research interest of this study is to examine the role of autonomy in shaping the relationship between perceptual evaluations and archival assessments. By adding interaction terms between the archival performance indicators and the three autonomy measures, we test whether and how autonomy moderates the effect of archival assessments on satisfaction (Table 3). The interaction between student performance and standards autonomy has a negative and significant coefficient, suggesting that when schools have more autonomy in setting standards, student performance is less strongly associated with parents' satisfaction (Model 1 in Table 3; H3a supported). By contrast, human resources autonomy positively moderates the effect of student performance on parents' evaluations, indicating that parents' judgments of school quality and student performance are more closely aligned when schools have more responsibility for managing teachers (Model 2; H3b supported). The interaction between financial autonomy and student performance is not statistically significant (Model 3; H3c not supported). These results hold in the full model (Model 4).
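For readers who prefer an equation, the moderation models in Table 3 can be written generically as follows. The notation is ours, not the article's: i indexes students, j schools, and c countries; P is the student performance index, A one of the three autonomy factors, Z a vector of controls, and alpha_c a country fixed effect.

```latex
\begin{aligned}
\text{Eval}_{ijc} &= \beta_0 + \beta_1 P_{ij} + \beta_2 A_j + \beta_3 \,(P_{ij} \times A_j)
  + \boldsymbol{\gamma}^{\top} \mathbf{Z}_{ij} + \alpha_c + \varepsilon_{ijc}, \\
\frac{\partial\, \text{Eval}_{ijc}}{\partial P_{ij}} &= \beta_1 + \beta_3 A_j .
\end{aligned}
```

A negative beta_3, as reported for standards autonomy, means the performance slope flattens (and can turn negative) as A_j rises; a positive beta_3, as reported for human resources autonomy, means the slope steepens.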
[Table 3 here]
A more intuitive way to illustrate these results is to plot predicted values. Figure 1 shows predicted parents' evaluations of a school at varying levels of student performance, given different levels of standards autonomy. The solid line shows the relationship between student performance and parents' evaluations when schools have a high level of standards autonomy (two standard deviations above the mean), and the dashed line shows the relationship for a low level of standards autonomy (two standard deviations below the mean).
The slope of the solid line is negative, suggesting that student performance is negatively associated with parents' evaluations when schools have more standards autonomy relative to upper-level bureaucracies.The slope of the dashed line is positive, by contrast, indicating that student performance is positively related to parents' evaluations when schools have less standards autonomy.
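The lines in Figure 1 follow directly from an interaction model: at a fixed autonomy value A, predicted evaluations are linear in performance with slope b_perf + b_inter * A. The sketch below computes the two lines at plus and minus two standard deviations of autonomy. The function names are hypothetical, controls and fixed effects are assumed to be folded into the intercept, and the coefficient values in the example are invented to mimic the sign pattern of Figure 1; they are not estimates from Table 3.

```python
def conditional_slope(b_perf, b_inter, autonomy):
    """Slope of predicted evaluations with respect to performance at one autonomy level."""
    return b_perf + b_inter * autonomy


def predicted_lines(b0, b_perf, b_auto, b_inter, perf_grid, auto_mean, auto_sd):
    """Predicted evaluations over perf_grid at high (+2 SD) and low (-2 SD) autonomy.

    Controls and fixed effects are assumed to be absorbed into b0.
    """
    levels = {"high": auto_mean + 2 * auto_sd, "low": auto_mean - 2 * auto_sd}
    return {
        label: [b0 + b_perf * p + b_auto * a + b_inter * p * a for p in perf_grid]
        for label, a in levels.items()
    }


# Hypothetical coefficients chosen only to reproduce the sign pattern in Figure 1.
lines = predicted_lines(b0=3.0, b_perf=0.05, b_auto=0.02, b_inter=-0.04,
                        perf_grid=[-2, 0, 2], auto_mean=0.0, auto_sd=1.0)
```

With these made-up values the high-autonomy line slopes downward and the low-autonomy line slopes upward; plotting each list against the performance grid (solid for "high", dashed for "low") gives the layout described for Figure 1.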
Figure 2 illustrates the predicted effects of student performance on parents' evaluations given different levels of human resources autonomy. The plot shows the opposite pattern from Figure 1, with a positive relationship between student performance and parents' evaluations for schools with a high level of human resources autonomy (the solid line) and a slightly negative relationship for schools with a low level of human resources autonomy (the dashed line).¹⁴

[Figures 1 and 2 here]
Recognizing that individual student-level performance and aggregate school-level performance may have different effects on parents' satisfaction, we conduct additional analyses testing the interaction between autonomy and academic performance at the school level (Table 4). The school-level findings are similar to those from the student-level models. Whether the performance criterion is the entire school or the individual child, parents' perceptions of educational quality are more consistent with these measures when local schools have substantial autonomy in human resources but limited autonomy in setting performance standards for the organization. Financial autonomy appears to have little impact on the congruence of citizen perceptions and organizational performance.¹⁵
[Table 4 here]

¹⁴ The plots of the marginal effects are also available in the online appendix.
¹⁵ The plots of marginal and predicted effects from Table 4 are available in the online appendix.

Discussion and Conclusion
This article advances our understanding of how citizens' perceptual judgments of public services relate to archival indicators of service quality across countries, and how institutional structures affect this relationship. By incorporating three dimensions of autonomy (standards, human resources, and financial autonomy), this study provides theoretical clarity and practical implications for institutional designs that can facilitate government performance assessment. Based on the literature on goal displacement, decentralization, and coproduction, we hypothesized that standards autonomy negatively moderates the relationship, whereas human resources and financial autonomy positively moderate it.
Analyses of more than 62,000 parents in 16 countries suggest that parents' perceptual evaluations of schools are significantly associated with student test scores. This implies that the convergent validity of the performance indicators generalizes across countries, although there are exceptions to this pattern. The finding also suggests that countries can design performance systems to enhance the ability of parents to evaluate schools: parents' satisfaction and student performance are more closely aligned when schools, rather than national-level education authorities, have greater autonomy in managing teachers.
Interestingly, standards autonomy shows an opposite pattern, suggesting that student performance has a stronger relationship with parents' satisfaction when a national-level bureaucracy has greater authority in setting the standards in education.
The opposite effects of standards and human resources autonomy are both theoretically and practically meaningful for understanding and improving citizens' satisfaction with public services. Theoretically, our findings highlight the multidimensional nature of autonomy (Verhoest et al. 2012) by demonstrating that each dimension of autonomy plays a distinct role in shaping the relationship between citizens' perceptual evaluations and archival performance. This result is consistent with a Danish study showing that school autonomy (defined as managerial authority) over human resources positively moderates the effect of performance management, whereas school autonomy in goal setting has a negative moderating effect (see Nielsen 2014). Our findings also suggest that the convergence of performance indicators can be conditional on institutional context rather than absolute. The crucial institutional context that contributes to convergent validity (at least in education) is local flexibility in managing street-level bureaucrats, subject to consistent national standards.
This article provides practical implications by answering the questions of what to delegate and what not to delegate.The delegation of authority and responsibility became more popular under NPM (Verhoest et al. 2012) based on the assumption that greater managerial autonomy can increase service user satisfaction as well as overall performance.
Our findings suggest that the devolution of autonomy is not a panacea.In particular, decentralized standards setting might encourage managers to set low standards of performance with detrimental consequences for desirable policy outcomes (see also Nielsen 2014).Variation in standards across jurisdictions also muddies the signal to citizens about the absolute quality of public services.This finding suggests that quality standards setting should not be left to local administrative organizations or managers, but set centrally to create greater transparency and accountability.Empowering local authorities over human resources, by contrast, can contribute to more positive perceptions of government among citizens and greater convergent validity of the performance indicators.In sum, the institutional design that allows more flexibility in people management (delegating human resources management) with centrally set standards (not delegating the quality standard setting) can contribute to a better performance appraisal system.
There are several limitations of this study, which inform future research on this topic.
While this research highlights the value of cross-national investigation, our empirical models do not explicitly examine the role of national-level factors, which are absorbed by the country fixed effects. Future research should examine how macro-level structures (e.g., democratic vs. autocratic regimes, political decentralization) shape the link between citizens' subjective judgments and objective performance indicators.
Whether national-level structures interact with within-country autonomy in shaping citizen satisfaction with government also merits study. Citizen satisfaction with different levels of government is another avenue for future research, given that the delegation of decision-making authority involves the power dynamic between local and national governments. While this study focuses on citizen satisfaction with service organizations at the local level, future research could explore citizen satisfaction with national government under varying levels of administrative autonomy.
Lastly, it is worth discussing whether our findings from education are transferable to other public services such as healthcare, transportation, or welfare. In education, parents and students benefit directly from the service and are likely to be aware of the quality of schools. For many other government services, however, citizens benefit indirectly and are less likely to be aware of overall quality. Although the current study only examines schools, we expect the logic of national standards and local flexibility in human resources to apply to other public services. National standards give clients a uniform benchmark and provide clarity that facilitates comparative evaluation. Local flexibility in human resources permits crafting policies to fit local needs.
While both factors are likely to enhance the client's ability to evaluate a wide range of services, only additional studies in other areas can indicate how general this finding is.

Figure 1. Predicted Effects of Student Performance on Parents' Evaluations Depending on the Level of Standards Autonomy
Figure 2.

Table 1. Cross-National Variation in the Relationship between Archival Performance Indicators and Perceptual Evaluations
DV = Parents' evaluation; IV = Student performance; IV = School performance

Table 2. A General Model for the Relationship between Archival Performance Indicators and Perceptual Evaluations

Table 3. The Role of Autonomy in the Relationship between Archival Performance Indicators and Perceptual Evaluations: Individual-Level Performance
Note. Clustered robust standard errors by school in parentheses; country fixed effects included but not shown; + p<0.10, * p<0.05, ** p<0.01; two-tailed tests.

Table 4. The Role of Autonomy in the Relationship between Archival Performance Indicators and Perceptual Evaluations: Aggregate-Level Performance
Note. Clustered robust standard errors by school in parentheses; country fixed effects included but not shown; + p<0.10, * p<0.05, ** p<0.01; two-tailed tests.