Can you standardise transformation? Reflections on the transformative potential of benchmarking as a mode of governance

ABSTRACT This paper is a collaborative effort between academic researchers and practitioners to consider the conditions under which global benchmarking may be used as a tool for supporting urban transformation. Reflecting on WWF’s One Planet City Challenge and UN-Habitat’s Guiding Principles for City Climate Action Planning, the paper suggests that the practice of global benchmarking can be transformative through encouraging organisational learning and reflection, building relationships between cities and global and trans-local organisations, and governing for structurally transformative qualities. However, the practice of benchmarking is not without potential tensions: benchmarks may reify existing practices rather than reform them, may be less usable for or accessible to cities in lower-income countries, and may neglect issues of climate justice, which are not easily reduced to comparative measures of success or failure. This suggests that a wholesale reliance on benchmarking as a mode of governing climate change might risk marginalising certain issues and amplifying others. We conclude by recommending improved material and technical support for urban data collection and suggest that benchmarking should be combined with a broader suite of performance indicators and reflective practices in order to support urban transformation.


Introduction
Cities have increased their participation in global climate governance over the past two decades and, consequently, the global urban climate domain is a complex and fragmented political space (Bulkeley 2021). Under these conditions, benchmarking has emerged as an essential mode of governance. Benchmarking is a diagnostic exercise meant to assess the performance of cities either relative to one another or against an absolute standard. It embodies the logic of "governing by goals" which has come to characterise the field of sustainable development (Kanie et al. 2019). When set by transnational and international organisations, benchmarks aim to mobilise cities and harmonise their actions with national and global goals, in effect providing mileposts for cities towards shared, long-term climate and sustainability goals. In this way, benchmarks render the field of urban climate governance "legible and logical" to global actors (Gordon and Johnson 2017, 706). However, the global goals to which benchmarks are calibrated are rapidly changing. In recent years, the purpose of action has moved beyond ambitious climate policy and towards transformative change (Walsh et al. 2022). As Rosenzweig and Solecki (2018, 756) describe, "the term 'transformation' is invoked to describe what cities must do to simultaneously improve climate resiliency and achieve the positive effects of low-carbon sustainable development". It refers to an interconnected change agenda in which efforts to limit global warming to 1.5°C also consider questions of equity and well-being (Bazaz et al. 2018). Transformation is distinct from the idea of transitions in both scope and scale: it reaches beyond bounded urban systems and is associated with ideas of "novelty" as it entails changes to underlying social, economic, and political orders (Bulkeley 2019; Hölscher, Frantzeskaki, and Loorbach 2019).
A key question, then, is whether benchmarking as a tool of governance can support transformative change. The practice of benchmarking is purported to provide a variety of benefits, including improvements to the quality of key information, enhanced accountability and transparency, opportunities for peer-to-peer learning, and encouragement for cities to be ambitious (Lehtonen, Sébastien, and Bauler 2016; Boyko et al. 2012). However, benchmarking is also criticised for a number of reasons, including promoting technocratic managerialism and obscuring key political debates by reducing complex phenomena to tidy indicators and neat numerical values (Tichenor et al. 2022; Elgert 2018; Broome and Quirk 2015).
In light of these diverging accounts, this paper considers the potential of benchmarks to shape the pathways to and outcomes of transformative change. We examine two sets of global benchmarks for urban sustainability: the WWF's One Planet City Challenge (OPCC) and UN-Habitat's Guiding Principles for City Climate Action Planning (GPCCAP). We find that both initiatives show transformative potential in ways both predicted by and novel to the literature. The analysis highlights how benchmarks can facilitate organisational learning, build relationships between cities and global organisations, and encourage cities to govern for structurally transformative qualities. The paper also points to three key tensions that continue to beset this mode of governance. First, benchmarks may entrench, rather than transform, the status quo by reifying existing practices. Second, data availability challenges may prevent cities, especially in low-income countries, from reaping the benefits of benchmarking. Third, certain qualities are more easily "benchmarked" than others; questions of climate justice, for instance, are not easily reduced to comparative measures of success or failure.
Overall, the paper finds that benchmarking can support transformation not only through goal setting but also through the practice of benchmarking itself: the processes of information gathering, reporting, and coordinating across organisations and actors provide opportunities through which the seeds of transformation may be sown.

Benchmarking and global urban policymaking
Benchmarking is a diagnostic exercise meant to assess the performance of cities either relative to one another, as is the case with prizegiving initiatives like the World Resources Institute's Ross Prize, or as judged against some absolute standard, which is the approach taken by the World Bank's Urban Sustainability Framework and in best practice compendiums like the C40's Cities 100. The practice of assessment can take one of three forms: competitive, where a third party assesses and ranks cities (without their express participation); cooperative, in which cities cooperate with the benchmarker in the assessment; or collaborative, where cities co-produce the benchmarks, standards, or indicators by which they are assessed (Luque Martínez and Muñoz-Leiva 2005). Finally, benchmarks assess three types of qualities. According to Broome and Quirk (2015, 815), these are: (1) quality of conduct, or how well actors have discharged their responsibilities in specific areas; (2) quality of design, or how well specific policies, laws, or institutions have been formulated and applied; and (3) quality of outcomes, or how well activities in specific areas align with defined goals (irrespective of who is actually responsible for the overall outcomes).
If benchmarking is ultimately the governance of qualities, it is important to assess which qualities are deemed valuable or undesirable. The various forms of, and approaches to, benchmarking are summarised in Figure 1.
Benchmarking operates through four main mechanisms: encouraging competition, providing information and opportunities for learning, monitoring progress and holding policy makers accountable, and constructing political aspirations or agenda setting (Maassen and Galvin 2019; Moreno Pires, Magee, and Holden 2017; Hansson, Arfvidsson, and Simon 2019; Broome and Quirk 2015). The first three mechanisms (encouraging competition, supporting learning, and enhancing accountability) highlight how key changes occur through the practice of benchmarking, as actors gain information, become "knowable" to outside entities, and compete with peers. The fourth mechanism, agenda setting, suggests that benchmarks themselves also induce change by constituting "a new common sense" (Kuzemko 2015, 971): defining issues in particular ways, building consensus, narrowing the range of governance options, promoting norms, legitimising certain outcomes and approaches, and augmenting actors' authority (Bernstein and van der Ven 2017; Bulkeley 2006). In this way, benchmarks imprint particular visions of urban sustainability on cities and direct local actors towards certain policy ends (and away from others).
The use of benchmarking as a mode of global urban governance has been on the rise since the 1990s. In the field of urban climate governance, usage tracks with cities' increased participation in global environmental governance and comes at the encouragement of transnational city networks. In a review of European climate protection networks, Kern and Bulkeley (2009) find benchmarking (alongside recognition and certification) to be a core strategy through which city network secretariats internally govern member cities, using peer pressure to promote compliance with clearly defined standards and milestones. Furthermore, Gordon's work on the C40 network highlights how accountability practices including (but not limited to) benchmarking have become key sources of legitimacy for cities to act in global forums, establishing relations of accountability to global institutions and goals (Gordon 2020).
More generally, benchmarking is a key technology of government under New Public Management (Hood 1991), a defining political rationality in the field of sustainability governance (Tichenor et al. 2022; Turnhout, Neves, and de Lijster 2014). It emphasises clear goals while giving actors discretion to craft individual approaches, enabling devolution, delegation, and outsourcing: in other words, less government and more "steering at a distance" (Rose and Miller 2010). By setting clear targets, benchmarking aspires to generate transparent, reliable, and standardised information which can be used to evaluate climate action across cities. By evaluating cities in a comparative manner, it encourages the efficient and effective achievement of defined ends, supporting the development of best practices and policy innovations (Elgert 2018). Benchmarking also embodies principles of neoliberalism by enhancing the scope and autonomy of private and non-state actors and centring economic value in its approach to urban sustainability. As Rosol, Béal, and Mössner (2017) argue, benchmarks, performance indicators, standards, rating systems, rankings, and awards are all among the neo-managerial instruments of control (and reward) which have underpinned the development of an entire economic sector centred around green urbanism. This is illustrated when carbon disclosure platforms such as CDP-ICLEI TRACK frame the benefits of benchmarking in terms of economic opportunity, noting that "environmental reporting can help unlock investment" (CDP 2018).

Benchmarking for transformation
The way benchmarking operates and the political epistemologies which underpin it raise important analytic and normative questions. Can benchmarking, with its embedded logic of competitive performance assessment and emphasis on standardised sustainability goals, be a tool for transformation? Are the socioeconomic rationalities which benchmarking serves inherent to this practice or might benchmarking be calibrated towards alternative ends? Are dynamics of global standardisation and local transformation at odds, or might benchmarks be a mode for scaling transformative urban experiments, helping cities to "experiment more effectively" (Evans et al. 2021, 171)?
Current research highlights numerous reasons to remain sceptical. Scholars note how benchmarks, especially those pegged to best practices, may have an inherent status quo bias since prescriptions are derived from existing practice (Bernstein and van der Ven 2017). Moreover, there are doubts about the contributions global benchmarks can make to solving complex issues, particularly justice and equity issues, since these evade "one-size-fits-all" approaches (Moreno Pires, Fidélis, and Ramos 2014; Hansson, Arfvidsson, and Simon 2019). Others warn that benchmarking can have anti-democratic implications, since expert-led benchmarking can exclude "non-experts" from the discourse on what counts as "sustainable" (Rosol, Béal, and Mössner 2017). Relatedly, some are concerned that benchmarks might move climate and environmental governance towards a "postpolitical" consensus, obscuring key power relations and silencing dissenting voices (Swyngedouw 2010). Finally, given a lack of strong evaluative research, some are sceptical that any clear relationship exists between benchmarking and policy change at all (Boyko et al. 2012; Gahin, Veleva, and Hart 2003; Maclaren 1996), while others point to a "performance paradox" in which the practice of measurement actually produces worse, rather than better, results (van Thiel and Leeuw 2002).
In evaluating the effects of benchmarking, these critiques largely focus on the direct linkages between benchmarking and policy change, finding the impacts either too minimal or unsatisfactory. However, when examining the effects of benchmarking as a mode of governance, what is perhaps required is a more nuanced evaluation of "effect": are there latent effects of benchmarking that exist outside of policy change? A similar question might usefully be asked in our analyses of transformation: where might we locate transformative change?
Bearing this in mind, we take a pragmatic approach to understanding the interactions between benchmarking and transformative change. We look to four key sites and consider how benchmarking might catalyse different types of transformative change: in processes, outcomes, systems, and structures (see editorial introduction to this special issue).
The transformation of processes might entail changes to how benchmarks are constructed, through enhancing community or non-expert contributions or engaging in collaborative exercises for identifying best practices. A community-academic partnership in Orange County, for example, collaboratively generated a best-practice checklist for more equitable COVID-19 vaccine distribution, highlighting one alternative model of benchmark construction (Washburn et al. 2022). The transformation of outcomes considers the possibility of shifts in the arrangement of socio-material resources in cities; benchmarking might, for instance, result in funding or knowledge resources being directed to previously underserved or under-resourced cities or communities. However, in a review of 67 indicator sets for urban sustainability, Marino-Saum et al. (2020) find distributional and equity concerns consistently under-represented, suggesting that the link between benchmarking and the alleviation of resource disparities within communities may be largely theoretical at present.
System-level transformation entails changes to socio-technical and socio-ecological systems at the urban scale. The City Clean Energy Scorecard exemplifies system-level benchmarking: it assesses 100 US cities' progress against benchmarks in key sectors like buildings (e.g. the stringency of building energy codes) and transport (e.g. electric vehicle infrastructure investments) (Ribeiro et al. 2020). The final dimension, structural transformation, can be difficult to untangle from system-level transformation. However, as Bulkeley (2019, 13) notes, although these approaches are used interchangeably, and are often brought together in relation to policy goals, "they rely on fundamentally different concepts of what justice entails and what constitutes effective political processes". Structural transformation emphasises not only the reconfiguration of systems but their rearrangement in a manner which redistributes social and political power and challenges the socio-material configurations which produce unsustainable conditions in the first place. This is perhaps the most challenging site for transformation and, similarly, the most challenging kind of change to encourage through benchmarking due to the multi-scalar and deep-seated nature of structures. However, the Wellbeing Economy Alliance highlights an example of a collaborative effort to develop best practices for alternative economic models prioritising human well-being rather than economic growth (Wellbeing Economy Alliance 2021).
In taking a multidimensional approach to transformation, our goal is a more nuanced understanding of benchmarking's potential. In the next sections, we turn to the cases of the WWF's One Planet City Challenge and UN-Habitat's Guiding Principles for City Climate Action Planning and consider their potential to encourage transformative changes to processes, outcomes, systems, and structures.

Case studies and methods: the One Planet City Challenge and the Guiding Principles for City Climate Action Planning
To better understand the role benchmarking plays in catalysing or supporting transformation, we analyse two benchmarking practices: the 2019-2020 cycle of the WWF's One Planet City Challenge (OPCC), and the UN-Habitat's Guiding Principles for City Climate Action Planning (GPCCAP), published in 2015.
The OPCC is a biennial competition encouraging cities to set ambitious targets and develop climate plans in line with the Paris Climate Agreement's goal to limit global warming to 1.5°C. The competition is cooperative, meaning cities choose to participate in the OPCC when reporting through CDP-ICLEI TRACK. 1 It also uses both absolute and relative comparison: it scores cities against a number of benchmarks derived from best practice (for instance, emissions reductions targets in line with 1.5°C, set percentages of renewable energy) but also relative to their past performance as well as the performance of their peers. The 2019-2020 competition cycle entailed five phases (see Figure 2): registration and reporting, prescreening assessment, deep dive assessment, evaluation by the OPCC expert jury, and promotion and global awards (WWF 2019). Through the competition, cities receive recognition for their efforts as well as guidance on potential "big-win" actions that can help them achieve their goals. More than 700 cities have participated in the competition since it began in 2013: winners include Bogotá (2022), Lund (2022), Mexico City (2020), Uppsala (2018), Paris (2016), Seoul (2015), Cape Town (2014), and Vancouver (2013).
The GPCCAP is a series of eight principles developed to aid local policymakers in planning for climate change (see Figure 3). It was developed by UN-Habitat and 45 endorsing partners and launched at the Paris Climate Conference in 2015. Accompanying the core document outlining the broad principles for planning, UN-Habitat also released an Assessment Toolkit, which operationalises the principles so that they might be used to carry out a city-level assessment. Rather than articulating specific goals, the GPCCAP provides guidelines for cities to develop their own targets, goals, and plans. In this sense, the GPCCAP is more expansive, and perhaps more abstract, than most benchmarking initiatives. However, this is not to say that the GPCCAP lacks prescription. As Figure 3 illustrates, the principles highlight the form, if not the content, of "good" climate goals: things like ratchet mechanisms (Principle 1, Ambition) and inclusive and participatory planning processes (Principle 2, Inclusion).
The OPCC and GPCCAP were selected for comparison because they represent "diverse" cases (Gerring 2007). The OPCC is a prize-giving competition: it is both relative and absolute in its comparison, as it assesses cities' efforts to align with an absolute quantitative target (the 1.5°C target) as well as their performance relative to one another. The GPCCAP, on the other hand, sets qualitative benchmarks against which cities are evaluated. Additionally, the OPCC is driven by policy outcomes whereas the GPCCAP seeks to improve policymaking. By comparing the OPCC and GPCCAP, we hope to tease out how these initiatives might potentially support (or undermine) urban transformation. Diverse case selection enables more exploratory analysis (Seawright and Gerring 2008), allowing us to weigh how variations in the practice of benchmarking might shape transformative change dynamics. With limited existing research on the transformative potential of benchmarks, this paper generates foci for future research rather than generalizable results.
Our analysis of the initiatives began with a document review, which included both publicly available and internal documents related to the GPCCAP and OPCC. 2 Following this, the lead author conducted 16 semi-structured interviews with officials at UN-Habitat, City of Glasgow, City of Lemon Grove, San Diego State University, and Second Nature. These occurred between July 2019 and September 2020 either in person (in Nairobi, Kenya) or online (over Zoom). Interviews with UN-Habitat officials used the four-part framework introduced in the previous section (focusing on changes to processes, outcomes, systems, and structures) to guide questions and help understand the direct and latent effects of the GPCCAP. Interviews with non-UN-Habitat officials related to GPCCAP (Lemon Grove, San Diego, Glasgow, Second Nature) were more informational in nature since these individuals were speaking as users of the GPCCAP benchmarks and as such were not familiar with the overall functioning of the GPCCAP initiative.
This paper is also a transdisciplinary collaboration between the co-authors of this paper: two researchers and five practitioners from the WWF. In addition to drawing on the expertise and observations of the WWF practitioners (who also relayed comments and critiques from municipal officials participating in the competition), the partnership was an opportunity for the WWF authors to consider the transformative potential of the OPCC. Guided one-on-one conversations were the key mode for reflection. Typically lasting over an hour, these conversations, like the conversations with the GPCCAP officials, worked with the four-part framework of transformation to consider the direct and latent changes which the OPCC may have induced. Key insights were then collectively generated and "tested" in a larger group conversation that occurred over Zoom in September 2020 and included nearly 30 WWF officials.
Combining more traditional interviewing with co-productive, reflexive methods was largely a pragmatic choice made based on the availability and capacity of the practitioners involved in each case. However, the variation in methods used across the two cases was revealing, highlighting how transdisciplinary collaborations between researchers and practitioners can drive more solution-oriented research, especially when the analysis is done in situ (e.g. while the OPCC competition is ongoing and can benefit from the insights generated) rather than ex post (e.g. after the GPCCAP's publication) (Lang and Wiek 2022).
The following discussion draws on this mix of practitioner experience and observation, analysis of written materials, and first- and second-hand accounts from users of the GPCCAP and OPCC to understand the potential of these benchmarking initiatives to support urban transformation.

Discussion: identifying the transformative potential of the OPCC and GPCCAP
Following the framework introduced in Section 2, this section considers whether and how the GPCCAP and OPCC might catalyse transformative change, analysing the initiatives in terms of their effects on outcomes, processes, systems, and structures.

Transformation through outcomes
One way to assess the outcomes of the OPCC and GPCCAP is by evaluating the number of cities utilising them. There is a sharp difference in uptake between the initiatives: over 700 cities have competed in the OPCC while fewer than a dozen cities have explicitly reported using the GPCCAP. The degree to which the GPCCAP has been utilised is harder to assess since there is no dedicated reporting platform. Moreover, the nature of the principles is such that cities can reference and implement them without working with or notifying UN-Habitat. That said, four cities have actively worked with the principles: Glasgow, Scotland; Vilankulo, Mozambique; Lemon Grove, USA; and Vancouver, Canada. The cities, which greatly vary in size and economic, social, and environmental needs, used the Principles at different stages of their policy planning processes. Glasgow used the Guiding Principles to reflect on existing and forthcoming policies. Lemon Grove used them to stimulate thinking about climate planning in the future. Vilankulo used them as a part of a broader resilience planning exercise. Finally, Vancouver, which utilised the GPCCAP independent of UN-Habitat facilitation, made the principles the basis of its Climate 2050 Strategic Framework.
The difference in uptake between the two initiatives is attributable to more than simply their inherent differences. The OPCC's success in uptake is also linked to its "fit" within the existing governance regime in terms of both its targets and its reporting mechanism. The OPCC is aligned with the Paris Climate Agreement's 1.5°C target. Moreover, participation in the OPCC is enhanced by linkages to existing reporting systems, including the Common Reporting Framework of the Global Covenant of Mayors (GCoM) and CDP-ICLEI TRACK. These synergies streamline the reporting process in order to enhance participation in climate action. In other words, the barriers to entry into the competition are low, or at least lower than if cities had to gather new data or disclose to new platforms in order to participate in the competition. 3 In contrast, the GPCCAP is less explicitly crafted to "fit" with global goals. For one, the GPCCAP was launched in 2015 prior to the Paris Agreement being finalised and before the release of the SDGs: consequently, it has no links to these formative agendas. An official from Glasgow noted this as a key limitation because Scotland's national performance framework is linked to the SDGs, and local authorities must demonstrate how their agreements meet the national frameworks. Additionally, in contrast to the OPCC, the GPCCAP is a more independent set of benchmarks. Though developed in collaboration with, and endorsed by, 45 partners, including CDP and ICLEI, the GPCCAP is not integrated into any other reporting platforms or toolkits (though it was consulted as a part of the OPCC's development). Furthermore, a number of organisations and user-cities critiqued the GPCCAP Assessment Toolkit as being too long and complex. In 2019, UN-Habitat launched a review and update of the GPCCAP (GPCCAP 2.0), and these were noted as key areas for improvement.
Criticisms of the GPCCAP's lack of harmonisation with global goals and reporting platforms, and of the complexity of its toolkit, suggest the importance of taking reporting burdens into account when aiming to encourage benchmark uptake.
That said, crafting benchmarks to encourage uptake can also be a double-edged sword. The pressure to "fit" benchmarks into a broader landscape of governance, as well as the incentive to bundle benchmarking practices with other governance efforts, can unintentionally support status quo biases in benchmarking. This is because a "high-fit" benchmark will only be as transformative as broader governance goals: such benchmarks are unlikely to set new or transformative goals and are instead limited by broader targets. The strategic choice to link the OPCC to CDP-ICLEI TRACK represents such a tension: in supporting and creating synergies with existing global reporting efforts, the OPCC has voluntarily limited its approach to assessing cities, opting to work within the constraints of the existing CDP survey rather than ask new questions or take new approaches to assessing climate action. In practice, this means that the OPCC cannot assess elements of urban climate and environmental action that are not covered by the CDP-ICLEI TRACK questionnaire, which focuses on mitigation, adaptation, and vulnerability. Consequently, it may not be possible to include information on other areas of action that are relevant to urban transformation and important to the WWF (such as biodiversity).
In contrast, while the standalone form of the GPCCAP has undermined its uptake, it has also enabled experimentation. In addition to its application in city planning, the GPCCAP has been used by academic institutions. In 2015, faculty and students at San Diego State University (USA) adapted the GPCCAP into a toolkit for university climate action plans: the Guiding Principles for Campus Climate Action Planning. The campus-level toolkit then drew the interest of Second Nature, a US NGO committed to accelerating climate action across higher education institutions. The organisation has since shared the Assessment Toolkit with over 30 universities, and according to one Second Nature official, the toolkit has been especially valuable for universities just embarking on climate action planning. Thus, while the GPCCAP's disconnection from reporting platforms and its more detailed and complex toolkit have limited its uptake in terms of overall numbers, these same qualities have enhanced its ability to be creatively used at varying stages of climate planning and across a diverse set of actors beyond municipalities.
Assessing the transformative potential of the OPCC and GPCCAP through their outcomes highlights a key tension in benchmarking. On the one hand, conceptions of transformation which emphasise the scope of change would look to the uptake of benchmarks as an important metric for success. On the other hand, the GPCCAP's experimental adaptation highlights how "uptake" and usage may not be straightforward and how the outcomes of benchmarks may be diffuse and unintended. Usage should therefore be considered both quantitatively, in overall numbers, and qualitatively, in terms of how and among whom the benchmarks are most useful.

Transformation through processes
In addition to transformation through outcomes, both initiatives highlight how the practice of benchmarking may support transformative change. Both provided opportunities for organisational learning, shifted organisational practices and forged new political networks. While existing literature emphasises city learning that can occur through benchmarking, the following examples highlight alternative sites and modes of learning not yet considered by researchers.
First, use of the GPCCAP during a resilience planning exercise led to changes in UN-Habitat's approach to resilience planning. In March 2016, the Guiding Principles were used during a City Resilience Action Planning (CityRAP) exercise in Vilankulo, Mozambique. In addition to the particular insights the Guiding Principles Assessment yielded regarding Vilankulo's resilience plan, the exercise also provided an opportunity for UN-Habitat officials to reflect on the CityRAP approach to resilience, ultimately leading to the recognition that climate issues should be mainstreamed into the CityRAP tool (UN-Habitat 2016, 12). Thus, in addition to identifying ways to improve planning at the local level, the practice of utilising the GPCCAP also changed the organisation's approach to planning in other workstreams. Through applying the GPCCAP, UN-Habitat became aware of blind spots in its resilience approach and of the need to integrate a holistic view of resilience which includes broader environmental pressures as well as systemic drivers of unequal vulnerabilities.
The OPCC process is also a mode through which WWF as an organisation can learn. Specifically, the WWF, through the OPCC, learns about city needs regarding climate and environmental governance. In the 2019-2020 OPCC competition, cities progressing to the second round were eligible for "deep dive assessments": in-depth analyses to see whether candidate cities are meeting or fulfilling the plans, targets, and inventories they submitted to the competition. The assessments' stated purpose was to evaluate candidates and provide feedback to cities as well as to the expert panel of judges for the OPCC. However, the OPCC deep dive assessment was also a "door opener" for closer relationships between candidate cities and the WWF. The assessment familiarised the WWF with new cities (their strengths, needs, and areas for improvement regarding climate policy) and served as a gateway to their greater involvement with the global organisation. 4 This relationship building is especially important for the organisation as it broadens its work beyond wildlife conservation and into urban climate action. Notably, networking is not necessarily transformative in and of itself, though it can enable transformative change. For the WWF, a future ambition is to facilitate links between OPCC cities and the Cities Climate Finance Leadership Alliance (CCFLA) in order to generate material support for solving the city challenges identified through the OPCC process.
These examples highlight how benchmarking can be a tool for organisational learning, rather than just city learning. This suggests a need to widen our analyses to consider more indirect modes through which benchmarking may be transformative, such as organisational change and the forging of new and meaningful political networks.

Systemic or structural transformation
In addition to changing outcomes and processes, benchmarking may support transformative changes to systems and structures. Systemic change entails technological, institutional, and cultural shifts which can "un-lock" systemic path dependencies. Structural change further emphasises the redistribution of social and political power in ways which challenge the underlying drivers of vulnerability and unsustainability. Despite differences in their underlying conceptions of justice, systemic and structural transformation are often joined in practical terms in the context of policymaking. As such, they are discussed together in this section.
Both the OPCC and GPCCAP have elements which target systems and structures. The GPCCAP has two principles explicitly aimed at structural change. Principle 2 emphasises inclusivity, stating that planning processes should involve "multiple city government departments, stakeholders and communities (with particular attention to marginalized groups), in all phases of planning and implementation" (UN-Habitat 2015, xii). Principle 3, "fair", encourages governments to "seek solutions that equitably address the risks of climate change and share the costs and benefits of action across the city" (UN-Habitat 2015, xii). Together, the emphasis on representing marginalised communities and the focus on the distributive implications of climate policy target the underlying social, political, and economic structures that produce climate vulnerability, at least at the city level. The emphasis on systemic transformation is more limited in the GPCCAP. Beyond being built around the 2°C limit on global warming, the principles do not prescribe specific targets related to technological systems. Institutional changes to policymaking practice make up the bulk of the GPCCAP's systemic focus: the principle of "comprehensive and integrated" encourages local planning which overcomes policymaking silos. Moreover, the principle of "ambition" highlights the importance of ratchet mechanisms and iterative policy design in order to enhance the transformative potential of urban climate policies.
The OPCC aims for both systemic and structural transformation, though systemic change is the central focus of the competition. Because the OPCC is aligned with the 1.5°C goal, cities are primarily assessed on their mitigation targets, though they may target different urban systems in pursuing these goals. The 2020 OPCC global winner, Mexico City, is reducing emissions in its transport sector by building a bus rapid transit system, expanding its cycle network, and deploying electric buses and bikes. Uppsala, Sweden, the 2018 OPCC international winner and 2020 national winner, emphasises not only sustainable transport but also aims to reach zero emissions in the heating sector (with a target for climate neutral/climate compensated heating by 2020) and to cut emissions from the construction industry by focusing on building materials (Uppsala Kommun 2020).
However, beyond simply evaluating cities' emissions reductions, the OPCC adds three dimensions to its assessment that enhance its focus on structural transformation and avoid depoliticising key distributional questions about the benefits and burdens of climate action. First, it considers not only emissions reductions targets but also the procedures for designing and implementing those goals, encouraging cities to consider key questions of representation and the distributional impact of policy goals. Second, the OPCC uses the Human Development Index (HDI) to adjust mid-term emissions reductions targets to reflect variation in levels of development. According to WWF, the result is that "prescribed 2030 targets range between 25-65% reductions depending on development levels as determined by the HDI" (WWF 2019, 8). This adjustment adds nuance to typically homogeneous emissions reductions goals and captures the spirit of "common but differentiated responsibilities", a principle rooted in concerns about equity and historical responsibility for emissions. Third, the OPCC encourages cities to evaluate their consumption-based emissions, that is, emissions associated with goods and services imported into a city. In bringing consumption-based emissions into focus, the OPCC compels cities to take a more holistic approach to their emissions footprints whilst reframing emissions in heavily industrialised cities, onto which emissions burdens are often unfairly exported. The emphasis on consumption-based emissions raises key questions about who is responsible for emissions and who should be held most accountable for action: essential questions which target the underlying economic structures and link emissions to processes extending beyond the bounds of a city.
These examples highlight how, even within the context of narrowly quantitative emissions reduction targets, benchmarks can reach towards both systemic and structural forms of transformation. The focus on policy and planning processes emphasises the importance of representation and inclusion within urban climate governance, while the HDI adjustment and the introduction of consumption-based emissions aim to assign responsibility for climate action more equitably across diverse sets of cities. Notably, all three of these analyses are supplemental parts of a city's submission to the OPCC, occurring outside the CDP-ICLEI TRACK questionnaire. This suggests that rigid forms of benchmarking may be limiting, and highlights how difficult it is to capture both systemic and structural transformation dynamics within the bounds of existing disclosure platforms. More free-form evaluative practices may open avenues for meaningful reflection and learning.
Both sets of benchmarks take substantive steps towards structural, rather than just systemic, transformation in the goals that they set. However, in practice both initiatives struggle with structural issues of access and equity, raising questions of justice around who this mode of governance benefits and who it may exclude.
First, while the GPCCAP are framed broadly enough to capture the diversity of cities potentially using them, cities new to climate action planning, especially in non-OECD countries, tend to perform poorly when assessed with the GPCCAP Assessment Toolkit. This is a downside of the Toolkit as it is currently formulated: its simple traffic-light assessment (red = weak compliance, yellow = partial compliance, green = full compliance) has been criticised as potentially demotivating for cities, as "all-red" scorecards bluntly indicate how far off the mark a city might be relative to global standards.
Second, for the OPCC, data issues remain a key barrier to wider city participation. This is especially the case among cities in low-income countries, since few have urban observatories to track and organise city data. Two Indian cities that have consistently participated in the OPCC, Pune and Rajkot, have been able to generate the necessary data with support from several organisations, including ICLEI and WWF. However, other Indian cities have faced issues with poor data sharing across municipal departments or with missing information (though many are making progress on this front year by year). Cities also face compatibility issues when disclosing to the OPCC. Because vulnerability assessments follow a wide range of models, existing data does not always fit the questions posed in the assessment framework. Moreover, while a city may have an emissions inventory, it might not be detailed or comprehensive enough to set targets for both Scope 1 (direct emissions occurring within the city boundary) and Scope 2 (indirect emissions from grid-supplied energy), and the city will therefore fall short of OPCC reporting requirements. Disparities in data availability often map onto divides between wealthy and low-income countries, with cities in the latter more likely to face issues with inadequate or ill-suited data. If the benefits of participating in the OPCC (from WWF's tailored policy feedback to its support in developing public awareness campaigns and potentially accessing future revenue streams) are only available to cities with the capacity to compete, then this represents an important distributional issue, as only a subset of cities can access the rewards of the competition. This highlights how, at a global scale, the practice of benchmarking may unintentionally widen inequalities between cities (Elgert 2018) and undermine dynamics of structural transformation.

Conclusions
This paper considers the transformative potential of benchmarking as a mode of governance. It analyses the WWF's One Planet City Challenge (OPCC) and the UN-Habitat's Guiding Principles for City Climate Action Planning (GPCCAP) using a four-part framework of transformation and considers how the two initiatives may support transformative change to processes, outcomes, systems, and structures.
Four key conclusions emerge from our analysis. First, the difference in uptake between the OPCC and GPCCAP, with the OPCC enjoying much wider usage, highlights the appeal and ease of use of benchmarks which "fit" with existing governance goals and systems and therefore lower reporting burdens on cities. However, the creative adaptation of the GPCCAP also suggests that independent benchmarks, while less easily usable, might enable more experimental usage and can be useful for particular issues (in this case, early-stage policy planning). This highlights a key tension: the incentive to "fit" with the existing governance landscape may produce initiatives that reproduce rather than transform cities, while more standalone benchmarking practices might encourage experimentation but risk irrelevance, minimising their overall transformative potential. There is no easy way to overcome this tension, but the OPCC case demonstrates one approach to mitigating it. By pairing the benchmarking exercise with more fluid and qualitative follow-up assessment, the OPCC aims to deepen its focus on questions of justice and equity, going beyond the strictures of the CDP-ICLEI TRACK questionnaire and encouraging cities to consider questions of representation in planning and the distribution of climate policy costs and benefits. In other words, diversifying the approach to governance by using a broader suite of assessment tools and approaches allows WWF to govern for qualities not necessarily featured in existing global goals and targets.
Second, in addition to acting as a learning tool for user cities, benchmarking can be a key tool for organisational learning and networking. UN-Habitat, for example, used the lessons of applying the GPCCAP to rethink its approach to resilience planning, while the OPCC gave WWF an opportunity to learn about cities' key issues related to climate change and subsequently helped open doors to further collaboration between WWF and OPCC cities on these issues. In this sense, benchmarking as a practice facilitated space for both reflection (on UN-Habitat's part) and connection (between WWF and cities). While such changes are not necessarily transformative in a physical or technical sense, Maassen and Galvin (2019) suggest that changes to institutional structures and routines should be considered among the varieties of possible transformations. Although their work refers to city institutions in particular, we suggest that shifts within global organisations should also be included, as changes within and among these organisations can also alter agendas, institutional arrangements, and governance structures. Future research should consider organisational learning and networking as potential mechanisms through which urban transformation may occur.
Third, the OPCC and GPCCAP provide interesting cases of benchmarks in which structural, rather than just systemic, change is encouraged. Existing literature demonstrates that systemic change is over-represented in climate benchmarking. This is because policy goals often target technological systems, such as energy systems, and benchmarking is well suited to numeric targets like emissions reductions or energy mixes (Kuzemko 2015). However, the GPCCAP and OPCC show that it is also possible to govern for structural change through benchmarks, despite the challenge of distilling these issues into quantified form. In both initiatives, a focus on structural transformation was facilitated through qualitative targets and/or principles related to inclusion, planning participation, and vulnerability assessments. The focus on policy planning processes, in particular, provided a fruitful avenue for confronting issues of representation, inclusion, fairness, and justice. However, neither initiative was able to develop metrics through which to measure or assess structurally transformative outcomes. In other words, structural transformation aligns more naturally with means-oriented processes than with outcome-oriented targets. This underscores the need to diversify approaches to assessment, coupling narrower benchmarking efforts with alternative modes of assessment.
Finally, our analysis finds that the transformative potential of benchmarking as a mode of governance may be undermined by barriers to participating in benchmarking in the first place. Especially among cities in low-income countries, the urban data and technical capacity required to engage with global benchmarks may be limited. This risks making benchmarking, and the material and political benefits that can come with it, a privilege that too few cities can access.
Based on these findings, we make two recommendations for enhancing the transformative potential of benchmarking. First, benchmarking in its strictest sense is hard pressed to encourage structural transformation, as the expressions of such transformation tend to be entangled in the particularities of a city's context. However, as the OPCC and GPCCAP show, structural transformation can be supported through more qualitative benchmarks, such as the best practices for policy planning embedded in both frameworks. More research is required to develop better tools for governing towards structurally transformative policy outcomes rather than just structurally transformative processes. Second, in practical terms, material support for data collection is essential if the benefits of benchmarking are to be accessed and enjoyed by the widest possible cross-section of cities. Without supporting cities in collecting and sharing their data, there is a risk of further widening the gap between urban climate leaders and newcomers.
As previously noted, this paper is a collaborative effort between academics and WWF officials. Because this reflexive analysis was done in situ, its findings have shaped the OPCC's future direction. The OPCC is currently being redesigned with an emphasis on supporting transformative processes (rather than outcomes), highlighting the importance of encouraging inclusive, fair, and participatory planning processes as cities pursue the global 1.5°C target. Additionally, WWF is actively trying to lower the barriers to participating in benchmarking by providing enhanced scientific, technical, and operational support to cities. Overall, researching and writing this paper provided a key opportunity for the co-authors to situate the OPCC within the field of climate benchmarks and clarify how it can strategically contribute to urban transformation. Beyond these outcomes for the OPCC, writing this paper was also a unique opportunity to open up the "black box" of benchmarking and learn about the organisational processes, dynamics, personalities, and relationships which produced these tools. Critiques of benchmarking argue that these tools concentrate policymaking power in the hands of a technocratic elite and push us towards a post-political consensus by precluding debate on important normative questions. Bearing these critiques in mind, it is essential that we foster these kinds of collaborative relationships and convene more reflective conversations which include even wider cross-sections of participants, opening up space for debate, critique, and dissent before and throughout the life of a benchmark.
The key takeaway from our analysis is that there is a wide horizon of possibilities for benchmarks, and they can be crafted in ways that support transformative qualities in cities. Future research should continue to tease out the direct and indirect transformative dimensions of benchmarking and consider its alternative contributions, including prompting reflection, forging connections and partnerships, and promoting local and organisational learning. Benchmarking is clearly an important, if imperfect, tool of governance. If benchmarks are to contribute to urban transformation, they must feature and govern towards qualities that enable just and transformative change.
Notes
1. For a full list of the questions asked in the OPCC initial assessment, see the online supplemental material.
2. For the OPCC, these included its new assessment framework and related methodology, and promotional and internal documents, including the database of city results for the 2019-2020 OPCC competition. For the GPCCAP, key materials included the primary document outlining the GPCCAP, the GPCCAP Assessment Toolkit, city reports (for Glasgow, Scotland; Vilankulo, Mozambique; and Lemon Grove, California), internal meeting minutes, a version of the GPCCAP Assessment Toolkit adapted for university campuses, and a draft assessment of San Diego State University's climate action planning based on the adapted toolkit.
3. Even so, and despite additional reporting support from WWF country offices, many cities still struggle to overcome key barriers to disclosure such as a lack of technical resources, challenges with data availability, a lack of familiarity with disclosure practice, and language issues.
4. The deep dive assessments were not a part of the 2021-2022 OPCC competition and will not be included in the 2023-2024 competition.

Disclosure statement
Emma Lecavalier and Harriet Bulkeley declare no conflict of interests. Tabaré Arroyo-Currás, Carina Borgström-Hansson, Saurav Chowdhury, Jennifer Lenhart and Suchismita Mukhopadhyay were all employed by WWF at the time of writing. Their employer had no role in the design of the study; in the analysis or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Funding
This project has been funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 730243.