Concepts of performance in post-occupancy evaluation post-probe: a literature review

ABSTRACT Building performance is a widely held goal in the architecture, engineering and construction industries, driven by a shared pursuit of the triple bottom line. This research paper re-examined the term ‘performance’ and its characterization in post-occupancy evaluation (POE) literature using a semi-systematic review of 160 articles published since 2008. The review identified how performance parameters have been defined, what the dominant attributes of studies are and what metrics have been used to measure them. A thematic content analysis found that many new priorities had emerged in recent years, problematizing Preiser et al.’s 1988 construct of the concept. The main contribution of this paper is a new expanded definition of ‘performance’ in terms of three interrelated domains: building, people and organization, and the development of subcategories for more nuanced analysis. This definition builds on the building performance-people performance paradigm first established by the UK’s PROBE initiative and responds to several shifts in thinking the review results revealed, including a shift from deterministic thinking towards a more bidirectional understanding of the person-environment relationship. Results were further distilled into recommendations to be used by researchers, practitioners and policymakers to identify performance areas of interest and develop more adaptive, integrated approaches to POE work.


Introduction
Performance is a prevailing theme in the building industry, driven, in recent decades, by a shared pursuit of the triple bottom line. This has generated renewed interest (Bordass & Leaman, 2015; Calderon & Froese, 2017; Faezah et al., 2021; Robertson & Mumovic, 2014) in post-occupancy evaluation ('POE') to determine the social, environmental and economic outcomes of different building attributes.
POE is a major component of Building Performance Evaluation ('BPE') conducted to systematically measure actual and perceived project outcomes, compare results to anticipated project outcomes and determine if the design, as built and in use, meets its qualitative and quantitative objectives (Hadjri & Crozier, 2009). POE exists in the context of an overwhelming apathy around learning from completed projects, and in particular occupant behaviour and building intelligence as they become increasingly influential in the performance outcomes of sustainably designed buildings. Despite more advanced understandings of building behaviour, better data collection methods, more sophisticated consultation, coordination and construction processes, and better quality, higher functioning systems, materials and assemblies, it remains impossible to accurately simulate and account for every variable that may impact a building's performance, especially in use (Tuohy & El-Haridi, 2016).
The difference between performance objectives and outcomes is widely known as the 'performance gap'. The performance gap may appear in one or more ways: between modelled predictions and measured outcomes, between expectations and lived experiences, and/or between measured outcomes and lived experiences; these have been termed the predictions gap, the expectations gap and the outcomes gap respectively. The performance gap remains a common issue in high-performance building. When used constructively, however, the performance gap can offer lessons to feed back into building operations and future phases of work and/or feed forward to new projects. Findings may also contribute to wider building science research and inform evidence-based design.
A select few organizations, typically government agencies, have mandated POE for their works (Government of Yukon, 2021; NSW Health, 2010; Queensland Government, 2021). More are looking at ways to introduce POE meaningfully into their development framework (Alberta Infrastructure, 2014; Galasiu et al., 2019; Hay et al., 2016). It is important, therefore, to build a knowledge base on what constitutes performance and, indeed, high performance, and the ways in which it can be measured in complete and comprehensive ways. As performance demands intensify, new strategies and systems are designed to meet these demands, and the building industry becomes more comfortable collecting, analysing and sharing building data, BPE and POE will continue to gain traction.
The concept of performance, however, is an ephemeral construct. There is a tendency, given the green building movement, to associate building performance with energy efficiency and conservation. A building's performance, however, is broader than its carbon footprint. How it is defined, in what ways it is measured and by what means, varies from study to study, project to project and person to person across time and space. Whereas an abundance of research on POE and its applications has been published in the last several years, as yet there is no comprehensive, analytic review of performance as a concept in this body of research. Without a more critical understanding of the performance paradigm, it is difficult to meaningfully compare and compute trends in building assessment and identify new priorities and perspectives in person-environment research.
This paper presents the results of a semi-systematic review of the performance concept as related to POE and based on its findings, offers a new framework from which to understand and relate the different attributes of study and their indicators. It builds on the work of recent studies including Li et al.'s (2018) comprehensive and critical review of POE projects and protocols, current research objectives and potential areas of interest, examining a principal shortcoming and challenge of current POE: the inconsistency of study parameters that limits the feedback-feedforward cycle.

Background
The concept of performance
Preiser, Rabinowitz and White released their book, Post-Occupancy Evaluation (1988), providing a complete overview of POE, the theory behind it, its evolution in industry and pragmatic approaches for future studies. In this book, the authors specifically address the concept of performance. They describe its composition in terms of its technical elements, functional elements and behavioural elements. They also outline, generally, the quantitative and qualitative data collection tools used to measure them.
Many researchers, including Preiser, have since revisited the ideas of Post-Occupancy Evaluation (1988), publishing more up-to-date information on the topics covered therein. Together with Mallory-Hill and Watson, for instance, Preiser coauthored Enhancing Building Performance (2012) with a detailed history of POE and BPE. Bordass and Leaman (2015) and Clements-Croome (2019) also explore the history of POE and posit new ideas to sustain its future outside academia. Stevenson (2009) Preiser et al.'s (1988) concept of performance, however, has yet to be re-examined. Its characterization, as such, does not necessarily reflect the current research context, including easier and more advanced methods of data collection, and our evolving understanding of the person-environment relationship. Earlier work fails to distinguish, for example, the tangible and intangible qualities of certain performance elements, or to consider new and emerging ideas around interactive adaptability, or to allow for a user-centered approach to performance assessment. In particular, current literature does little to challenge current definitions of 'performance', thereby leaving little room for lateral thinking about how and why high-performance buildings largely fail to meet their performance targets.

Performance and the PROBE Studies
The PROBE (Post Occupancy Review of Building Engineering) project is considered seminal in the development of POE (Baird, 2001; Brown & Gorgolewski, 2014; Deuble & de Dear, 2014). The project ran from 1995 to 2002, jointly funded by the UK Government and The Builder Group. It produced 23 case studies, developing a standard, well-honed POE protocol along with a database for benchmarking future POE studies (Cohen et al., 2001). The PROBE protocol coupled occupant questionnaires (the Building Use Study or 'BUS') with the physical testing of indoor conditions and energy monitoring (Bordass et al., 2001a; 2001b). Although not new in its methods, PROBE presented a balanced consideration of perceived performance outcomes and measured performance outcomes, refocusing the idea of performance around human responses to technical issues (Bordass & Leaman, 2005; Brown & Gorgolewski, 2014; Stevenson, 2009).
The collective set of results was used to identify common performance issues in and across case study buildings. The case study buildings were non-domestic and considered to be high performing and/or examples of best practice. Their technical performance issues were published in a series of articles for Building Services Journal (CIBSE, n.d.). Despite the journal's relatively limited circulation, the series helped create a collective awareness of chronic industry-wide issues. And although the project did receive some criticism, the direct, transparent documentation of PROBE made it a critical point of reference for subsequent POE work (Loftness et al., 2009; Meir et al., 2009; Stevenson, 2009).
PROBE literature continues to be cited in peer-reviewed literature for its introspective contributions to POE research. The PROBE survey component, moreover, has become one of a select few standard POE tools that has transcended its project of origin and is still in regular use today. In collecting self-reported data on occupant health and productivity along with perceived satisfaction, the Building Use Study ('BUS') survey introduced and popularized the building performance-people performance paradigm in POE.

The post-PROBE era
Despite the documented merits of the PROBE project, its funding ended in 2002 and, for several years after, interest in POE research noticeably stagnated. Industry attention turned to process rather than outcomes (Bordass & Leaman, 2015), likely because of a convergence of new and emerging preoccupations: the monopolization of building information modelling, the expansion of the green building movement, the popularization of evidence-based design, increasing interests in occupant health and wellbeing, and the developing consciousness around equity, diversity and inclusion. By 2008, however, there was renewed interest in evaluating the outcomes of design decisions made in response to these preoccupations and, by extension, POE (Bordass & Leaman, 2015). Most POE studies since then, however, have been conducted as one-offs (Li et al., 2018); few have achieved a scope comparable to the PROBE series. PROBE benefited from the time, resources and permissions necessary to meet its objectives. These, however, are persistent barriers to POE in most projects, along with issues of liability, responsibility and a lack of incentive (Durosaiye et al., 2020; Hadjri & Crozier, 2009; Leaman et al., 2010; Roberts et al., 2019; Sanni-Anibire et al., 2016; Stevenson, 2009). Moreover, no standard POE protocol exists to date, nor are there industry standards for conducting POEs.
Several researchers have compared existing and evolving POE methodologies (Brambilla & Capolongo, 2019; Calderon & Froese, 2017; Gossauer & Wagner, 2013). Many others have reviewed advancements in collecting and interpreting performance data (Clements-Croome, 2019; Galatioto et al., 2014; Heinzerling et al., 2013; Olivia & Christopher, 2015; Roberts et al., 2019). No published review, however, was found exploring the recharacterization of performance in the years since Preiser et al.'s (1988) Post-Occupancy Evaluation. The value of this paper is its attempt to capture and redefine performance in terms of the latest POE work and the building performance-people performance paradigm first established in PROBE, and to establish a framework to help guide future POE studies in identifying performance areas of interest.

Methodology
The search
The goal of the search was to identify relevant peer-reviewed journal articles published since 2008 on this topic. As shown in Figure 1, bibliographic research for the literature review was conducted in three phases using a semi-systematic method (Snyder, 2019).
The first phase comprised a free text search via two web-scale search services, Google Scholar and Summon Index, to establish key authors, journals, and databases within the timeline of interest and extract recurring key words and related reviews, if any. The free text search also served to inform preliminary exclusion criteria.
The second phase comprised a structured database search of six databases (JSTOR, PubMed, ProQuest, Avery, Scopus and Web of Science) considered to have the largest collections of information related to each of the major related disciplines: architecture and design, building science, environmental science, environmental psychology, human behaviour and human health. Although no geographical constraints were set, search parameters limited results to English and French language peer-reviewed journal articles and conference proceedings published from January 2008 to February 2022. The year 2008 was chosen as it coincides with the documented upsurge in POE post-PROBE described above. This search was conducted using 'post-occupancy evaluations' (and its alternates: POEs, post-occupancy study, post-occupancy studies, building-in-use study, buildings-in-use studies and building performance assessment) as the constant and pairing it with one or two additional terms from a total of five other key words to achieve a series of search permutations: 'building performance', 'performance gap', 'design quality', 'occupant satisfaction' and 'wellbeing'. These Boolean parameters respond to an unfortunate reality in which two key words cast a broader net than is generally preferred but more than two key words yielded too few results. Where possible, given the differences between databases and database search tools, search parameters were set to filter out articles in which key words did not appear in either the title or the abstract.
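The pairing of the constant term with one or two additional key words can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the generic AND/OR query syntax are assumptions, each database has its own search syntax, and unordered pairings are used here because the order of terms does not change a Boolean AND query.

```python
from itertools import combinations

# Constant term and its alternates, as listed in the text.
POE_TERMS = [
    "post-occupancy evaluations",
    "POEs",
    "post-occupancy study",
    "post-occupancy studies",
    "building-in-use study",
    "buildings-in-use studies",
    "building performance assessment",
]

# The five additional key words named in the text.
KEYWORDS = [
    "building performance",
    "performance gap",
    "design quality",
    "occupant satisfaction",
    "wellbeing",
]


def build_queries(poe_terms, keywords, max_extra=2):
    """Return one Boolean query string per pairing of 1..max_extra key words.

    Hypothetical helper; the query syntax is generic, not database-specific.
    """
    constant = "(" + " OR ".join(f'"{term}"' for term in poe_terms) + ")"
    queries = []
    for size in range(1, max_extra + 1):
        for combo in combinations(keywords, size):
            extras = " AND ".join(f'"{kw}"' for kw in combo)
            queries.append(f"{constant} AND {extras}")
    return queries


queries = build_queries(POE_TERMS, KEYWORDS)
print(len(queries))  # 5 single-term pairings + 10 two-term pairings = 15
print(queries[0])
```

Under these assumptions, five key words taken one or two at a time give 15 distinct queries per database.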
Database search results were imported into reference management software where duplicates were removed. They were then subjected to title and abstract review (Review 1, n = 242). Per the preliminary exclusion criteria, articles in which (i) the focus was larger than building-scale (i.e. neighbourhood); (ii) performance appraisal was not related in any way to the built environment; (iii) POE/BPE was not the main subject of the article; and (iv) the full text was not available were eliminated.
During the title and abstract review, it was determined that the preliminary exclusion criteria were not restrictive enough given the varied nature of POE research and yielded articles that did not serve the research objective at hand. Therefore, a second set of exclusion criteria was added, and a second title and abstract review was conducted (Review 2, n = 156). Per the refined exclusion criteria, articles were eliminated in which (i) POE/BPE is discussed exclusively in the context of the design stage; (ii) POE/BPE is discussed exclusively in the context of data analysis and/or visualization; (iii) POE/BPE is discussed exclusively in the context of data storage, sharing and/or retrieval; or (iv) POE/BPE is discussed exclusively in the context of architectural practice and/or education (this being the subject of a future review by the author). Note, articles referencing the same study but presenting a different subset of results were also counted as one.
All literature that met the inclusion/exclusion criteria as defined then underwent full reading and review (Review 3), producing a full set of articles for analysis (n = 137). Of these articles, publications that considered three performance domains (building, people and organizational) were deemed highly relevant and subject to a citation (snowball) search, this search being the third phase. Additional literature of interest that had not yet been uncovered in Phases 1 and 2 and met the inclusion/exclusion criteria was obtained in full and added to the register for reading, review (Review 4) and analysis, producing the final set of articles (n = 160).
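The counts reported across the four reviews can be tallied as a simple screening flow. The sketch below is illustrative only; it assumes each reported n is the number of articles in the register at that stage, and it treats the difference between Review 3 and Review 4 as the net number of articles added by the citation search.

```python
# Counts reported in the text for each screening stage.
stages = [
    ("Review 1: title and abstract review", 242),
    ("Review 2: refined exclusion criteria", 156),
    ("Review 3: full reading and review", 137),
    ("Review 4: after citation (snowball) search", 160),
]


def stage_changes(stages):
    """Return (label, count, change from previous stage) for each stage."""
    rows = []
    previous = None
    for label, count in stages:
        change = None if previous is None else count - previous
        rows.append((label, count, change))
        previous = count
    return rows


rows = stage_changes(stages)
for label, count, change in rows:
    note = "" if change is None else f" ({change:+d})"
    print(f"{label}: n = {count}{note}")
```

Read this way, the register shrinks by 86 and then 19 articles during screening, and grows by a net 23 articles in the third phase.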

The analysis
The analytical component of this review was two-pronged. A thematic analysis was performed to identify and synthesize themes across articles and extract key findings related to the concept of performance, establishing three overarching domains of performance related to building assessment. A content analysis was then used to identify, index, extract and report patterns and trends amongst the final set of articles (Grbich, 2007; Vaismoradi et al., 2013). Findings from these analyses were synthesized to understand their implications relevant to the research query and build a new encompassing framework from which to understand and relate the different elements of performance as they present in the literature today.

Classification of the literature
The final set of articles subject to analysis (n = 160) included 22 papers that were exclusively reviews, 13 papers that were exclusively theoretical and 125 papers with an applied case study component. The following section summarizes the main findings from a review of this literature, that is, the different characterizations of performance found in recent POE articles. Appendix A (Supplementary Material) is a classification of these articles based on their characterization of performance. Table 1 is a summary of these findings. It identifies three major performance domains: building performance, people performance and organizational performance. Each domain includes several categories, and each category includes a series of attributes. The resulting framework and its nomenclature are an evolution of Preiser et al.'s (1988) concept of performance aligned with current POE research. This framework is the principal contribution of this paper to the current body of knowledge and is unfolded in Section 5: Discussion.
This review focused exclusively on POE related to the built environment, so all 138 non-review papers considered at least one building performance attribute. As illustrated in Figure 2, 46 also considered people performance, 11 considered building performance and organizational performance and 22 considered at least one attribute from all three performance domains. Nearly one-third of studies (n = 41) report that they relied on a standard tool to conduct their performance assessment, the most dominant being an original or adapted version of the BUS questionnaire (n = 15), the survey used in the PROBE studies.

Building performance attributes
The search uncovered no apparent correlation between the year of publication and the popularity of a particular building performance attribute in research publications. The number of studies published since 2017 (n = 80), however, is equal to the number of studies published between 2008 and 2016 (n = 80), demonstrating the increasing popularity of POE research overall.
Among the non-review papers, only Agha-Hossein et al. (2013) studied at least one attribute in each of the five categories. Steinke et al. (2010) and Vischer (2009) discuss an all-inclusive assessment framework but do not actually test it. Twenty-eight (n = 28) papers consider attributes from four of the five categories (systems, ambient, environmental, functional and psychosocial). Thirty-five (n = 35) papers consider attributes from three of the five categories. Thirty-six (n = 36) papers consider attributes from two of the five categories. Thirty-six (n = 36) papers consider attributes from only one of the five categories. The most frequent coupling is between ambient and functional attributes (n = 64).
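Tallies of this kind (how often a category is studied, how many categories each paper covers and which categories are studied together) can be derived from a coded register of papers. The following is a minimal sketch under stated assumptions: the register entries are invented for illustration and are not the review's data, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded register: paper label -> building performance
# categories it considers. These entries are invented for illustration.
register = {
    "paper_a": {"ambient", "functional"},
    "paper_b": {"ambient", "functional", "psychosocial"},
    "paper_c": {"systems", "environmental"},
    "paper_d": {"ambient"},
}


def summarize(register):
    """Tally category frequency, categories per paper and category pairings."""
    category_counts = Counter()
    coverage_counts = Counter()
    pair_counts = Counter()
    for categories in register.values():
        category_counts.update(categories)
        coverage_counts[len(categories)] += 1
        # Sort so each pairing is always counted under the same key.
        pair_counts.update(combinations(sorted(categories), 2))
    return category_counts, coverage_counts, pair_counts


category_counts, coverage_counts, pair_counts = summarize(register)
print(category_counts.most_common(1))  # most studied category in the toy data
print(pair_counts.most_common(1))      # most frequent coupling in the toy data
```

Sorting each paper's categories before pairing them keeps 'ambient and functional' and 'functional and ambient' from being counted as two different couplings.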
As illustrated in Figure 3, ambient performance (n = 104) is the most studied across all building types except healthcare facilities, followed by functional performance (n = 87), psychosocial performance (n = 62), system performance (n = 51) and environmental performance (n = 30). However, as demonstrated in Table 1, not all attributes within these categories are studied equally. For example, layout/fit (n = 56) is studied significantly more than any other functional attribute. Aesthetic quality (n = 37) and privacy (n = 31) are studied significantly more than other psychosocial attributes, in part likely because of their intersections with functional attributes like fit/layout and appearance, respectively. However, the number of times studied is not a proxy for depth or rigour. In fact, many papers include attributes in their characterization of performance but do not report on them in their findings, likely because they are using a standard tool to conduct the POE (Altomonte et al., 2019; Best & Purdey, 2012; Brown & Gorgolewski, 2014; Day et al., 2020; Moore & Iyer-Raniga, 2019). Alvaro et al. (2016) is the only study to include all attributes under a single performance category (psychosocial) in its assessment framework. Indoor environment quality (IEQ) is repeatedly used in POE case studies (n = 50) to define a specific subset of performance attributes of interest: thermal comfort, indoor air quality, visual comfort, acoustic quality, fit/functional layout, safety and security, durability/appearance, maintenance and serviceability, privacy and aesthetic quality, drawing from all three building performance categories. These attributes are consistently studied together, although they may be reported on separately. Ambient IEQ is an abbreviated subset of attributes also found in the literature (n = 19), comprising only the ambient attributes listed above.
Despite this common definition of IEQ and its frequency of use as an assessment framework, there is no common method of evaluation or measure of performance.
As shown in Figure 4, in healthcare facilities studies, functional attributes (n = 31) and psychosocial attributes (n = 25) are more popular. This is likely because functional attributes and psychosocial attributes support the principles of patient-centered care. A small subset of papers looks strictly at building performance in terms of psychosocial attributes, ignoring technical and functional performance attributes altogether (Brown, 2018; Steele Gray et al., 2015; Watson & Whitley, 2017).

People performance
Nearly all of the non-review papers discuss people performance in their introduction, background and/or research rationale but only 39% actually measure it in the context of a POE. Sixty-eight (n = 68) studies among 138 non-review papers include at least one people performance attribute in their characterization and evaluation of performance. People performance is most often measured in offices using self-reported health and productivity scores. Overall, health (n = 32) is the most studied attribute followed closely by cognitive functioning (n = 29) and human behaviour (n = 29).

Organizational performance
Twenty-two (n = 22) studies among 138 non-review papers include at least one organizational performance attribute in their characterization of performance. Operational performance is the most frequently cited attribute, specifically workplace effectiveness (n = 20). It is typically measured using self-reported data from employees. Organizational performance is most frequently assessed in healthcare facilities. The remaining papers in which organizational performance is measured cite other non-domestic building types.

The performance yardstick
Performance is always relative; it is determined by an indicator against a specific target. Several targets and, by extension, several indicators can co-exist in a single POE. Two types of performance targets were identified in the literature review: intrinsic targets and extrinsic targets. The analysis of case study papers showed that intrinsic targets are more commonly used in POE than extrinsic targets. As illustrated in Figure 5, they are most frequently derived from user expectations with satisfaction scores being the dominant indicator of performance followed by occupant comfort. Lai (2013) followed by Hou et al. (2020) even formalize this practice, adapting the SERVQUAL model from marketing research and customer satisfaction in their respective studies of university dorms.
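The relationship described above, an indicator judged against a target, can be sketched as a small data structure. This is a minimal illustration, not a method drawn from the reviewed literature: the class name, field names and example values are hypothetical, and treating an externally set benchmark as an 'extrinsic' target is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """One indicator judged against one target (hypothetical structure)."""
    attribute: str        # e.g. "thermal comfort"
    indicator: str        # what is measured or reported
    value: float          # measured or reported value
    target: float         # value the indicator is judged against
    target_type: str      # "intrinsic" or "extrinsic"
    higher_is_better: bool = True

    def gap(self) -> float:
        """Signed shortfall: a positive result means the target was missed."""
        diff = self.target - self.value
        return diff if self.higher_is_better else -diff

    def meets_target(self) -> bool:
        return self.gap() <= 0


# Hypothetical values: a satisfaction score judged against user expectations
# and an energy indicator judged against an assumed external benchmark.
satisfaction = Assessment(
    "thermal comfort", "mean satisfaction score", 5.2, 5.0, "intrinsic"
)
energy = Assessment(
    "energy use", "annual energy use intensity", 140.0, 120.0, "extrinsic",
    higher_is_better=False,
)

for item in (satisfaction, energy):
    print(item.attribute, item.target_type, item.meets_target())
```

Keeping the target and its type alongside each indicator makes it explicit that several targets, and therefore several verdicts, can co-exist in a single POE.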
Researchers have observed a strong correlation between building satisfaction scores and self-reported people performance scores for health and productivity (Altomonte et al., 2019; Elzeyadi & Gatland, 2017; Vischer, 2009; Zhang, 2019). This phenomenon corresponds with the notion of environmental quality of perception ('EQP') in which the physical environment is considered a determinant of user satisfaction and productivity, and user satisfaction and productivity are positive performance indicators (Andrade et al., 2012). A checklist approach to performance is also a theme, particularly in healthcare facilities where evidence-based design is a key driver and specific features are known to support certain outcomes (Altizer et al., 2019; Durosaiye et al., 2020; Filippín et al., 2015; Foureur et al., 2011; Ghazali & Abbas, 2012; Hadjri et al., 2012; Quan et al., 2017). Given the above, it follows that perceptual metrics are the dominant means of measuring performance in POE (n = 130). As illustrated in Figure 6, more than half the papers report perceptual metrics with physical measurement (n = 60). Despite the economic implications of building performance, people performance and organizational performance, only 14 (n = 14) case study papers consider economic metrics.
Rarely are pre-occupancy evaluations (pre-OEs) carried out (n = 14) to document the condition of a building and establish a performance baseline before it undergoes a major renovation or redevelopment project. Even rarer is the use of a control building (n = 1).

Discussion
The semantics of performance
The performance concept presents itself in POE literature in different ways, as does all associated terminology. The lack of consistency between performance attributes, indicators, weighting schemes and methods of measurement means that the same building could have vastly different interpretations or valuations of performance (Heinzerling et al., 2013). Rating systems, technical standards and benchmark databases have been attempts to establish set value systems within which levels of performance can be defined. These attempts, however, fail to recognize that performance measured under different circumstances within different building types and with more than one building user (or user-type) against a set value system can only ever reveal gaps. The mutability of performance, arguably, underlies many of the well-documented challenges in POE, making standardization and benchmarking difficult and limiting the feedback-feedforward cycle. As such, there is a need for more research that considers the conditional and circumstantial elements of performance so as to understand the sum of variables that impact it (Brown et al., 2010).

An adapted concept of performance and classification system
As discussed above, the papers identified and analysed in the literature review were categorized thematically based on their performance areas and attributes of interest. This systematic categorization produced an expanded concept of performance with an accompanying framework that may be used to guide future studies. This structure is useful in that it responds to and attempts to resolve the inconsistent terminology and sometimes conflicting semantics within the literature related to performance criteria. As illustrated in Figure 7, it comprises three major domains: building performance, people performance and organizational performance. Each domain includes several categories, and each category includes a series of attributes. These are described and discussed in the sub-sections below.
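The domain, category and attribute hierarchy can be represented as a nested mapping, which makes it straightforward to classify a study by the attributes it examines. The sketch below is partial and illustrative: the domain and category names follow the text, only a few attributes named in the text are listed, and the variable and function names are assumptions rather than part of the published framework.

```python
# Domains and categories as named in the text. Attribute lists are partial
# and limited to attributes the text mentions explicitly.
FRAMEWORK = {
    "building": {
        "system": [],
        "ambient": [
            "thermal comfort",
            "indoor air quality",
            "visual comfort",
            "acoustic quality",
        ],
        "environmental": [],
        "functional": ["layout/fit", "appearance"],
        "psychosocial": ["aesthetic quality", "privacy"],
    },
    "people": {
        "health": [],
        "wellbeing": [],
        "cognitive functioning": [],
        "behaviour": [],
    },
    "organizational": {
        "financial": [],
        "operational": ["workplace effectiveness"],
        "cultural": [],
    },
}


def locate(attribute):
    """Return (domain, category) for an attribute, or None if it is not listed."""
    for domain, categories in FRAMEWORK.items():
        for category, attributes in categories.items():
            if attribute in attributes:
                return domain, category
    return None


def domains_covered(attributes):
    """Return the set of domains touched by a study's attributes."""
    hits = (locate(attribute) for attribute in attributes)
    return {hit[0] for hit in hits if hit is not None}


print(locate("privacy"))
print(domains_covered(["thermal comfort", "workplace effectiveness"]))
```

A study coded with both 'thermal comfort' and 'workplace effectiveness', for instance, would be counted under the building and organizational domains.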

Building performance
Building performance is a measure of the technical, functional and/or psychosocial performance of the built environment. These performance categories and their attributes are defined and refined from work by Preiser et al. (1988) and others to best capture and relate the characterizations of performance described in recent literature.

Technical performance attributes
Technical performance attributes are those that support the habitability of a building. Based on findings from the literature, these attributes have been subdivided into system attributes, ambient attributes and environmental attributes to help distinguish and identify current areas of interest.
In this paper, we have defined system attributes as those that relate to the physical composition, condition, and/or operation of a building's passive and active systems. System performance can be qualitatively and/or quantitatively reported. For example, Cochran Hameen et al. (2020) and Park et al. (2018) assessed the conditions of mechanical systems, electrical controls and the building enclosure using walk-through worksheets as part of their POEs.
In this paper, we have defined ambient attributes as those related to the intangible qualities of a building's indoor environment. Ambient performance is both objective and subjective. As reported in Alzoubi et al.'s (2010) study, it can be measured using instrumentation and compared to predetermined standards, building regulations or rating requirements. It can also be assessed strictly based on occupant feedback. Ambient performance is often impacted by system attributes.
In this paper, we have defined environmental attributes as those related to a 'green' intent. Although environmental performance is contingent on system performance and often impacts ambient performance, it is measured differently and often separately. As with Alborz and Berardi (2015) and Moore and Iyer-Raniga (2019), among others, environmental performance is typically quantified using pre-installed metering devices, building system reports and/or utilities billing information.

Functional performance attributes
Functional performance attributes support the building programme and the activities of users in a safe, effective way. Functional performance may be quantitatively or qualitatively measured using both perceptual metrics and physical measurements of space and occupancy loads. In a study by Cai and Spreckelmeyer (2022), walk-through evaluations, space syntax analysis, occupant interviews and shadowing are all used to assess the efficacy of new nursing unit designs. In another study, Regodon et al. (2021b) used digital tracking of office employees to map occupancy patterns, assess space dynamics and determine fit.

Psychosocial performance attributes
Psychosocial performance attributes impact the behaviour of users, contribute to their mental health and support architectural ideas of placemaking. Psychosocial performance can only be subjectively interpreted using occupant feedback and/or observation. Sigurðardóttir et al. (2021), for example, used walkthroughs, interviews and observational analysis to collect data in situ on the psychosocial performance of an innovative new school design from different stakeholder groups. Longhinotti

People performance
People performance is a measure of the health, wellbeing, cognitive functioning and constructive behaviour of individual building users. Findings from the literature review indicate that people performance is often collected alongside building performance data; sometimes these data sets are analysed independently, other times they are analysed together and often people performance data are used as a building performance indicator. The recent paper by Rodriguez et al. (2019) provides an example of an integrated concept of performance.

Health attributes
Health refers to the physical and physiological condition of the human body. In the papers reviewed, health is typically evaluated by self-reporting. In some, such as Agha-Hossein et al. (2013) and Meir et al. (2019), it is also quantified using HR data such as absenteeism and retention rates. Health is increasingly being measured using wearable technologies as explored in 5.4.

Wellbeing attributes
Wellbeing considers all matters of mental health: social, emotional and psychological. It both impacts and is impacted by health. In the review, it was found that wellbeing is most often self-reported, although it can also be observed, as in the work of Ahern et al. (2016) where self-reporting was not always possible.

Cognitive functioning attributes
Cognitive functioning is about the fitness of the mind as determined by memory, alertness, and the ability and capacity to learn. It is most often self-reported under 'productivity'. Loftness et al. (2009), for example, developed an additional survey in their POE specifically to assess effective concentration. If correlated with economic metrics, cognitive functioning outcomes could serve as an indicator of organizational performance, although no examples were found.

Behaviour attributes
Behaviour is the degree to which occupants make use of and adjust to their environment in constructive ways, made easier or harder by the environment itself as well as the other attributes of people performance. Productive (or unproductive) behaviour can be observed, as in Chiu et al. (2014), self-reported by survey, as in Brown (2016), and/or tracked and quantified using digital systems, as in Regodon et al.'s (2021a) study of co-living spaces, which used available digital infrastructure as data sources. Note that behaviour is the only people performance attribute whose impacts on building performance outcomes, for example energy consumption, can be physically measured.

Organizational performance
Organizational performance is a measure of financial, operational and cultural performance of an organizing body that occupies a building. Findings from the literature review determined that organizational performance is not often measured alongside building performance in POEs, arguably because organizational performance attributes are typically associated only with corporate contexts (although a single or multi-family dwelling may also be considered an organizing body). Like people performance, organizational performance data can be used as a building performance indicator. People performance and organizational performance are inextricably linked. In some cases, one is assessed in lieu of the other.

Financial performance attributes
Financial performance attributes are the capital costs and operational costs that determine the fiscal health of an organization. Financial performance can only be quantified using economic metrics. Watson and Whiteley (2010), for example, trialed a social return on investment method in POE, a method derived from traditional cost-benefit analysis.

Operational performance attributes
Operational performance attributes relate to productive staff workflow and the effective, efficient delivery of service. Operational performance can be qualitatively reported on by occupants and/or administrators. Joseph et al. (2014), for instance, administered surveys to both staff and the building owner/representative to assess the efficacy of different design variables and their impact on operational outcomes. Operational performance is often discussed in terms of, or in relationship to, evidence-based design, particularly in a healthcare context. It could also be quantified using physical measurements or economic metrics, for instance to study the operational impacts of Lean design on staff workflows and production, but no published peer-reviewed papers on this were found in this review.

Cultural performance attributes
Cultural performance attributes determine the strength of the workforce in terms of employee loyalty, commitment, cooperation and morale. Research findings published by authors such as Brown et al. (2010) and Sailer et al. (2008) reveal cultural performance can be objectively and subjectively reported on through marketing material, occupant feedback and observation. The cultural performance of a company could be a factor in building performance scores.

Programme and people-specific performance requirements
Even with a consistent set of performance parameters, performance requirements are not universal, particularly in complex building types that accommodate many different programmes and a wide range of user profiles. Abisuga et al. (2020) acknowledge that the conditions of performance vary not only from building to building but from room to room, depending on the given requirements of their programme and users. Bortolini and Forcada (2021) go further, suggesting that different user groups and the types of activities they perform drive 'satisfaction differences'. And Fieldson and Sodagar (2017) observe that these user groups take different measures to respond to and regulate the specific characteristics of their indoor environment, particularly where passive systems and/or user controls are available. There is arguably limited value, therefore, in POEs that set a standard of performance attributes within and across building types (Zuo et al., 2011). In response, each of the aforementioned authors developed assessment frameworks in which individual rooms can be evaluated on their own merits as perceived by different user types. Işıklar Bengi and Topraklı (2020) took a similar stance but produced an assessment framework that breaks performance down by attribute rather than room type. While the value of determining a total performance score is debated (Işıklar Bengi & Topraklı, 2020), in the case of Abisuga et al. (2020), it can be tallied from the sum of individual performance scores.
It is also worth noting that several papers acknowledge the conditions and characterizations of performance often differ within the project team. Joseph et al. (2014) and Kotzer et al. (2011), for instance, developed and administered different questionnaires to these stakeholders, namely the client, the architect and the principal user group(s), to collect and compare the different performance strata that can exist within a single building project.
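As a purely illustrative sketch of how room-level and user-specific scores might be kept separate and then tallied into a total, consider the following. It is not the framework of Abisuga et al. (2020) or of any other study reviewed here; the room names, user types, attributes, 1 to 5 scale and the choice of a simple mean and sum are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical ratings: (room, user_type, attribute, score on an assumed 1-5 scale).
ratings = [
    ("classroom", "student", "thermal comfort", 4),
    ("classroom", "teacher", "thermal comfort", 2),
    ("classroom", "student", "acoustics", 3),
    ("library", "student", "thermal comfort", 5),
    ("library", "staff", "acoustics", 4),
]


def room_scores(rows):
    """Mean score per (room, user_type), so each room is judged by each user group."""
    buckets = defaultdict(list)
    for room, user_type, _attribute, score in rows:
        buckets[(room, user_type)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


def total_score(per_room):
    """Tally a building-level total as the plain sum of the individual scores."""
    return sum(per_room.values())


per_room = room_scores(ratings)
total = total_score(per_room)
```

Keeping the per-room, per-user dictionary alongside the total preserves the disagreement between groups (here, students and teachers rating the same classroom differently) that a single aggregate score would hide.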

The performance perspective in context
Results suggest that since 2008, POEs regularly rely on surveys to assess building performance, likely because surveys are generally considered inexpensive, resource light, easy to develop and, with digital distribution platforms, even easier to implement (Deuble & de Dear, 2014). The exclusive use of surveys to measure building performance in POE is a significant divergence from PROBE (Day et al., 2020), one that has incited some debate. Researchers are critical of comfort and satisfaction scores as well as self-reported health and productivity because of the inability to control for respondent bias, identify higher levels of tolerance within the participant group or dissociate respondents' general feelings from those specifically related to their physical environment (Day et al., 2020; Gorgievski et al., 2010; Gossauer & Wagner, 2013; Vischer, 2009). Steele Gray et al. (2015) explored this phenomenon in their study of the impacts of change readiness on optimism, wellbeing and facility satisfaction amongst hospital staff. As Brown (2018) suggested, there is a challenge in defining 'who' the user is outside their role in the building and in what way that identity both informs and is informed by the building-user interface. Furthermore, there is no standard of measurement or appropriate scale for 'perceived' performance (Bae et al., 2020; Best & Purdey, 2012). And, unlike Duarte Roa et al. (2020), most surveys ask occupants to rate how they feel without also asking how they would like to feel (Chiu et al., 2014), creating an impediment to benchmarking.
Perceived (building) performance is a highly complex construct of the different, converging human experiences of space. The prevalence of technical attributes over functional attributes, and of functional attributes over psychosocial attributes, in POE assessment frameworks reflects the increasing difficulty of controlling for variables and measuring, with sufficient rigour, personal and interpersonal experiences in the context of the built environment. According to Pearce (2018), these experiences include the sensory experience, the cognitive experience and the immersive experience, and they can be influenced by both internal and external factors unrelated to the physical environment. These factors, arguably, create the bias that informs much of the skepticism around subjective feedback. Shepley et al. (2012), for instance, observed that patients in health facilities tend to be more influenced by their caregivers than the care environment and often score new environments lower on a comfort scale, suggesting familiarity plays a role in comfort.
Some researchers have constructed multi-method approaches, coupling surveys with other qualitative techniques typically from the social sciences to recontextualize feedback in terms of everyday experiences (Deuble & de Dear, 2014; Ghazali & Abbas, 2012; Longhinotti Felippe et al., 2017; Manahasa & Özsoy, 2016; Sigurðardóttir et al., 2021). Ethnographic methods of observation, for example, are being used more regularly (either as stand-alone methods or part of a mixed/multi-method protocol) to extrapolate a more meaningful understanding of the person-environment relationship (Chiu et al., 2014; Ornstein et al., 2009; Ornstein et al., 2010; Pannier et al., 2021; Pink et al., 2020; Regodon et al., 2021a, 2021b; Tweed & Zapata-Lancaster, 2018). These are an attempt to identify variables and gain feedback that could not be uncovered through conventional means (Pink et al., 2020). Such methods are deemed particularly important in evaluating building performance and its relationship to people performance where vulnerable user groups are concerned (Ahern et al., 2016; Cochran Hameen et al., 2020; Ferreira et al., 2017; Leung et al., 2012; Leung et al., 2014).
Other researchers are building on the PROBE approach, using right-here, right-now occupant surveys rather than or in addition to general satisfaction surveys and pairing them with physical measurements to better correlate user feedback and actual environmental conditions, diagnose chronic performance issues and develop more responsive solutions (Candido et al., 2016; Choi & Moon, 2017; Duarte Roa et al., 2020; Elzeyadi & Gatland, 2017; Regodon et al., 2021a). Pannier et al. (2021) and Tweed and Zapata-Lancaster (2018), among others, assumed a multidisciplinary methodological approach to building assessment, marrying methods from building science, social science and, in the case of Pannier et al. (2021), life cycle analysis.
In addition, how people performance is measured in the context of the human-building interface is changing, with theoretical interest in other people performance attributes beyond generic health, wellbeing and productivity scales (Liu & Qian, 2019) and increasing discussions around how to measure people performance more objectively through digital tracking and wearable technology (Regodon et al., 2021a; Sant'Anna et al., 2018; Verma et al., 2017).
Overall, most articles acknowledge the singular use of surveys as a limitation, advocating for more robust methodologies that offer 'explanatory information' (Gossauer & Wagner, 2013), resources permitting. Even then, integrative and comparative approaches to the analysis of objective and subjective data remain difficult and are not always present in more exploratory studies, where the two are instead presented as parallel streams (Loftness et al., 2009; Meir et al., 2019).

The human variable and bidirectional performance
There are two competing perspectives in performance assessment: determinism and social constructivism. Vischer (2008) argues determinism is often preferred in POE studies because it implies a direct cause-and-effect relationship between building and user, and thus between building performance and people performance, as is the case in a majority of the literature cited here. Constructivism, on the other hand, ignores the effects of the built environment on users completely: building performance and people performance are independent from one another. Both are problematic.
Findings from this literature review show there is a marked transition occurring away from deterministic thinking around the person-environment relationship towards a more bidirectional view of performance, embracing the active role users play in its construct (Gossauer & Wagner, 2013). Users both adapt their building and adapt to it over its lifecycle (Park et al., 2018), and the more informed they are the better their decisions and opinions are (Brown, 2016; Brown & Gorgolewski, 2014). Several papers explore how and why users behave the way they do in an attempt to better understand and diagnose performance-related issues and effect productive change, often in the context of energy savings (Bourikas et al., 2020; Brown, 2016; Chiu et al., 2014). Göçer (2014) and Zhao and Yang (2021) even proposed new methods of data collection and data analysis, respectively, using smart building technology to help make buildings adaptive to users in real time, improving operational and environmental performance.
A bidirectional understanding of performance recognizes (a) the interdependency of performance domains; (b) the unpredictable impacts of human behaviour on performance outcomes; and (c) the inevitable gaps this causes in the performance assessment of a building in use. People are always a variable in the performance equation. Moreover, a one-dimensional, unidirectional construct of performance neglects the interrelationship between attributes. As evidenced by several POEs that have revealed low IEQ satisfaction in energy-efficient 'green' buildings (Ng & Abidin Akasah, 2013; Radwan & Issa, 2017; Sant'Anna et al., 2018; Thomas, 2009; Wang & Zheng, 2020; Xuan, 2018), performance cannot be achieved in one area without impacting performance in another, and the ways and means of measuring performance need to consider this and respond accordingly.
With this, new and emerging theories published in recent papers suggest a shift in thinking is underway from 'gaps' to 'trade-offs'. Trade-offs thinking acknowledges the performance of one attribute may compromise the performance of another, be it within or across domains (Andargie et al., 2020; Oliver et al., 2019). Andargie et al. (2020) argue a sustainable approach to performance is one that considers and responds to the interdependency of attributes, in other words, develops and delivers a building based on the most appropriate trade-offs relative to the project brief. POE must respond accordingly.

Limitations
The transdisciplinary nature of POE and the variations in its nomenclature rendered the search and screening process in this research inherently complex. It is probable, despite the rigour of the methodology, that not all relevant references were captured, and studies published in highly specialized journals were missed. Furthermore, the citation search revealed, specifically, that research of this kind led by healthcare researchers and published in healthcare journals is rarely referred to as a POE and instead is more generically titled (e.g. 'assessment of', 'impact of', 'qualitative study of', etc.) thereby evading the search parameters. In addition, several conference proceedings could not be accessed despite the efforts of the authors. The language parameter limited the search to English and French articles only. Although a concerted attempt was made to eliminate bias from the search protocols, the final register of articles reflects the subjective interpretation and qualitative assessment of the authors as to the articles' relevance to the review and research questions.

Conclusion and recommendations
This semi-systematic review of 160 articles on POE published since 2008 explored the characterization of performance and the impediment its variability and mutability impose on this field of study, both in practice and in research. The classification of these articles found that many new priorities had emerged following a transformative period in the building industry post-PROBE, problematizing Preiser et al.'s 1988 construct of the concept. From this classification, the authors developed an expanded definition of building performance with new, integrated modules for people performance and organizational performance.
A further contribution of this paper is the shifts in thinking that the analysis of the results revealed: shifts (i) from deterministic thinking to more bidirectional understandings of the person-environment relationship; (ii) from the inhabitant as passive to the inhabitant as adaptive and adapting; (iii) from technical standards, rating systems and benchmarks to more tailored indicators of quality and value that respond to individual project objectives; and (iv) from performance gaps to performance trade-offs. This reframing of performance should be used by researchers, practitioners and policymakers to understand the different domains of performance and the implications associated with their variability and interdependencies, to identify performance areas of interest and to develop more integrated assessment frameworks for their POE work.
POEs, however, will continue to be limited latitudinally by how many performance attributes can successfully be measured and the number of performance variables that can be accounted for to rationalize results. They will continue to be limited longitudinally by the transient nature of performance as buildings, their materials, systems and assemblies age, and new users with new needs, programmes and preferences occupy them. Overcoming these limitations also demands rethinking POE. The authors propose the following recommendations based on the results of the review:
• POE projects should assume a more tailored approach to their investigations that reflects and responds to the individual performance objectives of a building and its occupants. While methods of data collection can be standardized, performance attributes and indicators cannot. A one-size-fits-all attitude with a singular focus on standardization and benchmarking will continue to pose a barrier to POE.
• POE projects should assume an adaptive and cyclical approach to their investigations that acknowledges the conditional and circumstantial nature of performance in the person-environment relationship. Conducting POEs at key intervals over a building's lifecycle with updated attributes and indicators offers the best opportunity to redress new issues as they arise and optimize performance to the extent possible.
• Per Ning and Chen (2016), POE projects with the intent to advance industry (as opposed to a specific research question) should take a participatory approach to select performance attributes and indicators to reflect the specific and current needs and priorities of different users and stakeholder groups. Feedback is only as valuable as it is relevant. More practitioners, policymakers and building owners/operators will be receptive to POE when performance measures offer intelligence that supports their cause.
• POE projects should include corresponding attributes from across different domains so as to control as many variables as possible. Considering building performance in a silo will produce oversimplified results, and if this is a limitation of a study it needs to be acknowledged explicitly.
• More multidisciplinary, mixed-methods approaches should continue being developed to not only measure performance as a product of the person-environment relationship but interpret and understand the results in this way. Performance has several interrelated domains and attributes, therefore presenting discrete data sets is of limited value to the feedback-feedforward cycle. These need to be factored into the analysis of results as much as the assessment framework.
The sum of these recommendations responds to the well-evidenced fact that performance is nuanced and therefore the results of POE should be nuanced. Nuance is what, arguably, will drive the continued evolution of building intelligence in the pursuit of social, economic and environmental sustainability.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by the Faculty of Engineering and Architectural Science at Toronto Metropolitan University.