Aerobic capacity in persons with Parkinson’s disease: a systematic review

Abstract
Purpose: To systematically review studies assessing (1) psychometric properties of the maximal oxygen uptake (VO2max) test in Parkinson's disease (PD), (2) VO2max levels in persons with PD (pwPD) compared to healthy controls (HCs), and (3) reported VO2max associations in PD.
Materials and methods: Six databases were searched. Descriptive data synthesis was used to summarize psychometric properties and reported VO2max associations. The VO2max means and test end-criteria were calculated using linear mixed models. Simple linear regression was used for associations.
Results: The review included 25 studies. Psychometric properties of the VO2max test, reported in one study, showed intraclass correlations of 0.90–0.94 for VO2max. Thirteen studies reported test end-criteria, with only mean respiratory exchange ratio (on medication) and percentage of predicted maximal heart rate (off medication) fulfilling standardized minimum values for the VO2max test. The VO2max was comparable between pwPD and HC as well as between different PD-medication states. Associations between VO2max and age, sex, and fatigue were reported.
Conclusions: In mildly to moderately affected pwPD, limited evidence exists on the psychometric properties of the VO2max test and end-criteria were sparsely reported. Surprisingly, VO2max was comparable between pwPD and HC as well as between different PD-medication states, and only age, sex, and fatigue were associated with VO2max.
Implications for rehabilitation:
In mildly to moderately affected pwPD, only one study has examined psychometric properties of the VO2max test, reporting excellent test–retest reliability.
A general lack of consistency on how to measure and report VO2max end-criteria was observed, but when reported, the end-criteria were most often not met.
No difference was found in VO2max between mildly to moderately affected pwPD and HC, or between pwPD across different medication states.
The identified negative association between VO2max and fatigue suggests aerobic exercise as a potential symptomatic treatment of fatigue when rehabilitation professionals are treating pwPD.


Introduction
Parkinson's disease (PD) was originally described in 1817 by James Parkinson, who outlined some of the major motor signs, including bradykinesia, rigidity, and tremor, as well as mental symptoms [1]. Worldwide, PD has been estimated to affect more than 10 million people [2], and recent data suggest that PD is the fastest growing neurological disease, even outpacing Alzheimer's disease [3].
Despite progress in pharmaceutical and surgical approaches to the symptomatic treatment of PD, disease-modifying therapies remain to be elucidated [4]. During the past decades, physical exercise has been recommended as a supplement to pharmaceutical treatment in order to improve symptom management [5]. Nonetheless, persons with PD (pwPD) are known to adopt a markedly more physically inactive lifestyle compared with healthy controls (HCs) [6][7][8]. Low levels of physical activity often lead to reduced aerobic capacity (maximal oxygen uptake: VO2max) [9] and subsequently cause a number of related health problems (e.g., increased risk of cardiovascular diseases, diabetes, osteoporosis, and depression [8,10]) as well as worsening of various non-motor symptoms (e.g., insomnia and constipation [8]). Importantly, aerobic capacity has been identified as a strong health and performance predictor in both healthy [11][12][13][14] and clinical [15] populations and is therefore also considered a highly relevant physiological outcome in PD [16]. In this context, an in-depth understanding of the exact levels of aerobic capacity in pwPD in different medication states (i.e., on/off medication), compared to HC, appears essential. So does an understanding of any potential associations between VO2max and clinical measures of motor and non-motor symptoms or functional outcomes in pwPD, as this could identify new relevant targets for future (aerobic) exercise interventions. Yet, systematic reviews on these topics could not be identified.
Aerobic capacity is often assessed when prescribing aerobic exercise or when evaluating the effects of aerobic exercise interventions [17]. When assessing VO2max, direct assessment, using a graded whole-body protocol with complementary respiratory gas-exchange measurements, is considered the gold standard [18]. However, insights into the psychometric properties of the VO2max test in pwPD are a prerequisite for correct interpretation of test results. Nonetheless, we could not identify studies that have synthesized the existing literature regarding the psychometric properties of the VO2max test in pwPD. Therefore, the validity, reliability, and responsiveness of the VO2max test remain unclear in PD, highlighting the need for a systematic review on this topic.
Consequently, the objectives of the present study were to systematically review and summarize studies assessing (1) the psychometric properties (validity, reliability, and responsiveness) of the VO2max test in pwPD, (2) the literature comparing aerobic capacity, as measured by VO2max, in pwPD with different medication states to HC, and (3) associations between VO2max and clinical measures of motor and non-motor symptoms or functional outcomes in pwPD.

Study selection
The present systematic review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [19]. A predefined review protocol was published at PROSPERO (registration number: CRD42021236072). The study was based on systematic literature searches of six databases (PubMed, EMBASE, Cochrane Library, PEDro, CINAHL, and SPORTDiscus), which were performed to identify studies on VO2max in pwPD published before 26 April 2022. Table 1 shows the exact search terms used in the various databases. Results from the literature search were screened independently by two evaluators (CT and MG) using the Covidence Software [20], which is designed for article screening. In case of disagreement on the potential acceptability of studies, the studies were discussed among the evaluators, and disagreements were resolved by involving a third author (UD) when needed. Electronic searches were supplemented by hand searches of the reference lists from the included papers.
All included studies had to enroll participants with a reported diagnosis of PD, assess VO2max with complementary respiratory gas-exchange measurements using a graded exercise test to voluntary exhaustion on a whole-body ergometer (e.g., bicycle, treadmill), be peer reviewed, be available in English, be published in full text, and report VO2max data separately for pwPD in studies involving other (patient) groups. Additional inclusion criteria were specific for objectives 1 and 3:
Objective 1: Include assessment of one or more psychometric properties (i.e., validity, reliability, and/or responsiveness) related to VO2max testing of pwPD.
Objective 3: Report associations of VO2max and clinical measures of motor and non-motor symptoms or functional outcomes in pwPD.
Trustworthiness, relevance, and results of published papers were judged against the inclusion criteria listed above.

Quality assessment
The quality and risk of bias of the included studies were assessed by two reviewers using the Quality Assessment Tool for observational cohort and cross-sectional studies (NIH, Bethesda, MA). This quality appraisal tool was chosen to accommodate the various study designs (i.e., cross-sectional and randomized controlled trials). A rating scale (scoring options: yes = 1, no = 0, cannot determine = 0, not applicable = 0, and not reported = 0) was applied for the 14 questions of the checklist. Study quality was rated as poor (0–4 out of 14 questions), fair (5–10 out of 14 questions), or good (11–14 out of 14 questions), based on individual scores and the severity of the risk of bias. The quality assessment of each study is reported in Supplementary Table 1.
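The scoring scheme above can be illustrated with a minimal Python sketch. The function names and the example answers are hypothetical, and the sketch covers only the numeric score bands; the review also weighed the severity of the risk of bias, which is a judgment that is not modeled here.

```python
from typing import Iterable

# Points per answer option, as described in the rating scale:
# only "yes" scores a point, every other option scores zero.
SCORING = {
    "yes": 1,
    "no": 0,
    "cannot determine": 0,
    "not applicable": 0,
    "not reported": 0,
}

def quality_score(answers: Iterable[str]) -> int:
    """Sum the points for the 14 checklist answers."""
    cleaned = [a.strip().lower() for a in answers]
    if len(cleaned) != 14:
        raise ValueError("the checklist has 14 questions")
    return sum(SCORING[a] for a in cleaned)

def quality_rating(score: int) -> str:
    """Map a 0-14 score to the poor/fair/good bands."""
    if not 0 <= score <= 14:
        raise ValueError("score must be between 0 and 14")
    if score <= 4:
        return "poor"
    if score <= 10:
        return "fair"
    return "good"

# Hypothetical study: 9 "yes", 2 "no", 3 "not reported".
example = ["yes"] * 9 + ["no"] * 2 + ["not reported"] * 3
score = quality_score(example)
print(score, quality_rating(score))
```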

Terminology
One minor, yet important, distinction relates to the reported outcome of the aerobic capacity test, reported as either peak oxygen uptake (VO2peak) or VO2max [11][12][13][14]. Of note, VO2peak simply denotes the highest value attained during a VO2 test, regardless of the participant's effort and test modality (rowing, cycling, treadmill, etc.), and does not necessarily represent the true VO2max, a term introduced by Hill and Lupton in 1923 as "the oxygen intake during an exercise intensity at which actual oxygen intake reaches a maximum beyond which no increase in effort can raise it" [21]. The latter determination is normally confirmed by attainment of several validation criteria [22][23][24]. However, the two terms are used interchangeably in the literature, with both representing the highest observed value during an incremental test. Despite this difference between the terms, both will be described as VO2max throughout the present review for simplicity.
The psychometric properties of the VO2max test in pwPD were reviewed with respect to validity, reliability, and responsiveness [25]. In the present review, validity was determined as the degree to which the VO2max test was an adequate reflection of the aerobic capacity in pwPD. Since direct assessment of VO2max using a graded, whole-body protocol with complementary respiratory gas-exchange is widely accepted as the gold standard for measuring aerobic capacity in healthy persons [18], it was not considered relevant to evaluate either construct validity (i.e., the degree to which the VO2max test measures VO2max) [25] or criterion validity (i.e., comparing the VO2max test with an established or widely used test that is already considered valid) [25] of the test. However, content validity (i.e., representativity of the construct) [25] of the VO2max test in PD was assessed by looking into different end-criteria that should be met during the VO2max test for it to be considered a true test. To do this, baseline data from cross-sectional and longitudinal studies reporting end-criteria were used for the evaluation of content validity. The cut-off values for the end-criteria applied to validate a true maximal test are in this review set as (1) a plateau/leveling off in oxygen uptake despite an increasing workload, (2) a post-test blood lactate level ≥3.5–8.0 mmol/L (age dependent), (3) a respiratory exchange ratio (RER: VCO2/VO2) ≥1.00–1.15 (age dependent), (4) a maximal heart rate (HRmax) within 10% or 10 beats/min of the estimated HRmax (e.g., 220 − age or 208 − 0.7 × age), and (5) a perceived exertion ≥17 on the 6–20 Borg scale of perceived exertion [22][23][24].
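As an illustration of how the end-criteria listed above could be checked for a single test, the following Python sketch evaluates whichever criteria were measured. It is not the procedure used in any of the reviewed studies. The function and parameter names are hypothetical, the default thresholds for RER (1.05) and lactate (3.5 mmol/L) are assumptions because the text describes them as age dependent without giving the mapping, and the heart-rate criterion is interpreted as "no more than 10 beats/min or 10% below the prediction".

```python
from typing import Dict, Optional

def predicted_hr_max(age: float, formula: str = "208-0.7*age") -> float:
    """Age-predicted maximal heart rate (beats/min), using either formula from the text."""
    if formula == "208-0.7*age":
        return 208.0 - 0.7 * age
    if formula == "220-age":
        return 220.0 - age
    raise ValueError(f"unknown formula: {formula}")

def check_end_criteria(
    age: float,
    hr_max: Optional[float] = None,
    rer: Optional[float] = None,
    rpe_borg_6_20: Optional[float] = None,
    lactate: Optional[float] = None,
    plateau: Optional[bool] = None,
    rer_min: float = 1.05,      # assumed threshold; age dependent in the text
    lactate_min: float = 3.5,   # assumed threshold; age dependent in the text
    rpe_min: float = 17.0,
) -> Dict[str, bool]:
    """Return pass/fail for each end-criterion that was actually measured."""
    results: Dict[str, bool] = {}
    if plateau is not None:
        results["plateau"] = bool(plateau)
    if lactate is not None:
        results["lactate"] = lactate >= lactate_min
    if rer is not None:
        results["rer"] = rer >= rer_min
    if hr_max is not None:
        predicted = predicted_hr_max(age)
        shortfall = predicted - hr_max
        # Assumed interpretation: met if within 10 beats/min or within 10% of prediction.
        results["hr_max"] = shortfall <= 10.0 or shortfall <= 0.10 * predicted
    if rpe_borg_6_20 is not None:
        results["rpe"] = rpe_borg_6_20 >= rpe_min
    return results

# Hypothetical 65-year-old participant.
print(check_end_criteria(age=65, hr_max=150, rer=1.08, rpe_borg_6_20=15))
```

Criteria that were not measured are simply absent from the result, which mirrors the reporting gaps discussed in this review.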
The reliability (i.e., the extent to which VO2max measurements give the same results on separate administrations and are free of measurement error) of the VO2max measure was determined by evaluating studies where pwPD completed a VO2max test on more than one occasion, with the assumption that no real change in VO2max had occurred between sessions [26]. Measures of reliability included (1) change in mean values (i.e., random or systematic change), (2) within-subject variation (i.e., standard error of measurement (SEM), limits of agreement (LOA), coefficient of variation (CV%), and/or smallest detectable change (SDC)), and/or (3) correlations (intraclass correlation coefficient (ICC)) [27,28]. These measures provide insight into the extent of changes required to exceed methodological variability.
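The within-subject measures named above can be computed from paired test-retest data as in the following sketch. It uses conventional formulas (SEM as the SD of the differences divided by √2, SDC as 1.96 × √2 × SEM, and LOA as bias ± 1.96 × SD of the differences), which are assumptions about how such measures are typically derived rather than the calculations of any reviewed study. The function name and the example values are hypothetical, and the ICC is left out.

```python
import math
import statistics
from typing import Dict, Sequence

def retest_summary(test1: Sequence[float], test2: Sequence[float]) -> Dict[str, float]:
    """Summarize test-retest agreement for paired VO2max measurements."""
    if len(test1) != len(test2) or len(test1) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [b - a for a, b in zip(test1, test2)]
    bias = statistics.mean(diffs)            # change in mean (systematic change)
    sd_diff = statistics.stdev(diffs)        # sample SD of the paired differences
    sem = sd_diff / math.sqrt(2)             # standard error of measurement
    sdc = 1.96 * math.sqrt(2) * sem          # smallest detectable change
    grand_mean = statistics.mean(list(test1) + list(test2))
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd_diff,  # 95% limits of agreement
        "loa_upper": bias + 1.96 * sd_diff,
        "sem": sem,
        "sdc": sdc,
        "cv_percent": 100.0 * sem / grand_mean,
    }

# Hypothetical VO2max values (mL O2/kg/min) from two tests one week apart.
first = [20.0, 22.0, 18.0, 25.0]
second = [21.0, 22.0, 19.0, 25.0]
summary = retest_summary(first, second)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```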
Finally, responsiveness was defined as the ability of the VO2max test to detect a change over time. Two aspects of responsiveness are internal responsiveness (i.e., the ability of a measure to change over a predefined time period) and external responsiveness (i.e., the extent to which change in a measure relates to a corresponding change in a reference measure of clinical or health status) [29].
Data extraction from the included studies was performed by one reviewer (CT) and included (if available) participant characteristics (i.e., sex, height, weight, body mass index (BMI), time since diagnosis, medication status, the Unified Parkinson's Disease Rating Scale (UPDRS), the Movement Disorder Society revised UPDRS (MDS-UPDRS), and Hoehn and Yahr scale), study characteristics (i.e., study design, sample size, and distribution of sex in sample size), testing procedures (i.e., testing protocol and methods for evaluating gas exchange), and values of VO2max. From the longitudinal data, only baseline VO2max measurements were extracted.

Statistical analysis
Across the included studies, sample size and variability weighted means ± 95% confidence interval (CI) were calculated using linear mixed models (StataCorp., 2019, Stata Statistical Software: Release 16, StataCorp LLC, College Station, TX) for VO2max and the reported VO2max test end-criteria. When comparing differences in VO2max between groups (i.e., all pwPD (PDall) vs. HC, pwPD on medication (PDon) vs. HC, pwPD off medication (PDoff) vs. HC, PDon vs. PDoff, and PD vs. HC (direct comparisons within the same study)), "group" was set as a fixed effect and "study ID" as a random effect. Absolute VO2max values were normalized to the reported mean bodyweight values when possible (i.e., mL O2·kg⁻¹·min⁻¹). Descriptive data synthesis was used to summarize studies reporting associations between VO2max and key characteristics of the study population (e.g., UPDRS, Hoehn and Yahr scale). Simple linear regression analyses were used to explore associations between key study characteristics and VO2max from the present review. Strong, moderate, and weak associations were defined as coefficients (r) ≥0.70, coefficients between 0.40 and 0.70, and coefficients ≤0.40, respectively [30].
Statistical significance was set at p ≤ 0.05.
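Two of the simpler steps described in this section, normalizing absolute VO2max to bodyweight and labeling the strength of an association, can be sketched as follows. The function names are hypothetical, the sketch applies the strength bands to the absolute value of the coefficient (an assumption, since negative associations are also reported), and the linear mixed models themselves are not reproduced here.

```python
def normalize_to_bodyweight(vo2_l_per_min: float, weight_kg: float) -> float:
    """Convert absolute VO2max (L O2/min) to relative VO2max (mL O2/kg/min)."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    return vo2_l_per_min * 1000.0 / weight_kg

def association_strength(r: float) -> str:
    """Label a correlation coefficient using the bands stated in the text."""
    magnitude = abs(r)
    if magnitude >= 0.70:
        return "strong"
    if magnitude > 0.40:
        return "moderate"
    return "weak"

# Hypothetical group: mean VO2max of 1.5 L O2/min and mean bodyweight of 75 kg.
print(normalize_to_bodyweight(1.5, 75.0))
print(association_strength(-0.49))
```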

Study selection
The literature search yielded 1679 hits. After removal of duplicates, 1483 publications remained. Five additional papers were identified from other sources [30][31][32][33][34]. From the screening, a total of 30 papers fulfilled the inclusion criteria (see Figure 1). Authors from six papers were contacted by email as it was suspected that four [35][36][37][38] and two [39,40] papers, respectively, were published from the same two study populations without cross-referencing. This was confirmed to be the case for the two papers [39,40], whereas the corresponding author of the four papers did not respond. However, based on uniformity of authors, participant recruitment, participant characteristics, and results, it was assumed that the study population was the same in the four homogeneous papers [35][36][37][38]. If any relevant information or data were missing, authors were contacted by email for additional details. Authors from two papers [33,41] were contacted regarding data on participants' weight or a weight-adjusted VO2max, but they did not respond. Authors from four papers [30][31][32]42] were contacted regarding their testing protocol, as it was unclear if a direct VO2max test to volitional exhaustion had been used. Two authors [30][31][32] confirmed the tests to be direct measurements performed until volitional exhaustion. Authors from two papers [31,43] were contacted regarding medication status during the VO2max test and one returned with the missing details [31]. Finally, authors from two papers [30,44] were requested to provide VO2max data as mean ± SD rather than median (range), as reported in their articles. Both authors returned the requested data. All of the above-mentioned papers were included in the review independent of whether or not the corresponding authors responded to our inquiry.
The quality assessment of individual …
Table 1 (search terms, excerpt): … ) OR (Parkinson's disease)) AND (maximal aerobic capacity)) OR (aerobic capacity)) OR (breathing gas analysis)) OR (VO2peak)) OR (VO2max)) OR (VO2-max)) OR (cardiopulmonary exercise test)) OR (cardiopulmonary exercise testing)) OR (maximal oxygen uptake)) OR (maximal oxygen consumption)) OR ( …
Included studies by objective: 1. Studies assessing the psychometric properties of the VO2max test in PD (N = 1) and studies reporting end-criteria for fulfillment of a true VO2max (N = 13); 2. Cross-sectional and longitudinal papers reporting baseline measurements of VO2max in pwPD (N = 30); 3. Studies reporting associations of VO2max in PD (N = 5).

Psychometric properties of the VO2max test in PwPD
Only one study (n = 70) evaluating the reliability of the VO2max test in pwPD was identified [37]. Day-to-day reliability was tested by performing two maximal treadmill tests one week apart. The ICC(2,1) (two-way random, absolute agreement, across two or three tests) was determined for a number of cardiopulmonary parameters. Mean VO2max was 2.4% higher in the second test than in the first test (21.4 ± 4.3 mL O2·kg⁻¹·min⁻¹ vs. 21.9 ± 4.5 mL O2·kg⁻¹·min⁻¹, p = 0.03) [37]. The VO2max expressed as both mL O2·kg⁻¹·min⁻¹ and L O2·min⁻¹ had excellent reliability, with ICCs of 0.90–0.94. Bland-Altman plots of the within-subject change for VO2max, expressed in mL O2·kg⁻¹·min⁻¹, vs. the mean of tests one and two showed an increase of 0.56 mL O2·kg⁻¹·min⁻¹ with 95% LOA ranging from −3.5 to 4.6 mL O2·kg⁻¹·min⁻¹. In 21 participants where the first two tests differed by ≥5%, a third VO2max test was performed. The VO2max increased 0.56 mL O2·kg⁻¹·min⁻¹ per test (i.e., a total of 1.2 mL O2·kg⁻¹·min⁻¹ from the first to the third test). The HRmax and RQ showed no changes across the three tests [37]. It was further reported that in 63 participants who were not on beta blockers, only seven (11%) participants achieved a true VO2max, based on an RER exceeding 1.1 and an HRmax ≥85% of age-predicted HRmax.
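The reported limits of agreement can be put in perspective with a short worked calculation. It assumes the conventional Bland-Altman definition (LOA = mean difference ± 1.96 × SD of the differences), which the source does not state explicitly; the symbol s_d for the SD of the differences is introduced here for illustration. The value 22.7 is the average PD VO2max from the present review as cited in the Discussion.

```latex
\begin{align*}
\text{mean difference} &\approx \frac{-3.5 + 4.6}{2} = 0.55
  \quad (\text{consistent with the reported } 0.56 \text{ after rounding}) \\
s_d &\approx \frac{4.6 - (-3.5)}{2 \times 1.96} = \frac{8.1}{3.92} \approx 2.07
  \ \text{mL O}_2 \cdot \text{kg}^{-1} \cdot \text{min}^{-1} \\
\frac{4.6}{21.4} \times 100\% &\approx 21.5\%
  \quad (\text{upper LOA relative to the first-test mean}) \\
\frac{4.6}{22.7} \times 100\% &\approx 20.3\%
  \quad (\text{upper LOA relative to the review average})
\end{align*}
```

Under these assumptions, an individual increase would need to be in the order of 20% of a typical VO2max in pwPD before it exceeds the upper limit of agreement.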

Reported and calculated associations of VO2max in PwPD
Five studies [9,34,35,37,47] reported associations between VO2max and clinical as well as functional outcomes. Ivey et al. [35] (n = 70) showed that neither UPDRS total score (b ± standard error (SE): −0.024 ± 0.034, p = 0.492) nor UPDRS motor score (UPDRS III) (b ± SE: −0.038 ± 0.048, p = 0.429) was associated with VO2max after adjusting for age and sex in mildly to moderately affected pwPD. Similarly, Hoehn and Yahr scale and Hoehn and Yahr scale combined with the UPDRS measures were not associated with VO2max. In contrast, age (b ± SE: −0.190 ± 0.042, p < 0.001) and sex (b ± SE: −3.821 ± 0.986, p < 0.001) were both negatively associated with VO2max, representing a decline in VO2max with increasing age and a reduced VO2max in women compared to men, respectively. Canning et al. [9] (n = 20) showed that Watt-peak was positively associated with VO2max (r = 0.74, p < 0.002). Moreover, there was no difference between the measured and calculated linear regression lines of Watt-peak and VO2max (i.e., participants consumed the predicted amount of oxygen when achieving their maximal work rate). Garber and Friedman [47] (n = 37) found a moderate negative association between the Fatigue Severity Score Index (FSS) and VO2max (−0.49, p = 0.011). Katzel et al. [37] (n = 70) examined time (test number), age, sex, race, medical comorbidities, UPDRS total, UPDRS III, and Hoehn and Yahr scale to see if they predicted the change in VO2max from the first to the second test, when performed one week apart. None of the variables showed significant associations with VO2max. Likewise, including all variables from the study in a multiple regression analysis did not predict VO2max. Johansson et al. [34] (n = 21) found a moderate positive association between frontoparietal network connectivity (dorsolateral prefrontal cortex-right frontoparietal network) and VO2max (0.62, p = 0.003).

Discussion
The present systematic review of VO2max in pwPD provides a comprehensive summary of the existing literature. In order for both clinicians and researchers to prescribe and/or evaluate exercise interventions, valid and reliable assessment of VO2max is crucial. However, only one study specifically investigated the test-retest reliability of the VO2max test in pwPD with limited measures reported, highlighting a knowledge gap regarding the psychometric properties of the VO2max test in PD. Furthermore, a general lack of, or incomplete use of, VO2max end-criteria along with an inadequate reporting of these data was observed in the identified literature. Of note, no difference was found in aerobic capacity as measured by VO2max between any PD groups (across different medication states) or when comparing pwPD to HC, suggesting that the pwPD in the studies were fit and active. However, this should be interpreted cautiously as studies including HC were few. Our search for VO2max associations identified fatigue as a potential "treatment-target" for aerobic exercise interventions, but generally the investigation of associations between VO2max and other non-motor as well as motor outcomes was limited.

Psychometric properties of the VO2max test in PwPD
Although whole-body exercise testing in terms of VO2max has been used to evaluate aerobic capacity in numerous studies in pwPD [30,31,32,42,46], very little research has been done evaluating the psychometric properties of VO2max testing in this population. This is problematic since the psychometric properties of the VO2max test may differ substantially in pwPD compared to healthy people or other populations, due to higher day-to-day variation caused by the many symptoms of PD, drug exposure, or additional PD-related aspects. Only one study by Katzel et al. [37], evaluating the test-retest reliability of the test, was identified. Of note, a learning effect was observed across all three VO2max tests. The LOA for VO2max between the first and second test ranged from −3.5 to 4.6 mL O2·kg⁻¹·min⁻¹. This implies that for a change to exceed random or systematic error, a positive change beyond 4.6 mL O2·kg⁻¹·min⁻¹ is needed [27]. An increase exceeding 4.6 mL O2·kg⁻¹·min⁻¹ in VO2max corresponds to a change of ≥20% of the average PD VO2max presented by Katzel et al. (21.4 mL O2·kg⁻¹·min⁻¹) and the present review (22.7 mL O2·kg⁻¹·min⁻¹). In comparison, Langeskov-Christensen et al. investigated the validity and reliability of the VO2max test in people with multiple sclerosis (MS) [59]. Results from this study support that a valid test of VO2max, at a level corresponding to that of HCs, can be performed in mildly to moderately impaired people with MS with a day-to-day variation of 10% in VO2max. Studies assessing the effects of aerobic exercise on aerobic capacity in pwPD [30,31,32,42,46,57] report changes in VO2max ranging from 0% to 22% after 6–12 weeks of moderate to high-intensity exercise interventions. However, only one study reported a change in VO2max exceeding ≥20% [31]. The SDC was not specified in the reliability study by Katzel et al. [37], although this could provide further insight into the changes required in order to be certain that a change is beyond methodological variability. The SDC on a group level might differ significantly from the SDC on an individual level, as previously reported in persons with MS [60]. No Bland-Altman plot was reported for the within-subject change between the second and third test, although these data could have provided further insight into the effects of a systematic error (familiarization effect). Based on their study results, Katzel et al. [37] recommended two assessments of VO2max for intervention studies, although they concluded that a single test would probably be sufficient for characterizing fitness levels in cross-sectional studies. Furthermore, their high ICC values made them conclude that the VO2max test is reliable and repeatable in participants with mild to moderate PD. A detailed examination of studies that included VO2max measurements in pwPD (Table 2) showed that the classical test-validation end-criteria, which indicate if a "true" VO2max test has been undertaken, are often inadequately reported.
Notes to Table 2: n: sample size; RER: respiratory exchange ratio; RPE: rating of perceived exertion, Borg Scale (1–10 or 6–20); VO2max: maximal oxygen uptake; SD: standard deviation. Data are presented as mean ± SD/(range) unless otherwise specified. If studies had more than one PD group with the same medication status, a sample size weighted mean was calculated for the individual study. Total means ± 95% confidence interval (CI) for all and on/off medication (sample size and variability weighted) were calculated based on studies reporting mean values only. Percentage of predicted HRmax was calculated based on the formula (HRmax / (208 − 0.7 × age)) × 100. Total means of % predicted HRmax were weighted based on sample size only. a: Data reported in median.
Moreover, only 12 out of 23 studies reported one or more end-criteria, even though more studies described a set of minimum end-criteria for the test to be considered a true maximal test [31,35,36,37,38,39,40,46,47,49,51,52,54]. Across the seven studies that reported RER, the mean RER in pwPD on medication exceeded 1.05 (recommended value for healthy population >50 years [22]). Yet, with the exception of one study (reporting that one out of 16 pwPD did not achieve a cut-off value of 1.05 [9]), none of the remaining six studies reported how many individuals achieved the predefined minimum value for RER (different values applied across studies). Six out of the seven individual studies had a mean RER value exceeding 1.05, while the last study had a mean RER of 1.02. As for the RPE criterion, this was assessed with the Borg scale CR-10 in all studies, but no predefined minimum end value was reported in any of the studies. An exploratory percentage conversion from Borg 6–20 (perceived exertion ≥17 indicating an approved result) to Borg CR-10 will give an approved result when pwPD rate ≥8. No individual study mean, or total mean, exceeded this value, indicating that pwPD might have difficulties reaching and/or judging their own maximum level of exertion. Difficulties for pwPD to reach the maximum level of exertion have also been suggested to be due to mitochondrial dysfunction [61]. The average percent of the attained age-predicted HRmax did not lie within the standardized maximum of ±10% variation for all pwPD and separately for pwPD on medication, but pwPD off medication reached 90% of their age-predicted HRmax. The HRmax was the most reported end-criterion, with 11 studies reporting this. However, using HRmax as an end-criterion has been criticized since the standard deviation is higher in the older age groups (65–85 years: ±15.0) [22].
Notes to Table 3: RCT: randomized controlled trial; n: sample size; [IQR]: interquartile range; PD G1, G2, and G3: different Parkinson's disease groups from the same study; -II-: same study population; HCs: healthy controls; NR: not reported. UPDRS: Unified Parkinson's Disease Rating Scale total; UPDRS III: UPDRS motor score; H&Y: Hoehn and Yahr Scale; VO2max: maximal oxygen uptake, reported in mL O2·kg⁻¹·min⁻¹ unless otherwise stated. If VO2max was reported in mL O2·min⁻¹ or L O2·min⁻¹, data were normalized to bodyweight when possible. Data are presented as mean ± SD (standard deviation)/(range) unless otherwise specified. Disease duration was converted to years if months were reported in the original study. a: Data reported in median. b: One or few persons differ in medication status from the rest of the group. c: VO2max reported in L O2·min⁻¹. d: VO2max reported in mL O2·min⁻¹. e: UPDRS total and III (motor score) examined off medication. f: Movement Disorder Society revised UPDRS (MDS-UPDRS). g: Parameter examined in smaller group than reported in the table.
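The exploratory conversion between the two Borg scales mentioned above can be illustrated with a small sketch. The review does not state the exact conversion it used, so the range-based linear rescaling below is an assumption, and the function name is hypothetical.

```python
def borg_6_20_to_cr10(rating: float) -> float:
    """Rescale a Borg 6-20 rating to a 0-10 range by its position within the scale.

    Assumed method: linear rescaling of the 6-20 range onto 0-10.
    """
    if not 6 <= rating <= 20:
        raise ValueError("Borg 6-20 ratings lie between 6 and 20")
    return (rating - 6) / (20 - 6) * 10

# The end-criterion threshold of 17 on the 6-20 scale.
converted = borg_6_20_to_cr10(17)
print(round(converted, 2), round(converted))
```

Under this assumption, a rating of 17 maps to roughly 7.9, which rounds to the threshold of 8 on the CR-10 scale used in the text.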
Also, while the American College of Sports Medicine stated years ago that age-predicted HRmax should not be used as an absolute criterion for maximal effort [22], the current review emphasizes that the criterion is still widely used when assessing VO2max in pwPD. No studies reported VO2 plateau/leveling off or post-test blood lactate levels, although a VO2 plateau is often recognized as the primary criterion for determining if a true VO2max was achieved [21]. However, a lack of consistency in the literature still exists on how to evaluate VO2 plateau/leveling off [22][23][24], which might explain why this criterion is often left out. Taken together, assessment and reporting of validation end-criteria are inconsistent in the PD literature. Consequently, the term VO2peak is regularly used when involving clinical populations, as there is an assumption that these persons seldom reach their true VO2max. This is problematic since an accurate and valid VO2max value is of substantial physiological importance when analyzing the health and performance of the implicated participants. Katzel et al. [37] stated that they anticipated that many of the included deconditioned pwPD would not be able to obtain a true VO2max based on the standard end-criteria. This was confirmed as only 7/63 (11%) participants not taking beta blockers achieved a true VO2max (based on RER > 1.1 and HRmax >85% of age-predicted HRmax). Of note, however, Magnan et al. [62] examined VO2max end-criteria (VO2 plateau, RER, RPE, and HRmax) in healthy inactive individuals and their results raised questions about the validity of the commonly used end-criteria when applied in less active populations. This underlines the need for a full evaluation of the psychometric properties of the VO2max test in pwPD, as sedentary behavior in general typifies pwPD because of motor and non-motor symptoms [8,63,64,65]. In addition, Midgley et al. [24] published a critique of the existing end-criteria used to determine VO2max. They emphasized, in a larger updated survey, the considerable variation in the classical VO2max end-criteria. Between 2005 and 2006, a total of 62% of the studies addressing the VO2max test did not use or report any end-criteria at all, which is a pattern that seems to continue based on the present review. Interestingly, respiratory frequency, with a cut-point of ≥40 breaths/min, has been suggested [66] as an additional secondary end-criterion for defining maximal effort in patients with cancer, as this criterion was found to be useful as part of the effort evaluation in people performing a VO2max test. This might also apply to pwPD, yet warrants further investigation.
Based on the present review, a knowledge gap exists regarding the psychometric properties of the VO2max test in pwPD, despite the frequent use of the test when evaluating aerobic capacity in this population. The studies addressing the psychometric properties show heterogeneous results from which no well-founded conclusion can be drawn regarding the validity, reliability, and responsiveness of the VO2max test in pwPD. Furthermore, future research should attempt to standardize end-criteria of VO2max testing and the proportion that needs to be satisfied to confirm test validity. The few existing data suggest that reliability should optimally be evaluated based on three repeated tests, which enables assessment of retest reliability with or without familiarization. Furthermore, clinically applicable measures (e.g., SEM, LOA, CV%, SDC, and ICC) should be reported to a greater extent.
Figure notes: If studies had more than one PD group with the same medication status (see Table 3), a sample size weighted mean ± SD was calculated for the individual study. PDon: persons with PD (pwPD) on medication; PDoff: pwPD off medication; PDNR: medication status not reported. (A) VO2max in L O2·min⁻¹ (not possible to normalize to bodyweight). (B) VO2max in mL O2·kg⁻¹·min⁻¹. Italics indicate VO2max reported in median. (C) Sample size, variability, and study weighted means with 95% confidence intervals for all studies reporting mean values and SD. VO2max: maximal oxygen uptake; N: number of studies. #: This study also examined HC, yet we were unable to retrieve these data in mean for comparison from the study authors.

Aerobic capacity in PwPD
Studies comparing VO2max in pwPD to HC (N = 6), as well as across medication states in pwPD, revealed no differences between groups. Based on aerobic capacity results from other neurological disorders (i.e., MS [15]), along with the notion that pwPD are less physically active than HC, the current results are somewhat unexpected. Furthermore, pwPD in the on medication state would in theory be able to achieve higher VO2max levels than pwPD in the off medication state because of the beneficial motor effects of antiparkinson agents. However, these discrepancies may in part be explained by the relatively mild disease severity of the included pwPD [67] (i.e., UPDRS III: unweighted mean = 20.1, range 12–32), which complicates distinguishing pwPD from HC in particular. Another observation in relation to the findings regarding different medication states is that very few and small studies reported VO2max in the off medication state (i.e., five studies, of which only three reported VO2max in mL O2·kg⁻¹·min⁻¹, allowing a comparison between studies). When comparing studies including both PD and HC groups, three studies (total pwPD = 42 and HC = 45) [39,40,51,52] reported no difference in VO2max between groups, whereas three studies reported higher values in HC compared to pwPD (total pwPD = 149 and HC = 93) [33,44,48]. Those studies reporting no difference between pwPD and HC did not report UPDRS III scores, whereas two of the three studies that found a difference in favor of HC reported a UPDRS III score of 12 [33] and 17 [44], respectively. Studies showing higher VO2max values in HC compared to pwPD were generally based on larger cohorts than studies showing no differences. For example, the study with the largest sample size, and thus arguably the most robust and reliable data, by Mavrommati et al. [44] (PD: n = 83, HC: n = 55) found HC to have higher VO2max than pwPD (PD (median (range)): 1.46 (2.35) L O2·min⁻¹, HC: 1.69 (2.57) L O2·min⁻¹, p = 0.008), indicating that VO2max may in fact be decreased in pwPD when compared to HC. Interestingly, this difference was shown without adjusting for a higher proportion of men in the PD group. However, no solid conclusions can be drawn regarding VO2max levels in PD compared to HC due to the sparse and somewhat conflicting existing data.

Reported and calculated associations of VO2max in PwPD
In other neurological populations, VO2max is associated with a number of clinical, physical, and psychological parameters [15]. Across the identified PD studies, only a few reported associations. One study did not observe associations between UPDRS total or UPDRS III and VO2max [35]. This finding might be explained by the narrow range of (mild to moderate) UPDRS III scores. Moreover, floor effects may limit the sensitivity of the UPDRS III in the milder stages of the disease. Another explanation may relate to the items assessed in the UPDRS scores. The UPDRS III predominantly focuses on the motor features of PD, such as bradykinesia, rigidity, and tremor [68]. Consequently, none of the items in the UPDRS assesses the level of physical activity or functions related to endurance. Ivey et al. [35] point to the fact that studies [69,70] have shown relatively strong associations between various balance scores and UPDRS, but failed to observe associations with strict ambulatory functions. One study [70] concluded that the single item of gait assessment in the UPDRS III is inadequate to reflect performance related to aerobic capacity in mildly to moderately impaired pwPD, which the results from the present review support. Additionally, Canning et al. [9] showed that VO2max was not related to disease severity in terms of the Hoehn and Yahr scale, and similarly Christiansen et al. [71] found that submaximal VO2 values were not associated with UPDRS total.
[Figure caption fragment] UPDRS III (before slash) and VO2max (after slash)) and (B) disease duration (converted to years if months were reported in original study; on and off represent medication status in the examination of VO2max). Each dot represents the mean value (studies reporting other than mean values were not included) of an individual study. If studies had more than one PD group, a sample size weighted mean for MDS-UPDRS III/UPDRS III, disease duration, and VO2max was calculated for the individual study. MDS-UPDRS III/UPDRS III: Movement Disorder Society (MDS) Unified Parkinson's Disease Rating Scale (UPDRS) motor score (part III); VO2max: maximal oxygen uptake.
Calculated associations (Figure 3), based on extracted data of the included studies, also did not show associations between UPDRS III and VO2max, which confirms the pattern of reported associations. Taken together, it seems reasonable to suggest that in mildly to moderately affected pwPD, the UPDRS score is not associated with VO2max. The inability to find associations between this marker for disease progression and VO2max might be due to the sparse existing literature in this field. As recommended by Ivey et al. [35], future studies should compare UPDRS with a larger battery of objective functional outcomes and include a wider range of disease severity. Lastly, and in contrast to the above, the recent study by Johansson et al. [34] (n = 21) found a moderate positive association between frontoparietal network connectivity (dorsolateral prefrontal cortex-right frontoparietal network) and VO2max, suggesting that aerobic exercise may alter/stabilize brain function, and possibly underpin associated benefits on motor function in PD.
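The study-level procedure behind the calculated associations (one sample size weighted mean per study, followed by simple linear regression across study means) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code of the present review: the function names are hypothetical, ordinary least squares is assumed for the regression, and all numbers in the usage example are invented rather than extracted from the included studies.

```python
import math
from typing import Sequence, Tuple


def weighted_mean(means: Sequence[float], ns: Sequence[int]) -> float:
    """Sample size weighted mean of several group means within one study."""
    total_n = sum(ns)
    return sum(m * n for m, n in zip(means, ns)) / total_n


def simple_linear_regression(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Ordinary least squares fit y = intercept + slope * x.

    Returns (slope, intercept, Pearson r).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    # Invented example: one study with two PD groups (n = 10 and n = 30).
    study_vo2max = weighted_mean([20.0, 26.0], [10, 30])
    print(f"Study-level VO2max: {study_vo2max:.1f}")

    # Invented study-level means: UPDRS III score vs. VO2max.
    updrs_iii = [12.0, 17.0, 20.0, 25.0, 32.0]
    vo2max = [24.5, 22.0, 25.0, 21.5, 23.0]
    slope, intercept, r = simple_linear_regression(updrs_iii, vo2max)
    print(f"slope={slope:.3f}, intercept={intercept:.2f}, r={r:.2f}")
```

Because each study contributes a single dot, a regression of this kind describes between-study patterns only and cannot establish associations at the level of the individual person with PD.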

Methodological considerations and future perspectives
Several methodological issues should be kept in mind when interpreting the results of the present review. First, the low number and heterogeneous quality of the included studies (Supplementary table 1) call for cautious interpretation of the existing data on VO2max in pwPD. Second, the PD study populations from the individual studies are relatively homogeneous in relation to age, MDS-UPDRS/UPDRS III or Hoehn and Yahr scale (i.e., pwPD are mildly to moderately impaired), and VO2max. This is of importance as it may narrow the data range, thereby reducing the possibility of detecting associations. The existing data suggest that the psychometric properties of VO2max testing are understudied in pwPD, which limits interpretation and calls for future studies. There is a need to determine the SDC and reliability of all VO2max measures for more effective prescription of aerobic exercise interventions. The fact that no clear difference was observed in VO2max when comparing pwPD to HC does not offer strong support for implementing aerobic exercise interventions known to improve VO2max in pwPD. However, some of the identified studies did find a difference in VO2max favoring HC, while aerobic exercise is also known to have beneficial effects on a number of symptoms and potentially neurodegenerative processes associated with PD [34,72,73]. Consequently, this justifies the current use of aerobic interventions in pwPD, but also highlights that further studies comparing VO2max in pwPD (with different degrees of disease severity) to HC are needed. Lastly, the identified association between VO2max and fatigue suggests aerobic exercise as a potential symptomatic treatment of fatigue, as also seen in other neurological disorders [74].

Conclusions
In mildly to moderately affected pwPD, only one study has examined psychometric properties of the VO2max test, reporting excellent test-retest reliability. A general lack of consistency on how to measure and report end-criteria was observed, but when reported, end-criteria were most often not met. No difference was found in VO2max between mildly to moderately affected pwPD and HC, or between pwPD across different medication states. Neither the identified studies nor the exploratory analysis in the present review found any parameters, except age, sex, and fatigue, that were associated with VO2max. None of the identified studies examined severely affected pwPD.