Assessing adult physical activity and compliance with 2008 CDC guidelines using a Bayesian two-part measurement error model

While there is wide agreement that physical activity is an important component of a healthy lifestyle, it is unclear how many people adhere to public health recommendations on physical activity. The Physical Activity Guidelines (PAG), published by the CDC, provide guidelines to American adults, but it is difficult to assess compliance with these guidelines. The PAG further complicate adherence assessment by recommending that activity occur in bouts of at least 10 minutes. To better understand the measurement capabilities of various instruments to quantify activity, and to propose an approach to evaluate activity relative to the PAG, researchers at Iowa State University administered the Physical Activity Measurement Survey (PAMS) to over 1,000 participants in four different Iowa counties. In this paper, we develop a two-part Bayesian measurement error model and apply it to the PAMS data in order to assess compliance with the PAG in the Iowa adult population. The model accurately accounts for the 10-minute bout requirement put forth in the PAG. The measurement error model corrects biased estimates and accounts for day-to-day variation in activity. The model is also applied to data from the nationally representative National Health and Nutrition Examination Survey.

In recent years the pace of research in physical activity and its effect on health has accelerated. According to the Centers for Disease Control and Prevention (CDC), over 70% of Americans age 20 and over are overweight or obese, and almost 40% are obese. In 2008, the Department of Health and Human Services issued the 2008 Physical Activity Guidelines for Americans (PAG). This was the first time that physical activity guidelines were published by the federal government.
The PAG recommends that adults spend at least 150 minutes each week carrying out moderate-intensity activity, at least 75 minutes in vigorous-intensity activity, or some combination of moderate to vigorous physical activity (MVPA). Furthermore, it recommends that this activity be in intervals, or bouts, of at least 10 minutes. The guidelines define moderate intensity as a 5 or 6 on an intensity scale of 0 to 10; a brisk walk is an example. Vigorous intensity is a 7 or 8 on the same scale; jogging and lap swimming are examples. In addition, the PAG recommends doing muscle-strengthening activities that involve all major muscle groups twice or more per week. These are the minimum levels of activity that are expected to have an effect on health. The report also advises that any physical activity above the minimum will result in additional health benefits.
A public health question is: what proportion of the population adheres to these guidelines, and does that proportion change with age, sex, or other demographic variables? This type of information is important for policy makers not only to assess compliance but also to design interventions that target certain subpopulations. Yet, there is no agreement about how to measure physical activity. Furthermore, the only nationally representative source of physical activity measurements is the National Health and Nutrition Examination Survey (NHANES). To better understand the measurement error associated with different instruments, Iowa State University conducted the NIH-funded Physical Activity Measurement Survey (PAMS) to collect physical activity information [3]. The objective of the study was to understand the measurement error of different methods of measuring physical activity in adults. We use the PAMS data to develop a Bayesian two-part modeling approach that accounts for measurement error in physical activity observations. That modeling approach can be used to determine the proportion of the Iowa adult population that adheres to the PAG. Results are compared at a national level with NHANES data, and possible reasons for the differences are discussed.
This paper is organized as follows. Section 2 introduces PAMS and reviews the literature on approaches to measure physical activity. Section 3 develops a two-part model to jointly model the distribution of daily 10-minute bouts and the average excess minutes of MVPA. Section 4 presents the fitted model's results in the context of the application. In that section, we also compare results with those obtained using NHANES data and discuss differences. Section 5 presents model diagnostics and goodness of fit. Section 6 discusses results and suggests future work.

A brief review of physical activity measurement
The term "physical activity" is not well defined, so it is hardly surprising that many different approaches to quantify physical activity have been proposed in recent years. If we think of physical activity as the amount of energy expended per day during a short period (e.g., two weeks), then doubly-labeled water is considered to be the gold standard among measurement instruments [4,14]. However, it is impractical to use doubly-labeled water in large studies, not only because of cost but also because of respondent burden.
In practice, instruments such as accelerometers that measure movement have become commonplace. Accelerometers provide estimates of movement through uni-, bi-, or triaxial measurements. Measurements of activity are then often reported as "counts", as with the Actigraph, or as metabolic equivalents (MET), as with Sensewear Armbands (SWA). Typically, raw accelerometer data are converted to counts or METs using proprietary algorithms, but there are new attempts to analyze raw accelerometry data [2]. Urbanek et al. [40] use the full, raw accelerometry data to create new measures of stride-to-stride gait variation. He et al. [15] use wavelets [1] to classify activity types based on accelerometry data. This may help compensate for the fact that accelerometry data provide no information about the context in which physical activity takes place. There is a rich literature that focuses on the relationship between total counts per day and some outcome variable [31,34]. Other authors use count data at the hour level to further understand how physical activity levels vary by demographic groups [34,35]. Functional data analysis is another way to model and analyze high-frequency accelerometer data over short time frames, for example to study how varying activity levels within a day relate to covariates. Xiao et al. [42] provide methods to model the systematic and random patterns of physical activity while accounting for dependence on covariates such as age and gender. Fan et al. [11] use functional ANOVA to assess the circadian activity profiles of teenage girls. Goldsmith et al. [13] use functional scalar regression to understand the association between physical activity and a variety of covariates.

Description of PAMS
The PAMS was conducted over two years starting in 2009 in Iowa. The goal was to obtain information on physical activity of adult men and women. The survey was conducted in two stages across four counties and included two strata per county. In each county there was a "high minority population" and "low minority population" stratum to improve chances of recruiting African American and Hispanic individuals. Eligible participants included adults between 21 and 71, with the ability to engage in physical activity, who were not pregnant or lactating, were able to speak English or Spanish, and had a landline in their place of residence. A summary of the demographic characteristics of PAMS participants is given in Table 1. Energy expenditure (EE) information was collected on two separate occasions using SWA. In order to mitigate dependence in activity across days for an individual, the two measurements were taken 2-3 weeks apart. The SWA provides MET levels every minute. A MET is a measure of energy cost for a particular physical activity. Formally, 1 MET is defined as 0.0175 kcal/kg/min expended. METs can be thought of as a multiplicative effort to carry out the activity relative to resting state. An activity that is classified as 5 METs then requires about 5 times the energy that is required to be at rest. MET-minutes are the number of minutes in an activity multiplied by the MET value of that activity (we include a figure in the supplemental material of the raw data for three individuals from the SWA for reference).
The method with which the SWA calculates MET-minutes is proprietary, but the SWA's measurement properties and validity have been studied. Hills et al. [17] and [16] found the SWA to be an accurate measure of physical activity. Santos et al. [29] and Scheers et al. [30] found that the SWA tends to overestimate MVPA. Calabro et al. [6] also found that the SWA slightly overestimates physical activity, but it was much closer to the truth than the other accelerometers used, which underestimated physical activity by a greater magnitude. Casiraghi et al. [7] note that the SWA is a good measurement instrument for certain activities like running and walking, but its use is limited in activities like cycling and swimming. The Compendium of Physical Activities gives MET values for many common daily activities.
The PAG defines activities with METs ranging from 3.0 to 6.0 as moderate intensity and activity with METs greater than 6.0 as vigorous. This means that the recommendation of 150 minutes of moderate physical activity is equivalent to 150×3.0 = 450 MET-minutes per week. Another stipulation in the PAG is that activity must occur in bouts of at least 10 minutes to count toward this total. In practice, what constitutes a bout is less clear. To address the research questions we need an operational definition of what constitutes a bout.
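The threshold arithmetic above can be written out as a short Python sketch. The function and constant names are hypothetical and only illustrate the conversion from minutes and MET levels to MET-minutes described in the text.

```python
# Illustrative arithmetic only; all names here are hypothetical.
MODERATE_MET = 3.0      # lower bound of moderate intensity, per the text
WEEKLY_MINUTES = 150    # recommended weekly minutes of moderate activity


def met_minutes(duration_minutes, met_level):
    """MET-minutes = minutes spent in an activity times its MET value."""
    return duration_minutes * met_level


def weekly_threshold(minutes=WEEKLY_MINUTES, met_level=MODERATE_MET):
    """Weekly MET-minute threshold implied by the moderate-activity guideline."""
    return met_minutes(minutes, met_level)


threshold = weekly_threshold()          # 150 * 3.0 = 450 MET-minutes per week
shortest_bout = met_minutes(10, 3.0)    # a 10-minute bout at 3 METs = 30 MET-minutes
```

Under the same arithmetic, 75 minutes at exactly 6.0 METs also gives 450 MET-minutes.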

Definition of Bouts and Average Excess MET-minutes
We define a bout as a burst of activity in which at least 8 out of 10 minutes are spent at 3 METs or more [20,38]. This means that at least 8 minutes out of 10 minutes must be in at least moderate physical activity to count toward the recommended guidelines. We allow the 8 out of 10 to move along a rolling window, by shifting a 10-minute window, minute by minute, to determine if the time in the moving window counts towards at least 3 MET activity. As long as we observe ≤ 2 minutes of less than moderate activity (<3 METs), the clock continues to count MET-minutes for that bout. Once there are ≥ 3 minutes in less than moderate activity, the "clock stops" at the minute before the 3rd minute is reached. Further, the final 2 minutes of activity cannot be below moderate level. About 24% of the 24-hour data collections had zero bouts, and 11% of individuals in PAMS had zero bouts on both study days.
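The rolling-window rule leaves some room for interpretation, so the following Python sketch shows only one possible reading of it, not the authors' implementation. It assumes that a bout starts on an active minute, that the window is extended one minute at a time while it contains at most 2 sub-threshold minutes, and that trailing sub-threshold minutes are trimmed. The function names `find_bouts` and `bout_met_minutes` are hypothetical.

```python
def find_bouts(mets, threshold=3.0, window=10, max_low=2):
    """Return (start, end) index pairs, end exclusive, one per detected bout.

    `mets` is a list of per-minute MET values. This is one reading of the
    8-out-of-10 rolling-window rule, stated as an assumption.
    """
    active = [m >= threshold for m in mets]
    n = len(active)
    bouts = []
    i = 0
    while i + window <= n:
        # A bout may start at minute i only if that minute is active and the
        # first window holds at most `max_low` sub-threshold minutes.
        if active[i] and sum(active[i:i + window]) >= window - max_low:
            end = i + window
            # Slide the window forward one minute at a time.
            while end < n:
                low = window - sum(active[end - window + 1:end + 1])
                if low > max_low:
                    break  # a third sub-threshold minute stops the clock
                end += 1
            # Trailing sub-threshold minutes do not count toward the bout.
            while not active[end - 1]:
                end -= 1
            bouts.append((i, end))
            i = end
        else:
            i += 1
    return bouts


def bout_met_minutes(mets, bouts):
    """Sum the per-minute MET values inside each bout (an assumption:
    sub-threshold minutes inside a bout are included in the sum)."""
    return [sum(mets[start:end]) for start, end in bouts]


# Example: 5 resting minutes, 12 minutes at 4 METs, 5 resting minutes.
day = [1.5] * 5 + [4.0] * 12 + [1.5] * 5
bouts = find_bouts(day)
totals = bout_met_minutes(day, bouts)
```

Other readings (for example, requiring every bout to span at least 10 minutes after trimming) would change the counts, so any comparison with published bout numbers should state which variant was used.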
Total MET-mins in MVPA is zero for individuals with zero bouts, and is ≥ 30 for individuals with a minimum of one bout (10 minutes × 3 METs = 30 MET-mins in MVPA). To account for these constraints, define the random variables $Y_{1ij}$ and $Y_{2ij}$ as in Equation (2), and refer to $Y_{2ij}$ as the average excess MET-minutes. There were several outliers in both number of bouts and total MET-minutes; we removed persons with more than 2500 total MET-minutes because they are believed to be mismeasurements. Figure 1 plots $Y_2$ against $Y_1$. The range in the plot is constrained between 1 and 13 bouts per day to ensure that there are at least 20 observations in each bout boxplot. Even after accounting for the number of bouts, the median of $Y_{2ij}$ is positively related to the number of bouts. This means that those who have more bouts often engage in longer or more intense bouts. Additionally, the distribution of residuals of $Y_2$ is right skewed, and a log transformation makes these residuals resemble a Normal distribution. Further details and a plot are in Section 1 of the supplemental material.
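The defining equation for the two response variables did not survive in this copy of the text. The Python sketch below therefore rests on an assumption: that the number of bouts is the count of bouts in a day, and that the average excess MET-minutes is the daily total beyond the 30 MET-minute minimum per bout, divided by the number of bouts. The function names are hypothetical.

```python
def summarize_day(bout_totals, min_per_bout=30.0):
    """Return (y1, y2) for one person-day.

    bout_totals: MET-minutes accumulated in each bout that day.
    y1: number of bouts.
    y2: average excess MET-minutes per bout (ASSUMED definition:
        (total - 30 * y1) / y1, and 0 when there are no bouts).
    """
    y1 = len(bout_totals)
    if y1 == 0:
        return 0, 0.0
    total = sum(bout_totals)
    y2 = (total - min_per_bout * y1) / y1
    return y1, y2


def total_from_summary(y1, y2, min_per_bout=30.0):
    """Invert the assumed definition: total MET-minutes in MVPA for the day."""
    return y1 * (y2 + min_per_bout)


y1, y2 = summarize_day([48.0, 36.0])   # two bouts totaling 84 MET-minutes
```

Under this assumed definition the pair carries the same information as the daily total, while respecting the constraint that the total is either zero or at least 30 MET-minutes per bout.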

Checking for Day Effect of Observations
Creating a two-way contingency table for the number of bouts on day one versus the number of bouts on day two allows us to check whether exchangeability is a reasonable assumption for $Y_{i1}$ and $Y_{i2}$ using Bowker's test [5]. The p-value for the hypothesis test was 0.12, indicating that exchangeability is not an unreasonable assumption. The contingency table and a further description of the test are in Section 2 of the supplemental material. We also checked for a weekend effect on $Y_1$ using a paired t-test, which resulted in a p-value of 0.50. Since there is no obvious indication of a day effect, bouts within individuals are assumed exchangeable. A day effect is also possible for $Y_2$, which depends on the number of bouts (Figure 1). There is also interest in knowing whether there is an effect of weekend on $Y_2$. To explore the association between $Y_2$ across days, we fit the following linear model: where $Weekend_{ij}$ is an indicator for weekday (M-F) versus weekend (Sat or Sun), $\beta_0$ represents the day effect and $\beta_2$ represents the weekend effect. Hypothesis tests for a day or a weekend effect on $Y_2$ indicate no effect (p-values = 0.63 and 0.65, respectively). Since there is no obvious indication of a day effect, we will assume that average excess MET-minutes within individuals are exchangeable.
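For readers unfamiliar with Bowker's test of symmetry, the following Python sketch computes its test statistic for a square contingency table. The function name `bowker_statistic` is hypothetical, the example table is made up for illustration and is not the PAMS table, and the degrees-of-freedom convention (counting only off-diagonal pairs with a nonzero total) is one common choice rather than the only one.

```python
def bowker_statistic(table):
    """Bowker's symmetry statistic for a square table of counts.

    table[r][c] is the number of individuals in category r on day one
    and category c on day two. Returns (statistic, degrees_of_freedom).
    Under symmetry the statistic is approximately chi-square distributed.
    """
    k = len(table)
    stat = 0.0
    df = 0
    for r in range(k):
        for c in range(r + 1, k):
            pair_total = table[r][c] + table[c][r]
            if pair_total > 0:
                stat += (table[r][c] - table[c][r]) ** 2 / pair_total
                df += 1  # assumption: empty pairs do not contribute a df
    return stat, df


# Made-up 3x3 example (NOT the PAMS data).
example = [[10, 5, 2],
           [3, 8, 4],
           [2, 6, 7]]
stat, df = bowker_statistic(example)
```

The p-value would then come from the upper tail of a chi-square distribution with `df` degrees of freedom, which is left to a statistics library.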

Model for MET-mins in MVPA During at least 10 Minute Bouts
We introduced the correlated random variables $Y_{1ij}$ and $Y_{2ij}$ earlier as the response variables. In this section, we present a measurement error model for $Y_1$ and $Y_2$.

Notation and Data
After removing outliers and individuals without a replicate observation, we have N = 2114 observations obtained on n = 1057 individuals. We let i represent the individual, i = 1, ..., 1057, and j represent the measurement occasion, j = 1, 2. We define a vector $Z_i$ of dimension eight that includes covariates for individual i: gender, age, indicators for Black, Hispanic, smoker, college degree, and physical job. The full model matrix is $Z = (Z_1, Z_2, \ldots, Z_{1057})'$. There were 315 instances of item non-response for occupation among the 1057 individuals, so we imputed the missing values using predictions from a logistic regression with physical job as the response and all remaining covariates in $Z$ as covariates.
Denote $T_{1ij}$ as individual i's unobservable true number of bouts on day j and $T_{2ij}$ as individual i's unobservable true average excess MET-minutes per bout on day j. We let $t_{1i}$ and $t_{2i}$ be the expected values of $T_{1ij}$ and $T_{2ij}$ conditional on individual i, respectively. We refer to these quantities as individual i's usual number of bouts in a day and usual average excess MET-mins per bout, respectively. More formally, $t_{1i} = E(T_{1ij} \mid i)$ and $t_{2i} = E(T_{2ij} \mid i)$.
Following [22], we assume that the measurements of physical activity are unbiased for the usual activity levels. This is a plausible assumption because the measurements are obtained using an objective instrument. We recognize that this is a strong assumption since we cannot validate it with the data we have. Future work should design data collections, such as the one in [33], in order to properly assess or relax this assumption. We also assume that the armband records zero bouts if and only if individual i participated in zero bouts of activity on day j. Formally, these assumptions can be expressed as:
To answer the original question of adherence to the PAG, individual i's usual total MET-minutes in MVPA for a day is defined as:

Modeling Number of Bouts
Individual i's number of bouts at measurement j, $Y_{1ij}$, is a count, so a natural model is the Poisson distribution. However, as Figure 3 in the supplemental material shows, there is within-person overdispersion present, so a standard Poisson model is not flexible enough for the PAMS data. We also considered a Negative Binomial distribution to handle the overdispersion. During model assessment, the Generalized Poisson proved to be a better fit (see Section 5).
An alternative to the Poisson distribution that allows for a more flexible mean-variance relationship is the Generalized Poisson distribution [8]. The Generalized Poisson distribution is indexed by two parameters, $\theta$ and $\lambda$, with probability mass function
$$P(Y = y \mid \theta, \lambda) = \frac{\theta (\theta + \lambda y)^{y-1} e^{-\theta - \lambda y}}{y!}, \quad y = 0, 1, 2, \ldots$$
The Generalized Poisson is overdispersed relative to a Poisson distribution if $\lambda > 0$, underdispersed if $\lambda < 0$, and a regular Poisson if $\lambda = 0$. When $0 < \lambda < 1$, the probability mass function and first two moments of the distribution can be written directly without truncation or normalization [8,32]. In this case, its expected value is $\theta/(1-\lambda)$ and the variance is $\theta/(1-\lambda)^3$. Reparametrizing the distribution in terms of the mean, $\mu$, the variance is $\mu/(1-\lambda)^2$. At this point we only concern ourselves with overdispersion, thus the restriction that $0 < \lambda < 1$ is appropriate.
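As a numerical illustration of these properties, the Python sketch below evaluates the standard Generalized Poisson probability mass function for $0 \le \lambda < 1$ and checks the stated mean and variance by summation. The function names and the parameter values are hypothetical and chosen only for the example.

```python
import math


def gen_poisson_logpmf(y, theta, lam):
    """Log-pmf of the Generalized Poisson for theta > 0 and 0 <= lam < 1."""
    if y < 0:
        return float("-inf")
    return (math.log(theta) + (y - 1) * math.log(theta + lam * y)
            - theta - lam * y - math.lgamma(y + 1))


def gen_poisson_pmf(y, theta, lam):
    return math.exp(gen_poisson_logpmf(y, theta, lam))


def theta_from_mean(mu, lam):
    """Mean parametrization: mu = theta / (1 - lam)."""
    return mu * (1.0 - lam)


# Check the first two moments numerically for illustrative parameter values.
theta, lam = 2.0, 0.3
probs = [gen_poisson_pmf(y, theta, lam) for y in range(400)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
# mean is close to theta / (1 - lam); var is close to theta / (1 - lam) ** 3
```

Setting `lam = 0` recovers the ordinary Poisson probabilities, which is a convenient sanity check when implementing the distribution.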
We model the mean of the Generalized Poisson distribution as a function of the covariates plus an individual random effect to capture across-person overdispersion. These random effects are assumed to be jointly Normal with the random effects for average excess MET-minutes. The priors for $\lambda$ and $\gamma$ are proper and independent, but relatively non-informative. The model for $Y_{1ij}$ is written as:

Modeling Average Excess MET-minutes
Average excess MET-minutes can take positive values or be zero; this type of data is commonly referred to as "semicontinuous data" and occurs often in the fields of epidemiology and nutrition. Many models for semicontinuous data have built upon the work of [25,36,37]. [23] and [24] propose Bayesian approaches for estimation in these models. Kipnis et al. [22] and [21] propose a measurement error approach for semicontinuous data via regression calibration in the context of a nutrition application.
To account for measurement error and the large number of zeros in the sample, we propose the following model for total excess MET-minutes: where $\pi_i = P(t_{2ij} > 0 \mid i) = P(Y_{1ij} > 0 \mid i)$ is individual i's probability of participating in at least one bout, which can be calculated using the Generalized Poisson probability mass function given parameters $\gamma$, $\lambda$, $b_{1i}$ and covariates $Z_i$. The priors for $\beta$, $\sigma^2_y$, and $\Sigma_b$ are conjugate, independent, and relatively noninformative. Sensitivity analysis showed little effect of the priors on inference for the variance components.
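Because the Generalized Poisson assigns probability $e^{-\theta}$ to zero, the probability of at least one bout has a closed form once the mean is specified. The Python sketch below assumes a log link for the mean, $\mu_i = \exp(Z_i'\gamma + b_{1i})$; the link function is an assumption made here for illustration because the model equation is not reproduced in this text, and the function name and inputs are hypothetical.

```python
import math


def prob_any_bout(z, gamma, b1, lam):
    """P(Y1 > 0) under a Generalized Poisson with mean mu and dispersion lam.

    ASSUMPTION: log link, mu = exp(z'gamma + b1).
    Uses theta = mu * (1 - lam) and P(Y1 = 0) = exp(-theta).
    """
    eta = sum(zk * gk for zk, gk in zip(z, gamma)) + b1
    mu = math.exp(eta)
    theta = mu * (1.0 - lam)
    return 1.0 - math.exp(-theta)


# Hypothetical covariate vector, coefficients, and random effect.
pi_i = prob_any_bout(z=[1.0, 0.5], gamma=[0.2, -0.1], b1=0.0, lam=0.09)
```

Holding the mean fixed, a larger $\lambda$ lowers this probability, which is how overdispersion produces more zero-bout days than a Poisson model with the same mean.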
The full likelihood for an individual can be written as: where $f(Y_{1ij} \mid \cdot)$, $f(Y_{2ij} \mid \cdot)$, and $f(b_{1i}, b_{2i} \mid \cdot)$ are as defined in Equations (8) and (9), and $\theta$ is a vector of all unknown parameters. Along with the assumption of independence between individuals, the full likelihood is:

Estimating Distribution of Usual Daily MVPA
Our goal is to estimate the proportion of Iowans who are in compliance with the PAG on average. To answer this question, we focus on the distribution of usual total MET-minutes in MVPA for individuals from a specified population in a day. We specify the population in which we are interested through the design matrix $Z$. To estimate this distribution, simulate draws of $t_3$ through the following: For $\ell = 1, 2, \ldots, L$ do: (1) Sample $\theta^{(\ell)}$ from the posterior distribution $p(\theta \mid Y)$.
The proportion of individuals from the population who meet the PAG in the $\ell$th draw is given by: If there are weights $w_i$ associated with the individuals of the design matrix $Z$, estimates of percentiles of the distribution of $t_3$ can be obtained by: Recall that our model is estimating usual daily MET-minutes in bouts, and our model already considers how often individuals participate in at least one bout of MVPA. Because of this, we can consider weekly activity to be 7 × usual daily MET-minutes in bouts.
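The formulas for the compliance proportion are not reproduced in this text, so the following Python sketch shows one plausible computation rather than the authors' estimator. It assumes that a draw consists of one value of usual daily MET-minutes per individual, that weekly activity is 7 times the daily value, and that compliance means reaching 450 MET-minutes per week. All names are hypothetical.

```python
def compliance_rate(t3_draw, weights=None, weekly_threshold=450.0):
    """Share of individuals whose 7 * usual daily MET-minutes meets the threshold.

    t3_draw: one posterior draw of usual daily MET-minutes, one value per person.
    weights: optional survey weights, one per person (ASSUMED to enter as a
             weighted mean of the compliance indicators).
    """
    meets = [1.0 if 7.0 * t >= weekly_threshold else 0.0 for t in t3_draw]
    if weights is None:
        return sum(meets) / len(meets)
    return sum(w * m for w, m in zip(weights, meets)) / sum(weights)


def posterior_mean_rate(draws, weights=None):
    """Average the per-draw compliance rates over all posterior draws."""
    rates = [compliance_rate(d, weights) for d in draws]
    return sum(rates) / len(rates)


# Two made-up draws for three individuals (NOT real posterior output).
draws = [[10.0, 70.0, 100.0], [60.0, 70.0, 100.0]]
rate_first = compliance_rate(draws[0])
rate_weighted = compliance_rate(draws[0], weights=[2.0, 1.0, 1.0])
```

Collecting the per-draw rates, rather than only their mean, also gives the empirical quantiles needed for a credible interval on the compliance rate.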

Results
We proceed with estimation via MCMC following [23] and [24], who propose Gibbs algorithms for two-part models with semicontinuous data that are nearly or completely conjugate. We construct a Gibbs algorithm for drawing samples from the posterior distribution, and since many of the priors are not conjugate, we need to use a Metropolis-within-Gibbs sampler. The Gibbs algorithm was written in C++ and R. Full conditional distributions can be found in Section 4 of the supplemental material. Starting values for the MCMC are obtained from maximum likelihood. We used the resulting MLEs and the lower and upper bounds of 99.99% confidence intervals as well-dispersed starting values for the regression parameters in order to use the Gelman-Rubin diagnostic in assessing chain convergence. We dispersed $\lambda$ between 0 and 1 for its starting values in the 3 chains. Values for $\Sigma_b$ and $\sigma^2_y$ were chosen such that starting values were far above and below the final region of the posterior distribution.
We ran 3 chains of length 500,000, with the first 50,000 draws as burn-in, and thinned every 15 iterations to save on memory and reduce the autocorrelation of parameter draws. Traceplots and Gelman-Rubin diagnostics (all < 1.05) indicated good mixing and no signs of non-convergence. The Monte Carlo standard error was calculated using the R package mcmcse. The MC error was less than 1.5% of the posterior standard deviation for all parameters.
Figure 2 shows posterior means and 95% credible intervals for all regression coefficients in the model. The signs of the coefficients and the relative interval widths nearly match for all covariates across the two parts of the model. Males and those with physical jobs tend to exhibit a higher number of bouts per day and more average excess MET-minutes per bout. BMI is negatively associated with bouts per day as well as average excess MET-minutes per bout. Age is negatively associated with bouts but not with average excess MET-minutes. Hispanic ethnicity is negatively associated with bouts per day. Black is negatively associated with average excess MET-minutes, but there is considerable uncertainty. Smoking is negatively associated with both number of bouts and average excess MET-minutes. The positive association of physical jobs with bouts and average excess MET-minutes makes intuitive sense, as people with these jobs engage in physical activity throughout the workday, and men more often have these jobs. The negative associations with BMI could be explained by the fact that those who participate in activity are less likely to be overweight, since those individuals are seeing the benefits of physical activity. Those with a college education may be more likely to have non-physical jobs, so it is possible they get their physical activity through voluntary exercise. This exercise could happen all at once outside work hours, which would explain the lack of a relationship with number of bouts. Table 2 shows posterior means and 95% credible intervals for the remaining parameters.
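To make the convergence check concrete, the Python sketch below computes the classic Gelman-Rubin potential scale reduction factor for one parameter from several chains, along with the number of draws retained per chain. It is an illustration only, not the code used for the analysis, which the text says was written in C++ and R. The function names are hypothetical, and the retained-draw count assumes burn-in is discarded before thinning.

```python
import math
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for one scalar parameter.

    chains: list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])
    chain_means = [statistics.mean(c) for c in chains]
    within = statistics.mean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


def retained_draws(chain_length, burn_in, thin):
    """Draws kept per chain, ASSUMING burn-in is dropped and then every
    `thin`-th iteration is kept."""
    return (chain_length - burn_in) // thin


per_chain = retained_draws(500_000, 50_000, 15)   # 30000 draws per chain
rhat_same = gelman_rubin([[1, 2, 3, 4], [1, 2, 3, 4]])
rhat_apart = gelman_rubin([[1, 2, 3, 4], [11, 12, 13, 14]])
```

Chains that explore the same region give a value near or below 1, while chains stuck in different regions give a value well above the 1.05 cutoff mentioned in the text.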
Recall that in a Generalized Poisson distribution, a value of $\lambda > 0$ indicates overdispersion. For the PAMS data, the estimate of $\lambda$ was 0.09 (0.08, 0.10). The estimated measurement error variances, $\sigma^2_{b1}$ and $\sigma^2_{b2}$, are large relative to the regression coefficients corresponding to their respective model component. This suggests that there is considerable day-to-day variation in physical activity and that the device measurements themselves are noisy. The estimate of $\rho_b$ is 0.41 (0.18, 0.77), indicating that there is a significant amount of correlation between the mean functions of $Y_1$ and $Y_2$.

Distribution of Usual MVPA
In Section 3.4 we explained how to generate distributions of MVPA in MET-minutes for any population of interest. Here we consider the PAMS population and differences by gender, BMI, and age. Table 3 shows PAG compliance rates for these different populations. The posterior mean compliance rate for the PAMS sample was 0.60, with a 95% credible interval of (0.46, 0.69). Figure 3 shows the distribution of daily usual MET-minutes for each of these populations with uncertainty. Other compliance rates match what the regression coefficients in the previous section suggested, i.e., that males, younger people, and those with lower BMI tend to have higher compliance.
Overall, these numbers are high compared to compliance across the entire United States [38]. However, these results also show there is significant variability among the population, indicating that interventions targeted to specific subpopulations could be more effective than targeting the entire adult population.

Usual MVPA using NHANES Data
Because the results for the PAMS showed a high level of compliance in Iowa, we apply this same model to a nationally representative survey, NHANES. NHANES is a large national survey that can be used to assess the health of Americans. The 2003-2006 NHANES is the most recent collection that included physical activity monitoring with accelerometers (ActiGraph AM-7164) worn on the hip. The aim was to compare the results from the PAMS study to a different large survey that collected accelerometry information. So that results obtained from the two surveys would be comparable, we used the method proposed by [18] to select a subsample from the NHANES participants of equal size to PAMS that would match the PAMS sample in other important ways, such as demographics. We implemented the method using their R package MatchIt [19]. The subsample from NHANES was selected such that each person in PAMS was matched to someone from NHANES on demographic variables including gender, age, race, education, and BMI. Unfortunately, NHANES does not report participants' occupation, a variable that we found to be significantly associated with physical activity. For the individuals we include from NHANES, we randomly sampled two days of accelerometer measurements from the six available days. To compute bouts for the NHANES participants, we used the minute-to-minute information, followed the approach suggested by [39], and set the threshold for moderate activity at 2020 counts per minute. Counts during minutes within bouts were then converted to MET-minutes using the method of [12]. The same model is fit to the subset of NHANES data. Estimated compliance with the PAG for the US population, as well as for the same populations as in Table 3, are shown in Table 4. Figure 4 shows the estimated distribution of daily MET-minutes for these same populations. The results for NHANES are similar to those in [38], but there is a large difference when compared to the results using PAMS.
Levels of activity are much lower in the NHANES data. These large differences may be attributed to several differences between PAMS and NHANES: i) PAMS is a sample of the population of four Iowa counties while NHANES is a nationally representative sample, ii) PAMS used the SWA to measure physical activity while NHANES used the Actigraph accelerometer, iii) compliance and wear time were much higher for PAMS, iv) the SWA uses a proprietary algorithm to calculate METs while we used Freedson et al.'s method to compute METs for NHANES. Finally, over 10 years elapsed between the two surveys. Consequently, we can expect differences in terms of the desirability of participating in physical activity.
Although the populations from which the samples were drawn are not directly comparable, we would not expect such a large difference between the two populations. Participants in PAMS wore their monitor for the entire day and night while NHANES participants were instructed to wear the device during waking hours, so this difference in wear time should not have a major effect on the measurement of MVPA. We believe that the major differences can be at least partially attributed to the variability in different brands of accelerometers and the way in which they convert movement to activity levels/METs. There is a large variety of methods and considerable variation between the methods of converting counts to METs [9].

[Table 5: Number of bouts (day one, day two); columns: Observed, Generalized Poisson, Negative Binomial]
To assess the fit of $Y_1$, we count the number of individuals who had zero bouts on day one and zero bouts on day two, the number of individuals who had one bout on day one and zero bouts on day two, and so on for all combinations of 0, 1, 2+ bouts. We stop at 2+ because if an individual has two bouts in a day, they will almost certainly achieve the recommended time in MVPA. Doing this for all M = 1000 simulated data sets, we calculate means for each category across all simulated data sets and compare them to our observed proportions using a Chi-square test for proportions. We also carry out this procedure using a Negative Binomial distribution for $Y_1$ instead of a Generalized Poisson, swapping distributional forms in Equation (8). Table 5 shows the results. The large p-value for the Generalized Poisson here indicates that data simulated from the fitted model look similar to the observed data, at least with respect to this specific statistic. The small p-value for the Negative Binomial model for $Y_1$ indicates a lack of fit, and therefore the Generalized Poisson model is preferred in this application.
We also calculate the mean within-person standard deviation of $Y_1$ and the within-person range of $Y_1$. The posterior predictive p-values for these are 0.7 and 0.139, respectively, which indicates no lack of fit.
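The two summary statistics and their posterior predictive p-values can be sketched in Python as follows. This is a minimal illustration assuming each individual contributes a pair of daily bout counts and that the p-value is the share of simulated data sets whose statistic is at least as large as the observed one; the direction of the comparison and all function names are assumptions.

```python
import statistics


def mean_within_sd(pairs):
    """Mean over individuals of the standard deviation of their two daily counts."""
    return statistics.mean([statistics.stdev(p) for p in pairs])


def mean_within_range(pairs):
    """Mean over individuals of the absolute difference between their two days."""
    return statistics.mean([abs(a - b) for a, b in pairs])


def posterior_predictive_p(simulated_stats, observed_stat):
    """Share of simulated statistics at least as large as the observed one
    (ASSUMED one-sided definition)."""
    hits = sum(1 for s in simulated_stats if s >= observed_stat)
    return hits / len(simulated_stats)


# Made-up data for two individuals (NOT the PAMS data).
observed = [(0, 2), (3, 3)]
obs_sd = mean_within_sd(observed)
obs_range = mean_within_range(observed)
p_value = posterior_predictive_p([1.0, 2.0, 3.0, 4.0], 2.5)
```

Values of the p-value near 0 or 1 would flag a statistic that the fitted model reproduces poorly, while mid-range values such as those reported in the text do not.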
To assess the overall fit of the non-zero values of $Y_2$, we use the Kolmogorov-Smirnov test to compare each simulated data set's empirical cumulative distribution function (ecdf) from the fitted model to the ecdf of the observed values of $Y_2$. We perform this test for all M simulated data sets, so we have M p-values. Table 6 shows a summary of those p-values. These results show that there are no apparent issues in the fit of $Y_2$ either. We performed the same model assessment procedures after fitting the model to the NHANES data, and the results were similar, indicating that the model also appears to fit the NHANES data well.

Discussion
This paper presented a two-part Bayesian hierarchical model with measurement error that can be used to estimate MET-minutes in MVPA. In turn, the model can further be used to estimate compliance with the PAG. We were able to accommodate the recommendation that activity must come in bouts of at least 10 minutes by jointly modeling the number of bouts and average excess MET-minutes per bout for individuals. Additionally, these were modeled as functions of demographic variables, which could then be used to create distributions for subpopulations. We used data from the PAMS study to fit the model. In PAMS, participants wore an activity monitor on two separate days, for 24 hours each. In preliminary analysis, we found that the 2-3 week buffer between measurements in PAMS seemed to successfully remove any dependence between recording days. The results showed that men and those with jobs that are physically demanding had higher levels of MVPA, and those with college degrees did as well but to a lesser extent. Age and BMI were negatively associated with MVPA. This type of information might be useful in designing interventions that target specific subpopulations.
The estimated distributions of usual MVPA that were based on the PAMS data were unexpected in that about 60% of the Iowa adult population met the current PAG. The high proportion of compliers is at odds with the rates of obesity and the sedentary lifestyle that have been documented [26]. Based on the raw data, only 27% of the sample did not achieve the sufficient condition of two bouts per day to meet the PAG. Moreover, only 11% did not participate in a bout of MVPA. There are various interpretations for these results. First, there are differences in reported activity when accelerometers are worn on the hip versus the wrist or arm. Both [28] and [10] found higher accuracy when accelerometers were worn on the hip. Since the SWA is worn on the arm, it can capture upper body activity and potentially record it as MVPA when it is not. This is one argument for why the results from PAMS seem so high. In addition to these problems, the SWA is known to overestimate MVPA [29,30]. Second, it is possible that the PAG are set at a level that is easy to meet and that health benefits are realized with higher levels of physical activity. In contrast, the results we obtained using NHANES data suggest that only 10% of American adults are in compliance with the PAG.
New methods that do not assume unbiasedness of accelerometry measurements are needed. To fit these new models, we require a gold standard to measure minute by minute physical activity in order to calibrate accelerometry measurements. To further complicate things, [17] claims that it is "unlikely that a single measure of reported PA would suffice", in reference to assessing every possible activity in which humans engage. Finally, the PAG also advises adults to participate in two sessions of muscle building activity per week to realize health effects. The PAMS did not measure this type of physical activity, and thus we did not consider it in our calculation of compliance rates.
We thank Elizabeth Schneider for her help proofreading this manuscript and preparing it in final form.

Funding
This work was supported by National Institutes of Health Grant number HL091024. No potential conflict of interest was reported by the authors. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Data Availability Statement
The data that support the findings of this study are available from the corresponding author, DR, upon reasonable request. NHANES data are available at https://wwwn.cdc.gov/nchs/nhanes/default.aspx.

Supplemental Material
Example of raw data given by PAMS SWA

Figure 5 gives an example of MET activity plotted across 24 hours for three individuals. MET levels often hover around 1.5 during waking hours, with a couple of short spikes in MET activity during the day.
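To make the bout definition concrete, the following Python sketch scans a minute-by-minute MET series, like those plotted in Figure 5, for bouts of at least 10 consecutive minutes. The 3-MET cutoff for MVPA, the definition of excess MET-minutes as METs above that cutoff summed over a bout, and the function names `find_bouts` and `summarize_day` are assumptions made for illustration; this is not the PAMS processing code.

```python
import numpy as np

def find_bouts(mets, threshold=3.0, min_len=10):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    min_len consecutive minutes with MET >= threshold (assumed MVPA cutoff)."""
    active = np.asarray(mets, dtype=float) >= threshold
    # Pad with False on both sides so every run has a detectable start and end.
    padded = np.concatenate(([False], active, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_len]

def summarize_day(mets, threshold=3.0, min_len=10):
    """Return (number of bouts, average excess MET-minutes per bout).
    Excess is taken here as METs above the threshold, summed over the bout."""
    mets = np.asarray(mets, dtype=float)
    bouts = find_bouts(mets, threshold, min_len)
    if not bouts:
        return 0, 0.0
    excess = [np.sum(mets[s:e] - threshold) for s, e in bouts]
    return len(bouts), float(np.mean(excess))

# Hypothetical day: 1.5 METs at rest, one 12-minute bout, one 5-minute spike.
day = np.full(1440, 1.5)
day[100:112] = 4.0   # long enough to count as a bout
day[500:505] = 5.0   # too short to count
n_bouts, avg_excess = summarize_day(day)
```

In this toy day only the 12-minute run qualifies, so the 5-minute spike contributes nothing to either quantity, which is the effect of the 10-minute requirement.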

Distribution of Average Excess MET-minutes Y 2
Using the log-transformed positive Y 2 values, we performed a linear regression. Figure 6 shows a QQ plot of the residuals from this model. A Shapiro-Wilk test for normality of the residuals gives a p-value of 0.15; together with the QQ plot, this suggests that the empirical distribution of the log-transformed data is approximately normal, which allows us to use a lognormal distribution to model Y 2 on the original scale.
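A check of this kind can be sketched in Python as follows. The data are simulated stand-ins, and the two covariates and their coefficients are hypothetical; only the workflow (regress the log of the positive values, then examine the residuals with a Shapiro-Wilk test and normal quantiles) mirrors the description above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 300

# Hypothetical design matrix: intercept, a standardized covariate, a binary covariate.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, size=n)])
beta_true = np.array([3.0, -0.2, 0.4])            # assumed values, for simulation only
y2 = np.exp(X @ beta_true + rng.normal(scale=0.5, size=n))  # positive, lognormal

# Ordinary least squares on the log scale.
log_y2 = np.log(y2)
coef, *_ = np.linalg.lstsq(X, log_y2, rcond=None)
resid = log_y2 - X @ coef

# Shapiro-Wilk test on the residuals.
w_stat, p_value = stats.shapiro(resid)

# Normal QQ coordinates; r is the correlation of the QQ line fit.
(theo_q, sample_q), (slope, intercept, r) = stats.probplot(resid, dist="norm")
```

Plotting `sample_q` against `theo_q` gives a normal quantile plot of the same type as Figure 6; a large p-value and a nearly straight plot are consistent with a lognormal model on the original scale.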

Bowker's Test to justify Exchangeability Assumption
Figure 6. Normal quantile plot of residuals for the log-transformed Y 2 regression (theoretical quantiles versus sample quantiles). The plot shows that log(Y 2) is approximately normally distributed.

To check whether it is reasonable to assume that the observations of Y 1 within an individual are exchangeable, we cross-tabulated the number of bouts on day one against the number of bouts on day two, including only individuals who had two observations. Figure 7 shows the frequency of individuals that had each combination of bouts on days one and two. Bowker proposed a test for symmetry in m by m contingency tables. The null hypothesis of Bowker's test is that $\pi_{lk} = \pi_{kl}$ for all $l \neq k$, where $\pi_{ij}$ is the true frequency in the $ij$th cell. We tested the symmetry of the contingency table and the p-value was 0.12. This test is sensitive to the presence of zero or low counts, so we also applied the same test to a smaller subset of the contingency table (number of bouts up to 6) to ensure that the results were consistent. In all cases, we failed to reject the null hypothesis, which suggests that within-individual measurements of the number of bouts can be assumed to be exchangeable.
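For reference, Bowker's statistic sums (n_ij - n_ji)^2 / (n_ij + n_ji) over the off-diagonal pairs i < j and compares the result with a chi-square distribution. The Python sketch below is a minimal implementation; the function name `bowker_test` is hypothetical, and dropping pairs with n_ij + n_ji = 0 (and reducing the degrees of freedom accordingly) is one common convention that is assumed here, not necessarily the one used for the PAMS analysis.

```python
import numpy as np
from scipy.stats import chi2

def bowker_test(table):
    """Bowker's test of symmetry for a square contingency table.
    Returns (statistic, degrees of freedom, p-value)."""
    table = np.asarray(table, dtype=float)
    if table.ndim != 2 or table.shape[0] != table.shape[1]:
        raise ValueError("table must be square")
    i, j = np.triu_indices(table.shape[0], k=1)   # all pairs with i < j
    upper, lower = table[i, j], table[j, i]
    total = upper + lower
    keep = total > 0                              # assumed: skip empty pairs
    stat = float(np.sum((upper[keep] - lower[keep]) ** 2 / total[keep]))
    df = int(np.count_nonzero(keep))
    p_value = float(chi2.sf(stat, df)) if df > 0 else float("nan")
    return stat, df, p_value

# Hypothetical day-one by day-two table of bout counts (0, 1, 2+ bouts).
counts = [[30, 12, 4],
          [10, 25, 9],
          [6, 7, 20]]
stat, df, p_value = bowker_test(counts)
```

Because each term is divided by n_ij + n_ji, sparse off-diagonal cells can dominate or destabilize the statistic, which is why rerunning the test on a truncated table is a sensible consistency check.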

Within-person Overdispersion of Y 1
We fit a Poisson model to the Y 1 data and simulated datasets from the fitted model. Figure 8 shows the distribution of the mean within-person standard deviation across the simulated datasets, with the value from the observed data marked by a vertical line. The Poisson model does not reproduce the observed within-person variability, which shows that the standard Poisson distribution is not sufficient for these data.

Full conditional distributions

$\cdots \sim N(m_\beta, V_\beta)$ (21)

$\cdots \times I(0 < \lambda < 1)$ (25)

Figure 9 shows the posterior medians and 95% CIs for the γ and β regression coefficients. There are only small differences between the different prior sets.
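The simulation check of within-person variability in Y 1 described in this supplement can be sketched in Python as follows. The observed counts are replaced by simulated negative binomial data, and the Poisson model is reduced to a single common rate estimated by the sample mean; both are simplifying assumptions for illustration, and the function name `mean_within_sd` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_days = 500, 2

# Stand-in for observed bout counts: negative binomial with mean 3, variance 9.
size, mean = 1.5, 3.0
y_obs = rng.negative_binomial(size, size / (size + mean), size=(n_people, n_days))

def mean_within_sd(y):
    """Average over people of the standard deviation across each person's days."""
    return float(np.mean(np.std(y, axis=1, ddof=1)))

# Intercept-only Poisson fit: the maximum likelihood rate is the sample mean.
rate_hat = y_obs.mean()
obs_stat = mean_within_sd(y_obs)

# Simulate replicate datasets from the fitted Poisson model.
sim_stats = np.array([
    mean_within_sd(rng.poisson(rate_hat, size=y_obs.shape))
    for _ in range(1000)
])

# Share of simulated datasets at least as variable within person as the data.
p_upper = float(np.mean(sim_stats >= obs_stat))
```

A histogram of `sim_stats` with a vertical line at `obs_stat` gives a display of the same type as Figure 8; when the line falls far in the right tail, the Poisson model understates within-person variability.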